  1. Jun 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study reports that IT neurons have biased representations toward low spatial frequency (SF) and faster decoding of low SFs than high SFs. High SF-preferred neurons, and low SF-preferred neurons to a lesser degree, perform better category decoding than neurons with other profiles (U and inverted U shaped). SF coding also shows more sparseness than category coding in the earlier phase of the response and less sparseness in the later phase. The results are also contrasted with predictions of various DNN models.

      Strengths:

      The study addressed an important issue on the representations of SF information in a high-level visual area. Data are analyzed with LDA which can effectively reduce the dimensionality of neuronal responses and retain category information.

      We would like to express our sincere gratitude for your insightful and constructive comments which greatly contributed to the refinement of the manuscript. We appreciate the time and effort you dedicated to reviewing our work and providing suggestions. We have carefully considered each of your comments and addressed the suggested revisions accordingly.

      Weaknesses:

      The results are likely compromised by improper stimulus timing and unmatched spatial frequency spectrums of stimuli in different categories.

      The authors used a very brief stimulus duration (35ms), which would degrade the visual system's contrast sensitivity to medium and high SF information disproportionately (see Nachmias, JOSAA, 1967). Therefore, IT neurons in the study could have received more degraded medium and high SF inputs compared to low SF inputs, which may be at least partially responsible for higher firing rates to low SF R1 stimuli (Figure 1c) and poorer recall performance with median and high SF R3-R5 stimuli in LDA decoding. The issue may also to some degree explain the delayed onset of recall to higher SF stimuli (Figure 2a), preferred low SF with an earlier T1 onset (Figure 2b), lower firing rate to high SF during T1 (Figure 2c), somewhat increased firing rate to high SF during T2 (because weaker high SF inputs would lead to later onset, Figure 2d).

      We appreciate your concern regarding the coarse-to-fine nature of SF processing in the vision hierarchy and the short exposure time of our paradigm. Following your comment, we repeated the analysis of SF representation with 200ms exposure time, as illustrated in Appendix 1 - Figure 4. Our recorded data contain the 200ms exposure condition for all neurons in the main phase. As can be seen, the results are similar to those we found in the 33 ms experiments.

      Next, we bring your attention to the following observations:

      (1) According to Figure 2d, the average firing rate of IT neurons for HSF could be higher than for LSF in the late response phase. Therefore, the amount of HSF input received by the IT neurons is as much as that of LSF; however, its impact on the IT response is observable in the later phase of the response. Thus, the LSF preference is due to the temporal advantage of LSF processing rather than to contrast sensitivity.

      (2) According to Figure 3a, 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170 ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Furthermore, the highest separability index also belongs to the HSF-preferred profile in the early phase of the response which supports the impact of the HSF part of the input.

      (3) Similar LSF-preferred responses are also reported by Chen et al. (2018) (50ms for SC) and Zhang et al. (2023) (3.5 - 4 secs for V2 and V4) for longer duration times.

      Our results suggest that the LSF-preferred nature of the IT responses in terms of firing rate and recall, is not due to the weakness or lack of input source (or information) for HSF but rather to the processing nature of the SF in the vision hierarchy.

      To address this issue in the manuscript:

      Appendix 1 - Figure 4 is added to the manuscript and shows the recall value and onset for R1-R5 with 200ms of exposure time.

      We added the following description to the discussion:

      “To rule out the degraded contrast sensitivity of the visual system to medium and high SF information because of the brief exposure time, we repeated the analysis with 200ms exposure time as illustrated in Appendix 1 - Figure 4 which indicates the same LSF-preferred results. Furthermore, according to Figure 2, the average firing rate of IT neurons for HSF could be higher than LSF in the late response phase. It indicates that the amount of HSF input received by the IT neurons in the later phase is as much as LSF, however, its impact on the IT response is observable in the later phase of the response. Thus, the LSF preference is because of the temporal advantage of the LSF processing rather than contrast sensitivity. Next, according to Figure 3(a), 6% of the neurons are HSF-preferred and their firing rate in HSF is comparable to the LSF firing rate in the LSF-preferred group. This analysis is carried out in the early phase of the response (70-170ms). While most of the neurons prefer LSF, this observation shows that there is an HSF input that excites a small group of neurons. Additionally, the highest SI belongs to the HSF-preferred profile in the early phase of the response which supports the impact of the HSF part of the input. Similar LSF-preferred responses are also reported by Chen et al. (2018) (50ms for SC) and Zhang et al. (2023) (3.5 - 4 secs for V2 and V4). Therefore, our results show that the LSF-preferred nature of the IT responses in terms of firing rate and recall, is not due to the weakness or lack of input source (or information) for HSF but rather to the processing nature of the SF in the IT cortex.”

      Figure 3b shows greater face coding than object coding by high SF and to a lesser degree by low SF neurons. Only the inverted-U-shaped neurons displayed slightly better object coding than face coding. Overall the results give an impression that IT neurons are significantly more capable of coding faces than coding objects, which is inconsistent with the general understanding of the functions of IT neurons. The problem may lie with the selection of stimulus images (Figure 1b). To study SF-related category coding, the images in two categories need to have similar SF spectrums in the Fourier domain. Such efforts are not mentioned in the manuscript, and a look at the images in Figure 1b suggests that such efforts are likely not properly made. The ResNet18 decoding results in Figure 6C, in that IT neurons of different profiles show similar face and object coding, might be closer to reality.

      Because of the limited number of stimuli in our experiments, it is hard to discuss the category selectivity, which needs a higher number of stimuli. To overcome the limited number of stimuli in our experiment, we fixed 60% (nine out of 15 stimuli) while varying the remaining stimuli to reduce the selective bias. To check the coding capability of the IT neurons for face and non-face objects, we evaluated the recall of face vs. non-face classification in intact stimuli (similar to classifiers stated in the manuscript). Results show that at the population level, the recall value for objects is 90.45%, and for faces is 92.45%. However, the difference is not significant (p-value=0.44). On the other hand, we note that a large difference in the SI value does not translate directly to the classification accuracy, rather it illustrates the strength of representation.

      Regarding the SF spectrums, after matching the luminance and contrast of the images, we matched the power of the images with respect to SF and category. Power is calculated as the sum of the absolute values of the Fourier transform of the image. Considering all stimuli, the ANOVA analysis shows that the various SF bands have similar power (one-way ANOVA, p-value=0.24). Furthermore, comparing the power of faces and objects in all SF bands (including intact), for both unscrambled and scrambled images, indicates no significant difference between face and object (p-value > 0.1). Therefore, the result of Figure 3b suggests that IT employs various SF bands for the recognition of various objects.
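      As a minimal sketch of how such a band-power check could be computed (this is not the authors' code), the snippet below sums Fourier magnitudes inside a radial frequency band. The function names, the band edges, and the 8x8 test grating are hypothetical. It uses only the standard library with a naive DFT so that it runs anywhere; a real pipeline would use an FFT library. The `squared` flag is included because the reply above describes the sum of absolute values while the Methods text quoted below describes the sum of squared coefficients.

```python
import cmath
import math


def dft2(image):
    """Naive 2-D DFT of a small grayscale image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += image[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def band_power(image, low, high, squared=False):
    """Sum |F| (or |F|^2) over coefficients whose radial frequency,
    in cycles/image, falls in the half-open band [low, high)."""
    rows, cols = len(image), len(image[0])
    spectrum = dft2(image)
    power = 0.0
    for u in range(rows):
        for v in range(cols):
            # Map DFT indices to signed frequencies in cycles/image.
            fy = u if u <= rows // 2 else u - rows
            fx = v if v <= cols // 2 else v - cols
            if low <= math.hypot(fx, fy) < high:
                mag = abs(spectrum[u][v])
                power += mag * mag if squared else mag
    return power


# Hypothetical test image: a grating with 2 cycles per image along x.
grating = [[math.cos(2 * math.pi * 2 * x / 8) for x in range(8)]
           for _ in range(8)]
```

      Comparing `band_power` across bands and categories (for example with a one-way ANOVA over images) would then mirror the equalization check described above; the statistical test itself is left out of this sketch.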

      Comparing the results of CNNs and IT shows that the CNNs do not capture the complexities of the IT cortex in terms of SF. One of the sources of this difference is because of the behavioral saliency of the face stimulus in the training of the primate visual system.

      To address this issue in the manuscript:

      The following description is added to the discussion:

      “… the decoding performance of category classification (face vs. non-face) in intact stimuli is 94.2%. The recall value for objects vs. scrambled is 90.45%, and for faces vs. scrambled is 92.45% (p-value=0.44), which indicates the high level of generalizability and validity characterizing our results.”

      The following description is added to the method section, SF filtering.

      “Finally, we equalized the stimulus power in all SF bands (intact, R1-R5). The SF power among all conditions (all SF bands, face vs. non-face and unscrambled vs. scrambled) does not vary significantly (p-value > 0.1). SF power is calculated as the sum of the square value of the image coefficients in the Fourier domain.”

      Reviewer #2 (Public Review):

      Summary:

      This paper aimed to examine the spatial frequency selectivity of macaque inferotemporal (IT) neurons and its relation to category selectivity. The authors suggest in the present study that some IT neurons show a sensitivity for the spatial frequency of scrambled images. Their report suggests a shift in preferred spatial frequency during the response, from low to high spatial frequencies. This agrees with a coarse-to-fine processing strategy, which is in line with multiple studies in the early visual cortex. In addition, they report that the selectivity for faces and objects, relative to scrambled stimuli, depends on the spatial frequency tuning of the neurons.

      Strengths:

      Previous studies using human fMRI and psychophysics studied the contribution of different spatial frequency bands to object recognition, but as pointed out by the authors little is known about the spatial frequency selectivity of single IT neurons. This study addresses this gap and they show that at least some IT neurons show a sensitivity for spatial frequency and interestingly show a tendency for coarse-to-fine processing.

      We extend our sincere appreciation for your thoughtful and constructive feedback on our paper. We are grateful for the time and expertise you invested in reviewing our work. Your detailed suggestions have been instrumental in addressing several key aspects of the paper, contributing to its clarity and scholarly merit. We have carefully considered each of your comments and have made revisions accordingly.

      Weaknesses and requested clarifications:

      (1) It is unclear whether the effects described in this paper reflect a sensitivity to spatial frequency, i.e. in cycles/ deg (depends on the distance from the observer and changes when rescaling the image), or is a sensitivity to cycles /image, largely independent of image scale. How is it related to the well-documented size tolerance of IT neuron selectivity?

      Our stimuli are filtered in cycles/image, and knowing the distance of the subject to the monitor, we can calculate the corresponding cycles/degree. To the best of our knowledge, this is also the case for all other SF-related studies. To relate the observations to cycles/image versus cycles/degree, one should keep one of them fixed while changing the other; for example, changing the subject's distance to the monitor changes the SF content in terms of cycles/degree while leaving cycles/image unchanged. With our current data, we cannot discriminate between these effects. To address this issue, we added the following description to the discussion:

      “Finally, since our experiment maintains a fixed SF content in terms of both cycles per degree and cycles per image, further experiments are needed to discern whether our observations reflect sensitivity to cycles per degree or cycles per image.”
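      The relation between the two units can be sketched as follows, assuming the standard visual-angle formula for an image centered on the line of sight. The function names and the numbers in the example (a 10 cm image viewed from 57.3 cm) are hypothetical and are not taken from the manuscript.

```python
import math


def visual_angle_deg(image_size_cm, viewing_distance_cm):
    """Visual angle subtended by an image centered on the line of sight."""
    return math.degrees(2 * math.atan(image_size_cm / (2 * viewing_distance_cm)))


def cycles_per_degree(cycles_per_image, image_size_cm, viewing_distance_cm):
    """Convert an SF given in cycles/image to cycles/degree of visual angle."""
    return cycles_per_image / visual_angle_deg(image_size_cm, viewing_distance_cm)


# Hypothetical setup: the same 20 cycles/image content at two distances.
near = cycles_per_degree(20, image_size_cm=10, viewing_distance_cm=57.3)
far = cycles_per_degree(20, image_size_cm=10, viewing_distance_cm=114.6)
```

      Here `far` is roughly twice `near` even though the cycles/image content is identical, which is why varying the viewing distance (or image size) would be needed to separate the two accounts.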

      (2) The authors band-pass filtered phase-scrambled images of faces and objects. The original images likely differed in their spatial frequency amplitude spectrum and thus it is unclear whether the differing bands contained the same power for the different scrambled images. If not, this could have contributed to the frequency sensitivity of the neurons.

      After equalizing the luminance and contrast of the images, we equalized their power with respect to SF and category. The powers were calculated using the sum of the absolute values of the Fourier transform of the images. The results of the ANOVA analysis across all stimuli indicate that various SF bands exhibit similar power (one-way ANOVA, p-value = 0.24). Additionally, a comparison of power between faces and objects in all SF bands (including intact), for both unscrambled and scrambled images, reveals no significant differences (p-value > 0.1). To clarify this point, we have incorporated the following information into the Methods section.

      “Finally, we equalized the stimulus power in all SF bands (intact, R1-R5). The SF power among all conditions (all SF bands, face vs. non-face and unscrambled vs. scrambled) does not vary significantly (ANOVA, p-value > 0.1).”

      (3) How strong were the responses to the phase-scrambled images? Phase-scrambled images are expected to be rather ineffective stimuli for IT neurons. How can one extrapolate the effect of the spatial frequency band observed for ineffective stimuli to that for more effective stimuli, like objects or (for some neurons) faces? A distribution should be provided, of the net responses (in spikes/s) to the scrambled stimuli, and this for the early and late windows.

      The sample neuron in Figure 1c is chosen to be a good indicator of the recorded neurons. In the early response phase, the average firing rate to scrambled stimuli is 26.3 spikes/s, which is significantly higher than the response in -50 to 50ms, which is 23.4 spikes/s. In comparison, the mean response to intact face stimuli is 30.5 spikes/s, while object stimuli elicit an average response of 28.8 spikes/s. Moving to the late phase, T2, the responses to scrambled, face, and object stimuli are 19.5, 19.4, and 22.4 spikes/s, respectively. Moreover, when the classification accuracy for SF exceeds chance levels, it indicates a significant impact of SF bands on the IT response. This raises a direct question about the explicit coding for SF bands in the IT cortex observed for ineffective stimuli and how it relates to complex and effective stimuli, such as faces. To show the strength of neuron responses to the SF bands in scrambled images, we added Appendix 1 - Figure 2 and also added Appendix 1 - Figure 1, according to comment 4, which shows the average and std of the responses to all SF bands. The following description is added to the results section.

      “Considering the strength of responses to scrambled stimuli, the average firing rate in response to scrambled stimuli is 26.3 Hz, which is significantly higher than the response observed between -50 and 50 ms, where it is 23.4 Hz (p-value=3×10⁻⁵). In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The distribution of neuron responses for scrambled, face, and non-face in T1 is illustrated in Appendix 1 - Figure 2.

      […]

      Moreover, the average firing rates of scrambled, face, and non-face stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The distribution of neuron responses is illustrated in Appendix 1 Figure 2.”

      (4) The strength of the spatial frequency selectivity is unclear from the presented data. The authors provide the result of a classification analysis, but this is in normalized units so that the reader does not know the classification score in percent correct. Unnormalized data should be provided. Also, it would be informative to provide a summary plot of the spatial frequency selectivity in spikes/s, e.g. by ranking the spatial frequency bands for each neuron based on half of the trials and then plotting the average responses for the obtained ranks for the other half of the trials. Thus, the reader can appreciate the strength of the spatial frequency selectivity, considering trial-to-trial variability. Also, a plot should be provided of the mean response to the stimuli for the two analysis windows of Figure 2c and 2d in spikes/s so one can appreciate the mean response strengths and effect size (see above).

      The normalization of the classification result is just obtained by subtracting the chance level, which is 0.2, from the whole values. Therefore the values could still be interpreted in percent as we did in the results section. To make this clear, we removed the “a.u.” from the figure and we added the following description to the results section.

      “The accuracy value is normalized by subtracting the chance level (0.2).”

      Regarding the selectivity of the neurons, as you suggested, we added a new figure to the appendix, Appendix 1 - Figure 2. This figure shows the strength of SF selectivity, considering trial-to-trial variability. The following description is added to the results section:

      “The strength of SF selectivity, considering the trial-to-trial variability is provided in Appendix 1 Figure 2, by ranking the SF bands for each neuron based on half of the trials and then plotting the average responses for the obtained ranks for the other half of the trials.”
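      The split-half procedure described in this passage can be sketched for a single neuron as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name, the data layout (a dict mapping each SF band to a list of single-trial firing rates), the random half-split, and the example values are all hypothetical.

```python
import random
from statistics import mean


def split_half_ranked_response(trials_by_band, seed=0):
    """Rank SF bands on one half of the trials, then report the mean
    response of the held-out half in that rank order (best band first)."""
    rng = random.Random(seed)
    rank_half, test_half = {}, {}
    for band, trials in trials_by_band.items():
        shuffled = trials[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        rank_half[band] = shuffled[:mid]   # used only to order the bands
        test_half[band] = shuffled[mid:]   # used only to report responses
    ranked = sorted(rank_half, key=lambda b: mean(rank_half[b]), reverse=True)
    return [(band, mean(test_half[band])) for band in ranked]


# Hypothetical single-neuron firing rates (spikes/s), four trials per band.
example = {
    "R1": [30.0, 31.0, 29.0, 30.0],
    "R2": [25.0, 26.0, 24.0, 25.0],
    "R3": [20.0, 21.0, 19.0, 20.0],
    "R4": [15.0, 16.0, 14.0, 15.0],
    "R5": [10.0, 11.0, 9.0, 10.0],
}
ranked_responses = split_half_ranked_response(example)
```

      Because the ranking and the reported means come from disjoint trials, a neuron with no real SF selectivity would show a flat curve on the held-out half, whereas a selective neuron keeps a decreasing one.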

      The firing rates of Figures 2c and 2d are normalized for better illustration since the variation in firing rates is high across neurons, as can be observed in Appendix 1 - Figure 1. Since we seek trends in the response, the absolute values are not important (since the baseline firing rates of neurons are different), but the values relative to the baseline firing rate determine the trend. To address the mean response and the strength of the SF response, the following description is added to the results section.

      “Considering the strength of responses to scrambled stimuli, the average firing rate in response to scrambled stimuli is 26.3 Hz, which is significantly higher than the response observed between -50 and 50 ms, where it is 23.4 Hz (p-value=3×10⁻⁵). In comparison, the mean response to intact face stimuli is 30.5 Hz, while non-face stimuli elicit an average response of 28.8 Hz. The distribution of neuron responses for scrambled, face, and non-face in T1 is illustrated in Appendix 1 - Figure 2.

      […]

      Moreover, the average firing rates of scrambled, face, and non-face stimuli are 19.5 Hz, 19.4 Hz, and 22.4 Hz, respectively. The distribution of neuron responses is illustrated in Appendix 1 Figure 2.”

      Furthermore, we added a figure, Appendix 1 - Figure 3, to illustrate the strength of SF selectivity in our profiles. The following is added to the results section:

      “To check the robustness of the profiles, considering the trial-to-trial variability, the strength of SF selectivity in each profile is provided in Appendix 1 - Figure 3, by forming the profile of each neuron based on half of the trials and then plotting the average SF responses with the other half of the trials.”

      (5) It is unclear why such brief stimulus durations were employed. Will the results be similar, in particular the preference for low spatial frequencies, for longer stimulus durations that are more similar to those encountered during natural vision?

      Please refer to the first comment of Reviewer 1.

      (6) The authors report that the spatial frequency band classification accuracy for the population of neurons is not much higher than that of the best neuron (line 151). How does this relate to the SNC analysis, which appears to suggest that many neurons contribute to the spatial frequency selectivity of the population in a non-redundant fashion? Also, the outcome of the analyses should be provided (such as SNC and decoding (e.g. Figure 1D)) in the original units instead of undefined arbitrary units.

      The population accuracy is approximately 5% higher than that of the best neuron. However, we have no reference against which to compare this effect size (the value is roughly similar for face vs. object, while the chance levels differ). Moreover, as stated in Methods, SNC is calculated for two label modes (LSF and HSF) and cannot be directly compared to the best-neuron accuracy. Regarding the unit of SNC, it can be converted directly to percent by multiplying by 100. We removed the “a.u.” to prevent misunderstanding and modified the results section for clarity.

      “… SNC score for SF (two labels, LSF (R1 and R2) vs. HSF (R4 and R5)) and category … (average SNC for SF=0.51%±0.02 and category=0.1%±0.04 …”

      (7) To me, the results of the analyses of Figure 3c,d, and Figure 4 appear to disagree. The latter figure shows no correlation between category and spatial frequency classification accuracies while Figure 3c,d shows the opposite.

      In Figure 3c,d, following what we observed in Figure 3a,b about the category coding capabilities in the population of neurons based on the profile of the single neurons, we tested a similar idea if the coding capability of single neurons in SF/category could predict the coding capability of population neurons in terms of category/SF. Therefore, both analyses investigate a relation between a characteristic of single neurons and the coding capability of a population of similar neurons. On the other hand, in Figure 4, the idea is to check the characteristics of the coding mechanisms behind SF and category coding. In Figure 4a, we check if there exists any relation between category and SF coding capability within a single neuron activity without the impact of other neurons, to investigate the idea that SF coding may be a byproduct of an object recognition mechanism. In Figure 4b, we investigated the contribution of all neurons in population decision, again to check whether the mechanisms behind the SF and category coding are the same or not. This analysis shows how individual neurons contribute to SF or category coding at the population level. Therefore, the experiments in Figures 3 and 4 are different in the analysis method and what they were designed to investigate and we cannot directly compare the results.

      (8) If I understand correctly, the "main" test included scrambled versions of each of the "responsive" images selected based on the preceding test. Each stimulus was presented 15 times (once in each of the 15 blocks). The LDA classifier was trained to predict the 5 spatial frequency band labels and they used 70% of the trials to train the classifier. Were the trained and tested trials stratified with respect to the different scrambled images? Also, LDA assumes a normal distribution. Was this the case, especially because of the mixture of repetitions of the same scrambled stimulus and different scrambled stimuli?

      In response to your inquiry regarding the stratification of trials, both the training and testing data were representative of the entire spectrum of scrambled images used in our experiment. To address your concern about the assumption of a normal distribution, especially given the mixture of repetitions of the same scrambled stimulus and different stimuli, our analysis of firing rates reveals a slightly left-skewed normal distribution. While there is a deviation from a perfectly normal distribution, we are confident that this skewness does not compromise the robustness of the LDA classifier.
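      For illustration only, one way to guarantee that every scrambled image is represented in both the training and the test set is to split the trials within each stimulus. The sketch below assumes a hypothetical trial layout of `(stimulus_id, sf_band, response)` tuples and a 70/30 split; it is not a description of the authors' actual procedure.

```python
import random
from collections import defaultdict


def stratified_split(trials, train_fraction=0.7, seed=0):
    """Split trials into train and test sets separately within each
    stimulus, so every stimulus contributes to both sets."""
    rng = random.Random(seed)
    by_stimulus = defaultdict(list)
    for trial in trials:
        by_stimulus[trial[0]].append(trial)  # group by stimulus_id
    train, test = [], []
    for group in by_stimulus.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        cut = int(train_fraction * len(shuffled) + 0.5)  # nearest integer
        train.extend(shuffled[:cut])
        test.extend(shuffled[cut:])
    return train, test


# Hypothetical data: 3 scrambled stimuli, 15 repetitions each.
trials = [(stim, "R1", float(rep)) for stim in range(3) for rep in range(15)]
train, test = stratified_split(trials)
```

      A split drawn from the pooled trials without this grouping can, by chance, leave some scrambled images out of the test set, which is the situation the reviewer's question is probing.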

      (9) The LDA classifiers for spatial frequency band (5 labels) and category (2 labels) have different chance and performance levels. Was this taken into account when comparing the SNC between these two classifiers? Details and SNC values should be provided in the original (percent difference) instead of arbitrary units in Figure 5a. Without such details, the results are impossible to evaluate.

      For both SNC and CMI calculations in SF, we considered two labels of HSF (R4 and R5) and LSF (R1 and R2). This was mentioned in the Methods section, after equation (5). According to your comment, to make it clear in the results section, we also added this description to the results section.

      “… illustrates the SNC score for SF (two labels, LSF (R1 and R2) vs. HSF (R4 and R5)) and category (face vs. non-face) … conditioned on the label, SF (LSF (R1 and R2) vs. HSF (R4 and R5)) or category, to assess the information.”

      The value of SNC can also be directly converted to the percent by a factor of 100. To make it clear, we removed “a.u.” from the y-axis.

      (10) Recording locations should be described in IT, since the latter is a large region. Did their recordings include the STS? A/P and M/L coordinate ranges of recorded neurons?

      We appreciate your suggestion regarding the recording location. Nevertheless, given the complexities associated with neurophysiological recordings and the limitations imposed by our methodologies, we face challenges in determining precisely whether each unit is located in the STS. To address your comment, we added Appendix 1 - Figure 5, which shows the SF and category coding capability of neurons along their recorded locations.

      (11) The authors should show in Supplementary Figures the main data for each of the two animals, to ensure the reader that both monkeys showed similar trends.

      We added Appendix 2 which shows the consistency of the main results in the two monkeys.

      (12) The authors found that the deep nets encoded better the spatial frequency bands than the IT units. However, IT units have trial-to-trial response variability and CNN units do not. Did they consider this when comparing IT and CNN classification performance? Also, the number of features differs between IT and CNN units. To me, comparing IT and CNN classification performances is like comparing apples and oranges.

      Deep convolutional neural networks are currently considered the state-of-the-art models of the primate visual pathway. However, as you mentioned and based on our results, they do not yet capture various complexities of the visual ventral stream. Yet studying the similarities and differences between CNN and brain regions, such as the IT cortex, is an active area of research, such as:

      a. Kubilius, Jonas, et al. "Brain-like object recognition with high-performing shallow recurrent ANNs." Advances in neural information processing systems 32 (2019).

      b. Xu, Yaoda, and Maryam Vaziri-Pashkam. "Limits to visual representational correspondence between convolutional neural networks and the human brain." Nature Communications, 12.1 (2021).

      c. Jacob, Georgin, et al. "Qualitative similarities and differences in visual object representations between brains and deep networks." Nature Communications, 12.1 (2021).

      Therefore, we believe comparing IT and CNN, despite all of the differences in terms of their characteristics, can help both fields grow faster, especially in introducing brain-inspired networks.

      (13) The authors should define the separability index in their paper. Since it is the main index to show a relationship between category and spatial frequency tuning, it should be described in detail. Also, results should be provided in the original units instead of undefined arbitrary units. The tuning profiles in Figure 3A should be in spikes/s. Also, it was unclear to me whether the classification of the neurons into the different tuning profiles was based on an ANOVA assessing per neuron whether the effect of the spatial frequency band was significant (as should be done).

      Based on your comment, we added the description of the separability index to the methods section. However, since the separability index is defined as the division of two dispersion matrices, it has no units by nature. The tuning profiles in Figure 3a are normalized for better illustration since the variation in firing rates is high. Since we seek trends in the response, the absolute values are not important. Regarding the SF profile formation, to better present the SF profile assignment, we updated the method section. Furthermore, the strength of responses for scrambled stimuli can be observed in Appendix 1 - Figures 1 and 2.

      (14) As mentioned above, the separability analysis is the main one suggesting an association between category and spatial frequency tuning. However, they compute the separability of each category with respect to the scrambled images. Since faces are a rather homogeneous category I expect that IT neurons have on average a higher separability index for faces than for the more heterogeneous category of objects, at least for neurons responsive to faces and/or objects. The higher separability for faces of the two low- and high-pass spatial frequency neurons could reflect stronger overall responses for these two classes of neurons. Was this the case? This is a critical analysis since it is essential to assess whether it is category versus responsiveness that is associated with the spatial frequency tuning. Also, I do not believe that one can make a strong claim about category selectivity when only 6 faces and 3 objects (and 6 other, variable stimuli; 15 stimuli in total) are employed to assess the responses for these categories (see next main comment). This and the above control analysis can affect the main conclusion and title of the paper.

      We appreciate your concern regarding category selectivity or responsiveness of the SF profiles. First, we note that we used SI since it overcomes the limitations of the accuracy and recall metrics, which are discrete and can saturate. With SI, we cannot directly compare face vs. object, since this index only reports one value for the whole discrimination task. Therefore, we have to calculate the SI for face/object vs. scrambled to obtain a value per category. However, as you suggested, it raises the question of whether we assess how well the neural responses distinguish between actual images (faces or objects) and their scrambled versions or whether we just assess the responsiveness. Based on Figure 3b, since we have face-selective (LSF and HSF preferred profiles), object-selective (inverse U), and the U profile, where SI is the same for both face and object, we believe the SF profile is associated with the category selectivity; otherwise we would have the same face/object recall in all profiles, as we have in the U shape profile.

      To analyze this issue further, we calculated the number of face/object selective neurons in 70-170ms. We found 43 face-selective neurons and 36 object-selective neurons (FDR corrected p-value < 0.05). Therefore, the number of face-selective and object-selective neurons is similar. Next, we check the selectivity of the neurons within each profile. Number of face/object selective neurons is LP=13/3, HP=6/2, IU=3/9, U=14/13, and the remaining belong to the NP group. Results show higher face-selective neurons in LP and HP and a higher number of object-selective neurons in the IU class. The U class contains roughly the same number of face and object-selective neurons. This observation supports the relationship between category selectivity and profiles.
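      The response states that selectivity was assessed with FDR-corrected p-values but does not name the procedure. As a hedged sketch, the snippet below assumes the common Benjamini-Hochberg step-up method; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    under the Benjamini-Hochberg procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= k/m * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical per-neuron p-values from a face vs. object test.
selective = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6])
```

      Counting the `True` entries separately for neurons that respond more to faces and for those that respond more to objects would give tallies of the kind reported above (43 and 36).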

      Next, we examined the average neuron response to faces and objects in each profile. The difference between the face and object firing rates was not significant in any profile (Ranksum with a significance level of 0.05). The average firing rates (spikes/s) for face/object are LP=36.72/28.77, HP=28.55/25.52, IU=21.55/27.25, U=38.48/36.28. While the differences are not significant, they support the relationship between profiles and categories instead of responsiveness.

      The following description is added to the results section to cover this point of view.

      “To assess whether the SF profiles distinguish category selectivity or merely evaluate the neuron's responsiveness, we quantified the number of face/non-face selective neurons in the 70-170ms time window. Our analysis shows a total of 43 face-selective neurons and 36 non-face-selective neurons (FDR-corrected p-value < 0.05). The results indicate a higher proportion of face-selective neurons in LP and HP, while a greater number of non-face-selective neurons is observed in the IU category (number of face/non-face selective neurons: LP=13/3, HP=6/2, IU=3/9). The U category exhibits a roughly equal distribution of face and non-face-selective neurons (U=14/13). This finding reinforces the connection between category selectivity and the identified profiles. We then analyzed the average neuron response to faces and non-faces within each profile. The difference between the firing rates for faces and non-faces is not significant in any of the profiles (face/non-face average firing rate (Hz): LP=36.72/28.77, HP=28.55/25.52, IU=21.55/27.25, U=38.48/36.28, Ranksum with significance level of 0.05). Although the observed differences are not statistically significant, they provide support for the association between profiles and categories rather than mere responsiveness.”

      About the low number of stimuli, please check the next comment.

      (15) For the category decoding, the authors employed intact, unscrambled stimuli. Were these from the main test? If yes, then I am concerned that this represents a too small number of stimuli to assess category selectivity. Only 9 fixed + 6 variable stimuli = 15 were in the main test. How many faces/ objects on average? Was the number of stimuli per category equated for the classification? When possible use the data of the preceding selectivity test which has many more stimuli to compute the category selectivity.

      We used only the data recorded in the main phase, which contains 15 images in each session. Each image results in 12 stimuli (intact, R1-R5, and phase-scrambled version). Thus, there is a total of 180 unique stimuli in each session. Increasing the number of images would have increased the recording time. We compensated for this limitation by increasing the diversity of images in each session, picking the most responsive ones from the selectivity phase. On average, 7.54 of the stimuli were faces in each session. We added this information to the Methods section. Furthermore, as mentioned in the discussion, the number of samples per category is equalized for each classification run. We note that we cannot use the selectivity data for analysis, since the SF-related stimuli are filtered in different bands.
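
      One common way to equalize the number of samples per category before a classification run is to downsample every class to the size of the smallest one. The sketch below illustrates that idea only; the response does not state how the equalization was implemented, so the downsampling strategy, the function name `balance_classes`, and the trial counts are all assumptions.

```python
import random
from collections import defaultdict

def balance_classes(samples, labels, seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    n_keep = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        # rng.sample draws without replacement, so no trial is duplicated.
        balanced.extend((s, label) for s in rng.sample(group, n_keep))
    return balanced

# Hypothetical session: 8 face trials and 7 object trials.
labels = ["face"] * 8 + ["object"] * 7
samples = list(range(len(labels)))
balanced = balance_classes(samples, labels)
print(len(balanced), "trials after balancing")
```

      Repeating the draw with different seeds across classification runs would avoid tying the result to one particular subsample.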

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors):

      I suggest that the authors double-check their results by performing control experiments with longer stimulus duration and SF-spectrum-matched face and object stimuli.

      Thank you for your suggestion. In response to your comment, we added Appendix 1 - Figure 3.

      In addition, I had a very difficult time understanding the differences between Figure 3c and Figure 4a. Please rewrite the descriptions to clarify.

      Thank you for your suggestion. We have revised the description of these two figures. The following description is added to the results section for Figure 3c.

      “Next, to examine the relation between the SF (category) coding capacity of the single neurons and the category (SF) coding capability of the population level, we calculated the correlation between coding performance at the population level and the coding performance of single neurons within that population (Figure 3 c and d). In other words, we investigated the relation between single and population levels of coding capabilities between SF and category. The SF (or category) coding performance of a sub-population of 20 neurons that have roughly the same single-level coding capability of the category (or SF) is examined.”

      Lines 147-148: The text states that 'The maximum accuracy of a single neuron was 19.08% higher than the chance level'. However, in Figure 4, the decoding accuracies of individual neurons for category and SF range were between 49%-90% and 20%-40%, respectively.

      Please explain the discrepancies.

      The first number is reported relative to the chance level, which is 20%. The unnormalized number is therefore 39.08%, which is consistent with the SF accuracy in Figure 4. We added the following description to prevent any misunderstanding.

      “… was 19.08% higher than the chance level (unnormalized accuracy is 39.08%, neuron #193, M2).”
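
      The relation between the above-chance figure and the unnormalized accuracy is simple arithmetic, sketched here for checking purposes. The 20% chance level and the 19.08% margin come from the response; treating chance as one over the number of SF bands (R1-R5) is an assumption that matches the stated 20%.

```python
n_classes = 5                   # SF bands R1-R5
chance = 100.0 / n_classes      # chance level in percent for a balanced five-way task
above_chance = 19.08            # best single-neuron margin over chance, in percent

raw_accuracy = chance + above_chance
print(f"chance = {chance:.2f}%, unnormalized accuracy = {raw_accuracy:.2f}%")
```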

      Lines 264-265: Should 'the alternative for R3 and R4' be 'the alternative for R4 and R5'?

      Thank you for pointing this out. It should be “R4 and R5”. We corrected that mistake.

      Lines 551-562: The labels for SF classification are R1-R5. Is it a binary or a multi-classification task?

      It’s a multi-label classification. We made it clear in the text.

      “… labels were SF bands (R1, R2, ..., R5, multi-label classifier).”

      Figure 4b: Neurons in SF/category decoding exhibit both positive and negative weights. However, in the analysis of sparse neuron weights in Equation 6, only the magnitude of the weights is considered. Is the sign of weight considered too?

      We used the absolute value of the neuron weight to calculate sparseness. We also corrected Equation 6.
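
      To illustrate why taking absolute values matters, here is a minimal sketch of a sparseness index computed on weight magnitudes. Equation 6 is not reproduced in this response, so the Vinje-Gallant-style formula below is an assumption chosen for illustration and may differ from the manuscript's definition; the function name is hypothetical.

```python
def sparseness(weights):
    """Vinje-Gallant-style sparseness of absolute weights (0 = dense, 1 = sparse)."""
    a = [abs(w) for w in weights]   # the sign of each weight is discarded
    n = len(a)
    mean_sq = sum(x * x for x in a) / n
    if n < 2 or mean_sq == 0:
        raise ValueError("need at least two weights, not all zero")
    sq_mean = (sum(a) / n) ** 2
    return (1 - sq_mean / mean_sq) / (1 - 1 / n)

print(sparseness([1.0, 0.0, 0.0, 0.0]))     # one neuron carries all the weight
print(sparseness([1.0, -1.0, 1.0, -1.0]))   # equal magnitudes, mixed signs
```

      Without the absolute value, positive and negative weights of equal magnitude would cancel in the mean, and the index could fall outside its intended range.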

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 52: what do the authors mean by coordinate processing in object recognition?

      To avoid any potential misunderstanding, we used the exact phrase in Saneyoshi and Michimata (2015). It is in fact, coordinate relations processing. Coordinate relations specify the metric information of the relative locations of objects.

      (2) About half of the Introduction is a summary of the Results. This can be shortened.

      Thanks for your suggestion.

      (3) Line 134: Peristimulus time histogram instead of Prestimulus time histogram.

      Thanks for your attention. We corrected that.

      (4) Line 162: the authors state that R1 is decoded faster than R5, but the reported statistic is only for R1 versus R2.

      It was a typo; the p-value is reported only for R1 and R5.

      (5) Line 576: which test was used to assess the statistical significance?

      The test is Wilcoxon signed-rank. We added it to the text.

      (6) How can one present a 35 ms long stimulus with a 60 Hz frame rate (the stimuli were presented on a 60Hz monitor (line 470))? Please correct.

      Thanks for your attention. We corrected that. The time of stimulus presentation is 33ms and the monitor rate is 120Hz.
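
      The reviewer's point follows from the fact that a stimulus can only be shown for a whole number of monitor frames. The short sketch below, with a hypothetical helper name, computes the achievable duration closest to a requested one for a given refresh rate.

```python
def achievable_duration_ms(target_ms, refresh_hz):
    """Return (n_frames, duration_ms) for the whole-frame duration closest to target_ms."""
    frame_ms = 1000.0 / refresh_hz
    n_frames = max(1, round(target_ms / frame_ms))
    return n_frames, n_frames * frame_ms

for target, hz in [(35, 60), (33, 120)]:
    frames, duration = achievable_duration_ms(target, hz)
    print(f"{target} ms requested at {hz} Hz: {frames} frames = {duration:.2f} ms")
```

      At 60 Hz each frame lasts about 16.67 ms, so 35 ms is not attainable and the nearest option is two frames (about 33.33 ms). At 120 Hz, four frames give the same 33.33 ms, which matches the corrected value of 33 ms.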

    1. Author response:

      The following is the authors’ response to the original reviews.

      These are valuable findings that support a link between low-dimensional brain network organization, patterns of ongoing thought, and trait-level personality factors, making it relevant for researchers in the field of spontaneous cognition, personality, and neuropsychiatry. While this link is not entirely new, the paper brings to bear a rich dataset and a well-conducted study, to approach this question in a novel way. The evidence in support of the findings is convincing.

      We thank the reviewers and editors for their time, feedback, and recommendations for improvement. We have revised the manuscript with those recommendations in mind and provide a point-by-point description of the revisions below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors ran an explorative analysis in order to describe how a "tri-partite" brain network model could describe the combination of resting fMRI data and individual characteristics. They utilized previously obtained fMRI data across four scanning runs in 144 individuals. At the end of each run, participants rated their patterns of thinking on 12 statements (short multi-dimensional experience sampling-MDES) using a 0-100% visual analog scale. Also, 71 personality traits were obtained on 21 questionnaires. The authors ran two separate principal component analyses (PCA) to obtain low dimensional summaries of the two individual characteristics (personality traits from questionnaires, and thought patterns from MDES). The dimensionality reduction of the fMRI data was done by means of gradient analysis, which was combined with Neurosynth decoding to visualize the functional axis of the gradients. To test the reliability of thought components across scanning time, intra-class correlation coefficients (ICC) were calculated for the thought patterns, and discriminability indices were calculated for whole gradients. The relationship between individual differences in traits, thoughts, and macro-scale gradients was tested with multivariate regression.

      The authors found: a) reliability of thought components across the one hour of scanning, b) Gradient 1 differentiated between visual regions and DMN, Gradient 2 dissociated somatomotor from visual cortices, Gradient 3 differentiated the DMN from the fronto-parietal system, c) the associations between traits/thought patterns and brain gradients revealed significant effects of "introversion" and "specific internal" thought: "Introversion" was associated with variant parcels on the three gradients, with most of parcels belonging to the VAN and then to the DMN; and "Specific internal thought" was associated with variant parcels on the three gradients with most of parcels belonging to the DAN and then the visual. The authors conclude that interactions between attention systems and the DMN are important influences on ongoing thought at rest.

      Strengths:

      The study's strength lies in its attempt to combine brain activity with individual characteristics using state-of-the-art methodologies.

      Weaknesses:

      The study protocol in its current form restricts replicability. This is largely due to missing information on the MRI protocol and data preprocessing. The article refers the reader to the work of Mendes et al 2019 which is said to provide this information, but the paper should rather stand alone with all this crucial material mentioned here, as well. Also, effect sizes are provided only for the multiple multivariate regression of the inter-class correlations, which makes it difficult to appreciate the power of the other obtained results. 

      Thank you for these comments. We have addressed both issues by adding effect sizes for reported trait and thought related effects within the results table (Table 3, Line 427) and providing more information about the fMRI protocol and preprocessing steps (Lines 162-188).

      Reviewer #2 (Public Review):

      The authors set out to draw further links between neural patterns observed at "rest" during fMRI, with their related thought content and personality traits. More specifically, they approached this with a "tri-partite network" view in mind, whereby the ventral attention network (VAN), the dorsal attention network (DAN), and the default mode network (DMN) are proposed to play a special role in ongoing conscious thought. They used a gradients approach to determine the low dimensional organisation of these networks. In concert, using PCA they reduced thought patterns captured at four time points during the scan, as well as traits captured from a large battery of questionnaires.

      The main findings were that specific thought and trait components were related to variations in the organisation of the tri-partite networks, with respect to cortical gradients.

      Strengths of the methods/results: Having a long (1 hr) resting state MRI session, which could be broken down into four separate scanning/sampling components is a strength. Importantly, the authors could show (via intra-class correlation coefficients) the similarity of thoughts and connectivity gradients across the entire session. Not only did this approach increase the richness of the data available to them, it speaks in an interesting way to the stability of these measures. The inclusion of both thought patterns during scanning along with trait-level dispositional factors is most certainly a strength, as many studies will often include either/or of these, rather than trying to reconcile across. Of the two main findings, the finding that detailed self-generated thought was associated with a decoupling of regions of DAN from regions in DMN was particularly compelling, in light of mounting literature from several fields that support this.

      Weaknesses of the methods/results: Considering the richness of the thought and personality data, I was a little surprised that only two main findings emerged (i.e., a relationship with trait introversion, and a relationship with the "specific internal" thought pattern). I wondered whether, at least in part and in relation to traits, this might stem from the large and varied set of questionnaires used to discern the traits. These questionnaires mostly comprised personality/mood, but some sampled things that do not fall into that category (e.g., musicality, internet addiction, sleep), and some related directly to spontaneous thought properties (e.g., mind wandering, musical imagery). It would be interesting to see what relationships would emerge by being more selective in the traits measured, and in the tools to measure them.

      We agree that being more selective in trait measures and measuring tools could lead to more insights into trait – brain relationships. In part the emergence of only two main findings could also be a trade-off of multiple comparison corrections inherent in our current approach (i.e. 400 separate models for all parcels). Furthermore, we have adjusted the text in the discussion in this revision to highlight that more targeted measures of personality (e.g. self-consciousness) could provide a more nuanced view of the relationship between traits and patterns of thought at rest. (Line 532):

      “In the future it may also be important to consider measures of traits that could have relationships to both neural activity and/or experience at rest (e.g. self-consciousness de Caso et al., 2017, or autistic tendencies, Turnbull et al., 2020a).”

      Taken together, the main findings are interesting enough. However, the real significance of this work, and its impact, lie in the richness of the approach: combining across fMRI, spontaneous thought, and trait-level factors. Triangulating these data has important potential for furthering our understanding of brain-behaviour relationship across different levels of organisation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Recommendations for improving the writing and presentation.

      - Frame the study objectives more clearly. If it's about which theoretical framework best supports the data, you might need to advocate on why the tri-partite approach is a more efficient framework than others. If not, the argument will beg the question: you will find an effect on this model, so you will claim that this is an informative model. For example, if the focus is on these three RSNs and thought reporting, the authors might want to contextualize it historically, like how from two networks (DMN-antagonistic; Vanhaudenhuyse JCognNeurosci 2012; Demertzi et al, NetwNeurosci 2022) we end up with three and why this is a more suitable approach. What about whole-brain connectomic approaches, such as the work by Amico et al?

      We have expanded on the objectives and rationale of the study by editing/ expanding the introduction as follows (Lines 84-87): 

      “Traditionally, the DMN was thought to have an antagonistic relationship with systems linked to external processing (Fox et al., 2005). However, according to the ‘tri-partite’ network accounts the relationship between the DMN and other brain systems is more nuanced. From this perspective key hubs of the ventral attention network, such as the anterior insula and dorso-lateral prefrontal cortex, help gate access to conscious experience, influencing regardless of the focus of attention. This is hypothesised to occur because the VAN influences interactions between the DAN, which is more important for external mental content (Corbetta and Shulman, 2002), and the DMN which is important when states (including tasks) rely more on internal representations (Smallwood et al., 2021a).” (… and Lines 112:125):

      “Our current study explored whether this “tri-partite network” view of ongoing conscious thought derived from studies focused on understanding conscious experience, provides a useful organizing framework for understanding the relation between observed brain activity at rest and patterns of cognition/ personality traits. Such analysis is important because at rest there are multiple features of brain activity that can be identified via complex analyses that include regions that show patterns of coactivation (which are traditionally viewed as forming a cohesive network, Biswal et al., 1995) as well as patterns of anti-correlation with other regions (e.g. Fox et al., 2005). However, it is unclear which of these relationships reflect aspects of cognition or behaviour or are in fact aspects of the functional organization of the cortex (Fox and Raichle, 2007). Consequently, our study builds on foundational work (e.g. Vanhaudenhuyse et al., 2011) in order to better understand which aspects of neural function observed at rest are most likely linked to cognition and behaviour. With this aim in mind, we examined links between macro-scale neural activation and both (i) trait descriptions of individuals and (ii) patterns of ongoing thought.”

      - As there was no explicit description of the adopted design and the fMRI procedure, I deduced that it was about a within-subject design, 1-hour scanning session, comprised of four runs, each lasting 15 min, can that be correct? In any case, an explicit description of the design and the fMRI procedure eases the reading and replicability. 

      Thank you for pointing this out. We have now restructured and edited the relevant text to state those details clearly and to explain the MDES part of the procedure in the same section. It now reads (Lines 162:167):

      “Resting state fMRI with Multidimensional Experience Sampling (MDES)

      The current sample includes one hour of fully pre-processed rs-fMRI data from 144 participants (four scans from 135 participants, and three scans from nine participants whose data were missing or incomplete). The rs-fMRI was performed in four adjacent 15-minute sessions each immediately followed by MDES which retrospectively measured various dimensions of spontaneous thought during the scan.”

      - Was there a control to the analysis, such as a gradient which also associated with these characteristics? Anything else?

      In our analyses we explore multiple gradients and how they link to traits and thoughts at rest. While there is no explicit control, each analysis provides a constraint on the interpretation of the other analyses. We have added the following text to expand on this point (Line 372):

      “To this end, we performed a multiple multivariate regression with thoughts, traits, and nuisance variables (motion, age and gender) as independent variables, with whole brain functional organisation, as captured by the first three gradients, as dependent variables. In this analytic approach relationships between cognition along one gradient but not along another help identify which relationships between brain systems are most likely to relate to the feature of cognition in question (i.e. each gradient acts as a control for the other).”
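
      The structure of such a model, with several predictors and three gradient scores as simultaneous outcomes, can be sketched with ordinary least squares. This is a toy illustration for a single parcel using noiseless synthetic data and hand-written helpers; it is not the software or the statistical testing used in the study, and all names and values are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_multivariate_ols(X, Y):
    """Return coefficients (intercept first), one column per dependent variable."""
    Xd = [[1.0] + [float(v) for v in row] for row in X]
    Xt = transpose(Xd)
    return solve(matmul(Xt, Xd), matmul(Xt, Y))

# Hypothetical data: 5 participants, 2 predictors, 3 outcomes (gradients 1-3).
true_B = [[1.0, 0.0, -1.0],   # intercept
          [2.0, 0.5, 0.0],    # predictor 1 (e.g. a trait component)
          [0.0, -1.0, 3.0]]   # predictor 2 (e.g. a thought component)
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
Y = matmul([[1.0] + [float(v) for v in row] for row in X], true_B)
B_hat = fit_multivariate_ols(X, Y)
```

      Each column of `B_hat` describes one gradient, so an effect that appears in one column but not another is the kind of contrast the quoted passage describes as one gradient acting as a control for the others.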

      - I feel that Table 1 (list of tests) carries less information compared to Supplementary Table 1 (how spontaneous thought was reported and scored). I would suggest swapping them, unless Table 1 further contains which outcome measures per test were used for the analysis.  

      Thank you for this suggestion. Table showing the MDES questions has now been moved to the main text (Table 1, Line 194). However, as there is no other description of the questionnaires included in the main text, we have also retained the table listing personality/ trait questionnaires (Table 2, Line 200).

      - Ten group-level gradients were calculated out of which three were shown on the basis of previous work. Please, visualize all 10 gradients as complementary material to inform potential future works on how these look.  

      Thank you for this suggestion. Supplementary figure 3 now shows all 10 gradients.

      - Please provide more information on preprocessing, especially with motion artifacts and how the global signal was processed.  

      Thank you for pointing this out. We have now included the following text, summarized from Mendes et al., 2019, to describe the preprocessing in brief (Line 171:188): 

      “Motion correction parameters were derived by rigid-body realignment of the time series to the first volume (after discarding the first five volumes) with FSL MCFLIRT (Jenkinson et al., 2002). Parameters for distortion correction were calculated by rigidly registering a temporal mean image of this time series to the fieldmap magnitude image using FSL FLIRT (Jenkinson and Smith, 2001) which was then unwarped using FSL FUGUE (Jenkinson et al., 2012). Transformation parameters were derived by coregistering the unwarped temporal mean to the subject’s structural scan using FreeSurfer’s boundary-based registration algorithm (Greve and Fischl, 2009). All three spatial transformations were then combined and applied to each volume of the original time series in a single interpolation step. The time series was residualised against the six motion parameters, their first derivatives, and “outliers” identified by Nipype’s rapidart algorithm (https://nipype.readthedocs.io/en/latest/interfaces/). A CompCor (Behzadi et al., 2007) approach was implemented to remove physiological noise from the residual time series, which included the first six principal components from all the voxels identified as white matter or cerebrospinal fluid. The denoised time series were temporally filtered to a frequency range between 0.01 and 0.1 Hz using FSL, mean centered and variance normalized using Nitime (Rokem et al., 2009). Imaging and pre-processing protocols are described in detail in Mendes et al (Mendes et al., 2019).”
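
      The final step of the quoted pipeline, mean centering and variance normalization, amounts to z-scoring each time series. The sketch below shows that operation with the Python standard library as a stand-in; the study used Nitime for this step, and the function name and example values here are hypothetical.

```python
from statistics import fmean, pstdev

def standardize(series):
    """Mean-center a time series and scale it to unit (population) variance."""
    mu = fmean(series)
    sd = pstdev(series)
    if sd == 0:
        raise ValueError("a constant series cannot be variance-normalized")
    return [(x - mu) / sd for x in series]

# Hypothetical signal values for one voxel or parcel.
z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)
```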

      - Please, describe the duration of the whole process, and when the questionnaire data were collected.

      We apologize for the lack of clarity. “Data” section of the Methods has now been edited to explain this more clearly, it now reads (Line 146:154):

      “The dataset used here is part of the MPI-Leipzig Mind-Brain-Body (MPILMBB) database (Mendes et al., 2019). The complete dataset consists of a battery of self-reported personality measures, measures of spontaneous thought, task data, and structural and resting-state functional MRI (one hour, divided into four adjacent 15-min sessions) from participants between 20 and 75 years of age. Data were collected over a period of five days, with the MRI sessions always falling on day 3. The questionnaires were completed by participants before and after this day, using Limesurvey (https://www.limesurvey.org: version 2.00+) at their own convenience and using pen-and-paper on-site. A detailed description of the participants, measures, and data acquisition protocol has been previously published along with the dataset (Mendes et al., 2019).”

      - In light of the discussion about sample sizes and the power of the correlations, can you indicate the effect sizes of the reported results?  

      Thank you for pointing this out. Effect sizes have been added to the results table (Table 3, Line 427)

      Minor corrections to the text and figures

      - Introduction: "Our sample was a cohort....states were explanatory variables": Better move this part to Methods. Ideally, provide the hypotheses here, the ways you wanted to test them, and how you would negate them. What would it mean that you got the hypotheses confirmed? What would the opposite outcome mean? 

      We have added the following text before this part to clarify and expand on the objective of the study (Lines 112:125):

      “Our current study explored whether this “tri-partite network” view of ongoing conscious thought derived from studies focused on understanding conscious experience, provides a useful organising framework for understanding the relation between observed brain activity at rest and patterns of cognition/ personality traits. Such analysis is important because at rest there are multiple features of brain activity that can be identified via complex analyses that include regions that show patterns of coactivation (which are traditionally viewed as forming a cohesive network, Biswal et al., 1995) as well as patterns of anti-correlation with other regions (e.g. Fox et al., 2005). However, it is unclear which of these relationships reflect aspects of cognition or behaviour or are in fact aspects of the functional organisation of the cortex (Fox and Raichle, 2007). Consequently, our study builds on foundational work (e.g. Vanhaudenhuyse et al., 2011) in order to better understand which aspects of neural function observed at rest are most likely linked to cognition and behaviour. With this aim in mind, we examined links between macro-scale neural activation and both (i) trait descriptions of individuals and (ii) patterns of ongoing thought.”

      We have refrained from listing hypotheses, as the analyses we performed were data driven rather than hypothesis driven, in order to include all possible associations between large-scale connectivity patterns and individual state- and trait-level differences in personality and thought/experience. We hope that the added text provides more context to understand this rationale.

      - Please, clarify whether "conscious thought" means "reportable".

      Thank you for this suggestion. We have now edited the first reference to thought patterns in the discussions to read “self-reports of ongoing thought”, instead of just “ongoing thought” (Line 432)

      - Please, clarify whether "experience" and "thought" are used interchangeably. This is because experience can also be ineffable, beyond thought reporting. 

      To clarify this in the context of the current study, we have edited the first reference to “ongoing experience” in the introduction to “self-reports of ongoing experience” (Line 75).

      - To ease reading comprehension for each Figure, communicate the main findings first, before describing the figures. 

      We believe this lack of clarity is caused by including the figure reference in the heading of the results subsections. We hope this issue is fixed by editing the text in the following manner (Line 381):

      “Trait Introversion 

      Along the first gradient, a parcel within the right orbitofrontal cortex (within the executive control network, shown in orange) showed more similarity with transmodal regions for individuals high on introversion. Six parcels within the ventral attention network, including anterior insula, operculum and cingulate cortex were closer to the somatomotor end along gradient two (shown in purple). The same regions showed lower scores along the third gradient in participants with higher introversion scores, indicating stronger integration with the default mode network. A parcel within posterior cingulate cortex (control) was also more segregated from the visual end of gradient two in participants with higher introversion scores. Associations between trait “introversion” and brain-wide activity are shown in Figure 4.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In "Prediction error determines how memories are organized in the brain: a study of Pavlovian fear extinction in rats", Kennedy et al examine how new information is organized in memory. They tested an idea based on latent-cause theory that suggests that a large prediction error leads to the formation of a new memory, whereas a small prediction error leads to memory updating. They directly tested the prediction by extinguishing fear-conditioned rats with gradual extinction. For their experiment, gradual extinction was carried out by progressively reducing the intensity of shocks that were co-terminated with the CS, until the CS was presented alone. Doing so resulted in diminished spontaneous recovery and reinstatement compared to Standard Extinction. The results are compelling, and have important implications for the field of fear learning and memory as well as translation to anxiety-related disorders.
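
      The logic of the idea summarized above can be made concrete with a toy rule: a prediction error below some threshold updates the current memory, while one at or above it creates a new memory. The sketch below is only an illustration of that reasoning. It is not the model tested in the paper nor any of the published latent-cause models, and the threshold, learning rate, and normalized shock intensities are all hypothetical.

```python
def count_latent_causes(shocks, expected=1.0, threshold=0.5, learning_rate=0.5):
    """Toy rule: small prediction errors update the current memory, large ones spawn a new one."""
    n_causes = 1
    for observed in shocks:
        error = observed - expected
        if abs(error) >= threshold:
            n_causes += 1          # large surprise: infer a new latent cause
            expected = observed    # the new memory starts from what was just observed
        else:
            expected += learning_rate * error  # small surprise: update the existing memory
    return n_causes

gradual = [0.8, 0.6, 0.4, 0.2, 0.0]    # hypothetical shock intensities, stepped down to zero
standard = [0.0, 0.0, 0.0, 0.0, 0.0]   # CS alone from the first extinction trial

print("gradual extinction:", count_latent_causes(gradual), "memory")
print("standard extinction:", count_latent_causes(standard), "memories")
```

      Under these assumed parameters, the gradual schedule keeps every prediction error below the threshold, so the original fear memory is updated, whereas the abrupt omission of shock in the standard schedule produces a large error and a separate extinction memory that leaves the original intact.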

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. It seems that their reinstatement test was more robust, and showed significant differences between the Gradual and Standard Extinction groups.

      The authors carried out important controls that enable proper contextualization of the findings. They included a "Home" group, in which rats received fear conditioning, but not extinction manipulation. Relative to this group, the Gradual and Standard extinction groups showed a reduction in freezing.

      In Experiments 3 and 4, the authors essentially carried out clever controls that served to examine whether shock devaluation (Experiment 4) and reduction in shock intensity (rather than a gradual decrease in shock intensity) (Experiment 3) would also yield a decrease in the return of fear. In line with a latent-cause updating explanation for accounting for the Gradual Extinction, they did not.

      In Experiment 5, the authors examined whether a prediction error produced by a change of context might contribute interference to the latent cause updating afforded by the Gradual Extinction. Such a prediction would align with a more flexible interpretation of a latent-cause model, such as those proposed by Redish (2007) and Gershman et al (2017), but not the latent-cause interpretation put forth by the Cochran-Cisler model (2019). Their findings showed that whereas Gradual Extinction carried out in the same context as acquisition resulted in less return of fear than Standard Extinction, it actually yielded a greater degree of return of fear when carried out in a different context, in support of the Redish and Gershman accounts, but not Cochran-Cisler.

      Experiment 6 extended the findings from Experiment 5 in a different state-splitting modality: timing. In this experiment, the authors tested whether a shift in temporal context also influenced the gradual extinction effect. They thus carried out the extinction sessions 21 days after conditioning. They found that while Gradual Extinction was indeed effective when carried out one day after fear conditioning, it was not when conducted 21 days later.

      The authors next carried out an omnibus analysis which included all the data from their 6 experiments, and found that overall, Gradual Extinction resulted in diminished return of fear relative to Standard Extinction. I thought the omnibus analysis was a great idea and an appropriate way to do their data justice.

      Strengths:

      Compelling findings. The data support the conclusions. 6 rigorous experiments were conducted which included clever controls. Data include male and female rats. I really liked the omnibus analysis.

      We thank the reviewer for their positive comments – they are appreciated.

      Weaknesses:

      None noted

      Reviewer #2 (Public Review):

      Summary:

      The present article describes a series of experiments examining how a gradual reduction in unconditional stimulus intensity facilitates fear reduction and reduces relapse (spontaneous recovery and reinstatement) relative to a standard extinction procedure. The experiments provide compelling, if somewhat inconsistent, evidence of this effect and couch the results in a scholarly discussion surrounding how mechanisms of prediction error contribute to this effect.

      Strengths:

      The experiments are theoretically motivated and hypothesis-driven, well-designed, and appropriately conducted and analyzed. The results are clear and appropriately contextualized into the broader relevant literature. Further, the results are compelling and ask fundamental questions regarding how to persistently weaken fear behavior, which has both strong theoretical and real-world implications. I found the 'scrambled' experiment especially important in determining the mechanism through which this reduction in shock intensity persistently weakens fear behavior.

      We thank the reviewer for their positive comments – they are appreciated.

      Weaknesses:

      Overall, I found very few weaknesses in this paper. Although some might view the somewhat inconsistent effects on relapse between experiments to be a substantial weakness, I appreciate the authors directly confronting this and using it as an opportunity to aggregate data to look at general trends. Further, while Experiment 1 only used males, this was corrected in the rest of the experiments and therefore is not a substantial concern.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript examined the role of large versus small prediction errors (PEs) in creating a state-based memory distinction between acquisition and extinction. The premise of the paper is based on theoretical claims and empirical findings that gradual changes between acquisition and extinction would lead to the potential overwriting of the acquisition memory with extinction, resulting in a more durable reduction in conditioned responding (i.e. more durable extinction effect). The paper tests the hypotheses in a series of elegant experiments in which the shock intensity is decreased across extinction sessions before non-reinforced CS presentations are given. Additional manipulations include context change, shock devaluation, and controlling for lower shock intensity exposure. The critical comparison was standard non-reinforced extinction training. The critical tests were done in spontaneous recovery and reinstatement.

      Strengths:

      The findings are of tremendous importance in understanding how memories can be updated and reveal a well-defined role of PE in this process. It is well-established that PE is critical for learning, so delineating how PE is critical for generating memory states and the role it serves in keeping memories dissociable (or not) is exciting and clever. As such the paper addresses a fundamental question in the field.

      The studies test clear and defined predictions derived from simulations of the state-belief model of Cochran & Cisler (2019). The designs are excellent: well-controlled and address the question.

      The authors have done an excellent job of explaining the value of the latent state models.

      The authors have studied both sexes in the study presented, providing generality across the sexes in their findings. However, depicting the individual data points in the bar graphs and noting which data represent males and which represent females would be of great value.

      We thank the reviewer for their positive comments. We have included individual data points in the bar graphs and indicated which represent males and females.

      Weaknesses:

      (1) While it seems obvious that delivering a lower intensity shock will generate a smaller PE than say no shock, it would have been nice to see data from say a compound testing procedure that confirms this.

      It would be great if we could provide independent evidence that shifting from a 0.8 mA shock to a 0.4 mA shock (first session of gradual extinction) produces a smaller prediction error than shifting from a 0.8 mA shock to no shock at all (first session of standard extinction). In theory, this could be assessed using Rescorla’s (2000) compound test procedure. However, application of this procedure requires the use of a within-subject design and latent state theories would not predict the gradual extinction effect in such a design (as all prediction errors generated in such a design would affect the state-splitting process). That is, the between-subject design used to generate the gradual extinction effect is not amenable to application of the compound test procedure; and the within-subject design in which the compound test procedure could be applied is unlikely to generate the gradual extinction effect. Thus, we instead rely on the high degree of similarity between our results and those predicted by Cochran & Cisler (2019) to argue that the gradual extinction protocol produces a series of smaller prediction errors than does the standard extinction protocol: hence the present pattern of results.

      (2) The devaluation experiment is quite clever, but it also would be strengthened if there was evidence in the paper that this procedure does indeed lead to shock devaluation.

      The aim of Experiment 3 was to determine whether the gradual extinction effect is due to prediction error-based memory updating or shock devaluation. If the effect was due to shock devaluation, the group that received the gradual extinction treatment should have displayed the same low level of spontaneous recovery as the group that only experienced the shock at its lowest (0.1 mA) intensity (i.e., the shock devaluation group). Contrary to this prediction, the results showed that the gradually extinguished group displayed less spontaneous recovery than the shock devaluation group. That is, in this experiment, the slow and progressive reduction in shock intensity was processed differently to the repeated 0.1 mA shock exposures but the results were inconsistent with any shock devaluation effect. Hence, we conclude that the gradual extinction effect does not involve shock devaluation but instead is due to prediction error-based memory updating.

      (3) It would have been very exciting to see even more parametric examinations of this idea, like maintaining shock intensity but gradually reducing shock duration, which would have increased the impact of the paper.

      We appreciate the reviewer’s point. As each shock was presented for just 0.5 s, we are not confident that rats would detect gradual and progressive changes in its duration in the same way as they can obviously detect gradual and progressive changes in its intensity. We are, however, investigating the effects of gradual extinction in a second order conditioning protocol, which will allow us to examine the full range of parameters that are important for its regulation, including manipulations of stimulus duration. In our second-order conditioning protocol, rats are first exposed to pairings of a 10 s S1 and a 0.5 s foot shock US; and then exposed to pairings of a 30 s S2 and the 10 s S1. Across the latter pairings, rats acquire second-order conditioned fear responses to S2. Importantly, these responses can be extinguished through repeated presentations of the S2 in the absence of its S1-associate; and the duration of the S1 can be progressively and gradually reduced from 10 s to 0 s across the shift to this extinction. These experiments are currently in progress and will eventually represent an extension of the present findings.

      (4) Individual data points should be represented in the test figures (see above also).

      We have updated the figures to show these data points.

      Rescorla, R. A. (2000). Associative changes in excitors and inhibitors differ when they are conditioned in compound. Journal of Experimental Psychology: Animal Behavior Processes, 26(4), 428.

      Reviewing Editor (Recommendations For The Authors):

      The eLife assessment relates to the present form of the paper. However, following a discussion with the reviewers, the significance of the findings could be bolstered to fundamental if you decided to revise the current manuscript by scaling up the investigation to examine a wider set of parameters and conditions under which error can influence state allocation of memories. One way of doing this, but not limited to this, is suggested in the reviews (e.g. maintaining shock intensity, reducing its duration). Relatedly, a more extensive discussion of the Gershman et al. (2013) paper would be relevant.

      As noted in our response to Reviewer 3, we are currently investigating the effects of gradual extinction in a second order conditioning protocol, which will allow us to examine the full range of parameters that are important for its regulation, including manipulations of stimulus duration. These experiments are in-progress and will eventually represent an extension of the present findings. They are not, however, ready to be included as part of the present study.

      We have further referenced the Gershman et al., (2013) paper as well as the related Bouton et al., (2004) paper on the effects of gradually reducing the frequency of the US across extinction. This appears in the fifth paragraph of the Discussion: “The present study adds to a growing body of evidence that manipulations applied across the shift from CS-US pairings to presentations of the CS alone can influence the effectiveness of extinction. For example, Gershman et al., (2013) and Bouton et al., (2004) showed that gradually reducing the proportion of reinforced CS presentations results in less spontaneous recovery and slower reacquisition, respectively; though both studies left open fundamental questions about the basis of their findings (see also Woods & Bouton, 2007).”

      Reviewer #1 (Recommendations For The Authors):

      I don't have any strong recommendations. I think the paper is really great as is.

      One minor suggestion to consider:

      The authors carried out the Spontaneous Recovery experiment in 2 separate experiments. In one, they found differences between the Gradual and Standard Extinction groups, but in the second, they did not. This is perhaps not entirely surprising, since their extinction test was conducted 2 weeks post-extinction, and not all rats show spontaneous recovery within that timeframe. The authors mention that the lack of SR might be due to the low level of freezing reported in their test, but since they are showing group mean data, they might consider showing the individual data points to showcase the range of SR freezing as an additional way to make sense of the variability (ie, maybe a few rats that showed very low freezing carried the mean down in the Standard Extinction group, while others showed return of fear).

      We agree and have included individual data points for test results in Figures 2D, 2F, 3D, 3H, 4D and 4H. Hence, these figures now reflect both group and individual freezing levels.

      Reviewer #2 (Recommendations For The Authors):

      Overall, I thought this was an exceptional paper. Aside from the comments listed above which I'm not sure are inherently addressable, the only real changes I would like to see are that individual data points should be depicted in the main testing figures, as is becoming more conventional in the field.

      We thank the reviewer for their positive comments. As indicated in our response to the other reviewers, we have added individual data points to the histograms showing test results.

      Reviewer #3 (Recommendations For The Authors):

      Figures

      (1) The test data are presented as bars, but I did wonder if there were differences between the groups from the start of testing or if those emerged across testing (SR vs extinction savings).

      We have added two new figures to the supplementary section, Figures 8 and 9. These display the trial-by-trial data from spontaneous recovery and reinstatement tests in each experiment. The data clearly show that the between-group differences in freezing were very stable across the test sessions.

      (2) While I understand the importance of presenting the last extinction session, I felt depicting the entire CS session would be more informative. Alternatively, removing this altogether and leaving the information from the extinction session in the supplemental would focus the reader on the key test data.

      We appreciate the reviewer’s point. It is important to show that the groups displayed equivalent freezing in the final extinction session prior to testing. Given that the test data are conveniently and best presented in a histogram, we have chosen to present the data from the final extinction session in the same way. The full, trial-by-trial trajectory of freezing across conditioning and extinction, as well as the analyses of these data, are presented in the supplementary A.

      (3) I did not find the figures to be very aesthetically pleasing (in part because some panels were unnecessarily large). For example, I found it rather odd that the simulation panels were split in Figure 1. One suggestion of how this figure could look better is to keep the size of panels B, C, and D the same and align them on the same row with the design figure above them. The other option is to have the design figure above the test figure and the two simulation figures above each other and next to the design and test. Also, there are grey lines that appear around the simulation figures on my PDF.

      We have updated the figures so that they are consistent across experiments and more aesthetically pleasing. Specifically, we have consistently: 1) inserted the simulations of Cochran & Cisler (2019) next to the design schematic; 2) inserted the extinction and test data beneath the design schematic; and 3) made the sizing of figures more uniform across Experiments 1-6.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      This study presents valuable findings as it shows that sleep rhythm formation and memory capabilities depend on a balanced and rich diet in fly larvae. The evidence supporting the claims of the authors is convincing with rigorous behavioral assays and state-of-the-art genetic manipulations. The work will be of interest to researchers working on sleep and memory. 

      Public Reviews: 

      Summary: 

      This manuscript investigates how energetic demands affect the sleep-wake cycle in Drosophila larvae. L2 stage larvae do not show sleep rhythm and long-term memory (LTM), whereas L3 larvae do. The authors manipulate food content to provide insufficient nutrition, which leads to more feeding, no LTM, and no sleep even in older larvae. Similarly, activation of NPF neurons suppresses sleep rhythm. Furthermore, they try to induce a sleep-like state using pharmacology or genetic manipulations in L2 larvae, which can mimic some of the L3 behaviours. A key experimental finding is that activation of DN1a neurons activates the downstream DH44 neurons, as assayed by GCaMP calcium imaging. This occurs only in third instar and not in second instar, in keeping with the development of sleep-wake and feeding separation. The authors also show that glucose metabolic genes are required in Dh44 neurons to develop sleep rhythm and that DH44 neurons respond differently in malnutrition or younger larvae. 

      Strengths: 

      Previous studies from the same lab have shown that sleep is required for LTM formation in the larvae, and that this requires DN1a and DH44 neurons. The current work builds upon this observation and addresses in more detail when and how this might develop. The authors show that low quality food exposure and enhanced feeding during the larval stage of Drosophila affect the formation of sleep rhythm and long-term memory. This suggests that the development of sleep and LTM is only possible under well-fed and balanced nutrition in fly larvae. Non-sleep larvae were fed in low sugar conditions and indeed, the authors also find glucose metabolic genes to be required for a proper sleep rhythm. The paper presents precise genetic manipulations of individual classes of neurons in fly larvae followed by careful behavioural analysis. The authors also combine thermogenetic or peptide bath application experiments with direct calcium imaging of specific neurons. 

      Weaknesses: 

      The authors tried to induce sleep in younger L2 larvae, however the behavioral results suggest that they were not able to induce proper sleep behaviour as in normal L3 larvae. Thus, they cannot show that sleep during L2 stage would be sufficient to form LTM. 

      We agree that the experiments with Gaboxadol feeding in L2 did not perfectly mimic L3 sleep behaviors. However, genetic induction of sleep in L2 was effective in increasing sleep duration and depth similar to that observed in normal L3. As noted below in response to specific reviewer comments, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the gaboxadol manipulation did cause a significant decrease in arousal threshold compared to control larvae. Together these approaches support the hypothesis that sleeping more/more deeply is not sufficient to promote LTM in L2.

      The authors suggest that larval Dh44 neurons may integrate "information about the nutritional environment through the direct sensing of glucose levels to modulate sleep-wake rhythm development". They identify glucose metabolism genes (e.g., Glut1) in the downstream DH44 neurons as being required for the organization of the sleep-wake-feeding rhythm, and CCHa signaling from DN1a neurons to the DH44 cells via the receptor. However, how this is connected is not well explained. Do the authors think that the nutrient sensing is only occurring in the DH44 neurons and not in DN1a or other neurons? Would not knocking down glucose metabolism in any neuron lead to a functional defect? What is the evidence that Dh44 neurons are specific sensors of nutritional state? For example, do the authors think that e.g. the overexpression of Glut1 in Dh44 neurons, a manipulation that can increase transport of glucose into cells, would rescue the effects of low-sugar food? 

      We thank the reviewer for these suggestions and have added the experiment proposed. We found that knockdown of Hex-C in DN1a neurons did not disrupt sleep-wake rhythms (Fig. S4G-I) suggesting that Dh44 neurons are specialized in requiring glucose metabolism to drive sleep-wake rhythms. We have also added further clarification in the text regarding the existing evidence that Dh44 neurons act as nutrient sensors.

      Some of the genetic controls seem to be inconsistent suggesting some genetic background effects. In Figure 2B, npf-gal4 flies without the UAS show no significant circadian change in sleep duration, whereas UAS-TrpA flies do. The genetic control data in Figure 2D are also inconsistent. Npf-Gal4 seems to have some effect by itself without the UAS. The same is not seen with R76G11-Gal4. Suppl Fig 2: Naïve OCT and AM preference in L3 expressing various combinations of the transgenes show significant differences. npf-Gal4 alone seems to influence preference. 

      The sleep duration and bout number/length data are highly variable. 

      All experiments are performed in an isogenized background, so variability seen in genetic controls likely reflects the stochastic nature of behavioral experiments. Indeed, adult sleep data also show a great deal of variability within the same genetic background (PMID: 29228366). We agree it is an important point, and we attempt to minimize variability as much as possible with backcrossing of flies and tight control of environmental conditions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Low sugar exposure and activation of NPF neurons might not induce the same behavioral changes. LS exposure does not enhance mouth hook movements, but overall food intake. NPF activation seems to enhance mouth hook movements, but the data for food intake is not shown. This information would be necessary to compare the two different manipulations. 

      We thank the reviewer for this suggestion. However, we elected not to perform food intake experiments with the NPF activation experiments. Since we are not directly comparing the low sugar and NPF manipulations to each other, we think that both experiments together support the conclusion that immature food acquisition strategies (whether food intake or feeding rate) limit LTM performance. 

      The authors write that the larval feeding assays run for 4 hours, can they explain why that long? Larvae should already have processed food within 4 hours, so that the measurement would not include all eaten food.

      We clarified the rationale for doing 4 hour feeding assays in the results section. We did 4 hours on blue dyed food because initial experiments of 1 hour with control L3 at CT1-4 were difficult to interpret. The measurement does not include all of the eaten food in the 4 hours but does reflect more long-term changes in food intake.

      Sleep induction with Gaboxadol seems to not really work - sleep duration, bout number and length are not enhanced, and arousal threshold is only slightly lower. Thus, the authors should not use this data as an example for inducing sleep behaviour. 

      We agree this approach did not have a large effect in larvae. However, because gaboxadol feeding is standard in the field for adult sleep induction, we prefer to still include this data in the manuscript for transparency. Moreover, the Gaboxadol manipulation did cause a mild (but significant) decrease in arousal threshold compared to control larvae. Gaboxadol feeding also caused a significant decrease in total body weight compared to control larvae indicating that even slightly deeper sleep could be detrimental to younger animals.

      Activation of R76G11 with TrpA1 seems to work better for inducing sleep like behaviour. However, the authors describe that they permanently activated neurons. To induce a "normal" sleep pattern, the authors might try to only activate these neurons during the normal enhanced sleep time in L3 (CT13?) and not during the whole day. This might also allow larvae to eat during day time and gain more weight. 

      We apologize that this point was not clearer, but we did do acute activation of R76G11(+) neurons, as proposed by the reviewer. We have clarified the text to make this point.

      It would be interesting to see how larvae fed with high sucrose and low protein diet would behave in this assay. Do the authors suggest that sugar is most important for the development of sleep behaviour or that it is a combination of sugar and protein that might be required? 

      We agree that feeding larvae a high sucrose and low protein diet would be interesting. However, we initially tried a low protein diet and observed significant developmental delays. Therefore, we are concerned that developmental defects on a high sucrose and low protein diet would confound behavioral results. Additionally, the Dh44 manipulations (glucose & GCN2 signaling) suggest that sugar is the most important for the development of sleep behaviors.

      Reviewer #3 (Recommendations For The Authors): 

      The authors could discuss if the interaction between DN1a clock neurons and Dh44 neurons is mediated synaptically or by volume transmission following the extracellular release of the CCHa1 neuropeptide. They write that "the development of Dh44 neuronal competency to receive clock-driven cues" and that "DN1a clock neurons anatomically and functionally connect to Dh44" but a discussion about volume vs. synaptic signalling would be of interest. 

      We thank the reviewer for this suggestion. We revised the discussion to address this point.

      line 223 " demonstrating that post-synaptic processes likely". It would be interesting to read a discussion on whether it is known if these are postsynaptic or peptide-mediated volume effects? 

      We added additional text to the discussion to address these points.

      - The authors may want to include a schematic of the circuit and its position in the general anatomy of the fly larva. 

      We thank the reviewer for this suggestion. We have added a model figure to Fig. S6.

      "Dh44 neurons act through glucose metabolic genes" - consider rewording e.g. require glucose metabolic genes 

      We revised the text.

      - line 45 "Early in development, young animals must obtain enough nutrients to ensure proper growth" - this is too general, many animals do not feed in early life-cycle stages (e.g. lecithotrophic development), consider rewording 

      We revised the text to be more specific.

      - line 90 "however, L3 at CT1 consume more than L3 at CT12 (Figure S1A)" - typo CT13, also consider rewording to match the structure of the sentence before 'however, L3 consumed more at CT1 than at CT13' 

      We revised the text to fix this error.

      - Line 111 "and loss of deep sleep" - how is deep sleep defined and measured in the larvae? It is not clear from the data or the text. 

      We revised the text to define deep sleep in the results section. We also have a description of how arousal threshold is calculated in the methods.

      - In Figure 3B and G the individual data points are not shown 

      We did not show individual data points for those graphs because we are plotting the average percentage of 4 biological replicates.

      Typo: 

      Figure 1 legend "F, n= n=100-172 " 

      We revised the text to fix this typo.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Hussain and collaborators aims at deciphering the microtubule-dependent ribbon formation in zebrafish hair cells. By using confocal imaging, pharmacology tools, and zebrafish mutants, the group of Katie Kindt convincingly demonstrated that the ribbon, the organelle that concentrates glutamate-filled vesicles at the hair cell synapse, originates from the fusion of precursors that move along the microtubule network. This study goes hand in hand with a complementary paper (Voorn et al.) showing similar results in mouse hair cells. 

      Strengths: 

      This study clearly tracked the dynamics of the microtubules and those of the microtubule-associated ribbons, and demonstrated ribbon fusion events. In addition, the authors have identified the critical role of kinesin Kif1aa in the fusion events. The results are compelling and the images and movies are magnificent. 

      Weaknesses: 

      The lack of functional data regarding the role of Kif1aa. Although it is difficult to probe and interpret the behavior of zebrafish after nocodazole treatment, I wonder whether deletion of kif1aa in hair cells may result in a functional deficit that could be easily tested in zebrafish? 

      We have examined functional deficits in kif1aa mutants in another paper (David et al. 2024; in submission, preprint available):

      https://www.biorxiv.org/content/10.1101/2024.05.20.595037v1

      In addition to playing a role in ribbon fusions, Kif1aa is also responsible for enriching glutamate-filled secretory vesicles at the presynaptic active zone. In kif1aa mutants (and crispants), vesicles are no longer localized to the hair cell base, and there is a reduction in the number of vesicles associated with presynaptic ribbons. Kif1aa mutants also have functional defects including reductions in spontaneous vesicle release and evoked postsynaptic calcium responses. Behaviorally, kif1aa mutants exhibit impaired rheotaxis, indicating defects in the lateral-line system and an inability to accurately detect water flow.  Since our paper focuses on microtubule-associated ribbon movement and dynamics early in hair cell development, we have only discussed the effects of Kif1aa directly related to ribbon dynamics during this time window in this paper. In our revision, we will reference this recently submitted work.

      Impact: 

      Synaptogenesis in the auditory sensory cell remains elusive. Here, this study indicates that the formation of the synaptic organelle is a dynamic process involving the fusion of presynaptic elements. This study will undoubtedly boost a new line of research aimed at identifying the specific molecular determinants that target ribbon precursors to the synapse and govern the fusion process. 

      Reviewer #2 (Public Review): 

      Summary:

      In this manuscript, the authors set out to resolve a long-standing mystery in the field of sensory biology - how large, presynaptic bodies called "ribbon synapses" migrate to the basolateral end of hair cells. The ribbon synapse is found in sensory hair cells and photoreceptors, and is a critical structural feature of a readily-releasable pool of glutamate that excites postsynaptic afferent neurons. For decades, we have known these structures exist, but the mechanisms that control how ribbon synapses coalesce at the bottom of hair cells are not well understood. The authors addressed this question by leveraging the highly-tractable zebrafish lateral line neuromast, which exhibits a small number of visible hair cells, easily observed in time-lapse imaging. The approach combined genetics, pharmacological manipulations, high-resolution imaging, and careful quantifications. The manuscript commences with a developmental time course of ribbon synapse development, characterizing both immature and mature ribbon bodies (defined by position in the hair cell, apical vs. basal). Next, the authors show convincing (and frankly mesmerizing) imaging data of plus end-directed microtubule trafficking toward the basal end of the hair cells, and data highlighting the directed motion of ribbon bodies. The authors then use a series of pharmacological and genetic manipulations showing the role of microtubule stability and one particular kinesin (Kif1aa) in the transport and fusion of ribbon bodies, which is presumably a prerequisite for hair cell synaptic transmission. The data suggest that microtubules and their stability are necessary for normal numbers of mature ribbons and that Kif1aa is likely required for fusion events associated with ribbon maturation. Overall, the data provide a new and interesting story on ribbon synapse dynamics. 

      Strengths: 

      (1) The manuscript offers a comprehensive Introduction and Discussion sections that will inform generalists and specialists. 

      (2) The use of Airyscan imaging in living samples to view and measure microtubule and ribbon dynamics in vivo represents a strength. With rigorous quantification and thoughtful analyses, the authors generate datasets often only obtained in cultured cells or more diminutive animal models (e.g., C. elegans). 

      (3) The number of biological replicates and the statistical analyses are strong. The combination of pharmacology and genetic manipulations also represents strong rigor. 

      (4) One of the most important strengths is that the manuscript and data spur on other questions - namely, do (or how do) ribbon bodies attach to Kinesin proteins? Also, and as noted in the Discussion, do hair cell activity and subsequent intracellular calcium rises facilitate ribbon transport/fusion? 

      These are important strengths and we do plan to investigate adaptors and how hair cell activity impacts ribbon fusion and transport in the future!

      Weaknesses: 

      (1) Neither the data nor the Discussion address a direct or indirect link between Kinesins and ribbon bodies. Showing Kif1aa protein in proximity to the ribbon bodies would add strength.

      This is a great point, and we are working to create a transgenic line with fluorescently labelled Kif1aa to directly visualize its association with ribbons. At present, we have not obtained a transgenic line, and localization of Kif1aa and ribbons in live hair cells is beyond the scope of this paper. In our revision we will discuss this caveat.

      (2) Neither the data nor the Discussion address the functional consequences of loss of Kif1aa or ribbon transport. Presumably, both manipulations would reduce afferent excitation.

      Excellent point. Please see the response above to Reviewer #1 weaknesses.  

      (3) It is unknown whether the drug treatments or genetic manipulations are specific to hair cells, so we can't know for certain whether any phenotypic defects are secondary. 

      This is correct and is a caveat of our Kif1aa and drug experiments. However, to mitigate this in the pharmacological experiments, we have done the drug treatments at 3 different timescales: long-term (overnight), short-term (4 hr) and fast (30 min) treatments. The faster experiment done after 30 min drug treatment is where we observe reduced directional motion and fusions. This latter experiment should not be affected by any long-term changes or developmental defects that could be caused by the drugs, as hair cell development occurs over 8-12 hrs. However, we acknowledge that these treatments and genetic experiments could have secondary phenotypic defects that are not hair-cell specific. In our revision, we will discuss these issues.

      Reviewer #3 (Public Review): 

      Summary: 

      The manuscript uses live imaging to study the role of microtubules in the movement of ribeye aggregates in neuromast hair cells in zebrafish. The main findings are that 

      (1) Ribeye aggregates, assumed to be ribbon precursors, move in a directed motion toward the active zone; 

      (2) Disruption of microtubules and kif1aa increases the number of ribeye aggregates and decreases the number of mature synapses. 

      The evidence for point 2 is compelling, while the evidence for point 1 is less convincing. In particular, the directed motion conclusion is dependent upon fitting of mean squared displacement that can be prone to error and variance due to stochasticity, which is not accounted for in the analysis. Only a small subset of the aggregates meet this criterion and one wonders whether the focus on this subset misses the bigger picture of what is happening with the majority of spots. 

      Strengths: 

      (1) The effects of Kif1aa removal and nocodazole on ribbon precursor number and size are convincing and novel. 

      (2) The live imaging of Ribeye aggregate dynamics provides interesting insight into ribbon formation. The movies showing the fusion of ribeye spots are convincing and the demonstrated effects of nocodazole and kif1aa removal on the frequency of these events are novel. 

      (3) The effect of nocodazole and kif1aa removal on precursor fusion is novel and interesting. 

      (4) The quality of the data is extremely high and the results are interesting. 

      Weaknesses: 

      (1) To image ribeye aggregates, the investigators overexpressed Ribeye-a TAGRFP under the control of a MyoVI promoter. While it is understandable why they chose to do the experiments this way, expression is not under the same transcriptional regulation as the native protein, and some caution is warranted in drawing some conclusions. For example, the reduction in the number of puncta with maturity may partially reflect the regulation of the MyoVI promoter with hair cell maturity. Similarly, it is unknown whether overexpression has the potential to saturate binding sites (for example motors), which could influence mobility. 

      We agree that overexpression in transgenic lines is a common issue and would have loved to do these experiments with endogenously expressed fluorescent proteins under a native promoter. However, this was not technically possible for us. In 2014, we characterized several transgenic Ribeye lines (myo6b:ribb-mcherry, myo6b:riba-tagRFP and myo6b:riba-GFP) to ensure they have normal ribbon numbers and size. Unfortunately, we no longer have the raw data from this analysis. In our revision, we will repeat our immunolabeling on myo6b:riba-tagRFP transgenic fish, examine ribbon numbers and size, and show what impact (if any) exogenous Ribeye expression has on ribbon formation.

      (2) The examples of punctae colocalizing with microtubules look clear (Figures 1 F-G), but the presentation is anecdotal. It would be better and more informative, if quantified. 

      We attempted a co-localization study between microtubules and ribbons but decided not to move forward with it due to several issues:

      (1)  Hair cells have an extremely crowded environment, especially since the nucleus occupies the majority of the cell. All proteins are pushed together in the small space surrounding the nucleus and hence co-localization is not meaningful because the distances are so small.

      (2) We also attempted to segment microtubules in these images and quantify how many ribbons were associated with microtubules, but 3D microtubule segmentation was not accurate in these hair cells due to highly varying filament intensities, and diffuse cytoplasmic tubulin signal.

      Therefore, we decided that a better measure of ribbon-microtubule association would be a demonstration that individual ribbons keep their association with microtubules over time (in our time lapses), rather than a co-localization study. We see that ribbons localize to microtubules in all our timelapses, including the examples shown. We observed that if a ribbon dissociates, it is just to switch from one filament to another. We have not observed free-floating ribbons in our study.

      (3) It appears that any directed transport may be rare. Simply having an alpha >1 is not sufficient to declare movement to be directed (motor-driven transport typically has an alpha approaching 2). The randomness of a random walk and errors in fits to imperfect data will yield some spread, even in movement driven by Brownian motion. Many of the tracks in Figure 3H look as though they might be reasonably fit by a straight line (i.e. alpha = 1). 

      As we have stated in the paper, we only see a small subset of the ribbon precursors moving directionally. The majority of the ribbons are stationary. We cannot say for sure what is happening with the stationary ribbons, but our hypothesis is that these ribbons eventually exhibit directed motion. This idea is supported by the fact that we have seen ribbons that are stationary begin movement, and ribbons that are moving come to a stop during the acquisition of our timelapses. The ribbons that are stationary may not have enough motors attached, or they may be in a sort of ‘seeding’ phase where the ribeye protein could be condensing on the ribbon. We have discussed the possibility of ribbons being biomolecular condensates in our Discussion.

      In our revision we will discuss why ribbon transport does not resemble typical motor-driven transport (also see response to point 4 below). We will also reexamine our MSD data in more detail as suggested by Reviewer 3 and provide distributions of alpha values in our revision.
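
      The reviewer's concern about the spread of alpha values can be illustrated with a minimal sketch. It assumes a time-averaged MSD computed per track and an ordinary least-squares fit of log(MSD) against log(lag); these choices, the track length, the number of lags and all function names are illustrative assumptions, not the authors' analysis pipeline.

```python
import math
import random

def mean_squared_displacement(track, max_lag):
    """Time-averaged MSD of one 2D track for lags 1..max_lag (in frames)."""
    msd = []
    for lag in range(1, max_lag + 1):
        sq = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(sq) / len(sq))
    return msd

def fit_alpha(msd, dt=1.0):
    """Least-squares slope of log(MSD) versus log(lag time)."""
    xs = [math.log((k + 1) * dt) for k in range(len(msd))]
    ys = [math.log(m) for m in msd]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx

def random_walk(n_steps, rng):
    """Simulated Brownian track with unit-variance Gaussian steps."""
    x = y = 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, 1.0)
        y += rng.gauss(0.0, 1.0)
        track.append((x, y))
    return track

rng = random.Random(0)

# Fitted alpha for many short, purely diffusive tracks (simulated, not data).
diffusive_alphas = [
    fit_alpha(mean_squared_displacement(random_walk(50, rng), 10))
    for _ in range(200)
]

# A constant-velocity (ballistic) track has MSD proportional to lag squared.
ballistic = [(0.5 * t, 0.2 * t) for t in range(51)]
ballistic_alpha = fit_alpha(mean_squared_displacement(ballistic, 10))
```

      In this sketch the ballistic track yields an alpha of 2, while individual simulated Brownian tracks scatter on both sides of 1. This is why a distribution of alpha values, rather than a simple alpha > 1 threshold, is more informative.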

      (4) The "directed motion" shown here does not really resemble motor-driven transport observed in other systems (axonal transport, for example) even in the subset that has been picked out as examples here. While the role of microtubules and kif1aa in synapse maturation is strong, it seems likely that this role may be something non-canonical (which would be interesting). 

      One major difference between axonal and ribbon transport is that microtubules are very stable and linear in axonal transport. Therefore, the directed motion observed is ‘canonical’. In hair cells, the microtubules are extremely dynamic, especially towards the hair cell base. Within a single time frame (60-100 s), we see the network changing (moving and branching). This dynamic network adds another layer of complexity onto the motion of the ribbon, as the filament track itself is changing. Therefore, we see a lot of stalling, filament switching, and reversals of ribbon movement in our movies. However, we have demonstrated in our movies as well as using MSD analysis, that a subset of ribbons exhibit directional motion. In our revision we will discuss why directed motion in hair cells does not resemble canonical motor-driven transport in axons.

      (5) The effect of acute treatment with nocodazole on microtubules in movie 7 and Figure 6 is not obvious to me and it is clear that whatever effect it has on microtubules is incomplete. 

      When using Nocodazole, it is important to optimize the concentration of the drug such that there is minimal cytotoxicity, while still being effective. Microtubules in the apical region of hair cells are very stable and do not respond well to Nocodazole treatment at concentrations that are tolerable to hair cells. While a few stable filaments remain largely at the cell apex, there are almost no filaments at the hair cell base, which is different from the wild-type hair cells. In addition, Nocodazole-treated hair cells have more cytoplasmic YFP-tubulin signal compared to wild type. We will add additional images and quantification in our revision to illustrate these points.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      The model presented by the authors is consistent with the data described. Further testing of this model, for example by mutating the deep cholesterol binding site, would strengthen the model. However, such experiments might be challenging due to the relatively non-specific/hydrophobic nature of the deep cholesterol binding site.

      We completely agree that testing of the deep cholesterol-binding site by mutagenesis would be ideal. However, as the reviewer points out, such experiments would be challenging, not only because of the non-specific/hydrophobic nature of the deep cholesterol-binding site but also because we have been purifying AQP0 from natural sources (sheep eyes) and because it would be very difficult to secure the substantial amount of cryo-EM time needed to generate an electron crystallographic structure.

      Reviewer #2 (Public Review):

      The authors report that the findings generally apply to raft formation in membranes. However, this point is less clear as the lens membrane in which AQP0 resides is rather unique in lipid and protein content and density.

      We agree that the lens membrane is quite unique in its lipid and protein content and density, but rafts are also characterized by the same lipids and high protein density. Nonetheless, we do agree that our suggested implications for lipid rafts are speculative and so we emphasize this more in the revised version of the manuscript by writing: “This model is specific for the formation of AQP0 arrays in lens membranes, but we speculate that similar principles may underlie the organization of lipid rafts”.

      Reviewer #3 (Public Review):

      The authors showed that these adjacent tetramers can withstand a larger lateral detachment force when deep cholesterol molecules are present at the interface compared to scenarios with sphingomyelin (SM) molecules at the interface between two AQP0 tetramers. Authors interpret that result as evidence that deep cholesterol molecules mechanically stabilize the interface of the AQP0 tetramers. This conclusion has minor weaknesses, and the rigor of the lateral detachment simulations could be increased by establishing a reference point for the detachment force needed to separate AQP0 tetramers in a scenario without lipids at the interface between tetramers, and by increasing the number of repeats for the non-equilibrium steered MD simulations. Thermodynamic integration might be a better approach to compute the stabilization energy in the presence of cholesterol compared to the SM case.

      In all electron crystallographic structures of AQP0 determined to date, lipids have always been observed sandwiched in between the AQP0 tetramers (see, for example, Gonen et al., Nature, 2005 and Hite et al., EMBO J., 2010). Therefore, considering a scenario without lipids at the interface would be unnatural and the AQP0 array would likely not be stable. Such a scenario would thus not be the most appropriate reference point for the lateral detachment simulations. In our view, comparison of a scenario with the deep cholesterol at the interface versus a scenario without it appeared a more realistic setup to investigate the stabilizing role the deep cholesterol has on the association of AQP0 tetramers. In the Results subsection regarding these simulations, we added the following sentence to further stress the rationale of our experimental setup: “Comparison of these two cases should allow us to assess the effect of the deep-binding Chol3 molecules on the mechanical stability of the associated AQP0 tetramers.”

      Concerning the second suggestion of the reviewer of increasing the number of repeats, we doubled the number of simulation replicas: now it is n=20 for each pulling velocity and lipid interface. The trend of higher detachment forces for the interface containing cholesterol prevailed in a statistically significant, robust fashion (see Figure 7 of the revised manuscript and the main text referring to it). In consequence, as the reviewer suggested, extension of the dataset increased the rigor of the lateral detachment simulations. In addition to Figure 7 and the Results section, the Methods section and Table 4 have been updated to reflect the expanded dataset. 

      Finally, concerning the usage of thermodynamic integration to compute the stabilization energy, we agree with the reviewer that calculation of the free energy would be better to determine the thermodynamic stabilization imparted by the cholesterol molecules. At an earlier stage of the project, we did indeed consider carrying out this type of simulation, but we decided against it because of the complexity and poor convergence of such calculations. Our choice is also based on a previous attempt in which it proved very challenging to use free energy calculations to assess the binding of lipids to a flippase (see Wang et al. BioRxiv, https://doi.org/10.1101/2020.06.24.169771, 2021). We have now included this consideration in the revised manuscript by adding the following sentence in the Discussion: “Although we provide solid evidence here that deep cholesterol impart mechanical stabilization, free energy calculations would be required to obtain the full picture of thermodynamic stabilization. Such free energy calculations are challenging for lipids, due to the chemical complexity and poor convergence involved (Wang et al., 2021), and are thus beyond the scope of the current work.”

      Reviewer #1 (Recommendations For The Authors):

      Reorganizing a few concepts would make the story easier to follow. For example, the analysis of the bilayer thickness seems disjointed. Although Figure 4 shows measurements, it is not clear that the measurements represent bilayer thickness until the last paragraph of page 21 in the discussion, where "Hydrophobic thickness" is first introduced. Moving that first paragraph of page 22 that refers to Fig. 4A to the results would be helpful to understand the figure, and would prepare the reader for this part of the discussion.

      In response to the reviewer, we moved the description of the measurements of the hydrophobic thickness to the Results section (Page 12) and adjusted the Discussion to minimize repetition (page 22).

      Likewise, Figure 4E shows measurements of something, but it is not clear that these are the dimensions of a protein pocket until well into the discussion.

      In response to the reviewer’s comment, we added a sentence both in the Results section [It sits in a pocket between the two adjacent AQP0 tetramers that is wider in the extracellular leaflet than the cytoplasmic leaflet (Figure 4E)] as well as to the caption of Figure 4E [The dotted lines indicate the distance between the two adjacent AQP0 tetramers at the positions of the ring system (~8.5 Å) and the acyl chain (~2.5 Å)].

      Figure 2 - a comment for the non-specialists on what this region of the protein is would be helpful context. Is this the pore with part of the NPA motif?

      We agree with the referee and added the following sentence to the caption of Figure 2: “A region of the water-conducting pathway close to the NPA (asparagine-proline-alanine), the AQP signature motif, is shown”.

      Reviewer #2 (Recommendations For The Authors):

      There is only one recommendation: In the results section entitled "Cholesterol positions observed in the electron crystallographic structures are representative of those around single AQP0 tetramers" the authors do not describe their approach. They refer to a reference (Aponte-Santamaría et al., 2012). The authors state the problem (investigate cholesterol positions), but it would be helpful to the readers if they also described the experimental approach.

      We agree with the reviewer and made the following addition to the sentence “we performed MD simulations and calculated time-averaged densities to investigate ...”

      Reviewer #3 (Recommendations For The Authors):

      Technical comments:

      (1) Authors stated: "Equilibration simulations were then performed until bulk membrane properties, such as thickness and deuterium order parameters, became stable and congruent with previous reports such as those by (Doktorova et al., 2020) and others (Figure 5-figure supplement 2 and Figure 5-figure supplement 3)." However, bilayer thickness is not represented in these figures. Additionally, I observed that the area per lipid (APL) appeared to be somewhat variable. This variation was particularly noticeable in systems where SM:CHOL=2:1, which seem to be not fully equilibrated. Is the figure displaying APL data for only one repetition? Could you please include plots for the other repetitions?

      We thank the reviewer for pointing this out. We would like to clarify that we used CHARMM-GUI to generate one lipid bilayer configuration for each mixture and system size. These configurations (one per system) were extensively simulated to generate stable initial configurations of the lipid bilayers. Figure 5 – supplements 2 and 3 refer to this pre-equilibration step. The final pre-equilibrated configurations were then used in the subsequent multiple equilibrium MD runs that we performed, either with a single cholesterol molecule or with the AQP0 tetramer(s) inserted. We have clarified this procedure in the revised manuscript (see changes in the Methods section for the MD equilibrium simulations).

      Concerning this pre-equilibration step, we have chosen the area per lipid, not thickness, to characterize the equilibration of the pure lipid bilayers. Accordingly, the area per lipid is the quantity shown in Figure 5 – figure supplement 3. We no longer refer to the membrane thickness in the revised manuscript.

      Concerning the variability in the area per lipid, we note that the large changes occur within the first few tens of nanoseconds of the pre-equilibration step, after which the area per lipid stabilizes. We would like to also point out that in Figure 5 – figure supplement 3, we chose a logarithmic scale for the time axis to actually make it possible for the reader to see the major changes that occur at the beginning of the pre-equilibration step (which would otherwise be difficult to see). In the particular case of the SM:CHOL=2:1 mixture, the 64 lipids/leaflet system converged to a stable area per lipid value in the last 70 ns and the 244 lipids/leaflet system approached the same value in approximately the last 30 ns. This was a good indication that the large system had also converged. After equilibration of the membranes, a single cholesterol or AQP0 tetramer(s) were inserted and equilibrium simulations were initiated. However, the first 100 ns (or 300 ns in the case of the double tetramer system) were considered as a further equilibration time and were not included in the analysis. This is now explicitly stated in the revised manuscript: “The first 100 ns of each simulation replica (the first 300 ns for the two tetramer simulations) were considered as additional equilibration time and were not included in further analysis.”
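
      As a rough illustration of the kind of convergence check described above, the sketch below computes an area per lipid from the lateral box dimensions and tests whether it has plateaued over a final time window. The definition (box area divided by lipids per leaflet), the plateau criterion, the tolerance and the synthetic relaxation curve are all assumptions made for this example; they are not taken from the manuscript.

```python
import math
import statistics

def area_per_lipid(box_x_nm, box_y_nm, lipids_per_leaflet):
    """Simplified area per lipid: lateral box area over lipids in one leaflet."""
    return box_x_nm * box_y_nm / lipids_per_leaflet

def tail_is_stable(times_ns, values, window_ns, tolerance):
    """Hypothetical plateau test: within the last `window_ns`, compare the
    mean of the first half of the window with the mean of the second half."""
    t_end = times_ns[-1]
    tail = [v for t, v in zip(times_ns, values) if t > t_end - window_ns]
    half = len(tail) // 2
    drift = abs(statistics.mean(tail[:half]) - statistics.mean(tail[half:]))
    return drift <= tolerance

# Synthetic area-per-lipid trace (nm^2) relaxing towards a plateau.
# These numbers are invented for illustration and are not simulation data.
times = list(range(0, 201))
apl = [0.42 + 0.08 * math.exp(-t / 10.0) for t in times]

stable_late = tail_is_stable(times, apl, window_ns=70, tolerance=0.005)
stable_early = tail_is_stable(times[:31], apl[:31], window_ns=30, tolerance=0.005)
```

      With this synthetic trace the final 70 ns pass the plateau test, whereas the first 30 ns do not. This mirrors the statement that the large changes are confined to the start of the pre-equilibration step.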

      (2) Could you clarify the reasoning behind conducting the simulations at 323 K?

      We conducted the simulations at 323 K to ensure that the lipid bilayers were in the liquid phase. SM:CHOL mixtures have been reported to be in the liquid phase above 314 K (Keyvanloo et al. Biophys. J. 114: 1344, 2018). 323 K was thus chosen to be well above this value. Note that this temperature was also chosen in a previous MD simulation study of pure sphingomyelin bilayers (Niemelä et al. Biophys. J. 87: 2976, 2004). This reasoning, as well as the two references, has been added to the Methods section in the revised manuscript.

      (3) There appears to be a discrepancy in Figure 7. Panel F does not align with the provided caption. 

      We apologize for this mistake. The captions for panels E and F were switched. We corrected this mistake.

      (4) Likewise, in Figure 8, there is a mismatch between the caption and the figures. Furthermore, in the text, the authors assert, "In the absence of cholesterol, the AQP0 surface is completely covered by sphingomyelin in the hydrophobic region of the membrane and by water outside this region (Figure 8A, left column). As noted before, there are essentially no direct protein-protein interactions between the adjacent tetramers. When cholesterol was present at the interface, it interacted with AQP0 at the center of the membrane and remained mostly in place (Figure 8A, right column)." However, the left column shows cholesterol density. Could you please clarify this inconsistency, especially regarding the absence of cholesterol?

      We apologize for this mistake. The panels in Figure 8A showing the AQP0 surfaces in the absence and presence of cholesterol were switched. We corrected this mistake.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Estevam et al. reports new insights into the regulation of the receptor tyrosine kinase MET gained from two deep mutational scanning (DMS) datasets. In this paper, the authors use a classic selection system for oncogenic kinase signaling, the murine Ba/F3 cell line, to assess the functional effects of thousands of mutations in the kinase domains of MET in two contexts: (1) fusion of the whole MET intracellular region to the dimerization domain TPR, and (2) the same fusion protein, but with exon 14, which encodes part of the juxtamembrane region of MET, skipped. Critically, exon 14 skipping yields a version of MET that is found in many cancers and has higher signaling activity than the canonical MET isoform. The authors extensively analyze their DMS data to very convincingly show that their selection assay reports on kinase activity, by illustrating that many functionally important structural components of the kinase domain are not tolerant of mutation. Then, they turn their attention to a helical region of the juxtamembrane region (αJM), immediately after exon 14, which is posited to play a regulatory role in MET. Their DMS data illustrate that the strength and mutational tolerance of interactions between αJM and the key αC helix in the kinase domain depends on the presence or absence of exon 14. They also identify residues in the N-lobe of the kinase, such as P1153, which are not conserved across tyrosine kinases but appear to be essential for MET and MET-like kinases. Finally, the authors analyze their DMS data in the context of clinically-observed mutations and drug-resistance mutations.

      Overall, this manuscript is exciting because it provides new insights into MET regulation in general, as well as the role of exon 14. It also reveals ways in which the JM region of MET is different from that of many other receptor tyrosine kinases. The exon 14-skipped fusion protein DMS data is somewhat underexplored and could be discussed in greater detail, which would elevate excitement about the work. Furthermore, some of the cell biological validation experiments and the juxtaposition with clinical data are perhaps not assessed/interpreted as clearly as they could be. Some constructive suggestions are given below to enhance the impact of the manuscript.

      Strengths:

      The main strengths of this paper, also summarized above in the summary, are as follows:

      (1) The authors very convincingly show that Ba/F3 cells can be coupled with deep mutational scanning to examine MET mutational effects. This is most clearly shown by highlighting how all of the known kinase structure and regulatory elements are highly sensitive to mutations, in accordance with a few other DMS datasets on other kinases.

      (2) A highlight of this paper is the juxtaposition of two DMS datasets for two different isoforms of the MET receptor. Very few comparisons like this exist in the literature, and they show how small changes to the overall architecture of a protein can impact its regulation and mutational sensitivity.

      (3) Another exciting advance in this manuscript is the deep structural analysis of the MET juxtamembrane region with respect to that of other tyrosine kinases - guided by the striking effect of mutations in the juxtamembrane helical region. The authors illustrate how the JM region of MET differs from that of other tyrosine kinases.

      (4) Overall, this manuscript will provide a resource for interpreting clinically relevant MET mutations.

      Weaknesses:

      (1) The manuscript is front-loaded with extensive analysis of the first DMS dataset, in which exon 14 is present, however, the discussion and analysis of the exon 14-skipped dataset is somewhat limited. In particular, a deeper discussion of the differences between the two datasets is warranted, to lay out the full landscape of mutations that have different functional consequences in the two isoforms. Rather, the authors only focus on differences in the JM region. What are the broader structural effects of exon 14 skipping across the whole kinase domain?

      Thank you for your feedback on our manuscript and our analysis of the exon 14 skipped mutational scanning data. The lack of a robust growth differential between the wild type MET intracellular domain and the exon 14 skipped isoform within the Ba/F3 system suggests that there is not a significant growth advantage related to exon 14 skipping, likely due to the constitutive activation of both constructs by the TPR domain. This also suggests that the assay is potentially less sensitive to nuanced JM-driven effects between these two isoforms, aside from the highly sensitive αJM-helix. In the cytoplasmically-expressed context, we also lose insight into membrane-related interactions imposed on the juxtamembrane that may be important to fully understand the differences between these two isoforms. Therefore, we can at most speculate about exon 14 skipping-related differences between these two datasets.

      With these caveats in mind, to further address exon 14 and juxtamembrane-driven differences between these two mutational landscapes, we calculated the absolute score difference between TPR-METΔEx14 and TPR-MET (|METΔEx14 - MET|) and plotted the |ΔScore| in a heatmap. Overall, the two landscapes, as noted in the text, are largely similar, with differences emerging mostly for specific mutations. The largest difference at the level of secondary structure continues to be in the αJM-helix, where MET is more sensitive to helix-breaking mutations such as proline. Again, L1062 has the greatest difference in sensitivity between these two datasets for the αJM-helix, with the introduction of negative charge resulting in loss-of-function for the TPR-MET kinase domain but having a null effect in the TPR-METΔEx14 kinase domain. Other positions with strong differences include the αG and APE motif.

      We have incorporated a more detailed discussion in the text. 
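
      The absolute score difference described in this response can be sketched as follows. The dictionary layout, the function name and every score value are hypothetical placeholders chosen for illustration; none of them are values from the DMS datasets.

```python
# Hypothetical fitness scores keyed by (residue position, substituted amino acid).
# All numbers are invented placeholders, not measured scores.
met_scores = {(1062, "D"): -2.1, (1062, "P"): -1.8, (1153, "A"): -1.5}
ex14_scores = {(1062, "D"): 0.1, (1062, "P"): -0.9, (1153, "A"): -1.4}

def abs_score_difference(scores_a, scores_b):
    """|score_a - score_b| for every variant scored in both libraries."""
    shared = scores_a.keys() & scores_b.keys()
    return {variant: abs(scores_a[variant] - scores_b[variant]) for variant in shared}

# |METdEx14 - MET| per variant; arranging these by position and amino acid
# gives the matrix one would plot as a heatmap.
delta = abs_score_difference(ex14_scores, met_scores)

# Variant with the largest difference between the two libraries.
largest = max(delta, key=delta.get)
```

      Restricting the calculation to variants scored in both libraries avoids treating a missing measurement as a difference.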

      (2) It is unclear if gain-of-function mutations can actually be detected robustly in this specific system. This isn't a problem at face value, as different selection assays have different dynamic ranges. However, the authors don't discuss the statistical significance and reproducibility of gain- vs loss-of-function mutations, and none of the gain-of-function mutations are experimentally validated (some appear to show loss-of-function in their cellular validation assay with full-length MET). The manuscript would benefit from deeper statistical analysis (and discussion in the text) of gain-of-function mutations, as well as further validation of a broad range of activity scores in a functional assay. For the latter point, one option would be to express individual clones from their library in Ba/F3 cells and blot for MET activation loop phosphorylation (which is probably a reasonable proxy for activity/activation).

      Thank you for your comment on the statistical interpretations of gain-of-function (GOF) and loss-of-function (LOF) mutations. In this study we classify GOF and LOF based on the following metrics:

      (1) The difference between the missense mutation score and the wild type synonymous score for a given position must be smaller than the calculated propagated error, for both IL-3 withdrawal and IL-3 conditions

      (2) Missense mutations must be ≥ ±2 standard deviations (SD) from the mean of wild type synonymous mutations

      Given that our assay was conducted in a constitutively active kinase in the TPR-fusion context, gain-of-function mutations are expected to not only be rare, but also supersede baseline fitness. Within the IL-3 conditions, we expect that cells are not reliant or “addicted” to MET for growth proliferation. Nevertheless, due to the parallel nature of the screen, we can compare scores for variants in the IL-3 control and IL-3 withdrawal conditions to filter mutations that are solely exhibiting high fitness under selective pressure.

      To identify these mutations, we 1) calculated the propagation of error between IL-3 and IL-3 withdrawal scores for the same variant; 2) calculated the absolute difference between IL-3 and IL-3 withdrawal scores for the same variant; and 3) filtered for variants in which the IL-3 withdrawal score was ≥ +2 SDs, the IL-3 score was ≤ 0, and the absolute score difference between IL-3 and withdrawal conditions was larger than the propagated error.

      In analyzing mutations within the IL-3 withdrawal conditions, applying our statistical metrics, we find 33 mutations within the MET library, and 30 in the METΔEx14 library, that have a score of ≥ +2 SD and low propagated error. By increasing our boundary to ≥+2.5 SD, we can classify mutations with even higher confidence, identifying 10 mutations within the MET library, and 9 in the METΔEx14 library (Supplemental Data Figure 7).
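
      The three filtering steps above can be sketched as a small function. This is a hedged reading of the text: it assumes the propagated error is the quadrature sum of the two per-condition standard errors, that the +2 SD threshold is measured from the mean of the wild type synonymous scores, and that variants meeting all three conditions are the ones kept. The field names, variant labels and numbers are hypothetical.

```python
import math
import statistics

def classify_gof(variants, synonymous_scores, n_sd=2.0):
    """Return names of variants passing the three gain-of-function filters."""
    mu = statistics.mean(synonymous_scores)
    sd = statistics.stdev(synonymous_scores)
    threshold = mu + n_sd * sd
    hits = []
    for v in variants:
        # Step 1: propagated error (assumed: standard errors added in quadrature).
        propagated = math.sqrt(v["se_il3"] ** 2 + v["se_withdrawal"] ** 2)
        # Step 2: absolute difference between the two conditions.
        diff = abs(v["withdrawal"] - v["il3"])
        # Step 3: keep variants that are high only under selective pressure.
        if v["withdrawal"] >= threshold and v["il3"] <= 0 and diff > propagated:
            hits.append(v["name"])
    return hits

# Invented example values, not scores from the libraries.
synonymous = [-0.2, -0.1, 0.0, 0.1, 0.2]
variants = [
    {"name": "variant_A", "withdrawal": 0.9, "il3": -0.1, "se_withdrawal": 0.1, "se_il3": 0.1},
    {"name": "variant_B", "withdrawal": 0.9, "il3": 0.5, "se_withdrawal": 0.1, "se_il3": 0.1},
    {"name": "variant_C", "withdrawal": 0.4, "il3": -0.1, "se_withdrawal": 0.5, "se_il3": 0.5},
    {"name": "variant_D", "withdrawal": 0.1, "il3": -0.3, "se_withdrawal": 0.1, "se_il3": 0.1},
]

gof_hits = classify_gof(variants, synonymous)
strict_hits = classify_gof(variants, synonymous, n_sd=2.5)
```

      Raising `n_sd` from 2 to 2.5 corresponds to the stricter boundary mentioned above and can only shrink the list of hits.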

      (3) In light of point 2, above, much of the discussion about clinically-relevant gain-of-function mutations feels a bit stretched - although this section is definitely very interesting in premise. A clearer delineation of gain-of-function, with further statistical support and ideally also some validation, would greatly strengthen the claims in this section.

      To address this concern, we have provided additional analysis and details on gain-of-function (GOF) classification in Supplemental Data Figure 5 and the overlap between GOF and clinically associated mutations in Supplemental Data Figure 8. Within our gain-of-function classifications, we pick up on several mutations at positions that have been clinically detected and experimentally validated in previous studies in both libraries (D1228, G1163, L1195), and show that GOF mutations also have low variance.

      Reviewer #2 (Public Review):

      Summary:

      The authors describe a deep mutational scanning (DMS) study of the kinase domain of the c-MET receptor tyrosine kinase. The screen is conducted with a highly activated fusion oncoprotein - Tpr-MET - in which the MET kinase domain is fused to the Tpr dimerization element. The mutagenized region includes the entire kinase domain and an alpha-helix in the juxtamembrane region that is essentially part of the MET kinase domain. The DMS screen is carried out in two contexts, one containing the entire cytoplasmic region of MET, and the other with an "exon 14 deletion" which removes a large portion of the juxtamembrane region (but retains the aforementioned alpha-helix). The work provides a robust and essentially exhaustive catalog of the effect of mutations (within the kinase domain) on the ability of the Tpr-MET fusion oncoproteins to drive IL3-independent growth of Ba/F3 cells. Every residue in the kinase is mutated to every natural amino acid. Given the design of the screen, one would expect it to be a powerful tool for identifying mutations that impair catalytic activity and therefore impair IL3-independent proliferation, but not the right tool for identifying gain-of-function mutations that operate by shifting the kinase from an inactive to active state (because the Tpr-Met fusion construct is already very highly activated). This is borne out by the data, which reveal many many deleterious mutations and few "gain-of-function" mutations (which are of uncertain significance, as discussed below).

      Strengths:

      The authors take a very scholarly and thorough approach to interpreting the effect of mutations in light of available information for the structure and regulation of MET and other kinases. They examine the effect of mutations in the so-called catalytic (C) and regulatory (R) spines, the interface between the JM alpha-helix and the C-helix, the glycine-rich loop, and other key elements of the kinase, providing a structural rationale for the deleterious effect of mutations. Comparison of the panoply of deleterious mutations in the TPR-MET versus TPR-exon14del-MET DMS screens reveals an interesting difference - the exon14 deletion MET is much more tolerant of mutations in the JM alpha-helix/C-helix interface. The reason for this is unclear, however.

      Weaknesses:

      Because the screens were conducted with highly active Tpr-MET fusions, they have limited power to reveal gain-of-function mutations. Indeed, to the extent that Tpr-MET is as active or even more active than ligand-activated WT MET, one could argue that it is "fully" activated and that any additional gain of fitness would be "super-physiologic". I would expect such mutations to be rare (assuming that they could be detected at all in the Ba/F3 proliferation assay). Consistent with this, the authors note that gain-of-function mutations are rare in their screen (as judged by being more fit than the average of synonymous mutations). In their discussion of cancer-associated mutations, they highlight several "strong GOF variants in the DMS". It is unclear what the authors mean by "strong GOF", indeed it is unclear to this reviewer whether the screen has revealed any true gain of function mutations at all. A few points in this regard:

      (1) More active than the average of synonymous mutations (nucleotide changes that have no effect on the sequence of the expressed protein) seems to be an awfully low bar for GOF - by that measure, several synonymous mutations would presumably be classified as GOF.

      We completely agree that classifying any mutation above the synonymous average as GOF would not be a robust assessment, which is why we statistically filtered mutations throughout our analysis. To this point, and that of Reviewer 1, we have further outlined our statistical definitions. In classifying mutations as GOF or LOF, the following parameters were used:

      (1) The difference between the missense mutation score and the wild type synonymous score for a given position must be smaller than the calculated propagated error, for both IL-3 withdrawal and IL-3 conditions

      (2) Missense mutations must be ≥ ±2 standard deviations (SD) from the mean of wild type synonymous mutations

      Therefore, only variants at the tail-ends of the mutational distribution were assessed, and further filtered based on propagation of error. For this reason, a “strong GOF” mutation as noted in this study is one that improves the fitness of an already active kinase. As pointed out, within our analysis, these are very rare occurrences, and in focusing on cancer-associated mutations we find that the variants that meet these statistical parameters require a larger genetic “leap” in the codon space. Overall, we have also changed our language in reference to GOF mutations in text.

      We hope this concern has been addressed in the new Supplemental Data Figures.

      (2) In the +IL3 heatmap in supplemental Figure 1A, there is as much or more "blue" indicating GOF as in the -IL3 heatmap, which could suggest that the observed level of gain in fitness is noise, not signal.

      We hope this concern has been addressed in the previous responses and new Supplemental Data Figures.

      (3) And finally, consistent with this interpretation, in Supplemental Figure 1C, comparing the synonymous and missense panels in the IL3 withdrawal condition suggests that the most active missense mutations (characterized here as strong GOF) are no more active than the most active synonymous mutations.

      We hope this concern has been addressed in the previous responses and figures above.

      My other major concern with the work as presented is that the authors conflate "activity" and "activation" in discussing the effects of mutations. "Activation" implies a role in regulation - affecting a switch between inactive and active conformations or states - at least in this reviewer's mind. As discussed above, the screen per se does not probe activation, only activity. To the extent that the residues discussed are important for activation/regulation of the kinase, that information is coming from prior structural/functional studies of MET and other kinases, not from the DMS screen conducted here. Of course, it is appropriate and interesting for the authors to consider residues that are known to form important structural/regulatory elements, but they should be careful with the use of activity vs. activation and make it clear to the reader that the screen probes the former. One example - in the abstract, the authors rightly note that their approach has revealed a critical hydrophobic interaction between the JM segment and the C-helix, but then they go on to assert that this points to differences in the regulation of MET and other RTKs. There is no evidence that this is a regulatory interaction, as opposed to simply a structural element present in MET (and indeed the authors' examination of prior crystal structures shows that the interaction is present in both active and inactive states).

      Thank you, and we completely agree that the distinction between “activity” and “activation” is important and that we can at most speculate and propose models for effects related to activation from this screen. We have edited the text to reflect these distinctions. In respect to activation and the second point, we believe the screen highlights the ⍺JM-C interface as a critical structural region, which may have a role in regulation based on the paradigm of juxtamembrane regulation in RTKs, the presence of a similar interface in TAM family kinases, the co-movement of the ⍺JM-helix and ⍺C-helix between active and inactive conformations in the structural ensemble, and the observation that within the TPR-METΔEx14 library there is a greater tolerance for mutations at interface positions than in TPR-MET. We hope that follow-up studies will directly probe the ⍺JM-C interface in the context of the entire juxtamembrane to establish whether, and in what way, this conserved motif contributes to MET function. We have changed the language of the text to reflect how these differences contribute to our proposed model, rather than any unintended assertion on direct regulatory effects.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggested major points to address:

      (1) Although the authors show that several key functional residues in the kinase domain are highly sensitive to mutation, it would be nice if the authors further established a clear connection between kinase activity and enrichment in the Ba/F3 assay. Specifically, it is unclear to what extent there is a correlation between the extent of enrichment/depletion and kinase activity - is a larger activity score necessarily indicative of higher kinase activity? This is partly validated by the P1153L mutation autophosphorylation western blots in Figure 4B, but this correlation is somewhat undermined by the data in 5F. Autophosphorylation data (or phosphorylation data on a direct downstream substrate) for a few mutants would really solidify what the activity score is truly reporting. This might also clarify the extent to which the difference between the two screens can be interpreted, and the extent to which gain-of-function can be interpreted.

      The Ba/F3 assay was carefully chosen for its addiction to exogenous IL-3, which serves as a permissive signaling switch. Any mutation that prevents TPR-MET/ΔEx14 from properly functioning is therefore dampening its signaling ability. Nevertheless, it is possible that some mutations with high scores are truly improving activity and others are sustaining activity through more stable interactions than the wild type kinase domain or with downstream signaling partners, which would require careful biochemical dissection outside the scope of this study. To address these points, we now refer to the mutation score simply as “score” rather than “activity score” and further discuss these caveats in text.

      (2) Overall, the exon 14-skipped dataset is under-discussed in the paper. The comparison of the two datasets is where most deep insights are likely to be found, and so a more thorough analysis/discussion of this dataset would really elevate the significance of the paper. For example, there appear to be a very large number of mutations that have divergent effects in the two screens (everything along the dashed lines in Figure 5D), but it's unclear where most of these mutations lie on the structure. It would be helpful if the residues with divergent mutational effects between the two screens (Supplementary Figure 5E) were mapped onto a structure of the JM-KD construct.

      To address this concern, new analysis has been added to the supplement, showing the score differences between MET and METΔEx14 mutations as a heatmap (Supplemental Data Figure 7A). Within this analysis we further applied our statistical filtering methods and structurally mapped positions with the greatest differential scores to show where divergent effects cluster (Supplemental Data Figure 7D). Consistent with our previous reports, the ⍺JM-helix and ⍺C-helix show the largest cluster of divergent effects, in addition to sites such as the ⍺G and APE motif. Further discussion of these points has been added to the text.

      (3) Based on the observations that αJM-αC interactions seem to be less strictly required in the exon 14 mutant, the hypothesis that exon 14 skipping merely removes a Cbl docking site seems largely unsatisfactory. There seems to be more direct structural alterations that could explain this change, but these are not really discussed or speculated on. Related to this, while L1062 mutations are more tolerated, as the authors showed in both the mutational heatmap and the cellular experiments, its binding counterpart L1125 still seems to be somewhat immutable based on the heatmaps. So, more hypothesis/exploration of how exon 14 skipping affects MET KD structure would be a nice addition to the paper.

      We agree that loss of the Cbl docking site is an insufficient model to capture the full nature of JM regulation and exon 14 skipping effects, which was a major incentive for this study. The outstanding ⍺JM-⍺C-helix sensitivity also excites us because it points to a region of the JM that is potentially involved in kinase activity through ⍺C-helix interactions, much like the CDK models and other RTK-JM interactions. We observed that the ⍺JM-helix and ⍺C-helix retain contact, and propose that they move in unison between active and inactive conformations. However, it is possible that a more complicated mechanism might also exist, where there is a larger degree of maintenance of these contacts in a homodimer. For instance, in Figure 3G, if you compare the ⍺JM-helix conformations, in both RON and AXL there is more distance and a pivot away from the ⍺C-helix. It is possible that there are shared mechanisms between the MET and TAM families that could further elucidate exactly how these ⍺JM-helices interact with the kinase domain during the activity transitions and what biophysical role JM truncations play.

      (4) The discussion about mutations S1122Q and L1062D is a bit confusing and incomplete. From the DMS data, it appears that L1062D should be mildly gain-of-function for the exon 14 deletion variant and very loss of function for wild-type MET. In the validation HeLa cell experiments L1062D is loss-of-function in both contexts, but a mention of this discrepancy is omitted. Then, when the discordance between DMS and HeLa cell experiments is observed again for S1122Q, it is explicitly called out for activation-loop phosphorylation, but then there is no mention of the fact that HGF stimulation leads to greater pERK levels for S1122Q in the exon 14 deletion context (the opposite of the DMS result). The Erk phosphorylation discrepancy should be mentioned. It is entirely reasonable, as the authors suggest, that there are differences between full-length MET and the TPR fusions, but the enhanced Erk phosphorylation by the S1122Q mutation is surprising (and intriguing!). This section could use some re-analysis/re-writing and further discussion.

      Thank you for this comment. As noted, L1062D shows slight GOF in METΔEx14 but LOF in MET. The blots show expression of L1062D and S1122Q in the full length receptor in the absence and presence of HGF stimulation. L1062D is loss of function for both contexts only in -HGF conditions, but shows expression in phosphorylated METΔEx14, but not MET. For S1122Q, indeed there is a stronger pERK signal in the METΔEx14, which highlights how probing all regions of phosphorylation (A-loop and C-tail) and many MET-associated pathways (ERK, AKT) may be important to understand in what way these mutations affect MET phosphorylation and proliferation. We have included this point in the text.

      (5) Related to the previous point, one other thing to consider here is that perhaps gain-of-function mutations are simply not detectable in this particular DMS assay. The authors state that GOF and LOF are defined as 2 standard deviations from the mean of the WT-synonymous distribution. How many mutations are actually designated to be GOF based on this criterion? Are those GOF mutations as reproducible as the LOF mutations? It would be worthwhile to separately analyze the variance in activity scores for every loss-of-function mutation and gain-of-function mutation. It seems likely that loss-of-function scores are a lot more reproducible than gain-of-function ones, suggesting that the most apparent gain-of-function signal is just noise in the assay. The few outliers to this point (true gain-of-function mutations) may be some of the ones discussed in Figure 6. If this is true, it would lend confidence to the claims associated with Figure 6.

      In analyzing and classifying both GOF and LOF mutations, error was a main filtering parameter. Each fitness score, calculated by Enrich2, is representative of the slope across time points and biological replicates for the read frequency of the mutation. The associated standard error (SE) reflects the variance for each mutation within the scoring framework (Rubin et al., 2017). Mutations were then further filtered based on low propagated error, calculated by comparing the standard error (SE) of each missense mutation to the SE of the respective wild type synonymous mutation. Therefore, mutations were only classified as GOF or LOF if there was low error, in addition to the other score filters previously described. We have plotted the classified GOF mutations with their respective SE in the newly incorporated Supplemental Data Figure 8C.
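
      To make the idea of a score as "the slope across time points" concrete, here is a deliberately simplified sketch. It fits an unweighted least-squares line to the log ratio of variant to wild type counts over time for a single replicate and reports the slope with its standard error. Enrich2 itself uses a more elaborate scheme (Rubin et al., 2017), so this should be read as an illustration of the concept rather than a reimplementation; the counts, the time points, and the 0.5 pseudocount default are assumptions made for the example.

```python
import math


def log_ratio_slope(variant_counts, wt_counts, times, pseudocount=0.5):
    """Return (slope, standard error) of log(variant / wild type) versus time.

    Simplified, unweighted ordinary least squares for one replicate.
    """
    ys = [
        math.log((v + pseudocount) / (w + pseudocount))
        for v, w in zip(variant_counts, wt_counts)
    ]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares around the fitted line.
    ssr = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times, ys))
    # The standard error of the slope needs at least three time points.
    se = math.sqrt(ssr / (n - 2) / sxx) if n > 2 else float("nan")
    return slope, se


# Hypothetical counts: the variant doubles relative to wild type at each step.
slope, se = log_ratio_slope([100, 200, 400], [1000, 1000, 1000], [0, 1, 2],
                            pseudocount=0.0)
```

      A variant whose frequency rises relative to wild type gets a positive slope, and noisy time courses produce a larger standard error, which is the quantity the error filter acts on.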

      (6) In the discussion of panels 6C and 6D, the assertion is that the "clinical, not validated" category has more mutations that are low-fitness outliers than the "clinical, validated" category. From the graphs, it's actually hard to tell if this is the case for two reasons: (1) the way the graphs are normalized, (to the largest value in each histogram), you cannot compare bar heights (and thus number of mutations) between two histograms on the same graph. (2) Just looking at the shapes of the distributions, or considering maybe the mean or median values, it's unclear whether the "validated" and "not validated" populations are actually different from one another.

      This is an important indication, and we have added analysis showing the distribution and number of clinically-associated mutations within our libraries without normalization in the main text and in Supplemental Data Figure 8A-B.

      (7) This sentence in the last results section is somewhat unclear: "GOF resistance mutations may indicate an effect on the equilibrium of kinase activation, whereas LOF resistance mutations likely affect inhibitor-protein interactions directly." The first part makes sense, but it is not totally obvious how one can infer anything about inhibitor-protein interactions from mutations that are LOF with respect to kinase activity. Related to this, how are LOF mutations selected in the presence of an inhibitor? Is the assumption here that the mutation might totally abrogate inhibitor binding but only slightly impair the kinase? Perhaps this could be explained a bit more.

      Here, the idea we wanted to get across is that there are two models that can explain how a mutation can contribute to resistance: shift the activity equilibrium at baseline or directly impair drug effects and restore baseline activity. Mutations that are labeled resistant and GOF favor the first model. Mutations that are labeled resistant and LOF favor the second model. In the presence of an inhibitor, which is outside the scope of this study, LOF mutations would be sensitive to the inhibitor (i.e., WT-like and sensitive).

      (8) Some additional details of the library preparation and sequencing should be given in the methods section. It appears that the variable region of the library is roughly 275 amino acid residues long, which means >800 bases. How was this sequenced? From the methods, it sounds like all of the variants were pooled into a single library, but then sequencing was done using a 300x300 paired-end Illumina kit, which would not cover the length of the whole variable region. Was the library actually screened in segments as sub-libraries and then separately sequenced? Alternatively, was the whole library screened at once, and then different segments were amplified out for sequencing? If the latter approach is used, this could yield confounding results for counting wild-type variants that have the parent wild-type coding sequence. For example, if you amplify your kinase library in three segments after a single selection on the whole library, and you sequence those three segments separately, you might find a read that appears as wild-type in the part you amplified/sequenced but has a mutation in a region that you did not sequence. If this approach is taken, the counts for the wild-type sequence would be inaccurate, in which case, how is the data normalized with WT as a reference? Regardless of the method used, some more details should be provided in the methods section.

      In this study, we used the Nextera XT DNA Library Preparation Kit (Illumina), which uses a tagmentation approach that randomly fragments our 861 bp amplicon into ~300 bp fragments with a transposase, resulting in a Poisson distribution of fragment sizes. This allows for direct sequencing of all amplicons and libraries with an SP300 paired-end run, which we ran on two lanes of a NovaSeq6000. Samples are demultiplexed and processed by our analysis pipeline with a lookup table that associates the unique dual index to the specific sample (library, time point, biological replicate, IL-3 condition).
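
      A lookup table of the kind described above might look like the following sketch. The index sequences, time point numbering, and field names are hypothetical and chosen only for illustration; the actual table used in the study is not given in the response.

```python
from typing import NamedTuple, Optional


class Sample(NamedTuple):
    library: str       # "TPR-MET" or "TPR-METΔEx14"
    time_point: int    # hypothetical time point index
    replicate: int     # biological replicate number
    il3: bool          # True for the IL-3 control, False for IL-3 withdrawal


# Hypothetical unique dual indices (i7, i5) mapped to sample metadata.
LOOKUP = {
    ("ATCACG", "TTAGGC"): Sample("TPR-MET", 0, 1, False),
    ("CGATGT", "TGACCA"): Sample("TPR-MET", 0, 1, True),
    ("ACAGTG", "GCCAAT"): Sample("TPR-METΔEx14", 0, 1, False),
}


def assign_sample(i7: str, i5: str) -> Optional[Sample]:
    """Return the sample for an index pair, or None if the pair is unknown."""
    return LOOKUP.get((i7, i5))
```

      Keying on the pair rather than on either index alone means a read carrying an unexpected index combination is left unassigned instead of being credited to the wrong sample.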

      The TPR-MET and TPR-METΔEx14 libraries were prepared in parallel throughout the entire experiment, from cloning to virus generation to transductions, screening, cell harvesting, sequencing prep, and sequencing. In other words, the TPR-MET and TPR-METΔEx14 were transduced into their own, respective batch of cells for each biological replicate, then selected and screened on the same day for each replicate and time point. Each library and condition (time point, biological replicate, IL-3 condition) was prepared in parallel but still an independent sample. At the stage of tagmentation, each sample was arrayed, where each well corresponds to a library, biological replicate, and time point. At the stage of sequencing, samples across the two libraries were normalized to 10mM (library, biological replicate, time point, IL-3 condition) then pooled together and all run on two lanes of the same NovaSeq6000 flow cell.

      PCR and sequencing bias was one of the most important parameters for us, which is why we performed tagmentation in parallel and sequenced everything on the same run. We have added extra details to the methods and hope that we have clarified your questions on this matter.

      Suggested minor points to address:

      (1) TPR (as in TPR-MET fusion) is not defined in the text when it is first mentioned. And it wasn't immediately clear that this is not a membrane-associated domain (Figure 5E makes this way more obvious than Figure 1B does). Perhaps this could be made more explicit in the text or in Figure 1.

      We have incorporated a new schematic in Figure 1B to better illustrate the TPR-fusion constructs used within this study. The usage of the TPR-fusion is first mentioned in the introduction, paragraph 4, and we have revised the main text to delineate the usage of the TPR-fusion more clearly.

      (2) In Figure 2G, it would be helpful if the wild-type amino acid residue was listed underneath the position number in the two graphs (even though those residues are also highlighted in 2H).

      Thank you for this recommendation, we have added the wild type amino acid next to the position number in the x-axis label.

      (3) For Supplementary Data Figure 2, is it possible to calculate conservation scores at each position using some kind of evolutionary model, rather than relying on visual inspection of the sequence logo? Can one quantitatively assert that the C-spine is less conserved than the R-spine overall, or can this only be said for certain positions? Related to this, in comparing Figure 2G to Supplementary Data Figure 2, it is interesting that there isn't any obvious correspondence between mutational tolerance and conservation within the C-spine. For example, 1165 seems to be the most conserved position in the C-spine, but several substitutions are tolerated at this position, just like 1210, which is one of the least conserved positions in the C-spine. Finally, it's very likely that positions 1165, 1210, 1272, and 1276 co-vary, given that they all pack into the same hydrophobic cluster. This might be why they appear less conserved. These last few points might be worth discussing briefly if the authors want to relate mutational tolerance to evolutionary conservation.

      Thank you for this recommendation. To better quantitatively determine C-spine versus R-spine conservation, we performed a multiple sequence alignment of all RTK kinase domain sequences to properly identify corresponding R- and C-spine locations, as previously done in generating the spine logos, then used the bio3D structural bioinformatics package in R to calculate the conservation score of each residue position by amino acid “similarity” with a BLOSUM62 matrix (Supplemental Data Figure 2B). In concordance with the logos, we find that C-spine positions 1092, 1108, and 1165 have the highest conservation scores, even compared to some R-spine mutations. We also see across the alignment that, indeed, C-spine positions 1165, 1210, 1211, 1212, 1272, and 1276 co-vary within RTK families. We have revised the text to reflect these points, and more specifically discuss position-level conservation rather than generalizing conservation for the C- and R-spines.

      (4) On Page 7 of the merged document, there appear to be some figure labeling errors. In the first and second paragraphs of the "Critical contacts between..." section, Figure 3B is referenced multiple times as a structural alignment/ensemble, but this is a heatmap.

      Thank you for catching this! The correct figure panels are now referenced.

      (5) In the text describing Figure 3A, it is stated that the structures were aligned to the N-lobe, but the figure legend says that all structures were aligned to alpha-C and alpha-JM.

      Thank you - a local alignment to the ⍺JM-helix and ⍺C-helix is correct, the idea here being that if the ⍺JM-helix and ⍺C-helix are linked to an active/inactive conformation like in the case of the insulin receptor, these two clusters could be revealed through the structural ensemble. However, we discovered this was not the case. This, combined with the DMS sensitivity to mutations at the packing interface, leads us to believe that the MET JM has a distinctive regulatory mechanism that relies on this ⍺C-helix interface. We have made this correction to the text.

      (6) It would be helpful if the alpha-C and alpha-JM helices in Figure 3D were labeled on the MET structures.

      The ⍺C-helix and ⍺JM-helix are now labeled in Figure 3D.

      (7) It appears that Figure 4E is never explicitly referenced in the text.

      Thank you, Figure 4E is now appropriately referenced in the text.

      (8) Throughout the Figure 6 legend, for the histograms, it is stated that "Counts are normalized to the total mutations in each screen dataset." This might not be the correct description of normalization, as this would mean that the sum of all of the bins should equal 1. Rather, the normalization appears to be to the bin with the largest number of mutants in it, which is given a value of 1. This difference is really critical to how one visually inspects the overlaid histograms.

      Thank you for this comment. Here, the intention was to aid in the visualization of the distribution of cancer-associated and resistance-associated mutations, which is a much smaller population compared to the whole library and becomes easily masked. We originally applied a “stat(ncount)” function in R, which as noted scales the data and sets the peak to 1, which only applied to the clinical and cancer-associated mutations plotted. Now, to better compare distributions, normalization has been removed, instead opting to overlay the distributions of all missense mutations and the subset of clinical mutations directly with their own y-axis scale. This modification has been made throughout Figure 6 panels, hopefully improving interpretability.
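
      The distinction the reviewer draws between the two normalizations can be shown with a small sketch. The scores and bin edges below are hypothetical; the point is only that scaling to the tallest bin always produces a peak of 1, whereas dividing by the total makes the bins sum to 1, so the two give different pictures when histograms of very different sizes are overlaid.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin; bins are [lo, hi), with the last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # ignore values outside the binned range
        idx = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[idx] += 1
    return counts


def peak_scaled(counts):
    # Tallest bin is set to 1 (the behaviour described for "ncount").
    peak = max(counts)
    return [c / peak for c in counts]


def fraction_of_total(counts):
    # Bins sum to 1 (what "normalized to the total" would imply).
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical scores for a small subset of mutations.
scores = [0.1, 0.2, 1.5, 1.6, 1.7, 1.8, 2.5, 2.9]
counts = histogram(scores, [0, 1, 2, 3])
scaled = peak_scaled(counts)
fractions = fraction_of_total(counts)
```

      Under peak scaling, a subset of eight mutations and a library of thousands would both reach a height of 1, which is why bar heights cannot be compared across the overlaid histograms.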

      Reviewer #2 (Recommendations For The Authors):

      A few thoughts/suggestions:

      (1) Regarding kinase regulation, the "closing of N- and C-lobe" upon activation is an often mentioned component of activation, and I'm sure is true in many cases, but it is not a general feature of kinase activation.

      The text has been updated - we removed the description of N- and C-lobe closure. 

      (2) With respect to the inactive state of MET, the DFG-flipped structure discussed here is almost certainly an inhibitor-induced conformation. Again, DFG-flip is often discussed as a mechanism of kinase regulation, and while in some kinases this might be the case, more often it is a drug-induced or drug-stabilized inactive conformation. The SRC/CDK-like inactive conformation in 2G15 is more likely a physiologically relevant inactive state (or even better, the ATP-bound inactive state structure 3DKC, which exhibits a somewhat different SRC/CDK-like inactive conformation).

      The PDB 3R7O structure was chosen as the main representation because it was the clearest representation of a wild type structure with an aligned R- and C- spine, solvent-exposed, phosphorylated activation loop. Although 3DKC is bound to ATP, this structure is still in an inactive conformation and has stabilizing mutations (Y1234/F, Y1235D) and an atypical alpha helix structure in the activation loop. However, we agree the SRC/CDK-like inactive conformation is an important representation and we have incorporated our structural mapping on 2G15 in the new supplemental figures with further details on statistical analysis and comparison of libraries.

      (3) Following the comments above, I would describe the process of activation in a simpler way (in any case, it is peripheral to the work described here). Something along the lines of "phosphorylation on tyrosines XX and XX induces rearrangement of the activation segment and promotes and stabilizes the inward active position of the C-helix." Can go on to mention that this forms the E1127/K1110 salt bridge. (The DFG is already "in" in the SRC/CDK-like inactive state).

      We have changed the language to more simply describe activation. Thank you!

      (4) Would be great to see DMS with the intact receptor done in a way that could identify mutations that lead to activation in a ligand-independent manner. (but obviously beyond the scope of this paper).

      Agreed! This would be an excellent follow up for the future, especially to elucidate juxtamembrane regulation, as the membrane context is likely required.

      A typo or two:

      Boarded instead of bordered/outlined in legend to Fig. 1.

      P11553L in the 2nd line of the 2nd paragraph in that section.

      Thank you, we have addressed these typos!

    1. Author response:

      eLife assessment

      This valuable study uses single-cell transcriptomics to explore the mouse vomeronasal organ and represents an advance that enhances our understanding of neural diversity within this sensory system. Findings suggest a unique endoplasmic reticulum (ER) structure in Gnao1 neurons and allow for the synthesis of a developmental trajectory from stem cells to mature vomeronasal sensory neurons. Convincing methods, data, and analyses broadly support the claims, although experiments supporting the main ER-related claim are incomplete and lack quantification of co-expression and statistics on labeling intensity or coverage. Adding these data would greatly strengthen the conclusions of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Devakinandan and colleagues present a manuscript analyzing single-cell RNA-sequencing data from the mouse vomeronasal organ. The main advances in this manuscript are to identify and verify the differential expression of genes that distinguish apical and basal vomeronasal neurons. The authors also identify the enriched expression of ER-related genes in Gnao1 neurons, which they verify with in situ hybridizations and immunostaining, and also explore via electron microscopy. Finally, the results of this manuscript are presented in an online R shiny app. Overall, these data are a useful resource to the community. I have a few concerns about the manuscript, which I've listed below.

      General Concerns:

      (1) The authors mention that they were unable to identify the cells in cluster 13. This cluster looks similar to the "secretory VSN" subtype described in a recent preprint from C. Ron Yu's lab (10.1101/2024.02.22.581574). The authors could try comparing or integrating their data with this dataset (or that in Katreddi et al. 2022) to see if this is a common cell type across datasets (or arises from a specific type of cell doublets). In situ hybridizations for some of the marker genes for this cluster could also highlight where in the VNO these cells reside.

      Cluster 13 (Obp2a+) cells identified in our study share gene expression markers with the “putative secretory” cells described in the Hills et al. manuscript. By the time that manuscript became publicly available, our own had already been finalized and communicated. We welcome the suggestion to integrate the data, which we will attempt and address in our revision.

      (2) I found the UMAPs for the neurons somewhat difficult to interpret. Unlike Katreddi et al. 2022 or Hills et al. 2024, it's tricky to follow the developmental trajectories of the cells in the UMAP space. Perhaps the authors could try re-embedding the data using gene sets that don't include the receptors? It would also be interesting to see if the neuron clusters still cluster by receptor-type even when the receptors are excluded from the gene sets used for clustering. Plots relating the original clusters to the neuronal clusters, or dot plots showing marker gene expression for the neuronal clusters might both be useful. For example, right now it's difficult to interpret clusters like n8-13.

      We will redraw the UMAPs to make the developmental trajectory clearer. How neuron clusters are affected by the inclusion or exclusion of receptors is an interesting question that we will address in our revision, along with showing markers of each neuronal cluster, as suggested by the reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The study focuses on the vomeronasal organ, the peripheral chemosensory organ of the accessory olfactory system, by employing single-cell transcriptomics. The author analyzed the mouse vomeronasal organ, identifying diverse cell types through their unique gene expression patterns. Developmental gene expression analysis revealed that two classes of sensory neurons diverge in their maturation from common progenitors, marked by specific transient and persistent transcription factors. A comparative study between major neuronal subtypes, which differ in their G-protein sensory receptor families and G-protein subunits (Gnai2 and Gnao1, respectively), highlighted a higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons. Moreover, distinct differences in ER content and ultrastructure suggest some intriguing roles of ER in Gnao1-positive vomeronasal neurons. This work is likely to provide useful data for the community and is conceptually novel with the unique role of ER in a subset of vomeronasal neurons. This reviewer has some minor concerns and some suggestions to improve the manuscript.

      Strengths:

      (1) The study identified diverse cell types based on unique gene expression patterns, using single-cell transcriptomics.

      (2) The analysis suggests that two classes of sensory neurons diverge during maturation from common progenitors, characterized by specific transient and persistent transcription factors.

      (3) A comparative study highlighted differences in Gnai2- and Gnao1-positive sensory neurons.

      (4) Higher expression of endoplasmic reticulum (ER) associated genes in Gnao1 neurons.

      (5) Distinct differences in ER content and ultrastructure suggest unique roles of ER in Gnao1-positive vomeronasal neurons.

      (6) The research provides conceptually novel insight into the unique role of ER in a subset of vomeronasal neurons, offering valuable insights to the community.

      Weaknesses:

      (1) The connection between observations from sc RNA-seq and EM is unclear.

      (2) The lack of quantification for the ER phenotype is a concern.

      We would like to point out that the connection between scRNA-seq and EM was made in our experiments that investigated the localization of ER proteins via IHC (in Figure 5). The intriguing observation that the levels of a number of ER luminal and membrane proteins were higher in Gnao1 compared to Gnai2 neurons, led us to hypothesize a differential ER content or ultrastructure, which was verified by EM. The quantification of ER phenotype would definitely strengthen our observations, which we will add in our revised manuscript.       

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Devakinandan and colleagues have undertaken a thorough characterization of the cell types of the mouse vomeronasal organ, focusing on the vomeronasal sensory neurons (VSNs). VSNs are known to arise from a common pool of progenitors that differentiate into two distinct populations characterized by the expression of either the G protein subunit Gnao1 or Gnai2. Using single-cell RNA sequencing followed by unsupervised clustering of the transcriptome data, the authors identified three Gnai2+ VSN subtypes and a single Gnao1+ VSN type. To study VSN developmental trajectories, Devakinandan and colleagues took advantage of the constant renewal of the neuronal VSN pool, which allowed them to harvest all maturation states. All neurons were re-clustered and a pseudotime analysis was performed. The analysis revealed the emergence of two pools of Gap43+ clusters from a common lineage, which differentiate into many subclusters of mature Gnao1+ and Gnai2+ VSNs. By comparing the transcriptomes of these two pools of immature VSNs, the authors identified a number of differentially expressed transcription factors in addition to known markers. Next, by comparing the transcriptomes of mature Gnao1+ and Gnai2+ VSNs, the authors report the enrichment of ER-related genes in Gnao1+ VSNs. Using electron microscopy, they found that this enrichment was associated with specific ER morphology in Gnao1+ neurons. Finally, the authors characterized chemosensory receptor expression and co-expression (as well as H2-Mv proteins) in mature VSNs, which recapitulated known patterns.

      Strengths:

      The data presented here provide new and interesting perspectives on the distinguishing features between Gnao1+ and Gnai2+ VSNs. These features include newly identified markers, such as transcription factors, as well as an unsuspected ER-related peculiarity in Gnao1+ neurons, consisting of a hypertrophic ER and an enrichment in ER-related genes. In addition, the authors provide a comprehensive picture of specific co-expression patterns of V2R chemoreceptors and H2-Mv genes.

      Importantly, the authors provide a browser (scVNOexplorer) for anyone to explore the data, including gene expression and co-expression, number and proportion of cells, with a variety of graphical tools (violin plots, feature plots, dot plots, ...).

      Weaknesses:

      The study still requires refined analyses of the data and rigorous quantification to support the main claims.

      The method description for filtering and clustering single-cell RNA-sequencing data is incomplete. The Seurat package has many available pipelines for single-cell RNA-seq analysis, with a significant impact on the output data. How did the authors pre-process and normalize the data? Was the pipeline used with default settings? What batch correction method was applied to the data to mitigate possible sampling or technical effects? Moreover, the authors do not describe how cell and gene filtering was performed.

      The data in Figure 7-Supplement 3 show that one-sixth of the V1Rs do not express any chemoreceptor, while over a hundred cells express more than one chemoreceptor. Do these cells have unusually high or low numbers of genes or counts? To exclude the possibility of a technical artifact in these observations, the authors should describe how they dealt with putative doublet cells or debris.

      Surprisingly, some clusters are characterized by the expression of specific chemoreceptors (VRs). Have these been used for clustering? If so, clustering should be repeated after excluding these receptors.

      The identification of the VSN types should be consistent across the different analyses and validated. The data presented in Figure 1 lists four mature VSN types, whereas the re-clustering of neurons presented in Figure 3 leads to a different subdivision. At present, it remains unclear whether these clusters reflect the biology of the system or are due to over-clustering of the data, and therefore correspond to either noise or arbitrary splitting of continua. Clusters should be merged if they do not correspond to discrete categories of cells, and correspondence should be established between the different clustering analyses. To validate the detected clusters as cell types, markers characteristic of each of these populations can be evaluated by ISH or IHC.

      There is a lack of quantification of imaging data, which provides little support for the ER-related main claim. Quantification of co-expression and statistics on labeling intensity or coverage would greatly strengthen the conclusions and the title of the paper.

      scRNA-seq data analysis methods: We agree with the reviewer and will elaborate on the various criteria, parameters and methods in our revision. As described above, our revised manuscript will include analysis of how inclusion / exclusion of VRs affects cell clusters, as well as quantification of the ER phenotype. We will address the reviewer’s concern of over-clustering.

      We think that the cells expressing zero as well as two V1Rs are real and cannot be attributed to debris or doublets for the following reasons:

      a) Cells expressing no V1Rs are not necessarily debris because they express other neuronal markers at the same level as cells that express one or two V1Rs. The higher expression threshold values used in our analysis may have somewhat increased the proportion of cells with zero V1Rs. We will modify Figure 7-supplement 3c to add another group showing the Gnai2 level in cells expressing zero V1Rs.

      b) Cells co-expressing V1R genes: We listed the frequency of cells co-expressing V1R gene combinations in Supplementary table - 8. Among 134 cells that express two V1Rs, 44 cells express Vmn1r85+Vmn1r86, 21 express Vmn1r184+Vmn1r185, 13 express Vmn1r56+Vmn1r57, 6 express Vmn1r168+Vmn1r177, and so on. Doublets generally are a random combination of two cells. Here, each specific co-expression combination represents multiple cells and is highly unlikely to arise by random chance. Some of the co-expression combinations were identified earlier and verified experimentally in Lee et al., 2019 and Hills et al. Furthermore, Figure 7-supplement 3c shows that the level of Gnai2 expression is comparable across cells expressing one or two V1Rs. If the cells expressing two V1Rs were doublets, we would expect their level of Gnai2 to be higher than in cells expressing a single V1R. We will elaborate on this in our revised manuscript.
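
      As a rough illustration of the doublet argument in point (b), the sketch below compares the observed concentration of co-expressing cells in a few specific V1R pairs with what a uniform random pairing of receptors would give. It is a back-of-the-envelope sketch, not the analysis performed in the study. The pair counts and the total of 134 cells come from the response above; the number of distinct V1R genes (`n_receptors_assumed`) and the uniform-pairing model are assumptions made purely for illustration, and the function names are hypothetical.

```python
from math import comb

# Counts of cells co-expressing specific V1R pairs, as quoted in the response.
pair_counts = {
    ("Vmn1r85", "Vmn1r86"): 44,
    ("Vmn1r184", "Vmn1r185"): 21,
    ("Vmn1r56", "Vmn1r57"): 13,
    ("Vmn1r168", "Vmn1r177"): 6,
}
total_dual_cells = 134  # cells expressing two V1Rs, as quoted in the response

# ASSUMPTION (illustrative only): number of distinct V1R genes detected.
# The real value should be taken from the dataset itself.
n_receptors_assumed = 100


def top_pair_share(counts, total):
    """Fraction of dual-V1R cells accounted for by the listed pairs."""
    return sum(counts.values()) / total


def expected_cells_per_pair(total, n_receptors):
    """Expected cells per specific pair if pairs formed uniformly at random.

    ASSUMPTION: every unordered receptor pair is equally likely. Real doublets
    would be weighted by receptor abundance, so this is only a crude baseline.
    """
    return total / comb(n_receptors, 2)


if __name__ == "__main__":
    share = top_pair_share(pair_counts, total_dual_cells)
    baseline = expected_cells_per_pair(total_dual_cells, n_receptors_assumed)
    print(f"Share of dual-V1R cells in the four listed pairs: {share:.2f}")
    print(f"Expected cells per pair under uniform pairing: {baseline:.3f}")
```

      With the counts quoted above, the four listed pairs account for 84 of the 134 dual-V1R cells. Under the uniform baseline, any single pair would be expected in far fewer than one cell for any plausibly sized receptor repertoire. A more careful version would weight each pair by the observed frequency of its two receptors among single-V1R cells.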

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The manuscript by Sejour et al. tests the "translational ramp" model described previously by Tuller et al. in S. cerevisiae. The authors use bioinformatics and reporter-based experimental approaches to test whether "rare codons" in the first 40 codons of gene coding sequences increase translation efficiency and regulate the abundance of translation products in yeast cells. The authors conclude, using a new set of reporters and bioinformatics analyses, that the "translation ramp" model lacks support. The strength of bioinformatic evidence and experimental analyses (even very limited) of the rare codons insertion in the reporter make a compelling case for the authors claims. However, the major weakness of the manuscript is that the authors do not take into account other models that previously disputed the "rare or slow codon" model of Tuller et al. and overstate their own results, which are rather limited. This remains the weak part of the manuscript even in the revised form.

      We are glad the reviewer thinks our evidence makes “a compelling case for the authors claims”. This was our main aim, and we are satisfied with this.

      The reviewer believes the major weakness of the manuscript is that we do not take into account other models and do not (see below) cite numerous other relevant papers. The reviewer made essentially the same criticism at the first review, at which time we looked quite hard for papers generally meeting the reviewer’s description. We found a few, which we incorporated here. Still, we did not find the body of evidence whose existence the reviewer implies. We are citing every study we know to be relevant, though of course we will have inadvertently missed some, given the huge body of literature. After the first round of review, we wrote “the reviewer did not give specific references, and, though we looked, we weren’t always sure which papers the reviewer had in mind.” We hoped the reviewer would provide citations. But only two citations are provided here, both to A. Kochetov, and these don’t seem central to the reviewer’s points.

      The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data. Moreover several studies have used bioinformatical analyses to point out the evolution of N-terminal sequences in multiple model organisms including yeast, focusing on either upstream ORFs (uORFs) or already annotated ORFs. The authors did not mention multiple of these studies in their revised manuscript and did not comment on their own results in the context of these previous studies.

      Mostly, we do not know to what papers the reviewer is referring. This may be our failing, but it would have helped if the reviewer had cited one of them. There are papers discussing the evolution of N-terminal sequences, but as far as we know, these do not discuss translation speed or codon usage. Of course, we may have missed some papers.

      As such the authors approach to data presentation, writing and data discussion makes the manuscript rather biased, focused on criticizing Tuller et al. study and short on discussing multiple other possible reasons for slow translation elongation at the beginning of the protein synthesis. This all together makes the manuscript at the end very limited.

      We think the reviewer may be considering our paper as being generally about translation speeds, whereas in our minds, it is not. This difference in views as to what the paper is “about” is perhaps causing friction. To us, it is indeed a limited paper. We are narrowly focused on the finding of Tuller that there is an enrichment of rare, slow codons at the 5’ end of genes, and we have sought an explanation of this particular fact. This is not a paper about rates of translation generally—it is a limited paper about the reason for the 5’ enrichment of rare, slow codons.

      To expand on this, the encoded slow 5’ translation due to rare, slow codons (of Tuller et al.) is a small effect (1% to 3%). The possible unencoded slow 5’ translation of unknown mechanism discussed by some other papers (e.g., Weinberg et al. 2016, Shah et al. 2013) is a much larger effect (50% or more). Just from the different magnitudes, it seems likely these are different phenomena. And yet, despite the small size of the encoded effect, it is for some reason this paper by Tuller et al. that has captured the attention of the literature: as we point out below, Tuller et al. has been cited over 900 times. Partly because of the wide and continuing influence of this paper, it is worth specifically and narrowly addressing its findings.

      Reviewer #2 (Public Review):

      Tuller et al. first made the curious observation, that the first ∼30-50 codons in most organisms are encoded by scarce tRNAs and appear to be translated slower than the rest of the coding sequences (CDS). They speculated that this has evolved to pace ribosomes on CDS and prevent ribosome collisions during elongation - the "Ramp" hypothesis. Various aspects of this hypothesis, both factual and in terms of interpreting the results, have been challenged ever since. Sejour et al. present compelling results confirming the slower translation of the first ~40 codons in S. cerevisiae but providing an alternative explanation for this phenomenon. Specifically, they show that the higher amino acid sequence divergence of N-terminal ends of proteins and accompanying lower purifying selection (perhaps the result of de novo evolution) is sufficient to explain the prevalence of rare slow codons in these regions. These results are an important contribution in understanding how aspects of the evolution of protein coding regions can affect translation efficiency on these sequences and directly challenge the "Ramp" hypothesis proposed by Tuller et al.

      I believe the data is presented clearly and the results generally justify the conclusions.

      We thank the reviewer for his/her attention to the manuscript, and for his/her comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review major weakness of the manuscript is the lack of analyses for confounding effects, overstatements of the results (using single amino acid sequence reporter) and the lack of discussion of previous work that argues against Tuller et al model. In my previous review I mentioned multiple other studies that addressed "slow codons" model in more detail.

      No, the reviewer did not cite any specific studies.

      While some of these studies are mentioned in the revised manuscript, authors are still rather biased and selective in their discussions. I should also point out that previous studies, that authors fail again to mention, were focused on either translation initiation, initiation to elongation transition or early elongation effects in relation to mRNA sequence, structure, codons as well as amino acid sequence. Also additional studies with bioinformatic analyses of N-terminal conservation and existence of start sites at the beginning of the protein sequences in multiple model organisms were also omitted.

      Again, we do not know to what papers the reviewer is referring. But this sounds like a lot. Our paper is aimed at a specific, narrow topic: Why is there an excess of rare, slow codons in the 5’ region of genes? We are not trying to make general statements about all things affecting and affected by translation speed, we are just trying to explain the excess of rare, slow codons.

      In general manuscript seems to be too much focused-on discussion of Tuller's paper . . .

      Yes, we are focused on the Tuller findings, the excess of rare slow codons in 5’ regions.

      . . . and arguing with the model that was already shown by multiple other studies to be limited and not correct.

      We find it unsatisfactory that the reviewer states in a public review that there are multiple other studies showing that the Tuller model is not correct, and yet does not cite any of them. Furthermore, for the reviewer to say that Tuller et al. is “not correct” is too sweeping. The core finding of Tuller et al. was the excess of rare, slow codons in the 5’ regions of genes. We confirm this; we believe it is correct; we are not aware of any literature disputing this. Then, Tuller interpreted this as an adaptation to promote translational efficiency. On the interpretation, we disagree with Tuller. But if one is to disagree with this interpretation, one needs an alternative explanation of the fact of the excess rare, slow codons. Providing such an alternative explanation, and doing an experiment to distinguish the explanations, is our contribution. We are not aware of any other paper making our interpretation.

      There are of course many papers that discuss various aspects of translation at the 5’ ends of genes, and we do cite quite a few such papers in our manuscript, though certainly not all. But papers of this general kind do not, and cannot, show that Tuller et al. is “not correct”. As far as we know, no paper provides an alternative explanation for the rare slow codons, and no paper does an experiment to modulate translation speed and look at the effect on gene expression. Notably, the slow translation phenomenon associated with the rare codons found by Tuller et al. is a very small effect—a change of about 1% to 3% of translation speed. Some other papers on translation speed are dealing with possible changes in the range of 50% or more. These are presumably some other phenomenon (if indeed they are even real changes in translation speed), and, whether they are true or not, the results and interpretations of Tuller et al. could still be true or not. Of course, if we knew of some previous paper showing the Tuller paper is not correct, we should and would cite it.

      To expand on the current view of Tuller in the literature, Tuller et al. has been cited 956 times according to Google Scholar. This makes it an extremely influential paper. After finding Tuller et al. in Entrez Pubmed, one can look under “Cited by” and see the five most recent papers that cite Tuller et al. The five papers given on May 23 2024 were Bharti . . . Ignatova 2024; Uddin 2024; Khandia . . . Choudhary 2024; Love and Nair 2024; and Oelschlaeger 2024. We went through these five most recent papers that cite Tuller et al., and asked, did these authors cite the Tuller results as fully correct, or did they mention any doubts about the results? All five of the papers cited the Tuller results as fully correct, with no mention of any kind of doubt. For instance, Khandia et al. 2024 state “The slow “ramp” present at 5’ end of mRNA forms an optimal and robust means to reduce ribosomal traffic jams, thus minimizing the cost of protein expression40.”, while Oelschlaeger (2024) states “Slow translation ramps have also been described elsewhere and proposed to prevent traffic jams along the mRNA [51,52,53].” Although Uddin (2024) cited Tuller as fully correct, Uddin seemed to think (it is a little unclear) that Tuller found an enrichment of highly-used codons, opposite to the actual finding. The multiple contrary studies mentioned by the reviewer do not seem to have been very influential.

      There are papers containing skepticism about the Tuller interpretation, and also papers with results that are difficult to reconcile in a common-sense way with the Tuller interpretation. But skepticism, and a difficulty to reconcile with common sense, are far from a demonstration that a paper is incorrect. Indeed, Tuller et al. may have been published in Cell, and may be so highly cited, exactly because the findings are counter-intuitive, colliding with common sense. Our contribution is to find a common-sense interpretation of the surprising but correct underlying fact of the 5’ enrichment of rare, slow codons.

      Having written that in the previous review, I have to admit that the main text of the Sejour et al. manuscript has a minimal amount of novelty with experimental evidence; the conclusions are based on three reporters with and without a stalling/collision sequence, with the same amino acid sequence and varying codons. Some more novelty is seen in the bioinformatic analyses of multiple yeast sequences and sequence conservation at the N-termini of proteins. However, even this part of the manuscript is not discussed fully and with correct comparison to previous studies. The authors, based on my previous comments, discuss further experimental shortcomings in their new and "expanded" discussion, but the use of a single reporter in this case cannot relate to all differences that may be coming from ORFs seen in the complete yeast transcriptome. There are multiple studies that used more reporters with more than one amino acid and mRNA sequence as well as with similar variation of the rare or common codons. The handwaving argument about the influence of all other mechanisms that can arise from different start sites, RNA structure, peptide interaction with the exit channel, peptidyl-tRNA drop-off, eIF3 complex initiation-elongation association, etc., just points to a manuscript that is more about bashing Tuller's model and old paper than trying to make a concise story about their own results and discuss their study in the context of the plethora of studies that indicated multiple other models for slow early elongation.

      We don’t understand why the reviewer is so grudging.

      Discussion of the ribosome's collisions and potential impact of such scenario in the author's manuscript is left completely without citation, even though such work has relevant results to the author's conclusions and Tuller's model.

      This is not true. We cite Dao Duc and Song (2018) “The impact of ribosomal interference, codon usage, and exit tunnel interactions on translation elongation rate variation.” PLoS Genet 14, and Tesina, . . . and Green (2020) “Molecular mechanism of translational stalling by inhibitory codon combinations and Poly(A) tracts.” EMBO J., which are two excellent papers on this subject. We also cite Gamble et al. (2016), who found the underlying result, but at that time did not attribute it to ribosome collisions.

      Previous studies (not cited) for example clearly indicate how the length from stalling sequence to start codon is related to ribosome collisions. Moreover such studies are pointing out differences in initiation vs elongation rates that may impact ribosome collisions and protein expression. Both of these topics would be very valuable in discussions of evolutionary changes in the current yeast ORFs. Not to mention that authors do not really discuss also possibilities for differences in 5'UTRs and uORFs in relation to downstream ORFs sequence and codon composition.

      It is not clear to us that such papers are highly relevant to the issue on which we are working.

      The argument about whether cycloheximide or not is doing 5' ribosome slowdown (lines 425-443) is just rambling about Weinberg's paper from 2016 without any real conclusion. In this section authors are just throwing down hypothesis that were more clearly explained in Weinberg's manuscript or shown experimentally in studies done after the Weinberg et al. paper was published.

      Earlier, the reviewer had the criticism that “The studies that authors do not mention argue with "translation ramp" model and show more thorough analyses of translation initiation to elongation transition as well as early elongation "slow down" in ribosome profiling data.” The main study we know of dealing with issues like these is that of Weinberg et al. 2016. In our opinion, this is a thoughtful paper on these issues. But now, at this point, the reviewer seems to criticize the fact that we do extensively cite results from Weinberg et al. It is true that there is no ultimate conclusion, but why there is no conclusion is a little bit interesting. Weinberg et al. show that even in studies that do not use cycloheximide as the first step in ribosome profiling, there is some left-over high density of ribosomes near 5’ ends. But all these ribosome profiling experiments do use cycloheximide at a later step in the procedure. Until someone does a ribosome profiling experiment without the use of any cycloheximide at any step, there will be no firm conclusion. This is not our fault, and it is also not the issue we are writing about. And the reason this paragraph is in the manuscript at all is that the reviewer (we thought) had asked for something like this in the first review.

      At the end, even in the limited novelty of evolutionary arguments about non-existing N-terminal conservation of codons or amino acids they fail to cite and discuss previous work by Kochetov (BioEssays, 2008 and NAR, 2011) which have additional explanation on evolution of N-terminal sequences in yeast, human or Drosophila.

      These two papers of Dr. Kochetov’s have some relevance and we now cite them. These are the only papers cited by the reviewer in his/her two reviews.

      Probably the reviewer would have preferred a paper on a different subject.


      The following is the authors’ response to the original reviews.

      Response to Reviewers:

      We thank the reviewers for their comments, and their evident close reading of the manuscript. Generally, we agree with the reviewers on the strengths and weaknesses of our manuscript. Our revised manuscript has a more extensive discussion of alternative explanations for initial high ribosome density as seen by ribosome profiling, and which more specifically points out the limitations of our work.

      As a preface to specific responses to the reviewers, we will say that we could divide observations of slow initial translation into two categories, which we will call “encoded slow codons”, and “increased ribosome density”. With respect to the first category, Tuller et al. documented initial “encoded slow codons”, that is, there is a statistical excess of rare, slowly-translated codons at the 5’ ends of genes. Although the size of this effect is small, statistical significance is extremely high, and the existence of this enrichment is not in any doubt. At first sight, this appears to be a strong indication of a preference for slow initial translation. In our opinion, our main contribution is to show that there is an alternative explanation for this initial enrichment of rare, slow codons—that they are a spandrel, a consequence of sequence plasticity at the 5’ (and 3’) ends of genes. The reviewers seem to generally agree with this, and we are not aware that any other work has provided an explanation for the 5’ enrichment of rare codons.

      The second category of observations pertaining to slow initial translation is “increased ribosome density”. Early ribosome profiling studies used cycloheximide to arrest cell growth, and these studies showed a higher density of ribosomes near the 5’ end of genes than elsewhere. This high initial ribosome density helped motivate the paper of Tuller et al., though their finding of “encoded slow codons” could explain only a very small part of the increased ribosome density. More modern ribosome profiling studies do not use cycloheximide as the first step in arresting translation, and in these studies, the density of ribosomes near the 5’ end of genes is greatly reduced. And yet, there remains, even in the absence of cycloheximide at the first step, a significantly increased density of ribosomes near the 5’ end (e.g., Weinberg et al., 2016). (However, most or all of these studies do use cycloheximide at a later step in the protocol, and the possibility of a cycloheximide artefact is difficult to exclude.) Some of the reviewer’s concerns are that we do not explain the increased 5’ ribosome density seen by ribosome profiling. We agree; but we feel it is not the main point of our manuscript. In revision, we more extensively discuss other work on increased ribosome density, and more explicitly point out the limitations of our manuscript in this regard. We also note, though, that increased ribosome density is not a direct measure of translation speed—it can have other causes.
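
      To make the first category ("encoded slow codons") concrete, the sketch below shows one simple way to compare the fraction of rare codons in the first 40 codons of a coding sequence with the fraction in the remainder. It is only a sketch of the general idea, not the method used by Tuller et al. or in our manuscript. The set `RARE_CODONS_ASSUMED` is an illustrative placeholder rather than a validated list of rare S. cerevisiae codons, and the function names are hypothetical; the 40-codon window follows the text above.

```python
# ASSUMPTION (illustrative only): a placeholder set of "rare" codons.
# A real analysis would derive this from genome-wide codon usage or tRNA data.
RARE_CODONS_ASSUMED = {"CGA", "CGG"}


def split_codons(cds):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_fraction(codons, rare_set):
    """Fraction of codons that belong to rare_set (0.0 for an empty list)."""
    if not codons:
        return 0.0
    return sum(codon in rare_set for codon in codons) / len(codons)


def five_prime_vs_rest(cds, rare_set=RARE_CODONS_ASSUMED, window=40):
    """Return (rare fraction in the first `window` codons, rare fraction in the rest)."""
    codons = split_codons(cds)
    return (
        rare_fraction(codons[:window], rare_set),
        rare_fraction(codons[window:], rare_set),
    )


if __name__ == "__main__":
    # Toy sequence for demonstration; not a real gene.
    toy_cds = "ATG" + "CGA" * 2 + "GCT" * 2
    head, rest = five_prime_vs_rest(toy_cds, window=3)
    print(f"rare fraction, 5' window: {head:.2f}; remainder: {rest:.2f}")
```

      Averaging the two fractions over all genes in a genome would give the kind of 5' versus body comparison described above. Whether the difference reflects selection for slow initial translation or reduced purifying selection at gene ends is exactly what such a count cannot settle by itself.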

      Specific Responses.

      Reviewer 1 was concerned that we did not more fully discuss other work on possible reasons for slow initial translation. We discuss such work more extensively in our revision. However, as far as we know, none of this work proposes a reason for the 5’ enrichment of rare, slow codons, and this is the main point of our paper. Furthermore, it is not completely clear that there is any slow initial translation. The increase in ribosome density seen in flash-freeze ribosome profiling could be an artefact of the use of cycloheximide at the thaw step of the protocols; or it could be a real measure of high ribosome density that occurs for some other reason than slow translation (e.g., ribosomes might have low processivity at the 5’ end).

      Reviewer 1 was also concerned about confounding effects in our reporter gene analysis of the effects of different codons on efficiency of translation. We have two comments. First, it is important to remember that although we changed codons in our reporters, we did not change any amino acids. We changed codons only to synonymous codons. Thus at least one of the reviewer’s possible confounding effects—interactions of the nascent peptide chain with the exit channel of the ribosome—does not apply. However, of course, the mRNA nucleotide sequence is altered, and this could cause a change in mRNA structure or abundance, which could matter. We agree this is a limitation of our approach. However, to fully address it, we feel it would be necessary to examine a very large number of quite different sequences, which is beyond the scope of this work. Furthermore, mRNAs with low secondary structure at the 5’ end probably have relatively high rates of initiation, and also relatively high rates of elongation, and it might be quite difficult to disentangle these. But in neither case is there an argument that slow initial translation is efficient. Accurate measurement of mRNA levels would be helpful, but would not disentangle rates of initiation from rates of elongation as causes of changes in expression.
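
      The distinction drawn above, that synonymous substitutions change the mRNA sequence but not the encoded peptide, can be illustrated with a minimal sketch. The codon table below is a small excerpt of the standard genetic code, and the example sequences and function names are hypothetical, chosen only for illustration; they are not the reporter sequences used in the study.

```python
# Excerpt of the standard genetic code (DNA alphabet); only the codons
# needed for this illustration are included.
CODON_TO_AA = {
    "ATG": "M",
    "CTG": "L", "CTA": "L", "TTA": "L",
    "GAA": "E", "GAG": "E",
    "AAA": "K", "AAG": "K",
}


def translate(seq):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(seq_a, seq_b):
    """True if two coding sequences encode the same peptide."""
    return len(seq_a) == len(seq_b) and translate(seq_a) == translate(seq_b)


# Hypothetical example: three codons are swapped for synonymous ones,
# so the nucleotide sequence differs while the peptide is unchanged.
original = "ATGCTGGAAAAG"  # M L E K
recoded = "ATGCTAGAGAAA"   # M L E K, with different codons
```

      In this sketch the recoded sequence differs at three nucleotide positions yet encodes the same peptide, which is why peptide-level confounds do not apply while mRNA-level ones (structure, abundance) still may.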

      Reviewer 2 was concerned that the conservation scores for the 5’ 40 amino acids, and the 3’ 40 amino acids were similar, but slow translation was only statistically significant for the 5’ 40 amino acids. As we say in the manuscript, we are also puzzled by this. We note that 3’ translation is statistically slow, if one looks over the last 100 amino acids. Our best effort at an explanation is a sort of reverse-Tuller explanation: that in the last 40 amino acids, the new slow codons created by genome plasticity are fairly quickly removed by purifying selection, but that in the first 40 amino acids, for genes that need to be expressed at low levels, purifying selection against slow codons is reduced, because poor translation is actually advantageous for these genes. To expand on this a bit, we feel that the 5000 or so proteins of the proteome have to be expressed in the correct stoichiometric ratios, and that poor translation can be a useful tool to help achieve this. In this explanation, slow translation at the 5’ end is bad for translation (in agreement with our reporter experiments), but can be good for the organism, when it occurs in front of a gene that needs to be expressed poorly. Whereas, in Tuller, slow translation at the 5’ end is good for translation.

      Reviewer 2 wondered whether the N-terminal fusion peptide affects GFP fluorescence in our reporter. This specific reporter, with this N-terminus, has been characterized by Dean and Grayhack (2012), and by Gamble et al. (2016), and the idea that a super-folder GFP reporter is not greatly affected by N-terminal fusions is based on the work of Pedelacq (2006). None of these papers show whether this N-terminal fusion might have some effect, but together, they provide good reason to think that any effect would be small. These citations have been added.

    1. Author response:

      Reviewer #1 (Public Review):

      Abbasi et al. assess in this MEG study the directed connectivity of both cortical and subcortical regions during continuous speech production and perception. The authors observed bidirectional connectivity patterns between speech-related cortical areas as well as subcortical areas in production and perception. Interestingly, they found in speaking low-frequency connectivity from subcortical (the right cerebellum) to cortical (left superior temporal) areas, while connectivity from the cortical to subcortical areas was in the high frequencies. In listening a similar cortico-subcortical connectivity pattern was observed for the low frequencies, but the reversed connectivity in the higher frequencies was absent.

      The work by Abbasi and colleagues addresses a relevant, novel topic, namely understanding the brain dynamics between speaking and listening. This is important because traditionally production and perception of speech and language are investigated in a modality-specific manner. To have a more complete understanding of the neurobiology underlying these different speech behaviors, it is key to also understand their similarities and differences. Furthermore, to do so, the authors utilize state-of-the-art directed connectivity analyses on MEG measurements, providing a quite detailed profile of cortical and subcortical interactions for the production and perception of speech. Importantly, and perhaps most interesting in my opinion, is that the authors find evidence for frequency-specific directed connectivity, which is (partially) different between speaking and listening. This could suggest that both speech behaviors rely (to some extent) on similar cortico-cortical and cortico-subcortical networks, but different frequency-specific dynamics.

      These elements mentioned above (investigation of both production and perception, both cortico-cortical and cortico-subcortical connectivity is considered, and observing frequency-specific connectivity profiles within and between speech behaviors), make for important novel contributions to the field. Notwithstanding these strengths, I find that they are especially centered on methodology and functional anatomical description, but that precise theoretical contributions for neurobiological and cognitive models of speech are less transparent. This is in part because the study compares speech production and perception in general, but no psychophysical or psycholinguistic manipulations are considered. I also have some critical questions about the design which may pose some confounds in interpreting the data, especially with regard to comparing production and perception.

      (1) While the cortico-cortical and cortico-subcortical connectivity profiles highlighted in this study and the depth of the analyses are impressive, what these data mean for models of speech processing remains on the surface. This is in part due, I believe, to the fact that the authors have decided to explore speaking and listening in general, without targeting specific manipulations that help elucidate which aspects of speech processing are relevant for the particular connectivity profiles they have uncovered. For example, is the frequency-specific directed connectivity driven by low-level psychophysical attributes of the speech or by more cognitive linguistic properties? Does it relate to the monitoring of speech, timing information, and updating of sensory predictions? Without manipulations trying to target one or several of these components, as some of the referenced work has done (e.g., Floegel et al., 2020; Stockert et al., 2021; Todorović et al., 2023), it is difficult to draw concrete conclusions as to which representations and/or processes of speech are reflected by the connectivity profiles. An additional disadvantage of not having manipulations within each speech behavior is that it makes the comparison between listening and speaking harder. That is, speaking and listening have marked input-output differences which likely will dominate any comparison between them. These physically driven differences (or similarities for that matter; see below) can be strongly reduced by instead exploring the same manipulations/variables between speaking and listening. If possible (if not, to consider for future work), it may be interesting to score psychophysical (e.g., acoustic properties) or psycholinguistic (e.g., lexical frequency) information of the speech and see whether and how the frequency-specific connectivity profiles are affected by it.

      We thank the reviewer for pointing this out. The current study is indeed part of a larger project investigating the role of the internal forward model in speech perception and production. In the original, more comprehensive study, we also included a masked condition where participants produced speech as usual, but their auditory perception was masked. This allowed us to examine how the internal forward model behaves when it doesn't receive the expected sensory consequences of generated speech. However, for the current study, we focused solely on data from the speaking and listening conditions due to its specific research question. We agree that further manipulations would be interesting. However, for this study our focus was on natural speech and we avoided other manipulations (beyond masked speech) so that we could have a sufficiently long recording time for the main speaking and listening conditions.

      (2) Recent studies comparing the production and perception of language may be relevant to the current study and add some theoretical weight since their data and interpretations for the comparisons between production and perception fit quite well with the observations in the current work. These studies highlight that language processes between production and perception, specifically lexical and phonetic processing (Fairs et al., 2021), and syntactic processing (Giglio et al., 2024), may rely on the same neural representations, but are differentiated in their (temporal) dynamics upon those shared representations. This is relevant because it dispenses with the classical notion in neurobiological models of language where production and perception rely on (partially) dissociable networks (e.g., Price, 2010). Rather those data suggest shared networks where different language behaviors are dissociated in their dynamics. The speech results in this study nicely fit and extend those studies and their theoretical implications.

      We thank the reviewer for the suggestion and we will include these references and the points made by the reviewer in our revised manuscript.

      (3) The authors align the frequency-selective connectivity between the right cerebellum and left temporal speech areas with recent studies demonstrating a role for the right cerebellum for the internal modelling in speech production and monitoring (e.g., Stockert et al., 2021; Todorović et al., 2023). This link is indeed interesting, but it does seem relevant to point out that at a more specific scale, it does not concern the exact same regions between those studies and the current study. That is, in the current study the frequency-specific connectivity with temporal regions concerns lobule VI in the right cerebellum, while in the referenced work it concerns Crus I/II. The distinction seems relevant since Crus I/II has been linked to the internal modelling of more cognitive behavior, while lobule VI seems more motor-related and/or contextual-related (e.g., D'Mello et al., 2020; Runnqvist et al., 2021; Runnqvist, 2023).

      We thank the reviewer for their insightful comment. The reference was intended to provide evidence for the role of the cerebellum in internal modelling in speech. We do not claim that we have the spatial resolution with MEG to reliably spatially resolve specific parts of the cerebellum.

      (4) On the methodological side, my main concern is that for the listening condition, the authors have chosen to play back the speech produced by the participants in the production condition. Both the fixed order as well as hearing one's own speech as listening condition may produce confounds in data interpretation, especially with regard to the comparison between speech production and perception. Could order effects impact the observed connectivity profiles, and how would this impact the comparison between speaking and listening? In particular, I am thinking of repetition effects present in the listening condition as well as prediction, which will be much more elevated for the listening condition than the speaking condition. The fact that it also concerns their own voice furthermore adds to the possible predictability confound (e.g., Heinks-Maldonado et al., 2005). In addition, listening to one's speech which just before has been articulated may, potentially strategically even, enhance inner speech and "mouthing" in the participants, hereby thus engaging the production mechanism. Similarly, during production, the participants already hear their own voice (which serves as input in the subsequent listening condition). Taken together, both similarities or differences between speaking and listening connectivity may have been due to or influenced by these order effects, and the fact that the different speech behaviors are to some extent present in both conditions.

      This is a valid point raised by the reviewer. By listening to their own previously produced speech, our participants might have anticipated and predicted the sentences more easily. However, when designing our experiment, we took several steps to lower the chance of this anticipation. First, participants were measured in separate sessions for the speech production and perception tasks, and there was always an interval of several days between the two conditions. Second, our questions were mainly about common, general topics. Consequently, participants may not have remembered their answers completely.

      Importantly, using the same stimulus material for speaking and listening guaranteed that there was no difference in the low-level features of the material for both conditions that could have affected the results of our statistical comparison.

      Due to bone conduction, hearing one’s unaltered own speech from a recording may seem foreign and could lead to unwanted emotional reactions, e.g. embarrassment, so participants were asked whether they had already heard their own voice in a recording (e.g. from a self-recorded voice message in WhatsApp), which most of them confirmed. Participants were also informed that they were going to hear themselves during the measurement to further reduce unwanted psychophysiological responses.

      (5) The ability of the authors to analyze the spatiotemporal dynamics during continuous speech is a potentially important feat of this study, given that one of the reasons that speech production is much less investigated compared to perception concerns motor and movement artifacts due to articulation (e.g., Strijkers et al., 2010). Two questions did spring to mind when reading the authors' articulation artifact correction procedure: If I understood correctly, the approach comes from Abbasi et al. (2021) and is based on signal space projection (SSP) as used for eye movement corrections, which the authors successfully applied to speech production. However, in that study, it concerned the repeated production of three syllables, while here it concerns continuous speech of full words embedded in discourse. The articulation and muscular variance will be much higher in the current study compared to three syllables (or compared to eye movements which produce much more stable movement potentials compared to an entire discourse). Given this, I can imagine that corrections of the signal in the speaking condition were likely substantial and one may wonder (1) how much signal relevant to speech production behavior is lost?; (2) similar corrections are not necessary for perception, so how would this marked difference in signal processing affect the comparability between the modalities?

      One of the results of our previous study (Abbasi et al., 2021) was that the artefact correction was not specific to individual syllables but generalised across syllables. Also, the repeated production of syllables was associated with substantial movements of the articulators mimicking those observed during naturalistic speaking. We therefore believe that the artefact rejection is effective during speaking. We also checked this by investigating speech-related coherence in brain parcels in spatial proximity to the articulators. In our previous study we also showed that the correction method retains neural activity to a very large degree. We are therefore confident that speaking and listening conditions can be compared and that the loss of true signals from correcting the speaking data will be minor.

      References:

      • Abbasi, O., Steingräber, N., & Gross, J. (2021). Correcting MEG artifacts caused by overt speech. Frontiers in Neuroscience, 15, 682419.

      • D'Mello, A. M., Gabrieli, J. D., & Nee, D. E. (2020). Evidence for hierarchical cognitive control in the human cerebellum. Current Biology, 30(10), 1881-1892.

      • Fairs, A., Michelas, A., Dufour, S., & Strijkers, K. (2021). The same ultra-rapid parallel brain dynamics underpin the production and perception of speech. Cerebral Cortex Communications, 2(3), tgab040.

      • Floegel, M., Fuchs, S., & Kell, C. A. (2020). Differential contributions of the two cerebral hemispheres to temporal and spectral speech feedback control. Nature Communications, 11(1), 2839.

      • Giglio, L., Ostarek, M., Sharoh, D., & Hagoort, P. (2024). Diverging neural dynamics for syntactic structure building in naturalistic speaking and listening. Proceedings of the National Academy of Sciences, 121(11), e2310766121.

      • Heinks‐Maldonado, T. H., Mathalon, D. H., Gray, M., & Ford, J. M. (2005). Fine‐tuning of auditory cortex during speech production. Psychophysiology, 42(2), 180-190.

      • Price, C. J. (2010). The anatomy of language: a review of 100 fMRI studies published in 2009. Annals of the New York Academy of Sciences, 1191(1), 62-88.

      • Runnqvist, E., Chanoine, V., Strijkers, K., Pattamadilok, C., Bonnard, M., Nazarian, B., ... & Alario, F. X. (2021). Cerebellar and cortical correlates of internal and external speech error monitoring. Cerebral Cortex Communications, 2(2), tgab038.

      • Runnqvist, E. (2023). Self-monitoring: The neurocognitive basis of error monitoring in language production. In Language production (pp. 168-190). Routledge.

      • Stockert, A., Schwartze, M., Poeppel, D., Anwander, A., & Kotz, S. A. (2021). Temporo-cerebellar connectivity underlies timing constraints in audition. eLife, 10, e67303.

      • Strijkers, K., Costa, A., & Thierry, G. (2010). Tracking lexical access in speech production: electrophysiological correlates of word frequency and cognate effects. Cerebral Cortex, 20(4), 912-928.

      • Todorović, S., Anton, J. L., Sein, J., Nazarian, B., Chanoine, V., Rauchbauer, B., ... & Runnqvist, E. (2023). Cortico-cerebellar monitoring of speech sequence production. Neurobiology of Language, 1-21.

      Reviewer #2 (Public Review):

      Summary:

      The authors re-analyse MEG data from a speech production and perception study and extend their previous Granger causality analysis to a larger number of cortical-cortical and in particular cortical-subcortical connections. Regions of interest were defined by means of a meta-analysis using Neurosynth.org and connectivity patterns were determined by calculating directed influence asymmetry indices from the Granger causality analysis results for each pair of brain regions. Abbasi et al. report feedforward signals communicated via fast rhythms and feedback signals via slow rhythms below 40 Hz, particularly during speaking. The authors highlight one of these connections between the right cerebellum lobule VI and auditory association area A5, where in addition the connection strength correlates negatively with the strength of speech tracking in the theta band during speaking (significant before multiple comparison correction). Results are interpreted within a framework of active inference by minimising prediction errors.
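
      For readers unfamiliar with the directed influence asymmetry index (DAI) mentioned above, the following is a minimal sketch of one commonly used definition: the difference between the Granger causality values in the two directions, normalised by their sum. This normalised-difference form is an assumption for illustration; the exact computation used in the manuscript is not specified in this review. The function names and the example values are hypothetical.

```python
def directed_asymmetry_index(gc_ab, gc_ba):
    """Normalised asymmetry between two directed Granger causality values.

    Assumed form: (GC[A->B] - GC[B->A]) / (GC[A->B] + GC[B->A]).
    Positive values indicate predominant A->B influence, negative values
    predominant B->A influence. Returns NaN when both inputs are zero,
    because the index is undefined in that case.
    """
    total = gc_ab + gc_ba
    if total == 0:
        return float("nan")
    return (gc_ab - gc_ba) / total


def dai_spectrum(gc_ab_by_freq, gc_ba_by_freq):
    """Apply the index frequency by frequency to two GC spectra."""
    return [
        directed_asymmetry_index(a, b)
        for a, b in zip(gc_ab_by_freq, gc_ba_by_freq)
    ]


# Hypothetical GC values at three frequencies for one pair of regions.
example = dai_spectrum([0.3, 0.2, 0.1], [0.1, 0.2, 0.3])
```

      Because the index is a ratio, it can take large values even when both underlying Granger causality estimates are small, which is one reason the question of whether each direction exceeds the bias level matters.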

      While I find investigating the role of cortical-subcortical connections in speech production and perception interesting and relevant to the field, I am not yet convinced that the methods employed are fully suitable to this endeavour or that the results provide sufficient evidence to make the strong claim of dissociation of bottom-up and top-down information flow during speaking in distinct frequency bands.

      Strengths:

      The investigation of electrophysiological cortical-subcortical connections in speech production and perception is interesting and relevant to the field. The authors analyse a valuable dataset, where they spent a considerable amount of effort to correct for speech production-related artefacts. Overall, the manuscript is well-written and clearly structured.

      Weaknesses:

      The description of the multivariate Granger causality analysis did not allow me to fully grasp how the analysis was performed and I hence struggled to evaluate its appropriateness. Knowing that (1) filtered Granger causality is prone to false positives and (2) recent work demonstrates that significant Granger causality can simply arise from frequency-specific activity being present in the source but not the target area without functional relevance for communication (Schneider et al. 2021) raises doubts about the validity of the results, in particular with respect to their frequency specificity. These doubts are reinforced by what I perceive as an overemphasis on results that support the assumption of specific frequencies for feedforward and top-down connections, while findings not aligning with this hypothesis appear to be underreported. Furthermore, the authors report some main findings that I found difficult to reconcile with the data presented in the figures. Overall, I feel the conclusions with respect to frequency-specific bottom-up and top-down information flow need to be moderated and that some of the reported findings need to be checked and if necessary corrected.

      Major points

      (1) I think more details on the multivariate GC approach are needed. I found the reference to Schaum et al., 2021 not sufficient to understand what has been done in this paper. Some questions that remained for me are:

      (i) Does multivariate here refer to the use of the authors' three components per parcel or to the conditioning on the remaining twelve sources? I think the latter is implied when citing Schaum et al., but I'm not sure this is what was done here?

      If it was not: how can we account for spurious results based on indirect effects?

      Yes, multivariate refers to the three components.

      (ii) Did the authors check whether the GC of the source-target pairs was reliably above the bias level (as Schaum et al. did for each condition separately)? If not, can they argue why they think that their results would still be valid? Does it make sense to compute DAIs on connections that were below the bias level? Should the data be re-analysed to take this concern into account?

      We performed statistics on DAI and believe that this is a valid approach. We argue that random GC effects would not survive our cluster-corrected statistics.

      (iii) You may consider citing the paper that introduced the non-parametric GC analysis (which Schaum et al. then went on to apply): Dhamala M, Rangarajan G, Ding M. Analyzing Information Flow in Brain Networks with Nonparametric Granger Causality. Neuroimage. 2008; 41(2):354-362. https://doi.org/10.1016/j.neuroimage.2008.02.020

      Thanks, we will add this reference in the revised version.

      (2) GC has been discouraged for filtered data as it gives rise to false positives due to phase distortions and the ineffectiveness of filtering in the information-theoretic setting as reducing the power of a signal does not reduce the information contained in it (Florin et al., 2010; Barnett and Seth, 2011; Weber et al. 2017; Pinzuti et al., 2020 - who also suggest an approach that would circumvent those filter-related issues). With this in mind, I am wondering whether the strong frequency-specific claims in this work still hold.

      This must be a misunderstanding. We are aware of the problem with GC on filtered data. But GC was here computed on broadband data and not in individual frequency bands.

      (3) I found it difficult to reconcile some statements in the manuscript with the data presented in the figures:

      (i) Most notably, the considerable number of feedforward connections from A5 and STS that project to areas further up the hierarchy at slower rhythms (e.g. L-A5 to R-PEF, R-Crus2, L-CB6, L-Tha, L-FOP and L-STS to R-PEF, L-FOP, L-TOPJ or R-A5 as well as R-STS both to R-Crus2, L-CB6, L-Th) contradict the authors' main message that 'feedback signals were communicated via slow rhythms below 40 Hz, whereas feedforward signals were communicated via faster rhythms'. I struggled to recognise a principled approach that determined which connections were highlighted and reported and which ones were not.

      (ii) "Our analysis also revealed robust connectivity between the right cerebellum and the left parietal cortex, evident in both speaking and listening conditions, with stronger connectivity observed during speaking. Notably, Figure 4 depicts a prominent frequency peak in the alpha band, illustrating the specific frequency range through which information flows from the cerebellum to the parietal areas." There are two peaks discernible in Figure 4, one notably lower than the alpha band (rather theta or even delta), the other at around 30 Hz. Nevertheless, the authors report and discuss a peak in the alpha band.

      (iii) In the abstract: "Notably, high-frequency connectivity was absent during the listening condition." and p.9 "In contrast with what we reported for the speaking condition, during listening, there is only a significant connectivity in low frequency to the left temporal area but not a reverse connection in the high frequencies."

      While Fig. 4 shows significant connectivity from R-CB6 to A5 in the gamma frequency range for the speaking, but not for the listening condition, interpreting comparisons between two effects without directly comparing them is a common statistical mistake (Makin and Orban de Xivry). The spectrally-resolved connectivity in the two conditions actually looks remarkably similar and I would thus refrain from highlighting this statement and indicate clearly that there were no significant differences between the two conditions.

      (iv) "This result indicates that in low frequencies, the sensory-motor area and cerebellum predominantly transmit information, while in higher frequencies, they are more involved in receiving it."

      I don't think that this statement holds in its generality: L-CB6 and R-3b both show strong output at high frequencies, particularly in the speaking condition. While they seem to transmit information mainly to areas outside A5 and STS these effects are strong and should be discussed.

      We appreciate the reviewer's thoughtful comments. We acknowledge that not all connectivity patterns strictly adhere to the initial observation regarding feedback and feedforward communication. It's true that our primary focus was on interactions between brain regions known to be crucial for speech prediction, including auditory, somatosensory, and cerebellar areas. However, we also presented connectivity patterns across other regions to provide a more comprehensive picture of the speech network. We believe this broader perspective can be valuable for future research directions.

      Regarding the reviewer's observation about the alpha band peak in Figure 4, we agree that a closer examination reveals that the connectivity from the right cerebellum to the left parietal cortex spans a wider low-frequency range. We will refrain from solely emphasizing the alpha band and acknowledge the potential contribution of lower frequencies to cerebellar-parietal communication.

      We also appreciate the reviewer highlighting the need for a more nuanced interpretation of the listening condition connectivity compared to the speaking condition. The reviewer is correct in pointing out that while Figure 4 suggests a high-frequency connectivity from L-A5 to R-CB only in the speaking condition, a direct statistical comparison between conditions might not reveal a significant difference. We will revise the manuscript to clarify this point.

      Finally, a closer examination of Figure 3 revealed that the light purple and dark green edges in the speaking condition for R-CB6 and L-3b suggest outgoing connections at low frequencies, while other colored edges indicate information reception at high frequencies. We acknowledge that exceptions to this directional pattern might exist and warrant further investigation in future studies.

      (4) "However, definitive conclusions should be drawn with caution given recent studies raising concerns about the notion that top-down and bottom-up signals can only be transmitted via separate frequency channels (Ferro et al., 2021; Schneider et al., 2021; Vinck et al., 2023)."

      I appreciate this note of caution and think it would be useful if it were spelled out to the reader why this is the case so that they would be better able to grasp the main concerns here. For example, Schneider et al. make a strong point that we expect to find Granger-causality with a peak in a specific frequency band for areas that are anatomically connected when the sending area shows stronger activity in that band than the receiving one, simply because of the coherence of a signal with its own linear projection onto the other area. The direction of a Granger causal connection would in that case only indicate that one area shows stronger activity than the other in the given frequency band. I am wondering to what degree the reported connectivity pattern can be traced back to regional differences in frequency-specific source strength or to differences in source strength across the two conditions.

      This is indeed an important point. That is why we are discussing our results with great caution and specifically point the reader to the relevant literature. We are indeed thinking about a future study where we investigate this connectivity using other connectivity metrics and a detailed consideration of power.

      Reviewer #3 (Public Review):

      In the current paper, Abbasi et al. aimed to characterize and compare the patterns of functional connectivity across frequency bands (1 Hz - 90 Hz) between regions of a speech network derived from an online meta-analysis tool (Neurosynth.org) during speech production and perception. The authors present evidence for complex neural dynamics from which they highlight directional connectivity from the right cerebellum to left superior temporal areas in lower frequency bands (up to beta) and between the same regions in the opposite direction in the (lower) high gamma range (60-90 Hz). Abbasi et al. interpret their findings within the predictive coding framework, with the cerebellum and other "higher-order" (motor) regions transmitting top-down sensory predictions to "lower-order" (sensory) regions in the lower frequencies and prediction errors flowing in the opposite direction (i.e., bottom-up) from those sensory regions in the gamma band. They also report a negative correlation between the strength of this top-down functional connectivity and the alignment of superior temporal regions to the syllable rate of one's speech.

      Strengths:

      (1) The comprehensive characterization of functional connectivity during speaking and listening to speech may be valuable as a first step toward understanding the neural dynamics involved.

      (2) The inclusion of subcortical regions and connectivity profiles up to 90Hz using MEG is interesting and relatively novel.

      (3) The analysis pipeline is generally adequate for the exploratory nature of the work.

      Weaknesses:

      (1) The work is framed as a test of the predictive coding theory as it applies to speech production and perception, but the methodological approach is not suited to this endeavor.

      We agree that we cannot provide definite evidence for predictive coding in speech production and perception and we believe that we do not make that claim in the manuscript. However, our results are largely consistent with what can be expected based on predictive coding theory.

      (2) Because of their theoretical framework, the authors readily attribute roles or hierarchy to brain regions (e.g., higher- vs lower-order) and cognitive functions to observed connectivity patterns (e.g., feedforward vs feedback, predictions vs prediction errors) that cannot be determined from the data. Thus, many of the authors' claims are unsupported.

      We will revise the manuscript to more clearly differentiate our results (e.g. directed Granger-Causality from A to B) from their interpretation (potentially indicating feedforward or feedback signals).

      (3) The authors' theoretical stance seems to influence the presentation of the results, which may inadvertently misrepresent the (otherwise perfectly valid; cf. Abbasi et al., 2023) exploratory nature of the study. Thus, results about specific regions are often highlighted in figures (e.g., Figure 2 top row) and text without clear reasons.

      Our connectograms reveal a multitude of results that we hope are interesting to the community. At the same time, the wealth of findings poses a problem for describing them. We did not see a better way than to highlight specific connections of interest.

      (4) Some of the key findings (e.g., connectivity in opposite directions in distinct frequency bands) feature in a previous publication and are, therefore, interesting but not novel.

      We actually see this as a strength of the current manuscript. The computation of connectivity is here extended to a much larger sample of brain areas. It is reassuring to see that the previously reported results generalise to other brain areas.

      (5) The quantitative comparison between speech production and perception is interesting but insufficiently motivated.

      We thank the reviewer for this comment. We have addressed this in detail in our response to points 1 and 4 of Reviewer 1.

      (6) Details about the Neurosynth meta-analysis and subsequent selection of brain regions for the functional connectivity analyses are incomplete. Moreover, the use of the term 'Speech' in Neurosynth seems inappropriate (i.e., includes irrelevant works, yielding questionable results). The approach of using separate meta-analyses for 'Speech production' and 'Speech perception' taken by Abbasi et al. (2023) seems more principled. This approach would result, for example, in the inclusion of brain areas such as M1 and the BG that are relevant for speech production.

      We agree that there are inherent limitations in automated meta-analysis tools such as Neurosynth. Papers are used in the meta-analysis that might not be directly relevant. However, Neurosynth has proven its usefulness over many years and has been used in many studies. We also agree that our selection of brain areas is not complete. But Granger Causality analysis of every pair of ROIs leads to complex results and we had to limit our selection of areas.

      (7) The results involving subcortical regions are central to the paper, but no steps are taken to address the challenges involved in the analysis of subcortical activity using MEG. Additional methodological detail and analyses would be required to make these results more compelling. For example, it would be important to know what the coverage of the MEG system is, what head model was used for the source localization of cerebellar activity, and if specific preprocessing or additional analyses were performed to ensure that the localized subcortical activity (in particular) is valid.

      There is a large body of evidence demonstrating that MEG can record signals from deep brain areas such as the thalamus and cerebellum (e.g., Attal & Schwarz, 2013; Andersen et al., Neuroimage 2020; Piastra et al., 2020; Schnitzler et al., 2009). These and other studies provide evidence that state-of-the-art recording (with multichannel SQUID systems) and analysis are sufficient to allow reconstruction of activity in subcortical areas. However, spatial resolution is clearly reduced for these deep areas. We will add a statement in the revised manuscript to acknowledge this limitation.

      (8) The results and methods are often detailed with important omissions (a speech-brain coupling analysis section is missing) and imprecisions (e.g., re: Figure 5; the Connectivity Analysis section is copy-pasted from their previous work), which makes it difficult to understand what is being examined and how. (It is also not good practice to refer the reader to previous publications for basic methodological details, for example, about the experimental paradigm and key analyses.) Conversely, some methodological details are given, e.g., the acquisition of EMG data, without further explanation of how those data were used in the current paper.

      We will revise the relevant sections of the manuscript.

      (9) The examination of gamma functional connectivity in the 60 - 90 Hz range could be better motivated. Although some citations involving short-range connectivity in these frequencies are given (e.g., within the visual system), a more compelling argument for looking at this frequency range for longer-range connectivity may be required.

      Given previous evidence of connectivity in the gamma band we think that it would be a weakness to exclude this frequency band from analysis.

      (10) The choice of source localization method (linearly constrained minimum variance) could be explained, particularly given that other methods (e.g. dynamic imaging of coherent sources) were specifically designed and might potentially be a better alternative for the types of analyses performed in the study.

      Both LCMV and DICS are beamforming methods. We used LCMV because we wanted to use Granger Causality, which requires broadband signals. DICS would only provide frequency-specific, band-limited signals.

      (11) The mGC analysis needs to be more comprehensively detailed for the reader to be able to assess what is being reported and the strength of the evidence. Relatedly, first-level statistics (e.g., via estimation of the noise level) would make the mGC and DAI results more compelling.

      We perform group-level cluster-based statistics on mGC while correcting for multiple comparisons across frequency bands and brain parcels, and we report only significant results. This is an established approach that is routinely used in this type of study.
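      For readers unfamiliar with the directed asymmetry index (DAI) raised in comment (11), the following is a minimal sketch of how it is commonly defined from a pair of Granger causality values, namely as the difference between the two directions divided by their sum. The function names, the example values, and the handling of a zero denominator are illustrative assumptions and are not taken from the manuscript's analysis scripts.

```python
def directed_asymmetry_index(gc_a_to_b, gc_b_to_a):
    """DAI = (GC(A->B) - GC(B->A)) / (GC(A->B) + GC(B->A)).

    Positive values indicate a predominance of the A->B direction,
    negative values a predominance of B->A. Values lie in [-1, 1]
    when both inputs are non-negative.
    """
    total = gc_a_to_b + gc_b_to_a
    if total == 0:
        # Assumption: treat "no influence in either direction" as no asymmetry.
        return 0.0
    return (gc_a_to_b - gc_b_to_a) / total


def dai_spectrum(gc_a_to_b, gc_b_to_a):
    """Apply the DAI frequency by frequency to two equally long GC spectra."""
    if len(gc_a_to_b) != len(gc_b_to_a):
        raise ValueError("both spectra must cover the same frequencies")
    return [directed_asymmetry_index(x, y) for x, y in zip(gc_a_to_b, gc_b_to_a)]


# Hypothetical GC values at three frequencies (not real data).
example = dai_spectrum([0.3, 0.1, 0.0], [0.1, 0.3, 0.0])
```

      Because the index is normalised by the summed influence, it expresses only the relative dominance of one direction and says nothing about the absolute strength of either.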

      (12) Considering the exploratory nature of the study, it is essential for other researchers to continue investigating and validating the results presented in the current manuscript. Thus, it is concerning that data and scripts are not fully and openly available. Data need not be in its raw state to be shared and useful, which circumvents the stated data privacy concerns.

      We acknowledge the reviewer's concern regarding the full availability of the dataset. Due to privacy limitations on the collected data, we are unable to share it publicly at this time. However, to promote transparency and enable further exploration, we have provided the script used for data analysis and an example dataset. This example dataset should provide a clear understanding of the data structure and variables used in the analysis. Additionally, we are happy to share the complete dataset upon request from research teams interested in performing in-depth secondary analyses.

    1. Author response:

      We would like to thank all reviewers for their time, critical evaluation, recognition, and constructive comments on the manuscript. We will revise the manuscript accordingly. Below is our point-by-point response to the comments.

      From Reviewer #1:

      “…several previous studies have identified co-expression of vomeronasal receptors by vomeronasal sensory neurons, and the expression of non-vomeronasal receptors, and this was not adequately addressed in the manuscript as presented.”

      We plan to add context and citations to the Introduction and Results sections relating to recent studies on the co-expression of vomeronasal receptors and the expression of non-vomeronasal receptors in VSNs.

      “The data resulting from the use of the Resolve Biosciences spatial transcriptomics platform are somewhat difficult to interpret, and the methods are somewhat opaque.”

      Unfortunately, detailed Molecular Cartography protocols remain proprietary at Resolve Biosciences and were not disclosed. We acknowledge this limitation. Our role in the acquisition and processing of data for this experiment is included in the current Methods section. We will clarify this in the revised manuscript. Additional figures produced by the Molecular Cartography analysis will also be added (See response to Reviewer #2, below) to the supplemental materials to help clarify interpretation of the results.

      From Reviewer #2:

      “…the authors present a biased report of previously published work, largely including only those results that do not overlap with their own findings, but ignoring results that would question the novelty of the data presented here.”

      We had no intention of misleading the readers. In fact, we have discussed discrepancies between our results and those of other studies. However, we inadvertently left out a critical publication in preparing the manuscript. We plan to add context and citations (where missing) relating to recent studies that use single-cell RNA sequencing in the vomeronasal organ, studies relating to the co-expression of vomeronasal receptors, and studies discussing V1R/V2R lineage determination.

      “Did the authors perform any cell selectivity, or any directed dissection, to obtain mainly neuronal cells? Previous studies reported a greater proportion of non-neuronal cells. For example, while Katreddi and co-workers (ref 89) found that the most populated clusters are identified as basal cells, macrophages, pericytes, and vascular smooth muscle, Hills Jr. et al. in this work did not report such types of cells. Did the authors check for the expression of marker genes listed in Ref 89 for such cell types?”

      For VNO dissections, we removed bones and blood vessels from VNO tissue and only kept the sensory epithelium. This procedure removed vascular smooth muscle cells, pericytes, and other non-neuronal cell types, which explains differences in cell proportions between our study and previous studies. We used a DAPI/Draq5 assay to sort live/nucleated cells for sequencing, and no specific markers were used for cell selection. All cells in the experiment were successfully annotated using the cell-type markers shown in Fig. 1B, save for cells from the sVSN cluster, which were novel and required further analysis to characterize.

      “The authors should report the marker genes used for cell annotation.”

      Marker genes used for cell annotation are shown in Figure 1B. A full list of all marker genes used in the cell annotation process will be provided.

      “The authors reported no differences between juvenile and adult samples, and between male and female samples. It is not clear how they evaluate statistically significant differences, which statistical test was used, or what parameters were evaluated.”

      The claims made about male/female mice and P14/P56 mice directly pertain to the distribution of clusters and cells in UMAP space as seen in Figure 1 C & D. We have indeed performed differential gene expression analysis for male/female and P14/P56 comparisons using the FindMarkers function from the Seurat R package. Although we have found significant differential expression between male and female, and between P14 and P56 animals, the genes in this list do not appear to be influential for the neuronal lineage and cell type specification or related to cell adhesion molecules, which are the main focuses of this study. Nevertheless, we plan to add these results to the supplemental materials in a revised manuscript.

      “‘Based on our transcriptomic analysis, we conclude that neurogenic activity is restricted to the marginal zone.’ This conclusion is quite a strong statement, given that this study was not directed to carefully study neurogenesis distribution, and when neurogenesis in the basal zone has been proposed by other works, as stated by the authors.”

      Eighteen slides from whole VNO sections were used in Molecular Cartography analysis, while one representative slide was used to present findings. Across all slides, GBCs, INPs, and iVSNs show a pattern of proximity to the marginal zone (MZ), with GBCs presenting nearest to the MZ and iVSNs presenting furthest. We believe that the full scope of our results justifies our claim that neurogenesis is restricted to the MZ. This claim is also supported by the 2021 study by Katreddi & Forni. We will provide additional figures to further support this claim.

      “The authors report at least two new types of sensory neurons in the mouse VNO, a finding of huge importance that could have a substantial impact on the field of sensory physiology. However, the evidence for such new cell types is based solely on this transcriptomic dataset and, as such, is quite weak, since many crucial morphological and physiological aspects would be missing to clearly identify them as novel cell types. As stated before, many control and confirmatory experiments, and a careful evaluation of the results presented in this work must be performed to confirm such a novel and interesting discovery. The reported "novel classes of sensory neurons" in this work could represent previously undescribed types of sensory neurons, but also previously reported cells (see below) or simply possible single-cell sequencing artefacts.”

      The reviewer is correct that detailed morphological and physiological studies are needed to further understand these cells. This is an opinion we share. Our paper is primarily intended as a resource paper to provide access to a large-scale single-cell RNA-sequenced dataset and discoveries based on the transcriptomic data that can support and inspire ongoing and future experiments in the field. Nonetheless, we are confident that neither of the novel cell clusters is the result of sequencing artefacts. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, multiplet cell detection and removal with the Python module Scrublet, and a strict 5% mitochondrial gene expression cut-off. Furthermore, the cell clusters in question show no signs of being the result of sequencing artefacts, as they are physically connected in a reasonable orientation to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent expressions of genes (Fig. S1H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared to mature V1R and V2R VSNs, indicating functional differences. Additional figures for mOSN differential gene expression and gene ontology analysis results will be added to the supplemental figures.
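      To make the mitochondrial cut-off in the quality-control description concrete, the following is a minimal sketch of how a per-cell 5% threshold can be applied. It is not the pipeline used in the study: the "mt-" gene-symbol prefix, the inclusive comparison, the function names, and the toy counts are all assumptions made for illustration, and the SoupX and Scrublet steps are not reproduced here.

```python
def mito_fraction(counts, mito_prefix="mt-"):
    """Fraction of a cell's total counts that come from mitochondrial genes.

    `counts` maps gene symbol -> count for a single cell. Mitochondrial genes
    are assumed to be identifiable by the "mt-" prefix of their symbol.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    mito = sum(n for gene, n in counts.items()
               if gene.lower().startswith(mito_prefix))
    return mito / total


def passes_mito_cutoff(counts, cutoff=0.05):
    """Keep a cell only if its mitochondrial fraction is at or below the cut-off."""
    return mito_fraction(counts) <= cutoff


# Toy cells with invented counts (not data from the study).
cells = {
    "cell_a": {"Gnai2": 90, "Gnal": 8, "mt-Co1": 2},   # 2% mitochondrial
    "cell_b": {"Gnai2": 80, "mt-Nd1": 20},             # 20% mitochondrial
}
kept = [name for name, counts in cells.items() if passes_mito_cutoff(counts)]
```

      In this toy example only `cell_a` is retained; a high mitochondrial fraction is commonly taken as a sign of a stressed or damaged cell, which is why such cells are excluded before clustering.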

      “The authors report the co-expression of V2R and Gnai2 transcripts based on sequencing data. That could dramatically change classical classifications of basal and apical VSNs. However, did the authors find support for this co-expression in spatial molecular imaging experiments?” 

      Genes with extremely high expression levels overwhelm signals from other genes, and therefore had to be removed from the experiment. This is a limitation of the Molecular Cartography platform. Unfortunately, Gnai2 was determined to be one of these genes and was not evaluated for this purpose.

      “Canonical OSNs: The authors report a cluster of cells expressing neuronal markers and ORs and call them canonical OSN. However, VSNs expressing ORs have already been reported in a detailed study showing their morphology and location inside the sensory epithelium (References 82, 83). Such cells are not canonical OSNs since they do not show ciliary processes, they express TRPC2 channels and do not express Golf. Are the "canonical OSNs" reported in this study and the OR-expressing VSNs (ref 82, 83) different? Which parameters, other than Gnal and Cnga2 expression, support the authors' bold claim that these are "canonical OSNs"? What is the morphology of these neurons? In addition, the mapping of these "canonical OSNs" shown in Figure 2D paints a picture of the negligible expression/role of these cells (see their prediction confidence).” 

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell-type in the data. After performing differential gene expression analysis on the putative mOSN cluster, comparing it with V1R and V2R VSNs independently, GO analysis returned ‘olfactory receptor activity’ as the top significantly enriched molecular function and ‘cilium’ as the top significantly enriched cellular component. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we prioritized the detection of canonical VNO cell-types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs.

      “Secretory VSN: The authors report another novel type of sensory neurons in the VNO and call them "secretory VSNs". Here, the authors performed an analysis of differentially expressed genes for neuronal cells (dataset 2) and found several differentially expressed genes in the sVSN cluster. However, it would be interesting to perform a gene expression analysis using the whole dataset including neuronal and non-neuronal cells. Could the authors find any marker gene that unequivocally identifies this new cell type?”

      We did not find unequivocal marker genes for sVSNs. We did perform differential analysis of the sVSN cluster with whole VNO data and with the neuronal subset, as well as against specific cell-types. We could not find a single gene that was perfectly exclusive to sVSNs. We therefore used a combinatorial marker-gene approach to predict sVSNs in the Molecular Cartography data. This required a larger subset of our 100-gene panel to be dedicated to genes for detecting sVSNs.

      “When the authors evaluated the distribution of sVSN using the Molecular Cartography technique, they found expression of sVSN in both sensory and non-sensory epithelia. How do the authors explain such unexpected expression of sensory neurons in the non-sensory epithelium?” 

      In our scRNA-Seq experiment, blood vessels were removed, limiting the power to distinguish between certain cell types. Because of the limited number of genes that we can probe using Molecular Cartography, a number of the genes associated with sVSNs may also be present in the non-sensory epithelium. This could lead to the identification of cells that may or may not be identical to the sVSNs in the non-neuronal epithelium. Indeed, further studies will need to be conducted to determine the specificity of these cells.

      “The low total genes count and low total reads count, combined with an "expression of marker genes for several cell types" could indicate low-quality beads (contamination) that were not excluded with the initial parameter setting. It looks like cells in this cluster express a bit of everything V1R, V2R, OR, secretory proteins...”

      We are confident that the putative sVSN cell cluster is not the result of low-quality cells. We performed a robust quality-control protocol, including count correction for ambient RNA with the R package SoupX, multiplet cell detection and removal with the Python module Scrublet, and a strict 5% mitochondrial gene expression cut-off. Furthermore, the cell clusters in question show no signs of being the result of sequencing artefacts, as they are connected in a reasonable orientation to the rest of the neuronal lineage in modular clusters in 2D and 3D UMAP space. The OSN and sVSN cell clusters each show distinct and self-consistent expressions of genes (Fig. S1H). Gene ontology (GO) analysis reveals significant GO term enrichment for both the sVSN (Fig. 2G) and mOSN clusters when compared to mature V1R and V2R VSNs, indicating functional differences. Moreover, while some genes were expressed at a lower level when compared to the canonical VSNs, others were expressed at higher levels, which rules out an overall loss of gene counts as the cause of the discrepancy.

      “The authors wrote ‘...the transcriptomic landscape that specifies the lineages is not known...’. This statement is not completely true, or at least misleading. There are still many undiscovered aspects of the transcriptomics landscape and lineage determination in VSNs. However, authors cannot ignore previously reported data showing the landscape of neuronal lineages in VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). Expression of most of the transcription factors reported by this study (Ascl1, Sox2, Neurog1, Neurod1...) were already reported, and for some of them, their role was investigated, during early developmental stages of VSNs (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259). In summary, the authors should fully include the findings from previous works (Refs 88, 89, 90, 91 and doi.org/10.7554/eLife.77259), clearly state what has been already reported, what is contradictory and what is new when compared with the results from this work.”

      This is a difference in opinion about the terminology. Transcriptomic landscape in our paper refers to the genome-wide expression by individual cells, not just individual genes. The reviewer is correct that many of the genetic specifiers have been identified, which we cited and discussed. We consider these studies as providing a “genetic” underpinning, rather than the “transcriptomic landscape” in lineage progression. We will clarify this point in the revised manuscript. 

      “…the co-expression of specific V2Rs with specific transcription factors does not imply a direct implication in receptor selection. Directed experiments to evaluate the VR expression dependent on a specific transcription factor must be performed.” 

      The reviewer is correct, and we did not claim that the co-expression of specific transcription factors indicates a direct relationship with receptor selection. We agree that further directed experiments are required to investigate this question.

      “This study reports that transcription factors, such as Pou2f1, Atf5, Egr1, or c-Fos could be associated with receptor choice in VSNs. However, no further evidence is shown to support this interaction. Based on these purely correlative data, it is rather bold to propose cascade model(s) of lineage consolidation.”

      The reviewer is correct. As any transcriptomic study will only be correlative, additional studies will be needed to unequivocally determine the mechanistic link between the transcription factors and receptor choice. Our model provides a basis for these studies.

      “The authors use spatial molecular imaging to evaluate the co-expression of many chemosensory receptors in single VNO cells. […] However, it is difficult to evaluate and interpret the results due to the lack of cell borders in spatial molecular imaging. The inclusion of cell border delimitation in the reported images (membrane-stained or computer-based) could be tremendously beneficial for the interpretation of the results.”

      The most common practice for cell segmentation of spatial transcriptomics data is to determine cell borders based on nuclear staining with expansion. We have tested multiple algorithms based on recent studies, but each has its own caveat. We will clarify this point in the revised manuscript.

      “It is surprising that the authors reported a new cell type expressing OR, however, they did not report the expression of ORs in Molecular Cartography technique. Did the authors evaluate the expression of OR using the cartography technique?” 

      We were limited to a 100-gene probe panel and only included one OR; its expression was not high enough for us to substantiate any claims.

      From Reviewer #3:

      “(1) The authors claim that they have identified two new classes of sensory neurons, one being a class of canonical olfactory sensory neurons (OSNs) within the VNO. This classification as canonical OSNs is based on expression data of neurons lacking the V1R or V2R markers but instead expressing ORs and signal transduction molecules, such as Gnal and Cnga2. Since OR-expressing neurons in the VNO have been previously described in many studies, it remains unclear to me why these OR-expressing cells are considered here a "new class of OSNs." Moreover, morphological features, including the presence of cilia, and functional data demonstrating the recognition of chemosignals by these neurons, are still lacking to classify these cells as OSNs akin to those present in the MOE. While these cells do express canonical markers of OSNs, they also appear to express other VSN-typical markers, such as Gnao1 and Gnai2 (Figure 2B), which are less commonly expressed by OSNs in the MOE. Therefore, it would be more precise to characterize this population as atypical VSNs that express ORs, rather than canonical OSNs.”

      We observe OR expression in VSNs in our data; these cells cluster with VSNs. The putative mOSN cluster exhibits its own trajectory, distinct from VSN clusters. These cells express Gnal (Golf), which is not expressed in VSNs expressing ORs, nor in any other cell-type in the data. We have performed differential gene expression analysis on the putative mOSN cluster to compare it with V1R and V2R VSNs. The top significantly enriched GO terms returned by GO analysis include “olfactory receptor activity” and “cilium”, further supporting that these are OSNs. Because we were limited to a list of 100 genes in the Molecular Cartography probe panel, we prioritized the detection of canonical VNO cell-types, vomeronasal receptor co-expression, and the putative sVSNs, and were not able to include a robust analysis of the putative OSNs. With regard to Gnai2 and Go expression, we have examined our data from the OSNs dissociated from the olfactory epithelium and detected substantial expression of both. This new analysis provides additional support for our claim. We will update the information in a revised manuscript.

      “(2) The second new class of sensory neurons identified corresponds to a group of VSNs expressing prototypical VSN markers (including V1Rs, V2Rs, and ORs), but exhibiting lower ribosomal gene expression. Clustering analysis reveals that this cell group is relatively isolated from V1R- and V2R-expressing clusters, particularly those comprising immature VSNs. The question then arises: where do these cells originate? Considering their fewer overall genes and lower total counts compared to mature VSNs, I wonder if these cells might represent regular VSNs in a later developmental stage, i.e., senescent VSNs. While the secretory cell hypothesis is compelling and supported by solid data, it could also align with a late developmental stage scenario. Further data supporting or excluding these hypotheses would aid in understanding the nature of this new cell cluster, with a comparison between juvenile and adult subjects appearing particularly relevant in this context.” 

      We wholeheartedly agree with this assessment. Our initial thought was that these were senescent VSNs, but the trajectory analysis did not support this scenario, leading us to propose that these are putative secretory cells. Our analysis also shows that overall, 46% of the putative sVSNs were from the P14 sample and 54% from P56. These cells comprise roughly 6.4% of all P14 cells and 8.5% of P56 cells. In comparison, 28.4% of all cells are mature V1R VSNs at P14, but the percentage rises to 46.7% at P56. The significant presence of sVSNs at P14, and the disproportionate increase when compared with mature VSNs, indicate that these are unlikely to be late developmental stage or senescent cells, although we cannot exclude these possibilities. We plan to clarify these points in the revised manuscript.
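      The proportions quoted in the paragraph above can be compared directly as fold changes between the two ages. The short sketch below only restates the percentages given in this response; the dictionary layout and function name are illustrative, and no additional data are used.

```python
# Percentage of all cells in each sample, as quoted in the response above.
percent_of_cells = {
    "sVSN": {"P14": 6.4, "P56": 8.5},
    "mature V1R VSN": {"P14": 28.4, "P56": 46.7},
}


def fold_change(early, late):
    """Ratio of the later proportion to the earlier one."""
    return late / early


fold_changes = {
    cell_type: fold_change(values["P14"], values["P56"])
    for cell_type, values in percent_of_cells.items()
}

for cell_type, fc in fold_changes.items():
    print(f"{cell_type}: {fc:.2f}-fold change from P14 to P56")
```

      On these numbers the sVSN share grows by roughly a third between P14 and P56, whereas the mature V1R share grows by roughly two thirds, so the two populations do not scale together with age.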

      We did not include sVSNs in the trajectory inference analysis because of inherent uncertainty about their developmental origins. However, PCA embeddings were the basis of the pseudotime analysis, and those embeddings that do include the sVSN cluster show that it is distributed evenly between the mature V1R and V2R clusters, with all mature clusters equidistant from GBC and INP clusters, indicating that they may indeed originate from the same stem cell populations. We plan to include trajectory analysis based on this assumption in the revised manuscript.

      “(3) The authors' decision not to segregate the samples according to sex is understandable, especially considering previous bulk transcriptomic and functional studies supporting this approach. However, many of the highly expressed VR genes identified have been implicated in detecting sex-specific pheromones and triggering dimorphic behavior. It would be intriguing to investigate whether this lack of sex differences in VR expression persists at the single-cell level. Regardless of the outcome, understanding the presence or absence of major dimorphic changes would hold broad interest in the chemosensory field, offering insights into the regulation of dimorphic pheromone-induced behavior. Additionally, it could provide further support for proposed mechanisms of VR receptor choice in VSNs.”

      The reviewer raised a good point. We did not observe differences between male and female mice, or between P14 and P56 mice, in the distribution of clusters and cells in UMAP space. However, our differential expression analysis has revealed significantly differentially expressed genes in both comparisons. These genes have not been implicated in lineage or cell type determination, and we decided not to include the analysis in the current version. In the revised manuscript, we plan to include the results.

      “(4) The expression analysis of VRs and ORs seems to have been restricted to the cell clusters associated with the neuronal lineage. Are VRs/ORs expressed in other cell types, i.e. sustentacular, HBC, or other cells?” 

      Sparsely expressed low counts of VR and OR genes were observed in non-neuronal cell-types. When their expression as a percentage of cell-level gene counts is considered, however, the expression is negligible when compared to the neurons. The observed expression may be explained by stochastic base-level expression, or it may be the result of remnant ambient RNA that passed filtering. We will clarify this point in the revision.

    1. Author response:

      Thank you for organising the review and providing us with the reviewers' feedback. These comments are very useful, and we would like to express our gratitude to the reviewers for their efforts.

      The reviewers all point out a number of related improvements, relating to: 1) describing various processing steps more clearly, in the online documentation but also in the manuscript itself (e.g. for particle picking), 2) describing more clearly what features Ais offers, how these compare to those of other programmes, and how they might be interfaced with in third-party programmes (e.g. the expected format of models), and 3) a degree of subjectivity in discussion of the results presented in the manuscript (e.g. our statement that Pix2pix performed better in some cases than did other architectures).

      We will address these points, as well as the various other suggestions, in the upcoming revised manuscript and updates to Ais.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Perampalam et al. describe novel methods for genome-wide CRISPR screening to identify and validate genes essential for HGSOC spheroid viability. In this study, they report that Netrin signaling is essential for maintaining disseminated cancer spheroid survival, wherein overexpression of Netrin pathway genes increases tumor burden in a xenograft model of ovarian cancer. They also show that high netrin expression correlates with poor survival outcomes in ovarian cancer patients. The study provides insights into the biology of netrin signaling in DTC cluster survival and warrants development of therapies to block netrin signaling for treating serous ovarian cancer.

      Strengths:

      - The study identifies Netrin signaling to be important in disseminated cancer spheroid survival

      - A novel GO-CRISPR methodology was used to find key genes and pathways essential for disseminated cancer cell survival

      Thanks for the endorsement of our work and its importance to metastasis in ovarian cancer.

      Weaknesses:

      - The term dormancy is not fully validated and requires additional confirmation to claim the importance of Netrin signaling in "dormant" cancer survival.

      - Findings shown in the study largely relate to cancer dissemination and DTS survival rather than cancer dormancy.

      Much of the validation of dormancy and cell cycle arrest in HGSOC spheroids, as well as of the culture model, has been published previously and hence was not repeated here.  I think this reviewer will appreciate the updated citations and explanations to better illustrate the state of knowledge.  We have also added new experiments that further emphasize the dormant state of spheroid cells in culture and xenografts, as well as patient derived spheroids used in this study.

      Reviewer #1 (Recommendations for Authors):

      (1) It is unclear what spheroid/adherent enrichment ratio is and how it ties into genes affecting cell viability. Why is an ER below 1 the criteria for selecting survival genes?

      Our screen uses the ‘guide only’ comparison in each culture condition to establish a gene score under that specific condition.  A low adherent score captures genes that are essential under standard culture conditions where cells are proliferating and this can include genes needed for proliferation or other basic functions in cell physiology.  A low spheroid score identifies the genes that are most depleted in suspension when cells are growth arrested and this is an indication of cell death in this condition.  Since gene knock outs are first established in adherent proliferating conditions, essential genes under these conditions will already start to become depleted from the population before suspension culture.  By selecting genes with a ratio of <1 we can identify those that are most relevant to dormant suspension culture conditions.  Ultimately, the lowest enrichment ratio scores represent genes whose loss of function is dispensable in the initial adherent condition, but critical for survival in suspension and this is what we aimed to identify. We’ve updated Figure 1B to illustrate this and we’ve updated the explanation of the enrichment ratio on page 6, lines 144 to 147 of the results.
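      The selection logic described above can be sketched as follows. This is a minimal illustration under stated assumptions: that each gene has a positive score per condition, that lower scores mean stronger depletion, and that the enrichment ratio is the spheroid score divided by the adherent score. The function names and toy values are hypothetical and are not the screen's actual scoring code.

```python
def enrichment_ratio(spheroid_score, adherent_score):
    """Spheroid/adherent ratio for one gene (assumes positive scores)."""
    if adherent_score <= 0:
        raise ValueError("adherent_score must be positive")
    return spheroid_score / adherent_score


def select_survival_candidates(scores, threshold=1.0):
    """Return (gene, ratio) pairs with ratio below threshold, lowest ratio first.

    scores: dict mapping gene -> (spheroid_score, adherent_score); hypothetical input.
    """
    ratios = {gene: enrichment_ratio(s, a) for gene, (s, a) in scores.items()}
    hits = {gene: r for gene, r in ratios.items() if r < threshold}
    return sorted(hits.items(), key=lambda item: item[1])


# Toy, made-up scores for placeholder genes.
toy_scores = {
    "gene_A": (0.2, 0.9),  # dispensable when adherent, depleted in suspension
    "gene_B": (0.3, 0.3),  # depleted in both conditions (generally essential)
    "gene_C": (0.9, 0.8),  # not depleted in suspension
}
candidates = select_survival_candidates(toy_scores)
```

      In this toy example only gene_A passes the cut-off. A gene such as gene_B, which is equally depleted in both conditions, has a ratio of 1 and is excluded, matching the stated goal of separating suspension-specific survival genes from generally essential ones.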

      (2) The WB for phospho-p38 in figure 1A for OVCAR8 line does not show increased phosphorylation in the spheroid relative to the adherent. If anything, phospho-p38 appears to be reduced in the spheroid. Can the authors provide a better western blot?

      We’ve updated this blot with a longer exposure, see Figure 1A.  Phosphorylation levels of p38 are essentially unchanged in OVCAR8 cells in suspension culture, although the overall levels of p38 may be slightly reduced in dormant culture conditions.

      (3) How did the authors confirm dormancy apart from western blot for phospho-ERK vs phospho-p38? Authors should add EdU/BrdU staining and/or Ki67 staining to confirm dormancy.

      Previous publications that appear as citations 7, 10, and 33 in the reference list established the growth arrest state of these cells in suspension culture.  This included measuring other known markers of dormancy and quiescence such as p27, p130, and reduced cyclin/cdk activity and 3H-thymidine incorporation. In addition, other associated characteristics of dormancy such as EMT and catabolic metabolism have been demonstrated in these culture conditions (see citation 11 and Rafehi et al. Endocr. Relat. Cancer 23;147-59).  We’ve added these additional citations to our descriptions of dormant spheroid culture to better clarify the status of these cells in our experiments (see page 6, lines 126-28).  To ensure that cells are growth arrested in the experiments shown in this paper, we have updated Figure 1A to include blots of p130 and Ki67 to further emphasize that spheroid cells are not proliferating as the quiescence marker (p130) is high and the proliferative marker (Ki67) is lost in suspension culture.

      (4) Can the authors report spheroid volume over time in culture? How was viability measured?

      We’ve updated the methods (see page 27, line 574) to better highlight the description of cell survival that answers both of these questions. At the ends of experimental time points in both the screen and viability assays we captured live cells by replating on adherent plasticware. We fixed and stained with crystal violet and photographed plates to illustrate the sizes of spheroids (shown in Fig. 2 Supplement 1E, Fig. 6C, and 7D). We subsequently extracted the dye and quantitated it spectrophotometrically to quantitatively compare biomass of viable cells between experiments irrespective of the relatively random shapes of spheroids. We found reattachment and staining in this manner to match traditional viability assays such as CellTiter-Glo in a previous paper (10). Furthermore, biomass never increases in culture and diminishes gradually over time in culture, consistent with the non-proliferative state of these experiments. Cross-checks of this equivalence between viability and reattached biomass measurements, as well as a demonstration that biomass is lost over time, are shown in Fig. 2 Supplement 1E, which compares reattached crystal violet staining measurements with CellTiter-Glo for DYRK1A knock out cells over time in culture. In addition, we include a comparison of crystal violet staining of reattached spheroids with trypan blue dye exclusion in Fig. 5G and H. In both cases reattachment and more direct viability assays lead to the same conclusion that Netrin signaling supports viability in dormant culture.

      (5) Please show survival significance of Netrin signaling genes in recurrence/relapse free survival to claim importance in cancer dormancy.

      See Fig. 7 Supplement 1C where we include the recurrence free survival data. Netrin-1 and -3 high expressors also have a numerically shorter progression free survival but it is not statistically significant. Netrin-1 overexpression alone is also shown and it shows shorter survival with a P-value of 0.0735. Elevated survival of dormant cells in a residual disease state is expected to increase the chance of relapse and shorten this interval. Thus, this data is consistent with our model, but lacks statistical significance.

      There are many alternative ways to interpret what shorter progression free survival, or overall survival, may mean biologically. Since survival of dormant cells is but one of them, we also added new data to experimentally investigate the role of endogenous Netrin signaling in dormant residual disease in Fig. 6 and described on page 12, lines 266-87.  We used xenograft experiments to show OVCAR8 spheroids form and withdraw from the cell cycle equivalently to suspension culture following intraperitoneal injection.  Furthermore, loss of Netrin signaling due to receptor deletions compromises survival during this early window before disseminated lesions form.  This argues that Netrin signaling contributes to survival during this window of dormancy.  In addition, mice engrafted with mutant cells experience prolonged survival when Netrin signaling is blocked.  Together, these experiments further argue that Netrin signaling supports survival in the dormant, non-proliferative phase, and leads to reduced survival of mice.

      (6) The authors show IHC staining of patient ascites derived HGSOC spheroids. However, no marker for dormancy is shown in these spheroids. Adding Ki67 staining or phospho-ERK vs phospho-p38 would be necessary to confirm cancer dormancy.

      We have added new staining for Ki67 and p130 that compares these markers in HGSOC tumors where Ki67 is high and p130 is low with ascites derived spheroids where staining is the opposite. Importantly, expression of p130 is linked to cellular quiescence and is not found to accumulate in the nucleus of cells that are just transiting through G1.  This confirms that the ascites derived spheroids are dormant.  See Fig. 4A-E and described on page 9, lines 201-7.

      (7) Overall, the findings are interesting in the context of cancer dissemination. There is not enough evidence for cancer dormancy and the importance of Netrin signaling in the survival of cancer dormancy. Overexpression of Netrin increases phosphorylation of ERK, leading one to expect an increase in proliferation. This suggests that Netrin breaks cancer cells out of dormancy, into a proliferative state.

      We have found that the discovery of Netrin activation of MEK-ERK in growth arrested cells is counterintuitive to many cancer researchers.  However, this axis exists in other paradigms of Netrin signaling in axon outgrowth that are not proliferation related (see citation 26, Forcet et al. Nature 417; 443-7 as an example).  We have added Fig. 5D and descriptions on page 11, lines 244-52 to better clarify that Netrins CAN’T induce cell proliferation through ERK.  Addition of recombinant Netrin-1 can only induce ERK phosphorylation in suspension culture conditions and not in quiescent adherent conditions.  The small magnitude of ERK phosphorylation induced by Netrin-1 in suspension compared to treating adherent, quiescent cells with the same concentration of mitogenic EGF further emphasizes that this is not a proliferative signal.  Lastly, the new xenograft experiment in Fig. 6A-D (described on page 12, lines 266-81) demonstrates the growth arrested context in which Netrin signaling in dormant spheroids supports viability.

      (8) If authors wish to claim cancer dormancy as the premise of their study, additional confirmatory experiments are required to support their claims. Alternatively, based on the current findings of the study, it would be best to change the premise of the article to Netrin signaling in cancer dissemination and survival of disseminated cancer spheroids rather than cancer dormancy.

      I expect that this reviewer will agree that we have added more than sufficient explanations of background work on HGSOC spheroid dormancy from the literature, as well as new experiments that address their questions about dormancy in our experiments.

      Reviewer #2 (Public Review):

      Summary:

      In this article, the authors employed modified CRISPR screens ["guide-only (GO)-CRISPR"] in the attempt to identify the genes which may mediate cancer cell dormancy in the high grade serous ovarian cancer (HGSOC) spheroid culture models. Using this approach, they observed that abrogation of several of the components of the netrin (e.g., DCC, UNC5Hs) and MAPK pathways compromise the survival of non-proliferative ovarian cancer cells. This strategy was complemented by the RNAseq approach which revealed that a number of the components of the netrin pathway are upregulated in non-proliferative ovarian cancer cells and that their overexpression is lost upon disruption of DYRK1A kinase that has been previously demonstrated to play a major role in survival of these cells. Perampalam et al. then employed a battery of cell biology approaches to support the model whereby the Netrin signaling governs the MEK-ERK axis to support survival of non-proliferative ovarian cancer cells. Moreover, the authors show that overexpression of Netrins 1 and 3 bolsters dissemination of ovarian cancer cells in the xenograft mouse model, while also providing evidence that high levels of the aforementioned factors are associated with poor prognosis of HGSOC patients.

      Strengths:

      Overall it was thought that this study is of potentially broad interest in as much as it provides previously unappreciated insights into the potential molecular underpinnings of cancer cell dormancy, which has been associated with therapy resistance, disease dissemination, and relapse as well as poor prognosis. Notwithstanding the potential limitations of cellular models in mimicking cancer cell dormancy, it was thought that the authors provided sufficient support for their model that netrin signaling drives survival of non-proliferating ovarian cancer cells and their dissemination. Collectively, it was thought that these findings hold a promise to significantly contribute to the understanding of the molecular mechanisms of cancer cell dormancy and in the long term may provide a molecular basis to address this emerging major issue in the clinical practice.

      Thanks for the kind words about the importance of our work in the broader challenges of cancer treatment.

      Weaknesses:

      Several issues were observed regarding methodology and data interpretation. The major concerns were related to the reliability of modelling cancer cell dormancy. To this end, it was relatively hard to appreciate how the employed spheroid model allows to distinguish between dormant and e.g., quiescent or even senescent cells. This was in contrast to solid evidence that netrin signaling stimulates abdominal dissemination of ovarian cancer cells in the mouse xenograft and their survival in organoid culture. Moreover, the role of ERK in mediating the effects of netrin signaling in the context of the survival of non-proliferative ovarian cancer cells was found to be somewhat underdeveloped.

      Experiments previously published in citation 7 show that growth arrest in patient ascites derived spheroids is fully reversible and that argued against non-proliferative spheroids being a form of senescence and moved this work into the dormancy field.  We have added extensive new support for our model systems and data to address the counterintuitive aspects of MEK-ERK signaling in survival instead of proliferation. 

      Reviewer #1 Recommendations for Authors

      (1) A better characterization of the spheroid model may be warranted, including staining for the markers of quiescence and senescence (including combining these markers with staining for the components of the netrin pathway)

      See Figure 1A and page 6, lines 126-36 where we have added blots for Ki67 and p130 to better emphasize the arrested proliferative state of cells in our screening conditions.  We have also added these same controls for patient ascites-derived spheroids in Figure 4 and described on page 9, lines 203-7.  One realization from this CRISPR screen, and others in our lab, is that it identifies functionally important aspects of cell physiology and not necessarily ones that are easily explored using commercially available antibodies.  Netrin-1 and -3 staining of patient derived spheroids in Fig. 4, as well as cell line spheroids stained in Fig. 4 Supplement 1 further support the relevance of this pathway in dormant cancer cells because Netrins are expressed in the right place at the right time.  The Netrin-1 stimulation experiments in Fig. 5C were originally carried out to probe HGSOC cells for functionality of Netrin receptors since we couldn’t reliably detect them by blotting or staining with available antibodies.  This demonstrates that this pathway is active in the various HGSOC cell lines we’ve used and specifically, using OVCAR8 cells, we show it is only active in suspension culture conditions.

      (2) In figure 1A it appears that total p38 levels are reduced in some cell lines in spheroid vs. adherent culture. The authors should comment on this.

      These blots have been updated to be more clear.  Overall p38 levels may be reduced in some cell lines and when compared with activation levels of phosphorylated p38 it suggests the fraction of activated p38 is higher. OVCAR8 cells may be an exception where the overall activity level remains approximately the same.

      (3) The authors should perhaps provide a clearer rationale for choosing to focus on the netrin signaling vs. e.g., GPCR signaling, and consider more explicit defining of "primary" vs. "tertiary" categories in Reactome gene set analysis.

      We’ve updated Fig. 1E and the text on page 7, lines 161-5 to illustrate which gene categories identified in the screen belong to which tiers of Reactome categories. It better visualizes why we have investigated the Axon guidance pathway that includes Netrin because it is a highly specific signaling pathway that scores similarly to the broader and less specific categories at the very top of the list. As an aside, the GPCR signaling and GPCR downstream signaling have proven to be fairly intractable categories.  As best we can tell the GPCR downstream signaling category is full of MAPK family members and likely represents some redundancy with MAPK further down.

      (4) In figure 3A-C, including factors whose expression did not appear to change between adherent and suspension conditions may be warranted as the internal control. Figure 3D-F may benefit from some sort of quantification.

      The mRNA expression levels are normalized to GAPDH as an internal control. We have updated this figure and re-plotted it as fold change relative to adherent culture cells with statistical comparisons to indicate which are significantly upregulated in suspension culture.
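      The normalization described above can be sketched as a short calculation. This is a hedged illustration only: it assumes linear-scale expression values (not raw Ct values), and the function name and numbers are hypothetical rather than taken from the manuscript.

```python
def fold_change(target_susp, gapdh_susp, target_adh, gapdh_adh):
    """Fold change of a target gene in suspension relative to adherent culture.

    Each target value is first normalized to GAPDH measured in the same sample,
    then the suspension ratio is divided by the adherent ratio.
    Inputs are assumed to be positive, linear-scale expression values.
    """
    normalized_susp = target_susp / gapdh_susp
    normalized_adh = target_adh / gapdh_adh
    return normalized_susp / normalized_adh


# Toy, made-up values: the target doubles relative to GAPDH in suspension.
example = fold_change(target_susp=6.0, gapdh_susp=2.0, target_adh=1.5, gapdh_adh=1.0)
```

      With this convention the adherent condition always has a fold change of 1, so values above 1 indicate upregulation in suspension culture.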

      The IHC experiments are now in Fig. 4D-F and show positive staining for Netrin-1 and -3.  Netrin-3 is easiest to see, while Netrin-1 is trickier because the difference with the no primary antibody control isn’t intensity, but the tint of the DAB stain.  We had to counter stain the patient spheroids with Hematoxylin in order for the slide scanner to find the best focal plane and make image registration between sections possible.  This unfortunately makes the Netrin-1 staining rather subtle.  For cell line spheroids in the Fig. 4, Supplement 1 we didn’t need the slide scanner and show negative controls without counter stain that are much more convincing of Netrin-1 detection and reassure us that our staining detects the intended target.  We’ve updated the labels in Fig. 4 and Fig. 4, Supplement 1 for this to be more intuitive.  Unfortunately, relying on the tint of the DAB stain leaves this as a qualitative experiment.

      - In figure 4C-E the authors show that Netrin-1 stimulation induces ERK phosphorylation whereby it is argued that this is a "low-level" stimulation of ERK signaling required for the survival of ovarian cells in the suspension. This is however hard to appreciate, and it was thought that having adherent cells in parallel would be helpful to gauge whether this indeed is a "low level" ERK activity. Moreover, the authors should likely include downstream substrates of ERK (e.g., RSKs) as well as p38 in these experiments. The control experiments for the effects of PD184352 on ERK phosphorylation also appear to be warranted. Finally, performing the experiments with PD184352 in the presence of Netrin-1 stimulation would also be advantageous.

      We have added a new Netrin-1 stimulation experiment in Fig. 4D (described on page 11, lines 244-52) that shows that Netrins can only activate very low levels of ERK phosphorylation in suspension when proliferation is arrested. Netrin-1 stimulation of quiescent adherent cells where stimulation of proliferation is possible shows that Netrins are unable to activate ERK phosphorylation in this condition.  In contrast, we also stimulate quiescent adherent OVCAR8 cells with an equal concentration of EGF (a known mitogen) to offer high level ERK phosphorylation as a side by side comparison.  I think that this offers clear evidence that Netrin signaling is inconsistent with inducing cell proliferation.  We’ve also updated citations in the introduction to include citation 26 that offers a previously reported paradigm of Netrin-ERK signaling in axon outgrowth that is a non-cancer, non-proliferative context to remind readers that Netrins utilize MEK-ERK differently.

      We highlight Netrin-MEK-ERK signaling as key to survival for a number of reasons.  First, Netrin signaling in this paradigm does not fit the dependence receptor paradigm where loss of Netrin receptors protects against cell death.  Fig. 5B rules this out as receptor loss never offers a survival advantage, but clearly receptor deletions compromise survival in suspension culture.  Second, positive Netrin signaling is known to support survival by inactivating phosphorylation of DAPK1.  We’ve added this experiment as Fig. 5 Supplement 1D and show that loss of Netrin receptors doesn’t reduce DAPK1 phosphorylation in a time course of suspension culture.  Consequently, we conclude this isn’t the survival signal either.  Since MEK and ERK family members scored in our screen, we investigated their role in survival.  We now show two different MEK inhibitors with different inhibitory mechanisms to confirm that MEK inhibition induces cell death. In addition to the previous PD184352 inhibitor in our first submission, we’ve added Trametinib as well and this is shown in Fig. 5G.  Since it is surprising that MEK inhibition can kill instead of just arrest proliferation, we’ve also added another cell death assay in which we show trypan blue dye exclusion as a second look at survival.  This is now Fig. 5H.  Lastly, we include Trametinib inhibition of ERK phosphorylation in these assays in Fig. 5I.  While we leave open what takes place downstream of ERK, our model in Fig. 5J offers a very detailed look at the components upstream.

      - Does inhibition of ERK prevent the abdominal spread of ovarian cancer cells? The authors may feel that this is out of the scope of the study, which I would agree with, but then the claims regarding ERK being the major mediator of the effects of netrin signaling should be perhaps slightly toned down.

      We agree that loss of function xenograft experiments will enhance our discovery of Netrin’s role in dormancy and metastasis.  We have added a new Fig. 6 that uses xenografts with Netrin receptor deficient OVCAR8 cells (UNC5 4KO).  It demonstrates that two weeks following IP engraftment we can isolate spheroids from abdominal washes and that cells have entered a state of reduced proliferation as determined by lowered Ki67 expression as well as other proliferation inducing genes.  In the case of UNC5 4KO cells, there is significant attrition of these cells as determined by recovering spheroids in adherent culture (Fig. 6C) and by Alu PCR to detect human cells in abdominal washes (Fig. 6D).  Lastly, xenografts of UNC5 4KO cells cause much less aggressive disease and significantly extend survival of these mice (Fig. 6E,F).  This is not exactly the experiment that the reviewer is asking for, but it is a clear indication that Netrin signaling supports survival in a xenograft model of dormancy.

      - Notwithstanding that this could be deduced from figures 6D and F, it would be helpful if the number of mice used in each experimental group is clearly annotated in the corresponding figure legends. Moreover, indicating the precise statistical tests that were used in the figures would be helpful (e.g., specifying whether anova is one-way, two-way, or?)

      We have added labels to what is now Fig. 8B to indicate the number of animals used for each genotype of cells.  We have also updated figure legends to include more details of statistical tests used in each instance.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Roy et al. used the previously published deep transfer learning tool, DEGAS, to map disease associations onto single-cell RNA-seq data from bulk expression data. The authors performed independent runs of DEGAS using T2D or obesity status and identified distinct β-cell subpopulations. β-cells with high obese-DEGAS scores contained two subpopulations derived largely from either non-diabetic or T2D donors. Finally, immunostaining using human pancreas sections from healthy and T2D donors validated the heterogeneous expression and depletion of DLK1 in T2D islets.

      Strengths:

      (1) This is a meta-analysis of previously published scRNA-seq data using a deep transfer learning tool.

      (2) Identification of novel beta cell subclusters.

      (3) Identified a relatively innovative role of DLK1 in T2D disease progression.

      We thank the reviewer for their constructive critiques and positive feedback. We hope to further apply deep transfer learning tools in future scRNA-seq meta-analyses.

      Weaknesses:

      (1) There is little overlap between the DE list of the bulk RNA-seq analysis in Figures 1D and 1E and the DE list of the pseudo-bulk RNA-seq analysis of all cells in Figure S2C.

      We thank the reviewer for this insightful thought and plan to perform additional analyses and comparisons to address this comment.

      (2) The biological meaning of "beta cells had the lowest scores compared to other cell types" is not clear.

      We agree with the reviewer and will amend this statement to clarify in the revised manuscript. In summary, the relatively lower T2D-DEGAS scores for beta cells overall compared to all other cell types (alpha cells, acinar cells, etc) reflects the fact that in T2D, beta cell-specific genes can be downregulated. This is also possibly due to beta cell loss in T2D and would be reflected in bulk islet RNAseq data. This affects the DEGAS model which is reflected in the scores of all cells in the scRNA-seq data (Fig 3A). For this reason, subsetting the beta cells and replotting them on their own (Fig 4B) is an important step to identify relative differences in DEGAS scores between different subsets of beta cells.

      (3) The figures and supplemental figures were not cited following the sequence, which makes the manuscript very difficult to read. Some supplemental figures, such as Figures S1C-S1D, S2B-S2E, S3A-S3B, were not cited or mentioned in the text.

      We apologize and thank the reviewer for pointing out these errors. All of the annotated errors will be amended in the revised manuscript.

      (4) In Figure 7, the current resolution is too low to determine the localization of DLK1.

      We will include the original highest-resolution confocal images in our resubmission. We will also improve the color combination to improve visibility of colocalization of DLK1 with Insulin.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Gitanjali Roy et al. applies deep transfer learning (DEGAS) to assign patient-level disease attributes (metadata) to single cells of T2D and non-diabetic patients, including obese patients. This led to the identification of a singular cluster of T2D-associated β-cells; and two subpopulations of obese- β-cells derived from either non-diabetic or T2D donors. The objective was to identify novel and established genes implicated in T2D and obesity. Their final goal is to validate their findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors.

      Strengths:

      This paper is well-written, and the findings are relevant for β-cell heterogeneity in T2D and obesity.

      We thank the reviewer for their constructive critiques and positive feedback. We believe this study can improve our understanding of β-cell heterogeneity in the context of T2D and obesity.

      Weaknesses:

      The validation they provide is not sufficiently strong: no DLK1 immunohistochemistry is shown of obese patient-derived sections. Additional presumptive relevant candidates from this transcriptomic analysis should be screened for, at the protein level.

      We thank the reviewer for this suggestion. We are planning to perform new immunostaining of DLK1 in human pancreas tissue sections from non-diabetic lean, non-diabetic obese, T2D lean, and T2D obese donors. We also note that Table S6 contains the patient metadata for the pancreas samples we show in the current manuscript. Two of the T2D donors have BMI > 30 (obese). However, the non-diabetic donors have BMI between 26 and 29. Our new planned studies should address the question of differential DLK1 expression / beta cell heterogeneity in the context of both diabetes and obesity.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1 (Public Review):

      He et al. investigate the requirement and function of Blimp1 (encoded by Prdm1) in murine NK cells and ILC1. Employing a conditional knockout mouse model (Prdm1flox x Ncr1cre), the authors describe impaired abundance and maturation of Prdm1-deficient NK cells and ILC1 in different tissues. Blimp1-deficient NK cells have reduced expression of cytotoxic molecules (Gzmb, Prf1) and, in some instances, Ifng production, and Prdm1flox x Ncr1cre mice show impaired tumor control in experimental metastasis models. Using single-cell RNA sequencing analysis, the authors propose that Prdm1 regulates JunB expression and NK cell maturation. Based on in silico analyses, the authors suggest manifold intercellular communication between NK/ILC1 and macrophages. Without following up on any of these potentially interesting suggestions, the authors conclude their study reiterating that Prdm1 regulates IFNg-production of tumor-infiltrating NK cells and ILC1. Many of the reported functions of Blimp1 in NK cells have previously been identified using a mixed-chimera strategy comparing Prdm1 WT and KO NK cells (Kallies et al., Blood 2011). Here, the authors expand on these findings using a conditional model to delete Prdm1 in NK/ILC1 and single-cell sequencing and provide a more refined analysis of the functions of Blimp1 in these cells. Cell-chat analysis suggests close interactions of Blimp-dependent NK/ILC1 subsets with hepatic macrophages, but these suggestions are not followed up by experiments. Potentially interesting differences in the macrophage compartment of Ncr1-Cre x Prdm1-fl/fl mice are suggested by the scRNA-Seq data but are not validated e.g. by FACS. The study falls short in providing new mechanistic insights. Nevertheless, it is an interesting confirmation of "old" suggestions in a more refined setting, and the provided single-cell mRNA-Seq data represents a potentially valuable resource for the community. 
      There are some control analyses that are required to support the conclusions of the authors, and I have a few suggestions that would help to improve the manuscript.

      We sincerely appreciate your careful review and insightful feedback on our manuscript. We have carefully considered your comments and present the results of new experiments conducted in response to your suggestions. Please find the detailed responses below.

      Major comments

      Comment 1: The authors do not control for the potential effects of Cre expression. Expression of Cre from within the Ncr1 locus (using the mouse model established by Narni-Mancinelli et al.) has significant effects on NK cells and especially ILC1s (reducing their frequency and absolute numbers and altering their functionality). The authors should characterize the Ncr1cre mice used here (developed by Shanghai Model Organism Center) in this regard and should use proper controls (Ncr1Cre+ Prdm1wt/wt as control for Ncr1Cre+ Prdm1fl/fl, instead of WT littermates) for all of their key data, e.g. those depicted in Fig 1FG, 2ADFH, 7D, S2,3,4.

      Response 1: This is a very insightful question that has posed a challenge for many researchers engaged in conditional knockout studies, including us. Both the expression of Cre and the insertion of loxP sequences have the potential to influence gene expression, because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we were also concerned about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous study of Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, the experiment also allows a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that the influence of Cre on NK cells may be within a reasonable and controllable range. Furthermore, compared with the decrease in Ncr1 expression caused by Cre, the reduction in the expression of genes targeted by loxP-mediated knockout, such as Prdm1 in this study (Figure 1E), is more significant. Therefore, with the current techniques and research methods, we believe that the data provided in this study can support the role of Prdm1 in NK cells.

      Comment 2: Several of the phenotypic findings on NK cells have been described before by Kallies et al. in 2011 (Ref 29), although using a different genetic Prdm1-ablation model (Prdm1-GFP/GFP knockin/knockout model). This study reported impaired NK cell maturation, reduced Gzmb expression, impaired in vivo cytotoxicity against subcutaneous RMA-S cells, impaired in vitro proliferation, comparable in vitro killing, increase in BM NK cell numbers. The authors should discuss/mention this more prominently in their manuscript, and highlight where they confirm or refine these previous findings, and where they actually provide new information.

      Response 2: We appreciate your valuable suggestions. The article you referred to, published in Blood, is indeed an excellent work. While we had cited this article, our discussion regarding its specific content was limited. Based on your advice, we have made revisions and included the following content in our discussion section (page 24; line 489-493):

      “In a study involving systemic knockout combined with competitive transplantation, it was found that Prdm1 promotes NK cell maturation and the expression of Gzmb. On the contrary, the same study also found that NK cells with Prdm1 deficiency exhibit heightened proliferation, increased survival, enhanced migratory abilities towards tumors, and greater cytotoxicity against subcutaneously implanted RMAS tumors (31).”.

      Comment 3: What is the reason to refer to the enriched cluster in Blimp1-deficient NK cells as "Junbhi"? There is no follow-up for a function of Junb, and there are many other genes upregulated in these cells. Most critically, these cells seem to represent exactly the c-Kithi cells that Kallies et al. already showed and discussed in their paper. The authors should stain for Kit, and also refer to this. Also, MacKay et al. performed Blimp1-Chip-Seq (in T cells), maybe it would be interesting to check to which of the identified DEGs Blimp1 can bind.

      Response 3: We appreciate the reviewer's suggestion. A gene that supports lymphocyte development does not necessarily regulate lymphocyte function positively. For example, JunB is essential for T cell development but can also induce T cell exhaustion (Lynn et al., Nature, 2019). Therefore, while Prdm1 has been shown to promote NK cell development, it cannot be assumed that it always positively regulates NK cell function, especially in anti-cancer immune surveillance. In this respect, we sought a driving factor behind the impaired anti-tumor ability of Prdm1ΔNcr1 NK cells. Although many other genes are upregulated in this cluster (e.g. Kit), JunB interested us more because of its potential to regulate NK cell function in cancer, whereas c-Kit is more likely a marker of NK cell maturation, as has been well demonstrated by Kallies et al. and other studies. Our previous study also showed that c-Kit expression is lower in mature NK cells than in immature NK cells (Wang et al., J Clin Invest, 2018).

      We did not perform follow-up experiments on Junb because we could not find suitable surface markers with which to investigate the function of the Junbhi cNK cluster. Using intracellular markers would amount to an analysis of gene expression patterns, which our RNA-seq data already describe. As noted above, our study did not aim to further investigate the role of Prdm1 in NK cell maturation, as c-Kit expression was upregulated in Prdm1-knockout NK cells and correlated with NK cell maturation, which has been validated by Kallies et al.

      We also have discussed the potential DEGs that could be bound and regulated by Prdm1 in our revised manuscript (page 27-28; line 561-571):

      “Prdm1 and Hobit directly bound and repressed Tcf7 (18), which encoded TCF-1, a TF binding and limiting the activity of Gzmb regulatory element (69). Gzmb has been demonstrated directly bound and activated by Junb in NK cells, which suggested Gzmb expression regulated by multiple Prdm1/Hobit downstream signals (26). In human T cells, binding motif of JUNB was enriched in the binding sites of PRDM1 (70), indicating the essential role of PRDM1-JUNB axis during NK cell and T cell development. In NK cells deficient in Prdm1 expression, we noted a decrease in Gzmb levels alongside with an elevation in Junb expression. This indicates that Prdm1 not only facilitates the expression of Gzmb in NK cells but also suppresses Junb expression. Given that Junb is recognized as a positive regulator of Gzmb (71), this presents a complex interplay that seems contradictory. Therefore, it is imperative to develop a theoretical framework to comprehensively understand and interpret this paradoxical relationship.”.

      Comment 4: cNK cells are considered circulating cells, that transiently pass through the liver. Previous studies have suggested almost identical gene expression patterns in hepatic and splenic NK cells. In functional tests, they often "perform" identically. I am therefore a bit surprised that the authors find a differential dependency of Blimp1 for the IFNg production of splenic (no role of Blimp1) versus hepatic (Blimp1 regulating IFNg production) NK cells (Fig S3). Do the authors have any suggestions on that? The analyses are performed by 12+4h stimulations with IL12/18, which could involve the effects of altered bystander cells (as suggested by Figure 6). Therefore, these analyses should be provided upon standard 4h stimulations with IL12/18 and also with PMA/I under BFA. Note: liver and splenic cNK cells look quite different in the chosen histograms in Figures 7 A, B, C, yet there is massive variability in these analyses - is there any systematic/technical problem?

      Response 4: We appreciate the valuable suggestion from the reviewer. Studies have suggested that, at the gene expression or transcriptomic level, liver NK cells are more similar to splenic NK cells than to liver ILC1s. However, we do not think that splenic NK cells or peripheral blood NK cells (which are more abundant in circulation) are entirely indistinguishable from liver NK cells. Notably, there are substantial differences in their maturity levels, with liver NK cells being more mature. Since we are examining protein levels, a 4-hour stimulation period may not fully capture these distinctions. Even when considering the potential impact of bystander cells, the experimental design specifically targets Prdm1 knockout within NK cells, ensuring that the study accurately elucidates the role of Prdm1 in NK cells. We included controls in each experiment, and any variance observed in the figures may be attributed to individual variation among the animals. It is also possible that MFI values measured by flow cytometry vary more than percentage values.

      Comment 5: Figure 4 H/I - In contrast to NK cells in Fig 4E, F, the KO and WT ILC1s seem to co-cluster largely. Authors should validate differentially expressed genes. How strong is the effect of Blimp1 in ILC1s? Or is Blimp1 a critical TF driving effector differentiation in NK cells, while it has only subtle effects in ILC1 (these may be regulated by Hobit?)? This seems an interesting finding that should at least be discussed. For these types of small differences in ILC1, FACS confirmation analyses should be performed and findings be reevaluated using Cre-expressing controls (see above).

      Response 5: We appreciate the suggestion from the reviewer. As requested, we analyzed the DEGs in liver cNK cells and ILC1s from our scRNA-seq data (revised Supplemental Figure 8, A and B). There were only a few notable DEGs in ILC1s compared with cNK cells. It is likely that Prdm1 has a more essential effect on the transcriptional program of cNK cells, while it plays a more important role in maintaining the homeostasis of the ILC1 population. We have discussed these points to better inform readers (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s required Hobit, but not necessary for cNK cells (6). Expression of Prdm1 was remarkably higher in cNK cells versus ILC1s (18). While in our study, cNK cells and liver ILC1s reduced simultaneously in Prdm1ΔNcr1 mice, and even more significant in ILC1s. This indicates that while Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may have parallel program in instructing ILC1s functional development and maturation.”. 

      We could not find a suitable surface marker to evaluate the changes in ILC1s, as most of the changes involve intracellular markers.

      Comment 6: The authors describe and discuss some of Figure 1 and 2 data as if Blimp1 would be involved in alternative NK versus ILC1 fates, but there is no evidence for this.

      Response 6: There is no evidence that Prdm1 alters the fate decision of progenitors towards liver cNK cells or ILC1s. Although some studies have reported conversion between cNK cells and ILC1s in special contexts, it is widely accepted that liver cNK cells and ILC1s originate from different progenitors. While we observed changes in the proportions of liver cNK cells and ILC1s in Prdm1 KO mice, we still lack sufficient evidence to support the relative independence of NK and ILC1 development, as well as evidence to indicate that Prdm1 is exclusively responsible for NK and ILC1.

      Regarding the changes in NK and ILC1 proportions after Prdm1 KO, we believe that both NK and ILC1 cells require Prdm1 to maintain their populations, with ILC1 possibly requiring it to a greater extent. This is the reason for the altered balance between NK and ILC1 cells following Prdm1 KO. We wish to clarify this point to prevent any misconceptions among readers. To address this, we have added the following content to the discussion section (page 25; line 509-516):

      “Furthermore, although both liver NK cells and liver ILC1s require Prdm1 to maintain their quantity, liver ILC1s demonstrate a more pronounced dependency on Prdm1. However, it is currently widely believed that liver NK cells and liver ILC1s originate from different progenitors. It is worth noting that while we observed changes in the NK and ILC1 proportions after Prdm1 knockout, our data does not support the hypothesis that Prdm1 affects progenitor differentiation decisions, thereby influencing the fate selection of NK and ILC1. Further research may be needed to elucidate how Prdm1 regulates the balance between NK cells and ILC1s.”.

      Comment 7: There are several recent studies suggesting a role for Hobit, homologue of Blimp1, in NK cells and in ILC1, and in the control of liver metastases. The authors should discuss similar and unique functions of Hobit and Blimp1, also in the regulation of gene expression patterns, and should refer to these studies.

      Response 7: We would like to express our gratitude to the reviewer for your insightful comments, which bring forth a critical perspective. In accordance with the reviewer's suggestion, we have updated our discussion to include the diverse functions guided by Hobit and Prdm1 in regulating the development and function of cNK cells and ILC1s (page 27; line 554-561):

      “Previous studies have identified Hobit and Prdm1 as central regulators instructing tissue-dependent programs and retention of diverse tissue-resident lymphocytes (18, 51, 53). Liver ILC1s required Hobit, but not necessary for cNK cells (6). Expression of Prdm1 was remarkably higher in cNK cells versus ILC1s (18). While in our study, cNK cells and liver ILC1s reduced simultaneously in Prdm1ΔNcr1 mice, and even more significant in ILC1s. This indicates that while Prdm1 is expressed at lower levels in ILC1s, its role in preserving the quantity of ILC1s may be more crucial. Thus, Prdm1 and Hobit may have parallel program in instructing ILC1s functional development and maturation.”.

      As shown in Supplemental Figure 8, we analyzed two published scRNA-seq datasets generated from Hobit KO mice and integrated the DEGs in cNK cells and ILC1s with our data. We observed overlapping DEGs between Prdm1ΔNcr1 and Hobit KO in cNK cells and ILC1s, such as Junb, Tcf7, Gzmb, and Prf1 (Supplemental Figure 8), indicating similar regulatory networks of Prdm1 and Hobit. These data are now described on page 19; lines 386-395:

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) with two published scRNA-seq data (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profile of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H), such as Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 was simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and Hobit KO liver cNK cells or ILC1s, indicating the similar regulatory networks of Prdm1 and Hobit.”.

      Comment 8: Figure 4: The authors should discuss (and cross-validate) their liver gene expression analyses in the context of published datasets of NK and ILC1, such as the ones by Lopez et al, Friedrich et al, Ducimetiere et al and Yomogida et al.

      Response 8: We thank the reviewer for raising this important point. To address this question, we analyzed the gene expression of liver cNK cells and ILC1s in two of the published datasets mentioned above, also in the context of Hobit knockout. We compared gene expression across the different clusters and describe the results in our revised manuscript (page 19; lines 386-395):

      “We also compared the gene expression patterns between Prdm1 and Hobit (homologue of Blimp1) with two published scRNA-seq data (51, 53). Following the knockout of Hobit, the DEGs were primarily identified within ILC1s. Conversely, after the knockout of Prdm1, a greater number of DEGs were observed in cNK cells. This indicates that Prdm1 likely possesses a broader range of target genes within cNK cells, whereas Hobit appears to have a more pronounced impact on gene expression within ILC1s (Supplemental Figure 8, C-F). There are some overlaps between the downstream transcriptional profile of Prdm1 and Hobit in liver cNK cells and ILC1s (Supplemental Figure 8, G and H), such as Junb, Fosb, Tcf7, Kit, Gzmb, Prf1, and Cxcr6 was simultaneously upregulated or downregulated in both Prdm1ΔNcr1 and Hobit KO liver cNK cells or ILC1s, indicating the similar regulatory networks of Prdm1 and Hobit.”.

      Recommendations For The Authors:

      Comment 9: The use of a paired t-test analysis when comparing cells/groups from different mice is not correct. Instead, the authors should consider using e.g. an unpaired t-test and re-test the indicated significance (e.g. Figure 1F, Figure 2H).

      Response 9: We thank the reviewer for this comment. Because we used littermates and compared them side by side, we consider the paired t-test acceptable. We nevertheless reanalyzed the significance of the results in Figure 1F and Figure 2H using an unpaired t-test. For Figure 1F, the statistical significance with the unpaired t-test was the same as with the paired t-test. In Figure 2H, however, the reduction in IFN-γ production did not reach statistical significance with the unpaired t-test (Supplemental Figure 12B). This may be attributable to variation between different littermate pairs, but the trend remains consistent with our conclusion. We believe that a paired t-test between littermates is also meaningful. We therefore kept both statistical methodologies to ensure a thorough evaluation.
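
To illustrate why the paired and unpaired analyses discussed in Response 9 can disagree, the following minimal sketch computes both t statistics on the same numbers. The values are hypothetical and chosen only for illustration (they are not data from the study), the function names are ours, and Welch's test is assumed as the unpaired variant, since the response does not state which one was used.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a[i], b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the within-pair differences
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


def welch_t(a, b):
    """Unpaired (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical % IFN-gamma+ values; index i is one littermate pair (illustrative only).
control = [30.0, 42.0, 55.0, 38.0]
knockout = [26.0, 37.0, 51.0, 33.0]

t_paired, df_paired = paired_t(control, knockout)
t_welch, df_welch = welch_t(control, knockout)
print(f"paired:   t = {t_paired:.2f}, df = {df_paired}")
print(f"unpaired: t = {t_welch:.2f}, df = {df_welch:.1f}")
```

In this toy example the within-pair differences are small and consistent while the pairs themselves differ widely, so the paired statistic is large and the unpaired statistic is small. This is the pattern one would expect if between-litter variation dominates, which is also why the choice of test depends on whether the pairing is justified by the experimental design.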

      Comment 10: In several instances, it is unclear whether data are pooled or representative (and if so, of how many analyses). This information needs to be provided for all analyses. 

      Response 10: We apologize for the lack of detail and have now provided sufficient information in our figure legends.

      For example, we deleted the numbers in the original histograms to avoid confusion about whether the data are pooled or representative (e.g. original Figure 7, A-C; revised Figure 7, A-C). Furthermore, we added “representative” to the figure legends of all flow cytometric plots to better inform readers (e.g. original Figure 2, D and F; revised Figure 2, B and D).

      Comment 11: In the title and abstract authors use "type 1 ILCs" for both NK cells and ILC1, and it is difficult to understand which phenotypes correspond to cNK cells versus ILC1. Most of the analyses clearly separate these two different cell types. I would appreciate a lot being more accurate in the abstract, and describing cNK and ILC1 phenotypes in a clear way.

      Response 11: We apologize for our inaccurate descriptions. Following Spits et al. (Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “conventional NK cells” are abbreviated as “cNK cells”, “type 1 innate lymphoid cells” as “ILC1s”, and “group 1 ILCs” is used as the collective name for cNK cells and ILC1s.

      The definition of these cells is described in the introduction (page 4, line 52-53; line 58-62):

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effect molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, as shown in page 13; line 259-261: 

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s had high levels expression of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 12: In the abstract authors state "The present study unveiled a novel regulatory mechanism of Prdm1 in liver Type 1 ILCs, showing promising potential for developing innovative immune therapy strategies against liver cancer." - maybe authors should discuss how their findings could be used for therapeutic approaches?

      Response 12: We appreciate the reviewer's comments. There has been no clear consensus on the role of Prdm1 in NK cells, and some studies have suggested that Prdm1 can inhibit cytokine secretion by NK cells. In particular, Kallies et al. (Blood, 2011) found that Prdm1 might suppress NK cell anti-tumor activity. Hence, no immunotherapy has targeted Prdm1 in NK cells for cancer treatment. Our research demonstrates that Prdm1 enhances NK cell anti-tumor activity, providing theoretical support for NK cell therapy targeting Prdm1.

      We added the following content to the discussion section (page 29; line 605-609): 

      “Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving liver cNK cells and ILC1s functional heterogeneity, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Comment 13: The authors should explain or interpret their data a bit more (e.g. what is the consequence of GSEA enriched in negative regulation of Il6 production (Fig. 3D)? Do NK cells produce Il6 (Figure 3)? What's the impact of Il17 signaling in NK/ILC1 (Figure 5)? Do the authors suggest JunB-driven metabolic reprogramming (Suppl. Fig 6D-F)?).

      Response 13: We appreciate the reviewer's comments. The question of IL-6 production in NK cells was also raised by another reviewer. We checked the GSEA results and found no notable genes related to IL-6 production in NK cells. Following the suggestion of the other reviewer (Response to Reviewer 2, Comment 14), it may be prudent to omit this figure.

      IL-17 signaling points to the plasticity of ILC1s, which may originate from the differentiation of ILC3s. We added more discussion of this point (page 17; line 341-344):

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17 mediated immunity response and ILC3 (1, 46), it is plausible that Il7rhi ILC1 cluster may be attributed, at least in part, to potential plasticity between ILC1 and ILC3 subsets.”.

      The decreased mitochondrial function may have more relevance to NK cell exhaustion in tumors. Our data suggest that the elevated expression of JunB in NK cells may predispose them to exhaustion. Currently, our hypothesis regarding the promotion of NK cell exhaustion by high JunB expression is based on the observed correlation between JunB expression levels and exhaustion phenotypes (at the gene expression and IFN-γ secretion levels) and the findings in reference 67 (Lynn et al., Nature, 2019), where JunB was found to promote T cell exhaustion. However, we have not demonstrated causation between high JunB expression and exhaustion in NK cells. We propose that in NK cells, especially mature NK cells, excessive JunB expression may make them more sensitive to exhaustion inducers. Nevertheless, further research is needed to confirm this. To clarify this, we added the following content in the discussion section (page 26; line 537-543): 

      “While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype.

      The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 14: Ref 25 and Ref 57 are the same publication?

      Response 14: We apologize for this careless mistake. We have checked all the references and corrected the formatting errors.

      Comment 15: Figure 1, E - The method description of RT-PCR is missing. I apologize if I have overlooked this information.

      Response 15: We have now added the description of RT-PCR in our revised method section (page 31; line 638-644):

      “RNA was extracted from FACS-sorted NK cells or splenocytes using RNASimple Total RNA Kit (TIANGEN Biotech, 4992858) and subsequently reverse transcribed to cDNA with SuperScript VILO Master Mix (Thermo Fisher Scientific, 11755050) according to manufacturer’s instructions. qPCR was performed with SYBR Green Mix (Thermo Fisher Scientific, A25742) and CFX Opus 96 Real-Time PCR System (Bio-Rad). The relative mRNA expression level was calculated using 2^(-ddCt) method. Primer sequences: Prdm1: 5’-CAGAAACACTACTTGGTACA-3’; 5’-GATTGCTTGTGCTGCTAA-3’.”
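
The relative-expression calculation named in the methods text above can be sketched as follows. This is a minimal illustration of the standard ddCt arithmetic only: the Ct values are hypothetical, the function and variable names are ours, and the reference (housekeeping) gene is a placeholder because the quoted methods do not name one.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control (2^-ddCt)."""
    # dCt: target Ct normalized to the reference gene within each condition.
    dct_sample = ct_target_sample - ct_ref_sample
    dct_control = ct_target_control - ct_ref_control
    # ddCt: normalized sample value relative to the normalized control value.
    ddct = dct_sample - dct_control
    return 2.0 ** (-ddct)


# Hypothetical Ct values (not from the study): the target gene amplifies three
# cycles later in the knockout sample, with an unchanged reference gene.
fold_change = relative_expression(
    ct_target_sample=27.0, ct_ref_sample=18.0,    # knockout sample
    ct_target_control=24.0, ct_ref_control=18.0,  # control sample
)
print(fold_change)
```

With these illustrative inputs, ddCt is 3, so the target transcript is estimated at one eighth of the control level. The calculation assumes roughly 100% amplification efficiency for both primer pairs, which is the usual assumption behind this method.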

      Comment 16: Figure 1, F - The NKp46+CD3- gate for the liver seems to cut the population, not all cells are included.

      Response 16: We appreciate the reviewer’s comment and apologize for our carelessness. We expanded our data with more samples and reanalyzed them with a more convincing gating strategy. We have now updated our figures (revised Figure 1G; revised Supplemental Figure 2A). Several changes have occurred in the data and conclusions, and we have revised the corresponding content in our manuscript accordingly.

      The original text is:

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage of cNK cells (CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues except bone marrow and lymph nodes (Figure 1F; Supplemental Figure 2A). However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice. The absolute number of cNK cells in blood, lung, liver, and spleen also decreased in Prdm1ΔNcr1 mice (Figure 1F; Supplemental Figure 2A). Only a slight decrease in the number of cNK cells was observed in the lymph nodes of Prdm1ΔNcr1 mice, which did not reach statistical significance either (Supplemental Figure 2A). In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; line 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues, whereas increased number of NK cells were observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 17: Figure 1, The y-axis labeling of lung CD3-NKp46+ cells (x10^3) is not correct.

      Response 17: We apologize for our carelessness. We have now checked the labels and made sure they are correct.

      Comment 18: Figure 1, The statistical significance of absolute numbers of NKp46+ cells in the bone marrow should be reviewed.

      Response 18: We expanded our data with more samples and reanalyzed them with a more convincing gating strategy. In our updated data, we observed a significant increase in the number of bone marrow NK cells. These changes are now described in our revised manuscript.

      The original text is: 

      “However, no significant difference was observed in the percentage of cNK cells among bone marrow-derived lymphocytes between Prdm1ΔNcr1 and Prdm1+/+ mice”, “In contrast, the absolute number of cNK cells in Prdm1fl/fl mice bone marrow is moderately higher than Prdm1ΔNcr1 mice (Figure 1F).”

      The revised text is (page 8; line 142-146):

      “Proportion and absolute number of cNK cells in blood, bone marrow, lung, liver, spleen, and lymph nodes were analyzed by flow cytometry. Compared with Prdm1+/+ mice, the percentage and absolute number of NK cells (CD45+CD3-NK1.1+NKp46+) among lymphocytes was decreased in all of these tissues, whereas increased number of NK cells were observed in bone marrow (Figure 1G; Supplemental Figure 2A).”

      Comment 19: Figure 1, G - CD27 and CD11b are used to define maturation stages within NK cells. Here the authors are analyzing group 1 ILC instead (containing both NK cells and ILC1, especially in the liver). It would be better to pre-gate on Eomes+ or CD49b+ NK cells for this analysis.

      Response 19: We apologize for the lack of detail in this analysis. We pre-gated on CD49b+ NK cells for the analysis of maturation stages. We have now added this statement to our revised manuscript and figure legend (page 8; line 149-151):

      “The maturation of cNK cells (gated by CD45+CD3-NK1.1+NKp46+CD49b+) from blood, bone marrow, lung, liver, spleen, and lymph nodes were assessed, based on the expression of CD11b and CD27.”.

      Comment 20: Supplementary Figure 1, A - The NKp46+CD3- gate seems to cut the population, not all cells are included. y-axis labeling of spleen CD3-NKp46+ cells (%) is not correct.

      Response 20: Thank you. We have corrected these errors, as shown in our revised Supplementary Figure 2A.

      Comment 21: Figure 2, D-G - Did the authors analyse the ILC1/NK compartment of the tumor? What is the abundance and phenotype of these cells dependent on Prdm1 expression? Proper Cre-controls should be used (see above).

      Response 21: We appreciate the suggestions from the reviewer. As requested, we have now added an analysis of the cNK/ILC1 populations in the tumor context. The changes in the proportions of cNK cells and ILC1s in Prdm1ΔNcr1 mice were similar to those in the tumor-free condition, while the numbers of both cNK cells and ILC1s decreased in tumor-bearing liver (revised Figure 7D). This content has been added to our revised manuscript (page 23; line 479-481):

      “The proportion changes of cNK cells and ILC1s in Prdm1ΔNcr1 mice was similar with the no tumor-burden condition, while the number of both cNK cells and ILC1s have significant decreased in tumor-bearing liver (Figure 7D).”.

      The reason why we did not use Cre-controls is described in Response 1.

      Comment 22: Figure 2, H - Prdm1-deficient NK and ILC1 produce less Ifng in response to in vitro stimulations with Il-12 and /or Il-18, and bulk Seq analysis (Fig 3F) shows reduced Il12rb2 expression. Does the expression of cytokine receptors correlate with the maturation of NK cells? This could be analyzed from the single-cell RNA-seq dataset. The statistical significance of %Ifng after Il12/Il18 stimulation should be revisited (see above).

      Response 22: We thank the reviewer for the suggestions. To address this question, we explored the expression of IL-12 and IL-18 receptors in cNK and ILC1 clusters. Within cNK clusters, Il12rb2, Il18r1 and Il18rap was highly expressed in Prf1hi and Cxcr3hi cNK clusters (revised Supplemental Figure 6H), indicating the IL-18 receptor expression correlated with the NK cell maturation. While in ILC1, these receptors mostly expressed on Il7r_hi and _Gzmb_hi ILC1 clusters (revised Supplemental Figure 7C). Significant decreased of _Il18r1 expression in Prdm1_Δ_Ncr1 cNK cells and ILC1s may associated with the impaired ability to produce IFN-γ. We now added this analysis (page 18; line 364-368):

      “Within cNK cells, Il12rb2, Il18r1 and Il18rap was highly expressed in Prf1hi and Cxcr3hi cNK clusters (Supplemental Figure 6I), indicating the IL-18 receptor expression correlated with the NK cell maturation. While in ILC1, these receptors mostly expressed on Il7rhi and Gzmbhi ILC1 clusters (Supplemental Figure 7D). Significant decreased of Il18r1 expression in Prdm1ΔNcr1 cNK cells and ILC1s may associated with the impaired ability to produce IFN-γ.”.

      The results of the unpaired t-test on IFN-γ production are shown in revised Supplemental Figure 12B. With an unpaired t-test, the difference in IFN-γ production in the original Figure 2H was not significant. However, significance was observed with an unpaired t-test in tumor-bearing liver cNK cells and ILC1s, specifically under IL-12/IL-18 stimulation, as shown in the original Figure 7E. These variations may be attributed to differences among littermates. Despite these variations, the trend remains consistent with our overall conclusions. We believe that a paired t-test between littermates is also meaningful, so we have kept both statistical approaches to ensure a thorough evaluation.
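      To illustrate why the two tests can disagree, the following minimal sketch compares the unpaired (pooled-variance) and paired t statistics on made-up values. The numbers are hypothetical and are not data from the study; the function names are ours. When litters differ strongly from one another but the within-litter difference is consistent, the paired statistic is much larger than the unpaired one.

```python
import math
import statistics

def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

def paired_t(a, b):
    """Paired t statistic computed on within-pair (within-litter) differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical %IFN-γ+ values; index i is one littermate pair (NOT study data).
wt = [62.0, 45.0, 71.0, 38.0, 55.0]
ko = [54.0, 39.0, 62.0, 31.0, 47.0]

t_unpaired = unpaired_t(wt, ko)  # between-litter spread inflates the denominator
t_paired = paired_t(wt, ko)      # only the within-litter differences matter

print(f"unpaired t = {t_unpaired:.2f}, paired t = {t_paired:.2f}")
```

      In this toy example the mean difference is identical for both tests; only the estimate of the noise changes, which is why reporting both analyses can be informative.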

      Comment 23: Figure 3, A-E - For bulk sequencing analysis, splenic CD3-NK1.1+NKp46+ were isolated. This population also contains ILC1 in the spleen (e.g. Flommersfeld et al.), although much less abundant compared to NK cells, and compared to the liver compartment. However, have the authors tested the abundance of splenic ILC1 in Prdm1-deficient mice, which may impact the gene expression data? In line with this the detection of altered Cxcr6 expression in Figure F, which is usually expressed by ILC1 rather than NK cells, may indicate an alteration in ILC1 numbers. The authors should validate the altered expression of CXCR6, Itga1, and Cx3cr1 on NK cells by flow cytometry.

      Response 23: We have cited the work of Flommersfeld et al. in our manuscript and expanded our Results section to include the following (page 19; line 377-385):

      “Previous research found that spleen NK cells could be divided into three distinct groups based on their expression levels of CD27, CD62L, CD49a, and CD49b (52). CD27+CD62L- NK cells have remarkable high expression of Batf3, while it was only barely expressed in CD27+CD62L+ and CD27-CD62L+ NK cells (52). Based the sequencing data published by Flommersfeld et al., (GSE180978), a notable negative correlation was observed between the expression levels of Prdm1 and Batf3 (Supplemental Figure 8I). On top of that, our findings unveiled the negative regulatory influence of Prdm1 on Batf3 within both spleen and liver NK cells. This discovery highlights a potential upstream mechanism that may influence the hemostasis of the spleen NK cell subpopulations through Batf3.”.

      We validated the expression of CD49a (Itga1) and CX3CR1 in liver cNK cells and ILC1s, as described in our revised manuscript (page 9; line 170-174, page 14; line 231-233):

      “Increased CD49a expression was also observed in Prdm1ΔNcr1 liver ILC1s, while it showed decreased expression in NKp46+ cells in the liver, bone marrow, and lymph nodes (Supplemental Figure 2, F and G).”, “The percentage of CX3CR1+ cNK cells was significantly decreased in multiple tissues of Prdm1ΔNcr1 mice, while the proportion of CX3CR1+ ILC1 was increased in the liver (Figure 3F).”

      Comment 24: Figure 3, F - Tnfsf26: which gene is this? is this a typo? Is a function of this gene in NK cells reported? Altered Batf3 expression suggests an impact on ILC1-like NK cells (Flommersfeld et al).

      Response 24: We are very sorry for our mistakes. We have removed Tnfrsf26 from the heatmap.

      Comment 25: Figure 3, G-J refer to Kallies data?! 

      Response 25: Kallies's study reported reduced GzmB expression in Blimp1gfp/gfp mice. Going beyond that study, we analyzed GzmB and perforin expression at different maturation stages of NK cells. The reduced GzmB expression in Prdm1-deficient NK cells is not only due to their less mature phenotype, which highlights the role of Prdm1 in regulating NK cell function. We have added the following to the revised manuscript (page 12; line 233-242):

      “Lower GZMB and PRF1 production was observed in Prdm1-deficient splenic cNK cells, liver cNK cells and ILC1s (Figure 3, H-K; Supplemental Figure 4, A-I). Notably, the proportion of GZMB+ and PRF1+ cNK cells was decreased among almost all of the maturation stages of cNK cells (Figure 3, J and K). The relative mean fluorescent intensities (MFIs) of GZMB and PRF1 consistently show a reduction across all developmental stages in PrdmΔNcr1 NK cells (Supplemental Figure 4, H and I). Yet, no statistical difference of PRF1 was found within the CD11b-CD27+ and CD11b+CD27+ subsets, likely due to the relatively lower perforin levels in these populations (Supplemental Figure 4I). These findings suggest that Prdm1 may directly influence cytotoxic molecule in NK cells, rather than impacting their anti-tumor abilities solely by affecting the maturation phenotype of Prdm1-deficient NK cells.”

      In Discussion section (Kallies’s work is cited here in revised manuscript) (page 24; line 500-502):

      “Our results not only confirmed a decrease in cytotoxic molecules in Prdm1-deficient NK cells (31) but also showed that the reduction in Gzmb and perforin is not solely attributable to the diminished maturation of these cells.”

      Comment 26: Figure 3, G, I - How do the authors explain the high variability of GzmB and Prf1 in Prdm1+/+ cells? 2 samples have comparable values to Prdm1-deficient cells.

      Response 26: This may be due to inherent differences in MFI among samples. In the revised version, we have added data on percentages, which exhibit much less variability (Figure 3, H and I). The MFIs of GZMB and PRF1 have been moved to Supplemental Figure 4, E and F.

      Comment 27: Did the authors test the mice for potential germline recombination of the floxed allele, which has been suggested as a potential problem of Ncr1cre?

      Response 27: We appreciate the reviewer's insightful comment. In Prdm1fl/fl mice, germline recombination typically results in a systemic knockout of Prdm1, which can lead to embryonic lethality. Given that mice were successfully born in the current study, it is highly unlikely that germline recombination of Prdm1 occurred due to leaky expression of Cre.

      To check this further, we isolated splenocytes and assessed Prdm1 expression using qPCR. We observed no significant difference in Prdm1 expression between splenocytes from Prdm1+/+ and Prdm1ΔNcr1 mice (revised Figure 1F). This also indicates that germline recombination is unlikely to have occurred in the Prdm1ΔNcr1 mice.

      Comment 28: Histograms do not show MFI

      Response 28: We appreciate the comments provided by the reviewer. The MFI values have been removed.

      Comment 29: Supplementary Figure 4, B - FACS plot labelling: Typo, Histograms do not show MFI.

      Response 29: We sincerely thank the reviewer for the careful reading. The typo in this figure has been corrected, and the MFI values have been removed.

      Comment 30: Figure 4, A - What are the cells in the red cluster in the middle of the UMAP, do they belong to B cells? Why do they cluster so separately? It is interesting, but also surprising that NK and ILC1 cluster map so far apart from each other (rather with CD8 or B cells? or NKT cells) - do the authors have any comments?

      Response 30: We sincerely apologize for the mistakes in labeling a group of cells in our previous analysis. Upon a thorough re-evaluation, we have corrected the labels of several cell clusters that were previously misidentified. The revised heatmap (revised Supplemental Figure 5C) represents the marker genes for each cluster. Additionally, in our updated analysis (revised Figure 4A), we have included clusters for Epithelial cells, CD4+ T cells, NKT cells, and Kupffer cells. Please note, the red cluster identified in the center of the original heatmap corresponds to the CD4+ T cells.

      We checked the markers of the cNK cell and ILC1 clusters and confirmed that they are labeled correctly, as Ncr1 and Klrb1c (NK1.1) were highly expressed in these clusters compared with the others (revised Supplemental Figure 5E).

      Comment 31: Does Junb expression correlate with the maturation stages of NK cells?

      Response 31: Our previous research indicated that during the maturation process of NK cells, there was a decrease in the expression levels of Junb (negative correlation), whereas there was an increase in the expression levels of Prdm1 (Wang et al., J Clin Invest, 2018; Supplemental Figure 5c and Supplemental Figure 11).

      Comment 32: The authors may consider validating their scRNA-seq data (e.g. by FACS analysis for highlighted markers, eg. cKit, Tcf7, Gzma, Cxcr3).

      Response 32: We appreciate the suggestion from the reviewer. We validated several marker genes, including Gzmb, Prf1, and Cx3cr1 by FACS, as shown in the revised Figure 3 F-K. Currently, FACS cannot distinguish liver NK cells into as many distinct clusters as can be achieved through scRNAseq analysis. However, we expect that as technology progresses, we will be able to enhance our validation of the scRNA-seq data.

      Comment 33: It is a bit unclear to me why authors refer to Cxcr3hi NK cells as tissue-resident. This is based on Cxcr3 and Ccr2 expression. To make this statement, a much more detailed analysis would be required. How are CD69, CD49a, or CXCR6 expression of these cells?

      Response 34: We appreciate the suggestion from the reviewer. The primary reason for classifying this specific cluster of NK cells as tissue-resident comes from the differentially expressed genes (DEGs) and Gene Ontology (GO) analysis, which show significant chemokine receptor activity within this cluster.

      To support this statement more clearly, we checked the expression of the markers mentioned above. Only Cd69 was expressed in the cNK clusters, where it was highly expressed in Junbhi and Cxcr3hi cNK cells (revised Supplemental Figure 6D). We also used the top 30 DEGs in ILC1s versus cNK cells to calculate a module score for all cNK clusters; Cxcr3hi cNK cells had the highest score among these clusters (revised Supplemental Figure 6D). This part has been updated in our manuscript (page 15; line 298-308):

      “Expression of tissue-resident markers Cd69 was also highly expressed in this clusters (Supplemental Figure 6D). The enrichment of chemokine receptors in the genes upregulated in the Cxcr3hi cluster implying a greater likelihood of this cluster being tissue-resident compared with other cNK cell clusters (Figure 4H). To further confirmed tissue-resident properties of this clusters, we calculated the module score based on top30 DEGs in ILC1 versus cNK clusters, including Cxcr6, Itga1, Cd160, Cd226, etc. Cxcr3hi cNK clusters have the highest score among all cNK clusters (Supplemental Figure 6H), indicating the similarity with liver ILC1s. In the tumor microenvironment, reports indicated that NK cells could transform into ILC1s (25). If this conversion of cNK cells into ILC1s also occurred under normal physiological conditions, then Cxcr3hi cNK cell cluster might be the most susceptible to such transformation.”

      Comment 35: The authors suggest that Prdm1 regulates chemokine receptor expression. An alternative explanation could be that this is an indirect effect of altering the abundance of NK cell subsets.

      Response 35: We apologize for the lack of detail in these figures. The input cell number for each genotype has now been added to the following figure legends.

      Figure 4F, “Proportions of cNK cells among total cNK cells (left; 211 cells in Prdm1+/+, and 141 cells in Prdm1ΔNcr1) and within clusters (right).”; Figure 5C, “Proportions of ILC1s among total ILC1s in different genotypes (left; 114 cells in Prdm1+/+, and 63 cells in Prdm1ΔNcr1) and within each cluster (right).”; Figure 6C, “Proportions of MDMs and KCs among total macrophages in different genotypes (510 cells in Prdm1+/+, and 624 cells in Prdm1ΔNcr1).”

      To minimize the effects of discrepancies in input numbers between samples with different genotypes, we represented the relative proportions of each cluster within its specific genotype (e.g. Supplemental Figure 6B; Supplemental Figure 7B; Supplemental Figure 9B).
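      The normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the genotype totals (211 and 141 cNK cells) are taken from the legend quoted above, while the per-cluster split and the cluster names "cluster_A" and "cluster_B" are hypothetical and do not come from the study.

```python
from collections import Counter

def within_genotype_proportions(cells):
    """Return {genotype: {cluster: fraction of that genotype's cells}}.

    `cells` is a list of (genotype, cluster) pairs, one entry per cell.
    """
    totals = Counter(genotype for genotype, _ in cells)
    counts = Counter(cells)
    proportions = {}
    for (genotype, cluster), n in counts.items():
        proportions.setdefault(genotype, {})[cluster] = n / totals[genotype]
    return proportions

# Totals match the cNK input numbers quoted above (211 and 141 cells);
# the split into "cluster_A"/"cluster_B" is hypothetical.
cells = (
    [("Prdm1+/+", "cluster_A")] * 100 + [("Prdm1+/+", "cluster_B")] * 111
    + [("Prdm1ΔNcr1", "cluster_A")] * 47 + [("Prdm1ΔNcr1", "cluster_B")] * 94
)
props = within_genotype_proportions(cells)

for genotype, by_cluster in props.items():
    print(genotype, {c: round(f, 3) for c, f in by_cluster.items()})
```

      Dividing by each genotype's own total means a cluster's share is not distorted simply because one sample contributed more cells than the other.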

      Comment 36: Supplementary Figures 6 and 7, A - The formatting of gene annotations does not fit the heat maps (the gene names on the last rows are missing).

      Response 36: We apologize for these careless mistakes, which have now been corrected.

      Comment 37: Supplementary Figures 6 and 7, What is the consequence of compromised mitochondrial function? Increase apoptosis?

      Response 37: In our experiments, we did not find that Prdm1 affects the apoptosis of NK cells. Previous studies, however, have found that Prdm1 might inhibit the proliferation of NK cells (C. Kucuk, et. al., PNAS, 2011). We acknowledge that there is ongoing debate regarding the precise definition of NK cell exhaustion. In our experiments, no changes were detected in the expression of surface markers associated with exhaustion (TIGIT) on NK cells following the knockout of Prdm1. However, we did note a significant reduction in the cytokine secretion capacity and tumor control efficacy of NK cells after Prdm1 knockout. We therefore suggest that the consequence of compromised mitochondrial function might be increased exhaustion. As mentioned in the Discussion (line 482-483), mitochondrial fragmentation has been shown to be closely associated with NK cell exhaustion in tumors (Zheng et al. Nature immunology, 2019). Although the evidence is not sufficient to define Prdm1ΔNcr1 NK cells as exhausted, our data suggest that compromised mitochondrial function is, at least in part, associated with the exhaustion-like phenotype of Prdm1ΔNcr1 NK cells in cancer.

      We have discussed these points in our revised manuscript (page 26; line 529-543): 

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T Cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion isn't as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions. This is characterized by a diminished ability to destroy tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival rates. Additionally, this reduced functionality is associated with a decline in the expression of molecules responsible for cytotoxic activity, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 38: Figure 5, Describing the scRNA Seq data, the authors are switching a lot between Figure 4 and Figure 5. Maybe a reorganization of the Figures (Figure 4: NK cell; Figure 5: ILC1) could help.

      Response 38: We appreciate the reviewer’s suggestion. We have now reorganized Figures 4 and 5.

      Comment 39: Figure 5, We suggest naming one of the ILC1 clusters "Gzmbhi" to keep it consistent with the FACS data.

      Response 39: We agree with this excellent suggestion and have now renamed the “Gzmahi” ILC1 cluster as the “Gzmbhi” ILC1 cluster.

      Comment 40: Figure 5, C - How was the JunB score derived (which genes were used)?

      Response 40: The JunB score was calculated from the expression of the marker genes of the Junbhi cNK cluster (DEGs in the Junbhi cNK cluster compared with the other clusters, as shown in revised Supplemental Figure 6A). The score was calculated using the “AddModuleScore” function of the Seurat R package.
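      As a rough illustration of what a module score measures, the sketch below computes, for each cell, the mean expression of a marker gene set minus the mean expression of a set of control genes. This is a deliberately simplified stand-in and not the AddModuleScore algorithm itself, which selects its control genes from expression-matched bins. The gene names, expression values, and function name here are hypothetical.

```python
from statistics import fmean

def simple_module_score(cell_expr, module_genes, control_genes):
    """Mean expression of module genes minus mean expression of control genes.

    `cell_expr` maps gene name -> normalized expression for one cell.
    Simplification: the control genes are a fixed list supplied by the caller.
    """
    module_mean = fmean(cell_expr[g] for g in module_genes)
    control_mean = fmean(cell_expr[g] for g in control_genes)
    return module_mean - control_mean

# Hypothetical normalized expression values for two cells (NOT study data).
cells = {
    "cell_1": {"marker_1": 3.0, "marker_2": 2.0, "control_1": 1.0, "control_2": 1.0},
    "cell_2": {"marker_1": 0.5, "marker_2": 0.5, "control_1": 1.0, "control_2": 1.0},
}
module = ["marker_1", "marker_2"]
controls = ["control_1", "control_2"]

scores = {name: simple_module_score(expr, module, controls) for name, expr in cells.items()}
print(scores)
```

      A positive score indicates that the marker set is expressed above the background set in that cell, and a negative score indicates the opposite.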

      Comment 41: Figure 5, G, I - The authors highlight Il17 signaling pathway, what is the impact of Il17 on NK/ILC1? Did the authors check for ILC3 (Rorc expression) within the ILC1 cluster?

      Response 41: The enrichment of the IL-17 signaling pathway in Il7rhi ILC1s suggests that this cluster includes ILC1s that originate from the conversion of Rorγt+ ILC3s. Although Rorc expression was undetectable in all ILC1 clusters, we found that several ILC3 marker genes (e.g. Rora, Tmem176a, Tmem176b) were highly expressed in this cluster, according to published ILC3 transcriptomes (Robinette et al., Nature Immunology, 2015).

      We have added these contents in our revised manuscript (page 17; line 341-344): 

      “Several ILC3 signature genes, such as Rora, Tmem176a, and Tmem176b (45), highly expressed in this cluster (Supplemental Figure 7D). Considering the close relationship between IL-17 mediated immunity response and ILC3 (1, 46), it is plausible that Il7rhi ILC1 cluster may be attributed, at least in part, to potential plasticity between ILC1 and ILC3 subsets.”.

      Comment 42: Figure 5, The authors detect more Ly49E+ cytotoxic ILC1 in Prdm1fl Ncr1cre mice. How does this observation fit to the reduced cytotoxicity of NK cells?

      Response 42: The proportion of Klrahi ILC1s was increased, while that of Gzmbhi ILC1s was decreased, in Prdm1ΔNcr1 mice. Moreover, the total number of cells across the three ILC1 clusters was reduced in Prdm1ΔNcr1 mice.

      Comment 43: Line 350/351: Citation required.

      Response 43: We have added the respective references (references 55 and 56).

      Comment 44: Figure 6, The Cell-chat analysis provides interesting suggestions, but none are experimentally addressed. It is also difficult to evaluate these analyses: are any of the Mac subsets altered in frequency or phenotype in either genotype? This could be analyzed from the single-cell data in Fig 4. At the very least, flow cytometric validation of predicted shifts in the Mac compartment should be confirmed.

      Response 44: We are grateful for these valuable suggestions. As requested, we analyzed macrophages and validated some of the scRNA-seq data by flow cytometry. We have rewritten this part to include an analysis of the altered proportions of the two macrophage clusters (Kupffer cells and monocyte-derived macrophages) (page 20-21; line 399-436):

      “The scRNA sequencing analysis identified two well-established subpopulations of liver macrophages: the resident Kupffer Cells (KCs) and the Monocyte-Derived Macrophages (MDMs) (Figure 6, A-C; Supplemental Figure 9A). When comparing the total proportion of macrophages within the immune cell population of the liver between WT and Prdm1ΔNcr1 mice, there is an increase in Prdm1ΔNcr1 mice (Figure 6C). To confirm these findings, we utilized flow cytometry to define macrophages, including both KCs and MDMs, gating by CD45+Ly6G-F4/80+CD11b+ (Figure 6D).

      Our analysis showed that, following the deletion of Prdm1 in Group 1 ILCs, there is a significant increase in both the proportion and number of macrophages in the liver (Figure 6D).

      According to the transcriptional profile, liver macrophages further clustered and were labeled as “Ly6c2hi”; “Cxcl2hi”; “Ear2hi” MDMs, and “Mrc1hi”; “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). Increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within MDMs clusters, Ly6c2hi MDMs mainly compose of Prdm1+/+ cells, while Prdm1ΔNcr1 cells concentrated in Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, there is a decrease in the proportion of KCs within the macrophage population, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is extensively utilized to distinguish KCs and MDMs within macrophages. Cells expressing CX3CR1 are identified as MDMs, whereas those without CX3CR1 expression are categorized as KCs (56). Employing flow cytometry and leveraging CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that post-Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within the total liver macrophages, and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using the single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry solely employs CX3CR1 for differentiating between KCs and MDMs, it could result in disparities when compared to scRNA-seq outcomes.
      Both KCs and MDMs has significantly increased in Prdm1ΔNcr1 mice, which was consist with the scRNA-seq data (Supplemental Figure 9, B and F). Despite the decrease in the proportion of Ly6c2hi MDMs in Prdm1ΔNcr1 mice, the expression levels of Ly6c2 exhibited minimal variation between WT and Prdm1ΔNcr1 mice (Supplemental Figure 9D). Intriguingly, within certain cellular subsets, notably the Ear2hi cluster, the Ly6c2 expression levels in KO mice were found to be higher than those in WT mice. Additionally, we employed flow cytometry to examine Ly6C expression within the macrophages. Similar with the scRNA-seq findings, there were no notable differences in Ly6C expression levels between WT and KO mice (Figure 6E; Supplemental Figure 9G).”.

      The changes in the macrophage compartment indicate a potential influence of functional NK cells on macrophages. We have revised these parts of our Results and Discussion (line 590-601). Further analysis of macrophages would be worthwhile but goes beyond the scope of this manuscript; it will be a direction of our future work.

      Comment 45: Figure 6, C1qhi Mac only are few cells/events, and interactions (or cells?) seem to be gone in the Prdm1-floxed mice. Is that true? Does it make sense to perform cell-chat analysis on so few cells?

      Response 45: We have now added KCs to the cell-chat analysis; this cluster belongs to the C1qhi KCs. We have revised the corresponding analysis in our manuscript (page 20-21; line 408-428):

      “According to the transcriptional profile, liver macrophages further clustered and were labeled as “Ly6c2hi”; “Cxcl2hi”; “Ear2hi” MDMs, and “Mrc1hi”; “C1qhi” KCs (Figure 6A, Supplemental Figure 9, A-E). Increased proportion of MDMs and KCs was observed in Prdm1ΔNcr1 cells (Supplemental Figure 9B). Within MDMs clusters, Ly6c2hi MDMs mainly compose of Prdm1+/+ cells, while Prdm1ΔNcr1 cells concentrated in Cxcl2hi cluster (Figure 6C). The scRNA-seq data reveal that following Prdm1 knockout in NKp46+ cells, there is a decrease in the proportion of KCs within the macrophage population, while the proportion of MDMs increases (Figure 6D). CX3CR1, a chemokine receptor, is extensively utilized to distinguish KCs and MDMs within macrophages. Cells expressing CX3CR1 are identified as MDMs, whereas those without CX3CR1 expression are categorized as KCs (56). Employing flow cytometry and leveraging CX3CR1 expression, we assessed the ratios of KCs and MDMs. However, diverging from the scRNA-seq findings, flow cytometry indicates that post-Prdm1 knockout in group 1 ILCs, there is a minor increase in the proportion of KCs within the total liver macrophages, and a decrease in the proportion of MDMs (Figure 6D; Supplemental Figure 9B). This discrepancy could stem from the different bases of classification: scRNA-seq defines KCs based on gene expression profiles, whereas flow cytometry differentiates between KCs and MDMs using the single surface marker, CX3CR1. Analysis of the macrophage subsets identified by scRNA-seq reveals that, while MDM clusters generally show high CX3CR1 expression, there exists a subset within MDMs, labeled Mrc1hi, that also exhibits high levels of CX3CR1 (Supplemental Figure 9C). Consequently, if flow cytometry solely employs CX3CR1 for differentiating between KCs and MDMs, it could result in disparities when compared to scRNA-seq outcomes.”.

      Comment 46: Figure 6, C - Here the interactions of both Mac+ILC1 and Mac+NK are shown together. It would be interesting to separate this analysis (also Suppl. Fig 9A-B) into comparisons of Mac+ILC1 vs Mac1+NK from WT or Prdm1fl Ncr1 mice.

      Response 46: As requested, we re-analyzed this part for each genotype, as shown in Supplemental Figure 10. These data are now described in the manuscript (page 22; line 445-447):

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H)”

      Comment 47: Supplementary Figure 9, A, B - Is this analysis using WT and Prdm1fl Ncr1cre dataset together? 

      Response 47: Yes, we used the WT and Prdm1ΔNcr1 data together. As requested above, we have now performed this analysis separately for WT and Prdm1ΔNcr1 mice. These data are now described in the manuscript (page 22; line 445-460):

      “The reduction of interaction mostly occurred in the cross-talk of ILC1-MDM and ILC1-KC, whereas no difference was observed in cNK-MDM and cNK-KC interaction (Supplemental Figure 10, A-H). A reduction in the interaction of ligand-receptor, such as Mif-CD74, Cxcl16-Cxcr6, and Cxcl10-Cxcr3 was observed in Prdm1ΔNcr1 mice compared to Prdm1+/+ mice (Supplemental Figure 11). Compared to Prdm1+/+ mice, the information flow of CXCL and MIF pathways significantly decreased in Prdm1ΔNcr1 mice (Figure 6, H and I; Supplemental Figure 10, B, D, F, and H). These pathways play a crucial role in facilitating macrophage migration. The CXCL signaling was sent from Ly6c2hi Cxcl2hi MDMs and C1qhi KC, targeting all ILC1 clusters and Cxcr3hi cNK cell clusters (Figure 6J). Of note, although the population of Cxcl2hi macrophage primarily comprised cells from Prdm1ΔNcr1 mice, the interaction within the CXCL pathway between macrophages and group 1 ILCs was obviously less than Prdm1+/+ sample (Figure 6J). These changes could be linked to a decreased population of ILC1s and Cxcr3hi cNK cell cluster in Prdm1ΔNcr1 mice, implying that the homeostasis of Cxcl2hi macrophages required sufficient signals from cNK cells and ILC1s. The impaired CXCL-CXCR interactions might subsequently lead to reduced recruitment and activation of group 1 ILCs and macrophages within the tumor microenvironment.”.

      Comment 48: Figure 7, A-C -What is the consequence/interpretation of reduced Mitotracker staining? Any metabolic assays performed? The definition of NK cell "exhaustion" is unclear, is reduced IFNg enough for that? Is the concept of NK cell exhaustion clearly established? Only shortly touched upon in the discussion, the rationale for suggesting an exhausted phenotype, should be explained.

      Response 48: MitoTracker was used to assess mitochondrial mass. The reduced staining indicates compromised mitochondrial function, which is associated with mitochondrial fragmentation.

      We believe that the exhaustion of NK cells is not as well-established a concept as it is for T cells. The purpose of detecting mitochondria in this study is to provide evidence for the relationship between Prdm1 and the exhaustion of NK cells. In the discussion section, we have added the following content (page 26; line 529-543):

      “Mitochondria are pivotal organelles crucial for cellular metabolism. Disruptions in mitochondrial function have been linked to T Cell exhaustion, attributed to glycolytic reprogramming (66). Similarly, mitochondrial fragmentation has been closely associated with NK cell exhaustion (67).

      However, the concept of NK cell exhaustion isn't as firmly established as it is for T cells. Exhausted NK cells should primarily exhibit diminished functions. This is characterized by a diminished ability to destroy tumor cells, a reduced capability to activate other components of the immune system, and compromised proliferation and survival rates. Additionally, this reduced functionality is associated with a decline in the expression of molecules responsible for cytotoxic activity, lower production of IFN-γ, and metabolic disturbances that may arise from mitochondrial dysfunction. While our current data is not sufficient to definitively classify these cells as exhausted NK cells, it supports that a subpopulation, referred to Junbhi cluster, demonstrates an exhaustion-like phenotype. The significant increase in this cell population following Prdm1 knockout in NK cells may potentially be one of the reasons why Prdm1ΔNcr1 mice lose their tumor-killing capacity. Whether the excessive expression of JunB in NK cells is also a contributing factor to their exhaustion, similar to T cells(65), requires further investigation.”.

      Comment 49: Figure 7, x-axis labelling (MFI) of histograms is not correct. Do bar graphs and FACS plots show the same data? Does the number in the FACS plots indicate the MFI? If so, the FACS plots do not show representative samples?

      Response 49: We appreciate the valuable comments provided by the reviewer. In the revised Figure 7, the MFI values have been removed. Bar graphs now display summary data from FACS histograms.

      A representative sample close to the group's mean value was chosen for display in the histograms.

      Comment 50: Figure 7, D - How are these data different from Figure 2H? Why is it now called "exhaustion", but not in 2H? Is the detected IFNg only driven by ex vivo stimulation with Il12/Il18? As above, a "standard" 4h assay should also be provided to allow better interpretation of potential differences. In the introduction, the authors cite the Ducimetiere study (Ref 5) highlighting "the primary function of ILC1 in suppressing the seeding of metastatic tumor cells in liver tissue". Thus, it would be interesting to test Ifng production by liver ILC1 and NK cells ex vivo at early time points of tumor inoculation.

      Response 50: Tumors grow and proliferate within tissues and are one of the major causes of lymphocyte exhaustion. This part of the current study aims to investigate whether Prdm1 helps NK cells or ILC1s resist the exhaustion induced by malignant tumors. Specifically, we seek to ascertain whether the absence of Prdm1 renders NK cells or ILC1s more susceptible to exhaustion within the tumor microenvironment. We therefore consider the capacity to secrete IFN-γ upon IL-12/IL-18 stimulation as one indicator of exhaustion. It is important to emphasize that this assessment serves as only one piece of evidence, not the sole determinant. Overnight stimulation is a conventional method for studying NK cells and has been widely used across different laboratories, including our lab (e.g. Bream et al., Blood, 2003; Yu et al., Immunity, 2006; Wang et al., J Clin Invest, 2018). We also wish to clarify that our approach does not involve stimulating with tumor cells to evaluate the IFN-γ secretion capacity of NK cells or ILC1s.

      Reviewer 2 (Public Review):

      Summary:

      This study offers a significant advancement in understanding liver innate lymphoid cell (ILC) biology by elucidating the role of the transcription factor Prdm1. It shows that Prdm1 is crucial in maintaining the balance between conventional natural killer (cNK) cells and ILC1s in the liver, with knockout models revealing a vital role in cancer defense mechanisms. Despite not affecting direct cytotoxicity, Prdm1 deficiency leads to increased cancer metastasis and reduced secretion of key molecules like IFN-γ, pointing to its importance in immune regulation. The use of single-cell RNA sequencing further underscores Prdm1's role in cellular communication within the liver's immune milieu. This study is a robust contribution to the field, providing insights that could inform new immunotherapy approaches for liver cancer.

      Strengths:

      The study's strength lies in its comprehensive approach, combining the specificity of Prdm1 conditional deletion in Ncr1-cre mice with integrative omics analyses and cutting-edge cytometry to delineate Prdm1's role in liver Type 1 ILC biology and its functional implications in tumor immunity. This multifaceted strategy not only clarifies Prdm1's influence on ILC composition and maturation but also conveys potential therapeutic insights for liver cancer immunotherapy.

      We sincerely appreciate your interest in and critical assessment of our manuscript. We have carefully read your comments and suggestions, and we are truly grateful for your expert guidance. We have addressed each of your concerns and comments, and provide a point-by-point response below:

      Weakness

      Comment 1: A notable weakness of the study is the limited scope of in vivo disease models, primarily relying on the B16F10 melanoma model, which may not fully capture the complex behavior of Type 1 ILCs across diverse cancer types. Furthermore, the absence of direct human data, such as the effects of PRDM1 deletion in human NK cells or stem cells during their differentiation into NK and ILC1, leaves a gap in translating these findings to clinical settings.

      Response 1: We appreciate the reviewer for raising these important points, which we see as a unique opportunity for future work to transform our understanding of Prdm1 and its targets as opposed to a weakness of the present study. 

      In our revised manuscript, we have discussed these limitations of our study (page 29; line 602-609):

      “While our findings underscore the importance of Prdm1 in liver cNK cells and ILC1s tumor immune surveillance, it does not be validated in human NK cells, whereas previous studies have found that PRDM1 might inhibit the proliferation and function of human NK cells (33, 73). Furthermore, we not provided an in-depth evaluation in multiple tumor models. Further research may provide deeper insight into the role of PRDM1 in the anti-tumor function of human NK cells, enabling a more direct investigation of its application in cancer therapies. Given its important role in preserving liver cNK cells and ILC1s functional heterogeneity, enhancing Prdm1 function in human NK cells could potentially be a strategy to promote NK cell-based immunotherapy for cancer.”.

      Recommendations For The Authors:

      (Introduction) 

      Comment 2: Reference 1 appears slightly misplaced. You might find the nomenclature discussion in Spits et al., Nature Reviews Immunology, 2013, more appropriate.

      Response 2: We are really sorry for our inaccurate descriptions. Following Spits et al. (Nature Reviews Immunology, 2013) and other related studies, we have now adopted a more appropriate nomenclature: “conventional NK cells” are referred to as “cNK cells”, “type 1 innate lymphoid cells” as “ILC1s”, and “group 1 ILCs” is used as the collective name for cNK cells and ILC1s.

      The definition of these cells is given in the introduction (page 4, line 52-53; line 58-62):

      “Group 1 ILCs consist of cNK cells and ILC1s (1, 2), with distinct developmental trajectories and effect molecules (3).”, “In a state of homeostasis, liver group 1 ILCs (CD45+CD3-NK1.1+NKp46+) can be discriminated into cNK cells and ILC1s by the differential expression of CD49a and CD49b (2): cNK cells are marked by the expression of CD49b, while liver ILC1s exhibit a distinctive positivity for CD49a. Tumor Necrosis Factor Related Apoptosis Inducing Ligand (TRAIL) is also expressed on liver ILC1s, but not on cNK cells (10, 11).”. 

      We also describe cNK and ILC1 phenotypes in our scRNA-seq data, as shown in page 13; line 259-261: 

      “cNK cells expressed high levels of Itga2 (CD49b) and Eomes, while ILC1s had high levels expression of Itga1 (CD49a) and Tnfsf10 (Supplemental Figure 5, F and G).”.

      Comment 3: It has come to my attention that Reference 9 has been retracted. I recommend removing this citation to maintain the integrity of your references (https://doi.org/10.1182/blood.2023022801).

      Response 3: We thank the reviewer for the comment and have now removed this citation.

      Comment 4: For a more comprehensive context around reference 15, consider citing Thierry Walzer's work (https://rupress.org/jem/article/211/3/563/41636/T-bet-and-Eomes-instruct-thedevelopment-of-two) which aligns closely with your discussion.

      Response 4: We agree with the reviewer’s suggestion and have added this citation in our introduction (page 4; line 64-66):

      “Liver environment facilitated T-bet expression in the early stage of NK cells development, which results in Eomes repression. The repression of T-bet is required for Eomes+ NK cells (17).”.

      (Results) 

      Comment 5: The NK cell signature referenced in 32 has been questioned for its reliability as discussed by Cursons et al., CRI 2019 (https://pubmed.ncbi.nlm.nih.gov/31088844/). Reanalysis of data in Figure 1 B/C and Supplementary Figure 1 with the refined NK cell signature from Cursons' work would be advantageous.

      Response 5: We thank the reviewer for the comment. As requested, we reanalyzed our data using the refined NK cell signature from Cursons et al. (revised Figure 1 A-C; revised Supplemental Figure 1). Of note, the overall survival of liver cancer (LIHC) patients reached statistical significance only when high and low expression of the refined PRDM1-NK signature were compared using a median cutoff (Figure 1, A-C). The survival analysis based on the top and bottom quartiles of the refined PRDM1-NK signature was moved to Supplemental Figure 1, G-I.

      The original text is: “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (NCR1, NCR3, KLRB1, CD160, and PRF1) (32) and PRDM1 expression (Figure 1A). Patients with top and bottom quartiles of NK-PRDM1 signature expression were chosen for survival analysis (Figure 1B). Notably, patients with the NK-PRDM1_hi signature had better overall survival compared to the these with NK-PRDM1_lo signature (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in some solid tumors, including liver cancer. These findings prompted us to investigate the impact and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”

      We have rewritten this part in our revised manuscript (page 7; line 119-132): 

      “Examination of 363 liver hepatocellular carcinoma (LIHC) patient samples from The Cancer Genome Atlas (TCGA) revealed a positive correlation between the expression of NK cell-associated genes (34) (NCR1, KLRB1, CD160, PRF1, etc.) and PRDM1 expression (Figure 1A). The patients are ordered from highest to lowest based on the expression of NK-Prdm1 for survival analysis (Figure 1B). Notably, patients exhibiting higher levels of NK-PRDM1 expression (above the median) experienced better survival outcomes compared to those with lower levels of NK-PRDM1 expression (below the median) (Figure 1C). Similar results were also found in skin cutaneous melanoma (SKCM, n=454) and lung adenocarcinoma (LUAD, n=497) patients (Supplemental Figure 1, A-F). Patients within the highest quartile of NK-PRDM1 signature expression demonstrated enhanced overall survival, a result that achieved statistical significance in LUAD and SKCM patients (Supplemental Figure 1, G-I). These data suggested that PRDM1 in NK cells might be essential for immune surveillance in solid tumors, including liver cancer, and prompted us to investigate the function and mechanism of PRDM1 in NK cells and ILC1 within the context of liver cancer.”.

      Comment 6: The origin of the Ncr1-cre mice utilised should be clarified; is this the line developed by Eric Vivier? (https://www.pnas.org/doi/10.1073/pnas.1112064108).

      Response 6: We did not use the line developed by Eric Vivier; our Ncr1-cre mice were purchased from Shanghai Model Organisms Center, Inc. We describe this in the Methods section (page 29-30; line 612-614):

      “Prdm1fl/fl mice were purchased from The Jackson Laboratory. Ncr1-iCre and B2m-/- mice were purchased from Shanghai Model Organisms Center, Inc. Six- to twelve-week-old littermates were used for the experiment.”

      Comment 7: Considering the known reduction of Ncr1 expression in Ncr1-cre mice and its implications, it is recommended to repeat the B16F10 experiments with the correct control, Ncr1cre/+ Prdm1+/+.

      Response 7: This is an excellent question, and it has been raised by another reviewer and comprehensively answered (Reviewer 1, Comment 1). The answer is below: 

      The expression of Cre and the insertion of loxP sequences both have the potential to influence gene expression. This is because the region where loxP is inserted may contain regulatory sequences for the gene of interest. Ncr1-Cre is a frequently used transgenic mouse model in our laboratory. In our prior research, we also had concerns about the possible impact of Cre on NKp46 expression, which could lead to a decline in NK cell function. Therefore, in our previous studies focused on Smad4 expression in NK cells, we conducted similar experiments. In Figure 6 of our published paper in the Journal of Clinical Investigation (Wang et al., J Clin Invest, 2018), we compared NKp46-iCreTgfbr2fl/flSmad4fl/WT with NKp46-iCreTgfbr2fl/flSmad4fl/fl. Although the primary purpose was to establish Smad4's independence from TGF-β, this design also allows for a comparison between Smad4fl/fl and Smad4fl/WT in the presence of Cre. In the critical phenotype we assessed, NKp46-iCreTgfbr2fl/flSmad4fl/fl (compared with NKp46-iCreTgfbr2fl/flSmad4fl/WT) exhibited the same phenotype as NKp46-iCreSmad4fl/fl (compared with NKp46WTSmad4fl/fl). This suggests that Cre's influence on NK cells may be within a reasonable and controllable range. Furthermore, in contrast to the decrease in Ncr1 expression caused by Cre, the reduction in the expression levels of genes targeted by loxP knockout, such as Prdm1 in this study (Figure 1 E), is more significant. Therefore, with the current techniques and research methods, we believe that the data provided in this study can support the role of Prdm1 in NK cells.

      Comment 8: The proportion of ILC1 in wild-type mouse livers is notably higher than standard references. Could you confirm whether liver perfusion was performed before analysis? This procedure was not clearly detailed in the methods section.

      Response 8: We apologize that we did not provide enough detail regarding this point in our original methods. We performed liver perfusion before analysis. This has now been clarified in the method section of the revised text (page 30-31; line 630-636):

      “Mice were perfused with 1× PBS by portal vein puncture before harvesting tissues. Liver and lung was digested with 0.05% collagenase II for 30 minutes and filtered through 70 µm cell strainers, and mononuclear cells were isolated after subjected to density gradient using 30% and 70% percoll. Spleen were also removed and pressed through 70 µm filterers to obtain splenocytes. Peripheral blood mononuclear cells were obtained from peripheral blood after lysis of red blood cells (Biolegend, 420301). Flushing femurs and mechanical disruption of inguinal lymph nodes were performed to obtain cells from bone marrow and lymph nodes.”.

      The lymphocyte proportions in mice from different laboratories may exhibit slight variations, possibly due to genetic background disparities. To minimize the influence of genetic backgrounds, paired littermates were used in the current study, wherein one is Prdm1 WT and the other has the Prdm1 gene knocked out in NK cells.

      Comment 9: There appears to be inconsistency in reference formatting; for instance, Ref 39 does not match the formatting of other references. A thorough review of your citation format is suggested.

      Response 9: We apologize for the inadvertent errors and have reviewed the citation format throughout.

      Comment 10: The information in Figures 2B and C may be better suited to the supplementary section as it does not significantly contribute to the main text.

      Response 10: We agree with the reviewer’s suggestion and these are now moved to supplementary figures (Supplemental Figure 2).

      Comment 11: The citation of reference 40 could be strengthened by including Sathe et al., 2014, which directly pertains to your findings (https://www.nature.com/articles/ncomms5539).

      Response 11: We added the suggested reference.

      Comment 12: Can the findings presented in Figure 2D/F be replicated using alternative models? This would substantiate the versatility of your results.

      Response 12: The current predominant in vivo tumor model for NK cells is primarily based on the use of B16F10 melanoma cells. These melanoma cells, with their low expression of MHC-I molecules, evade T cell-mediated immune surveillance, rendering them ideal targets for NK cells. Typically, this experimental melanoma metastasis assay involves tail vein injection, followed by detection of nodules in the lungs. To align with our investigation of liver-resident cNK and ILC1, we've introduced splenic injection (via the portal vein) and evaluated melanoma metastasis in the liver to reflect the anti-tumor capabilities of liver group 1 ILCs. We also explored subcutaneous tumor models, but we believe they may not effectively support Prdm1's role in cNK cells, particularly liver-resident NK cells and ILC1. While we've experimented with models using mouse liver tumor cells like Hepa 1-6, we found them less stable than B16F10 and less conducive to quantification. Should more suitable models or cell lines emerge, we remain open to exploring them in future research.

      Comment 13: The absence of in vitro killing assessments against B16F10 and YAC-1 leaves a gap in the NK cell characterisation which would be valuable to address.

      Response 13: Isolating NK cells for ex vivo cytotoxicity assays typically requires stimulation with high concentrations of IL-2. Under such high IL-2 stimulation, many intracellular differences that contribute to differences in cytotoxicity, such as changes in transcription factors, are often masked. Another issue is that current ex vivo NK cell cytotoxicity assays often only isolate NK cells from the spleen. Liver-resident NK cells, on the other hand, are limited in quantity and difficult to isolate, making it challenging to conduct ex vivo cytotoxicity assays effectively. If more sensitive detection methods become available, we will also incorporate ex vivo data into our future research endeavors.

      Comment 14: The suggestion that NK cells produce IL-6 is indeed a bold one, and without additional validation through intracellular cytokine detection or ELISA, it may be prudent to omit these claims.

      Response 14: We have checked the GSEA results, and found no valuable genes in IL-6 production. Therefore, we have removed this figure.

      Comment 15: The lack of fluorescence minus one (FMO) controls in Figure 3 and Supplementary Figure 4 is noted; including these would enhance the validity of your gating strategies.

      Response 15: As requested, we have added FMO controls to the aforementioned figures.

      Comment 16: There seems to be a minor mix-up in referring to Figure 4A in the scRNAseq results section, perhaps it was intended to refer to Figure 3A?

      Response 16: We have corrected this part (line 247). We also double-checked and corrected the inaccuracies in the references to the figures. We apologize for the inadvertent errors.

      Comment 17: The rich datasets generated from bulk and scRNAseq are commendable. However, I urge you to make these datasets publicly accessible with a GEO accession number.

      Response 17: We appreciate the suggestion from the reviewer. We plan to upload our datasets with the final version of our manuscript, as is also required by eLife policy.

      Comment 18: Figure 4K is insightful, yet a similar analysis of the ILC1 cluster could provide a more rounded understanding.

      Response 18: We thank the reviewer for the comments. We now provide a similar analysis of ILC1s, as shown in revised Figure 5H.

      Comment 19: The metabolic RNA signatures featured in Supplementary Figure 6 are intriguing and warrant further validation, perhaps through Seahorse analysis. Such validation could merit their inclusion in the main figures.

      Response 19: This is a very good suggestion. Currently, our data offer only limited indications in this context. We have chosen to validate some aspects of Prdm1's influence on cytotoxicity molecules. As for Prdm1's impact on other aspects of NK cells, such as metabolic functions, we may explore further in future research. Additionally, we hope that by publishing our research findings, laboratories worldwide can draw insights for their own studies and conduct relevant research based on this data.

      Comment 20: It is difficult to discern whether the cells depicted in Figure 7D are truly tumor-infiltrating ILC1 or NK cells that have adopted ILC1-like characteristics. Intravenous injection of CD45-PE could clarify this distinction, and if they are the latter, it may be more appropriate to refer to them as ILC1-like cells.

      Response 20: We completely agree with the reviewer's suggestion that "tumor-infiltrating lymphocytes" may not be accurate for the current experiment. Therefore, in the revised manuscript, we have changed it to "liver cNK or ILC1 from tumor-bearing livers".

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Drougard et al. examined the consequences of an acute high fat diet (HFD) on microglia in mice. 3-day HFD influenced the regulation of systemic glucose homeostasis in a microglia-dependent and independent manner, as determined using microglial depletion with PLX5622. 3-day HFD increased microglial membrane potential and the levels of palmitate and stearate in cerebrospinal fluid in vivo. Using confocal imaging, respirometry and stable isotope-assisted tracing in primary microglial cultures, the authors suggest an increase in mitochondrial fission and metabolic remodeling occurs when exposed to palmitate, which increases the release of glutamate, succinate and itaconate that may alter neuronal metabolism. This acute microglial metabolic response following acute HFD is subsequently linked to improved higher cognitive function (learning and memory) in a microglia and DRP1-dependent manner.

      Strengths:

      Overall, this study is interesting and novel in linking acute high fat diet to changes in microglia and improved learning and memory in mice. The role for microglia and DRP1 in regulating glucose homeostasis and memory in vivo appears to be supported by the data.

      Weaknesses:

      The authors suggest that utilization of palmitate by microglia following HFD is the driver of the acute metabolic changes and that the release of microglial-derived lactate, succinate, glutamate and itaconate are causally linked to improvements in learning and memory. A major weakness is that the authors provide no mechanistic link between beta-oxidation of palmitate (or other fatty acids) in microglia and the observed systemic metabolic and memory phenotypes in vivo. Pharmacological inhibition of CPT1a could be considered or CPT1a-deficient microglia.

      We thank Reviewer #1 for their time, effort and the critique. Indeed, we suggest that palmitate drives the aMMR response and associated improvements in learning and memory. In response to acute HFD we observe 1) increased palmitate in CSF; 2) impaired mitochondrial ETC activity in primary microglia (within 12 hours of HFD); and 3) improved learning and memory. The greatest barrier to proving how acute palmitate uptake in microglia improves learning and memory in vivo is the protracted methodology required for microglial isolation and purification. The timeframes and relatively harsh digestion protocols required are currently incompatible with metabolomic tracing and well beyond those required for most cell-types used for metabolomic investigation. We have tested and failed to obtain reproducible data across numerous in vivo protocols and finally settled on in vitro 13C-palmitate treated neonatal microglia as the best current option. Primary neonatal microglia are accepted as one of the current best culture models by the microglial community (Valdearcos, Cell Reports 2014; Kim, Cell Metab 2019). Using neonatal microglia we demonstrate that 13C-palmitate label is processed to palmitoylcarnitine (Fig 4C) and acetylcarnitine (Fig 4D), indicating that microglial fatty acid metabolism acts via the canonical CPT1/CPT2 pathway. These experiments highlight that microglia process palmitate via beta oxidation, generating acetyl-CoA and engaging the TCA cycle (Fig 4G-I).

      We now acknowledge these technical limitations more clearly and highlight their impact on any conclusions regarding adult microglia in vivo:

      Results “Microglia take up and metabolize free fatty acids”; 

      “Due in part to the long isolation times required to generate pure primary adult microglia, metabolite tracing experiments on primary adult microglia are not currently feasible. We therefore chose primary murine neonatal microglia as our model of choice for more mechanistic experiments (Valdearcos, Cell Reports 2014)”

      And,

      Discussion:

      “We propose that aMMR could result from direct uptake, processing, and release of fatty acid derived carbons, and demonstrate that microglia are capable of metabolizing fatty acids towards diverse intracellular and extracellular pools.”

      While acute ICV injection of a CPT1a blocker would be of potential interest, the caveats associated with CPT1a inhibition in other cell-types (neurons, astrocytes, etc) and with targeting the appropriate brain region (currently unknown) preclude the effective use of this approach to generate clear additional mechanistic insights. Similarly, given the time and resources required to generate, validate, optimize and experiment on a clean model of in vivo adult microglia-specific CPT1a knockout, this approach was deemed beyond the scope of this study. That said, the critique is important, and it should comprise a follow-up project.

      Comment: Another major weakness is that the authors also suggest that 3-day HFD microglial response (increase membrane potential) is likely driven by palmitate-induced increases in itaconate feedforward inhibition of complex II/SDH. Whilst this is an interesting hypothesis, the in vitro metabolic characterization is not entirely convincing.

      The reviewer is correct: we suggest that our data are consistent with a model where a palmitate-induced increase in itaconate inhibits complex II/SDH. While our findings do not comprise mechanistic proof, the hypothesis is supported by our Seahorse studies (Fig 2E) highlighting that a combined Palmitate + Succinate stimulation does not increase OCR beyond that of Palmitate alone; by primary microglial cell experiments highlighting that 3d-HFD treated adult primary microglia are refractory to succinate-induced mitochondrial membrane depolarization (Fig 2F); and by the identification of increased palmitate-induced itaconate production/release in cultured primary neonatal microglia (Fig 4H). The data are consistent with an inhibition of complex II/SDH and with increased itaconate secretion. They are also consistent with literature on more easily accessible myeloid lineages (Lampropoulou V, Cell Metab 2016).

      Comment: The authors suggest that acute palmitate appears to rapidly compromise or saturate complex II activity. Succinate is a membrane impermeable dicarboxylate. It can enter cells via MCT transporters at acidic pH. It is not clear that I) Succinate is taken up into microglia, II) If the succinate used was pH neutral sodium succinate or succinic acid, and III) If the observed changes are due to succinate oxidation, changes in pH or activation of the succinate receptor SUCNR1 on microglia. In the absence of these succinate treatments, there are no alterations in mitochondrial respiration or membrane potential following palmitate treatment, which does not support this hypothesis.

      We thank Reviewer #1 for highlighting a lack of information in the material and methods. We have updated them accordingly as follows:

      “For the electron transport chain experiments (ETC), the experiment was based on the Salabei et al. The cell suspension was incubated with the mitochondrial probe Tetramethylrhodamine TMRM (10mM; Abcam, Cat# ab228569) and fluorescent glucose analog 2-NBDG (Abcam, Cat# 235976) for 30min at 37degrees before FACS acquisition. For challenging the ETC, the cell pellet was resuspended in 500ul of warm MAS buffer solution + 1nM Plasma Membrane Permeabilizer (Agilent Seahorse XF PMP) in order to permeabilize the cells. Microglial cells were gated from CD45low-CD11b+ cells followed by singlet after forward and side scatter pattern. They were incubated each 90 seconds by the following drugs: 0,5ul of 100uM Rotenone (Sigma), 2ul of 2.5M Succinate adjusted to ph 7.4 with NaOH (succinic acid, Sigma) and 0.5ul of 1mM Antimycin (Sigma). Cytometry was performed on Fortessa (BD Bioscience) and analyzed with FlowJo v10 (Treestar).”

      With the updated protocol, we hope it is now clear that the succinate (a solution of succinic acid adjusted to pH 7.4) reaches the ETC directly, since the microglial cells were permeabilized with the Plasma Membrane Permeabilizer (Agilent Seahorse XF PMP).

      Comment: Intracellular itaconate measurements and quantification are lacking and IRG1 expression is not assessed. There also appears to be more labelled itaconate in neuronal cultures from control (BSA) microglia conditioned media, which is not discussed. What is the total level of itaconate in neurons from these conditioned media experiments? No evidence is provided that the in vivo response is dependent on IRG1, the mitochondrial enzyme responsible for itaconate synthesis, or itaconate. To causally link IRG1/itaconate, IRG1-deficient mice could be used in future work. 

      We appreciate the interest, the exciting question, and the suggested future experiment. Indeed, our results suggest a difference in metabolite release between the BSA treated-microglia and palmitate treated-microglia and their impact on neurons comprises a prime question for future work. We have highlighted this in the discussion as well as adding a comment regarding relative levels of labelled itaconate as follows:

      Results; Acute HFD induces widespread MMR and rapid modulation (…) memory  

      “As a control for the direct uptake of 13C-glucose, we treated parallel neuronal cultures with the same fresh 13C-glucose tracing media originally added to the microglia. Intriguingly, and consistent with literature documenting poor direct glucose utilization by neurons [29], we found substantial m+3 lactate (as well as other metabolites) in neurons treated with microglial conditioned media, and at levels that far exceeded labelling triggered by glucose tracer alone (Fig 5A, middle column vs left column)(Suppl Fig S5B). The data indicate higher uptake of citrate and itaconate from the control microglia-conditioned media, further supporting the hypothesis that neuronal metabolism is reproducibly impacted by palmitate-triggered changes in microglial products. These data demonstrate that palmitate metabolism by microglia modulates neuronal carbon substrate use in vitro, and, they highlight the relative importance of this process compared to uptake of pure glucose. The data identify a candidate mechanism by which aMMR may alter neuronal function in vivo.”

      Comment: While microglial DRP1 is causally implicated the role of palmitate is not convincing. Mitochondrial morphology changes are subtle including TOMM20 and DRP1 staining and co-localization - additional supporting data should be provided. Electron microscopy of mitochondrial structure would provide more detailed insight to morphology changes. Western blot of fission-associated proteins Drp1, phospho-Drp1 (S616), MFF and MiD49/51. Higher magnification and quality confocal imaging of DRP1/TOMM20. Drp1 recruitment to mitochondrial membranes can be assessed using subcellular fractionation.

      We appreciate the reviewer’s comment. Previous work by others, already cited elsewhere in our manuscript (PMCID: PMC7251564), has clearly demonstrated increased mitochondrial fragmentation and phosphorylated DRP1 in 3d HFD animals. This very specific result can therefore be considered confirmatory / validating of existing literature, and important for inclusion of DRP1 in our overall model. We have made sure to better highlight this important literature accordingly:

      Results; A rapid Microglial Mitochondria response to high fat diet

      “Consistent with the in vivo observations above, in vitro palmitate exposure decreased microglial mitochondrial length within 24 hours, indicating that fatty acid exposure itself is sufficient to trigger mitochondrial fission in a cell autonomous manner (Fig 2G upper panels). This result also confirms observations by Kim et al. who observed mitochondrial fission and DRP1 phosphorylation upon 3d-HFD treated mice [Kim JD et al, Microglial UCP2 mediates Inflammation and Obesity induced by High Fat feeding, Cell Metab 2019].”

      Comment: No characterization of primary microglia from DRP1-knockout mice is performed with palmitate treatment. Authors demonstrate an increase in both stearate and palmitate in CSF following 3day HFD. Only palmitate was tested in the regulation of microglial responses, but it may be more informative to test stearate and palmitate combined.

      Testing stearate and palmitate combined is an interesting and potentially more informative experiment for mimicking the global effect of HFD, which is highly enriched in these two saturated fatty acids. In vitro stimulation of microglia model cells has been previously published by Valdearcos et al. (Cell Reports 2014), who studied the effect of a mix of stearate and palmitate on mediobasal hypothalamus inflammation. Here, we build on their important findings by demonstrating that these 2 compounds are actually found in the CSF of 3d-HFD mice. Studies from other labs have also shown the presence of stearate and palmitate in the CSF of chronically obese and diabetic patients, which highlights the importance of these findings (Melo HM et al., Cell Reports 2020). While a systematic dissection of the roles of HFD-regulated CSF metabolites (including direct (diet containing) and indirect (secondary)) is beyond the scope of this study, this point is important, not least because it highlights less well-studied metabolites and the potential of possible combinatorial interactions. We have highlighted this idea in the results as follows:

      Results; A rapid Microglial Mitochondria response to high fat diet

      “To test whether these observed fatty acid changes in the CSF might directly trigger aMMR, we switched to an in vitro primary neonatal microglia model and examined the effects of the more abundant of these, palmitate (Fig S2A-B).”

      and, in the discussion as follows:

      “Studies have identified stearate and palmitate in the CSF of patients with chronic obesity and with diabetes, reports that highlight the importance of these findings (Melo HM et al. cell report 2020). While a systematic dissection of the roles of HFD-regulated CSF metabolites (including direct (diet containing) and indirect (secondary)) is beyond the scope of this study, they represent priority areas for future investigation, particularly given the wide-range of fatty-acid specific biological effects in the literature, and the potential for combinatorial interactions.” 

      Reviewer #1 (Recommendations For The Authors):

      Congratulations on this interesting and novel work. Please see public review for details on potential experiments. While I would not expect all the experiments to be performed for this current study, it’s important to not overstate what the data is showing. For example, there is no causal link between palmitate oxidation in microglia or released metabolites (itaconate etc) from microglia in the effect on systemic glucose metabolism or memory. To make such claims more supporting data would be required.

      We thank Reviewer #1 for their highly constructive critique.

      Reviewer #2 (Public Review):

      The study "A rapid microglial metabolic response controls metabolism and improves memory" by Drougard et al. provides evidence that short-term HFD has a beneficial effect on spatial and learning memory through microglial metabolic reprogramming. The manuscript is well-written and the statistics were properly performed with all the data. However, there are concerns regarding the interpretation of the data, particularly the gap between the in vivo observations and the in vitro mechanistic studies.

      In the PLX-5622 microglial depletion study, it is unclear what happened to the body weight, food intake, and day-night behavior of these mice compared to the vehicle control mice. It is important to address the innate immunity-dependent physiology affected by a long period of microglial depletion in the brain (also macrophages in the periphery). Furthermore, it would be beneficial to validate the images presented in Fig.1F by providing iba1 staining in chow diet-fed mice with or without PLX-5622 for 7-10 days. Additionally, high-quality images, with equal DAPI staining and comparable anatomical level, should be provided in both chow diet-fed mice and HFD-fed mice with or without PLX-5622 in the same region of hypothalamus or hippocampus. These are critical evidences for this project, and it is suggested that the authors provide more data on the general physiology of these mice, at least regarding body weight and food intake.

      We are grateful to Reviewer #2 for their constructive comments and for their time and effort; and for highlighting the lack of experimental details regarding the PLX-5622 microglial depletion study. We followed the protocol established in Feng et al JCI 2017. No adverse effects on body weight, food intake and day-night behavior have been described in this study as well as in other studies for longer treatment (Sonia George et al Molecular Neurodegeneration 2019). We didn’t observe any differences in body weight and the food intake within or between groups, upon PLX administration. These data have been included as new Supplementary Fig 6 A-B.

      The material and method was updated as follows:

      “Animals were administered PLX5622-containing diet for 7-9 days without observable impact on the body weight or food intake (Fig S6A-B), using protocols adopted from [Feng et al JCi 2017, Sonia George et al Molecular Neurodegeneration 2019].”

      Comment: It is also unclear whether the microglia shown in Fig.3A were isolated from mice 4 weeks after Tamoxifen injection. It is suggested that the authors provide more evidence, such as additional images or primary microglia culture, to demonstrate that the mitochondria had more fusion upon drp1 KO. It is recommended to use mito-tracker green/red to stain live microglia and provide good resolution images.

      We thank Reviewer #2 for pointing out the lack of detailed information about Fig 3A. Microglial cells were indeed isolated from mice after the tamoxifen injection to highlight the deletion. We updated the Material and methods with the text below:

      “For the colocalization experiment, microglia were isolated from 10 to 12-week old drp1ko mice and their littermate controls, immediately fixed in PFA and stained with DRP1 (diluted 1:50 Cell signaling; Cat#8570) and tomm20 antibodies (diluted 1:1000, SantaCruz; Cat#sc177615).”

      This experiment was performed as an additional control of the drp1 deletion from our knockout-mice. For this experiment we used Tomm20 since the microglia cells weren’t live after the addition of PFA. 

      Comment: Regarding the data presented in Fig.5A, it is suggested that the authors profile the metabolomics of the microglial conditioned media (and provide the methods on how this conditioned media was collected) to determine whether there was already abundant lactate in the media. Any glucose-derived metabolites, e.g. lactate, are probably more preferred by neurons as energy substrates than glucose, especially in embryonic neurons (which are ready to use lactate in newborn brain).

      With regard to Fig 5A, metabolomics of microglia conditioned media are provided as Fig 5A and Supp Figure 5B, and we provide a Supplementary Table 2.

      We thank Reviewer #2 for noting the lapse of technical detail. We updated the Material and methods with the following:

      “For conditioned media experiments, microglial cells were incubated with DMEM (Gibco) without lactate completed with BSA-conjugated palmitate or Control BSA. Conditioned media was collected after the incubation, centrifuged 15min at 300g (4oC) and the supernatant transferred and frozen in a fresh tube avoiding the cells and debris pellet. Sample were immediately snap frozen or use for the neurons incubation.”

      Any glucose-derived metabolites, e.g. lactate, are more preferred by neurons as energy substrates than glucose as described first in the literature by Prof. Pellerin and Prof. Magistretti via the astrocyte-neuron cooperation (PNAS 1994). Since their discovery, lactate has been explored and is well known as a key signaling molecule (Magistretti PJ Nat Rev Neurosciences 2018). We explored the role of lactate released from the microglia, and we demonstrated that it is taken up by neurons independently of any microglial pretreatment. This experiment highlights microglia as another lactate provider for the neurons (Fig 4N and Fig 5A). 

      Comment: Finally, it is important to address whether PLX-5622 affects learning and spatial memory in chow diet-fed animals. Following the findings shown in Fig 5J and 5K, the authors should confirm these by any morphological studies on synapse, e.g. by synaptophysin staining or ultrastructure EM study in the area shown in Fig 5I.

      We appreciate the comment and question. We performed the controls and included them now as Fig 5J and Fig S5 E-F-G. We do not observe any adverse effects of PLX5622 on learning and spatial memory in normal chow-fed animals. 

      While we were unable to study the synapses as requested, it is important to note that no changes are expected given publications from other labs using the same protocol (Feng X JCI 2017, Spangenberg E Nat Com 2019), or longer PLX5622 treatment (Niiyama T eNeuro 2023, Witcher KG J neurosciences 2021), all four of which did not find morphological differences at synapses.

      Reviewer #2 (Recommendations For The Authors):

      The authors should provide more evidence that palmitate is derived from HFD to prove that it mediates the HFD effects on the microglial mitochondria response. This could be done by adding 13C-palmitate into the HFD and performing metabolomics in isolated microglia from control mice (and Drp1-MG-KO mice, if possible).

      We thank Reviewer #2 for the enthusiastic review. Unfortunately, we were unable to attempt this final suggested experiment. We have adjusted our wording accordingly and appreciate the reviewer’s understanding.

      Reviewer #3 (Public Review):

      Drougard et al. explore microglial detection of a switch to high-fat diet and a subsequent metabolic response that benefits memory. The findings are both surprising and novel in the context of acute high-fat intake, with convincing evidence of increased CSF palmitate after 3 days of HFD. While the authors demonstrate compelling signs of microglial activation in multiple brain regions and unique metabolite release in tracing studies, they should address the following areas prior to acceptance of this manuscript.

      Major Points:

      (1) It appears that the authors perform key metabolic assays in vitro/ex vivo using primary microglia from either neonatal or adult mice, which should be more clearly delineated especially for the 13C-palmitate tracing. In the case of experiments using primary microglia derived from mixed glial cultures stimulated with M-CSF, this system relies on neonatal mice. This is understandable given the greater potential yield from neonatal mice, but the metabolic state and energetic demands of neonatal and adult microglia differ as their functional roles change across the lifespan. The authors should either show that the metabolic pathways they implicate in neonatal microglia are also representative of adult microglia or perform additional experiments using microglia pooled from adult mice, especially because they link metabolites derived from neonatal microglia (presumably not under the effects of acute HFD) to improved performance in behavioral assays that utilize adult mice.

      We thank Reviewer #3 for their constructive critique and encouraging words. As indicated, the 13C-palmitate experiments were performed with primary microglia derived from mixed glial cultures stimulated with M-CSF and we demonstrated our primary cultures were almost pure by the supplementary experiments (supp Fig2A and B). Additional minor details in these contexts have been added to the Material and Methods.

      The experiments focusing on the mitochondrial ETC were performed on sorted microglia from adult mice and parallels demonstrated with the neonatal cultures (the primary model for metabolic tracing). Compromised complex II activity under conditions of acute HFD/palmitate stimulation, for instance, was shown in both systems. Unfortunately, despite best efforts, attempts to run 13C-palmitate tracing experiments on primary adult microglia failed, attributable in large part to the long (~4 hour) and harsh microglial extraction and sorting process. These experiments will require substantial follow-up efforts including, ideally, the establishment and validation of an adult microglia-neuron co-culture model that faithfully recapitulates most aspects of in vivo metabolic cross-talk. This noble aim is beyond the scope of this study. We have made sure to temper the conclusions made in the manuscript and to not overstate the impact and interpretation of the in vitro work, including updating the following sentences.

      Results “Microglia take up and metabolize free fatty acids”; 

      “Due in part to the long isolation times required to generate pure primary adult microglia, metabolite tracing experiments on primary adult microglia are not currently feasible. We therefore chose primary murine neonatal microglia as our model of choice for more mechanistic experiments (Valdercaos cell Report 2014)”

      and Discussion:

      “We propose that aMMR could result from direct uptake, processing, and release of fatty acid derived carbons, and demonstrate that microglia are capable of metabolizing fatty acids towards diverse intracellular and extracellular pools.”

      Comment: The authors demonstrate that 3 days of HFD increases circulating palmitate by CSF metabolomics and that microglia can readily metabolize palmitate, but the causal link between palmitate metabolism specifically by microglia and improved performance in behavioral paradigms remains unclear. A previous body of research, alluded to by the authors, suggests that astrocyte shuttling of lactate to neurons improves long-term and spatial memory. The authors should account for palmitate that also could be derived from astrocyte secretion into CSF, and the relative contribution compared to microglia-derived palmitate. Specifically, although microglia can metabolize the palmitate in circulation, there is no direct evidence that the palmitate from the HFD is directly shuttled to microglia and not, for example, to astrocytes (which also express CX3CR1). 

      We appreciate the comment. Indeed, this issue highlights one of the greatest challenges for efforts aimed at tracing (beyond doubt) that a single minor cell population contributes towards metabolic cross-talk in vivo. Our experiments show: increased CSF palmitate levels within one feeding cycle of HFD; rapidly induced microglial metabolic activation (characterized by increased mitochondrial membrane potential and impaired complex II activity); and that microglia mount a comparable mitochondrial activation profile in vitro when exposed to palmitate. They show in vitro using neonatal microglia that microglia take up and metabolize palmitate; that they release metabolites with neuro-modulatory potential; that neurons take these metabolites up and modulate their function differentially when exposed to control vs palmitate-treated microglia-conditioned media (in the absence of astrocytes). The experiments show through acute PLX-induced elimination of microglia, however crude, that this compartment impacts the acute HFD response, and using conditional deletion, that full DRP1 expression is required in CX3CR1-CreERT2 targeted cells (primarily microglia deleting; Zhao et al 2019). While these experiments cannot rule out a contribution of astrocytes to the observations in vivo, comparable experiments rarely can, and we cannot rationalize why microglia should not have equal access to CSF palmitate for uptake or to neurons for substrate provisioning. We now better highlight this important issue, and temper our conclusions accordingly:

      “Tanycytes and astrocytes have both been documented to release select metabolites into the extracellular environment [33, 34]. While suggestive, the experiments highlighted here do not rule out a contribution of these or cell types in coupling acute HFD intake to memory and learning.”

      Comment: Thus, the Barnes Maze results could be attributed to multiple cell types. Furthermore, the evidence provided in Figure 5J is insufficient to claim a microglia-dependent mechanism without showing data from mice on HFD with and without microglia depletion (analogous to the third and fourth bars in panel K).

      Agreed. We appreciate the comment. We have now added the requested HFD condition to Figure 5J. The data support our previous interpretation of the data. 

      Comment: Given the emphasis on improved cognitive function, there is minimal discussion of the actual behavioral outcomes in both the results and discussion sections. The data that HFD-treated animals outperform controls should be presented in more detail both in the figure and in the text. For example, data from all days/trials of the Barnes Maze should be shown, including the day(s) HFD mice outperform controls. Furthermore, the authors should either cite additional literature or provide experimental evidence supporting the notion that microglia release of TCA-associated substrates into the extracellular milieu after HFD specifically benefits neuronal function cellularly or regionally in the brain, which could translate to improved performance in classical behavioral paradigms. The single reference included is a bit obscure, given the study found that increased lactate enhances fear memory which is a neural circuit not studied in the current manuscript. Are there no additional studies on more relevant metabolites (e.g., itaconate, succinate)?

      We agree. We have now re-plotted the behavioral test to better highlight that the HFD-treated animals outperform controls, as requested (Fig S7 and S8). We also added the requested literature. While we cannot be sure our search captured all relevant studies, we find a relative paucity of studies that characterize CSF metabolite changes in the context of acute high fat feeding or that demonstrate the ability of CSF substrates to convincingly improve memory and learning in vivo at physiological levels. Indeed, while simple, we feel the findings are of substantial novelty and highlight an area for significant future research. We have tempered our conclusions throughout and added to the discussion as follows:

      “Such substrate release could mediate the learning and memory effects that accompany aMMR; they are consistent with the data of other studies that have examined metabolite associations with learning and memory (itaconate [Morgunov IG, microorganisms 2020; Xiong J, Neuromolecular med 2023], succinate [Serra FT neurosciences letter 2022; Cline BH, BMC neurosciences 2012].”

      Minor Points:

      (1) In Figure 5J the latency to find the hole was noticeably higher (mean around 150s) than the latency in panel K (mean around 100s for controls, and 60s for Drp1MGWT on HFD). This suggests high variability between experiments using this modified version of the Barnes Maze, despite the authors assertion that a standard Barnes Maze was employed and the results were reproducible at multiple institutions. Why do Drp1MGWT mice on control diet find the escape hole significantly faster than WT mice on control diet in panel J? Given the emphasis on cognitive improvement following acute HFD as a novel finding, the authors should explain this discrepancy.

      We appreciate this question and comment. Indeed, as the reviewer knows, behavioral tests including the Barnes test show variation with genetic background, and with environment and context (e.g. age, caging density, litter size, behavioral state and more) (Inglis A, Physiol Behavior 2019; Loos M Mamm Genome 2015; and unpublished observations). We do not know the exact origin of the difference mentioned above, but our best guess would be that it stems from environmental differences that are ever present in vivaria (seasonal, mouse house room, cage-changing cycles, etc.), from differences between the background genetics (e.g. presence of Cre transgene and linked genome, genetic drift), or from precise experimental differences between the cohorts (e.g. repeated tamoxifen-injection paradigm for the deletion group). All of our experiments were performed in parallel, with all relevant animal groups equally represented in every run, and used age- and sex-matched individuals from congenic strains. Wherever possible, controls and test animals were littermates to minimize within-strain variance attributable to litter effects (litter size, maternal and paternal effects). Given our lab’s interest and focus on the mechanistic and developmental origins of variance heterogeneity, these differences are of high interest for future study.

      Comment: The authors highlight in the graphical abstract and again in Figure 4A the formation of lipid droplets following palmitate exposure as evidence that microglia can process fatty acids. They later suggest that a lack of substantial induction of lipid droplet accumulation suggests that microglia are metabolically wired to release carbon substrates to neighboring cells. Clarification as to the role of lipid droplet formation/accumulation in explaining the results would eliminate any possible confusion.

      We modified the wording in the manuscript accordingly:

      Results “Microglia take up and metabolize free fatty acids”;

      “Based on BODIPY fluorescence, we found that primary microglia increase lipid droplet numbers within 24h of in vitro exposure to palmitate (200uM; Fig 4A), demonstrating a capacity to take up fatty acids.”

      Comment: In many bar graphs showing relatively modest effects, it would be helpful to use symbols to also show the distribution of sample and animal replicates (especially behavioral paradigms).

      Agreed. Indeed, the results are both modest and impressive given the nature of the intervention (simple change in dietary macronutrient composition). We have now re-plotted the results from the behavioral experiments, accordingly (Fig S7 and Fig S8).

      Reviewer #3 (Recommendations For The Authors):

      This is a good manuscript deserving of publication assuming some of the concerns posed above are addressed.

      We thank Reviewer #3 again for their time, effort, and dedication, and for their objective review of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      All of the reviewers indicate that their major concerns have been adequately addressed, but they each have a few comments that the authors should consider before submitting a final version (without further review) for publication. For example, a statement about the sex of the mice used in the studies and whether any differences were noted if both sexes were used. The idea that the loss of glutamate transport might affect NA loading into vesicles is also worth considering. Finally, the authors might want to mention that the role of neuropeptide release from NA neurons needs further examination. 

      As noted in the prior submitted revision, all experiments contained both males and females and this was addressed in our re-submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed (The statement of no sex difference is in line 451-456). For the fate map and in situ experiments, although the group size is small, we did not see obvious differences in the expression patterns in the three glutamate transporters between females and males (line 347-350). All the anatomical and phenotypic data in this manuscript are presented as combined graphs (figure 1, figure 1 supplement 1, figure 2, figure 2 supplement 2, figure 4,5,6,7) and we had differentially labeled our data points by sex (female data is pink and male data is blue).

      The possibility that loss of Vglut2 might affect NA release has been added in the discussion (line 485-491) of the current revision. Dopamine Beta Hydroxylase (DBH) converts dopamine to noradrenaline in the vesicles, thus, glutamate may not directly affect noradrenaline loading into vesicles. However, since loss of Vglut2 reduced dopamine release in subsets of dopaminergic neurons, it remains possible that glutamate affects dopamine loading in NA neurons and in turn perturbs DA to NA conversion in the vesicle by DBH and subsequent noradrenaline release. Future work could examine this hypothesis using fast-scan cyclic voltammetry (FSCV) or microdialysis.

      The further examination of the role of neuropeptide release from NA neurons is mentioned in the discussion (line 491-494 and line 497-499 of the pre).

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of vesicular glutamate transporters from noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice. 

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study does not document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. The authors effectively recognize this issue and appropriately discuss their findings in this context. 

      We thank the reviewer for the positive evaluation of our work.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporters (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. The authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using a DBH-Cre; Vglut2cKO mice on breathing and control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex-specific differences. The reason I think this is particularly relevant is the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018).

      Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance. 

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis. 

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables. 

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation? 

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate? 

      We thank the reviewer for the positive evaluation and further suggestions. Please see our response in “Author Response” to the previous version of Reviewer #2 (Public review).

      Reviewer #4 (Public Review): 

      Summary:

      Although previous research suggested that noradrenergic glutamatergic signaling could influence respiratory control, the work performed by Chang and colleagues reveals that excitatory (specifically Vglut2) neurons is dynamically and widely expressed throughout the central noradrenergic system, but it is not significantly crucial to change baseline breathing as well as the hypercapnia and hypoxia ventilatory responses. The central point that will make a significant change in the field is how NA-glutamate transmission may influence breathing control and the dysfunction of NA neurons in respiratory disorders.

      Strengths:

      There are several strengths such as the comprehensive analysis of Vglut1, Vglut2, and Vglut3 expression in the central noradrenergic system and the combined measurements of breathing parameters in conscious unrestrained mice. 

      Other considerations:

      These results strongly suggest that glutamate may not be necessary for modulating breathing under normal conditions or even when faced with high levels of carbon dioxide (hypercapnia) or low oxygen levels (hypoxia). This finding is unexpected, considering many studies have underscored glutamate's vital role in respiratory regulation, more so than catecholamines. This leads us to question the significance of catecholamines in controlling respiration. Moreover, if glutamate is not essential for this function, we need to explore its role in other physiological processes such as sympathetic nerve activity (SNA), thermoregulation, and sensory physiology. 

      We thank the reviewer for the positive evaluation and further suggestions. The potential role of noradrenergic-derived glutamate in other processes, which is beyond the scope of this study, should be addressed in the future.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      All of my concerns were effectively resolved, leading me to accept the paper. However, I suggest that the authors consider investing in a more reliable system for measuring body temperature, as accurate measurements of this parameter are crucial for whole body plethysmography. 

      Thank you for the suggestion. The real-time measurement of body temperature is a goal in future studies.

      Reviewer #4 (Recommendations For The Authors):

      Because I am reviewing a revised version, I believe the authors have addressed most, if not all, of the concerns already raised by three reviewers. In my understanding, the authors achieved their aims and the conclusions are fully supported by the results. The impact of this work on the respiratory field is significant and is likely to advance the field. The methods and data utilized, which combine standard techniques with genetic tools, will be highly beneficial to the research community. 

      I still have one concern: if glutamate is not critical, then what is? Could we potentially disable the noradrenergic (NA) system while preserving glutamate functionality to determine if the NA system is indeed crucial for respiratory physiology? This approach might provide clearer insights into the mechanisms underlying respiratory control. 

      We agree that there remain several exciting questions about the respective roles of noradrenaline, glutamate, and neuropeptides such as Neuropeptide Y (NPY) and galanin. We are currently devising strategies to address the respective and combinatorial roles of all these candidates in breathing control. Most simply, we can conditionally mutagenize each of them in the central noradrenergic system in an acute manner using DBH-CreER mice to determine whether any of them are critical to respiratory control, with the advantage of minimizing developmental compensatory events.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors evaluated a novel eIF2B activator, DNL343, in two mouse models representing different forms of the integrated stress response (ISR). They first assessed the pharmacokinetics of DNL343, demonstrating its ability to cross the blood-brain barrier and exhibit good bioavailability. In an acute ISR model induced by optic nerve crush (ONC) injury, DNL343 treatment reduced ISR-induced transcriptional changes and neuronal loss, demonstrating neuroprotective effects. Next, the authors generated an eIF2B loss-of-function mouse model by knocking in disease-causing Eif2b5 variants. The model presents a chronic ISR and mimics vanishing white matter disease (VWMD). DNL343 treatment from the pre-symptomatic stage improved body weight and motor functions, corrected transcriptional changes, and reversed proteomic and metabolomic alterations in the brain and cerebrospinal fluid. DNL343 treatment initiated at an advanced disease stage also showed positive effects, restoring body weight gain, suppressing ISR, reducing neurodegeneration biomarkers, and extending lifespan. These findings highlight DNL343 as an effective ISR inhibitor with potential applications in treating VWMD and other neurodegenerative disorders involving ISR.

      Strengths:

      The study's findings regarding the novel compound DNL343 offer significant promise in addressing VWMD, a condition currently lacking disease-modifying treatment. DNL343 directly targets eIF2B, the disease-causing complex in VWMD, and demonstrates notable efficacy in reversing the integrated stress response (ISR) and mitigating neurodegeneration in a VWMD mouse model. These results raise hope for the potential application of DNL343 in VWMD treatment, a development eagerly anticipated by patients and the VWMD research community. Moreover, the study hints at the broader potential of DNL343 in treating other ISR-related neurodegenerative disorders, such as amyotrophic lateral sclerosis, a prospect that holds broader interest. Additionally, the study's identification of potential biomarkers for VWMD represents a notable strength, potentially leading to improved disease progression assessment pending further confirmation in future research.

      Weaknesses:

      There are a couple of notable concerns in this study. Firstly, while the in vivo evidence strongly supports the efficacy of DNL343 in mitigating ISR and neurodegeneration, there is a lack of direct biochemical evidence to confirm its activity in eIF2B activation. Secondly, the potential for cardiovascular toxicity, which has been reported for a related eIF2B activator in a canine model (as mentioned in the manuscript), has not been evaluated for DNL343 in this study. This data gap regarding toxicity could be crucial for informing the future development of DNL343 for potential human use. Further investigation into these areas would be valuable for a comprehensive understanding of the compound's mechanisms and safety profile.

      We thank the reviewer for the thoughtful feedback and an opportunity to provide further clarification. To address the first question regarding biochemical evidence of the mechanism of action of DNL343, we agree that additional data are helpful for interpreting the results presented in this manuscript. We now include a citation to Craig et al (Craig, R.A., 2nd, J. De Vicente, A.A. Estrada, J.A. Feng, K.W. Lexa, M.J. Canet, W.E. Dowdle, R.I. Erickson, B.N. Flores, P.C.G. Haddick, L.A. Kane, J.W. Lewcock, N.J. Moerke, S.B. Poda, Z. Sweeney, R.H. Takahashi, V. Tong, J. Wang, E. Yulyaningsih, H. Solanoy, K. Scearce-Levie, P.E. Sanchez, L. Tang, M. Xu, R. Zhang and M. Osipov (2024). "Discovery of DNL343: A Potent, Selective, and Brain-Penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases." J Med Chem.) which includes the full details on the discovery and characterization of DNL343.

      On the question of cardiovascular toxicity observed with previous eIF2B activating compounds, Craig et al also provide evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1B double-blind (NCT05006352) trials in healthy participants and participants with ALS, respectively, and these trials are referenced on page 4, lines 102-103. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

      Reviewer #2 (Public Review):

      Summary:

      The authors developed DNL343, a CNS-penetrant small molecule integrated stress response (ISR) inhibitor, to treat neurodegenerative diseases caused by ISR.

      Strengths:

      DNL343 is an investigational CNS-penetrant small molecule integrated stress response (ISR) inhibitor designed to activate the eukaryotic initiation factor 2B (eIF2B) and suppress aberrant ISR activation. The therapeutic efficacy of DNL343 has been extensively characterized in two animal models. Importantly, plasma biomarkers of neuroinflammation and neurodegeneration can be reversed with DNL343 treatment. Remarkably, several of these biomarkers show differential levels in CSF and plasma from patients with vanishing white matter disease (VWMD) upon DNL343 treatment. Overall, this is a very exciting study to target ISR for therapeutic interventions.

      Weaknesses:

      My main questions center around the characterization of DNL343.

      (1) Is there any biochemical evidence showing DNL343 activates eIF2B, such as binding assays or in vitro biochemical activity assays? A conference presentation was cited - "Osipov, M. (2022). Discovery of DNL343: a Potent Selective and Brain-penetrant eIF2B Activator Designed for the Treatment of Neurodegenerative Diseases. Medicinal Chemistry Gordon Research Conference. New London, NH." However, there needs to be public information about this presentation.

      Information from this presentation and more details on the discovery and characterization of DNL343 can be found in Craig et al J Med Chem (2024) and this citation has been replaced.

      (2) How was the selectivity of DNL343 demonstrated? What are the off-targets of DNL343, in particular when DNL343 is administered at a high dose? Thermal proteome profiling or photoaffinity labeling experiments could be considered.

      Please see Craig et al J Med Chem (2024) for full details. In brief, there were no significant off target effects observed for DNL343 in a Cerep panel.

      (3) What are the total drug concentrations in the brain and plasma? What are the unbound ratios?

      Following a single oral dose of DNL343 in mice, unbound brain-to-unbound plasma exposure ratios (Kp,uu) of 0.8 to 1.1 were observed, indicating high CNS penetrance. This was further supported by a CSF-to-unbound plasma exposure ratio of 0.9 in the same mouse study. The CNS penetrance was also confirmed in rats and NHP by CSF-to-unbound plasma ratios near unity, as reported in Craig et al J Med Chem (2024).

      (4) If DNL343 is given intravenously, what are the concentrations in the brain and plasma after 5 minutes and 1 hour or longer time points? In other words, does DNL343 cross BBB through passive diffusion or an active process?

      Unbound brain-to-unbound plasma exposure ratios following a single oral dose in the mouse were 0.8 to 1.1 and showed no time dependence. These measurements were made prior to, near, and following plasma tmax of DNL343, indicating that unbound DNL343 crosses the BBB through passive diffusion and rapidly reaches equilibrium between the brain and systemic circulation. Details can be found in Craig et al J Med Chem (2024).
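      The unbound exposure ratios quoted in the responses above follow the standard definition of Kp,uu: the unbound brain exposure divided by the unbound plasma exposure. The sketch below is illustrative only and is not the analysis used in the study. The function name, the fraction-unbound inputs, and all numerical values are hypothetical assumptions chosen so that the arithmetic is easy to follow.

```python
# Minimal sketch of the Kp,uu calculation (hypothetical values, not study data).
# Kp,uu = (total brain exposure * fraction unbound in brain)
#         / (total plasma exposure * fraction unbound in plasma)

def kp_uu(total_brain, total_plasma, fu_brain, fu_plasma):
    """Return the unbound brain-to-unbound plasma exposure ratio."""
    unbound_brain = total_brain * fu_brain
    unbound_plasma = total_plasma * fu_plasma
    return unbound_brain / unbound_plasma

# Hypothetical total exposures (arbitrary but matching units) sampled before,
# near, and after plasma tmax, with assumed fractions unbound.
FU_BRAIN = 0.05    # assumed fraction unbound in brain homogenate
FU_PLASMA = 0.01   # assumed fraction unbound in plasma
samples = [
    {"time_h": 1, "brain": 100.0, "plasma": 500.0},
    {"time_h": 4, "brain": 200.0, "plasma": 1000.0},
    {"time_h": 8, "brain": 150.0, "plasma": 750.0},
]

ratios = [kp_uu(s["brain"], s["plasma"], FU_BRAIN, FU_PLASMA) for s in samples]

# A ratio close to 1 that does not drift across time points is what one would
# expect for passive diffusion with rapid brain-plasma equilibration.
spread = max(ratios) - min(ratios)
```

      In this toy data set every time point gives the same ratio, so `spread` is essentially zero; with real measurements one would instead ask whether the ratios stay within a narrow band around unity.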

      (5) What is the complete PK profile of DNL343 for intravenous and oral dosing?

      DNL343 administered orally to mice as a suspension formulation showed plasma PK consistent with prolonged absorption with tmax ranging from 3 to 4 h, and a terminal elimination half-life (t1/2) of ~10 h. Details can be found in Craig et al J Med Chem (2024).

      (6) Are there any major drug metabolites that could be of concern?

      DNL343 metabolism is through Phase 1 biotransformation pathways. None of the in vivo circulating metabolites show potency towards eIF2B activation. Given that none of these metabolites are of concern, we believe this information is beyond the scope of the current manuscript.

      Reviewer #3 (Public Review):

      Summary:

      ISR contributes to the pathogenesis of multiple neurodegenerative diseases, such as ALS, FTD, VWMD, etc. Targeting ISR is a promising avenue for potential therapeutics. However, previously identified ways to target ISR present some challenges. PERK inhibitors suppress ISR by inhibiting eIF2alpha phosphorylation and cause pancreatic toxicity in mice. In order to bypass eIF2alpha, previous studies have identified ISR suppressors that target eIF2B, such as ISRIB and 2BAct. These molecules suppress neurodegeneration but do not cause detrimental effects in mouse models. However, ISRIB is water-insoluble, and 2BAct causes cardiovascular complications in dogs, preventing their use in clinics. Here, the authors showed that DNL343, a new ISR inhibitor targeting eIF2B, suppresses neurodegeneration in mouse models. Combined with their previous results of a clinical phase I trial showing the safety of DNL343, these findings suggest the promise of DNL343 as a potential drug for neurodegenerative diseases in which ISR contributes to pathogenesis.

      Strengths:

      The finding is important and has disease implications, and the conclusion is not surprising.

      Weaknesses:

      The experimental design and data are hard to comprehend for an audience with a basic research background. This reviewer suggests that the authors use the same way that previous studies on ISRIB and 2BAct (e.g., Wong et al; eLife, 2019) designed experiments and interpret data.

      We thank this reviewer for their feedback and recognition that DNL343 has promising potential as a treatment for neurodegenerative diseases. While our studies share some similarities with Wong et al., eLife (2019) and Abbink et al., ACTN (2019), our study design is intentionally distinct (e.g. inclusion of both prevention and treatment dosing paradigms, determining the dose-response impact of drug treatment across biomarkers), which necessitates tailored data visualization to effectively communicate our findings. However, we understand the importance of clarity for a broader audience and, to this end, we have made a number of changes to the data figures, in particular data from omics experiments in Figures 3 and 5. We also provided additional supplemental tables to aid data interpretation. We hope these changes cater both to audiences familiar with previous work and to those with a less specialized background.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Demyelination is a significant pathological feature in the VWMD mouse model. The authors should clarify whether they observed similar demyelination in their study and if DNL343 had any impact on reversing this demyelination. These findings are crucial for assessing the compound's effectiveness in mitigating neurodegeneration.

      Demyelination is indeed an important feature in the eIF2B LOF (VWMD) mouse model. Given that this phenotype, and the ability to rescue the histological phenotype with this MOA (Wong et al; eLife, 2019, cited in introduction), is very well characterized, and given the limited size and number of mouse tissue samples available to us, we prioritized non-histological targeted and unbiased analyses aimed at identifying translatable biomarkers. Nonetheless, the totality of our data, in different mouse models and cell types, strongly supports DNL343 as a potent ISR inhibitor that is effective in attenuating neurodegeneration:

      · In the optic nerve crush model, DNL343 dose-dependently reduced retinal cell degeneration

      · In the VWMD mouse model, DNL343 attenuated the increase in a plasma biomarker of neurodegeneration, neurofilament-light, which corresponded to normalization in motor function.

      · Metabolomic and lipidomic analyses in the VWMD mouse model brain showed increases in oxysterols, such as 7-ketocholesterol, and cholesterol esters and these lipids are associated with demyelination (Nugent et al, 2020). DNL343 treatment attenuated the levels of these oxysterols, indicating decreased demyelination.

      · When initiated at an advanced disease stage, reversal of plasma biomarkers of neurodegeneration (Nf-L) and neuroinflammation (GFAP) by DNL343 in this model was accompanied by extension in the lifespan that is otherwise shortened as the mutant animals succumb to disease.

      These data highlight the potential therapeutic benefits of DNL343 in the broader context of ISR-mediated neurodegeneration which can include but may not be limited to VWMD.

      (2) Figure 6 presents several biomarkers with significantly increased levels in VWMD mice and patient biofluids. However, these biomarkers are not reflected in the brain proteomics data presented in Figure 3. The discrepancy between these findings should be addressed and discussed in the manuscript to provide a more comprehensive understanding.

      Proteins detected in Figure 6 were not detected by TMT proteomics in the CSF. In the brain, only GFAP was detected, and its overall abundance in tissue was similar in both genetic groups. Cytokines such as TIMP1 and MCP1 are usually present in low abundance and are therefore challenging to detect with the broad discovery proteomics method applied in this study. Antibody-based immunoassays are better suited to specifically measuring low-abundance proteins than mass-spectrometry-based proteomics, while mass-spectrometry-based methods offer a wider dynamic range for detecting more highly abundant proteins. Differences in detection sensitivity between immunoassays and mass spectrometry assays have been previously noted (Petrera et al, J Proteome Res, 2021). We have added new text to address this point in the revised manuscript (page 7, line 274-277).

      (3) Figure 7 discusses the effects of DNL343 treatment initiated at an advanced disease stage. Since the 4-week treatment did not rescue performance in the balance beam test (as shown in Figure 6A), it is important to clarify if a 20-week treatment had any impact on this parameter.

      This reviewer raised an important question that we were unfortunately unable to test. When the balance beam training was administered after 8 (out of 20) weeks of dosing, most animals of both wild-type and mutant genotypes struggled to remain on or maintain balance on the beam and were unable to progress to traversing the beam, making the assay unsuccessful in this cohort. This impairment appeared to be driven by distinct factors in the two genotypes: age-associated obesity in wild-type animals and severe motor impairment in the eIF2B HOM mice, irrespective of treatment. While it is possible that other less demanding and more sensitive assays could reveal more nuanced differences, this, and our earlier data (Figure 4G-I), suggest that DNL343 could prevent but not reverse functional deterioration. This is in line with our understanding of the DNL343 mechanism of action, which does not include neuronal regeneration, a therapeutic effect that is likely required for functional recuperation. We have added this point to the manuscript (page 8, line 319-326).

      Additionally, considering the significant increase in Gdf15 levels in the disease model, it would be valuable to know if DNL343 treatment affected Gdf15 levels. If these assays were conducted, reporting the data would greatly assist in evaluating the compound's efficacy when administered at an advanced disease stage.

      We were not able to measure GDF15 levels in the 20-week study because the limited in-life plasma samples were dedicated to assessing biomarkers of neurodegeneration (Figure 7E-F). However, data from our 4-week treatment study, which was initiated at a similar age range to the 20-week treatment study (19-26 and 24-33 weeks of age, respectively), showed that DNL343 was able to reduce GDF15 levels in the brain (mRNA and protein) and CSF (protein) (Supplemental Figure 5A-C), suggesting that DNL343 reduces ISR activation at an advanced disease stage in the model. We expect that this reduction observed at 4 weeks of treatment would persist for the duration of the extended treatment in the 20-week cohort.

      (4) A minor point. In Figures 5A, 5C, and 5E, it appears that the red-colored group should likely be labeled as "HOM 0 mg/kg" instead of "HOM 3 mg/kg".

      This has been amended, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      (1) The cellular function of DNL343 needs to be clarified. The authors claim that it activates eIF2B, but no cellular or molecular evidence is provided. Does it bind to eIF2B? Does it not affect eIF2alpha phosphorylation? Does it restore translation upon stress that causes eIF2alpha phosphorylation? Does it suppress stress granule assembly? The authors cited Sun, Tsai et al. 2023 and Osipov et al., 2022. However, these citations are conference abstracts with no published figures available for review.

      We agree that additional data outlining the biochemical evidence of the mechanism of action of DNL343 was needed. We now include a citation to Craig et al J Med Chem (2024) that includes the full details on the discovery and molecular characterization of DNL343.

      (2) It needs to be clarified how the authors selected the ISR marker genes. ISR genes are more than those selected. How about others? How did the authors measure the mRNA levels, bulk RNA-seq or RT-PCR? If the former, have the authors verified their results using RT-PCR? Have the authors measured the protein levels for nerve crush experiments (by both proteomic and individual protein analyses)? Also, no statistical analyses were found for the heat maps.

      The ISR marker genes were selected by a combination of experimental and literature data. Transcriptomics analysis of the eIF2B HOM brains was conducted using untargeted RNAseq (Supplemental Figure 1B). Here, we found an enrichment of transcripts previously reported to be ISR dependent, namely Atf4, Chac1, Ddit3, Eif4ebp1, Ppp1r15a (Larhammar et al., 2017), Atf3, Asns, Mthfd2, Psat1, Sesn2, Slc1a5, Slc7a5, Slc7a11, Trib3 (Wong et al., 2019, Abbink et al., 2019). These transcripts were assayed using targeted qPCR in the eIF2B HOM brains, spleen and PBMC (Supplemental Figure 1A, C, D) and in the retinas from the ONC experiments (Figure 2C). We have further clarified the analysis method for the gene expression data in the figure legends.

      We did not interrogate the proteome of the retina in the ONC model. Our study in this model was intended as a proof-of-concept evaluation of DNL343 effects in this acute ISR-dependent model of neurodegeneration. To this end, we performed gene expression (Figure 2C) and immunofluorescence analyses (Figure 2D-F). Each of these analyses was conducted using dedicated whole retinas; conducting additional protein analyses would necessitate a separate cohort of animals.

      We believe that heatmaps provide the best visualization of the data, particularly the dose dependent effects of DNL343 on multiple genes, but we understand the value for also providing statistical analyses. To address this, we provide additional Supplemental tables to show the outcome of statistical analyses undertaken. Statistical data relating to Figure 2C can be found on new Supplemental Tables 1 & 2; those relating to Supplemental Figures 1A, C, and D on new Supplemental Tables 3, 5, 6, respectively; that from Figure 4D on new Supplemental Table 8, and that from Figure 7D on new Supplemental Table 11.

      (3) Both the authors and Wong et al. (eLife, 2019) performed transcriptomic analyses on HOM mice. How do the authors compare the two data sets? Are they the same?

      In this work, a transcriptomic approach was applied to confirm induction of the ISR in our in vivo model. While the data are not identical, all of the top annotated genes shown in supplementary figure 1B were also deemed to be significant by Wong and coworkers (Bayes factor > 10). More importantly, as explained in our response to question #2 from reviewer 3, the ISR genes highlighted in supplementary Figure 1B were also confirmed in two other studies (Larhammar et al., 2017, Abbink et al., 2019). These data support our interpretation that eIF2B HOM mice have an elevated ISR relative to WT mice. We have added new text to line 164 on page 5 to clarify this point.

      (4) Can the authors interpret their omic data using volcano plots for HOM rescue experiments, as Wong et al. did in eLife 2019? Heat maps with statistical analyses are more straightforward to comprehend. Can the authors verify some of these data using RT-PCR, Western blot, etc.?

      We added additional pathway interpretation in our Figures 3 and 5 to highlight key biological processes altered in the brain and the cellular compartment origin of CSF proteins changed in eIF2B HOM at baseline and following treatment with DNL343. Our treatment design employed multiple dosing levels and, as such, summarization by volcano plot would have resulted in the creation of many figures whose content can be more easily captured by a single heat map. However, to provide additional quantitative information, we have now added supplementary tables showing full statistical analysis for all heat maps for added clarity and transparency.

      We demonstrated 100% correlation between the select genes we examined by qPCR in supplemental Figure 1A and those identified from brain by RNA-seq. In addition, the question of the reliability of RNA-seq data has previously been examined in great detail (Everaert et al, Sci Rep 2017), which found ~85% concordance between RNA-seq and qPCR data; the genes that were discordant tended to have < 2 log2FC and were present in low abundance. Given that the top core ISR genes identified in our study have >2 log2FC and have been verified by other independent labs (Larhammar et al., 2017, Abbink et al., 2019, Wong et al., 2019), we do not think that there is a rational need for technical confirmation of the RNAseq data.

      Risks for mis-annotation of proteins in TMT data were further mitigated by removing proteins with coverage < 20% and fewer than 8 unique peptides detected and by setting the protein annotation FDR to <1%.

      Additionally, TMT-labelling based proteomics offers wider dynamic range and sensitivity than western blotting. Validation of TMT logFC data with the western blot technique, which is less quantitative and has a lower dynamic range of detection, may not be very informative. Furthermore, similar trends of changes in key ISR genes and proteins shown in figures 4D and 5A (e.g., PSAT, SLC7A11, SLC7A5) provide additional support for the authenticity of proteins identified in this work.

      Also, for Figures 4E and F, it is assumed that each line represents an individual animal, but why are their body weight gains so different for the wild type? Can the authors plot the mean and s.e.m.? Also, there are no data about neurodegeneration. The authors need to show microscopy images, count the numbers, and assess the morphology of nerve cells.

      The large data spread in the body weight gain in our wild-type mice reflects the normal variability of this endpoint, which can be influenced by sex and age. Indeed, both factors are present in our cohorts, as animals of both sexes were included and there was a 7-week age range (10-17 weeks of age at dosing start). Each line in Figures 4E-F indeed represents data sampled from an individual animal over time. We chose to represent the data this way for transparency and have provided additional visualization (new Supplemental Figure 3) showing both body weight gain and plasma Nf-L levels as mean ± SEM, as requested by this reviewer.

      In this study we chose to use a clinically-relevant biomarker of neurodegeneration, plasma neurofilament light chain (NfL) (Figure 4F). This allowed us to prioritize the tissue samples from these studies to execute comprehensive unbiased analyses for more complete characterization of the phenotype of these eIF2B LoF mice. NfL is a biomarker that has been recognized as a sensitive measurement of neuronal/axonal damage regardless of cause (Gaetani et al., 2018, Khalil et al., 2018). Elevated plasma (and CSF) NfL levels have been demonstrated across neurodegenerative conditions such as Alzheimer’s disease (Giacomucci et al., 2022), multiple sclerosis (Ferreira-Atuesta et al., 2021), and ALS (Huang et al., 2018).

      (5) How is ISR connected to metabolomic changes? Can the authors explain it?

      ISR caused significant increases in the transcript and protein abundances of amino acid transporters and serine/glycine/1-carbon metabolism enzymes, as highlighted in Figure 3A and C and lines 237-255 in the main text. Similar patterns were also observed in prior published studies (Larhammar et al., 2017, Abbink et al., 2019, Wong et al., 2019). Consistent with these changes, we observed increased levels of alanine (transported by SLC3A2, SLC7A11, SLC7A3) and decreased cystathionine levels (associated with increased expression of CTH). ATF4 is one of the main orchestrators of the ISR response to stress (e.g., amino acid deprivation) and it is required for expression of amino acid transporters and of enzymes required for synthesis of non-essential amino acids (PMID: 28494858). ATF4 increases cellular amino acid uptake and delivers the amino acids needed for synthesis of proteins and of the glutathione needed for survival.

      We also observed prominent changes in cholesterol esters (CE) in eIF2B HOM and their normalization with DNL343 treatment, shown in Figure 5C. We checked for changes in expression levels of CEL, CES1, LCAT, LIPA, SOAT1, and NCEH1, proteins involved in CE metabolism, and failed to detect any changes in protein or RNA abundances. This suggests that rapid demyelination is a more likely trigger for CE accumulation, as reported in FTD-GRN (Marian OC et al., 2023 Acta Neuropathol Commun 11, 52) and in experimental demyelination models (Nugent AA et al., 2020 Neuron). We have added new text to the discussion section of the manuscript (page 9, lines 408-411) to discuss how these results relate to each other.

      (6) It is hard to understand the biomarker part. The authors said "potential translational biomarkers are elevated..." Do the authors mean they are elevated so they can be potential biomarkers? If their levels are unchanged (e.g., TIMP-1), how can they be biomarkers? Also, this part needs a conclusion/summary. Also, what does "reversed biomarkers..." mean?

      We have modified the text to clarify and included a concluding sentence for this section of the results (page 7, lines 297-299). In assessing whether a given protein could be a potential translational biomarker for human disease, we evaluated whether the following two conditions were met: (1) increased or decreased gene expression or protein levels of the biomarker in the brain or biofluids (CSF or plasma) of Eif2b5 R191H homozygote mice relative to wild-type controls that are modulated or normalized by administration of DNL343, and (2) protein levels in biofluids from VWMD patients that differ from those of healthy controls in the same direction as seen in the mouse model. GDF-15, GFAP, and NfL meet these criteria, but TIMP-1 and MCP-1 do not.

      Minor concerns:

      (1) Please explain which multiple comparison tests the authors used.

      This information has been further clarified in the figure legends.

      (2) Administrating the drug at an advanced stage led to a trend of NfL reduction but did not rescue function. Can the authors discuss what this means?

      Further elaboration and discussion about this finding have been added to the results section on page 8, line 319-325.

      (3) For statistical analyses on the bar graphs, it would be better if the authors labeled the comparison pairs on the graphs.

      We agree that labelling comparisons in bar graphs could aid the readership and have added this modification. Additionally, comparisons are indicated in the figure legend.

      (4) The authors need to state clearly that 2BAct's cardiovascular toxicity was observed in dogs, not mice. The current study does not exclude similar DNL343 toxicity. However, previous clinical trials suggest that DNL343 may be safe for humans.

      The suggestion to specify cardiovascular toxicity in dogs has been added (page 3, line 101), thank you. We now include a citation to Craig et al J Med Chem (2024) that provides evidence in a non-human primate (cynomolgus monkey) model that DNL343 dosing did not result in QT prolongation or any functional cardiac changes. We have also completed Phase 1 (NCT04268784) and Phase 1B double-blind (NCT05006352) trials in healthy participants and participants with ALS, respectively, and now include reference to these trials on page 4, lines 102-104. The safety profile observed in these clinical studies supported further development of DNL343 for ALS in the Healey Platform trial (NCT04297683, Regimen G).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank Reviewer #1 for the assessment of our study.

      Reviewer #2:

      “The authors should use DF/F to quantify over time the calcium response in photoreceptors. Furthermore, they should show that there is no concern of motion artifact when the pressure changes - as it could be a concern.”

      We used the ΔR/R measure (as defined in Böhm et al. 2016) to correct for motion artifacts due to the larvae moving out of the focal plane at the onset of pressure stimulation. This measure calculates the ratio of the GCaMP signal and a reference fluorescent signal (tdTomato in our case). This ratiometric quantification can better correct for changes in fluorescence that are not related to changes in calcium concentration than the ΔF/F metric, which does not use an independent reference channel.
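      The difference between the two metrics can be illustrated with a minimal sketch. It assumes a simple form of the ratiometric measure (ratio of the GCaMP signal to the reference signal, normalized to the mean pre-stimulus ratio); the exact definition in Böhm et al. 2016 may include further steps such as background subtraction. The function names and toy traces are hypothetical and are not taken from the study.

```python
def delta_f_over_f(gcamp, baseline_frames):
    """ΔF/F: change in GCaMP fluorescence relative to its own baseline."""
    f0 = sum(gcamp[:baseline_frames]) / baseline_frames
    return [(f - f0) / f0 for f in gcamp]


def delta_r_over_r(gcamp, reference, baseline_frames):
    """ΔR/R: change in the GCaMP/reference ratio relative to the baseline ratio."""
    if len(gcamp) != len(reference):
        raise ValueError("both channels must have the same number of frames")
    # Frame-by-frame ratio of calcium-sensitive to calcium-insensitive signal.
    ratio = [g / r for g, r in zip(gcamp, reference)]
    # Baseline ratio R0: mean ratio over the pre-stimulus frames.
    r0 = sum(ratio[:baseline_frames]) / baseline_frames
    return [(r - r0) / r0 for r in ratio]


# Toy traces (assumed values): a focus shift at frame 3 halves both channels
# although the calcium level is unchanged.
gcamp = [100.0, 100.0, 100.0, 50.0, 50.0]
tdtomato = [200.0, 200.0, 200.0, 100.0, 100.0]

dff = delta_f_over_f(gcamp, baseline_frames=3)
drr = delta_r_over_r(gcamp, tdtomato, baseline_frames=3)
print(dff)  # shows an apparent drop caused only by the focus shift
print(drr)  # stays flat because the reference channel dims by the same factor
```

      In this toy case the focus shift scales both channels equally, so it cancels in the ratio, whereas ΔF/F reports it as a change in signal.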

      “The authors have not shown

      (1) how the off response to decrease of pressure is mediated

      (2) which receptor/channel mediates in photoreceptors the response to increased pressure,

      (3) nor how light and pressure information is integrated by photoreceptors in order to guide the behavior of the larvae.

      These points are beyond the scope of the study. However, if possible within a short time frame, it would be really interesting to find out whether conflicting stimuli or converging stimuli (light & pressure) can cancel each other out or synergize. In particular since the authors cite unpublished results in the discussion: "Our unpublished results indeed suggest that green light determines the direction of swimming and can override upward swimming induced by pressure, which only influences the speed of swimming (LABC and GJ, unpublished)." Showing in one panel this very cool phenomenon would be exciting & open tons of questions for the field.”

      We agree that investigating the interaction of light and pressure is a very exciting direction. However, doing it properly with the rigour we characterised pressure sensation here (across stages, pressure levels and genotypes) and phototaxis and UV avoidance in previous work (across stages, wavelengths, genotypes and stimulus direction; see Randel et al. 2014, Gühmann et al. 2015, Verasztó et al. 2018, Jokura et al. 2023) would require a separate in-depth study.

      We agree with points 1-3 regarding the limitations and mentioned these in the discussion.

      (1) Although we carried out pressure-release experiments to characterise in more detail the response to pressure OFF, our setup did not allow us to control pressure release as accurately as we could for pressure increase. Therefore, we decided not to address this aspect of the response in more detail in this study.

      “Upon a decrease in pressure, three-day-old (but not two-day-old) larvae also show an off-response characterised by downward swimming. We have not analysed in detail the neuronal mechanisms of this response but it may depend on an inverted activation of the cPRC circuit, as happens during UV avoidance (Jokura et al., 2023)”

      (2) We decided not to explore this important question in this study, due to the significant effort it would take to test the expression and function of potential candidate channels in the pressure transduction mechanism. “The cellular and molecular mechanisms by which cPRCs sense and transduce changes in hydrostatic pressure deserve further enquiry.” and “The molecular mechanisms of pressure detection remain unclear. Components of the phototransduction cascade may be involved in pressure sensation. Our results indicate that the ciliary opsin required for detecting UV light is not essential for pressure sensation.” We hypothesise in the discussion that TRP channels may play a role in pressure transduction, due to their diversity, multiple modalities and participation in phototransduction cascades.

      (3) We considered that the complexity of this question merits a separate study, where both cues can be accurately titrated and temporally combined to dissect the mechanisms of sensory integration. We have therefore removed the sentence referring to the interaction of phototaxis and the pressure response from the discussion.

      “How UV and pressure signals are integrated by the cPRC and how other light responses such as phototaxis interact with pressure responses remain exciting avenues for future research.”

    1. Author response:

      We thank the reviewers for their positive evaluation and constructive comments. In our revision, we will aim to improve the analysis of our existing data and perform new experiments to address questions raised by the reviewers.

      Reviewer 1 found it interesting that Kdm6b-deletion in hippocampal dentate gyrus (DG) neural stem cells causes precocious neuronal differentiation, whereas in contrast, Kdm6b is required for the maturation of neural progenitors in the ventricular-subventricular zone (V-SVZ). In the submitted manuscript, we did not provide much insight into the differences in Kdm6b function in these two neural stem cell populations. We plan on performing new experiments and expanding on our prior V-SVZ studies in a way that allows a direct comparison to the analyses of the DG. We hope that the addition of this data will shed light on why Kdm6b-deletion produces such different phenotypes in postnatal neural stem cells of the mouse brain. 

      Reviewer 2 noted that our submitted manuscript lacked insight into how KDM6B regulates gene expression. In particular, this reviewer asked whether the function of KDM6B is mediated by its enzymatic activity. The CUT&RUN experiment in our manuscript revealed an increase in H3K27me3 levels at select neural maintenance genes in the DG of Kdm6b-deleted mice. However, we agree that this data is insufficient to assess the significance of KDM6B-mediated H3K27me3 demethylation in regulating the NSC transcriptome. To address this point, we are performing experiments that can directly test this mechanistic model of KDM6B function and answer the question of whether the H3K27me3 demethylase activity of KDM6B is required for its ability to activate transcription. Reviewer 2 also had a specific question about the cell types observed in the developing hippocampus after Kdm6b-deletion, and we believe that additional analyses will provide clarity to the overall phenotype. More generally, we will aim to improve data quality and visualization.

      Reviewer 3 raised the concern that because Kdm6b is not exclusively expressed in neural stem cells, the phenotype of precocious neuronal differentiation in mice with Kdm6b-deletion driven by the hGFAP-Cre transgene may be indirect, such as through changes in mature glial populations. We will study the mature glia, as suggested by the reviewer. We will also more thoroughly describe how our experiments targeting Kdm6b-deletion to adult neural stem cells with the tamoxifen-inducible Nestin-CreER provide evidence for the precocious neuronal differentiation phenotype being cell autonomous, at least in adult mice. Reviewer 3 also had helpful suggestions for analyzing our single-cell RNA-seq data and behavioral studies, and we will address these comments in the revision.

      Again, we thank the editors and reviewers for their time and consideration. We believe that our manuscript will be greatly improved through this review process and hope to construct a stronger understanding of the role of KDM6B in DG neurogenesis.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In the revised manuscript we have included an additional study that significantly contributes to the conclusions and models of the original version. Briefly, Figure 3 now describes our characterization of the diaphragm and laryngeal muscle activities (electromyography, EMG) during endogenous vocalizations. These EMGs also serve as representations of the brainstem breathing central pattern generator (CPG) inspiratory and post-inspiratory generating neurons, respectively. In our original submission, we found that many of the vocalizations had changes in pitch that mirrored the change in expiratory airflow (which we termed positive intonation), and we proposed that the coordination of breathing muscles (like the inspiratory muscles) and larynx patterned this. This mechanism is akin to our findings for how neonatal cries are rhythmically timed and produced (Wei et al. 2022). The newly presented EMG data reinforces this idea. We found that for vocalizations with positive intonation, the inspiratory diaphragm muscle has an ectopic burst(s) of activity during the expiration phase which corresponds to a decrease in airflow and pitch, and this is followed by laryngeal muscle activity and increased pitch. This can be cycled throughout the expiration to produce complex vocalizations with oscillations in pitch. A basal breath is hardwired for the laryngeal muscle activity to follow the diaphragm, so the re-cycling of this pattern nested within an expiration (a ‘mini-breath’ in a ‘breath’) demonstrates that the vocalization patterning system engages the entire breathing CPG. This contrasts with the canonical model that activity of the laryngeal premotor neurons controls all aspects of producing / patterning vocalizations. Furthermore, this mechanism is exactly how the iRO produces and patterns neonatal vocalizations (Wei et al. 2022) and motivates the likely use of the iRO in adult vocalizations.

      Response to recommendations for the authors:

      Reviewer #1:

      (1) The authors should note in the Discussion that the cellular and circuit mechanisms by which the vocalization pattern generator integrates with the respiratory pattern generator to control expiratory airflow have not been fully worked out, requiring future studies.

      This was noted in the discussion section “The iRO likely patterns intonation for endogenous phonation”.

      (2) Please change the labeling of the last supplemental figure to Figure Supplemental 5.

      Thank you for identifying this.

      Reviewer #2:

      Major concerns

      (1) While it is true that modulation of activity in RAm modulates the laryngeal opening, this statement is an incomplete summary of prior work. Previous studies (Hartmann et al., 2020; Zhang et al., 1992, 1995) found that activation of RAm elicits not just laryngeal adduction but also the production of vocal sounds, albeit vocal sounds that were spectrally dissimilar from species-typical vocalizations. Moreover, a recent study/preprint that used an activity-dependent labeling approach in mice to optogenetically activate RAm neurons that were active during USV production found that re-activation of these neurons elicits USVs that are acoustically similar to natural USVs (Park et al., 2023). While the authors might not be required to cite that recent preprint (as it is not yet peer-reviewed), the fact that activation of RAm elicits vocal sounds is clear evidence that its effects go beyond modulating the size of the laryngeal opening, as this alone would not result in sound production (i.e., RAm activation must also recruit expiratory airflow). The authors should include these relevant studies in their Introduction. Moreover, the rationale for the model proposed by the authors (that RAm controls laryngeal opening whereas iRO controls expiratory airflow) is unclear with regard to these prior studies. The authors should include a discussion of how these prior findings are consistent with their model (as presented in the Introduction, as well as in Figure 4 and relevant Discussion) that RAm modulates the size of laryngeal opening but not expiratory airflow.

      An introduction and discussion of the Veerakumar et al. 2023 and Park et al. 2024 manuscripts describing RAm in mice has now been included.

      The iRO serves to coordinate the breath airflow and laryngeal adduction to produce sound and the intonation within it that mirrors the breath airflow. This occurs because the iRO can control the breathing CPG (synaptic input to the preBötC inspiratory pacemaker) and is premotor to multiple laryngeal muscles (Wei et al. 2022). The expiratory airflow is modulated by inducing momentary contraction of the diaphragm (via excitation of the preBötC), which opposes (i.e., slows) expiration. This change in flow results in a decrease in pitch (Fig. 3 in the revised manuscript, Wei et al. 2022).

      It is our understanding that the basic model for RAm evoked USVs is that RAm evokes laryngeal adduction (and presumed abdominal expiratory muscle activation) and this activity is momentarily stopped during the breath inspiration by inhibition from the preBötC (Park et al. 2024). So, in this basic model, any change in pitch and expiratory airflow would be controlled by tuning RAm activity (i.e., extent of laryngeal adduction). In this case, the iRO induced inspiratory muscle activity should not occur during expiration, which is not so (Fig. 3). Note, the activity of abdominal expiratory muscles during endogenous and RAm evoked USVs has not been characterized, so the contribution of active expiration remains uncertain. This is an important next step.

      We have now included a discussion of this topic which emphasizes that iRO and RAm likely have reciprocal interactions (supported by the evidence of this anatomical structure). These interactions would explain why excitation of either group can evoke USVs and, perhaps, the extent that either group contributes to a USV explains how the pitch / airflow changes. An important future experiment will be to determine the sufficiency of each site in the absence of the other.

      (2) The authors provide evidence that the relationship between expiratory airflow and USV pitch is variable (sometimes positive, sometimes negative, and sometimes not related). While the representative spectrograms clearly show examples of all three relationship types, no statistical analyses are included to evaluate whether the relationship between expiratory airflow and USV pitch is different than what one would expect by chance. For example, if USV pitch were actually unrelated to expiratory airflow, one might nonetheless expect spurious periods of positive and negative relationships. The lack of statistical analyses to explicitly compare the observed data to a null model makes it difficult to fully evaluate to what extent the evidence provided by the authors supports their claims.

      We have now included two null distributions and compared our observed correlation values to these. The two distributions were created by taking each USV / airflow pair and randomly shuffling either the normalized USV pitch values (pitch shuffled) or the normalized airflow values (airflow shuffled) to simulate the distribution of data should no relationship exist between the USV pitch and airflow.
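      The shuffling procedure described above can be sketched as follows. This is a minimal illustration, assuming Pearson's r as the correlation measure and 1000 shuffles per USV / airflow pair; neither detail is stated in the response, and the function names and toy traces are hypothetical.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shuffled_null(pitch, airflow, n_shuffles=1000, shuffle="pitch", seed=0):
    """Correlations obtained after randomly permuting one of the two traces."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    null = []
    for _ in range(n_shuffles):
        p, a = list(pitch), list(airflow)
        # Permuting one trace destroys any temporal pitch/airflow relationship.
        rng.shuffle(p if shuffle == "pitch" else a)
        null.append(pearson_r(p, a))
    return null


# Toy normalized traces for a single USV (assumed values).
pitch = [0.0, 0.1, 0.3, 0.6, 0.8, 1.0, 0.7, 0.4]
airflow = [0.1, 0.2, 0.3, 0.5, 0.9, 1.0, 0.6, 0.3]

observed = pearson_r(pitch, airflow)
null_pitch = shuffled_null(pitch, airflow, shuffle="pitch")
null_airflow = shuffled_null(pitch, airflow, shuffle="airflow")

# Fraction of shuffled correlations at least as extreme as the observed one.
frac_extreme = sum(abs(r) >= abs(observed) for r in null_pitch) / len(null_pitch)
print(observed, frac_extreme)
```

      Comparing the observed r against both shuffled distributions indicates how often a correlation of that size would arise if pitch and airflow were unrelated.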

      (3) The relationship between expiratory airflow and USV pitch comes with two important caveats that should be described in the manuscript. First, even in USV types with an overall positive relationship between expiratory airflow and pitch contour, the relationship appears to be relative rather than absolute. For example, in Fig. 2E, both the second and third portions of the illustrated two-step USV have a positive relationship (pitch goes down as expiratory airflow goes down). Nonetheless, the absolute pitch of the third portion of that USV is higher than the second portion, and yet the absolute expiratory airflow is lower. The authors should include an analysis or description of whether the relationship between expiratory airflow and USV pitch is relative vs. absolute during periods of 'positive intonation'.

      The relationship between pitch and airflow is relative, and this is now clarified in the text. To determine this, we visualized the relationship between the two variables by scatterplot for each of the USV syllables and, as the reviewer notes, a given airflow cannot predict the resulting frequency and vice versa.

      (4) A second important caveat of the relationship between expiratory airflow and USV pitch is that changes in expiratory airflow do not appear to account for the pitch jumps that characterize mouse USVs (this lack of relationship also seems clear from the example shown in Fig. 2E). This caveat should also be stated explicitly.

      The pitch jumps do not have a corresponding fluctuation in airflow, and this is now stated in the results and discussion.

      (5) The authors report that the mode of relationship between expiratory airflow and USV pitch (positive intonation, negative intonation, or no relationship) can change within a single USV. Have the authors considered/analyzed whether the timing of such changes in the mode of relationship coincides with pitch jumps? Perhaps this isn’t the case, but consideration of the question would be a valuable addition to the manuscript.

      We analyzed a subset of USVs with one or two pitch jumps, where a jump was defined by a change >10 kHz that was at least 5 ms long. The intonation relationships between the sub-syllables within a USV type were not stereotyped, as evidenced by the same syllable being composed of combinations of both modes.

      (6) The authors incorrectly state that PAG neurons important for USV production have been localized to the ventrolateral PAG. Tschida et al., 2019 report that PAG-USV neurons are located predominantly in the lateral PAG and to a lesser extent in the ventrolateral PAG (see Fig. 5A from that paper). The finding that iRO neurons receive input from VGlut2+ ventrolateral PAG neurons represents somewhat weak evidence that these neurons reside downstream of PAG-USV neurons. This claim would be strengthened by the inclusion of FOS staining (following USV production), to assess whether the Vglut+ ventrolateral PAG neurons that provide input to iRO are active in association with USV production.

      This comment correctly points out that our PAG → iRO tracing does not demonstrate that the labeled PAG neurons are sufficient or necessary for vocalization. Directly demonstrating that activation and inhibition of the PAG-iRO labeled neurons ectopically drives or prevents endogenous USVs is an important next step. While FOS implies this connectivity, it does not definitively establish it, and so this experiment is subject to some of the same caveats as our tracing (e.g., PAG neurons that drive sniffing might be erroneously attributed to vocalization).

      Our reading of the literature could not identify an exact anatomical location within the mouse PAG and this site appears to vary within a study and between independent studies (like within and between Tschida et al. 2019 and Chen et al. 2021). The labeling we observed aligns with some examples provided in these manuscripts and with the data reported for the retrograde tracing from RAm (Tschida et al. 2019).

      (7) In Figure S5A, the authors show that USVs are elicited by optogenetic activation of iRO neurons during periods of expiration. In that spectrogram, it also appears that vocalizations were elicited during inspiration. Are these the broadband vocalizations that the authors refer to in the Results? Regardless, if optogenetic activation of iRO neurons in some cases elicits vocalization both during inspiration and during expiration, this should be described and analyzed in the manuscript.

      The sound observed on the spectrogram during inspiration is an artefact of laser evoked head movements that resulted in the fiber cable colliding with the plethysmography chamber. In fact, tapping an empty chamber yields the same broad band spectrogram signal. The evoked USV or harmonic band vocalization is distinct from this artefact and highlighted in pink.

      (8) Related to the comment above, the authors mention briefly that iRO activation can elicit broadband vocalizations, but no details are provided. The authors should provide a more detailed account of this finding.

      The broadband harmonic vocalizations we sometimes observe upon optogenetic stimulation of AAV-ChR2 expressing iRO neurons are akin to those previously described within the mouse vocal repertoire (see Grimsley et al. 2011). We have added this citation and mentioned this within the text.

      (9) The effects of iRO stimulation differ in a couple of interesting ways from the effects of PAG-USV activation. Optogenetic activation of PAG-USV neurons was not found to entrain respiration or to alter the ongoing respiratory rate and instead resulted in the elicitation of USVs at times when laser stimulation overlapped with expiration. In contrast, iRO stimulation increases and entrains respiratory rate, increases expiratory and inspiratory airflow, and elicits USV production (and also potentially vocalization during inspiration, as queried in the comment above). It would be informative for the authors to add some discussion/interpretation of these differences.

      We have added a section of discussion to describe how these different results may be explained by the iRO being a vocal pattern generator versus the PAG acting as a ‘gating’ signal to turn on the medullary vocalization patterning system (iRO and RAm). See discussion section ‘The iRO likely patterns intonation for endogenous phonation’.

      (10) The analysis shown in Fig. 4D is not sufficient to support the author’s conclusion that all USV types elicited by iRO activation are biased to have more positive relationships between pitch and expiratory airflow. The increase in the relative abundance of down fm USVs in the opto condition could account for the average increase in positive relationship when this relationship is considered across all USV types in a pooled fashion. The authors should consider whether each USV type exhibits a positive bias. Although such a comparison is shown visually in Fig. 4G, no statistics are provided. All 7 USV types elicited by optogenetic activation of iRO should be considered collectively in this analysis (rather than only the 5 types currently plotted in Fig. 4G).

      In the original submission the statistical analysis of r values between opto and endogenous conditions was included in the figure legend (‘panels E-G, two-way ANOVA with Sidak’s post-hoc test for two-way comparisons was used; all p-values > 0.05’), and this has not changed in the revised manuscript. We have now provided the suggested comparison of opto vs endogenous USVs without down fm (Fig. 5D). This positive shift in r is statistically significant (…).

      (11) The evidence that supports the author’s model that iRO preferentially regulates airflow and that RAm preferentially regulates laryngeal adduction is unclear. The current study finds that activation of iRO increases expiratory (and inspiratory) airflow and also elicits USVs, which means that iRO activation must also recruit laryngeal adduction to some extent. As the authors hypothesize, this could be achieved by recruitment of RAm through iRO’s axonal projections to that region.

      Note, it is more likely that the iRO directly recruits laryngeal adduction, as iRO neurons are premotor to multiple laryngeal muscles like the thyroarytenoid and cricothyroid (Wei et al. 2022). The ‘Discussion’ now includes our ideas for how the iRO and RAm likely interact to produce vocalizations.

      In the recent preprint from Fan Wang’s group (Park et al., 2023), those authors report that RAm is required for USV production in adults, and that activation of RAm elicits USVs that appear species-typical in their acoustic features and elicits laryngeal adduction (assessed directly via camera). Because RAm activation elicits USVs, though, it must by definition also recruits expiratory airflow. Can the authors add additional clarification of how the evidence at hand supports this distinction in function for iRO vs RAm?

      See response to ‘Major Concern #1’.

      Minor concerns 

      (1) The authors might consider modifying the manuscript title. At present, it primarily reflects the experiments in Figure 2.

      We have provided a title that we feel best reflects the major point of the manuscript. We hope that this simplicity enables it to be recognized by a broad audience of neuroscientists as well as specialists in vocalization and language.

      (2) The statement in the abstract that "patterns of pitch are used to create distinct 'words'" is somewhat unclear. Distinct words are by and large defined by combinations of distinct phonemes. Are the authors referring to the use of "tonemes" in tonal languages? If so, a bit more explanation could be added to clarify this idea. This minor concern includes both the Abstract, as well as the first paragraph of the Introduction.

      We have clarified this line in the abstract to avoid the confusing comparison between mouse vocalizations and human speech. In the introduction we have expanded our explanation to clarify that variations in pitch are a component of spoken language that add additional meaning and depth to the underlying, phonemic structure. 

      (3) Multiple terms are used throughout the manuscript to refer to expiratory airflow: breath shape (in the title), breath pattern, deviations in exhalation, power of exhalation, exhalation strength, etc. Some of these terms are vague in meaning, and a consolidation of the language would improve the readability of the abstract and introduction.

      We have chosen a smaller selection of descriptive words to use when describing these breath features.

      (4) Similarly, "exhalation" and "expiration" are both used, and a consistent use of one term would help readability.

      See point 3.

      (5) In a couple of places in the manuscript, the authors seem to state that RAm contains both laryngeal premotor neurons as well as laryngeal motor neurons. This is not correct to our knowledge, but if we are mistaken, we would ask that the authors add the relevant references that report this finding.

      It is our understanding that the RAm is defined as the anatomical region consistent with the murine rostral and caudal ventral respiratory groups composed of multiple premotor neuron pools to inspiratory, expiratory, laryngeal, and other orofacial muscles. This is supported by neurons within RAm that reflect multiple phases of the inspiratory and expiratory cycle (Subramanian et al. 2018) and excitation of sub-regions within RAm modulating multiple parts of the breathing control system (Subramanian et al. 2018 and Subramanian 2009). Rabies tracing of the various premotor neurons which define the anatomical region of RAm in the mouse shows that they surround the motor neurons in the loose region of the nucleus ambiguus (the anatomical location of RAm) for multiple muscles of the upper airway system, such as the thyroarytenoid (Wu et al. 2017, Dempsey et al. 2021 and Wei et al. 2022). Given that the name RAm reflects a broad anatomical location, we have used it to describe both the premotor and motor neurons embedded within it. We have now clarified this in the text.

      (6) The statistical analysis applied in Figure 1C is somewhat confusing. The authors show two distributions that appear different but report a p-value of 0.98. Was the analysis performed on the mean value of the distributions for each animal, the median, etc.? If each animal has two values (one for USV+ breaths and one for USV- breaths), why not instead compare those with a paired t-test (or Wilcoxon rank sign)? Additional information is needed to understand how this analysis was performed.

      The original manuscript version used a two-way ANOVA to compare the normalized histograms of instantaneous frequency for breaths with (USV+) or without (USV-) USVs for each animal (first factor: USV+/-, second factor: frequency). The p-value for the first factor (USV) was 0.98, showing no statistically significant effect of USV on the distribution of the histogram.

      For simplicity, we have instead performed the analysis as suggested and include a bar graph. This analysis shows that the instantaneous frequency of USV breaths is, in fact, statistically significantly lower than those without USVs. We have updated the figure legend and text to reflect this.

      (7) The use of the word "syllable" to describe parts of a USV that are produced on a single breath may be confusing to some scientists working on rodent USVs. The term 'syllable' is typically used to describe the entirety of a USV, and the authors appear to use the term to describe parts of a USV that are separated by pitch jumps. The authors might consider calling these parts of USVs "sub-syllables".

      We have clarified these descriptions throughout the text. We now refer to the categories as ‘syllable types’, define ‘syllables’ as ‘a continuous USV event’ with no more than 20 ms of silence within, and use ‘sub-syllables’ to refer to components of the syllable separated by jumps in frequency (but not gaps in time).
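      The syllable definition above can be illustrated with a minimal sketch that merges detected sound segments separated by no more than 20 ms of silence into one syllable. The function name, the (onset, offset) input format in seconds, and the toy segments are assumptions made for illustration and do not describe the authors' analysis code.

```python
MAX_GAP_S = 0.020  # maximum silent gap allowed within one syllable (20 ms)


def group_into_syllables(segments, max_gap=MAX_GAP_S):
    """Merge (onset, offset) segments whose silent gap is at most max_gap seconds."""
    syllables = []
    for onset, offset in sorted(segments):
        if syllables and onset - syllables[-1][1] <= max_gap:
            # Gap is short enough: extend the current syllable.
            syllables[-1][1] = max(syllables[-1][1], offset)
        else:
            # Gap is too long (or this is the first segment): start a new syllable.
            syllables.append([onset, offset])
    return [tuple(s) for s in syllables]


# Toy detections (assumed values): a 10 ms gap, then an 80 ms gap.
segments = [(0.000, 0.030), (0.040, 0.070), (0.150, 0.200)]
syllables = group_into_syllables(segments)
print(syllables)  # the first two segments merge; the third stays separate
```

      Sub-syllables would then be obtained by splitting each merged syllable at frequency jumps rather than at gaps in time.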

      (8) In Figure S3, final row, the authors show a USV produced on a single breath that contains two components separated by a silent period. This type of bi-syllabic USV may be rare in adults and is similar to what the authors showed in their previous work in pups (multiple USVs produced on a single expiration, separated by mini-inspirations). One might assume that the appearance of such USVs in pups and their later reduction in frequency represents a maturation of vocal-respiratory coordination. Nonetheless, the appearance of bi-syllabic USVs has not been reported in adult mice to our knowledge, and the authors might consider further highlighting this finding.

      We were also struck by the similarity of these USVs to our study in neonates, and such similarities sparked our interest in the role of the iRO in patterning adult USVs. We now include a description of the presence and abundance of bi- and tri-syllabic calls observed in our recordings to highlight this finding.

      (9) Figure 4 is referenced at the end of the second Results section, but it would seem that the authors intended to reference Figure 2. 

      For simplicity we included some of the referenced data within Fig. S5. We appreciate the recommendation.

      (10) In the optogenetic stimulation experiments, the authors should clarify why bilateral stimulation was applied. Was unilateral stimulation ineffective or less effective? The rationale provided for the use of bilateral stimulation (to further localize neural activation) is unclear.

      The iRO is bilateral and, we presume, functions similarly. So, we attempted to maximally stimulate the system. We have clarified this in the methods.

      (11) Figure Supplemental '6' should be '5'.

      Thanks!

      (12) Last sentence of the Introduction: "Lasty" should be "lastly".

      Thanks!

      (13) There are two references for Hage et al., 2009. These should be distinguished as 2009a and 2009b for clarity.

      Thanks!

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers and editor for their careful review of our work. We believe the resulting manuscript is much stronger. We agree with the comments made by Reviewer #2 regarding additional histology and neuronal data analysis, which will be presented in subsequent work.


      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Weaknesses):

      It was not always clear what the lesion size was. This information is important for future applications, for example, in the visual cortex, where neurons are organized in retinotopy patterns.

      We thank the reviewer for this feedback. While there is some variation in lesion volume for a given parameter set, we have added more details of the volumes of lesions created in our testing (Fig. 4 and Fig. 5).

      It would be helpful if the author could add some discussion about whether and how this method could be used in other types of array/multi-contact electrodes, such as passive neuropixels, S- probes, and so on. In addition, though an op-amp was used in the design, it would still be helpful if the author could provide a recommended range for the impedance of the electrodes.

      We thank the reviewer for this suggestion. We have both added a demonstration of use in a different multielectrode probe type (with a U-probe) in Fig. 8, and we have added a discussion about which types of multielectrode probes would be suitable on Page 15, Line 420.

      “We demonstrated that our electrolytic lesioning technique works with a linear multicontact probe by testing with a U-Probe in ex vivo rabbit cortex. There are no particular limitations that would prevent our specific electrolytic lesioning technique and device from working with any passive multielectrode probe. The main requirements for use are that the probe has two electrodes that can directly (via whatever necessary adapters) connect to the lesioning device, such that arbitrary current can be passed into them as the anode and cathode. This would limit use of probes, like Neuropixels, where the on-chip acquisition and digitization circuitry generally precludes direct connection to electrodes [1], [2]. The impedance of the multielectrode probe should not be an issue, due to the use of an op amp. We showed use with a Utah array (20-800 kΩ) and a U-Probe (1-1.5 MΩ). The specific op amp used here has a voltage range of ± 450 V, which assuming a desired output of 150 µA of current would limit electrode impedance to 6 MΩ. Though a different op amp could easily be used to accommodate a higher electrode impedance, it is unlikely that this would be necessary, since most electrodes have impedances between 100 kΩ and 1 MΩ [3].”
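
      As a rough check of the impedance limit quoted above, Ohm's law gives the largest electrode impedance a constant-current source can drive as the available output voltage divided by the target current. The sketch below is illustrative only: the function name is hypothetical, and treating the ± 450 V range as a 900 V total span is an assumption made here because it reproduces the quoted 6 MΩ; a single-sided 450 V limit would give 3 MΩ instead.

```python
def max_electrode_impedance_ohms(available_voltage_v: float, target_current_a: float) -> float:
    """Largest load impedance (ohms) that still permits the target current (R = V / I)."""
    if target_current_a <= 0:
        raise ValueError("target current must be positive")
    return available_voltage_v / target_current_a


TARGET_CURRENT_A = 150e-6  # 150 µA, the lesioning current quoted in the response

# Assumption: the full 900 V span of a +/-450 V supply is available to the load.
full_span = max_electrode_impedance_ohms(900.0, TARGET_CURRENT_A)
# Alternative assumption: only 450 V (one side of the supply) is available.
single_sided = max_electrode_impedance_ohms(450.0, TARGET_CURRENT_A)

print(f"{full_span / 1e6:.1f} MΩ, {single_sided / 1e6:.1f} MΩ")
```

      Under either assumption, the probe impedances mentioned in the quoted passage (up to 800 kΩ for the Utah array and up to 1.5 MΩ for the U-Probe) fall below the bound.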

      Reviewer 2 (Public Weaknesses):

      In many of the figures, it is not clear what is shown and the analysis techniques are not well described.

      We thank the reviewer for this feedback. We hope that our edits to both the figures and the text have improved clarity for readers.

      The flexibility of lesioning/termination location is limited to the implantation site of the multielectrode array, and thus less flexible compared to some of the other termination methods outlined in Appendix 2.

      We thank the reviewer for this point. You are right that the lesioning location is limited to the multielectrode array’s implantation site, while other methods in Appendix 2 do not require proximity of the lesion location and the electrophysiology recording site. However, we believe that the closeness of the lesioning location to the microelectrode array is a strength - guaranteeing recordings from the perilesional area - even with the small negative of reduced flexibility. Multielectrode arrays can be implanted in many areas of cortex. If one wanted to study distal effects of a lesion, additional electrophysiology probes could be implanted to record from those areas. We have noted this on Page 3, Line 117.

      “While the link between the lesion location and the multielectrode location technically constrains the lesion to an area of cortex in which a multielectrode array could be implanted, we see the connection as a positive, because it ensures recording some neuroelectrophysiology from the perilesional area in which recovery is hypothesized to occur (see Appendix 1).”

      Although the extent of the damage created through the Utah array will vary based on anatomical structures, it is unclear what is the range of lesion volumes that can be created with this method, given a parameter set. It was also mentioned that they performed a non-exhaustive parameter search for the applied current amplitude and duration (Table S1/S2) to generate the most suitable lesion size but did not present the resulting lesion sizes from these parameter sets listed. Moreover, there’s a lack of histological data suggesting that the lesion size is precise and repeatable given the same current duration/amplitude, at the same location.

      We thank the reviewer for this thoughtful feedback. We have added figures (Figs. 4 and 5), where we show the relationship between estimated lesion volume and the current amplitude and duration parameters. These figures include more data from the tests in Supplementary File 1 and Supplementary File 2. While there is some variation in lesion volume for a given current amplitude and duration, there is still a clear relationship between the parameters and lesion volume.

      It is unclear what type of behavioral deficits can result from an electrolytic lesion this size and type (∼3 mm in diameter) in rhesus macaques, as the extent of the neuronal loss within the damaged parenchyma can be different from past lesioning studies.

      While we appreciate the reviewer’s interest in the behavioral deficits associated with our lesions in rhesus macaques, reporting these falls beyond the scope of this manuscript. Future work will explore the behavioral deficits associated with these lesions.

      The lesioning procedure was performed in Monkey F while sedated, but no data was presented for Monkey F in terms of lesioning parameters, lesion size, recorded electrophysiology, histological, or behavioral outcomes. It is also unclear if Monkey F was in a terminal study.

      We apologize for not being more explicit about the parameters used for the lesion in Monkey F. We have added this in Results on Page 5, Line 209 and in Methods on Page 19, Line 586.

      “After this validation and refinement, one proof-of-concept lesion (150 µA direct current passed through adjacent electrodes for 45 seconds) was performed in an in vivo sedated rhesus macaque (Monkey F) in order to validate the safety of the procedure.”

      “This lesion was created by applying 150 µA of direct current to two adjacent electrodes in the microelectrode array for 45 seconds.”

      We also clarified the parameters used for the other lesions in Monkeys H and U in Results on Page 7, Line 233 and in Methods on Page 19, Line 586.

      “In all of the fourteen lesions across two awake-behaving rhesus macaques (150 µA direct current passed through adjacent electrodes for 30 or 45 seconds (30s for Monkey U and 45s for Monkey H, except lesion H200120 which was for 50 seconds)), the current source worked as expected, providing a constant current throughout the duration of the procedure.”

      “In these lesions, 150 µA of direct current was applied to two adjacent electrodes in the microelectrode array for 30 or 45 seconds (30s for Monkey U, 45s for Monkey H), except in lesion H200120 where current was applied for 50 seconds.”

      Monkey F was euthanized shortly after the lesion, so we now mention this on Page 19, Line 583.

      “Based on this, and a lack of physiological signs of pain from the anaesthetized pig studies, a lesion was performed on a sedated rhesus macaque who was subsequently euthanized due to unrelated health complications (Monkey F; 16 year-old adult, male rhesus macaque) in order to further verify safety before use in awake-behaving rhesus.”

      Because Monkey F was sedated and then euthanized shortly after, there is no behavioral data. As the lesion in sedated Monkey F was used to validate the safety of the procedure, any further data and analysis fall beyond the scope of this manuscript.

      As an inactivation method, the electrophysiology recording in Figure 5 only showed a change in pairwise comparisons of clustered action potential waveforms at each electrode (%match) but not a direct measure of neuronal pre and post-lesioning. More evidence is needed to suggest robust neuronal inactivation or termination in rhesus macaques after electrolytic lesioning. Some examples of this can be showing the number of spike clusters identified each day, as well as analyzing local field potential and multi-unit activity.

      The reviewer has pointed out some shortcomings of the original analysis, which we believe have since been addressed with the revised analysis. LFP and spiking activity are functional measures that are more ambiguous in terms of loss and are also the subject of another manuscript currently under revision.

      The advantages over recently developed lesioning techniques are not clear and are not discussed.

      We thank the reviewer for noting this. We have added a section to the Discussion on Page 16, Line 434, which also responds to their later request that we compare our work to Khateeb et al. 2022.

      “Perhaps the most unique advantage of our technique in comparison with other existing inactivation methods lies in Design Consideration #1: stable electrophysiology pre- and post-inactivation (Appendix 1). While several methods exist that allow for localization and size control of the inactivation (Design Consideration #2) and cross compatibility across regions and species (Design Consideration #3), few have achieved compatibility with stable electrophysiology. For example, some studies record electrophysiology only after the creation of the lesion, preventing comparison with baseline neuronal activity [4]. One recent study, Khateeb, et al., 2022, developed an inactivation method that is effectively combined with stable electrophysiology by creating photothrombotic lesions through a chronic cranial window integrated with an electrocorticography (ECoG) array [5], which may be appropriate for applications where local field potential (LFP) recording is sufficient. This approach has trade-offs with regards to the three design considerations presented in Appendix 1.

      While Khateeb, et al., present a toolbox with integrated, stable electrophysiology from an ECoG array pre- and post-inactivation (Design Consideration #1), it demonstrated recordings from an ECoG array with limited spatial resolution. While a higher density ECoG array that would provide higher spatial resolution could be used, increasing the density of opaque electrodes might occlude optical penetration and constrain photothrombotic lesions. Further, ECoG arrays are limited to recording LFP, not electrophysiology at single neuron resolution, potentially missing meaningful changes in the neuronal population activity after lesioning. Khateeb, et al., demonstrated localization and control of the size of inactivation (Design Consideration #2). In this manuscript, we have shown that the amount and duration of direct current are significant determinants of lesion size and shape, while with photothrombotic lesions, light intensity and aperture diameter are the significantly relevant parameters. One potential advantage of photothrombotic approaches is the use of optical tools to monitor anatomical and physiological changes after lesioning through the cranial window, though the research utility of this monitoring remains to be demonstrated.

      Although the method presented by Khateeb, et al., shows some cross-compatibility (Design Consideration #3), it has greater limitations in comparison with the method presented here. For example, while Khateeb, et al., notes that the approach could be adapted for use in smaller organisms, no modification is needed for use in other species with this work’s approach, so long as a multielectrode probe is implantable. In this manuscript we demonstrate electrolytic lesioning spanning two multielectrode probes across rabbits, pigs, sheep, and rhesus macaques, and our same device could be easily used with other smaller species, like rats, in which multielectrode probes have been successfully implanted [6]. Further, the approach in Khateeb, et al., is limited to superficial brain structures, due to the need for optical accessibility. As noted, fiber optics could allow access to deeper structures, which would bring associated additional tissue damage, but deeper structure lesioning was not demonstrated. In contrast, the approach presented here can be used in any region of cortex in which a multielectrode probe can be implanted, which, depending on the probe used, does not limit it to surface structures. For example, we demonstrated use of our lesioning technique with a linear U-probe (Fig. 8), which could be used to reach deeper layers of cortex or specific deep cortical structures. In both techniques, the location of the lesion is tied to the location of the electrophysiology (for Khateeb et al., wherever the cranial window and ECoG array are; for this technique, wherever the multielectrode probe has been implanted), which ensures that the electrophysiology will include recordings from the perilesional area. Neither work addresses the potential of their technique to induce chronic post-lesion behavioral effects, which is a key goal for future work.”

      There is a lack of quantitative histological analysis of the change in neuronal morphology and loss.

      We appreciate the reviewer’s desire for a quantitative histological analysis; however, this falls outside of the scope of this manuscript. We are not attempting to make strong claims about the number of neurons lost through lesioning or to thoroughly characterize morphological changes in the neurons. The histology is intended to show that lesioning did lead to a loss of neurons, but the precise number of neurons lost is neither in scope nor likely to be highly conserved across lesions.

      There is a lack of histology data across animals and on the reliability of their lesioning techniques across animals and experiments.

      We thank the reviewer for this point. As stated above, we have now added Fig. 4 and Fig. 5, which includes volume estimates based on the histology from more of our ex vivo and in vivo testing across animals.

      There is a lack of data on changes in cortical layers and structures across the lesioning and non-lesioning electrodes.

      We acknowledge that the histology does not have the level of detail that is expected from many modern studies. However, the goal here was dramatically different: we sought to calibrate a novel lesion device, ensure its safe use in large mammals (specifically, non-human primates) and provide estimates of the lesion size to compare with the literature. The extent of histology that could be performed and the tools available to us prevent such an in-depth analysis. We can say, based on the shank length of the Utah arrays used and known anatomy, that we have affected layer 2/3 and maybe a bit of layer 4.

      Reviewer 1 (Recommendations For The Authors):

      Figure 5b. It would be helpful if the author could plot the delta match separately for the lesion electrodes, near neighbor electrodes, and far neighbors. This would help understand the lesion effect, specifically whether the effect is selective (e.g., more potent for the lesion and adjacent electrodes.)

      The fact that neuron loss is not particularly selective can already be seen in the spike waveform plots, arranged spatially on the array. Plenty of clear change is observed far from the lesion electrodes (marked with black dots) as well as nearby. We have made mention of this localized non-specificity in the main text and have made sure to reemphasize it in the figure legend. While this is a nice suggestion, we currently don’t feel this result rises to the level of a figure, given that it is not highly specific spatially.

      Reviewer 2 (Recommendations For The Authors):

      Overall the quality of the paper, the figures and the analysis used could be significantly improved. There is a lack of scientific rigor in the presentation of figures and analysis techniques. It is not clear what the authors are trying to communicate through the figures and their choice of figures to show is confusing (see below).

      We thank the reviewer for their pointed critiques and believe we have addressed their concerns with many changes to the text, a revamped waveforms analysis, and both the expansion and addition of results.

      The neurophysiology data shown doesn’t suggest neuronal loss, it only shows change which needs strong control data to show it is due to a lesion.

      As detailed below, we have presented a revised analysis that provides this control. While the reviewer is right to point out that we cannot distinguish actual neuron loss from neuron silencing, we believe the new analysis rigorously indicates new rates of sample turnover beyond those expected from a healthy state.

      The histology figure should be replaced with a high-quality representation without folds.

      We understand the reviewer’s suggestion. While ideally we would have many histology slices from each lesion, due to cost, we were only able to collect one histology slice per lesion. The folds were introduced by the company that performed the H&E staining, and we unfortunately cannot remove the folds. Therefore, despite the folds, this is the best and only image from this lesion. We hope that the markings on the figure and the comment in the caption are sufficient to explain to readers that the folds are not a result of the lesion but instead a result of the histology process.

      The authors suggest that this lesioning method will be compatible with any available multielectrode probe theoretically. Since all testing was done with a Utah array, it will be helpful to add an explanation about potential constraints that will make a given array compatible with this method.

      We thank the reviewer for this suggestion. As stated above, we have both added a demonstration of use in a different multielectrode probe type (with a U-probe) in Fig. 8, and we have added a discussion about which types of multielectrode probes would be suitable on Page 15, Line 420.

      The authors should cite and discuss previous studies using electrolytic lesioning in awake-behaving animals to study the causal connection between the brain and behavior. (One example study: Morissette MC, Boye SM. Electrolytic lesions of the habenula attenuate brain stimulation reward. Behavioural brain research. 2008 Feb 11;187(1):17-26.)

      We thank the reviewers for this suggestion. We have added a mention of existing electrolytic lesioning studies on Page 2, Line 88.

      “Prior termination studies mostly measure behavioral output, with no simultaneous measures of neuronal activity during the behavior, impairing their ability to provide insight into the causal connection between the brain and behavior [7]–[11], or with no baseline (i.e., pre-lesion) measures of neuronal activity [4].”

      The authors should compare their technique with other recent lesioning studies in primates (e.g. Khateeb et al, 2022)

      We again thank the reviewer for this point. Specifically not mentioning Khateeb et al. 2022 was a submission error on our part; we cited the paper in Appendix 2 in the version uploaded to the eLife submission portal, but we had uploaded the version prior to citing it to bioRxiv. We have combined addressing this with addressing a previous comment, as mentioned above, with a section in the Discussion on Page 16, Line 434.

      In Appendix 2, the authors suggest that a major limitation of optogenetics and chemogenetic inactivation methods is the lack of rhesus-compatible constructs. However, several viral constructs have successful implementation in rhesus monkeys so far (e.g. Galvan A, Stauffer WR, Acker L, El-Shamayleh Y, Inoue KI, Ohayon S, Schmid MC. Nonhuman primate optogenetics: recent advances and future directions. Journal of Neuroscience. 2017 Nov 8;37(45):10894-903; Tremblay et al, Neuron 2020)

      We thank the reviewer for pointing us to these papers. We have added a more thorough description of what we meant by lack of rhesus-compatible constructs in that Appendix.

      “However, other challenges exist with using optogenetics as an inactivation method in nonhuman primates, including difficulty reliably affecting behavior [12]. While several constructs for rhesus macaques have been developed [13], [14], reports of successfully inducing behavioral effects have a small effect size and are less numerous than might be expected [12], and several null results have been published [15]–[17]. Other remaining challenges include the need to develop a head-mounted, battery powered light delivery system for multi-day delivery of light and difficulty integrating illumination with simultaneous chronic neuroelectrophysiology.”

      For Figure 5b, only pairwise comparison results from monkey U (L11-14) are shown. It is unclear why such results from monkey H were shown in Figure 5a but not in 5b.

      We thank the reviewer for pointing out this unconventional one-monkey result. As described in the original submission, we previously omitted Monkey H from the analysis in Figure 5b (now Figure 7) since some of the lesions were closely spaced together, preventing well-defined pre- and post-lesion rates of turnover. Nevertheless, we have included Monkey H in all of the revised analysis and believe even the less cleanly separated data show useful indications of neuron loss or silencing evoked by the lesion.

      Behavioral data (during a motor task) from the awake behaving monkeys (U and H) would greatly strengthen the claim that this lesioning method is capable of creating a behavioral effect and can be adopted to study the relationship between neural function and behavior outcomes.

      While we are grateful for the reviewer’s interest in the application of our lesioning technique to studies involving behavior, a behavioral analysis of the effects of our electrolytic lesions falls beyond the scope of this Tools and Resources manuscript. We would also like to point out that we do not claim that we have achieved a behavioral deficit in this manuscript.

      Figure 2 would benefit from an illustration of the Utah array placement and the location of the sites used for lesioning. The authors can either overlay the illustrations on the current ex-vivo and histology images or create a separate schematic to demonstrate that for the readers. Also, Figure 2B needs to be replaced with one without the folds to avoid confusion for the readers.

      We have added Figure 2 - figure supplement 1, which shows both the location within the Utah array of the two electrodes used to create the lesions as well as the relative size of the surface area of the lesion and the array. Unfortunately, as the lesion was created under the array, the exact location of the array relative to the lesion is unknown.

      As mentioned above, Figure 2B is the only histological image from that lesion. We hope that the markings in the image as well as the caption sufficiently explain that the folds are unrelated to the lesion itself.

      Figure 3, the conical region is not well delineated. Data across animals and lesion volume with respect to different parameters should be included.

      We have included a supplemental figure, Figure 3 - figure supplement 1, where we have used a dashed white line to clearly indicate the area of damaged parenchyma, in case it was not clear in Figure 3a. We have also added volume estimates from lesions across animals and different parameters. The ex vivo estimates are shown in Figure 4 and the in vivo estimates are shown in Figure 5.

      Figure 4: it is not clear what is being communicated, and where the voltage traces are from.

      We thank the reviewer for noting this confusion. We have added some lines in the text to explain what the voltage traces show, both in the caption to Fig. 6 and in the text on Page 7, Line 238.

      “Traces only capture the values while the lesioning device was turned on (45 seconds for most lesions and 50 seconds for lesion H200120). A) Voltage traces. Discontinuity at the beginning of the traces indicates transient voltages that were too rapid to be captured by the voltmeter, lasting between 0.13 and 0.33 s. The fluctuating voltages, especially the rapid increase in voltage at the beginning of lesioning, emphasize the importance of using a current source to deliver consistent amounts of current into the brain.”

      “The voltage across the microelectrode array fluctuated much more than the current did, emphasizing that we made the correct choice in using a current source to ensure delivery of consistent amounts of current into the brain (Fig. 6).”

      Figure 5: why did the authors choose to use matching units as a measure of the lesion? It is surprising that there are still units on the location that the authors claim to be a lesion. To clarify that it would be helpful to show the location of the lesion in Figure 4a. Also, what can we conclude about the lesion induction when we see units on the lesion electrode? The change in unit match shows that there is a change in the network (although the authors need to show control for that so we know those changes don’t happen due to natural dynamics). It is not clear what is the time duration for pre-pre and post-post (i.e. minutes, seconds, hours). Do these comparisons come from the same time frame or are they coming from two fragments of time for both pre and post-conditions?

      Aside from post-mortem histology and tissue assays, there is no good way to confirm neuron loss with chronically implanted electrode arrays in nonhuman primates. Waveforms were chosen as they are the one readily isolated physical measure of the system we are injuring. Although functional measures of activity could indicate neuron loss (topic of following papers), there are many conceivable changes in firing rate patterns that could manifest spuriously as loss, making the estimation of loss even more ambiguous and challenging this way.

      We believe the new Figure 7 will make the procedure much clearer, while also providing the control requested by the reviewer, illustrating that new statistical categories of altered waveforms emerge during a lesion, beyond those associated with typical changes in waveform composition within multi-unit recordings seen during recording sample turnover from healthy animals. We further note that by confining this analysis to four-day spans at most, we have limited the impact of daily sample turnover described in the literature (Gallego, 2020).

      The time duration for pre-session versus pre-session comparisons (and for pre-post and post-post comparisons) is some multiple of the approximately 24 hours between daily recording sessions. Therefore, since we restricted ourselves to at most four days of separation, it is between 24 and 96 hours. Spikes are sampled from successful trial periods (so on the order of seconds, compiled into minutes across the whole recording session). Although already described in the main text, these points have been reemphasized in the figure legend.

      CNO (line 931) needs to be explained.

      We thank the reviewer for this point. We have defined CNO and its relevance in Appendix 2.

      “Additionally, chronic inactivation over days may be logistically challenging, as the half life of clozapine N-oxide (CNO, a ligand used to activate DREADD receptors) is on the order of hours.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study examines the spatial and temporal patterns of occurrence and the interspecific associations within a terrestrial mammalian community along human disturbance gradients. They conclude that human activity leads to a higher incidence of positive associations.

      Strengths:

      The theoretical framework of the study is brilliantly introduced. Solid data and sound methodology. This study is based on an extensive series of camera trap data. Good review of the literature on this topic.

      Weaknesses:

      The authors use the terms associations and interactions interchangeably.

      This is not the case. In fact, we state specifically that "... interspecific associations should not be directly interpreted as a signal of biotic interactions between pairs of species…" However, co-occurrence can be an important predictor of likely interactions, such as competition and predation. We stand by our original text.

      It is not clear what the authors mean by "associations". A brief clarification would be helpful.

      Our specific definition of what is meant here by spatial association can be found in the Methods section. To clarify, the calculation of the index of associations is based on the covariance for the two species of the residuals (epsilon) after consideration of all species-specific response to known environmental covariates. These covariances are modelled to allow them to vary with the level of human disturbance, measured as human presence and human modification. After normalization, the final index of association is a correlation value that varies between -1 (complete disassociation) and +1 (complete positive association).
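
      The normalization step described above can be sketched as follows. This is a minimal illustration, not the model code used in the study: the function name and the toy covariance values are hypothetical, and in the actual analysis the residual covariances are estimated within the JSDM and allowed to vary with human disturbance.

```python
import math


def covariance_to_correlation(cov):
    """Convert a square covariance matrix (list of lists) into a correlation matrix."""
    n = len(cov)
    # Standard deviations are the square roots of the diagonal (variance) entries.
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical residual covariance for two species at one level of disturbance.
residual_cov = [[4.0, 1.2],
                [1.2, 1.0]]

corr = covariance_to_correlation(residual_cov)
association_index = corr[0][1]  # off-diagonal entry: the pairwise association
print(round(association_index, 3))
```

      Because each covariance is divided by the product of the two standard deviations, the resulting index is bounded between -1 and +1 regardless of the scale of the residuals.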

      Also, the authors do not delve into the different types of association found in the study. A more ecological perspective explaining why certain species tend to exhibit negative associations and why others show the opposite pattern (and thus, can be used as indicator species) is missing.

      Suggesting the ecological underpinnings of the associations observed here would mainly be speculation at this point, but the associations demonstrated in this analysis do suggest promising areas for the more detailed research suggested.

      Also, the authors do not distinguish between significant (true) non-random associations and random associations. In my opinion, associations are those in which two species co-occur more or less than expected by chance. This is not well addressed in the present version of the manuscript.

      Results were considered to be non-random if correlation coefficients (for spatial association) or overlap (for temporal association) fell outside of 95% Confidence Intervals. This is now stated clearly in the Methods section.  In Figure 3—figure supplement 1-3 and Figure 4—figure supplement 1-3, p<0.01 levels are also presented.

      The obtained results support the conclusions of the study.

      Anthropogenic pressures can shape species associations by increasing spatial and temporal co-occurrence, but above a certain threshold, the positive influence of human activity in terms of species associations could be reverted. This study can stimulate further work in this direction.

      Reviewer #2 (Public Review):

      Summary:

      This study analyses camera trapping information on the occurrence of forest mammals along a gradient of human modification of the environment. The key hypotheses are that human disturbance squeezes wildlife into a smaller area or their activity into only part of the day, leading to increased co-occurrence under modification. The method used is joint species distribution modelling (JSDM).

      Strengths:

      The data source seems to be very nice, although since very little information is presented, this is hard to be sure of. Also, the JSDM approach is, in principle, a nice way of simultaneously analysing the data.

      Weaknesses:

      The manuscript suffers from a mismatch of hypotheses and methods at two different levels.

      (1) At the lower level, we first need to understand what the individual species do and "like" (their environmental niche). That information is not presented, and the methods suggest that the representation of each species in the JSDM is likely to be extremely poor.

      The response of each species to the environmental covariates provides a window into their environmental niche, encapsulated in the beta coefficients for each environmental covariate. This information is presented in Figure 2.

      (2) The hypothesis clearly asks for an analysis of the statistical interaction between human disturbance and co-occurrence. Yet, the model is not set up this way, and the authors thus do a lot of indirect exploration, rather than direct hypothesis testing.

      Our JSDM model is set up specifically to examine the effect of human disturbance on co-occurrence, after controlling for shared responses to environmental variables.  It directly tests the first hypothesis, since, if increase in indices of human disturbance had not tended to increase the measured spatial correlations between species as detected by the model, we would have rejected our stated hypothesis that human modification of habitats results in increased positive spatial associations between species.

      Even when the focus is not the individual species, but rather their association, we need to formulate what the expectation is. The hypotheses point towards presenting the spatial and the temporal niche, and how it changes, species for species, under human disturbance. To this, one can then add the layer of interspecific associations.

      Examining each species one by one and how each one responds to human disturbance would miss the effects of any meaningful interactions between species.  The analysis presented provides a means to highlight associations that would have been overlooked.  Future research could go on to analyze the strongest associations in the community and the strongest effects of human disturbance so as to uncover the underlying interactions that give rise to them and the mechanisms of human impact.  We believe that this will prove to be a much more productive approach than trying to tackle this problem species by species and pair by pair.

      The change in activity and space use can be analysed much simpler, by looking at the activity times and spatial distribution directly. It remains unclear what the contribution of the JSDM is, unless it is able to represent this activity and spatial information, and put it in a testable interaction with human disturbance.

      The topic is actually rather complicated. If biotic interactions change along the disturbance gradient, then observed data are already the outcome of such changed interactions. We thus cannot use the data to infer them! But we can show, for each species, that the habitat preferences change along the disturbance gradient - or not, as the case may be.

      Then, in the next step, one would have to formulate specific hypotheses about which species are likely to change their associations more, and which less (based e.g. on predator-prey or competitive interactions). The data and analyses presented do not answer any of these issues.

      We suggest that the so-called “simpler” approach described above is anything but simple, and this is precisely what the Joint Species Distribution Model improves upon.  As pointed out in the Introduction, simply examining spatial overlap is not enough to detect a signal of meaningful biotic interaction, since overlap could be the result of similar responses to environmental variables.  With the JSDM approach, this would not be considered a positive association and would then not imply the possible existence of meaningful interaction.

      Another more substantial point is that, according to my understanding of the methods, the per-species models are very inappropriate: the predictors are only linear, and there are no statistical interactions (L374). There is no conceivable species in the world whose niche would be described by such an oversimplified model.

      While interaction terms can be included in the JSDM, this would considerably increase the complexity of the models.  In previous work, we have found no strong evidence for the importance of interaction terms and they do not improve the performance of the models.

      We have no idea of even the most basic characteristics of the per-species models: prevalences, coefficient estimates, D2 of the model, and analysis of the temporal and spatial autocorrelation of the residuals, although they form the basis for the association analysis!

      The coefficient estimates for response to environmental variables used in the JSDM are provided in Figure 2 and Figure 2—source data 1.

      Why are times of day and day of the year not included as predictors IN INTERACTION with niche predictors and human disturbance, since they represent the temporal dimension on which niches are hypothesised to change?

      Also, all correlations among species should be shown for the raw data and for the model residuals: how much does that actually change and can thus be explained by the niche models?

      The discussion has little to add to the results. The complexity of the challenge (understanding a community-level response after accounting for species-level responses) is not met, and instead substantial room is given to general statements of how important this line of research is. I failed to see any advance in ecological understanding at the community level.

      We agree that the community-level response to human disturbance is a complex topic, and we believe it is also a very important one.  This research and its support of the spatial compression hypothesis, while not providing definitive answers to detailed mechanisms, opens up new lines of inquiry that makes it an important advance.  For example, the strong effects of human disturbance on certain associations that were detected here could now be examined with the kind of detailed species by species and pair by pair analysis that this reviewer appears to demand.

      Reviewer #1 (Recommendations For The Authors):

      L27 indicates instead of "idicates".

      We thank the reviewer for catching that error.

      L64 I would refer to potential interactions or just associations. It is always hard to provide evidence for the existence of true interactions.

      We have revised to “potential interactions” to qualify this statement.

      L69 Suggestion: distort instead of upset.

      We thank the reviewer for catching that error.

      L70-71 Here, authors use the term associations. Please, be consistent with the terminology throughout the manuscript.

      We thank the reviewer for raising this important point.  The term “co-occurrence” appears to be used inconsistently in the literature, so we have tried to refer to it only when referencing the work of us. For us, co-occurrence means “spatial overlap” without qualification as to whether it is caused by interaction or simply by similar responses to environmental factors (see Blanchet et al. 2020, Argument 1). In our view, interactions refer to biotic effects like predation, competition, commensalism, etc., while associations are the statistical footprint of these processes.   In keeping with this understanding, in Line 73, we changed "association" to the stronger word "interaction," but in Line 76, we keep the words "spatiotemporal association", which is presumed to be the result of those interactions. In Line 91, we have changed “interactions” to “associations,” as we do not believe interactions were demonstrated in that study. 

      L76 "Species associations are not necessarily fixed as positive or negative..." This sentence is misleading. I would say that species associations can vary across time and space, for instance along an environmental gradient.

      We thank the reviewer for pointing out the potential for confusion.  In Line 79, we have changed as suggested.

      L78 "Associations between free-ranging species are especially context-dependent" Loose sentence. Please, explain a bit further.

      We have changed the sentence to be more specific: “Interactions are known to be context-dependent; for example, gradients in stress are associated with variation in the outcomes of pairwise species interactions.”

      L83-85 This would be a good place to introduce the 'stress gradient' hypothesis, which has also been applied to faunal communities in a few studies. According to this hypothesis, the incidence of positive associations should increase as environmental conditions harden.

      In our review of the literature, we find that the stress gradient hypothesis is somewhat controversial and does not receive strong support in vertebrates.  We have added the phrase “…the controversial stress-gradient hypothesis predicts that positive associations should increase as environmental conditions become more severe…”

      L86-88 Well, overall, the number of studies examining spatiotemporal associations in vertebrates is relatively small. That is, bird associations have not received much more attention than those of mammals. I find this introductory/appealing paragraph a bit rough. I think the authors can do better and find a better justification for their work.

      We thank the reviewer for the comments.  We have rewritten the paragraph extensively to make it clearer and to provide a stronger justification for the study.

      L106 "[...] resulting in increased positive spatial associations between species" I'd say that habitat shrinking would increase the level of species clustering or co-occurrence, but in my opinion, not necessarily the incidence of positive associations. It is not clear to me if the authors use positive associations as a term analogous to co-occurrence.

      We thank the reviewer for raising this very important distinction. Habitat shrinking would increase levels of species co-occurrence, but this is not of particular interest here. We wanted to test whether there were effects on species interactions, as revealed by associations. We find that the terms association and co-occurrence are used somewhat loosely in the literature and so have made some new effort to clarify and systematize this in the manuscript. For example, there appear to be differences in the way “co-occurrence” is used in Boron 2023 and in Blanchet 2020. We do not use the term "positive spatial association" as analogous to "spatial co-occurrence." Spatial co-occurrence, which for us has the meaning of spatial overlap, could simply be the result of similar reactions to environmental covariates, not reflecting any biotic interaction. Joint Species Distribution Models enable the partitioning of spatial overlap and segregation into that which can be explained by responses to known environmental factors, and that which cannot be explained and thus might be the result of biotic interactions. It is only the latter that we are calling spatial association, which can be positive or negative. These associations may be the statistical footprint of biotic interactions.

      Results:

      Difference between random and non-random association patterns. It is not clear to me if the reported associations are significant or not. The authors only report the sign of the association (either positive or negative) but do not clarify if these associations indicate that two species coexist more or less than expected by chance. In my opinion, that is the difference between true ecological associations (e.g., via facilitation or competition effects) and random co-existence patterns. This is paramount and should be addressed in a new version of the manuscript.

      This information is provided in Figure 3—figure supplement 1,2,3 and Figure 4—figure supplement 1,2,3. This is referenced in the text as follows, “… correlation coefficients for 18 species pairs were positive and had a 95 % CI that did not overlap zero, and the number increased to 65 in moderate modifications but dropped to 29 at higher modifications" and so on. This criterion for significance (i.e., greater than expected by chance) is now stated at the end of the Materials and methods. In Figure 3—figure supplement 1,2,3 and Figure 4—figure supplement 1,2,3, those correlations that were significant at p<0.01 are also shown.
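
      The significance criterion described above (a 95% interval that does not overlap zero) can be illustrated with a minimal sketch. It assumes the interval is read off a set of sampled correlation values for a species pair; how the interval is actually computed is not described here, and the sample values and the names `pair_a` and `pair_b` are hypothetical.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_vals) - 1)
    frac = pos - lower
    return sorted_vals[lower] * (1 - frac) + sorted_vals[upper] * frac


def interval_excludes_zero(samples, level=0.95):
    """Return (is_non_random, (low, high)) for a central interval."""
    ordered = sorted(samples)
    tail = (1 - level) / 2
    low = quantile(ordered, tail)
    high = quantile(ordered, 1 - tail)
    # An association counts as non-random only if the whole interval
    # lies on one side of zero.
    return (low > 0 or high < 0), (low, high)


# Hypothetical sampled correlations for two species pairs.
pair_a = [0.10 + 0.002 * i for i in range(101)]   # spans 0.10 to 0.30
pair_b = [-0.10 + 0.002 * i for i in range(101)]  # spans -0.10 to 0.10

sig_a, ci_a = interval_excludes_zero(pair_a)
sig_b, ci_b = interval_excludes_zero(pair_b)
print(sig_a, ci_a)
print(sig_b, ci_b)
```

      In this toy case the first pair would be counted as a positive association, while the second would not be counted in either direction.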

      I am also missing a more ecological explanation for the observed findings. For instance, the top-ranked species in terms of negative associations is the red fox, whereas the muntjac seems to be the species whose presence can be used as an indicator for that of other species. What are the mechanisms underlying these patterns? Do red foxes compete for food with other species? Do the species that show positive associations (red goral, muntjac) have traits or a diet that are more different from those of other species? More discussion on these aspects (role of traits and the trophic niche) would be necessary to better understand the obtained results.

      The purpose of this paper was to test the compression hypotheses, and we have tried to keep that as the focus.  However, the analysis does open up interesting lines of inquiry for future research to decipher the details of the interactions between species and the mechanisms by which human disturbance facilitates or disrupts these interactions. The reviewer raises some interesting possibilities, but at this point, any discussion along these lines would be largely speculation and could lengthen the paper without great benefit. 

      Reviewer #2 (Recommendations For The Authors):

      The manuscript should be accompanied by all data and code of analysis.

      All data and R scripts have been made available in Science Data Bank: https://doi.org/10.57760/sciencedb.11804.

      The sentence "not much is known" is weak: it suggests the authors did not bother to quantify what IS known, and simply waved any previous knowledge aside. Surely we have some ideas about who preys on whom, and which species have overlapping resource requirements (e.g., due to jaw width). For those, we would expect a particularly strong signal, if the association is indeed indicative of interactions.

      We believe that the reviewer is referring to the statement in Line 90-92 about the lack of understanding of the resilience of terrestrial mammal associations to human disturbance.  We have added a reference to one very recent publication that addresses the issue (Boron et al., 2023), but otherwise we stand by our statement. We have, however, added a qualifier to make it clear that we did indeed look for previous knowledge; "However, a review of the literature indicates that ...."

      Figures:

      Fig. 1. This reviewer considers that this is too trivial and should be deleted.

      This is a graphical statement of the hypotheses and may be helpful to some readers.

      Fig. 2. Using points with error bars hides any potential information.

      Done as suggested.

      That only 4 predictors are presented is unacceptably oversimplified.

      Only 4 predictors are included because, in previous work, we found that adding additional predictors or interactions did little to improve the model’s performance (Li et al. 2018, 2021 and 2022) and could lead to over-fitting.

      Fig. 5. and 6. aggregate extremely strongly over species; it remains unclear which species contribute to the signal, and I guess most do not.

      The number of detection events presented in Table 1 should help to clarify the relative contribution of each species to the data presented in Figures 5 and 6.

      This reviewer considers that the introduction 'oversells' the paper.

      L55: can you give any such "unique ecological information"

      L60: Lyons et al. (Kathleen is the first name) has been challenged by Telford et al. (2016 Nature) as methodologically flawed.

      The first name has been deleted.  The methodological flaw has to do with interpretation of the fossil record and choice of samples, not with the need to partition shared environmental preferences and interactions.

      L61 contradicts line 64: Blanchet et al. (2022, specifying some arguments from Dormann et al. 2018 GEB) correctly point out that logically one cannot infer the existence or strength from co-occurrence data. It is thus wrong to then claim (citing Boron et al.) that such data "convey key information about interactions". The latter statement is incorrect. A tree and a beetle can have extremely high association and nothing to do with each other. Association does not mean anything in itself. When two species are spatially and temporally non-overlapping, they can exhibit perfect "anti-association", yet, by the authors' own definition, cannot interact.

      We believe that the reviewer’s concerns arise from a misunderstanding of how we use the term association.  In our usage, an association is not the same as co-occurrence or overlap, which may simply be the result of shared responses to environmental variables.  The co-occurring tree and beetle would not be found to have any association in our analysis, only shared environmental sensitivities.  In contrast, associations can be the statistical footprint of interactions, and would be overlaid onto any overlap due to similar responses to the environment.  In the case of negative associations, such as might be the result of competitive exclusion or avoidance of predators, the two species would share environmental responses but show lower than expected spatial overlap.  Even though they might be only rarely found in the same vicinity, they would indeed be interacting when they were together.

      Joint Species Distribution Models "allow the partitioning of the observed correlation into that which can be explained by species responses to environmental factors... and that which remains unexplained after controlling for environmental effects and which may reflect biotic interactions." (Garcia Navas et al. 2021). It is the latter that we are calling “associations.”
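
      The distinction drawn here between overlap and association can be sketched with a toy calculation. This is not the JSDM used in the study: it assumes a single environmental covariate and simple linear responses, and all names and values (`env`, `tree`, `beetle`, the slopes) are hypothetical. Two species that respond to the same covariate but do not interact show a strong raw correlation, yet almost none once the shared environmental response is removed.

```python
import random
import statistics


def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)


def residuals(y, x):
    """Residuals of y after a least-squares linear fit on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


rng = random.Random(0)
n_sites = 2000
env = [rng.gauss(0, 1) for _ in range(n_sites)]

# Both species track the environment; their noise terms are independent,
# so there is no interaction between them by construction.
tree = [2.0 * e + rng.gauss(0, 1) for e in env]
beetle = [1.5 * e + rng.gauss(0, 1) for e in env]

raw_corr = pearson(tree, beetle)
residual_corr = pearson(residuals(tree, env), residuals(beetle, env))
print(round(raw_corr, 2), round(residual_corr, 2))
```

      Under these assumptions the raw correlation reflects only the shared environmental response, and the residual correlation, the quantity analogous to an "association" in the sense used above, is close to zero.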

      L63: Gilbert reference: Good to have a reference for this statement.

      This point is important, but the reviewer’s comments below have made it clear that it is even more important to point out that strong interactions should be expected to lead to significant associations.  We have added a statement to clarify this.

      L70-72: Incorrect, interactions play a role, not associations (which are merely statistical).

      In this, we agree, and we have revised the statement to refer to interactions, not associations. In our view, an interaction is a biological phenomenon, while an association is the resulting statistical signal that we can detect.

      L75: Associations tell us nothing, only interactions do. Since these can not be reliably inferred, this statement and this claim are wrong.

      We thank the reviewer for raising this point, but we beg to disagree. Strong interactions should be expected to lead to significant associations that can be detected in the data. Associations, which can be measured reliably, are the evidence of potential interactions, and hence associations can tell us a great deal.  We have added a note to this effect after the Gilbert reference above to clarify this point.

      However, we do accept that associations must be interpreted with caution. As Blanchet et al. 2020 explain, “…the co-occurrence signals (e.g. a significant positive or negative correlation value) estimated from these models could originate from any abiotic factors that impact species differently. Therefore, this correlation cannot be systematically interpreted as a signal of biotic interactions, as it could instead express potential non-measured environmental drivers (or combinations of them) that influence species distribution and co-distribution.” Alternatively, an association could be the result of interaction with a third species.

      L87: Regarding your claim, how would you know you DO understand? For that, you need to formulate an expectation before looking at the data and then show you cannot show what you actually measure. (Jaynes called this the "mind-projection fallacy".)

      We are not sure if the reviewer is criticizing our paper or the entire field of community ecology.  Perhaps it is the statement that “….resilience of interspecific spatiotemporal associations of terrestrial mammals to human activity remains poorly understood….”  Since we are confident that the reviewer believes that mammals do interact, we guess that it is the term “association” that is questioned.  We have revised this to “…the impacts of human activity on interspecific interactions of terrestrial mammals remains poorly understood…” 

      In this particular case, we did formulate an expectation before looking at the data, in the form of the two formal hypotheses that are clearly stated in the Introduction and illustrated in Figure 1. If the hypotheses had not been supported, then we would have accepted that we do not understand. But as the data are consistent with the hypotheses, we submit that we do understand a bit more now.

    1. Author response:

      We thank the reviewers for their critical appraisal of our manuscript. We will address the points of confusion and/or lack of clarity in a revised manuscript. We agree with reviewer 1 that applying the best practice pipeline(s) on new experimental data and comparing this approach with current practices would be a useful demonstration of how this alters the biological interpretation. This is something we are in the process of completing but believe this is best addressed in a separate manuscript where we can focus on the associated biological findings, allowing this manuscript to remain focused on the accurate quantification of tRNA-Seq data.

    1. Author response:

      We thank the reviewers for their positive evaluation and constructive feedback on our study.

      We acknowledge the concern regarding the use of HEK293T cells. In the revised manuscript, we will provide a more detailed explanation of the role of the PKA pathway in the regulation of GSIS by PGE2. To validate this regulation through Kv2.2, we will overexpress the Kv2.2 mutant channel in beta cells and assess its impact. Additionally, we will verify the specificity of the antibodies for EP1-EP4 receptors by knockdown. To confirm the receptors involved in PGE2 function, we will use additional EP receptor blockers or perform receptor knockdown experiments.

      We will clarify that the described signaling pathway operates under normal physiological conditions and differs from pathological changes.

      We once again thank the reviewers for their positive evaluation and constructive suggestions.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1

      We modified the text regarding PRC1 according to the reviewer’s recommendation.

      Reviewer #2

      Following the reviewer’s advice, we introduced the holdup assay, as well as the native holdup assay, in more detail.

      This new part now also discusses the question of replicates in more detail. We do not agree with the eLife assessment on this matter, but we think that this assessment was made because analyzing holdup data requires a different approach compared to more conventional interactomic approaches and these differences were not introduced in sufficient depth. We hope that the inclusion of more background reasoning, together with a more detailed comparison of the measured independent BIN1 interactomes, now included in Figure S4, will eliminate any confusion for the reader.

      We thank the reviewer for guiding us to a previous work that was done on Grb2. Indeed, the finding of this earlier work aligns perfectly with our finding suggesting general similarities in SH3 domain mediated interactions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Benner et al. identify OVO as a transcriptional factor instrumental in promoting the expression of hundreds of genes essential for female germline identity and early embryo development. Prior data had identified both ovo and otu as genes activated by OVO binding to the promoters. By combining ChIP-seq, RNA-seq, and analysis of prior datasets, the authors extend these data to hundreds of genes and therefore propose that OVO is a master transcriptional regulator of oocyte development. They further speculate that OVO may function to promote chromatin accessibility to facilitate germline gene expression. Overall, the data compellingly demonstrate a much broader role for OVO in the activation of genes in the female germline than previously recognized. By contrast, the relationship between OVO, chromatin accessibility, and the timing of gene expression is only correlative, and more work will be needed to determine the mechanisms by which OVO promotes transcription.

      We fully agree with this summary.  

      Strengths:

      Here Benner et al. convincingly show that OVO is a transcriptional activator that promotes expression of hundreds of genes in the female germline. The ChIP-seq and RNA-seq data included in the manuscript are robust and the analysis is compelling.

      Importantly, the set of genes identified is essential for maternal processes, including egg production and patterning of the early embryo. Together, these data identify OVO as a major transcriptional activator of the numerous genes expressed in the female germline, deposited into the oocyte and required for early gene expression. This is an important finding as this is an essential process for development and prior to this study, the major drivers of this gene expression program were unknown.

      We are delighted that this aspect of the work came across clearly. Understanding the regulation of maternal effect genes has been something of a black-box, despite the importance of this class of genes in the history of developmental genetics. The repertoire of essential oogenesis/embryonic development genes that are bound by and respond to OVO are well characterized in the literature, but nothing is known about how they are transcriptionally regulated. We feel the manuscript will be of great interest to readers working on these genes.

      Weaknesses:

      The novelty of the manuscript is somewhat limited as the authors show that, like two prior, well-studied OVO target genes, OVO binds to promoters of germline genes and activates transcription. The fact that OVO performs this function more broadly is not particularly surprising.

      Clearly, transcription factors regulate more than one or two genes. Nevertheless, we were surprised at how many of the aspects of oogenesis per se and maternal effect genes were OVO targets. It was our hypothesis that OVO would have a transcriptional effect genome-wide; however, it was less clear whether OVO would always bind at the core promoter, as is the case with ovo and otu. Our results strongly support the idea that core promoter proximal binding is essential for OVO function, a conclusion of work done decades ago, which has not been revisited using modern techniques.

      A major challenge to understanding the impact of this manuscript is the fact that the experimental system for the RNA-seq, the tagged constructs, and the expression analysis that provides the rationale for the proposed pioneering function of OVO are all included in a separate manuscript.

      This is a case where we ended up with a very, very long manuscript which included a lot of revisiting of legacy data. It was a tough decision on how to break up all the work we had completed on ovo to date. In our opinion, it was too much to put everything into a single manuscript unless we wanted a manuscript length supplement (we were also worried that supplemental data is often overlooked and sometimes poorly reviewed). We therefore decided to split the work into a developmental localization/characterization paper and a functional genomics paper. As it stands both papers are long. Certainly, readers of this manuscript will benefit from reading our previous OVO paper, which we submitted before this one. The earlier manuscript is under revision at another journal and we hope that this improved manuscript will be published and accessible shortly.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Benner et al. interrogate the transcriptional regulator OVO to identify its targets in the Drosophila germline. The authors perform ChIP-seq in the adult ovary and identify established as well as novel OVO binding motifs in potential transcriptional targets of OVO. Through additional bioinformatic analysis of existing ATAC-seq, CAGE-seq, and histone methylation data, the authors confirm previous reports that OVO is enriched at transcription start sites and suggest that OVO does not act as part of the core RNA polymerase complex. Benner et al. then perform bulk RNA-seq in OVO mutant and "wildtype" (GAL4 mediated expression of OVO under the control of the ovo promoter in OVO mutants) ovaries to identify genes that are differentially expressed in the presence of OVO. This analysis supports previous reports that OVO likely acts at transcription start sites as a transcriptional activator. While the authors propose that OVO activates the expression of genes that are important for egg integrity, maturation, and for embryonic development (nanos, gcl, pgc, bicoid), this hypothesis is based on correlation and is not supported by in vivo analysis of the respective OVO binding sites in some of the key genes. A temporal resolution for OVO's role during germline development and egg chamber maturation in the ovary is also missing. Together, this manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis but lacks important in vivo experimental evidence that would validate the high-quality datasets.

      We thank reviewer 2 for the appreciation of the genomics data and analysis. Some of the suggested in vivo experiments are clear next steps, which are well underway. These are beyond the scope of the current manuscript. 

      Temporal analysis of ovo function in egg chamber development is not easy, as only the weakest ovo alleles have any egg chambers to examine. However, we will also point out the long-known phenotypes of some of those weak alleles in the text (e.g. ventralized chambers in ovoD3/+). We will need better tools for precise rescue/degradation during egg chamber maturation.     

      Strengths:

      The manuscript contains relevant ChIP-seq and RNA-seq datasets of OVO targets in the Drosophila ovary alongside thorough bioinformatic analysis

      Thank you. We went to great lengths to do our highly replicated experiments in multiple ways (e.g. independent pull-down tags) and spent considerable time coming up with an optimized and robust informatic analysis.

      Weaknesses:

      (1) The authors propose that OVO acts as a positive regulator of essential germline genes, such as those necessary for egg integrity/maturation and embryonic/germline development. Much of this hypothesis is based on GO term analysis (and supported by the authors' ChIP-seq data). However accurate interpretation of GO term enrichment is highly dependent on using the correct background gene set. What control gene set did the authors use to perform GO term analysis (the information was not in the materials and methods)? If a background gene set was not previously specified, it is essential to perform the analysis with the appropriate background gene set. For this analysis, the total set of genes that were identified in the authors' RNA-seq of OVO-positive ovaries would be an ideal control gene set for which to perform GO term analysis. Alternatively, the total set of genes identified in previous scRNA-seq analysis of ovaries (see Rust et al., 2020, Slaidina et al., 2021 among others) would also be an appropriate control gene set for which to perform GO term analysis. If indeed GO term analysis of the genes bound by OVO compared to all genes expressed in the ovary still produces an enrichment of genes essential for embryonic development and egg integrity, then this hypothesis can be considered.

      We feel that this work on OVO as a positive regulator of genes like bcd, osk, nos, png, gnu, plu, etc., is closer to a demonstration than a proposition. These are textbook examples of genes required for egg and early embryonic development. Hopefully, this is not lost on the readers by an over-reliance on GO term analysis, which is required but not always useful in genome-wide studies. 

      We used GO term enrichment analysis as a tool to help focus the story on some major pathways that OVO is regulating. To the specific criticism of the reference gene-set, GO term enrichment analysis in this work is robust to gene background set. We will update the GO term enrichment analysis text to indicate this fact and add a table using expressed genes in our RNA-seq dataset to the manuscript and clarify gene set robustness in greater detail in the methods of the revision. We will also try to focus the reader’s attention on the actual target genes rather than the GO terms in the revised text.

      We have updated the GO term analysis by including all the expressed genes in our RNA-seq datasets as a background control. Figure 6 has been updated to include the significant GO terms. We have outlined changes in the methods section below.

      Lines 794-801:

      “Gene ontology enrichment analysis was completed with g:Profiler’s g:GOSt software (Raudvere et al. 2019) on the set of genes overlapping OVO ChIP peaks over the TSS and significantly upregulated in the presence of ectopic OVO (525 genes in total). All genes that were considered to be expressed in our RNA-seq datasets were used as a background control (10,801 genes in total). Default parameters were used for the enrichment analysis except for ‘statistical domain scope’ was set to ‘custom’ (our control background genes were uploaded here), ‘significance threshold’ was set to ‘Bonferroni correction’, and only GO biological process terms were searched for enrichment with the gene list. The GO terms listed in Figure 6 represent the 24 smallest GO term sizes according to Table S5.”
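      To illustrate why the choice of background matters, the following is a minimal sketch of an over-representation test restricted to a custom background, with a Bonferroni correction. It is not g:Profiler's implementation; the function names, gene identifiers, and term memberships are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def enrich(query, background, term_to_genes):
    """Per-term over-representation p-values, Bonferroni-adjusted."""
    background = set(background)
    query = set(query) & background            # ignore genes outside the background
    n_tests = len(term_to_genes)
    adjusted = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background    # term size *within* the background
        k = len(query & annotated)
        p = hypergeom_upper_tail(k, len(annotated), len(query), len(background))
        adjusted[term] = min(1.0, p * n_tests)  # Bonferroni correction
    return adjusted

# Hypothetical toy data: 20 expressed genes, 5 of them in the query list.
background = [f"g{i}" for i in range(20)]
query = ["g0", "g1", "g2", "g3", "g4"]
terms = {
    "term_A": ["g0", "g1", "g2", "g3"],            # all four members are in the query
    "term_B": ["g0", "g10", "g11", "g12", "g13"],  # only one member is in the query
}
results = enrich(query, background, terms)
```

      In this toy case `term_A` stays significant after correction, while `term_B` does not. Swapping the background for a larger or smaller gene universe changes both `N` and each term's effective size, which is why the reference set has to be stated.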

      (2) The authors provide important bioinformatic analysis of new and existing datasets that suggest OVO binds to specific motifs in the promoter regions of certain germline genes. While the bioinformatic analysis of these data is thorough and appropriate, the authors do not perform any in vivo validation of these datasets to support their hypotheses. The authors should choose a few important potential OVO targets based on their analysis, such as gcl, nanos, or bicoid (as these genes have well-studied phenotypes in embryogenesis), and perform functional analysis of the OVO binding site in their promoter regions. This may include creating CRISPR lines that do not contain the OVO binding site in the target gene promoter, or reporter lines with and without the OVO binding site, to test if OVO binding is essential for the transcription/function of the candidate genes.

      Exploring mechanism using in vivo phenotypic assays is awesome, so this is a very good suggestion. But, it is not essential for this work -- as has been pointed out in the reviews, in vivo validation of OVO binding sites has been comprehensively done for two target genes, ovo and otu. The “rules” appear similar for both genes. That said, we are already following up specific OVO target genes and the detailed mechanism of OVO function at the core promoter. We removed some of our preliminary in vivo figures from the already long current manuscript. We continue to work on OVO and expect to include this type of analysis in a new manuscript.

      (3) The authors perform de novo motif analysis to identify novel OVO binding motifs in their ChIP-seq dataset. Motif analysis can be significantly strengthened by comparing DNA sequences within peaks, to sequences that are just outside of peak regions, thereby generating motifs that are specific to peak regions compared to other regions of the promoter/genome. For example, taking the 200 nt sequence on either side of an OVO peak could be used as a negative control sequence set. What control sequence set did the authors use as for their de novo motif analysis? More detail on this is necessary in the materials and methods section. Re-analysis with an appropriate negative control sequence set is suggested if not previously performed.

      We apologize for being unclear on negative sequence controls in the methods. We used shuffled OVO ChIP-seq peak sequences as the background for the de novo motif analysis, which we will better outline in the methods of the revision. This is a superior background set of sequences as it exactly balances GC content in the query and background sequences. We are not fond of the idea of using adjacent DNA that won’t be controlled for GC content and shadow motifs. Furthermore, the de novo OVO DNA binding motifs are clear, statistically significant variants of the characterized in vitro OVO DNA binding motifs previously identified (Lu et al., 1998; Lee and Garfinkel, 2000; Bielinska et al., 2005), which lends considerable confidence. We also show that the OVO ChIP-seq read density is highly enriched for all our identified motifs, as well as the in vitro motifs. We provide multiple lines of evidence, through multiple methods, that the core OVO DNA binding motif is 5’-TAACNGT-3’. We have high confidence in the motif data.

      We have added the below text to the methods section for further clarity on motif analysis parameters.

      Lines 808-812

      “The default parameters were used for de novo motif enrichment analysis, including the use of shuffled input sequences as a control. After identifying ‘OVO Motif One’, OVO ChIP peaks that contained that sequence were removed and the resulting ChIP peaks were resubmitted for STREME analysis deriving derivative OVO DNA binding motifs like above.”
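      The point about shuffled controls balancing GC content can be sketched as follows. This is a simplified mononucleotide shuffle written for illustration, not STREME's own shuffling procedure, and the three peak sequences are hypothetical. The motif pattern expands the N in the core motif 5'-TAACNGT-3' to any base and scans both strands.

```python
import random
import re

# Core motif from the text, with IUPAC N expanded to any base.
MOTIF = re.compile(r"TAAC[ACGT]GT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def has_motif(seq):
    """True if the motif occurs on either strand."""
    return bool(MOTIF.search(seq) or MOTIF.search(revcomp(seq)))

def shuffled(seq, rng):
    """Permute the bases of one sequence; base composition is unchanged."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def gc_fraction(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)

rng = random.Random(0)  # fixed seed so the control set is reproducible
peaks = ["GGCATAACTGTCCGA", "TTACAGTTAGGCGCA", "ACGTACGTACGTACG"]  # hypothetical
controls = [shuffled(s, rng) for s in peaks]
motif_calls = [has_motif(s) for s in peaks]
```

      Each control has exactly the same length and GC fraction as the peak it was derived from, which is the property that flanking genomic sequence would not guarantee.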

      (4) The authors mention that OVO binding (based on their ChIP-seq data) is highly associated with increased gene expression (lines 433-434). How many of the 3,094 peaks (conservative OVO binding sites), and what percentage of those peaks, are associated with a significant increase in gene expression from the RNA-seq data? How many are associated with a decrease in gene expression? This information should be added to the results section.

      Not including the numbers of the overlapping ChIP peaks and expression changes in the text was an oversight on our part. The relevant numbers (666 peaks overlapping genes that significantly increased in expression, a significant enrichment according to Fisher’s exact test; 564 peaks overlapping genes that significantly decreased in expression, a significant depletion according to Fisher’s exact test) are found in Figure 4C and will be added to the text.

      We have modified the results section to include the overlap between the RNA-seq and ChIP-seq data.

      Lines 463-468

      “We found that 2,298 genes that were expressed in our RNA-seq data overlapped an OVO ChIP peak. 666 genes significantly increased in expression and were bound by OVO, which is a significant enrichment according to a Fisher’s exact test (Figure 4C, cyan dots, p < 0.01, odds ratio = 2.21). While conversely, 564 genes decreased in expression and were bound by OVO, indicating a significant depletion according to a Fisher’s exact test (Figure 4C, blue dots, p < 0.01, odds ratio = 0.85).”
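      For readers less familiar with the statistic, the following is a minimal sketch of how an odds ratio and a two-sided Fisher's exact p-value are obtained from a 2x2 table of bound/unbound versus changed/unchanged genes. The full contingency tables behind the reported odds ratios are not given in this response, so the counts below are hypothetical. An odds ratio above 1 indicates enrichment and below 1 indicates depletion, matching the direction of the values quoted above.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of every table
    with the same margins that is no more likely than the observed one."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts: rows = OVO-bound / not bound, columns = up / not up.
table = (8, 2, 2, 8)
or_value = odds_ratio(*table)
p_value = fisher_two_sided(*table)
```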

      (5) The authors mention that a change in endogenous OVO expression cannot be determined from the RNA-seq data due to the expression of the OVO-B cDNA rescue construct. Can the authors see a change in endogenous OVO expression based on the presence/absence of OVO introns in their RNA-seq dataset? While intronic sequences are relatively rare in RNA-seq, even a 0.1% capture rate of intronic sequence is likely to be enough to determine the change in endogenous OVO expression in the rescue construct compared to the OVO null.

      This is a good point. The GAL4 transcript is downstream of ovo expression in the hypomorphic ovoovo-GAL4 allele. We state in the text that there is a nonsignificant increase in GAL4 expression with ectopic rescue OVO, although the trend is positive. We calculated the RPKM of RNA-seq reads mapping to the intron spanning exon 3 and exon 4 in ovo-RA and found that there is also a nonsignificant increase in intronic RPKM with ectopic rescue OVO (we will add to the results in the revision). We would expect OVO to be autoregulatory and potentially increase the expression of GAL4 and/or intronic reads, but the ovoovo-GAL4>UASp-OVOB is not directly autoregulatory like the endogenous locus. It is not clear to us how the intervening GAL4 activity would affect OVOB activity in the artificial circuit. Dampening? Feed-forward? Is there an effect on OVOA activity? Regardless, this result does not change our interpretation of the other OVO target genes.

      We have added the analysis of intronic ovo RNA-seq to the results as outlined below.

      Lines 512-520

      “Transcriptionally, ovo RNA-seq reads are likely derived from the UASp-3xFHA-OVO-B cDNA rescue or are indistinguishable between the genomic locus and rescuing cDNA transgene. We found a nonsignificant increase in exon 3 to exon 4 intronic ovo reads with the expression of ectopic rescue OVO (log2 fold change = 0.76, p-adj = 0.26). These intronic reads would be derived from the endogenous ovo locus, but it is difficult to conclusively determine if the endogenous ovo locus would respond transcriptionally to ectopic OVO downstream of UASp (for example, the pathway for ovo is no longer autoregulatory in ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B germ cells, there is an additional GAL4>UASp activation step). So, we could not confidently assess whether ovo responded transcriptionally to ectopic rescue OVO.”
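      As a reminder of what the intronic measurement involves, the following is a minimal sketch of an RPKM calculation and a log2 fold change between two conditions. The read counts, intron length, and library sizes are hypothetical, and the significance estimate reported above would come from a separate differential expression analysis, not from this arithmetic.

```python
from math import log2

def rpkm(reads, length_bp, total_mapped):
    """Reads per kilobase of feature per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped)

# Hypothetical values for one intron in two libraries of equal depth.
control = rpkm(reads=50, length_bp=2000, total_mapped=25_000_000)
rescue = rpkm(reads=100, length_bp=2000, total_mapped=25_000_000)
log2_fold_change = log2(rescue / control)
```

      With so few reads expected from an intron, small absolute differences can produce a sizeable fold change without reaching significance, which is consistent with the cautious interpretation given in the text.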

      (6) The authors conclude with a model of how OVO may participate in the activation of transcription in embryonic pole cells. However, the authors did not carry out any experiments with pole cells that would support/test such a model. It may be more useful to end with a model that describes OVO's role in oogenesis, which is the experimental focus of the manuscript.

      We did not complete any experiments in embryonic pole cells in this manuscript and base our discussion on the potential dynamics of OVO transcriptional control and our previous work showing maternal and zygotic OVO protein localization in the developing embryonic germline. Obviously, we are highly interested in this question and continue to work on the role of maternal OVO. We agree that we extended the model too far and will remove the embryonic germ cell model in the figure. We will instead focus on the possible mechanisms of OVO gene regulation in light of the evidence we have shown in the adult ovary, as suggested.

      We have removed figure 7 and have re-written the last two paragraphs of the discussion as below.

      Lines 645-663

      “The requirement for OVO at the TSS of target genes has been well characterized at its own locus as well as its downstream target otu. Our OVO ChIP and expression data confirm findings from previous work that OVO is binding to these target promoters, and in the case of otu, strongly responds transcriptionally to the presence of OVO. Although we did not test the requirement for OVO DNA binding motifs at other OVO bound genes in this work, this has been extensively explored before, showing that removal of OVO DNA binding sites overlapping the TSS results in a strong decrease in reporter expression (Lü et al. 1998; Bielinska et al. 2005; Lü and Oliver 2001). Removal of more distal upstream OVO DNA binding sites also reduces reporter expression to a lesser degree. However, for most cases tested, removal of OVO DNA binding sites while leaving the rest of the enhancer regions intact, never totally abolished reporter expression. These dynamics are highly similar to work that has been completed on the pioneer factor zelda (zld). Adding zld DNA binding motifs to a stochastically expressed transcriptional reporter increases the activity and response of the reporter (Dufourt et al. 2018). Distally located zld DNA binding motifs influenced reporter expression to a lesser degree than proximal sites. A single zld DNA binding site adjacent to the TSS produced the strongest reporter activity. Importantly, just like the activity of OVO transgenic reporters, there is not an absolute requirement for zld DNA binding to activate reporter expression, however, the addition of TSS adjacent zld DNA binding motifs does strongly influence reporter response. We know that zld achieves this reporter response through its pioneering activity (Xu et al. 2014; Harrison et al. 2011), whether OVO achieves this similar effect on gene expression through a shared mechanism, or in cooperation with other transcription factors needs to be further explored.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The Results section could be streamlined by limiting the discussion of analysis to only those details that are unusual or essential for understanding the science. For example, the fact that MACS3 was used to call peaks seems most suitable for the Methods section.

      We have removed the below excerpts from the results section to streamline the text.

      ‘We compared immuno-purified OVO associated DNA with input DNA as a control, for a total of 12 ChIP-seq libraries, which we sequenced using the Illumina system. After quality control and alignment to the Drosophila r6.46 genome (Gramates et al. 2022), we used MACS3 (Zhang et al. 2008)’

      The Supplemental Tables are referred to out of order. Table S2 is referred to on line 143 while Table S1 is not referred to until the Methods section.

      We have reorganized the order of the tables in the manuscript text.

      In the analysis of CAGE-seq data, it is unclear whether there is anything distinctive about the ~2000 regions bound by OVO but that is not near TSS in the ovary dataset. Are these TSS that are not active in the ovary or are these non-promoter bound OVO sites? If they are TSS of genes not in the CAGE-seq data set, are these genes expressed in other tissues or just expressed at lower levels in the ovary?

      This was a good point that prompted us to take a closer look at the characteristics of OVO binding and its relationship to promoters and other gene elements. 45% of OVO ChIP peaks overlapped the TSS, while 55% did not and were located either downstream or upstream of the TSS. When plotting OVO ChIP read density, there was still a striking enrichment of OVO binding over the TSS, even when the ChIP peak did not overlap the TSS (new Figure 1K). This is possibly due to weaker direct OVO binding at the TSS that was not considered significant by the peak-calling software, or to indirect interactions between the distal OVO binding site and the TSS. We outline this in the text below, which was added to the results section on the OVO ChIP. To showcase these results, we have included a new panel in Figure 1K. We removed the panel showing the enrichment over the CAGE-seq TSS, but the same data remain in the heatmap shown in Figure 1L, so no information is lost. To directly answer the CAGE-seq questions in light of the results for OVO bound over the annotated TSS, we found that 1,047 ChIP peaks overlapped CAGE-seq TSSs, which is only 347 fewer than the annotated TSS overlap (1,394). All of the 1,394 genes that were bound by OVO over the TSS were considered to be expressed in our RNA-seq dataset, indicating that the genes missing from the CAGE-seq overlap might simply be more lowly expressed genes that, for whatever reason, were not considered to be enriched TSSs in the CAGE-seq data. This difference is likely not significant.

      Lines 235-251

      “Although OVO ChIP peaks overlapping genes showed a strong read density enrichment over the TSS, we found that only 45% (1,394/3,094) of OVO ChIP peaks directly overlapped a TSS. 43% (1,339/3,094) of OVO ChIP peaks were found to overlap the gene body downstream of the TSS (intronic and exonic sequences) and 12% (366/3,094) did not overlap any gene elements, indicating that they were intergenic.

      We were interested in the differences between OVO binding directly over the TSS or at more distal upstream and downstream sites. We decided to plot the OVO ChIP read density of these different classes of OVO binding patterns and found that OVO bound over the TSS produced a sharp read density enrichment over the TSS which was consistent with what was found for all OVO bound genes (Figure 1K). OVO binding along the gene body surprisingly also showed a read density enrichment over the TSS, although the magnitude of read density enrichment was notably less than TSS OVO binding. Intergenic OVO binding also showed these same characteristics with a notable upstream read density enrichment possibly indicative of enhancer binding. This indicates that although the significantly called OVO ChIP peaks did not overlap the TSS, there was still a propensity for TSS sequences to be enriched with OVO ChIP over the input control. This could be due to weaker direct in vivo binding of OVO to these TSSs or indirect interactions between the upstream/downstream OVO bound sequences and the TSS, possibly through a looping enhancer-promoter interaction. However, regardless of the location of the OVO ChIP peak, OVO seemed to always be enriched at or in close proximity to TSSs.”
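      The three binding classes described above (TSS, gene body, intergenic) can be assigned with a simple interval test. The following is a minimal sketch under stated assumptions: a single chromosome, half-open coordinates, one TSS per gene, and hypothetical gene and peak positions. It is not the annotation pipeline used in the manuscript.

```python
from collections import Counter

def classify_peak(peak, genes):
    """Label a peak as 'TSS', 'gene body', or 'intergenic'.

    peak  = (start, end), half-open
    genes = list of (start, end, strand), half-open
    A TSS overlap with any gene takes priority over a gene-body overlap.
    """
    p_start, p_end = peak
    label = "intergenic"
    for g_start, g_end, strand in genes:
        # The TSS is the first base of the gene in the direction of transcription.
        tss = g_start if strand == "+" else g_end - 1
        if p_start <= tss < p_end:
            return "TSS"
        if p_start < g_end and g_start < p_end:
            label = "gene body"
    return label

# Hypothetical annotation and peaks.
genes = [(100, 500, "+"), (800, 1200, "-")]
peaks = [(90, 150), (300, 350), (600, 700), (1150, 1250)]
labels = [classify_peak(p, genes) for p in peaks]
counts = Counter(labels)
percentages = {k: 100 * v / len(peaks) for k, v in counts.items()}
```

      Because a peak overlapping a TSS is assigned to the TSS class first, each peak receives exactly one label and the class percentages sum to 100.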

      It would be helpful for the authors to provide a bit more detailed analysis of chromatin states of OVO-bound regions in GSC, 8c NC, and 32c NC (or some more clarity in the current analysis). Are the regions that are bound by OVO accessible in all these cell types or specifically enriched for accessibility in a subset? The authors state that OVO binding is correlated with open chromatin, but whether these are regions that are open in all cell types analyzed or a subset is not clear from the data presented. Promoters are often accessible regardless of cell type, so it is unclear what exactly is to be concluded from this association. Also, is the proximity to open chromatin features for OVO-bound promoters (as shown in Figure 2C) different than non-OVO-bound promoters (the two classes shown Figure 1L, for example)?

      We utilized previously published datasets of staged germ cell chromatin status to look at the association of chromatin status and OVO binding. Unfortunately, not all the same germ cell stages were profiled for each chromatin mark in the datasets derived for these two papers. For example, only H3K4me3 data exist for GSCs, and only GSC and 8c data exist for H3K9me3, while the other chromatin marks had more profiles, even including later stages. We focused specifically on GSC and 32c (essentially stage 5 egg chambers) for the other chromatin marks since that is when the ovo hypomorphic egg chambers arrest. A nice control would have been chromatin states in somatic follicle cells of the ovary, since we know germ cell genes such as ovo and otu are not expressed there and presumably the chromatin states in somatic cell types would be different from those in germ cells. However, chromatin states for somatic follicle cells were not published in these two papers and we are not aware of any other existing datasets to compare to. Essentially, we need to determine the changes in chromatin states with and without OVO, which we are currently working on.

      We did further analyze chromatin states and differential OVO binding with respect to gene elements, and found that chromatin at OVO binding sites, regardless of their relationship to the gene element, is always open (GSC and 32c ATAC). OVO binding over the gene body shows the same enrichment for open chromatin and transcriptionally active histone marks. We compared the profiles of these chromatin marks at the promoters of OVO bound and not bound genes and, consistent with the suggestion that promoters are generally open, we found that this was the case. However, there is an enrichment for open chromatin and transcriptionally active histone marks for OVO bound genes compared to non-OVO bound genes. This could be a consequence of OVO binding or an indirect consequence of a downstream OVO target. Regardless, as has been suggested, future experiments directly measuring chromatin status and OVO need to be performed. The below excerpts have been added to the text to supplement the comments provided above.

      Lines 328-343

      “The association of OVO binding with active histone marks and open chromatin was striking, but open chromatin is likely a general phenomenon of promoters (Haines and Eisen, 2008). Indeed, when measuring the read density for GSC and 32C ATAC-seq for OVO bound and OVO non-bound promoters, there is an enrichment for open chromatin at the TSS regardless of OVO binding. However, we did notice an increase in enrichment for OVO bound promoters compared to OVO non-bound promoters (Figure S1G), possibly suggesting that OVO bound promoters are more open or have an increase in accessibility when compared to non-OVO bound promoters. This same relationship held true for the transcriptionally active histone mark H3K27ac in GSCs (Figure S1H). Since only 45% of OVO ChIP peaks overlapped TSSs, we plotted the read density of the above chromatin marks over OVO ChIP peak maximums for OVO bound over the TSS, gene body, or intergenic regions (Figure S2A-D). We found that OVO bound regions that were not overlapping the TSS still showed the same propensity for enrichment of open chromatin and active histone marks. Intergenic regions were especially enriched for open chromatin measured through ATAC-seq. Altogether suggesting that OVO binding genome-wide is tightly associated with open chromatin regardless of germ cell stage, and active transcription in GSCs. In other words, chromatin state data suggests OVO is acting positively on its target genes and raises the possibility that OVO-binding and open chromatin are related.”

      For clarity, it would help the reader if the authors mentioned the male-specific TATA-associated factors as a rationale for testing the role of OVO binding in core promoter function. This is currently mentioned in the Discussion on lines 575-577, but would help in understanding the motivation behind the detailed analysis of the promoter binding of OVO in the Results and make the negative result more clearly impactful.

      We have introduced the male-specific TATA-associated factors as suggested and have condensed the two intro paragraphs in this section into one, as shown below.

      Lines 347-363

      “Our data thus far clearly indicates that OVO binding occurs at or very near the core promoter, a region recognized by an enormous collection of factors that associate with RNA polymerase to initiate transcription (Aoyagi and Wassarman 2000; Vo Ngoc, Kassavetis, and Kadonaga 2019). The highly organized polymerase complex has sequence-specific DNA recognition sites with incredibly precise spacing between them, with an overall DNA footprint of a little less than 100bp (Rice, Chamberlin, and Kane 1993; FitzGerald et al. 2006; Ohler et al. 2002). There are upstream binding sites such as TATA, sites at transcription start, such as the initiator (INR), and downstream promoter elements (DPE) (Vo Ngoc, Kassavetis, and Kadonaga 2019). The combinations of these DNA motifs is not random in mammals and Drosophila (FitzGerald et al. 2006), and distinct combinations of different motifs at the TSS of genes expressed in Drosophila are conserved over tens of millions of years of evolution (Chen et al. 2014). The male germline expresses a number of TATA-associated factors that have been implicated in male-specific promoter usage for gene expression (M. Hiller et al. 2004; M. A. Hiller et al. 2001; Lu et al. 2020; V. C. Li et al. 2009). It is possible that OVO is a female germline specific TATA-associated factor, and if so, OVO binding sites at core promoters should share precise spacing with other core promoter elements, suggesting it is likely part of the complex. If not, then OVO is more likely to facilitate binding of the basal transcriptional machinery. Because of the extended footprint of engaged RNA polymerase, OVO and the basal machinery would not be likely to occupy the same region at the same time.”

      The description of the system used for the RNA-seq would benefit from additional clarity. It is not clear as written why it is "Lucky" that there is an mRNA isoform with extended exon 2 required for egg chamber development beyond stage 5. How does this requirement compare to the global requirement for OVO, which seems to be required for germ cell development even before stage 5? Understanding this system is essential for interpreting the RNA-seq results. Indeed, the authors have a separate manuscript (currently on bioRxiv) that explains the details of this system. As such, the current description requires that the reader refer to this additional pre-print. Could the authors include a diagram to better illustrate this system? Furthermore, since this RNA-seq is being performed on tissue that includes nurse cells, follicle cells, and germ cells from multiple stages of development, it is important for the authors to clearly state in which cell types OVO is expressed and likely functional. (While this is well beyond this manuscript, this analysis is the type that might benefit from the use of single-cell sequencing as a means to deconvolute the phenotypic effects of OVO loss.)

      We have rewritten the text to better describe the system for RNA-seq. We have also included a figure (Figure S1A) showing the alleles used that should help provide clarity for the readers. We agree that moving forward single cell experiments will be critical to have a better understanding of the transcriptional changes and chromatin dynamics with and without OVO. We have included the below changes to the text.

      Lines 409-423

      “Previous work from our lab has identified a transheterozygous ovo allelic combination (ovoovo-GAL4/ovoΔBP) that greatly reduces OVO activity resulting in sterility, however, female germ cells are able to survive up until at least stage 5 of oogenesis (Benner et al. 2023). ovoovo-GAL4 is a CRISPR/Cas9 derived T2A-GAL4-3xSTOP insertion upstream of the splice junction of exon 3 in the ovo-RA transcript (Figure S1A).

      Importantly, this insertion in the extended exon 3 would disrupt roughly 90% of the ovo-B transcripts. However, since about 10% of ovo-B transcripts utilize an upstream splice junction in exon 3, these transcripts would not be disrupted with the T2A-GAL4-3xSTOP insertion and thus allow for enough OVO activity for germ cell survival (Benner et al. 2023). Since ovoovo-GAL4 expresses GAL4 in place of full length OVO due to the T2A sequences, we can drive expression of a rescuing OVO-B construct downstream of UASp to generate OVO+ female germ cells, which in fact does rescue the arrested germ cell phenotype of ovoovo-GAL4/ovoΔBP ovaries. Therefore, in order to determine genes that are transcriptionally responsive to OVO, we compared the gene expression profiles in sets of ovaries that had the ovo hypomorphic phenotype with a negative control rescue construct (ovoovo-GAL4/ovoΔBP; UASp-GFP)(Figure 4A) versus those that drive expression of the rescue construct expressing OVO-B (ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B)(Figure 4B).”

      Lines 427-432

      “The adult female ovary contains somatic cells, germline stem cells, and germline derived nurse cells that would be profiled in a bulk ovary tissue RNA-seq experiment. Although OVO is only required and expressed in germline derived cell types, we chose to dissect one day old post-eclosion ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B female ovaries to enrich for early stages of oogenesis and collected only ovarioles containing the germarium through previtellogenic egg chambers.”

      On lines 526-532, it is unclear why the genes fs(1)N, fs(1)M3, and closca are particularly sensitive to the ovoD3 allele. What is this allele trans heterozygous with in the assay that allows development through egg laying? Why might these genes be unique in their sensitivity?

      These genes are not particularly sensitive; rather, the transheterozygous hypomorphic ovo ovaries are weak enough to reveal the role of OVO for these genes. We rewrote this paragraph to clarify the relationship between OVO+ binding at these vitelline membrane genes and the phenotype of OVOD3 expressing females.

      Lines 562-577

      “We also found that the genes fs(1)N, fs(1)M3, and closca, were all bound by OVO and responded transcriptionally to the presence of ectopic rescue OVO. These genes are significant because they constitute a set of genes that are expressed in the germline and the encoded proteins are eventually incorporated into the vitelline membrane providing the structural integrity and impermeability of the egg (Mineo, Furriols, and Casanova 2017; Ventura et al. 2010). Loss-of-function of these three genes results in flaccid eggs that are permeable to dye and fail to develop. The loss-of-function phenotype of fs(1)N, fs(1)M3, and closca closely resembles the dominant antimorph ovoD3 phenotype. The ovoD3 allele is the weakest of the original dominant-negative ovo alleles and produces defective eggs allowing us to explore the role of OVO in late stages (Busson et al. 1983; Komitopoulou et al. 1983). ovoD3/ovo+ transheterozygous females express a repressive form of OVO that results in dominant sterility, and importantly, these females lay flaccid eggs with compromised vitelline membranes that are permeable to the dye neutral red (Oliver, Pauli, and Mahowald 1990). Since OVO+ is bound at the TSS of fs(1)N, fs(1)M3, and closca, and these three genes respond transcriptionally to OVO+, then it is plausible that the repressive OVOD3 is negatively regulating these three genes that are required for vitelline membrane formation. This is evidence that OVO is not only involved in regulating the expression of numerous essential maternal pathways for embryonic development, but it is also essential for regulating genes that are required for egg integrity and maturation.”

      The Discussion of OVO as a pioneer factor is highly speculative and based only on correlative data. In fact, the expression data in the embryonic germline is not included in this manuscript, but rather in a separate bioRxiv preprint. This makes it challenging to understand, why this is extensively discussed here. However, there are experiments that could begin to test this proposal. OVO could be expressed in an exogenous tissue and test whether it promotes accessibility. Also, mutations could be made (using gene editing) to identify previously known OVO binding sites in the otu and/or other promoters and these could be assayed for accessibility. By selecting promoters of genes that are not essential for germline development, the authors could directly test the role of OVO in promoting chromatin accessibility. Alternatively, are there reasons that the system used for RNA-seq couldn't be similarly used for ATAC-seq? It is imperfect but could provide insights into chromatin accessibility in the absence of OVO.

      We have largely removed the speculation on pioneering activity, reference to embryonic germline OVO dynamics included in the previous work, and Figure 7. These are excellent suggestions for experiments and ones we are currently pursuing. Below is the modified discussion. 

      Lines 645-663

      “The requirement for OVO at the TSS of target genes has been well characterized at its own locus as well as its downstream target otu. Our OVO ChIP and expression data confirm findings from previous work that OVO is binding to these target promoters, and in the case of otu, strongly responds transcriptionally to the presence of OVO. Although we did not test the requirement for OVO DNA binding motifs at other OVO bound genes in this work, this has been extensively explored before, showing that removal of OVO DNA binding sites overlapping the TSS results in a strong decrease in reporter expression (Lü et al. 1998; Bielinska et al. 2005; Lü and Oliver 2001). Removal of more distal upstream OVO DNA binding sites also reduces reporter expression to a lesser degree. However, for most cases tested, removal of OVO DNA binding sites while leaving the rest of the enhancer regions intact, never totally abolished reporter expression. These dynamics are highly similar to work that has been completed on the pioneer factor zelda (zld). Adding zld DNA binding motifs to a stochastically expressed transcriptional reporter increases the activity and response of the reporter (Dufourt et al. 2018). Distally located zld DNA binding motifs influenced reporter expression to a lesser degree than proximal sites. A single zld DNA binding site adjacent to the TSS produced the strongest reporter activity. Importantly, just like the activity of OVO transgenic reporters, there is not an absolute requirement for zld DNA binding to activate reporter expression, however, the addition of TSS adjacent zld DNA binding motifs does strongly influence reporter response. We know that zld achieves this reporter response through its pioneering activity (Xu et al. 2014; Harrison et al. 2011), whether OVO achieves this similar effect on gene expression through a shared mechanism, or in cooperation with other transcription factors needs to be further explored.”

      The authors suggest that OVO binding is essential for transcriptional activation, but that this may be indirect and that expression of other transcription factors might be necessary for activating gene expression. Did the motif analysis of the OVO-bound regions suggest additional transcription factors that might provide this function?

      We did find other motifs significantly enriched in OVO ChIP peaks. We performed XSTREME analysis on the same set of OVO ChIP peaks, which allowed us to determine whether any of these motifs were significant matches to DNA binding motifs of known transcription factors. Notably, the DNA binding motifs of GAF and CLAMP were enriched in OVO ChIP peaks. GAF is required in the germline, as shown by germline clones, so co-regulation of genes by OVO and GAF is possible. Other enriched motifs did not match any known binding motifs of other transcription factors, but we reported some of the most significantly enriched motifs found alongside OVO in Figure S1C-F. The text below outlines the changes made to incorporate these findings.

      Lines 170-182

      “Along with the OVO DNA binding motif, other motifs were also significantly enriched in OVO ChIP peaks. The motif 5’-GWGMGAGMGAGABRG-3’ (Figure S1C) was found in 18% of OVO ChIP peaks and is a significant match to the DNA binding motifs of the transcription factors GAF (Trl) (Omelina et al. 2011) and CLAMP (Soruco et al. 2013). Trl germline clones are not viable, indicating that GAF activity is required in the germline during oogenesis (Chen et al. 2009). The possibility that OVO binds with and regulates genes alongside of GAF given the enrichment of both transcription factors DNA binding motifs is intriguing. Other significantly enriched motifs 5’-ACACACACACACACA-3’ (29% of peaks, Figure S1D), 5’-RCAACAACAACAACA-3’ (26% of peaks, Figure S1E), and 5’-GAAGAAGAAGAAGAR-3’ (17% of peaks, Figure S1F) were present in OVO ChIP peaks, however, these motifs did not significantly match known DNA binding motifs of other transcription factors. Determining the factors that bind to these sequences will certainly help elucidate our understanding of transcriptional control with relationship to OVO in the female germline.”
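      For readers unfamiliar with the degenerate letters in the motifs quoted above (W, M, B, R), the following minimal Python sketch expands an IUPAC consensus into a regular expression and reports the fraction of sequences that contain it. It is an illustration only, not the authors' method: XSTREME scores position weight matrices rather than consensus strings, this sketch scans the forward strand only, and the function names and toy sequences are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def fraction_with_motif(consensus, sequences):
    """Fraction of sequences with at least one forward-strand match."""
    if not sequences:
        return 0.0
    pattern = consensus_to_regex(consensus)
    hits = sum(1 for seq in sequences if pattern.search(seq.upper()))
    return hits / len(sequences)


# Hypothetical toy "peak" sequences, not taken from the manuscript.
toy_peaks = [
    "CC" + "GAGAGAGAGAGAGAG" + "CC",  # matches GWGMGAGMGAGABRG
    "ACACACACACACACA",                # no match: does not start with G
    "TTTTTTTT",                       # no match
    "GTGCGAGCGAGATGG",                # matches via W=T, M=C, B=T, R=G
]
print(fraction_with_motif("GWGMGAGMGAGABRG", toy_peaks))
```

      In a real analysis both strands would be scanned and matches would be scored against a background model; the consensus regex is only a convenient way to read the motif notation.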

      The figures would benefit from a bit more detail in the legends (see comments below).

      Minor comments:

      In multiple places throughout the document, the citations are inadvertently italicized (see lines 57-59, 91, and 327 as examples.)

      We have changed this in these locations and other instances in the text.

      On line 76, when discussing OVO as a transcription factor this is referencing the protein and not the gene. Thus, should be written OVO and not ovo.

      We have made the correction ovo to OVO.

      On line 349, "core" promoters is likely what is meant rather than "care" promoters.

      We have corrected ‘care’ to ‘core’ in the text.

      On line 404, the authors state that they wanted to use a "less conservative log2 fold change" but it is not clear what they are comparing to. This is important to understand the motivation.

      We are talking about the gene expression comparison between the ectopic ovo rescue and ovo hypomorphic ovaries. “less conservative” was an unfortunate phrasing. We have rewritten the text to state this directly to the reader.

      Lines 435-444

      “We then performed RNA-seq in quadruplicate and measured the changes in gene expression between ectopic rescue OVO and hypomorphic OVO ovaries. We used a significance level of p-adj < 0.05 and a log2 fold change cutoff of >|0.5| to call differential expression between these two sets of ovaries. We utilized these log2 fold change cutoffs for two reasons. Our control ovary genotype (ovoovo-GAL4/ovoΔBP; UASp-GFP) has hypomorphic OVO activity, hence germ cells can survive but are arrested. With the addition of ectopic rescue OVO in ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B ovaries, we predicted that genes that were directly regulated by OVO would transcriptionally respond, however, we were unsure as to what degree the response would be in comparison to hypomorphic OVO. We reasoned that if the changes were not significant between genotypes, then minor changes in gene expression would not matter.”
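      The calling rule stated in the revised text (p-adj < 0.05 and |log2 fold change| > 0.5) can be written out as a short Python sketch. This is only an illustration of the thresholds as described; the function name, the sign convention (positive meaning higher in ectopic rescue OVO ovaries), and the toy values are assumptions, not output from the authors' pipeline.

```python
def call_differential_expression(log2_fold_change, p_adj,
                                 lfc_cutoff=0.5, alpha=0.05):
    """Classify one gene as 'up', 'down', or 'ns' (not significant).

    A gene is called only if it passes both the adjusted p-value
    threshold and the absolute log2 fold change cutoff.
    """
    if p_adj < alpha and abs(log2_fold_change) > lfc_cutoff:
        return "up" if log2_fold_change > 0 else "down"
    return "ns"


# Hypothetical (log2 fold change, adjusted p-value) pairs for illustration.
toy_results = {
    "geneA": (1.8, 1e-6),    # large, significant increase
    "geneB": (-0.7, 0.01),   # modest, significant decrease
    "geneC": (0.3, 1e-4),    # significant but below the fold change cutoff
    "geneD": (2.5, 0.2),     # large change but not significant
}
for gene, (lfc, padj) in toy_results.items():
    print(gene, call_differential_expression(lfc, padj))
```

      Note that a log2 fold change of 0.5 corresponds to roughly a 1.4-fold change on the linear scale, which is why small but statistically supported responses are retained under this cutoff.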

      On line 615, it is unclear what is meant by "showing expression with only 10s of bp of sequence in reporters."

      This is in reference to some of the previously studied ovo reporter deletion lines, however, we have decided to remove the below text in the revised discussion.

      “, despite being remarkably compact. The OVO-dependent ovo core promoter is very compact; showing expression with only 10s of bp of sequence in reporters.” 

      It would be useful to cite and discuss Dufourt et al. Nature Communications 2018 (PMID30518940) regarding the role of Zelda in potentiating transcriptional activation when mentioned on line 624.

      We have added this and the relationship to previous similar work on OVO in the discussion.

      Lines 645-663

      “The requirement for OVO at the TSS of target genes has been well characterized at its own locus as well as its downstream target otu. Our OVO ChIP and expression data confirm findings from previous work that OVO is binding to these target promoters, and in the case of otu, strongly responds transcriptionally to the presence of OVO. Although we did not test the requirement for OVO DNA binding motifs at other OVO bound genes in this work, this has been extensively explored before, showing that removal of OVO DNA binding sites overlapping the TSS results in a strong decrease in reporter expression (Lü et al. 1998; Bielinska et al. 2005; Lü and Oliver 2001). Removal of more distal upstream OVO DNA binding sites also reduces reporter expression to a lesser degree. However, for most cases tested, removal of OVO DNA binding sites while leaving the rest of the enhancer regions intact, never totally abolished reporter expression. These dynamics are highly similar to work that has been completed on the pioneer factor zelda (zld). Adding zld DNA binding motifs to a stochastically expressed transcriptional reporter increases the activity and response of the reporter (Dufourt et al. 2018). Distally located zld DNA binding motifs influenced reporter expression to a lesser degree than proximal sites. A single zld DNA binding site adjacent to the TSS produced the strongest reporter activity. Importantly, just like the activity of OVO transgenic reporters, there is not an absolute requirement for zld DNA binding to activate reporter expression, however, the addition of TSS adjacent zld DNA binding motifs does strongly influence reporter response. We know that zld achieves this reporter response through its pioneering activity (Xu et al. 2014; Harrison et al. 2011), whether OVO achieves this similar effect on gene expression through a shared mechanism, or in cooperation with other transcription factors needs to be further explored.”

      On line 1006 (Figure 1 legend), it is unclear what is meant by "The percentage of OVO ChIP peaks each motif was found". Is a word missing?

      This was unclear; we have revised the sentence as shown below.

      Lines 1035-1036

      “The percentage of OVO ChIP peaks containing each motif and their corresponding p-value are indicated to the right.”

      In the Figure 1 legend, please include citations for the Garfinkel motif and Oliver motif.

      Included, as below.

      Lines 1036-1039

      “H) OVO ChIP minus input control ChIP-seq read coverage density centered on the location of the four de novo OVO DNA binding motifs and previously defined in vitro OVO DNA binding motifs (Lü et al. 1998, Bielinska et al. 2005, Lee and Garfinkel 2000).”

      In Figure 2 legend, it is unclear if B is all instances of a given motif or the DNA motifs that are bound by ChIP. Please clarify.

      We meant only the OVO DNA binding motifs that were within significant OVO ChIP peaks. We have revised the legend below.

      Lines 1049-1052

      “A, B) OVO ChIP minus input control, GSC and 32c ATAC-seq, GSC H3K27ac, H3K4me3, H3K27me3, H3K9me3, 8c NC H3K9me3, 32c NC H3K27ac, and H3K27me3 ChIP-seq read coverage density centered on each OVO peak maximum or OVO DNA binding motif located within a significant OVO ChIP peak.”

      The Figure legend for 2D could use more explanation. What do the lines and circles indicate?

      The lines connecting solid circles indicate the number of overlapping peaks between the two datasets marked by those circles. We have included a better description of this in the figure legend.

      Lines 1054-1058

      “D) Total number of significant peaks (left) and the total number of overlapping peaks (top) between OVO ChIP and GSC and 32c ATAC-seq, GSC H3K27ac, H3K4me3, H3K27me3, H3K9me3, 8c NC H3K9me3, 32c NC H3K27ac, and H3K27me3 ChIP-seq. Lines connecting solid dots indicate the number of overlapping peaks between those two corresponding datasets.”
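      As a reading aid for the overlap counts described in this legend, the sketch below counts how many peaks in one dataset overlap at least one peak in another. It is a minimal illustration under stated assumptions: peaks are represented as half-open (chromosome, start, end) intervals, any shared base counts as an overlap, and the function names and coordinates are hypothetical. The manuscript excerpt quoted here does not specify the tool or overlap criterion the authors used.

```python
def intervals_overlap(a, b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping_peaks(peaks_a, peaks_b):
    """Number of peaks in peaks_a overlapping at least one peak in peaks_b."""
    return sum(
        1 for a in peaks_a
        if any(intervals_overlap(a, b) for b in peaks_b)
    )


# Hypothetical coordinates for illustration only.
ovo_chip_peaks = [("chrX", 100, 200), ("chrX", 500, 600), ("chr2L", 100, 200)]
atac_peaks = [("chrX", 150, 250), ("chr2L", 200, 300)]

# Only the first OVO peak overlaps; the chr2L peaks are adjacent, not overlapping.
print(count_overlapping_peaks(ovo_chip_peaks, atac_peaks))
```

      This pairwise scan is quadratic in the number of peaks; genome-scale comparisons normally rely on sorted intervals or dedicated interval libraries.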

      In Figure 4C, bring the 564 blue dots forward so they are not masked by the yellow dots.

      We have brought the colored dots forward in both figure 4C and 4D.

      In Figure 4E, what is the order of the heatmaps?

      The order is genes with the highest to lowest OVO read density enrichment. We have included this in the figure 4 legend.

      Lines 1086-1087

      “The order of the heatmap is genes with the highest to lowest amount of OVO ChIP read density.”

      In Figure 5, the order of the tracks is not immediately obvious. It appears to be those chromatin features most associated with OVO ChIP and those less correlated. Additional clarity could be provided by showing these tracks (and in Supplemental Figure S2) in different colors with a reference to the figure legend about what the colors might indicate.

      We have changed the colors and order of the tracks to be more similar and consistent in both figures.

      Lines 1090-1093

      “ovo gene level read coverage tracks for OVO ChIP minus input (black), GSC and 32c ATAC-seq (light blue), GSC and 32C H3K27ac (green), H3K4me3 (dark blue), GSC and 32c H3K27me3 (orange), and GSC and 8c H3K9me3 (pink) ChIP-seq, and ovoΔBP/ovoovo-GAL4; UASp-3xFHA-OVO-B minus ovoΔBP/ovoovo-GAL4; UASp-GFP RNA-seq (red).”

      In Figure S1 legend, what is the reference to the da-GAL4 X UAS transgene in the title?

      This was an error on our part and we have removed it.

      Reviewer #2 (Recommendations For The Authors):

      Overall, the manuscript would benefit from revisions of the writing style. At times it is difficult to distinguish between hypothesis and results. The use of colloquial phrases/prose was distracting while reading, which the authors may consider revising. Some sentences were confusing or extraneous, and the authors may consider revising those. Occasionally sentences within the results sections seem more appropriate for the materials and methods.

      (1) The manuscript is generally clear; however, it is at times difficult to distinguish between hypothesis and results. The use of colloquial phrases/prose was distracting while reading, which the authors may consider revising. Examples include:

      a)  Lines 48-49 "While thematic elements of this complex orchestration have been well studied, coordinate regulation of the symphony has not."

      We have edited this sentence below.

      Lines 48-50

      “While the complex interactions between maternally supplied mRNAs and proteins have been well studied, transcriptional regulation driving the expression of these pathways are less well understood.”

      b)  Lines 232-233 "In other words, where exactly does transcription start at these genes."

      We have removed this sentence.

      c)  Line 385, the word "sham" could be changed to "negative control" or "GFP control"

      We have rewritten this sentence below.

      Lines 419-423

      “Therefore, in order to determine genes that are transcriptionally responsive to OVO, we compared the gene expression profiles in sets of ovaries that had the ovo hypomorphic phenotype with a negative control rescue construct (ovoovo-GAL4/ovoΔBP; UASp-GFP)(Figure 4A) versus those that drive expression of the rescue construct expressing OVO-B (ovoovo-GAL4/ovoΔBP; UASp-3xFHA-OVO-B)(Figure 4B)”

      d)  Line 490 "For the big picture"

      We have removed this and revised with the below sentence.

      Lines 530-531

      “To do this, we performed Gene Ontology enrichment analysis with gProfiler software (Raudvere et al. 2019).”

      (2) Some sentences were confusing or extraneous, and the authors may consider revising them. Examples include:

      a)  Lines 195-196 "Therefore, we plotted the significant ChIP (minus input) read density peaks centered on the location of the motif itself."

      We have removed the word ‘peaks’ and ‘itself’, as below.

      Lines 200-201

      “Therefore, we plotted the significant ChIP (minus input) read density centered on the location of the motif.”

      b)  Lines 201-203 "... over the location of the motifs, strongly reinforces the idea that our dataset contains regions centered on sequence-specifically bound OVO transcription factor in the ovary."

      We have edited this sentence to clarify below.

      Lines 204-208

      “While it is possible that OVO comes into contact with regions of DNA in three-dimensional nuclear space non-specifically, the presence of OVO motifs within a large percentage of significant ChIP peaks in vivo and enrichment of OVO ChIP read density at the location of the motifs, strongly reinforces the idea that our OVO ChIP dataset contains regions centered on sequences specifically bound by OVO in the ovary.”

      c)  Lines 326-328 "The combinations of these elements...tens of millions of years of evolution."

      We have revised this sentence below.

      Lines 354-357

      “The combinations of these DNA motifs are not random in mammals and Drosophila (FitzGerald et al. 2006), and distinct combinations of different motifs at the TSS of genes expressed in Drosophila are conserved over tens of millions of years of evolution (Chen et al. 2014).”

      d)  Lines 444-446 "To address this directly, we tested the idea that genes with... and thus downstream of OVO."

      We have removed this sentence in its entirety.

      e)  Line 579-580 "Where OVO binding in close proximity, in any ...activates transcription"

      We have removed this sentence in its entirety.

      (3)    Occasionally sentences within the results sections seem more appropriate for the materials and methods. For example, lines 213-218.

      (4)    At the end of line 375, do the authors mean "only" instead of "also"?

      We have modified this sentence below.

      Lines 411-414

      “ovoovo-GAL4 is a CRISPR/Cas9 derived T2A-GAL4-3xSTOP insertion upstream of the splice junction of exon 3 in the ovo-RA transcript (Figure S1A). Importantly, this insertion in the extended exon 3 would disrupt roughly 90% of the ovo-B transcripts. However, since about 10% of ovo-B transcripts utilize an upstream splice junction in exon 3, these transcripts would not be disrupted with the T2A-GAL4-3xSTOP insertion and thus allow for enough OVO activity for germ cell survival (Benner et al. 2023).”

      (5)    In line 392 the authors say that they dissected ovaries "one day post-eclosion" but the methods section says that ovaries were 3-5 days old. Please clarify.

      We meant one day old for the RNA-seq experiments. We have changed this in the text.

      Lines 679-681

      “Twenty, one day old post-eclosion ovoΔBP/ovoovo-GAL4; UASp-GFP and ovoΔBP/ovoovo-GAL4; UASp-3xFHA-OVO-B ovaries were dissected and germariums through previtellogenic egg chambers were removed with microdissection scissors and placed in ice cold PBS making up one biological replicate.”

      (6)    In line 668 the authors mention CRISPR/Cas9 in the methods, but no such experiment was described.

      We have removed this from the Methods header.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      We thank the reviewer for their careful reading of our manuscript and have taken all of their grammatical corrections into account.

      Reviewer #2 (Public Review):

      Weaknesses: 

      The paper contains multiple instances of non-scientific language, as indicated below. It would also benefit from additional details on the cryo-EM structure determination in the Methods and inclusion of commonly accepted requirements for cryo-EM structures, like examples of 2D class averages, raw micrographs, and FSC curves (between half-maps as well as between rigid-body fitted (or refined) atomic models of the different polymorphs and their corresponding maps). In addition, cryo-EM maps for the control experiments F1 and F2 should be presented in Figure 9.

      We have tried to correct the non-scientific language and have included the suggested data on the cryo-EM analyses, including new Figures 11-17. We did not collect data on the sample used for the seeds in the cross-seeding experiments because we had already confirmed in multiple datasets that the conditions in F1 and F2 reproducibly produce fibrils of Type 1 and Type 3, respectively. We have now analyzed cryo-EM data for 6 more samples at pH 7.0 and found that several kinds of polymorphs (Types 1A, 1M, 2A, 2B and 5) are accessible at this pH; however, the Type 3 polymorphs are not formed at pH 7.0 under the conditions that we used for aggregation.

      Reviewer #2 (Recommendations For The Authors):

      Remove unscientific language: "it seems that there are about as many unique atomic-resolution structures of these aggregates as there are publications describing them"

      We have rephrased this sentence.

      For same reason, remove "Obviously, " 

      Done

      What does this mean? “polymorph-unspecific” 

      Rephrased as non-polymorph-specific

      What does this mean? "shallow amyloid energy hypersurface"  

      By “shallow hypersurface” we mean that the minima of the multi-dimensional function that describes the energy of the amyloid are shallow enough that subtle changes to the environment can favor another fold/energy minimum. We have left the sentence as is because, while it may not be perfect, it is concise and seems to get the point across.

      "The results also confirm the possibility of producing disease-relevant structure in vitro." -> This is incorrect as no disease-relevant structure was replicated in this work. Use another word like “suggest”.

      We have changed to “suggest” as suggested.

      Remove "historically" 

      Done

      Rephrase “It has long been understood that all amyloids contain a common structural scaffold” 

      Changed to “It has long been established that all amyloids contain a common structural scaffold..” 

      "Amyloid polymorphs whose differences lie in both their tertiary structure (the arrangement of the beta-strands) and the quaternary structure (protofilamentprotofilament assembly) have been found to display distinct biological activities [8]" -> I don't think this is true, different biological activities of amyloids have never been linked to their distinct structures.  

      We have added 5 new references (8-12) to support this sentence.

      Reference 10 is a comment on reference 9; it should be removed. Instead, as for alpha-synuclein, all papers describing the tau structures should be included.

      We have removed the reference, but feel that the addition of all Tau structure references is not merited in this manuscript since we are not comparing them.

      Rephrase: "is not always 100% faithful"

      Removed “100%”

      What is pseudo-C2 symmetry? Do the authors mean pseudo 2_1 symmetry (ie a 2-start helical symmetry)?

      Thank you for pointing this out. We did indeed mean pseudo-2_1 helical symmetry.

      Re-phrase: "alpha-Syn's chameleon-like behavior" 

      We have removed this phrase.

      "In the case of alpha-Syn, the secondary nucleation mechanism is based on the interaction of the positively charged N-terminal region of monomeric alpha-Syn and the disordered, negatively charged C-terminal region of the alpha-Syn amyloid fibrils [54]" -> I would say the mechanisms of secondary nucleation are not that well understood yet, so one may want to tune this down a bit. 

      We have changed this to “mechanism has been proposed to be”

      The paragraphs describing experiments by others are better suited for a Discussion rather than a Results section. Perhaps re-organize this part? 

      We have left the text intact as we are using a Results and Discussion format.

      A lot of information about Image processing seems to be missing: what steps were performed after initial model generation? 

      We have added more details in the methods section on the EM data processing and model analysis.

      Figure 1: Where is Type 4 on the pH scale?

      We have adjusted the Figure 1 legend to clarify that the pH scale is only applicable to the structures presented in this manuscript.

      Figure 2: This might be better incorporated as a subpanel of Figure 1.

      We agree that this figure is somewhat of a loner on its own and we only added it in order to avoid confusion with the somewhat inconsistent naming scheme used for the Type 1B structure. However, we prefer to leave it as a separate figure so that it does not dilute the impact of Figure 1.

      Figure 3: What is the extra density at the bottom of Type 3B from pH 5.8 samples 1 and 2. pH 5.8 + 50mM NaCl (but not pH 5.8 + 100 mM NaCl)? Could this be an indication of a local minimum and the pH 5.8 + 100 mM NaCl structure is correct? Or is this a real difference between 0/50mM NaCl and 100 mM NaCl? 

      We did not see the extra density to which the reviewer is referring; however, the images used in this panel are based on the output of 3D classification, which is more likely to produce artifacts than a 3D refinement. With this in mind, we did not see any significant differences in the refined structures and therefore only deposited the better quality map and model for each of the polymorph types.

      Figure 3: To what extent is Type 3B of pH 6.5 still a mixture of different types? The density looks poor. In general, in the absence of more details about the cryo-EM maps, it is hard to assess the quality of the structures presented.

      In order to improve the quality of the images in this panel, a more complete separation of the particles from each polymorph was achieved via the filament subset selection tool in RELION 5. In each case, an unbiased initial model could be created from the 2D classes via the relion_helix_inimodel2D program, further supporting the coexistence of 4 polymorphs in the pH 6.5 sample. The particles of each polymorph were individually refined to produce the respective maps that are now used in this figure.

      Many references are incorrect, containing "Preprint at (20xx)" statements.  

      This has been corrected.

      Reviewer #3 (Public Review):

      Weaknesses: 

      (1) The authors reveal that the Type 1 monofilament fibril polymorph (reminiscent of the JOS-like polymorph) and the Type 5 polymorph (akin to the tissue-amplified-like polymorph) can both form under the same condition. Additionally, this condition also fosters the formation of flat ribbon-like fibrils across different batches. Notably, at pH 5.8, variations in experimental groups yield disparate abundance ratios between polymorphs 3B and 3C, indicating a degree of instability in fibrillar formation. The variability would potentially pose challenges for replicability in subsequent research. In light of these situations, I propose the following recommendations:

      (a) An explicit elucidation of the factors contributing to these divergent outcomes under similar experimental conditions is warranted. This should include an exploration of whether variations in purified protein batches are contributing factors to the observed heterogeneity.

      We are in complete agreement that understanding the factors that lead to polymorph variability is of utmost importance (and was the impetus for the manuscript itself). However, the number of variables to explore is overwhelming and we will continue to investigate this in our future research. Regarding the variability between batches of purified protein, we also think that this could be a factor in the polymorph variability observed for otherwise “identical” aggregation conditions, particularly at pH 7 where the largest variety of polymorphs has been observed. However, even variation between identical replicates (samples created from the same protein solution and simply aggregated simultaneously in separate tubes) can lead to different outcomes (see datasets 15 and 16 in the revised Table 1), suggesting that there are stochastic processes that can determine the outcome of an individual aggregation experiment. While our data still indicate that Type 1, 2 and 3 polymorphs are strongly selected by pH, the selection between interface variants 3B vs. 3C and 2A vs. 2B might also be affected by protein purity. Our standard purification protocol produces a single band by Coomassie-stained SDS-PAGE; however, minor truncations and other impurities below a few percent would go undetected and, given the proposed roles of the N- and C-termini in secondary nucleation, could have a large effect on polymorph selection and seeding. In line with the reviewer’s comments, we now include a batch number for each EM dataset. While no new conclusions can be drawn from the inclusion of this additional data, we feel that it is important to acknowledge the possible role of batch-to-batch variability.

      (b) To enhance the robustness of the conclusions, additional replicates of the experiments under the same condition should be conducted, ideally a minimum of three times.  

      The pH 5.8 conditions that yield Type 3 fibrils have already been repeated several times in the original manuscript. Since the pH 7.4 conditions produce the most common alpha-Syn polymorph (Type 1A) and were produced twice in this manuscript (once as an unseeded and once as a cross-seeded fibrilization), we decided to focus on the intermediate condition where the most variability had been seen (pH 7.0). The revised Table 1 now has 6 new datasets (11-16) representing 6 independent aggregations at pH 7.0 starting from two different protein purification batches. The result is that we now produce the Type 2A/B polymorphs in three samples, and in two of these samples we once again observed the Type 1M polymorph. The other samples produced Type 1A or non-twisted fibrils.

      (c) Further investigation into whether different polymorphs formed under the same buffer condition could lead to distinct toxicological and pathology effects would be a valuable addition to the study.  

      The correlation of toxicity with structure would in principle be interesting. However, the Type 1 and Type 3 polymorphs formed at pH 5.8 and 7.4 are not likely to be biologically relevant. The pH 7 polymorphs (Type 5 and 1M) would be more interesting because they form under the same conditions and might be related to some disease-relevant structures. Still, it is rare that a single polymorph appears at pH 7.0 (the Type 5 represented only 10-20% of the fibrils in the sample and the Type 1M also had unidentified double-filament fibrils in the sample). We plan to pursue this line of research and hope to include it in a future publication.

      (2) The cross-seeding study presented in the manuscript demonstrates the pivotal role of pH conditions in dictating conformation. However, an intriguing aspect that emerges is the potential role of seed concentration in determining the resultant product structure. This raises a critical question: at what specific seed concentration does the determining factor for polymorph selection shift from pH condition to seed concentration? A methodological robust approach to address this should be conducted through a series of experiments across a range of seed concentrations. Such an approach could delineate a clear boundary at which seed concentration begins to predominantly dictate the conformation, as opposed to pH conditions. Incorporating this aspect into the study would not only clarify the interplay between seed concentration and pH conditions, but also add a fascinating dimension to the understanding of polymorph selection mechanisms.

      A more complete analysis of the mechanisms of aggregation, including the effect of seed concentration and the resulting polymorph specificity of the process, is very important for our understanding of the aggregation pathways of alpha-synuclein and is currently the topic of ongoing investigations in our lab.

      Furthermore, the study prompts additional queries regarding the behavior of cross-seeding production under the same pH conditions when employing seeds of distinct conformation. Evidence from various studies, such as those involving E46K and G51D cross-seeding, suggests that seed structure plays a crucial role in dictating polymorph selection. A key question is whether these products consistently mirror the structure of their respective seeds. 

      We thank the reviewer for reminding us to cite these studies as a clear example of polymorph selection by cross-seeding. Unfortunately, it is not entirely clear from the G51D cross-seeding manuscript (https://doi.org/10.1038/s41467-021-26433-2) what conditions were used in the cross-seeding, since different conditions were used for the seedless wild-type and mutant aggregations. However, it appears that the wild-type without seeds was in Tris pH 7.5 (although at 37 °C the pH could have dropped to about 7) and the cross-seeded wild-type was in phosphate buffer at pH 7.0. In the E46K cross-seeding manuscript, it appears that pH 7.5 Tris was used for all fibrilizations (https://doi.org/10.1073/pnas.2012435118). In any event, both results point to the fact that at pH 7.0-7.5 under low-seed conditions (0.5%) the Type 4 polymorph can propagate in a seed-specific manner.

      (3) In the Results section of "The buffer environment can dictate polymorph during seeded nucleation", the authors reference previous cell biological and biochemical assays to support the polymorph-specific seeding of MSA and PD patients under the same buffer conditions. This discussion is juxtaposed with recent research that compares the in vivo biological activities of hPFF, ampLB as well as LB, particularly in terms of seeding activity and pathology. Notably, this research suggests that ampLB, rather than hPFF, can accurately model the key aspects of Lewy Body Diseases (LBD) (refer to: https://doi.org/10.1038/s41467-023-42705-5). The critical issue here is the need to reconcile the phenomena observed in vitro with those in in-vivo or in-cell models. Given the low seed concentration reported in these studies, it is imperative for the authors to provide a more detailed explanation as to why the possible similar conformation could lead to divergent pathologies, including differences in cell-type preference and seeding capability.  

      We thank the reviewer for bringing this recent report to our attention. The findings that ampLB and hPFF have different PK digestion patterns, and that only the former is able to model key aspects of Lewy Body disease, support the seed-specific nature of some types of alpha-synuclein aggregation. We have added this to the discussion regarding the significant role that seed type and seed conditions likely play in polymorph selection.

      (4) In the Method section of "Image processing", the authors describe the helical reconstruction procedure, without mentioning much detail about the 3D reconstruction and refinement process. For the benefit of reproducibility and to facilitate a deeper understanding among readers, the authors should enrich this part to include more comprehensive information, akin to the level of detail found in similar studies (refer to: https://doi.org/10.1038/nature23002).

      As also suggested by reviewer #2, we have now added more comprehensive information on the 3D reconstruction and refinement process.

      (5) The abbreviation of amino acids should be unified. In the Results section "On the structural heterogeneity of Type 1 polymorphs", the amino acids are denoted using three-letter abbreviation. Conversely, in the same section under "On the structural heterogeneity of Type 2 and 3 structures", amino acids are abbreviated using the one-letter format. For clarity and consistency, it is essential that a standardized format for amino acid abbreviations be adopted throughout the manuscript.

      That makes perfect sense and has been corrected.

      Reviewing Editor:

      After discussion among the reviewers, it was decided that point 2 in Reviewer #3's Public Review (about the experiments with different concentrations of seeds) would probably lie outside the scope of a reasonable revision for this work. 

      We agree as stated above and will continue to work on this important point.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary and Strengths:

      The ability of Wolbachia to be transmitted horizontally during parasitoid wasp infections is supported by phylogenetic data here and elsewhere. Experimental analyses have shown evidence of wasp-to-wasp transmission during coinfection (e.g., Huigens et al.), host-to-wasp transmission (e.g., Heath et al.), and mechanical ('dirty needle') transmission from host to host (Ahmed et al.). To my knowledge this manuscript provides the first experimental evidence of wasp-to-host transmission. Given the strong phylogenetic pattern of host-parasitoid Wolbachia sharing, this may be of general importance in explaining the distribution of Wolbachia across arthropods. This is of interest as Wolbachia is extremely common in the natural world and influences many aspects of host biology.

      Weaknesses:

      The first observation of the manuscript is that the Wolbachia strains in hosts are more closely related to those in their parasitoids. This has been reported on multiple occasions before, dating back to the late 1990s. The introduction cites five such papers (the observation is made in other studies too that could be cited) but then dismisses them by stating "However, without quantitative tests, this observation could simply reflect a bias in research focus." As these studies include carefully collected datasets that were analysed appropriately, I felt this claim of novelty was rather strong. It is unclear why downloading every sequence in GenBank avoids any perceived biases, when presumably the authors are reanalysing the data in these papers.

      Thank you for bringing this to our attention, and we will make the necessary amendments in our revised manuscript.

      I do not doubt the observation that host-parasitoid pairs tend to share related Wolbachia, as it is corroborated by other studies, the effect size is large, and the case study of whitefly is clearcut. It is also novel to do this analysis on such a large dataset. However, the statistical analysis used is incorrect as the observations are pseudo-replicated due to phylogenetic non-independence. When analysing comparative data like this it is essential to correct for the confounding effects of related species tending to be similar due to common ancestry. In this case, it is well-known that this is an issue as it is a repeated observation that related hosts are infected by related Wolbachia. However, the authors treat every pairwise combination of species (nearly a million pairs) as an independent observation. Addressing this issue is made more complex because there are both the host and symbiont trees to consider. The additional analysis in lines 123-124 (including shuffling species pairs) does not explicitly address this issue.

      We concur with your observation regarding the non-independence of the data due to phylogenetic relationships. While common phylogenetic correction methods are indeed not directly applicable to wsp distances between species pairs, we are investigating the potential of phylogenetic mixed models to address this issue. We hope to include a revised analysis using this approach in our revised manuscript.
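      To illustrate why shuffling species pairs differs from shuffling species, the following is a minimal sketch of a Mantel-style label permutation test. It is not the phylogenetic mixed model mentioned above and it does not correct for shared ancestry; it only shows a null model in which whole species, rather than individual pairs, are relabelled, so that the many pairs sharing a species are not treated as independent draws. The names `dist` (a symmetric matrix of wsp distances), `linked` (the list of host-parasitoid pairs) and both functions are hypothetical and introduced here for illustration only.

```python
import random


def linked_mean(dist, linked, perm):
    """Mean distance over linked pairs after relabelling species with perm."""
    return sum(dist[perm[i]][perm[j]] for i, j in linked) / len(linked)


def label_permutation_test(dist, linked, n_perm=999, seed=0):
    """One-sided Mantel-style test.

    dist   : symmetric matrix (list of lists) of Wolbachia wsp distances
             between species (hypothetical input).
    linked : list of (i, j) index pairs that are host-parasitoid pairs.
    Returns the observed mean distance over linked pairs and the fraction
    of relabellings (including the observed one) whose mean is as small
    or smaller.
    """
    rng = random.Random(seed)
    identity = list(range(len(dist)))
    observed = linked_mean(dist, linked, identity)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # relabel whole species, not individual pairs
        if linked_mean(dist, linked, perm) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

      Because each permutation moves a species together with all of its pairwise distances, the null distribution reflects the number of species rather than the much larger number of species pairs. A phylogenetically informed analysis would still be needed to account for related hosts carrying related Wolbachia.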

      The sharing of Wolbachia between whitefly and their parasitoids is very striking, although this has been reported before (eg the authors recently published a paper entitled "Diversity and Phylogenetic Analyses Reveal Horizontal Transmission of Endosymbionts Between Whiteflies and Their Parasitoids"). In Lines 154-164 it is suggested that from the tree the direction of transfer between host and parasitoid can be inferred from the data. This is not obvious to me given the poor resolution of the tree due to low sequence divergence. There are established statistical approaches to test the direction of trait changes on a tree that could have been used (a common approach is to use the software BEAST).

      Thank you for your insightful comments regarding the transfer direction of Wolbachia between whiteflies and their parasitoids. We acknowledge the concern about the resolution of the phylogenetic tree and the inference of the direction of Wolbachia transmission based on the available data. We considered the high infection frequency and obligate nature of Wolbachia in En. formosa, which exhibits a 100% infection rate, as a strong indicator that recent transmission of Wolbachia in this clade likely occurred from En. formosa to B. tabaci. We appreciate your recommendation and will ensure that our conclusions are supported by a more statistically sound approach. As you suggested, we will employ the software BEAST to rigorously test the direction of transmission, and we will revise our statements accordingly.

      Reviewer #2 (Public Review):

      The paper by Yan et al. aims to provide evidence for horizontal transmission of the intracellular bacterial symbiont Wolbachia from parasitoid wasps to their whitefly hosts. In my opinion, the paper in its current form contains major flaws.

      Weaknesses:

      The dogma in the field is that although horizontal transmission events of Wolbachia occur, in most systems they are so rare that the chances of observing them in the lab are very slim.

      For the idea of bacteria moving from a parasitoid to its host, the authors have rightfully cited the paper by Hughes, et al. (2001), which presents the main arguments against the possibility of documenting such transmissions. Thus, if the authors want to provide data that contradict the large volume of evidence showing the opposite, they should present a very strong case.

      In my opinion, the paper fails to provide such concrete evidence. Moreover, it seems the work presented does not meet the basic scientific standards.

      We are grateful for your critical perspective on our work. Nonetheless, we are confident in the credibility of our findings regarding the horizontal transmission of Wolbachia from En. formosa to B. tabaci. Our study has documented this phenomenon through phylogenetic tree analyses, and we have further substantiated our observations with rigorous experiments in both cages and petri dishes. The horizontal transfer of Wolbachia was confirmed via PCR, with the wsp sequences in B. tabaci showing complete concordance with those in En. formosa. Additionally, we utilized FISH, vertical transmission experiments, and phenotypic assays to demonstrate that the transferred Wolbachia could be vertically transmitted and induce significant fitness cost in B. tabaci. All experiments were conducted with strict negative controls and a sufficient number of replicates to ensure reliability, thereby meeting basic scientific standards. The collective evidence we present points to a definitive case of Wolbachia transmission from the parasitoid En. formosa to the whitefly B. tabaci.

      My main reservations are:

      • I think the distribution pattern of bacteria stained by the probes in the FISH pictures presented in Figure 4 looks very much like Portiera, the primary symbiont found in the bacteriome of all whitefly species. In order to make a strong case, the authors need to include Portiera probes along with the Wolbachia ones.

      We are very grateful for your critical evaluation regarding the specificity of FISH in our study. We are confident in the reliability of our FISH results for several reasons.

      1) We implemented rigorous negative controls, which exhibited no detectable signal, thereby affirming the specificity of our hybridization.

      2) The central region of the whitefly nymphs is a typical oviposition site for En. formosa. Post-parasitism, we observed FISH signals around the introduced parasitoid eggs, distinct from bacteriocyte cells, which are rich in endosymbionts including Portiera (Figure 3e-f). This observation supports the high specificity of our FISH method.

      3) In the G3 whiteflies, we detected Wolbachia in bacteriocytes in nymphs and at the posterior end of eggs in adult females (Figure 4). This distribution pattern aligns with previously reported localizations of Wolbachia in B. tabaci (Shi et al., 2016; Skaljac et al., 2013). Furthermore, the distribution of Wolbachia in whiteflies does indeed exhibit some overlap with that of Portiera (Skaljac et al., 2013; Bing et al., 2014).

      4) The probes used in our FISH assays have been widely cited (Heddi et al., 1999) and validated in studies on B. tabaci and other systems (Guo et al., 2018; Hegde et al., 2024; Krafsur et al., 2020; Rasgon et al., 2006; Uribe-Alvarez et al., 2019; Zhao et al., 2013).

      Taking all these points into consideration, we stand by the reliability of our FISH results.

      References:

      Bing XL, Xia WQ, Gui JD, Yan GH, Wang XW, Liu SS. 2014. Diversity and evolution of the Wolbachia endosymbionts of Bemisia (Hemiptera: Aleyrodidae) whiteflies. Ecol Evol, 4(13): 2714-37.

      Guo Y, Hoffmann AA, Xu XQ, Zhang X, Huang HJ, Ju JF, Gong JT, Hong XY. 2018. Wolbachia-induced apoptosis associated with increased fecundity in Laodelphax striatellus (Hemiptera: Delphacidae). Insect Mol Biol, 27: 796-807.

      Heddi A, Grenier AM, Khatchadourian C, Charles H, Nardon P. 1999. Four intracellular genomes direct weevil biology: Nuclear, mitochondrial, principal endosymbiont, and Wolbachia. Proc Natl Acad Sci USA, 96: 6814-6819.

      Hegde S, Marriott AE, Pionnier N, Steven A, Bulman C, Gunderson E, et al. 2024. Combinations of the azaquinazoline anti-Wolbachia agent, AWZ1066S, with benzimidazole anthelmintics synergise to mediate sub-seven-day sterilising and curative efficacies in experimental models of filariasis. Front Microbiol, 15: 1346068.

      Krafsur AM, Ghosh A, Brelsfoard CL. 2020. Phenotypic response of Wolbachia pipientis in a cell-free medium. Microorganisms, 8: 1060.

      Rasgon JL, Gamston CE, Ren X. 2006. Survival of Wolbachia pipientis in cell-free medium. Appl Environ Microbiol, 72: 6934-6937.

      Shi P, He Z, Li S, An X, Lv N, Ghanim M, Cuthbertson AGS, Ren SX, Qiu BL. 2016. Wolbachia has two different localization patterns in whitefly Bemisia tabaci AsiaII7 species. PLoS One, 11: e0162558.

      Skaljac M, Zanić K, Hrnčić S, Radonjić S, Perović T, Ghanim M. 2013. Diversity and localization of bacterial symbionts in three whitefly species (Hemiptera: Aleyrodidae) from the east coast of the Adriatic Sea. Bull Entomol Res, 103(1): 48-59.

      Uribe-Alvarez C, Chiquete-Félix N, Morales-García L, Bohórquez-Hernández A, Delgado-Buenrostro N L, Vaca L, et al. 2019. Wolbachia pipientis grows in Saccharomyces cerevisiae evoking early death of the host and deregulation of mitochondrial metabolism. MicrobiologyOpen, 8: e00675.

      Zhao DX, Zhang XF, Chen DS, Zhang YK, Hong XY. 2013. Wolbachia-host interactions: Host mating patterns affect Wolbachia density dynamics. PLoS One, 8: e66373.

      • If I understand the methods correctly, the phylogeny presented in Figure 2a is supposed to be based on a wide search for Wolbachia wsp gene done on the NCBI dataset (p. 348). However, when I checked the origin of some of the sequences used in the tree to show the similarity of Wolbachia between Bemisia tabaci and its parasitoids, I found that most of them were deposited by the authors themselves in the course of the current study (I could not find this mentioned in the text), or originated in a couple of papers that in my opinion should not have been published to begin with.

      We appreciate your meticulous examination of the sources for our sequence data. All the sequences included in our phylogenetic analysis were indeed downloaded from the NCBI database as of July 2023. The sequences used to illustrate the similarity of Wolbachia between B. tabaci and its parasitoids include those from our previously published study (Qi et al., 2019), which were sequenced from field samples. Additionally, some sequences were obtained from other laboratories (Ahmed et al., 2009; Baldo et al., 2006; Van Meer et al., 1999). We acknowledge that in our prior research (Qi et al., 2019), the sequences were directly submitted to NCBI and, regrettably, we did not update the corresponding publication information after the article was published. It is not uncommon for sequences on NCBI to lack current publication details: some are never followed by a published paper (e.g., FJ710487-FJ710511 and JF426137-JF426149), and others do not have their associated publication details updated post-publication (for instance, sequences MH918776-MH918794 from Qi et al., 2019, and KF017873-KF017878 from Fattah-Hosseini et al., 2018). We recognize that this practice can lead to confusion and apologize for the oversight in our work.

      References:

      Ahmed MZ, Shatters RG, Ren SX, Jin GH, Mandour NS, Qiu BL. 2009. Genetic distinctions among the Mediterranean and Chinese populations of Bemisia tabaci Q biotype and their endosymbiont Wolbachia populations. J Appl Entomol, 133: 733-741.

      Baldo L, Hotopp JCD, Jolley KA, Bordenstein SR, Biber SA, Choudhury RR, et al. 2006. Multilocus sequence typing system for the endosymbiont Wolbachia pipientis. Appl Environ Microbiol, 72: 7098-110.

      Fattah-Hosseini S, Karimi J, Allahyari H. 2018. Molecular characterization of Iranian Encarsia formosa Gahan populations with natural incidence of Wolbachia infection. J Entomol Res Soc, 20: 85-100.

      Qi LD, Sun JT, Hong XY, Li YX. 2019. Diversity and phylogenetic analyses reveal horizontal transmission of endosymbionts between whiteflies and their parasitoids. J Econ Entomol, 112(2): 894-905.

      Van Meer MM, Witteveldt J, Stouthamer R. 1999. Phylogeny of the arthropod endosymbiont Wolbachia based on the wsp gene. Insect Mol Biol, 8: 399-408.

      • The authors fail to discuss or even acknowledge a number of published studies that specifically show no horizontal transmission, such as the one claimed to be detected in the study presented.

      Thank you for bringing this to our attention. We will address and discuss the published studies that report no evidence of horizontal transmission, as you've highlighted, in the revised version of our manuscript.

      Reviewer #3 (Public Review):

      This is a very ordinary research paper. The horizontal transmission of endosymbionts, including Wolbachia, Rickettsia etc., has been reported in detail in the last 10 years, and parasitoid-vectored as well as plant-vectored horizontal transmission is the mainstream of research. For example, Ahmed et al. 2013 PLoS One, 2015 PLoS Pathogens, Chiel et al. 2014 Environmental Entomology, Ahmed et al. 2016 BMC Evolutionary Biology, Qi et al. 2019 JEE, and Liu et al. 2023 Frontiers in Cellular and Infection Microbiology all reported the parasitoid-vectored horizontal transmission of endosymbionts, while Caspi-Fluger et al. 2012 Proc Roy Soc B, Chrostek et al. 2017 Frontiers in Microbiology, Li et al. 2017 ISME Journal, Li et al. 2017 FEMS, and Shi et al. 2024 mBio all reported the plant-vectored horizontal transmission of endosymbionts. For the effects of endosymbionts on the biology of the host, Ahmed et al. 2015 PLoS Pathogens explained the effects in detail.

      Thank you very much for your insightful comments and for highlighting the relevant literature in the field of horizontal transmission of endosymbionts, including Wolbachia and Rickettsia. After careful consideration of the studies you have mentioned, we believe that our work presents significant novel contributions to the field. 1) Regarding the parasitoid-mediated horizontal transmission of Wolbachia, most of the cited articles, such as Ahmed et al. 2013 in PLoS One and Ahmed et al. 2016 in BMC Evolutionary Biology, propose hypotheses but do not provide definitive evidence. The transmission of Wolbachia within the whitefly cryptic species complex (Ahmed et al. 2013) or between moths and butterflies (Ahmed et al. 2016) could be mediated by parasitoids, plants, or other unknown pathways. 2) Chiel et al. 2014 in Environmental Entomology reported “no evidence for horizontal transmission of Wolbachia between and within trophic levels” in their study system. 3) Several of the studies you mentioned concern Rickettsia rather than Wolbachia, which indirectly reflects the relative scarcity of evidence for Wolbachia horizontal transmission. For example, the evidence for plant-mediated transmission of Wolbachia remains isolated, with Li et al. 2017 in The ISME Journal being one of the few reports supporting this mode of transmission. 4) The effects of endosymbionts on their hosts are not the central focus of our study; the transgenerational effects of Wolbachia on whiteflies are reported primarily to confirm that Wolbachia had infected the whiteflies. Furthermore, the effects we report of Wolbachia on whiteflies are notably different from those reported by Ahmed et al. 2015 in PLoS Pathogens, likely due to different whitefly species and Wolbachia strains. 5) More importantly, our study reveals a mechanism of parasitoid-mediated horizontal transmission of Wolbachia that is distinct from the mechanical transmission suggested by Ahmed et al. 2015 in PLoS Pathogens. Their study implies transmission primarily through host-feeding contamination, without the need for Wolbachia to infect the parasitoid, suggesting host-to-host transmission at the same trophic level. In contrast, our findings demonstrate transmission from parasitoids to hosts through unsuccessful parasitism, which represents cross-trophic level transmission. To our knowledge, this is the first experimental evidence that Wolbachia can be transmitted from parasitoids to hosts. We believe these clarifications and the novel insights provided by our research contribute valuable knowledge to the field.

      References:

      Ahmed MZ, De Barro PJ, Ren SX, Greeff JM, Qiu BL. 2013. Evidence for horizontal transmission of secondary endosymbionts in the Bemisia tabaci cryptic species complex. PLoS One, 8: e53084.

      Ahmed MZ, Li SJ, Xue X, Yin XJ, Ren SX, Jiggins FM, Greeff JM, Qiu BL. 2015. The intracellular bacterium Wolbachia uses parasitoid wasps as phoretic vectors for efficient horizontal transmission. PLoS Pathog, 10: e1004672.

      Ahmed MZ, Breinholt JW, Kawahara AY. 2016. Evidence for common horizontal transmission of Wolbachia among butterflies and moths. BMC Evol Biol, 16: 118. doi.org/10.1186/s12862-016-0660-x.

      Caspi-Fluger A, Inbar M, Mozes-Daube N, Katzir N, Portnoy V, Belausov E, Hunter MS, Zchori-Fein E. 2012. Horizontal transmission of the insect symbiont Rickettsia is plant-mediated. Proc Biol Sci, 279(1734): 1791-6.

      Chiel E, Kelly SE, Harris AM, Gebiola M, Li X, Zchori-Fein E, Hunter MS. 2014. Characteristics, phenotype, and transmission of Wolbachia in the sweet potato whitefly, Bemisia tabaci (Hemiptera: Aleyrodidae), and its parasitoid Eretmocerus sp. nr. emiratus (Hymenoptera: Aphelinidae). Environ Entomol, 43(2): 353-62.

      Chrostek E, Pelz-Stelinski K, Hurst GDD, Hughes GL. 2017. Horizontal transmission of intracellular insect symbionts via plants. Front Microbiol, 8: 2237.

      Li SJ, Ahmed MZ, Lv N, Shi PQ, Wang XM, Huang JL, Qiu BL. 2017. Plant-mediated horizontal transmission of Wolbachia between whiteflies. ISME J, 11: 1019-1028.

      Li YH, Ahmed MZ, Li SJ, Lv N, Shi PQ, Chen XS, Qiu BL. 2017. Plant-mediated horizontal transmission of Rickettsia endosymbiont between different whitefly species. FEMS Microbiol Ecol, 93(12). doi: 10.1093/femsec/fix138.

      Liu Y, He ZQ, Wen Q, Peng J, Zhou YT, Mandour N, McKenzie CL, Ahmed MZ, Qiu BL. 2023. Parasitoid-mediated horizontal transmission of Rickettsia between whiteflies. Front Cell Infect Microbiol, 12: 1077494. DOI: 10.3389/fcimb.2022.1077494

      Qi LD, Sun JT, Hong XY, Li YX. 2019. Diversity and phylogenetic analyses reveal horizontal transmission of endosymbionts between whiteflies and their parasitoids. J Econ Entomol, 112: 894-905.

      Shi PQ, Wang L, Chen XY, Wang K, Wu QJ, Turlings TCJ, Zhang PJ, Qiu BL. 2024. Rickettsia transmission from whitefly to plants benefits herbivore insects but is detrimental to fungal and viral pathogens. mBio, 15(3): e0244823.

      Weaknesses:

      In the current study, the authors downloaded the MLST or wsp genes from a public database and analyzed the data using other methods. I think the authors may not be familiar with the research progress in the field of insect symbiont transmission, and the current version of this manuscript lacks sufficient novelty.

      We appreciate your critical perspective on our study. However, we respectfully disagree with the viewpoint that our manuscript lacks sufficient novelty.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) Original blots in Figures 2E and 2H should be shown as well as the quantification of miR-182-5p overexpression in HepG2 cells. miR-182-5p expression in T2D patients was 2.3-fold higher than ND patients. The lack of insights into the degree of miR-182-5p overexpression precluded proper interpretation of the data presented.

      Thank you very much for these comments. We now include the original uncut blots and relevant bands (new supplementary figure 3A) as well as the quantification of miR-182-5p expression in mimic-treated HepG2 cells in the supplement (new supplementary figure 2).

      (2) What are the upstream transcriptional regulators of miR-182-5p?

      To the best of our knowledge the upstream transcriptional regulators of miR-182-5p are currently unknown.

      (3) What's the purpose of the weight cycling cohort? Figure 3A only showed that miR-182-5p expression was highly correlated to body weight, but the cohort can not explain why the human cohort has different miR-182-5p expression. GTT and ITT data are lacking for this cohort and thus cannot demonstrate a causal link between insulin sensitivity and miR-182-5p. The lack of histological evidence cannot show the relationship between NAFLD and miR-182-5p.

      The purpose of the weight cycling cohort was to demonstrate that miR-182-5p is dynamically altered and that it can be reversed to almost control levels by weight loss. Thereby we validate in mice that obesity is associated with miR-182-5p upregulation (HFD group without intervention), and we propose that the adverse effects of increased miR-182-5p in obesity might be reversible by weight loss. We did not perform ITTs and GTTs in this weight cycling cohort because the HFD model in C57BL/6 mice is well established and it can be assumed that glucose and insulin tolerance deteriorated during HFD feeding (doi.org/10.1038/oby.2007.608; doi:10.1007/978-1-61779-430-8_27) and improved after weight loss (doi:10.1038/s41598-023-40514-w). To corroborate this assumption, we provide plasma insulin along with other important metabolic markers of the weight cycling model in supplemental figure 5A.

      (4) Loss-of-function of miR-182-5p and/or gain-of-function of Lrp6 in vivo or in vitro would clarify the importance of the miR-182-5p-Lrp6 axis and provide more direct evidence for its potential as a therapeutic target.

      We absolutely agree with the reviewer that loss of miR-182 and gain of LRP6 function experiments are missing. However, we provide miR-182 gain of function experiments that impressively show increased liver triglycerides after only seven days of miR-182 overexpression. Because these in vivo data are only short-term, we stated our conclusions carefully and point out that we do not have evidence for a direct involvement of miR-182-5p in insulin signaling. We are now planning follow-up studies in which miR-182-5p will be overexpressed and also antagonized for a longer time. However, for the timeframe of this revision process these extensive studies are not feasible and we ask the reviewer for his/her understanding.

      (5) The schematic summary is too complex and includes too many assumptions to faithfully represent the data shown in this study.

      We agree, the schematic summary is very complex. Therefore we simplified the upper part (new figure 5) and only focused on the clearly regulated genes and main pathways.

      Reviewer #2 (Recommendations For The Authors):

      (1) Although lots of microarray analyses were performed in this study, the authors didn't systemically investigate the function of miR-182 in T2DM or NAFLD. The current data provided in this manuscript may only support that miR-182 is involved in the homeostasis of glucose or insulin.

      We thank the reviewer for this comment and agree that the nature of our data is mostly correlative. We tried to overcome this by performing mechanistic in vitro experiments. Because overexpression of miR-182-5p decreases insulin signaling in vitro and induces hyperinsulinemia in vivo, we still strongly believe that miR-182-5p is highly relevant for the homeostasis of glucose and insulin.

      (2) The authors used miRNA mimics to overexpress miR-182 in mice. How to emphasize the target specificity in the liver? Normally, adeno-associated virus 8 (AAV8) is used to specifically target the liver.

      Tail vein injections as used in our experimental set-up are known to deliver compounds directly to the liver via the portal vein. For modulation of microRNAs in the liver it is an established technique to deliver mimics (or inhibitors) via the tail vein (doi:10.1007/978-1-62703-435-7_18; doi: 10.1089/10430349950017734). To account for off-target effects we quantified miR-182-5p and target gene expression in spleen and heart. Although miR-182-5p concentrations in mimic treated mice were strongly increased in these tissues, expression in the liver was still highest (new supplementary figure 6A).

      (3) The HE and Oil red staining of the mouse liver should be shown in miR-182-5p overexpressing mice compared with the control mice, which could provide a more intuitive view of the fat content in the mouse liver.

      Unfortunately, the livers were flash frozen and not optimally prepared for later histological analyses. Nevertheless, we performed H&E staining on all livers and provide representative H&E stainings of two control and two miR-182-mimic treated mice (new supplementary figure 5D). The increased hepatic lipid content is clearly visible in the H&E staining of miR-182-mimic treated mice and supports our previous findings of increased hepatic triglycerides (Figure 4H). Because the livers were damaged by the freezing process, Oil red staining was impossible.

      (4) After overexpression of miR-182-5p in mice, the serum insulin levels were increased. Does miR-182-5p affect insulin resistance in mice? The insulin tolerance test (ITT) experiment needs to be performed.

      We thank the reviewer for this comment. Indeed, performing an ITT would have best clarified the effects of miR-182 on insulin tolerance. Because we did not see differences in the GTT after treating mice acutely with the miR-182 mimic, we decided not to perform the ITT in this short-term experiment. The increased fasting serum insulin levels after miR-182-5p mimic treatment (Fig. 4G) suggest that insulin sensitivity rather than insulin secretion is disturbed by miR-182-5p. We are aware that in future experiments mice should be treated for a longer period with miR-182-5p mimics and that an ITT should be performed in these more chronic studies.

      (5) In Figure 2H, the author measured the level of p-Akt/Akt to indicate the effect of miR-182-5p on insulin resistance in HepG2 cells. It is best to provide the western blotting results of p-AKT and t-AKT after HepG2 cells are treated with or without insulin.

      We now provide the full blots for all western blotting experiments as new supplemental figure 3B. The HepG2 cells were stimulated with 20 nM insulin 10 min before harvest as described in 2.11 and consequently Akt and p-Akt were quantified. We did not analyze Akt and p-Akt without stimulation because Akt is rarely phosphorylated in the basal non-insulin stimulated state.

      (6) This study suggests that miR-182-5p may promote insulin resistance and hyperinsulinemia by downregulating LRP6. Nevertheless, to confirm this conclusion, we suggest you transfect miR-182-5p after downregulating the level of LRP6 with its siRNA for further validation.

      Because miR-182-5p targets LRP6 as we have validated by luciferase-assays, LRP6 levels are already low after miR-182-5p overexpression. Thus, the additional downregulation of LRP6 by other means (such as siRNAs) does not make sense in our opinion.

      (7) The author described that serum miR-182-5p was neither altered in T2D nor correlated with hepatic miR-182-5p expression, so is it suitable as the biomarker of T2D?

      Yes, as the reviewer stated correctly, serum concentrations of miR-182-5p were not related to its liver concentrations or the type 2 diabetic state. We therefore suggest that circulating miR-182-5p levels are not a suitable biomarker for T2D. We clarified this in the discussion.

      (8) What are the changes in fasting blood glucose levels in HFD, HC, and YoYo mouse models? Is there a correlation between miR-182-5p level and fasting blood glucose level in T2D patients and mouse models?

      Unfortunately, we did not measure the fasting blood glucose levels in this mouse model and therefore cannot answer this question. However, we provide the fasting insulin levels of our mouse models and their positive correlations with miR-182-5p (Fig. 3D and Suppl.Fig. 5D). In T2D humans, hepatic miR-182-5p correlates positively with fasting glucose (Fig. 2B).

      (9) The capitalization of the letters in "STrengthening the Reporting of OBservational studies in Epidemiology" should be checked. What does the "Among these is miRNAs miR-182-5p" mean? Please clarify it.

      The “STrengthening the Reporting of OBservational studies in Epidemiology” report form is abbreviated as the “STROBE” list. We thus capitalized the letters that are used to build the abbreviation.

      “Among these is miRNAs miR-182-5p” is a typo for which we apologize. It should mean “Among these conserved miRNAs is miR-182-5p.” We corrected this error.

      Reviewer #3 (Recommendations For The Authors):

      (1) The functional importance of miR-182 on gene expression is not rigorously tested.

      (A) Many of the target genes in Fig. 1C and Fig. 3 are controlled by multiple factors that are known to be increased with obesity (e.g., lipogenic genes are increased by hyperinsulinemia), making it likely that their association with miR-182 is correlative rather than a consequence of miR-182 increases.

      We thank the reviewer for this comment and agree that miR-182 is not the only factor regulating the genes investigated here. Rather, we propose that miR-182 could be an additional upstream regulator that holds the potential to modify entire pathways of insulin signaling and lipogenesis. However, miR-182 should not be viewed as an on/off switch, as it likely plays a modulating role. Although our in vivo data from humans and mice are correlative, we believe that the in vitro data derived from HepG2 cells clearly show a causal role for miR-182-5p in decreasing LRP6 and insulin signaling, indicated by lower AKT phosphorylation after miR-182-5p overexpression.

      (B) 500-fold overexpression of miR-182 does not significantly change gene expression. The authors need to knockdown miR-182 in mice and then feed them a chow versus high-fat diet. If miR-182 is a significant regulator of these genes, the effects of the diet will be blunted.

      We thank the reviewer for the constructive criticism and agree that an optimal experiment would be to antagonize miR-182-5p in mice to rescue glucose and lipid metabolism. The in vivo upregulation of miR-182-5p presented here was a proof-of-concept study to confirm our hypothesis in a reasonable timeframe. We are aware that follow-up studies are needed, and we are now planning studies in which miR-182-5p will be overexpressed and also antagonized for a longer time. However, these extensive studies are not feasible within the timeframe of this revision process, and we ask the reviewer for his/her understanding.

      (2) It has previously been shown that miR-182 is in a polycistronic microRNA locus that is activated directly by SREBP-2. Is this also true in humans? If so, this would indicate that miR-182 is a marker of SREBP activity. How do the nuclear active forms of SREBP1 and SREBP2 change in the human livers and HFD-fed mice?

      We thank the reviewer for this very interesting question. Suitable experiments to investigate whether miR-182-5p is activated by SREBF would be EMSAs or ChIPs. Unfortunately, we only have frozen protein lysate of the human livers left, in which such experiments cannot be performed. We agree that this should be prioritized in the future.

      (3) Similarly, to test the role of LRP6 in mediating the effects of miR-182, the authors should compare the effects of miR-182 overexpression in the presence and absence of LRP6.

      Because miR-182-5p targets LRP6 as we have validated by luciferase-assays, LRP6 levels are already low after miR-182-5p overexpression. Thus, the additional downregulation of LRP6 by other means (such as siRNAs) does not make sense in our opinion.

      (4) The methods are a bit confusing. The authors state that "we applied a logistic regression analysis for the 594 mature miRNAs using the NAFLD activity score (NAS) as a cofactor to exclude any bias by hepatic fat content, lobular inflammation, and fibrosis." However, they later showed that miR-182 levels are correlated with NAS. Please clarify.

      We explicitly excluded NAFLD as a driving factor for the association with T2D by including a surrogate (the NAFLD activity score) as a cofactor. It is well known that NAFLD and T2D are likely associated with each other. Since not all included individuals with T2D have NAFLD and vice versa, a second correlation with NAS also revealed that a high NAS is associated with higher expression of miR-182.

      (5) Does two-fold overexpression of miR-182 (which mimics the effects of HFD) have any effect on chow-fed mice?

      This is a very interesting question that we unfortunately cannot answer right now. We are planning further mouse studies in which we will include chow-fed mice as controls.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer 1:

      Weaknesses:

      While I generally agree with the authors' interpretations, the idea of Saccorhytida as a divergent, simplified offshoot is slightly contradictory with a probably non-vermiform ecdysozoan ancestor. The authors' analyses do not discard the possibility of a vermiform ecdysozoan ancestor (importantly, Supplementary Table 4 does not reconstruct that character),

      Saccorhytids are only known from the early Cambrian and their unique morphology has no equivalent among any extinct or extant ecdysozoan groups. This prompted us to consider them as a possible dead-end evolutionary offshoot. The nature of the last common ancestor of ecdysozoans (i.e. an elongated worm-like or non-vermiform animal with capacities to renew its cuticle by molting) remains hypothetical. At present, palaeontological data do not allow us to resolve this question. The animal in Fig. 4b at the base of the tree is supposed to represent an ancestral soft-bodied form with no cuticle from which ecdysozoans evolved via major innovations (cuticular secretion and ecdysis). Its shape is hypothetical, as indicated by a question mark. Our evolutionary model is clearly intended to be tested by further studies and hopefully new fossil discoveries.

      …and outgroup comparison with Spiralia (and even Deuterostomia for Protostomia as a whole) indicates that a more or less anteroposteriorly elongated (i.e., vermiform) body is likely common and ancestral to all major bilaterian groups, including Ecdysozoa. Indeed, Figure 4b depicts the potential ancestor as a "worm". The authors argue that the simplification of Saccorhytida from a vermiform ancestor is unlikely "because it would involve considerable anatomical transformations such as the loss of vermiform organization, introvert, and pharynx in addition to that of the digestive system". However, their data support the introvert as a specialisation of Scalidophora (Figure 4a and Supplementary Table 4), and a pharyngeal structure cannot be ruled out in Saccorhytida. Likewise, loss of an anus is not uncommon in Bilateria. Moreover, this can easily become a semantics discussion (to what extent can an animal be defined as "vermiform"? Where is the limit?).

      We agree that “worm” and “vermiform” are ill-defined terms. They are widely used in various palaeontological and biological papers to describe elongated tubular animals such as ecdysozoans and annelids (see Giribet and Edgecombe 2017; popular textbook written by Nielsen 2012; Schmidt-Rhaesa 2013; Brusca et al. 2023; Giribet and Edgecombe 2020). Very few other animals are termed “worms”. Changes have been made in the text to solve this semantic problem, for example in the abstract where we added (i.e. elongated and tubular) to better define what we mean by “vermiform”.

      Priapulid worms or annelids are examples of extremely elongated, tubular animals. In saccorhytids, the antero-posterior elongation is present (as it is in the vast majority of bilaterians) but extremely reduced, Saccorhytus and Beretella having a sac-like or beret-shape, respectively. That such forms may have derived from elongated, tubular ancestors (e.g. comparable with present-day priapulid worms) would require major anatomical transformations that have no equivalent among modern animals. We agree that further speculation about the nature of these transformations is unnecessary and should be deleted simply because the nature of these ancestors is purely hypothetical. We also agree that the loss of anus and the extreme simplification of the digestive system is common among extant bilaterians. In Figure 4b, the hypothetical pre-ecdysozoan animal is slightly elongated (along its antero-posterior axis) but in no way comparable with a very elongated and cylindrical ecdysozoan worm (e.g. extant or extinct priapulid).

      Therefore, I suggest to leave the evolutionary scenario more open. Supporting Saccorhytida as a true group at the early steps of Ecdysozoa evolution is important and demonstrates that animal body plans are more plastic than previously appreciated. However, with the current data, it is unlikely that Saccorhytida represents the ancestral state for Ecdysozoa (as the authors admit), and a vermiform nature is not ruled out (and even likely) in this animal group. Suggesting that the ancestral Ecdysozoan might have been small and meiobenthic is perhaps more interesting and supported by the current data (phylogeny and outgroup comparison with Spiralia).

      We agree to leave the evolutionary scenario more open, especially the evolutionary process that gave rise to Saccorhytida. Again, we know nothing about the morphology of the ancestral ecdysozoan (typically the degree of body elongation, whether it had a differentiated introvert or not, whether it had a through gut or not). In Fig.4, the ancestral ecdysozoan is supposed to have evolved from a soft-bodied epibenthic animal through key innovations such as the secretion of a cuticle and ecdysis. It is a hypothesis that needs to be tested by further studies and fossil discoveries. Speculations concerning the process through which saccorhytids may have arisen have been deleted.

      Reviewer 2:

      Weaknesses:

      The preservations of the specimens, in particular on the putative ventral side, are not good, and the interpretation of the anatomical features needs to be tested with additional specimens in the future. The monophyly of Cycloneuralia (Nematoida + Scalidophora) was not necessarily well-supported by cladistic analyses, and the evolutionary scenario (Figure 4) also needs to be tested in future works.

      Yes, we agree that the animal described in our manuscript remains enigmatic (e.g. the nature of its internal organs, its lifestyle, etc.). Whereas the dorsal side of the animal is well documented (consistent pattern of pointed sclerites), uncertainties remain concerning its ventral anatomy (typically the mouth location and shape). Additional better-preserved specimens will hopefully provide the missing information. Concerning Cycloneuralia, their monophyly is generally better supported by analyses based on morphological characters than in molecular phylogenies.

      Reviewer 3:

      Weaknesses:

      I, as a paleontology non-expert, experienced several difficulties in reading the manuscript. This should be taken into consideration when assuming a wide range of readers including non-experts.

      We have ensured that the text is comprehensible to biologists. The main results are summarized in relatively simple diagrams (e.g. Fig. 4) that can be understood by non-specialized readers. We are aware that technical descriptive terms may appear obscure to non-specialists. We can hardly avoid them in the descriptive parts. However, our figures (e.g. SEM images and 3D-reconstruction) are clear enough to give the reader a clear idea of the morphology of Beretella.

      Recommendations for the authors:

      All three reviewers appreciate the discovery and found the merit of publishing this manuscript. They also raised some concerns about the data presentation. The authors are requested to perform no additional analysis but to go through all the reviewer comments and rebut or intake them in revising the manuscript.

      Reviewer 1:

      - Line 41: comma after "ecdysozans".

      OK, done.

      - Formatting style: add a space before references.

      OK, done.

      - Line 169: B. spinosa in italics

      OK, done.

      - Line 157: could the "relatively large opening" in the flattened ventral side be a mouth (even when altered by the fossilisation process)?

      Most bilaterians have a mouth. There is no opening on the relatively well-preserved dorsal side of Beretella that could be interpreted as a mouth. In contrast, the flattened ventral side often shows a depressed area that could potentially bear a mouth. This ventral area is often pushed in and poorly preserved. The cuticle of this ventral side might have been relatively thinner, perhaps more flexible than that of the dorsal one (with strong sclerites). These differences might explain why the possible oral area is poorly preserved.

      - Line 178: "position of the mouth"

      OK, done.

      - Line 219: "These sclerites, unknown..."

      OK, done.

      - Line 282: update reference formatting

      OK, done.

      - Line 298: remove reference to Supplementary Table 4, as it does not refer to the possible vermiform nature of the last common ecdysozoan ancestor?

      OK, done.

      - Figure 4a: change "paired legs" for "paired appendages"?

      OK, done.

      - Supplementary Table 4: For TGE and Introvert, the state 0 (absent) should be in bold and underlined (as it is the most likely state).

      OK, done.

      Reviewer 2:

      Line 25: "from the early Cambrian" should be changed into "from the lower Cambrian"

      OK, done.

      Line 126: The range of maximum length should be reported in µm (rather than mm) just like those of maximum width and height.

      OK, done.

      Lines 191-192: Please recheck the figure panels of Saccorhytus (Supplementary Figure 4c) and scalidophoran worm (Supplementary Figure 4d). Perhaps, the former should refer to Figure 4d, and the latter to Figure 4c?

      OK, done.

      Lines 239 and 241: "1" and "2" appear to stand for citations (the other journal style), but I am not certain what they are.

      To avoid confusion, we replaced ‘1’ and ‘2’ with ‘i’ and ‘ii’.

      Figures 3d and 4a: "Cycloneuralia" should be included in the phylogenetic trees.

      OK, done.

      Figure 3: The caption for the panel d is redundant. It should be changed into, for example, "Phylogenetic tree obtained from cladistic analyses using maximum likelihood (IQTREE)."

      OK, done.

      Supplementary Figures 6-9: In the captions, more detailed explanations of the results (for example, "50% majority rule consensus of XXX trees" and "strict consensus of all 4 most-parsimonious trees") should be provided.

      OK, done.

      Supplementary Figures 8 and 9: The caption explains that Cycloneuralia is resolved as a paraphyletic group, but it is not certain because Nematoida, Scalidophora, and Panarthropoda are resolved in a polytomy.

      We changed the sentence into:

      “Note that Cycloneuralia does not appear as a monophyletic clade”

      Reviewer 3:

      Line 25 'tiny' - I suggest giving an absolute measure of the size.

      We added ‘maximal length 3 mm’.

      Line 29 'both forms' - This is hard to follow by a non-expert. Can this be replaced with 'fossil species'?

      OK, done.

      Line 32 'dead-end' - Is this word necessary? I suggest to skip this word, as it is obvious that this lineage is extinct.

      OK, done.

      Lines 80, 94, and 172 'Remarks' - I, as a palaeontology non-expert, cannot get this manuscript structure with a repetition of this same section title.

      Our systematic descriptions follow the standard rules in palaeontology.

      Line 119 - I could not get what this 'Member 5' that was not introduced earlier means.

      In Stratigraphy, ‘member’ is a lithostratigraphic subdivision (a Formation is usually subdivided into several Members).

      Lines 104, 105, 417, ... - The name of the organization or database hosting these IDs (CUB.... and ELIXX....) should also be supplied.

      OK, done.

      Lines 341 and 361 - These two Figures (Figures 1 and 2) have the same caption (with an addition to the one for Figure 1). There should be a distinction based on what is presented in each figure.

      We corrected the caption of Figure 2 and wrote the following: ‘Beretella spinosa gen. et sp. nov.’.

      Line 362-367 - There is no guide about what the individual figure panels (e.g., Figure 2g, 2h, and 2i) show in detail. This guide should be supplied. This also applies to Figure 3a-c - are they anterolateral (a), dorsal (b), and posterolateral (c) views? It is better to write clearly in this way.

      OK, done.

      Figure 3d - The color contrast is not sufficient, and this figure does not look reader-friendly. Plus, the division into Cycloneuralia and Panarthropoda is indicated above the tree, but it is not clear what range of lineages these clades include. For example, is Pliciloricidae included in Cycloneuralia? Also, is Collinsium included in Panarthropoda? This figure looks quite unreliable, and it should be easy to fix.

      OK, done.

      Line 277 legend of Figure 3 - Including the parenthesis only with the program name (IQTREE) is not useful at all. Isn't it enough to describe it in Methods?

      OK, done. We removed (IQTREE).

      Line 380 legend of Figure 3 - I could not get where 'thicker bars' are.

      Known fossil record indicated by thicker vertical bars. We added “vertical”.

      Line 453 - Give full names of the methods, maximum parsimony, and maximum-likelihood.

      OK, done.

      Line 489 - State clearly what 'the recent paper' means.

      We replaced ‘recent’ with ‘present’.

    1. Author Response:

      We thank the reviewers for their careful reading, for acknowledging the strengths of our manuscript, and for pointing out its weaknesses, which we will address in the revised version as described below.

      (1) We will supplement our analysis with finer statistical testing and analysis, such as cross-validation and a more detailed analysis of the relation between the inferred model and the intrinsic timescales of the system. For the effect of the drug TIMP-1 on the animals, we will first explore the possibility of assessing the results using a multifactor ANOVA test, with the caveat that the distribution of interactions is not Gaussian. We will further test the effect of different group sizes on the significance of our results by considering subgroups of animals in the drug group, and compare the statistics between the (subsampled) drug group and the control group.

      (2) Our manuscript is similar to that of Shemesh et al. in that we both analyze socially interacting mice by constructing maximum entropy models (MEMs) of the co-localization patterns of mice. The difference is in the setup and the number of mice (4 mice in Shemesh et al., 10-15 in our work), as we outlined in the manuscript. To further support the argument in the Discussion section about how our results differ, we will learn an MEM with up to triplet interactions for our Eco-HAB mice data, and compare it to our current MEM with up to pairwise interactions using test-set validation or the Bayesian information criterion (BIC).
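
      As an illustration of the BIC comparison mentioned in point (2), the following is a minimal sketch and not the authors' implementation. It assumes that each model's maximized log-likelihood is already available, and it counts one free parameter per subset of mice up to the interaction order, which is a simplifying assumption; the real parameterization of the Eco-HAB models may differ. The function names and the numbers in the usage example are hypothetical.

```python
import math


def count_interaction_terms(n_mice, max_order):
    """Count subsets of mice of size 1..max_order.

    Assumption for illustration only: one free parameter per subset
    (fields for single mice, couplings for pairs, triplets, ...).
    """
    return sum(math.comb(n_mice, k) for k in range(1, max_order + 1))


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def preferred_model(candidates, n_samples):
    """Return the name of the candidate with the lowest BIC.

    candidates maps a model name to (log_likelihood, n_params).
    """
    scores = {
        name: bic(ll, k, n_samples) for name, (ll, k) in candidates.items()
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    n_mice = 15          # hypothetical cohort size
    n_samples = 10_000   # hypothetical number of observed time bins
    candidates = {
        # Hypothetical log-likelihoods; real values come from model fitting.
        "pairwise": (-52_000.0, count_interaction_terms(n_mice, 2)),
        "triplet": (-51_500.0, count_interaction_terms(n_mice, 3)),
    }
    print(preferred_model(candidates, n_samples))
```

      With these made-up numbers the triplet model fits slightly better but pays a much larger complexity penalty, so the pairwise model is preferred. Test-set validation would replace the penalty term with held-out log-likelihood.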

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The manuscript by Lu et al aims to study the effects of tubulin post-translational modification in C. elegans touch receptor neurons. Authors use gene editing to engineer various predicted PTM mutations in α-tubulin MEC-12 and β-tubulin MEC-7. Authors generate and analyze an impressive battery of mutants in predicted phosphorylation site and acetylation site of β-tubulin MEC-7, K40 acetylation site in α-tubulin MEC-12, enzymatic site of the α-tubulin acetyltransferase MEC-17, and PTM sites in the MEC-12 and MEC-7 C-tails (glutamylation, detyrosination, delta-tubulin). This represents a lot of work, and will appeal to a readership interested in C. elegans touch receptor neurons. The major concern/criticism of this manuscript is whether the introduced mutation(s) directly affects a specific PTM or whether the mutation affects gene expression, protein expression/stability/localization, etc. As such, this work does convincingly demonstrate, as stated in the title, that "Editing of endogenous tubulins reveals varying effects of tubulin posttranslational modifications on axonal growth and regeneration."

      We thank the reviewer for the constructive comments. With regard to the major concern or criticism, we would like to point out that we have previously characterized ~100 missense mutations in mec-7 and mec-12 (Zheng et al., 2017, PMID: 28835377; Lee et al., 2021, PMID: 33378215). So, we are familiar with the phenotypes associated with mutations that affect gene expression or protein stability, which mostly result in a null phenotype. When analyzing the PTM site mutants, we compared their phenotypes with the previously categorized phenotypes of null alleles, neomorphic mutations that increase microtubule stability, and antimorphic mutations that prevent polymerization or disrupt microtubule stability. For example, in the case of mec-7 S172 mutations, we found that S172P mutants had the same phenotype as the mec-7 knockout (mild neurite growth defects), suggesting that S172P likely affects protein folding or stability, resulting in the loss of MEC-7. In contrast, S172A and S172E mutations showed phenotypes similar to neomorphic alleles (the emergence of ectopic ALM posterior neurite) and antimorphic alleles (the severe shortening of all neurites in the TRNs), respectively. These phenotypic differences suggested to us that the effects of S172A and S172E mutations cannot be simply attributed to the loss of protein expression and stability. Similar logic was applied to the studies of other PTM-inactivating or -mimicking mutations.

      (2) For example, the authors manipulate the C-terminal tail of MEC-12 and MEC-7, to test the idea that polyglutamylation may be an important PTM. These mutants displayed subtle phenotypes. The authors show that branch point GT335 and polyglutamylation polyE recognizing antibodies stain cultured embryonic touch receptor neurons (TRNs), but did not examine staining in C. elegans TRNs in situ. To my knowledge, these antibodies have not been shown to stain the TRNs in any published papers, raising the question of how these "glutamylation" mutations are affecting mec-12 and -7. The rationale for using cultured embryonic TRNs and the relevance of the data and its interpretation are not clear.

      The GT335 and polyE antibodies were used in previous studies (O’Hagan et al., 2011, PMID: 21982591; and O’Hagan et al., 2017, PMID: 29129530) to detect the polyglutamylation signals in the sensory cilia of C. elegans. We initially tried to stain the whole animals using these antibodies but could not get clear and distinct signals in the TRNs. We reason that the tubulin polyglutamylation signals in the TRNs may be weak, and the in situ staining method, which requires the antibodies to penetrate multiple layers of tissues (e.g., cuticles and epidermis) to reach the TRN axons, may not be sensitive enough to detect the signal. In fact, the TRN axons are located deeper in the worm body compared to the sensory cilia that are mostly exposed to the environment. Another reason could be that the tissues (mostly epidermis) surrounding the TRN axons also have polyglutamylation staining, which makes it difficult to recognize TRN axons. This is a situation different from the anti-K40 acetylation staining, which only occurs in the TRNs because MEC-12 is the only α-tubulin isotype that carries K40. Due to these technical difficulties, we decided to use the in vitro cultured TRNs for the staining experiment, which allows both easy access of the antibodies (thus higher sensitivity) and the dissociation of the TRNs from other tissues. The fact that we were able to observe reduced staining in the ttll mutants and the tubulin mutants that lost the glutamate residues suggests that these antibodies indeed detected glutamylation signals in the cells.

      (3) The final paragraph of the discussion is factually incorrect. The C. elegans homologs of the CCP carboxypeptidases are called CCPP-1 and CCPP-6. There are several publications on their functions in C. elegans.

      We thank the reviewer for pointing out the mistake in the text. We intended to say that “there is no C. elegans homolog of the known tubulin carboxypeptidases that catalyze detyrosination”, which is true given that homologs of the detyrosinating vasohibins (VASH1/VASH2) cannot be found in C. elegans. We are aware of the publications on CCPP-1 and CCPP-6; CCPP-1 is known to regulate tubulin deglutamylation in the cilia of C. elegans (O’Hagan et al., 2011 and 2017), while CCPP-6 may function in the PLM to regulate axonal regeneration (Ghosh-Roy et al., 2012). In the revised manuscript, we have corrected the error.

      Reviewer #2 (Public Review):

      Summary:

      The tubulin subunits that make up microtubules can be posttranslationally modified and these PTMs are proposed to regulate microtubule dynamics and the proteins that can interact with microtubules in many contexts. However, most studies investigating the roles of tubulin PTMs have been conducted in vitro either with purified components or in cultured cells. Lu et al. use CRISPR/Cas9 genome editing to mutate tubulin genes in C. elegans, testing the role of specific tubulin residues on neuronal development. This study is a real tour de force, tackling multiple proposed tubulin modifications and following the resulting phenotypes with respect to neurite outgrowth in vivo. There is a ton of data that experts in the field will likely reference for years to come as this is one of the most comprehensive in vivo analyses of tubulin PTMs in vivo.

      This paper will be very important to the field, however would be strengthened if: 1) the authors demonstrated that the mutations they introduced had the intended consequences on microtubule PTMs, 2) the authors explored how the various tubulin mutations directly affect microtubules, and 3) the findings are made generally more accessible to non C. elegans neurobiologists.

      (1) The authors introduce several mutations to perturb tubulin PTMs. However, it is unclear to what extent the engineered mutations affect tubulin in the intended way, i.e., are the authors sure that the PTMs they want to perturb are actually present in C. elegans? Many of the antibodies used did not appear to be specific and antibody staining was not always impacted in the mutant cases as expected. For example, is there any evidence that S172 is phosphorylated in C. elegans, e.g. from available phosphoproteomic data? Given the significant amount of staining left in the S172A mutant, the antibody seems non-specific in this context and therefore not a reliable readout of whether MTs are actually phosphorylated at this residue. As another example, there is no evidence presented that K252 is acetylated in C. elegans. At the very least, the authors should consider demonstrating the conservation of these residues and the surrounding residues with other organisms where studies have demonstrated PTMs exist.

      We thank the reviewer for the comments. To our knowledge, there are very few phosphoproteomic data available for C. elegans. We searched a previously published dataset (Zielinska et al., 2009; PMID: 19530675) and did not find the S172 phosphorylation signal in MEC-7. This is not surprising, given that only six touch receptor neurons express MEC-7 and the abundance of MEC-7 in the whole animal lysate may be below the detection limit. However, this phosphorylation site S172 is highly conserved across species and tubulin isotypes (Figure 1-figure supplement 1 in the revised manuscript), suggesting that this site is likely phosphorylated in MEC-7.

      In the case of K252, the potential acetylation site and the flanking sequences are extremely conserved across species and isotypes. In fact, the 20 amino acids from 241-260 a.a. are identical among the tubulin genes of C. elegans, fruit flies, Xenopus, and humans (Figure 4-figure supplement 1B). Thus, although K252 acetylation was originally found in HeLa cells, this site can possibly be acetylated in C. elegans as well.

      In the case of K40, we observed sequence divergence at the PTM site and adjacent sequences among the tubulin isotypes in C. elegans. MEC-12 is the only C. elegans α-tubulin isotype that has the K40 residue, and the 40-50 a.a. region of MEC-12 appears to be more conserved than other isotypes when compared to Drosophila, frog, and human α-tubulins (Figure 4-figure supplement 1A).

      (2) Given that the authors have the mutants in hand, it would be incredibly valuable to assess the impact of these mutations on microtubules directly in all cases. MT phenotypes are inferred from neurite outgrowth phenotypes in several cases, the authors should look directly at microtubules and/or microtubule dynamics via EBP-2 when possible OR show evidence that the only way to derive the neurite phenotypes shown is through the inferred microtubule phenotypes. For example, the effect of the acetylation or detyrosination mutants on MTs was not assessed. 

      We thank the reviewer for the suggestions. In this study, we created >20 tubulin mutants. Due to limited time and resources, we were not able to examine microtubule dynamics in every mutant strain using EBP-2 kymographs. We assessed the effects of the tubulin mutations mostly based on the changes in neurite growth patterns. From our previous experience of analyzing ~100 mec-7 and mec-12 missense mutations (Zheng et al., 2017, MBoC; Lee et al., 2021, MBoC), we found that the changes in microtubule dynamics are correlated with the changes in neuronal morphologies. For example, the growth of ectopic ALM-PN is correlated with fewer EBP-2 comets and potentially reduced microtubule dynamics; this correlation holds true for several mec-7 neomorphic missense alleles we examined before (Lee et al., 2021, MBoC) and the PTM site mutants [e.g., mec-7(S172A) and mec-12(4Es-A)] analyzed in this study. Similarly, the shortening of TRN neurites is correlated with more EBP-2 comets and increased microtubule dynamics. For the mutants that don’t show neurite growth defects, our previous experience is that they are not likely to show altered microtubule dynamics in EBP-2 tracking experiments. So, we did not analyze the acetylation mutants (which had no defects in neurite growth) and the detyrosination mutants (which had a weak ALM-PN phenotype). Nevertheless, we agree with the reviewer that we cannot rule out the possibility that there may be some slight changes to microtubule dynamics in these mutants.

      Using tannic acid staining and electron microscopy (EM), we previously examined the microtubule structure in several tubulin missense mutants (Zheng et al., 2017, MBoC) and found that the loss-of-function and antimorphic mutations significantly reduced the number of microtubules and altered microtubule organization by reducing protofilament numbers. These structural changes are consistent with highly unstable microtubules and defects in neurite growth. On the other hand, neomorphic mutants had only a slight decrease in microtubule abundance, maintained the 15-protofilament structure, and had more tightly packed microtubule bundles that filled up most of the space in the TRN neurite (Zheng et al., 2017, MBoC). These structural features are consistent with increased microtubule stability and ectopic neurite growth. Although we did not directly examine the microtubule abundance and structure using EM in this study, we would expect similar changes that are correlated with the neurite growth phenotypes in the PTM mutants. We agree with the reviewer that it will be informative to conduct a more comprehensive analysis of these mutants using EM and other structural biology methods.

      (3) There is a ton of data here that will be important for experts working in this field to dig into, however, for the more general cell biologist, some of the data are quite inaccessible. More cartoons and better labeling will be helpful as will consistent comparisons to control worms in each experiment.

      Response: We thank the reviewer for the comment. In the revised manuscript, we added some cartoons to Figure 2G to show the location of the synaptic vesicles. The neurite growth phenotype should be quite straightforward. Nevertheless, we added one more Figure (Figure 8) to summarize all the results in the study with cartoons that depicted the changes to neuronal morphologies.

      (4) In addition, I am left unconvinced of the negative data demonstrating that MBK does not phosphorylate tubulin. First, the data described in lines 207-211 does not appear to be presented anywhere. Second, RNAi is notoriously finicky in neurons, thus necessitating tissue-specific degradation using either the ZF/ZIF-1 or AID/TIR1 systems which both work extremely well in C. elegans. Third, there appears to be increasing S172 phosphorylation in Figure 3 Supplement 2 with added MBK-2, but there is no anti-tubulin blot to show equal loading, so this experiment is hard to interpret.

      We added the results of mbk-1, mbk-2, and hpk-1 mutants and cell-specific knockdown of MBK-2 into Figure 3-figure supplement 1D. Considering the reviewer’s suggestion, we attempted to use a ZIF-1 system to remove the MBK-2 proteins specifically in the TRNs using a previously published method (PMID: 28619826). We fused endogenous MBK-2 with GFP by gene editing and then expressed an anti-GFP nanobody fused with ZIF-1 in the TRNs to induce the degradation of MBK-2::GFP. To our surprise, unlike the mbk-2p::GFP transcriptional reporter, the MBK-2::GFP did not show detectable expression in the TRNs, although expression can be seen in early embryos, which is consistent with the “embryonic lethal” phenotype of the mbk-2(-) mutants (Figure 3-figure supplement 2A-B in the revised manuscript). We reason that endogenous MBK-2 is either not expressed in the TRNs or expressed at a very low level. We then crossed mbk-2::GFP with ItSi953 [mec-18p::vhhGFP4::Zif-1] to trigger the degradation of any potential MBK-2 proteins and did not observe the ectopic growth of ALM-PN (Figure 3-figure supplement 2C). These results suggest that MBK-2 is not likely to regulate tubulin phosphorylation in the TRNs, which is consistent with the results of other genetic mutants and the RNAi experiments.

      For Figure 3 Supplement 2 (Figure 3-figure supplement 3 in the revised manuscript), because we added the same amount of purified MEC-12/MEC-7 to all reactions and had established equal loading in Figure 3E, we did not do the anti-tubulin staining in this experiment. Since a higher concentration (1742 nM) of MBK-2 did not produce a stronger signal than the condition with 1268 nM, we don’t think the 1268 nM band represents true phosphorylation. Moreover, the signal is not significantly stronger than the control without MBK-2 and is much lower than the signal generated by CDK1 in Figure 3E. Based on these results, we concluded that MBK-2 is not likely to phosphorylate MEC-7.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      General:

      A summary table would help the reader digest the vast amount of phenotypic data.

      Cartoons to help a non-C. elegans reader understand the figures. 

      We added Figure 8 to summarize and illustrate the effects of the various mutants analyzed in this study.

      Specific:

      The authors engineered mutations into the predicted phosphorylation site of b-tubulin mec-7. These CRISPR-alleles mutations phenocopied previously identified loss-of-function, gain-of-function, and neomorphic mec-7 alleles identified in genetic screens by the Chalfie lab. Next, the authors sought to identify the responsible kinase, taking a candidate gene approach. The most likely family - minibrain - had no effect when knocked down/out. The authors showed that cdk-1 mutants displayed ectopic ALM-PN outgrowth. Whether cdk-1 specifically acts in the TRNs was not demonstrated, calling into question whether CDK-1 phosphorylates S172 in vivo. In their introduction (lines 45-59), the authors built a case for engineering PTM mutations directly into tubulins, because the PTM enzymes may have multiple substrates. This logic applies to the cdk-1 experiment and its interpretation. 

      The reviewer is right. Since CDK1 and minibrain kinase are the only known kinases that catalyze S172 phosphorylation, our results suggest that CDK-1 is more likely than MBK-1/2 to catalyze S172 phosphorylation in the TRNs. Genetic studies found that cdk-1(-); mec-7(S172A) double mutants did not show a stronger phenotype than either single mutant, suggesting that the two genes function in the same pathway. Nevertheless, we could not rule out the possibility that other kinases may also control S172 phosphorylation and that the effect of CDK-1 is indirect. We mentioned this possibility in the revised manuscript.

      For a-tubulin MEC-12, acetyl-mimicking K40Q and unmodifiable K40R mutants failed to stain with the anti-acetyl-a-tubulin (K40) antibody and displayed subtle TRN phenotypes. The enzymatically dead MEC-17 had phenotypes similar to those described by Topalidou (2012), confirming the Chalfie lab finding that MEC-17 has functions in addition and independent of its acetyltransferase activity. The authors moved onto a predicted acetylation site in MEC-7 and observed TRN developmental defects, and acknowledged that this may be due to tubulin instability and not a PTM. This is a concern for all mutants, as there is no way to measure whether the protein is expressed, stable, or localized properly. 

      We acknowledge that this is a caveat of mutational studies. An amino acid substitution at the PTM site may have multiple effects, including the change of the PTM state and potential alteration of protein conformation. Without direct evidence for enzymatic modification of the PTM site in the neurons, we could not rule out the possibility that the phenotype we observed is not related to PTM and instead is the result of abnormal protein conformation and function caused by the mutation.

      Nevertheless, as stated in our response above to the first point in the public review, we can phenotypically differentiate loss-of-function and gain-of-function mutants. If the mutation reduces expression or general protein stability, it is more likely to cause a loss-of-function phenotype. For most PTM site mutants, this is not the case. We observed mostly gain-of-function phenotypes, suggesting that the missense mutations did not simply inactivate the tubulin protein and instead affected the functional properties of the protein.

      From here, the authors manipulate the C-terminal tail of MEC-12 and MEC-7, testing the idea that polyglutamylation may be an important PTM. These mutants displayed subtle phenotypes. The authors show that branch point GT335 and polyglutamylation polyE recognizing antibodies stain cultured embryonic TRNs, but did not examine staining in TRNs. To my knowledge, these antibodies have not been shown to stain the TRNs in any published papers (see next point). The rationale for using cultured embryonic TRNs is not clear.

      See our response to the second point in the public review.

      Lines 548-553 There are several publications on CCPP-1 and CCPP-6 functions in TRNs and ciliated sensory neurons. See

      PMID: 20519502

      PMID: 21982591

      PMID: 21943602

      PMID: 23000142

      PMID: 29129530

      PMID: 33064774

      PMID: 36285326

      PMID: 37287505 

      We thank the reviewer for pointing out these references, some of which were cited in the revised manuscript. We made a mistake in the Discussion by saying that there are no C. elegans homologs of tubulin carboxypeptidases, when we intended to state that there is no homolog of tubulin detyrosinase in C. elegans. We are aware of the studies of CCPP-1 and CCPP-6 and have corrected the mistake in the revised manuscript (also see our response to the third point in the public review).

      Reviewer #2 (Recommendations For The Authors):

      Figures: 

      As stated in the public review, more cartoons and better labeling will be helpful as will consistent comparisons to control worms in each experiment. A good example of this issue is demonstrated in Figure 2 and Figure 4: 

      (1) Figure 2: Please label images with what is being probed in each panel. 

      We added labels to the panels.

      (2) Figure 2G is very hard to interpret - cartoon diagramming what is being observed would be helpful. 

      We added cartoons to help illustrate the images.

      (3) Line 182-185: is this referring to your data or to Wu et al? It is not clear in this paragraph when the authors are describing published work versus their own data presented here. 

      It is from our data. We have made it clear in the revised manuscript.

      (4) Figure 2 - 2K is not well described. What experiment is being done here? What is dlk-1 and why did you look at this mutant? 

      Figure 2K showed that both wild-type animals and S172A mutants could reconnect the severed axons after laser axotomy. Previous studies have found that dlk-1(-) mutants were not able to regenerate axons due to altered microtubule dynamics (PMID: 19737525; PMID: 23000142). We used dlk-1(-) mutants as a negative control, because DLK-1 promotes microtubule growth following axotomy, and the DLK-1 pathway is essential for regeneration (PMID: 23000142). We want to highlight the phenotypic difference between dlk-1(-) mutants and the S172E mutants. Although both mutants showed similar regrowth length, dlk-1(-) mutants showed unbranched regrowth probably due to the lack of microtubule polymerization, whereas the S172E mutants showed a mesh-like regrowth pattern likely due to highly dynamic and unstable microtubules. We explained the different phenotypes in the revised manuscript.

      (5) Figure 4C: this phenotype is hard to interpret. Where is the wt control? Where is the quantification? 

      In the Figure legend, we have referred the readers to Figure 1G for the wild-type image. Quantification is provided in the text (~20% of the animals showed the branching defects).

      (6) There are no WT comparison images in Figure 4I, making the quantification difficult to interpret 

      In the Figure legend, we have referred the readers to Figure 1A for the wild-type control. Moreover, we included a new Figure 8 to summarize the phenotypes of all mutants.

      Experimental:

      (1) Is it clear that MEC-7/MEC-12 are the only a- and b-tubulin present in the TRNs? The presence of other tubulins not mutated would complicate the interpretation of the results.

      According to the mRNA levels, the expression of MEC-7 and MEC-12 is >100-fold higher than that of other tubulin isotypes. For example, single-cell transcriptomic data (Taylor et al., 2021) showed that mec-7 mRNA is at 135,940 TPM in ALM neurons, whereas two other tubulin isotypes, tbb-1 and tbb-2, have expression values of 54 and 554 TPM, respectively, in the ALM. So, even if some other tubulin isotypes are present, their abundance is much lower than that of mec-7 and mec-12, and they are not likely to interfere with the effects of the mec-7 and mec-12 mutants.
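
      As a quick arithmetic check of the fold differences implied by the TPM values quoted above, the short sketch below divides the mec-7 value by each of the two other isotypes. The dictionary and variable names are illustrative only; the three numbers are the ones cited from Taylor et al. (2021) in this response.

```python
# TPM values in ALM neurons as quoted in the response (Taylor et al., 2021).
tpm_alm = {"mec-7": 135_940, "tbb-1": 54, "tbb-2": 554}

# Fold difference of mec-7 over each other beta-tubulin isotype listed.
mec7_fold_over = {
    gene: tpm_alm["mec-7"] / tpm
    for gene, tpm in tpm_alm.items()
    if gene != "mec-7"
}

for gene, fold in mec7_fold_over.items():
    print(f"mec-7 / {gene}: {fold:.0f}-fold")
```

      Both ratios come out well above 100 (roughly 2,500-fold over tbb-1 and roughly 245-fold over tbb-2), in line with the ">100-fold" statement for these two isotypes.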

      (2) The in vitro kinase assays should be quantified. 

      We have added the quantification.

      (3) The idea that Cdk1 phosphorylates tubulin in interphase is surprising and I am left wondering how the authors propose that Cdk1 is activated in interphase. Is cyclin B (or another cyclin) present in interphase in this cell type? Expression but not activation of Cdk1 is not discussed. 

      CDK1 can work with cyclin A and cyclin B. C. elegans has one cyclin A gene (cya-1) and four cyclin B genes (cyb-1, cyb-2.1, cyb-2.2, and cyb-3). According to single-cell transcriptomic data of L4 animals, cya-1 and cyb-1 showed weak expression in many postmitotic neurons (including the ALM neurons), while cyb-2.1, cyb-2.2, and cyb-3 had no expression in neurons. So, it is possible that cya-1/cyclin A and cyb-1/cyclin B have low levels of expression in the TRNs. A previous study also found the expression of cell cycle regulators (including cyclins) in postmitotic neurons in the mouse brain (Akagawa et al., 2021; PMID: 34746147).

      (4) What is the significance of neurite swelling and looping in Figure 4H? The underlying cause of this phenotype is not described. 

      The neurite swelling and looping phenotypes of mec-17(-) mutants were described by Topalidou et al. (2012; PMID: 22658602) and were caused by the bending of the microtubules. It appears that the loss of the a-tubulin acetyltransferase altered the organization of microtubules in the TRNs. These defects were partially rescued by the enzymatically dead MEC-17, suggesting that MEC-17 may play a non-enzymatic (and likely structural) role in regulating microtubule organization. We added more explanation in the revised manuscript.

      (5) It is quite surprising that polyglutamylation is not affected in the quintuple ttll mutant. Since the authors made the sextuple ttll mutant, could they demonstrate whether polyglutamylation is further reduced in this mutant via GT335 staining? 

      We did not compare the quintuple and sextuple ttll mutants because, for technical reasons, they were crossed with TRN markers of different colors. The quintuple mutants CGZ1475 carried uIs115 [mec-17p::TagRFP] IV, whereas the sextuple mutants CGZ1474 carried zdIs5 [mec-4p::GFP] I. As a result, we need to use different secondary antibodies for the antibody staining, which makes the results not directly comparable.

      The polyglutamylation signal in the cell body was strongly affected by the ttll mutations. In fact, in the ttll-4(-); ttll-5(-); ttll-12(-) triple mutants, the signal is significantly reduced in the cell body of the TRNs, as well as the cell body of other cells. What’s surprising is that the signal in the axons persisted in the ttll triple and quintuple mutants. As the reviewers suggested, we also stained the sextuple mutants and found a pattern similar to that of the triple and quintuple mutants (new Figure 6-figure supplement 1C in the revised manuscript), although the results are not quantitatively comparable due to the use of secondary antibodies with different fluorophores.

      Writing:

      (1) The beginning of the results section is quite jarring. The information in lines 96-104 should be in the Introduction. 

      Due to the nature of this paper, each section deals with a particular PTM. We think it is helpful to discuss some background information before describing our results on each PTM rather than giving all in the introduction. Nevertheless, we modified the beginning of the results to make it more coherent and more connected with the preceding paragraphs.

      (2) Line 122-126: conclusions are not supported by the data: it is suggested from previous experiments, but authors do not look at MTs directly. 

      We have rephrased the statement to acknowledge that we drew this conclusion based on phenotypic similarity with mutants we previously examined.

      (3) I am confused by the usage of both mec-12(4EtoA) and mec-12(4Es-A). Are these the same mutations? If so, there needs to be consistency. If not, each case needs to be defined. 

      They are the same. We have corrected the mistake and are now using mec-12(4Es-A) to refer to the mutants.

      Line 105: phosphor --> phospho 

      Line 187: were --> was 

      Line 298: is --> are

      The above typos are corrected.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Recommendations For The Authors):

      I still find it really impressive that the Purkinje cell stimulation so closely mimics the pathogenic phenotypes - in my opinion, the strongest part of the paper. I would like just a little clarification on some of my previous questions.

      Major points:

      (1) Can the authors clarify where the new units came from? Are these units that were recorded before the initial submission and excluded, but are now included? If so, why were they excluded before? Or are these units that were recorded since the original submission?

      The number of units increased in Figure 1 for three reasons: 1) We have now plotted the classifier results in Figure 1 instead of the validation results, which have been moved to Figure 1 Supplement 3. 2) In response to reviewer comments, we no longer include units that had >60 s of recording in both our model creation and validation. We had previously used 30 s for creating the model and a different 30 s for validating the model, if an additional 30 s were available. 3) We changed our model creation and validation strategy based on previous reviewer comments. The new units in Figures 2-4 were taken from our pool of previously collected but unanalyzed data (we collect neural data on a rolling basis and thus these data were not initially available). We were fortunate to have these data to analyze in order to address the concerns about the number of cells included in the manuscript. The number of units increased in Figure 5 because new units were recorded in response to reviewer comments.

      (2) Why did some of the neuron counts go down? For example, in Pdx1Cre;Vglut2fl/fl mice, the fraction of units with the control signature went from 11/21 to 7/23. Is this because the classifier changed between the original submission and the revision?

      Yes, the proportion of cells matching each classification changed due to the different parameters and thresholds used in the updated classifier model.

      Minor points:

      In the Discussion: "We find some overlap and shared spike features between the different disease phenotypes and show that healthy cerebellar neurons can adapt multiple disease-associated spike train signatures." I think "adapt" should be "adopt"

      In the Discussion: "compare" is misspelled as "compared"

      Thank you for bringing these typos to our attention. We will upload a new version of the text with the typos corrected.


      The following is the authors’ response to the original reviews.

      We would like to thank the Reviewers for providing excellent and constructive suggestions that have enabled us to strengthen our overall presentation of our data. We have addressed each of the comments by altering the text, providing additional data, and revising the figures, as requested.

      Below are our explanations for how we have altered the manuscript in this revised version.

      Recommendations for the authors:

      I think you will have seen from the comments that there was great enthusiasm for the importance of this study. There were also shared concerns about how the classifier may be inadequate in its current format, as well as specific suggestions to consider to improve. I hope that you will consider a revision to really amplify the impact of the importance of this study.

      Reviewer #1 (Recommendations For The Authors):

      Distinct motor phenotypes are reflected in different neuronal firing patterns at different loci in motor circuits. However, it is difficult to determine if these altered firing patterns: 1) reflect the underlying neuropathology or phenotype, 2) whether these changes are intrinsic to the local cell population or caused by larger network changes, and 3) whether abnormal firing patterns cause or reflect abnormal movement patterns. This manuscript attempts to address these questions by recording neural firing patterns in deep cerebellar nucleus neurons in several models of cerebellar dysfunction with distinct phenotypes. They develop a classifier based on parameters of single unit spike trains that seems to do an inconsistent job of predicting phenotype (though it does fairly well for tremor). The major limitation of the recording/classifier experiments is the low number of single units recorded in each model, greatly limiting statistical power. However, the authors go on to show that specific patterns of Purkinje cell stimulation cause consistent changes in interposed nucleus activity that map remarkably well onto behavioral phenotypes. Overall, I did not find the recording/classifier results to be very convincing, while the stimulation results strongly indicate that interposed nucleus firing patterns are sufficient to drive distinct behavioral phenotypes.

      We thank the reviewer for their comments. We describe below how we have addressed the major concerns.

      Major concerns:

      (1) I don't think it's legitimate to use two 30-second samples from the same recording to train and validate the classifier. I would expect recordings from the same mouse, let alone the same unit, to be highly correlated with each other and therefore overestimate the accuracy of the classifier. How many of the recordings in the training and validation sets were the same unit recorded at two different times?

      We previously published a paper wherein we measured the correlation (or variability) between units recorded from the same mouse versus units recorded from different mice (see: Van der Heijden et al., 2022 – iScience, PMID: 36388953). In this paper we did not find that nuclei neuron recordings from the same mouse were more correlated or similar to each other than recordings from different mice. 

      Prompted by this reviewer comment, however, we did observe strong correlations between the two 30-second samples from the same recorded units. We therefore decided to no longer validate our classifier with training and validation sets that had overlapping units. Instead, we generated 12 training sets and 12 non-overlapping validation sets based on our entire database. We then trained 12 classifier models and ranked these based on their classification ability on the validation sets (Figure 1 – supplemental Figure 3). We found that the top two performing classifier models were the same, and used this model for the remainder of the paper.

      (2) The n's are not convincing for the spike signature analyses in different phenotypic models. For example, the claim is that Pdx1Cre;Vglut2fl/fl mice have more "control" neurons than ouabain infusion mice (more severe phenotype). However, the numbers are 11/21 and 7/20, respectively. The next claim is that 9/21 dystonic neurons are less than 11/20 dystonic neurons. A z-test for proportions gives a p-value of 0.26 for the first comparison and a p-value of 0.44 for the second. I do not think any conclusions can be drawn based on these data.

      We included more cells in our analyses and found that the difference in the proportions of cells with the “control” and “dystonia” signatures is indeed statistically significant by z-test.
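
      For reference, the two p-values quoted in the comment above (0.26 and 0.44) can be reproduced with a two-sided, pooled two-proportion z-test. The sketch below assumes that standard form; the function name is illustrative, and neither the comment nor this response states which variant of the test was actually used.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference in proportions, using the pooled SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts quoted in the reviewer's comment.
z_control, p_control = two_proportion_z_test(11, 21, 7, 20)    # "control" signature
z_dystonia, p_dystonia = two_proportion_z_test(9, 21, 11, 20)  # "dystonia" signature

print(f"control:  z = {z_control:.2f}, p = {p_control:.2f}")
print(f"dystonia: z = {z_dystonia:.2f}, p = {p_dystonia:.2f}")
```

      The same function can be applied to the updated cell counts to check the revised comparison.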

      (3) Since the spiking pattern does not appear to predict an ataxic phenotype and the n's are too small to draw a conclusion for the dystonic mice, I think the title is very misleading - it does not appear to be true that "Neural spiking patterns predict behavioral phenotypes...", at least in these models.

      We have changed the title to: “Cerebellar nuclei cells produce distinct pathogenic spike signatures in mouse models of ataxia, dystonia, and tremor.” We feel that this new title captures the idea that we find differences between spike signatures associated with ataxia, dystonia, and tremor and that these signatures induce pathological movements.

      (4) I don't think it can be concluded from the optogenetic experiments that the spike train signatures do not depend on "developmental changes, ...the effect of transgene expression, ... or drug effects outside the cerebellum." The optogenetic experiments demonstrate that modulating Purkinje cell activity is sufficient to cause changes in DCN firing patterns and phenotypes (i.e., proof-of-principle). However, they do not prove that this is why DCN firing is abnormal in each model individually.

      Thank you for highlighting this section of the text. We agree that the optogenetic experiments cannot explain why the DCN is firing abnormally in each model. We have edited this section of the text to prevent this conclusion from being drawn by the readers.

      Minor points:

      (1) It would be nice to see neural recordings in the interposed nucleus during Purkinje terminal stimulation to verify that the firing patterns observed during direct Purkinje neuron illumination are reproduced with terminal activation. This should be the case, but I'm not 100% certain it is.

      We have edited the text to clarify that representative traces and analysis of interposed nucleus neurons in response to Purkinje terminal stimulation are the data in Figure 5.

      (2) How does the classifier validation (Fig. 1E) compare to chance? If I understand correctly, 24/30 neurons recorded in control mice are predicted to have come from control mice (for example). This seems fairly high, but it is hard to know how impressive this is. One approach would be to repeat the analysis many (1000s) of times with each recording randomly assigned to one of the four groups and see what the distribution of "correct" predictions is for each category, which can be compared against the actual outcome.

      We have now also included the proportion of spike signatures in the entire population of neurons and show that the spike signatures are enriched in each of the four groups (control, ataxia, dystonia, tremor) relative to the presence of these signatures in the population (Figure 1E). 
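
      The shuffling control described in the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used in the manuscript: group labels are permuted so that group sizes are preserved (one of several reasonable ways to randomize), and the function names and the toy labels are hypothetical and do not correspond to recorded data.

```python
import random
from collections import Counter


def shuffled_match_counts(true_groups, predicted_groups, n_shuffles=1000, seed=0):
    """Per-group counts of 'correct' predictions after shuffling the group labels."""
    rng = random.Random(seed)
    shuffled = list(true_groups)
    null_counts = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # permute labels; group sizes stay the same
        matches = Counter(t for t, p in zip(shuffled, predicted_groups) if t == p)
        null_counts.append(matches)
    return null_counts


def empirical_p(observed, null_values):
    """Fraction of shuffles with at least as many matches as observed (+1 correction)."""
    hits = sum(v >= observed for v in null_values)
    return (hits + 1) / (len(null_values) + 1)


# Hypothetical toy labels, for illustration only.
true_groups = ["control"] * 6 + ["ataxia"] * 4
predicted_groups = ["control"] * 5 + ["ataxia"] * 5

observed_control = sum(
    t == p == "control" for t, p in zip(true_groups, predicted_groups)
)
null_counts = shuffled_match_counts(true_groups, predicted_groups)
p_control_chance = empirical_p(observed_control, [c["control"] for c in null_counts])
print(observed_control, p_control_chance)
```

      Comparing each group's observed count with its shuffled distribution gives a per-category estimate of how far the classifier sits above chance.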

      (3) I don't think this is absolutely necessary, but do the authors have ideas about how their identified firing patterns might lead to each of these phenotypes? Are there testable hypotheses for how different phenotypes caused by their stimulation paradigms arise at a network level?

      We have added some ideas about how these spike signatures might lead to their associated phenotypes to the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) As mentioned earlier, my main concern pertains to the overall architecture and training of the classifier. Based on my reading of the methods and the documentation for the classifier model, I believe that the classifier boundaries may be biased by the unequal distribution of neurons across cerebellar disease groups (e.g., n=29 neurons in control versus n=19 in ataxics). As the classifier is trained to minimize the classification error across the entire sample, the actual thresholds on the parameters of interest may be influenced by the overrepresentation of neurons from control mice. To address this issue, one possible solution would be to reweight each group so that the overall weight across classes is equal. However, I suggest a better strategy might be to revise the classifier architecture altogether (as detailed below).

      We have retrained the classifier model based on equal numbers of ataxic, dystonic, and tremor cells (n=20) but we intentionally included more control cells (n=25). We included more control cells because we assume this is the baseline status for all cerebellar neurons and wanted to avoid assigning disease signatures to healthy neurons too easily. 

      (2) As the authors make abundantly clear, one mouse model of disease could potentially exhibit multiple phenotypes (e.g., a mouse with both ataxia and tremor). To address this complexity, it might be more valuable to predict the probability of a certain CN recording producing specific behavioral phenotypes. In this revised approach, the output of the classifier wouldn't be a single classification (e.g., "this is an ataxic mouse") but rather the probability of a certain neural recording corresponding to ataxia-like symptoms (e.g., "the classifier suggests that this mouse has a 76% likelihood of exhibiting ataxic symptoms given this CN recording"). This modification wouldn't require additional data collection, and the exemplar disease models could still be used to train such a revised network/classifier, with each mouse model corresponding to 0% probability of observing all other behavioral phenotypes except for the specific output corresponding to the disease state (e.g., L7CreVgat-fl/fl would be 0% for all categories except ataxia, which would be trained to produce a score of 100%). This approach could enhance the validation results across other mouse models by allowing flexibility in a particular spike train parameter to produce a diverse set of phenotypes.

      This is a great comment. Unfortunately, our current dataset is too constrained to fully address this comment for the following reasons:

      - We have a limited number of neurons on which we can train our classifier. Further dividing up the groups of neurons or complicating the model limited the power of our analyses and resulted in overfitting of the model on too few neurons.

      - The recording durations (30 seconds) used to train our model are likely too short to find multiple disease signatures within a single recording. We feel that the complex phenotypes likely result from cells within one mouse exhibiting a mix of disease signatures (as in the Car8wdl/wdl mice).

      We think this question would be great for a follow-up study that uses a large number of recordings from single mice to fully predict the mouse phenotype based on the population spike signatures. 

      To limit confusion about our classifier model, we have also altered the language of our manuscript and refer to the cells exhibiting a spike signature instead of predicting a phenotype. 

      However, the paper falls short in terms of the classifier model itself. The current implementation of this classifier appears to be rather weak. For instance, the cross-validated performance on the same disease line mouse model for tremor is only 56%. While I understand that the classifier aims to simplify a high-dimensional dataset into a more manageable decision tree, its rather poor performance undermines the authors' main objectives. In a similar vein, although focusing on three primary features of spiking statistics identified by the decision tree model (CV, CV2, and median ISI) is useful for understanding the primary differences between the firing statistics of different mouse models, it results in an overly simplistic view of this complex data. The classifier and its reliance on the reduced feature set are the weakest points of the paper and could benefit from further analysis and a different classification architecture. Nevertheless, it is commendable that the authors have collected high-quality data to validate their classifier. Particularly impressive is their inclusion of data from multiple mouse models of ataxia, dystonia, and tremor, enabling a true test of the classifier's generalizability.

      We intentionally simplified our parameter space from a high-dimensional dataset into a more manageable decision tree. We did this for the following reasons:

      - The parameters, even though all measuring different features, are highly correlated (see Figure 1 – supplemental Figure 2). Further, we were training our dataset on a limited number of recordings. We found that including all parameters (for example using a linear model) caused overfitting of the data and poor model performance.

      - Describing the spike signatures using a lower number of parameters allowed us to design optogenetic parameters that would mimic this parameter space. This would be infinitely more complex with a bigger parameter space. 

      We agree with the reviewer that the inclusion of multiple mouse models, in addition to the optogenetics experiments, supports the classifier’s generalizability.
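
      For readers unfamiliar with the three spike-train features named above (CV, CV2, and median ISI), the sketch below computes them from a list of spike times using their commonly used definitions: CV is the standard deviation of the inter-spike intervals (ISIs) divided by their mean, and CV2 is the mean of 2|ISI(n+1) - ISI(n)| / (ISI(n+1) + ISI(n)) over adjacent ISI pairs. The function name and the choice of the population standard deviation are assumptions; the exact implementation used for the classifier may differ.

```python
import statistics


def isi_features(spike_times):
    """Return CV, CV2, and median ISI for ascending spike times (in seconds)."""
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        raise ValueError("need at least three spikes to compute CV2")
    # CV: global irregularity (population SD of the ISIs over their mean; an assumption).
    cv = statistics.pstdev(isis) / statistics.fmean(isis)
    # CV2: local irregularity, averaged over pairs of adjacent ISIs.
    cv2 = statistics.fmean(
        [2 * abs(b - a) / (a + b) for a, b in zip(isis, isis[1:])]
    )
    return {"cv": cv, "cv2": cv2, "median_isi": statistics.median(isis)}


# A perfectly regular train gives CV = CV2 = 0.
regular = isi_features([0.0, 0.5, 1.0, 1.5, 2.0])
# Alternating short and long intervals (1 s, 3 s) raise both measures.
alternating = isi_features([0.0, 1.0, 4.0, 5.0, 8.0])
print(regular, alternating)
```

      Because CV reflects variability across the whole recording while CV2 reflects variability between neighboring intervals, the two can diverge for bursting or slowly drifting firing, which is one reason a small feature set can still separate different firing patterns.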

      Minor Comments:

      (1) The blown-up CN voltage traces in Figures 5C and Supplementary Figure 2B appear more like bar plots than voltage traces on my machine.

      Thank you for bringing this to our attention. We have improved the rendering of the traces.

      (2) The logic in lines 224-228 is somewhat confusing. The spike train signatures are undoubtedly affected by all the factors mentioned by the authors. What, I believe, the authors intend to convey is that because changes in CN firing rates can be driven by multiple factors, it is the CN firing properties themselves that likely drive disease-specific phenotypes.

      We agree that our discussion of the CN firing needs clarification. We have made the appropriate edits in the text.

      Reviewer #3 (Recommendations For The Authors):

      It's quite astounding that this can be done from single spike trains from what are almost certainly mixed populations of neurons. Could you add something to the discussion about this? Some questions that could be addressed would be would multiple simultaneous recordings additionally help classify these diseases, or would non-simultaneous recordings from the same animal be useful? Also more discussion about which cells you are likely recording from would be useful.

      Thank you for this suggestion. We have added discussion about multiple recordings, simultaneous vs non-simultaneous recordings, and our thoughts on the cell population recorded in this work.

      Data in figure 2 is difficult to understand - it appears that the majority of dysregulated cells in 2 ataxic models are classified as dystonia cells, not ataxic cells. This appears surprising as it seems to be at odds with earlier data from Fig 1. In my opinion, it is not discussed adequately in the Results or Discussion section.

      We have added further discussion of the ataxia models represented in Figures 1 and 2.

      Minor comment:

      The colours of the subdivisions of the bars in 2C and 3C, and the rest of the paper appear to be related to the groups in the middle (under "predicted"), but the colours are much paler in the figure than in the legend, although the colours in the bars and the legends match in the first figure (1E). Does this signify something?

      These figures were remade with the same colors across the board.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study by Prieto et al. faces the increasingly serious problem of bacterial resistance to antimicrobial agents. This work has an important element of novelty proposing a new approach to control antibiotic resistance spread by plasmids. Instead of targeting the resistance determinant, plasmid-borne proteins are used as antigens to be bound by specific nanobodies (Nbs). Once bound, plasmid transfer was inhibited and Salmonella infection blocked. This in-depth study is quite detailed and complex, with many experiments (9 figures with multiple panels), rigorously carried out. Results fully support the authors' conclusions. Specifically, the authors investigated the role of two large molecular weight proteins (RSP and RSP2) encoded by the IncHI1 derivative-plasmid R27 of Salmonella. These proteins have bacterial Ig-like (Big) domains and are expressed on the cell surface, creating the opportunity for them to serve as immunostimulatory antigens. Using a mouse infection model, the authors showed that RSP proteins can properly function as antigens, in Salmonella strains harboring the IncHI1 plasmid. The authors clearly showed increased levels of specific IgG and IgA antibodies against these RSP proteins in different tissues of immunized animals. In addition, non-immunized mice exhibited Salmonella colonization in the spleen and much more severe disease than immunized ones.

      However, the strength of this work is the selection and production of nanobodies (Nbs) that specifically interact with the extracellular domain of RSP proteins. The procedure to obtain Nbs is lengthy and complicated and includes the immunization of dromedaries with purified RSP and the construction of a VHH (H-chain antibody variable region) library in E. coli. As RSP is expressed on the surface of E. coli, specific Nbs were able to agglutinate Salmonella strains harboring the p27 plasmid encoding the RSP proteins.

      The authors demonstrated that Nbs-RSP reduced the conjugation frequency of p27 thus limiting the diffusion of the amp resistance harbored by the plasmid. This represents an innovative and promising strategy to fight antibiotic resistance, as it is not blocked by the mechanism that determines, in the specific case, the amp resistance of p27 but it targets an antigen associated with IncHI-derivative plasmids. Thus, RSP vaccination could be effective not only against Salmonella but also against other enteric bacteria. A possible criticism could be that Nbs against RSP proteins reduce the severity of the disease but do not completely prevent the infection by Salmonella.

      It is true that vaccination of mice with purified RSP protein did not provide complete protection against infection with a Salmonella strain harboring an IncHI plasmid. As this finding is based on an animal model, further investigation is required to evaluate its clinical efficacy. In any case, even partial protection provided by nanobodies or by a vaccine could potentially improve survival rates among critically ill patients infected with a pathogenic bacterium harboring an IncHI plasmid. An additional beneficial aspect of our approach is that it will reduce dissemination of IncHI plasmids among pathogenic bacteria, which would reduce the presence of antibiotic resistance plasmids in the environment and in the bacteria infecting patients.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript aims to tackle antimicrobial resistance through the development of vaccines. Specifically, the authors test the potential of the RSP protein as a vaccine candidate. The RSP protein contains bacterial Ig-like domains that are typically carried in IncHI1 plasmids like R27. The extracellular location of the RSP protein and its role in the conjugation process makes it a good candidate for a vaccine. The authors then use Salmonella carrying an IncHI plasmid to test the efficacy of the RSP protein as a vaccine antigen in providing protection against infection of antibiotic-resistant bacteria carrying the IncHI plasmid. The authors found no differences in total IgG or IgA levels, nor in pro-inflammatory cytokines between immunized and non-immunized mice. They however found differences in specific IgG and IgA, attenuated disease symptoms, and restricted systemic infection.

      The manuscript also evaluates the potential use of nanobodies specifically targeting the RSP protein by expressing it in E. coli and evaluating their interference in the conjugation of IncHI plasmids. The authors found that E. coli strains expressing RSP-specific nanobodies bind to Salmonella cells carrying the R27 plasmid thereby reducing the conjugation efficacy of Salmonella.

      Strengths:

      The main strength of this manuscript is that it targets the mechanism of transmission of resistance genes carried by any bacterial species, thus making it broad.

      The experimental setup is sound and with proper replication.

      Weaknesses:

      The two main experiments, evaluating the potential of the RSP protein and the effects of nanobodies on conjugation, seem as parts of two different and unrelated strategies.

      In preparing our manuscript, we were aware that we included two different strategies to combat antimicrobial resistance. However, we deemed it valuable to include both in the paper. The development of new vaccines and the inhibition of the transfer of antibiotic resistance determinants are currently considered relevant approaches to combat antimicrobial resistance. Our intention in the article is to integrate these two strategies.

      The survival rates shown in Figure 1A and Figure 3A for Salmonella pHCM1 and non-immunized mice challenged with Salmonella, respectively, are substantially different. In the same figures, the challenge of immunized mice and Salmonella pHCM1 and mice challenged with Salmonella pHCM1 with and without ampicillin are virtually the same. While this is not the only measure of the effect of immunization, the inconsistencies in the resulting survival curves should be addressed by the authors more thoroughly as they can confound the effects found in other parameters, including total and specific IgG and IgA, and pro-inflammatory cytokines.

      Overall the results are inconsistent and provide only partial evidence of the effectiveness of the RSP protein as a vaccine target.

      To address the concerns regarding the disparities in survival rates depicted in Figures 1A and 3A, it is important to refer to several factors that contribute to these variations. Firstly, it should be noted that the data depicted in these figures stem from distinct experimental sets conducted at different times employing different batches of mice. Despite the use of the same strain and supplier, individual animals and their batches can exhibit variability in susceptibility to infection due to inherent biological differences.

      Unlike in vitro cell culture experiments, which can achieve high replicability due to the homogeneity of cell lines, in vivo animal studies often exhibit greater variability. This variability is influenced not only by genetic variations within animal populations, even if originating from the same supplier, but also by environmental factors within the animal facility. These factors include temperature variations, the concentration of non-pathogenic microorganisms in the facility, which can modify the immune responses, or the density of animals in the environment, consequently affecting human traffic and generating potential disturbances.

      When designing experiments with animals, it is desirable for the results to be consistent across different animal batches. If one bacterial strain exhibits higher mortality rates than another across multiple experimental series, this pattern should be reproducible despite the inherent variability in in vivo studies. It is more important to demonstrate consistency in trends than to focus on absolute figures when validating experimental results. 

      It is also important to clarify that when we refer to survival rates, it doesn’t necessarily mean that the animals were found deceased. The animal procedures were approved by the Ethics Committee of Animal Experimentation of the Universitat de Barcelona, which include an animal monitoring protocol. Our protocol requires close daily monitoring of several health and behavioral parameters, each evaluated according to specific criteria. When an animal reaches a predetermined score threshold indicating severe distress or suffering, euthanasia is administered to alleviate further suffering. At this point, biological samples are collected for subsequent analysis.

      The conjugative experiments use very long conjugation times, making it harder to assess if the resulting transconjugants are the direct result of conjugation or just the growth of transconjugants obtained at earlier points in time. While this could be assessed from the obtained results, it is not a direct or precise measure.

      In the conjugation experiments we utilized a reduced number of donor cells expressing the RSP protein and of recipient cells, as well as long conjugation times, to reflect more accurately a situation that may occur naturally in the environment. Short conjugation times are efficient in controlled laboratory conditions using high densities of donor and recipient cells, but these conditions are not commonly found in the environment. For the interference of the conjugative transfer of the IncHI plasmid we used an E. coli strain displaying the nanobody binding RSP to simulate a process that could be also scaled-up in a natural environment (i.e., a probiotic strain in a livestock farm) and that could be cost effective. See discussion section, lines 326-328.

      While the potential outcomes of these experiments could be applied to any bacterial species carrying this type of plasmids, it is unclear why the authors use Salmonella strains to evaluate it. The introduction does a great job of explaining the importance of these plasmids but falls short in introducing their relevance in Salmonella.

      The prevalence of IncHI plasmids in Salmonella was indicated in the introduction section, lines 65-67. Nevertheless, we understand the reviewer’s criticisms and have modified both these sentences in the introduction section and also added comments in the results section (lines 118-128).

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I understand working with mice can be challenging in terms of repeating experiments to further support the study's claims. For this reason, I think the authors need to discuss more thoroughly the following things:

      Can the authors comment on why the presence of Ampicillin leads to a lower upregulation of proinflammatory cytokines in the spleen despite harboring resistance against ampicillin?

      At the intestinal level, physiological inflammatory responses play a crucial role in enabling the host to identify foreign and commensal bacterial antigens and initiate a highly regulated and "controlled" immune response (Fiocchi, 2008. Inflamm Bowel Dis. 2008, 14 Suppl 2:S77-8). The administration of antibiotics such as ampicillin reduces the load of intestinal resident microbiota, thereby lowering the extent of intestinal immune activation. This decline in immune activation extends to systemic levels, potentially accounting for the reduced expression of proinflammatory cytokines observed in the spleen.

      There are inconsistent results in the survival rates in Figures 1A and 3A, please discuss how this could alter the observed differences in total and specific IgG and IgA, and pro-inflammatory cytokines.

      To address the reviewer’s concerns regarding the discrepancies in survival rates shown in Figures 1A and 3A, and how these differences might influence the observed variations in total and specific IgG and IgA, as well as pro-inflammatory cytokines, it is important to clarify the terminology used in our study. In our context, "survival" does not solely refer to mortality per se, but encompasses the endpoints defined by our animal welfare protocols, which are rigorously supervised by the Animal Experimentation Ethics Committee of the University of Barcelona. Our protocol mandates close daily monitoring of several health and behavioral parameters, each scored according to specific criteria. When an animal reaches a predefined score threshold indicating severe distress or suffering, euthanasia is conducted to prevent further distress, at which point we collect biological samples for analysis.

      In contrast to in vitro cell culture experiments, which often achieve high replicability thanks to the homogeneity of cell lines, in vivo animal studies frequently display greater variability. This variability stems not only from genetic differences within animal populations, even if originating from the same supplier, but also from environmental factors within the animal facility. These factors encompass variations in temperature, the presence of non-pathogenic microorganisms in the facility (capable of altering immune responses) and the density of animals, which can impact human traffic and potentially lead to disturbances. 

      The experiments depicted in Figs. 1A and 3A were separated in time, and hence may be influenced by environmental factors within the animal facility. Nevertheless, in the comparative analysis performed between immunized and non-immunized animals, experiments were performed simultaneously and hence under similar environmental conditions in the animal facility. For several parameters (i.e., immunoglobulins and proinflammatory cytokines) statistically significant differences were observed. 

      Regarding the conjugation assays, it is not entirely clear to me why the conjugation times are so long. It would be beneficial to have more data about the conjugation efficacy between the donor and recipient without any E. coli expressing the nanobodies at different time intervals. This would help to differentiate between transconjugants and transconjugants obtained from early conjugation events.

      This comment is partially answered in a previous response, regarding the numbers of donor and recipient cells and duration of conjugation. We note here that in fig. 9, the requested experiment with donor and recipient cells without E. coli interferent cells is already present, corresponding to the label “none”. To avoid confusion, we have modified the legend in fig. 9.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #2 (Public Review):

      Although the study by Xiaolin Yu et al is largely limited to in vitro data, the results of this study convincingly improve our current understanding of leukocyte migration.

      (1) The conclusions of the paper are mostly supported by the data and in the revised manuscript clarification is provided concerning the exact CCL5 forms (without or with a fluorescent label or His-tag) and amounts/concentrations that were used in the individual experiments. This is important since it is known that modification of CCL5 at the N-terminus affects the interactions of CCL5 with the GPCRs CCR1, CCR3 and CCR5 and random labeling using monosuccinimidyl esters (as done by the authors with Cy-3) is targeting lysines. The revised manuscript more clearly indicates for each individual experiment which form is used. However, a discussion on the potential effects of the modifications on CCL5 in the results and discussion sections is still missing.

      Many thanks for the reviewer's suggestion. We fully agree that it is important to clarify the potential issue of Cy3 labeling, and believe it is more suitably addressed in the Materials and Methods section (lines 312-314).

      (2) In general, authors used high concentrations of CCL5 in their experiments. In their reply to the comments they indicate that at lower CCL5 concentrations no LLPS is detected. This is important information since it may indicate the need for chemokine oligomerization for LLPS. This info should be added to the manuscript and comparison with for instance the obligate monomer CCL7 and another chemokine such as CXCL4 that easily forms oligomers may clarify whether LLPS is controlled by oligomerization.

      We are grateful for the reviewers' help and have accordingly inserted a brief discussion as suggested (lines 240-246).

      (3) Statistical analyses have been improved in the revised manuscript.

      Thanks to the reviewer for his/her comment.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors intended to prove that gut GLP-1 expression and secretion can be regulated by Piezo1, and hence by mechanistic/stretching regulation. For this purpose, they have assessed Piezo1 expression in STC-1 cell line (a mouse GLP-1 producing cell line) and mouse gut, showing the correlation between Piezo1 level and Gcg levels (Figure S1). They then aimed to generate gut L cell-specific Piezo1 KO mice, and claimed the mice show impaired glucose tolerance and GLP-1 production, which can be mitigated by Ex-4 treatment (Figures 1-2). Pharmacological agents (Yoda1 and GsMTx4) and mechanic activation (intestinal bead implantation) were then utilized to prove the existence of ileal Piezo1-regulated GLP-1 synthesis (Figure 3). This was followed by testing such mechanism in a limited amount of primary L cells and mainly in the STC-1 cell line (Figures 4-7).

      While the novelty of the study is somewhat appreciable, the bio-medical significance is not well demonstrated in the manuscript. The authors stated (in lines 78-83) a number of potential side effects of GLP-1 analogs; how can the mechanistic study of GLP-1 production on its own be essential for the development of new drug targets for the treatment of diabetes? Furthermore, the study does not provide a clear mechanistic insight on how the claimed CaMKKbeta/CaMKIV-mTORC1 signaling pathway upregulated both GLP-1 production and secretion. This reviewer also has concerns about the experimental design and data presented in the current manuscript, including the issue of how proglucagon expression can be assessed by Western blotting.

      Strengths:

      The novelty of the concept.

      Weaknesses:

      Experimental design and key experiment information.

      Current GLP-1-based therapies for diabetes use GLP-1 agonists/analogs. Although generally safe, GLP-1 agonists/analogs carry some side effects or risks. We agree with the reviewer that a mechanistic study on the regulation of GLP-1 production will not directly lead to the development of new drug targets for the treatment of diabetes. However, understanding the mechanism of GLP-1 production may shed light on alternative treatment strategies for diabetes that target the production of GLP-1. In our previous studies, we have elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice, and rapamycin or L-leucine treatment to modulate mTOR activity, we have demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15:416:9-18.). Building on these previous studies, we found in the present study that Piezo1 regulated the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through Ca2+/CaMKKbeta/CaMKIV. Although we could not exclude involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provided evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production. The reviewer also expressed concerns about the use of western blot to detect proglucagon expression. In fact, western blot is often used to detect proglucagon. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody we used in our study was purchased from abcam (Cat#ab23468), which can detect proglucagon of 21 kDa.

      Reviewer #2 (Public Review):

      Summary:

      The study by Huang and colleagues focuses on GLP-1 producing entero-endocrine (EEC) L-cells and their regulation of GLP-1 production by a mechano-gated ion channel Piezo1. The study describes Piezo1 expression by L-cells and uses an exciting intersectional mouse model (villin to target epithelium and Gcg to target GLP-1-producing cells and others like glucagon-producing pancreatic endocrine cells), which allows L-cell specific Piezo1 knockout. Using this model, they find an impairment of glucose tolerance, increased body weight, reduced GLP-1 content, and changes to the CaMKKbeta-CaMKIV-mTORC1 signaling pathway using a normal diet and then high-fat diet. Piezo1 chemical agonist and intestinal bead implantation reversed these changes and improved the disrupted phenotype. Using primary sorted L-cells and cell model STC-1, they found that stretch and Piezo1 activation increased GLP-1 and altered the molecular changes described above.

      Strengths:

      This is an interesting study testing a novel hypothesis that may have important mechanistic and translational implications. The authors generated an important intersectional genetics mouse model that allowed them to target Piezo1 L-cells specifically, and the surprising result of impaired metabolism is intriguing.

      Weaknesses:

      However, there are several critical limitations that require resolution before making the conclusions that the authors make.

      (1) A potential explanation for the data, and one that is consistent with existing literature [see for example, PMC5334365, PMC4593481], is that epithelial Piezo1, which is broadly expressed by the GI epithelium, impacts epithelial cell density and survival, and as such, if Piezo1 is involved in L-cell physiology, it may be through regulation of cell density. Thus, it is critical to determine L-cell densities and epithelial integrity in controls and Piezo1 knockouts systematically across the length of the gut, since the authors do not make it clear which gut region contributes to the phenotype they see. Current immunohistochemistry data are not convincing.

      We appreciate the reviewer’s comment. We agree that Piezo1 may affect L-cell density and epithelial integrity. We will quantify L-cell density and test epithelial integrity by examining the expression of tight junction proteins (ZO-1 and Occludin) and determining the transepithelial resistance in different regions of the gut.

      (2) Calcium signaling in L-cells is implicated in their typical role of being gut chemo-sensors, and Piezo1 is a calcium channel, so it is not clear whether any calcium-related signaling mechanism would phenocopy these results.

      We will examine whether other calcium-related signaling mechanisms also contribute to the phenotype seen in the IntL-Piezo1-/- mice.

      (3) Intestinal bead implantation, while intriguing, does not have clear mechanisms - and is likely to provide a point of intestinal obstruction and dysmotility.

      To ascertain if intestinal bead implantation led to intestinal obstruction and dysmotility, we conducted a bowel transit time test. The results revealed no difference in bowel transit time between the sham-operated mice and those implanted with beads.

      (4) Previous studies, some that are very important, but not cited, contradict the presented results (e.g., epithelial Piezo1 role in insulin secretion) and require reconciliation.

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      We will cite more previous studies on GLP-1 production and discuss the discrepancy between our study and others’ studies. The lack of changes in blood glucose seen in Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21.). Actually, in another recent study from our group, we found similar results when the Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed with normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after high fat diet for 8 weeks. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang*, Geyang Xu*, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024, https://doi.org/10.1016/j.apsb.2024.04.016).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Your editorial guidance, reviews, and suggestions have led us to make substantial changes to our manuscript. While we detail point-by-point responses in typical fashion below, I wanted to outline, at a high level, what we’ve done.

      (1) Methods. Your suggestions led us to rethink our presentation of our methods, which are now described more cohesively in a new methods section in the main text.

      (2) Model Validation & Robustness. Reviewers suggested various validations and checks to ensure that our findings were not, for instance, the consequence of a particular choice of parameter. These can be found in the supplementary materials.

      (3) Data Cleaning & Inclusion/Exclusion. Finally, based on feedback, our new methods section fully describes the process by which we cleaned our original data, and on what grounds we included/excluded individual faculty records from analysis.

      eLife assessment

      Efforts to increase the representation of women in academia have focussed on efforts to recruit more women and to reduce the attrition of women. This study - which is based on analyses of data on more than 250,000 tenured and tenure-track faculty from the period 2011-2020, and the predictions of counterfactual models - shows that hiring more women has a bigger impact than reducing attrition. The study is an important contribution to work on gender representation in academia, and while the evidence in support of the findings is solid, the description of the methods used is in need of improvement.

      Reviewer #1 (Public Review):

      Summary and strengths

      This is an interesting paper that concludes that hiring more women will do more to improve the gender balance of (US) academia than improving the attrition rates of women (which are usually higher than men's). Other groups have reported similar findings but this study uses a larger than usual dataset that spans many fields and institutions, so it is a good contribution to the field.

      We thank the reviewer for their positive assessment of the contributions of our work.

      Weaknesses

      The paper uses a mixture of mathematical models (basically Leslie matrices, though that term isn't mentioned here) parameterised using statistical models fitted to data. However, the description of the methods needs to be improved significantly. The author should consider citing Matrix Population Models by Caswell (Second Edition; 2006; OUP) as a general introduction to these methods, and consider citing some or all of the following as examples of similar studies performed with these models:

      Shaw and Stanton. 2012. Proc Roy Soc B 279:3736-3741

      Brower and James. 2020. PLOS One 15:e0226392

      James and Brower. 2022. Royal Society Open Science 9:220785

      Lawrence and Chen. 2015. [http://128.97.186.17/index.php/pwp/article/view/PWP-CCPR-2015-008]

      Danell and Hjerm. 2013. Scientometrics 94:999-1006

      We have expanded the description of methods in a new methods section of the paper which we hope will address the reviewer’s concerns.

      We agree that our model of faculty hiring and attrition resembles Leslie matrices. In results section B, we now mention Leslie matrices and cite Matrix Population Models by Caswell, noting a few key differences between Leslie matrices and the model of hiring and attrition presented in this work. Most notably, in the hiring and attrition model presented, the number of new hires is not based on per-capita fertility constants. Instead, population sizes are predetermined fixed values for each year, precluding exponential population growth or decay towards 0 that is commonly observed in the asymptotic behavior of linear Leslie Matrix models.
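      To make the contrast with a Leslie matrix concrete, the following is a deliberately simplified sketch of a hiring-and-attrition update in which the total population size for each year is fixed in advance, so hires are whatever is needed to refill the population rather than a per-capita "fertility" term. It is not the model used in the paper: it ignores career-age structure and uses deterministic expected values, and the function name `project_women_fraction` and all parameter names are hypothetical.

```python
def project_women_fraction(women, men, attrition_w, attrition_m,
                           hire_frac_w, sizes):
    """Project the fraction of women under fixed yearly population sizes.

    women, men     : current headcounts (floats allowed, expected values)
    attrition_w/m  : yearly probability of leaving, by gender
    hire_frac_w    : fraction of new hires who are women
    sizes          : predetermined total population size for each future year
    All names and the lack of age structure are illustrative assumptions.
    """
    fractions = []
    for n_next in sizes:
        # Faculty who remain after attrition
        stay_w = women * (1 - attrition_w)
        stay_m = men * (1 - attrition_m)

        # Hires are set by the fixed target size, not by per-capita rates
        hires = n_next - (stay_w + stay_m)
        if hires < 0:
            raise ValueError("target size is below the number of stayers")

        women = stay_w + hire_frac_w * hires
        men = stay_m + (1 - hire_frac_w) * hires
        fractions.append(women / (women + men))
    return fractions
```

      Because the total is pinned to `sizes` each year, the population cannot grow or decay exponentially as it can in a linear Leslie model. With equal attrition rates, the fraction of women simply relaxes toward `hire_frac_w`.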

      We have additionally revised the main text to cite the listed examples of similar studies (we had already cited James and Brower, 2022). We thank the reviewer for bringing these relevant works to our attention.

      The analysis also runs the risk of conflating the fraction of women in a field with gender diversity! In female-dominated fields (e.g. Nursing, Education) increasing the proportion of women in the field will lead to reduced gender diversity. This does not seem to be accounted for in the analysis. It would also be helpful to state the number of men and women in each of the 111 fields in the study.

      We have carefully examined the manuscript and revised the text to correctly differentiate between gender diversity and women’s representation.

      We have additionally added a table to the supplemental materials (Tab. S3) that reports the estimated number of men and women in each of the 111 fields.

      Reviewer #2 (Public Review):

      Summary:

      This important study by LaBerge and co-authors seeks to understand the causal drivers of faculty gender demographics by quantifying the relative importance of faculty hiring and attrition across fields. They leverage historical data to describe past trends and develop models that project future scenarios that test the efficacy of targeted interventions. Overall, I found this study to be a compelling and important analysis of gendered hiring and attrition in US institutions, and one that has wide-reaching policy implications for the academy. The authors have also suggested a number of fruitful future avenues for research that will allow for additional clarity in understanding the gendered, racial, and socioeconomic disparities present in US hiring and attrition, and potential strategies for mitigating or eliminating these disparities.

      We thank the reviewer for their positive assessment of the contributions of our work.

      Strengths:

      In this study, LaBerge et al use data from over 268,000 tenured and tenure-track faculty from over 100 fields at more than 12,000 PhD-granting institutions in the US. The period they examine covers 2011-2020. Their analysis provides a large-scale overview of demographics across fields, a unique strength that allows the authors to find statistically significant effects for gendered attrition and hiring across broad areas (STEM, non-STEM, and topical domains).

      LaBerge et al. find gendered disparities in attrition, using both empirical data and their counterfactual model, that account for the loss of 1378 women faculty across all fields between 2011 and 2020. It is true that "this number is both a small portion of academia... and a staggering number of individual careers," as this loss of women faculty is comparable to losing more than 70 entire departments. I appreciate the authors' discussion about these losses; they note that each of these is likely unnecessary, as women often report feeling that they were pushed out of academic jobs.

      LaBerge et al. also find-by developing a number of model scenarios testing the impacts of hiring, attrition, or both-that hiring has a greater impact on women's representation in the majority of academic fields in spite of higher attrition rates for women faculty relative to men at every career stage. Unlike many other studies of historical trends in gender diversity, which have often been limited to institution-specific analyses, they provide an analysis that spans over 100 fields and includes nearly all US PhD-granting institutions. They are able to project the impacts of strategies focusing on hiring or retention using models that project the impact of altering attrition risk or hiring success for women. With this approach, they show that even relatively modest annual changes in hiring accumulate over time to help improve the diversity of a given field. They also demonstrate that, across the model scenarios they employ, changes to hiring drive the largest improvement in the long-term gender diversity of a field.

      Future work will hopefully - as the authors point out - include intersectional analyses to determine whether a disproportionate share of lost gender diversity is due to the loss of women of color from the professoriate. I appreciate the authors' discussion of the racial demographics of women in the professoriate, and their note that "the majority of women faculty in the US are white" and thus that the patterns observed in this study are predominately driven by this demographic. I also highly appreciate their final note that "equal representation is not equivalent to equal or fair treatment," and that diversifying hiring without mitigating the underlying cause of inequity will continue to contribute to higher losses of women faculty.

      Weaknesses

      First, and perhaps most importantly, it would be beneficial to include a distinct methods section. While the authors have woven the methods into the results section, I found that I needed to dig to find the answers to my questions about methods. I would also have appreciated additional information within the main text on the source of the data, specifics about its collection, inclusion and exclusion criteria for the present study, and other information on how the final dataset was produced. This - and additional information as the authors and editor see fit - would be helpful to readers hoping to understand some of the nuance behind the collection, curation, and analysis of this important dataset.

      We have expanded upon the description of methods in a new methods section of the paper.

      We have also added a detailed description of the data cleaning steps taken to produce the dataset used in these analyses, including the inclusion/exclusion criteria applied. This detailed description is at the beginning of the methods section. This addition has substantially enhanced the transparency of our data cleaning methods, so we thank the reviewer for this suggestion.

      I would also encourage the authors to include a note about binary gender classifications in the discussion section. In particular, I encourage them to include an explicit acknowledgement that the trends assessed in the present study are focused solely on two binary genders - and do not include an analysis of nonbinary, genderqueer, or other "third gender" individuals. While this is likely because of the limitations of the dataset utilized, the focus of this study on binary genders means that it does not reflect the true diversity of gender identities represented within the professoriate.

      In a similar vein, additional context on how gender was assigned on the basis of names should be added to the methods section.

      We use a free, open-source, and open-data python package called nomquamgender (Van Buskirk et al, 2023) to estimate the strengths of (culturally constructed) name-gender associations. For sufficiently strong associations with a binary gender, we apply those labels to the names in our data. We have updated the main text to make this approach more apparent.

      We have also added language to the main text which explicitly acknowledges that our approach only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      I do think that some care might be warranted regarding the statement that "eliminating gendered attrition leads to only modest changes in field-level diversity" (Page 6). While I do not think that this is untrue, I do think that the model scenarios where hiring is "radical" and attrition is unchanged from present (equal representation of women and men among hires (ER) + observed attrition (OA)) show that a sole focus on hiring dampens the gains that can otherwise be addressed via even modest interventions (see, e.g., gender-neutral attrition (GNA) + increasing representation of women among hires (IR)). I am curious as to why the authors did not include an additional scenario where hiring rates are equal and attrition is equalized (i.e., GNA + ER). The importance of including this additional model is highlighted in the discussion, where, on Page 7, the authors write: "In our forecasting analysis, we find that eliminating the gendered attrition gap, in isolation, would not substantially increase representation of women faculty in academia. Rather, progress towards gender parity depends far more heavily on increasing women's representation among new faculty hires, with the greatest change occurring if hiring is close to gender parity." I believe that this statement would be greatly strengthened if the authors can also include a comparison to a scenario where both hiring and attrition are addressed with "radical" interventions.

      Our rationale for omitting the GNA + ER scenario in the presented analysis is that we can reason about the outcomes of this scenario without the need for computation; if a field has equal inputs of women and men faculty (on average) and equal retention rates between women and men (on average), then, no matter the field’s initial age and gender distribution of faculty, the expected value for the percentage of women faculty after all of the prior faculty have retired (which may take 40+ years) is exactly 50%. We have updated the main text to discuss this point.
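
      This reasoning can be checked with a small expected-value calculation. The sketch below is an illustration only, not the simulation used in the paper: the function name, the 40 career-age classes, the 95% annual retention, and the initial 20% share of women are all hypothetical, and hires enter at career age 0 for simplicity.

```python
def project_share(women, men, retention, years):
    """Expected share of women after `years` under equal hiring and equal retention.

    women, men: expected head counts by career age (index = career age).
    retention:  probability of staying one more year, identical for both groups.
    """
    total = sum(women) + sum(men)  # the field's size is held fixed
    for _ in range(years):
        # Everyone ages one year; the oldest class retires.
        women = [0.0] + [retention[a] * n for a, n in enumerate(women[:-1])]
        men = [0.0] + [retention[a] * n for a, n in enumerate(men[:-1])]
        # Vacancies are filled with equal numbers of women and men.
        vacancies = total - sum(women) - sum(men)
        women[0] += vacancies / 2
        men[0] += vacancies / 2
    return sum(women) / (sum(women) + sum(men))


# Hypothetical field: 40 career-age classes, 20% women in every class.
initial_women = [2.0] * 40
initial_men = [8.0] * 40
retention = [0.95] * 40

share_start = project_share(initial_women, initial_men, retention, years=0)
share_mid = project_share(initial_women, initial_men, retention, years=10)
share_end = project_share(initial_women, initial_men, retention, years=40)
```

      In this sketch the share starts at 20%, rises as the initial cohort is replaced, and equals 50% once every member of the initial roster has aged out, regardless of the initial distribution.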

      Reviewer #3 (Public Review):

      This manuscript investigates the roles of faculty hiring and attrition in influencing gender representation in US academia. It uses a comprehensive dataset covering tenured and tenure-track faculty across various fields from 2011 to 2020. The study employs a counterfactual model to assess the impact of hypothetical gender-neutral attrition and projects future gender representation under different policy scenarios. The analysis reveals that hiring has a more significant impact on women's representation than attrition in most fields and highlights the need for sustained changes in hiring practices to achieve gender parity.

      Strengths:

      Overall, the manuscript offers significant contributions to understanding gender diversity in academia through its rigorous data analysis and innovative methodology.

      The methodology is robust, employing extensive data covering a wide range of academic fields and institutions.

      Weaknesses:

      The primary weakness of the study lies in its focus on US academia, which may limit the generalizability of its findings to other cultural and academic contexts.

      We agree that the U.S. focus of this study limits the generalizability of our findings. The findings that we present in this work will only generalize to other populations–whether it be to an alternate industry, e.g., tech workers, or to faculty in different countries–to the extent that these other populations share similar hiring patterns, retention patterns, and current demographic representation. We have added a discussion of this limitation to the manuscript.

      Additionally, the counterfactual model's reliance on specific assumptions about gender-neutral attrition could affect the accuracy of its projections.

      Our projection analysis is intended to illustrate the potential gender representation outcomes of several possible counterfactual scenarios, with each projection being conditioned on transparent and simple assumptions. In this way, the projection analysis is not intended to predict or forecast the future.

      To resolve this point for our readers, we now introduce our projections in the context of the related terms of prediction and forecast, noting that they have distinct meanings as terms of art. Prediction and forecasting involve anticipating a specific outcome based on available information and analysis, and typically rely on patterns, trends, or historical data to make educated guesses about what will happen. Projections, by contrast, are based on assumptions and are often presented as a panel of possible future scenarios. While predictions and forecasts aim for precision, projections (which we make in our analysis) are more generalized and may involve a range of potential outcomes.

      Additionally, the study assumes that everyone who disappears from the dataset represents attrition from academia. In reality, some of them could be researchers who moved to another country or to another institution that is not included in the AARC (Academic Analytics Research Centre) dataset.

      In our revision, we have elevated this important point, and clarified it in the context of the various ways in which we count hires and attritions. We now explicitly state that “We define faculty hiring and faculty attrition to include all cases in which faculty join or leave a field or domain within our dataset.” Then, we enumerate the number of situations that could be counted as hires and attritions, including the reviewer’s example of faculty who move to another country.

      Reviewer #1 (Recommendations For The Authors):

      Section B: The authors use an age structured Leslie matrix model (see Caswell for a good reference to these) to test the effect of making the attrition rates or hiring rates equal for men and women. My main concern here is the fitting techniques for the parameters. These are described (a little too!) briefly in section S1B. Some specific questions that are left hanging include:

      A 5th order polynomial is an interesting choice. Some statistical evidence as to why it was the best fit would be useful. What other candidate models were compared? What was the "best fit" judgement made with: AIC, r^2? What are the estimates for how good this fit is? How many data points were fitted to? Was it the best fit choice for all of the 111 fields for men and women?

      We use a logistic regression model for each field to infer faculty attrition probabilities across career ages and time, and we include the career age predictor up to its fifth power to capture the career-age correlations observed in Spoon et al., Science Advances, 2023. For ease of reference, we reproduce the attrition risk curves in Fig. S4.

      We note that faculty attrition rates start low, reach a peak around 5-7 years after earning a PhD, and then decline until around 15-20 years post-PhD, after which attrition rates increase as faculty approach retirement.

      This function shape starts low and ends high, and includes at least one local minimum, which indicates that career age should be odd-ordered in the model and at least order-3, but only including career age up to its 3rd order term tended to miss some of the observed career-age/attrition correlations. We evaluated the fit using 5-fold cross validation with a Brier score loss metric, and among options of polynomials of degree 1, 3, 5, or 7, we found that 5th order performed well overall on average over all fields (even if it was not the best for every field), without overfitting in fields with fewer data. Example fits, reminiscent of the figure from Spoon et al., are now provided in Figs S4 and S5.

      While the model fit with fifth order terms may not be the best fit for all 111 fields (e.g., 7th order fits better in some cases), we wanted to avoid field-specific curves that might be overfitted to the field-specific data, especially due to low sample size (and thus larger fluctuations) on the high career age side of the function. Our main text and supplement now include justifications for our choice to include career age up to its fifth order terms.
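
      For readers unfamiliar with the evaluation procedure, the following sketch shows the general shape of k-fold cross validation with a Brier score loss. It is not the code used in the paper. To stay self-contained it compares two simple stand-in models (a constant attrition rate and a rate binned by career age) rather than polynomial logistic regressions, and the data are synthetic; every name and number in it is hypothetical.

```python
import random


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_brier(ages, left, fit, k=5):
    """Average held-out Brier score of the predictor returned by fit(train)."""
    scores = []
    for held_out in k_fold_indices(len(ages), k):
        held = set(held_out)
        train = [(ages[i], left[i]) for i in range(len(ages)) if i not in held]
        predict = fit(train)
        probs = [predict(ages[i]) for i in held_out]
        scores.append(brier_score(probs, [left[i] for i in held_out]))
    return sum(scores) / len(scores)


def fit_constant(train):
    """Stand-in model 1: one attrition rate for every career age."""
    rate = sum(y for _, y in train) / len(train)
    return lambda age: rate


def fit_binned(train, width=10):
    """Stand-in model 2: a separate attrition rate per 10-year career-age bin."""
    totals, counts = {}, {}
    for age, y in train:
        b = age // width
        totals[b] = totals.get(b, 0) + y
        counts[b] = counts.get(b, 0) + 1
    overall = sum(totals.values()) / len(train)

    def predict(age):
        b = age // width
        return totals[b] / counts[b] if b in counts else overall

    return predict


# Synthetic data: 100 faculty per career age, 5% leave before age 30, 30% after.
ages, left = [], []
for age in range(40):
    n_leave = 5 if age < 30 else 30
    for j in range(100):
        ages.append(age)
        left.append(1 if j < n_leave else 0)

score_constant = cross_validated_brier(ages, left, fit_constant)
score_binned = cross_validated_brier(ages, left, fit_binned)
```

      A lower score is better. On this synthetic example the binned model scores lower than the constant model because it can represent the late-career rise in attrition; the same comparison logic applies when the candidates are polynomial degrees.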

      You used the 5th order logistic regression (bottom of page 11) to model attrition at different ages. The data in [24] shows that attrition increases sharply, then drops then increases again with career age. A fifth order polynomial on its own could plausibly do this but I associate logistic regression models like this as being monotonically increasing (or decreasing!), again more details as to how this worked would be useful.

      Our first submission did not explain this point well, but we hope that Supplementary Figures S4 and S5 provide clarity. In short, we agree of course that typical logistic regression assumes a linear relationship between the predictor variables and the log odds of the outcome variable. This means that the relationship between the predictor variables and the probability of the outcome variable follows a sigmoidal (S-shaped) curve. However, the relationship between the predictor variables and the outcome variable may not be linear.

      To capture more complex relationships, like the increasing, decreasing and then increasing attrition rates as a function of career age, higher-order terms can be added to the logistic regression model. These higher-order terms allow the model to capture nonlinear relationships between the predictor variables and the outcome variable — namely the non-monotonic relationship between rates of attrition and career age — while staying within a logistic regression framework.
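
      A short numerical example may make this concrete. The sketch below is illustrative only: the coefficients are invented (they are not fitted values from our data), and a cubic in career age is used instead of a fifth-order polynomial to keep the arithmetic easy to follow.

```python
import math


def attrition_probability(age, coeffs):
    """Logistic model whose log odds are a polynomial in career age.

    coeffs[k] multiplies age**k, so coeffs[0] is the intercept.
    """
    log_odds = sum(c * age ** k for k, c in enumerate(coeffs))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients: the log odds have a local maximum at career
# age 6 and a local minimum at career age 18.
COEFFS = [-3.5, 0.054, -0.006, 0.0005 / 3]

p_start = attrition_probability(0, COEFFS)
p_peak = attrition_probability(6, COEFFS)
p_trough = attrition_probability(18, COEFFS)
p_late = attrition_probability(40, COEFFS)
```

      Because the logistic function is monotonic in the log odds, any rise and fall in the polynomial carries through to the probability: here the probability rises to career age 6, falls until career age 18, and rises again towards career age 40.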

      "The career age of new hires follows the average career age distribution of hires" did you use the empirical distribution here or did you fit a standard statistical distribution e.g. Gamma?

      We used the empirical distribution. This information has been added to the updated methods section in the main text.
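
      Sampling from an empirical distribution, as opposed to a fitted parametric one, can be sketched as follows. This is a minimal illustration rather than our implementation; the function name and the example career ages are hypothetical.

```python
import random
from collections import Counter


def empirical_sampler(observed_ages, seed=None):
    """Return a function that draws career ages in proportion to their observed counts."""
    counts = Counter(observed_ages)
    ages = sorted(counts)
    weights = [counts[a] for a in ages]
    rng = random.Random(seed)

    def draw(n):
        return rng.choices(ages, weights=weights, k=n)

    return draw


# Hypothetical career ages of observed hires in one field.
observed = [1, 1, 1, 2, 2, 3, 5, 8, 8, 12]
draw_hire_ages = empirical_sampler(observed, seed=1)
simulated = draw_hire_ages(1000)
```

      Every simulated career age is one that appears in the observed data, and values that were observed more often are drawn more often; no distributional family such as a Gamma is assumed.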

      How did you account for institution (presumably available)? Your own work has shown that institution types plays a role which could be contributing to these results.

      See below.

      What other confounding variables could be at play here, what is available as part of the data and what happens if you do/don't account for them?

      A number of variables included in our data have been shown to correlate with faculty attrition, including PhD prestige, current institution prestige, PhD country, and whether or not an individual is a “self-hire,” i.e., trained and hired at the same institution (Wapman et al., Nature, 2022). Additional factors that faculty self-report as reasons for leaving academia include issues of work-life balance, workplace climate, and professional reasons, and in some cases to varying degrees between men and women faculty (Spoon et al., Sci. Adv., 2023).

      Our counterfactual analysis aims to address a specific question: how would women’s representation among faculty be different today if men and women were subjected to the same attrition patterns over the past decade? To answer this question, it is important to account for faculty career age, which we accept as a variable that will always correlate strongly with faculty attrition rates, as long as the tenure filter remains in place and faculty continue to naturally progress towards retirement age. On the other hand, it is less clear why PhD country, self-hire status, or any of the other mentioned variables should necessarily correlate with attrition rates and with gendered differences in attrition rates more specifically. While some or all of these variables may underlie the causal roots of gendered attrition rates, our analysis does not seek to answer causal questions about why faculty leave their jobs (e.g., by testing the impact of accounting for these variables in simulations per the reviewer's suggestion). This is because we do not believe the data used in this analysis is sufficient to answer such questions, lacking comprehensive data on faculty stress (Spoon et al., Sci. Adv., 2023), parenthood status, etc.

      What career age range did the model use?

      The career age range observed in model outcomes is a function of the empirically derived attrition rates for faculty across academic fields. The highest career age observed in the AARC data was 80, and the faculty career ages that result from our model simulations and projections do not exceed 80.

      We have also added the distribution of faculty across career ages for the projection scenario model outputs in the supplemental materials Fig. S3 (see response to your later comment regarding career age for further details). Looking at these distributions, it is observed that very few faculty have career age > 60, both in observation and in our simulations.

      What was the initial condition for the model?

      Empirical 2011 faculty rosters are used as the initial conditions for the counterfactual analysis, and 2020 faculty rosters are used as the initial conditions for the projections analysis. This information has been added to the descriptions of methods in the main text.

      Starting the model in 2011 how well does it fit the available data up to 2020?

      Thank you for this suggestion. We ran this analysis for each field starting in 2011, and found that model outcomes were statistically indistinguishable from the observed 2020 faculty gender compositions for all 111 academic fields. This finding is not surprising, because the model is fit to the observed data, but it serves to validate the methods that we used to extract the model's parameters. We have added these results to the supplement (Fig. S2).

      What are the sensitivity analysis results for the model? If you have made different fitting decisions how much would the results change? All this applied to both the hiring and attrition parameters estimates.

      We model attrition and hiring using logistic regression, with career age included as an exogenous variable up to its fifth power. A natural question follows: what if we used a model with career age only to its first or third power? Or to higher powers? We performed this sensitivity analysis, and added three new figures to the supplement to present these findings:

      First, we show the observed attrition probabilities at each career age, and four model fits to attrition data (Supplementary Figs S4 and S5). The first model includes career age only to its first power, and this model clearly does not capture the full career age / attrition correlation structure. The second model includes career age to its third power, which does a better job of fitting to the observed patterns. The third model includes career age up to its fifth power, which appears to very modestly improve upon the former model. The fourth model includes career age up to its seventh power, and the patterns captured by this model are largely the same as the 5th-power model up to career age 50, beyond which there are some notable differences in the inferred attrition probabilities. These differences would have relatively little impact on model outcomes because the vast majority of faculty have a career age below 50.

      Second, we show the observed probability that hires are women, conditional on the career age of the hire. Once again, we fit four models to the data, and find that career age should be included at least up to its fifth order in order to capture the correlation structures between career age and the gender of new hires. However, limited differences result from including career age up to the 7th degree in the model (relative to the 5th degree).

      As a final sensitivity analysis, we reproduce Fig. 2, but rather than including career age as an exogenous variable up to its fifth power in our models for hiring and attrition, we include career age up to its third power. Findings under this parameterization are qualitatively very similar to those presented in Fig. 2, indicating that the results are robust to modest changes to model parameterization (shown in supplement Fig. S6).

      Far more detail in this and some interim results from each stage of the analysis would make the paper far more convincing. It currently has an air of "black box" too much of the analysis which would easily allow an unconvinced reader to discard the results.

      We have added more detailed descriptions of the methods to the main text. We hope that the changes made will address these concerns.

      Section C: You use the Leslie model to predict the future population. As the model is linear the population will either grow exponentially (most likely) or dwindle to zero. You mention you dealt with this by scaling the average value of H to keep the population at 2020 levels? This would change the ratio of hiring to attrition. How did this affect the timescale of the results? If a field had very minimal attrition (and hence grew massively over the time period of the dataset) the hiring rate would have to be very small too so there would be very little change in the gender balance. Did you consider running the model to steady state instead?

      We chose the 40 year window (2020-2060) for this projection analysis because 40 years is roughly the timespan of a full-length faculty career. In other words, it will take around 40 years for most of the pre-existing faculty from 2020 to retire, such that the new, simulated faculty will have almost entirely replaced all former faculty by 2060.

      For three out of five of our projection scenarios (OA, GNA, OA+ER), the point at which observed faculty are replaced by simulated faculty represents steady state. One way to check this intuition is to observe the asymptotic behavior of the trajectories in Fig. 3B; the slopes for these 3 scenarios nearly level out within 40 years.

      The other two scenarios (OA + IR, GNA+IR) represent situations where women’s representation among new hires is increasing each year. These scenarios will not reach steady state until women represent 100% of faculty. Accordingly, the steady state outcomes for these scenarios would yield uninteresting results; instead, we argue that it is the relative timescales that are interesting.

      What did you do to check that your predictions at least felt realistic under the fitted parameters? (see above for presenting the goodness of fit over the 10 years of the data).

      We ran the analysis suggested in a prior comment (Starting the model in 2011 how well does it fit the available data up to 2020?) and found that model outcomes were statistically indistinguishable from the observed 2020 faculty gender compositions for all 111 academic fields, plus the “All STEM” and “All non-STEM” aggregations.

      You only present the final proportion of women for each scenario. As mentioned earlier, models of this type have a tendency to lead to strange population distributions with wild age predictions and huge (or zero populations). Presenting more results here would assuage any worries the reader had about these problems. What is the predicted age distribution of men and women in the long term scenarios? Would a different method of keeping the total population in check have yielded different results? Interim results, especially from a model as complex as this one, rather than just presenting a final single number answer are a convincing validation that your model is a good one! Again, presenting this result will go a long way to convincing readers that your results are sound and rigorous.

      Thank you for this suggestion. We now include a figure that presents faculty age distributions for each projection scenario at 2060 against the observed faculty age distribution in 2020 (pictured below, and as Fig. S3 in the supplementary materials). We find that the projected age distributions are very similar to the observed distributions for natural sciences (shown) and for the additional academic domains. We hope this additional validation will inspire confidence in our model of faculty hiring and attrition for the reviewer, and for future readers.

      In Fig S3, line widths for the simulated scenarios span the central 95% of simulations.

      Other people have reached almost identical conclusions (albeit with smaller data sets) that hiring is more important than attrition. It would be good to compare your conclusions with their work in the Discussion.

      We have revised the main text to cite the listed examples of similar studies. We thank the reviewer for bringing these relevant works to our attention.

      General comments:

      What thoughts have you given to non-binary individuals?

      Be careful how you use the term "gender diversity"! In many countries "Gender diverse" is a term used in data collection for non-binary individuals, i.e. Male, female, gender diverse. The phrase "hiring more gender diverse faculty" can be read in different ways! If you are only considering men and women then gender balance may be a better framework to use.

      We have added language to the main text which explicitly acknowledges that our analysis focuses on men and women due to limitations in our name-based gender tool, which only assigns binary (woman/man) labels to faculty. We point out that this is a compromise due to the technical limitations of name-based gender methodologies and is not intended to reinforce a gender binary.

      We have also taken additional care with referring to “gender diversity,” per reviewer 1’s point in their public review.

      Reviewer #2 (Recommendations For The Authors):

      Data availability: I did not see an indication that the dataset used here is publicly available, either in its raw format or as a summary dataset. Perhaps this is due to the sensitive nature of the data, but regardless of the underlying reason, the authors should include a note on data availability in the paper.

      The dataset used for these analyses was obtained under a data use agreement with the Academic Analytics Research Center (AARC). While these data are not publicly available, researchers may apply for data access here: https://aarcresearch.com/access-our-data.

      We also added a table to the supplemental materials (Tab. S3) that reports the estimated number of men and women in each of the 111 fields.

      Additionally, a variety of summary statistics based on this dataset are available online, here: https://github.com/LarremoreLab/us-faculty-hiring-networks/tree/main

      Gender classification: Was an existing package used to classify gender from names in the dataset, or did the authors develop custom code to do so? Either way, this code should be cited. I would also be curious to know what the error rate of these classifications are, and suggest that additional information on potential biases that might result from automated classifications be included in the discussion, under the section describing data limitations. The reliability of name-based gender classification is particularly of interest, as external gender classifications such as those applied on the basis of an individual's name - may not reflect the gender with which an individual self-identifies. In other words, while for many people their names may reflect their true genders, for others those names may only reflect their gender assigned at birth and not their self-perceived or lived gender identity. Nonbinary faculty are in particular invisibilized here (and through any analysis that assigns binary gender on the basis of name). While these considerations do not detract from the main focus of the study - which was to utilize an existing dataset classified only on the basis of binary gender to assess trends for women faculty-these limitations should be addressed as they provide additional context for the interpretation of the results and suggest avenues for future research.

      As we mentioned in response to the public review, we use a free and open-source python package called nomquamgender to estimate the strengths of name-gender associations, and we apply gender labels to the names with sufficiently strong associations with a binary gender. This package is based on a paper by Van Buskirk et al., 2023, “An open-source cultural consensus approach to name-based gender classification,” which documents error rates and potential biases.

      Page 1: The sentence beginning "A trend towards greater women's representation could be caused..." is missing a conjunction. It should likely read: "A trend towards greater women's representation could be caused entirely by attrition, e.g., if relatively more men than women leave a field, OR entirely by hiring..."

      We have edited the paragraph to remove the sentence in question.

      Pages 1-2: The sentence beginning "Although both types of strategy..." and ending with "may ultimately achieve gender parity" is a bit of a run-on; perhaps it would be best to split this into multiple sentences for ease of reading.

      We have revised this run-on sentence.

      Page 2: See comments in the public review about a methods section, the addition of which may help to improve clarity for the readers. Within the existing descriptions of what I consider to be methods (i.e., the first three paragraphs currently under "results"), some minor corrections could be added here. First, consider citing the source of the dataset in the line where it is first described (in the sentence "For these analyses, we exploit a census-level dataset of employment and education records for tenured and tenure-track faculty in 12,112 PhD-granting departments in the United States from 2011-2020.") It also may be helpful to include context here (or above, in the discussion about institutional analyses) about how "departments" can be interpreted. For example, how many institutions are represented across these departments? More information on how the authors eliminated the gendered aspect of patterns in their counterfactual model would be helpful as well; this is currently hinted at on page 4, but could instead be included in the methods section with a call-out to the relevant supplemental information section (S2B).

      We have added a citation to Academic Analytics Research Center’s (AARC) list of available data elements to the data’s introduction sentence. We hope this will allow readers to familiarize themselves with the data used in our analysis.

      Faculty department membership was determined by AARC based on online faculty rosters. 392 institutions are represented across the 12,112 departments present in our dataset. We have updated the main text to include this information.

      Finally, we have added a methods section to the main text, which includes information on how the gendered aspect of attrition patterns was eliminated in the counterfactual model.

      Page 2: Perhaps some indication of how many transitions from an out-of-sample institution might be helpful to readers hoping to understand "edge cases."

      In our analysis, we consider all transitions from out-of-sample institutions to in-sample institutions as hires, and all transitions away from in-sample institutions, whether to an out-of-sample institution or out of academia entirely, as attritions. We choose to restrict our analysis of hiring and attrition to PhD-granting institutions in the U.S. in this way because our data do not support an analysis of other, out-of-sample institutions.
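
      As a rough illustration of this definition, hires and attritions can be read off by comparing consecutive yearly rosters of in-sample faculty in a field. The function name, roster layout, and faculty identifiers below are assumptions made for the sketch, not the structure of the AARC data.

```python
def hires_and_attritions(roster_by_year):
    """Compare consecutive in-sample rosters (year -> set of faculty ids)."""
    years = sorted(roster_by_year)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        before, after = roster_by_year[prev], roster_by_year[curr]
        changes[curr] = {
            "hires": after - before,       # newly present in the sample
            "attritions": before - after,  # no longer present, for any reason
        }
    return changes


# Invented rosters for one field; ids stand in for individual faculty.
rosters = {
    2011: {"a", "b", "c"},
    2012: {"a", "c", "d"},
    2013: {"c", "d", "e"},
}
changes = hires_and_attritions(rosters)
```

      Because the rosters are pooled across all in-sample institutions, a person who moves between two in-sample institutions within the same field appears in both years and is counted as neither a hire nor an attrition.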

      I also would have liked additional information on how many faculty switched institutions but remained "in-sample and in the same field" - and the gender breakdowns of these institutional changes, as this might be an interesting future direction for studies of gender parity. (For example, readers may be spurred to ask: if the majority of those who move institutions are women, what are the implications for tenure and promotion for these individuals?)

      While these mid-career moves are not counted as attritions in the present analysis, a study of faculty who switch institutions but remain (in-sample) as faculty could shed light on issues of gendered faculty retention at the level of institutions. We share the reviewer’s interest in a more in-depth study of mid-career moves and how these moves impact faculty careers, and we now discuss the potential value of such a study towards the end of the paper. In fact, this subject is the topic of a current investigation by the authors!

      Page 3: I was confused by the statement that "of the three types of stable points, only the first point represents an equitable steady-state, in which men and women faculty have equal average career lengths and are hired in unchanging proportions." Here, for example, computer science appears to be close to the origin on Figure 1, suggesting that hiring has occurred in "unchanging proportions" over the study interval. However, upon analysis of Table S2, it appears that changes in hiring in Computer Science (+2.26 pp) are relatively large over the study interval compared to other fields. Perhaps I am reading too literally into the phrase that "men and women faculty are hired in unchanging proportions" - but I (and likely others) would benefit from additional clarity here.

      We had created an arrow along with the computer science label in Fig. 1, but it was difficult to see, which is likely the source of this confusion. This was our fault, and we have moved the “Comp. Sci.” label and its corresponding arrow to be more visible in Figure 1.

      As the reviewer points out, the change in women’s representation in Computer Science due to hiring over 2011-2020 was +2.26 pp. However, Fig. 1 and the corresponding table in the supplement show that this is a relatively small change compared to most fields.

      Page 3: If possible it may be helpful to cite a study (or multiple) that shows that "changes in women's representation across academic fields have been mostly positive." What does "positive" mean here, particularly when the changes the authors observe are modest? Perhaps by "positive" you mean "perceived as positive"?

      We used the term positive in the mathematical sense, to mean greater than zero. We have reworded the sentence to read “women's representation across academic fields has been mostly increasing…” We hope this change clarifies our meaning to future readers.

      Page 3: The sentence that ends with "even though men are more likely to be at or near retirement age than women faculty due to historical demographic trends" may benefit from a citation (of either Figure S3 or another source).

      We now cite the corresponding figure in this sentence.

      Page 4: The two sentences that begin with "The empirical probability that a person leaves their academic career" would benefit from an added citation.

      We have added a citation to the sentences.

      Figure 3: Which 10 academic domains are represented in Panel 3B? The colors appear to correspond to the legend in Panel 3A, but no indication of which fields are represented is provided. If possible, please do so - it would be interesting and informative to be able to make these comparisons.

      This was not clear in the initial version of Fig. 3B, so we now label each domain. For reference, the domains represented in 3B are (from top to bottom):

      ● Health

      ● Education

      ● Journalism, Media, Communication

      ● Humanities

      ● Social Sciences

      ● Public Administration and Policy

      ● Medicine

      ● Business

      ● Natural Sciences

      ● Mathematics and Computing

      ● Engineering

      Page 6: Consider citing relevant figure(s) earlier up in paragraph 2 of the discussion. For example, the first sentence could refer to Figure 1 (rather than waiting until the bottom of the paragraph to cite it).

      Thank you for this suggestion; we now cite Fig. 1 earlier in this discussion paragraph.

      Page 10: A minor comment on the fraction of women faculty in any given year: the authors assume that the proportion of women in a field can be calculated from knowing the number of women in a field and the number of men. This is, again, true if assuming binary genders but not true if additional gender diversity is included. It is likely that the number of nonbinary faculty is quite low, and as such would not cause a large change in the overall proportions calculated here, but additional context within the first paragraph of S1 might be helpful for readers.

      We have added additional context in the first paragraph of S1, explaining that an additional term could be added to the equation to account for nonbinary faculty representation if our data included nonbinary gender annotations. Thank you for making this point.
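
      To make the point concrete, the change can be written as below. The symbols are assumptions introduced for illustration ($n_w$, $n_m$, and $n_{nb}$ for the numbers of women, men, and nonbinary faculty in a field in a given year) and may differ from the notation used in S1.

```latex
f_w = \frac{n_w}{n_w + n_m}
\qquad\longrightarrow\qquad
f_w = \frac{n_w}{n_w + n_m + n_{nb}}
```

      When $n_{nb}$ is small relative to $n_w + n_m$, the two expressions are nearly equal, which matches the reviewer's expectation that the overall proportions would change little.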

      Page 10: Please include a range of values for the residual terms of the decomposition of hiring and attrition in the sentence that reads "In Figure S1 we show that the residual terms are small, and thus the decomposition is a good approximation of the total change in women's representation."

      These residual terms range from -0.51 pp to 1.14 pp (median = 0.2 pp). We have added this information to the sentence in question.

      Page 12: It may be helpful to readers to include a description of the information contained in Table S2 in the supplemental text under section S3.

      We refer to Table S2 twice in the main text (once in the observational findings, and once for the counterfactual analysis), and the contents of Table S2 are described thoroughly in the table caption.

      Reviewer #3 (Recommendations For The Authors):

      (1) There is a potential limitation in the generalizability of the findings, as the study focuses exclusively on US academia. Including international perspectives could have provided a more global understanding of the issues at hand.

      The U.S. focus of this study limits the generalizability of our findings, as faculty outside the U.S. may exhibit differences in hiring patterns, retention patterns, and current demographic representations. We have added a discussion of this limitation to the manuscript. Unfortunately, our data do not support international analyses of hiring and attrition.

      (2) I am not sure that everyone who disappeared from the AARC dataset could be counted as "attrition" from academia. Indeed, some who disappeared might have completely left academia once they disappeared from the AARC dataset. Yet, there's also the possibility that some professors left for academic positions in countries outside of the US, or US institutions that are not included in the AARC dataset. These individuals didn't leave academia. Furthermore, it is also possible that the moves of scholars to institutions outside of the US or not indexed by AARC are gender specific. Therefore, the analyses in this study should find a way to test whether the assumption that anyone who disappeared from AARC left academia is indeed valid. If not, how will this potentially challenge the current conclusions?

      The reviewer makes an important point: faculty who move to faculty positions in other countries and faculty who move to non-PhD-granting institutions, or to institutions that are otherwise not included in the AARC data, are all counted as attritions in our analysis. We intentionally define hiring and attrition broadly to include all cases in which faculty join or leave a field or domain within our dataset.

      The types of transitions that faculty make out of the tenure track system at PhD-granting institutions in the U.S. may correlate with faculty attributes, like gender. For example, women or men may be more likely to transition to tenure track positions at non-U.S. institutions. Nevertheless, these types of career transition represent an attrition for the system of study, and a hire for another system. Following this same logic, faculty who transition from one field to another field in our analysis are treated as an attrition from the first field and a hire into the new field.

      By focusing on “all-cause” attrition in this way, we are able to draw robust insights for the specific systems we consider (e.g., STEM and non-STEM faculty at U.S. PhD-granting institutions), without being roadblocked by the task of annotating faculty departures and arbitrating which should constitute “valid” attritions.

      (3) It would be very interesting to know how much of the attrition was due to tenure failure. Previous studies have suggested that women are less likely to be granted tenure, which makes me wonder about the role that tenure plays in the gendered patterns of attrition in academia.

      We note that faculty attrition rates start low, reach a peak around 5-7 years after earning a PhD, and then decline until around 15-20 years post-PhD, after which attrition rates increase as faculty approach retirement. The first local maximum appears to coincide roughly with the tenure clock timing, but we can only speculate that these attritions are tenure related. Our dataset is unfortunately not equipped to determine the causal mechanisms driving attrition.
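
      A curve of this kind can be estimated as the share of faculty at each career age who leave. The sketch below assumes one record per faculty-year with a hypothetical `left_next_year` flag; the record layout and the toy data are invented for illustration and are not drawn from the AARC dataset.

```python
from collections import Counter


def attrition_risk(records):
    """records: (career_age, left_next_year) pairs, one per faculty-year."""
    at_risk, left = Counter(), Counter()
    for age, did_leave in records:
        at_risk[age] += 1
        left[age] += int(did_leave)
    # Empirical risk at each career age: leavers divided by those at risk.
    return {age: left[age] / at_risk[age] for age in sorted(at_risk)}


# Invented observations: years since PhD, and whether the person then left.
records = [
    (3, False), (3, False),
    (6, True), (6, False), (6, True),
    (18, False), (18, False),
    (35, True), (35, False),
]
risk = attrition_risk(records)
```

      Such an estimate describes when departures happen, not why, which is the limitation noted above.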

      We reproduce the attrition risk curve in the supplementary materials, Fig. S4.

      (4) The dataset used doesn't fully capture the complexities of academic environments, particularly smaller or less research-intensive institutions (regional universities, historically black colleges and universities, and minority-serving institutions). This could be potentially added to the manuscript for discussions.

      We have added this point to the description of this study’s limitations in the discussion.

    1. Author response:

      We thank the reviewers for their thoughtful consideration of our study and are delighted they found the findings to be important. In this initial response to the overall positive reviews, we want to address common themes raised, clarify points relevant to a few specific reviewer concerns, and frame plans for the revised manuscript.

      (1) Analysis of data from human tissue: Reviewer 1 notes “In their analyses of enteric glia from existing single-cell transcriptomic data sets, it is stated that these come from 'non-diseased' humans. However, the data on the small intestine is obtained from children with functional gastrointestinal disorders (Zheng 2023). Data on colonic enteric glia was obtained from colorectal cancer patients (Lee 2020). Although here the cells were isolated from non-malignant regions, saying that the large intestines of these patients are non-diseased is probably an overstatement.

      In the Zheng et al. dataset, “functional GI disorders” refers to biopsies from children that do not have any histopathologic evidence of digestive disease. The children do, however, have at least one GI symptom that prompted a diagnostic endoscopy with biopsies, leading to the designation of “functional” disorder. Given that diagnostic endoscopies are invasive procedures that necessitate anesthesia, obtaining biopsies from completely healthy, asymptomatic children without any clinical indication would not be allowable per most institutional review boards, leading the authors of that study to use these samples as a control group. We thus used the “non-diseased” label to encompass these samples as well as those from the unaffected regions of large intestine from colorectal cancer patients. We recognize, however, that this label might be misleading and will revise the manuscript to more accurately reflect the information on control tissue origin.

      Another existing dataset including human mucosal enteric glia of healthy subjects is presented in Smillie et al (2019). It would be interesting to see how the current findings relate to the data from Smillie et al.” 

      We thank the reviewer for directing us to the Smillie et al. 2019 dataset. This dataset derives from colonic mucosal biopsies from 12 healthy adults (8480 stromal cells) and 18 adults with ulcerative colitis (10,245 stromal cells from inflamed bowel segments and 13,146 from uninflamed), all between the ages of 20-77 years. Our preliminary analysis shows that the putative glial cluster in this dataset does not separate by inflammation or disease state based on the common glial genes: S100B, PLP1, and SOX10. PLP1 and S100B are broadly expressed across this cluster while GFAP is not detected in this dataset, consistent with our observations from the two other human datasets included in our manuscript. In the revised manuscript, we will include the Smillie et al. 2019 data in a supplemental figure as additional supportive evidence.

      (2) Validation and further details of the Plp1CreER-DTA model for genetic depletion of enteric glia: Reviewer 1 notes “The time between enteric glia depletion and analyses (mouse sacrifice) must be a crucial determinant of the type of effects, and the timing thereof. In the current study 11 days after tamoxifen treatment was chosen as the time point for analyses, which is consistent with earlier work by the lab using the same model (Rao et al 2017). What would happen when they wait longer than 11 days after tamoxifen treatment?” Reviewer 3 asks whether “the Plp1CreER Rosa26DTA/+ mice system established correctly” and raises concern about quantitative characterization.

      In previous work, we discovered that the gene Plp1 is broadly expressed by enteric glia and, within the mouse intestine, is quite specific to glial cells (PMID: 26119414). We characterized the Plp1CreER mouse line as a genetic tool in detail in this initial study. Then in a subsequent study, we used Plp1CreER-DTA mice to genetically deplete enteric glia and study the consequences on epithelial barrier integrity, crypt cell proliferation, enteric neuronal health and gastrointestinal motility (PMID: 28711628). In this second study, we performed extensive validation of the Plp1CreER-DTA mouse model including detailed quantification of glial depletion in the small and large intestines across the myenteric, intramuscular and mucosa compartments by immunohistochemical (IHC) staining of whole tissue segments to sample thousands of cells. We found that the majority of S100B+ enteric glia were depleted within 5 days in both sexes, including more than 88% loss of mucosal glia, and that this loss was stable at 3 subsequent timepoints (7, 9 and 14 days post-tamoxifen induction of Cre activity). Glial loss was further confirmed by IHC for GFAP in the myenteric plexus, and by ultrastructural analysis of the small intestine to ensure cell depletion rather than simply loss of marker expression. Our group was the first to use this model to study enteric glia, and since then similar models and our key observations have been replicated by other groups (PMID: 33282743, 34550727). Thus, we consider this model to be well established.

      Reviewer 1 raises an excellent question about examining epithelial health beyond 11 days post-tamoxifen (11dpt) in this model. Particularly given the longer-lived nature of Paneth cells relative to other epithelial cell types, this would be very interesting to explore. Through 11dpt, Cre+ mice are well-appearing and indistinguishable from their Cre-negative control littermates. Unfortunately, a limitation of the Plp1CreER-DTA model is that beyond 11dpt, Cre+ mice become anorexic, lose body weight, and have signs of neurologic debility such as hindlimb weakness and uncoordinated gait that are prominent by 14dpt. These phenotypes are likely the consequence of targeting Plp1+ glia outside the gut, such as Schwann cells and oligodendrocytes (as described in another study which used a similar model to study demyelination in the central nervous system, PMID: 20851998). Given these CNS effects and that starvation is well known to affect Paneth cell phenotypes (PMIDs: 1167179, 21986443), we elected not to examine timepoints beyond 11dpt. Technological advances that enable more selective cell depletion would allow study of more chronic effects of enteric glial loss.

      (3) Sex differences in the microbiome data: All 3 reviewers queried whether there were sex differences in the microbiome data with Reviewer 1 explaining “Previously the authors showed that enteric glia regulation of intestinal motility is sex-dependent (Rao et al 2017). While enteric glia depletion caused dysmotility in female mice, it did not affect motility in males. For this reason, most experiments in the current study were conducted in male mice only. However, for the experiments focusing on the effect of enteric glia depletion on host-microbiome interactions and intestinal microbiota composition both male and female mice were used. In Figure 8A male and female mice are distinctly depicted but this was not done for Figure 8C. Separate characterization of the microbiome of male and female mice would have helped to figure out how much intestinal dysmotility (in females) contributes to the effect on gut microbial composition. This is an important exercise to confirm that the effect on the microbiome is indeed a consequence of altered Paneth cell function…”

      In our microbiome analysis, we initially analyzed males and females separately but did not observe significant differences between the two sexes. Thus, we merged the data to increase the statistical power of the genotype comparisons. It was an oversight on our part to not label the female and male datapoints in Figure 8C as we did for the other data in the manuscript. We will update this graph and related supplemental figures in the revised version. Per Reviewer 2’s suggestion, we will also address this further in the Results and Discussion.

      (4) Reconciling RNA-Seq identification of transcriptional changes in the colon, but not the small intestine, while the GSEA and downstream tissue level morphological and functional analyses detected phenotypes in the small intestine. Reviewers 1 and 3 raised this question with Reviewer 1 noting “…enteric glia depletion was found to affect Paneth cells structurally and functionally in the small intestine, where transcriptional changes were initially not identified. Only when performing GSEA with the in silico help of cell type-specific gene profiles, differences in Paneth cell transcriptional programs in the small intestine were uncovered. A comment on this discrepancy would be helpful, especially for the non-bioinformatician readers among us.” 

      Standard differential gene expression (DEG) analysis of the effects of glial loss revealed significant differences only in the colon, and even there only a handful of genes were changed. These changes were not accompanied by corresponding changes at the protein level, at least as detectable by IHC. In the small intestine, there were no significant differences by standard DEG thresholds. Unlike DEG analysis, gene set enrichment analysis (GSEA) provides a significance value based on whether a higher-than-chance number of genes are changing in a uniform direction, without consideration for the significance of the magnitude of change. Therefore, the GSEA detected that a significant number of genes in the curated Paneth cell gene list exhibited a positive fold change difference in the bulk RNA sequencing data. This prompted us to examine Paneth cells and other epithelial cell types in more detail by IHC, functional and ultrastructural analyses, which all converged on the observation that Paneth cells were relatively selectively disrupted in the epithelium of glial-depleted mice.
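
      The intuition can be illustrated with a deliberately simplified stand-in. The sketch below uses a one-sided sign test, not the running-sum enrichment statistic that GSEA actually computes, and the fold changes, gene-set size, and per-gene cutoff are all invented assumptions rather than values from the study.

```python
from math import comb


def sign_test_p(n_up: int, n_total: int) -> float:
    """One-sided binomial p-value: P(X >= n_up) with X ~ Binomial(n_total, 0.5)."""
    return sum(comb(n_total, k) for k in range(n_up, n_total + 1)) / 2 ** n_total


# Invented log2 fold changes for a hypothetical 12-gene set: each shift is small.
log2_fc = [0.21, 0.18, 0.30, 0.12, 0.25, -0.05, 0.16, 0.22, 0.09, 0.27, 0.14, 0.19]

# An assumed per-gene cutoff of |log2 FC| >= 1 flags none of the genes.
n_large = sum(abs(fc) >= 1.0 for fc in log2_fc)

# Yet most genes move in the same direction, which is unlikely by chance.
n_up = sum(fc > 0 for fc in log2_fc)
p_value = sign_test_p(n_up, len(log2_fc))
```

      In this toy case no single gene clears the cutoff, but the shared direction of change across the set is itself statistically unusual, which is the kind of signal a set-level method can pick up when gene-level tests do not.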

      (5) Other: We will address all remaining comments in our detailed author response that will accompany our revised manuscript. We thank Reviewer 2 for the very positive feedback overall and highlighting opportunities to better label findings in some of the figures. We will make these suggested changes in our revised manuscript.

    1. Author response:

      We thank the reviewers for their highly valuable comments and recommendations on our manuscript. We particularly appreciate receiving reviews from three distinct points of view, all highly relevant to our study (i.e. from an ecological, biomechanics, and evolutionary biology perspective).

      We will now carefully address all reviewer comments and questions, and resubmit a revised version in due course. Again, we thank the reviewers for their rigorous assessment of our study, which will greatly help us improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The findings in this study are useful and may have practical implications for predicting DLBCL risk subject to further validating the bioinformatics outcomes. We found the approach and data analysis solid. However, some concerns regarding the drug sensitivity prediction and the links between the selected genes for the risk scores have been raised that need to be addressed by further functional works.

      Thank you for your recognition of our study. We searched the treatment information of the DLBCL patients in our own cohort; unfortunately, all patients were treated strictly according to the guidelines issued by the Chinese authorities, which suit Chinese patients well but do not include the drugs explored in the present study. Therefore, further investigations should be designed and conducted to validate our conclusion. Here, we provide a possible direction for future studies based on large cohorts, which could not only yield more reliable conclusions but also draw more attention to the role of the tumor microenvironment in influencing outcome and drug sensitivity.

      Public Reviews:

      We sincerely thank all reviewers for their positive comments on our study and their helpful recommendations for improving our manuscript. In this part, we have collated the comments and recommendations from all reviewers and made corresponding revisions. Our responses follow.

      (1) How did we determine the three genes (VCAN, C1QB and CD3G) in the prognostic model?

      As mentioned under “Prognostic model” in the Materials and Methods section, the genes were selected using the “survival” package in R. After obtaining the nine genes, we input their expression values and analyzed them with the “survival” package. The “step” function in R then optimizes the model, that is, it constructs a model with as few factors as possible, so that the factors finally included are representative and show the least collinearity. In this way, the resulting prognostic model could be more practical in clinical use.
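
      R's `step` compares candidate models by AIC. The sketch below mimics only a backward elimination pass and is not the authors' code: it fits no Cox model, and the scoring function, gene names, and gain values are hypothetical placeholders standing in for a fitted model's AIC.

```python
def backward_step(features, aic):
    """Drop one feature at a time while doing so lowers the AIC."""
    current = list(features)
    best = aic(current)
    while len(current) > 1:
        # Score every model that omits exactly one of the remaining features.
        candidates = [(aic([f for f in current if f != drop]), drop) for drop in current]
        cand_aic, drop = min(candidates)
        if cand_aic >= best:
            break  # no single removal improves the model
        current.remove(drop)
        best = cand_aic
    return current, best


# Hypothetical stand-in for a fitted model's AIC: each gene contributes an
# invented amount of log-likelihood, and every parameter costs 2.
GAIN = {"G1": 5.0, "G2": 0.4, "G3": 3.0, "G4": 0.1}


def toy_aic(features):
    log_lik = sum(GAIN[f] for f in features)
    return 2 * len(features) - 2 * log_lik


selected, score = backward_step(["G1", "G2", "G3", "G4"], toy_aic)
```

      In this toy setting, features whose contribution does not offset the per-parameter penalty are removed first, which is how a nine-gene candidate list could shrink to a smaller final model.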

      (2) Different centers have different protocols for IHC, so how could we put this model into clinical practice under this circumstance?

      Protocols differ between centers, and materials such as antibodies vary as well. Therefore, there is still a long way to go before our study can be put into clinical practice. In our view, at least three problems need to be solved. First, diagnostic antibodies, which usually show better specificity and sensitivity, should be used in clinical practice. This may explain why the staining of VCAN and C1QB was strong and difficult to differentiate. Second, a standardized protocol should be established. Last but not least, more precise analyses and studies should be conducted to clarify which cell types specifically express these genes (as mentioned by Reviewer #2). We are now working to solve these problems using as many techniques as possible, such as multi-omics and mIHC. From revealing the true expression pattern to developing high-quality antibodies and even a standardized test kit, we look forward to clinical translation.

      (3) The analyses of immune infiltration and the key genes in DLBCL were superficial and limited to correlation analyses.

      Because the model was constructed based on the tumor purity of DLBCL, the risk score could be associated with the enrichment of cell functions. We conducted GSEA based on the differentially expressed genes between the high-risk and low-risk groups in the two datasets (Figure 5H-I). It showed that extracellular organization and cellular adhesion differed between the two groups, which might regulate immune infiltration and activity by affecting the motility of immune cells. In addition, we validated the infiltration of M1 and M2 macrophages in our own cohort (Supplementary Figure 3P).

      (4) The drug sensitivity was analyzed only on the basis of the model and should be validated in real-world research or laboratory studies. In addition, the sensitivity scores did not seem to differ much in most cases, even though the differences were statistically significant.

      We searched the treatment information of the DLBCL patients in our own cohort; unfortunately, all patients were treated strictly according to the guidelines issued by the Chinese authorities, which suit Chinese patients well but do not include the drugs explored in the present study. Therefore, further investigations should be designed and conducted to validate our conclusion. Here, we provide a possible direction for future studies based on large cohorts, which could not only yield more reliable conclusions but also draw more attention to the role of the tumor microenvironment in influencing outcome and drug sensitivity. As for the differences between the high- and low-risk groups, a small change in dose can sometimes have a large effect, because the dose-effect curve is usually nonlinear. Reducing the dose, even by just 1%, could therefore avoid adverse effects. In summary, the drug sensitivity analyses in our study could open up more possibilities for clinical trials and practice, and we are considering how to design appropriate clinical research.

      (5) C1QB was associated with decreased tumor purity and worse prognosis, but decreased tumor purity was related to better prognosis. How can this contradiction be explained?

      As discussed in the Discussion section, previous studies have revealed the role of C1QB in promoting an immunosuppressive microenvironment in cancer (see references 22-26). C1QB might recruit pro-tumor immune cells, which on its own would reduce tumor purity. However, the immune microenvironment is regulated by multiple factors that form a network and antagonize or synergize with one another. Statistical analysis can reveal a possible phenomenon but cannot provide a mechanistic explanation. Therefore, more mechanistic studies are needed to reveal the connections and key nodes. This is exactly what we will explore next.

      (6) Others:

      (1) Line 51 has been rewritten.

      (2) References for the ESTIMATE algorithm (reference 16) and CD3G+ T cells (reference 17) have been added.

      (3) The illegible figure labels might have been caused by an incompatibility between the PDF file we submitted and the submission system. We have provided TIFF images in this revision, and the EPS files can be submitted to the editors upon request.

      (4) A supplementary description has been added to the legend of Figure 6 to make it clear.

      (5) To explore the expression of key genes among different locations of DLBCL, we performed the analyses in Figure 5 and Supplementary Figure 3. These results may be thought-provoking in that the tumor microenvironment differs among DLBCLs even though they share similar histological characteristics.

    1. Author response:

      We thank the editors and reviewers for their thorough engagement with the manuscript and their well-informed comments on the Poseidon framework. We are pleased to note that they consider Poseidon a promising and timely attempt to resolve important issues in the archaeogenetics community. We also agree with the main challenges they raise, specifically the lack of long-term, independent infrastructure funding at the time of writing, and various aspects of Poseidon that bear the potential to further consolidate a de facto alienation of the aDNA community from the wider field of genomics.

      Poseidon is indeed dependent on the Department of Archaeogenetics at MPI-EVA. For the short- to mid-term future (3-5 years) we consider this dependency beneficial, providing a reliable anchor point and direct integration with one of the most proficient data-producing institutions in archaeogenetics. For the long term, as stated in the discussion section of the manuscript, we hope for a snowball effect in the dissemination and adoption of Poseidon to establish it as a valuable community resource that automatically attracts working time and infrastructure donations. To kickstart this process we have already intensified our active community outreach and teach Poseidon explicitly to (early career) practitioners in the field. We are aware of options to apply for independent infrastructure funding, for example through the German National Research Data Infrastructure (NFDI) initiative, and we plan to explore them further.

      As the reviewers have noted, key decisions in Poseidon’s data storage mechanism have been influenced by the special path archaeogenetics has taken compared to other areas of genomics. The founding goal of the framework was to integrate immediately with established workflows in the field. Nevertheless we appreciate the concrete suggestions on how to connect Poseidon better with the good practices that emerged elsewhere. We will explicitly address the European Variation Archive in a revised version of the manuscript, deliberate embedding the BioSamples ID of the INSDC databases more prominently in the .janno file, prioritise support for VCF next to EIGENSTRAT and PLINK and add an option to clearly document the relevant human reference genome on a per-sample level. In the revised version of the text we will also explain the treatment of non-overlapping SNPs between studies by trident’s forge algorithm and how we imagine the interplay of different call sets in the Poseidon framework in general.

      Beyond these bigger concerns we will also consider and answer the various more detailed recommendations kindly shared by the reviewers, not least the question of how we imagine Poseidon to be used by archaeologists and for archaeological data.

    1. Author response:

      We wish to express our sincere acknowledgement to the reviewers and the editors for the time and the effort spent in reviewing our manuscript. We highly appreciate the positive feedback and the thorough and constructive comments.

      We plan to conduct additional experiments to address the reviewers’ concerns.

      (1) We plan to utilize the RIPK1 kinase dead mice to investigate the role of RIPK1 kinase activity in these metabolic stress responses.

      (2) We plan to conduct flow cytometry analysis to detect the percentage or number of different cell types in fasted liver tissue, to provide more accurate and quantitative assessments of monocyte recruitment.

      (3) We plan to conduct more western blotting to detect the expression of related molecules in the signal transduction pathway, to further clarify the underlying mechanisms.

      (4) Regarding the single-cell RNA sequencing analysis, we plan to conduct CellChat analysis to provide information about the interactions between different cell populations.

      (5) We will fix the issues regarding the data graphs and image resolutions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This study is very well framed and the writing is very clear. The manuscript is well organized and easy to follow, and overall the previous state of the art of the field is taken into account. I only have a couple of minor comments.

      (1) There is a preprint that uses single nuclei RNA-Seq and ST on human MS subcortical white matter lesions doi: https://doi.org/10.1101/2022.11.03.514906. This work needs to be included in the discussion of the results. 

      (1.1) We appreciate the reviewer bringing up this important preprint, and we have referenced it in the Discussion section of our updated manuscript. 

      (2) The discussion should include the overall limitations of the study and how much it can be translated to human MS. Specifically, the current work uses EAE and therefore different disease stages are not captured in this study. This point is also raised by other reviewers. 

      (1.2) We thank the reviewer for raising this important point, and we have included additional discussion about the limitations of EAE and its disease relevance to MS.

      Reviewer #2 (Recommendations For The Authors):

      The authors state that this EAE model is better for studying cortical gradients because previous models "such as directly injecting inflammatory cytokines into the meninges/cortex" cause a traumatic injury. It needs to be discussed that these models have now been superseded by more refined models involving long-term overexpression of pro-inflammatory cytokines in the sub-arachnoid space, thereby avoiding traumatic injury. The current results should be discussed in light of these newer models (James et al, 2020; 2022), which are more similar to MS cortical pathology and do exhibit lymphoid-like structures. 

      (2.1) We thank the reviewer for pointing out these relevant studies, and we agree they describe non-traumatic and more MS-relevant models of leptomeningeal inflammation. We have included discussion of these works in the updated manuscript.  

      • The study will be substantially improved if some of the ST data is validated at least partially with some RNAscope or other in situ hybridization using a subset of probes that capture the take-home message of the paper. 

      (2.2) We agree with the reviewer that validation of transcriptomics results is important to support our conclusions. In the updated manuscript Figure 5 and Supplemental Figure 6 we have added RNAscope results for relevant genes. In agreement with the trends noted in the manuscript, expression of genes related to antigen processing and presentation such as B2m decreases gradually with distance from LMI. We also have included a reference to a newly published manuscript from our group (Gupta et al., 2023, J. Neuroinflammation) that characterizes meningeal inflammation and sub-pial changes in the SJL EAE model. In that manuscript, IHC is used to show accumulation of B cells and T cells in the leptomeningeal space, increased microglial and astrocyte reactivity adjacent to leptomeningeal inflammation, and reduction of neuronal markers adjacent to leptomeningeal inflammation.  

      • The lack of change in signaling pathways involved in B-cell/T-cell interaction and cytokine/chemokine signaling, which would be expected in areas of immune cell aggregation in the meninges, needs discussion. 

      (2.3) While we detected significant upregulation in antigen presentation, complement activation, and humoral immune signaling, areas of meningeal inflammation identified as cluster 11 showed upregulation of numerous other GO gene sets associated with immune cell interaction and cytokine signaling, as described in supplementary table 3. These include T-cell receptor binding, CCR chemokine receptor binding, interleukin 8 production, response to interleukin 1, positive regulation of interleukin-6 production, tumor necrosis factor production, leukocyte cell-cell adhesion. Overall, we believe that the collection of enriched gene sets is consistent with peripheral myeloid and lymphoid infiltration and cytokine production, with the most prominent cytokine / pathways being interferon ɣ/antigen processing and presentation, complement, and humoral inflammation.

      • Fig 4 subclusters includes T-cell activation, pos regulation of neuronal death, cellular response to IFNg, neg regulation of neuronal projections, Ig mediated immune response, cell killing, pos regulation of programmed cell death, pos regulation of apoptotic process, but none of these are discussed despite their obvious importance. 

      (2.4) We agree with the reviewer that these upregulated genesets warrant additional discussion and have added additional reference to these genesets in the results section. Also, the genesets ‘positive regulation of programmed cell death’, ‘positive regulation of apoptotic process’, and ‘positive regulation of cell death’ were erroneously included in Figure 4F in the initial manuscript, as they are actually downregulated in cluster 1_4. This has been clarified in the text.

      • Subcluster 11 appears spatially to represent the meninges, but what pathways are expressed there? 330 genes/pathways altered independent of other clusters - immune cell regulation? 

      (2.5) We refer the reviewer to Supplementary Table 3, which contains a complete list of GO genesets enriched within cluster 11 spots.

      • The surprising lack of immunoglobulin genes upregulated in the meninges of the mice, considering these are the genes most upregulated in the MS meninges. Should be pointed out and discussed. 

      (2.6) We appreciate the reviewer bringing up immunoglobulin genes, which previous publications have shown are elevated in MS meninges and cortical grey matter lesions. Consistent with this, several immunoglobulin genes are elevated in cluster 11, including genes encoding IgG2b, IgA, and IgM. While these results were available within the original submission in Supplementary Table 2, we have included the graph in the updated Supplementary Figure 3.

      • Meningeal signature may be poorly represented given the individual slices shown in suppl 3A, which suggests that only 3 of the EAE slices had significant meningeal infiltrates, indicated by cluster 11 genes.  

      (2.7) There was heterogeneity in the location and extent of meningeal infiltrate / cluster 11 in the EAE slices, as the reviewer points out. 2 slices had severe inflammation, 2 had moderate inflammation, and 2 had relatively mild inflammation, but all EAE slices were enriched in inflammation relative to naïve as demonstrated not only through clustering, but also through enriched marker analysis between EAE and Naive and Progeny analysis.  

      • The ST is not resolving the meningeal tissue and the immediate underlying grey matter, as demonstrated by a high signal for both CXCL13 and GFAP in cluster 11. 

      (2.8) We agree that the spatial transcriptomics strategy applied here is inadequate to precisely delineate between meningeal inflammation and the underlying brain parenchyma, and that the elevation of markers such as GFAP in cluster 11 indicates some ‘contamination’ of parenchymal cells into cluster 11. We have clarified this in the text and discussed the limitation of the spatial transcriptomics method used.  

      • More information is required concerning how many animals were used in this study, to meet the requirements for complying with the 3Rs. 

      (2.9) A total of 4 mice were used per group. In the naïve group one mouse contributed two slices, for a total of 5 naïve slices. In the EAE group two mice contributed two slices, for a total of 6 EAE slices. We have clarified this in the methods section of the updated manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The authors should provide a more thorough description of the methodology, and there are a few minor concerns about experimental details, data presentation, and description that need to be addressed. In the next few lines, I will highlight a few important aspects that need to be addressed, propose some changes to the main manuscript, and suggest some additional experiments that, if successful, could confirm/support/further strengthen the conclusions that are at this point purely based on transcriptomic data. 

      Major comments/suggestions: 

      • The main gene expression changes between the control and EAE groups obtained via spatial transcriptomics need to be validated with another technique, at least partially. I suggest performing RNAscope or immunofluorescence imaging using brain sections from a new and independent cohort of animals, where cell-specific markers can also be tested. This type of assessment would work as a validation method and could also inform about the cell-specific contribution to the observed transcriptomic changes. 

      (3.1) Please refer to response 2.2 

      • The representative qualitative spatial expression heatmaps for each gene in Fig. 1F should be accompanied by corresponding graphs with quantitative measurements. Similar to what is done regarding the data in Fig. 2B and D. 

      (3.2) We agree with the reviewer that quantitative graphs were missing, and we have included them in the updated Supplementary Figure 1. 

      • A supplementary table discriminating all the DEGs (132 up and 70 downregulated) between cluster 11 and the other clusters has to be provided. What is the contribution of recruited encephalitogenic adaptive immune cells to this cluster 11 gene signature? 

      (3.3) These unfiltered results are provided in Supplementary Table 2, and to view the up- and downregulated genes the reader can sort the table based on fold change and adjusted P value. We believe providing the complete table is more useful to the reader, since the fold change and P value thresholds used to determine “significance” are arbitrary. Since the spatial transcriptomics method used in this work does not have single cell resolution, we cannot accurately estimate the contribution of encephalitogenic adaptive immune cells in cluster 11. However, given previously published work of lymphocyte infiltration into the subarachnoid space in SJL EAE (Gupta et al., 2023, J. Neuroinflammation) and the enrichment of Cd3e in cluster 11 (Log2FC 0.31, adjusted P-val 0.005) we assume some contribution of peripheral lymphocytes.
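      The sorting step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used for the manuscript: the column names (`gene`, `log2FC`, `p_val_adj`), the function `split_degs`, and the default cutoffs are all assumptions, and every row except Cd3e (whose values are quoted in this response) is a made-up placeholder.

```python
def split_degs(rows, lfc_cutoff=0.25, padj_cutoff=0.05):
    """Split a differential-expression table into up- and downregulated genes.

    Both cutoffs are arbitrary illustration values; readers can choose their own.
    """
    # Keep only rows that pass the adjusted P value cutoff.
    significant = [r for r in rows if r["p_val_adj"] < padj_cutoff]
    # Upregulated: largest positive fold change first.
    up = sorted(
        (r for r in significant if r["log2FC"] >= lfc_cutoff),
        key=lambda r: r["log2FC"],
        reverse=True,
    )
    # Downregulated: most negative fold change first.
    down = sorted(
        (r for r in significant if r["log2FC"] <= -lfc_cutoff),
        key=lambda r: r["log2FC"],
    )
    return up, down


# Hypothetical rows standing in for Supplementary Table 2.
rows = [
    {"gene": "Cd3e", "log2FC": 0.31, "p_val_adj": 0.005},   # values quoted in the response
    {"gene": "GeneA", "log2FC": 1.20, "p_val_adj": 0.001},  # placeholder
    {"gene": "GeneB", "log2FC": -0.80, "p_val_adj": 0.010}, # placeholder
    {"gene": "GeneC", "log2FC": 0.90, "p_val_adj": 0.200},  # placeholder, not significant
]

up, down = split_degs(rows)
print([r["gene"] for r in up])
print([r["gene"] for r in down])
```

      Raising `lfc_cutoff` shortens both lists, which is why the complete, unfiltered table is the more flexible thing to share.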

      • The authors mention that there is grey matter pathology in this relapse model, and this has been shown in a previous publication (Bhargava et al., 2021). However, the regions analyzed in the present study are different from the ones shown in the referenced paper. Is there an overexpression of genes involved in, or gene modules indicative of, neuronal stress and/or death that spatially overlap with clusters 1 and 2? If so, it would be important to provide information about those gene modules in the main figures. It would also be quite relevant to show the levels of cell stress/death proteins and of axonal stress/damage, by APP and/or nonphosphorylated SMI-32 staining, in the deep brain regions (like the thalamus), to corroborate the link between these phenomena and the gene signatures of subclusters 1_3, 1_4, and 2_6. 

      (3.4) We thank the reviewer for this insightful comment. We have recently published a manuscript that histologically analyzes leptomeningeal inflammation in the SJL EAE model, specifically assessing the areas looked at in our submitted manuscript (Gupta et al., 2023, J. Neuroinflammation). In that manuscript, IHC is used to show accumulation of B cells and T cells in the leptomeningeal space, increased microglial and astrocyte reactivity adjacent to leptomeningeal inflammation, and reduction of neuronal markers adjacent to leptomeningeal inflammation. To further describe the gene modules in the inflammatory subclusters 1_3/1_4/2_6, we have now provided heatmaps of the selected genesets and their constituent genes (Supplementary Figure 5).

      • It would be important to provide heatmaps discriminating the DEGs that make the gene modules that are significantly altered in subclusters 1_3, 1_4, and 2_6. The gene ontology terms are sometimes ambiguous. For instance, it would be very informative to the reader (and to the field) to know which altered genes compose the "lysosome", "immune response", "response to stress", or "B cell mediated immunity" pathways that are altered in the EAE subcluster 1_3 (Fig. 4E). The same applies to the gene modules altered in the other subclusters of interest. Authors should also consider generating a Venn diagram with the DEGs from subclusters 1_3, 1_4, and 2_6, to complement the GO term Venn presented in Fig. 4H. Having these pieces of information readily available, either as main or supplementary figures, would be a great addition.

      (3.5) We agree with the reviewer on this point and have included these heatmaps in Supplementary Figure 5. 

      • The role of IFN-gamma as well as B cells (and Igs) in myelination/remyelination is mentioned in the discussion. However, there is very little evidence that these cells or their cytokines/Igs are mediating the described transcriptomic signatures at the level of the brain parenchyma of EAE mice undergoing relapse. Do the "antigen processing and presentation, cell killing, interleukin 6 production, and interferon gamma response" GO terms, which better fitted the trajectory analysis, in fact include genes expressed almost exclusively by T and/or B cells? Are there genes that are downstream of IFN type I or II signaling?

      (3.6) Pathways including antigen processing / presentation, humoral inflammation, complement, among others were enriched in areas of meningeal inflammation and adjacent areas of parenchyma. These signaling pathways are mediated by effector molecules, many of which are produced by lymphocytes, but that can act on cells within the CNS parenchyma. The heatmaps in Supplementary Figure 5 demonstrate the significant role of MHC and complement genes, which could be expressed by leukocytes as well as glia, on many of the pathways.

      • Is the transcriptomic overlap between meningeal and brain parenchymal regions, or the appearance of signatures similar to the parenchymal subclusters 1_3, 1_4, and 2_6, prevented if the mice are treated with the murine versions of natalizumab or rituximab prior to relapse?

      (3.6) We appreciate the reviewer's suggestion. Our future directions for this work include testing the effects of disease modifying therapies on spatial and single-cell transcriptomic readouts of disease in SJL EAE.

      • Please clarify what control group was used in this study. Naïve mice are mentioned in the Results section, does this mean that control animals were not injected with CFA? Authors should also elaborate on the descriptive methodology employed for the analysis of the spatial transcriptomics data - especially regarding the trajectory analysis. As is, overall, the methodology description might not favor reproducibility.

      (3.7) We appreciate the need for clarification here. Our control group in this study was naïve, not having received any CFA or pertussis toxin. While often used as the control in EAE studies focused on mechanisms of autoimmunity, CFA and pertussis toxin independently induce systemic inflammation. Since in this study we were interested in neuroinflammation broadly, we chose to use a naïve comparison group to maximize our ability to find genes enriched in neuroinflammation. We have elaborated our methods section, including methods related to trajectory analysis. 

      Minor comments/suggestions: 

      In Fig. 1D the indication of the rostral to ventral axis needs to be inverted. 

      Addressed.

      In Fig. 1E the authors should also include a representative H&E staining of the same region in a control animal. 

      Addressed.

      There is inconsistency in the number of clusters obtained after UMAP unbiased clustering of the spatial transcriptomic data: 

      • Fig. 3A-E - twelve clusters are shown (cluster 0 to 11). 

      • In the Results section eleven clusters are mentioned - "we performed unbiased UMAP clustering on the spatial transcriptomic dataset and identified 11 distinct clusters".

      The text was incorrect; there were 12 distinct clusters. This has been corrected.

      Considering the mouse strain used was SJL/J, the peptide used to induce EAE should be PLP139-151, as mentioned in the Methods section "Induction of SJL EAE". However, the legend of Fig. 1 mentions "post immunization with MOG 35-55". Please correct this.

      Corrected.

      In the Methods section it is mentioned "At 12 weeks post-immunization, animals were euthanized", however the Results section mentions that tissues were harvested at 11 weeks post-immunization - "Brain slices were collected from four naïve mice and four EAE mice 11 weeks postimmunization". Please correct this. 

      The Methods were incorrect; this has now been fixed.

      Please clarify the number of animals used for spatial transcriptomic analysis: 

      • Legend of Fig. 1 mentions "Red arrows indicate MRI time points, black arrow indicates time of tissue harvesting (N = 6)." Whilst in the Results section it states "Brain slices were collected from four naïve mice and four EAE mice". 

      The legend of Figure 1 has now been corrected (N = 4). Additionally, we have added clarification about the number of animals / slices used in the Methods section (see response 2.9).

      Please be consistent in the way of representing DEGs in the MA plots: 

      • Fig. 3F shows the upregulated genes (in red) on the right and the downregulated genes (in blue) on the left. 

      • Supplemental Fig. 2K shows the upregulated genes (in red) on the left and the downregulated genes (in blue) on the right. 

      • Supplemental Fig. 4 shows the upregulated genes on the right in blue, while the downregulated genes are in red. 

      This has been fixed.

      The letters attributed to each subcluster in panels E-G of Fig. 4 are different from the respective figure legend. 

      This has been fixed.

      Correct the legend of supplemental figure 2: "(G-H) Representative spatial feature plots of read count (F) and UMI (G) demonstrate expected anatomic variability in transcript amount and diversity.".

      This has been fixed.

      In Supplemental Fig. 4G there is probably an error with the x-axis, since the significantly up and down-regulated genes are not visible.

      This has been fixed.

    1. Author response:

      Reviewer 1:

      Summary:

      In this manuscript by Bimbard et al., a new method to perform stable recordings over long periods of time with Neuropixels, as well as the technical details on how the electrodes can be explanted for follow-up reuse, is provided. I think the description of all parts of the method is very clear, and the validation analyses (n of units per day over time, RMS over recording days...) are very convincing. I however missed a stronger emphasis on why this could provide a big impact on the ephys community, by enabling new analyses, new behavior correlation studies, or neurophysiological mechanisms across temporal scales.

      Strengths:

      Open source method. Validation across laboratories. Across species (mice and rats) demonstration of its use and in different behavioral conditions (head-fixed and freely moving).

      Weaknesses:

      Weak emphasis on what can be enabled with this new method that didn't exist before.

      We thank the reviewer for highlighting the limited discussion around scientific impact. Our implant has several advantages which combine to make it much more accessible than previous solutions. This enables a variety of recording configurations that would not have been possible with previous designs, facilitating recordings from a wider range of brain regions, animals, and experimental setups. In short, there are three key advances:

      (1) Adaptability: The CAD files can be readily adapted to a wide range of configurations (implantation depth, angle, position of headstage, etc.). Labs have already modified the design to optimise for their needs, and re-shared with the community.

      (2) Weight: Because of the lightweight design, experimenters can i) perform complex and demanding freely moving tasks as we exemplify in the manuscript, and ii) implant female and water-restricted mice while respecting animal welfare weight limitations.

      (3) Cost: At ~$10, our implant is significantly cheaper than published alternatives, which makes it affordable to more labs and means that testing modifications is cost-effective.

      We will make these features clearer in the manuscript.

      Reviewer 2:

      Summary:

      This work by Bimbard et al. introduces a new implant for Neuropixels probes. While Neuropixels probes have critically improved and extended our ability to record the activity of a large number of neurons with high temporal resolution, the use of these expensive devices in chronic experiments has so far been hampered by the difficulty of safely implanting them and, importantly, of explanting and reusing them after conclusion of the experiment. The authors present a newly designed two-part implant, consisting of a docking and a payload module, that allows for secure implantation and straightforward recovery of the probes. The implant is lightweight, making it amenable for use in mice and rats, and customizable. The authors provide schematics and files for printing of the implants, which can be easily modified and adapted to custom experiments by researchers with little to no design experience. Importantly, the authors demonstrate the successful use of this implant across multiple use cases, in head-fixed and freely moving experiments, in mice and rats, with different versions of Neuropixels probes, and across 8 different labs. Taken together, the presented implants promise to make chronic Neuropixels recordings and long-term studies of neuronal activity significantly easier and attainable for both current and future Neuropixels users.

      Strengths:

      - The implants have been successfully tested across 8 different laboratories, in mice and rats, in head-fixed and freely moving conditions, and have been adapted in multiple ways for a number of distinct experiments.

      - Implants are easily customizable and the authors provide a straightforward approach for customization across multiple design dimensions even for researchers not experienced in design.

      - The authors provide clear and straightforward descriptions of the construction, implantation, and explant of the described implants.

      - The split of the implant into a docking and payload module makes reuse even in different experiments (using different docking modules) easy.

      - The authors demonstrate that implants can be re-used multiple times and still allow for high-quality recordings.

      - The authors show that the chronic implantations allow for the tracking of individual neurons across days and weeks (using additional software tracking solutions), which is critical for a large number of experiments requiring the description of neuronal activity, e.g. throughout learning processes.

      - The authors show that implanted animals can even perform complex behavioral tasks, with no apparent reduction in their performance.

      Weaknesses:

      - While implanted animals can still perform complex behavioral tasks, the authors describe that the implants may reduce the animals' mobility, as measured by prolonged reaction times. However, the presented data does not allow us to judge whether this effect is specifically due to the presented implant or whether any implant or just tethering of the animals per se would have the same effects.

      The reviewer is correct: some of the differences in mouse reaction time could be due to the tether rather than the implant. As these experiments were also performed in water-restricted female mice with the heavier Neuropixels 1.0 implant, our data represent the maximal impact of the implant, and we will highlight this in the revision.

      - While the authors make certain comparisons to other, previously published approaches for chronic implantation and re-use of Neuropixels probes, it is hard to make conclusive comparisons and judge the advantages of the current implant. For example, while the authors emphasize that the lower weight of their implant allows them to perform recordings in mice (and is surely advantageous), the previously described, heavier implants they mention (Steinmetz et al., 2021; van Daal et al., 2021), have also been used in mice. Whether the weight difference makes a difference in practice therefore remains somewhat unclear.

      The reviewer is correct: without a direct comparison, we cannot be certain that our smaller, lighter implant improves behavioural results (although this is supported by the literature, e.g. Newman et al., 2023). However, the reduced weight of our implant is critical for several laboratories represented in this manuscript due to animal welfare requirements. Indeed, in van Daal et al. the authors “recommend a [mouse] weight of >25 g for implanting Neuropixels 1.0 probes.” This limit precludes using (the vast majority of) female mice, or water-restricted animals. Conversely, our implant can be routinely used with lighter, water-restricted male and female mice. We will emphasise this point in the revision.

      - The non-permanent integration of the headstages into the implant, while allowing for the use of the same headstage for multiple animals in parallel, requires repeated connections and does not provide strong protection for the implant. This may especially be an issue for the use in rats, requiring additional protective components as in the presented rat experiments.

      We apologise for not clarifying the various headstage options in the manuscript and we will address this in the revision. Our repository has headplate holder designs (in the XtraModifications/Mouse_FreelyMoving folder). These allow the headstage to be left on the implant, thus minimising the number of connections (albeit increasing the weight for the mouse). Indeed, mice recorded while performing the task described in our manuscript had the headstage semi-permanently integrated into the implant, and we will highlight this in the revision.

      Reviewer 3:

      Summary:

      In this manuscript, Bimbard and colleagues describe a new implant apparatus called "Apollo Implant", which should facilitate recording in freely moving rodents (mice and rats) using Neuropixels probes. The authors collected data from both mice and rats and used 3 different versions of Neuropixels; multiple labs have already adopted this method, which is impressive. They openly share their CAD designs and surgery protocol to further facilitate the adaptation of their method.

      Strengths:

      Overall, the "Apollo Implant" is easy to use and adapt, as it has been used in other laboratories successfully and custom modifications are already available. The device is reproducible using common 3D printing services and can be easily modified thanks to its CAD design (the video explaining this is extremely helpful). The weight and price are amazing compared to other systems for rigid silicon probes allowing a wide range of use of the "Apollo Implant".

      Weaknesses:

      The "Apollo Implant" can only handle Neuropixels probes. It cannot hold other widely used and commercially available silicon probes. Certain angles and distances are not possible in their current form (distance between probes 1.8 to 4mm, implantation depth 2-6.5 mm, or angle of insertion up to 20 degrees).

      We appreciate the reviewer’s points, but as we will discuss in the revised manuscript, one implant accommodating the diversity of the existing probes is beyond the scope of this project. However, because the design is adaptable, groups should be able to modify the current version of the implant to adapt to their electrodes’ size and format (and can highlight any issues in the GitHub “Discussions” area).

      With Neuropixels, the current range of depths covers practically all trajectories in the mouse brain. In rats, where deeper penetrations may be useful, the experimenter can attach the probe at a lower point in the payload module to increase the length of exposed shank. We now specify this in the GitHub repository.

      We have now extended the range of inter-probe distances from a maximum of 4 mm to 6.5 mm, and this will be reflected in the revised manuscript. Distances beyond this may be better served by 2 implants, and smaller distances could be achieved by attaching two probes on the same side of the docking module. In the next revision, we will add these points to the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors report that optogenetic inhibition of hippocampal axon terminals in retrosplenial cortex impairs the performance of a delayed non-match to place task. The significance of findings elucidating the role of hippocampal projections to the retrosplenial cortex in memory and decision-making behaviors is important. However, the strength of evidence for the paper's claims is currently incomplete.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is a study on the role of the retrosplenial cortex (RSC) and the hippocampus in working memory. Working memory is a critical cognitive function that allows temporary retention of information for task execution. The RSC, which is functionally and anatomically connected to both primary sensory (especially visual) and higher cognitive areas, plays a key role in integrating spatial-temporal context and in goal-directed behaviors. However, the specific contributions of the RSC and the hippocampus in working memory-guided behaviors are not fully understood due to a lack of studies that experimentally disrupt the connection between these two regions during such behaviors.

      In this study, researchers employed eArch3.0 to silence hippocampal axon terminals in the RSC, aiming to explore the roles of these brain regions in working memory. Experiments were conducted where animals with silenced hippocampal axon terminals in the RSC performed a delayed non-match to place (DNMP) task. The results indicated that this manipulation impaired memory retrieval, leading to decreased performance and quicker decision-making in the animals. Notably, the authors observed that the effects of this impairment persisted beyond the light-activation period of the opsin, affecting up to three subsequent trials. They suggest that disrupting the hippocampal-RSC connection has a significant and lasting impact on working memory performance.

      Strengths:

      They conducted a study exploring the impact of direct hippocampal inputs into the RSC, a region involved in encoding spatial-temporal context and transferring contextual information, on spatial working memory tasks. Utilizing eArch3.0 expressed in hippocampal neurons via the viral vector AAV5-hSyn1-eArch3.0, they aimed to bilaterally silence hippocampal terminals located at the RSC in rats pre-trained in a DNMP task. They discovered that silencing hippocampal terminals in the RSC significantly decreased working memory performance in eArch+ animals, especially during task interleaving sessions (TI) that alternated between trials with and without light delivery. This effect persisted even in non-illuminated trials, indicating a lasting impact beyond the periods of direct manipulation. Additionally, they observed a decreased likelihood of correct responses following TI trials and an increased error rate in eArch+ animals, even after incorrect responses, suggesting an impairment in error-corrective behavior. This contrasted with baseline sessions where no light was delivered, and both eArch+ and control animals showed low error rates.

      Weaknesses:

      While I agree with the authors that the role of hippocampal inputs to the RSC in spatial working memory is understudied and merits further investigation, I find that the optogenetic experiment, a core part of this manuscript that includes viral injections, could be improved. The effects were rather subtle, rendering some of the results barely significant and possibly too weak to support major conclusions.

      We thank Reviewer#1 for carefully and critically reading our manuscript, and for the valuable comments provided. The judged “subtlety” of the effects stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition. We disagree with this perspective and find it rather reductive for several reasons.

      Once seen in the context of the animal’s ecology, subtle impairments can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity, for such events as remembering the habitual location of a predator, or food source.

      Also, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as that of “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of the perturbed factor’s role. If a caricatural analogy is allowed, it would be as if we were to study the role of an animal’s legs by chopping them both off and observing the resulting behaviour.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, as such allowing for behaviour to proceed, and for our observation of graded, decreased (not absent), proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant, and biologically realistic.

      Additionally, no mechanistic investigation was conducted beyond referencing previous reports to interpret the core behavioral phenotypes.

      We fully agree with this being a weakness, as we wish we could have done more mechanistic studies to find out exactly what Arch activation is doing to HIPP-RSC transmission, which neurons are being affected, and perhaps in the future dissect its circuit determinants. We keep all these goals firmly in mind and hope we can address them soon.

      Reviewer #2 (Public Review):

      The authors examine the impact of optogenetic inhibition of hippocampal axon terminals in the retrosplenial cortex (RSP) during the performance of a working memory T-maze task. Performance on a delayed non-match-to-place task was impaired by such inhibition. The authors also report that inhibition is associated with faster decision-making and that the effects of inhibition can be observed over several subsequent trials. The work seems reasonably well done and the role of hippocampal projections to retrosplenial cortex in memory and decision-making is very relevant to multiple fields. However, the work should be expanded in several ways before one can make firm conclusions on the role of this projection in memory and behavior.

      We thank Reviewer#2 for carefully and critically reading our manuscript, and for the valuable comments provided.

      (1) The work is very singular in its message and the experimentation. Further, the impact of the inhibition on behaviour is very moderate. In this sense, the results do not support the conclusion that the hippocampal projection to retrosplenial cortex is key to working memory in a navigational setting.

      As we have mentioned in response to Reviewer#1, the judged “very moderate” effect stems from a perspective according to which a quantitatively lower effect bears less biological significance for cognition, precluding its consideration as “key” for behaviour. We disagree with this perspective and find it rather reductive for several reasons. Once seen in the context of the animal’s ecology, quantitatively lower impairments in working memory are no less key for this cognitive capacity, and can be life-threatening precisely because of their subtlety, leading the animal to confidently rely on a defective capacity, for such events as remembering the habitual location of a predator, or food source. Furthermore, studies in animal cognition often undertake complete, rather than graded, suppression of a given mechanism (in the same sense as “knocking out” a gene that is relevant for behaviour), leading to a gravely, rather than gradually, impaired model system, to the point of not allowing a hypothetical causal link to be mechanistically revealed beyond its mere presence. This often hinders a thorough interpretation of its role.

      In our study we conclude that silencing HIPP inputs in RSC perturbs cognition enough to impair behaviour while not disabling the animal entirely, as such allowing for behaviour to proceed, and for our observation of graded, decreased (not absent), proficiency under optogenetic silencing. So rather than weak, we would say the results are statistically significant, and biologically realistic.

      (2) There are no experiments examining other types of behavior or working memory. Given that the animals used in the studies could be put through a large number of different tasks, this is surprising. There is no control navigational task. There is no working memory test that is non-spatial. Such results should be presented in order to put the main finding in context.

      It is hard to gainsay this point. The more thorough and complete a behavioural characterization is, the more informative is the study, from every angle you look at it. While we agree that other forms of WM would be quite interesting in this context, we also cannot ignore the fact that DNMP is widely tested as a WM task, one that is biologically plausible, sensitive to perturbations of neural circuitry known to be at play therein, and fully accepted in the field. Faced with the impossibility of running further studies, for lack of additional funding and human resources, we chose to run this task.

      A control navigational task would, in our understanding, be used to assess whether silencing HIPP projections to RSC would affect (spatial?) navigation, rather than WM, thus explaining the observed impairment. To this we have the following to say: Spatial Navigation is a very basic cognitive function, one that relies on body orientation relative to spatial context, on keeping an updated representation of such spatial context, (“alas”, as memory), and on guiding behaviour according to acquired knowledge about spatial context. Some of these functions are integral to spatial working memory, as such, they might indeed be affected.

      Dissecting the determinants of spatial WM is indeed an ongoing effort, one that was not the intention of the current study, but also one that we keep firmly in mind, in the hope that we can address it in the future.

      A non-spatial WM task would indeed vastly solidify our claims beyond spatial WM, onto WM. We have, for this reason, changed the title of the manuscript which now reads “spatial working memory”.

      (3) The actual impact of the inhibition on activity in RSP is not provided. While this may not be strictly necessary, it is relevant that the hippocampal projection to RSP includes, and is perhaps dominated by inhibitory inputs. I wonder why the authors chose to manipulate hippocampal inputs to RSP when the subiculum stands as a much stronger source of afferents to RSP and has been shown to exhibit spatial and directional tuning of activity. The points here are that we cannot be sure what the manipulation is really accomplishing in terms of inhibiting RSP activity (perhaps this explains the moderate impact on behavior) and that the effect of inhibiting hippocampal inputs is not an effective means by which to study how RSP is responsive to inputs that reflect environmental locations.

      We fully agree that neural recordings addressing the effect of silencing on RSC neural activity are relevant. We do wish we could have provided more mechanistic studies, to find out exactly what Arch activation is doing to HIPP-RSC transmission and which neurons are being affected, thus dissecting its circuit determinants. We keep all these goals firmly in mind and hope we can address them soon. Subiculum, which we mention in the Introduction, is indeed a key player in this complex circuitry, one whose hypothetical influence is the subject of experimental studies which will certainly reveal many other key elements.

      (4) The impact of inhibition on trials subsequent to the trial during which optical stimulation was actually supplied seems trivial. The authors themselves point to evidence that activation of the hyperpolarizing proton pump is rather long-lasting in its action. Further, each sample-test trial pairing is independent of the prior or subsequent trials. This finding is presented as a major finding of the work, but would normally be relegated to supplemental data as an expected outcome given the dynamics of the pump when activated.

      We disagree that this finding is “trivial”, and object to the considerations of “normalcy”, which we are left wondering about.

      Lacking neurophysiological experiments (for the reasons stated above) to address this interesting finding, we chose to interpret it in light of (the few) published observations, such being the logical course of action in scientific reporting, given the present circumstances.

      Evidence for such a prolonged effect in the context of behaviour is scarce (to our knowledge only the one we cite in the manuscript). As such, it is highly relevant to report it, and give it the relevance we do in our manuscript, rather than “relegating it to supplementary data”, as the reviewer considers being “normal”.

      In the DNMP task the consecutive sample-test pairs are explicitly not independent, as they are part of the same behavioural session. This is illustrated by the simple phenomenon of learning, namely the intra-session learning curves, and the well-known behavioral trial-history effects. The brain does not simply erase such information during the ITI.

      (5) In the middle of the first paragraph of the discussion, the authors make reference to work showing RSP responses to "contextual information in egocentric and allocentric reference frames". The citations here are clearly deficient. How is the Nitzan 2020 paper at all relevant here?

      Nitzan 2020 reports the propagation of information from HIPP to CTX via SUB and RSC, thus providing a conduit for mnemonic information between the two structures, alternative to the one we target, thus providing thorough information concerning the HIPP-RSC circuitry at play during behaviour.

      Alexander and Nitz 2015 precisely cite the encoding, and conjunction, of two types of contextual information, internal (ego-) and external (allocentric).

      The subsequent reference is indeed superfluous here.

      We thank Reviewer#2 for calling our attention to the fact that references for this information are inadequate and lacking. We have now cited (Gill et al., 2011; Miller et al., 2019; Vedder et al., 2017) and refer readers to the review (Alexander et al., 2023) for the purpose of illustrating the encoding of information in the two reference frames. In addition, we have substantially edited the Introduction and Discussion sections, and suppressed unnecessary passages.

      (6) The manuscript is deficient in referencing and discussing data from the Smith laboratory that is similar. The discussion reads mainly like a repeat of the results section.

      Please see above. We thank Reviewer#2 for this comment; we have now re-written the Discussion such that it is less of a summary of the Results and more focused on their implications and future directions.

      Response to recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major

      Line 101: Even with the tapered lambda fibre optic stub, if the fibre optics were longitudinally staggered by 2 millimetres, they would deliver light to diagonal regions in the horizontal plane rather than covering the full length of the RSC. Is this staggering pattern randomized or fixed? Additionally, Figure 1C is a bit misleading, as the light distribution pattern from the tapered fibre optic is likely to be more concentrated near the surface of the fibre, rather than spreading widely in a large spherical pattern.

      The staggering is fixed. The elliptical (not spherical) contour in Fig 1C is not meant to convey any quantitative information, but rather to visually orient the reader towards the directions into which light will likely propagate, the effects of which we do not attempt to estimate here. We have made this contour smaller.

      Line 119: The authors demonstrate the viral expression pattern of a representative animal and the overall expression patterns of all other animals in Figure 1 and the Supplementary Figures. However, numerous cases in the Supplementary Figures exhibit viral leakages and strong expressions in adjacent cortical and thalamic areas. Although there is a magnified view of the RSC's expression pattern in Figure 1, authors should show the same way in the supplemental data as well. Additionally, the degree of viral expression in the hippocampal subregions varies substantially across animals. This variation is concerning and impacts the interpretation of the results.

      The viral construct was injected in the HIPP at coordinates based on our previous work (Ferreira-Fernandes et al., 2019) wherein injections of a similar vector in mid-dorsal HIPP resulted in widespread expression throughout the medial mesocortex AP extent, RSC through CG, as well as other areas in which HIPP establishes synapses. These were studied in detail then, by estimating the density of axon terminals. In the present work we did not acquire high-mag images of all slices, since they were too expensive, and we had this information from the study above. Still, we have now added further examples of high-mag images taken from eArch and CTRL animals.

      We believe it is important here to mention the fact that the virus we use, AAV5, travels only anterogradely and is static (i.e. it does not travel transsynaptically).

      Variations in viral expression are to be expected even if injections happen in the exact same way. It is crucial then, that fibre positioning is constant across animals, to guarantee that its relationship with viral expression is thence consistent, and to render irrelevant whatever off-target expression of the viral construct. We have ascertained this condition post-mortem in all our animals.

      Line 124: Another point regarding the viral expressions and optical fibre implants used to inhibit the HIPP-RSC pathway is that the RSC and HIPP extend substantially along the anterior-posterior axis. The authors should demonstrate how the viral expression is distributed along this axis and indicate where the tip of the tapered optical fibre ended by marking it in the histological images. This information is crucial to confirm the authors' claim that the hippocampal projection terminals were indeed modulated by optical light. Also, the manuscript would benefit from details about the power/duration and/or modulation of the light used.

      In both Figures 1 and S1 panels we can clearly see the tracks formed by the fibres. This provides examples of such dual angle placement vis a vis the expression of the construct, demonstrating that the former is fully targeted towards the latter. We have added markers to highlight these tracks and an example of a “full” track in figure S1. We did not have animals deviating from this relative positioning to any significant extent. The methods section mentions illumination power as 240mA, and we have now added estimated illumination time as well.

      Line 141: The authors should include data on task performance during learning and baseline sessions for each animal, to demonstrate that they fully grasped the task rules and that achieving a 75% performance ratio was sufficient.

      DNMP is a standard WM task used for many decades, in which animals reach performances above 75% in 4-8 sessions. We have used it extensively, and never saw any deviations from this learning rate and curve. We ran daily sessions until animals reached 75%, and thereafter until they maintained this performance, or above, for three consecutive sessions (the data points we show). We saw no deviations from what is published, nor from what is our own extensive experience, and thence are fully confident that all animals included in this manuscript grasped task rules.
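      The training criterion described above (reaching 75% and then holding that level, or above, for three consecutive sessions) can be sketched as a simple check over per-session scores. This is only an illustration: the function name, its parameters, and the example scores are hypothetical and are not taken from the manuscript or its data.

```python
def sessions_to_criterion(session_scores, threshold=0.75, n_consecutive=3):
    """Return the 1-based session at which the criterion is first met, else None.

    Hypothetical helper: the criterion is `n_consecutive` sessions in a row
    with a fraction of correct trials at or above `threshold`.
    """
    run = 0  # length of the current streak of above-threshold sessions
    for session_number, score in enumerate(session_scores, start=1):
        run = run + 1 if score >= threshold else 0
        if run == n_consecutive:
            return session_number
    return None


# Made-up scores for one animal: the criterion is met at session 7.
example_scores = [0.50, 0.60, 0.80, 0.70, 0.75, 0.80, 0.90]
criterion_session = sessions_to_criterion(example_scores)
```

      A return value of `None` would mean that the animal never held the threshold for the required number of consecutive sessions.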

      Line 146: While the study focused on inhibiting inputs during the test run (retrieval phase), it would be beneficial to also inhibit inputs during the sample run (encoding phase) and the delay period. This would help confirm whether the silencing affects only working memory retrieval, or if it also impacts encoding and maintenance.

      We agree, it would be very interesting to determine if there are any effects of silencing HIPP RSC terminals during Sample. However, since there is a limit to the number of trials per session, and to the total number of sessions, we could not run the three manipulations within each session of our experimental design, as that would lower the number of trials per condition to an extent that would affect statistical power. Silencing HIPP RSC terminals during Sample would best be a separate experiment, asking a different question, and perhaps within an experimental design distinct from the one envisioned.

      A very important point here relates to the fact that the effects of optogenetic manipulation do not limit themselves to the illumination epoch, in fact they extend far beyond onto the 3rd trial post-illumination. The insertion of Sample-illuminated trials interleaved in the same session would fundamentally affect the interpretation of experimental results, as we could not attribute lower performances to the effects in either or both manipulated epochs.

      Line 225: Figure 5 illustrates that silencing the inputs results in an extended impairment of working memory performance. However, it's unclear if there are any behavioural changes during the sample run. The inhibition could potentially affect encoding in the subsequent sample run, considering the inter-trial interval (ITI) is only 20 seconds.

      From the observation of behaviour and the analysis of our data, we saw no overt “behavioural changes during the sample run”, as latencies and speeds were essentially unchanged.

      If what is meant by your comment is the effect of optogenetic manipulation being protracted from the Test towards the Sample epoch, we find this unlikely. Conservatively, we estimate the peak of our optogenetic manipulation to occur around the time light is delivered, the Test phase, rather than 20-30 secs later.

      In theory, any effect of optogenetic silencing of HIPP terminals in RSC can cause disturbances in encoding or Sample, the ITI itself, and the epoch in which mnemonic information retrieved from the Sample epoch is confronted with the contextual information present during Test, leading to a decision. This is regardless of the illumination epoch, and even if the effect of optogenetic manipulation is not prolonged in time. 

      Since in our experiments we specifically target the Test epoch, and there is, in all likelihood, a decaying magnitude of neurophysiological effects, manifest in the reported decaying nature of the manipulation mechanism, and in our observed decrease of behavioural proficiency from subsequent trials 1:4, we are convinced that a conservative interpretation is that our major effect is concentrated in the epoch in which we deliver light - the Test epoch. Its consequences (possibly related to short term plasticity events taking place within the HIPP-RSC neural circuit) extend further in time.
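      One way to look at such a decaying, trial-by-trial effect is to group trials by how many trials have elapsed since the most recent illuminated trial and compute the fraction correct at each lag. The sketch below is a minimal illustration under stated assumptions, not the analysis used in the manuscript: the function name, the `max_lag` parameter, and the toy trial sequence are all hypothetical.

```python
def accuracy_by_lag(illuminated, correct, max_lag=4):
    """Fraction of correct trials, grouped by trials elapsed since illumination.

    Lag 0 is the illuminated trial itself, lag 1 the next trial, and so on.
    Trials before the first illuminated trial, or beyond `max_lag`, are ignored.
    Lags with no trials map to None.
    """
    counts = {lag: [0, 0] for lag in range(max_lag + 1)}  # lag -> [n_correct, n_trials]
    lag = None  # no illuminated trial seen yet
    for lit, ok in zip(illuminated, correct):
        if lit:
            lag = 0
        elif lag is not None:
            lag += 1
        if lag is not None and lag <= max_lag:
            counts[lag][0] += int(ok)
            counts[lag][1] += 1
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}


# Toy session (hypothetical values): True marks an illuminated / a correct trial.
illuminated = [False, True, False, False, True, False]
correct = [True, False, False, True, False, True]
by_lag = accuracy_by_lag(illuminated, correct)
```

      With real data, a performance deficit that shrinks as the lag grows would be consistent with the decaying effect described above.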

      Line 410: The methods section on the surgical procedure could be clearer, particularly regarding the coordinates for microinjection and fibre implantation. A more precise description would aid reader comprehension.

      The now-reported injection and implantation coordinates include the numbers corresponding to the distances, in mm, from Bregma to the targets, in the three stereotaxic dimensions considered: antero-posterior, medial-lateral left and right, and dorso-ventral, as well as the angle at which the fibres were positioned. We have added labels to the figures to highlight the fibreoptic track locations. We will be happy to provide further details as deemed necessary.

      Line 461: It would be helpful to know if each animal displayed a preference for the left or right side. Including a description or figure showing that the performance ratio exceeded 75% in both left and right trials would provide a more comprehensive understanding of the animals' behaviour.

      In the DNMP, an extensively used and documented WM task, it is an absolute pre-condition that no animals are biased to either side. As such, we did not use any animal that showed such bias. We have not observed this to be the case in any of our candidate animals, nor would we use any animal exhibiting such a preference.

      Minor

      Line 25: In the INTRODUCTION section, the authors introduce ego-centric and allocentric variables in the RSC. However, if they intend to discuss this feature, there is no supporting data for ego-centric or allocentric variables in the Results section.

      We agree. The extent of the discussion of ego vs allo-centric variables in our manuscript might venture a bit out of the main subject. It was included to provide wider context to our reporting of the data, considering that spatial working memory is indeed one instance in which egocentric- and allocentric-referenced cognitive mechanisms confront each other, and one in which silencing the HIPP input to a cortical region thence involved would likely disturb ensuing computations. We have now substantially edited the manuscript’s Introduction and Discussion sections, notably toning down this aspect.

      Line 125: In the section title, DNMT -> DNMP obviously.

      We have corrected this passage.

      Figures: The quality of the figure panels does not meet the expected standards. For example, scale bars are missing in many panels (e.g., Figure 1A bottom, 1B, 1C, S1), figure labels are misaligned (as seen in Figure 3A-B compared to 3C, same with Figure 5), and there is inconsistency in color schemes (e.g., Figure 3C versus Figure 6, where 'Error' versus 'Correct' is depicted using green versus blue, respectively).

      We have now corrected these inconsistencies and mistakes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      …I find the concept and execution of the study very interesting and elegant. The paper is also commendably clear and readable. The differences between primary and higher cortex are compelling and I am largely convinced by the authors' claim that they have found evidence that broadly supports a mixed selectivity model of neural disentanglement along the lines of Rigotti et al (2013). I think that the increasing body of evidence for these kinds of representations is a significant development in our understanding of higher sensory representations. I also think that the dDR method is likely to be useful to researchers in a variety of fields who are looking to perform similar types of neural decoding analysis.

      Thanks! We agree that questions around population coding and high-level representations are critical in the field of sensory systems.

      Reviewer #2 (Public Review):

      ... This is a well-carried out study with thoughtful analyses which in large part achieves its aims to evaluate how task-engagement changes neural activity across multiple auditory regions. As with all work, there are several caveats or areas for future study/analysis. First, the sounds used here (tones, and narrow-band noise) are relatively simple sounds; previous work suggests that exactly what activity is observed within each region (e.g., sensory only, decision-related, etc) may depend in part upon what stimuli are used. Therefore, while the current study adds importantly to the literature, future work may consider the use of more varied stimuli. Second, the animals here were engaged in a behavioral task; but apart from an initial calculation of behavioral d', the task performance (and its effect on neural activity) is largely unaddressed.

      The reviewer makes several important points that we hope we addressed in the specific changes detailed below. Indeed, it is important to recognize the possibility that the specific stimuli involved in a task may interact with the effects of behavioral state and that variability in task performance should be considered as an important aspect of behavioral state.

      Reviewer #1 (Recommendations For The Authors):

      I have a few minor comments and criticisms:

      (1) Figure 1c. The choice of low-contrast grey text (e.g. "Target vs. target") is unfortunate, especially when printed, and should be replaced (e.g. with dark grey).

      We have edited the figure to use a higher contrast (dark grey). Thanks for catching this.

      (2) Figure 2 and Supplementary Figure 3. I think some indication of error or significance is required in all panels. Without this, it's hard to interpret any of these panels.

      Thank you for this feedback. Including significance here was clarifying and helps to strengthen our claim that state-dependent changes in neural activity were smaller and more diverse for single neurons than at the population level. We modified Figure 2b-c to indicate whether each neuron’s response to the target stimulus was significantly different than its response to the catch stimulus. The same test was performed in Supplementary Figure 3. Additionally, we added a statistical test in Figure 2d-e to indicate, for each pair of target/catch stimuli, whether discrimination (d-prime) changed significantly between active and passive conditions. Furthermore, we modified the text of the second paragraph under the results heading: “Diverse effects of task engagement on single neurons in primary and non-primary auditory cortex” to reference and interpret the results of these significance tests. The new text reads as follows (L. 121):

      “Sound-evoked spiking activity was compared between active and passive states to study the impact of task engagement on sound representation. In both A1 and dPEG, responses to target and catch stimuli were significantly discriminable for a subset of single neurons (about 25% in both areas, Figure 2A-C, Supplemental Figures 3-5, bootstrap test). This supports the idea that stimulus identity can be decoded in both brain regions, regardless of task performance. However, the fact that the responses of most neurons in both brain areas could not significantly discriminate target vs. catch stimuli also highlights the diversity of sound encoding observed at the level of single neurons. The accuracy of catch vs. target discrimination for each neuron was quantified using neural d-prime, the z-scored difference in target minus catch spiking response for each neuron (Methods: Single neuron PSTHs and d-prime (Niwa et al., 2012a)). Task engagement was associated with significant changes in catch vs. target d-prime for roughly 10% of neurons in both A1 (40 / 481 neurons, bootstrap test) and dPEG (33 / 377 neurons, bootstrap test). This included neurons that both increased their discriminability and decreased their discriminability (Figure 2D-E). Thus, the effects of task engagement at the level of single neurons were relatively mild and inconsistent across the population; many neurons showed no significant change and of those that did, effects were bidirectional (Figure 2D-E).”

      We also included an additional methods paragraph in the “Statistical tests” section to describe the bootstrapping procedure used for these significance tests (L. 644):

      “The one exception to this general approach is in Figure 2, where we analyzed the sound discrimination abilities of single neurons. In this case, we computed p-values for each neuron and stimulus independently. First, for each neuron and catch vs. target stimulus pair, we measured d-prime (see Methods: Single neuron evoked activity and d-prime). We generated a null distribution of d-prime values for each neuron-stimulus pair, under each experimental condition by shuffling stimulus identity across trials before computing d-prime (100 resamples). A neuron was determined to have a significant d-prime for a given target vs. catch pair if its actual measured d-prime was greater than the 95th percentile of the null d-prime distribution. Second, for each neuron and catch vs. target stimulus pair, we tested if d-prime was significantly different between active and passive conditions. To test this, we followed a similar procedure as above, however, rather than shuffle stimulus identity, we shuffled active vs. passive trial labels. This allowed us to generate a null distribution of active vs. passive d-prime difference for each neuron and stimulus pair. A neuron was determined to have a significant change in d-prime between conditions if the actual Δ d-prime lay outside the 95% confidence interval of the null Δ d-prime distribution.”
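      The label-shuffling test described in this methods paragraph can be sketched as follows. This is a minimal illustration, not the authors' code: the d-prime formula (difference in means divided by the pooled standard deviation) is an assumed definition, and the function names, the interpolated percentile, and the toy spike counts are hypothetical. Only the 100 resamples and the 95th-percentile threshold come from the quoted text.

```python
import random
import statistics


def d_prime(target, catch):
    """Mean difference (target minus catch) over the pooled SD (assumed definition)."""
    pooled_sd = (0.5 * (statistics.pvariance(target) + statistics.pvariance(catch))) ** 0.5
    if pooled_sd == 0:
        return 0.0  # no variability at all: treat as not discriminable
    return (statistics.mean(target) - statistics.mean(catch)) / pooled_sd


def shuffled_null(target, catch, n_resamples=100, seed=0):
    """Null d-prime values from shuffling stimulus identity across trials."""
    rng = random.Random(seed)
    pooled = list(target) + list(catch)
    n_target = len(target)
    null = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        null.append(d_prime(pooled[:n_target], pooled[n_target:]))
    return null


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def is_significant(target, catch, n_resamples=100, seed=0):
    """True if the measured d-prime exceeds the 95th percentile of the null."""
    null = shuffled_null(target, catch, n_resamples, seed)
    return d_prime(target, catch) > percentile(null, 95)


# Toy spike counts per trial for one neuron (hypothetical values).
target_counts = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
catch_counts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
significant = is_significant(target_counts, catch_counts)
```

      The second test in the paragraph follows the same pattern, except that active vs. passive trial labels would be shuffled and the measured Δ d-prime compared against both tails of the resulting null distribution.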

      For Figure 2a, we chose not to indicate significance on the figure to avoid clutter, since the significance for all neurons in the population is shown in panels b-c anyway. Additionally, the difference plot shown in panel a is in units of z-scores, which we believe already gives a raw sense of the significance of the target vs. catch response change per neuron in this example dataset.

      (3) Figure 2 and Supplementary Figure 3. I would consider including some more examples as a Supplementary Figure (and perhaps combining Supp Fig 3 with Fig 2 as a main figure).

      We found no significant or apparent difference in single-neuron properties between A1 and dPEG. Therefore, we decided it is not helpful to plot both A1 and PEG examples in the main text. However, we agree that the ability to see more examples of the raw data could be useful. Therefore, we compiled two supplementary figures (Supplementary Figures 4 and 5) that replicate Figure 2a for all datasets, encompassing A1 and PEG.

      (4) Figure 2a and Supp Fig 3a. I was initially confused that the "delta-spk/sec (z-score)" values had themselves been z-scored, but now I think that they are simply the differences of the two left hand sub-panels. This could be made clear in the figure legend.

      The figure legends have been modified to state the procedure for computing “delta-spk/sec” more clearly. Specifically, we added the following information to the legend (L. 141):

      “Difference is computed as the z-scored response to the target minus the z-scored catch response (resulting in a difference shown in units of z-score).”

      (5) Figure 2b-e and Supp Fig 3b-e. Indicate the time window over which the responses were measured, and the number of neurons.

      Figure legends have been modified to include a sentence clearly stating the time window over which responses were measured. The number of neurons is also now included in the legend and on the figure itself. Furthermore, a brief description of the new statistical testing procedure has been added here (L. 144).

      “Responses were defined as the total number of spikes recorded during the 300 ms of sound presentation (area between dashed lines in panel A). Neurons with a significantly different response to the catch vs. target stimulus are indicated in black and quantified on the respective figure panel.”

      (6) Figure 2. "singe" should read "single"

      Typo in figure label has been fixed.

      (7) Line 144. Figure number is missing (Figure 3B-C).

      The missing figure number has been added to the text.

      (8) Figure 3. Again, the low-contrast grey should be replaced.

      The low-contrast grey has been replaced with dark grey.

      Reviewer #2 (Recommendations For The Authors):

      This study really nicely compares the activity and effects on activity in two areas of the auditory cortex in respect to task-engagement; I think it is, for the most part, very well done.

      A couple of specific recommendations:

      (1) Although I understand 'inf dB' as the SNR, including the actual dB level used in the experiments, would be useful, especially in the case of the inf dB.

      Thank you for this feedback. We agree that clarification about the overall sound level used here would be helpful. We have modified the methods section “Behavioral paradigm” to include the following sentence (L. 450):

      “That is, the masking noise (and distractor stimuli) were always presented with an overall sound level of 60 dB SPL. Infinite (inf) dB trials corresponded to trials where the target tone was presented at 60 dB SPL without any masking noise present, 0 dB to trials where the target was 60 dB SPL, -5 dB to trials where the target was presented at 55 dB SPL etc.”

      In addition, we have modified the main text (L. 82):

      “Animals reported the occurrence of a target tone in a sequence of narrowband noise distractors by licking a piezo spout (Figure 1A, Methods: Behavioral paradigm, distractor stimulus sound level: 60 dB SPL). … We describe SNR as the overall SPL of the target relative to distractor noise level. Thus, an SNR of –5 dB corresponds to a target level of 55 dB SPL while an Inf dB SNR corresponds to a target tone presented without any masking noise.”

      And Figure legend 1 now explicitly states the sound level used in the experiments (L. 104):

      “Variable SNR was achieved by varying overall SPL of the target relative to the fixed (60 dB SPL) distractor noise, e.g., -5 dB SNR corresponds to a 55 dB SPL target with 60 dB SPL masking noise. Infinite (inf) dB SNR corresponds to a target tone presented in isolation (60 dB SPL).”
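      The SNR convention in this legend reduces to simple arithmetic, sketched below. The function name `trial_levels` and the constant name are hypothetical. The 60 dB SPL distractor level and the handling of the Inf dB condition come from the text.

```python
import math

DISTRACTOR_LEVEL_DB_SPL = 60.0  # fixed masking-noise level stated in the text


def trial_levels(snr_db):
    """Return (target_level, masker_level) in dB SPL; masker is None when absent."""
    if math.isinf(snr_db):
        # Inf dB SNR: target presented in isolation at 60 dB SPL, no masking noise.
        return DISTRACTOR_LEVEL_DB_SPL, None
    # Finite SNR: the target level is offset from the fixed masker level.
    return DISTRACTOR_LEVEL_DB_SPL + snr_db, DISTRACTOR_LEVEL_DB_SPL
```

      For example, `trial_levels(-5)` gives a 55 dB SPL target with a 60 dB SPL masker, matching the -5 dB case in the legend.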

      (2) I very much appreciate the attempt to disentangle task engagement from generalized arousal state, and specifically, addressing this through the use of pupillometry. However, by focusing the discussion of pupil dynamics solely on the arousal-state aspects of pupil size, the paper doesn't address the increasing evidence suggesting that pupil size may fluctuate based upon a lot of other things, including perceptual events (see Kronemer et al, 2022 for a recent human paper; for auditory: Zekveld et al 2018 (review) and Montes-Lourido et al, 2021; but many many others, too). It would be nice to see either a bit more nuanced discussion of what pupil size may be indicating (easier), or analyzing the behavior in the context of pupil dynamics (a heavier lift).

      This is a good point. We agree that it is worth mentioning these more nuanced aspects of cognition that may be reflected by pupil size. Therefore, we also analyzed pupil size in the context of behavioral performance (see Supplemental Figure 6) and added the following text to the results (L. 193).

      “In addition to reflecting overall arousal level, pupil size has also been reported to reflect more nuanced cognitive variables such as, for example, listening effort (Zekveld et al., 2014). Furthermore, rodent data suggests that optimal sensory detection is associated with intermediate pupil size (McGinley et al., 2015), consistent with the hypothesis of an inverted-U relationship between arousal and behavioral performance (Zekveld et al., 2014). To determine if this pattern was true for the animals in our task, we measured the dynamics of pupil size in the context of behavioral performance. Across animals, task stimuli evoked robust pupil dilation that varied with trial outcome (Supplemental Figure 6b-c). Notably, pre-trial pupil size was significantly different between correct (hit and correct reject), hit, and miss trials (Supplemental Figure 6b-c), recapitulating the finding of an inverted-U relationship to performance in rodents (McGinley et al., 2015).  Since we focused only on correct trials in our decoding analysis, these outcome-dependent differences in pupil size are unlikely to contribute to the emergent decoding selectivity in dPEG.”

      (3) I think it would make this paper shine that much more if behavioral performance were not subsumed into the overall label of task engagement. You've already established you have performance that varies as a function of SNR; I would love to see the neural d' and covariability related to the behavioral d' (in the comparisons where this is possible). I would also love to see a more direct measure of choice for those stimuli that show variable behavior (e.g., a choice probability analysis or something of the like would seem to be easily applied to the target SNRs of -5 and 0 dB); and compare task engaged activity of hits vs misses vs passive listening to those same stimuli. You discuss previous studies looking at choice-related/decision-related activity and draw parallels to this work; given that there is the opportunity with this data set to *directly* assess choice-related activity, the absence of such an analysis seems like a missed opportunity.

      Thank you for this feedback. We agree that “task engagement” is not a unimodal state and that a more fine-grained analysis of task-engaged neural activity, according to behavioral choice, could be informative.

      First, we would like to point out that in Figure 4 we did already compare behavioral d’ to delta neural d’. We found that the two were significantly correlated in dPEG, but not in A1. This suggests that task-dependent changes in stimulus decoding in dPEG, but not A1, are predictive of behavioral performance. This is consistent with the finding that task-relevant stimulus representations were selectively enhanced in dPEG, but not in A1.

      Second, we added a choice decoding analysis to address whether auditory cortex represents the animal’s choice in our task. The results of this analysis are summarized in Supplemental Figure 8 and are discussed under the results section: “Behavioral performance is correlated with neural coding changes in non-primary auditory cortex only.” (L. 226):

      “The previous analysis suggests that the task-dependent increase in stimulus information present in dPEG population activity is predictive of overall task performance. Next, we asked whether the population activity in either brain region was directly predictive of behavioral choice on single hit vs. miss trials. To do this, we conducted a choice probability analysis (Methods). We found that in both brain regions choice could be decoded well above chance level (Supplemental Figure 8). Choice information was present throughout the entire trial and did not increase during the target stimulus presentation. This suggests that the difference in population activity primarily reflects a cognitive state associated with the probability of licking on a given trial, or “impulsivity” rather than “choice.” This interpretation is consistent with our finding that baseline pupil size on each trial is predictive of trial outcome (Supplemental Figure 6b).”

      To keep our decoding approach consistent throughout the manuscript, we followed the same approach for choice decoding as we did for stimulus decoding (perform dDR then calculate neural d-prime in the dimensionality reduced space). To make the results more interpretable, we converted choice d-prime to a choice probability (percent correctly decoded choices) using leave-one-out cross validation. (We note that d-prime and percent correct are very highly correlated statistics.) This is described in the methods as follows (L. 550):

      “We performed a choice decoding analysis on hit vs. miss trials. We followed the same procedure as described above for stimulus decoding, where instead of a pair of stimuli our two classes to be decoded were “hit trial” vs. “miss trial”. That is, for each target stimulus we computed the optimal linear discrimination axis separating hit vs. miss trials (Abbott and Dayan, 1999) in the reduced dimensionality space identified with dDR (Heller and David, 2022). For the sake of interpretability with respect to previous work we reported choice probability as the percentage of correctly decoded trial outcomes rather than d-prime. Percent correct was calculated by projecting the population activity onto the optimal discrimination axis and using leave-one-out cross validation to measure the number of correct classifications.”
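      The leave-one-out step can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. It assumes the population activity has already been projected into a two-dimensional space (the dDR step itself is not shown), and it classifies each held-out trial by which side of the midpoint between class means its projection falls on. All function names are hypothetical.

```python
def _mean(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def _pooled_cov(hits, misses, mu_h, mu_m):
    """Pooled 2x2 within-class covariance, returned as (sxx, sxy, syy)."""
    sxx = sxy = syy = 0.0
    for rows, mu in ((hits, mu_h), (misses, mu_m)):
        for x, y in rows:
            dx, dy = x - mu[0], y - mu[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(hits) + len(misses) - 2
    return sxx / dof, sxy / dof, syy / dof


def _discriminant(hits, misses):
    """Axis w = inv(cov) @ (mu_hit - mu_miss) and the midpoint threshold."""
    mu_h, mu_m = _mean(hits), _mean(misses)
    sxx, sxy, syy = _pooled_cov(hits, misses, mu_h, mu_m)
    det = sxx * syy - sxy * sxy  # assumed non-zero (non-degenerate data)
    dx, dy = mu_h[0] - mu_m[0], mu_h[1] - mu_m[1]
    w = [(syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det]
    mid = [(mu_h[0] + mu_m[0]) / 2, (mu_h[1] + mu_m[1]) / 2]
    return w, w[0] * mid[0] + w[1] * mid[1]


def loo_percent_correct(hits, misses):
    """Percent of trials whose outcome is decoded correctly when held out."""
    trials = [(p, True) for p in hits] + [(p, False) for p in misses]
    correct = 0
    for i, (point, is_hit) in enumerate(trials):
        # Fit the axis on all trials except the held-out one.
        train = trials[:i] + trials[i + 1:]
        w, threshold = _discriminant([p for p, h in train if h],
                                     [p for p, h in train if not h])
        predicted_hit = w[0] * point[0] + w[1] * point[1] > threshold
        correct += predicted_hit == is_hit
    return 100.0 * correct / len(trials)
```

      Refitting the axis for every held-out trial keeps the test trial out of the fit, so the percent correct is not inflated by overfitting.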

      (4) It would also be interesting to look at population coding across sessions (although the point is taken that within a session allows the opportunity to assess covariability). Minorly self-servingly but very much related to the above point, Christison-Lagay et al, 2017 employed a similar detect-in-noise task, analyzed single neurons and population level activity, and looked at putative choice-related activity. The current study has the opportunity to expand on that kind of analysis that much more by looking across multiple sites vs within a given recording site; and compare across regions.

      Thank you for highlighting this point, we agree that it is important. When studying population coding it is critical to consider the impact of covariability between neurons. Therefore, it is worthwhile to revisit our interpretations of prior results, e.g., Christison-Lagay et al, 2017, which studied population coding by combining neurons across different sessions, given that we now have access to simultaneously recorded population data.

      First, we would like to point out that this was the primary motivation for our simulation analyses presented in Figure 5. Using simulations, we found that task-dependent gain modulation (which can be observed across sessions) was sufficient to explain our primary finding – selective enhancement in decoding of behaviorally relevant sound stimuli in dPEG.

      Second, to address the question about how covariability affects choice-related information in auditory cortex and compare our findings with prior studies, we performed the same set of simulations for choice probability analysis. We found that, again, choice-dependent gain modulation was sufficient to explain our findings. That is, simulations with hit- vs. miss-dependent gain changes, but fixed covariability, closely mirrored the choice probability we observed in the raw data. An additional simulation where covariability between all neurons was set to zero also recapitulated our findings in the raw data. Collectively, this suggests that covariability does not play a significant role in shaping the choice information present in A1 and dPEG during this task. We have added the following text to the manuscript to summarize this finding (L. 293):

      “Finally, we used the same simulation approach to determine what aspects of population activity carry the “choice” related information we observed in A1 and dPEG (Figure 4 – figure supplement 1). Similar to our findings for stimulus decoding, we found that gain modulation alone was sufficient to recapitulate the choice information present in the raw data for this task. This helps frame prior work that pooled neurons across sessions to study population coding of choice in similar auditory discrimination tasks (Christison-Lagay et al, 2017).”
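      One common way to build such surrogate populations is sketched below. The response does not say how covariability was set to zero, so the independent per-neuron trial shuffle used here is an assumption, as are the function names. The shuffle preserves each neuron's own response distribution while removing trial-by-trial correlations between neurons. The gain step multiplies each neuron's response by a condition-specific factor.

```python
import random


def shuffle_out_covariability(trials, seed=0):
    """Shuffle each neuron's trial order independently.

    trials: list of per-trial lists of spike counts (one entry per neuron).
    Each neuron keeps its mean and variance; correlations between neurons are broken.
    """
    rng = random.Random(seed)
    n_neurons = len(trials[0])
    columns = []
    for j in range(n_neurons):
        column = [trial[j] for trial in trials]
        rng.shuffle(column)
        columns.append(column)
    # Transpose back to a trials-by-neurons layout.
    return [list(row) for row in zip(*columns)]


def apply_gain(trials, gains):
    """Scale each neuron's response by a per-neuron gain factor."""
    return [[g * r for g, r in zip(gains, trial)] for trial in trials]
```

      Decoding the surrogate and the raw data with the same pipeline then indicates how much of the decoded information depends on covariability.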

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This manuscript presents a solid and generally convincing set of experiments to address the question of whether the lateral parafacial area (pFL) is active in controlling active expiration, which is particularly important in patient populations that rely on active exhalation to maintain breathing (eg, COPD, ALS, muscular dystrophy). This study presents a valuable finding by pharmacologically mapping the core medullary region that contributes to active expiration and addresses the question of where these regions lie anatomically. Results from these experiments will be of value to those interested in the neural control of breathing and other neuroscientists as a framework for how to perform pharmacological mapping experiments in the future.

      Thanks for the positive feedback on our study, as well as the assessment of the novelty of our investigation and the advancements to the field that these results will bring in the future.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      The main focus of the current study is to identify the anatomical core of an expiratory oscillator in the medulla using pharmacological disinhibition. Although expiration is passive in normal eupneic conditions, activation of the parafacial (pFL) region is believed to evoke active expiration in conditions of elevated ventilatory demands. The authors and others in the field have previously attempted to map this region using pharmacological, optogenetic, and chemogenetic approaches, which present their own challenges.

      In the present study, the authors take a systematic approach to determine the precise anatomical location within the ventral medulla's rostrocaudal axis where the expiratory oscillator is located. The authors used a bicuculline (a GABA-A receptor antagonist) and fluorobeads solution at 5 distinct anatomical locations to study the effects on neuronal excitability and functional circuitry in the pFL. The effects of bicuculline on different phases of the respiratory cycle were characterized using a multidimensional cycle-by-cycle analysis. This analysis involved measuring the differences in airflow, diaphragm electromyography (EMG), and abdominal EMG signals, as well as using a phase-plane analysis to analyze the combined differences of these respiratory signals. Anatomical immunostaining techniques were also used to complement the functional mapping of the pFL.

      Major strengths of this work include a robust study design, complementary neurophysiological and immunohistochemical methods, and the use of a novel phase-plane analysis. The authors construct a comprehensive functional map revealing functional nuances in respiratory responses to bicuculline along the rostrocaudal axis of the parafacial region. They convincingly show that although bicuculline injections at all coordinates of the pFL generated an expiratory response, the most rostral locations in the lateral parafacial region play the strongest role in generating active expiration. These were characterized by a strong impact on the duration and strength of ABD activation and a robust change in tidal volume and minute ventilation. The authors also confirmed histologically that none of the injection sites overlapped grossly with PHOX2B+ neurons, thus confirming the specificity of the injections in the pFL and not the neighboring RTN.

      Collectively, these findings advance our understanding of the presumed expiratory oscillator, the pFL, and highlight the functional heterogeneity in the functional response of this anatomical structure.

      Thanks for the positive feedback on the results presented in the current manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Pisanski and colleagues map regions of the brainstem that produce the rhythm for active expiratory breathing movements and influence their motor patterns. While the neural origins of inspiration are very well understood, the neural bases for expiration lag considerably. The problem is important and new knowledge pertaining to the neural origins of expiration is welcome.

      The authors perturb the parafacial lateral (pFL) respiratory group of the brainstem with microinjection of bicuculline, to elucidate how disinhibition in specific locations of the pFL influences active expiration (and breathing in general) in anesthetized rats. They provide valuable, if not definitive, evidence that the borders of the pFL appear to extend more rostrally than previously appreciated. Prior research suggests that the expiratory pFL exists at the caudal pole of the facial cranial nucleus (VIIc). Here, the authors show that its borders probably extend as much as 1 mm rostral to VIIc. The evidence is convincing albeit with caveats.

      Strengths:

      The authors achieve their aim in terms of showing that the borders of the expiratory pFL are not well understood at present and that it (the pFL) extends more rostrally. The results support that point. The data are strong enough to cause many respiratory neurobiologists to look at the sites rostral to the VIIc for expiratory rhythmogenic neurons and characterize their properties and mechanisms. At present my view is that most respiratory neurobiologists overlook the regions rostral to VIIc in their studies of expiratory rhythm and pattern.

      Weaknesses:

      The injection of bicuculline has indiscriminate effects on excitatory and inhibitory neurons, and the parafacial region is populated by excitatory neurons that are expiratory rhythmogenic and GABA and glycinergic neurons whose roles in producing active expiration are contradictory (Flor et al. J Physiol, 2020, DOI: 10.1113/JP280243). It remains unclear how the microinjections of bicuculline differentially affect all three populations. A more selective approach would be able to disinhibit the populations separately. Nevertheless, for the main point at hand, the data do suggest that we should reconsider the borders of the expiratory pFL nucleus and begin to examine its physiology up to 1 mm rostral to VIIc.

      The control experiment showed that bicuculline microinjections induced cFos expression in the pFL, which is good, but again we don't know which neurons were disinhibited: glutamatergic, GABAergic, or glycinergic.

      Thanks for sharing your excitement on the results of our study, and for appreciating the thorough investigation performed with the use of bicuculline, an approach that was originally used in Pagliardini et al. (2011, PMID: 21414911) and then used by many other groups to generate and study active expiration in vivo.

      In the current study we used the well-known effect of bicuculline to systematically test which area is most sensitive to such a pharmacological effect, and hence may be the core for generating active expiration. While the use of GABA receptor antagonists may have an indiscriminate effect on GABA receptor-expressing neurons with various phenotypes, anatomical assessment of inhibitory cells has shown very little distribution of GABAergic and glycinergic cells in the parafacial area (Tanaka et al., 2003; PMID: 14512139), and it has been inferred in multiple publications (Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al., 2016, PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151; Flor et al., 2020, PMID: 32621515; Britto & Moraes, 2017, PMID: 28004411; Silva et al., 2016, PMID: 26900003) and demonstrated recently (Magalhaes et al., 2021; PMID: 34510468) that late-E neurons in the parafacial region are excitatory and have a glutamatergic phenotype. We can’t exclude that a small fraction of neurons in the pFL area are inhibitory, and that they could influence recruitment of adjacent late-E expiratory neurons. A more selective activation of neuronal populations with different phenotypes would indeed be interesting. Nonetheless, if local inhibitory neurons have a role in the generation of active expiration, then their disinhibition could have either an inhibitory effect on late-E activity or stimulate expiration in a more indirect fashion.

      While the effect of bicuculline on active expiration has been reported and replicated in multiple manuscripts, the source of inhibition across different phases of the respiratory cycle is still under investigation. Some studies suggest that GABAergic and glycinergic inhibition does not originate in the pFL but rather in the BötC and preBötC areas (Flor et al., 2020, PMID: 32621515; Magalhaes et al., 2021; PMID: 34510468), and the effects of this inhibition across the respiratory cycle are debated. Future studies will be key to identify the source of pFL inhibition.

      The manuscript characterizes how bicuculline microinjections affect breathing parameters such as tidal volume, frequency, ventilation, inspiratory and expiratory time, as well as oxygen consumption. Those aspects of the manuscript are a bit tedious and sometimes overanalyzed. Plus, there was no predictive framework established at the outset for how one should expect disinhibition to affect breathing parameters. In other words, if the authors are seeking to map the pFL borders, then why analyze the breathing patterns so much? Does doing so provide more insight into the borders of pFL? I did not think it was compellingly argued.

      We have edited the introduction to address this comment and emphasize the rationale for the study. We also edited the results section to summarize our findings.

      We continue to report our in-depth analysis of the perturbations induced by bicuculline injection over the various respiratory characteristics as this will be fundamental to determine the effects of our experiment not only on the activation of pFL and active expiration, but also on the respiratory network in general. In order to be fair and open about our findings we have reported the results of our analysis in detail. Of note, all sites generated active expiration, but since the objective of the study was to determine the sites with the most significant changes, a finer and multilevel analysis has been used.

      Further, lines 382-386 make a point about decreasing inspiratory time even though the data do not meet the statistical threshold. In lines 386-395, the reporting appears to reach significance (line 388) but not reach significance (line 389). I had trouble making sense of that disparity.

      The statistics were confirmed, and the lines edited as follows: “Interestingly, the duration of inspiration during the response was found to decrease in all groups relative to baseline respiration (Ti response = 0.279 ± 0.034s, Ti baseline = 0.318 ± 0.043s, Wilcoxon rank sum: Z = 3.24, p = 0.001). Contrary to this decrease in inspiratory duration, the total expiratory time was observed to increase in all groups and remained elevated compared to baseline (TE response = 1.313 ± 0.188s, TE baseline = 1.029 ± 0.161s, Wilcoxon rank sum: Z = 4.49, p = 0.001).”

      The other statistical hiccups include "tended towards significance" (line 454), "were found to only reach significance for a short portion of the response" (line 486-7), "did not reach the level of significance" (line 506), which gives one the sense of cherry picking or over-analysis. Frankly, this reviewer finds the paper much more compelling when just asking whether the microinjections evoke active expiration. If yes, then the site is probably part of the pFL.

      Statistical “tendencies” have been eliminated throughout the manuscript.

      We have analyzed our results in detail in order to determine changes and differential effects on respiration when comparing the 5 injection sites. Although the presentation of the results may seem tedious, it has allowed us to highlight some interesting effects. The first concerns respiratory frequency. It has been shown in the past that optogenetic stimulation of this area causes an increase in respiratory frequency (Pagliardini et al., 2011, PMID: 21414911), whereas disinhibition with this same approach or stimulation of AMPA receptors in pFL has shown a reduction in frequency or no significant change in the response (Pagliardini et al., 2011, PMID: 21414911; Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al., 2016, PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151). Here, we suggest that the reduction in respiratory frequency is observed only in the caudal sites and could be attributed to BötC effects rather than the stimulation of the core of the pFL, since no respiratory change was observed where the effect was more potent (rostral side). Another interesting point was the effect on O2 consumption: although it is difficult to interpret at this point, we found it very interesting that hyperventilation occurred only at the most rostral injection sites.

      I encourage the authors to consider the fickleness of p-values in general and urge them to consider not just p but also effect size.

      Thank you for the feedback on our description of the statistical results and the suggestion of incorporating effect size. We have now included measurements of effect size in the results section. Specifically, we calculated the effect size within each ANOVA using the value of eta squared for all data shown in Figures 3 and 4. Please note that in our phase-plane analysis (Fig. 5-6) the Mahalanobis distance is itself an effect size measure for multidimensional data. We also note that statistical evaluation using non-parametric analyses does not involve effect sizes.
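      For readers unfamiliar with eta squared, a minimal sketch follows. It assumes a one-way design, where eta squared is the between-group sum of squares divided by the total sum of squares. The ANOVA design actually used for Figures 3 and 4 may differ, and the function name is hypothetical.

```python
def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    # Total variability of all observations around the grand mean.
    ss_total = sum((v - grand) ** 2 for v in values)
    # Variability of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total
```

      The value ranges from 0 (group means are identical) to 1 (all variability lies between groups), so it conveys the size of an effect independently of the p-value.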

      Reviewer #3 (Public Review):

      Summary:

      The study conducted by Pisanski et al investigates the role of the lateral parafacial area (pFL) in controlling active expiration. Stereotactic injections of bicuculline were utilized to map various pFL sites and their impact on respiration. The results indicate that injections at more rostral pFL locations induce the most robust changes in tidal volume, minute ventilation, and combined respiratory responses. The study indicates that the rostrocaudal organization of the pFL and its influence on breathing is not simple and uniform.

      Strengths:

      The data provide novel insights into the importance of rostral locations in controlling active expiration. The authors use innovative analytic methods to characterize the respiratory effects of bicuculline injections into various areas of the pFL.

      Weaknesses:

      Bicuculline injections increase the excitability of neurons. Aside from blocking GABA receptors, bicuculline also inhibits calcium-activated potassium currents and potentiates NMDA current, thus insights into the role of GABAergic inhibition are limited.

      Increasing the excitability of neurons provides little insights into the activity pattern and function of the activated neurons. Without recording from the activated neurons, it is impossible to know whether an effect on active expiration or any other respiratory phase is caused by bicuculline acting on rhythmogenic neurons or tonic neurons that modulate respiration. While this approach is inappropriate to study the functional extent of the conditional "oscillator" for active expiration, it provides valuable insights into this region's complex role in controlling breathing.

      We have included a reflection of the weaknesses of our studies in the technical consideration section to address the possibility that bicuculline may induce active expiration through other mechanisms. Please note that the use of bicuculline was not to gain further insight on GABAergic inhibition of pFL but to adopt a tool to generate active expiration that has been extensively validated by our group and others.

      Multiple studies have shown recruitment of excitatory late expiratory neurons with bicuculline injections. Although we did not record from late-E neurons in this study, we infer from the body of literature that disinhibition of neurons in this area will activate late-E neurons (as previously demonstrated) and generate active expiration. Although we see value in recording activity of single neurons (especially to study mechanisms of rhythmogenesis), we opted to measure the physiological response from respiratory muscles as an indication of active expiration recruitment in vivo. Recording from single neurons after bicuculline injections in each site would confirm the presence of expiratory neurons along the parafacial area, which is probably not surprising, since every site tested promoted active expiration. The focus of the study though was to determine the site with the strongest physiological response to disinhibition. Future studies will be key to determine whether all neurons along this column have similar electrophysiological rhythmic properties to the ones recently reported (Magalhaes et al., 2021; PMID: 34510468), or some of them simply provide tonic drive to late-E neurons located elsewhere.

      We have discussed the issue as follows:

      “Our experiments focused on determining the area in the pFL that is most effective in generating active expiration as measured by ABD EMG activity and expiratory flow. We did not attempt to record single cell neuronal activity at various locations as previously shown in other studies (Pagliardini et al 2011; Magalhaes et al., 2021), as this approach would most likely find some late-E neurons across the pFL and thus not effectively discriminate between areas of the pFL. Future studies involving multi-unit recordings or imaging of cell population activities will help to determine the firing pattern and population density of bicuculline-activated cells and further determine differences in distribution and function of late-E neurons across the region of the pFL.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Overall, the manuscript addresses an important question in the field, the anatomical location of the expiratory oscillator. I commend the authors for a well-thought-out and clearly presented study. However, a few small concerns deserve attention to improve the clarity of the report.

      (1) The figures would benefit from a rostral-to-caudal representation of results instead of a caudal-to-rostral orientation. Example, Figure 2.

      We opted for a caudal to rostral representation to progressively move away from the inspiratory oscillator (preBötC) and the anatomical reference point (the caudal tip of the facial nucleus) with our series of injections. 

      (2) A discussion about how expiratory responses generated by these pharmacological approaches would compare to endogenous baseline conditions. The authors mention that bicuculline injections elicited a late-E downward inflection that was absent in baseline conditions. Thus, this raises the point of how these findings compare to awake freely moving animals or during different conditions of increased ventilatory demand.

      This is an interesting question that has not yet been addressed in the field. As far as we know, there are no recordings of pFL neurons in freely behaving animals, although recordings of pFL late-E neurons under elevated PaCO2 have shown late-E activity in in situ preparations (Britto & Moraes, 2017; PMID: 28004411; Magalhaes et al., 2021; PMID: 34510468).

      We have clarified this in the discussion as follows:

      “At rest, respiratory activity does not present with active expiration (i.e., expiratory flow below its functional residual capacity in conjunction with expiratory-related ABD muscle recruitment) and expiratory flow occurs due to passive recoil of the chest wall with no contribution of abdominal activity. Active expiration and abdominal recruitment can be spontaneously observed during sleep (in particular REM sleep, Andrews and Pagliardini, 2015; Pisanski et al., 2019) and can be triggered during increased respiratory drive (e.g. hypercapnia, RTN stimulation, Abbott et al., 2011). Although never assessed in freely moving, unanesthetized rodents, bicuculline has been extensively used to generate active expiration and late-E neuron activity in both juvenile and adult anesthetized rats (Pagliardini et al., 2011; Huckstepp et al., 2015; Huckstepp et al., 2016; Huckstepp et al., 2018; De Britto and Moraes, 2017; Magalhaes et al., 2021).”

      (3) In Figure 2A, there appears to be an injection site in the top right quadrant of the image, very distant from the intended site. Could the authors confirm if this is an artifact?

      Yes, it is an artifact of image acquisition; we should have marked that in the figure. To avoid confusion and follow the other reviewers’ suggestions, we have edited the figure.

      (4) A stylistic suggestion would be to include the subpanel of Figure 2C saline control injection as a graph of its own and also include the control anatomical location in 2B.

      Thanks for the suggestion. Because of the complex organization of the figure we opted to leave it as a subpanel in order to not distract the reader from the 5 injection sites, but still provide information about vehicle injection and their lack of changes in respiratory response.

      (5) The authors note that DIAm Area (norm.) during the inspiratory phase is increased in the +6 and +8mm groups. However, Figure 5E shows that the +8mm group is significantly reduced as compared to the +6mm group. Please clarify.

      During the inspiratory phase we did not observe any significant change in the DIA Area (norm.). We realize that the description of this part of the results was confusing and therefore we have eliminated that section.

      Reviewer #2 (Recommendations For The Authors):

      I encourage the authors to consider the fickleness of p-values in general and urge them to consider not just p but also effect size. There is a valuable editorial in this week's J Physiology (https://doi.org/10.1113/JP285575) that may provide helpful guidance.

      Thanks for these comments and the general assessment. We realized that the results section was dense, with a lot of information. We have significantly slimmed the description of the results to make them easier to appreciate and to avoid confounding statements about significant vs. non-significant results.

      We have now included measurements of effect size in the results section. Specifically, we calculated the effect size within each ANOVA using the value of eta squared for all data shown in Figures 3 and 4. Please note that in our phase-plane analysis (Fig. 5-6) the Mahalanobis distance is itself an effect size measure for multidimensional data. We also note that statistical evaluations using non-parametric analyses do not involve effect sizes.
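
      As an illustration of the two effect-size measures named above, the following is a minimal sketch and not the authors' Matlab code. It assumes a one-way design for eta squared (between-group sum of squares divided by total sum of squares) and a two-dimensional point cloud for the Mahalanobis distance; the function names and the example values are hypothetical.

```python
import math


def eta_squared(groups):
    """Eta squared for a one-way design: SS_between / SS_total."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Each group contributes n_g * (group mean - grand mean)^2
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


def mahalanobis_2d(point, samples):
    """Mahalanobis distance of a 2-D point from a cloud of 2-D samples.

    Uses the sample covariance (n - 1 denominator) and the closed-form
    inverse of a 2x2 matrix.
    """
    n = len(samples)
    mx = sum(s[0] for s in samples) / n
    my = sum(s[1] for s in samples) / n
    sxx = sum((s[0] - mx) ** 2 for s in samples) / (n - 1)
    syy = sum((s[1] - my) ** 2 for s in samples) / (n - 1)
    sxy = sum((s[0] - mx) * (s[1] - my) for s in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical values, for illustration only
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
example_cloud = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(eta_squared(example_groups))
print(mahalanobis_2d((3.0, 1.0), example_cloud))
```

      Because the Mahalanobis distance scales each deviation by the spread of the reference data, it can be read directly as a standardized multidimensional difference, which is the sense in which the response treats it as an effect size.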

      The equipment and resources should be clearly identified and use RRIDs whenever possible. Resources like antibodies and other reagents (e.g., cryoprotectants) should be identified, not just by manufacturer, but also by specific part or product numbers or identifiers.

      Manuscript has been edited to add these details.

      The manuscript makes reference to ImageJ and Matlab routines, which must be public through GitHub or another stable repository.

      Thanks for pointing this out. ImageJ analysis was performed using scripts already available to users (no custom scripts). The Matlab scripts used for the multivariate analysis are now available at: https://github.com/mprosteb/Pisanski2024

      The way that ABD-DIA coupling was assessed was unclear from the Methods.

      The following text has been added to the methods: “The coupling between ABD and DIA signals was measured as a ratio and analyzed by quantifying the number of bursts of activity observed for the ABD and DIA EMG signals during the first 10 minutes of the response, excluding time bins at end of the response (due to fading and waning of the ABD response in those instances).”
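
      The coupling measure described above can be sketched as a ratio of burst counts. This is a hedged illustration only: the burst detector (rising crossings of a fixed threshold on a rectified, integrated EMG trace), the threshold values, and the function names are assumptions and are not taken from the manuscript.

```python
def count_bursts(signal, threshold):
    """Count bursts as rising crossings of a fixed threshold (assumed detector)."""
    count = 0
    above = False
    for value in signal:
        if value > threshold and not above:
            count += 1  # a new burst starts when the trace first exceeds threshold
        above = value > threshold
    return count


def abd_dia_coupling(abd, dia, abd_threshold, dia_threshold):
    """Ratio of ABD bursts to DIA bursts over the same analysis window."""
    dia_bursts = count_bursts(dia, dia_threshold)
    if dia_bursts == 0:
        raise ValueError("no DIA bursts detected in the analysis window")
    return count_bursts(abd, abd_threshold) / dia_bursts


# Hypothetical traces: ABD is recruited on every second DIA cycle
dia_trace = [0, 1, 0, 1, 0, 1, 0, 1, 0]
abd_trace = [0, 1, 0, 0, 0, 1, 0, 0, 0]
print(abd_dia_coupling(abd_trace, dia_trace, 0.5, 0.5))
```

      In this reading, a ratio of 1 corresponds to ABD recruitment on every breath, and lower values indicate ABD bursts that skip inspiratory cycles.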

      Fig. 1A was never cited in the text.

      It has been cited now.

      Fig. 1A-C appears to be exactly the same as Fig. 5A-C.

      The reviewer is correct. We have used figure 1 to describe and explain our analytical methods with sample data and Figure 5 describes our results. We have clarified that in: “Figure 5: Rostral injections elicit more prominent changes to respiration in each signal and sub-period. A-C: Is the same as Method Figure 1, has been included here for further clarity when analyzing the results.”

      Late Expiratory airflow is given in units of volts (V) in lines 358-363 (Fig. 4C) but then in units of volts-seconds (V•s) in lines 363-367. Both units are problematic because the voltage is neither an air volume nor an air volume per unit time. Is there some conversion factor left out?

      In this section of the results we describe the changes in expiratory peak amplitude (V) and expiratory peak flow (V•s). Since calibration of airflow was performed on the positive flow and for larger volumes, we prefer to use the original units to guarantee precise assessment of the change and avoid introducing potential errors. Since the analysis considers changes from baseline readings, converting to ml or ml*s would not affect our analysis.

      Reviewer #3 (Recommendations For The Authors):

      The study conducted by Pisanski et al investigates the role of the lateral parafacial area (pFL) in respiratory control, specifically in modulating active expiration. The precise location of this expiratory oscillator within the ventral medulla remains uncertain, with some studies indicating that the caudal tip of the facial nucleus (VIIc) forms the core while others propose more rostral areas. Bicuculline injections were utilized at various pFL sites to explore the impact of these injections on respiration. The authors use innovative and impressive analytic methods to characterize the effect on respiratory activity. The results indicate that injections at more rostral pFL locations induce the most robust changes in tidal volume, minute ventilation, and combined respiratory responses. The study will contribute to an enhanced understanding of the neural mechanisms controlling active expiration. The main message of the study is that the rostro-caudal organization of the pFL is not simple and uniform. The data provides novel insights into the importance of rostral locations in controlling active expiration (see e.g. lines 738-740).

      The data and results of the paper are intriguing, and it appears that the experiments are well-managed and executed. However, there are several major and minor comments and suggestions that should be addressed by the authors:

      (1) The study relies heavily on local injections into specific areas that are confirmed histologically. One potential concern is the injection volume of 200 nL in such a tiny area. The authors suggest that the drug did not spread to rostral/caudal areas outside the specified coordinate partly based on their cFOS staining. For example, the lack of cFOS activation in TH+ cells and Phox2B cells is interpreted as proof that bicuculline did not spread to these somas (Figure 2). The authors seem to use a similar argument as evidence that the pFL does not include Phox2B neurons in the RTN as discussed in the Discussion section (lines 830-847). However, it is very surprising that bicuculline injections into an area that is known to contain Phox2B and Th+ neurons do not activate these neurons as assessed by the cFOS staining. It seems puzzling to me that none of their injections shown in Figure 2 activated Phox2B or Th neurons. I assume that in targeting the pFL the authors must have sometimes hit areas that included neurons that define the RTN, which would have activated Phox2B or Th+ neurons. Did the authors find that these activations did not activate active expiration? Such negative "controls" would strengthen their argument that pFL is a separate and distinct region that selectively controls active expiration.

      Thanks for the positive feedback on the manuscript. As it has been demonstrated and discussed in several previous publications, PHOX2B expressing neurons in this area of the brain are part of the RTN Neuromedin B positive neurons (more densely located in the ventral paraFacial rather than the lateral parafacial, our site of injection), the TH+ C1 neurons (located in a somewhat more caudal and medial position compared to our sites of injection, around the BötC/ preBötC area) and the large Facial MN (easily identifiable by their large size and compact location). Given this differential spatial distribution, and the controls described below, we believe we have reduced the possibility of the direct activation of these neurons, although we can’t exclude it in full.

      There is now strong evidence for the lack of PHOX2B expression in late-E neurons in juvenile and adult rats (Magalhaes et al., 2021; PMID: 34510468). We realize that the microinjected solution could potentially diffuse in the brain and hit other areas, but we combined two strategies to verify our intention for a focal injection activating only a restricted area of the brain (i.e., the pFL): i) localization of fluorobeads that were diluted in the bicuculline solution; ii) expression of cFos combined with anatomical markers, to identify activated cells. Fluorobeads have a very limited spread in the brain and therefore informed us of the site of the injection, allowing us to differentiate between the five injection locations. Although we can’t assume that bicuculline will have a similar spread (and it will also be quickly degraded in the tissue), the combination of this analysis with the localized expression of cFos cells has helped us to differentiate between injection sites. Because of the proximity of PHOX2B cells in RTN and C1 neurons, we also combined cFos expression with immunohistochemistry to determine whether bicuculline activation was also visible in these two neuronal populations. Our results indicate that there is baseline cFos activity in RTN neurons (see vehicle injection), but the fraction of PHOX2B activated cells did not increase with bicuculline injections, suggesting that these neurons were not the target of our injections. Please note that cFos expression has been extensively used to determine RTN neuron activation, especially following chemoreflex responses.

      (2) The authors refer to "the expiratory oscillator" throughout the manuscript (e.g. lines 58, 62, 65) as if there is only one expiratory oscillator i.e. "the expiratory oscillator". For some reason, the authors avoided citing and mentioning PiCo (Anderson et al. 2016), which is considered the oscillator for postinspiration. Since the present study focuses on the role of expiration, and since the authors describe convincing effects on postinspiration, considering this oscillator which is located dorsomedial to the VRC seems relevant for the present study.

      Due to the limited and controversial literature currently describing PiCo as a third oscillator, and the fact that our studies do not directly assess post-inspiratory activity (as measured by the V nerve or laryngeal muscles) or PiCo activity and location (which would be even more distant than the RTN, for example), we prefer to avoid commenting on the effects of this injection on PiCo or the connectivity between PiCo and pFL.

      We have added this to the discussion:

      “Therefore, although it has previously been described, the exact mechanism by which this post-I activity in the ABD muscles is generated is currently unknown. For example, the interplay between the rostral pFL and brainstem structures generating post-inspiratory activity, such as the proposed post-inspiratory oscillator (PiCo; Anderson et al., 2016) or pontine respiratory networks, could be reasonably involved in this process.”

      (3) The authors do not specify what type of bicuculline they injected. Bicuculline is known to have significant effects on potassium channels. Thus, the effects reported here could be due to a non-specific change in excitability, rather than caused by a specific GABAergic blockade.

      The authors also do not know what effects these injections cause in the neurons in vivo, since the injections are not accompanied by recordings from the respiratory neurons that they activate. This together with the non-specific bicuculline effects will affect the interpretation of the results. Thus, the authors need to be more careful when interpreting their effects as "GABAergic". The use of more specific blockers like gabazine could partly address this concern. The authors have to discuss this in a "limitation section".

      Thanks for pointing that out; we have now clarified in the methods section that we used bicuculline methochloride. We can’t exclude that some side effects could be present due to the use of this drug. For the purpose of this study, though, we focused on using bicuculline as a tool to consistently generate active expiration, since it has been extensively used by multiple laboratories to induce abdominal muscle recruitment and active expiration, as well as to directly record late-E neurons in this same area.

      We have included in the discussion the following statement:

      “Technical considerations

      Bicuculline methiodide has previously been observed to exhibit inhibitory effects on Ca2+ activated K+ currents inducing non-specific potentiation of NMDA currents (Johnson and Seutin, 1997). Consequently, caution is warranted in attributing our findings solely to the GABAa antagonist properties of bicuculline. Previous work has demonstrated a temporal correlation between the onset of late-E neuron activity in the caudal parafacial region and ABD activity in response to bicuculline (Pagliardini et al., 2011; de Britto and Moraes, 2017; Magalhaes et al., 2021) as well as GABAergic sIPSCs in late-E neurons (Magalhaes et al., 2012). However, it is essential to note that the current study lacks single unit recording, preventing us from definitively confirming whether the observed activity stems from late-E neuronal GABAergic disinhibition or excitation through non-GABAergic mechanisms.”

      (4) I also caution the authors when stating that the bicuculline injections will reveal the precise location and functional boundaries of "the" expiratory oscillation within the pFL. Increasing the excitability with bicuculline is inappropriate to study the functional boundaries of an oscillator. It is particularly inappropriate to identify the boundaries of the pFL, a network that is normally inactive and activated only under certain behavioral and metabolic conditions. Because the injections are increasing the neuronal excitability unspecifically, and because the authors are not recording the activity of the neurons in the pFL region it is unclear what kind of neurons are activated. The cFOS staining may help to define whether these neurons are Phox2B or Th positive or negative, but they will not provide insights into the activity patterns of the activated neurons. Thus, it is fair to assume that these injections will likely include also tonic neurons that might indirectly control the activity of pFL neurons under certain metabolic or behavioral conditions without actually being involved in the rhythmogenesis of active expiration. Many of the effects peak after several minutes, and different regions cause differential effects with different time courses, which is difficult to interpret functionally. Thus, the "core" identified in the present study could consist of tonic neurons as opposed to rhythmic neurons generating active expiration.

      We agree with the reviewer that our local injections may have activated a heterogeneous population of neurons. We do not claim that we only activated late-E rhythmogenic neurons, but that our multiple sites of injection revealed the area that generates the strongest excitation of ABD muscles and active expiration.

      While the use of GABA receptor antagonists may have an indiscriminate effect on GABA receptor expressing neurons with various phenotypes, anatomical assessment of inhibitory cells has shown very little distribution of GABAergic and glycinergic cells in the parafacial area (Tanaka et al., 2003; PMID: 14512139) and it has been inferred in multiple publications (Huckstepp et al., 2015, PMID: 25609622; Huckstepp et al. 2016 PMID: 27300271; Huckstepp et al., 2018, PMID: 30096151; Flor et al., 2020, PMID: 32621515; Britto & Moraes, 2017; PMID: 28004411; Silva et al. 2016; PMID: 26900003) and demonstrated recently (Magalhaes et al., 2021; PMID: 34510468) that late-E neurons in the parafacial region are excitatory and have a glutamatergic phenotype.

      As suggested by the reviewer, it is possible that the bicuculline injection activated some tonic, non-rhythmogenic neurons, which could in turn activate an expiratory oscillator located elsewhere.

      We have edited the introduction as follows:

      “By strategically administering localized volumes of bicuculline at multiple rostrocaudal levels of the ventral brainstem, we aimed to selectively enhance the excitability of neurons driving active expiration, thereby revealing the extension of the pharmacological response and the most efficient site in generating active expiration.”

      We have edited the results as follows:

      “Importantly, the group with injection sites at +0.6 mm from VIIc exhibited the swiftest response onset, suggesting that this area is the most critical for the generation of active expiration, either through direct activation of the expiratory oscillator or, alternatively, for providing a strong tonic drive to late-E neurons located elsewhere.”

      In the introduction, it should also be emphasized that the pharmacological approach used in the present study complements the existing elegant chemogenetic studies, rather than emphasizing primarily the limitations of the chemogenetic inhibitions. The conclusion should be that these studies together provide different, yet complementary insights: The chemogenetic approach by inhibiting neurons, the present study by exciting neurons, and all studies come with their own limitations.

      Thanks for the suggestion, we have updated the manuscript as follows:

      “Although both of these elegant chemogenetic studies have contributed extensively to our understanding of the pFL, the existing evidence suggests that the expiratory oscillator may expand beyond the limits of the viral expression achieved in said studies, as proposed by Huckstepp et al., (2015).”

      Throughout the manuscript, the authors have to be cautious when implying that an excitatory effect relates to the activity of rhythmogenic pFL neurons. For example, on line 710 the authors state that "it is conceivable to infer that the rostral pFL is in the closest proximity to the cells responsible for the generation of active expiration". While it may indeed be "conceivable", the bicuculline injections themselves provide no insights into the location of neurons responsible for rhythmogenesis. It is equally "conceivable" that the excited neurons provide a tonic drive to the neurons without being involved in the generation of active expiration. These tonic neurons could be located at a distance from the presumed rhythmogenic core.

      We have included the possibility of tonic excitation in the technical considerations section:

      “However, our study did not include recording from late-E neurons following bicuculline injections, preventing us from definitively confirming whether the observed activity stems from late-E neuronal excitation or the potentiation of a tonic drive, particularly in the rostral areas.”

      (5) It is intriguing that some of their injections (Fig.2D) evoked postinspiratory activity. This interesting finding should be discussed as it could provide important insights into the coordination of the different phases of expiration.

      Thanks for the suggestion. We have included the following to the discussion:

      “Therefore, although it has previously been described, the exact mechanism by which this post-I ABD activity is generated is unclear. This late-E/post-I pattern of activity is similar to what has been observed in in vitro preparations and in vivo recordings in juvenile rats (Janczewski et al., 2002; Janczewski et al., 2006).”

      “Therefore, although it has previously been described, the exact mechanism by which this post-I activity in the ABD muscles is generated is currently unknown. For example, the interplay between the rostral pFL and brainstem structures generating post-inspiratory activity, such as the proposed post-inspiratory oscillator (PiCo; Anderson et al., 2016) or pontine respiratory networks, could be reasonably involved in this process.”

      (6) The authors conducted bilateral disinhibition of the pFL, but only a unilateral photomicrograph was shown. Figure 2 should include a representative bilateral photomicrograph along with a scatter plot for clarity and completeness.

      We have edited figure 2 to include representative images of bilateral injections.

      (7) Regarding the Bicuculline injections in the Methods section: Aside from specifying exactly what type of bicuculline was used, the authors should provide more information about the pFL location and landmarks used, including the missing medial-lateral coordinate. The fluorobead spread of approximately ~300 µm, as observed in Figure 2C, is crucial for the interpretation of the results and should be detailed. An alternative approach could involve e.g. calculating the area covered by fluorobeads in each group.

      We have included the following in the text:

      “Each rat was injected at 2.8 mm lateral from the midline and at a specific RC coordinate based on the following groups: -0.2 mm from the caudal tip of the facial nucleus (VIIc) (n=5), +0.1 mm from VIIc (n=7), +0.4 mm from VIIc (n=5), +0.6 mm from VIIc (n=6), +0.8 mm from VIIc (n=5)”

      “These findings strongly suggest that bicuculline specifically activated cells within the vicinity of the injection sites which spread ~300 µm (Figure 2C, horizontal lines) and did not activate PHOX2B+ cells in the RTN area, beyond their baseline level of activity.”

      (8) In the Experimental Protocol, the authors should provide more details on how the parameters were determined. For example, specify the number of cycles included for Dia frequency/amplitude, Abd frequency/amplitude, and with regards to the averaging process, the authors should specify over how many cycles they obtained an average for Dia/Abd activity time and AUC. The authors should also provide information on the number of bicuculline injections that they repeated to average these values and they should report the coefficient of variation for repeated injections. Please clarify the method used to calculate AUC, considering the non-linear nature of the activity.

      Only one bicuculline injection per rat was performed and the number of rats used for each injection site is indicated in the methods as follows:

      “Each rat was injected at 2.8 mm lateral from the midline and at a specific RC coordinate based on the following groups: -0.2 mm from the caudal tip of the facial nucleus (VIIc) (n=5), +0.1 mm from VIIc (n=7), +0.4 mm from VIIc (n=5), +0.6 mm from VIIc (n=6), +0.8 mm from VIIc (n=5), and CTRL (n=7). We recorded the physiological responses to the injection for 20-25 min.”

      We have clarified in the methods section the following:

      “Respiratory data was tracked in time bins of 2-minute duration from the baseline period prior to injections and spanned 20 min of recording post-injection. Mean-cycle measurements for each signal were computed by averaging values across all cycles within a given time bin.”

      Additional clarifications have been added:

      “We then used the average calculations of respiratory rate (RR), tidal volume (VT), Minute Ventilation (Ve), expiratory ABD amplitude, expiratory ABD area, VO2, VE/VO2 to obtain values relative to the baseline period. Peak responses were identified as the time bin that produced the strongest changes relative to baseline.”

      “Mean-cycle measurements for each signal were computed by averaging across all cycles within a given time bin (~300 cycles in baseline, ~100 cycles per response time bin). We then used the average calculations of respiratory rate (RR), tidal volume (VT), Minute Ventilation (Ve), expiratory ABD amplitude, expiratory ABD area, VO2, VE/VO2 to obtain values relative to the baseline period. Peak responses were identified as the time bin that produced the strongest changes relative to baseline.”

      “The Area under the curve (AUC) was measured during baseline and was subtracted from the corresponding AUC of the response for each time bin (Figure 1C). This AUC measure was computed as the sum of the signal in a given respiratory phase as all signals were sampled at the same rate. Note that areas calculated below the zero (0) line, as would be expected from a negative airflow during expiration, yield negative AUC values.”
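
      The binning and AUC steps quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: "relative to baseline" is taken here to mean a ratio to the baseline mean, the peak response is taken as the bin with the largest absolute deviation from that ratio of 1, and all function names and example values are hypothetical.

```python
def bin_means(cycle_values_by_bin):
    """Mean-cycle value per time bin: average across all cycles in each bin."""
    return [sum(cycles) / len(cycles) for cycles in cycle_values_by_bin]


def relative_to_baseline(baseline_mean, response_means):
    """Express each response bin as a ratio to the baseline mean (assumed convention)."""
    return [m / baseline_mean for m in response_means]


def peak_bin(relative_values):
    """Index of the bin with the strongest change, i.e. furthest from a ratio of 1."""
    return max(
        range(len(relative_values)),
        key=lambda i: abs(relative_values[i] - 1.0),
    )


def phase_auc(samples):
    """AUC of one respiratory phase as a plain sum of samples.

    A plain sum is comparable across signals only because all signals share
    the same sampling rate; samples below zero contribute negative area.
    """
    return sum(samples)


def auc_change(baseline_phase, response_phase):
    """Baseline-subtracted AUC for one phase in one time bin."""
    return phase_auc(response_phase) - phase_auc(baseline_phase)


# Hypothetical per-cycle tidal volumes for three response bins
response = bin_means([[2.0, 2.0], [3.5, 4.5], [2.5, 3.5]])
relative = relative_to_baseline(2.0, response)
print(relative, peak_bin(relative))

# Hypothetical late-expiratory airflow samples (negative = below the zero line)
print(auc_change([0.0, -0.1, -0.1], [0.0, -0.5, -0.5]))
```

      With this convention, a more negative baseline-subtracted AUC in the late-expiratory phase of the airflow signal corresponds to stronger active expiration.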

      (9) The authors should explain how oxygen consumption was calculated-did it involve the Depocas & Hart (1957) formula? Please provide information on expiratory CO2, whether ventilation was adjusted to achieve consistent CO2 levels across animals, and ideally specify the end-tidal CO2 range for the experiments. Discuss the rationale behind the chosen CO2 levels and whether CO2-dependent pFL activity could have influenced results.

      We have clarified the measurement in the methods as follows:

      “The gas analyzer measured fractional concentration of O2. Based on this and the flow rate at the level of the trachea (minute ventilation), we calculated O2 consumption according to Depocas and Hart (1957).”

      We have also added to the methods section:

      “During the entire experimental procedure, rats breathed spontaneously and end tidal CO2 was not adjusted through the experimental protocol.”

      In terms of CO2-dependent pFL activity possibly influencing the results: by inducing active expiration in conditions in which there is no physiological demand for it (i.e. no hypoxia or hypercapnia), pCO2 is likely reduced, which decreases the overall drive for ABD activity. Our results are therefore likely an underestimation of the response that would have been produced had we maintained CO2 levels constant.

      (10) The authors should address the discrepancy in fos-activated neurons between the control (44 neurons) and experimental animals (90-120 neurons per hemisection). Please explain the activation in the control group. Please also provide insights into how the authors interpret this difference in cfos-activated neurons between control and experimental groups.

      The following paragraph has been added to the discussion:

      “The assessment of cellular activity, quantified through cFos staining, unveiled the existence of basal activity in control rats. This observed baseline activity is likely emanating from subthreshold physiological processes within the parafacial area which do not culminate in ABD activity. Analysis of the cFos staining confirmed focal activation of neurons in the pFL of rats injected with bicuculline and minimal cFos expression in the PHOX2B+ cells in all groups as compared to the control group. These results confirm the very limited mediolateral spread of the drug from the core site of injection and back previous findings supporting the hypothesis that the majority of PHOX2B+ cells are more ventrally located in the parafacial area (pFV, Huckstepp et al., 2015) and PHOX2B+ cell recruitment is not necessary for active expiration (de Britto & Moraes, 2017; Magalhães et al., 2021).”

      (11) In Figure 8, the authors plotted the relationship of each cycle correlated to the normalized area. Have you also calculated the same late-E, inspiratory, and post-I to fR or VT separately?

      No, we only did the separated breathing phase (late-E, I, Post-I) analysis in the calculations of the DIA, airflow and ABD area, as well as on the Euclidean and Mahalanobis distances.

      Minor comments:

      Is there any specific reason for conducting these experiments exclusively in males?

      No, we usually use male rats for this type of experiment. We use both male and female rats for other studies that concern the effects of sex hormones, but in this case we performed experiments only in male rats.

      Page 13, Line 320: What is the duration of the bicuculline-induced effects?

      This information is included in the results section as follows:

      “Similarly, the ABD response duration was longer at the two most rostral locations (+0.6 mm = 17.6 ± 2.7 min; +0.8 = 17.1 ± 3.3 min) compared to the most caudal group (-0.2 mm = 2.4 ± 1.1 min; One-Way ANOVA p = 0.043; Tukey -0.2 mm vs +0.6 mm: p = 0.048; -0.2 mm vs +0.8 mm: p = 0.041; Figure 3E).”

      Page 16, Line 400: Is there a rationale for the high tidal volume (VT) observed in these animals? A baseline VT of 7 ml/kg appears notably elevated.

      Please note that rats were vagotomised and spontaneously breathing, hence the tidal volume is increased compared to non-vagotomised rats as seen in previous studies (Ouahchi et al., 2011).

      Figure 2D: Could you provide longer recordings? Additionally, incorporating diaphragm (Dia) recordings would enhance the interpretation of abdominal (Abd) recordings.

      Figure 3A shows a representative example of the 20-minute recordings for each location.

      Page 18, Line 458: Please rectify "Dunn: p , 0.001" to the appropriate format, perhaps "Dunn: p < 0.001."

      Thank you, edited.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This fundamental study investigates the transcriptional changes in neurons that underlie loss of learning and memory with age in C. elegans, and how cognition is maintained in insulin/IGF-1-like signaling mutants. The presented evidence is compelling, utilizing a cutting-edge method to isolate neurons from worms for genomics that is clearly conveyed with a rigorous experimental approach. Overall, this study supports that older daf-2 worms maintain cognitive function via mechanisms that are unique from younger wild type worms, which will be of great interest to neuroscientists and researchers studying ageing.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors perform RNA-seq on FACS isolated neurons from adult worms at days 1 and 8 of adulthood to profile the gene expression changes that occur with cognitive decline. Supporting data are included indicating that by day 7 of adulthood, learning and memory are reduced, indicating that this timepoint or after represents cognitively aged worms. Neuronal identity genes are reduced in expression within the cognitively aged worms, whereas genes involved in proteostasis, transcription/chromatin, and the stress response are elevated. A number of specific examples are provided, representing markers of specific neuronal subtypes, and correlating expression changes to the erosion of particular functions (e.g. motor neurons, chemosensory neurons, aversive learning neurons, etc).

      To investigate whether upregulation of genes in neurons with age is compensatory or deleterious, the authors reduced expression of a set of three significantly upregulated genes and performed behavioral assays in young adults. In each case, reduction of expression improved memory, consistent with a model in which age-associated increases impair neuronal function.

      The authors then characterize learning and memory in wild type, daf-2, and daf-2/daf-16 worms with age and find that daf-2 worms have an extended ability to learn for approximately 10 days longer than wild types. This was daf-16 dependent. Memory was extended in daf-2 as well, and strikingly, daf-2;daf-16 had no short-term memory even at day 1. Transcriptomic analysis of FACS-sorted neurons was performed on the three groups at day 8. The authors focus their analysis on daf-2 vs. daf-2;daf-16 and present evidence that daf-2 neurons express a stress-resistance gene program. They also find small differences between the N2 and daf-2;daf-16 neurons, which correlate with the observed behavioral differences, though these differences are modest.

      The authors tested eight candidate genes that were more highly expressed in daf-2 neurons vs. daf-2;daf-16 and showed that reduction of 2 and 5 of these genes impaired learning and memory, respectively, in daf-2 worms. This finding implicates specific neuronal transcriptional targets of IIS in maintaining cognitive ability in daf-2 with age, which, importantly, are distinct from those in young wild type worms.

      Overall, this is a strong study with rigorously performed experiments. The authors achieved their aim of identifying transcriptional changes in neurons that underlie loss of learning and memory in C. elegans, and how cognition is maintained in insulin/IGF-1-like signaling mutants. 

      We thank you for the evaluation and response.

      Reviewer #2 (Public Review):

      Weng et al. perform a comprehensive study of gene expression changes in young and old animals, in wild-type and daf-2 insulin receptor mutants, in the whole animal and specifically in the nervous system. Using this data, they identify gene families that are correlated with neuronal ageing, as well as a distinct set of genes that are upregulated in neurons of aged daf-2 mutants. This is particularly interesting as daf-2 mutants show both extended lifespan and healthier neurons in aged animals, reflected by better learning/memory in older animals compared with wild-type controls. Indeed, knockdown of several of these upregulated genes resulted in poorer learning and memory. In addition, the authors showed that several genes upregulated during ageing in wild-type neurons also contribute to learning and memory; specifically, knockdown of these genes in young animals resulted in improved memory. This indicates that (at least in this small number of cases), genes that show increased transcript levels with age in the nervous system somehow suppress memory, potentially by having damaging effects on neuronal health.

      Finally, from a resource perspective, the neuronal transcriptome provided here will be very useful for C. elegans researchers as it adds to other existing datasets by providing the transcriptome of older animals (animals at day 8 of adulthood) and demonstrating the benefits of performing tissue-specific RNAseq instead of whole-animal sequencing.

      The work presented here is of high quality and the authors present convincing evidence supporting their conclusions. I only have a few comments/suggestions:

      (1) Do the genes identified to decrease learning/memory capacity in daf-2 animals (Figure 4d/e) also impact neuronal health? daf-2 mutant worms show delayed onset of age-related changes to neuron structure (Tank et al., 2011, J Neurosci). Does knockdown of the genes shown to affect learning also affect neuron structure during ageing, potentially one mechanism through which they modulate learning/memory? 

      (2) The learning and memory assay data presented in this study uses the butanone olfactory learning paradigm, which is well established by the same group. Have the authors tried other learning assays when testing for learning/memory changes after knockdown of candidate genes? Depending on the expression pattern of these genes, they may have more or less of an effect on olfactory learning versus for e.g. gustatory or mechanosensory-based learning.

      (3) A comment on the 'compensatory vs dysregulatory' model as stated by the authors on page 7 - I understand that this model presents the two main options, but perhaps this is slightly too simplistic: gene expression that rises during ageing may be detrimental for memory (= dysregulatory), but at the same time may also be beneficial for other physiological roles in other tissues (= compensatory).

      Thank you for your original suggestions; we addressed them in the previous version of response to the reviewers.

      Comments on revised version:

      I am satisfied with how the authors have addressed all my comments/suggestions. 

      Thank you for your response!

      Reviewer #3 (Public Review):

      Summary

      In this manuscript, Weng et al. identify the neuron specific transcriptome that impacts age dependent cognitive decline. The authors design a pipeline to profile neurons from wild type and long-lived insulin receptor/IGF-1 mutants using timepoints when memory functions are declining. They discover signatures unique to neurons which validates their approach. The authors identify that genes related to neuronal identity are lost with age in wild type worms. For example, old neurons reduce the expression of genes linked to synaptic function and neuropeptide signaling and increase the expression of chromatin regulators, insulin peptides and glycoproteins. Depletion of selected genes which are upregulated in old neurons (utx-1, ins-19 and nmgp-1) leads to improved short memory function. This indicates that some genes that increase with age have detrimental effects on learning and memory. The pipeline is then used to test neuronal profiles of long-lived insulin/IGF-1 daf-2 mutants. Genes related to stress response pathways are upregulated in long lived daf-2 mutants (e.g. dod-24, F08H9.4) and those genes are required for improved neuron function.

      Strengths

      The manuscript is well written, and the experiments are well described. The authors take great care to explain their reasoning for performing experiments in a specific way and guide the reader through the interpretation of the results, which makes this manuscript an enjoyable and interesting read. The authors discover novel regulators of learning and memory using neuron-specific transcriptomic analysis in aged animals, which underlines the importance of cell specific deep sequencing. The timepoints of the transcriptomic profiling are elegantly chosen, as they coincide with the loss of memory and can be used to specifically reveal gene expression profiles related to neuron function. The authors discuss on the dod-24 example how powerful this approach is. In daf-2 mutants whole-body dod-24 expression differs from neuron specific profiles, which underlines the importance of precise cell specific approaches. This dataset will provide a very useful resource for the C. elegans and aging community as it complements existing datasets with additional time points and neuron specific deep profiling.

      Weakness

      This study nicely describes the neuron specific profiles of aged long-lived daf-2 mutants. Selected neuronal genes that were upregulated in daf-2 mutants (e.g. F08H9.4, mtl-1, dod-24, alh-2, C44B7.5) decreased learning/memory when knocked down. However, the knock down of these genes was not specific to neurons. The authors use a neuron-sensitive RNAi strain to address this concern and acknowledge this caveat in the text. While it is likely that selected candidates act only in neurons it is possible that other tissues participate as well.

      Thank you for pointing this caveat out. We have mentioned it in the figure legend.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      This computational modeling study builds on multiple previous lines of experimental and theoretical research to investigate how a single neuron can solve a nonlinear pattern classification task. The authors construct a detailed biophysical and morphological model of a single striatal medium spiny neuron, and endow excitatory and inhibitory synapses with dynamic synaptic plasticity mechanisms that are sensitive to (1) the presence or absence of a dopamine reward signal, and (2) spatiotemporal coincidence of synaptic activity in single dendritic branches. The latter coincidence is detected by voltage-dependent NMDA-type glutamate receptors, which can generate a type of dendritic spike referred to as a "plateau potential." The proposed mechanisms result in moderate performance on a nonlinear classification task when specific input features are segregated and clustered onto individual branches, but reduced performance when input features are randomly distributed across branches. Given the high level of complexity of all components of the model, it is not clear which features of which components are most important for its performance. There is also room for improvement in the narrative structure of the manuscript and the organization of concepts and data.

      To begin with, we will better explain the goal of the study in the introduction and explain that it relies on earlier theoretical work. The goal of the study was to investigate whether and how detailed neuron models with biologically-based morphologies, membrane properties, ion channels, dendritic nonlinearities, and biologically plausible learning rules can quantitatively account for the theoretical results obtained with more abstract models.

      We will further evaluate and clarify the roles of several components in our model regarding their impact on the results. These include a) the role of sufficiently robust and supralinear plateau potentials in computing the NFBP; and b) the importance of metaplasticity for individual synapses, allowing them to start or stop responding to relevant or irrelevant stimuli, respectively, over the training period.

      Strengths:

      The integrative aspect of this study is its major strength. It is challenging to relate low-level details such as electrical spine compartmentalization, extrasynaptic neurotransmitter concentrations, dendritic nonlinearities, spatial clustering of correlated inputs, and plasticity of excitatory and inhibitory synapses to high-level computations such as nonlinear feature classification. Due to high simulation costs, it is rare to see highly biophysical and morphological models used for learning studies that require repeated stimulus presentations over the course of a training procedure. The study aspires to prove the principle that experimentally-supported biological mechanisms can explain complex learning.

      Weaknesses:

      The high level of complexity of each component of the model makes it difficult to gain an intuition for which aspects of the model are essential for its performance, or responsible for its poor performance under certain conditions. Stripping down some of the biophysical detail and comparing it to a simpler model may help better understand each component in isolation. That said, the fundamental concepts behind nonlinear feature binding in neurons with compartmentalized dendrites have been explored in previous work, so it is not clear how this study represents a significant conceptual advance. Finally, the presentation of the model, the motivation and justification of each design choice, and the interpretation of each result could be restructured for clarity to be better received by a wider audience.

      To achieve the goal of the study as described above, we chose to use a biophysically and morphologically detailed neuron model to see if it could quantitatively account for the theoretically-based nonlinear computations, for instance, those discussed in Tran-Van-Minh, A. et al. (2015).

      We will explain the role of each component of the learning rule, as well as the dendritic nonlinearities, for the performance on the NFBP.
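
      For readers unfamiliar with the NFBP, the following is a minimal illustrative sketch, not the model used in the study. It assumes four hypothetical binary features (A, B, C, D), where the pairs (A, B) and (C, D) should trigger a response and the cross pairs (A, D) and (C, B) should not. It contrasts a single linear threshold unit, which cannot satisfy all four conditions, with a toy two-branch unit in which each branch applies an all-or-none step as a crude stand-in for a plateau potential. All names, thresholds, and the weight grid are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical feature names; the actual stimuli in the study are not assumed here.
FEATURES = ("A", "B", "C", "D")

# Relevant pairs (target 1) and irrelevant cross pairs (target 0).
PATTERNS = {
    ("A", "B"): 1,
    ("C", "D"): 1,
    ("A", "D"): 0,
    ("C", "B"): 0,
}


def encode(pair):
    """Binary input vector with a 1 for each feature present in the pair."""
    return [1.0 if f in pair else 0.0 for f in FEATURES]


def linear_neuron(x, w, theta):
    """Point neuron: fires if the weighted sum of inputs exceeds theta."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) > theta)


def branch(x, idx, theta):
    """Toy dendritic branch: all-or-none output once local input reaches theta."""
    return 1.0 if sum(x[i] for i in idx) >= theta else 0.0


def two_branch_neuron(x):
    """Features A,B cluster on one branch and C,D on the other (an assumption)."""
    b1 = branch(x, (0, 1), 2.0)
    b2 = branch(x, (2, 3), 2.0)
    return int(b1 + b2 >= 1.0)


def solves(neuron):
    """True if the neuron gives the target output for all four patterns."""
    return all(neuron(encode(p)) == t for p, t in PATTERNS.items())


def linear_solution_exists(grid):
    """Brute-force search over weights and threshold drawn from grid."""
    for *w, theta in product(grid, repeat=5):
        if solves(lambda x: linear_neuron(x, w, theta)):
            return True
    return False


# Coarse grid from -2.0 to 2.0 in steps of 0.5 (illustrative only).
GRID = [i * 0.5 for i in range(-4, 5)]

if __name__ == "__main__":
    print("two-branch unit solves NFBP:", solves(two_branch_neuron))
    print("linear unit solves NFBP on grid:", linear_solution_exists(GRID))
```

      The grid search is only a demonstration. The linear case fails for any weights, because the two target conditions require the summed weights to exceed twice the threshold while the two cross-pair conditions require the same sum to be at most twice the threshold.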

      Reviewer #2 (Public Review):

      Summary:

      The study explores how single striatal projection neurons (SPNs) utilize dendritic nonlinearities to solve complex integration tasks. It introduces a calcium-based synaptic learning rule that incorporates local calcium dynamics and dopaminergic signals, along with metaplasticity to ensure stability for synaptic weights. Results show SPNs can solve the nonlinear feature binding problem and enhance computational efficiency through inhibitory plasticity in dendrites, emphasizing the significant computational potential of individual neurons. In summary, the study provides a more biologically plausible solution to single-neuron learning and gives further mechanical insights into complex computations at the single-neuron level.

      Strengths:

      The paper introduces a novel learning rule for training a single multicompartmental neuron model to perform nonlinear feature binding tasks (NFBP), highlighting two main strengths: the learning rule is local, calcium-based, and requires only sparse reward signals, making it highly biologically plausible, and it applies to detailed neuron models that effectively preserve dendritic nonlinearities, contrasting with many previous studies that use simplified models.

      Indeed, the learning rule is local and reward-based, and we will highlight better in the paper that it is “always on”, i.e. there are no separate training and testing phases.

      Weaknesses:

      I am concerned that the manuscript was submitted too hastily, as evidenced by the quality and logic of the writing and the presentation of the figures. These issues may compromise the integrity of the work. I would recommend a substantial revision of the manuscript to improve the clarity of the writing, incorporate more experiments, and better define the goals of the study.

      We will revise the manuscript thoroughly to better present the figures and writing (more detailed below). We will also show supplementary figures showcasing the role of the different components of the learning rule.

      Major Points:

      (1) Quality of Scientific Writing: The current draft does not meet the expected standards. Key issues include:

      i. Mathematical and Implementation Details: The manuscript lacks comprehensive mathematical descriptions and implementation details for the plasticity models (LTP/LTD/Meta) and the SPN model. Given the complexity of the biophysically detailed multicompartment model and the associated learning rules, the inclusion of only nine abstract equations (Eq. 1-9) in the Methods section is insufficient. I was surprised to find no supplementary material providing these crucial details. What parameters were used for the SPN model? What are the mathematical specifics for the extra-synaptic NMDA receptors utilized in this study? For instance, Eq. 3 references [Ca2+]-does this refer to calcium ions influenced by extra-synaptic NMDARs, or does it apply to other standard NMDARs? I also suggest the authors provide pseudocodes for the entire learning process to further clarify the learning rules.

      The detailed setup of the model, including equations and parameter values, is described in the referenced papers, and the model is downloadable from GitHub. For this reason we did not repeat the information here. That said, we will go through the manuscript and clarify all details, and provide supplemental figures and a GitHub link where necessary for reproducing the results.

      ii. Figure quality. The authors seem not to carefully typeset the images, resulting in overcrowding and varying font sizes in the figures. Some of the fonts are too small and hard to read. The text in many of the diagrams is confusing. For example, in Panel A of Figure 3, two flattened images are combined, leading to small, distorted font sizes. In Panels C and D of Figure 7, the inconsistent use of terminology such as "kernels" further complicates the clarity of the presentation. I recommend that the authors thoroughly review all figures and accompanying text to ensure they meet the expected standards of clarity and quality.

      We will revise the figures for consistency and clarity.

      iii. Writing clarity. The manuscript often includes excessive and irrelevant details, particularly in the mathematical discussions. On page 24, within the "Metaplasticity" section, the authors introduce the biological background to support the proposed metaplasticity equation (Eq. 5). However, much of this biological detail is hypothesized rather than experimentally verified. For instance, the claim that "a pause in dopamine triggers a shift towards higher calcium concentrations while a peak in dopamine pushes the LTP kernel in the opposite direction" lacks cited experimental evidence. If evidence exists, it should be clearly referenced; otherwise, these assertions should be presented as theoretical hypotheses. Generally, Eq. 5 and related discussions should be described more concisely, with only a loose connection to dopamine effects until more experimental findings are available.

      The reviewer is correct; the cited text does not present experimental facts but rather illustrates how the learning rule operates. We will revise the section on the construction of learning rules to clarify which aspects are explicit assumptions and which are experimentally verified. In particular, we will provide a more detailed description of, and motivation for, metaplasticity.

      (2) Goals of the Study: The authors need to clearly define the primary objective of their research. Is it to showcase the computational advantages of the local learning rule, or to elucidate biological functions?

      Briefly, the goal of the study was to investigate whether earlier theoretical results with more abstract models can be quantitatively recapitulated in morphologically and biophysically detailed neuron models with dendritic nonlinearities and with biologically based learning rules (see also our response to the Summary and Weaknesses of Reviewer #1). We will update the introduction with this information.

      i. Computational Advantage: If the intent is to demonstrate computational advantages, the current experimental results appear inadequate. The learning rule introduced in this work can only solve for four features, whereas previous research (e.g., Bicknell and Hausser, 2021) has shown capability with over 100 features. It is crucial for the authors to extend their demonstrations to prove that their learning rule can handle more than just three features. Furthermore, the requirement to fine-tune the midpoint of the synapse function indicates that the rule modifies the "activation function" of the synapses, as opposed to merely adjusting synaptic weights. In machine learning, modifying weights directly is typically more efficient than altering activation functions during learning tasks. This might account for why the current learning rule is restricted to a limited number of tasks. The authors should critically evaluate whether the proposed local learning rule, including meta-plasticity, actually offers any computational advantage. This evaluation is essential to understand the practical implications and effectiveness of the proposed learning rule.

      As mentioned above, our intent is not to demonstrate the computational advantages of the proposed learning rule but to investigate and illustrate how biophysically detailed neuron models that also display dendritic plateau potential mechanisms, together with biologically-based learning rules, can support the theoretically predicted computational requirements for complex neuronal processing (e.g., Tran-Van-Minh, A. et al., 2015), as well as the results obtained with more abstract neuron models and plateau potential mechanisms (e.g., Schiess et al., 2016; Legenstein and Maass, 2011).

      In the revised manuscript, we will also discuss the differences between the supervised learning rule in Bicknell and Hausser (2021) and our local and reward-based learning rule. We will also show a critical evaluation of how our local learning rule and metaplasticity affect the synaptic weights and why the different components of the rule are needed.

      ii. Biological Significance: If the goal is to interpret biological functions, the authors should dig deeper into the model behaviors to uncover their biological significance. This exploration should aim to link the observed computational features of the model more directly with biological mechanisms and outcomes.

      We will make an attempt to better link the learning rule and dendritic supra-linearities and interpret their biological function.

    1. Author response:

      eLife assessment

      “…The evidence however is incomplete, since the tai loss-of-clone phenotype is based on one allele and the mechanism involved in cell competition through Dlp and Wg lacks adequate supporting data.”

      We agree with the need for a second allele and are adding supporting data from a new tai lof allele we have generated by Crispr.

      We also agree that additional functional data would help demonstrate that differences in Dlp levels are required for the mechanism of Tai cell competition. Experiments are ongoing to test whether normalizing Dlp levels across clonal boundaries rescues elimination of Tai-low clones.

      Reviewer #1:

      Overall Statements:

      “There is some data in the supplementary materials suggesting that Tai promotes dlp mRNA expression, but this was not compelling.”

      We are currently testing effects of Tai on dlp and dally transcription using qPCR and reporter transgenes. As noted below, the effects of Tai on Dlp trafficking are ‘strong’, so resolving effects on Dlp transcription will complement this localization data.

      “The authors don't further examine Dlp protein in tai clones.”

      As noted by the Reviewer, we do examine Dlp levels and localization in tai-low clones (see Figure 9), but these experiments are challenging due to their very small size and the hypomorphic nature of the tai allele (tai[k15101]) that was used. Experiments are in progress to examine the effect of our Crispr null allele of tai on Dlp levels and localization in wing clones.

      “In sum, the authors have uncovered some interesting results, but the story has some unresolved issues that, if addressed, could boost its impact. Additionally, the preprint seems to have 2 stories, one about tai and cell competition and the other about tai and Wg distribution. It would be helpful to reorder the figures and improve the narrative so that these are better integrated with each other.”

      We agree. The results of our modifier screen required that we first understand how Tai regulates the Wg pathway before we could apply this to understanding the competitive mechanism. Thus, the paper is composed of three sections: 1. the screen, 2. the Tai-Dlp-Wg connection in the absence of competition, and 3. the contribution of Dlp-Wg to the tai[low] ‘loser’ phenotype. These sections use different techniques (e.g., clonal mosaics with genomic alleles, Gal4/UAS and RNAi to define the effect of Tai loss on Wg and Dlp). Ongoing experiments return to clonal mosaics to test whether elevating Dlp can rescue tai lof clones in the same manner as Apc/Apc2 alleles (see Figs. 2-3), which elevate Wg pathway activity.

      Specifics:

      “It would be good to know whether the authors can rescue tai-low clones by over-expression UAS-Dlp.”

      As noted above, experiments are ongoing to test whether normalizing Dlp levels across clonal boundaries rescues elimination of Tai-low clones.

      “The data on Wg distribution seems disjointed from the data about cell competition. The authors could refocus the paper to emphasize the cell competition story. The role of Dlp in Wg distribution is well established, so the authors could remove or condense these results. The story really could be Figs 1, 2, 3 and 7 and keep the paper focused on cell competition. The authors could then discuss Dlp as needed for Wg signaling transduction, which is already established in the literature.”

      We appreciate the suggestion to reorganize the figures to focus the first part of the story on competition, and then follow with the role of Tai in controlling Dlp. We will consider this approach pending the results of ongoing experiments.  

      “The model of tai controlling dlp mRNA and Dlp protein distribution is confusing. In fact, the data for the former is weak, while the data for the latter is strong. I suggest that the authors focus on the altered Dlp protein distribution on tai-low clones. It would also be helpful to prove the Wg signaling is impeded in tai clones (see #5 below).”

      We agree, but we are currently testing how dlp reporters and mRNA respond to Tai in order to rigorously test a Dlp transcriptional mechanism. To complement the ‘strong’ evidence that Tai regulates Dlp distribution, we are testing Dlp in clones of our Tai Crispr null. Since submission, we have also assessed the effect of blocking the endocytic factor shibire/dynamin on Dlp distribution in Tai-deficient cells to complement the data on Pentagone that is already in the paper (see Fig. S3).

      “I don't know if the Fz3-RFP reported for Wg signaling works in imaginal discs, but if it does then the authors could make clones in this background to prove that cell-autonomous Wg signaling is reduced in tai-low clones.”

      We thank the reviewer for this suggestion, which we are now testing.

      Reviewer #2

      Overall Comments:

      “While the authors present good evidence in support of most of their conclusions, there are alternative explanations in many cases that have not been excluded.”

      We appreciate this point and are conducting experiments for a revised submission that will help test alternative mechanisms and clarify our conclusions.

      Specifics:

      “However, the experiments have been done with a single allele, and these experiments do not exclude the possibility that there is another mutation on the same chromosome arm that is responsible for the observed phenotype. Since the authors have a UAS-tai stock, they could strengthen their results using a MARCM experiment where they could test whether the expression of UAS-tai rescues the elimination of tai mutant clones. Alternatively, they could use a second (independent) allele to demonstrate that the phenotype can be attributed to a reduction in tai activity.”

      As noted above, we agree with the need for a second allele and are adding supporting data from a new tai lof allele we have generated by Crispr.

      The tai[k15101] allele acts as a tai hypomorph and has been shown to produce weaker phenotypes than the strong lof allele 61G1 in a number of papers (Bai et al., 2000; König et al., 2011; Luo et al., 2019; Zhang et al., 2015). We agree that rescue of tai[k15101] with a UAS-Tai transgene would help rule out effects of second-site mutations. We are currently pursuing the reviewer’s second suggestion of phenocopy with a different allele, our new tai Crispr lof.

      “The authors have screened a total of 21 chromosomes for modification and have not really explained which alleles are nulls and which are hypomorphs. The nature of each of the alleles screened needs to be explained better.”

      We will update the text to better reflect what type of alleles were chosen. In most cases we preferred amorphs or null alleles over hypomorphs, however when the amorph option was not available, we used hypomorphs.

      “Also, the absence of a dominant modification does not necessarily exclude a function of that gene or pathway in the process. This is especially relevant for the Spz/Toll pathway which the authors have previously implicated in the ability of tai-overexpressing cells to kill wild-type cells.”

      We thank the reviewer for this completely accurate point. The dominant screen does not rule out effects of other pathways such as Spz/Toll. Indeed, we were surprised by the lack of dominant effects by Spz/Toll alleles on tai[low] competition given our prior work. The reciprocally clear dominant effect of Apc/Apc2 led us to consider that Wg signaling plays a role in this phenomenon, which then became the starting point of this study.

      “The most important discovery from this screen is the modification by the Apc alleles. This part of the paper would be strengthened by testing for modification by other components of the Wingless pathway. The authors show modification by Apc[MI01007] and the double mutant Apc[Q8] Apc2[N175A]. Without showing the Apc[Q8] and Apc2[N175A] alleles separately, it is hard to know if the effect of the double mutant is due to Apc, Apc2, or the combination.”

      We agree that testing for modification with other components of the Wg pathway would help strengthen the connection between Tai-low clonal elimination and Wg pathway biology. We also agree that separating Apc[Q8] and Apc2[N175A] would be a good way to check whether both Apc proteins are equally important for rescuing Tai-low cell death, and future experiments in the lab could investigate this distinction.

      “RNAi of tai seems to block the formation of the Wg gradient. If so, one might expect a reduction in wing size. Indeed, this could explain why the wings of tai/Df flies are smaller. The authors mention briefly that the posterior compartment size is reduced when tai-RNAi is expressed in that compartment. However, this observation merits more emphasis since it could explain why tai/Df flies are smaller (Are their wings smaller?).”

      We agree that this is an exciting possibility. Growth effects of Tai linked to interactions with Yorkie and EcR could be due to a distinct role in promoting Wg activity. Alternatively, Tai may cooperate with Yorkie or EcR to control the Wg pathway. We are pursuing both possibilities in future work.

      With regard to the “small size” effect of reducing Tai, we have previously shown that RNAi of Tai using engrailed-Gal4 causes the posterior compartment to shrink (Zhang et al. 2015, Figure 1C-F, H). In that paper, we also showed that tai[k15101]/Df animals are proportionally smaller than wildtype animals and quantified this by measuring 2D wing size (Zhang et al. 2015, Figure 1A and 1B).

      “In Figure 7, the authors show the effect of manipulating Tai levels alone or in combination with increasing Dlp levels. However, they do not include images of Wg protein distribution upon increasing Dlp levels alone.”

      We thank the reviewer for this reminder and have already generated these control images for inclusion in a revised submission.

      “In Figure 8, there is more Wg protein both at the DV boundary and spreading when tai is overexpressed in the source cells using bbg-Gal4. However, in an earlier experiment (Figure 5C) they show that the wg-lacZ reporter is downregulated at the DV boundary when tai is overexpressed using en-Gal4. They therefore conclude that wg is not transcriptionally upregulated but is, instead, secreted at higher levels when tai is expressed in the source cells. Wg protein is reduced in the DV stripe when tai is overexpressed using the en-Gal4 driver (Figure 6B') and is increased at the same location when tai is overexpressed with the bbg-Gal4 driver (Figure 8). I don't know how to reconcile these observations.”

      We thank the reviewer for pressing us to develop an overall model explaining our results and how we envision Tai regulating Dlp and Wg. We are preparing a graphical abstract that illustrates this model, which will be included in our revision.

      Briefly, we favor a model in which Tai controls the rate of Wg spread via Dlp, without a significant effect on wg transcription. For example, the induction of Dlp across the ‘engrailed’ domain of en>Tai discs (Fig 7B-B”) allows Wg to spread rapidly across the flanks and moderately depletes it from the DV margin (Fig 6B-B”), as noted by the reviewer. Adding a UAS-Dlp transgene in the en>Tai background dramatically accelerates Wg spread and causes it to be depleted from the DV margin and build up at the far end of the gradient adjacent to the dorsal and ventral hinge. Significantly, blocking endocytosis of Wg in en>Tai discs with a dominant negative shibire transgene also causes Wg to build up in the same location (new data to be added in a revision), consistent with enhanced spreading. The difference in the bbg-Gal4 experiment is that Tai is only overexpressed in DV margin cells, which constrains and concentrates Wg within this restricted domain; we are in the process of testing whether this effect on Wg is blocked by RNAi of Dlp in bbg>Tai discs.

      “In Figure 9, the tai-low clones have elevated levels of Dlp. How can this be reconciled with the tai-RNAi knockdown shown in Figure 7C' where reducing tai levels causes a strong reduction in Dlp levels?”

      We apologize for not explaining these data well enough. First, the tai[k15101] allele is a weak, viable hypomorph (as shown in our Zhang et al, 2015 paper) whereas the Tai RNAi line is lethal with most drivers (including en-Gal4) and thus a stronger lof. Second, Tai RNAi lowers Dlp levels (Fig 7C) while tai[k15101] causes Dlp to accumulate intracellularly (see Fig. 9A-C). These data indicate that reduced Tai leads to a defect in Dlp intracellular trafficking while its loss reduces overall Dlp levels; these data can be explained by a single role for Tai in Dlp traffic to or from the cell membrane, or by two roles, one in trafficking and one in Dlp expression. As noted, we are investigating both possibilities using dlp reporter lines and our new tai null CRISPR allele.

      Reviewer #3:

      Overall Weaknesses:

      “The study has relatively weak evidence for the mechanism of cell competition mediated by Dlp and Wg.”

      The screen and middle section of the paper provide genetic evidence that elevating Wg pathway activity rescues tai[low] loser cells and that Tai controls levels/localization of Dlp and distribution of Wg in the developing wing disc. Our current work is focused on linking these two findings together in Tai “loser” clones.

      “More evidence is required to support the claim that dlp transcription or endocytosis is affected in tai clones.”

      As noted above, we are testing whether normalizing Dlp levels across clonal boundaries rescues tai[low] loser clones and assessing effects of Tai on dlp transcription and Dlp trafficking.

      Specifics:

      “Most of the rest of the study is not in the clonal context, and mainly relies on RNAi KD of tai in the posterior compartment, which is a relatively large group of cells. I understand why the authors chose a different approach to investigate the role of tai in cell competition. However because ubiquitous loss of tai results in smaller organs, it is important to determine to what extent reducing levels of tai in the entire posterior compartment compares with clonal elimination i.e. cell competition. This is important in order to determine to what extent the paradigm of Tai-mediated regulation of Dlp levels and by extension, Wg availability, can be extended as a general mechanism underlying competitive elimination of tai-low clones. If the authors want to make a case for mechanisms involved in the competitive elimination of tai clones, then they need to show that the KD of tai in the posterior compartment shows hallmarks of cell competition. Is there cell death along the A/P boundary? Or is the compartment smaller because those cells are growing slower?”

      Based on data that cell competition does not occur over compartment boundaries (e.g., see review by L.A. Johnston, Science, 2009), we chose not to use UAS-Gal4 to assess competition, but rather to investigate underlying biology occurring between Tai, Wg, and Dlp.

      “Are the levels of Myc/DIAP1, proteins required for fitness, affected in en>tai RNAi cells?”

      This is, of course, an interesting question given that Myc is a well-studied competition factor and is proposed to be downstream of the Tai-interacting protein Yki. We are not currently focused on Myc, but plan to test its role in the Tai-Dlp-Wg pathway in future work.

      “The authors do not have direct/strong evidence of changes in dlp mRNA levels or intracellular trafficking. To back these claims, the authors should look for dlp mRNA levels and provide more evidence for Dlp endocytosis like an antibody uptake assay or at the very least, a higher resolution image analysis showing a change in the number of intracellular Dlp positive punctae. Also, do the authors think that loss of tai increases Dlp endocytosis, making it less available on the cell surface for maintaining adequate extracellular Wg levels?”

      As noted above, we have added experiments using a dominant-negative shibire/dynamin allele to test whether Tai controls Dlp endocytosis. These data will be added to a revised manuscript. We have also gathered reagents to test effects of Tai gain/loss on Dlp secretion.

      “The data shown in the last figure is at odds with the model (I think) the authors are trying to establish: When cells have lower Tai levels, this reduces Dlp levels (S2) presumably either by reducing dlp transcription and/or increasing (?) Dlp endocytosis. This in turn reduces Wg (availability) in cells away from source cells (Figure 6). The reduced Wg availability makes them less fit, targeting them for competitive elimination. But in tai clones, I do not see any change in cell-surface Dlp (9B) (I would have expected them to be down based on the proposed model). The authors also see more total Dlp (9A) (which is at odds with S2 assuming data in S2 were done under permeabilizing conditions.).”

      As noted above (under Rev #2 comments), we apologize for not explaining these data well enough. First, the tai[k15101] allele is a weak, viable hypomorph (as shown in our Zhang et al, 2015 paper) whereas the Tai RNAi line is lethal with most drivers (including en-Gal4) and thus a stronger lof. Second, Tai RNAi lowers Dlp levels (Fig 7C) while tai[k15101] causes Dlp to accumulate intracellularly (see Fig. 9A-C). These data indicate that reduced Tai leads to a defect in Dlp intracellular trafficking while its loss reduces overall Dlp levels; these data can be explained by a single role for Tai in Dlp traffic to or from the cell membrane, or by two roles, one in trafficking and one in Dlp expression. We are investigating both possibilities using dlp reporter lines and our new tai null CRISPR allele.

      “As a side note, because Dlp is GPI-anchored, the authors should consider the possibility that the 'total' Dlp staining observed in 9A may not actually be total Dlp (and is possibly mostly intracellular Dlp, since permeabilizing membranes with detergent will cause some (most?) Dlp molecules to be lost), and how this might be affecting the interpretation of the data. I think one way to address this would be to process the permeabilized and non-permeabilized samples simultaneously and then image them at the same settings and compare what membrane staining in these two conditions looks like. If membrane staining in the permeabilized condition is decreased compared to non-permeabilized conditions, and the signal intensity of Dlp in permeabilized conditions remains high, then the authors will have evidence to support increased endocytosis in tai clones. Of course, these data will still need to be reconciled with what is shown in S2.”

      We thank the reviewer for this excellent suggestion and are generating mosaic discs to test the proposed approach of synchronous analysis of total vs. intracellular Dlp.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this. 

      We have added two sentences to the second paragraph of the Discussion section in which we now acknowledge this concern, but also point out that in vitro models of this sort also provide an experimental advantage in that they facilitate a deconvolution of the extensive complexity resident within the intact animal. Nevertheless, we acknowledge that this deconvolution requires ultimate validation of findings obtained within an in vitro model system to ensure they accurately recapitulate functions that occur in the intact animal in vivo.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.  

      Additional text has been added in the Methods section to indicate that all samples involving cell culture models, which include iPSCs and PGCLCs, came from a single XY iPS cell line aliquoted into replicates, and that all primary cultures, which included Sertoli and granulosa cells, were generated from pooled tissue preps from mice and then aliquoted into replicates. Finally, all experiments in the study were performed on three replicates. Because this experimental design did indeed allow for comparisons among samples, we have added a new Supplement Figure 9 which displays PCA plots showing clustering among control and treatment datasets, respectively, as well as distinctions between each cluster representing each experimental condition.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?  

      The experiment shown in Figure 1 was designed to 1) serve as proof of principle that cells maintained in culture could be susceptible to EDC-induced epimutagenesis at all, 2) determine if any response observed would be dose-dependent, and 3) identify a minimally effective dose of BPS to be used for the remaining experiments in this study (which we identified as 1 μM). We agree that it is interesting that the 50 µM dose of BPS induced predominantly hypermethylation changes whereas the 1 µM and 100 µM doses induced predominantly hypomethylation changes, but are not in a position to offer a mechanistic explanation for this outcome at this time. As the results shown satisfied our primary objectives of demonstrating that exposure of cells in culture to BPS could indeed induce DNA methylation epimutations, that this occurs in a dose-dependent manner, and that a dose of as low as 1 µM of BPS was sufficient to induce epimutagenesis, the data obtained satisfied all of the initial objectives of this experiment. That said, in response to the reviewer’s request we have now added text on pages 6-7 alluding to new Supplemental tables 1-3 indicating the total number of DMCs and DMRs, as well as the number of DEGs, detected in response to exposure to each dose of BPS shown in Figure 1, as well as stratifying those results to indicate the numbers of hyper- and hypomethylation epimutations and up- and down-regulated DEGs induced in response to each dose of BPS. While, as noted above, investigating the mechanistic basis for the difference in responses induced by the 50 µM versus 1 and 100 µM doses of BPS was beyond the scope of the study presented in this manuscript, we do find this result reminiscent of the “U-shaped” response curves often observed in toxicology studies. Importantly, this result does demonstrate the elevated resolution and specificity of analysis facilitated by our in vitro cell culture model system.

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.  

      In general, we observed a coincidence between changes in DNA methylation and changes in gene expression (Supplement Tables 1-3). Pertaining directly to the reviewer’s question about the extent to which we observed common DMRs and DEGs across all doses, while we only found 3 overlapping DMRs conserved across all doses tested, we did find an average of 51.25% overlap in DMCs and an average of 80.45% overlap in DEGs across iPSCs exposed to the different doses of BPS shown in Figure 1. In addition, within each dose of BPS tested in iPSCs, we also found that there was an overlap between DMCs and the promoters or gene bodies of many DEGs (Supplement Table 4). Specifically within gene promoters, we observed a correlation between hypermethylated DMCs and decreased gene expression and hypomethylated DMCs and increased gene expression, respectively (Supplement Figure 2).

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.  

      We observed an average of 11.05% overlapping DMCs between different pairs of cell types, but we did not observe any DMCs that were shared among all four cell types. Indeed, this limited overlap of DMCs among different cell types exposed to BPS was the primary motivation for the analysis described in Figure 2. Thus, instead of focusing solely on direct overlap between specific DMCs, we examined similarities among the different cell types tested in the occurrence of epimutations within different annotated genomic regions. To better describe this, we have now added additional text to page 9. We have also added more detail to the legend for Figure 2 on page 8 to more clearly explain the significance of the dot sizes and colors, explaining that the dot sizes are indicative of the relative number of differentially methylated probes that were detected within each specific annotated genomic region, and that the dot colors are indicative of the calculated enrichment score reflecting the relative abundance of epimutations occurring within a specific annotated genomic region. The relative score is calculated by iterating down the list of DMCs and increasing a running-sum statistic when encountering a DMC within the specific annotated genomic region of interest and decreasing the sum when the epimutation is not in that annotated region. The magnitude of the increment depends upon the relative occurrence of DMCs within a specific annotated genomic region.
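
      The running-sum score described above can be sketched roughly as follows. This is a minimal illustration only, not the pipeline used for Figure 2: the function name, the boolean input format, and the choice of step sizes (1/number of hits upward, 1/number of misses downward, as in an unweighted GSEA-style statistic) are assumptions made for the example.

```python
from typing import Sequence


def running_sum_enrichment(in_region: Sequence[bool]) -> float:
    """Return the signed maximum deviation of an unweighted running sum.

    in_region[i] is True when the i-th DMC in the ordered list falls inside
    the annotated genomic region of interest. How the list is ordered and how
    steps are weighted are assumptions of this sketch.
    """
    n_total = len(in_region)
    n_hits = sum(in_region)
    n_miss = n_total - n_hits
    if n_hits == 0 or n_miss == 0:
        return 0.0  # no contrast between hits and misses, so no score

    step_up = 1.0 / n_hits      # increment for a DMC inside the region
    step_down = 1.0 / n_miss    # decrement for a DMC outside the region

    running = 0.0
    best = 0.0
    for hit in in_region:
        running += step_up if hit else -step_down
        if abs(running) > abs(best):
            best = running      # keep the largest excursion from zero
    return best


if __name__ == "__main__":
    # Hypothetical ordered list: region DMCs concentrated at the top.
    print(running_sum_enrichment([True, True, False, False]))
```

      Because the step sizes scale with how many DMCs fall inside the region, a region containing few DMCs that are clustered at the top of the list can score as highly as a region containing many.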

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).  

      We have added a new paragraph just before the final paragraph of the Discussion section in which we acknowledge that most of the cell types analyzed during our study were XY-bearing “male” cells and that the manner in which XX-bearing “female” cells might respond to similar exposures could differ from the responses we observed in XY cells. However, we also noted that our assessment of XX-bearing granulosa cells yielded results very similar to those seen in XY Sertoli cells suggesting that, at least for differentiated somatic cell types, there does not appear to be a significant sex-specific difference in response to exposure to a similar dose of the same EDC. That said, we also acknowledged that in cell types in which dosage compensation based on X-chromosome inactivation is not in place, differences between XY- and XX-bearing cells could accrue.

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.  

      Previous reports have indicated that BPS does not have the capacity to bind with the androgen receptor (Pelch et al., 2019; Yang et al., 2024). However, there have been reports indicating that BPS can interact with other endocrine receptors including PPARγ and RXRα, which play a role in lipid accumulation and have the potential to be linked to obesity phenotypes (Gao et al., 2020; Sharma et al., 2018). To address the reviewer’s comment we assessed the expression of a panel of hormone receptors including PPARγ, RXRα, and AR in each of the cell types examined in our study and these results are now shown in a new Supplement Figure 4. We show that in addition to not expressing either estrogen receptor (ERα or ERβ), germ cells also do not express any of the other endocrine receptors we tested including AR, PPARγ, and RXRα. Thus we now note that these results support our suggestion that the induction of epimutations we observed in germ cells in response to exposure to BPS appears to reflect disruption of non-canonical endocrine signaling. We also note that non-canonical endocrine signaling is well established (Brenker et al., 2018; Ozgyin et al., 2015; Song et al., 2011; Thomas and Dong, 2006). Thus we feel the suggestion that the effects of BPS exposure could conceivably reflect either disruption of canonical or non-canonical signaling in any cell type is well justified and that our data suggest that both of these effects appear to have accrued in the cells examined in our study as suggested in the text of our manuscript.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c, which genes were significantly affected by treatment in which cell types?  

      Per the reviewer’s request, we have added text to indicate that Figure 6a is indeed data from all four cell types examined. We have also modified the text to further clarify that Figure 6c displays the expression of other G protein-coupled receptors which are expressed at similar, if not higher, levels than either ER in all cell types examined, and that these have been shown to have the potential to bind to either 17β-estradiol or BPA in rat models. As alluded to by the reviewer, this is indicative of a wide variety of distinct pathways and/or functions that can potentially be impacted by exposure to an EDC such as BPS. Thus, we have attempted to acknowledge the reviewer’s primary point that BPS may interact with a variety of receptors or other factors involved with a wide variety of different pathways and functions. Importantly, this illustrates the strength of our model system in that it can be used to identify potentially impacted target pathways that can then be pursued further as deemed appropriate.

      (9) In Figure 7, what were the 138 genes? Any commonalities among them? 

      We have now added a new supplemental Excel file that lists the 138 overlapping conserved DEGs that did not become reprogrammed/corrected during the transition from iPSCs to PGCLCs. In addition, we have added new text on page 22 and a new Supplemental Figure 8 which displays KEGG analysis of pathways associated with these 138 retained DEGs. We find that these genes are primarily involved with cell cycle and apoptosis pathways which, interestingly, have the potential to be linked to cancer development which is often linked to disruptions in chromatin architecture.

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      We have now significantly reduced the length and scope of the final paragraph of the Introduction per the reviewer’s recommendation.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.  

      We have added additional text detailing that all mice used in the project were bred onsite, that water was non-autoclaved conventional RO water, and that the mice were given 5V5R extruded feed, which is highly controlled for the presence of isoflavones and is certified for use in estrogen-sensitive animal protocols.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas those in the germ cell type were predominantly in gene promoters.

      Strengths:

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation.

      Weaknesses:

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Overall, this is an intriguing paper but more transparency in the replicates and methods and a more rigorous bioinformatic treatment of the data are required.

      Specific comments:

      (1) End of abstract "These results suggest a unique mechanism by which an EDC-induced epimutated state may be propagated transgenerationally following a single exposure to the causative EDC." This is overly speculative for an abstract. There is only epigenetic inheritance following mitosis or differentiation presented in this study. There is no meiosis and therefore no ability to assess multi- or transgenerational inheritance. 

      We have modified the text at the end of the abstract to more precisely reflect our intended conclusions based on our data. In our view, the ability of induced epimutations to transcend meiosis per se is not as relevant to the mechanism of transgenerational inheritance as their ability to transcend major waves of epigenetic reprogramming that normally occur during development of the germ line. In this regard the transition from pluripotent iPSCs to germline PGCLCs has been shown to recapitulate at least the first portion of normal germline reprogramming, and now our data provide novel insight into the fate of induced epimutations during this process. Specifically, we show that a prevalence of epimutations was conserved during the iPSC → germ cell transition but that very few (< 5%) of the specific epimutations present in the BPS-exposed iPSCs were retained when those cells were induced to form PGCLCs. Rather, we observed apparent correction of a large majority of the initially induced epimutations during this transition, but this was accompanied by the apparent de novo generation of novel epimutations in the PGCLCs. We suggest, based on other recent reports in the literature, that this is a result of the BPS exposure inducing changes in the chromatin architecture in the exposed iPSCs such that when the normal germline reprogramming mechanism is imposed on this disrupted chromatin template there is both correction of many existing epimutations and the genesis of many novel epimutations. This observation has the potential to explain the long-standing question of why the prevalence of epimutations persists across multiple generations despite the occurrence of epigenetic reprogramming during each generation. Nevertheless, as noted above, we have modified the text at the end of the abstract to temper this interpretation given that it is still somewhat speculative at this point.

      (2) Doses used in the experiments. One needs to be careful when stating that the dose used is "below FDA's suggested safe environmental level established for BPA" because a different bisphenol is being used here (BPA vs BPS) and the safe level is that which the entire organism experiences. It is likely that cell lines experience a higher effective dose.  

      We have now made a point of noting that our reference to an EPA-recommended “safe dose” of BPA was for humans and/or intact animals. Changes to this effect have been made in the second and sixth paragraphs of the Introduction section. In addition, we have added text at the end of the fourth paragraph of the Discussion section acknowledging that, as the reviewer suggests, the same dose of an EDC could exert greater effects on cells in a homogeneous culture than on the same cell type within an intact animal given the potential for mitigating metabolic effects in the latter. However, we also note that the ability we demonstrated to quantify the effects of such exposures on the basis of numbers of epimutations (DMCs or DMRs) induced could potentially be used in future studies to study this question by assessing the effects of a specific dose of a specific EDC on a specific cell type when exposed either within a homogeneous culture or within an intact animal.

      (3) Figure 1: In the dose response, what was the overlap in DMCs and DEGs among the 3 doses? Are the responses additive, synergistic, or completely non-overlapping? This is an important point that should be addressed. 

      Please see our response to Reviewer 1 critique #4 above where we address similar concerns. While we do find overlap among different cell types with respect to the DMCs, DMRs, and DEGs displayed in Figure 1, we found the effect to be only partially additive as opposed to synergistic in any apparent manner. The fold increase in DMCs, DMRs, and DEGs resulting from exposure to doses of 1 μM versus 50 μM ranged from 2.5x to 4.4x, which was well below the 50x increase that would have been expected from a strictly additive effect, and the effect increased even less, if at all, in response to exposure to doses of 50 μM versus 100 μM BPS. Finally, as now noted in the Discussion section on page 25, our conclusion is that these results display a limited dose-dependent effect that was partially additive but also plateaued at the highest doses tested.

      (4) Methods: How many times was each exposure performed on a given cell type? This information should be in the figure legends and methods. In the case of multiple exposures for a given line, do the biological replicates agree? 

      Please see our response to Reviewer 1 critique #2 where we address similar concerns with newly added text and analysis. We now note repeatedly on pages 39-45 that each analysis was conducted on three replicate samples, and we display the similarity among those replicates graphically in a new Supplement Figure 9.

      (5) DNA methylation analyses. Very little analysis is presented on the BeadChip array other than hypermethylated/hypomethylated and genomic regions of DMCs. What is the range of methylation changes? Does it vary between hypo vs. hyper DMCs? How many array experiments were performed (biological replicates) and what stats were used to determine the DMCs? Are there DMCs in common among the various cell types? As an example of more meaningful analysis, one can plot the %5mC over a given array for comparisons between control and treated cell types. For more granularity, the %5mC can be presented according to the element type (enhancers vs promoters). 

      Please see our response to Reviewer 1 critique #2 above where we address similar concerns regarding the number of biological replicates used in this study. DMCs on the Infinium array are identified using mixed linear models. This general supervised learning framework identifies CpG loci at which differential methylation is associated with known control vs. treated co-variates. CpG probes on the array were defined as differentially methylated when they met both the p-value and FDR (≤ 0.05) significance thresholds between treatment and control samples for each cell type analyzed. The range of medians across all samples was 0.0278 to 0.0059 for hypermethylated beta values and -0.0179 to -0.0033 for hypomethylated beta values. As noted above, we did observe an overlap in DMCs between cell types. Thus, we observed an average of 11.05% overlapping DMCs between two or more cell types but we did not observe any DMCs shared between all four cell types. We have added additional text on page 9 and new Supplement Tables 1-4 and Supplement Figure 1 to now more clearly describe that this limited similarity in direct overlap of DMCs was the underlying motivation for the analysis described in Figure 2. Finally, the enrichment dot plots shown in Figure 2 provide the information the reviewer requested regarding the %5mC observed at different annotated genomic element types.
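
      To make the dual threshold concrete, the sketch below shows one common way such a filter could be applied once per-probe p-values are available. It is illustrative only: the response does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, the function names are hypothetical, and the mixed linear model that produces the p-values is not reproduced.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_dmcs(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag probes that pass both the raw p-value and the FDR threshold."""
    adjusted = benjamini_hochberg(p_values)
    return [p <= alpha and q <= alpha for p, q in zip(p_values, adjusted)]


if __name__ == "__main__":
    # Hypothetical per-probe p-values, not taken from the study.
    print(call_dmcs([0.001, 0.02, 0.04, 0.5]))
```

      Under this particular adjustment the adjusted value is never smaller than the raw p-value, so the FDR condition is the binding one; the raw check is kept only to mirror the wording of the response.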

      (6) The investigators correlate the number of DMCs in a given cell type with the presence of estrogen receptors. Does the correlation extend to the methylation difference (delta beta) at the statistically different probes?

      We have added a new Supplement Figure 3 in which we provide data addressing this question. In brief, we find that the delta betas of probes enriched at enhancer regions and associated with relative proximity to ERE elements in Sertoli cells, granulosa cells, and iPSCs appear very similar to those associated with DMCs not located within these enriched regions. However, when we compared the similarity of the two data sets with goodness of fit tests, we found these relatively small differences were, in fact, statistically significant based on a two-sample Kolmogorov-Smirnov test. These observed significant differences appear to indicate that there is higher variability among the delta betas associated with hypomethylation, but not hypermethylation, changes occurring at DMCs associated with enhancers, potentially suggesting a greater tendency for exposure to BPS to induce hypomethylation rather than hypermethylation changes, at least in these specific regions.
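
      As an illustration of the two-sample Kolmogorov-Smirnov comparison mentioned above, the following sketch computes the KS statistic (the largest gap between the two empirical distribution functions) together with an approximate asymptotic p-value. It is a simplified stand-in, not the analysis code used in the study; established implementations such as `scipy.stats.ks_2samp` are preferable in practice, and the input lists stand for hypothetical delta-beta values.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for a two-sample KS test."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    # The largest gap between the empirical CDFs occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    # Asymptotic Kolmogorov distribution with a small-sample correction;
    # this is an approximation, not an exact p-value.
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:
        return d, 1.0  # the series below converges poorly here; p is ~1
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)
```

      Because the statistic responds to any difference in distribution shape, a significant result can reflect a difference in spread rather than in the median, which fits the situation described above in which two sets of delta betas look similar yet differ in variability.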

      (7) Methylation changes relative to EREs are presented in multiple figures. Are other sequences enriched in the DMCs? 

      We profiled the genomic sequence within 500 bp of cell type-specific enriched DMCs that were either associated with enhancer regions in Sertoli, granulosa, or iPS cells or transcription factor binding sites in PGCLCs for the identification of higher abundance motif sequences. We then compared any motifs identified with the JASPAR database to potentially find transcription factors that could be binding to these regions. Interestingly we found that the two most common motifs across all cell types were associated with either the chromatin remodeling transcription factor HMG1A or the pluripotency factor KLF4.

      (8) Please present a correlation plot between the methylation differences and the adjacent DEGs. Again, the absence of consideration of the absolute changes in methylation and gene expression minimizes the impact of the data. 

      We analyzed the relationship between DMCs at DEG promoter regions and the corresponding change in expression of that DEG. Our data support an inverse relationship: up-regulated genes tended to show decreased methylation in their promoter regions and down-regulated genes tended to show increased methylation, although there were some exceptions to this relationship.
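
      One simple way to quantify such a relationship is sketched below, assuming each DEG has been paired with a promoter delta beta and a log2 fold-change. The pairing, the function names, and the example values are illustrative assumptions and do not come from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def inverse_fraction(pairs):
    """Fraction of (delta_beta, log2_fold_change) pairs with opposite signs.

    Opposite signs mean lower methylation with higher expression, or the reverse.
    """
    inverse = sum(1 for delta_beta, lfc in pairs if delta_beta * lfc < 0)
    return inverse / len(pairs)


# Hypothetical values for four genes: three follow the inverse pattern.
pairs = [(-0.02, 1.0), (0.03, -1.5), (0.01, 0.5), (-0.01, 0.5)]
r = pearson_r([d for d, _ in pairs], [f for _, f in pairs])
share = inverse_fraction(pairs)
```

      A negative `r` together with a high `share` would be consistent with the inverse pattern, while the pairs with matching signs correspond to the exceptions.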

      (9) EM-Seq is mentioned in Figure 7 and in the material and methods. Where is it used in this study? 

      We now note in the text on page 22 that EM-seq was used during experiments assessing the propagation of BPS-induced epimutations during the iPSC → EpiLC → PGCLC cell state transitions to gather higher resolution data of changes to DNA methylation differences at the whole-epigenome level.

      References

      Brenker C, Rehfeld A, Schiffer C, Kierzek M, Kaupp UB, Skakkebæk NE, Strünker T. 2018. Synergistic activation of CatSper Ca2+ channels in human sperm by oviductal ligands and endocrine disrupting chemicals. Hum Reprod 33:1915–1923. doi:10.1093/humrep/dey275

      Gao P, Wang L, Yang N, Wen J, Zhao M, Su G, Zhang J, Weng D. 2020. Peroxisome proliferator-activated receptor gamma (PPARγ) activation and metabolism disturbance induced by bisphenol A and its replacement analog bisphenol S using in vitro macrophages and in vivo mouse models. Environ Int 134. doi:10.1016/J.ENVINT.2019.105328

      Ozgyin L, Erdos E, Bojcsuk D, Balint BL. 2015. Nuclear receptors in transgenerational epigenetic inheritance. Prog Biophys Mol Biol. doi:10.1016/j.pbiomolbio.2015.02.012

      Pelch KE, Li Y, Perera L, Thayer KA, Korach KS. 2019. Characterization of Estrogenic and Androgenic Activities for Bisphenol A-like Chemicals (BPs): In Vitro Estrogen and Androgen Receptors Transcriptional Activation, Gene Regulation, and Binding Profiles. Toxicol Sci 172:23–37. doi:10.1093/TOXSCI/KFZ173

      Sharma S, Ahmad S, Khan MF, Parvez S, Raisuddin S. 2018. In silico molecular interaction of bisphenol analogues with human nuclear receptors reveals their stronger affinity vs. classical bisphenol A. Toxicol Mech Methods 28:660–669. doi:10.1080/15376516.2018.1491663

      Song K-H, Lee K, Choi H-S. 2011. Endocrine Disrupter Bisphenol A Induces Orphan Nuclear Receptor Nur77 Gene Expression and Steroidogenesis in Mouse Testicular Leydig Cells. Endocrinology 143:2208–2215. doi:10.1210/endo.143.6.8847

      Thomas P, Dong J. 2006. Binding and activation of the seven-transmembrane estrogen receptor GPR30 by environmental estrogens: A potential novel mechanism of endocrine disruption. J Steroid Biochem Mol Biol 102:175–179. doi:10.1016/j.jsbmb.2006.09.017

      Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. 2024. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 58:2817–2829. doi:10.1021/ACS.EST.3C09779

    1. Author response:

      eLife assessment

      This manuscript reports an important finding that the transcription factor Scleraxis regulates regenerative myogenesis by controlling the proliferation and differentiation of muscle stem cells. The evidence presented is compelling and supports the conclusions and the mechanisms by which this gene regulates satellite cell function. These data will be of interest to developmental, transcriptional, and stem cell biologists.

      Public Reviews:

      Reviewer #1 (Public Review):

      This manuscript by Bai et al concerns the expression of Scleraxis (Scx) by muscle satellite cells (SCs) and the role of that gene in regenerative myogenesis. The authors report the expression of this gene associated with tendon development in satellite cells. Genetic deletion of Scx in SCs impairs muscle regeneration, and the authors provide evidence that SCs deficient in Scx are impaired in terms of population growth and cellular differentiation. Overall, this report provides evidence of the role of this gene, unexpectedly, in SC function and adult regenerative myogenesis.

      We appreciate the comments and thank her/him for the support of our manuscript.

      There are a few minor points of concern.

      (1) From the data in Figure 1, it appears that all of the SCs, assessed both in vitro and in vivo, express Scx. The authors refer to a scRNA-seq dataset from their lab and one report from mdx mouse muscle that also reveals this unexpected gene expression pattern. Has this been observed in many other scRNA-seq datasets? If not, it would be important to discuss potential explanations as to why this has not been reported previously.

      Thanks for this question regarding data in Figure 1. We did initially use immunofluorescence staining of Pax7 and GFP on muscle sections and primary myoblast cultures prepared from Tg-ScxGFP mice to conclude that Scx was expressed in satellite cells (SCs). In addition to the cited mdx RNA-seq data, we have included a re-analysis of a published scRNA-seq data set in Figure 2E (Dell'Orso, Juan et al., Development, 2019), and our own scRNA-seq data (Figure S5D, F). We have also re-examined an additional scRNA-seq data set of TA muscles at various regeneration time points (De Micheli et al., Cell Rep. 2020), in which Scx expression was detected in MuSC progenitors and mature muscle cells (in addition to tenocytes). Thus, our immunostaining results are consistent with our own scRNA-seq data and with two other independent scRNA-seq data sets.

      We think that Scx expression in the adult myogenic lineage was not previously reported mainly because its expression level was low and might have been dismissed as spurious detection. Additionally, detecting such low expression levels requires sophisticated detection methods with high capture efficiency. Previous studies have noted limitations in transcript capture or transcription factor dropout in 10x Genomics-based datasets (Lambert et al., Cell, 2018; Pokhilko et al., Genome Res., 2021). Alternatively, Scx was simply not a focus in prior studies amid other genes of interest. Our specific focus on Scx has led us to evaluate its expression in these data sets. We will add the above cited scRNA-seq data set (De Micheli et al., Cell Rep. 2020) and provide a discussion in the revised version.

      (2) A major point of the paper, as illustrated in Fig. 3, is that Scx-neg SCs fail to produce normal myofibers and renewed SCs following injury/regeneration. They mention in the text that there was no increased PCD by Caspase staining at 5 DPI. A failure of cell survival during the process of SC activation, proliferation, and cell fate determination (differentiation versus self-renewal) would explain most of the in vivo data. As such, this conclusion would seem to warrant a more detailed analysis in terms of at least one or two other time points and an independent method for detecting dead/dying cells (the in vitro data in Fig. 4F is also based on an assessment of activated Caspase to assess cell death). The in vitro data presented later in Fig. S4G, H do suggest an increase in cell loss during proliferative expansion of Scx-neg SCs. To what extent does cell loss (by whatever mechanism of cell death) explain both the in vivo findings of impaired regeneration and even the in vitro studies showing slower population expansion in the absence of Scx?

      We appreciate these constructive suggestions. Additional methods and different time points should be helpful in investigating SC cell loss in ScxcKO. Based on the number of available cKO animals, we will carefully choose additional time point(s) to assess PCD, using anti-active Caspase-3 immunostaining and another independent method (e.g., TUNEL). Although the outcomes are uncertain, we will endeavor to obtain meaningful data from these experiments.

      (3) I'm not sure I understand the description of the data or the conclusions in the section titled "Basement membrane-myofiber interaction in control and Scx cKO mice". Is there something specific to the regeneration from Scx-neg myogenic progenitors, or would these findings be expected in any experimental condition in which myogenesis was significantly delayed, with much smaller fibers in the experimental group at 5 DPI?

      We very much appreciate this comment. We agree that it is unlikely that there is anything specific about the regeneration from Scx-negative myogenic progenitors. Unfilled or empty ghost fibers (basement membrane remnants) are to be expected due to the small fibers and poor regeneration in the ScxcKO mice at 5 dpi. We will correct the subtitle and content accordingly.

      (4) The data presented in Fig. 4B showing differences in the purity of SC populations isolated by FACS depending on the reporter used are interesting and important for the field. The authors offer the explanation of exosomal transfer of Tdt from SCs to non-SCs. The data are consistent with this explanation, but no data are presented to support this. Are there any other explanations that the authors have considered and that could be readily tested?

      Thanks for highlighting this phenomenon. We struggled with the SC purity issue for a long time. The project started with the R26RtdT reporter because tdT’s strong fluorescence is resistant to paraformaldehyde fixation, which aids visualization in vivo. Later, when we used the tdT signal to purify SCs by FACS, we found that only 80% of sorted tdT+ cells were Pax7+. We then switched to the R26RYFP reporter, from which we achieved much higher purity (95%) of SCs (Pax7+) by FACS. As such, we also repeated and confirmed many in vivo experimental results using the R26RYFP reporter (included in the manuscript). Due to the low purity of tdT+ SCs by FACS, we discontinued that mouse colony after we confirmed the superior utility of the R26RYFP reporter for SC isolation.

      We sincerely apologize for not being able to conduct further experiments on this intriguing phenomenon. However, this issue has since been addressed and published by Murach et al., iScience, (2021). Like us, they found non-satellite mononuclear cells with tdT fluorescence after TMX treatment when SCs were isolated via FACS. To rule out off-target recombination or a technical artifact from tissue processing, they conducted extensive analyses. They found that the tdT+ mononuclear cells included fibrogenic cells (fibroblasts and FAPs), immune cells/macrophages, and endothelial cells. Additionally, they confirmed the significant potential of extracellular vesicle (EV)-mediated cargo transfer, which facilitates the transfer of full-length tdT transcript from lineage-marked Pax7+ cells to those mononuclear cells. We will modify our text to include and acknowledge their contribution to this important point.

      (5) The Cut&Run data of Fig. 6 certainly provide evidence of direct Scx targets, especially since the authors used a novel knock-in strain for analyses. The enrichment of E-box motifs provides support for the 207 intersecting genes (scRNA-seq and Cut&Run) being direct targets. However, the rationale elaborated in the final paragraph of the Results section proposing how 4 of these genes account for the phenotypes on the Scx-neg cells and tissues is just speculation, however reasonable. These are not data, and these considerations would be more appropriate in the Discussion in the absence of any validation studies.

      We agree with this comment and will move this speculation into the discussion.

      Reviewer #2 (Public Review):

      Summary:

      Scx is a well-established marker for tenocytes, but the expression in myogenic-lineage cells was unexplored. In this study, the authors performed lineage-trace and scRNA-seq analyses and demonstrated that Scx is expressed in activated SCs. Further, the authors showed that Scx is essential for muscle regeneration using conditional KO mice and identified the target genes of Scx in myogenic cells, which differ from those of tendons.

      Strengths:

      Sometimes, lineage-trace experiments cause mis-expression and do not reflect the endogenous expression of the target gene. In this study, the authors carefully analyzed the unexpected expression of Scx in myogenic cells using some mouse lines and scRNA-seq data.

      We appreciate the comments and thank her/him for noting the strengths of our manuscript.

      Weaknesses:

      Scx protein expression has not been verified.

      We are aware of this weakness. We had previously used Western blotting (WB) using cultured SCs from control and ScxcKO mice, but did not detect endogenous Scx protein in the control. Hence, we used ScxCreERT2 lineage-tracing, Tg-ScxGFP expression, and ScxTy1 knock-in allele as complementary, even though indirect, ways to address this issue. Following the reviewer’s comment, we will purchase new anti-Scx antibodies and re-perform WB using cultured SCs. If the new antibodies fail to detect endogenous Scx by WB, we will then use immunofluorescence staining to detect endogenous Scx protein.

    1. Author response:

      eLife assessment:

      This manuscript reports valuable findings on the role of the Srs2 protein in turning off the DNA damage signaling response initiated by Mec1 (human ATR) kinase. The data provide solid evidence that Srs2 interaction with PCNA and ensuing SUMO modification is required for checkpoint downregulation. However, experimental evidence with regard to the model that Srs2 acts at gaps after camptothecin-induced DNA damage is currently lacking. The work will be of interest to cell biologists studying genome integrity but would be strengthened by considering the possible role of Rad51 and its removal. 

      We appreciate the editors and the reviewers for providing evaluation and helpful comments. As detailed below, we plan to adjust the writing and figures to address the points raised by the reviewers. We believe that these changes will improve the clarity of the work. Below is a summary of our plan to address the two main criticisms.

      (1) Regarding the sites of Srs2 action, our data support the conclusion that Srs2 removal of RPA is favored at a subset of ssDNA regions that have proximal PCNA, but not at sites lacking PCNA. A logical supposition for the former type of ssDNA region includes ssDNA gaps and tails generated during DNA repair or replication, wherein PCNA can be loaded at the ssDNA-dsDNA junction with a 3’ DNA end. Examples of the latter type of ssDNA region, without proximal PCNA, can form within negatively supercoiled regions or intact R-loops, both of which lack a 3’ DNA end for PCNA loading. While we have stated this conclusion in the text, we highlighted ssDNA gaps as sites of Srs2 action in the Discussion and in the model figure, which could be misleading. We will clarify our model, that is, Srs2 distinguishes among different types of ssDNA regions using PCNA proximity as a guide for RPA removal, and state that the precise nature of Srs2 action sites remains to be determined. Regardless, the feature of Srs2 revealed in this work provides a rationale for how it can remove RPA at subsets of ssDNA regions without unnecessary stripping of RPA at other sites.

      (2) While Rad51 removal is an important facet of Srs2 functions, it is not relevant to our current study based on the following observations and rationales.

      First, we have provided several lines of evidence to support the conclusion that Rad51 removal by Srs2 is separable from the Srs2-RPA antagonism (Dhingra et al., 2021). For example, while rad51∆ rescues the hyper-recombination phenotype of srs2∆ cells, it does not affect the hyper-checkpoint phenotype of srs2∆. Strikingly, rfa1-zm1/zm2 have the opposite effect. The differential effects of rad51∆ and rfa1-zm1/zm2 were also seen for the srs2 ATPase-dead allele (srs2-K41A). For example, rfa1-zm2 rescued the hyper-checkpoint defect and the CPT sensitivity of srs2-K41A, while rad51∆ had neither effect.

      These and other data described in Dhingra et al suggest that Srs2’s effects on checkpoint vs. recombination are separable and that the Srs2-RPA antagonism during the DNA damage checkpoint is independent of Rad51.

      Second, our current work addresses which Srs2 features affect the Srs2-RPA antagonism during the DNA damage response and its implications. Given that this antagonism is separable from Srs2 removal of Rad51, including Rad51 regulation would distract from the main points of this work.

      Third, in the current work, we began by examining all known regulatory and protein-protein interaction features of Srs2, including the Rad51 binding domain. Consistent with our conclusion summarized above based on the Dhingra et al study, deleting the Rad51 binding domain in Srs2 (srs2-∆Rad51BD) has no effect on the rfa1-zm2 phenotype in CPT (Figure 2D). This is in sharp contrast to mutating the PCNA binding and the sumoylation sites of Srs2, which suppressed the CPT sensitivity and checkpoint abnormalities of rfa1-zm2 (Figure 2C). These data provide further evidence that Srs2 regulation of Rad51 is separable from the Srs2-RPA antagonism.

      In summary, our work provides a foundation for future examination of how Srs2 regulates RPA and Rad51 in different manners, how these two facets of Srs2 function affect genome integrity in different capacities, and whether there is crosstalk between them during certain DNA metabolism processes.

      Public Reviews:

      Reviewer #1:

      Overall, the data presented in this manuscript is of good quality. Understanding how cells control RPA loading on ssDNA is crucial to understanding DNA damage responses and genome maintenance mechanisms. The authors used genetic approaches to show that disrupting PCNA binding and SUMOylation of Srs2 can rescue the CPT sensitivity of rfa1 mutants with reduced affinity for ssDNA. In addition, the authors find that SUMOylation of Srs2 depends on binding to PCNA and the presence of Mec1. Noted weaknesses include the lack of evidence supporting that Srs2 binding to PCNA and its SUMOylation occur at ssDNA gaps, as proposed by the authors. Also, the mutants of Srs2 with impaired binding to PCNA or impaired SUMOylation showed no clear defects in checkpoint dampening, and in some contexts, even resulted in decreased Rad53 activation. Therefore, key parts of the paper would benefit from further experimentation and/or clarification.  

      We thank the reviewer for the positive comments on this work and address her/his remark regarding ssDNA gaps below in Major Comment #1. In addition, we detailed below our data and rationale in suggesting that the checkpoint dampening phenotype of srs2-∆PIM and -3KR (deficient for PCNA binding and sumoylation, respectively) is masked by redundant pathways. We further describe our plan to enhance the clarity of both text and model to address these points from the reviewer. 

      Major Comments 

      (1) The central model proposed by the authors relies on the loading of PCNA at the 3' junction of an ssDNA gap, which then mediates Srs2 recruitment and RPA removal. While several aspects of the model are consistent with the data, the evidence that it is occurring at ssDNA gaps is not strong. The experiments mainly used CPT, which generates mostly DSBs. The few experiments using MMS, which mostly generates ssDNA gaps, show that Srs2 mutants lead to weaker rescue in this context (Figure S1). How do the authors explain this discrepancy? In the context of DSBs, are the authors proposing that Srs2 is engaging at later steps of HR-driven DSB repair where PCNA gets loaded to promote fill-in synthesis? If so, is RPA removal at that step important for checkpoint dampening? These issues need to be addressed and the final model adjusted.

      We appreciate the reviewer’s concern. Our conclusion is that Srs2 can be guided by PCNA to a subset of ssDNA regions for RPA removal, and that this Srs2 action is not favored at ssDNA regions with no proximal PCNA. It is important to note that CPT can produce both types of ssDNA regions. Besides ssDNA generated via DSB-associated recombinational repair, CPT can also lead to ssDNA gap formation upon excision repair and DNA-protein crosslink repair of trapped Top1 (Sun et al., 2020). ssDNA regions generated during these DNA repair processes often contain a 3’ DNA end for PCNA loading, thus they can favor Srs2 removal of RPA. Another facet of CPT’s effects (besides DNA lesions) is depletion of the functional pool of Top1, which causes topological stress and consequently increased levels of DNA supercoiling and R-loops (Koster et al., 2007, Petermann et al., 2022). ssDNA formed within negatively supercoiled regions and in R-loops lacks a 3’ DNA end unless it is cleaved by nucleases, thus these sites would be disfavored for Srs2 removal of RPA due to the lack of PCNA loading. Our conclusion that ssDNA regions with nearby PCNA are preferred sites for Srs2 action provides a rationale for how Srs2 can remove RPA at certain ssDNA regions but minimize unnecessary stripping of RPA from other sites.

      We will clarify in the Discussion that CPT can generate two types of ssDNA regions as stated above, and that Srs2 could distinguish among them using PCNA proximity as a guide for RPA removal. While this conclusion was described in the text, we emphasized the ssDNA gap as a Srs2 action site in the model. We will clarify that while this is a logical supposition, other types of ssDNA with proximal PCNA could also be targeted by Srs2 and that our work paves the way to determine the precise nature of ssDNA regions for Srs2’s action.

      The reasons for the less potent growth suppression of rfa1 mutants by srs2 alleles in MMS compared with CPT are unclear, but multiple possibilities should be considered, given that MMS and CPT affect checkpoint responses differently and that RPA and Srs2 affect growth in multiple ways. For example, while CPT only activates the DNA damage checkpoint, MMS additionally induces the DNA replication checkpoint (Menin et al., 2018, Redon et al., 2003). It is thus possible that the Srs2-RPA antagonism is relatively more important for the DNA damage checkpoint than for the DNA replication checkpoint. Further investigation of this possibility, among others, will shed light on the differential suppressive effects seen in this work. We will include this discussion in the revised text.

      (2) The data in Figure 3 showing that Srs2 mutants reduce Rad53 activation in the rfa1-zm2 mutant are confusing, especially given the claim of an anti-checkpoint function for Srs2 (in which case Srs2 mutants should result in increased Rad53 activation). The authors propose that Rad53 is hyperactivated in rfa1-zm2 mutant because of compromised ssDNA protection and consequential DNA lesions, however, the effects sharply contrast with the central model. Are the authors proposing that in the rfa1-zm2 mutant, the compromised protection of ssDNA supersedes the checkpoint-dampening effect? Perhaps a schematic should be included in Figure 3 to depict these complexities and help the reader. The schematic could also include the compensatory dampening mechanisms like Slx4 (on that note, why not move Figure S2 to a main figure?... and even expand experiments to better characterize the compensatory mechanisms, which seem important to help understand the lack of checkpoint dampening effect in the Srs2 mutants) 

      Genetic interactions that involve partially defective alleles, multi-functional proteins, and redundant pathways are complex to comprehend. For example, a phenotype seen for the null allele may not be seen for partially defective alleles. In the context of this study, while srs2 null increased Rad53 activation (Dhingra et al., 2021), srs2-∆PIM and -3KR did not (Figure 3A-3B). However, srs2-∆PIM enhanced Rad53 activation when combined with another checkpoint dampening mutant slx4RIM, suggesting that defects of srs2-∆PIM can be compensated by Slx4 (Figure S2). Importantly, srs2-∆PIM and -3KR rescued rfa1-zm2’s checkpoint abnormality (Figure 3A-3B), suggesting that Srs2 binding to PCNA and its sumoylation contribute to the Srs2-RPA antagonism in the DNA damage checkpoint response.

      A partially defective allele that impairs a specific function of a protein can be a powerful genetic tool even when it lacks a particular phenotype on its own. For example, a partially defective allele of the checkpoint protein Rad9 impairing its binding to gamma-H2A (rad9-K1088M) does not affect the G2/M checkpoint nor cause DNA damage sensitivity due to the compensation of other checkpoint factors (Hammet et al., 2007); however, rad9-K1088M rescues the DNA damage sensitivity and persistent G2/M checkpoint of rtt107 and slx4 mutants, providing one line of evidence supporting a role of the Slx4-Rtt107 axis in removal of Rad9 from chromatin (via competing with Rad9 for gamma-H2A binding) (Ohouo et al., 2013).

      In order to highlight the checkpoint recovery process, the model in Figure 6 did not depict another consequence of the Srs2-RPA antagonism. In the presence of Srs2, DNA binding rfa1 mutants can lead to increased levels of DNA lesions and checkpoint activation, and these defects are rescued by lessening Srs2’s ability to strip RPA from DNA (Dhingra et al., 2021). We will modify the model in Figure 6 and its legend to clarify that the model depicts just one of the consequences of the Srs2 and RPA antagonism, with a focus on checkpoint recovery. We will also state these points more clearly in the Discussion. Further, a new schematic in Figure 3 as suggested by the reviewer will be added to outline the genetic relationship and interpretation. We will also follow the reviewer’s suggestion to move Figure S2 to the main figures. Better characterizing the compensatory mechanisms among different checkpoint dampening pathways is very interesting but requires a substantial amount of work. While it is beyond the scope of the current study, it could be pursued in the future.

      (3) The authors should demarcate the region used for quantifying the G1 population in Figure 3B and explain the following discrepancy: By inspection of the cell cycle graph, all mutants have lower G1 peak height compared to WT (CPT 2h). However, in the quantification bar graph at the bottom, ΔPIM has higher G1 population than the WT. 

      We have added a description of how the G1 region of the FACS histogram was selected to derive the percentage of G1 cells in Figure 3B. Briefly, for samples collected for a particular strain, the G1 region of the “G1 sample” was used to demarcate the G1 region of the “CPT 2h” sample. Upon re-checking the included FACS profiles, we realized that a mutant panel and its datapoint were mistakenly put in the place of the wild-type. We will correct this mistake. The conclusion remains that srs2-∆PIM and srs2-3KR improved rfa1-zm2 cells’ ability to exit G2/M, while on their own they do not differ from the wild-type control in the percentage of G1 cells after 2 h of CPT treatment. We will add statistics in figures to reflect this conclusion and adjust the order of strains shown in panels A and B to be consistent with each other.
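
      The gating logic described above, in which a reference sample defines the G1 region that is then applied to a treated sample of the same strain, can be sketched as follows. This is an illustration only: the gate definition (median of the reference signal plus or minus a fixed tolerance), the `tolerance` value, and the example numbers are assumptions rather than the settings used for Figure 3B.

```python
from statistics import median


def g1_gate_from_reference(g1_sample, tolerance=0.15):
    """Define a G1 gate from a reference sample of DNA-content signals.

    Assumed rule: gate = median signal of the G1 reference +/- tolerance.
    """
    center = median(g1_sample)
    return center * (1 - tolerance), center * (1 + tolerance)


def percent_in_gate(events, gate):
    """Percentage of events whose DNA-content signal falls inside the gate."""
    low, high = gate
    inside = sum(1 for signal in events if low <= signal <= high)
    return 100.0 * inside / len(events)


# Hypothetical signals: a G1-arrested reference and a CPT 2h sample
# from the same strain, both measured on the same scale.
gate = g1_gate_from_reference([95, 100, 105, 100, 98])
g1_percent = percent_in_gate([100, 110, 200, 210, 190, 205, 90, 195], gate)
```

      Defining the gate once per strain and reusing it for the treated sample keeps the G1 percentages comparable across strains, which is the point of the procedure described above.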

      Reviewer #2:

      This is an interesting paper that delves into the post-translational modifications of the yeast Srs2 helicase and proteins with which it interacts in coping with DNA damage. The authors use mutants in some interaction domains with RPA and Srs2 to argue for a model in which there is a balance between RPA binding to ssDNA and Srs2's removal of RPA. The idea that a checkpoint is being regulated is based on observing Rad53 and Rad9 phosphorylation (so there are the attributes of a checkpoint), but evidence of cell cycle arrest is lacking. The only apparent delay in the cell cycle is the re-entry into the second S phase (but it could be an exit from G2/M); but in any case, the wild-type cells enter the next cell cycle most rapidly. No direct measurement of RPA residence is presented. 

      We thank the reviewer for the helpful comments. Previous studies have shown that CPT does not induce the DNA replication checkpoint, thus it does not slow down or arrest S phase progression; however, CPT does induce the DNA damage checkpoint, which causes a delay of G2/M cells to re-enter into the second cell cycle (Menin et al., 2018, Redon et al., 2003). Our result is consistent with previous findings, showing that CPT induces G2/M delay but not arrest. We will adjust the text to make this point clearer.

      We have previously reported chromatin-bound RPA levels in rfa1-zm2, srs2, and their double mutants, as well as in vitro ssDNA binding by wild-type and mutant RPA complexes (Dhingra et al., 2021). We found that Srs2 loss or its ATPase dead mutant led to 4-6 fold increase of RPA levels on chromatin, which was rescued by rfa1-zm2 (Dhingra et al., 2021). On its own, rfa1-zm2 did not cause defective chromatin association in our assays, despite modestly reducing ssDNA binding in vitro (Dhingra et al., 2021). This discrepancy could be due to a lack of sensitivity of chromatin fractionation assay in revealing moderate changes of RPA residence on DNA. Considering this, we decided to employ functional assays (Figure 2-3) that are more effective in identifying the Srs2 features pertaining to RPA regulation. 

      Strengths:

      Data concern viability assays in the presence of camptothecin and in the post-translational modifications of Srs2 and other proteins.

      Weaknesses:

      There are a couple of overriding questions about the results, which appear technically excellent. Clearly, there is an Srs2-dependent repair process here, in the presence of camptothecin, but is it a consequence of replication fork stalling or chromosome breakage? Is repair Rad51-dependent, and if so, is Srs2 displacing RPA or removing Rad51 or both? If RPA is removed quickly what takes its place, and will the removal of RPA result in lower DDC1-MEC1 signaling? 

      While Srs2 can affect both the checkpoint response and DNA repair in CPT conditions, the rfa1-zm2 allele, which affects the former but not the latter role of Srs2, allows us to gain a deeper understanding of the former role (Dhingra et al., 2021). This role also appears to be critical for cell survival in CPT, since srs2∆ growth on CPT-containing media was greatly improved by rfa1-zm mutants (Dhingra et al., 2021). Building on this understanding, our current study identified two Srs2 features that could afford spatial and temporal regulation of RPA removal from DNA, thus providing a rationale for how cells can properly utilize this beneficial yet also dangerous activity. Studying Srs2-mediated repair in CPT conditions, whether Rad51-dependent or independent, before and after replication forks stall or DNA breaks form, will require substantial effort and can be pursued in the future. We will add this point to the revised manuscript.

      Moreover, it is worth noting that in single-strand annealing, which is ostensibly Rad51 independent, a defect in completing repair and assuring viability is Srs2-dependent, but this defect is suppressed by deleting Rad51. Does deleting Rad51 have an effect here? 

      We have shown in our previous paper that rad51∆ did not rescue the hyper-checkpoint phenotype of srs2∆ cells in CPT condition, while rfa1-zm1 and -zm2 did (Dhingra et al., 2021). Such differential effects were also seen for the srs2 ATPase-dead allele (Dhingra et al., 2021). These and other data described in the Dhingra et al. paper suggest that Srs2’s effects on checkpoint vs. recombination are separable, at least in CPT condition, and that the Srs2-RPA antagonism in checkpoint regulation is not affected by Rad51 removal (unlike in the SSA situation).

      Neither this paper nor the preceding one makes clear what really is the consequence of having a weaker-binding Rfa1 mutant. Is DSB repair altered? Neither CPT nor MMS are necessarily good substitutes for some true DSB assay. 

      In our previous report (Dhingra et al., 2021), we showed that the rfa1-zm mutants did not affect the frequencies of rDNA recombination, gene conversion, or direct repeat repair. Further, rfa1-zm mutants did not suppress the hyper-recombination phenotype of srs2∆, while rad51∆ did (Dhingra et al., 2021). In a DSB system, wherein the direct repeats flanking the break were placed 30 kb away from each other, srs2∆ led to hyper-checkpoint and lethality, both of which were rescued by rfa1-zm mutants (Dhingra et al., 2021). In this assay, rfa1-zm mutants themselves did not show sensitivity, suggesting that repair is largely proficient. Collectively, these data suggest that weaker DNA binding of Rfa1 does not have a detectable effect on the recombinational repair assays examined thus far; rather, it has a profound effect on Srs2-mediated checkpoint downregulation. In-depth studies of rfa1-zm mutations in the context of various DSB repair steps will be interesting to pursue in the future.

      With camptothecin, in the absence of site-specific damage, it is difficult to test these questions directly. (Perhaps there is a way to assess the total amount of RPA bound, but ongoing replication may obscure such a measurement). It should be possible to assess how CPT treatment in various genetic backgrounds affects the duration of Mec1/Rad53-dependent checkpoint arrest, but more than a FACS profile would be required. 

      Quantitative measurement of RPA residence time on DNA in cells and the duration of Mec1/Rad53-dependent checkpoint arrest will be very informative but requires further technology development. Our current work provides a foundation for such quantitative assessment.

      It is also notable that MMS treatment does not seem to yield similar results (Fig. S1). 

      Figure S1 showed that srs2-∆PIM and srs2-3KR had weaker suppression of rfa1-zm2 growth on MMS plates than on CPT plates. The reasons for the less potent growth suppression in MMS condition compared with CPT condition are unclear, but multiple possibilities should be considered, given that MMS and CPT affect checkpoint responses differently and that RPA and Srs2 affect growth in multiple ways. For example, while CPT only activates the DNA damage checkpoint, MMS additionally induces DNA replication checkpoint (Menin et al., 2018, Redon et al., 2003). It is thus possible that the Srs2-RPA antagonism is more important for the DNA damage checkpoint than the DNA replication checkpoint. Further investigation of this and other possibilities will provide clues to the differential suppressive effects seen in this work. We will include this discussion in the revised text.

      Reviewer #3:

      The superfamily I 3'-5' DNA helicase Srs2 is well known for its role as an anti-recombinase, stripping Rad51 from ssDNA, as well as an anti-crossover factor, dissociating extended D-loops and favoring non-crossover outcome during recombination. In addition, Srs2 plays a key role in ribonucleotide excision repair. Besides DNA repair defects, srs2 mutants also show a reduced recovery after DNA damage that is related to its role in downregulating the DNA damage signaling or checkpoint response. Recent work from the Zhao laboratory (PMID: 33602817) identified a role of Srs2 in downregulating the DNA damage signaling response by removing RPA from ssDNA. This manuscript reports further mechanistic insights into the signaling downregulation function of Srs2. 

      Using the genetic interaction with mutations in RPA1, mainly rfa1-zm2, the authors test a panel of mutations in Srs2 that affect CDK sites (srs2-7AV), potential Mec1 sites (srs2-2SA), known sumoylation sites (srs2-3KR), Rad51 binding (delta 875-902), PCNA interaction (delta 1159-1163), and SUMO interaction (srs2SIMmut). All mutants were generated by genomic replacement and the expression level of the mutant proteins was found to be unchanged. This alleviates some concern about the use of deletion mutants compared to point mutations. The double mutant analysis identified that PCNA interaction and SUMO sites were required for the Srs2 checkpoint dampening function, at least in the context of the rfa1-zm2 mutant. There was no effect of these mutants in a RFA1 wild-type background. This latter result is likely explained by the activity of the parallel pathway of checkpoint dampening mediated by Slx4, and genetic data with an Slx4 point mutation affecting Rtt107 interaction and checkpoint downregulation support this notion. Further analysis of Srs2 sumoylation showed that Srs2 sumoylation depended on PCNA interaction, suggesting sequential events of Srs2 recruitment by PCNA and subsequent sumoylation. Kinetic analysis showed that sumoylation peaks after maximal Mec1 induction by DNA damage (using the Top1 poison camptothecin (CPT)) and depended on Mec1. These data are consistent with a model that Mec1 hyperactivation is ultimately leading to signaling downregulation by Srs2 through Srs2 sumoylation. Mec1-S1964 phosphorylation, a marker for Mec1 hyperactivation and a site found to be needed for checkpoint downregulation after DSB induction did not appear to be involved in checkpoint downregulation after CPT damage. The data are in support of the model that Mec1 hyperactivation when targeted to RPA-covered ssDNA by its Ddc2 (human ATRIP) targeting factor, favors Srs2 sumoylation after Srs2 recruitment to PCNA to disrupt the RPA-Ddc2-Mec1 signaling complex. Presumably, this allows gap filling and disappearance of long-lived ssDNA as the initiator of checkpoint signaling, although the study does not extend to this step.

      Strengths 

      (1) The manuscript focuses on the novel function of Srs2 to downregulate the DNA damage signaling response and provide new mechanistic insights. 

      (2) The conclusions that PCNA interaction and ensuing Srs2-sumoylation are involved in checkpoint downregulation are well supported by the data. 

      We thank the reviewer for carefully reading our work and for his/her positive comments. 

      Weaknesses 

      (1) Additional mutants of interest could have been tested, such as the recently reported Pin mutant, srs2-Y775A (PMID: 38065943), and the Rad51 interaction point mutant, srs2-F891A (PMID: 31142613). 

      srs2-Y775A was shown to be proficient for stripping RPA from ssDNA, behaved like wild-type Srs2 in assays such as gene conversion and crossover control, and exhibited a genetic interaction profile similar to that of the wild-type allele. The authors suggest that the Y775 pin can contribute to unwinding secondary DNA structures. Collectively, these findings do not provide a strong rationale for srs2-Y775A being relevant for RPA removal from ssDNA. 

      We have already included data showing that an srs2 mutant lacking the Rad51 binding domain (srs2-∆Rad51BD, ∆875-902) did not affect rfa1-zm2 growth in CPT or cause other defects in CPT on its own (Figure 2D). These data suggest that Rad51 binding is not relevant to the Srs2-RPA antagonism in CPT, a conclusion fully supported by data in our previous study (Dhingra et al., 2021). Collectively, these findings do not provide a strong rationale to test a point mutation within the Rad51BD region. 

      (2) The use of deletion mutants for PCNA and RAD51 interaction is inferior to using specific point mutants, as done for the SUMO interaction and the sites for post-translational modifications. 

      We agree with this view generally. However, this is less of a concern for the Rad51 binding site mutant (srs2-∆Rad51BD), as it behaved like the wild-type allele in our assays. The srs2-∆PIM mutant (lacking 4 amino acids) has been examined for PCNA binding in vitro and in vivo in several studies (e.g. Kolesar et al., 2016, Kolesar et al., 2012); to our knowledge, no unintended defect was reported. We thus believe that this allele is suitable for testing whether Srs2’s ability to bind PCNA is relevant to RPA regulation.

      (3) Figure 4D and Figure 5A report data with standard deviations, which is unusual for n=2. Maybe the individual data points could be plotted with a color for each independent experiment to allow the reader to evaluate the reproducibility of the results. 

      We will include individual data points as suggested and correct the figure legend to indicate that three independent biological samples per genotype were examined in both panels.

      References:

      Dhingra N, Kuppa S, Wei L, Pokhrel N, Baburyan S, Meng X, Antony E and Zhao X (2021) The Srs2 helicase dampens DNA damage checkpoint by recycling RPA from chromatin Proc Natl Acad Sci U S A 118

      Hammet A, Magill C, Heierhorst J and Jackson SP (2007) Rad9 BRCT domain interaction with phosphorylated H2AX regulates the G1 checkpoint in budding yeast EMBO Rep 8: 851-857

      Kolesar P, Altmannova V, Silva S, Lisby M and Krejci L (2016) Pro-recombination Role of Srs2 Protein Requires SUMO (Small Ubiquitin-like Modifier) but Is Independent of PCNA (Proliferating Cell Nuclear Antigen) Interaction J Biol Chem 291: 7594-7607

      Kolesar P, Sarangi P, Altmannova V, Zhao X and Krejci L (2012) Dual roles of the SUMO-interacting motif in the regulation of Srs2 sumoylation Nucleic Acids Res 40: 7831-7843

      Koster DA, Palle K, Bot ES, Bjornsti MA and Dekker NH (2007) Antitumour drugs impede DNA uncoiling by topoisomerase I Nature 448: 213-217

      Menin L, Ursich S, Trovesi C, Zellweger R, Lopes M, Longhese MP and Clerici M (2018) Tel1/ATM prevents degradation of replication forks that reverse after topoisomerase poisoning EMBO Rep 19

      Ohouo PY, Bastos De Oliveira FM, Liu Y, Ma CJ and Smolka MB (2013) DNA-repair scaffolds dampen checkpoint signalling by counteracting the adaptor Rad9 Nature 493: 120-124

      Petermann E, Lan L and Zou L (2022) Sources, resolution and physiological relevance of R-loops and RNA-DNA hybrids Nat Rev Mol Cell Biol 23: 521-540

      Redon C, Pilch DR, Rogakou EP, Orr AH, Lowndes NF and Bonner WM (2003) Yeast histone 2A serine 129 is essential for the efficient repair of checkpoint-blind DNA damage EMBO Rep 4: 678-684

      Sun Y, Saha S, Wang W, Saha LK, Huang SN and Pommier Y (2020) Excision repair of topoisomerase DNA-protein crosslinks (TOP-DPC). DNA Repair 89: 102837

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      The authors use point light displays to measure biological motion (BM) perception in children (mean = 9 years) with and without ADHD, and relate it to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three BM tasks, but that those tasks loading more heavily on local processing relate to social interaction skills and those loading on global processing relate to age. There are still some elements of the results that are unclear, but nevertheless, the important and solid findings extend our limited knowledge of BM perception in ADHD, as well as biological motion processing mechanisms in general.

      We thank the editors and reviewers for their valuable feedback and constructive comments. In the revised manuscript, we have incorporated all statistics for the models and also provided detailed analytical evidence about the distinct contributions of local and global BM processing. We hope these clarifications could enhance the robustness of our conclusions.

      Public Reviews:

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate your positive feedback very much.

      Weaknesses:

      The manuscript has improved in clarity and conceptual and methodological considerations in response to the last review. However, the reported results still provide incomplete support for the claims the authors make in the paper.

      In relation to other reviewers' earlier comments, the model notation used is still not consistent and model results are reported incompletely, which make it difficult to gain a full picture of the data and how they support the authors' secondary claims. For instance, across the models in the supplementary materials, ß coefficients are only reported selectively which makes it difficult to assess the model as a whole. Furthermore, different terms (task 1, task 2 vs. BM-Local, BM-global) are used to refer to the same levels of a variable, and it is unclear which levels of a dummy variable correspond to which task, making it overall very difficult to comprehend the modelling procedure.

      Thanks for pointing out these issues. In the revised version, we have unified the terminology by consistently referring to task types as BM-Local, BM-Global, BM-General. Additionally, we have provided clarification on the interpretation of dummy variables in relation to model construction. Furthermore, we corrected the model results and included all statistics in Table S1, S2, and S3. For more detailed information, please refer to the response to your Recommendations for the authors.

      Reviewer #3 (Public Review):

      The authors presented point light displays of human walkers to children (mean = 9 years) with and without ADHD to compare their biological motion perception abilities, and relate them to IQ, social responsiveness scale (SRS) scores and age. They report that children with ADHD were worse at all three biological motion tasks, but that those loading more heavily on local processing related to social interaction skills and global processing to age. The valuable and solid findings are informative for understanding this complex condition, as well as biological motion processing mechanisms in general. However, the correlations present a pattern that needs further examination in future studies because many of the differences between correlations are not significant.

      Strengths:

      The authors present differences between ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      Thanks for this positive assessment of our work.

      Weaknesses:

      The data are not strong enough to support claims about differences between global and local processing with respect to social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but the crucial tests of differences between correlations do not present a clear picture. Further empirical work would be needed to test this further. Specifics:

      The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. The supplementary materials demonstrate that tests of differences between correlations present an incomplete picture. Currently they have small samples for correlations, so this is unsurprising.

      We apologize for not clarifying these points earlier. We did identify correlations between performance on all BM tasks and SRS scores. However, this finding is not unexpected, given the significant differences in SRS scores between TD and ADHD children, alongside their marked differences in all BM tasks. Correlation analyses involving data from both groups may therefore reflect group differences. To elucidate the relationship between social ability impairment and diminished BM processing in children with ADHD, we conducted additional subgroup analyses and found correlations only in the BM-Local task. To further support the specificity of this correlation, we compared the differences in coefficients. We revised our modelling procedure for testing differences between correlations in the supplementary materials and presented all model statistics in Tables S2 and S3. Discrepancies in these coefficients, which exclude the influence of differences between groups, suggest that social factors specifically influence the performance of the BM-Local task in children with ADHD. We acknowledge that the analysis of differences between correlations is based on a relatively small sample size, and we have provided a modest interpretation in the discussion. Future studies will aim to increase the sample size to validate our findings.

      Theoretical assumptions. The authors make some statements about local vs global biological motion processing that may have been made in previous studies, but would appear controversial and not definitive. E.g., that local BM processing does not improve with age and is uninfluenced by attention.

      Thanks for your comment. To the best of our knowledge, there have been fewer developmental studies conducted on local BM processing compared to global BM processing. Our study is the first one to directly explore the relationship between local BM processing and age. Additionally, we used QbInattention to evaluate sustained attention function (considered as “top-down” attention) and examined its correlation with local BM processing. Some indirect evidence supported that the ability to process local BM cues remained stable and was unaffected by top-down attention. For example, local BM processing did not show a learning trend (Chang 2009) and was linked to the activation of subcortical regions (Hirai 2020). Research has demonstrated that local BM cues can convey information about walking direction without participants’ explicit attention or recognition (Chang 2009, Hirai 2011, Thompson 2007, Wang 2010), indicating the involvement of “bottom-up” processing (Hirai 2020, Troje 2023). Consistent with previous findings, we did not find a significant correlation between local BM processing and age or QbInattention. We acknowledge that a statement such as “local BM processing does not improve with age and is uninfluenced by attention” should be approached with caution. Therefore, we interpreted our results carefully:

      “Once a living creature is detected, an agent (i.e., is it a human?) can be recognised by a coherent, articulated body structure that is perceptually organised based on its motions (i.e., local BM cues)71. This involves top-down processing and probably requires attention25,72, particularly in the presence of competing information26. Our findings are consistent with those of previous studies on the cortical processing of BM73, as we found that the severity of inattention in children with ADHD was negatively correlated with their performance in global BM processing, whereas this significant correlation was not found in local BM processing, which may involve bottom-up processing61,65 and might not need participants’ explicit attention21,23,74,75. However, further studies are needed to verify this hypothesis.” (lines 461-470)

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Supplementary materials: For all reported results, I suggest the authors use consistent model notation with complete reporting of all statistics in line with common conventions (ideally tables reporting beta values, error terms and confidence intervals for all model predictors, as well as R squared values). In particular the beta values for the reference category are needed to be able to fully interpret the beta values for the reported contrasts.

      We appreciate your suggestion. In the newly revised manuscript, we reported all statistics including beta values, error terms and confidence intervals for all model predictors, and R squared values. These detailed statistics can be found in Tables S1, S2 and S3. We hope this additional information will offer readers a more comprehensive understanding of our study.

      Please also address the following inconsistencies:

      - At least when reporting the model results, the same term should be used when referring to task type (either task 1/2/3/ or local/global/general BM).

      Thank you for this feedback. We now use the same terms (BM-Local/Global/General) to refer to task type throughout the text.

      - Second linear model in the Supplementary Materials: The authors state that the results suggest that the correlation between SRS and task 1 is greater than that between task 2 and SRS scores. First of all, to be able to support this claim the authors need to provide the coefficient for task 1 (which, if task 1 is the reference variable should be ß1). Second, as I currently understand the reported model results, the fact that ß4 (representing the difference in relationship to SRS scores between task 2 and task 1; the authors refer to ß3 here although I assume they mean ß4) is negative and shows a trend towards significance would actually mean the relationship between BM processing accuracy and SRS scores is more negative for task 2 relative to task 1 and not, as the authors state, that the correlation with SRS scores is greater for task 1. I realise this contradicts the individual r values and scatter plots and hope the authors can clarify the model results.

      We thank you for pointing out these issues. For the second linear model (Model 4 in the revised manuscript), we reported the coefficients for all predictors and model summaries, including the coefficient for task 1 (ß1). In addition, we have corrected the model results. The values of ß4 (representing the difference in relationship to SRS scores between BM-Global and BM-Local) and ß5 (representing the difference in relationship to SRS scores between BM-General and BM-Local) were positive and showed a trend towards significance, indicating that the correlations with SRS total score were more negative for BM-Local relative to BM-Global and BM-General:

      “A general linear model was constructed (Table S2, Model 4): SRS = β0 + β1 * ACC + β2 * D1 + β3 * D2 + β4 * (ACC * D1) + β5 * (ACC * D2). If the effect of the interaction term (i.e., β4 or β5) is statistically significant, it indicates a difference in correlations with SRS total score between BM-Local and BM-Global (or BM-General). The results suggested trends where the correlations with SRS total score were more negative for BM-Local relative to BM-Global (standardized β4 = 0.580, p = 0.074) and BM-General (standardized β5 = 0.550, p = 0.073).” (lines SI 36-42)

      - Third linear model in the Supplementary Materials: In the dummy variable representing task, when local BM is the reference level, which task is represented by d1 and d2, respectively? If I understand the authors' procedure correctly, d1 should represent the difference between local and global BM and d2 the difference between local and general BM. If this is true, ß4 should code for the difference between local and global BM and not, as stated by the authors, for the difference between local and general BM. Also, what is d3?

      Thank you for pointing out this issue. We corrected and clarified the results of the third model (Model 5 in the revised manuscript) in the revised version and specified what is represented by d1 (D1) and d2 (D2), respectively:

      “We recoded task types into two dummy variables, D1 and D2, using BM-Local as a reference. The coefficient of D1 represents the difference in relationship to age between BM-Local and BM-Global, and the coefficient of D2 represents the difference in relationship to age between BM-Local and BM-General. The following model was created for each group (Table S3, Model 5-6): ACC = β0 + β1 * age + β2 * D1 + β3 * D2 + β4 * (age * D1) + β5 * (age * D2). If the effect of the interaction term (i.e., β4 or β5) is statistically significant, it indicates a difference in the effect of age on ACC between BM-Local and BM-Global (or BM-General). In the ADHD group, we observed a significant difference in the effect of age on ACC between BM-Local and BM-General (standardized β5 = 0.462, p < 0.001) and marginally significant differences in the effect of age on ACC between BM-Local and BM-Global (standardized β4 = 0.228, p = 0.073).” (lines SI 47-57)
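
      The dummy-coded interaction model quoted above can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names, the use of NumPy least squares, and the noise-free synthetic data are all assumptions made for illustration. It returns unstandardized coefficients only, without the standard errors or p-values that the reported tests require.

```python
import numpy as np

# BM-Local is the reference level; D1 flags BM-Global and D2 flags BM-General.
TASKS = ("BM-Local", "BM-Global", "BM-General")


def design_matrix(age, task):
    """Build columns: intercept, age, D1, D2, age*D1, age*D2 (hypothetical helper)."""
    age = np.asarray(age, dtype=float)
    task = np.asarray(task)
    d1 = (task == "BM-Global").astype(float)
    d2 = (task == "BM-General").astype(float)
    return np.column_stack([np.ones_like(age), age, d1, d2, age * d1, age * d2])


def fit_interaction_model(age, task, acc):
    """Ordinary least squares estimate of (b0, b1, b2, b3, b4, b5)."""
    X = design_matrix(age, task)
    beta, *_ = np.linalg.lstsq(X, np.asarray(acc, dtype=float), rcond=None)
    return beta


# Synthetic, noise-free data (illustrative values, not taken from the study):
# each task has its own intercept and its own age slope.
ages = [6.0, 8.0, 10.0, 12.0] * 3
tasks = [t for t in TASKS for _ in range(4)]
intercepts = {"BM-Local": 0.60, "BM-Global": 0.40, "BM-General": 0.30}
slopes = {"BM-Local": 0.00, "BM-Global": 0.02, "BM-General": 0.04}
acc = [intercepts[t] + slopes[t] * a for a, t in zip(ages, tasks)]

beta = fit_interaction_model(ages, tasks, acc)
# beta[1] is the age slope for the reference task (BM-Local);
# beta[4] and beta[5] are the slope differences for BM-Global and
# BM-General relative to BM-Local.
print(np.round(beta, 3))
```

      With BM-Local as the reference, the coefficient on age * D1 is the age slope for BM-Global minus the age slope for BM-Local, which is why a significant interaction term is read as a difference in the age effect between tasks.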

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 3:

      Response to authors' revisions:

      This reviewer is not convinced that the authors have done enough to satisfactorily address either of the major issues described in the original public review, above.

      They're still not providing a quantification of Fig. 5D (originally 5C).

      Their response regarding the expression pattern of Rh1 is particularly concerning, as it represents a misinterpretation of previously published data.

      The gene encoding Rh1, ninaE, is expressed at such high levels in R1-6 PRs that any RNA-seq data (bulk or single-cell) generated from the optic lobes, no matter what cell-type, will display some ninaE transcripts that are present in the background, as they leak from R1-6 during dissociation steps. This phenomenon has been well described, for instance in Davis et al., 2020, eLife, and in fact led to the development of computational tools to abate such artifacts. In other words: no, rh1 is not expressed in glia, or any other neuron besides PRs for that matter. Therefore, I remain deeply suspicious about the functional relevance of the regulatory mechanisms described in this paper.

      We thank the reviewer for her or his critical comments.

      We quantified the cell-type differences in translation of the reporter with Tub-GAL4 and now show the results in Figure 5F. Consistent with other results, this analysis revealed that the glia-to-neuron ratio of the reporter protein expression is significantly lower when it contains the UTR sequences of rh1.  

      We removed the mRNA counts (former Figure 5A and Figure 5 - figure supplement 1A), as we agree that these may well be contaminated by the very high rh1 expression in R1-6. We also amended the graph showing the ribosome distribution on the rh1 mRNA (Figure 5B) to better compare the translational efficiency (footprints normalized with mRNA, in a similar manner to Figure 3C). Now it clearly highlights the cell-type differences of footprint distributions; ribosomes are much more enriched on the CDS (being translated) in neurons, while the fraction of ribosomes on the 5ʹ leader (being stalled) is much higher in glia. We summarized this differential ribosome distribution in a new graph (now Figure 5C).  

      We apologize for the misleading description of the reporter experiments. Despite the high level of mRNA expression in the R1-6, we chose the 5ʹ leader of rh1 for the translation reporter, as it contains clear uORFs and differential ribosome accumulation thereon (Figure 5B). This biased ribosome distribution and differential translation are the consistent features for many neuronal genes (Figure 3). We revised the text to clarify this point (Line 195-203).

      In summary, we provide more rigorous analysis and extensive revision, which we hope clarified the concern.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by disassociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs.

      Excellent suggestions. The EM imaging indeed revealed an increase in enlarged cellular vesicles containing various contents in usp-50 mutants. However, the detailed molecular features of these vesicles remain unclear. Therefore, we plan to utilize ESCRT components for double staining with early or late endosome markers. This will enable us to accurately characterize the anomalous structures detected in the usp-50 mutants.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta, which likely represent late endosomes, are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations, they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      - The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion.

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      - The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to what extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model.

      Excellent point. We plan to conduct additional genetic analyses, including the construction of double mutants between usp-50 and various rabex-5 mutations, to further elucidate the extent to which USP8 regulates endosome maturation via Rabex5.

      Reviewer #3 (Public Review):

      Summary:

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size in lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes facilitating endosome maturation.

      Weaknesses:

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approximation is somewhat difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript.

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. In contrast, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it.

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed enhanced RABX-5/Rabex5 signaling and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by cultured mammalian cell studies, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosomes/late endosomes. To elucidate the underlying mechanisms, we investigated the formation of multivesicular bodies (MVBs), a process tightly linked to USP8 function. Extensive electron microscopy (EM) analysis indicated that MVB-like structures are largely intact in usp-50 mutant cells, suggesting that USP8/USP-50 likely regulates lysosome formation through alternative pathways in addition to its roles in MVB formation and ESCRT component function. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Interestingly, loss-of-function mutations in usp8 often lead to the enlargement of early endosomes, yet the mechanisms underlying this phenomenon remain unclear. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged MVB-like vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, while early endosomes are significantly enlarged. Given that the Rab5 guanine nucleotide exchange factor (GEF), Rabex5, is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase of endosomal Rabex5 in response to usp8/usp-50 loss-of-function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Comprehensive genetics and biochemistry experiments demonstrated that USP8 acts through K323 site de-ubiquitination to dissociate Rabex5 from early endosomes and promotes the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Wu et al. explores the role of the histone reader protein SntB in Aspergillus flavus, claiming it to be a key regulator of development and aflatoxin biosynthesis. While the study incorporates various techniques, including gene deletion, ChIP-seq, and RNA-seq, several concerns and omissions in the paper raise questions about the validity and completeness of the presented findings.

      (1) Omissions of Prior Work:

      The authors fail to acknowledge and integrate prior research by Pfannenstiel et al. (2018) on the sntB gene in A. flavus, which covered phenotypic changes, RNA-seq data, and histone modifications. This omission raises concerns about the transparency and completeness of the current study.

      The absence of reference to studies by Karahoda et al. (2022, 2023) revealing SntB's involvement in the KERS complex in A. flavus and A. nidulans is a major oversight. This raises questions about the specificity of SntB's regulatory functions, as it may be part of a larger complex. The authors should clarify why these studies were omitted and how they ensure that SntB alone, and not the entire KERS complex, is responsible for the observed effects.

      We greatly appreciate the reviewer's professional question. As the reviewer mentioned, Pfannenstiel et al. (2018) reported that the functions of the sntB gene cover secondary metabolism, development, and global histone modifications in A. flavus, and we cited this paper (please see reference 20). In their study, the functions of the sntB gene were analyzed using both ΔsntB and sntB overexpression mutants. SntB deletion impaired several developmental processes, such as sclerotia formation and heterokaryon compatibility, as well as secondary metabolite synthesis and the ability to colonize host seeds, which is consistent with our results (Figures 1 and 2). Unlike that study, we constructed a complementation strain, which further clarified and confirmed the function of the sntB gene. Moreover, our main purpose is to identify the regulatory mechanism downstream of SntB, which has been reported to be a transcription factor and not only an important epigenetic reader. Please see lines 452-457 and 486-500.

      For the function of the KERS complex in A. nidulans (Karahoda et al., 2022), we had cited the paper; please see reference 29. The report on the function of the KERS complex in A. flavus (Karahoda et al., 2023) was published only recently, and we apologize for omitting it. In our revised manuscript, we have cited this paper and compared it with our work. Please see lines 97-98 and reference 30. Based solely on our experiments, we cannot confirm whether SntB acts alone or in conjunction with other factors; what we can confirm is that SntB plays a key role in the process. We will conduct related research in the future.

      (2) Transparency and Accessibility of Data:

      The lack of accessibility and visualization tools for ChIP-seq and RNA-seq data poses a challenge for independent verification and in-depth analysis. The authors should address this issue by providing more accessible data or explaining the limitations of data availability. A critical component missing from the paper is a detailed presentation of ChIP-seq data, specifically demonstrating SntB binding patterns on key promoters. This omission weakens the link between SntB and the mentioned regulatory genes. The authors should include these crucial data visualizations to strengthen their claims.

      To review GEO accession GSE247683, go to https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE247683 and enter the token “ipilouscnruprsl” into the box. The data will be released after our paper is published. For the SntB binding patterns on key promoters, we have added the data; please see Figures 4D, 4E, 5F, and 5G, and Table S9.

      (3) SntB Binding Sites and Consensus Sequence:

      The study mentions several genes upregulated in the sntB mutant without demonstrating SntB binding sites on their promoters. A detailed analysis of SntB binding maps is necessary to establish a direct link between SntB and these regulatory genes.

      Thanks for your suggestion. We have added the binding maps of SntB; please see Figures 5F and 5G and lines 362-364.

      (4) Mechanistic Insight into Peroxisome Biogenesis:

      If SntB indeed regulates peroxisome biogenesis, the absence of markers for peroxisomes and the localization of peroxisomes in the sntB mutant vs. WT strains is a significant gap. Providing evidence for peroxisome regulation is crucial for understanding the proposed mechanism and validating the study's claims.

      Thanks for your suggestion. Catalase is ubiquitously present in aerobic organisms and plays a crucial role in mitigating oxidative stress through the scavenging of reactive oxygen species (ROS). We therefore measured the ROS level in the sntB mutant and the WT strain, as well as in the ∆catC strain (Figure 6H).

      In summary, while the manuscript presents intriguing findings regarding SntB's role in A. flavus, the omissions of prior work, lack of transparency in data accessibility, and insufficient mechanistic insights call for revisions and additional experimental evidence to strengthen the validity and impact of the study. Addressing these concerns will enhance the manuscript's contribution to the field.

      Thanks. We have revised our manuscript according to the valuable comments provided above.

      Additionally, the way the English language is used could be improved.

      Thanks. We have asked a native English-writing assistant to proofread the paper, correct the grammatical errors and typos, and improve the readability and quality of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This work is of great significance in revealing the regulatory mechanisms of pathogenic fungi in toxin production, pathogenicity, and in its prevention and pollution control. Overall, this is generally an excellent manuscript.

      Strengths:

      The data in this manuscript is robust and the experiments conducted are appropriate.

      Weaknesses:

      (1) The authors found that SntB played key roles in the oxidative stress response of A. flavus by ChIP-seq and RNA sequencing. To confirm the role of SntB in oxidative stress, the authors have to better measure the ROS levels in the ΔsntB and WT strains, besides the ΔcatC strain.

      Thanks for your suggestion. We have added the relevant experiments, and the results are shown in Figure 6G and lines 185-192 and 395-398.

      (2) Why did the authors only study the function of catC among the 7 genes related to an oxidative response listed in Table S14?

      The functions of some genes in Table S15 (Table S14 in the old version of our manuscript) have already been studied, such as cat1 [1]. In this study, we chose only catC for further validation, as it was the most up-regulated gene in the ΔsntB strain. The others may also have important roles in the SntB-triggered antioxidant pathways that regulate development and aflatoxin biosynthesis in A. flavus. We will focus on this in future work.

      (1) Zhu Z., Yang M., Bai Y., Ge F., Wang S. Antioxidant-related catalase CTA1 regulates development, aflatoxin biosynthesis, and virulence in pathogenic fungus Aspergillus flavus [J]. Environ Microbiol, 2020, 22(7): 2792-2810.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 52: Change "shad light" to "shed light"

      Thanks. We have revised it. Please see line 50.

      Line 62: Change "has" to "have" to match the plural noun "aflatoxins."

      Original: "Aflatoxins produced by A. flavus has strong toxicity..."

      Suggested: "Aflatoxins produced by A. flavus have strong toxicity..."

      Thanks. We have revised it. Please see line 62.

      Line 79: Consider rephrasing for clarity.

      Original: "...which may result in the modulation of the expression of genes involved in toxin production [15-17]."

      Thanks. We have revised it. Please see lines 77-80.

      Line 105: Add a comma after "host strain."

      Original: "A. flavus Δku70 ΔpyrG was used as a host strain for genetic manipulations."

      Suggested: "A. flavus Δku70 ΔpyrG was used as a host strain, for genetic manipulations."

      Thanks. We have revised it. Please see line 107.

      Line 113, Table 1: Remove the extra "r" in "from" in the Source column.

      Original: "Kindly presented form Prof. Chang[1]"

      Suggested: "Kindly presented from Prof. Chang[1]"

      Thanks. We have revised it. Please see Table 1.

      Line 140: Typo - Change "reaches" to "reach."

      Original: "when silkworm larva reaches about 1 g in weight."

      Suggested: "when silkworm larvae reach about 1 g in weight."

      Thanks. We have revised it. Please see line 141.

      Line 158: Typo - Change "pervious" to "previous."

      Original: "Data processing was according pervious study [39]."

      Suggested: "Data processing was according to a previous study [39]."

      Thanks. We have revised it. Please see line 150.

      Line 138 The animal invasion assay using silkworms was conducted according to a previous study.

      Change "according" to "conducted according to" for clarity.

      Thanks. We have revised it. Please see line 139.

      Line 148 Was carried out by APPLIED PROTEIN TECHNOLOGY, Shanghai (www. aptbiotech.com).

      Correct "TECHNOLOY" to "TECHNOLOGY."

      Thanks. We have revised it. Please see line 149.

      Line 148 Data processing was conducted according to a previous study [39].

      Change "according to" to "conducted according to" for clarity.

      Thanks. We have revised it. Please see line 139.

      Line 429 Schizzosaccharomyces pombe, Correct the spelling to "Schizosaccharomyces pombe [55]."

      Thanks. We have revised it. Please see line 448.

      Reviewer #2 (Recommendations For The Authors):

      (1) The resolution of the words written in Figures 3 and 4 is not clear (or high) enough.

      Thanks. We have revised them. Please see Figures 3 and 4.

      (2) Which protein marker (protein ladder) was used in Figure 4A? Please mark the size of the relevant protein.

      Thanks. We have revised it. Please see Figure 4A and lines 332-333.

      (3) Latin names do not necessarily need to be written in full when they are not the first time used in the text.

      Thanks. We have revised them throughout the manuscript.

      (4) The complementary strain of sntB was labeled as sntB-C in Figure 2B, while in other figures it was labeled Com-sntB. Please correct all such inconsistencies.

      Thanks. We have revised it. Please see Figure 2B.

      (5) What is the meaning of "1" in Table 1?

      Thanks. The "1" in Table 1 was a citation. We have revised it. Please see Table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The manuscript constitutes an important contribution to antimalarial drug discovery, employing diverse systems biology methodologies; with a focus on an improved M1 metalloprotease inhibitor, the study provides convincing evidence of the utility of chemoproteomics in elucidating the preferential targeting of PfA-M1. Additionally, metabolomic analysis effectively documents specific alterations in the final steps of hemoglobin breakdown. These findings underscore the potential of the developed methodology, not only in understanding PfA-M1 targeting but also in its broader applicability to diverse malarial proteins or pathways. Revisions are needed to further enhance overall clarity and detail the scope of these implications.

      We thank the editor and reviewers for recognising the contribution our work makes to understanding the selective targeting of aminopeptidase inhibitors in malaria parasites and the wider impact this multi-omic strategy can have for anti-parasitic drug discovery efforts. The reviewers have provided constructive feedback and raised important points that we have taken on-board to improve our manuscript. In particular, we have revised aspects of the text and figures to enhance clarity, performed additional analysis on the other possible MIPS2673 interacting proteins and more comprehensively analysed the effect of MIPS2673 on parasite morphology. NB: Specific responses to comments in the public reviews are provided within responses to the specific recommendations to authors.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The article "Chemoproteomics validates selective targeting of Plasmodium M1 alanyl aminopeptidase as a cross-species strategy to treat malaria" presents a series of biochemical methods based on proteomics and metabolomics, as a means to:

      (1) validate the specific targeting of biologically active molecules (MIPS2673) towards a defined (unique) protein target within a parasite and (2) to explore whether by quantifying the perturbations generated at the level of the parasite metabolome, it is possible to extrapolate which metabolic pathway has been disrupted by using this biologically active molecule and whether this may further confirm selective targeting in parasites of the expected (or in-vitro targeted) enzyme (here PfA-M1).

      The inhibitor used in this work by the authors (MIPS2673) is to my knowledge a novel one, although belonging to a chemical series previously explored by the authors, which recently enabled them to discover a specific PfA-M17 inhibitor, MIPS2571 (Edgar et al., 2022, ref 11 of this current work). Indeed, inhibitors specifically targeting either PfA-M1 or PfA-M17 (and not both, as commonly done in the past) are scarce today, and highly needed to functionally characterize these two zinc-aminopeptidases. MIPS2673 blocks the development of erythrocytic stages of Plasmodium falciparum with an EC50 of 324 nM, blocks the parasite development at the young trophozoite stage at 5x EC50 (but at ring stages at 10x EC50, Figure 1E), and inhibits the enzymatic activity of PfA-M1 (and its ortholog Pv-M1) but not of the related malarial metallo-aminopeptidases (M17 and M18 families) nor the human metalloenzymes from closely related enzymatic families, supporting its selective targeting of PfA-M1 (and Pv-M1).

      All experiments are carried out in vitro (e.g. biochemical studies such as enzymology, proteomics, metabolomics) and on cultured parasites (erythrocyte stages of Plasmodium falciparum and several gametocytes stages obtained in vitro); there are no in vivo manipulations. The work related to Plasmodium vivax, which justifies the "cross-species" indication in the title of the article, is restricted to using a recombinant form of the M1-family aminopeptidase in enzymatic assays. The rest of the work concerns only Plasmodium falciparum. While I found globally that this work is original and brings new data and above all proposes chemical validation approaches that could be used for other target validations under similar limiting conditions (impossibility of KO of the gene), I have some specific questions to address to the authors.

      Strengths and weaknesses:

      - The chemoproteomic approach, that explores the ability of MIPS2673 to more significantly "protect" the putative target (PfA-M1) against thermal degradation or enzymatic attack (by proteinase K), to document its selective targeting towards PfA-M1 (the inhibitor, once associated with its target, is expected to stabilize its structure or prevent the action of end proteases), uses several concentrations of MIPS2673 and provides convincing results. My main criticism is that these tests are carried out with parasite extracts enriched in 30-38 hours old forms, and restricted to the fraction of soluble proteins isolated from these parasitic forms, which still limits the scope of the analysis. It is clear that this methodological approach is a choice that can be argued both biologically (PfA-M1 is well expressed in these stages of the parasite development) and biochemically (it is difficult to do proteomic analyses on insoluble proteins) but I regret that the authors do not discuss these limitations further, notably, I would have expected (from Figure 1E) some targets to be also present at ring stages.

      - The metabolomic approach, by documenting the ability of MIPS2673 to selectively increase the number of non-hydrolyzed dipeptides in treated versus untreated parasites is another argument in favor of the selective targeting of PfA-M1 by MIPS2673, in particular by its broad-spectrum aminopeptidase action preferentially targeting peptides resulting from the degradation of hemoglobin by the parasite. The relative contribution of peptides derived from host hemoglobin versus other parasite proteins is, however, little discussed.

      The work as a whole remains highly interesting, both for the specific topic of PfA-M1's role in parasite biology and for the method, applicable to other malarial drug contexts.

      Reviewer #2 (Public Review):

      In this manuscript, the authors first developed a new small molecular inhibitor that could target specifically the M1 metalloproteases of both important malaria parasite species Plasmodium falciparum and P. vivax. This was done by a chemical modification of a previously developed molecule that targets PfM1 as well as PfM17 and possibly other Plasmodial metalloproteases. After the successful chemical synthesis, the authors showed that the derived inhibitor, named MIPS2673, has a strong antiparasitic activity with IC50 342 nM and it is highly specific for M1. With this in mind, the authors first carried out two large-scale proteomics experiments to confirm the MIPS2673 interaction with PfM1 in the context of the total P. falciparum protein lysate. This was done first by using thermal shift profiling and subsequently limited proteolysis. While the first demonstrated overall interaction, the latter (limited proteolysis) could map more specifically the site of MIPS2673-PfM1 interaction, presumably the active site. Subsequent metabolomics analysis showed that the cytotoxic inhibitory effect of MIPS2673 leads to the accumulation of short peptides, many of which originate from hemoglobin. Based on that, the authors argue that the MIPS2673 mode of action (MOA) involves inhibition of hemoglobin digestion that in turn inhibits the parasite growth and development.

      Reviewer #3 (Public Review):

      This is a manuscript that attempts to validate Plasmodium M1 alanyl aminopeptidase as a target for antimalarial drug development. The authors provide evidence that MIPS2673 inhibits recombinant enzymes from both Pf and Pv and is selective over other proteases. There is in vitro antimalarial activity. Chemoproteomic experiments demonstrate selective targeting of the PfA-M1 protease.

      This is a continuation of previous work focused on designing inhibitors for aminopeptidases by a subset of these authors. Medicinal chemistry explorations resulted in the synthesis of MIPS2673 which has improved properties including potent inhibition of PfA-M1 and PvA-M1 with selectivity over a closely related peptidase. The compound also demonstrated selectivity over several human aminopeptidases and was not toxic to HEK293 cells at 40 µM. The activity against P. falciparum blood-stage parasites was about 300 nM.

      Thermal stability studies confirmed that PfA-M1 was a binding target, however, there were other proteins consistently identified in the thermal stability studies. This raises the question as to their potential role as additional targets of this inhibitor. The authors dismiss these because they are not metalloproteases, but further analysis is warranted. This is particularly important as the authors were not able to generate mutants using in vitro evolution of resistance strategies. This often indicates that the inhibitor has more than one target.

      The next set of experiments focused on a limited proteolysis approach. Again several proteins were identified as interacting with MIPS2673 including metalloproteases. The authors go on to analyze the LiP-MS data to identify the peptide from PfA-M1 which putatively interacts with MIPS2673. The authors are clearly focused on PfA-M1 as the target, but a further analysis of the other proteins identified by this method would be warranted and would provide evidence to either support or refute the authors' conclusions.

      The final set of experiments was an untargeted metabolomics analysis. They identified 97 peptides as significantly dysregulated after MIPS2673 treatment of infected cells and most of these peptides were derived from one of the hemoglobin chains. The accumulation of peptides was consistent with a block in hemoglobin digestion. This experiment does reveal a potential functional confirmation, but questions remain as to specificity.

      Overall, this is an interesting series of experiments that have identified a putative inhibitor of PfA-M1 and PvA-M1. The work would be significantly strengthened by structure-aided analysis. It is unclear why putative binding sites cannot be analyzed via specific mutagenesis of the recombinant enzyme.

      In the thermal stability and LiP-MS analysis, other proteins were consistently identified in addition to PfA-M1 and yet no additional analysis was undertaken to explore these as potential targets.

      The metabolomics experiments were potentially interesting, but without significant additional work including different lengths of treatment and different stages of the parasite, the conclusions drawn are overstated. Many treatments disrupt hemoglobin digestion - either directly or indirectly and from the data presented here it is premature to conclude that treatment with MIPS2673 directly inhibits hemoglobin digestion.

      Finally, the potency of this compound on parasites grown in vitro is 300 nM - this would need improvements in potency and demonstration of in vivo efficacy in the SCID mouse model to consider this a candidate for a drug.

      Summary:

      Overall, this is an interesting series of experiments that have identified a putative inhibitor of the Plasmodium M1 alanyl aminopeptidases, PfA-M1 and PvA-M1.

      Strengths:

      The main strengths include the synthesis of MIPS2673 which is selectively active against the enzymes and in whole-cell assay.

      Weaknesses:

      The weaknesses include the lack of additional analysis of additional targets identified in the chemoproteomic approaches.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Question 1. Line 737 (and elsewhere). Why are Plasmodium vivax orthologs of PfA-M1 and PfA-M17 called Pv-M1 and Pv-M17 and not PvA-M1 and PvA-M17, where A stands for Aminopeptidase? I would recommend changing the names if possible, although the mention of Pv-M1 and Pv-M17 is now current in the literature (which is kind of regrettable). See also Supplemental Table S1 where PfA-M1 is named Pf-M1.

      Supplemental Table S1 was updated to PfA-M1. Nomenclature for the Plasmodium vivax aminopeptidase orthologs was amended to PvA-M1 and PvA-M17 as suggested by the reviewer.

      Question 2. Figure 1. Observation of parasite culture slide smears in Figure 1E strongly suggests that an important target of MIPS2673 appears to be expressed at the ring stage or very young trophozoites, whereas the authors, in their proteomic and metabolomic analyses, performed studies focused on late trophozoites stages (30-38h post-invasion). This difference in the targeting of Plasmodium stages puzzles me and deserves some explanations from the authors, and is related to my question 3.

      As the reviewer indicates, ring-stage parasite growth appears to be affected at high concentrations (5x and 10x EC50) of MIPS2673. Under these conditions, parasite growth appears to stall during late rings/early trophs at ~16-22 h post invasion when haemoglobin digestion is increasing and when one presumes PfA-M1 (the primary target of MIPS2673) is increasing in both expression and activity (see references 26 and 28 of this manuscript). Thus, whilst it is unsurprising that MIPS2673 has some activity against ring-stage parasites, we focused on the trophozoite stage for our proteomics studies as we showed this to be the stage most susceptible to MIPS2673 (Fig. 1D) and reasoned that we would most likely identify the primary MIPS2673 target, and other interacting proteins, from a complex biological mixture at this stage. The same reasoning underpinned our decision to perform metabolomics on drug-treated trophozoites, as we reasoned we would see a greater functional effect on this stage. Furthermore, performing these experiments on trophozoites rather than rings minimises the interference from the host red blood cell. While we cannot rule out additional targets in rings, repeating all experiments during this parasite stage is beyond the scope of this study.

      Question 3. Figure 2. Although Figure 2 is insightful and somewhat self-explanatory, I think it misses two specific pieces of information. First, it is indicated in line 618 (M&M) that parasite material for thermal stability and limited proteolysis studies corresponds to synchronized parasites (30-38h post-invasion), but this information is not given in Figure 2. In addition, if I fully understand the experimental protocol for obtaining parasite extracts, they strictly correspond to the soluble protein fraction of the erythrocytic stages of Plasmodium at the late trophozoite stage, and not to all parasite proteins as the scheme of Figure 2 might suggest. I would appreciate it very much if these two points (parasite stages and soluble proteins) were clearly indicated in the scheme, as the study investigates not the whole parasite blood stage proteome but just a part of it (~47%, as the authors indicate in line 406). Please also edit the legend of the figure accordingly.

      This is correct, the soluble protein fraction from synchronised trophozoites was used in our proteomics studies. These details have been included in an updated Figure 2 and in the corresponding figure legend.

      Question 4. Thermal stabilization. Figure 3B. Could the authors explain how they calculated or measured "absolute" protein abundances, and how this refers to a number of parasites in initial assays as this is not clear to me. Notably, abundance for PfA-M1 is much higher than for PF3D7_0604300, which are interesting "absolute" values.

      Protein abundance was calculated using the mean peptide quantity of the stripped peptide sequence, with only precursors passing the Q-value threshold (0.01) considered for relative quantification. Within independent experiments, normalisation was based on total protein amount (determined by the BCA assay) rather than the initial number of parasites.

      PfA-M1 is known to be a highly abundant protein and PF3D7_0604300 (as well as the other protein hits identified by thermal stability proteomics) are likely less abundant. It is noted that abundance is also dependent on ionisation efficiency and trypsin digestion efficiency. Therefore, we avoid comparing absolute abundances across proteins and use relative differences across conditions instead.

      NB: the word “absolute” in the text (“absolute fold-change”) refers to the absolute value of the fold-change (i.e. positive or negative), and not to absolute quantification of proteins. The preceding text in each case clarifies that these are based on “relative peptide abundance”.
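
      The quantification described in this response can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record layout (`protein`, `stripped_seq`, `quantity`, `q_value`), the function names, the averaging of peptide-level means into a single protein value, and the use of a log2 scale for the fold-change are all hypothetical choices made for the example. Only the 0.01 Q-value threshold and the use of mean peptide quantity come from the text.

```python
from collections import defaultdict
from math import log2
from statistics import mean

Q_VALUE_THRESHOLD = 0.01  # threshold stated in the response


def protein_abundance(precursors):
    """Return {protein: abundance} from precursor-level records.

    Each record is a dict with the hypothetical keys
    'protein', 'stripped_seq', 'quantity' and 'q_value'.
    """
    # Keep only precursors that pass the Q-value threshold, grouped by
    # (protein, stripped peptide sequence).
    per_peptide = defaultdict(list)
    for p in precursors:
        if p["q_value"] <= Q_VALUE_THRESHOLD:
            per_peptide[(p["protein"], p["stripped_seq"])].append(p["quantity"])

    # Mean quantity per stripped peptide, then mean over peptides per protein
    # (the second averaging step is an assumption of this sketch).
    per_protein = defaultdict(list)
    for (protein, _seq), quantities in per_peptide.items():
        per_protein[protein].append(mean(quantities))
    return {protein: mean(values) for protein, values in per_protein.items()}


def absolute_log2_fold_change(treated, control):
    """Magnitude of the relative change, ignoring its direction."""
    return abs(log2(treated / control))
```

      Because the fold-change is a ratio of treated to control values for the same protein, protein-specific factors such as ionisation or digestion efficiency largely cancel, which is why relative rather than cross-protein comparisons are used.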

      Question 5. Figure 5A. How do the authors explain peptides whose abundances are decreasing instead of increasing? Figure 5C. Could the authors provide digital cues (aa numbers or positions) on the ribbon representation of the PfA-M1 sequence? It is difficult to correlate the position of the 3D domains with respect to the primary structure of the protein. Also, the "yellow" supposed to show the "drug ligand" is really not very visible.

      LiP-MS is based on the principle that ligand binding alters the local proteolytic susceptibility of a protein to a non-specific protease (in this case proteinase K, PK). In this sense, in LiP-MS we are not looking at variations in the stability of whole proteins (as is the case with thermal stability proteomics, where proteins detected with significantly higher abundance in treated relative to control samples reflect thermal stabilisation of the target due to ligand binding), but at differences in peptide patterns between treated and control samples that reflect a change in the ability of PK to cleave the target. Thus, in the bound state, the ligand prevents proteolysis by PK. This results in decreased abundance of peptides with non-tryptic ends (as PK cannot access the region around where the ligand is bound) and increased abundance of the corresponding fully tryptic peptide, when compared to the free target. This concept is demonstrated in Fig. 4A and is explained in the text (lines 279-282) and the Fig. 4 legend.
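
      The distinction between fully tryptic peptides and peptides with non-tryptic ends can be made concrete with a short sketch. It assumes a simplified trypsin rule (cleavage C-terminal to K or R, ignoring the proline exception); the function name and the example sequence are hypothetical and are not taken from the manuscript.

```python
def tryptic_status(protein: str, peptide: str) -> str:
    """Classify a peptide by how many of its ends match trypsin cleavage.

    Simplified rule (an assumption of this sketch): trypsin cleaves
    C-terminal to K or R. Protein termini count as valid ends.
    Only the first occurrence of the peptide in the protein is considered.
    """
    start = protein.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)

    # N-terminal end is tryptic if the preceding residue is K/R
    # or the peptide starts at the protein N-terminus.
    n_ok = start == 0 or protein[start - 1] in "KR"
    # C-terminal end is tryptic if the peptide ends in K/R
    # or at the protein C-terminus.
    c_ok = end == len(protein) or peptide[-1] in "KR"

    if n_ok and c_ok:
        return "fully tryptic"
    if n_ok or c_ok:
        return "semi-tryptic"
    return "non-tryptic"


# Made-up sequence, for illustration only.
example_protein = "MAAKLSTYEGRVVDPK"
print(tryptic_status(example_protein, "LSTYEGR"))  # both ends follow K/R
print(tryptic_status(example_protein, "STYEGR"))   # N-terminal end created by another protease
```

      In a LiP-MS comparison, peptides with one or two non-tryptic ends near the ligand would be expected to decrease in the treated sample, while the matching fully tryptic peptide increases.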

      To aid visualisation, we have not added amino acid positions on the PfA-M1 sequence in Fig. 5, but have provided amino acid positions for all peptides in Supplementary File 3. We have also changed the colour of the ligand in Fig. 5C to blue and increased transparency of the binding and centre of mass neighbourhoods.

      Question 6. Gametocyte assays. Line 824 states that several compounds were used as positive controls for anti-gametocyte activity (chloroquine, artesunate, pyronaridine, pyrimethamine, dihydroartemisinin, and methylene blue) and line 821 states that the biological effects are measured against puromycin. This is not very clear to me, could the authors comment on this?

      This wording has been clarified in the methods to reflect that 5 µM puromycin was used as the positive control to calculate percent viability, whereas the other antimalarials were run in parallel as reference compounds with known anti-gametocyte activity (line 862).

      Question 7. Metabolomics. Metabolomic assays were done on parasites at 28h pi, incubated for 1h with 3x EC50 of MIPS2673. You mention applying the drug on 2x10E8 infected red blood cells (line 838) but you do not explain how you isolate these infected red blood cells from non-infected red blood cells. Could you please specify this?

      Metabolomics studies were performed such that cultures at 2% haematocrit and 6% trophozoite-stage parasitaemia (representing 2 × 10^8 cells in total, rather than 2 × 10^8 infected cells) were treated with compound or vehicle and after 1 h metabolites were extracted. This methodological detail has been clarified in the methods (line 875).
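
      As a worked check of the distinction between total and infected cells (simple arithmetic on the figures stated in this response, not an additional measurement), a 6% parasitaemia applied to the stated total cell number gives approximately:

```latex
N_{\text{infected}} \approx 0.06 \times \left(2 \times 10^{8}\right) = 1.2 \times 10^{7}
```

      That is, roughly 1.2 × 10^7 of the cells in each sample would be infected, with the remainder being uninfected red blood cells.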

      Question 8. Figure 3B. Does this diagram come from the experimental 3D structure created by the authors (8SLO) or from molecular modeling? Please specify in the legend (line 1305).

      The diagram showing the binding mode of MIPS2673 bound to PfA-M1 comes from the experimentally determined 3D structure (PDB ID: 8SLO). This has now been stated in the figure legend. Note that the structural diagram refers to Fig. 1B (not Fig. 3B as indicated by the reviewer). The experimentally determined PfA-M1 structure with MIPS2673 bound (PDB ID: 8SLO) was also used to map LiP peptides and estimate the MIPS2673 binding site in Fig. 5, which is also now reflected in the appropriate section of the text (line 308) and Fig. 5 legend.

      Question 9. Line 745. Why is the µM concentration not indicated for the H-Leu-NHMec substrate, while it is indicated for the other substrates mentioned in the rest of the paragraph (H-Ala-NHMec, 20 μM, etc.)? Also, in this section (Enzyme assays) the pH at which the various enzymatic assays were done is missing.

      All enzyme assays were performed at pH 8.0. The concentration of H-Leu-NHMec varied depending on the enzyme assayed, as follows: 20 µM for PfA-M1, 40 µM for PvA-M1 and 100 µM for ERAP1 and ERAP2. This information is now clearly stated in the methods section (lines 782 and 787) and as a footnote for Supplemental Table S1.

      Question 10. Line 830, please define FBS.

      Fetal bovine serum (FBS) has been added where appropriate (line 867).

      Question 11. The authors mention in the title the targeting of several Plasmodium species, but the only experimental study on Plasmodium vivax concerns the use of the recombinant enzyme Pv-M1. The authors also mention "multi-stage targets", but ultimately only look at erythrocyte stages and three different gametocyte stages.

      We have now removed the words “cross-species” and “multi-stage” from the manuscript title and abstract so as not to overstate these findings. We have also added the word “potential” in the manuscript text to clarify that selective M1 inhibition could offer a potential multistage and cross species strategy for malaria.

      Question 12. Supplemental Table S1. I would suggest replacing "Percent inhibition by MIPS2673 of PfA-M1 and Pv-M1 aminopeptidases compared to selected human M1 homologues" with "Percent inhibition by MIPS2673 of PfA-M1 and Pv-M1 aminopeptidase activities compared to selected human M1 homologues".

      Done.

      Question 13. Supplemental Table S3. Here you indicate IC50 while in text and Figure 1 you quote EC50. Why this difference?

      This has now been changed to EC50 in Supplemental Table S3.

      Reviewer #2 (Recommendations For The Authors):

      Amendments that I would recommend in order to improve the presentation include all four parts of the study:

      (1) In vitro antiparasitic activity of MIPS2673.

      The authors showed that MIPS2673 inhibits parasite growth with an IC50 of 324 nM measured by a standard drug sensitivity assay, Fig 1C. This is all well and good, but it would be helpful to include at least one, if not more, other compounds such as antimalarial drugs and/or their earlier inhibitors (e.g. inhibitor 1) for comparison. This is typically done to show that the assay in this manuscript is fully compatible with previous studies. It will also give a better view of how the selective inhibition of PfM1 specifically kills the parasite.

      Alongside MIPS2673, we also analysed the potency of the known antimalarial artesunate, which was found to have an EC50 of 4 nM. This value agrees with the expected potency of artesunate and indicates our MIPS2673 value of 324 nM is indeed compatible with previous studies. We have now reported the artesunate EC50 value for reference (lines 197-198 and Fig. S1).

      Next, the authors proceeded to investigate the stage-specific effect of MIPS2673, but this time by doing a survival assay instead of proper IC50 estimations (Figure 1). I wonder why? Drug survival assays typically have very limited information content, and measuring proper IC50 values in stage-specific wash-off assays would be much more informative.

      We performed single concentration stage specificity assays to determine the parasite asexual stage at which MIPS2673 is most active. This involved washing off the compound after a 24 h exposure in rings or trophozoites and determining parasite viability in the next asexual lifecycle. While a full dose response curve would allow generation of an EC50 value against the respective parasite stages, this information is unlikely to change the interpretation that MIPS2673 is more active against trophozoites stages than against rings.

      Finally, in Figure 1E, the authors present the fact that MIPS2673 arrests parasite development. This is done by presenting a single (presumably representative) cell per time point. This is in my view highly insufficient. I recommend this figure be supplemented by parasite stage counts or another more comprehensive data representation. Also, the authors mention that while there is a growth arrest, hemoglobin is still being made. From the cell images, I cannot see anything that supports this statement.

      We thank the reviewer for this constructive comment and they are correct in their assessment that these are representative parasite images at the respective time points. To address the reviewer's concerns we have now provided cell counts from each treatment condition (Fig. 1E) at selected time points, which show parasite stalling at the ring to trophozoite transition under drug treatment. On reflection, we agree that it is difficult to determine the presence of haemozoin from our images and have removed this statement.

      (2) Protein thermal shift profiling. In the next step, the authors proceed to carry out cellular thermal shift profiling to show that PfM1 indeed interacts with MIPS2673, this time in the context of the total protein lysates from P. falciparum. This section of the study is in my view quite solid and indeed it is nice to see that the inhibitor causes a thermal shift of PfM1 which further supports what was already expected: interaction.

      I have no problem with this study in terms of the technical outcome but I would urge the authors to tone down the interpretation of these results in two ways.

      Four other proteins were found to be shifted by the inhibitor which also indicates interactions. Calling it simply "off-target" interactions might not represent the truth. The authors should explore and in some way comment that interactions with these proteins could contribute to the MIPS2673 MOA. I do not suggest conducting any more studies but simply acknowledge this situation. Identifying more than one target is indeed very common in CETSA studies and it would be helpful to acknowledge this here as well.

      We agree that identifying binding proteins in addition to the “expected” target is commonplace, and is indeed one of the benefits of this unbiased and proteome-wide approach. In the results and discussion, we have now amended our language to refer to these additional hits as MIPS2673-interacting proteins. In our original manuscript we dedicate a paragraph in the discussion to these additional interacting proteins and the likelihood of them being targets that contribute to antimalarial activity. Of these four additional interacting proteins, only the putative AP2 domain transcription factor (PF3D7_1239200) is predicted to be essential for blood stage growth and is therefore the only protein from this additional four that would likely contribute to antimalarial activity. These points are explicitly stated in the discussion (lines 530-550). Notably, all of the other interacting proteins identified in our thermal stability dataset were detected in our LiP-MS experiment but were not identified as interacting proteins by this method. The remaining three proteins were two non-essential P. falciparum proteins with unknown functions (PF3D7_1026000 and PF3D7_0604300) that are poorly described in the literature and a human protein (RAB39A). Further analysis of these other thermal stability proteomics hits in our LiP-MS dataset (see responses to Reviewer #3) identified none or only 1 significant LiP peptide from these proteins across our LiP-MS datasets, indicating they are likely to be false positive hits. Caveats around identifying protein targets by different deconvolution methods are also now addressed (lines 545-550).

      At some point, the author argues that causing shifts of only four/five proteins including PfM1 shows that MIPS2673 does not interact with other (off) targets. Here one must be careful to present the lack of shifts in the CETSA as proof of no interaction. There are many reasons why thermal shifts are not observed including the physical properties of the individual proteins, detection limit etc. Again I suggest adjusting these statements accordingly.

      We thank the reviewer for raising this important point and have now included additional discussion around this comment (lines 545-550).

      Finally, I am not convinced that Figure 2 presents anything more than the overall experimental scheme, with not much new information. Many such schemes were published previously in the original publications on thermal profiling. I would suggest omitting it from the main text and moving it into the supplementary methods.

      We agree that similar schemes have been published previously, especially for thermal proteome profiling, and acknowledge the reviewer’s suggestion of moving this figure to the supplemental material. However, we have kept Fig. 2 in the main text as this scheme also incorporates a LiP-MS workflow for malaria drug target deconvolution (the first to do so) and also to satisfy the additional details requested for this figure by Reviewer #1 (question 3).

      (3) Identification of MIPS2673 target proteins using LiP-MS. In the next step, the authors carried out the limited proteolysis analysis with the rationale that protein peptides that are near the inhibitor binding site will exhibit higher resilience to proteolysis. The authors did a very good job of showing this for the PfM1-MIPS2673 interaction. This part is very impressive from a technological perspective, and I congratulate the authors on such an achievement. I imagine these types of studies require very precise optimizations and performance.

      Here, however, I struggle with the meaning of this experiment for the overall flow of the manuscript. It seems that the binding pocket of MIPS2673 is less known since the inhibitor was designed for it. In fact, the authors mentioned that the crystal structure of PfM1 is available. From this perspective, the LiP-MS study represents more of a technical proof of concept for future drug target analysis but has limited contribution to the already quite well-established PfM1-MIPS2673 interaction. Perhaps this could be presented in this way in the text.

      We thank the reviewer for this comment and they are correct that we solved the crystal structure of PfA-M1 bound to MIPS2673. We wish to highlight that the primary reason for performing the LiP-MS study was as an independent and complementary target deconvolution method to narrow down the shortlist of targets identified with thermal stability proteomics, and validate with high confidence that PfA-M1 is indeed the primary target of MIPS2673 in parasites. The use of a complementary approach based on a different biophysical principle (proteolytic susceptibility vs thermal stability) would also allow us to identify MIPS2673 interacting proteins that may not be detectable by thermal stability proteomics, for example targets that do not alter their thermal stability upon ligand binding. The text in the results and discussion has been amended to clarify these points (lines 266-268 and 545-550).

      Furthermore, we agree that correctly predicting the MIPS2673 binding site on PfA-M1 using our LiP-MS peptide data is a technical proof of concept. Indeed, we wished to highlight the potential utility of LiP-MS for identifying both the protein targets of drugs and predicting their binding site, which is not possible with many other target deconvolution approaches. This point has been updated in the text (lines 303-304, 459-461).

      (4) Metabolomic profiling of MIPS2673 inhibition showed a massive accumulation of short peptides which clearly indicates that this inhibitor blocks some proteolytic activity of short peptides, presumably products of upstream proteolytic activities. Here the authors argue, that because many of these detected short (di-/tri-) peptides could be mapped on the hemoglobin protein sequence, this must be their origin. Although this might be the case the author could not exclude the fact that at least some of these come from other sources (e.g. Plasmodium proteins). It would be quite helpful to comment on such a possibility as well. In particular, it was mentioned that the main subcellular localization of PfM1 is in the cytoplasm while most if not all hemoglobin digestion occurs in the digestive vacuole...?

      Indeed, we agree that PfA-M1 is likely processing both Hb and non-Hb peptides and do not definitively conclude that all dysregulated peptides must be derived from haemoglobin. A subset of dysregulated peptides cannot be mapped to haemoglobin and must have an alternative source such as other host proteins or turnover of parasite proteins. We have amended the discussion to better reflect these possible alternate peptide sources (lines 480-482). Although the peptides detected in the metabolomics study (2-5 amino acids) are too short to be definitively assigned to any specific parasite or RBC protein, it is important to note that our analysis strongly indicates that the majority, but not all, of dysregulated peptides are more likely to originate from haemoglobin than other human or parasite proteins. This is based on sequence mapping, which was aided by acquiring MS/MS data for a subset of dysregulated peptides from which we derive accurate sequences (as opposed to residue composition inferred from total peptide mass) to more directly link dysregulated peptides to haemoglobin. We further quantified the sequence similarity of dysregulated peptides to all detectable proteins in the P. falciparum-infected erythrocyte proteome (~4700 proteins), showing that these peptides are statistically more similar to haemoglobin than other host or parasite proteins.
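
      The sequence-mapping idea, and the reason very short peptides cannot be assigned to a single protein, can be illustrated with a small sketch. It is not the authors' analysis: the function names are hypothetical and the two sequences are made-up toy strings rather than real haemoglobin or parasite sequences.

```python
def count_occurrences(peptide: str, sequence: str) -> int:
    """Count (possibly overlapping) occurrences of peptide in sequence."""
    count = 0
    start = sequence.find(peptide)
    while start != -1:
        count += 1
        start = sequence.find(peptide, start + 1)
    return count


def map_peptide(peptide: str, proteome: dict) -> dict:
    """Return {protein_name: number_of_hits} for proteins containing the peptide."""
    hits = {}
    for name, sequence in proteome.items():
        n = count_occurrences(peptide, sequence)
        if n > 0:
            hits[name] = n
    return hits


# Toy proteome with invented sequences, for illustration only.
toy_proteome = {
    "toy_globin": "VLSPADKTNVK",
    "toy_parasite_protein": "MKTNVLSAAK",
}

print(map_peptide("LS", toy_proteome))     # a dipeptide matches both toy proteins
print(map_peptide("SPADK", toy_proteome))  # a longer peptide matches only one
```

      The dipeptide is ambiguous while the five-residue peptide is not, which is why a statistical comparison across the whole proteome is more informative than any single match.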

      The apparent disconnect between PfA-M1 localisation (cytosol) and the predominant site of haemoglobin digestion (digestive vacuole, DV) is explained by the fact that peptides originating from digestion of haemoglobin in the DV are required to be transported into the cytoplasm for further cleavage by peptidases, including PfA-M1. This point has now been clarified in the discussion (lines 473-474).

      Reviewer #3 (Recommendations For The Authors):

      (1) Thermal stability studies confirmed that PfA-M1 was a binding target, however, there were other proteins consistently identified in the thermal stability studies. This raises the question as to their potential role as additional targets of this inhibitor. The authors dismiss these because they are not metalloproteases, but further analysis is warranted. This is particularly important as the authors were not able to generate mutants using in vitro evolution of resistance strategies. This often indicates that the inhibitor has more than one target.

      We thank the reviewer for this comment. The possibility of other targets contributing to MIPS2673 activity was also raised by Reviewer #2 (question 2) and is addressed above. Further to our response to Reviewer #2, we agree that the inability to generate resistant parasites in vitro could indicate that inhibition of multiple essential parasite proteins (including PfA-M1) contributes to MIPS2673 activity and do not rule out this possibility. It may also indicate that the target has a very high barrier to resistance and is unable to tolerate resistance-causing mutations as they are deleterious to function. Indeed, previous attempts to mutate PfA-M1 (references 12 and 50), and our own attempts to generate MIPS2673-resistant parasites in vitro (unpublished), were unsuccessful. It is important to note that of the hits reproducibly identified using thermal stability proteomics, only PfA-M1 and a putative AP2 domain transcription factor (PF3D7_1239200) are predicted to be essential for blood stage growth. We have explicitly stated that PF3D7_1239200 could also contribute to activity (lines 533 and 537).

      As we identified multiple hits with thermal stability proteomics we employed the complementary LiP-MS method to further investigate the target landscape of MIPS2673. PfA-M1 was the only protein reproducibly identified as the target through this approach. Importantly, the five proteins identified as hits by thermal stability proteomics were also detected in our LiP-MS datasets, but only PfA-M1 was identified as a target by both target deconvolution methods, strongly indicating it is the primary target of MIPS2673 in parasites. An important caveat is that we profiled the soluble proteome (we did not include detergents necessary for extracting membrane proteins as they may interfere with these stability assays) and other factors (e.g. the biophysical properties of the protein) will impact on whether ligand induced stabilisation events are detected. We have added additional text in the discussion around the above points (lines 545-550).

      While we do not definitively rule out other MIPS2673 interacting proteins existing in parasites (that possibly also contribute to activity), our metabolomics studies indicated no functional impact by MIPS2673 outside of elevated levels of short peptides. This is indicative of aminopeptidase inhibition and the profile of peptide accumulation was distinct from a known PfA-M17 inhibitor, and other antimalarials, further pointing to selective inhibition of the PfA-M1 enzyme by MIPS2673 being responsible for antimalarial activity.

      (2) The next set of experiments focused on a limited proteolysis approach. Again several proteins were identified as interacting with MIPS2673 including metalloproteases. The authors go on to analyze the LiP-MS data to identify the peptide from PfA-M1 which putatively interacts with MIPS2673. The authors are clearly focused on PfA-M1 as the target, but a further analysis of the other proteins identified by this method would be warranted and would provide evidence to either support or refute the authors' conclusions.

      As PfA-M1 was the only protein reproducibly identified as an interacting protein across both LiP-MS experiments (and by thermal stability proteomics) we focused our analysis on this protein. However, we agree that further analysis of the other putative interacting proteins would be valuable. Additional analysis was performed (see new Fig. S4) on the other interacting proteins identified by thermal stability proteomics and the other interacting proteins identified in LiP-MS experiment one, as no other proteins (apart from PfA-M1) were identified as hits in the second LiP-MS experiment (lines 314-318, 495-505, 740-762 and Fig. S4). Using the common peptides detected across both LiP-MS experiments we mapped significant LiP peptides to the structures of the other putative MIPS2673-interacting proteins, where a structure was available and significant LiP-MS peptides were detected, and measured the minimum distance to expected binding sites. It is noted that when using the same criteria for a significant LiP peptide that we used for our PfA-M1 analysis, only one significant LiP peptide is identified from these other putative interacting proteins (YSPSFMSFK from PfADA). Therefore, we used less stringent criteria for defining significant LiP peptides for these other proteins (see methods and Fig. S4 legend) in order to identify significant LiP peptides to map to structures. This analysis showed that, with the exception of PfA-M17, significant LiP-MS peptides for these other proteins are not significantly closer to binding sites than all other detected peptides, supporting our assertion that these other proteins are likely to be false positives or not functionally relevant MIPS2673-interacting proteins. Although significant peptides from PfA-M17 were closer to the binding site, our thermal stability and metabolomics data, combined with our previous work on the PfA-M17 enzyme, argue against this being a functionally relevant target (see lines 362-374 and 486-529 for a more detailed discussion). Another possible explanation for this result is that peptide substrates accumulating due to primary inhibition of PfA-M1 interact with PfA-M17, leading to structural changes around the enzyme active site that are detected by LiP-MS.
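
      The minimum-distance measurement mentioned in this response can be sketched as follows. This is an illustrative assumption about how such a distance might be computed from 3D coordinates, not a description of the authors' method; the function name and the coordinates are hypothetical.

```python
from math import dist


def min_distance(peptide_coords, site_coords):
    """Smallest Euclidean distance between any peptide atom and any site atom.

    Both arguments are sequences of (x, y, z) tuples, e.g. in angstroms.
    """
    return min(dist(a, b) for a in peptide_coords for b in site_coords)


# Invented coordinates, for illustration only.
peptide_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
binding_site_atoms = [(3.0, 4.0, 0.0)]
print(min_distance(peptide_atoms, binding_site_atoms))
```

      Comparing these distances for significant peptides against the distances for all other detected peptides of the same protein is one way to ask whether the significant peptides cluster near the expected binding site.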

      (3) The final set of experiments was an untargeted metabolomics analysis. They identified 97 peptides as significantly dysregulated after MIPS2673 treatment of infected cells and most of these peptides were derived from one of the hemoglobin chains. The accumulation of peptides was consistent with a block in hemoglobin digestion. This experiment does reveal a potential functional confirmation, but questions remain as to specificity.

      As indicated, the accumulation of short peptides identified by metabolomics suggests MIPS2673 perturbs aminopeptidase function. Many of these peptides (but not all) likely map to haemoglobin and are more haemoglobin-like than other proteins in the infected red blood cell proteome. An effect on a subset of non-haemoglobin peptides is also apparent and we have added this to our discussion (also refer to our response to question 4 from Reviewer #2). A direct comparison to our previous metabolomics analysis of a specific PfA-M17 inhibitor (MIPS2571, reference 11) revealed MIPS2673 induces a unique metabolomic profile. The extent of peptide accumulation differed and a subset of short basic peptides (containing Lys or Arg) were elevated only by MIPS2673, consistent with the broad substrate preference of PfA-M1. Importantly, the metabolomics profile induced by MIPS2673 is the opposite of many other antimalarials, which cause depletion of haemoglobin peptides. Taken together, the profile of short peptide accumulation induced by MIPS2673 is consistent with specific inhibition of PfA-M1.

      (4) Overall, this is an interesting series of experiments that have identified a putative inhibitor of PfA-M1 and PvA-M1. The work would be significantly strengthened by structure-aided analysis. It is unclear why putative binding sites cannot be analyzed via specific mutagenesis of the recombinant enzyme.

      Contrary to this comment, we solved the crystal structure of PfA-M1 bound to MIPS2673, determining its binding mechanism to the enzyme. This was further supported through proteomics-based structural analysis by LiP-MS. Undertaking site-specific mutagenesis would be interesting to further probe the binding dynamics of MIPS2673 to the M1 protein. However, we believe it is beyond the scope of this study and would not change our conclusion that MIPS2673 binds to PfA-M1, which we have shown using multiple unbiased proteomics-based methods, enzyme assays and X-ray crystallography.

      (5) In the thermal stability and LiP-MS analysis, other proteins were consistently identified in addition to PfA-M1 and yet no additional analysis was undertaken to explore these as potential targets.

      As addressed in our previous responses, across independent thermal stability proteomics experiments we consistently identified 5 interacting proteins, including the expected target PfA-M1. In contrast, only PfA-M1 was reproducible across independent LiP-MS experiments. While several plausible putative targets (including aminopeptidases and metalloproteins) were identified in one of our LiP-MS experiments, they appear to be false discoveries and not responsible for the antiparasitic activity of MIPS2673, as peptide-level stabilisation was not consistent across independent LiP-MS experiments, and an interaction is refuted by our thermal stability, metabolomics and recombinant enzyme inhibition data. We have now performed further analysis of these other putative interacting proteins, which also argues against them being likely interacting proteins (see also response to question 2). We have also added to our existing discussion on possible MIPS2673 targets and the likelihood of these proteins contributing to antimalarial activity (lines 486-550).

      (6) The metabolomics experiments were potentially interesting, but without significant additional work including different lengths of treatment and different stages of the parasite, the conclusions drawn are overstated. Many treatments disrupt hemoglobin digestion - either directly or indirectly and from the data presented here it is premature to conclude that treatment with MIPS2673 directly inhibits hemoglobin digestion.

      Our metabolomics studies were performed using typical experimental conditions for investigating the antimalarial mechanisms of compounds by metabolomics (see references 11, 39, 40 and 55-57). We used a short 1 h incubation at 3x EC50, allowing us to profile the primary parasite pathways affected by MIPS2673 and avoid a nonspecific death phenotype associated with longer incubations. As addressed in our response to Reviewer #1 (question 2), we focused on trophozoite-infected red blood cells as this is the stage most susceptible to MIPS2673 and when one presumes the greatest functional impact would be seen. It is possible that an expanded kinetic metabolomics analysis may reveal secondary mechanisms involved in MIPS2673 activity and we have now acknowledged this in the manuscript (lines 515-516). However, even though secondary mechanisms may become apparent at longer incubations, it also becomes difficult to uncouple drug-specific responses from nonspecific death effects. We believe any additional information provided by an expanded metabolomics analysis is unlikely to outweigh the significant extra financial cost associated with this type of experiment.

      It is correct that many antimalarial compounds appear to disrupt haemoglobin digestion when analysed by metabolomics. However, as indicated in our manuscript (lines 369-373) and previous responses, the profile of elevated haemoglobin peptides induced by MIPS2673 is substantially different to the profile caused by other antimalarials. For example, artemisinins and mefloquine cause haemoglobin peptide depletion (references 55-57) and chloroquine results in increased levels of a different subset of non-haemoglobin peptides (see Creek et al. 2016). While there is some overlap in profile with a selective M17 inhibitor (our previous work, reference 11), the level of enrichment of these peptides is different and MIPS2673 also induces accumulation of a distinct set of basic peptides consistent with the substrate preference of the PfA-M1 enzyme. As we show that MIPS2673 does not inhibit other parasite aminopeptidases, a likely explanation for the profile overlap is that the build-up of substrates that cannot be processed by PfA-M1 leads to secondary dysregulation of other aminopeptidases. Our analyses (sequence mapping, MS/MS analysis and sequence similarities to all infected red blood cell proteins) strongly indicate that the majority of elevated peptides (but not all) originate from haemoglobin. Combined with our proteomics and recombinant enzyme data indicating direct engagement of PfA-M1, and with previous literature indicating the enzyme functions to cleave amino acids from haemoglobin-derived peptides, our data indicate that MIPS2673 likely directly perturbs the haemoglobin digestion pathway through PfA-M1 inhibition.

      (7) Finally, the potency of this compound on parasites grown in vitro is 300 nM - this would need improvements in potency and demonstration of in vivo efficacy in the SCID mouse model to consider this a candidate for a drug.

      We do not propose MIPS2673 as an antimalarial candidate. The experiments presented here were centred on target validation rather than identification of an antimalarial lead, which may be the focus of future studies. To avoid this confusion, we have amended the manuscript title and language throughout to clarify this point.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This study advances our understanding of the allosteric regulation of anaerobic ribonucleotide reductases (RNRs) by nucleotides, providing valuable new structural insight into class III RNRs containing ATP cones. The cryo-EM structural characterization of the system is solid, but some open questions remain about the interpretation of activity/binding assays and the newly incorporated HDX-MS results. The work will be of interest to biochemists and structural biologists working on ribonucleotide reductases and other allosterically regulated enzymes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The goal of this study is to understand the allosteric mechanism of overall activity regulation in an anaerobic ribonucleotide reductase (RNR) that contains an ATP-cone domain. Through cryo-EM structural analysis of various nucleotide-bound states of the RNR, the mechanism of dATP inhibition is found to involve order-disorder transitions in the active site. These effects appear to prevent binding of substrate and a radical transfer needed to initiate the reaction.

      Strengths of the manuscript include the comprehensive nature of the work - including both numerous structures of different forms of the RNR and detailed characterization of enzyme activity to establish the parameters of dATP inhibition. The manuscript has been improved in a revision by performing additional experiments to help corroborate certain aspects of the study. But these new experiments do not address all of the open questions about the structural basis for mechanism. Additionally, some questions about the strength of biochemical data and fit of binding or kinetic curves to data that were raised by other referees still remain. Some experimental observations are not consistent with the proposed model. For example, why does dATP enhance Gly radical formation when the proposed mechanism of dATP inhibition involves disorder in the Gly radical domain?

      The work is impactful because it reports initial observations about a potentially new mode of allosteric inhibition in this enzyme class. It also sets the stage for future work to understand the molecular basis for this phenomenon in more detail.

      We express our gratitude to the reviewer for dedicating time to review our work and for the overall favorable assessment. We agree that the question of exactly how much the glycyl radical domain becomes more mobile without losing the glycyl radical entirely is an unresolved one but we also think that our work sets a solid basis for future experiments by us and others.

      Reviewer #3 (Public Review):

      The manuscript by Bimai et al describes a structural and functional characterization of an anaerobic ribonucleotide reductase (RNR) enzyme from the human microbe, P. copri. More specifically, the authors aimed to characterize the mechanism by which (d)ATP modulates nucleotide reduction in this anaerobic RNR, using a combination of enzyme kinetics, binding thermodynamics, and cryo-EM structural determination, complemented by hydrogen-deuterium exchange (HDX). One of the principal findings of this paper is the ordering of a NxN 'flap' in the presence of ATP that promotes RNR catalysis and the disordering (or increased protein dynamics) of both this flap and the glycyl radical domain (GRD) when the inhibitory effector, dATP, binds. The latter is correlated with a loss of substrate binding, which is the likely mechanism for dATP inhibition. It is important to note that the GRD is remote (>30 Ang) from the binding site of the dATP molecule, suggesting long-range communication of the structural (dis)ordering. The authors also present evidence for a shift in oligomerization in the presence of dATP. The work does provide evidence for new insights/views into the subtle differences of nucleotide modulation (allostery) of RNR, in a class III system, through long-range interactions.

      The strengths of the work are the impressive, in-depth structural analysis of the various regulated forms of PcRNR by (d)ATP using cryo-EM. The authors present seven different models in total, with striking differences in oligomerization and (dis)ordering of select structural features, including the GRD that is integral to catalysis. The authors present several, complementary biochemical experiments (ITC, MST, EPR, kinetics) aimed at resolving the binding and regulatory mechanism of the enzyme by various nucleotides. The authors present a good breadth of the literature in which the focus of allosteric regulation of RNRs has been on the aerobic orthologues.

      The addition of hydrogen-deuterium exchange mass spectrometry (HDX-MS) complements the results originating from cryo-EM data. Most notable is the observation of the enhanced exchange (albeit quite subtle) of the GRD domain in the presence of dATP that matches the loss of structural information in this region in the cryo-EM data. The most pronounced and compelling HDX results are seen in the form of dATP-induced protection of peptides immediately adjacent to the β-hairpin at the s-site, where dATP is expected to bind based on cryo-EM. It is clear that the presence of dATP increases the rigidity of this region.

      We are happy that both reviewers find the HDX-MS experiments to be a valuable addition to the existing data.

      Weaknesses:

      The discussion of the change in peptide mobility in the N-terminal region is complicated by the presence of bimodal mass spectral features and this may prevent detailed interpretation of the data, especially for select peptide regions that show opposite trends upon nucleotide association.

      Further, the HDX data in the NxN flap is unchanged upon nucleotide binding (ATP, dATP, or CTP), despite changes observed in the cryo-EM data.

      We are grateful to the reviewer for the comprehensive feedback on the HDX-MS part and for identifying areas for improvement. The HDX analysis was of course undertaken with the intention of identifying differences in disorder of the NxN flap and GRD region. From an HDX perspective both regions were found to be highly susceptible to HDX regardless of state/ligand, due to surface accessibility and/or very fast dynamics. However, this does not mean that there is no difference in the degree of order of these regions upon ligand addition, simply that, with HDX-MS in the limited time span of 30-3000 seconds, we could not conclusively support increased disorder. We have rephrased the discussion text to reflect this fact.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On page 5 (and throughout the manuscript) there are some inconsistencies in how dissociation constants for effectors and inhibitors are described - for example, D in KD is sometimes subscripted and sometimes not.

      Thank you for noticing these remaining errors. We hope that we have fixed all of them now.

      Reviewer #3 (Recommendations For The Authors):

      The authors addressed many of the initial concerns raised. The addition of the HDX-MS data in this revision is a welcomed contribution to the work and complements the cryo-EM data. In select cases, the data may be over-interpreted. This reviewer suggests that the authors revise the text in this section so that it is more consistent with the presented data.

      Specific points:

      (1) The bimodal mass spectral features in the N-terminal domain complicate the data interpretation. Specifically for peptides in 81-99 region, the fast exchanging feature shows protection in the presence of (d)ATP/CTP, but the opposite trend is observed for the slow exchanging species. It is therefore advisable to not make absolutes about the HDX results in this region, as the data are complicated.

      As stated by the reviewer, it is not possible from the presented HDX data to deduce if this is a result of 50% loaded dimer or the oligomerization state of the protein. We have remedied this by removing mentions of a difference between the dATP and ATP in bimodality. Also, we have addressed this in the text by stating that the main reason is most likely the different oligomerization states present in solution. Nevertheless, it is clear from the HDX data that the N-terminal region and 81-99 are very interesting, and it was somewhat disappointing that due to the dynamics of the oligomerization it was not possible to SEC-purify pure dimer or tetramer samples for HDX-MS, in order to deconvolute the cause.

      (2) Related to #1, the authors assign the bimodal HDX behavior to EX1 mechanism, but this is not necessarily (and unlikely) true based on the limited time points. The authors also state that it originates from the heterogeneity of the sample: "a mixture of states" which could reflect the mixture of oligomerization states. The authors should be careful assigning EX1 mechanism unless there are compelling results to support it.

      We apologize for the unfortunate phrasing. It was not our intention to imply that the bimodality is due to true EX1 kinetics. See the above answer. The mention of EX1 has been removed from the discussion text.

      (3) The deuterium uptake for peptide 118-126 is very small (~1Da) compared to the length of the peptide. The change in deuterium uptake (<0.25Da) from dATP is very small; the authors should proceed with caution when presenting interpretations of such small differences.

      We agree with the reviewer that extra caution should be taken when dealing with such a small difference. However, the 118-126 peptide has been tested for significance in both HDExaminer and Deuteros 2.0, and we also observed this for more than one run. The difference in uptake is small but increases to significance at the longer labelling times. The proximity to the NxN flap makes it interesting in the context of an allosteric conformational change, i.e., the dynamics of the NxN flap might be too fast, so we can only see some secondary effects. We would like to keep the data in Figure 10 for reasons of transparency. In essence this is similar to the observed bimodality mentioned above: we cannot fully explain the observation but present the data as it was observed.

      (4) On p. 22, the authors should consider revising the following statement: "confirming dATP binding to the s-site." Even though the HDX data are most compelling for the protection of peptides 178-204 and 330-348 that are adjacent to the beta-hairpin at the s-site, these data cannot "confirm" a binding site for a small molecule, such as dATP.

      We appreciate that the reviewer has pointed out that the statement can be misleading, and we agree that the binding site of small molecules can’t be confirmed based solely on HDX data. The sentence has been reformulated to clarify that the binding site was confirmed based on the combined evidence of HDX data and the previously presented biochemical and structural data on the s-site.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Valk and Engert et al. examined the potential relations between three different mental training modules, hippocampal structure and functional connectivity, and cortisol levels (stress) over a 9-month period. They found that among the three types of mental training: Presence (attention and introspective awareness), Affect (socio-emotional - compassion and prosocial motivation), and Perspective (socio-cognitive - metacognition and perspective taking) modules; Affect training most robustly related to changes in hippocampal structure and function - specifically, CA1-3 subfields of the hippocampus. Moreover, change in intrinsic functional connectivity related to changes in diurnal cortisol release and long-term cortisol exposure. These changes are proposed to result from a combination of factors, which is supported by multivariate analyses showing changes across subfields and training content relate to cortisol changes.

      The authors demonstrate that mindfulness training programs are a potential avenue for stress interventions that impact hippocampal structure and cortisol, providing a promising approach to improve health. The data contribute to the literature on plasticity of hippocampal subfields during adulthood, the impact of mental training interventions on the brain, and the link between CA1-3 and both short- and long-term stress changes.

      The authors thoughtfully approached the study of hippocampal subfields, utilizing a method designed for T1w images that outperformed Freesurfer 5.3 and that produced comparable results to an earlier version of ASHS. The authors note the limitations of their approaches and provide detailed information on the data used and analyses conducted. The results provide a strong basis from which future studies can expand using computational approaches or more fine-grained investigations of the impact of mindfulness training on cortisol levels and the hippocampus.

      We thank the Reviewer for the positive re-evaluation and summary of our findings and work. We made additional changes as suggested and hope this clarifies any open points.

      I have a few additional suggestions. Clarifying the language around the multivariate results and the impact across subfields and training modules would be helpful. 

      We are happy to provide further clarifications with respect to the multivariate results and the impact of training on subfields.

      The multivariate analyses served as a final step to explore any potential connections between training modules and hippocampal subfields, beyond just the link between CA1-3 and the Affect Module. These additional analyses were suggested by the Reviewers, and we, as authors, agreed that taking a broader view of how different parts of the hippocampus interact with overall changes can provide valuable insights into the relationship between mental training, cortisol fluctuations, and changes in CA1-3 subfields.

      We employed a multivariate partial least squares method, which aims to identify the directions in the predictor space that account for the most variance in changes observed, by creating latent variables. Initially, we investigated whether there was a general connection between CA1-3 subfields and cortisol changes, regardless of which training module produced these effects. Our findings confirmed a consistent relationship across all three training modules, indicating a strong association between cortisol changes, particularly markers such as AUC and slope change, and alterations in CA1-3 structure and functional connectivity. We explored a model incorporating changes across all hippocampal subfields and stress markers across different modules. In the right hemisphere, changes in the volume of the CA1-3 subfield were more strongly associated with stress markers, compared to other subfields. However, this association was less pronounced in the left hemisphere.

      Our multivariate approach captured fluctuations across subfields and modules beyond group-level associations, leading to a more nuanced interpretation. While the univariate analysis of module-specific changes in volume and associations within the Affect Module may offer a straightforward interpretation, as they coincide with increases in CA1-3 volume, the multivariate analysis also accounts for individual-level changes not observed at the group level using a data-driven approach. Overall these findings are in line with the group-level observations, yet provide nuance on specificity.
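
      As a rough illustration of what "creating latent variables" means here, the first pair of latent directions in a partial least squares analysis can be obtained as the dominant singular vectors of the cross-covariance between predictors and outcomes. The sketch below is a generic textbook construction and not the code used for the manuscript: the variable names, the toy numbers and the use of power iteration are all assumptions made for illustration, and the actual PLS variant and software are described in the manuscript methods.

```python
import math


def center(columns):
    """Subtract each column's mean (each column is a list of subject values)."""
    return [[v - sum(col) / len(col) for v in col] for col in columns]


def norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def first_latent_pair(x_cols, y_cols, iterations=100):
    """Weights (u, v) maximising the covariance between X*u and Y*v,
    found by power iteration on the cross-covariance matrix X^T Y.
    Assumes the cross-covariance is not all zeros."""
    x, y = center(x_cols), center(y_cols)
    # c[i][j] = sum over subjects of predictor i times outcome j
    c = [[sum(a * b for a, b in zip(xi, yj)) for yj in y] for xi in x]
    v = [1.0 / math.sqrt(len(y))] * len(y)
    for _ in range(iterations):
        u = [sum(c[i][j] * v[j] for j in range(len(y))) for i in range(len(x))]
        u = [a / norm(u) for a in u]
        v = [sum(c[i][j] * u[i] for i in range(len(x))) for j in range(len(y))]
        v = [b / norm(v) for b in v]
    return u, v


# Hypothetical toy data for six subjects (not study data):
# three "subfield change" predictors and two "cortisol marker" outcomes.
subfield_changes = [
    [1, 2, 3, 4, 5, 6],       # tracks the outcomes closely
    [1, -1, 1, -1, 1, -1],    # weakly related
    [0, 0, 1, 1, 0, 0],       # unrelated
]
cortisol_changes = [
    [2, 4, 6, 8, 10, 12],
    [-1, -2, -3, -4, -5, -6],
]

u, v = first_latent_pair(subfield_changes, cortisol_changes)
print("predictor weights:", u)
print("outcome weights:", v)
```

      In this toy example the first predictor receives by far the largest weight, which is the sense in which one subfield can dominate a latent variable while the other subfields still contribute smaller, non-zero weights.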

      We clarified these considerations further in the manuscript:

      Abstract:

      “Notably, using a multivariate approach, we found that other subfields that did not show group-level changes also contributed to changes in cortisol levels.”

      Results:

      “We employed a multivariate partial least squares method, which aims to identify the directions in the predictor space that account for the most variance in changes observed, by creating latent variables. Initially, we investigated whether there was a general connection between CA1-3 subfields and cortisol changes, regardless of which training module produced these effects.”

      Discussion:

      “Finally, through conducting multivariate analysis, we once more noticed associations between changes in CA1-3 volume and functional adaptability and alterations in stress levels, particularly prominent within the Affect Module. Integrating all subfields into a unified model highlighted a distinct significance of CA1-3, although for the left hemisphere, we observed a more diverse range of contributions across subfields. In summary, we establish a connection between a socio-emotional behavioral intervention, shifts in hippocampal subfield structure and function, and decreases in cortisol levels among healthy adults.

      Although the univariate examination of changes specific to modules in volume and connections within the Affect Module presents how changes in cortisol align with group-level rises in CA1-3 volume, the multivariate analysis extended this observation through considering individual-level alterations not discernible at the group level through a data-driven method. These results generally corresponded with observations at the group level but offer additional insights into specificity, and hint at system-level alterations.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript reports important in vitro biochemical and in planta experiments to study the receptor activation mechanism of plant membrane receptor kinase complexes with non-catalytic intracellular kinase domains. Several lines of evidence convincingly show that one such putative pseudokinase, the immune receptor EFR achieves an active conformation following phosphorylation by a co-receptor kinase, and then in turn activates the co-receptor kinase allosterically to enable it to phosphorylate down-stream signaling components. This manuscript will be of interest to scientists focusing on cell signalling and allosteric regulation.

      We wish to clarify that EFR itself is not a pseudokinase. We showed in previous work (Bender et al., 2021; https://doi.org/10.1073/pnas.2108242118) that EFR has catalytic activity in vitro. This catalytic activity is, however, not required for elf18-induced immune signaling in planta.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary

      The authors use an elegant but somewhat artificial heterodimerisation approach to activate the isolated cytoplasmic domains of different receptor kinases (RKs) including the receptor kinase BRI1 and EFR. The developmental RK BRI1 is known to be activated by the co-receptor BAK1. Active BRI1 is then able to phosphorylate downstream substrates. The immune receptor EFR is also an active protein kinase also activated by the co-receptor BAK1. EFR however appears to have little or no kinase activity but seems to use an allosteric mechanism to in turn enable BAK1 to phosphorylate the substrate kinase BIK1. EFR tyrosine phosphorylation by BAK1 appears to trigger a conformational change in EFR, activating the receptor. Likewise, kinase activating mutations can cause similar conformational transitions in EFR and also in BAK1 in vitro and in planta.

      We wish to clarify that we make no strong link between tyrosine phosphorylation and the conformational change leading to activation of the complex. Rather, the HDX-MS data demonstrate the structural importance of Tyr836 for the activation mechanism. At present, we do not know how phosphorylation of the residue would affect the activation process.

      Strengths:

      I particularly liked the HDX experiments coupled with mutational analysis (Fig. 2) and the design and testing of the kinase activating mutations (Fig. 3), as they provide novel mechanistic insights into the activation mechanisms of EFR and of BAK1. These findings are nicely extended by the large-scale identification of EFR-related RKs from different species with potentially similar activation mechanisms (Fig. 5).

      Weaknesses:

      In my opinion, there are currently two major issues with the present manuscript. (1) The authors have previously reported that the EFR kinase activity is dispensable for immune signaling (https://pubmed.ncbi.nlm.nih.gov/34531323/) but the wild-type EFR receptor still leads to a much better phosphorylation of the BIK1 substrate when compared to the kinase inactive D849N mutant protein (Fig. 1). (2) How the active-like conformation of EFR is in turn activating BAK1 is poorly characterized, but appears to be the main step in the activation of the receptor complex. Extending the HDX analyses to resting and Rap-activated receptor complexes could be a first step to address this question, but these HDX studies were not carried out due to technical limitations.

      Overall this is an interesting study that aims to advance our understanding of the activation mechanisms of different plant receptor kinases with important functions in plant immunity.

      Reviewer #2 (Public Review):

      Summary:

      Transmembrane signaling in plants is crucial for homeostasis. In this study, the authors set out to understand to what extent catalytic activity in the EFR tyrosine kinase is required in order to transmit a signal. This work was driven by mounting data that suggest many eukaryotic kinases do not rely on catalysis for signal transduction, relying instead on conformational switching to relay information. The crucial findings reported here involve the realisation that a kinase-inactive EFR can still activate (i.e., lead to downstream phosphorylation of) its partner protein BAK1. Using a convincing set of biochemical, mass spectrometric (HD-exchange) and in vivo assays, the team suggest a model in which EFR is likely phosphorylated in the canonical activation segment (where two Ser residues are present), which is sufficient to generate a conformation that can activate BAK1 through dimerisation. A model is put forward involving C-helix positioning in BAK1, and the model is extended to other 'non-RD' kinases in Arabidopsis that likely do not require kinase activity for signaling.

      We prefer not to describe EFR as a tyrosine kinase. It may be the case that EFR can function under certain conditions as a dual-specificity protein kinase, but this has never been demonstrated experimentally. We therefore describe EFR as a Ser/Thr protein kinase, since it is known that the isolated cytoplasmic domain can phosphorylate on Ser and Thr residues (Wang et al., 2014; https://doi.org/10.1016/j.jprot.2014.06.009).

      Strengths:

      The work uses logical and well-controlled approaches throughout, and is clear and convincing in most areas, linking data from IPs, kinase assays (including clear 32P-based biochemistry), HDX-MS data (from non-phosphorylated EFR), structural biology, oxidative burst data and infectivity assays. Repetitions and statistical analysis all appear appropriate.

      Overall, the work builds a convincing story and the discussion does a clear job of explaining the potential impact of these findings (and perhaps an explanation of why so many Arabidopsis kinases are 'pseudokinases', including XPS1 and XIIa6, where this is shown explicitly).

      Weaknesses:

      No major weaknesses are noted from reviewing the data and the paper follows a logical course built on solid foundations; the use of Tables to explain various experimental data pertinent to the reported studies is appreciated.

      (1) The use of a, b, c, d in Figures 2C and 3C etc is confusing to this referee, and is now addressed in the latest version.

      (2) The debate about kinase v pseudokinases is well over a decade old. For non-experts, the kinase alignments/issues raised are in PMID: 23863165 and might prove useful if cited.

      We have cited the suggested reference in the second paragraph of the discussion.

      (3) Early on in the paper, the concept of kinases and pseudokinases related to R-spine (and extended R-spine) stability and regulation really needs to be more adequately introduced to explain what comes next; e.g. some of the key work in this area for RAF and Tyr kinases where mutual F-helix Phe amino acid changes are evaluated (conceptually similar to this study of the E-helix Tyr to Phe changes in EFR) should be cited (PMID: 17095602, 24567368 and 26925779).

      As an alternative, we have amended the text in several places to focus on conformational toggling between active/inactive states rather than R-spine stability. We think that this keeps the message of our manuscript focused. We hope that the reviewer finds this acceptable.

      (4) In my version, some of the experimental text is also currently in the wrong order (and no page numbers, so hard for me to state exactly where in the manuscript); However, I am certain that Figure 2C is mentioned in the text when the data are actually shown in Figure 3C for the EFR-SSAA protein.

      Indeed, some references to Figure 2 in the text were incorrect. We have corrected these. References in the text to Figure 3 and the data reported therein are correct.

      (5) Tyr 156 in PKA is not shown in Supplement 1, 2A as suggested in the text; for readers, it will be important to show the alignment of the Tyr residue in other kinases; this has been updated in the second version. Although it is clearly challenging to generate phosphorylated EFR (seemingly through Codon-expansion here?), it appears unlikely that a phosphorylated EFR protein, even semi-pure, couldn't have been assayed to test the idea that the phosphorylation drives/supports downstream signaling. What about a DD or EE mutation, as commonly used (perhaps over-used) in MEK-type studies?

      Our aim with codon expansion was to generate recombinant protein carrying high-stoichiometry phosphorylation at sites which we have previously documented to be required for downstream signaling (Macho et al., 2014; Bender et al., 2021). We additionally demonstrated previously that a DD mutant of the activation loop sites in EFR does not fully complement the efr-1 mutant (Bender et al., 2021), suggesting that the Asp mutations are not good phospho-mimics in this context. We therefore did not generate DD or EE mutations for in vitro studies.

      Impact:

      The work is an important new step in the huge amount of follow-up work needed to examine how kinases and pseudokinases 'talk' to each other in (especially) the plant kingdom, where significant genetic expansions have occurred. The broader impact is that we might understand better how to manipulate signaling for the benefit of plants and mankind; as the authors suggest, their study is a natural progression both of their own work, and the kingdom-wide study of the Kannan group.

      Reviewer #3 (Public Review):

      The study presents strong evidence for allosteric activation of plant receptor kinases, which enhances our understanding of the non-catalytic mechanisms employed by this large family of receptors.

      Plant receptor kinases (RKs) play a critical role in transducing extracellular signals. The activation of RKs involves homo- or heterodimerization of the RKs, and it is believed that mutual phosphorylation of their intracellular kinase domains initiates downstream signaling. However, this model faces a challenge in cases where the kinase domain exhibits pseudokinase characteristics. In their recent study, Mühlenbeck et al. reveal the non-catalytic activation mechanisms of the EFR-BAK1 complex in plant receptor kinase signaling. Specifically, they aimed to determine that the EFR kinase domain activates BAK1 not through its kinase activity, but rather by utilizing a "conformational toggle" mechanism to enter an active-like state, enabling allosteric trans-activation of BAK1. The study sought to elucidate the structural elements and mutations of EFR that affect this conformational switch, as well as explore the implications for immune signaling in plants. To investigate the activation mechanisms of the EFR-BAK1 complex, the research team employed a combination of mutational analysis, structural studies, and hydrogen-deuterium exchange mass spectrometry (HDX-MS) analysis. For instance, through HDX-MS analysis, Mühlenbeck et al. discovered that the EFR (Y836F) mutation impairs the accessibility of the active-like conformation. On the other hand, they identified the EFR (F761H) mutation as a potent intragenic suppressor capable of stabilizing the active-like conformation, highlighting the pivotal role of allosteric regulation in BAK1 kinase activation. The data obtained from this methodology strengthens their major conclusion. Moreover, the researchers propose that the allosteric activation mechanism may extend beyond the EFR-BAK1 complex, as it may also be partially conserved in the Arabidopsis LRR-RK XIIa kinases. This suggests a broader role for non-catalytic mechanisms in plant RK signaling.

      The allosteric activation mechanism was demonstrated for receptor tyrosine kinases (RTKs) many years ago. A similar mechanism has been suggested for the activation of plant RKs, but experimental evidence for this conclusion is lacking. Data in this study represent a significant advancement in our understanding of non-catalytic mechanisms in plant RK signaling. By shedding light on the allosteric regulation of BAK1, the study provides a new paradigm for future research in this area.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have considered points 1-5 raised in my initial review and the revised manuscript contains a more balanced discussion and limitation section. No additional experiments have been performed to substantiate the envisioned allosteric activation mechanism of the co-receptor kinase BAK1 by the receptor EFR. I rewrote the public statement accordingly.

      Reviewer #2 (Recommendations For The Authors):

      Thanks for responding to my comments.

      Reviewer #3 (Recommendations For The Authors):

      The revised manuscript has fully addressed my previous concerns and is now suitable for publication in eLife.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Key Considerations:

      There seem to be two inconsistencies related to some results depicted in Figures 1, 2, 3 and 5.

      Firstly, Figure 1 shows the effect of _C_Las infection (_C_Las+) compared to the control (_C_Las-), where results show an increase of TAG, glycogen, lipid droplet size, oviposition period, and fecundity. In Figures 2, 3, and 5, the authors establish the involvement of the genes _DcAKH_, _DcAKHR_, and miR34 in this process, by showing that preventing the function of these three factors abolishes the effects of _C_Las+. However, while Figure 1 shows the increase of TAG and lipid droplet size in _C_Las+, Figures 2, 3, and 5 do not show a significant elevation in TAG when comparing _C_Las- and _C_Las+.

      Secondly, in addition to the absence of statistical difference in TAG and lipid droplet size observed in Figure 1, Figures 2, 3, and 5 show an increase in TAG and lipid droplet size after ds_DcAKH_ (Figure 2), ds_DcAKHR_ (Figure 3) and agomiR34 (Figure 5) treatments. Considering that AKH, AKHR, and miR34 are important factors for the _C_Las-induced increase in TAG and lipid droplet size, one might expect a reduction in TAG and lipid droplet size when _C_Las+ insects are silenced for these factors, contrary to the observed results.

      Thanks for your excellent suggestion. Lipid droplets are cellular organelles responsible for storing lipids within cells, playing a crucial role in fat metabolism and energy homeostasis. The formation and breakdown of lipid droplets involve a complex interplay of genes and enzymes, including DGAT (for synthesis) and ATGL and HSL (for breakdown). In _C_Las-negative _D. citri_, there is a delicate balance between the formation and breakdown of lipid droplets. The enlargement of lipid droplets following _C_Las infection may result from a synthesis rate that is significantly higher than the breakdown rate, as more energy is required during early ovarian development. The hormone AKH, a key player in fat metabolism, primarily stimulates fat breakdown. Therefore, when _DcAKH_ and _DcAKHR_ are silenced without affecting fat synthesis, there is no enhancement of fat breakdown; instead, there is an accumulation of lipid droplets, resulting in their enlargement. This suggests that _C_Las infection affects both the breakdown and synthesis of lipid droplets, while AKH and AKHR primarily impact the breakdown, leading to similar outcomes. However, the underlying physiological mechanisms warrant further in-depth exploration.

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 25: change "In addition" to "Additionally".

      Thanks for your wonderful suggestion. We have changed “In addition” to “Additionally” in our revised manuscript (Line 26).

      (2) Lines 60-72: Have there been any previous reports on the interaction between host AKH hormones and microorganisms in insects or animals? If yes, please add more background.

      Thanks for your wonderful suggestion. We have added the interactions between host AKH hormones and microorganisms in insects (Line 74-81).

      (3) Lines 82-95: add the following reference about the miR-275 of Diaphorina citri in the background. Nian, X., Luo, Y., He, X., Wu, S., Li, J., Wang, D., Holford, P., Beattie, G. A. C., Cen, Y., Zhang, S., & He, Y. (2024). Infection with 'Candidatus Liberibacter asiaticus' improves the fecundity of Diaphorina citri aiding its proliferation: A win-win strategy. Molecular Ecology, 33, e17214.

      Thanks for your wonderful suggestion. We have added the sentence “in D. citri-C_Las interaction, _C_Las hijacks the JH signaling pathway and host miR-275 that targets the _vitellogenin receptor (DcVgR) to improve D. citri fecundity, while simultaneously increasing the replication of C_Las itself, suggesting a mutualistic interaction in _D. citri ovaries with _C_Las” in our revised manuscript (Line 97-100).

      (4) In the figures of Nile red staining, the digit of the scale bar should be added.

      Thanks for your wonderful suggestion. We have added the digit of the scale bar for Nile red staining in the Figure 1C, 2E, 3E, 5C.

      (5) In Figures 2G-H, 3G-H, 5E-F, the presentation of data should be consistent with Figure 1D-E.

      Thanks for your wonderful suggestion. We have changed figure 1D-E in our revised manuscript.

      (6) In the discussion part, more information should be added about miR-275 and DcVgR from the above reference.

      Thanks for your wonderful suggestion. We have added the information “In D. citri-C_Las interaction, _C_Las operates host hormone signaling and miRNA to mediate the mutualistic interaction between _D. citri fecundity and its replication” in Line 350-353.

      (7) For the primer specific, please add the melting curves for qPCR primers of DcAKH, DcAKHR, Dcβ-ACT, U6, and miR-34 in the supplementary material.

      Thanks for your wonderful suggestion. We have added the melting curves for qPCR primers of DcAKH, DcAKHR, Dcβ-ACT, U6 and miR-34 in the supplementary material of Figure S6.

      (8) Line 476: Dcβ-ACT was indicated as a gene and should be Italic.

      Thanks for your wonderful suggestion. We have changed “DcβACT” to “Dcβ-ACT” in our revised manuscript (Line 491).

      (9) Reference style should be consistent and correct. Like [5], [10], [37], [47].

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) In order to better engage readers, I suggest emphasizing the "enhanced fecundity" in the title. A suggestion for the revised title is: Adipokinetic hormone signaling mediates the enhanced fecundity of Diaphorina citri infected by 'Candidatus Liberibacter asiaticus'.

      Thanks for your wonderful suggestion. We have changed the title to “Adipokinetic hormone signaling mediates the enhanced fecundity of Diaphorina citri infected by 'Candidatus Liberibacter asiaticus'” in our revised manuscript.

      (2) For the abstract, in lines 14-15, please change the first sentence to "Diaphorina citri serves as the primary vector for 'Candidatus Liberibacter asiaticus' (C_Las), the bacterium associated with the severe Asian form of huanglongbing." In line 18, delete "present". In line 19, change "increased" to "increasing". In line 21, change "triacylglycerol accumulation" to "the accumulation of triacylglycerol". In line 33, change "in _D. citri ovaries with C_Las" to "between _C_Las and _D. citri ovaries".

      Thanks for your wonderful suggestion. We have revised them following your suggestion in our revised manuscript, including changed “Diaphorina citri is the primary vector of the bacterium, ‘Candidatus Liberibacter asiaticus’ (C_Las) associated with the severe Asian form of huanglongbing” to “_Diaphorina citri serves as the primary vector for 'Candidatus Liberibacter asiaticus' (C_Las), the bacterium associated with the severe Asian form of huanglongbing” in Line 15-16; deleted "present" in Line 19; changed "increased" to "increasing" in Line 20; changed "triacylglycerol accumulation" to "the accumulation of triacylglycerol" in Line 22; changed "in _D. citri ovaries with C_Las" to "between _C_Las and _D. citri ovaries" in Line 34.

      (3) In lines 57-59, change "How D. citri maintains a balance between lipid metabolism and increased fecundity after infection with C_Las is not known." to "However, the mechanism of how _D. citri maintains a balance between lipid metabolism and increased fecundity after infection with _C_Las remains unknown.".

      Thanks for your wonderful suggestion. We have changed " How D. citri maintains a balance between lipid metabolism and increased fecundity after infection with C_Las is not known" to "However, the mechanism of how _D. citri maintains a balance between lipid metabolism and increased fecundity after infection with _C_Las remains unknown" in our revised manuscript (Line 58-60).

      (4) In Figure 1, "n.s" should be changed to "n.s.", "n.s." should be added in 13 DAE of Figure 1A, and the specific numerical value of the scale bar should be indicated on Figures 1C, 2E, 3E, and 5C.

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript.

      (5) In all the figure legends, the "**P < 0.01,***P < 0.001" should be changed to "**p < 0.01,***p < 0.001".

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript.

      (6) In Figures 1D-E, the preoviposition period and oviposition period were presented using a box diagram, but in other figures (including Figure 2G-H, Figure 3G-H, Figure 5E-F) these were shown using a column chart. Please keep the method of presentation consistent.

      Thanks for your wonderful suggestion. We have revised the figure 1D-E in our revised manuscript.

      (7) For discussion, in line 333, change "Increasing numbers" to "An increasing number". In line 334, change "vertically transmitted" to "transmitted vertically".

      Thanks for your wonderful suggestion. We have changed "Increasing numbers" to "An increasing number" in Line 345; changed "vertically transmitted" to "transmitted vertically" in Line 346 in our revised manuscript.

      (8) In lines 338-342, change "There are few studies on the mechanisms underlying vector-bacteria interactions. However, Singh and Linksvayer (2020) [38] found that _Wolbachia_-infected colonies of _Monomorium pharaonis_ had increased colony-level growth, accelerated colony reproduction, and shortened colony life cycles compared to those that were uninfected." to "Although there is limited research on the mechanisms underlying vector-bacteria interactions, Singh and Linksvayer (2020) [38] found that _Wolbachia_-infected colonies of _Monomorium pharaonis_ exhibited increased colony-level growth, accelerated colony reproduction, and shortened colony life cycles compared to uninfected colonies.".

      Thanks for your wonderful suggestion. We have revised it in our revised manuscript (Line 350-355).

      (9) In line 370, delete "present". In lines 386-387, change "More and more miRNAs have been reported to be involved in the metabolic processes of insects including reproduction." to "There is increasing evidence implicating miRNAs in the metabolic processes of insects, particularly in relation to reproduction.".

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript, including deleted "present" in Line 383 and changed "More and more miRNAs have been reported to be involved in the metabolic processes of insects including reproduction" to "There is increasing evidence implicating miRNAs in the metabolic processes of insects, particularly in relation to reproduction" in Line 399-400.

      (10) In line 423, change "After infection with C_Las, _D. citri are more fecund than their uninfected counterparts." to "Upon infection with C_Las, _D. citri exhibits enhanced fecundity compared to uninfected individuals.". In lines 424-425 and 439-440, change "the more offspring of D. citri, the more C_Las in the field" to "the increased offspring of _D. citri contributes to a higher presence of _C_Las in the field.". In Line 429, change " information" to "insights".

      Thanks for your wonderful suggestion. We have revised them in our revised manuscript, including changed "After infection with C_Las, _D. citri are more fecund than their uninfected counterparts" to "Upon infection with C_Las, _D. citri exhibits enhanced fecundity compared to uninfected individuals" in Line 436-437; changed "the more offspring of D. citri, the more C_Las in the field" to "the increased offspring of _D. citri contributes to a higher presence of _C_Las in the field" in Line 438-439; changed "information" to "insights" in Line 443.

      (11) In lines 446-447, change "The _C_Las-infected lemon plants and psyllids were monitored to detect _C_Las infection monthly using the quantitative polymerase chain reaction (qPCR)" to "Monthly monitoring of the _C_Las infection in both the lemon plants and psyllids was conducted using quantitative polymerase chain reaction (qPCR)".

      Thanks for your wonderful suggestion. We have revised it in our revised manuscript (Line 460-461).

      (12) In lines 452-458, how did the authors identify homologous sequences of AKH and AKHR for phylogenetic tree analysis and alignment of the amino acid sequences? From NCBI or other databases? The methodological details should be added.

      Thanks for your wonderful suggestion. We have added the methodological details in our revised manuscript (Line 469-470).

      (13) In line 476, Dcβ-ACT should be italic.

      Thanks for your wonderful suggestion. We have changed “DcβACT” to italic in our revised manuscript (Line 491).

      (14) In line 538, the manufacturer should be provided for Nile Red.

      Thanks for your wonderful suggestion. We have provided the manufacturer of Nile Red in our revised manuscript (Line 553).

      (15) Does miR-34 have any other target genes? If yes, whether they have any function in the fecundity improvement of D. citri after infected by CLas.

      Thanks for your insightful suggestion. In addition to _DcAKHR_, we predicted that three other genes have binding sites for miR-34 in their 3’UTRs: Innexin, T-box transcription factor TBX1, and fatty acid synthase. However, the mRNA expression levels of all three genes did not differ between _C_Las-negative and _C_Las-positive females. Therefore, we believe that these genes are not implicated in the fecundity improvement.

      (16) The reference format should be unified. Please revise references 10, 28, 43, 47, and 53.

      Thanks for your wonderful suggestion. We have revised them in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their feedback on our manuscript. Taking the advice of the reviewers, we have streamlined the text and formatted the figures to conform to the format instructions. We believe that the revised manuscript has been improved. 

      Point-by-point responses are presented below.

      Reviewer #1:

      (1) There is an over-interpretation regarding the results in Figure 6A. There is no difference between isoHD1 iMac control and HD1 Mut iMac.

      We thank the reviewer for his/her feedback on our manuscript. We have since changed the wording on Page 11, line 294 of the manuscript, to reflect this important point.

      Reviewer #2:

      (2) The authors have not elucidated the significance of the increased CSF1 dosage in Figure 2F, aside from its effect on cell viability, lacking a thorough discussion of this result.

      We have incorporated the significance of the results of our CSF1 dosage data with a newly added observation of an upregulated immature myeloid marker and downregulated expression of a mature macrophage marker within mutant iMac from the respective RNA-seq data (Page 5, line 163), and elaborated further within the Discussion section that this results in the possible generation of immature iMacs even after maturation (Page 14, line 356).

      (3) Additionally, while transcriptomic and metabolic alterations related to the mutation were demonstrated in iMac models, similar investigations in iMicros are absent, necessitating further experiments to validate the findings across cell models.

      We thank the reviewer for this feedback. We feel that this is beyond the scope of this study at its current stage, but we will keep it in consideration for subsequent experiments.

      (4) The conclusion drawn regarding cytokine levels lacks robust support from the data, particularly considering the varied responses observed in different mutant lines. Further analysis of the secretome (e.g. via ELISA) could provide additional insights.

      We thank the reviewer for this feedback. We feel that this is beyond the scope of this study at its current stage, but we will keep it in consideration for subsequent experiments.

      (5) Moreover, the characterization of iMicros is incomplete, with limited protein-level analysis (e.g. validate RNA-seq via flow cytometry).

      We thank the reviewer for this feedback. We feel that this is beyond the scope of this study at its current stage, but we will keep it in consideration for subsequent experiments.

      (6) Additionally, the claim of microglial-like morphology lacks adequate evidence, as the provided image is insufficient for such an assessment.

      We have added confocal images depicting microglial-like morphology in our co-culture system within Supp Fig 3C.

      (7) RNA-seq experiments should be represented better; it is not possible to read the legends or gene names in the figures. Maybe the data sets can be combined into one PCA and one overall analysis, e.g. via WGCNA-like analyses? This would make it easier for the reader to compare the two cell lines side by side.

      We have since enhanced the quality of the respective RNA-seq figures with enlarged data points and gene names for better clarity.

      (8) Statistical test information is missing.

      We are sorry for leaving this out and have added the statistical test information within Page 15 of the methods section.

      (9) Finally, inconsistent terminology usage throughout the paper may confuse readers (iMac versus iMicros).

      We have streamlined the terminology used within Page 10, line 265 and 267, of the manuscript for better consistency.

      (10) Fig. 1D: which cell line is displayed here?

      Mut HD1 iPSC is displayed here. We have also revised the figure legend of Fig 1D within Page 1, line 8 to include this information.

      (11) Fig. 1E: Karyotype of which cell line is shown?

      We have included karyotype of both IsoHD1 and IsoHD2 iPSC in Fig 1E, and also revised the legend within Page 1, line 11, to reflect this change.

      (12) Supp. Fig. 1: scale bar information missing.

      We thank the reviewer for pointing this out and have revised the legend within Page 1, line 17, to include scale bar information.

      (13) Fig. 5: legend for A is missing.

      We thank the reviewer for pointing this out and have revised the legend within Page 2, line 91, to include Figure (A) within.

      (14) Supp. Fig. 3A says 30 days, but only 23 days are shown.

      We are sorry for making this inadvertent typo and have since aligned the correct days (31 days) shown within the figure (Supp Fig 3A) and legend (Page 3, line 110, 113), as mentioned in the manuscript.

      (15) Supp. Fig. 3C: scale bar length is incorrect.

      We did a recheck and are confident that the scale bar is of the correct length. The images displaying the respective fluorescent channels are proportionately reduced with respect to the main figure (now Supp. Fig. 3D), and thus are of the same size (200 µm).

      (16) Fig. 6: legend for D, E is missing.

      We have revised the figure legend within Page 3, line 128, 130 and 131, to address said missing legends.

      (17) Stem cells do also express Sox2. how does Sox2 expression lead to the conclusion of an optimal generated organoid?

      We thank the reviewer for pointing this out. Sox2 has been defined as a core intrinsic factor for regulating pluripotency (Avilion et al, 2003, Zhang et al, 2014), as well as a lineage specifier that regulates ectodermal differentiation, which is crucial in controlling neural initiation and differentiation from iPSC (Zhao et al, 2004, Thomson et al, 2011, Wang et al, 2014). Additionally, Sox2 is highly expressed in proliferating neural progenitor cells, as documented in previous iterations of the cerebral organoid generation protocol (Lancaster et al, 2013, Qian et al, 2018). Perhaps “optimally” sounds too forced in this context; as such, we have toned down the phrasing.

      (18) HD1 and HD2 react differently (e.g. in IL-1B production), but the text is written often as if both cell lines react in the same way.

      We thank the reviewer for pointing this out and have since clarified this within Page 4, line 366-368, of the manuscript.

      (19) Precise information on medium missing (e.g. no Pen/Strep?).

      We thank the reviewer for pointing this out. Culturing of iPSC colonies was done without the use of Pen/Strep. Additionally, we have elaborated on the medium composition for our iMac cultures for clarity within Page 4, line 106, of Materials and Methods, as well as in Supp. Table 4.

      (20) How was ReleSR used exactly?

      We have included the usage of ReleSR within Page 2, line 41 of Materials and Methods.

      (21) What kind of microscopes/objectives were used for imaging?

      We have added the respective microscope details for bright-field, phase-contrast and cytospin related experiments within Page 3, line 73, and Page 14, line 360, of Materials and Methods.

      (22) For the dissociation of organoids: what kind of pipette was used and at which temperature were organoids incubated?

      We have included the pipette used for organoids dissociation, as well as the incubation temperature for organoids culture within Page 9, line 243, 244 and 245, of Materials and Methods.

      (23) How was the RNA-seq analysis done? Which packages? Which versions?

      We now provide the requested information in the Materials and Methods section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      Using concurrent in vivo whole-cell patch clamp and dendritic calcium imaging, the authors characterized how functional synaptic inputs across dendritic arborizations of mouse primary visual cortex layer 2/3 neurons emerge during the second postnatal week. They were able to identify spatially and functionally separated domains of clustered synapses in these neurons even before eye-opening and characterize how the clustering changes from P8 to P13. 

      Strengths: 

      The work is technically challenging and the findings are novel. The results support previous EM and immunostaining studies but provide in vivo evidence on the time course and the trajectory of how functional synaptic input develops. 

      Weaknesses: 

      There are some missing details about how the experiments were performed, and I also have some questions about the analyses. 

      We have now added a more detailed description of the methods and added new supplemental figures and descriptions to clarify our analyses. Please find our responses to the specific points of this reviewer in the section “Recommendations for the authors” below.

      Reviewer #2 (Public Review):

      In this study, Leighton et al performed remarkable experiments by combining in-vivo patch-clamp recording with two-photon dendritic Ca2+ imaging. The voltage-clamp mode is a major improvement over the pioneering versions of this combinatorial experiment, which have led to major breakthroughs in the neuroscience field for visualizing and understanding synaptic input activities in single cells in-vivo (sharp electrodes: Svoboda et al, Nature 1997, Helmchen et al, Nature Neurosci 1999; whole-cell current-clamp: Jia et al, Nature 2010, Chen et al, Nature 2011; I suggest that these papers be cited). This is because voltage-clamp mode, although full control of membrane voltage in-vivo is not realistic, is nevertheless most effective in preventing back-propagating action potentials, which would severely confound the measurement of individual synaptically-induced Ca2+ influx events. Furthermore, clamping the cell body at a strongly depolarized potential (here the authors used -30mV) also facilitates the detection of synaptically-induced Ca2+ influx. As a result, the authors successfully recorded high-quality Ca2+ imaging data that can be used for precise analysis. To date, even in view of the rapid progress of voltage-sensitive indicators and relevant imaging technologies in recent years, this very old 'art' of combining single-cell electrophysiology and two-photon imaging (ordinary, raster-scanned, video-rate imaging) of Ca2+ signals still enables measurements with the best level of precision.

      We thank the reviewer for reminding us of these important previous studies that we cite now in the revised manuscript. 

      On the other hand, the interpretation of data in this study is a bit narrow-minded and lacks a comprehensive picture. Some suggestions to improve the manuscript are as follows: 

      (1) The authors made a segregation of 'spine synapse' and 'shaft synapse' based solely on the two photon images in-vivo. However, caution shall be taken here, because the optical resolution under in vivo imaging conditions like this cannot reliably tell apart whether a bright spot within or partially overlapping a segment of the dendrite is a spine on top of (or below) it. Therefore, what the authors consider as a 'shaft synapse' (by detecting Ca2+ hotspots) has an unknown probability of being just a spine on top or below the dendrite. If there is other imaging data of higher axial resolution to validate or calibrate, the authors shall take some further considerations or analysis to check the consistency of their data, as the authors do need such a segregation between spine and shaft synapses to show how they evolve over the brain development stages. 

      We agree with the reviewer that the differentiation between spine and shaft synapses can be difficult for those spines that are located above or below the dendritic shaft in the z-dimension because of the lower resolution of 2-photon microscopy in the z-dimension compared to the image plane. We have now added a new paragraph to the Methods section to describe in more detail how we identify spine and shaft synapses and provide more examples in a new supplementary figure (Fig S5). We believe that we can identify spine and shaft synapses reliably in most cases, but added a cautionary note to make the reader aware of potential misidentifications.

      (2) The use of terminology 'bursts of spontaneous inputs' for describing voltage-clamp data seems improper. Conventionally, 'burst' refers to suprathreshold spike firing events, but here, the authors use 'burst' to refer to inward synaptic currents collected at the cell body. Not every excitatory synaptic input (or ensemble of inputs) activation will lead to spike firing under naturalistic conditions, therefore, these two concepts are not equivalent. It is recommended to use 'barrage of inputs' instead of 'burst of inputs'. Imagine a full picture of the entire dendritic tree, the fact that the authors could always capture spontaneous Ca2+ events here and there within a few pieces of dendrites within an arbitrary field-of-view suggests that, the whole dendritic tree must have many more such events going on as a barrage while the author's patch electrode picks up the summed current flow from the whole dendritic tree. 

      We agree with the reviewer that “barrage” is a clearer term for multiple synaptic inputs occurring simultaneously and therefore we changed the terminology throughout the manuscript.

      (3) Following the above issue, an analysis of the temporal correlation between synaptic (not segregating 'spine' or 'shaft') Ca2+ events and EPSCs is absent. Again, the authors drew arbitrary time windows to clump the events for statistical analysis. However, the demonstrated example data already shows that the onset times of individual synaptic Ca2+ events do not necessarily align with the beginning of a 'barrage' inward current event. 

      The reviewer writes that “an analysis of the temporal correlation between synaptic calcium events and EPSCs is absent”. We would like to point out that we did determine the percentage of calcium transients that occurred during barrages of synaptic inputs (~60%, page 7). This is important, since the barrages in our patch-clamp recordings most likely reflect spontaneous network events as described previously in the developing cortex by us and many other labs. The time window we chose was not “arbitrary” as the reviewer suggests, but based on the duration of the barrages of synaptic inputs as defined in the Methods section.

      The reason why we did not perform a more in-depth analysis of the temporal relationship between synaptic calcium transients and synaptic input currents is that it is essentially impossible to relate calcium transients at individual synapses to specific synaptic input events. First, during barrages of synaptic inputs many synapses are active simultaneously, both in the mapped dendrites and in the unobserved parts of the dendritic arborization, as the reviewer notes above. Thus, barrages cannot be broken down into individual synaptic transmission events. Second, since our acquisition frequency is ~10 Hz, we can identify the onset of individual synaptic calcium transients with 100-200 ms precision (1 or 2 frames). However, throughout any 100-200 ms period of recording, several synapses are active across the entire dendritic arborization such that we cannot assign a given calcium transient to a specific EPSC within a 100-200 ms epoch. Third, due to the limited clamping capacity of in vivo patch recordings, we cannot be certain that individual transmission events in distal dendrites can be resolved in the patch recording.
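The timing argument in the second point is simple arithmetic on the frame period. The following minimal sketch reproduces it; the function name and the default of 1 to 2 frames of onset uncertainty are illustrative choices taken from the numbers quoted in the response, not code from the study.

```python
def onset_precision_ms(frame_rate_hz, frames_uncertain=(1, 2)):
    """Convert an onset uncertainty given in frames into milliseconds.

    frame_rate_hz: acquisition rate in frames (or stacks) per second.
    frames_uncertain: hypothetical number of frames over which an onset
        can be localized (1 or 2 frames, as quoted in the response).
    """
    frame_period_ms = 1000.0 / frame_rate_hz
    return tuple(n * frame_period_ms for n in frames_uncertain)


# At the ~10 Hz rate quoted in the response, one frame lasts 100 ms.
low_ms, high_ms = onset_precision_ms(10.0)
print(f"onset precision: {low_ms:.0f}-{high_ms:.0f} ms")
```

Any EPSC that falls within the same window cannot be separated from the calcium transient by timing alone, which is the limitation the response describes.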

      (4) The authors claim that "these observations indicate that the activity patterns investigated here are not or only slightly affected by low-level anesthesia". It would be nice to show some of the recordings in this work without any anesthesia to support this claim. 

      Indeed, the conclusion that the patterns of activity are only slightly affected by low levels of anesthesia is based on our previous recordings on the network level. Unfortunately, we are still not able to perform calcium imaging with single synapse resolution in unanesthetized developing mice (and no one else is, as far as we know), because the skull of these young animals is not yet firm. As a consequence, movements cannot be reduced sufficiently for patching and imaging with single synapse resolution. Our previously published (Siegel et al., 2012) and unpublished work on the cellular level suggests that activity patterns during light anesthesia are very similar to those during sleep in mouse pups at this age.

      Reviewer #3 (Public Review):

      Summary: 

      There is a growing body of literature on the clustering of co-active synapses in adult mice, which has important implications for understanding dendritic integration and sensory processing more broadly. However, it has been unclear when this spatial organization of co-active synapses arises during development. In this manuscript, Leighton et al. investigate the emergence of spatially organized, coactive synapses on pyramidal dendrites in the mouse visual cortex before eye-opening. They find that some dendrite segments contain highly active synapses that are co-active with their neighbors as early as postnatal day (P) 8-10, and that these domains of co-active synapses increase their coverage of the dendritic arbor by P12-13. Interestingly, Leighton et al. demonstrate that synapses co-active with their neighbors are more likely to increase their activity across a single recording session, compared to synapses that are not co-active with their neighbors, suggesting local plasticity driven by coincident activity before eye-opening. 

      The current manuscript includes some replication of earlier results from the same research group (Winnubst et al., 2015), including the presence of clustered, co-active synapses in the visual cortex of mouse pups, and the finding that synapses co-active with their neighbors show an increase in transmission frequency during a recording session. The main novelty in the current study compared to Winnubst et al. (2015) is the inclusion of younger animals (P8-13 in the current study compared to P10-15 in Winnubst et al., 2015). The current manuscript is the first demonstration that active synapses are clustered on specific dendrite segments as early as P8-10 in the mouse visual cortex, and the first to show the progression in active synapse distribution along the dendrite during the 2nd postnatal week. These results from the visual cortex may help inform our understanding of sensory development more broadly. 

      Strengths: 

      The authors ask a novel question about the emergence of synaptic spatial organization, and they use well-chosen techniques that directly address their questions despite the challenging nature of these techniques. To capture both structural and functional information from dendrites simultaneously, the authors performed a whole-cell voltage clamp to record synaptic currents arriving at the soma while imaging calcium influx at individual synaptic sites on dendrites. The simultaneous voltage clamp and calcium imaging allowed the authors to isolate individual synaptic inputs without their occlusion by widespread calcium influx from back-propagating action potentials. Achieving in vivo dendrite imaging in live mice that are as young as P8 is challenging, and the resulting data provides a unique view of synaptic activity along individual dendrites in the visual cortex at an early stage in development that is otherwise difficult to assess. 

      The authors provide convincing evidence that synapses are more likely to be co-active with their neighbors compared to synapses located farther away (Fig. 6F-H), and that synapses co-active with their neighbors increase their transmission frequency during a recording session (Figure 7C). These findings are particularly interesting given that the recordings occur before eye-opening, suggesting a relationship between co-activity and local synaptic plasticity even before the onset of detailed visual input. These results replicate previously published findings from P10-15 pups (Winnubst et al., 2015), increasing confidence in the reproducibility of the data. 

      The authors also provide novel data documenting for the first time spatially organized, co-active synapses in pups as young as P8. Comparing the younger (P8-10) and older (P12-13) pups, provides insight into how clusters of co-active synapses might emerge during development. 

      Weaknesses: 

      This manuscript provides insufficient detail for assessing the rigor and reproducibility of the methods, particularly for age comparisons. The P8-10 vs P12-13 age comparisons are the primary novel finding in this manuscript, and it is, therefore, critical to avoid systematic age differences in the methods and analysis whenever possible. Specific concerns related to the age comparisons are listed below: 

      (1) Given that the same research group previously published P12-13 data (Winnubst et al., 2015), it is unclear whether both age groups in the current study were imaged/analyzed in parallel by the same researcher(s), or whether previous data was used for the P12-13 group. 

      While indeed the approach in the present study is similar to that of our previous study (Winnubst et al. 2015), the data set presented here is entirely new. The current study was made possible by a new microscope that allows combining resonant scanning with piezo-focusing to image large fractions of the dendritic arborization. In fact, we could now image almost 10 times larger dendritic segments including branch points than in our previous study. One author contributed to the experiments in both studies. Image analysis of all experiments was performed by the first author of the present study who was not involved in the Winnubst et al. work.

      (2) The authors mention that they used 2 different microscopes, and used a fairly wide range of imaging frame rates (5-15 Hz). It is unclear from the current manuscript whether the same imaging parameters were used across the two age groups. If data for the two experimental groups was collected separately, perhaps at different times, by a different person, or on a different microscope, there is a concern that some differences between the groups may not necessarily be due to age. 

      The reviewer mentions that the experimental settings are not identical across the experiments of this study. In the original manuscript we erroneously reported in the Methods section that 2 different setups were used for this study; however, all experiments were performed on the same microscope. We have corrected this in the new manuscript. We took timelapse recordings of small stacks of varying depth to cover as many dendrites as possible in each recording, therefore, we needed to adjust the rate of acquired stacks within a certain range as the reviewer points out. The data were acquired by two scientists during an overlapping period. And while the different ages were not recorded in a strictly randomized fashion, they were not acquired in sequence according to ages, but rather involved many attempts on animals of different ages from many different litters. For each litter a small percentage of animals would generate successful recordings, and the ages of these successes were random. Therefore, we believe that neither the collection of data nor the analysis (see point above) affected the differences we describe here for the two age groups.

      (3) It is unclear whether the image analysis was performed blind to age. Blinding to age during analysis is particularly important for this study, in which it was not possible to blind to age during imaging due to visible differences in size and developmental stage between younger and older pups. 

      The analysis was not set up to be performed blind to age. Not only is the age of the animal apparent at the stage (as the reviewer points out), but the number of spines and the activity levels also clearly differ between neurons only a few days apart. However, all age-related findings reported in this study - except the increase in synapse density and activity - became apparent to us only after the full set of synaptic transmission events was determined and the analysis was performed on the entire data set, making it very unlikely that event detection was biased.

      (4) The relatively low N (where N is the number of dendrites or the number of mice) in this study is acceptable due to the challenging nature of the techniques used, but unintentional sampling bias is a concern. For example, if higher-order dendrites from the apical tuft were imaged at P12-13, while more segments of the apical trunk were imaged at P8-10, this could inadvertently create apparent age differences that were in fact due to dendrite location on the arbor or dendrite depth. 

      The reviewer points out that sampling bias with respect to synapse location along dendrites in the dataset could lead to falsely apparent age differences. In all experiments we imaged dendrites of layer 2/3 neurons that were relatively close to the cortical surface to optimize image quality. In addition, we confirmed that the mean distance of the imaged dendritic stretches from the cell body was similar between the dendrites of each age group (Young: 392 +/- 104 µm, Old: 323 +/- 118 µm; mean +/- STD). Therefore, we do not think that sampling bias affected these results.

      Additional general methodological concerns, which are not specifically related to the age comparisons, are listed below: 

      (5) The authors assert that clustered, co-active synapses emerge in the visual cortex before eye-opening, which is an important finding in that it suggests this phenomenon is driven by spontaneous activity rather than visual input. However, this finding hinges on the imaged cells being reliably located in the visual cortex, which is difficult to identify with certainty in animals that have not yet opened their eyes and therefore cannot undergo intrinsic signal imaging to demarcate the boundaries of the visual cortex. If the imaged cells were in, for example, nearby somatosensory cortex, then the observed spatial organization could be due to sensory input rather than spontaneous activity. 

      The reviewer argues that if the neurons included in our analysis were located in non-visual sensory cortex, e.g. the somatosensory cortex, sensory experience might have shaped clustered inputs instead of spontaneous activity. We are, however, certain that the neurons were located inside the primary visual cortex. In previous experiments where we performed the same craniotomies, we mapped spontaneous activity across the sensory areas in the occipital neocortex and we know the exact location of V1 which is already very consistent during the second postnatal week. (See for example Supplemental Figure 4 in Leighton et al., 2021).  

      (6) It is unclear how the authors defined a synaptic transmission event in the GCaMP signal (e.g. whether there was a quantitative deltaF/F threshold). 

      In the revised manuscript, we describe the procedure of identifying synaptic calcium transients in more detail and added a new supplemental figure to clarify this aspect of the analysis. In short, we use an automated detection with a 2x standard deviation threshold and a subsequent manual control and selection step. Please, find all details in the Methods section and Figure S4 of the revised manuscript.
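The automated step described above can be illustrated with a minimal sketch. It assumes that the threshold is the mean plus two standard deviations of a baseline stretch of the ΔF/F trace; how the baseline and the standard deviation were actually computed is specified in the Methods and is not reproduced here. The function name, the `baseline_frames` parameter and the example trace are hypothetical, and the manual control and selection step that follows detection is not modeled.

```python
from statistics import mean, stdev


def detect_transient_onsets(trace, baseline_frames, n_sd=2.0):
    """Return frame indices at which the trace rises above threshold.

    trace: sequence of dF/F values for one region of interest.
    baseline_frames: number of initial frames used as baseline
        (an assumption of this sketch, not taken from the manuscript).
    n_sd: threshold in baseline standard deviations (2x, as in the response).
    """
    baseline = trace[:baseline_frames]
    threshold = mean(baseline) + n_sd * stdev(baseline)

    onsets = []
    above = False
    for i, value in enumerate(trace):
        # Count only the upward crossing, not every supra-threshold frame.
        if value > threshold and not above:
            onsets.append(i)
        above = value > threshold
    return onsets


# Made-up trace: six quiet baseline frames followed by two brief transients.
example_trace = [0.0, 0.1, -0.1, 0.0, 0.1, -0.1, 1.0, 0.8, 0.0, 0.9, 0.0]
print(detect_transient_onsets(example_trace, baseline_frames=6))
```

Candidate onsets found this way would then pass to the manual inspection step mentioned in the response.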

      (7) The authors' division of synapses into spine vs shaft is unconvincing due to the difficulty of identifying Z-projecting spines in images from 2-photon microscopy, where the Z resolution is insufficient to definitively identify Z-projecting spines, and the fact that spines in young animals may be thin and dim. The authors' examples of spine synapses (e.g. in Fig. 2A) are convincing, but some of the putative shaft synapses may in fact be on spines. 

      We agree with the reviewer that the differentiation between spine and shaft synapses can be difficult for those spines that are located above or below the dendritic shaft in the z-dimension because of the lower resolution of 2-photon microscopy in the z-dimension compared to the image plane (see also response to Reviewer 2, point 1). We have now added a new paragraph to the Methods section to describe in more detail how we identify spine and shaft synapses and provide more examples in a new supplementary figure (Fig S5). We believe that we can identify spine and shaft synapses reliably in most cases, but added a cautionary note to make the reader aware of potential misidentifications.

      Reviewer #1 (Recommendations For The Authors):

      I think the experiments performed were very technically challenging (probably one of the few labs that can do this in the field), and the findings provide in vivo evidence on how structured synaptic inputs are assembled during development that has never been reported. 

      I suggest improving the writing and presentation and really explaining how they conducted the experiments and how they defined shaft synapses. 

      Line 96: 12 dendritic areas from 11 mice at ages between postnatal day 8 to 13. 

      - Do the authors know how many neurons were imaged? It is unclear if the authors patch on all the imaged neurons and only imaged (or analyzed) the dendrites of those patched neurons. If yes, how sparse are the neurons labelled from IUE? From 1B, it looks like there are two cells adjacent to each other. Can the authors really distinguish whether the imaged dendrites are from the patched neuron? 

      The reviewer wonders whether we can tell apart dendrites of patched cells from those of neighboring neurons that were not patched. This is actually very straightforward: the experiment included a depolarization step (see Methods section) which leads to an immediate, but temporary, increase in fluorescence in all of the patched neurons’ dendrites, but none of the neighboring dendrites. We have added this information to the Methods section of the new manuscript and now provide an example (Fig S3). Furthermore, as these cells normally fire frequently, it would immediately become clear that an unpatched cell is being imaged if backpropagating action potentials are predominantly observed rather than synaptic signals. The visualization of these synaptic signals is only possible due to the blockade of Na+ channels with QX314 in the intracellular solution (see Methods). 

      - In the methods section, it says 'dendrites were imaged in single plane or small stacks with plane...'. How do the authors do calcium imaging with small stacks of plane using Nikon MP scope? 

      Small stacks were acquired by using the piezo focusing device of our Nikon A1 microscope. Since we combined this fast focusing approach with resonant scanning, we were able to acquire z-stacks of 3-5 frames at a rate of up to 15 Hz (per stack).

      - I also assume this is not chronic imaging, and there are different mice for each postnatal day. If it's true, this is somewhat important for all the correlation analysis as there are only 2 mice for each postnatal day (other than day 12) and day 13 only has 1 animal. 

      Yes, indeed these are not chronic experiments and dendrites imaged on different days are from different neurons and different mice. We agree with the reviewer that if it had been possible to image the same neurons across these developmental stages, we would have detected even clearer correlations. Therefore, we see our results as conservative estimates of the developmental trajectory of the analyzed parameters.

      Line 104 - 109: I don't understand why the authors need to hold at -30mV to facilitate calcium influx through NMDA receptors? I assume this helps them to visualize as many synapses as possible? but wouldn't that also make the 'event frequency' not reflect the true value? 

      Indeed depolarizing the imaged neurons to -30 mV was necessary to get sufficient calcium influx to map synaptic inputs. We don’t think that this affects the frequency of inputs, because the frequency of synaptic inputs is determined by the presynaptic firing rate and the release probability of the presynaptic terminal, which are not affected by the depolarization of the dendrite.

      Figure 2A - It says in the method section that ROIs are manually selected. However, it's not explained what the criteria are. For spine synapses, it's easy to define but for shaft synapses like in Fig 2B, why are there 2 synapses on the shaft? And in Fig 4a, 5a, Fig S1 P13, some of the dendrites are packed with ROIs. What's the distance between those shaft synapses? Can the imaging resolution really separate them? 

      The reviewer asks for a better description of how we identified individual ROIs and thus synapse locations and whether this is actually feasible. We have now added a more detailed description of how we select synaptic sites based on the occurrence of synaptic calcium transients. In addition, we have added a new supplemental Figure (S4) to give the reader an impression of the image quality and the ability to locate individual synapses reliably. We find that separating shaft synapses was possible for inter-synapse distances of ~4 µm or more. The mean shaft synapse distance in our data set is 21 µm.

      - Similar issue applies to Figure 4A that I'm not sure what's the resolution of each 'hot spot'. They all seem very close together. Maybe additional raw dendrite images with fluorescence changes like 1C or 2A could be helpful (or movies in the supplementary?) 

      As the reviewer suggests, we have now added additional supplemental figures to illustrate better how we identify synaptic transmission events as well as spine and shaft synapses.

      - Also for line 164, it says that 76% of high-activity synapses were located on spines. This could also maybe support that only the spine synapses are real synapses and many shaft synapses are actually not synapses and they were just categorized as shaft synapses from manual ROI? 

      We are actually quite sure that shaft synapses are real synapses based on our analysis, since they show repeated synaptic calcium transients that co-occur with barrages of synaptic inputs as measured by patch-clamp recordings. Indeed, one would expect to see a number of excitatory synapses on dendritic shafts of pyramidal neurons at these ages based on previous EM studies (Miller and Peters, 1981; Wildenberg et al., 2023).

      - While this might not impact the overall novelty of the paper, I would be curious to know if the authors can still observe the same findings if they only analyze spine synapses. 

      We repeated several analyses with a dataset that contained only spine synapses. For most analyses we observed the expected result: the effect sizes were similar compared to the entire data set, but the power was reduced. For example the effect of distance to closest high-activity neighbor and own activity (Fig 5E, F) was similar, but p-values were around 0.1 (Similar results for Figure 7B). In contrast, the co-activity with synapses within a domain was significantly higher than the co-activity with synapses in other domains also for the spine-synapse only data set. 

      Fig 6 - Does the domain co-activity also contribute to the synaptic current recorded (related to Fig 4). 

      Yes, the synaptic activity measured by calcium imaging contributes to the recorded EPSCs. However, the exact relationship between synaptic inputs measured by calcium imaging and those measured by patch-clamping is complicated by 3 factors: first, during barrages of synaptic inputs many synapses are active simultaneously, both in the mapped dendrites and in the unobserved parts of the dendritic arborization. Thus, barrages cannot be broken down into individual events. Second, since our acquisition frequency is ~10 Hz, we can identify the onset of individual synaptic calcium transients with 100-200 ms precision (1 or 2 frames). However, throughout any 100-200 ms period of recording several synapses are active across the entire dendritic arborization such that we cannot assign a given calcium transient to a specific EPSC within a 100-200 ms epoch. Third, due to the limited clamping capacity of in vivo patch recordings, we cannot be certain that individual transmission events in distal dendrites can be resolved in the patch recording as EPSCs.

      Reviewer #2 (Recommendations For The Authors):

      (1) I suggest the authors should provide the number of cells and mice recorded in the figure legends. 

      The number of dendrites and mice is the same across all analyses: 12 dendrites from 11 mice for all experiments, 6/6 for P8-10 and 6/5 for P12-13. All dendrites and synapses (and their ages) are shown in the supplemental figures S1 and S2. We now mention the number of imaged dendrites at the beginning of the Results section and when we split ages for the first time.

      (2) Instead of showing only cartoon illustrations of dendrites in Figures 3-6, I suggest showing the two photon images as well together with the cartoon. 

      The 2-photon images of all dendrites of the dataset are available in Figure S1. To allow the reader to compare the cartoon representations in the main figures and the 2-photon images of each neuron, we have now labeled each dendrite in the dataset (D1-D12, see figures S1 and S2). For every figure, where we show example neurons (cartoons or zoom ins) we now provide this identifier.

      Reviewer #3 (Recommendations For The Authors):

      To address the weaknesses outlined above, we recommend that the authors do the following: 

      • To address concerns about the rigor and reproducibility of the methods specifically related to age comparisons, please confirm the following: 

      - Both age groups were run in parallel by the same researcher(s). 

      Experiments were run partly overlapping and experiments from different age groups were performed in parallel by both researchers.

      - Both age groups were imaged on the same microscope, or animals from each age group were imaged on both microscopes. If it was necessary to use different microscopes for the different age groups for biological or practical reasons, please explain. 

      All experiments were run on the same microscope, a Nikon A1 2-photon microscope. In the original methods description we erroneously mentioned two microscopes (copy and paste error from a previous publication). We corrected that in the revised manuscript.

      - There was no difference in imaging frame rates or other imaging parameters between age groups. If it was necessary to use different parameters for different age groups for biological reasons, please explain. 

      We varied the frame rates somewhat to allow larger z-stacks for some experiments where dendrites traversed different depths; however the mean frame rates were similar between the experiments in P8-10 vs P12-13 dendrites, 8.5 vs 10 Hz, respectively.

      - Images were analyzed blind to age. 

      The analysis was not set up to be performed blind to age. The number of spines and the activity levels show obvious differences between neurons only a few days apart. However, all findings reported in this study related to age - except the increase in synapse density and activity - became apparent to us only after the full set of synaptic transmission events was determined and the analysis was performed on the entire data set, making it unlikely that event detection was biased.

      - There was no difference in the location of analyzed dendrites (e.g. depth from the pia, branch order) between age groups. 

      In all experiments we imaged dendrites of layer 2/3 neurons that were relatively close to the cortical surface to optimize image quality. In addition, we determined the mean distance of the imaged dendritic stretches from the cell body and found that this distance was similar between the dendrites of each age group (Young: 392 +/- 104 µm, Old: 323 +/- 118 µm; mean +/- STD). Therefore, we do not think that sampling bias affected these results.

      • To address general methodological concerns, please provide additional description of the following points: 

      - Please clarify how the visual cortex was identified in P8-13 pups. If there was ambiguity about identifying the visual cortex in these pups, please discuss the implications of this ambiguity. 

      The reviewer asks how we identified V1 in these experiments. We are indeed certain that the neurons were located inside the primary visual cortex. We have ample experience with mapping V1 in these animals based on patterns of spontaneous activity as well as post-hoc stainings. V1 is quite large already at these ages (> 2 mm long and > 1 mm wide) and its extent very consistent across animals. Thus, we would argue it is actually hard to miss.

      - Please clarify how synaptic transmission events were identified in the GCaMP signal. 

      We have now added a more detailed description of how we identify synaptic calcium transients. In addition, we have added a new supplemental Figure (S3) to give the reader an impression of the image quality and the ability to locate individual synapses reliably. 

      - It is acceptable to use the spine vs shaft analysis despite the inevitable difficulty resolving Z-projecting spines, but this caveat should be mentioned in the discussion of the spine vs shaft results. 

      We added a more detailed description of spine and shaft synapse identification, a new supplemental figure (S5) and we now mention the caveat related to the limited z-resolution of 2-photon microscopy in the revised manuscript.

      • Two additional minor details should be clarified in the text of the manuscript: 

      - Please specify the volume of DNA solution injected into each embryo. 

      The injected volume was 1 µl. We added this information in the Methods section of the revised manuscript.

      - In Fig S1, please specify whether the scale bar applies to all images. 

      The scale bar applies to all images. This information was added to the figure legend.

      References

      Leighton AH, Cheyne JE, Houwen GJ, Maldonado PP, De Winter F, Levelt CN, Lohmann C. 2021. Somatostatin interneurons restrict cell recruitment to retinally driven spontaneous activity in the developing cortex. Cell Rep 36:109316. doi:10.1016/j.celrep.2021.109316

      Miller M, Peters A. 1981. Maturation of rat visual cortex. II. A combined Golgi-electron microscope study of pyramidal neurons. J Comp Neurol 203:555–573.

      Siegel F, Heimel JA, Peters J, Lohmann C. 2012. Peripheral and central inputs shape network dynamics in the developing visual cortex in vivo. Current Biology 22:253–258.

      Wildenberg G, Li H, Sampathkumar V, Sorokina A, Kasthuri N. 2023. Isochronic development of cortical synapses in primates and mice. Nat Commun 14:8018. doi:10.1038/s41467-023-43088-3

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This is an interesting and well-written paper reporting on a novel approach to studying cerebellar function based on the idea of selective recruitment using fMRI. The study is well-designed and executed. Analyses are sound and results are properly discussed. The paper makes a significant contribution to broadening our understanding of the role of the cerebellum in human behavior.

      We thank the reviewer for the positive assessment of our paper.

      (1) While the authors provide a compelling case for the link between BOLD and the cerebellar cortical input layer, there remains considerable unexplained variance. Perhaps the authors could elaborate a bit more on the assumption that BOLD signals mainly reflect the input side of the cerebellum (see for example King et al., elife. 2023 Apr 21;12:e81511).

      Our paper is based on the assumption that the cerebellar BOLD signal reflects solely the input to the cerebellum and does not reflect the changes in firing rates of Purkinje cells. This assumption relies on two lines of arguments: Studies that have directly looked at the mechanism of vasodilation in the cerebellum, and studies that try to infer the contributions of different neurophysiological mechanisms to overall cerebellar metabolism (Attwell and Iadecola, 2002).

      Vasodilatory considerations: The mechanism that causes vasodilation in the cerebellum, and hence BOLD signal increases, has been extensively studied: Electrical stimulation of mossy fibers (Gagliano et al., 2022; Mapelli et al., 2017), as well as of parallel fibers (Akgören et al., 1994; Iadecola et al., 1996; Mathiesen et al., 1998; Yang and Iadecola, 1997), leads to robust increases in cerebellar blood flow. In contrast to the neocortex, the regulation of blood flow in the cerebellum depends nearly purely on the vasodilator Nitric Oxide (NO) (Akgören et al., 1994; Yang and Iadecola, 1997) with stellate cells playing a key role in the signaling cascade (Yang et al., 2000).

      Electrical (Mathiesen et al., 2000) and pharmacological (Yang and Iadecola, 1998) stimulation of climbing fibers also leads to robust increases in blood flow. Simultaneous parallel and climbing fiber stimulation seems to combine sub-additively to determine the blood flow changes (K. Caesar et al., 2003).

      Importantly, even dramatic changes in spiking rate of Purkinje cells do not lead to changes in vasodilation. For starters, parallel fiber stimulation leads to blood flow increases, even though the net effect on Purkinje cell firing is inhibitory (Mathiesen et al., 1998). More importantly, complete inhibition of the Purkinje cell using a GABA agonist does not change baseline cerebellar blood flow (Kirsten Caesar et al., 2003). Conversely, even a 200-300% increase in simple (and complex) spike firing rate through application of a GABA antagonist does not show any measurable consequences for blood flow, even though it clearly increases the metabolic rate of oxygen consumption in the tissue (Thomsen et al., 2009, 2004).

      In sum, this extensive set of studies clearly argues that the cerebellar blood flow response is mostly dictated by synaptic input, and that the firing rate of Purkinje cells does not influence vasodilation. Because the BOLD signal is caused by a supply of oxygen over and above the level of oxygen consumption, this would argue that increases in Purkinje cell firing would not lead to BOLD increases. What is less clear is the degree to which changes in BOLD signal during normal activity are determined by changes in mossy fiber or climbing fiber input. Disruption of either pathway leads to 60-70% reductions in the evoked blood flow response during whisker stimulation (Yang et al., 2000; Zhang et al., 2003), but it remains unclear to what degree this reflects the distribution of contributions in the healthy animal, as these powerful disruptions may have a number of side-effects.

      Metabolic considerations: To estimate the relative contributions of climbing fiber and mossy fiber input to the variations in BOLD signal under natural conditions, it is useful to consider the contributions of different cerebellar processes to the overall metabolism of the cerebellum. Assuming an average firing rate of 40 Hz for mossy fibers, ~3 Hz for granule cells, and 1 Hz for climbing fibers, Howarth et al. (2010, 2012) estimated that the transmission from mossy fibers to granule cells dominates the energy budget with 53%. The subsequent stage, encompassing the transfer of information from granule cells to Purkinje cells, accounts for 32% of energy expenditure. In contrast, integration within Purkinje cells and the spiking (simple and complex) of these cells represent only 15% of the total energy consumption.

      More important for the BOLD signal, however, are the activity-induced variations in metabolic consumption: Purkinje cells fire relatively constantly at a very high frequency (~50 Hz) both during awake periods and during sleep (Shin et al., 2007). When providing a signal to the neocortex, their firing rate decreases, actually lowering the metabolic demand. Climbing fibers normally fire at ~0.5 Hz and even during activity rarely fire much above 2 Hz (Streng et al., 2017). In contrast, granule cells show low firing rates during rest (typically <1 Hz) and can spike during activity well above 100 Hz. Combined with the sheer number of granule cells, these considerations would suggest that the vast majority of the variation in metabolic demand is due to mossy fiber input and granule cell activity.
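
      The arithmetic behind the two metabolic arguments above can be laid out in a few lines. This is only an illustrative sketch that uses the percentages and approximate firing rates quoted in this response; the variable names are ours, and the rest and active rates for granule cells are taken as the stated bounds (about 1 Hz and 100 Hz), so the fold change is a rough lower estimate rather than a measured value.

```python
# Shares of the cerebellar energy budget quoted above
# (Howarth et al., 2010, 2012), in percent.
energy_share = {
    "mossy fiber -> granule cell": 53,
    "granule cell -> Purkinje cell": 32,
    "Purkinje cell integration and spiking": 15,
}

# Share attributed to the two stages upstream of Purkinje cell spiking.
input_side_share = (
    energy_share["mossy fiber -> granule cell"]
    + energy_share["granule cell -> Purkinje cell"]
)

# Approximate (rest, active) firing rates in Hz, as quoted in the text.
# The granule cell values are the stated bounds (<1 Hz, >100 Hz).
rates_hz = {
    "granule cell": (1.0, 100.0),
    "climbing fiber": (0.5, 2.0),
}

# Activity-induced fold change in firing rate for each input type.
fold_change = {cell: active / rest for cell, (rest, active) in rates_hz.items()}

print(f"Input-side share of energy budget: {input_side_share}%")
for cell, fc in fold_change.items():
    print(f"{cell}: ~{fc:.0f}-fold increase from rest to activity")
```

      Under these assumed figures, the dynamic range of granule cell firing is far larger than that of climbing fibers, which is the basis for attributing most activity-related variation in metabolic demand to the mossy fiber and granule cell pathway.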

      Overall, we therefore think it is likely that the main determinant of the cerebellar cortical BOLD signal is mossy fiber input and the transmission of information from mossy fibers to granule cells to Purkinje cells. We admit that the degree to which climbing fiber input contributes to BOLD signal changes is much less clear. We can be quite certain, however, that the firing rate of Purkinje cells does not contribute to the cerebellar BOLD signal, as even dramatic changes in the firing rate do not cause any changes in vasodilation. We have clarified our line of reasoning in the paper, and hope this more extensive response here will give the reader a better overview of the pertinent literature.

      (2) The current approach does not appear to take the non-linear relationships between BOLD and neural activity into account.

      Thank you for raising this concern. We did not stress this point in the paper, but one big advantage of our selective recruitment approach is that it is, to some degree, robust against non-linearities in the relationship between neural activity and BOLD signal. This is the case as long as the shape of the non-linearity is similar in the cerebellum and the neocortex. The results of our motor task (Figure 3) provide a clear example of this. The BOLD signal in both the neocortex and cerebellum increases non-linearly as a function of force: the increase from 2.5N to 6N (a 3.5N increase) is larger than the increase from 6N to 10N (a 4N increase). A similar non-linearity can be observed for tapping speed (6, 10 to 18 taps / s). However, within each condition, the relationship between cortical and cerebellar activity is nearly perfectly linear, reflecting the fact that the shape of the non-linearity for the cerebellum and cortex is very similar.

      Most importantly, even if the non-linearity across the two structures is different, any non-linear relationship between neural activity and BOLD signal (of vasodilatory nature) should apply to different conditions (here force and speed increases) similarly. Therefore, if two conditions show overlapping activity levels (as observed for force and speed across medium and high levels, Figure 3), an offset between conditions cannot be caused by a non-linearity in the relationship of cortical and cerebellar activity. Because all conditions are subject to the same non-linearity, all points should lie on a single (likely monotonically increasing) non-linear function. Both for the motor and working memory task, the pattern of results clearly violates this assumption.
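
      The logic of this argument can be illustrated with a toy simulation. The sketch below is not our analysis code: the two neural-to-BOLD mappings (a square root for the neocortex, a logarithm for the cerebellum), the activity levels, and the gain values are all hypothetical choices made only to show that, under fixed connectivity, two conditions with matched cortical activity must produce identical cerebellar BOLD, whatever the shape of the non-linearities.

```python
import math

def cortical_bold(neural):
    # Hypothetical saturating neural-to-BOLD mapping for the neocortex.
    return math.sqrt(neural)

def cerebellar_bold(cerebellar_input):
    # A different, but also monotonic, hypothetical mapping for the cerebellum.
    return math.log1p(cerebellar_input)

def cerebellar_response(neural, gain):
    # Cerebellar input = cortical neural activity scaled by a connection gain.
    return cerebellar_bold(gain * neural)

# Cortical neural activity (arbitrary units), matched across both conditions,
# so the cortical BOLD signal is identical for "force" and "speed".
neural_levels = [1.0, 2.0, 4.0]
cortical_signal = [cortical_bold(n) for n in neural_levels]

def offsets(force_gain, speed_gain):
    # Cerebellar BOLD difference (speed minus force) at matched cortical BOLD.
    return [
        cerebellar_response(n, speed_gain) - cerebellar_response(n, force_gain)
        for n in neural_levels
    ]

# Null model: one fixed gain for both conditions, so all points fall on
# a single non-linear curve and the offset is zero everywhere.
null_offsets = offsets(force_gain=1.0, speed_gain=1.0)

# Task-dependent gating: a larger gain for speed shifts its points
# above the force curve at every level of cortical activity.
gated_offsets = offsets(force_gain=1.0, speed_gain=1.5)

print("null offsets: ", null_offsets)
print("gated offsets:", gated_offsets)
```

      In this toy setting, a systematic offset between conditions appears only when the gain differs between them, which is the signature we interpret as selective recruitment.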

      (3) The authors may want to address a bit more the issue of closed loops as well as the underlying neuroanatomy including the deep cerebellar nuclei and pontine nuclei in the context of their current cerebello-cortical correlational approach. But also the contribution of other brain areas such as the basal ganglia and hippocampus. 

      Cortical-cerebellar communication is of course bi-directional. As discussed in King et al. (2023), however, we are restricting our model to the connections from the neocortex to the cerebellum for the following reasons: First, cerebellar BOLD activity likely reflects mostly neocortical input (see our answer to pt. 1), whereas neocortical activity is determined by a much wider array of projections, including striato-thalamo-cortical and cortico-cortical connections. Secondly, the output of the cerebellum cannot be predicted from the BOLD signal of the cerebellar cortex, as it is unlikely that the firing rate of Purkinje cells contributes to the cerebellar BOLD signal (see pt. 1). For these reasons we believe that the relationship between neocortical and cerebellar activity patterns is mostly dictated by the connectivity from cortex to cerebellum, and is therefore best modelled as such. This is now more clearly discussed in a new paragraph (line 318-323) of the revised manuscript.

      We are also ignoring other inputs to the cerebellum, including the spinal cord, the basal ganglia (Bhuvanasundaram et al., 2022; Bostan and Strick, 2018), hippocampus (Froula et al., 2023; Watson et al., 2019), and amygdala (Farley et al., 2016; Jung et al., 2022; Terburg et al., 2024). In humans, however, the neocortex remains the primary source of input to the pontine nuclei. Consequently, it stands as the main structure shaping activity within the cerebellar cortex. While it is an interesting question to what degree the consideration of subcortical structures can improve the prediction of cerebellar activity patterns, we believe that considering the neocortex provides a good first approximation.

      Reviewer #1 (Recommendations):

      (4)  A few sentences to clarify the used models as was done in the King et al. (2024) paper may improve readability.

      We have now added the following sentences to the introduction (line 25ff):

      To approach this problem, we have recently developed and tested a range of cortical-cerebellar connectivity models (King et al., 2023), designed to capture fixed, or task-invariant, transmission between neocortex and cerebellum. For each cerebellar voxel, we estimated a regularized multiple regression model to predict its activity level across a range of task conditions (King et al., 2019) from the activity pattern observed in the neocortex for the same conditions. The models were then evaluated in their ability to predict cerebellar activity in novel tasks, again based only on the corresponding neocortical activity pattern. Two key results emerged from this work. First, while rs-FC studies (Buckner et al., 2011; Ji et al., 2019; Marek et al., 2018) have assumed a 1:1 mapping between neocortical and cerebellar networks, models which allowed for convergent input from multiple neocortical regions to a single cerebellar region performed better in predicting cerebellar activity patterns for novel tasks. Second, when given a cortical activation pattern, the best performing model could predict about 50% of the reliable variance in the cerebellar cortex across tasks (King et al., 2023).
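
      For readers who prefer a concrete picture, the modelling idea described in the passage above can be sketched with simulated data. This is a minimal illustration, not the pipeline of King et al. (2023): the array sizes, the noise level, the regularization strength `lam`, and the use of a simple ridge solution are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: task conditions, cortical regions, cerebellar voxels.
n_train, n_test, n_cortex, n_cereb = 40, 20, 10, 5

# Simulated "true" fixed connectivity from cortical regions to cerebellar voxels.
W_true = rng.normal(size=(n_cortex, n_cereb))

def simulate(n_conditions):
    # Cortical activity patterns and the cerebellar activity they produce.
    X = rng.normal(size=(n_conditions, n_cortex))
    Y = X @ W_true + 0.1 * rng.normal(size=(n_conditions, n_cereb))
    return X, Y

X_train, Y_train = simulate(n_train)  # training task set
X_test, Y_test = simulate(n_test)     # novel tasks, not used for fitting

# Ridge (L2-regularized) multiple regression, fitted for all voxels at once:
# W_hat = (X'X + lam * I)^-1 X'Y
lam = 1.0
W_hat = np.linalg.solve(
    X_train.T @ X_train + lam * np.eye(n_cortex),
    X_train.T @ Y_train,
)

# Predict cerebellar activity for novel tasks from cortical activity only.
Y_pred = X_test @ W_hat

# Evaluate with the correlation between predicted and observed activity.
r = np.corrcoef(Y_pred.ravel(), Y_test.ravel())[0, 1]
print(f"Predictive correlation on novel tasks: {r:.3f}")
```

      In real data the predictive accuracy is limited by measurement noise and by any task-dependent deviations from fixed connectivity, which is why the evaluation on novel tasks is the critical test of such a model.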

      (5) To what extent does this paper demonstrate the limitations of BOLD in neuroscientific research? 

      The primary objective of this study was to shed light on the problems of interpreting BOLD activation within the cerebellum. The problem that the BOLD signal mostly reflects input to a region is not unique to the cerebellum, but also applies (albeit likely to a lesser degree) to other brain structures. However, the solution we propose here critically hinges on three features of the cerebellar circuitry: a) the mossy fiber input for the cerebellar hemispheres mostly arises from the neocortex, b) the BOLD signal is likely dominated by this mossy fiber input (see pt. 1), and c) there is very little excitatory recurrent activity in the cerebellum, so output activity in the cerebellum does not cause direct activity in other parts of the cerebellum.

      These features motivate us to use a directed cortex->cerebellum connectivity model, which does not allow for any direct connectivity within the cerebellum. While the same approach can also be applied to other brain structures, it is less clear that the approach would yield valid results there. For example, due to the local excitatory recurrent connectivity within neocortical columns, the activity there will also relate to local processing.

      (6) What if the authors reversed their line of reasoning as in that cerebellum activity is matched to map changes in cerebral cortical activity? Perhaps this could provide further evidence for the assumed directional specificity of the task-dependent gating of neocortical inputs. 

      Given (a) that the cerebellar BOLD signal tells us very little about cerebellar output signals, (b) that there are many other input signals to the neocortex that are more powerful than cerebellar inputs, and (c) that there are strong cortical-cortical connections, we believe that this model would be hard to interpret (see also our answer to pt. 3).

      Therefore, while the inversion of the linear task-invariant mapping between cortical and cerebellar activity is a potentially interesting exercise, it is unclear to us at this point what strong predictions we would be able to test with this approach.

      (7) The statement that cerebellar fMRI activity may simply reflect the transmission of neocortical activity through fixed connections can be better explained. Also in the context of using the epiphenomenon (on page 11) in the paper. To what extent is the issue of epiphenomenon not a general problem of fMRI research?

      We have rephrased the introduction of this idea (line 17):

      This means that increases in the cerebellar BOLD signal could simply reflect the automatic transmission of neocortical activity through fixed anatomical connections. As such, whenever a task activates a neocortical region, the corresponding cerebellar region would also be activated, regardless of whether the cerebellum is directly involved in the task or not.

      Epiphenomenal activity: This is indeed a general problem in fMRI research (and in any research that uses neurophysiological recordings, rather than manipulations of activity). We have discussed similar issues in the context of motor activity in ipsilateral motor cortex (Diedrichsen et al., 2009). However, given that we only offer a possible approach to address this issue for the cerebellum (see pt. 5), we thought it best to keep the scope of the discussion focused on this structure.

      Reviewer #2 (Public Review):

      Summary:

      Shahshahani and colleagues used a combination of statistical modelling and whole-brain fMRI data in an attempt to separate the contributions of cortical and cerebellar regions in different cognitive contexts.

      Strengths:

      The manuscript uses a sophisticated integration of statistical methods, cognitive neuroscience, and systems neurobiology.

      The authors use multiple statistical approaches to ensure robustness in their conclusions.

      The consideration of the cerebellum as not a purely 'motor' structure is excellent and important.

      We thank the reviewer for their positive evaluation.

      Weaknesses:

      (1) Two of the foundation assumptions of the model - that cerebellar BOLD signals reflect granule cells > purkinje neurons and that corticocerebellar connections are relatively invariant - are still open topics of investigation. It might be helpful for the reader if these ideas could be presented in a more nuanced light.

      Please see our response to comment 1 of Reviewer 1 for a more extensive and detailed justification of this assumption. We have now also clarified our rationale for this assumption in the paper on line 10-14. Finally, we now also explicitly raise the possibility that some of the violations of the task-invariant model could be caused by selective increases in climbing fiber activity in some tasks (line 340).

      (2) The assumption that cortical BOLD responses in cognitive tasks should be matched irrespective of cerebellar involvement does not cohere with the idea of 'forcing functions' introduced by Houk and Wise. 

      We are assuming that you refer to the idea that cerebellar output is an important determinant of the dynamics (and likely also of the magnitude) of neocortical activity. We agree most certainly here. However, we also believe that in the context of our paper, it is justified to restrict the model to the connectivity between the neocortex and the cerebellum only (see reviewer 1, comment 3).

      Furthermore, if increased cerebellar output indeed occurs during the conditions for which we identified unusually high cerebellar activity, it should increase neocortical activity, and bring the relationship of the cerebellar and cortical activity again closer to the predictions of the linear model. Therefore, the identification of functions for which cerebellar regions show selective recruitment is rather conservative.

      Reviewer #2 (Recommendations):

      (3) One of the assumptions stated in the abstract -- that the inputs to the cerebellum may simply be a somewhat passive relay of the outputs of the cerebral cortex -- has been challenged recently by work from Litwin-Kumar (Muscinelli et al., 2023 Nature Neuroscience), which argues for complex computational relationships between cortical pyramidal neurons, pontine nuclei and granule cells, which in turn would have a non-linear impact on the relationship between cortical and cerebellar BOLD. The modelling is based on empirical recordings from Wagner (2019, Cell) which show that the synaptic connections between the cortex and granule cells change as a function of learning, further raising concerns about the assumption that the signals inherent within these two systems should be identical. Whether these micro-scale features are indicative of the macroscopic patterns observed in BOLD is an interesting question for future research, but I worry that the assumption of direct similarity is perhaps not reflective of the current literature. The authors do speak to these cells in their discussion, but I believe that they could also help to refine the authors' hypotheses in the manuscript writ large.

      We absolutely agree with your point. However, we want to make extremely clear here that our hypothesis (that the inputs to the cerebellum are a linear task-invariant function of the outputs of the cerebral cortex) is the Null-hypothesis that we are testing in our paper. In fact, our results show the first empirical evidence that task-dependent gating may indeed occur. In this sense, our paper is consistent with the theoretical suggestion of (Muscinelli et al., 2023).

      You may ask whether a linear task-invariant model of cortical-cerebellar connectivity is not a strawman, given that it is most likely incorrect. However, as we stress in the discussion (line 298-), a good Null-model is a useful model, even if it is (as all models) ultimately incorrect. Without it, we would not be able to determine which cerebellar activity outstrips the linear prediction. The fact that this Null-model itself can predict nearly 50% of the variance in cerebellar activity patterns across tasks at a group level means that it is actually a very powerful model, and hence a much more stringent criterion for evidence of functional involvement than just the presence of activity.

      (4) Further to this point, I didn't follow the authors' logic that the majority of the BOLD response in the cerebellum is reflective of granule cells rather than Purkinje cells. I read through each of the papers that were cited in defense of the comment: "The cerebellar BOLD signal is dominated by mossy fiber input with very little contribution from the output of the cerebellar cortex, the activity of Purkinje cells" and found that none of these studies made this same direct conclusion. As such, I suggest that the authors soften this statement, or provide a different set of references that directly confirm this hypothesis. 

      Please see our response to comment 1 of Reviewer 1. We hope the answer provides a more comprehensive overview of the literature, which DOES show that the spiking behavior of Purkinje cells does not influence vasodilation (as opposed to mossy fiber input). We have now clarified our rationale and the exact cited literature on line 9-14 of the paper.

      (5) Regarding the statement: "As such, whenever a task activates a neocortical region, we might observe activity in the corresponding cerebellar regions regardless of whether the cerebellum is directly involved in the task or not." -- what if this is a feature, rather than a bug? That is, the organisation of the nervous system has been shaped over phylogeny such that every action, via efference copies of motor outputs, is filtered through the complex architecture of the cerebellum in order to provide a feed-forward signal to the thalamus/cortex (and other connected structures). Houk and Wise made compelling arguments in their 1995 Cerebral Cortex paper arguing that these outputs (among other systems) could act as 'forcing functions' on the kinds of dynamics that arise in the cerebral cortex. I am inclined to agree with their hypothesis, where the implication is that there are no tasks that don't (in some way) depend on cerebellar activity, albeit to a lesser or greater extent, depending on the contexts/requirements of the task. I realise that this is a somewhat philosophical point, but I do think it is important to be clear about the assumptions that form the basis of the reasoning in the paper. 

      This is an interesting point. Our way of thinking about cerebellar function does indeed correspond quite well to the idea of forcing functions: the idea that cerebellar output can “steer” cortical dynamics in a particular way. However, based on patient and lesion data, it is also clear that some cortical functions rely much more critically on cerebellar input than others. We hypothesize here that cerebellar activity is higher (as compared to the neocortical activity) when the functions require cerebellar computation.

      We also agree with the notion that cerebellar contribution is likely not an all-or-none issue, but rather a matter of gradation (line 324ff).

      (6) Regarding the logic of expecting the cortical patterns for speed vs. force to be matched -- surely if the cerebellum was involved more in speed than force production, the feedback from the cerebellum to the cortex (via thalamus) could also contribute to the observed differences? How could the authors control for this possibility? 

      Our model currently does not attempt to quantify the contributions of cerebellar output to cortical activity. However, given that cerebellar output is not visible in the BOLD signal of the cerebellum (see reviewer 1, comment 1), we believe that this is a rational approach. As argued in our response to your comment 2, increased cerebellar output in the speed compared to the force condition should bring the activity relationship closer to the linear model prediction. The fact that we find increased cerebellar (as compared to neocortical) activity in the speed conditions suggests that there is indeed task-dependent gating of cortical projections to the cerebellum.

      Akgören N, Fabricius M, Lauritzen M. 1994. Importance of nitric oxide for local increases of blood flow in rat cerebellar cortex during electrical stimulation. Proc Natl Acad Sci U S A 91:5903–5907.

      Attwell D, Iadecola C. 2002. The neural basis of functional brain imaging signals. Trends Neurosci 25:621–625.

      Bhuvanasundaram R, Krzyspiak J, Khodakhah K. 2022. Subthalamic Nucleus Modulation of the Pontine Nuclei and Its Targeting of the Cerebellar Cortex. J Neurosci 42:5538–5551.

      Bostan AC, Strick PL. 2018. The basal ganglia and the cerebellum: nodes in an integrated network. Nat Rev Neurosci 19:338–350.

      Buckner RL, Krienen FM, Castellanos A, Diaz JC, Yeo BTT. 2011. The organization of the human cerebellum estimated by intrinsic functional connectivity. J Neurophysiol 106:2322–2345.

      Caesar K, Gold L, Lauritzen M. 2003. Context sensitivity of activity-dependent increases in cerebral blood flow. Proc Natl Acad Sci U S A 100:4239–4244.

      Caesar K, Thomsen K, Lauritzen M. 2003. Dissociation of spikes, synaptic activity, and activity-dependent increments in rat cerebellar blood flow by tonic synaptic inhibition. Proc Natl Acad Sci U S A 100:16000–16005.

      Farley SJ, Radley JJ, Freeman JH. 2016. Amygdala Modulation of Cerebellar Learning. J Neurosci 36:2190–2201.

      Froula JM, Hastings SD, Krook-Magnuson E. 2023. The little brain and the seahorse: Cerebellar-hippocampal interactions. Front Syst Neurosci 17:1158492.

      Gagliano G, Monteverdi A, Casali S, Laforenza U, Gandini Wheeler-Kingshott CAM, D’Angelo E, Mapelli L. 2022. Non-linear frequency dependence of neurovascular coupling in the cerebellar cortex implies vasodilation-vasoconstriction competition. Cells 11:1047.

      Howarth C, Gleeson P, Attwell D. 2012. Updated energy budgets for neural computation in the neocortex and cerebellum. J Cereb Blood Flow Metab 32:1222–1232.

      Howarth C, Peppiatt-Wildman CM, Attwell D. 2010. The energy use associated with neural computation in the cerebellum. J Cereb Blood Flow Metab 30:403–414.

      Iadecola C, Li J, Xu S, Yang G. 1996. Neural mechanisms of blood flow regulation during synaptic activity in cerebellar cortex. J Neurophysiol 75:940–950.

      Ji JL, Spronk M, Kulkarni K, Repovš G, Anticevic A, Cole MW. 2019. Mapping the human brain’s cortical-subcortical functional network organization. Neuroimage 185:35–57.

      Jung SJ, Vlasov K, D’Ambra AF, Parigi A, Baya M, Frez EP, Villalobos J, Fernandez-Frentzel M, Anguiano M, Ideguchi Y, Antzoulatos EG, Fioravante D. 2022. Novel Cerebello-Amygdala Connections Provide Missing Link Between Cerebellum and Limbic System. Front Syst Neurosci 16:879634.

      King M, Hernandez-Castillo CR, Poldrack RA, Ivry RB, Diedrichsen J. 2019. Functional boundaries in the human cerebellum revealed by a multi-domain task battery. Nat Neurosci 22:1371–1378.

      King M, Shahshahani L, Ivry RB, Diedrichsen J. 2023. A task-general connectivity model reveals variation in convergence of cortical inputs to functional regions of the cerebellum. Elife 12:e81511.

      Mapelli L, Gagliano G, Soda T, Laforenza U, Moccia F, D’Angelo EU. 2017. Granular layer neurons control cerebellar neurovascular coupling through an NMDA receptor/NO-dependent system. J Neurosci 37:1340–1351.

      Marek S, Siegel JS, Gordon EM, Raut RV, Gratton C, Newbold DJ, Ortega M, Laumann TO, Adeyemo B, Miller DB, Zheng A, Lopez KC, Berg JJ, Coalson RS, Nguyen AL, Dierker D, Van AN, Hoyt CR, McDermott KB, Norris SA, Shimony JS, Snyder AZ, Nelson SM, Barch DM, Schlaggar BL, Raichle ME, Petersen SE, Greene DJ, Dosenbach NUF. 2018. Spatial and Temporal Organization of the Individual Human Cerebellum. Neuron 100:977-993.e7.

      Mathiesen C, Caesar K, Akgören N, Lauritzen M. 1998. Modification of activity-dependent increases of cerebral blood flow by excitatory synaptic activity and spikes in rat cerebellar cortex. J Physiol 512 ( Pt 2):555–566.

      Mathiesen C, Caesar K, Lauritzen M. 2000. Temporal coupling between neuronal activity and blood flow in rat cerebellar cortex as indicated by field potential analysis. J Physiol 523:235–246.

      Muscinelli SP, Wagner MJ, Litwin-Kumar A. 2023. Optimal routing to cerebellum-like structures. Nat Neurosci 26:1630–1641.

      Shin S-L, Hoebeek FE, Schonewille M, De Zeeuw CI, Aertsen A, De Schutter E. 2007. Regular patterns in cerebellar Purkinje cell simple spike trains. PLoS One 2:e485.

      Streng ML, Popa LS, Ebner TJ. 2017. Climbing Fibers Control Purkinje Cell Representations of Behavior. J Neurosci 37:1997.

      Terburg D, van Honk J, Schutter DJLG. 2024. Doubling down on dual systems: A cerebellum–amygdala route towards action- and outcome-based social and affective behavior. Cortex 173:175–186.

      Thomsen K, Offenhauser N, Lauritzen M. 2004. Principal neuron spiking: neither necessary nor sufficient for cerebral blood flow in rat cerebellum. J Physiol 560:181–189.

      Thomsen K, Piilgaard H, Gjedde A, Bonvento G, Lauritzen M. 2009. Principal cell spiking, postsynaptic excitation, and oxygen consumption in the rat cerebellar cortex. J Neurophysiol 102:1503–1512.

      Watson TC, Obiang P, Torres-Herraez A, Watilliaux A, Coulon P, Rochefort C, Rondi-Reig L. 2019. Anatomical and physiological foundations of cerebello-hippocampal interaction. Elife 8:e41896.

      Yang G, Huard JM, Beitz AJ, Ross ME, Iadecola C. 2000. Stellate neurons mediate functional hyperemia in the cerebellar molecular layer. J Neurosci 20:6968–6973.

      Yang G, Iadecola C. 1998. Activation of cerebellar climbing fibers increases cerebellar blood flow: role of glutamate receptors, nitric oxide, and cGMP. Stroke 29:499–507; discussion 507-8.

      Yang G, Iadecola C. 1997. Obligatory role of NO in glutamate-dependent hyperemia evoked from cerebellar parallel fibers. Am J Physiol 272:R1155-61.

      Zhang Y, Forster C, Milner TA, Iadecola C. 2003. Attenuation of activity-induced increases in cerebellar blood flow by lesion of the inferior olive. Am J Physiol Heart Circ Physiol 285:H1177-82.

    1. Author response:

      eLife assessment:

      The manuscript establishes a sophisticated mouse model for acute retinal artery occlusion (RAO) by combining unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) with a silicone wire embolus and carotid artery ligation, generating ischemia-reperfusion injury upon removal of the embolus. This clinically relevant model is useful for studying the cellular and molecular mechanisms of RAO. The data overall are solid, presenting a novel tool for screening pathogenic genes and promoting further therapeutic research in RAO.

      Thank you for recognizing the sophistication and clinical relevance of our mouse model for acute retinal artery occlusion. We are grateful for your supportive feedback.

      Public Reviews:

      Reviewer #1:

      Summary:

      Wang, Y. et al. used a silicone wire embolus to definitively and acutely clot the pterygopalatine ophthalmic artery in addition to carotid artery ligation to completely block the blood supply to the mouse inner retina, which mimics clinical acute retinal artery occlusion. A detailed characterization of this mouse model determined the time course of inner retina degeneration and associated functional deficits, which closely mimic human patients. Whole retina transcriptome profiling and comparison revealed distinct features associated with ischemia, reperfusion, and different model mechanisms. Interestingly and importantly, this team found a sequential event including reperfusion-induced leukocyte infiltration from blood vessels, residual microglial activation, and neuroinflammation that may lead to neuronal cell death.

      Strengths:

      Clear demonstration of the surgery procedure with informative illustrations, images, and superb surgical videos.

      Two-time points of ischemia and reperfusion were studied with convincing histological and in vivo data to demonstrate the time course of various changes in retinal neuronal cell survivals, ERG functions, and inner/outer retina thickness.

      The transcriptome comparison among different retinal artery occlusion models provides informative evidence to differentiate these models.

      The potential applications of the in vivo retinal ischemia-reperfusion model and relevant readouts demonstrated by this study will certainly inspire further investigation of the dynamic morphological and functional changes of retinal neurons and glial cell responses during disease progression and before and after treatments.

      We sincerely appreciate your detailed and positive feedback. These evaluations are invaluable in highlighting the significance and impact of our work. Thank you for your thoughtful and supportive review.

      Weaknesses:

      It would be beneficial to the manuscript and the readers if the authors could improve the English of this manuscript by correcting obvious grammar errors, eliminating many of the acronyms that are not commonly used by the field, and providing a reason why this complicated but clever surgery procedure was designed and a summary table with the time course of all the morphological, functional, cellular, and transcriptome changes associated with this model.

      Thank you for your thorough review of the manuscript. We sincerely apologize for any grammatical errors resulting from our English language proficiency and have taken the necessary steps to polish the article. Additionally, we have heeded your advice and reduced the use of field-specific acronyms to enhance readability for both the manuscript and its readers.

      Regarding the rationale behind the design of the UPOAO model, we have provided a description in the Introduction section. Our group focuses on research into the pathogenesis and clinical treatment of RAO. The absence of an accurate mouse model simulating the retinal ischemic process has hampered progress in developing neuroprotective agents for RAO. To better simulate the retinal ischemic process and possible ischemia-reperfusion injury following RAO, we developed a novel vascular-associated mouse model called the unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) model. We drew inspiration from the middle cerebral artery occlusion (MCAO) model, which is widely used in cerebral ischemic injury research and guided the development of the UPOAO model.

      We appreciate your valuable suggestion regarding the inclusion of a summary table outlining the time course of morphological, functional, cellular, and transcriptome changes associated with this model. To address this, we intend to include a supplementary table at the end of the article, which will offer a comprehensive overview of the experimental results, thereby aiding in clarity and interpretation.

      Once again, we thank you for your insightful comments and suggestions, which have greatly contributed to the improvement of our manuscript.

      Reviewer #2:

      Summary:

      The authors of this manuscript aim to develop a novel animal model to accurately simulate the retinal ischemic process in retinal artery occlusion (RAO). A unilateral pterygopalatine ophthalmic artery occlusion (UPOAO) mouse model was established using silicone wire embolization combined with carotid artery ligation. This manuscript provided data to show the changes in major classes of retinal neural cells and visual dysfunction following various durations of ischemia (30 minutes and 60 minutes) and reperfusion (3 days and 7 days) after UPOAO. Additionally, transcriptomics was utilized to investigate the transcriptional changes and elucidate changes in the pathophysiological process in the UPOAO model post-ischemia and reperfusion. Furthermore, the authors compared transcriptomic differences between the UPOAO model and other retinal ischemic-reperfusion models, including HIOP and UCCAO, and revealed unique pathological processes.

      Strengths:

      The UPOAO model represents a novel approach to studying retinal artery occlusion. The study is very comprehensive.

      We greatly appreciate your positive assessment of our work and are encouraged by your recognition of its significance.

      Weaknesses:

      Some statements are incorrect and confusing. It would be helpful to review and clarify these to ensure accuracy and improve readability.

      We sincerely appreciate your meticulous review of the manuscript. Taking into account your valuable feedback, we will thoroughly address the inaccuracies identified in the revised version. Additionally, we will commit to polishing the article to ensure improved readability. We apologize for any confusion caused by these inaccuracies and genuinely thank you for bringing them to our attention.

    1. Author response:

      eLife assessment

      This valuable study reveals how a rhizobial effector protein cleaves and inhibits a key plant receptor for symbiosis signaling, while the host plant counters by phosphorylating the effector. The molecular evidence for the protein-protein interaction and modification is solid, though biological evidence directly linking effector cleavage to rhizobial infection is incomplete. With additional functional data, this work could have implications for understanding intricate plant-microbe dynamics during mutualistic interactions.

      Thank you for this helpful comment. In the revised manuscript version, we will be more prudent with directly linking cleavage of Nod factor receptors by NopT and rhizobial infection.

      We plan to modify the Title, the One-Sentence Summary, Abstract, and Discussion regarding this point.

      Public Reviews:

      Reviewer #1 (Public Review):

      Bacterial effectors that interfere with the inner molecular workings of eukaryotic host cells are of great biological significance across disciplines. On the one hand they help us to understand the molecular strategies that bacteria use to manipulate host cells. On the other hand they can be used as research tools to reveal molecular details of the intricate workings of the host machinery that is relevant for the interaction/defence/symbiosis with bacteria. The authors investigate the function and biological impact of a rhizobial effector that interacts with and modifies, and curiously is modified by, legume receptors essential for symbiosis. The molecular analysis revealed a bacterial effector that cleaves a plant symbiosis signaling receptor to inhibit signaling and the host counterplay by phosphorylation via a receptor kinase. These findings have potential implications beyond bacterial interactions with plants.

      Thank you for highlighting the broad significance of rhizobial effectors in understanding legume-rhizobium interactions. We fully agree with your assessment and will emphasize these points in the revised Introduction and Discussion sections of our manuscript. Specifically, we will expand our Discussion regarding the potential impact of the NopT interaction with symbiotic receptor kinases on plant immune signaling and regarding the general significance of our work.

      Bao and colleagues investigated how rhizobial effector proteins can regulate the legume root nodule symbiosis. A rhizobial effector is described to directly modify symbiosis-related signaling proteins, altering the outcome of the symbiosis. Overall, the paper presents findings that will have a wide appeal beyond its primary field.

      Out of 15 identified effectors from Sinorhizobium fredii, they focus on the effector NopT, which exhibits proteolytic activity and may therefore cleave specific target proteins of the host plant. They focus on two Nod factor receptors of the legume Lotus japonicus, NFR1 and NFR5, both of which were previously found to be essential for the perception of rhizobial Nod factor and the induction of symbiotic responses such as bacterial infection thread formation in root hairs and root nodule development (Madsen et al., 2003, Nature; Tirichine et al., 2003, Nature). The authors present evidence for an interaction of NopT with NFR1 and NFR5. The paper aims to characterize the biochemical and functional consequences of these interactions and the phenotype that arises when the effector is mutated.

      Thank you for your positive feedback on our manuscript. In the revised Introduction and Discussion sections, we plan to better emphasize the interdisciplinary significance of our work. We will show how the knowledge gained from our study can contribute to a better understanding of microbial interactions with eukaryotic hosts in general, which may have a stimulating effect on future research in various research areas such as pathogenesis and immunity.

      To ensure that readers can easily follow the rationale behind our experiments, we will improve the Results section and provide more detailed explanations of how NopT was selected among the 15 examined effectors. Additionally, we will provide more background information on NopT and the roles of NFR1 and NFR5 in symbiotic signaling in the Introduction section. As suggested, we will include the references Madsen et al. (2003) and Tirichine et al. (2003) as well as additional references on rhizobial NopT proteins in our revised manuscript version.

      Evidence is presented that in vitro NopT can cleave NFR5 at its juxtamembrane region. NFR5 also appears to be cleaved in vivo, and NFR1 appears to inhibit the proteolytic activity of NopT by phosphorylating NopT. When NFR5 and NFR1 are ectopically over-expressed in leaves of the non-legume Nicotiana benthamiana, they induce cell death (Madsen et al., 2011, Plant Journal). Bao et al. found that this cell death response is inhibited by the coexpression of nopT. Mutation of nopT alters the outcome of rhizobial infection in L. japonicus. These conclusions are well supported by the data.

      We appreciate that you recognize the value of our data.

      The authors present evidence supporting the interaction of NopT with NFR1 and NFR5. In particular, there is solid support for cleavage of NFR5 by NopT (Figure 3) and the identification of NopT phosphorylation sites that inhibit its proteolytic activity (Figure 4C). Cleavage of NFR5 upon expression in N. benthamiana (Figure 3A) requires appropriate controls (inactive mutant versions) that have been provided, since Agrobacterium as a closely rhizobia-related bacterium, might increase defense related proteolytic activity in the plant host cells.

      Thank you for recognizing the use of an inactive NopT variant in Figure 3A. In fact, increased activity of plant proteases induced by Agrobacterium is an important point that should not be neglected. We plan to mention this aspect in our revised Discussion.

      In the context of your comments, we are planning to make the following improvements to the manuscript:

      (1) We will add a more detailed description of the experimental conditions under which the cleavage of NFR5 by NopT was observed in vitro and in vivo.

      (2) We plan to provide more comprehensive data on the phosphorylation of NopT by NFR1, including phosphorylation assays and mass spectrometry results. These additional data support the proposed mechanism by which NFR1 inhibits the proteolytic activity of NopT.

      (3) We will expand the Discussion on the cell death response induced by ectopic expression of NFR1 and NFR5 in Nicotiana benthamiana. We will include more details from Madsen et al. (2011) to contextualize our findings with published literature.

      We believe these additions and clarifications will enhance the clarity and impact of our findings.

      Key results from N. benthamiana appear consistent with data from recombinant protein expression in bacteria. For the analysis in the host legume L. japonicus transgenic hairy roots were included. To demonstrate that the cleavage of NFR5 occurs during the interaction in plant cells the authors build largely on western blots. Regardless of whether Nicotiana leaf cells or Lotus root cells are used as the test platform, the Western blots indicate that only a small proportion of NFR5 is cleaved when co-expressed with nopT, and most of the NFR5 persists in its full-length form (Figures 3A-D). It is not quite clear how the authors explain the loss of NFR5 function (loss of cell death, impact on symbiosis), as a vast excess of the tested target remains intact. It is also not clear why a large proportion of NFR5 is unaffected by the proteolytic activity of NopT. This is particularly interesting in Nicotiana in the absence of Nod factor that could trigger NFR1 kinase activity.

      Thank you for your comments regarding the cleavage of NFR5 and its functional implications. In the revised version, we will change our manuscript taking into account the following considerations:

      (1) We acknowledge that the Western blots indicate only a small proportion of NFR5 is cleaved when co-expressed with NopT. It is worth noting in this context that the proteins were expressed at high levels which likely do not reflect the natural situation in L. japonicus. Low production of cleaved NFR5 in our Western blots with transformed N. benthamiana or L. japonicus cells thus may simply reflect an experimental effect due to high NFR5 protein synthesis. We suggest that the presence of high amounts of intact NFR5 does not have a significant functional impact on plant responses (cell death in N. benthamiana, rhizobial infection of L. japonicus), whereas NFR5 cleavage (or formation of NFR5 cleavage products) may be crucial for the observed phenotypic changes. The fraction of cleaved NFR5, although small, may be sufficient to disrupt crucial signaling pathways. We will address possible differences between experimental and natural protein levels in our revised Discussion.

      (2) In our work, we studied three biochemical aspects of NopT: (i) physical binding of NopT to NFR1 and NFR5, (ii) proteolytic cleavage of NFR5 by NopT, and (iii) phosphorylation of NopT by NFR1. These three biochemical properties appear to influence each other. Phosphorylation of NopT by NFR1 appears to reduce its proteolytic activity, thereby counteracting NFR5 degradation by NopT (NFR5 homeostasis). Moreover, as NopT is a phosphorylation substrate for NFR1, NopT probably interferes with kinase-mediated downstream responses of NFR1. Thus, NFR5 cleavage activity of NopT appears to be only one feature of NopT. We plan to mention these considerations in our revised Discussion.

      It is also difficult to evaluate how the ratios of cleaved and full-length protein change when different versions of NopT are present without a quantification of band strengths normalized to loading controls (Figure 3C, 3D, 3F). The same is true for the blots supporting NFR1 phosphorylation of NopT (Figure 4A).

      Thank you for pointing out this aspect. Following your recommendation, we will quantify the band intensities for cleaved and full-length NFR5 in the experiments with different versions of NopT. These values will be normalized to loading controls. Similarly, the Western blots supporting NFR1 phosphorylation of NopT will be quantified. The data for normalized band intensities will be included in the revised figures. The quantifications will provide a clearer understanding of how the ratios of cleaved to full-length proteins change with different NopT variants and will also show the extent to which NopT is phosphorylated by NFR1.
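
The normalization described above can be illustrated with a minimal sketch. It assumes densitometry readings in arbitrary units for the cleaved band, the full-length band, and the loading control of each lane. The lane names, numbers, and function names are hypothetical and are not data or methods from the study.

```python
def normalize_to_loading(band, loading_control):
    """Express a band intensity relative to the loading control of the same lane."""
    if loading_control <= 0:
        raise ValueError("loading control intensity must be positive")
    return band / loading_control


def cleaved_fraction(cleaved, full_length):
    """Fraction of the total NFR5 signal that sits in the cleaved band."""
    total = cleaved + full_length
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return cleaved / total


def summarize(lanes):
    """Return normalized intensities and the cleaved fraction for every lane."""
    summary = {}
    for name, lane in lanes.items():
        summary[name] = {
            "cleaved_norm": normalize_to_loading(lane["cleaved"], lane["loading"]),
            "full_norm": normalize_to_loading(lane["full_length"], lane["loading"]),
            "cleaved_fraction": cleaved_fraction(lane["cleaved"], lane["full_length"]),
        }
    return summary


# Hypothetical readings (arbitrary units), for illustration only.
example_lanes = {
    "NopT": {"cleaved": 20.0, "full_length": 180.0, "loading": 100.0},
    "inactive NopT": {"cleaved": 0.0, "full_length": 150.0, "loading": 75.0},
}

if __name__ == "__main__":
    for lane_name, values in summarize(example_lanes).items():
        print(lane_name, values)
```

Reporting both the loading-normalized intensities and the cleaved fraction keeps lane-to-lane loading differences separate from changes in the cleaved to full-length ratio.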

      It is clear that mutation of nopT results in a quantitative infection phenotype. Nodule primordia and infection threads are still formed when L. japonicus plants are inoculated with ∆nopT mutant bacteria, but it is not clear if these primordia are infected or develop into fully functional nodules (Figure 5). A quantification of the ratio of infected and non-infected nodules and primordia would reveal whether NopT is only active at the transition from infection focus to thread or perhaps also later in the bacterial infection process of the developing root nodule.

      Thank you for pointing this out. In the revised version of our manuscript, we will provide data showing that there are no obvious differences in nodule formation in plants inoculated with ∆nopT and wild-type NGR234, respectively. However, quantification of infection threads containing our GFP-labeled rhizobia in primordia and nodules would be difficult to perform due to strong autofluorescence signals in these tissues. The main goal of our study was to identify and characterize the interaction between NopT and Nod factor receptors. We therefore believe that an in-depth analysis of the bacterial infection process at later symbiotic stages is out of the scope of the present work.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript presents data demonstrating NopT's interaction with Nod Factor Receptors NFR1 and NFR5 and its impact on cell death inhibition and rhizobial infection. The identification of a truncated NopT variant in certain Sinorhizobium species adds an interesting dimension to the study. These data try to bridge the gaps between classical Nod-factor-dependent nodulation and T3SS NopT effector-dependent nodulation in legume-rhizobium symbiosis. Overall, the research provides interesting insights into the molecular mechanisms underlying symbiotic interactions between rhizobia and legumes.

      Strengths:

      The manuscript nicely demonstrates NopT's proteolytic cleavage of NFR5, regulated by NFR1 phosphorylation, promoting rhizobial infection in L. japonicus. Intriguingly, authors also identify a truncated NopT variant in certain Sinorhizobium species, maintaining NFR5 cleavage but lacking NFR1 interaction. These findings bridge the T3SS effector with the classical Nod-factor-dependent nodulation pathway, offering novel insights into symbiotic interactions.

      We appreciate that you recognize the value of our manuscript.

      Weaknesses:

      (1) In the previous study, proteolytically active NopT elicited a rapid hypersensitive reaction when transiently expressed alone in Nicotiana tabacum plants. However, this phenotype was not observed when expressing the same NopT in Nicotiana benthamiana (Figure 1A). Conversely, cell death and a hypersensitive reaction were observed in Figure S8. This raises questions about the suitability of the exogenous expression system for studying NopT proteolysis specificity.

      We appreciate your attention to these plant-specific differences. In view of your comments, we plan to revise the Discussion and explain the different expression systems used for studying NopT effects in planta. Previous studies showed that NopT expressed in tobacco (N. tabacum) or in specific Arabidopsis thaliana ecotypes (with PBS1/RPS5 genes) causes rapid cell death (Dai et al. 2008; Khan et al. 2022). Our data shown in Fig. S8 confirm these findings. As cell death (effector-triggered immunity) is usually associated with induction of protease activities, we considered N. tabacum and A. thaliana plants unsuitable for testing NFR5 cleavage by NopT. In fact, no NopT/NFR5 experiments were performed with these plants in our study. In contrast, the expression of NopT in Nicotiana benthamiana did not lead to cell death in our experiments. Khan et al. 2022 also reported that cell death does not occur in N. benthamiana unless the cells were transformed with PBS1/RPS5 constructs. Thus, N. benthamiana is a suitable expression system to analyze NopT protease activity on co-expressed substrates. Our revision aims to better explain the advantages of the N. benthamiana expression system for studying NopT-mediated proteolysis of NFR5.

      (2) NFR5 Loss-of-function mutants do not produce nodules in the presence of rhizobia in lotus roots, and overexpression of NFR1 and NFR5 produces spontaneous nodules. In this regard, if the direct proteolysis target of NopT is NFR5, one could expect the NGR234's infection will not be very successful because of the Native NopT's specific proteolysis function of NFR5 and NFR1. Conversely, in Figure 5, authors observed the different results.

      Our inoculation experiments clearly show that NopT of NGR234 has a negative effect on formation of infection foci (Fig. 5A) and nodule primordia (Fig. 5E). Our biochemical analysis indicates that NopT targets the NFR1/NFR5 complex, which most likely impairs activation of downstream responses such as NIN gene expression. Accordingly, NIN promoter activity was found to be higher in roots inoculated with the ∆nopT mutant as compared to the NGR234 wild-type (Fig. 5B and 5D). It is therefore plausible that NopT impairs rhizobial infection of L. japonicus due to inhibition of NFR1/NFR5 functions. We agree with this Reviewer that it can be expected that “NGR234's infection will not be very successful”. Fig. 5 confirms that the ∆nopT mutant is indeed a better symbiont and we do not think that we obtained “unexpectedly different results”. In the revised version, we will try to formulate our discussion text better in order to avoid any misunderstandings. Furthermore, we will use the figure title “NopT dampens rhizobial infection…” instead of “NopT regulates rhizobial infection…”. We are also considering changing the title of our manuscript.

      (3) In Figure 6E, the model illustrates how NopT digests NFR5 to regulate rhizobia infection. However, it raises the question of whether it is reasonable for NGR234 to produce an effector that restricts its own colonization in host plants.

      We acknowledge the potential paradox of NGR234 producing an effector that appears to restrict its own colonization in host plants. In fact, depending on the host plant, most rhizobial effectors are “double-edged swords” that play either a positive or negative role in the symbiosis. In response to your comment, we will discuss the possibility that NopT may confer selective advantages in interactions between NGR234 and host plants where NopT plays a positive symbiotic role (Dai et al. 2008; Kambara et al. 2009). Inhibition of NFR1/NFR5 functions by NopT in these host plants could be a feedback response in cells in which symbiotic signaling has already started. It is tempting to speculate that the interaction between NopT and Nod factor receptors reduces Nod factor perception and downstream signaling to avoid a possible overreaction of symbiotic signaling, which may result in hypernodulation or formation of empty nodules without bacteria. Furthermore, it is tempting to speculate that NopT targets not only Nod factor receptors but also other host proteins to promote symbiosis, e.g. by suppressing excessive immune responses triggered by hyperinfection of rhizobia. In our revised manuscript, we will highlight the need for further investigations to elucidate the precise mechanisms underlying the observed infection phenotype and the role of NopT in modulating symbiotic signaling pathways.

      (4) The failure to generate stable transgenic plants expressing NopT in Lotus japonicus is surprising, considering the manuscript's claim that NopT specifically proteolyzes NFR5, a major player in the response to nodule symbiosis, without being essential for plant development.

      Thank you for your comments. The failure to obtain L. japonicus plants constitutively expressing NopT was indeed surprising and suggests that NopT targets not only NFR5 but also other proteins in L. japonicus. The number of NopT substrates in plants could be greater than assumed. For example, we show in our work that NopT can cleave AtLYK5 and LjLYS11. In our manuscript, we do not provide protocols and data on our efforts to construct L. japonicus plants stably expressing NopT. Indeed, it cannot be completely ruled out that the observed failure is not due to NopT expression, but rather to other factors that influence the transformation and regeneration of explants into whole plants. Our results should therefore not be over-interpreted. We consider a discussion of our failed transformation experiments to be somewhat preliminary and not central to this manuscript. Therefore, we plan to modify our Discussion and delete the sentence reporting that stable transgenic plants expressing NopT have not been successfully generated.

    1. Author response:

      eLife assessment

      This useful study shows how genetic variation is associated with fecundity following a period of reproductive diapause in female Drosophila. The work identifies the olfactory system as central to successful diapause with associated changes in longevity and fecundity. While the genetic screening and methods used are solid, the approach to assessing diapause is incomplete and could benefit from additional orthogonal experiments.

      Response: We agree that, as with most studies, additional follow-up work will be informative.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper begins with phenotyping the DGRP for post-diapause fecundity, which is used to map genes and variants associated with fecundity. There are overlaps with genes mapped in other studies and also functional enrichment of pathways including most surprisingly neuronal pathways. This somewhat explains the strong overlap with traits such as olfactory behaviors and circadian rhythm. The authors then go on to test genes by knocking them down effectively at 10 degrees. Two genes, Dip-gamma and sbb, are identified as significantly associated with post-diapause fecundity, and they also find the effects to be specific to neurons. They further show that the neurons in the antenna but not the arista are required for the effects of Dip-gamma and sbb. They show that removing the antenna has a diapause-specific lifespan-extending effect, which is quite interesting. Finally, ionotropic receptor neurons are shown to be required for the diapause-associated effects.

      Strengths and Weaknesses:

      Overall I find the experiments rigorously done and interpretations sound. I have no further suggestions except an ANOVA to estimate the heritability of the post-diapause fecundity trait, which is routinely done in the DGRP and offers a global parameter regarding how reliable phenotyping is. A minor point is I cannot find how many DGRP lines are used.

      Response: Thank you for the suggestions. We screened 193 lines and we will add that information to the methods.

      Additionally, we will add the heritability estimate of the post-diapause fecundity trait.
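
As an illustration of the kind of estimate requested, the sketch below computes broad-sense heritability from a balanced one-way ANOVA across inbred lines, using H² = σ²_L / (σ²_L + σ²_E), where σ²_L is the among-line variance component and σ²_E the within-line (replicate) variance. This is one common approach and is an assumption here. The response does not state which model will be used, and the line names and values below are hypothetical.

```python
from statistics import mean


def broad_sense_heritability(lines):
    """Estimate H^2 from a balanced one-way ANOVA.

    lines maps a line identifier to a list of replicate measurements;
    every line must have the same number of replicates (at least two).
    """
    groups = list(lines.values())
    k = len(groups)
    if k < 2:
        raise ValueError("at least two lines are required")
    n = len(groups[0])
    if n < 2 or any(len(g) != n for g in groups):
        raise ValueError("each line needs the same number (>= 2) of replicates")

    grand_mean = mean(x for g in groups for x in g)
    # Mean square among lines and mean square within lines.
    ms_between = n * sum((mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (k * (n - 1))

    # Among-line variance component, truncated at zero.
    var_line = max((ms_between - ms_within) / n, 0.0)
    total = var_line + ms_within
    return var_line / total if total > 0 else 0.0


# Hypothetical post-diapause fecundity values for three lines, two replicates each.
example = {
    "line_A": [10.0, 12.0],
    "line_B": [20.0, 22.0],
    "line_C": [30.0, 32.0],
}

if __name__ == "__main__":
    print(round(broad_sense_heritability(example), 3))
```

An unbalanced design or additional factors such as block or vial would call for a mixed model rather than this closed-form estimate.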

      Reviewer #2 (Public Review):

      Summary

      In this study, Easwaran and Montell investigated the molecular, cellular, and genetic basis of adult reproductive diapause in Drosophila using the Drosophila Genetic Reference Panel (DGRP). Their GWAS revealed genes associated with variation in post-diapause fecundity across the DGRP and performed RNAi screens on these candidate genes. They also analyzed the functional implications of these genes, highlighting the role of genes involved in neural and germline development. In addition, in conjunction with other GWAS results, they noted the importance of the olfactory system within the nervous system, which was supported by genetic experiments. Overall, their solid research uncovered new aspects of adult diapause regulation and provided a useful reference for future studies in this field.

      Strengths:

      The authors used whole-genome sequenced DGRP to identify genes and regulatory mechanisms involved in adult diapause. The first Drosophila GWAS of diapause successfully uncovered many QTL underlying post-diapause fecundity variations across DGRP lines. Gene network analysis and comparative GWAS led them to reveal a key role for the olfactory system in diapause lifespan extension and post-diapause fecundity.

      Weaknesses:

      (1) I suspect that there may be variation in survivorship after long-term exposure to cold conditions (10 °C, 35 days), which could also be quantified and mapped using genome-wide association studies (GWAS). Since blocking Ir21a neuronal transmission prevented flies from exiting diapause, it is possible that natural genetic variation could have a similar effect, influencing the success rate of exiting diapause and post-diapause mortality. If there is variation in this trait, could it affect post-diapause fecundity? I am concerned that this could be a confounding factor in the analysis of post-diapause fecundity. However, I also believe that understanding phenotypic variation in this trait itself could be significant in regulating adult diapause.

      Response: We agree that it is possible that the ability to endure cool temperatures per se may influence post-diapause fecundity. However, cool temperature is the essential diapause-inducing condition in Drosophila, so it is not obvious how to separate those effects experimentally, and we agree that phenotypic variation in the cool-sensitivity trait itself could be significant in regulating diapause.

      (2) On p.10, the authors conclude that "Dip-γ and sbb are required in neurons for successful diapause, consistent with the enrichment of this gene class in the diapause GWAS." While I acknowledge that the results support their neuronal functions, I remain unconvinced that these genes are required for "successful diapause". According to the RNAi scheme (Figure 4I), Dip-γ and sbb are downregulated only during the post-diapause period, but still show a significant effect, comparable to that seen in the nSyb Gal4 RNAi lines (Figure 4K).

      Response: Our definition of successful diapause is the ability to produce viable adult progeny post-diapause, which requires that the flies enter, maintain, and exit diapause, alive and fertile. We will restate our conclusion to say that Dip-γ and sbb are required for post-diapause fecundity.

      In addition, two other RNAi lines (SH330386, 80461) that did not show lethality did not affect post-diapause fecundity.

      Response: We interpret those results to mean that those RNAi lines were not effective since Dip-γ and sbb are known to be essential.

      Notably, RNAi (27049, KK104056) substantially reduced non-diapause fecundity, suggesting impairment of these genes affects fecundity in general regardless of diapause experience. Therefore, the reduced post-diapause fecundity observed may be a result of this broader effect on fecundity, particularly in a more "sensitized" state during the post-diapause period, rather than a direct regulation of adult diapause by these genes.

      Response: Ubiquitous expression of RNAi lines #27049 or #KK104056 was lethal, so we included the tubGAL80ts repressor to prevent RNAi from taking effect during development. Flies had to be shifted to 30 °C to inactivate the repressor and thereby activate the RNAi. At 30 °C, fecundity of the controls (GFP RNAi lines #9331, KK60102) was also lower (average non-diapause fecundity = 12 and 19, respectively) and similar to #27049 or #KK104056. We also assessed the knockdown using Repo GAL4 and nSyb GAL4 and did not find a significant difference or decline in the non-diapause fecundity for #27049 and #KK104056 as compared to a nonspecific RNAi control (#54037).

      (3) The authors characterized 546 genetic variants and 291 genes associated with phenotypic variation across DGRP lines but did not prioritize them by significance. They did prioritize candidate genes with multiple associated variants (p.9 "Genes with multiple SNPs are good candidates for influencing diapause traits."), but this is not a valid argument, likely due to a misunderstanding of LD among variants in the same gene. A gene with one highly significantly associated variant may be more likely to be the causal gene in a QTL than a gene with many weakly associated variants in LD. I recommend taking significance into account in the analysis.

      Response: We agree with the reviewer, and in Supplemental Table S3 we list top-associated SNPs in order from the lowest (most significant) p-value. Most of the top-associated genes from this analysis were uncharacterized CG numbers for which there were insufficient tools available for validation purposes. Nevertheless, there is overlap between the highly significant genes by p-value and those with multiple SNPs. Amongst the top 15 genes with multiple associated SNPs, CG18636 & CR15280 ranked 3rd by p-value, CG7759 ranked 4th, CG42732 ranked 10th, and Drip ranked 30th (all above the conservative Bonferroni threshold of 4.8e-8), while three Sbb-associated SNPs also appear in Table 3 above the standard e-5 threshold.
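
The ranking and thresholding described above can be sketched as follows. A Bonferroni threshold is the family-wise error rate divided by the number of tests. The response gives the threshold (4.8e-8) but not the number of tests or the error rate, so the value of 0.05 used below is an assumption, and the variant names and p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def implied_number_of_tests(alpha, threshold):
    """Number of tests implied by a reported Bonferroni threshold."""
    return round(alpha / threshold)


def rank_by_pvalue(variants):
    """Sort (name, p_value) pairs from most to least significant."""
    return sorted(variants, key=lambda item: item[1])


def passing(variants, threshold):
    """Names of variants whose p-value is below the threshold, best first."""
    return [name for name, p in rank_by_pvalue(variants) if p < threshold]


# Hypothetical variants, for illustration only.
example_variants = [("variant_1", 1e-9), ("variant_2", 1e-6), ("variant_3", 3e-8)]

if __name__ == "__main__":
    # Assuming alpha = 0.05, a threshold of 4.8e-8 implies roughly a million tests.
    print(implied_number_of_tests(0.05, 4.8e-8))
    print(passing(example_variants, 4.8e-8))
    print(passing(example_variants, 1e-5))
```

Ranking by p-value first and only then counting variants per gene avoids favoring genes that carry many weakly associated variants in linkage disequilibrium, which is the concern raised by the reviewer.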

      Reviewer #3 (Public Review):

      Summary:

      Drosophila melanogaster of North America overwinters in a state of reproductive diapause. The authors aimed to measure 'successful' D. melanogaster reproductive diapause and reveal loci that impact this quantitative trait. In practice, the authors quantified the number of eggs produced by a female after she exited 35 days of diapause. The authors claim that genes involved with olfaction in part contribute to some of the variation in this trait.

      Strengths:

      The work used the powerful platform of the fly DGRP/GWAS. The work tried to verify some of the candidate loci with targeted gene manipulations.

      Weaknesses:

      Some context is needed. Previous work from 2001 established that D. melanogaster reproductive diapause in the laboratory suspends adult aging but reduces post-diapause fecundity. The work from 2001 showed the extent fecundity is reduced is proportional to diapause duration. As well, the 2001 data showed short diapause periods used in the current submission reduce fecundity only in the first days following diapause termination; after this time fecundity is greater in the post-diapause females than in the non-diapause controls.

      Response: The 2001 paper by Tatar et al. reports the number of eggs laid after 3, 6, or 9 weeks in diapause conditions. Thus the diapause conditions used in this study (35 days or 5 weeks) are neither short nor long, rather intermediate. Does the reviewer have a specific concern?

      In this context, the submission fails to offer a meaningful concept for what constitutes 'successful diapause'. There is no biological rationale or relationship to the known patterns of post-diapause fecundity. The phenotype is biologically ambiguous.

      Response: We have unambiguously defined successful diapause as the ability to produce viable adult progeny post-diapause. Other groups have measured % of flies that arrest ovarian development or % of post-diapause flies with mature eggs in the ovary, or # eggs laid post-diapause; however we suggest that # of viable adult progeny produced post-diapause is more meaningful than the other measurements from the point of view of perpetuating the species.

      I have a serious concern about the antenna-removal design. These flies were placed on cool/short days two weeks after surgery. Adults at this time will not enter diapause, which must be induced soon after eclosion. Two-week-old adults will respond to cool temperatures by 'slowing down', but they will continue to age on a time scale of day-degrees. This is why the control group shows age-dependent mortality, which would not be seen in truly diapaused adults. Loss of antennae increases the age-dependent mortality of these cold adults, but this result does not reflect an impact on diapause.

      Response: The reviewer has a point. We carried out the lifespan study under two different conditions: either by removing the antenna and moving the flies directly to 10 °C or by removing the antenna and allowing a “wound healing” period prior to moving the flies to 10 °C (out of concern that the flies might have died quickly because wound healing may be impaired at 10 °C). In both cases, lifespan was shortened. We will add a discussion of the technical limitations of this experiment.

      • Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The work falls well short of its aim because the concept of 'successful diapause' is not biologically established. The paper studies post-diapause fecundity, and we don't know what that means. The loci identified in this analysis segregate for a minimally constructed phenotype. The results and conclusions are orthogonal.

      Response: It is unclear to us why the reviewer has such a negative opinion of measuring post-diapause fecundity, specifically the ability to produce viable progeny post-diapause. The value of this measurement seems obvious from the point of view of perpetuating the species.

      • The likely impact of the work on the field, and the utility of the methods and data to the community.

      The work will have little likely impact. Its phenotype and operational methods are weakly developed. It lacks insight based on the primary literature on post-diapause. The community of insect diapause investigators are not likely to use the data or conclusions to understand beneficial or pest insects, or the impact of a changing climate on how they over-winter.

      Response: The reviewer has not explained why his/her opinion is so negative.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We responded to reviewer comments in a previous iteration of this draft, edited the manuscript accordingly, and have no further comments on the majority of them. However, we performed additional analyses mainly in response to weaknesses Reviewer 1 highlighted related to “one shortcoming [being] the lack of a conceptual model explaining the results”, and the eLife assessment stating “the study falls short of providing a cogent interpretation of key findings, which could be of great interest and utility”. We provide a conceptual explanation that ties together many of our results, which we demonstrate using real data and further explore using simulated data – these analyses are in a new section titled “Increase in PGS effect for increasing percentiles of BMI itself, and its relation to R2 differences when stratifying by covariates”, with the Discussion also being updated accordingly.

      Essentially, we demonstrate that the effect of PGSBMI increases as BMI itself increases (using quantile regression – newly created Figure 5). This finding helps explain the correlation between covariate main effects, interaction effects, and maximum R2 differences when stratifying on different covariates, and also why any one or combination of covariates did not seem to be of unusual interest. While this result readily explains why covariates with larger main effects have larger interaction effects, by itself it does not seem to explain the differences in R2 in covariate-stratified bins, but we show using portions of real data and simulated data that in the case of this study they are closely related.
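      The quantile-regression idea described above can be illustrated with a minimal sketch. This is not the authors' analysis or data: the synthetic points, the noise model, and the helper names `pinball_loss` and `quantile_line` are all assumptions made for illustration. The noise scale is made to grow with a hypothetical score, so the fitted slope rises from the lower to the upper quantiles of the outcome.

```python
from itertools import combinations
from statistics import NormalDist


def pinball_loss(points, intercept, slope, tau):
    """Total quantile (pinball) loss of the line y = intercept + slope * x."""
    total = 0.0
    for x, y in points:
        resid = y - (intercept + slope * x)
        total += tau * resid if resid >= 0 else (tau - 1.0) * resid
    return total


def quantile_line(points, tau):
    """Brute-force quantile regression for a single predictor.

    Some optimal line passes through two data points, so it is enough
    to score the line through every pair of points with distinct x.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = pinball_loss(points, intercept, slope, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Synthetic, deterministic data (an assumption for illustration only):
# residuals are fixed normal quantiles whose scale grows with the score.
z_values = [NormalDist().inv_cdf((k + 0.5) / 9) for k in range(9)]
scores = [i / 9 for i in range(10)]
points = [(x, 1.0 + x + (0.5 + 0.3 * x) * z) for x in scores for z in z_values]

# Slope of the outcome on the score at the 10th, 50th and 90th percentiles.
slopes = {tau: quantile_line(points, tau)[1] for tau in (0.1, 0.5, 0.9)}
for tau, slope in slopes.items():
    print(f"tau={tau}: slope={slope:.3f}")
```

      In this toy setting the slope is smallest at the 10th percentile and largest at the 90th, which is the qualitative pattern the response describes for the score's effect across percentiles of the phenotype.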

      Effectively, as the effect of PGSBMI increases, variance in the phenotype will also increase – so long as the residuals do not increase proportionately, this causes R2 to also increase as R2 directly depends on outcome variance. We demonstrate this using simulated data (newly created S Figure 2) and real data (newly created S Figure 3). So the largest R2 differences between certain covariate-stratified bins seem to be a direct consequence of those covariates also having the largest PGSBMI*covariate interaction effects. These results tie into our previous response to Reviewer 1, where essentially there is not only heteroskedasticity in the relationship between PGSBMI and BMI, but a cause of the heteroskedasticity is an increasing effect of PGSBMI as BMI itself increases.
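      The relationship between effect size and R2 stated above can be checked with a small sketch. It assumes a simple linear model, outcome = beta * score + noise, with independent noise. The function names, the parameter values, and the simulated data are hypothetical and are not taken from the manuscript.

```python
import random


def expected_r2(beta, var_x=1.0, resid_sd=1.0):
    """Population R2 for y = beta * x + e, with Var(x) = var_x and SD(e) = resid_sd."""
    explained = beta ** 2 * var_x
    return explained / (explained + resid_sd ** 2)


def simulated_r2(beta, resid_sd=1.0, n=5000, seed=0):
    """Squared sample correlation between x and y in simulated data."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [beta * x + rng.gauss(0.0, resid_sd) for x in xs]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


betas = [0.5, 1.0, 2.0]

# Residual SD held fixed: R2 rises with the effect size.
fixed_noise = [expected_r2(b) for b in betas]

# Residual SD grows in proportion to the effect: R2 stays constant.
proportional_noise = [expected_r2(b, resid_sd=2.0 * b) for b in betas]

print(fixed_noise)
print(proportional_noise)
print(simulated_r2(1.0))
```

      With the residual spread held fixed, a larger effect gives a larger R2. When the residual spread scales with the effect, R2 does not change, which matches the caveat about residuals not increasing proportionately.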

      In the Discussion, we highlight several broad implications of these findings. First, these results may, in part, provide a generalizable explanation for epistasis, as the effect of a PGS (or any individual SNP) seems to depend on phenotype, and as phenotype depends on many SNPs, the effect of PGS and individual SNPs depends on other SNPs. Second, these results may also provide a generalizable explanation for GxE, as demonstrated in this paper: interaction effects for SNPs (or a PGS) may largely depend on the phenotypic value itself, rather than any specific environment(s) or combination of environments. Finally, related to our previous response to Reviewer 2, modeling effects of SNPs dependent on phenotype itself would almost certainly result in gains in PGS performance (and locus discovery), which should also be larger than e.g., just GxAge effects as we demonstrated in this manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful reading of our manuscript and their constructive comments. We have significantly improved the writing, consolidated figures, and included new experiments (see below). We now center the manuscript on the methods used and have updated the title to reflect this new emphasis. We have also added quantification with statistics, as described below. A detailed description of our improvements is provided below.

      New data figures:   

      • Fig 3 – fig supp 2 – new experiment with insulin-triggered endocytosis of InsR

      • Fig 3 – fig supp 3 – new experiments, all using the same protein construct

      • Fig 3 – movie – new experiment with insulin-triggered endocytosis of InsR

      • Fig 4 – added new vehicle-only negative control experiments

      • Fig 5 – fig supp 1 – new negative control experiments with sequential exposures to 750 nm light

      Added figure panels with quantification/statistics for: Fig. 1F; Figure 1 – figure supp 2B; Figure 2B, D; Fig. 2 – fig supp 1B, D; Fig 2 – fig supp 2B; Fig 2 – fig supp 3B.

      Reviewer #1:

      (1) The paper might benefit from a more streamlined structure and a clearer emphasis on its findings. A possible way to enhance its impact might be to focus more on its methodological aspects. The methodological facets stand out as both innovative and impactful.

      We thank the reviewer for this suggestion and have rewritten the manuscript to center the methods, with our applications to TRPV1 and the InsR serving as examples.

      (2) Line 243: Please provide a reference for Tet3-Bu or clarify its origin in this study. A concise description would be helpful.

      The Jang et al., 2020 and Jana et al., 2023 studies are cited and give the structure of Tet3-Bu in Figure 3A.

      (3) Consider merging Figures 1 and 2 for clarity.  

      Because the cell types and constructs expressed differ for the figures, we did not merge them. However, we moved Figure 1 to the supplement because it repeats previously published data.

      (4) Lines 281 and 293 should refer to Figure 5C, not 5B.  

      This is now corrected.

      (5) Should the paper pivot towards methodology, combining Figures 6 and 7 might be more coherent. 

      The experiments in Figures 6 and 7 are different, making it difficult to merge them. However, Figures 7 and 8 describe the same experimental approach applied to two different membrane proteins. To align with our new focus on the methods and deemphasis of the biological system, we have merged Figures 7 and 8.

      (6) A brief discussion comparing the cell surface labeling techniques and the merits of the presented system would offer valuable context.

      We agree that additional discussion here would be helpful but were also trying to satisfy Reviewer #3’s request to reduce review-like content that disrupts the flow of the primary results. We therefore did not add a discussion of cell-surface labeling techniques.

      Reviewer #2:

      (1) To monitor the phosphatidylinositol-3,4,5-trisphosphates, the pleckstrin homology (PH) domain from Akt was used. This PH domain is not specific for just PI(3,4,5)P3 as stated by the authors. The Akt PH domain also binds PI(3,4)P2. The observed PI3K localization increase will also increase PI(3,4)P2 concentrations so the observed responses may not be solely because of PI(3,4,5)P3…

      …Repeating the PH domain experiments with a PH domain that is specific for just PI(3,4,5)P3, like GRP1 or Btk, would be useful to separate out any contributions from PI(3,4)P2.

      We have repeated key experiments demonstrating optogenetic activation of PI3K with the Grp1-PH domain and included these data in Figure 1-figure supplement 2.

      (2) The data in Figure 4 supplement was confusing to interpret since it is unclear whether a membrane protein with the Tet3 is being expressed at the same time as the ncAA for labeling or if the observed labeling is endogenous. If the observed labeling in Figure 4 supplement D is endogenous, then significant concerns come up regarding the background labeling of the sTCO-sulfo-Cy5 used in the rest of the experiments.

      We have updated the data in this figure using the same protein (InsR-Tet3-Bu-GFP) for every sTCO-conjugated dye tested. The protein is also labelled with GFP, making it clear which cells in the field were transfected and which were not. The new panels showing the bright field images for each field further aid readers in identifying untransfected cells. We believe the new presentation addresses the reviewer’s concerns about distinguishing sTCO labeling of Tet3-Bu-incorporating protein from labeling of endogenous proteins.

      (3) I recommend reorganizing the article to be more linear. For example, Figure 4 is not fully explained until after Figure 4 supplement and Figure 5. This non-linear organization required a lot of back and forth reading to fully understand the logic of the experiments as well as the conclusions. 

      We have improved the presentation along the lines suggested by the reviewer.

      (4) The InsR data is interesting as a proof of concept however the writing around the InsR looks like an afterthought. The explanation for why InsR is chosen, what is known and unknown about its trafficking is given secondary importance in the writing but not in the figures. This difference weakens the article.  

      We have improved the presentation along the lines suggested by the reviewer.

      (5) Line 244 should read Figure 4A.  

      This is now corrected.

      (6) Line 281 should read Figure 5C.  

      This is now corrected.

      (7) Line 645. Fig 4, says C and E were shown as inverted b&w images when they aren't.  

      This is now corrected.

      (8) Fig 8. Line 702. States that these are TRPV1 positive cells but the figure is about InsR.

      This is now corrected.

      Reviewer #3:

      (1) The Results section is lengthy and disorganized. Consider revising it for better clarity and conciseness. For instance, moving lines 157 and 166-170 to the Discussion or Methods section can streamline the Results section.  

      We have improved the presentation along the lines suggested by the reviewer.

      (2) Provide more specificity in reporting: In lines 139-170, clarify why you chose to use PhyB and this particular technique. Eliminate extraneous details and maintain a more concise narrative.

      We have improved the presentation along the lines suggested by the reviewer.

      (3) Avoid excessive review-like content, and keep the Results section focused on presenting novel findings. Simplify lines 173-185 to provide a straightforward presentation of results rather than extensive references to previous work.

      We have improved the presentation along the lines suggested by the reviewer.

      (4) Reevaluate lines 196-204 to determine if they are best suited for the Results section or if they could be moved to the Discussion or Methods for improved focus.

      We have improved the presentation along the lines suggested by the reviewer.

      (5) 231-238, revise the content to be more concise and directly to the point.  

      We have improved the presentation along the lines suggested by the reviewer.

      (6) Limit the number of figures to a maximum of five and restructure them to enhance readability. Consider consolidating panels from Figures 1 (which replicates previously published work), 2, and 3 into a single figure to improve organization and information flow.

      See response to Reviewer #1, Comment #3. Although we did not merge Figures 2 and 3, we have consolidated the text to improve its flow.

      (7) Move Fig 5, which depicts control experiments, to supplementary information to improve the overall flow of the paper. Also, Figure 5 comes in the text before Figure 4 C-F and before Figure 4- supp1, so placing it in supplementary information would fix this issue. 

      We have moved this figure to the supplement as Figure 3 – figure supplement 1.

      (8) Merge Figures 6, 7, and 8 (or at least 7 and 8) to facilitate the comparison of data obtained with different proteins or conditions.  

      We have merged Figures 7 and 8.

      (9) Line 303: when referring to the chemical structure of sTCO-sulfo-Cy5, refer to Figure 4 Supp 1 and not Figure 9. Alternatively, consider moving Fig 9 to supplementary information or placing it earlier in the figure list.  

      We now refer to the earlier supplemental figure when describing the structure of sTCO-sulfo-Cy5.

      (10) Ensure proper referencing of Figure 4E in the text, particularly since it's vital to understanding the selection of mutation sites for the Insulin receptor, as discussed in lines 392-400. 

      We have made this correction.

      (11) Maintain citation consistency by verifying that all references cited in the text, including those in the Introduction, Results, and Discussion sections, are included in the References list at the end of the paper.

      We have reviewed all our citations for consistency.

      The reviewer is also concerned by the lack of any statistical analyses, and of appropriate control experiments:

      (1) The trapping of PI3K at the plasma membrane, shown in Figure 3 supplementary 1, is not very convincing. It is unclear whether PI3K is trapped at the membrane, as claimed by the authors, or whether PI3K slowly accumulates at the membrane independently of the light stimulation. Indeed, the baseline fluorescence isn't flat to start with (especially in F-11 cells), and the change in fluorescence under 650 nm light is very modest, much weaker, in fact, than in control experiments without TRPV1 (Figure 2C). Do the authors observe a similar drift in fluorescence in absence of photostimulation at 650 nm? Such control experiment needs to be performed and discussed. More importantly, authors need to provide quantitative (and not just qualitative) measures of the changes in fluorescence observed in the different conditions, and run adequate statistical analyses to compare the different conditions (for all the figures of the manuscript where this applies).  

      We can see that the language of “trapped at the membrane” is more of an interpretation than a description. We now describe this result as a lack of dissociation of PIF-iSH2 from the membrane in response to 750 nm light. We more clearly explain our interpretation and label it as speculative.

      (2) Consider moving Figure 3 Supplementary 1 from supplementary information to the main document due to its importance. It seems like an important finding to me, and I believe also to the authors, who wrote a whole paragraph on PI3K trapping in the discussion section (lines 361-380).  

      We agree that the results from this figure are important. To better align with the request of all reviewers to shorten the manuscript and reduce the number of figures in the main text, however, we have left the figure in the supplement.

      (3) Figure 3: why is the increase in IP3 levels not reversible as in Figure 2? Is this because IP3 is detected only at the membrane level (TIRF experiment) and not the entire cell? Authors should comment on this aspect. 

      As described in response to Comment #2, we now better explain our interpretation. Briefly, we speculate that the PIF-iSH2 that encounters TRPV1 in the plasma membrane binds to the ankyrin repeat domain of TRPV1 and, therefore, does not readily dissociate from the membrane in response to 750 nm light.

      (4) Figure 4E: Verify the functionality of the Insulin receptor mutants, as was done for TRPV1.  

      We have added new experiments to demonstrate that the insulin receptor incorporating Tet3-Bu is functional. Because the insulin receptor is not electrogenic, we could not use electrophysiology to validate its function. Instead, we measured the insulin-dependent endocytosis of the receptor. These data are now presented in Figure 3 – figure supplement 2 and Figure 3 – supplemental movie.

      (5) Figures 6 to 8: The authors quantify the change in plasma membrane expression of TRPV1 and insulin receptors after NGF treatment (or photoactivation), but an important control experiment is missing. They first label cells with sulfo-Cy5, then treat them with NGF (or photoactivate them with 650 nm light), and then label them again with sulfo-Cy5, supposedly to label only the TRPV1 receptors that newly arrived at the membrane. However, we have no evidence that the first sulfo-Cy5 labeling (1 uM, 5 min) was complete. In fact, labeling with sulfo-Cy5 (200 nM) in Figure 4 never reaches saturation, not even after 20 min. The authors need to control for this, by comparing the change in fluorescence with and without NGF treatment. The GFP control is simply not sufficient. Also, include Figure 8 in the text, as it is missing from the results section, and discuss the results in more detail. Indeed, the current data is appealing as it suggests that what was observed with TRPV1 is also true for the Insulin receptor, but without a proper control this could just be an artefact.  

      We have performed several new control experiments to address the reviewer’s concerns. (1) For NGF-induced increase in TRPV1 at the plasma membrane, we repeated the experiment using a vehicle instead of NGF. These data, added to Figure 4E, demonstrate that the increase in plasma membrane TRPV1 depends on NGF. (2) For the light-activated increase in plasma membrane TRPV1, we repeated the experiment using a second exposure to the deactivating 750 nm light instead of the activating 650 nm light and added the data as Figure 5, figure supplement 1A-E. These new data demonstrate that the increase in plasma membrane TRPV1 occurred only in response to the activating wavelength of light. (3) To address the same concern for the insulin receptor, we repeated the insulin receptor experiments, also using a second exposure to the deactivating wavelength of light. These data are now shown in Figure 5, figure supplement 1F-I and demonstrate that the increase in the insulin receptor levels in the plasma membrane required the activating wavelength of light.

      (6) Line 313: "Importantly, sTCO-sulfo-Cy5 did not appear to equilibrate across the cell membrane and did not label untransfected cells (i.e., those without GFP; Figure 4 - figure supplement 1)". I don't see where the absence of labeling of untransfected cells is shown. The authors should show fluorescence changes on the surface of both transfected and untransfected cells and, as discussed above, quantify the data and provide statistical analyses.

      See response to Reviewer #2, Comment #2.

      Minor Comments:

      (1) Define « PM » and « RTK » in abstract

      We have made the requested changes.

      (2) Consider presenting the signaling pathways defined in the introduction in a scheme to improve readability.  

      We have added the signaling pathways defined in the introduction to Figure 1A.

      (3) In Figure 1A, include the CAAX lipidation signal in the schematic representation.  

      We had already shown the lipidation itself, but we have added the lipidation signal as a magenta star, with its meaning explained in the figure legend. We hope the reviewer finds this useful.

      (4) Terminology clarification: Given the broad readership of eLife, provide clearer explanations for terms and techniques used, such as the function of PIF (line 144).

      We define the acronym PIF in the text, but do not further elaborate on the biological function of PIF to align with other reviewers’ requests that we reduce the review-type material in the manuscript.

      (5) Correct "m-1s-1" to "M-1s-1" in line 119.  

      This is now corrected.

      (6) Replace "activate" with "activation" in line 122.  

      This is now corrected.

      (7) Indicate 650 nm and 750 nm next to the arrows in Figure 2B for reader clarity.  

      We have added the requested arrow labels.

      (8) Correct Figure 5A to Figure 4A in line 244.  

      This is now corrected.

      (9) Correct Figure 5B to Figure 5C in line 293.  

      This is now corrected.

      (10) In lines 274, 293, 312 and 329, clearly specify which panels of the referenced figures are being discussed to avoid confusion. 

      We have now clearly specified which panels are being referenced.

      (11) Figure 1B: it is unclear how long after 650 nm light switching the image is taken. The red bar indicating 650 nm light makes it look like the image is taken right after light switching, which would suggest that PIF-YFP trafficking to the membrane takes milliseconds in response to 650 nm light. However, the legend says that photoactivation kinetics are in the range of 10 seconds. Please accurately position the red bar in Figure 1B to reflect the time between light switching and imaging, and specify the time between light switching and imaging in the figure legend.  

      We have more accurately shown the timing of image acquisition in what is now Figure 1, figure supplement 1.

      (12) Please add a merged image for all the immune data figure.

      We are uncertain about which figures the reviewer is referring to. We do not have any immunohistochemistry in the manuscript.  

      (13) Line 205: "we found that expression of TRPV1 trapped PIF-iSH2 at the PM upon stimulation with 650 nm light, so that it no longer translocated to the cytoplasm in response to 750 nm light (Figure 3B and Figure 3 - figure supplement 1A)." This is shown in the supplementary figure but not in Figure 3B. Same issue with the following sentence.  

      We have corrected the figure references in the text.

      (14) For Figures 7 and 8, the authors state "We next asked whether click chemistry labeling could be executed in cells in which we also used the PhyB/PIF machinery for activating PI3K." Is this really the main motivation for conducting these experiments?

      Good point. We have improved the writing around this issue.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study identifies differential Orsay virus infection of C. elegans when animals are fed on different bacteria. The evidence for this is, however, incomplete, as experiments to control for feeding rate and bacterial pathogenicity are needed as well as direct quantification of viral load.

      We appreciate that the editors and reviewers felt that our manuscript addressed an important problem. We appreciate the constructive critiques provided by the reviewers and have worked to address all of the concerns, including a number of additional experiments as indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This manuscript explores the importance of food type on virus infection dynamics using a nematode virus as a model system. The authors demonstrate that susceptibility to viral infection can change by several orders of magnitude based on the type of bacterial food that potential hosts consume. They go on to show that, for the bacterial food source that reduces susceptibility, the effect is modulated by quorum sensing molecules that the bacteria produce. 

      Strengths: 

      This manuscript shows convincingly that nematode susceptibility to viral infection changes by several orders of magnitude (i.e. doses must be increased by several orders of magnitude to infect the same fraction of the population) depending on the bacterial food source on which hosts are reared. The authors then focus on the bacteria that reduce host susceptibility to viral infection and demonstrate that certain bacterial quorum-sensing compounds are required to see this effect of reduced susceptibility. Overall, sample sizes are large, methods are generally rigorous, experiments are repeated, and patterns are clear. 

      Weaknesses: 

      Although the molecular correlate of reduced susceptibility is identified (i.e. quorum sensing compounds) the mechanisms underlying this effect are missing. For example, there are changes in susceptibility due to altered nutrition, host condition, the microbiome, feeding rate, mortality of infected hosts, etc. In addition, the authors focus almost entirely on the reduction in susceptibility even though I personally find the increased susceptibility generated when reared on Ochrobactrum to be much more exciting. 

      I was a bit surprised that there was no data on basic factors that could have led to reductions in susceptibility. In particular, data on feeding rates and mortality rates seem really important. I would expect that feeding rates are reduced in the presence of Pseudomonas. Reduced feeding rates would translate to lower consumed doses, and so even though the same concentration of virus is on a plate, it doesn't mean that the same quantity of virus is consumed. Likewise, if Pseudomonas is causing mortality of virus-infected hosts, it could give the impression of lower infection rates. Perhaps mortality rates are too small in the experimental setup to explain this pattern, but that isn't clear in the current version of the manuscript. Is mortality greatly impacted by knocking out quorum-sensing genes? Also, the authors explored susceptibility to infection, but completely ignored variation in virus shedding. 

      We have added data on feeding rates (Line numbers 141-148 and 176-182, Supplementary Figure 4). After six hours of exposure no differences in feeding rate were observed. After 24 hours minor differences emerged between O. vermis MYb71 and each Pseudomonas species, however feeding rate inversely correlated with susceptibility to Orsay virus in that O. vermis MYb71 displayed the lowest feeding rate while P. aeruginosa PA14 displayed the highest feeding rate.

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is dependent upon temperature and as our assays are done at 20°C rather than 25°C this may account for reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions and these two bacteria led to the shortest lifespan compared to the other tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although it does not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.  

      The reviewer is correct to assert that differences in viral shedding could exist. However, our susceptibility assays using exogenous Orsay virus remove this source of variation and yet we still observe the same trends such that O. vermis MYb71 promotes infection while P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 attenuate infection. Further we measured the amount of virus shed into the lawns in the presence of different bacteria and did not observe differences in shed virus that could account for the differences we observe in incidence proportion (Line numbers 241-254, Fig. 3 F). Viral stability could be an issue in both the transmission and susceptibility assays. We therefore tested viral stability in the presence of E. coli, P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 and successfully recovered virus from all lawns, suggesting virus is not rapidly degraded in the presence of any bacterium (Fig. 3D and 3E). However, we noted that the recovery of Orsay virus from lawns of E. coli OP50 and P. lurida MYb11 within 30 minutes was decreased compared to a spike-in control suggesting recovery from each lawn is not equivalent. This complicates a comparison of viral stability and shedding rates between different bacteria, but our ability to recover substantial amounts of virus in the shedding assay from the three Pseudomonas strains we examined precludes a substantial decrease in shedding rates as an explanation for the robust attenuation of Orsay virus observed in transmission assays.  

      I was also curious why the authors did not further explore the mechanism behind the quorum-sensing effect. Not sure whether this is possible, but would it be possible to add spent media to the infection plates where the spent media was from Pseudomonas that produce the quorum sensing compound but the plates contain OP50, Pseudomonas, or the quorum sensing knockout of Pseudomonas? That would reveal whether it is the compound itself vs. something that the compound does.

      We observed that quorum sensing mutants suppressed the attenuation of Orsay virus infection and we agree that this could be a consequence of the compounds themselves, or more likely an effect of the downstream consequences of quorum signaling. We added culture supernatant from each bacterium to lawns of E. coli OP50 to assess the effect on host susceptibility and did not observe any potent effect (Line numbers 311-318, Supplementary Figure 9). This supports an interpretation that it is not the compound itself that is responsible, however we cannot rule out that the compounds themselves may be responsible if provided at a higher concentration.

      In addition, I was surprised by how much focus there was on the attenuation of infection and how little there was on the enhancement of infection. To me, enhancement seems like the more obvious thing to find a mechanism for -- is the bacteria suppressing immunity, preventing entry to gut cells, etc? 

      We are also intrigued by the enhancement of infection by Ochrobactrum spp.; however, we chose to focus on attenuation given the availability of Pseudomonas aeruginosa genetic mutants for study. We have added data (Line numbers 371-402, Figure 7, and Supplemental Figure 12) that inform our current hypothesis regarding Ochrobactrum-mediated enhancement of Orsay virus infection.

      I was a bit concerned about the "arbitrary units", which were used without any effort to normalize them. David Wang and Hongbing Jiang have developed a method based on tissue culture infectious dose 50 (TCID50) that can be used to measure infectious doses in a somewhat repeatable way. Without some type of normalization, it is hard to imagine how this study could be repeated. The 24-hour time period between exposure and glowing suggests very high doses, but it is still unclear precisely how high. Also, it is clear that multiple batches of virus were used in this study, but it is entirely unclear how variable these batches were. 

      We have clarified that we also measured the (TC)ID50 for every batch of virus used similar to the methods suggested by the Wang laboratory (Line numbers 107-119 and 499-506). We have added a figure showing the virus batch variability for all batches used in this study (Supp. Fig. 2). We have further clarified that the arbitrary units correspond to the actual microliters of viral filtrate used during infection and provided clear methods to replicate our viral batch production to assist with issues of reproducibility (Line numbers 107-119 and 499-506).

      The authors in several places discuss high variability or low variability in incidence as though it is a feature of the virus or a feature of the host. It isn't. For infection data (or any type of binomial data) results are highly variable in the middle (close to 50% infection) and lowly variable at the ends (close to 0% or 100% infection). This is a result that is derived from a binomial distribution and it should not be taken as evidence that the bacteria or the host affect randomness. If you were to conduct dose-response experiments, on any of your bacterial food source treatments, you would find that variability is lowest at the extremely high and extremely low doses and it is most variable in the middle when you are at doses where about 50% of hosts are infected. 

      Thank you for pointing this out, we have removed all reference to this throughout the manuscript.
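      The reviewer's statistical point above can be illustrated with a minimal sketch. For a binomial proportion, the standard deviation of the observed incidence is sqrt(p(1-p)/n), which peaks at p = 0.5 and shrinks toward 0% and 100% infection. The plate size of 100 animals and the function name are illustrative assumptions, not values or methods from the study.

```python
import math


def incidence_sd(p: float, n: int) -> float:
    """Standard deviation of an observed infection proportion when n animals
    are scored and each is independently infected with probability p."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical plate size, chosen only for illustration.
N_ANIMALS = 100

# Expected spread of the observed proportion at several true incidence levels.
sd_by_incidence = {
    p: incidence_sd(p, N_ANIMALS) for p in (0.02, 0.25, 0.50, 0.75, 0.98)
}

for p, sd in sd_by_incidence.items():
    print(f"true incidence {p:.2f} -> SD of observed proportion {sd:.3f}")
```

      Under these assumptions the spread is largest near 50% incidence regardless of the bacterium or host, which is why differences in variability alone do not indicate a biological effect.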

      Reviewer #2 (Public Review):

      Summary and Major Findings/Strengths:

      Across diverse hosts, microbiota can influence viral infection and transmission. C. elegans is naturally infected by the Orsay virus, which infects intestinal cells and is transmitted via the fecal-oral route. Previous work has demonstrated that host immune defense pathways, such as antiviral RNAi and the intracellular pathogen response (IPR), can influence host susceptibility to virus infection. However, little is known about how bacteria modulate viral transmission and host susceptibility. 

      In this study, the authors investigate how diverse bacterial species influence Orsay virus transmission and host susceptibility in C. elegans. When C. elegans is grown in the presence of two Ochrobactrum species, the authors find that animals exhibit increased viral transmission, as measured by the increased proportion of newly infected worms (relative to growth on E. coli OP50). The presence of the two Ochrobactrum species also resulted in increased host susceptibility to the virus, which is reflected by the increased fraction of infected animals following exposure to the exogenous Orsay virus. In contrast, the presence of Pseudomonas lurida MYb11, as well as Pseudomonas PA01 or PA14, attenuates viral transmission and host susceptibility relative to E. coli OP50. For growth in the presence of P. aeruginosa PA01 and PA14, the attenuated transmission and susceptibility are suppressed by mutations in regulators of quorum sensing and the gacA two-component system. The authors also identify six virulence genes in P. aeruginosa PA14 that modulate host susceptibility to virus and viral transmission, albeit to a lesser extent. Based on the findings in P. aeruginosa, the authors further demonstrate that deletion of the gacA ortholog in P. lurida results in loss of the attenuation of viral transmission and host susceptibility. 

      Taken together, these findings provide important insights into the species-specific effects that bacteria can have on viral infection in C. elegans. The authors also describe a role for Pseudomonas quorum sensing and virulence genes in influencing viral transmission and host susceptibility. 

      Major weaknesses: 

      The manuscript has several issues that need to be addressed, such as insufficient rigor of the experiments performed and questions about the reproducibility of the data presented in some places. In addition, confounding variables complicate the interpretations that can be made from the authors' findings and weaken some of the conclusions that are stated in the manuscript. 

      (1) The authors sometimes use pals-5p::GFP expression to indicate infection, however, this is not necessarily an accurate measure of the infection rate. Specifically, in Figures 4-6, the authors should include measurements of viral RNA, either by FISH staining or qRT-PCR, to support the claims related to differences in infection rate. 

      Following the reviewer's comment, we have corroborated our pals-5p::GFP data using FISH staining (Line numbers 291-292 and 357-359, Figure 4D & 4E, and Figure 6C).

      (2) In several instances, the experimental setup and presentation of data lack sufficient rigor. For example, Fig 1D and Fig 2B only display data from one experimental replicate. The authors should include information from all 3 experimental replicates for more transparency. In Fig 3B, the authors should include a control that demonstrates how RNA1 levels change in the presence of E. coli OP50 for comparison with the results showing replication in the presence of PA14. In order to support the claim that "P. aeruginosa and P. lurida MYb11 do not eliminate Orsay virus infection", the authors should also measure RNA1 fold change in the presence of PA01 and P. lurida in the context of exogenous Orsay virus. Additionally, the authors should standardize the amount of bacteria added to the plate and specify how this was done in the Methods, as differing concentrations of bacteria could be the reason for species-specific effects on infection. 

      All experimental replicates are now included within the supplementary information. 

      We have also measured RNA1 fold change following infection in the presence of P. aeruginosa PA01 and P. lurida MYb11 (Fig 3B and 3C) and found that these bacteria also do not eliminate Orsay virus replication.

      We thank the reviewer for their comment on controlling the amount of bacteria and have clarified our methods section to more clearly explain that we seed our plates with equivalent amounts (based on volume) of overnight bacterial culture before allowing the bacteria to grow on the plates for 48 hours.  

      (3) The authors should be more careful about conclusions that are made from experiments involving PA14, which is a P. aeruginosa strain (isolated from humans), that can rapidly kill C. elegans. To eliminate confounding factors that are introduced by the pathogenicity of PA14, the authors should address how PA14 affects the health of the worms in their assays. For example, the authors should perform bead-feeding assays to demonstrate that feeding rates are unaffected when worms are grown in the presence of PA14. Because Orsay virus infection occurs through feeding, a decrease in C. elegans feeding rates can influence the outcome of viral infection. The authors should also address whether or not the presence of PA14 affects the stability of viral particles because that could be another trivial reason for the attenuation of viral infection that occurs in the presence of PA14. 

      We have added data on feeding rates (Line numbers 141-148 and 176-182, Supplementary Figure 4). After six hours of exposure no differences in feeding rate were observed. After 24 hours minor differences emerged between O. vermis MYb71 and each Pseudomonas species, however feeding rate inversely correlated with susceptibility to Orsay virus in that O. vermis MYb71 displayed the lowest feeding rate while P. aeruginosa PA14 displayed the highest feeding rate.

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is dependent upon temperature and as our assays are done at 20°C rather than 25°C this may account for reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions and these two bacteria led to the shortest lifespan compared to the other tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although it does not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.  

      We tested viral stability in the presence of E. coli OP50 and Pseudomonas spp. and successfully recovered virus from all lawns, suggesting virus is not rapidly degraded in the presence of P. lurida MYb11, P. aeruginosa PA01, and P. aeruginosa PA14 (Line numbers 241-249, Fig 3D and Fig 3E). However, we noted that the recovery of Orsay virus from lawns of E. coli OP50 and P. lurida MYb11 within 30 minutes was decreased compared to a spike-in control suggesting recovery from each lawn is not equivalent. This complicates a comparison of viral stability and shedding rates between different bacteria, but our ability to recover substantial amounts of virus in the shedding assay from each Pseudomonas species precludes a substantial decrease in shedding rates as an explanation for the robust attenuation of Orsay virus observed in transmission assays.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Overall, I really liked this manuscript, I do think there are areas for improvement though. 

      Some smaller things: 

      Line 84: "can be observed spreading from a single animal" -- this isn't really great wording because the virus itself can't be observed (at least not very easily) -- even infection is hard to see. 

      The wording in lines 84-85 has now been adjusted to read “can spread from a single animal”.

      Fig 1C: which groups are statistically significantly different from each other? 

      Statistics have now been added to Figure 1C. 

      Line 154: not necessary to do for this paper, but this sentence made me curious whether the effect would have been seen with mixtures of bacteria (i.e. what if 50% were OP50 and 50% were Pseudomonas?) 

      This data has now been added in Line numbers 372-378, Figure 7A, and Supp. Fig. 12A and 12B.

      Line 262-264: I don't find this interesting at all for the reasons mentioned earlier about binomial data being the most variable in the middle. 

      These lines have been removed.

      Figure 4 B: The labels for the first two tick marks on the x-axis are switched I suspect. Otherwise, the controls did not behave as expected. 

      Figure 4B has been corrected.

      Line 288, 297 and several other places: "Orsay Virus" should be "Orsay virus". 

      We have corrected these instances.

      Supplemental Figure 2: Labels in the figure legend are B and C instead of A and B. 

      These labels have been adjusted for their placement within Figure 6.

      Line 411: I suspect this was supposed to be 13,200 xg rather than 13.2 xg. 

      This error has been corrected.

      Line 416-417: This sentence is very hard to interpret. More details are needed. This is the ID50 in which host strain? Is this averaged over all batches of virus? How variable are the batches? 

      This sentence (line number 114) has been amended to clarify that all ID50 values referred to here were calculated for ZD2611 populations in the presence of E. coli OP50. Further, Supplementary Figure 2 now shows all the ID50 values measured for each batch of virus used in this manuscript resulting in an average ID50 of 3.6.

      Lines 467-469: Why exclude these instead of counting them as zeros in the analysis? How many plates fit this description -- were there lots or only a few over the course of all experiments? 

      We have chosen to exclude these plates as these samples lost spreaders at some point during the course of the assay potentially skewing the eventual number of new infections counted depending on when the infected spreader animal crawled off the plate.  We have detailed the number of plates that fit this description in lines 559-562. 

      Line 476: A critical detail that is missing here is what number of worms were counted to score infection. Please say here or in the figure legends. 

      We have added the total number of worms counted and the minimum number counted per plate for each assay in the figure legends.

      Line 546: Why was only a single representative experiment shown? I'm asking for a justification, not necessarily for you to show all the data. 

      We chose to show a single representative experiment for two reasons. First, we noted variability between susceptibility assays even when using the same batch of virus, such that we could not combine experiments into a single plot as we did for transmission assays. Second, while we could normalize to a control within each experiment and expect to see similar relative differences across experiments, we believe this makes it more difficult to interpret the underlying data. For example, an increase in the infection rate of 80% compared to 10% within a population has only a single interpretation, while a relative increase in the infection rate by 8x within a population could have several underlying meanings (e.g. 80% vs 10%, 64% vs 8%, 24% vs 3%). We have now included all experimental replicates in the supplementary material.
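      The arithmetic behind this argument can be sketched briefly. The three infection-rate pairs are the ones quoted in the response; the function name is a hypothetical helper introduced only for illustration.

```python
def fold_change(treated: float, control: float) -> float:
    """Relative increase of the treated infection rate over the control rate."""
    return treated / control


# (treated, control) infection rates taken from the example in the response.
pairs = [(0.80, 0.10), (0.64, 0.08), (0.24, 0.03)]

for treated, control in pairs:
    ratio = fold_change(treated, control)
    difference = treated - control
    print(
        f"{treated:.0%} vs {control:.0%}: "
        f"fold change {ratio:.1f}x, absolute difference {difference:.0%}"
    )
```

      All three pairs normalize to the same fold change even though the absolute infection rates differ widely, which is the ambiguity the authors describe.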

      Reviewer #2 (Recommendations For The Authors):

      Minor concerns: 

      (1) Lines 86-87: "utilized a collection of bacteria isolated from the environment with wild C. elegans". The authors should provide more context on the source of these bacterial strains. 

      More references for the sources of these bacteria have been added to Supplementary Table 2.

      (2) The presentation of data in Fig 1 could be improved. The authors should include the text "pals-5p::GFP" on the images shown in Fig 1B. The red dashed line in Fig. 1D should intersect the dose-response curve at y = 0.5. The column heading for Fig 1E states "ID50 +/- SD (a.u.)", but should read "ID50 ratio" and should not have units. It also might be more intuitive to normalize the ID50 value for O. vermis to E. coli OP50. This way, having an ID50 ratio >1 indicates decreased transmission relative to E. coli, and ID50 ratio <1 indicates increased transmission relative to E. coli. To increase the transparency and rigor of 1E, the authors should plot the ratios from all 3 experimental replicates. The authors should also briefly explain why different viral doses were used in Fig 1D and 1F. 

      The text “pals-5p::GFP” has now been added to Figure 1B and throughout the text. The red dashed line in figure 1D has been corrected. Figure 1E has been adjusted to an actual figure as suggested and the y-axis label is “ID50 Ratio Compared to E. coli OP50”. The ID50 replicates have been plotted in Supplementary Figure 2. We have clarified that the doses used are the same. Briefly, the technical replicates of individual doses from Figure 1D and Supplementary Figure 3A and 3B were pooled and processed for FISH staining to provide each experimental replicate of Figure 1F. 

      (3) Line 110: The claim is that Ochrobactrum and P. lurida MYb11 reduce the variability of infection levels. However, another possibility is that there's simply less dynamic range in the assay because the infection levels have been compressed to 100% and 0% under these conditions. 

      This line has been removed.

      (4) There are discrepancies between what is shown in Fig 2C and what is described in the text. Lines 163-164: "P. aeruginosa PA01 and P. lurida MYb11 attenuated average infection to 33% and 62% of the population respectively". In Fig 2C, the mean for PA01 is ~25% whereas the mean for P. lurida appears to be less than 62%. 

      These values have been corrected.

      (5) Line 196: Provide more context for why rde-1 mutants were tested. This is the first time rde-1 is mentioned in the text (i.e. why show results in rde-1 mutants when the results are in Fig 2). 

      More context has been provided for why rde-1 mutants were tested (Line numbers 228-232). Briefly, using the rde-1 mutant, which has defective antiviral immunity and therefore supports higher viral replication levels than the wild-type (Félix et al. 2011), allows us to potentiate our infection assay in Figure 3B and 3C such that we maximize our chances of detecting viral replication in the presence of the Pseudomonas species, and especially P. aeruginosa PA14, where fewer animals might be expected to get infected based upon Figure 2B and Supplementary Figure 5.

      (6) Lines 228-229: "Mutations of any the regulators of the las, rhl, or pqs quorum sensing systems suppressed the attenuation of Orsay virus infection caused by the presence of wild-type P. aeruginosa PA01". Based on this description, PA01 should have a lower fraction of GFP positive relative to the quorum sensing mutants in Fig 4B. It seems that the x-axis labels OP50 and PA01 are swapped. 

      The x-axis labels of Figure 4B have been corrected. 

      (7) To improve clarity, for any figures that have data showing the "fraction of individuals GFP positive", the authors should include "pals-5p::GFP" in the y-axis title and legend. 

      The y-axis labels, legends, and text have been corrected throughout.  

      (8) To improve overall clarity and flow, the order in which the data is presented could be reordered. In particular, Fig. 6 could be better positioned instead of being the last figure, as no further characterization is performed on the mutants, and the findings are not conserved in strains that are more relevant to the C. elegans microbiota, such as P. lurida. The overall story could be strengthened if the authors ended the manuscript with more details related to the mechanism by which regulators of quorum sensing modulate the outcome of viral infection. 

      Figure 5 and Figure 6 have now been swapped.

      (9) Fig 5A: Make arrow sizes consistent across diagrams (i.e. the diagram for gacA deletion). 

      This figure (now Figure 6A) has been adjusted to make arrow sizes consistent across diagrams.  

      (10) Lines 280-282: "These data suggest that gacA has a conserved role across distant Pseudomonas species..." Here, the authors can provide more context on how well-conserved gacA is across Pseudomonas species (i.e. phylogenetic analysis of gacA sequences across different Pseudomonas species/strains). Furthermore, the data in Fig 5 does not provide strong enough support for the conclusion that gacA has a conserved role broadly across Pseudomonas species, as the authors only assess the effects of a gacA deletion in two species, P. aeruginosa and P. lurida. 

      We have adjusted lines 361-362 to “These data suggest that gacA has a conserved role between P. aeruginosa and P. lurida MYb11 in the attenuation of Orsay virus transmission and infection of C. elegans.” to reflect that we only assessed the effects of the gacA deletion in P. aeruginosa and P. lurida MYb11.

      (11) The manuscript can be strengthened by performing additional experiments to elucidate the mechanism by which Pseudomonas modulates viral infection. Does the attenuation of viral transmission and host susceptibility by P. lurida and P. aeruginosa require C. elegans to be in the presence of live bacteria? For example, the authors could measure viral transmission and susceptibility of C. elegans grown on heat-killed Pseudomonas. Additionally, it would be interesting to determine if modulation of viral infection is dependent on a secreted molecule. To assess this, the authors could perform viral infections in the context of Pseudomonas culture supernatant. 

      We added bacterial culture supernatant from each bacterium to lawns of E. coli OP50 to assess the effect on host susceptibility and did not observe any potent effect (Line numbers 311-318, Supplementary Figure 9). This supports an interpretation that attenuation is not mediated by a secreted molecule, however we cannot rule out that attenuation activity would become apparent if supernatant were provided at a higher concentration.

      We have found substantial challenges appropriately controlling live vs. heat-killed experiments particularly with the specifics of our susceptibility experiments. With regards to the underlying question of mechanism we believe that the genetic mutants (e.g. rhlR/gacA) are equally informative and that further comparison of these mutants’ interaction with the C. elegans host as compared to wild-type may be informative. 

      (12) The authors should include a discussion on the relative virulence potential of PA01, PA14, and P. lurida and the relationship between bacterial virulence potential and the outcome of viral infection. 

      We have also added data on mortality rates (Line numbers 183-200, Supplementary Figure 6). No significant mortality was observed within the 24-hour exposure period used for our Orsay infection and transmission assays. P. aeruginosa virulence is dependent upon temperature and as our assays are done at 20°C rather than 25°C this may account for reduced mortality compared to other published results. Regardless, we noted that O. vermis MYb71 killed C. elegans as quickly as P. aeruginosa PA14 under these conditions and these two bacteria led to the shortest lifespan compared to the other tested bacteria. Interestingly, P. lurida MYb11 was observed to be more virulent than P. aeruginosa PA01 under these conditions. These results suggest that there is no direct correlation between mortality and susceptibility to Orsay virus, although it does not rule out that virulence effects unique to each bacterium could contribute to alterations in host susceptibility.  

      (13) More information is needed on strains listed in Supplementary Table 2, particularly when there is no reference listed and the strain is "Gift of XXX lab". For example, the Troemel lab previously published about an Ochrobactrum strain in Troemel et al PLOS Biology 2008 PMID: 19071962 - is this the same strain? Please ensure that there is adequate information about each strain with as many published references as possible so that the work can be more easily reproduced. 

      We have added additional information and references to the strain table in Supplementary Table 2. The strain listed as Ochrobactrum sp. has been amended to Ochrobactrum BH3 as it is the strain described in Troemel et al. 2008.

    1. Author Response

      We appreciate the thoughtful comments provided by the editor and reviewers. We were pleased to hear that they appreciated our work's contribution to the field of motor learning as well as our use of state-of-the-art analysis techniques.

      We are currently preparing a comprehensive revision of our manuscript to address several of the recommendations of the reviewers. It is our belief that this revision will not only strengthen our paper but also help clarify several areas that were highlighted by the reviewers.

      To address the concerns regarding potential confounds in our experimental design, we will be providing a more detailed justification and rationale for the experimental design and analysis choices made during our study. It appears that some reviewers’ comments may stem from misunderstandings concerning certain details of our task, and we will carefully revise these sections to ensure that the design and purpose of the study are unambiguous. We will also be improving our characterizations of subjects’ learning behavior, which we believe will clarify some of the reviewers’ comments and enhance the overall rigor of our analyses. Lastly, we will be dealing with all concerns related to the statistical quantification of our results.

      We appreciate the opportunity to improve our manuscript for eLife and are eager to provide a revision that satisfies the majority of the reviewers’ recommendations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      While I acknowledge the authors' effort in conducting Southern blot analysis to address my prior concern regarding the presence of dual copies of torA and tapA, I find their current resolution inadequate. Specifically, the simple deletion of the respective result sections for torA and tapA significantly impacts the overall significance of this study. The repeated unsuccessful attempts to generate correct mutants only offer circumstantial evidence, as technical issues may have been a contributing factor. Therefore, instead of merely removing these sections, it is essential for the authors to present more compelling experimental data demonstrating that torA and tapA are indeed vital for the viability of A. flavus. Such data would enhance the overall significance of this study.

      We agree with and appreciate the reviewer's important comments on our manuscript. In this version, we address this issue by providing additional experimental data to further support the importance of torA and tapA in the viability of A. flavus. We conducted additional experiments to generate more compelling evidence regarding the essential role of torA and tapA in the growth and development of A. flavus. We constructed a mutant strain (xylPtorA) using a xylose-inducible promoter, which allows for conditional induction with the addition of xylose (Lines 204-238, page 10).

      Due to the unsuccessful construction of tapA knockout strains and xylose promoter replacement strains, we used homologous recombination to replace the original promoter with the strong gpdA promoter for overexpression of tapA (OE::tapA). We thank the reviewer for highlighting this important aspect, and we have revised our manuscript accordingly to enhance its overall significance (Lines 277-297, page 13). We are grateful for the opportunity to enhance our manuscript and believe these revisions provide a more comprehensive understanding of the roles of torA and tapA in A. flavus.

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Lines 421-423 and 465-466: these sentences are grammatically awkward. Please rephrase them.

      Thank you for your feedback on our manuscript. We conducted additional experiments, so we have removed the sentence from the manuscript to maintain coherence and avoid redundancy.

      Reviewer #2 (Public Review):

      In this study, authors identified TOR, HOG and CWI signaling network genes as modulators of the development, aflatoxin biosynthesis and pathogenicity of A. flavus by gene deletions combined with phenotypic observation. They also analyzed the specific regulatory process and proposed that the TOR signaling pathway interacts with other signaling pathways (MAPK, CWI, calcineurin-CrzA pathway) to regulate the responses to various environmental stresses. Notably, they found that FKBP3 is involved in sclerotia and aflatoxin biosynthesis and rapamycin resistance in A. flavus, especially that the conserved site K19 of FKBP3 plays a key role in regulating aflatoxin biosynthesis. In general, the study involved a heavy workload and the findings are potentially interesting and important for understanding or controlling the aflatoxin biosynthesis. However, the findings have not been deeply explored and the conclusions mostly are based on parallel phenotypic observations.

      Thank you for your constructive comments on our manuscript. In response to your comments, we have conducted additional experiments, including the construction of a xylose promoter mutant strain and an overexpression strain. We have also expanded the discussion section to provide a more comprehensive analysis of our findings in the context of existing literature. Thank you again for your insightful feedback, which has been instrumental in improving the quality of our work. (Lines 464-469, page 22).

      Reviewer #2 (Recommendations For The Authors):

      Point 1: Our findings revealed that both the tor and tapA genes are present in double copies in our strains, which guided our decision to construct single-copy deletion strains using homologous recombination. However, the tor gene in A. flavus exhibited varying copy numbers, as was confirmed by absolute quantification PCR at the genome level (Table S1). However, Table S1 is hard to understand: Estimation of copy number of tor gene in A. flavus. tor0 and sumo0 stand for the initial copy number, and the data are graphed as the mean ± 95% confidence limit. CN is copy number. As indicated in the Methods, using the sumo gene as reference, the tor and tapA gene copy number was calculated by standard curve. In Table S1, for the tor gene, the CN value is 1412537 in WT compared to 1698243 in tor+/-; for the reference gene sumo, it is 794328 compared to 1584893. How could these data support the copy number of tor?

      Thank you for your insightful comments. We understand the confusion with the data presented in Table S1 regarding the copy number estimation of the torA gene in A. flavus. We apologize for not providing a clear explanation for the data in the table. Quantitative real-time PCR (qPCR) is widely used to determine the copy number of a specific gene. It involves amplifying the gene of interest and a reference gene simultaneously using specific primers and probes. By comparing the amplification curves of the gene of interest and the reference gene, we can estimate the relative copy number of the gene.

      To address your concern and provide more accurate information, we have re-performed the copy number analysis using Southern blot. Southern blot analysis allows for the direct estimation of gene copy number by hybridizing genomic DNA with a specific probe for the gene. This method provides more reliable and accurate results in determining gene copy numbers. We discovered that the A. flavus genome contains a single copy of the torA gene. Consequently, we conducted additional experiments to elucidate its function. Specifically, we generated strains with a xylose-inducible promoter system to modulate the expression of torA (Lines 204-238, page 10).
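      As a rough sketch of the qPCR-based relative estimate discussed above, the copy-number (CN) values the reviewer quotes from Table S1 can be normalized to the sumo reference. This assumes that sumo is a single-copy gene and that dividing the target CN by the reference CN is the intended normalization; neither assumption is stated in the source, and the function name is hypothetical.

```python
# CN values as quoted by the reviewer from Table S1.
TABLE_S1_CN = {
    "WT": {"tor": 1412537, "sumo": 794328},
    "tor+/-": {"tor": 1698243, "sumo": 1584893},
}


def relative_copy_number(target_cn: float, reference_cn: float) -> float:
    """Target copy number per copy of the reference gene.

    Assumes the reference gene is present as a single copy per genome.
    """
    return target_cn / reference_cn


ratios = {
    strain: relative_copy_number(cn["tor"], cn["sumo"])
    for strain, cn in TABLE_S1_CN.items()
}

for strain, ratio in ratios.items():
    print(f"{strain}: tor/sumo = {ratio:.2f}")
```

      Under these assumptions the tor/sumo ratio is roughly 1.78 for WT and 1.07 for tor+/-. How such ratios should be mapped onto integer copy numbers is not specified in the source.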

      Point 2: In the response: For the knockout of the FRB domain, we used the homologous recombination method, but because tor genes are double-copy genes, there are also double copies in the FRB domain. Despite our efforts, we encountered challenges in precisely determining the location of the other copy of the tor gene. I could not understand these consistent data; why not use sequencing?

      Thank you for your valuable feedback. We repeated the analysis and confirmed that torA is a single-copy gene. We therefore removed this part of the results to avoid any ambiguity or potential misinterpretation.

      Point 3: The response states: Due to the large number of genes involved, we did not perform a complementation experiment. Without complementation data, how can the authors demonstrate that the data are solid?

      Thank you for your important suggestion. We understand that complementation experiments are commonly used to validate gene deletions. Therefore, to ensure the reliability of our data, we have conducted supplementary experiments on specific gene deletions, such as Δ_sitA_-C and Δ_ppg1_-C (Lines 320-322, page 15). Thank you again for your positive comments and valuable suggestions, which have significantly contributed to enhancing the quality of our manuscript.

      Point 4: Acknowledge the confusion? "We acknowledge the confusion in our presentation and will ensure that accurate genetic nomenclature is used consistently."

      Thank you for your comments on our manuscript. We recognize the importance of precise and consistent use of genetic nomenclature, as it is critical for the clarity and integrity of our research findings. We have carefully reviewed the sections of our manuscript where genetic terms were used and have made the necessary corrections to ensure that all nomenclature is accurate and used consistently throughout the text.

      Point 5: In the revised version of the manuscript, Southern blotting was carried out and showed that only one copy of each tested gene exists. Thus, the conclusions of the whole manuscript should be changed. In addition, following Reviewer 1's suggestion to use an Illumina sequencing strategy, could the tor and tapA mutants be checked for aneuploidy?

      We would like to express our gratitude for your insightful comments and suggestions. Following the new experimental data obtained from Southern blotting, we have identified that only one copy of the tested genes exists, and we have revised our conclusions throughout the manuscript. This has led to a significant reinterpretation of our results and a reassessment of the implications for our study. Based on this result, we designed and constructed strains with the tor gene under the control of a xylose-inducible promoter, which allows conditional expression of the tor gene (Lines 204-238, page 10). Thank you once again for your meticulous review.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study investigates parafoveal processing during natural reading, combining eye-tracking and MEG techniques, building upon the RIFT paradigm previously introduced by Pan et al. (2021). Overall, the manuscript is well-written with a clear structure, and the data analysis and experimental results are presented in a lucid manner.

      The authors have addressed the issues I raised in the previous round of review to my satisfaction. However, I still have two concerns that require the authors' consideration.

      Firstly, the similarity between the RIFT analysis process in this study and traditional ERP analysis could lead readers to equate RIFT with components like N400, potentially influencing their interpretation of the results. Although the authors' response has somewhat clarified my queries, I seek confirmation: does RIFT itself signify "visual attention" or the "allocation of attentional resources to the flickering target words" (line 208) in this study? While this may not be pivotal, as it primarily serves as an indicator to evaluate whether contextual congruity can indeed modulate the RIFT response rather than indicating early parafoveal semantic integration, I recommend that the authors explicitly address this point in the manuscript, maybe in the discussion section, to enhance reader comprehension of the article's rationale.

      Secondly, regarding the study's conclusions, there appears to be an overemphasis in stating that "semantic information ... can also be integrated with the sentence context ..." (line 21-22). As raised by Reviewer 2 (Major Point 1) and acknowledged by the authors in the limitations of the revised manuscript (lines 403-412), the RIFT effect observed likely stems from local congruency. Therefore, adjusting the conclusion to "integrated with previous context" may offer a more precise reflection of the findings.

      We appreciate the positive comments from the Reviewer.

      In response to the first concern, we have rephrased the sentence (Lines 207-209 in the revised manuscript) to clarify that RIFT measures visual attention: “Moreover, as RIFT directly measures visual attention, the left-skewed RIFT response curve suggests that more visual attention is allocated towards the flickering target words before fixating on them, aligning with the left-to-right order of reading English.”

      Regarding the second concern, we have addressed the issue by modifying “sentence context” to “previous context” in both the Abstract (Line 18 and Line 22) and the Discussion section (Line 314 and Line 361) of the revised manuscript.

    1. Author response:

      We appreciate the comprehensive reviews and would like to address the critiques and suggestions provided by both reviewers. We will make significant revisions to the manuscript to address these concerns. These include a more cautious interpretation of our results, an expanded discussion on key findings, additional analyses for TRM characterization, and a clearer outline of future validation efforts. We believe these changes will enhance the clarity and robustness of our study, and we hope they meet the reviewers’ expectations.

      Reviewer 1:

      Weaknesses:

      (1) Heterogeneous and small cohort:

      Increasing the cohort size is not feasible due to resource constraints. We acknowledge the challenges posed by the heterogeneous and small cohort, which complicate adjustments for confounding. We will apply multiple testing corrections to transparently assess and accurately report the robustness of our findings in the revision.
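      The response does not state which multiple testing correction will be applied. As one common option, the following minimal sketch implements the Benjamini-Hochberg false discovery rate adjustment; the choice of this procedure and the function name are assumptions made for illustration, not the authors' stated method.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

      In a small, heterogeneous cohort, reporting both raw and adjusted p-values in this way lets readers see which associations survive correction and which should be treated as exploratory.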

      (2) Influence of tissue of origin on RNAseq:

      We agree that RNAseq results can be heavily influenced by the tissue of origin. While immune cell composition in normal lung tissues and lymph nodes is quite different, we found that in tumor tissues and metastatic lymph nodes these differences diminish and common features dominate. Although we depicted these data in Supplementary Figure 1, we did not provide a quantitative test in the original submission. In the revision, we will perform additional quantitative tests to compare immune cell composition across different tissue origins. These tests will provide a more precise understanding of the cellular composition and support our argument regarding the similarity of the tumor-sculpted microenvironment. We will include these results and detailed methodologies in the revision.

      (3) Accuracy performance and overfitting:

      We acknowledge the concern regarding the high “accuracy” performance potentially indicating overfitting. We will clarify the evaluation methods used and moderate our claims regarding accuracy in the revision.

      (4) Specificity of the tumor cell program/state analysis to the setting of ICIs:

      The comment suggests that the tumor programs in our study may not be specific to the ICI group but rather prognostic in lung cancer. We acknowledge this possibility as we performed comparisons between responders and non-responders (with different cut-offs) to find common trends and interpreted them in terms of their association with ICI. In the revision, we will test the prognostic association of the tumor programs using public lung cancer data.

      (5) More external validation needed:

      We recognize the importance of external validation for reproducibility. While increasing the cohort size is not feasible, we will propose future directions for validation using larger, independent cohorts and potential experimental validations.

      Reviewer 2:

      Weaknesses:

      (1) Small sample size and heterogeneous populations:

      Increasing the cohort size is not feasible due to resource constraints. We acknowledge the challenges posed by the heterogeneous and small cohort, which complicate adjustments for confounding. We will apply multiple testing corrections to transparently assess and accurately report the robustness of our findings in the revision.

      (2) Limited validation of signatures/ methods in independent cohorts:

      We recognize the importance of external validation for reproducibility. While increasing the cohort size is not feasible, we will propose future directions for validation using larger, independent cohorts and potential experimental validations.

      (3) Lack of functional characterization and discussion on key findings:

      We appreciate the feedback regarding the need for functional characterization and a more thorough discussion of key findings on the roles of specific cell populations and genes. In the revised manuscript, we will expand the discussion section to include in-depth analysis of these findings and their relevance to the study. This includes a detailed interpretation of how these factors contribute to the immune response and potential implications for therapy.

      (4) TRM findings and marker selection:

      We understand the concern regarding the association between TRM involvement and response to IO therapy, which appears to run counter to previous demonstrations. It is indeed important to note that we employed alternative markers for TRM characterization. Our choice of markers was based on transcriptional references relevant to our study. However, we agree that classical TRM markers such as CD69 and CD103, which were absent from our definition, are critical for accurate TRM identification. To address this, we will include a detailed rationale for our marker selection and acknowledge the limitations of our TRM characterization. We will include additional analyses using classical TRM markers where possible and incorporate these findings into the revision. This will provide a clearer understanding of our TRM population and its role in the immune response to IO therapy.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      This work is an attempt to establish conditions that accurately and efficiently mimic a drought response in Arabidopsis grown on defined agar-solidified media - an admirable goal as a reliable experimental system is key to conducting successful low water potential experiments and would enable high-throughput genetic screening (and GWAS) to assess the impacts of environmental perturbations on various genetic backgrounds. The authors compare transcriptome patterns of plants subjected to water limitation imposed with different experimental systems. The work is valuable in that it lays out the challenges of such an endeavor and points out shortcomings of previous attempts. There was concern, however, that a purely gene expression-based approach may not provide sufficient physiologically relevant information about plant responses to drought, and therefore, despite improvements from a previous version, the new methodology championed by this work remains inadequate.

      Molecular biologists who study drought stress must make choices about which assays to use in their investigation. Serious resources and effort are put into their endeavor, and choice of assay matters. Our manuscript’s goal was largely practical: to guide molecular biologists employing transcriptomics in their choice of drought stress assay, and thus help ensure their work will discover transcriptional signatures of importance, and not those that may be an artifact from lowering water potential using chemical agents on agar plates.  

      We examine how different approaches of reducing water potential impact the Arabidopsis root and shoot transcriptome. Our manuscript shows that each method of reducing water potential has a different effect on Arabidopsis root transcriptome responses. We acknowledge that drought stress induces a complex physiological response that can vary depending on the method used. However, by comparing across assays, we find instances where a gene is downregulated by low water potential in one assay and upregulated by low water potential in another. We feel it is only natural to question why this could be, and to hypothesize that it may result from secondary effects of the way low water potential is imposed. We note that comparative transcriptomics has been a standard approach for decades. We take it as the reviewer’s opinion that it may not be insightful, but it does not factually impact our findings.

      Reviewer #2 (Public Review): 

      This manuscript purports to develop a new system to study low water potential (drought) stress responses in agar plates. They make numerous problematic comparisons among transcriptome datasets, particularly to transcriptome data from a vermiculite drying experiment which they inappropriately present as representing an authentic "drought response" to the exclusion of all other data. For some reason, which the reviewer cannot fully understand, the authors seem intent on asserting the superiority of their experimental system to all others. They do not succeed in this and such an effort is ultimately a disservice to the field of drought research as a whole. 

      While they devote considerable effort in comparing transcriptome data among various experimental systems, the potentially more informative experiment at the end of the manuscript of testing growth responses of a number of Arabidopsis accessions is only done for their "LW" system. The focus of this manuscript on transcriptome data, to the almost complete exclusion of other types of data, is a symptom of a broader over-emphasis on transcriptomes that unfortunately is quite prevalent in plant science now. It is worth reminding that for protein coding genes, which constitute the vast majority of genes, transcriptome data is a proxy measurement. The really important thing is protein amount, and even more so protein activity/function, which we know has an imperfect, at best, correlation with transcript level. We measure transcriptomes because we can, not because it is inherently the most informative thing to do. The authors' quixotic quest to see if the transcriptomes of different stress treatments match is of limited value and further diminished by their misleading presentation of one particular transcriptome data set (from their vermiculite drying experiments) as somehow a special data set that everything else must be evaluated against. This study sheds no new light on how to do relevant drought (low water potential) experiments in the lab.

      Although the reviewer acknowledges that the authors have made some effort to respond to previous comments, the fundamental flaws remain and the present version of this study is little improved from the first submission. 

      One challenge faced by the drought community is establishing consensus regarding the definition of drought itself. According to the criteria followed by the reviewer, any method leading to a reduction in water potential qualifies as drought stress. However, the findings presented in this manuscript demonstrate that transcriptional responses in roots vary considerably across five different methods of reducing water potential. This indicates that beyond responding to a change in water potential itself, root transcriptomes will also respond to the specific way low water potential is introduced. We believe this variability is of interest to the drought research community. 

      Of the five methods we explore, we hold the view that the gene expression changes induced by vermiculite drying are the most analogous to the expression signatures Arabidopsis would exhibit in response to low water potential in the natural environment. In contrast, we posit that Arabidopsis grown on agar plates, where the root system is exposed to air and light and where water potential is lowered using chemical agents, may show gene expression signatures that plant molecular biologists would not find particularly relevant. However, we acknowledge that this is our opinion, and we will make this more explicit in our revised text.

      More broadly, we believe that the reviewer’s observation regarding the ‘over-emphasis’ on transcriptomics that is prevalent within the plant science community justifies, rather than diminishes, the work presented here. If transcriptomics is a commonly employed method, then we anticipate that the outcomes of this study will hold value for a broad audience. Such researchers are likely not only using transcriptomics as a proxy measure for protein abundance, as the reviewer suggests, but also because it is one of the more straightforward genomic techniques biologists can use to identify candidate genes that may be chosen for further scrutiny. 

      Reviewer #3 (Public Review): 

      Comments on revised version: 

      Specific previous criticisms that were addressed are: 

      (1) that gene expression changes were only compared between the highest dose of each stress assay. In the revised version, the authors changed their framework and are now using linear modelling to detect genes that display a dose response to each specific treatment. I agree that this might be a more robust approach to selecting genes that are specific to a certain treatment. 

      (2) that concentrations of PEG, mannitol, NaCl, and the "low water" agar which were chosen are not comparable in regards to their specific osmotic component. I appreciate that the authors measured the osmotic potential of each treatment. It revealed that both PEG and NaCl at their highest concentration had a much more negative osmotic potential compared to the other treatment. The authors claim that using ANCOVA they did not detect any significant differences between the treatments (lines 113, 114). I do believe that ANCOVA is not the appropriate test in this case. ANCOVA has an assumption of linearity, while the dose response between concentration and osmotic potential is non-linear. This is particularly evident for PEG (Steuter AA. Water potential of aqueous polyethylene glycol. Plant Physiol. 1981 Jan;67(1):64-7. doi: 10.1104/pp.67.1.64.). Since the treatments are not the same at the highest level, I think this could have effects on the validity of comparisons by linear model. One approach could be to remove the treatment level with the highest concentration and compare the results or adjust the treatments to the same osmolarity. 

      (3) that only two biological replicates were collected for RNA sequencing which makes it impossible to know how much variance exists between samples. The authors added a third replicate in the revised version for most treatments. However, some treatments still have only two replicates, which cannot be easily seen from the text or the figure. I would prefer that those differences are pointed out. 

      (4) that the original manuscript did not explore what effect the increase of agar and nutrient concentration in the "low water" agar had on water potentials. The authors conducted additional experiments showing that changes in water potential were exclusively caused by changes in the nutrient concentration (Figure 2-figure supplement 5; lines 222-224). However, the increase in agar strength had also some effect on gene expression. While this is not further discussed in the text, I believe this effect of agar on gene expression could be similar to root responses to soil compaction. 

      (5) That the lower volume of media in the "low water" agar could have an effect on plants. The authors compared these effects in Figure 2-figure supplement 7. They claim that "different volumes of LW agar media do not play a significant part in modulating gene expression". While I can see that they detected 313 overlapping DEGs, there were still 146 and 412 non-overlapping DEGs. The heatmap in subpanel E also shows that there were differences in particular in the up-regulated genes. My conclusion would be that the change in volume does play a role and this should be a consideration in the manuscript. 

      We thank the reviewer for their suggestions. We plan to resubmit the manuscript reflecting the requested changes. Specifically, we will:

      - Detail more thoroughly the effects of agar volume on gene expression changes elicited by LW agar treatment.

      - Investigate, by comparison with the existing literature, whether the tensile stress introduced by hard agar is similar to soil compaction.

      - Assess more rigorously the suitability of the ANCOVA model for assessing water potential changes of different media types.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) The modeling process is outlined, but an explanation of why Maxent (Phillips & Dudík, 2008) was chosen for SDMs and why the specified predictor variables were used could provide additional context. This clarity would help readers understand the rationale behind the methodology.

      In L.558-571 (Predictor variables subsection), we added the explanation about predictor variables as follows:

      “Predictors encompass a range of environmental variables recognized to impact species distribution (Table 3): land use (Newbold et al., 2015), climate (bioclim variables (Booth et al., 2014)), vegetation (Abe, 2018), lithology (Ott, 2020) and elevational range (Udy et al., 2021). Additionally, categorical variables representing known biogeographic regions, reflecting geological history, were included. We applied Blakiston's Line (the Tsugaru Strait dividing the northern and main islands of Japan, i.e., Hokkaido and Honshu), which reflects a significant historical migration barrier for mammals and birds (Dobson, 1994; Saitoh et al., 2015). Due to their distinct fauna (Wepfer et al., 2016; Yamasaki, 2017), we also specified oceanic islands (i.e. the Ogasawara and Daito isles), which have never been connected with the Asian continent. Continuous environmental variables were transformed into linear, quadratic and hinge feature classes to capture nonlinear associations between environments and species occurrence (Phillips et al., 2017). The regularisation multiplier was set at 2.5, falling within the established optimal range of 1.5 to 4 (Elith et al., 2010; Moreno-Amat et al., 2015).”
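      To make the three feature classes named in the quoted passage concrete, the following simplified sketch shows how a single continuous predictor could be expanded into linear, quadratic and forward-hinge features. It is an illustration under stated assumptions: the function names, the min-max scaling and the single hinge knot are hypothetical, and actual MaxEnt software generates many knots and handles scaling and regularisation internally.

```python
def linear_feature(x, x_min, x_max):
    """Scale a predictor value to the 0-1 range (linear feature)."""
    return (x - x_min) / (x_max - x_min)


def quadratic_feature(x, x_min, x_max):
    """Square of the scaled predictor, allowing a curved response."""
    return linear_feature(x, x_min, x_max) ** 2


def forward_hinge_feature(x, knot, x_max):
    """Zero below the knot, then rising linearly to 1 at x_max."""
    if x <= knot:
        return 0.0
    return (x - knot) / (x_max - knot)
```

      A weighted sum of such features can approximate a piecewise-linear or curved response to, for example, temperature, which is how a model of this kind represents nonlinear associations without fitting an arbitrary function.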

      In L.614-618 (Modelling subsection), we explain why we chose MaxEnt:

      “To model species distributions from presence-only data, several algorithms have been utilised, including generalised additive models, random forest, and neural networks (Norberg et al., 2019; Valavi et al., 2022). In our study, we opted for MaxEnt (Phillips and Dudík, 2008) due to its high estimation accuracy and relatively low computational burden (Valavi et al., 2022).”

      (2) While the study outlines a manual reidentification process by experts for wild individuals, it might be beneficial to elaborate on the criteria or expertise level of these experts. This transparency ensures the reliability of the reidentification process.

      In L.519-523, we added description about experts as follows:

      “These experts have professional backgrounds, serving as a technician at a prefectural research institute (fish), highly-experienced field survey conductors (plants and insects, respectively), post-doctoral researchers (amphibians and reptiles, and mammals, respectively), and a museum curator (mollusks) specialising in the focal taxa.”

      (3) The analysis of the effects of data type (Biome+Traditional data or Traditional survey data) on BI is comprehensive. However, a brief discussion on the potential implications of these effects on the study's overall conclusions could add depth to the interpretation.

      We strengthened our discussion of the causes and consequences of improved modelling accuracy.

      In L.276-282, we argued about the causes: 

      “Therefore, incorporating Biome data could significantly enhance modelling accuracy in urban and suburban landscapes, which are typically underrepresented in traditional survey data. As pseudo-absences are selected based on search effort, our models utilise numerous pseudo-absences from these areas. Consequently, this might lead to better estimation of species absence in such areas, not just presence, resulting in an overall increase in model accuracy across a wider range of species.”
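      The quoted passage says only that pseudo-absences are selected based on search effort. One way this could work, sketched here purely as an assumption, is to draw pseudo-absence cells with probability proportional to a per-cell search-effort score, excluding cells where the species was recorded. The function name, the dictionary layout and sampling with replacement are all hypothetical choices for illustration.

```python
import random


def sample_pseudo_absences(search_effort, presence_cells, n_samples, seed=0):
    """Draw pseudo-absence cells weighted by search effort.

    search_effort: dict mapping cell id -> non-negative effort score.
    presence_cells: set of cell ids where the species was recorded.
    """
    # Only searched cells without a presence record are eligible.
    candidates = [cell for cell, effort in search_effort.items()
                  if cell not in presence_cells and effort > 0]
    if not candidates:
        return []
    weights = [search_effort[cell] for cell in candidates]
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    # Sampling with replacement, proportional to effort.
    return rng.choices(candidates, weights=weights, k=n_samples)
```

      Under this scheme, heavily surveyed urban cells contribute many pseudo-absences, so a species never reported there is more credibly treated as absent than in a cell nobody visited.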

      In L.370-387, we argued how improved modelling accuracy may help build a nature-positive society as follows:

      “By blending data from traditional surveys and communities, we improved the accuracy of species distribution estimates. This enhanced estimation lays the groundwork for more precise subsequent analyses. For instance, estimated distributions will be useful in selecting new protected areas or areas with OECMs (Other Effective area-based Conservation Measures: allowing a wider range of land use as long as biodiversity and ecosystem services are sustained/improved). Using estimated distributions of each species, hotspots of species or evolutionarily diverse taxa can be inferred. Such sites will be good candidates for protected areas (Jones et al., 2016) or OECMs (Shiono et al., 2021). Further, estimated distributions can be used as input for spatial conservation prioritisation tools (e.g. Marxan (Ball et al., 2009)).

      In our experience, stakeholders, including corporate social responsibility managers and conservation practitioners, often seek the list of species potentially inhabiting their locations. Due to the uncertainty of SDMs and their thresholding into presence/absence, on-site surveys remain essential for assessing biodiversity status. SDMs can make such surveys cost-effective by screening important locations for on-site assessment (e.g., Locate phase in TNFD framework) and narrowing down the target species for surveying. Improved estimation through SDMs can mitigate risks associated with their use in society and enable more informed decision-making for conservation efforts.”

      Following the editorial policy, we have reorganised our supplementary materials as follows:

      - Formerly Supplementary File 1 - Remains unchanged.

      - Formerly Supplementary File 2 - Transferred into the main text, in the subsection "Filtering suspicious occurrence record in Biome data" in the Methods section, and Table 2. Citations remain as Supplementary File 2.

      - Formerly Supplementary File 3 - Remains unchanged.

      - Formerly Supplementary File 4 - Transferred into "Figure 3—figure supplement 1".

      - Formerly Supplementary File 5 - Transferred into Figure 4.

      - Formerly Supplementary File 6 - Transferred into the main text, in the subsection "Predictor variables" in the Methods section and Table 3.

      - Formerly Supplementary File 7 - Transferred into the main text, in the subsection "Pseudo-absence reflecting search effort" in the Methods section and Figure 5.

      - Formerly Supplementary File 8 - Transferred into the main text, in the subsection "Model evaluation" in the Methods section and Figure 6.

      - Formerly Supplementary File 9 - Renamed as Supplementary File 4.

    1. Author response:

      Reviewer #1 (Public Review):

      Metabotropic glutamate receptors (mGluRs) play a key role in regulating neuronal activity and related behaviors. In different brain regions these receptors can be expressed presynaptically and postsynaptically in different classes of neurons. Therefore, it is difficult to predict the effects of systemically applied drugs that act on these receptors. Here, the authors harness the power of photopharmacology, applying modulators that can be activated or inactivated by light with spatial precision, to address this problem. Their stated goal is to determine the role of mGluRs in regulating pain behaviors, and the circuit mechanisms driving this regulation. Their findings suggest that mGluRs acting in medial prefrontal cortex and thalamus drive antinociception in animals with neuropathic pain, whereas these receptors drive pronociception when acting in the amygdala. Their circuit analysis suggests that, in the amygdala, mGluRs act by decreasing feedforward inhibition of the output neurons. These findings have the potential to affect the development of targeted treatment for pain and related disorders. The elegant photopharmacological approaches will likely inform future studies attempting to distinguish the action of neuroactive drugs in different brain regions.

      We thank the reviewer for the insightful evaluation of our study.

      Reducing the impact of these studies are several methodological, analytical, and interpretation issues.

      The authors report that "the effect of optical manipulations of photosensitive mGlu5 NAMs in individual brain regions in pain models has been studied before". It is, therefore, not immediately clear what is novel in the present study.

      We have clarified this in the following statement (page 3, lines 15‐17): “It remains to be determined if region‐specific actions play a role in the overall analgesic activity of mGlu5 receptor NAMs, considering that opposite actions have been reported”. The subsequent paragraph nicely explains the novelty of our approach, which is based on the combined use of a drug activated by light (JF‐NP‐26) and another drug inactivated by light (alloswitch‐1) to determine which region is sufficient and/or necessary for the analgesic effect of systemic mGlu5 receptor NAMs. In the Discussion (page 7) we state that “To the best of our knowledge, this is the first study to employ photopharmacological tools to compare and contrast distinct roles of mGlu5 receptors in different regions of the pain matrix”.

      The reliance only on reflexive measures of pain, especially in a study that examines the role of "affective and cognitive aspects of pain and pain modulation".

      The main endpoint of the study was not to examine the cognitive and affective aspects of pain, although some of the regions examined are involved in these aspects in addition to regulating sensory aspects (pain thresholds). However, we followed the kind suggestion and measured depression‐like and risk‐taking (anxiety‐like) behaviors in mice. To optimize the number of mice while remaining within the number approved by the regulatory agency, we used the following groups for the evaluation of risk‐taking behavior with the light‐dark box: (i) sham‐operated mice treated with vehicle; (ii) CCI mice treated with vehicle; (iii) CCI mice treated with JF‐NP‐26 without light activation; and (iv) CCI mice treated with JF‐NP‐26 and irradiated with activating light (the test cannot be performed in the same mice before and after light activation, to avoid habituation). Depression‐like behavior was assessed with the tail suspension test in two separate groups of mice: (i) CCI mice treated with JF‐NP‐26 with no light; and (ii) CCI mice treated with JF‐NP‐26 and light activation. All mice had been implanted with optic fibers in the basolateral amygdala.

      Data are shown in the new Supplementary Fig. S4 and reported in the Results section (page 5) as follows: “Knowing that mGlu5 receptors in the BLA shape susceptibility to stress and fear in rodents (35, 36), we also measured depression‐like and risk‐taking behavior after light‐induced activation of JF‐NP‐26 in the BLA of neuropathic mice. Light‐induced activation of JF‐NP‐26 decreased risk‐taking, hence increased anxiety‐like behavior, in CCI mice, as shown by the decreased number of entries into, and reduced time spent in, the light compartment of the light‐dark box (Fig. S4a‐c). Depression‐like behavior assessed with the tail‐suspension test was unchanged in CCI mice after light‐induced irradiation of JF‐NP‐26 in the BLA (Fig. S4d).”

      The inclusion of only males is unfortunate because of known, significant sex differences in neuronal circuits driving pain conditions, in both preclinical models (including from work by the authors) and in clinical populations.

      We are aware that there are important sex differences in the pain neuraxis, but this study was not about sex differences. The goal was to evaluate any region‐specific actions of systemically administered compounds (mGlu5 NAMs) and the contribution and requirement of specific brain regions to the observed drug effects, using photopharmacology and drugs activated or inactivated/reactivated by light. This analysis would have been less straightforward in female mice given for example that it is known that mGlu5 receptors interact with estrogen receptors. This aspect could be addressed in a future project. The present study provides the basis for comparative studies in females.

      The elegant slice experiments (especially Fig. 3) were designed to probe circuit mechanisms through which mGluRs act in different brain regions. These experiments also provide a control to assess whether the photopharmacological compounds act as advertised. Surprisingly, the effect sizes produced by these compounds on neuronal activity are rather small (and, at times, seem driven by outliers). How these small effects affect the interpretation of the behavioral findings is not clear.

      These small effect sizes should also be considered when interpreting the circuit actions studied here.

      We greatly appreciate your insightful comments and constructive feedback on our findings. The mean effect sizes observed in certain experiments are quite small, but the effects or changes were very consistent. We now illustrate this by including lines that connect individual data points for the same neuron in the modified Figure 3 (f, g, n, o), showing the consistent changes observed in the EPSC and IPSC graphs. We would like to add that it is not quite clear how neuronal effects translate into behavioral consequences, or how much of a change in individual neurons or in a population of neurons, or a change of what magnitude, is sufficient and required. These are all interesting questions, but the results of our behavioral and electrophysiological experiments match quite nicely, including differential or opposing drug effects.

      Some of the sample sizes are as small as n=3. Without an a priori power analysis, it is difficult to assess the validity of the analyses.

      The authors present intriguing data on changes in InsP levels in some (but not all) animals after injury, but not in sham animals. They also report an increase in the expression of mGluRs in some, but not all, brain regions. These findings are not discussed. It is not clear how these selective changes in mGluR expression and activity might affect the interpretation of the photopharmacological results.

      We performed new experiments to increase sample size in PI experiments in the infralimbic and prelimbic cortices where the n was low. Now the data are more solid. New statistical values are reported in the legend of Fig. 1. We also added a discussion of the signaling data (page 9) as follows:

      “We found that mGlu5 receptor‐mediated PI hydrolysis was significantly amplified in all subregions of the contralateral mPFC and in the contralateral amygdala after induction of neuropathic pain whereas mGlu5 receptor protein levels were significantly increased only in the contralateral infralimbic cortex of neuropathic mice. This suggests that, at least in the anterior cingulate cortex, prelimbic cortex, and basolateral amygdala, mGlu5 receptors become hyperactive after induction of pain. It remains to be determined if this is mediated by an enhanced coupling of mGlu5 receptors to Gq/11 proteins, increased expression of phospholipase‐C or other mechanisms. Interestingly, mGlu5 receptor signaling was down‐regulated in the thalamus of neuropathic mice, but mGlu5 blockade in the thalamus still had antinociceptive effects (see below). Downregulation of mGlu5 receptor signaling in the thalamus might represent a compensatory mechanism aimed at mitigating pain in neuropathic mice.”

      The behavioral data seem to represent discrete, and not continuous variables. The statistical tests applied are likely inappropriate for these analyses.

      The behavioral values reported here represent measurements of force (g) required to elicit a reflex (i.e., reflex thresholds) and can be considered continuous variables. The statistical tests used for the behavioral experiments included either t‐test to determine if the difference between two groups was statistically significant or One‐Way ANOVA (repeated measures when appropriate) to determine if there were any statistically significant differences between the means of three or more groups. This form of analysis for the outcome measures in this study is well‐established in the literature.

      The authors assume (and state in the abstract) that they can selectively stimulate BLA afferents to the neocortex. This is technically highly unlikely.

      We appreciate the reviewer's insightful comment regarding the technical challenges associated with the selective stimulation of BLA afferents to the neocortex. We are aware that the electrical stimulation does not allow the exclusive stimulation of a specific pathway, though BLA afferents form the major component of afferent fibers running in the layer IV of the infralimbic cortex on their way to targets in layer II/III and layer V or infra‐ and pre‐limbic cortices.

      Our previous work (Kiritoshi et al., 2016) compared directly electrical and optogenetic stimulation in the mPFC, and found that they match, suggesting that electrical stimulation provides a reliable means to activate BLA input in the mPFC. We acknowledge the technical limitations of selective BLA activation with electrical stimulation, though we are confident that our approach allowed the investigation of mGlu5 manipulations in the BLA‐mPFC circuitry. We have modified the abstract to read as follows: “Electrophysiological analysis showed that alloswitch‐1 increased excitatory synaptic responses in prelimbic pyramidal neurons evoked by stimulation of presumed BLA input, and decreased BLA‐driven feedforward inhibition of amygdala output neurons”.

      The results from the experiment on rostroventral medulla (RVM) neurons are less than convincing because only a "trend" towards decreased excitation is reported. As above, without consideration of effect size, it is hard to appreciate the significance of these findings. The absence of a demonstration of a classical ON Cell firing pattern is also unfortunate.

      We appreciate this observation. Based on the Reviewer’s suggestion, we report below the effect size of optical modulation in the prelimbic cortex on RVM activity, calculated as Cohen’s d from t‐tests (now shown in Table 1). This information is also included in Results (page 6).
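
      As an illustration of how an effect size can be derived from a t‐test, the following is a minimal sketch assuming two independent groups and a pooled standard deviation. The function names and the numbers are hypothetical; they are not the study's data, and the response does not state which variant of Cohen's d (independent or paired) was used.

```python
import math
from statistics import mean, variance


def pooled_variance(a, b):
    """Pooled sample variance of two independent groups."""
    n1, n2 = len(a), len(b)
    return ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)


def cohens_d_independent(a, b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    return (mean(a) - mean(b)) / math.sqrt(pooled_variance(a, b))


def student_t(a, b):
    """Pooled-variance (Student) t statistic for two independent groups."""
    n1, n2 = len(a), len(b)
    se = math.sqrt(pooled_variance(a, b) * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se


def cohens_d_from_t(t, n1, n2):
    """Recover Cohen's d from a pooled-variance t statistic and the group sizes."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical firing rates (arbitrary units) before and after a manipulation.
baseline = [12.0, 10.5, 11.2, 13.1, 9.8]
treated = [8.9, 9.4, 7.7, 10.1, 8.2]

d_direct = cohens_d_independent(baseline, treated)
d_via_t = cohens_d_from_t(student_t(baseline, treated), len(baseline), len(treated))
print(d_direct, d_via_t)  # the two routes give the same value
```

      For paired recordings from the same neuron, a different convention (the mean of the differences divided by their standard deviation) would apply instead.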

      Moreover, in this study we classified ON‐ or OFF‐cells based on their firing patterns relative to nocifensive withdrawal responses (H.L. Fields and M.M. Heinricher 1985). As ON‐cells with high basal firing can be easily misclassified as NEUTRAL‐cells (N.M. Barbaro, M.M. Heinricher, H.L. Fields, 1986), potential NEUTRAL‐cells with continuous spontaneous activity were verified by giving a brief bolus of anesthetic to the point that the withdrawal reflex was abolished. Indeed, firing of spontaneously active ON‐cells slows or stops with this manipulation, which unmasks reflex‐related responses. This is now reported and explained in Methods (page 14).

    1. Author response:

      Reviewer #2 (Public Review):

      (1) The groups of patients with endometrial cancer in the manuscript are classified according to age greater than/less than 60. Please explain why 60 years old is chosen as the boundary value of age.

      Thanks for your Recommendation. We have modified the discussion section of the manuscript in accordance with your suggestion.

      (2) Among the patients with endometrial cancer selected in the manuscript, AFP outliers accounted for a relatively small proportion. The authors chose the clinical detection outliers of CA-125, CA19-9, AFP and CEA as the dividing line, instead of re-selecting the optimal cut-off value in this population, which should be classified and the prognostic value explored.

      Thanks for your Recommendation. We have modified the discussion section of the manuscript in accordance with your suggestion.

      (3) In cancer research, stage is an important prognostic factor to guide the treatment of patients in clinical work. Patients with different stages of endometrial cancer have obvious prognostic differences. The authors constructed a new prognostic risk score based on serum levels of AFP, CEA and CA125; the prognostic value of the risk score should be validated in patients with endometrial cancer at different stages.

      Thanks for your Recommendation. We have modified the discussion section of the manuscript in accordance with your suggestion.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors tested the hypothesis that protein consumption decreases with decreasing mass-specific growth during development. This hypothesis is firmly grounded in the logical premise that as animals progress from periods of reduced activity and rapid growth to phases of increased activity and reduced mass-specific growth during their development, they are likely to adjust their nutrient intake, reducing protein and increasing carbohydrate consumption accordingly. The authors tested their hypothesis using the South American locust Schistocerca cancellata, combining field observations with laboratory experiments. This approach allowed them to discern how variations in activity history and metabolism between field- and laboratory-raised locusts influenced their nutrient requirements.

      Their findings indeed reveal the predicted shift from high protein: carbohydrate consumption to lower protein: carbohydrate intake from the first instar to adult locust - a decline that strongly correlated with a decrease in mass-specific growth rate. Their comparison between field- and laboratory-raised locusts showed that protein demand was not different; however, carbohydrate consumption rate was >50% higher in the field locusts. These results add depth and significance to the study, shedding light on how environmental factors influence nutrient requirements. What truly amplifies the strength and novelty of the authors' hypothesis is their anticipation that this observed trend in Schistocerca cancellata could extend to all animals. This anticipation is rooted in the expectation that growth rates scale hypometrically across various body sizes and developmental stages, introducing a universal dimension to their findings that holds great promise for broader ecological and evolutionary understanding.

      However, while the study is commendable in its methodology and core findings, there is room for improvement in clarifying the implications of the results. The current lack of clarity is evident in the somewhat shallow questions outlined in lines 358 to 363. For instance, the practice of administering age-specific diets has been commonplace in human and livestock management for ages. Thus, its continued utility may not be the most stimulating question. Instead, a more thought-provoking inquiry might delve into whether variations in global protein availability play a pivotal role in driving niche specialization and the biogeography of animal body sizes and ontogeny, especially considering the potential impacts of climate change. Such inquiries would further elevate the significance of the author's work and its broader implications in the field.

      Thanks for the suggestions. We have added sentences to the discussion on how size-related changes in protein:carbohydrate consumption may affect the physiology and ecology of animals.

      Reviewer #2 (Public Review):

      How and why nutritional requirements and intake targets change over development and differ between species are significant questions with wide-ranging implications spanning ecology to health. In this manuscript, Talal et al. set out to address these questions in laboratory and field experiments with grasshoppers and in a comparative analysis of different species.

      The authors conclude that the target intake of protein to non-protein energy (in this case carbohydrate) (P:C) falls over developmental stages and that this occurs because of a decline in mass-specific intake of protein whereas mass-specific carbohydrate intake remains more constant. The decrease in mass-specific protein consumption rate is tightly correlated with a decline in specific growth rate. Hence, protein consumption directly reflects requirements for growth, with hypometric scaling of protein intake serving as a useful relationship in nutritional ecology.

      The laboratory experiments on the locust, Schistocerca cancellata, provide an elegant dataset in which different instars have been provided with one of two nutritionally complementary food pairings differing in protein to carbohydrate (P: C) content, and their self-selected protein to carbohydrate "intake target" measured.

      These lab locust results were then compared with independently collected field data for late instar nymphs of the same locust species, and the conclusion is drawn that field insects ingested similar protein but 50-90% more carbohydrate (with only 23% increased mass-specific resting oxygen consumption rates). Numerous uncontrolled variables between the lab and field studies make meaningful conclusions difficult to draw from this observation.

      Thank you for this comment. We have revised the text to better explain that very few studies have directly compared lab and field intake target data, and that our goal was to test whether lab intake targets predicted those for field-collected animals. We have also revised the discussion to describe the many possible reasons that intake targets for field-collected animals may diverge from those of lab-reared locusts.

      A graph is then provided showing comparative data across a selection of species, making the case that protein consumption scales similarly both developmentally and across taxa. Questions need to be addressed for this to be convincing, including which criteria were used to select the examples in the graph and how comprehensively do these represent the available literature.

      We now provide further details in the Methods on how we searched the literature.

      Reviewer #3 (Public Review):

      The main goal of this study was to test how and why the intake of two important macronutrients ‒protein and carbon‒ often changes with ontogeny and body size. To do this, the authors examined protein and carbon intake in a laboratory locust population, across each instar and the adult stage. Then, the authors examined how the optimal balance of carbon and protein intake in a wild locust population corresponded to that observed in the laboratory population. Results of these experiments showed that with ontogenetic growth, locusts decreased protein intake while increasing carbohydrate intake. The authors concluded that this decrease in protein: carbohydrate intake may result from reductions in specific growth rates (growth within each instar). The protein: carbohydrate intake in the lab population appeared to be consistent with that observed in a wild locust population. Finally, the authors combined their data with data from the literature to examine how protein intake scales with body mass throughout development, within and across different species.

      Strengths:

      To determine how locusts balance protein: carbohydrate intake, the authors applied the Geometric Framework (GF) of nutrition, which is a powerful approach for studying the effects of nutrition and understanding the rules of compromise associated with balancing dietary imbalances.

      Captivity can change behavior and physiology of most organisms, making it difficult to establish the relevance of laboratory experiments to what happens in the real world. A strength of this paper is that it compares behavior/physiology of lab vs. wild locusts. Finally, this study takes a step further by proposing a new scaling rule based on this study's results and data from the literature on various species.

      Weaknesses:

      Although the paper has strengths, there seem to be several methodological issues that obscure the interpretation/conclusions presented in the manuscript.

      It appears that the authors are not actually estimating "Intake Targets", as stated throughout the manuscript. According to the geometric framework, the intake target (IT) is estimated as the point in the nutritional landscape at which performance/fitness is optimized. The geometric framework also predicts that animals can reach their intake targets by feeding selectively when given a choice of diets that differ in nutrient amounts, which is what the authors did here. However, because the relationship between fitness/performance and diet was not established, in the choice experiments the authors seem to be assuming (but not testing) that locusts are reaching their intake target.

      The reviewer is correct that we have not tested whether the intake target selected by each instar maximizes growth or some other measure of fitness. This is a nontrivial task, as there are many possible indices of fitness for juvenile instars, including growth rate, developmental time, resistance to disease/stress, as well as effects on adult reproduction. We use intake target as defined by Raubenheimer and Simpson (2018), “the intake target (IT) is a geometric representation of the nutrient mixture that the regulatory systems target through foraging and feeding.” As we explain above, we followed the protocols used by most investigators to measure intake targets, including in many papers on locusts.

      You estimated a mass-specific protein intake for each instar. It is not clear why mass-specific intake and not just intake of protein was used for analysis. While mass (or size) of an individual may influence food consumption, it seems like authors calculated mass-specific consumption using each instar's final mass, which would make mass a result of protein consumption (and not the opposite). Importantly, the comparison between mass-specific protein consumption and specific growth rate may be problematic, as both variables seem to be estimated using final mass.

      Thank you for this important comment. We agree and therefore, we changed figure 2 and the related analyses, using protein consumption rate corrected for initial rather than final mass.
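
      To make the distinction concrete, the following is a minimal sketch of the two quantities under discussion. The formulas are assumptions for illustration only (the response does not give the exact equations used): a consumption rate normalized by initial mass, and a specific growth rate defined as the log ratio of final to initial mass per unit time. All names and numbers are hypothetical.

```python
import math


def mass_specific_rate(amount_consumed_mg, initial_mass_mg, days):
    """Consumption per unit of initial body mass per day (assumed normalization)."""
    return amount_consumed_mg / (initial_mass_mg * days)


def specific_growth_rate(initial_mass_mg, final_mass_mg, days):
    """Specific growth rate as ln(final / initial) per day (assumed definition)."""
    return math.log(final_mass_mg / initial_mass_mg) / days


# Hypothetical values for a single instar, not data from the study.
protein_eaten_mg = 30.0
start_mass_mg = 50.0
end_mass_mg = 120.0
instar_days = 6.0

print(mass_specific_rate(protein_eaten_mg, start_mass_mg, instar_days))
print(specific_growth_rate(start_mass_mg, end_mass_mg, instar_days))
```

      With this normalization, final mass enters only the growth-rate term, so the two variables no longer share the same denominator.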

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors design an automated 24-well Barnes maze with 2 orienting cues inside the maze, then model what strategies the mice use to reach the goal location across multiple days of learning. They consider a set of models and conclude that the animals begin with a large proportion of random choices (choices irrespective of the goal location), which over days of experience becomes a combination of spatial choices (choices targeted around the goal location) and serial choices (successive stepwise choices in a given direction). Moreover, the authors show that after the animal has many days of experience in the maze, they still often began each trial with a random choice, followed by spatial or serial choices.

      This study is written concisely and the results are presented concisely. The best fit model provides valuable insight into how the animals solve this task, and therefore offers a quantitative foundation upon which tests of neural mechanisms of the components of the behavioral strategy can be performed. These tests will also benefit from the automated nature of the task.

      Reviewer #2 (Public Review):

      This paper uses a novel maze design to explore mouse navigation behaviour in an automated analogue of the Barnes maze. A major strength is the novel and clever experimental design which rotates the floor and intramaze cues before the start of each new trial, allowing the previous goal location to become the next starting position. The modelling sampling a Markov chain of navigation strategies is elegant, appropriate and solid, appearing to capture the behavioural data well. This work provides a valuable contribution and I'm excited to see further developments, such as neural correlates of the different strategies and switches between them.

      Reviewer #3 (Public Review):

      Strength:

      The development of an automated Barnes maze allows for more naturalistic and uninterrupted behavior, facilitating the study of spatial learning and memory, as well as the analysis of the brain's neural networks during behavior when combined with neurophysiological techniques. The system's design has been thoughtfully considered, encompassing numerous intricate details. These details include the incorporation of flexible options for selecting start, goal, and proximal landmark positions, the inclusion of a rotating platform to prevent the accumulation of olfactory cues, and careful attention given to automation, taking into account specific considerations such as the rotation of the maze without causing wire shortage or breakage. When combined with neurophysiological manipulations or recordings, the system provides a powerful tool for studying the spatial navigation system.

      The behavioral experiment protocols, along with the analysis of animal behavior, are conducted with care, and the development of behavioral modeling to capture the animal's search strategy is thoughtfully executed. It is intriguing to observe how the integration of these innovative stochastic models can elucidate the evolution of mice's search strategy within a variant of the Barnes maze.

      Comments on revised version:

      The authors have addressed all the points I outlined in the previous round of review, resulting in significant improvements to the manuscript. However, I have one remaining comment. Given the updated inter-animal analysis (Supplementary Figure 8), it appears that male and female mice develop strategies differently across days. Male mice seem to progressively increase their employment of spatial strategy across days, at the expense of the random strategy. Conversely, female mice exhibit both spatial and serial strategies at their highest levels on day 2, with minimal changes observed on the subsequent days.

      These findings could alter the interpretation of Figure 5 and the corresponding text in the section "Evolution of search strategy across days".

      For instance, this statement on page 6 doesn't hold for female mice: "The spatial strategy was increased across days, ... largely at the expense of the random strategy."

      We agree with the reviewer. While the text on page 6 is still valid for the male-female pooled data, we have clarified in the next section describing male-female differences that this trend is not observed in females. Furthermore, we adjusted the relevant part of the discussion in the following manner:

      “A shift in the proportion of random, spatial and serial strategies was observed across days. Several factors might contribute to this shift, including learning of the environment and goal location, changes in motivation for exploration versus goal-directed navigation, and the evaluation of each strategy’s benefit via reinforcement learning. The spatial strategy progressively increased, mostly at the expense of the random strategy. This trend suggests a diminishing interest in exploration and an increasing benefit from employing the spatial strategy as the mice became more familiar with the environment and goal location. Consistent with this hypothesis, the development of the spatial strategy approximately matched the development of spatial maps in the hippocampus37 and the growth pattern of hippocampal feedforward inhibitory connectivity62, both showing progressive increases that reached plateaus after a week. In contrast, the serial strategy showed a sudden increase from day 1 to day 2, indicating that this goal-directed strategy is associated with rapid learning and could already be reinforced on day 2. However, the strategy shift was not uniform across the mouse population, as male and female mice showed distinct trends. Female mice showed no progressive increase in spatial strategy and initially relied more on the spatial strategy while using the random strategy less compared to male mice. This difference might be explained by faster learning of goal location and/or a stronger inclination towards goal-directed navigation over exploration in female mice.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) The following sentence in the abstract is not grammatical: "The processes randomly selected vestibules based on either uniform (random) or biased (serial and spatial) probability distributions; closely matched experimental data across a range of statistical distributions characterizing the length, distribution, step size, direction, and stereotypy of vestibule sequences; and revealed a shift from random to spatial and serial strategies over time, with a strategy switch occurring approximately every 6 vestibule visits."

      One possible revision is: "The processes randomly selected vestibules based on either uniform (random) or biased (serial and spatial) probability distributions; [they] closely matched experimental data across a range of statistical distributions characterizing the length, distribution, step size, direction, and stereotypy of vestibule sequences, [revealing] a shift from random to spatial and serial strategies over time, with a strategy switch occurring approximately every 6 vestibule visits."

      We followed the reviewer’s suggestion.

      (2) There is a missing word in the following sentence in the last paragraph of the discussion: "Our tools might be combined in the future with optogenetic and/or pharmacogenetic [missing word here] to investigate the neural mechanisms underlying strategy selection"

      We added the word ‘manipulations’: ‘… optogenetic, pharmacogenetic manipulations …’

      Reviewer #2 (Recommendations For The Authors):

      I have two minor suggestions:

      (1) Results - Automated Maze section: It would be beneficial to clarify here that the floor and cues rotate, allowing automation by chaining start/end positions together. This information is key to the reader understanding the task, and currently they would only know this by studying fig1 or delving into the methods.

      As suggested by the reviewer, we have added the following text in the Results - Automated Maze section:

      “The maze consists of an enclosed arena with an array of 24 doors evenly spaced along the periphery, and two home boxes moving around the arena perimeter. Start positions are changed by rotating the arena and the home boxes (Fig. 1b). Furthermore, the arena has a tinted cover that prevents mice from seeing room cues while still allowing for infrared tracking of mouse trajectories.”

      (2) I still find the authors' decision to exclude days from some of the line plots, e.g. days 3,4,5 from Fig2 etc, a little odd as this makes the reader wary. I appreciate their argument about clarity, but this can still be achieved while partitioning all of the data rather than excluding certain days. NB I do not find the heat map distributions in the far panel a particularly good substitute for this, as pixel intensities are far less interpretable.

      We appreciate the reviewer’s comment. We want to point out that line plots for all individual days are actually displayed in Supplementary Figure 7a.

      Reviewer #3 (Recommendations For The Authors):

      Although the difference between females and males is clear in Figure S8b, please note that the statistics in panels C and D might not be appropriate, as many of them may become insignificant if adjusted for multiple comparisons.

      If we understand correctly, a Bonferroni correction would need to consider the 3 day intervals in Figure S8c and the 2 day groups in Figure S8d. This would mean a significance threshold of 0.05/3 = 0.016667 for Figure S8c and 0.05/2 = 0.025 for Figure S8d, after Bonferroni correction. As it stands, all comparisons that are not labelled ’ns’ in Figure S8c-d remain significant even after applying the Bonferroni correction.
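
      The arithmetic above can be checked with a short sketch. The helper names and the example p-values are hypothetical and are not taken from Figure S8; only the thresholds 0.05/3 and 0.05/2 come from the response.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    return alpha / n_comparisons


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which p-values stay significant when correcting for len(p_values) tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


print(round(bonferroni_threshold(0.05, 3), 6))  # 0.016667, as for Figure S8c
print(bonferroni_threshold(0.05, 2))            # 0.025, as for Figure S8d

# Hypothetical p-values for three comparisons (illustration only).
print(significant_after_bonferroni([0.001, 0.012, 0.03]))  # [True, True, False]
```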

    1. Author response:

      The following is the authors’ response to the original reviews.

      (1) The authors should show i) whether the variants exhibit the same surface expression as wildtype and ii) whether changes of surface expression (e.g. wt transporter expressed low and high) alters growth rates under conditions where growth depends on amino acid uptake. The authors say that the uptake of radioactive substrate and the overall fitness coincide (Figures 5 and 6), but it would be good to quantify the correlation, perhaps by using a scatterplot and linear regression.

      We thank the reviewer for the questions and proposals. A comparison of surface expression between the transporter-expressing variants was added to the manuscript (Figure 3- Figure supplement 1 and 2). For the AGP1 variants, the calculated surface expression of the evolved mutants was similar to that of the wild-type, indicating that transporter overexpression has no impact on the growth rate per se. The same analysis for the PUT4 variants showed a significant difference, with the PUT4-S variant seemingly expressed more than the wild-type. However, that does not seem to affect the uptake effect of the mutation in the cases of the original substrates Ala, Gly and GABA, since in those cases the transporter activity of the evolved variant is substantially decreased (Figure 5). Thus, the variation in surface expression between the mutant and the wild-type, which could be attributed to the small sample size and the inherent limitations of the analysis (imaging of a culture with cells in different planes), is not expected to interfere with the reported results.

      Additionally, a scatterplot accompanied with a linear regression curve describing the connection between the overall fitness and uptake of 2 mM radioactive substrates was added to the manuscript, as advised (Figure 5- Figure supplement 2). In both cases of 2 mM Phe or Glu, the regression model explains 60-70% of the variation observed in the uptake rate of the amino acids by the different variants if changes in the uptake rate are dependent on changes in the fitness.
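
      For readers unfamiliar with the statistic quoted here, the following is a minimal sketch of an ordinary least-squares fit and its coefficient of determination (R², the fraction of variation explained). It assumes fitness as the predictor and uptake rate as the response, as the sentence above implies; the function name and the data points are hypothetical, not values from the manuscript.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x, plus R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical relative fitness and uptake rates for five variants.
fitness = [0.2, 0.4, 0.5, 0.8, 1.0]
uptake = [1.1, 2.3, 1.9, 3.8, 4.1]

slope, intercept, r_squared = linear_fit(fitness, uptake)
print(slope, intercept, r_squared)
```

      An R² of 0.6 to 0.7 would mean that 60-70% of the variance in uptake rate is accounted for by the fitted line.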

      (2) The authors should further investigate to what extent the (over)expression of wildtype versus variant transporters impacts growth rates. I would recommend such experiments being done under conditions where nitrogen uptake does not depend on amino acid uptake. I could imagine that some of the fitness data are confounded by the general effects of mutations on growth rates. More concretely, I could imagine that overexpression of e.g. the AGP1-G variant is less of a burden for the yeast cells and would allow to grow them better in general. This could explain why its overall fitness is close to wt, whereas other variants exhibit diminished fitness (Fig. 4A).

      The growth curves of all transporter variant cultures in the absence of selection for amino acid uptake have been presented in Figure 4 - Supplement figure 1. As proposed, the growth rates of the variants in medium with ammonium as nitrogen source were calculated and presented in Figure 3- Supplement figure 1 and 2. For both cases of AGP1 and PUT4 expressing variants, statistical analysis showed no significant difference between the mutants and the wild-type.

      (3) It is quite remarkable that the PUT4-S variant has such a dramatically enlarged substrate spectrum. In addition, the fitness losses for Alanine and GABA are rather small. This striking finding asks the question of why yeast has not evolved this much better/more efficient variant in the first place?

      We thank the reviewer for this very good question. We have now included an explanation in the Discussion, but to give a short answer here: One should keep in mind that we used a 10-gene deletion strain to select for given mutants. Wild-type cells have a wide spectrum of substrates through the use of many amino acid transporters, and their regulation is intricately tuned to achieve optimum transport under any environmental circumstance. Broadening the spectrum of a single transporter thus would not lead to increased fitness. On the contrary, it would probably throw off this fine balance.

      (4) It would be generally interesting which types of selections (transporter/amino acid combinations) were tried (maybe as part of the methods section). I could imagine that the examples that are shown in the paper are the "tip of the iceberg", and that many other trials may have failed either because the cultures died, or the identified clones would grow faster due to mutations outside of the plasmid. It would be helpful for researchers planning such experiments in the future to be made aware of potential stepping stones.

      The issues raised here are spot-on, as we actually did test the evolution of PUT4 towards transport of other amino acids than the two mentioned in the report. Aside from the successful Asp and Glu, we ran parallel cultures selecting for transport of Gln, Thr, Trp, Tyr, and Cit. None of these evolution regimes led to increased growth phenotypes that were linked to the evolved gene, and we did not investigate these cultures further. At this point, we cannot fully explain this result, which is why we decided to omit it from the report. The L207S variant of PUT4 was later shown to indeed support growth on Gln, Thr, and Cit. Therefore, we speculate that the reason for not evolving this mutant in the respective evolution cultures was that the fitness gain in these amino acids was not large enough for the mutant to be sufficiently enriched in the course of the evolution trial. Given that the Δ10AA strain still harbors nine amino acid transporter genes in its genome, it is conceivable that upregulation of some of these genes causes growth in some amino acids, prohibiting the selection of mutations in PUT4 (e.g., by mutations outside the plasmid, as the reviewer aptly suggested). We deemed these (negative) results not appropriate for the manuscript, as our main focus was characterizing the fitness effects of single mutations, not the laboratory evolution process of obtaining the mutants.

      (5) The authors took a genetic gain-of-function approach based on random mutagenesis of the transporter. In such approaches, it is difficult to know which mutation space is finally covered/tested, and information that can be gained from loss-of-function analyses is missed. Accordingly, the outcome is somewhat anecdotal. To provide an idea of the mutational landscape accessible, the authors could perform NGS of cultures without any selective pressure, and report the distribution of missense variants in the population.

      We very much appreciate the interest in the details of the mutagenesis. Based on the information given in the original OrthoRep publications (e.g., Ravikumar et al., DOI: 10.1016/j.cell.2018.10.021; mutation rate approx. 10⁻⁵ per generation and nucleotide), we calculated the expected number of mutations per passage in our experiments. For AGP1, it is about 5000 mutational events per passage (10 mL culture volume and 1:200 dilution), and for PUT4, it is about 1000 mutational events per passage (2 mL culture volume and 1:100 dilution). At a gene length of about 2000 bp, we expect to cover most single mutations already in the first or second passage (in the absence of selection). This is reflected in the result that the strongly beneficial mutation L207S in PUT4 was recovered in every selection on Asp or Glu we tested. We included this information in the Methods section.
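
      The quantities in this estimate can be related to each other with a few lines of arithmetic. The sketch below is a simplified illustration, not the authors' calculation: it assumes that a culture regrows to its previous density after each dilution, that every single-nucleotide substitution is equally likely, and that no selection acts. All function names are hypothetical. It does not attempt to reproduce the quoted numbers of mutational events, which also depend on cell density and plasmid copy number, neither of which is given in the response.

```python
import math


def generations_per_passage(dilution_factor):
    """Doublings needed to regrow after a 1:dilution_factor dilution."""
    return math.log2(dilution_factor)


def per_gene_mutation_rate(rate_per_nt, gene_length_bp):
    """Expected mutations per gene copy per generation."""
    return rate_per_nt * gene_length_bp


def expected_coverage(n_events, gene_length_bp):
    """Expected fraction of distinct single-nucleotide substitutions
    hit at least once after n_events independent mutational events.

    Assumes 3 possible substitutions per position, all equally likely.
    """
    n_possible = 3 * gene_length_bp
    return 1.0 - (1.0 - 1.0 / n_possible) ** n_events
```

      For example, `generations_per_passage(200)` gives the number of doublings per passage for the 1:200 dilution quoted for AGP1, `per_gene_mutation_rate(1e-5, 2000)` gives the per-copy rate for a gene of about 2000 bp, and `expected_coverage` can be evaluated with the quoted number of events accumulated over one or more passages.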

      That said, the present study was consciously designed to research gain-of-function mutations, as we wanted to know if and how membrane transporters can evolve new substrate specificities without losing the original functions. Our approach was chosen to reflect as closely as possible a natural scenario where a microorganism encounters a new ecological niche (a new nutrient to be transported). At the same time, we included selective pressure to keep the capacity to thrive in the original niche (to assimilate an ancestral nutrient). This approach is designed to specifically select against any loss-of-function mutations, which is in line with most modern theories about evolution of protein function (excellently reviewed in Soskine and Tawfik, DOI: 10.1038/nrg2808). We find that this approach gives a good idea of how transporters could evolve new functions in a natural setting. By engineering single mutations in the wild-type background of the transporters, we show the fitness effects of different single mutations; this finding thus does not depend on the mutational landscape that is covered in the experiment.

      (6) The authors do not discuss the impact of these mutations on transport rates/kinetics, which are known to play a role in substrate selection in solute carriers (https://www.nature.com/articles/s41467-023-39711-y). Do the authors think ligand binding/recognition is more important than kinetic selection in the evolution of function?

      Indeed, the observed phenotypes can stem from both changes in transport rate and changes in substrate binding. In our opinion, both are perfectly possible explanations for the behavior of evolved transporter variants. We are not discussing this in the manuscript as the weak transport of the novel substrates in the wild-type transporters did not allow us to unambiguously assign one or the other. Yet, we can lend minor circumstantial evidence pointing towards substrate affinity being the more important factor in evolving a new activity in transporters: Overall transport rate (for original substrates) declined in most evolved transporters. Therefore, it is a bit less likely that improved transport rate allowed novel substrates to be used as a nutrient. However, this is not to say that both processes cannot occur (even side by side).

      (7) Ultimately, what are the selective pressures that drive transporter function? The authors pose this question but don't fully develop the idea. Would promiscuous variants still be selected for if the limiting nitrogen source was taken up by the cell via a different pathway (i.e. ammonium or perhaps arginine)?

      Evolution and regulation of transporters is a very complex system, and we simplify this system in our single-transporter/single-amino acid approach. In nature, the selective forces are assumed to be much smaller than in our system, and multiple selective pressures might occur at the same time (maybe even in opposite directions). Therefore, such predictions are beyond the scope of the present study. In short, yeasts (and other organisms) have evolved the capacity to transport all natural amino acids. Yet, to actually allow fine-tuned regulation of transport of each individual amino acid, narrow- and broad-range transporters have evolved, including a lot of redundancy. This means that the question posed cannot be answered with yes or no, only with “it depends”.

      (8) Amino acids are a special class of metabolites, in that they all have the same basic structure. Thus, transport systems really only need to recognize the amino and carboxyl groups with high fidelity, and can modulate the side chain binding site to increase specificity. This was demonstrated in a bacterial APC transporter (https://www.nature.com/articles/s41467-018-03066-6#Sec2). Is this why the APC fold is largely responsible for AA uptake in biology?

      Indeed, typically, APC-type amino acid transporters bind the amino and carboxyl groups in the same position by backbone interactions. Therefore, this might be an ancestral feature of the APC superfamily and explain why this group represents the main group of amino acid transporters.

      (9) There isn't much discussion on the location of the mutations with respect to binding site vs. gating helices. Are there hotspots of mutations within the APC, and areas where variation is poorly tolerated? It would be helpful to briefly review what is known about mutations that change amino acid specificity in the APC family. My impression is that other studies applying rational mutagenesis have also shown that single-site mutations in the binding pocket alter substrate specificity - are these analogous to the L207 in PUT4? PUT4: I64T comes up in 3 of 5 selections. Did the authors consider a closer analysis of this mutation, and if not, why?

      We agree that it would be helpful to determine hotspots of mutations in APC transporters that lead to changes in selectivity. However, we feel that the current literature does not provide enough data to support an extended analysis of such hotspots. Likewise, the natural sequences of APC transporters are not similar enough to determine which residues are responsible for a certain selectivity profile. There are however some studies on site-directed mutagenesis, as mentioned by the reviewer. A short summary of those is discussed in the revised paper. Interpretation of the previous studies in the light of our results suggests that the sites that evolved in our experiments play a significant role in substrate selectivity and transporter function within the superfamily of the APC transporters.

      As to the question why we did not include the I64T mutation in our experiments: this mutation lies within the poorly defined N-terminus of the protein, which is not part of the transmembrane core. We therefore deemed this residue as probably not connected to the specificity of the protein; it might be related to the protein’s stability in the cell, as the termini of transporters are known to be important for post-translational regulation, especially vacuolar degradation.

      (10) What do we learn about the APC fold that informs our understanding of where substrate specificity arises in this fold? Do the authors think all SLC folds are equally capable of adaption, or are some more evolutionary-ready than others? An evolutionary analysis of these transporters to gain insights into whether the identified substitutions also occurred during natural evolution under real-life conditions would further strengthen the manuscript. Could the authors provide a sense of how similar the 18 yeast amino acid transporters are, such as sequence alignments or a matrix of pairwise sequence identity/similarity? Are they very diverged, or is the complement of amino acid substrates covered by a rather conserved suite of transporters?

      We do not want to make bold statements about adaptive evolution in other SLC folds, but we consider it not unlikely that a similar approach will lead to similar conclusions in other transporters. As advised, a pairwise identity matrix was added to the manuscript (Figure 1–figure supplement 2).
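
      For readers unfamiliar with such a matrix, the sketch below illustrates how pairwise percent identity could be computed from sequences that are already aligned to the same length. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, alignment columns with a gap in either sequence are skipped (other conventions for the denominator exist), and the alignment itself is assumed to come from a separate tool.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Build a nested dict {name: {name: percent identity}}.

    `aligned` maps a sequence name to its aligned sequence string.
    """
    names = list(aligned)
    return {
        a: {b: percent_identity(aligned[a], aligned[b]) for b in names}
        for a in names
    }
```

      Called on a dictionary of aligned transporter sequences, `identity_matrix` returns a symmetric table with 100% on the diagonal, which is the layout typically shown in such a supplement.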

      As to the proposed analysis focusing on natural occurrence of the mutations we found: we have indeed looked into this, but have not found evidence of such mutations. This is actually expected, as our selection regime puts “unnatural” selective pressures on a single transporter in isolation, which in reality co-evolved with a whole suite of other transporters that already have the capacity to transport all amino acids. Therefore, it is unlikely that the same mutations would happen in a natural setting. Our study is designed to capture evolution where a completely novel substrate is encountered, for which no transport mechanism has evolved yet.

      (11) Throughout: some of the bar graphs show individual data points, but others do not (Figure 3, Figure 5). These should be shown for all experiments.

      We thank the reviewer for the comment. In the revised version of the manuscript, we included individual data points in all bar graphs.

      (12) For bar graphs in which no indication of significance is shown, does this mean that p>0.05? Comparisons that are not significant (p>0.05) should be indicated as such.

      We thank the reviewer for the comment. In the revised version of the manuscript, we indicated in the legends that in cases of no significant difference (p > 0.05) between the wild-type and the evolved variants, no asterisks are shown.

      (13) Figure 5, Figure 6: Are the three confocal images just three different fields of view? It might be useful to include a zoom-in on a single representative cell, as it is hard for the reader to see to evaluate the membrane localization.

      In the revised version of the manuscript, we clarified that the three confocal images represent three different cultures, as each variant was tested in triplicates. We also included a zoom-in of a representative cell, as suggested.

      (14) In the main text, page 9, the conditions used for each experimental evolution are not clear ("nitrogen limiting mixture of amino acids (1 mM final concentration)". I think this is an important detail, since the mixtures are quite different for the more promiscuous vs. the more selective transporter, and it would be helpful if this was described more clearly in the main text.

      We thank the reviewer for the comment. We have included further clarification in the revised manuscript.

      (15) Figure 1-Supplement 1 and Figure 4 Supplement 4 - can't read the figure labels. Try labeling columns and rows rather than individual plots.

      We have taken the proposal into account and revised the indicated figures accordingly.

      (16) Page 9: "The transporter gene was sequenced and re-introduced into Delta-10AA cells." Was the plasmid isolated, sequenced, and re-introduced, or was the gene cut-and-pasted into a new vector backbone?

      In the revised manuscript we have clarified that the gene was sequenced and then cloned into the expression vector and re-introduced into naïve Δ10AA cells.

    1. Author response:

      We thank the reviewers for appreciating our study and for providing valuable comments and recommendations.

      We are convinced that by carefully addressing the reviewers' comments and questions, we will be able to improve the manuscript’s quality.  

      Specifically, we aim to provide further analysis to validate the subdivision of G32 RGCs into sub-clusters.

      In that context, we will improve the alignment of the RGC sub-types between the calcium imaging and MEA datasets.  

      To give the reader all information about our analysis, we will improve the methods section and explain the normalization of the calcium traces and the clustering in more detail.

      Furthermore, we will also address the concerns regarding the design of the calcium imaging experiments, potential false-negative effects, and why we did not include a wash-out condition in our experimental protocol.  

      Finally, we will revise the discussion about potential NO mechanisms and expand it on how the effects we observed may relate to known or potentially novel mechanisms.

      In particular, we will also deepen our discussion and interpretation of the strychnine dataset.  

      Again, we would like to thank the reviewers for their valuable comments.

    1. Author response:

      Reviewer #2 (Public Review):

      The manuscript by Chan et al reports results of a systematic mutagenesis approach to study the surface expression and APP+ transport mechanism of serotonin transporter. They complement this experimental evidence with large-scale molecular simulations of the transporter in the presence of APP+. The use of deep mutagenesis and large-scale adaptive sampling simulations is impressive and could be very exciting contributions to the field.

      On the whole, the results appear to provide a fascinating insight into the effects of mutations on transport mechanisms, and how those interrelate with the structural fold and biophysical properties of a dynamic protein and its substrate pathways. A weakness of the conclusions based on the molecular simulation is that it relies on comparison with previously-published work involving non-identical simulation systems (i.e. different protonation states).

      As we explain further below, this is because a preprint of previous MD simulations used a different protonation state for Glu508. However, the final published article (Chan, et al., Biophysical Journal. 121, 715–730, 2022) and new simulations we present here are consistent in having Glu508 protonated.

      Conclusions in this work about the origins of the sodium:serotonin 1:1 stoichiometry should also be considered in the context of the fact that there are two sodium ions bound in the structures of SERT, and more work is needed to explain why this ion is not also released/co-transported.

      We do not have any direct evidence as to why Na+ in the Na1 site is not also symported, except to say that in our simulations it remains bound while 5-HT/APP+ is imported. Only Na+ in the Na2 site is displaced into the cytosol, consistent with the known stoichiometry for transport and consistent with works by others. For example, the Na2 site is conserved as a functionally relevant site in distantly related secondary transporters (Cheng & Bahar, Structure. 2015; 23: 2171-2181; Stolzenberg et al., J. Biol. Chem. 2017; 292: 7372-7384; Koldsø et al., PLoS Comput. Biol. 2011; 7: e1002246; Khafizov et al., Proc. Natl. Acad. Sci. U S A. 2012; 109: E3035-E3044); please see further elaboration in the manuscript on lines 450-462. Nonetheless, it could be inferred from our data that Na+ in the Na2 site is the symported ion because it, rather than Na+ in the Na1 site, shares the exit pathway with substrate (interactions with the displaced Na+ ion are replaced by the amine of the substrate as it moves into the exit pathway).

    1. Author response:

      Reviewer #1 (Public Review):

      The authors report a high-quality genome assembly for a member of Xenacoelomorpha, a taxon that is at the center of the last remaining great controversies in animal evolution. The taxon and the species in question have "jumped around" the animal tree of life over the past 25 years, and seemed to have found their place as a sister-group to all remaining bilaterians. This hypothesis posits that the earliest split within Bilateria includes Xenacoelomorpha on the one hand and a clade known as Nephrozoa (Protostomia + Deuterostomia) on the other, and is thus referred to as the Nephrozoa hypothesis. Nephrozoa is supported by phylogenomic evidence, by a number of synapomorphic morphological characters in the Nephrozoa (namely, the presence of nephridia) and lack of some key bilaterian characters in Xenacoelomorpha, and by the presence of unique miRNAs in Nephrozoa.

      The Nephrozoa hypothesis has been challenged several times by the authors' groups who alternatively suggest placing Xenacoelomorpha within Deuterostomia as a sister group to a clade known as Ambulacraria. This hypothesis (the Xenambulacraria hypothesis) is supported by alternative phylogenomic datasets and by the shared presence of a number of unique molecular signatures. In this contribution, the authors aim to strengthen their case by providing full genome data for Xenoturbella bocki.

      The actual sequencing and analysis are technically and methodologically excellent. Some of the analyses were done several years ago using approaches that may now seem obsolete, but there is no reason not to include them. As a detailed report of a newly sequenced genome, the manuscript meets the highest standards.

      The authors emphasize a number of key findings. One is the fact that the genome is not as simple as one might expect from a "basal" taxon, and is on par with other bilaterian genomes and even more complex than the genome of secondarily simplified bilaterians. There is an implicit expectation here that the sister group to all Bilateria would represent the primitive state. This is of course not true, and the authors are aware of this, but it sometimes feels as though they are using this implicit assumption as a straw dog argument to say that since the genome is not as simple as expected, X. bocki must be nested within Bilateria. The authors get around this by acknowledging that their finding is consistent with a "weak version of the Nephrozoa hypothesis", which is essentially the Nephrozoa phylogenetic hypothesis without implicit assumptions of simplicity.

      We were NOT suggesting that Xenacoels are ‘basal’ though others have certainly done so. We were testing, instead, whether their supposed simplicity is reflected in the composition of the genome.

      Another finding is a refutation of the miRNA data supporting Nephrozoa. This is an important finding although it is somewhat flogging a dead horse, since there is already a fair amount of skepticism about the validity of the miRNA data (now over 20 years old) for higher-level phylogenetics.

      The missing bilaterian microRNAs were one of the early pieces of evidence excluding the Xenacoelomorpha from Nephrozoa. Our new data are an important refutation of this source of evidence and add to the picture that this phylum is not lacking characters of Bilateria as had been suggested (missing microRNAs and Hox genes were explicitly interpreted in this way).

      The finding that the authors feel is most important is gene presence-absence data that recovers a topology in which X. bocki is sister to Ambulacraria. The problem is that the same tree does not support the monophyly of Xenacoelomorpha. This may be an artifact of fast evolving acoel genomes, as the authors suggest, but it still raises questions about the robustness of the data.

      In sum, the authors' results and analyses leave an open window for the Xenambulacraria hypothesis, but do not refute the Nephrozoa hypothesis. The manuscript is a valuable contribution to the debate but does not go a significant way towards its resolution.

      The manuscript has gone through several rounds of review and revision on a preprint server and is thus fairly clear of typos, inconsistencies and lack of clarity. The authors are honest and open in their interpretation of the results and their strengths.

      We thank the reviewer for their assessment of our manuscript. We have responded to some of the points they make above. As there were no specific points to edit or change raised by reviewer 1, we are replying in detail only to reviewer 2. We would like to note that we have modified the text, and thus the focus of our manuscript, in accordance with what we think reviewer 1 is suggesting in the last two paragraphs of their review.

      Reviewer #2 (Public Review):

      The manuscript describes the genome assembly and analysis of Xenoturbella bocki, a worm that bears many morphological features ascribed to basal bilateria. The authors aim to analyse this genome in an attempt to determine the phylogenetic position of X. bocki as a representative of Xenacoelomorpha and its associated acoelomorphs. In doing so, they want to inform the debate as to whether Xenacoelomorpha belongs among, or is in fact paraphyletic to, all bilaterians.

      This paper presents a high-quality assembly of the X. bocki genome. By virtue of the phylogenetic position of this species, this genome has considerable scientific interest. This assembly appears to be highly complete and is a strength of the paper. The further characterisation of the genome is well executed and presented. Solid results from this paper include a comprehensive description of the Hox genes, miRNA and neuropeptide repertoire, as well as a description of the linkage groups and how they relate to the ancestral linkage groups.

      Where this paper is weaker is in its central claims and questions: the phylogenetic position of Xenacoelomorpha, and whether X. bocki is a slowly evolving but otherwise representative member of this clade, remain insufficiently resolved.

      The authors have achieved the goal of describing the X. bocki genome very well. By contrast, it is unclear, based on the presented evidence, whether Xenacoelomorpha is truly a monophyletic group. The balance of the evidence seems to suggest that the X. bocki genome belongs within the bilateria group. However, it is unclear as to what is driving the position of the other acoels. Assuming that X. bocki and the other two species in that group are monophyletic, then the evidence will favour the authors' conclusion (but without clearly rejecting the alternatives).

      This paper will likely further animate the debate regarding this basal species, and also questions related to the ancestral characters of bilateria as a whole. In particular the results from the HOX and paraHOX clusters, may provide an interesting counterpoint to the previous results based on the acoels.

      We thank the Reviewer for their extended comments on our manuscript. We would firstly like to point out that our work was not aiming to resolve the phylogenetic position of X. bocki. We discussed this question at length, as it was and is a major and important question in evolutionary biology, however we think that we had phrased any conclusions in this regard very cautiously as we are well aware of limitations in our data to resolve the conundrum.

      In this revision we have further modified our text, specifically in the Introduction and Abstract, to make it clear that we are contributing to the understanding of the evolution and biology of a fascinating organism that cannot easily be cultured in the laboratory.

      In addition, we have supplied more explanation on why Xenacoelomorpha are generally seen as a monophyletic group and which lines of evidence point to this. Again, it should be noted here that colleagues who regard the Nephrozoa hypothesis as true, do not doubt the monophyly of Xenacoelomorpha.

    1. Author response:

      Reviewer #1 (Public Review):

      This manuscript presents an exciting new method for separating insulin secretory granules using insulator-based dielectrophoresis (iDEP) of immunolabeled vesicles. The method has the advantage of being able to separate vesicles by subtle biophysical differences that do not need to be known by the experimenter, and hence could in principle be used to separate any type of organelle in an unbiased way. Any individual organelle ("particle") will have a characteristic ratio of electrokinetic to dielectrophoretic mobilities (EKMr) that will determine where it migrates in the presence of an electric field. Particles with different EKMr will migrate differently and thus can be separated. The present manuscript is primarily a methods paper to show the feasibility of the iDEP technique applied to insulin vesicles. Experiments are performed on cultured cells in low or high glucose, with the conclusion that there are several distinct subpopulations of insulin vesicles in both conditions, but that the distributions in the two conditions are different. As it is already known that glucose induces release of mature insulin vesicles and stimulates new vesicle biosynthesis and maturation, this finding is not necessarily new, but is intended as a proof of principle experiment to show that the technique works. This is a promising new technology based on solid theory that has the possibility to transform the study of insulin vesicle subpopulations, itself an emerging field. The technique development is a major strength of the paper. Also, cellular fractionation and iDEP experiments are performed well, and it is clear that the distribution of vesicle populations is different in the low and high glucose conditions. However, more work is needed to characterize the vesicle populations being separated, leaving open the possibility that the separated populations are not only insulin vesicles, but might consist of other compartments as well. 
      It is also unclear whether the populations might represent immature and mature vesicles, distinct pools of mature vesicles such as the readily releasable pool and the reserve pool, or vesicles of different age. Without a better characterization of these populations, it is not possible to assess how well the iDEP technique is doing what is claimed.

      Major comments:

      1) There is no attempt to relate the separated populations of vesicles to known subpopulations of insulin vesicles such as immature and mature vesicles, or the more recently characterized Syt9 and Syt7 vesicle subpopulations that differ in protein and lipid composition (Kreutzberger et al. 2020). Given that it is unclear exactly what populations of vesicles will be immunolabeled (see point #2 below), it is also possible that some of the "subpopulations" are other compartments being separated in addition to insulin vesicles. It will be important to examine other markers on these separated populations or to perform EM to show that they look like insulin vesicles.

      We thank the reviewer for this comment and have added the following to the discussion:

      “The intensity peaks we observed at specific EKMr values likely correspond to some of the previously described insulin vesicle subpopulations [34,54-57]. Larger particles are expected to have a smaller EKMr value compared to smaller particles [50]. Subpopulations containing larger insulin vesicles, such as a mature pool [34,54], synaptotagmin IX-positive vesicles [57], or docked vesicles near the plasma membrane [34] may have lower EKMr values than smaller immature vesicles. Additionally, phosphatidylcholine lipids increase the zeta potential of tristearoylglycerol crystals [58]. This effect may extend to insulin vesicle subpopulations containing more phosphatidylcholine, such as young insulin vesicles [55], which could lead to higher EKMr values. Taken together, these two properties may be used to predict the EKMr values of known insulin vesicle subpopulations. For example, insulin vesicles with EKMr values of 1-2×10⁹ V/m² (Fig. 4C) may represent a synaptotagmin IX-positive subpopulation due to their larger radii and depletion under glucose stimulation. Additionally, young insulin vesicles may have EKMr values between 5 and 7.5×10⁹ V/m² (Fig. 4C) due to higher amounts of phosphatidylcholine present in this subpopulation [55]. In this EKMr range, we observed a higher intensity for glucose-treated cells which may suggest biosynthesis of new vesicles. Immature insulin vesicles are likely to have higher EKMr values due to their smaller size [34], such as an EKMr value between 1.5-1.6×10¹⁰ V/m² (Fig. 4C). Here we demonstrated the capabilities of DC-iDEP to separate insulin vesicle subpopulations in an unbiased manner. Future experiments using chemical probes to label subpopulations will be useful to accurately define the EKMr values associated with specific subpopulations.” pages 7-8, lines 176-191

      Furthermore, we have conducted additional experiments using a modified INS-1 cell line with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61) in order to visualize a more complete population of insulin vesicles. By using this cell line, we have performed confocal microscopy, transmission electron microscopy, and cryo-electron microscopy experiments, demonstrating that the isolated vesicles resemble insulin vesicles and contain GFP-tagged C-peptide (Fig. 1-S3). While we acknowledge that further investigation using a more detailed labeling strategy of known insulin vesicle populations with DC-iDEP would be informative, we believe it is beyond the scope of our initial proof-of-concept experiments.

      The following text was added to the results section to describe our additional microscopy analysis:

      “To verify that the insulin vesicles were intact prior to DC-iDEP, we imaged a modified INS-1E cell line that contains a human insulin and green fluorescent protein-tagged C peptide (hPro-CpepSfGFP) [49]. This GFP tag allowed for quick visual verification of intact vesicles using fluorescence confocal microscopy. We observed distinct puncta rather than a diffuse GFP signal which indicated that the vesicles were intact and not ruptured. Further analysis of isolated vesicles was done using EM. We observed intact vesicles with the expected size and shape using both transmission electron microscopy (TEM) and cryo-electron microscopy (cryo-EM) (Fig. 1—figure supplement 3).” Page 5, lines 104-109.

      2) An antibody to synaptotagmin V is used to immunolabel vesicles, but there has been confusion between synaptotagmins V and IX in the literature and it isn't clear what exactly is being recognized by this antibody (this reviewer actually thinks it is Syt 9). If it is indeed recognizing Syt 9, it might already be labeling a restricted population of insulin vesicles (Kreutzberger et al. 2020). The specificity of this antibody should be clarified. Furthermore, Figure 2 is not convincing at showing that this synaptotagmin antibody specifically labels insulin vesicles nor is there convincing colocalization of this synaptotagmin antibody with insulin vesicles. In the image shown, several cells show very weak or no staining of both insulin and the synaptotagmin. The highlighted cell appears to show insulin mainly in a perinuclear structure (probably the Golgi) rather than in mature vesicles (which should be punctate), and insulin is not particularly well-colocalized with the synaptotagmin. Other cells in the image appear to have even less colocalization of insulin and synaptotagmin, and there is no quantification of colocalization. It seems possible that this antibody is recognizing other compartments in the cell, which would change the interpretation of the populations measured in the iDEP experiments. It would also be good to perform synaptotagmin staining under glucose-stimulating conditions, in case this alters the localization.

      We thank the reviewer for bringing this issue to our attention. The antibody originally used in Figure 2 recognizes the 386 aa isoform of synaptotagmin, which is called Syt 9 in the paper mentioned above (Kreutzberger et al. 2020). We have edited our manuscript to label this antibody as “Synaptotagmin IX” to match the existing literature. This antibody, therefore, likely labels only a subset of insulin vesicles. We believe that populations measured in the iDEP experiments consist solely of insulin vesicles, as supported by Western blot and dynamic light scattering results (Fig. 1—figure supplement 2B-C), as well as EM images (Fig. 1—figure supplement 3). Even with a subset of insulin vesicles, these results show the potential of this method, as iDEP analysis reveals heterogeneity within the population of Syt 9-positive insulin vesicles. We have replaced the original immunofluorescence images in Figure 2 with images that are more representative of INS-1E cells. We recognize that immuno-labeling did not yield perfect co-localization, which was expected. However, these experiments do provide valuable insights into the promise of using DC-iDEP for more in-depth separation analysis. Future work will use a modified INS-1 cell line or mouse model with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61) in order to visualize a less restricted set of insulin vesicles, avoiding the limitations associated with antibodies confined to a specific insulin vesicle subpopulation.

      3) The EKMr values of the vesicle populations between the low and high glucose conditions don't seem to precisely match. It is unclear if this is just a technical limitation in comparing between experiments or instead suggests that glucose stimulation does not just change the proportion of vesicles in the subpopulations (i.e. the relative fluorescent intensities measured), but rather the nature of the subpopulations (i.e. they have distinct biophysical characteristics). This again gets to the issue of what these vesicle subpopulations represent. If glucose stimulation is simply converting immature to mature vesicles, one might expect it to change the proportion of vesicles, but not the biophysical properties of each subpopulation.

      We thank the reviewer for this question. We agree that glucose likely shifts the proportion of vesicles within a specific EKMr value rather than impacting the overall biophysical characteristics of all vesicles. We have performed new statistical analysis as suggested and rewritten this section to better explain the differences between conditions.

      “Visual inspection of the collected data revealed generally similar patterns of vesicles collected at specific EKMr values (Fig. 4). However, at 1200 V we achieved adequate separation of vesicle populations to discern unique populations of vesicles from cells treated with glucose compared to no treatment. Using a two-way ANOVA, we found a statistically significant interaction between the effect of treatment on vesicles collected at each EKMr value for data collected only at 1200 V [F(8, 45) = 3.61, p = 0.003]. A Bonferroni post hoc test revealed a significant difference in the intensity or quantity of vesicles collected between treated and untreated samples at 1.10×10⁹ V/m² (p = 0.0249), 5.35×10⁹ V/m² (p = 0.0469), and 7.45×10⁹ V/m² (p = 0.0369). These differences reflect a shift in the populations of insulin vesicles upon glucose stimulation.” Page 7, lines 158-165
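
      As a reading aid for the post hoc step quoted above, the following is a minimal sketch of a Bonferroni adjustment, assuming the simplest form of the correction (each raw p-value multiplied by the number of comparisons and capped at 1). The function name and the raw p-values are hypothetical and are not taken from the manuscript; the statistical software used by the authors may report adjusted values differently.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted p-values.

    Each raw p-value is multiplied by the number of tests and capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for four comparisons (illustrative only).
raw = [0.002, 0.004, 0.02, 0.5]
adjusted = bonferroni_adjust(raw)

for p_raw, p_adj in zip(raw, adjusted):
    # A comparison is called significant only if the adjusted value is below 0.05.
    print(f"raw={p_raw:.3f} adjusted={p_adj:.3f} significant={p_adj < 0.05}")
```

      With nine EKMr values compared, for instance, a raw p-value would need to fall below roughly 0.0056 to remain below 0.05 after this adjustment.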

      We have also now directly addressed the potential identities of the different populations in the discussion section. This was addressed in major comment #1 and on pages 7-8, lines 176-191 of the manuscript.

      4) The title of the paper promises "isolation" of insulin vesicles, but the manuscript only presents separation and no isolation of the separated populations. Isolation of the separated populations is important to be able to better define what these populations are (see point #1 above). Isolation is also critical if this is to be a valuable technique in the future. Yet the paper is unclear on whether it is actually technically feasible to isolate the populations separated by iDEP. In line 367, it states "this method provides a mechanism for the isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis (imaging, proteomics, lipidomics, etc.)." However, in line 361 it says "developing the capability to port the collected individual boluses will enable downstream analyses such as mass spectrometry or electron microscopy," suggesting that true isolation of these populations is not yet feasible. This should be clarified.

      We thank the reviewer for pointing this out. We have modified the text and title to put more focus on our ability to separate vesicles rather than isolate. We agree that the isolation and further biophysical characterization of these subpopulations will be critical to understanding them. However, this capability is still in development. We have made the following change to clarify that a way to isolate these subpopulations once iDEP-assisted separation has occurred is currently being developed.

      Title: “Insulator-based dielectrophoresis-assisted separation of insulin secretory vesicles”

      “this method serves as a stepping stone towards isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis…” Page 9, lines 230-232.

      Reviewer #2 (Public Review):

      This manuscript used DC-iDEP, a technology previously used on other organelle preparations to isolate insulin secretory granules from INS1 cells based on differences in dielectrophoretic and electrokinetic properties of synaptotagmin V positive insulin granules.

      The major motivation presented for this work is to provide a methodology to allow for more sensitive isolation of subpopulations of granules, allowing better understanding of the biochemical composition of these populations. This manuscript clearly demonstrates the ability of this technology to separate these subpopulations, which will allow for biochemical characterization of insulin granules in future studies.

      After proving these subpopulations can be observed, this method was then utilized to show there are shifts in these subpopulations when granules are isolated from glucose stimulated cells. Overall the method of isolation is novel and could provide a tool for further characterization of purified secretory granules.

      The observation of glucose stimulation causing shifts in subpopulations is unsurprising. Glucose stimulation could cause a depletion of insulin and other secretory content from a subset of granules. It would be expected that this loss of content would cause a shift in electrochemical properties of the granules, but this is a nice confirmation that the isolation method has the sensitivity to delineate these changes.

      Major comments:

      1) It is unclear what Synaptotagmin isoform is being looked at. Synaptotagmin V and IX have been repeatedly interchanged in the literature. See note in syt IX section of "Moghadam and Jackson 2013 Front. Endocrinology" or read "Fukuda and Sagi-Eisenberg Calcium Bind Proteins 2008".

      The 386 aa isoform that is abundant in PC12 cells has been robustly observed in INS1 cells in multiple studies and has been frequently referred to as syt IX. The sequence the antibody was raised against should be obtained from the company where it was purchased, mapped by sequence to the corresponding synaptotagmin isoform, and clarified in the text.

      We thank the reviewer for this comment. The supplier (Thermo Fisher Scientific) calls this antibody “Synaptotagmin V.” As it recognizes the 386 aa synaptotagmin isoform, we have changed references to this antibody to call it “Synaptotagmin IX” to match the existing literature.

      2) Immunofluorescence of insulin and syt V is confusing. The example images do not appear to show robust punctate structures that are characteristic of secretory granules (in both the insulin and syt V stain).

      We appreciate the reviewer bringing this point to our attention. We agree that the immunofluorescence images in Figure 2 are not representative of typical INS-1E cells and have replaced the original image for Figure 2 with new images that show punctate structures that are more characteristic of secretory granules. These images also have better colocalization of insulin and synaptotagmin V (now labeled synaptotagmin IX) than the original image, with Pearson’s R values of 0.66 and 0.64.
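
      For readers unfamiliar with the colocalization metric reported above, the following is a minimal sketch of Pearson's correlation coefficient computed over paired pixel intensities from two channels. The function name and the intensity values are hypothetical; this is not the image-analysis pipeline used to obtain the reported values of 0.66 and 0.64, which would typically also involve background subtraction and a region-of-interest mask.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered cross-products and squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-pixel intensities for two channels (illustrative only).
insulin_channel = [12, 40, 33, 8, 55, 61, 20]
syt_channel = [10, 35, 41, 15, 39, 58, 30]
print(round(pearson_r(insulin_channel, syt_channel), 2))
```

      Values near 1 indicate that the two signals rise and fall together across pixels, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion.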

      3) In the discussion it says, "Finally, this method provides a mechanism for the isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis (imaging, proteomics, lipidomics, etc.) that otherwise would not be possible given the low-abundance components of these subpopulations."

      It would help to elaborate more on the yield and concentrations of isolated granules. This would give a better sense of what level of biochemical characterization could be performed on subpopulations of granules.

      We thank the reviewer for this comment. This line has been changed to clarify the current capabilities of iDEP, as subpopulations cannot presently be removed from the channel.

      “this method serves as a stepping stone towards isolation and concentration of fractions which show the largest difference between the two population patterns for further bioanalysis…” Page 9, lines 230-232.

      Once it is possible to isolate subpopulations from the channel, we expect to obtain sufficient sample for further characterization. We anticipate that biophysical characterization such as imaging will be highly feasible, and small-scale proteomics could also be possible. However, currently we have not measured the concentration of isolated vesicles due to complications in the isolation steps. If the quantity of isolated subpopulations proves inadequate for proteomic analysis, we plan to scale up our cell culture to generate enough insulin vesicles for further biochemical characterization. However, these experiments are out of scope for our current work, so we removed details on this idea in the Introduction and Discussion.

      Reviewer #3 (Public Review):

      The manuscript from Barekatain et al. is investigating heterogeneity within the population of insulin vesicles from an insulinoma cell line (INS-1E) in response to glucose stimulation. Prevailing dogma in the beta-cell field suggests that there are distinct pools of mature insulin granules, such as ready-releasable and a reserve pool, which contribute to distinct phases of insulin release in response to glucose stimulation. Whether these pools (and others) are distinct in protein/lipid composition or other aspects is not known, but has been suggested. In this manuscript, the authors use density gradient sedimentation to enrich for insulin vesicles, noting the existence of a number of co-purifying contaminants (ER and mitochondrial markers). Following immunolabeling with synaptotagmin V and fluorescent-conjugated secondary antibodies, insulin vesicles were applied to a microfluidic device and separated by dielectrophoretic and electrokinetic forces following an applied voltage. The equilibrium between these opposing forces was used to physically separate insulin granules. Here some differences were observed in the insulin (Syt V positive) granule populations, when isolated from cells that were either non-stimulated or stimulated with glucose, which has been suggested previously by other studies as noted by the authors; however in the current manuscript, the inclusion of a number of control experiments may provide a better context for what the data reveal about these changes.

      The major strength of the paper is in the use of the novel, highly sophisticated methodology to examine physical attributes of insulin granules and thus begin to provide some insight into the existence of distinct insulin granule populations within a beta-cell; these include insulin granules that are maturing, membrane-docked (i.e. readily releasable), in reserve, newly-synthesized, aged, etc. Whether physical differences exist between these various granule pools is not known. In this capacity, the technical abilities of the current manuscript may begin to offer some insight into whether these perceived distinctions are physical.

      The major weakness of the manuscript is that the study falls short in terms of linking the biology to the sophisticated changes observed and primarily focuses on differences in response to glucose. Without knowing what the various populations of granules are, it is challenging to understand what the changes in response to glucose mean.

      Specific concerns are as follows:

      1) There is confusion on what the DC-iDEP separation between stimulated and unstimulated cells reveals. Do these changes reflect maturation state of granules, nascent vs. old granules? Ready-releasable vs. reserve pool? The comments in the text seem to offer all possibilities.

      We thank the reviewer for this comment. Additional experiments will be useful to concretely define the physical nature of these subpopulations. Our primary goal in this study is to assess the utility of DC-iDEP in reproducibly separating these subpopulations. Our current results reflect variations in the amounts of subpopulations described in the literature and/or in currently uncharacterized subpopulations. As addressed in Reviewer #1 question #1, we have added to the discussion to review these possibilities (Page 7-8, lines 176-191).

      2) It is unclear what we can infer regarding the physical changes of granules between the stimulated states of the cells. Without an understanding of the magnitude of the effect, it is unclear how biologically significant these changes are. For example, what degree of lipid or protein remodeling would be necessary to give a similar change?

      We thank the reviewer for this question. Separation by iDEP is sufficiently sensitive to distinguish particles with minimal differences between them. For example, we could successfully separate wild type GFP from a point mutation variant of GFP. We anticipate that this method is capable of distinguishing vesicles with greater physical differences between them resulting in more distinct EKMr values. However, significant future experiments are likely necessary to determine the extent of lipid and protein remodeling between each subpopulation to define the biological significance of each subpopulation.

      3) The reliance on a single vesicle marker, Syt V, is concerning given that granule remodeling is the focus.

      We appreciate the reviewer’s concern. The current manuscript focuses on synaptotagmin V (IX)-positive insulin vesicles. The results of these experiments demonstrate the capabilities of iDEP to reveal heterogeneity in a seemingly similar set of particles. In future experiments we plan to use the modified INS-1 cell line with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61). All insulin vesicles from this cell line contain GFP-tagged C-peptide, and therefore would allow for the detection of a more complete set of insulin vesicles. The results from the current manuscript provide the proof-of-concept validation that this method is promising for understanding vesicle remodeling in more detail in the future.

      4) Additional confirmation that the isolated vesicles are in fact insulin granules would be helpful. As noted, granules were gradient enriched, but did carry contaminants. Note that the microscopy image provided does not provide any real validation for this marker.

      Further confirmation that the immune-isolated vesicles are in fact insulin granules should be included. EM with immunogold labeling post-SytV enrichment would be a potential methodology to confirm.

      We thank the reviewer for this comment. We have performed new immunofluorescence imaging to demonstrate the overlap of insulin and synaptotagmin (Fig 2). Additionally, we have performed microscopy experiments with a modified INS-1 cell line with a GFP-tagged C-peptide (hPro-CpepSfGFP, GRINCH cells RRID:CVCL_WH61) in order to provide evidence of these granules’ identity. Fluorescence microscopy revealed that the isolated granules contain GFP-tagged C-peptide (Fig. 1—figure supplement 3A), while transmission electron microscopy and cryo-electron microscopy confirmed that these vesicles have radii within the correct range to be considered insulin vesicles (Fig 1—figure supplement 3B-C). We added the following text in the results section to describe the new results included:

      “To verify that the insulin vesicles were intact prior to DC-iDEP, we imaged a modified INS-1E cell line that contains a human insulin and green fluorescent protein-tagged C peptide (hPro-CpepSfGFP) [49]. This GFP tag allowed for quick visual verification of intact vesicles using fluorescence confocal microscopy. We observed distinct puncta rather than a diffuse GFP signal which indicated that the vesicles were intact and not ruptured. Further analysis of isolated vesicles was done using EM. We observed intact vesicles with the expected size and shape using both transmission electron microscopy (TEM) and cryo-electron microscopy (cryo-EM) (Fig. 1—figure supplement 3).” Page 5, lines 104-109.

      5) It would be useful to understand if the observed effects are specific to the INS-1E cell line or are a more universal effect of glucose on beta-cells.

      We agree with the reviewer that it would be interesting to study these effects in primary beta cells. While we expect to see similar results in these cells, there may be differences in the population variations or EKMr values. However, working with beta cells is currently beyond the scope of this study, as our primary focus is on validating this approach.

    1. Author response:

      Reviewer #1 (Public Review):

      Authors propose a mechanism where actin polymerization in the dendritic shaft plays a key role in trapping AMPAR vesicles around the stimulated site, promoting the preferential insertion of AMPAR into the potentiated synapse. This dendritic mechanism is novel and may be important for phenomena. Authors also developed a sophisticated method to observe the endogenous behavior of AMPAR using the HITI system.

      However, there are some major issues that need to be addressed to support the authors' claims. Also, overall, it is hard to follow. It could be better written.

      We thank the reviewer for carefully reading our text and for the helpful recommendations. We have performed additional experiments and analysis to address the raised issues (detailed below). In addition, we have streamlined and shortened the text to improve its clarity and focus on the biological story.

      Reviewer #2 (Public Review):

      In this study, Wong and colleagues investigate mechanisms leading to input-specificity of LTP. They focus on the trafficking of AMPA receptors as the surface accumulation of AMPARs is one of the key features of potentiated synapses. They employ an elegant strategy to label endogenous GluA1 with a HaloTag using CRISPR-based technology and succeed in finding a targeting site which does not interfere with the receptor's trafficking or function. This allowed them to visualize and track single receptors in endosomes as well as at the plasma membrane of primary rat hippocampal neurons. They develop and extend particle tracking and molecule counting algorithms to analyze active transport and diffusion of AMPARs and, as expected, find that neuronal activation leads to increased surface expression of labelled AMPARs. Interestingly, they also observe a strong decrease in long-range motion of AMPAR-containing vesicles upon induction of chemical LTP. From this point, the manuscript focuses on explaining this observation. The authors switch from a global activation protocol to glutamate uncaging to induce LTP at individual synapses. Also, in these settings, they measure the reduction in mobile vesicle fraction within an approximately 30 µm long dendritic segment containing the activated spine. In search of an explanation, they investigate activity-dependent actin polymerization as a possible confinement factor that could change the motility of organelles in dendrites. Their hypothesis is based on pre-existing literature demonstrating the role of F-actin in trapping and stalling dendritic endolysosomes as well as a similar role of F-actin in non-neuronal cells. Indeed, the authors convincingly show that pharmacological depolymerization or stabilization of F-actin bidirectionally impacts the trafficking behavior of AMPAR-containing vesicles in the dendritic shaft. To directly visualize effects of structural LTP at individual synapses on the dendritic actin cytoskeleton, they employ an F-actin-binding probe, Tractin. Here they find that cLTP results in the formation of dendritic F-actin fibers and bundles arranged in a network. The spatial extent of such a network correlates with an area where AMPAR vesicles exhibit decreased motility. Although this makes sense, I have some concerns about these experiments.

      Tractin has been previously published as an F-actin marker but, like several other binding probes (e.g. Lifeact), it affects F-actin structure and dynamics. The large number of F-actin bundles is not very typical for dendrites of hippocampal neurons and might be an artifact of Tractin overexpression. It is difficult to judge whether this is the case because there is no comparison with the endogenous situation where F-actin is labelled directly. The final series of experiments focuses on the role of processive myosins in stalling and exocytosis of AMPAR vesicles. To address this point, the authors employ a mixture of three different myosin inhibitors and show that although myosins are not responsible for increased vesicle confinement, they facilitate exocytosis of AMPARs. What I find somewhat missing are data and examples of AMPAR trafficking into dendritic spines. Also here, stronger experimental support could benefit the conclusions.

      Overall, the authors achieved the aims of their study. They demonstrated that synapse-specific potentiation results in signaling which triggers actin polymerization in the dendritic shaft beneath the activated input. This leads to trapping and accumulation of AMPAR-containing endosomes, which then have a higher probability of being delivered and secreted at activated dendritic spines. In addition to the conceptual advance of this work, several state-of-the-art labeling and analysis techniques were developed in this project and they will likely be used by other groups.

      We thank the reviewer for raising these important issues with regards to the use of tractin as a marker for actin polymerization. We have performed additional experiments (detailed below) using phalloidin and also dominant negative inhibitors of myosin Va, Vb, and VI in order to strengthen our conclusions. We find that inducing synaptic activity with cLTP increases phalloidin labeling and the appearance of F-actin fibers. Moreover, inhibition of myosin Va and Vb (but not VI) using their dominant negative c-terminal domains recapitulates the effects of pharmacological inhibition on both the motion states and directional bias of GluA1-HT vesicles in response to cLTP.

      With regards to AMPAR trafficking into spines, we and others have found that GluA1-containing vesicles rarely enter dendritic spines (see response to Reviewer #2, comment 3). Furthermore, exocytic events occur largely at extrasynaptic sites, such as on the dendritic shaft (Figure 5-video 1-3; Lin et al., 2007; Makino et al., 2009; Patterson et al., 2010). Consequently, we believe vesicles are concentrated proximal to synaptic activity in the dendritic shaft rather than in the dendritic spine itself, creating a larger reservoir of intracellular AMPARs that can exocytose during synaptic activity. Others have demonstrated that surface bound AMPARs diffuse across the cell membrane into stimulated synapses where they are captured (Choquet and Opazo, 2022).

      We also thank the reviewers for acknowledging the conceptual and technical advances in this work.

      Reviewer #3 (Public Review):

      Wong et al. developed a new versatile approach with a robust signal to track protein dynamics by inserting a tag into the endogenous loci and different properties of fluorescent dyes for conjugation. Using this approach, the authors monitor the trafficking of Fluorescent dye and Halo-tagged GluA1 with time-lapse imaging and found that neuronal stimulation induces GluA1 accumulation surrounding stimulated synapses on dendritic shafts and actin polymerization at synapses and dendrites. Furthermore, combining with pharmacological manipulations of actin polymerization or myosin activity, the authors found that actin polymerization facilitates exocytosis of GluA1 near activated synapses. The new approach may provide broad impacts upon appropriate control experiments, and the practical application of this approach to GluA1 trafficking upon neuronal activation is significant. However, there are several weaknesses, including confirmation of activity of the tagged receptors and receptor specificity mimicking endogenous LTP machinery. If the receptor tagged by the new robust approach reflects endogenous machinery, this approach will provide a big opportunity to the community as a versatile method to visualize a protein not visualized previously.

      Although we use methods previously demonstrated to stimulate LTP, we do not ourselves demonstrate LTP using electrophysiological methods, and consequently we have changed the text to focus on synaptic plasticity (specifically structural plasticity). Furthermore, we confirm the activity of HaloTag knock-in receptors by expressing GluA1-HT and GluA1-HT-SEP in HEK293T cells and performing whole-cell patch clamp experiments. We find that GluA1-HT and GluA1-HT-SEP respond to glutamate in a similar manner to untagged GluA1.

      We also thank the reviewer for acknowledging the novelty of our strategy.

    1. Author response:

      We thank both reviewers for their constructive feedback. We were grateful to see that both reviewers found our work to be valuable to the field, and agreed that new metrics (including our introduced MECR) were important for dataset evaluation. We briefly respond to two main points from the reviewers.

      (1) Key findings from our manuscript. While we do evaluate publicly available datasets in our manuscript, the focus/conclusion of our work is not to return a definitive ranking of in-situ technologies. As reviewers point out, our comparative evaluation is only in a single biological context, and we further note that many of these in situ platforms are rapidly evolving with new chemistries and gene panels. 

      Instead, the conclusion and purpose of our manuscript was to emphasize the importance and need for new metrics when evaluating spatial datasets. We propose an option, and demonstrate how cell segmentation can affect technical metrics, but also downstream biological analysis of in-situ datasets.

      (2) Comparing technologies with different gene panels. The reviewers correctly point out that comparing technologies that use different gene panels is not a perfect benchmark. We agree that differences in molecular counts could arise due to biological differences in the abundance of targeted genes.

      We did address this in Supplementary Figure 4, where we perform pairwise comparisons of each technology and compute these using only the overlapping genes that were measured by both technologies. Our results are consistent with the analysis of full gene sets.
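
      The overlapping-gene restriction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the platform names, gene names, and counts are hypothetical, and the function is not the code used to produce Supplementary Figure 4. The point is only that both totals are computed over the same gene set, so a difference in panel composition cannot by itself produce a difference in counts.

```python
def shared_gene_totals(counts_a, counts_b):
    """Sum per-gene molecule counts over the genes present in both panels.

    counts_a, counts_b: dicts mapping gene name -> molecule count.
    Returns (sorted list of shared genes, total for a, total for b).
    """
    shared = sorted(set(counts_a) & set(counts_b))
    total_a = sum(counts_a[gene] for gene in shared)
    total_b = sum(counts_b[gene] for gene in shared)
    return shared, total_a, total_b


# Hypothetical gene panels and counts for two platforms (illustrative only).
platform_a = {"GeneA": 120, "GeneB": 45, "GeneC": 300}
platform_b = {"GeneB": 60, "GeneC": 210, "GeneD": 90}

genes, total_a, total_b = shared_gene_totals(platform_a, platform_b)
print(genes, total_a, total_b)
```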

      While we believe that regenerating in-situ datasets with identical gene panels is beyond the scope of this work (and is likely technically infeasible), we hope that our findings are still valuable and informative to the growing spatial community.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study assesses homeostatic plasticity mechanisms driven by inhibitory GABAergic synapses in cultured cortical neurons. The authors report that up- or down-regulation of GABAergic synaptic strength, rather than excitatory glutamatergic synaptic strength, is critical for homeostatic regulation of neuronal firing rates. The reviewers noted that the findings are potentially important, but they also raised questions. In particular, the evidence supporting the findings is currently incomplete and demonstration of independent regulation of mEPSCs and mIPSCs is a necessary experiment to support the major claims of the study. 

      We appreciate the detailed, thoughtful assessment of our paper by the reviewers and editors and now submit a revised version that addresses the reviewers’ comments as detailed below in response to each concern. We include a more open discussion of alternative possibilities and have added experiments demonstrating that AMPAergic scaling in our mouse cortical cultures is triggered differently than GABAergic scaling. We treated the cultured neurons exactly as described for triggering GABAergic scaling (20µM CNQX for 24 hours), however this did not trigger AMPAergic upscaling (new Figure 7), even though it did reduce spiking/bursting activity. Below we explain the result further, but ultimately this does demonstrate independent regulation of mEPSCs and mIPSCs as requested by the editor/reviewer (spike reductions induced by CNQX reduced mIPSC amplitude, but had no effect on mEPSC amplitude).

      Reviewer #1 (Public Review):

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction).

      We certainly understand the criticism here (similar to reviewer 2’s third point). We now discuss these complications in a more detailed description in the manuscript (CTZ section of results and at end of the discussion). First, we are presenting our entire dataset to be as transparent as possible. Unlike most synaptic scaling studies (including our own) that apply drugs to alter activity and assess mPSC amplitude at the final time point, here we are actually showing CTZ’s effect on spiking activity within the culture over time. This is critical because it has informed us of the drug’s true effect on spiking, the variability that is associated with these perturbations, and the ability and timing of the cultured network to homeostatically recover initial levels. This was important because it revealed that the drugs do not always influence activity in the way we assume, and this provides greater context to our results. Second, we are showing all of our data, and presenting it using estimation statistics, which go beyond the dichotomy of a simple p value yes or no (Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. 2019. Moving beyond P values: data analysis with estimation graphics. Nat Methods 16: 565-66). Estimation statistics have become a more standard statistical approach in the last 15 years and are the preferred method for the Society for Neuroscience’s eNeuro Journal. This method shows the effect size and the confidence interval of the distribution. For the 3 hr time point in Fig. 5B the CTZ/ethanol vs. ethanol data points exhibit very little overlap, the effect size demonstrates a near doubling of spike frequency, and the confidence interval shows a clear separation from 0. This was a pairwise comparison as we compared values at each time point after the addition of ethanol or ethanol/CTZ. Third, the plots illustrate an upward trend in spike frequency at 1 and 6 hrs, but there is also clear variability.
      It is important to note that these are multiunit recordings and not purely the excitatory principal neurons that we target for mPSC recordings. This complication, along with the variability inherent in these cultures, could make simple comparisons difficult to interpret and we now discuss this (end of discussion). Regardless, we do see some increase in spiking with CTZ and we clearly see increases in mIPSC amplitude, thus providing some support for the idea that spiking could be a critical player in terms of GABAergic scaling, particularly when put in the context of all of our findings. Future work will be necessary to determine how alterations in spiking lead to changes in mIPSC amplitude and we now discuss this (2nd to last paragraph in discussion).
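
      The estimation-statistics idea described above (report an effect size with a bootstrap confidence interval rather than only a p value) can be sketched as follows. This is a minimal illustration using a plain percentile bootstrap; it is not necessarily the procedure used by the estimation statistics website, the function name is hypothetical, and the numbers are invented placeholders rather than data from the study.

```python
import random
import statistics


def bootstrap_mean_difference(control, treated, n_resamples=5000, ci=0.95, seed=0):
    """Return (observed, lower, upper) for mean(treated) - mean(control).

    Uses a simple percentile bootstrap (an assumption of this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistics.mean(treated) - statistics.mean(control)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed
        c = rng.choices(control, k=len(control))
        t = rng.choices(treated, k=len(treated))
        diffs.append(statistics.mean(t) - statistics.mean(c))
    diffs.sort()
    alpha = (1.0 - ci) / 2.0
    lower = diffs[int(alpha * n_resamples)]
    upper = diffs[int((1.0 - alpha) * n_resamples) - 1]
    return observed, lower, upper


# Hypothetical spike frequencies (Hz) per culture; NOT data from the study
ethanol = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
ctz = [2.0, 2.4, 1.7, 2.2, 2.6, 1.9]

effect, low, high = bootstrap_mean_difference(ethanol, ctz)
print(f"mean difference = {effect:.2f} Hz, {low:.2f} to {high:.2f} (95% CI)")
```

      A confidence interval that excludes 0 corresponds to the "clear separation from 0" described above. Note that computing one interval per time point does not by itself address the reviewer's question about multiple comparisons.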

      Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPAR activation - a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. Altogether, these data do not provide substantial evidence for the conclusion drawn by the authors.

      We believe that the most parsimonious explanation for our results is that spiking activity, not AMPAR activation, triggers GABAergic downscaling. GABAergic scaling is no different when comparing 24hr TTX treatment vs TTX+CTZ, and optogenetic restoration of spiking activity while continuing to block AMPAR activation was able to restore GABAergic mPSC amplitudes to control levels. It is important to emphasize that our results with TTX vs. TTX+CTZ are different for GABAergic scaling (no difference in this study) and AMPAergic scaling (CTZ diminished upward scaling in previous study – Fong et al., 2015 - PMID: 25751516) suggesting different triggers for the two forms of scaling. While we strongly believe we have demonstrated that GABAergic downscaling is dependent on spiking (not AMPAergic transmission), we now acknowledge that we cannot rule out the possibility that upward GABAergic scaling may be influenced by AMPAR activation (2nd paragraph discussion), although we have no evidence in support of this.

      Specific points:

      - The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate). 

      We largely agree with the reviewer and should not have implied that this was a necessary requirement for a spike rate homeostat. What we should have said was that historically this definition has been applied to AMPAergic scaling, which is thought to be a spike rate homeostat. We have now corrected this (introduction and discussion).

      - Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition. 

      Agreed, and we have now done this.

      - The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      The purpose of citing this study was to argue that the spike rate homeostat hypothesis doesn’t make sense for AMPAergic scaling based on a study that hyperpolarized an individual cell while leaving the rest of the network unaltered and therefore leaving network activity and neurotransmission largely normal. In this previous study scaling was not triggered, suggesting reduced spike rate within an individual cell was insufficient to trigger scaling in that cell. The more recent study mentioned by the reviewer achieved scaling by hyperpolarizing a majority of cells in the network. Importantly, this approach alters neurotransmission throughout the network, making it challenging to isolate the specific contributions of spiking vs. receptor activation. Unlike the previous study, which focused on the impact within individual cells, this newer study involves global alterations in network activity, complicating the interpretation of the role of spiking versus receptor activation in triggering scaling.

      - Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection?

      We agree that the scaling ratio plot is largely linear. To be clear, the linearity of the ratio plot was not our point here; rather, there was a positive slope, meaning ratios (CNQX mEPSC amplitudes/control mEPSC amplitudes) got bigger for the larger CNQX-treated mEPSCs. Alternatively, a multiplicative relationship where mEPSCs are all increased by a single factor (e.g. 2X) would be a flat line with 0 slope at the multiplicative value (e.g. 2). In terms of the left side of the plot, we do see values that rise abruptly from 1 - this was partially obstructed by the Y axis in this figure and we have adjusted this. This left part of the plot is likely due to the CNQX-induced increases in mEPSC amplitudes of minis that were below our detection threshold of 5 pA, as suggested by the reviewer. Therefore, minis that were 4 pA could now be 5 pA after CNQX treatment, and these are then divided by the smallest control mEPSCs, which are 5 pA (ratio of 1). We tried to do a better job describing this in the resubmission (1st paragraph of results).

      - Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted? 

      The left side of the ratio plot shows evidence consistent with the idea that mIPSCs are dropping into the noise after CNQX treatment (the smallest GABA mIPSCs that don’t fall into the noise are 5 pA, and this is divided by the smallest control GABA mPSCs of 5 pA, so the ratio is 1). The rest of the distribution will then approach the scaling factor (50% in this case). On the right side of the ratio plot the values appear to slightly increase. We are not sure why this is happening, but it may be that a small percentage of mIPSCs are not purely multiplicative at 0.5; however, the biggest mPSCs can vary to a great degree from one cell to the next, and in other cases we do not see this (Figure 4B, Figure 5E). We tried to do a better job describing this in the resubmission (results describing Figure 2).
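
      The ratio-plot reasoning in the two responses above can be illustrated with a toy calculation. The sketch below assumes a purely multiplicative change applied to a hypothetical, evenly spaced set of "true" amplitudes, a 5 pA detection threshold, and nearest-rank quantile matching between conditions; the function names and values are illustrative and are not taken from the authors' analysis code.

```python
def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list (0 <= q <= 1)."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]


def ratio_plot(true_amplitudes, scale, threshold=5.0, n_points=20):
    """Ratios of treated to control amplitudes at matched quantiles.

    Events smaller than `threshold` are treated as lost in the noise,
    separately for the control and the scaled (treated) condition.
    """
    control = sorted(a for a in true_amplitudes if a >= threshold)
    treated = sorted(a * scale for a in true_amplitudes if a * scale >= threshold)
    qs = [i / (n_points - 1) for i in range(n_points)]
    return [quantile(treated, q) / quantile(control, q) for q in qs]


# Hypothetical "true" amplitudes from 2 to 40 pA in 0.5 pA steps (toy data)
amps = [2.0 + 0.5 * i for i in range(77)]

flat = ratio_plot(amps, 2.0, threshold=0.0)  # no detection limit
up = ratio_plot(amps, 2.0)                   # upscaling by 2 with a 5 pA limit
down = ratio_plot(amps, 0.5)                 # downscaling by 0.5 with a 5 pA limit

print("no threshold:", flat[0], flat[-1])
print("upscaling:   ", up[0], up[-1])
print("downscaling: ", down[0], down[-1])
```

      Without a detection limit the ratio is flat at the scaling factor. With the limit, the leftmost ratio is pinned at 1 in both directions (the smallest detectable event divided by the smallest detectable event), and the ratio moves toward the scaling factor for larger events. How quickly it does so depends on the amplitude distribution, which is uniform here only for simplicity.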

      - The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit. 

      We have adjusted the figures to be more consistent throughout the manuscript.

      - I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network-level - given that a network level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with a network-level resolution.

      As described above, we do not have the capacity to know what the actual firing rate of a particular neuron was before and after perturbing the system, and certainly not for the specific cells we recorded from to obtain mPSC amplitudes, and so we cannot say that we have perfectly restored the original firing rates of neurons. However, there is reason to believe that this is achieved to some extent. Our optogenetic stimulation is only 50-100 ms long activating a subset of neurons. This is sufficient to provide a synaptic barrage that then triggers a full blown network burst where the majority of spikes occur, but this is after the light is off. In other words, the optogenetic light pulse only initiates what becomes a relatively normal network burst that fortunately allows the individual cells to express their relatively normal (pre-drug) activity pattern. In our previous study using optogenetic activity restoration (Fong et al., 2015) we were able to show that this was the case for individual units - the spiking of an individual unit during a burst is similar before and after CNQX/optogenetic stimulation (see Figure 4b and Suppl. Fig 4 in Fong et al. 2015). We are not claiming that we have restored spiking to exactly the pre-drug state, but bring it back toward those levels and we see this is associated with a return of the mIPSC amplitude to near control levels. We now include a brief description of this in the manuscript (results describing Figure 3).
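
      The reviewer's point about summarizing log-normally distributed firing rates with a network-level mean can be made concrete with the standard closed-form expressions for a lognormal distribution. The parameter values below are arbitrary illustrations, not estimates from these cultures, and the function name is hypothetical.

```python
import math


def lognormal_summary(mu, sigma):
    """Median, mean, and fraction of units below the mean for a lognormal.

    mu and sigma are the mean and standard deviation of log(firing rate).
    """
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2.0)
    # P(rate < mean) equals the standard normal CDF evaluated at sigma / 2
    frac_below_mean = 0.5 * (1.0 + math.erf((sigma / 2.0) / math.sqrt(2.0)))
    return median, mean, frac_below_mean


# Illustrative parameters only: median rate of 1 Hz, broad spread
median, mean, frac = lognormal_summary(mu=0.0, sigma=1.5)
print(f"median = {median:.2f} Hz, mean = {mean:.2f} Hz")
print(f"fraction of units firing below the mean = {frac:.2f}")
```

      With a broad spread, the mean sits well above the median and most units fire below it, which is why a network-level average says little about whether any individual neuron has returned to its own baseline.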

      - Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      - Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      As mentioned above, the multiplicative nature of scaling has been a historical proposal for AMPAergic scaling and we have now found such a relationship for GABAergic scaling. This is important for understanding how this plasticity works, but we agree that it is not necessary for a homeostat and we have adjusted the manuscript accordingly.

      - 277: do you mean AMPAR? 

      We were not clear enough here. We actually do mean GABAR. The idea was that CTZ increases network activity and thus increases both AMPAergic and GABAergic transmission. We have rewritten this part of the discussion to avoid any confusion (2nd paragraph discussion).

      - Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of bursts is shown in B and C.

      - Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes. 

      These figures were generated by the estimation statistics website and text may have been resized inappropriately. We have tried to adjust this and now have attempted to standardize the axes text to the best of our ability.

      - The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture. 

      We have added this consideration at the end of the discussion. Further, this is why we cited studies that argue GABAergic neurons have a particularly important role in homeostatic regulation of firing following sensory deprivations in vivo.

      - The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development. 

      We now discuss caveats of cortical cultures at the end of the discussion.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play. 

      We have broadened our discussion of alternative interpretations throughout the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      While I am hesitant to judge a paper based on its tone, I would personally recommend revision of some of the subjective words and statements, as the manuscript undermines its own effectiveness by making unnecessarily strong statements. The text repeatedly paints an "either A or B" picture, and if there's any general lesson in biology, it's that it's always A and B. Global, multiplicative glutamatergic scaling could quite conceivably occur alongside GABAergic scaling, as well as synapse-specific homeostatic modifications. It seems that it would be wise to acknowledge that, while the data presented here point in one direction, in vivo results in an adult brain (for example) might present an entirely different set of patterns. This will not only enhance the readability of the paper but also ensure that the scientific community can engage with the work in a constructive and collaborative manner. Again, I present this as only a constructive and supportive suggestion. I am a big fan of work from this laboratory, and I would love to see this paper in an improved form - it's an important set of ideas and I do believe that these data are rigorously collected. 

      We have attempted to provide a more comprehensive interpretation of our results. We agree that a homeostat can come in many flavors, but do believe that GABAergic scaling is a strong candidate, whereas AMPAergic scaling does not currently fit such a role. We do now discuss caveats with our work and are open to other interpretations that need to be fleshed out in future work.

      Reviewer #2 (Public Review):

      Major points:

      (1) The reason why CNQX does not completely eliminate spiking is unclear (Fig. 1). What is the circuit mechanism by which spiking continues, although at lower frequency, in the absence of AMPA-mediated transmission, and what is the mechanism by which spiking frequency grows back after 24h (still in the absence of AMPA transmission)?

      Is it possible that NMDA-mediated transmission takes over and triggers a different type of network plasticity?

      The bursting in AMPAR blockade is due to the remaining NMDA receptor-mediated transmission. We showed this in our previous study in Suppl. Figure 2 and 6 of Fong et al., 2015 (PMID: 25751516). Our ability to optically induce normal looking bursts of spikes was also dependent on NMDAR activation (Fong et al 2015 and Figure 6 Newman et al., 2015 - PMID: 26140329). Further, in Dr Fong’s PhD dissertation it was shown that the bursting activity was abolished when AMPA and NMDA receptors were both blocked. There are likely many factors that contribute to the recovery of activity, and certainly one of them is likely to be the weakening of inhibitory GABAergic currents as we had mentioned. We have now added the point about NMDARs mediating the remaining bursts in the manuscript (results associated with Figure 1). We are not clear on what the reviewer has in mind in terms of “NMDA-mediated transmission takes over and triggers a different kind of network plasticity”, but we do discuss the possibility that spiking triggers GABAergic scaling through its effect on NMDAergic transmission, which we cannot rule out but also have no evidence to support (3rd and 5th paragraph of discussion). We do plan on addressing this in future work.

      (2) A possible activation of NMDARs should be considered. One would think that experiments involving chronic glutamatergic blockade could have been conducted in the presence of NMDAR blockers. Why was this not the case?

      Unfortunately, it was not possible to optogenetically restore normal bursting in the presence of NMDAR blockade (even when AMPAergic transmission was intact), as NMDARs appeared to be critical for the optical restoration of the normal duration and form of the burst in rat cortical cultures (see Suppl. Figure 6 Fong et al., 2015 Nat Comm and Figure 6 Newman et al., 2015). Even high concentrations of CNQX (40µM) prevented us from restoring spiking in mouse cultures in the current study, which is why we moved to 20µM CNQX for this study. The reviewer raises an excellent point about a possible NMDAR contribution to altered synaptic strength, however. It is likely that NMDAR signaling is reduced in the presence of CNQX since burst frequency was dramatically reduced along with AMPAR-mediated depolarizations. We cannot rule out the possibility that NMDAR signaling could contribute to the alterations in GABAergic mIPSCs and discuss this in the resubmission (3rd and 5th paragraph of the discussion). We had not considered this previously because prior work suggested that a 24/48 hour block of NMDARs (APV) did not trigger AMPAergic scaling in cortical or hippocampal cultures (see Figure 1 Turrigiano et al., 1998 Nature and Suppl. Figure 4 Sutton et al., 2006 Cell). Moreover, our previous study showed that restoring NMDAergic transmission optogenetically, at least to some extent, had no influence on AMPAergic scaling (Fong et al., 2015).

      Also, experiments with global ChR2 stimulation with coincident pre and postsynaptic firing might also activate NMDARs and result in additional effects that should be taken into consideration for the global scaling mechanism.

      To be clear, our optical stimulation was of short duration (50-100 ms) and was turned off before the vast majority of the spiking in the bursts occurred. So the light flash was a trigger that allowed a relatively normal looking burst to occur after the light was off (see lower panel of Figure 3B, where the optogenetic stimulation is of short duration and occurs only at the onset of the burst; we now make this clearer in the resubmission). Therefore, we were unlikely to trigger significant synchronous activation that does not normally occur in network bursts.

      (3) Cultures exposed to CTZ to enhance AMPA receptors generated variable results (Fig. 5), somewhat increasing spiking activity in a non-significant manner but, at the same time, strengthening mIPSC amplitude. This result seems to suggest that spiking might be involved in GABAergic scaling, but it does not seem to prove it. Then, addition of TTX that blocked spiking reduced mIPSC amplitude. It was concluded here that the ability of CTZ to enhance GABAergic currents was primarily due to spiking, rather than the increase in AMPA-mediated currents. However, in addition to blocking action potentials, TTX would also prevent activation of AMPARs in the presence of CTZ due to the lack of glutamatergic release. Therefore, under these conditions, an effect of glutamatergic activation on GABAergic scaling cannot be ruled out.

      These concerns were very similar to reviewer 1’s first comments (see above). To be clear, we are going a step beyond most scaling studies by assessing MEA-wide firing rate, but this still provides an incomplete picture of the particular cells that we target for patch recordings in terms of their firing before and after a drug. Further, we see considerable variability in the effect on firing rate from culture to culture, which we now discuss in the resubmission (final paragraph discussion). The fact that mIPSCs are no different after TTX treatment vs CTZ+TTX treatment suggests that AMPAergic transmission is not so influential on GABAergic downscaling. While the CTZ results are not conclusive by themselves, taken together with the optogenetic results, where restoration of spiking in AMPAR blockade reverses scaling, they are most consistent with the idea that GABAergic scaling is triggered by spiking rather than AMPAR activation, and they place GABAergic scaling as a strong candidate for a spike rate homeostat. Although we do feel that we have demonstrated that downward GABAergic scaling is dependent on spiking, we cannot rule out the possibility that upward GABAergic scaling could be influenced by AMPAR activation to some extent. We now acknowledge this possibility (2nd paragraph discussion).

      (4) The sample size is not mentioned in any figure. How many cells/culture dishes were used in each condition?

      The individual dots represent either individual cells for mIPSC amplitude or individual cultures in MEA experiments. Number of cultures and cells are now stated in the figure legends.

      (5) Cortical cultures may typically contain about 5-10% GABAergic interneurons and 90-95 % pyramidal cells. One would think that scaling mechanisms occurring in pyramidal cells and interneurons could be distinct, with different impact on the network. Although for whole-cell recordings the authors selected pyramidal looking cells, which might bias recordings towards excitatory neurons, naked eye selection of recording cells is quite difficult in primary cultures. Some of the variability in mIPSC amplitude values (Fig. 2A for example) might be attributed to the cell type? One could use cultures where interneurons are fluorescently labeled to obtain an accurate representation. The issue of the possible differential effects of scaling in pyramidal cells vs. interneurons and the consequences in the network should be discussed.

      We now include this discussion in the resubmission (final paragraph discussion). Briefly, we chose large cells, which will be predominantly glutamatergic neurons as suggested by the reviewer. Ultimately, even among glutamatergic principal cells there may be variability in the response to drug application. All of these issues could contribute to variability and we have expanded our description of the variability in our results, including that based on cellular heterogeneity. 

      Reviewer #2 (Recommendations For The Authors):

      Minor comments –

      Fig S3: Please quantify changes in frequency

      We have done this (Supplemental Figure 5).

      Fig 2: please choose colors with higher contrast for CNQX/TTX

      We have done this.

      Fig. 3C: Why doesn't CNQX+PhotoStim reach control levels of bursting at 2h?

      The program was designed to follow and maintain total spike frequency and so it does a better job at this than maintaining burst frequency.

      Fig. 5A: please include a comparison between control and Ethanol

      We now do this in Figure 5C. Both around 26pAs.

      Fig. 5C: where is the Etoh condition?

      We have made this figure more clear in terms of controls (Figure 5C & D).

      Reviewer #3 (Public Review):

      This paper concerns whether scaling (or homeostatic synaptic plasticity; HSP) occurs similarly at GABA and Glu synapses and comes to the surprising conclusion that these are regulated separately. This is surprising because these were thought to be co-regulated during HSP and in fact, the major mechanisms thought to underlie downscaling (TTX or CNQX driven), retinoic acid and TNF, have been shown to regulate both GABARs and AMPARs directly. (As a side note, it is unclear that the manipulations used in Joseph and Turrigiano represent HSP, and so might not be relevant). Thus the main result, that GABA HSP is dissociable from Glu HSP, is novel and exciting. This suggests either different mechanisms underlie the two processes, or that under certain conditions, another mechanism is engaged that scales one type of synapse and not the other.

      However, strong claims require strong evidence, and the results presented here only address GABA HSP, relying on previous work from this lab on Glu HSP (Fong, et al., 2015). But the previous experiments were done in rat cultures, while these experiments are done in mice and at somewhat different ages (DIV). Even identical culture systems can drift over time (possibly due to changes in the components of B27 or other media and supplements). Therefore it is necessary to demonstrate in the same system the dissociation. To be convincing, they need to show the mEPSCs for Fig 4, clearly showing the dissociation. Doing the same for Fig 5 would be great, but I think Fig 4 is the key.

      We understand the concern of the reviewer as we do see significant variability within our cultures and they were plated in different places, by different people, in different species (rat vs mouse). Therefore, we have attempted to redo the study on AMPAergic scaling on these mouse cortical neurons. Surprisingly, we found that 20µM CNQX did not trigger AMPAergic upscaling (new Figure 7), even though it did reduce spiking activity and was able to produce GABAergic downscaling. We did not carry out the optogenetic restoration of activity, because we did not trigger upscaling. The result does, however, show that the reductions in spiking/bursting that trigger GABAergic downscaling did not trigger AMPAergic upscaling, and it therefore dissociates the 2 forms of scaling in these mouse cultures. We do not know why 20 µM CNQX did not trigger scaling in these cultures since it does reduce spiking and AMPAR activation. In the Fong study we used 40µM CNQX because intracellular recordings from rat cortical neurons suggested this was required to completely block AMPAergic currents. Our initial studies in the current manuscript examining GABAergic scaling in mouse cortical cultures used 40µM CNQX; however, this concentration of CNQX prevented us from restoring spiking through optogenetic activation, so we reduced our concentration to 20µM CNQX, which did trigger GABAergic downscaling and allowed the restoration of spiking. We now show and discuss this result (Figure 7 and 3rd paragraph discussion).

      The paper also suggests that only receptor function or spiking could control HSP, and therefore if it is not receptor function then it must be spiking. This seems like a false dichotomy; there are of course other options. Details in the data may suggest that spiking is not the (or the only) homeostat, as TTX and CNQX cause identical changes in mIPSC amplitude but have different effects on spiking. Further, in Fig 5, CTZ had a minimal effect on spiking but a large effect on mIPSCs. Similar issues appear in Fig 6, where the induction of increased spiking is highly variable, with many cells showing control levels or lower spiking rates. Yet the synaptic changes are robust, across all cells. Overall, this is not persuasive that spiking is necessarily the homeostat for GABA synapses.

      Together, our results argue against AMPAR or GABAR activation as a trigger for GABAergic scaling, which differs from our results for AMPAergic scaling. These points alone are important to recognize. While changes in spiking do not perfectly follow the changes in GABAergic scaling, they do always trend in the right direction. As mentioned above, total spiking activity is only one measure of spiking. It is possible that these drugs alter the pattern of spiking in a way that translates into altered calcium transients, which may be important for triggering the plasticity. Further, we acknowledge that we cannot rule out a role for NMDARs contributing to GABAergic scaling (3rd and 5th paragraph of discussion). Based on the variability that we observe and the nature of our MEA recordings we cannot precisely determine how the total activity or pattern of activity changes with drug application in the specific cells that we target for whole cell recordings, and this is now discussed (final paragraph of discussion). Again, it is important to note that we are going a step beyond most homeostatic plasticity studies that add a drug and simply assume it is having an effect on spiking (e.g. CNQX was initially thought to completely abolish spiking, but clearly does not). However, we believe that the most parsimonious explanation of our results supports our proposal that GABAergic scaling is a strong candidate for a spike rate homeostat. Regardless, in the resubmission we have included a broader discussion about these possibilities, and recognize that we cannot rule out the possibility that AMPAergic transmission could contribute to upward GABAergic scaling (2nd paragraph discussion).

      The paper also suggests that the timing of the GABA changes coincides with the spiking changes, but while they have the time course of the spiking changes and recovery, they only have the 24h time point for synaptic changes. It is impossible to conclude how the time courses align without more data.

      We can only say that by the 24 hour CNQX time point, when overall spiking is recovered in some but not all cultures and bursts have not recovered, that GABAergic scaling has already occurred. We now state this more clearly in the resubmission (near the end of the 2nd paragraph of the discussion).

      Reviewer #3 (Recommendations For The Authors):

      The statistics are inadequately described. The full information including actual p values should be given, particularly for the non-significant trends reported.

      We have done this in Figure legends.

      The abstract and introduction give the impression that GABA and Glu HSP are independent, though most work links them as occurring simultaneously and in a coordinated fashion to achieve homeostasis.

      While it is true that many studies have triggered both forms of scaling with activity or transmission blockade, these studies have not addressed whether these forms of scaling are actually triggered in the same way mechanistically, except potentially for the one study that we mentioned (Joseph et al.). Our results suggest they are independent. We now mention the idea that these two forms of scaling have been assumed to be commonly triggered (3rd paragraph introduction).

      The data in Fig 6 is presented as if BIC treatment is a novel result, although BIC/Gabazine/PTX have been used to induce down-scaling in many previous papers. While it's good to have the results, they should be put in proper context. As suggested in the paper, testing if decreased GABAR function would lead to upscaling does not make sense given all the previous data. 

      Figure 6 shows GABAergic upscaling in response to GABAR block (bicuculline). We are aware of only two other studies that looked at GABAergic scaling after treatment with a GABAR blocker; both found upscaling, but in hippocampal cultures rather than cortical cultures (Peng et al., 2010 - PMID: 21123568, Pribiag et al., 2014 - PMID: 24753587). We now mention this in the results section describing Figure 6. While many studies have blocked GABARs and found AMPAergic downscaling, we are addressing the triggers for GABAergic scaling in Figure 6.

      Is Fig S4B mislabeled? The title says spike rate but the graph axis says burst frequency.

      The reviewer is correct and we have now adjusted this.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Through an unbiased genomewide KO screen, the authors identified loss of DBT to suppress MG132-mediated death of cultured RPE cells. Further analyses suggested that DBT reduces ubiquitinated proteins by promoting autophagy. Mechanistic studies indicated that DBT loss promotes autophagy via AMPK and its downstream ULK and mTOR signaling. Furthermore, loss of DBT suppresses polyglutamine- or TDP-43-mediated cytotoxicity and/or neurodegeneration in fly models. Finally, the authors showed that DBT proteins are increased in ALS patient tissues, compared to non-neurological controls. 

      Strengths: 

      The idea is novel, the evidence is convincing, and the data are clean. The findings have implications for human diseases. 

      Weaknesses: 

      None. 

      Reply: We thank the reviewer for the supportive comments.

      Reviewer #2 (Public Review): 

      Summary: 

      Hwang, Ran-Der et al utilized a CRISPR-Cas9 knockout in human retinal pigment epithelium (RPE1) cells to evaluate for suppressors of toxicity by the proteasome inhibitor MG132 and identified that knockout of dihydrolipoamide branched chain transacylase E2 (DBT) suppressed cell death. They show that DBT knockout in RPE1 cells does not alter proteasome or autophagy function at baseline. However, with MG132 treatment, they show a reduction in ubiquitinated proteins but with no change in proteasome function. Instead, they show that DBT knockout cells treated with MG132 have improved autophagy flux compared to wildtype cells treated with MG132. They show that MG132 treatment decreases ATP/ADP ratios to a greater extent in DBT knockout cells, and in accordance causes activation of AMPK. They then show downstream altered autophagy signaling in DBT knockout cells treated with MG132 compared to wild-type cells treated with MG132. Then they express the ALS mutant TDP43 M337 or expanded polyglutamine repeats to model Huntington's disease and show that knockdown of DBT improves cell survival in RPE1 cells with improved autophagic flux. They also utilize Drosophila models and show that utilizing either an RNAi or CRISPR-Cas9 knockout of DBT improves eye pigment in TDP43M337V and polyglutamine repeat-expressing transgenic flies. Finally, they show evidence for increased DBT in postmortem spinal cord tissue from patients with ALS via both immunoblotting and immunofluorescence.

      Strengths: 

      This is a mechanistic and well-designed paper that identifies DBT as a novel regulator of proteotoxicity via activating autophagy in the setting of proteasome inhibition. Major strengths include careful delineation of a mechanistic pathway to define how DBT is protective. These conclusions are well-justified. 

      Weaknesses: 

      None 

      Reply: We thank the reviewer for the supportive comments.

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      The authors have addressed my concerns. I have two more suggestions: 

      (1) Since the authors found that MG132 inhibits autophagy, which is inconsistent with previous findings that it promotes autophagy (e.g., PMID: 26648402, 30647455, 28674081), they should discuss this discrepancy in the Discussion.

      Reply: We thank the reviewer for raising this point. We agree with the reviewer that it is well known in the literature that MG132 can lead to activation of autophagy. Indeed, we have observed in this study that MG132 itself can lead to time-dependent increases in LC3II levels in the first 8 hours of the MG132 treatment (Fig. S5B). These observations reflect the adaptive response of the cell to activate autophagy following proteasomal inhibition. However, as the MG132-mediated proteasomal inhibition persists, it is expected that the accumulation of misfolded protein substrates may overwhelm protein degradation systems, including the autophagy-lysosome pathway. Indeed, we have observed a reduction of the autophagic flux after 48 hours of the MG132 treatment (Fig. 3). Importantly, the DBT KO cells were able to maintain significantly higher levels of autophagic activities than the WT cells at this time point, consistent with their resistance to MG132-induced cell death. As suggested, we have added more discussion on the dynamic changes in the autophagic activities following proteasomal inhibition.

      (2) A grammar issue: consider removing some of the article "the," e.g.: 

      page 6: "the increase in cleaved PARP1 "-->"an increase in cleaved PARP1";  "the loss of DBT "-->"loss of DBT" 

      page 7: "the loss of DBT "-->"loss of DBT"; "The ubiquitin modification"-->"Ubiquitin modification" 

      Reply: We thank the reviewer for the supportive comments. We have corrected the grammar issues noted above in the article.

      Reviewer #2 (Recommendations For The Authors): 

      The authors have addressed my concerns. 

      Reply: We thank the reviewer for the supportive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Protein conformational changes are often critical to protein function, but obtaining structural information about conformational ensembles is a challenge. Over a number of years, the authors of the current manuscript have developed and improved an algorithm, qFit protein, that models multiple conformations into high resolution electron density maps in an automated way. The current manuscript describes the latest improvements to the program, and analyzes the performance of qFit protein in a number of test cases, including classical statistical metrics of data fit like Rfree and the gap between Rwork and Rfree, model geometry, and global and case-by-case assessment of qFit performance at different data resolution cutoffs. The authors have also updated qFit to handle cryo-EM datasets, although the analysis of its performance is more limited due to a limited number of high-resolution test cases and less standardization of deposited/processed data.

      Strengths:

      The strengths of the manuscript are the careful and extensive analysis of qFit's performance over a variety of metrics and a diversity of test cases, as well as the careful discussion of the limitations of qFit. This manuscript also serves as a very useful guide for users in evaluating if and when qFit should be applied during structural refinement.

      Reviewer #2 (Public Review):

      Summary

      The manuscript by Wankowicz et al. describes updates to qFit, an algorithm for the characterization of conformational heterogeneity of protein molecules based on X-ray diffraction or cryo-EM data. The work provides a clear description of the algorithm used by qFit. The authors then proceed to validate the performance of qFit by comparing it to deposited X-ray entries in the PDB in the 1.2-1.5 Å resolution range as quantified by Rfree, Rwork-Rfree, detailed examination of the conformations introduced by qFit, and performance on stereochemical measures (MolProbity scores). To examine the effect of experimental resolution of X-ray diffraction data, they start from an ultra high-resolution structure (SARS-CoV2 Nsp3 macrodomain) to determine how the loss of resolution (introduced artificially) degrades the ability of qFit to correctly infer the nature and presence of alternate conformations. The authors observe a gradual loss of ability to correctly infer alternate conformations as resolution degrades past 2 Å. The authors repeat this analysis for a larger set of entries in a more automated fashion and again observe that qFit works well for structures with resolutions better than 2 Å, with a rapid loss of accuracy at lower resolution. Finally, the authors examine the performance of qFit on cryo-EM data. Despite a few prominent examples, the authors find only a handful (8) of datasets for which they can confirm a resolution better than 2.0 Å. The performance of qFit on these maps is encouraging and will be of much interest because cryo-EM maps will, presumably, continue to improve and because of the rapid increase in the availability of such data for many supramolecular biological assemblies. As the authors note, practices in cryo-EM analysis are far from uniform, hampering the development and assessment of tools like qFit.

      Strengths

      qFit improves the quality of refined structures at resolutions better than 2.0 Å, in terms of reflecting true conformational heterogeneity and geometry. The algorithm is well designed and does not introduce spurious or unnecessary conformational heterogeneity. I was able to install and run the program without a problem within a computing cluster environment. The paper is well written and the validation thorough.

      I found the section on cryo-EM particularly enlightening, both because it demonstrates the potential for discovery of conformational heterogeneity from such data by qFit, and because it clearly explains the hurdles towards this becoming common practice, including lack of uniformity in reporting resolution, and differences in map and solvent treatment.

      Weaknesses

      The authors begin the results section by claiming that they made "substantial improvement" relative to the previous iteration of qFit, "both algorithmically (e.g., scoring is improved by BIC, sampling of B factors is now included) and computationally (improving the efficiency and reliability of the code)" (bottom of page 3). However, the paper does not provide a comparison to previous iterations of the software or quantitation of the effects of these specific improvements, such as whether scoring is improved by the BIC, how the application of BIC has changed since the previous paper, whether sampling of B factors helps, and whether the code is faster. It would help the reader to understand what, if any, the significance of each of these improvements was.

      Indeed, it is difficult (embarrassingly) to benchmark against our past work due to the dependencies on different python packages and the lack of software engineering. With the infrastructure we’ve laid down with this paper, made possible by an EOSS grant from CZI, that will not be a problem going forward. Not only is the code more reliable and standardized, but we have developed several scientific test sets that can be used as a basis for broad comparisons to judge whether improvements are substantial. We’ve also changed “substantial improvement” to “several modifications” to indicate the lack of comparison to past versions.

      The exclusion of structures containing ligands and multichain protein models in the validation of qFit was puzzling since both are very common in the PDB. This may convey the impression that qFit cannot handle such use cases. (Although it seems that qFit has an algorithm dedicated to modeling ligand heterogeneity and seems to be able to handle multiple chains). The paper would be more effective if it explained how a user of the software would handle scenarios with ligands and multiple chains, and why these would be excluded from analysis here.

      qFit can indeed handle both. We left out multiple chains for simplicity in constructing a dataset enriched for small proteins while still covering diversity to speed the ability to rapidly iterate and test our approaches. Improvements to qFit ligand handling will be discussed in a forthcoming work as we face similar technical debt to what we saw in proteins and are undergoing a process of introducing “several modifications” that we hope will lead to “substantial improvement” - but at the very least will accelerate further development.

      It would be helpful to add some guidance on how/whether qFit models can be further refined afterwards in Coot, Phenix, ..., or whether these models are strictly intended as the terminal step in refinement.

      We added to the abstract:

      “Importantly, unlike ensemble models, the multiconformer models produced by qFit can be manually modified in most major model building software (e.g. Coot) and fit can be further improved by refinement using standard pipelines (e.g. Phenix, Refmac, Buster).”

      and introduction:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot [12] unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      and results:

      “This model can then be examined and edited in Coot [12] or other visualization software, and further refined using software such as phenix.refine, refmac, or buster as the modeler sees fit.”

      and discussion

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”
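
      As a minimal illustration of what "the PDB standard altloc column" refers to, the sketch below reads the alternate-location identifier (column 17 of a fixed-width ATOM record) and the occupancy (columns 55-60) and groups them per atom. The function names and the serine example are hypothetical and are not taken from the qFit code base; only the column positions follow the PDB file format.

```python
from collections import defaultdict


def make_atom_line(serial, name, altloc, resname, chain, resseq, x, y, z, occ, bfac):
    """Build a fixed-width PDB ATOM record (columns 1-66 only)."""
    return (
        f"ATOM  {serial:>5} {name:<4}{altloc:1}{resname:>3} {chain:1}"
        f"{resseq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}"
    )


def altloc_occupancies(lines):
    """Map (chain, resseq, resname, atom) -> {altloc: occupancy}."""
    table = defaultdict(dict)
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        atom = line[12:16].strip()      # columns 13-16: atom name
        altloc = line[16]               # column 17: alternate location id
        resname = line[17:20]           # columns 18-20: residue name
        chain = line[21]                # column 22: chain id
        resseq = int(line[22:26])       # columns 23-26: residue number
        occupancy = float(line[54:60])  # columns 55-60: occupancy
        table[(chain, resseq, resname, atom)][altloc] = occupancy
    return dict(table)


# Hypothetical serine OG atom modeled in two conformations, A and B.
records = [
    make_atom_line(5, " OG ", "A", "SER", "A", 10, 11.0, 12.0, 13.0, 0.60, 15.0),
    make_atom_line(6, " OG ", "B", "SER", "A", 10, 11.5, 12.4, 13.2, 0.40, 17.0),
]
occupancies = altloc_occupancies(records)
```

      Any program that reads and writes these columns consistently can pass a multiconformer model on to the next step; a quick check that the occupancies of one atom's conformers sum to at most 1 is a reasonable sanity test before deposition.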

      Appraisal & Discussion

      Overall, the authors convincingly demonstrate that qFit provides a reliable means to detect and model conformational heterogeneity within high-resolution X-ray diffraction datasets and (based on a smaller sample) in cryo-EM density maps. This represents the state of the art in the field and will be of interest to any structural biologist or biochemist seeking to attain an understanding of the structural basis of the function of their system of interest, including potential allosteric mechanisms-an area where there are still few good solutions. That is, I expect qFit to find widespread use.

      Reviewer #3 (Public Review):

      Summary:

      The authors address a very important issue of going beyond a single-copy model obtained by the two principal experimental methods of structural biology, macromolecular crystallography and cryo-electron microscopy (cryo-EM). Such a multiconformer model is based on the fact that experimental data from both these methods represent a space- and time-average of a huge number of the molecules in a sample, or even in several samples, and that the respective distributions can be multimodal. Different from structure prediction methods, this approach is strongly based on high-resolution experimental information and requires validated single-copy high-quality models as input. Overall, the results support the authors' conclusions.

      In fact, the method addresses two problems which could be considered separately:

      - An automation of construction of multiple conformations when they can be identified visually;

      - A determination of multiple conformations when their visual identification is difficult or impossible.

      We often think about this problem similarly to the reviewer. However, in building qFit, we do not want to separate these problems - but rather use the first category (obvious visual identification) to build an approach that can accomplish part of the second category (difficult to visualize) without building “impossible”/nonexistent conformations - with a consistent approach/bias.

      The first one is a known problem, when missing alternative conformations may cost a few percent in R-factors. While these conformations are relatively easy to detect and build manually, the current procedure may save significant time being quite efficient, as the test results show.

      We agree with the reviewer's assessment here. The “floor” in terms of impact is automating a tedious part of high resolution model building and improving model quality.

      The second problem is important from the physical point of view and was first addressed by Burling & Brunger (1994; https://doi.org/10.1002/ijch.199400022). The new procedure deals with a second-order variation in the R-factors, of about 1% or less, like placing riding hydrogen atoms, modeling density deformation or variation of the bulk solvent. In such situations, it is hard to justify model improvement. Maintaining Rfree values, or decreasing them marginally, can be considered a sign that the model does not overfit the data, but hardly a strong argument in favor of the model.

      We agree with the overall sentiment of this comment. What is a significant variation in R-free is an important question that we have looked at previously (http://dx.doi.org/10.1101/448795) and others have suggested an R-sleep for further cross validation (https://pubmed.ncbi.nlm.nih.gov/17704561/). For these reasons it is important to get at the significance of the changes to model types from large and diverse test sets, as we have here and in other works, and from careful examination of the biological significance of alternative conformations with experiments designed to test their importance in mechanism.

      In general, overall targets are less appropriate for this kind of problem and local characteristics may be better indicators. Improvement of the model geometry is a good choice. Indeed, Cruickshank (1956; https://doi.org/10.1107/S0365110X56002059) already showed that averaged density images may lead to a shortening of covalent bonds when interpreting such maps by a single model. However, a total absence of geometric outliers is not necessarily required for the structures solved at a high resolution where diffraction data should have more freedom to place the atoms where the experiments "see" them.

      Again, we agree—geometric outliers should not be completely absent, but it is comforting when they and model/experiment agreement both improve.

      The key local characteristic for multiconformer models is the closeness of the model map to the experimental one. Actually, the procedure uses a measure of this kind, the Bayesian information criterion (BIC). Unfortunately, there is no information about how sharply it identifies the best model or how much it changes between the initial and final models; overall, the reader gets no feeling for its values. The Q-score (page 17) can be a tool for the first problem where the multiple conformations are clearly separated and not for the second problem where the contributions from neighboring conformations are merged. In addition to BIC or to even more conventional target functions such as LS or local map correlation, the extreme and mean values of the local difference maps may help to validate the models.

      We agree with the reviewer that the problem of “best” model determination is poorly posed here. We have been thinking a lot about this in the context of Bayesian methods (see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9278553/); however, a major stumbling block is in how variable representations of alternative conformations (and compositions) are handled. The answers are more (but by no means simply) straightforward for ensemble representations where the entire system is constantly represented but with multiple copies.
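
      To give a feeling for how a BIC-style score trades fit against model size, the sketch below uses the textbook Gaussian-residual form, BIC = n ln(RSS/n) + k ln(n), on invented numbers. What qFit counts as observations (n) and parameters (k) is not specified in this response, so the inputs here are assumptions chosen purely for illustration.

```python
import math


def bic_gaussian(rss, n_obs, n_params):
    """Textbook BIC for a least-squares fit with Gaussian residuals.

    Lower is better. The first term rewards a small residual sum of
    squares (rss); the second penalizes every extra parameter.
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


# Invented example: residual against 1000 density values for models
# with 1, 2, or 3 conformers of the same residue (one parameter each).
n_voxels = 1000
residuals = {1: 50.0, 2: 40.0, 3: 39.9}

scores = {k: bic_gaussian(rss, n_voxels, k) for k, rss in residuals.items()}
best = min(scores, key=scores.get)
```

      In this toy case the second conformer lowers the residual enough to pay for its parameter, whereas the third improves the fit only marginally and is rejected by the ln(n) penalty.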

      This method and its results make a strong argument for the need for experimental data and the information they contain, in contrast to pure structure prediction. At the same time, the absence of strong density-based proof may limit its impact.

      We agree - indeed we think it will be difficult to further improve structure prediction methods without much more interaction with the experimental data.

      Strengths:

      Addressing an important problem and automatization of model construction for alternative conformations using high-resolution experimental data.

      Weaknesses:

      An insufficient validation of the models when no discrete alternative conformations are visible and essentially missing local real-space validation indicators.

      Although these are not perfect real-space indicators, local real-space validation is implicit in the MIQP selection step and explicit when we employ Q-score metrics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A point of clarification: I don't understand why waters seem to be handled differently for cryo-EM and crystallography datasets. I am interested in the statement on page 19 that the Molprobity Clashscore gets worse for cryo-EM datasets, primarily due to clashes with waters. But the qFit algorithm includes a round of refinement to optimize placement of ordered waters, and the clashscore improves for the qFit refinement in crystallography test cases. Why/how is this different for cryo-EM?

      We agree that this was not an appropriate point. We believe that the high clash score is coming from side chains being incorrectly modeled. We have updated this in the manuscript and it will be a focus of future improvements.

      Reviewer #2 (Recommendations For The Authors):

      - It would be instructive to the reader to explain how qFit handles the chromophore in the PYP (1OTA) example. To this end, it would be helpful to include deposition of the multiconformer model of PYP. This might also be a suitable occasion for discussion of potential hurdles in the deposition of multiconformer models in the PDB (if any!). Such concerns may be real concerns causing hesitation among potential users.

      Thank you for this comment. qFit does not alter the position or connectivity of any HETATM records (like the chromophore in this structure). Handling covalent modifications like this is an area of future development.

      Regarding deposition, we have noted above that the discussion now includes:

      “qFit is compatible with manual modification and further refinement as long as the subsequent software uses the PDB standard altloc column, as is common in most popular modeling and refinement programs. The models can therefore generally also be deposited in the PDB using the standard deposition and validation process.”

      Finally, we have placed all PDBs in a Zenodo deposition (XXX) and have included that language in the manuscript. It is currently under a separate data availability section (page XXX). We will defer to the editor as to the best header for it to go under.

      - It may be advisable to take the description of true/false pos/negatives out of the caption of Figure 4, and include it in a box or so, since these terms are important in the main text too, and the caption becomes very cluttered.

      We think adding the description of true/false pos/negatives to the Figure panel would make it very cluttered and wordy. We would like to retain this description within the caption. We have also briefly described each in the main text.

      - page 21, line 4: some issue with citation formatting.

      We have updated these citations.

      - page 25, second paragraph: cardinality is the number of members of a set. Perhaps "minimal occupancy" is more appropriate.

      Thank you for pointing this out. This was a mistake and should have been called the occupancy threshold.

      - page 26: it's - its

      Thank you, we have made this change. 

      - Font sizes in Supplementary Figures 5-7 are too small to be readable.

      We agree and will make this change. 

      Reviewer #3 (Recommendations For The Authors):

      General remarks

      (1) As I understand, the procedure starts from shifting residues one by one (page 4; A.1). Then, geometry reconstruction (e.g., B1) may be difficult in some cases joining back the shifted residues. It seems that such backbone perturbation can be done more efficiently by shifting groups of residues ("potential coupled motions") as mentioned at the bottom of page 9. Did I miss its description?

      We would describe the algorithm as sampling (which includes minimal shifts) in the backbone residues to ensure we can link neighboring residues. We agree that future iterations of qFit should include more effective backbone sampling by exploring motion along the Cβ-Cα, C-N, and (Cβ-Cα × C-N) bonds and exploring correlated backbone movements.

      (2) While the paper is well split in clear parts, some of them seem to be not at their right/optimal place and better can be moved to "Methods" (detailed "Overview of the qFit protein algorithm" as a whole) or to "Data" missed now (Two first paragraphs of "qFit improves overall fit...", page 8, and "Generating the qFit test set", page 22, and "Generating synthetic data ..." at page 26; description of the test data set), At my personal taste, description of tests with simulated data (page 15) would be better before that of tests with real data.

      Thank you for this comment, but we stand by our original decision to keep the general flow of the paper as it was submitted.

      (3) I wonder if the term "quadratic programming" (e.g., A3, page 5) is appropriate. It supposes optimization of a quadratic function of the independent parameters and not of "some" parameters. This is like the crystallographic LS which is not a quadratic function of atomic coordinates, and I think this is a similar case here. Whatever the answer on this remark is, an example of the function and its parameters is certainly missed.

      We think that the term quadratic programming is appropriate. We fit the coefficients by minimizing a quadratic loss on the residual (observed density - calculated density), subject to the constraints on those coefficients. We agree that the quadratic function is missing from the paper, and we have now included it in the Methods section.
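
      To make the distinction concrete: if each candidate conformer contributes a fixed calculated density and only its weight w_i is fitted, the loss ||rho_obs - sum_i w_i rho_i||^2 expands to w^T Q w - 2 c^T w + const, which is quadratic in the weights even though it is not quadratic in atomic coordinates. The sketch below solves a toy instance under assumed constraints (w_i >= 0 and sum of w_i <= 1). The density profiles are invented, and the plain projected-gradient loop is chosen only to keep the sketch dependency-free; it should not be read as qFit's implementation.

```python
def project_capped_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) <= 1}."""
    clipped = [max(x, 0.0) for x in v]
    if sum(clipped) <= 1.0:
        return clipped
    # Otherwise project onto the simplex {w_i >= 0, sum(w) = 1}.
    theta, cumulative = 0.0, 0.0
    for j, u_j in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_weights(conformer_maps, observed, n_iter=2000):
    """Minimize ||observed - sum_i w_i * map_i||^2 over the weights w."""
    n = len(conformer_maps)
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    Q = [[dot(a, b) for b in conformer_maps] for a in conformer_maps]
    c = [dot(a, observed) for a in conformer_maps]
    # trace(Q) bounds the largest eigenvalue, so this step size is safe.
    step = 1.0 / (2.0 * sum(Q[i][i] for i in range(n)))
    w = [0.0] * n
    for _ in range(n_iter):
        grad = [2.0 * (dot(Q[i], w) - c[i]) for i in range(n)]
        w = project_capped_simplex([w[i] - step * grad[i] for i in range(n)])
    return w


# Invented 1D density profiles for three candidate conformers.
maps = [
    [1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 2.0],
]
# "Observed" density built from 70% of the first and 30% of the second.
observed = [0.7 * a + 0.3 * b for a, b in zip(maps[0], maps[1])]
weights = fit_weights(maps, observed)
```

      On this toy input the third, unsupported candidate is driven to zero weight, which is the behavior one wants from the selection step: candidates without density support drop out rather than being kept at small occupancy.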

      Technical remarks to be answered by the authors :

      (1) Page 1, Abstract, line 3. The ensemble modeling is not the only existing frontier, and saying "one of the frontiers" may be better. Also, this phrase gives a confusing impression that the authors aim to predict the ensemble models while they do it with experimental data.

      We agree with this statement and have re-worded the abstract to reflect this.

      (2) Page 2. Burling & Brunger (1994) should be cited as predecessors. On the contrary, an excellent paper by Pearce & Gros (2021) is not relevant here.

      We agree that we should mention the Burling & Brunger paper. However, the Pearce & Gros (2021) reference should not be removed, as it is not discussing the method of ensemble refinement.

      (3) Page 2, bottom. "Further, when compared to ..." The preference to such approach sounds too much affirmative.

      We have amended this sentence to state:

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot(Emsley et al. 2010) unlike ensemble methods that generate multiple complete protein copies(Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      The point we were trying to make in this sentence was that ensemble-based models are much harder to manually manipulate in Coot or other similar software compared to multiconformer models. We think that the new version of this sentence states this point more clearly.

      (4) Page 2, last paragraph. I do not see an obvious relation of references 15-17 to the phrase they are associated with.

      We disagree with this statement, and think that these references are appropriate.

      “Multiconformer models are notably easier to modify and more interpretable in software like Coot [12] unlike ensemble methods that generate multiple complete protein copies (Burnley et al. 2012; Ploscariu et al. 2021; Temple Burling and Brünger 1994).”

      (5) Page 3, paragraph 2. Cryo-EM maps should be also "high-resolution"; it does not read like this from the phrase.

      We agree that high-resolution should be added, and the sentence now states:

      “However, many factors make manually creating multiconformer models difficult and time-consuming. Interpreting weak density is complicated by noise arising from many sources, including crystal imperfections, radiation damage, and poor modeling in X-ray crystallography, and errors in particle alignment and classification, poor modeling of beam induced motion, and imperfect Detector Quantum Efficiency (DQE) in high-resolution cryo-EM.”

      (6) Page 3, last paragraph before "results". The words "... in both individual cases and large structural bioinformatic projects" do not have much meaning, except introducing a self-reference. Also, repeating "better than 2 A" looks not necessary.

      We agree that this was unnecessary and have simplified the last sentence to state:

      “With the improvements in model quality outlined here, qFit can now be increasingly used for finalizing high-resolution models to derive ensemble-function insights.”

      (7) Page 3. "Results". Could "experimental" be replaced by a synonym, like "trial", to avoid confusing with the meaning "using experimental data"?

      We have replaced experimental with exploratory to describe the use of qFit on CryoEM data. The statement now reads:

      “For cryo-EM modeling applications, equivalent metrics of map and model quality are still developing, rendering the use of qFit for cryo-EM more exploratory.”

      (8) Page 4, A.1. Should it be "steps +/- 0.1" and "coordinate" be "coordinate axis"? One can modify coordinates and not shift them. I do not understand how, with the given steps, the authors calculated the number of combinations ("from 9 to 81"). Could a long "Alternatively, ...absent" be reduced simply to "Otherwise"?

      We have simplified and clarified the sentence on the sampling of backbone coordinates to state:

      “If anisotropic B-factors are absent, the translation of coordinates occurs in the X, Y, and Z directions. Each translation takes place in steps of 0.1 Å along each coordinate axis, extending to 0.3 Å, resulting in 9 (if isotropic) or 81 (if anisotropic) distinct backbone conformations for further analysis.”

      (9) Page 6, B.1, line 2. Word "linearly" is meaningless here.

      We have modified this to read:

      “Moving from N- to C- terminus along the protein,”

      (10) Page 9, line 2. It should be explained which data set is considered as the test set to calculate Rfree.

      We think this is clear and would be repetitive if we duplicated it.

      (11) Page 9, line 7. It should be "a valuable metric" and not "an"

      We agree and have updated the sentence to read:

      “Rfree is a valuable metric for monitoring overfitting, which is an important concern when increasing model parameters as is done in multiconformer modeling.”

      (12) Page 10, paragraph 3. "... as a string (Methods)". I did not find any other mention of this term "string", including in "Methods" where it supposed to be explained. Either this should be explained (and an example is given?), or be avoided.

      We agree that string is not necessary (discussing the programmatic datatype). We have removed this from the sentence. It now reads:

      “To quantify how often qFit models new rotameric states, we analyzed the qFit models with phenix.rotalyze, which outputs the rotamer state for each conformer (Methods).”

      (13) Page10, lines 3-4 from bottom. Are these two alternative conformations justified?

      We are unsure what this is referring to.

      (14) Page 12, Fig. 2A. In comparison with Supplement Fig 2C, the direction of axes is changed. Could they be similar in both Figures?

      We have updated Supplementary Figure 2C to have the same direction of axes as Figure 2A.

      (15) Page 15, section's title. Choose a single verb in "demonstrate indicate".

      We have amended the title of this section to be:

      “Simulated data demonstrate qFit is appropriate for high-resolution data.”

      (16) Page 15, paragraph 2. "Structure factors from 0.8 to 3.0 A resolution" does not mean what the author wanted apparently to tell: "(complete?) data sets with the high-resolution limit which varied from 0.8 to 3.0 A ...". Also, a phrase of "random noise increasing" is not illustrated by Figs.5 as it is referred to.

      We have edited this sentence to now read:

      “To create the dataset for resolution dependence, we used the ground truth 7KR0 model, including all alternative conformations, and generated artificial structure factors with a high resolution limit ranging from  0.8 to 3.0 Å resolution (in increments of 0.1 Å).”

      (17) Page 15, last paragraph is written in a rather formal and confusing way while a clearer description is given in the figure legend and repeated once more in Methods. I would suggest to remove this paragraph.

      We agree that this is confusing. Instead of creating a true positive/false positive/true negative/false negative matrix, we now simply call things as they are: multiconformer or single conformer, and match or no match. We have edited the language in the manuscript and figure legends to reflect these changes.

      (18) Page 16. Last two paragraphs start talking about a new story and it would help to separate them somehow from the previous ones (sub-title?).

      We agree that this could use a subtitle. We have included the following subtitle above this section:

      “Simulated multiconformer data illustrate the convergence of qFit.”

      (19) Page 20. "or static" and "we determined that" seem to be not necessary.

      We have removed “static” and now refer only to single-conformer models. However, because one of the main conclusions of this paper is that qFit can pick up on alternative conformers that were modeled manually, we have decided to keep “we determined that”.

      (20) Page 21, first paragraph. "Data" are plural; it should be "show" and "require"

      We have made these edits. The sentence now reads:

      “However, our data here shows that not only does qFit need a high-resolution map to be able to detect signal from noise, it also requires a very well-modeled structure as input.”

      (21) Page 21, References should be indicated as [41-45], [35,46-48], [55-57]. A similar remark to [58-63] at page 22.

      We have fixed the reference layout to reflect this change.

      (22) Page 21, last paragraph. "Further reduce R-factors" (moreover repeated twice) is not correct neither by "further", since here it is rather marginal, nor as a goal; the variations of R-factors are not much significant. A more general statement like "improving fit to experimental data" (keeping in mind density maps) may be safer.

      We agree with the duplicative nature of these statements. We have amended the sentence to now read:

      “Automated detection and refinement of partial-occupancy waters should help improve fit to experimental data further reduce Rfree15 and provide additional insights into hydrogen-bond patterns and the influence of solvent on alternative conformations.”

      (23) Page 22. Sub-sections of "Methods" are given in a little bit random order; "Parallelization of large maps" in the middle of the text is an example. Put them in a better order may help.

      We have reordered some sections of the Methods and made clearer headings, using an underscore to highlight the subsections (Generating and running the qFit test set, qFit improved features, Analysis metrics, Generating synthetic data for resolution dependence).

      (24) Page 24. Non-convex solution is a strange term. There exist non-convex problems and functions and not solutions.

      We agree and we have changed the language to reflect that we present the algorithm with non-convex problems which it cannot solve.

      (25) Page 26, "Metrics". It is worthy to describe explicitly the metrics and not (only) the references to the scripts.

      For all metrics, we provide a sentence or two on what each metric describes. As these metrics are well known in the structural biology field, we do not feel that we need to elaborate on them further.

      (26) Page 26. Multiplying B by occupancy does not have much sense. A better option would be to refer to the density value in the atomic center as occ*(4*pi/B)^1.5 which gives a relation between these two entities.

      We agree and have updated the B-factor figures and metrics to reflect this.
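
      As a minimal sketch of the relation suggested in comment (26), the snippet below evaluates occ*(4*pi/B)^1.5 for two conformers of the same atom. The function name and the example occupancies and B-factors are hypothetical, and the expression treats the atom as a point scatterer blurred only by its B-factor, so values are meaningful only as a relative comparison between atoms of the same element.

```python
import math


def center_density_proxy(occupancy: float, b_factor: float) -> float:
    """Relative density at the atomic center: occ * (4*pi/B)**1.5.

    Assumes a point atom smeared by an isotropic B-factor (in A^2);
    element-specific scattering factors are deliberately left out.
    """
    if b_factor <= 0:
        raise ValueError("B-factor must be positive")
    return occupancy * (4.0 * math.pi / b_factor) ** 1.5


# Hypothetical major and minor conformers of the same atom
major = center_density_proxy(occupancy=0.7, b_factor=15.0)
minor = center_density_proxy(occupancy=0.3, b_factor=25.0)
print(f"major: {major:.3f}, minor: {minor:.3f}")
```

      Unlike the product of B and occupancy, this quantity decreases when B rises and increases with occupancy, which matches the expected behavior of the density peak.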

      (27) Page 40, suppl. Fig. 5. Due to the color choice, it is difficult to distinguish the green and blue curves in the diagram.

      We have amended this by switching the colors of the curves.

      (28) Page 42, Suppl. Fig. 7. (A) How the width of shaded regions is defined? (B) What the blue regions stand for? Input Rfree range goes up to 0.26 and not to 0.25; there is a point at the right bound. (C) Bounds for the "orange" occupancy are inversed in the legend.

      (A) The width of the shaded region denotes the standard deviation among the values at every resolution. We have made this clearer in the caption.

      (B) The blue region denotes the confidence interval for the regression estimate. The size of the confidence interval was set to 95%. We have made this clearer in the caption.

      (C) This has been fixed now

      The maximum R-free value is 0.2543, which we rounded down to 0.25.

      (29) Page 43. Letters E-H in the legend are erroneously substituted by B-E.

      We apologize for this mistake. It is now corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Some important and interesting data are missing. For example, whether the gene therapy can extend the life span of these mutants? The overall in vivo voiding function is missing. AAV9/HSPE2 expression in the bladder wall is not shown.

      Our study was not designed to determine whether gene therapy can improve life span of the Hpse2 mutant mice. We know that the mutant mice usually become ill after the first month of life and can die. However, we wanted to study the mice when they were generally well so that there would be no confounding effects on the bladder physiology caused by general ill health. Indeed, a recent study of Hpse2 inducible deletion in adult mice has shown evidence of exocrine pancreatic insufficiency (Kayal et al., PMID 37491420). We are currently exploring the status of the pancreas in our non-conditional juvenile Hpse2 mice, and whether gene transfer into the pancreas is possible.

      We strongly agree that in vivo voiding studies will be important in the future. We suggest that in vivo cystometry is the gold standard for this, but it is currently beyond the remit of this study.

      It is correct that in this paper we focussed on gene transduction into the pelvic ganglia, because the evidence is mounting that this is a neurogenic disease, with our ex vivo physiological studies showing predominantly neurogenic defects that are corrected by the gene therapy. To further understand the biodistribution of the vector we have now sought evidence of viral transduction into the bladder itself (the new Figure 5). In contrast to the neurons of the pelvic ganglia, we observed very limited transduction: “The vector genome sequence WPRE3, and HPSE2 transcripts, were not detected in the urothelium or lamina propria, the loose tissue directly underneath the urothelium. Within the detrusor muscle layer itself, the large smooth muscle cells were not transduced. However, there were rare small foci of BaseScopeTM signal that may represent nerves coursing through the detrusor.”

      Reviewer 2:

      Weaknesses include a lack of discussion of the basis for differences in carbachol sensitivity in Hpse2 mutant mice, limited discussion of bladder tissue morphology in Hpse2 mutant mice, some questions over the variability of the functional data, and a need for clarification on the presentation of statistical significance of functional data

      Yes, it is interesting that untreated male mutant mice have an increased bladder body contraction to carbachol compared with WT males. In a previous paper (Manak et al., 2020) we performed quantitative western blots for the M2 and M3 receptors and found levels were similar in mutants to the WTs, thus the increased sensitivity probably lies post-receptor.

      A detailed study of the bladder body is an interesting idea, in terms of possible transgene expression and detailed histology, and is something we will pursue in future studies.

      We have reported in our physiology graphs what we find. We do find some variability, particularly at lower frequencies, but our conclusions depend on analyses of the whole curve, which depend on multiple frequencies and show the expected overall pattern of frequency-dependent relaxation.

      Thank you, the stats for Figure 8 (now figure 9) have been corrected.

      Reviewer 3:

      Single-cell analysis of mutants versus control bladder, urethra including sphincter. This would be great also for the community.

      Yes, in future we are very interested in using a single cell sequencing approach to look at the mutant, WT and rescued pelvic ganglia. In the manuscript we have provided further discussion on the aetiology of urofacial syndrome, and what we still have to learn. We highlight a recent paper in eLife that uses single cell sequencing of mouse pelvic ganglia (Sivori et al., 2024), demonstrating the feasibility of this molecular approach in the pelvic ganglia, and propose this technique could be applied to the study of the UFS mice to provide important insights into the molecular pathobiology of the condition.

      Detailed tables showing data from each mouse examined.

      In theory, it would be very interesting to correlate the strength of human gene transduction into the pelvic ganglia with, for example, the effect on a physiological parameter. However, in general we used different sets of mice for these techniques, so at present we do not have this information.

      Use of measurements that are done in vivo (spot assay for example). This sounds relatively simple.

      We strongly agree that in vivo voiding studies will be important in the future. We suggest that in vivo cystometry is the gold standard for this, but it is currently beyond the remit of this study.

      Assessment of viral integration in tissues besides the liver (could be done by QPCR).

      This is an important point, and we suggest the pancreas is a particularly interesting target for future studies. In the manuscript, we have highlighted a recent study of Hpse2 inducible deletion in young adult mice that has shown evidence of exocrine pancreatic insufficiency (Kayal et al., PMID 37491420), associated with fatty degeneration of pancreatic acinar cells. The Hpse2 mutant animals are smaller than wildtype littermates, the reason for which has not been identified but could be due to defects in processing milk and food. We are currently exploring the status of the pancreas in our non-conditional juvenile Hpse2 mice, and whether gene transfer into the pancreas is possible.

      Discuss subtypes of neurons that are present and targeted in the context of mutants and controls.

      The make-up of the pelvic ganglia in Hpse2 mutant mice is a fascinating question. Future analysis using scRNA-Seq may be the most effective way to answer this question and is a molecular approach we are looking to pursue in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study develops a machine learning method to reveal hidden unknown functions and behavior in gene regulatory networks by searching parameter space in an efficient way. The evidence for some parts of the paper is still incomplete and needs systematic comparison to other methods and to the ground truth, but the work will be of broad interest to anyone working in biology of all stripes since the ideas reach beyond gene regulatory networks to revealing hidden functions in any complex system with many interacting parts.

      We thank the editors and reviewers for their positive assessment and constructive suggestions. In our response, we acknowledge the importance of systematic comparison to other methods and to the ground truth, when available. However, we also emphasize the challenges associated with evaluating such methods in the context of uncovering hidden behaviors in complex biological networks, as the ground truth is often unknown. We hope that our explanations will clarify the potential of our approach in advancing the exploration of these systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper suggests to apply intrinsically-motivated exploration for the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is where machine learning methods such as the one we propose are particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.
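
      For a simple model, the kind of ground-truth check the reviewer describes (locating equilibria and testing their stability) could look like the sketch below. The two-variable ODE, the starting guesses, and all function names are hypothetical illustrations and are not taken from the manuscript or from any BioModels entry. Equilibria are found with Newton's method on a finite-difference Jacobian, and an equilibrium is called stable when every Jacobian eigenvalue has a negative real part.

```python
import numpy as np


def rhs(state):
    """Hypothetical toy ODE: dx/dt = x - x^3, dy/dt = -y."""
    x, y = state
    return np.array([x - x**3, -y])


def jacobian(f, state, eps=1e-6):
    """Central finite-difference Jacobian of f at `state`."""
    n = state.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (f(state + step) - f(state - step)) / (2 * eps)
    return jac


def newton_equilibrium(f, guess, iters=50):
    """Refine an initial guess toward a root of f (an equilibrium)."""
    state = np.asarray(guess, dtype=float)
    for _ in range(iters):
        state = state - np.linalg.solve(jacobian(f, state), f(state))
    return state


def is_stable(f, equilibrium):
    """Linear stability: all eigenvalues have negative real part."""
    eigenvalues = np.linalg.eigvals(jacobian(f, equilibrium))
    return bool(np.all(eigenvalues.real < 0))


for guess in ([-1.5, 0.5], [0.1, 0.5], [1.5, 0.5]):
    eq = newton_equilibrium(rhs, guess)
    print(np.round(eq, 6), "stable" if is_stable(rhs, eq) else "unstable")
```

      This works when the state space is small enough to seed Newton's method densely. For larger networks the number of required starting guesses grows quickly, which is the scaling difficulty raised above.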

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters  is large and their uncertain range  is not negligible». For the considered models, the true steady-state goal set is unknown, which is why we chose comparison with random search. We added a “Statistics” subsection in the Methods section providing additional details about the statistical analyses we perform between our method and the random search baseline.
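
      The random search baseline can be sketched as follows: draw initial conditions uniformly, integrate each to its end state, and collect the distinct end states. The bistable one-variable ODE, the sampling range, and the integration settings are assumptions chosen for illustration. The models in the manuscript are far larger and are not reproduced here.

```python
import numpy as np


def simulate(x0, dt=0.01, steps=5000):
    """Forward-Euler integration of the toy ODE dx/dt = x - x^3.

    Accepts an array of initial conditions and integrates them in parallel.
    """
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * (x - x**3)
    return x


def random_search(n_samples, low=-2.0, high=2.0, seed=0):
    """Sample initial conditions uniformly and return their end states."""
    rng = np.random.default_rng(seed)
    initial = rng.uniform(low, high, size=n_samples)
    return initial, simulate(initial)


initial, endpoints = random_search(200)
# Distinct end states, rounded to merge numerically identical attractors
attractors = np.unique(np.round(endpoints, 3))
print(attractors)
```

      In this toy case random sampling recovers both attractors easily because each basin occupies half of the sampled range. The baseline becomes less efficient when some basins cover only a small fraction of the initial-condition space.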

      (2) The proposed method is based on `Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state action trajectories [s_{t_0:t}, a_{t_0:t}] (`2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions, but rather only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed w/o actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios which generally focus on the problem of robotic sequential decision-making, therefore assuming state action trajectories as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also only sets one vector at the start (denoted ) which was specifying parameters of a movement (like the initial state of the GRN), which was then actually produced with dynamic motion primitives which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP (denoted ) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting parameters of an action policy  at the start which could be called during the GRN’s trajectory to sample control actions  where  would be the state of the GRN. In practice this would also require setting only one vector at the start, so it would remain the same exploration algorithm and only the space of parameters would change, which illustrates the generality of the approach.
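
      To make concrete how goal exploration operates when the only “action” is a single vector set at the start, here is a minimal sketch of a goal exploration loop. It assumes a nearest-neighbor-plus-Gaussian-mutation goal policy, which is one common simplification and not necessarily the exact policy used in the manuscript. `run_system` is a hypothetical stand-in for a full GRN rollout, and all ranges and hyperparameters are illustrative.

```python
import numpy as np


def run_system(intervention):
    """Hypothetical stand-in for a GRN rollout.

    Maps an intervention (initial state) to an observed outcome z.
    """
    return np.tanh(4.0 * intervention) ** 3


def goal_exploration(n_random=10, n_goal=90, dim=2, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    interventions, outcomes = [], []
    for step in range(n_random + n_goal):
        if step < n_random:
            # Bootstrap phase: uniformly random interventions
            candidate = rng.uniform(-1.0, 1.0, size=dim)
        else:
            # 1) Sample a goal uniformly in the outcome space Z
            goal = rng.uniform(-1.0, 1.0, size=dim)
            # 2) Retrieve the past intervention whose outcome is closest
            distances = np.linalg.norm(np.array(outcomes) - goal, axis=1)
            parent = interventions[int(np.argmin(distances))]
            # 3) Mutate it, staying inside the allowed intervention range
            noise = sigma * rng.normal(size=dim)
            candidate = np.clip(parent + noise, -1.0, 1.0)
        # The system is run once per intervention; no time-dependent actions
        interventions.append(candidate)
        outcomes.append(run_system(candidate))
    return np.array(interventions), np.array(outcomes)


interventions, outcomes = goal_exploration()
print(interventions.shape, outcomes.shape)
```

      Replacing the initial-state vector with the parameters of an action policy would leave this loop unchanged. Only `run_system` and the dimensionality of the sampled vector would differ.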

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK, RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end we have used the fixed initial conditions provided in the BIOMD0000000647, replicating Figure 5 of Cho et al. [56].

      While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that reach more distant points in the same phase space (up to ~10 in RKIPP_RP levels and ~300 in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may be “rarer” to occur naturally, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a. Where is 'effective intervention' used in the method?

      b. in my opinion 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important I would suggest to extend/enhance the column "Proposed Isomorphism". otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes whereas others will fail to produce a qualitative change to the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more so than random exploration. However, we agree that the term “effective intervention” is ambiguous (it does not say effective at what) and we have replaced it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we have clarified those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the amount to which a system is externally controllable/trainable whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, they are both measuring the extent of states that can be reached by the system under a distribution of stimuli/conditions, whether natural conditions or engineered ones, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We have replaced the verb “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete in certain respects. We tried to make this as explicit as possible throughout the paper, which is why we explicitly state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and on which future work could build.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of methods other than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape from an input set of recorded dynamical trajectories, although such methods may well exist. We want to emphasize that any such method would in any case depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN’s behavior, and that these can shape the landscape in various ways, for instance affecting the paths that are accessible, the shape/depth of certain valleys, etc. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance here, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that given this problem formulation, a simple random search was used as an optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work when starting from a random or poor initial guess, even with EA or SGD. In that case the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have more thoroughly analyzed this issue, such as Colas et al. [74] and Pugh et al. [34], and shown the interest of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in the revised version in order to complete our analysis. CMA-ES is an evolutionary algorithm which is self-adaptive in the optimization steps and is known to be better suited than SGD to escaping local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD does at the beginning of optimization, it also ultimately converges to a local minimum, similarly to SGD. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (init mean and std) but always found similar results. We have updated the Figure 9 image and caption, as well as the descriptive text, to include these novel results in the revised version. We also added a reference to the CMA-ES paper in the citations.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest conducting a more rigorous analysis of the performance by estimating/approximating the ground truth robust goal sets in important GRNs.

      Also, the use of terminology from different disciplines can be improved. Please see my comments above. Specifically, the connection between controllability in dynamical control systems and versatility used in this paper is unclear.

      We hope to have addressed the reviewer's concerns in our previous answers.

      Reviewer #2 (Recommendations For The Authors):

      Fig 4b: I'm not sure if DBSCAN is the appropriate method to use here, as the visual focus on the core elements of the clusters downplays the full convex hull of the points that random sampling achieves in Z space. An analysis based on convex hulls or the ball-coverage from Fig. 3b would presumably generate plots that were more similar between random sampling and curiosity search. If the goal is to highlight redundancy/non-linearity in the mapping between Z and I, another approach might be to simply bin Z-space in a grid, or to use a clustering algorithm that is less stringent about core/noise distinctions.

      We thank the reviewer for the suggestion. This plot is intended to give the reader an understanding of why a method that uniformly samples goals in Z (what the IMGEP is doing) is more efficient than a method that uniformly samples parameters in I (what the random search is doing) in systems for which there is high redundancy/non-linearity in the mapping between I and Z. We agree that binning the Z-space in a grid and counting the number of achieved bins is a way to measure this quantitatively, and it is very close to what we do in Figure 3 for measuring the achieved diversity. We believe, however, that the clustering and coloring provide additional intuition on why this is the case: they illustrate that large regions of the intervention space map to small regions in the outcome space and vice versa.
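
      The grid-binning measure discussed above can be sketched as follows. This is an illustrative sketch only, not the code used in the study: the function names, the 10 x 10 grid over the unit square, and the toy map from I to Z (`toy_system`) are all assumptions chosen to show how a redundant mapping concentrates uniformly sampled interventions into few outcome bins.

```python
import random


def occupied_bins(points, lo=0.0, hi=1.0, bins_per_axis=10):
    """Count distinct grid cells of [lo, hi]^d that contain at least one point."""
    width = (hi - lo) / bins_per_axis
    cells = set()
    for p in points:
        # Clamp so that a coordinate equal to `hi` falls in the last bin.
        cell = tuple(min(int((x - lo) / width), bins_per_axis - 1) for x in p)
        cells.add(cell)
    return len(cells)


def toy_system(i1, i2):
    """Hypothetical redundant map from I to Z: most inputs land near the origin."""
    return (i1 ** 8, i2 ** 8)


rng = random.Random(0)
n = 200

# Random search: uniform samples in I, pushed through the system.
z_from_uniform_i = [toy_system(rng.random(), rng.random()) for _ in range(n)]

# Idealised goal-directed search: outcomes spread uniformly in Z.
z_uniform = [(rng.random(), rng.random()) for _ in range(n)]

coverage_random_i = occupied_bins(z_from_uniform_i)
coverage_uniform_z = occupied_bins(z_uniform)
```

      In this toy setting, roughly three quarters of the uniformly sampled inputs fall into the lowest bin along each axis, so `coverage_random_i` stays well below `coverage_uniform_z` for the same sampling budget.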

      Additional changes in the revised version:

      We added a sentence in the Methods section, as well as in the caption of Table S1, providing additional details about the way we simulate the biological models from the BioModels website.

      We corrected an erroneous reference to Figure 4 in the Methods “Sensitivity measure” subsection; it now refers to Figure 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Despite the importance of long-lived plasma cells (LLPCs), particularly in the vaccination field, their natures are still unclear. In this valuable manuscript, as a first step towards clarifying these natures, the authors used a solid genetic approach (time-stamping one) and successfully labelled only functional LLPCs. Although four groups have already published data by the same genetic approach, the authors' manuscript includes additional significant findings in the LLPC field.

      Public Reviews:

      Reviewer #1 (Public Review):

      The mechanisms underlying the generation and maintenance of LLPCs have been one of the unresolved issues. Recently, four groups have independently generated new genetic tools that allow fate tracing of murine plasma cells and have addressed how LLPCs are generated or maintained in homeostatic conditions or upon antigen immunization or viral infection. Here, Jing et al. have established another, but essentially the same, PC time stamping system, and tried to address the issues above. The question is whether the findings reported here provide significant conceptual advances from what has already been published.

      (1) Some of the observations in this manuscript have already been made by other studies (Xu et al. 2020, Robinson et al. 2022, Liu et al. 2022, Koike et al. 2023, Robinson et al. 2023). In my opinion, however, genetic analysis of the role of CXCR4 on PC localization or survival in BM (Figure 5) was well performed and provided some new aspects which have not been addressed in previous reports. The motility of CXCR4 cKO plasma cells in BM is not shown, but it could further support the idea that reduced mobility or increased clustering is required for longevity.

      (2) The combination of the several surface markers shown in Figure 3&4 doesn't seem to be practically applicable to identify or gate on LLPCs, because differential expression of CD81, CXCR4, CD326, CD44, or CD48 on LLPCs vs bulk PCs was very modest. EpCAMhi/CXCR3-, Ly6Ahi/Tigit- (Liu et al. 2022), B220lo/MHC-IIlo (Koike et al. 2023), or SLAMF6lo/MHC-IIlo (Robinson et al. 2023) has been reported as markers for LLPC population. It is unclear that the combination of surface markers presented here is superior to published markers. In addition, it is unclear why the authors did not use their own gene expression data (Fig.6), instead of using public datasets, for picking up candidate markers.

      In terms of the utility of these markers, we agree they are not sufficient to distinguish bona fide LLPCs, but they did enrich for LLPCs by 6-fold (Figure 3). In the other studies cited, LLPCs are enriched in those gates but not exclusively found in the gates, suggesting some plasticity. In terms of how they were chosen, we conducted the flow surface studies in parallel with, and prior to completing, the gene expression studies; thus, the gene expression data were not available in time to be useful for the longitudinal studies. As this was not a major finding of the paper, we have reduced emphasis on this section and moved some of the data to Figure S2.

      Reviewer #2 (Public Review):

      In this study by Jing, Fooksman, and colleagues, a Blimp1-CreERT2-based genetic tracing study is employed to label plasma cells. Over the course of several months post-tamoxifen treatment, the only remaining labeled cells are long-lived plasma cells. This system provides a way to sort live long-lived plasma cells and compare them to unlabeled plasma cells, which contain a range of short-to-long-lived cells. From this analysis, several observations are made: 1) the turnover rate of plasma cells is greater in the spleen than in the bone marrow; 2) the turnover rate is highest early in life; 3) subtle transcriptional and cell surface marker differences distinguish long- from shorter-lived plasma cells; 4) long-lived plasma cells in the bone marrow are sessile and localize in clusters with each other; 5) CXCR4 is required for plasma cell retention in these clusters and in the bone marrow; 6) Repertoire analysis hints that the selection of long-lived plasma cells is not random for any cell that lands in the bone marrow.

      Strengths:

      (1) The genetic timestamping approach is a clever and functional way to separate plasma cells of differing longevities.

      (2) This approach led to the identification of several markers that could help prospective separation of long-lived plasma cells from others.

      (3) Functional labeling of long-lived plasma cells allowed for a higher resolution analysis of transcriptomes and motility than was previously possible.

      (4) The genetic system allowed for a revisitation of the importance of CXCR4 in plasma cell retention and survival.

      Weaknesses:

      (1) Most of the labeling studies, likely for practical reasons, were done on polyclonal rather than antigen-specific plasma cells. The triggers of these responses could vary based on age at the time of exposure, anatomical sites, etc. How these differences might influence markers and transcriptomes, independently of longevity, is not completely known.

      (2) The fraction of long-lived plasma cells in the unlabeled fraction varies with age, potentially diluting differences between long- and short-lived plasma cells.

      (3) The authors suggest their data favors a model by which plasma cells compete for niche space. Yet there is no evidence presented here that these niches are limiting.

      In Figure 2, we provide important evidence that LLPCs are enriched in PC clusters and are less motile, suggesting they occupy a unique niche compared to bulk PCs in the bone marrow. But we agree that this does not clarify whether that niche is limiting.

      (4) The functional importance of the observed transcriptome differences between long- and shorter-lived plasma cells is unknown. An assessment as to whether these differences are conserved in human long- and short-lived bone marrow plasma cells might provide circumstantial supporting evidence that these changes are important for longevity.

      Reviewer #3 (Public Review):

      The valuable work shows some unique characteristics of long-lived PCs in comparison with bulk PCs. In particular, the authors clearly indicated the dependency of PC longevity on CXCR4 and provided a great deal of PC transcriptome resources. Though CD93 is known as a marker for long-lived PCs, the authors could provide more data related to CD93.

      Summary:

      Long-lived PCs are maintained with low motility and in a CXCR4-dependent manner. 

      Strengths:

      The reporter mice for fate-mapping can clearly distinguish long-lived PCs from total PCs and greatly contribute to the identification of long-lived PCs.

      Weaknesses:

      The authors are unable to find a unique marker for long-lived PCs.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Given the author's expertise, I suggest investigating the motility of CXCR4 cKO plasma cells in BM. 

      Thank you for the suggestion. This work would certainly fit in with the theme of the paper. We tried to measure this using the BEC Rosa-LSL-YFP Cxcr4f/f system after tamoxifen treatment, but unfortunately these PCs leave the BM at the same time as they turn on YFP expression from the Rosa26 locus, making it impossible to capture the change in motility. This is also evident in our data in updated Figure 5, which shows that intratibial injection of 4HO-Tamoxifen causes rapid mobilization of CXCR4KO PCs from the tibia within 1 day. We tried to breed other models that would allow us to visualize these early events, but these attempts were unsuccessful and were also responsible for the long delay in resubmission.

      (2) Expression of CD81, CXCR4, CD326, CD44, or CD48 was not different enough to distinguish LLPCs from bulk PCs (Figure 3B). The caveat is that bulk PCs also contained a significant frequency of LLPCs, which would make the difference in expression levels smaller. I suggest looking at the expression of these molecules on newly generated PCs, soon after protein immunization, for example.

      This would be a separate issue, when they begin to express the LLPC phenotype, and definitely worthwhile in future studies.

      Reviewer #2 (Recommendations For The Authors):

      (1) Related to the above public comment #4, I would recommend looking at Halliley et al., Immunity, 2015 to see if some of the same LLPC transcriptional and marker differences can be observed between CD19+ and CD19- plasma cells in the human marrow.

      Thank you for the suggestion to do a human correlation.  It is unclear what conclusions we can draw from overlapping or non-overlapping patterns, on their own.

      (2) For CD93, since it is bimodal, it may be better to express this as % positive rather than fold changes in MFI as in Figure 3.

      We have updated Figure 3C to include %positive as suggested. Fold changes were moved to Figure S2.
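
      To illustrate why the two summaries differ for a bimodal marker, here is a minimal sketch with made-up numbers. It is not the analysis used in the paper: the intensities, the gate value, and the choice to take MFI as the mean (rather than the median) fluorescence intensity are all assumptions for illustration.

```python
from statistics import mean


def percent_positive(intensities, gate):
    """Percentage of cells whose fluorescence exceeds the positivity gate."""
    positive = sum(1 for x in intensities if x > gate)
    return 100.0 * positive / len(intensities)


def mfi_fold_change(sample, reference):
    """Ratio of mean fluorescence intensities (sample over reference)."""
    return mean(sample) / mean(reference)


# Hypothetical bimodal populations: each cell is either dim (100) or bright (10000).
reference = [100] * 80 + [10000] * 20   # 20% bright cells
sample = [100] * 40 + [10000] * 60      # 60% bright cells

gate = 1000  # hypothetical threshold separating the two modes
pct_ref = percent_positive(reference, gate)
pct_sample = percent_positive(sample, gate)
fold = mfi_fold_change(sample, reference)
```

      Here no individual cell changes its intensity between the two populations; only the fraction of bright cells changes. The percent-positive readout reports that shift directly, whereas the fold change in MFI folds it into a single ratio that could be mistaken for a per-cell increase in expression.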

      Reviewer #3 (Recommendations For The Authors):

      The valuable work shows some unique characteristics of long-lived PCs in comparison with bulk PCs. In particular, the authors clearly indicated the dependency of PC longevity on CXCR4 and provided a great deal of PC transcriptome resources. Though CD93 is known as a marker for long-lived PCs, the authors could provide more data related to CD93.

      Major points:

      The authors show data that some bulk PCs express lower levels of CD93. Are CD93low bulk PCs more motile in the BM than CD93high bulk PCs? Are the Ig genes of CD93low PCs highly mutated? Do CD93high bulk PCs have a transcriptome similar to that of long-lived PCs for some representative genes?

      Although we do not have data here, the difference between CD93high cells and CD93low cells is likely to be small, since labeled PCs were observed to express higher CD93 surface levels as early as day 5 in BM and SP, as shown in updated Figure 3C. Thus, while CD93 is strongly enriched in LLPCs, it cannot be used as a single marker to sufficiently isolate LLPCs, which would make it very difficult to detect changes in motility, Ig gene mutation, and gene expression.

      Minor points:

      (1) In the title, the authors describe that surface receptor expression support PC-intrinsic longevity. The surface receptor is only CXCR4. The ambiguous description confuses the readers. 

      While CXCR4 was shown functionally to be involved, we found multiple surface receptors are differentially expressed in LLPCs.

      (2) The abbreviations of 'bone marrow' and 'BM' should be unified.

      (3) In Fig. 7C, the bars for comparison are unclear. What dots are compared? 

      The bars compare day 90 middle-aged samples to day 5 controls, as there were only n=2 for some day 90 young mouse samples for all internally paired comparisons.

      (4) The explanation of Fig. 7I cannot be understood. How are the conclusions drawn from the panel?

      Fig. 7I shows that, among the most common public clones (those found in the most samples or mice) across all 42 LLPC and bulk samples, most of the hits came from LLPC samples (all colored bars) whereas few came from bulk PC samples (white bars), suggesting the shared repertoire is uniquely LLPC-like. These are observations only; no statistical analysis was conducted here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study makes a valuable empirical contribution to our understanding of visual processing in primates and deep neural networks, with a specific focus on the concept of factorization. The analyses provide solid evidence that high factorization scores are correlated with neural predictivity, yet more evidence would be needed to show that neural responses show factorization. Consequently, while several aspects require further clarification, in its current form this work is interesting to systems neuroscientists studying vision and could inspire further research that ultimately may lead to better models of or a better understanding of the brain.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates visual processing in primates and deep neural networks (DNNs), focusing on factorization in the encoding of scene parameters. It challenges the conventional view that object classification is the primary function of the ventral visual stream, suggesting instead that the visual system employs a nuanced strategy involving both factorization and invariance. The study also presents empirical findings suggesting a correlation between high factorization scores and good neural predictivity.

      Strengths:

      (1) Novel Perspective: The paper introduces a fresh viewpoint on visual processing by emphasizing the factorization of non-class information.

      (2) Methodology: The use of diverse datasets from primates and humans, alongside various computational models, strengthens the validity of the findings.

      (3) Detailed Analysis: The paper suggests metrics for factorization and invariance, contributing to a future understanding & measurements of these concepts.

      Weaknesses:

      (1) Vagueness (Perceptual or Neural Invariance?): The paper uses the term 'invariance', typically referring to perceptual stability despite stimulus variability [1], as the complete discarding of nuisance information in neural activity. This oversimplification overlooks the nuanced distinction between perceptual invariance (e.g., invariant object recognition) and neural invariance (e.g., no change in neural activity). It seems that by 'invariance' the authors mean 'neural' invariance (rather than 'perceptual' invariance) in this paper, which is vague. The paper could benefit from changing what is called 'invariance' in the paper to 'neural invariance' and distinguish it from 'perceptual invariance,' to avoid potential confusion for future readers. The assignment of 'compact' representation to 'invariance' in Figure 1A is misleading (although it can be addressed by the clarification on the term invariance). [1] DiCarlo JJ, Cox DD. Untangling invariant object recognition. Trends in cognitive sciences. 2007 Aug 1;11(8):333-41.

      Thanks for pointing out this ambiguity. In our Introduction we now explicitly clarify that we use “invariance” to refer to neural, rather than perceptual invariance, and we point out that both factorization and (neural) invariance may be useful for obtaining behavioral/perceptual invariance.

      (2) Details on Metrics: The paper's explanation of factorization as encoding variance independently or uncorrelatedly needs more justification and elaboration. The definition of 'factorization' in Figure 1B seems to be potentially misleading, as the metric for factorization in the paper seems to be defined regardless of class information (can be defined within a single class). Does the factorization metric as defined in the paper (orthogonality of different sources of variation) warrant that responses for different object classes are aligned/parallel like in 1B (middle)? More clarification around this point could make the paper much richer and more interesting.

      Our factorization metric measures the degree to which two sets of scene variables are factorized from one another. In the example of Fig. 1B, we apply this definition to the case of factorization of class vs. non-class information. Elsewhere in the paper we measure factorization of several other quantities unrelated to class, specifically camera viewpoint, lighting conditions, background content, and object pose. In our revised manuscript we have clarified the exposition surrounding Fig. 1B to make it clear that factorization, as we define it, can be applied to other quantities as well and that responses do not need to be aligned/parallel but simply live in a different set of dimensions whether linearly or nonlinearly arranged. Thanks for raising the need to clarify this point.

      (3) Factorization vs. Invariance: Is it fair to present invariance vs. factorization as mutually exclusive options in representational hypothesis space? Perhaps a more fair comparison would be factorization vs. object recognition, as it is possible to have different levels of neural variability (or neural invariance) underlying both factorization and object recognition tasks.

      We do not mean to imply that factorization and invariance are mutually exclusive, or that they fully characterize the space of possible representations. However, they are qualitatively distinct strategies for achieving behavioral capabilities like object recognition. In the revised manuscript we also include a comparison to object classification performance (Figures 5C & S4, black x’s) as a predictor of brain-like representations, alongside the results for factorization and invariance.

      In our revised Introduction and the beginning of the Results section, we make it clearer that factorization and invariance are not mutually exclusive – indeed, our results show that both factorization and invariance for some scene variables like lighting and background identity are signatures of brain-like representations. Our study focuses on factorization because we believe its importance has not been studied or highlighted to the degree that invariance to “nuisance” parameters has in concert with selectivity to object identity in individual neuron tuning functions. Moreover, the loss functions used for supervised training of neural networks for image classification would seem to encourage invariance as a representational strategy. Thus, the finding that factorization of scene parameters is an equally good if not better predictor of brain-like representations may motivate new objective functions for neural network training.

      (4) Potential Confounding Factors in Empirical Findings: The correlation observed in Figure 3 between factorization and neural predictivity might be influenced by data dimensionality, rather than factorization per se [2]. Incorporating discussions around this recent finding could strengthen the paper.

      [2] Elmoznino E, Bonner MF. High-performing neural network models of the visual cortex benefit from high latent dimensionality. bioRxiv. 2022 Jul 13:2022-07.

      We thank the Reviewer for pointing out this important potential confound and the need for a direct quantification. We have now included an analysis computing how well dimensionality (measured using the participation ratio metric for natural images, as was done in [2] Elmoznino & Bonner, bioRxiv, 2022) can account for model goodness-of-fit (additional pink bars in Figure 6). Factorization of scene parameters appears to add more predictive power than dimensionality on average (Figure 6, light shaded bars), and critically, factorization+classification jointly predict goodness-of-fit significantly better than dimensionality+classification for V4 and IT/HVC brain areas (Figure 6, dark shaded bars). Indeed, dimensionality+classification is only slightly more predictive than classification alone for V4 and IT/HVC, indicating some redundancy in those measures with respect to neural predictivity of models (Figure 6, compare dark shaded pink bar to dashed line).

      That said, high-dimensional representations can, in principle, better support factorization, and thus we do not regard these two representational strategies necessarily in competition. Rather, our results suggest (consistent with [2]) that dimensionality is predictive of brain-like representation to some degree, such that some (but not all) of factorization’s predictive power may indeed owe to a partial correlation with dimensionality. We elaborate in the Discussion where this point comes up and now refer to the updated Figure 6 that shows the control for dimensionality.
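
      For readers unfamiliar with the participation ratio, the standard definition can be sketched as below. The function name and the toy eigenvalue spectra are illustrative assumptions; the eigenvalues stand for those of the covariance matrix of model responses, and this is not the analysis code used for Figure 6.

```python
def participation_ratio(eigenvalues):
    """Effective dimensionality: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    total = sum(eigenvalues)
    total_sq = sum(v * v for v in eigenvalues)
    return total * total / total_sq


# Variance spread evenly over four dimensions versus concentrated in one.
pr_flat = participation_ratio([1.0, 1.0, 1.0, 1.0])
pr_peaked = participation_ratio([1.0, 0.0, 0.0, 0.0])
```

      The measure equals the number of dimensions when variance is spread evenly (`pr_flat` is 4.0) and falls to 1.0 when all variance lies along a single direction (`pr_peaked`).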

      Conclusion:

      The paper offers insightful empirical research with useful implications for understanding visual processing in primates and DNNs. The paper would benefit from a more nuanced discussion of perceptual and neural invariance, as well as a deeper discussion of the coexistence of factorization, recognition, and invariance in neural representation geometry. Additionally, addressing the potential confounding factors in the empirical findings on the correlation between factorization and neural predictivity would strengthen the paper's conclusions.

      Taken together, we hope that the changes described above address the distinction between neural and perceptual invariance, provide a more balanced understanding of the contributions of factorization, invariance, and local representational geometry, and rule against dimensionality for natural images as contributing to the main finding of the benefits from factorization of scene parameters.

      Reviewer #2 (Public Review):

      Summary:

      The dominant paradigm in the past decade for modeling the ventral visual stream's response to images has been to train deep neural networks on object classification tasks and regress neural responses from units of these networks. While object classification performance is correlated to the variance explained in the neural data, this approach has recently hit a plateau of variance explained, beyond which increases in classification performance do not yield improvements in neural predictivity. This suggests that classification performance may not be a sufficient objective for building better models of the ventral stream. Lindsey & Issa study the role of factorization in predicting neural responses to images, where factorization is the degree to which variables such as object pose and lighting are represented independently in orthogonal subspaces. They propose factorization as a candidate objective for breaking through the plateau suffered by models trained only on object classification.

      They claim that (i) maintaining these non-class variables in a factorized manner yields better neural predictivity than ignoring non-class information entirely, and (ii) factorization may be a representational strategy used by the brain.

      The first of these claims is supported by their data. The second claim does not seem well-supported, and the usefulness of their observations is not entirely clear.

      Strengths:

      This paper challenges the dominant approach to modeling neural responses in the ventral stream, which itself is valuable for diversifying the space of ideas.

      This paper uses a wide variety of datasets, spanning multiple brain areas and species. The results are consistent across the datasets, which is a great sign of robustness.

      The paper uses a large set of models from many prior works. This is impressively thorough and rigorous.

      The authors are very transparent, particularly in the supplementary material, showing results on all datasets. This is excellent practice.

      Weaknesses:

      (1) The primary weakness of this paper is a lack of clarity about what exactly is the contribution. I see two main interpretations: (1-A) As introducing a heuristic for predicting neural responses that improves over classification accuracy, and (1-B) as a model of the brain's representational strategy. These two interpretations are distinct goals, each of which is valuable. However, I don't think the paper in its current form supports either of them very well:

      (1-A) Heuristic for neural predictivity. The claim here is that by optimizing for factorization, we could improve models' neural predictivity to break through the current predictivity plateau. To frame the paper in this way, the key contribution should be a new heuristic that correlates with neural predictivity better than classification accuracy. The paper currently does not do this. The main piece of evidence that factorization may yield a more useful heuristic than classification accuracy alone comes from Figure 5. However, in Figure 5 it seems that factorization along some factors is more useful than others, and different linear combinations of factorization and classification may be best for different data. There is no single heuristic presented and defended. If the authors want to frame this paper as a new heuristic for neural predictivity, I recommend the authors present and defend a specific heuristic that others can use, e.g. [K * factorization_of_pose + classification] for some constant K, and show that (i) this correlates with neural predictivity better than classification alone, and (ii) this can be used to build models with higher neural predictivity. For (ii), they could fine-tune a state-of-the-art model to improve this heuristic and show that doing so achieves a new state-of-the-art neural predictivity. That would be convincing evidence that their contribution is useful.

      Our paper does not make any strong claim regarding the Reviewer’s point 1-A (on heuristics for neural predictivity). In the Discussion, last paragraph, we better specify that our work is merely suggestive of claim 1-A about heuristics for more neurally predictive, more brainlike models. We believe that our paper supports the Reviewer’s point 1-B (on brain representation) as we discuss below.

      We leave it to future work to determine if factorization could help optimize models to be more brainlike. This treatment may require exploration of novel model architectures and loss functions, and potentially also more thorough neural datasets that systematically vary many different forms of visual information for validating any new models.

      (1-B) Model of representation in the brain. The claim here is that factorization is a general principle of representation in the brain. However, neural predictivity is not a suitable metric for this, because (i) neural predictivity allows arbitrary linear decoders, hence is invariant to the orthogonality requirement of factorization, and (ii) neural predictivity does not match the network representation to the brain representation. A better metric is representational dissimilarity matrices. However, the RDM results in Figure S4 actually seem to show that factorization does not do a very good job of predicting neural similarity (though the comparison to classification accuracy is not shown), which suggests that factorization may not be a general principle of the brain. If the authors want to frame the paper in terms of discovering a general principle of the brain, I suggest they use a metric (or suite of metrics) of brain similarity that is sensitive to the desiderata of factorization, e.g. doesn't apply arbitrary linear transformations, and compare to classification accuracy in addition to invariance.

      We agree with the Reviewer about the shortcomings of neural predictivity for comparing representational geometries, and in our revised manuscript we have provided a more comprehensive set of results that includes RDM predictivity in new Figures 6 & 7, alongside the results for neural fit predictivity. In addition, as suggested, we added classification accuracy predictivity in Figures 5C & S4 (black x’s) for visual comparison to factorization/invariance. In Figure S4 on RDMs, it is apparent that factorization is at least as good a predictor as classification on all V4 & IT datasets from both monkeys and humans (compare x’s to filled circles in Figure S4; note that some of the points from the original Figure S4 changed, as we discovered a bug in the code that specifically affected the RDM analysis for a few of the datasets).

      We find that the newly included RDM analyses in Figures 6 & 7 are consistent with the conclusions of the neural fit regression analyses: that the correlation of factorization metrics with RDM matches are strong, comparable in magnitude to that of classification accuracy (Figure 6, 3rd & 4th columns, compare black dashed line to faded colored bars) and are not fully accounted for by the model’s classification accuracy alone (Figure 6, 3rd & 4th columns, higher unfaded bars for classification combined with factorization, and see corresponding example scatters in Figure 7 middle/bottom rows).

      It is encouraging that the added benefit of factorization for RDM predictivity accounting for classification performance is at least as good as the improvement seen for neural fit predictivity (Figure 6, 1st & 2nd columns for encoding fits versus 3rd & 4th columns for RDM correlations).

      (2) I think the comparison to invariance, which is pervasive throughout the paper, is not very informative. First, it is not surprising that invariance is more weakly correlated with neural predictivity than factorization, because invariant representations lose information compared to factorized representations. Second, there has long been extensive evidence that responses throughout the ventral stream are not invariant to the factors the authors consider, so we already knew that invariance is not a good characterization of ventral stream data.

      While we appreciate the Reviewer’s intuition that highly invariant representations are not strongly supported in the high-level visual cortex, we nevertheless thought it was valuable to put this intuition to a quantitative, detailed test. As a result, we uncovered effects that were not obvious a priori, at least to us – for example, that invariance for some scene parameters (camera view, object pose) is negatively correlated with neural predictions while invariance to others (background, lighting) is positively correlated. Thus, our work exercises the details of invariance for different types of information.

      (3) The formalization of the factorization metric is not particularly elegant, because it relies on computing top K principal components for the other-parameter space, where K is arbitrarily chosen as 10. While the authors do show that in their datasets the results are not very sensitive to K (Figure S5), that is not guaranteed to be the case in general. I suggest the authors try to come up with a formalization that doesn't have arbitrary constants. For example, one possibility that comes to mind is E[delta_a x delta_b], where 'x' is the normalized cross product, delta_a, and delta_b are deltas in representation space induced by perturbations of factors a and b, and the expectation is taken over all base points and deltas. This is just the first thing that comes to mind, and I'm sure the authors can come up with something better. The literature on disentangling metrics in machine learning may be useful for ideas on measuring factorization.

      Thanks to the Reviewer for raising this point. First, we wish to clarify a potential misunderstanding of the factorization metric: the number K of principal components we choose is not an arbitrary constant, but rather calibrated to capture a certain fraction of variance, set to 90% by default in our analyses. While this variance threshold is indeed an arbitrary hyperparameter, it has a more intuitive interpretation than the number of principal components.
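
      To make the role of the variance threshold concrete, here is a hedged Python sketch of a PCA based factorization score. It assumes the score is one minus the fraction of parameter-induced variance that falls inside the other-parameter subspace, with that subspace spanned by the fewest principal components reaching the threshold. The function name, argument names, and this exact formula are assumptions for illustration; the authoritative definition is the one in the manuscript's Methods.

```python
import numpy as np

def pca_factorization(resp_param, resp_other, var_threshold=0.90):
    """Sketch of a PCA based factorization score (assumed form).

    resp_param: (n_samples, n_units) responses while only the parameter of
                interest varies.
    resp_other: (n_samples, n_units) responses while the other parameters vary.
    Returns (score, k), where k is the number of components retained.
    """
    xp = resp_param - resp_param.mean(axis=0)
    xo = resp_other - resp_other.mean(axis=0)

    # Principal axes of the other-parameter responses via SVD.
    _, s, vt = np.linalg.svd(xo, full_matrices=False)
    cum_var = np.cumsum(s**2) / np.sum(s**2)

    # K is derived from the variance threshold rather than fixed in advance.
    k = min(int(np.searchsorted(cum_var, var_threshold)) + 1, len(s))
    basis = vt[:k]  # (k, n_units), orthonormal rows

    var_total = np.sum(xp**2)
    var_inside = np.sum((xp @ basis.T) ** 2)
    return 1.0 - var_inside / var_total, k
```

      Under this assumed form, a score near 1 means the parameter of interest drives responses along directions that the other parameters barely use, and a score near 0 means the two sources of variance share a subspace.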

      Nonetheless, the Reviewer’s comment did inspire us to consider another metric for factorization that does not depend on any arbitrary parameters. In the revised version, we now include a covariance matrix based metric which simply measures the elementwise correlation of the covariance matrices induced by varying the scene parameter of interest and the covariance matrix induced by varying the other parameters (and then subtracts this quantity from 1).
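
      A minimal Python sketch of such a covariance based score follows. It reads "elementwise correlation" as the Pearson correlation between the flattened entries of the two covariance matrices, which is an assumption; the function and argument names are likewise illustrative, and the formula in the Methods takes precedence.

```python
import numpy as np

def covariance_factorization(resp_param, resp_other):
    """Sketch of a covariance based factorization score (assumed form).

    resp_param: (n_samples, n_units) responses while only the parameter of
                interest varies.
    resp_other: (n_samples, n_units) responses while the other parameters vary.
    Returns 1 minus the Pearson correlation between the entries of the two
    unit-by-unit covariance matrices.
    """
    cov_param = np.cov(resp_param, rowvar=False)
    cov_other = np.cov(resp_other, rowvar=False)
    r = np.corrcoef(cov_param.ravel(), cov_other.ravel())[0, 1]
    return float(1.0 - r)
```

      No threshold or component count enters this computation, which is the property the Reviewer asked for.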

      Correspondingly, we now present results for both the new covariance based measure and the original PCA based one in Figures 5C, 6, and 7. The main findings remain largely the same when using the covariance based metric (Figure 5C, compare light shaded to dark shaded filled circles; Figure 6, compare top row to bottom row; Figure 7, compare middle rows to bottom rows).

      Ultimately, we believe these two metrics are complementary and somewhat analogous to two metrics commonly used for measuring dimensionality (the number of components needed to explain a certain fraction of the variance, analogous to our original PCA based definition; the participation ratio, analogous to our covariance based definition). We have added the formula for the covariance based factorization metric along with a brief description to the Methods.
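
      The two dimensionality measures named in this analogy can be sketched in Python as follows. The participation ratio uses its standard definition, the squared sum of covariance eigenvalues divided by the sum of their squares; the function names and the 90% default are illustrative assumptions.

```python
import numpy as np

def _cov_eigenvalues(x):
    """Eigenvalues of the unit-by-unit covariance, sorted largest first."""
    eig = np.linalg.eigvalsh(np.cov(x, rowvar=False))
    return np.sort(np.clip(eig, 0.0, None))[::-1]  # clip tiny negative values

def n_components_for_variance(x, frac=0.90):
    """Fewest principal components whose variance reaches `frac` of the total."""
    eig = _cov_eigenvalues(x)
    cum = np.cumsum(eig) / eig.sum()
    return min(int(np.searchsorted(cum, frac)) + 1, len(eig))

def participation_ratio(x):
    """(sum of eigenvalues)^2 / (sum of squared eigenvalues); no threshold."""
    eig = _cov_eigenvalues(x)
    return float(eig.sum() ** 2 / np.sum(eig**2))
```

      The first measure is an integer that depends on the chosen threshold, while the second is continuous and has no free parameter, which mirrors the contrast between the PCA based and covariance based factorization metrics.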

      (4) The authors defined the term "factorization" according to their metric. I think introducing this new term is not necessary and can be confusing because the term "factorization" is vague and used by different researchers in different ways. Perhaps a better term is "orthogonality", because that is clear and seems to be what the authors' metric is measuring.

      We agree with the Reviewer that factorization has become an overloaded term. At the same time, we think that in this context, the connotation of the term factorization effectively conveys the notion of separating out different latent sources of variance (factors) such that they can be encoded in orthogonal subspaces.

      To aid clarity, we now mention in the Introduction that factorization defined here is meant to measure orthogonalization of scene factors. Additionally, in the Discussion section, we now go into more detail comparing our metric to others previously used in the literature, including orthogonality, to help put it in context.

      (5) One general weakness of the factorization paradigm is the reliance on a choice of factors. This is a subjective choice and becomes an issue as you scale to more complex images where the choice of factors is not obvious. While this choice of factors cannot be avoided, I suggest the authors add two things: First, an analysis of how sensitive the results are to the choice of factors (e.g. transform the basis set of factors and re-run the metric); second, include some discussion about how factors may be chosen in general (e.g. based on temporal statistics of the world, independent components analysis, or something else).

      The Reviewer raises a very reasonable point about the limitation of this work. While we limited our analysis to generative scene factors that we know about and that could be manipulated, there are many potential factors to consider. It is not clear to us exactly how to implement the Reviewer’s suggestion of transforming the basis set of factors, as the factors we consider are highly nonlinear in the input space. Ultimately, we believe that finding unsupervised methods to characterize the “true” set of factors that is most useful for understanding visual representations is an important subject for future work, but outside the scope of this particular study. We have added a comment to this effect in the Discussion.

      Reviewer #3 (Public Review):

      Summary:

      Object classification serves as a vital normative principle in both the study of the primate ventral visual stream and deep learning. Different models exhibit varying classification performances and organize information differently. Consequently, a thriving research area in computational neuroscience involves identifying meaningful properties of neural representations that act as bridges connecting performance and neural implementation. In the work of Lindsey and Issa, the concept of factorization is explored, which has strong connections with emerging concepts like disentanglement [1,2,3] and abstraction [4,5]. Their primary contributions encompass two facets: (1) The proposition of a straightforward method for quantifying the degree of factorization in visual representations. (2) A comprehensive examination of this quantification through correlation analysis across deep learning models.

      To elaborate, their methodology, inspired by prior studies [6], employs visual inputs featuring a foreground object superimposed onto natural backgrounds. Four types of scene variables, such as object pose, are manipulated to induce variations. To assess the level of factorization within a model, they systematically alter one of the scene variables of interest and estimate the proportion of encoding variances attributable to the parameter under consideration.

      The central assertion of this research is that factorization represents a normative principle governing biological visual representation. The authors substantiate this claim by demonstrating an increase in factorization from macaque V4 to IT, supported by evidence from correlated analyses revealing a positive correlation between factorization and decoding performance. Furthermore, they advocate for the inclusion of factorization as part of the objective function for training artificial neural networks. To validate this proposal, the authors systematically conduct correlation analyses across a wide spectrum of deep neural networks and datasets sourced from human and monkey subjects. Specifically, their findings indicate that the degree of factorization in a deep model positively correlates with its predictability concerning neural data (i.e., goodness of fit).

      Strengths:

      The primary strength of this paper is the authors' efforts in systematically conducting analysis across different organisms and recording methods. Also, the definition of factorization is simple and intuitive to understand.

      Weaknesses:

      This work exhibits two primary weaknesses that warrant attention: (i) the definition of factorization and its comparison to previous, relevant definitions, and (ii) the chosen analysis method.

      Firstly, the definition of factorization presented in this paper is founded upon the variances of representations under different stimuli variations. However, this definition can be seen as a structural assumption rather than capturing the effective geometric properties pertinent to computation. More precisely, the definition here is primarily statistical in nature, whereas previous methodologies incorporate computational aspects such as deviation from ideal regressors [1], symmetry transformations [3], generalization [5], among others. It would greatly enhance the paper's depth and clarity if the authors devoted a section to comparing their approach with previous methodologies [1,2,3,4,5], elucidating any novel insights and advantages stemming from this new definition.

      [1] Eastwood, Cian, and Christopher KI Williams. "A framework for the quantitative evaluation of disentangled representations." International conference on learning representations. 2018.

      [2] Kim, Hyunjik, and Andriy Mnih. "Disentangling by factorising." International Conference on Machine Learning. PMLR, 2018.

      [3] Higgins, Irina, et al. "Towards a definition of disentangled representations." arXiv preprint arXiv:1812.02230 (2018).

      [4] Bernardi, Silvia, et al. "The geometry of abstraction in the hippocampus and prefrontal cortex." Cell 183.4 (2020): 954-967.

      [5] Johnston, W. Jeffrey, and Stefano Fusi. "Abstract representations emerge naturally in neural networks trained to perform multiple tasks." Nature Communications 14.1 (2023): 1040.

      Thanks to the Reviewer for this suggestion. We agree that our initial submission did not sufficiently contextualize our definition of factorization with respect to other related notions in the literature. We have added additional discussion of these points to the Discussion section in the revised manuscript and have included therein the citations provided by the Reviewer (please see the third paragraph of Discussion).

      Secondly, in order to establish a meaningful connection between factorization and computation, the authors rely on a straightforward synthetic model (Figure 1c) and employ multiple correlation analyses to investigate relationships between the degree of factorization, decoding performance, and goodness of fit. Nevertheless, the results derived from the synthetic model are limited to the low training-sample regime. It remains unclear whether the biological datasets under consideration fall within this low training-sample regime or not.

      We agree that our model in Figure 1C is very simple and does not fully capture the complex interactions between task performance and features of representational geometry, like factorization. We intend it only as a proof of concept to illustrate how factorized representations can be beneficial for some downstream task use cases. While the benefits of factorized representations disappear for large numbers of samples in this simulation, we believe this is primarily a consequence of the simplicity and low dimensionality of the simulation. Real-world visual information is complex and high-dimensional, and as such the relevant sample size regime in which factorization offers task benefits may be much greater. As a first step toward this real-world setting, Figure 2 shows how decreasing the amount of factorization in neural population data in macaque V4/IT can have an effect on object identity decoding.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      Missing citations: The paper could benefit from discussions & references to related papers, such as:

      Higgins I, Chang L, Langston V, Hassabis D, Summerfield C, Tsao D, Botvinick M. Unsupervised deep learning identifies semantic disentanglement in single inferotemporal face patch neurons. Nature communications. 2021 Nov 9;12(1):6456.

      We have added additional discussion of related work, including the suggested reference and others on disentanglement, to the Discussion section in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Here are several small recommendations for the authors, all much more minor than those in the public review:

      I suggest more use of equations in methods sections about Figure 1C and macaque neural data analysis.

      Thanks for this suggestion. We have added new Equation 1 for the method transforming neural data to reduce factorization of a variable while preserving other firing rate statistics.

      In Figure 1-C, the methods indicate that Gaussian noise was added. This is a very important detail, and complexifies the interpretation of the figure because it adds an assumption about the structure of noise. In other words, if I understand correctly, the correct interpretation of Figure 1C is "assuming i.i.d. noise, decoding accuracy improves with factorization." The i.i.d. noise is a big assumption, and it is debated how well the brain satisfies this assumption. I suggest you either omit noise for this figure or clearly state in the main text (e.g. caption) that the figure must be interpreted under an i.i.d. noise assumption.

      We have added an explicit statement of the i.i.d. noise assumption to the Figure 1C legend.

      For Figure 2B, I suggest labeling the x-axis clearly below the axis on both panels. Currently, it is difficult to read, particularly in print.

      We have made the x-axis labels clearer and included them on both panels.

      Figure 3A is difficult to read because of the very small text. I suggest avoiding such small fonts.

      We agree that Figure 3A is difficult to read. We have broken out Figure 3 into two new Figures 3 & 4 to increase clarity and sizing of text in Figure 3A.

      Reviewer #3 (Recommendations For The Authors):

      To strengthen this work, it is advisable to incorporate more comprehensive comparisons with previous research, particularly within the machine learning (ML) community. For instance, it would be beneficial to explore and reference works focusing on disentanglement [1,2,3]. This would provide valuable context and facilitate a more robust understanding of the contributions and novel insights presented in the current study.

      We have added additional discussion of related work and other notions similar to factorization to the Discussion section in the revised manuscript.

      Additionally, improving the quality of the figures is crucial to enhance the clarity of the findings:

      • Figure 2: The caption of subfigure B could be revised for greater clarity.

      Thank you, we have substantially clarified this figure caption.

      • Figure 3: Consider a more equitable approach for computing the correlation coefficient, such as calculating it separately for different types of models. In the case of supervised models, it appears that the correlation between invariance and goodness of fit may not be negligible across various scene parameters.

      We appreciate the suggestion, but we are not confident in our ability to conclude much from analyses restricted to particular model classes, given the relatively small N and the fact that the different model classes themselves are an important source of variance in our data.

      • Figure 4: To enhance the interpretability of subfigures A and B, it may be beneficial to include p-values (indicating confidence levels).

      As we supply bootstrapped confidence intervals for our results, which provide at least as much information as p-values, and most of the effects of interest are fairly stark when comparing invariance to factorization, p-values were not needed to support our points. We added a sentence to the legend of new Figure 5 (previously Figure 4) indicating that error bars reflect standard deviations over bootstrap resampling of the models.
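
      As an illustration of error bars of this kind, the following Python sketch resamples models with replacement and reports the standard deviation of the resulting correlations. The Pearson statistic, the function name, and the default of 1000 resamples are assumptions, not details taken from the manuscript.

```python
import numpy as np

def bootstrap_corr_sd(metric, neural_fit, n_boot=1000, seed=0):
    """Bootstrap mean and standard deviation of a correlation across models.

    metric, neural_fit: one value per model (equal-length 1-D sequences).
    Models are resampled with replacement; a resample with no variance yields
    NaN and is ignored in the summary statistics.
    """
    rng = np.random.default_rng(seed)
    metric = np.asarray(metric, dtype=float)
    neural_fit = np.asarray(neural_fit, dtype=float)
    n = len(metric)

    corrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices of resampled models
        corrs[b] = np.corrcoef(metric[idx], neural_fit[idx])[0, 1]
    return float(np.nanmean(corrs)), float(np.nanstd(corrs))
```

      With a small number of models the bootstrap spread can be wide, which is consistent with the caution expressed above about analyses restricted to a single model class.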

      • Figure 5: For subfigure B, it could be advantageous to plot the results solely for factorization, allowing for a clear assessment of whether the high correlation observed in Classification+Factorization arises from the combined effects of both factors or predominantly from factorization alone.

      First, we clarify/note that the scatters solely for factorization that the Reviewer seeks are already presented earlier in the manuscript across all conditions in Figures 4A,B and Figure S2.

      While we could also include these in new Figure 7 (previously Figure 5B) as the Reviewer suggests, we believe it would distract from the message of that figure at the end of the manuscript – which is that factorization is useful as a supplement to classification in predictive matches to neural data. Nonetheless, new Figure 6 (old Figure 5A) provides a summary quantification of the information that the reviewer requests (Fig. 6, faded colored bars reflect the contribution of factorization alone).

    1. Author response:

      eLife assessment

      This study presents a valuable finding on sperm flagellum and HTCA stabilization. The evidence supporting the authors' claims is incomplete. The work will be of broad interest to cell and reproductive biologists working on cilium and sperm biology.

      We thank the Editor and the two referees for their time in carefully reviewing our work, and we are grateful for the helpful guidance about how to improve our study. We will supplement the experiments and provide quantitative data guided by the referees’ comments in the revised manuscript. Additionally, we will polish the manuscript and add further context to help readers understand the significance of this work.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, Wu et al. investigated the physiological roles of CCDC113 in sperm flagellum and HTCA stabilization by using CRISPR/Cas knockout mouse models, co-IP, and single sperm imaging. They find that CCDC113 localizes in the linker region among radial spokes (RS), the nexin-dynein regulatory complex (N-DRC), and doublet microtubules (DMTs) and interacts with axoneme-associated proteins CFAP57 and CFAP91, acting as an adaptor protein that facilitates the linkage between RS, N-DRC, and DMTs within the sperm axoneme. They show that the disruption of CCDC113 produced spermatozoa with disorganized sperm flagella, and that CFAP91 and DRC2 could not colocalize with DMTs in Ccdc113-/- spermatozoa. Interestingly, the data also indicate that CCDC113 could localize on the HTCA region, and interact with HTCA-associated proteins. The knockout of Ccdc113 could also produce acephalic spermatozoa. By using Sun5 and Centlein knockout mouse models, the authors further find SUN5 and CENTLEIN are indispensable for the docking of CCDC113 to the implantation site on the sperm head. Overall, the experiments were designed properly and performed well to support the authors' observation in each part. Furthermore, the study's findings offer valuable insights into the physiological and developmental roles of CCDC113 in the male germ line, which can provide insight into impaired sperm development and male infertility. The conclusions of this paper are mostly well supported by data, but some points need to be clarified and discussed.

      We thank Reviewer #1 for his or her critical reading and the positive assessment.

      (1) In Figure 1, a sperm flagellum protein, which is far away from CCDC113, should be selected as a negative control to exclude artificial effects in co-IP experiments.

      We greatly appreciate Reviewer #1’s insightful suggestion. We will include a negative control in the co-IP experiment to eliminate potential artificial effects.

      (2) Whether the detachment of sperm head and tail in Ccdc113-/- mice is a secondary effect of the sperm flagellum defects? The author should discuss this point.

      Good question. Given that CCDC113 could localize in the sperm neck region and interact with SUN5 and CENTLEIN, CCDC113 may directly function in the sperm head and tail connection. Indeed, PAS staining revealed Ccdc113–/– sperm heads with abnormal orientation in stages V–VIII seminiferous epithelia (Fig. 6C), and transmission electron microscopy (TEM) analysis further revealed that the disruption of CCDC113 caused the detachment of the destroyed coupling apparatus from the sperm head in step 9–11 spermatids (Fig. 6D). All these results suggest that the detachment of sperm head and tail in Ccdc113–/– mice may not be a secondary effect of the sperm flagellum defects. We have discussed this point as follows:

      CCDC113 could interact with SUN5 and CENTLEIN, but not PMFBP1 (Fig. 7A-C), and CCDC113 was in the cytoplasm in Sun5–/– and Centlein–/– spermatozoa (Fig. 7L, K). In addition, CCDC113 colocalizes with SUN5 in the HTCA region, and the immunofluorescence staining in spermatozoa shows that SUN5 is closer to the sperm nucleus than CCDC113 (Fig. 7G, H). Therefore, SUN5 and CENTLEIN may be closer to the sperm nucleus than CCDC113. PAS staining revealed Ccdc113–/– sperm heads with abnormal orientation in stages V–VIII seminiferous epithelia (Fig. 6C), and transmission electron microscopy (TEM) analysis further revealed that the disruption of CCDC113 caused the detachment of the destroyed coupling apparatus from the sperm head in step 9–11 spermatids (Fig. 6D). All these results suggest that the detachment of sperm head and tail in Ccdc113–/– mice may not be a secondary effect of the sperm flagellum defects.

      (3) Given that some cytoplasm materials could be observed in Ccdc113-/- spermatozoa (Fig. 5A), whether CCDC113 is also essential for cytoplasmic removal?

      Good question. Unremoved cytoplasm could be detected in spermatozoa by using transmission electron microscopy (TEM) analysis, including disrupted mitochondria, damaged axonemes, and large vacuoles, indicating cytoplasmic removal defects in Ccdc113–/– mice. We have discussed this point as follows:

      “Unremoved cytoplasm could be detected in spermatozoa by using transmission electron microscopy (TEM) analysis, including disrupted mitochondria, damaged axonemes, and large vacuoles, indicating cytoplasmic removal defects in Ccdc113–/– mice (Fig. 5A).”

      (4) Although CCDC113 could not bind to PMFBP1, the localization of CCDC113 in Pmfbp1-/- spermatozoa should be also detected to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      We are thankful to Reviewer #1 for this suggestion. We will analyze the localization of CCDC113 in Pmfbp1-/- spermatozoa to clarify the relationship between CCDC113 and SUN5-CENTLEIN-PMFBP1.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors selected the coiled-coil protein CCDC113 and revealed its expression in the stages of spermatogenesis in the testis as well as in the different steps of spermiogenesis with expression also mapped in the different parts of the epididymis. Gene deletion led to male infertility in CRISPR-Cas9 KO mice and PAS staining showed defects mapped in the different stages of the seminiferous cycle and through the different steps of spermiogenesis. EM and IF with several markers of testis germ cells and spermatozoa in the epididymis indicated defects in flagella and head-to-tail coupling for flagella as well as acephaly. The authors' co-IP experiments of expressed CCDC113 in HEK293T cells indicated an association with CFAP91 and DRC2 as well as SUN5 and CENTLEIN.

      The authors propose that CCDC113 connects CFAP91 and DRC2 to doublet microtubules of the axoneme and that CCDC113 associates with SUN5 and CENTLEIN to stabilize the sperm flagellum head-to-tail coupling apparatus. Extensive experiments mapping CCDC113 during postnatal development are reported as well as negative co-IP experiments and studies with SUN5 KO mice as well as CENTLEIN KO mice.

      Strengths:

      The authors provide compelling observations to indicate the relevance of CCDC113 to flagellum formation with potential protein partners. The data are relevant to sperm flagella formation and its coupling to the sperm head.

      We are grateful to Reviewer #2 for his or her recognition of the strength of this study.

      Weaknesses:

      The authors' observations are consistent with the model proposed but the authors' conclusions for the mechanism may require direct demonstration in sperm flagella. The Walton et al paper shows human CCDC96/113 in cilia of human respiratory epithelia. An application of such methodology to the proteins indicated by Wu et al for the sperm axoneme and head-tail coupling apparatus is eagerly awaited as a follow-up study.

      We thank Reviewer #2 for his or her kind help in improving the manuscript. We now understand that direct detection of the precise localization of CCDC113 in the sperm axoneme and head-tail coupling apparatus (HTCA) using cryo-electron microscopy (cryo-EM) could powerfully strengthen our model. Recent advances in cryo-EM have facilitated the analysis of axonemal structures and determined the structures of native axonemal DMTs from mouse, bovine, and human sperm (Leung et al., 2023; Zhou et al., 2023). However, some high-resolution structures of the sperm axoneme and HTCA regions, including those involving CCDC113, remain to be determined. Thus, we would like to discuss this point and regard it as an important follow-up study.

      References:

      Bazan, R., Schröfel, A., Joachimiak, E., Poprzeczko, M., Pigino, G., & Wloga, D. (2021). Ccdc113/Ccdc96 complex, a novel regulator of ciliary beating that connects radial spoke 3 to dynein g and the nexin link. PLoS Genet, 17(3), e1009388.

      Ghanaeian, A., Majhi, S., McCafferty, C. L., Nami, B., Black, C. S., Yang, S. K., Legal, T., Papoulas, O., Janowska, M., Valente-Paterno, M., Marcotte, E. M., Wloga, D., & Bui, K. H. (2023). Integrated modeling of the Nexin-dynein regulatory complex reveals its regulatory mechanism. Nat Commun, 14(1), 5741.

      Leung, M. R., Zeng, J., Wang, X., Roelofs, M. C., Huang, W., Zenezini Chiozzi, R., Hevler, J. F., Heck, A. J. R., Dutcher, S. K., Brown, A., Zhang, R., & Zeev-Ben-Mordehai, T. (2023). Structural specializations of the sperm tail. Cell, 186(13), 2880-2896.e2817.

      Walton, T., Gui, M., Velkova, S., Fassad, M. R., Hirst, R. A., Haarman, E., O'Callaghan, C., Bottier, M., Burgoyne, T., Mitchison, H. M., & Brown, A. (2023). Axonemal structures reveal mechanoregulatory and disease mechanisms. Nature, 618(7965), 625-633.

      Zhou, L., Liu, H., Liu, S., Yang, X., Dong, Y., Pan, Y., Xiao, Z., Zheng, B., Sun, Y., Huang, P., Zhang, X., Hu, J., Sun, R., Feng, S., Zhu, Y., Liu, M., Gui, M., & Wu, J. (2023). Structures of sperm flagellar doublet microtubules expand the genetic spectrum of male infertility. Cell, 186(13), 2897-2910.e2819.

    1. Author response:

      We thank the reviewers for their thoughtful and insightful comments. We were pleased to see that the reviewers and editors consider our work a “welcome addition” that “fills a large gap” in comparative genomics methods and provides “an unparalleled community resource of insect genome regulatory annotations.”

      Many of the reviewers’ comments reflect weaknesses in our description of the methodology. As the basic SCRMshaw methodology has been published previously, we had opted for brevity over detail in the current manuscript. We recognize now that we went too far in that direction, and we will include more methodological detail in our revised submission, along with easier access to the code we used. The reviewers also offered some helpful suggestions regarding data availability which we intend to address, including direct download of the results in GFF format and adding to the results database several species that were inadvertently omitted.

      Reviewer 2 expressed concerns about benchmarking SCRMshaw against other methods. We respectfully feel this lies outside the scope of the current study, which focuses on application of SCRMshaw to generate a multi-species annotation resource rather than on an attempt to show that SCRMshaw is superior to other approaches. We provide evidence in this manuscript, as well as in previous publications, that supports the effectiveness of SCRMshaw as an approach for regulatory element discovery that is suitable for the task at hand. Benchmarking for regulatory element discovery brings many challenges, as there are no comprehensive “truth” sets to serve as a comparison baseline. We therefore do not attempt strong claims here about the relative merits of SCRMshaw vs. other methods (although we have explored this in previous publications). Note that we also previously demonstrated commonality of transcription factor binding sites in cross-species SCRMshaw predictions, in particular in Kazemian et al. 2014 (Genome Biol. Evol. 6:2301).

      Finally, because it has important implications for understanding our results, we would like to point out a small misconception in Reviewer 2’s Summary of our study. The reviewer states that we “identify the most likely predicted enhancer candidates based on the proximity of an orthologous target gene.” We stress, however, that putative target gene assignments and identities have no impact at all on our prediction of regulatory sequences. Predictions are solely based on sequence-dependent SCRMshaw scores, with no regard to the nature or identities of nearby annotated features. Putative target genes are mapped to Drosophila orthologs purely as a convenience to aid in interpreting and prioritizing the predicted regulatory elements. We will take care to clarify this important point in our revised submission.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The study elucidates a detailed molecular mechanism of the initial stages of transport in a medically relevant GABA neurotransmitter transporter GAT1 and thus generates useful new insights for this protein family. In particular, it presents convincing evidence for the presence of a "staging binding site" that locally concentrates Na+ ions to increase transport activity, alongside solid evidence for how Na+ binding affects the larger scale dynamics.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into molecular simulations of the Na+ binding pathway and the ionic interactions at the two known sodium binding sites, site 1 and site 2. They further identify a patch of two acidic residues in TM6 that seemingly accumulate Na+ ions prior to entry into the vestibule. These results highlight the importance of studying the ion-entry pathways through computational approaches and the authors also validate some of their findings through experimental work. They observe that sodium binding at site 1 is stabilized by the presence of the substrate in the S1 site, which is particularly vital because the GABA carboxylate is involved in coordinating the Na+ ion, unlike in monoamine transporters. They also observe that binding of sodium to the Na2 site stabilizes the conformation of GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available information from experiments on SLC6 transporters particularly GAT1 and puts forth the importance of this added patch of residues in the extracellular vestibule that could be of importance to the ion permeation in SLC6 transporters. This is a nicely performed study and could be improved if the authors could comment on and fix the following queries.

      We thank our reviewer for the overall positive evaluation.

      Weaknesses:

      (1) How conserved is the residue pair D281-E283 in other SLC6 transporters? The authors commented on the presence of these residues in SERT but it would be nice to know how widespread these residues are in other SLC6 transporters like NET, GlyT, and DAT.

      We have created a sequence alignment of the entire human SLC6 family (Supplementary Figure 1) and found that E283 is polar or charged in all SLC6 transporters. D281 shows a higher level of conservation across the family compared to E283. D281 is negatively charged in approximately 50% of the SLC6 family members, an aspartate in all GABA transporters and a glutamate in all monoamine transporters.
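
      The kind of per-column conservation statement made above (for example, "negatively charged in approximately 50% of the family members") can be sketched as follows. This is a minimal illustration only: the sequences in `toy_alignment` and the column indices are hypothetical placeholders, not the actual SLC6 alignment from Supplementary Figure 1.

```python
# Residues counted as negatively charged at physiological pH.
NEGATIVE = {"D", "E"}

def fraction_negative(alignment, column):
    """Fraction of aligned sequences carrying D or E at a 0-based column."""
    residues = [seq[column] for seq in alignment.values()]
    return sum(r in NEGATIVE for r in residues) / len(residues)

# Hypothetical toy alignment (NOT real SLC6 sequences); all rows have equal length.
toy_alignment = {
    "seq_a": "ADLE",
    "seq_b": "AELQ",
    "seq_c": "ANLK",
    "seq_d": "ASLE",
}

# Column 1 holds D, E, N, S, so half of the toy sequences are negatively charged there.
print(fraction_negative(toy_alignment, 1))
```

      With a real alignment, the same function would be applied to the columns that align with D281 and E283 of GAT1.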

      (2) Further, one would like to see the effect of individual mutations D281A and E283A on transport, surface expression, and EC50 of Na+ to gauge the effect on transport.

      We have carried out experiments to investigate the effects of the individual mutations. The results revealed intermediate effects between WT and the double mutant (D281A-E283A) and showed that the effects mostly align with the degree of conservation, as neutralisation of D281 by alanine has a stronger effect than the E283A mutation. Both single mutants had minimal effects on the sodium dependence of uptake, while D281A had a stronger effect on expression, Km and Vmax than E283A. Only D281A reduced surface expression, while E283A is expressed at a level similar to wild-type GAT1.

      (3) A clear figure of the S1 site, where Na+ tends to stay prior to Na1 site interactions, needs to be provided. Further, it is not entirely clear how access to S1 is altered if the transporter is in an outward-occluded conformation, if F294 is blocking solvent access. Please comment.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility. We have also added a comment on the role of F294 as part of the outer hydrophobic gate to the discussion. In short, F294 does not occlude the passage to the S1 as long as GAT1 is outward open, and we find that GAT1 is outward open in all sodium binding simulations.

      (4) The p-value of the EC50 difference between GAT1 WT and the GAT1 double mutant needs to be mentioned. The difference in sodium dependence EC50 seems less than twofold, and it would be useful to mention how critical the role of the recruitment site is. Since the transport is not affected, the site could play a transient role in attracting ions.

      We have added p-values or standard deviation to our data.

      (5) It would be very nice to know how K+ ions are attracted by this recruitment site. This could further act as a control simulation to test the preference for Na+ ions among SLC6 members.

      We think that attraction of potassium to the recruitment site is not of relevance, as the residues are at the extracellular side and exposed to bulk, where the concentration of sodium is high (typically 130-150 mM), while the concentration of potassium is very small (3-5 mM). Exploring sodium binding by simulations for all SLC6 members could be interesting, but is clearly outside the scope of this manuscript.

      (6) Some of the important figures are not very clear. For instance, there should be a zoomed-in view of the recruitment site. The current one in Fig. 1b and 1c could be made clearer. Similarly, as mentioned earlier, the Na+ residence at the S1 site away from the Na1 and Na2 sites needs to be shown with greater clarity by adding side chain information in Fig. 6d.

      We have modified the structural images in Figure 1, 5, 6 and 7 to improve their comprehensibility.

      (7) The structural features that comprise the two principal components PC1 and PC2 should be described in greater detail.

      We have modified Figure 6 and added images that show the motions along PC1 and PC2. In addition, these are now better explained in the text.

      Reviewer #2 (Public Review):

      Summary:

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be cotransported with the substrate, GABA (and a co-transported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281A-E283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths:

      -  MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      -  A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      -  The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the very supportive assessment.

      Weaknesses:

      -  Assessing the effects of Na+ binding on the large-scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to an AF2 model that's not well packed or really a feature of the open outward conformation.

      The long range effect of sodium binding to GAT1 and destabilisation of the inner gate has, based on our data, a causal effect. PCA separates conformational motions into degrees of freedom and sorts them according to the largest motions. Motions of TM5a were among the 2 largest motions, which suggests that these are relevant motions. To directly quantify their behaviour, we measured informative distances at the inner gate of GAT1, as shown in Figure 6i,j,k and separated data according to the presence of sodium in NA2.

      For the following reasons we exclude that the results are a consequence of structural inconsistencies introduced by AlphaFold2 and therefore not reflecting functionally relevant effects:

      (1) If depending on the model instead of sodium binding, the effects should not be correlated with the presence of sodium in the NA2 binding site.

      (2)  We carried out new simulations starting from the occluded GAT1 structure (Figure 6j,k). The data show that in the occluded state the distance across the inner vestibule and the length of TM5a differ, consistent with our interpretation of the data. As sodium binding fixes GAT1 in the outward-facing state, as also occurs in other SLC6 family members (Szöllősi and Stockner, 2022), the distances of the outward-open GAT1 are at the short extreme of the scale, distances of the inward-open state of the cryo-EM structure(s) are at the other extreme, while the occluded conformation of GAT1 shows intermediate values.

      (3)  We have observed the same property in SERT, for which we used experimental structures as starting structures (Gradisch et al., 2024), suggesting that this could be a general mechanism.

      (4)  All available structures from the entire SLC6 family are consistent with structural effects of TM5a in response to bundle domain motions and therefore to binding of sodium to NA2 as it stabilized the outward-open state as well as transition to the inward facing conformation.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      Simulations can always be too short and therefore fail to describe the complete underlying conformational ensemble. We added a statement in the discussion indicating this shortcoming. With respect to the tICA analysis in our manuscript, the tICA approach does not, by design, need long simulations that capture full binding and unbinding in multiple instances to construct a correct free energy landscape. Instead, the tICA method builds on Markov chain dependencies and relies only on the convergence of transitions between hundreds of conformational microstates and the fluxes between them. The free energy profile derived for the S1, including NA1, TMP and NA2 and up to the salt bridge of the outer gate, is well converged, and we observed many transitions. In contrast, the entry from the recruitment site to the S1 most likely has too low a density of microstates and too small a number of transitions to be considered converged with respect to quantifying the free energy of binding from bulk. We now explain this shortcoming.
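
      The Markov-chain argument above can be illustrated with a minimal sketch: once transition probabilities between microstates are converged, their stationary populations give relative free energies via F_i = -kT ln(pi_i), without any single trajectory having to span the whole process. The two-state matrix `toy_T` and the value of kT (about 0.593 kcal/mol near 298 K) are assumptions made for illustration; this is not the authors' analysis pipeline.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approx. k_B*T near 298 K (assumed temperature)

def stationary_distribution(T, n_iter=10_000):
    """Power iteration on a row-stochastic transition matrix T (list of rows)."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        # One step of pi <- pi * T
        pi = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
    total = sum(pi)
    return [p / total for p in pi]

def free_energies(pi, kT=KT_KCAL_PER_MOL):
    """F_i = -kT ln(pi_i), shifted so the most populated state sits at zero."""
    f = [-kT * math.log(p) for p in pi]
    f_min = min(f)
    return [x - f_min for x in f]

# Hypothetical two-microstate transition matrix (rows sum to 1).
toy_T = [[0.9, 0.1],
         [0.2, 0.8]]

pi = stationary_distribution(toy_T)   # converges to populations 2/3 and 1/3
print(pi, free_energies(pi))          # second state lies kT*ln(2) above the first
```

      The sketch also shows the limitation the response concedes: if transitions into a region are rarely sampled, the corresponding matrix entries, and hence the free energies derived from them, are poorly determined.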

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      Authors should furnish p-values in the figure legends for experimental results.

      We have added the p-values to text and figure legends.

      Reviewer #2 (Recommendations For The Authors):

      -  Deposit simulation data in a public repository (input files, trajectories (possibly subsampled)).

      We deposited the data on Zenodo (DOI: 10.5281/zenodo.10686813). As we were unable to upload the trajectories to Zenodo, we deposited the starting and the end structures of the simulations.

      -  Please include a short discussion of the reliability of using an AF2 model instead of experimental structures. What is expected to be correct/which parts of the structure are potentially incorrect? What makes you think that the AF2 model is a good model of the OF conformation of GAT1?

      Unfortunately, an outward-facing structure of GAT1 is not available. We initially worked with an outward-open homology model of GAT1 based on SERT (built with MODELLER), but the structural differences between SERT and GAT1 are sufficiently large that these models did not behave well in simulations and too frequently could not maintain a sealed inner gate, also forming a channel. In contrast to the SERT-based GAT1 model, the AlphaFold2 model of GAT1 behaved as expected and consistent with the behaviour of SERT in simulations and with general knowledge of protein dynamics from literature. Based on structural analysis of our simulations and on the comparison to SERT, we could not identify a region of GAT1 that would potentially behave incorrectly or unexpectedly. We added a statement to the discussion on this potential limitation of the use of homology models.

      -  Fig 1a: Na+ densities are not very clear (both due to small size and the transparency). I have a hard time seeing where bulk, 2*bulk regions are --- are you showing "onion shells" of density? Perhaps investigate presenting as cuts through the full density?

      I like the labelling in terms of absolute density and multiples of bulk.

      We have created new images to improve the visualisation of data. The data are shown as onion shells (isosurfaces), with the shells at the indicated densities. This is now clearly stated. Transparency is needed, otherwise the inner onion shells, for example, would not be visible. The cut-through is intuitive, but we could not find a useful plane, as the densities are too extensively distributed in 3D and do not lie on a single plane.

      -  Fig 1h-k: would be clearer if "recruitment site" (TMP?) was indicated in the figure.

      We have created a new image for the recruiting site (Figure 1b,c) and temporary site (Figure 1g) and indicated these two sites as appropriate.

      -  Show time series of Na+ binding with a suitable order parameter (z or distances to NA1 and NA2?) to show how ions bind spontaneously. Mark the different sites. Mark pre- and post-binding parts of trajectories.

      We have added time series for every simulation that shows sodium binding to the NA1 or NA2 to the supplementary information Figure 2a,b,c. These quantify the distances to the recruiting site, the temporary site and the respective sodium binding site.

      -  PCA - how much of the total variance was captured by PC1 and PC2?

      The variance captured by the PCs is shown as eigenvalues in supplementary information Figure 4. PC1 captures about 19% of the variance, PC2 8%.
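
      For readers less familiar with PCA, the fraction of variance captured by each PC is its eigenvalue divided by the sum of all eigenvalues of the covariance matrix. The sketch below shows that arithmetic; the eigenvalues in `toy_eigenvalues` are hypothetical and are not the values plotted in supplementary information Figure 4.

```python
def explained_variance_fractions(eigenvalues):
    """Fraction of total variance captured by each principal component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Hypothetical covariance-matrix eigenvalues, sorted from largest to smallest.
toy_eigenvalues = [2.0, 1.0, 1.0]

# The first component captures half of the total variance in this toy case.
print(explained_variance_fractions(toy_eigenvalues))
```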

      -  "We found that the inner hydrophobic gate is dynamic in the absence of Na2" -- is this instability due to the AF2 model or likely realistic? E.g. was similar behaviour ever observed in simulations of the occluded state?

      In simulations of the occluded state we do not see such instabilities as observed in the outward-open state in the absence of sodium (Figure 6). As these larger scale fluctuations are not randomly distributed across all simulations starting from the AlphaFold2 models, but confined to the systems without sodium, it is unlikely an effect of the AlphaFold2 model.

      Please note, we have seen comparable behaviour in simulations of SERT starting from experimental structures (Gradisch et al., 2024), therefore suggesting a more general mechanism.

      -  Cooperativity between GABA-binding and Na+ binding to NA1: How would this lead to an experimentally measurable signature, i.e., which experiments could validate this interesting prediction?

      Direct detection of cooperativity is difficult to separate from other effects in experiments: sodium binding and transport involve both NA1 and NA2, NA2 has a higher affinity according to our data, and mutations will not only affect cooperativity but will also have other effects.

      Conformational changes can also complicate experimental detection, as NA2 stabilises the outward-open conformation, while NA1+GABA binding triggers the transition to the inward-open state. To quantify cooperativity, it would be important to isolate the cooperative effect from all other effects, which is a challenge. Support for cooperativity has been found by (Zhou, Zomot and Kanner, 2006; Meinild and Forster, 2012) using this route. In the first paper the authors make use of lithium, which only binds to NA2, even though lithium is not merely an NA2-selective ligand that is otherwise identical to sodium. By comparing two GABA concentrations, the authors showed that the sodium dependence of GABA transport is left-shifted at higher GABA concentrations, which is not the case in the absence of lithium. These data are indirect, but consistent with cooperativity between GABA and NA1-bound sodium, as GABA transport mainly reflects binding of sodium to NA1. Similar approaches could be further explored, for example by varying the GABA concentration instead of sodium. Another option could be to create an outward-facing and conformationally locked GAT1 and to measure the cooperativity of sodium and GABA binding using, for example, the scintillation proximity assay. Most likely the assay would also need to be independent of NA2 binding. We are not aware of such a GABA transporter system.

      -  There are some instances of [SI Figure] or [citation needed] that should be cleaned up.

      We have corrected these instances.

      References

      Gradisch, R. et al. (2024) ‘Ligand coupling mechanism of the human serotonin transporter differentiates substrates from inhibitors’, Nature Communications, 15(1), p. 417. Available at: https://doi.org/10.1038/s41467-023-44637-6.

      Meinild, A.-K. and Forster, I.C. (2012) ‘Using lithium to probe sequential cation interactions with GAT1’, American Journal of Physiology. Cell Physiology, 302(11), pp. C1661-1675. Available at: https://doi.org/10.1152/ajpcell.00446.2011.

      Szöllősi, D. and Stockner, T. (2022) ‘Sodium Binding Stabilizes the Outward-Open State of SERT by Limiting Bundle Domain Motions’, Cells, 11(2), p. 255. Available at: https://doi.org/10.3390/cells11020255.

      Zhou, Y., Zomot, E. and Kanner, B.I. (2006) ‘Identification of a lithium interaction site in the gamma-aminobutyric acid (GABA) transporter GAT-1’, The Journal of Biological Chemistry, 281(31), pp. 22092–22099. Available at: https://doi.org/10.1074/jbc.M602319200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a potentially important study that integrates QM/MM free energy simulations and experimental kinetic analyses to probe the nature of phosphoryl transfer transition state in adenylate kinase. The idea that the transition state ensemble encompasses conformations with substantially different structural features (including the breaking/forming bonds) is interesting and potentially applicable to many other enzyme systems. In the current form, however, the study is considered incomplete since the connection between the putative transition state ensemble from the computations and key experimental observables, such as the activation entropy, is not well established.

      Thank you so much for your great professional work as the senior editor. We thank you and the reviewers for carefully reading our manuscript and for very valuable suggestions. In response, we have performed the recommended additional calculations and modified the manuscript as suggested, in order to improve the connection between the transition state ensemble obtained from simulations and experimental observables. Importantly, the new simulations fully corroborate our original findings, and, thanks to your work, the revised manuscript is stronger and better.

      Below are our point-to-point responses:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study investigated the phosphoryl transfer mechanism of the enzyme adenylate kinase, using SCC-DFTB quantum mechanical/molecular mechanical (QM/MM) simulations, along with kinetic studies exploring the temperature and pH dependence of the enzyme's activity, as well as the effects of various active site mutants. Based on a broad free energy landscape near the transition state, the authors proposed the existence of wide transition states (TS), characterized by the transferring phosphoryl group adopting a meta-phosphate-like geometry with asymmetric bond distances to the nucleophilic and leaving oxygens. In support of this finding, kinetic experiments were conducted with Ca2+ ions (instead of Mg2+) at different temperatures, which revealed a negative entropy of activation. Overall, in its present form, the manuscript has more weaknesses in terms of interpretation of the simulation results than strengths, which need to be addressed by the authors.

      We thank the reviewer for carefully reviewing our manuscript and the great suggestions for the revisions. Thanks to these points raised we are able to submit a revised manuscript addressing all questions.

      There are several major concerns:

      First, the authors' claim that the catalytic mechanism of adenylate kinase (Adk) has not been previously studied by QM/MM free energy simulations is somewhat inaccurate. In fact, two different groups have previously investigated the catalytic mechanism of Adk. The first study, cited by the authors themselves, used the string method to determine the minimum free energy profile, but resulted in an unexpected intermediate; note that they obtained a minimum free energy profile, not a minimum energy profile. The second study (Ojeda-May et al., Biochemistry 2021 and Dulko-Smith et al., J Chem Inf Model 2023) overlaps substantially with the present study, but its main conclusions differ from those of the present study. Therefore, a thorough discussion comparing the results of these studies is needed.

      We thank the reviewer for pointing out two additional articles to the one we had discussed. Accordingly, we have changed the claim that the Adk mechanism was not previously studied using QM/MM, and added a discussion of the latter two citations. Notably, although the general outcome is consistent with our results, the conclusions and details of findings differ. The two additional papers agree with our findings of a concerted TS, and not the metastable intermediate as observed in the QM/MM simulation of Shibanuma et al., 2020.

      The difference of the two papers by Nam/Wolf-Watz and our manuscript pointed out by the reviewer is mainly in the interpretation. Importantly, the authors do not primarily focus on the nature of the Transition State for the P-transfer reaction, but on the connection between the chemical and conformational steps. We have extensively reported on the fact that the conformational changes of lid opening and closing are obviously unrelated to the chemical step, see also our free energy landscape in Fig. 1a. Consequently, there cannot be a coupling. We note that our group had extensively studied the lid opening step both experimentally and computationally before. In contrast, we discover here a fundamental concept for rate enhancement by an optimal enzyme: the reduction in the activation entropy by a wide TSE. New experiments were triggered by this finding, that then delivered experimental validation of this concept.

      In the revised version of the manuscript, and according to the reviewer’s suggestion we expanded our discussion to these two additional papers.

      Second, the interpretation of the TS ensemble needs deeper scrutiny. In general, the TS is defined as the hypersurface separating the reactant and product states. Consequently, if a correct reaction coordinate is defined, trajectories initiated at the TS should have equal probabilities of reaching either the reactant or product state; if an approximate reaction coordinate, such as the distance difference used in this study, is used, recrossing may be introduced as a correction into the probabilities. Thus, in order to establish the presence of a wide TS region, it is necessary to characterize the TS ensemble through a commitment analysis across the TS region.

      We thank the reviewer for suggesting to add a commitment analysis to our calculations. The newly performed commitment analysis is shown in Fig. 4b. The corresponding analysis further strengthens our original findings of the wide TS in the fully active enzyme.
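
      The idea behind a commitment (committor) analysis can be sketched with a toy model: many short trajectories are launched from a chosen starting point, and the fraction that reaches the product side before the reactant side is recorded; a value near 0.5 marks the transition state. The code below uses an unbiased 1D random walk between two absorbing boundaries as a stand-in for a flat free energy region. It is an assumption-laden illustration (the function name, boundaries, and shot count are hypothetical), not the QM/MM protocol behind Fig. 4b.

```python
import random

def committor_estimate(x0, left=-5, right=5, n_shots=2000, seed=0):
    """Estimate the probability of reaching `right` before `left` from x0.

    Uses an unbiased unit-step random walk, i.e. free diffusion on a flat
    landscape, for which the exact answer is (x0 - left) / (right - left).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    hits_product = 0
    for _ in range(n_shots):
        x = x0
        while left < x < right:
            x += rng.choice((-1, 1))
        if x >= right:
            hits_product += 1
    return hits_product / n_shots

# From the midpoint the estimate should be close to 0.5; closer to the
# product boundary it should rise towards 1.
print(committor_estimate(0), committor_estimate(3))
```

      On a flat landscape like this one the committor changes gradually with the coordinate, whereas on a sharply peaked barrier it switches from near 0 to near 1 over a narrow range; that contrast is what a commitment analysis across the TS region is meant to probe.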

      The relatively flat free energy surface observed near TS in Figures 1c and 2a, may be attributed to the cleavage and formation of P-O bonds relative to the marginally stable phosphorane intermediate, as described in Zhou et al.'s work (Chem Rev 1998, 98:991). This scenario is clearly different from a wide TS ensemble concept. In addition, given the inherent similarity in reactivity of the two oxygens towards the phosphoryl atom, it is reasonable to expect a single TS as shown in Figure 1 - supplement 9, rather than two TSs with a marginally stable intermediate as shown in Figure 1c. Consequently, it remains uncertain whether the elongated P-O bonds observed near the TS and their asymmetry are realistic or potentially an artifact of the pulling/non-equilibrium MD simulations. Further validation in this regard is required.

      The reviewer raises the key issue of how realistic the observation of the wide TSE is, and the possibility of it being a potential artifact of the simulation strategy, and suggests that further validation is required in this regard. According to his/her suggestion, in the revised version we have further validated this key observation by two additional simulations. First, we performed a commitment analysis (see above), and second, we also performed umbrella sampling, see Fig. 4a. We consistently observe one wide TSE in the presence of Mg2+, but not in the absence of Mg2+. The fact that this wide TSE is observed with the three strategies (i.e., pulling/non-equilibrium MD, commitment analysis, and umbrella sampling) most likely rules out the possibility of an artifact related to the simulation strategy.

      Third, there are several inconsistencies in the free energy results and their discussion. First, the data from Kerns et al. (Kerns, NSMB, 2015, 22:124) indicate that the ATP/AMP -> ADP/ADP reaction proceeds at a faster rate than the ADP/ADP -> ATP/AMP reaction, suggesting that the ADP/ADP state has a lower free energy (approximately -1.0 kcal/mol) compared to the ATP/AMP state. This contrasts with Figure 1c, which shows a higher free energy of 6.0 kcal/mol for the ADP/ADP state. This discrepancy needs to be discussed.

      The reviewer correctly cites our experimental result of about -1 kcal/mol for the ADP/ADP state relative to ATP/AMP with Mg. Importantly, that was measured at a pH of 7. With a pKa of about 7.2 for ADP, more than 50% is in the monoprotonated state under these experimental conditions. As we found in our QM/MM simulations, for the monoprotonated state ADP/ADP is much more stable than ATP/AMP (see Figure 1 – supplement 4, about 8 kcal/mol). In contrast, as shown in Fig. 1c and highlighted by the reviewer, for the nonprotonated state the equilibrium is flipped. Consequently, our QM/MM simulations roughly recapitulate the ensemble equilibrium of substrates/products measured at pH 7.

      We should have described these facts better in the manuscript, and we thank the reviewer for noting this point, as it prompted us to better explain this agreement between experiments and computation for the on-enzyme equilibrium between the substrate and product states (see page 11 in the revised manuscript).
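
      The statement that more than 50% of ADP is monoprotonated at pH 7 follows from the Henderson-Hasselbalch relation, as the short sketch below shows. It assumes a single titratable group with the pKa of about 7.2 quoted above and ignores any shift of the pKa in the enzyme active site.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a single titratable group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))

# pH 7 and pKa 7.2 as quoted in the response: roughly 61% monoprotonated.
print(round(fraction_protonated(7.0, 7.2), 2))
```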

      Furthermore, the barrier for ATP/AMP -> ADP/ADP, calculated to be 20 kcal/mol for the fully charged state, exceeds the corresponding barrier for the monoprotonated state. This cautions against the conclusion that the fully charged state is the reactive state. In addition, the difference in the barrier for the no-Mg2+ system compared to the barriers with Mg2+ is substantially too large (21 kcal/mol from the calculation versus 7 kcal/mol from the experimental values). These inconsistencies raise questions as to their origins, whether they result from the use of the pulling/non-equilibrium MD simulation approach, which may yield unrealistic TS geometries, or from potential issues related to the convergence of the determined free energy values. To address this issue, a comparison of results obtained by umbrella sampling and similar methodologies is necessary.

      We agree that these points need to be clarified. For the resubmission, we performed an umbrella sampling for the fully charged nucleotide with Mg2+, and for the noMg2+ systems, and added these new figures to the manuscript (new Fig. 4). We agree with the reviewer that the obtained free energy profiles from the umbrella sampling are more reliable; the original simulations for the monoprotonated state have larger errors, see Fig. 1, supplement 4. Importantly, we experimentally measured the pH dependence of the reaction in the direction ADP/ADP to AMP and ATP, and hence compare the corresponding barriers in this direction.

      With respect to the comparison of the simulated (9.5 kcal/mol) and the experimental barriers with and without Mg, the experimental barrier difference is 7 kcal/mol for Ca2+ versus no metal, but larger for Mg2+ versus no metal, for which the simulations were performed. The P-transfer with Mg2+ is faster than 500 s-1, meaning the experimental barrier difference for no Mg versus magnesium is ≥ 11 kcal/mol, which is in quite good agreement with our umbrella sampling barrier differences (Fig. 4a). In response to this reviewer’s question, we added these points to the revised manuscript.
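
      The conversion between rate ratios and barrier differences used in this argument can be sketched with transition state theory: if two reactions share the same prefactor, their barriers differ by RT ln(k_fast/k_slow). The temperature of 298.15 K and the equal-prefactor assumption are ours, made only for illustration; the response does not state them.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)

def barrier_difference(rate_ratio, temperature_k=298.15):
    """Barrier difference (kcal/mol) implied by a ratio of rate constants,
    assuming identical prefactors (transition state theory)."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(rate_ratio)

# Each factor of 10 in rate corresponds to about 1.36 kcal/mol at 298.15 K.
print(round(barrier_difference(10.0), 2))
```

      Under these assumptions, barrier differences of 7 and 11 kcal/mol correspond to rate ratios of roughly five and eight orders of magnitude, respectively.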

      Reviewer #2 (Public Review):

      Summary:

      The authors report the results of QM/MM simulations and kinetic measurements for the phosphoryl-transfer step in adenylate kinase. The main assertion of the paper is that a wide transition state ensemble is a key concept in enzyme catalysis as a strategy to circumvent entropic barriers. This assertion is based on the observation of a "structurally wide" set of energetically equivalent configurations that lie along the reaction coordinate in QM/MM simulations, together with kinetic measurements that suggest a decrease in the entropy of activation.

      We thank the reviewer for the endorsement and the very useful suggestions to improve the manuscript. In response to the questions, we have edited our manuscript accordingly. All suggested additional simulations and analyses further support our original findings.

      Strengths:

      The study combines theoretical calculations and supporting experiments.

      Weaknesses:

      The role(s) of entropy in enzyme catalysis has been discussed extensively in the literature, from the Circe effect proposed by Jencks and many other works. The current paper hypothesizes a "wide" transition state ensemble as a catalytic strategy and key concept in enzyme catalysis. Overall, it is not clear the degree to which this hypothesis is supported by the data. The reasons are as follows:

      (1) Enzyme catalysis reflects a rate enhancement with respect to a baseline reaction in solution. In order to assert that something is part of a catalytic strategy of an enzyme, it would be necessary to demonstrate from simulations that the activation entropy for the baseline reaction is indeed greater and the transition state ensemble less "wide". Alternatively stated, when indicating there is a "wide transition state ensemble" for the enzyme system - one needs to indicate that is with respect to the non-enzymatic reaction. However, these simulations were not performed and the comparisons were not demonstrated.

      We agree with the reviewer that the ideal comparison to address enzyme catalytic power is with the baseline reaction in solution. However, as is the case for many biologically relevant reactions, the reaction in solution is too slow (i.e., has too high a barrier) and thus cannot be measured (this reaction would take about 7000 years without the enzyme). Moreover, in many cases, the reaction mechanism in solution is too different from that observed in the enzyme.

      To overcome this problem, another reference reaction is used instead of that in solution, such as a mutant enzyme, or the enzyme lacking a key cofactor, hence a non-optimized enzyme. In the present case, this baseline reaction corresponds to the enzyme reaction in the absence of the Mg ion. Consistently, our results clearly show that the reaction without Mg, which displays a larger barrier, has a narrower TS. We also want to highlight the extensive and excellent literature on QM/MM calculations of ATP hydrolysis in solution, which shows narrow transition state ensembles. To mention just a few: Klähn, M., Rosta, E., & Warshel, A. (2006). On the mechanism of hydrolysis of phosphate monoesters dianions in solutions and proteins. Journal of the American Chemical Society, 128(47), 15310–15323. https://doi.org/10.1021/ja065470t; Wang, C., Huang, W., & Liao, J. lou. (2015). QM/MM investigation of ATP hydrolysis in aqueous solution. Journal of Physical Chemistry B, 119(9), 3720–3726. https://doi.org/10.1021/jp512960e.

      (2) The observation of a "wide conformational ensemble" is not a quantitative measure of entropy. In order to make a meaningful computational prediction of the entropic contribution to the activation of free energy, one would need to perform free energy simulations over a range of temperatures (for the enzymatic and non-enzymatic systems). Such simulations were not performed, and the entropy of activation was thus not quantified by the computational predictions.

      In the present work we do not intend to quantify entropy from the simulations, since such calculations are known to carry errors that are too large. However, even if not strictly quantified, a wider TS ensemble is a proxy for a larger entropy.

      (3) The authors indicate that lid-opening, essential for product release, and not P-transfer is the rate-limiting step in the catalytic cycle and Mg2+ accelerates both steps. How is it certain that the kinetic measurements are reporting on the chemical steps of the reaction, and not other factors such as metal ion binding or conformational changes?

      These were indeed the critical questions we needed to answer early on in studying how adenylate kinase accelerates the reaction by more than 14 orders of magnitude. This was done by combining pre-steady-state and steady-state experiments with NMR dynamics, as published in (Kerns et al., 2015) and described at the beginning of this manuscript in Fig. 1a. We agree with the reviewer that for many other enzymes such an experimental examination of all microscopic steps of the enzymatic cycle has not been performed, which creates a risk of misinterpreting the observed kinetic rates.
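
      For a sense of scale, the rate acceleration quoted above can be converted into a difference in activation free energy. The sketch below assumes simple transition state theory with identical prefactors for the catalyzed and uncatalyzed reactions and a temperature of 298.15 K; both are assumptions made for illustration and are not taken from the manuscript, and the function name is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def barrier_reduction(rate_enhancement, temperature_k=298.15):
    """Barrier lowering (kcal/mol) implied by a rate ratio.

    Assumes transition state theory with equal prefactors, so that
    k_cat / k_uncat = exp(ddG / (R * T)).
    """
    return R_KCAL * temperature_k * math.log(rate_enhancement)


# 14 orders of magnitude corresponds to roughly 19 kcal/mol at 298.15 K
print(round(barrier_reduction(1e14), 1))
```

      Under these assumptions, each order of magnitude in rate is worth about 1.4 kcal/mol at room temperature.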

      (4) The authors explore different starting states for the chemical steps of the reaction (e.g., different metal ion binding and protonation states), and conclude that the most reactive enzyme configuration is the one with the more favorable reaction-free energy barrier. However, it is not clear what is the probability of observing the system in these different states as a function of pH and metal ion concentration without performing appropriate pKa and metal ion binding calculations. This was not done, and hence these results seem somewhat inconclusive.

      As noted by the reviewer, in the present work our aim was to compare the chemical step of the reaction in different metal ion and protonation states. Our computational results show that the most reactive enzyme configuration is the nonprotonated state with Mg2+ in our forward reaction.

      We actually know what the probabilities of the metal-bound states are for this enzyme. The experimental data were described in (Kerns et al., 2015): we directly determined the concentration needed to fully occupy the Mg site with Mg or Ca, so no metal binding calculations are needed, as the experiments are a direct measurement. From our x-ray structures we know the exact binding site and also see full occupancy. The same holds for the pH dependence of the chemical step, measured in this manuscript and shown in Fig. 5b. We note that the excellent agreement between our simulations and the experiments is one of the key features of the current manuscript. As stated in the manuscript, we analyzed the pH dependence of the P-transfer step and showed that the rate increases with higher pH in the presence of Ca2+, while without a metal the opposite trend is observed. These results further support the QM/MM results showing that the fully charged nucleotide state was the most reactive in the presence of the metal, whereas in the absence of the cation only the monoprotonated nucleotides (low pH) were reactive.

      Reviewer #3 (Public Review):

      Summary:

      By conducting QM/MM free energy simulations, the authors aimed to characterize the mechanism and transition state for the phosphoryl transfer in adenylate kinase. The qualitative reliability of the QM/MM results has been supported by several interesting experimental kinetic studies. However, the interpretation of the QM/MM results is not well supported by the current calculations.

      Strengths:

      The QM/MM free energy simulations have been carefully conducted. The accuracy of the semiempirical QM/MM results was further supported by DFT/MM calculations, as well as qualitatively by several experimental studies.

      We thank the reviewer for the positive comments on the manuscript, particularly highlighting the support of the QM/MM results by additional DFT/MM calculations and several experiments.

      Weaknesses:

      (1) One key issue is the definition of the transition state ensemble. The authors appear to define this by simply considering structures that lie within a given free energy range from the barrier. However, this is not the rigorous definition of transition state ensemble, which should be defined in terms of committor distribution. This is not simply an issue of semantics, since only a rigorous definition allows a fair comparison between different cases - such as the transition state in an enzyme vs in solution, or with and without the metal ion. For a chemical reaction in a complex environment, it is also possible that many other variables (in addition to the breaking and forming P-O bonds) should be considered when one measures the diversity in the conformational ensemble.

      We thank the reviewer for noting this issue and for this great suggestion, which led to a strengthening of the key findings in the revised manuscript. Following the reviewer's suggestion, we performed a committor analysis to properly define the TSE and compared the results for the enzyme in the presence and absence of Mg2+ (see new Fig. 4b). The results further strengthen our previous finding and interpretation of a wider TSE for the reaction with Mg relative to without Mg.
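
      The idea behind a committor-based definition of the TSE can be illustrated on a toy problem. The sketch below is not the QM/MM protocol used in the manuscript: it assumes a one-dimensional double-well potential, overdamped Langevin dynamics with unit friction, and arbitrary basin boundaries at x = -0.8 and x = +0.8, and all names and parameter values are hypothetical. For a given starting configuration, many short trajectories are launched and the committor is estimated as the fraction that reaches the product basin before the reactant basin.

```python
import math
import random


def force(x):
    # Toy double-well potential V(x) = (x^2 - 1)^2 with minima at
    # x = -1 (reactant) and x = +1 (product); force = -dV/dx.
    return -4.0 * x * (x * x - 1.0)


def committor(x0, n_shots=200, dt=1e-3, kT=0.3, max_steps=20000, rng=None):
    """Estimate the probability of reaching the product basin first.

    Trajectories start at x0 and follow overdamped Langevin dynamics
    (Euler-Maruyama, unit friction) until they cross x >= 0.8 (product)
    or x <= -0.8 (reactant). Unfinished trajectories are discarded.
    """
    rng = rng or random.Random(0)
    sigma = math.sqrt(2.0 * kT * dt)
    reached_product = 0
    finished = 0
    for _ in range(n_shots):
        x = x0
        for _ in range(max_steps):
            x += force(x) * dt + sigma * rng.gauss(0.0, 1.0)
            if x >= 0.8:
                reached_product += 1
                finished += 1
                break
            if x <= -0.8:
                finished += 1
                break
    return reached_product / finished if finished else float("nan")


for start in (-0.5, 0.0, 0.5):
    print(start, committor(start))
```

      In this picture, configurations whose estimated committor is close to 0.5 would be assigned to the transition state ensemble, and the spread of those configurations along the coordinates of interest gives a measure of its width.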

      (2) While the experimental observation that the activation entropy differs significantly with and without the Ca2+ ion is interesting, it is difficult to connect this result with the "wide" transition state ensemble observed in the QM/MM simulations so far. Even without considering the definition of the transition state ensemble mentioned above, it is unlikely that a broader range of P-O distances would explain the substantial difference in the activation entropy measured in the experiment. Since the difference is sufficiently large, it should be possible to compute the value by repeating the free energy simulations at different temperatures, which would lead to a much more direct evaluation of the QM/MM model/result and the interpretation.

      In the present work we do not intend to quantify entropy from the simulations, since such calculations are known to carry errors that are too large. However, even if not strictly quantified, a wider TS ensemble is a proxy for a larger entropy. We believe that the additional committor calculations and the umbrella sampling (new Fig. 4a) strongly support our original findings, and are better suited to this purpose than repeating the free energy simulations at different temperatures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments:

      Make sure consistent units are used, either kJ/mol or kcal/mol.

      Thanks, we made the changes.

      In the case of the mono-protonated simulation, where does the proton transfer between AD(T)P and AMP occur in both the forward and reverse reactions? It is worthwhile to note that the proton transfer may take place at different reaction coordinate values (between the two reactions), as it is not explicitly defined in the reaction coordinate. In this context, it is also necessary to discuss how to combine the results to generate a single free energy profile.

      We agree with the reviewer on this point. Accordingly, we have analyzed for the monoprotonated reaction when (or where in terms of RC) the proton transfer occurs in both forward and reverse reactions. The proton transfer occurs at -0.7 of the reaction coordinate (average value, figures 3-supplement 5 e and f).

      The methods section needs improvements:

      (1) Computational setup of the system: Were the systems neutralized? If so, what types of ions were used, and how many of them were included? If systems were not neutralized, discuss a potential artifact in the results. In addition, if the system for the reverse reaction (and no-Mg2+ systems) was prepared separately, provide details regarding their preparation.

      We thank the reviewer for noting this issue. Accordingly, we have provided the requested additional details of the computational setup in the revised version.

      (2) Simulation parameters: Clarify how non-bonded interactions were treated in both MM and QM/MM simulations. For the QM/MM simulation, specify the time step used, whether the Shake was applied; whether the NPT simulations were performed, and any other relevant parameters.

      We thank the reviewer for noting this issue. Accordingly, we have provided the requested additional details of the simulation parameters.

      (3) Free energy determination strategy: Describe how the two profiles (forward and reverse profiles) were combined and provide a theoretical justification for this approach. Additionally, include a comment on whether Jarzynski's inequality equation is directly applicable to the NPT simulation.

      As requested by the reviewer, in the revised version of the manuscript we have described how the two profiles were combined and provided a theoretical justification for this approach.
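
      As background for the relation the reviewer mentions, Jarzynski's equality connects the free energy difference to an exponential average over nonequilibrium work values, ΔF = -kT ln⟨exp(-W/kT)⟩. The sketch below shows only this generic estimator, with a log-sum-exp shift for numerical stability; it is not the procedure used in the manuscript to combine the forward and reverse profiles, and the function name and the work values in the usage lines are purely illustrative.

```python
import math


def jarzynski_free_energy(work_values, kT):
    """Exponential work average: dF = -kT * ln(mean(exp(-W / kT))).

    work_values and kT must share the same energy units.
    """
    scaled = [-w / kT for w in work_values]
    shift = max(scaled)  # log-sum-exp shift avoids overflow/underflow
    mean_exp = sum(math.exp(s - shift) for s in scaled) / len(scaled)
    return -kT * (shift + math.log(mean_exp))


# Illustrative (made-up) work values in kcal/mol, kT ~ 0.6 kcal/mol
works = [4.0, 5.0, 6.0]
estimate = jarzynski_free_energy(works, kT=0.6)
print(estimate, sum(works) / len(works))
```

      The estimate never exceeds the mean work, consistent with the second law, and it is dominated by the rare low-work trajectories, which is why convergence with few trajectories is often poor.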

      Reviewer #3 (Recommendations For The Authors):

      See recommendations in the Public Review regarding the analysis of transition state ensemble and activation entropy.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Response to reviewer #1:

      We thank the reviewer for the further recommendations for improving our presentation. We would like to carefully address the remaining concerns of the reviewer.

      (1) I realize now that I didn't make my point clear enough, which was that as far as I know there is no reason to believe that an oscillatory state cannot be induced with synaptic depression as with spike frequency adaptation when used in the context of the author's model. I'm fine with how the authors have distinguished their model from R&T 2015, but I think the more interesting question is whether there is any reason to believe that STD is not equally capable of doing all the things mentioned in this paper as SFA, and if not why not. I would like the authors to go out on a limb and address this, if only with a few sentences in the discussion. 

      Thank you for pointing this out again. In response to your query regarding the comparison between STD and SFA in generating bump sweeps, we have done simulations based on STD. The results showed that both STD and SFA are capable of inducing bi-directional sweeps. However, (based on our simulations) only SFA can produce uni-directional sweeps. The absence of uni-directional sweeps based on STD may be due to the subtle yet important differences between the two mechanisms. Specifically, STD modulates the neural activity by weakening the recurrent connections, which theoretically can only inhibit recurrent inputs, while SFA can attenuate all forms of excitatory inputs, including external inputs. However, since we did not exhaustively explore the entire parameter space, we cannot conclude that STD is incapable of producing uni-directional sweeps. Future simulations are required.

      Following the Reviewer’s suggestion, we added a few sentences discussing the distinctions between STD and SFA in generating theta sweeps in the CANN, in lines 432 to 440 of the Discussion section:

      “Based on our simulation, both STD and SFA show the ability to produce bi-directional sweeps within a CANN model, with the SFA uniquely enabling uni-directional sweeps in the absence of external theta inputs. This difference might be due to the lack of exhaustively exploration of the entire parameter space. However, it might also attribute to the subtle yet important theoretical distinctions between STD and SFA. Specifically, STD attenuates the neural activity through a reduction in recurrent connection strength, whereas SFA provides inhibitory input directly to the neurons, potentially impacting all excitatory inputs. These differences might explain the diverse dynamical behaviors observed in our simulations. Future experiments could clarify these distinctions by monitoring changes in synaptic strength and inhibitory channel activation during theta sweeps.”

      (2) I appreciate the inclusion of the experimental data in Fig 6a (though I don't find the left-most panel very useful). I also understand what the authors are trying to convey with plots in 6c and 6c. However, I don't find the text that was added above very helpful at all. I was hoping for a simpler demonstration of the effect, by plotting a series of sequential sweeps (cell index vs time, with color indicating firing rate, as in Fig 2d) in the case of both the slow speed and fast speed regimes. Here, vertical lines could mark the individual theta cycles and the firing of individual cells, showing the constancy of the former but change of the latter. 

      Thank you for your constructive feedback. It seems there might be a misunderstanding in our previous explanation, for which we apologize. The phenomenon we want to elucidate is not an increase in the theta frequency as detected in LFPs, but rather the slope of phase precession with respect to the animal's movement speed. Due to phase precession, the oscillation frequency of place cells as the animal traverses the field is higher than the theta frequency. A plot like Fig. 2d will not make this point clearer, since it shows the baseline theta frequency (i.e., theta sweeps, as we stated previously). A straightforward way of thinking about this point is the one we added previously: “…The faster the animal runs, the faster the extra half cycle can be accomplished. Consequently, the firing frequency will increase more (a steeper slope in Fig. 6c red dots) than the baseline frequency”. We hope this clarification addresses the concerns raised.

      (3) This is still confusing to me. I just don't understand how the *phase* of the oscillating activity bump has anything to do with the movement of the animal. I would like to see a plot of the sweeps (again, cell index vs time, with color indicating the firing rate) before and after inactivation for short and long duration inactivation. Perhaps I am not understanding or appreciating how the bump recovers after inactivation and how this is related to the motion of the animal. 

      Thank you for pointing this out again. After the inactivation is removed, the activity bump naturally re-emerges at the input location (which has moved forward in the meantime) and then starts to sweep again as it did before the inactivation. Single-cell phase precession and population-level theta sweeps are two sides of the same coin (provided all cells start at roughly the same phase in each theta cycle). If the reviewer accepts this, then at the new location the activity bump sweeps again (around the new location), and phase precession therefore resumes at a phase further along, since phase codes for position as the animal traverses the place field.

      (4) I am glad the authors are spending more time discussing this phenomenon, but I am unsure of their explanation: for a sweep moving at constant speed, neurons all along the path will be equally affected (inhibited), so where does the bias for suppressing the "end" neurons come from? 

      While it may appear that neurons along the path are equally inhibited as the bump sweeps over them, our model incorporates external inputs with Gaussian profiles. These inputs bias neurons closer to the input location, resulting in fewer activations in neurons further away from the input position.

      (5) Here I was hoping that the authors might comment on what they suspect happens when the animal starts (or stops) moving, and how the network shifts from tracking regime to oscillatory regime (or vice versa), as is typically seen in experimental data (see for example, Kay et al., 2020, fig 4b,c). My apologies for not making this point clearer. 

      Thank you for pointing this out. In our model, we observed that when the animal stops, the network continues to generate theta oscillations near the input location, albeit with reduced amplitude (so the network dynamics looks like in the tracking regime). However, we hypothesize that when the animal pauses its movement for enough time (immobile but awake states), sensory input into the hippocampus also decreases, which is similar to removing external inputs in our model. In this case, the activity bump spontaneously moves away, resembling the phenomenon of replay (see also Romani & Tsodyks 2015).

      Regarding the experimental data (Kay et al.), it indeed appears that theta sweeps decoded from neural activity become less pronounced when the mouse moves at slower speeds. This observation could potentially correspond to a decrease in the amplitude of bump oscillations when external inputs associated with movement are halted but not entirely removed in our model. However, in experiments, when the mouse's movement slows down, hippocampal activity no longer oscillates at theta frequency, making it challenging to decode theta sweeps.

      We appreciate your clarification on this point and recognize the importance of further investigating how our model can accurately replicate the transition between tracking and oscillatory regimes observed in experimental data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Weaknesses:

      The readability could be improved.

      We have gone through the paper again and tried to revise the text to improve readability.

      Reviewer #1 (Recommendations For The Authors):

      (1) Thank you for adding the discrimination ratio. However, as Fig 2 and 3 depict the same experimental data, consider harmonizing the presentation (symbols and colors) and consolidating the Figs for clarity.

      This is an excellent point but it is actually very hard to harmonize symbols and colors because the data are divided in different ways. Upon considering this further, we actually don’t want to make the symbols and colors the same because it would be misleading. For example, WT and Tg training and testing session data are divided into grey and white throughout Figure 2, but in Figure 3, training and testing session data are pooled. To color code them grey and white in Figure 3 might make it seem that in Figure 3 training and testing were separated.

      (2) Fig 5 is missing

      We are not sure why Figure 5 was absent since it was present in our copy of the submitted pdf. We have double checked and in the revised manuscript we are sure Figure 5 is included.  

      (3) Fig 6 add raw data for WT

      We have added raw WT data. Revised figure 6 includes the raw data in part A4.

      (4) Fig 7 add raw data for WT

      We have added raw WT data. Revised Figure 7 includes the raw data in part A4.

  2. www.researchsquare.com
    1. Author response:

      We thank the editor and reviewers for the time invested in our manuscript and their valuable and insightful critiques. However, we believe that the results justified our conclusions in the manuscript well; therefore, we have decided not to revise it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this useful study, Wang and colleagues investigate the potential probiotic effects of Bacillus velezensis to prevent colitis in a mouse model. They provide solid evidence that B. velezensis limits the growth of Salmonella typhimurium in lab culture and in mice, together with beneficial effects on the microbiota. The work will be of interest to infectious disease researchers and those studying the microbiome.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      An extensive study on the probiotic properties of the Bacillus velezensis strain HBXN2020.

      Response: Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      - The main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Response: Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback.   

      - Most of the results and analysis parts are separated without a link or any story-telling to deliver a concise message.

      Response: Thank you for your comments and suggestions on our manuscript. Your comments improved the quality and depth of the manuscript. Based on your suggestions, we have revised the entire manuscript.

      The updated content is presented in the revised manuscript.

      - For the Salmonella Typhimurium-induced mouse model of colitis, it is not clear how an oral infection of C57BL/6 would lead to colitis. Streptomycin is always pretreated (https://link.springer.com/protocol/10.1007/978-1-0716-1971-1_17).

      Response: Thank you very much for reading and commenting on our manuscript. The S. Typhimurium ATCC14028 (STm) used in this study is a highly virulent strain. The findings of our preliminary trial indicated that mice infected with 10^7 CFU STm exhibited notable symptoms in the absence of streptomycin pretreatment. Hence, streptomycin was not used to pretreat mice in this study. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have the potential benefit of serving as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate for B. velezensis - HBXN2020 and describe its genome (roughly 4 mb, 46% GC-content etc).

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected the strain in mice and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with that of uninfected mice. No significant differences in tissues, including the colon were observed.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit the growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory effects) of HBXN2020, the authors set out to investigate the effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to that seen in healthy mice.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting them with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Response: Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Response: Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Response: Thanks for the comments and the positive reception of the manuscript.

      (3) The mouse experiments are very convincing.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Response: Thank you for pointing this out. We apologize for the lack of mechanistic investigation in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) The mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using bacillus spores with the current gold standard for treatment.

      Response: We greatly appreciate your valuable comments. The objective of this study is to investigate the potential of B. velezensis spores in mitigating bacterial-induced colitis. The animal experimental design followed the methods described in previous studies, with slight modifications (10.1038/s41467-019-13727-9, 10.1126/scitranslmed.abf4692). We appreciate your review and feedback. We hope that our response adequately addresses your concerns.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation. Overall, the manuscript is of potential interest to readers.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Response: Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      There are few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Response: Thanks for your suggestion. This study serves as an exploratory investigation before the application of Bacillus velezensis. The main purpose of this study is to explore the potential of Bacillus velezensis in application. We appreciate your review and feedback and hope that our response adequately addresses your concerns.    

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Abstract:

      It is quite wordy, without a clear emphasis on the major point of the study. It is obvious how the host-probiotic-microbiota behaves and why it works out well, which is the key part.

      Response: Thank you for your valuable suggestion. Your comments improved the quality of the manuscript. We have modified this in the revised manuscript as suggested.

      The updated content is presented in lines 30-32, 34-39 and 41-46 in the abstract section of the revised manuscript.

      Please remove "novel", Many previous works have already documented the probiotic Bacillus velezensis. It is also NOT novel species...

      Response: Thank you for your suggestion. We have corrected it as suggested. Please see line 26 in abstract section of the revised manuscript.

      Lines 44-46. The way this conclusion is delivered is inappropriate; it should be clarified exactly according to the supported results.

      Response: Thank you for your valuable suggestion. Your comments improved the quality of the manuscript. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in lines 44-46 in the abstract section of the revised manuscript.

      Introduction:

      Lines 71-71, Lines 75-77, Line 92 "the homeostasis of", please remove.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated content is presented in line 96 in the introduction section of the revised manuscript.

      Are the Salmonella loads the key indicator for this model?

      Response: We greatly appreciate your valuable comments. In this study, we aimed to evaluate whether B. velezensis can alleviate S. Typhimurium-induced colitis in mice. It has been reported that S. Typhimurium enters the intestine, colonizes and proliferates in the intestinal epithelium, and then breaks through the intestinal barrier and spreads throughout the body via the blood circulation, leading to systemic infection. Therefore, the load of Salmonella in the intestine and in tissues and organs is also one of the key indicators reflecting Salmonella infection. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The introduction should really focus on the knowledge gap in general and in a specific field, which is not available in the current version.

      Response: Thank you for your valuable suggestion. Your comments improved the depth of the manuscript. We have corrected it as suggested.

      The updated content is presented in lines 53-57, 61-64, 69-75, 85-88 and 97-100 in the introduction section of the revised manuscript.

      Results:

      "Genomic Characteristics" of B. velezensis HBXN2020 are separated. There are no links between this work for safety and probiotic effects.

      Response: Thank you for your suggestion. Based on your suggestion, we have revised the "genomic characteristics" part of the results section. Please see line 104-110 and Supplementary Table 2 in revised manuscript and supplemental material.

      Are the AMR and virulent genes available on the chromosome? Is there any gene cluster that codes useful stuff that is linked to probiotic efficacy in vitro and in vivo?

      Response: Thanks for your suggestion. The comments improve the quality and depth of the manuscript. In this study, the HBXN2020 genome contains fragments of AMR and virulence genes. However, the results of the antibiotic sensitivity test and safety test showed that HBXN2020 did not exhibit resistance or toxicity. Furthermore, the HBXN2020 genome contains 13 different secondary metabolite biosynthesis gene clusters, such as surfactin (genomic position: 323,509), macrolactin H (genomic position: 1,384,185), bacillaene (genomic position: 1,691,549), fengycin (genomic position: 1,865,856), difficidin (genomic position: 2,270,091), bacillibactin (genomic position: 3,000,977) and bacilysin (genomic position: 3,589,078) (Table S2). These secondary metabolites have been shown to inhibit, to varying degrees, fungi (10.3390/foods11020140), Gram-positive pathogens (10.1371/journal.pone.0251514) and Gram-negative pathogens (10.1007/s00253-017-8095-x). We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 108-110 in results section of the revised manuscript and supplementary Table 2 in the revised supplemental material.

      Finally, the raw data (Illumina, Pacbio) should also be provided.

      Response: Thanks for pointing this out. According to your suggestion, we have submitted the raw data of the HBXN2020 genome to the GenBank database, GenBank accession number CP119399.1. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated contents were presented in line 770-773 in data availability section of the revised manuscript.

      Lines 100-108, please replace this part for a more meaningful investigation that could be possibly supported by the following experimental assays.

      Response: We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Based on your suggestion, we have done our best to remove some minor results and add more meaningful findings. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript. Please see line 104-110 and Supplementary Table 2 in revised manuscript and supplemental material.

      Lines 119-126, which are not important, did you further check what or which parts make the bacteriostasis?

      Response: Thanks for pointing this out. Following your suggestion, we have done our best to trim these minor results by removing unnecessary words and sentences. In future research, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. We appreciate your review and feedback and hope that our response adequately addresses your concerns. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 122-124 in results section of the revised manuscript.

      "Biosafety"? Is there a standard way to conduct this investigation? please clarify.

      Response: Thank you for pointing out this problem in the manuscript. In this experiment, the biosafety assessment of B. velezensis HBXN2020 followed the method described by Zhou et al. with slight modifications (10.1038/s41467-022-31171-0). We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      The updated contents were presented in line 651-652 in results section of the revised manuscript.

      Why are spores used, not whole bacteria? Please clarify.

      Response: Thanks for pointing this out. We apologize for any confusion caused by the use of B. velezensis HBXN2020 spores in the manuscript. In this study, mice were treated with B. velezensis by oral gavage, and gastric acid drastically reduces the activity of B. velezensis. Spores, however, tolerate strongly acidic environments well. Additionally, previous studies provide precedents for using spores (10.1126/scitranslmed.abf4692). Thank you for your comments and feedback; we hope that our response adequately addresses your concerns.

      Line 196, line 287, repeated assays were conducted, but the logical link is missing.

      Response: We greatly appreciate your valuable comments. We apologize for any inconvenience caused by the organization and coherence of our results section. Following your suggestion, we have done our best to improve the manuscript's layout by removing unnecessary words and revising sentences. We apologize once again and hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 195-198, 246-248, 256-257 and 285-287 in results section of the revised manuscript.

      Discussion:

      Please shorten it; it is wordy but without focus.

      Response: We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Following your suggestion, we have done our best to shorten the discussion by removing unnecessary words and revising sentences. We apologize once again and hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 353-355, 358-360, 366-371, 381-385, 395-401, 417-419, 430-438, 459-466, 478-481 and 484-485 in discussion section of the revised manuscript.

      Conclusion:

      Please clarify and rework it.

      Response: Thanks for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have now rewritten the conclusion.

      The updated contents were presented in line 492-496 in conclusion section of the revised manuscript.

      Materials and Methods:

      Much more detailed information should be provided.

      Response: Thank you for your suggestion. The comments improve the quality and depth of the manuscript. Based on your suggestion, we have added details to the experimental methods. We appreciate your review and feedback, and have marked the updated contents in the revised manuscript. Please see line 513-515, 530-533 and Supplementary Table 5 in revised manuscript and supplemental material.

      All previous bacterial sampling and a list of results should be provided as the supplemental document.

      Response: Thank you for your valuable suggestion. The comments improve the quality and depth of the manuscript. In this study, we conducted preliminary biological activity testing on 362 isolates of Bacillus against pathogenic bacteria, which included S. Typhimurium ATCC14028, E. coli ATCC35150, S. aureus ATCC43300 and ATCC29213. We found that four strains of Bacillus (B. subtilis H1, B. velezensis HBXN2020, B. amyloliquefaciens 6-1 and B. licheniformis BSK14) showed antagonistic activity against these pathogenic bacteria, while the rest had no significant activity. We therefore chose these four strains to further evaluate their antibacterial activity against Gram-negative and Gram-positive pathogens (Supplementary Table 5). Based on the antibacterial test results, we found that the B. velezensis HBXN2020 strain had the best antibacterial activity, so we chose B. velezensis HBXN2020 for subsequent experiments.

      The updated contents were presented in Supplementary Table 5 in supplemental material.

      Minor points:

      All bacterial genera and species should be italicized.

      Response: Thank you for pointing this out. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 26 in abstract section and line 67, 69 in introduction section and line 111 in results section of the revised manuscript.

      Line 39, remove repeated "importantly"

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 39 in abstract section of the revised manuscript.

      Lines 55-56, please rewrite.

      Response: Thanks for your suggestion. We have now rephrased the sentence.  

      The updated contents were presented in line 56-57 in introduction section of the revised manuscript.

      The relevant references should be updated, in the right format.

      Response: Thanks for your suggestion. Based on your suggestion, we have revised the references according to the reference format of eLife.

      The updated contents were presented in reference section of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Major concerns:

      (1) In Figure 2, the authors make the argument that the increased survival of Bacillus spores at high temperatures and low pH renders the strain useful as a probiotic as it would survive in the gut. However, the gut temperature is not significantly higher than the rest of the body (certainly not 95 degrees). One assumes the pH argument applies to surviving in stomach acid so that spores can travel to the gut. These conclusions should be clarified/revised. The survival in bile salts gastric fluid etc makes more sense.

      Response: Thank you for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have revised these conclusions. We would like to express our apologies once again and hope that the revised manuscript meets your expectations. We have marked the updated contents in the revised manuscript.

      The updated contents were presented in line 129-132 in results section of the revised manuscript.

      (2) The overall differences in the microbiota on the stacked bar graphs are difficult to determine. In many cases, it looks like the HBXN2020 does not have a significant effect. The subsequent scattergrams are more convincing. Perhaps the authors can think of a better way to compare composite populations. If not, I suggest moving these stacked graphs to the supplementary information.

      Response: We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Based on your suggestion, we have moved the stacked graphs to the supplemental material. In addition, we replaced the bar graphs with heatmaps, in which differences in microbial community composition among the experimental groups are shown by the depth of color. We appreciate your review and feedback, and have marked the updated figures in the revised manuscript. Please see Figure 7 and 10 in revised manuscript and supplemental material.

      Minor editorial:

      (1) Line 55 - "....antibiotic therapy is...".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated contents were presented in line 56-57 in introduction section of the revised manuscript.

      (2) Line 60 - replace "emergent search" - poor syntax.

      Response: Thank you for your suggestion. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.  

      The updated contents were presented in line 61-62 in introduction section of the revised manuscript.

      (3) Line 63 - "...play an important...".

      Response: Thanks for pointing this out. We have now rephrased the sentence.

      The updated contents were presented in line 63-64 in introduction section of the revised manuscript.

      (4) Figure 1C is not very useful, simply reinforces the data from 1A and 1B - this can be moved to the supplementary information.

      Response: Thank you for your valuable suggestion. The comments improve the quality and depth of the manuscript. Based on your suggestion, we have moved Figure 1C to the supplemental material. We appreciate your review and feedback, and have marked the updated figures in the revised manuscript. Please see figures in revised manuscript and supplemental material.

      (5) Line 126, "...that the growth of B. velezensis HBXN2020 was relatively stable." What do the authors mean by this? "Stable" implies no increase in biomass, but the growth curve does not indicate this, there was an increase in biomass after which, the culture appeared to reach a stationary phase. This should be clarified.

      Response: Thanks for pointing this out. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 122-124 in results section of the revised manuscript.

      (6) In Figure 5 - all the graphs in panel A can be amalgamated into one figure using different colours/symbols.

      Response: Thank you for your suggestion. The comments improve the quality and depth of manuscript. Based on your suggestion, we have merged all the graphics in panel A in Figure 5 into one figure.

      The updated contents were presented in Figure 5 in the revised manuscript.

      (7) The overall cohesiveness of the manuscript could be improved.

      Response: Thank you for your valuable comments. The comments improve the quality and depth of manuscript. We have revised the entire manuscript based on your suggestions. The updated contents were presented in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      The following issues require clarification to improve the quality of the manuscript further.

      (1) L.55: Replace "antibiotic therapies" with "antibiotic therapy".

      Response: Thank you for your suggestion. We have corrected it as suggested.

      The updated contents were presented in line 56-57 in introduction section of the revised manuscript.

      (2) "Bacillus" should be modified to italics in the manuscript (see e.g., L. 26, 65, 68, 109).

      Response: Thank you for your suggestion. The comments improve the quality of manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 26 in abstract section and line 67, 69 in introduction section and line 111 in results section of the revised manuscript.

      (3) The first appearance of bacterial names in the manuscript requires the full English name (see e.g., L. 158, 159, 160).

      Response: Thank you for pointing out this problem in manuscript. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 153-156 in results section of the revised manuscript.

      (4) L.166 and 167: "we evaluated its biological safety in a mouse model" suggest modifying to "we evaluated the biological safety of HBXN2020 in a mouse model".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      The updated contents were presented in line 163-164 in results section of the revised manuscript.

      (5) L.229: Replace "suggest" with "suggested".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      The updated contents were presented in line 226 in results section of the revised manuscript.

      (6) L.367: The tense of "can" should be consistent with "demonstrated".

      Response: Thanks for pointing this out. We have corrected this as suggested.

      (7) L.368 and L. 369: Replace "Gram positive and Gram negative" with "Gram-positive and Gram-negative".

      Response: Thanks for your suggestion. We have corrected this as suggested.  

      (8) L.372: Replace "and" with "as well as".

      Response: Thanks for your useful suggestion. We have corrected this in the revised manuscript as suggested.

      The updated contents were presented in line 365 in discussion section of the revised manuscript.

      (9) NCBI accession number of supplementing 16SrRNA sequencing raw data.

      Response: Thank you for your suggestion. We have added it in the revised manuscript.

      The updated contents were presented in line 770-773 in data availability section of the revised manuscript.

      (10) L. 1020 and L. 1073: It's recommended to reduce the word count in the annotations of Figures 5 and 8.

      Response: Thank you for your valuable suggestion. We have corrected it as suggested.

      The updated contents were presented in the annotations of Figure 5 and Figure 8 in figure legends section of the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful comments, which have helped to improve the manuscript. We provide specific examples and a point-by-point response to all comments below. Based on the Reviewers’ comments, we revised our manuscript, adding a considerable amount of new data (found in Fig. 1A,B, 4E-G, 7C,D, 8C,E, S1B,C, S2C-G, S4C, and Video 1). In the main manuscript text, blue font indicates added or revised text. An additional author (Lauren N. Juga) has been added for the newly generated data in the revised manuscript.

      Reviewer #1: 

      Sekulovski et al present an interesting and timely manuscript describing the temporal transition from epiblast to amnion. The manuscript builds on their previous work describing this process using stem cell models. 

      They suggest a multi-step process initiated by BMP induction of GATA3, followed by expression of TFAP2A, followed by ISL1/HAND1 in parallel with loss of pluripotency markers. This transition was reproduced through IF analysis of CS6/7 NHP embryo. 

      There are significant similarities in the expression of trophectoderm and the amnion. There are also ample manuscripts showing trophoblast induction following BMP stimulation of primed pluripotent stem cells. The authors should ensure that the amnion indeed is only amnion and not trophectoderm (or the amount of contribution to trophectoderm). As an extension, does the amnion character remain after the 48h BMP4 treatment, and is a trophectoderm-like state adopted as suggested by Ohgushi et al 2022?  

      Thank you for this insightful comment. As pointed out, Ohgushi et al. showed that, in their culture method, amnion is first induced, and extended culturing leads to the formation of trophectoderm-like cells (Ohgushi et al., 2022).

      Importantly, we would like to note that our culture system differs substantially from that of Ohgushi et al. in several respects. First, our system uses a 3D culture method while Ohgushi et al. employ 2D hPSC monolayers. Second, the two systems are chemically quite distinct. In our Glass-3D+BMP protocol, cells are cultured in mTeSR media (which contains FGF2 and TGFb1) for two days, by which time they generate 3D pluripotent cysts. BMP4 is then added to the culture medium for 24 hours, followed by another 24 hours without BMP4. In stark contrast, Ohgushi et al. employ A83-01, an Activin/Nodal signaling inhibitor, and PD173074, an FGF signaling inhibitor (a protocol which they call AP). This treatment leads to spontaneous activation of BMP signaling, but it also clearly inhibits the Activin/Nodal and FGF signaling pathways, which remain active in our system. As a result of these distinct chemical and geometrical culturing protocols, their system produces amnion and trophectoderm, while our system produces exclusively amnion.

      Further analysis of gene expression data supports our contention that our system produces amnion. Though the gene expression profiles of amnion and trophectoderm are quite similar, specific markers of trophectoderm have been identified, including GCM1, PSG1, PSG4 and CGB (Blakeley et al., 2015; Meistermann et al., 2021; Ohgushi et al., 2022; Okae et al., 2018; Petropoulos et al., 2016; Yabe et al., 2016). Importantly, while all of these markers are abundantly expressed in the Ohgushi et al. system, bulk RNA sequencing analysis of our Glass-3D+BMP hPSC-amnion cells reveals that none of these markers are detectable. Indeed, SDC1, a marker that Ohgushi et al. claim distinguishes trophoblast from amnion, actually decreases (more than 8-fold) as pluripotent cysts transition to amnion in Glass-3D+BMP. Finally, Ohgushi et al. report that ISL1, a key marker of the specified amnion population, initially increases in their system but is reduced to a basal level over time. In contrast, in Glass-3D+BMP hPSC-amnion, ISL1 expression continuously increases with time, and ISL1 protein expression is seen uniformly throughout the amnion cysts. This uniform expression is also seen in CS6/7 cynomolgus macaque amnion. Together, these results support our conclusion that the Glass-3D+BMP system leads to the formation of amniotic cells, and not trophectoderm cells.

      The functional data does not support a direct function of GATA3 prior to TFAP2A and the authors suggest compensatory mechanisms from other GATAs. If so, which GATAs are expressed in this system, with and without GATA3 targeting? Would it not be equally likely that the other early genes could be the key drivers of amnion initiation, such as ID2? 

      We appreciate this helpful comment. We agree that our data do not provide sufficient evidence for the role of GATA3 in early amniogenesis. We also agree that other early genes could be key drivers, and apologize for including our speculation that focuses only on GATA2. GATA2 was selected because, among the other GATAs, GATA2 and GATA3 are the only abundantly expressed GATA factors. This point suggesting a potentially redundant role of GATA2 is now removed from the manuscript (Line#355 of the original manuscript).

      The targeting of TFAP2A displays a very interesting phenotype which suggests that amnion and streak share an initial trajectory but where TFAP2A is necessary to adopt amnion fate. It would again be important to ensure that this alternative fate is indeed in streak and not misannotated alternative lineages, including trophoblast. 

      Is TBXT induced in this setting as well as in the wt situation during amnion induction? This should be displayed as in Figure 3D and would be nice to be complimented by NHP IF analysis.

      We will address these two closely related comments together.

      TFAP2A-KO cysts contain ISL1+ squamous cells as well as SOX2+ pluripotent cells, suggesting that, while the initial focal amniogenesis is seen, subsequent spreading event is not seen. Interestingly, our new data show that TFAP2A-KO cysts display cells with high TBXT expression (Fig. 8E, Line#373-374). This result suggests that, in the absence of TFAP2A, once amnion lineage progression is halted, more primitive streak-like (TBXThigh) lineage emerges. It is important to note that TBXT expression is not seen in the trophectoderm population of cynomolgus macaque peri-gastrula (Sasaki et al., 2016; Yang et al., 2021).

      As suggested, we now include a TBXT expression time course during hPSC-amnion formation in Fig. S2D of the revised manuscript. These data show weak TBXT expression (transcripts) starting at the 24-hr timepoint. However, a clear TBXT protein signal could not be detected using IF (Fig. S2C), likely because TBXT expression is very low (Line#264-265). While statistically significant compared to the 12-hr timepoint, TBXT expression is 31 FPKM +/- 0.8 (standard deviation) at 24-hr and 48 FPKM +/- 6 at 48-hr. These are low expression values compared to, for example, TFAP2A, which displays 572 FPKM +/- 23 at 12-hr and 1169 FPKM +/- 27 at 24-hr, at which TFAP2A is readily detected using IF. While weak nuclear TFAP2A is seen using IF at 6-hr (187 FPKM +/- 7), no clear TFAP2A is detected at 3-hr (74 FPKM +/- 7). Another example is ISL1, which displays 758 FPKM +/- 55 at 24-hr and 1505 FPKM +/- 26 at 48-hr, when ISL1 can be detected using IF. Importantly, we were not able to detect ISL1 protein expression using IF at 12-hr, at which its expression level is 12 FPKM +/- 18. Lastly, we now show that, in the cynomolgus macaque peri-gastrula, while pSMAD1/5+ primitive streak-derived disseminating cells show abundant TBXT expression, no clear TBXT expression is seen in the amnion territory (Fig. S2G, Line#291-293).
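      The detectability argument above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it tabulates the mean FPKM values quoted in this response next to the IF outcome reported for each, and compares the highest value with no clear IF signal against the lowest value with any IF signal. The labels ("not_clear", "weak", "detected") and variable names are hypothetical, and assigning "not_clear" to TBXT at both the 24-hr and 48-hr timepoints is an assumed reading of the text.

```python
# Mean FPKM values and IF outcomes as quoted in the response text.
# Labels are illustrative; TBXT "not_clear" at both timepoints is an assumption.
# (gene, timepoint_hr, mean_fpkm, if_signal)
observations = [
    ("TBXT", 24, 31, "not_clear"),
    ("TBXT", 48, 48, "not_clear"),
    ("TFAP2A", 3, 74, "not_clear"),
    ("TFAP2A", 6, 187, "weak"),
    ("TFAP2A", 12, 572, "detected"),
    ("TFAP2A", 24, 1169, "detected"),
    ("ISL1", 12, 12, "not_clear"),
    ("ISL1", 24, 758, "detected"),
    ("ISL1", 48, 1505, "detected"),
]

# Split the quoted values by whether any IF signal was reported.
no_signal = [fpkm for _, _, fpkm, signal in observations if signal == "not_clear"]
some_signal = [fpkm for _, _, fpkm, signal in observations if signal != "not_clear"]

highest_without_signal = max(no_signal)
lowest_with_signal = min(some_signal)
tbxt_peak = max(fpkm for gene, _, fpkm, _ in observations if gene == "TBXT")

print("highest FPKM without a clear IF signal:", highest_without_signal)
print("lowest FPKM with any IF signal:", lowest_with_signal)
print("TBXT peak FPKM:", tbxt_peak)
```

      With these quoted values, the TBXT peak sits below every expression level at which an IF signal was reported, which is consistent with the explanation that TBXT is too weakly expressed to be seen by IF. Transcript abundance does not map directly onto protein detectability, so this is a consistency check rather than a threshold.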

      Together, these results show that while a TBXTlow state clearly emerges during hPSC-amnion development, in wild-type hPSC cultured in Glass-3D+BMP, TBXT levels remain low throughout amnion differentiation. However, in the absence of TFAP2A, a TBXThigh state is seen, suggesting that TFAP2A is critical for suppressing this TBXThigh state in fate spreading cells, perhaps by preventing BMP responding cells from acquiring embryonic lineages (e.g., mesodermal and/or primordial germ cells).

      The authors should address why they get different results from Castillo-Venzor et al 2023 DOI: 10.26508/lsa.202201706  

      Thank you very much for this helpful suggestion, and we now include a section detailing this in the Discussion (Line#410-432). In short, we propose several possibilities. First, culturing conditions are highly distinct. Castillo-Venzor et al. (Castillo-Venzor et al., 2023) utilize initial “pre-mesoderm” conditioning by Activin and CHIR, followed by treating floating embryoid bodies with a growth factor cocktail (BMP, SCF, EGF and LIF). In contrast, our system (Glass-3D+BMP) employs BMP stimulation of pluripotent cysts. Thus, we suspect that, in the PGCLC differentiation condition, cells are conditioned to the pre-mesodermal lineage. Moreover, we propose that amnion fate spreading may not be present in the PGCLC system, perhaps due to differences in geometry (aggregates versus cysts), or due to differing lineage commitment programs. That is, while initial amniogenesis is seen in the PGCLC system, most cells may already be committed to the PGC-like or mesodermal lineages by the time amnion fate spreading can occur. Alternatively, because several cell types (PGC-like, mesodermal and amniotic) co-exist in the culture by Castillo-Venzor et al., PGC-like and/or mesodermal cells may compensate for the loss of TFAP2A.

      Reviewer #2: 

      In this study, Sekulovski and colleagues report refinements to an in vitro model of human amnion formation. Working with 3D cultures and BMP4 to induce differentiation, the authors chart the time course of amnion induction in human pluripotent stem cells in their system using immunofluorescence and RNA-seq. They carry out validation through comparison of their data to existing embryo datasets, and through immunostaining of post-implantation marmoset embryos. Functional experiments show that the transcription factor TFAP2C drives the amnion differentiation program once it has been initiated. 

      There is currently great interest in the development of in vitro models of human embryonic development. While it is known that the amnion plays an important structural supporting role for the embryo, its other functions, such as morphogen production and differentiation potential, are not fully understood. Since a number of aspects of amnion development are specific to primates, models of amniogenesis will be valuable for the study of human development. Advantages of this model include its efficiency and the purity of the cell populations produced, a significant degree of synchrony in the differentiation process, benchmarking with single-cell data and immunocytochemistry from primate embryos, and identification of key markers of specific phases of differentiation. Weaknesses are the absence of other embryonic tissues in the model, and overinterpretation of certain findings, in particular relating bulk RNA-seq results to scRNA-seq data from published analyses of primate embryos and results from limited (though high quality) embryo immunostainings.  

      We are happy that Reviewer #2 agrees that our Glass-3D+BMP model is important for investigating additional roles of amniogenesis, as well as roles of amnion as a signaling hub, due to the purity of the amniotic cell population, and a high degree of synchrony of differentiation.

      We respectfully disagree that the absence of other embryonic tissues in the model is a weakness: rather, we believe it is a strength because this single lineage amnion model allows us to directly (and independently) investigate mechanisms underlying amnion lineage progression. For example, as noted above in our response to Reviewer #1, use of our hPSC-amnion model allowed us to see a very specific and interesting phenotype in the absence of TFAP2A (reduced amnion formation and emergence of an alternative lineage), though previous findings by Castillo-Venzor et al. concluded that amniogenesis is not affected by loss of TFAP2A. We noted that the culture method used by Castillo-Venzor et al. contains several cell types (amniotic, mesodermal and PGC-like), and that amniogenesis may be intact in that model due to compensation by the presence of these other cell types. That is, while cell-cell interactions can indeed be gleaned in culture systems with several cell types, the presence of multiple cell types and their additional signaling inputs can also confound some aspects of mechanistic investigations. We now include a paragraph in the Discussion of the revised manuscript (Line#410-432), in which we detail these ideas, and suggest that, because of the cell purity, our Glass-3D+BMP model enables robust mechanistic examinations, specifically during amnion formation.

      We address Reviewer #2’s point about bulk vs. single cell transcriptomic similarity analysis in Reviewer’s specific point #4 below. We do, however, want to note here that we have performed the same analysis using a 14-day old cynomolgus macaque peri-gastrula single cell RNA sequencing dataset generated by Yang et al. (Yang et al., 2021), and obtained a lineage trajectory (Fig. 4F, Line#265-268) similar to that seen when the Tyser et al. dataset (Tyser et al., 2021) was used (Fig. 4C).

      Importantly, while cynomolgus macaque early embryo samples are limited, we now include additional staining (Fig. S2G). 

      Reviewer #2 (Recommendations For The Authors): 

      Provide more confirmation of key findings in more than one stem cell line. 

      We now confirm key findings in the H7 human embryonic stem cell line (Fig. S1C).

      Provide stronger evidence e.g. scRNA-seq to support the existence of intermediate cells or tone down the conclusions.  

      We agree that this is a very important point. In our recent study (Sekulovski et al., 2023), we performed single cell RNA sequencing of Gel-3D, another hPSC-amnion model. In this study, we comprehensively described the transcriptome associated with the “intermediate” cell types, as well as CLDN10 as a marker of these cell types. Moreover, we now include additional data showing the molecular characteristics of the TBXTlow intermediate cells during amniogenesis in hPSC-amnion (Fig. S2C, S2D) and d14 cynomolgus macaque peri-gastrula (Fig 4G, replot of single cell RNAseq by (Yang et al., 2021), Line#264-268).

      Provide more data on the expression of DLX5 in the model. 

      We now provide a DLX5 staining time course in Fig. 7C. We find that, similar to ISL1, prominent DLX5 staining is seen in the focal cells at 24-hr post-BMP. Interestingly, at 48-hr, some cells show high levels of DLX5 while others show low DLX5 levels; this is of interest for future investigations.

      (1) L159 - the authors should repeat more of the key results in at least one other hPSC line, to ensure reproducibility of the method. Figure S1 contains minimal information (one timepoint, three genes, one biological replicate) on a single different hPSC line. 

      We now include additional validation analysis using the H7 human ESC line (Fig. S1).

      (2) Figure 1- it is a little difficult to appreciate cyst formation from images taken at one level in the stack, can the authors perhaps show a 3D rendering or video to display morphogenesis better? 

      We now provide all optical sections of cysts shown in Movie 1.

      (3) Figure 1-did the authors carry out podocalyxin staining? This is a standard marker for lumenogenesis.  

      We now provide PODXL staining (Fig. 1A,1B).

      (4) L248 onwards and Figure 4-I am a little skeptical concerning conclusions drawn from an overlay of bulk RNA-seq onto scRNA-seq UMAP plots. I think the authors need to provide some strong justification for this approach. I would be particularly careful about concluding that cells depicted in Fig 4D represent an intermediate close to primitive streak and even more careful about claiming any lineage relationship between T-positive "primitive streak like intermediates" and the trajectory of cells in the model. UMAP is a dimension-reduction technique for the visualization of clusters in high-dimensional data. It is not a lineage-tracing methodology. It would have been preferable for the authors to present their own scRNA-seq data from the model.  

      We are sorry that it was not clear that our approach to finding similarity between bulk and single cell RNA-seq data is largely based on a published method named projectLSI (Granja et al., 2019). Please refer to our Methods section for details of the implementation and how we modified it for better visualization (addressed in Line#667-676 of the original manuscript, now in Line#718-730). The performance of projectLSI was extensively evaluated in the original article. Furthermore, as pointed out, UMAP is indeed a dimension reduction method that has been widely used in single cell RNA-seq research. In addition to visualizing clusters, trajectory analysis, such as RNA-velocity (which is used in this study), is another successful and widely adopted application of UMAP to gauge fate progression. Therefore, we believe that UMAP can be effectively used as a lineage prediction methodology, and that our use of bulk to single cell transcriptomic similarity analysis leveraging projectLSI is well justified at conceptual and technical levels.

      As illustrated in Fig. 5A, we performed RNA-velocity analysis of the Tyser et al. dataset, and our result clearly predicts a differentiation trajectory from Epiblast, to a part of the TBXTlow population shown in Fig. 4D, and then to Ectoderm/Amnion cells. Consistent with this bioinformatic result, we now show that some cells show weak TBXT expression (at the transcript level) at the 24-hr post-BMP timepoint in control hPSC-amnion (Fig. S2D, Line#264-265). Importantly, our conclusion is drawn from a trajectory based on our time course (0, 0.5, 1, 3, 6, 12, 24, and 48 hours post-BMP treatment), which shows a clear transition from epiblast cells to TBXTlow and then finally to the ectoderm/amnion population. Moreover, using the transcriptomic similarity analysis, we found that the loss of TFAP2A leads to emergence of more primitive streak-like transcriptional characteristics (Fig. 8D). Indeed, using IF, we now show that several fate spreading cells in the TFAP2A-KO cysts are TBXThigh (Fig. 8E, Line#373-374). Thus, the new data provide additional evidence for the successful implementation of this bulk/single cell transcriptomic similarity analysis.

      Together, our bioinformatic and localization analyses show that the Glass-3D+BMP system recapitulates the trajectory found in our Tyser et al. RNA-velocity analysis, further supporting the validity of this differentiation trajectory. To avoid confusion, however, we now omit the “primitive streak-like” phrase when describing the TBXTlow cells because, while they may show some TBXT expression, they are likely intermediate fate transitioning cells. Indeed, a recent study by Ton et al. (Ton et al., 2023) showed that the Tyser et al. Primitive Streak cells consist of a mix of several lineage progressing cells (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC). Therefore, these cells are now specifically described as “TBXTlow” state; TBXThigh cells are described as primitive streak-like state.

      (5) L276 Tyser data do come from a primate model; the authors mean NHP.  

      We now specifically state that the validation is performed in a non-human primate model (Line#280).

      (6) Figure 5-though the immunostaining of the CS6/7 monkey embryos is excellent, the authors should not overinterpret these images. What is shown is not a time course, and one can only infer that a particular pattern of gene expression exists in a spatial sense from these images. In the model (Figure 2), the epiblast markers gradually fade and overlap for a time with emergent amnion markers, but in Figure 5 the transition between epiblast and amnion in the embryo seems pretty sharp, at least in terms of gene expression. There may be a few cells in D that show overlap of SOX2 and TFAP2A, but if the authors want to claim that a transition zone exists, they need to produce stronger evidence. Figure 7 is more convincing but see the next point. 

      Thank you for this insightful comment. We now address the nature of the transitioning boundary cell population extensively in our other recent study (Sekulovski et al., 2023).

      (7) Figure 7 further confuses the issue. A zone at either end of the epiblast is clearly positive for Sox2 and the two amnion markers, clearer than in Figure 5, but why does the marker DLX5 overlap with SOX2 in the embryo (7d) but not the model (7C)? Arguments regarding intermediate cell populations would be greatly strengthened by scRNA-seq data on the model system. 

      In our original manuscript, our DLX5 staining was performed at 48-hr post-BMP, at which SOX2 expression is absent in all cells. Our new analysis at the 24-hr timepoint now shows that DLX5 is expressed in SOX2+ cells (this is now presented in Fig. 7C).

      As stated in point #6, our recent study comprehensively describes the transcriptomic and spatial characteristics of the transitioning boundary cell population (Sekulovski et al., 2023).

      (8) L357 TFAP2C KO does not resemble intermediate cysts in Figure 2. In Figure 2, both SOX2 and amnion markers are co-expressed in the same cells. In 8C, SOX2 and ISL1 are mutually exclusive.  

      We agree with this comment, and have now removed the statement pointing out the resemblance (Line#359 of the original manuscript).

      (9) Figure 8d-the same caveats noted above regarding the interpretation of superposition of bulk RNA-seq data with scRNA-seq UMAP analysis apply here.  

      Please refer to our explanation in point #4.

      Reviewer #3: 

      In this work, the authors tried to profile time-dependent changes in gene and protein expression during BMP-induced amnion differentiation from hPSCs. The authors depicted a GATA3 - TFAP2A - ISL1/HAND1 order of amniotic gene activation, which provides a more detailed temporary trajectory of amnion differentiation compared to previous works. As a primary goal of this study, the above temporal gene/protein activation order is amply supported by experimental data. However, the mechanistic insights on amniotic fate decision, as well as the transcriptomic analysis comparing amnion-like cells from this work and other works remain limited. While this work allows us to see more details of amnion differentiation and understand how different transcription factors were turned on in a sequence and might be useful for benchmarking the identity of amnion in ex utero cultured human embryos/embryoids, it provides limited insights on how amnion cells might diverge from primitive streak / mesoderm-like cells, despite some transcriptional similarity they shared, during early development.  

      We are happy that Reviewer #3 appreciates that our model can be used effectively to identify a previously unrecognized amniotic gene activation cascade, providing a comprehensive time-course transcriptomic resource.

      As detailed below, we address specific concerns raised by Reviewer #3. We now provide additional mechanistic insights into amnion fate progression, and include additional transcriptomic comparisons with a cynomolgus macaque single cell RNA sequencing dataset.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors generated KO cell lines lacking GATA3 and TFAP2A, respectively. Their results showed some disrupted amnion differentiation only in TFAP2A-KO. Therefore, these data do not provide sufficient evidence to support whether these transcription factors are crucial for amnion fate specification. Perhaps an experiment could be done with overexpression of these markers and testing if they could force hPSC to adopt amnion-like fate.  

      Thank you for this insightful comment. We generated cell lines that enable us to inducibly express GATA3 or TFAP2A, and the transgene expression was induced at d2 (when BMP treatment is normally initiated) until d4. However, this inducible expression did not lead to amniogenesis, and cysts maintained pluripotency. Because these results are difficult to interpret, they are not included in the revised manuscript.

      As detailed extensively in the manuscript, within each cyst, amniogenesis is initially seen focally, then spreads laterally resulting in fully squamous amnion cysts. This is also seen in our previously published Gel-3D amnion model (extensively described in (Shao et al., 2017)). In the absence of TFAP2A, we showed that the focal amniogenesis is observed, but spreading is not seen, suggesting that TFAP2A controls amnion fate progression. Therefore, while TFAP2A is not critical for the amnion fate specification in the focal cells, our results show that TFAP2A indeed helps to promote amniotic specification of cells neighboring the focal amniotic cells. Moreover, in the revised manuscript, we now show that TFAP2A transgene expression in the TFAP2A-KO background restores formation of fully squamous hPSC-amnion, further establishing the role of TFAP2A in amnion fate progression (Fig. 8C of the revised manuscript, Line#362-364).

      (2) The transcriptomic analysis made by the authors provides some comparison between BMP-induced amnion-like cells in vitro and the amnion-like cells from CS7 human embryo in vivo. However, the data set from the human embryo contains only a limited number of cells, and might not provide a sufficient base for decisive assessment of the true identity of amnion-like cells obtained in vitro. It might help if the authors could integrate their bulk sequencing data with other primate embryo data sets.  

      Thank you for this helpful comment. We have now performed our transcriptional similarity analysis using early (day 14) cynomolgus macaque embryo datasets generated in a study by (Yang et al., 2021), and found that the bulk time-course transcriptome of our hPSC-amnion model overlaps with the cynomolgus macaque amniotic lineage progression (Fig. 4F, Line#265-268). We also now provide the expression of key markers within the Yang et al. dataset (GATA3, TFAP2A, ISL1, TBXT, DLX5, Fig. 4G, S2F).

      (3) Following the point above, the authors used transcriptomic analysis to identify several intermediate states of cells during amnion differentiation and claimed that there is a primitive streak-like intermediate. However, this might be an overstatement. During stem cell culture and differentiation, intermediate states showing a mixture of biomarkers are very common and do not imply that such intermediates have any biological meaning. However, stating that amnion differentiation passes through primitive streak-like intermediates, might imply a certain connection between these two lineages, for which there is a lack of solid support. Instead, a more interesting question might be how amnion and primitive streak differentiation, despite some transcriptomic similarity, diverge from each other during early development. What factors make this difference? The authors might further analyze RNA-seq data to provide some insights.  

      Thank you very much for the insightful comments. 

      We understand Reviewer #3’s concern that the intermediate state that we see may not recapitulate a primitive streak-like state. However, in our original manuscript, we described these cells as “Primitive Streak-like” because those cells were annotated as Primitive Streak in the dataset by Tyser et al. Interestingly, a recent study by Ton et al. showed that the Tyser et al. Primitive Streak cells actually consist of a mixture of different cell lineages (e.g., Epiblast, Non-neural ectoderm, Anterior or caudal primitive streak, PGC (Ton et al., 2023)). Therefore, we agree that it was an overstatement to call them “Primitive Streak-like”, and, to avoid confusion, we now label the TBXTlow sub-population found in the Tyser et al. Primitive Streak population as “TBXTlow state” throughout the manuscript.

      Our data indicate that TFAP2A may play a role in controlling the lineage decision between amnion and primitive streak cells that abundantly express TBXT (TBXThigh). In the original manuscript, we included data showing that 48-hr TFAP2A-KO cysts show transcriptomic characteristics similar to some Primitive Streak cells (Fig. 8D). Intriguingly, our new data show that, in the absence of TFAP2A, some TBXThigh cells are indeed seen (Fig. 8E, Line#373-374). These results provide a body of evidence for the role of TFAP2A in promoting the amniotic lineage, perhaps by suppressing the TBXThigh state. This point is now addressed in the Discussion (Line#401-409).

      Additional new data:

      Using Western blot, we now show that GATA3 is absent in the GATA3-KO lines (Fig. S4C). We noticed that this was lacking in the original manuscript.

      We now show that inducible expression of TFAP2A in the TFAP2A-KO cysts leads to control-like cysts (Fig. 8C, Line#362-364).

      Additional changes:

      Typos were fixed in Fig. 5I – “boundary” and “disseminating” were not spelled correctly.

      Line#350 – we originally noted “GATA3 expression precedes TFAP2A expression by approximately 12 hours”. This was incorrect, and is changed to 9 hours in the revised manuscript. We apologize for this mistake.

      REFERENCES

      Blakeley, P., Fogarty, N.M., del Valle, I., Wamaitha, S.E., Hu, T.X., Elder, K., Snell, P., Christie, L., Robson, P., and Niakan, K.K. (2015). Defining the three cell lineages of the human blastocyst by single-cell RNA-seq. Development 142, 3151-3165.

      Castillo-Venzor, A., Penfold, C.A., Morgan, M.D., Tang, W.W., Kobayashi, T., Wong, F.C., Bergmann, S., Slatery, E., Boroviak, T.E., Marioni, J.C., et al. (2023). Origin and segregation of the human germline. Life Sci Alliance 6.

      Granja, J.M., Klemm, S., McGinnis, L.M., Kathiria, A.S., Mezger, A., Corces, M.R., Parks, B., Gars, E., Liedtke, M., Zheng, G.X.Y., et al. (2019). Single-cell multiomic analysis identifies regulatory programs in mixed-phenotype acute leukemia. Nature biotechnology 37, 1458-1465.

      Meistermann, D., Bruneau, A., Loubersac, S., Reignier, A., Firmin, J., Francois-Campion, V., Kilens, S., Lelievre, Y., Lammers, J., Feyeux, M., et al. (2021). Integrated pseudotime analysis of human pre-implantation embryo single-cell transcriptomes reveals the dynamics of lineage specification. Cell stem cell 28, 1625-1640 e1626.

      Ohgushi, M., Taniyama, N., Vandenbon, A., and Eiraku, M. (2022). Delamination of trophoblast-like syncytia from the amniotic ectodermal analogue in human primed embryonic stem cell-based differentiation model. Cell reports 39, 110973.

      Okae, H., Toh, H., Sato, T., Hiura, H., Takahashi, S., Shirane, K., Kabayama, Y., Suyama, M., Sasaki, H., and Arima, T. (2018). Derivation of Human Trophoblast Stem Cells. Cell stem cell 22, 50-63 e56.

      Petropoulos, S., Edsgard, D., Reinius, B., Deng, Q., Panula, S.P., Codeluppi, S., Plaza Reyes, A., Linnarsson, S., Sandberg, R., and Lanner, F. (2016). Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos. Cell 165, 1012-1026.

      Sasaki, K., Nakamura, T., Okamoto, I., Yabuta, Y., Iwatani, C., Tsuchiya, H., Seita, Y., Nakamura, S., Shiraki, N., Takakuwa, T., et al. (2016). The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion. Developmental cell 39, 169-185.

      Sekulovski, N., Juga, L.L., Cortez, C.L., Czerwinski, M., Whorton, A.E., Spence, J.R., Schmidt, J.K., Golos, T.G., Gumucio, D.L., Lin, C.-W., et al. (2023). Identification of amnion progenitor-like cells at the amnion-epiblast boundary in the primate peri-gastrula. bioRxiv doi: 10.1101/2023.09.07.556553.

      Shao, Y., Taniguchi, K., Townshend, R.F., Miki, T., Gumucio, D.L., and Fu, J. (2017). A pluripotent stem cell-based model for post-implantation human amniotic sac development. Nature communications 8, 208.

      Ton, M.N., Keitley, D., Theeuwes, B., Guibentif, C., Ahnfelt-Ronne, J., Andreassen, T.K., Calero-Nieto, F.J., Imaz-Rosshandler, I., Pijuan-Sala, B., Nichols, J., et al. (2023). An atlas of rabbit development as a model for single-cell comparative genomics. Nature cell biology 25, 1061-1072.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S., Vallier, L., Scialdone, A., and Srinivas, S. (2021). Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285-289.

      Yabe, S., Alexenko, A.P., Amita, M., Yang, Y., Schust, D.J., Sadovsky, Y., Ezashi, T., and Roberts, R.M. (2016). Comparison of syncytiotrophoblast generated from human embryonic stem cells and from term placentas. Proceedings of the National Academy of Sciences of the United States of America 113, E2598-2607.

      Yang, R., Goedel, A., Kang, Y., Si, C., Chu, C., Zheng, Y., Chen, Z., Gruber, P.J., Xiao, Y., Zhou, C., et al. (2021). Amnion signals are essential for mesoderm formation in primates. Nature communications 12, 5126.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The work by Zeng et al. comprehensively explored the differences in the effects of leaf and soil microbes on the seed germination, seedling survival, and seedling growth of an invasive forb, Ageratina adenophora, and found evidence of stronger effects of leaf microbes on Ageratina compared with soil microbes, which were negative for seed germination and seedling survival but positive for seedling growth. By further DNA sequencing and fungal strain cultivation, the authors were able to identify some of the key microbial guilds that may facilitate such negative and positive feedback.

      Thank you very much for your assessment.

      Strengths:

      (1) The theoretic framework is well-established.

      (2) Relating the direction of plant-microbe feedback to certain microbial guilds is always hard, but the authors have done a great job of identifying and interpreting such relationships.

      Thank you very much for your assessment.

      Weaknesses:

      (1) In the G0 and G21 inoculation experiments, allelopathic effects from leaf litters had not been accounted for, while these two experiments happened to be the ones where negative feedback was detected.

      We did not directly test allelopathic effects. However, we also recorded seed germination time (GT) and germination rate (GR), as well as the seedling mortality rate (MR), for the treatments in which soil and leaf inocula were applied 28 days after sowing (G28 inoculation). This allowed us to detect a possible allelopathic effect by comparing the sterile samples with the control (nothing inoculated during the first 28 days). In this version, we added the GT, GR and MR results for the uninoculated control to Figure 1, and described the results as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Line102-104). We have also discussed this point in the resubmitted version as: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” in Line 289-301. 

      (2) The authors did not compare the fungal strains accumulated in dead seedlings to those accumulated in live seedlings to prove that the live seedlings indeed accumulated lower abundances of the strains that were identified to increase seedling mortality.

      Thank you for raising this concern. We did not isolate fungi from healthy seedlings for a comparative study. However, our team previously found that the seedling-killing Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with mature A. adenophora individuals; some seedling-killing Alternaria also occur in healthy seedlings inoculated with leaf litter. We thus assumed that these seedling-killing fungi, e.g., Allophoma and Alternaria, likely exist in mature A. adenophora individuals, undergo a lifestyle switch from endophytic to pathogenic, and can kill seedlings only at a very early life stage of A. adenophora.

      Thus, we discussed this point as: “In particular, the numerically dominant Allophoma strains obtained in this study had the same ITS genes as the leaf endophyte and leaf spot pathogen Allophoma associated with A. adenophora (Chen et al., 2022; Kai Fang et al., 2021; Yang et al., 2023). Interestingly, a previous report revealed that the dominant genera in healthy seedlings inoculated with leaf litter were Didymella and Alternaria (Kai Fang et al., 2019). We did not isolate fungi from healthy seedlings to determine whether the live seedlings indeed lacked or accumulated a lower abundance of the seedling-killing strains than did the dead seedlings in this study. We could assume that these fungal genera likely exist in A. adenophora mature individual experiencing a lifestyle switch from endophytic to pathogenic and play an essential role in limiting the population density of A. adenophora monocultures by killing seedlings only at very early stages. Thus, it is worth exploring the dynamic abundance of these strains and host resistance variation during A. adenophora seedling development.” in Line 432-444.

      (3) The data of seed germination and seedling mortality could have been analyzed in the same manner as that of seedling growth, which makes the whole result section more coherent. I don't understand why the authors had not calculated the response index (RI) for germination/mortality rate and conducted analyses on the correlation between these RIs with microbial compositions.

      Thank you. The response index (RI) was calculated as:

      RI = (variable_nonsterile – variable_sterile) / variable_sterile

      Because the mortality rates of some sterile groups were zero, it is impossible to calculate their RIs. In addition, only leaf microbes affected seed germination time (GT), and neither leaf nor soil microbes affected germination rate (GR) (see Fig. 1a,b). Therefore, we preferred to directly compare the nonsterile and sterile treatments (also see Figure 1d) to assess the microbial effect, and we correlated these values, rather than RIs, with microbial composition (see Fig. 3). We emphasized this point in the Materials and Methods of our resubmitted revision as: “Because the mortality rates of some sterile groups were zero and their RIs were impossible to calculate, we had to directly compare the seedling mortality caused by nonsterile with by sterile samples and perform the analysis of correlation between the mortality rate and microbial composition.” in Line 565-568.
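
      The zero-denominator problem described above can be illustrated with a minimal Python sketch. The function names `response_index` and `raw_difference` are hypothetical and are not taken from the manuscript; the 96.7% and 19.7% mortality values are the ones quoted earlier in this response, and the 0% sterile case is an illustrative input only.

```python
from typing import Optional


def response_index(nonsterile: float, sterile: float) -> Optional[float]:
    """Return RI = (nonsterile - sterile) / sterile, or None if sterile is 0.

    A sterile value of zero makes the ratio undefined, which is the case
    the response describes for some mortality rates.
    """
    if sterile == 0:
        return None
    return (nonsterile - sterile) / sterile


def raw_difference(nonsterile: float, sterile: float) -> float:
    """Fallback comparison that stays defined when the sterile value is 0."""
    return nonsterile - sterile


# Mortality (%) for nonsterile vs. sterile leaf inoculation, as quoted above.
print(response_index(96.7, 19.7))
# Illustrative case: a sterile group with no dead seedlings has no RI.
print(response_index(5.0, 0.0))
print(raw_difference(5.0, 0.0))
```

      Under these assumptions, the raw difference remains usable for every treatment pair, whereas the RI is only available when the sterile group has a nonzero value.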

      (4) The language of the manuscript could be improved to increase clarity.

      We have improved language in the resubmitted version.

      Reviewer #2 (Public Review):

      Summary: 

      The study provides strong evidence that leaf microbes mediate self-limitation at an early life stage. It highlights the importance of leaf microbes in population establishment and community dynamics. 

      Thank you very much for your assessment.

      The authors conducted three experiments to test their hypothesis, elucidating the effects of leaf and soil microbial communities on the seedling growth of A. adenophora at different stages, screening potential microbial sources associated with seed germination and seedling performance, and identifying the fungus related to seedling mortality. The conclusions are justified by their results. Overall, the paper is well-structured, providing clear and comprehensive information.

      Thank you very much for your assessment.

      Reviewing Editor (Recommendations For The Authors):

      In addition to the assessments from the reviewers, we have the following comments on your paper:

      (1) The experimental design is complicated with regard to the multiple interacting treatments. The statistical analyses show that the interaction terms are important and significant. In this case, it could be more informative to show the detailed results at the sub-level than at the main level in the main text. For example, the main effects of inoculation sources and nutrients shown in Figure 2 are difficult to interpret, because the effects of inoculation sources and nutrients have important dependencies with each other and other factors such as inoculation time as shown in Figure S3. Therefore, Figure S3 is more informative than Figure 2. Please also be cautious that it would be necessary to clarify this context dependence when showing and citing results of the main effect to avoid any possible misunderstanding, such as the case of Figure 2 and S3.

      Thanks for your suggestion. We have deleted Figure 2 and placed Figure S3 in the text as Figure 2. The corresponding results have been rewritten as “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” in Line 109-115.

      (2) Response index (RI) is already a measure of microbial feedback effect, so that feedback may not be necessary as an explanatory variable in the model with RI as the response variable.

      We are sorry that our wording was misleading. Here the word “feedback” (e.g., foliage or soil feedback) does not represent the microbial feedback effect; it means leaf or soil inoculation. We have replaced “feedback” with “inoculation source” in the figures and text for clarity.

      (3) Mortality rate is a ratio. It is unclear whether assuming a Gaussian error distribution is fine in your case. It would be important to check the residual distribution and to see whether data transformation (e.g., log) or using other error assumptions (e.g., binomial) is necessary.

      Thanks for your suggestion. As you say, it is not appropriate to use generalized linear models (GLMs) with Gaussian error distributions (identity link) to evaluate seedling mortality, because the mortality rate is a ratio, which does not meet the normality assumption. Thus, we deleted the GLM results for seedling mortality and directly compared seedling mortality between different microbial treatments, inoculation times, nutrient levels and inoculation sources using the Mann–Whitney U test and the Kruskal–Wallis test (see Fig. 1d). All corresponding results have also been rewritten as “leaf inoculation caused significantly greater seedling mortality than did soil inoculation (P < 0.001); the nonsterile sample caused greater seedling mortality than did the sterile sample, especially leaf inoculation during the G0 and G21 periods. Moreover, nonsterile leaf inoculation at earlier stages significantly increased seedling mortality compared with that at later stages (Fig. 1d, P < 0.05). However, seedling mortality did not differ between the high- and low-nutrient conditions, regardless of leaf or soil inoculation (Fig. 1d, both P > 0.05).” in Line 109-115.
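
      As a rough illustration of why a rank-based comparison suits ratio data such as mortality rates, the sketch below computes the Mann–Whitney U statistic from midranks in plain Python. It is not the authors' analysis code: the function names are hypothetical, the mortality values are invented for illustration, and no P value is computed (a statistics package would normally be used for that).

```python
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    ranks = midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[: len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Invented per-replicate mortality proportions, for illustration only.
nonsterile_leaf = [0.90, 1.00, 0.95]
sterile_leaf = [0.10, 0.20, 0.30]
print(mann_whitney_u(nonsterile_leaf, sterile_leaf))
```

      Because only the ordering of the observations enters the statistic, the test makes no assumption that mortality proportions are normally distributed.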

      (4) Please be consistent about the wording of different treatment names throughout the texts, tables, and figures. For example, "feedback" should only be used for microbial treatment, but not for inoculation source treatment (e.g., Figure 2). We can say there is an effect of microbial feedback only if we compare sterile vs. non-sterile groups, otherwise, there could be other effects, for example, the allelopathic effect pointed out by Reviewer #1. When writing inoculation, please be specific about whether it is for inoculation time or inoculation source (e.g., within multiple statistical tables in the appendix).

      Thanks for your good suggestion. We have changed “different feedback” to “different inoculation source” for clarity.

      (5) Please clarify which inoculation periods they are for Figures 1d-g.

      Thanks for your good suggestion. We have added the inoculation periods to Fig. 1.

      Reviewer #1 (Recommendations For The Authors):

      Specific comments:

      Lines 12-15: This sentence is too long and complicated, making it unclear what had been done and what had not in previous studies.

      Thanks a lot. We have reorganized this sentence as: “However, how the phyllosphere and rhizosphere soil microbes distinctively affect seedling mortality and the growth of invasive plants across ontogeny under varying soil nutrient levels remains unclear.”.

      Line 19: is it appropriate to use "enrich" here?

      Thanks. We have changed “Microbial inoculation at different growth stages altered the microbial community and functions enriched in seedlings” into “Microbial inoculation at different growth stages altered the microbial community and functions of seedlings”.

      Line 24-25: "litter exhibited phylogenetic signals"? not clear what this means.

      Thanks. A significant phylogenetic signal means that the seedling-killing effects of the fungal strains on A. adenophora were related to the phylogenetic relatedness of these strains. So, we have changed “fungal strains isolated from dead seedlings inoculated with litter exhibited significant phylogenetic signals to seedling mortality” into “the A. adenophora seedling-killing effects of fungal strains isolated from dead seedlings by non-sterile leaf inoculation exhibited significant phylogenetic signals, by which strains of Allophoma and Alternaria generally caused high seedling mortality.”

      Line 29: using "in turn" in the first sentence seems weird.

      We deleted this.

      Lines 32-33: PSFs are usually positive because of?

      We have changed “PSFs have positive effects by escaping soil pathogens and recruiting some beneficial microbes” into “PSFs are usually positive because of escaping soil pathogens and recruiting some beneficial microbes”.

      Line 54: why emphasize "a single soil microbe"?

      Although Geisen et al. (2021) assessed the effect of each of 34 isolates on seed germination and plant growth, Jevon et al. (2020) focused on the effect of the soil microbial community on seedling and adult plant survival. Thus, we changed “a single soil microbe” into “soil microbes”.

      Lines 85-86: "tested their mortality to seedlings"? not clear what this means.

      We apologize that our wording was unclear. We have changed “we also isolated the fungi associated with the dead seedlings and tested their mortality to seedlings.” into “we also isolated the fungi associated with the dead seedlings and tested their seedling-killing effects on A. adenophora.”

      Results: no statistics and no references for the statistical tables that could support the results were presented in this section.

      We have deleted the inappropriate generalized linear models (GLMs) with Gaussian error distributions (identity link) for evaluating seedling mortality, and all corresponding results have also been described (see Line 109-115 and Fig. 1d).

      Lines 100-102: this subtitle reads more like a summary of the following results than a title. All subtitles in the Result section have similar issues (i.e. Lines 148-150, 207-209).

      Thanks. We subdivided our Results into four sections and changed the subtitles to: “Effects of leaf litter and rhizosphere soil on the mortality and growth of A. adenophora seedlings”, “Correlations of microbial community composition and potential function with seedling mortality at the early stage”, “Enrichment of microbial community and function by A. adenophora seedlings under different treatments”, and “Correlations of the enriched microbial community and function with A. adenophora seedling growth”.

      Lines 148-206: since there are a lot of results concerning the microbial composition, I suggest focusing on those that could directly explain the positive or negative feedback. The one concerning diversity (e.g. Figure 3 and corresponding texts) does not seem necessary.

      Thanks for your suggestion. We have moved Figure 3 into the supplementary figures as Figure S2. To focus on core microbes that could directly explain the positive or negative feedback, we reordered Figure 3, which now first shows the core soil and leaf bacteria, bacterial functions, core soil and leaf fungi, and fungal functions (Fig. 3a-h), and then shows the correlations of the top 30 bacterial and fungal genera from soil and leaf with seedling mortality rate (Fig. 3i-j).

      Line 180: is it not common sense that ectomycorrhiza can only be found in soil?

      Yeah, it is. We have deleted this sentence.

      Line 199: "the seedling mortality of these strains"? not clear what this means.

      We have changed “The seedling mortality of these strains” into “The seedling-killing of these strains on A. adenophora”.

      Line 291-292: I don't see how the authors can distinguish between allelopathic and pathogenic effects based on their results.

      We did not directly test the allelopathic effects. However, we also recorded seed germination time (GT) and rate (GR), as well as the seedling mortality rate (MR), for the treatments inoculated with soil and leaf 28 days after sowing (G28 inoculation). This allowed us to observe possible allelopathic effects by comparing the sterile samples with the control (nothing inoculated during the first 28 days). In this version, we added the GT, GR and MR results for the treatment with nothing inoculated (treated as control) to Figure 1, and described the results as: “When inoculated at G0 period, the sterile leaf inoculation significantly delayed germination time more than soil and sterile leaves inoculation and control (nothing inoculated) (Fig. 1a, P < 0.05)” (see Line 102-104). We have also discussed this point in the resubmitted version as: “Our study did not directly test the allelopathic effects of leaf litter. However, leaf litter possibly produces allelochemicals that adversely impact A. adenophora seed germination time and seedling survival. We observed that sterile leaf litter inoculation caused longer GTs than sterile soil and the control (nothing inoculated) (Fig. 1a). Interestingly, sterile leaf litter inoculation also caused longer GTs than nonsterile leaf litter inoculation, suggesting that some pathways through which leaf microbes alleviate the adverse effects of leaf allelopathy on GTs are unknown. Moreover, sterile leaf inoculation at G0 caused a 19.7% mortality rate for seedlings growing in petri dishes (Fig. 1c), but no dead seedlings were observed when the plants were not inoculated (Fig. 1a, S1).

      Nonetheless, our study highlighted the adverse microbial role of leaf litter in seedling mortality because nonsterile leaves have significantly greater seedling mortality (96.7%) than sterile leaves (19.7%) (Fig. 1c)” in Line 289-301.

      Lines 383-414: Correlations are not necessarily causations. Sometimes a strong correlation may result from higher-order interaction. The authors should be more cautious about the discussion of microbial function in this section.

      Thanks. We deleted all descriptions of adverse or beneficial effects on the growth of the host plant A. adenophora and cautiously used “negative correlation” or “positive correlation” to discuss the functions of the microbes enriched by A. adenophora. Finally, we also added the sentence: “It is necessary to isolate these enriched microbes to test the interactions with the early life stage of A. adenophora.”

      (see Line 411-413).

      Lines 489-490: I don't really understand why the authors performed a combination treatment. What did they expect from such a combination?

      Thanks. We described our rationale as: “Leaf inoculation at G28 was performed to simulate natural microbial spread from the leaf litter to the above part of the seedlings by suspending the leaf bag over the transplanted seedlings without direct contact all the time (see Zaret et al. (2021)). This method may result in only microbial species with easy air transmission to infect seedlings. Thus, an additional combination inoculation (named G21+28) was performed on both the 21st (with seedling contact) and 28th days (without seedling contact) to ensure that most leaf microbes had the opportunity to reach the seedlings.” (see Line 498-505).

      Figure 1: why not use "mortality rate" instead of "death rate"?

      Thanks. We have changed “death rate” into “mortality rate” in all corresponding figures and text.

      Figure 8: This is a very complicated experimental setup. Why did the authors harvest the plants treated with nutrient addition after the 12th day of the experiment and harvest those without nutrient addition after the 16th day? Why the time lag?

      Thanks. We explained this as: “Seedlings were harvested after 8 weeks of growth under high-nutrient conditions because they grew too fast and touched the PTFE cover; however, we harvested those plants grown under low-nutritional conditions after another 4 weeks of growth due to their very small size (see Fig. S6).”

      (see Method in Line 514-517).

    1. Author response:

      Reviewer #1 (Public Review):

      This study by Popli et al. evaluated the function of Atg14, an autophagy protein, in reproductive function using a conditional knockout mouse model. The authors showed that female mice lacking Atg14 were infertile partly due to defective embryo transport function of the oviduct and faulty uterine receptivity and decidualization using PgrCre/+; Atg14f/f mice. The findings from this work are exciting and novel. The authors demonstrated that a loss of Atg14 led to an excessive pyroptosis in the oviductal epithelial cells that compromises cellular integrity and structure, impeding the transport function of the oviduct. In addition, the authors use both genetic and pharmacological approaches to test the hypothesis. Therefore, the findings from this study are high-impact and likely reproducible. However, there are multiple major concerns that need to be addressed to improve the quality of the work.

      We thank the reviewer for the insightful comments and helpful suggestions. We will address the majority of the concerns. Specifically, we will evaluate whether loss of Atg14 leads to pyroptosis in other reproductive tract tissues, namely the uterus and ovary. To determine the spatiotemporal expression of ATG14, we will assess ATG14 expression in the oviducts of WT and cKO mouse models. Further, to understand the impact of Atg14 loss on different regions of the oviduct, we will provide additional images from cKO mice and quantify FOXJ1-positive cells. To address the concerns about cyclicity and steroid hormone levels, we will measure the E2 or P4 levels and assess E2-target genes in uteri from control and cKO mice. We will also include ampullary section images from the oviducts of Atg14 cKO and control females.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Popli et al investigated the roles of the autophagy-related gene, Atg14, in the female reproductive tract (FRT) using conditional knockout mouse models. By ablation of Atg14 in both oviduct and uterus with PR-Cre (Atg14 cKO), the authors discovered that such females are completely infertile. They went on to show that Atg14 cKO females have impaired embryo implantation and uterus receptivity due to impaired response to P4 stimulation and stromal decidualization. In addition to the uterus defect, the authors also discovered that early embryos are trapped inside the oviduct and cannot be efficiently transported to the uterus in these females. They went on to show that oviduct epithelium in Atg14 cKO females showed increased pyroptosis, which disrupts oviduct epithelial integrity and leads to obstructive oviduct lumen and impaired embryo transport. Therefore, the authors concluded that autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable proper embryo transport.

      Strengths:

      This study revealed an important and unexpected role of the autophagy-related gene Atg14 in preventing pyroptosis and maintaining oviduct epithelial integrity, which is poorly studied in the field of reproductive biology. The study is well designed to test the roles of ATG14 in mouse oviduct and uterus. The experimental data in general support the conclusion and the interpretations are mostly accurate. This work should be of interest to reproductive biologists and scientists in the field of autophagy and pyroptosis.

      Weaknesses:

      Despite the strengths, there are several major weaknesses raising concerns. In addition, the mismatched figure panels, the undefined acronyms, and the poor description/presentation of some of the data significantly hinder the readability of the manuscript.

      (1) In the abstract, the authors stated that "autophagy is critical for maintaining the oviduct homeostasis and keeping the inflammation under check to enable embryo transport". This statement is not substantiated. Although Atg14 is an autophagy-related gene and plays a critical role in oviduct homeostasis, the authors did not show a direct link between autophagy and pyroptosis/oviduct integrity. In addition, the authors pointed out in the last paragraph of the introduction that none of the other autophagy-related genes (ATG16L, FIP200, BECN1) exhibited any discernable impact on oviduct function. Therefore, the oviduct defect is caused by Atg14 specifically, not necessarily by autophagy.

      We agree with the reviewer on this. We will take a cautious approach and modify the statements to say that ATG14-dependent autophagy might be critical for maintaining oviduct homeostasis and keeping inflammation in check to enable embryo transport.

      (2) In lines 412-414, the authors stated that "Atg14 ablation in the oviduct causes activation of pyroptosis", which is also not supported by the experimental data. The authors did not show that Atg14 is expressed in oviduct cells. PR-Cre is also not specific to oviduct cells. It is possible that Atg14 knockout in other PR-expressing tissues (such as the uterus) indirectly activates pyroptosis in the oviduct. More experiments will be required to support this claim. Given that no defect was seen when Atg14 was knocked out in oviduct ciliary cells, it would be good to use a secretory-cell Cre, such as Pax8-Cre, to demonstrate that Atg14 functions in the secretory cells of the oviduct, thus supporting this conclusion.

      To address Atg14 action in the oviduct, we will perform ATG14 IHC staining in the oviduct and also evaluate GSDMD expression in the uterus and ovary, where PR-Cre expression is active. Further, we will provide literature-based evidence for PR-Cre expression in the oviduct, which is well established. However, generating a secretory-cell Pax8-Cre mouse model will require a substantial amount of time and effort, and we respectfully argue that this is currently outside the scope of this manuscript.

      (3) With FOXJ1-Cre, the authors attempted to specifically knockout Atg14 in ciliary cells, but there are no clear fertility and embryo implantation defects in Foxj1/Atg14 cKO mice. The author should provide the verification data to show that Atg14 had been effectively depleted in ciliary cells if Atg14 is normally expressed.

      We will perform expression analysis for ATG14 in Foxj1/Atg14 cKO mice to determine the effective ablation in cilia.

      (4) In lines 307-313, the authors tested whether ATG14 is required for the decidualization of HESCs. The authors stated that "Control siRNA transfected cells when treated with EPC seemed to change their morphological transformation from fibroblastic to epithelioid (Fig. 2E) and had increased expression of the decidualization markers IGFBP1 and PRL by day three only (Fig. 2F)". First, the labels in Figure 2 do not correspond to the description in the text. Second, the morphology of the HESCs in the control and Atg14 siRNA groups showed no obvious difference even at day 3 and day 6. The authors should point out the difference in each panel and explain it in the text or figure legend.

      We will correct the labels and include high-magnification images to explain the morphological differences in HESCs.

      (5) In lines 332-336, the authors pointed out that the cKO mice oviduct lining shows marked eosinophilic cytoplasmic change, but there's no data to support the claim. In addition, the authors further described that "some of the cells showed degenerative changes with cytoplasmic vacuolization and nuclear pyknosis, loss of nuclear polarity, and loss of distinct cell borders giving an appearance of fusion of cells (Fig. 3D)". First, Figure 3D did not show all these phenotypes and it is likely a mismatch to Figure 3E. Even in Figure 3E, it is not obvious to notice all the phenotypes described here. The figure legend is overly simple, and there's no explanation of the arrowheads in the panel. More data/images are required to support the claim here and provide a clear indication and explanation in the figure legend.

      Dr. Ramya Masand, Chief Pathologist in our department and a contributing author, critically evaluated the stained sections from Figure 3 and provided the pathological assessment as outlined in lines 332-336. We will consult Dr. Masand and will modify the statements accordingly.

      (6) In lines 317-325, it is rather confusing about the description of the portion of embryos from the oviduct and uterus. In addition, the total number of embryos was not provided. I would recommend presenting the numerical data to show the average embryos from the oviduct and uterus instead of using the percentage data in Figures 3A and 5G.

      We will calculate the average number of embryos from the oviduct and uterus and provide numerical data.

      (7) In lines 389-391, authors tested whether Polyphyllin VI treatment led to activated pyroptosis and blocked embryo transport. Although Figures 5F-G showed the expected embryo transport defect, the authors did not show the pyroptosis and oviduct morphology. It will be important to show that the Polyphyllin VI treatment indeed led to oviduct pyroptosis and lumen disruption.

      We will perform the GSDMD staining to determine whether Polyphyllin VI treatment resulted in oviductal pyroptosis activation and lumen disruption.

      (8) In line 378, it would be better to include a description of pyroptosis and its molecular mechanisms to help readers better understand your experiments. Alternatively, you can add it in the introduction.

      We will include more literature-based discussion on pyroptosis and its mechanism.

      (9) Please make sure to provide definitions for the acronyms such as FRT, HESCs, GSDMD, etc.

      We will provide definitions for the acronyms such as FRT, HESCs, and GSDMD.

      (10) It is rather confusing to use oviducal cell plasticity in this manuscript. The work illustrated the oviducal epithelial integrity, not the plasticity.

      We will correct the statement.

      A few of the additional comments for authors to consider improving the manuscript are listed below.

      (1) Some of the figures are missing scale bars, while others have inconsistent scale bars. It would be better to be consistent.

      (2) On a couple of occasions, the DAPI signal cannot be seen, such as in Figure 2B and Figure 3D.

      (3) Overall, the figure legends can be improved to provide more detailed information to help the reader to interpret the data.

      As suggested, we will include scale bars with high-quality images and will expand the figure legends.

      (4) In Figure 2D, the Y-axis showed the stimulated/unstimulated uterine weight ratio, why did the author put "Atg14" at the top of the graph? At the same time, the X-axis title is missing in Figure 2D.

      (5) In the left panel of Figure 2G, "ATG14" at the top should be "Atg14" to be consistent.

      (6) In line 559, "(A)" is missing in front of Immunofluorescence analysis of GSDMD.

      We will make these necessary changes.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Pooja Popli and co-authors tested the importance of Atg14 in the female reproductive tract by conditionally deleting Atg14 using Pr Cre and Foxj1cre. The authors showed that loss of Atg14 leads to infertility due to the retention of embryos within the oviduct. The authors further concluded that the retention of embryos within the oviduct is due to pyroptosis in oviduct cells leading to defective cellular integrity. The manuscript has some interesting findings, however there are also areas that could be improved.

      Strengths:

      The importance of Atg14 and autophagy in the female reproductive tract is incompletely understood. The manuscript also provides spatial evidence about a new mechanism linking Atg14 to pyroptosis.

      Weaknesses:

      (1) It is not clear why the loss of Atg14 selectively induces Pyroptosis within oviduct cells but not in other cellular compartments. The authors should demonstrate that these events are not happening in uterine cells.

      We will carry out GSDMD staining in uterine tissues and discuss the findings.

      (2) The manuscript never showed any effect on the autophagy upon loss of Atg14. Is there any effect on autophagy upon Atg14 loss? If so, does that contribute to the observation?

      We will assess the expression of autophagy-related markers in response to Atg14 loss and will discuss the findings. 

      (3) It is not clear what the authors meant by cellular plasticity and integrity. There is no evidence provided in that aspect that the plasticity of oviduct cells is lost. Similarly, more experimental evidence is necessary for the conclusion about cellular integrity.

      We agree with the reviewer on the cellular plasticity aspect. We will remove the word "plasticity" and refer only to integrity.

      (4) The mitochondrial phenotype shown in Figure 3 didn't appear as severe as it is described in the results section. The analyses should be more thorough. They should include multiple frames (in supplemental information) showing mitochondrial morphology in multiple cells. The authors should also test that aspect in uterine cells. The authors should measure Feret's diameter, difference in membrane potential, etc. for a definitive conclusion.

      We will perform additional mitochondrial staining to determine the mitochondrial morphology in both the oviduct and uterus. Based on the results, we will consider measuring Feret's diameters. However, we respectfully argue that complex membrane potential studies will take time and are beyond the scope of the current study.

      (5) The comment that the loss of Atg14 and pyroptosis leads to the narrowing of the lumen in the oviduct should be experimentally shown.

      As shown in Figure 3E, staining the oviduct epithelia with KRT8 clearly showed a disorganized oviduct with abnormally fused cells leaving no lumen space. We could provide higher magnification images in supplementary figures to highlight this observation.

      (6) The manuscript never showed the proper mechanism through which Atg14 loss induces pyroptosis. The authors should link the mechanism.

      Autophagy has been shown to inhibit pyroptosis by either inhibiting the cleavage of GSDMD or by suppressing various pyroptosis-related factors, including NFLRs and STING proteins. We found that the loss of Atg14 results in elevated GSDMD levels, a potential mechanism through which Atg14 suppresses pyroptosis in the oviduct. Importantly, Atg14 may regulate GSDMD through several intermediary factors, and resolving this intricate nexus will require complex biochemical, cellular, and molecular screenings, which is one focus of our future investigations.

  3. May 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) Authors need to acknowledge the physical effort in addition to visual information for the spatial coding and may consider the manipulation of physical efforts in the future to support the robustness of constant intrinsic bias in ground-based spatial coding during walking.

      Whether one’s physical effort can affect spatial coding for visual perception is not a settled issue. Several empirical studies have not been able to obtain evidence to support the claim. For example, Hutchison & Loomis (2009) and Durgin et al. (2009) did not find that wearing a heavy backpack significantly influenced distance perception, in contrast to the findings of Proffitt et al. (2003). We respectfully request not to discuss this issue in our revision since it is not closely related to the focus of the current study.

      (2) Furthermore, it would be more comprehensive and fit better into the Neuroscience Section if the authors could add the current understanding of spatial reference frames in neuroscience to the introduction and discussion, and explain how the findings of this study supplement the physiological evidence that supports our spatial perception. For instance, world-centered representations of the environment, or cognitive maps, are associated with the hippocampal formation, while self-centered spatial relationships, or image spaces, are associated with the parietal cortex (see Bottini, R., & Doeller, C. F. (2020). Knowledge Across Reference Frames: Cognitive Maps and Image Spaces. Trends in Cognitive Sciences, 24(8), 606-619. https://doi.org/10.1016/j.tics.2020.05.008 for details).

      We have now added this important discussion in the revision on pages 12-13.

      We thank the reviewer for the helpful comments.

      Reviewer 2:

      (1) ….As a result, it is unclear to what extent this "allocentric" intrinsic bias is involved in our everyday spatial perception. To provide more context for the general audience, it would be beneficial for the authors to address this issue in their discussion.

      We have clarified this on pages 3-4. In brief, our hypothesis is that during self-motion, the visual system constructs an allocentric ground surface representation (reference frame) by integrating the allocentric intrinsic bias with the external depth cues on the natural ground surface. Supporting this hypothesis, we recently found that when there is a texture cue on the ground, the representation of the ground surface is influenced by the allocentric intrinsic bias (Zhou et al., unpublished results).

      (2) The current findings on the "allocentric" coding scheme raise some intriguing questions as to why such a mechanism would be developed and how it could be beneficial. The finding that the "allocentric" coding scheme results in less accurate object localization and requires attentional resources seems counterintuitive and raises questions about its usefulness. However, this observation presents an opportunity for the manuscript to discuss the potential evolutionary advantages or trade-offs associated with this coding mechanism.

      The revision has discussed these important issues on page 12.

      (3) The manuscript lacks a thorough description of the data analysis process, particularly regarding the fitting of the intrinsic bias curve (e.g., the blue and gray dashed curve in Figure 3c) and the calculation of the horizontal separation between the curves. It would be beneficial for the authors to provide more detailed information on the specific function and parameters used in the fitting process and the formula used for the separation calculation to ensure the transparency and reproducibility of the study's results.

      The results of the statistical analysis were presented in the supplementary materials. We had stated in the original manuscript that we fitted the intrinsic bias curve by eye (obtained by drawing the curve to transcribe the data points as closely as possible) (page 26). This is because we do not yet have a formula for the intrinsic bias. A challenge is that the measured intrinsic bias in the dark can be affected by multiple factors. One factor is related to individual differences, as the intrinsic bias is shaped by the observer’s past experiences and their eye height relative to the ground surface. However, it is certainly our goal to develop a quantitative model of the intrinsic bias in the future.

      We thank the reviewer for the helpful comments.

      Reviewer 3:

      (1) I am a bit confused by Figure 2b. Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer. In Figure 2, however, the authors assumed that the perceived target was located on the interception between the intrinsic bias curve and the viewing line from the NEW eye position to the target. This suggests that the perceived object depends on the observer's new location, which seems odd with the allocentric coordinate hypothesis.

      We respectfully disagree with the Reviewer’s statement that “Allocentric coordinate refers to the representation of the distance and direction of an object relative to other objects but not relative to the observer.” The statement conflates the definition of allocentric representation with that of exocentric representation. We respectfully maintain that the observer’s body location, as well as observer-object distance, can be represented with the allocentric coordinate system.

      (2) According to Fig 2b, the perceived size should be left-shifted and lifted up in the walking condition compared to that in the stationary condition. However, in Figure 3C and Fig 4, the perceived size was the same height as that in the baseline condition.

      We assume that by “target size”, the Reviewer actually meant “target location”. It is correct that figure 3c and figure 4 showed that judged distance changed as predicted, while the change in judged height was not significant. One explanation for this is that the magnitude of the height change was much smaller than the distance change and could not be revealed by our blind walking-gesturing method. Please also note that our figures used different scales for the vertical height and horizontal distance.

      (3) Is the left-shifted perceived distance possibly reflecting a kind of compensation mechanism?  Participants could not see the target's location but knew they had moved forward.  Therefore, their brain automatically compensates for this self-movement when judging the location of a target.  This would perfectly predict the left-shifted but not upward-shifted data in Fig 3C.  A similar compensation mechanism exists for size constancy in which we tend to compensate for distance in computing object size.

      We assume the Reviewer suggested that the path-integration mechanism first estimates the traveled distance in the dark, and then the brain subtracts the estimated distance from the perceived target distance.  We respectfully maintain that this explanation is unlikely because it does not account for our empirical findings.  We found that walking in the dark did not uniformly affect perceived target distance, as the Reviewer’s explanation would predict.  As shown in figures 3 and 4, walking affected the near targets less than the far targets (i.e., the horizontal distance difference between walking and baseline-stationary conditions was smaller for the near target than far target).
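
      The argument above can be illustrated with a small numerical sketch. It assumes a hypothetical compensation model in which a single estimate of the walked distance is subtracted from every perceived target distance; the function name and all numbers are illustrative assumptions, not data or methods from the study. Under such a model, the separation between the walking and baseline conditions is the same for near and far targets, which is not the pattern reported in figures 3 and 4.

```python
def compensated_distance(perceived_distance, estimated_walk):
    """Hypothetical compensation model (an assumption for illustration):
    subtract the estimated walked distance from the perceived distance."""
    return perceived_distance - estimated_walk


# Illustrative (not measured) baseline perceived distances, in metres.
baseline = [2.0, 4.0, 6.0]
# Illustrative path-integration estimate of the walked distance, in metres.
estimated_walk = 0.5

walking = [compensated_distance(d, estimated_walk) for d in baseline]

# Horizontal separation between baseline and walking, per target.
separations = [b - w for b, w in zip(baseline, walking)]

# A uniform subtraction predicts the same separation at every distance.
uniform_shift = all(abs(s - estimated_walk) < 1e-9 for s in separations)
print(separations, uniform_shift)
```

      A separation that grows with target distance, as the response describes, would therefore need something other than a fixed subtraction.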

      (4) According to Fig 2a, the target, perceived target, and eye should be aligned in one straight line. This means that connecting the physical targets and the corresponding perceived target results in straight lines that converge at the eye position. This seems, however, unlikely in Figure 3c.

      In the revision, we have added the averaged eye positions to the y-axes of figures 3 and 4. To reveal the impact of the judged angular declination, we also added graphs that plot the estimated angular declination as a function of the physical declination of the target. In general, the slopes are close to unity.
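
      As a minimal sketch of the quantities involved, the angular declination of a target on the ground can be computed from eye height and horizontal distance as atan(eye height / distance), and the slope relating judged to physical declination can be obtained by ordinary least squares. The eye height, distances, and judged values below are illustrative assumptions, and the fitting procedure is not necessarily the one used in the study.

```python
import math


def angular_declination_deg(eye_height, ground_distance):
    """Angle below eye level (degrees) of a ground target at a given distance."""
    return math.degrees(math.atan(eye_height / ground_distance))


def least_squares_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    den = sum((xi - mean_x) ** 2 for xi in x)
    return num / den


# Illustrative (not measured) values: eye height and target distances in metres.
eye_height = 1.6
distances = [2.0, 4.0, 6.0]
physical = [angular_declination_deg(eye_height, d) for d in distances]

# Hypothetical judged declinations: a constant 1-degree offset from physical.
judged = [p + 1.0 for p in physical]

slope = least_squares_slope(physical, judged)
print(round(slope, 6))
```

      A constant offset in judged declination leaves the slope at unity, so a slope near one indicates that judged declination tracks physical declination even if it is shifted.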

      We thank the reviewer for the helpful comments.

      Recommendations for the authors:

      Reviewer 1 (Recommendations For The Authors):

      (1) This study is very well-designed and written. One minor comment is that anisotropy usually refers to the perceptual differences along cardinal (horizontal + vertical) and oblique directions. It might be clearer if the authors changed the "horizontal-vertical anisotropy" to "horizontal/vertical asymmetry".

      The Reviewer is correct, and we have changed it to horizontal/vertical asymmetry (pages 8 and 11).

      Reviewer 2 (Recommendations For The Authors):

      (1) Providing more details about the "path integration mechanism" when it is first introduced in line 44 would be helpful for readers to better understand the concept.

      The revision has expanded on the path integration mechanism (page 4).

      Adding references for the statement starting with "In fact, previous findings" in lines 218 and would be helpful to provide readers with a basis for comparison between the current study and previous studies that reported an egocentric coding system.

      We have added the references and elaborated on this important issue (pages 10-11).

      (2) There appears to be a discrepancy between the Materials and Methods section, which states that 14 observers participated in Experiments 1-4, and the legends of Figures 3 and 4, which indicates a sample size of "n=8." It would be helpful if the authors could clarify this discrepancy and provide an explanation for the difference in the sample size reported.

      We have clarified the number of observers on page 14.

      (3) While reporting statistical significance is essential in the Results section, there are several instances where the manuscript only mentions a "statistically significant separation" with its p-value without providing the mean and standard deviation of the separation values (e.g., lines 100 and 120). This can make it difficult for readers to fully grasp the quantitative nature of the results.

      The statistical analysis and outcomes were presented in the supplementary information document in our original submission.

      Reviewer 3 (Recommendations For The Authors):

      (1) Figure 1 is not significantly related to the current manuscript.

      We feel that retaining figure 1 in the manuscript would help readers to quickly grasp the background literature without having to refer extensively to our previous publications.

      (2) Add eye position to the results figures.

      We have added eye positions in the figures.

      (3) Fig 4c requires a more detailed explanation. The authors stated that Figures 4a and 4c showed consistent results. However, because 4a and 4c use different horizontal axes, it is difficult to compare them directly.

      We have modified the sentence in the revision (page 8).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) For a number of experiments the authors use their new data set on females and compare that with the data set previously published on males. In how far are these data sets comparable? Have they been performed originally in parallel for example using siblings of different sexes or have the experiments been conducted several years apart from each other? What is the expected variability, if one repeated these experiments with the same sex considering the differences/similarities between experimental setups, housing conditions, interindividual differences, etc.? 

      This is an important point. We did our best to collect the data in similar conditions (same set-ups; same animal housing conditions) and in experimental cohorts including both males and females. While some data from males were published first, the acquisition of male and female data was done in the same time period.

      Specifically, all results shown in Figure 1 and Figure 2 (Serum leptin, PPARalpha, AMPK, RNAseq) come from samples (from both males and females) that were processed at the same time and in similar conditions, by the same authors (Z.P. and P. M.).

      For the in vivo data (Figure 3, Supplementary figure 1), the male and female data were collected within a 1–2-year timeframe, in the same setups, by the same two authors (Z.P., D.K.). The males and females were housed under similar conditions (same room, same cage type, in groups of 25). We did not use siblings of different sexes. Independent cohorts (1-12 months apart), including both males and females, went into each data set. The within cohort variability does not obviously differ from between cohort variability, however the n number of animals is too small to confirm this with sufficient statistical power. 

      Altogether, the differences observed between male and female data cannot be explained by the timing and conditions of data acquisition from both sexes.

      (2) Energy consumption and visual processing may differ between periods in which animals are in different behavioral states. Is there a possibility that male and female mice differed in behavioral state during measurements? Were animals running or resting during visual stimulation and during ATP measurements? 

      We thank the reviewer for this suggestion. We have now edited the text and included a new supplementary figure. All in vivo experiments were done in stationary animals that were resting in a cardboard tube both during 2-photon imaging and ATP measurements. Animals were also well habituated to the setup. In addition, we imaged pupil diameters during the in vivo imaging sessions. We quantified pupil diameter during visual stimulation and did not find a sex difference (Supplemental Figure 2). Thus, we did not find a significant difference in behavioural or attentional state between sexes in our experimental conditions.

      We have edited the text to include this information (lines 183-185).

      (3) Related to the previous point: the authors show that ATP consumption was reduced in male mice during visual stimulation. What about visual cortex ATP consumption in the absence of visual stimulation? Do food-deprived males and/or females show lower ATP consumption in the visual cortex e.g. during sleep? 

      We have repeated V1 ATP imaging experiments in the dark, in the absence of visual stimulation, in both males and females (Supplementary figure 1). ATP consumption rates are slower in the dark vs. during visual stimulation. Moreover, we find that in the dark, there is no difference in ATP consumption rate between control and food restricted animals of either sex. Thus, the reduced ATP consumption we found with food restriction in males is related specifically to the active processing of visual information.

      We have edited the text to include this information (lines 158-159).

      Reviewer 2:

      (1) It appears that the authors have the data for doing decoding analysis, similar to Fig 6D in their previous paper. However, this analysis has not been done for this study. This would be good to include.  If the authors have attempted the behavioural discrimination tests on female mice as in the previous study, this would also be useful to include. 

      The first point of the reviewer is about datasets acquired in males that are included in our previous publication (Padamsey et al., 2022) but not compared to female data in the present manuscript.

      Whilst we fully agree that these results would be very useful, we did not have the resources (in terms of skilled researcher and funding) to perform these experiments in female mice. That is why these results are not included in this manuscript.

      (2) There appears to be an inconsistency in the methods of reporting OSI. It states that the OSI of grating-responsive neurons was calculated as 1 - circular variance. But then OSI is defined as simply abs(). Also, it would be good to be consistent about reporting medians as the median without confounding with the average (which is the mean). Sentences such as the following do not make sense: The average OSI for an animal was taken as the median OSI value calculated across neurons. This should be corrected throughout the manuscript, where the average is mentioned but the median is measured. 

      We thank the reviewer for noting this issue and we apologize for the confusion. We have now clarified the above in the manuscript (lines 587-603) and inserted the following reference for the detailed explanation of OSI and DSI calculation: Mazurek M, Kager M, Van Hooser SD. Robust quantification of orientation selectivity and direction selectivity. Front Neural Circuits. 2014. https://doi.org/10.3389/fncir.2014.00092

      In the figure showing the orientation tuning, the authors have collapsed the two directions of each orientation together. However, if I understand correctly, the calculation of OSI does not do this step of collapsing. In this case, and in the interest of revealing more useful features of the data instead of averaging them out, it would be good to show the average tuning curves with and without FR for all directions, not collapsed. 

      As with orientation tuning, we found that direction tuning is reduced with food restriction, and that this is significant in males, but not in females. These results are now included in the text, with statistics (lines 179-180) and in Supplemental Figure 3.
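      For readers unfamiliar with the "1 - circular variance" definition of OSI mentioned above, the following is a minimal sketch of the vector-sum form described by Mazurek et al. (2014). It is an illustration only, not the authors' analysis code; the function names, the eight-direction stimulus set, and the example responses are assumptions.

```python
import cmath
import math


def vector_selectivity(angles_deg, responses, harmonic):
    """Normalised length of the response vector sum.

    harmonic=2 folds opposite directions together (orientation space),
    which gives 1 - circular variance; harmonic=1 keeps them apart
    (direction space).
    """
    total = sum(responses)
    if total <= 0:
        raise ValueError("summed response must be positive")
    vector = sum(
        r * cmath.exp(1j * harmonic * math.radians(a))
        for a, r in zip(angles_deg, responses)
    )
    return abs(vector) / total


def osi(angles_deg, responses):
    """Orientation selectivity index as 1 - circular variance."""
    return vector_selectivity(angles_deg, responses, harmonic=2)


def dsi(angles_deg, responses):
    """Direction selectivity index from the vector sum in direction space."""
    return vector_selectivity(angles_deg, responses, harmonic=1)


# Hypothetical neuron responding equally to two opposite directions.
directions = [0, 45, 90, 135, 180, 225, 270, 315]
responses = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
neuron_osi = osi(directions, responses)
neuron_dsi = dsi(directions, responses)
```

      In this sketch, a neuron that responds equally to two opposite directions is fully orientation selective but not direction selective, which is why the two indices are computed without collapsing the directions first.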

      Reviewer 3:

      l. 183-187 The discussion based on the idea that "The Bayes factor analysis helps to differentiate the absence of evidence from the evidence of absence." does not seem very helpful. Using a statistical criterion makes less sense than providing the reader with an estimate of the largest effect size (if there is any) that is compatible with the observation. If there were a significant effect but of a very small size, would it change the authors' conclusion? That seems unlikely. I recommend removing the sentence on line 184, which is in fact not used afterwards.

      We agree with the reviewer. We have now removed the sentence and rephrased (lines 202-208).  

      Editor's note: 

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05.

      We now provide exact p-values alongside the summary statistics (test statistic and df) and 95% confidence intervals for all key results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this article, the authors investigate whether the connectivity of the hippocampus is altered in individuals with aphantasia, that is, people who have reduced mental imagery abilities, some of whom describe having no imagery while others describe having vague and dim imagery. The study investigated this question using an fMRI paradigm, where 14 people with aphantasia and 14 controls were tested, and the researchers were particularly interested in the key regions of the hippocampus and the visual-perceptual cortices. Participants were interviewed using the Autobiographical Interview regarding their autobiographical memories (AMs), and internal and external details were scored. In addition, participants were queried on their perceived difficulty in recalling memories, imagining, and spatial navigation, and their confidence regarding autobiographical memories was also measured. Results showed that participants with aphantasia reported significantly fewer internal details (but not external details) compared to controls; that they had lower confidence in their AMs; and that they reported finding remembering and imagining in general more difficult than controls. Results from the fMRI section showed that people with aphantasia displayed decreased hippocampal and increased visual-perceptual cortex activation during AM retrieval compared to controls. In contrast, controls showed strong negative functional connectivity between the hippocampus and the visual cortex. Moreover, resting state connectivity between the hippocampus and visual cortex predicted better visualisation skills. The authors conclude that their study provides evidence for the important role of visual imagery in detail-rich vivid AM, and that this function is supported by the connectivity between the hippocampus and visual cortex. This study extends previous findings of reduced episodic memory details in people with aphantasia, and enables us to start theorising about the neural underpinnings of this finding.

      The data provided good support for the conclusion that the authors draw, namely that there is a 'tight link between visual imagery and our ability to retrieve vivid and detail-rich personal past events'. However, as the authors also point out, the exact nature of this relationship is difficult to infer from this study alone, as the slow temporal resolution of fMRI cannot establish the directionality between the hippocampus and the visual-perceptual cortex. This is an exciting future avenue to explore.

      We thank the reviewer for highlighting our contributions and suggesting that the relationship between visual imagery and autobiographical memory recall is an exciting future avenue.

      Weaknesses:

      A weakness of the study is that some of the questions used are a bit vague, and no objective measure is used, which could have been more informative. For example, the spatial navigation question (reported as 'How difficult is it typically for you to orient you spatially?' - a question which is ungrammatical, but potentially reflects a typo in the manuscript) could have been more nuanced to tap into whether participants relied mostly on cognitive maps (likely supported by the hippocampus) or landmarks. It would also have been interesting to conduct a spatial navigation task, as participants do not necessarily have insight into their spatial navigation abilities (they could have been overconfident or underconfident in their abilities).

      Secondly, the question 'how difficult is it typically for you to use your imagination?' could also be more nuanced, as imagination is used in a variety of ways, and we only have reason to hypothesise that people with aphantasia might have difficulties in some cases (i.e. sensory imagination involving perceptual details). It is unlikely that people with aphantasia would have more difficulty than controls in using their imagination to imagine counterfactual situations and engage in counterfactual thought (de Brigard et al., 2013, https://doi.org/10.1016%2Fj.neuropsychologia.2013.01.015) due to its non-sensory nature, but the question used does not distinguish between these types of imagination. Again, this is a ripe area for future research. The general phrasing of 'how difficult is [x]' could also potentially bias participants towards more negative answers, something which ought to be controlled for in future research.

      The main goal of our study was to examine autobiographical memory recall. Therefore, we used the gold standard Autobiographical Interview, or AI (Levine et al. 2002), and an fMRI paradigm to explore autobiographical memory recall in as standardised, precise, and objective a manner as possible.

      In addition to these experimentally rigorous tasks, we employed some loosely formulated questions with the intention for people to reflect on how they perceive their own abilities to recall autobiographical memories, navigate spatially, and use their imagination. We agree with the reviewer that these questions are vague and did not have the experimental standard for an investigation into spatial cognition or imagination associated with aphantasia. Nonetheless, we believe that these questions provide important additional insights into what participants think about their own cognitive abilities. In order to set these questions into perspective, we argue in the discussion that spatial cognition and other cognitive functions should be investigated in more depth in individuals with aphantasia in the future.

      As an additional note, all tasks were conducted in German. Thus, we were able to correct the wording of the debriefing question in our revision. We thank the reviewer for bringing this to our attention.

      Strengths:

      A great strength of this study is that it introduces an fMRI paradigm in addition to the autobiographical interview, paralleling work done on episodic memory in cognitive science (e.g. Addis and Schacter, 2007, https://doi.org/10.1016%2Fj.neuropsychologia.2006.10.016 ), which has examined episodic and semantic memory in relation to imagination (future simulation) in non-aphantasic participants as well as clinical populations. Future work could build on this study, and for example use the recombination paradigm (Addis et al. 2009, 10.1016/j.neuropsychologia.2008.10.026 ), which would shed further light on the ability of people with aphantasia to both remember and imagine events. Future work could also build on the interesting findings regarding spatial navigation, which together with previous findings in aphantasia (e.g. Bainbridge et al., 2021, https://doi.org/10.1016/j.cortex.2020.11.014 ) strongly suggest that spatial abilities in people with aphantasia are unaffected. This can shed further light on the different neural pathways of spatial and object memory in general. In general, this study opens up a multitude of new avenues to explore and is likely to have a great impact on the field of aphantasia research.

      We much appreciate the acknowledgment of our work into autobiographical memory employing both the autobiographical interview and fMRI. Furthermore, we hope that our work inspires future research in the way the reviewer outlines and in the way we describe in our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study investigates to what extent neural processing of autobiographical memory retrieval is altered in people who are unable to generate mental images ('aphantasia'). Self-report as well as objective measures were used to establish that the aphantasia group indeed had lower imagery vividness than the control group. The aphantasia group also reported fewer sensory and emotional details of autobiographical memories. In terms of brain activity, compared to controls, aphantasics had a reduction in activity in the hippocampus and an increase in activity in the visual cortex during autobiographical memory retrieval. For controls, these two regions were also functionally connected during autobiographical memory retrieval, which did not seem to be the case for aphantasics. Finally, resting-state connectivity between the visual cortex and hippocampus was positively related to autobiographical vividness in the control group but negatively in the aphantasia group. The results are in line with the idea that aphantasia is caused by an increase in noise within the visual system combined with a decrease in top-down communication from the hippocampus.

      Recent years have seen a lot of interest in the influence of aphantasia on other cognitive functions and one of the most consistent findings is deficits in autobiographical memory. This is one of the first studies to investigate the neural correlates underlying this difference, thereby substantially increasing our understanding of aphantasia and the relationship between mental imagery and autobiographical memory.

      We thank the reviewer for highlighting the importance of our findings.

      Strengths:

      One of the major strengths of this study is the use of both self-report as well as objective measures to quantify imagery ability. Furthermore, the fMRI analyses are hypothesis-driven and reveal unambiguous results, with alterations in hippocampal and visual cortex processing seeming to underlie the deficits in autobiographical memory.

      Once again, we thank the reviewer for highlighting the quality of our methods and our results.

      Weaknesses:

      In terms of weaknesses, the control task, doing mathematical sums, also differs from the autobiographical memory task in aspects that are unrelated to imagery or memory, such as self-relevance and emotional salience, which makes it hard to conclude that the differences in activity are reflecting only the cognitive processes under investigation.

      We agree with the reviewer that our control task differs from autobiographical memory in many different ways. In fact, for this first investigation of the neural correlates of autobiographical memory in aphantasia, this is precisely the reason why we chose this mental arithmetic (MA) task. We know from previous studies, that MA is, as much as possible, not dependent on hippocampal memory processes (Addis, et al. 2007, McCormick et al. 2015, 2017, Leelaarporn et al., 2024). The main goal of the current study was to establish whether there are any differences between individuals with aphantasia and controls. In the next investigation, we can now build on these findings to disentangle in more detail what this difference reflects. 

      Overall, I believe that this is a timely and important contribution to the field and will inspire novel avenues for further investigation.

      This highly positive conclusion is much appreciated.

      References

      Addis, D. R., Wong, A. T., & Schacter, D. L. (2007). Remembering the past and imagining the future: Common and distinct neural substrates during event construction and elaboration. Neuropsychologia, 45(7), 1363-1377.

      Kriegeskorte, N., Simmons, W., Bellgowan, P. et al. Circular analysis in systems neuroscience: the dangers of double dipping. Nat Neurosci 12, 535–540 (2009). https://doi.org/10.1038/nn.2303

      Leelaarporn, P., Dalton, M. A., Stirnberg, R., Stöcker, T., Spottke, A., Schneider, A., & McCormick, C. (2024). Hippocampal subfields and their neocortical interactions during autobiographical memory. Imaging Neuroscience.

      Levine, B., Svoboda, E., Hay, J. F., Winocur, G., & Moscovitch, M. (2002). Aging and autobiographical memory: dissociating episodic from semantic retrieval. Psychology and Aging, 17(4), 677.

      McCormick, C., St-Laurent, M., Ty, A., Valiante, T. A., & McAndrews, M. P. (2015). Functional and effective hippocampal–neocortical connectivity during construction and elaboration of autobiographical memory retrieval. Cerebral Cortex, 25(5), 1297-1305.

      McCormick, C., Moscovitch, M., Valiante, T. A., Cohn, M., & McAndrews, M. P. (2018). Different neural routes to autobiographical memory recall in healthy people and individuals with left medial temporal lobe epilepsy. Neuropsychologia, 110, 26-36.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting article that makes a substantial contribution to the field of the study of aphantasia as well as the neural mechanisms of autobiographical memory. I would strongly recommend this manuscript to be accepted (with these minor revisions), as it makes a substantial and well-evidenced contribution to the research, and it opens up many interesting avenues for researchers to explore. I was especially excited to see that the Autobiographical Interview had been paired with an fMRI paradigm, something which this field of research highly benefits from, as there are yet so few fMRI studies into aphantasia. I understand that it is the authors' decision whether to accept or reject any of the revisions I recommend here, but I would like to stress that I encourage accepting the recommended revisions, especially as there are some minor inaccuracies in the manuscript as it currently stands. Finally, I would like to stress that though I am based in the area of cognitive science, am not trained in fMRI imaging techniques, and therefore do not stand in a position where I can comment on the methodology pertaining to this part of the study - I encourage the Editors to seek a second reviewer's opinion on this.

      Thank you for the positive evaluation of our manuscript as well as your comments. We have revised our manuscript according to your important suggestions as further explained below.

      Line 33: "aphantasia prohibits people from experiencing visual imagery". This  characterisation of aphantasia is too strong, especially as the authors use 32 as a cut-off point on the VVIQ, which represents weak and dim imagery. I would recommend using language like 'people with aphantasia have reduced visual imagery abilities', as this more accurately captures the group of people studied. Please revise throughout the manuscript. Please consult Blomkvist and Marks (2023) on this point who have discussed this problem in the aphantasia literature.

      We agree that aphantasics may experience reduced visual imagery abilities. We have revised our wording throughout the manuscript.

      Line 49: The authors conclude that their results 'indicate that visual mental imagery is essential for detail-rich, vivid AM', but this seems to be a bit too strong, for example since AM can be detail-rich with external (rather than internal) detail, and a person could potentially use mnemonic tricks such as keeping a detail-rich diary in order to boost their memory. That visual imagery is 'essential' implies that it is the only way to achieve detail-rich vivid AM, and this does not seem to be supported by the findings. I would recommend rephrasing it as 'visual mental imagery plays an important role in detail-rich, vivid AM' or 'visual mental imagery mediated detail-rich vivid AM'.

      We altered the sentence in Line 49 using one of the recommended phrases:

      ‘Our results indicate that visual mental imagery plays an important role in detail-rich, vivid AM, and that this type of cognitive function is supported by the functional connection between the hippocampus and the visual-perceptual cortex.’

      Line 69: Blomkvist and Marks (2023) have warned against calling aphantasia a 'condition' and this moreover seems to fit with the authors' previous research (Monzel, 2022). Please consider instead calling aphantasia an 'individual difference' in mental imagery abilities.

      Thank you for the suggestion. We have revised our wording throughout the manuscript, avoiding the term ‘condition’.

      Line 72: Add reference for emotional strength which has also been researched (Wicken et al. 2021, https://doi.org/10.1016/j.cortex.2020.11.014).

      We have added the suggested reference in Line 75:

      ‘Indeed, a handful of previous studies report convergent evidence that aphantasics report less sensory AM details than controls (Bainbridge et al., 2021; Dawes et al., 2020, 2022; Milton et al., 2020; Zeman et al., 2020), which may also be less emotional (Monzel et al., 2023; Wicken et al., 2021).’

      72-73: 'absence of voluntary imagery' - too strong as many people with aphantasia report having weak/dim mental imagery on the VVIQ.

      We agree that aphantasics may experience reduced visual imagery. We have revised this notion throughout the manuscript.

      74: Add reference to Bainbridge study which found a difference between recall of object vs spatial memory. This would be relevant here.

      We have added the suggested reference in Line 76:

      ‘Spatial accuracy, on the other hand, was not found to be impaired (Bainbridge et al., 2021).’

      Lines 94-97: The authors mention 'a prominent theory' but it is unclear which theory is referred to here. The article cited by Pearson (2019) does not suggest the possibility that aphantasia is due to altered connectivity between the hippocampus and visual-perceptual cortices. It suggests that aphantasia is due to impairment in the ventral stream, and in fact says that the hippocampus is unlikely to be affected due to spared spatial abilities in people with aphantasia. Specifically, Pearson claims: "Accordingly, memory areas of the brain that process spatial properties, including the hippocampus, may not be the underlying cause of aphantasia." (page 631). The authors further come back to this point in the discussion section (see comment below), saying that the hypothesis attributed to Pearson is supported by their study. I do not disagree with the point that the hypothesis is supported by the data, but it is unclear to me why the hypothesis is attributed to Pearson.

      Thank you for pointing out this inaccuracy. We have edited the text to spell out our entire train of thought (see Lines 96-102):

      ‘A prominent theory posits that because of this hyperactivity, small signals elicited during the construction of mental imagery may not be detected (Pearson, 2019, Keogh et al., 2020). Pearson further speculates that since spatial abilities seem to be spared, the hippocampus may not be the underlying cause of aphantasia. In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Line 97: Blomkvist reference should be 2022 (when first published online).

      The article ‘Aphantasia: In search of a theory’ by Blomkvist was first published on 1st July 2022. However, a correction was added on 13th March 2023. Therefore, we had cited the corrected version in this manuscript. However, we agree that the first publication date should be used and edited the reference accordingly.

      Line 116: 'one aphantasic' could be seen as offensive. I would suggest 'one aphantasic participant'.

      We have altered the paragraph according to your suggestion.

      Line 138: In line with the recommendations put forward by Blomkvist and Marks (2023), I would suggest removing the word 'diagnosed', as this medicalises aphantasia in a way that is not consistent with its not being a kind of mental disorder (Monzel et al., 2022). I would say that aphantasia is instead operationalised as a score between 16-32. However, note that Blomkvist (2022) and Blomkvist and Marks (2023, https://doi.org/10.1016/j.cortex.2023.09.004 ) point out that there is also a lot of inconsistency in this score and how it is used in different studies. In your manuscript, I would recommend removing all wording that indicates that people with aphantasia have no experience of mental imagery, as you have operationalised for a score up to 32 which indicates vague and dim imagery. Describing vague and dim imagery as no imagery/absence of imagery is inconsistent (but common practice in the literature).

      Thank you for your suggestion. We have revised the entire manuscript to eliminate any ambiguous meanings regarding the definition of aphantasia. Moreover, we replaced the word ‘diagnosed’ with ‘identified’ in Line 146.

      Line 153: maybe 'correlated with imagery strength' rather than 'measures imagery strength'?

      We have altered the sentence according to your suggestion in Line 160:

      ‘Previous studies have shown that the binocular rivalry task validly correlated with mental imagery strength.’

      Line 162: "For participants who were younger than 34 years, the middle-age memory was replaced by another early adulthood memory". Is there precedence for this? Please add one sentence to explain/justify for the reader why a memory from this time period was chosen.

      To maintain the homogeneous data set of acquiring five episodic autobiographical memories from five different periods of life per one individual, we asked the participants who were at the time of the interview, younger than 34 years old, to provide another early adulthood memory instead of middle age memory, as they had not reached the age range of middle age. According to Levine et al. (2002), younger adults (age < 34 years old) selected 2 events from the early adulthood period. Hence, all participants provided the last time period with memories from their previous year. We have added an additional explanation in this section in Line 170:

      ‘In order to acquire five AMs in every participant, the middle age memory was replaced by another early adulthood memory for participants who were younger than 34 years old (see Levine et al., 2002). Hence, all participants provided the last time period with memories from their previous year.’

      Line 169: "During the general probe, the interviewer asked the participant encouragingly to promote any additional details." Consider a different word choice, 'promote' sounds odd.

      We have altered the sentence according to your suggestion in Line 180:

      ‘During the general probe, the interviewer asked the participant encouragingly to provide any additional details.’

      Line 196-198: the phrasing of these questions could have biased participants toward reporting it being more difficult. Did the authors control for this possibility in any way? The phrasing ‘How easy is it for you to [x]?’ might also be considered in a future study.

      Thank you for pointing this out. These debriefing questions were thought of as open questions to get people to talk about their experiences. They were not meant as rigorous scientific experiments. Framing it in a positive way is a good idea for future research.

      We have edited the manuscript on Line 394-396:

      ‘The debriefing questions were employed as a way for participants to reflect on their own cognitive abilities. Of note, these were not meant to represent or replace necessary future experiments.’

      Line 197: This question is ungrammatical. Is this a typo, or was this how the question was actually posed? What language was the study conducted in?

      All interviews in this study were conducted in German. Hence, the questions listed in the manuscript were all translated from German into English. We have added this information to the Materials and Methods section in Line 169 and rephrased the questions concerned in Lines 208-210:

      ‘All interviews were conducted in German.’

      ‘(1) Typically, how difficult is it for you to recall autobiographical memories?

      (2) Typically, how difficult is it for you to orient yourself spatially? 

      (3) Typically, how difficult is it for you to use your imagination?’

      Line 211: The authors write that participants were asked to "re-experience the chosen AM and elaborate as many details as possible in their mind's eye" was this the instruction used? I think stating the explicit instruction here would be relevant for the reader. If this is the word choice, it is also interesting as the autobiographical interview does not normally specify to re-experience details 'in one's mind's eye'.

      The instructions given to the participants were to choose an AM and re-experience/elaborate it in their mind with as many details as possible without explaining them out loud. We have clarified this in Lines 221-223.

      ‘For the rest of the trial duration, participants were asked to re-experience the chosen AM and try to recall as many details as possible without speaking out loud.’

      Line 213: Were ‘vivid’ and ‘faint’ the only two options? Why was a 5-point scale (like the VVIQ scale) not used to better be able to compare?

      During the scanning session, participants were given a button box with two buttons: they pressed one with the index finger to indicate 'vivid' and the other with the middle finger to indicate 'faint'. We did not use a 5-point scale, to avoid confusion with the buttons during scanning. We have clarified this in Line 224:

      ‘We chose a simple two-button response in order to keep the task as easy as possible.’

      Line 347: Do the authors mean the same thing by 'imagery strength' and 'imagery vividness'? This would be good to clarify as it is not clear that these words mean the same thing.

      Imagery strength is often used to describe the results of the Binocular Rivalry Task, whereas vividness of mental imagery is often used to describe the results of the VVIQ. Although both tasks are correlated, the VVIQ measures vividness, whereas the dimension of the Binocular Rivalry Task is not clearly defined. We added this information in a footnote on page 10.

      Lines 353 - 356: When the authors first say that aphantasics described fewer memory details than controls, does this refer to external + internal details? Please clarify.

      Lines 353-360: The authors first say that aphantasics report "internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94)" (line 355). But then they say: "a 2-way interaction was found between the type of memory details and group, F(1, 27)= 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b)" (line 358). This seems to first say that aphantasics didn't report fewer details than controls, but then that they did report fewer internal details than controls. Please clarify if this is correct.

      Line 383: Results from controls are not reported in this section.

      We first reported the main effects of the different factors: aphantasics reported fewer details than controls (regardless of memory period and type of memory details), internal details were reported more often than external details (regardless of group and memory period), and more details were reported for recent than remote memories (regardless of group and type of memory details). Subsequently, we report the simple effects for aphantasics and controls separately. To clarify further, we added the following segment in Line 360:

      ‘Regarding the AI, we found significant main effects of memory period, F(1, 27) = 11.88, p = .002, ηp2 = .31, type of memory details, F(1, 27) = 189.03, p < .001, ηp2 = .88, and group, F(1, 27) = 9.98, p = .004, ηp2 = .27. When the other conditions were collapsed, aphantasics (M = 26.29, SD = 9.58) described less memory details than controls (M = 38.36, SD = 10.99). For aphantasics and controls combined, more details were reported for recent (M = 35.17, SD = 14.19) than remote memories (M = 29.06, SD = 11.12), and internal details (M = 43.59, SD = 17.91) were reported more often than external details (M = 20.64, SD = 8.94). More importantly, a 2-way interaction was found between type of memory details and group, F(1, 27) = 54.09, p < .001, ηp2 = .67, indicating that aphantasics reported significantly less internal memory details, t(27) = 5.07, p < .001, d = 1.83, but not significantly less external memory details, t(27) = 0.13, p = .898, compared to controls (see Figure 1b).’

      Overall, the results were reported for aphantasics and controls separately in Lines 368-372.
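
      As a reader-side consistency check on the statistics quoted above, partial eta squared can be recovered from each F statistic and its degrees of freedom via ηp2 = (F × df_effect) / (F × df_effect + df_error). The sketch below is only illustrative and is not part of the authors' analysis; the function name `partial_eta_squared` and the dictionary layout are assumptions introduced here, while the F values, degrees of freedom, and reported effect sizes are copied from the quoted passage.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported partial eta squared), as quoted in the response.
reported_effects = {
    "memory period": (11.88, 1, 27, 0.31),
    "type of memory details": (189.03, 1, 27, 0.88),
    "group": (9.98, 1, 27, 0.27),
    "details x group interaction": (54.09, 1, 27, 0.67),
}

for name, (f_value, df_effect, df_error, reported) in reported_effects.items():
    recovered = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{name}: recovered {recovered:.3f}, reported {reported:.2f}")
```

      Under this formula, each recovered value agrees with the reported effect size to two decimal places.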

      Line 386: The question does not specify that it's asking about using imagination in daily life, even though this is what results report. I'm not sure that the question implies the use of imagination in daily life, so I would recommend removing this reference here.

      We have removed “in daily life”, since this was not part of the original debriefing question.

      Line 394: Could this slowness in response reflect uncertainty about the vividness?

      Since the reason for this slowness is not known, we have refrained from adding this to the discussion. However, we added this as a short insertion in line 406:

      ‘Moreover, aphantasics responded slower (M = 1.34 s, SD = 0.38 s) than controls (M = 1.00 s, SD = 0.29 s) when they were asked whether their retrieved memories were vivid or faint, t(28) = 2.78, p = .009, possibly reflecting uncertainty in their response.’

      Line 443: Graph E, significance not indicated on the graph.

      After preprocessing, the fMRI data were statistically analyzed using the GLM contrast AM versus MA. The resulting images were then thresholded at p < 0.001, so that the highlighted voxels in Fig. 3 A, B, C, and D show only voxels for which we already know that there is a statistical difference between our conditions. Graph E illustrates only the descriptive means and variance of the significant differences in Fig. 3 C and D. This display is useful because the reader can assess the difference between the two conditions and the two groups at a glance. For a general discussion of this topic, please also see the literature on circular analysis in fMRI (Kriegeskorte et al., 2009).

      Line 521-522: The authors claim that Pearson (2019) forwards the hypothesis that heightened activity of visual-perceptual cortices hinders aphantasics from detecting small imagery-related signals. However, I find no statement of this hypothesis in Pearson (2019). It is unclear to me why this hypothesis is attributed to Pearson (2019). Please remove this reference or provide a correct citation for where the hypothesis is stated. Further, it is not clear from what is written how the results support this hypothesis as this is rather brief - please elaborate on this.

      We attributed this hypothesis to Pearson (2019) according to his Fig. 4, which states: ‘A strong top-down signal and low noise (bottom left) gives the strongest mental image (square), whereas a high level of neural noise and a weak top-down imagery signal would produce the weakest imagery experience (top right).’

      We have edited our manuscript to reflect Pearson better in Lines 543-550:

      ‘In a prominent review, Pearson synthesizes evidence about the neural mechanism of imagery strength (Pearson, 2019). Indeed, activity metrics in the visual cortex predict imagery strength (Cui et al., 2007; Dijkstra et al., 2017). Interestingly, lower resting activity and excitability result in stronger imagery, and reducing cortical activity in the visual cortex via transcranial direct current stimulation (tDCS) increases visual imagery strength (Keogh et al., 2020). Thus, one potential mechanism of aphantasia-related AM deficits is that the heightened activity of the visual-perceptual cortices observed in our and previous work hinders aphantasics to detect weaker imagery-related signals.’

      Line 575: Consider citing Blomkvist (2022) who has argued that aphantasia is an episodic memory condition

      We added the suggested reference in Line 601.

      Line 585: Consider citing Bainbridge et al (2021) https://doi.org/10.1016/j.cortex.2020.11.014

      We have added the suggested reference in Line 612.

      Line 581: It might be relevant here to also discuss non-visual details, which have indeed been investigated in your present study. E.g. the lower emotional details, temporal details, place details, etc.

      We have edited our discussion to reflect the non-visual details better in Line 605:

      ‘In fact, previous and the current study show that aphantasics and individuals with hippocampal damage report less internal details across several memory detail subcategories, such as emotional details and temporal details (Rosenbaum et al., 2008; St-Laurent et al., 2009; Steinvorth et al., 2005), and these deficits can be observed regardless of the recency of the memory (Miller et al., 2020). These similarities suggest that aphantasics are not merely missing the visual-perceptual details to specific AM, but they have a profound deficit associated with the retrieval of AM.’

      Place details are discussed on page 37 onwards.

      Line 605: I agree with this interesting suggestion for future research. It would also be relevant to reference Bainbridge (2021) here who tested spatial cognition in a drawing task and found that aphantasic participants correctly recalled spatial layouts of rooms but reported fewer objects than controls. It might also be worth pointing out that the present study does not actually test for accuracy in spatial cognition, so it could be the case that people with aphantasia feel confident that they can navigate well, but they might in fact not. Future studies relying on objective measures should test this possibility.

      We have added the suggested reference in Line 625.

      Lines 609-614: Is there any evidence that complex decision-making and complex empathy tasks depend on constructed scenes with visual-perceptual details? This hypothesis seems a bit far-fetched without any supporting evidence. In fact, it seems unlikely to be supported as we also know that people with aphantasia generally live normal lives, and often have careers that we can assume involve complex decision-making (see Zeman 2020 who report aphantasics who work as computer scientists, managers, etc). I would recommend that the authors provide evidence of the role of mental imagery in complex decision-making and complex empathy tasks, mediated by scene construction, to support this hypothesis as viable to test for future research. It is also unclear how this point connects to the argument made by Bergmann and Ortiz-Tudela (2023). In fact, Bergmann and Ortiz-Tudela seem to make the same argument as Pearson (2019) does - that aphantasia results from impairments in the ventral stream, but that the dorsal stream is unaffected. However, Blomkvist (2022) argues that this view is too simplistic to be able to account for the variety of deficits that we see in aphantasia. I would recommend either engaging more fully with this debate or cutting it, as it currently is too vague for a reader to follow.

      We have decided to leave the discussion about scene construction and its connection to complex decision making and empathy out of the current manuscript. We have included the argument of Bergmann & Ortiz-Tudela (2023) in the Introduction (Line 101):

      ‘In agreement, Bergmann and Ortiz-Tudela (2023) speculate that individuals with aphantasia might lack the ability to reinstate visually precise episodic elements from memory due to altered feedback from the visual cortex.’

      Reviewer #2 (Recommendations For The Authors):

      In general, I really enjoyed reading this paper.

      Thank you very much for the positive evaluation of our manuscript as well as your comments.

      There were only a few things that I had some concerns about. For example, it was unclear to me whether the whole-brain analysis (Figures 3 and 4) was corrected for multiple comparisons or why only a small volume correction was applied for the functional connectivity analysis. If these results are borderline significant, this should be made more explicit in the manuscript. I don't think this is a major issue as the investigation of both the hippocampus and visual cortex was strongly hypothesis-driven, but it would still be good to be explicit about the strength of the findings.

      For the whole-brain analysis, we applied a threshold of p < .001 with a voxel cluster size of 10, and no further multiple-comparisons correction. The peak in the right hippocampus did survive the whole-brain threshold, but we decided to lower this threshold for display purposes only in Figure 3, so that readers can easily see the cluster.

      We have made the statistical thresholds easier for the reader to assess on the following pages:

      Figure 3 (Page 27): ‘Images are thresholded at p < .001, cluster size 10, uncorrected, except (D) which is thresholded at p < .01, cluster size 10, for display purposes only (i.e., the peak voxel and adjacent 10 voxels also survived p < .001, uncorrected).’

      Figure 4 (Page 30): ‘Image is displayed at p < .05, small volume corrected, and a voxel cluster threshold of 10 adjacent voxels.’

      I was wondering whether it would be possible to use DCM to investigate the directionality of the connectivity. Given that there are only two ROIs and two alternative hypotheses (top-down versus bottom-up) this seems like an ideal DCM problem.

      We thank the reviewer for this suggestion and will consider testing the effective connectivity between both regions of interest in a future investigation. 

      Line 385: typo: 'great' should be 'greater'.

      We have corrected the typo from ‘great’ to ‘greater’ in Line 397.

      Line 400: absence of evidence of an effect is not evidence of absence of an effect.

      We agree with the reviewer that this was unclear. We changed the wording in Line 412:

      ‘In addition, aphantasics and controls did not differ significantly in their time searching for a memory in AM trials, t(19) = 1.03, p = .315.’

      Typo line 623: 'overseas'.

      We have altered the mistyped word from ‘overseas’ to ‘oversees’ in Line 647.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the Authors:

      Reviewer 1:

      (1) Figure legends are too sparing, and often fail to describe with enough detail and accuracy the experiments presented. Especially in a work like this one, which uses plenty of different approaches and techniques and has a concise main text, description in the figure legends can really help the reader to understand the technical aspects of the experimental design. In my opinion, this will also help highlight the effort the authors put into exploring different and often new technical approaches. 

      We thank Reviewer 1 for highlighting this point and agree that the original figure legends lacked detailed information. In this revised version of our paper, we edited all figure legends to provide greater detail on the experiments and the information displayed (see Main text p12-16, Supplementary Information p2-5). We hope this change will improve the clarity and accuracy of the description of our experiments.

      Reviewer 2:

      (1) Is there evidence that the early movement phenotype is actually linked to the larval movement phenotype? I noticed that the chordotonal driver experiment was only examined for larval movement. Is this driver not expressed earlier? Could the authors check the early phenotype using this driver? Are there early drivers that are expressed in chordotonal organ precursors (not panneuronal) and does the knockdown of CG3638 in these specific cells suppress the early phenotype?

      (2) More broadly, I would like to understand the function of the early embryonic movements. My concern is that they may only be a sign that the nervous system is firing up. If the rescue of the late miRNA mutant phenotype with chordotonal organ expression is only through a late change in the expression of CG3638, then the larval phenotype is probably not due to a developmental change, but a change in the immediate functioning of the neurons. Would this suggest that the early pulsing is not required for anything, at least at our level of understanding? If the driver is actually expressed early and late, then perhaps the authors could test later drivers to delimit the early and late functions of the miRNA? 

      The comments by Reviewer 2 in the points above are important and enquire about the biological role of early embryonic movements and whether these movements are linked to later larval activity or are somewhat irrelevant to the behaviour of the animal at later stages. 

      To address this important question, we conducted a new experiment in which we reduced neural activity specifically in the embryo (i.e. from 10 h AEL until the end of embryogenesis) and tested whether this treatment had any impact on larval movement. If – as put by Rev2 – the ‘early pulsing is not required for anything’ and the larval phenotype emerges from an acute change in neuronal physiology, then our experiment should show no effects at the larval stage. The results in Figure S4 (see Supplementary Information, p5) show that this is not the case: artificial reduction of neural activity during embryogenesis leads to a statistically significant reduction in larval speed, similar to that caused by the loss of miR-2b-1. This shows that modifications of embryonic activity impact larval movement.

      Furthermore, earlier work on the biological role of embryonic activity identified an activity-dependent ‘critical period’ during late embryogenesis (Giachello and Baines, 2015; Ackerman et al., 2021): manipulations at or around this critical period result in both locomotor and seizure phenotypes in larvae. We cite these papers in the main text (p7).

      In addition, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity specifically during the embryonic period prevents the generation of normal neural activity patterns in both embryo and larva. Similar results are observed when proprioceptive sensory inputs to the central nervous system are blocked, with larval locomotion also disrupted.

      Altogether, the data already in the literature plus our new addition to the paper, show that early embryonic movements play a key role in the development of the nervous system and larval locomotion.

      (3) Given the role in the larval chordotonal organs, have the authors also checked the adult movements? 

      The question of whether miR-2b-1 action in chordotonal organs affects behaviour at later stages of the Drosophila life cycle is interesting and was the reason why we assessed different genetic manipulations at the larval stage. However, we believe that assessing adult locomotor phenotypes is beyond the scope of this paper. 

      (4) The authors state that mir-2b-1 is a mirtron. I do not believe this is correct. It is not present in an intron in Btk from what I can see. Also, in the reference that the authors use when stating that mir-2b is a mirtron, I believe mir-2b-1 is actually used as a non-mirtron control miRNA. As mirtrons are processed slightly differently from regular hairpins and often use only the 3' end of the hairpin for miRNA creation, this may not be a trivial distinction. 

      We are grateful to Rev2 for highlighting this point: indeed, as they say, miR-2b-1 is located in the 3’UTR of the host gene Btk, rather than in an intron. Accordingly, in this revision we have removed the comment on miR-2b-1 being a mirtron (p6) and deleted the corresponding citation.

      (5) For miRNA detection, the authors use in situ hybridization and QPCR. Both methods show that the gene is expressed but not that the mature miRNA is made. If the authors wanted a truly independent test for the presence of the miRNA, a miRNA sensor might be a better choice and it would hint at which part of the hairpin makes the functional miRNA. This is probably not necessary but could be a nice addition. 

      We thank Rev2 for drawing attention to this point and allowing this clarification. The qPCR protocol we used is based on the method developed by Balcells et al., 2011 (w/303 citations) (see Materials and Methods section in Supplementary Information, p14) which allows the specific amplification of mature miRNA transcripts, and not their precursors. This method for mature miRNA PCR is so robust that it has even been patented (WO2010085966A2). To ensure that the reader is clear about our methods, we state in the main text (p6) that we perform "RT-PCR for the mature miRNA transcript".  [NB: miRNA sensors provide a useful method to assess miRNA expression but can also act as competitive inhibitors of physiological miRNA functions, titrating away miRNA molecules from their real targets in tissue; therefore, results using this method are often difficult to interpret.]

      (6) Curious about mir-2b-1 and any overlap with the related mir2b-2 and the mir2a genes. I am just wondering about the similarity in their sequences/targets and if they might have similar phenotypes or enhance the phenotypes being scored by the authors. 

      This is an interesting point raised by Rev2, and indeed miR-2b-1 does belong to the largest family of microRNAs in Drosophila, the miR-2 family, discussed in detail by Marco et al., 2012. However, we consider that performing tests of additional miRNA mutations, both individually and in combination with miR-2b-1, is beyond the scope of this paper.

      (7) Related to this, the authors show that the reduction of a single miRNA target suppresses the miRNA loss of function phenotype. This indicates that this target is quite important for this miRNA. I wonder if the target site is conserved in the human gene that the authors highlight.

      This is another interesting comment by Rev2. To pursue their idea, we performed a BLAST search for the miR-2b-1 target site in the human orthologs of CG3638 and did not find a match, suggesting that the relationship between miR-2b-1 and CG3638 is not evolutionarily conserved between insects and mammals.

      Public Reviews:

      Reviewer #1:

      Weaknesses: 

      The authors do not describe properly how the miRNA screening was performed and just claim that only miR-2b-1 mutants presented a defective motion phenotype in early L1. How many miRNAs were tested, and how candidates were selected is never explicitly mentioned in the text or the Methods section.

      We identified miR-2b-1 as part of a genetic screen aimed at detecting miRNAs with impact on embryonic movement, but this full screen is not yet complete. Seeing the clear phenotype of miR-2b-1 in the embryo prompted us to study this miRNA in detail, which is what we report in this paper.

      The initial screening to identify miRNAs involved in motion behaviors is performed in early larval movement. The logic presented by the authors is clear - it is assumed that early larval movement cannot proceed normally in the absence of previous embryonic motion - and ultimately helped them identify a miRNA required for modulation of embryonic movement. However, it is possible that certain miRNAs play a role in the modulation of embryonic movement while being dispensable for early L1 behaviors. Such regulators might have been missed with the current screening setup. Although similar changes to those described for the neurogenic phase of embryonic movement are described for the myogenic phase in miR-2b-1 mutants (reduction in motion amplitude), this phenotype goes unexplored. This is not a big issue, as the authors convincingly demonstrate later that miR-2b-1 is specifically required in the nervous system for proper embryonic and larval movement, and the effects of miR-2b-1 on myogenic movement might as well be the focus of future work. However, it will be interesting to discuss here the implications of a reduced myogenic movement phase, especially as miR-2b-1 is specifically involved in regulating the activity of the chordotonal system - which precisely detects early myogenic movements. 

      We thank Rev1 for their interest in that loss of miR-2b-1 results in a decrease in movement during the myogenic phase, in addition to the neurogenic phase. Indeed, two recent papers (Zeng et al., 2021; Carreira-Rosario et al., 2021) – which we cite in the main text (p5) – show that inhibition of muscle activity during a period that overlaps with the myogenic phase prevents the formation of normal neural activity patterns and larval locomotion. They also observe the same when inhibiting proprioceptive sensory inputs to the central nervous system. This could suggest that the effects of miR-2b-1 on the myogenic phase might have ‘knock-on’ effects upon the later neurogenic phase and larval movement. However, we note that genetic restoration of miR-2b-1 expression specifically to neurons completely rescues the larval speed phenotype (Fig. 3G), suggesting that the dominant effect of miR-2b-1 upon movements is through its action within neurons. To recognise Rev1’s comment we have added a short sentence to the text (p7) suggesting that ‘the effects of miR-2b-1 observed at earlier stages (myogenic phase) are possibly offset by normal neural expression of miR-2b-1’.  

      FACS-sorting of neuronal cells followed by RT-PCR convincingly detects the presence of miR-2b-1 in the embryonic CNS. However, control of non-neuronal cells would be required to explore whether miR-2b-1 is not only present but enriched in the nervous system compared to other tissues. This is also the case in the miR-2b-1 and Janus expression analysis in the chordotonal organs: a control sample from the motor neurons would help discriminate whether miR-2b-1/Janus regulatory axis is specifically enriched in chordotonal organs or whether both genes are expressed throughout the CNS but operate under a different regulation or requirements for the movement phenotypes.

      The RNA in situ hybridisation data included in the paper (Fig. 3B) show that RNA probes for miR-2b-1 precursors reveal very strong signal in neural tissue – with very low signal detected in other tissues – strongly indicating that expression of miR-2b-1 is highly enriched in the nervous system.

      Reviewer #2:

      Weaknesses: 

      As I mentioned above, I felt the presentation was a bit overstated. The authors present their data in a way that focuses on movement, the emergence of movement, and how their miRNA of interest is at the center of this topic. I only point to the title and name that they wish to give the target of their miRNA to emphasize this point. "Janus" the GOD of movement and change. The results and discussion section starts with a paragraph saying, "Movement is the main output of the nervous system... how developing embryos manage to organise the necessary molecular, cellular, and physiological processes to initiate patterned movement is still unknown. Although it is clear that the genetic system plays a role, how genes control the formation, maturation and function of the cellular networks underlying the emergence of motor control remains poorly understood." While there is nothing inherently untrue about these statements, it is a question of levels of understanding. One can always argue that something in biology is still unknown at a certain level. However, one could also argue that much is known about the molecular nature of movement. Next, I am not sure how much this work impacts the area of study regarding the emergence of movement. The authors show that a reduction of a miRNA can affect something about certain neurons, that affects movement. The early movements, although slightly diminished, still emerge. Thus, their work only suggests that the function of some neurons, or perhaps the development of these neurons may impact the early movements. This is not new as it was known already from early work from the Bate lab.  Later larval movements were also shown to be modified in the miRNA mutants and were traced to "janus" overexpression in the chordotonal organs. As neurons are quite sensitive to the levels of Cl- and Janus is thought to be a Cl- channel, this could lead to a slight dysfunction of the chordotonal neurons. 
      So, based on this, the work suggests that dysfunction of the chordotonal organs could impact larval movement. This was, of course, already known. The novelty of this work is in the genes being studied (important or not). We now know that miR-2b-1 and Janus are expressed in the early neurons and larval chordotonal neurons and their removal is consistent with a role for these genes in the functioning of these neurons. This is not to trivialize these findings, simply to state that these results are not significantly changing our overall understanding of movement and the emergence of movement. I would call it a stretch to say that this miRNA CONTROLS the emergence of movement, as in the title.

      As already mentioned in our provisional response, on this point we politely – but strongly – disagree with Rev2’s suggestion that the findings are inflated by our language. We also note that they criticise our use of the verb ‘control’, yet this is a standard textbook term in molecular biology to describe biological processes regulated by genetic factors: given that miR-2b-1 regulates movement patterns during embryogenesis, to say that miR-2b-1 ‘controls’ embryonic movement in the Drosophila embryo is reasonable and in line with the language used in the field. 

      Finally, the name Janus should be changed as it is already being used. A quick scan of flybase shows that there is a Janus A and B in flies (phosphatases) and I am surprised the authors did not check this. I was initially worried about the Janus kinase (JAK) when I performed the search. While I understand that none are only called Janus, studies of the jan A and B genes refer to the locus as the janus region, which could lead to confusion. The completely different molecular functions of the genes relative to CG3638 add to the confusion. Thus, I ask that the authors change the name of CG3638 to something else.

      Thank you for spotting this omission. In the revised MS we propose a new name – Movement Modulator (Motor) – for the gene previously described as Janus (CG3638) to avoid annotation issues at FlyBase due to other, unrelated genes that include this word as part of their names. All instances where Janus was used are now replaced by Motor (abstract; main text pages 9-10; Figure 4).

    1. Author response:

      Reviewer #1 (Public review):

      (1) The link between the background in the introduction and the actual study and findings is often tenuous or not clearly explained. A re-working of the intro to better set up and link to the study questions would be beneficial.

      Response: upon revision, we plan to rewrite the introduction of the manuscript.

      (2) For the sequencing, which kit was used on the Novaseq6000?

      Response: for sequencing, we used the Chromium Controller and Chromium Single Cell 3’ Reagent Kits (v3 chemistry CG000183) on the Novaseq6000. We apologize for omitting this important detail and will add the information to the Methods.

      (3) Additional details are needed for the analysis pipeline. How were batch effects identified/dealt with, what were the precise functions and settings for each step of the analysis, how was clustering performed and how were clusters validated etc. Currently, all that is given is software and sometimes function names which are entirely inadequate to be able to assess the validity of the analysis pipeline. This could alternatively be answered by providing annotated copies of the scripts used for analysis as a supplement.

      Response: we apologize for the inadequacy of descriptions of data analysis process due to word count limit. We plan to provide more information, and if possible we also would like to provide scripts as supplementary data in the revised manuscript.

      (4) For Cell type annotation, please provide the complete list of "selected gene markers" that were used for annotation.

      Response: we will add the list of marker genes for cell type annotation in the revised manuscript.

      (5) No statistics are given for the claims on cell proportion differences throughout the paper (for cell types early, epithelial sub-clusters later, and immune cell subsets further on). This should be a multivariate analysis to account for ADC/SCC, HPV+/- and Early/Late stage.

      Response: to address this inadequacy, upon revision we plan to use statistical approaches to compare the differences between each set of groups.

      (6) The Y-axis label is missing from the proportion histograms in Figure 2D. In these same panels, the bars change widths on the right side. If these are exclusively in ADC, show it with a 0 bar for SCC, not doubling the width which visually makes them appear more important by taking up more area on the plot.

      Response: we apologize for the imprecise presentation of histograms such as Fig 2D, and we will add Y-axis labels. As for the width of the bars, we used the histograms as originally generated by the data package; we did not intend to double the width to strengthen their visual importance. We will correct this and similar mistakes throughout the manuscript.

      (7) Throughout the manuscript, informatic predictions (differentiation potential, malignancy score, stemness, and trajectory) are presented as though they're concrete facts rather than the predictions they are. Strong conclusions are drawn on the basis of these predictions which do not have adequate data to support. These conclusions which touch on essentially all of the major claims made in the manuscript would need functional data to validate, or the claims need to be very substantially softened as they lack concrete support. Indeed, the fact that most of the genes examined that were characteristic of a given cluster did not show the expected expression patterns in IHC highlights the fact that such predictions require validation to be able to draw proper inferences.

      Response: we agree that many conclusions, which were based on bio-informatic predictions, are written in an over-affirmative way. Upon revision, we will rewrite these conclusions more precisely.

      (8) The cluster Epi_10_CYSTM1 which is the basis for much of the paper is present in a single individual (with a single cell coming from another person), and heavily unconnected from the rest of the epithelial populations. If so much emphasis is placed on it, the existence of this cluster as a true subset of cells requires validation.

      Response: we are thankful for this suggestion. We think that each cluster of epithelial cells is specified from other clusters and identified by DEGs, but they are not heavily unconnected from others. Upon revision, we plan to add further validation for the existence of Epi_10_CYSTM1.

      (9) Claims based on survival analysis of TCGA for Epi_10_CYSTM1 are based on a non-significant p-value, though there is a slight trend in that direction.

      Response: from the data of TCGA survival analysis for Epi_10, we found a not-so-slight trend of difference between groups (with a small P value). As a result, we presented this data and hoped to add more strength to the clinical significance of this cluster. However, this indeed caused controversy because the P value is non-significant. We plan to rewrite the conclusion more precisely or delete this data in the revised manuscript.

      (10) The claim "The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis." This is incorrect according to the sample distributions which clearly show cells from the patient who has EPI_10_CYSTM1 in multiple other clusters. This is then used as justification for SLC26A3 which appears to be associated with late stage, however, in the images SLC26A3 appears to be broadly expressed in later tumours rather than restricted to a minor subset as it should be if it were actually related to the EPI_10_CYSTM1 cluster.

      Response: we are thankful for this question. The conclusion “The identification of Epi_10_CYSTM1 as the only cell cluster found in patients with stage IIICp raises the possibility that this cluster may be a potential marker to diagnose patients with lymph node metastasis” has indeed been stated too definitively given the sample distribution. We will correct the description in the upcoming revised manuscript. As for SLC26A3, we also do not think it is “broadly” expressed, but it is specified in later tumors. When we presented the IHC data, we showed only the strongly positive area of each slide in order to emphasize the differences; however, this has caused misunderstandings. Thus, upon revision, we would like to show the other areas of one case or even the scan of one whole slide as supplementary data.

      (11) The authors claim that cytotoxic T cells express KRT17, and KRT19. This likely represents a mis-clustering of epithelial cells.

      Response: we apologize for neglecting further validation of the cytotoxic T cells. In Fig. 4B and 4C, the four different clusters of T cells were identified based on canonical T cell markers. We then focused mainly on the validation and further analysis of Tregs, neglecting the other clusters. In Fig. 4D we intended only to show the top DEGs in each T cell cluster, hoping to find potential marker genes for next-step analysis. However, we did not notice that the cytotoxic T cell cluster might be contaminated with epithelial cells. We will optimize the analysis of this part in our revision.

      (12) Multiple claims are made for specific activities based on GO term biological process analysis which while not contradictory to the data, certainly are by no means the only explanation for it, nor directly supported.

      Response: our initial purpose was to use GO analysis as supports for our conclusions. However we know these are only claims but not evidence, which is also the problem of our writing techniques as in question (7). Therefore, in our revised manuscript, we plan to rewrite the conclusion from the GO analysis in a more scientific way or delete these data.

      Reviewer #2 (Public review):

      (1) I believe that many of the proposed conclusions are over-interpretations or unwarranted generalizations of the single-cell analysis. These conclusions are often based on populations in the scRNA-seq data that are described as enriched or specific to a given group of samples (eg. ADC). This conclusion is based on the percentage of cells in that population belonging to the given group; for example, a cluster of cells that dominantly come from ADC. The data includes multiple samples for each group, but statistical approaches are never used to demonstrate the reproducibility of these claims.

      Response: we understand that many of the conclusions are too sure but lack profound supporting evidence, thus we will optimize the writing in the revised manuscript. More importantly, to strengthen the validity of our data, we will try to use statistical approaches for further analysis.

      (2) This leads to problematic conclusions. For example, the "ADC-specific" Epi_10_CYSTM1 cluster, which is a central focus of the paper, only contains cells from one of the 11 ADC samples and represents only a small fraction of the malignant cells from that sample (Sample 7, Figure 2A). Yet, this population is used to derive SLC26A3 as a potential biomarker. SLC26A3 transcripts were only detected in this small population of cells (none of the other ADC samples), which makes me question the specificity of the IHC staining on the validation cohort.

      Response: we are sincerely grateful for the questions on the validity, appropriateness and real potential of SLC26A3. We plan to add more explanation of the importance of SLC26A3 in the discussion part. We also apologize for some overconfident conclusions about ADC-specific cell clusters, as well as the marker gene SLC26A3. However, we do not think these conclusions are problematic. In fact, due to the heterogeneity among different individuals, and even among different sampling sites within one individual, we think a “small fraction” does not mean the cluster is not meaningful. Also, these ADC-specific clusters (including Epi_10_CYSTM1) do have certain proportions when compared with the “big fraction” groups (Fig. 2D). Furthermore, when considering the specificity of DEGs to ADC only, but not to SCC, we think the genes of these ADC-specific clusters might play a central role in distinguishing ADC from SCC. We further used a validation experiment to support our hypothesis. Lastly and most importantly, SLC26A3 came from sample 7, whose clinical stage is FIGO IIIC (late stage) and whose pathological type is ADC. Among the 15 cases, there are only 4 cases whose clinical stages are late (of which 3 are ADC). From this point of view, we think 1 in 3 (33%) showing expression of SLC26A3 (or the existence of cluster Epi_10_CYSTM1) should be considered a potential choice. Samples from early-stage and SCC patients do not contain fractions of Epi_10_CYSTM1. This likewise indicates the specificity of this cell cluster to ADC. Therefore, in our revised manuscript, we plan to add more in-depth discussion of this question.

      (3) This is compounded by technical aspects of the analysis that hinder interpretation. For example, it is clear that the clustering does not perfectly segregate cell types. In Figures 2B and D, it is evident that C4 and C5 contain mixtures of cell type (eg. half of C4 is EPCAM+/CD3-, the other half EPCAM-/CD3+). These contaminations are carried forward into subclustering and are not addressed. Rather, it is claimed that there is a T cell population that is CD3- and EPCAM+, which does not seem likely.

      Response: do you mean Figure 1B and D? In the revised manuscript, we will list the canonical marker genes to cluster different types of cells to at least support that the clustering of cell types match most of the present published references. To further avoid the contamination of cells in each cluster, we will use quality controls and re-analyze these data upon revision.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Comment 1: This manuscript from Clayton and co-authors, entitled ”Mechanism of dimer selectivity and binding cooperativity of BRAF inhibitors”, aims to clarify the molecular mechanism of BRAF dimer selectivity. Indeed, first-generation BRAF inhibitors, targeting monomeric BRAFV600E, are ineffective in treating resistant dimeric BRAF isoforms. Here, the authors employed molecular dynamics simulations to study the conformational dynamics of monomeric and dimeric BRAF, in the presence and absence of inhibitors. Multi-microsecond MD simulations showed an inward shift of the αC helix in the BRAFV600E mutant dimer. This helped in identifying a hydrogen bond between the inhibitors and the BRAF residue Glu501 as critical for dimer compatibility. The stability of the aforementioned interaction seems to be important to distinguish between dimer-selective and equipotent inhibitors.

      The study is overall valuable and robust. The authors used the recently developed particle mesh Ewald constant pH molecular dynamics, a state-of-the-art method, to investigate the correct histidine protonation considering the dynamics of the protein. Then, multi-microsecond simulations showed differences in the flexibility of the αC helix and DFG motif. The dimerization restricts the αC position in the inward conformation, in agreement with the result that dimer-compatible inhibitors can stabilize the αC-in state. Noteworthy, the MD simulations were used to study the interactions between the inhibitors and the protein, suggesting a critical role for a hydrogen bond with Glu501. Finally, simulations of a mixed state of BRAF (one protomer bound to the inhibitor and the other apo) indicate that the ability to stabilize the inward αC state of the apo protomer could be at the basis of the positive cooperativity of PHI1.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: One potential weakness in the manuscript is the lack of reported uncertainties related to the analyzed quantities. Providing this information would significantly enhance the clarity regarding the reliability of the analyses and the confidence in the claims presented.

      Response and revision: We agree with the reviewer that reporting uncertainties will clarify and strengthen our arguments. Following this suggestion, we have added error bars to Figures 3 and 5 representing the standard deviation of the K-E salt bridge probability. These show the deviation across replicas in how often the salt bridge is present. They better support our claim that this salt bridge is promoted by the presence of PHI1, as the deviation of the salt bridge probability is minimal for protomers containing PHI1. In addition to these error bars, we have also added a table to the Supplementary Information (Supplementary Table 2) containing the mean and standard deviation of the αC position, K-E distance, and DFG pseudo dihedral for each protomer in our dimer simulations.
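      The replica-level uncertainty described in this response can be illustrated with a minimal sketch. It assumes that a salt bridge counts as present when the K-E distance falls within a distance cutoff and that the error bar is the sample standard deviation of the per-replica probabilities. The 4.0 Å cutoff, the function names, and the toy distances are illustrative assumptions, not values taken from the manuscript.

```python
from statistics import mean, stdev


def salt_bridge_probability(distances, cutoff=4.0):
    """Fraction of frames whose K-E distance (in Å) is within the cutoff.

    The 4.0 Å default is a hypothetical choice for illustration only.
    """
    return sum(d <= cutoff for d in distances) / len(distances)


def summarize_replicas(replica_distances, cutoff=4.0):
    """Mean and sample standard deviation of the per-replica probabilities."""
    probs = [salt_bridge_probability(r, cutoff) for r in replica_distances]
    return mean(probs), stdev(probs)


# Toy K-E distances (Å) for three hypothetical replicas of one protomer.
replicas = [
    [3.0, 3.2, 5.1, 3.1],
    [2.9, 3.3, 3.4, 3.0],
    [3.1, 6.0, 5.5, 3.2],
]
avg, sd = summarize_replicas(replicas)
print(f"P(salt bridge) = {avg:.2f} +/- {sd:.2f}")
```

      A small standard deviation across replicas indicates that the salt bridge occupancy is reproducible, which is the sense in which the error bars are read above.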

      Reviewer #2 (Public review):

      Comment 1: The authors employ molecular dynamics simulations to understand the selectivity of FDA-approved inhibitors within dimeric and monomeric BRAF species. Through these comprehensive simulations, they shed light on the selectivity of BRAF inhibitors by delineating the main structural changes occurring during dimerization and inhibitor action. Notably, they identify the two pivotal elements in this process: the movement and conformational changes involving the alpha-C helix and the formation of a hydrogen bond involving the Glu-501 residue. These findings find support in the analyses of various structures crystallized from dimers and co-crystallized monomers in the presence of inhibitors. The elucidation of this mechanism holds significant potential for advancing our understanding of kinase signaling and the development of future BRAF inhibitor drugs.

      The authors employ a diverse array of computational techniques to characterize the binding sites and interactions between inhibitors and the active site of BRAF in both dimeric and monomeric forms. They combine traditional and advanced molecular dynamics simulation techniques such as CpHMD (all-atom continuous constant pH molecular dynamics) to provide mechanistic explanations. Additionally, the paper introduces methods for identifying and characterizing the formation of the hydrogen bond involving the Glu501 residue without the need for extensive molecular dynamics simulations. This approach facilitates the rapid identification of future BRAF inhibitor candidates.

      Response: We thank the reviewer for the positive evaluation of our work.

      Comment 2: The use of molecular dynamics yields crucial structural insights and outlines a mechanism to elucidate dimer selectivity and cooperativity in these systems. However, the authors could consider the adoption of free energy methods to estimate the values of hydrogen bond energies and hydrophobic interactions, thereby enhancing the depth of their analysis.

      Response: The current free energy methods are capable of giving accurate estimates of the relative binding free energies of similar ligands; however, accurate calculations of the absolute free energies of hydrogen bond and hydrophobic interactions are not feasible yet. Thus, we decided not to pursue the calculations.

      Reviewer #1 (Suggestions to author)

      Comment 1: The general recommendation is to give more details about the procedure for the analyses performed and, when possible, show the uncertainties relative to the analyzed quantities. This would clearly indicate the reliability of the analyses and the confidence of the claims. Moreover, it is not always clear how the analyses were performed.

      Response and revision: As previously mentioned, we have added uncertainties to our bar graphs in Figures 3 and 5 as well as Supplemental Table 2. In regards to the clarity of our analysis, we added more detail on how the probability distributions were created, which we will discuss in our response to Comment 3.

      Comment 2: It is not clear why the authors decided to titrate only the histidines without considering the other charged residues. In particular, the authors show in Supplementary Figure 2 a network of which Asp595 (protomer A) is a part and that, given the direct interaction, could affect the protonation state of His477 (protomer B).

      Response: The reviewer is correct in that Asp595 directly interacts with His477 on the opposite protomer. This is exactly the reason why we did not consider titrating Asp595 – the interaction with His477 should further stabilize the charged state of Asp595 and downshift its pKa from the solution value of about 3.8. Thus, Asp595 will be charged at physiological pH and does not need to be titrated in the CpHMD simulations.
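      The argument that Asp595 stays charged can be checked with a short Henderson-Hasselbalch calculation. This is a sketch under stated assumptions: the solution pKa of about 3.8 comes from the response above, while the value of 7.4 for physiological pH and the function name are assumptions introduced here.

```python
def charged_fraction(pka, ph):
    """Henderson-Hasselbalch fraction of an acidic side chain that is deprotonated."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Solution pKa of Asp (about 3.8, as quoted in the response) at an assumed pH of 7.4.
frac = charged_fraction(pka=3.8, ph=7.4)
print(f"Deprotonated fraction at pH 7.4: {frac:.4f}")
```

      Even at the unshifted solution pKa the deprotonated fraction is well above 99.9%, and a further downshift of the pKa through the interaction with His477 would only increase it.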

      Comment 3: Regarding the probability density plots (Figures 3 and 5), clarify if you used all the data from all the replicas and all the protomers. If possible, show a comparison between each replica in the Supplementary Figures. A Supplementary Table with the probability values for the measured K-E salt bridge could be helpful since the bar plots are hard to compare. Also in this case please report the uncertainty or a comparison between the replicas.

      Response and revision: To clarify how we created the probability density plots, the following line was added to the Methods section:

      On page 15, third paragraph: All probability distributions were created by combining the last three µs of each replica for each system, with each distribution consisting of 50 bins. Unless specified, distributions contain quantities from both protomers in dimeric simulations.
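      The pooling procedure described in this Methods sentence can be sketched as follows. The sketch assumes that each replica is a list of values saved at a uniform frame interval; the interval, the helper names, and the toy data are hypothetical, and only the three-µs window and the 50 bins come from the text.

```python
def last_window(values, dt_us, window_us=3.0):
    """Keep the frames covering the final window_us microseconds of a replica."""
    n_keep = max(1, int(round(window_us / dt_us)))
    return values[-n_keep:]


def density_histogram(samples, n_bins=50):
    """Equal-width histogram normalized so that the density integrates to 1."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        raise ValueError("all samples are identical; cannot build bins")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        # The maximum value is assigned to the last bin.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[idx] += 1
    density = [c / (len(samples) * width) for c in counts]
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, density


# Two toy replicas of 5 µs each, saved every 0.01 µs (hypothetical values).
dt_us = 0.01
replica_a = [18.0 + 0.01 * i for i in range(500)]
replica_b = [19.0 + 0.008 * i for i in range(500)]

# Pool the last 3 µs of every replica before binning.
pooled = []
for replica in (replica_a, replica_b):
    pooled.extend(last_window(replica, dt_us))

edges, density = density_histogram(pooled)
bin_width = edges[1] - edges[0]
print(len(pooled), len(density), round(sum(density) * bin_width, 6))
```

      Pooling before binning means that every replica contributes the same number of frames to the distribution, so no single replica dominates the resulting density.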

      As previously mentioned, we have included Supplemental Table 2 which contains the mean and standard deviation of the K-E distance across systems. For comparison between replicas, we found the time series of the K-E distance in the inhibitor-bound monomer and dimer systems in Supplemental Figure 7 to be sufficient.

      Comment 4: It would be better to define the claim: ”it is clear that the timescale of the DFG-out to DFG-in transition is longer than our simulation timeframe of a few microseconds” (lines 208-209). To me it is not obvious why this should be ”clear”.

      Response and revision: Our original statement was to convey that, as DFG-in is sampled very rarely, our simulations cannot accurately represent DFG transitions. We have revised the manuscript to the following:

      On page 6, fourth paragraph: While this does suggest dimerization loosens the DFG motif, our simulations do not appropriately model the DFG-out/-in transition as the DFG-in state is only occasionally sampled.

      Comment 5: In the case of the inhibited monomer simulations, the authors state: “the PHI1–Glu501 interaction can become completely disrupted, with the distance moving beyond 6 Å to as high as 12 Å; correlated with the disruption of the PHI1-Glu501 interaction, the αC position is shifted out to the range of 21 Å–24 Å” (lines 241-244). However, the plot of the PHI1-Glu501 interaction time-series (Supplementary Figure 7) shows that just in one replica of one protomer (Protomer A), the interaction is disrupted, and the αC position never exceeds 21 Å (time-series reported in Supplementary Figure 6). None of the fluctuations of the αC position appear to be correlated with the disruption of the ligand-Glu501 interaction. The time-series reported in Supplementary Figures 6 and 7 suggest that the two events are uncorrelated. Please explain this aspect or quantify the correlation to support your claim.

      Response: We believe the source of this confusion is that we did not include a time series of the αC position for the inhibited monomer simulations; the Supplementary Figure 6 mentioned in the comment shows dimeric BRAF. We have therefore added Supplementary Figure 8, a time-series plot of the αC position for inhibited monomer and dimer protomers.

      Comment 6: Regarding the analyses of the positive cooperativity, the DFG dihedral probability densities for the apo protomer (Figure 5a) are highly overlapping. Thus, it is hard to believe that these small differences support the claim that ”PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor” (lines 300-302). The authors should show that the differences in the DFG distributions (in particular, apo dimer vs PHI1 mixed) are statistically significant. Only in this case, the data could support the claim that PHI1 bound to one protomer modulates the DFG conformation in the second one. In my opinion, the overlap between the DFG dihedral probability (Figure 5a) is too high to support the claim that PHI1 is able to allosterically modulate this region in the second apo protomer. Please provide an appropriate statistical test that demonstrates that those distributions are significantly different.

      Response and revision: We have adjusted this statement based on the new Supplementary Table 2 to read as the following:

      On page 9, third paragraph: Although the shift is small (the differences between means is approximately one standard deviation, see Supplementary Table 2), it suggests that PHI1 binding in one protomer can allosterically shift the DFG motif outward, making it favorable for binding a second inhibitor. In contrast, the DFG dihedral of the apo protomer in the LY-bound mixed dimer appears to be slightly smaller than the apo dimer with difference between means of approximately one standard deviation (Supplementary Table 2), which is unfavorable for binding the second inhibitor (orange and grey, Figure 5a right).

      Comment 7: Regarding the dimer holo simulations, I agree that in the LY-bound dimer simulations, the hydrogen bond between the ligand and the E501 is weaker, but I do not understand the sentence “as seen from the local density maximum centered at ∼3.4 Å” at line 233, since the 2D density plot (Figure 3h) shows that the highest peak is close to 5 Å. Also, it would be useful to clarify how these 2D density plots reported in Figure 3 were obtained.

      Response and revision: While the highest peak in Figure 3h is close to 5 Å, we were more interested in the local peak close to 3.4 Å. To avoid confusion we have modified the line to separate both peaks:

      On page 7, second paragraph: In the LY-bound dimer simulations, however, the LY–Glu501 h-bond is weaker and less stable than the counterpart of the PHI1-bound dimer, as seen from the local density maximum centered at ∼3.4 and the global maximum near ∼4.5 Å (Figure 3g,h).

      Comment 8: I have a comment on the strategy suggested to empirically classify the inhibitors by comparing the Glu501-Lys483 distance and the αC position in the two protomers of the crystal structures (in the Concluding Discussion section). The authors suggest that differences below 1 Å could determine whether the flexibility of these regions is restricted or not (and whether the inhibitor is equipotent or dimer-selective). However, differences below 1 Å, in structures where the average resolution is 2.5 Å, might be highly unreliable. In fact, as the authors pointed out, LY and Ponatinib would be classified (erroneously) as dimer-selective inhibitors according to these criteria.

      Response and revision: We agree that this proposed method could be unreliable; we intend this strategy to be used as a “quick and dirty” method for analyzing future structures in order to assess selectivity for dimeric BRAF. To convey this, we added the following sentence:

      On page 12, second paragraph: Given that the resolution of a resolved structure is often ∼2–3 Å, this proposed assessment is not intended to replace more rigorous tests, i.e. utilizing MD simulations.

      Comment 9: Including representative snapshots of the MD simulation in the GitHub repository could allow the reader to better appreciate the results described in the present study.

      Response and revision: In order to convey the difference between induced effects of PHI1 and LY, we have added a new folder named snapshots to the GitHub repository which contains the snapshots from the simulations of one LY or one PHI1 bound BRAF (visualized in Figure 5c) in the form of PDB files.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This work presents an in-depth characterization of the factors that influence the structural dynamics of the Clostridium botulinum guanidine-IV riboswitch (riboG). Using a single-molecule FRET, the authors demonstrate that riboG undergoes ligand and Mg2+ dependent conformational changes consistent with the dynamic formation of a kissing loop (KL) in the aptamer domain. Formation of the KL is attenuated by Mg2+ and Gua+ ligand at physiological concentrations as well as the length of the RNA. Interestingly, the KL is most stable in the context of just the aptamer domain compared to longer RNAs capable of forming the terminator stem. To attenuate transcription, binding of Gua+ and formation of the KL must occur rapidly after transcription of the aptamer domain but before transcription of the rest of the terminator stem.

      Strengths:

      (1) Single-molecule FRET microscopy is well suited to unveil the conformational dynamics of KL formation and the authors provide a wealth of data to examine the effect of the ligand and ions on riboswitch dynamics. The addition of complementary transcriptional readthrough assays provides further support for the author's proposed model of how the riboswitch dynamics contribute to function.

      (2) The single-molecule data strongly support that the effect of Gua+ ligand and Mg2+ influence the RNA structure differently for varying lengths of the RNA. The authors also demonstrate that this is specific for Mg2+ as Na+ and K+ ions have little effect.

      (3) The PLOR method utilized is clever and well adapted for both dual labeling of RNAs and examining RNA at various lengths to mimic co-transcriptional folding. Using PLOR, they demonstrate that a change in the structural dynamics and ligand binding can occur after the extension of the RNA transcript by a single nucleotide. Such a tight window of regulation has intriguing implications for kinetically controlled riboswitches.

      Weaknesses:

      (1) The authors use only one mutant to confirm that their FRET signal indicates the formation of the KL. Importantly, this mutation does not involve the nucleotides that are part of the KL interaction. It would be more convincing if the authors used mutations in both strands of the KL and performed compensatory mutations that restore base pairing. Experiments like this would solidify the structural interpretation of the work, particularly in the context of the full-length riboG RNA or in the cotranscriptional mimic experiments, which appear to have more conformational heterogeneity.

      We thank the reviewer for describing our work as an “in-depth characterization” of riboG. We agree with the reviewer and we have added two more mutants, G71C and U72C, with the mutations located at the KL (Figure 2– figure supplement 8A, 8B, 9A, 9B, Figure 3– figure supplement 6A, 6B, 7A, 7B, and Figure 4– figure supplement 6A, 6B, 7A, 7B). Furthermore, we have performed compensatory mutations, C30G-G71C and A29G-U72C, that restore base pairing in the KL (Figure 2– figure supplement 8C, 8D, 9C, 9D, Figure 3– figure supplement 6C, 6D, 7C, 7D, and Figure 4– figure supplement 6C, 6D, 7C, 7D). We added the experimental results in the revised manuscript accordingly as “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2– figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2– figure supplement 8A and 8B). It is worth noting that the C30G and G71C mutant, which was initially expected to restore a base pair in the KL, did not successfully bring about the anticipated peak of EFRET ~ 0.8 (Figure 2– figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion at the state with EFRET ~ 0.8 than riboG-apt. However, the A29G and U72C mutations restored a base pair in the KL, as well as the formation of the KL (Figure 2– figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2– figure supplement 10).
This finding aligns with previous research (Lenkeit et al., 2020) and the predicted secondary structure of the G77C mutation by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3– figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at the site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base-pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3– figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild-type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4– figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to that of their counterparts in riboG-term (Figure 4– figure supplement 7)” (page 9).

      (2) The existence of the pre-folded state (intermediate FRET ~0.5) is not well supported in their data and could be explained by an acquisition artifact. The dwell times are very short often only a single frame indicating that there could be a very fast transition (< 0.1s) from low to high FRET that averages to a FRET efficiency of 0.5. To firmly demonstrate that this intermediate FRET state is metastable and not an artifact, the authors need to perform measurements with a faster frame rate and demonstrate that the state is still present.

      We thank the reviewer for this helpful comment. We added smFRET experiments at a higher time resolution (20 ms) as well as at a lower time resolution (Figure 2–figure supplement 3). In our data, the intermediate state (EFRET ~0.5) is present in smFRET traces collected at 20 ms, 100 ms, and 200 ms.

      (3) The PLOR method employs a non-biologically relevant polymerase (T7 RNAP) to mimic transcription elongation and folding near the elongation complex. T7 RNAP has a shorter exit channel than bacterial RNAPs and therefore, folding in the exit channel may be different between different RNAPs. Additionally, the nascent RNA may interact with bacterial RNAP differently. For these reasons, it is not clear how well the dynamics observed in the T7 ECs recapitulate riboswitch folding dynamics in bacterial ECs where they would occur in nature. 

      We thank the reviewer for the comment. We agree that bacterial and T7 RNAPs may behave differently owing to differences in transcriptional speed, dynamics, and interactions. We have added the following statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (pages 13–14).

      Reviewer #2 (Public Review):

      Summary:

      Gao et al. used single-molecule FRET and step-wise transcription methods to study the conformations of the recently reported guanidine-IV class of bacterial riboswitches that upregulate transcription in the presence of elevated guanidine. Using three riboswitch lengths, the authors analyzed the distributions and transitions between different conformers in response to different Mg2+ and guanidine concentrations. These data led to a three-state kinetic model for the structural switching of this novel class of riboswitches whose structures remain unavailable. Using the PLOR method that the authors previously invented, they further examined the conformations, ligand responses, and gene-regulatory outcomes at discrete transcript lengths along the path of vectorial transcription. These analyses uncover that the riboswitch exhibits differential sensitivity to ligand-induced conformational switching at different steps of transcription, and identify a short window where the regulatory outcome is most sensitive to ligand binding.

      Strengths:

      Dual internal labeling of long RNA transcripts remains technically very challenging but essential for smFRET analyses of RNA conformations. The authors should be commended for achieving very high quality and purity in their labelled RNA samples. The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality. The findings are significant because the paradigm uncovered here for this relatively simple riboswitch class is likely also employed in numerous other kinetically regulated riboswitches. The ability to quantitatively assess RNA conformations and ligand responses at multiple discrete points along the path towards the full transcript provides a rare and powerful glimpse into cotranscriptional RNA folding, ligand-binding, and conformational switching.

      Weaknesses:

      The use of T7 RNA polymerase instead of a near-cognate bacterial RNA polymerase in the termination/antitermination assays is a significant caveat. It is understandable as T7 RNA polymerase is much more robust than its bacterial counterparts, which probably will not survive the extensive washes required by the PLOR method. The major conclusions should still hold, as the RNA conformations are probed by smFRET at static, halted complexes instead of on the fly. However, potential effects of the cognate RNA polymerase cannot be discerned here, including transcriptional rates, pausing, and interactions between the nascent transcript and the RNA exit channel, if any. The authors should refrain from discussing potential effects from the DNA template or the T7 RNA polymerase, as these elements are not cognate with the riboswitch under study.

      We thank the reviewer for describing our work as follows: “The data are extensive, robust, thorough, and meticulously controlled. The interpretations are logical and conservative. The writing is reasonably clear and the illustrations are of high quality”. We agree that bacterial and T7 RNAPs may behave differently owing to differences in transcriptional speed, dynamics, and interactions. We have added the following statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14).

      Reviewer #3 (Public Review):

      Summary:

      In this article, Gao et al. use single-molecule FRET (smFRET) and position-specific labelling of RNA (PLOR) to dissect the folding and ligand-sensing behavior of the Guanidine-IV riboswitch in the presence and absence of the ligand guanidine and the cation Mg2+. The results provided valuable information on the mechanistic aspects of the riboswitch, including the confirmation of the kissing loop present in the structure as essential for folding and riboswitch activity. Co-transcriptional investigations of the system provided key information on the ligand-sensing behavior and ligand-binding window of the riboswitch. A plausible folding model of the Guanidine-IV riboswitch was proposed as a final result. The evidence presented here sheds additional light on the mode of action of transcriptional riboswitches.

      Strengths:

      The investigations were very thorough, providing data that supports the conclusions. The use of smFRET and PLOR to investigate RNA folding has been shown to be a valuable tool for the understanding of folding and behavior properties of these structured RNA molecules. The co-transcriptional analysis brought important information on how the riboswitch works, including the ligand-sensing and the binding window that promotes the structural switch. Performing the investigations with the aptamer domain, the aptamer domain + terminator/anti-terminator region, and the full-length riboswitch was essential for showing how each domain contributes to the final structural state in the presence of the ligand and Mg2+.

      Weaknesses:

      The system has its own flaws when compared to physiological conditions. The RNA polymerase used (the study uses T7 RNA polymerase) is different from the bacterial RNA polymerase, not only in complexity, but also in transcriptional speed, which can directly interfere with folding and ligand-sensing. Additionally, rNTPs concentrations were much lower than physiological concentrations during transcription, likely causing a change in the polymerase transcriptional speed. These important aspects and how they could interfere with results are important to be addressed to the broad audience. Another point of consideration to be aware of is that the bulky fluorophores attached to the nucleotides can interfere with folding to some extent.

      We thank the reviewer for describing our work as “The investigations were very thorough, providing data that supports the conclusions”. We agree that bacterial and T7 RNAPs may behave differently owing to differences in transcriptional speed, dynamics, and interactions. We have added the following statement to the Discussion: “It is worth noting that the RNAP utilized in our study is T7 RNAP, which exhibits distinct characteristics compared to bacterial RNAP in terms of transcriptional speed, dynamics, and interactions. However, Xue et al. have reported similarities between T7 and E. coli RNAP in the folding of nascent RNA. Additionally, Lou and Woodson have provided valuable insights into the co-transcriptional folding of the glmS ribozyme using T7 RNAP (Xue et al., 2023; Lou & Woodson, 2024)” (page 14). We also agree with the reviewer that the lower NTP concentrations may affect the transcriptional speed. Regarding the fluorophores, we purposely placed them away from the KL to avoid their influence on the formation of the KL.

      Reviewer #1 (Recommendations For The Authors):

      Related to weakness 1

      - The authors cite a paper that investigated mutations in the KL duplex but do not include these mutations in their analysis. It is unclear why the authors chose the G77C mutation and not the other mutants previously tested. Can the authors explain their choice of mutation in detail in the text? I also did not see the proposed secondary structure for the G77C mutant shown in Figure 2 -supp 3A in the cited paper, is this a predicted structure? Please explain how this structure was determined. 

      We thank the reviewer for the comment. We chose the G77C mutation because a previous report showed that G77C disturbs the formation of the KL, as stated in the manuscript: “Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2–figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the predicted secondary structure of the G77C mutant by Mfold (Zuker, 2003)” (page 7). The secondary structure of the G77C mutant was predicted by Mfold, which is cited in the manuscript and has been added to the reference list as “Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Research, 31(13), 3406-3415”.

      - It is not clear to me that the structural interpretation of their FRET states is correct and that the FRET signal reports on the base pairing of the KL in only the high FRET state. The authors should perform experiments with additional mutations in the KL duplex to confirm that their construct reports on KL duplex formation alone and not other structural dynamics. 

      We thank the reviewer for the comment. We have included additional mutations to establish a connection between the high-FRET state and the formation of the KL. The results have been added to the manuscript as follows: “The highly conserved nucleotides surrounding the KL are crucial for its formation (Lenkeit et al., 2020). To test our hypothesis that the state with EFRET ~ 0.8 corresponds to the conformation with the KL, we performed smFRET analysis on several mutations at these crucial nucleotides (Figure 2–figure supplement 8–10). Consistent with our expectations, the peak with EFRET ~ 0.8 was significantly diminished in the riboG-G71C mutant, which features a single nucleotide mutation at site 71 (with 97% nucleotide conservation) in the KL (Figure 2–figure supplement 8A and 8B). It is worth noting that the C30G and G71C mutations, which were initially expected to restore a base pair in the KL, did not successfully bring about the anticipated peak of EFRET ~ 0.8 (Figure 2–figure supplement 8C and 8D). On the other hand, the riboG-U72C mutant exhibited a lower proportion at the state with EFRET ~ 0.8 than riboG-apt. However, the A29G and U72C mutations restored a base pair in the KL, as well as the formation of the KL (Figure 2–figure supplement 9). Furthermore, our investigation revealed that the G77C mutant, involving a single nucleotide mutation at a highly conserved site, 77 (with 97% nucleotide conservation), also hindered the formation of the KL (Figure 2–figure supplement 10). This finding aligns with previous research (Lenkeit et al., 2020) and the predicted secondary structure of the G77C mutant by Mfold (Zuker, 2003)” (page 7), “In contrast to riboG-term, both its G71C and C30G-G71C mutants displayed a reduced proportion of the state with EFRET ~ 0.8. Remarkably, the fractions of EFRET ~ 0.8 remained unaffected by the addition of 1.0 mM Gua+ in these mutants. Distinct from riboG-term, no structural transitions between states were observed in the two mutants (Figure 3–figure supplement 6). Regarding the U72C mutant of riboG-term, the mutation at site 72 had a reduced impact on the KL conformation in the presence of 1.0 mM Gua+ and 2.0 mM Mg2+. However, the increased proportion of EFRET ~ 0.8 in the A29G-U72C mutant of riboG-term suggests that these mutations can restore the base-pairing between sites 29 and 72, as well as facilitate the formation of the KL (Figure 3–figure supplement 7)” (page 8), and “Upon comparing the G71C and C30G-G71C mutants of the full-length riboG with their wild-type counterpart, it was observed that the wild-type adopted higher proportions of the state with EFRET ~ 0.8 (Figure 4–figure supplement 6). Regarding the U72C and A29G-U72C mutants of the full-length riboG, their behaviors with regard to the peak with EFRET ~ 0.8 were similar to that of their counterparts in riboG-term (Figure 4–figure supplement 7)” (page 9).

      - For the full-length riboG-136 (Cy3Cy5 riboG in Figure 4), the authors have clearly defined peaks at 0.6 and 0.4. However, the authors do not explain their structural interpretation of these states. Do the authors believe that the KL is forming in these states? It would be helpful to have data on mutations in the KL in the context of the full-length riboG to better understand the structural transitions of these intermediate states. 

      Based on our mutation studies, we proposed that the peak with EFRET ~0.8 corresponds to the conformation with the KL, while the states with EFRET ~0.4 and 0.6 are the states without a stable KL. 

      Related to weakness 2:

      - For the riboG-apt and riboG-term RNAs, the proposed intermediate FRET state (EFRET = 0.5) is poorly fit by a Gaussian and the dwell times in the state are almost entirely single-frame dwells. It is likely that this state is the result of a camera blurring artifact, in which RNAs undergo a FRET transition between two frames giving an apparent FRET efficiency which is between that of the two transitioning states. This artifact arises when the average dwell times of the true states (Elow and Ehigh) are comparable to the frame duration (within a factor of ~5-10; see https://doi.org/10.1021/acs.jpcb.1c01036). To confirm the presence of the intermediate state, the authors should perform at least a few experiments with higher time resolution to support the existence of the 0.5 state with a lifetime of 0.1 s. Alternatively, the data should be refit to a two-state HMM and the authors could explain in the text that the density in the FRET histogram between the two states is likely due to transitions that are faster than the time resolution of the experiment. 

      We thank the reviewer for the great comment. Taking the suggestion into consideration, we performed smFRET experiments with a higher time resolution of 20 ms. As a result, we still detected the intermediate state, supporting that it is not an artifact. The new data has been included in the revised manuscript (Figure 2-figure supplement 3).  
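
      The logic of this control can be illustrated with a toy simulation. The sketch below models a purely two-state molecule and averages its noise-free FRET signal over each camera frame, which is the blurring mechanism the reviewer describes. Every number in it (FRET levels of 0.2 and 0.8, a mean dwell time of 0.3 s, the 0.35 to 0.65 window counted as "intermediate", the absence of shot noise) is an assumption chosen for illustration and is not taken from the manuscript. In such a two-state system the share of frames with an apparent intermediate value should shrink as the frame time is shortened, so an intermediate population that persists at 20 ms argues against a pure blurring artifact.

```python
import random


def simulate_switch_times(total_time, mean_dwell, rng):
    """Transition times of a two-state system with exponential dwell times."""
    times = []
    t = rng.expovariate(1.0 / mean_dwell)
    while t < total_time:
        times.append(t)
        t += rng.expovariate(1.0 / mean_dwell)
    return times


def frame_averaged_fret(switch_times, total_time, frame, e_low=0.2, e_high=0.8):
    """Time-average the noise-free FRET signal over each camera frame."""
    n_frames = int(round(total_time / frame))
    high_time = [0.0] * n_frames  # seconds spent in the high-FRET state per frame
    in_high = False               # the trajectory starts in the low-FRET state
    start = 0.0
    for t in switch_times + [total_time]:
        if in_high:
            # Spread the high-FRET interval [start, t) over the frames it covers.
            i = int(start / frame)
            while i < n_frames and i * frame < t:
                overlap = min(t, (i + 1) * frame) - max(start, i * frame)
                high_time[i] += max(0.0, overlap)
                i += 1
        in_high = not in_high
        start = t
    return [e_low + (e_high - e_low) * h / frame for h in high_time]


def intermediate_fraction(fret_values, lower=0.35, upper=0.65):
    """Fraction of frames whose apparent FRET lies between the two true states."""
    hits = sum(1 for e in fret_values if lower < e < upper)
    return hits / len(fret_values)


rng = random.Random(0)
total_time = 2000.0  # seconds of simulated trajectory (assumed)
mean_dwell = 0.3     # mean dwell time per state in seconds (assumed)
switches = simulate_switch_times(total_time, mean_dwell, rng)

# The same underlying trajectory is "imaged" at two frame durations.
frac_100ms = intermediate_fraction(frame_averaged_fret(switches, total_time, 0.100))
frac_20ms = intermediate_fraction(frame_averaged_fret(switches, total_time, 0.020))
print(f"apparent intermediate frames at 100 ms: {frac_100ms:.3f}")
print(f"apparent intermediate frames at 20 ms:  {frac_20ms:.3f}")
```

      Real traces also contain photon noise and possible third states, so this sketch only shows the expected trend for the artifact scenario and does not reproduce the authors' analysis.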

      Related to weakness 3:

      - The authors depict the polymerase footprint differently in some of the figures and it is unclear if this is part of their model. Is the cartoon RNAP supposed to indicate the RNA:DNA hybrid or the footprint of T7 RNAP on the RNA? For example, in Figure 8a there are 8 nts (left) and 9 nts (right) covered by RNAP, and only 6nts in Figure 6 - supp 2A. This is particularly misleading for the EC-87 and EC-88 in Figure 6 - supp 2, where it is likely that this stem is not formed at all and the KL strand is single-stranded. The authors should clarify and at least indicate in the figure legend if the RNAP cartoon is part of the model or only a representation. 

      We thank the reviewer for bringing these issues to our attention. Due to space limitations, we chose to represent the polymerase footprint differently in Figure 8. However, we have included the statement “DNA templates from EC-87 to EC-105 are not displayed in the model” in the legend of Figure 8 to avoid confusion.

      Moreover, we have corrected the 6-nt error in Figure 6–figure supplement 2.

      - With a correct 9 bp RNA:DNA hybrid, the EC-88 construct would not be able to form the top part of the P2 stem and the second half of the KL RNA would be single-stranded. In this case, an interaction between the KL nucleotides would resemble a pseudoknot and not a kissing loop interaction. Can the authors comment on whether this could explain the heterogeneity they observe in the EC-88 construct compared to the riboG-apt RNA?

      We thank the reviewer for the comment. We have added the following statement to the revised manuscript: “The T7 RNA polymerase (RNAP) sequestered about 8 nt of the nascent RNA, preventing the EC-88 construct from forming the P2 stem (Durniak et al., 2008; Huang & Sousa, 2000; Lubkowska et al., 2011; Tahirov et al., 2002; Wang et al., 2022; Yin & Steitz, 2002). Consequently, a pseudoknot structure potentially formed instead of the expected KL. This distinction may account for the observed heterogeneity between EC-88 and riboG-apt” (page 11).

      Other comments:

      (1) It appears that the FRET histograms in the PLOR experiments (Figure 6 and related figures) only show the fits presumably to highlight the overlays. However, this makes it impossible to determine the goodness of the fit. The authors should instead show the outline of the raw histogram with the fit, or at least show the raw histograms with fits in the supplement. 

      We have replaced Figure 6- figure supplements 2-4 to enhance the clarity of the raw and fitted smFRET histograms.  

      (2) The authors should consider including a concluding paragraph to put the results into a larger context. How does the kinetic window compare to other transcriptional riboswitches? Would the authors comment on how the transcription speed compares to the kinetics for the formation of the KL? 

      We thank the reviewer for the comment. We have added the comparison of riboG to other transcription riboswitches to the manuscript as “Nevertheless, the ligand-sensitive windows of riboswitches during transcription vary. In a study conducted by Helmling et al. using NMR spectroscopy, they proposed a broad transcriptional window for deoxyguanosine-sensing riboswitches, whereby the ligand binding capability gradually diminishes over several nucleotide lengths (Helmling et al., 2017). However, more recent research by Binas et al. and Landgraf et al. on riboswitches sensing ZMP, c-di-GMP, and c-GAMP revealed a narrow window with a sharp transition in binding capability, even with transcript lengths differing by only one or three nucleotides (Binas et al., 2020; Landgraf et al., 2022). In line with the findings for the c-GAMP-sensing riboswitch, our study on the guanidine-IV riboswitch also demonstrated a sharp transition in binding capability with just a single nucleotide extension” ( page 14). 

      We appreciate the reviewer’s comment in comparing the transcription speed to the kinetics of the KL formation. However, we must acknowledge that we have limited kinetic data in this study to confidently make such a comparison.

      (3) Cy3Cy5 RiboG is a confusing name because it implies that the others are not also Cy3Cy5 labeled. The authors should consider changing the names and being consistent throughout. I suggest full-length riboG or riboG-136. 

      We have changed “Cy3Cy5 riboG” to “Cy3Cy5-full-length riboG” (pages 15 and 16).

      (4) The transcriptional readthrough experiment should be explained when first mentioned in line 109. 

      We have added the citation (Chien et al., 2023) of the transcriptional readthrough experiment to the manuscript as “we noted that the transcriptional read-through of the guanidine-IV riboswitch during the single-round PLOR reaction was sensitive to Gua+, exhibiting an apparent EC50 value of 68.7 ± 7.3 μM (Figure 1D) (Chien et al., 2023)” (page 5).

      (5) Kd values in text should have uncertainties, and the way these uncertainties are obtained should be explained.

      We have added the uncertainties of Kd values in the revised manuscript (page 6) and the legend of Figure 2–figure supplement 6 as “The percentages of the folded state (EFRET ~ 0.8) of Cy3Cy5-riboG-apt were plotted with the concentrations of Gua+ at 0.5 mM Mg2+, with an apparent Kd of 286.0 ± 18.1 μM in three independent experiments”.
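
      To give a sense of what an apparent Kd of this size implies, the sketch below evaluates a simple one-site binding isotherm at a few Gua+ concentrations. The response does not state which model was used to fit the titration, so the hyperbolic form, the function name `fraction_folded`, and the default baseline of 0 and amplitude of 1 are assumptions made for illustration only. The Kd value is the one quoted above.

```python
def fraction_folded(ligand_uM, kd_uM, baseline=0.0, amplitude=1.0):
    """One-site binding isotherm: baseline + amplitude * [L] / (Kd + [L]).

    Concentrations are in micromolar. The baseline and amplitude defaults
    are illustrative assumptions, not fitted values from the manuscript.
    """
    if ligand_uM < 0 or kd_uM <= 0:
        raise ValueError("ligand concentration must be >= 0 and Kd must be > 0")
    return baseline + amplitude * ligand_uM / (kd_uM + ligand_uM)


KD_UM = 286.0  # apparent Kd quoted in the response, in micromolar

for conc in (0.0, 100.0, 286.0, 1000.0):
    print(f"[Gua+] = {conc:7.1f} uM -> fraction = {fraction_folded(conc, KD_UM):.3f}")
```

      Under this model, half of the ligand-dependent change is reached when the Gua+ concentration equals the apparent Kd. A measured folded-state percentage would also carry a nonzero baseline and a maximum below 100%, which the two optional parameters are meant to accommodate.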

      (6) The authors mention "strategies" on line 306, but it is unclear what they are referring to. Are the strategies referring to the constructs (EC-87, etc) or Steps 1-8 in the supplemental figure? Please clarify. 

      We have clarified the confusion by adding “The detailed procedures of strategies 1-8 were shown in Figure 7–figure supplement 1” to the manuscript ( page 12).

      (7) What are the fraction of dynamic traces versus static traces in the cases for the full-length riboG? This would help depict the structural heterogeneity in the population. 

      We have added the fractions of dynamic single-molecule traces of the full-length riboG to Figure 4-supplements 1-5. 

      (8) The labels in Figure 4 (A-E) don't match the caption (A-H). 

      We have corrected the error. 

      (9) The coloring of the RNA strands in Figure 4A should be explained in the figure legend. It could be interpreted as multiple strands annealed instead of a continuous strand. 

      We have revised the legend of Figure 4A by adding “The full-length riboG contains the aptamer domain (black), terminator (red) and the extended sequence (blue). Cy3 and Cy5 are shown by green and red sparkles, respectively”.

      (10) Reported quantities and uncertainties should have the same number of decimal places. In many places, the uncertainties likely have too many significant figures, for example, in Figure 5 and related figures. 

      We have corrected the significant figures of the uncertainties. 
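
      The convention the reviewer asks for can be applied mechanically. The sketch below rounds an uncertainty to a chosen number of significant figures and then rounds the value to the same decimal place. The helper name `format_with_uncertainty` and the default of two significant figures are assumptions for illustration; the response does not say which convention the authors adopted. The two example inputs are the EC50 and Kd values quoted elsewhere in this response.

```python
import math


def format_with_uncertainty(value, uncertainty, sig_figs=2):
    """Format 'value ± uncertainty' with matching decimal places.

    The uncertainty is rounded to `sig_figs` significant figures and the
    value is rounded to the same decimal position.
    """
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Position of the leading digit of the uncertainty (0 for 1-9, 1 for 10-99).
    exponent = math.floor(math.log10(uncertainty))
    digits = sig_figs - 1 - exponent  # decimal places to keep (may be negative)
    decimals = max(0, digits)         # decimal places to print
    rounded_value = round(value, digits)
    rounded_unc = round(uncertainty, digits)
    return f"{rounded_value:.{decimals}f} ± {rounded_unc:.{decimals}f}"


print(format_with_uncertainty(68.7, 7.3))    # EC50 quoted in the response
print(format_with_uncertainty(286.0, 18.1))  # Kd quoted in the response
```

      With two significant figures, an uncertainty of 7.3 keeps one decimal place while an uncertainty of 18.1 is reported as a whole number, so the value next to it is printed without decimals as well.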

      (11) In Figure 5, A and B should have the same vertical scale to facilitate comparison. 

      We have adjusted Figure 5A to match the vertical scale of Figure 5B in the revised manuscript.

      (12) In Figure 5C-D, the construct from which those trajectories come should be indicated in the legend. 

      We have added the construct to the legend of Figures 5C and D.  

      (13) In Figure 6J, the splines between data points are confusing and can be misleading. They suggest that the data has been fit to a model, but I am not sure if it represents a model. The data points should be colored instead and lines removed. 

      We thank the reviewer for the comment. We have changed Figure 6J by coloring the data points and removing the lines to avoid confusion. 

      (14) Line 330 mentions a P2 structure in Figure 8, but there is no such label in Figure. Please clarify. 

      We thank the reviewer for the comment and have added P2 to Figure 8. 

      Reviewer #2 (Recommendations For The Authors):

      (1) Figure 1B. The authors don't seem to address the role of the blue stem-loop following Stems 1 and 2. Is this element needed at all for gene regulation? Does it impact the conformations or folding of the preceding Stems 1 and 2? It seems feasible to disrupt the stem and see whether there is an impact on riboswitch function. 

      We thank the reviewer for the comment. The presence of the sequence that forms the blue stem-loop indicates the formation of an anti-terminator conformation in riboG during transcription. Our smFRET data show that including the stem-loop sequence induces additional peaks in the full-length riboG compared to riboG-term. This indicates that the stem-loop influences the folding of the kissing loop (KL) and potentially also affects Stems 1 and 2.

      (2) Figure 7 supplement 1, C &D. Maybe I am missing something, but it seems to me in reaction #8 (EC-105, last two lanes), the readthrough percentage is close to 50% based on the gel but plotted in D as 20%. Further, there is a strong effect of guanidine in reaction #8 but that is not reflected in the quantitation in panel D. 

      We thank the reviewer for the comment. The discrepancy between reaction 8 in (C) and (D) arose because only the crude product from the last step (step 17) was loaded on the gel in (C), whereas the crude products from steps 16 and 17 were combined to calculate the read-through percentage in (D). We have corrected the discrepancy by replacing Figure 7–figure supplement 1C (now Figure 7C), and revised the legend to include the following clarification: “Taking into consideration that the 17 step-PLOR reaction exhibited a pause within the terminator region, resulting in a significant amount of terminated product at step 16, crude products from steps 16 and 17 were collected for (C) and (D) of the 17 step-PLOR reaction (Lanes 15 and 16 in C)”.

      (3) Figure 7C is a control that shows the quality of the elongation complexes, which probably should be in the supplement. Instead, in Figure 7 supplement 1, panels C and D are actual experiments and could be moved into the main figure.  

      We thank the reviewer for the comment. We made the adjustment.  

      (4) Figure S7D. I would suggest not labelling the RNA polymerase halt/stoppage sites due to NTP deprivation as "pausing sites" because transcriptional pausing has previously been defined as natural sites where the RNA polymerase transiently halts itself, but not due to the lack of the next NTPs. In this case, the elongating complexes were artificially halted, which is technically not "pausing", as it will not restart/resume on its own without intervention. 

      We have changed the “pausing” to “halting”.  

      (5) Figure 7 is titled "In vitro transcriptional performance of riboG." But the data is actually not about the performance of the riboswitch, or how well it functions. I would suggest the authors revise the title. This is mostly about the observed sensitivity window of the riboswitch to ligand-mediated conformational switching. 

      We have changed the title of Figure 7 to “Ligand-mediated conformational switching of riboG during transcription”.

      (6) Figure 7A, the illustration gives the visual impression that there are multiple RNA polymerases on the same DNA template, which is not the case. 

      We have revised Figure 7A by adding arrows between RNA polymerases to illustrate the movement of a single RNAP, rather than multiple RNAPs on the same template.

      (7) It could be informative to compare the guanidine-IV riboswitch with the first three classes (I, II, III), to see how their architectures or gene regulatory mechanisms are similar or different. 

      We thank the reviewer for the comment. We have added a comparison of the guanidine-IV riboswitch with the other three guanidine riboswitches to the manuscript as “The guanidine-IV riboswitch exhibits similarities to the guanidine-I riboswitch in gene regulatory mechanism, functioning as a transcriptional riboswitch. Structurally, it resembles the guanidine-II riboswitch through the formation of loop-loop interactions upon binding to guanidine (Battaglia & Ke, 2018; L. Huang et al., 2017; Lin Huang et al., 2017; Lenkeit et al., 2020; Nelson et al., 2017; Reiss & Strobel, 2017; Salvail et al., 2020)” (page 12).

      Reviewer #3 (Recommendations For The Authors):

      In addition to the public review items, I provide the following recommendations:

      (1) As a second language speaker, I understand that writing a compelling and concise story may be hard, and we tend to write more than needed or more repetitively. That being said, I do think that the writing could be improved to make it more concise, clear, and avoid repetitions.

      We thank the reviewer for the comment. We rewrote the abstract and some sentences in the manuscript.

      (2) In the abstract, instead of saying that "...This lack of understanding has impeded the application of this riboswitch", which makes the statement too strong, perhaps, stating something along the lines of "this understanding would assist the application of this riboswitch", would be a better fit. 

      We have rewritten the abstract and revised the sentence.

      (3) Methods should state which RNA polymerase was used. PLOR uses T7 RNA pol, so I assume it was the same. 

      We have added the statement “T7 RNAP was utilized in the PLOR and in vitro transcription reactions except noted” in the Methods ( page 15). 

      (4) The impact statement says comprehensive structure-function, where perhaps comprehensive folding-function would be more appropriate. We are still missing a lot of structural information about this particular riboswitch. 

      We agree with the reviewer and have changed “comprehensive structure-function” to “folding-function” in the Impact statement (page 2).

      (5) Higher Mg2+ concentrations implicated in a lesser extent of the switch of RiboGapt, a sentence talking about it would be useful (how Mg2+ could have promiscuous interaction and interfere with folding). 

      We have added the role of higher Mg2+ to the manuscript as “However, at a higher concentration of 50.0 mM Mg2+, the proportion of the pre-folded and unfolded conformations were more prevalent at 50.0 mM Mg2+ than at 20.0 mM Mg2+. This suggests that an excess of Mg2+ may promote the pre-folded and even unfolded conformations” ( page 6).

      (6) In the investigations of RiboG-term and RiboG, seems like that monovalents from the buffer are sufficient to promote secondary structure. A statement commenting on this would benefit the paper and the audience. 

      We agree with the reviewer and have revised the manuscript accordingly by adding “This indicates that monovalent ions in the buffer can facilitate the formation of stable guanidine-IV riboswitch” (page 8).

      (7) Figure 3. Figure goes to panel E and legend to panel H. G and H colors do not correspond to actual figure colors. 

      We made the correction.  

      (8) Figure 4. The same as Figure 3, the panels and figures are divergent.  

      We made the correction.  

      (9) During the discussion, stating that the DNA and RNA pol play a role in folding and ligand binding may be excessive. This could be an indirect effect of the transcriptional bubble hindering part of the nascent RNA from folding, which is something intrinsic to any transcription and not specific to this system. 

      We agree with the reviewer and have deleted the statement that the DNA and RNAP play a role in folding and ligand binding.

      (10) PLOR is not properly cited. When introduced in the manuscript, please cite the original PLOR paper (Liu et. al. Nature 2015) and additional related papers. 

      We cited the original PLOR paper (Liu et al., Nature 2015) and the related papers (Liu et al., Nature Protocols 2018) (pages 4 and 15).

      (11) The kinetics race of folding and binding could be a little more emphasized in discussion, particularly from the perspective of its physiological importance. 

      We agree with the reviewer and deleted the kinetics race of folding and binding from the Discussion part.

    1. Author response:

      We thank the reviewers for their positive feedback and helpful suggestions for improving our manuscript.

      We appreciate the reviewers highlighting areas where we can improve clarity, particularly in the analysis methodologies and details. We agree that additional control experiments and expansion on single-molecule tracking analysis will provide additional support for our interpretations. 

      We acknowledge the reviewers' suggestion to describe our work's relationship to other studies. While some of our findings are similar to those in past studies, our work introduces a new approach for labeling euchromatin with direct sequence specificity on a genome-wide scale, enabling a deeper understanding of euchromatin organization and dynamics. We will provide more context on the novelty of our work and incorporate a more comprehensive discussion of our work’s relation to other studies in the manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use the model organism Drosophila to explore the sex and age impacts of a TBI method. They find age and sex differences: older age is susceptible to mild TBI and females are also more susceptible. In particular, they pursue a finding that virgin vs mated females show different responses: virgins are protected but mated females succumb to TBI with climbing deficits. In fact, virgin females compared to mated females are largely protected. They discover that this is associated with exposure of the females to Sex Peptides in the reproductive neurons of the female reproductive tract. When they extend to RNAseq of brains, they show that there are very few genes in common between males, mated females, virgins and females mated with males lacking Sex Peptide. The few chronic genes associated with mated females seem associated with the immune system. These findings suggest that mated females have a compromised immune system, which might make them more vulnerable.

      Strengths:

      This is an interesting paper that allows a detailed comparison of sex and age in TBI which is largely only possible in such a simple model, where large numbers and many variations can be addressed. Overall the findings are interesting.

      Weaknesses:

      Although the findings beyond Sex Peptide are observational, the work sets the stage for more detailed studies to pursue the role of the genes they find by RNAseq and whether for example, boosting the innate immune system would protect the mated females, among other experiments.

      We thank the reviewer for their time and effort in evaluating our manuscript. We agree that future studies are needed to further determine the role of the genes that we have identified through RNA sequencing in the late life emergence of neurodegenerative conditions after the exposure to mild head trauma. We would like to investigate whether elevating mated female immunity can mitigate the risk for age-dependent neurodegeneration after mild head trauma.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors use the Drosophila model system to study the impact of mild head trauma on sex-dependent brain deficits. They identify Sex Peptide as a modulator of greater negative outcome in female flies. Additionally, they observe that increased age at the time of injury results in worse outcomes, especially in females, and that this is due to chronic suppression of innate immune defense networks in mated females. The results demonstrate a novel signaling pathway that promotes age- and sex-dependent outcomes after head injury.

      Strengths:

      The authors have modified their previously reported TBI model in flies to mimic mild TBI, which is novel. Methods are explained in detail, allowing for reproducibility. Experiments are rigorous with appropriate statistics. A number of important controls are included. The work tells a complete mechanistic story and adds important data to increase our understanding of sex-dependent differences in recovery after TBI. The discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      A very minor weakness is that exact n values should be included in the figure legends. There should also be confirmation of knockdown by RNAi in female flies either by immunohistochemistry or qRT-PCR if possible.

      We thank the reviewer for the evaluation of our manuscript and for the suggestion to include the exact n values in the figure legends. We will include the n values in our revision.

      Regarding RNAi knockdown of sex peptide receptors (SPRs), we agree that confirmation of the knockdown by IHC or qRT-PCR would further strengthen our findings. It should be noted, however, that the RNAi line we used has been extensively validated by Yapici et al., 2007 and in several subsequent publications. Importantly, the effectiveness of SPR knockdown is evident in female flies, as they exhibit dramatically reduced egg laying and lack the typical post-mating behaviors (such as rejection of male flies after initial mating) observed in wild-type mated female flies. In fact, female flies with RNAi-mediated SPR knockdown behave identically to females mated with SP-null male flies, confirming the effective disruption of the SP-SPR signaling pathway. We will revise the manuscript to make these points clear.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors used a Drosophila model to show that exposure to repetitive mild TBI causes neurodegenerative conditions that emerge late in life and disproportionately affect females. In addition to well-known age-dependent impact, the authors identified Sex Peptide (SP) signaling as a key factor in female susceptibility to post-injury brain deficits.

      Strengths:

      The authors have presented a compelling set of results showing that female Sex Peptide signaling adversely affects late-life neurodegeneration after early-life exposure to repetitive mild head injury in Drosophila. They have (1) compared the phenotypes of adult male and female flies sustaining TBI at different ages, and the phenotypes of virgin females and mated females, (2) compared the phenotypes of eliminating SP signaling in mating females and introducing SP-signaling into virgin females, (3) compared transcriptomic changes of different groups in response to TBI. The results are generally consistent and robust.

      Weaknesses:

      The authors have made their claims largely based on assaying climbing index and vacuole formation as the only indicators of late-life neurodegeneration after TBI. However, these phenotypes are not really specific to TBI-related neurodegeneration, and the significance and mechanisms of especially vacuole formation are not clear. The authors should perform additional analyses on TBI-related neurodegeneration in flies, which have been shown before (Genetics. 2015 Oct; 201(2): 377-402). Furthermore, it is also really surprising to see so few DEGs even in wild-type males and mated females, and to see that none of the DEGs overlapped among groups or are even related to the SP-signaling. This raises questions about the validity of the RNA-seq analysis. It is critical to independently verify their RNA-sequencing results and to add some more molecular evidence to support their conclusion. Finally, it is unknown what the implication of female fly mating and its associated Sex Peptide signaling would be to mammalians or humans, and what are the mechanisms underlying the sexual dimorphism.

      We thank the reviewer for the thorough evaluation of our manuscript. The reviewer raised a very important question: whether the neurodegeneration observed in our model is specific to TBI. As the reviewer rightly pointed out, the neurodegenerative phenotypes are unlikely to be specific to TBI-related neurodegeneration. Throughout the manuscript, we have tried to convey the notion that mild physical impacts to the head represent one form of environmental insult, which in combination with other risk factors such as aging can lead to the emergence of neurodegenerative conditions. It should be noted that the negative geotaxis assay and vacuolation quantification are two well-established approaches to assess sensorimotor deficits and frank brain degeneration in fly brains.

      It is important to emphasize that the head-specific impacts delivered to the flies in our study are much milder than those used in previous studies. As we showed in Figure 1, this very mild form of head trauma (referred to as vmHT) did not cause any death, nor did it affect the lifespan of the injured flies. Our supplemental data also show very minimal structural neuronal damage and essentially no acute or chronic apoptosis induced by vmHT exposure. Consistently, we did not observe any exoskeletal or eye damage immediately following injuries, nor did we observe any retinal degeneration or pseudopupil loss at the chronic stage of these flies. We will incorporate these important points in the revision.

      We agree that future studies are needed to independently validate our RNA sequencing results. We believe that the small number of DEGs is likely due to two unique features of our study: (1) the very mild nature of our injury paradigm and (2) the chronic examination timepoint, which was long after the head injury and SP exposure. Both features distinguish our study from previous fly TBI studies. As pointed out in the manuscript, our study aimed to understand how early-life exposure to repetitive head traumatic insults could lead to the late-life onset of neurodegenerative conditions. We hope to further validate our results in our next phase of experiments using single-cell RNA sequencing and RT-qPCR.

      As the reviewer pointed out, it would be very interesting to explore the possible roles of sex peptide-signaling in other animals and humans. As far as we know, there is no known mammalian ortholog to the insect sex peptide, so it would be difficult to study SP or an SP-like molecule in mammalian models. However, we believe that prolonged post-mating changes associated with reproduction in female fruit flies contribute to their elevated vulnerability to neurodegeneration.  In this regard, drastic changes within the biology of female mammals associated with reproduction can potentially lead to vulnerability to neurodegeneration. We agree that this demands further study, which may be done with future collaborators using rodent or large animal models.  We have discussed this point in the manuscript, but will revise it to further clarify the discussion.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study, López-Jiménez and colleagues demonstrated the utility of using high-content microscopy in dissecting host and bacterial determinants that play a role in the establishment of infection using Shigella flexneri as a model. The manuscript nicely identifies that infection with Shigella results in a block to DNA replication and protein synthesis. At the same time, the host responds, in part, via the entrapment of Shigella in septin cages.

      Strengths:

      The main strength of this manuscript is its technical aspects. They nicely demonstrate how an automated microscopy pipeline coupled with artificial intelligence can be used to gain new insights regarding elements of bacterial pathogenesis, using Shigella flexneri as a model system. Using this pipeline enabled the investigators to enhance the field's general understanding regarding the role of septin cages in responding to invading Shigella. This platform should be of interest to those who study a variety of intracellular microbial pathogens.

      Another strength of the manuscript is the demonstration - using cell biology-based approaches- that infection with Shigella blocks DNA replication and protein synthesis. These observations nicely dovetail with the prior findings of other groups. Nevertheless, their clever click-chemistry-based approaches provide visual evidence of these phenomena and should interest many.

      We thank the Reviewer for their enthusiasm on the technical aspects of this paper, regarding both the automated microscopy pipeline coupled with artificial intelligence and the click-chemistry based approaches to dissect DNA replication and protein synthesis by microscopy.

      Weaknesses:

      There are two main weaknesses of this work. First, the studies are limited to findings obtained using a single immortalized cell line. It is appreciated that HeLa cells serve as an excellent model for studying aspects of Shigella pathogenesis and host responses. However, it would be nice to see that similar observations are observed with an epithelial cell line of intestinal, preferably colonic origin, and eventually, with a non-immortalized cell line, although it is appreciated that the latter studies are beyond the scope of this work.

      The immortalized cell line HeLa is widely regarded as a paradigm to study infection by Shigella and other intracellular pathogens. However, we agree that future studies beyond the scope of this work should include other cell lines (e.g., epithelial cells of colonic origin, macrophages, primary cells).

      The other weakness is that the studies are minimally mechanistic. For example, the investigators have data to suggest that infection with Shigella leads to an arrest in DNA replication and protein synthesis; however, no follow-up studies have been conducted to determine how these host cell processes are disabled. Interestingly, Zhang and colleagues recently identified that the Shigella OspC effectors target eukaryotic translation initiation factor 3 to block host cell translation (PMID: 38368608). This paper should be discussed and cited in the discussion.

      We appreciate the Reviewer’s concern about the lack of follow-up work on our observations of host DNA and protein synthesis arrest upon Shigella infection, which will be the focus of future studies. We acknowledge the recent work of Zhang et al. (Cell Reports, 2024), which reports similar results on protein translation arrest, and we fully agree that this reference should be more fully discussed in a revised version of the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      Septin caging has emerged as one of the innate immune responses of eukaryotic cells to infections by intracellular bacteria. This fascinating assembly of eukaryotic proteins into complex structures restricts bacteria motility within the cytoplasm of host cells, thereby facilitating recognition by cytosolic sensors and components of the autophagy machinery. Given the different types of septin caging that have been described thus far, a single-cell, unbiased approach to quantify and characterise septin recruitment at bacteria is important to fully grasp the role and function of caging. Thus, the authors have developed an automated image analysis pipeline allowing bacterial segmentation and classification of septin cages that will be very useful in the future, applied to study the role of host and bacterial factors, compare different bacterial strains, or even compare infections by clinical isolates.

      Strengths:

      The authors developed a solid pipeline that has been thoroughly validated. When tested on infected cells, automated analysis corroborated previous observations and allowed the unbiased quantification of the different types of septin cages as well as the correlation between caging and bacterial metabolic activity. This approach will prove an essential asset in the further characterisation of septin cages for future studies.

      We thank the Reviewer for their positive comments, and for highlighting the strength of our imaging and analysis pipeline to analyse Shigella-septin interactions.

      Weaknesses:

      As the main aim of the manuscript is to describe the newly developed analysis pipeline, the results illustrated in the manuscript are essentially descriptive. The developed pipeline seems exceptionally efficient in recognising septin cages in infected cells but its application for a broader purpose or field of study remains limited.

      The main objective of this manuscript is the development of imaging and analysis tools to study Shigella infection, and in particular, Shigella interactions with the septin cytoskeleton. In future work we will provide more mechanistic insight with novel experiments and broader applicability, using different cell lines (in agreement with Reviewer 1), mutants or clinical isolates of Shigella, and different bacterial species (e.g., Listeria, Salmonella, mycobacteria).

      Reviewer #3 (Public Review):

      Summary:

      The manuscript uses high-content imaging and advanced image-analysis tools to monitor the infection of epithelial cells by Shigella. They perform some analysis on the state of the cells (through measurements of DNA and protein synthesis), and then they focus on differential recruitment of Sept7 to the bacteria. They link this recruitment with the activity of the bacterial T3SS, which is a very interesting discovery. Overall, I found numerous exciting elements in this manuscript, and I have a couple of reservations. Please see below for more details on my reservations. Nevertheless, I think that these issues can be addressed by the authors, and doing so will help to make it a convincing and interesting piece for the community working on intracellular pathogens. The authors should also carefully re-edit their manuscript to avoid overselling their data (see below for issues I see there). I would consider taking out the first figure and starting with Figure 3 (Figure 2 could be re-organized in the later parts)- that could help to make the flow of the manuscript better.

      Strengths:

      The high-content analysis including the innovative analytical workflows are very promising and could be used by a large number of scientists working on intracellular bacteria. The finding that Septins (through SEPT7) are differentially regulated through actively secreting bacteria is very exciting and can steer novel research directions.

      We thank the Reviewer for their constructive feedback and the excitement for our results, including our findings on T3SS activity and Shigella-septin interactions. In accordance with the Reviewer’s comments, we agree to carefully re-edit our manuscript to avoid overselling our data in a future version of the manuscript. We will also consider rearranging figures depending on new results.

      Weaknesses:

      The manuscript makes a connection between two research lines (1: Shigella infection and DNA/protein synthesis, 2: regulation of septins around invading Shigella) that are not fully developed - this makes it sometimes difficult to understand the take-home messages of the authors.

      We agree that the manuscript is mostly technical and that some of our experimental observations would therefore benefit from follow-up mechanistic studies in the future. We highlight our vision for broader applicability in our response to the weaknesses raised by Reviewer 2.

      It is not clear whether the analysis that was done on projected images actually reflects the phenotypes of the original 3D data. This issue needs to be carefully addressed.

      We agree with the Reviewer that characterizing 3D data using 2D projected images has limitations.

      We observe an increase in cell and nuclear surface that does not strictly imply a change in volume. This is why we measure Hoechst intensity in the nucleus using SUM-projection (as it can be used as a proxy of DNA content of the cell). However, we agree that future use of other markers (such as fluorescently labelled histones) would make our conclusions more robust.
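      As a rough illustration of why a SUM-projection, rather than a maximum-intensity projection, can act as a proxy for total signal along z, the minimal Python sketch below compares the two on toy z-stacks. The stacks, function names, and intensity values are hypothetical and are not taken from the pipeline described in the manuscript.

```python
def sum_projection(stack):
    """Collapse a z-stack (list of 2D planes) by summing intensities along z."""
    height, width = len(stack[0]), len(stack[0][0])
    return [[sum(plane[y][x] for plane in stack) for x in range(width)]
            for y in range(height)]


def max_projection(stack):
    """Collapse a z-stack by keeping the brightest voxel along z."""
    height, width = len(stack[0]), len(stack[0][0])
    return [[max(plane[y][x] for plane in stack) for x in range(width)]
            for y in range(height)]


def total_intensity(image):
    """Integrated intensity of a 2D image."""
    return sum(sum(row) for row in image)


# Two hypothetical nuclei with the same 2D footprint; the second one
# carries signal in twice as many z-planes (toy values, not real data).
thin_nucleus = [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]
thick_nucleus = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]

if __name__ == "__main__":
    for name, stack in (("thin", thin_nucleus), ("thick", thick_nucleus)):
        print(name,
              total_intensity(sum_projection(stack)),
              total_intensity(max_projection(stack)))
```

      In this toy case the summed projection scales with the amount of signal through the depth of the nucleus, whereas the maximum projection cannot distinguish the two stacks.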

      Regarding the different orientation of intracellular bacteria, we agree that investigation of septin recruitment is more challenging when bacteria are placed perpendicular to the acquisition plane. In a first step, we trained a Convolutional Neural Network (CNN) using 2D data, as it is easier/faster to train and requires fewer annotated images. In doing so, we already managed to correctly identify 80% of Shigella interacting with septins, which enabled us to observe higher T3SS activity in this population. In future studies, we will maximize the 3D potential of our data and retrain a CNN that will allow more precise identification of Shigella-septin interactions and in-depth characterization of volumetric parameters.

    1. Author response:

      We would like to thank all reviewers and editors for their thorough peer review and valuable suggestions. In these provisional responses, we summarize the main concerns raised by the reviewers and outline our planned revisions to address them in the manuscript.

      Overall, we are pleased to note that the reviewers agree on the potential value of our updated toolbox for gene editing, highlighting its various applications. However, they also raised several valid concerns, which we have summarized and responded to as follows:

      (1) Mutant phenotypes in transfected populations can be occasionally reversed or escaped. This suggests it will not be possible to detect growth-associated phenotypes in pooled screens. An experiment with a pooled loss-of-function screen to test this is missing.

      Escapes or reversals of mutant phenotypes have been observed with other genetic tools used for loss-of-function screening, including lentiviral CRISPR approaches in mammalian systems and RNAi in Trypanosoma brucei. Cells can escape phenotypes through various mechanisms, such as promoter silencing or selection of non-deleterious mutations. Additionally, not every CRISPR guide is efficient in generating a mutant phenotype, and RNAi constructs can also vary in their effectiveness. Despite these challenges, genome-wide loss-of-function screens have been successfully carried out in mammalian cells and Trypanosoma parasites. Therefore, we believe that the observed escape of one mutant phenotype does not preclude the detection of growth-associated or other phenotypes in pooled screens. Moreover, we did not observe a reversal of the mutant phenotype in L. mexicana, L. donovani, and L. major parasites expressing tdTomato from an expression cassette integrated into the 18S rRNA SSU locus (Figure 4). However, the reviewers are rightfully requesting a pooled loss-of-function screen to validate this. Since submitting this manuscript, we have conducted multiple pooled loss-of-function screens, which have confirmed the ability of the method presented here to detect a range of mutant phenotypes in pooled screening formats. We will include these results in our revised manuscript.

      (2) The possibility of mis-integration of the CBE sgRNA expression construct into an entirely different locus is not explored.

      We plan to reanalyze our ONT sequencing data to verify whether the CBE sgRNA expression construct was integrated into unintended loci. If we detect any mis-integration events, we will evaluate their potential negative impacts and discuss these findings in the revised manuscript.

      (3) The achieved increase in editing efficiency compared to the previous base editing method could be more clearly presented.

      We have directly compared our improved method to our previous base editing method in Figures 1E and 4, demonstrating higher editing rates in a much shorter time. In the revised manuscript, we will present and describe the increase in editing rate more clearly.

      (4) The improvements on CBE sgRNA guide design are hypothetical and untested.

      We agree that the improvements to the CBE sgRNA design are currently hypothetical. We plan to systematically test our guide design principles in future studies. Since this will require testing hundreds of guides to draw robust conclusions, we believe that this aspect is beyond the scope of the current study. However, we will discuss our plans for future validation in the revised manuscript.

      Overall, we appreciate the reviewers' insights and are committed to addressing their concerns thoroughly. We believe that the planned revisions and additional experiments will significantly strengthen our manuscript and provide a more comprehensive evaluation of our updated gene editing toolbox.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the editors and reviewers for their encouraging comments. Reviewer 1 raises an important question regarding the translation of biomarker-derived data into dietary recommendations, taking the high variability in food composition into consideration. Unfortunately, there is no straightforward answer, as the high variability in food composition means that the number of cups of tea needed for 200 mg of flavan-3-ols will depend on the flavanol content of the tea. A probabilistic modelling approach, as we have used to investigate the impact of food content variability on estimated associations with health outcomes, would be a possible solution. This could provide food-based recommendations that would meet a defined intake with a certain probability. However, developing and exploring such models is beyond the scope of this manuscript and we have therefore decided not to include this in our response. We have stated in the manuscript that such a method needs to be developed.
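      To make the idea of a recommendation that "meets a defined intake with a certain probability" concrete, the following is a minimal Monte Carlo sketch in Python. It is not the authors' model: the function names are hypothetical, and the lognormal content distribution (median 60 mg per serving, log-scale spread 0.5) is a placeholder chosen purely for illustration, not measured food composition data.

```python
import math
import random


def servings_needed(target_mg, probability, sample_content_mg,
                    max_servings=50, n_draws=2000, seed=0):
    """Smallest number of servings whose simulated total intake reaches
    target_mg in at least `probability` of the draws (None if never)."""
    rng = random.Random(seed)  # fixed seed so repeated calls are reproducible
    for servings in range(1, max_servings + 1):
        hits = sum(
            1
            for _ in range(n_draws)
            if sum(sample_content_mg(rng) for _ in range(servings)) >= target_mg
        )
        if hits / n_draws >= probability:
            return servings
    return None


def hypothetical_content_mg(rng):
    # ASSUMPTION: placeholder lognormal content per serving (median 60 mg).
    # These parameters are illustrative only and are NOT measured values.
    return rng.lognormvariate(math.log(60.0), 0.5)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, servings_needed(200.0, p, hypothetical_content_mg))
```

      Under these assumptions, the number of servings returned can only stay the same or increase as the required probability rises, which is the trade-off a food-based recommendation of this kind would have to state explicitly.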

      We have addressed the typographical errors and the other comments as follows:

      •   Line 126 - this is the first mention of DR-FCT and as such it needs to be defined. This was a typo and it was corrected throughout the manuscript. The actual abbreviation is DD-FCT and it is defined in line 78.

      •   Figure 4 - what exactly is this figure trying to convey to the reader? A better explanation about this figure is needed. The figure legend was updated and extended to increase clarity.

      •   Figure 5 - Why are the graphs presented differently, meaning why are the data for the flavan-3-ols and epicatechin differentiated for men and women and not nitrate. The sample size for nitrate was too small to stratify in the same way as for flavan-3-ols.

      •   Line 365 - more information is needed, I am assuming the authors are stating “The tableone package for R ...”. As requested by the reviewer, additional details are now included.

      We have also revised the abstract, the conclusion and the discussion of limitations of the biomarker approach to improve the readability of the manuscript.

    1. Author response:

      We are thankful to the expert reviewers and the editorial team for their assessment of our manuscript and valuable comments, which will help us to improve our manuscript. While Reviewer #1 appreciated the comprehensive assessment using advanced methods, Reviewer #2 asked for an extension of traditional neuropathological and neuroradiological assessments. Both reviewers identified limitations of the study like the inability to provide direct histopathological evidence for meningitis due to missing meninges tissue, resulting in the conclusions being based on indirect evidence. The reviewers raised concerns about potential post mortem penetration of bacteria into the brain parenchyma. Reviewer #1 also questioned the evidence for cortical siderosis based on the intensity of histological stains.

      We agree with both reviewers and the editorial comment that a traditional neuropathological assessment of meningeal status would have strongly boosted the study's conclusions. Please note that the opportunistic sampling approach after a wild animal’s “natural” death, which is the only ethical method to study infection biology in great apes, is intrinsically accompanied by some limitations such as the lack of standardized post mortem intervals or incomplete sampling. In the revised version of the manuscript, we will complement the advanced MRI and histology already presented by extended traditional neuroradiological and neuropathological assessments as recommended by Reviewer #2, including a report on the status of other organs. However, it is important to note that the interpretation of post mortem MRI of brain material collected in the field differs substantially from conventional in vivo MRI and requires tailored analysis and interpretation. Below we comment on three aspects addressed by reviewers:

      *Missing meninges*: The meninges and associated vessels had to be removed to reduce blood-related artifacts in previously performed MRI measurements. We are aware that this poses a major limitation of this study, and thus rely on the evidence derived from the material at hand. Neuropathological assessment is in agreement with the reviewer's comments that no overt acute bacterial meningitis with, e.g., turbid appearance, purulent exudates or frank hemorrhages is apparent in the macroscopic inspection of the presented material. However, the macroscopic changes should be evaluated in the light of the brief time interval between bacterial colonization and death. Meningeal bacterial invasion was visualized on a few meningeal residues we found in case 1, proving the invasion of the subarachnoid space. Based on the reviewer's suggestions, the microscopic neuropathological evaluation will be expanded to identify further regions with meningeal residues in order to 1) reduce potential sampling bias and 2) better characterize the leptomeningeal infiltrates, focusing on early inflammatory markers. However, an extensive assessment of the histopathological inflammatory status will have to await future studies on specimens with remaining meninges.

      *Putrefaction/Post mortem bacterial proliferation*: Reviewers raised important points by remarking that the tissue alterations could be due to putrefaction/post mortem effects. Classical bacterial putrefaction is unlikely, since no mixed flora of opportunistic bacteria was detected, suggesting that time before fixation was sufficient to prevent secondary bacterial invasion in the presented specimens. Moreover, it has been shown that for the post mortem interval of <24 hours bacterial invasion of the brain is rare even at higher temperatures (Ith et al 2011, https://doi.org/10.1002/nbm.1623). The possibility of post mortem tissue propagation of Bcbva must be considered, since there is a lack of experimental data on the pathogen’s growth after host death, which has been discussed by us in the "Limitations" section in the original manuscript. Although it seems plausible that post mortem multiplication in the brain does occur to a certain extent, several observations suggest that this is not the only mechanism at play in the presented cases. We observed early microglial activation and astrogliosis indicating a beginning inflammatory reaction in the brain parenchyma. Taken together, the data presented suggest a short time interval between bacterial colonization and death. Under this premise, further analyses for the revision of the manuscript will more closely investigate pathological in vivo tissue alterations.

      *Siderosis*: Signs of cortical siderosis were evident in the MRI images of all adult cases (1, 3, and 4), appearing as a hyperintense rim in quantitative R2* maps, indicating substantially elevated levels of iron on the brain surface. These findings were confirmed by Perls’s stain for iron. Such rims in R2* are a typical sign of cortical iron deposition due to siderosis, as observed in conditions like angiopathies. Meningeal bleedings are the most probable source of the elevated iron levels in the cortex. Importantly, such signs were never observed in the post mortem brains of chimpanzees not infected with Anthrax (about 30 cases analyzed so far). Reviewer #1 noted that the intensity of the Perls’s stain seemed too low for siderosis. However, this intensity can vary depending on staining procedure and may be lower for the acute and short disease course of Bcbva-induced Anthrax compared to the chronic human cases Reviewer #1 may be referring to. Taken together, we believe that the evidence of cortical siderosis is compelling, speaking in favor of pre mortem meningeal hemorrhage.

      In summary, in the revised version of the manuscript, we plan to: (1) add a traditional neuroradiological assessment of all scans; (2) present an extended traditional neuropathological assessment of all cases; (3) report results on the status of early inflammatory markers; and (4) discuss the limitations of the study in more detail.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Major findings or outcomes include a genome for the wasp, characterization of the venom constituents and teratocyte and ovipositor expression profiles, as well as information about Trichopria ecology and parasitism strategies. It was found that Trichopria cannot discriminate among hosts by age, but can identify previously parasitized hosts. The authors also investigated whether superparasitism by Trichopria wasps improved parasitism outcomes (it did), presumably by increasing venom and teratocyte concentrations/densities. Elegant use of Drosophila ectopic expression tools allowed for functional characterization of venom components (Timps), and showed that these proteins are responsible for parasitoid-induced delays in host development. After finding that teratocytes produce a large number of proteases, experiments showed that these contribute to digestion of host tissues for parasite consumption. The discussion ties these elements together by suggesting that genes used for aiding in parasitism via different parts of the parasitism arsenal arise from gene duplication and shifts in tissue of expression (to venom glands or teratocytes).

      Strengths:

      The strength of this manuscript is that it describes the parasitism strategies used by Trichopria wasps at a molecular and behavioral level with broad strokes. It represents a large amount of work that in previous decades might have been published in several different papers. Including all of these data in a manuscript together makes for a comprehensive and interesting study.

      Weaknesses:

      The weakness is that the breadth of the study results in fairly shallow mechanistic or functional results for any given facet of Trichopria's biology. Although none of the findings are especially novel given results from other parasitoid species in previous publications, integrating results together provides significant information about Trichopria biology.

      We thank the reviewer for appreciating the importance of our study.

      Reviewer #2 (Public Review):

      Summary:

      Key findings of this research include the sequencing of the wasp's genome, identification of venom constituents and teratocytes, and examination of Trichopria drosophilae (Td)'s ecology and parasitic strategies. It was observed that Td doesn't distinguish between hosts based on age but can recognize previously parasitized hosts. The study also explored whether multiple parasitisms by Td improved outcomes, which indeed it did, possibly by increasing venom and teratocyte levels. Utilizing Drosophila ectopic expression tools, the authors functionally characterized venom components, specifically tissue inhibitors of metalloproteinases (Timps), which were found to cause delays in host development. Additionally, experiments revealed that teratocytes produce numerous proteases, aiding in the digestion of host tissues for parasite consumption. The discussion suggests that genes involved in different aspects of parasitism may arise from gene duplication and shifts in tissue expression to venom glands or teratocytes.

      Strengths:

      This manuscript provides an in-depth and detailed depiction of the parasitic strategies employed by Td wasps, spanning both molecular and behavioral aspects. It consolidates a significant amount of research that, in the past, might have been distributed across multiple papers. By presenting all this data in a single manuscript, it delivers a comprehensive and engaging study that could help future developments in the field of biological control against a major insect pest.

      Weaknesses:

      While none of the findings are particularly groundbreaking, as similar results have been reported for other parasitoid species in prior research, the integration of these results into one comprehensive overview offers valuable biological insights into an interesting new potential biocontrol species.

      We thank the reviewer for appreciating the importance of our study and for the suggestions on how to improve it.

      Reviewer #1 (Recommendations For The Authors):

      No additional comments

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      Line 68 : would be better to spell out the name of the genus at first mention of the species

      It has been corrected as suggested.

      Lines 90-92 : This statement does not coincide with the figure. Could you please explain this better?

      We have carefully checked the statement against the corresponding figure panels but could not find a discrepancy between them. Perhaps the similar, neighboring labels Dsuz and Dsan caused confusion about the emergence rates. To avoid this potential confusion, we have modified Fig. 1b and 1c to highlight the focal host, Dsuz.

      Line 124: could you explain why the mention of these genes (Piwi) is important in this context, particularly for readers who are not experts in this field?

      A previous study revealed a relationship between the expansion of piwi genes and large genome size; we meant to report a different pattern in our focal genome. We understand that the confusion may have been caused by the inserted statement about repeats, which separated the two points. We have therefore moved the citation of the previous finding so that it immediately precedes the conclusion.

      Line 233: "...composition remains largely unknown.." for Td or in general? Not clear..

      Thank you. To make it clear, we have modified this sentence as “Although teratocytes have been reported in several other parasitoids, their molecular composition remains largely unknown in general”.

      Line 286: "at a certain time".. confusing, please rephrase.

      We have rephrased it as “After a certain time (2 or 4 hours for oviposition choice)”.

      Line 293-294: I find this sentence quite hard to follow. Could you please rephrase it and/or expand this concept to make it clearer?

      We have modified this sentence as “The parasitic success of Td largely relies on locating a young host; however, Td does not have the ability to discriminate between young and old hosts. Whether Td has evolved any adaptive strategies to compensate for this disadvantage?”

      Line 314: "it would be interesting".. this is too weak of an argument. Please corroborate your motivation more soundly.

      We have changed this statement to “Because Td allows conditional intraspecific competition, the next compelling question would be whether Td allows interspecific competition with larval parasitoids.”

      Line 391: Divergent evolution is too big a term in this context. I would tone it down to something like: "Studying ecological niche differentiation".

      Thank you. It has been corrected as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews. 

      eLife assessment

      This important manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. Compelling evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      eLife assessment, Significance of findings

      This valuable manuscript follows up on previous findings from the same lab supporting the idea that deficits in learning due to enhanced synaptic plasticity are due to saturation effects. 

      According to the eLife criteria for assessing significance, the “valuable” assessment indicates “findings that have theoretical or practical implications for a subfield.” We have revised the manuscript to emphasize the “theoretical and practical implications beyond a single subfield” which “substantially advance our understanding of major research questions”, with “profound implications” and the potential for “widespread influence,” the eLife criteria for a designation of “landmark” significance.   

      The most immediate implications of our results are for the two major neuroscience subfields of cerebellar research and autism research. However, as recognized by Reviewer 2, the implications are much broader than that: “the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.” We have substantially revised the Discussion section of the manuscript to more explicitly lay out how the central idea of our manuscript, that the capacity for learning at any given moment is powerfully influenced by dynamic, activity- and plasticity-dependent changes in the threshold for synaptic plasticity over short timescales of tens of minutes to hours, has implications for scientific thinking and experiments on plasticity and learning throughout the brain, as well as clinical practice for a wide array of brain disorders associated with altered plasticity and learning impairment.

      To emphasize the broad conceptual implications of our research, we have reframed our conclusions in terms of metaplasticity rather than saturation of plasticity throughout the revised manuscript. In our previous submission, we had used the “saturation” terminology for continuity with our previous Nguyen-Vu et al 2017 eLife paper, and mentioned the related idea of threshold metaplasticity in a single sentence: “Similarly, the aberrant recruitment of LTD before training may lead, not to its saturation per se, but to some other kind of reduced availability, such as an increased threshold for its induction (Bienenstock, Cooper, and Munro, 1982; Leet, Bear, and Gaier, 2022).” However, we now appreciate that metaplasticity is a more general conceptual framework for our findings, and therefore emphasize this concept in the revised manuscript, while still making the conceptual link with the “saturation” idea presented in Nguyen-Vu et al 2017 (lines 236-238).

      The concept of a sliding threshold for synaptic plasticity (threshold metaplasticity) was proposed four decades ago by Bienenstock, Cooper and Munro (1982) as a mechanism for countering an instability inherent in Hebbian plasticity whereby correlated pre- and post-synaptic activity strengthens a synapse, which leads to an increase in correlated activity, which in turn leads to further strengthening. To counter this, BCM proposed a sliding threshold whereby increases in neural activity increase the threshold for LTP and decreases in activity decrease the threshold for LTP, thereby providing a mechanism for stabilizing firing rates and synaptic weights. This BCM sliding threshold model has been highly influential in theoretical and computational neuroscience, but experimental evidence for whether and how such a mechanism functions in vivo has been quite limited.  
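
      For readers less familiar with the BCM model, the sliding threshold can be written in its commonly cited textbook form. This is a sketch for orientation only, not a formula taken from our manuscript; the symbols $w_i$ (synaptic weight), $x_i$ (presynaptic activity), $y$ (postsynaptic activity), $\theta_M$ (modification threshold), $\eta$ (learning rate) and $\tau_\theta$ (averaging time constant) are notation assumed here.

      ```latex
      \frac{dw_i}{dt} = \eta \, x_i \, y \, (y - \theta_M),
      \qquad
      \tau_\theta \frac{d\theta_M}{dt} = y^2 - \theta_M
      ```

      In this form, a synapse with active input potentiates when $y > \theta_M$ and depresses when $y < \theta_M$. Because $\theta_M$ tracks a running average of $y^2$, sustained high activity raises the threshold for LTP and sustained low activity lowers it.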

      Our work extends the previous, limited experimental evidence for a BCM-like sliding threshold in vivo in several significant ways, which we now discuss in the revised manuscript:

      First, we analyze threshold metaplasticity at synapses where the plasticity is not Hebbian and lacks the inherent instability that inspired the BCM model. The synapses onto cerebellar Purkinje cells have been described as “anti-Hebbian” because the associative form of plasticity is synaptic LTD of excitatory inputs. This anti-Hebbian associative plasticity lacks the instability inherent in Hebbian plasticity. Moreover, a BCM-like sliding threshold that increases the threshold for associative LTD with increased firing rates and decreases the threshold for LTD with decreased firing rates would tend to oppose rather than support the stability of firing rates; nevertheless, we find evidence for this in our experimental results. Thus, for cerebellar LTD, the central function of the sliding threshold may not be the stabilization of firing rates, but rather to limit plasticity in order to suppress the overwrite of new memories or to allocate different memories to the synapses of different Purkinje cells.

      Second, we analyze the influence of a BCM-like sliding threshold for plasticity on behavioral learning. Most previous evidence for the BCM model in vivo has derived from studies of the effects of sensory deprivation (e.g., monocular occlusion) on the functional connectivity of sensory circuits (Kirkwood et al., 1996; Desai et al. 2002; Fong et al., 2021) rather than on learning per se.  

      Third, our results provide evidence for major changes in the threshold for plasticity over short time scales and with more subtle manipulations of neural activity than used in previous studies, with practical implications for clinical application. Previously, metaplasticity has been demonstrated with sensory deprivation over multiple days (Kirkwood et al., 1996; Desai et al. 2002) or with drastic changes in neural activity, such as with TTX in the retina (Fong et al, 2021), TMS (Hamada et al 2008), or high frequency electrical stimulation in vitro (Holland & Wagner 1998; Montgomery & Madison 2002) or in vivo (Abraham et al 2001). In contrast, we provide evidence for metaplasticity induced by 30 min of behavioral manipulation (pre-training) and by the relatively subtle pharmacological manipulation of activity with systemic administration of diazepam, a drug approved for humans. Thus, our work contributes not only conceptually to understanding the function of threshold metaplasticity in vivo, but also offers practical observations that could pave the way for novel therapeutic interventions.  

      Fourth, whereas efforts to enhance plasticity and learning have largely focused on increasing the excitability of neurons during learning to help cross the threshold for plasticity (e.g., Albergaria et al., 2018; Yamaguchi et al., 2020; Le Friec et al., 2017), we take the opposite, somewhat counterintuitive approach of inhibiting the excitability of neurons during a period before learning to reset the threshold for plasticity to a state compatible with new learning. To our knowledge, the only other application of such an approach in an animal model of a brain disorder has been inhibiting peripheral (retinal) activity with TTX for treatment of amblyopia (Fong et al, 2021). Our findings from CNS inhibition with a single systemic dose of diazepam greatly expand the potential applications, which could readily be tested in other mouse models of human disorders, and other learning deficits. Even in cases where the specific synaptic impairments and circuitry are less fully understood, the impact of suppressing neural activity during a period before training to reduce the threshold for plasticity could be empirically tested.

      Fifth, our work extends the consideration of a BCM-like sliding threshold for plasticity to the cerebellum, whereas previous work has focused on models and experimental studies of forebrain circuits. Currently there is a surge of interest in the contribution of the cerebellum to functions and brain disorders previously ascribed to forebrain, hence we anticipate broad interest in this work. 

      Sixth, our results suggest that the history of plasticity rather than the history of firing rates may be the homeostat controlling the threshold for plasticity, at least at the synapses under consideration. Diazepam pre-treatment only enhanced learning in the L7-Fmr1 KO mice with a low “baseline” threshold for plasticity, as measured in vitro, and not WT mice. This suggests it is not the neural activity per se that drives the change in threshold for plasticity, but the interaction of activity with the plasticity mechanism.

      In the revised Discussion, we make all of the above points, to make the implications more clear to readers.  

      The broad interest in this topic is illustrated by two concrete examples. First, an abstract of this work was honored with selection for oral presentation at the November 2023 Symposium of the Molecular and Cellular Cognition Society, a conceptually wide-ranging organization with thousands of members worldwide. Second, the most closely related published work on activity-dependent metaplasticity in vivo, the Fong et al 2021 eLife paper demonstrating reversal of amblyopia by suppression of activity in the retina by TTX, attracted such broad interest, not just of professional scientists, but also the general public, as to be reported on National Public Radio’s All Things Considered, with an audience of 11.9 million people worldwide.  

      In considering the potential of this work for widespread influence, it is important to note that activity-driven changes in the threshold for plasticity could very well be a general property of most if not all synapses, yet very little is known about its function in vivo, especially during learning. Therefore, the seminal conceptual and practical advances described above have the potential for profound implications throughout neuroscience, psychiatry, neurology and computer science/AI, the eLife criterion for designation as “landmark” in significance. We respectfully request that the reviewers and editor reassess the significance of our findings in light of our much-improved discussion of the broad significance of the work.

      eLife assessment, Strength of support

      Convincing evidence is presented that behavioral learning deficits associated with enhanced synaptic plasticity in a transgenic mouse model can be rescued by manipulations designed to reverse the saturation of synaptic plasticity. In particular, the finding that a previously FDA-approved therapeutic can rescue learning could provide important new insights for biologists, psychologists, and others studying learning and neurodevelopment.

      The designation of “Convincing” indicates “methodology in line with current state-of-the-art.” In the revised Discussion, we more clearly highlight that our evidence is “more rigorous than current state-of-the-art” in several respects, thereby meeting the eLife criterion for “Compelling”:

      (1) Comparison of learning deficits and effects of behavioral and pharmacological pretreatment across five closely related oculomotor learning tasks, which all depend on the same region of the cerebellum (the flocculus), but which previous work has found to vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. 

      The “state-of-the-art” behavioral standard in the field of learning is assessment of a single learning task that depends on a given brain area, with the implicit or explicit assumption that the task chosen is representative of “cerebellum-dependent learning” or hippocampus-, amygdala-, basal ganglia-, cortex- dependent learning, etc. Sometimes there is a no-learning behavioral control. 

      Our study exceeds this standard by comparing across many different closely related learning tasks, which all depend on the cerebellar flocculus and other shared vestibular, visual, and oculomotor circuitry, but vary in their dependence on LTD at the cerebellar parallel fiber-to-Purkinje cell synapses. In the original submission, we reported results for high-frequency VOR-increase learning that were dramatically different than for three other VOR learning tasks for which there is less evidence for a role of LTD. Reviewer 2 noted, “the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable.” In the revised manuscript, we provide new data for a second oculomotor learning task in which LTD has been implicated, OKR adaptation, with very similar results as for high-frequency VOR-increase learning. The remarkable specificity of both the learning deficits and the effects of pre-training manipulations, in two different lines of mice, for the two specific learning tasks in which LTD has been most strongly implicated, and not the other three oculomotor learning tasks, substantially strengthens the evidence for the conclusion that the learning deficits and effects of pre-training are related specifically to the lower threshold for LTD, rather than the result of some other effect of the gene KO or pre-treatment on the cerebellar or oculomotor circuitry (discussed on lines 270-290 of revised manuscript).

      (2) Replication of findings in more than one line of mice, targeting distinct signaling pathways, with a common impact of enhancing LTD at the cerebellar PF-Purkinje cell synapses.  

      State-of-the-art is to report the effects of one specific molecular signaling pathway on behavior. 

      In the first part of this Research Advance, we replicate the findings of Nguyen-Vu et al 2017 for a completely different line of mice with enhanced LTD at the parallel fiber-to-Purkinje cell synapses. Like the comparison across LTD-dependent and LTD-independent oculomotor learning tasks, the comparison across completely different lines of mice with enhanced LTD strengthens the evidence that the shared behavioral phenotypes are a reflection of the state of LTD rather than other “off-target” effects of each mutation (discussed on lines 291-309 of revised manuscript).

      (3) Reversal of learning impairments with more than one type of treatment. 

      State-of-the-art is to be able to reverse a learning deficit or other functional impairment in an animal model of a brain disorder with a single treatment; indeed, success in this respect is viewed as wildly exciting, as evidenced by the reception by the scientific and lay communities of the Fong et al, 2021 eLife report of reversal of amblyopia by TTX treatment of the retina. 

      In the current work, we demonstrate reversal of learning deficits with two different types of treatment during the period before training, one behavioral and one pharmacological. The current diazepam pretreatment results provide a fundamentally new type of evidence for the hypothesis that the threshold for LTD and LTD-dependent learning varies with the recent history of activity in the circuit, complementing the evidence from behavioral and optogenetic pre-training approaches used previously in Nguyen-Vu et al, 2017 (discussed on lines 151-158 and 246-255 of revised manuscript).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Shakhawat et al. investigated how enhancement of plasticity and impairment could result in the same behavioral phenotype. The authors tested the hypothesis that learning impairments result from saturation of plasticity mechanisms and had previously tested this hypothesis using mice lacking two class I major histocompatibility molecules. The current study extends this work by testing the saturation hypothesis in Purkinje-cell (L7) specific Fmr1 knockout mice, which have enhanced parallel fiber-Purkinje cell LTD. The authors found that L7-Fmr1 knockout mice are impaired on an oculomotor learning task and both pre-training, to reverse LTD, and diazepam, to suppress neural activity, eliminated the deficit when compared to controls.

      Strengths:

      This study tests the "saturation hypothesis" to understand plasticity in learning using a well-known behavior task, VOR, and an additional genetic mouse line with a cerebellar cell-specific target, L7-Fmr1 KO. This hypothesis is of interest to the community as it evokes a novel inquisition into LTD that has not been examined previously.

      Utilizing a cell-specific mouse line that has been previously used as a genetic model to study Fragile X syndrome is a unique way to study the role of Purkinje cells and the Fmr1 gene. This increases the understanding in the field in regards to Fragile X syndrome and LTD.

      The VOR task is a classic behavior task that is well understood, therefore using this metric is very reliable for testing new animal models and treatment strategies. The effects of pretraining are clearly robust and this analysis technique could be applied across different behavior data sets.

      The rescue shown using diazepam is very interesting as this is a therapeutic that could be used in clinical populations as it is already approved.

      There was a proper use of controls and all animal information was described. The statistical analysis and figures are clear and well describe the results.

      We thank the reviewer for summarizing the main strengths of our original submission. We have further strengthened the revised submission by 

      (1) more fully discussing the broad conceptual implications, as outlined above; 

      (2) adding new data (Fig. 5) showing that another LTD-dependent oculomotor learning task, optokinetic reflex (OKR) adaptation, is impaired in the L7-Fmr1 KO mice and rescued by pre-treatment with diazepam, as we had already shown for high-frequency VOR increase learning;

      (3) responding to the specific points raised by the reviewers, as detailed below.

      Weaknesses:

      While the proposed hypothesis is tested using genetic animal models and the VOR task, LTD itself is not measured. This study would have benefited from a direct analysis of LTD in the cerebellar cortex in the proposed circuits.

      Our current experiments were motivated by the direct analysis of cerebellar LTD in Fmr1 knock out mice that was already published (Koekkoek et al., 2005). In that previous work, LTD was analyzed in both Purkinje cell selective L7-Fmr1 KO mice (Koekkoek et al., 2005; Fig. 4D), as used in our study, and global Fmr1 knock out mice (Koekkoek et al., 2005; Fig. 4B). Both lines were found to have enhanced LTD, as cited in the Introduction of our manuscript (lines 48-51, 63-64). The goal of our current study was to build on this previous work by analyzing the behavioral correlates of the findings from this previous, direct analysis of LTD. 

      Diazepam was shown to rescue learning in L7-Fmr1 KO mice, but this drug is a benzodiazepine and can cause a physical dependence. While the concentrations used in this study were quite low and animals were dosed acutely, potential side-effects of the drug were not examined, including any possible withdrawal. 

      In humans, diazepam (Valium) is one of the most frequently prescribed drugs in the world, and the side effects and withdrawal symptoms have been extensively studied and documented. Withdrawal symptoms are generally not observed with treatments of less than 2 weeks (Brett and Murnion, 2015). After long-term treatments, tapering of the dosage is recommended to mitigate withdrawal (Brett and Murnion, 2015 and https://americanaddictioncenters.org/valium-treatment/withdrawal-duration). The extensive data on the safety of diazepam in humans lowers the barrier to potential clinical translation of our basic science findings, although we emphasize that our own expertise is scientific, and translation to Fragile X patients or other patient groups will require additional development of the research by clinicians.

      Given the extensive history of research on this drug, we focused on looking for side effects that would reflect an adverse effect of diazepam on the function of the same oculomotor neural circuitry whose ability to support certain oculomotor learning tasks was improved after diazepam. In other words, we assessed whether the pharmacological manipulation was enhancing certain functions of a given circuit at the expense of others. As we note (line 164), “The acute effect of diazepam administration [measured 2 hours after administration] was to impair learning” in both WT and L7-Fmr1 KO mice. One could consider this a side effect. More importantly, we also tested extensively for oculomotor side-effects during the therapeutic period when learning impairments were eliminated in the L7-Fmr1 KOs, 18-24 hours post-administration, and have a full section of the Results describing our findings about this, titled “Specificity of pre-training effects on learning.” As described in the Results and Discussion (lines 184-195, 312-318, Figure 3, figure 3-supplement 1; figure 4B; figure 5-supplement 1), we found no such adverse side-effects, which is again encouraging with respect to the translational potential of our findings.

      This drug is not specific to Purkinje cells or cerebellar circuits, so the action of the drug on cerebellar circuitry is not well understood for the study presented.

      The effects of diazepam are indeed not specific to Purkinje cells, but rather are known to be widespread. Diazepam is a positive allosteric modulator of GABAA receptors, which are found throughout the brain, including the cerebellum. When delivered systemically, as we did in our experiments, diazepam will suppress neural activity throughout the brain by facilitating inhibition, as documented by decades of previous research with this and related benzodiazepines, including dozens of studies of the effects of diazepam in the cerebellum. 

      To our knowledge, there is currently no drug that can specifically inhibit Purkinje cells, especially one that can be given systemically to cross the blood-brain barrier. Moreover, if such a drug did exist, we would not predict it to have the same effect as diazepam in reversing the learning deficits of the L7-Fmr1 KO mice, because the latter presumably depends on suppression of activity in the cerebellar granule cells and neurons of the inferior olive, whose axons form the parallel fibers and climbing fibers, and whose correlated activity controls LTD at the parallel fiber-Purkinje cell synapses.  

      We have revised the text to clarify the key point that despite its widespread action on the brain, the effects of diazepam on cerebellum-dependent learning were remarkably specific (lines 184-195, 210-228, 312-318). During the period 18-24 hours after a single dose of diazepam, the learning deficits of L7-Fmr1 KO mice on two LTD-dependent oculomotor learning tasks were completely reversed, with no effects on the same tasks in WT mice, and no effects (“side-effects”) in L7-Fmr1 KO mice or WT mice on other, LTD-independent oculomotor learning tasks that depend on the same region of the cerebellum, and no effects on baseline performance of visually or vestibularly driven eye movements.

      As described in the revised Discussion (lines 318-323), the non-specific mild suppression of neural activity throughout the brain by diazepam makes it a potentially generalizable approach for inducing BCM-like shifts in the threshold for associative plasticity to facilitate subsequent learning. More specifically, diazepam-mediated reduction of activity throughout the brain has the potential to lower any aberrantly high thresholds for associative plasticity at synapses throughout the brain, and thereby reverse any learning deficits associated with such aberrantly high plasticity thresholds. This approach might even be useful in cases where the neural circuitry supporting a given behavior is not well characterized and the specific synapses responsible for the learning deficit are unknown. On lines 323-327 we compare this generalizable approach with the challenges of designing task- and circuit-specific approaches to reset the threshold for plasticity, particularly in circuits that are less well characterized than the oculomotor circuit.

      It was not mentioned if L7-Fmr1 KO mice have behavior impairments that worsen with age or if Purkinje cells and the cerebellar microcircuit are intact throughout the lifespan. 

      At the adult ages used in our study (8-22 weeks), the oculomotor circuitry, including the Fmr1-deficient Purkinje cells, appears to be functionally intact because all of the oculomotor performance and learning tasks we tested were either normal, or could be restored to normal with brief behavioral and/or pharmacological pre-treatment.  

      Any degeneration of the Fmr1-deficient Purkinje cells or cerebellar microcircuit or additional behavioral impairments at older ages, if they should exist, would not alter our interpretation of the results from 8-22 week old adults regarding history- and activity-dependent changes in the capacity for LTD-dependent learning. Therefore, we leave the question of changes throughout the lifespan to investigators with an interest and expertise in development and/or aging. 

      Only a small handful of the scores of previous studies of the Fmr1 KO mouse model have investigated age-dependent effects; the reviewer may be interested in papers such as Tang et al., 2015 (doi: 10.1073/pnas.1502258112) or Martin et al., 2016 (doi: 10.1093/cercor/bhv031). 

      Connections between Purkinje cells and interneurons could also influence the behavior results found.

      This comment is repeated below in a more general form (Reviewer 1, second to last comment)—please see our response there and lines 270-309 of the revised manuscript for a discussion of how concerns about “off-target” effects are mitigated by the high degree of specificity of the learning deficits and effects of pre-training for the specific learning tasks in which LTD has been previously implicated, and the very similar findings in two different lines of mice with enhanced LTD.

      While males and females were both used for the current study, only 7 of each sex were analyzed, which could be underpowered. While it might be justified to combine sexes for this particular study, it would be worth understanding this model in more detail.

      We performed additional analyses to address the question of whether there might be sex differences that were not detected because of the sample size.

      (1) In a new figure, Fig. 1-figure supplement 1, we break out the results for male and female mice in separate plots, and show that all of the effects of both the KO of Fmr1 from the Purkinje cells and of pretreatment with diazepam that are observed in the full cohort are also statistically significant in just the subset of male mice, and just the subset of female mice (see Fig. 1-figure supplement 1 legend for statistics). In other words, qualitatively, there are no sex differences, and all of the conclusions of our manuscript are statistically valid in both male and female mice. This strengthens the justification for combining sexes for the specific scientific purposes of our study.  

      (2) We performed a power analysis to determine how many mice would be needed to determine whether the very small quantitative differences between male and female mice are significant. The analysis indicates that this would require upwards of 70 mice of each sex for WT mice (Cohen’s d, 0.6162; power 0.95) and upwards of 2500 mice of each sex for L7-Fmr1 KO mice (Cohen’s d, 0.0989; power 0.95). Since the very small quantitative sex differences observed in our cohorts would not alter our scientific conclusions or the possibility for clinical application to patients of both sexes, even if the small quantitative differences turned out to be significant, the very large number of animals needed did not seem warranted for the current scientific purposes. Researchers focused on sex differences may find a motivation to pursue this issue further.
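The order of magnitude of these sample sizes can be checked with a rough sketch. It assumes a two-sided comparison of two independent groups at alpha = 0.05 and uses the normal approximation to the t-test; the response does not state the alpha level or the exact test used, so these are assumptions, and the function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.95):
    """Approximate animals per group for a two-sample comparison.

    Uses the normal approximation n = 2 * (z_(1-alpha/2) + z_power)**2 / d**2,
    where d is Cohen's d. An exact t-test calculation gives slightly larger n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Effect sizes quoted in the response.
n_wt = n_per_group(0.6162)  # WT mice
n_ko = n_per_group(0.0989)  # L7-Fmr1 KO mice
```

Under these assumptions the approximation lands close to the figures quoted above (about 70 per sex for WT and more than 2500 per sex for L7-Fmr1 KO), since the required n scales with 1/d².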

      Training was only shown up to 30 minutes and learning did not seem to plateau in most cases. What would happen if training continued beyond the 30 minutes? Would L7-Fmr1 KO mice catch up to WT littermates?

      (1) For VOR learning, we used a 30 min training time because in our past (e.g., Boyden et al., 2003; Kimpo and Raymond, 2007; Nguyen-Vu et al., 2013; Nguyen-Vu et al., 2017) and current results, we find that VOR learning does plateau quite rapidly, with little or no additional adaptive change in the VOR observed between the tests of learning after 30 min vs 20 min of VOR-increase training, in WT or L7-Fmr1 KO mice (Fig. 1A; WT, p=0.917; L7-Fmr1 KO, p=0.861; 20 vs. 30 min; Tukey). In the L7-Fmr1 KO mice, there is no significant high-frequency VOR-increase learning after 30 min training, and the mean VOR gain is even slightly lower on average (not significant) than before training (Fig. 1A, red). Therefore, we have no reason to expect that the L7-Fmr1 KO mice would catch up to WT after additional VOR-increase training.

      (2) We have added new data on OKR adaptation, induced with 60 min of training (Fig. 5). The L7-Fmr1 KO mice exhibited impaired OKR adaptation, even with 60 min of training (p = 1.27 × 10⁻⁴, Tukey). In our experience, restraint for longer than 60 min produces a behavioral state that is not conducive to learning, as also reported by Katoh and Yamagiwa (2018); therefore, longer training times were not attempted.

      The pathway discussed as the main focus for VOR in this learning paradigm was connections between parallel fibers (PF) and Purkinje cells, but the possibility of other local or downstream circuitry being involved was not discussed. PF-Purkinje cell circuits were not directly analyzed, which makes this claim difficult to assess.

      In the revised manuscript (lines 299-309), we have expanded our discussion of the possibility that loss of expression of Fmr1 from Purkinje cells in the Purkinje cell-specific L7-Fmr1 KO mice might influence other synapses or intrinsic properties of the Purkinje cells (including synapses from interneurons, as raised in this reviewer’s comment above), in addition to enhancing associative LTD at the parallel fiber-Purkinje cell synapses.

      It is a very general limitation of all perturbation studies, even cell-type specific perturbation studies as in the current case, that it is never possible to completely rule out “off-target” effects of the manipulation. Because of this, causality cannot be definitively concluded from correlations (e.g., between the effects of a perturbation observed at the cellular and behavioral level), and therefore we make no such claim in our manuscript. Rather, we conclude that our results “provide evidence for,” “support,” “predict,” or “are consistent with” the hypothesis of a history- and activity-dependent change in the threshold for associative LTD at the parallel fiber-Purkinje cells.

      That said, perturbation is still one of the major tools in the experimental toolbox, and there are approaches for mitigating concern about off-target effects. We highlight three aspects of our experimental design that accomplish this (lines 184-228, 256-309). First, we show nearly identical learning impairments and effects of behavioral pretreatment in lines of mice with two completely different molecular manipulations that have the common effect of enhancing PF-Purkinje cell LTD, but are likely to have different off-target cellular effects on the Purkinje cells and their synapses. Second, we show that the learning impairments were highly specific to oculomotor learning tasks in which PF-Purkinje cell LTD was previously implicated, with no such effects on three other oculomotor learning tasks that depend on the same region of the cerebellum and oculomotor circuitry. In the original submission, we provided data for one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript we provide new data for a second LTD-dependent oculomotor learning task, optokinetic reflex adaptation, with nearly identical results (Fig. 5). Third, we show that the effects of diazepam pre-treatment were highly specific to the same two LTD-dependent oculomotor learning tasks and also highly specific to the L7-Fmr1 KO mice with enhanced LTD and not WT mice. These three features of the experimental design are not common in studies of learning, especially in combination. On lines 256-309, we provide an expanded discussion of how together, these three features of the design strengthen the evidence that the learning impairments and effects of diazepam pre-treatment on learning are related to LTD at the PF-Pk synapses, while acknowledging the possibility of other effects on the circuit.

      The authors mostly achieved their aim and the results support their conclusion and proposed hypothesis. This work will be impactful on the field as it uses a new Purkinje-cell specific mouse model to study a classic cerebellar task. The use of diazepam could be further analyzed in other genetic models of neurodevelopmental disorders to understand if effects on LTD can rescue other pathways and behavior outcomes.

      We agree that the present findings are potentially relevant for a very wide array of behavioral tasks, disease models, and brain areas beyond the specific ones in our study, and we make this point on lines 310-338 of the revised manuscript. 

      Reviewer #2 (Public Review):

      This manuscript explores the seemingly paradoxical observation that enhanced synaptic plasticity impairs (rather than enhances) certain forms of learning and memory. The central hypothesis is that such impairments arise due to saturation of synaptic plasticity, such that the synaptic plasticity required for learning can no longer be induced. A prior study provided evidence for this hypothesis using transgenic mice that lack major histocompatibility class 1 molecules and show enhanced long-term depression (LTD) at synapses between granule cells and Purkinje cells of the cerebellum. The study found that a form of LTD-dependent motor learning-increasing the gain of the vestibulo-ocular reflex (VOR)-is impaired in these mice and can be rescued by manipulations designed to "unsaturate" LTD. The present study extends this line of investigation to another transgenic mouse line with enhanced LTD, namely, mice with the Fragile X gene knocked out. The main findings are that VOR gain increased learning is selectively impaired in these mice but can be rescued by specific manipulations of visuomotor experience known to reverse cerebellar LTD. Additionally, the authors show that a transient global enhancement of neuronal inhibition also selectively rescues gain increases learning. This latter finding has potential clinical relevance since the drug used to boost inhibition, diazepam, is FDA-approved and commonly used in the clinic. The evidence provided for the saturation is somewhat indirect because directly measuring synaptic strength in vivo is technically difficult. Nevertheless, the experimental results are solid. In particular, the specificity of the effects to forms of plasticity previously shown to require LTD is remarkable. 
      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this exceptionally clear and concise assessment of the findings and strengths of the manuscript.

      We agree that one of the most “remarkable” aspects of our findings is the specificity of the effects for oculomotor learning tasks for which there is the strongest previous evidence for a role of PF-Purkinje cell LTD. In the original manuscript, we tested just one LTD-dependent oculomotor learning task, high-frequency VOR-increase learning; in the revised manuscript, we strengthen the case for LTD-dependent task specificity by adding new data (Fig. 5) showing the same effects for OKR adaptation, an additional LTD-dependent oculomotor learning task.

      The reviewer’s suggestion to include discussion of “untested assumptions”, “including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation” prompted us to more deeply consider the broader implications of our results, and extensively revise the Discussion accordingly. We clarify that we consider history-dependent changes in the threshold for LTD to be a prediction of the behavioral and pharmacological findings (lines 339-347, 356) rather than an assumption. In addition, we highlight the broader implications of the results by putting them in the context of work in other brain areas on history-dependent changes in the threshold for plasticity, i.e., metaplasticity, going back to the seminal Bienenstock-Cooper-Munro theory (BCM; Bienenstock et al., 1982) (lines 348-378).

      Reviewer #1 (Recommendations for The Authors):

      The text and figures are very clear to read, but there are a couple of questions that remain:

      The concentrations chosen for diazepam are not well described and it is unclear why the concentrations jump from 2.5 mg/kg to 0.5 mg/kg. Please add an explanation for these concentrations and if any additional behavior outcomes were observed.

      Our choice of diazepam concentrations was guided by the concentrations reported in the literature to be effective in mice, which suggest that a higher dose (2 mg/kg) can have additional effects not observed with a lower effective dose (0.5 mg/kg) (Pádua-Reis et al, 2021). Since we did not know how much enhancement of inhibition/suppression of activity might be necessary to substantially reduce the induction of PF-Purkinje cell LTD, we did pilot experiments to test concentrations at the low and high ends of the doses typically used in mice. These pilot experiments revealed that a lower dose of 0.4 or 0.5 mg/kg was comparable to the higher dose of 2.5 mg/kg in suppressing VOR-increase learning 2 hours after administration (Fig. 3 – figure supplement 2). Anecdotally, we observed higher levels of locomotor activity and other abnormal cage behavior during the period immediately after administration of the higher compared to the lower dose. To limit these side effects and any possibility of dependence, we used only the lower dose in all subsequent experiments. We clarify this rationale for using a lower dose in the legend of Fig. 3 – figure supplement 2.   

      Figure 4 describes low-frequency VOR, but the paragraph discussing these results (line 191) mentions high-frequency VOR-increase learning. It is unclear where the results are for the high-frequency data. Please include or rephrase for clearer understanding.

      In the revised manuscript, we clarify that the 1 Hz vestibular and visual stimuli used in Figs. 1-3 are the “high” frequency, which yields different results than the “low” frequency of 0.5 Hz (Fig. 4), as also observed in Boyden et al., 2006 and Nguyen-Vu et al., 2017.

      Reviewer #2 (Recommendations For The Authors):

      The authors should consider including a brief discussion of some of the important untested assumptions of the saturation hypothesis, including the requirement that cerebellar LTD depends not only on pre- and postsynaptic activity (as is typically assumed) but also on the prior history of synaptic activation.

      We thank the reviewer for this comment, which, along with your public comments, inspired us to thoroughly reconsider and revise our Discussion. We think this has greatly improved the manuscript, and will substantially increase its appeal to a broad segment of the neuroscience research community, including computational neuroscientists as well as those interested in synaptic physiology, learning and memory, or plasticity-related brain disorders including autism. 

      Note that we consider the idea that “LTD depends not only on pre- and post-synaptic activity but also on the prior history of synaptic activation” to be the central prediction of the threshold metaplasticity hypothesis rather than an assumption, and in the revised manuscript we explicitly refer to this as a prediction (lines 339, 356). We also added a discussion of multiple known cellular phenomena in the Purkinje cells and their synapses that can regulate LTD and thus represent candidate mechanisms for LTD threshold metaplasticity (lines 339-347). Again, sincere thanks for prompting us to write a vastly improved Discussion section.

      Editor's note:

      Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported in the main text for all key questions and not only when the p-value is less than 0.05.

      We have added exact p-values throughout the manuscript.  

      References

      Albergaria C, Silva NT, Pritchett DL, Carey MR. (2018). Locomotor activity modulates associative learning in mouse cerebellum. Nat Neurosci. 21:725-735. doi: 10.1038/s41593-018-0129-x.

      Abraham WC, Mason-Parker SE, Bear MF, Tate WT. (2001). Heterosynaptic metaplasticity in the hippocampus in vivo: A BCM-like modifiable threshold for LTP. Proc Natl Acad Sci USA. 98:10924-10929.

      Bienenstock E, Cooper L, Munro P. (1982). Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex. J Neurosci. 2:32-48. https://doi.org/10.1523/JNEUROSCI.02-01-00032.1982

      Brett J, Murnion B. (2015). Management of benzodiazepine misuse and dependence. Aust Prescr. 38:152-155. doi: 10.18773/austprescr.055.

      Boyden ES, Raymond JL. (2003). Active Reversal of Motor Memories Reveals Rules Governing Memory Encoding. Neuron. 39:1031-1042. https://doi.org/10.1016/S0896-6273(03)00562-2

      Boyden ES, Katoh A, Pyle JL, Chatila TA, Tsien RW, Raymond JL. (2006). Selective engagement of plasticity mechanisms for motor memory storage. Neuron. 51:823-834. https://doi.org/10.1016/j.neuron.2006.08.026

      Desai NS, Cudmore RH, Nelson SB, Turrigiano GG. (2002). Critical periods for experience-dependent synaptic scaling in visual cortex. Nat Neurosci. 5:783-789. doi: 10.1038/nn878.

      Fong M, Duffy KR, Leet MP, Candler CT, Bear MF. (2021). Correction of amblyopia in cats and mice after the critical period. eLife. 10:e70023. https://doi.org/10.7554/eLife.70023

      Hamada M, Terao Y, Hanajima R, Shirota Y, Nakatani-Enomoto S, Furubayashi T, Matsumoto H, Ugawa Y. (2008). Bidirectional long-term motor cortical plasticity and metaplasticity induced by quadripulse transcranial magnetic stimulation. J Physiol. 586:3927-3947. doi: 10.1113/jphysiol.2008.152793.

      Katoh A, Yamagiwa A. (2018). Inhibition of PVN neurons influences stress-induced changes of motor learning in the VOR. Society for Neuroscience. Online Program No. 067.14.

      Kimpo RR, Raymond JL. (2007). Impaired motor learning in the vestibulo-ocular reflex in mice with multiple climbing fiber input to cerebellar Purkinje cells. J Neurosci. 27:5672-5682. doi: 10.1523/JNEUROSCI.0801-07.2007.

      Kirkwood A, Rioult MG, Bear MF. (1996). Experience-dependent modification of synaptic plasticity in visual cortex. Nature. 381:526–528. https://doi.org/10.1038/381526a0

      Koekkoek SK, Yamaguchi K, Milojkovic BA, Dortland BR, Ruigrok TJ, Maex R, De Graaf W, Smit AE, VanderWerf F, Bakker CE, Willemsen R, Ikeda T, Kakizawa S, Onodera K, Nelson DL, Mientjes E, Joosten M, De Schutter E, Oostra BA, Ito M, De Zeeuw CI. (2005). Deletion of FMR1 in Purkinje Cells Enhances Parallel Fiber LTD, Enlarges Spines, and Attenuates Cerebellar Eyelid Conditioning in Fragile X Syndrome. Neuron. 47:339–352. https://doi.org/10.1016/j.neuron.2005.07.005

      Le Friec A, Salabert AS, Davoust C, Demain B, Vieu C, Vaysse L, Payoux P, Loubinoux I. (2017). Enhancing Plasticity of the Central Nervous System: Drugs, Stem Cell Therapy, and Neuro-Implants. Neural Plast. 2017:2545736. doi: 10.1155/2017/2545736.

      Leet MP, Bear MF, Gaier ED. (2022). Metaplasticity: a key to visual recovery from amblyopia in adulthood? Curr Opin Ophthalmol. 33:512–518. https://doi.org/10.1097/ICU.0000000000000901

      Martin HGS, Lassalle O, Brown JT, Manzoni OJ. (2016). Age-Dependent Long-Term Potentiation Deficits in the Prefrontal Cortex of the Fmr1 Knockout Mouse Model of Fragile X Syndrome. Cereb Cortex. 26:2084–2092. doi: 10.1093/cercor/bhv031.

      Montgomery JM, Madison DV. (2002). State-dependent heterogeneity in synaptic depression between pyramidal cell pairs. Neuron. 33:765-777. doi: 10.1016/s0896-6273(02)00606-2.

      Nguyen-Vu TDB, Kimpo RR, Rinaldi JM, Kohli A, Zeng H, Deisseroth K, Raymond JL. (2013). Cerebellar Purkinje cell activity drives motor learning. Nat Neurosci. 16:1734-1736. doi: 10.1038/nn.3576.

      Nguyen-Vu TB, Zhao GQ, Lahiri S, Kimpo RR, Lee H, Ganguli S, Shatz CJ, Raymond JL. (2017). A saturation hypothesis to explain both enhanced and impaired learning with enhanced plasticity. eLife. 6:e20147. https://doi.org/10.7554/eLife.20147

      Pádua-Reis M, Nôga DA, Tort ABL, Blunder M. (2021). Diazepam causes sedative rather than anxiolytic effects in C57BL/6J mice. Sci Rep. 11:9335.

      Singh A, Nagpal R, Mittal SK, Bahuguna C, Kumar P. (2017). Pharmacological therapy for amblyopia. Taiwan J Ophthalmol. 7:62-69. doi: 10.4103/tjo.tjo_8_17.

      Tang B, Wang T, Wan H, Han L, Qin X, Zhang Y, Wang J, Yu C, Berton F, Francesconi W, Yates JR 3rd, Vanderklish PW, Liao L. (2015). Fmr1 deficiency promotes age-dependent alterations in the cortical synaptic proteome. Proc Natl Acad Sci USA. 112:E4697-E4706. doi: 10.1073/pnas.1502258112.

      Yamaguchi T, Moriya K, Tanabe S, Kondo K, Otaka Y, Tanaka S. (2020). Transcranial direct-current stimulation combined with attention increases cortical excitability and improves motor learning in healthy volunteers. J Neuroeng Rehabil. 17:23. doi: 10.1186/s12984-020-00665-7.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This valuable work performed fMRI experiments in a rodent model of absence seizures. The results provide new information regarding the brain's responsiveness to environmental stimuli during absence seizures. The authors suggest reduced responsiveness occurs during this type of seizure, and the evidence leading to the conclusion is solid, although reviewers had divergent opinions.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the effects of two sensory stimuli (visual and somatosensory) on fMRI responsiveness during absence seizures were investigated in GAERS rats with concurrent EEG recordings. SPM analysis of fMRI showed a significant reduction in whole-brain responsiveness during the ictal period compared to the interictal period under both stimuli, and this phenomenon was replicated in a structurally constrained whole-brain computational model of rat brains.

      The conclusion of this paper is that whole-brain responsiveness to both sensory stimuli is inhibited and spatially impeded during seizures.

      Reviewer #2 (Public Review):

      Summary:

      This study examined the possible effect of spike-wave discharges (SWDs) on the response to visual or somatosensory stimulation using fMRI and EEG. This is a significant topic because SWDs often are called seizures and because there is non-responsiveness at this time, it would be logical that responses to sensory stimulation are reduced. On the other hand, in rodents with SWDs, sensory stimulation (a noise, for example) often terminates the SWD/seizure.

      In humans, these periods of SWDs are due to thalamocortical oscillations. A certain percentage of the normal population can have SWDs in response to photic stimulation at specific frequencies. Other individuals develop SWDs without stimulation. They disrupt consciousness. Individuals have an absent look, or "absence", which is called absence epilepsy.

      The authors use a rat model to study the responses to stimulation of the visual or somatosensory systems during and in between SWDs. They report that the response to stimulation is reduced during the SWDs. While some data show this nicely, the authors also report on lines 396-8 "When comparing statistical responses between both states, significant changes (p<0.05, cluster-) were noticed in somatosensory auditory frontal..., with these regions being less activated in interictal state (see also Figure 4)." That statement is at odds with their conclusion. I do not see that this issue was addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      They also conclude that stimulation slows the pathways activated by the stimulus. I do not see any data proving this. It would require repeated assessments of the pathways in time. This issue was not addressed.

      See comments below starting with “We acknowledge the reviewer…”.

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data. This is still an issue. No conclusions appear to be possible to make.

      See comments below starting with “We acknowledge the reviewer…”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The authors did not add any validation of their model.

      See comments below starting with “We acknowledge the reviewer…”.

      Strengths:

      Use of fMRI and EEG to study SWDs in rats.

      Weaknesses:

      Several aspects of the Methods and Results were improved but some are still unclear.

      We acknowledge the reviewer’s concern that the comments above were not addressed. However, we emphasize that most of the comments were addressed in the previously submitted “Response to Review Comments” and in the updated manuscript. Here we repeat those responses and also provide additional clarifications to some of the comments.

      We thank the reviewer for noting the discrepancy in the statement “less activated in interictal state”. The statement should have been written the other way around. We also note that the direction of the activation change between states can be misinterpreted from the statistical maps alone (Figure 3), which show only statistical changes and not the polarity of the response (which can be seen in Figure 4). Therefore, we have made the following changes in section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases when compared to the interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      We agree with the reviewer that there are no data showing slowing of the pathways in response to stimulus. However, we are a bit confused about this comment, as to what part in conclusion section it refers to. We did not intentionally claim that stimulation slows the activated pathways in the manuscript.

      The reviewer is right that strong claims cannot be made from the HRF by itself. Therefore, we have avoided such phrasing throughout the manuscript. In the conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”. The observed HRF decreases (rather than increases) in the cortex when stimulation was applied during SWD were discussed in section 4.4, where we speculated that neuronal suppression (possibly apparent in negative HRFs) caused by SWD can prevent responsiveness. The Conclusion now states the following: “Moreover, the detected decreases in the cortical HRF when sensory stimulation was applied during spike-and-wave discharges, could play a role in decreased sensory perception. Further studies are required to evaluate whether this HRF change is a cause or a consequence of the reduced neuronal response.”

      We point out that the main validation of the model and its details were provided in the previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived via a detailed statistical description of the activity of a spiking neuronal population of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the validation of the mean-field model is performed via direct comparison between the dynamics obtained from the mean-field model and the dynamics obtained from the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript, where the transition studied in the paper between interictal (asynchronous irregular activity) and ictal (SWD dynamics) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics and the transition between them is the main component of the model used to build the analysis of the responsiveness performed in the paper (which has been properly validated).

      Reviewer #3 (Public Review):

      Summary:

      This is an interesting paper investigating fMRI changes during sensory (visual, tactile) stimulation and absence seizures in the GAERS model. The results are potentially important for the field and do suggest that sensory stimulation may not activate brain regions normally during absence seizures. But the findings are limited by substantial methodological issues that do not enable fMRI signals related to absence seizures to be fully disentangled from fMRI signals related to the sensory stimuli.

      Strengths:

      Investigating fMRI brain responses to sensory stimuli during absence seizures in an animal model is a novel approach with potential to yield important insights.

      Use of an awake, habituated model is a valid and potentially powerful approach.

      Weaknesses:

      The major difficulty with interpreting the results of this study is that the duration of the visual and tactile stimuli were 6 seconds, which is very close to the mean seizure duration per Table 1. Therefore the HRF model looking at fMRI responses to visual or auditory stimuli occurring during seizures was simultaneously weighting both seizure activity and the sensory (visual or auditory) stimuli over the same time intervals on average. The resulting maps and time courses claiming to show fMRI changes from visual or auditory stimulation during seizures will therefore in reality contain some mix of both sensory stimulation-related signals and seizure-related signals. The main claim that the sensory stimuli do not elicit the same activations during seizures as they do in the interictal period may still be true. But the attempts to localize these differences in space or time will be contaminated by the seizure related signals.

      In their response to this comment the authors state that some seizures had longer than average duration, and that they attempted to model the effects of both seizures and sensory stimulation. However these factors do not mitigate the concern because the mean duration of seizures and sensory stimulation remain nearly identical, and the models used therefore will not be able to effectively separate signals related to seizures and related to sensory stimulation.

      Regressors for seizures were formed by including periods of seizures without any stimulation present. In theory, if seizures were perfectly modeled by the regressor, the remaining variance is completely orthogonal to the main effect of the stimulus. Furthermore, only the cases where the seizures are longer than the stimulus are used to calculate the responsiveness to the stimulus (while the cases where the seizures are shorter than the stimulus are used as nuisance regressors to account for error variance). However, we agree with the reviewer that in practice not all effects of the seizure can be removed completely from the effect of the stimulus. We have addressed this concern in the “physiologic and methodology consideration” section: “We note a caution that presented maps and time courses showing fMRI changes from visual or whisker stimulation during seizures may contain a mixture of both sensory stimulation-related signals and seizure-related signals. To minimize this contamination in the linear model used, we considered both stimulation and seizure-only states as regressors of interest and used seizure-only responses as nuisance regressors to account for error variance. Thereby, the effects caused by the stimulation should be separated as much as possible from the effects caused by the seizure itself.”
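
      The collinearity concern discussed in this exchange can be illustrated with a toy simulation. The sketch below is not the authors' analysis pipeline: the 1 s sampling interval, the single-gamma HRF, and all onsets are assumptions chosen only for illustration. It convolves a 6 s stimulus boxcar and two hypothetical seizure boxcars with the same HRF, then compares how strongly each seizure regressor correlates with the stimulus regressor.

```python
import math

def gamma_hrf(t, shape=6.0, scale=1.0):
    """Single-gamma HRF (illustrative assumption, not the study's HRF)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

def boxcar(onset, duration, n_scans, tr):
    """1 during [onset, onset + duration), 0 elsewhere, sampled every tr seconds."""
    return [1.0 if onset <= i * tr < onset + duration else 0.0 for i in range(n_scans)]

def convolve_hrf(signal, tr, hrf_len=30.0):
    """Causal convolution of a boxcar with the sampled HRF kernel."""
    kernel = [gamma_hrf(k * tr) for k in range(int(hrf_len / tr))]
    return [
        sum(signal[i - k] * kernel[k] for k in range(len(kernel)) if i - k >= 0)
        for i in range(len(signal))
    ]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

TR = 1.0       # assumed sampling interval in seconds
N_SCANS = 120  # assumed run length in samples

# 6 s stimulus (duration taken from the review); onsets below are hypothetical.
stim = convolve_hrf(boxcar(30.0, 6.0, N_SCANS, TR), TR)
# Seizure with nearly the same timing and duration as the stimulus.
seizure_matched = convolve_hrf(boxcar(29.0, 6.0, N_SCANS, TR), TR)
# Seizure that is much longer than the stimulus.
seizure_long = convolve_hrf(boxcar(20.0, 30.0, N_SCANS, TR), TR)

r_matched = pearson(stim, seizure_matched)
r_long = pearson(stim, seizure_long)
print(f"matched-duration seizure vs stimulus: r = {r_matched:.2f}")
print(f"long seizure vs stimulus:             r = {r_long:.2f}")
```

      In this toy setting the matched-duration seizure regressor is far more correlated with the stimulus regressor than the long one, which is why restricting the analysis to seizures longer than the stimulus can help, while matched durations make the two effects hard to separate.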

      The claims that differences were observed for example between visual cortex and superior colliculus signals with visual stim during seizures vs interictal remain unconvincing due to above.

      Maps shown in Figure 3 do not show clear changes in the areas claimed to be involved.

      In their response the authors enlarged the cross sections. However there are still discrepancies between the images and the way they are described in the text. For example, in the Results text the authors say that comparing the interictal and ictal states revealed less activation in the somatosensory cortex during the ictal than during the interictal state, yet Figure 3 bottom row left shows greater activation in somatosensory cortex in this contrast.

      We note that the direction of activation change between groups can be misinterpreted based on the statistical maps themselves (Figure 3), where only statistical changes are visible and not the polarity of the response (which can be seen in Figure 4). Therefore, we have made the following changes to section 3.3: “There were more voxels with significant changes of activity during interictal state compared to ictal state (136% more). Comparing the statistical responses between interictal and ictal states revealed significant changes (p<0.05, cluster-level corrected) in the visual, somatosensory, and medial frontal cortices. In the ictal state, these regions showed significant hemodynamic decreases when comparing to interictal state, and these polarity changes can be seen in the hemodynamic response functions (Figure 4).”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Authors have revised this paper with a lot of detail. The paper can be accepted for publication in this version.

      Reviewer #2 (Recommendations For The Authors):

      Reviewer #1

      (1) The analysis in this paper does not directly answer the scientific question posed by the authors, which is to explore the mechanisms of the reduced brain responsiveness to external stimuli during absence seizures (in terms of altered information processing), but merely characterizes the spatial involvement of such reduced responsiveness. The same holds for the use of mean-field modeling, which merely reproduces experimental results without explaining them mechanistically as what the authors have claimed at the head of the paper.

      We agree with the reviewer that the manuscript does not specifically address the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimulation between ictal and interictal states. The sentence in the manuscript abstract that could lead to misinterpretation, "The mechanism underlying the reduced responsiveness to external stimulus remains unknown.", was therefore modified to the following: "The whole-brain spatial and temporal characteristics of reduced responsiveness to external stimulus remains unknown".

      This change did not address the issue. The problem is that there is no experimentation to address the underlying mechanisms of the results. I also think the changed language in the abstract is less clear than the original.

      We fully agree that this manuscript does not address, or claim to address, the mechanisms of reduced brain responsiveness. The main scientific question addressed in the manuscript was to compare whole-brain responsiveness to stimulation between ictal and interictal states, by means of hemodynamics and mean-field simulation.

      We have changed the language of the abstract to the following:

      “In patients suffering absence epilepsy, recurring seizures can significantly decrease their quality of life and lead to yet untreatable comorbidities. Absence seizures are characterized by spike-and-wave discharges on the electroencephalogram associated with a transient alteration of consciousness. However, it is still unknown how the brain responds to external stimuli during and outside of seizures.

      This study aimed to investigate responsiveness to visual and somatosensory stimulation in GAERS, a well-established rat model for absence epilepsy. Animals were maintained in a non-curarized awake state allowing for naturally occurring seizures to be produced inside the magnet. They were imaged continuously using a quiet zero-echo-time functional magnetic resonance imaging (fMRI) sequence. Sensory stimulations were applied during interictal and ictal periods. Whole brain responsiveness and hemodynamic responses were compared between these two states. Additionally, a mean-field simulation model was used to mechanistically explain the changes of neural responsiveness to visual stimulation between interictal and ictal states.

      Results showed that, during a seizure, whole-brain responses to both sensory stimulations were suppressed and spatially hindered. In several cortical regions, hemodynamic responses were negatively polarized during seizures, despite the application of a stimulus. The simulation experiments also showed restricted propagation of spontaneous activity due to stimulation and so agreed well with fMRI findings. These results suggest that sensory processing observed during an interictal state is hindered or even suppressed by the occurrence of an absence seizure, potentially contributing to decreased responsiveness during this absence epileptic process.”

      The authors also study the hemodynamic response function (HRF) and it is not clear what conclusions can be made from the data.

      The response of the authors did not clarify this issue. Instead, they explained why they examined HRF and that they can only speculate what the data means.

      The reviewer is right that strong claims cannot be made from the HRF by itself. Therefore, we have avoided such phrasing throughout the manuscript. In the conclusion section, we speculate that HRF decreases “could play a role in decreased sensory perception” but also state that “further studies are required”.

      Finally, the authors use a model to analyze the data. This model is novel and while that is a strength, its validation is unclear. The conclusion is that the modeling supports the conclusions of the study, which is useful.

      Details about the model were added.

      This is not entirely satisfactory because there is still no validation of the model.

      We point out that the main validation of the model and its details were provided in the previous answer to the reviewer and added to the manuscript. The model presented in the paper is based on a mean-field formalism that captures neuronal activity at the mesoscale level. This mean-field formalism is derived via a detailed statistical description of the activity of a spiking neuronal population of excitatory and inhibitory neurons with conductance-based synaptic interactions. Thus, the validation of the mean-field model is performed via direct comparison between the dynamics obtained from the mean-field model and the dynamics obtained from the underlying spiking neural network model. This comparison is shown in the supplementary material of the manuscript, where the transition studied in the paper between interictal (asynchronous irregular activity) and ictal (SWD dynamics) activity, which is predicted by the mean-field model, is indeed observed in the underlying spiking neuronal model. The existence of these two types of dynamics and the transition between them is the main component of the model used to build the analysis of the responsiveness performed in the paper (which has been properly validated).

      How is ROI defined in this paper? What type of atlas is used?

      Anatomical ROIs were drawn based on the Paxinos and Watson rat brain atlas, 7th edition. A region was selected if statistically significant activations were detected inside that region, based on activation maps. We clarified the definition of ROI as follows: "Anatomical ROIs, based on Paxinos atlas (Paxinos and Watson rat brain atlas 7th edition), were drawn on the brain areas where statistical differences were seen in activation maps."

      This is helpful, but the unstained brain does not show the borders of the areas. Therefore just saying an atlas was used is not enough. How in an unstained brain can the areas be accurately outlined?

      Areas of the brain were differentiated by co-registering the functional MRI images with a T1-weighted anatomical reference brain that was created on site from the same data set that was used for the manuscript. Potential co-registration inaccuracies created by using a reference brain measured at a different site, with a different sequence, and in a different rat strain can thus be avoided. T1-weighted images create sufficient contrast to differentiate the main brain areas, but for more accurate border definition (e.g., to differentiate different thalamic nuclei), the coordinate system of the atlas and known coordinates in the anatomical reference brain were used to pinpoint the exact borders of the brain areas.

      Reviewer #2

      The following also is not precise:

      "Although seizures are initially triggered by hyperactive somatosensory cortical neurons, the majority of neuronal populations are deactivated rather than activated during the seizure, resulting in an overall decrease in neuronal activity during SWD (McCafferty et al. 2023)."

      What neuronal populations? Cortex? Which neurons in the cortex? Those projecting to the thalamus? What about thalamocortical relay cells? Thalamic gabaergic neurons?

      Please check that these issues were corrected.

      The issues were addressed as follows:

      “Although SWDs are initially triggered by hyperactive somatosensory cortical neurons, neuronal firing rates, especially in the majority of frontoparietal cortical and thalamocortical relay neurons, are decreased rather than increased during SWD, resulting in an overall decrease in activity in these neuronal populations (McCafferty et al., 2023). Previous fMRI studies have demonstrated blood volume or BOLD signal decreases in several cortical regions including parietal and occipital cortex, but also, quite surprisingly, increases in subcortical regions such as thalamus, medulla and pons (David et al., 2008; McCafferty et al., 2023).”

      Results

      After removing problematic animals and sessions, was there sufficient power? There probably wasn't enough to determine sex differences.

      After removing problematic sessions, we found statistically significant (multiple-comparison corrected) results in both activation maps and hemodynamic responses. There were not enough animals to detect sex differences statistically (p>0.05).

      This is not the question. The question is whether there was sufficient power.

      A simple power calculation was performed as follows: considering a t-test, a risk alpha of 0.05, a power of 0.8, matched pairs (seizure/control), we can detect an effect size of 0.37 with our 4 animals, considering repeated measurements (4 sessions/animal x 11 seizure/control pairs per session). This is now mentioned in the manuscript.
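
      For readers who want to see how such a sensitivity calculation works, here is a minimal sketch using the normal approximation to a two-sided paired t-test. It is not the authors' calculation: the function name is hypothetical, and the response does not state how the repeated measurements were converted into an effective number of independent pairs, so that number is left as an input rather than guessed.

```python
import math
from statistics import NormalDist

def min_detectable_effect(n_pairs, alpha=0.05, power=0.8):
    """Smallest standardized paired effect size (Cohen's d_z) detectable
    with the given two-sided alpha and power, by normal approximation.

    n_pairs is the effective number of independent seizure/control pairs;
    how to derive it from repeated measurements is an assumption the
    caller must supply.
    """
    if n_pairs <= 0:
        raise ValueError("n_pairs must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return (z_alpha + z_beta) / math.sqrt(n_pairs)

# Detectable effect shrinks as the effective number of pairs grows.
for n in (25, 50, 100, 176):
    print(n, round(min_detectable_effect(n), 3))
```

      Treating all 4 x 4 x 11 = 176 pairs as independent gives an upper bound on the effective sample size and therefore a lower bound on the detectable effect. Pairs from the same animal are correlated, so a calculation that accounts for the repeated-measures structure would be expected to yield a larger detectable effect than this bound.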

      Table 1 has no statistical comparisons.

      Table 1 is purely an illustration of stimulation and seizure occurrence. There is no specific interest to compare stimulation types (in what state of seizure it occurred) as it does not provide any meaningful inferences to the study.

      Table 1 could be improved by statistics. More could be said and there would be justification to include it.

      We thank the reviewer for the suggestion, but as it is still unclear which statistical comparison would be feasible, we opt to leave it out.

      Statistical activation maps - it is not clear how this was done.

      Creation of statistical maps is explained in section 2.5.3.

      This section is not clear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Fig 3 "F-contrast maps." Please explain.

      Creation of statistical maps is explained in section 2.5.3.

      This section is unclear.

      We have added a reference (https://doi.org/10.1002/hbm.460020402) for readers to familiarize themselves with the concept of statistical parametric mapping.

      Reviewer #3 (Recommendations For The Authors):

      Aside from the concerns listed as weaknesses above which were not addressed, most of the more minor comments were addressed by the authors in the resubmission. However, the comment below was not addressed because it is impossible to see any firing rate changes elicited by sensory stimuli (if they are present) due to the scale during seizures. The seizure signals should be removed or accounted for by the model so that any possible sensory stimulus-related signals could be seen, and displayed on the same scale as firing rates without seizures. Prior comment (unaddressed) is repeated below:

      Figure 6-figure supplement 1, the scales are very different for many of the plots so they are hard to compare. Especially in the ictal periods (D, E, F) it is hard to see if any changes are happening during ictal stimulation similar to interictal stimulation due to very different scales. The activity related to SWD is so large that it overshadows the rest, and perhaps should be subtracted out.

      These two comments were addressed and replied to in the previous round of reviews. Regarding the different scales of the plots in Figure 6-figure supplement 1, we point out that all the plots are already presented on the same scale in Figure 6 of the main text. Regarding the activity related to SWD and sensory stimulation, we remark that the effect of the stimulation should be (and was) evaluated with respect to the ongoing activity. All the results concerning the neuronal responsiveness presented in the paper evaluate the statistical significance of the changes in activity produced by the stimulation with respect to the ongoing activity (during ictal and interictal states respectively). For this reason, all the plots containing the time series of neuronal activity in the simulations include the ongoing activity (with SWD dynamics when present) for proper comparison and relevant analysis.

      Additional changes:

      In the section 3.2., the sentence: “In addition, responses were observed in the somatosensory cortex during a seizure state.” was removed for clarification purposes as deactivation rather than activation was observed in this brain area during a seizure state.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study tests the hypothesis that a high autism quotient in neurotypical adults is strongly associated with suboptimal motor planning and visual updating after eye movements, which in turn, is related to a disrupted efference copy mechanism. The implication is that such abnormal behavior would be exaggerated in those with ASD and may contribute to sensory overload - a key symptom in this condition. The evidence presented is convincing, with significant effects in both visual and motor domains, adequate sample sizes, and consideration of alternatives. However, the study would be strengthened with minor but necessary corrections to methods and statistics, as well as a moderation of claims regarding direct application to ASD in the absence of testing such patients.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study examines a hypothesized link between autism symptomatology and efference copy mechanisms. This is an important question for several reasons. Efference copy is both a critical brain mechanism that is key to rapid sensorimotor behaviors, and one that has important implications for autism given recent empirical and theoretical work implicating atypical prediction mechanisms and atypical reliance on priors in ASD.

      The authors test this relationship in two different experiments, both of which show larger errors/biases in spatial updating for those with heightened autistic traits (as measured by AQ in neurotypical (NT) individuals).

      Strengths:

      The empirical results are convincing - effects are strong, sample sizes are sufficient, and the authors also rule out alternative explanations (ruling out differences in motor behavior or perceptual processing per se).

      Weaknesses:

      My main concern is that the paper should be more transparent about both (1) that this study does not include individuals with autism, and (2) acknowledging the limitations of the AQ.

      On the first point, and I don't think this is intentional, there are several instances where the line between heightened autistic traits in the NT population and ASD is blurred or absent. For example, in the second sentence of the abstract, the authors state "Here, we examine the idea that sensory overload in ASD may be linked to issues with efference copy mechanisms". I would say this is not correct because the authors did not test individuals with ASD. I don't see a problem with using ASD to motivate and discuss this work, but it should be clear in key places that this was done using AQ in NT individuals.

      For the second issue, the AQ measure itself has some problems. For example, reference 38 in the paper (a key paper on AQ) also shows that those with high AQ skew more male than modern estimates of ASD, suggesting that the AQ may not fully capture the full spectrum of ASD symptomatology. Of course, this does not mean that the AQ is not a useful measure (the present data clearly show that it captures something important about spatial updating during eye movements), but it should not be confused with ASD, and its limitations need to be acknowledged. My recommendation would be to do this in the title as well - e.g. note impaired visuomotor updating in individuals with "heightened autistic traits".

      We thank the reviewer for the kind words. We now specify more carefully that our sample of participants consists of neurotypical adults scored for autistic traits and none of them was diagnosed with autism before participating in our experiment. Regarding the Autistic Quotient Questionnaire (AQ) on page 5 of the Introduction we now write:

      “The autistic traits of the whole population form a continuum, with ASD diagnosis usually situated on the high end 31-33. Moreover, autistic traits share a genetic and biological etiology with ASD 34. Thus, quantifying autistic-trait-related differences in healthy people can provide unique perspectives as well as a useful surrogate for understanding the symptoms of ASD 31,35.”

      In the Discussion (page 9) we now write:

      “It is essential to note that our participant pool lacked pre-existing diagnoses before engaging in the experiments and we must address limitations associated with the AQ questionnaire. The AQ questionnaire demonstrates adequate test-retest reliability 36, normal distribution of sum scores in the general population 50, and cross-cultural equivalence has been established in Dutch and Japanese samples 51-53. The AQ effectively categorizes individuals into low, average, and high degrees of autistic traits, demonstrating sensitivity for both group and individual assessments 54.

      However, evolving research underscores many aspects that are not fully captured by the self-administered questionnaire: for example, gender differences in ASD trait manifestation 55. Autistic females may exhibit more socially typical interests, often overlooked by professionals 56. Camouflaging behaviors, employed by autistic women to blend in, pose challenges for accurate diagnosis 57. Late diagnoses are attributed to a lack of awareness, gendered traits, and outdated assessment tools 58. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities 59, or motor skills in everyday situation (MOSES-test 60) becomes crucial for a comprehensive understanding of autistic traits.”

      Suggestions for improvement:

      - Figure 5 is really interesting. I think it should be highlighted a bit more, perhaps even with a model that uses the results of both tasks to predict AQ scores.

      We thank the reviewer for the suggestion. However, the sample size is relatively small for building a robust and generalizable model to predict AQ scores. Statistical models built on small datasets can be prone to overfitting, meaning that they might not accurately predict the AQ for new individuals.

      - Some discussion of the memory demands of the tasks will be helpful. The authors argue that memory is not a factor, but some support for this is needed. 

      The reviewer raises an important point regarding the potential for memory demands to influence our results. We have now also investigated the accuracy of the second saccade separately for the x and y dimensions. As also shown in Figure 3, panel A, a motor bias was observed only in one dimension (x), weakening the memory argument, which would imply a bias in both dimensions (for example, participants remembering the position of the target relative to both screen borders). We performed a t-test between our two subsamples of participants and indeed found a difference in saccade accuracy in the x dimension (p = 0.03) but not in the y dimension (p = 0.88).

      We now add these analyses in Discussion on page 8.

      - With 3 sessions for each experiment, the authors also have data to look at learning. Did people with high AQ get better over time, or did the observed errors/biases persist throughout the experiment? 

      We thank the reviewer for pointing this out. On page 7 (Results) we now write:

      “Understanding how these biases might change over time could provide further insights into this mechanism. Specifically, we investigated whether participants exhibited any learning effects throughout the experiments. For data of Experiment 1 – motor updating – we divided our data into 10 separate bins of 30 trials each. We conducted a repeated-measures ANOVA with the within-subject factor “number of sessions” (two main sessions of 5 bins each, ~150 trials) and the between-subject factor “group” (lower vs upper quartile of the AQ distribution). We found no main effect of “number of sessions” (F(1,7) = 0.25, p = 0.66), a main effect of “group” (F(1,7) = 2.52, p = 0.015), and no interaction between the two subsamples of participants and the sessions tested (F(1,7) = 0.51, p = 0.49). Data of Experiment 2 – visual updating – were separated into 3 sessions. For each session we extracted the PSE and we conducted a repeated-measures ANOVA with within-subject factor “sessions” and between-subject factor “groups” (lower vs upper quartile of the AQ distribution). Here, too, we found no main effect of sessions (F(1,13) = 0.86, p = 0.39), a main effect of group (F(1,14) = 11.85, p = 0.004), and no interaction between the two subsamples of participants and the sessions tested (F(1,13) = 0.20, p = 0.73). In conclusion, the current study found no evidence of learning effects across the experimental sessions. However, a significant main effect of group was observed in both Experiment 1 (motor updating) and Experiment 2 (visual updating). Participants in the group with higher autistic traits performed systematically differently on the task, regardless of the number of sessions completed compared to those in the group with lower autistic traits.”

      Reviewer #2 (Public Review):

      Summary:

      The idea that various clinical conditions may be associated, at least partially, with a disrupted corollary discharge mechanism has been present for a long time.

      In this paper, the authors draw a link between sensory overload, a characteristic of autism spectrum disorder, and a disturbance in the corollary discharge mechanism. The authors substantiate their hypothesis with strong evidence from both the motor and perceptual domains. As a result, they broaden the clinical relevance of the corollary discharge mechanism to encompass autism spectrum disorder.

      The authors write:

      "Imagine a scenario in which you're watching a video of a fast-moving car on a bumpy road. As the car hits a pothole, your eyes naturally make quick, involuntary saccades to keep the car in your visual field. Without a functional efference copy system, your brain would have difficulty accurately determining the current position of your eye in space, which in turn affects its ability to anticipate where the car should appear after each eye movement."

      I appreciate the use of examples to clarify the concept of efference copy. However, I believe this example is more related to a gain-field mechanism, informing the system about the position of the eye with respect to the head, rather than an example of efference copy per se.

      Without an efference copy mechanism, the brain would have trouble accurately determining where the eyes will be in space after an eye movement, and it will have trouble predicting the sensory consequences of the eye movement. However it can be argued that the gain-field mechanism would be sufficient to inform the brain about the current position of the eyes with respect to the head. 

      We now use a different example. On page 3 of the Introduction, we now write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      The authors write:

      "In the double-step paradigm, two consecutive saccades are made to briefly displayed targets 21, 22. The first saccade occurs without visual references, relying on internal updating to determine the eye's position."

      Maybe I have missed something, but in the double-step paradigm the first saccade can occur without the help of visual references if no visual feedback is present, that is, when saccades are performed in total darkness. Was this the case for this experiment? I could not find details about room conditions in the methods. Please provide further details.

      If saccades were not performed in total darkness, then the first saccade can be based on the remembered location of the first target presented, which can be derived from the retinotopic trace of the first stimulus, as well as the contribution from the surroundings, that is: the remembered relative location of the first target with respect to the screen border along the horizontal meridian (i.e. allocentric cues).

      A similar logic could be applied to the second saccade. If the second saccade were based only on the retinotopic trace, without updating, then it would go up and 45 deg to the right, based on the example shown in Figure 1. With appropriate updating, the second saccade would go straight up. However, if saccades were not performed in total darkness, then the location of the second target could also be derived from its relationship with the surroundings (for example, the remembered distance from screen borders, i.e. allocentric cues).
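
      The geometry of this argument can be made concrete with simple vector arithmetic. The sketch below uses hypothetical coordinates (in degrees of visual angle) chosen only to match the description of Figure 1 given above; the function names are illustrative and are not taken from the paper. The retinotopic vector of the second target is its position relative to initial fixation, and correct updating subtracts the first saccade vector from it.

```python
import math

def second_saccade_vectors(fixation, target1, target2):
    """Return (non-updated, updated) vectors for the second saccade.

    non-updated: retinotopic vector of target 2 as seen from initial fixation.
    updated: that vector minus the first saccade, i.e. target2 - target1.
    """
    retinotopic = (target2[0] - fixation[0], target2[1] - fixation[1])
    first_saccade = (target1[0] - fixation[0], target1[1] - fixation[1])
    updated = (retinotopic[0] - first_saccade[0], retinotopic[1] - first_saccade[1])
    return retinotopic, updated

def direction_deg(vector):
    """Direction in degrees, with 0 = rightward and 90 = straight up."""
    return math.degrees(math.atan2(vector[1], vector[0]))

# Hypothetical layout: first target to the right, second target above it.
fixation = (0.0, 0.0)
target1 = (10.0, 0.0)
target2 = (10.0, 10.0)

non_updated, updated = second_saccade_vectors(fixation, target1, target2)
print("without updating:", non_updated, direction_deg(non_updated))
print("with updating:   ", updated, direction_deg(updated))
```

      With this layout the non-updated vector points up and to the right at 45 degrees, while the updated vector points straight up, so any horizontal bias in the second saccade reflects incomplete compensation for the first saccade.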

      If saccades were not performed in total darkness, the results shown in Figures 2 and 3 could then be related to i) differences in motor updating between AQ score groups; ii) differences in the use of allocentric cues between AQ score groups; iii) a combination of i) and ii). I believe this is a point worth mentioning in the discussion.

      Thank you for raising the important issue of visual references in the double-step saccade task. Participants performed saccades in a dimly lit room where visual references, i.e. the screen borders, were barely visible. At the time we collected the data, a laboratory that allowed experiments to be performed in complete darkness was not at our disposal. We acknowledge the possibility that participants could have memorized the target locations relative to the screen borders. The bias of high AQ participants could then be attributed to differences in either encoding, memorization or decoding of the target location relative to the screen borders. However, the potentially abnormal use of visual references must reflect an altered remapping process since we did not find differences in saccade landing in the vertical dimension. A t-test between our two groups of participants revealed a difference in saccade accuracy in the x dimension (p = 0.03) but not in the y dimension (p = 0.88). We thus agree that in addition to an altered efference copy signal in high AQ participants, altered use of visual references might also affect their saccadic remapping.

      In the Discussion we now write: “Our findings suggest that a general memory deficit is unlikely to fully explain the observed bias in high-AQ participants' second saccades. As highlighted in Figure 3A, the bias was specific to the horizontal dimension, weakening the argument for a global memory issue affecting both vertical and horizontal encoding of target location. However, it's important to acknowledge that even under non-darkness conditions, participants might rely on a combination of internal updating based on the initial target location and visual cues from the environment, such as screen borders. This potential use of visual references could contribute to the observed bias in the high-AQ group. If high-AQ participants differed in their reliance on visual cues compared to the low-AQ group, it could explain the specific pattern of altered remapping observed in the horizontal dimension. This possibility aligns with our argument for an abnormal remapping process underlying the results. While altered efference copy signals remain a strong candidate, the potential influence of visual cues on remapping in this population warrants further investigation. Future studies could incorporate a darkness condition to isolate the effects of internal updating on the first saccade, and systematically manipulate the availability of visual cues throughout the task. This would allow for a more nuanced understanding of how internal updating and visual reference use interact in the double-step paradigm, particularly for individuals with varying AQ scores.”

      The authors write:

      "According to theories of saccadic suppression, an efference copy is necessary to predict the occurrence of a saccade."

      I would also refer to alternative accounts, where saccadic suppression appears to arise as early as the retina, due to the interaction between the visual shift introduced by the eye movement, and the retinal signal associated with the probe used to measure saccadic suppression. This could potentially account for the scaling of saccadic suppression magnitude with saccade amplitude.

      Idrees, S., Baumann, M.P., Franke, F., Münch, T.A. and Hafed, Z.M., 2020. Perceptual saccadic suppression starts in the retina. Nature communications, 11(1), p.1977. 

      We thank the reviewer. Now on page 4 of Introduction we write:

      “Some theories consider saccadic omission and saccadic suppression as resulting from an active mechanism. In this view an efference copy would signal the occurrence of a saccade, yielding a transient decrease in visual sensitivity20-22. Others however have pointed out the possibility that a purely passive mechanism suffices to induce saccadic omission23. A recent study has found evidence for saccadic suppression already in the retina. Idrees et al.24 demonstrated that retinal ganglion cells in isolated retinae of mice and pigs respond to saccade-like displacements, leading to the suppression of responses to additional flashed visual stimuli through visually triggered retinal-circuit mechanisms. Importantly, their findings suggest that perisaccadic modulations of contrast sensitivity may have a purely visual origin, challenging the need for an efference copy in the early stages of saccadic suppression. However, the suppression they measured lasted much longer than time-courses observed in behavioral data. An efference copy signal could thus be necessary to release perception from suppression.”

      Reviewer #3 (Public Review): 

      Summary:

      This work examined efference copy related to eye movements in healthy adults who have high autistic traits. Efference copies allow the brain to make predictions about sensory outcomes of self-generated actions, and thus serve important roles in motor planning and maintaining visual stability. Consequently, disrupted efference copies have been posited as a potential mechanism underlying motor and sensory symptoms in psychopathology such as Autism Spectrum Disorder (ASD), but so far very few studies have directly investigated this theory. Therefore, this study makes an important contribution as an attempt to fill in this knowledge gap. The authors conducted two eye-tracking experiments examining the accuracy of motor planning and visual perception following a saccade and found that participants with high autistic traits exhibited worse task performance (i.e., less accurate second saccade and biased perception of object displacement), consistent with their hypothesis of less impact of efference copies on motor and visual updating. Moreover, the motor and visual biases are positively correlated, indicative of a common underlying mechanism. These findings are promising and can have important implications for clinical intervention if they can be replicated in a clinical sample.

      Strengths:

      The authors utilized well-established and rigorously designed experiments and sound analytic methods. This enables easy translations between similar work in non-human primates and humans and readily points to potential candidates for underlying neural circuits that could be further examined in follow-up studies (e.g., superior colliculus, frontal eye fields, mediodorsal thalamus). The finding of no association between initial saccade accuracy and level of autistic trait in both experiments also serves as an important control analysis and increases one's confidence in the conclusion that the observed differences in task performance were indeed due to disrupted efference copies, not confounding factors such as basic visual/motor deficits or issues with working memory. The strong correlation between the observed motor and visual biases further strengthens the claim that the findings from both experiments may be explained by the same underlying mechanism - disrupted efference copies. Lastly, the authors also presented a thoughtful and detailed mechanistic theory of how efference copy impairment may lead to ASD symptomatology, which can serve as a nice framework for more research into the role of efference copies in ASD.

      Weaknesses:

      Although the paper has a lot of strengths, the main weakness of the paper is that a direct link with ASD symptoms (i.e., sensory overload and motor inflexibility as the authors suggested) cannot be established. First of all, the participants are all healthy adults who do not meet the clinical criteria for an ASD diagnosis. Although they could be considered a part of the broader autism phenotype, the results cannot be easily generalized to the clinical population without further research. Secondly, the measure used to quantify the level of autistic traits, Autistic Quotient (AQ), does not actually capture any sensory or motor symptoms of ASD. Therefore, it is unknown whether those who scored high on AQ in this study experienced high, or even any, sensory or motor difficulties. In other words, more evidence is needed to demonstrate a direct link between disrupted efference copies and sensory/motor symptoms in ASD.

      This is a valid point, and we thank the reviewer for raising it. “Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities (Hull, L., Mandy, W., Lai, MC., et al., 2019) or motor skills in everyday situations (MOSES-test; Hillus J, Moseley R, Roepke S, Mohr B., 2019), becomes crucial for a comprehensive understanding of autistic traits.”

      We now address this point in Discussion page 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      - The pothole example in the introduction was really hard to follow. I wonder if there is a better example. 

      We now use a different example. On page 3 of the Introduction, we now write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      - This is really minor; I would say that saccades are not the most frequent movement that humans perform. Some of the balance-related adjustments and even heartbeats are faster. Maybe just add "voluntary". 

      We thank the reviewer for the suggestion, now added.

      - "Severe consequences" on page 4 is a bit strong. If that were true, there would be pretty severe impairments in eye movement behavior in ASD, which I don't think is the case.

      We agree with the reviewer. We have now removed the term “severe”.

      - The results section would read better if each experiment had a short paragraph reiterating its overall goal and the specific approach each experiment took to achieve that goal. 

      Now on page 5, for the first experiment, we write:

      “We investigated the influence of autistic traits on visual updating during saccadic eye movements using a classic double-step saccade task. This task relies on participants making two consecutive saccades to briefly presented targets. The accuracy of the second saccade serves as an indirect measure of how effectively the participant's brain integrated the execution of the first saccade into their internal representation of visual space. Participants were divided into quartiles based on the severity of their autistic traits, as assessed by the Autistic quotient questionnaire (cite). We hypothesized that individuals with higher autistic traits would exhibit greater difficulty in visual updating compared to those with lower autistic traits. This would be reflected in reduced accuracy of their second saccades in the double-step task. Figure 2C illustrates examples from participants at the extremes of the autistic trait distribution (Autistic quotient = 3, in orange and Autistic quotient = 31, in magenta). As shown, both participants were instructed to make saccades to the locations indicated by two brief target appearances (T1 and T2), as quickly and accurately as possible, following the order of presentation. However, successful execution of the second saccade requires accurate internal compensation for the first saccade, without any visual references or feedback available during the saccade itself.”
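
      The compensation described in the quoted passage can be made concrete with simple vector arithmetic. The following is a minimal sketch, not the analysis used in the study: the coordinates, the function names `second_saccade_vector` and `landing_error`, and the `gain` parameter (1 = first saccade fully accounted for, 0 = not accounted for at all) are hypothetical choices made purely for illustration, and the first saccade is assumed to land exactly on T1.

```python
def second_saccade_vector(fixation, t1, t2, gain=1.0):
    """Return the (x, y) vector of the second saccade in a double-step task.

    Hypothetical illustration: `gain` scales how much of the first saccade
    (fixation -> T1) is subtracted from the retinal vector of T2
    (fixation -> T2). Assumes the first saccade lands exactly on T1.
    """
    first = (t1[0] - fixation[0], t1[1] - fixation[1])
    retinal_t2 = (t2[0] - fixation[0], t2[1] - fixation[1])
    return (retinal_t2[0] - gain * first[0], retinal_t2[1] - gain * first[1])


def landing_error(fixation, t1, t2, gain=1.0):
    """Return the (x, y) offset of the second landing position from T2."""
    dx, dy = second_saccade_vector(fixation, t1, t2, gain)
    landing = (t1[0] + dx, t1[1] + dy)
    return (landing[0] - t2[0], landing[1] - t2[1])


# Toy layout (degrees, assumed): horizontal first step, vertical second step.
full = landing_error((0.0, 0.0), (10.0, 0.0), (10.0, 10.0), gain=1.0)
partial = landing_error((0.0, 0.0), (10.0, 0.0), (10.0, 10.0), gain=0.8)
```

      In this toy layout, where the first saccade is purely horizontal, incomplete compensation (`gain` below 1) displaces the second landing position along x only, while full compensation leaves no error in either dimension.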

      On page 6, for experiment 2, we write:

      “With a trans-saccadic localization task, we explored how autistic traits affect the integration of eye movements into visual perception. Participants were presented with stimuli before and after a single saccade, creating an illusion of apparent motion. We measured the perceived direction of this displacement, which is influenced by how well the participant's brain accounts for the saccadic eye movement. We predicted that individuals with higher autistic traits would show a stronger bias in the perceived displacement direction, suggesting a less accurate integration of the eye movement into their visual perception.”

      - On page 6, the text about "vertical displacement" is confusing. The spatial displacements in this experiment were horizontal? 

      Yes, they were. The spatial displacement is horizontal, but the perceived trajectory (due to the saccade) is vertical. We have now changed “vertical displacement” to “vertical trajectory”.

      - Page 6, grammatical problems in "while we report a slightly slant of the dots trajectory". 

      Thank you. Now fixed.

      - It would be helpful to discuss the apparent motion part of Experiment 2 in the main text. This important part is not made clear. 

      In the Introduction, on page 4, we now write:

      “In this paradigm, one stimulus is shown before and another after saccade execution. Together these two stimuli produce the perception of “apparent motion”. If stimuli are placed such that the apparent motion path is orthogonal to the saccade path, then the orientation of the apparent motion path indicates how the saccade vector is integrated into vision. The apparent motion trajectory can only appear vertical if the movement of the eyes is perfectly accounted for, that is the retinotopic displacement is largely compensated, ensuring spatial stability. However, small biases of motion direction – implying under- (or over-) compensation of the eye movement – can indicate relative failures in this stabilization process. In a seminal study, Szinte and Cavanagh 27 found a slight over-compensation of the saccade vector leading to apparent motion slightly tilted against the direction of the saccade. More importantly, when efference copies are not available, i.e. localization occurring at the time of a second saccade in a double step task, a strong saccade under-compensation occurs 28.

      This phenomenon cannot be explained by perisaccadic mislocalization of flashed visual stimuli 29,30, but the two phenomena may be related in that they may both depend upon efference copy information.”

      - Figure 1 could be improved. For example, the text talks about the motor plan, but this is not clearly shown in the figure.

      We have now added the motor plan to the model. Thank you.

      - Figure 2A, the scale is off (the pictures make it look like the horizontal movement was longer than the vertical). 

      Now fixed.

      - Figure 4, it would be helpful if the task was also described in the figure. 

      We thank the reviewer for the comment. We have now modified the figure to also include the perceptual judgment task.

      - Figure 5A, the y-axis shows p(correct), but that is not what the y-axis shows (the legend makes the same mistake). 

      We apologize; it is the proportion of times participants reported the second dot to be further to the right than the first one. We have now changed the figure and the text accordingly.
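
      For readers who want to see how such a y-axis value is obtained, the following is a minimal sketch, assuming a hypothetical trial format of (displacement in degrees, whether "right" was reported). The function name `proportion_right` and the example displacement levels are illustrative and are not taken from the study's analysis code.

```python
from collections import defaultdict


def proportion_right(trials):
    """Map each displacement to the proportion of 'right' reports.

    `trials` is an iterable of (displacement_deg, reported_right) pairs,
    where reported_right is True when the second dot was judged to be
    further right than the first. The trial format is hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # displacement -> [n_right, n_total]
    for displacement, reported_right in trials:
        counts[displacement][0] += int(reported_right)
        counts[displacement][1] += 1
    # Sort by displacement so the result reads like the x-axis of a plot.
    return {d: n_right / n_total for d, (n_right, n_total) in sorted(counts.items())}
```

      Plotting these proportions against displacement gives a psychometric curve, which is a different quantity from the proportion of correct responses.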

      - A recent study on motion and eye movement prediction in ASD is very relevant to the work presented here: Park et al. (2021). Atypical visual motion-prediction abilities in autism spectrum disorder. Clinical Psychological Science, 9(5), 944-960.

      Indeed. We now refer to the cited study in Discussion, on page 9.

      Reviewer #2 (Recommendations For The Authors):

      Statistics and plotting.

      I believe some of the reported statistics are not clear. For example, the authors write:

      "Saccade landing positions of participants in the lower quartile (mean degree ± SEM: 10.17 ± 0.50) did not deviate significantly from those in the upper quartile (mean degree ± SEM: 9.65 ± 0.77). This result was also confirmed by a paired sample t-test (t(7) = 0.66; p = 0.66, BF10 = 0.40)"

      Maybe I am missing something, but why use a paired-sample t-test when the upper and lower quartiles constitute different groups of participants? Shouldn't a two-sample t-test be used in this case?

      We apologize for the confusion. It is indeed a two-sample t-test.

      Along the same lines, I do not understand the link between the number of degrees of freedom reported in the t-test (7) and the number of participants reported in the study (41).

      This is also evident when looking at the scatterplot in Figure 3C. How many participants formed the averages and standard errors reported in Figures 3B and 3D? Please clarify.

      I have the same comment(s) also for the visual updating task (and related figures), where 13 degrees of freedom are reported in the t-tests. Please clarify. 

      We thank the reviewer for pointing this out. The number of participants reported in the scatter plots was indeed 42. However, we opted to compare the averages only in the lower and upper quartiles of the AQ distribution to avoid dealing with a median split (which would imply a skewed distribution). Of our sample of participants in Exp 1, 8 fell into the lower quartile of the AQ distribution and 8 into the upper quartile (14 degrees of freedom); in Exp 2, 8 participants fell into the lower quartile and 7 into the upper quartile (13 degrees of freedom).

      We have now corrected the values accordingly.
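
      The link between group sizes and degrees of freedom in an independent two-sample t-test can be sketched as follows. This is a minimal illustration assuming the classic equal-variance (pooled) form of the test; the response does not state which variant was used, and the function name `pooled_t_from_summary` is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Return (t, df) for an independent two-sample t-test.

    Assumes the equal-variance (pooled) form. Inputs are group means,
    standard errors of the mean, and group sizes.
    """
    var1 = (sem1 * math.sqrt(n1)) ** 2  # convert SEM back to variance
    var2 = (sem2 * math.sqrt(n2)) ** 2
    df = n1 + n2 - 2  # one degree of freedom is lost per estimated group mean
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df
```

      With group sizes of 8 and 8 this yields 14 degrees of freedom, and with 8 and 7 it yields 13, in line with the values given in the response. A paired test on 8 pairs would instead have 7 degrees of freedom.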

      Reviewer #3 (Recommendations For The Authors):

      (1) The language can be a bit misleading (especially the title and abstract) as it wasn't always clear that the participants don't actually have clinical ASD. I'd suggest avoiding using words like "symptom" as that would indicate clinical severity, and using words like "traits/characteristics" instead for more precise language. 

      We apologize for the misleading terminology used. Now fixed.

      (2) In the Intro: "...perfect compensation results in a vertical trajectory, while small biases indicate stabilization issues23-25." This is a bit confusing without knowing the details of the paradigm. Consider clarifying or at least referring to Figure 4. 

      Thank you.

      (3) In the Results: "This result was also confirmed by a paired sample t-test (t(7) = 0.66;..." This is confusing as a two-sample t-test is the appropriate test here. Also, the degree of freedom seems very low - could the authors clarify how many participants are in each subgroup (i.e., low vs. high AQ quartile), for both experiments? 

      Of our sample of participants in Exp 1, 8 fell into the lower quartile of the AQ distribution and 8 into the upper quartile (14 degrees of freedom); in Exp 2, 8 participants fell into the lower quartile and 7 into the upper quartile (13 degrees of freedom).

      (4) In the Methods: Experiment 2: "The first dot could appear randomly above or below gaze level at a fixed horizontal location, halfway between the two fixations (x = 0, y = -5° or +5° depending on the trial). The second dot was then shown orthogonal to the first one at a variable horizontal location (x = 5° ± 2.5°)." This would mean that the position of the 2nd dot relative to the 1st one would be 2.5°-7.5°, but the task description in Results and Figure 5A would suggest the horizontal location of the second dot is x = 0° ± 2.5°. Which one is correct?

      The second option is the correct one. We have now fixed the typo in the Methods section.

      (5) There is another study that examined oculomotor efference copies in children with ASD using a similar trans-saccadic perception task (Yao et al., 2021, Journal of Vision). In that study, they found a correlation between task performance and an ASD motor symptom (repetitive behavior). This seems quite relevant to the authors' hypothesis and discussion. 

      We thank the reviewer for the suggestion. We have now added the mentioned paper to the Discussion.

      (6) Please proofread the entire paper carefully as there were multiple grammatical and spelling errors.

      Thank you.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Hoops et al. showed that Netrin-1 and UNC5c can guide dopaminergic innervation from nucleus accumbens to cortex during adolescence in rodent models. 

      We showed this with respect to Netrin-1 only. With respect to UNC5c, we showed that the timing of its expression suggests that it may be involved, but did not conduct the UNC5c manipulation experiments necessary to prove it. We state this clearly in the manuscript.

      They found that these dopamine axons project to the prefrontal cortex in a Netrin-1 dependent manner and knocking down Netrin-1 disrupted motor and learning behaviors in mice. 

      We would like to clarify that we did not show that learning or motor behaviors are affected. We showed that inhibitory control, measured in the Go/No-Go task, is altered in adulthood.

      Furthermore, the authors used hamsters, a seasonal model that is affected by the length of daylight, to demonstrate that the guidance of dopamine axons is mediated by the environmental factor such as daytime length and in sex dependent manner. 

      We agree with this characterization of our hamster experiments, but want to emphasize that it is the timing of the adolescent dopamine axon input to the prefrontal cortex that is impacted by daytime length in a sex-dependent manner.

      Regarding the cell type specificity of Netrin-1 expression, the authors began by stating "this question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where in the forebrain regions we examined Netrin-1+ cells are present." This statement contradicts the exact issue regarding the specificity issue I raised.

      We are not sure why the identities of the cell types expressing Netrin-1 are at issue. As a secreted protein, Netrin-1 can be attached to the extracellular cell surface or located in the extracellular matrix, where it interacts with its receptors, which are embedded in the cell surfaces of growing axons (Finci et al., 2015; Rajasekharan & Kennedy, 2009). Netrin-1 is expressed by a wide variety of cell types; for example, it is expressed in medium spiny neurons in the striatum of rodents as well as in cholinergic neurons (Shatzmiller et al., 2008). However, we cannot see why showing exactly what type(s) of cells have Netrin-1 on their surfaces, or have secreted it into the matrix, would be at issue for our study.

      They then went on to show the RNAscope data for Netrin-1 in Figure 2, which showed Netrin-1 mRNA was actually expressed quite ubiquitously in anterior cingulate cortex, dorsopeduncular cortex, infralimbic cortex, prelimbic cortex, etc. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      We agree that Netrin-1 mRNA is present throughout the forebrain. In particular, its presence in the regions mentioned by Reviewer #1 is a key component of our theory for how dopamine axons grow to the prefrontal cortex in adolescence.

      In addition, contrary to the authors' statement that Netrin-1 is a "secreted protein", the confocal images in Figure 1 in the rebuttal letter actually show Netrin-1 present in "granule-like" organelles inside the cytoplasm of neurons. 

      The rebuttal letter’s Figure 1 is not sufficient to determine the subcellular location of Netrin-1; however, we agree that it is likely that Netrin-1 is present in the cytoplasm of neurons. Indeed, its presence in vesicles in the cytoplasm is to be expected, as this is a common mechanism for cells to secrete proteins into the extracellular space (Glasgow et al., 2018). We are not sure whether Reviewer #1’s “granule-like” organelles are in fact secretory vesicles, and we do not think our immunohistochemical images are an appropriate means of answering this kind of question. In any case, a detailed characterization of the subcellular distribution of Netrin-1 is beyond the scope of our study.

      That Netrin-1 is a secreted protein is well-established in the literature (for example, see Glasgow et al., 2018). The confocal images we provide suggest, but do not prove, that it is likely Netrin-1 is present both extracellularly and intracellularly, which is entirely consistent with its synthesis, secretion, and function. It is also consistent with our methodology and findings. 

      Finally, the authors presented Figure 7 to indicate the location where virus expressing Netrin-1 shRNA might be located. Again, the brain region targeted was quite focal and most likely did not cover all the Netrin-1+ brain regions in Figure 2. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      Figure 7 - this is referring to Author response image 4 of our first response to reviewers.

      We agree with Reviewer #1’s characterization of our experiment. We intended to interrupt the Netrin-1 pathway to the prefrontal cortex, like removing a bridge along a road. The Netrin-1 signal remained intact along the dopamine axon’s route before and after the location of the viral injection, however it was lost at the site of the virus injection. This is like a road remaining intact on either side of a destroyed bridge, but becoming impassable at the location where the bridge was destroyed. We are glad that Reviewer 1 agrees our experimental design achieved the desired outcome (a focal reduction in Netrin-1 expression).

      Collectively, these results raised more questions regarding the specificity of Netrin-1 expression in brain regions that are behaviorally relevant to this study.

      We do not agree with this assessment. Our manipulation of Netrin-1 expression was highly localized and specific, as Reviewer #1 seems to acknowledge. We are not clear on what questions this might raise that would call into question our findings as described in our manuscript. We have now added the following paragraph to our manuscript:  

      “It remains unknown exactly what types of cells are expressing Netrin-1 along the dopamine axon route, and how this expression is regulated to produce the Netrin-1 gradients that guide the dopamine axons. It also remains unclear where the misrouted axons end up in adulthood. Future experiments aimed at addressing these questions will provide further valuable insight into the nature of the “Netrin-1 pathway”. Nonetheless, our results allow us to conclude that Netrin-1 expressing cells “pave the way” for dopamine axons growing to the medial prefrontal cortex.”

      With respect to the effectiveness of Netrin-1 knockdown in the animals in this study, the authors cited data in HEK293 cells (Cuesta et al., 2020. Figure 2a), which did not include any statistics, and previously published in vivo data in a separate, independent study (Cuesta et al., 2020. Figure 2c). They do not provide any data regarding the effectiveness of Netrin-1 knockdown in THIS study.

      Indeed, we understand the concerns of Reviewer 1 here. This issue was discussed at the time all the experiments (both in the current manuscript and in Cuesta et al. (2020)) were conducted, and we decided that it was sufficient to show the virus was capable of knocking down Netrin-1 in vitro and in vivo in the forebrain. These characterization experiments were published in the first manuscript to present results using the virus, which was Cuesta et al. (2020). However, all experiments from both manuscripts were conducted contemporaneously.

      We do not see how repeating the same characterization experiments again is useful. 

      Similar concerns regarding UNC5C knockdown (points #6, #7, and #8) were not adequately addressed.

      There is no UNC5c knockdown in this manuscript. Furthermore, points #6, #7 and #8 do not deal with UNC5c knockdown. Point #6 is regarding the Netrin-1 virus efficacy, which we discuss above. Points #7 and #8 are requesting numerous additional experiments that we feel are worthy of their own manuscripts, and we do not feel that they call into question the findings we present here. Rather, answering points #7 and #8 would further refine our understanding of how dopamine axons grow to the prefrontal cortex beyond our current manuscript.

      In brief, while this study provides a potential role of Netrin-1-UNC5C in target innervation of dopaminergic neurons and its behavioral output in risk-taking, the data lack sufficient evidence to firmly establish the cause-effect relationship.

      We do not claim a cause-effect relationship here or anywhere in the manuscript. Concrete establishment of a cause-effect relationship will require several more manuscripts worth of experiments.

      Reviewer #2 (Public Review):

      In this manuscript, Hoops et al., using two different model systems, identified key developmental changes in Netrin-1 and UNC5C signaling that correspond to behavioral changes and are sensitive to environmental factors that affect the timing of development. They found that Netrin-1 expression is highest in regions of the striatum and cortex where TH+ axons are travelling, and that knocking down Netrin-1 reduces TH+ varicosities in mPFC and reduces impulsive behaviors in a Go-No-Go test. 

      We want to point out that we examined the Netrin-1 expression in the septum rather than the striatum but otherwise feel the above description is accurate.

      Further, they show that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters, environmental effects on development are also sexually dimorphic. This study addresses an important question using approaches that link molecular, circuit and behavioral changes. Understanding developmental trajectories of adolescence, and how they can be impacted by environmental factors, is an understudied area of neuroscience that is highly relevant to understanding the onset of mental health disorders. I appreciated the inclusion of replication cohorts within the study.

      We appreciate Reviewer #2’s comments, which we feel accurately describe our experimental approach and findings, including their limitations.

      Reviewer #3 (Public Review):

      This study from the Flores group aims at understanding neuronal circuit changes during adolescence which is an ill-defined, transitional period involving dramatic changes in behavior and anatomy. They focus on DA innervation of the prefrontal cortex, and their interaction with the guidance cue Netrin1. They propose DA axons in the PFC increase in the postnatal period, and their density is reduced in a Netrin 1 knockdown, suggesting that Netrin abets the development of this mesocortical pathway. 

      We feel it necessary to point out that we are not the first to propose that dopamine axons in the prefrontal cortex increase in the postnatal period. This is well-established and was first documented in rodents in the 1980s (Kalsbeek et al., 1988). Otherwise, we agree with Reviewer 3’s characterization.

      In such mice impulsivity gauged by a go-no go task is reduced. They then provide some evidence that Unc5c is developmentally regulated in DA axons. Finally they use an interesting hamster model, to study the effect of light hours on mesocortical innervation, and make some interesting observations about the timing of innervation and Unc5c expression, and the fact that females housed in winter day length conditions display an accelerated innervation of the prefrontal cortex.

      We agree with Reviewer #3’s characterization of our study and findings here.

      Comments on the revision. Several points were addressed; some remain to be addressed.

      (4) It's not clear to me that TH doesn't stain noradrenergic axons in the PFC. See Islam and Blaess, 2021, and references therein.

      Presuming that Reviewer #3 is referring to Islam et al. (2021), the review they cite supports our position that TH-stained axons in the forebrain are by-and-large dopamine axons.

      Nonetheless, Islam et al. do point out that it is important to keep in mind that TH-positive axons have a slight possibility of being noradrenaline axons. We are very conscious of this possibility and are careful to minimize this risk. As we state in the methods, we only examine axons that are morphologically consistent with dopamine axons and are localized to areas within the forebrain where dopamine axons are known to innervate, in addition to being TH-positive. The localization and morphology of noradrenaline axons in the forebrain is different from that of dopamine axons. This is stated in our methods on lines 76-94, where we describe in detail the differentiation between dopamine and norepinephrine axons and include a full list of relevant citations.

      (6) The Netrin knockdown data provided is from a previous study/samples.

      Indeed, however the experiments for the two manuscripts were conducted contemporaneously. We believe two sets of validation experiments are not required.

      (8) While the authors make the argument that the behavior is linked to DA, they still haven't formally tested it, in my opinion.

      We agree that we have not formally tested this link. However, we disagree that we claim to have established a formal link in our manuscript.

      (1) Fig 3, UNC5c levels are not yet quantified. Furthermore, I agree with the previous reviewer that Unc5C knockdown would corroborate key aspects of the model.

      We present UNC5c quantities for mice in our first response to reviewers (Figure 11 therein); however, we did not do so for the hamsters due to the time involved. We are planning further experiments with the hamsters and may include quantification of UNC5c in the nucleus accumbens at that time. However, we do not feel its absence from this manuscript calls into question our findings.

      With regards to the UNC5c knockdown, we agree it would be an informative extension of our findings here, but again we do not feel that it is necessary to corroborate our current findings.

      New - Developmental trajectory of prefrontal TH-positive axons from early adolescence to adulthood is similar in male and female rats, (Willing Juraska et al., 2017). This needs discussion.

      Willing et al. (2017) reported an increase in prefrontal dopamine density during adolescence in male and female rats, with a non-significant trend towards an earlier increase in females.

      This is in line with our current results in mice indicating that the timing of dopamine axon targeting and growth is sex specific. We are currently testing this idea directly using intersectional viral tracing methods. We have now added the following sentence to the manuscript:

      “Differences in the precise timing of dopamine innervation to the PFC in adolescence have been suggested by findings reported in male and female rats (Willing et al., 2017)”.

      References

      Brignani, S., Raj, D. D. A., Schmidt, E. R. E., Düdükcü, Ö., Adolfs, Y., Ruiter, A. A. D., Rybiczka-Tesulov, M., Verhagen, M. G., Meer, C. van der, Broekhoven, M. H., Moreno-Bravo, J. A., Grossouw, L. M., Dumontier, E., Cloutier, J.-F., Chédotal, A., & Pasterkamp, R. J. (2020). Remotely Produced and Axon-Derived Netrin-1 Instructs GABAergic Neuron Migration and Dopaminergic Substantia Nigra Development. Neuron, 107(4), 684-702.e9. https://doi.org/10.1016/j.neuron.2020.05.037

      Cuesta, S., Nouel, D., Reynolds, L. M., Morgunova, A., Torres-Berrio, A., White, A., Hernandez, G., Cooper, H. M., & Flores, C. (2020). Dopamine axon targeting in the nucleus accumbens in adolescence requires Netrin-1. Frontiers in Cell and Developmental Biology, 8. https://doi.org/10.3389/fcell.2020.00487

      Finci, L., Zhang, Y., Meijers, R., & Wang, J. H. (2015). Signaling mechanism of the netrin-1 receptor DCC in axon guidance. Progress in Biophysics and Molecular Biology, 118(3), 153-160. https://doi.org/10.1016/j.pbiomolbio.2015.04.001

      Glasgow, S. D., Labrecque, S., Beamish, I. V., Aufmkolk, S., Gibon, J., Han, D., Harris, S. N., Dufresne, P., Wiseman, P. W., McKinney, R. A., Séguéla, P., Koninck, P. D., Ruthazer, E. S., & Kennedy, T. E. (2018). Activity-Dependent Netrin-1 Secretion Drives Synaptic Insertion of GluA1-Containing AMPA Receptors in the Hippocampus. Cell Reports, 25(1), 168-182.e6. https://doi.org/10.1016/j.celrep.2018.09.028

      Islam, K. U. S., Meli, N., & Blaess, S. (2021). The Development of the Mesoprefrontal Dopaminergic System in Health and Disease. Frontiers in Neural Circuits, 15, 746582. https://doi.org/10.3389/fncir.2021.746582

      Kalsbeek, A., Voorn, P., Buijs, R. M., Pool, C. W., & Uylings, H. B. M. (1988). Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology, 269(1), 58–72. https://doi.org/10.1002/cne.902690105

      Rajasekharan, S., & Kennedy, T. E. (2009). The netrin protein family. Genome Biology, 10(9), 239. https://doi.org/10.1186/gb-2009-10-9-239

      Shatzmiller, R. A., Goldman, J. S., Simard-Émond, L., Rymar, V., Manitt, C., Sadikot, A. F., & Kennedy, T. E. (2008). Graded expression of netrin-1 by specific neuronal subtypes in the adult mammalian striatum. Neuroscience, 157(3), 621–636. https://doi.org/10.1016/j.neuroscience.2008.09.031

      Willing, J., Cortes, L. R., Brodsky, J. M., Kim, T., & Juraska, J. M. (2017). Innervation of the medial prefrontal cortex by tyrosine hydroxylase immunoreactive fibers during adolescence in male and female rats. Developmental Psychobiology, 59(5), 583–589. https://doi.org/10.1002/dev.21525

    1. Author Response:

      We appreciate the constructive reviews. We have performed additional analysis to address reviewer concerns, and we will submit a full revision in the near future. Our new analysis confirms that the visual stimulus can account for about a third of the variance in population neural activity. Pupil dynamics only account for a small fraction of the trial-to-trial variability, less than six percent. Once we regress out the stimulus responses and the pupil dynamics, we can use the network activity to predict the trial-to-trial variability of single neuron responses, and about eight percent of the variance is explained. Thus it appears as though multiplicative gain cannot account for the results. As for the concerns about missing spikes, we would like to direct readers to the supplementary figure that addresses that concern. The analysis shows that the correlation measurements are robust to the imprecisions of spike inference from calcium imaging data. Finally, we would also like to take the opportunity to clarify that we make no claim as to the discreteness of tuning classes. The GMM analysis was performed to obtain a data-driven, granular categorization of neuron tuning, to support detailed statistical analysis. We take no position on the discreteness or lack thereof of these groups. We agree that it is an interesting question, and we are happy to provide additional analysis in the revision to address this question. Our main result on functional connectivity structure holds regardless of the discreteness of neuron tuning selectivity.
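
      The sequential variance partitioning described above can be illustrated with a minimal sketch. It is not the analysis code behind the numbers quoted here: the function names, the use of a single linear predictor per stage, and the toy values are all assumptions made for illustration. Each stage fits one predictor by least squares to whatever the previous stage left unexplained, and reports the share of the total response variance that it removes.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """What is left of y after removing its linear fit on x."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def sequential_fractions(response, predictors):
    """Fraction of total response variance removed by each predictor in turn."""
    total = variance(response)
    remaining = list(response)
    fractions = []
    for predictor in predictors:
        before = variance(remaining)
        remaining = residuals(predictor, remaining)
        fractions.append((before - variance(remaining)) / total)
    return fractions


# Hypothetical toy trial series (assumed values, mutually uncorrelated by design).
stimulus = [-1, 1, -1, 1, -1, 1, -1, 1]
pupil = [1, 1, -1, -1, 1, 1, -1, -1]
network = [1, 1, 1, 1, -1, -1, -1, -1]
response = [2 * s + p + n for s, p, n in zip(stimulus, pupil, network)]

# Regress out the stimulus first, then pupil, then network activity.
fractions = sequential_fractions(response, [stimulus, pupil, network])
```

      With correlated predictors the order of the stages matters, so the fractions would depend on which variable is regressed out first.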

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major recommendations

      (1) In lines 42-44 (abstract), the authors state that "ASARs function as essential RNA scaffolds for the assembly of hnRNP complexes that help maintain the structural integrity of each mammalian chromosome". Similar conclusions are restated in lines 138-140. Based on the data presented, it is evident that ASARs localization on chromatin is dependent on hnRNPs. However, there is insufficient evidence to conclude that ASARs cause the assembly of hnRNP complexes or that these hnRNP complexes are directly responsible for the regulation of chromosome replication. Please revise your claims.

      We have modified the text as follows: “Our results further demonstrate the role that ASARs play during the temporal order of genome-wide replication, and we propose that ASARs function as essential RNA scaffolds for the assembly of hnRNP complexes that help maintain the structural integrity of each mammalian chromosome.”

      (2) In the analysis in Figure 1C-F, it is unclear why XIST is used as a comparison to ASAR6-141. A more meaningful control would be to show that hnRNPs preferentially bind ASAR6-141 relative to all expressed transcripts. Also, some panels are missing the y-axis label.

      We have genetically validated 8 different ASAR genes for their role in controlling chromosome-wide replication timing. The only other gene known to control chromosome-wide replication timing is XIST, which also encodes a chromosome-associated lncRNA. Our analysis of publicly available eCLIP data (and previous literature on XIST-binding proteins) showed substantial overlap between RBPs that associate with ASARs and XIST. Hence, we anticipated that at least some RBP knockdowns would affect both lncRNAs, despite their contrasting functions. In addition, we routinely use XIST RNA as a positive control in RNA FISH assays, as the XIST RNA FISH protocol represents a robust and well validated chromosomal RNA FISH procedure.

      y-axis labels have been added to Figure 1.

      (3) In Figure 2K&L, it would be beneficial to quantify and normalize the BrdU incorporation, as ectopic integration of the sense 7kb region appears to result in overall higher BrdU incorporation in all chromosomes, not just chromosome 5.

      There are two main aspects of the BrdU incorporation assay that we use: 1) The BrdU incorporation banding pattern on each chromosome is unique to that chromosome, and the banding pattern is also representative of the time during S phase when the BrdU incorporation occurred, i.e. we detect a different banding pattern if BrdU is incorporated in early S phase versus late S phase. 2) The amount of BrdU incorporation can be used to measure the synchrony between chromosome homologs, but only within the same cell. Thus, we generate a ratio of BrdU incorporation in chromosome homologs in individual cells, then compare the ratio of incorporation into each chromosome pair in multiple cells (see Figure 2B-E). The overall BrdU incorporation into the chromosomes of different cells is quite variable; however, the banding pattern and ratio of BrdU incorporation in chromosome homologs in individual cells are comparable, unless we have disrupted or ectopically integrated an ASAR. Because overall BrdU incorporation varies so much between different cells in the population, it is not a useful readout for measuring synchronous versus asynchronous replication between chromosome homologs.
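
      The reasoning for using a within-cell ratio can be sketched as follows. The intensity values are hypothetical and the `homolog_ratio` helper is an illustrative assumption, not the quantification pipeline used for Figure 2. The point it shows is that scaling both homologs of a cell by the same factor changes the total signal but leaves the ratio unchanged.

```python
def homolog_ratio(homolog_a, homolog_b):
    """Within-cell ratio of BrdU signal between two homologs (a / b)."""
    if homolog_b <= 0:
        raise ValueError("homolog_b signal must be positive")
    return homolog_a / homolog_b


# Hypothetical (homolog A, homolog B) BrdU intensities for three cells.
# Overall incorporation differs widely between cells; the A/B balance does not.
cells = [(100.0, 50.0), (400.0, 200.0), (30.0, 15.0)]

ratios = [homolog_ratio(a, b) for a, b in cells]
totals = [a + b for a, b in cells]
```

      In this toy case the totals span more than a tenfold range while every ratio is identical, which is why the ratio, and not the total, is compared across cells.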

      (4) hnRNP proteins can regulate multiple aspects of RNA processing other than chromatin retention. Hence, it would be beneficial to rule out alternative hypotheses as to what the hnRNP knockdowns do to ASAR6-141, for example by assessing changes in RNA levels or splicing upon knockdown of hnRNPs using qPCR.

      We agree that direct roles for any of the hnRNP/RBPs that are critical for ASAR RNA localization and replication timing have not been established. However, our findings, combined with the observation that cells depleted of HNRNPU show reduced origin licensing in G1 and reduced origin activation frequency during S phase (PMID: 34888666), support a role for HNRNPU, either directly or indirectly, in DNA replication. Furthermore, we also found that depletion of the DNA replication fork remodeler HLTF or the deubiquitinase UCHL5 results in mis-localization of ASAR RNAs and in asynchronous replication of every autosome pair, indicating that ASAR RNA mis-localization and asynchronous replication are not simply a phenotype associated with hnRNP depletions. A full mechanistic understanding of the role that ASAR RNAs play in combination with this relatively large and diverse set of hnRNP/RBPs will require a better understanding of the direct roles that each protein, and any higher order complexes that contain these proteins, play in regulating DNA synthesis, splicing, transcription, chromatin structure and/or ASAR RNA localization.

      (5) Both the disruption and ectopic expression of the 7kb region result in delayed chromosome replication. Would one not expect there to be opposing effects on replication timing? Please discuss.

      One puzzling set of observations is that loss of function mutations and gain of function mutations of ASAR genes result in a similar delayed replication timing and delayed mitotic condensation phenotype. We have detected delayed replication timing in human cells following genetic knockouts (loss of function) of eight different ASAR genes located on 5 different autosomes. We have also detected delayed replication timing on mouse chromosomes expressing transgenes (gain of function) from three different ASAR genes (ASAR6, ASAR6-141, and ASAR15). The ASAR transgenes ranged in size from an ~180kb BAC to an ~3kb PCR product. One possible explanation for these observations is that ectopic integration of ASAR transgenes functions in a dominant negative manner by interfering with the endogenous “ASARs” on the integrated chromosomes. Consistent with this possibility, we recently identified ASAR candidate genes on every human autosome (PMC9588035). Our favored model is that expression of ASAR transgenes integrated into mouse chromosomes disrupts the function of endogenous ASARs by "out-competing" them for shared RBPs. We also point out that a similar ectopic integration assay, using Xist transgenes, has been an informative assay for characterization of Xist functions, including the ability to delay replication timing and induce gene silencing on autosomes (reviewed in PMID:19898525). One intriguing observation (yet largely ignored by the X inactivation field) is that deletion of the Xist gene on either the active or inactive X chromosomes in somatic cells results in delayed replication timing of the X chromosomes (PMC1667074; PMC1456779). Thus, both loss of function and gain of function mutations of Xist result in a similar delayed replication timing phenotype. Given these parallels between Xist and ASAR gene mutation phenotypes we were curious to test the consequences of ASAR gain of function on the inactive X chromosome.

      In this manuscript, we integrated the ~7kb ASAR6-141 transgene into the inactive X chromosome, and detected a delayed replication timing phenotype on the integrated X chromosome. We also detected an association between Xist and ASAR RNAs using RNA FISH in interphase cells (Figure 4A and 4B), which supports the observations that ASAR RNAs and XIST RNA are bound by a partially overlapping set of hnRNP/RBPs (Figure 1D-F), and is consistent with the model that ASAR transgenes disrupt function by competition for shared RBPs. Dissecting the roles of the hnRNP/RBPs that interact with both ASAR and XIST RNAs will undoubtedly give important insights into both XIST and ASAR function, and into how these poorly understood chromosomal phenotypes are generated.

      Minor recommendations

      (1) In Figure 1G, it would be informative to show where the LINE-1 element within ASAR6-141 is located to get a sense of what hnRNP proteins bind to it.

      There are numerous LINE-1 elements within the ASAR6-141 gene. The ~7kb RBPD does not contain LINE-1 sequences. Therefore, we did not detect significant hnRNP/RBP eCLIP peaks within LINE-1 sequences.

      (2) The rationale for ectopic integration of the 7kb region into the inactive X-chromosome is unclear. Is there something unique about the replication of the inactive X or were you interested in seeing whether the 7kb region could escape X-inactivation?

      Given the parallels between Xist and ASAR gene mutation phenotypes, i.e. loss of function and gain of function result in delayed replication timing (see above), we were curious to test the consequences of ASAR gene gain of function on the inactive X chromosome. One possibility was reversal of X inactivation and a shift to earlier replication timing. However, we detected delayed replication timing on the inactive X, and an enhanced XIST RNA FISH signal that overlapped with the ASAR RNA. This speaks to the comment of Reviewer 2 questioning: "Is it possible that integration might alter Xist expression confounding this interpretation?" The enhanced XIST RNA FISH signal suggests that the delayed replication of the inactive X is not due to reduced expression of XIST RNA.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this potentially useful study, the authors attempt to use comparative meta-analysis to advance our understanding of life history evolution. Unfortunately, both the meta-analysis and the theoretical model are inadequate and proper statistical and mechanistic descriptions of the simulations are lacking. Specifically, the interpretation overlooks the effect of well-characterised complexities in the relationship between clutch size and fitness in birds.

      Public Reviews:

      We would like to thank the reviewers for their helpful comments, which have been considered carefully and have been valuable in progressing our manuscript. The following bullet points summarise the key points and our responses, though our detailed responses to specific comments can be found below:

      - Two reviewers commented that our data was not made available. Our data was provided upon submission and during the review process, but was not made accessible to the reviewers. Our data and code are available at https://doi.org/10.5061/dryad.q83bk3jnk.

      - The reviewers have highlighted that some of our methodology was unclear and we have added all the requested detail to ensure our methods can be easily understood.

      - The reviewers highlight the importance of our conclusions, but also suggest some interpretations might be missing and/or are incomplete. To make clear how we objectively interpreted our data and the wider consequences for life-history theory we provide a decision tree (Figure 5). This figure makes clear where we think the boundaries are in our interpretation and how multiple lines of evidence converge to the same conclusions.

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We further show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased – but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to be larger than their original brood (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate that the overall survival effect of a change in reproductive effort is close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study). Please also note that the Santos & Nakagawa study was conducted over 10 years ago. This means we added additional data (L364-365). Furthermore, meta-analyses are an evolving practice and we also corrected and improved on the overall analysis approach (e.g. L358-359 and L393-397, and see detailed SI).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival – a key theme in life history and the biology of ageing – and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than considering that the original hypothesis could be false or inflated in importance. We do not consider questioning the premise of the data over questioning a favoured hypothesis to necessarily be the best scientific approach here. In many places in our manuscript, we question and address, at length, the underlying data and their interpretation (L116-117, L165-167, L202-204 and L277-282). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival, while being aware that other trade-offs could counter-balance or explain our findings (discussed on L208-210 & L301-316). Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, of potential trade-offs, there are endless possibilities of where a trade-off might operate between traits. We purposefully focus on the one well-studied and most commonly invoked trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur and cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have shown just that (L314-316).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a general trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      What we do appreciate from the reviewer’s comment is that the interpretation of our findings is complex. Even though our in-text explanation includes the caveats the reviewer refers to, and these are discussed at length, their inter-relationships are hard to appreciate from a text format. To improve this presentation and for ease of the reader, we have added a decision tree (Figure 5) which represents the logical flow from the hypothesis being tested through to what overall conclusion can be drawn from our results. We believe this clarifies what conclusions can be drawn from our results. We emphasise again that the theory that trade-offs between reproductive effort and parental survival are the major driver of variation in offspring production was not supported, though it is the one that practitioners in the field would be most likely to invoke, and our result is important for this reason.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations (L107-123, Figure 1, Table 1). Note, however, that much theory is built on the immediate costs of reproduction and, as such, these costs are likely overinterpreted, meaning that our overall interpretation still holds, i.e. “parental survival trade-off is not the major determinative trade-off in life history within-species” (Figure 5).

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods line 466-468, where we explicitly state that this is lifetime enlargement. Of course, such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion can also not be drawn from the study on jackdaws by Boonekamp et al (2014) as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of the annual costs incurred. Note that we have now included specific discussion of this study in response to the reviewer (L265-269).

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We have added additional detail to the methodology section (see “Study sourcing & inclusion criteria” and “Extracting effect sizes”) in our revised manuscript. Note that our data and code were not shared with the reviewers, despite our supplying them upon submission and again during the review process; they would have explained much of the detail required.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species’ mean.
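
      The construction of the clutch size predictor described above can be sketched as follows. The function names and nest values are hypothetical, and using the sample standard deviation for standardising is an assumption, since the text here does not specify it.

```python
from statistics import mean, stdev


def realised_clutch(original, manipulation):
    """Number of offspring in the nest after manipulation (5 and -1 give 4)."""
    return original + manipulation


def standardise(values):
    """Z-scores using the sample mean and sample standard deviation (assumed)."""
    centre = mean(values)
    spread = stdev(values)
    return [(v - centre) / spread for v in values]


def proportion_of_mean(values, species_mean):
    """Each clutch size expressed relative to the species' mean clutch size."""
    return [v / species_mean for v in values]


# Hypothetical nests: (original clutch size, number of eggs added or removed).
nests = [(5, -1), (4, 0), (4, 1), (3, 0)]

clutches = [realised_clutch(original, change) for original, change in nests]
z_scores = standardise(clutches)
proportions = proportion_of_mean(clutches, species_mean=4.0)
```

      Any one of these series (raw, standardised, or proportional) would then serve as the single predictor in a logistic regression of survival, which is not shown here.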

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa. Arguably, the approach by Santos & Nakagawa is worse, as they dichotomise effort as increased or decreased, factorise their output and thereby inflate their number of outcomes, of which only 1 cell of 4 categories is significant (for males and females, increased and decreased brood size). The proof is in the pudding as well, as our results clearly demonstrate that the magnitude of the manipulation is a key factor driving the results, i.e. one offspring for a seabird is a larger proportion of care (and fitness) than one offspring for a passerine. Such insights were not achieved by Santos & Nakagawa’s method and, again, did not allow a direct quantitative comparison between quality (correlational) and experimental (brood size manipulation, i.e. “trade-off”) effects, which forms a central part of our argumentation (Figure 5). 

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets the range of added chicks required to estimate a non-linear relationship was not available. The question also remains of what the shape of such a non-linear relationship should be, which is hard to determine a priori. There is also a real risk when fitting non-linear terms that they are spurious and overinterpreted, as they often present a better fit (denoting one df is not sufficient especially when slopes vary). We have added this detail to our discussion.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately tailored the selection of studies to match the manipulation studies (L367-369). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the equally important observational component of our analysis and thereby fails to acknowledge one of the key questions being addressed in this study. Note that in our revised version we have edited the phylogenetic tree to indicate for which species we have both types of information, which highlights our approach to selecting observational data (Figure 3).

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists of a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows one to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 336–339, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds that have larger clutch sizes also lived longer, and we suggest that this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size. We have added Figure 5 to our manuscript to help the reader better understand what questions we can answer with our study and what conclusions we can draw from our results.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, and have explained our methods in terms that are accessible to a wider audience. Note, however, that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We have added the model formula to the model output tables.

      For the simulation, we simply simulated the resulting effects. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand why the reviewer feels the simulations were not explained thoroughly. We have revised our methods section and added details which we believe make our methodology more clear without needing to consult the supplemental material. However, we have also added the equations used in the process of calculating our simulated data to the Supplementary Information for readers who wish to have this information in equation form.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individuals measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms, in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. Throughout the manuscript we have refined our terminology and indicated where we are referring to the individual level or the population level. The inclusion of our new Figure 5 (decision tree) should also help in this context, as it makes clear the level on which we base our interpretation and conclusions.

      This problem is emphasised by the following sentence to be found in the discussion "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies: the sensitivity of survival to clutch size at the population level, this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (effect of the manipulation? or survival rate whatever the manipulation (then how could it measure a trade-off ?)? at the population level? at the individual level ?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      Thank you for identifying this sentence for which the writing was ambiguous, our apologies. We have now rewritten this and included additional explanation. L282-290: ‘The effect on parental annual survival of having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation, and quantitatively similar. Parents with naturally larger clutches are thus expected to live longer and this counterbalances the “cost of reproduction” when their brood size is experimentally manipulated. It is, therefore, possible that quality effects mask trade-offs. Furthermore, it could be possible that individuals that lay larger clutches have smaller costs of reproduction, i.e. would respond less in terms of annual survival to a brood size manipulation, but with our current dataset we cannot address this hypothesis (Figure 5).’

      We would also like to thank the reviewer for bringing to our attention the lack of clarity about the details of our methodology. We have added details to our methodology (see “Extracting effect sizes” section) to address this (see highlighted sections). For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season), and clutch size was the absolute value of offspring in the nest (i.e., for a bird that laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.
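
      The regression described above can be illustrated with a minimal sketch. It is not the code used in the analysis: it only fits logit(p) = b0 + b1 × (clutch size raised) to grouped survived/died counts by Newton-Raphson, and it omits the random effects and weighting of a full meta-analytic model. The function name `fit_logistic` and all counts are hypothetical.

```python
import math

def fit_logistic(clutch_raised, survived, died, iters=50, tol=1e-10):
    """Fit logit(p) = b0 + b1 * clutch_raised to grouped binomial counts.

    Plain Newton-Raphson on the binomial log-likelihood (illustrative only).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, d in zip(clutch_raised, survived, died):
            n = s + d
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = s - n * p           # observed minus expected survivors
            w = n * p * (1.0 - p)       # binomial variance weight
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: clutch size raised = clutch laid + manipulation,
# e.g. a bird that laid 5 eggs and had 1 removed is entered as 4.
laid = [5, 5, 5, 6, 6]
manipulation = [-1, 0, 1, 0, 1]
clutch_raised = [c + m for c, m in zip(laid, manipulation)]
survived = [28, 25, 22, 24, 20]   # made-up counts of parents that survived
died = [22, 25, 28, 26, 30]       # made-up counts of parents that died

b0, b1 = fit_logistic(clutch_raised, survived, died)
print("slope on the logit scale per egg raised:", b1)
```

      The slope `b1` plays the role of the effect size: the change in log-odds of annual survival per additional egg or chick raised, which is why the same quantity can be compared between manipulated and unmanipulated nests.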

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      a.c_(i,q)+b.s_(i,q)=E_q

      where c_(i,q) and s_(i,q) are, respectively, the clutch size and survival rate of individual i, which is of quality q, and E_q is the level of "energy" that an individual of quality q has available during the given time-step. Here a and b are constants that convert clutch size and survival rate into the energy cost of reproduction and the energy cost of survival; both are quite "high", so that an extra egg (c_(i,q) increased by 1) at the current time-step decreases s_(i,q) markedly (E_q is independent of the number of eggs produced). That is, we have strong individual costs of reproduction. Imagine now that the variance of c_(i,q) (when the population is not manipulated) among individuals of the same quality group is very small (and therefore the variance of s_(i,q) is also very small) and that the expectations of both are proportional to E_q. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. Therefore, the larger the clutch size c_(i,q), the higher E_q, and the higher the survival s_(i,q).

      In the manipulated populations however, because of the large a and b, an artificial increase in clutch size, for a given E_q, will lead to a lower survival s_(i,q). And the "effect size" at the population level may vary according to a,b and the variances mentioned above. In other words, the costs of reproduction may be strong, but be hidden by the data, when there is variance in quality; however there are actually strong costs of reproduction (so strong actually that they are deterministic and that the probability to survive is a direct function of the number of eggs produced)
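
      The scenario sketched in this comment can be made concrete with a small simulation. This is only an illustration under stated assumptions, not part of the manuscript's analysis: the budget a·c + b·s = E_q is taken literally, and the values of a and b, the range of E_q, the near-even split of the budget, and the ±1 egg manipulation are all hypothetical choices.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate(n=2000, a=0.05, b=1.0, seed=1):
    """Return (natural slope, manipulation slope) of survival on clutch size."""
    rng = random.Random(seed)
    nat_c, nat_s, delta_c, delta_s = [], [], [], []
    for _ in range(n):
        energy = rng.uniform(0.6, 1.2)          # E_q: varies with quality
        share = 0.5 + rng.gauss(0.0, 0.01)      # little variance within a quality level
        c = share * energy / a                  # natural clutch size
        s = (energy - a * c) / b                # survival implied by the budget
        nat_c.append(c)
        nat_s.append(s)
        d = rng.choice([-1, 1])                 # experimental removal or addition of one egg
        s_manip = (energy - a * (c + d)) / b    # same budget, different clutch
        delta_c.append(d)
        delta_s.append(s_manip - s)
    return ols_slope(nat_c, nat_s), ols_slope(delta_c, delta_s)

natural_slope, manipulation_slope = simulate()
print("unmanipulated slope:", natural_slope)
print("manipulated slope:  ", manipulation_slope)
```

      Under these assumptions the among-individual slope is positive because both clutch size and survival scale with E_q, while the within-individual response to manipulation is negative and equal to -a/b per egg.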

      We would like to thank the reviewer for these comments. We have added detail to our methodology section so our models and rationale are more clear. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: (1) overall quality effects connecting reproduction and parental survival are present, (2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is correct, however, that we do suggest and repeat suggestions by others that quality can also mask the trade-off in some individuals or circumstances (L74-76, L95-98 & L286-289), but we do not quantify this, as it is dependent on the unknown relationship between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there are some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation (now included L287-290). Such information is, however, not available for all studies and, although we explored the possibility of analysing this, currently this is not possible with adequate confidence and there is the possible complexity of non-linear effects. We have added this rationale in our revision (L259-265).

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative (and other individual components of the trade-off) cost of reproduction and towards a populational negative relationship between survival and reproduction, we have to consider the intra-population slow fast continuum (some types of individuals survive more and reproduce less (are slower) than other (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It seems therefore important to me, to control for generation time and in general to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript is not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There is a longstanding unexplained difference in temperate (seasonal) and tropical reproductive strategies. Most of our data come from seasonal breeders, however. Although there is some variation in second brooding and such, these species mostly only produce one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness – longevity and reproduction – to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show that quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we are contributing to the field by questioning this central trade-off. We have incorporated some of the reviewer's suggestions in the revision (L309-315). We have added Figure 5 to show, in an easily accessible format, where we are able to reach solid conclusions and the evidence on which these are based.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we have added to our discussion of how our results play into the importance of accounting for among-individual heterogeneity (L252-256).

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations.', The American naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for by weak selection on clutch size. The results of their analyses support this explanation - they found little evidence of a reproduction-survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying the concepts of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results, of this necessarily quite simple, meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings that we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper consideration and we have added detail accordingly to our revised discussion.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there is not one. First, and importantly, we do find a trade-off but show this is only incurred when individuals produce a clutch beyond their optimal level. Second, we also state on lines 322-326 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. We benefit from our unique analysis allowing for a quantitative fitness estimate from the effect size on annual survival (as this is expressed on a per-egg basis). This allowed us to ask whether this quantitative effect size can alone explain why reproduction is constrained, and we evaluate this using simulations. From these simulations we find that this effect size is too small to explain the constraint, so something else must be going on, and we do spend a considerable amount of text discussing the possible explanations (L202-215). Note that the possibly most parsimonious conclusion here is that costs of reproduction are not there, or simply small, so we also give that explanation some thought (L221-224 and L315-331).
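
      The question posed here, whether a per-egg effect on annual survival is large enough to offset the gain from extra eggs, can be sketched as follows. This is an illustrative simplification, not the simulation code deposited on Dryad: it assumes constant annual survival, one brood per year (so the expected number of breeding seasons is 1/(1 − s)), and a clutch size change that is maintained for life. The baseline clutch size, baseline survival and per-egg slope `beta` are hypothetical values.

```python
import math

def annual_survival(clutch, base_clutch, base_survival, beta):
    """Annual survival when the per-egg effect acts on the logit scale."""
    logit0 = math.log(base_survival / (1.0 - base_survival))
    return 1.0 / (1.0 + math.exp(-(logit0 + beta * (clutch - base_clutch))))

def lifetime_output(clutch, base_clutch=5, base_survival=0.5, beta=-0.05):
    """Expected lifetime egg production under constant annual survival."""
    s = annual_survival(clutch, base_clutch, base_survival, beta)
    expected_seasons = 1.0 / (1.0 - s)   # mean of a geometric number of seasons
    return clutch * expected_seasons

# Compare lifetime output across a range of clutch sizes (hypothetical values).
for clutch in range(3, 9):
    print(clutch, round(lifetime_output(clutch), 2))
```

      Varying `beta` in such a sketch shows how steep the survival cost per egg would need to be before a larger clutch stops paying off, which is the kind of comparison the simulations are used for.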

      We are disappointed by the suggestion that we have dismissed complicating factors that could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. We have added further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory, including the addition of a decision tree (Figure 5).

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L316-320). We would also like to highlight that, in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L317-318 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. So, without a priori knowledge on this, we kept our model simple to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude that it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort probably does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L317-320). The point of our study is not whether trade-offs exist or not, it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species. We believe the addition of Figure 5 to our revised manuscript also makes this more evident.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which lay precocious chicks which are relatively independent and able to feed themselves straight after hatching - so again the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size but again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds to be an underestimate. Again, we would like to draw the reviewer’s attention to the fact we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies, however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “Why don’t all birds produce a clutch size at the population optimum?”, and is central to the decades-long field of life-history theory. Why is variation maintained? As the reviewer outlines, there is extensive variability, with some birds laying half of what other birds lay.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Title: while the costs of reproduction are possibly important in shaping optimal clutch size, it is not clear what you can say about it given that you do not consider clutch / brood size effects on fitness prospects of the offspring.

      We have expanded on our discussion of how some costs may be absorbed by the offspring themselves. However, a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. We have focussed on the relationship between reproductive effort and survival because it is given the most weight in the field in terms of driving intra-specific variation in clutch size. We have altered our title to show we focus on the survival costs specifically: “The optimal clutch size revisited: separating individual quality from the parental survival costs of reproduction”.

      (2) L.11-12: I agree that this is true for birds, but this is phrased more generally here. Are you sure that that is justified?

      The trade-off between survival and reproductive effort has largely been tested experimentally through brood manipulations in birds as this provides a good system in which to test the costs and benefits of increasing parental effort. The work in this area has provided theory beyond just passerine birds, which are the most commonly manipulated group, to across-taxa theories. We are unaware of any study/studies that provide evidence that the reproduction/survival trade-off is generalisable across multiple species in any taxon. As such, we do believe this sentence is justified. One example is the lack of a consistent negative genetic correlation in populations of fruitflies, which has also been hailed as a lack-of-cost paradigm. Furthermore, some mutants that live longer do so without a cost on reproduction.

      (3) L.13-14: Not sure what you mean with this sentence - too much info lacking.

      We have added some detail to this sentence.

      (4) L.14: it is slightly awkward to say 'parental investment and survival' because it is the survival effect that is usually referred to as the 'investment'. Perhaps what you want to say is 'parental effort and survival'?

      We have replaced “parental investment” with “reproductive effort”

      (5) L.15: you can omit 'caused'. Compared to control treatment or to reduced broods? Why not mention effects or lack thereof of brood reduction? And it would be good to also mention here whether effects were similar in the sexes.

      Please see our methodology where we state that we use clutch size as a continuous variable (we do not compare to control or reduced but include the absolute value of offspring in a logistic regression). The effects of a brood reduction are drawn from the same regression and so are opposite. Though we appreciate the detail here is lacking to fully comprehend our study, we would like to highlight this is the abstract and details are provided in the main text.

      (6) L. 15: I am not sure why you write 'however', as the finding that experimental and natural variation have opposite effects is in complete agreement with what is generally reported in the literature and will therefore surprise no one that is aware of the literature.

      We use “however” to highlight the change in direction of the effect size from the results in the previous sentence. We also believe that ours is the first study that provides a quantitative estimate of this effect and that previous work is largely theoretical. The reviewer states that this is what is generally reported but it is not reported in all cases, as some relationships between reproductive effort and survival are negative (for the quality measurement, in correlational space, see Figure 1).

      (7) L.16: saying 'opposite to the effect of phenotypic quality' seems difficult to justify, as clutch size cannot be equated with phenotypic quality. Perhaps simply say 'natural variation in clutch size'? If that is what you are referring to.

      Please note we are referring to effect sizes here, that is, the survival effect of a change in clutch size. By phenotypic quality we are referring to the fact that we find higher parental survival when natural clutch sizes are higher. It is not the case that we refer to quality only as having a higher clutch size. This is explicitly stated in the sentence you refer to. We have changed “effect” to “effect size” to highlight this further.

      (8) L.18: why do you refer to 'parental care' here? Brood size is not equivalent to parental care.

      Brood size manipulations are used to manipulate parental care. The effect on parental survival is expected to be incurred because of the increase in parental care. We have changed “parental care” to “reproductive effort” to reduce the number of terms we use in our manuscript.

      (9) L.18-19: suggest to tone down this claim, as this is no more than a meta-analytic confirmation of a view that is (in my view) generally accepted in the field. That does not mean it is not useful, just that it does not constitute any new insight.

      We are unaware of any other study which provides generalisable across-species evidence for opposite effects of quality and costs of reproduction. The work in this area is also largely theoretical and is yet to be supported experimentally, especially in a quantitative fashion. It is surprising to us that the reviewer considers there to be general acceptance in a field, rather than being influenced by rigorous testing of hypotheses, made possible by meta-analysis, the current gold standard in our field.

      (10) L.21: what does 'parental effort' mean here? You seem to use brood size, parental care, parental effort, and parental investment interchangeably but these are different concepts. Daan et al (1990, Behaviour), which you already cite, provide a useful graph separating these concepts. Please adjust this throughout the manuscript, i.e. replace 'reproductive effort' with wording that reflect the actual variable you use.

      We have not used the phrase “parental effort” in this sentence. We agree these are different concepts but in this context are intertwined. For example, brood size is used to manipulate parental care as a result of increased parental effort. We do agree the manuscript would benefit from keeping terminology consistent throughout the manuscript and have adjusted this throughout.

      (11) L.23: perhaps add 'in birds' somewhere in this sentence? Some reference to the assumptions underlying this inference would also be useful. Two major assumptions being that birds adjusted their effort to the manipulation as they would have done had they opted for a larger brood size themselves, and that the costs of laying and incubating extra eggs can be ignored. And then there is the effect that laying extra eggs will usually delay the hatch date, which in many species reduces reproductive success.

      Though our study does exclusively use birds, birds have been used to test the survival/reproduction trade-off because they present a convenient system in which to experimentally test this. The conclusions from these studies have a broader application than in birds alone. We believe that although these details are important, they are not appropriate in the abstract of our paper.

      (12) L.26: how is this an explanation? It just repeats the finding.

      We intend to refer to all interpretations from all results presented in our manuscript. We have made this more clear by adjusting our writing.

      (13) L.27: I do not see this point. And 'reproductive output' is yet another concept, that can be linked to the other concepts in the abstract in different ways, making it rather opaque.

      We have changed “reproductive output” to “reproductive effort”.

      (14) L.33: here you are jumping from 'resources' to 'energetically' - it is not clear that energy is the only or main limiting resource, so why narrow this down to energy?

      We do not say energy is the only or main limiting resource. We simply highlight that reproduction is energetically demanding and so, intuitively, a trade-off with a highly energetically demanding process would be the focal place to observe a trade off. We have, though, replaced “energetically” with “resource”.

      (15) L.35-36: this is new to me - I am not aware of any such claims, and effects on the residual reproductive value could also arise through effects on future reproduction. The authors you cite did not work on birds, or (in their own study systems) presented results that as far as I remember warrant such a general statement.

      The trade-off between reproduction and survival is seminal to the disposable soma theory, proposed by Kirkwood. Though Kirkwood’s work was largely not focussed on birds, it had fundamental implications for the field of evolutionary ecology because of the generalisable nature of his proposed framework. In particular, it has had wide-reaching influence on how the biology of aging is interpreted. The readership of the journal here is broad, and our results have implications for that field too. The work of Kirkwood (many of the papers on this topic have over 2000 citations each) has been perhaps overly influential in many areas, so a link to how that work should be interpreted is highly relevant. If the reviewer is interested in this topic the following papers by one of the co-authors and others could be of interest, some of which we could not cite in the main manuscript due to space considerations:

      https://www.science.org/doi/pdf/10.1126/sciadv.aay3047

      https://agingcelljournal.org/Archive/Volume3/stochasticity_explains_non_genetic_inheritance_of_lifespan/

      https://pubmed.ncbi.nlm.nih.gov/21558242/

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.13444

      https://www.nature.com/articles/362305a0

      https://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(12)00147-4

      https://www.cell.com/cell/pdf/S0092-8674(15)01488-9.pdf

      https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-018-0562-z

      (16) L.42: this could be preceded with mentioning the limitations of observational data.

      We have added detail as to why brood manipulations are a good test for trade-offs and so this is now inherently implied.

      (17) L.42-43: why?

      We have added detail to this sentence.

      (18) L.45: do any of the references cited here really support this statement? I am certain that several do not - in these this statement is an assumption rather than something that is demonstrated. It may be useful to look at Kate Lessell's review on this that appeared in Etologia, I think in the 1990's. Mind however that 'reproductive effort' is operationally poorly defined for reproducing birds - provisioning rate is not necessarily a good measure of effort in so far as there are fitness costs.

      We have updated the references to support the sentence.

      (19) L.47: Given that you make this statement with respect to brood size manipulations in birds, it seems to me that the paper by Santos & Nakagawa is the only paper you should cite here. Given that you go on to analyze the same data it deserves to be discussed in more detail, for example to clarify what you aim to add to their analysis. What warrants repeating their analysis?

      Please first note that our dataset includes Santos & Nakagawa and additional studies, so it is not accurate to say we analyse the same data. Furthermore, we believe our study has implications beyond birds alone and so believe it is appropriate to cite the papers that do support our statement. We have added details to the methods to explicitly state what data is gathered from Santos & Nakagawa (it is only used to find the appropriate literature and data was re-extracted and re-analysed in a more appropriate way) and, separately, how we gathered the observational studies (see L352-381).

      (20) L.48: There are more possible explanations to this, which deserve to be discussed. For example, brood size manipulations may not have been that effective in manipulating reproductive effort - for example, effects on energy expenditure tend to be not terribly convincing. Secondly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Thirdly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      Please see our response to this comment in the public reviews.

      Out of interest and because the reviewer mentioned “energy expenditure” specifically: There are studies that show convincing effects of brood size manipulation on parental energy expenditure. We do agree that there are also studies that show ceilings in expenditure. We therefore disagree that they “tend to be not terribly convincing”. Just a few examples:

      https://academic.oup.com/beheco/article/10/5/598/222025 (Figure 2)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.12321 (Figure 1)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2656.2000.00395.x (but ceiling at enlarged brood).

      (21) L.48, "or, alternatively, that individuals may differ in quality": how do you see that happening when brood size is manipulated, and hence 'quality' of different experimental categories can be assumed to be approximately equal? This point does apply to observational studies, so I assume that that is what you had in mind, but that distinction should be clear (also on line 54).

      We have made it more clear that we determine if there are quality effects separate to the costs of reproduction found using brood manipulation studies.

      (22) L.50: Drent & Daan, in their seminal paper on "The prudent parent" (1980, Ardea) were among the earliest to make this point and deserve to be cited here.

      We have added this citation.

      (23) L.51, "relative importance": relative to what? Please be more specific.

      We have adjusted this sentence.

      (24) L.54: Vedder & Bouwhuis (2018, Oikos) go some way towards this point and should be explicitly mentioned with reference to the role of 'quality' effects on the association between reproductive output and survival.

      We have added this reference.

      (25) L.55: can you be more specific on what you want to do exactly? What you write here could be interpreted differently.

      We have added an explicit aim after this sentence to be more clear.

      (26) L.57: Here also a more specific wording would be useful. What does it mean exactly when you say you will distinguish between 'quality' and 'costs'?

      We have added detail to this sentence.

      (27) L.62: it should be clearer from the introduction that this is already well known, which will indirectly emphasize what you are adding to what we know already.

      We would argue this is not well known and has only been theorised but not shown empirically, as we do here.

      (28) L.62: you equate clutch size with 'quality' here - that needs to be spelled out.

      We refer to quality as the positive effect size of survival for a given clutch size, not clutch size alone. We appreciate this is not clear in this sentence and have reworded.

      (29) L.64: this looks like a serious misunderstanding to me, but in any case, these inferences should perhaps be left to the discussion (this also applies to later parts of this paragraph), when you have hopefully convinced readers of the claims you make on lines 62-63.

      We are unsure of what the reviewer is referring to as a misunderstanding. We have chosen this format for the introduction to highlight our results. If this is a problem for the editors we will change as required.

      (30) L.66: quantitative comparison of what?

      Comparison of species. We have changed the wording of this sentence.

      (31) L.67-69: this should be in the methods.

      We have used a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (32) L.74-88: suggest to (re)move this entire paragraph, presenting inferences in such an uncritical manner before presenting the evidence is inappropriate in my view. I have therefore refrained from commenting on this paragraph.

      We have chosen a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (33) L.271, "must detail variation in the number of raised young": it is not sufficiently clear what this means - what does 'detail' mean in this context? And what does 'number of raised young' mean? The number hatched or raised to fledging?

      We have now made this clear.

      (34) L271, "must detail variation in the number of raised young": looking at table S4, it seems that on the basis of this criterion also brood size manipulation studies where details on the number of young manipulated were missing are excluded. I see little justification for this - surely these manipulations can for example be coded as for example having the average manipulation size in the meta-analysis data set, thereby contributing to tests of manipulation effects, but not to variation within the manipulation groups?

      We have done in part what the reviewer describes. We are specifically interested in the manipulation size, so we required this to compare effect sizes across species and categories, a key advance of our study and outlined in many places in our manuscript. Note, however, that we only need comparative differences, and have used clutch size metrics more generally to obtain a mean clutch size for a species, as well as SD where required. Please also note that our supplement details exactly why studies were excluded from our analysis, as is the preferred practice in a meta-analysis.

      (35) L.271, "referred to as clutch size": the point of this simplification is not clear to me why it is clearly confusing - why not refer to 'brood size' instead?

      Brood size and clutch size can be used interchangeably here because, in the observational studies, the individuals vary in the number of eggs produced, whereas for brood manipulations this obviously happens after hatching and brood is perhaps a more appropriate term, but we wanted to simplify the terminology used. However, we use clutch size throughout as the aim of our study is to determine why individuals differ in the number of offspring they produce, and so clutch size is the most appropriate term for that.

      (36) L.280: according to the specified inclusion criteria (lines 271/272) these studies should already be in the data set, so what does this mean exactly?

      Selection criteria refers to whether a given study should be kept for analysis or not. It does not refer to how studies were found. Please see lines 361-378 for details on how we found studies (additional details are also in the Supplementary Methods).

      (37) L.281: the use of 'quality' here is misleading - natural variation in clutch or brood size will have multiple causes, variation in phenotypic quality of the individuals and their environment (territories) is only one of the causes. Why not simply refer to what you are actually investigating: natural and experimental variation in brood size.

      We disagree: our study aims to separate quality effects from the costs of reproduction and we use observational studies to test for quality differences, though we make no inference about the mechanisms. We do not imply that the environment causes differences in quality, but that to directly compare observational and experimental groups, they should contain similar species. So, to be clear again, quality refers to the positive covariation of clutch size with survival. We feel that we explain this clearly in our study’s rationale and have also improved our writing in several sections on this to avoid any confusion (see responses to earlier comments by the three reviewers).

      (38) L.283, "in most cases": please be exact and say in xx out xx cases.

      We have added the number of studies for each category here.

      (39) L.283-285: presumably readers can see this directly in a table with the extracted data?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We believe the data are too large to include as a table in the main text and are not essential in understanding the paper. We do, however, believe all readers should have access to this information if they wish, and so it is publicly available.

      (40) L.293: there does not seem to be a table that lists the included studies and effect sizes. It is not uncommon to find major errors in such tables when one is familiar with the literature, and absence of this information impedes a complete assessment of the manuscript.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data are too large to include as a table in the main text and are not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      (41) L.293: from how many species?

      We have added this detail.

      (42) L.296, "longevity": this is a tricky concept, not usually reported in the studies you used, so please describe in detail what data you used.

      We have removed longevity as we did not use this data in our current version of the manuscript.

      (43) L. 298: again: where can I see this information?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process but we believe this was not passed onto the reviewers.

      (44) L. 304, "we used raw data": I assume that for the majority of papers the raw data were not available, so please explain how you dealt with this. Or perhaps this applies to a selection of the studies only? Perhaps the experimental studies?

      By raw data, we mean the absolute value of offspring in the nest. We have changed the wording of this sentence and added detail about whether the absolute value of offspring was not present for brood manipulation studies (L393-397).

      (45) L.304: When I remember correctly, Santos and Nakagawa examined effects of reducing and enlarging brood size separately, which is of importance because trade-off curves are unlikely to be linear and whether they are or not has major effects on the optimization process. But perhaps you tackled this in another way? I will read on.....

      You are correct that Santos & Nakagawa compared brood increases and reductions to control separately. Note that this only partially accounts for non-linearity and it does not take into account the severity of the change in brood size. By using a logistic regression of absolute clutch size, as we have done, we are able to directly compare brood manipulations with experimental studies. Please see Supplementary Methods lines 11-12, where we have added additional detail as to why our approach is beneficial in this analysis.

      (46) L.319: what are you referring to exactly with "for each clutch size transformation"?

      We refer to the raw, standardised and proportional clutch size transformations. We have added detail here to be more clear.
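
      For readers unfamiliar with these three scalings, the sketch below shows one common way they can be computed. The exact definitions used in the manuscript are not restated in this response, so the formulas here are assumptions made for illustration: "standardised" is taken to be the deviation from the species mean in units of the species SD, and "proportional" the clutch size divided by the species mean. The function name and the example numbers are hypothetical.

```python
def transform_clutch_sizes(clutch_sizes, species_mean, species_sd):
    """Return clutch sizes on three scales (definitions are assumed, see lead-in).

    raw          : absolute number of offspring in the nest
    standardised : (x - species mean) / species SD   [assumed definition]
    proportional : x / species mean                  [assumed definition]
    """
    raw = list(clutch_sizes)
    standardised = [(x - species_mean) / species_sd for x in raw]
    proportional = [x / species_mean for x in raw]
    return {"raw": raw, "standardised": standardised, "proportional": proportional}


# Hypothetical species with a mean clutch of 4 eggs and an SD of 1 egg.
example = transform_clutch_sizes([3, 4, 5], species_mean=4, species_sd=1)
```

      Under these assumed definitions, adding one chick is a larger change on the standardised and proportional scales for a species with a small, invariable clutch than for a species with a large, variable one.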

      (47) L.319: is there a cost of survival? Perhaps you mean 'survival cost'? This would be appropriate for the experimental data, but not for the observational data, where the survival variation may be causally unrelated to the brood size variation, even if there is a correlation.

      We have changed “cost of survival” to “effect of parental survival”. We only intend to imply causality for the experimental studies. For observational studies we do not suggest that increasing clutch size is causal for increasing survival, only correlative (and hence we use the phrase “quality”).

      (48) L.320: please replace "parental effort" with something like 'experimental change in brood size'.

      We have changed “parental effort” to “reproductive effort”.

      (49) L.321: due to failure of one or more eggs to hatch, and mortality very early in life, before brood sizes are manipulated, it is not likely that say an enlargement of brood size by 1 chick can be equated to the mean clutch size +1 egg / chick. For example, in the Wytham great tit study, as re-analysed by Richard Pettifor, a 'brood size manipulation' of unmanipulated birds is approximately -1, being the number of eggs / chicks lost between laying and the time of brood size manipulation. Would this affect your comparisons?

      We agree these are important factors in determining what a clutch/brood size actually is for a given individual/pair, as this can vary from egg laying to fledging. However, we do not believe that accounting for this (if it was possible to do so) would significantly affect our conclusions, as observational studies are comparable in the fact that these birds would also likely see early life mortality of their offspring. It is also possibly the case that parents already factor in this loss, and so a brood manipulation still changes the parental care effort an individual has to incur.

      (50) L.332: instead of "adjusted" perhaps say 'mean centred'?

      We have implemented this suggestion.

      (51) L.345: this statement surprised me, but is difficult to verify because I could not locate a list of the included studies. However, to my best knowledge, most studies reporting brood size manipulation effects on parental survival had this as their main focus, in contrast to your statement.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process but we believe this was not passed onto the reviewers by the journal, although supplied by us on several occasions. We regret that the reviewer was impeded by this unfortunate communication failure, but we did our best to make the data available to the reviewers during the initial review process.

      (52) L.361-362: this seems a realistic approach from an evolutionary perspective, but we know from the jackdaw study by Boonekamp that the survival effect of brood size manipulation in a single year is very different from the survival effect of manipulating as in your model, i.e. every year of an individual's life the same manipulation. For very short-lived species this possibly does not make much difference, but for somewhat longer-lived species this could perhaps strongly affect your results. This should be discussed, and perhaps also explored in your simulations?

      Note that the Boonekamp study does not separate whether the survival effects are additive or multiplicative. As such, we do not know whether the survival effects for a single year manipulation are just small and hard to detect, or whether the survival effects are multiplicative. Our simulations assumed that the brood enlargement occurred every year throughout their lives. We have added some text to the discussion on the point you raise.

      (53) L.360: what is "lifetime reproductive fitness"? Is this different from just "fitness"?

      We have changed “lifetime reproductive fitness” to “lifetime reproductive output”.

      (54) L.363: when you are interested in optimal clutch size, why not also explore effects of reducing clutch size?

      As we find that a reduction in clutch size leads to a reduction in survival (for experimental studies), we already know that these individuals would have a reduced fitness return compared to reproducing at their normal level, and so we would not learn anything from adding this into our simulations. The interest in using clutch size enlargements is to find out why an individual does not produce more offspring than it does, and the answer is that it would not have a fitness benefit (unless its clutch size and survival rate combination is out of the bounds of that observable in the wild).
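
      The logic of weighing extra offspring against a survival cost can be made concrete with a deliberately simple toy calculation. This is not the simulation used in the manuscript; it is a minimal sketch that assumes a constant annual survival probability, the same clutch size in every breeding season, and an enlargement applied in every year of life. Under those assumptions the expected number of breeding seasons is 1 / (1 - survival), so expected lifetime output is clutch size / (1 - survival). All function names and numbers are hypothetical.

```python
def expected_lifetime_output(clutch_size, annual_survival):
    """Expected lifetime number of young under the toy assumptions above.

    Expected breeding seasons = 1 + s + s^2 + ... = 1 / (1 - s).
    """
    if not 0.0 <= annual_survival < 1.0:
        raise ValueError("annual_survival must be in [0, 1)")
    return clutch_size / (1.0 - annual_survival)


def break_even_survival(clutch_size, annual_survival, extra_young=1):
    """Annual survival at which an enlarged clutch yields the same lifetime output.

    Solves c / (1 - s) = (c + k) / (1 - s_new) for s_new.
    """
    return 1.0 - (1.0 - annual_survival) * (clutch_size + extra_young) / clutch_size


# Hypothetical species with 50% annual survival.
small_clutch = break_even_survival(clutch_size=4, annual_survival=0.5)
large_clutch = break_even_survival(clutch_size=10, annual_survival=0.5)
```

      In this toy setting, a parent laying 4 eggs gains from one extra young only if annual survival stays above 0.375, whereas a parent laying 10 eggs needs survival to stay above 0.45. The tolerable survival cost of an extra egg therefore shrinks as clutch size grows.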

      (55) Fig.1 - using 'parental effort' in the y-axis label is misleading, suggest to replace with e.g. "clutch or brood size". Using "clutch size" in the title is another issue, as the experimental studies typically changed the number of young rather than the number of eggs.

      We have updated the figure axes to say “clutch size” rather than “parental effort”. Please see response to comment 35 where we explain our use of the term “clutch size” throughout this manuscript.

      (56) L.93 - 108: I appreciate the analysis in Table 1, in particular the fact that you present different ways of expressing the manipulation. However, in addition, I would like to see the results of an analysis treating the manipulations as factor, i.e. without considering the scale of the manipulation. This serves two purposes. Firstly, I believe it is in the interest of the field that you include a detailed comparison with the results of Santos & Nakagawa's analysis of what I expect to be largely the same data (manipulation studies only - for this purpose I would also like to see a comparison of effect size between the sexes). Secondly, there are (at least) two levels of meta-analysis, namely quantifying an overall effect size, and testing variables that potentially explain variation in effect size. You are here sort of combining the two levels of analysis, but including the first level also would give much more insight in the data set.

      Our main intention here was to improve on how the same hypothesis was approached by Santos & Nakagawa. We did this by improving our analysis (on a by “egg” basis) and by adding additional studies (i.e. more data). In this process mistakes are corrected (as we re-extracted all data, and did not copy anything across from their dataset – which was used simply to ensure we found the same papers); more recent data were also added, including studies missed by Santos & Nakagawa. This means that the comparison with Santos & Nakagawa becomes somewhat irrelevant, apart from maybe technical reasons, i.e. pointing out mistakes or limitations in certain approaches. We would not be able to pinpoint these problems clearly without considering the whole dataset, yet Santos & Nakagawa only had a small subset of the data that were available to us. In short, meta-analysis is an iterative process and similar questions are inevitably analysed multiple times and updated. This follows basic meta-analytic concepts and Cochrane principles. Except where there is a huge flaw in a prior dataset or approach (like we sometimes found and highlighted in our own work, e.g. Simons, Koch, Verhulst 2013, Aging Cell), in itself a comparison of the kind the reviewer suggests distracts from the biology. With the dataset being made available others can make these comparisons, if required. On the sex difference, we provide a comparison of effect sizes separated between both sexes and mixed sex in Table S2 and Figure S1.

      (57) L.93 - 108: a thing that does not become clear from this section is whether experimentally reducing brood size affects parental survival similarly (in absolute terms) as enlarging brood size. Whether these effects are symmetric is biologically important, for example because of its effect on clutch size optimization. In the text you are specific about the effects of increasing brood size, but the effect you find could in theory be due entirely to brood size reduction.

      We have added detail to make it clear that a brood reduction is simply the opposite trend. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets there is not a range of chicks added for which a non-linear relationship could be estimated. The question also remains of what the shape of this non-linear relationship should be, which is hard to determine a priori.

      We have added some discussion on this to our manuscript (L278-282), in response to an earlier comment.

      (58) L.103-107: this is perhaps better deferred to the discussion, because other potential explanations should also be considered. For example, there have been studies suggesting that small birds were provisioning their brood full time already, and hence had no scope to increase provisioning effort when brood size was experimentally increased.

      We agree this is a discussion point but we believe it also provides an important context for why we ran our simulations, and so we believe this is best kept brief but in place. We agree the example you give is relevant but believe this argument is already contained in this section. See lines 121-123 “...suggesting that costs to survival were only observed when a species was pushed beyond its natural limits”.

      (59) L.103-107: this discussion sort of assumes that the results in Table 1 differ between the different ways that the clutch/brood size variation is expressed. Is there any statistical support for this assumption?

      We are unsure of what the reviewer means here exactly. Note that in each of the clutch size transformations, experimental and observational effect sizes are significantly opposite. For the proportional clutch size transformation, experimental and observational studies are both separately significantly different from 0.

      (60) L.104: at this point, I would like to have better insight into the data set. Specifically, a scatter plot showing the manipulation magnitude (raw) plotted against control brood size would be useful.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process but we believe this was not passed onto the reviewers by the journal.

      Thank you for this suggestion, which is also useful for illustrating how manipulations are relatively stronger for species with smaller clutches, in line with our interpretation of the result presented in Figure 2. We have added Figure S1 which shows the strength of manipulation compared to the species average.

      (61) L. 107: this seems a bold statement - surely you can test directly whether effect size becomes disproportionally stronger when manipulations are outside the natural range, for example by including this characterization as a factor in the models in Table 1.

      It is hard to define exactly what the natural range is here, so it is not easy to factorise objectively, which is why we chose not to do this. However, it is clear that for species with small clutches the manipulation itself is often outside the natural range. Thank you for your suggestion to include a figure for this as it is clear manipulations are stronger in species with smaller clutches. We attribute this to species being forced outside their natural range. We consider our wording makes it clear that this is our interpretation of our findings and we therefore do not think this is a bold statement, especially as it fits with how we interpret our later simulations.

      (62) Fig.3, legend: the term 'node support' does not mean much to me, please explain.

      Node support is a value given in phylogenetic trees to indicate the confidence in a branch. In this case, values are given as a percentage and so can translate to how many times out of 100 the estimate of the phylogeny gives the same branching. Our values are low, as we have relatively few species in our meta-analysis.
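
      As a rough illustration of what such a percentage means, the sketch below counts how often a given grouping of species appears across a set of replicate tree estimates. It is not the procedure used to build Figure 3; the representation of each tree as a set of clades, the function name, and the species labels are all hypothetical.

```python
def node_support(clade, replicate_trees):
    """Percentage of replicate trees in which `clade` appears as a group.

    Each replicate tree is represented (by assumption) as a set of clades,
    and each clade as a frozenset of species labels.
    """
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


# Four hypothetical replicate estimates for species A, B, C and D.
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"B", "C"}), frozenset({"A", "B", "C"})},
]

support_ab = node_support({"A", "B"}, replicates)
```

      Here the grouping of A with B is recovered in three of the four replicates, so its support is 75%. With few species and short branches, alternative groupings are recovered more often and support values fall.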

      (63) Fig.3: it would be informative when you indicate in this figure whether the species contributed to the experimental or the observational data set or both.

      We have added into Fig 3 whether the species was observational, experimental or both.

      (64) L.139: the p-value refers to the interaction between species clutch size and treatment (observational vs. experimental), but it appears that no evidence is presented for the correlation being significant in either observational or experimental studies.

      We agree that our reporting of the effect size could be misinterpreted and have added detail here. The statistic provided shows that the slopes are significantly different between observational and experimental studies, implying there are differences between the slopes of small and large clutch-laying species.

      (65) L.140: I am wondering to what extent these correlations, which are potentially interesting, are driven by the fact that species average clutch size was also used when expressing the manipulation effect. In other words, to what extent is the estimate on the Y-axis independent from the clutch size on the X-axis? Showing that the result is the same when using survival effect sizes per manipulation category would considerably improve confidence in this finding.

      We are unsure what the reviewer means by “per manipulation category”. Please also note that we have used a logistic regression to calculate our effect sizes of survival, given a unit increase in reproductive effort. So, for example, if a population contained birds that lay 2, 3 or 4 eggs, and we changed the number of eggs raised to 10, 11 or 12, respectively, then our effect size would be the same, provided that the number of birds which survived and died in each category did not change. In this way, our effect sizes are independent of the species’ average clutch size.
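
      The invariance described above can be checked with a small sketch. It assumes made-up survival counts and a hand-written Newton-Raphson fit of logit(survival) = a + b * clutch size; it is not the code used for the meta-analysis, and all names and numbers are hypothetical:

```python
import math


def fit_logistic(groups, iters=25):
    """Fit logit(p) = a + b*x by Newton-Raphson.

    groups: list of (clutch_size, n_survived, n_died). Returns (a, b).
    """
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, surv, died in groups:
            n = surv + died
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = n * p * (1.0 - p)      # binomial variance weight
            r = surv - n * p           # observed minus expected survivors
            g_a += r
            g_b += r * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical counts: (clutch size, survived, died)
small = [(2, 60, 40), (3, 50, 50), (4, 40, 60)]
# Same survival counts, clutch sizes shifted by +8 eggs
large = [(x + 8, s, d) for x, s, d in small]

_, slope_small = fit_logistic(small)
_, slope_large = fit_logistic(large)
print(slope_small, slope_large)
```

      Only the intercept changes when every clutch size is shifted by a constant; the slope, which is the effect size per additional egg, stays the same.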

      (66) L.145: when I remember correctly, Santos & Nakagawa considered brood size reduction and enlargement separately. Can this explain the contrasting result? Please discuss.

      You are correct, in that Santos & Nakagawa compared reductions and enlargements to controls separately. However, we found some mistakes in the data extracted by Santos & Nakagawa that we believe explain the differences in our results for sex-specific effect sizes. We do not feel that highlighting these mistakes in the main text is fair, useful or scientifically relevant, as our approach is to improve the test of the hypothesis.

      (67) L.158-159: looking at table S2 it seems to me you have a whole range of estimates. In any case, there is something to be said for taking the estimates for females because it is my impression (and experience) that clutch size variation in most species is a sex-linked trait, in that clutch size tends to be repeatable among females but not among males.

      We agree that, in many cases, the female is the one that ultimately decides on the number of chicks produced. We did also consider using female effect sizes only, however, we decided against this for the following reasons: (1) many of the species used in our meta-analysis exhibit biparental care, as is the case for many seabirds, and so using females only would bias our results towards species with lower male investment; in our case this would bias the results towards passerine species. (2) it has also been shown that, as females in some species are operating at their maximum of parental care investment, it is the males who are able to adjust their workload to care for extra offspring. (3) we are ultimately looking at how many offspring the breeding adults should produce, given the effort it costs to raise them, and so even if the female chooses a clutch size completely independently of the male, it is still the effort of both parents combined that determines whether the parents gain an overall fitness benefit from laying extra eggs. (4) some studies did not clearly specify male or female parental survival and we would not want to reduce our dataset further.

      (68) L.158-168: please explain how you incorporated brood size effects on the fitness prospects of offspring, given that it is a very robust finding of brood size manipulation studies that this affects offspring growth and survival.

      We would argue this is near-on impossible to incorporate into our simulations. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. It would be interesting, however, to explore this further using estimates from the literature, but this is beyond our current scope, and would in our initial intuition not be very accurate. It would be interesting to explore how big the effect on offspring should be to constrain effect size strongly. Such work would be more theoretical. The point of our simple fitness projections here is to aid interpretation of the quantitative effect size we estimated.

      (69) L.163: while I can understand that you select the estimate of -0.05 for computational reasons, it has enormous confidence intervals that also include zero. This seems problematic to me. However, in the simulations, you also examined the results of selecting -0.15, which is close to the lower end of the 95% C.I., which seems worth mentioning here already.

      Thank you for this suggestion. Yes, indeed, our range was chosen based on the CI, and we have now made this explicit in the manuscript.

      (70) L.210: defined in this way, in my world this is not what is generally taken to be a selection differential. Is what you show not simply scaled lifetime reproductive success?

      As far as we are aware, a selection differential is the relative change between a given group and the population mean, which is what we have done here. We appreciate this is a slightly unusual context in which to place this, but it is more logical to consider the individuals who produce more offspring as carrying a potential mutation for higher productivity. However, we believe that “selection differential” is the best terminology for the statistic we present. We also detail in our methodology how we calculate this. We have adjusted this sentence to be more explicit about what we mean by selection differential.

      (71) L.177-180: is this not so because these parameter values are closest to the data you based your estimates on, which yielded a low estimate and hence you see that here also?

      We are unsure of what exactly the reviewer means here. The effect sizes for our exemplar species were predicted from each combination of clutch size and survival rate. Note that we used a range of effect sizes, higher than that estimated in our meta-analysis, to explore a large parameter space and that these same conclusions still hold.

      (72) L.191-194: these statements are problematic, because based on the assumption that an increase in brood size does not impact the fitness prospects of the offspring, and we know this assumption to be false.

      Though we appreciate that some cost is often absorbed by the offspring themselves, we are unaware of any evidence that these costs are substantial and large enough to drive within-species variation in reproductive effort, though for some specific species this may be the case. However, in terms of explaining a generalisable, across-species trend, the fitness costs incurred by a reduction in offspring quality are unlikely to be significantly larger than the survival costs to reproduce. We also find it highly unlikely the cost to fitness incurred by a reduction in offspring quality is large enough to counter-balance the effect of parental quality that we find in our observational studies. We do also discuss other costs in our discussion.

      (73) L.205: here and in other places it would be useful to be more explicit on whether in your discussion you are referring to observational or experimental variation.

      We have added this detail to our manuscript. Do note that many of our conclusions are drawn from the combination of results of experimental and observational studies. We believe the addition of Figure 5 makes this clearer to the reader.

      (74) L.225: this may be true (at least, when we overlook the misuse of the word 'quality' here), but I would expect some nuance here to reflect that there is no surprise at all in this result as this pattern is generally recognized in the literature and has been the (empirical) basis for the often-repeated explanation of why experiments are required to demonstrate trade-offs. On a more quantitative level, it is worth mentioning the paper of Vedder & Bouwhuis (2017, Oikos) that essentially shows the same thing, i.e. a positive association between reproductive output and parental survival.

      We have added some discussion on this point, including adding the citation mentioned. However, we would like to highlight that our results demonstrate that brood manipulations are not necessarily a good test of trade-offs, as they fail to recognise that individuals differ in their underlying quality. Though we agree that this result should not necessarily be a surprising one, we have also not found it to be the case that differences in individual quality are accepted as the reason that intra-specific clutch size is maintained – in fact, we find that, when costs of reproduction are not identified, it is most commonly concluded that the costs must lie elsewhere – yet we cannot find conclusive evidence that the costs of reproduction (wherever they lie) are driving intra-specific variation in reproductive effort. Furthermore, some studies in our dataset have reported negative correlations between reproductive effort and survival (see observational studies, Figure 1).

      (75) L.225-226: perhaps present this definition when you first use the term.

      We have added more detail to where we first use and define this term to improve clarity (L57-58).

      (76) L.227-228, "currently unknown": this statement surprised me, given that there is a plethora of studies showing within-population variation in clutch size to depend on environmental conditions, in particular the rate at which food can be gathered.

      We mean to question that if an individual is “high quality”, why is it not selected for? We have rephrased, to improve clarity.

      (77) L.231: this seems no more than a special case of the environmental effect you mention above.

      We think this is a relevant special case, as it constitutes within-individual variation in reproduction that is mistaken for between-individual variation. This is a common problem in our field that we feel needs addressing. We only have between-individual variation here in our study on quality, and by highlighting this we show that there might not be any variation between individuals, but this could come about fully (doubtful) or partly (perhaps likely) due to terminal effects.

      (78) L235-236: but apparently depending on how experimental and natural variation was expressed? Please specify here.

      We are not sure what results the reviewer is referring to here, as we found the same effect (smaller clutch laying species are more severely affected by a change in clutch size) for both clutch size expressed as raw clutch size and standardised clutch size.

      (79) L.237: the concept of 'limits' is not very productive here, and it conflicts with the optimality approach you apply elsewhere. What you are saying here can also be interpreted as there being a non-linear relationship between brood size manipulation and parental survival, but you do not actually test for that. A way to do this would be to treat brood size reduction and enlargement separately. Trade-off curves are not generally expected to be linear, so this would also make more sense biologically than your current approach.

      We have replaced “limits” with “optima”. We believe our current approach of treating clutch size as a continuous variable, regardless of manipulation direction, is the best approach, as it allows us to directly compare with observational studies and between species that use different manipulations (now nicely illustrated by the reviewer’s suggested Figure S1). Also note that transforming clutch size to a proportion of the mean allows us to account for the severity in change in clutch size. We also do not believe that treating reductions and enlargements separately accounts for non-linearity, as either we are separating this into two linear relationships (one for enlargements and one for reductions) or we compare all enlargements/reductions to the control, as in Santos & Nakagawa 2012, which does not take into account the severity of the increase, which we would argue is worse for accounting for non-linearity. Furthermore, in the cases where the manipulation involved one offspring only, we also cannot account for non-linearity.

      (80) L.239: assuming birds are on average able to optimize their clutch size, one could argue that any manipulation, large or small, on average forces birds to raise a number of offspring that deviates from their natural optimum. At this point, it would be interesting to discuss in some detail studies with manipulation designs that included different levels of brood size reduction/enlargement.

      We agree with the reviewer that any manipulation is changing an individual’s clutch size away from its own individual optimum, which we have argued also means brood manipulations are not necessarily a good test of whether a trade-off occurs in the wild (naturally), as there could be interactions with quality – we have now edited to explicitly state this (L299-300).

      (81) L.242-244: when you choose to maintain this statement, please add something along the lines of "assuming there is no trade-off between number and quality of offspring".

      As explained above, though we agree that the offspring may incur some of the cost themselves, we are not aware of any evidence suggesting this trade-off is also large enough to drive intra-specific variation in clutch size across species. Furthermore, in the context here, the trade-off between number and quality of offspring would not change our conclusion – that the fitness benefit of raising more offspring is offset by the cost on survival. We have added detail on the costs incurred by offspring earlier in our discussion (L309-315). The addition of Figure 5 should help interpret these data.

      (82) L.253: instead of reference 30 the paper by Tinbergen et al in Behaviour (1990) seems more appropriate.

      We believe our current citation is relevant here but we have also added the Tinbergen et al (1990) citation.

      (83) L.253-254: such trade-offs may perfectly explain variation in reproductive effort within species if we were able to estimate cost-benefit relations for individuals. In fact, reference 29 goes some way to achieve this, by explaining seasonal variation in reproductive effort.

      We are unaware of any quantitative evidence that any combination of trade-offs explains intra-specific variation in reproductive effort, especially as a general across-species trend.

      (84) L.255: how does one demonstrate "between species life-history trade-offs"? The 'trade-off' between reproductive rate and survival we observe between species is not necessarily causal, and hence may not really be a trade-off but due to other factors - demonstrating causality requires some form of experimental manipulation.

      Between-species trade-offs are well established in the field, stemming from GC Williams’ seminal paper in 1966, and for example in r/K selection theory. It is possible to move from these correlations to testing for causation, and this is happening currently by introducing transgenes (genes from other species) that promote longevity into shorter-lived species (e.g., naked mole-rat genes into mice). As yet it is unclear what the effects on reproduction are.

      (85) L.256: it is quite a big claim that this is a novel suggestion. In fact, it is a general finding in evolutionary theory that fitness landscapes tend to be rather flat at equilibrium.

      It is important to note here that we simulate the effect size found, and hence this is the novel suggestion, that because the resulting fitness landscape is relatively flat there is no directional selection observed. We did not intend to suggest our interpretation of flat fitness landscapes is novel. We have changed the phrasing of this sentence to avoid misinterpretation.

      (86) L.259: why bring up physiological 'costs' here, given that you focus on fitness costs? Do you perhaps mean fitness costs instead of physiological costs? Furthermore, here and in the remainder of this paragraph it would be useful to be more specific on whether you are considering natural or experimental variation.

      The cost of survival is a physiological cost incurred by the reduction of self-maintenance as a result of lower resource allocation. This is one arm of fitness; we feel it would be confusing here to talk about costs to fitness, as we do not assess costs to future reproduction (which formed the large part of the critique offered by the reviewer). We would like to highlight that the aim of this manuscript was to separate costs of reproduction from the effects of quality, and this is why we have observational and experimental studies in one analysis, rather than separately. Our conclusion that we have found no evidence that the survival cost to reproduce drives within-species variation in clutch size comes both from the positive correlation found in the observational studies and our negligible fitness return estimates in our simulations. We therefore do not believe it is helpful to separate observational and experimental conclusions throughout our manuscript, as the point is that they are inherently linked. We hope that the addition of Figure 5 makes this clearer.

      (87) L.262: The finding that naturally more productive individuals tend to also survive better one could say is by definition explained by variation in 'quality', how else would you define quality?

      We agree, and hence we believe quality is a good term to describe individuals who perform highly in two different traits. Note that we also say the lack of evidence that trade-offs drive intra-specific variation in clutch size also potentially suggests an alternative theory, including intra-specific variation driven by differences in individual quality.

      Supplementary information

      (88) Table S1: please provide details on how the treatment was coded - this information is needed to derive the estimates of the clutch size effect for the treatments separately.

      We have added this detail.

      (89) Table S2: please report the number of effect sizes included in each of these models.

      We have added this detail.

      (90) Table S4: references are not given. Mentioning species here would be useful. For example, Ashcroft (1979) studied puffins, which lay a single egg, making me wonder what is meant when mentioning "No clutch or brood size given" as the reason for exclusion. A few more words to explain why specific studies were excluded would be useful. For example, what does "Clutch size groups too large" mean? It surprises me that studies are excluded because "No standard deviation reported for survival" - as the exact distribution is known when sample size and proportion of survivors is known.

      We have updated this table for more clarity.
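
      The reviewer's remark about standard deviations can be illustrated with a short sketch. It assumes survival is recorded as a binary outcome, so that the spread follows from the binomial distribution once the proportion of survivors and the sample size are known. The function names and the example numbers are hypothetical:

```python
import math


def binary_sd(p):
    """Standard deviation of a single 0/1 survival outcome with P(survive) = p."""
    return math.sqrt(p * (1.0 - p))


def proportion_se(p, n):
    """Standard error of the observed survival proportion from n individuals."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical study: 20 of 50 marked adults survived to the next season
p_hat = 20 / 50
print(binary_sd(p_hat), proportion_se(p_hat, 50))
```

      Whether a given excluded study reported enough to recover these two quantities would still need to be checked case by case.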

      (91) Fig.S1: please plot different panels with the same scale (separately for observational and experimental studies). You could add the individual data points to these plots - or at least indicate the sample size for the different categories (female, male, mixed).

      We have scaled all panels to have the same y axis and added sample sizes to the figure legend.

      (92) Fig.S3: please provide separate plots for experimental and observational studies, as it seems entirely plausible that the risk of publication bias is larger for observational studies - in particular those that did not also include a brood size manipulation. At the same time, one can wonder what a potential publication bias among observational studies would represent, given that apparently you did not attempt to collect all studies that reported the relevant information.

      We have coloured the points for experimental and observational studies. Note that a study is an independent effect size and, therefore, does not indicate whether multiple data (i.e., both experimental and observational studies) came from the same paper. As we detail in the paper and above in our reviewer responses, we searched for observational studies from species used in the experimental studies to allow direct comparison between observational and experimental datasets.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend improving the theoretical component of the analysis by providing a solid theoretical framework before, from it, drawing conclusions.

      This, at a minimum, requires a statistical model and most importantly a mechanistic model describing the assumed relationships.

      We thank the reviewer for highlighting that our aims and methodology are unclear in places. We have added detail to our model and simulation descriptions and have improved the description of our rationale. We also feel the failure of the journal to provide code and data to the reviewers has not helped their appreciation of our methodology and use of data.

      Because the field uses the same wording for different concepts and different wording for the same concept, a glossary is also necessary.

      We thank the reviewer for raising this issue. During the revision of this manuscript, we have simplified our terminology or given a definition, and we believe this is sufficient for readers to understand our terminology.

      Reviewer #3 (Recommendations For The Authors):

      • The files containing information of data extracted from each study were not available so it has not been possible to check how any of the points raised above apply to the species included in the study. The ms should include this file on the Supp. Info as is standard good practice for a comparative analysis.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data is too large to include as a table in the main text and is not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      • For clarity, refer to "the effect size of clutch size on survival" rather than simply "effect size". Figures 1 and 2 require cross-referencing with the main text to understand the y-axis.

      We have added detail to the figure legend to increase the interpretability of the figures.

      • Silhouettes in Figure 3 (or photos) would help readers without ornithological expertise to understand the taxonomic range of the species included in the analyses.

      We have added silhouettes into Figure 3.

      • Throughout the discussion: superscripts shouldn't be treated as words in a sentence so please add authors' names where appropriate.

      We have added author names and dates where required.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a new protocol for quantifying tRNA aminoacylation levels by deep sequencing. The improved methods for discrimination of aminoacyl-tRNAs from non-acylated tRNAs, more efficient splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction, and the use of an error-tolerating mapping algorithm to map the tRNA sequencing reads provide new tools for anyone interested in tRNA concentrations and functional states in different cells and organisms. The results and conclusions are solid with well-designed tests to optimize the protocol under different conditions.

      Public Reviews:

      We thank both reviewers for suggestions, feedback and improvements. We address these pointwise below.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Davidsen and Sullivan describes an improved tRNA-seq protocol to determine aminoacyl-tRNA levels. The improvements include: (i) optimizing the Whitfeld or oxidation reaction to select aminoacyl-tRNAs from oxidation-sensitive non-acylated tRNAs; (ii) using a splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction; (iii) using an error-tolerating mapping algorithm to map the tRNA sequencing reads that contain mismatches at modified nucleotides.

      Strengths:

      The two steps, the oxidation and the splint-assisted ligation, are yield-diminishing steps, thus the protocol of Davidsen and Sullivan is an important improvement of the current protocols to enhance the quantification of aminoacyl-tRNAs.

      Weaknesses:

      The oxidation and the selection of aminoacyl-tRNA is the first step in all protocols. Thereafter they differ on whether blunt ligation, hairpin (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim tRNA-seq, LOTTE tRNA-seq), or splint ligation is used and finally what detection method is applied (i-tRAP, tRNA microarrays). What is the correlation to those alternative approaches (e.g. i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264) etc.)? What is the correlation with other approaches with which this improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

      We appreciate the fair assessment and fully agree that our work would benefit from a large comparison between all known tRNA-seq methods. We did directly compare many elements of our method to those of other methods (e.g. ligation efficiency and barcode bias); however, as noted by the reviewer we did not perform a direct end-to-end comparison with all other methods. An ideal comparison would require running several different sample conditions and technical replicates through our protocol and repeating the process across a half dozen or so other methods as they are described. Unfortunately, this approach is unlikely to be feasible since each method uses different oligos, reagents and kits, and all would have to be acquired at substantial cost. Some methods also rely on other detection methods such as microarrays, qPCR, or Illumina sequencing, which would also make this goal all the more onerous. There are also different pipelines for data processing that, in some instances, make the final results hard to compare. In short, this would be a monumental and expensive task to do comprehensively. We also worry that, even if these experiments were conducted such that some variables were concluded to be superior, they could still be challengeable based on perceived or actual protocol differences from the prior art. In summary, we think that an overall comparison with each method would be ideal, but practical concerns limit us to optimizing and comparing the variables that we found to be most prone to introducing bias in the results.

      For methods that measure tRNA expression levels (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim-tRNA-seq, LOTTE tRNA-seq etc.) there are some fundamental problems regarding absolute quantification using NGS that preclude simple comparisons. These problems are well known in the field of microRNA (Fuchs et al. (2012) [PMID: 25942392]) and arise due to several factors introduced during processing steps such as purification, ligation, reverse transcription and amplification. Without a “true” quantitation benchmark it would be difficult to make quantitative claims from each. Therefore, in our own work we benchmark tRNA expression levels for sample-to-sample reproducibility (i.e. precision) as further explained in the response to reviewer #2.

      For comparison to methods that measure tRNA charge we did have an opportunity to compare our results with those of another study. To this end, we have added a figure comparing the baseline charge found using our method and the one used in Evans et al. (Revised manuscript Figure 2—figure supplement 9). This comparison finds broadly similar results for tRNA charge, including similar trends for a subset of Glu, Ser and Pro codons that are notable for their lowered basal tRNA charge.

      Reviewer #2 (Public Review):

      Davidsen and Sullivan present an improved method for quantifying tRNA aminoacylation levels by deep sequencing. By combining recent advances in tRNA sequencing with lysine-based chemistry that is more gentle on RNA, splint oligo-based adapter ligation, and full alignment of tRNA reads, they generate an interesting new protocol. The lab protocol is complemented by a software tool that is openly available on GitHub. Many of the points highlighted in this protocol are not new but have been used in recent protocols such as Behrens et al. (2021) or McGlincy and Ingolia (2017). Nevertheless, a strength of this study is that the authors carefully test different conditions to optimize their protocol using a set of well-designed controls.

      The conclusions of the manuscript appear to be well supported by the data presented. However, there are a few points that need to be clarified.

      We appreciate the acknowledgement of the strength of our aminoacylation controls and agree that our method relies on many aspects of the mentioned prior work.

      (1) One point that remains unsatisfactory is a better benchmarking against the state of the art. It is currently impossible to estimate how much the results of this new protocol differ from alternative methods and in particular from Behrens et al. (2021). Here it will be helpful to perform experiments with samples similar to those used in the mim-tRNAseq study and not with H1299 cells.

      We fully agree that more rigorous benchmarking would be desirable. As also noted in the response to reviewer #1, a full end-to-end comparison of methods would be ideal but would be onerous and expensive in practice, so we focused on optimizing the steps we found to be most prone to introducing bias in the data.

      We agree that Behrens et al. (2021) has substantial methodological overlap with our work and was instrumental in our efforts; however, the focus of their manuscript was largely on quantification of tRNA abundance and modifications, rather than the tRNA charge. In fact, tRNA charge was only determined for yeast in that study. Quantifying the abundance of short RNAs using NGS is very difficult (Fuchs et al. (2012) [PMID: 25942392]) and will likely require the use of a mixture of tRNAs as spike-in references for normalization (Bissels et al. (2009) [PMID: 19861428]). In the case of Behrens et al. (2021), they did not use a spike-in tRNA reference, but instead correlated gene copy number with their measured tRNA abundance. They also compare to Northern blotting for two tRNA transcripts, showing a directionally similar result; however, no quantitative claims can be made about measurement accuracy. Until a good method of normalizing tRNA quantification is found, we believe that sample-to-sample reproducibility (i.e. precision) is the most useful objective to optimize because this will allow detection of differential expression. Towards that end, we quantified the precision of our method (Figure 4 and its two supplementary figures) with associated statistics, which can be used to estimate the number of samples required to detect significance during differential expression analysis. For tRNA charge, quantification is easier, which is why we present statistics on both accuracy and precision. In this case we can better compare results across methods, and so we have added a comparison of our results to the charge quantification from Evans et al. (2017) (Figure 2—figure supplement 9).

      (2) While the protocol aims to implement an improved method for quantification of tRNA aminoacylation, it can also be used for tRNA quantification and analysis of tRNA modifications. It will increase the impact of this study if the authors benchmark the outcomes of their protocol with other tRNA sequencing protocols with samples similar to these papers, which will be important for certain research teams that are unlikely to implement two different tRNA sequencing methods. Are there any possible adaptations that would allow the analysis of tRNA fragments?

      The first part of this comment, regarding comparison of methods, is addressed in the response to the prior reviewer comment and in the response to reviewer 1. In the specific case of tRNA modifications, the issue is similar to abundance quantification in that a “true” reference of modified tRNA is likely necessary for proper quantification, alongside testing of each method simultaneously.

      Regarding tRNA fragments, our method is not suitable for this use case. This is because our adapter ligation step depends on an intact tRNA structure with either a CCA or CC overhang on the 3’-end and thus we almost exclusively get reads with CCA/CC ends and no reads from fragments. This specificity is good for increasing charge quantification accuracy but not good for the method's versatility. For a more versatile method we recommend Watkins et al. (2022) [PMID: 35513407].
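      To make the role of the CCA/CC ends concrete, the sketch below shows one way a charge fraction could be computed from 3’-end counts. It assumes that reads ending in CCA derive from charged tRNA and reads ending in CC from uncharged tRNA, and that adapters have already been trimmed; the function names are hypothetical and this is not the pipeline in our repository.

```python
from collections import Counter


def classify_3prime_end(read):
    """Label an adapter-trimmed read by its 3' end: 'CCA', 'CC' or 'other'."""
    if read.endswith("CCA"):
        return "CCA"  # assumed to derive from charged tRNA
    if read.endswith("CC"):
        return "CC"   # assumed to derive from uncharged tRNA
    return "other"    # e.g. fragments; excluded from the calculation


def charge_fraction(reads):
    """Fraction of informative reads (CCA or CC ends) that end in CCA."""
    counts = Counter(classify_3prime_end(read) for read in reads)
    informative = counts["CCA"] + counts["CC"]
    if informative == 0:
        raise ValueError("no reads with CCA or CC 3' ends")
    return counts["CCA"] / informative


if __name__ == "__main__":
    # Toy reads for illustration only.
    toy_reads = ["GGTTCCA", "GGTTCC", "GGTTCCA", "GGTTCCA", "ACGT"]
    print(charge_fraction(toy_reads))
```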

      (3) Like Behrens et al. (2021), Davidsen and Sullivan use TGIRT-III RT for their analyses. The enzyme is not currently available in a form suitable for tRNA-seq. It would be very helpful to test different new RT enzymes that are commercially available. The example of Maxima RT - Figure 2 Supp 6 - shows significantly lower performance than the presented TGIRT-III RT data. In lines 296-298, the authors mention improvements to the protocol by using ornithine. Why are these improvements not included?

      We share similar concerns that the TGIRT-III enzyme is no longer commercially available. It became unavailable while we were preparing this manuscript, reflected by the fact that almost all our figures are made using this enzyme. Others have discovered this too and Lucas et al. (2023) [PMID: 37024678] tested several RT polymerases using TapeStation as a readout for readthrough. As they reported that Maxima has good performance, we decided to test it on a full run with replicates. The results are outlined in Figure 2—figure supplement 6 and for resubmission we have added a table to the appendix that compares the alignment statistics. Unfortunately, the readthrough of the Maxima polymerase on cytoplasmic tRNAs is not as high as for TGIRT-III; however, interestingly it seems to have better performance for mitochondrial tRNAs (Figure 2 – Figure Supplement 6). Regardless, in the initial paper submission we failed to evaluate whether this readthrough difference affected charge measurements. We have now fixed this by adding Figure 2—figure supplement 7, which shows that there are no differences in charge measurements between TGIRT-III and Maxima. Not surprisingly, there are substantial differences between polymerases when looking at relative tRNA abundance (which affirms the discussion above related to the difficulty of tRNA abundance quantification); however, the high sample-to-sample reproducibility remains intact with either polymerase. An exhaustive search for better polymerases is warranted but falls outside the scope of our work.

      Regarding the improvement we suggested, using ornithine as a cleavage catalyst instead of lysine, we only learned about this possibility later and thus only want to make readers aware that other options exist. We have revised the paragraph to make this clearer.

      (4) A technical concern: The samples are purified multiple times using a specific RNA purification kit. Did the authors test different methods to purify the RNA and does this influence the result of the method?

      In the past, we have relied exclusively on alcohol precipitation but during the development of this protocol we found it easier and more reproducible to use column-based purification when possible. However, as we have not made a direct comparison this remains anecdotal evidence. Nonetheless, to minimize any possible bias of column-based purification you will notice that we use columns with binding capacity 5x higher than the highest amount of RNA/DNA added to the column.

      (5) The study would benefit from an explicit step-by-step protocol, including the choice of adapters that are shown to work best in the protocol.

      This is a great point! We have included tables with all the oligos used (Supplementary file 1), a detailed step-by-step protocol with pictures of anticipated gel results (Supplementary file 2) and an overview of the RNA/DNA manipulations to make it clear where adapter sequences are located (Supplementary file 3). For the data processing we provide a comprehensive example in the GitHub repository. All this was included in our first submission of this manuscript (as well as on bioRxiv), but we suspect this was not readily accessible to the reviewers. We will make sure that these documents are available through eLife and have emphasized their existence in the main text of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To stratify this improvement, a comparison to the most common methods should be made. For example, how do the results with the improved protocol compare with i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264), or with the other tRNA-seq approaches with which the improved protocol shares steps (DM-tRNA-seq, mim-tRNA-seq)?

      Once again, we thank the reviewer for the good recommendations. The points about direct comparisons were discussed above.

      Reviewer #2 (Recommendations For The Authors):

      These are all great points; we address them below.

      Minor points:

      - Please use chemical conventions, e.g. for mcm5s2U and NaIO4 with superscript or subscript.

      Fixed.

      - Figure 2F: Glu GAA is only 82% charged; can this be due to mcm5s2U (Figure 3 supp 2) leading to a misalignment? What happens to Ser-NNN? Why is mitochondrial tRNA so much less charged?

      Regarding the Glu-GAA charge at baseline, we do not think this is an artifact of the mcm5s2U modification as it would then also be expected for Gln-CAA and Lys-AAA. The same occurs in the charge data in Evans et al. (2017) and they use a very different alignment strategy. Lastly, the charge titration and half-life experiments show no evidence of inaccuracy/bias for Glu-GAA.

      But the question remains – why is the charge of Glu-GAA so low? At this point our best guess is speculative. It may have something to do with the strong enrichment of Glu-GAA codons in the A site found by ribosome profiling on mouse embryonic stem cells (Ingolia et al. (2011) [PMID: 22056041]).

      - Spell out "clvg" or "dphs" in the figure legend of Figure 2 and others. Similar for other abbreviations in figures. They are not always explained in the legends.

      Fixed.

      - Figure 3 supp 2: Please use U instead of T in the anticodons. The labels are a bit confusing. Please clearly align to the tick (also for Figure 3C).

      Fixed.

      - Line 220-223. Which RT enzyme was used for Figure 3 supp 2? Does it make a difference?

      TGIRT-III was used. Only Figure 2—figure supplement 6 and Figure 2—figure supplement 7 (added for resubmission) show data with the Maxima polymerase. To address the second part of the question we have added a comparison between TGIRT-III and Maxima for mcm5s2U modification detection (Figure 3—figure supplement 3). Interestingly, there is a polymerase-specific signature for mcm5s2U modifications; however, more work would be required to determine which polymerase is best suited for detection of this and other modifications.

      - Figure 4 supp 1 and Figure 4 supp 2 change order.

      Fixed.

      Typos:

      - Figure 1 and Figure 1-figure supplement 1: In the periodate the "-" is in a small box (at least in my PDF viewer). Can this box be removed?

      - Line 175: duplicated verb.

      - Line 348: "moved".

      Thanks for catching these. They have now been fixed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Measurement of secreted amylase could be seen as direct evidence of sweating, however, how to determine the causal relationship between climbing behavior and sweating? Friction force may also be reduced when there is too much fingertip moisture.

      As the reviewer notes, measurement of secreted amylase can provide direct evidence of sweating, and we performed an iodine and starch reaction. Upon observing the involvement of TRPV4 in mouse foot pad perspiration, we then considered which type of behavioral analysis would be suitable to evaluate this perspiration. We agree with the reviewer’s point that friction force in the climbing test may be reduced by excessive sweating. However, we did not observe severe sweating in the absence of acetylcholine treatment. Accordingly, we interpreted that the increase in the climbing test failure rate for TRPV4KO mice could reflect the reduced friction force associated with the lack of TRPV4 activity.

      (2) For the human skin immunostaining, did the author use the same TRPV4 antibody as used in the mouse staining? Did they validate the specificity of the antibody for the human TRPV4 channel? 

      We used different antibodies for human and mouse samples. Since commercially available anti-TRPV4 antibodies do not work well with mouse samples, we generated our own anti-TRPV4 antibody and validated its specificity.

      (3) In lines 116-117, the authors tried to determine "the functional interaction of TRPV4 and ANO1 is involved in temperature-dependent sweating", however, they only used the TRPV4 ko mice and did not show any evidence supporting the relationship between TRPV4 and ANO1. 

      As the reviewer pointed out, based on the data presented in the original submission we cannot conclude that an interaction between TRPV4 and ANO1 is involved in perspiration. However, we think that the data for TRPV4KO mice presented in Figure 3 of the original version does indicate that TRPV4 is involved in perspiration. The finding that menthol and its related compounds, which inhibit the function of both TRPV4 and ANO1 (see our publication in Scientific Reports 7: 43132, 2017), blocked perspiration in both wild-type and TRPV4KO mice (original Figure 3C, D) indicates involvement of either TRPV4 or ANO1 in perspiration. In the revised version, we present results for additional iodine and starch reaction experiments using Ani9, a potent and specific ANO1 inhibitor. Ani9 drastically inhibited perspiration from mouse foot pads both at 25 °C and 35 °C. Based on these collective results, we concluded that both TRPV4 and ANO1, likely acting as a complex, are involved in perspiration. We present the new data with Ani9 in the revised Figure 3E, F.

      (4) Figure 3-4 is quite confusing. At 25˚C, no sweating difference was observed between TRPV4 and wt mice (Fig 3A-3D), suggesting both Ach-induced sweating and basal sweating are TRPV4-independent at 25˚C, however, the climbing test was done at 26-27 ˚C and the data showed a climbing deficit in TRPV4 ko mice. How to interpret the data is unclear. 

      Thank you for raising this point. In the iodine and starch reaction experiment, we observed no significant reduction in perspiration in the absence of acetylcholine at 25 °C, which is the same condition as in the climbing test, whereas we detected less perspiration for TRPV4KO mice. In a trial using additional mice, we detected significantly less perspiration under control conditions without acetylcholine at 25 °C, which is consistent with the results of the climbing test. We have added this new data to the revised Figure 3A, B.

      (5) Were there any gender differences associated with sweating in mice? In Figure 3, the mouse number for behavior tests should be at least 5. 

      The TRPV4KO mice reproduced poorly and we were unable to obtain sufficient numbers of male and female mice to determine whether there were gender differences in sweating. However, according to the reviewer’s suggestion, and as mentioned above, we increased the number of experiments to obtain the results shown in the revised Figure 3. We did not a observe a significant difference in sweating with the larger sample size, which supports our conclusions.

      (6) 8- to 21-week-old mice were used in the immunostaining, the time span is too long. 

      Given the difficulty in obtaining sufficient numbers of TRPV4KO mice, we used a somewhat wider age distribution to obtain samples for immunostaining. However, we did not observe age-dependent differences in immunostaining. We reference this point in the revised manuscript.

      (7) The authors used homozygous TRPV4 ko mice for all experiments. What are control mice? Are they littermates of the TRPV4 ko mice? 

      We did not use littermates for our in vivo experiments because the TRPV4KO mice reproduced poorly and the litter sizes were small. However, we did backcross the KO mice to the commercially available wild-type mice more than ten times. As such, we expect that the wild-type and TRPV4KO mice will have similar genetic backgrounds. In addition, we have published multiple studies that have successfully used this method, which we think supports the reliability of our results for experiments involving mice.

      Reviewer #2 (Public Review):

      (1) The coexpression data needs additional controls. In the TRPV4 KO mice, there appears to be staining with the TRPV4 Ab in TRPV4 KO mice below the epidermis. This pattern appears similar to that of the location of the secretory coils of the sweat glands (Fig 1A). Is the co-staining the authors note later in Figure 1 also seen in TRPV4 KOs? This control should be shown, since the KO staining is not convincing that the Ab doesn't have off-target binding. 

      We thank the reviewer for raising these concerns about immunostaining. As the reviewer notes, in the low power image the signals appeared to be weak and punctate signals were present in the basal region of glandular cells. Although we did not identify immunohistochemical conditions that produced no signal, tissue sections from WT mice stained with anti-TRPV4 antibody showed conspicuous apical signals for the glandular cells facing the lumen. Meanwhile, TRPV4KO tissues showed no signals at the apical region of the glandular cells, where the TRPV4-ANO1 interaction is expected to occur. We confirmed by immunoblotting that there were no trace signals in the TRPV4KO tissues.

      (2) Are there any other markers besides CGRP for dark cells in mice to support the conclusion that mouse secretory cells have clear cell and dark cell properties? 

      We did not stain with other dark cell markers. Based on previous studies describing the differences between clear and dark cells in mouse eccrine glands, we think that dark and clear cells cannot be clearly discriminated, as we described in lines 93-96 of the Results. We identified secretory cells using CK8 and dark cells with CGRP, a marker of dark cells in human eccrine glands (Zancanaro et al. 1999 J Anat). Our result showed that CGRP immunostaining could not discriminate between clear and dark cells, which is consistent with a previous report showing that mouse secretory cells were assumed to be undifferentiated and primitive based on electron microscopic observation (Kurosumi et al. 1970 Arch Histol Jap).

      (3) The authors utilize menthol (as a cooling stimulus) in several experiments. In the discussion, they interpret the effect of menthol as potentially disrupting TRPV4-ANO1 interactions independent of TRPM8. Yet, the role of TRPM8, such as in TRPM8 KO mice, is not evaluated in this study.

      We performed the iodine and starch reaction experiments with TRPM8KO mice. In the TRPM8KO mice, the sweat spots did not differ from those seen for WT mice (p=0.63, t-test), and there was also a significant reduction in sweating with menthol treatment following acetylcholine stimulation that was similar to that seen for WT mice. These results would rule out the involvement of TRPM8 in a menthol-induced reduction in sweating. We have included this data in the revised Figure 3D.

      (4) Along those lines, the authors suggest that menthol inhibits eccrine function, which might lead to a cooling sensation. But isn't the cooling sensation of sweating from evaporative cooling? In which case, inhibiting eccrine function may actually impair cooling sensations.

      Menthol has a non-specific effect that activates TRPM8, TRPV3 and TRPA1, and inhibits TRPV1, TRPV4 and ANO1. Therefore, we did not carry out a climbing test with menthol in part because menthol-dependent TRPA1 activation decreased the propensity of the mice to climb. As the reviewer notes, TRPM8 activation following topical application of menthol may cause a cooling sensation elicited in sensory neurons beneath the skin. However, the comfortable cooling sensation could also be caused in part by decreased sweating. The relationship between a comfortable cooling sensation and less perspiration following menthol application may be difficult to determine, and we have mentioned this in the updated Discussion.

      (5) The climbing assay is interesting and compelling. The authors note performing this under certain temperature and humidity conditions. Presumably, there is an optimal level of skin moisture, where skin that is too dry has less traction, but skin that is too wet may also have less traction. It would bolster this section of the study to perform this assay under hot conditions (perhaps TRPV4 KO mice, with impaired perspiration, would outperform WT mice with too much sweating?), or with pharmacologic intervention using TRPV4 agonists or antagonists to more rigorously evaluate whether this model correlates to TRPV4 function in the setting of different levels of perspiration.

      We thank the reviewer for this suggestion. Upon detecting the involvement of TRPV4/ANO1 interaction in perspiration, we considered different behavioral analyses that can be performed to demonstrate whether the TRPV4/ANO1 interactions are involved in perspiration. As the reviewer suggested, there should be an optimal level of sweating. Therefore, we first set the room temperature at 26-27 ˚C and humidity at 35-50%. To our knowledge, this is the first demonstration of temperature-dependent sweating of mouse foot pads. In humans, palm sweating is often referred to as psychogenic sweating and is known to be regulated by sympathetic nerve activity. Here we tested whether foot pad sweating might be related to friction force, wherein sufficient amounts of sweating could increase the friction force and in turn increase the success rate for the climbing test. The test used a vinyl-covered slippery slope that was selected based on several trials to determine the optimal surface material and slope angles. As the reviewer suggests, the success rates could be affected by multiple factors, and hot temperatures likely induce more sweating that could increase the success rates in the climbing test. We will need to carry out additional experiments that are beyond the scope of this study to examine these temperature-dependent effects. Generally, sweating is regulated by sympathetic nerve activity that occurs in response to increased brain neuron excitation. However, here we raise for the first time the possibility that sweating might be regulated by local temperature sensation mediated through TRPV4 that may be effective for fine-tuning of perspiration activity. We have updated the Discussion to reference this possibility.

      (6) There are other studies (PMID 33085914, PMID 31216445) that have examined the role of TRPV4 in regulating perspiration. The presence of TRPV4 in eccrine glands is not a novel finding. Moreover, these studies noted that TRPV4 was not critical in regulating sweating in human subjects. These prior studies are in contradiction to the mouse data and the correlation to human anhidrotic skin in the present study. Neither of these studies is cited or discussed by the authors, but they should be. 

      We thank the reviewer for referencing these other studies concerning the possible involvement of TRPV4 in perspiration in humans. These studies focused on the vasodilating effects of TRPV4 and drew the conclusion that TRPV4 is not involved in sweating in humans, which is in contrast to our data for mice and humans. Multiple factors could explain the apparent difference between the two studies. For example, the parameters they examined differed from ours in that we assessed patients with AIGA, whereas the previous studies involved healthy volunteers. We have updated the Discussion to note the difference in the results of our and previous studies.   

      Reviewer #3 (Public Review):

      (1) Figure 2: The calcium imaging-based approach shows average traces from 6 cells per genotype, but it was unclear if all acinar cells tested with this technique demonstrated TRPV4-mediated calcium influx, or if only a subset was presented.

      “n = 6” does not indicate the number of cells, but rather 6 independent experiments that each had over 20 ROIs of sweat glands. We have clarified this point in the updated figure legend.

      (2) Figure 4: The climbing behavioral test shows a significant reduction in climbing success rate in TRPV4-deficient mice. The authors ascribe this to a lack of hind paw 'traction' due to deficiencies in hind paw perspiration, but important controls and evidence that could rule out other potential confounds were not provided or cited. 

      As noted in our response to Comment 5 made by Reviewer #2, we spent considerable time identifying optimal conditions that would delineate success rates in the climbing experiments. We are confident that TRPV4KO mice had significantly lower success rates than WT mice, but there are various factors that could affect the experimental outcomes. We reference these factors in the updated Discussion.

      (3) In general, the results support the authors' claims that TRPV4 activity is a necessary component of sweat gland secretion, which may have important implications for controlling perspiration as well as secretion from other glands where TRPV4 may be expressed. 

      As described above, the results we obtained in the climbing test can be affected by various factors. However, based on the consistency of the results obtained for the climbing test and the iodine and starch reaction assay, we think that our interpretation is correct. In terms of the involvement of TRPV4/ANO1 interactions in fluid secretion, we previously reported that the TRPV4/ANO1 complex is involved in cerebrospinal fluid secretion in the mouse choroid plexus (FASEB J. 2014) and in saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 2018). Together, these findings suggest that this mechanism is common to water efflux from exocrine glands.

      Reviewer #1 (Recommendations For The Authors):

      (1) An exocrine gland-specific trpv4 knockout mouse should be used, as TRPV4 is also expressed by muscles, global knockout TRPV4 may affect the TRPV4-dependent muscle strength and reduce the climbing ability in mice. 

      As the reviewer suggests, use of mice with TRPV4 knockout specific to exocrine glands would be preferable to mice having global TRPV4 knockout given that TRPV4 is expressed in multiple tissues. We agree with this suggestion, but we do not currently have such mice in hand. However, as mentioned above, we have reported the involvement of the TRPV4/ANO1 interaction in cerebrospinal fluid secretion from the choroid plexus in mice (FASEB J. 28: 2238-2248, 2014), as well as saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 32: 1841-1854, 2018), suggesting that the TRPV4/ANO1 interaction could be widely involved in exocrine gland functions that involve water movement. We have updated the Discussion to reference this point.

      (2) The authors showed Calcium imaging data that Menthol inhibits TRPV4-dependent calcium influx. However, it is well known that menthol induces the sensation of cooling by activating TRPM8. More evidence, including patch clamp recordings, should be done to verify the inhibition effects of menthol on TRPV4 and ANO1. Moreover, Fig 3E-3F could only suggest that menthol-induced cooling sensation may affect sweating but not the inhibition effect of menthol on TRPV4 and ANO1 channels. 

      We agree that more evidence including patch-clamp recordings can verify the inhibitory effects of menthol on TRPV4 and ANO1. We did not include such experiments here since we previously showed that menthol and related agents indeed inhibit TRPV4- and ANO1-mediated currents (Sci. Rep. 7: 43132, 2017). We now cite this paper in the revised version.

      (3) Excepting the climbing test, are there any other better models to assess the sweating-related behaviors? 

      When we detected the involvement of TRPV4/ANO1 interactions in perspiration, we considered different types of behavioral analyses that could be used to demonstrate TRPV4/ANO1-dependent perspiration. We think that the climbing experiment is the best test, particularly since foot pads are one of the few regions on mice that is not covered by fur and thus amenable to evaluation of perspiration using an iodine and starch test.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was confused by a section in the introduction on lines 59-60: How does Cl- efflux lead to the formation of a physical complex in cells with high intracellular Cl-? What is the physical complex? This seems like several disparate concepts combined together, which need to be clarified.

      We apologize for the incomplete descriptions of several of our previous works. We have amended the Introduction section in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) TRPV4 is expressed by multiple other cell types in the skin (keratinocytes, macrophages etc.) which may have an impact on peripheral sensory function. Is there evidence that TRPV4-deficient animals have relatively normal sensory acuity and/or proprioception? Such evidence would lend more credibility to the reported findings in the climbing test. 

      As the reviewer points out, TRPV4 is expressed by multiple other cell types in the skin. To date we have found that TRPV4KO mice show no differences in sensory functions compared to WT mice. Whether TRPV4 is involved in proprioception is unclear, based on both our own observations and those that appear in the literature, although TRPV4 is clearly activated by mechanical stimuli. We previously compared the mechanical sensitivity of TRPV4 and Piezo1 in bladder epithelial cells, and found that Piezo1 shows much higher sensitivity relative to TRPV4 (J. Biol. Chem. 289: 16565-16575, 2014), which is consistent with the involvement of Piezo1, rather than TRPV4, in proprioception. Although TRPV4 is reported to be expressed in sensory neurons, we did not detect TRPV4-mediated responses in isolated rat and mouse DRG neurons, suggesting that TRPV4-positive sensory neurons are relatively rare.

      (2) The methods section refers to loading entire sweat glands with Fura-2 dye for calcium imaging, but the figure legend refers to sweat gland acinar cells. Resolving this ambiguity would help readers to interpret the data. 

      We apologize for this error and have made an appropriate correction in the revised manuscript.

      (3) Alternatively, could acute intraplantar injection of a TRPV4 antagonist (e.g. GSK205) in wild-type mice phenocopy the TRPV4-knockout mouse deficits, or could normal climbing behavior be restored in the TRPV4 knockout by adding artificial perspiration to their hindpaws?

      We thank the reviewer for raising this interesting possibility and suggesting use of TRPV4 agonists or antagonists in the climbing tests. We agree that results of such an experiment would support the involvement of TRPV4 in sweating. We tried to do such experiments using injection of TRPV4 regulators into mouse hindpaws. However, the injections themselves appeared to impact climbing ability, perhaps in part due to painful sensations associated with the injection. Similarly, menthol injection appeared to reduce climbing activity, likely through pain sensations associated with TRPA1 activation. As such, we did not pursue these experiments.

    1. Author Response:

      We sincerely value the insightful and constructive feedback provided by the reviewers, which has been instrumental in identifying areas of our manuscript that required further clarification or amendment. Below are our responses detailing each comment.

      Reviewer 1:

      (1) One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO, while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. This raises concerns about their viral infection strategy since Ca activity was observed in their photometry recording. This means that DBH promoter could randomly label some non-NE neurons. Is DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      (1) In Figure 5, we found that the VLPO receives a noradrenergic projection from the LC, indicating that the recorded Ca2+ activity may come from the axon fibers of this projection. Similarly, Gunaydin et al. (2014) demonstrated that fiber photometry can be used to selectively record from neuronal projections.

      (2) Located in the inner membrane of noradrenergic and adrenergic neurons, DBH (dopamine-beta-hydroxylase) is an enzyme that catalyzes the conversion of dopamine to norepinephrine, and therefore plays an important role in noradrenergic neurotransmission. DBH is a marker of noradrenergic neurons. Zhou et al. (2020) confirmed that the probe specifically labeled noradrenergic neurons by immunolabeling for DBH. Recently, the DBH promoter has been used in several studies (e.g., Han et al., 2024; Lian et al., 2023). DBH-Cre mice are widely used to specifically label noradrenergic neurons (e.g., Li et al., 2023; Breton-Provencher et al., 2022; Liu et al., 2024). As the reviewer said, it is difficult to distinguish the role of NE or DA neurons when using the TH promoter in VLPO. Therefore, we used the DBH promoter, which gives more specific labeling. LC is the main noradrenergic nucleus of the central nervous system. In our study, we injected rAAV-DBH-GCaMP6m-WPRE (Figure 2 and 8) and rAAV-DBH-EGFP-5'miR-30a-shRNA(GABAA receptor)-3’-miR30a-WPRES (Figure 9) into the LC. The results showed that the DBH promoter could specifically label noradrenergic neurons in the LC, while non-specific labeling outside the LC was almost absent. As suggested, we will quantify the labeling efficiency of both DBH and TH-cre throughout the revised manuscript. The updated figures will provide a more rigorous analysis.

      (2) A similar issue arises with chemogenetic activation in Fig. 5 L-R, the authors used TH-cre and DIO-Gq virus to label VLPO neurons. Were they labelling VLPO NE or DA neurons for recording? The authors have to clarify this.

      As previously addressed in response to Comment #1, we acknowledge that it is difficult to distinguish the role of NE or DA neurons when using the TH promoter in VLPO. In the revised manuscript, we are considering conducting more restricted AAV injections into the VLPO to verify terminal expressions in the LC.

      (3) Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      As suggested, we will add multi-label ISH of the VLPO neurons downstream of LC NE projections to reveal the types of neurons they modulate.

      (4) In Figure 1A-D, for the measurement of the dose-dependent effect of Mida on LORR, was only one batch of testing performed? If more than one batch of mice was used, error bars should be presented in 1B. Also, the rationale for testing TH expression levels after Mida is not clear. Is a change in TH expression level specifically related to NE activation? If so, they should cite references.

      (1) As recommended, we will add error bars in the revised manuscript.

      (2) As the reviewer suggested, the use of TH as a marker of NE activation is controversial, so in the revised manuscript we will directly measure central norepinephrine content.

      (5) Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time=0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting for 20s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D, they stated that it is the signal at the BLA; were they also recorded from the BLA?

      (1) In this study, we used fiber photometry to record calcium signals, a fluorescence-based readout of changes in calcium. The fluorescence signal is usually divided into segments according to behavior, and each segment is aligned to a specific behavioral event, which defines time=0. The mean calcium fluorescence in a time window of 1.5 s or 1 s before the behavioral event is taken as the baseline fluorescence intensity (F0). The difference between the fluorescence intensity during the behavior (F) and the baseline is then divided by the difference between the baseline and the offset value, that is, ΔF/F0 = (F − F0) / (F0 − offset). The value ΔF/F0 thus represents the change in calcium fluorescence intensity when the event occurs. The results are commonly presented as two kinds of graphs, namely heat maps and peri-event plots (e.g., Cheng et al., 2022; Gan-Or et al., 2023; Wei et al., 2018). In Fig. 2, the time points of the awake state, midazolam injection, LORR and RORR were each selected as time=0, while in Fig. 4, RORR was selected as time=0. The 20 s traces were chosen based on the length of a complete Ca2+ signal. We will explain the Ca2+ recording experiment more specifically in the revised manuscript.
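
      The ΔF/F0 calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name, the default 1.5 s baseline window, the `offset` parameter, and the sample values are all assumptions chosen for the example.

```python
from statistics import mean


def delta_f_over_f0(trace, times, baseline_window=(-1.5, 0.0), offset=0.0):
    """Return the dF/F0 trace for one event-aligned recording segment.

    trace  -- fluorescence samples (arbitrary units)
    times  -- sample times in seconds, with the behavioral event at t = 0
    offset -- hypothetical detector offset subtracted from the baseline
    """
    start, end = baseline_window
    # Baseline F0: mean fluorescence in the pre-event window [start, end).
    baseline = [f for f, t in zip(trace, times) if start <= t < end]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    f0 = mean(baseline)
    denominator = f0 - offset
    if denominator == 0:
        raise ValueError("baseline equals offset; dF/F0 is undefined")
    # (F - F0) / (F0 - offset) for every sample in the segment.
    return [(f - f0) / denominator for f in trace]


# Illustrative values only, not data from the study.
times = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
trace = [100.0, 100.0, 100.0, 110.0, 150.0, 120.0]
dff = delta_f_over_f0(trace, times)
```

      Aligning every segment to its own event before applying this function is what makes traces from different phases (awake, injection, LORR, RORR) comparable on a common time axis.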

      (2) Regarding the BLA, we sincerely apologize for our carelessness: the signals we recorded were from the LC, not the BLA. We will carefully check and correct similar problems in the revised manuscript.

      Reviewer 2:

      In the figure legends, abbreviations used in the figures should be defined wherever possible. For example, "LORR" in Figure 1.

      As suggested, we will define the abbreviations used in the figures wherever possible in the revised manuscript.

      References

      Gunaydin LA, Grosenick L, Finkelstein JC, et al. Natural neural projection dynamics underlying social behavior. Cell. 2014;157(7):1535-1551. doi:10.1016/j.cell.2014.05.017

      Zhou N, Huo F, Yue Y, Yin C. Specific Fluorescent Probe Based on "Protect-Deprotect" To Visualize the Norepinephrine Signaling Pathway and Drug Intervention Tracers. J Am Chem Soc. 2020;142(41):17751-17755. doi:10.1021/jacs.0c08956

      Han S, Jiang B, Ren J, et al. Impaired Lactate Release in Dorsal CA1 Astrocytes Contributed to Nociceptive Sensitization and Comorbid Memory Deficits in Rodents. Anesthesiology. 2024;140(3):538-557. doi:10.1097/ALN.0000000000004756

      Lian X, Xu Q, Wang Y, et al. Noradrenergic pathway from the locus coeruleus to heart is implicated in modulating SUDEP. iScience. 2023;26(4):106284. Published 2023 Feb 27. doi:10.1016/j.isci.2023.106284

      Li C, Sun T, Zhang Y, et al. A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice. Neuron. 2023;111(17):2727-2741.e7. doi:10.1016/j.neuron.2023.05.023

      Breton-Provencher V, Drummond GT, Feng J, Li Y, Sur M. Spatiotemporal dynamics of noradrenaline during learned behaviour. Nature. 2022;606(7915):732-738. doi:10.1038/s41586-022-04782-2

      Liu Q, Luo X, Liang Z, et al. Coordination between circadian neural circuit and intracellular molecular clock ensures rhythmic activation of adult neural stem cells. Proc Natl Acad Sci U S A. 2024;121(8):e2318030121. doi:10.1073/pnas.2318030121

      Cheng J, Ma X, Li C, et al. Diet-induced inflammation in the anterior paraventricular thalamus induces compulsive sucrose-seeking. Nat Neurosci. 2022;25(8):1009-1013. doi:10.1038/s41593-022-01129-y

      Gan-Or B, London M. Cortical circuits modulate mouse social vocalizations. Sci Adv. 2023;9(39):eade6992. doi:10.1126/sciadv.ade6992

      Wei YC, Wang SR, Jiao ZL, et al. Medial preoptic area in mice is capable of mediating sexually dimorphic behaviors regardless of gender. Nat Commun. 2018;9(1):279. Published 2018 Jan 18. doi:10.1038/s41467-017-02648-0

    1. Author response:

      Reviewer 1:

      A limit of the paper is that the biological mechanisms by which intracellular mechanics is modulated (e.g. among cell types) remains unexplored and only briefly discussed. Yet this limit is greatly offset by the rigor of the approach.

      We thank the reviewer for the valuable feedback. The question regarding the biological mechanisms responsible for the different mechanical properties is, indeed, a highly important and interesting issue. In line with the reviewer, we consider this so important that it requires an extra, dedicated research focus, which is far beyond the scope of this article. By introducing the concept of the mechanical fingerprint, we provide in this work the framework to systematically investigate not only the biological mechanisms but also the functional relevance of the intracellular mechanical properties in future studies. In the revised manuscript, we’ll expand the discussion of this point.

      Reviewer 2:

      The most difficult part of the method is the part with actin polymerization inhibition with cytochalasin B. The data shows that viscoelastic parameters as well as active energy parameters are unaffected by cytochalasin B. It is reasonable to expect that elasticity will reduce and fluidity will increase upon application of such a drug. The stiffness-reducing effect was observed only when CB was used with nocodazole most likely because of phagocytosis of the bead, which is governed by microtubule. The use of other actin-depolymerizing drugs such as latrunculin A would be needed to test actin’s role in mechanical fingerprints. If actin’s role is only explained by accompanying microtubule inhibition, it is not a convenient system to directly test the mechano-adaptation process.

      We thank the reviewer for the time and the instructive feedback. Our finding that actin depolymerization has no effect on intracellular mechanics may appear surprising, as many rheological studies performed on the cell cortex highlight the importance of actin for the mechanical properties of the whole cell. However, as the actin network is reported to be very sparse away from the cortex, it is plausible that the mechanical properties are dominated by other structures in the cytoplasm. Indeed, our findings are consistent with other studies that see no strong effect of actin depolymerization on interphase intracellular mechanics (e.g. https://doi.org/10.1016/j.bpj.2023.04.011 or https://doi.org/10.1038/s41567-021-01368-z). Still, we fully agree with the reviewer that this is an important point. In a revised version, we aim to investigate the effect of other actin-depolymerizing drugs and will try to perform immunostaining to visualize and further illuminate the potential compensation mechanism between actin and MT.

      Depolymerization of MT with nocodazole did not reduce the solid-like property A. Adding discussion and comparison with other papers in the literature using nocodazole will be helpful in understanding why.

      Again, we agree with the reviewer and propose to further study this point by performing additional immunostainings and by elaborating on the discussion, also including the results of other studies.

      Reviewer 3:

      The importance of the mechanical fingerprint is diluted due to some missing controls needed for biological relevance.

      We thank the reviewer for their valuable time and feedback. This comment is in line with the point already raised by reviewer 1 and highlights the important question of how the intracellular mechanical properties are related to the actual cell function. We fully agree with the reviewers that at this point we can only report on differences and cannot claim a biological function that depends on the fingerprint. Although we think the alignment between function and the mechanical fingerprints supports the hypothesis that the biological system tunes its mechanical properties for a specific function, we do not want to make any claim in this direction at the current state of our research. To answer these intriguing questions, carefully designed control experiments are required, as pointed out by the reviewer. However, this direction is beyond the scope of this manuscript. Here, we establish the tools we’ll use in future studies to address these highly relevant questions. Therefore, we propose to discuss these important future directions in a revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits between genetic mutants with those obtained from a panel of small molecules to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in nighttime sleep loss, consistent with the well-supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen-1, an early-onset associated AD gene, which the authors find is principally responsible for the generation of AB40 and AB42 in zebrafish, also shows a slight increase in activity at night and slight decreases in nighttime sleep. Conversely, psen-2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are nighttime sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not based on their likelihood to modify sleep behaviour, which we could have tried by searching for overlaps with GWAS of sleep phenotypes, for example. Consequently, we find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
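
      The enrichment test proposed above could be sketched as a one-sided Fisher's exact test. This is only an illustration under stated assumptions: the random control set has not been tested, so the control counts below (5 of 50 genes) are hypothetical placeholders, and the function name is ours. Only the 4 of 4 late-onset risk genes comes from the study.

```python
from math import comb


def fisher_one_sided(hits_a, n_a, hits_b, n_b):
    """One-sided Fisher's exact test for enrichment of hits in group A.

    Returns P(X >= hits_a), where X is the number of phenotype-causing
    genes in group A if its n_a genes were drawn at random from the
    pooled n_a + n_b genes (hypergeometric null).
    """
    total = n_a + n_b
    total_hits = hits_a + hits_b
    upper = min(n_a, total_hits)
    favourable = sum(
        comb(total_hits, x) * comb(total - total_hits, n_a - x)
        for x in range(hits_a, upper + 1)
    )
    return favourable / comb(total, n_a)


# 4/4 late-onset risk genes gave a night-time sleep phenotype (from the study).
# 5/50 random brain-expressed genes is a HYPOTHETICAL control result.
p_value = fisher_one_sided(hits_a=4, n_a=4, hits_b=5, n_b=50)
```

      With only four risk genes, the power of such a test depends almost entirely on how rare the phenotype turns out to be in the random set, which is why testing additional risk genes in parallel would help.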

      For those mutants that cause nighttime sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–suppl. 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts:

      – Four indications are shared by 4/7 knockouts: “mydriasis” (dilated pupils, significant for psen1, apoea/apoeb, cd2ap, clu); “fragile X syndrome” (psen1, apoea/apoeb, cd2ap, sorl1); “insomnia” (psen2, apoea/apoeb, cd2ap, sorl1); “malignant essential hypertension” (appa/appb, psen1, apoea/apoeb, cd2ap).

      – Two targets are shared by 5/7 knockouts: “glycogen synthase kinase-3 alpha” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “neuronal acetylcholine receptor beta-2” (appa/appb, psen1, apoea/apoeb, cd2ap, clu).

      – Two KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “nitrogen metabolism” (appa/appb, psen1, psen2, cd2ap, clu).

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling based on the following annotations being significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. All three are also significant for psen2 knockouts, but for no other knockout. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      While perhaps not surprising, we find it reassuring that insomnia appears among the indications shared by the largest number of knockouts. apoea/apoeb, cd2ap, and sorl1 also happen to be the knockouts with the largest loss in night-time sleep.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on Alzheimer’s disease. For example, the first four drugs ever approved by the FDA to treat Alzheimer’s disease were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in Alzheimer’s disease.

      We see that literature also exists on the involvement of glycogen synthase kinase-3 in AD (Lauretti et al., 2020). We plan to explore these predictions further in a future study.

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not include stimuli other than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seems particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was the occasion to enable a wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR will seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      5-HT receptor type 4a is another candidate as it was shown to interact with sorting nexin 27, a subunit of retromer (Joubert et al., 2004). We see that antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested, and in our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that nighttime sleep is disrupted in mutants for multiple late-onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 was more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, not every compound that normalises the behavioural fingerprint will normalise the underlying mechanism, but the converse seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The logic is to discover compounds that normalise the behavioural phenotype first, and only subsequently test whether they also normalise the molecular mechanism, akin to testing first whether a drug resolves the symptoms before testing whether it actually modifies disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting in silico compounds that have the opposite effect on behaviour.
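
      The in silico shortlisting step can be illustrated with a small sketch that ranks compounds by how strongly their fingerprint opposes a knockout fingerprint. This is not ZOLTAR's code: cosine similarity is assumed here as the comparison measure, and the function names, compound labels, and three-parameter fingerprints are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two behavioural fingerprints."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("fingerprint has zero length")
    return dot / (norm_u * norm_v)


def rank_opposite(knockout, compounds):
    """Sort compounds from most opposite (-1) to most similar (+1)."""
    scores = {name: cosine(knockout, fp) for name, fp in compounds.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical fingerprints: each entry is a z-score for one parameter.
knockout_fp = [-1.0, -0.5, 0.2]
compound_fps = {
    "compound_A": [1.0, 0.5, -0.2],   # mirror image of the knockout
    "compound_B": [-2.0, -1.0, 0.4],  # same direction as the knockout
    "compound_C": [0.5, -1.0, 0.0],   # unrelated (orthogonal)
}
ranking = rank_opposite(knockout_fp, compound_fps)
```

      Compounds at the top of such a ranking are candidates for a rescue experiment; whether they also normalise the underlying mechanism still has to be tested separately.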

      You mention a “reduction in movement bouts” but note here that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day, then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, say we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it seems almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in-between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter into day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” explained above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, with tolerable secondary effects, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that first testing ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature. First, Hoffman et al. (2016) found that drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint are enriched with NMDA and GABA receptor antagonists. In experiments analogous to our citalopram treatment (Fig. 5c,d), cntnap2a/cntnap2b knockout larvae were found to be overly sensitive to the NMDA receptor antagonist MK-801 and the GABAA receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABAA receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae also had fewer GABAergic neurons in the forebrain. Second, Ashlin et al. (2018) found that the fingerprint of pitpnc1a knockout larvae clustered with anti-inflammatory compounds. Flumethasone, an anti-inflammatory corticosteroid, caused a lower increase in activity when added to knockout larvae compared to wild-type larvae. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset to make these predictions (Rihel et al., 2010), which shows that approaches like ZOLTAR can point to causal processes.

      Related to your next point, we may reduce the discussion on sorl1 and serotonin and add some of the present arguments instead, depending on the results from testing a second SSRI (see next point).

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose. However, the result indeed does not survive removing the top 5 (p = 0.87) or top 3 (p = 0.18) sorl1 larvae.

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relies on this result being robust. As you and reviewer #3 suggest, we plan on repeating this experiment with a different selective serotonin reuptake inhibitor (SSRI). If the other SSRI also shows a differential effect, this should strengthen the claim that ZOLTAR correctly predicted serotonin signalling as being affected by the loss of Sorl1, even if we did not discover the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we will adapt the wording to make it clear this is one possible alternative hypothesis which could be tested in the future, rather than the only alternative.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, it is an interesting question which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method does not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-cross of heterozygous knockout animals, as they increase the sensitivity of the assay without causing substantial variability. We plan to report on this analysis in more detail in a separate paper, as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”; we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), or mouse models overexpressing human APP show definitively that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick to dismiss APP knockout as potentially relevant to the understanding of Alzheimer’s disease. Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too much into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, out of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs in both AD and in our appa/appb knockout larvae, but the ideal approach would be to study zebrafish larvae with an in-frame deletion in the Aβ sequence within appa/appb.

      We will adapt the language to address your point. We would not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed.

      A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not tell something intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–suppl. 1a); but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from the literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here, more parameters would almost certainly translate into better predictions too, regardless of their intuitiveness.

      It is true however that we do not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We will improve this in our revised version.
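
      To illustrate why additional parameters help when comparing behavioural fingerprints, the comparison can be sketched as a similarity score between two vectors of per-parameter Z-scores. This is only a minimal sketch: cosine similarity is assumed here as the similarity measure, and the function name and example values are hypothetical rather than taken from FramebyFrame or ZOLTAR.

```python
import math


def cosine_similarity(fingerprint_a, fingerprint_b):
    """Cosine similarity between two behavioural fingerprints.

    Each fingerprint is a list of Z-scores, one per behavioural parameter,
    in the same parameter order. Returns a value between -1 and 1.
    """
    if len(fingerprint_a) != len(fingerprint_b):
        raise ValueError("fingerprints must have the same number of parameters")
    dot = sum(x * y for x, y in zip(fingerprint_a, fingerprint_b))
    norm_a = math.sqrt(sum(x * x for x in fingerprint_a))
    norm_b = math.sqrt(sum(y * y for y in fingerprint_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a fingerprint with all-zero Z-scores has no direction")
    return dot / (norm_a * norm_b)


# Hypothetical three-parameter fingerprints (Z-scores versus controls).
knockout = [1.0, -2.0, 0.5]
drug_like = [2.0, -4.0, 1.0]      # same direction, larger magnitude
drug_opposite = [-1.0, 2.0, -0.5]  # mirror image, as expected of a rescue

same = cosine_similarity(knockout, drug_like)
opposite = cosine_similarity(knockout, drug_opposite)
```

      In this toy example the interpretability of each parameter plays no role: only the direction of the whole vector matters, which is the sense in which less intuitive parameters can still be informative.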

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated the possibility of importing a psen2 knockout line from abroad, but the process of shipping live animals is becoming more and more cost- and time-prohibitive. However, we observed the same pigmentation phenotype for psen2 knockouts as reported by Jiang et al., 2018, which is at least a partial confirmation of phenocopying a loss-of-function stable mutant.

      b. psen2KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018, Fig. 3f and Fig. 3–suppl. 2). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Evidently, most larvae must have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing it is a reliable marker of Psen2 function even when it is mutated. This mechanism is not a concern here as the alternative start codons are still upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exon 2 and 3, but we targeted exon 3, exon 4, and exon 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer 2, we are planning to assemble a paper that expressly examines F0s vs. stable mutants to allay some of these concerns. We would also suggest that our current manuscript, which combines CRISPR-F0 rapid screening with in silico pharmacological predictions, ultimately represents a first step in characterizing the functions of genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is great clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments. Even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e., from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in the literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g., pooling multiple clutches from heterozygous in-crosses or pooling wild-type clutches before injecting them). The first experimental design causes false positive findings, as the clutch-to-clutch variability we and others (Joo et al., 2021) observe gets interpreted as a behavioural phenotype. The second experimental design should not cause false positives but will decrease the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging as it requires obtaining large clutches from unique pairs of parents. However, this approach is better as it clearly separates the different sources of variability (clutch-to-clutch or animal-to-animal). As for every experiment, yes, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work heavily focuses on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four clutches, as sorl1 was also tested in the citalopram experiment (Fig. 5), and psen2 was also tested in the small molecule rescue experiment (Fig. 6 and Fig. 6–suppl. 1). In the citalopram experiment, one H2O-treated sorl1 knockout clutch (n = 10) replicates the baseline recordings in Fig. 4–suppl. 5 fairly well; the other does not, but it had an especially low sample size (n = 6).
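
      The within-clutch comparison argued for above can be sketched as follows: each larva is expressed as a Z-score relative to the control siblings of its own clutch, so that differences in clutch baseline are not mistaken for a phenotype. This is a minimal sketch under stated assumptions, not the FramebyFrame implementation; the function name, the "scr" control label, and the toy values are hypothetical.

```python
from statistics import mean, stdev


def within_clutch_z(records, control="scr"):
    """Z-score every larva against the controls of its own clutch.

    records: list of (clutch, genotype, value) tuples.
    Returns a list of (clutch, genotype, z) tuples in the same order.
    Each clutch needs at least two control larvae for a standard deviation.
    """
    controls = {}
    for clutch, genotype, value in records:
        if genotype == control:
            controls.setdefault(clutch, []).append(value)

    z_scores = []
    for clutch, genotype, value in records:
        mu = mean(controls[clutch])
        sd = stdev(controls[clutch])
        z_scores.append((clutch, genotype, (value - mu) / sd))
    return z_scores


# Toy data: clutch B has a baseline twice as high as clutch A, but the
# knockout shifts the parameter by a similar amount in both clutches.
toy = [
    ("A", "scr", 100.0), ("A", "scr", 110.0), ("A", "scr", 90.0),
    ("A", "ko", 120.0), ("A", "ko", 130.0),
    ("B", "scr", 200.0), ("B", "scr", 210.0), ("B", "scr", 190.0),
    ("B", "ko", 220.0),
]
ko_z = [z for _, genotype, z in within_clutch_z(toy) if genotype == "ko"]
```

      In this toy example, comparing clutch A controls with clutch B knockouts would report a very large effect driven mostly by the baseline difference, whereas the within-clutch Z-scores show a consistent shift of two to three standard deviations.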

      We also plan to test another SSRI on sorl1 knockouts, so this point will be addressed.

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early- life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim, at least not from expression. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,”

      which does not mention future risk of Alzheimer’s disease. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seems reasonable. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument primarily relies on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day old zebrafish larvae and from previous work showing brain structural differences in infants and children at high genetic risk of Alzheimer’s disease (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2 as we claim that day-time sleep is increased while zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (the ZOLTAR predictions or the small molecule rescue do not change whether the parameter is called sleep or inactivity), but our choice of labelling may be misleading. We will try to test this claim by plotting the distribution of the inactive period durations. If psen2 knockout larvae indeed sleep more during the day compared to controls, we might predict that inactive periods longer than 1 minute increase disproportionately compared to shorter inactive periods.
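
      The proposed analysis of inactive period durations can be sketched as below. This is an illustration only: it assumes a per-second activity trace in which 0 means no movement, treats an inactive period of at least 60 seconds as "long" (the exact cut-off convention is an assumption), and uses hypothetical function names.

```python
def inactive_bout_durations(activity, samples_per_second=1):
    """Return the duration in seconds of every run of consecutive zeros."""
    durations = []
    run_length = 0
    for sample in activity:
        if sample == 0:
            run_length += 1
        else:
            if run_length:
                durations.append(run_length / samples_per_second)
            run_length = 0
    if run_length:  # trace ended during an inactive period
        durations.append(run_length / samples_per_second)
    return durations


def fraction_long(durations, threshold_s=60.0):
    """Fraction of inactive periods lasting at least threshold_s seconds."""
    if not durations:
        return 0.0
    return sum(d >= threshold_s for d in durations) / len(durations)


# Toy trace: one 30-second pause and one 90-second pause.
trace = [1] + [0] * 30 + [1] + [0] * 90 + [1]
bouts = inactive_bout_durations(trace)
long_fraction = fraction_long(bouts)
```

      Comparing this fraction between knockout and control siblings would indicate whether the extra inactivity is concentrated in periods long enough to be scored as sleep or spread across short pauses.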

      To address “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we can try to look at the behaviour of psen2 knockout larvae that were awake (i.e., moved in the preceding one minute) or ‘asleep’ (i.e., did not move in the preceding one minute) at the light transitions and count the proportion of psen2 knockout or control larvae which displayed a startle response. If most psen2 knockouts react to the light transition, it should at least exclude the concern that they are very unhealthy, as the reviewer suggests. This criticism seems challenging to address definitively by experiment, though. A possible approach could be to use a closed-loop system which, after one minute of inactivity, triggers a stimulus which is sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Note that how to calibrate this stimulus is also not straightforward. We do not plan to test this, but our analysis of the light transitions may provide a decent proxy.
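
      The light-transition analysis described above can be sketched as follows. The sketch assumes one activity sample per second for each larva, classifies a larva as ‘asleep’ if it did not move during the look-back window before the transition, and scores a startle as any movement in a short window after it. The function name, the window lengths, and the toy traces are hypothetical.

```python
def startle_by_state(traces, transition_index, lookback=60, response_window=2):
    """Proportion of larvae that move after a light transition, split by
    whether they were awake or 'asleep' just before it.

    traces: list of per-larva activity lists (0 = no movement).
    Returns {"awake": proportion or None, "asleep": proportion or None}.
    """
    counts = {"awake": [0, 0], "asleep": [0, 0]}  # [responded, total]
    for trace in traces:
        start = max(0, transition_index - lookback)
        before = trace[start:transition_index]
        after = trace[transition_index:transition_index + response_window]
        state = "awake" if any(before) else "asleep"
        counts[state][0] += int(any(after))
        counts[state][1] += 1
    return {
        state: (responded / total if total else None)
        for state, (responded, total) in counts.items()
    }


# Toy traces with the transition at index 3 and a 3-sample look-back.
toy_traces = [
    [0, 0, 0, 1, 0],  # asleep before, responds
    [1, 0, 0, 0, 0],  # awake before, does not respond
    [0, 1, 0, 1, 1],  # awake before, responds
]
proportions = startle_by_state(toy_traces, transition_index=3, lookback=3)
```

      Running the same function separately on knockout and control larvae of one clutch would give the proportions to compare between genotypes.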

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just 1 drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment, “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think not, assuming we understand your question correctly. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–suppl. 5). For the sake of the argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts but that serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, then treatment should cause the same increase in sleep bout length in both knockouts and controls as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of Z-scores to sorl1 knockouts and controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we would concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.
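
      The non-additivity argument can be made concrete as a difference of differences: the drug effect in knockouts minus the drug effect in controls. The sketch below is illustrative only; the function names and Z-score values are invented for the example and are not data from the study, and a real analysis would also need a statistical test of this interaction.

```python
from statistics import mean


def drug_effect(treated, untreated):
    """Mean change in a parameter (in Z-scores) caused by the drug."""
    return mean(treated) - mean(untreated)


def interaction(ko_drug, ko_vehicle, ctrl_drug, ctrl_vehicle):
    """Drug effect in knockouts minus drug effect in controls.

    Zero means the drug adds the same amount in both genotypes (additive);
    a non-zero value means the genotypes respond differently to the drug.
    """
    return drug_effect(ko_drug, ko_vehicle) - drug_effect(ctrl_drug, ctrl_vehicle)


# Hypothetical sleep bout length Z-scores. The knockout has no baseline
# phenotype (vehicle groups match), yet the drug response differs.
ctrl_vehicle = [0.1, -0.1]
ko_vehicle = [0.1, -0.1]
ctrl_drug = [1.0, 1.0]

additive_case = interaction([1.0, 1.0], ko_vehicle, ctrl_drug, ctrl_vehicle)
non_additive_case = interaction([2.4, 2.6], ko_vehicle, ctrl_drug, ctrl_vehicle)
```

      Under the "parallel pathway" scenario with intact serotonin signalling, the expected interaction is zero; a larger drug effect in knockouts gives a positive value, as in the second toy case.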

      Furthermore, we agree with you and Reviewer #2 that the conclusions are overly confident. We will repeat this experiment with another SSRI as you suggest. Your suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well, however we do not plan to pursue them at this stage.

      References:

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin-like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld Bas A., Ruitenberg Annemieke, Hofman Albert, Launer Lenore J., van Duijn Cornelia M., Stijnen Theo, Breteler Monique M.B., Stricker Bruno H.C. 2001. Nonsteroidal Antiinflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PloS One 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Lauretti E, Dincer O, Praticò D. 2020. Glycogen synthase kinase-3 signaling in Alzheimer’s disease. Biochim Biophys Acta Mol Cell Res 1867:118664. doi:10.1016/j.bbamcr.2020.118664

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a re-examination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. Author response:

      We thank the reviewers for their efforts. They have pointed out several shortcomings and made very helpful suggestions. Below, we briefly address the weak points that the reviewers brought up and outline what improvements we intend to make for the revised paper in response.

      Reviewer #1:

      The interpretation of CNN results, especially the number of layers in the final model and its relationship with the processing of visual words in the human brain, needs to be further strengthened.

      The results of our experimentation with the number of layers and the number of units in each layer can be found in the supplementary information. In the revised version, we will bring some of these results into the main text and discuss them more thoroughly.

      Reviewer #2:

      As has been shown over many decades, many potential computational algorithms, with varied model architectures, can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      We very much agree with the reviewer that a qualitative analysis of whether the model can explain experimental effects needs to happen before a quantitative analysis, such as evaluating model-brain correlation scores. In fact, this is one of the key points we wished to make.

      This starts with the observation that "traditional" models of reading (i.e., those that do not rely on deep learning) cannot explain some very basic human behavioral results, such as humans being able to recognize a word regardless of exact letter shape, size, and (up to a point) rotation. This is not so much a failure on the part of traditional models as it is a difference in focus. There are models of vision that focus on these low-level things, currently dominated by deep learning, but these are rarely evaluated in the context of reading, which has its own literature and well-known experimental effects. We believe the current version of the manuscript makes insufficiently clear what the goals of our modeling effort are exactly, which is something we will attempt to correct in the revision.

      Since our model only covers the first phase of reading, with a special focus on letter shape detection, we sought to compare it with neuroimaging data that can provide "snapshots" of the state of the brain during these early phases, rather than comparing it with behavioral results that occur at the very end. However, we very much make this comparison in the spirit hinted at by the reviewer. The different MEG components have a distinct "behavior" to them in the way they respond to different experimental conditions (Figure 2), and the model needs to replicate this behavior (Figure 4). Only then do we move on to a quantitative analysis.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that the late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      We are glad the reviewer brought up the topic of frequency balancing, as it is a good example of the importance of the qualitative analysis. As the reviewer points out, frequency balancing during training only had a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing had a large impact. It is true that the model, even with frequency balancing, only captures letter- and bigram-frequency effects and not the word-frequency effects to which we know the N400 is sensitive. This could mean that N400 word-frequency effects are driven by mechanisms that our current model lacks, such as top-down effects from systems further up the processing pipeline.

      We agree with the reviewer that the late-stage sensitivity of the model to font size must be seen as a flaw. Of course, we say as much when we discuss this result in the paper. Important context for this flaw is that the main aim of the model is to reproduce the experimental effects of Vartiainen et al. (2011), which does not include manipulation of word length. The experimental contrasts in Figure 7 are meant to explore a bit beyond the boundaries of that particular study, but were never considered "failure points". When presenting a model, it's important to show its limitations too.

      Another example of the mismatch between this model and the visual cortex is the lack of feedback connections in the model. Within the visual cortex, there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical-level processes feeds back to letter-level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for the reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in the visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      In this study, we make a start in showing how deep learning techniques could be beneficial to enhance models of reading by showing how even a simple CNN, after a few enhancements, can account for several experimental MEG effects that we see in reading tasks, but are outside the focus of traditional models of reading. We never intended to claim that our model offers a complete view of all the processes involved. This is why we have dedicated a section in the Discussion to the various ways in which our simple CNN is incomplete as a model of reading. In this section we hint at the usage of recurrent connections, but the reviewer does an excellent job of highlighting the importance of top-down connections even in models focusing on early visual processes, which we are very happy to include in this section.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point, it is unclear what novel contributions can be gleaned from correlating low-dimensional model weights from these computational models with human neural data.

      The CNN model we present in this study is a small piece in a bigger effort to employ deep learning techniques to further enhance already existing models of reading. For our revision, we plan to expand on the question of where to go from here and outline our vision on how these techniques could help us better model the phenomena the reviewer speaks of. We agree with the reviewer that there is a long way to go, and we are excited to be a part of it.

      Reviewer #3:

      The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to Figure 5).

      The large focus on a qualitative evaluation of the model is intentional. The ability of the model to reproduce experimental effects (Figure 4) is a prerequisite for any subsequent quantitative metrics (such as correlation) to be valid. The introduction of frequency balancing is a good example of this. As the reviewer points out, frequency balancing during training has only a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing has a large impact.

      That said, the reviewer is right to highlight the value of quantitative analysis. An important limitation of the "traditional" models of reading that do not employ deep learning is that they operate in unrealistically simplified environments (e.g. input as predefined line segments, words of a fixed length), which makes a quantitative comparison with brain data problematic. The main benefit that deep learning brings may very well be the increase in scale that makes more direct comparisons with brain data possible. In our revision we will attempt to capitalize on this benefit more. The reviewer has provided some helpful suggestions for doing so in their recommendations.

      The experiments only consider a rather outdated vision model (VGG).

      VGG was designed to use a minimal number of operations (convolution-and-pooling, fully-connected linear steps, ReLU activations, and batch normalization) and rely mostly on scale to solve the classification task. This makes VGG a good place to start our explorations and see how far a basic CNN can take us in terms of explaining experimental MEG effects in visual word recognition. However, we agree with the reviewer that it is easy to envision more advanced models that could potentially explain more. For our revision, we plan to expand on the question of where to go from here and outline our vision on what types of models would be worth investigating and how one may go about doing that in a way that provides insights beyond higher correlation values.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This structural and biochemical study of the mouse homolog of acidic mammalian chitinase (AMCase) enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments. The methods and analysis of data are solid, providing several lines of evidence to support the development of mechanistic hypotheses. While the findings and interpretation will be valuable to those studying AMCase in mice, the broader significance, including extension of the results to other species including human, remains unclear.

      Public Reviews:

      Reviewer #1 (Public Review):

      General comments:

      This paper investigates the pH-specific enzymatic activity of mouse acidic mammalian chitinase (AMCase) and aims to elucidate its function's underlying mechanisms. The authors employ a comprehensive approach, including hydrolysis assays, X-ray crystallography, theoretical calculations of pKa values, and molecular dynamics simulations to observe the behavior of mouse AMCase and explore the structural features influencing its pH-dependent activity.

      The study's key findings include determining kinetic parameters (Kcat and Km) under a broad range of pH conditions, spanning from strong acid to neutral. The results reveal pH-dependent changes in enzymatic activity, suggesting that mouse AMCase employs different mechanisms for protonation of the catalytic glutamic acid residue and the neighboring two aspartic acids at the catalytic motif under distinct pH conditions.

      The novelty of this research lies in the observation of structural rearrangements and the identification of pH-dependent mechanisms in mouse AMCase, offering a unique perspective on its enzymatic activity compared to other enzymes. By investigating the distinct protonation mechanisms and their relationship to pH, the authors reveal the adaptive nature of mouse AMCase, highlighting its ability to adjust its catalytic behavior in response to varying pH conditions. These insights contribute to our understanding of the pH-specific enzymatic activity of mouse AMCase and provide valuable information about its adaptation to different physiological conditions.

      Overall, the study enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments.

      Reviewer #2 (Public Review):

      Summary:

      In this study of the mouse homolog of acidic mammalian chitinase, the overall goal is to provide a mechanistic explanation for the unusual observation of two pH optima for the enzyme. The study includes biochemical assays to establish kinetic parameters at different solution pH, structural studies of enzyme/substrate complexes, and theoretical analysis of amino acid side chain pKas and molecular dynamics.

      Strengths:

      The biochemical assays are rigorous and nicely complemented by the structural and computational analysis. The mechanistic proposal that results from the study is well rationalized by the observations in the study.

      Weaknesses:

      The overall significance of the work could be made more clear. Additional details could be provided about the limitations of prior biochemical studies of mAMC that warranted the kinetic analysis. The mouse enzyme seems unique in terms of its behavior at high and low pH, so it remains unclear how the work will enhance broader understanding of this enzyme class. It was also not clear whether the findings can be used for therapeutic purposes, as detailed in the abstract, if the human enzyme works differently.

      We have edited the paper to address these concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Regarding the pH profiles of mouse AMCase, previous studies have reported its activity at pH 2.0 and within the pH range of 3-7. In this paper, the authors conducted kinetic measurements and showed that pH 6.5 is optimal for kcat/Km. The authors emphasize the significance of mouse AMCase's activity in the neutral region, particularly at pH 6.5, for understanding its physiological relevance in humans. To provide a comprehensive overview, it would be valuable for the authors to summarize the findings from previous and current studies, discuss their implications for future pulmonary therapy in humans, and cite relevant literature. Additionally, the authors should highlight their research's specific contributions and novel findings, such as the determination of kinetic parameters (Kcat and Km) under different pH conditions. Emphasizing why previous studies may have required these observations and underscoring the importance of the present findings in addressing those knowledge gaps will help readers understand the significance of the study and its impact on the field of enzymology.

      We thank the reviewer for this comment. In keeping with the knowledge gaps addressed directly by this paper, we have not augmented the discussion of future pulmonary therapy in humans. We have summarized the present findings at the end of the introduction as follows:

      “We measured the mAMCase hydrolysis of chitin, which revealed significant activity increase under more acidic conditions compared to neutral or basic conditions. To understand the relationship between catalytic residue protonation state and pH-dependent enzyme activity, we calculated the theoretical pKa of the active site residues and performed molecular dynamics (MD) simulations of mAMCase at various pHs. We also directly observed conformational and chemical features of mAMCase between pH 4.74 to 5.60 by solving X-ray crystal structures of mAMCase in complex with oligomeric GlcNAcn across this range.”
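
      To make the link between a residue's pKa and its protonation state concrete, the following is a minimal sketch based on the standard Henderson-Hasselbalch relation. It is an illustration only and not the calculation used in the manuscript: the function name and the pKa values are hypothetical, and the sketch treats each residue as an isolated, non-interacting titratable group, which a real active site is not.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of an isolated acidic group that is protonated at a given pH.

    Henderson-Hasselbalch: [HA] / ([HA] + [A-]) = 1 / (1 + 10**(pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    # Hypothetical pKa values, chosen only to show the shape of the curve.
    example_pkas = (4.0, 6.5)
    # pH values spanning strongly acidic to near-neutral conditions.
    example_phs = (2.0, 4.74, 5.60, 6.5, 7.4)
    for pka in example_pkas:
        for ph in example_phs:
            print(f"pKa={pka:.1f} pH={ph:.2f} "
                  f"protonated={fraction_protonated(ph, pka):.3f}")
```

      Under these assumptions, a group is half protonated when the pH equals its pKa, and a shift in pKa moves the whole titration curve along the pH axis.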

      (2) Regarding the implications of the pKa values and Asp138 orientation for the pH optima, it would be valuable for the authors to discuss the variations in optimal activity by pH among GH-18 chitinases and investigate the underlying factors contributing to these differences. In particular, exploring the role of Asp138 orientation in chitotriosidase, another mammalian chitinase, would provide important insights. Chitotriosidase is known to be inactive at pH 2.0, and it would be interesting to investigate whether the observed orientation of Asp138 towards Glu140 in mouse AMCase for pH 2.0 activity is lacking in chitotriosidase.

      There are similar rotations of the two acidic residues in the literature on Chit1. The variety of crystal pH conditions and the lack of a straightforward mechanism for pKa shifts in AMCase make it difficult to draw a comparison to why Chit1 is inactive at low pH, but this is an interesting area for future study. For a fuller discussion, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760363/

      Furthermore, considering the lower activity of human AMCase at pH 2.0, it would be worthwhile to examine whether the Asp138 orientation towards Glu140, as observed in mouse AMCase, is also absent in human AMCase. Exploring this aspect will help determine if the orientation of Asp138 plays a critical role in pH-dependent activity in human AMCase.

      The situation for hAMCase is similar to Chit1, as the rotations observed here for mAMCase are also present. The issue is not whether Asp138 can rotate, but rather the relevant energetic penalties, as we discuss in the manuscript.

      (3) In a previous study by Okawa et al. (Loss and gain of human acidic mammalian chitinase activity by nonsynonymous SNPs. Mol Biol Evol 33, 3183-3193, 2016), it was reported that specific amino acid substitutions (N45D, D47N, and R61M) encoded by nonsynonymous single nucleotide polymorphisms (nsSNPs) in the N-terminal region of human AMCase had distinct effects on its chitinolytic activity. Introducing these three residues (N45D, D47N, and R61M) could activate human AMCase. This activation significantly shifted the optimal pH from 4-5 to 2.0.

      Considering the significant impact of these amino acid substitutions on the pH-dependent activity of human AMCase, the authors should discuss this point in the manuscript's discussion section. Incorporating the findings and relating them to the current study's observations on pH optima and Asp138 orientation can provide a comprehensive understanding of the factors influencing pH-dependent activity in AMCase.

      We added a citation and discuss how the mutations identified by this study could potentially shift the pKa of key catalytic residues:

      “Okawa et al identified how primate AMCase lost activity by integration of specific, potentially pKa-shifting, mutations relative to the mouse counterpart42b.”

      (4) To further strengthen the discussion, the authors could explore the ancestral insectivorous nature of placental mammals and the differences in chitinase activity between herbivorous and omnivorous species. Incorporating these aspects would add depth and relevance to the overall discussion of AMCase. AMCase is an enzyme known for its role in digesting insect chitin in the stomachs of various insectivorous and omnivorous animals, including bats, mice, chickens, pigs, pangolins, common marmosets, and crab-eating monkeys 1-7. However, in certain animals, such as dogs (carnivores) and cattle (herbivores), AMCase expression and activity are significantly low, leading to impaired chitin digestion 8. These observations suggest a connection between dietary habits and the expression and activity of the AMCase gene, ultimately influencing chitin digestibility across different animal species 8.

      (1) Strobel et al. (2013). Insectivorous bats digest chitin in the stomach using acidic mammalian chitinase. PloS one 8, e72770.

      (2) Ohno et al. (2016). Acidic mammalian chitinase is a proteases-resistant glycosidase in mouse digestive system. Sci Rep 6, 37756.

      (3) Tabata et al. (2017). Gastric and intestinal proteases resistance of chicken acidic chitinase nominates chitin-containing organisms for alternative whole edible diets for poultry. Sci Rep 7, 6662.

      (4) Tabata et al. (2017). Protease resistance of porcine acidic mammalian chitinase under gastrointestinal conditions implies that chitin-containing organisms can be sustainable dietary resources. Sci Rep 7, 12963.

      (5) Ma et al. (2018). Acidic mammalian chitinase gene is highly expressed in the special oxyntic glands of Manis javanica. FEBS Open Bio 8, 1247-1255.

      (6) Tabata et al. (2019). High expression of acidic chitinase and chitin digestibility in the stomach of common marmoset (Callithrix jacchus), an insectivorous nonhuman primate. Sci Rep 9, 159.

      (7) Uehara et al. (2021). Robust chitinolytic activity of crab-eating monkey (Macaca fascicularis) acidic chitinase under a broad pH and temperature range. Sci Rep 11, 15470.

      (8) Tabata et al. (2018). Chitin digestibility is dependent on feeding behaviors, which determine acidic chitinase mRNA levels in mammalian and poultry stomachs. Sci Rep 8, 1461.

      This overall point is covered by our brief discussion on diet differences:

      “However, hAMCase is likely too destabilized at low pH to observe an increase in _k_cat. hAMCase may be under less pressure to maintain high activity at low pH due to humans’ noninsect-based diet, which contains less chitin compared to other mammals with primarily insect-based diets42.”

      (5) It is important for the authors to clearly state the limitations of their simulations and emphasize the need for experimental validation or additional supporting evidence. This will provide transparency and enable readers to understand the boundaries of the study's findings. A comprehensive discussion of limitations would contribute to a more robust interpretation of the results.

      We added a sentence to the discussion:

      “Our simulations have important limitations that could be overcome by quantum mechanical simulations that allow for changes in protonation state and improved consideration of polarizability.”

      Minor comments:

      (1) Regarding the naming of AMCase, it is important to accurately describe it based on its acidic isoelectric point rather than its enzymatic activity under acidic conditions based on the original paper (Reference #14 (Boot, R. G. et al. Identification of a novel acidic mammalian chitinase distinct from chitotriosidase. J. Biol. Chem. 276, 6770-6778 (2001)).

      We have made this modification

      (2) In the introduction, providing more context regarding the terminology of acidic mammalian chitinase (AMCase) would be beneficial. While AMCase was initially discovered in mice and humans, subsequent research has revealed its presence in various vertebrates, including birds, fish, and other species. Therefore, it would be appropriate to include the alternative enzyme name, Chia (chitinase, acidic), in the introduction to reflect its broader distribution across different organisms. This clarification would enhance the readers' understanding of the enzyme's taxonomy and facilitate further exploration of its functional significance in diverse biological systems.

      We have made this modification

      (3) The authors mention that AMCase is active in tissues with neutral pHs, such as the lung. However, it is important to consider that the pH in the lung is lower, around 5, due to the presence of dissolved CO2 that forms carbonic acid. The lung microenvironment is known to vary, and specific regions or conditions within the lung may have slightly different pH levels. By addressing the pH conditions in the lungs and their relationship to AMCase's activity, the authors can enhance our understanding of the enzyme's function within its physiological context. A thorough discussion of the specific pH conditions in the lung and their implications for AMCase's activity would provide valuable insights into the enzyme's role in lung pathophysiology.

      To keep the focus on the insights we have made, we have elected not to expand this discussion.

      (4) It would be helpful for the authors to provide more information about the substrate or products of AMCase. The basic X-ray crystal structures used in this study are GlcNAc2 or GlcNAc3, known products of AMCase. Including details about the specific ligands involved in the enzymatic reactions would enhance the understanding of the study's focus.

      We are unclear about what this means - and since it is a minor comment, we have elected not to change the discussion of substrates here.

      (5) The authors should critically evaluate the inclusion of the term "chitin-binding" in the Abstract and Introduction. If substantial evidence or discussion regarding the specific chitin-binding properties of the enzyme or its relevance to the immune response is not included, removing or modifying that statement might be appropriate.

      We are unclear about what this means - and since it is a minor comment, we have elected not to change the discussion of “chitin-binding” here.

      (6) The authors developed an endpoint assay to measure the activity of mouse AMCase across a broad pH range, allowing for direct measurement of kinetic parameters. The authors should provide a more detailed description of the methods used, including any specific modifications made to the previous assay, to ensure reproducibility and facilitate further research in the field. It is important to clearly show the novelty of their endpoint assay compared to previous methods employed in other reports. The authors should also explain how their modified endpoint assay differs from existing assays and highlight its advancements or improvements. This will help readers understand the unique features and contributions of the assay in the context of previous methods.

      We have included a detailed method description and figures already. See also our previous paper by Barad which includes other, related, assays.

      (7) The authors suggest that mouse AMCase may be subject to product inhibition, potentially due to its transglycosylation activity, which can affect the Michaelis-Menten model predictions at high substrate concentrations. However, the reviewer needed help understanding the specific impact of transglycosylation on the kinetic parameters. It would be helpful for the authors to provide a more appropriate and detailed explanation, clarifying how transglycosylation activity influences the kinetic behavior of AMCase and its implications for the observed results.

      The experiments to conclusively demonstrate this are beyond our current capabilities.
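
      For readers unfamiliar with why inhibition can make rates depart from Michaelis-Menten predictions at high substrate concentrations, the sketch below compares the plain Michaelis-Menten rate law with the standard substrate-inhibition (Haldane) form. This is a generic textbook illustration and not the authors' analysis: the Haldane form is substituted here as a simple phenomenological stand-in, it does not model transglycosylation or product inhibition explicitly, and all parameter values are hypothetical.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Plain Michaelis-Menten rate: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def substrate_inhibition(s: float, vmax: float, km: float, ki: float) -> float:
    """Haldane form with an inhibition term: v = Vmax * S / (Km + S + S**2 / Ki)."""
    return vmax * s / (km + s + s * s / ki)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units, for illustration only.
    vmax, km, ki = 1.0, 10.0, 1000.0
    for s in (1.0, 10.0, 100.0, 1000.0, 10000.0):
        v_mm = michaelis_menten(s, vmax, km)
        v_inh = substrate_inhibition(s, vmax, km, ki)
        print(f"S={s:8.1f}  MM={v_mm:.3f}  inhibited={v_inh:.3f}")
```

      With these assumed parameters the two curves nearly coincide at low substrate, but the inhibited rate peaks near S = sqrt(Km * Ki) and then declines, whereas the Michaelis-Menten rate keeps rising toward Vmax.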

      (8) In the Abstract, the authors state, "We also solved high resolution crystal structures of mAMCase in complex with chitin, where we identified extensive conformational ligand heterogeneity." This reviewer suggests replacing "chitin" with "oligomeric GlcNAcn" throughout the text, specifically about biochemical experiments. It is important to accurately describe the experimental conditions and ligands used in the study.

      We have made these changes throughout the manuscript

      (9) In the introduction, the authors mention "a polymer of β(1-4)-linked N-acetyl-D-glucosamine (GlcNAc)". In this case, the letter "N" should be italicized to conform to the proper notation for the monosaccharide abbreviation.

      Corrected (and hopefully this would have been caught by the copy editor!)

      (10) In the introduction, the authors state, "In the absence of AMCase, chitin accumulates in the airways, leading to epithelial stress, chronic activation of type 2 immunity, and age-related pulmonary fibrosis5,6". It is recommended to clarify that "AMCase" refers to "acidic mammalian chitinase (AMCase)" in this context, as it is the first mention of the enzyme in the introduction.

      We moved that section so that it flows better and is introduced with the full name.

      (11) In the introduction, the authors state, "Mitigating the negative effects of high chitin levels is particularly important for mammalian lung and gastrointestinal health." This reviewer requests further clarification on the connection between chitin and gastrointestinal health. Please provide an explanation or reference to support this statement.

      We have modified this sentence to:

      “Chitin levels can be potentially important for mammalian lung and gastrointestinal health.”

      (12) In the introduction, the authors mention that "Acidic Mammalian Chitinase (AMCase) was originally discovered in the stomach and named for its high enzymatic activity under acidic conditions." It is recommended to include Reference #14 (Boot et al. J. Biol. Chem. 276, 6770-6778, 2001) as it provides the first report on mouse and human AMCase, contributing to the understanding of the enzyme.

      However, it is worth noting that while this paragraph primarily focuses on human tissues, Reference #14 primarily discusses mouse AMCase but also reports on human AMCase. Additionally, References #8 and #9 mainly discuss mouse AMCase. This creates confusion in the description of human and mouse AMCase within the paragraph.

      Considering that this paper aims to focus on the unique features of mouse AMCase, it is suggested that the authors provide a more specific and balanced description of both human and mouse AMCase throughout the main text.

      We have clarified the origin of the name AMCase, and we now distinguish the two orthologs in the text as hAMCase and mAMCase.

      (13) Figure 1A in the Introduction section has been previously presented in several papers. The authors should consider moving this figure to the Results section and present an alternative figure based on their experimental results to enhance the novelty and impact of the study.

      We have considered this option, but prefer the original placement.

      (14) In the Results section, the authors mentioned, "Prior studies have focused on relative mAMCase activity at different pH18,20, limiting the ability to define its enzymological properties precisely and quantitatively across conditions of interest." It would be beneficial for the authors to include reference #14, the first report showing the pH profile of mouse AMCase, to support their statement.

      We have added this reference

      (15) Regarding the statement, "To overcome the pH-dependent fluorescent properties of 4MU-chitobioside, we reverted the assay into an endpoint assay, which allowed us to measure substrate breakdown across different pH (Supplemental Figure 1A)", the authors should provide a more detailed description of the improvements made to measure AMCase activity. Additionally, it would be helpful to include a thorough explanation of the figure legend for Supplementary Figure 1A to provide clarity to readers.

      We have included a detailed method description and figures already. See also our previous paper by Barad which includes other, related, assays.

      (16) Figure 1B shows that the authors used the AMCase catalytic domain. It would benefit the authors to explain the rationale behind this choice in the figure legend or the main text.

      This point is addressed in the text:

      “Previous structural studies on AMCase have focused on interactions between inhibitors like methylallosamidin and the catalytic domain of the protein.”

      (17) For Figures 1C-E, it is recommended that the authors include error bars in their results to represent the variability or uncertainty of the data. In Figure 1E, the authors should clarify the units of the Y-axis (e.g., sec-1 µM-1). Additionally, in Figure 1F, the authors should explain how the catalytic acidity is shown.

      We have added error bars and axis labels. Figure 1F is conceptual, so we are leaving it as is.

      (18) The authors stated, "These observations raise the possibility that mAMCase, unlike other AMCase homologs, may have evolved an unusual mechanism to accommodate multiple physiological conditions." It would be helpful for the authors to compare and discuss the pH-dependent AMCase activity of mouse AMCase with other AMCase homologs to support this statement.

      That is an excellent idea for future comparative studies, but beyond the scope of what we are examining in this paper.

      (19) The authors should explain Supplemental Figures 1B and C in the Results or Methods sections to provide context for these figures.

      We are unclear about what this means - and since it is a minor comment, we have elected not to change these sections.

      (20) Supplemental Figure 3 is missing any description. It would be important for the authors to include a mention of this figure in the main text before Supplemental Figure 4 to guide the readers.

      The full legend is in there now and the reference to Supplemental 4 was mislabeled.

      (21) For Supplemental Figure 4, the authors should explain the shape of the symbol used in the figure. Additionally, they should explain "apo" and "holoenzyme" in the context of this figure.

      Unclear what a shape means in this context - perhaps the confusion arises because these are violin plots showing distributions.

      (22) Table 1 requires a more detailed explanation of its contents. Additionally, Tables 2 and 3 need to be included. The authors should include these missing tables in the revised version and explain their contents appropriately.

      Table 1 is the standard crystallographic table - there isn’t much more detailed explanation that can be offered. Tables 2 and 3 were not transferred properly by BioRxiv but were included in the review packet as requested a day after submission.

      (23) In Figure 4, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers. Additionally, it is recommended to use D136, D138, and E140 instead of D1, D2, and E to label the respective parts. The authors should also explain the meaning of the symbol used in the figure.

      Since it is a minor comment, we have elected not to change these figures.

      (24) In Figure 5, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      (25) Similarly, in Figure 6, all panels should be enlarged to enhance the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      Reviewer #2 (Recommendations For The Authors):

      In general, I did not identify many detailed or technical concerns with the work. A few items for the authors to consider are listed below.

      (1) The interpretation of the crystallographic datasets seems complicated by the heterogeneity in the substrate component. It might be nice to see more critical analysis of the approach here. Are there other explanations or possible models that were considered? Do other structures of chitinases or other polysaccharide hydrolases exhibit the same phenomenon?

      In writing the manuscript, we have tried to take a very critical approach to this. It is quite likely that other structures contain density reflecting similar heterogeneity that was simply left unmodeled.

      (2) It would be ideal to include more experimental validation of the proposed mechanism. Much of the manuscript includes theoretical validations (pKa estimation, dynamics, etc) - but it would be optimal to make an enzyme variant or do an experiment with a substrate analog.

      Yes - we agree that follow-on experiments are needed to fully test the mechanism, and those will be the subject of future work.

      (3) For an uninitiated reviewer, I think the major issue with this study is that the broader significance of the work and how it fits into the context of other work on these enzymes is not clear. It would be helpful to be more specific about what we know of mechanism from work on other enzymes to help the reader understand the motivation for this study.

      We have added a few additional references, guided by reviewer 1 comments, that should help in this respect.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript by Wu et al., the authors present the high-resolution cryoEM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure, the authors have solved several structures of mutants or experimental conditions relevant to the slow inactivation process that these channels undergo and which is not yet completely understood. 

      One of the main findings is the determination of the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results in similar mutants in different channels from Kv1.2 that indicate that inactivation is associated with an enlarged selectivity filter. 

      Another interesting structure is the complex of Kv1.2 with the pore-blocking toxin Dendrotoxin 1. The results show that the mechanism of the block is different from similar toxins, in which a lysine residue penetrates the pore deep enough to empty most external potassium binding sites. 

      The quality of the structural data presented in this manuscript is very high and allows for the unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. Specific comments are appended below. 

      (1) In the main text's reference to Figure 2d, residues W18' and S22' are mentioned but are not labeled in the insets. 

      Now labeled in Fig. 2D

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but very reduced open times and open probability at not very high voltages. 

      Addressed in the Discussion, lines 480-490.

      (3) The structures of WT in the absence of K+ show a narrower selectivity filter, however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is constructed at such an angle that it looks as if the carbonyl distances are increased, perhaps this should be fixed. Also, it is not clear how the distances between carbonyls given in the text on page 12 are measured. Is it between adjacent or kitty-corner subunits? 

      We decided to remove mention of carbonyl distances, because at our resolutions the atoms are not resolved.

      (4) It would be really interesting to know the authors' opinions on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate a local conformational change is the trigger for the main twisting events proposed here. 

      We cite Sauer et al. (2011) for the idea that the intact selectivity filter is a strained conformation, and its relaxation yields the wide vestibule seen in NaK2K and Kv channels.  Lines 434-439.

      Reviewer #2 (Public Review): 

      There are four Kv1.2 channel structures reported: the open state, the C-type inactivated state, a dendrotoxin-bound state, and a structure in Na+. 

      A high-resolution crystal structure of the open state for a chimeric Kv1.2 channel was reported in 2007 and there is no new information provided by the cryoEM structure reported in this study. 

      The cryo-EM structure of the C-type inactivated state of the Kv1.2 channel was determined for a channel with the W to F substitution in the pore helix. A cryo-EM structure of the Shaker channel and a crystal structure of a chimeric Kv1.2 channel with an equivalent W to F mutation were reported in 2022. Cryo-EM structures of the C-type inactivated Kv1.3 channel are also available. All these previous structures have provided a relatively consistent structural view of the C-type inactivated state and there is no significant new information that is provided by the structure reported in this study. 

      A structure of the Kv1.2 channel blocked by dendrotoxin is reported. A crystal structure of charybdotoxin and the chimeric Kv1.2 channel was reported in 2013. Density for dendrotoxin could not be clearly resolved due to symmetry issues and so the definitive information from the structure is that dendrotoxin binds, similarly to charybdotoxin, at the mouth of the pore. A potential new finding is that there is a deeper penetration of the blocking Lys residue in dendrotoxin compared to charybdotoxin. It will however be necessary to use approaches to break the symmetry and resolve the electron density for the dendrotoxin molecule to support this claim and to make this structure significant.  

      We have now succeeded in breaking the symmetry and present in Fig. 3 a C1 structure of the toxin-channel complex. In the improved map we now see that our previous conclusion was wrong: the penetration of Lys5 cannot be much deeper than that seen in CTx and ShK structures. However for some reason the pattern of ion-site occupancies in the blocked state is different in this structure than in the others. Fig. 3, Fig. 4E; text lines 559-568.

      The final structure reported is the structure of the Kv1.2 channel in K+ free conditions and with Na+ present. The structure of the KcsA channel by the MacKinnon group in 2001 showed a constricted filter and since then it has been falsely assumed by the K channel community that the lowering of K concentration leads to a constriction of the selectivity filter. There have been structural studies on the MthK and the NaK2K channels showing a lack of constriction in the selectivity filter in the absence of K+. These results have been generally ignored and the misconception of filter constriction/collapse in the absence of K+ still persists. The structure of the Kv1.2 channel in Na+ provided a clear example that loss of K+ does not necessarily lead to filter constriction. 

      We are grateful to the reviewer for pointing out this serious omission. We now cite other work, including from the Y. Jiang and C. Nichols labs, showing examples of outer pore expansion and destabilization. P. 4, lines 90-104; lines 421-439.

      The structure in Na+ is significant while the other structures are either merely reproductions of previous reports or are not resolved well enough to make any substantial claims. 

      We now state more clearly the confirmatory nature of our Kv1.2 open structure (lines 71-74) and the similarities of the inactivated-channel structures (lines 193-196).

      Reviewer #3 (Public Review): 

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a large quantity of structural work on the Kv1.2 channel, and the authors should be commended on the breadth of the studies. The structural studies seem well-executed (this is hard to fully evaluate because the current manuscript is missing a data collection and refinement statistics table). The findings are mostly confirmatory, but they do add to the body of work on this and related channels. Notably, the authors present structures of DTX-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (with presumably sodium ions bound within the selectivity filter). These two structures add new information, but the studies seem somewhat underdeveloped - they would be strengthened by accompanying functional studies and further structural analyses. Overall, the manuscript is well-written and a nice addition to the field. 

      The data collection and refinement table has been added (Fig. 4 – figure supplement 3).

      We agree and regret the lack of functional studies. We have not been able to carry them out because work in our laboratory is winding down and the lab soon will be closing.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is not obvious from the data shown how well the side chain positions in the inactivated state are defined by the electron density. These figures should be redone. Maybe the use of stereo would be useful. This will be particularly useful for the reader to decide if the small changes in, for example, the positioning of the carbonyl oxygens are believable. 

      Figure 2 – figure supplement 4 shows the stereo views.

      (2) The authors note the changes observed (though small) in the VSD which were not observed in other structures. The relevance of this observation is not described. Do these changes arise due to the different environments of detergents versus nanodisc etc. in the different structures?

      We’ve now inserted a note about the variety of environments and how this might be a cause of the difference: lines 280-285.

      Are there changes in the pore-VSD interface in the inactivated and the open channel structures and if yes, then do mutations at these residues affect inactivation?

      There is surprisingly little movement at the S4-S5 interface residues identified by Bassetto et al. (2022) as having effects on inactivation. Lines 262-267.

      (3) For the structures in Na+, it is important to provide analytical data showing the biochemical behavior of the channel. This is also true for the wild type and the W to F mutant channel. Size exclusion profiles should be included. 

      The SEC profile (noisy, but showing a clear peak) of the channel in Na+ is now shown in Fig. 4 supplement 1. Low expression of the W366F mutant produced even worse SEC results, but we include a representative micrograph of W366F in Na+ to show the monodispersed protein prep (Figure 5 – figure supplement 1).

      Reviewer #3 (Recommendations For The Authors): 

      Portions of text from the manuscript are indicated by quotations. 

      Introduction: "One goal of the current study was to examine the structure of the native Kv1.2 channel." 

      Comment, minor points: The authors refer to the Kv1.2 construct used for the structural studies as "native Kv1.2". I found this somewhat confusing because the word "native" suggests derived from a native source. The phrasing above also gives the impression that the structure by Wu et al is the first structure of Kv1.2. The Kv1.2 construct is essentially identical to the one used by Long et al in 2005 to determine the initial structure of Kv1.2 (PDB 2A79). The authors discuss a subsequent paddle-chimera Kv1.2-2.1 structure from 2007 (PDB 2R9R) in the introduction, but it would be prudent to mention the 2005 one of Kv1.2 as well. The open structure determined by Wu et al. is an improvement on the 2A79 structure in that the 2A79 structure was modeled as a poly-alanine model within the voltage sensor domain. Nevertheless, the Kv1.2-2.1 structure (2R9R) is highly similar to the 2A79 structure of Kv1.2. The 2007 structure indicated that Kv1.2-2.1 recapitulates structural features of Kv1.2. It is therefore not surprising that the open structure presented here is highly similar to that of both PDB 2A79 (Kv1.2) and PDB 2R9R (Kv1.2-2.1).  

      We failed to point out the high quality of the original Long et al. 2005 structure and its comparisons with the chimeric structure in Long et al. 2007. We now have tried to correct this: lines 70-74.

      Comment: The cryo-EM analyses suggest that a large percentage (most?) of the particles are missing the beta subunit. This should be commented on somewhere.      

      Now noted on lines 120-132: we pooled particles with and without beta subunits.

      Regarding ions in the selectivity filter, one-dimensional plots of the density would strengthen the analysis.

      Now included in Fig. 4.

      Also, one should mention caveats associated with identifying ions in cryo-EM maps and the added difficulty/uncertainty when the density is located along a symmetry axis (C4 axis, due to the possible build-up of noise). C1 reconstructions, showing density within the filter, if possible, would strengthen the analyses.

      You are correct. However, local resolution is highest in the selectivity filter region, and since the CTF-based filtering is constant over the whole structure, I think the SNR will be good on axis.

      Comment: The section on channel inactivation could be simplified by stating that the structure is highly similar to W17'F structures of other Kv channels. (And then discussing possible differences).  

      We now note, “overall conformational difference is identical…” p. 7, lines 193-196.

      "Salt bridges involving the S4 Arg and Lys residues are shifted slightly (Figure 2-figure supplement 3A-D). Arg300 (R3) is in close proximity to Glu226 on the S2 helix for the open channel, while R3 is closer to Glu183 in the S2 helix. The Glu226 side chain adopts a visible interaction with R4 in the inactivated state." 

      Comment: The density for these acidic amino acids seems weak, especially in the inactivated state. It seems like a stretch to make much of their possible conformational changes. 

      We’ve included stereo pairs in Fig. 2 – figure supplement 4.

      "By adding 100 nM α-DTx to detergent solubilized Kv1.2 protein we obtained a cryo-EM structure at 2.8 Å resolution of the complex." 

      Comment: 100 nM might be lower than the Kv concentration. The current methods are ambiguous on the concentration of Kv channel used for the DTx sample. From the methods, it seems possible that 100 nM DTX is a sub-stoichiometric amount relative to the channel. Regardless, the cryo-EM data seems to suggest that a large percentage of particles do not have DTx bound. This surely complicates the interpretation of density within the filter (which has partly been ascribed to a lysine side chain from DTx).

      The reviewer correctly points out a potentially serious problem. It turns out that the 100 nM figure we quoted was incorrect, and the actual concentration of toxin, >400 nM, was substantially greater than the protein concentration. This is confirmed by the small fraction (<1%) of 3D class particles that do not show the toxin density (lines 303-306).

      Comment: The methods on atomic structure building/refinement (Protein model building, refinement, and structural analysis) are sparse. A table is needed showing data collection and refinement statistics for each of the structures. This data should also provide average B factors for the ions in the filter. An example can be found in PMID 36224384. 

      Data collection and statistics are now in Fig. 4 – figure supplement 3.

      "In the selectivity filter of the toxin-bound channel (Figure 3E) a continuous density is seen to extend downward from the external site IS0 through to the boundary between IS1 and IS2. This density is well modeled by an extended Lys side chain from the bound toxin, with the terminal amine coordinated by the carbonyls of G27”.

      Comment: While there seems to be extra density in site IS0 from the figures, the density ascribed to lysine in the filter doesn't seem that distinct from those of ions in the open structure. 1-dimensional density plots and some degree of caution may be prudent. Could there, for example, be a mixture of toxin-bound and free channels in the dataset?

      Could the lysine penetrate to different depths? If the toxin binds with nM affinity, why are any channels missing the toxin? Have the authors modeled an atomic structure of the entire toxin bound to the channel to evaluate how plausible the proposed binding of the lysine is? Can the toxin be docked onto Kv1.2 with the deep positioning of the lysine and not clash with the extracellular surface of Kv1.2? 

      We also were concerned about these issues. We have been able to obtain a C1 reconstruction of the toxin-channel complex. In building the atomic model we found that indeed the Lys5 side chain could not penetrate as far as we had thought, and appears to be coordinated by the first carbonyl pair. Fig. 3; text lines 331-332. 

      "Toxin binding shrinks the distances between opposing carbonyl oxygens in the selectivity filter, forming a narrower tunnel into which the Lys side chain fits (Figure 3F). The second and fourth carbonyl oxygen distances are substantially reduced from 4.7 Å and 4.6 Å in an open state to 3.7 Å and 3.9 Å, respectively (Figure 4E). In a superposition of Kv1.2 open-state and α-DTX-bound P-loop structures, there is also an upward shift of the first three carbonyl groups by 0.7~1.0 Å (Figure 4F). " 

      Comment: I suspect the authors intend to refer to Figure 3F rather than 4. I would be cautious here. The refined positions of the carbonyl oxygens are almost certainly affected by the presence or absence of ions in the atomic model during refinement. The density and the resolution of the map may not be able to distinguish small changes to the positions of the carbonyl oxygens (and these differences/uncertainties are compounded by the C4 symmetry). 

      "On the other hand, the terminal amine of lysine in α-DTX is deeply wedged at the second set of carbonyls, narrowing both IS1 and IS2 while displacing ions from the sites (Figure 3-figure supplement 2A). CTX does not cause narrowing of the selectivity filter or displacements of the carbonyls (Figure 3-figure supplement 2B). "

      Comment: Again, caution would be prudent here.  

      We are very grateful to the reviewer for pointing out these problems. We have removed these statements that are weakly supported at our resolution level.

      "Shaker channels are able to conduct Na+ in the absence of K+ (Melishchuk et al., 1998)." 

      Comment: How about the Kv1.2 channel? Is Kv1.2 able to conduct Na+ in the absence of K+ ? This would certainly be relevant for interpreting the conformation of the filter and the density ascribed to Na+ for the structure in sodium.  

      We agree wholeheartedly, but unfortunately we are no longer capable of doing the measurements as our lab will soon close.

      "Ion densities are seen in the IS1, IS3, and IS4 ion binding sites, but the selectivity filter shows a general narrowing as would be expected for binding of sodium ions. The second, third, and fourth carbonyl oxygen distances are reduced from 4.7 Å, 4.7 Å, and 4.6 Å in the open state to 4.4 Å, 3.9 Å, and 4.5 Å, respectively. The rest of the channel structure is very little perturbed. " 

      Comment: The density for IS4 seems weak. To me, it looks like IS1 and IS3 are occupied, whereas IS2 and IS4 are much weaker. 1-dimensional density plots would be helpful. I would suggest caution in commenting too strongly on the "general narrowing" since the resolution of the maps, the local density, and the atomic structure refinement would be consistent with coordinate errors of 0.5 Å or more - and would be compounded (~ doubled) by measuring between symmetry-related atoms.  

      We present 1D plots in Fig. 4E. We no longer comment on “narrowing”.

      "Finally, the snake toxin a-Dendrotoxin (DTx) studied here is seen to block Kv1.2 by insertion of a lysine residue into the pore." 

      Comment: Discussion (and references) should be given regarding what was known prior to this study on the mode of inhibition by DTx. 

      Discussion and references now added, lines 287-301.

      "On the other hand, a lengthy molecular-dynamics simulation of deactivation in the Kv1.2-2.1..." 

      Comment: I don't think mentioning this personal communication adds to the manuscript. 

      Actually, the original “personal communication” reference was there because the situation is complicated. The movie S3 accompanying the Jensen et al. paper shows deactivation and dewetting of the channel during a 250 µs simulation. In the movie there are ions visible in the selectivity filter for the first 50 µs, but after that the SF appears empty. Puzzled by this, we contacted Dr. Jensen, who explained that the movie was in error: ions remain in the SF throughout the entire 250 µs. We now cite Jensen (2012) along with the personal communication.

      "The difference between the open and inactivated Kv1.2 structures, like the difference in Kv1.2-2.1 (Reddi et al., 2022) and Shaker (Tan et al., 2022) can be imagined as resulting from a two-step process." 

      Comment: Confusing phrasing because the authors mean to compare their structure to inactivated structures of Kv1.2-2.1 and shaker. 

      Fixed, lines 220-222.

      "Molecular dynamics simulations by Tan et al. based on the Shaker-W17'F structure show that IS3 and IS4 are simultaneously occupied by K+ ions in the inactivated state." 

      Comment: I think that the word "show" is too strong. Perhaps "suggest" 

      The MD result seems to us to be unequivocal, that most of the time the two sites are occupied by ions.

      References are needed for the following statements:  

      -  "as well as the charge-transfer center phenylalanine"

      Now citing Tao et al. 2010, line 156.

      - "total gating charge movement in Shaker channels is larger, about 13 elementary charges per channel" 

      Now citing the review by Islas, 2015 (lines 166-169).

      "The selectivity filter of potassium channels consists of an array of four copies of the extended loop (the P-loop) formed by a highly conserved sequence, in this case, TTVGYGD. Two residues anchor the outer half of the selectivity filter and are particularly important in inactivation mechanisms (Figure 2B, right panels). Normally, the tyrosine Y28' (Y377 in Kv1.2) is constrained by hydrogen bonds to residues in the pore helix and helix S6 and is key to the conformation of the selectivity filter. The final aspartate of the P-loop, D30' (D379 in Kv1.2) is normally located near the extracellular surface and has a side chain that also participates in H-bonds with W17' (W366 in Kv1.2) on the pore helix." 

      Citations added (Pless 2013, Sauer 2011) lines 211-214.

      - "During normal conduction, ion binding sites in the selectivity filter are usually occupied by K+ and water molecules in alternation." 

      Added Morais-Cabral et al. 2001, p. 17, lines 463-465.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses, but the evidence is incomplete, as additional biochemical and functional experiments are needed to unambiguously assign MDA5 as a bona fide sensor of triphosphate RNA in this model. This also leaves the title as overstating its case.

      We would like to thank the editorial team for these positive comments on our manuscript and the constructive suggestions to improve it. Following the suggestions and valuable comments of the referees, we have added substantial amounts of new data and analysis to substantiate our claims, and the manuscript, including the title, has been carefully revised to better reflect our conclusions. We are now happy to send you our revised manuscript, and we hope that it addresses your and the reviewers’ concerns satisfactorily and is now suitable for publication in eLife.

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conservational process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts.

      We concur with the viewpoint that virus-host coevolution complicates the derivation of universal conclusions. To address this challenge, we incorporated additional experiments and data based on the suggestions of the reviewers. These experiments were carried out across diverse models, including two distinct vertebrate species (M. miiuy and G. gallus), two different viruses (SCRV and VSV), and the synthesis of corresponding 5’ppp-RNA probes. We believe that these supplementary data bolster the evidence supporting the immune replacement role of MDA5 in the recognition of 5'ppp-RNA in RIG-I deficient species (Figure 1C-1E, Figure 2O and 2P, Figure 4). Moreover, we have duly incorporated references in both the introduction and discussion sections to further support our conclusion that MDA5 in T. belangeri, a mammal lacking RIG-I, possesses the ability to detect RNA viruses posed as RIG-I agonists (doi: 10.1073/pnas.1604939113). Lastly, meticulous revisions have been undertaken in the manuscript, including adjustments to the title, to ensure harmonization with our research outcomes.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      The key message of this paper, i.e. MDA5 can sense 5'-triphosphorylated RNA and thereby compensate for the loss of RIG-I, is novel and interesting, yet there is insufficient evidence provided to prove this hypothesis. Most importantly, it is crucial to test the capacity of in vitro synthesized 5'-triphosphorylated RNA to induce an IFN response in MDA5-sufficient and -deficient cells. In addition, a number of important controls are missing, as detailed below.

      To further support the notion that MDA5 is capable of detecting 5'ppp-RNA in species lacking RIG-I, we conducted additional experiments. Initially, we isolated the RNA from SCRV and VSV viruses. Subsequently, we synthesized 5'ppp-RNA probes that corresponded to the genome termini of SCRV and VSV in vitro. Then, these RNAs were treated with Calf intestinal phosphatase (CIAP) to generate dephosphorylated derivatives. Next, we separately tested the activation ability of various RNAs on IRF3 dimer and IFN response in MKC (M. miiuy kidney cell line) and DF-1 (G. gallus fibroblast cell line) cells, and determined that the immune activation ability of SCRV/VSV viruses depends on their triphosphate structure (Figure 1C-1E, Figure 4C and 4J). In addition, the knockdown of MDA5 inhibited the immune response mediated by SCRV RNA (Figure 2P and 2Q). Finally, we incorporated essential experimental controls (Figure 4B and 4I). We think that the inclusion of these supplementary experimental data significantly enhances the credibility and further substantiates our hypothesis.

      The authors describe an interaction between MDA5 and STING which, if true, is very interesting. However, the functional implications of this interaction are not further investigated in the manuscript. Is STING required to relay signaling downstream of MDA5?

      To better explore the role of STING in MDA5 signal transduction, we constructed a STING expression plasmid and synthesized specific siRNA targeting STING. Next, we found that co-expression of STING and MDA5 significantly enhanced the MDA5-mediated IFN-1 response during SCRV virus infection (Figure 2N). Conversely, silencing of STING expression restored the MDA5-mediated IFN-1 response (Figure 2O). These findings provide important evidence for the critical involvement of STING in the immune signaling cascade mediated by MDA5 in response to 5'ppp-RNA viruses.

      The second part of the paper is quite distinct from the first part. The fact that MDA5 is an interferon-stimulated gene is not mentioned and complicates the analyses (i.e. is there truly more m6A modification of MDA5 on a per molecule basis, or is there simply more total MDA5 and therefore more total m6A modification of MDA5).

      For the experimental data analysis in Figure 5E and 5F, we first compared the m6A-IP group to the input group, and then normalized the control group (IgG group of 5E and Mock group of 5F) to a value of “1”. Given the observed variability in MDA5 expression levels within the input group of Mock and SCRV virus-infected cells, our analysis represents the actual m6A content of each MDA5 molecule. To enhance clarity, we have updated the label on the Y-axis in Figure 5E and 5F.
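
      The two-step normalization described above (m6A-IP signal divided by input signal, then scaled so that the control group equals 1) can be sketched as follows. This is only an illustrative sketch: the function name and all numeric values are hypothetical and are not taken from the manuscript or its figures. It shows why a rise in total MDA5 transcript (the input) does not by itself inflate the reported value.

```python
def relative_m6a_enrichment(ip_signal, input_signal, control):
    """Return the IP/input ratio of each group, scaled so the control group is 1.

    ip_signal, input_signal: dicts mapping group name -> measured signal
    control: name of the group used as the reference (set to 1)
    """
    # Step 1: normalize the m6A-IP signal to the input of the same sample,
    # which corrects for differences in total transcript abundance.
    ratios = {group: ip_signal[group] / input_signal[group] for group in ip_signal}
    # Step 2: express every group relative to the control group.
    reference = ratios[control]
    return {group: ratio / reference for group, ratio in ratios.items()}


# Hypothetical numbers, chosen only for illustration: the input (total transcript)
# triples on infection, while the IP signal rises six-fold.
ip_signal = {"Mock": 2.0, "SCRV": 12.0}
input_signal = {"Mock": 8.0, "SCRV": 24.0}

result = relative_m6a_enrichment(ip_signal, input_signal, control="Mock")
print(result)
```

      With these made-up inputs, the per-transcript enrichment for the infected group is twice that of the control, even though the raw IP signal is six times higher; the difference is absorbed by the input normalization.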

      Finally, it should be pointed out that several figures require additional labels, markings, or information in the figure itself or in the accompanying legend to increase the overall clarity of the manuscript. There are frequently details missing from figures that make them difficult to interpret and not self-explanatory. These details are sometimes not even found in the legend, only in the materials and methods section. The manuscript also requires extensive language editing by the editorial team or the authors.

      We acknowledge the valuable feedback from the reviewer and have made significant improvements to our manuscript based on the recommendations provided in the "Recommendation for the authors" section. Furthermore, we have conducted a thorough review of the entire article, resulting in substantial enhancements to the format, clarity, and overall readability of our manuscript.

      Reviewer #3 (Public Review):

      Summary: In this manuscript, the authors investigated the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in a teleost fish called Miiuy croaker. They claimed that MDA5 can replace RIG-I in sensing 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) in the absence of RIG-I in Miiuy croaker. The recognition of MDA5 to 5'ppp-RNA was also observed in the chicken (Gallus gallus), a bird species that lacks RIG-I. Additionally, they reported that the function of MDA5 can be impaired through m6A-mediated methylation and degradation of MDA5 mRNA by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker under SCRV infection. This impairment weakens the innate antiviral immunity of fish and promotes the immune evasion of SCRV.

      Strengths: These findings provide insights into the adaptation and functional diversity of innate antiviral activity in vertebrates.

      Weaknesses: However, there are some major and minor concerns that need to be further addressed. Addressing these concerns will help the authors improve the quality of their manuscript. One significant issue with the manuscript is that the authors claim to be investigating the role of MDA5 as a substitute for RIG-I in recognizing 5'ppp-RNA, but their study extends beyond this specific scenario. Based on my understanding, it appears that sections 2.2, 2.3, 2.5, 2.6, and 2.7 do not strictly adhere to this particular scenario. Instead, these sections tend to investigate the functional involvement of Miiuy croaker MDA5 in the innate immune response to viral infection. Furthermore, the majority of the data is focused on Miiuy croaker MDA5, with only a limited and insufficient study on chicken MDA5. Consequently, the authors cannot make broad claims that their research represents events in all RIG-I deficient species, considering the limited scope of the species studied.

      We agree with the reviewer's perspective that functional analysis of MDA5 in M. miiuy may not adequately represent all species lacking RIG-I. To address this concern, we have incorporated additional experimental data utilizing different model systems, including two different vertebrate species (M. miiuy and G. gallus), two distinct viruses (SCRV and VSV), and the synthesis of two corresponding 5’ppp-RNA probes. While the functional characterization of G. gallus MDA5 remains relatively limited compared to M. miiuy, our current experimental findings provide support for two key observations. Firstly, the triphosphate structure of the VSV virus is pivotal in activating the innate immune response in G. gallus against the virus (Figure 1D and 4J). Secondly, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Consequently, although we cannot definitively establish the immune surrogate function of MDA5 in all RIG-I-deficient species, our research data further substantiates this hypothesis. Moreover, we have adopted a more cautious attitude in summarizing our experimental conclusions, thereby enhancing the rigor of our manuscript language.

      The current title of the article does not align well with its actual content. It is recommended that the focus of the research be redirected to the recognition function and molecular mechanism of MDA5 in the absence of RIG-I concerning 5'ppp-RNA. This can be achieved through bolstering experimental analysis in the fields of biochemistry and molecular biology, as well as enhancing theoretical research on the molecular evolution of MDA5. It is advisable to decrease or eliminate content related to m6A modification.

      Following the reviewer's recommendations, we have revised the title to emphasize that our main research focus is a teleost fish devoid of RIG-I. Furthermore, we have conducted additional molecular experiments to further elucidate the 5'ppp-RNA recognition function of MDA5 in RIG-I-deficient species. In an attempt to analyze the potential molecular evolution of MDA5 resulting from RIG-I deficiency, we collected MDA5 coding sequences from diverse vertebrates. However, due to multiple independent loss events of RIG-I in fish, fish with or without RIG-I genes in the phylogenetic tree cannot be effectively clustered separately, making it extremely difficult to perform this aspect of analysis. Consequently, we have regrettably opted to forgo the molecular evolution analysis of MDA5.

      The aim of our article is to reveal an antagonistic relationship between a fish receptor and RNA viruses. The MDA5 of fish that have lost RIG-I has evolved the ability to recognize 5’ppp-RNA viruses and mediate the IFN response to resist SCRV infection. Conversely, the m6A methylation mechanism gives SCRV a means to weaken the immune capacity of MDA5. Therefore, we believe that the latter part is an important element of the arms race between the virus and its host and should be retained.

      Additionally, the main body of the writing contains several aspects that lack rigor and tend to exaggerate, necessitating significant improvement.

      We appreciate the reviewer’s comment and have improved the manuscript addressing the points raised in the “Recommendation for the authors”. We have added corresponding experiments to strengthen the verification of the conclusions, and in addition, we are more cautious in summarizing the language of the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The evidential foundation within the Result 1 section appears somewhat tenuous.

      Firstly, the author derives conclusions regarding the phenomenon of RIG-I loss in lower vertebrates by referencing external literature and conducting bioinformatics analyses. It is pertinent to inquire whether the author considered fortifying these findings through additional WB/PCR experiments, particularly for evaluating RIG-I expression levels across diverse vertebrates, encompassing both lower and higher orders.

      First, most of the species we analyzed are model species with high-quality genomic sequence information in the database. Second, RIG-I protein sequences (or at least some domain sequences) are relatively conserved in vertebrates. The presence or absence of RIG-I in these species can therefore be evaluated with high confidence through homology comparison, and we do not intend to conduct additional PCR/WB experiments to confirm this.

      Additionally, following the identification of RIG-I loss, the author postulates MDA5 as a substitute of RIG-I, grounding this speculation in the analysis of MDA5 and LGP2 protein structures. It is imperative to address whether the author could enhance the manuscript by supplying expression data for MDA5 and LGP2 across different vertebrates and elucidating further why MDA5 is posited as the compensatory mechanism for RIG-I loss.

      Like MDA5, LGP2 is an interferon-stimulated gene, so both are likely to be highly sensitive to viral infection. We therefore think that their functions are difficult to evaluate by comparing the expression data of these two genes. In mammals, the regulatory effects of LGP2 on RIG-I and MDA5 are complicated and ambiguous. To evaluate the potential function of LGP2 in M. miiuy, we constructed an LGP2 plasmid and synthesized siRNA targeting LGP2. Our results indicate that mmiLGP2 can enhance the antiviral immune response mediated by mmiMDA5 (Figure 1H and 1I), which suggests that mmiLGP2 plays a regulatory role in RLR signaling rather than acting as a compensatory receptor for RIG-I.

      Also, is it conceivable that other receptors contribute to this compensatory effect in lower vertebrates?

      Short, blunt-ended, 5’-triphosphate double-stranded RNA, as found in the panhandle of negative-strand viral genomes, is the ligand of RIG-I. We focused on receptors that could compensate for the loss of RIG-I in immune recognition, and MDA5, as the protein most similar in structure, first attracted our attention. In addition, IFIT proteins have been reported to recognize triphosphate single-stranded RNA (doi: 10.1038/nature11783). However, we used SCRV and VSV RNA as viral models, both of which have negative-stranded genomes and meet the ligand criteria of RIG-I rather than IFIT. Therefore, we excluded the IFIT proteins from the scope of our research.

      (2) The article exclusively employs a singular type of 5'PPP-RNA virus and one specific lower vertebrate species, thereby potentially compromising the robustness of the assertion that this phenomenon is prevalent in lower vertebrates. To bolster this claim, could the author consider incorporating data from an alternative 5'PPP-RNA virus and a different lower vertebrate species?

      To address this concern, we have incorporated additional experimental data utilizing different model systems, including two different vertebrate species (M. miiuy and G. gallus) and two distinct viruses (SCRV and VSV). While the functional characterization of G. gallus MDA5 remains relatively limited compared to M. miiuy, our current experimental findings provide support for two key observations. Firstly, the triphosphate structure of the VSV virus is pivotal in activating the innate immune response in G. gallus against the virus (Figure 1D and 4J). Secondly, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Consequently, these experimental results further support the conservation of this immune compensation mechanism.

      (3) A nuanced consideration of the statement in Result 5 is warranted. Examination of the results under SCRV infection conditions suggests dynamic fluctuations in MDA5 expression levels, challenging the veracity of the statement implying "increased expression", which contradicts the proposed working model of this article.

      Because MDA5 acts as a receptor and plays a recognition immune role in the early stages of virus infection, the expression of MDA5 in the early stage of SCRV infection rapidly increases. In the later stage of infection, the expression of MDA5 may gradually decrease again due to the negative feedback mechanism in the host body to prevent excessive inflammation. However, compared to the uninfected group, the expression of MDA5 was significantly increased in the SCRV-infected group, so we believe that the term "increased expression" is not a problem. In addition, the m6A mechanism can weaken the function of MDA5, but it still cannot prevent the overall increase of MDA5 expression, which is not contradictory to the working model in this article.

      Additionally, the alterations in m6A levels in miiuy croaker under SCRV infection conditions warrant clarification. Could the author employ m6A dot blotting to supplement the findings related to total m6A levels?

      Our previous studies (doi: 10.4049/jimmunol.2200618) have suggested that the total m6A level is increased after SCRV infection in miiuy croaker. We cited this conclusion in the discussion of our manuscript.

      (4) It would be beneficial if the editors could assist the author in enhancing the language of the manuscript.

      We have carefully checked the full article and revised it with Grammarly, and we believe that the grammar, format, and readability of our article have been greatly improved.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1

      (1) Figure 1B - some clarification needs to be added about this figure in the text. It is unclear what the main point is that the authors would like to convey.

      What we want to emphasize is that some species that possess RIG-I, such as zebrafish, have also experienced RIG-I loss events but underwent whole-genome duplication events before the loss, thus preserving a copy of RIG-I. This indicates that loss events of RIG-I are very common in vertebrates and do not occur randomly. We have elaborated on this point in the results and discussion.

      (2) Figure 1C - is not very informative other than showing Mm MDA5 and LGP2 side-by-side. It would be more useful to show a comparison of human RIG-I/MDA5 alongside Mm and Gg MDA5. Are there any conserved/shared key residues between hRIG-I/hMDA5 versus mmMDA5?

      Homologous proteins are often known to adopt the same or similar structure and function. We have added human RIG-I domain information to this figure (Figure 1F). Comparing the domain information of human RIG-I with that of M. miiuy MDA5 and LGP2 shows that M. miiuy MDA5 has a structure similar to human RIG-I, making it the most likely candidate to compensate for the missing RIG-I. In contrast, M. miiuy LGP2 lacks the CARD domain, which is crucial for signal transduction, so we focused on M. miiuy MDA5. In addition, we collected protein sequences of MDA5 and RIG-I from various vertebrates to identify key residues that may have evolved in M. miiuy MDA5 for recognizing 5'ppp-RNA. Unfortunately, no candidate residues were found in the comparison.

      Figure 2

      (1) Figure 2B - It would be important to demonstrate MDA5-Flag expression by immunoblot and compare MDA5-Flag overexpression to endogenous MDA5 expression using the anti-MDA5 antibody from panel 2A. If IF is used, more cells need to be visible in the field.

      After transfecting the MDA5 plasmid into MKC, endogenous MDA5 expression was detected using MDA5 antibodies. The results showed a significant increase in MDA5 protein levels, indicating that MDA5 antibodies can specifically recognize MDA5 protein. In addition, we retained the original immunofluorescence images to better demonstrate the subcellular localization of MDA5.

      (2) Figure 2C - The 1:1 stoichiometry of MDA5:MAVS (in the absence of any stimulus) is quite surprising. How does the interaction between MDA5 and MAVS change upon stimulation with an RNA ligand (SCRV, poly(I:C))?

      We do not believe that the actual stoichiometry between MDA5 and MAVS is 1:1, as you described. In fact, the apparent proportion of proteins in a complex detected by Co-IP depends on many factors. Firstly, the MDA5 plasmid in this study has a 3× Flag tag, while MAVS only has a 1× Myc tag, which makes antibody detection of MDA5-Flag more sensitive. In addition, the Co-IP results are also affected by multiple factors such as the type of antibody and the number of recoveries, making it difficult to estimate the actual ratio of MDA5 to MAVS. For these reasons, and because measuring the interaction strength between MDA5 and MAVS after infection seems to be off-topic, we did not explore this point further.

      (3) Figure 2D - The interaction between MDA5 and STING is a very interesting finding but is not elaborated on in the paper (even though the interaction between MDA5 and STING is mentioned in the abstract). The manuscript would be strengthened if the interaction between MDA5 and STING is further investigated. For example, does the IFN response that is reported in panels 2E to 2H require the presence of STING? Does mmMDA5 signal via STING in response to a DNA ligand?

      We appreciate the referee's suggestion to study the mutual influence between MDA5 and STING. We found that co-expression of STING and MDA5 can enhance the MDA5-mediated IFN-1 response during SCRV virus infection, while knocking down STING can restore the MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the immune response of MDA5 to RNA viruses. We understand the importance of the cGAS/STING pathway in identifying exogenous DNA, so exploring the MDA5 pathway for DNA ligand recognition is an interesting and meaningful perspective. However, this falls outside the theme of our article, so we did not explore this point further.

      (4) Figures 2F and 2H - the authors demonstrate that SCRV induces a type I IFN response in an MDA5-dependent manner. While SCRV is a single-stranded negative-sense RNA virus that contains 5'ppp-RNA, it cannot be excluded that MDA5 is activated here in response to a double-stranded RNA intermediate of viral origin or even a host-derived RNA whose expression or modification is altered during infection. To demonstrate in an unambiguous manner that MDA5 senses 5'ppp-RNA, it is crucial to use the in vitro synthesized 5'ppp-RNA (and its dephosphorylated derivative as a control) from Fig. 4 in these experiments.

      We transfected in vitro synthesized 5’ppp-SCRV and 5’ppp-VSV (and their dephosphorylated derivatives) into MKC cells and DF-1 cells, respectively. The results showed that the 5’ppp-RNAs significantly promoted the formation of IRF3 dimers, while their dephosphorylated derivatives did not (Figure 4C and 4J). In addition, we extracted viral RNA from the SCRV and VSV viruses and dephosphorylated it with calf intestinal phosphatase (CIAP). When these RNAs were transfected into MKC and DF-1 cells, the immune response mediated by the viral RNAs was much higher than that mediated by their dephosphorylated forms (Figure 1C-1E). The above results indicate that the immune response activated by SCRV and VSV is indeed dependent on their triphosphate structure. Finally, the IRF3 dimerization and IFN induction activated by SCRV RNA can be inhibited by si-MDA5 (Figure 2P and 2Q), further demonstrating the involvement of MDA5 in the immune response mediated by 5’ppp-RNA ligands.

      (5) In mice and humans, MDA5 is known to collaborate with LGP2 to jointly induce an IFN response. Does M.miiuy express LGP2? If so, it would be informative to include a siRNA targeting LGP2 in the experiments in panel F. In mammals, LGP2 potentiates the response via MDA5 while it may inhibit RIG-I activation.

      M. miiuy expresses LGP2. We constructed an LGP2 plasmid and synthesized si-LGP2 to investigate the impact of LGP2 on MDA5-mediated immune processes (Figure 1G-1I). The results showed that LGP2 can enhance the IFN response mediated by MDA5 during SCRV virus infection, similar to that in mammals.

      (6) Minor comment - Is the poly(I:C) used in this figure high or low molecular weight poly(I:C)? HMW poly(I:C) preferentially stimulates MDA5, while LMW poly(I:C) preferentially stimulates RIG-I.

      We used poly(I:C)-HMW as a positive control for activating MDA5. We have modified the relevant information in Figure 2 and its legend.

      Figure 3

      (1) Figure 3F/G - The normalization in this Figure is difficult to interpret. It would be better to split Figure 3G into 4 separate graphs and include the mock-infected cells alongside the infected samples (as done in Figure 2).

      To better demonstrate the function of the RD domain of MDA5 in M. miiuy, we have changed the experimental plan, as shown in Figure 3F. We detected the induction of antiviral factors by overexpression of MDA5 and MDA5-△RD under poly(I:C)-HMW stimulation. This indicates that the RD domain of MDA5 has a conserved function in the recognition of poly(I:C)-HMW in M. miiuy, and it can serve as a positive control for the recognition of SCRV virus by the RD domain.

      Figure 4

      (1) Figure 4B - A number of important controls are missing. Was the immunoprecipitation of RNA successful? This could be shown by running a fraction of the immunoprecipitated material on an RNA gel and/or by showing that the input RNA was depleted after IP. In addition, a control IP (Streptavidin beads without biotinylated RNA) is missing to ensure that MDA5 does not stick non-specifically to the Streptavidin resin.

      We appreciate the referee's suggestions. We reran this experiment and added a non-biotin-labeled RNA IP control group, and the results showed that MDA5 did not adsorb non-specifically onto the beads (Figure 4B). In addition, based on the referee's suggestion, we tested the depletion of RNA before and after immunoprecipitation, and the results showed that biotin-labeled RNA, rather than non-biotin-labeled RNA, could be adsorbed by the beads, indicating that the RNA precipitation was successful. However, we think that this is not necessary for the final presentation of the experimental results, so we did not show it in the figure.

      (2) Figure 4B - It is unclear why there is such a large molecular weight difference between endogenous MDA5 and MDA5-Flag (110 kDa versus 130/140 kDa). Why is there less MDA5-Flag retrieved than endogenous MDA5?

      After careful analysis, we believe that the significant difference in molecular weight between endogenous MDA5 and MDA5-Flag may be due to three reasons. Firstly, MDA5-Flag has a 3× Flag tag. Secondly, as shown in the primer table, we constructed MDA5 between the NotI and XbaI cleavage sites in the pcDNA3.1 vector, which are located at a posterior position in the vector. This means that the Flag tag is some distance from the start codon of MDA5, and the intervening vector sequences can also be translated and increase the molecular weight of the exogenous MDA5 protein. Finally, to facilitate primer amplification, the F-terminal primers of MDA5 contain a small portion of the 3'UTR sequence (excluding the stop codon). Together, these factors may have led to the significant difference in molecular weight. In addition, to supplement important experimental controls, we have conducted a new RNA pull-down experiment, as shown in Figure 4B.

      (3) Minor point: Figure 4B - please clarify in the figure whether RNA or protein is immunoprecipitated and via which tags.

      We have conducted a new RNA pull-down experiment, as shown in Figure 4B, and we have clearly labeled the relevant information in the figure.

      (4) Figure 4E - the fraction of MDA5 that binds 5'ppp-RNA seems incredibly minor. And why is this experiment done using 5'OH-RNA as a competitor, rather than simply incubating MDA5 and 5'OH-RNA together and demonstrating that these do not form a complex?

      The proportion of MDA5 combined with 5’ppp-RNA is influenced by many conditions, including the concentration and purity of the probe and purified protein. In addition, the dosage ratio between the RNA probe and MDA5 protein in the EMSA experiment can also have a significant impact on the results. Therefore, it is not possible to accurately determine the actual binding force between MDA5 and RNA. In the EMSA experimental program, both cold probes (5’ppp-RNA) and mutated cold probes (5’OH-RNA and 5’pppGG-RNA) are crucial for demonstrating the specific binding between MDA5 and 5’ppp-RNA, as they can exclude false positive errors caused by factors such as the presence of biotin in the purified MDA5 protein itself.

      (5) Figure 4B/4C/4F - These experiments would be strengthened by including an MDA5 mutant that cannot bind to RNA. These mutants are well-described in mammals. If these residues are conserved, it is straightforward to generate this mutant.

      As shown in Figure 3, the MDA5 of M. miiuy has an RD domain that can recognize the SCRV virus. We constructed MDA5-△RD mutant plasmids with 6x His-tags and purified them for EMSA experiments (Figure 4E). The experimental results further indicate that MDA5, rather than MDA5-△RD, can bind to 5’ppp-SCRV (Figure 4G). This further confirms the crucial role of the RD domain in recognizing the 5'ppp-RNA virus.

      (6) Minor point: Figure 4E: please clarify in which lanes MDA5 has been added.

      Thank you for the referee's suggestion. We have synthesized new 5'ppp-RNA probes (5’ppp-SCRV and its dephosphorylated derivatives) and reran this experiment, and the relevant information has been added to the figure (Figure 4F).

      Figure 5

      (1) Figure 5C - As MDA5 is an interferon-stimulated gene (as shown in panel G/H/I)) the increased MDA5 expression could simply explain the increase in the amount of m6A-MDA5 that is immunoprecipitated after infection. Could this figure be improved by doing a fold change between input vs m6A-IP OR uninfected vs SCRV-infected conditions? This would reveal whether the modification of MDA5 with m6A is really increased after infection.

      As shown in Figure 5F below, our data indicate that the proportion of m6A-modified MDA5 does indeed increase after SCRV infection, and that this increase is not solely due to the increased expression of MDA5 itself.

      (2) Figure. 5E/F - The y-axis is unclear: relative MDA5 m6A levels. Relative to what? Input? Mock infected?

      For experiments in Figure 5E/F, we first compared the m6A-IP group with the input group, and then normalized the control group (IgG group of 5E and Mock group of 5F) to “1”. We have replaced the Y-axis name with a clearer one (Figure 5E and 5F).
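
      The two-step normalization described in this response can be illustrated with a short calculation. The following is only a minimal sketch under stated assumptions: the function name `relative_m6a_levels`, the group labels, and the abundance values are hypothetical and are not taken from the manuscript or its data. It shows why dividing the m6A-IP signal by the input signal separates a change in the methylated fraction from a change in total MDA5 expression.

```python
# Hypothetical illustration of the normalization described above:
# step 1: m6A-IP signal divided by input signal for each group;
# step 2: every ratio divided by the control group's ratio (control becomes 1).

def relative_m6a_levels(ip, input_, control):
    """Return IP/input ratios rescaled so that the control group equals 1.

    ip, input_: dicts mapping group name -> MDA5 transcript abundance
                (arbitrary units; the values used below are invented).
    control:    name of the reference group (e.g. "Mock" or "IgG").
    """
    ratios = {group: ip[group] / input_[group] for group in ip}
    reference = ratios[control]
    return {group: ratio / reference for group, ratio in ratios.items()}


# Invented numbers: total MDA5 (input) doubles after infection,
# while the m6A-IP signal rises sixfold.
example_ip = {"Mock": 2.0, "SCRV": 12.0}
example_input = {"Mock": 8.0, "SCRV": 16.0}

levels = relative_m6a_levels(example_ip, example_input, control="Mock")
print(levels)
```

      With these invented values, the input doubles but the normalized m6A level of the infected group is three times that of the control, so the rise in IP signal is not explained by expression alone. Real data would of course require replicates and statistical testing.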

      (3) General comment - It is not mentioned in the text that MDA5 is an interferon-stimulated gene. This would account for the increase in expression (qPCR) after viral infection or poly(I:C) transfection, hence there is no novelty in this finding. In addition, the authors suggest that MDA5 increases at the protein level (by immunoblot) but the increase on these blots is not convincing (figure 5H/5I).

      We understand that the increase in expression of MDA5 as an interferon-stimulated gene after viral infection is a common phenomenon. We present this to further validate the m6A sequencing transcriptome data, and to demonstrate that although m6A modification interferes with MDA5 expression during viral infection, it cannot prevent the increase in the mRNA level of MDA5. In addition, we reran the experiment, and the results showed that the expression of MDA5 protein can indeed be specifically activated by the SCRV virus and poly(I:C)-HMW.

      Figure 6

      (1) Figure 6E - What was the MOI of the virus used in this experiment? It is not mentioned in the figure legend.

      The MOI was 5; we have added this information to the figure legend.

      Figure 7

      (1) Figure 7J - This graphic is somewhat misleading and should be altered to better reflect the conclusions that are drawn in the manuscript. The graphic suggests that MAVS and STING interact, but this is not demonstrated in the paper. In addition, the paper does not demonstrate whether MAVS or STING (or both) are needed downstream of MDA5 to relay signalling. Finally, please draw an arrow from type I IFNs to increased expression of MDA5 to illustrate that MDA5 is an ISG.

      Thank you for the referee's suggestion. We have revised the image to match the conclusions of the manuscript more accurately (Figure 7J). Firstly, we have separated the STING protein from the MAVS protein. Secondly, arrows have been used to indicate that MDA5 is an IFN-stimulated gene. Finally, we have added relevant experiments to demonstrate the importance of the MITA protein in the signaling process of the MDA5-activated IFN response. In addition, the function of MAVS in binding to MDA5 protein and promoting its signal transduction is very conserved, and there is a good research background even in fish with RIG-I deficiency (doi: 10.1016/j.dci.2021.104235). Therefore, in Figure 7J, we still chose to show MAVS bound to MDA5 protein as a downstream signal transducer of MDA5.

      Discussion

      (1) There is very little discussion about METTL and YTHDF proteins in the discussion despite the fact that the last 2 figures are entirely devoted to these proteins.

      Based on the referee's suggestion, we have added relevant content about METTL and YTHDF proteins in the discussion. In addition, the basic mechanism and function of METTL and YTHDF proteins were briefly described in the introduction.

      Reviewer #3 (Recommendations For The Authors):

      Please refer to the specific suggestions and recommendations. They include proposals for experimental additions, improved methodologies, and suggestions to resolve writing-related concerns.

      Major concerns

      (1) I suggest changing the article title to "Functional Replacement of RIG-I with MDA5 in Fish Miiuy Croaker", or a similar title, to make it more focused and closely aligned with the content of the article.

      Following the reviewer's recommendations, we have revised the title to emphasize that our primary research subject is a teleost fish that lacks RIG-I. In addition, we have changed “5’ppp-RNA” to “5’ppp-RNA virus” to emphasize the interaction between the virus and the receptor. We believe that the revised title is more in line with the content of the article.

      (2) Due to the inherent limitations in genome sequencing, assembly, and annotation for the Miiuy croaker, comprehensive annotation of immune-related genes remains incomplete. To address this critical gap, it is recommended that authors establish experimental protocols, such as Fluorescence In Situ Hybridization (FISH), to confirm the absence of RIG-I in the Miiuy croaker. They should simultaneously employ MDA5 probes as a positive control for validation purposes.

      The miiuy croaker has good chromosome-level genomic information (doi: 10.1016/j.aaf.2021.06.001). In addition, studies have shown that RIG-I is absent in the order Perciformes (doi: 10.1016/j.fsirep.2021.100012), and the miiuy croaker belongs to this order, so it has indeed lost the RIG-I gene. Therefore, we do not intend to use FISH technology to prove this.

      (3) Similarly, it is recommended that the authors first provide evidence of the presence of 5'ppp at the 5' terminus of the genome RNA of SCRV, as demonstrated in the study by Goubau et al. (doi: 10.1038/nature13590, Supplementary figure 1). This evidence is crucial before drawing conclusions about the compensatory role of MDA5 in recognizing 5'ppp RNA viruses, using SCRV as the viral model.

      As suggested by the referee, we extracted SCRV RNA from SCRV virus particles and assessed the 5’-phosphate dependence of stimulation by SCRV RNA. Calf intestinal phosphatase (CIAP) treatment substantially reduced the stimulatory activity of SCRV RNA in MKC cells of M. miiuy (Figure 1C and 1E). In addition, similar results were obtained by transfecting VSV RNA isolated from VSV virus into DF-1 cells of G. gallus (Figure 1D). This evidence confirms that the RNAs of both SCRV and VSV carry the triphosphate molecular feature, and indicates that birds and fish lacking RIG-I have other receptors that can recognize 5’ppp-RNA.

      (4) The 62-nucleotide (nt) 5'ppp-RNA utilized in this study was obtained from Vesicular Stomatitis Virus (VSV). In order to provide direct evidence, it is necessary to include a 62-nt 5'ppp-RNA that is directly derived from SCRV itself.

      We adopted this suggestion and synthesized a 67-nucleotide 5’ppp-SCRV RNA probe. We found that 5’ppp-SCRV activates dimerization of IRF3 and binds to MDA5 of M. miiuy in a 5’-triphosphate-dependent manner (Figure 4A-4F).

      (5) Given that RNAs with uncapped diphosphate (PP) groups at the 5′ end also activate RIG-I, similar to RNAs with 5′-PPP moieties, and the 5′-terminal nucleotide must remain unmethylated at its 2′-O position to allow RNA recognition by RIG-I, it is necessary for the authors to conduct additional experiments to supplement and validate these two distinguishing features of RIG-I in RNA recognition. This will provide more reliable evidence for the replacement of RIG-I by MDA5 in RNA recognition.

      Thank you for the reviewer's professional suggestions. We understand that exploring the binding of 5’pp-RNA and 2′-O-methylated RNA to MDA5 could further demonstrate the alternative function of MDA5. However, we think that the use of 5’ppp-RNAs and their dephosphorylated derivatives fully demonstrates that the MDA5 of M. miiuy and G. gallus has evolved to recognize the 5’-triphosphate structure, like human RIG-I. Therefore, we do not intend to conduct any additional experiments.

      (6) In section 2.3, the authors assert that Miiuy croaker recognizes SCRV through its RD domain. This claim is supported by their data showing that cells overexpressed with the MDA5 ΔRD mutant lost the ability to inhibit SCRV replication. As a result, the authors draw the conclusion that "these findings provide evidence that MDA5 may recognize 5'-triphosphate-dependent RNA (5'ppp-RNA) through its RD domain." However, to strengthen their argument, the authors should first demonstrate that during SCRV infection, MDA5-mediated antiviral immune response is indeed initiated by recognizing the 5'ppp part of the SCRV RNA, rather than the double-strand part (which can exist in ssRNA virus) of the viral RNA, as this is naturally a ligand for MDA5. Additionally, the authors should treat the isolated SCRV RNA with CIP to remove the phosphate group and examine the binding of MDA5 with SCRV RNA before and after treatment. They should also transfect CIP-treated or untreated SCRV RNA into MDA5 knockdown and wild-type MKC cells to investigate the induction of antiviral signaling and levels of viral replication. Finally, the authors should verify the binding ability of the mutants with isolated SCRV RNA, with or without CIP treatment, to determine which domain of MDA5 is responsible for SCRV 5'ppp-RNA recognition.

      We understand the reviewer's concern that MDA5 may be activated by binding to dsRNA in the SCRV virus. Based on the reviewer's suggestion, we extracted SCRV RNA and obtained its dephosphorylated form using calf intestinal phosphatase (CIAP). Next, we transfected both RNAs into MDA5-knockdown and wild-type MKC cells and detected the dimerization of IRF3 and the IFN response. The results indicate that SCRV RNA does indeed activate immunity in a triphosphate-dependent manner, and that knockdown of MDA5 prevents immune activation by SCRV RNA (Figure 1C and 1E, Figure 2P and 2Q). Finally, we synthesized a 5'ppp-SCRV RNA probe and demonstrated that MDA5 binds to 5'ppp-SCRV through the RD domain (Figure 4E-4G). We believe that these results better demonstrate that MDA5 recognizes 5’ppp-RNA through its RD domain and address the reviewer's concerns.

      (7) Similarly, merely presenting Co-IP data demonstrating the interaction between Miiuy croaker MDA5 and STING in overexpressed EPC cells does not justify the claim that "in vertebrates lacking RIG-I, MDA5 can utilize STING to facilitate signal transduction in the antiviral response". This is because interactions observed through overexpression may not accurately reflect the events occurring during viral infection or their actual antiviral functions. To provide more robust evidence, it is essential to conduct functional experiments after STING knockout (or at least knockdown). Furthermore, it is important to note that Miiuy Croaker alone cannot adequately represent all "vertebrates lacking RIG-I".

      We found that co-expression of STING and MDA5 can enhance MDA5-mediated IFN-1 response during SCRV virus infection, while knocking down STING can restore MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the immune response of MDA5 to RNA viruses. In addition, loss of RIG-I is a common phenomenon in vertebrates, and STING of birds such as chickens (doi: 10.4049/jimmunol.1500638) and mammalian tree shrews (doi: 10.1073/pnas.1604939113) can also bind to MDA5, indicating that STING can indeed play a crucial role in MDA5 signaling in species with RIG-I deficiency. We have added this section to our discussion and elaborated on our observations in more cautious language.

      (8) In the manuscript, a series of experiments were conducted using an antibody (Beyotime Cat# AF7164) against endogenous MDA5. The corresponding immunogen for this MDA5 antibody is a recombinant fusion protein containing amino acids 1-205 of human IFIH1/MDA5 (NP_071451.2). However, the amino acid sequences of IFIH1/MDA5 differ substantially between humans and Miiuy croaker, which could introduce errors in the results. Therefore, it is essential to employ antibodies specifically designed for targeting Miiuy croaker's own MDA5 in the experiments.

      As shown in Figure 2B, endogenous MDA5 antibodies can detect the MDA5 portion that is forcibly overexpressed by plasmids, suggesting that the MDA5 antibody can indeed specifically recognize the MDA5 protein of M. miiuy.

      (9) It is recommended to investigate the phosphorylation of IRF3 in order to confirm the downstream signaling pathway during viral infection when MDA5 is knocked down or overexpressed.

      Due to the lack of available phosphorylation antibodies for fish IRF3, we used IRF3 dimer experiments to detect downstream signaling (Figure 1C and 1D, Figure 2P, Figure 4C and 4J).

      (10) The use of poly I:C as a mimic for dsRNA to investigate MDA5's recognition of 5'ppp-RNA in hosts lacking RIG-I, as well as the examination of the regulatory role of MDA5 m6A methylation upon activation by 5'ppp-RNA, may be inappropriate. Poly I:C does not possess 5'ppp, and while it has been identified as a ligand for MDA5 in various studies, MDA5 cannot serve as a substitute for RIG-I in recognizing poly (I:C). Therefore, the authors should utilize 5'ppp-dsRNA as the mimic and include the corresponding 5'ppp-dsRNA control without a 5'triphosphate as the negative control (both available from InvivoGen). This approach will specifically elucidate the mechanisms involved when MDA5 functions similarly to RIG-I in the recognition of 5'ppp-RNA.

      In our study, we used poly(I:C)-HMW, a known dsRNA mimetic that can be preferentially recognized by MDA5 rather than RIG-I, as a positive control for activating MDA5. What we want to demonstrate is that, like poly(I:C)-HMW (positive control), SCRV can also promote MDA5-mediated IFN immunity, further indicating the important role of MDA5 in 5’ppp-RNA virus invasion. We have clearly labeled the type of poly(I:C) in the figures and legends to avoid misunderstandings for readers.

      (11) In Figure 2, Figure 3, and Figure 6, the appearance of virus plaques is not readily apparent, and it is necessary to replace these images with clearer photographs. It appears that MKC or MPC cells are not appropriate for conducting plaque assays. To accurately assess viral proliferation, the authors should measure key indicators throughout the process, such as the production of positive-strand RNAs (+RNAs), replication intermediates (RF), and transcription of subgenomic RNAs. This approach is preferable to solely measuring the M and G protein genes from the virus genome as positive results can still be observed in contaminated cells.

      As pointed out by the reviewer, the virus plaque images in Figure 2K and Figure 3D were not clear enough, so we have replaced them with new, clearer images (Figure 2J and Figure 3D). However, we think that the other images clearly display the proliferation of the SCRV virus, so we did not replace them. In addition, the primers we currently use do measure +RNA, so the replication level of the SCRV virus can be accurately evaluated without being affected by virus contamination. Because the regions targeted by the two primer pairs lie in the SCRV-M and SCRV-G protein genes, we label them as SCRV-M and SCRV-G to distinguish between the two primer pairs. To avoid reader misunderstanding, we have modified the Y-axis label in the figures (Figure 2I and 2K, Figure 3E, Figure 6E and 6O).

      (12) There is a substantial disparity in the molecular size of M. miiuy MDA5 between endogenous and exogenously expressed proteins, as shown in Figure 2A and 2C-D. Please provide clarification.

      Please refer to the response to Reviewer 2's question regarding Figure 4B above.

      (13) The manuscript incorporates the evolutionary perspective, but lacks specific evolutionary analysis. Thus, it is essential to include relevant analysis to comprehend the evolutionary dynamics and positive selection on MDA5 and LGP2 in the absence of RIG-I in Miiuy croaker. This can be achieved through theoretical calculations using appropriate algorithms, such as the branch models and branch-site models based on the maximum-likelihood method implemented in the phylogenetic analysis by maximum likelihood (PAML) package.

      In fact, we have analyzed the molecular evolution of MDA5 and LGP2. Unfortunately, even when analyzing only the MDA5/LGP2 CDS sequences in fish, we found that the topologies of gene trees of MDA5/LGP2 were largely consistent with the species tree. Thus, species with or without RIG-I in the gene trees cannot effectively separate clusters, making it extremely difficult to analyze the molecular evolution of MDA5/LGP2 caused by RIG-I deficiency. Consequently, we gave up this aspect of analysis.

      (14) If the narrative regarding m6A methylation goes beyond the activation of MDA5 through recognition of 5'ppp-RNA and represents a regulatory mechanism for all MDA5 activation events, it is not relevant to the theme of "An arms race under RIG-I loss: 5'ppp-RNA and its alternative recognition receptor MDA5." Therefore, all investigations in this paper should focus solely on events when MDA5 recognizes 5'ppp-RNA. Any data associated with the broader regulatory mechanisms and m6A methylation of MDA5 should be excluded from this manuscript and instead be included in a separate study dedicated to exploring this specific topic.

      Our theme aims to showcase RNA viruses, rather than an interaction between 5'ppp-RNA and host virus receptors, which our current topic cannot accurately express. Therefore, we made two main changes. First, we limited the study species to M. miiuy, although some studies on the functional substitution of MDA5 for RIG-I involved birds. Second, we changed “5’ppp-RNA” to “5’ppp-RNA virus”. We believe that the revised title is more in line with our current research content.

      (15) The running title appears to be hastily done.

      We modified it to “MDA5 recognizes 5’ppp-RNA virus in species lacking RIG-I”.

      (16) There are many descriptions that are not strongly related to the main theme of the article in the introduction section, making it lengthy and fragmented. Please focus on the research background of RIG-I and MDA5, including their structures, functions, and regulatory mechanisms, as well as the research progress on the compensatory effect of MDA5 in the absence of RIG-I and its evolutionary adaptation mechanism in other species.

      Based on the suggestions of the reviewers, we have removed some of the less relevant content in the introduction and added research progress on the compensatory effect of MDA5 in the evolutionary adaptation mechanism of tree shrews in the absence of RIG-I.

      (17) Lines 149-156 in the "Results" section include content that resembles an "Introduction." It is important to avoid duplicating information in the results section. Therefore, the authors are encouraged to revise this paragraph to ensure conciseness in the article.

      We have streamlined this section to enhance the article's conciseness and clarity.

      (18) In the "Results" section, at line 177, the authors assert, "As depicted in Figure 1F-1H," which should be corrected to Figure 2F-2H. Furthermore, the y-axis of the two figures on the right-hand side of Figure 2H represents the ISG15 genes. At line 182, "as demonstrated in Figure 1I-1L," should be revised as "as illustrated in Figure 2I-2L". The authors demonstrated a lack of attention to detail.

      We thank the reviewer for pointing out these errors; we have made the necessary corrections.

      (19) In lines 197-198, the authors stated that "MDA5-ΔRD showed an inability to interact with SCRV." However, Figure 3D did not reveal any significant difference, thus it is advisable to repeat this experiment at least once.

      We have replaced this virus spot image with a new one (Figure 3D).

      (20) In lines 200-201 of the "2.3 RD domain is required for MDA5 to recognize SCRV" section, the authors report that the expression of antiviral genes was induced by the overexpression of both MDA5 and MDA5-ΔRD, even in the absence of infection (Figure 3F). Why does the expression of antiviral genes increase in the absence of viral RNA stimulation? Please provide a reasonable explanation.

      In the absence of viral infection, overexpression of viral receptor proteins may still transmit erroneous signaling, affecting the body's immunity. We speculate that because MDA5 and MDA5-ΔRD both retain the CARD domain, they can still induce the expression of antiviral factors without ligands, although this induction is much weaker than that caused by viral infection. However, to better demonstrate the function of the RD domain of MDA5 in M. miiuy, we have changed the experimental plan, as shown in Figure 3F. We detected the induction of antiviral factors by overexpression of MDA5 and MDA5-ΔRD under poly(I:C)-HMW stimulation. This indicates that the RD domain has a conserved function in the recognition of poly(I:C)-HMW in M. miiuy, and can serve as a positive control for the recognition of SCRV virus invasion by the RD domain of MDA5.

      (21) Please provide the GenBank accession number of M. miiuy MDA5.

      The GenBank accession number of M. miiuy MDA5 has been added to section 4.5 (Plasmids construction).

      (22) The content of lines 228-233 in the "Results" section bears resemblance to that of the "Introduction." To ensure the avoidance of information duplication, it is recommended to remove this paragraph from the results section.

      This section has been streamlined.

      (23) The bands of mmiMDA5 in the 5'ppp-RNA and dsRNA lanes in Figure 4B are weak and almost unobservable. Please replace them with clear images.

      We have rerun this experiment and replaced the images (Figure 4B).

      (24) In Figure 5G and at line 253, there are only results presented for the SCRV infection group, while no results are shown for the control group. This raises the question of why the control group results are missing. It is necessary to provide a reasonable explanation or correction for this issue.

      The "0 h" infection time point of the SCRV virus is the control group, and we have replaced it with a more intuitive image (Figure 5G).

      (25) In Figure 7C, it would be necessary to include the western blot result of YTHDF protein expression in order to verify the efficiency of YTHDF siRNA.

      In fact, we have attempted to detect the endogenous expression of YTHDF protein using available commercial antibodies. Unfortunately, only the YTHDF2 antibody can specifically recognize the endogenous protein expression of YTHDF2 in M. miiuy. In addition, the knockdown effect of si-YTHDF2 has been validated by YTHDF2 antibody (doi: 10.4049/jimmunol.2200618).

      (26) In line 422 of the "4.3 Cell culture and treatment" section, the paragraph raises a question regarding the nature of Miiuy croaker kidney cells (MKCs) and spleen cells (MPCs) - whether they are cell lines or freshly isolated cells (or primary cultures) derived from kidney and spleen tissues. If these cells are indeed cell lines, it is requested to provide detailed information about the sources and properties of the cells (such as whether they are epithelial cells or other mixed cell types) and the generations of propagation. Alternatively, if the cells were freshly isolated or primary cultures obtained from fish, the method for cell isolation should be provided. The source and stability of cells are extremely important for ensuring the repeatability and reliability of experimental outcomes.

      M. miiuy kidney cells (MKCs) and spleen cells (MPCs) are cell lines derived from the kidney and spleen tissues of M. miiuy, with passages ranging from 20 to 40 times. These details have been incorporated into section 4.3.

      (27) There are many inaccurate descriptions in the text, which employ concepts that are too broad. These descriptions need to be narrowed down to specific species or objects. Here are a few examples, along with the necessary revisions. Other similar instances should also be revised accordingly. For instance, in line 119, "fish MDA5" should be changed to "Miiuy croaker MDA5." Similarly, in line 166, "fish MDA5-mediated signaling pathway" should be changed to "Miiuy croaker MDA5-mediated signaling pathway." In line 174, "fish MDA5" should be revised to "Miiuy croaker MDA5." Additionally, in line 185, "antiviral responses of teleost" should be changed to "antiviral responses of Miiuy croaker." In line 197, "interact with SCRV" should be revised to "interact with 5'ppp-RNA of SCRV." In line 337, "loss of RIG-I in the vertebrate" should be modified to "loss of RIG-I in Miiuy croaker and chicken." Similarly, in line 338, "MDA5 of fish" should be changed to "MDA5 of Miiuy croaker." Lastly, in line 348, "RIG-I deficient vertebrates" should be revised to "RIG-I deficient Miichthys miiuy and Gallus gallus."

      Thank you for the reviewer's suggestions. We have made revisions to these inaccurate descriptions and reviewed the entire manuscript to address similar statements with broad concepts.

      (28) Finally, it should be noted that a similar discovery has already been reported in tree shrews (Ling Xu, et al., Proc Natl Acad Sci., 2016, 113(39):10950-10955). This article shares similarities with that research report, therefore it is necessary to discuss in detail the relationship between the two in the discussion and compare and analyze the evolutionary patterns of MDA5 from it.

      Based on the reviewer's suggestions, we have compared the similarities and differences between these two reports during the discussion and analyzed the evolutionary dynamics of MDA5 in these vertebrates lacking RIG-I.

      Minor concerns:

      We thank the reviewer for their meticulous examination of our manuscript. We have made revisions in response to the following suggestions.

      (1) At line 120, the sentence "SCRV(one 5'ppp-RNA virus)" should have a space between "SCRV" and "(one 5'ppp-RNA virus)". Please make this correction.

      Corrected.

      (2) At lines 147-148, the sentence "However, the downstream gene of TOPORSa is missing a RIG-I" is not accurate and needs modification.

      We have modified this sentence.

      (3) At line 184, "findings indicate" should be corrected to "findings indicated".

      Corrected.

      (4) At line 189, "a 5'ppp-RNA virus" should be deleted and the text seems redundant.

      Deleted.

      (5) At line 198, "replication. (Figure 3C-3E)", please remove the punctuation between "replication" and "(Figure 3C-3E)".

      Corrected.

      (6) At line 416 in "Materials and methods" section, "4.2 Sample and challenge" should be corrected to "4.2 Fish and challenge".

      Corrected.

      (7) At line 419, the authors state that "The experimental procedure for SCRV infection was performed as described", please briefly describe the SCRV infection method and the infectious dose.

      Based on the reviewer's suggestions, we have added relevant descriptions of SCRV infection in section 4.2.

      (8) There are several formatting issues in the "Materials and Methods" section. For instance, in line 424, there is no space between the number and letter in "100 μg/ml" and "26 ℃" should be corrected to "26℃". Additionally, in line 430, "Cells" should be corrected to "cells".

      Corrected.

      (9) At line 446, "50 ng/ul" and "100 mU/ul" should be corrected to "50 ng/μl" and "100 mU/μl".

      Corrected.

      (10) At line 459, "primers 1)" should be corrected to "primers".

      Corrected.

      (11) At lines 461-464, the description "For protein purification, MDA5 plasmids with 6× His tag was constructed based on pcDNA3" seems to be no direct logical connection between protein purification and the plasmid construction. Please make the necessary corrections.

      Corrected.

      (12) At line 548, "cytoplasmic" should be corrected to "Cytoplasmic".

      Corrected.

      (13) At line 549, "5× 10⁷" should be corrected to "5 × 10⁷".

      Corrected.

      (14) At line 557, "MgCl2" should be corrected to "MgCl₂".

      Corrected.

      (15) At line 558, "6 %" should be corrected to "6%".

      Corrected.

      (16) At line 565, "50μg" should be corrected to "50 μg".

      Corrected.

      (17) At line 571, "300±50 bp." should be corrected to "300 ± 50 bp."

      Corrected.

      (18) At lines 592-593, the sentence "After several incubations, the m6A level was quantified colorimetrically at a wavelength of 450 nm" does not read smoothly, please improve it.

      Revised.

      (19) At line 786, "MDA5 recognize" should be corrected to "MDA5 recognized".

      Corrected.

      (20) At lines 788 and 798, "Pulldown" should be corrected to "Pull-down".

      Corrected.

      (21) At lines 790 and 796, "bluestaining" should be corrected to "blue staining".

      Deleted.

      (22) At line 825, "SCRV and infection" should be corrected to "SCRV infection".

      Corrected.

      (23) At lines 826-827, "SCRV (H) and poly(I:C) (I) infection" should be corrected to "SCRV infection (H) and poly(I:C) stimulation (I)".

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work shows, based on basic laboratory investigations of in vitro-grown bacteria as well as human bone samples, that conventional bacterial culture can substantially underrepresent the quantity of bacteria in infected tissues. This has often been mentioned in the literature; however, relatively limited data have been provided to date. This manuscript compares culture to a digital droplet PCR approach, which consistently showed greater levels of bacteria across the experiments (and for two different strains).

      Strengths:

      Consistency of findings across in vitro experiments and clinical biopsies. There are real-world clinical implications for the findings of this study.

      Weaknesses:

      No major weaknesses. Only three human samples were analyzed, although the results are compelling.

      We included only three clinical examples to showcase the application of this method, particularly to osteomyelitis. For further validation, larger cohort studies are required, and these are currently underway.

      Reviewer #2 (Public Review):

      In this study, the authors address discrepancies in determining the local bacterial burden in osteomyelitis between that determined by culture and enumeration by DNA-directed assay. Discrepancies between culture and other means of bacterial enumeration are long established and highlighted by Staley and Konopka's classic, "The great plate count anomaly" (1985). Here, the authors first present data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in S. aureus strains infecting osteocyte-like cells. They go on to demonstrate PCR evidence that S. aureus can be detected in bone samples from sites meeting a widely accepted clinicopathological definition of osteomyelitis. They conclude their approach offers advantages in quantifying intracellular bacterial load in their in vitro "co-culture" system.

      The publication “The great plate count anomaly” (1985) has been added to the revised version as new reference #2.

      Weaknesses

      - My main concern here is the significance of these results outside the model osteocyte system used by this group. Although they carefully avoid over-interpreting their results, there is a strong undercurrent suggesting their approach could enhance aetiologic diagnosis in osteomyelitis and that enumeration of the infecting pathogen might have clinical value. In the first place, molecular diagnostics such as 16S rDNA-directed PCR are well established in identifying pathogens that don't grow. Secondly, it is hard to see how enumeration could have value beyond in vitro and animal model studies since serial samples will rarely be available from clinical cases.

      Indeed, we initiated this study to try to improve the diagnostic outcomes for osteomyelitis, in particular that associated with prosthetic joint infection (PJI) but also all other forms, as the current gold-standard diagnostic approaches for this type of infection, either bacterial culture or whole genome sequencing, are very time consuming and costly, and yet are not necessarily accurate. The benefits of our method include (but are not limited to) absolute quantification of bacterial load within a shortened time period (in the order of hours) in clinical bone specimens from infected patients. Many of the bacterial species identified in patients could not be diagnosed by standard bacterial culturing. Moreover, one of the problematic features of treating bone infection is that repeated surgeries are usually needed, particularly in PJI; hence, serial clinical bone specimens from the same patient are in fact often available. Therefore, our method of quantifying bacterial load offers the advantage of monitoring the infection status throughout the treatment journey. In this study, we chose the tuf gene as the target sequence to amplify the bacterial signal instead of the well-established 16S PCR because tuf provides much better sequence discrimination between bacterial species. Therefore, the short PCR amplicon of just 271 bp used in our study is able to give us a highly accurate taxonomic readout. By this approach, we again shorten the time required for diagnosis. In the last paragraph of the Discussion in the revised manuscript, extra text, a figure demonstrating the strong sequence diversity in tuf (Supplementary Figure 2) and an additional reference have been added to address the Reviewer’s concerns.
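
      As an illustration of what absolute quantification by droplet digital PCR involves, the following minimal sketch applies the standard Poisson correction to droplet counts. It is not the authors' analysis pipeline: the function name, the default droplet volume of 0.85 nL, and the example counts are assumptions introduced here for illustration only.

```python
import math


def ddpcr_copies_per_ul(positive: int, total: int,
                        droplet_volume_nl: float = 0.85) -> float:
    """Estimate target copies per microlitre of reaction from droplet counts.

    Assumes targets are distributed randomly across droplets (Poisson),
    so the mean copies per droplet is -ln(fraction of negative droplets).
    The default droplet volume is an assumed value, not taken from the study.
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total and total > 0")
    fraction_negative = (total - positive) / total
    copies_per_droplet = -math.log(fraction_negative)
    # Convert droplet volume from nanolitres to microlitres.
    return copies_per_droplet / (droplet_volume_nl * 1e-3)


# Hypothetical run: 2,000 positive droplets out of 15,000 accepted droplets.
print(round(ddpcr_copies_per_ul(2000, 15000), 1), "copies/uL (illustrative)")
```

      Because the estimate depends only on the fraction of negative droplets, no standard curve is needed, which is what makes the readout absolute rather than relative.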

      - I have further concerns regarding the interpretation of the combined bacterial and host cell-directed PCRs against the CFU results. Significance is attached to the relatively sustained genome counts against CFU declines. On the one hand, it must be clearly recognised that the detection of bacterial genomes does not equate to viable bacterial cells with the potential for further replication or production of pathogenic factors. Of equal importance is the potential contribution of extracellular DNA from lysed bacteria and host cells to these results. The authors must clarify what steps, if any, they have taken to eliminate such contributions for both bacteria and host cells. Even the treatment with lysostaphin may have coated their osteocyte cultures with bacterial DNA, contributing downstream to the ddPCR results presented.

      We agree that concerns around the interpretation of any molecular readout need to be taken into account. We have yet to find a method that can definitively identify bacterial viability in a clinical setting in the absence of culture. However, PJI and osteomyelitis in general are characterised by a high percentage of culture-negative infection cases, calling for such molecular approaches. Commercially available, so-called “live/dead” bacterial PCR reagents exist that act as PCR signal inhibitors by penetrating the cell wall of compromised cells to prevent the PCR signal being generated from those cells. In our experience, while these can provide a certain level of added scrutiny in an experimental setting, they are not definitive because the reaction is often incomplete even in an idealised situation, and the reagent may also cancel signal from viable bacteria growing under conditions of stress, such as during antimicrobial treatment and host-derived stress imparted in intracellular or intra-tissue environments. Indeed, such stresses are likely contributors to clinical non-culturability. Whole genome sequencing would provide more certainty of bacterial viability by demonstrating genomic intactness but, as we discuss herein, this is a lengthy and costly process, and one which may prove difficult from host tissue with a low pathogen load. It should be noted that the significance of any diagnostic readout, including from culture, WGS or our method reported here, would need to be interpreted by the treating clinical team. We would argue that a rapid, practical molecular diagnostic method in the absence or even presence of culture would provide treating clinicians with an improved rationale for tailoring antimicrobial treatments.

      Strengths

      - On the positive side, the authors provide clear evidence for the value of the direct buffer extraction system they used as well as confirming the utility of ddPCR for quantification. In addition, the successful application of MinION technology to sequence the EF-Tu amplicons from clinical samples is of interest.

      - Moreover, the phenomenology of the infection studies indicating greater DNA than CFU persistence and differences between the strains and the different MOI inoculations are interesting and well-described, although I have concerns regarding interpretation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful for the comments and suggestions from the Editor and Reviewers about our manuscript submitted to the eLife Journal. We have addressed all the comments, and we think these modifications will help bring clarity to our message and be helpful to your readership. Here we include an outline of the corrections performed, as well as a detailed response to each of the reviewer’s comments.

      As per the Editor and Reviewers suggestions, outline of corrections:

      ·        The title of the manuscript has been changed to reflect a more conservative conclusion.

      ·        Changes in the main manuscript text were made to enhance clarity, including the use of genetic terminology and naming.

      ·        Specific responses to some comments from the reviewers are included in this document. We combined some comments that would be better addressed together.

      Accompanying this letter is an updated version of our manuscript with the track changes feature enabled. Again, we are thankful for the comments and suggestions we received, and we hope this revised version of our manuscript will be accompanied by an updated assessment and public reviews and a final eLife Version of Record.

      Response to the public review and minor recommendations.

      From Reviewer #1:

      The major inference of the work is that SIV infection of gorillas drove the observed diversity in gorilla CD4. This is supported by the majority of SNPs being localized to the CD4 D1, which directly interacts with the envelope, and the demonstrated functional consequences of that diversity for viral entry. However, SIVgor (to the best of my knowledge) only infects Western lowland gorillas (Gorilla gorilla gorilla), and one Gorilla gorilla diehli and three Gorilla beringei graueri individuals were included in the haplotype and allele frequency analyses. The presence of these haplotypes or the presence of similar allele frequencies in Eastern lowland and mountain gorillas would impact this conclusion. It would be helpful for the authors to clarify this point.

      From Reviewer #1 (minor comment):

      Which subspecies of gorilla are the nsSNPs coming from? Gorilla gorilla diehli [n =1]; Gorilla beringei graueri [n = 3]) are not extant reservoirs of SIV and to my knowledge are not thought to have been, and so it's important to point out where the diversity is coming from if the authors are asserting that SIVgor drove this population-level diversity in gorilla CD4.

      We initially included genomic data from all the gorilla individuals available to maximize sensitivity to identify allelic variants. Although evidence points to eastern gorillas not being currently infected with SIV, our results show that all allelic variants identified have differential susceptibility to the HIV-1 and SIVcpz strains tested. The allelic variants we identified with this genomic data set match the variants identified by Russell et al (doi.org/10.1073/pnas.2025914118), including the ones found in eastern gorillas, and recapitulate that those variants have differential susceptibility to lentiviral entry, similar to the variants of western populations. Whether eastern gorillas have been exposed to lentiviruses in the past remains unknown.

      From Reviewer #1:

      The authors appear to use a somewhat atypical approach to assess intra-population selection to compensate for relatively small numbers of NHP sequences (Fig. 6). However, they do not cite precedence for the robustness of the approach or the practice of grouping sequences from multiple species for the endemic vs other comparison. They also state in the methods that some genes encoded in the locus were removed from the analysis "because they have previously been shown to directly interact with a viral protein." This seems to undercut the analysis and prevents alternative explanations for the observed diversity in CD4 (e.g., passenger mutations from selection at a neighboring locus).

      Given the nature of our samples, to detect any influence of natural selection acting on CD4, we chose to compare patterns of molecular evolution of CD4 to those of its neighboring loci. Comparisons of molecular evolution signatures across genomic regions are the basis of methods to detect positive selection (e.g., Sabeti DOI: 10.1038/nature01140). For our comparison, the neighboring loci represent our neutral standard for the genomic region in which CD4 resides. Our rationale is that demographic and neutral influences on the number and frequency of polymorphic sites in a region would equally affect all loci in that genomic region. Because these neighboring loci are our neutral benchmark, we excluded from the analysis other genes in this genomic region that interact with viruses. The logic is that these loci may be evolving under the influence of positive selection and would decrease the power of our comparison. None of the excluded loci are direct neighbors of CD4. This, together with the fact that the CD4 genomic region in humans has an average recombination rate, reduces the possibility that what we are observing at CD4 is due to selection acting at a neighboring locus. In addition, the classic population genetic method to detect positive selection, the McDonald-Kreitman test (McDonald DOI: 10.1038/351652a0), was originally presented combining polymorphism data across species. We assume that any effect on levels of diversity created by combining variability between species would equally affect all loci included in the study, not just CD4.
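
      For readers unfamiliar with the McDonald-Kreitman test mentioned above, the following minimal sketch shows its logic: a 2×2 table of nonsynonymous and synonymous counts for polymorphic versus fixed sites, summarised by the neutrality index and tested with Fisher's exact test. This is not the neighboring-loci comparison performed in the study; the function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def mk_test(pn: int, ps: int, dn: int, ds: int) -> tuple[float, float]:
    """Return (neutrality index, p-value).

    pn, ps: nonsynonymous / synonymous polymorphisms within species.
    dn, ds: nonsynonymous / synonymous fixed differences between species.
    A neutrality index below 1 indicates an excess of nonsynonymous
    divergence, which is the pattern expected under positive selection.
    """
    if min(ps, dn, ds) <= 0:
        raise ValueError("ps, dn and ds must be positive")
    neutrality_index = (pn / ps) / (dn / ds)
    return neutrality_index, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, for illustration only.
ni, p = mk_test(pn=2, ps=42, dn=7, ds=17)
print(f"NI = {ni:.3f}, p = {p:.4f}")
```

      Under neutrality the ratio of nonsynonymous to synonymous changes should be similar within and between species, so a significant departure from that expectation is the signal the test looks for.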

      From Reviewer #1:

      Data in Figure 5 is graphed as % infected cells instead of virus titer (TDU/mL). It's unclear why this is the case, and prevents a comparison to data in Figure 2 and Figure 4.

      From Reviewer #1 (minor comment):

      Figure 5: the data presentation is now shown as % infected cells instead of viral titer. This makes it difficult to compare data from Figure 5 to other figures. Can the authors please either justify this change, display data consistently or provide matched data displays as a Supplemental Figure?

      For the experiments presented in Figures 2 and 4, we used different volumes of infecting pseudoviruses, which allowed us to identify the linear range of infection. Then, based on the number of cells plated per experimental replicate, we calculated a virus titer. In follow-up experiments (Fig. 5), we used fixed volumes of virus that would infect ~10-20% of control (wild-type; wt) CD4-expressing cells. Comparisons were then made between wt and mutated CD4s, and these data are best presented in their raw form as percent cells infected. Although this change in method prevents direct comparison between the figures, we focused on the differences observed between the experimental conditions within each experimental panel.
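
      The titer calculation described above can be sketched as follows. This is a minimal illustration that assumes the titer is taken as (fraction of cells infected × cells plated) / virus volume, averaged over wells within the linear range. The function names, the 1-30% bounds used to define the linear range, and the example numbers are assumptions made here and are not taken from the manuscript.

```python
def titer_tdu_per_ml(fraction_infected: float, cells_plated: int,
                     virus_volume_ul: float) -> float:
    """Transducing units per mL estimated from a single infection well."""
    if not 0.0 <= fraction_infected <= 1.0:
        raise ValueError("fraction_infected must be between 0 and 1")
    if cells_plated <= 0 or virus_volume_ul <= 0:
        raise ValueError("cells_plated and virus_volume_ul must be positive")
    # Infected cells divided by the inoculum volume expressed in mL.
    return fraction_infected * cells_plated / (virus_volume_ul / 1000.0)


def mean_titer_in_linear_range(wells, cells_plated: int,
                               low: float = 0.01, high: float = 0.30) -> float:
    """Average the titer over wells whose infected fraction is in [low, high].

    wells: iterable of (virus_volume_ul, fraction_infected) pairs.
    The bounds are assumed values; at higher infection levels multiple
    entry events per cell cause the titer to be underestimated.
    """
    titers = [titer_tdu_per_ml(f, cells_plated, v)
              for v, f in wells if low <= f <= high]
    if not titers:
        raise ValueError("no wells fall within the linear range")
    return sum(titers) / len(titers)


# Hypothetical dilution series: (volume in uL, fraction of GFP-positive cells).
series = [(1, 0.02), (5, 0.10), (25, 0.45)]
print(mean_titer_in_linear_range(series, cells_plated=50_000))
```

      Using a fixed virus volume, as in the follow-up experiments, skips this normalisation step, which is why those data are reported directly as percent infected cells.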

      From Reviewer #1:

      The lack of pseudotyping with SIVgor envelope is a surprising omission from this study, that would help to contextualize the findings.

      From Reviewer #2 (minor comment):

      The inclusion of HIV-1 but not SIVgor strains in Figures 2D/E is somewhat conspicuous since chimpanzee alleles certainly differ in susceptibility to SIVcpz (and SIVgor) strains per Russell et al. 2021. The authors should either test some SIVgor infections, cite published data on at least extant human/chimpanzee/gorilla CD4 susceptibility to SIVgor, or address why they did not include it.

      We agree that host susceptibility to SIVgor strains would have been an interesting question to explore. However, we opted to focus on the transmission of SIVcpz strains into gorilla populations for this study. It is worth mentioning that we have cloned SIVgor envelope genes from some strains into our expression system, but we were unable to recover infectious pseudoviruses using an HIV-1ΔEnv-GFP backbone. This suggests that HIV-1 may be incompatible with incorporating SIVgor Env into virus particles. Recently, Russell et al. (DOI: 10.1073/pnas.2025914118) managed to generate SIVgor Env pseudotyped virions using a different backbone (SIVcpzΔEnv-GFP) that was unavailable to us at the time of this study.

      From Reviewer #1:

      Similarly, building gorilla CD4 haplotype SNPs onto the hominin ancestor (as opposed to extant human CD4) may provide additional insights that are meaningful toward understanding the evolutionary trajectory of gorilla CD4.

      We decided to use the extant human CD4 as a backbone to test the effects of the individual amino acid variants found in the allelic diversity of the gorilla population, since the human protein is highly susceptible to all the HIV-1 and SIV strains tested and the expected phenotype is a loss of function. Since the D1 of the human and ancestral sequences for CD4 are almost identical (except for a change that is fixed in gorillas), and they showed similar levels of susceptibility to lentivirus entry, we expect that the phenotypes found would be the same if the gorilla SNPs were built into the ancestral CD4 backbone.

      From Reviewer #2:

      To bolster the argument that lentiviruses are indeed the causative driver of this diversification, which seems likely from a logical perspective but is difficult to prove, Warren et al. pursue two novel lines of evidence. First, the authors reconstruct ancestral CD4 genes that predate lentiviral infection of hominid populations. They then demonstrate that resistance to lentiviral infection is a derived trait in chimpanzees and gorillas, which have been co-evolving with endemic lentiviruses, but not in humans, which only recently acquired HIV. Nevertheless, the derived resistance could be stochastic or due to drift. This argument would be strengthened by demonstrating that bonobo and orangutan CD4, which also do not have endemic lentiviruses, resemble the ancestral and human susceptibility to great-ape-infecting lentiviruses.

      From Reviewer #2 (minor comment):

      The data presented in Figure 2, showing that chimp and gorilla (but not human) CD4 resistance to lentiviral infection is a derived trait, is very intriguing for suggesting that endemic lentiviruses are the causative driver of CD4 evolution. Nevertheless, this could be stochastic or due to genetic drift. Given the later emphasis on several other non-endemically infected species, the authors should at the very least include the sequences for bonobo and orangutan CD4 in the presented alignment (Fig 2B). Ideally, they would also test these orthologs to demonstrate that they are not resistant to lentiviruses infecting great apes (SIVcpz / HIV-1 / SIVgor). If they have also derived resistance, this would suggest a possible other evolutionary driver or genetic drift.

      Based on our analysis of polymorphic sites using available data from populations of apes, we strongly believe the accumulation of resistant polymorphisms in CD4 did not arise in a stochastic manner. The frequency and accumulation of these changes strongly correlate with the function of CD4 as a receptor for lentivirus entry. We agree that experimentally testing the CD4 protein from bonobo and orangutan would strengthen our conclusions; however, based on our genomic analyses, we decided to focus on the species that would present a higher level of variability in susceptibility to the lentiviruses tested, namely gorillas and chimpanzees.

      From Reviewer #2:

      Warren et al. provide a population genetic argument that only endemically infected primates exhibit diversifying selection, again arguing for endemic lentiviruses being the evolutionary driver. The authors compare SNP occurrence in CD4 to neighboring genes, demonstrating that non-synonymous SNP frequency is only elevated in endemically infected species. Moreover, these amino-acid-coding changes are significantly concentrated in the CD4 domain that binds the lentiviral envelope. This is a creative analysis to overcome the problem of very small sample sizes, with very few great ape individuals sequenced. The additional small number of species compared (2-3 in each group) also limits the power of the analysis; the authors could consider expanding their analysis to Old World Monkey species that do or do not have endemic lentiviruses, as well as great apes.

      The scope of this project was to evaluate the differential phenotype of the accumulated polymorphisms found in the ape branch of the primates. Although evaluating the accumulation of polymorphisms in a broader range of primates would generate interesting observations, this would likely require increasing the total number of primate species to include sampling along the speciation tree, many of which lack population level data.

      From Reviewer #1 (minor comment):

      Ancestral reconstruction methods and associated data tables should be included to indicate statistical support for assigned codons. A comment on ambiguity at relevant positions is needed. Similarly, given the polymorphic nature of gorilla and chimpanzee CD4, how confident are the authors in their ancestral reconstructions based on a single representative genome per species? Does this change when you include the broader panel of gorilla sequences? Is the ancestral reconstruction robust to other methods besides PAML?

      We used the PAML software package to reconstruct the ancestral hominin and hominid sequences of CD4 because it is a standard and well-recognized method for this purpose. For this analysis, we used the set of primate sequences selected for positive selection analyses (see methods), namely the longest isoform sequences for each of the available species that best aligned with human CD4. We feel that the best way to perform the ancestral state reconstruction was to use only these curated sequences instead of the population-level sequences, removing potential biases introduced by having different numbers of variants per species.

      From Reviewer #1 (minor comment):

      Page 10: "It seems that allele 2, which doesn't have this glycan, would be at a fitness disadvantage. In support of this, allele 2 is one of the least frequent alleles in the gorilla population that we surveyed (Figure 3B)." - this inference depends on the gorilla species that encode allele 2 and allele frequencies. There are statistical tests to address this inference.

      Population genetic statistics that test for skews in sample allele frequencies are not appropriate here due to the nature of the samples in this study. However, the reviewer is correct that our inference about allele frequency depends on the gorilla species in which we find this allele. Allele 2 is found in the Gorilla beringei graueri subspecies of gorilla included in this study. We only have data for three individuals (six alleles) from this subspecies compared to 51 individuals (102 alleles) from Gorilla gorilla gorilla. As such, genetic subdivision between the gorilla subspecies could also produce the low frequency of allele 2 observed in our sample.
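
      The sampling imbalance described above can be made concrete with a short calculation. The sketch below is an illustration under an assumption that the response does not state: that allele 2 occurs only in the G. b. graueri individuals and that only the two subspecies named above are pooled. The function name is hypothetical.

```python
def max_pooled_frequency(alleles_in_subgroup, alleles_elsewhere):
    """Upper bound on the pooled frequency of an allele confined to one subgroup."""
    if alleles_in_subgroup < 0 or alleles_elsewhere < 0:
        raise ValueError("allele counts must be non-negative")
    total = alleles_in_subgroup + alleles_elsewhere
    if total == 0:
        raise ValueError("at least one allele must be sampled")
    return alleles_in_subgroup / total


# Counts from the response: 3 G. b. graueri individuals (6 alleles)
# and 51 G. g. gorilla individuals (102 alleles).
bound = max_pooled_frequency(6, 102)
print(f"{bound:.1%}")  # prints 5.6%
```

      Under that assumption, even an allele fixed in the smaller subspecies could not exceed roughly 5.6% of the pooled sample, so a low pooled frequency does not by itself indicate a fitness disadvantage.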

      From Reviewer #1 (minor comment):

      Page 11: "These results imply that the resistance to SIVcpz found in gorilla individuals is not dependent on single amino acids, but rather the cumulative effect of multiple SNPs." Would it be more relevant (or relevant in other ways) to test this statement by putting those mutations into the hominid ancestor? Testing individual residues in the context of human CD4 may be subject to epistasis or several other factors.

      We agree that combining several of the resistant SNPs in the susceptible human background would have strengthened our hypothesis, as all these amino acid changes are associated with increased resistance to at least one of the lentiviruses tested. However, the number of CD4 variants to test would increase significantly, and we feel that this approach was outside the scope of this manuscript.

      From Reviewer #1 (minor comment):

      Figure 6: If you perform this analysis on chimpanzee CD4 alone do you get the same result? Just gorillas? If you remove eastern/mountain gorillas? The very small numbers of non-human non-SIV-reservoir great apes may preclude a strong conclusion.

      We agree that our study is limited by the small number of available sequences from individuals of the studied species. If we removed a whole species or subspecies, the statistical power would be greatly reduced. Removing all chimpanzees or all gorillas (or a single subspecies) would still show that SNPs accumulate in the D1 region of CD4 only in these species, although with less statistical significance.

      From Reviewer #2 (minor comment):

      Related to Figure 2: It would strengthen the argument that resistance is a derived trait if the authors mapped the causative mutations from gorilla CD4 onto the ancestral hominin CD4. However, this experiment is not particularly critical, merely a suggestion.

      We appreciate this suggestion. We decided to use the human CD4 backbone as it is widely susceptible to lentiviral entry. The hominid and hominin ancestral sequences are almost identical to the human sequence in domain 1, except for a fixed mutation shared with the gorilla CD4. We expect that the SNPs observed in the gorilla population would also reduce susceptibility to lentivirus entry in the ancestral CD4 reconstructions.

      From Reviewer #2 (minor comment):

      Related to Figure 3B: It is difficult to make much of the allele frequency for 8 alleles in 32 individuals. Can the authors collate this with allele frequency for the referenced 100 individuals from Russell et al. 2021, to give a better sense of population frequency? This may allow the authors to better correlate allele frequency with SIVcpz resistance patterns in Figure 4, strengthening their argument that more resistant alleles should be over-represented in the population.

      At the time of our analysis, the data from Russell et al. (DOI: 10.1073/pnas.2025914118) were not available to collate or compare. When those data became available, we immediately checked them against the alleles we had found and confirmed that these alleles were also detected in the samples used in that study.

      From Reviewer #2 (minor comment):

      Related to Figure 6: As written, several methodological details should be clarified. How were human genomes selected to limit the sample size to 50?

      We selected a total of 50 human individuals in order to match the sample size of the largest group in Fig 6B (chimpanzee, n=50). We randomly selected 10 individuals from each of the 5 superpopulations [Africans (AFR), Admixed Americans (AMR), East Asians (EAS), Europeans (EUR) and South Asians (SAS)] defined by the 1000 Genomes Project.
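
      The selection scheme described above amounts to stratified random sampling. The following is a minimal sketch of that idea, not the procedure actually used: the function name, the placeholder sample identifiers and the fixed seed are all assumptions made for illustration.

```python
import random

# Superpopulation codes as listed in the response.
SUPERPOPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def sample_per_superpopulation(ids_by_superpop, n_per_group=10, seed=None):
    """Draw n_per_group individuals without replacement from each superpopulation."""
    rng = random.Random(seed)
    selected = {}
    for code in SUPERPOPULATIONS:
        # Sort first so that a given seed always yields the same draw.
        candidates = sorted(ids_by_superpop[code])
        if len(candidates) < n_per_group:
            raise ValueError(f"not enough individuals in {code}")
        selected[code] = rng.sample(candidates, n_per_group)
    return selected


# Hypothetical placeholder identifiers, not real 1000 Genomes sample IDs.
demo_ids = {code: [f"{code}_{i:03d}" for i in range(100)]
            for code in SUPERPOPULATIONS}
picked = sample_per_superpopulation(demo_ids, seed=0)
print(sum(len(ids) for ids in picked.values()))  # prints 50
```

      Recording the seed alongside the selected identifiers would let the same 50 individuals be recovered later.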

      From Reviewer #2 (minor comment):

      Related to Figure 6: What comparison is being reported for the Mann-Whitney U test (CD4 vs. which gene)? Are the means shown in A an average of 2 (endemic) or 3 (non-endemic) species - if so, the authors should show the individual data points to give a clearer depiction of the data spread. In addition, it is not clear that a statistical test with sample sizes of 2 is meaningful, since Mann Whitney typically assumes n > 5. To strengthen this statistical argument, it may be necessary to include additional species that have (a) multiple genomes (or at least this locus) sequenced, and (b) have or lack lentiviral sequences. This may necessitate expanding the analysis to include Old World Monkeys (e.g. Rhesus Macaque Genome Project).

      In Figure 6, we use the Mann-Whitney U test to compare variation between CD4 and the neighboring loci. The average and SEM are for two endemic and four non-endemic species (the two orangutan datasets are from two distinct species, unlike the gorilla subspecies). It is true that our sample size is small for any statistical testing. For the Mann-Whitney U test, it is generally preferred to have n > 5 in each group. So, we do run into problems with the endemically infected comparisons, as we only have two data points (chimpanzee and gorilla) for the CD4 group. For the uninfected species, CD4 has four data points.
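
      The sample-size concern can be quantified. For an exact two-sided Mann-Whitney U test without ties, the smallest attainable p-value depends only on the two group sizes, because every assignment of ranks to groups is equally likely under the null hypothesis. The sketch below computes that floor; the size of the neighboring-loci group is not given in the response, so the values looped over are hypothetical.

```python
from math import comb


def min_two_sided_exact_p(n1, n2):
    """Smallest p-value an exact two-sided Mann-Whitney U test can return.

    Assumes no ties. Under the null, each of the C(n1 + n2, n1) ways of
    assigning ranks to group 1 is equally likely, and the most extreme
    outcome in each tail corresponds to exactly one of them.
    """
    if n1 < 1 or n2 < 1:
        raise ValueError("both groups need at least one observation")
    return min(1.0, 2.0 / comb(n1 + n2, n1))


# CD4 group sizes from the response (2 endemic, 4 non-endemic species);
# comparison-group sizes are hypothetical.
for n_cd4 in (2, 4):
    for n_other in (2, 4, 8, 16):
        p_floor = min_two_sided_exact_p(n_cd4, n_other)
        print(f"n_cd4={n_cd4} n_other={n_other} min p={p_floor:.4f}")
```

      With only two observations in one group, the floor stays above 0.05 unless the other group has at least eight observations, which illustrates why the endemically infected comparison is the weaker of the two.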

      From Reviewer #1 (minor comment):

      Page 6. "This suggests that the ancestral versions of CD4 in apes were susceptible to primate lentivirus entry" - The data show that tested virus pseudotyped with SIV/HIV envs can engage ancestral CD4 in the context of a canine cell line expressing human CCR5, but not necessarily that this interaction was sufficient for the process of entry per se, especially in the context of a gorilla (or hominid) cell. Some additional context would be useful for a broad readership.

      From Reviewer #1 (minor comment):

      Page 6: "but that selective pressures exerted by SIVs in the chimpanzee and gorilla lineages have led to the retention of mutations that confer resistance to primate lentivirus infection. This has not happened in humans where selective pressure by HIV-1 is too new" - this cannot be concluded from the data in Figure 1. It would be more appropriate as a Discussion point.

      From Reviewer #1 (minor comment):

      Page 14: "Natural tolerance is often required before a virus can establish itself long term in a host reservoir, and thus understanding it is key to understanding virus reservoirs in nature" - please provide a reference. This is one among several theories of long-term host-virus evolution dynamics/outcomes, and further discussion may benefit the broad readership of eLife.

      From Reviewer #1 (minor comment):

      Page 15: "There is a surprising outcome of virus-driven host evolution in that the divergence and diversity of these host genes ultimately comes at a detriment to the very viruses that drove this evolution." - it is not clear to this reviewer why this is surprising.

      From Reviewer #2 (minor comment):

      Related to Figure 5A: The authors suggest that the gorilla glycosylation site provides resistance to SIVcpz, based on TAN1.910, but in fact the glycosylated allele is no more resistant than the un-glycosylated allele to most SIVcpz strains (in Figure 4). The authors should acknowledge this more clearly in the text.

      From Reviewer #2 (minor comment):

      The title of this article (that infection "has driven selection") is somewhat overstated - though it seems very likely that lentiviruses are driving CD4 diversification, this is difficult to prove. The arguments presented here rely on very few data points: modern chimp and gorilla compared to ancestral CD4, and a population genetic analysis relying on 2 or 3 species with 10-50 individuals each. The authors should either bolster these arguments (see the above suggestions) and/or soften the claim in the title.

      Modifications to the main text of the manuscript have been made to enhance clarity on the subjects stated above.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We provide below a point-by-point reply to the Reviewers, and hope that our new manuscript will now meet the Reviewers’ concerns and the requirements for publication in eLife. 

      In summary, we have performed a new set of mouse humanization experiments using a new cohort of 4 additional HLA-DRB1*15-typed MS patients as donors, all presenting with highly active disease and under treatment with natalizumab. The new experiments aim to strengthen and further extend the findings of the original paper that HLA restriction rather than disease status plays an important role in the development of CNS inflammation. Additionally, we performed EAE using a revised protocol with lower amounts of peptide antigens to reduce the possibility of immune tolerance. Indeed, our original observations were further enriched with the finding that immunization increases infiltration of the CNS by human CD4 T cells, a finding consistent with EAE pathology, and that these human CD4 T cells co-localize with human CD8 T cells in the brain lesions. Further, we provide more detailed information concerning the EBV infection status of the PBMC donors used for humanization and find some first indications of relationships between B cell engraftment in humanized mice, EBV status of the donors and the development of brain lesions that might stimulate further investigation in future studies.

      Point-by-point reply to reviewers:

      Reviewer 1:

      We thank Reviewer 1 for their valuable comments, and for their support of the overall approach as a model system. We have addressed the comments by providing the additional requested information, as well as by performing EAE with a revised protocol, as suggested. We believe the new results significantly upgrade the information gained from this study.

      (1) Throughout their paper, the authors never quantify the difference in CD4 vs CD8 T cell infiltration into the CNS. While repeatedly claiming that there are fewer CD4 T cells present than CD8 T cells within the CNS, this data is not included. Further, spinal cord numbers of CD4 and CD8 are not provided in lieu of CD3 T cell characterization.

      Reply: We have now included quantitative data for the differences in CD4 vs CD8 T cells in the brain and spinal cord of non-immunized and EAE-immunized mice. Thus, for the brain (Fig. 2E) and spinal cord (Fig. 3D) of non-immunized mice, and the brain (Fig. 4D, E, L) and spinal cord (Fig. 5D) of immunized mice, we show data for numbers of hCD8 and hCD4 T cells, and ratios of CD4 to CD8 at the borders and in the parenchyma. Notably, using a revised EAE protocol in the second set of experiments, we observed a marked increase in hCD4 T cell infiltration at the CNS borders and parenchyma, an observation consistent with successful EAE immunization.

      B cells don't make up any significant component of the cells transferred from HLA-DR15 donors. While the cells transferred from the HLA-DR13 donor are composed of a considerable number of B cells, the mice that received these cells didn't develop any signs of neurologic disease.

      In the second experiment using new DR15 MS donors, we also observed significant B cell engraftment in several groups of DR15 MS mice. With the additional groups of mice, we were able to see a relationship between B cell engraftment in DR13 and DR15 MS mice and indicators of recent or ongoing reactivation of EBV. This is an interesting preliminary observation that might be tested in future larger studies.

      (2) Incomplete exploration of potential experimental autoimmune encephalomyelitis (EAE) modeling. Comparison of the susceptibility of B2m-NOG mice to EAE dependent on various peptide doses would be highly informative. Given that the number of hCD45+ in the periphery of NOG mice decreases following this immunization it would be prudent for the authors to determine if such a high peptide dose is truly ideal for EAE development in this mouse model.

      Reply: We thank the reviewer for this critical comment. In the second group of experiments (DR15 MS2-5), we revised the EAE protocol to use lower amounts of peptides in a single immunization, thereby greatly reducing the exposure of human T cells to antigen and the risk of tolerance/anergy. This (i) avoided the reduction in the proportion of hCD45 cells in the peripheral blood following immunization (Fig. 1A), and (ii) increased the numbers of hCD4 T cells and the hCD4/hCD8 T cell ratios at the borders and infiltrating the parenchyma of the brain (Fig. 4D,E) and spinal cord (Fig. 5D).

      (3) The degree of myelin injury is not presented. The statement is repeatedly made that "demyelination was not observed in the brain or spinal cord" but no quantification of myelin staining is shown.  

      Reply: The reviewer refers to a pivotal feature (and limitation) of this particular humanized model. Despite significant T cell infiltration of white and grey matter regions of brain and spinal cord, there is no detectable demyelination. This has also been reported in an independent study using a similar humanized system (Zayoud et al., 2013). We have supplemented the figures with photomicrographs showing the presence of unperturbed myelin in the corpus callosum white matter T cell lesions (Fig. 4F, inset stained with Luxol fast blue), and a confocal micrograph in the same region double-immunostained for hCD45 immune cells and MBP (Fig. 4G).

      Minor points:

      Method of quantification (e.g. cells per brain slice in figures 2E; 4E) is not very quantitative and should be justified or more appropriately updated to be more rigorous in methodology.

      Reply: In the new figures, we have changed the method of quantification of brain parenchyma-infiltrating cells from cells per brain slice to cells per mm² of tissue area (Fig. 2D, Fig. 4D).

      Fig. 4 data should be shown from un-immunized DR15 MS and DR15 HI mice.

      Reply: We now include the quantitative data from un-immunized mice compared to immunized mice in all groups (Fig. 4 C-E). 

      Reviewer 2:

      We thank Reviewer 2 for their very pertinent comments and overall for highlighting the importance of humanized mice as an approach for further understanding the pathobiology of MS. We also thank this reviewer for their positive comments concerning the study design, specifically the use of fresh PBMC isolated from HLA-DRB1-typed MS individuals and healthy controls. The reviewer highlights 4 major weaknesses of the study that we have tried to address in order to increase the value of the study.

      (i) Lack of sufficient sample size (n=1 in each group) to make any conclusion.

      Reply: We have increased the sample size for the DR15 MS group from n=1 to n=5 by generating new humanized mice using PBMC freshly isolated from additional MS donors, all HLA-DRB1*15 with active RRMS and under treatment with natalizumab. Here we were able to capitalize on our excellent collaboration with neurologists at the neighboring University Hospital, which runs a large organized MS outpatient clinic with HLA-DRB1-typed MS individuals who are closely monitored over the course of their disease and therapy. In this way, we were able to address the engraftment success of human immune cells and variability in CNS lesion development across mice generated from 5 different DR15 MS patients. We also monitored markers for EBV activation status in all the patients used for mouse humanization in this study.

      (ii) Lack of phenotype in mice.

      Reply: As already described in the results and addressed in the discussion, the B2m-NOG immunodeficient mouse strain used here is a state-of-the-art experimental tool for humanization studies, but unfortunately fails to support engraftment by human monocytes. We and previous groups (Zayoud et al., 2013) show that CNS lesions in humanized mice contain high numbers of hCD4 and hCD8 T cells, accompanied by locally activated murine microglia and astrocytes, but lack human monocytes. The humanized mice contain large proportions of immature mouse CD11b+Ly6Chi monocytes in the periphery (Suppl. Table 4), but these cells are not recruited into the CNS in non-immunized or immunized humanized mice, potentially due to incompatible chemokine signals between mouse and human. The absence of human monocyte engraftment in this model is the most likely reason that lesions do not demyelinate, and this limitation of the currently available host mouse strains needs to be addressed before full modelling of CNS demyelination by human immune cells can be achieved.

      (iii) No disease phenotype even in humanized mice immunized for disease using standard disease induction protocol employed in an animal model of MS.

      Reply: As described above, following the suggestion of reviewer 1 (point 2) we revised the EAE protocol to use lower amounts of peptides given as a single immunization. This resulted in increased numbers of hCD4 T cells and increased hCD4/hCD8 T cell ratios at the borders and infiltrating the parenchyma of the brain (Fig. 1E, Fig. 2D) and spinal cord (Fig. 5D), all indicative of a successful EAE immunization. Although immunized mice showed lesions with mixed populations of hCD4 and hCD8 T cells, demyelination and therefore clinical symptoms were again not observed. As outlined in (ii) above, successful human monocyte engraftment would be fundamental for the development of demyelination and clinical symptoms in PBMC humanized mice, and new immunodeficient animal strains should be developed to achieve this.

      (iv) Mechanistic data on why CD8 T cells are more enriched than CD4+ T cells.

      Reply: The question of why hCD8 T cells are more enriched in the CNS than hCD4 cells is answered at least in part by the results from our new EAE experiments, which clearly show that immunization increases CNS infiltration by hCD4 T cells versus hCD8 T cells. In general, EAE protocols are designed to activate antigen-specific CD4 T cells, and this is verified in the CNS of immunized humanized mice, where hCD4 T cells infiltrate to join hCD8 T cells in lesion areas. The predilection of hCD8 T cells for the CNS is obvious in non-immunized humanized mice, especially in the parenchyma (see Fig. 2E), and in MS patients, while hCD4 infiltration becomes important after EAE immunization. The humanized model system might therefore represent a unique tool for studying mechanisms underlying preferential hCD8 T cell involvement in MS neuroinflammation, a feature that is not accurately modelled in current EAE models. As this reviewer correctly points out, this is a very important point, as postmortem MS patients’ brains have more CD8 T cells than CD4 T cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step with which any biomolecular simulation starts. Given the wide range of chemical structures that need to be simulated, the varying quality of structural models obtained as inputs from various sources, and the diverse force fields and molecular dynamics engines employed for simulations, automation of this fundamental step is challenging, especially for complex systems and when there is a need to conduct high-throughput simulations in computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out various types of fundamental functionalities that are commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ those components to re-assemble a pipeline called Martinize2 used in topology generation for simulations with a widely used coarse-grained (CG) model, MARTINI. This pipeline can fully recapitulate the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by the ability of the pipeline to faithfully generate topologies for two high-complexity benchmarking sets of proteins.

      Strengths:

      The main strength of this work is the use of concepts and algorithms associated with induced subgraph in graph theory to automate several key but non-trivial steps of topology generation such as the identification of monomer residue units (MRU), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters needed for describing interactions between MRUs.
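
      The induced-subgraph idea mentioned above can be illustrated with a toy matcher. The sketch below is not the Vermouth API and does not reproduce its algorithms; it is a plain backtracking search written for illustration, with hypothetical function names and toy molecules. A reference block matches when elements agree and when two mapped atoms are bonded in the target exactly when they are bonded in the reference.

```python
def _adjacency(atoms, bonds):
    """Build an undirected adjacency map from a bond list."""
    adj = {atom: set() for atom in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_induced_matches(target_atoms, target_bonds, ref_atoms, ref_bonds):
    """Yield {ref_atom: target_atom} maps onto induced subgraphs of the target."""
    t_adj = _adjacency(target_atoms, target_bonds)
    r_adj = _adjacency(ref_atoms, ref_bonds)
    ref_order = list(ref_atoms)

    def extend(mapping):
        if len(mapping) == len(ref_order):
            yield dict(mapping)
            return
        r = ref_order[len(mapping)]
        used = set(mapping.values())
        for t in target_atoms:
            if t in used or target_atoms[t] != ref_atoms[r]:
                continue
            # Induced condition: bonded in reference <=> bonded in target,
            # checked against every atom pair mapped so far.
            if all((r2 in r_adj[r]) == (t2 in t_adj[t])
                   for r2, t2 in mapping.items()):
                mapping[r] = t
                yield from extend(mapping)
                del mapping[r]

    yield from extend({})


# Toy heavy-atom graphs (hypothetical names): ethanol and a C-O fragment.
ethanol_atoms = {"C1": "C", "C2": "C", "O1": "O"}
ethanol_bonds = [("C1", "C2"), ("C2", "O1")]
fragment_atoms = {"c": "C", "o": "O"}
fragment_bonds = [("c", "o")]
matches = list(find_induced_matches(ethanol_atoms, ethanol_bonds,
                                    fragment_atoms, fragment_bonds))
print(matches)  # prints [{'c': 'C2', 'o': 'O1'}]

# A three-carbon chain is not an induced subgraph of a three-membered ring,
# because the ring bonds the two chain ends together.
ring_atoms = {"A": "C", "B": "C", "D": "C"}
ring_bonds = [("A", "B"), ("B", "D"), ("D", "A")]
chain_atoms = {"p1": "C", "p2": "C", "p3": "C"}
chain_bonds = [("p1", "p2"), ("p2", "p3")]
print(len(list(find_induced_matches(ring_atoms, ring_bonds,
                                    chain_atoms, chain_bonds))))  # prints 0
```

      The second case shows why the induced requirement matters: an extra bond in the input structure rules out a match, which is one way a mismatch between an input and a reference block can be detected. A production tool would use a graph library rather than this exhaustive search.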

      Weaknesses:

      Although the Vermouth library appears promising as a general tool for topology generation, there is insufficient information in the current manuscript and a lack of documentation that may allow users to easily apply this library. More detailed explanation of various classes such as Processor, Molecule, Mapping, ForceField etc. that are mentioned is still needed, including inputs, output and associated operations of these classes. Some simple demonstration of application of these classes would be of great help to users. The formats of internal databases used to describe reference structures and force fields may also need to be clarified. This is particularly important when the Vermouth needs to be adapted for other AA/CG force fields and other MD engines.

      We thank the reviewer for pointing out the strengths of the presented work and agree that one of the current limitations is the lack of documentation about the library. In the revision, we point more clearly to the documentation page of the Vermouth library, which contains more detailed information on the various processors. The format of the internal databases has also been added to the documentation page. Providing a simple demonstration of applications of these classes is a great suggestion; however, we believe that it is more convenient to provide those as code examples in the documentation or, for instance, as Jupyter notebooks rather than in the paper itself.

      The successful automation of the Vermouth relies on the reference structures that need to be pre-determined. In case of the study of 43 small ligands, the reference structures and corresponding mapping to MARTINI-compatible representations for all these ligands have been already defined in the M3 force field and added into the Vermouth library. However, the authors need to comment on the scenario where significantly more ligands need to be considered and other force fields need to be used as CG representations with a lack of reference structures and mapping schemes.

      We acknowledge that vermouth/martinize2 is not capable of automatically generating Martini mappings or parameters on the fly for unknown structures that are not part of the database. However, this capability is not the purpose of the program, which is rather to distribute and manage existing parameters. Unlike atomistic force fields, which frequently have automated topology builders, Martini parameters are usually obtained for a set of specific molecules at a time and benchmarked accordingly. As more parameters are obtained by researchers, they can be added to the vermouth library via the GitHub interface in a controlled manner. This process allows the database to grow, and in our opinion it will quickly grow beyond the currently implemented parameters. Furthermore, the API of Vermouth is set up in a way that it can easily interface with automated topology builders, which are currently being developed. Hence, in our view, this limitation does not diminish the applicability of vermouth to high-throughput applications with many ligands. The framework exists and works; only more parameters need to be added.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Kroon, Grunewald, Marrink and coworkers presents the development of the Vermouth library for coarse-grained assignment and parameterization and an updated version of the Python script, the Martinize2 program, to build Martini coarse-grained (CG) models, primarily for protein systems.

      Strengths:

      In contrast to many mature and widely used tools to build all-atom (AA) models, there are few well-accepted programs for CG model constructions and parameterization. The research reported in this manuscript is among the ongoing efforts to build such tools for Martini CG modeling, with a clear goal of high-throughput simulations of complex biomolecular systems and, ultimately, whole-cell simulations. Thus, this manuscript targets a practical problem in computational biophysics. The authors see such an effort to unify operations like CG mapping, parameterization, etc. as a vital step from the software engineering perspective.

      Weaknesses:

      However, the manuscript in this shape is unclear in the scientific novelty and appears incremental upon existing methods and tools. The only "validation" (more like an example application) is to create Martini models with two protein structure sets (I-TASSER and AlphaFold). The success rate in building the models was only 73%, while the significant failure is due to incomplete AA coordinates. This suggests a dependence on the input AA models, which makes the results less attractive for high-throughput applications (for example, preparation/creation of the AA models can become the bottleneck). There seems to be an improvement in considering the protonation state and chemical modification, but convincing validation is still needed. Besides, limitations in the existing Martini models remain (like the restricted dynamics due to the elastic network, the electrostatic interactions or polarizability).

      We thank the reviewer for pointing out the strengths of the presented work, but respectfully disagree with the criticism that the presented work is only incremental upon existing methods and tools. All MD simulations of structured proteins regardless of the force field or resolution rely on a decent initial structure to produce valid results. Therefore, failure upon detection of malformed protein input structures is an essential feature for any high-throughput pipeline working with proteins, especially considering the computational cost of MD simulations. We note that programs such as the first version of Martinize generate reasonable-looking input parameters that lead to unphysical simulations and wasted CPU hours.

      The AlphaFold database, for which we surveyed 200,000 structures, contained only 7 problematic structures, which means that the success rate was over 99% for this database. This example simply shows that users may have to add a step that fixes atomistic protein input structures if they seek to run a high-throughput pipeline. At least they can be assured that martinize2 will check that no issues persist.
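
      As a quick check of the arithmetic behind the quoted success rate, the short sketch below uses only the two figures stated in the response (200,000 surveyed structures, 7 problematic ones). The variable names are illustrative and are not taken from martinize2. The result is roughly 99.997%, comfortably above the 99% mentioned.

```python
# Figures quoted in the response above; variable names are illustrative only.
surveyed = 200_000   # AlphaFold structures surveyed
problematic = 7      # structures flagged as problematic

# Fraction of structures that were processed without a flagged problem.
success_rate = (surveyed - problematic) / surveyed
print(f"success rate: {success_rate:.4%}")
```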

      Furthermore, we note that the manuscript does not aim to validate or improve the existing Martini (protein) models. All example cases presented in the paper are subject to the limitations of the protein models for the reason that martinize2 is only the program to generate those parameters. Future improvements in the protein model, which are currently underway, will immediately be available through the program to the broader community.  

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Kroon et al. describes two algorithms which, when combined, achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications.

      Strengths:

      A large scale protein simulation was attempted, showing strong evidence that authors' algorithms work smoothly.

      The authors described the algorithms in detail and shared the open-source code under the Apache 2.0 license on GitHub. This allows both reproducibility and extended usefulness within the field. These algorithms are potentially impactful if the authors can address some of the issues listed below.

      We thank the reviewer for pointing out the strengths.  

      Weaknesses:

      One major caveat of the manuscript is that the authors claim their algorithms aim to "process any type of molecule or polymer, be it linear, cyclic, branched, or dendrimeric, and mixtures thereof" and "enable researchers to prepare simulation input files for arbitrary (bio)polymers". However, the examples provided by the manuscript only support one type of biopolymer, i.e. proteins. Despite the authors' recommendation of using polyply along with martinize2/vermouth, no concrete evidence has been provided to support the authors' claim. Therefore, the manuscript must be modified to either remove these claims or include new evidence.

      We acknowledge that the current manuscript is largely protein-centric. To some extent this results from the legacy of martinize version 1, which was also only used for proteins. However, to show that martinize2 also works for cyclic as well as branched molecules, we implemented two additional test cases and updated the former Figure 6, which is now Figure 7. Crown ether is used as an example of a cyclic molecule, whereas a small branched polyethylene molecule is a test case for branching. Needless to say, neither molecule is a protein or a biomolecule.

      The method descriptions of Martinize2 and the graph algorithms in the SI should be core content of the manuscript. I argue that Figure S1 and Figure S2 are more important than Figure 3 (protonation state). I recommend that the authors make a workflow chart combining Figures S1 and S2 to explain Martinize2 and the graph algorithms in the main text.

      The reviewer's critique is fair. Given the already rather large manuscript, we tried to strike a balance between describing benchmark test cases, some practical usage information (e.g. the Histidine modification), and the algorithmic library side of the program. In particular, we chose to add the figure on protonation state because how to deal with protonation states, especially those of Histidines, was among the top three issues raised by users on our GitHub page. Due to this large community interest, we consider the figure equally important. However, we moved Figure S1 from the Supporting Information into the manuscript and annotated the corresponding text with the relevant panels to illustrate the underlying procedure more clearly.

      In Figure 3 (protonation state), the figure itself and the captions are ambiguous about whether at the end the residue is simply renamed from HIS to HIP, or if hydrogen is removed from HIP to recover HIS.

      Using either of the two routes yields the same parameters in the end, which are for the protonated Histidine. In the second route, the extra hydrogen on Histidine is detected as an additional atom and therefore a different logic flow is triggered. Atoms are never removed, but only compounded to a base block plus modification atoms. We adjusted the figure caption to point this out more clearly.  

      In "Incorporating a Ligand small-molecule Database", the authors are calling for a community effort to build a small-molecule database. Some guidance on when the current database/algorithm combination does or does not work will help the community in contributing.

      Any small molecule that is not part of the database will not work. However, martinize2 will quickly identify missing components of the system and alert the users. At that point, the users can decide to create the required files themselves, guided by the new documentation pages.

      A speed comparison is needed to compare Martinize2 and Martinize.

      We respectfully disagree that a speed comparison is needed. We already noted in the Discussion of the manuscript that martinize2 is slower, since it performs more checks, is more general, and does not implement only a single protein model.

    1. Author response:

      We would like to thank the reviewers for their constructive feedback. We have thoroughly considered their concerns and comments and we aim to include some additional results in an updated version of this manuscript. In addition, we would like to address some of the comments, with which we respectfully disagree. Below is our point-by-point reply.

      Reviewer 1:

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors greatly depends on the clone being able to grow to a certain size. The authors show instances of discrete clones where larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones prevented from competition. Other studies show that in mammalian cells, RasV12, scrib clones are capable of outcompeting the surrounding tissue, such as in Kohashi et al (2021), where cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi by an allele or by RNAi in the RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc over-expressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results.

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hs-FLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified, and thus they are not compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified and the controls are missing (Myc over-expressing clones alone and Fmi clones alone).

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4. We will include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1 where we showed that fminull clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fminull clones or twin-spots, and we will comment on this in the revised manuscript.

      Regarding the quantification of apoptosis, we did not provide a quantification, in part because we observe a very clear visual difference between groups (Fig. 4A-K), and in part because it is challenging to come up with a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fminull UAS-Myc clones, we observe vastly more cell death within the fminull UAS-Myc clones and modest death in nearby wildtype cells, and consequently a much higher ratio of cell death inside vs outside the clone. Because of the somewhat arbitrary nature of quantification, and the dramatic difference, we initially chose not to provide a quantification. However, given the request, we chose an arbitrary distance from the clone boundary in which to consider dying cells and counted the numbers for each condition. We view this as a very soft quantification, but will report it in a way that captures the phenomenon in the revised manuscript.
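
      To make the idea of a distance-based count concrete, here is a minimal sketch of one way such a "soft" quantification could be set up. It is not the authors' analysis pipeline. The function name, the signed-distance convention (negative inside the clone, positive outside), the cutoff value, and the example distances are all assumptions made for illustration.

```python
def count_dying_cells(signed_distances_um, cutoff_um):
    """Count apoptotic cells inside a clone and in a band just outside it.

    signed_distances_um: distance of each dying cell from the clone boundary,
    negative inside the clone and positive outside (an assumed convention).
    cutoff_um: arbitrary width of the band outside the clone that still counts.
    """
    inside = sum(1 for d in signed_distances_um if d < 0)
    nearby_outside = sum(1 for d in signed_distances_um if 0 <= d <= cutoff_um)
    return inside, nearby_outside


# Made-up distances in micrometres, for illustration only.
example = [-12.0, -3.5, -0.5, 2.0, 8.0, 25.0]
inside, outside = count_dying_cells(example, cutoff_um=10.0)
print(f"inside: {inside}, nearby outside: {outside}")
```

      Changing `cutoff_um` changes the outside count but not the inside count, which is why the choice of distance is described as arbitrary and why reporting the cutoff alongside the counts matters.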

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N).

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason clones are bigger. As shown in (de la Cova et al, 2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to increased presence of WT cells. Also, (Moreno and Basler, 2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have shown these same results. It is well established then that increased cell proliferation transforms Myc clones into supercompetitors and that in the absence of cell competition, Myc-overexpressing discs produce instead wings larger than usual.

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here.

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike WT clones that have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to rate of cell division.

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples (e.g., fmiE59 >> Myc, only 5 discs and fmiE59 vs >Myc only 4 discs are quantified but other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. On a linear scale, a ratio of 1 means that A and B are equal, a twofold excess of A gives A/B = 2, and a twofold excess of B gives A/B = 0.5. The larger the difference between A and B, the starker this asymmetry becomes, which makes a linear scale deceiving to the eye, especially when decreased ratios are shown. With log(ratio), a value of 0 means equal amounts, and increased and decreased ratios deviate equally from 0.
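
      The symmetry argument can be illustrated with a few lines of Python. This is a sketch only: the response does not state which logarithm base was used, so base 2 is assumed here, and the area values are hypothetical and not data from the manuscript.

```python
import math


def log2_ratio(a, b):
    """Return log2(a / b) for two positive clone areas (base 2 is an assumption)."""
    if a <= 0 or b <= 0:
        raise ValueError("areas must be positive")
    return math.log2(a / b)


# Hypothetical area pairs in arbitrary units, not data from the manuscript.
pairs = {"A twice B": (2.0, 1.0), "equal": (1.0, 1.0), "B twice A": (1.0, 2.0)}
for label, (a, b) in pairs.items():
    print(f"{label:10s} linear={a / b:.2f}  log2={log2_ratio(a, b):+.2f}")
```

      On the linear scale the twofold changes sit at 2 and 0.5, at unequal distances from 1. On the log scale they sit at equal distances on either side of 0.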

      Statistically, either analyzing a standardized number of discs for all conditions or a variable number not determined beforehand has no effect on the p-value, as long as the variable n number is not manipulated by p-hacking techniques, such as increasing the n of samples until a significant p-value has been obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs due to the inherent though modest variability of our automated cell counts, and we analyzed all the discs that we obtained from a given experiment, never “cherry-picking” examples. For the sake of transparency, all our graphs show individual values in addition to the distributions so that the reader knows the n values at a glance.

      (5) Figure 4 - shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, like an H99 deficiency, etc (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We will update all panels replacing Cas3 by Dcp-1.

      As described above, cell death is a well-established consequence of Myc overexpression, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in “would-be” winners, and to what extent it reduces “would-be” winners’ ability to eliminate competitors, are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that, despite that fact, they behave as winners and eliminate neighboring WT cells. As mentioned in comment (1), WT clones generated in a 3x and 4x Myc background are eliminated and removed from the tissue, and blocking cell death increases the size of WT “loser” clones adjacent to Myc-overexpressing clones.

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development?

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We will include this information in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to delve into the mechanism and are currently directing our efforts toward exploring how Fmi regulates competition, but the size of the project and the required experiments are outside the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition.

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree for several reasons. First, the effect of Fmi loss is specific to winners: loss of Fmi has no effect on its own or in losers confronting winners in competition. Second, in the RasV12 tumor model, loss of Fmi did not perturb whole-eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so too is proliferation, and only in winners in competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryo stage, one cannot create whole disc tumors mutant for fmi. As an approximation to this condition, we have introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, there remains essentially a whole disc harboring fminull tumor. Indeed, this shows that whole fminull tumors overgrow similar to control tumors, confirming that the lack of Fmi only affects clonal tumors. We will provide those results in the updated manuscript.

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we would like to validate. We are currently performing immunostaining for Fmi in clones to confirm whether its levels change during competition. We will provide these results in the updated manuscript.

      Reviewer 3:

      Summary: In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      Reviewer #1 (Public Review):

      We thank Reviewer #1 for the professional evaluation and for raising important points. We will address those comments in the updated manuscript and especially improve the discussion with respect to the two points of concern.

      (1) How can GlnA1 activity be stimulated further by increasing 2-OG after the dodecamer is already fully assembled at 5 mM 2-OG?

      We assume a two-step requirement for 2-OG: the dodecameric assembly and the priming of the active sites. The assembly step is based on cooperative effects of 2-OG and does not require the presence of 2-OG in all 2-OG-binding pockets: 2-OG binding to one binding pocket also causes a domino effect of conformational changes in the adjacent 2-OG-unbound subunit, as also described for Methanothermococcus thermolithotrophicus GS in Müller et al. 2023. Due to the introduction of these conformational changes, the dodecameric form becomes more favourable even without all 2-OG-binding sites being occupied. With higher 2-OG concentrations present (> 5 mM), the activity increased further until finally all 2-OG-binding pockets were occupied, resulting in the priming of all active sites (all subunits) and thereby reaching the maximal activity.

      (2) The contradictory results with previously published data on the structure of M. mazei by Schumacher et al. 2023.

      We certainly agree that it is confusing that Schumacher et al. 2023 obtained a dodecameric structure without the addition of 2-OG, which we claim to be essential for the dodecameric form. 2-OG is a cellular metabolite that is naturally present in E. coli, the heterologous expression host both groups used. Since our main question focused on analysing the 2-OG effect on GS, we performed thorough dialysis of the purified protein to remove all 2-OG before performing MP experiments. In the absence of 2-OG we never observed significant enzyme activity and always detected a fast disassembly after incubation on ice. We thus assume that the dodecamer without 2-OG in Schumacher et al. 2023 is an inactive oligomer of a once 2-OG-bound form, stabilized e.g. by the presence of 5 mM MgCl2.

      The GlnA1-GlnK1-structure (crystallography) by Schumacher et al. 2023 is in stark contrast to our findings that GlnK1 and GlnA1 do not interact as shown by mass photometry with purified proteins. A possible reason for this discrepancy might be that at the high protein concentrations used in the crystallization assay, complexes are formed based on hydrophobic or ionic protein interactions, which would not form under physiological concentrations.

      Reviewer #2 (Public Review):

      We thank Reviewer #2 for the detailed assessment and valuable input. We will address those comments in the updated manuscript and clarify the message.

      (1) The discrepancy of the dodecamer formation (max. at 5 mM 2-OG) and the enzyme activity (max. at 12.5 mM 2-OG).

      We assume that there are two effects caused by 2-OG: (1) cooperativity of binding (less 2-OG is needed to facilitate dodecamer formation) and (2) priming of each active site (see also Reviewer #1 R.1). We assume this is the reason why the activity of dodecameric GlnA1 can be further enhanced by increased 2-OG concentration until all catalytic sites are primed.

      (2) The lack of the structure of a 2-OG and ATP-bound GlnA1.

      Although we strongly agree that this would be a highly interesting structure, it seems out of the scope of a typical revision to request new cryo-EM structures. We evaluate the findings of our present study concerning the 2-OG effects as important insights into the strongly discussed field of glutamine synthetase regulation, even without the requested additional structures.

      (3) The observed GlnA1-filaments are an interesting finding.

      We certainly agree with the referee on that point, that the stacked polymers are potentially induced by 2-OG or ions. However, it is out of the main focus of this manuscript to further explore those filaments. Nevertheless, this observation could serve as an interesting starting point for future experiments.

      Reviewer #3 (Public Review):

      We thank Reviewer #3 for the expert evaluation and inspiring criticism.

      (1) Encouragement to examine ligand-bound states of GlnK1.

      We agree and plan to perform the suggested experiments exploring the conditions under which GlnA1 and GlnK1 might interact. We will perform the MP experiments in the presence of ATP. However, in the GlnA1 activity assays used to evaluate the effect of GlnK1 on GlnA1 activity, ATP was always present in high concentrations, and we still did not observe a significant effect of GlnK1 on GlnA1 activity.

      (2) The exact role of 2-OG could have been dissected much better.

      We agree on that point and will improve the clarity of the manuscript. See also Reviewer #1 R.1.

      (3) The lack of studies on dimers.

      This is actually an interesting point, which we did not consider while writing the manuscript. Having now re-analysed all our MP data in this respect, we find that the smallest GlnA1 species is likely a dimer. Consequently, we will add more supplementary data that support this observation and change the text accordingly.

      (4) Previous studies and structures did not show the 2-OG.

      We assume that for other structures, no additional 2-OG was added, and the groups did not specifically analyse for this metabolite either. All methanoarchaea perform methanogenesis and contain only the oxidative part of the TCA cycle, used exclusively for the generation of glutamate (anabolism), rather than a closed TCA cycle; this enables them to use the internal 2-OG concentration as a signal for nitrogen availability. In the case of bacterial GS from organisms with a closed TCA cycle used for energy metabolism (oxidation of acetyl-CoA), e.g. E. coli, the formation of an active dodecameric GS form follows a different mechanism that is independent of 2-OG. In the case of the recent M. mazei GS structures published by Schumacher et al. 2023, the dodecameric structure is probably a result of the heterologous expression in and purification from E. coli (see also Reviewer #1 R.2). One example of a methanoarchaeal glutamine synthetase structure that does in fact contain 2-OG is described in Müller et al. 2023.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (public review and recommendations for the authors):

      Major points:

      (1) The identification of RAMP4 is a pivotal discovery in this paper. The sophisticated AlphaFold prediction, de novo model building of RAMP4's RBD domain, and sequence analyses provide strong evidence supporting the inclusion of RAMP4 in the ribosome-translocon complex structure.

      However, it is crucial to ensure the presence of RAMP4 in the purified sample. Particularly, a validation step such as western blotting for RAMP4 in the purified samples would strengthen the assertion that the ribosome-translocon complex indeed contains RAMP4. This is especially important given the purification steps involving stringent membrane solubilization and affinity column pull-down.

      As suggested, we have added Western blots showing that RAMP4 is retained at secretory translocons (and not multipass translocons) after solubilisation, affinity purification, and recovery of ribosome-translocon complexes (Fig. 3F). This data supports both our assignment of RAMP4 in ribosome-translocon complexes, and also the structure-based proposition that its occupancy is mutually exclusive with the multipass translocon (in particular, the PAT complex).  

      (2) Despite the comprehensive analyses conducted by the authors, it is challenging to accept the assertion that the extra density observed in TRAP class 1 corresponds to calnexin. The additional density in TRAP class 1 appears to be less well-resolved, and the evidence for assigning it as calnexin is insufficient. The extra density there can be any proteins that bind to TRAP. It is recommended that the authors examine the density on the ER lumen side. An investigation into whether calnexin's N-globular domain and P-domain are present in the ER lumen in TRAP class 1 would provide a clearer understanding.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. We have exhaustively searched our maps for any unexplained density connected with the putative Calnexin TMD, and have found none. This is consistent with Calnexin's lumenal domain being flexibly linked to its TMD, and thus would not be resolved in a ribosome-aligned reconstruction.

      Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we have ensured that the text and figures consistently describe this assignment as provisional or putative.

      (3) In the section titled 'TRAP competes and cooperates with different translocon subunits,' the authors present a compelling explanation for why TRAP delta defects can lead to congenital disorders of glycosylation. To enhance this explanation, it would be valuable if the authors could provide additional analyses based on mutations mentioned in the references. Specifically, examining whether these mutations align with the TRAP delta-OSTA structure models would strengthen the link between TRAP delta defects and the observed congenital disorders of glycosylation.

      We agree that mapping disease-causing point mutants to the TRAP delta structure could be potentially informative. Unfortunately, the referenced TRAP delta disease mutants act by simply impairing TRAP delta expression, and thus admit no such fine-grained analyses. However, sequence conservation is our next best guide to mutant function. We note in the text that the contact site charges on TRAP delta and RPN2 are conserved, and that the closest-juxtaposed interaction pair (K117 on TRAPδ and D386 on RPN2) is also the most conserved.

      Here are some minor points:

      (1) In the introduction, when the EMC, PAT, and BOS complexes were initially mentioned, it would be beneficial for the authors to provide more context or cite relevant references. This additional information will aid readers in better understanding these complexes, ensuring a smoother comprehension of their significance in the context of the study.

      The Introduction has been edited to provide more context with relevant references. 

      (2) In Figure 7, it would be valuable for the authors to include details on how they sampled the sequence alignments. 

      To clarify this methodological point, we have revised the Figure 7 caption to include these sentences: “The logo plots in panels A and D represent an HMM generated by jackHMMER upon convergence after querying UniProtKB’s metazoan sequences with the human TRAPα sequence. Only signal above background is shown, as rendered by Skylign.org.”

      Reviewer #2 (public review and recommendations for the authors):

      Strengths:

      The manuscript contains numerous novel new structural analyses and their potential functional implications. While all findings are exciting, the highlight is the discovery of RAMP4/SERP1 near the Sec61 lateral gate. Overall, the strength is the thorough and extensive structural analysis of the different high-resolution RTC classes as well as the expert bioinformatic evolutionary analysis.

      Weaknesses:

      A minor downside of the manuscript is the sheer volume of analyses and mechanistic hypotheses, which makes it sometimes difficult to follow. The authors might consider offloading some analyses based on weaker evidence to the supplement to maximize impact.

      We agree that the manuscript is long, but we have retained what we feel are the most important findings in the main text because the supplement is often undiscoverable via literature searches. Indeed, we chose eLife for its flexibility regarding article length and suitability for extended and detailed analyses. 

      Major:

      - Figure S1 does not capture the fact that a PAT-free subset of particles is analyzed. The PAT classification step should be added.

      We apologise for having caused some confusion on this point: we do not show a PAT classification step because there was none. Instead we reanalysed the whole dataset with a focus on Sec61 and TRAP. The very little PAT present (9% of particles, per Smalinskaitė et al. 2022) appeared as a very weak density in some of the closed-Sec and weak-TRAP classes.

      - The assignment of calnexin appears highly speculative. As the authors acknowledge the EM density is clearly of insufficient resolution for identification, and also AF2 does not render orthogonal support for the interpretation. The binding to TRAPg also does not explain complex formation in lower eukaryotes that do not have TRAPg. The authors may consider moving the calnexin assignment and interpretation to the supplement as it appears highly speculative. In any case, it should be referred to as a hypothesis and not a structure.

      We agree that the Calnexin assignment is less confident than the other assignments in this manuscript, and that further support would be ideal. Our assignment of this TMD to Calnexin was based on existing biochemical data (referenced in the paper) favouring this as the best working hypothesis by far: Calnexin is TRAP’s only abundant co-purifying factor, and their interaction is sensitive to point mutations in the Calnexin TMD. Recognising that this is not conclusive, we have ensured that the text and figures consistently describe this assignment as provisional or putative.

      - P. 8: "This extensive competition explains why prior studies found TRAP in only 40% of MPT complexes, but at high occupancy at all other RTCs29". The interpretation is at odds with a recent re-analysis of the same dataset (preprint: Gemmer et al 2023, https://doi.org/10.1101/2023.11.28.569136), which finds TRAP occupancy to negatively correlate with PAT, not BOS.

      The reviewer is correct that the Gemmer study demonstrates a negative correlation between PAT and TRAP occupancy, but it does not, as the reviewer claims, argue against a negative correlation between BOS and TRAP. In fact it agrees that the Sec61•BOS•PAT complex would clash with TRAP, and that therefore “BOS could trigger release of TRAP from the multipass translocon.” Thus, there is no conflict between the two studies. The revised text in this passage now cites the Gemmer et al. preprint and clarifies that TRAP is partially displaced by competition with BOS, but retained at the translocon via its ribosome-binding domain.

      - P. 7/8: the authors suggest that TRAPd may be important for OSTA recruitment and hence TRAPd deletion may cause glycosylation defects in patients by failure to recruit OSTA. However, cryo-ET studies (Pfeffer et al, Nat. Comms 2017) showed that OSTA still binds in patient-derived microsomes (and the OSTA-TRAPd interaction). The author should discuss their model in the light of these data.

      As explained in the text, our hypothesis predicts that TRAPδ is more important for OSTA’s recruitment to the RTC than for its RTC affinity: “OSTA’s attraction to TRAPδ is weak compared to its binding to the ribosome, but TRAPδ may nonetheless help recruit OSTA, since TRAPδ would attract OSTA from most possible angles of approach, whereas OSTA’s ribosome contacts are stereospecific.” Therefore the fact that Pfeffer et al. 2017 found OSTA at some TRAPδ-negative RTCs is not surprising. For confirmation we would look for TRAPδ-dependent glycosylation sites in fast-folding domains or otherwise kinetically sensitive loci, and indeed TRAP-dependence screens return complex profiles that could be consistent with such a mechanism (Phoomak et al. 2021).

      - Some confidence measure for the assignment of SERP1/RAMP4 should be provided adding support for the claim "The resolution of the RBD density was sufficient for de novo modelling". Indeed, the N-terminal ribosome-bound segment appears well resolved and programs like Modelangelo or FindMySequence should provide a confidence measure for the assignment of the density to SERP1. The TM part appears less well resolved, but the connectivity to the N-terminus may justify the assignment, which should be elaborated on.

      Although we appreciate the value of tools like Modelangelo or FindMySequence, and would have used them if we were resting our assignment of RAMP4 on its RBD alone, we feel that such analyses would be superfluous here. They would quantify only the buildability of RAMP4’s RBD, whereas the real question of RAMP4’s assignability is independently supported by AlphaFold’s confirmation of RAMP4’s TMD as the Sec61-binding density, and further biochemical data provided or cited in the paper.

      - P. 3: "Because PAT complex recruitment and MPT assembly are just beginning, ..." the implicit kinetic model seems to be that the MPT subcomplexes assemble on ribosome and Sec61. What is the evidence for this model and later recruitment of PAT (as opposed to GEL, BOS, and PAT binding pre-assembled)?

      The work of Sundaram et al. (PMID 36261522) established that PAT, GEL and BOS do not co-associate appreciably in the absence of the ribosome-Sec61 complex. This is consistent with the structural data in Smalinskaitė et al. (PMID 36261528), which show that PAT, GEL, and BOS each contact the ribosome (and Sec61 in the case of PAT and BOS), but have few if any specific contacts among themselves. Finally, data in both of these studies show that recruitment of each complex to the RNC is not lost when any of them is missing, arguing that each is capable of independent recruitment to ribosome-Sec61 complexes.

      - p. 4: the meaning of the sentence "Stabilising interactions with this widely conserved motif may help Sec61 respond to its diverse substrates with a consistent open state." is not entirely clear. Published single-particle cryo-EM structures of RTC appear to have resulted in various degrees of openness.

      Here we were referring not to RTC structures in general, but to substrate-engaged RTCs in particular. The two substrate-engaged RTC structures under discussion in this paragraph are nearly identical (Figure 2c) despite large differences in substrate sequence (RhoTM2 vs preprolactin’s SP). We were surprised to find that this engaged structure creates noncovalent bonds between the Sec61 N-half and the ribosome. This bonding would tend to stabilise this particular engaged structure, and this stabilisation helps explain why the newly observed TM-engaged structure is so similar to the previously observed SP-engaged structure. Without this stabilising N-half interaction, one might instead expect to see more variability, such as the reviewer suggests.

      - A recent analysis of heimdallarchaea already hypothesized TRAP in these organisms and should be cited: Eme et al, Nature 618:992-999 (2023). The novel findings of this manuscript compared to Eme et al should be discussed.

      We thank the reviewer for bringing this relevant contemporaneous work to our attention. Reviewing the putative TRAP homologs identified by Eme et al., we find that most do not in fact appear to be TRAP homologs at all, judged by the measures used in our work (reciprocal HHpred queries against the human proteome and predicted structural similarity). This is not surprising since Eme et al. relied on low-threshold sequence similarity searches rather than structural measures. To acknowledge this work, we have added a sentence as follows (italics): “To test whether these candidates are also similar to TRAPαβγ in sequence, we used them to perform reciprocal HHpred queries of the human proteome, and in each case the corresponding human TRAP protein was the top hit (E = 0.031 for TRAPα, 9.4×10⁻¹⁴ for TRAPβ, and 110 for TRAPγ). A contemporaneous study has also claimed to find TRAP homologs in Heimdallarchaeota (Eme et al. 2023), although some caution is warranted in these assignments because they do not seem to share predicted structural similarity to TRAP subunits and do not find human homologs in reciprocal HHpred queries.”

      - Given that the authors expand the evolutionary analysis of TRAP to archaea it would be helpful if sampling for RAMP4 were consistent (i.e., is TRAP present in the early eukaryotes that do not feature RAMP4? Is RAMP4 absent from heimdallarchaea?).

      As stated in the text, RAMP4’s absence from early-branching eukaryotic taxa indicates that it was also absent from their archaeal ancestors. We did of course run such queries for completeness and indeed find no archaeal RAMP4. TRAP, for its part, is generally present in early-branching eukaryotic taxa, as stated in the text, and this necessarily includes those from which RAMP4 is absent.

      - The authors may consider discussing (Gemmer et al 2023, https://doi.org/10.1101/2023.11.28.569136), which comes to similar conclusions for NOMO integration into the MPT.

      We thank the reviewer for bringing this relevant work to our attention. We have added the following sentence to the section on NOMO: “Contemporaneous work has arrived at a similar model for PLD10-12 but did not model PLD1 (Gemmer et al. 2023).”

      - The abundance approximation of RAMP4 in the native translocon by OccuPy should probably be taken with a grain of salt. The '80%' mentioned in the conclusion may stick around and could eventually turn out to be closer to 100%.

      It is certainly possible that the occupancy of RAMP4 is higher than OccuPy estimates.

      Unfortunately, no available method can provide occupancy estimates with confidence intervals. The Western blots we have added to the revised manuscript are consistent with high occupancy, but cannot discriminate between 80% and 100%.

      Minor

      - p. 5: The following sentence is incomplete: "Together, these factors explain why RAMP4's occupancy in prior cryo-EM maps was low enough to be overlooked, although in hindsight seems to be visible in several7,68,69"

      Thank you for catching this typo. We have revised the sentence as follows: “Together, these factors explain why RAMP4's occupancy in prior cryo-EM maps was low enough to be overlooked, although in hindsight it is visible in several of them.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses:

      The authors demonstrate that ASGR1 is degraded in response to RSPO2RA-antibody treatment through both the proteasomal and the lysosomal pathway, suggesting that this is due to the RSPO2RA-mediated recruitment of ZNRF3/RNF43, which have E3 ubiquitin ligase activity. The paper doesn't show, however, if ASGR1 is indeed ubiquitinated.

      We thank the reviewer for this comment. We have now conducted ASGR1 ubiquitination assays by immunoprecipitation (IP) of ubiquitin in the membrane protein extract, and immunoblotting (IB) ASGR1 after treating HepG2 cells with our SWEETS molecules or controls. The new data demonstrated ubiquitination of ASGR1 with SWEETS treatment (new Fig. S3A and S3B). Additionally, we blocked the potential ubiquitination of ASGR1 by mutating the two lysine residues in the cytoplasmic domain and compared the ASGR1 degradation after SWEETS treatment. The new data show that removing the potential ubiquitylation Lys sites prevented ASGR1 degradation post SWEETS treatment (new Fig. S3C). These new results provide direct evidence that ASGR1 is ubiquitinated to undergo lysosome or proteasome degradation.

      The authors conclude that the RSPO2RA-Ab fusions can act as a targeted protein degradation platform, because they can degrade ASGR. While I agree with this statement, I would argue that the goal of these Abs would not be to degrade ASGR per se. The argumentation is a bit confusing here. This holds for both the results and the discussion section: The authors focus on the dual role of their agents, i.e. on promoting both WNT signaling AND on degrading ASGR1. They might want to reconsider how they present their data (e.g. it may be interesting to target ASGR1, but one would presumably then like to do this without also increasing WNT responsiveness?).

      We thank the reviewer for this comment. As the reviewer states, the initial goal of the RSPO2RA-Ab fusions was to generate tissue-specific RSPO mimetics that focus on elimination of E3. As an unintended consequence, we observed enhanced elimination of ASGR as well. While this was unintended, the results did provide proof of concept that when an E3 ligase is brought into proximity of another protein, ubiquitination and degradation of this protein may occur. Additionally, our results highlight that one needs to be careful in fully assessing the impact of bispecific molecules on the intended target as well as unintended targets to understand the potential side effects of such bispecific molecules. We have revised the manuscript to make this clearer, both in the Results and Discussion sections.

      Lines 326-331: The authors use a lot of abbreviations for all of the different protein targeting technologies, but since they are hinting at specific mechanisms, it would be better to actually describe the biological activity of LYTAC versus AbTAC/PROTAB/REULR so non-experts can follow.

      We thank the reviewer for this suggestion. We have added more details in the Discussion to highlight the different mechanisms of the various systems described.

      Can the authors comment on how 8M24 and 8G8 compare to 4F3? The latter seems a bit more specific (ie. lower background activity in the absence of ASGR1 in 5C)? Are there any differences/advances between 8M24 and 8G8 over 4F3? This remains unclear.

      These three antibodies bind different regions/epitopes on ASGR. 8M24 and 8G8 bind non-overlapping epitopes on the carbohydrate recognition domain (CRD), while 4F3 binds the stalk region outside of the CRD. This information is in the Results section of the manuscript. We do not believe that the difference in the ASGR binding epitopes contributes to the slight differences in the background activity. The slight differences may be due to differences in the conformation of the antibodies resulting from the differences in their primary sequences, and these differences may not be significant. We have now repeated the experiments in Fig. 5C and 5D to address the reviewer’s next comment on the axis. These new data (new Fig. 5C and 5D) show less background differences between the molecules.

      Can the authors ensure that the axes are labelled/numbered similarly for Fig 5B-D? This will make it easier to compare 5C and 5D.

      We thank the reviewer for this suggestion. The y-axes in Fig. 5B–D now have the same scale and number format. For Figs. 5C and 5D, we focus on the potency increases of the SWEETS molecules post ASGR1 overexpression.

      Reviewer #2 (Public Review):

      Weaknesses:

      The authors show crystal structures for binding of these antibodies to ASGR1/2, and hypothesize about why specificity is mediated through specific residues. They do not test these hypotheses.

      We thank the reviewer for this comment. We did not further test the residue contributions to binding and specificity as this is not the main focus of the current manuscript. We have revised the section and tuned down the claims for specificity.

      The authors demonstrate in hepatocyte cell lines that these function as mimetics, and that they do not function in HEK cells, which do not express ASGR1. They do not perform an exhaustive screen of all non-hepatocyte cells, nor do they test these molecules in vivo.

      We agree with the reviewer. For the 4F3-based SWEETS molecule, additional in vitro and in vivo specificity characterizations were performed and described in Zhang et al., Sci Rep, 2020. Since 8M24 is human specific and 8G8 only weakly interacts with mouse receptors, in vivo experiments in mouse were not performed. While we did not extensively test the 8M24- and 8G8-based SWEETS on additional cell lines or in vivo, we do believe the data presented strongly support the hepatocyte-specific effects of these molecules.

      Surprisingly, these molecules also induced loss of ASGR1, which the authors hypothesize is due to ubiquitination and degradation, initiated by the E3 ligases recruited to ASGR1. They demonstrate that inhibition of either the proteasome or lysosome abrogates this effect and that it is dependent on E1 ubiquitin ligases. They do not demonstrate direct ubiquitination of ASGR1 by ZNRF3/RNF43.

      We thank the reviewer for this comment. We have now conducted ASGR1 ubiquitination assays by immunoprecipitation (IP) of ubiquitin in the membrane protein extract, and immunoblotting (IB) ASGR1 after treating HepG2 cells with our SWEETS molecules or controls. The new data demonstrate ubiquitination of ASGR1 with SWEETS treatment (new Figs. S3A and S3B). Additionally, we blocked the potential ubiquitination of ASGR1 by mutating the two lysine residues in the cytoplasmic domain and compared the ASGR1 degradation after SWEETS treatment. The new data show that removing the potential ubiquitylation Lys sites prevented ASGR1 degradation post SWEETS treatment (new Fig. S3C). These new results provide direct evidence that ASGR1 is ubiquitinated to undergo lysosome or proteasome degradation.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are multiple instances where articles (i.e. the use of "the") are missing.

      We thank the reviewer for this comment. Following the suggestion, the manuscript has gone through a detailed review by an editorial service, and these and other grammatical errors have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      The best I can think of is to inject these into Wnt reporter mice (or maybe humanized mice) and see if the liver lights up while other tissues do not.

      We thank the reviewer for this suggestion. The liver specificity was demonstrated in vivo in our earlier publication (SciRep, 10:13951, 2020) with the 4F3-RSPO2RA molecule. Unfortunately, as the results in this manuscript show, the new ASGR binders 8M24 and 8G8 either do not bind or only weakly interact with mouse receptors. Therefore, the in vivo experiments were not performed here.

      You could also consider addressing some of the statements in the manuscript that are currently hypothetical experimentally.

      We thank the reviewer for this comment. We did not further test the residues’ contribution to binding and specificity as this is not the main focus of the current manuscript. We have revised the section and tuned down the claims for specificity.

      It would be easier to compare the graphs in 5B-D if all Y-axes were the same scale, with the same scientific notation.

      We thank the reviewer for this suggestion. The y-axes in Fig. 5B-D now have the same scale and number format. For Figs. 5C and 5D, we focus on the potency increases of the SWEETS molecules post ASGR1 overexpression.

      Some of the western blots in Figure 6 do not have antibody/target labels, making them harder to interpret.

      The antibody/target labels for all Western blots are on the right side of the blots in each panel. We have now made the text bold so that the labels are easier to identify.

      Figure 6 and Supplementary Figure 2 are the same I think.

      Figure 6 and Supplementary Figure 2 show the same experimental set-up performed on two different cell lines: Fig. 6 is on Huh7 cells and Supplementary Fig. 2 is on HepG2 cells. The results from these two cell lines are quite consistent, making their appearance very similar.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This is a valuable study that develops a new model of the way muscle responds to perturbations, synthesizing models of how it responds to small and large perturbations, both of which are used to predict how muscles function for stability but also how they can be injured, and which tend to be predicted poorly by classic Hill-type models. The evidence presented to support the model is solid, since it outperforms Hill-type models in a variety of conditions. Although the combination of phenomenological and mechanistic aspects of the model may sometimes make it challenging to interpret the output, the work will be of interest to those developing realistic models of the stability and control of movement in humans or other animals.

      Reviewer #1 (Public Review):

      Muscle models are important tools in the fields of biomechanics and physiology. Muscle models serve a wide variety of functions, including validating existing theories, testing new hypotheses, and predicting forces produced by humans and animals in health and disease. This paper attempts to provide an alternative to Hill-type muscle models that includes contributions of titin to force enhancement over multiple time scales. Due to the significant limitations of Hill-type models, alternative models are needed and therefore the work is important and timely.

      The effort to include a role for titin in muscle models is a major strength of the methods and results. The results clearly demonstrate the weaknesses of Hill models and the advantages of incorporating titin into theoretical treatments of muscle mechanics. Another strength is to address muscle mechanics over a large range of time scales.

      The authors succeed in demonstrating the need to incorporate titin in muscle models, and further show that the model accurately predicts in situ force of cat soleus (Kirsch et al. 1994; Herzog & Leonard, 2002) and rabbit psoas myofibrils (Leonard et al. 2010). However, it remains unclear whether the model will be practical for use with data from different muscles or preparations. Several ad hoc modifications were described in the paper, and the degree to which the model requires parameter optimization for different muscles, preparations and experiment types remains unclear.

      I think the authors should state how many parameters require fitting to the data vs the total number of model parameters. It would also be interesting for the authors to discuss challenges associated with modeling ex vivo and in vivo data sets, due to differences in means of stimulation vs. model inputs.

      (1) I think the authors should state how many parameters require fitting to the data vs the total number of model parameters.

      The model parameters are listed in Table 1. Each parameter has, in addition, references listed for the source of data (if one exists) along with how the data were used (’C’ calculated, ’F’ fit, ’E’ estimated, or ’S’ scaled) for the specific simulations that appear in this paper. While this is a daunting number of parameters, only a few of these parameters must be updated when modeling a new musculotendon.

      Similar to a Hill-type muscle model, at least 5 parameters are needed to fit the VEXAT model to a specific musculotendon: maximum isometric force (fiso), optimal contractile element (CE) length, pennation angle, maximum shortening velocity, and tendon slack length. However, similar to a Hill model, it is only possible to use this minimal set of parameters by making use of default values for the remaining set of parameters. The defaults we have used have been extracted from mammalian muscle (see Table 1) and may not be appropriate for modeling muscle tissue that differs widely in terms of the ratio of fast/slow twitch fibers, titin isoform, temperature, and scale.

      Even when these defaults are appropriate, variation is the rule for biological data rather than the exception. It will always be the case that the best fit can only be obtained by fitting more of the model’s parameters to additional data. Standard measurements of the active force-length relation, passive force-length relation, and force-velocity relations are quite helpful for improving the accuracy of the model for a specific muscle. It is challenging to improve the fit of the model’s cross-bridge (XE) and titin models because the data required are so rare. The experiments of Kirsch et al., Prado et al., and Trombitás et al. are, to our knowledge, unique. However, if more data become available, it is relatively straightforward to update the model’s parameters using the methods described in Appendix B or the code that appears online (https://github.com/mjhmilla/Millard2023VexatMuscle).

      We have modified the manuscript to make it clear that, in some circumstances, the burden of parameter identification for the VEXAT model can be as low as a Hill model:

      - Section 3: last two sentences of the 2nd paragraph, found at: Page 10, column 2, lines 1-12 of MillardFranklinHerzog v3.pdf and 05 MillardFranklinHerzog v2 v3 diff.pdf

      - Table 1: last two sentences of the caption, found at: Page 11 of MillardFranklinHerzog v3.pdf and 05 MillardFranklinHerzog v2 v3 diff.pdf

      (2) It would also be interesting for the authors to discuss challenges associated with modeling ex vivo and in vivo data sets, due to differences in means of stimulation vs. model inputs.

      All of the experiments simulated in this work are in-situ or ex-vivo. So far the main challenges of simulating any experiment have been quite consistent across both in-situ and ex-vivo datasets: there are insufficient data to fit most model parameters to a specific specimen and, instead, defaults from the literature must be used. In an ideal case, a specimen would have roughly ten extra trials collected so that the maximum isometric force, optimal fiber length, active force-length relation, passive force-length relation (up to ≈ 0.6 f_o^M), and the force-velocity relations could be identified from measurements rather than relying on literature values. Since most lab specimens are viable for a small number of trials (with the exception of cat soleus), we don’t expect this situation to change in the future.

      However, if data are available, the fitting process is straightforward for either in-situ or ex-vivo data: use a standard numerical method (for example non-linear least squares, or the bisection method) to adjust the model parameters to reduce the errors between simulation and experiment. The main difficulty, as described in the previous paragraph, is the availability of data to fit as many parameters as possible for a specific specimen. As such, the fitting process really varies from experiment to experiment and depends mainly on the richness of measurements taken from a specific specimen, and from the literature in general.
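
      As a rough illustration of the single-parameter case, the sketch below uses bisection to choose a curve parameter so that a model passive force-length curve passes through one measured point. The exponential curve, the parameter name `shape`, the bracket limits, and the synthetic "measurement" are all hypothetical placeholders chosen for illustration. They are not the VEXAT model's curves, nor the fitting routines in the repository mentioned above.

```python
import math


def passive_force(norm_length, shape):
    """Hypothetical normalized passive force-length curve (illustration only)."""
    # Zero force at or below slack (norm_length <= 1), exponential rise above it,
    # scaled so that the force equals 1.0 at norm_length = 1.6.
    stretch = max(norm_length - 1.0, 0.0) / 0.6
    return (math.exp(shape * stretch) - 1.0) / (math.exp(shape) - 1.0)


def fit_shape_by_bisection(norm_length, measured_force, lo=0.1, hi=20.0, tol=1e-10):
    """Find `shape` so the curve passes through (norm_length, measured_force)."""

    def error(shape):
        return passive_force(norm_length, shape) - measured_force

    # Bisection needs a sign change of the error across the bracket.
    if error(lo) * error(hi) > 0.0:
        raise ValueError("bracket does not contain a root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if error(lo) * error(mid) <= 0.0:
            hi = mid  # root lies in the lower half
        else:
            lo = mid  # root lies in the upper half
    return 0.5 * (lo + hi)


# Synthetic check: generate a "measurement" from a known shape, then recover it.
true_shape = 4.0
measured = passive_force(1.3, true_shape)
fitted_shape = fit_shape_by_bisection(1.3, measured)
```

      With several parameters and many data points, the same idea would normally be handed to a non-linear least-squares routine instead of a hand-written bisection loop.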

      Working from in-vivo data presents an entirely different set of challenges. When working with human data, for example, it’s just not possible to directly measure muscle force with tendon buckles, and so it is never completely clear how force is distributed across the many muscles that typically actuate a joint. Further, there is also uncertainty in the boundary condition of the muscle because optical motion capture markers will move with respect to the skeleton. Video fluoroscopy offers a method of improving the accuracy of measured boundary conditions, though only for a few labs due to its great expense. A final boundary condition remains impossible to measure in any case: the geometry and forces that act at the boundaries as muscle wraps over other muscles and bones. Fitting to in-vivo data is very difficult.

      While this is an interesting topic, it is tangential to our already lengthy manuscript. Since these reviews are public, we’ll leave it to the motivated reader to find this text here.

      Reviewer #2 (Public Review):

      This model of skeletal muscle includes springs and dampers which aim to capture the effect of crossbridge and titin stiffness during the stretch of active muscle. While both crossbridge and titin stiffness have previously been incorporated, in some form, into models, this model is the first to simultaneously include both. The authors suggest that this will allow for the prediction of muscle force in response to short-, mid- and long-range stretches. All these types of stretch are likely to be experienced by muscle during in vivo perturbations, and are known to elicit different muscle responses. Hence, it is valuable to have a single model which can predict muscle force under all these physiologically relevant conditions. In addition, this model dramatically simplifies sarcomere structure to enable this muscle model to be used in multi-muscle simulations of whole-body movement.

      In order to test this model, its force predictions are compared to 3 sets of experimental data which focus on short-, mid- and long-range perturbations, and to the predictions of a Hill-type muscle model. The choice of data sets is excellent and provides a robust test of the model’s ability to predict forces over a range of length perturbations. However, I find the comparison to a Hill-type muscle model to be somewhat limiting. It is well established that Hill-type models do not have any mechanism by which they can predict the effect of active muscle stretch. Hence, that the model proposed here represents an improvement over such a model is not a surprise. Many other models, some of which are also simple enough to be incorporated into whole-body simulations, have incorporated mechanistic elements which allow for the prediction of force responses to muscle stretch. And it is not clear from the results presented here that this model would outperform such models.

      The paper begins by outlining the phenomenological vs mechanistic approaches taken to muscle modelling, historically. It appears, although it is not directly specified, that this model combines these approaches. A somewhat mechanistic model of the response of the crossbridges and titin to active stretch is combined with a phenomenological implementation of force-length and force-velocity relationships. This combination of approaches may be useful for improving the accuracy of predictions of muscle models and whole-body simulations, which is certainly a worthy goal. However, it also may limit the insight that can be gained. For example, it does not seem that this model could reflect any effect of active titin properties on muscle shortening. In addition, it is not clear to me, either physiologically or in the model, what drives the shift from the high stiffness in short-range perturbations to the somewhat lower stiffness in mid-range perturbations.

      (1) It is well established that Hill-type models do not have any mechanism by which they can predict the effect of active muscle stretch.

      While many muscle physiologists are aware of the limitations of the Hill model, these limitations are not so well known among computational biomechanists. There are at least two reasons for this gap: there are few comprehensive evaluations of Hill models against several experiments, and some of the differences are quite nuanced. For example, active lengthening experiments can be replicated reasonably well using a Hill model if the lengthening is done on the ascending limb of the force-length curve. Clearly the story is quite different on the descending limb, as shown in Figure 9. Similarly, as Figure 8 shows, by choosing the right combination of tendon model and perturbation bandwidth it is possible to get reasonably accurate responses from the Hill model to stochastic length changes. Yet when a wide variety of perturbation bandwidths, magnitudes, and tendon models are tested, it is clear that the Hill model cannot, in general, replicate the response of muscle to stochastic perturbations. For these reasons we think many of the Hill model’s drawbacks have not been clearly understood by computational biomechanists for many years now.

      (2) Many other models, some of which are also simple enough to be incorporated into whole-body simulations, have incorporated mechanistic elements which allow for the prediction of force responses to muscle stretch. And it is not clear from the results presented here that this model would outperform such models.

      We agree that it will be valuable to benchmark other models in the literature using the same set of experiments. Hopefully we, or perhaps others, will have the good fortune to secure research funding to continue this benchmarking work. This will, however, be quite challenging: few muscle models are accompanied by a professional-quality open-source implementation. Without such an implementation it is often impossible to reproduce published results let alone provide a fair and objective evaluation of a model.

      (3) For example, it does not seem that this model could reflect any effect of active titin properties on muscle shortening.

      The titin model described in the paper will provide an enhancement of force during a stretch-shortening cycle. This certainly would be an interesting next experiment to simulate in a future paper.

      (4) In addition, it is not clear to me, either physiologically or in the model, what drives the shift from the high stiffness in short-range perturbations to the somewhat lower stiffness in mid-range perturbations.

      We can only respond to what drives the frequency-dependent stiffness in the model, though we’re quite interested in what happens physiologically. Hopefully some new experiments will be done to examine this phenomenon in the future. In the case of the model, the reasons are straightforward: the formulation of Eqn. 16 is responsible for this shift.

      Equation 16 has been formulated so that the acceleration of the attachment point of the XE is driven by the force difference between the XE and a reference Hill model (numerator of the first term in Eqn. 16) which is then low pass filtered (denominator of the first term in Eqn. 16). Due to this formulation the attachment point moves less when the numerator is small, or when the differences in the numerator change rapidly and effectively become filtered out. When the attachment point moves less, more of the CE’s force output is determined by variations in the length of the XE and its stiffness.

      On the other hand, the attachment point will move when the numerator of the first term in Eqn. 16 is large, or when those differences are not short lived. When the attachment point moves to reduce the strain in the XE, the force produced by the XE’s spring-damper is reduced. As a result, the CE’s force output is less influenced by variations of the length of the XE and its stiffness.
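
      The behaviour described in the two paragraphs above can be reproduced qualitatively with a toy linear system. The sketch below is not Eqn. 16: it is a simplified analogue in which the attachment point's acceleration is a force difference divided by a time constant, minus a damping term. The function name `simulate`, the coefficients `k`, `b`, `tau`, and `c`, the zero reference force, and the two test frequencies are all assumptions chosen only to make the filtering effect visible.

```python
import math


def simulate(freq_rad_s, amplitude=0.01, k=50.0, b=0.5, tau=1.0, c=10.0,
             dt=1e-4, t_end=20.0):
    """Toy analogue: return (peak attachment-point motion, peak XE strain)."""
    s, ds = 0.0, 0.0  # attachment point position and velocity
    peak_s, peak_strain = 0.0, 0.0
    n_steps = int(t_end / dt)
    for i in range(n_steps):
        t = i * dt
        # Imposed sinusoidal change of CE length and its rate.
        x = amplitude * math.sin(freq_rad_s * t)
        dx = amplitude * freq_rad_s * math.cos(freq_rad_s * t)
        strain = x - s                      # change in XE length
        f_xe = k * strain + b * (dx - ds)   # XE spring-damper force
        f_ref = 0.0                         # reference force, held constant here
        # Force difference scaled by tau, plus damping on the attachment point.
        dds = (f_xe - f_ref) / tau - c * ds
        ds += dds * dt                      # semi-implicit Euler step
        s += ds * dt
        if t > 0.5 * t_end:                 # ignore the start-up transient
            peak_s = max(peak_s, abs(s))
            peak_strain = max(peak_strain, abs(strain))
    return peak_s, peak_strain


slow_s, slow_strain = simulate(freq_rad_s=1.0)    # slow length change
fast_s, fast_strain = simulate(freq_rad_s=100.0)  # rapid length change
```

      In this toy system the attachment point nearly follows the slow input, which unloads the spring-damper, while it barely moves for the rapid input, so almost all of the imposed length change appears as XE strain.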

      Reviewer #2 (Recommendations for the Authors):

      I find the clarity of the manuscript to be much improved following revision. While I still find the combination of phenomenological and mechanistic approaches to be a little limiting with regards to our understanding of muscle contraction, the revised description of small length changes makes the interpretation much less confusing.

      Similarly, while I agree that Hill-type models are widely used their limitations have been addressed extensively and are very well established. Hence, moving forward I think it would be much more valuable to start to compare these newer models to one another rather than just showing an improvement over a Hill model under (very biologically important) conditions which that model has no capacity to predict forces.

      (1) While I still find the combination of phenomenological and mechanistic approaches to be a little limiting with regards to our understanding of muscle contraction ...

      We have had to abstract some of the details of reality to have a model that can be used to simulate hundreds of muscles. In contrast, FiberSim produced by Kenneth Campbell’s group uses much less abstraction and might be of greater interest to you. FiberSim’s models include individual cross-bridges, titin molecules, and an explicit representation of the spatial geometry of a sarcomere. While this model is a great tool for testing muscle physiology questions through simulation, it is computationally expensive to use this model to simulate hundreds of muscles simultaneously.

      Kosta S, Colli D, Ye Q, Campbell KS. FiberSim: A flexible open-source model of myofilament-level contraction. Biophysical journal. 2022 Jan 18;121(2):175-82. https://campbell-muscle-lab.github.io/FiberSim/

      (2) Similarly, while I agree that Hill-type models are widely used their limitations have been addressed extensively and are very well established.

      Please see our response 1 to Reviewer # 1.

      (3) Hence, moving forward I think it would be much more valuable to start to compare these newer models to one another rather than just showing an improvement over a Hill model under (very biologically important) conditions which that model has no capacity to predict forces.

      Please see our response 2 to Reviewer #1.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      In the paper by Choi et al., the authors aimed to develop base editing strategies to convert CAG repeats to CAA repeats in the huntingtin gene (HTT), which causes Huntington's disease (HD). They hypothesized that this conversion would delay disease onset by shortening the uninterrupted CAG repeat. Using HEK-293T cells as a model, the researchers employed cytosine base editors and guide RNAs (gRNAs) to efficiently convert CAG to CAA at various sites within the CAG repeat. No significant indels, off-target edits, transcriptome alterations, or changes in HTT protein levels were detected. Interestingly, somatic CAG repeat expansion was completely abolished in HD knock-in mice carrying CAA-interrupted repeats. 

      Correction of factual errors

      We analyzed HEK293 cells, not "HEK-293T".

      Strengths: 

      This study represents the first proof-of-concept exploration of the cytosine base editing technique as a potential treatment for HD and other repeat expansion disorders with similar mechanisms. 

      Weaknesses: 

      Given that HD is a neurodegenerative disorder, it is crucial to determine the efficiency of the base editing strategies tested in this manuscript and their feasibility in relevant cells affected by HD and the brain, which needed to be improved in this manuscript. 

      We appreciate the reviewer's constructive recommendations. Our genetic investigation focused on understanding observations in HD patients to develop genetic-based treatment strategies and test their feasibility. We agree with the reviewer regarding the importance of data from relevant cell types. Unfortunately, the levels of CAG-to-CAA conversion in the patient-derived neurons were modest, as described in our manuscript (approximately 2%). In addition, AAV did not produce detectable conversions in the brain of HD knock-in mice (data not shown), which was somewhat expected from the literature (PMID: 31937940). We believe some technical hurdles can be overcome by developing efficient delivery methods. Nonetheless, it will be an important follow-up study to perform preclinical studies employing optimized base editing strategies and efficient brain delivery methods to fully demonstrate the therapeutic potential of BE strategies. 

      Reviewer #2 (Public Review):

      Summary: 

      In a proof-of-concept study with the aspiration of developing a treatment to delay HD onset, Choi et al. design and test an A>G DNA base editing strategy to exploit the recently established inverse relationship between the number of uninterrupted CAG repeats in polyglutamine repeat expansions and the age-of-onset of Huntington's Disease (HD). Most of the study is devoted to optimizing a base editing strategy typified by BE4max and gRNA2. The base editing is performed in human HEK293 cells engineered with a 51 CAG canonical repeat and in HD knock-in mice harboring 105+ CAG repeats. 

      Correction of factual errors

      We tested base editing strategies aimed at C > T conversion, not A > G DNA base editing. In addition to HEK293 and knock-in mice, we tested base editing strategies in patient-derived iPSC and neurons.

      Weaknesses: 

      Genotypic data on DNA editing are not portrayed in a clear manner consistent with the study's goal, namely reducing the number of uninterrupted CAG repeats by a clinically relevant amount according to the authors' least square approximated mean age-at-onset. No phenotypic data are presented to show that editing performed in either model would lead to reduced hallmarks of HD onset. 

      More evidence is needed to support the central claims and therapeutic potential needs to be more adequate. 

      Our strategies for converting CAG to CAA in model systems resulted in quantitative DNA modification in a population of cells. Consequently, individual cells may carry different genotypes, some harboring CAA and others CAG at the same genomic location. Therefore, using a standard genotype format for DNA to present base editing outcomes may not be ideal. Instead, we presented the resulting genotype data in a quantitative fashion to provide the percentage of conversion at each site. This approach allows for an intuitive interpretation of both the extent of repeat length reduction and the proportion of such modifications.
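
      To make the quantitative presentation concrete, the following minimal sketch computes the percentage of conversion at each site and the longest uninterrupted CAG run per read. The reads are synthetic and purely illustrative (they are not data from the study), the function names are hypothetical, and the sketch assumes every read has already been split into codons of equal count.

```python
def percent_caa_by_site(reads):
    """Percentage of reads carrying CAA at each codon position of the repeat."""
    n_sites = len(reads[0])
    counts = [0] * n_sites
    for read in reads:
        for i, codon in enumerate(read):
            if codon == "CAA":
                counts[i] += 1
    return [100.0 * count / len(reads) for count in counts]


def longest_uninterrupted_cag(read):
    """Length, in codons, of the longest run of consecutive CAG codons."""
    best = run = 0
    for codon in read:
        run = run + 1 if codon == "CAG" else 0
        best = max(best, run)
    return best


# Synthetic population of four reads over a six-codon repeat (illustration only).
reads = [
    ["CAG", "CAG", "CAG", "CAG", "CAG", "CAG"],  # unedited
    ["CAG", "CAG", "CAA", "CAG", "CAG", "CAG"],  # converted at site 3
    ["CAG", "CAG", "CAA", "CAG", "CAG", "CAG"],  # converted at site 3
    ["CAG", "CAA", "CAA", "CAG", "CAG", "CAG"],  # converted at sites 2 and 3
]
per_site = percent_caa_by_site(reads)
runs = [longest_uninterrupted_cag(read) for read in reads]
```

      Every read still encodes six glutamines, since both CAG and CAA code for glutamine, but the edited reads carry a shorter uninterrupted CAG run.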

      Currently, genetically precise HD mouse models with robust motor and behavioral phenotypes are unavailable. While some HD mouse models, such as the BAC and YAC models, feature pronounced behavioral phenotypes, they consist of interrupted CAG repeat sequences, making them unsuitable for base conversion studies due to their inherently short uninterrupted repeats. Although genetically precise HD knock-in mouse models exist, they do not manifest motor symptom-like phenotypes. Given that CAG repeat expansion is the primary driver of the disease and knock-in mice recapitulate this phenomenon, our genetic investigation focused on assessing the effects of base conversion on CAG repeat instability in knock-in mice. However, as emphasized by the reviewer, subsequent preclinical studies to evaluate the therapeutic efficacy of CAG-to-CAA conversion strategies using mouse models harboring uninterrupted adult-onset CAG repeats and robust HD-like phenotypes remain crucial.

      Reviewer #3 (Public Review):

      Summary: 

      In human patients with Huntington's disease (HD), caused by a CAG repeat expansion mutation, the number of uninterrupted CAG repeats at the genomic level influences age-at-onset of clinical signs independent of the number of polyglutamine repeats at the protein level. In most patients, the CAG repeat terminates with a CAACAG doublet. However, CAG repeat variants exist that either do not have that doublet or have two doublets. These variants consequently differ in their number of uninterrupted CAG repeats, while the number of glutamine repeats is the same as both CAA and CAG codes for glutamine. The authors first confirm that a shorter uninterrupted CAG repeat number in human HD patients is associated with developing the first clinical signs of HD later. They predict that introducing a further CAA-CAG doublet will result in years of delay of clinical onset. Based on this observation, the authors tested the hypothesis that turning CAG to CAA within a CAG repeat sequence using base editing techniques will benefit HD biology. They show that, indeed, in HD cell models (HEK293 cells expressing 16/17 CAG repeats; a single human stem cell line carrying a CAG repeat expansion in the fully penetrant range with 42 CAG repeats), their base editing strategies do induce the desired CAG-CAA conversion. The efficiency of conversion differed depending on the strategy used. In stem cells, delivery posed a problem, so to test allele specificity, the authors then used a HEK 293 cell line with 51 CAG repeats on the expanded allele. Conversion occurred in both alleles with huntingtin protein and mRNA levels; transcriptomics data was unchanged. In knock-in mice carrying 110 CAG repeats, however, base editing did not work as well for different, mainly technical, reasons. 

      Correction of factual errors

      "HD cell models (HEK293 cells expressing 16/17 CAG repeats" is an incorrect description. It should be "HD cell models (HEK293 cells expressing 51/17 CAG repeats".

      Strengths: 

      The authors use state-of-the-art methods and carefully and thoroughly designed experiments. The data support the conclusions drawn. This work is a very valuable translation from the insight gained from large GWAS studies into HD pathogenesis. It rightly emphasises the potential this has as a causal treatment in HD, while the authors also acknowledge important limitations. 

      Weaknesses: 

      They could dedicate a little more to discussing several of the mentioned challenges. The reader will better understand where base editing is in HD currently and what needs to be done before it can be considered a treatment option. For instance, 

      - It is important to clarify what can be gained by examining again the relationship between uninterrupted CAG repeat length and age-at-onset. Could the authors clarify why they do this and what it adds to their already published GWAS findings? What is the n of datasets? 

      Published HD GWAS (PMID: 31398342) compared the onset age of duplicated interruption and loss of interruption to that of canonical repeats to determine whether uninterrupted CAG repeat or polyglutamine determines age at onset. However, GWAS findings did not quantify the magnitude of the unexplained remaining variance in age at onset in duplicated interruption and loss of interruption. Our study investigated this further to quantify the additional impact of duplicated interruption and thereby estimate the maximum clinical benefit of base editing strategies for CAG-to-CAA conversion. Since the purpose of this genetic analysis is already described in the result section, we added the following sentence to the introduction section to highlight what is unknown. 

      "Still, age at onset of loss of interruption and duplicated interruption was not fully accounted for by uninterrupted CAG repeat, suggesting additional effects of non-canonical repeats."

      We added sample size for the least square approximation analysis in the text and corresponding figure legend. Sample sizes for molecular and animal experiments can be found in the corresponding figure legend.

      - What do they think an ideal conversion rate would be, and how that could be achieved? 

      It is a very important question. However, speculating on the ideal conversion levels is outside the scope of this genetic investigation. A series of preclinical studies using relevant models may generate data that shed light on the conversion rates required to produce meaningful clinical benefits. In the discussion section, we added the following sentence. 

      "Currently, the ideal levels of CAG-to-CAA conversion that produce significant clinical benefits are unknown. A series of preclinical studies using relevant model systems may generate data that may shed light on the optimal conversion rate levels that are required to produce significant clinical benefits."

      - Is there a dose-effect relationship for base editing, and would it be realistic to achieve the ideal conversion rate in target cells, given the difficulties described by the authors in differentiated neurons from stem cells? 

      We observed a clear dose-response relationship between the amount of BE reagents and the levels of conversion in non-neuronal cells. Unfortunately, the conversion rate was low in neuronal cells, potentially due to limited delivery, as speculated in the result section. As described in the discussion sections, we predict that efficient delivery methods will be crucial to produce significant CAG-to-CAA conversion to achieve therapeutic benefits.

      - The liver is a good tool for in-vivo experiments examining repeat instability in mouse models. However, the authors could comment on why they did not examine the brain.

      We focused on liver instability because of 1) the expectation that delivery/targeting efficiency is significantly lower in the brain (PMID: 31937940) and 2) shared underlying mechanisms between the brain and liver (described in the result section). The following sentence was added in the method section to provide a rationale for liver analysis. 

      "Since significantly lower delivery/targeting efficiency was expected in the brain 34, we focused on analyzing liver instability."

      - Is there a limit to judging the effects of base editing on somatic instability with longer repeats, given the difficulties in measuring long CAG repeat expansions? 

      Determining the levels of base conversion using sequencing technologies gets harder as repeats become longer. Fragment analysis can overcome such technical difficulty if conversion efficiency is high. As pointed out, the repeat expansion measure is also challenging because amplification is biased toward shorter alleles. However, if repeat sizes are relatively similar, the levels of repeat expansion as a function of base conversion can be determined relatively precisely without a significant bias by a standard fragment analysis approach. 

      - Given the methodological challenges for assessing HTT fragments, are there other ways to measure the downstream effects of base editing rather than extrapolate what it will likely be?

      Our CAG-to-CAA conversion strategies are not expected to directly generate fragments of huntingtin DNA, RNA, or protein. In contrast, immediate downstream effects of CAG-to-CAA conversion include sequence changes (DNA and RNA) and alteration of repeat instability, which are presented in the manuscript. If repeat instability is associated with HTT exon 1A fragment, base conversion strategies may indirectly alter the levels of such putative toxic species, which remains to be determined.  

      - Sequencing errors could mask low-level, but biologically still relevant, off-target effects (such as gRNA-dependent and gRNA-independent DNA off-targets, RNA off-targets, bystander editing). How likely is that? 

      We agree with the reviewer that increased editing efficiency is expected to increase the levels of off-target editing. However, the field is actively developing base editors with minimal off-target effect (PMID: 35941130), which will increase the safety aspects of this technology for clinical use. We added the following sentence.  "In addition, developing base editors with high level on-target gene specificity and minimal off-target effects is a critical aspect to address 100."

      - How worried are the authors about immune responses following base editing? How could this be assessed? 

      We added the following sentence in the discussion section as the reviewer raised an important safety issue.  

      "Thorough assessments of immune responses against base editing strategies (e.g., development of antibody, B cell, and T cell-specific immune responses) and subsequent modification (e.g., immunosilencing) 101 will be critical to address immune response-associated safety issues of BE strategies."

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The following points could be considered to improve the overall quality of the manuscript: 

      (1) The authors mentioned that the reason for checking repeat instability in the nonneuronal cells was due to the availability of specific types of AAV; there are other subtypes of AAVs available to infect neurons and iPSCs. 

      Our pilot experiments testing several AAV serotypes in patient-derived iPSC and HD knock-in mice showed that only AAV9 converted CAG to CAA at detectable levels in the liver, not in the brain or neurons. We also speculate that difficulties in targeting the CAG repeat region due to GC-rich sequence contributed to low conversion efficiency. Therefore, subsequent optimization of base editor and delivery may improve BE strategies for HD, permitting robust conversion at the challenging locus. 

      (2) Despite its bold nature, minimal data in the manuscript demonstrate that this gene editing strategy is disease-modifying.

      Resources required to demonstrate the therapeutic benefits of CAG-to-CAA conversion strategies are not fully available. In particular, relevant HD mouse models that carry uninterrupted adult-onset CAG repeats and that permit measuring the degree of disease modification are lacking, as described in our response to the second reviewer. Given that CAG repeat expansion is the primary driver of the disease, this genetic investigation focused on determining the impacts of base editing strategies on CAG repeat expansion. Still, as indicated by the reviewer, follow-up preclinical studies to evaluate the disease-modifying effects of CAG-to-CAA conversion strategies using relevant mouse models represent important next steps.

      (3) Off-target analysis at the DNA level was limited to "predicted" off-target sites. What about possible translocations that can result from co-nicking on different chromosomes, as a large number of potential targets exist? 

      Among the gRNAs we tested, we focused on gRNAs 1 and 2, for which small numbers of off-targets were predicted. Therefore, our off-target analysis at the DNA level was focused on validating those predicted off-targets. As pointed out, thoroughly evaluating off-target effects will be necessary when candidate BE strategies take the next steps for therapeutic development.

      Genomic translocation caused by double-strand breaks can produce negative consequences, such as cancer. Importantly, although paired nicks efficiently induced translocations, translocations were not detected when a single nick was introduced on each chromosome (PMID: 25201414). Therefore, BE strategies using a nickase are predicted to confer little risk of translocation.

      (4) For in vivo work, somatic repeat expansion was analyzed only in peripheral tissue samples. Since the main affected cellular population in HD is the brain, the outcome of this treatment on a disease-relevant organ still needs to be determined. 

      Challenges in delivery to the brain led us to measure instability in the liver, since many mechanistic components of somatic CAG repeat instability are shared between the liver and striatum, as rationalized in the manuscript. However, we agree with the reviewer regarding the importance of determining the effects of base conversion on brain instability. We added the following sentence to the Methods section to provide a rationale: "Since significantly lower delivery/targeting efficiency was expected in brain 34, we focused on analyzing liver instability."

      Reviewer #2 (Recommendations For The Authors):

      Throughout the manuscript, the authors apologize for techniques that do not work when workarounds seem readily apparent to an expert in the field. In its current form, the manuscript reads verbose, speculative, apologetic, and preliminary. 

      Drug development programs that are supported by human genetics data show increased success rates in clinical trials (PMID: 26121088, 31827124, 31830040). This is why this genetic study focused on 1) investigating observations in HD subjects and 2) subsequently developing treatment strategies that are supported by patient genetics. As the first illustration of base editing in HD, the main scope of our manuscript is to justify the genetic rationale of CAG-to-CAA conversion and demonstrate the feasibility of therapeutic strategies rooted in patient genetics. As our study was not aimed at entirely demonstrating the clinical benefits of base editing strategies in HD, some of our data were based on tools and approaches that were not fully optimal. We agree with the reviewer that it will be an important next step to employ optimized approaches to evaluate the efficacy of base editing strategies in model systems. Nevertheless, our novel base conversion strategies derived from HD patient genetics represent a significant advancement as they may contribute to developing effective treatments for this devastating disorder. 

      Reviewer #3 (Recommendations For The Authors):

      It would make for an easier read if abbreviations were kept to a minimum. 

      As recommended, we decreased the use of abbreviations. The following has been spelled out throughout the manuscript: CR (canonical repeat), LI (loss of interruption), DI (duplicated interruption), and CBE (cytosine base editor). Other abbreviations with infrequent usage (e.g., ABE, SS, QC) were also spelled out in the text.

    1. Author response:

      Reviewer #1: 

      Summary:

      In this study, the authors used a multi-alternative decision task and a multidimensional signal-detection model to gain further insight into the cause of perceptual impairments during the attentional blink. The model-based analyses of behavioural and EEG data show that such perceptual failures can be unpacked into distinct deficits in visual detection and discrimination, with visual detection being linked to the amplitude of late ERP components (N2p and P3) and discrimination being linked to the coherence of fronto-parietal brain activity.

      Strengths:

      The main strength of this paper lies in the fact that it presents a novel perspective on the cause of perceptual failures during the attentional blink. The multidimensional signal-detection modelling approach is explained clearly, and the results of the study show that this approach offers a powerful method to unpack behavioural and EEG data into distinct processes of detection and discrimination.

      Weaknesses:

      (1.1) While the model-based analyses are compelling, the paper also features some analyses that seem misguided, or, at least, insufficiently motivated and explained. Specifically, in the introduction, the authors raise the suggestion that the attentional blink could be due to a reduction in sensitivity or a response bias. The suggestion that a response bias could play a role seems misguided, as any response bias would be expected to be constant across lags, while the attentional blink effect is only observed at short lags. Thus, it is difficult to understand why the authors would think that a response bias could explain the attentional blink.

      A deficit in T2 identification accuracy could arise from either sensitivity or criterion effects; the criterion effect may manifest as a choice bias. For example, in short T1-T2 lag trials, when T2 closely follows T1, participants may adopt a more conservative choice criterion for reporting the presence of T2. Moreover, criterion effects need not be uniform across lags: A participant could infer the T1-T2 lag interval based on various factors, including trial length, thereby permitting them to adjust their choice criterion variably across different lags. We will provide a more detailed illustration of this claim in the revision.
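      The distinction drawn here between sensitivity and criterion can be illustrated with a minimal one-dimensional, equal-variance Gaussian signal-detection sketch. This is a textbook simplification and not the multidimensional model used in the study; the function names and the numerical values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def rates_from(d_prime, criterion):
    """Hit and false-alarm rates implied by a given d' and criterion c."""
    hit = _STD_NORMAL.cdf(d_prime / 2 - criterion)
    false_alarm = _STD_NORMAL.cdf(-d_prime / 2 - criterion)
    return hit, false_alarm


def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Recover (d', c) from hit and false-alarm rates."""
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


# Hypothetical observer with fixed sensitivity (d' = 1.5) who adopts a
# neutral criterion at long lags and a more conservative one at short lags.
hit_long, fa_long = rates_from(1.5, 0.0)
hit_short, fa_short = rates_from(1.5, 0.5)

print(f"long lag:  hit={hit_long:.3f}, false alarm={fa_long:.3f}")
print(f"short lag: hit={hit_short:.3f}, false alarm={fa_short:.3f}")
print(dprime_and_criterion(hit_short, fa_short))
```

      In this sketch the hit rate on T2-present trials drops at the short lag even though d' is unchanged, which is why an accuracy deficit alone cannot separate the two accounts.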

      (1.2) A second point of concern regards the way in which the measures for detection and discrimination accuracy were computed. If I understand the paper correctly, a correct detection was defined as either correctly identifying T2 (i.e., reporting CW or CCW if T2 was CW or CCW, respectively, see Figure 2B), or correctly reporting T2's absence (a correct rejection). Here, it seems that one should also count a misidentification (i.e., incorrect choice of CW or CCW when T2 was present) as a correct detection, because participants apparently did detect T2, but failed to judge/remember its orientation properly in case of a misidentification. Conversely, the manner in which discrimination performance is computed also raises questions. Here, the authors appear to compute accuracy as the average proportion of T2-present trials on which participants selected the correct response option for T2, thus including trials in which participants missed T2 entirely. Thus, a failure to detect T2 is now counted as a failure to discriminate T2. Wouldn't a more proper measure of discrimination accuracy be to compute the proportion of correct discriminations for trials in which participants detected T2?

      Detection and discrimination accuracies were computed with precisely the same procedure, and under the same conditions, as described by the Reviewer (underlined text, above). We regret our poor description; we will improve upon it in the revised manuscript.
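      The definitions proposed in comment 1.2 can be sketched as follows: a misidentification counts as a correct detection, and discrimination accuracy is computed only over T2-present trials on which an orientation was reported. This is a minimal illustration under those stated definitions; the response counts, labels, and function name are hypothetical and are not taken from the study.

```python
def detection_and_discrimination_accuracy(counts):
    """Compute (detection accuracy, discrimination accuracy).

    counts[stimulus][response] holds trial counts, with both stimulus and
    response taking the labels 'CW', 'CCW', or 'absent'.
    """
    present = ("CW", "CCW")
    total = sum(sum(row.values()) for row in counts.values())

    # Detection: any orientation report on a T2-present trial counts as
    # detected, even if the reported orientation is wrong.
    detected = sum(counts[s][r] for s in present for r in present)
    correct_rejections = counts["absent"]["absent"]
    detection_acc = (detected + correct_rejections) / total

    # Discrimination: correct orientation among detected T2-present trials,
    # so that misses do not count as discrimination failures.
    correct_orientation = sum(counts[s][s] for s in present)
    discrimination_acc = correct_orientation / detected
    return detection_acc, discrimination_acc


# Hypothetical response counts (rows: T2 stimulus, columns: response).
example_counts = {
    "CW": {"CW": 30, "CCW": 10, "absent": 10},
    "CCW": {"CW": 8, "CCW": 32, "absent": 10},
    "absent": {"CW": 5, "CCW": 5, "absent": 40},
}

print(detection_and_discrimination_accuracy(example_counts))
```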

      (1.3) My last point of critique is that the paper offers little if any guidance on how the inferred distinction between detection and discrimination can be linked to existing theories of the attentional blink. The discussion mostly focuses on comparisons to previous EEG studies, but it would be interesting to know how the authors connect their findings to extant, mechanistic accounts of the attentional blink. A key question here is whether the finding of dissociable processes of detection and discrimination would also hold with more meaningful stimuli in an identification task (e.g., the canonical AB task of identifying two letters shown amongst digits). There is evidence to suggest that meaningful stimuli are categorized just as quickly as they are detected (Grill-Spector & Kanwisher, 2005; Grill-Spector K, Kanwisher N. Visual recognition: as soon as you know it is there, you know what it is. Psychol Sci. 2005 Feb;16(2):152-60. doi: 10.1111/j.0956-7976.2005.00796.x. PMID: 15686582.). Does that mean that the observed distinction between detection and discrimination would only apply to tasks in which the targets consist of otherwise meaningless visual elements, such as lines of different orientations?

      Our results are consistent with previous literature suggested by the Reviewer. Specifically, we do not claim that detection and discrimination are sequential processes; in fact, we modeled them as concurrent computations (Figs. 3A-B). Yet, our results suggest that these processes possess distinct neural bases. We have discussed this idea briefly in the Discussion section (e.g., “Yet, we found no evidence for these two computations being sequential…”). We will discuss this further in the revised manuscript in the context of previous literature.

      Reviewer #2:

      Summary:

      The authors had two aims: First, to decompose the attentional blink (AB) deficit into the two components of signal detection theory; sensitivity and bias. Second, the authors aimed to assess the two subcomponents of sensitivity; detection and discrimination. They observed that the AB is only expressed in sensitivity. Furthermore, detection and discrimination were doubly dissociated. Detection modulated N2p and P3 ERP amplitude, but not frontoparietal beta-band coherence, whereas this pattern was reversed for discrimination.

      Strengths:

      The experiment is elegantly designed, and the data - both behavioral and electrophysiological - are aptly analyzed. The outcomes, in particular the dissociation between detection and discrimination blinks, are consistently and clearly supported by the results. The discussion of the results is also appropriately balanced.

      Weaknesses:

      (2.1) The lack of an effect of stimulus contrast does not seem very surprising from what we know of the nature of AB already. Low-level perceptual factors are not thought to cause AB. This is fine, as there are also other, novel findings reported, but perhaps the authors could bolster the importance of these (null) findings by referring to AB-specific papers, if there are indeed any, that would have predicted different outcomes in this regard.

      While there is consensus that low-level perceptual factors are not affected by the attentional blink, some studies report evidence to the contrary (e.g., Chua et al., Percept. Psychophys., 2005). In the revised manuscript, we will highlight the significance of our findings in the context of this conflicting evidence in the literature.

      (2.2) On an analytical note, the ERP analysis could be finetuned a little more. The task design does not allow measurement of the N2pc or N400 components, which are also relevant to the AB, but the N1 component could additionally be analyzed. In doing so, I would furthermore recommend selecting more lateral electrode sites for both the N1, as well as the P1. Both P1 and N1 are likely not maximal near the midline, where the authors currently focused their P1 analysis.

      We will incorporate these additional analyses in the revised manuscript.

      (2.3) Impact & Context:

      The results of this study will likely influence how we think about selective attention in the context of the AB phenomenon. However, I think its impact could be further improved by extending its theoretical framing. In particular, there has been some recent work on the nature of the AB deficit, showing that it can be discrete (all-or-none) and gradual (Sy et al., 2021; Karabay et al., 2022, both in JEP: General). These different faces of target awareness in the AB may be linked directly to the detection and discrimination subcomponents that are analyzed in the present paper. I would encourage the authors to discuss this potential link and comment on the bearing of the present work on these behavioural findings.

      Thank you. We will discuss our findings in the context of these recent studies.

      Reviewer #3:

      Summary:

      In the present study, the authors aimed to achieve a better understanding of the mechanisms underlying the attentional blink, that is, a deficit in processing the second of two target stimuli when they appear in rapid succession. Specifically, they used a concurrent detection and identification task in- and outside of the attentional blink and decoupled effects of perceptual sensitivity and response bias using a novel signal detection model. They conclude that the attentional blink selectively impairs perceptual sensitivity but not response bias, and link established EEG markers of the attentional blink to deficits in stimulus detection (N2p, P3) and discrimination (fronto-parietal high-beta coherence), respectively. Taken together, their study suggests distinct mechanisms mediating detection and discrimination deficits in the attentional blink.

      Strengths:

      Major strengths of the present study include its innovative approach to investigating the mechanisms underlying the attentional blink, an elegant, carefully calibrated experimental paradigm, a novel signal detection model, and multifaceted data analyses using state-of-the-art model comparisons and robust statistical tests. The study appears to have been carefully conducted and the overall conclusions seem warranted given the results. In my opinion, the manuscript is a valuable contribution to the current literature on the attentional blink. Moreover, the novel paradigm and signal detection model are likely to stimulate future research.

      Weaknesses:

      Weaknesses of the present manuscript mainly concern the negligence of some relevant literature, unclear hypotheses, potentially data-driven analyses, relatively low statistical power, potential flaws in the EEG methods, and the absence of a discussion of limitations. In the following, I will list some major and minor concerns in detail.

      Major points

      (3.1) Hypotheses:

      I appreciate the multifaceted, in-depth analysis of the given dataset including its high amount of different statistical tests. However, neither the Introduction nor the Methods contain specific statistical hypotheses. Moreover, many of the tests (e.g., correlations) rely on selected results of previous tests. It is unclear how many of the tests were planned a priori, how many more were performed, and how exactly corrections for multiple tests were implemented. Thus, I find it difficult to assess the robustness of the results.

      As outlined in the Introduction, we hypothesized that neural computations associated with target detection would be characterized by regional neuronal markers (e.g., parietal or occipital ERPs), whereas computations linked to feature discrimination may involve neural coordination across multiple brain regions (e.g. fronto-parietal coherence). We planned and conducted our statistical tests based on this hypothesis. All multiple comparison corrections (e.g., Bonferroni-Holm correction, see Methods) were performed separately for each class of analyses. We will clarify these hypotheses and provide further details in the revised manuscript.
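      For readers unfamiliar with the Bonferroni-Holm correction mentioned above, the standard step-down procedure can be sketched as follows. This is a generic illustration, not the authors' analysis code; the function name and the p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with
        # alpha / (m - k + 1), which equals alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all larger p-values are retained as well
    return reject


# Hypothetical p-values from one class of analyses.
print(holm_bonferroni([0.012, 0.015, 0.04, 0.2]))
# -> [True, True, False, False]
```

      With these hypothetical values a plain Bonferroni threshold of 0.05/4 = 0.0125 would reject only the first test, whereas the step-down procedure also rejects the second.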

      (3.2) Power:

      Some important null findings may result from the rather small sample sizes of N = 24 for behavioral and N = 18 for ERP analyses. For example, the correlation between detection and discrimination d' deficits across participants (r=0.39, p=0.059) (p. 12, l. 263) and the attentional blink effect on the P1 component (p=0.050, no test statistic) (p. 14, 301) could each have been significant with one more participant. In my opinion, such results should not be interpreted as evidence for the absence of effects.

      We agree and will revise the manuscript accordingly. We will also report Bayes factor (BF) values, where relevant, to further evaluate these claims.
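      How close the reported correlation (r = 0.39, N = 24) lies to the conventional threshold can be checked with a short calculation. The sketch below assumes a Pearson correlation tested with a two-tailed t-test at alpha = 0.05, which the text above does not state explicitly; the critical value 2.074 for 22 degrees of freedom is taken from a standard t table, and the function names are hypothetical.

```python
import math


def t_from_r(r, n):
    """t statistic for a Pearson correlation r from n observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, n):
    """Smallest |r| that reaches the given critical t with n observations."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


T_CRIT_DF22 = 2.074  # two-tailed, alpha = 0.05, df = 22 (assumed test)

t_observed = t_from_r(0.39, 24)
r_needed = critical_r(T_CRIT_DF22, 24)

print(f"t(22) = {t_observed:.3f} versus critical value {T_CRIT_DF22}")
print(f"|r| needed for p < 0.05 at N = 24: {r_needed:.3f}")
```

      Under these assumptions the observed statistic falls just short of the critical value, which illustrates why such a result is better treated as inconclusive than as evidence for the absence of an effect.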

      (3.3) Neural basis of the attentional blink:

      The introduction (e.g., p. 4, l. 56-76) and discussion (e.g., p. 19, 427-447) do not incorporate the insights from the highly relevant recent review by Zivony & Lamy (2022), which is only cited once (p. 19, l. 428). Moreover, the sections do not mention some relevant ERP studies of the attentional blink (e.g., Batterink et al., 2012; Craston et al., 2009; Dell'Acqua et al., 2015; Dellert et al., 2022; Eiserbeck et al., 2022; Meijs et al., 2018).

      We will motivate and discuss our study in the context of these previous studies. 

      (3.4) Detection versus discrimination:

      Concerning the neural basis of detection versus discrimination (e.g., p. 6, l. 98-110; p. 18, l. 399-412), relevant existing literature (e.g., Broadbent & Broadbent, 1987; Hillis & Brainard, 2007; Koivisto et al., 2017; Straube & Fahle, 2011; Wiens et al., 2023) is not included.

      Thank you for these suggestions. We will include these important studies in our discussion.

      (3.5) Pooling of lags and lag-1 sparing:

      I wonder why the authors chose to include 5 different lags when they later pooled early (100, 300 ms) and late (700, 900 ms) lags, and whether this pooling is justified. This is important because T2 at lag 1 (100 ms) is typically "spared" (high accuracy) while T2 at lag 3 (300 ms) shows the maximum AB (for reviews, see, e.g., Dux & Marois, 2009; Martens & Wyble, 2010). Interestingly, this sparing was not observed here (p. 43, Figure 2). Nevertheless, considering the literature and the research questions at hand, it is questionable whether lag 1 and 3 should be pooled.

      Lag-1 sparing is not always observed in attentional blink studies; there are notable exceptions that do not report such sparing (Hommel et al., Q. J. Exp. Psychol., 2005; Livesay et al., Attention, Percept. Psychophys., 2011). Our statistical tests revealed no significant difference in accuracies between short lag (100 and 300 ms) trials or between long lag (700 and 900 ms) trials but did reveal significant differences between the short and long lag trials (ANOVA, followed by post-hoc tests). To simplify the presentation of the findings, we pooled together the short lag (100 and 300 ms) and, separately, the long lag (700 and 900 ms) trials. We will present these analyses, and clarify the motivation for pooling in the revised manuscript. 

      (3.6) Discrimination in the attentional blink

      Concerning the claims that previous attentional blink studies conflated detection and discrimination (p. 6, l. 111-114; p. 18, l. 416), there is a recent ERP study (Dellert et al., 2022) in which participants did not perform a discrimination task for the T2 stimuli. Moreover, since the relevance of all stimuli except T1 was uncertain in this study, irrelevant distractors could not be filtered out (cf. p. 19, l. 437). Under these conditions, the attentional blink was still associated with reduced negativities in the N2 range (cf. p. 19, l. 427-437) but not with a reduced P3 (cf. p. 19, l 439-447).

      We will address the difference between our findings and those of Dellert et al (2022) in the revised manuscript.

      (3.7) General EEG methods:

      While most of the description of the EEG preprocessing and analysis (p. 31/32) is appropriate, it also lacks some important information (see, e.g., Keil et al., 2014). For example, it does not include the length of the segments, the type and proportion of artifacts rejected, the number of trials used for averaging in each condition, specific hypotheses, and the test statistics (in addition to p-values).

      We regret the oversight. We will include these details in the revised Methods.

      (3.8) EEG filters:

      P. 31, l. 728: "The data were (...) bandpass filtered between 0.5 to 18 Hz (...). Next, a bandstop filter from 9-11 Hz was applied to remove the 10 Hz oscillations evoked by the RSVP presentation." These filter settings do not follow common recommendations and could potentially induce filter distortions (e.g., Luck, 2014; Zhang et al., 2024). For example, the 0.5 high-pass filter could distort the slow P3 wave. Mostly, I am concerned about the bandstop filter. Since the authors commendably corrected for RSVP-evoked responses by subtracting T2-absent from T2-present ERPs (p. 31, l. 746), I wonder why the additional filter was necessary, and whether it might have removed relevant peaks in the ERPs of interest.

      Thank you for this suggestion. We will repeat this analysis by removing these additional filters.

      (3.9) Coherence analysis:

      P. 33, l. 786: "For subsequent, partial correlation analyses of coherence with behavioral metrics and neural distances (...), we focused on a 300 ms time period (0-300 ms following T2 onset) and high-beta frequency band (20-30 Hz) identified by the cluster-based permutation test (Fig. 5A-C)." I wonder whether there were any a priori criteria for the definition and selection of such successive analyses. Given the many factors (frequency bands, hemispheres) in the analyses and the particular shape of the cluster (p. 49, Fig 5C), this focus seems largely data-driven. It remains unclear how many such tests were performed and whether the results (e.g., the resulting weak correlation of r = 0.22 in one frequency band and one hemisphere in one part of a complexly shaped cluster; p. 15, l. 327) can be considered robust.

      Please see responses to comments #3.1 and #3.2 (above). In addition to reporting further details regarding statistical tests and multiple comparisons corrections, we will compute and report Bayes factors to quantify the strength of the evidence for correlations, as appropriate.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      The current manuscript provides an extensive in vivo analysis of two guidance pathways identifying multiple mechanisms that shape the bifurcation of DRG axons when forming the dorsal funiculus in the DREZ. 

      Strengths: 

      Multiple mouse mutant lines were used, together with complementary techniques; the results are very clear and compelling. 

      The findings are very significant and clearly move forward our understanding of the regulation of axonal development at the DREZ. 

      Weaknesses: 

      No major weaknesses were found. As it is I have no recommendations that would increase the clarity or quality of the manuscript. 

      Reviewer #2 (Public Review):

      Summary: 

      In this manuscript, the authors conduct a detailed analysis of the molecular cues that control the guidance of bifurcated dorsal root ganglion axons in a key region of the spinal cord called the dorsal funiculus. This is a specific case of axon guidance that occurs in a precise way. The authors knew that Slit was important but many axons still target correctly in Slit knockouts, suggesting a role for other guidance factors. Netrin1 is also expressed in this region, so they looked at netrin mutants. The authors found axons outside the DREZ in the Ntn1 mutants, and they show by single-neuron genetic labeling that many of these come from DRG neurons. Quantified axonal tracing studies in Slit1/2, Ntn1, or triple mutant embryos support the idea that Slit and Ntn1 have distinct functions in guidance and that the effect of their loss is additive. Interestingly none of these knockouts affect bifurcation itself but rather the guidance of one or both of the bifurcated axon terminals. Knockout of the Slit receptors (Robo1/2) or the Netrin 1 receptor (DCC) in embryos causes similar guidance defects to loss of the ligands, providing additional confirmation of the requirement for both guidance pathways.

      Strengths: 

      This study expands understanding of the role of the axon guidance factors Ntn1/DCC and Slit/Robo in a specific axon guidance decision. The strength of the study is the careful axonal labeling and quantification, which allows the authors to establish precise consequences of the loss of each guidance factor or receptor.

      Weaknesses: 

      There are some places in the text where the discussion of these data is compared with other studies and models, but additional details would help clarify the arguments. 

      The details were added to the first section of the Discussion in the revision to address this weakness. Also see the response to the recommendations below.

      Reviewer #3 (Public Review):

      Summary: 

      In this paper, Curran et al investigate the role of Ntn, Slit1, and Slit2 in the axon patterning of DRG neurons. The paper uses mouse genetics to perturb each guidance molecule and its corresponding receptor. Cre-based approaches and immunostaining of DRG neurons are used to assess the phenotypes. Overall, the study uses the strength of mouse genetics and imaging to reveal new genetic modifiers of DRG axons. The conclusions of the experiments match the presented results. The paper is an important contribution to the field, as evidence that dorsal funiculus formation is impacted by Ntn and Slit signaling. However, there are some potential areas of the manuscript that should be edited to better match the results with the conclusions of the work.

      Strengths: 

      The manuscript uses the advantage of mouse genetics to investigate the axon patterning of DRG neurons. The work does a great job of assessing individual phenotypes in single and double mutants. This reveals an intriguing cooperative and independent function of Ntn, Slit1, and Slit2 in DRG axon patterning. The sophisticated triple mutant analysis is lauded and provides important insight. 

      Weaknesses: 

      Overall, the manuscript is sound in technique and analysis. However, the majority of the manuscript is about the dorsal funiculus and not the bifurcation of the axons, as the title would make a reader believe. Further, the manuscript would benefit from a more scholarly discussion of the current knowledge of DRG axon patterning and how their work fits into that knowledge.

      We revised the title as suggested. Additional discussion of DRG axon growth at the DREZ was added to the last section of the Discussion in the revision. Also see the response to the recommendations below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Given the reasons stated above, I have no specific recommendations for the authors. 

      There is a typo in the Abstract (... mice with triple deletion of Ntn1, Slit2, and Slit2....). 

      Corrected in the revision.

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors twice repeated that their data on DRG guidance defects in the Ntn1 mutants differ from studies previously published in references 19 and 26. However it is unclear to me, without having read those other studies, what is actually different between this study and those, and why there would be differences between the results from two groups. If the authors think this is an important point to make they need to more clearly say what the other group saw and offer an explanation of why the data may be different. 

      In the revision, we added a detailed comparison of the defects reported in the different studies to the first section of the Discussion and suggested multiple roles of Ntn1 in controlling sensory axon growth at the DREZ.

      (2) In the final section of the discussion it says, "The guidance regulation of DRG axon bifurcation by Slit and Ntn1 may be similar to but overshadowed by their function in midline guidance [43]." The meaning of this sentence was unclear to me. I had been thinking that since there are total knockout embryos (not conditional) there could be patterning effects that happen before the DRG branching that influence the formation of the DREZ. Is this what the authors mean to say here? How can the authors show that the guidance factors they have knocked out are actually functioning in the DRG neurons? 

      We agree with the reviewer that the first sentence is vague, so we edited the paragraph and included the discussion of the regulation of DRG axons at the DREZ, which was the main theme of this last section.  In addition, we agree with the reviewer’s suggestion of the possible indirect role of Ntn1 on DRG axons via the control of interneuron migration.  This possibility was included in the last paragraph of the Discussion.

      (3) In several of the figures (3T, 5I, 5J) there are distance measurements that are presumably averages of multiple axons in 3 or 4 embryos because 3-4 points are shown per graph. However, the figure and methods do not say how many axons were measured per embryo and I could not find if it says these numbers are averages. Clarifying the details of these panels would be useful. 

      The n is the number of animals analyzed and is now added to the figure legends. From each animal, multiple sections (2-4) were analyzed for various parameters in Fig. 3 and 5. This information was added to the Methods section of the revision.

      Reviewer #3 (Recommendations For The Authors):

      Overall the data matches the conclusions in the paper. However, to this reviewer, the title suggests that Ntn and Slit will have defects in bifurcation. This is not the presented phenotype. I recommend the authors change the title to better reflect the findings of the work. 

      We edited the title of the revised manuscript to reflect the control of growth direction in the context of bifurcation.  

      The introduction of the work clearly outlines what is known about DREZ formation in mice but could extend its discussion to other systems like chick and zebrafish (Jaeda Coutinho-Budd et al. 2008, Wang and Scott 2000, Golding et al 1997, Nichols and Smith 2019, Kikel-Coury et al 2021). These studies are particularly important given that pioneer events, including bifurcation, can be visualized. Acknowledging the contribution of other model systems to the understanding of DRG axon patterning is important to improve the scholarly discussion of the paper. 

      We added a more detailed discussion of the current knowledge of DRG axon growth at the DREZ, drawing on several relevant studies in rodent and zebrafish models, to the last section of the Discussion.

      In the data presented, the authors see defects in the axon patterning of DRG neurons and conclude it is a defect in the dorsal funiculus formation. Another interpretation is that a subset of axons cannot invade the spinal cord boundary properly. This phenotype was observed in zebrafish with timelapse imaging (Kikel-Coury et al 2021). It may not be necessary to specifically test the axons' ability to enter the spinal cord in this paper, but the possibility that this could drive the presented phenotypes should be more clearly stated in the results. Entry is not thoroughly addressed in this paper and would need to be confirmed by labeling the edge of the spinal cord with a second reporter. No entry would obviously impact axon targeting. However, delayed entry could place the axon in a navigation environment that is atypical, causing it to navigate aberrantly and present as a funiculus phenotype. 

      We thank the reviewer for raising this very interesting point.  In our present view, dorsal funiculus formation is related to DRG axon patterning, which involves growth, guidance, and bifurcation of the incoming afferents at the dorsal spinal cord.  We believe that these events are highly coordinated by various environmental cues to generate the DREZ and the dorsal funiculus.  The defects we observed could result from the disruption of such coordination that leads to misregulation of DRG axon entry at the dorsal spinal cord, as suggested by the reviewer.  We propose that further analysis by time-lapse imaging as done in zebrafish would provide better understanding of such coordination.  This discussion was included in the last section of Discussion. 

      The authors should clarify that their approach does not knock out molecules in a cell-specific way. This would specifically impact the interpretation of the Dcc phenotypes. It is possible that UNC-40/DCC is guiding cells that are not labeled. The non-autonomous role of UNC-40/DCC should be clearly stated as a possibility. 

      This discussion was added to the last paragraph of the Discussion section.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful to all reviewers and to you for your careful analysis of our work and for the feedback you all provided. The reviews were fundamentally positive with very minor modifications suggested, which we have addressed in this new version as follows.

      (1) We changed Figure 1 to include a high-resolution image of the 3D structure of the low-affinity complex between the RBD and the GM1 tetrasaccharide (GM1os); see panel d. We predicted this structure by extensive sampling with MD simulations as part of earlier work aimed at guiding the resolution of a crystal structure. Due to insurmountable difficulties in crystallizing such a complex, the work was only published as an extended abstract (Garozzo, Nicotra, and Sonnino 2022). Following one of the reviewers’ suggestions, we added all the details of the computational approach we used as Supplementary Material.

      (2) We added the following comment and the corresponding reference to the Discussion section in relation to earlier work flagged by one of the Reviewers (Rochman et al. 2022): “Further to this, our results show that taking into consideration the effects on _N_-glycosylation on protein structural stability and dynamics in the context of specific protein sequences may be key to understanding epistatic interactions among RBD residues, which would be otherwise very difficult, where not impossible, to decipher.”

      References

      Garozzo, Domenico, Francesco Nicotra, and Sandro Sonnino. 2022. “‘Glycans and Glycosylation in SARS-COV2 Infection’ Session at the XVII Advanced School in Carbohydrate Chemistry, Italian Chemical Society. July 4th -7th 2021, Pontignano (Si), Italy.” Glycoconjugate Journal 39 (3): 327–34.

      Rochman, Nash D., Guilhem Faure, Yuri I. Wolf, Peter L. Freddolino, Feng Zhang, and Eugene V. Koonin. 2022. “Epistasis at the SARS-CoV-2 Receptor-Binding Domain Interface and the Propitiously Boring Implications for Vaccine Escape.” MBio 13 (2): e0013522.

    1. Author response:

      eLife assessment

      This study presents potentially valuable insights into the role of climbing fibers in cerebellar learning. The main claim is that climbing fiber activity is necessary for optokinetic reflex adaptation, but is dispensable for its long-term consolidation. There is evidence to support the first part of this claim, though it requires a clearer demonstration of the penetrance and selectivity of the manipulation. However, support for the latter part of the claim is incomplete owing to methodological concerns, including unclear efficacy of longer-duration climbing fiber activity suppression.

      We sincerely appreciate the thoughtful feedback provided by the reviewer regarding our study on the role of climbing fibers in cerebellar learning. Each point raised has been carefully considered, and we are committed to addressing them comprehensively. We acknowledge the importance of addressing methodological concerns, particularly regarding the efficacy of long-term suppression of CF activity, as well as ensuring clarity regarding penetrance and selectivity of our manipulation. To this end, we have outlined plans for substantial revisions to the manuscript to adequately address these issues.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study by Seo et al highlights knowledge gaps regarding the role of cerebellar complex spike (CS) activity during different phases of learning related to optokinetic reflex (OKR) in mice. The novelty of the approach is twofold: first, specifically perturbing the activity of climbing fibers (CFs) in the flocculus (as opposed to disrupting communication between the inferior olive (IO) and its cerebellar targets globally); and second, examining whether disruption of the CS activity during the putative "consolidation phase" following training affects OKR performance.

      The first part of the results provides adequate evidence supporting the notion that optogenetic disruption of normal CF-Purkinje neuron (PN) signaling results in the degradation of OKR performance. As no effects are seen in OKR performance in animals subjected to optogenetic irradiation during the memory consolidation or retrieval phases, the authors conclude that CF function is not essential beyond memory acquisition. However, the manuscript does not provide a sufficiently solid demonstration that their long-term activity manipulation of CF activity is effective, thus undermining the confidence of the conclusions.

      Strengths:

      The main strength of the work is the aim to examine the specific involvement of the CF activity in the flocculus during distinct phases of learning. This is a challenging goal, due to the technical challenges related to the anatomical location of the flocculus as well as the IO. These obstacles are counterbalanced by the use of a well-established and easy-to-analyse behavioral model (OKR), that can lead to fundamental insights regarding the long-term cerebellar learning process.

      Weaknesses:

      The impact of the work is diminished by several methodological shortcomings.

      Most importantly, the key finding, that prolonged optogenetic inhibition of CFs (for 30 min to 6 hours after the training period) does not affect OKR performance, must be complemented by a demonstration that the manipulation maintains its efficacy. In its current form, the manuscript only shows inhibition by short-term optogenetic irradiation in the context of electrical-stimulation-evoked CSs in an ex vivo preparation. As the inhibitory effect of even eNpHR3.0 is greatly diminished during seconds-long stimulations, especially when using a yellow laser as is done in this work (see Zhang, Chuanqiang, et al. "Optimized photo-stimulation of halorhodopsin for long-term neuronal inhibition." BMC biology 17.1 (2019): 1-17), we remain skeptical of the extent of inhibition during the long manipulations. In short, without a demonstration of effective inhibition throughout the putative consolidation phase (for example, by showing a significant decrease in CS frequency throughout the irradiation period), the main claim of the manuscript, a phase-specific involvement of CF activity in OKR learning, cannot be considered to be based on evidence.

      Second, the choice of viral targeting strategy leaves gaps in the argument for CF-specific mechanisms. CaMKII promoters are not selective for the IO neurons, and even the most precise viral injections always lead to the transfection of neurons in the surrounding brainstem, many of which project to the cerebellar cortex in the form of mossy fibers (MF). Figure 1Bii shows sparsely-labelled CFs in the flocculus, but possibly also MFs. While obtaining homogenous and strong labeling in all floccular CFs might be impossible, at the very least the authors should demonstrate that their optogenetic manipulation does not affect simple spiking in PNs.

      Finally, while the paper explicitly focuses on the effects of CF-evoked complex spikes in the PNs and not, for example, on those mediated by molecular layer interneurons or via direct interaction of the CF with vestibular nuclear neurons, it would be best if these other dimensions of CF involvement in cerebellar learning were candidly discussed.

      We appreciate the thorough review and recognize both the strengths and weaknesses highlighted.

      We concur with the reviewer’s assessment of the novelty of our approach, particularly in specifically perturbing the activity of CFs in the flocculus and examining the effects during different phases of learning. In addition, the use of the OKR behavioral paradigm strengthens our study by providing a well-established model for investigating cerebellar learning processes.

      Regarding concerns about the efficacy of long-term optogenetic inhibition and the specificity of viral targeting, we are committed to addressing these issues through additional experiments. Specifically, we aim to demonstrate sustained inhibition of CF transmission by verifying the maintenance of inhibition throughout the putative consolidation phase. This may involve monitoring CF activity during the irradiation period in vivo. Furthermore, we plan to provide further characterization of viral targeting to ensure the specificity of our approach.

      Additionally, we recognize the importance of discussing alternative mechanisms of CF involvement in cerebellar learning. Hence, we will expand the manuscript with a more comprehensive discussion of these dimensions of CF function to give a clearer understanding of the broader implications of our findings.

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to explore the role of climbing fibers (CFs) in cerebellar learning, with a focus on optokinetic reflex (OKR) adaptation. Their goal was to understand how CF activity influences memory acquisition, memory consolidation, and memory retrieval by optogenetically suppressing CF inputs at various stages of the learning process.

      Strengths:

      The study addresses a significant question in the cerebellar field by focusing on the specific role of CFs in adaptive learning. The authors use optogenetic tools to manipulate CF activity. This provides a direct method to test the causal relationship between CF activity and learning outcomes.

      Weaknesses:

      Despite shedding light on the potential role of CFs in cerebellar learning, the study is hampered by significant methodological issues that question the validity of its conclusions. The absence of detailed evidence on the effectiveness of CF suppression and concerns over tissue damage from optogenetic stimulation weakens the argument that CFs are not essential for memory consolidation. These challenges make it difficult to confirm whether the study's objectives were fully met or if the findings conclusively support the authors' claims. The research commendably attempts to unravel the temporal involvement of CFs in learning but also underscores the difficulties in pinpointing specific neural mechanisms that underlie the phases of learning. Addressing these methodological issues, investigating other signals that might instruct consolidation, and understanding CFs' broader impact on various learning behaviors are crucial steps for future studies.

      We appreciate the reviewer’s recognition of the significance of our study in addressing the fundamental question of the role of CFs in adaptive learning within the cerebellar field. The use of optogenetic tools indeed provides a direct means to investigate the causal relationship between CF activity and learning outcomes.

      To address concerns regarding the effectiveness of CF suppression during consolidation, we plan to conduct further in vivo recordings. These will demonstrate how reliably CF transmission can be suppressed through optogenetic manipulation over an extended period.

      In response to the concern about potential tissue damage from laser stimulation, we believe that our optogenetic manipulation was not strong enough to induce significant heat-induced tissue damage in the flocculus. According to Cardin et al. (2010), light applied through an optic fiber may cause critical damage if the intensity exceeds 100 mW, which is eight times stronger than the intensity we used in our OKR experiment. Furthermore, if there had been tissue damage from chronic laser stimulation, we would expect to see impaired long-term memory reflected in abnormal gain retrieval results tested the following day. However, as shown in Figures 2 and 3, there were no significant abnormalities in consolidation percentages even after the optogenetic manipulation.

      Finally, we appreciate the reviewer’s recognition of the challenges involved in pinpointing specific neural mechanisms. We plan to expand the discussion to address these complexities and outline future research directions.

    1. Author Response:

      eLife assessment

      We thank the Editors for identifying qualified reviewers. We agree that the “evidence supporting this claim (that ‘many breast cancer mutations are mildly deleterious’) is incomplete”. Much more detail is needed to state this decisively and we do not claim completeness here. As for validation, we carried out synthetic testing of the models as suggested by Reviewer #1, and the results appear favorable.

      Reviewer #1:

      We thank the Reviewer for a very thorough examination of not only the current paper but also our previous paper. We agree that the illustrative material can be overwhelming and we plan to follow the Reviewer’s advice in that matter. In addition, we originally put some textbook material in the Appendix, and arguably some of it may be considered superfluous.

      Most of the references the Reviewer provides are known to us, although we should likely cite and discuss more of them. All of the above will be included in the revision we are planning.

      The Reviewer is certainly correct that population growth and spatial effects play a major role in cancer. However, the effects of a constraining environment are quite strong and the reality lies somewhere between the Moran and branching process models, which is exactly what we attempt to clarify. As for spatial effects, most tumors extracted in the clinic are dissected in bulk and sub-sampling is rare, so the spatial information is rarely accessible.

      The subsequent point of importance concerns the weak specificity of the site frequency spectra (SFS) with respect to the underlying genetic and demographic forces. This cannot be denied. However, we just meant to state that our SFS are consistent with a model involving slightly deleterious passengers.

      Regarding the validation of the estimation procedures, which is a point well taken, we carried out synthetic testing of the models as suggested by Reviewer #1, and the results appear favorable. This will be discussed in full in the revision.

      In our view, the most important remark is the one concerning scaling of the models. The Reviewer is certainly correct that 100 stem cells are insufficient to drive a realistic tumor. However, what we had in mind, but did not explain sufficiently, is that a sample of 100 cells corresponds to average-depth coverage in bulk sequencing. Therefore, the strict interpretation is that the model mirrors what is observed in the sample. A more accurate approach would be to up-scale the model and then sample 100 cells from it. The Moran-type model can be up-scaled using a diffusion approximation, and we hope to include these computations in the revision. The associated criticism concerning tumor growth seems less relevant, since we experimented with more or less stringent constraints in our models.

      Reviewer #2:

      We thank Reviewer #2 for studying our paper and for some very positive comments. Among others, the Reviewer underscores the fact that the Moran-type model generates SFS concordant with the data (with all necessary reservations). The Reviewer concurs with us that conditioning on non-extinction is not very common in the literature, while it should be.

      Like the Reviewer, we are somewhat puzzled by the differences in behavior between models A and B. Model B seems more parsimonious, but Model A looks more similar to the critical or slightly supercritical branching process. We will work to clarify these observations.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors set out to develop genetic tools that can specifically and comprehensively label Axo-Axonic Cells (AACs), also known as Chandelier cells. These AACs possess unique morphological and connectivity features, making them an ideal subject for studying various aspects of cell types across different experimental methods. To achieve both specificity and comprehensiveness in AAC labeling, the authors employ an intersectional strategy that combines lineage origin and molecular markers. This approach successfully targets AACs across the mouse brain and reveals their widespread distribution in various brain structures beyond the previously known regions. Additionally, the authors utilize rabies transneuronal labeling to provide a comprehensive overview of AACs, their variations, and input sources throughout the brain. This experimental approach offers a powerful model system for investigating the role of AACs in circuit development and function across diverse brain regions.

      Strengths:

      Genetic Tools and Specificity: The authors' genetic tools show qualitative evidence of specificity for AACs, opening new avenues for targeted research on these cells. The use of intersectional strategies enhances the precision of AAC labeling.

      Widespread Distribution: The study significantly broadens our understanding of AAC distribution, revealing their presence in brain regions beyond what was previously documented. This expanded knowledge is a valuable contribution to the field.

      Transneuronal Labeling: The inclusion of rabies transneuronal labeling provides a comprehensive view of AACs, their variations, and input sources, allowing for a more holistic understanding of their role in neural circuits.

      Weaknesses:

      Quantitative Analysis: While the claim of specificity appears qualitatively convincing, the manuscript could be improved with more quantitative analysis.

      We are glad that the reviewers appreciated our multimodal and brain-wide characterizations of the AAC population. We include many qualitative AAC examples and would like to highlight the quantitative nature of our whole-brain cell body and cartridge analyses, made possible by transgenic targeting and our serial two-photon tomography imaging platform (STP). In addition to providing this brain-wide AAC atlas, we also propose AACs as perhaps one of the best examples of a bona fide cell type, which may inspire further in-depth anatomical and functional studies of AACs, and efforts to capture other ground truth cell types.

      Comprehensiveness Claim: The assertion of comprehensiveness, implying labeling "almost all" AACs in all brain regions, is challenging to substantiate conclusively. Acknowledging the limitations of proving complete comprehensiveness and discussing them in the discussion section would be more appropriate than asserting it in the results section.

      We thank the reviewer for this suggestion and have revised the results and discussion sections accordingly. The issue of how to assess comprehensiveness in AAC labeling is a fair and important point, as dense brain-wide AAC labeling has not been achieved and assessed before. Previous studies had used less efficient and specific methods for capturing AACs, primarily in select areas of cortex, hippocampus, and amygdala. These AAC populations are recapitulated by our genetic strategies with higher density and specificity. It does not seem that we have missed any previously reported AAC populations; in fact, we discovered multiple previously unreported populations. Further evidence supporting our “comprehensive” labeling of AACs is that two independent Unc5b and Pthlh transgenic strategies showed very similar AAC distribution patterns (Fig. 1 Suppl. 3). However, we recognize that probably the only way to fully assess “completeness” of labeling may be to compare with anatomical ground truth, such as by dense EM reconstruction of all AACs across the brain volume. This is currently not technically possible but may become feasible in the future.

      Local Inputs: While the manuscript focuses on inter-areal inputs to AACs, it would benefit from exploring local inputs as well. Identifying the local neurons that target AACs and analyzing their patterns could provide valuable insights into AAC function within specific brain regions.

      This is a good suggestion. However, our serial two-photon tomography imaging platform cannot reliably preserve tissue sections for subsequent immunohistochemical processing. Additionally, though our starter AAV injections were limited to 100-150 nL, there were far too many input cells labelled at the injection site to resolve individual input cells and correlate them with their synaptic partners (e.g. a rabies-labelled pyramidal cell within the injection site may still project to a starter cell a few hundred microns away). Thus, our rabies input mapping was best suited for characterizing long-range inputs, which were the focus here. For studying local inputs to AACs, future studies could combine very dilute starter AAV injections with multi-marker characterization of cell types by immunohistochemistry or FISH.

      Discussion Focus: The discussion section should delve deeper into the biological implications of the findings, moving beyond technical significance. Exploring similarities and differences in input patterns between AACs and other cell types, and linking them to the locations of starter cells or specific connectivity patterns in the brain, would enrich the discussion. For instance, investigating whether input patterns can be predicted based on the locations of starter cells or connectivity specificity could provide valuable insights.

      We thank the reviewer for this suggestion. We have expanded the discussion to include more on the relevance and implications of our input mapping results to different starter populations of AACs.

      Reviewer #2 (Public Review):

      Summary:

      The goals of this study were to develop a genetic approach that would specifically and comprehensively target axo-axonic cells (AACs) throughout the brain and then to describe the patterns and characteristics of the targeted AACs in multiple, selected brain regions. The investigators have been successful in providing the most complete description of the regional distribution of putative AACs (pAACs) throughout the brain to date. The supporting evidence is convincing, even though incomplete in some brain regions. The findings should serve as a guide for more detailed studies of AACs within each brain region and lead to new insights into the connectivity and functional organization of this important group of GABAergic interneurons.

      Strengths:

      The study has numerous strengths. A major strength is the development of a unique intersectional genetic strategy that uses cell lineage (Nkx2.1) and molecular (Unc5b or Pthlh) markers to identify AACs specifically and, apparently, nearly completely throughout the mouse brain. While AACs have been described previously in the cerebral cortex, hippocampus, and amygdala, there has been no specific genetic marker that selectively identifies all AACs in these regions.

      The current genetic strategy has labeled pAACs in a large number of additional brain regions, including the claustrum-insular complex, extended amygdala, and several olfactory centers. In general, the findings provide support for the specificity of the methods for targeting AACs, and include some examples of labeling near markers of axon initial segments. However, the Investigators are careful to refer to labeled neurons as "putative AACs" as they have not been fully characterized and their identity verified.

      The descriptions and numerous low-magnification images of the brain provide a roadmap for subsequent, detailed studies of AACs in numerous brain regions. The overview and summaries of the findings in the Abstract, Introduction, and Discussion are particularly clear and helpful in placing the extensive regional descriptions of AACs in context.

      Weaknesses:

      One weakness of the study is the lack of an illustration of the high-resolution cell labeling that can be achieved with the methods, including labeling of numerous rows of axon terminals in contact with axon initial segments. The initial images of the brain-wide distribution of putative AACs are necessarily presented at low magnification. Although the authors indicate that the cells have "highly characteristic AAC labeling patterns throughout the neocortex, hippocampus and BLA", these morphological details cannot be visualized by the reader at the current magnification, even when the images are enlarged on the computer screen. Some of the details become evident in later Figures, but an initial illustration of single cell labeling with confocal microscopy, or tracing of their characteristic axonal arbors, would support the specificity of the labeling in the low magnification images.

      We thank the reviewer for the suggestion. We have now added high-resolution images showing the colocalization of AAC axon boutons (cartridges) along AnkG positive postsynaptic axon initial segments in Fig. 2 Suppl. 1, Figure 1 panels a, d, e, and Fig. 4 panels b, c. These images unequivocally demonstrate AAC identity and specificity.

      Table 1 indicates that the AAC identity of the cells has been validated in many brain regions but not in all. The methods used for validation have not been described and should be included for completeness. The authors are careful to acknowledge that labeled cells in some regions have not been validated and refer to such cells as pAACs.

      Validation was defined by colocalization of RFP-labelled AAC cartridges and AnkyrinG- or Phospho-IκBα-labelled axon initial segments, imaged by confocal microscopy. We provide high-magnification examples throughout Figures 2-6 and supplements. We have also tried to clarify this in the methods section entitled “Immunohistochemistry.” Putative AACs (pAACs) refers to populations in which relatively few single-cell examples of AACs exhibiting co-localized cartridges were found, largely due to the sparsity of the low tamoxifen dosage used (see response above).

      The intersectional genetic methods included the use of the lineage marker Nkx2.1 with either Unc5b or Pthlh as the molecular marker. As described, the mice with intersectional targeting of Nkx2.1 and Unc5b appear to show the most specific brain-wide labeling for AACs, and the majority of the descriptions are from these mice. The targeting with Nkx2.1 and Pthlh is less convincing. The title for Figure 1 Supplemental Figure 3 suggests a similar AAC distribution in the Pthlh;Nkx2.1 mouse compared to the Unc5b;Nkx2.1 mouse. However, the descriptions of the individual panels suggest a number of inconsistencies and non-AAC labeling. The heavy labeling in the caudate and cells in layer 4 is particularly problematic. Based on the data presented, it appears that heavy labeling achieved in these mice could not be relied on for specific labeling of all AACs, although specific labeling could be achieved under some conditions, such as following tamoxifen administration at select ages.

      The reviewer is correct that Pthlh is less specific for AACs than Unc5b when crossed to a constitutive Nkx2.1 recombinase driver line. The Pthlh/Nkx2.1 intersection labeled a set of layer 4 cells in somatosensory cortex and dense cells in striatum, which are clearly not AACs. But these are the only main differences compared to the Unc5b/Nkx2.1 intersection. As the reviewer points out, it is only when Pthlh is crossed to an inducible Nkx2.1-CreER line and induced embryonically with tamoxifen that there is more specific AAC labeling (at least in cortex). We included these data as well as the intersection with VIP-Cre in case either of these is useful to researchers studying fate-mapping of AACs or bipolar cell interneurons. We have also revised the title of Fig. 1 Suppl. 3 to better convey this.

      The methods for dense labeling and single-cell labeling are described only briefly in the Methods. Some discussion of the development of the methods would be useful, including how it was determined that methods for heavy labeling identified AACs specifically and completely.

      We have added a description of the development of these methods to the methods section entitled “Animals.”

      Reviewer #3 (Public Review):

      Summary:

      Raudales et al. aimed at providing an insight into the brain-wide distribution and synaptic connectivity of bona fide GABAergic inhibitory interneuron subtypes focusing on the axo-axonic cell (AAC), one of the most distinctive interneuron subtypes, which innervates the axon initial segments of glutamatergic projection neurons. They establish intersectional genetic strategies that enable them to specifically and comprehensively capture AACs based on their lineage (Nkx2.1) and marker expression (Unc5b, Pthlh). They find that AACs are deployed across essentially all the pallium-derived brain structures as well as the anterior olfactory nucleus, taenia tecta, and lateral septum. They show that AACs in distinct areas and layers of the neocortex as well as different subregions of the hippocampal formation display unique soma and synaptic density and morphological variations. Rabies virus-based retrograde monosynaptic input tracing reveals that AACs in the neocortex, the hippocampus, and the basolateral amygdala receive synaptic inputs from common as well as specific brain regions and supports the utility of this novel genetic approach. This study elucidates brain-wide neuroanatomical features and morphological variations of AACs with solid techniques and analysis. Their novel AAC-targeting strategies will facilitate the study of their development and function in different brain regions. The conclusions in this paper are well supported by the data. However, there are a few comments to strengthen this study.

      (1) The definition of putative AAC (pAAC) is unclear and Table 1 may not be accurate. Although the authors find synaptic cartridges of RFP-labeled cells in the claustro-insular complex and the dorsal endopiriform nuclei, they still consider these cells as pAACs (not validated). The authors claim that without examining the presence of synaptic cartridges, RFP-labeled cells in the hypothalamus and the bed nuclei of the stria terminalis (BNST) are pAACs while those in the L4 of the somatosensory cortex in Pthlh;Nkx2.1;Ai65 mice are non-AACs. In Table 1, the BNST is supposed to contain AACs (validated), but in the text, the authors claim that RFP-labeled cells in the BNST are pAACs. Could the authors clarify how AACs, pAACs, and non-AACs are defined?

      We thank the reviewer for their interest and comments on our work. Please see our response to Reviewer 2 for clarification on putative AACs (pAACs). Additionally, we have clarified in the methods under “Immunohistochemistry” how we defined AACs, pAACs, and non-AACs. For BNST, we did not positively identify more than a few cells exhibiting overlap with AnkyrinG/IκBα, so we currently leave them as pAACs; Table 1 has been corrected to reflect this.

      (2) The intersectional strategies presented in this study could also specifically capture developing AACs. If so, how early are AACs labeled in the brain? It would also be nice if the authors could add a simple schematic like Fig. 1a showing the time course of Pthlh expression.

      We thank the reviewer for suggesting the application of our method to studying AAC development. As the onset of Unc5b expression is in the early postnatal period, tamoxifen induction of Unc5b-CreER in early postnatal days can enable studies of AAC neurite and synapse development, maturation, and plasticity. Similarly, Pthlh expression in the brain is relatively low or absent at P4 and present at P14 and later timepoints. The Pthlh-Flp;Nkx2.1-Cre intersection can be used to study postnatal AAC development and plasticity.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the claim of specificity appears qualitatively convincing, additional quantitative analysis would make the authors' claim much stronger. For example, in Figure 4 (f-h), where the authors show an overlap of AAC axons with AnkG labeling, there also appears to be a region of AAC axon lacking adjacent AnkG labeling. The authors could quantify the fraction of cartridges that overlap with AnkG labeling in different brain regions, potentially strengthening their claim that pAACs are AACs as well as providing important documentation of the diversity or homogeneity of compartment targeting across the brain.

      As mentioned previously, we only performed AnkG co-labeling analysis on low-dose tamoxifen, sparsely labeled samples in which we could readily differentiate individual cells. This was performed on samples with the Ai65 cytoplasmic reporter. For validation purposes we could positively identify co-labeled cartridges, but it would be more difficult to accurately identify any cartridges that were not co-labeled (since the entire axon was labeled with RFP). For precisely identifying and mapping AAC cartridge locations, we found the intersectional synaptophysin-EGFP reporter (Fig. 2k-n) to be a more precise method for specifically labeling the “cartridge” segment of AAC axons. However, we did not try AnkG staining on samples from this reporter line, as they were set aside for STP imaging.

      Regarding the claim of comprehensiveness, labeling "almost all" AACs in all brain regions is a high standard and challenging to demonstrate conclusively. The study already significantly expands our understanding of AAC distribution, and the authors might consider discussing the limitations of proving complete comprehensiveness in the discussion rather than claiming it in the results section.

      We again thank the reviewer for this critique. As mentioned above, we have revised the results and discussion sections to better convey this point.

      Furthermore, the manuscript's connectivity section primarily focuses on inter-areal inputs to AACs, but it could benefit from exploring local inputs as well. By identifying the local neurons that target AACs, the authors could ask if there is any general property or rule of the local projections to AACs across the brain, or at least within the cortex. Moreover, a clear indication of the injection site would be helpful, particularly in Figure 7, where there seems to be some discrepancy between the histograms and fluorescent images regarding local projections. The histograms of Figure 7 seem to indicate that the local projection to AACs is a small fraction of all the presynaptic neurons; however, the fluorescent image for the SSp seems to suggest otherwise, with many fluorescent cells in the injected area.

      We thank the reviewer for these comments. Regarding the local inputs in the rabies tracing datasets, our analysis is limited (as mentioned above) by our STP platform’s inability to preserve tissue for immunohistochemistry labeling, as well as by our relatively dense starter cell labeling. Instead, our focus here was on long-range inputs (i.e. outside the ipsilateral ARA area of injection), which were simply not known for these AAC populations. We have revised the Figure 7 legend and added a description in the methods section to more clearly indicate that we only included long-range input projections in the Figure 7 histograms.

      In the discussion, the authors should delve more into the biological implications of their findings rather than solely emphasizing the technical significance. They could explore the similarities and differences in input patterns between AACs and other cell types, potentially linking them to the locations of their starter cells or specific connectivity patterns in the brain. For example, the authors could check if the input patterns could be predicted from the projections to the layers where their starter cells are located (either from an Atlas like the Allen Connectivity Atlas, or from retrograde rabies injections in the same locations). Can the differences between the input patterns to PVC and AAC be predicted for their location versus some specificity of connections?

      Thank you for the extensive comment. We address this point above, and have revised our discussion accordingly.

      Reviewer #2 (Recommendations For The Authors):

      The Figure legends vary in completeness and quality.

      (1) The legend for Figure 1 is very informative, and section e-g serves as a useful guide, as the legend includes the names of the brain regions related to the abbreviations and also indicates the specific panels that show the identified structures. Because of the large number of structures and the number of panels in each Figure, it would be ideal to follow the same pattern in the remaining figures.

      (2) Several edits are needed in the legend for Figure 1 Supplement Figure 1. The descriptions of a-f could be improved by providing general terms to describe the brain regions associated with the latter list of abbreviations (as has been done with the identification of the cerebral cortex, hippocampus, and olfactory centers and their related panels). One suggestion would be to write out insula, claustrum, and endopiriform prior to listing the abbreviations (AI, CLA, EP) (b-c) and adding amygdaloid complex and extended amygdala before the abbreviations (COA, BLA, MeA) (d-f) and (BST) (d).

      We thank the reviewer, as the suggestion of further expanding the abbreviations is a good one. As such, we have revised/reorganized the anatomical abbreviations in the figure legends for Figure 1 Supplement Figures 1, 2, and 3.

      Descriptions for Panels g-j require editing to link the appropriate panels and the descriptions. Panels for BSTpr appear to be g-h (rather than f-g) and i-j (rather than h-i).

      We have fixed this typo in the legend for Figure 1 Supplement Figure 1.

      Descriptions for Panels k-n could be edited to include abbreviations for the identified brain regions. For example, include the abbreviation ARHP after arcuate nuclei and indicate panels m-n (rather than j-l); include PVP after paraventricular and indicate panel n (rather than m); include DMPH after dorsomedial nuclei and indicate k-m (rather than j-l).

      Thank you for the suggestion. We have expanded the abbreviations in Figure 1 Supplement 1 accordingly.

      Reviewer #3 (Recommendations For The Authors):

      (1) Please clarify if tdTomato, EGFP (from helper AAVs), and RFP (from rabies virus) are native signals or IHC signals in legends.

      We have added the descriptors “native” or “stained” to all figure legends containing fluorescent images.

      (2) Fig. 4b and c: Please add insets of high-magnification images showing AAC boutons along AnkG-labeled AISs.

      We have added these insets to Fig. 4b and c.

      (3) Fig. 7S1: It appears that d and e are reversed. Judging from the positions of starter cells, d is for PV-Cre? Please make sure. It is also better to draw the laminar border in d and e.

      The original genotype labels are correct for Fig. 7S1 d and e. We have added the laminar borders as suggested.

      (4) Fig. 9b: Just for consistency, please label with the name of the helper AAV.

      Added.

      (5) Line 617: intragranular>>>infragranular?

      Corrected, thank you.

      (6) It may be unclear to some readers if the images in the figures are from confocal or STP. The authors may want to clarify that all images in the figures are generated by confocal microscopy in the method section.

      We have clarified this in the methods section, “Microscopy and image analysis.”

      (7) The authors should clarify that STP was used to map input cells to the brain in the result section.

      We have added this description in the results section.

    1. Author response:

      We thank the reviewers and editors for their review and assessment of our manuscript and comprehensive feedback. The manuscript will be revised to address all the reviewers’ comments. Specifically, to address the comment of Reviewer 1 and the editor regarding the lack of quantitative comparison between the classical and fractal cycle approaches and identification of the source of the discrepancies between classical and fractal cycles, we plan to perform and report the following analyses and comparisons:

      (1) Intra-method reliability

      a) Classical cycles. An additional scorer will independently define onsets and offsets of all classical sleep cycles for all datasets and mark sleep cycles with skipped REM sleep. Likewise, we will perform automatic sleep cycle detection. We will add a new Supplementary table showing the averaged cycle durations obtained by the two scorers and the automatic algorithm, as well as the inter-scorer agreement rate, and update the Supplemental Excel file with corresponding information for each cycle for each participant for each dataset.

      b) Fractal cycles. We will correlate the durations of fractal cycles calculated using the parameters defined in the Main text with those calculated using different parameters, namely, the longer and shorter smoothing window lengths, higher and lower minimum peak prominence. Likewise, we will correlate the durations of fractal cycles calculated using frontal vs other available electrodes.
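
      As a minimal sketch of the kind of comparison described in (b), the snippet below computes a Pearson correlation between fractal cycle durations obtained under two parameter settings. The choice of Pearson's r, the per-participant pairing, and all numbers are illustrative assumptions, not the authors' data or their stated procedure.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical mean fractal cycle durations (minutes), one value per participant.
# "default_params" stands in for the Main-text settings; "longer_window" stands in
# for a longer smoothing window. Both lists are invented for illustration.
default_params = [88.0, 95.5, 102.0, 79.5, 110.0]
longer_window = [90.0, 97.0, 105.5, 82.0, 108.5]

r = pearson_r(default_params, longer_window)
print(f"r = {r:.3f}")
```

      The same function could be reused for the frontal-versus-other-electrode comparison, provided the durations are paired in the same way.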

      (2) Origin of method differences

      In the current version of our Manuscript, we describe a few possible sources of discrepancies between classical and fractal cycle durations and numbers. Following the suggestion of one of the reviewers, in the revised Manuscript, we will quantify the sources of discrepancies between the two methods in order to identify the “criteria for recordings in which fractal cycles will produce similar results to the classical method”. Specifically, we will calculate the correlation between the difference in classical vs fractal sleep cycle durations on the one hand, and either the amplitudes of fractal descent/ascent, the relative durations of cycles with skipped REM sleep and wake after sleep onset, or peak flatness on the other hand.
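
      The planned analysis can be sketched as follows: take the per-participant difference between classical and fractal cycle durations and correlate it with a candidate explanatory measure. The response does not name a correlation statistic, so the rank-based (Spearman) coefficient used here is an assumption, as are the variable names and every value in the example.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-participant values, invented for illustration only.
classical_min = [92.0, 101.0, 85.0, 110.0, 96.0]  # classical cycle duration (min)
fractal_min = [90.0, 88.0, 84.0, 95.0, 93.0]      # fractal cycle duration (min)
waso_fraction = [0.03, 0.12, 0.02, 0.10, 0.05]    # relative wake after sleep onset

duration_diff = [c - f for c, f in zip(classical_min, fractal_min)]
rho = spearman_rho(duration_diff, waso_fraction)
print(f"rho = {rho:.2f}")
```

      A rank-based coefficient is less sensitive to a few participants with very large discrepancies; a linear (Pearson) coefficient would be the natural alternative if the relationship is expected to be linear.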

      In addition, we will include a new figure, illustrating the goodness of fit of the data as assessed by the IRASA method. Likewise, we will update Supplementary File 1 (that shows classical and fractal sleep cycles for each participant) with marks that highlight the onsets and offsets of sleep cycles as well as the cycles with skipped REM sleep.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      This study explores the role of one of the most abundant circRNAs, circHIPK3, in bladder cancer cells, providing convincing data that circHIPK3 depletion affects thousands of genes and that those downregulated (including STAT3) share an 11-mer motif with circHIPK3, corresponding to a binding site for IGF2BP2. The experiments demonstrate that circHIPK3 can compete with the downregulated mRNA targets for IGF2BP2 binding and that IGF2BP2 depletion antagonizes the effect of circHIPK3 depletion by upregulating the genes containing the 11-mer motif. These valuable findings contribute to the growing recognition of the complexity of cancer signaling regulation and highlight the intricate interplay between circRNAs and protein-coding genes in tumorigenesis.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this work the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3. They demonstrate that circHIPK3 interacts with an RNA binding protein (IGF2BP2), sequestering it away from its target mRNAs. This interaction is shown to regulate the expression of hundreds of genes that share a specific sequence motif (11-mer motif) in their untranslated regions (3'-UTR), identical to one present in circHIPK3 where IGF2BP2 binds. The study further focuses on the specific case of the STAT3 gene, whose mRNA product is found to be downregulated upon circHIPK3 depletion. This suggests that circHIPK3 sequesters IGF2BP2, preventing it from binding to and destabilizing STAT3 mRNA. The study presents evidence supporting this mechanism and discusses its potential role in tumor cell progression. These findings contribute to the growing complexity of understanding cancer regulation and highlight the intricate interplay between circRNAs and protein-coding genes in tumorigenesis.

      Strengths:

      The authors show mechanistic insight into a proposed novel "sponging" function of circHIPK3 which is not mediated by sequestering miRNAs but rather a specific RNA binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, which is a critical aspect that is frequently overlooked in this type of study. They provide both genome-wide analysis and a specific case (STAT3) which is relevant for cancer progression. Overall, the authors have significantly improved their manuscript in their revised version.

      Weaknesses:

      While the authors have performed northern blots to measure circRNA levels, an estimation of the circRNA overexpression efficiency, namely the circular-to-linear expression ratio, would be desirable. The seemingly contradictory effects of circHIPK3 and STAT3 depletion in cancer progression are now addressed by the authors in their revised manuscript, incorporating potential reasons that might explain such complexity.

      We have now included a full version of the northern blot, where no discernible linear precursor can be detected, supporting efficient circHIPK3 WT and circHIPK3 MUT production (please see the detailed description in the specific comments below). We agree that relating the observations about STAT3 homeostasis to cancer progression is not a straightforward extrapolation, as discussed.

      Reviewer #2 (Public Review):

      Summary: 

      The authors have diligently addressed most of the points raised during the review process (except the important point of "additional in vitro experiments [...] needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype" for which no additional experiments were performed), resulting in an improvement in the study. The data are now described with clarity and conciseness, enhancing the overall quality of the manuscript. 

      Strengths: 

      New, well-defined molecular mechanism of circRNAs involvement in bladder cancer. 

      Weaknesses: 

      Lack of solid translational significance data. 

      The focus of this study has been to disclose molecular mechanisms of action by circHIPK3, with implications for cancer. We agree that further studies are needed to fully understand the impact of circHIPK3 in bladder cancer.  

      Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking down circHIPK3 and performing an RNA-seq analysis, the authors found thousands of deregulated genes which appear unaffected by miRNA sponging and that are, instead, enriched for an 11-mer motif. Further investigations showed that the 11-mer motif is shared with circHIPK3 and is able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD, resulting in downregulation and upregulation respectively, the authors found the STAT3 gene, which also consistently has concomitant upregulation of one of its targets, TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths: 

      Although the number of circRNAs continues to grow, this field lacks many instances of detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs, and there has been a careful analysis of aspects frequently poorly investigated. Experiments involving use of time-point knockdown followed by RNAseq, investigation of miRNA-sponge function of circHIPK3, identification of 11-mer motif, identification and validation of IGF2BP2, and the analysis of copy number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action are thorough and convincing.

      Weaknesses: 

      It is unclear why the authors used certain bladder cancer cells versus non-bladder cells in some experiments. The efficacy of certain experiments (specifically rescue experiments) and some control conditions is still questionable. Overall, the presented study adds some further knowledge in describing circHIPK3 function, its capability to regulate some downstream genes, and its interaction and competition for IGF2BP2. 

      We have provided a discussion and argumentation of how certain bladder cancer cells (and non-bladder cancer cells) have been used in this study in our previous rebuttal letter and also clarified this further in the materials and methods section in the first revision. Regarding control conditions for experiments, we believe we have included all necessary controls and explanations for these in the revised version (please see the detailed description in the specific comments below). 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major points about revised manuscript

      (1) In Supplementary Figure S5H, the membrane may have been trimmed too closely to the circRNA band, potentially resulting in the absence of the linear RNA band. Could the authors provide a full image of the membrane that includes the loading points? Having access to the complete image would allow for a more comprehensive evaluation of the results, including the presence or absence of expected linear and circular RNA bands.

      I have taken the liberty of moving this “major point” from the public review section, as I believe it would be too detailed for that section. We have included the full section of the northern blot, according to the reviewer's recommendations.

      As described in the previous rebuttal letter, our northern blots suffered from heavy background signal arising from the rRNA bands, which was the reason for cutting the northern blot in the previous version of Supplementary Figure S5H. We have now shown the entire blot as suggested by the reviewer, so that the reader can more clearly inspect any potential linear precursor band. We previously stated that we could not assess the circular-to-linear ratio due to background signal, since a potential linear HIPK3 precursor RNA could be masked by the rRNA signal. However, the theoretical size of a linear precursor is ~2.9 kb, a region where we do not detect any distinct bands (just above the 18S band), making rather efficient circularization very likely. In support of this claim, we are using the Laccase2 vector described in Kramer et al., 2015 (Genes Dev), which is proven to produce high levels of circHIPK3 compared to negligible amounts of linear precursor (although in a different cell line). We have also included a 5.8S rRNA probe to control for loading and RNase R activity (which can also be ascertained by the disappearance of the 18S/28S bands). Since we do not have the option to use another probe (we are limited to the BSJ-specific probe) and it is not practical to deplete rRNA from 20 µg samples of total RNA prior to running the northern blot, we find that these data sufficiently prove that our vector constructs produce a decent amount of RNase R-resistant circHIPK3, with no visible/discernible linear precursor.

      Minor points about revised manuscript

      (1) In Supplementary Figure S3B, the authors offer no explanation as to why genes that become upregulated upon circHIPK3 knockdown generally contain more circHIPK3-RBP binding sites other than for IGF2BP2. A clarification would be of help.

      Again, this issue has been addressed in the previous rebuttal letter. Our response is repeated below:

      We do not have any evidence to explain this observation. One possibility is that other RBPs elicit mRNA-stabilizing effects on average, whereas abundant IGF2BP2 (~120,000 to 200,000 copies per cell) is now able to bind more target mRNAs and elicit destabilization. This remains highly speculative though.

      (2) In Supplementary Figure S3D, the authors' claim that the 11-mer motif is more often found bound to IGF2BP2 than to other circHIPK3-RBPs should be referred to the corresponding dataset/reference.

      Again, this issue has been addressed in the previous rebuttal letter. Our response is repeated below:

      This information is stated in the figure legend (K562) and we have now included it in the main text as well: “We evaluated how often binding sites of circHIPK3-RBPs overlap the 11-mer motif and found that this is more often the case for IGF2BP2 binding sites than binding sites of the other circHIPK3-RBPs when scrutinizing K562 datasets (Supplementary Figure S3D)”.

      (3) In the rescue experiment where both circHIPK3 and IGF2BP2 are downregulated, using the term "normalization" to mean reestablishing normal levels of gene expression can lead to confusion with the concept of normalization as it is commonly understood in the context of data processing (i.e. the mathematical process of adjusting data to account for various factors that might affect measurements). I would recommend the authors to use a term that more specifically describes the biological process they are referring to, such as "restoration of normal expression levels" or simply "return to normal levels".

      We agree that this term could be misunderstood. This has now been changed as recommended.

      (4) The figure legend of Supplementary Figure 5F is wrongly labeled. The legend for panel F actually corresponds to panel G and vice versa. 

      This has now been corrected.  

      Reviewer #2 (Recommendations For The Authors): 

      The authors have diligently addressed most of the points raised during the review process (except the important point of "additional in vitro experiments [...] needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype" for which no additional experiments were performed), resulting in an improvement in the study. The data are now described with clarity and conciseness, enhancing the overall quality of the manuscript. Therefore, I support the publication of this work. 

      We thank the reviewer for the positive comments.

      Reviewer #3 (Recommendations For The Authors): 

      Please ensure that when changes are made (especially for major points) to address the reviewers' comments, they are all appropriately incorporated in the text (for example, the use of Act B as a low-affinity positive control (now in Fig 4A) is explained neither in the text nor in the legends/methods).

      This has now been included.

      Please ensure that all the legends correspond to the right figures (eg: Supplementary Figure with rescue experiment is 5F, but the corresponding legend in the manuscript is the S5G). 

      This has now been corrected.

      Please, for future reviewing processes, ensure the new parts are properly highlighted or coloured differently in the manuscript.

      This has now been done more thoroughly.

    1. Author response

      Reviewer #1 (Public Review):

      The authors aimed to investigate if 2-hydroxybutyrate (2HB), a metabolite induced by exercise, influences physiological changes, particularly metabolic alterations post-exercise training. They treated young mice and cultured myoblasts with 2HB, conducted exercise tests, metabolomic profiling, gene expression analysis, and knockdown experiments to understand 2HB's mechanisms. Their findings indicate that 2HB enhances exercise tolerance, boosts branched-chain amino acid (BCAA) enzyme gene expression in skeletal muscles, and increases oxidative capacity. They also highlight the role of SIRT4 in these effects. This study establishes 2HB, once considered a waste product, as a regulator of exercise-induced metabolic processes. The study's strength lies in its consistent results across in vitro, in vivo, and ex vivo analyses.

      The authors propose a mechanism in which 2HB inhibits BCAA breakdown, raises NAD+/NADH ratio, activates SIRT4, increases ADP ribosylation, and controls gene expression.

      However, some questions remain unclear based on these findings:

      This study focused on the effects of short-term exercise (1 or 5 bouts of treadmill running) and short-term 2HB treatment (1 or 4 days of treatment). Adaptations to exercise training typically occur progressively over an extended period. It's important to investigate the effects of long-term 2HB treatment and whether extended combined 2HB treatment and exercise training have independent, synergistic, or antagonistic effects.

      We agree with the reviewer that investigation of longer-term 2HB treatment may potentially yield interesting findings with more implications to exercise physiology. To investigate the effects of 2HB treatment against or in combination with a progressive exercise training protocol would require an experiment duration between 4 to 12 weeks, based on previous studies (Systematic Review by Massett et al., Frontiers in Physiology, 2021, 10.3389/fphys.2021.782695). However, our experience with these types of experiments is that such a pursuit would require a breadth of work beyond the scope of this current study. For instance, if there were evidence of weakened effect of 2HB over time, one may be compelled to investigate other organs such as the liver to find signs of metabolic adaptation to the exogenous metabolite. If there were additive or synergistic effects on exercise performance, one may be compelled to investigate changes to the cardiovascular system in addition to the skeletal muscle. Additional questions would be raised around the skeletal muscle as well, including assessment of structural and fibre-type changes. Further, these additional mechanisms would need to be characterized in a time course fashion. Rather, we view the scope of the current study to be the acute response to 2HB as an initial report on mechanistic effects of 2HB.

      Exercise training leads to significant mitochondrial changes, including increased mitochondrial biogenesis in skeletal muscle. It would be valuable to compare the impact of 2HB treatment on mitochondrial content and oxidative capacity in treated mice to that in exercised mice.

      We agree with the reviewer that it is of interest to investigate how 2HB may affect mitochondrial biogenesis. However, our preliminary findings were that 2HB-treated MEFs, C2C12s, and mouse soleus muscles showed no change in PGC1α gene expression after four days of treatment (data not shown). As a follow-up assessment of mitochondrial protein expression, although not specific to mtDNA-derived genes, we quantified the expression of the respiratory chain proteins in cells and soleus muscle and found no effect of 2HB treatment (SFig. 5,6). At this stage we conclude that there is no evidence of 2HB modifying mitochondrial biogenesis in this time frame and that further investigation would be best suited to a follow-up study, such as one interested in long-term exercise training.

      The authors demonstrate that 2-ketobutyrate (2KB) can serve as an oxidative fuel, suggesting a role for the intact BCAA catabolic pathway. However, it's puzzling that the knockout of BCKDHA, a subunit crucial for the second step of BCAA catabolism, did not result in changes in oxidative capacity in cultured myoblasts.

      While we report the BCKDH complex to be dispensable for 2KB oxidation it is important to note that previous studies have reported the following: (1) that 2KB is a viable substrate for BCKDH, (2) that 2KB is a viable substrate for pyruvate dehydrogenase, and (3) that pyruvate dehydrogenase is also dispensable for 2KB oxidation (see Steele et al., J Nutr., 114: 701-710, and Paxton et al. Biochem J., 234:295-303). Collectively, these data have led previous studies to conclude that BCKDH and pyruvate dehydrogenase are redundant for the first step of 2KB oxidation, with a preference for BCKDH. The flux through either may depend upon the metabolic environment. The aim for figure 3C was to determine whether the BCAA degradation pathway was required for 2KB oxidation. We conclude that this pathway is required, first at the step of PCC.

      While these past studies were mentioned in paragraph 2 of the discussion, in light of the reviewer’s comment we have expanded this paragraph. We have added language to explain that future research interested in the presented 2HB mechanism should carefully consider BCKDH and PDH expression in the cell or tissue of interest, as the metabolism of 2KB is quite central to the presented mechanism.

      Nevertheless, this innovative model of metabolic signaling during exercise will serve as a valuable reference for informing future research.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript entitled "A 2-HB-mediated feedback loop regulates muscular fatigue" by the Johnson group reports interesting findings with implications for the health benefits of exercise. The authors use a combination of metabolic/biochemical in vivo and in vitro assays to delineate a metabolic route triggered by 2-HB (a relatively stable metabolite induced by exercise in humans and mice) that controls branched-chain aminotransferase enzymes and mitochondrial oxidative capacity. Mechanistically, the authors show that 2-HB is a direct inhibitor of BCAT enzymes that in turn control levels of SIRT4 activity and ADP-ribosylation in the nucleus targeting the C/EBP transcription factor, affecting BCAA oxidation genes (see Fig 4i in the paper). Overall, these are interesting and novel observations and findings with relevance to human exercise, with the potential implication of using these metabolites to mimic exercise benefits, or to address conditions of muscular fatigue that occur in different human chronic diseases including rheumatic diseases or long COVID.

      Weaknesses:

      There are several experiments/comments that will strengthen the manuscript-

      (1) A final model in Figure 6 integrating the exercise/mechanistic findings (expanding on Fig 4i) will clarify the findings.

      We appreciate the reviewer’s suggestion to incorporate the exercise findings into a summary figure. However, upon internal review we find that such a figure is too similar to Fig 4i to warrant a new diagram.

      (2) In some of the graphs, statistics are missing (e.g., Fig 6G).

      Some figures are included primarily for the reader to visualize the data while statistical comparison is conducted in a separate figure, for example Fig 2D-G. However, we have revised the figure legends to ensure that statistical comparisons are described for all appropriate figures, including Fig 6G identified by the reviewer.

      (3) The conclusions on SIRT4 dependency should be carefully written, as it is likely that this is only one potential mechanism, further validation with mouse models would be necessary.

      We appreciate the reviewer's feedback and take the point well that a NAD-dependent mechanism will likely stimulate other sirtuins, which are often in fact expressed at greater levels than SIRT4. To reflect this comment in the manuscript we have altered paragraph 5 of the discussion to now focus on sirtuins. We briefly discuss SIRT4 and highlight the need for future consideration of other sirtuins, perhaps particularly mitochondrial sirtuins.

      (4) One of the experiments needed to support the oxidative capacity effects, which could be done in cultured cells, is the use of radioisotope-labeled metabolites, including BCAAs, to determine the ability to produce CO2. Alternatively, or in combination, metabolite flux analysis using isotopes would be useful to strengthen the current results.

      We appreciate the suggestion from the reviewer and we will look to conduct such an experiment in our follow-up work.

      We sincerely thank the reviewers for their input on this study as their suggestions have led to an improved manuscript for the version of record. The reviewer comments are well taken and we are glad that they will be present alongside the final manuscript to provide an important perspective on the work.

    1. Author response:

      [The following is the authors’ response to the current reviews.]

      In response to Reviewer #2, we agree with the reviewer that it needs to be noted that not all forms of recognition are the same and have added the following: "However, we note that not all forms of recognition are the same; researchers may prefer to have their work featured instead of personal stories or critiques of the scientific environment."


      [The following is the authors’ response to the previous reviews.]

      We thank both reviewers for their detailed comments and insightful suggestions. Below we summarize our responses to each concern in addition to the edits within the manuscript.

      We would also like to add a clarification to the eLife assessment, which states: “This important bibliometric analysis shows that authors of scientific papers whose names suggest they are female or East Asian get quoted less often in news stories about their work.” We show that individuals with names predicted to be from women or East Asian name origins are less likely to be quoted or mentioned in Nature’s scientific news stories than expected by publication demographics. In this study, we did not compare the level of coverage of a scientific article by the demographics of the authors of the article.

      Reviewer #1

      The article is not so clearly structured, which makes it hard to follow. A better framing, contextualization, and conceptualization of their analysis would help the readers to better understand the results. There are some unclear definitions and wrong wording of key concepts.

      We have adapted our wording in the text and added a more detailed discussion, which we hope makes the paper easier to comprehend. These changes are described in the context of the reviewer’s suggestions and addressed in the next section.

      Language use: Male/Female refers to sex, not to gender.

      We have now updated the language throughout the text. Thank you for pointing this out.

      Regional disparities are not the same as names' origin. While the first might relate to the academic origin of authors, inferred from their institutional belonging, the latter reflects the authors' inferred identity. Ethnic identities and the construction of prejudice against specific populations need proper contextualization.

      We have added better contextualization in the manuscript and reworded the section in our results and discussion to clarify that we are analyzing disparities related to perceived ethnicity and not regions. We also added the following text to the results section “In our analysis, we use name origin as an estimate for the perceived ethnicity of a primary source by a journalist. Our prediction is not intended to assign ethnicity to an individual, but to be used broadly as a tool to quantify representational differences in a journalist's sociologically constructed perception of a primary source's ethnicity.” We also added the following text to our Discussion: “Our use of name origins is a proxy for a journalist's or referring scholarly peer’s potential perceptions of the ethnicity of a primary source as signaled by an individual's name. We do not intend to assign an identity to an individual, but to generate a broad metric to measure possible bias for particular ethnicities during journalists' primary source gathering.”

      It would be helpful to have a clear definition of what are quotes, mentions, and citations. For me, it was not so clear and made understanding the results more difficult.

      We added the following text to the results section Extracted Data Used for Analysis: “Quoted names are any names that were attached to a quote within the article. Mentioned names are any names that were stated within the article. Cited names are all author names of a scientific paper that was cited in the news article.”

      The comparison against Nature published research articles is not perfect because journalists will also cover articles not published in Nature. If for example, the gender representation in the quoted articles is not the same between Nature journals and other journals, then this source of inequality would be missing (e.g. if the journalists are biased against women, but not as much when they published in Nature, because they are also biased towards Nature articles). Also, the gender representation among Nature authors could not be the same as in general. Nevertheless, this seems to be a fair benchmark, especially if the authors did not have access to other more comprehensive databases. But a statement of limitations including these potential issues would be good to have.

      To add better context to the generalizability of our work, we added the following text to our discussion: “Furthermore, the news articles present on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      "we select the highest probability origin for each name as the resultant assignment". Threshold based approaches for race/ethnicity name-based inference have been criticized by the literature as they might reproduce biases (see Kozlowski, D., Murray, D. S., Bell, A., Hulsey, W., Larivière, V., Monroe-White, T., & Sugimoto, C. R. (2022). Avoiding bias when inferring race using name-based approaches. Plos one, 17(3), e0264270.). The authors could use the full distribution of probabilities over names instead of selecting one. The formulae proposed (3-5) could be easily adapted to this change.

      We thank the reviewer for pointing this out. We have updated our analysis to use the probabilities instead of hard assignments. Figure 3 and formulae 3-5 have been updated. While we observe a slight shift in the calculated values, the overall trends are unchanged.
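
      The difference between hard assignment and using the full probability distribution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the names, origin labels, probabilities, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical per-name probabilities over name origins (illustrative values only).
predictions = {
    "name_a": {"EastAsian": 0.55, "European": 0.30, "CelticEnglish": 0.15},
    "name_b": {"EastAsian": 0.10, "European": 0.50, "CelticEnglish": 0.40},
    "name_c": {"EastAsian": 0.05, "European": 0.25, "CelticEnglish": 0.70},
}


def hard_proportions(preds):
    """Assign each name to its single most probable origin, then count."""
    counts = defaultdict(float)
    for probs in preds.values():
        counts[max(probs, key=probs.get)] += 1.0
    total = sum(counts.values())
    return {origin: n / total for origin, n in counts.items()}


def soft_proportions(preds):
    """Sum every name's full probability distribution, then normalise."""
    counts = defaultdict(float)
    for probs in preds.values():
        for origin, p in probs.items():
            counts[origin] += p
    total = sum(counts.values())
    return {origin: n / total for origin, n in counts.items()}


hard = hard_proportions(predictions)
soft = soft_proportions(predictions)
```

      In this toy input, hard assignment gives each origin one name, whereas the probability-weighted version retains the uncertainty in each prediction, so the two sets of proportions differ.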

      Is it possible to make an analysis that intersects both name origin and gender? I am not sure if the sample size would allow for this, but if some other dimensions were collapsed, it would be very important to show what happens at the intersection of these two dimensions of discrimination.

      We agree that identifying any differences in quotation patterns at the intersection of gender and name origin would be very useful. To address this, we added supplemental table 5. This table reports the number of quotes per predicted name origin and gender over all years and article types. In this table, we don’t see a significant difference in gender distribution across predicted name origins.

      Given a larger sample size, we would be able to better identify more subtle differences, but at this sample size, we cannot make more detailed inferences. Additionally, this also addresses a QC-issue, where predicted gender accuracy varies by name origin, specifically East Asian name origin. From our data, we don’t see a large difference in proportions across any name origin. We added the following text to the results section to incorporate this analysis:

      “However, it should be noted that the error rate varies by name origin with the largest decrease in performance on names with an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. In our analysis, we did not observe a large difference in names predicted to come from a man or woman between predicted East Asian and other name origins (Table 5).”

      The use of vocabulary should be more homogeneous. For example, in page 13 the authors start to use the concepts of over/under enrichment, which appeared before in a title but was not used.

      The text has been updated to replace all mentions of “over/under enrichment” with “over/under representation”.

      In the discussions section, it would be important to see as a statement of limitations the problems that automatic origin and gender inference have.

      We thank the reviewer for this suggestion. We have added the following paragraph to our discussion.

      Computational tools enabled us to automatically analyze thousands of articles to identify existing disparities by gender and name origin, but these tools are not without limitations. Our tools are unable to identify non-binary people and rely on gender predictors that are known to have region-specific biases, with the largest decrease in performance on names of an Asian origin [@doi:10.7717/peerj-cs.156;@doi:10.5195/jmla.2021.1252]. Furthermore, name origin is only a proxy for externally perceived racial or ethnic origins of a source or author and is not as accurate as self-identified race or ethnicity. Self-identification better captures the lived experience of an individual that computational estimates from a name can not capture. This is highlighted in our inability to distinguish between Black and White people from the US by their names. As the collection of demographic data by publication outlets grows, we believe this will enable a more fine-grained and accurate analysis of disparities in scientific journalism.

      Figures 2a and 3a show that the affiliations of authors and their countries was going to be used in this analysis. Yet, this section is not present in the article. I would encourage the authors to add this to the analysis as it would show important patterns, and to intersect the dimensions of gender, name origin and country.

      We were interested in using this analysis in our work, but unfortunately the sample size of cited works in each country was too small to make inferences. If this work was extended to larger scientific outlets to include larger corpora such as The Guardian or New York Times, we think one could be able to make more robust inferences. Since our work only focuses on Nature, we decided not to include this analysis. However, we do include a section in our discussion for future work.

      “As a proxy for measuring possible geographical bias of a journalist, we attempted to identify if there was any geographical bias of cited authors. To do this, we identified the affiliation of each cited author and identified their affiliated country. Unfortunately, we could not robustly extract a large enough number of cited authors from different countries to make any conclusive statements. Expanding our work to other science journalism outlets could help identify possible ways in which geographic region, genders, and perceived ethnicity interact and affect scientific visibility of specific groups. While we are unable to identify that journalists have a specific geographical bias, having reporters explicitly focused on specific regional sources will broaden coverage of international opinions in science.”

      It is not clear at that point what column dependence means.

      The abstract has been updated to state, “Gender disparity in Nature quotes was dependent on the article type.”

      Reviewer #2

      We thank the reviewer for their very detailed and insightful suggestions regarding our analysis and the key caveats that needed better contextualization in our analysis. We went through each major point the reviewer brought up below and included any additional text that was needed.

      In some cases, the manuscript lacks consistency in terminology, and uses word choice that is strange (e.g., "enrichment" and "depletion" when discussion representation).

      We thank the reviewer for pointing this out; we have replaced all instances of depletion/enrichment with over/under-representation.

      Caveats to Claim 1. So while Claim 1 holds, it does not hold for all comparator sets and for all years. I don't think this is critical of the paper-the authors do discuss the trend in Claim 2-but interpretation of this claim should take care of these caveats, and readers should consider the important differences in first and last authorship.

      We thank the reviewer for their detailed feedback on this section. We have added the missing contextualization of our results. In the results section, we changed the figure caption to: “Speakers predicted to be men are sometimes overrepresented in quotes, but this depends on the year and article type.” We also added the following paragraph: “When considering the relative proportion of authors and speakers predicted to be men, we only find a slight over-representation of men. This overrepresentation is dependent on the authorship position and the year. Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison to first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, possibly more diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      Generalizability to other contexts of science journalism:

      We thank the reviewer for their feedback on the generalizability of our work. We have now added the following text to our discussion to provide the reader with a better context of our results: “The articles presented on "www.nature.com" are intended for a very specific readership that may not be reflective of more broad scientific news outlets. In a separate analysis, we took a cursory look into a comparison with The Guardian and found very similar disparities in gender and name origin. However, it is not clear which publications should be used as a comparator for science-related articles in The Guardian, and difficult to compare relative rates of representation. While other science news outlets may not have a direct comparator, it would be useful to take a broad comparison across multiple science news outlets to compare against one another. Our existing pipeline could be easily applied to other science news outlets and identify if there exists a consistent pattern of disparity regardless of the intended readership.”

      Shallow discussion:

      The authors highlight gender parity in career features, but why exactly is there gender parity in this format?

      We thank the reviewer for encouraging us to better contextualize our findings in the broader discourse. We have now added several sections to our Discussion. To address gender parity, we have added the following text: “This finding, coupled with the near equal number of articles written by journalists predicted to be men or women, argues for more diversity in topical coverage. "Career Feature" articles highlight current topics relevant to working scientists and frequently highlight systemic issues with the scientific environment. This column allows space for marginalized people to critique the current state of affairs in science or share their personal stories. This type of content encourages the journalist to seek out a diverse set of primary sources. Including more content that is not primarily focused on recent publications, but all topics surrounding the practice of science, can serve as an additional tool to rapidly achieve gender parity in journalistic recognition.”

      Representation in quotations varies by first and last author, most certainly as a result of the academic division of labor in the life sciences. However, what does it say about the scientific quotation that it appears first authors are more often to be quoted? Does this mean that the division of labor is changing such that the first authors are the lead scientists? Or does it imply that senior authors are being skipped over, or giving away their chance to comment on a study to the first author?

      We thank the reviewer for bringing up these important questions. We have added better context to our first author analysis in our discussion and have included the following two sections to address this. We also want to state that we find last authors to be slightly more quoted than first authors, as depicted in Fig. 2d, with the first author quotation percentage largely appearing below the red line. We included this text in a response above and include it again here for convenience.

      “Before 2010, quotes predicted as from men are overrepresented in comparison to both first and last authors, but between 2010 and 2017 quotes predicted from men are only overrepresented in comparison to first authors. In 2020, we find a slight over-representation of quotes predicted to be from women relative to first and last authors, but still severely under-represented when considering the general population. The choice of comparison between first and last authors can reveal different aspects of the current state of academia. While this does not hold in all scientific fields, first authors are typically early career scientists and last authors are more senior scientists. It has also been shown that early career scientists tend to be more diverse than senior scientists [@doi:10.7554/eLife.60829; @doi:10.1096/fj.201800639]. Since we find that quotes are only slightly more likely to come from a last author, it is reasonable to compare the relative rate of predicted quotes from men to either authorship position. Comparison with last authorships may reveal more how gender bias currently exists whereas comparison with early career scientists may reveal bias in comparison to a future, possibly more diverse academic environment. We hope that increased representation and recognition of women in science, even beyond what is observed in authorship, can increase the proportion of women first and last authors such that it better reflects the general population.”

      “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, we see that the number of first authors with predicted East Asian name origins is increasing at a much faster rate than quotes are increasing. If this mismatched rate of representation continues, this could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity of public visibility of scientists with East Asian name origins.”

      What might be the downstream impacts on the public stemming from the under-representation of scientists with East Asian names? According to Figure 3d, not only are East Asian names under-represented in quotations, but they are becoming more under-represented over time as they appear as authors in a greater number of Nature publications; Those with European names are proportionately represented in quotations given their share of authors in Nature. Why might this be, especially seeing as Anglo names are heavily over-represented?

      To address this point, we have added the following text to our discussion: “In our analysis, we also find that there are more first authors with predicted East Asian name origin than last authors. This is in contrast to predicted Celtic/English and European name origins. Furthermore, the number of first authors with predicted East Asian name origins is increasing at a much faster rate than quotes are increasing. If this mismatched rate of representation continues, this could lead to an increasingly large erasure of early career scientists with East Asian name origins. As noted before, focusing on increasing engagement with early career scientists can help to reduce the growing disparity of public visibility of scientists with East Asian name origins.”

      I am very confused by Figure 1B. It mixes the counts of News-related items with (non-Springer) research articles in a single stacked bar plot which makes determining the quantity of either difficult. I would advise splitting them out

      Figure 1B has been updated, and the News and Research articles have been separated.

      When querying the first 2000 or so results from the SpringerNature API, are the authors certain that they are getting a random sample of papers?

      These papers were the first 200 English language "Journal" papers returned by the Springer Nature API for each month, resulting in 2400 papers per year from 2005 through 2020. These papers are the first 200 papers published each month by a Springer Nature journal, which may not be completely random, but we believe to be a reasonably representative sample. Furthermore, the Springer Nature comparator set is being used as an additional comparator to the complete set of all Nature research papers used in our analyses.

      In all figures: the authors use capital letters to indicate panels in the caption, but lowercase letters in the figure itself and in the main text. This should be made consistent.

      This has been updated.

      In all figures: the authors should make the caption letter bold in the figure captions, which makes it much easier to find descriptions of specific panels

      This has been updated.

      In the section "coreNLP": the authors mention "co-reference resolution" but without really remarking why it is being used. This is an issue throughout the methods-the authors describe what method they are using but either they don't mention why they are using that method until later, or else not at all.

      We have added better reasoning behind our coreNLP selected methods: “We used the standard set of annotators: tokenize, ssplit, pos, lemma, ner, parse, coref, and additionally the quote annotator. These perform text tokenization, sentence splitting, part of speech recognition, lemmatization, named entity recognition, division of sentences into constituent phrases, co-reference resolution, and identification of quoted entities, respectively. We used the "statistical" algorithm to perform coreference resolution for speed. Each of these aspects is required to identify the names of quoted or mentioned speakers and identify any of their associated pronouns. All results were output to json format for further downstream processing.”

      We included a better description of scrapy: “Scrapy is a tool that applies user-defined rules to follow hyperlinks on webpages and return the information contained on each webpage. We used Scrapy to extract all web pages containing news articles and extract the text.”

      We also included our motivation for bootstrapping: “We used the bootstrap method to construct confidence intervals for each of our calculated statistics.”
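
      As a rough illustration of how a bootstrap confidence interval for a proportion can be built, the sketch below uses the percentile method. This is an assumption made for illustration only: the response does not say which bootstrap variant was used, and the data, function names, and parameter defaults here are hypothetical.

```python
import random


def bootstrap_ci(values, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: resample with replacement, recompute the statistic,
    and read the interval off the sorted resampled statistics."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def proportion(xs):
    """Fraction of entries equal to 1 in a list of 0/1 indicators."""
    return sum(xs) / len(xs)


# Hypothetical data: 1 = quote predicted to be from a man, 0 = from a woman.
quotes = [1] * 70 + [0] * 30
low, high = bootstrap_ci(quotes, proportion)
```

      Fixing the seed keeps the interval reproducible between runs; a larger `n_boot` gives smoother interval endpoints at the cost of run time.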

      In the section "Name Formatting for Gender Prediction in Quotes or Mentions", genderizeR is mentioned before an introduction to the tool

      We added the following text to provide context: “Even though genderizeR, the computational method used to predict the name's gender, only uses the first name to make the gender prediction, identifying the full name gives us greater confidence that we correctly identified the first name.”

      In the section "Name Formatting for Gender Prediction of Authors", you state that you exclude papers with only one author. How many papers is this? I assume few, in Nature, but if not I can imagine gender differences based on who writes first-authored papers.

      We find that the number excluded is roughly 7% of all papers, which is consistent across Nature and Springer Nature (1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature research articles). We have added the following text to the manuscript for better context: “Roughly 7% of all papers were estimated to be by a single author and were removed from this analysis: 1113/15013 for cited Springer articles, 2899/42155 for random Springer articles, 955/12459 for Nature research articles.”
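
      The "roughly 7%" figure can be checked directly from the counts quoted above. The snippet below only redoes that arithmetic; the dictionary keys are labels chosen here for readability.

```python
# (single-author papers removed, total papers), as reported in the response.
single_author = {
    "cited Springer articles": (1113, 15013),
    "random Springer articles": (2899, 42155),
    "Nature research articles": (955, 12459),
}

# Fraction of papers excluded from each comparator set.
fractions = {
    name: removed / total for name, (removed, total) in single_author.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.1%} excluded")
```

      All three fractions fall close to 7%, in line with the statement that the exclusion rate is consistent across the comparator sets.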

      In "Name Origin Analysis", for the in-text reference to Equation 3: include the prefix "Eq." or similar to mark this as referencing the equation and not something else

      This has been updated.

      The use of the word "enrichment" in reference to the representation of East Asian authors is strange and does not fit the colloquial definition of the term. I suggest just using a simpler term like "representation" instead

      Similarly, the authors use the word "depletion" to reflect the lower rate of quotes to scientists with East-Asian names, but I feel a simpler word would be more appropriate.

      We thank the reviewer for this suggestion; all instances of “enrichment/depletion” have been replaced with “over/under representation”.

      The authors claim in Figure 2d that there is a steady increase in the rate of first author citations, however, this graph is not convincing. It appears to show much more noise than anything resembling a steady change.

      We have reworded our figure description to state that there is a consistent bias towards quoting last authors. Our figure description now states: “Panel d shows a consistent but slight bias towards quoting the last author of a cited article than the first author over time.”

      Supplemental Figures 1b and 1c do not seem to be mentioned in the main text, and I struggle to see their relevance.

      We thank the reviewer for identifying this error; these subpanels have been removed.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Point-by-point response to concerns raised by reviewer #3:

      The manuscript has improved very substantially in revision. The authors have clearly taken the comments on board in good faith. Yet, some small concerns remain around the behavioural analysis.

      In Fig. 8H and H' average sleep/day is ~100. Is this minutes of sleep? 100 min/day is far too low, is it a typo?

      The numbers for sleep bouts are also too low to me e.g. in Fig 9 number of sleep bouts avg around 4.

      In their response to reviewers the authors say these errors were fixed, yet the figures appear not to have been changed. Perhaps the old figures were left in inadvertently?

      Indeed, this correction was missed, and we thank the reviewer for noticing this. We have now corrected Fig 8H-H’ and Fig 9D.

      The circadian anticipatory activity analyses could also be improved. The standard in the field is to perform eduction analyses and quantify anticipatory activity e.g. using the method of Harrisingh et al. (PMID: 18003827). This is typically computed as the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition.

      In their response to reviewers, the authors have revised their anticipation analyses by quantifying the mean activity in the 6 hrs preceding light transition. However, in the method of Harrisingh et al., anticipation is the ratio of activity in the 3hrs preceding light transition to activity in the 6hrs preceding light transition. Simply computing the activity in the 6hrs preceding light transition does not give a measure of anticipation, determining the ratio is key.

      We acknowledge the importance of obtaining accurate results in our analysis, therefore we have re-evaluated the anticipation activity by measuring the ratio of the mean activity in the 3h preceding light transition over the activity in the 6h preceding light transition. We have reported the data as percentages in Fig 8F-G and modified the figure legends accordingly.
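
      The ratio described above can be sketched as follows. This is a minimal illustration, assuming activity is binned into hourly counts and that the ratio is taken between summed activity in the last 3 h and summed activity over the full 6 h; the function name and example values are hypothetical and are not taken from the manuscript.

```python
def anticipation_index(activity_by_hour):
    """Percentage of activity in the 6 h before a light transition
    that falls within the final 3 h.

    activity_by_hour: six hourly activity counts, oldest bin first.
    """
    if len(activity_by_hour) != 6:
        raise ValueError("expected exactly six hourly bins")
    total_6h = sum(activity_by_hour)
    if total_6h == 0:
        return float("nan")  # no activity at all, so the ratio is undefined
    last_3h = sum(activity_by_hour[-3:])
    return 100.0 * last_3h / total_6h


# Hypothetical examples: flat activity versus activity ramping up before the transition.
flat = anticipation_index([5, 5, 5, 5, 5, 5])
ramping = anticipation_index([1, 2, 3, 6, 8, 10])
```

      With this definition, activity that is flat across the 6 h window gives 50%, and values above 50% indicate that activity is concentrated in the hours just before the transition, which is why the ratio rather than the raw 6 h mean captures anticipation.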

    1. Author response:

      eLife assessment 

      This important study provides evidence for a combination of the latest generation of Oxford Nanopore Technology long reads with state-of-the-art variant callers enabling bacterial variant discovery at accuracy that matches or exceeds the current "gold standard" with short reads. The evidence supporting the claims of the authors is convincing, although the inclusion of a larger number of reference genomes would further strengthen the study. The work will be of interest to anyone performing sequencing for outbreak investigations, bacterial epidemiology, or similar studies.

      We thank the editor and reviewers for the accurate summary and positive assessment. We address the comment about increasing the number of reference genomes in the response to reviewer 2.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors assess the accuracy of short variant calling (SNPs and indels) in bacterial genomes using Oxford Nanopore reads generated on R10.4 flow cells from a very similar genome (99.5% ANI), examining the impact of variant caller choice (three traditional variant callers: bcftools, freebayes, and longshot, and three deep learning based variant callers: Clair3, DeepVariant, and NanoCaller), base calling model (fast, hac and sup) and read depth (using both simplex and duplex reads).

      Strengths: 

      Given the stated goal (analysis of variant calling for reads drawn from genomes very similar to the reference), the analysis is largely complete and results are compelling. The authors make the code and data used in their analysis available for re-use using current best practices (a computational workflow and data archived in INSDC databases or Zenodo as appropriate). 

      Weaknesses: 

      While the medaka variant caller is now deprecated for diploid calling, it is still widely used for haploid variant calling and should at least be mentioned (even if the mention is only to explain its exclusion from the analysis). 

      We agree that this would be an informative addition to the study and will add it to the benchmarking.

      Appraisal: 

      The experiments the authors engaged in are well structured and the results are convincing. I expect that these results will be incorporated into "best practice" bacterial variant calling workflows in the future. 

      Thank you for the positive appraisal.

      Reviewer #2 (Public Review): 

      Summary: 

      Hall et al describe the superiority of ONT sequencing and deep learning-based variant callers to deliver higher SNP and Indel accuracy compared to previous gold-standard Illumina short-read sequencing. Furthermore, they provide recommendations for read sequencing depth and computational requirements when performing variant calling. 

      Strengths: 

      The study describes compelling data showing ONT superiority when using deep learning-based variant callers, such as Clair3, compared to Illumina sequencing. This challenges the paradigm that Illumina sequencing is the gold standard for variant calling in bacterial genomes. The authors provide evidence that homopolymeric regions, a systematic and problematic issue with ONT data, are no longer a concern in ONT sequencing. 

      Weaknesses: 

      (1) The inclusion of a larger number of reference genomes would have strengthened the study to accommodate larger variability (a limitation mentioned by the authors). 

      Our strategic selection of 14 genomes—spanning a variety of bacterial genera and species, diverse GC content, and both gram-negative and gram-positive species (including M. tuberculosis, which is neither)—was designed to robustly address potential variability in our results. Moreover, all our genome assemblies underwent rigorous manual inspection as the quality of the true genome sequences is the foundation this research is built upon. Given this, the fundamental conclusions regarding the accuracy of variant calls would likely remain unchanged with the addition of more genomes.  However, we do acknowledge that a substantially larger sample size, which is beyond the scope of this study, would enable more fine-grained analysis of species differences in error rates.

      (2) In Figure 2, there are clearly one or two samples that perform worse than others in all combinations (are always below the box plots). No information about species-specific variant calls is provided by the authors but one would like to know if those are recurrently associated with one or two species. Species-specific recommendations could also help the scientific community to choose the best sequencing/variant calling approaches.

      Thank you for highlighting this observation. The precision, recall, and F1 scores for each sample and condition can be found in Supplementary Table S4. We will investigate the samples that consistently perform below expectation to determine if this is associated with specific species, which may necessitate tailored recommendations for those species. Additionally, we will produce a species-segregated version of Figure 2 for a clearer interpretation and will place it in the supplementary materials.
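      For readers who want to recompute per-sample metrics from raw counts, the three scores follow directly from the numbers of true positive, false positive, and false negative calls. The snippet below is a minimal sketch under that assumption; the function name and the example counts are hypothetical and are not taken from the authors' workflow or from Supplementary Table S4.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from variant-call counts.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth-set variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one sample and one caller
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

      Grouping such per-sample values by species is one way to check whether the low outliers recur in the same organisms.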

      (3) The authors support that a read depth of 10x is sufficient to achieve variant calls that match or exceed Illumina sequencing. However, the standard here should be the optimal discriminatory power for clinical and public health utility (namely outbreak analysis). In such scenarios, the highest discriminatory power is always desirable and as such an F1 score, Recall and Precision that is as close to 100% as possible should be maintained (which changes the minimum read sequencing depth to at least 25x, which is the inflection point).

      We agree that the highest discriminatory power is always desirable for clinical or public health applications. In that case, 25x is probably a better minimum recommendation. However, we are also aware that there are resource-limited settings where parity with Illumina is sufficient. In these cases, 10x depth from ONT would provide sufficient data.

      The manuscript currently emphasises the latter scenario, but we will revise the text to clearly recommend 25x depth as a conservative aim in settings where resources are not a constraint, ensuring the highest possible discriminatory power for applications like outbreak analysis.

      (4) The sequencing of the samples was not performed with the same Illumina and ONT method/equipment, which could have introduced specific equipment/preparation artefacts that were not considered in the study. See for example https://academic.oup.com/nargab/article/3/1/lqab019/6193612

      To our knowledge, there is no evidence that sequencing on different ONT machines or barcoding kits leads to a difference in read characteristics or accuracy. To ensure consistency and minimise potential variability, we used the same ONT flowcells for all samples and performed basecalling on the same Nvidia A100 GPU. We will update the methods to emphasise this.

      For Illumina and ONT, the exact machines used for which samples will be added as a supplementary table. We will also add a comment about possible Illumina error rate differences in the ‘Limitations’ section of the Discussion.

      In summary, while there may be specific equipment or preparation artifacts to consider, we took steps to minimise these effects and maintain consistency across our sequencing methods.

      Reviewer #3 (Public Review): 

      Hall et al. benchmarked different variant calling methods on Nanopore reads of bacterial samples and compared the performance of Nanopore to short reads produced with Illumina sequencing. To establish a common ground for comparison, the authors first generated a variant truth set for each sample and then projected this set to the reference sequence of the sample to obtain a mutated reference. Subsequently, Hall et al. called SNPs and small indels using commonly used deep learning and conventional variant callers and compared the precision and accuracy from reads produced with simplex and duplex Nanopore sequencing to Illumina data. The authors did not investigate large structural variation, which is a major limitation of the current manuscript. It will be very interesting to see a follow-up study covering this much more challenging type of variation. 

      We fully agree that investigating structural variations (SVs) would be a very interesting and important follow-up. Identifying and generating ground truth SVs is a nontrivial task and we feel it deserves its own space and study. We hope to explore this in the future.

      In their comprehensive comparison of SNPs and small indels, the authors observed superior performance of deep learning over conventional variant callers when Nanopore reads were basecalled with the most accurate (but also computationally very expensive) model, even exceeding Illumina in some cases. Not surprisingly, Nanopore underperformed compared to Illumina when basecalled with the fastest (but computationally much less demanding) method with the lowest accuracy. The authors then investigated the surprisingly higher performance of Nanopore data in some cases and identified lower recall with Illumina short read data, particularly from repetitive regions and regions with high variant density, as the driver. Combining the most accurate Nanopore basecalling method with a deep learning variant caller resulted in low error rates in homopolymer regions, similar to Illumina data. This is remarkable, as homopolymer regions are (or, were) traditionally challenging for Nanopore sequencing. 

      Lastly, Hall et al. provided useful information on the required Nanopore read depth, which is surprisingly low, and the computational resources for variant calling with deep learning callers. With that, the authors established a new state-of-the-art for Nanopore-only variant calling on bacterial sequencing data. Most likely these findings will be transferred to other organisms as well or at least provide a proof-of-concept that can be built upon.

      As the authors mention multiple times throughout the manuscript, Nanopore can provide sequencing data in nearly real-time and in remote regions, therefore opening up a ton of new possibilities, for example for infectious disease surveillance. 

      However, the high-performing variant calling method as established in this study requires the computationally very expensive sup and/or duplex Nanopore basecalling, whereas the least computationally demanding method underperforms. Here, the manuscript would greatly benefit from extending the last section on computational requirements, as the authors determine the resources for the variant calling but do not cover the entire picture. This could even be misleading for less experienced researchers who want to perform bacterial sequencing at high performance but with low resources. The authors mention it in the discussion but do not make clear enough that the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required. 

      We have provided runtime benchmarks for basecalling in Supplementary Figure S16 and detailed these times in Supplementary Table S7. In addition, we state in the Results section (P10 L228-230) “Though we do note that if the person performing the variant calling has received the raw (pod5) ONT data, basecalling also needs to be accounted for, as depending on how much sequencing was done, this step can also be resource-intensive.”

      Even with super-accuracy basecalling considered, our analysis shows that variant calling remains the most resource-intensive step for Clair3, DeepVariant, FreeBayes, and NanoCaller. Therefore, the statement “the described computational resources are probably largely insufficient to perform the high-accuracy basecalling required”, is incorrect. However, we will endeavour to make the basecalling component and considerations more prominent in the Results and Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors discuss an effect, "diffusive lensing", by which particles would accumulate in high-viscosity regions, for instance in the intracellular medium. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention. The "lensing effect" discussed is a direct consequence of the choice of the Ito convention without spurious drift which has been discussed before and is likely to be inadequate for the intracellular medium, causing the presented results to likely have little relevance for biology.

      We thank the editors and the reviewers for their consideration of our manuscript. We argue in this rebuttal and revision that our results and conclusions are in fact likely to have relevance for biology. While we use the Itô convention for ease of modeling considering its non-anticipatory nature upon discretization (see (Volpe and Wehr 2016) for the discretization schemes), we refer to Figure S1B to emphasize that diffusive lensing occurs not only under the Itô convention but across a wide parameter space. Indeed, it is absent only in the normative isothermal convention; note that even a stochastic differential equation conforming to the isothermal convention may be reformulated into the Itô convention by adding suitable drift terms, allowing for diffusive lensing to be seen even in case of the isothermal convention. We note in particular that the choice of the convention is a highly context-dependent one (Sokolov 2010); there is not a universally correct choice, and one can obtain stochastic differential equations consistent with Ito or Stratonovich interpretations in different regimes. Lastly, space-dependent diffusivity is now an experimentally well-recognized feature of the cellular interior, as noted in our references and as discussed further later in this response. This fact points towards the potential relevance of our model for subcellular diffusion.

      In our revised preprint, we have made changes to the text and minor changes to figures to address reviewer concerns.

      Responses to the Reviewers

      We thank the reviewers for their feedback and address the issues they raised in this rebuttal and in the revised manuscript. The central point that the reviewers raise concerns the validity of the drift-less Itô interpretation in modeling potential nonequilibrium types of subcellular transport arising from space-dependent diffusivity. If the drift term were considered, the resulting stochastic differential equation (SDE) is equivalent to one arising from the isothermal interpretation of heterogeneous diffusivity (Volpe and Wehr 2016), wherein no diffusive lensing is seen (as shown in Fig. S1B). That is, the isothermal interpretation and the drift-comprising Itô SDE produce the same uniform steady-state particle densities.
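      The relation between the conventions can be made explicit with a short worked derivation. This is a sketch under simplifying assumptions that are ours for illustration: one spatial dimension, no external force, no-flux boundaries, and the common parametrization in which $\alpha = 0$ denotes Itô, $\alpha = 1/2$ Stratonovich, and $\alpha = 1$ the isothermal convention.

```latex
% SDE with no explicit drift, interpreted in the alpha-convention
dx = \sqrt{2D(x)} \circ_{\alpha} dW_t , \qquad \alpha \in [0,1]

% Equivalent Ito SDE: the change of convention adds the drift \alpha D'(x)
dx = \alpha\, D'(x)\, dt + \sqrt{2D(x)}\, dW_t

% Corresponding Fokker-Planck equation
\partial_t p = -\partial_x\!\left[\alpha D'(x)\, p\right] + \partial_x^2\!\left[D(x)\, p\right]

% Zero-flux steady state
\alpha D' p - (D p)' = 0
\;\Longrightarrow\;
\frac{p'}{p} = (\alpha - 1)\,\frac{D'}{D}
\;\Longrightarrow\;
p_{\mathrm{ss}}(x) \propto D(x)^{\alpha - 1}
```

      Under these assumptions the steady state is uniform only for $\alpha = 1$, scales as $1/D$ for the driftless Itô case, and as $D^{-1/2}$ for Stratonovich, so accumulation in low-diffusivity regions appears for every $\alpha < 1$ and only its magnitude changes.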

      While we agree with the reviewers that for a given interpretation, equivalent stochastic differential equations (SDEs) arising from other interpretations may be drawn, we disagree with the generalization that all types of subcellular diffusion conform to the isothermal interpretation. That is, there is no reason why any and all instances of nonequilibrium subcellular particle diffusion must be modeled using isothermal-conforming SDEs (such as the drift-comprising Itô SDE, for instance). We refer to (Sokolov 2010) which prescribes choosing a convention in a context-dependent manner. In this regard, we disagree with the second reviewer’s characterization of making such a choice merely a “choice of writing” considering that it is entirely dependent on the choice of microscopic parameters, as detailed in the discussion section of the manuscript. The following references have also been added to the manuscript: the reference from the first reviewer (Kupferman et al. 2004) proposes a prescription for choosing an appropriate convention based upon comparing the noise correlation time and the particle relaxation time. The reference notes that the Itô convention is appropriate when the particle relaxation time is large when compared to the noise correlation time and the Stratonovich convention is appropriate in the converse scenario. In (Rupprecht et al. 2018), active noise is considered and the resulting Fokker-Planck equation conforms to the Stratonovich convention when thermal noise was negligible. The related reference, (Vishen et al. 2019) compares three timescales: those of particle relaxation, noise correlation and viscoelastic relaxation, to make the choice. Indeed, as noted in the manuscript, lensing is seen in all but one interpretation (without drift additions); only its magnitude is altered by the interpretation/choice of the drift term. The appendix has been modified to include a subsection on the interchangeability of the conventions.

      Separately, with regards to the discussion on anomalous diffusion, the section on mean squared displacement calculation has been amended to avoid confusing our model with canonical anomalous diffusion which considers the anomalous exponent; how the anomalous exponent varies with space-dependent diffusivity offers an interesting future area of study.

      Responses to specific reviewer comments appear below.

      Reviewer #1 (Public Review):

      The manuscript "Diffusive lensing as a mechanism of intracellular transport and compartmentalization", explores the implications of heterogeneous viscosity on the diffusive dynamics of particles. The authors analyze three different scenarios:

      (i)   diffusion under a gradient of viscosity,

      (ii)  clustering of interacting particles in a viscosity gradient, and

      (iii) diffusive dynamics of non-interacting particles with circular patches of heterogeneous viscous medium.

      The implications of a heterogeneous environment on phase separation and reaction kinetics in cells are under-explored. This makes the general theme of this manuscript very relevant and interesting. However, the analysis in the manuscript is not rigorous, and the claims in the abstract are not supported by the analysis in the main text.

      Following are my main comments on the work presented in this manuscript:

      (a) The central theme of this work is that spatially varying viscosity leads to position-dependent diffusion constant. This, for an overdamped Langevin dynamics with Gaussian white noise, leads to the well-known issue of the interpretation of the noise term.

      The authors use the Ito interpretation of the noise term because their system is non-equilibrium.

      One of the main criticisms I have is on this central point. The issue of interpretation arises only when there are ill-posed stochastic dynamics that do not have the relevant timescales required to analyze the noise term properly. Hence, if the authors want to start with an ill-posed equation it should be mentioned at the start. At least the Langevin dynamics considered should be explicitly mentioned in the main text. Since this work claims to be relevant to biological systems, it is also of significance to highlight the motivation for using the ill-posed equation rather than a well-posed equation. The authors refer to the non-equilibrium nature of the dynamics but it is not mentioned what non-equilibrium dynamics the authors have in mind. To properly analyze an overdamped Langevin dynamics a clear source of integrated timescales must be provided. As an example, one can write the dynamics as Eq. (1) \dot x = f(x) + g(x) \eta , which is ill-defined if the noise \eta is delta correlated in time but well-defined when \eta is exponentially correlated in time. One can of course look at the limit in which the exponential correlation goes to a delta correlation which leads to Eq. (1) interpreted in Stratonovich convention. The choice to use the Ito convention for Eq. (1) in this case is not justified.

      We thank the reviewer for detailing their concerns with our model’s assumptions. We have addressed them in the common rebuttal.

      (b) Generally, the manuscript talks of viscosity gradient but the equations deal with diffusion which is a combination of viscosity, temperature, particle size, and particle-medium interaction. There is no clear motivation provided for focus on viscosity (cytoplasm as such is a complex fluid) instead of just saying position-dependent diffusion constant. Maybe authors should use viscosity only when talking of a context where the existence of a viscosity gradient is established either in a real experiment or in a thought experiment.

      The manuscript has been amended to use only “diffusivity” to avoid confusion.

      (c) The section "Viscophoresis drives particle accumulation" seems to not have new results. Fig. 1 verifies the numerical code used to obtain the results in the later sections. If that is the case maybe this section can be moved to supplementary or at least it should be clearly stated that this is to establish the correctness of the simulation method. It would also be nice to comment a bit more on the choice of simulation methods with changing hopping sizes instead of, for example, numerically solving stochastic ODE.

      The main point of this section and of Fig. 1 is the diffusive lensing effect itself: the accumulation of particles in lower-diffusivity areas. To the best of our knowledge, diffusive lensing has not been reported elsewhere as a specific outcome of non-isothermal interpretations of diffusion, with potential relevance to nonequilibrium subcellular motilities. The simulation method has been fully described in the Methods section, and the code has also been shared (see Code Availability).

      A minor comment, the statement "the physically appropriate convention to use depends upon microscopic parameters and timescale hierarchies not captured in a coarse-grained model of diffusion." is not true as is noted in the references that authors mention, a correct coarse-grained model provides a suitable convention (see also Phys. Rev. E, 70(3), 036120., Phys. Rev. E, 100(6), 062602.).

      This has been addressed in the common rebuttal.

      (d) The section "Interaction-mediated clustering is affected by viscophoresis" makes an interesting statement about the positioning of clusters by a viscous gradient. As a theoretical calculation, the interplay between position-dependent diffusivity and phase separation is indeed interesting, but the problem needs more analysis than that offered in this manuscript. Just a plot showing clustering with and without a gradient of diffusion does not give enough insight into the interplay between density-dependent diffusion and position-dependent diffusion. A phase plot that somehow shows the relative contribution of the two effects would have been nice. Also, it should be emphasized in the main text that the inter-particle interaction is through a density-dependent diffusion constant and not a conservative coupling by an interaction potential.

      The density-dependence has been added from the Methods to the main text. The goal of the work is to present lensing as a natural outcome of the parameter choices we make and present its effects as they relate to clustering and commonly used biophysical methods to probe dynamics within cells. A dense sampling of the phase space and how it is altered as a function of diffusivity, and the subsequent interpretation, lie beyond the scope of the present work but offer exciting future directions of study.

      (e) The section "In silico microrheology shows that viscophoresis manifests as anomalous diffusion" the authors show that the MSD with and without spatial heterogeneity is different. This is not a surprise - as the underlying equations are different the MSD should be different.

      The goal here is to compare and contrast the ways in which homogeneous and heterogeneous diffusion manifest in simulated microrheology measurements. We hope that an altered saturation MSD, as is observed in our simulations, provokes interest in considering lensing while modeling experimental data.
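      To make the quantity under discussion concrete, the following is a minimal sketch of an ensemble-averaged MSD and of the apparent log-log slope that is usually read as the anomalous exponent. It is an illustration only: the function names and the array layout (particles by frames) are assumptions and do not reproduce the code shared with the manuscript.

```python
import numpy as np


def ensemble_msd(trajectories):
    """Ensemble-averaged MSD relative to each particle's starting position.

    trajectories: array of shape (n_particles, n_frames) for 1D data, or
    (n_particles, n_frames, n_dims). Returns an array of length n_frames.
    """
    traj = np.asarray(trajectories, dtype=float)
    if traj.ndim == 2:
        traj = traj[:, :, None]          # promote 1D positions to (n, t, 1)
    disp = traj - traj[:, :1, :]         # displacement from the first frame
    return (disp ** 2).sum(axis=2).mean(axis=0)


def apparent_exponent(msd, dt=1.0):
    """Slope of log(MSD) against log(time), skipping the zero-lag point."""
    msd = np.asarray(msd, dtype=float)
    lags = dt * np.arange(1, len(msd))
    slope, _ = np.polyfit(np.log(lags), np.log(msd[1:]), 1)
    return float(slope)


# Toy check with two ballistic particles (velocities +1 and -2)
toy = np.outer(np.array([1.0, -2.0]), np.arange(4))
print(ensemble_msd(toy), apparent_exponent(ensemble_msd(toy)))
```

      In a confined domain the MSD saturates at long times, so a fit of this kind is only meaningful over the lags that precede the plateau.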

      There are various analogies drawn in this section without any justification:

      (i) "the saturation MSD was higher than what was seen in the homogeneous diffusion scenario possibly due to particles robustly populating the bulk milieu followed by directed motion into the viscous zone (similar to that of a Brownian ratchet, (Peskin et al., 1993))."

      In case of i), the Brownian ratchet is invoked as a model to explain directed accumulation. We have removed this analogy to avoid confusion as it is not delved into further over the course of our work.

      (ii) "Note that lensing may cause particle displacements to deviate from a Gaussian distribution, which could explain anomalous behaviors observed both in our simulations and in experiments in cells (Parry et al., 2014)." Since the full trajectory of the particles is available, it can be analyzed to check if this is indeed the case.

      This has been addressed in the common rebuttal.

      (f) The final section "In silico FRAP in a heterogeneously viscous environment ... " studies the MSD of the particles in a medium with heterogeneous viscous patches which I find the most novel section of the work. As with the section on inter-particle interaction, this needs further analysis.

      We thank the reviewer for their appreciation. In presenting these three sections discussing the effects of diffusive lensing, we intend to broadly outline the scope of this phenomenon in influencing a range of behaviors. Exploring these directions further is a promising avenue for future study that lies beyond the scope of this manuscript.

      To summarise, as this is a theory paper, just showing MSD or in silico FRAP data is not sufficient. Unlike experiments where one is trying to understand the systems, here one has full access to the dynamics either analytically or in simulation. So just stating that the MSD in heterogeneous and homogeneous environments are not the same is not sufficient. With further analysis, this work can be of theoretical interest. Finally, just as a matter of personal taste, I am not in favor of the analogy with optical lensing. I don't see the connection.

      We value the reviewer’s interest in investigating the causes underlying the differences in the MSDs and agree that it represents a promising future area of study. The main point of this section of the manuscript was to make a connection to experimentally measurable quantities.

      Reviewer #2 (Public Review):

      Summary:

      The authors study through theory and simulations the diffusion of microscopic particles and aim to account for the effects of inhomogeneous viscosity and diffusion - in particular regarding the intracellular environment. They propose a mechanism, termed "Diffusive lensing", by which particles are attracted towards high-viscosity regions where they remain trapped. To obtain these results, the authors rely on agent-based simulations using custom rules performed with the Ito stochastic calculus convention, without spurious drift. They acknowledge the fact that this convention does not describe equilibrium systems, and that their results would not hold at equilibrium - and discard these facts by invoking the fact that cells are out-of-equilibrium. Finally, they show some applications of their findings, in particular enhanced clustering in the high-viscosity regions. The authors conclude that as inhomogeneous diffusion is ubiquitous in life, so must their mechanism be, and hence it must be important.

      Strengths:

      The article is well-written, and clearly intelligible, its hypotheses are stated relatively clearly and the models and mathematical derivations are compatible with these hypotheses.

      We thank the reviewer for their appreciation.

      Weaknesses:

      The main problem of the paper is these hypotheses. Indeed, it all relies on the Ito interpretation of the stochastic integrals. Stochastic conventions are a notoriously tricky business, but they are both mathematically and physically well-understood and do not result in any "dilemma" [some citations in the article, such as (Lau and Lubensky) and (Volpe and Wehr), make an unambiguous resolution of these]. Conventions are not an intrinsic, fixed property of a system, but a choice of writing; however, whenever going from one to another, one must include a "spurious drift" that compensates for the effect of this change - a mathematical subtlety that is entirely omitted in the article: if the drift is zero in one convention, it will thus be non-zero in another in the presence of diffusive gradients. It is well established that for equilibrium systems obeying fluctuation-dissipation, the spurious drift vanishes in the anti-Ito stochastic convention (which is not "anticipatory", contrary to claims in the article, as the "steps" are local and infinitesimal). This ensures that the diffusion gradients do not induce currents and probability gradients, and thus that the steady-state PDF is the Gibbs measure. This equilibrium case should be seen as the default: a thermal system NOT obeying this law should warrant a strong justification (for instance in the Volpe and Wehr review this can occur through memory effects in robotic dynamics, or through strong fluctuation-dissipation breakdown). In near-equilibrium thermal systems such as the intracellular medium (where, although out-of-equilibrium, temperature remains a relevant and mostly homogeneous quantity), deviations from this behavior must be physically justified and go to zero when going towards equilibrium.

      Considering that the physical phenomena underlying diffusion span a range of timescales (particle relaxation, noise, environmental correlation, et cetera), we disagree with the assertion that all types of subcellular diffusion processes can be modeled as occurring at thermal equilibrium: for example, one can easily imagine memory effects arising in the presence of an appropriate hierarchy of timescales. We have added references that describe in more detail the way in which the comparison of timescales can dictate the applicability of different conventions. We also refer the referee to the common rebuttal section of our response in which we discuss factors that govern the choice of the interpretation. The adiabatic elimination arguments highlighted in (Kupferman et al. 2004) provide a clear description of how relevant particle and environment-related timescales can inform the choice of stochastic calculus to use.

      With regards to the use of the term “anticipatory” to refer to the isothermal interpretation, we refer to the comment in (Volpe and Wehr 2016) of the Itô interpretation “not looking into the future”. In any case, whether anticipatory or otherwise, the interpretation’s effect on our model remains unchanged, as highlighted in the section in the Appendix on the conversion between different conventions; this section has been added to minimize confusion about the effects of the choice of convention on lensing.

      Here, drifts are arbitrarily set to zero in the Ito convention (the exact opposite of the equilibrium anti-Ito), which is the equilibrium equivalent to adding a force (with drift $- grad D$) exactly compensating the spurious drift. If we were to interpret this as a breakdown of detailed balance with inhomogeneous temperature, the "hot" region would be effectively at 4x higher temperature than the cold region (i.e. 1200K) in Fig 1A.

      Our work is based on existing observations of space-dependent diffusivity in cells (Garner et al., 2023; Huang et al., 2021; Parry et al., 2014; Śmigiel et al., 2022; Xiang et al., 2020). These papers support a definitive model for the existence of space-dependent diffusivity without invoking space-dependent temperature.

      It is the effects of this arbitrary force (exactly compensating the Ito spurious drift) that are studied in the article. The fact that it results in probability gradients is trivial once formulated this way (and in no way is this new - many of the references, for instance, Volpe and Wehr, mention this).

      Addressed in the common rebuttal.

      Enhanced clustering is also a trivial effect of this probability gradient (the local concentration is increased by this force field, so phase separation can occur). As a side note the "neighbor sensing" scheme to describe interactions is very peculiar and not physically motivated - it violates stochastic thermodynamics laws too, as the detailed balance is apparently not respected.

      The neighbor-sensing scheme used here is just one possible model of an effective attractive potential between particles. Other models that lead to density-dependent attraction between particles should also provide qualitatively similar results as ours; this offers an interesting prospect for future research.

      Finally, the "anomalous diffusion" discussion is at odds with what the literature on this subject considers anomalous (the exponent does not appear anomalous).

      This has been addressed in the common rebuttal, and the relevant part of the manuscript has been modified to avoid confusion.

      The authors make no further justification of their choice of convention than the fact that cells are out-of-equilibrium, leaving the feeling that this is a detail. They make mentions of systems (eg glycogen, prebiotic environment) for which (near-)equilibrium physics should mostly prevail, and of fluctuation-dissipation ("Diffusivity varies inversely with viscosity", in the introduction). Yet the "phenomenon" they discuss is entirely reliant on an undiscussed mechanism by which these assumptions would be completely violated (the citations they make for this - Gnesotto '18 and Phillips '12 - are simply discussions of the fact that cells are out-of-equilibrium, not on any consequences on the convention).

      Finally, while inhomogeneous diffusion is ubiquitous, the strength of this effect in realistic conditions is not discussed (this would be a significant problem if the effect were real, which it isn't). Gravitational attraction is also a ubiquitous effect, but it is not important for intracellular compartmentalization.

      The manuscript text has been supplemented with additional references that detail the ways in which the comparison of timescales can dictate how one can apply different conventions. We refer the reviewer to the common rebuttal section of our response where we detail factors that dictate the choice of the convention to use. As previously noted, the adiabatic elimination arguments highlighted in (Kupferman et al., 2004) provide a prescription for how different timescales are to be considered in deciding the choice of stochastic calculus to use.

      With regards to the strength of space-dependent diffusivity in subcellular milieu, various measurements of heterogeneous diffusivity have been made both across different model systems and via different modalities, as cited in our manuscript. (Garner et al. 2023) used single-particle tracking to determine over 100-fold variability in diffusivity within individual S. pombe cells. Single-molecule measurements in (Xiang et al. 2020) and (Śmigiel et al. 2022) reveal an order-of-magnitude variation in tracer diffusion in mammalian cells and multi-fold variation in E. coli cytoplasm respectively. Fluorescence correlation spectroscopy measurements in (Huang et al. 2022) have found a two-fold increase in short-range diffusion of protein-sized tracers in X. laevis extracts. We have also added a reference to a study that uses 3D single particle tracking in the cytosol of a multinucleate fungus, A. gossypii, to identify regions of low-diffusivity near nuclei and hyphal tips (McLaughlin et al. 2020). Many of these references deploy particle tracking and investigate how mesoscale-sized particles (i.e. tracers spanning biologically relevant size scales) are directly impacted by space-dependent diffusivity. Therefore, we base our model on not only space-dependent diffusivity being a well-recognized feature of the cellular interior, but also on these observations pertaining to mesoscale-sized particles’ motion along relevant timescales.

      These measurements are also relevant to the reviewer’s question about the strength of the effect, which depends directly on the variability in diffusivity: for ten- or hundred-fold diffusivity variations, the effect would be expected to be significant. When the Itô convention is used directly, the contrast in concentration gradient is, in fact, that of the diffusivity gradient.

      To conclude, the "diffusive lensing" effect presented here is not a deep physical discovery, but a well-known effect of sticking to the wrong stochastic convention.

      As detailed in the various responses above, we respectfully disagree with the notion that there exists a singular correct stochastic convention that is applicable to all cases of subcellular heterogeneous diffusion. Further, as detailed in (Volpe and Wehr 2016) and in the Appendix, it is possible to convert between conventions: an isothermal-abiding stochastic differential equation may be suitably altered, by means of adding a drift term, into an Itô-abiding stochastic differential equation. Therefore, one can observe diffusive lensing without discarding the isothermal convention, provided the latter is modified. Indeed, it is only the driftless (or canonical) isothermal convention that does not allow for diffusive lensing.
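
      The relation between the two conventions can be made concrete in one dimension. The following is a minimal sketch, assuming a no-flux steady state; the symbols $p(x,t)$ (tracer density), $D(x)$ (diffusivity) and $W_t$ (Wiener process) are notation introduced here for illustration and are not taken from the manuscript.

      ```latex
      % Driftless Ito convention
      dx = \sqrt{2D(x)}\,dW_t
      \quad\Longrightarrow\quad
      \partial_t p = \partial_x^2\!\left[D(x)\,p\right]
      % Zero-flux steady state: \partial_x (D p) = 0, hence
      p_{\mathrm{ss}}(x) \propto \frac{1}{D(x)},
      \qquad
      \frac{p_{\mathrm{ss}}(x_1)}{p_{\mathrm{ss}}(x_2)} = \frac{D(x_2)}{D(x_1)}

      % Driftless isothermal convention
      \partial_t p = \partial_x\!\left[D(x)\,\partial_x p\right]
      \quad\Longrightarrow\quad
      p_{\mathrm{ss}}(x) = \text{const}

      % The isothermal equation written as an Ito equation with an added drift term
      dx = D'(x)\,dt + \sqrt{2D(x)}\,dW_t
      ```

      Under these assumptions, a ten-fold contrast in diffusivity yields a ten-fold contrast in steady-state density for the driftless Itô equation, whereas the driftless isothermal equation yields a uniform density.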

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1: 

      This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration of De Hollander, Forstmann et al. (HBM 2017) of the fact that 3T fMRI imaging (as well as many 7T imaging sequences) does not afford sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

      In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

      The authors make six chief points: 

      (1) There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.

      (2) The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.

      (3) The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.

      (4) The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks. 

      (5) From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task" 

      (6) and further: "suggest the need to ascribe a separate function to these networks." 

      I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

      To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

      First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions. 

      - There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:

      Ray et al., NeuroImage 2012
      Alegre et al., Experimental Brain Research 2013
      Benis et al., NeuroImage 2014
      Wessel et al., Movement Disorders 2016
      Benis et al., Cortex 2016
      Fischer et al., eLife 2017
      Ghahremani et al., Brain and Language 2018
      Chen et al., Neuron 2020
      Mosher et al., Neuron 2021
      Diesburg et al., eLife 2021

      - Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete: 

      Van den Wildenberg et al., JoCN 2006
      Ray et al., Neuropsychologia 2009
      Hershey et al., Brain 2010
      Swann et al., JNeuro 2011
      Mirabella et al., Cerebral Cortex 2012
      Obeso et al., Exp. Brain Res. 2013
      Georgiev et al., Exp Br Res 2016
      Lofredi et al., Brain 2021
      van den Wildenberg et al, Behav Brain Res 2021
      Wessel et al., Current Biology 2022

      - Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.: 

      Eagle et al., Cerebral Cortex 2008
      Schmidt et al., Nature Neuroscience 2013
      Fife et al., eLife 2017
      Anderson et al., Brain Res 2020

      Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism. 

      Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) sub-ideal imaging parameters. There are myriad explanations for why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). But essentially, this paper shows that a specific lens into subcortical activity is likely broken, but then also suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

      Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021). 

      Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstantiation of inhibition during the response via the hyperdirect pathway on SS/FS trials, reinstantiation of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on a level of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar. 
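
      The temporal-resolution argument above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the gamma-shaped haemodynamic response (shape 6, scale 1 s), the 200 ms onset difference, and all function names are illustrative choices made here, not parameters or code from the study under review.

      ```python
      import math

      def gamma_hrf(t, shape=6.0, scale=1.0):
          """Gamma-shaped haemodynamic response (assumed form; peaks 5 s after the event)."""
          if t <= 0.0:
              return 0.0
          return t ** (shape - 1.0) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)

      def bold_timecourse(onset_s, n_samples=600, dt=0.05):
          """Predicted BOLD signal for one brief neural event starting at onset_s (seconds)."""
          return [gamma_hrf(i * dt - onset_s) for i in range(n_samples)]

      def pearson(a, b):
          """Pearson correlation between two equally long sequences."""
          n = len(a)
          mean_a, mean_b = sum(a) / n, sum(b) / n
          cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
          var_a = sum((x - mean_a) ** 2 for x in a)
          var_b = sum((y - mean_b) ** 2 for y in b)
          return cov / math.sqrt(var_a * var_b)

      # Two hypothetical events whose onsets differ by 200 ms (an assumed value).
      early = bold_timecourse(onset_s=0.0)
      late = bold_timecourse(onset_s=0.2)
      similarity = pearson(early, late)
      print(f"correlation between predicted BOLD responses: {similarity:.4f}")
      ```

      With these assumed parameters the two predicted time courses are almost perfectly correlated, which is the sense in which events separated by a few hundred milliseconds are expected to look alike to fMRI.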

      Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials. 

      In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappreciation of the subcortex's role in stopping, which is not chiefly based on fMRI evidence. 

      We would like to thank reviewer 1 for their insightful and helpful comments. We have responded point-by-point below and will give an overview of how we reframed the paper here.  

      We agree that there is good evidence from other sources for the presence of the canonical stopping network (indirect and hyperdirect) during action cancellation, and that this should be reflected more in the paper. However, we do not believe that a lack of evidence for this network during the SST makes fMRI ill-suited for studying this task, or other tasks that have neural processes occurring in quick succession. What we believe the activation patterns of fMRI reflect during this task is the large amount of activation caused by failed stops. That is, the role of the STN in error processing may be more pronounced than its role in action cancellation. Due to the replicability of fMRI results, especially at higher field strengths, we believe the activation profile of failed stop trials reflects a paramount role for the STN in error processing. Therefore, while we agree we do not provide evidence against the role of the STN in action cancellation, we do provide evidence that our outlook on subcortical activation during different trial types of this task should be revisited. We have reframed the article to reflect this, and discuss points such as fMRI reliability, validity and the complex overlapping of cognitive processes in the SST in the discussion. Please see all changes to the article indicated by red text.

      A few other points: 

      - As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.

      Unfortunately, this study already comprises the only open-access 7T datasets available for the SST. Therefore, unless we combined only the deHollander_7T and Miletic_7T subsamples, there is no additional analysis we can do for this right now. While looking at just the subsamples that were 7T and had >300 trials would be interesting, based on the new framing of the paper we do not believe it adds to the study, as the subsamples still lack the temporal resolution seemingly required for looking at the processes in the SST.

      - What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal? 

      SS and FS trials were time-locked to the GO signal, as this is standard practice. The main reason for this is that we use contrasts to interpret differences in activation patterns between conditions. By time-locking the FS and SS trials to the stop signal, we would be contrasting events at different time points, and therefore different stages of processing, which introduces its own sources of error. We agree with the reviewer, however, that a separate analysis with time-locking on the stop signal has its own merit, and now include results in the supplementary material where the FS and SS trials are time-locked to the stop signal as well.

      - Why was SSRT calculated using the outdated mean method? 

      We originally calculated SSRT using the mean method as this was how it was reported in the oldest of the aggregated studies. We have now re-calculated the SSRTs using the integration method with go omission replacement and thank the reviewer for pointing this out. Please see response to comment 3.

      - The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error. 

      We have now used minimum FDR-corrected thresholds for each contrast, instead of using a blanket conservative threshold of 3.1 over all contrasts. The new thresholds for each contrast are shown in the text. Please see below (page 12):

      “The thresholds for each contrast are as follows: 3.01 for FS > GO, 2.26 for FS > SS and 3.1 for SS > GO.”
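
      For readers unfamiliar with how a per-contrast z threshold can follow from FDR control, here is a minimal sketch. It assumes the Benjamini-Hochberg procedure with one-sided p-values and q = 0.05; the response does not state which FDR procedure or software was used, and the function names and example values below are hypothetical.

      ```python
      import math

      def z_to_p(z):
          """One-sided p-value of a z-score under the standard normal distribution."""
          return 0.5 * math.erfc(z / math.sqrt(2.0))

      def fdr_z_threshold(z_values, q=0.05):
          """Smallest z that survives Benjamini-Hochberg FDR control at level q.

          Returns None when no value survives. Assumes one-sided tests.
          """
          m = len(z_values)
          ranked = sorted(z_values, reverse=True)  # largest z has the smallest p
          threshold = None
          for k, z in enumerate(ranked, start=1):
              # BH keeps every test up to the largest rank k with p_(k) <= q * k / m
              if z_to_p(z) <= q * k / m:
                  threshold = z
          return threshold

      # Hypothetical z-values standing in for one contrast map.
      example_map = [4.0, 3.5, 3.0, 1.0, 0.5, 0.2, 0.1, 0.0, -0.5, -1.0]
      print(fdr_z_threshold(example_map))
      ```

      Because the cut-off depends on the distribution of values in each map, different contrasts end up with different thresholds, as in the quoted values.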

      - The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are. 

      We thank reviewer 1 for their detailed and thorough evaluation of our paper. Overall, we agree that there is substantial direct and indirect evidence for the involvement of the cortico-basal-ganglia pathways in response inhibition. We have taken the vast constructive criticism on board and agree with the reviewer that the paper should be reframed. We would like to thank the reviewer for the thoroughness of their helpful comments aiding the revising of the paper.

      (1) I would suggest reframing the study, abstract, discussion, and title to reflect the fact that the study shows that fMRI is unsuitable to study subcortical activity in the SST, rather than the fact that we need to question the subcortical model of inhibition, given the reasons in my public review.

      We agree with the reviewer that the article should be reframed and not taken as direct evidence against the large sum of literature pointing towards the involvement of the cortico-basal-ganglia pathway in response inhibition. We have significantly rewritten the article in light of this.

      (2) I suggest combining the datasets that provide the best imaging parameters and then analyzing the subcortical ROIs with a more lenient threshold and with regressors time-locked to the stop-signals (if that's not already the case). This would make the claim of a null finding much more impactful. Some sort of power analysis and/or Bayes factor analysis of evidence for the null would also be appreciated. 

      Instead of using a blanket conservative threshold of 3.1, we used only FDR-corrected thresholds. The threshold level is therefore different for each contrast and noted in the figures. We have also added supplementary figures including the group-level SPMs and ROI analyses when the FS and SS trials were time-locked to the stop signal instead of the GO signal (Supplementary Figs 4 & 5). But as mentioned above, due to the difference in time points when contrasting, we believe that time-locking to the GO signal for all trial types makes more sense for the main analysis.

      We have now also computed BFs on the first-level ROI beta estimates for all contrasts using the BayesFactor package as implemented in R. We added the following section to the methods and updated the results section accordingly (page 8):

      “In addition to the frequentist analysis we also opted to compute Bayes Factors (BFs) for each contrast per ROI per hemisphere. To do this, we extracted the beta weights for each individual trial type from our first level model. We then compared the beta weights from each trial type to one another using the ‘BayesFactor’ package as implemented in R (Morey & Rouder, 2015). We compared the full model comprising trial type, dataset and subject as predictors to the null model comprising only the dataset and subject as predictors. The datasets and subjects were modeled as random factors. We divided the resultant BFs from the full model by the null model to provide evidence for or against a significant difference in beta weights for each trial type. To interpret the BFs, we used a modified version of Jeffreys’ scale (Jeffreys, 1939; Lee & Wagenmakers, 2014).”

      (3) I suggest calculating SSRT using the integration method with the replacement of Go omissions, as per the most recent recommendation (Verbruggen et al., eLife 2019).

      We agree we should have used a more optimal method for SSRT estimation. We have replaced our original estimates with those from the integration method with replacement of go omissions, as suggested, and adapted the results in Table 3.

      We have also replaced text in the methods sections to reflect this (page 5):

      “For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984).”

      Now reads:

      “For each participant, the SSRT was calculated using the integration method with replacement of go omissions (Verbruggen et al., 2019), estimated by integrating the RT distribution and calculating the point at which the integral equals p(respond|signal). The completion time of the stop process aligns with the nth RT, where n equals the number of RTs in the RT distribution of go trials multiplied by the probability of responding to a signal.”
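
      The integration method described in the revised text can be sketched as follows. This is a minimal illustration, assuming that go omissions are replaced by the participant's slowest go RT and that n is rounded up to the next whole trial; the function and argument names are hypothetical and this is not the authors' analysis code.

      ```python
      import math

      def ssrt_integration(go_rts, n_go_omissions, p_respond_given_signal, mean_ssd):
          """Estimate SSRT (same time unit as the inputs) with the integration method.

          go_rts: RTs of go trials with a response.
          n_go_omissions: number of go trials without a response.
          p_respond_given_signal: proportion of stop trials with a response.
          mean_ssd: mean stop-signal delay.
          """
          if not go_rts:
              raise ValueError("at least one go RT is required")
          # Assumption: each go omission is replaced by the slowest observed go RT.
          rts = sorted(go_rts) + [max(go_rts)] * n_go_omissions
          # nth RT, with n = number of go RTs multiplied by p(respond|signal).
          # Assumption: n is rounded up and clamped to a valid rank.
          n = math.ceil(len(rts) * p_respond_given_signal)
          n = min(max(n, 1), len(rts))
          nth_rt = rts[n - 1]
          return nth_rt - mean_ssd

      # Hypothetical participant: 8 go RTs (ms), 2 go omissions, p(respond|signal) = 0.5.
      example_ssrt = ssrt_integration(
          go_rts=[400, 450, 500, 550, 600, 650, 700, 750],
          n_go_omissions=2,
          p_respond_given_signal=0.5,
          mean_ssd=250,
      )
      print(example_ssrt)
      ```

      In this made-up example the padded distribution has ten RTs, the fifth is 600 ms, and subtracting the mean SSD of 250 ms gives an SSRT of 350 ms.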

      Reviewer #2:

      This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPe, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed. 

      As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the author endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary. 

      I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method. 

      I found the paper clearly written and empirically strong. As I mentioned in the public review, I believe that the main shortcoming is the lack of theoretical synthesis. I would encourage the authors to attempt to synthesize these results into some form of theoretical explanation. I would also encourage replacing the mean method with the integration with replacement method for computing SSRT. I also have the following specific comments and suggestions (in the approximate order in which they appear in the manuscript) that I hope can improve the manuscript: 

      We would like to thank reviewer 2 for their insightful and interesting comments. We have adapted our paper to reflect these comments. Please see direct responses to your comments below. We agree with the reviewer that some type of theoretical synthesis would help with the interpretability of the article. We have substantially reworked the discussion and included theoretical considerations behind the newer narrative. Please see all changes to the article indicated by red text.

      (1) The authors say "performance on successful stop trials is quantified by the stop signal reaction time". I don't think this is technically accurate. SSRT is a measure of the average latency of the stop process for all trials, not just for the trials in which subjects successfully stop. 

      Thank you for pointing out this technically incorrect statement. We have replaced the above sentence with the following (page 1):

      “Inhibition performance in the SST as a whole is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).”

      (2) The authors say "few studies have detected differences in the BOLD response between FS and SS trials", but then do not cite any papers that detected differences until several sentences later (de Hollander et al., 2017; Isherwood et al., 2023; Miletic et al., 2020). If these are the only ones, and they only show greater FS than SS, then I think this point could be made more clearly and directly. 

      We have moved the citations to the correct place in the text to be clearer. We have also rephrased this part of the introduction to make the points more direct (page 2).

      “In the subcortex, functional evidence is relatively inconsistent. Some studies have found an increase in BOLD response in the STN in SS > GO contrasts (Aron & Poldrack, 2006; Coxon et al., 2016; Gaillard et al., 2020; Yoon et al., 2019), but others have failed to replicate this (Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; B. Xu et al., 2015). Moreover, some studies have actually found higher STN, SN and thalamic activation in failed stop trials, not successful ones (de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).”

      (3) Unless I overlooked it, I don't believe that the author specified the criterion that any given subject is excluded based upon. Given some studies have significant exclusions (e.g., Poldrack_3T), I think being clear about how many subjects violated each criterion would be useful. 

      This is indeed interesting and important information to include. We have added the number of participants who were excluded for each criterion. Please see added text below (page 4):

      “Based on these criteria, no subjects were excluded from the Aron_3T dataset. 24 subjects were excluded from the Poldrack_3T dataset (3 based on criterion 1, 9 on criterion 2, 11 on criterion 3, and 8 on criterion 4). Three subjects were excluded from the deHollander_7T dataset (2 based on criterion 1 and 1 on criterion 2). Five subjects were excluded from the Isherwood_7T dataset (2 based on criterion 1, 1 on criterion 2, and 2 on criterion 4). Two subjects were excluded from the Miletic_7T dataset (1 based on criterion 2 and 1 on criterion 4). Note that some participants in the Poldrack_3T study failed to meet multiple inclusion criteria.”

      (4) The Method section included very exhaustive descriptions of the neuroimaging processing pipeline, which was appreciated. However, it seems that much of what is presented is not actually used in any of the analyses. For example, it seems that "functional data preprocessing" section may be fMRIPrep boilerplate, which again is fine, but I think it would help to clarify that much of the preprocessing was not used in any part of the analysis pipeline for any results. For example, at first blush, I thought the authors were using global signal regression, but after a more careful examination, I believe that they are only computing global signals but never using them. Similarly with tCompCor seemingly being computed but not used. If possible, I would recommend that the authors share code that instantiates their behavioral and neuroimaging analysis pipeline so that any confusion about what was actually done could be programmatically verified. At a minimum, I would recommend more clearly distinguishing the pipeline steps that actually went into any presented analyses.

      We thank the reviewer for finding this inconsistency. The methods section indeed uses the fMRIPrep boilerplate text, which we included so as to be as accurate as possible when describing the preprocessing steps taken. While we believe leaving the exact boilerplate text that fMRIPrep gives us is the most accurate method to show our preprocessing, we have adapted some of the text to clarify which computations were not used in the subsequent analysis. As a side note, for future reference, we’d like to add that the fMRIPrep authors expressly recommend that users report the boilerplate completely and unaltered, and as such, we believe this may become a recurring issue (page 7).

      “While many regressors were computed in the preprocessing of the fMRI data, not all were used in the subsequent analysis. The exact regressors used for the analysis can be found above. For example, tCompCor and global signals were calculated in our generic preprocessing pipeline but not part of the analysis. The code used for preprocessing and analysis can be found in the data and code availability statement.”

      (5) What does it mean for the Poldrack_3T to have N/A for SSD range? Please clarify. 

      Thank you for pointing out this omission. We had not yet found the possible SSD range for this study. We have replaced this value with the correct value (0 – 1000 ms).

      (6) The SSD range of 0-2000ms for deHollander_7T and Miletic_7T seems very high. Was this limit ever reached or even approached? SSD distributions could be a useful addition to the supplement. 

      Thank you for also bringing this mistake to light. We had accidentally placed the max trial duration in these fields instead of the max allowable SSD value. We have replaced it with the correct value (0 – 900 ms).

      (7) The author says "In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984)." However, the independent race model assumes context independence (the finishing time of the go process is not affected by the presence of the stop process) and stochastic independence (the duration of the go and stop processes are independent on a given trial). This analysis does not seem to evaluate either of these forms of independence, as it correlates RT and SSRT across subjects, so it was unclear how this analysis evaluated either of the types of independence that are assumed by the independent race model. Please clarify or remove. 

      Thank you for this comment. We realize that this analysis indeed does not evaluate either context or stochastic independence and therefore we have removed this from the manuscript.

      (8) The RTs in Isherwood_7T are considerably slower than the other studies, even though the go stimulus+response is the same (very simple) stimulus-response mapping from arrows to button presses. Is there any difference in procedure or stimuli that might explain this difference? It is the only study with a visual stop signal, but to my knowledge, there is no work suggesting visual stop signals encourage more proactive slowing. If possible, I think a brief discussion of the unusually slow RTs in Isherwood_7T would be useful. 

      We have included the following text in the manuscript to reflect this observed difference in RT between the Isherwood_7T dataset and the other datasets (page 9).

      “Longer RTs were found in the Isherwood_7T dataset in comparison to the four other datasets. The only difference in procedure in the Isherwood_7T dataset is the use of a visual stop signal as opposed to an auditory stop signal. This RT difference is consistent with previous research, where auditory stop signals and visual go stimuli have been associated with faster RTs compared to unimodal visual presentation (Carrillo-de-la-Peña et al., 2019; Weber et al., 2024). The mean SSRTs and probability of stopping are within normal range, indicating that participants understood the task and responded in the expected manner.”

      (9) When the authors included both 3T and 7T data, I thought they were preparing to evaluate the effect of magnet strength on stop networks, but they didn't do this analysis. Is this because the authors believe there is insufficient power? It seems that this could be an interesting exploratory analysis that could improve the paper.

      We thank the reviewer for this interesting comment. As our dataset sample contains only two 3T and three 7T datasets we indeed believe there is insufficient power to warrant such an analysis. In addition, we wanted the focus of this paper to be how fMRI examines the SST in general, and not differences between acquisition methods. With a greater number of datasets with different imaging parameters (especially TE or resolution) in addition to field strength, we agree such an analysis would be interesting, although beyond the scope of this article.

      (10) The authors evaluate smoothing and it seems that the conclusion that they want to come to is that with a larger smoothing kernel, the results in the stop networks bleed into surrounding areas, producing false positive activity. However, in the absence of a ground truth of the true contributions of these areas, it seems that an alternative interpretation of the results is that the denser maps when using a larger smoothing kernel could be closer to "true" activation, with the maps using a smaller smoothing kernel missing some true activity. It seems worth entertaining these two possible interpretations for the smoothing results unless there is clear reason to conclude that the smoothed results are producing false positive activity. 

      We agree with the view of the reviewer on the interpretation of the smoothing results. We indeed cannot rule this out as a possible interpretation of the results, due to a lack of ground truth. We have added text to the article to reflect this view and discuss the types of errors we can expect for both smaller and larger smoothing kernels (page 15).

      “In the absence of a ground truth, we are not able to fully justify the use of either larger or smaller kernels to analyse such data. On the one hand, aberrantly large smoothing kernels could lead to false positives in activation profiles, due to bleeding of observed activation into surrounding tissues. On the other hand, too little smoothing could lead to false negatives, missing some true activity in surrounding regions. While we cannot concretely validate either choice, it should be noted that there is lower spatial uncertainty in the subcortex compared to the cortex, due to the lower anatomical variability. False positives from smoothing spatially unmatched signal are more likely than false negatives. It may be more prudent for studies to use a range of smoothing kernels, to assess the robustness of their fMRI activation profiles.”
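      The trade-off described in the quoted passage can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration (a 1 mm voxel size, a five-voxel "active" patch, kernels of 1.5 mm and 5 mm FWHM, a threshold of 0.1); it is not the pipeline used in the study. The only fixed fact is the standard relation FWHM = 2·sqrt(2·ln 2)·sigma for a Gaussian kernel.

```python
import numpy as np

def fwhm_to_sigma(fwhm_mm, voxel_mm):
    # Standard Gaussian relation: FWHM = 2 * sqrt(2 * ln 2) * sigma.
    return fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0))) / voxel_mm

def gaussian_smooth_1d(signal, fwhm_mm, voxel_mm):
    # Build a normalised Gaussian kernel truncated at 4 sigma and convolve.
    sigma = fwhm_to_sigma(fwhm_mm, voxel_mm)
    radius = int(np.ceil(4 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")

# Hypothetical "true" activation: five active voxels in a line of 101.
signal = np.zeros(101)
signal[48:53] = 1.0

small = gaussian_smooth_1d(signal, fwhm_mm=1.5, voxel_mm=1.0)
large = gaussian_smooth_1d(signal, fwhm_mm=5.0, voxel_mm=1.0)

# The larger kernel lowers the peak and spreads signal into neighbours.
print("peak (small, large):", small.max(), large.max())
print("voxels above 0.1 (small, large):", (small > 0.1).sum(), (large > 0.1).sum())
```

      In this toy setting the larger kernel spreads supra-threshold signal into more voxels while reducing the peak, which is the "bleeding" the passage refers to; whether that spread counts as a false positive depends on the unknown ground truth.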

    1. Author response:

      The following is the authors’ response to the original reviews.

      General response:

      We thank all the reviewers for their detailed reviews.

      All reviewers made a number of valuable comments, in particular by highlighting several points that would benefit from additional clarifications and discussion. We really appreciate the time and effort that went into the reviews. We have updated the paper to reflect the changes we have made in response to the reviewers' comments (largely by including more discussion regarding the model limitations and the effect of various modeling choices). We have also included several new supplementary figures (S7, S8, S9, S10) that provide further details of the model behavior, and show the effect of changing some of the terms in the cost. Below, we go through the individual comments, and highlight the places in which we have made changes to address the reviewers’ comments.

      Reviewer 1:

      Thank you for your review and pointing out multiple things to be discussed and clarified! Below, we go through the various limitations you pointed out and refer to the places where we have tried to address them.

      (1) It's important to keep in mind that this work involves simplified models of the motor system, and often the terminology for 'motor cortex' and 'models of motor cortex' are used interchangeably, which may mislead some readers. Similarly, the introduction fails in many cases to state what model system is being discussed (e.g. line 14, line 29, line 31), even though these span humans, monkeys, mice, and simulations, which all differ in crucial ways that cannot always be lumped together.

      That is a good point. We have clarified this in the text (Introduction and Discussion), to highlight the fact that our model isn’t necessarily meant to just capture M1. We have also updated the introduction to make it more clear which species the experiments which motivate our investigation were performed in.

      (2) At multiple points in the manuscript thalamic inputs during movement (in mice) are used as a motivation for examining the role of preparation. However, there are other more salient motivations, such as delayed sensory feedback from the limb and vision arriving in the motor cortex, as well as ongoing control signals from other areas such as the premotor cortex.

      Yes – the motivation for thalamic inputs came from the fact that those have specifically been shown to be necessary for accurate movement generation in mice. However, it is true that the inputs in our model are meant to capture any signals external to the dynamical system modeled, and as such are likely to represent a mixture of sensory signals, and feedback from other areas. We have clarified this in the Discussion, and have added this additional motivation in the Introduction.

      (3) Describing the main task in this work as a delayed reaching task is not justified without caveats (by the authors' own admission: line 687), since each network is optimized with a fixed delay period length. Although this is mentioned to the reader, it's not clear enough that the dynamics observed during the delay period will not resemble those in the motor cortex for typical delayed reaching tasks.

      Yes, we completely agree that the terminology might be confusing. While the task we are modeling is a delayed reaching task, it does differ from the usual setting since the network has knowledge of the delay period, and that is indeed a caveat of the model. We have added a brief paragraph just after the description of the optimal control objective to highlight this limitation.

      We have also performed additional simulations using two different variants of a model-predictive control approach that allow us to relax the assumption that the go-cue time is known in advance. We show that these modifications of the optimal controller yield results that remain consistent with our main conclusions, and can in fact in some settings lead to preparatory activity plateaus during the preparation epoch as often found in monkey M1 (e.g in Elsayed et al. 2016). We have modified the Discussion to explain these results and their limitations, which are summarized in a new Supplementary Figure (S9).

      (4) A number of simplifications in the model may have crucial consequences for interpretation.

      a) Even following the toy examples in Figure 4, all the models in Figure 5 are linear, which may limit the generalisability of the findings.

      While we agree that linear models may be too simplistic, many prior analyses of M1 data suggest that they are often good enough to capture key aspects of M1 dynamics; for example, the generative model underlying jPCA is linear, and Sussillo et al. (2015) showed that the internal activity of nonlinear RNN models trained to reproduce EMG data aligned best with M1 activity when heavily regularized; in this regime, the RNN dynamics were close to linear. Nevertheless, this linearity assumption is indeed convenient from a modeling viewpoint: the optimal control problem is more easily solved for linear network dynamics and the optimal trajectories are more consistent across networks. Indeed, we had originally attempted to perform the analyses of Figure 5 in the nonlinear setting, but found that while the results were overall similar to what we report in the linear regime, iLQR was occasionally trapped in local minima, resulting in more variable results, especially for inhibition-stabilized networks at the strongly connected end of the spectrum. Finally, Figure 5 is primarily meant to explore to what extent motor preparation can be predicted from basic linear control-theoretic properties of the Jacobian of the dynamics; in this regard, it made sense to work with linear RNNs (for which the Jacobian is constant).
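      The remark that the Jacobian is constant for a linear RNN can be checked with a small numerical sketch. The network form used here (dx/dt = -x + W·phi(x) with a random W) is a hypothetical stand-in chosen for illustration, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Hypothetical random connectivity, for illustration only.
W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))

def linear_rhs(x):
    # Linear dynamics: dx/dt = -x + W x
    return -x + W @ x

def nonlinear_rhs(x):
    # Nonlinear dynamics: dx/dt = -x + W tanh(x)
    return -x + W @ np.tanh(x)

def numerical_jacobian(f, x, eps=1e-6):
    # Central finite differences, one state dimension at a time.
    J = np.zeros((x.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

x_a = np.zeros(n)        # two different operating points
x_b = np.full(n, 2.0)

J_lin_a = numerical_jacobian(linear_rhs, x_a)
J_lin_b = numerical_jacobian(linear_rhs, x_b)
J_nl_a = numerical_jacobian(nonlinear_rhs, x_a)
J_nl_b = numerical_jacobian(nonlinear_rhs, x_b)

print("linear Jacobian identical at both points:", np.allclose(J_lin_a, J_lin_b, atol=1e-6))
print("nonlinear Jacobian identical at both points:", np.allclose(J_nl_a, J_nl_b, atol=1e-3))
```

      For the linear system the Jacobian equals -I + W at every state, so properties derived from it apply along the whole trajectory; for the nonlinear system they would have to be re-evaluated at each operating point.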

      b) Crucially, there is no delayed sensory feedback in the model from the plant. Although this simplification is in some ways a strength, this decision allows networks to avoid having to deal with delayed feedback, which is a known component of closed-loop motor control and of motor cortex inputs and will have a large impact on the control policy.

      This comment resonates well with Reviewer 3's remark regarding the autonomous nature (or not) of M1 during movement. Rather than thinking of our RNN models as anatomically confined models of M1 alone, we think of them as models of the dynamics which M1 implements possibly as part of a broader network involving “inter-area loops and (at some latency) sensory feedback”, and whose state appears to be near-fully decodable from M1 activity alone. We have added a paragraph of Discussion on this important point.

      (5) A key feature determining the usefulness of preparation is the direction of the readout dimension. However, all readouts had a similar structure (random Gaussian initialization). Therefore, it would be useful to have more discussion regarding how the structure of the output connectivity would affect preparation, since the motor cortex certainly does not follow this output scheme.

      We agree with this limitation of our model — indeed one key message of Figure 4 is that the degree of reliance on preparatory inputs depends strongly on how the dynamics align with the readout. However, this strong dependence is somewhat specific to low-dimensional models; in higher-dimensional models (most of our paper), one expects that any random readout matrix C will pick out activity dimensions in the RNN that are sufficiently aligned with the most controllable directions of the dynamics to encourage preparation.

      We did consider optimizing C away (which required differentiating through the iLQR optimizer, which is possible but very costly), but the question inevitably arises what exactly should C be optimized for, and under what constraints (e.g fixed norm or not). One possibility is to optimize C with respect to the same control objective that the control inputs are optimized for, and constrain its norm (otherwise, inputs to the M1 model, and its internal activity, could become arbitrarily small as C can grow to compensate). We performed this experiment (new Supplementary Figure S7) and obtained a similar preparation index; there was one notable difference, namely that the optimized readout modes led to greater observability compared to a random readout; thus, the same amount of “muscle energy” required for a given movement could now be produced by a smaller initial condition. In turn, this led to smaller control inputs, consistent with a lower control cost overall.

      Whilst we could have systematically optimized C away, we reasoned that (i) it is computationally expensive, and (ii) the way M1 affects downstream effectors is presumably “optimized” for much richer motor tasks than simple 2D reaching, such that optimizing C for a fixed set of simple reaches could lead to misleading conclusions. We therefore decided to stick with random readouts.
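      One way to make the link between the readout and "observability" concrete is the observability Gramian of a linear system. The sketch below assumes continuous-time dynamics dx/dt = A x with output y = C x and a stable A; the matrices are random placeholders, and the authors' exact definitions may differ. The Gramian Q solves A^T Q + Q A + C^T C = 0, and x0^T Q x0 is the total output energy evoked by an initial condition x0 in the absence of further input.

```python
import numpy as np

def observability_gramian(A, C):
    """Solve A^T Q + Q A + C^T C = 0 for Q (A is assumed stable)."""
    n = A.shape[0]
    I = np.eye(n)
    # Vectorised Lyapunov equation, solved as a dense linear system.
    K = np.kron(I, A.T) + np.kron(A.T, I)
    q = np.linalg.solve(K, -(C.T @ C).reshape(-1))
    return q.reshape(n, n)

rng = np.random.default_rng(0)
n, n_out = 6, 2
# Hypothetical stable dynamics and random Gaussian readout.
A = -np.eye(n) + 0.3 * rng.normal(size=(n, n)) / np.sqrt(n)
C = rng.normal(size=(n_out, n)) / np.sqrt(n)

Q = observability_gramian(A, C)

# Output energy evoked by an initial condition x0 with no further input.
x0 = rng.normal(size=n)
output_energy = x0 @ Q @ x0
print("output energy from x0:", output_energy)
```

      Under these assumptions, a readout whose Gramian has larger eigenvalues along the directions used for movement yields the same output energy from a smaller initial condition, which matches the qualitative effect described for the optimized readout.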

      Additional comments:

      (1) The choice of cost function seems very important. Is it? For example, penalising the square of u(t) may produce very different results than penalising the absolute value.

      Yes, the choice of cost function does affect the results, at least qualitatively. The absolute value of the inputs is a challenging cost to use, as iLQR relies on a local quadratic approximation of the cost function. However, we have included additional experiments in which we penalized the squared derivative of the inputs (Supplementary Figure S8; see also our response to Reviewer 3's suggestion on this topic), and we do see differences in the qualitative behavior of the model (though the main takeaway, i.e. the reliance on preparation, continues to hold). This is now referred to and discussed in the Discussion section.

      (2) In future work it would be useful to consider the role of spinal networks, which are known to contribute to preparation in some cases (e.g. Prut and Fetz, 1999).

      (3) The control signal magnitude is penalised, but not the output torque magnitude, which highlights the fact that control in the model is quite different from muscle control, where co-contraction would be a possibility and therefore a penalty of muscle activation would be necessary. Future work should consider the role of these differences in control policy.

      Thank you for pointing us to this reference! Regarding both of these concerns, we agree that the model could be greatly improved and made more realistic in future work (another avenue for this would be to consider a more realistic biophysical model, e.g. using the MotorNet library). We hope that the current Discussion, which highlights the various limitations of our modeling choices, makes it clear that a lot of these choices could easily be modified depending on the specific assumptions/investigation being performed.

      Reviewer 2:

      Thank you for your positive review! We very much agree with the limitations you pointed out, some of which overlapped with the comments of the other reviewers. We have done our best to address them through additional discussion and new supplementary figures. We briefly highlight below where those changes can be found.

      (1) Though the optimal control theory framework is ideal to determine inputs that minimize output error while regularizing the input norm, it however cannot easily account for some other varied types of objectives especially those that may lead to a complex optimization landscape. For instance, the reusability of parts of the circuit, sparse use of additional neurons when learning many movements, and ease of planning (especially under uncertainty about when to start the movement), may be alternative or additional reasons that could help explain the preparatory activity observed in the brain. It is interesting to note that inputs that optimize the objective chosen by the authors arguably lead to a trade-off in terms of other desirable objectives. Specifically, the inputs the authors derive are time-dependent, so a recurrent network would be needed to produce them and it may not be easy to interpolate between them to drive new movement variants. In addition, these inputs depend on the desired time of output and therefore make it difficult to plan, e.g. in circumstances when timing should be decided depending on sensory signals. Finally, these inputs are specific to the full movement chain that will unfold, so they do not permit reuse of the inputs e.g. in movement sequences of different orders.

      Yes, that is a good point! We have incorporated further Discussion related to this point. We have additionally included a new example in which we regularize the temporal complexity of the inputs (see also our response to Reviewer 3's suggestion on this topic), which leads to more slowly varying inputs, and may indeed represent a more realistic constraint and lead to simpler inputs that can more easily be interpolated between. We also agree that uncertainty about the upcoming go cue may play an important role in the strategy adopted by the animals. While we have not performed an extensive investigation of the topic, we have included a Supplementary Figure (S9) in which we used Model Predictive Control to investigate the effect of planning under uncertainty about the go cue arrival time. We hope that this will give the reader a better sense of what sort of model extensions are possible within our framework.

      (2) Relatedly, if the motor circuits were to balance different types of objectives, the activity and inputs occurring before each movement may be broken down into different categories that may each specialize into one objective. For instance, previous work (Kaufman et al. eNeuron 2016, Inagaki et al., Cell 2022, Zimnik and Churchland, Nature Neuroscience 2021) has suggested that inputs occurring before the movement could be broken down into preparatory inputs 'stricto sensu' - relating to the planned characteristics of the movement - and a trigger signal, relating to the transition from planning to execution - irrespective of whether the movement is internally timed or triggered by an external event. The current work does not address which type(s) of early input may be labeled as 'preparatory' or may be thought of as a part of 'planning' computations.

      Yes, our model does indeed treat inputs in a very general way, and does not distinguish between the different types of processes they may be composed of. This is partly because we do not explicitly model where the inputs come from, such that our inputs likely encompass multiple processes. We have added discussion related to this point.

      (3) While the authors rightly point out some similarities between the inputs that they derive and observed preparatory activity in the brain, notably during motor sequences, there are also some differences. For instance, while both the derived inputs and the data show two peaks during sequences, the data reproduced from Zimnik and Churchland show preparatory inputs that have a very asymmetric shape that really plummets before the start of the next movement, whereas the derived inputs have larger amplitude during the movement period - especially for the second movement of the sequence. In addition, the data show trigger-like signals before each of the two reaches. Finally, while the data show a very high correlation between the pattern of preparatory activity of the second reach in the double reach and compound reach conditions, the derived inputs appear to be more different between the two conditions. Note that the data would be consistent with separate planning of the two reaches even in the compound reach condition, as well as the re-use of the preparatory input between the compound and double reach conditions. Therefore, different motor sequence datasets - notably, those that would show even more coarticulation between submovements - may be more promising to find a tight match between the data and the authors' inputs. Further analyses in these datasets could help determine whether the coarticulation could be due to simple filtering by the circuits and muscles downstream of M1, planning of movements with adjusted curvature to mitigate the work performed by the muscles while permitting some amount of re-use across different sequences, or - as suggested by the authors - inputs fully tailored to one specific movement sequence that maximize accuracy and minimize the M1 input magnitude.

      Regarding the exact shape of the occupancy plots, it is important to note that some of the more qualitative aspects (e.g the relative height of the two peaks) will change if we change the parameters of the cost function. Right now, we have chosen the parameters to ensure that both reaches would be performed at roughly the same speed (as a way to very loosely constrain the parameters based on the observed behavior). However, small changes to the hyperparameters can lead to changes in the model output (e.g one of the two consecutive reaches being performed using greater acceleration than the other), and since our biophysical model is fairly simple, changes in the behavior are directly reflected in the network activity. Essentially, what this means is that while the double occupancy is a consistent feature of the model, the exact shape of the peaks is more sensitive to hyperparameters, and we do not wish to draw any strong conclusions from them, given the simplicity of the biophysical model. However, we do agree that our model exhibits some differences with the data. As discussed above, we have included additional discussion regarding the potential existence of separate inputs for planning vs triggering the movement in the context of single reaches.

      Overall, we are excited about the suggestions made by the Reviewer here about using our approach to analyze other motor sequence datasets, but we think that in order to do this properly, one would need to adopt a more realistic musculo-skeletal model (such as one provided by MotorNet).

      (4) Though iLQR is a powerful optimization method to find inputs optimizing the authors' cost function, it also has some limitations. First, given that it relies on a linearization of the dynamics at each timestep, it has a limited ability to leverage potential advantages of nonlinearities in the dynamics. Second, the iLQR algorithm is not a biologically plausible learning rule and therefore it might be difficult for the brain to learn to produce the inputs that it finds. It remains unclear whether using alternative algorithms with different limitations - for instance, using variants of BPTT to train a separate RNN to produce the inputs in question - could impact some of the results.

      We agree that our choice of iLQR has limitations: while it offers the advantage of convergence guarantees, it does indeed restrict the choice of cost function and dynamics that we can use. We have now included extensive discussion of how the modeling choices affect our results.

      We do not view the lack of biological plausibility of iLQR as an issue, as the results are agnostic to the algorithm used for optimization. However, we agree that any structure imposed on the inputs (e.g by enforcing them to be the output of a self-contained dynamical system) would likely alter the results. A potentially interesting extension of our model would be to do just what the reviewer suggested, and try to learn a network that can generate the optimal inputs. However, this is outside the scope of our investigation, as it would then lead to new questions (e.g what brain region would that other RNN represent?).

      (5) Under the objective considered by the authors, the amount of input occurring before the movement might be impacted by the presence of online sensory signals for closed-loop control. It is therefore an open question whether the objective and network characteristics suggested by the authors could also explain the presence of preparatory activity before e.g. grasping movements that are thought to be more sensory-driven (Meirhaeghe et al., Cell Reports 2023).

      It is true that we aren’t currently modeling sensory signals explicitly. However, some of the optimal inputs we infer may be capturing upstream information which could encompass some sensory information. This is currently unclear, and would likely depend on how exactly the model is specified. We have added new discussion to emphasize that our dynamics should not be understood as just representing M1, but more general circuits whose state can be decoded from M1.

      Reviewer #2 (Recommendations For The Authors):

      Additionally, thank you for pointing out various typos in the manuscript, we have fixed those!

      Reviewer 3:

      Thank you very much for your review, which makes a lot of very insightful points, and raises several interesting questions. In summary, we very much agree with the limitations you pointed out. In particular, the choice of input cost is something we had previously discussed, but we had found it challenging to decide on what a reasonable cost for “complexity” could be. Following your comment, we have however added a first attempt at penalizing “temporal complexity”, which shows promising behavior. We have only included those additional analyses as supplementary figures, and we have included new discussion, which hopefully highlights what we meant by the different model components, and how the model behavior may change as we vary some of our choices. We hope this can be informative for future models that may use a similar approach. Below, we highlight the changes that we have made to address your comments.

      The main limitation of the study is that it focuses exclusively on one specific constraint - magnitude - that could limit motor-cortex inputs. This isn't unreasonable, but other constraints are at least as likely, if less mathematically tractable. The basic results of this study will probably be robust with regard to such issues - generally speaking, any constraint on what can be delivered during execution will favor the strategy of preparing - but this robustness cuts both ways. It isn't clear that the constraint used in the present study - minimizing upstream energy costs - is the one that really matters. Upstream areas are likely to be limited in a variety of ways, including the complexity of inputs they can deliver. Indeed, one generally assumes that there are things that motor cortex can do that upstream areas can't do, which is where the real limitations should come from. Yet in the interest of a tractable cost function, the authors have built a system where motor cortex actually doesn't do anything that couldn't be done equally well by its inputs. The system might actually be better off if motor cortex were removed. About the only thing that motor cortex appears to contribute is some amplification, which is 'good' from the standpoint of the cost function (inputs can be smaller) but hardly satisfying from a scientific standpoint.

      The use of a term that punishes the squared magnitude of control signals has a long history, both because it creates mathematical tractability and because it (somewhat) maps onto the idea that one should minimize the energy expended by muscles and the possibility of damaging them with large inputs. One could make a case that those things apply to neural activity as well, and while that isn't unreasonable, it is far from clear whether this is actually true (and if it were, why punish the square if you are concerned about ATP expenditure?). Even if neural activity magnitude is an important cost, any costs should pertain not just to inputs but to motor cortex activity itself. I don't think the authors really wish to propose that squared input magnitude is the key thing to be regularized. Instead, this is simply an easily imposed constraint that is tractable and acts as a stand-in for other forms of regularization / other types of constraints. Put differently, if one could write down the 'true' cost function, it might contain a term related to squared magnitude, but other regularizing terms would be very likely to dominate. Using only squared magnitude is a reasonable way to get started, but there are also ways in which it appears to be limiting the results (see below).

      I would suggest that the study explore this topic a bit. Is it possible to use other forms of regularization? One appealing option is to constrain the complexity of inputs; a long-standing idea is that the role of motor cortex is to take relatively simple inputs and convert them to complex time-evolving inputs suitable for driving outputs. I realize that exploring this idea is not necessarily trivial. The right cost-function term is not clear (should it relate to low-dimensionality across conditions, or to smoothness across time?) and even if it were, it might not produce a convex cost function. Yet while exploring this possibility might be difficult, I think it is important for two reasons.

      First, this study is an elegant exploration of how preparation emerges due to constraints on inputs, but at present that exploration focuses exclusively on one constraint. Second, at present there are a variety of aspects of the model responses that appear somewhat unrealistic. I suspect most of these flow from the fact that while the magnitude of inputs is constrained, their complexity is not (they can control every motor cortex neuron at both low and high frequencies). Because inputs are not complexity-constrained, preparatory activity appears overly complex and never 'settles' into the plateaus that one often sees in data. To be fair, even in data these plateaus are often imperfect, but they are still a very noticeable feature in the response of many neurons. Furthermore, the top PCs usually contain a nice plateau. Yet we never get to see this in the present study. In part this is because the authors never simulate the situation of an unpredictable delay (more on this below) but it also seems to be because preparatory inputs are themselves strongly time-varying. More realistic forms of regularization would likely remedy this.

      That is a very good point, and it mirrors several concerns that we had in the past. While we did focus on the input norm for the sake of simplicity, and because it represents a very natural way to regularize our control solutions, we agree that a “complexity cost” may be better suited to models of brain circuits. We have addressed this in a supplementary investigation. We chose to focus on a cost that penalizes the temporal complexity of the inputs, as ||u(t+1) - u(t)||^2. Note that this required augmenting the state of the model, making the computations quite a bit slower; while it is doable if we only penalize the first temporal derivative, it would not scale well to higher orders.
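      The state augmentation mentioned above can be sketched for a generic discrete-time linear system. This is one standard construction, shown under assumed dynamics x(t+1) = A x(t) + B u(t) with random placeholder matrices; the response does not specify the authors' exact implementation. The previous input is appended to the state and the optimizer controls the increment v(t) = u(t) - u(t-1), so that a penalty on ||u(t+1) - u(t)||^2 becomes an ordinary quadratic penalty on the new control.

```python
import numpy as np

def augment(A, B):
    # Augmented state z = [x; u_prev], new control v = u - u_prev.
    n, m = B.shape
    A_aug = np.block([[A, B], [np.zeros((m, n)), np.eye(m)]])
    B_aug = np.vstack([B, np.eye(m)])
    return A_aug, B_aug

rng = np.random.default_rng(1)
n, m, T = 4, 2, 20
# Hypothetical dynamics and input sequence, for illustration only.
A = 0.9 * np.eye(n) + 0.05 * rng.normal(size=(n, n))
B = rng.normal(size=(n, m))
u = rng.normal(size=(T, m))

# Original system driven directly by u.
x = np.zeros(n)
xs = []
for t in range(T):
    x = A @ x + B @ u[t]
    xs.append(x)
xs = np.array(xs)

# Augmented system driven by increments v(t) = u(t) - u(t-1), with u(-1) = 0.
A_aug, B_aug = augment(A, B)
v = np.diff(u, axis=0, prepend=np.zeros((1, m)))
z = np.zeros(n + m)
zs = []
for t in range(T):
    z = A_aug @ z + B_aug @ v[t]
    zs.append(z)
zs = np.array(zs)

# Quadratic cost on v is the temporal-complexity penalty on u
# (including the first step from the assumed u(-1) = 0).
smoothness_cost = np.sum(v ** 2)
print("trajectories match:", np.allclose(zs[:, :n], xs))
print("smoothness cost:", smoothness_cost)
```

      The augmented state has dimension n + m rather than n, which is consistent with the slowdown noted above; penalizing higher-order differences would require carrying further past inputs in the state.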

      Interestingly, we did find that the activity in that setting was somewhat more realistic (see new Supplementary Figure S8), with more sustained inputs and plateauing activity. While we have kept the original model for most of the investigations, the somewhat more realistic nature of the results under that setting suggests that further exploration of penalties of that sort could represent a promising avenue to improve the model.

      We also found the idea of a cost that would ensure low-dimensionality of the inputs across conditions very interesting. However, it is challenging to investigate with iLQR as we perform the optimization separately for each condition; nevertheless, it could be investigated using a different optimizer.

      At present, it is also not clear whether preparation always occurs even with no delay. Given only magnitude-based regularization, it wouldn't necessarily have to be. The authors should perform a subspace-based analysis like that in Figure 6, but for different delay durations. I think it is critical to explore whether the model, like monkeys, uses preparation even for zero-delay trials. At present it might or might not. If not, it may be because of the lack of more realistic constraints on inputs. One might then either need to include more realistic constraints to induce zero-delay preparation, or propose that the brain basically never uses a zero delay (it always delays the internal go cue after the preparatory inputs) and that this is a mechanism separate from that being modeled.

      I agree with the authors that the present version of the model, where optimization knows the exact time of movement onset, produces a reasonably realistic timecourse of preparation when compared to data from self-paced movements. At the same time, most readers will want to see that the model can produce realistic looking preparatory activity when presented with an unpredictable delay. I realize this may be an optimization nightmare, but there are probably ways to trick the model into optimizing to move soon, but then forcing it to wait (which is actually what monkeys are probably doing). Doing so would allow the model to produce preparation under the circumstances where most studies have examined it. In some ways this is just window-dressing (showing people something in a format they are used to and can digest) but it is actually more than that, because it would show that the model can produce a reasonable plateau of sustained preparation. At present it isn't clear it can do this, for the reasons noted above. If it can't, regularizing complexity might help (and even if this can't be shown, it could be discussed).

      In summary, I found this to be a very strong study overall, with a conceptually timely message that was well-explained and nicely documented by thorough simulations. I think it is critical to perform the test, noted above, of examining preparatory subspace activity across a range of delay durations (including zero) to see whether preparation endures as it does empirically. I think the issue of a more realistic cost function is also important, both in terms of the conceptual message and in terms of inducing the model to produce more realistic activity. Conceptually it matters because I don't think the central message should be 'preparation reduces upstream ATP usage by allowing motor cortex to be an amplifier'. I think the central message the authors wish to convey is that constraints on inputs make preparation a good strategy. Many of those constraints likely relate to the fact that upstream areas can't do things that motor cortex can do (else you wouldn't need a motor cortex) and it would be good if regularization reflected that assumption. Furthermore, additional forms of regularization would likely improve the realism of model responses, in ways that matter both aesthetically and conceptually. Yet while I think this is an important issue, it is also a deep and tricky one, and I think the authors need considerable leeway in how they address it. Many of the cost-function terms one might want to use may be intractable. The authors may have to do what makes sense given technical limitations. If some things can't be done technically, they may need to be addressed in words or via some other sort of non-optimization-based simulation.

      Specific comments

      As noted above, it would be good to show that preparatory subspace activity occurs similarly across delay durations. It actually might not, at present. For a zero ms delay, the simple magnitude-based regularization may be insufficient to induce preparation. If so, then the authors would either have to argue that a zero delay is actually never used internally (which is a reasonable argument) or show that other forms of regularization can induce zero-delay preparation.

      Yes, that is a very interesting analysis to perform, which we had not considered before! When investigating this, we found that the zero-delay strategy does not rely on preparation in the same way as is seen in the monkeys. This seems to be a reflection of the fact that our “Go cue” corresponds to an “internal” go cue which would likely come after the true, “external go cue” – such that we would indeed never actually be in the zero delay setting. This is not something we had addressed (or really considered) before, although we had tried to ensure we referred to “delta prep” as the duration of the preparatory period but not necessarily the delay period. We have now included more discussion on this topic, as well as a new Supplementary Figure S10.

      I agree with the authors that prior modeling work was limited by assuming the inputs to M1, which meant that prior work couldn't address the deep issue (tackled here) of why there should be any preparatory inputs at all. At the same time, the ability to hand-select inputs did provide some advantages. A strong assumption of prior work is that the inputs are 'simple', such that motor cortex must perform meaningful computations to convert them to outputs. This matters because if inputs can be anything, then they can just be the final outputs themselves, and motor cortex would have no job to do. Thus, prior work tried to assume the simplest inputs possible to motor cortex that could still explain the data. Most likely this went too far in the 'simple' direction, yet aspects of the simplicity were important for endowing responses with realistic properties. One such property is a large condition-invariant response just before movement onset. This is a very robust aspect of the data, and is explained by the assumption of a simple trigger signal that conveys information about when to move but is otherwise invariant to condition. Note that this is an implicit form of regularization, and one very different from that used in the present study: the input is allowed to be large, but constrained to be simple. Preparatory inputs are similarly constrained to be simple in the sense that they carry only information about which condition should be executed, but otherwise have little temporal structure. Arguably this produces slightly too simple preparatory-period responses, but the present study appears to go too far in the opposite direction. I would suggest that the authors do what they can to address these issues via simulations and/or discussion. I think it is fine if the conclusion is that there exist many constraints that tend to favor preparation, and that regularizing magnitude is just one easy way of demonstrating that. Ideally, other constraints would be explored. But even if they can't be, there should be some discussion of what is missing - preparatory plateaus, a realistic condition-invariant signal tied to movement onset - under the present modeling assumptions.

      As described above, we have now included two additional figures. In the first one (S8, already discussed above), we used a temporal smoothness prior, and we indeed get slightly more realistic activity plateaus. In a second supplementary figure (S9), we have also considered using model predictive control (MPC) to optimize the inputs under an uncertain go cue arrival time. There, we found that removing the assumption that the delay period is known came with new challenges: in particular, it requires the specification of a “mental model” of when the Go cue will arrive. While it is reasonable to expect that monkeys will have a prior over the go cue arrival time that will be shaped by the design of the experiment, some assumptions must be made about the utility functions that should be used to weigh this prior. For instance, if we imagine that monkeys carry a model of the possible arrival time of the go cue that is updated online, they could nonetheless act differently based on this information, for instance by either preparing so as to be ready for the earliest go cue possible or alternatively to be ready for the average go cue. This will likely depend on the exact task design and reward/penalty structure. Here, we added simulations with those two cases (making simplifying assumptions to make the problem tractable/solvable using model predictive control), and found that the “earliest preparation” strategy gives rise to more realistic plateauing activity, while the model where planning is done for the “most likely go time” does not. We suspect that more realistic activity patterns could be obtained by e.g. combining this framework with the temporal smoothness cost. However, the main point we wished to make with this new supplementary figure is that it is possible to model the task in a slightly more realistic way (although here it comes at the cost of additional model assumptions). We have now added more discussion related to those points. Note that we have kept our analyses on these new models to a minimum, as the main takeaway we wish to convey from them is that most components of the model could be modified/made more realistic. This would impact the qualitative behavior of the system and match to data but – in the examples we have so far considered – does not appear to modify the general strategy of networks relying on preparation.
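
      The difference between the candidate planning targets mentioned in this response can be illustrated with a toy calculation. The sketch below is not the authors' MPC implementation; it only shows how a planning target could be read off a discrete prior over go cue arrival times. The prior values and all function names are hypothetical.

```python
# Toy illustration only: picks a planning target from a discrete prior over
# go cue arrival times. It does NOT perform model predictive control.
# The prior (times in ms, probabilities) is hypothetical.

def earliest_go_time(prior):
    """Earliest arrival time that has non-zero probability."""
    return min(t for t, p in prior.items() if p > 0)


def most_likely_go_time(prior):
    """Arrival time with the highest prior probability (the mode)."""
    return max(prior, key=prior.get)


def mean_go_time(prior):
    """Probability-weighted average arrival time."""
    total = sum(prior.values())
    return sum(t * p for t, p in prior.items()) / total


example_prior = {200: 0.2, 400: 0.5, 600: 0.3}  # hypothetical prior
print(earliest_go_time(example_prior))     # target for an "earliest preparation" strategy
print(most_likely_go_time(example_prior))  # target for a "most likely go time" strategy
print(mean_go_time(example_prior))         # target for an "average go cue" strategy
```

      Under an "earliest preparation" target the network would have to be ready early and then hold that state until the cue arrives, which is one intuitive reading of why that strategy plateaus in the authors' simulations.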

      On line 161, and in a few other places, the authors cite prior work as arguing for "autonomous internal dynamics in M1". I think it is worth being careful here because most of that work specifically stated that the dynamics are likely not internal to M1, and presumably involve inter-area loops and (at some latency) sensory feedback. The real claim of such work is that one can observe most of the key state variables in M1, such that there are periods of time where the dynamics are reasonably approximated as autonomous from a mathematical standpoint. This means that you can estimate the state from M1, and then there is some function that predicts the future state. This formal definition of autonomous shouldn't be conflated with an anatomical definition.

      Yes, that is a good point, thank you for making it so clearly! Indeed, as in previous work, we do not think of our “M1 dynamics” as being internal to M1; they may instead include sensory feedback and inter-area loops, which we summarize in the connectivity, chosen so that the dynamics qualitatively resemble data. We have now incorporated more discussion regarding what exactly the dynamics in our model represent.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment 

      Dasgupta and colleagues make a valuable contribution to the understanding of how the guidance factor Sema7a promotes connections between mechanosensory hair cells and afferent neurons of the zebrafish lateral line system. The authors provide solid evidence that loss of Sema7a function results in fewer contacts between hair cells and afferents through comprehensive quantitative analysis. Additional work is needed to distinguish the effects of different isoforms of Sema7a to determine whether there are specific roles of secreted and membrane-bound forms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Dasgupta et al. have dissected the role of Sema7a in fine tuning of a sensory microcircuit in the posterior lateral line organ of zebrafish. They also attempt to outline the different roles of a secreted versus membrane-bound form of Sema7a in this process. Using genetic perturbations and axonal network analysis, the authors show that loss of both Sema7a isoforms causes abnormal axon terminal structure with more bare terminals and fewer loops in contact with presynaptic sensory hair cells. Further, they show that loss of Sema7a causes decreased number and size of both the pre- and post-synapse. Finally, they show that overexpression of the secreted form of Sema7a specifically can elicit axon terminal outgrowth to an ectopic Sema7a expressing cell. Together, the analysis of Sema7a loss of function and overexpression on axon arbor structure is fairly thorough and revealed a novel role for Sema7a in axon terminal structure. However, the connection between different isoforms of Sema7a and the axon arborization needs to be substantiated. Furthermore, the effect of loss of Sema7a on the presynaptic cell is not ruled out as a contributing factor to the synaptic and axon structure phenotypes. These issues weaken the claims made by the authors including the statement that they have identified dual roles for the GPI-anchored versus secreted forms of Sema7a on synapse formation and as a chemoattractant for axon arborization respectively.

      Reviewer #2 (Public Review):

      In this work, Dasgupta et al. investigate the role of Sema7a in the formation of the peripheral sensory circuit in the lateral line system of zebrafish. They show that Sema7a protein is present during neuromast maturation and localized, in part, to the base of hair cells (HCs). This would be consistent with pre-synaptic Sema7a mediating formation and/or stabilization of the synapse. They use a sema7a loss-of-function strain to show that lateral line sensory terminals display abnormal arborization. They provide highly quantitative analysis of the lateral line terminal arborization to show that a number of specific topological parameters are affected in mutants. Next, they ectopically express a secreted form of Sema7a to show that lateral line terminals can be ectopically attracted to the source. Finally, they also demonstrate that the synaptic assembly is impaired in the sema7a mutant. Overall, the data are of high quality and properly controlled. The availability of a Sema7a antibody is a big plus, as it allows the authors to address the endogenous protein localization as well as to show the signal absence in the sema7a mutant. The quantification of the arbor topology should be useful to people in the field who are looking at the lateral line as well as other axonal terminals. I think some results are overinterpreted though. The authors state: "Our findings demonstrate that Sema7A functions both as a juxtracrine and as a secreted cue to pattern neural circuitry during sensory organ development." However, they have not actually demonstrated which isoform functions in HCs (also see comments below). In addition, they have to be careful in interpreting their topology analysis, as they cannot separate individual axons. Thus, such analysis can generate artifacts. They can perform additional experiments to address these issues or adjust their interpretations.

      Reviewer #3 (Public Review):

      The data reported here demonstrate that Sema7a defines the local behavior of growing axons in the developing zebrafish lateral line. The analysis is sophisticated and convincingly demonstrates effects on axon growth and synapse architecture. Collectively, the findings point to the idea that the diffusible form of sema7a may influence how axons grow within the neuromast and that the GPI-linked form of sema7a may subsequently impact how synapses form, though additional work is needed to strongly link each form to its proposed effect on circuit assembly.

      The revised manuscript is significantly improved. The authors comprehensively and appropriately addressed most of the reviewers' concerns. In particular, they added evidence that hair cells express both Sema7A isoforms, showed that membrane bound Sema7A does not have long range effects on guidance, demonstrated how axons behave close to ectopic Sema7A, and analyzed other features of the hair cells that revealed no strong phenotypes. The authors also softened the language in many, but not all places. Overall, I am satisfied with the study as a whole. 

      Reviewer #4 (Public Review):

      This study provides direct evidence showing that Sema7a plays a role in the axon growth during the formation of peripheral sensory circuits in the lateral-line system of zebrafish. This is a valuable finding because the molecules for axon growth in hair-cell sensory systems are not well understood. The majority of the experimental evidence is convincing, and the analysis is rigorous. The evidence supporting Sema7a's juxtracrine vs. secreted role and involvement in synapse formation in hair cells is less conclusive. The study will be of interest to cell, molecular and developmental biologists, and sensory neuroscientists. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In their revised manuscript, Dasgupta et al. have provided further experiments to address the role of Sema7a (sec and GPI-anchored) in regulating axon guidance in the lateral line system. Specifically, the inclusion of the heat shock controls and FM labeling to show hair cell mechanotransduction were crucial to interpretation of the results. However, there are still concerns about the specificity of the results. My primary concern is if the change in axon patterning is specifically due to loss of Sema7a in the mutant hair cells. These animals are morphologically very abnormal and, in the rebuttal, the authors state that hair cell number is reduced. This is not quantified in the manuscript and should be included. 

      Thank you for this suggestion. We have included the data in the manuscript in lines 137-139, in Figure 2—figure supplement 1B, and in the source data for Figure 2 and Figure 2-figure supplements.

      If there is not a function for Sema7a in hair cells themselves, why is the number reduced? 

      The sema7a-/- homozygous mutants are not viable and die by 6 dpf. The loss of Sema7A protein produces other developmental defects, including brain edema and a curved body axis. We believe a slight but not significant decrease in hair cell number may arise from a minute developmental delay in the morphogenesis of the neuromast. We have accordingly quantified our data at three distinct developmental stages (2 dpf, 3 dpf, and 4 dpf) and have incorporated them in the revised manuscript.

      Additionally, FM data should be quantified and presented in animals without a transgene in the same excitation/emission spectra for clearer interpretation of the staining.

      We have quantified the intensities of labeling with FM 4-64 styryl dye from the control and the sema7a-/- mutant larvae and incorporated the data in lines 139-146, in Figure 2—figure supplement 1D, and in source data for Figure 2 and Figure 2-figure supplements. We kept the transgenes to concurrently show the arborization phenotype, hair cell morphology, and the FM 4-64 incorporation between the genotypes.

      Rescue analysis using the myo6d promoter would allow the authors to ensure that the axon deficits can be rescued by putting Sema7a back into the sensory hair cells. Transient transgenesis could be useful for this approach and would not require the creation of a stable line. This could be done with both forms of Sema7a, allowing a true assessment of whether or not the secreted and GPI-anchored forms have disparate functions as claimed in lines 418-424.

      Although we recognize the importance of the rescue of the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, it is not possible for us to perform that experiment at the moment, for the first author will leave the lab next week.  However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Other concerns:

      (1) The timeline of the heat shock experiment is confusing to me and, therefore, it makes me question the specificity of those results. Based on the speed of axon outgrowth and the time necessary for transcription and translation after heat shock induction of the transgene, it is unclear to me how the axon growth defects could occur in the timeline provided. Imaging two hours after the start of the heat shock is very rapid and speaks to either an indirect effect of the transgenesis on the axon growth or a leaky promoter/induction paradigm. It is possible I am just misunderstanding the setup but, from what I could gather, the imaging is being done 2 hrs after the start of the heat shock. This should be clarified.

      The axons of the zebrafish posterior lateral line migrate relatively fast. The pioneering axons migrate at around 120 μm/hour and the follower axons at roughly 30-80 μm/hour (Sato et al., 2010). The heat-shock promoter that we have utilized, hsp70l, is highly effective in inducing gene expression and subsequent protein formation within 30 to 60 min. We believe an hour of heat shock and an hour of incubation post heat shock is sufficient to induce directed axon migration over a distance that spans from 27 μm to 140 μm.
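
      The timing argument above can be checked with a back-of-envelope calculation. The sketch below assumes, for illustration only, that directed growth starts once the protein has been produced and then proceeds at a constant speed; the function name is hypothetical, and the speeds and latencies are the values quoted in this response.

```python
# Back-of-envelope check under two simplifying assumptions:
# (1) directed growth begins only after the induction latency has elapsed,
# (2) growth then proceeds at a constant speed.

def reachable_distance_um(speed_um_per_h, window_h, latency_h):
    """Distance an axon could cover within the imaging window."""
    growth_time_h = max(window_h - latency_h, 0.0)
    return speed_um_per_h * growth_time_h


window_h = 2.0  # one hour of heat shock plus one hour of incubation
for speed in (30.0, 80.0, 120.0):      # um/hour, follower and pioneer speeds quoted above
    for latency in (0.5, 1.0):         # hours, 30 to 60 min for protein formation
        distance = reachable_distance_um(speed, window_h, latency)
        print(f"speed={speed} um/h, latency={latency} h -> {distance} um")
```

      Under these assumptions the reachable distances run from 30 μm (slowest speed, longest latency) to 180 μm (fastest speed, shortest latency), which is of the same order as the 27 to 140 μm span reported above.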

      We strongly believe that the directed arborization of the sensory axons towards the Sema7Asec source is not due to an indirect effect of transgenesis or leaky promoter induction, as in all 18 of the injected but not heat-shocked control larvae we did not observe ectopic Sema7Asec expression, and no aberrant projection was formed from the sensory arbor network. We highlight this observation in lines 297-299 and in Figure 4E.

      Sato et al., 2010: Single-cell analysis of somatotopic map formation in the zebrafish lateral line system. Developmental Dynamics 239:2058–2065, 2010.

      Similarly, it would help to clarify if t(0) in the figure is the onset of the heat shock or onset of imaging two hours after the heat shock is started. 

      The t=0 hour in Figure 4I denotes the onset of imaging two hours after the heat shock began. We have clarified this in the manuscript in lines 1155-1156.

      (2) In the rebuttal, the line numbers cited do not match up with the appropriate text, I believe.

      We have corrected this and updated the manuscript.

      (3) Some of the supplemental figures are not mentioned in the text, or I could not find them. For example: Figure 1 supplement 2J. 

      Thank you for pointing this out. We have corrected the manuscript, and the new information is added in line 114.

      (4) Table 1 statistics: were these adjusted for multiple comparisons using a Bonferroni correction or something similar? This is necessary for statistical significance to be meaningful.

      We did not adjust the p-values for multiple comparisons because the values correspond to only three or four statistical tests per experiment, strongly indicating the unlikelihood of erroneous significance due solely to multiple tests.
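
      For readers who want to see what such an adjustment would involve, the sketch below implements the Bonferroni and Holm corrections in plain Python and computes the familywise error rate for a small number of tests. The p-values are hypothetical and are not taken from Table 1; the familywise calculation assumes independent tests.

```python
# Illustrative sketch with hypothetical p-values (not the values in Table 1).

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Holm step-down adjustment, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


example_p = [0.01, 0.04, 0.03]  # hypothetical
print(bonferroni(example_p))
print(holm(example_p))
print(familywise_error_rate(0.05, 3), familywise_error_rate(0.05, 4))
```

      With three or four independent tests at a per-test threshold of 0.05, the familywise error rate is roughly 14% to 19%, which gives a sense of the size of the effect at issue in this exchange.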

      (5) Figure 1I and 1-S3 - The legend states a positive correlation between axonal signal and sema7A signal. Correlations are 0.5, 0.6, and 0.4 (2, 3, 4 dpf). This is not a convincing positive correlation. At best, this is no correlation to a very weak positive correlation.

      In lines 122-126 we mention that the basal association of the sensory arbors shows a positive correlation with Sema7A accumulation. We never emphasize the strength of the correlation. However, a consistent positive correlation at three different developmental stages suggests that progressive Sema7A accumulation at the base of the hair cells may guide the sensory arbors to increasingly associate themselves with the hair cells.

      Reviewer #2 (Recommendations For The Authors):

      I am a bit disappointed that the authors elected not to experimentally address the issue raised by all reviewers: whether the secreted or membrane bound isoform is active in hair cells. They rather decided to change their interpretation in the text. It is fine, given the eLife review structure. However, that would make the manuscript much stronger. Other issues were adequately addressed through textual changes as well. 

      Although we recognize the importance of the rescue of the sema7a-/- mutant phenotype with the sema7asec and the sema7aGPI transcripts, it is not possible for us to perform that experiment at the moment, for the first author will leave the lab next week.  However, he plans to continue work on this project as an independent investigator to dissect the individual roles of the transcript variants in specifying the pattern of sensory arborization, a project that includes generation of transcript-specific knockout animals and rescue experiments with stable transgenic fish lines. 

      Reviewer #3 (Recommendations For The Authors):

      Overall, I am satisfied with the study as a whole and just have a few minor comments that remain to be addressed. 

      (1) Although the authors say that they added appropriate no plasmid/heatshock-only and plasmid-only/no heatshock controls, these results need to be presented more clearly, as they are separated in the paper and only one was quantified (i.e. 100% of embryos showed no defect). Please just make it clear that no defects were observed in either control for either experiment (both secreted and membrane bound ectopic expression). 

      We have clearly stated this information in lines 297-299 and 343-345.

      (2) Please add a compass to Fig. 1A to indicate the orientation of the neuromast. It would also be helpful to add labels for developmental ages to all of the figures, rather than making the reader look it up in the legend. 

      We have updated Figure 1A and the corresponding figure legend in lines 882-883. We have denoted the larval age in the figure legends to keep the individual images uncluttered.

      (3) For the RT-PCR experiments in Figure 1, no negative control was included to show that supporting cell or neuronal genes are not detected in the purified hair cells and, vice versa, that neither isoform is detected in supporting cells or neurons. I ask only because there is a lot of immune-signal outside of the hair cells and I am curious whether that is secreted or might come from other cell types. For neurons and supporting cells, simply demonstrating absence of Sema7a overall would suffice.

      We have utilized the transgenic line Tg(myo6b:actb1-EGFP) that expresses the fluorophore GFP specifically in the hair cells of the neuromast. Unfortunately, we do not possess a transgenic line that reliably and specifically labels the support cells in the neuromast. Hence, in our sorting experiment the GFP-negative cells that are collected from the trunk segments of the larvae contain all the non-hair cells including epidermal cells, neuronal cells, and immune cells etc. Such a mixture of varied cellular identity may not serve as a reliable negative control. 

      In Figure 7, we have plotted the normalized expression values of the sema7a gene in the neuromast. The plot clearly depicts that the source of Sema7A is the young and the mature hair cells, not the support cells. We further confirm this observation by immunostaining, where the Sema7A signal is highly restricted to the hair cells and is not found in any other cell in the neuromast (Figure 1E). Immunostaining further demonstrates that the lateral line sensory arbors also do not produce the Sema7A protein (Figure 1H; Video 1).

      We agree with the reviewer that there are diverse immune cells, including macrophages, in and around the neuromast. These macrophages are dynamic and possess a highly ramified structure (Denans et al., 2022). In all our Sema7A immunostainings, we never observed structures that resemble macrophages. Although we cannot confirm that Sema7A is not expressed in a distant immune cell, we highly doubt that signal coming from immune cells affects hair cell innervation by the sensory arbors during homeostatic development.

      Denans et al., 2022: Nature Communications volume 13, Article number: 5356 (2022).

      (4) In Figure 1, Supplement 4, I do not see the immunogen labeled in blue. 

      We have corrected the figure legend. The immunogenic region of the Sema7A protein is now clearly denoted in the figure legend of Figure 1—figure supplement 4.

      (5) In Figure 2, please add a control image as requested, as that enables direct comparison. There is ample room in the figure. 

      We have updated Figure 2 and made the suggested change.

      (6) In Figure 2, Supplement 1, the FM4-64 data are not presented in a quantified fashion. Please report at least how many embryos showed reliable uptake and preferably how many hair cells per embryo showed reliable uptake. 

      We have quantified the FM 4-64 intensities in control and sema7a-/- mutant larvae. The new data have been added to the manuscript in lines 142-146 and 577-579, and in Figure 2—figure supplement 1D.

      (7) In Figure 3, there seems to be a typo in the figure legend: "mutants in the same larvae" does not make sense to me. 

      We have corrected the error. The modified statement appears in lines 1067-1068.

      (8) The text should refer more explicitly to the statistical tests reported in Table 1, i.e. as the results are presented. 

      In lines 1105 and 1109, we clearly state the statistical tests that were performed.

      (9) In Figure 6, Supplement 1, please show the raw data points, not just the bar graphs.

      We have updated Figure 6—figure supplement 1.

      (10) Minor point: the authors state that they addressed the distance over which secreted Sema7A may act, but this was not evident to me in the text. Please make this finding clearer.

      We have clarified this information in lines 310-311.

      (11) Finally, the discussion contains a statement that is not supported by the data: "We have discovered dual modes of Sema7A function in vivo." They have discovered evidence that there are two isoforms, that loss of both disrupts connectivity, and that overexpression of only the secreted form can elicit growth from a distance. However, there is no direct evidence that the membrane-bound form is responsible for local effects. It is formally possible still that the phenotypes are a result of dual roles for the secreted form. It is clear that another manuscript is forthcoming that will expand on the role of the transmembrane form, but for this manuscript, the authors should make firm conclusions only about the data presented herein.

      Thank you for this suggestion. We have modified the manuscript in lines 425-434.

      Reviewer #4 (Recommendations For The Authors):

      The authors have made significant changes to the manuscript based on the comments of the reviewers. It is now suitable for publication.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have no further experiments to request, but the following errors should be corrected prior to publication.

      (1) L. 183-198: Figure 3 panels were erroneously referred to in several places.

      This has been corrected.

      (2) L.182-183: description of active/total cell numbers in main text does not match numbers in Figure 3B

      This has been corrected.

      (3) L.185-187: Figure 3C indicates significant changes of rheobase only between the DMI+6-OHDA and 6-OHDA groups. A statistical comparison between sham and DMI+6-OHDA was not provided, which may change the interpretation of the data in Figure 3B, C: "...these findings suggest that the 6-OHDA induced lesion of midbrain dopaminergic neurons evoked the increased firing of DRN5-HT neurons" (L.185-187).

      We thank the reviewer for highlighting this point. Indeed, a Kruskal-Wallis test comparing all three groups revealed a significantly lower rheobase in DMI + 6-OHDA mice compared to Sham while the 6-OHDA injected group was not affected. Therefore, the increased firing of DRN5-HT neurons recorded in 6-OHDA injected mice pretreated with DMI also critically involves the noradrenergic system. This is now included in the revised results section of the manuscript (lines 190-197).

      (4) L. 188: The description of "While the excitability of DRN5-HT neurons was not affected in 6-OHDA mice..." does not match the clearly increased cellular excitability shown in Figure 3G-I.

      This has been corrected and we are now referring more specifically to the rheobase, which is not affected in 6-OHDA mice.

      (5) Mann-Whitney tests were inappropriately used for statistics in Figures 3-6: multiple comparisons (>=3 groups) should be performed with a one-way ANOVA, or with the Kruskal-Wallis test for nonparametric data.

      We thank the reviewer for the comment. We have now applied one-way ANOVA/Kruskal-Wallis tests, and the text has been modified accordingly.
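
      As an illustration of the omnibus test discussed here, the sketch below computes the Kruskal-Wallis H statistic (with tie correction) in plain Python. The group values are hypothetical and are not data from the manuscript. The p-value helper covers only the three-group case, where the chi-square approximation has two degrees of freedom and its survival function reduces to exp(-H/2); this approximation is rough for small samples.

```python
import math

# Illustrative sketch; the example data below are hypothetical.

def average_ranks(values):
    """Rank values from 1 to N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


sham, lesion, pretreated = [1, 2, 3], [4, 5, 6], [7, 8, 9]  # hypothetical
h_stat = kruskal_wallis_h([sham, lesion, pretreated])
print(h_stat, p_value_three_groups(h_stat))
```

      A significant omnibus result would normally be followed by post hoc pairwise comparisons with a multiplicity adjustment, rather than by unadjusted pairwise Mann-Whitney tests.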

      (6) It seems that the data points in some panels of Figure 4C represented a cell, but others were averaged within a mouse (Figure 4D). This needs to be clarified or corrected.

      None of the data in Figure 4 were averaged within a mouse. In the chosen graph type (aligned dot plot), equal data points overlap.

      Reviewer #2 (Recommendations For The Authors):

      The authors' revised manuscript has addressed most of my concerns. However, I'm not convinced by the authors' claim regarding Figure 5B. It would be great if the authors at least discussed in their manuscript why the DMI pretreatment group alone, not the 6-OHDA group, shows a significantly lower firing rate of DRN (DA) and a higher Erest of DRN (DA), compared to the sham-lesion group. These statistically significant data are not explained at all in the revised manuscript. (Can this effect be explained by the neuroprotection of NA-neurons from 6-OHDA toxicity?)

      We thank the reviewer for this comment. Now that a one-way ANOVA or a Kruskal-Wallis test is used to compare the three groups (as suggested by reviewer 1), the changes previously shown in Figure 5B are no longer significant.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This manuscript represents a cleanly designed experiment for assessing biological motion processing in children (mean age = 9) with and without ADHD. The group differences concerning accuracy in global and local motion processing abilities are solid, but the analyses suggesting dissociable relationships between global and local processing and social skills, age, and IQ are inconclusive. The results are useful in terms of understanding ADHD and the ontogenesis of different components of the processing of biological motion.

      We thank the editors and reviewers for their valuable feedback and constructive comments. We have carefully considered each point raised by the reviewers and made the necessary revisions to the manuscript. Regarding the relationships between global and local BM processing, the accumulated evidence from previous studies has converged on the dissociation of the two BM components, e.g., while global BM processing is susceptible to learning and practice, local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and the brain activations in response to local and global BM cues are different (Chang et al., 2018; Duarte et al., 2022). Nevertheless, we concur with the reviewers that the evidence for such dissociation from the current study by itself is not strong enough. Therefore, we have toned down this point and no longer claim the dissociation (including in the title). Based on the current results, we focused our discussion on the different aspects of BM processing in children with and without ADHD.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper presents a nice study investigating the impairments of biological motion perception in individuals with ADHD in comparison with neurotypical controls. Motivated by the idea that there is a relationship between biological motion perception and social capabilities, the authors investigated the impairments of local and global (holistic) biological motion perception, the diagnosis status, and several additional behavioral variables that are affected in ADHD (IQ, social responsiveness, and attention / impulsivity). Both local and global biological motion perception are impaired in ADHD individuals. In addition, the study demonstrates a significant correlation between local biological motion perception skills and the social responsiveness score in the ADHD group, but not in controls. A path analysis in the ADHD group suggests that general performance in biological motion perception is influenced mainly by global biological motion perception performance and attentional and perceptual reasoning skills.

      Strengths:

      It is true that there exists not much work on biological motion perception and ADHD. Therefore, the presented study contributes an interesting new result to the biological motion literature, and adds potentially also new behavioral markers for this clinical group. The design of the study is straightforward and technically sound, and the drawn conclusions are supported by the presented results.

      Thanks for this positive assessment of our work.

      Weaknesses:

      Some of the claims about the relationship between genetic factors and ADHD and the components of biological motion processing have to remain speculative at this point because genetic influences were not explicitly tested in this paper. Specifically, the hypothesis that the perception of human social interaction is critically based on a local mechanism for the detection of asymmetry in foot trajectories of walkers (this is what 'BM-local' really measures), or on the detection of live agents in cluttered scenes, seems not very plausible.

      Thanks for these comments. We agree that the relationship between genetic factors and BM perception remains to be further examined, as we did not test the genetic influences in this study. We have deleted the relevant discussion about genetics. Based on our results, we discuss the possible mechanisms behind the relationship between local BM processing and social interaction in the revised manuscript as follows:

      “As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs. Further empirical studies are required to confirm these hypotheses.” (lines 417 - 428)

      Based on my last comments, the discussion has now been changed in a way that tries to justify the speculative claims by citing a lot of other speculative papers, which does not really address the problem. For example, the fact that chicks walk towards biological motion stimuli is interesting. To derive that this verifies a fundamental mechanism in human biological motion processing is extremely questionable, given that birds do not even have a cortex. Taking the argumentation of the authors seriously, one would have to assume that the 'Local BM' mechanism is probably located in the mesencephalon in humans, and then would have to interact in some way with social perception differences of ADHD children. To me all this seems to make very strong (over-)claims. I suggest providing a much more modest interpretation of the interesting experimental result, based on what has been really experimentally shown by the authors and closely related other data, rather than providing lots of far-reaching speculations.

      In the same direction, in my view, go claims like 'local BM is an intrinsic trait' (L. 448), which is not only imprecise (maybe better 'mechanisms of processing of local BM cues') but also rather questionable. Likely, this 'local processing of BM' is a lower-level mechanism, located probably in early and mid-levels of the visual cortex, with a possible influence of lower structures. It seems not really plausible that this is related to classical trait variables in the sense of psychology, like personality, as seems to be suggested here. Also here I suggest a much more moderate and less speculative interpretation of the results.

      We thank the reviewer for pointing out these issues. According to these comments, we have carefully revised the discussion to avoid strong (over-)claims. We have deleted the example of chicks and replaced it with more empirical studies to explain our results. We agree that the local BM mechanism is probably located in subcortical regions in humans, as reported by some MRI studies (Chang et al., 2018; Hirai and Senju, 2020; Loula et al., 2005). We have added some evidence that atypical local BM processing may decrease visual inputs related to social information as follows:

      “According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 421 - 427)

      We have also deleted the claim that 'local BM is an intrinsic trait' (originally L. 448) and the related discussion, as it was not conclusive based on the current study.

      Reviewer #2 (Public Review):

      Summary:

      Tian et al. aimed to assess differences in biological motion (BM) perception between children with and without ADHD, as well as relationships to indices of social functioning and possible predictors of BM perception (including demographics, reasoning ability and inattention). In their study, children with ADHD showed poorer performance relative to typically developing children in three tasks measuring local, global, and general BM perception. The authors further observed that across the whole sample, performance in all three BM tasks was negatively correlated with scores on the social responsiveness scale (SRS), whereas within groups a significant relationship to SRS scores was only observed in the ADHD group and for the local BM task. Local and global BM perception showed a dissociation in that global BM processing was predicted by age, while local BM perception was not. Finally, general (local & global combined) BM processing was predicted by age and global BM processing, while reasoning ability mediated the effect of inattention on BM processing.

      Strengths:

      Overall, the manuscript is presented in a clear fashion and methods and materials are presented with sufficient detail so the study could be reproduced by independent researchers. The study uses an innovative, albeit not novel, paradigm to investigate two independent processes underlying BM perception. The results are novel and have the potential to have wide-reaching impact on multiple fields.

      We appreciate the reviewer’s positive feedback very much.

      Weaknesses:

      The manuscript has greatly improved in clarity and methodological considerations in response to the review. There are only a few minor points which deserve the authors' attention:

      When outlining the motivation for the current study, results from studies in ADHD and ASD are used too interchangeably. The authors use a lack of evidence for contributing (psychological/developmental) factors on BM processing in ASD to motivate the present study and refer to evidence for differences between typical and non-typical BM processing using studies in both ASD and ADHD. While there are certainly overlapping features between the two conditions/neurotypes, they are not to be considered identical and may have distinct etiologies; therefore, the distinction between the two should be made clearer.

      We thank the reviewer for pointing out this issue. We have removed some unnecessary citations about ASD and referred to studies about social cognition in ADHD to elaborate the motivation of this study:

      “Further exploration of a diverse range of social cognitions (e.g., biological motion perception) can provide a fresh perspective on the impaired social function observed in ADHD. Moreover, recent studies have indicated that the social cognition in ADHD may vary depending on different factors at the cognitive, pathological, or developmental levels, such as general cognitive impairment5, symptoms severity8, or age5. Nevertheless, understanding how these factors relate to social cognitive dysfunction in ADHD is still in its infancy. Bridging this gap is crucial as it can help depict the developmental trajectory of social cognition and identify effective interventions for impaired social interaction in individuals with ADHD.” (lines 53 - 62)

      In the first/main analysis, it is unclear to me why in the revised manuscript the authors changed the statistical method from ANOVA/ANCOVA to independent samples t-tests (unless the latter were only used for post-hoc comparisons, in which case this needs to be stated). Furthermore, although p-values look robust, for this analysis too it should be indicated whether and how multiple comparison problems were accounted for.

      Thanks for the reviewer’s comments. According to the suggestions from reviewer #3, it may be inappropriate to treat gender as a covariate in ANOVA, as this may violate the assumptions of ANCOVA. To ensure that gender does not influence the results, we first separated boys and girls on the plots using differently coloured individual data points, and there were no signs of a gender effect in the TD group. Second, we used t-tests to examine the difference between the TD and ADHD groups. Finally, we conducted a subsampling analysis with balanced data, and the results remained consistent.

      In part 1 of the results, we aimed to compare the task accuracies between the TD and ADHD groups in three independent tasks, which assess the participants’ abilities to process three types of BM cues. We assumed that individuals with ADHD show poorer performance in three tasks compared to TD individuals. With regard to that, we consider that multiple comparisons may not be necessary.

      Reviewer #3 (Public Review):

      Strengths:

      The authors present differences between ADHD and TD children in biological motion processing, and this question has not received as much attention as equivalent processing capabilities in autism. They use a task that appears well controlled. They raise some interesting mechanistic possibilities for differences in local and global motion processing, which are distinctions worth exploring. The group differences will therefore be of interest to those studying ADHD, as well as other developmental conditions, and those examining biological motion processing mechanisms in general.

      We appreciate the reviewer’s positive assessment of this work.

      Weaknesses:

      The data are not strong enough to support claims about differences between global and local processing wrt social communication skills and age. The mechanistic possibilities for why these abilities may dissociate in such a way are interesting, but the crucial tests of differences between correlations do not present a clear picture. Further empirical work would be needed to test the authors' claims. Specifics:

      The authors state frequently that it was the local BM task that related to social communication skills (SRS) and not the global tasks. However, the results section shows a correlation between SRS and all three tasks. The only difference is that when looking specifically within the ADHD group, the correlation is only significant for the local task. The supplementary materials demonstrate that tests of differences between correlations present an incomplete picture. Currently they have small samples for correlations, so this is unsurprising.
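
      As an illustration of what a test of the difference between two correlations involves, the following is a minimal sketch of the Fisher r-to-z comparison for two independent groups (for example, an ADHD and a TD sample). It is not the authors' analysis: the function name, correlation values, and sample sizes are hypothetical, and comparing two correlations measured within the same group would require a test for dependent correlations instead.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided z-test for a difference between two independent Pearson correlations."""
    # Fisher transformation makes the sampling distribution approximately normal.
    z1 = math.atanh(r1)
    z2 = math.atanh(r2)
    # Standard error of the difference between the two transformed correlations.
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical correlations and group sizes, not taken from the manuscript.
z_stat, p_value = fisher_z_difference(r1=-0.40, n1=36, r2=-0.10, n2=36)
print(f"z = {z_stat:.3f}, p = {p_value:.3f}")
```

      Because the standard error depends only on the two sample sizes, small groups give wide uncertainty, which is consistent with the reviewer's remark that an unclear picture is unsurprising here.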

      Thanks for this comment. We agree with the reviewer that the relationship between local and global processing with social communication and age needs more empirical work. Based on our results, there are only possible dissociable roles of local and global BM processing. The accumulated evidence from previous studies has converged on this dissociation, e.g., while global BM processing is susceptible to learning and practice, local BM processing does not show a learning trend (Chang and Troje, 2009; Grossman et al., 2004), and the brain activations in response to local and global BM cues are different (Chang et al., 2018; Duarte et al., 2022). We concur with the reviewers that the evidence for such dissociation from the current study by itself is not strong enough. Therefore, we have toned down this point and no longer emphasize the dissociation. Based on the current results, we focused our discussion on the different aspects of BM processing in children with and without ADHD. Future studies with larger sample sizes are needed to confirm this dissociable relationship.

      Theoretical assumptions. The authors make some statements about local vs global biological motion processing that should still be made more tentatively. They assume that local processing is genetically specified whereas global processing is a product of experience. These data in newborn chicks are controversial and confounded - I cannot remember the specifics but I think there is an upper vs lower visual field complexity difference here.

      We appreciate the reviewer’s suggestion. We agree that the relationship between genetic factors and BM perception remains to be further examined, as we did not perform any genetic analysis in the current study. Some speculative papers have been removed, as has the statement about newborn chicks, given the controversial and confounded results. We have toned down our claims and provided a moderate interpretation of the results:

      “Sensitivity to local BM cues emerges early in life54,55 and involves rapid processing in the subcortical regions16,56-58. As a basic pre-attentive feature23, local BM cues can guide visual attention spontaneously59,60. In contrast, the ability to process global BM cues is related to slow cortical BM processing and is influenced by many factors such as attention25,26 and visual experience21,51. As mentioned above, we found a significant negative correlation between the SRS total score and the accuracy of local BM processing, specifically in the ADHD group. This could be due to decreased visual input related to atypical local BM processing, which further impairs global BM processing. According to the two-process theory of biological motion processing61, local BM cues guide visual attention towards BM stimuli55,62. Consequently, the visual input of BM stimuli increases, facilitating the development of the ability to process global BM cues through learning21,63. The latter is a prerequisite for attributing intentions to others and facilitating social interactions with other individuals20,64,65. Thus, atypical local BM processing may contribute to impaired social interaction through altered visual inputs.” (lines 413 - 427)

      “Few developmental studies have been conducted on local BM processing. The ability to process local BM cues remained stable and did not exhibit a learning trend21,25. A reasonable interpretation may be that local BM processing is a low-level mechanism, probably performed by the primary visual cortex and subcortical regions such as the superior colliculus, pulvinar, and ventral lateral nucleus14,56,61.” (lines 441- 446)

      Readability. The manuscript needs very careful proofreading and correction for grammar. There are grammatical errors throughout.

      We thank the reviewer for this feedback. We have performed thorough proofreading and corrected grammatical errors throughout the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I thank the authors for their revisions that address several of the minor points that I raised in my last review. A number of requests are still not sufficiently answered:

      L. 290 ff.: The model 'BM-local = age + gender etc.' is a pretty sloppy notation. I think what is meant is that a GLM was used with the predictors gender etc., each multiplied by an appropriate beta_i value. These formulas should be corrected, or one should just say that a GLM was run with the predictors gender

      The same criticism applies to these other models that follow.

      This was corrected.

      However, the corrected text remains sloppy, for example: 'BM-Local = ...'. What exactly is 'BM-Local'? The accuracy? etc. Here a precise notation should be given that clearly names which variables are used as predictors and target variables.

      We appreciate the reviewer’s suggestion. We clarified which variables are used in our models and gave them precise notations:

      “Three linear models were built to investigate the contributing factors: (a) ACClocal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention, (b) ACCglobal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention, and (c) ACCgeneral = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention + β5 * ACClocal + β6 * ACCglobal. ACClocal, ACCglobal and ACCgeneral refer to the response accuracies of the three tasks in the ADHD group, and QbInattention is the standardised score for sustained attention function.” (lines 337 - 343)
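
      To make the notation concrete, the following is a minimal sketch of how a model of the form ACClocal = β0 + β1 * age + β2 * gender + β3 * FIQ + β4 * QbInattention can be fitted by ordinary least squares. It is not the authors' code: the helper names, the data rows, and the "true" coefficients are hypothetical, and the data are generated without noise purely so that the fit can be checked against known values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(predictor_rows, y):
    """Fit y = b0 + b1*x1 + ... by least squares; return (coefficients, R^2)."""
    rows = [[1.0] + list(p) for p in predictor_rows]  # prepend intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical rows: (age, gender, FIQ, QbInattention) for seven children.
predictor_rows = [
    (8.0, 0, 100, 1.2), (9.5, 1, 110, 0.8), (10.0, 0, 95, 1.5), (7.5, 1, 105, 2.0),
    (11.0, 1, 120, 0.5), (9.0, 0, 90, 1.0), (12.0, 0, 115, 1.8),
]
# Hypothetical coefficients (b0..b4) used only to generate noise-free accuracies.
true_beta = [0.20, 0.03, 0.00, 0.002, -0.05]
acc_local = [
    true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], row))
    for row in predictor_rows
]

beta_hat, r_squared = fit_ols(predictor_rows, acc_local)
print([round(b, 4) for b in beta_hat], round(r_squared, 4))
```

      In practice a statistics package would be used instead of hand-written normal equations; the sketch only shows which quantities play the role of target (the task accuracy) and predictors (age, gender, FIQ, QbInattention).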

      All these models assume linearity of the combination of the predictors. was this assumption verified?

      We referred to previous studies of BM perception in children, which found that the main predictor variables, including IQ (Rutherford et al., 2012; Jones et al., 2011) and age (Annaz et al., 2010; van et al., 2016), have a linear relation with the ability of BM processing.

      This answer is insufficient and not convincing. Because a variable Y depends linearly on predictors A and B in some other study, this does not imply that it is also linear in predictor C, or does not show interactions with such predictors in the present study.

      What is needed here is the testing of models with interaction terms and verifying that such models are not better predictors. If authors do not want to do this, they need at least to clearly point out that they made the strong assumption of linearity of their model, which might be wrong and thus be a substantial limitation of their analysis.

      Thanks for the suggestion. We compared each model with and without the relevant interaction terms. The results showed that the change in the coefficient of determination (R-squared, R2) between the two models was not statistically significant.
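
      A comparison of this kind is commonly expressed as an F-test on the increase in R2 between two nested linear models. The sketch below shows that calculation under stated assumptions: the function name and all numerical inputs are hypothetical and are not values from the study, and converting the F statistic into a p-value additionally requires the F distribution (for example, `scipy.stats.f.sf(f, df1, df2)`).

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R^2 increase when predictors are added to a nested linear model.

    k_reduced and k_full count predictors excluding the intercept; n is the sample size.
    """
    df1 = k_full - k_reduced  # number of added terms (e.g. interaction terms)
    df2 = n - k_full - 1      # residual degrees of freedom of the full model
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Hypothetical example: adding one interaction term to a four-predictor model.
f_stat, df1, df2 = r2_change_f(r2_reduced=0.30, r2_full=0.33, n=40, k_reduced=4, k_full=5)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

      A small F relative to its degrees of freedom indicates that the interaction terms do not improve the fit beyond what the additive model already explains.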

      L. 296ff.: For model (b) it looks like general BM performance is strongly driven by the predictor global BM performance in the ADHD group. Does the same observation also apply to the controls?

      The same phenomenon was not observed in TD children. We have briefly discussed this point in the Discussion section of the revised manuscript (lines 449 - 459).

      Was such a path analysis also done for the TD subjects or not? If yes, was then also predicted that the variable BM-Global largely and directedly influences the variable BM-General? (The answer refers to the general discussion section, where no such analysis is presented, as far as I understand.)

      Thank you for your comment. We also conducted a path analysis in the TD group, similar to that in the ADHD group. There was no statistically significant mediator effect in the TD group. Please see Figure S3 for complete statistics.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please add public access to the data repository so data availability can be assessed.

      The data analyzed during the study is available at https://osf.io/37p5s/.

      (2) Lines 119-115: The differences observed in ADHD participants in the studies referenced here were relative to what group? The last sentence here also refers to two groups, and it is difficult to gather which specific groups are meant, also because the two references relate to both ADHD and ASD samples. Please clarify.

      The suggestion is well taken. We have clarified the expressions accordingly:

      “Specifically, compared with the typically developing (TD) group, children with ADHD showed reduced activity of motion-sensitive components (N200) while watching biological and scrambled motions, although no behavioural differences were observed. Another study found that children with ADHD performed worse in BM detection with moderate noise ratios than the TD group32.” (lines 100 - 105)

      (3) Line 116: I'm not sure what is meant by 'despite initial indications' - please briefly specify/summarise here why the investigation into BM processing in ADHD is warranted.

      We thank the reviewer for pointing out this issue. We have rephrased this part and briefly specified why the investigation into BM processing in ADHD is warranted:

      “Despite initial findings about atypical BM perception in ADHD, previous studies on ADHD treated BM perception as a single entity, which may have led to misleading or inconsistent findings28. Hence, it is essential to deconstruct BM processing into multiple components and motion features.” (lines 108 -111)

      (4) Lines 290-293: Please complete the sentence.

      We thank the reviewer for pointing out this issue. The sentence has been completed:

      “For Tasks 2 and 3, where children were asked to detect the presence or discriminate the facing direction of the target walker, the TD group had higher accuracies than the ADHD group (Task 2 - TD: 0.70 ± 0.12, ADHD: 0.59 ± 0.12, t(73) = 3.677, p < 0.001, Cohen's d = 0.861; Task 3 - TD: 0.79 ± 0.12, ADHD: 0.63 ± 0.17, t(73) = 4.702, p < 0.001, Cohen's d = 1.100).” (lines 284 - 288)
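
      For readers who want to see how such statistics relate to the group summaries, the following is a minimal sketch of an independent-samples t statistic and Cohen's d computed from means and standard deviations with a pooled variance. The function name is hypothetical, and the group sizes (38 and 37) are an assumption chosen only to match the 73 degrees of freedom; because the reported means and standard deviations are rounded, the output will only approximate the values quoted above.

```python
import math


def pooled_t_and_d(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic and Cohen's d using the pooled standard deviation."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    d = (mean1 - mean2) / pooled_sd  # standardized mean difference
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, d, df


# Task 3 summaries as quoted; group sizes are assumed, not taken from the source.
t_stat, cohen_d, df = pooled_t_and_d(0.79, 0.12, 38, 0.63, 0.17, 37)
print(f"t({df}) = {t_stat:.2f}, d = {cohen_d:.2f}")
```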

      Reviewer #3 (Recommendations For The Authors):

      (1) Conclusions concerning differences between the local and global tasks wrt SRS and age (see above). I believe the authors need to reword throughout to reflect that the tests of differences between these crucial correlations did not present a clear picture.

      We have reworded throughout the paper to reflect the inconclusiveness with regard to the relationship between local and global processing with social communication based on this study only. Future studies with larger sample sizes are needed to confirm this conclusion. The mechanism for this dissociable relationship should be validated by more psychological tests in future studies.

      (2) I would again tone down the discussion of genetic specification of local processing, given it is highly controversial.

      We thank the reviewer for pointing out the issue. We agree that the genetic specification of local processing remains controversial. The interpretation of the results on local BM processing has been rephrased. Please refer to our response to point #2 mentioned above.

      (3) The manuscript needs very careful proofreading and grammatical correction throughout.

      Thanks for the suggestion to check the grammar. We have carefully proofread the manuscript to correct grammatical errors.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Following synaptic vesicle fusion events at release sites, vesicle remnants will need to be cleared in order to allow new rounds of vesicle docking and fusion. This fundamental study of Mahapatra and Takahashi examines the role of release site clearance in synaptic transmission during repetitive activity in two types of central synapses, the giant calyx of Held and hippocampal CA1 synapses. The study uses pharmacological approaches to interfere with release site clearance by blocking membrane retrieval (endocytosis). They compare the effects on short-term plasticity with those obtained by pharmacologically inhibiting scaffold protein activity. The data presented make a compelling case for fast endocytosis as necessary for rapid site clearance and vesicle recruitment to active zones. The data reveal an unexpected, fast role for local site clearance in counteracting synaptic depression.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear cut and convincing. The study also examines the physiological roles of the site clearance using action potential-evoked transmission at physiological Ca and physiological temperature in mature animals. This condition has not been examined before.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated and the authors have tried several reagents to verify the overall conclusions.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or of the cortical actin scaffold both in the calyx of Held synapse and hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (Calyx of Held) tuned to reliably transmit at several 100 Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the Calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We dissected the latrunculin effect further by referring to the related literature within the scope of this study in the revised Discussion section (last paragraph).

      Reviewer #3 (Public Review):

      The manuscript by Mahapatra and Takahashi addresses the role of presynaptic release site clearance during sustained synaptic activity. The authors characterize the effects of pharmacologically interfering with SV endocytosis (pre-incubation with Dynasore or Pitstop-2) on synaptic short-term plasticity (STP) at two different CNS synapses (calyx of Held synapses and hippocampal SC to CA1 synapses) using patch-clamp recordings in acute slices under experimental conditions designed to closely mimic a physiological situation (37 °C and 1.3 mM external [Ca2+]). Endocytosis blocker-induced changes in STP and in the recovery from short-term depression (STD) are compared to those seen after pharmacologically inhibiting actin filament assembly (pre-incubation with Latrunculin-B or the selective Cdc42 GTPase inhibitor ML-141). Presynaptic capacitance (Cm) recordings in calyx terminals were used to establish the effects of the pharmacological maneuvers on SV endocytosis.

      Latrunculin-B and ML-141 affect neither SV endocytosis (assayed by Cm recordings) nor EPSC recovery following conditioning trains, but strongly enhance STD at calyx synapses. No changes in STP were observed at Latrunculin-B- or ML-141-treated SC to CA1 synapses.

      Dynasore and Pitstop-2 slow down endocytosis, limit the total amount of exocytosis in response to long stimuli, enhance STD in response to 100 Hz stimulation, but profoundly accelerate EPSC recovery following conditioning 100 Hz trains at calyx synapses. At SC to CA1 synapses, Dynasore and Pitstop-2 reduce the extent of facilitation and lower relative steady-state EPSCs, suggesting a change in the facilitation-depression balance in favor of the latter.

      The authors use state-of-the-art techniques, and their data, which are clearly presented, lead the authors to conclude that endocytosis is universally important for clearance of release sites while the importance of scaffold protein-mediated site clearance is limited to 'fast synapses'.

      Unfortunately, and perhaps not completely unexpected in view of the pharmacological tools chosen, there are several observations which remain difficult to understand:

      (1) Blocking site clearance affects release sites that have previously been used, i.e. sites at which SV fusion has occurred and which therefore need to be cleared. Calyces use at most 20% of all release sites during a single AP, likely fewer at 1.3 mM external [Ca2+]. Even if all those 20% of release sites become completely unavailable due to a block of release site clearance, the 2nd EPSC in a train should not be reduced by >20% because ~80% of the sites cannot be affected. However, ~50% EPSC reduction was observed (Fig. 2B1, lower right panel) raising the possibility that Dynasore does more than specifically interfering with SVs endocytosis (and possibly Pitstop as well). Non-specific effects are also suggested by the observed two-fold increase in initial EPSC size in SC to CA1 synapses after Dynasore pre-incubation.

      This study compares different experimental conditions to draw conclusions about the physiological role of endocytosis in rapid neurotransmission at the large calyceal synapse in mice. A related study at the Drosophila neuromuscular junction (Kawasaki et al., Nat. Neuroscience 2000) reported similar findings in comparable experimental settings (physiological conditions and acute block of endocytosis).

      (2) More severe depression was observed at calyx synapses after blocking endocytosis which the authors attribute to a presynaptic mechanism affecting pool replenishment. When probing EPSC recovery after conditioning 100 Hz trains, a speed up was observed mediated by an "unknown mechanism" which is "masked in 2 mM [Ca2+]". These two observations, deeper synaptic depression during 100 Hz but faster recovery from depression following 100 Hz, are difficult to align and no attempt was made to find an explanation.

      By varying temperature (PT vs RT), calcium concentration (1.3 mM vs 2.0 mM), and stimulation frequency (10, 100, and 200 Hz; some data are not shown), the effect of endocytosis block on EPSC STD and recovery from STD kinetics at the post-hearing calyx were compared in these settings: (PT, 1.3 mM [Ca2+]), (PT, 2.0 mM Ca2+), and (RT, 2.0 mM [Ca2+]), to dissect their respective role.

      (3) To reconcile previous data reporting a block of Ca2+-dependent recovery (CDR) by Dynasore or Latrunculin (measured at 2 mM external [Ca2+]) with the data presented here (using 1.3 mM external [Ca2+]) reporting no effect or a speed up of recovery from depression, the authors postulate that "CDR may operate only when excessive Ca2+ enters during massive presynaptic activation" (page 10 line 244). While that is possible, such explanation ignores plenty of calyx studies demonstrating fiber stimulation-induced CDR and elucidating molecular pathways mediating fiber stimulation-induced CDR, and it also completely dismisses the strong change in recovery time course after 10 Hz conditioning (single exponential) as compared to 100 Hz conditioning (double exponential with a pronounced fast component).

      Strong presynaptic stimuli such as those illustrated in Figs. 1B,C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Documentation of the corresponding conductance traces is therefore advisable for such massive Cm jumps and merely mentioning that the first 450 ms after stimulation were skipped during analysis or referring to previous publications showing conductance traces is insufficient.
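
      The arithmetic behind these estimates can be checked with a short sketch. It uses only the single-SV capacitance (80 aF) and mean terminal capacitance (~16 pF) assumed in the paragraph above; the function names are illustrative and not taken from the manuscript.

```python
# Back-of-the-envelope check of the vesicle count implied by a capacitance jump.
# The 80 aF per vesicle and ~16 pF per terminal are the assumptions stated in the text.

SV_CAPACITANCE_F = 80e-18        # single synaptic vesicle capacitance (80 aF)
TERMINAL_CAPACITANCE_F = 16e-12  # mean whole-terminal capacitance (~16 pF)


def vesicles_fused(delta_cm_farad):
    """Number of vesicles whose fusion would account for the capacitance jump."""
    return delta_cm_farad / SV_CAPACITANCE_F


def fractional_membrane_increase(delta_cm_farad):
    """Capacitance jump as a fraction of the resting terminal capacitance."""
    return delta_cm_farad / TERMINAL_CAPACITANCE_F


for delta_pf in (2.0, 2.5):
    delta = delta_pf * 1e-12  # convert pF to F
    print(
        f"{delta_pf} pF: {vesicles_fused(delta):,.0f} SVs, "
        f"{fractional_membrane_increase(delta):.1%} membrane increase"
    )
```

      Under these assumptions, a 2 to 2.5 pF jump corresponds to 25,000 to about 31,000 SVs and a 12.5 to 15.6% increase in membrane surface, in line with the rounded figures quoted above.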

      All bar graphs in Figures 1 through 6 and Figures S3 through S6 compare three or even four (Fig. 5C) conditions, i.e. one control and at least two treatment data sets. It appears as if repeated t-tests were used to run multiple two-group comparisons (i.e. using the same control data twice for two different comparisons). Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.

      We updated the statistical analysis of all data using one-way ANOVA and t-tests with the Bonferroni-Holm method of p-level correction, and rectified one analysis in Fig. 1 and 3; all major conclusions are unchanged.
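
      For readers unfamiliar with the Bonferroni-Holm step-down correction, a minimal sketch follows. It is an illustration only, not the analysis code used for the manuscript; the function name and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down correction.

    Returns (adjusted_p, reject), both in the original order of p_values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted p-values never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject


# Hypothetical p-values from comparing one control group with three treatments.
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03])
print(adjusted, reject)
```

      Compared with a plain Bonferroni correction, which multiplies every p-value by the number of comparisons, the step-down procedure is uniformly less conservative while still controlling the family-wise error rate.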

      Finally, the terminology of contrasting "fast-signaling" (calyx synapses) and "slow-plastic" (SC synapses) synapses seems to imply that calyx synapses lack plasticity, as does the wording "conventional bouton-type synapses involved in synaptic plasticity" (page 11, line 251). I assume, the authors primarily refer to the maximum frequencies these two synapse types typically transmit (fast-signaling vs slow-signaling)?

      The properties of these two synapses are described explicitly in the updated text, and they are renamed fast and slow synapses.

      Recommendations for the authors:

      Reviewer #3 (Recommendations For The Authors):

      'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be done sometimes here.

      In this revision, we described them more explicitly.

      The data presented in Fig. S6 are detached from the rest of the manuscript, not relevant and should be removed. page 4 line 95 "... to ensure sufficient Ca2+ currents to induce exo-endocytosis." ICa is large enough to induce exocytosis also at 1.3 mM Ca2+. Please clarify.

      We updated the relevant section.

      page 5, line 108 "... this slow endocytosis showed a strongly prolonged time course without accompanied by the change of Cm or presynaptic Ca2+ currents" Please fix.

      Fixed.

      page 5, line 121 "Thus, at calyces of Held, bath-application of Dynasore or Pitstop-2 can block both fast and slow endocytosis without perturbing presynaptic intracellular milieu." Bath-application never perturbs the intracellular milieu. Please clarify.

      Rephrased.

      page 6 line 128 "... physiological aCSF" is a misnomer (= physiological artificial CSF). Please fix.

      In the introduction section, it is clearly described.

      page 11, line 252 "... from hippocampal SC-CA1 pyramidal neurons" There are no "SC-CA1 pyramidal neurons". Please fix.

      Fixed.

      page 12, line 285 "In acute slices optimized to physiological conditions" The conditions are optimized, not the slices. Please fix.

      Fixed.

      page 14, line 323 same as above

      Fixed.

      page 14, line 330 LTP at SC-CA1 synapses is postsynaptic. Please clarify.

      Rephrased

      page 16, line 381 "had a series resistance of 3-4 MOhm" versus

      page 17, line 408 "The patch pipettes had a series resistance of 5-15 MOhm (less than 10 MOhm in most cells)" 3-4 is perhaps pipette resistance while 5-15 is perhaps series resistance? Please clarify.

      Fixed.

      page 17, line 398 "Cm traces were averaged at every 10 ms (for 10 Hz train stimulation) or 20 ms (for 5 ms single or 1 Hz train stimulation)." Do you mean to say that Cm traces were smoothed with a moving average using a window size of 10 or 20 ms duration? Please clarify.

      Rephrased to clarify better.

      page 18, "All values are given as mean ± SEM and significance of difference was evaluated by Student's unpaired t-test, unless otherwise noted." Please check. You cannot simply use repeated t-tests for multiple comparisons. Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.

      All statistical analyses have been updated using one-way ANOVA and t-tests with the Bonferroni-Holm method of p-level correction, and one analysis has been rectified in Fig. 1 and 3, with no change in the major conclusions.

    1. Author response:

      Response to Reviewer #1 (Public Review):

      We thank the reviewer for their constructive criticism of our study, their proposed solutions, and for highlighting areas of the methodology and analytical pipeline where explanations were unclear or unsatisfactory. We will take the reviewer’s feedback into account to improve the clarity and readability of the revised manuscript. We acknowledge the importance of ruling out eye movements as a potential confound. We address these concerns briefly below, but a more detailed explanation (and a full breakdown of the relevant analyses, including the corrected and uncorrected results) will be provided in the revised manuscript.

      First, the source of EEG activity recorded from the frontal electrodes is often unclear. Without an external reference, it is challenging to resolve the degree to which frontal EEG activity represents neural or muscular responses (1). Thus, as a preventative measure against the potential contribution of eye movement activity, for all our EEG analyses, we only included activity from occipital, temporal, and parietal electrodes (the selected electrodes can be seen in the final inset of Figure 3).

      Second, as suggested by the reviewer, we re-ran our analyses using the activity measured from the frontal electrodes alone. If the source of the nonlinear decoding accuracy in the AV condition was muscular activity produced by eye movements, we would expect to observe better decoding accuracy from sensors closer to the source. Instead, we found that decoding accuracy from the frontal electrodes (peak d' = 0.08) was less than half that of decoding accuracy from the more posterior electrodes (peak d' = 0.18). These results suggest that the source of neural activity containing information about stimulus position was located over occipito-parietal areas, consistent with our topographical analyses (inset of Figure 4).

      Third, we compared the average eye movements between the three main sensory conditions (auditory, visual, and audiovisual). In the visual condition, there was little difference in eye movements corresponding to the five stimulus locations, likely because the visual stimuli were designed to be spatially diffuse. For the auditory and audiovisual conditions, there was more distinction between eye movements corresponding to the stimulus locations. However, these appeared to be the same between auditory and audiovisual conditions. If consistent saccades to audiovisual stimuli had been responsible for the nonlinear decoding we observed, we would expect to find a higher positive correlation between horizontal eye position and stimulus location in the audiovisual condition than in the auditory or visual conditions. Instead, we found no difference in correlation between audiovisual and auditory stimuli, indicating that eye movements were equivalent in these conditions and unlikely to explain better decoding accuracy for audiovisual stimuli.

      Finally, we note that the stricter eye movement criterion acknowledged in the Discussion section of the original manuscript resulted in significantly better audiovisual d' than the MLE prediction, but this difference did not survive cluster correction. This is an important distinction because, when combined with the results described above, it supports our original interpretation that the stricter criterion, together with our conservative (mass-based) cluster correction (2), led to a type II error.

      References

      (1) Roy, R. N., Charbonnier, S., & Bonnet, S. (2014). Eye blink characterization from frontal EEG electrodes using source separation and pattern recognition algorithms. Biomedical Signal Processing and Control, 14, 256–264.

      (2) Pernet, C. R., Latinus, M., Nichols, T. E., & Rousselet, G. A. (2015). Cluster-based computational methods for mass univariate analyses of event-related brain potentials/fields: A simulation study. Journal of Neuroscience Methods, 250, 85–93.

      Response to Reviewer #2 (Public Review):

      We thank the reviewer for their insight and constructive feedback. As emphasized in the review, an interesting question that arises from our results is why, if the neural data exceed the optimal statistical decision (MLE d'), the behavioural data do not. We agree with the reviewer’s suggestion that more attention should be devoted to this question, and plan to provide a deeper discussion of the relationship between behavioural and neural super-additivity in the revised manuscript. We also note that while this discrepancy remains unexplained, our results are consistent with the literature. That is, both non-linear neural responses (single-cell recordings) and behavioural responses that match MLE are reliable phenomena in multisensory integration (1-4).

      One possible explanation for this puzzling discrepancy is that behavioural responses occur sometime after the initial neural response to sensory input. There are several subsequent neural processes between perception and a behavioural response (5), all of which introduce additional noise that may obscure super-additive perceptual sensitivity. In particular, the mismatch between neural and behavioural accuracy may be the result of additional neural processes that translate sensory activity into a motor response to perform the behavioural task.

      Our measure of neural super-additivity (exceeding optimally weighted linear summation) differs from how it is traditionally assessed (exceeding summation of single neuron responses) (2). Neither method has yet fully explained how this neural activity translates to behavioural responses, and we think that more work is needed to resolve the abovementioned discrepancy. Our method should nonetheless facilitate this work by providing a reliable way of measuring neural super-additivity in humans using non-invasive recordings.
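
      To make the comparison against the MLE prediction concrete, the sketch below uses the textbook result for optimal combination of two independent cues, in which the predicted bimodal sensitivity is the root sum of squares of the unimodal sensitivities. This is an assumption about the general form of the benchmark, not a description of the exact computation in the manuscript, and the function names and example values are hypothetical.

```python
import math


def mle_predicted_dprime(d_auditory, d_visual):
    """Optimal (MLE) bimodal d' assuming independent, Gaussian-distributed cues."""
    return math.hypot(d_auditory, d_visual)  # sqrt(d_A**2 + d_V**2)


def is_super_additive(d_audiovisual, d_auditory, d_visual):
    """True when the observed bimodal d' exceeds the MLE prediction."""
    return d_audiovisual > mle_predicted_dprime(d_auditory, d_visual)


# Hypothetical unimodal and bimodal sensitivities.
print(mle_predicted_dprime(0.3, 0.4))
print(is_super_additive(0.6, 0.3, 0.4))
```

      Under this benchmark, a bimodal d' that merely matches the prediction reflects optimal but linear integration; only values above it would count as super-additive.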

      References

      (1) Alais, D., & Burr, D. (2004). The ventriloquist effect results from near-optimal bimodal integration. Current Biology, 14(3), 257–262.

      (2) Ernst, M. O., & Banks, M. S. (2002). Humans integrate visual and haptic information in a statistically optimal fashion. Nature, 415(6870), 429–433.

      (3) Meredith, M. A., & Stein, B. E. (1983). Interactions among converging sensory inputs in the superior colliculus. Science, 221, 389–391.

      (4) Stanford, T. R., & Stein, B. E. (2007). Superadditivity in multisensory integration: putting the computation in context. NeuroReport, 18, 787–792.

      (5) Heekeren, H., Marrett, S. & Ungerleider, L. (2008). The neural systems that mediate human perceptual decision making. Nature Reviews Neuroscience, 9, 467–479.

    1. Author response:

      Thanks for the eLife assessment

      “This study employed a comprehensive approach to examining how the MT+ region integrates into a complex cognition system in mediating human visuo-spatial intelligence. While the findings are useful, the experimental evidence is incomplete and the study design, hypothesis, analyses, writing, and presentation need to be improved.” We plan to revise the manuscript according to the comments of Public Reviews.

      We are grateful for the excellent and very helpful comments, and we provide our provisional responses below.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      (1) In the intro, it seems to me that the multiple-demand (MD) regions are the key in this study. However, I didn't see any results associated with the MD regions. Did I miss something??

      We thank the reviewer for pointing this out. After careful consideration, we agree with this point. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. This suggests that hMT+ has the potential to be a core of the MD system. However, because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to show that hMT+ is a core node of the MD system. We will therefore adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing the significance of hMT+ in the MD system.

      (2) How was the sample size determined? Is it sufficient??

      We thank the reviewer for pointing this out. We used G*Power to determine our sample size. Melnick et al. (2013) reported a medium effect between SI and the Perceptual Reasoning sub-ability (r = 0.47). We used this r value as the correlation coefficient (ρ H1), setting power at the commonly used threshold of 0.8 and the alpha error probability at 0.05. The required sample size was calculated to be 26, which gives our study adequate power to yield valid statistical results. Furthermore, compared with earlier within-subject studies such as Schallmo et al. (2018), which used 22 datasets to examine GABA levels in MT+ and the early visual cortex (EVC), our study includes a larger dataset.
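
      A rough cross-check of such a power calculation can be sketched with the Fisher z approximation. This is not G*Power's exact routine, and the response does not state whether a one- or two-tailed test was specified, so the function below (a hypothetical helper) exposes that choice; results can differ from G*Power by a participant or so.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.8, two_tailed=True):
    """Approximate sample size needed to detect a correlation r (Fisher z method)."""
    z = NormalDist()
    # Critical value for the chosen alpha and sidedness.
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    # Quantile corresponding to the desired power.
    z_beta = z.inv_cdf(power)
    # Fisher z-transform of the expected correlation.
    effect = math.atanh(r)
    return math.ceil(((z_alpha + z_beta) / effect) ** 2 + 3)


print(n_for_correlation(0.47, two_tailed=False))
print(n_for_correlation(0.47, two_tailed=True))
```

      The one-tailed setting yields the smaller requirement, so reporting the sidedness alongside r, alpha, and power makes the calculation reproducible.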

      (3) In Schallmo elife 2018, there was no correlation between GABA concentration and SI. How can we justify the different results different here?

      We thank the reviewer for pointing this out. Our study differs from theirs in several respects:

      a. While the earlier study by Schallmo et al. (2018) employed 3T MRS, we utilize 7T MRS, enhancing our ability to detect and measure GABA with greater accuracy.

      b. Schallmo et al. (2018) used bilateral hMT+ as the MRS measurement region, whereas we used left hMT+. The reasons for focusing on left hMT+ are described in our response to Reviewer #1, point (6). Briefly, the use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011).

      c. The MRS voxel in Schallmo et al. (2018) was 3 cm isotropic, whereas ours is 2 cm isotropic. This helps us locate hMT+ more precisely and exclude more white-matter signal.

      (4) Basically this study contains the data of SI, BDT, GABA in MT+ and V1, Glu in MT+ and V1-all 6 measurements. There should be 6x5/2 = 15 pairwise correlations. However, not all of these results are included in Figure 1 and supplementary 1-3. I understand that it is not necessary to include all figures. But I suggest reporting all values in one Table.

      We thank the reviewer for the good suggestion. We plan to report all values in a correlation matrix.
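
      The count of 15 pairwise correlations follows from choosing 2 of the 6 measurements. A small sketch enumerating the pairs is given below; the variable labels are shorthand chosen here for illustration, not names used in the manuscript.

```python
from itertools import combinations

# Shorthand labels for the six measurements listed by the reviewer.
measures = ["SI", "BDT", "GABA_MT+", "GABA_V1", "Glu_MT+", "Glu_V1"]

# Every unordered pair of distinct measures: 6 * 5 / 2 = 15.
pairs = list(combinations(measures, 2))

print(len(pairs))
for first, second in pairs:
    print(f"{first} vs {second}")
```

      Each pair corresponds to one off-diagonal cell in the upper triangle of the planned correlation matrix.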

      (5) In Melnick (2013), the IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used the visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III?

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.

      (6) In the functional connectivity part, there is no explanation as to why only the left MT+ was set to the seed region. What is the problem with the right MT+?

      We thank the reviewer for pointing this out. The main reason is that our MRS ROI is the left hMT+, and we would like to keep the ROI consistent across the different models. The use of left MT/V5 as a target was motivated by studies demonstrating that left MT/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). In addition, we will check the results of our localizer to confirm whether similar findings are consistently replicated.

      (7) In Melnick (2013), the authors also reported the correlation between IQ and absolute duration thresholds of small and large stimuli. Please include these analyses as well.

      We thank the reviewer for the good advice. Including these results will help researchers compare our findings with those of Melnick et al. We plan to add the corresponding figure in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Recent studies have identified specific regions within the occipito-temporal cortex as part of a broader fronto-parietal, domain-general, or "multiple-demand" (MD) network that mediates fluid intelligence (gF). According to the abstract, the authors aim to explore the mechanistic roles of these occipito-temporal regions by examining GABA/glutamate concentrations. However, the introduction presents a different rationale: investigating whether area MT+ specifically, could be a core component of the MD network.

      Strengths:

      The authors provide evidence that GABA concentrations in MT+ and its functional connectivity with frontal areas significantly correlate with visuo-spatial intelligence performance. Additionally, serial mediation analysis suggests that inhibitory mechanisms in MT+ contribute to individual differences in a specific subtest of the Wechsler Adult Intelligence Scale, which assesses visuo-spatial aspects of gF.

      Weaknesses:

      (1) While the findings are compelling and the analyses robust, the study's rationale and interpretations need strengthening. For instance, Assem et al. (2020) have previously defined the core and extended MD networks, identifying the occipito-temporal regions as TE1m and TE1p, which are located more rostrally than MT+. Area MT+ might overlap with brain regions identified previously in Fedorenko et al., 2013; however, the authors attribute these activations to attentional enhancement of visual representations in the more difficult conditions of their tasks. For the aforementioned reasons, it is unclear why the authors chose MT+ as their focus. A stronger rationale for this selection is necessary, along with an account of how it fits with the core/extended MD networks.

      We really appreciate the reviewer’s comments. We focus on hMT+ for the following reasons. According to the results of Melnick et al. (2013), motion surround suppression (SI) and the duration thresholds for small and large gratings, which reflect hMT+ function, are correlated with the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with high correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. In addition, in Fedorenko et al. (2013), the averaged MD activity region appears to overlap with hMT+. Based on these findings, we assume that hMT+ has the potential to be a core of the MD system.

      (2) Moreover, although the study links MT+ inhibitory mechanisms to a visuo-spatial component of gF, this evidence alone may not suffice to position MT+ as a new core of the MD network. The MD network's definition typically encompasses a range of cognitive domains, including working memory, mathematics, language, and relational reasoning. Therefore, the claim that MT+ represents a new core of MD needs to be supported by more comprehensive evidence.

      We thank the reviewer for pointing this out. After careful consideration, we agree with this point. Because our results address only visuo-spatial intelligence, they are not yet sufficient to show that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing the significance of hMT+ in the MD system.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript aims to understand the role of GABA-ergic inhibition in the human MT+ region in predicting visuo-spatial intelligence through a combination of behavioral measures, fMRI (for functional connectivity measurement), and MRS (for GABA/glutamate concentration measurement). While this is a commendable goal, it becomes apparent that the authors lack fundamental understanding of vision, intelligence, or the relevant literature. As a result, the execution of the research is less coherent, dampening the enthusiasm of the review.

      Strengths:

      (1) Comprehensive Approach: The study adopts a multi-level approach, i.e., neurochemical analysis of GABA levels, functional connectivity, and behavioral measures to provide a holistic understanding of the relationship between GABA-ergic inhibition and visuo-spatial intelligence.

      (2) Sophisticated Techniques: The use of ultra-high field magnetic resonance spectroscopy (MRS) technology for measuring GABA and glutamate concentrations in the MT+ region is a recent development.

      Weaknesses:

      Study Design and Hypothesis

      (1) The central hypothesis of the manuscript posits that "3D visuo-spatial intelligence (the performance of BDT) might be predicted by the inhibitory and/or excitation mechanisms in MT+ and the integrative functions connecting MT+ with the frontal cortex." However, several issues arise:

      (1.1) The Suppression Index depicted in Figure 1a, labeled as the "behavior circle," appears irrelevant to the central hypothesis.

      We thank the reviewer for pointing this out. In our study, the inhibitory mechanisms in hMT+ are conceptualized through two models: the neurotransmitter model and the behavior model. The Suppression Index is essential for elucidating the local inhibitory mechanisms within the behavior model. However, we acknowledge that the introduction may not have articulated our hypothesis clearly, potentially leading to misunderstandings. We plan to revise the introduction to clarify these connections and make the relevance of the Suppression Index explicit.

      (1.2) The construct of 3D visuo-spatial intelligence, operationalized as the performance in the Block Design task, is inconsistently treated as another behavioral task throughout the manuscript, leading to confusion.

      We thank the reviewer for pointing this out. We acknowledge that our manuscript may have presented this construct inconsistently across sections, causing confusion. To address this, we plan to use a consistent description of 3D visuo-spatial intelligence in both the introduction and the discussion sections. However, we would like to retain 'Block Design task score' in the results section to make clear to readers which subtest we used.

      (1.3) The schematics in Figure 1a and Figure 6 appear too high-level to be falsifiable. It is suggested that the authors formulate specific and testable hypotheses and preregister them before data collection.

      We thank the reviewer for pointing this out. We plan to revise Figure 1a to make it less abstract and more logical. The schematic in Figure 6 represents our theoretical framework of how hMT+ contributes to 3D visuo-spatial intelligence. We believe the elements of this framework are grounded in related theories and supported by the evidence presented in our results and discussion sections, making them specific and testable.

      (2) Central to the hypothesis and design of the manuscript is a misinterpretation of a prior study by Melnick et al. (2013). While the original study identified a strong correlation between WAIS (IQ) and the Suppression Index (SI), the current manuscript erroneously asserts a specific relationship between the block design test (from WAIS) and SI. It should be noted that in the original paper, WAIS comprises Similarities, Vocabulary, Block design, and Matrix reasoning tests in Study 1, while the complete WAIS is used in Study 2. Did the authors conduct other WAIS subtests other than the block design task?

      Thanks for pointing this out. Reviewer #1 also raised this question, and we reproduce our answer here: “The decision was informed by Melnick’s findings, which indicated high correlations between surround suppression (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that hMT+ is a sensory cortical region involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence.”

      (3) Additionally, there are numerous misleading references and unsubstantiated claims throughout the manuscript. As an example of misleading reference, "the human MT ... a key region in the multiple representations of sensory flows (including optic, tactile, and auditory flows) (Bedny et al., 2010; Ricciardi et al., 2007); this ideally suits it to be a new MD core." The two references in this sentence are claims about plasticity in the congenitally blind with sensory deprivation from birth, which is not really relevant to the proposal that hMT+ is a new MD core in healthy volunteers.

      Thanks for pointing this out. We have carefully read the corresponding references, considered the corresponding theories, and agree with these comments. Because our results address only “the GABA-ergic inhibition in human MT predicts visuo-spatial intelligence mediated by reverberation with frontal cortex”, they are not yet sufficient to show that hMT+ is a core node of the MD system. We will adjust the explanatory logic of the article, emphasizing the role of hMT+ in reducing redundancy and improving information-processing efficiency in visuo-spatial intelligence, while de-emphasizing the significance of hMT+ in the MD system. Regarding the potential central role of hMT+ in the MD system, we agree that research on hMT+ as a multisensory integration hub mainly focuses on developmental processes. In adults, however, the MST region of hMT+ is considered a multisensory integration area for visual and vestibular inputs, which potentially supports a role for hMT+ in multitasking multisensory systems (Gu et al., J. Neurosci., 26(1), 73–85, 2006; Fetsch et al., Nat. Neurosci., 15, 146–154, 2012). Further research could explore how other intelligence sub-abilities, such as working memory and language comprehension, are facilitated by the features of hMT+.

      Another example of unsubstantiated claim: the rationale for selecting V1 as the control region is based on the assertion that "it mediates the 2D rather than 3D visual domain (Born & Bradley, 2005)". That's not the point made in the Born & Bradley (2005) paper on MT. It's crucial to note that V1 is where the initial binocular convergence occurs in cortex, i.e., inputs from both the right and left eyes to generate a perception of depth.

      Thank you for pointing this out. We acknowledge the inappropriate citation of "Born & Bradley, 2005," which focuses solely on the structure and function of the visual area MT. However, we believe that choosing hMT+ as the region for 3D visual analysis and V1 as the control region is justified. Cumming and DeAngelis (Annu. Rev. Neurosci., 24, 203–238, 2001) state that binocular disparity provides the visual system with information about the three-dimensional layout of the environment, and that the link between perception and neuronal activity is stronger in the extrastriate cortex (especially MT) than in the primary visual cortex (V1). This supports our choice and emphasizes the relevance of MT+ in our study. We will correct the reference in the revised version.

      Results & Discussion

      (1) The missing correlation between SI and BDT is crucial to the rest of the analysis. The authors should discuss whether they replicated the pattern of results from Melnick et al. (2013) despite using only one WAIS subtest.

      We thank the reviewer for the suggestion. The correlation result is currently placed in the supplemental material; we will move it back to the main text.

      (2) ROIs: can the authors clarify if the results are based on bilateral MT+/V1 or just those in the left hemisphere? Can the authors plot the MRS scan area in V1? I would be surprised if it's precise to V1 and doesn't spread to V2/3 (which is fine to report as early visual cortex).

      We thank the reviewer for the suggestion. We plan to draw the MRS scanning area of the V1 ROI and use a visual template to check whether the scanning area includes V2/3. If it does, we will refer to it as the early visual cortex rather than specifically V1 in our reporting.

      (3) Did the authors examine V1 FC with either the frontal regions and/or whole brain, as a control analysis? If not, can the author justify why V1 serves as the control region only in the MRS but not in FC (Figure 4) or the mediation analysis (Figure 5)? That seems a little odd given that control analyses are needed to establish the specificity of the claim to MT+

      We thank the reviewer for the suggestion. We plan to analyze the relationship between V1 FC and behavior as a control analysis. As for the mediation analysis, since V1 GABA/Glu shows no correlation with BDT score, the precondition for applying a mediation analysis is not met.

      (4) It is not clear how to interpret the similarity or difference between panels a and b in Figure 4.

      We thank the reviewer for pointing this out. We plan to further interpret the difference between panels a and b in the revised version. Panel a shows the hMT+-region FC that correlates with BDT score, which clearly involves the frontal cortex, while panel b shows the hMT+-region FC that correlates with SI, which involves relatively fewer regions. The overlapping region is what we are interested in, as it explains how local inhibitory mechanisms work in 3D visuo-spatial intelligence. In addition, we would like to revise Figure 4 to point out the overlapping region.

      (5) SI is not relevant to the authors‘ priori hypothesis, but is included in several mediation analyses. Can the authors do model comparisons between the ones in Figure 5c, d, and Figure S6? In other words, is SI necessary in the mediation model? There seem discrepancies between the necessity of SI in Figures 5c/S6 vs. Figure 5d.

      We thank the reviewer for highlighting this point. The relationship between the Suppression Index (SI) and our a priori hypotheses is elaborated in the response to reviewer 3, section (1). SI plays a crucial role in explicating how local inhibitory mechanisms function within the context of the 3D visuo-spatial task. Additionally, Figure 5c illustrates the interaction between the frontal cortex and hMT+, showing how the effects from the frontal cortex (BA46) on the Block Design Task are fully mediated by SI. This further underscores the significance of SI in our model.
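
      As a minimal illustration of what "fully mediated" means in a product-of-coefficients mediation model, the sketch below estimates the indirect (x → m → y) and direct (x → y) paths with ordinary least squares and bootstraps a confidence interval for the indirect effect. The variable names (`fc`, `si`, `bdt`), the effect sizes, and the data are all hypothetical and synthetic; this is not the authors' analysis pipeline, only a sketch of the general technique.

```python
import random
from statistics import mean


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def mediation(x, m, y):
    """Product-of-coefficients mediation for x -> m -> y using OLS."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx                           # path a: x -> m
    det = sxx * smm - sxm ** 2
    b = (smy * sxx - sxy * sxm) / det       # path b: m -> y, controlling for x
    direct = (sxy * smm - smy * sxm) / det  # path c': x -> y, controlling for m
    total = sxy / sxx                       # path c: x -> y without the mediator
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


def bootstrap_indirect_ci(x, m, y, n_boot=500, seed=0):
    """Approximate 95% percentile bootstrap interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants
        xs, ms, ys = ([v[i] for i in idx] for v in (x, m, y))
        estimates.append(mediation(xs, ms, ys)["indirect"])
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic data (assumption): the predictor affects the outcome only via the mediator.
rng = random.Random(1)
fc = [rng.gauss(0, 1) for _ in range(150)]         # hypothetical frontal-hMT+ FC
si = [0.6 * v + rng.gauss(0, 0.5) for v in fc]     # hypothetical suppression index
bdt = [0.7 * v + rng.gauss(0, 0.5) for v in si]    # hypothetical BDT score

result = mediation(fc, si, bdt)
ci_low, ci_high = bootstrap_indirect_ci(fc, si, bdt)
print(result, (ci_low, ci_high))
```

      In this framing, full mediation corresponds to an indirect effect whose interval excludes zero together with a direct effect that is indistinguishable from zero. The same logic shows why a mediation model is uninformative when the candidate predictor has no relationship with the outcome to begin with.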

      (6) The sudden appearance of "efficient information" in Figure 6, referring to the neural efficiency hypothesis, raises concerns. Efficient visual information processing occurs throughout the visual cortex, starting from V1. Thus, it appears somewhat selective to apply the neural efficiency hypothesis to MT+ in this context.

      We thank the reviewer for highlighting this point. There is no doubt that V1 is involved in efficient visual information processing. However, in our results, V1 GABA shows no significant correlation with BDT score, suggesting that efficient processing in V1 might not sufficiently account for the individual differences in 3D visuo-spatial intelligence. Additionally, we will clarify our use of the neural efficiency hypothesis by incorporating it into the introduction of our paper to better frame our argument.

      Transparency Issues:

      (1) Don't think it's acceptable to make the claim that "All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary information". It is the results or visualizations of data analysis, rather than the raw data themselves, that are presented in the paper/supp info.

      We thank the reviewer for pointing this out. We realize that this statement could lead to confusion, and we will delete it.

      (2) No GitHub link has been provided in the manuscript to access the source data, which limits the reproducibility and transparency of the study.

      We thank the reviewer for pointing this out. We will provide the GitHub link in the revised version.

      Minor:

      "Locates" should be replaced with "located" throughout the paper. For example: "To investigate this issue, this study selects the human MT complex (hMT+), a region located at the occipito-temporal border, which represents multiple sensory flows, as the target brain area."

      We thank the reviewer for pointing this out. We will revise it.

      Use "hMT+" instead of "MT+" to be consistent with the term in the literature.

      We thank the reviewer for pointing this out. We agree and will use hMT+ throughout, consistent with the literature.

      "Green circle" in Figure 1 should be corrected to match its actual color.

      We thank the reviewer for pointing this out. We will revise it.

      The abbreviation for the Wechsler Adult Intelligence Scale should be "WAIS," not "WASI."

      We thank the reviewer for pointing this out. We will revise it.

    1. Author Response:

      We appreciate the thorough comments from the reviewers. Before revising the manuscript, we would like to briefly reply to the main concerns raised:

      • Is pupil size a reliable proxy of effort? A vast amount of work demonstrates that pupil size sensitively scales with fluctuations in effort: for instance, the pupil dilates with increasing load in working memory or multiple object tracking tasks, and such pupillary effects robustly explain individual differences in cognitive ability and fluctuations in performance across trials (1–4). This extends to the planning of movements, as pupil dilations are observed prior to the execution of (eye) movements (5). As reviewed previously (6–12; each based on a vast literature), any increase in effort is associated with an increase in pupil size. Inadvertently, we phrased this as if the link between effort and pupil size was established via shared neural correlates. However, this is not the case, as the link between effort and pupil size had been established well before the underlying neural circuitry of this relationship was investigated in detail. During the revision, we plan to rewrite this section to clarify that pupil size indexes effort and to provide a clear distinction between this link and the putative neural underpinnings of such effort-linked modulations.

      • Is saccade latency an alternative explanation for the link between effort and saccade selection? Longer saccade latencies may imply more complex oculomotor programming (e.g. saccades with larger amplitudes require longer latencies for non-microsaccades (13), and latencies increase when distractors are presented (14)), and latencies are indeed known to differ across directions (15, 16). As suggested, it is possible that saccade latencies may also predict saccade preferences. However, even if this is the case, this would not constitute an alternative explanation. As saccade latency may index oculomotor programming complexity, it can potentially be considered an alternative outcome measure of effort, albeit restricted to the context of saccades. Therefore, if saccade latencies predict saccade preferences, this would not affect our conclusion; rather, it would constitute converging evidence that effort drives saccade selection.

      A related question is why one would use pupil size as a measure of effort, given the methodological care that pupillometry requires. There are a number of points that make pupil size sensible and promising in comparison with saccade latencies. In contrast to saccade latencies, pupil size can capture the effort of different effector systems (e.g. head or hand movements), and potentially even the effort associated with covert shifts of attention. Moreover, pupil size is a temporally rich and continuous measure that makes it possible to isolate processes unfolding prior to (eye) movement onset (e.g. oculomotor programming). Together, this makes pupil size a powerful tool to study the costs of visual selection more broadly. In the revision, we will add analyses incorporating latencies and other saccade metrics. We will also discuss the differences between pupil size and saccade latencies in capturing saccade costs and effort.

      • Are the current results causal or correlational? Most of the currently reported results are indeed correlational in nature. In our first tasks, we correlated pupil size during saccade planning with saccade preferences in a subsequent task. Although the link across tasks was correlational, the observed relationship clearly followed our previously specified hypothesis (17). Moreover, experiments 1 and 2 of the visual search data replicated and extended this relationship. We also directly manipulated cognitive demand in the second visual search experiment. In line with the hypothesis that effort affects saccade selection, participants executed fewer saccades overall when performing a (primary) auditory dual task, and cut the costly saccades most. Although the evidence is mostly correlational, we do not know of a more fitting and parsimonious explanation for our findings than effort predicting saccade selection. We will address causality in the discussion for transparency and point more clearly to the second visual search experiment for causal evidence.

      References

      (1) Alnæs, D. et al. Pupil size signals mental effort deployed during multiple object tracking and predicts brain activity in the dorsal attention network and the locus coeruleus. J. Vis. 14, 1 (2014).

      (2) Koevoet, D., Strauch, C., Van der Stigchel, S., Mathôt, S. & Naber, M. Revealing visual working memory operations with pupillometry: Encoding, maintenance, and prioritization. WIREs Cogn. Sci. e1668 (2023) doi:10.1002/wcs.1668.

      (3) Robison, M. K. & Unsworth, N. Pupillometry tracks fluctuations in working memory performance. Atten. Percept. Psychophys. 81, 407–419 (2019).

      (4) Unsworth, N. & Miller, A. L. Individual Differences in the Intensity and Consistency of Attention. Curr. Dir. Psychol. Sci. 30, 391–400 (2021).

      (5) Richer, F. & Beatty, J. Pupillary Dilations in Movement Preparation and Execution. Psychophysiology 22, 204–207 (1985).

      (6) Bumke, O. Die Pupillenstörungen Bei Geistes-Und Nervenkrankheiten. (Fischer, 1911).

      (7) Kahneman, D. Attention and Effort. (Prentice-Hall, 1973).

      (8) van der Wel, P. & van Steenbergen, H. Pupil dilation as an index of effort in cognitive control tasks: A review. Psychon. Bull. Rev. 25, 2005–2015 (2018).

      (9) Loewenfeld, I. E. Mechanisms of reflex dilatation of the pupil. Doc. Ophthalmol. 12, 185–448 (1958).

      (10) Mathôt, S. Pupillometry: Psychology, Physiology, and Function. J. Cogn. 1, 16 (2018).

      (11) Sirois, S. & Brisson, J. Pupillometry. WIREs Cogn. Sci. 5, 679–692 (2014).

      (12) Strauch, C., Wang, C.-A., Einhäuser, W., Van der Stigchel, S. & Naber, M. Pupillometry as an integrated readout of distinct attentional networks. Trends Neurosci. 45, 635–647 (2022).

      (13) Kalesnykas, R. P. & Hallett, P. E. Retinal eccentricity and the latency of eye saccades. Vision Res. 34, 517–531 (1994).

      (14) Walker, R., Deubel, H., Schneider, W. X. & Findlay, J. M. Effect of Remote Distractors on Saccade Programming: Evidence for an Extended Fixation Zone. J. Neurophysiol. 78, 1108–1119 (1997).

      (15) Hanning, N. M., Himmelberg, M. M. & Carrasco, M. Presaccadic attention enhances contrast sensitivity, but not at the upper vertical meridian. iScience 25, 103851 (2022).

      (16) Hanning, N. M., Himmelberg, M. M. & Carrasco, M. Presaccadic Attention Depends on Eye Movement Direction and Is Related to V1 Cortical Magnification. J. Neurosci. 44, (2024).

      (17) Koevoet, D., Strauch, C., Naber, M. & Van der Stigchel, S. The Costs of Paying Overt and Covert Attention Assessed With Pupillometry. Psychol. Sci. 34, 887–898 (2023).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We have made revisions accordingly. The following is a list of the changes we have made in this revised Version of Record:

      (1) We have added three more panels to Figure 1-figure supplement 1, showing that lipopolysaccharide-induced severe lung injury also generates some ectopic tuft cells expressing both Dclk1 and Gα-gustducin, a G protein α subunit expressed in taste bud cells and many tuft cells.

      (2) We have added a new supplemental figure, Figure 2-figure supplement 1, showing a reanalysis of the single-cell RNAseq dataset (GSE197163) that indicates the numbers of Trpm5-GFP+ ectopic tuft cells expressing Tas2r108, Tas2r105, Tas2r138, Tas2r137 and other Tas2rs, respectively. The original “Figure 2-figure supplement 1” in the previous version has been renamed “Figure 2-figure supplement 2”.

      (3) We have added another new supplemental figure, Figure 3-figure supplement 1, showing that the H1N1 infection-damaged lung tissue volumes in the Gng13-cKO mice are significantly greater than those in WT or Trpm5-/- mice, which is in agreement with the data on the injured lung surface areas from these three genotypes of mice (Figure 3 C and D). The original “Figure 3-figure supplement 1” in the previous version has been renamed “Figure 3-figure supplement 2”.

      (4) We have added two new panels, I and J, to the new Figure 3-figure supplement 2, showing a reanalysis of the single-cell RNAseq dataset (GSE197163) that indicates that about 57% of Trpm5-GFP+ ectopic tuft cells express Gγ13, some of which express Alox5, a key enzyme in the biosynthesis of pro-resolving mediators.

      (5) We have added one reference on Sytox and another on Alox5.

      (6) We have corrected two labeling errors in Figure 3 G and M, and some other typos in the article. We have also removed the “Present address” attached to some authors, since no present address was needed.

      Attached below is our point-by-point reply to the comments and suggestions made by the reviewers. We hope that you and the reviewers will find all concerns satisfactorily addressed.

      Responses to public reviews:

      Reviewer #1:

      Li et al. report here on the expression of a G-protein subunit Gng13 in ectopic tuft cells that develop after severe pulmonary injury in mice. By deleting this gene in ectopic tuft cells as they arise, the authors observed worsened lung injury and greater inflammation after influenza infection, as well as a decrease in the overall number of ectopic tuft cells. This was in stark contrast to the deletion of Trpm5, a cation channel generally thought to be required for all functional gustatory signaling in tuft cells, where no phenotype is observed. Strengths here include a thorough assessment of lung injury via a number of different techniques. Weaknesses are notable: confusingly, these findings are at odds with reports from other groups demonstrating no obvious phenotype upon influenza infection in mice lacking the transcription factor Pou2f3, which is essential for all tuft cell specification and development. The authors speculate that heterogeneity within nascent tuft cell populations, specifically the presence of pro- and anti-inflammatory tuft cells, may explain this difference, but they do not provide any data to support this idea.

      We thank the reviewer for pointing out the strengths of this work. The phenotypes of the Gng13 conditional knockout mice upon severe pulmonary injury seem to be more severe than those of Trpm5 knockout or Pou2f3 knockout mice, which we would attribute to functionally specific tuft cell subtypes. In the intestines, tuft cells are known to promote type II innate immune responses. The ectopic pulmonary tuft cells emerge at 12 days post infection and may not be involved in the initial immune responses to the infection; instead, some of them may contribute to inflammation resolution and functional recovery. Reanalysis of the previously published single tuft cell RNAseq dataset indeed showed that Gng13 is expressed in a subset of these ectopic pulmonary tuft cells, and anti-inflammatory genes such as Alox5 are also found in some of these tuft cells (please see the newly added Figure 3 supplement 2 I and J). Together, these data suggest that while some of these tuft cells may still play a pro-inflammatory role as in the intestines, other Gγ13-expressing tuft cells contribute to inflammation resolution, and disruption of the latter’s function results in the more severe phenotypes.

      Reviewer #2:

      The study by Li et al. aimed to demonstrate the role of the Gγ13-mediated signal transduction pathway in tuft cell-driven inflammation resolution and repairing injured lung tissue. The authors showed a reduced number of tuft cells in the parenchyma of Gγ13 null lungs following viral infection. Mice with a Gγ13 null mutation showed increased lung damage and heightened macrophage infiltration when exposed to the H1N1 virus. Their further findings suggested that lung inflammation resolution, epithelial barrier, and fibrosis were worsened in Gγ13 null mutants.

      Strengths:

      The beautiful immunostaining findings do suggest that the number of tuft cells is decreased in Gr13 null mutants.

      Weaknesses:

      The description of phenotypes, and the approaches used to measure the phenotypes are problematic. Rigorous investigation of the mouse lung phenotypes is needed to draw meaningful conclusions.

      We thank the reviewer for pointing out the major findings and strengths of our work. Regarding the approaches used to measure the phenotypes, we first performed double immunostaining and validated that the lipopolysaccharide-induced DCLK1+ cells are indeed ectopic pulmonary tuft cells with an antibody to Gα-gustducin, a G protein α subunit commonly expressed in taste buds and tuft cells. Second, in addition to measuring the injured lung surface areas, we determined the injured lung tissue volumes by slicing the injured lungs into a series of tissue sections, quantifying the injured areas in each section and then reconstructing the injured volumes. Third, we reanalyzed the previously published single-tuft cell RNAseq dataset and found that a subset (i.e., ~57%) of Trpm5-GFP+ tuft cells express Gng13, some of which express anti-inflammatory genes such as Alox5. These additional data further support our finding that a subset of these Gγ13-expressing ectopic tuft cells may contribute to inflammation resolution while others may play a proinflammatory role.
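
      The volume reconstruction from serial sections described above can be sketched as a Cavalieri-style estimate, in which the injured areas measured on evenly spaced sections are summed and multiplied by the distance between sections. The response does not state the section spacing or the exact reconstruction procedure, so the function names, the spacing, and the areas below are hypothetical values chosen purely for illustration.

```python
def reconstruct_volume(section_areas_mm2, section_spacing_mm):
    """Cavalieri-style estimate: sum of section areas times the section spacing.

    Assumes the sections are evenly spaced along the cutting axis.
    """
    if section_spacing_mm <= 0:
        raise ValueError("section spacing must be positive")
    return section_spacing_mm * sum(section_areas_mm2)


def injured_volume_fraction(injured_areas_mm2, total_areas_mm2, section_spacing_mm):
    """Fraction of the reconstructed lung volume that is injured."""
    injured = reconstruct_volume(injured_areas_mm2, section_spacing_mm)
    total = reconstruct_volume(total_areas_mm2, section_spacing_mm)
    return injured / total


# Hypothetical measurements (assumption): five sections taken 0.2 mm apart.
injured_areas = [0.0, 1.5, 3.0, 2.5, 1.0]   # injured area per section, mm^2
total_areas = [4.0, 8.0, 10.0, 8.0, 4.0]    # whole-tissue area per section, mm^2
spacing = 0.2                               # distance between sections, mm

injured_volume = reconstruct_volume(injured_areas, spacing)
fraction = injured_volume_fraction(injured_areas, total_areas, spacing)
print(f"injured volume: {injured_volume:.2f} mm^3, fraction: {fraction:.3f}")
```

      Reporting the injured fraction alongside the absolute volume normalizes for differences in lung size between animals.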

      Reply to the recommendations of Reviewer #1:

      (1) A major issue with this study is the fact that Chat-Cre mediated knockout of Gng13 leads to reduced tuft cells and impaired recovery, yet global TRPM5 deletion (this study) and global Pou2f3 deletion (Barr et al.) exhibit no apparent phenotype. One can imagine a Trpm5-independent role of Gng13 in tuft cells, but it is much harder to reconcile with the fact that Pou2f3 KO mice, which lack tuft cells entirely, exhibit no apparent phenotype. This was examined in some detail in Barr et al., demonstrating no apparent change in weight loss, dysplastic expansion (Krt5+ cells), or goblet cell metaplasia. The most parsimonious explanation is that Gng13 deletion in another Chat+ cell type, probably neurons of some sort, is leading to this phenotype. The authors really need to investigate this in some detail as the data does not really support a role of tuft cells in the phenotype they observe. Better yet, identification of the other Chat+ cell type in which Gng13 deletion promotes impaired lung recovery would be very interesting. While neurons seem likely, perhaps there is another Chat+ cell type expressing Gng13 in the respiratory tract that could be playing a role as well. In either case, the discrepancy between Pou2f3 KO (no phenotype) and Chat-Cre / Gng13 KO (impaired recovery) is difficult to reconcile.

      We agree with the reviewer, and it took us some time to make sense of the data as well. The differences in phenotypes between Trpm5-knockout and Gng13 conditional knockout (Gng13-cKO) mice could be explained by the fact that Gγ13 is a component of the Gβγ moiety of a heterotrimeric G protein (Gαβγ), which is known to act on many effector enzymes and ion channels, while Trpm5 largely regulates the influx of monovalent cations, depolarizing the plasma membrane. Thus, it is understandable that nullification of Gng13 may have a more profound effect on cell physiology and consequent phenotypes than that of Trpm5, and similar differential effects were also found in the intestines (Frontiers in Immunology, 2023, DOI 10.3389/fimmu.2023.1259521).

      Data from several research groups have indicated that there are subtypes of tuft cells, each of which displays unique gene expression patterns as well as input and output signal profiles. It is not yet well understood how each subtype may contribute to the inflammatory responses or inflammation resolution. Comparative analyses of our data from the Gng13-cKO mice versus those from Pou2f3-KO mice suggest that Gng13-expressing tuft cells may have a role in inflammation resolution, while other ectopic tuft cells may contribute to the maintenance of the inflammation at a certain level, impairing subsequent tissue repair and recovery. The exact molecular and cellular mechanisms are to be revealed in our future studies.

      The central nervous system may also play a role in the impaired lung recovery. But our detailed immunochemical studies did not identify any significant number of neurons innervating the lung tissue co-expressing ChAT and Gng13, suggesting that no immediate action from these neurons may regulate the pulmonary inflammation resolution or functional recovery.

      Together, our data suggest the importance of tuft cell subtype-specific functions, which may help us further understand the role of these rare tuft cells.

      (2) Figures showing alternative injury models inducing the generation of ectopic tuft cells are not convincing and not quantified. DCLK1 can be a bit promiscuous, so verifying tuft cell expansion in these other models with other markers (especially for LPS and HDM which have not been reported elsewhere) is important.

      We agree with the reviewer that DCLK1 is not a very specific marker for tuft cells. We have also observed that chemical inductions of these ectopic tuft cells with bleomycin, HDM or LPS are not as effective as H1N1 viruses. To verify that these rare DCLK1-positive cells are indeed tuft cells, we performed double immunostaining with antibodies to DCLK1 and to Gα-gustducin, another tuft cell marker. The results showed that some of these spindle-shaped DCLK1 positive cells indeed also express Gα-gustducin (see the newly added panels in Figure 1-figure supplement 1), indicating that they are most likely the chemically induced ectopic tuft cells. We also agree with the reviewer that it would be important to further investigate the possible roles of these cells during the stages of the chemically induced injury, inflammation resolution and functional recovery.

      (3) Calcium responses in isolated post-flu tuft cells are interesting but difficult to interpret as presented. Can higher-power images be shown? Also, no statistical analysis is presented to provide any confidence in that data.

      We thank the reviewer for the suggestions. As found in taste buds, only a subset of these ectopic tuft cells expresses Tas2rs, and each of these cells may express a few of the 35 murine Tas2rs. Thus, a particular bitter tasting compound can activate only a few tuft cells, and we had to use low magnification to include more responsive cells in a field under the imaging microscope. We agree with the reviewer that it would be an interesting idea to statistically correlate the response profile to bitter substances with the cell’s Tas2r expression pattern, which we have done with sperm cells before (Molecular Human Reproduction, 2013, doi:10.1093/molehr/gas040). However, the main focus of this work is on the effect of Gng13-cKO in a subset of these ectopic tuft cells on the recovery. We plan to investigate these interesting cells in more detail in the future.

      (4) I am unaware of Sytox being a specific dye for pyroptotic cells. Can the authors please provide a reference or otherwise justify this?

      Sytox is a dye to stain dead cells, which has been used previously in the studies on gasdermin-mediated lytic cell death (Xi et al., Up-regulation of gasdermin C in mouse small intestine is associated with lytic cell death in enterocytes in worm-induced type 2 immunity. PNAS 2021 118(30) e2026307118 https://doi.org/10.1073/pnas.2026307118). In our work we used the dye for the same assay.

      (5) The authors perform qPCR for various taste receptor genes pre- and post-flu, but do not show that these genes are specifically induced in tuft cells. Since single-cell data and bulk RNA-Seq are available from Barr et al., the authors should validate the expression of these Tas2r genes specifically in post-flu tuft cells.

      We thank the reviewer for the suggestion. Yes, we have performed an analysis of the single-cell RNAseq dataset (GSE197163, Barr et al. 2022) and found that among 613 Trpm5-GFP+ tuft cells, Tas2r108 was expressed in the greatest number of cells, i.e., 67 cells, followed by Tas2r105, Tas2r138, Tas2r137, Tas2r118 and Tas2r102, which were detected in 11, 10, 10, 5 and 4 cells, respectively (see the newly added Figure 2-figure supplement 1). This order of expressing cell numbers is very much in agreement with that of the relative Tas2r expression levels obtained with the qPCR experiment (Figure 2A), indicating that these Tas2rs are likely expressed in the ectopic tuft cells. We will further validate the data by analyzing the bulk RNA-Seq dataset when it is accessible to us.

      (6) Some general editing of language throughout would be helpful to increase readability.

      Thanks for pointing this out. We have carefully checked the manuscript, corrected some typos and revised several sentences to increase its readability.

      (7) For the fibrosis analysis, trichrome staining is very heterogenous, which is reflected by the large error bars in Fig. 8B. A more quantitative, "whole lung" analysis such as hydroxyproline content or western blotting for Col1a1 would be ideal.

      The approach of Masson’s trichrome staining along with qRT-PCR assays on the fibrotic gene expression has been used previously to quantitatively analyze fibrosis (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). We agree with the reviewer that there are large error bars in Fig. 8B, and hydroxyproline content assay or western blotting for Col1a1 would be ideal. But our qRT-PCR was performed on the RNA samples extracted from the “whole lungs”, and its data are also able to reflect the extent of fibrosis of the lungs.

      (8) The authors claim that only a subset of tuft cells express Gng13, but this is supported only by a single IF image in Fig. 3 supplement 1G. The authors could download the single-cell dataset from Barr et al. to confirm the heterogeneity of Gng13 expression and get a better sense of the fraction of total ectopic tuft cells that express this, as it is a critical point in their model.

      We thank the reviewer for the suggestion. Yes, we have downloaded and reanalyzed the single-cell RNAseq dataset (GSE197163), and found that out of 613 Trpm5-GFP+ tuft cells, 350 (57%) expressed Gng13 (Figure 3-figure supplement 2I). This result, together with our immunohistochemical data (Figure 3-figure supplement 2G and H), indicates that Gγ13 is expressed in a subset of these ectopic tuft cells. More comprehensive studies are needed to characterize these tuft cell subtypes and elucidate subtype-selective functions.
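
      For reference, the cell counts quoted in these replies can be converted into percentages of the 613 Trpm5-GFP+ tuft cells. The short sketch below uses only the counts stated in the responses; the dictionary layout and the function name are illustrative choices, not part of the authors' analysis code.

```python
TOTAL_TUFT_CELLS = 613  # Trpm5-GFP+ tuft cells in the reanalyzed dataset (GSE197163)

# Numbers of expressing cells as reported in the responses.
expressing_cells = {
    "Gng13": 350,
    "Tas2r108": 67,
    "Tas2r105": 11,
    "Tas2r138": 10,
    "Tas2r137": 10,
    "Tas2r118": 5,
    "Tas2r102": 4,
}


def percent_expressing(n_expressing, n_total):
    """Percentage of cells in which a gene was detected."""
    return 100.0 * n_expressing / n_total


percentages = {
    gene: round(percent_expressing(count, TOTAL_TUFT_CELLS), 1)
    for gene, count in expressing_cells.items()
}

for gene, pct in percentages.items():
    print(f"{gene}: {expressing_cells[gene]}/{TOTAL_TUFT_CELLS} cells ({pct}%)")
```

      Expressed this way, even the most widely detected bitter taste receptor, Tas2r108, is found in roughly a tenth of the cells, whereas Gng13 is detected in more than half of them.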

      Reply to the recommendations of Reviewer #2:

      The study needs more rigorous examinations of the phenotypes. For example, quantification of the injury area in Fig3C is problematic. Similarly, fibrotic phenotype and quantification in Fig 8C also have problems. This study heavily used qRT-PCR analysis to quantitate the level change of bitter/other receptors in a minor population of tuft cells which are also minor in a whole lung. Given the limited number of cells, it is difficult to appreciate that qRT-PCR can pick up the difference. In addition, how would the findings in this study reconcile with the finding by Huang (PMID: 36129169) where pou2f3 null mutants (without tuft cells) were used? Huang et al. did not observe more severe phenotypes in the mice without tuft cells than controls.

      We thank the reviewer for the recommendations. Regarding Fig 3C, please see our reply below under the revisions for clarity, point #2.

      Fig 8 B and C used Masson’s trichrome staining to quantitatively analyze fibrosis, which has been used by other groups as well (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). Our qRT-PCR data on the fibrotic gene expression (Figure 8A) further support the Masson’s trichrome staining results.

      We recognize that tuft cells make up only a minor population in the lungs. We therefore performed qRT-PCR assays on RNA samples isolated mostly from the injured tissues, along with the corresponding tissues from the uninjured lungs as control. To validate our qRT-PCR data, we reanalyzed the previously published single ectopic tuft cell RNAseq dataset (GSE197163), and found that Tas2r108, the most abundantly expressed Tas2r as determined by qRT-PCR, was also expressed in the greatest number of tuft cells, and that the order of expression levels of the other Tas2rs is also in good agreement between the qRT-PCR and single-cell RNAseq data (Figure 2A, Figure 2-figure supplement 1), cross-validating the data obtained by these two very different approaches.

      We have carefully studied the finding by Huang (PMID: 36129169). Our data suggest that there are subtypes of the ectopic tuft cells, some of which contribute to inflammation resolution while others play a proinflammatory role. Indeed, the reanalysis of the aforementioned single tuft cell RNAseq dataset found that about 57% of Trpm5-GFP+ ectopic tuft cells expressed Gng13, some of which expressed Alox5, a key enzyme in the biosynthesis of pro-resolving mediators. Thus, in the Pou2f3-knockout mice, both pro- and anti-inflammatory tuft cells are ablated, and it would be hard to observe any significant phenotypes. When the function of a subset of Gγ13-expressing tuft cells is disrupted, the anti-inflammatory role of these cells is eliminated, resulting in more severe phenotypes. More studies are needed to further understand the subtype-specific functions of these fascinating tuft cells.

      Do Gγ13 null mutants show similar phenotypes in bleomycin injury model?

      Injury models induced by bleomycin and other chemicals indeed engender far fewer ectopic pulmonary tuft cells. Thus, it is more difficult to test the effect of the Gng13 mutation, due to the small number of Gng13-expressing tuft cells in either WT or mutant lungs.

      What is the cell fate of lineage labeled tuft cells in the lungs of Chat-Cre:Ai9:Gng13flox/flox mice following viral infection at different times examined? The numbers were decreased at different time points post-injury based on the data. Did these cells undergo apoptosis? It is an excellent idea to look into the cell fate of ChAT-Cre:Ai9:Gng13flox/flox. We believe that these cells would have a similar fate to other ectopic tuft cells, probably undergoing apoptosis. But our data suggest that Gng13 mutation suppresses the increase the ectopic tuft cells, or the increase of a particular subtype of these tuft cells. Further studies are needed to elucidate the molecular mechanisms of the Gγ13-mediated signal transduction pathways regulating the proliferation of a subset of ectopic tuft cells.

      Here are the revisions for clarity and coherence to the figures:

      (1) Fig 2: For the functional assessment, using tracheal tuft cells from the same ChAT-Cre:Ai9 mice would be a suitable positive control in the calcium response traces experiment. These specific cells could also serve as a control in Fig2a.

      We would agree with the reviewer that tracheal tuft cells from the same ChAT-Cre: Ai9 mice would be an ideal positive control in the calcium response experiment as well as in the qRT-PCR assay. But we have established reliable methods to calcium image primary cells expressing taste receptors and quantify their RNA expression levels, which have been used in our previous publications, e.g., (1) Functional characterization of bitter taste receptors expressed in mammalian testis. Molecular Human Reproduction, 2013, doi:10.1093/molehr/gas040; (2) Infection by the parasitic helminth Trichinella spiralis activates a Tas2r-mediated signaling pathway in intestinal tuft cells. PNAS 2019, www.pnas.org/cgi/doi/10.1073/pnas.1812901116. We thank the reviewer for the excellent suggestion.

      (2) Fig 3C: It is not clear whether the depicted areas really represent the injured area. To provide a more comprehensive view, the authors should also provide histological analysis and quantification of the injured lung. A 3D representation of the injury area would offer a more accurate presentation.

      We thank the reviewer for this point. The depicted areas in Fig 3C are indeed the injured surface areas of the lungs. Following the reviewer’s suggestion, we carried out a histological analysis to determine the injured tissue volumes of the lungs. We fixed the lungs and sliced them into 12 μm-thick sections, which were imaged under a microscope. The injured areas in each section were identified and quantified using the ImageJ software, and the injured volume for that section was then obtained by multiplying the area by the section thickness, i.e., 12 μm. Statistical analyses indicate that the injured volume of the Gng13-cKO lungs is significantly greater than that of WT or Trpm5-KO lungs. This result has been included in Figure 3-figure supplement 1 and is in agreement with the data on the injured surface areas (Fig 3C).
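
      The per-section calculation described above (injured area multiplied by the 12 μm section thickness) can be sketched as follows. This is only an illustrative sketch: the function name, the example areas, and the step of summing the per-section volumes into one total are assumptions made here, not details taken from the response.

```python
def injured_volume_um3(section_areas_um2, thickness_um=12.0):
    """Return the total injured volume in cubic micrometers.

    section_areas_um2: injured area measured in each section (um^2),
        e.g. exported from an image-analysis tool (hypothetical input).
    thickness_um: section thickness; 12 um is the value stated in the text.
    Summing across sections is an assumption of this sketch.
    """
    return sum(area * thickness_um for area in section_areas_um2)


# Hypothetical areas for three sections, in um^2.
example_areas = [1.5e6, 2.0e6, 0.5e6]
total_um3 = injured_volume_um3(example_areas)

# 1 mm^3 equals 1e9 um^3.
print(f"Injured volume: {total_um3 / 1e9:.3f} mm^3")
```

      If only every n-th section were measured, each per-section volume would additionally need to be scaled by the sampling interval, which this sketch does not do.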

      (3) Fig 3 G/I/K/M: There seems to be an inconsistency in the time points. There's no indication for 14 dpi, yet two for 25 dpi. Additionally, a color legend for each sample would be helpful.

      We thank the reviewer for pointing this out. There were two typos, which have been corrected. The time points should be 14 dpi, 20 dpi, 25 dpi and 50 dpi. A color legend has been added as well.

      (4) Fig 4A: Using CD64 co-stained with Krt5 might better highlight the immune cells in the damaged region. Additionally, could you clarify the choice of the neutrophil marker CD64 over CD45 for staining the injured lung?

      We agree with the reviewer that Krt5 antibody staining can help define the damaged region. We sectioned the lung tissues with special attention to the damaged areas, but we found that the adjacent healthy areas also had extra immune cells. We therefore counted all CD64+ cells in both the damaged areas and the surrounding, seemingly healthy areas. We used CD64 instead of CD45 to label these altered immune cells because we found that CD64 better labels the immune cells that differ between WT and Gng13-cKO mice following H1N1 infection. Furthermore, CD64-labeled cells could be readily related to the Gsdmd/Gsdme-expressing F4/80-labeled immune cells shown in Figure 5 and its supplemental figures.

      (5) Fig 5 and Supplemental Fig 5: It appears that the F4/80 staining exhibits notable background staining.

      Yes, there is some background staining. The antibody was the best we could find, but its quality could be further improved. In addition, we think that some cellular debris might have stained positive with that antibody. At a higher magnification, however, we could still identify individual cells co-expressing IL-1β.

      (6) Fig 8C: The depicted area does not seem to adequately represent the fibrosis in the injured lung.

      Masson’s trichrome staining has been previously used to quantitatively analyze fibrosis (e.g., Zhang et al., Neuropilin-1 mediates lung tissue-specific control of ILC2 function in type 2 immunity. Nature Immunology 23:237-250, 2022, https://doi.org/10.1038/s41590-021-01097-8). Our qRT-PCR assays on the fibrotic gene expression (Figure 8A) were performed on the RNA samples extracted from the whole lungs, and the resultant data are able to reflect the extent of fibrosis of the lungs, although we also agree with the reviewer that additional data would make the conclusion more convincing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We express our sincere appreciation for your insightful comments and constructive suggestions. It is with great pleasure that we submit the revised version of our manuscript. Over the past months, we have meticulously considered all the invaluable feedback provided by the three anonymous reviewers, and endeavored to incorporate significant revisions accordingly. Furthermore, we have meticulously rephrased the results section in accordance with your guidance, aiming to bolster the rigor of our manuscript. The specific changes implemented in the revised manuscript are outlined below:

      - Revised the title of the manuscript.

      - Revised the description of early mitotic and meiotic chromosome structure in the scc3 mutant (Lines 167-274).

      - Added the BiFC results illustrating the interaction between SCC3 and other cohesin proteins in Figure S10.

      - Enhanced the detail in the description of figure legends, particularly for Figures 2 and 4.

      - Refined and rephrased the language of the manuscript.

      We hope these positive revisions have substantially strengthened the manuscript. Once again, we extend our heartfelt gratitude for your invaluable input.

      eLife assessment

      This important study elucidates the function of the cohesin subunit SCC3 in impeding DNA repair between inter-sister chromatids in rice. The observation of sterility in the SCC3 weak mutant prompted an investigation of abnormal chromosome behavior during anaphase I through karyotype analysis. While the evidence presented is largely solid, the strength of support can be substantially improved in some aspects, leaving room for further investigation. This research contributes to our understanding of meiosis in rice and attracts cell biologists, reproductive biologists, and plant geneticists.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript describes the identification and characterization of rice SCC3, including the generation and characterization of plants containing apparently lethal null mutations in SCC3 as well as mutant plants containing a C-terminal frame-shift mutation. The weak scc3 mutants showed both vegetative and reproductive defects. Specifically, mitotic chromosomes appeared to partially separate during prometaphase, while meiotic chromosomes were diffuse during early meiosis and showed alterations in sister chromatid cohesion, homologous chromosome pairing, and recombination. The authors suggest that SCC3 acts as a cohesin subunit in mitosis and meiosis, but also has functions beyond cohesion.

      Reviewer #2 (Public Review):

      This manuscript shows detailed evidence of the role of cohesin regulators in rice meiosis and mitosis.

      Reviewer #3 (Public Review):

      Prior research on SCC3, a cohesin subunit protein, in yeast and Arabidopsis has underscored its vital role in cell division. This study investigated the specific functions of SCC3 in rice mitosis and meiosis. In a weakened SCC3 mutant, separation of sister chromatids was observed in anaphase I, resulting in 24 univalents and subsequent sterility. The authors meticulously documented SCC3's loading and degradation dynamics on chromosomes, noting its impact on DNA replication. Despite the loss of homologous chromosome pairing and synapsis in the mutant, chromosomes retained double-strand breaks without fragmenting. Consequently, the authors inferred that in the scc3 mutant, DNA repair more frequently relies on sister chromatids as templates compared to the wild type.

      We extend our sincere gratitude to the Editors and the Reviewers for their highly constructive and insightful suggestions. We deeply appreciate receiving both positive feedback and constructive criticism on our manuscript. In light of the reviewers’ comments, we have diligently undertaken substantial revisions to improve the manuscript. The revised version comprehensively addresses all the points raised by the reviewers.

      Below, we provide a detailed point-by-point response to the reviewers’ comments:

      Recommendations for the authors:

      Reviewer #1:

      (1) Line 170- looking at pollen formation does not specifically evaluate whether SCC3 is involved in meiosis.

      Thank you very much for this advice. We fully agree that pollen formation defects only indicate a problem in gametogenesis. We apologize for the inaccurate wording of this sentence, which has been revised in the manuscript (Lines 167-176).

      (2) Lines 203-205- this seems more like discussion and is pure speculation. Another possibility described above is that the truncated SCC3 protein is partially functional and what they see is due to this partial functionality. Have the authors considered the possibility that a partially functional version of SCC3 is produced that alters its function or the function of the cohesin complex? How much of the protein epitope remains in the truncated protein?

      We are grateful for these insightful suggestions. We concur that a partially functional SCC3 may indeed be synthesized, contributing to the survival of the mutant. Notably, the truncated version of the protein retains approximately 60% to 70% of the epitope, which presumably maintains residual function in the weak scc3 mutant. The C-terminal region lost in this mutant (aa 910-1116 of SCC3) contains a specific protein epitope and some secondary structure, so its loss may alter the protein’s folding and its subsequent roles within the cohesin complex.

      In this study, we encountered challenges in generating null alleles of the scc3 mutants in rice utilizing the CRISPR-Cas9 system. Consequently, it is plausible that the scc3-1 and scc3-2 variants represent null alleles of SCC3, resulting in embryonic lethality. Because null alleles do not survive, identifying weak alleles is essential for studying SCC3 function. Selecting weak mutants, particularly those exhibiting the most pronounced phenotype, is therefore advantageous for conducting further research. Our findings indicate that the weak scc3 mutant lacks only a segment of the C-terminus, which allows the plant to survive while significantly impeding the meiotic process. We cannot dismiss the possibility that the observed defects are attributable to the unique truncated protein. We extend our sincerest thanks once again.

      (3) Lines 212- I question whether what the authors see in Figure 2 is chromosome fragmentation. It could just as well be alterations in chromosome structure. Likewise, the authors provide little to no evidence that the mutation affects the replication process. Rather, the presence of replicated chromosomes later in mitosis and meiosis would argue that replication is not disrupted.

      We thank the reviewer for raising this critical question. As you astutely observed, the preservation of normal sister chromatids during prometaphase argues against chromosome fragmentation and indicates that the replication process remains uninterrupted. In line with your insights, we carried out an extensive series of full-length fluorescence in situ hybridization (FISH) experiments to elucidate the mechanisms underlying the observed increase in the distance between sister chromatids, particularly during interphase. Most of our findings support the hypothesis that the chromosomes exhibit alterations in structure, as depicted in Figure 2A. Intriguingly, our data suggest that cohesin, upon interaction with other chromatin-bound proteins, may facilitate loop extrusion, anchoring itself in a manner that potentially alters chromosomal architecture. These alterations in chromosome structure, and the subsequent defects in genome folding and cohesion establishment, depend particularly on SCC3. In response to your valuable suggestions, we have revised the relevant sections of our manuscript. We extend our sincere thanks for your insightful comments.

      (4) Line 230- what does the sentence SCC3 may enhance the interaction with DNA mean, the interaction of the cohesin complex?

      We are sorry for the ambiguity in our initial description and wish to clarify that SCC3 indeed plays a pivotal role in augmenting the interaction between the cohesin complex and DNA. Our observations revealed an upsurge in the signal intensity of SCC3 as cells transition from interphase to prophase, as depicted in Figure 2B. This enhancement correlates with the observed defects in scc3 mutants during prophase, suggesting that SCC3’s functional significance is particularly pronounced at this stage of the cell cycle. We have revised our manuscript to reflect these insights more accurately, in accordance with your valuable suggestions. We express our sincere gratitude for your guidance.

      (5) Oddly, and unexplainably the authors present data indicating that SCC3 interacts with RAD21.1, but not SMC1, SMC3, or REC8. The fact that the authors report that SCC3 only interacts with RAD21.1 but no other cohesin proteins is quite hard to explain.

      As argued in the point above, the available data do not provide compelling evidence supporting the interaction between SCC3 and other cohesin proteins. We have repeated yeast two-hybrid (Y2H) experiments yielding consistent outcomes, which also surprised us initially. In the revised manuscript, we further added the bimolecular fluorescence complementation (BiFC) results between SCC3 and other cohesin proteins in rice protoplast (Figure S10). These supplementary data affirm that SCC3 predominantly interacts with RAD21.1, excluding interactions with other cohesin proteins. While the absence of such interactions is perplexing, our investigations have failed to detect any binding between SCC3 and other cohesin proteins.

      A weak interaction between SCC3 and REC8 has been reported in Arabidopsis (Kuttig et al. bioRxiv https://doi.org/10.1101/2022.06.20.496767). We speculate that either these proteins do not interact in rice or the yeast two-hybrid assays may be inadequate for detecting their interaction, as several factors can impede interaction in a heterologous system. In Figure 7, we could only detect the interaction between SCC3 and RAD21.1 in both Y2H and BiFC experiments. This suggests potential alterations in protein folding or conformation, or the involvement of additional regulatory factors modulating the interaction between SCC3 and other cohesin proteins. Notably, given RAD21.1’s pivotal role as a core component of the cohesin complex, our supplementary findings demonstrate interactions between SMC1/3 and RAD21.1 (data not shown). Consequently, our current data support a model in which RAD21.1 and SMC1/3 form a circular structure, with SCC3 positioned on the outer periphery of the ring complex, associating specifically with RAD21.1 (Figure 8A).

      Reviewer #2:

      The authors did not consider creating heterozygous mutants for the replication fork. Moderate English language editing may be required.

      We extend our gratitude to the reviewer for their valuable suggestions. Initially, we did not explore the potential relationship between SCC3 and the replication fork. Cohesin, as we understand, becomes associated with DNA prior to DNA replication. The phenomenon of sister chromatid co-entrapment arises as replication forks traverse through cohesin rings, a process intricately linked to DNA replication dynamics. In this study, we exclusively observed aberrant chromosome structures in the scc3 mutant during interphase (Figure 2). We conjecture that these anomalies may stem from alterations in chromosome structure, such as genome folding and loop extrusion, rather than being directly attributable to the DNA replication fork. However, the precise nature of these chromosome structural aberrations during interphase in the scc3 mutant remains elusive, necessitating further comprehensive investigation in future studies. We have refined the language of our manuscript in accordance with the reviewer’s suggestions. Once again, we express our sincere appreciation for the invaluable suggestions provided.

      Reviewer #3:

      While the paper's conclusions are generally well-supported, further substantiation is needed for the claim that SCC3 inhibits template choice for sister chromatids. To bolster this conclusion, I recommend that the authors perform whole-genome sequencing on parental and F1 individuals from two rice variants, subsequently calculating the allele frequencies at heterozygous sites in the F1 individuals. If SCC3 indeed inhibits inter-sister chromatid repair in the wild type, we would anticipate a higher frequency of inter-homologous chromosome repair (i.e., gene conversion). This should be manifested as a bias away from the Mendelian inheritance ratio (50:50) in the offspring of the wild type compared to the offspring of the scc3+/- mutant.

      We express our sincere appreciation for your insightful suggestions. This is a very good suggestion, and we have arranged to carry out this experiment. Because preparing the plant materials and performing the sequence analysis take a long time, the work is still ongoing, and we hope it will yield important information bearing on these hypotheses. As we have not obtained direct evidence that SCC3 is involved in sister chromatid repair, we have changed the title to “SCC3 is an axial element essential for homologous chromosome pairing and synapsis”. Once again, we extend our gratitude for your invaluable suggestions.
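
      The reviewer's proposed check, testing whether allele frequencies at a heterozygous site depart from the Mendelian 50:50 expectation, can be sketched as an exact two-sided binomial test. This is a minimal sketch under stated assumptions: the function name and the example counts are hypothetical, the input is assumed to be the number of observations carrying each parental allele at a single site, and it is not the analysis pipeline of the study, whose sequencing work is described as ongoing.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value for k successes in n trials.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null proportion p (0.5 for Mendelian
    segregation). Intended for modest n; very large n may overflow floats.
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so ties are counted despite rounding.
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts at one heterozygous site: 70 observations carry
# the allele from parent A and 30 carry the allele from parent B.
p_value = binomial_two_sided_p(70, 100)
print(f"p-value for deviation from 50:50: {p_value:.2e}")
```

      In a genome-wide comparison, such per-site tests would need correction for multiple testing, and the wild-type and mutant offspring would be compared with each other rather than each against 50:50 alone.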

      A point that warrants consideration is the placement of the protein interaction experiments involving SCC3 within the paper. It is presented relatively late in the manuscript. If the authors possess information regarding the interaction between RAD21.1 and SCC3 and how it relates to the functional study of RAD21.1, it could contribute to a more comprehensive analysis. However, if this information is unrelated to the current study, it might be advisable to omit it, as it appears to diverge from the main focus of this work.

      We express our sincere gratitude for your invaluable suggestions. It has been documented in yeast that the interaction between SCC3 and SCC1 is indispensable for the efficient loading of cohesin. In our study, we endeavored to elucidate the intricate relationships among various cohesin subunits. Through our investigations, we have discerned that RAD21.1 serves as a pivotal core subunit within the cohesin complex, facilitating interactions with both SMC1/3 and SCC3 (data not shown). Additionally, our findings indicate that the interaction between RAD21.1 and SCC3 is imperative for maintaining the stability of the cohesin ring and its association with DNA (data not shown). Consequently, the interaction between these two proteins assumes paramount importance for our subsequent analyses. This study holds significant promise for future investigations.

      It's worth noting that while the title of the study claims that "SCC3 inhibits inter-sister chromatids repair during rice meiosis," the last sentence of the abstract weakens this conclusion by using the word "seems." A study's title should ideally reflect the most definitive and conclusive findings.

      We sincerely appreciate your valuable suggestions. In response, we have revised the description in our manuscript to enhance its rigor.

      In Figure 8C, it appears that cohesin is depicted between two DNA strands.

      Figure 8C illustrates the process of sister chromatid repair during meiosis in the scc3 mutant. Two gray lines and two blue lines represent the four sister chromatids of two homologous chromosomes, respectively. In the wild type, cohesin plays a crucial role in tethering together the two sister chromatids. As per your reminder, cohesin should indeed encircle the two sister chromatids, as depicted in Figure 8B. Following a thorough evaluation and to mitigate any potential confusion, we have deleted Figure 8C.

    1. Author response

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to modify the characteristics of the extracellular matrix (ECM) produced by immortalized mesenchymal stem cells (MSCs) by employing the CRISPR/Cas9 system to knock out specific genes. Initially, they established VEGF-KO cell lines, demonstrating that these cells retained chondrogenic and angiogenic properties. Additionally, lyophilized cartilage tissues produced by these cells exhibited retained osteogenic properties.

      Subsequently, the authors established RUNX2-KO cell lines, which exhibited reduced COLX expression during chondrogenic differentiation and notably diminished osteogenic properties in vitro. Transplantation of lyophilized cartilage tissues produced by RUNX2-KO cell lines into osteochondral defects in rat knee joints resulted in the regeneration of articular cartilage tissues as well as bone tissues, a phenomenon not observed with tissues derived from parental cells. This suggests that gene-edited MSCs represent a valuable cell source for producing ECM with enhanced quality.

      Strengths:

      The enhanced cartilage regeneration observed with ECM derived from RUNX2-KO cells supports the authors' strategy of creating gene-edited MSCs capable of producing ECM with superior quality. Immortalized cell lines offer a limitless source of off-the-shelf material for tissue regeneration.

      We thank the reviewer for the interest in our work. We however want to clarify that the present manuscript does not report the generation of ECM with “superior quality”, but rather of modulated composition and thus function.

      Weaknesses:

      Most data align with anticipated outcomes, offering limited novelty to advance scientific understanding. Methodologically, the chondrogenic differentiation properties of immortalized MSCs appeared deficient, evidenced by Safranin-O staining of 3D tissues and histological findings lacking robust evidence for endochondral differentiation. This presents a critical limitation, particularly as authors propose the implantation of cartilage tissues for in vivo experiments. Instead, the bulk of data stemmed from type I collagen scaffold with factors produced by MSCs stimulated by TGFβ.

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX2-KO lines to that of MSOD-B cells. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage grafts of similar quality to the MSOD-B counterpart. Of note, the Safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We will thus provide additional stainings of generated tissues pre-lyophilization.

      The rationale behind establishing VEGF-KO cell lines remains unclear. What specific outcomes did the authors anticipate from this modification?

      VEGF is a known master regulator of angiogenesis and a key mediator of endochondral ossification. It has also been extensively used in bone tissue engineering studies as a supplemented factor, primarily in the form of VEGFα, to increase the vascularization and thus the bone formation outcome of engineered grafts (https://www.nature.com/articles/s42003-020-01606-9, https://www.sciencedirect.com/science/article/pii/S8756328216301752). In our study, it was thus a natural candidate for demonstrating the possibility of generating VEGF-KO cartilage and subsequently assessing the functional impact on both the angiogenic and osteogenic potential of the resulting cartilage tissue.

      Insufficient depth was given to elucidate the disparity in osteogenic properties between those observed in ectopic bone formation and those observed in transplantation into osteochondral defects. While the regeneration of articular cartilage in RUNX2-KO ECM presents intriguing results, the study lacked an exploration into underlying mechanisms, such as histological analyses at earlier time points.

      Using RUNX2-KO ECM, we aimed at demonstrating the impact on cartilage remodeling and bone formation. This was performed ectopically but also in the rat osteochondral defect as a regenerative set-up of higher clinical relevance. We agree with the reviewer that additional experimental groups and time-points (not only earlier but also longer ones) would offer a better mechanistic understanding of the ECM contribution to the joint repair. However, as stated in our manuscript this is a proof-of-concept study that successfully demonstrated the influence of the cartilage ECM modification on the in vivo skeletal regeneration. A follow-up study would need to be performed to complement existing evidence and strengthen the relevance of our approach for cartilage repair.

      Reviewer #2 (Public Review):

      The manuscript submitted by Sujeethkumar et al. describes an alternative approach to skeletal tissue repair using extracellular matrix (ECM) deposited by genetically modified mesenchymal stromal/stem cells. Here, they generate a loss of function mutations in VEGF or RUNX2 in a BMP2-overexpressing MSC line and define the differences in the resulting tissue-engineered constructs following seeding onto a type I collagen matrix in vitro, and following lyophilization and subcutaneous and orthotopic implantation into mice and rats. Some strengths of this manuscript are the establishment of a platform by which modifications in cell-derived ECM can be evaluated both in vitro and in vivo, the demonstration that genetic modification of cells results in complexity of in vitro cell-derived ECM that elicits quantifiable results, and the admirable goal to improve endogenous cartilage repair. However, I recommend the authors clarify their conclusions and add more information regarding reproducibility, which was one limitation of primary-cell-derived ECMs.

      We thank the reviewer for the positive evaluation of our work.

      Overcoming the limitations of native/autologous/allogeneic ECMs such as complete decellularization and reduction of batch-to-batch variability was not specifically addressed in the data provided herein. For the maintenance of ECM organization and complexity following lyophilization, evidence of complete decellularization was not addressed, but could be easily evaluated using polarized light microscopy and quantification of human DNA for example in constructs pre and post-lyophilization.

      We will clarify the experiments and characterization performed with lyophilized tissues versus those performed with decellularized ones. We will also provide evidence of DNA removal in our decellularized ECMs.

      It would be ideal to see minimization of batch-to-batch variability using this approach, as mitigation of using a sole cell line is likely not sufficient (considering that the sole cell line-derived Matrigel does exhibit batch-to-batch and manufacturer-to-manufacturer variability). I recommend adding details regarding experimental design and outcomes not initially considered. Inter- and intra-experimental reproducibility was not adequately addressed. The size of in vitro-derived cartilage pellets was not quantified, and it is not clear that more than one independent 'differentiation' was performed from each gene-edited MSC line to generate in vitro replicates and constructs that were implanted in vivo.

      We thank the Reviewer for the comment on the variability/reproducibility concern. Using a cell line does confer higher robustness but indeed does not guarantee unlimited batch-to-batch consistency. We will temper our claims in the discussion and mention the need to regularly re-characterize cell line properties across passages.

      In our study, our grafts were generated from various batches and tested in more than one experimental repeat. This will be further described in the revised version of our manuscript. We will also include data on the size variability of the generated tissues.

      The use of descriptive language in describing conclusions may mislead the reader and should be modified accordingly throughout the manuscript. For example, although this reviewer agrees with the comparative statements made by the authors regarding parental and gene-edited MSC lines, non-quantifiable terms such as 'frank' 'superior' (example, line 242) are inappropriate and should rather be discussed in terms of significance. Another example is 'rich-collagenous matrix,' which was not substantiated by uniform immunostaining for type II collagen (line 189).

      I have similar recommendations regarding conclusive statements from the rat implantation model, which was appropriately used for the purpose of evaluating the response of native skeletal cells to the different cell-derived ECMs. Interpretations of these results should be described with more accuracy. For example, increased TRAP staining does not indicate reduced active bone formation (line 237). Many would not conclude that GAGs were retained in the RUNX2-KO line graft subchondral region based on the histology. Quantification of % chondral regeneration using histology is not accurate as it is greatly influenced by the location in the defect from which the section was taken. Chondral regeneration is usually semi-quantified from gross observations of the cartilage surface immediately following excision. The statements regarding integration (example line 290) are not founded by histological evidence, which should show high magnification of the periphery of the graft adjacent to the native tissue.

      We thank the Reviewer for the constructive suggestions. We will revise language accordingly throughout the manuscript.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors have started off using an immortalized human cell line and then gene-edited it to decrease the levels of VEGF1 (in order to influence vascularization), and the levels of Runx2 (to decrease chondro/osteogenesis). They first transplanted these cells with a collagen scaffold. The modified cells showed a decrease in vascularization when VEGF1 was decreased, and suggested an increase in cartilage formation.

      In another study, the matrix generated by these cells was subsequently remodeled into a bone marrow organ. When RUNX2 was decreased, the cells did not mineralize in vitro, and their matrices expressed types I and II collagen but not type X collagen in vitro, in comparison with unedited cells. In vivo, the author claims that remodeling of the matrices into bone was somewhat inhibited. Lastly, they utilized matrices generated by RUNX2 edited cells to regenerate chondro-osteal defects. They suggest that the edited cells regenerated cartilage in comparison with unedited cells.

      Strengths:

      -The notion of inducing changes in the ECM by genetically editing the cells is a novel one, as it has long been thought that ECM composition influences cell activity.

      -If successful, it may be possible to make off-the-shelf ECMs to carry out different types of tissue repair.

      We thank the Reviewer for the critical evaluation of our work and the highlighted novelty of it.

      Weaknesses:

      -The authors have not generated histologically identifiable cartilage or bone in their transplants of the cells with a type I scaffold.

      The chondrogenic differentiation of our MSOD-B line and its capacity to undergo endochondral ossification have been robustly demonstrated in previous studies (Pigeot et al., Advanced Materials 2021 and Grigoryan et al., Science Translational Medicine 2022). In the present manuscript, we thus compare the chondrogenic capacity of the newly established VEGF-KO and RUNX-KO lines to that of MSOD-B. We demonstrate by qualitative (Safranin-O staining, Collagen type 2 and Collagen type X immuno-stainings) and quantitative (glycosaminoglycans assay) assays that the generated tissues consist of cartilage tissue of similar quality to that of MSOD-B. However, the Safranin-O stainings were performed on lyophilized tissues, which can alter the staining quality/intensity. We will thus provide additional stainings of generated tissues pre-lyophilization.

      On the contested formation of bone in vivo by our ECM grafts, we have provided compelling qualitative evidence via Masson's Trichrome stainings and quantification of mineralized volume by µCT. Both cortical bone and trabecular structures were identified ectopically. Those are standard evaluation methods in the field; we would be happy to receive additional suggestions from the Reviewer.

      -In many cases, they did not generate histologically identifiable cartilage with their cell-free-edited scaffold. They did generate small amounts of bone but this is most likely due to BMPs that were synthesized by the cells and trapped in the matrix.

      We now appreciate that the Reviewer agrees on the successful formation of bone induced by our engineered grafts. We however still respectfully disagree with the “small amount of bone” statement since our MSOD-B and MSOD-B VEGF KO cartilage grafts led to the full generation of a mature ectopic bone organ (that is, also composed of extensive marrow). This has been assessed qualitatively and quantitatively.

      We agree with the Reviewer on the key role of BMP-2 in the remodeling process into bone and bone marrow, which we have extensively described in our previous publication (Pigeot et al., Advanced Materials 2021). We previously demonstrated that the low amount of BMP-2 (in the dozens of nanogram/tissue range) embedded in the matrix is not sufficient per se to induce ectopic endochondral ossification. It is the combined presence of GAGs in the matrix -thus cartilage- that allows the success of bone formation. Since we have already demonstrated in the present manuscript that the GAGs content is the same in MSOD-B and MSOD-B edited ECMs, we will provide additional data demonstrating the maintenance of BMP-2 content in all generated cartilage tissues.

      -There is a great deal of missing detail in the manuscript.

      We will provide additional information on the MSOD-B line and the overall methodology in our revised version.

      -The in vivo study is underpowered, the results are not well documented pictorially, and are not convincing.

      We will provide additional information and pictures related to our in vivo studies. We believe our group size supports our conclusions confirmed by statistical assessment.

      -Given the fact that they have genetically modified cells, they could have done analyses of ECM components to determine what was different between the lines, both at the transcriptome and the protein level. Consequently, the study is purely descriptive and does not provide any mechanistic understanding of what mixture of matrix components and growth factors works best for cartilage or bone. But this presupposes that they actually induced the formation of bona fide cartilage, at least.

      We thank the Reviewer for the suggestion. However, our study did not aim to determine which ECM graft composition works best for cartilage or bone regeneration. Instead, we propose the exploitation of our cellular tools to interrogate the function of key ECM constituents and their impact on skeletal regeneration. We once more confirm that we generated lyophilized cartilage grafts, which will be more evidently supported by histological assessment before lyophilization.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We thank the reviewers for their thorough review of and overall positive comments on our manuscript. We have revised the manuscript to address the one remaining concern raised by one of the reviewers. This is described below.

      Fig.1B-C: To give a standard deviation from 2 data points has no statistical significance. In this case it would be better to define as range/difference of the 2 data points.

      We have modified the legend for Figure 1 to now read, “The average of two experiments is plotted with the bars representing the range of each time point.”
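
      The revised legend (mean of two experiments, bars showing the range) can be illustrated with a minimal sketch. The function name and the replicate values below are hypothetical and are not taken from the manuscript; the point is only that, with n = 2, the minimum and maximum describe the data completely, whereas a standard deviation would not be meaningful.

```python
from statistics import mean


def mean_and_range(rep1, rep2):
    """Return one (mean, low, high) tuple per time point for two replicates."""
    if len(rep1) != len(rep2):
        raise ValueError("both replicates must cover the same time points")
    summary = []
    for a, b in zip(rep1, rep2):
        # With only two values, the range (min, max) is the full data set.
        summary.append((mean([a, b]), min(a, b), max(a, b)))
    return summary


# Hypothetical measurements at three time points (illustration only).
experiment_1 = [1.0, 2.0, 4.0]
experiment_2 = [3.0, 2.0, 5.0]

for avg, low, high in mean_and_range(experiment_1, experiment_2):
    print(f"mean={avg:.2f}  range=[{low:.2f}, {high:.2f}]")
```

      When plotting, the bar for each time point would extend from `low` to `high` around the plotted mean.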

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      In 'Systems analysis of miR-199a/b-5p and multiple miR-199a/b-5p targets during chondrogenesis', Patel et al. present a variety of analyses using different methodologies to investigate the importance of two miRNAs in regulating gene expression in a cellular model of cartilage development. They first re-analysed existing data to identify these miRNAs as one of the most dynamic across a chondrogenesis development time course. Next, they manipulated the expression of these miRNAs and showed that this affected the expression of various marker genes as expected. An RNA-seq experiment on these manipulations identified putative mRNA targets of the miRNAs which were also supported by bioinformatics predictions. These top hits were validated experimentally and, finally, a kinetic model was developed to demonstrate the relationship between the miRNAs and mRNAs studied throughout the paper.

      I am convinced that the novel relationships reported here between miR-199a/b-5p and target genes FZD6, ITGA3, and CAV1 are likely to be genuine. It is important for researchers working on this system and related diseases to know all the miRNA/mRNA relationships but, as the authors have already published work studying the most dynamic miRNA (miR-140-5p) in this biological system I was not convinced that this study of the second miRNA in their list provided a conceptual advance on their previous work.

      We believe this study is an enhancement on our previous work for two reasons, which have been alluded to in new text within the introduction. Firstly, our previous work used experimental and bioinformatic analysis to identify microRNAs with significant regulatory roles during chondrogenesis. This new manuscript additionally uses a systems biology approach to identify novel miRNA-mRNA interactions and capture these within an in silico model. Secondly, this work was initiated by the analysis of our previously generated data – using a novel tool we developed for this type of data (Bioconductor - TimiRGeN).

      I was also concerned with the lack of reporting of details of the manipulation experiments. The authors state that they have over-expressed miR-199a-5p (Figure 2A) and knocked down miR-199b-5p (Figure 2B) but they should have reported their proof that these experiments had worked as predicted, e.g. showing the qRT-PCR change in miRNA expression. Similarly, I was concerned that one miRNA was over-expressed while the other was knocked down - why did the authors not attempt to manipulate both miRNAs in both directions? Were they unable to achieve a significant change in miRNA expression or did these experiments not confirm the results reported in the manuscript?

      We agree with the reviewer that some additional data were needed to demonstrate the effective regulation of miR-199-5p.  Hence, Supplementary Figure 1 is now included which provides validation of the effects of miR-199a-5p overexpression (Supplementary Figure 1A) and inhibition of miR-199a/b-5p (Supplementary Figure 1B). Within the main manuscript, Figure 2B has been amended to include the consequences of inhibition of miR-199a-5p, with 2C showing the consequences of miR-199b-5p inhibition. Further, we include new data with regards to miR-199a/b-5p inhibition on CAV1 (Figure 4A). 

      I had a number of issues with the way in which some of the data was presented. Table 1 only reported whether a specific pathway was significant or not for a given differential expression analysis but this concealed the extent of this enrichment or the level of statistical significance reported. Could it be redrawn to more similarly match the format of Figure 3A? The various shades of grey in Figure 2 and Figure 4 made it impossible to discriminate between treatments and therefore identify whether these data supported the conclusions made in the text. It also appeared that the same results were reported in Figure 3B and 3C and, indeed, Figure 3B was not referred to in the main text. Perhaps this figure could be made more concise by removing one of these two sets of panels.

      We agree with all points made here and have amended these within the manuscript. Figure 1A is now pathway enrichment plots from the TimiRGeN R Bioconductor package, and the table which previously showed the pathways enriched at each time point is now in the supplementary materials (supp. Table 1). Figure 2 and 4 now have color instead of shades of grey. Figure 3C has now been moved to supplementary materials (Supplementary Figure 2) and is referenced in the text. 

      Overall, while I think that this is an interesting and valuable paper, I think its findings are relatively limited to those interested in the role of miRNAs in this specific biomedical context.

      Reviewer #2 (Public review):

      Summary:

      This study represents an ambitious endeavor to comprehensively analyze the role of miR199a/b-5p and its networks in cartilage formation. By conducting experiments that go beyond in vitro MSC differentiation models, more robust conclusions can be achieved.

      Strengths:

      This research investigates the role of miR-199a/b-5p during chondrogenesis using bioinformatics and in vitro experimental systems. The significance of miRNAs in chondrogenesis and OA is crucial, warranting further research, and this study contributes novel insights.

      Weaknesses:

      While miR-140 and miR-455 are used as controls, these miRNAs have been demonstrated to be more relevant to Cartilage Homeostasis than chondrogenesis itself. Their deficiency has been genetically proven to induce Osteoarthritis in mice. Therefore, the results of this study should be considered in comparison with these existing findings.

      We agree with the reviewer's comments. miR-455-null mice develop normally but miR-140-null (or mutated) mice and humans do have skeletal abnormalities (e.g. Nat Med. 2019 Apr;25(4):583-590. doi: 10.1038/s41591-019-0353-2), indicating a role in chondrogenesis. We have made an addition in the description to point towards the need to assess the roles miR-199a/b-5p may play during skeletogenesis and OA. We anticipate miR-199a/b-5p to be relevant in OA and have ongoing additional work for this – but this is beyond the scope of this manuscript.

      Recommendations to Authors:

      Reviewer #1 (Recommendations to authors):

      Beyond the issues raised in the public review, I had a few minor recommendations that are largely designed to help improve the understanding of the manuscript as it is currently written.

      (1) Please provide the statistical tests used to obtain p-values in the Figure 2 and 4 legends.

      We have now added statistical test information to the figure legends of figures 2 and 4.

      (2) It is stated on p. 9 that both miRNAs may share a functional repertoire because 25 and 341 genes are intersected between their inhibition experiments. Please provide statistical support that this overlap is an enrichment over the null background in this experiment. Total DE genes – chi squared. Expected / Observed.

      A chi-squared test is now presented in the manuscript which shows that the number of significant genes which were found in common between miR-199a-5p knockdown and miR-199b-5p knockdown were significantly more than expected for day 0 or day 1 of the experiments. 

      (3) The final sentence on p. 12 (beginning 'Size of the points reflect...') seemed out of place - is it part of a legend?

      Thank you for pointing out this mistake - it was part of figure 3C and now is in the supplementary materials.

      (4) A sentence on p. 14 reads that 'FZD6 and ITGA3 levels increased significantly' but this should read decreased, rather than increased. Quite an important typo!

      Thank you for pointing this error out. It has been corrected.

      (5) Theoretical transcripts are mentioned in the legend of Figure 5A but these were not present in the figure. Please include these or remove them from the legend.

      This error has been removed from Figure 5A.

      (6) On p 20, the references 22 and 27 should I think be moved to earlier in the sentence (after 'miR-199a-5p-FZD6 has been predicted previously'). Currently, it reads as if these references support your luciferase assays which you claim are the first evidence for this target relationship.

      We agree with this change and have corrected the manuscript.

      (7) The reference to Figure 5D on p. 20 should be a reference to Figure 5C.

      Thank you for pointing this error out – this has been corrected.

      Reviewer #2 (Recommendations to authors):

      (1) The paper is based on the importance of miR-140 and miR-455 as miRNAs in chondrogenesis, citing only Barter, M. J. et al. Stem Cells 33, (2015). Considering the scope and results of this study, this citation is insufficient.

      We agree with this reviewer's comments. For many years miR-140 and miR-455 have been experimented on and their importance in OA research has become apparent. We included additional references within the introduction to address this.

      (2) Analyzing chondrogenesis solely through differentiation experiments from MSCs is inadequate. It is essential to perform experiments involving the network within normal cartilage tissue and/or the generation of knockout mice to understand the precise role of miR199a/b-5p in chondrogenesis.

      We have added an additional paragraph in the discussion to state this, and do believe it is highly important that miR-199a/b-5p be tested in OA samples – however this would be beyond the intended scope of this article.

      (3) In light of the above points, it is imperative to investigate the role of miR-199a/b-5p beyond the in vitro differentiation model from MSCs, encompassing mouse OA models or human disease samples.

      In line with our previous response, we agree with the premise and believe additional experiments should be performed to gain more insight into the mechanism by which miR-199a/b-5p regulates OA. However, developing a new mouse line to investigate this is not within the scope of this manuscript.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      In this study the authors use an elegant set of single-molecule experiments to assess the transcriptional and post-transcriptional regulation of RecB. The question stems from a previous observation from the same lab, that RecB protein levels are low and not induced under DNA damage. The authors first show that recB transcript levels are low and have a short half-life. They further show that RecB levels are likely regulated via translational control. They provide evidence for low noise in RecB protein levels across cells and show that the translation of the mRNA increases under double-strand break conditions. Authors identify Hfq binding sites in the recbcd [recBCD] operon and show that Hfq regulates the levels of RecB protein without changing the mRNA levels. They suggest that RecB translation is directly controlled by Hfq binding to mRNA, as mutating one of the binding sites has a direct effect on RecB protein levels.

      Strengths:

      The implication of Hfq in regulation of RecB translation is important and suggests mechanisms of cellular response to DNA damage that are beyond the canonically studied mechanisms (such as transcriptional regulation by LexA). Data are clearly presented and the writing is direct and easy to follow. Overall, the study is well-designed and provides novel insights into the regulation of RecB, that is part of the complex required to process break ends.

      Weaknesses:

      Some key findings need additional support/ clarifications to strengthen the conclusions. These are suggested to the authors.

      Reviewer #2 (Public Review):

      Summary:

      The authors carry out a careful and rigorous quantitative analysis of RecB transcript and protein levels at baseline and in response to DNA damage. Using single-molecule FISH and Halo-tagging in order to achieve sensitive measurements, they provide evidence that enhanced RecB protein levels in response to DNA damage are achieved through a post-transcriptional mechanism mediated by the La-like RNA binding protein, Hhq1 [Sm-like RNA binding protein, Hfq]. In terms of biological relevance, the authors suggest that this mechanism provides a way to control the optimum level of RecB expression as both deletion and over-expression are deleterious. In addition, the proposed mechanism provides a new framework for understanding how transcriptional noise can be suppressed at the protein level.

      Strengths:

      Strengths of the manuscript include the rigorous approaches and orthogonal evidence to support the core conclusions, for example, the evidence that altering either Hhq1 [Hfq] or its recognition sequence on the RNA similarly enhance the protein to RNA ratio of RecB. The writing is clear and the experiments are well-controlled. The modeling approaches provide essential context to interpret the data, particularly given the small numbers of molecules per cell. The interpretations are careful and well supported.

      Weaknesses:

      The authors make a compelling case for the biological need to exquisitely control RecB levels, which they suggest is achieved by the pathway they have uncovered and described in this work. However, this conclusion is largely inferred as the authors only investigate the effect on cell survival in response to (high levels of) DNA damage and in response to two perturbations - genetic knock-out or over-expression, both of which are likely more dramatic than the range of expression levels observed in unstimulated and DNA damage conditions.

      In the discussion, we proposed that the post-transcriptional regulation of recB that we have uncovered could be involved in keeping RecB levels within an optimal range. We agree that testing the phenotypic impact of small changes in RecB levels would add additional strength to this suggestion. However, this is experimentally very challenging because of the low copy number of RecB molecules, which makes it difficult to slightly alter RecB levels in a controlled and homogeneous (across cells) manner. Developing the synthetic biology tools necessary for such an experiment is beyond the scope of this article. In the manuscript, we will clarify the limits of our interpretation of the role of the uncovered regulation.

      Reviewer #3 (Public Review):

      Summary:

      The work by Kalita et al. reports regulation of RecB expression by Hfq protein in E. coli cells. RecBCD is an essential complex for DNA repair and chromosome maintenance. The expression level needs to be regulated at a low level under regular growth conditions but upregulated upon DNA damage. Through quantitative imaging, the authors demonstrate that recB mRNAs and proteins are expressed at a low level under regular conditions. While the mRNA copy number demonstrates high noise level due to stochastic gene expression, the protein level is maintained at a lower noise level compared to expected value. Upon DNA damage, the authors claim that the recB mRNA level is not significantly affected, but RecB protein level increases due to a higher translation efficiency. [Upon DNA damage, the authors claim that the recB mRNA concentration is decreased, however RecB protein level is compensated by higher translation efficiency]. Through analyzing CLASH data on Hfq, they identified two Hfq binding sites on RecB polycistronic mRNA, one of which is localized at the ribosome binding site (RBS). Through measuring RecB mRNA and protein level in the ∆hfq cell, the authors conclude that binding of Hfq to the RBS region of recB mRNA suppresses translation of recB mRNA. This conclusion is further supported by the same measurement in the presence of Hfq sequestrator, the sRNA ChiX, and the deletion of the Hfq binding region on the mRNA.

      Strengths:

      (1) The manuscript is well-written and easy to understand.

      (2) While there are reported cases of Hfq regulating translation of bound mRNAs, its effect on reducing translation noise is relatively new.

      (3) The imaging and analysis are carefully performed with necessary controls.

      Weaknesses:

      The major weaknesses include a lack of mechanistic depth, and part of the conclusions are not fully supported by the data.

      (1) Mechanistically, it is still unclear why upon DNA damage, translation level of recB mRNA increases, which makes the story less complete. The authors mention in the Discussion that a moderate (30%) decrease in Hfq protein was observed in a previous study, which may explain the loss of translation repression on recB. However, given that this mRNA exists in very low copy number (a few per cell) and that Hfq copy number is on the order of a few hundred to a few thousand, it's unclear how a 30% decrease in the protein level should result in a significant change in its regulation of recB mRNA.

      While Hfq is a highly abundant protein, it has many mRNA and sRNA targets, some of which are also present in large amounts (DOI: 10.1046/j.1365-2958.2003.03734.x). As recently shown, the competition among the targets over Hfq proteins results in unequal (across various targets) outcomes, where the targets with higher Hfq affinity have an advantage over the ones with less efficient binding (DOI: 10.1016/j.celrep.2020.02.016). In line with these findings, we reason that upon DNA damage, a moderate decrease in the Hfq protein abundance (30%) can lead to a similar competition among Hfq targets where high-affinity targets outcompete low-affinity ones as well as low-abundant ones (such as recB mRNAs). Therefore, we hypothesise that the regulation of low abundant targets of Hfq by moderate perturbations of Hfq protein level is a potential explanation for the change in RecB translation that we have observed. We will expand this part of the discussion to explain our reasoning in a more explicit and coherent way.

      (2) Based on the experiment and the model, Hfq regulates translation of recB gene through binding to the RBS of the upstream ptrA gene through translation coupling. In this case, one would expect that the behavior of ptrA gene expression and its response to Hfq regulation would be quite similar to recB. Performing the same measurement on ptrA gene expression in the presence and absence of Hfq would strengthen the conclusion and model.

      Indeed, based on our model, we expect PtrA expression to be regulated by Hfq in a similar manner to RecB. However, the product encoded by the ptrA gene, Protease III, (i) has been poorly characterised; (ii) unlike RecB, is located in the periplasm (DOI: 10.1128/jb.149.3.1027-1033.1982); and (iii) is not involved in any DNA repair pathway. Therefore, analysing PtrA expression would take us away from the key questions of our study.

      (3) The authors agree that they cannot exclude the possibility of sRNA being involved in the translation regulation. However, this can be tested by performing the imaging experiments in the presence of Hfq proximal face mutations, which largely disrupt binding of sRNAs.

      (4) The data on construct with a long region of Hfq binding site on recB mRNA deleted is less convincing. There is no control to show that removing this sequence region itself has no effect on translation, and the effect is solely due to the lack of Hfq binding. A better experiment would be using a Hfq distal face mutant that is deficient in binding to the ARN motifs.

      We thank the referee for these suggestions. We have performed the requested experiments, and the quantification of RecB abundance in the presence of Hfq proteins mutated in the proximal and distal face will be added to the revised version of the manuscript.

      (5) Ln 249-251: The authors claim that the stability of recB mRNA is not changed in ∆hfq simply based on the steady-state mRNA level. To claim so, the lifetime needs to be measured in the absence of Hfq.

      We agree that this statement is not fully supported by our data and will address this issue in the revised version.

      (6) What's the labeling efficiency of Halo-tag? If not 100% labeled, is it considered in the protein number quantification? Is the protein copy number quantification through imaging calibrated by an independent method? Does Halo tag affect the protein translation or degradation?

      Our previous study (DOI: 10.1038/s41598-019-44278-0) described a detailed characterisation of the HaloTag labelling technique for quantifying low-copy proteins in single E. coli cells.

      In that study, we used RecB-HaloTag as an example of a low-copy number protein. We showed a complete quantitative agreement of RecB detection between two fully independent methods: HaloTag-based labelling with cell fixation and RecB-sfGFP combined with a microfluidic device that lowers protein diffusion in the bacterial cytoplasm. This second method has previously been validated for protein quantification (DOI: 10.1038/ncomms11641) and provides detection of 80-90% of the labelled protein. Additionally, in our protocol, immediate chemical fixation of cells after the labelling and quick washing steps ensure that new, unlabelled RecB proteins are not produced. We, therefore, conclude that our approach to RecB detection is highly reliable and sufficient for comparing RecB production in different conditions and mutants.

      The RecB-HaloTag construct has been designed for minimal impact on RecB production and function. The HaloTag is translationally fused to RecB in a loop positioned after the serine present at position 47 where it is unlikely to interfere with (i) the formation of RecBCD complex (based on RecBCD structure, DOI: 10.1038/nature02988), (ii) the initiation of translation (as it is far away from the 5’UTR and the beginning of the open reading frame) and (iii) conventional C-terminal-associated mechanisms of protein degradation (DOI: 10.15252/msb.20199208). In our manuscript, we showed that the RecB-HaloTag degradation rate is similar to the dilution rate due to bacterial growth. This is in line with a recent study on unlabelled proteins, which shows that RecB’s lifetime is set by the cellular growth rate (https://doi.org/10.1101/2022.08.01.502339) and indicates that the HaloTag fusion is not affecting RecB stability.

      Furthermore, we have demonstrated (DOI: 10.1038/s41598-019-44278-0) that (i) bacterial growth is not affected by replacing the native RecB with RecB-HaloTag, (ii) RecB-HaloTag is fully functional upon DNA damage, and (iii) no proteolytic processing of the RecB-HaloTag is detected by Western blot.

      These results suggest that RecB expression and functionality are unlikely to be affected by the translational HaloTag insertion at Ser-47 in RecB. In the revised version of the manuscript, we will add information about the construct and discuss the reliability of the quantification.

      (7) Upper panel of Fig S8a is redundant as in Fig 5B. Seems that Fig S8d is not described in the text.

      Indeed, the data in the upper panel in Fig S8a was repeated (from Fig 5B) for visual purposes to facilitate comparison with the panel below. We will modify the figure legend to indicate this repetition clearly.

      In Fig S8d, we confirmed the functionality of the Hfq protein expressed from the pQE-Hfq plasmid in our experimental conditions, which was not described in the text. We will include this clarification in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are pleased that Reviewer 3 has deemed our revisions satisfactory; below, we provide responses to the remaining Recommendations for the Authors from Reviewer 2.

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections:

      • Line 91: GWT should be GNWT

      Fixed, thank you.

      • Figure 2: fix the label "Participationcoefficient rank" (no space between Participation and coefficient)

      Fixed, thank you for spotting.

      • Line 317: Figure 2 should be Figure 3

      Fixed, thank you.

      • Line 360: Figure 4D, right?

      Fixed, thank you. We also confirm that Figure 4 and its caption are correct. Under anaesthesia, many regions have more Integrated Information than during Recovery (red regions), but the only changes that are consistently observed across all three contrasts are the decreases.

      • Line 375: Should be Figure 5A

      Fixed, thank you.

      • The recovery period of the anesthesia data is not described in Methods.

      We have now added the missing information:

      “Propofol was discontinued following the deep anaesthesia scan, and participants reached level 2 of the Ramsay scale approximately 11 minutes afterwards, as indicated by clear and rapid responses to verbal commands. This corresponds to the “recovery” period [176].”

      We have also expanded our discussion on the interaction between information decomposition and measures of directionality:

      “Indeed, transfer entropy can itself be decomposed into information-dynamic atoms through Partial Information Decomposition and Integrated Information Decomposition [33,34,49,151]; ΦID can further decompose the Normalised Directed Transfer Entropy measure used by Deco et al. [5], as recently demonstrated [152]. We look forward to a more refined conceptualization of the synergistic workspace architecture that takes into account both information types and the directionality of information flow – especially in datasets with higher temporal resolution.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Pg. 3 - lines 51-53: "Once established, the canonical RdDM pathway takes over, whereby small RNAs are generated by the plant-specific polymerase IV (Pol IV). In both cases, a second plant-specific polymerase, Pol V, is an essential downstream component." The authors' intro omits an important aspect of Pol V's function in RdDM, which is quite relevant to their study. Pol V transcribes DNA to synthesize noncoding RNA scaffolds, to which AGO4-bound 24 nt siRNAs are thought to base pair, leading to DRM2 recruitment for cytosine methylation near to these nascent Pol V transcripts (Wierzbicki et al 2008 Cell; Wierzbicki et al. 2009 Nat Genet). I recommend that the authors cite these key studies.

      These citations have now been added (see line 57).

      The authors provide compelling evidence that Pol V redistributes to ectopic heterochromatin regions in h1 mutants (e.g., Fig1a browser shot). Presumably, this would allow Pol V to transcribe these regions in h1 mutants, whereas it could not transcribe them in WT plants. Have the authors detected and/or quantified Pol V transcripts in the h1 mutant compared to WT plants at the sites of Pol V redistribution (detected via NRPE1 ChIP)?

      Robust detection of Pol V transcripts can be experimentally challenging, and instead we quantify and detect NRPE1 dependent methylation at these regions (Fig 5), which occurs downstream of Pol V transcript production. However, we note detecting Pol V transcripts as a potential future direction in the discussion (see line 263).

      Pg. 5 - lines 101-102: Figure 1e - "The preferential enrichment of NRPE1 in h1 was more pronounced at TEs that overlapped with heterochromatin associated mark, H3K9me2 (Fig. 1e)." Was a statistical test performed to determine that the overall differences are significant only at TE sites with H3K9me2? Can the sites without H3K9me2 also be differentiated statistically?

      Yes, there is a statistically significant difference between WT and h1 at both the H3K9me2 marked and unmarked TEs (Wilcoxon rank sum tests, see updated Fig 1e). The size of the effect is larger for the H3K9me2 marked TEs (median difference of 0.41 vs 0.16). Median values have now been added to the boxplots so that this is directly viewable to the reader (Fig 1e). This reflects the general increase in NRPE1 occupancy in h1 mutants through the genome, with the effect consistently stronger in heterochromatin. In our initial version of the manuscript, we summarise the effect as follows “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions” (previous version line 83, current version line 95). Although important exceptions exist (see Fig 5, NRPE1 and DNA methylation loss in h1), we now make this point even more explicit, and have updated the manuscript at several locations (abstract line 26, results line 245, discussion line 265).

      Pg. 5 - lines 108-110: The authors state, "Importantly, we found no evidence for increased NRPE1 expression at the mRNA or protein level in the h1 mutant (Suppl. Fig. 2)." But the authors did observe reduced NRPE1 transcript levels in h1 mutants, in their re-analysis of RNA-seq data and reduced NRPE1 protein signals via western blot in (Suppl. Fig. 2), which should be reported here in the results.

      As described further below, we reanalysed h1 RNA-seq from scratch, and see no evidence for significant differential gene expression of NRPE1. This table and analysis are now provided in Supplementary Table 1.

      More importantly, the above logic about NRPE1 expression in h1 mutants assumes that NRPE1 is the stoichiometrically limiting subunit for Pol V assembly and function in vivo, but this is not known to be the case:

      (1) While NRPE1's expression is somewhat reduced (and not increased) in h1 mutant plants, we cannot be certain that other genes influencing Pol V stability or recruitment are unaffected by h1 mutants. I thus recommend that the authors perform RT-qPCR directly on the WT and h1 mutant materials used in their current study, quantifying NRPE1, NRPE2, NRPE5, DRD1, DMS3, RDM1, SUVH2 and SUVH9 transcript levels.

      (2) Normalizations used to compare samples should be included with RT-qPCR and western assays. An appropriate house-keeping gene like Actin2 or Ubiquitin could be used to normalize the RT-qPCR. Protein sample loading in Suppl. Fig. 2 could be checked by Coomassie staining and/or an antibody detection of a house-keeping protein.

      We have now included a full re-analysis of h1 RNA-seq (data from Choi et al 2020) focusing on transcriptional changes of DNA methylation machinery genes in the h1 mutant. Of the 61 genes analysed, only AGO6 and AGO9 were found to be differentially expressed (2-3 fold upregulation). This analysis is now included as a table (Supplementary Table 1). The western blot has been moved to Supplementary Fig 3 to now illustrate antibody specificity and H1 loss in the h1 mutant lines, so NRPE1 itself serves as a loading control (Supplementary Fig 3a).

      Pg. 6 - lines 129-131: The authors state that "over NRPE1 defined peaks (where NRPE1 occupancy is strongest in WT) we observed no change in H1 occupancy in nrpe1 (Fig 2b). The results indicate that H1 does not invade RdDM regions in the nrpe1 mutant background." This conclusion assumes that the authors' H1 ChIP is successfully detecting H1 occupancy. However, in Fig 2d there does not appear to be H1 enrichment or peaks as visualized across the 10766 ZF-DMS3 off-target loci, or even at the selected 451 ZF-DMS3 off-target hyper DMRs, where the putative signal for H1 enrichment on the metaplot center is extremely weak/non-existent.

      As a reference for H1 enrichment in chromatin (e.g., looking where H2A.W antagonizes H1 occupancy) one can compare analyses in Bourguet et al (2021) Nat Commun, involving co-authors of the current study. Bourguet et al (2021) Fig 5b show a metaplot of H1 levels centered on H2A.W peaks with H1 ChIP signal clearly tapering away from the metaplot center point peak. To my eye, the H1 ChIP metaplots for ZF-DMS3 off-target loci in the current manuscript (Fig 2d) resemble "shuffled peaks" controls like those in Fig 5b of Bourguet et al (2021).

      Can one definitively interpret Fig 2d as showing RdDM "not reciprocally affecting H1 localization" without first showing the specificity of the ChIP-seq results in a genotype where H1 occupancy changes? Alternatively, could this dataset be displayed with Deeptools heatmaps to strengthen the evidence that the authors are detecting H1 occupancy/enrichment genome-wide, before diving into WT/nrpe1 mutant analysis at ZF-DMS3 off-target loci?

      This is an excellent suggestion from the reviewer. We have now included several analyses that assess and demonstrate the quality of our H1 ChIP-seq profiles. First, as suggested by the reviewer, we show that our H1 profiles peak over H2A.W enriched euchromatic TEs as defined by Bourguet et al, mirroring these published findings. Next, we investigated whether our H1 profiles match Teano’s recently described pattern over genes, confirming a similar pattern with 3’ enrichment of H1 over H3K27me3 unmarked genes. Furthermore, we show that the H1 peaks defined here are similarly enriched with GFP tagged H1.2 from the Teano et al. 2023 study. These analyses validate the quality of our H1 ChIP-seq datasets and bolster the conclusion that NRPE1 redistribution does not affect H1 occupancy. The new analyses are presented in Supplementary Figure 3 (see line 153).

      Pg. 8 - lines 228-230: The authors state that, "As with NRPE1, SUVH1 increased in the h1 background significantly more in heterochromatin, with preferential enrichment over long TEs, cmt2 dependent hypo CHH DMRs, and heterochromatic TEs (Fig. 6b)."

      Contrary to the above statement, the violin plots in Fig. 6c show SUVH1 occupancy increasing at euchromatic TEs in the h1 mutant. What statistical test allowed the authors to determine that the increase in h1 occurs "significantly more in heterochromatin"? The authors should critically interpret Fig. 6c and 6d, which are not currently referenced in the results section. More support is needed for the claim that SUVH1 specifically encroaches into heterochromatin in the h1 mutant, rather than just TEs generally (euchromatic and heterochromatic alike).

      Similar to what we see for NRPE1, statistical tests that we have now performed show that SUVH1 is significantly enriched in h1 in all classes. Importantly however, the effect size is larger in all of the heterochromatin associated classes. We display these statistical tests and the median values on the plots so that effects are immediately viewable (see updated Fig 6).

      In addition, the authors should verify that SUVH1-3xFLAG transgenes (in the WT and h1 mutant backgrounds, respectively) and endogenous Arabidopsis genes encoding the transcriptional activator complex (SUVH1-SUVH3-DNAJ1-DNAJ2) are not overexpressed in the h1 mutant vs. WT. Higher expression of SUVH1 or limiting factors in the larger complex could explain the observation of increased SUVH1 occupancy in the h1 background.

      We do not see a difference in SUVH1/3/DNAJ1/2 complex gene expression in the h1 background (see Supplementary Table 1). However, we cannot rule out that our SUVH1-FLAG line in h1 is more highly expressed than the corresponding SUVH1-FLAG line in WT. We now note this point in line 248.

      Pg. 8 - lines 231-232: Here the authors make a sweeping conclusion about H1 demarcating, "the boundary between euchromatic and heterochromatic methylation pathways, likely through promoting nucleosome compaction and restricting heterochromatin access." I do not see how a H1 boundary between euchromatic and heterochromatic methylation pathways is revealed based on the SUVH1-3xFLAG occupancy data, which shows increased enrichment at every category interrogated in the h1 mutant (Fig 6b,c,d) and all along the baseline too in the h1 mutant browser tracks (Fig 6a). Can the authors provide more examples of this phenomenon (similar to Fig 6a) and better explain why their SUVH1-3xFLAG ChIP supports this demarcation model?

      The general conclusion from SUVH1 about H1’s agnostic role in preventing heterochromatin access is now further supported from our findings with H3K27me3 (see Figure 6e and description from line 250). However, we agree that the demarcation model as initially presented was overly simplistic. This point was also raised by reviewer 2. We have removed the line highlighted by the reviewer in the revised version of the manuscript. In the revised version we clarify that H1 impedes RdDM and associated machinery throughout the genome (consistent with H1’s established broad occupancy across the genome) but this effect is most pronounced in heterochromatin, corresponding to maximal H1 occupancy (abstract line 26, results line 245, discussion line 265). 

      Corrections:

      Pg. 8 - lines 226-227: "We therefore wondered whether complex's occupancy might also be affected by H1." The sentence contains a typo, where I assume the authors mean to refer to occupancy by the SUVH1-SUVH3-DNAJ1-DNAJ2 transcriptional activator complex. This needs to be specified more clearly.

      The paragraph has been updated (see from line 237).

      Pg. 13 - lines 393-405: There are minor errors in the capitalization of titles and author initials in the References. I recommend that the authors proofread all the references to eliminate these issues:

      Thank you, these have been corrected.

      Choi J, Lyons DB, Zilberman D. 2021. Histone H1 prevents non-cg methylation-mediated small RNA biogenesis in arabidopsis heterochromatin. Elife 10:1-24. doi:10.7554/eLife.72676 (...)

      Du J, Johnson LM, Groth M, Feng S, Hale CJ, Li S, Vashisht A a., Gallego-Bartolome J, Wohlschlegel J a., Patel DJ, Jacobsen SE. 2014. Mechanism of DNA methylation-directed histone methylation by KRYPTONITE. Mol Cell 55:495-504. doi:10.1016/j.molcel.2014.06.009 (...)

      Du J, Zhong X, Bernatavichute Y V, Stroud H, Feng S, Caro E, Vashisht A a, Terragni J, Chin HG, Tu A, Hetzel J, Wohlschlegel J a, Pradhan S, Patel DJ, Jacobsen SE. 2012. Dual binding of chromomethylase domains to H3K9me2-containing nucleosomes directs DNA methylation in plants. Cell 151:167-80. doi:10.1016/j.cell.2012.07.034

      Reviewer #2 (Recommendations For The Authors):

      As for a normal review, here are our major and minor points.

      Major:

      (1) Lines 38 to 45 of the introduction are important for the subsequent definition of heterochromatic and non-heterochromatic transposons, but the definition is ambiguous. Is heterochromatin defined by surrounding context such as pericentromeric position or is this an autonomous definition? Can a TE with the chromosomal arms be considered heterochromatic provided that it is long enough and recruits the right machinery? These cases should be more explicitly introduced. Ideally, a supplemental dataset should provide a key to the categories, genomic locations and overlapping TEs as they were used in this analysis, even if some of the categories were taken from another study.

      We have now added all the regions used for analysis in this study to Supplementary Table 3.

      (2) Line 80: This would be the first chance to cite Teano et al. and the "encroachment" of PcG complexes to TEs in H1 mutants.

      Done - “H1 also plays a key role in shaping nuclear architecture and preventing ectopic polycomb-mediated H3K27me3 deposition in telomeres (Teano et al., 2023).” See line 83

      (3) S2 is "only" a supplemental figure, but it should still follow the rules: indicate the number of biological replicates for the RNA-seq data, and perform a statistical test. In the case of the WB data, provide a loading control.

      We are now using the western blot to illustrate antibody specificity and H1 loss in the h1 mutant lines, so NRPE1 itself serves as a loading control (Supplementary Fig 3a). For NRPE1 mRNA expression, we have now replaced this with a more comprehensive transcriptome analysis of methylation machinery in h1 (see Supplementary Table 1). 

      (4) Lines 115 to 124 and corresponding data: Here, the goal is to exclude other changes to heterochromatin structure other than "increased access" in H1 mutants; however, only one feature, H3K9me2, is tested. Testing this one mark does not necessarily prove that the nature of the chromatin does not change, e.g. H2A.W could be differently redistributed, DDM1 may change, VIM protein, and others. Either more comprehensive testing for heterochromatin markers should be performed, or the conclusions moderated.

      We have moderated the text accordingly (see line 135).

      (5) Lines 166ff and Figure 1, a bit out of order also Figure 5: The general hypothesis is that NRPE1 redistributes to heterochromatic regions in h1 mutants (as do other chromatin modifiers), but the data seem to only support a higher occurrence at target sites.

      a. The way the NRPE1 data is displayed makes it seem like there is much more NRPE1 in the h1 samples, even at peaks that should not be recruiting more as they do not represent "long" TEs. It would be good to present more gbrowse shots of all peak classes.

      We now clarify that h1 does result in a general increase of NRPE1 throughout the genome, but the effect is strongest at heterochromatin. In our initial version of the manuscript, we summarise the effect as follows “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions” (previous version line 83, current version line 95). We have modified the language at several locations throughout the manuscript to make this point more clearly (abstract line 26, results line 245, discussion line 265). We include several browser shots in Supp Fig. 8.

      b. The data are "normalized" how exactly?

      c. One argument of observing "gaining" and "losing" peaks is that there is redistribution of NRPE1 from euchromatic to heterochromatic sites. There should be an analysis and figure to corroborate the point (e.g. by comparing FRIP values). Figure 1b shows lower NRPE1 signals at the TE flanking regions. This could reflect a redistribution or a flawed normalization procedure.

      The data are normalised using a standardised pipeline by log2 fold change over input, after scaling each sample by mapped read depth using the bamCompare function in deepTools. This is now described in detail in the Materials and Methods line 365, with full code and pipelines available from GitHub (https://github.com/Zhenhuiz/H1-restrictseuchromatin-associated-methylation-pathways-from-heterochromatic-encroachment).
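
      The normalisation described above can be illustrated with a minimal sketch. It is not the authors' pipeline (which uses bamCompare from deepTools); it only shows the arithmetic of scaling two samples by mapped read depth and taking a per-bin log2 ratio over input. The function name, the choice to scale to the smaller library, and the pseudocount of 1 are assumptions made for illustration.

```python
import math


def log2_ratio_over_input(chip_counts, input_counts, chip_mapped, input_mapped,
                          pseudocount=1.0):
    """Per-bin log2(ChIP / input) after read-depth scaling (illustrative only)."""
    # Assumption: both samples are scaled down to the size of the smaller library.
    smaller = min(chip_mapped, input_mapped)
    chip_scale = smaller / chip_mapped
    input_scale = smaller / input_mapped
    # The pseudocount avoids division by zero and log2(0) in empty bins.
    return [
        math.log2((c * chip_scale + pseudocount) / (i * input_scale + pseudocount))
        for c, i in zip(chip_counts, input_counts)
    ]


# Hypothetical bin counts: the ChIP library is twice as deep as the input library.
ratios = log2_ratio_over_input([30, 10], [5, 5],
                               chip_mapped=2_000_000, input_mapped=1_000_000)
```

      Because the ChIP library in this toy example is twice as deep as the input, its counts are halved before the ratio is taken, so a bin with 10 ChIP reads and 5 input reads comes out as no enrichment.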

      d. Figure 1d and f show similar profiles comparing "long" and "short" TEs or "CMT2 dependent hypo-CHH" and "DRM2 dependent CHH". How do these categories relate to each other, how many fragments are redundant?

      The short vs long TEs were defined in Liu et al 2018 (doi: 10.1038/s41477-017-0100-y) and the DMRs were defined in Zhang et al. 2018 (DOI: 10.1073/pnas.1716300115). There is likely to be some degree of overlap between the categories, but the numbers are very different (short TEs (n=820), long TEs (n=155), drm2 DMRs (n=5534), CMT (n=21784)), indicating that the different categories are informative. We have now listed all the regions used for analysis in this study in Supplementary Table 3.

      e. The purpose of the data presented in Figure 1 b is to compare changes of NRPE1 association in H3K9me3 non-overlapping and overlapping TEs between wild-type and background, yet the figure splits the categories in two subpanels and provides neither a fold-change number nor a statistical test of the comparison. As before, the figure does not really support the idea that NRPE1 somehow redistributes from its "normal" sites towards heterochromatin as both TE classes seem to show higher NRPE1 binding in h1 mutants.

      There is a statistically significant difference between WT and h1 at both the H3K9me2 marked and unmarked TEs; however, the size of the effect is larger for the H3K9me2 marked TEs (median difference of 0.41 vs 0.16). Median values have now been added to the boxplots so that this is directly viewable to the reader (Fig 1e). Although important exceptions exist (see Fig 5 – regions that lose NRPE1 and DNA methylation), this reflects the general increase in NRPE1 occupancy in h1 mutants throughout the genome, with a consistently stronger effect in heterochromatin. As noted above, we have updated the manuscript to make this point more clearly (abstract line 26, results line 245, discussion line 265).

      f. Panel g is the only attempt to corroborate the redistribution towards heterochromatic regions, but at this scale, the apparent reduction of binding in the chromosome arms may be driven by off-peak differences and normalization problems between different ChIP samples with different signal-to-noise-ratio.

      We describe our normalisation and informatic pipeline in more detail in the Materials and Methods line 365. It is also important to note that the reduction is not only observed at the chromosomal level, but also at specific sites. We called differential peaks between WT and the h1 mutant. The "Regions that gain NRPE1 in h1" peaks are more enriched in heterochromatic regions, while the "Regions that lose NRPE1 in h1" peaks are more enriched outside heterochromatic regions.

      g. Figure 5: how many regions gain vs lose NRPE1 in h1 mutants? If the "redistribution causes loss" scenario applies, the numbers should overall be balanced but that does not seem the case. The loss case appears to be rather exceptional judging from the zigzagging meta-plot. Are these sites related to the sites taken over by PcG-mediated repression in h1 mutants?

      As described in line 222 (previous version of the manuscript line 206), there are 15,075 sites that gain and 1,859 sites that lose NRPE1 in h1. Comparing these sites to H3K27me3 in the Teano et al. study was an excellent suggestion. We compared sites that gain NRPE1 to sites that gain H3K27me3 in h1, finding a statistically significant overlap (2.4 fold enrichment over expected, hypergeometric test p-value 2.1e-71). Reciprocally, sites that lose NRPE1 were significantly enriched for overlap with H3K27me3 loss regions (1.6 fold over expected, hypergeometric test p-value 1.4e-4). This indicates that RdDM and H3K27me3 patterning are similarly modulated by H1. To directly test this, we reanalysed the H3K27me3 ChIP-seq data from Teano et al., finding coincident gain and loss of H3K27me3 at sites that gain and lose NRPE1 in h1. These results are described from line 250 and in Fig 6e, which supports a general role for H1 in preventing heterochromatin encroachment.
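
      For readers unfamiliar with this kind of overlap test, the sketch below shows how a fold enrichment over expectation and a one-sided hypergeometric p-value can be computed for two sets of regions drawn from a common universe. It is an illustration only: the function name and the toy numbers are hypothetical, and the response does not state which universe of regions the authors used or how they implemented the test.

```python
from math import comb


def overlap_enrichment(universe, n_a, n_b, observed):
    """Fold enrichment and P(overlap >= observed) under the hypergeometric null.

    universe: total number of candidate regions (an assumption of the caller)
    n_a, n_b: sizes of the two region sets
    observed: number of regions shared by both sets
    """
    expected = n_a * n_b / universe
    fold = observed / expected
    # Upper tail: probability of seeing at least `observed` shared regions
    # when n_b regions are drawn at random from the universe.
    total = comb(universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(universe - n_a, n_b - k)
        for k in range(observed, min(n_a, n_b) + 1)
    )
    return fold, tail / total


# Toy example with made-up numbers: 10 regions, two sets of 5, all 5 shared.
fold, p_value = overlap_enrichment(universe=10, n_a=5, n_b=5, observed=5)
```

      The exact integer arithmetic keeps the sketch dependency-free; for genome-scale set sizes a statistics library would be the more practical choice.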

      (6) Lines 166ff and Figure 3: The data walk towards the scenario of pathway redistribution but actually find that RdDM plays a minor role overall as a substantial increase in heterochromatin regions occurs in all contexts and is largely independent of RdDM.

      a. How exactly are DNA-methylation data converted across regions to reach a fraction score from 0 to 1? There is no explanation in the legend for the methods that allow to recapitulate.

      We now explain our methods in full in the Materials and Methods, and all the code for these analyses has now been deposited on GitHub (https://github.com/Zhenhuiz/H1restricts-euchromatin-associated-methylation-pathways-from-heterochromaticencroachment). Briefly, BSMAP is used to calculate the number of reads that are methylated vs unmethylated on a per-cytosine basis across the genome. Next, the DNA methylation fraction in each region is calculated by summing the per-cytosine methylation fractions in a given window and dividing by the total number of cytosines in that same window (i.e. mC/(unmC+mC)), so that it is expressed as a fraction ranging from 0 to 1.

      “0” indicates this region is not methylated, and “1” indicates this region is fully methylated (every cytosine is 100% methylated).  
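
      The calculation described above can be sketched as follows. This is a minimal illustration, not the authors' deposited code: the function name and the input layout (one pair of methylated and unmethylated read counts per cytosine) are assumptions, as is the decision to skip cytosines with no read coverage.

```python
def region_methylation_fraction(sites):
    """Mean per-cytosine methylation fraction for one region.

    sites: iterable of (methylated_reads, unmethylated_reads), one pair per cytosine.
    Returns a value between 0 and 1, or None if no cytosine has coverage.
    """
    # Per-cytosine fraction mC / (unmC + mC); uncovered cytosines are skipped (assumption).
    fractions = [m / (m + u) for m, u in sites if (m + u) > 0]
    if not fractions:
        return None
    # Sum of the per-cytosine fractions divided by the number of cytosines counted.
    return sum(fractions) / len(fractions)


# Hypothetical window with one fully methylated and one unmethylated cytosine.
half = region_methylation_fraction([(10, 0), (0, 10)])
```

      Averaging per-cytosine fractions weights every covered cytosine equally regardless of read depth; pooling reads across the window before dividing would instead weight deeply covered cytosines more heavily, and the two can differ when coverage is uneven.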

      b. Kernel plots? These are slang for experts and should be better described. In addition, nothing is really concluded from these plots in the text, although they may be quite informative.

      Kernel density plots show the proportion of TEs that gain or lose methylation in a particular mutant, rather than the overall average as depicted in the methylation metaplots above. We now describe the kernel density plots in more detail in the Figure 3 legend. 

      (7) Figure 4: This could be a very interesting analysis if the reader could actually understand it.

      a. The legend is minimal. What is the meaning of hypo and hyper regions indicated to the right of Figure 4c?

      b. The color scale represents observed/expected values. What exactly does this mean? Mutant vs WT?

      c. Some comparisons in 4a are cryptic, e.g. h1 nrpe1 nrpe1 vs CHH?

      d. Figure 4d focuses on a correlation square of relevance, but why? Interestingly the square does not correspond to any "hypo" or "hyper" label?

      Thank you, we have revised Figure 4 and legend based on these suggestions to clarify all of the above.

      (8) Lines 226 and Figure 6B. De novo (or increased) targeting of SUVH1 to heterochromatic sites in h1 mutants, similar to NRPE1, is used to support the argument that more access allows other chromatin modifiers to encroach. SUVH1 strongly depends on RdDM for its in vivo binding and may be the least conclusive factor to argue for a "general" encroachment mechanism.

      We appreciate the reviewer’s point here. Something that is entirely independent of RdDM following the same pattern would be stronger evidence in favour of general encroachment. Excitingly, this is exactly what we provide evidence for when investigating the interrelationship with H3K27me3 and we appreciate the reviewer’s suggestion to check this! This data is now described in Figure 6e and line 250.

      Minor:

      (1) Line 23: "Loss of H1 resulted in heterochromatic TE enrichment by NRPE1." This does not seem right. NRPE1 enrichment at TEs?

      Modified, (line 26) thank you.

      (2) Lines 73-74: The idea that DDM1 displaces H1 in heterochromatic TEs is somewhat counterintuitive given the model that heterochromatic TEs are unavailable for RdDM because of the presence of H1. Is this displacement non-permanent and directly linked to interaction with CMT2/3 Met1?

      This is a very good question and we agree with the reviewer that the effect of DDM1 may only be transient or insufficient to allow for full RdDM assembly, or indeed there may be a direct interaction between DDM1 and CMTs/MET1. During preparation of these revisions, a structure of Arabidopsis nucleosome bound DDM1 was published, which provides some insight by showing that DDM1 promotes DNA sliding. This is at least consistent with the idea of DDM1 causing transient / non-permanent displacement of H1 that would be insufficient for RdDM establishment. We incorporate discussion of these ideas at line 80.

      (3) Line 85: A bit more background on the Reader activator complex should be given. In fact, the reader may not really care that it was more recently discovered (not really recent btw) but what does it actually do?

      We have quite extensively reconfigured this paragraph to take into account our new finding with H3K27me3, such that there is less emphasis on the reader activator complex. The sentence now reads as follows:

      “We found that h1 antagonizes NRPE1 occupancy throughout the genome, particularly at heterochromatic regions. This effect was not limited to RdDM, similarly impacting both the methylation reader complex component, SUVH1 (Harris et al., 2018) and polycomb-mediated H3K27me3 (Teano et al., 2023).” (line 95).

      Also, when describing the experiment in the results section (line 241), we now provide more background on SUVH1’s function.

      (4) Lines 80-81: Since it is already shown that RdDM associated small RNAs are more enriched in h1 at heterochromatin, help us to know what is precisely the added value of studying the enrichment of NRPE1 at these sites.

      Good point. We have the following line: ‘...small RNAs are not a direct readout of functional RdDM activity and Pol IV dependent small RNAs are abundant in regions of the genome that do not require RdDM for methylation maintenance and that do not contain Pol V (Stroud et al., 2014).’ (line 90)

      (5) Line 99: This seems to be the only time where the connection between long TEs and heterochromatic regions is mentioned but no source is cited.

      We have added the following appropriate citations: (Bourguet et al., 2021; Zemach et al., 2013). (line 110).

      (6) Line 100: DMRs is used for the first time here without explanation and full text. The abbreviation is introduced later in the text (Line 187).

      Thank you, we now describe DMRs upon first use, line 112.

      (7) Figure 2: Panels 2 c and d should show metaplots for WT and transgenes in one panel. There is something seriously wrong with the normalization in d or the scale for left and right panel is not the same. Neither legend nor methods describe how normalization was performed.

      Thank you for pointing this out, the figure has been corrected. We have updated the Materials and Methods (line 365) and have added codes and pipelines to GitHub to explain the normalisation procedure in more detail (https://github.com/Zhenhuiz/H1restricts-euchromatin-associated-methylation-pathways-from-heterochromaticencroachment).

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are very grateful to the reviewers for their constructive comments. Here is a summary of the main changes we made from the previous manuscript version, based on the reviewers’ comments:

      (1) Introduction of a new model, based on a Markov chain, capturing within-trial evolution in search strategy.

      (2) Addition of a new figure investigating inter-animal variations in search strategy.

      (3) Measurement of model fit consistency across 10 simulation repetitions, to prevent the risk of model overfitting.

      (4) Several clarifications have been made in the main text (Results, Discussion, Methods) and figure legends.

      (5) We now provide processed data and code for the analyses and models in a GitHub repository.

      (6) Simplification of the previous modeling. We realized that the two first models in the previous manuscript version were simply special cases of the third model. Therefore, we retained only the third model, which has been renamed as the ‘mixture model’.

      (7) Modification of Figures 4-6 and Supplementary Figures 7-8 (or their creation) to reflect the aforementioned changes.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors design an automated 24-well Barnes maze with 2 orienting cues inside the maze, then model what strategies the mice use to reach the goal location across multiple days of learning. They consider a set of models and conclude that one of these models, a combined strategy model, best explains the experimental data.

      This study is written concisely and the results presented concisely. The best fit model is reasonably simple and fits the experimental data well (at least the summary measures of the data that were presented).

      Major points:

      (1) One combined strategy (once the goal location is learned) that might seem to be reasonable would be that the animal knows roughly where the goal is, but not exactly where, so it first uses a spatial strategy just to get to the first vestibule, then switches to a serial strategy until it reaches the correct vestibule. How well would such a strategy explain the data for the later sessions? The best combined model presented in the manuscript is one in which the animal starts with a roughly 50-50 chance of a serial (or spatial strategy) from the start vestibule (i.e. by the last session before the reversal the serial and spatial strategies are at ~50-50 in Fig. 5d). Is it the case that even after 15 days of training the animal starts with a serial strategy from its starting point approximately half of the time? The broader point is whether additional examination of the choices made by the animal, combined with consideration of a larger range of possible models, would be able to provide additional insight into the learning and strategies the animal uses.

      Our analysis focused on the evolution of navigation strategies across days and trials. The reviewer raises the interesting possibility that navigation strategy might evolve in a specific manner within each trial, especially on the later days once the environment is learned. To address this possibility, we first examined how some of the statistical distributions, previously analyzed across days, evolved within trials. Consistent with the reviewer’s intuition, the statistical distributions changed within trials, suggesting a specific strategy evolution within trials. Second, we developed a new model, where strategies are represented as nodes of a Markov chain. This model allows potential strategy changes after each vestibule visit, according to a specific set of transition probabilities. Vestibules are chosen based on the same stochastic processes as in the previous model. This new model could be fitted to the experimental distributions and captured both the within-trial evolution and the global distributions. Interestingly, the trials were mostly initiated in the random strategy (~67% chance) and to a lesser extent in the spatial strategy (~25% chance), but rarely in the serial strategy (~8% chance). This new model is presented in Figure 6.
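
      To make the structure of such a model concrete, the sketch below simulates a sequence of strategies as a Markov chain, with one possible strategy change after each vestibule visit. Only the initial probabilities (~67% random, ~25% spatial, ~8% serial) come from the text above; the transition probabilities, the names, and the fixed number of visits are hypothetical placeholders, and the stochastic vestibule choice within each strategy is not modelled here.

```python
import random

STRATEGIES = ("random", "spatial", "serial")

# Initial strategy probabilities as reported in the response above.
INITIAL = {"random": 0.67, "spatial": 0.25, "serial": 0.08}

# Hypothetical transition probabilities (each row sums to 1); not fitted values.
TRANSITIONS = {
    "random": {"random": 0.80, "spatial": 0.15, "serial": 0.05},
    "spatial": {"random": 0.10, "spatial": 0.70, "serial": 0.20},
    "serial": {"random": 0.05, "spatial": 0.05, "serial": 0.90},
}


def draw(distribution, rng):
    """Sample one strategy name from a {name: probability} mapping."""
    names = list(distribution)
    weights = [distribution[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate_strategies(n_visits, rng):
    """Return the strategy in force at each of n_visits vestibule visits."""
    state = draw(INITIAL, rng)
    sequence = [state]
    for _ in range(n_visits - 1):
        # After every vestibule visit the strategy may change.
        state = draw(TRANSITIONS[state], rng)
        sequence.append(state)
    return sequence


sequence = simulate_strategies(10, random.Random(0))
```

      In a full model each state would additionally generate vestibule choices, and the transition probabilities would be fitted to the experimental distributions rather than fixed by hand.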

      (2) To clarify, in the Fig. 4 simulations, is the "last" vestibule visit of each trial, which is by definition 0, not counted in the plots of Fig. 4b? Otherwise, I would expect that vestibule 0 is overrepresented because a trial always ends with Vi = 0.

      The last vestibule visit (vestibule 0 by definition) is counted in the plots of Fig. 4b. We initially shared the same concern as the reviewer. However, upon further consideration, we arrived at the following explanation: A factor that might lead to an overrepresentation of vestibule 0 is the fact that, unlike other vestibules, it has to be contained in each trial, as trials terminated upon the selection of vestibule 0. Conversely, a factor that might contribute to an underrepresentation of vestibule 0 is that, unlike other vestibules, it cannot be counted more than once per trial. These two factors seem to counterbalance each other, resulting in no discernible overrepresentation or underrepresentation of vestibule 0 in the random process.
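
      The counterbalancing argument can be checked with a small simulation. The sketch below assumes that each visit is an independent, uniform draw among all vestibules and that a trial ends at the first draw of vestibule 0; the number of vestibules (12) and all names are illustrative assumptions and may differ from the simulations behind Fig. 4.

```python
import random
from collections import Counter

N_VESTIBULES = 12  # assumed value, for illustration only


def simulate_trial(rng, n=N_VESTIBULES):
    """Draw vestibules uniformly until vestibule 0 (the goal) is selected."""
    visits = []
    while True:
        v = rng.randrange(n)
        visits.append(v)
        if v == 0:
            return visits


rng = random.Random(1)
n_trials = 50000
counts = Counter()
for _ in range(n_trials):
    counts.update(simulate_trial(rng))

# Average number of visits to each vestibule per trial.
visits_per_trial = {v: counts[v] / n_trials for v in range(N_VESTIBULES)}
print(visits_per_trial)
```

      Under these assumptions vestibule 0 is visited exactly once per trial, and every other vestibule is visited once per trial on average: a trial contains on average n draws, one of which is vestibule 0, and the remaining n - 1 are spread evenly over the n - 1 other vestibules.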

      Reviewer #2 (Public Review):

      This paper uses a novel maze design to explore mouse navigation behaviour in an automated analogue of the Barnes maze. Overall I find the work to be solid, with the cleverly designed maze/protocol to be its major strength - however there are some issues that I believe should be addressed and clarified.

      (1) Whilst I'm generally a fan of the experimental protocol, the design means that internal odor cues on the maze change from trial to trial, along with cues external to the maze such as the sounds and visual features of the recording room, ultimately making it hard for the mice to use a completely allocentric spatial 'place' strategy to navigate. I do not think there is a way to control for these conflicts between reference frames in the statistical modelling, but I do think these issues should be addressed in the discussion.

      It should be pointed out that all cues on the maze (visual, tactile, odorant) remained unchanged across trials, since the maze was rotated together with goal and guiding cues. Furthermore, the maze was equipped with an opaque cover to prevent mice from seeing the surrounding room (the imaging of mouse trajectories was achieved using infrared light and camera). It is however possible that some other cues such as room sounds and odors could be perceived and could interfere somewhat with the sensory cues provided inside the maze. We have now mentioned this possibility in the discussion.

      (2) Somewhat related - I could not find how the internal maze cues are moved for each trial to demarcate the new goal (i.e. the luminous cues)? This should be clarified in the methods.

      The luminous cues were fixed to the floor of the arena. Consequently, they rotated along with the arena as a unified unit, as depicted in Figure 1. We have added some clarifications to the Figure 1 legend and the methods.

      (3) It appears some data is being withheld from Figures 2&3? E.g. Days 3/4 from Fig 2b-f and Days 1-5 on for Fig 3. Similarly, Trials 2-7 are excluded from Fig 3. If this is the case, why? It should be clarified in the main text and Figure captions, preferably with equivalent plots presenting all the data in the supplement.

      The statistical distributions for all single days/trials are shown in the color-coded panels of Figures 2 and 3. In the line plots of Figures 2 and 3, we show only the overlay of 2-3 lines for the sake of clarity. The days/trials represented were chosen to capture the dynamic range of variability within the distributions. We have added this information in the figure legends.

      (4) I strongly believe the data and code should be made freely available rather than "upon reasonable request".

      Matrices of processed data and various codes for simulations and analyses are now available at https://github.com/sebiroyerlab/Vestibule_sequences.

      Reviewer #3 (Public Review):

      Royer et al. present a fully automated variant of the Barnes maze to reduce experimenter interference and ensure consistency across trials and subjects. They train mice in this maze over several days and analyze the progression of mouse search strategies during the course of the training. By fitting models involving stochastic processes, they demonstrate that a model combined of the random, spatial, and serial processes can best account for the observed changes in mice's search patterns. Their findings suggest that across training days the spatial strategy (using local landmarks) was progressively employed, mostly at the expense of the random strategy, while the serial strategy (consecutive nearby vestibule check) is reinforced from the early stages of training. Finally, they discuss potential mechanistic underpinnings within brain systems that could explain such behavioral adaptation and flexibility.

      Strength:

      The development of an automated Barnes maze allows for more naturalistic and uninterrupted behavior, facilitating the study of spatial learning and memory, as well as the analysis of the brain's neural networks during behavior when combined with neurophysiological techniques. The system's design has been thoughtfully considered, encompassing numerous intricate details. These details include the incorporation of flexible options for selecting start, goal, and proximal landmark positions, the inclusion of a rotating platform to prevent the accumulation of olfactory cues, and careful attention given to automation, taking into account specific considerations such as the rotation of the maze without causing wire shortage or breakage. When combined with neurophysiological manipulations or recordings, the system provides a powerful tool for studying the spatial navigation system.

      The behavioral experiment protocols, along with the analysis of animal behavior, are conducted with care, and the development of behavioral modeling to capture the animal's search strategy is thoughtfully executed. It is intriguing to observe how the integration of these innovative stochastic models can elucidate the evolution of mice's search strategy within a variant of the Barnes maze.

      Weakness:

      (1) The development of the well-thought-out automated Barnes maze may attract the interest of researchers exploring spatial learning and memory. However, this aspect of the paper lacks significance due to insufficient coverage of the materials and methods required for readers to replicate the behavioral methodology for their own research inquiries.

      Moreover, as discussed by the authors, the methodology favors specialists who utilize wired recordings or manipulations (e.g. optogenetics) in awake, behaving rodents. However, it remains unclear how the current maze design, which involves trapping mice in start and goal positions and incorporating angled vestibules resulting in the addition of numerous corners, can be effectively adapted for animals with wired implants.

      The reviewer is correct in pointing out that the current maze design is not suitable for performing experiments with wired implants, particularly due to the maze’s enclosed structure and the access to the start/goal boxes through side holes. Instead, pharmacogenetics and wireless approaches for optogenetics and electrophysiology would need to be used. We have now mentioned this limitation in the discussion.

      (2) Novelty: In its current format, the main axis of the paper falls on the analysis of animal behavior and the development of behavioral modeling. In this respect, while it is interesting to see how thoughtfully designed models can explain the evolution of mice search strategy in a maze, the conclusions offer limited novel findings that align with the existing body of research and prior predictions.

      We agree with the reviewer that our study is weakly connected to previous research on hippocampus and spatial navigation, as it consists mainly of animal behavior analysis and modeling and addresses a relatively unexplored topic. We hope that the combination of our behavioral approach with optogenetics and electrophysiology will in the future yield new insights that are in line with the existing body of research.

      (3) Scalability and accessibility: While the approach may be intriguing to experts who have an interest in or are familiar with the Barnes maze, its presentation seems to primarily target this specific audience. Therefore, there is a lack of clarity and discussion regarding the scalability of behavioral modeling to experiments involving other search strategies (such as sequence or episodic learning), other animal models, or the potential for translational applications. The scalability of the method would greatly benefit a broader scientific community. In line with this view, the paper's conclusions heavily rely on the development of new models using custom-made codes. Therefore, it would be advantageous to make these codes readily available, and if possible, provide access to the processed data as well. This could enhance comprehension and enable a larger audience to benefit from the methodology.

      The current approach might indeed extend to other species in equivalent environments and might also constitute a general proof of principle regarding the characterization of animal behaviors by the mixing of stochastic processes. We have now mentioned these points in the discussion.

      As suggested by the reviewer, we have now provided model/simulation codes and processed data to replicate the figures, at https://github.com/sebiroyerlab/Vestibule_sequences

      (4) Cross-validation of models: The authors have not implemented any measures to mitigate the risk of overfitting in their modeling. It would have been beneficial to include at least some form of cross-validation with stochastic models to address this concern. Additionally, the paper lacks the presence of analytics or measures that assess and compare the performance of the models.

      To avoid the risk of model overfitting, the most appropriate solution appeared to be repeating the simulations several times and examining the consistency of the obtained parameters across repetitions. For the mixture model, we now show in Supplementary figure 7 the probabilities obtained from 10 repetitions of the simulation. Similarly, for the Markov chain model, the probabilities obtained from 10 repetitions of the simulation are shown in Figure 6.

      Regarding model comparison, we have simplified our mixture model into only one model, as we realized the 2 other models in the previous manuscript version were simply special cases of the 3rd model. Nevertheless, comparison was still needed to estimate the best value of N (the number of consecutive segments that a strategy lasts) in the mixture model. We now show the comparison of mean square errors obtained for different values of N, using a t-test across 10 repetitions of the simulations (Figure 5c).
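
      For readers who wish to reproduce this kind of comparison, a minimal Python sketch is given below. It computes a mean square error between an observed and a simulated distribution, and a t statistic between the errors obtained for two values of N. The response does not state which t-test variant was used, so Welch's unequal-variance form is an assumption here, as are the function names and the toy numbers.

```python
import math
from statistics import mean, variance


def mean_square_error(observed, simulated):
    """Mean squared difference between two equally binned distributions."""
    if len(observed) != len(simulated):
        raise ValueError("distributions must have the same number of bins")
    return mean((o - s) ** 2 for o, s in zip(observed, simulated))


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy numbers only: errors from repeated simulations for two candidate values of N.
errors_n_a = [1.0, 2.0, 3.0]
errors_n_b = [2.0, 3.0, 4.0]
t_stat, dof = welch_t(errors_n_a, errors_n_b)
print(t_stat, dof)
```

      In practice each list would hold one error per repetition of the simulation (10 in the analysis described above), and the p-value would be read from a t distribution with the returned degrees of freedom.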

      (5) Quantification of inter-animal variations in strategy development: It is important to investigate, and address the argument concerning the possibility that not all animals recruit and develop the three processes (random, spatial, and serial) in a similar manner over days of training. It would be valuable to quantify the transition in strategy across days for each individual mouse and analyze how the population average, reflecting data from individual mice, corresponds to these findings. Currently, there is a lack of such quantification and analysis in the paper.

      We have added a figure (Supplementary figure 8) showing the mixture model matching analyses for individual animals. A lot of variability is indeed observed across animals, with some animals displaying strong preferences for certain strategies compared to others. The average across the mouse population showed a similar trend to the result obtained with the pooled data.

      Recommendations for the authors:

      Summary of Reviewer Comments:

      (1) In its present form, the manuscript lacks sufficient coverage of the materials and methods necessary for readers to replicate the behavioral methodology in their own research inquiries. For instance, it would be beneficial to clarify how the cues are rotated relative to the goal.

      (2) The models may be over-fitted, leading to spurious conclusions, and cross-validation is necessary to rule out this possibility.

      (3) The specific choice of the three strategies used to fit behavior in this model should be better justified, as other strategies may account for the observed behavior.

      (4) The study would benefit from an analysis of behavior on an animal-by-animal basis, potentially revealing individual differences in strategies.

      (5) Spatial behavior is not necessarily fully allocentric in this task, as only the two cues in the arena can be used for spatial orientation, unlike odor cues on the floor and sound cues in the room. This should be discussed.

      (6) Making the data and code fully open source would greatly strengthen the impact of this study.

      In addition, each reviewer has raised both major and minor concerns which should be addressed if possible.

      Reviewer #1 (Recommendations For The Authors):

      Minor points:

      (1) Change "tainted" to "tinted" in Fig. 1a

      (2) Should note explicitly in Fig. 2d that the goal is at vestibule 0, and also in the legend

      (3) Fig. 3 legend should say "c-e)", not "c-f)"

      (4) Supplementary Fig. 8 legend repeats "d)" twice

      Reviewer #2 (Recommendations For The Authors):

      Packard & McGaugh 1996 is cited twice as refs 5 and 14

      Reviewer #3 (Recommendations For The Authors):

      - Figure 3: Please correct the labels referenced as "c-f)" in the figure's legend.

      - Rounding numbers issue on page 4: 82.62% + 17.37% equals 99.99%, not 100%.

      We fixed all minor points. We are very thankful to the reviewers for their constructive comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful to the reviewers and the editor for their detailed feedback, insightful suggestions, and thoughtful assessment of our work. Our point-by-point responses to the comments and suggestions are below.

      The revised manuscript has taken into account all the comments of the three reviewers. Modifications include corrections to errors in spelling and unit notation, additional quantification, improvements to the clarity of the language in some places, as well as additional detail in the descriptions of the methods, and revisions to the figures and figure legends.

      We have also undertaken additional analyses and added materials in response to reviewer suggestions. In brief:

      In response to a suggestion from Reviewer #1, we added Figure 6-1 to show examples of the calcium traces of individual fish and individual ROIs from the condensed data in Figure 6. We revised Figure 7 as follows:

      • We added an analysis of the duration of the response to shock to address comments from Reviewers #2 and #3.

      • In response to Reviewer #3, we added histograms showing the distribution of the amplitudes of the calcium signals in the gsc2 and rln3a neurons to show, without relying on the detection of peaks in the calcium trace, that the rln3a neurons have more oscillations in activity.

      We added Figure 8-2 in response to the suggestion from Reviewer #3 to analyze turning behavior in larvae with ablated rln3a neurons.

      To address Reviewer #2’s suggestion to show how the ablated transgenic animals compare to the non-ablated transgenic animals of the same genotype, we have added this analysis as Figure 8-3.

      A detailed point-by-point is as follows:

      The reviewers agree that the study of Spikol et al is important, with novel findings and exciting genetic tools for targeting cell types in the nucleus incertus. The conclusions are overall solid. Results could nonetheless be strengthened by performing a few additional optogenetic experiments and by consolidating the analysis of calcium imaging and behavioral recordings as summarized below.

      (1) Light pulses used for optogenetic-mediated connectivity mapping were very long (5 s), which could lead to non-specific activation of numerous populations of neurons other than the targeted ones. To confirm their results, the authors should repeat their experiments with brief light pulses (5-50 ms, 500 ms maximum) for stimulation.

      As the activity of the gsc2 neurons is already increased by 1.8 fold (± 0.28) within the first frame that the laser is activated (duration ~200 msec), it is unlikely that the observed response is due to non-specific activation induced by the long light pulse.

      (2) In terms of analysis, the authors should improve:

      a) The detection of calcium events in the "calcium trace" showing the change in fluorescence over time by detecting the sharp increase in the signal when intracellular calcium rises;

      We have added an additional analysis to Figure 7 that does not rely on detection of calcium peaks. See response to Reviewer #3.

      b) The detection of bouts in the behavioral recordings by measuring when the tail beat starts and ends, thereby distinguishing the active swimming during bouts from the immobility observed between bouts.

      Our recordings capture the entire arena that the larva can explore in the experiment and therefore lack the spatial resolution to capture and analyze the tail beat. Rather, we measured the frequency and length of phases of movement in which the larva shows no more than 1 second of immobility. To avoid confusion with studies that measure bouts from the onset of tail movement, we removed this term from the manuscript and refer to activity as phases of movement.
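
      The definition of a phase of movement given above can be sketched as follows in Python. The input is assumed to be one boolean per video frame indicating whether the larva moved in that frame; how that flag is derived from the tracking data, the frame rate and the function name are assumptions made for illustration, not a description of the analysis code.

```python
def movement_phases(moving, frame_rate_hz, max_pause_s=1.0):
    """Return (start_frame, end_frame_exclusive) for each phase of movement.

    Moving frames separated by a pause of at most max_pause_s seconds are
    merged into the same phase; a longer pause starts a new phase.
    """
    max_gap = int(round(max_pause_s * frame_rate_hz))
    phases = []
    start = None
    last_moving = None
    for i, is_moving in enumerate(moving):
        if not is_moving:
            continue
        if start is None:
            start = i
        elif i - last_moving - 1 > max_gap:
            # The pause since the last moving frame was too long: close the phase.
            phases.append((start, last_moving + 1))
            start = i
        last_moving = i
    if start is not None:
        phases.append((start, last_moving + 1))
    return phases


# Toy example at 10 frames per second: 0.8 s pause (merged), then 1.5 s pause (split).
moving = [True] * 5 + [False] * 8 + [True] * 5 + [False] * 15 + [True] * 3
phases = movement_phases(moving, frame_rate_hz=10)
durations_s = [(end - start) / 10 for start, end in phases]
print(phases, durations_s)
```

      The frequency of phases is then the number of returned intervals per unit of recording time, and the duration of each phase is its frame count divided by the frame rate.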

      (3) The reviewers also ask for more precision in the characterization of the newly-generated knock-in lines and the corresponding anatomy as explained in their detailed reports.

      Please refer to the point-by-point request for additional details that have now been added to the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      The conclusions of this paper are mostly well supported by data, but some technical aspects, especially about calcium imaging and data analysis, need to be clarified.

      (1) Both the endogenous gsc2 mRNA expression and Tg(gsc2:QF2) transgenic expression are observed in a neuronal population in the NI, but also in a more sparsely distributed population of neurons located more anteriorly (for example, Fig. 2B, Fig. 5A). The latter population is not mentioned in the text. It would be necessary to clarify whether or not this anterior population is also considered as the NI, and whether this population was included for the analysis of the projection patterns and ablation experiments.

      The sparsely distributed neurons had been mentioned in the Results, line 134, but we have now added more detail. In line 328, we have clarified that: “As the sparsely distributed anterior group of gsc2 neurons (Fig. 2B, C) are anatomically distinct from the main cluster and not within the nucleus incertus proper, they were excluded from subsequent analyses.”

      (2) Both Tg(gsc2:QF2) and Tg(rln3a:QF2) transgenic lines have the QF genes inserted in the coding region of the targeted genes. This probably leads to knock out of the gene in the targeted allele. Can the authors mention whether or not the endogenous expression of gsc2 and rln3a was affected in the transgenic larvae? Is it possible that the results they obtained using these transgenic lines are affected by the (heterozygous or homozygous) mutation of the targeted genes?

      Figure 8-1 includes in situ hybridization for gsc2 and rln3a in heterozygous Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 and Tg(rln3a:QF2; he1.1:YFP)c836; Tg(QUAS:GFP)c578 transgenic larvae.

      The expression of gsc2 is unaffected in Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 heterozygotes

      (Fig. 8-1A), whereas the expression of rln3a is reduced in Tg(rln3a:QF2; he1.1:YFP)c836; Tg(QUAS:GFP)c578 heterozygous larvae (Fig. 8-1D), as mentioned in the legend for Figure 8-1. We confirmed these findings by comparing endogenous gene expression between transgenic and non-transgenic siblings that were processed for RNA in situ hybridization in the same tube.

      The behavioral results we obtained are not due to rln3a heterozygosity because comparisons were made with sibling larvae that are also heterozygous for Tg(rln3a:QF2; he1.1:YFP)c836; Tg(QUAS:GFP)c578, as stated in the Figure 8 legend.

      (3) Optogenetic activation and simultaneous calcium imaging is elegantly designed using the combination of the orthogonal Gal4/UAS and QF2/QUAS systems (Fig. 6). However, I have some concerns about the analysis of calcium responses from a technical point of view. Their definition of ΔF/F in this manuscript is described as (F-Fmin)/(Fmax-Fmin) (see line 1406). This is confusing because it is different from the conventional definition of ΔF/F, which is (F-F0)/F0, where F0 is a baseline GCaMP fluorescence. Their way of calculating the ΔF/F is inappropriate for measuring the change in fluorescence relative to the baseline signal because it rather normalizes the amplitude of the responses across different ROIs. The same argument applies to the analyses done for Fig. 7.

      We have taken a careful look at our analyses and replotted the data using (F-F0)/F0. However, this only changes Y-axis values and does not change the shape of the calcium trace or the change in signal upon stimulation. Both metrics, (F-F0)/F0 and (F-Fmin)/(Fmax-Fmin), adjust the fluorescence values of each ROI to its own baseline.
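
      The relationship between the two normalizations can be illustrated with a short Python sketch. The fluorescence values are made up, and taking F0 as the mean of the first three frames is an assumption for the example, not the baseline definition used in the manuscript.

```python
def dff_baseline(trace, f0):
    """Conventional dF/F: change relative to a baseline fluorescence F0."""
    return [(f - f0) / f0 for f in trace]


def dff_minmax(trace):
    """Min-max normalization: rescales each trace to the range 0..1."""
    fmin, fmax = min(trace), max(trace)
    return [(f - fmin) / (fmax - fmin) for f in trace]


# Made-up raw fluorescence values for one ROI.
trace = [100.0, 102.0, 98.0, 150.0, 180.0, 130.0, 105.0]
f0 = sum(trace[:3]) / 3  # assumed baseline: mean of the first three frames

a = dff_baseline(trace, f0)
b = dff_minmax(trace)

# Rescaling the conventional dF/F to 0..1 reproduces the min-max trace.
a_rescaled = [(x - min(a)) / (max(a) - min(a)) for x in a]
print(a)
print(b)
```

      For a single ROI the two traces differ only by a shift and a scale factor, so their shapes are identical. The scale factor differs between ROIs, however, so only (F-F0)/F0 preserves the size of a response relative to baseline when ROIs are compared.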

      (4) The %ΔF/F plots shown in Fig. 6 are highly condensed showing the average of different ROIs (cells) within one fish and then the average of multiple fish. It would be helpful to see example calcium traces of individual ROIs and individual fish to know the variability across ROIs and fish. Also, it would be helpful to know how much laser power (561 nm laser) was used to photostimulate ReaChR.

      Laser power (5%) was added to the section titled Calcium Signaling in Methods.

      In Figure 6, shading in the %ΔF/F plots (D, D’, E, E’, F, F’, G, G’, H, H’) represents the variability across ROIs, and the dot plots (D’’, E’’, F’’, G’’, H’’) show the variability across fish (where each data point represents an individual fish). We have now also added Figure 6-1 with examples of calcium traces from individual fish and individual ROIs.

      (5) Some calcium traces presented in Fig. 6 (Fig. 6D, D', F, H, H') show discontinuous fluctuations at the onset and offset of the photostimulation period. Is this caused by some artifacts introduced by switching the settings for the photostimulation? The authors should mention if there are some alternative explanations for this discontinuity.

      As noted by the reviewer, this artifact does result from switching the settings for photostimulation, which we mention in the legend for Figure 6.

      (6) In the introduction, they mention that the griseum centrale is a presumed analogue of the NI (lines 74-75). It would be helpful for the readers to better understand the brain anatomy if the authors could discuss whether or not their findings on the gsc2 and rln3a NI neurons support this idea.

      Our findings on the gsc2 and rln3a neurons support the idea that the griseum centrale of fish is the analogue of the mammalian NI. We have now edited the text in the third paragraph of the discussion, line 1271, to make this point more clearly: “By labeling with QUAS-driven fluorescent reporters, we determined that the anatomical location, neurotransmitter phenotype, and hodological properties of gsc2 and rln3a neurons are consistent with NI identity, supporting the assertion that the griseum centrale of fish is analogous to the mammalian NI. Both groups of neurons are GABAergic, reside on the floor of the fourth ventricle and project to the interpeduncular nucleus.”

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) Throughout the figures a need for more precision and reference in the anatomical evidence:

      • Specify how many planes over which height were projected for each Z-projection in Figure 1,2,3, ....

      We added this information to the last paragraph of the section titled Confocal Imaging within the Materials and Methods.

      • Provide the rhombomere numbers, delineate the ventricles & always indicate on the panel the orientation (Rostral Caudal, Left Right or Ventral Dorsal) for Figure 1 panels D-F, Figure 2-1B-G, Figure 2-2A-C in the adult brain, Figure 3.

      We annotated Figures 2-1 and 2-2 as suggested. We also indicated the orientation (anterior to the top or anterior to the left) in all figure legends. For additional context on the position of gsc2 and rln3a neurons within the larval brain, refer to Fig. 1A-C’, Fig. 1-2A, Fig. 2, Fig. 4 and Fig. 5.

      • Add close up when necessary: Figure 2-2A-C, specify in the text & in the figure where are the axon bundles from the gsc2+ neurons in the adult brain- seems interesting and is not commented on?

      We added a note to the legend of Figure 2-2: Arrowheads in B and B’ indicate mApple labeling of gsc2 neuronal projections to the hypothalamus. We also refer to Fig 2-2B, B’ in the Results section titled Distinct Projection Patterns of gsc2 and rln3a neurons.

      • keep the same color for one transgene within one figure: example, glutamatergic neurons should always be the same color in A,B,C - it is confusing as it is.

      We have followed the reviewer’s suggestion and made the color scheme consistent in Figure 3.

      • Movies: add the labels (which transgenic lines in which color, orientation & anatomical boundaries for NI, PAG, any other critical region that receives their projections and the brain ventricle boundaries) on the anatomical movies in supplemental (ex Movie 4-1 for gsc2 neurons and 4-2 for rln3 neurons: add cerebellum, IPN, raphe, diencephalon, and rostral and caudal hypothalamus, medulla for 4-1 as well as lateral hypothalamus and optic tectum for 4-2); add the ablated region when necessary.

      We added more detail to the movie legends. Please refer to Figure 4 for additional anatomical details.

      • for highlighting projections from NI neurons and distinguishing them from the PAG neurons, the authors elegantly used 2 Photon ablation of one versus the other cluster: this method is valid but we need more resolution than the Z stacks added in supplemental by performing subtraction of before and after maps.

      We are not sure what the reviewer meant by subtraction as there are no before and after images in this experiment. Larvae underwent ablation of cell bodies and were imaged one day later in comparison to unablated larvae.

      In particular, it is not clear to me if both PAG and NI rln3a neurons project to medulla - can the authors specify this point & the comparison between intact & PAG vs NI ablation maps? The authors should resolve better the projections to all targeted regions of NI gsc2 neurons and differentiate them from other PAG gsc2 neurons, same for rln3a neurons.

      We have clarified this point on line 549.

      Make sure to mention in the result section the duration between ablation & observation that is key for the axons to degrade.

      We always assessed degeneration of neuronal processes at 1-day post-ablation.

      (2) calcium imaging experiments:

      a) with optogenetic connectivity mapping:

      the authors combine an impressive diverse set of optogenetic actuators & sensors by taking advantage of the QUAS/QF2 and UAS/GAL4 systems to test connectivity from Hb-IPN onto gsc2 and rln3 neurons.

      The experiments are convincing but the choice of the duration of the stimulation (5s) is not adequate to test for direct connectivity: the authors should make sure that response in gsc2 neurons is observed with short duration (50ms-1s max).

      As noted above:

      “As the activity of the gsc2 neurons is already increased by 1.8 fold (± 0.28) within the first frame that the laser is activated (duration ~200 msec), it is unlikely that the observed response is due to non-specific activation induced by the long light pulse.”

      note: Specify that the gsc2 neurons tested are in NI.

      We have edited the text accordingly in the Results section titled Afferent input to the NI from the dHb-IPN pathway.

      b) for the response to shock: in the example shown for rln3 neurons, the activity differs before and after the shock with long phases of inhibition that were not seen before. Is it representative? the authors should carefully stare at their data & make sure there is no difference in activity patterns after shock versus before.

      We reexamined the responses for each of the rln3a neurons individually and confirmed that, although oscillations in activity are frequent, the apparent inhibition (excursions below baseline) is an idiosyncratic feature of the particular example shown.

      (3) motor activity assay:

      a) there seems to be a misconception in the use of the word "bout" to estimate in panels H and I bout distance and duration and the analysis should be performed with the criterion used by all in the motor field:

      As we know now well based on the work of many labs on larval zebrafish (Orger, Baier, Engert, Wyart, Burgess, Portugues, Bianco, Scott, ...), a bout is defined as a discrete locomotor event corresponding to a distance swam of typically 1-6mm, bout duration is typically 200ms and larvae exhibit a bout every s or so during exploration (see Mirat et al Frontiers 2013; Marques et al Current Biology 2018; Rajan et al. Cell Reports 2022).

      Since the larval zebrafish has a low Reynolds number, it does not show much glide and its movement corresponds widely to the active phase of the tail beats.

      Instead of detecting the active (moving) frames as bouts, however, the authors report values that are far off, which indicates a calibration error in the detection of movement: a bout cannot last for 5-10 s, nor can the fish swim for more than 1 cm per bout (by the authors' definition, bouts last for 5-10 s and correspond to 10 cm, as 50 cm is covered in 5 bouts).

      The authors should therefore distinguish the active (moving) from inactive (immobile) phase of the behavior to define bouts & analyze the corresponding distance travelled and duration of active swimming. They would also benefit from calculating the % of time spent swimming in order to test whether the fish with ablated rln3 neurons change the fraction of the time spent swimming.

      As noted above:

      Our recordings capture the entire arena that the larva can explore in the experiment and therefore lack the spatial resolution to capture and analyze the tail beat. Rather, we measured the frequency and length of phases of movement in which the larva shows no more than 1 second of immobility. To avoid confusion with studies that measure bouts from the onset of tail movement, we removed this term from the manuscript and refer to activity as phases of movement.

      Note that a duration in seconds is not a length and that the corresponding symbol for seconds in a scientific publication is "s" and not "sec".

      We have corrected this.

      b) controls in these experiments are key as many clutches differ in their spontaneous exploration and there is a lot of variation for 2 min long recordings (baseline is 115s). The authors specify that the control unablated are a mix of siblings; they should show us how the ablated transgenic animals compare to the non ablated transgenic animals of the same clutch.

      The unablated Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 and Tg(rln3a:QF2, he1.1:YFP)c836; Tg(QUAS:GFP)c578 larvae in the control group are siblings of ablated larvae. We repeated the analyses using either the Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 or Tg(rln3a:QF2, he1.1:YFP)c836; Tg(QUAS:GFP)c578 larvae only as controls and added the results in Figure 8-3. Although the statistical power is slightly reduced due to a smaller number of samples in the control group, the conclusions are the same, as the behavior of Tg(gsc2:QF2)c721; Tg(QUAS:GFP)c578 and Tg(rln3a:QF2, he1.1:YFP)c836; Tg(QUAS:GFP)c578 unablated larvae is indistinguishable.

      Minor comments:

      (1) Anatomy :

      • Add precision in the anatomy in Figure 1:

      • Improve contrast for cckb.

      The contrast is determined by the signal to background ratio from the fluorescence in situ hybridization. Increasing the brightness would increase both the signal and the background, as any modification must be applied to the whole image.

      • since the number of neurons seems low in each category, could you quantify the number of rln3+, nmbb+, gsc2+, cckb+ neurons in NI?

      Quantification of neuronal numbers has been added to the first Results section titled Identification of gsc2 neurons in the Nucleus Incertus, lines 219-224.

      note: indicate duration for the integral of the DF/F in s and not in frames.

      We have added this in the legends for Figures 6 and 7 and in Materials and Methods.

      (2) Genetic tools:

      To generate a driver line for the rln3+ neurons using the Q system, the authors used the promoter for the hatching gland in order to drive expression in a structure outside of the nervous system that turns on early and transiently during development: this is a very elegant approach that should be used by many more researchers.

      If the her1 construct was integrated together with the QF2 in the first exon of the rln3 locus as shown in Figure 2, the construct should be listed with a "," rather than a ";" after rln3a:QF2 in the transgene name. Please edit the transgene name accordingly.

      We have edited the text accordingly.

      (3) Typos:

      GABAergic neurons is misspelled twice in Figure 3.

      Thank you for catching this. We have corrected the misspellings.

      Reviewer #3 (Recommendations For The Authors):

      • More analysis should be done to better characterize the calcium activity of gsc2 and rln3a populations. Specifically:

      Spontaneous activity is estimated by finding peaks in the time-series data, but the example in Fig7 raises concerns about this process: Two peaks for the gsc2 cell are identified while numerous other peaks of apparently similar SNR are not detected. Moreover, the inset images suggest GCaMP7a expression might be weaker in the gsc2 transgenic and as such, differences in peak count might be related to the SNR of the recordings rather than underlying activity. Overall, the process for estimating spontaneous activity should be more rigorous.

      To not solely rely on the identification of peaks in the calcium traces, we also plotted histograms of the amplitudes of the calcium signals for the rln3a and gsc2 neurons. The histograms show that the amplitudes of the rln3a calcium signals frequently occur at small and large values (suggesting large fluctuations in activity), whereas the amplitudes of the gsc2 calcium signals occur most frequently at median values. We added this analysis to a revised Figure 7.

      Interestingly, there are a number of large negative excursions in the calcium data for the rln3a cell - what is the authors' interpretation of these? Could it be that presynaptic inhibition via GABA-B receptors in dIPN might influence dIPN-innervating rln3a neurons?

      As noted above:

      We reexamined the responses for each of the rln3a neurons individually and confirmed that, although oscillations in activity are frequent, the apparent inhibition (excursions below baseline) is an idiosyncratic feature of the particular example shown.

      Regarding shock-evoked activity, the authors state "rln3a neurons showed ... little response to shock", yet the immediate response after shock appears very similar in gsc2 vs rln3a cells (approx 30 units on the dF/F scale). The subsequent time-course of the response is what appears to distinguish gsc2 versus rln3a; it might thus be useful to separately quantify the amplitude and decay time constant of the shock evoked response for the two populations.

      The reviewer is correct that the difference between the gsc2 and rln3a neurons in the response to shock is dependent on the duration of time post-shock that is analyzed. Thus, the more relevant feature is the length of the response rather than the size. To reflect this, we compared the average length of responses for the gsc2 and rln3a neurons. We have now added this analysis to Figure 7 and updated the text accordingly.

      • The difference in spontaneous locomotor behavior is interesting and the example tracking data suggests there might also be differences in turn angle distribution and/or turn chain length following rln3 NI ablations. I would recommend the authors consider exploring this.

      Thank you for this suggestion. We wrote additional code to quantify turning behavior and found that larvae with rln3a NI neurons ablated do indeed have a statistically significant increase in turning compared to other groups. We now show this analysis as Figure 8-2 and we added an explanation of the quantification of turning behavior to the Methods section titled Locomotor assay.

      • I didn't follow the reasoning in the discussion that activity of rln3a cells may control transitions between phases of behavioral activity and inactivity. The events (at least those that are detected) in Fig7 occur with an average interval exceeding 30 s, yet swim bouts occur at a frequency around 1 Hz. The authors should clarify their hypothesis about how these disparate timescales might be connected.

      As noted above:

      Our recordings capture the entire arena that the larva can explore in the experiment and therefore lack the spatial resolution to capture and analyze the tail beat. Rather, we measure the frequency and length of phases of movement in which the larva shows no more than 1 second of immobility. To avoid confusion with studies that measure bouts from the onset of tail movement, we removed this term from the manuscript and refer to activity as phases of movement.

      • Fig2-2: Images are ordered from (A, B, C) anterior to (A', B', C') posterior. It's not clear what this means and images appear to be in sequence A, A', B, B'.... please clarify and consider including a cartoon of the brain in sagittal view showing location of sections indicated.

      We clarified the text in the Figure 2-2 legend and added a drawing of the brain showing the location of the sections.

      • In Fig7, why are 300 frames analyzed pre/post shock? Even for gsc2, the response appears complete in ~100 frames.

      Reviewer #2 also pointed out that the difference between the gsc2 and rln3a neurons in the response to shock is dependent on the duration of time post-shock that is analyzed. Thus, the more relevant feature is the length of the response rather than the size. To reflect this, we compared the average length of response for the gsc2 and rln3a neurons and modified the text and Figure as described above.

      • What are the large negative excursions in the calcium signal in the rln3a data (Fig7E)?

      See response to Reviewer #2, repeated below:

      We looked through the responses of each individual rln3a neuron and confirmed that, although oscillations in activity are frequent among the rln3a neurons, the apparent inhibition (excursions below baseline) is an idiosyncratic feature of the particular example shown.

      • There are several large and apparently perfectly straight lines in the fish tracking examples (Fig8) suggestive of tracking errors (ie. where the tracked centroid instantaneously jumps across the camera frame). Please investigate these and include analysis of the distribution of swim velocities to support the validity of the tracking data.

      The reason for this is indeed imperfect tracking resulting in frames in which the tracker does not detect the larva. The result is that the larva appears to move 1 cm or more in a single frame. However, analysis of the distribution of distances across all frames shows that these events (movement of 1 cm or more in a single frame) are rare (less than 0.04%), and there are no systematic differences that would explain the differences in locomotor behavior presented in Fig. 8. A summary of the data is as follows:

      Controls: 0.0249% of distances 1 cm or greater

      gsc2 neurons ablated: 0.0302% of distances 1 cm or greater

      rln3a NI neurons ablated: 0.0287% of distances 1 cm or greater

      rln3a PAG neurons ablated: 0.0241% of distances 1 cm or greater
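
      A check of this kind can be sketched as follows. The function name and the input format (a list of per-frame centroid positions in cm) are assumptions made for illustration; only the 1 cm criterion comes from the response above.

```python
import math


def jump_percentage(xy, jump_cm=1.0):
    """Percentage of frame-to-frame displacements of at least jump_cm.

    xy: list of (x, y) centroid positions in cm, one per frame.
    """
    steps = [
        math.hypot(b[0] - a[0], b[1] - a[1])
        for a, b in zip(xy, xy[1:])
    ]
    if not steps:
        return 0.0
    n_jumps = sum(1 for d in steps if d >= jump_cm)
    return 100.0 * n_jumps / len(steps)
```

      Computing this value separately for each experimental group gives the kind of per-group summary listed above, and makes it easy to see whether implausible jumps are concentrated in one group.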

      • Insufficient detail is provided in the methods about how swim bouts are detected (and their durations extracted) from the centroids tracking data. Please expand detail in this section.

      We added an explanation to the Methods section titled Locomotor assay.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study uses carefully designed experiments to generate a useful behavioural and neuroimaging dataset on visual cognition. The results provide solid evidence for the involvement of higher-order visual cortex in processing visual oddballs and asymmetry. However, the evidence provided for the very strong claims of homogeneity as a novel concept in vision science, separable from existing concepts such as target saliency, is inadequate.

      We appreciate the positive and balanced assessment from the reviewers. We agree that visual homogeneity is similar to existing concepts such as target saliency. We have tried our best to articulate our rationale for defining it as a novel concept. However, the debate about whether visual homogeneity is novel or related to existing concepts is completely beside the point, since that is not the key contribution of our study.

      Our key contribution is our quantitative model for how the brain could be solving generic visual tasks by operating on a feature space. In the literature there are no theories regarding the decision-making process by which the brain could be solving generic visual tasks. In fact, oddball search tasks, same-different tasks and symmetry tasks are never even mentioned in the same study because it is tacitly assumed that the underlying processes are completely different! Our work brings together these disparate tasks by proposing a specific computation that enables the brain to solve both types of tasks and providing evidence for it. This specific computation is a well-defined, falsifiable model that will need to be replicated, elaborated and refined by future studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors define a new metric for visual displays, derived from psychophysical response times, called visual homogeneity (VH). They attempt to show that VH is explanatory of response times across multiple visual tasks. They use fMRI to find visual cortex regions with VH-correlated activity. On this basis, they declare a new visual region in the human brain, area VH, whose purpose is to represent VH for the purpose of visual search and symmetry tasks.

      Thank you for your concise summary. We appreciate your careful reading and thoughtful and constructive comments.

      Strengths:

      The authors present carefully designed experiments, combining multiple types of visual judgments and multiple types of visual stimuli with concurrent fMRI measurements. This is a rich dataset with many possibilities for analysis and interpretation.

      Thank you for your accurate assessment of the strengths of our study.

      Weaknesses:

      The datasets presented here should provide a rich basis for analysis. However, in this version of the manuscript, I believe that there are major problems with the logic underlying the authors' new theory of visual homogeneity (VH), with the specific methods they used to calculate VH, and with their interpretation of psychophysical results using these methods. These problems with the coherency of VH as a theoretical construct and metric value make it hard to interpret the fMRI results based on searchlight analysis of neural activity correlated with VH.

      We appreciate your concerns and have tried our best to address them fully in our responses to your specific comments below.

      In addition, the large regions of VH correlations identified in Experiments 1 and 2 vs. Experiments 3 and 4 are barely overlapping. This undermines the claim that VH is a universal quantity, represented in a newly discovered area of the visual cortex, that underlies a wide variety of visual tasks and functions.

      We agree with you that the VH regions defined using the symmetry task and the search task do not overlap completely (as we have shown in Figure S13). However, this is to be expected for several reasons. First, the images in the symmetry task were presented at fixation, whereas the images in the visual search task were presented peripherally. Second, the lack of overlap could be due to variations across individuals. Indeed, considerable individual variability has been observed in the location of category-selective regions such as VWFA (Glezer and Riesenhuber, 2013) and FFA (Weiner and Grill-Spector, 2012). We propose that testing the same participants on both search and symmetry tasks would reveal overlapping VH regions. We now acknowledge these issues in the Results (p. 26).

      Maybe I have missed something, or there is some flaw in my logic. But, absent that, I think the authors should radically reconsider their theory, analyses, and interpretations, in light of the detailed comments below, to make the best use of their extensive and valuable datasets combining behavior and fMRI. I think doing so could lead to a much more coherent and convincing paper, albeit possibly supporting less novel conclusions.

      We appreciate your concerns. We have tried our best to address them fully in our responses to your specific comments below.

      THEORY AND ANALYSIS OF VH

      (1) VH is an unnecessary, complex proxy for response time and target-distractor similarity. VH is defined as a novel visual quality, calculable for both arrays of objects (as studied in Experiments 1-3) and individual objects (as studied in Experiment 4). It is derived from a center-to-distance calculation in a perceptual space. That space in turn is derived from the multi-dimensional scaling of response times for target-distractor pairs in an oddball detection task (Experiments 1 and 2) or in a same-different task (Experiments 3 and 4).

      The above statements are not entirely correct. Experiments 1 & 3 are oddball visual search experiments. Their purpose was to estimate the underlying perceptual space of objects.

      Proximity of objects in the space is inversely proportional to response times for arrays in which they were paired. These response times are higher for more similar objects. Hence, proximity is proportional to similarity. This is visible in Fig. 2B as the close clustering of complex, confusable animal shapes.

      VH, i.e. distance-to-center, for target-present arrays, is calculated as shown in Fig. 1C, based on a point on the line connecting the target and distractors. The authors justify this idea with previous findings that responses to multiple stimuli are an average of responses to the constituent individual stimuli. The distance of the connecting line to the center is inversely proportional to the distance between the two stimuli in the pair, as shown in Fig. 2D. As a result, VH is inversely proportional to the distance between the stimuli and thus to stimulus similarity and response times. But this just makes VH a highly derived, unnecessarily complex proxy for target-distractor similarity and response time. The original response times on which the perceptual space is based are far more simple and direct measures of similarity for predicting response times.

      We agree that VH brings no explanatory power to target-present searches, since target-present response times are a direct estimate of target-distractor similarity. However, we are additionally explaining target-absent response times. Target-absent response times are well known to vary systematically with image properties, but why they do so has not been clear in the literature.

      Our key conceptual advance lies in relating the neural response to a search array to the neural response of the constituent elements, and in proposing a decision variable using which participants can make both target-present and target-absent judgements on any search array.

      (2) The use of VH derived from Experiment 1 to predict response times in Experiment 2 is circular and does not validate the VH theory.

      The use of VH, a response time proxy, to predict response times in other, similar tasks, using the same stimuli, is circular. In effect, response times are being used to predict response times across two similar experiments using the same stimuli. Experiment 1 and the target present condition of Experiment 2 involve the same essential task of oddball detection. The results of Experiment 1 are converted into VH values as described above, and these are used to predict response times in Experiment 2 (Fig. 2F). Since VH is a derived proxy for response values in Experiment 1, this prediction is circular, and the observed correlation shows only consistency between two oddball detection tasks in two experiments using the same stimuli.

      We agree that it would be circular to use oddball search times in Experiment 1 to explain only target-present search times in Experiment 2, since they basically involve the same searches. However, we are explaining both target-present and target-absent search times in a unified framework; systematic variations in target-absent search times have been noted in the literature but never really explained. One could still simply say that target-absent search times are some function of the target-present search times, but this still doesn’t provide an explanation for how participants are making target-present and absent decisions. The existing literature contains models for how visual search might occur for a specific target and distractor but does not elucidate how participants might perform generic visual search where target and distractors are not known in advance.

      Our key conceptual advance lies in relating the neural response to a search array to the neural response of the constituent elements, and in proposing a decision variable using which participants can make both target-present and target-absent judgements on any search array.

      (3) The negative correlation of target-absent response times with VH as it is defined for target-absent arrays, based on the distance of a single stimulus from the center, is uninterpretable without understanding the effects of center-fitting. Most likely, center-fitting and the different VH metrics for target-absent trials produce an inverse correlation of VH with target-distractor similarity.

      We see no cause for concern with the center-fitting procedure, for several reasons. First, the best-fitting center remained stable despite many randomly initialized starting points. Second, the best-fitting center derived from one set of objects was able to predict the target-absent and target-present responses of another set of objects. Finally, the VH obtained for each object (i.e. distance from the best-fitting center) is strongly correlated with the average distance of that object from all other objects (Figure S1A). We have now clarified this in the Results (p. 11).

      The construction of the VH perceptual space also involves fitting a "center" point such that distances to center predict response times as closely as possible. The effect of this fitting process on distance-to-center values for individual objects or clusters of objects is unknowable from what is presented here. These effects would depend on the residual errors after fitting response times with the connecting line distances. The center point location and its effects on the distance-to-center of single objects and object clusters are not discussed or reported here.

      While it is true that the optimal center needs to be found by fitting to the data, there is no particular mystery to the algorithm: we are simply performing standard gradient descent to maximize the fit to the data. We have described the algorithm clearly and are making our code public. We find the algorithm to yield stable optimal centers despite many randomly initialized starting points. We find the optimal center to be able to predict responses to entirely novel images that were excluded during model training. We are making no assumption about the location of the center with respect to individual points. Therefore, we see no cause for concern regarding the center-finding algorithm.
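
      To make the idea of center fitting with random restarts concrete, here is a minimal sketch. It is not the authors' algorithm: the objective (mean squared error between each object's distance to the center and a per-object target value), the learning rate, and all function names are assumptions chosen for simplicity.

```python
import math
import random


def dist(p, c):
    """Euclidean distance between point p and candidate center c."""
    return math.sqrt(sum((pk - ck) ** 2 for pk, ck in zip(p, c)))


def loss(points, targets, c):
    """Mean squared error between distance-to-center and target values."""
    return sum((dist(p, c) - t) ** 2 for p, t in zip(points, targets)) / len(points)


def fit_center(points, targets, start, lr=0.1, n_steps=1000):
    """Plain gradient descent on the loss above, from one starting point."""
    c = list(start)
    n = len(points)
    for _ in range(n_steps):
        grad = [0.0] * len(c)
        for p, t in zip(points, targets):
            d = dist(p, c)
            if d == 0.0:
                continue  # gradient is undefined exactly at a data point
            for k in range(len(c)):
                grad[k] += 2.0 * (d - t) * (c[k] - p[k]) / d
        c = [ck - lr * gk / n for ck, gk in zip(c, grad)]
    return c


def fit_center_restarts(points, targets, n_restarts=10, seed=0):
    """Repeat the fit from random starting points and keep the best center."""
    rng = random.Random(seed)
    dim = len(points[0])
    best = None
    for _ in range(n_restarts):
        start = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        c = fit_center(points, targets, start)
        if best is None or loss(points, targets, c) < loss(points, targets, best):
            best = c
    return best
```

      Stability of the solution can then be assessed by checking that different restarts converge to nearly the same center, which is the kind of evidence the response above refers to.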

      Yet, this uninterpretable distance-to-center of single objects is chosen as the metric for VH of target-absent displays (VHabsent). This is justified by the idea that arrays of a single stimulus will produce an average response equal to one stimulus of the same kind. However, it is not logically clear why response strength to a stimulus should be a metric for homogeneity of arrays constructed from that stimulus, or even what homogeneity could mean for a single stimulus from this set. It is not clear how this VHabsent metric based on single stimuli can be equated to the connecting line VH metric for stimulus pairs, i.e. VHpresent, or how both could be plotted on a single continuum.

      Most visual tasks, such as finding an animal, are thought to involve building a decision boundary on some underlying neural representation. Even visual search has been portrayed as a signal-detection problem where a particular target is to be discriminated from a distractor. However, none of these formulations work in the case of generic visual tasks, where the target and distractor identities are unknown. We are proposing that, when we view a search array, the neural response to the search array can be deduced from the neural responses to the individual elements using well-known rules, and that decisions about an oddball target being present or absent can be made by computing the distance of this neural response from some canonical mean firing rate of a population of neurons. This distance-to-center computation is what we denote as visual homogeneity. We have revised our manuscript throughout to make this clearer and we hope that this helps you understand the logic better.
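
      The distance-to-center computation described above can be sketched as follows. This is only an illustration under stated assumptions: each item is represented by a vector in the perceptual space, the response to an array is taken as the unweighted mean of its item vectors, and the `center` is supplied by the caller. All names are hypothetical, and the averaging weights used in the manuscript may differ.

```python
import math


def array_response(item_vectors):
    """Response to an array, assumed to be the mean of its item vectors."""
    n = len(item_vectors)
    dim = len(item_vectors[0])
    return [sum(v[k] for v in item_vectors) / n for k in range(dim)]


def visual_homogeneity(item_vectors, center):
    """Distance of the array response from the center of the space."""
    r = array_response(item_vectors)
    return math.sqrt(sum((rk - ck) ** 2 for rk, ck in zip(r, center)))
```

      Under these assumptions, a target-absent array of identical items yields the same value as the single item, while a target-present array is pulled toward the point between the target and the distractors.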

      It is clear, however, what should be correlated with difficulty and response time in the target-absent trials, and that is the complexity of the stimuli and the numerosity of similar distractors in the overall stimulus set. The complexity of the target, similarity with potential distractors, and the number of such similar distractors all make ruling out distractor presence more difficult. The correlation seen in Fig. 2G must reflect these kinds of effects, with higher response times for complex animal shapes with lots of similar distractors and lower response times for simpler round shapes with fewer similar distractors.

      You are absolutely correct that stimulus complexity should matter, but there are no good measures of stimulus complexity. Moreover, considering which factors are correlated with target-absent response times is entirely different from asking what decision variable or template participants use to solve the task.

      The example points in Fig. 2G seem to bear this out, with higher response times for the deer stimulus (complex, many close distractors in the Fig. 2B perceptual space) and lower response times for the coffee cup (simple, few close distractors in the perceptual space). While the meaning of the VH scale in Fig. 2G, and its relationship to the scale in Fig. 2F, are unknown, it seems like the Fig. 2G scale has an inverse relationship to stimulus complexity, in contrast to the expected positive relationship for Fig. 2F. This is presumably what creates the observed negative correlation in Fig. 2G.

      Taken together, points 1-3 suggest that VHpresent and VHabsent are complex, unnecessary, and disconnected metrics for understanding target detection response times. The standard, simple explanation should stand. Task difficulty and response time in target detection tasks, in both present and absent trials, are positively correlated with target-distractor similarity.

      Respectfully, we disagree with your assessment. Your last point, however, is not logically consistent: response times for target-absent trials cannot be correlated with any target-distractor similarity since there is no target in the first place in a target-absent array. We have shown that target-absent response times are, in fact, independent of experimental context, which means that they index an image property that is independent of any reference target (Results, p. 15; Section S4). This property is what we define as visual homogeneity.

      I think my interpretations apply to Experiments 3 and 4 as well, although I find the analysis in Fig. 4 especially hard to understand. The VH space in this case is based on Experiment 3 oddball detection in a stimulus set that included both symmetric and asymmetric objects. However, the response times for a very different task in Experiment 4, a symmetric/asymmetric judgment, are plotted against the axes derived from Experiment 3 (Fig. 4F and 4G). It is not clear to me why a measure based on oddball detection that requires no use of symmetry information should be predictive of within-stimulus symmetry detection response times. If it is, that requires a theoretical explanation not provided here.

      We are using an oddball detection task to estimate perceptual dissimilarity between objects, and construct the underlying perceptual representation of both symmetric and asymmetric objects. This enabled us to then ask if some distance-to-center computation can explain response times in a symmetry detection task, and obtain an answer in the affirmative. We have reworked the text to make this clear.

      (4) Contrary to the VH theory, same/different tasks are unlikely to depend on a decision boundary in the middle of a similarity or homogeneity continuum.

      We have provided empirical proof for our claims, by showing that target-present response times in a visual search task are correlated with “different” responses in the same-different task, and that target-absent response times in the visual search task are correlated with “same” responses in the same-different task (Section S3).

      The authors interpret the inverse relationship of response times with VHpresent and VHabsent, described above, as evidence for their theory. They hypothesize, in Fig. 1G, that VHpresent and VHabsent occupy a single scale, with maximum VHpresent falling at the same point as minimum VHabsent. This is not borne out by their analysis, since the VHpresent and VHabsent value scales are mainly overlapping, not only in Experiments 1 and 2 but also in Experiments 3 and 4. The authors dismiss this problem by saying that their analyses are a first pass that will require future refinement. Instead, the failure to conform to this basic part of the theory should be a red flag calling for revision of the theory.

      We respectfully disagree – by no means did we dismiss this problem! In fact, we have explicitly acknowledged this by saying that VH does not explain all the variance in the response times, but nonetheless explains substantial variance and might form the basis for an initial guess or a fast response. The remaining variance might be explained by processes that involve more direct scrutiny. Please see Results, pages 10 and 22.

      The reason for this single scale is that the authors think of target detection as a boundary decision task, along a single scale, with a decision boundary somewhere in the middle, separating present and absent. This model makes sense for decision dimensions or spaces where there are two categories (right/left motion; cats vs. dogs), separated by an inherent boundary (equal left/right motion; training-defined cat/dog boundary). In these cases, there is less information near the boundary, leading to reduced speed/accuracy and producing a pattern like that shown in Fig. 1G.

      The key conceptual advance of our study is that we show that even target present/absent, same/different or symmetry judgements can be fit into the standard decision-making framework.

      This logic does not hold for target detection tasks. There is no inherent middle point boundary between target present and target absent. Instead, in both types of trials, maximum information is present when the target and distractors are most dissimilar, and minimum information is present when the target and distractors are most similar. The point of greatest similarity occurs at the limit of any metric for similarity. Correspondingly, there is no middle point dip in information that would produce greater difficulty and higher response times. Instead, task difficulty and response times increase monotonically with the similarity between targets and distractors, for both target present and target absent decisions. Thus, in Figs. 2F and 2G, response times appear to be highest for animals, which share the largest numbers of closely similar distractors.

      Unfortunately, your logic does not boil down to any quantitative account, since you are using vague terms like “maximum information”. Further, any argument based solely on item similarity to explain visual search or symmetry responses cannot explain systematic variations observed for target-absent arrays and for symmetric objects, for the reasons below.

      If target-distractor dissimilarity were the sole driver of response times, target-absent judgements should always take the longest time since the target and distractor have zero similarity, with no variation from one image to another. This account does not explain why target-absent response times vary so systematically.

      Similarly, if symmetry judgements are solely based on comparing the dissimilarity between two halves of an object, there should be no variation in the response times of symmetric objects since the dissimilarity between their two halves is zero. However, we do see systematic variation in the response times to symmetric objects.

      DEFINITION OF AREA VH USING fMRI

      (1) The area VH boundaries from different experiments are nearly completely non-overlapping.

      In line with their theory that VH is a single continuum with a decision boundary somewhere in the middle, the authors use fMRI searchlight to find an area whose responses positively correlate with homogeneity, as calculated across all of their target present and target absent arrays. They report VH-correlated activity in regions anterior to LO. However, the VH defined by symmetry Experiments 3 and 4 (VHsymmetry) is substantially anterior to LO, while the VH defined by target detection Experiments 1 and 2 (VHdetection) is almost immediately adjacent to LO. Fig. S13 shows that VHsymmetry and VHdetection are nearly non-overlapping. This is a fundamental problem with the claim of discovering a new area that represents a new quantity that explains response times across multiple visual tasks. In addition, it is hard to understand why VHsymmetry does not show up in a straightforward subtraction between symmetric and asymmetric objects, which should show a clear difference in homogeneity.

      Actually, VHsymmetry is apparent even in a simple subtraction between symmetric and asymmetric objects (Figure S10). The VH regions identified using the visual search task and symmetry task have a partial overlap, not zero overlap as you are incorrectly claiming.

      We have noted that it is not straightforward to interpret the overlap, since there are many confounding factors. One reason could simply be that the stimuli in the symmetry task were presented at fixation, whereas the visual search arrays contained items exclusively in the periphery. Another is that the participants in the two tasks were completely different, and the lack of overlap is simply due to inter-individual variability. Testing the same participants in two tasks using similar stimuli would be ideal but this is outside the scope of this study. We have acknowledged these issues in the Results (p. 26) and in the Supplementary Material (Section S8).

      (2) It is hard to understand how neural responses can be correlated with both VHpresent and VHabsent.

      The main paper results for VHdetection are based on both target-present and target-absent trials, considered together. It is hard to interpret the observed correlations, since the VHpresent and VHabsent metrics are calculated in such different ways and have opposite correlations with target similarity, task difficulty, and response times (see above). It may be that one or the other dominates the observed correlations. It would be clarifying to analyze correlations for target-present and target-absent trials separately, to see if they are both positive and correlated with each other.

      Thanks. The positive correlation between VH and neural response holds even when we do the analysis separately for target-present and target-absent searches (correlation between neural response in the VH region and visual homogeneity: n = 32, r = 0.66, p < 0.0005 for target-present searches; n = 32, r = 0.56, p < 0.005 for target-absent searches).

      (3) The definition of the boundaries and purpose of a new visual area in the brain requires circumspection, abundant and convergent evidence, and careful controls.

      Even if the VH metric, as defined and calculated by the authors here, is a meaningful quantity, it is a bold claim that a large cortical area just anterior to LO is devoted to calculating this metric as its major task. Vision involves much more than target detection and symmetry detection. The cortex anterior to LO is bound to perform a much wider range of visual functionalities. If the reported correlations can be clarified and supported, it would be more circumspect to treat them as one byproduct of unknown visual processing in the cortex anterior to LO, rather than treating them as the defining purpose for a large area of the visual cortex.

      We totally agree with you that reporting a new brain region would require careful interpretation and abundant and converging evidence. However, this requires many studies worth of work, and historically category-selective regions like the FFA have achieved consensus only after they were replicated and confirmed across many studies. We believe our proposal for the computation of a quantity like visual homogeneity is conceptually novel, and our study represents a first step that provides some converging evidence (through replicable results across different experiments) for such a region. We have reworked our manuscript to make this point clearer (Discussion, p 32).

      Reviewer #2 (Public Review):

      Summary:

      This study proposes visual homogeneity as a novel visual property that enables observers to perform several seemingly disparate visual tasks, such as finding an odd item, deciding if two items are the same, or judging if an object is symmetric. In Experiment 1, the reaction times on several objects were measured in human subjects. In Experiment 2, the visual homogeneity of each object was calculated based on the reaction time data. The visual homogeneity scores predicted reaction times. This value was also correlated with the BOLD signals in a specific region anterior to LO. Similar methods were used to analyze reaction time and fMRI data in a symmetry detection task. It is concluded that visual homogeneity is an important feature that enables observers to solve these two tasks.

      Strengths:

      (1) The writing is very clear. The presentation of the study is informative.

      (2) This study includes several behavioral and fMRI experiments. I appreciate the scientific rigor of the authors.

      We are grateful to you for your balanced assessment and constructive comments.

      Weaknesses:

      (1) My main concern with this paper is the way visual homogeneity is computed. On page 10, lines 188-192, it says: "we then asked if there is any point in this multidimensional representation such that distances from this point to the target-present and target-absent response vectors can accurately predict the target-present and target-absent response times with a positive and negative correlation respectively (see Methods)". This is also true for the symmetry detection task. If I understand correctly, the reference point in this perceptual space was found by deliberately satisfying the negative and positive correlations in response times. And then on page 10, lines 200-205, it shows that the positive and negative correlations actually exist. This logic is confusing. The positive and negative correlations emerge only because this method is optimized to do so. It seems more reasonable to identify the reference point of this perceptual space independently, without using the reaction time data. Otherwise, the inference process sounds circular. A simple way is to just use the mean point of all objects in Exp 1, without any optimization towards reaction time data.

      We disagree with you since the same logic applies to any curve-fitting procedure. When we fit data to a straight line, we are finding the slope and intercept that minimizes the error between the data and the straight line, but we would hardly consider the process circular when a good fit is achieved – in fact we take it as a confirmation that the data can be fit linearly. In the same vein, we would not have observed a good fit to the data, if there did not exist any good reference point relative to which the distances of the target-present and target-absent search arrays predicted these response times.

      In Section S1, we have already reported that the visual homogeneity estimates for each object are strongly correlated with the average distance of each object to all other objects (r = 0.84, p < 0.0005, Figure S1). Second, to confirm that the results we obtained are not due to overfitting, we have already reported a cross-validation analysis, where we removed all searches involving a particular image and predicted these response times using visual homogeneity. This too revealed a significant model correlation, confirming that our results are not due to overfitting.
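
      As a rough illustration of the relationship described above (not the analysis code used in the study), the sketch below compares each object's distance from a reference point with its average distance to all other objects. The coordinates, the function names, and the use of the centroid as the reference point are all assumptions made for the example; in the study the reference point was fitted to the response times rather than fixed at the mean.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def centroid(points):
    """Mean point of a set of coordinates (the reviewer's suggested reference)."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def distance_to_reference(points, reference):
    """Distance of every object from one reference point."""
    return [math.dist(p, reference) for p in points]

def mean_distance_to_others(points):
    """Average distance of every object to all other objects."""
    result = []
    for i, p in enumerate(points):
        others = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        result.append(sum(others) / len(others))
    return result

# Hypothetical object coordinates in a 2D "perceptual space".
objects = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
reference = centroid(objects)

r = pearson(distance_to_reference(objects, reference),
            mean_distance_to_others(objects))
print(f"correlation between the two distance measures: {r:.3f}")
```

      With real data, the object coordinates would come from an embedding of the measured dissimilarities, and the two distance measures need not agree this closely.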

      (2) On page 11, lines 214-221. It says: "these findings are non-trivial for several reasons". However, the first reason is confusing. It is unclear to me why "it suggests that there are highly specific computations that can be performed on perceptual space to solve oddball tasks". In fact, these two sentences provide no specific explanation for the results.

      We have now revised the text to make it clearer (Results, p. 11).

      (3) The second reason is interesting. Reaction times in target-present trials can be easily explained by target-distractor similarity. But why does reaction time vary substantially across target-absent stimuli? One possible explanation is that the objects that are distant from the feature distribution elicit shorter reaction times. Here, all objects constitute a statistical distribution in the feature (perceptual) space. There is certainly a mean of this distribution. Some objects look like outliers and these outliers elicit shorter reaction times in the target-absent trials because outlier detection is very salient.

      One might argue that the above account is merely a rephrasing of the idea of visual homogeneity proposed in this study. If so, feature saliency is not a new account. In other words, the idea of visual homogeneity is another way of reiterating the old feature saliency theory.

      Thank you for this interesting point. We don’t necessarily see a contradiction. However, we are proposing a quantitative decision variable that the brain could be using to make target present/absent judgements.

      (4) One way to reject the feature saliency theory is to compare the reaction times of the objects that are very different from other objects (i.e., no surrounding objects in the perceptual space, e.g., the wheel in the lower right corner of Fig. 2B) with the objects that are surrounded by several similar objects (e.g., the horse in the upper part of Fig. 2B). Also, please choose the two objects with similar distance from the reference point. I predict that the latter will elicit longer reaction times because they can be easily confounded by surrounding similar objects (i.e., four-legged horses can be easily confounded by four-legged dogs). If the density of object distribution per se influences the visual homogeneity score, I would say that the "visual homogeneity" is essentially another way of describing the distributional density of the perceptual space.

      We agree with you, and we have indeed found that visual homogeneity estimates from our model are highly correlated with the average distance of an object relative to all other objects. However, we performed several additional experiments to elucidate the nature of target-absent response times. We find that they are unaffected by whether these searches are performed in the midst of similar or dissimilar objects (Section S4, Experiment S6), and even when the same searches are performed among nearby sets of objects with completely uncorrelated average distances (Section S4, Experiment S7). We have now reworked the text to make this clearer.

      (5) The searchlight analysis looks strange to me. One can easily perform a parametric modulation by setting visual homogeneity as the trial-by-trial parametric modulator and reaction times as a covariate. This parametric modulation produces a brain map with the correlation of every voxel in the brain. On page 17 lines 340-343, it is unclear to me what the "mean activation" is.

      We have done something similar. For each voxel, we took the mean activation as the average activation in its 3x3x3 voxel neighborhood, and took the correlation of this mean activation with visual homogeneity. We have now reworked this to make it clearer (Results, p. 16).
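
      The neighborhood averaging can be sketched as follows. This is only a minimal illustration under stated assumptions: volumes are stored as nested lists indexed `[x][y][z]`, there is one volume per image, and the helper names (`neighborhood_mean`, `searchlight_correlation`, `constant_volume`) and the toy data are hypothetical rather than taken from the study's pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def neighborhood_mean(volume, x, y, z):
    """Mean activation over the 3x3x3 neighborhood centered on (x, y, z).

    The neighborhood is clipped at the volume edges, so border voxels
    average over fewer than 27 values.
    """
    values = []
    for i in range(max(0, x - 1), min(len(volume), x + 2)):
        for j in range(max(0, y - 1), min(len(volume[0]), y + 2)):
            for k in range(max(0, z - 1), min(len(volume[0][0]), z + 2)):
                values.append(volume[i][j][k])
    return sum(values) / len(values)

def searchlight_correlation(volumes, homogeneity, x, y, z):
    """Correlate local mean activation (one value per image) with homogeneity."""
    local = [neighborhood_mean(v, x, y, z) for v in volumes]
    return pearson(local, homogeneity)

def constant_volume(value, size=3):
    """Toy volume in which every voxel has the same activation."""
    return [[[value] * size for _ in range(size)] for _ in range(size)]

# Toy data: activation is a linear function of a made-up homogeneity score.
homogeneity = [0.2, 0.5, 0.9, 1.4]
volumes = [constant_volume(2.0 * h + 1.0) for h in homogeneity]

r = searchlight_correlation(volumes, homogeneity, 1, 1, 1)
print(f"searchlight correlation at the center voxel: {r:.3f}")
```

      Repeating `searchlight_correlation` over every voxel position would give a whole-brain correlation map.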

      Minor points

      (1) In the intro, it says: "using simple neural rules..." actually it is very confusing what "neural rules" are here. Better to change it to "computational principles" or "neural network models"??

      We have now replaced this with “using well-known principles governing multiple object representations”.

      (2) In the intro, it says: "while machine vision algorithms are extremely successful in solving feature-based tasks like object categorization (Serre, 2019), they struggle to solve these generic tasks (Kim et al., 2018; Ricci et al. 2021)." These are not generic tasks. They are just a specific type of visual task: judging relationships between multiple objects. Moreover, a large number of studies in machine vision have shown that DNNs are capable of solving these tasks and even more difficult tasks. Two survey papers are listed here.

      Wu, Q., Teney, D., Wang, P., Shen, C., Dick, A., & Van Den Hengel, A. (2017). Visual question answering: A survey of methods and datasets. Computer Vision and Image Understanding, 163, 21-40.

      Małkiński, M., & Mańdziuk, J. (2022). Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices. arXiv preprint arXiv:2201.12382.

      Thank you for sharing these references. In fact, a recent study has shown that specific deep networks can indeed solve the same-different task (Tartaglini et al., 2023). However, our broader point remains that the same-different or other such visual tasks are non-trivial for machine vision algorithms.

      Reviewer #1 (Recommendations For The Authors):

      Nothing to add to the public review. If my concerns turn out to be invalid, I apologize and will happily accept correction. If they are valid, I hope they will point toward a new version of this paper that optimizes the insights to be gained from this impressive dataset.

      Reviewer #2 (Recommendations For The Authors):

      My suggestions are as follows:

      (1) Analyze the fMRI data using the parametric modulation approach first at the single-subject level and then perform group analysis.

      To clarify, we have obtained image-level activations from each subject, and used them for all our analyses.

      (2) Think about a way to redefine visual homogeneity from a purely image-computable approach. In other words, visual homogeneity should be first defined as an image feature that is independent of any empirical response data. And then use the visual homogeneity scores to predict reaction times.

      While we understand what you mean, any image-computable representation such as from a deep network may carry its own biases and may not be an accurate representation of the underlying object representation. By contrast, neural dissimilarities in the visual cortex are strongly predictive of visual search oddball response times. That is why we used visual search oddball response times as a proxy for the underlying neural representation, and then asked whether some decision variable can be derived from this representation to explain both target present and absent judgements in visual search.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors provide convincing experimental evidence of extended motivational signals encoded in the mouse anterior cingulate cortex (ACC) that are implemented by the orbitofrontal cortex (OFC)-to-ACC signaling during learning. The results are valuable to the field of motivation and cognition. The experimental methods used were state-of-the-art. The manuscript would further benefit from theory-driven analyses to inform a mechanistic understanding, particularly for the single-cell calcium imaging results. These results will be of interest to those interested in cortical function, learning, and/or motivation.

      We thank the reviewers for their thoughtful reading of our paper and providing constructive feedback. We have made the relevant changes to the manuscript to improve the writing and figures. We provide responses below to each of the reviewer’s comments.

      Reviewer #1 (Public Review):

      (1) An important conclusion (Figure 4) is that when mice are trained to run through no reward (N) cues in order to reach reward (R) cues, the OFC neurons projecting to ACC each respond to different specific events in a manner that ensures that collectively they tile the extended behavioural sequence. What I was less sure of was whether the ACC neurons do the same or not. Figure 3 suggests that on average ACC neurons maintain activity across N cues in order to get to R cues but I was not sure whether this was because all individual neurons did this or whether some had activity patterns like the OFC neurons projecting to ACC.

      We agree that it remains uncertain what individual ACC neurons do during the extended behavioral sequence. We now include a few sentences in the discussion about what we hypothesize, as we did not perform the cellular resolution imaging to determine this:

      “While we did not perform single-cell imaging of ACC in our task, we hypothesize that individual ACC neurons could encode the distribution of actions/opportunities47 (i.e. stop, run, lick, suppress lick) taken during R or N cues. ACC neurons could compute the relative value of the action taken such that more ACC neurons become recruited once mice learn to run out of N cues. The sustained increase in bulk ACC activity across N cue trials (Figure 2) could come from a stable sequence of individual neurons that encode the timescale of the actions taken. In this way, OFC projections would encode current motivation across N cues before learning, which then triggers ACC to compute the value-based actions. Motivational signals in OFC would thus represent state since past rewards/goals, while in ACC these signals represent actions taken to pursue rewards/goals in the future.”

      (2) Figure 1 versus Figure 2: There does not seem to be a particular motivation for whether chemogenetic inactivation or optogenetic inhibition were used in different experiments. I think that this is not problematic but, if I am wrong and there were specific reasons for performing each experiment in a certain way, then further clarification as to why these decisions were made would be useful. If there is no particular reason, then simply explaining that this is the case might stop readers from seeking explanations.

      Thank you for this comment and we agree that clarification on this is important. We performed chemogenetic inhibition of ACC in Figure 1 to take a broad survey of behavioral effects throughout a 40-min long behavioral session, and performed optogenetic inhibition in Figure 2 because we wanted to restrict our inhibition to the few seconds of cue presentation during a behavioral session and across days. Furthermore, we wanted to combat any potential off-target effects that would come from repeated administration of CNO over the several days of training (Manvich et al 2018). We have included a couple sentences on page 4 to clarify this:

      “We proceeded to test whether these motivation related signals in ACC are required for learning. To restrict our inhibition to cue presentation portions of our task, and combat any potential off-target effects of CNO31 from repeated administration across several days of training, we used optogenetic inhibition.”

      (3) P5, paragraph 2. The authors argue that OFC and anteromedial (AM) thalamic inputs into ACC are especially important for mediating motivation through N cues in order to reach R cues. Is this based on a statistical comparison between the activity in OFC or AM inputs as opposed to the other inputs?

      We determined that OFC and AM thalamic inputs to ACC are particularly important by comparing the pre-cue activity in a reward-no reward-reward trial sequence (RNR; Figure 3B). Specifically, we performed paired t-tests comparing pre-cue activity between N and R cues, and found a statistically significant increase for R cues but only for the OFC and AM inputs, not for the BLA or LC inputs.
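
      For readers unfamiliar with the test, a paired t-test of this kind reduces to a one-sample test on the per-animal differences. The sketch below is a minimal illustration with made-up numbers; the variable names and values are hypothetical and are not data from the study. It returns only the t statistic and degrees of freedom. A p-value would come from the t distribution, for example via a statistics library such as SciPy's `scipy.stats.ttest_rel`.

```python
import math

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    t = mean_d / math.sqrt(var_d / n)
    return t, n - 1

# Hypothetical pre-cue activity per animal (arbitrary units).
pre_cue_r = [1.5, 2.5, 3.5, 2.25]   # before R cues
pre_cue_n = [0.5, 0.5, 0.5, 0.25]   # before N cues

t_stat, dof = paired_t(pre_cue_r, pre_cue_n)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

      A positive t statistic here indicates higher pre-cue activity before R cues than before N cues in the same animals.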

      (4) P3, paragraph 2. Some papers by Khalighinejad and colleagues (eg Neuron 2020, Current Biology, 2022) might be helpful here in as much as they assess ACC roles in determining action frequency, initiation, and speed and mediating the relationship between reward availability and action frequency and speed.

      We thank the reviewer for bringing these relevant papers to our attention. We have included these papers in our citations in this paragraph.

      (5) Paragraph 1 "This learning is of a more deliberate, informed nature than habitual learning, as they are sensitive to the current value of outcomes and can lead to a novel sequence of actions for a desired outcome1-3." Should "they" be "it"?

      This is correct, we have edited this in the manuscript.

      Reviewer #2 (Public Review):

      Impact:

      The findings will be valuable for further research on the impact of motivational states on behaviour and cognition. The authors provided a promising concept of how persistent motivational states could be maintained, as well as established a novel, reproducible task assay. While experimental methods used are currently state-of-the-art, theoretical analysis seems to be incomplete/not extensive.

      We thank the reviewer for these comments. In our paper, we performed single-cell calcium imaging of OFC projection neurons to ACC to build a mechanistic understanding for the bulk ramp-like response we identified in these neurons with photometry. We identified ensembles of neurons that tile sequences of trials that match the bulk response, in particular a subset of neurons that are active at the time a reward (R) cue is reached after 2 no-reward (N) cues. We included a paragraph in the discussion to address future theory-driven analyses to address how computation is achieved by OFC projection neurons:

      “We linked the ramp-like increase in neural activity in OFC to motivation, but several questions still remain about how motivation is computed and why it would be represented as a ramp. Motivation could be computed as a combination of several variables such as time since last reward, value of reward, and effort to reach future rewards. Future theory-driven analyses could determine how motivation is computed, and whether individual variables of time, value, and effort, are encoded as clusters of similarly tuned neurons, or mixed and collectively represented at the population level. In either case, it is likely that a combined map of task space and value-information carried by OFC are being used to inform downstream regions, such as ACC, for adjusting behavior.”

      Reviewer #2 (Recommendations for the Authors):

      Overall, the layout of the figures seems a little bit chaotic and makes it hard to understand the boundaries between panels.

      We agree that the figure layout could be improved upon to aid the reader in moving from panel to panel. We have edited two of the main figures with layouts that are most irregular (Figures 2 and 4) to help with this.

      Figures/text should include the promoters used for protein expression so that readers understand which cell types would be affected.

      We have made sure to edit the figures to include the promoter of the viruses we used, and edited the text to include both the AAV serotype and promoter.

      Discuss why it is necessary for multiple prefrontal areas to be involved in maintaining motivational signals.

      We thank the reviewer for this comment. We believe that prefrontal areas would be recruited as tasks to study motivational states become more complex and require animals to keep track of task structure and perform value-guided actions. We have included a couple sentences in the final paragraph of the discussion about this:

      “Our work showed the recruitment of multiple frontal cortical areas in this process, which is to be expected as animals are required to build, maintain, and use representations of task structure and value to drive learned, motivated behaviors47. Future work can build upon the task we developed here to determine how the frontal cortex maintains motivational states across many more cue-outcome associations, and how these associations may dynamically change across time48”.

      Additionally, we included a short discussion on how motivational signals differ between OFC and ACC in our work. We suggest OFC encodes current motivation before and after learning, which then leads ACC to represent learned actions taken and thus have a longer timescale motivational response (see response to Reviewer 1).

      Minor: Page 4, Line 1: "increase" instead of "increases".

      This is correct, we have edited this in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides important insights into the role of neurexins as regulators of synaptic strength and timing at the glycinergic synapse between neurons of the medial nucleus of the trapezoid body and the lateral superior olive, key components of the auditory brainstem circuit involved in computing sound source location from differences in the intensity of sounds arriving at the two ears. Through an elegant combination of genetic manipulation, fluorescence in-situ hybridization, ex vivo slice electrophysiology, pharmacology, and optogenetics, the authors provide convincing evidence to support their claims. While further work is needed to reveal the mechanistic basis by which neurexins influence glycinergic neurotransmission, this work will be of interest to both auditory and synaptic neuroscientists.

      We appreciate the recognition of the significance of our study in shedding light on the role of neurexins in regulating synaptic strength and timing at the glycinergic synapse. Indeed, further investigations are warranted to delve deeper into the specific role of each different variant of neurexins in the future. We hope that our work will spark more interest and collaboration in unraveling the complexities of molecular codes of synaptic function.

      Public Reviews:

      Reviewer #1 (Public Review):

      Jiang et al. demonstrated that ablating Neurexins results in alterations to glycinergic transmission and its calcium sensitivity, utilizing a robust experimental system. Specifically, the authors employed rAAV-Cre-EGFP injection around the MNTB in Nrxn1/2/3 triple conditional mice at P0, measuring Glycine receptor-dependent IPSCs from postsynaptic LSO neurons at P13-14. Notably, the authors presented a clear reduction of 60% and 30% in the amplitudes of opto- and electric stimulation-evoked IPSCs, respectively. Additionally, they observed changes in kinetics, alterations in PPR, and sensitivity to lower calcium and the calcium chelator, EGTA, indicating solid evidence for changes in presynaptic properties of glycinergic transmission.

      Furthermore, the authors uncovered an unexpected increase in sIPSC frequency without altering amplitude. Despite the reduction in evoked IPSC, immunostaining revealed an increase in GlyT2 and VGAT in TKO mice, supporting the notion of an increase in synapse number. However, the reviewer expresses caution regarding the authors' conclusion that "glycinergic neurotransmission likely by promoting the synapse formation/maintenance, which is distinct from the phenotypes observed in glutamatergic and GABAergic neurons (Chen et al., 2017; Luo et al., 2021)", as outlined in lines 173-175. The reviewer suggests that this statement may be overstated, pointing out the authors' own discussion in lines 254-265, which acknowledges multiple possibilities, including the potential that the increase in synapses is a consequence rather than a causal effect of Nrxn deletion.

      We appreciate the reviewer’s thoughtful evaluation of our study. We agree that our conclusion regarding the promotion of synapse formation/maintenance may have been overstated and recognize the need for a more nuanced interpretation of our findings. Accordingly, we have revised our interpretation by carefully discussing the various possibilities that may cause the observed increase in synapse number in lines 256-266.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Jiang et al., explore the role of neurexins at glycinergic MNTB-LSO synapses. The authors utilize elegant and compelling ex vivo slice electrophysiology to assess how the genetic conditional deletion of Nrxns1-3 impacts inhibitory glycinergic synaptic transmission and found that TKO of neurexins reduced electrically and optically evoked IPSC amplitudes, slowed optically evoked IPSC kinetics and reduced presynaptic release probability. The authors use classic approaches including reduced [Ca2+] in ACSF and EGTA chelation to propose that changes in these evoked properties are likely driven by the loss of calcium channel coupling. Intriguingly, while evoked transmission was impaired, the authors reported that spontaneous IPSC frequency was increased, potentially due to an increased number of synapses in LSO. Overall, this manuscript provides important insight into the role of neurexins at the glycinergic MNTB-LSO synapse and further emphasizes the need for continued study of both the non-redundant and redundant roles of neurexins.

      We thank the reviewer for the strong comments and support of our work.

      Strengths:

      This well-written manuscript seamlessly incorporates mouse genetics and elegant ex vivo electrophysiology to identify a role for neurexins in glycinergic transmission at MNTB-LSO synapses. Triple KO of all neurexins reduced the amplitude and timing of evoked glycinergic synaptic transmission. Further, spontaneous IPSC frequency was increased. The evoked synaptic phenotype is likely a result of reduced presynaptic calcium coupling while the spontaneous synaptic phenotype is likely due to increased synapse numbers. While neuroligin-4 has been identified at glycinergic synapses, this study, to the best of my knowledge, is the first to study Nrxn function at these synapses.

      We again appreciate the positive feedback on the strengths of our study. We agree that the observed reduction in evoked synaptic transmission and the increase in spontaneous IPSC frequency provide intriguing insights into the function of neurexins in regulating glycinergic synaptic activity.

      Weaknesses:

      The data are compelling and report an intriguing functional phenotype. The role of neurexins in redundantly controlling calcium channel coupling has been previously reported. Mechanistic insight would significantly strengthen this study.

      We wholeheartedly agree with the reviewer that understanding how neurexins control calcium channel coupling at the presynaptic active zone is crucial for elucidating their role in synaptic transmission. While our current study has provided compelling evidence for the functional phenotypes of pan-neurexin deletion, we recognize the importance of investigating the underlying molecular mechanisms in future research. Exploring these mechanisms would undoubtedly enhance our understanding of neurexin function at various synapses and contribute to advancing the field.

      The claim that triple KO of Nrxns from MNTB increases the number of synapses in LSO is not strongly supported.

      We agree. Echoing the suggestion made by reviewer 1 (as mentioned above), we acknowledge that the claim regarding the increase in synapse numbers in the LSO following the triple knockout of neurexins from the MNTB was overstated. Consequently, we have revised our conclusions more carefully to reflect this adjustment.

      Despite the stated caveats of measuring electrically evoked currents and the more robust synaptic phenotypes observed using optically evoked transmission, the authors rely heavily on electrical stimulation for most measurements.

      We acknowledge that optogenetic stimulation offers crucial advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. Additionally, we have conducted new optogenetic experiments specifically for measuring the paired-pulse ratio in control and Nrxn123 TKO mice. These results have been included as a new supplementary figure (Figure S2).

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      The differential expression of individual neurexins might indicate that specific neurexins may dominantly regulate synaptic transmission, however, this possibility is not discussed in detail.

      We thank the reviewer for bringing up this important point. The differential expression of individual neurexins indeed suggests that specific neurexins may play dominant roles in regulating synaptic transmission. While our study primarily focused on the collective impact of ablating all neurexins, we acknowledge the significance of exploring the specific contributions of individual neurexin isoforms in the future. Understanding the distinct roles of each neurexin isoform could provide valuable insights into the precise mechanisms underlying synaptic function and plasticity. We have added a discussion of this point in our revised manuscript (lines 223-230).

      Reviewer #3 (Public Review):

      Summary:

      The authors investigate the hypothesis that neurexins serve a crucial role as regulators of the synaptic strength and timing at the glycinergic synapse between neurons of the medial nucleus of the trapezoid body (MNTB) and the lateral superior olivary complex (LSO). It is worth mentioning that LSO neurons are an integration station of the auditory brainstem circuit displaying high reliability and temporal precision. These features are necessary for computing interaural cues to derive sound source location from comparing the intensities of sounds arriving at the two ears. In this context, the authors' findings build up according to the hypothesis first by displaying that neurexins were expressed in the MNTB at varying levels. They followed this up with the deletion of all neurexins in the MNTB through the employment of a triple knock-out (TKO). Using electrophysiological recordings in acute brainstem slices of these TKO mice, they gathered solid evidence for the role of neurexins in synaptic transmission at this glycinergic synapse primarily by ensuring tight coupling of Ca2+ channels and vesicular release sites. Additionally, the authors uncovered a connection between the deletion of neurexins and a higher number of glycinergic synapses in TKO mice, for which they provided evidence in the form of immunostainings and related it to electrophysiological data on spontaneous release. Consequently, this investigation expands our knowledge on the molecular regulation of synaptic transmission at glycinergic synapses, as well as on the auditory processing at the level of the brainstem.

      Strengths:

      The authors demonstrate substantial results in support of the hypothesis of a critical role of neurexins for regulating glycinergic transmission in the LSO using various techniques. They provide evidence for the expression of neurexins in the MNTB and consecutively successfully generate and characterize the neurexin TKO. For their study on LSO IPSCs the authors transduced MNTB neurons by co-injection of virus-carrying Cre and ChR2 and subsequently optogenetically evoke release of glycine. As a result, they observed a significant reduction in amplitude and significantly slower rise and decay times of the IPSCs of the TKO in comparison with control mice in which MNTB neurons were only transduced with ChR2. Furthermore, they observed an increased paired pulse ratio (PPR) of LSO IPSCs in the TKO mice, indicating lower release probability. Elaborating on the hypothesis that neurexins are essential for the coupling of synaptic vesicles to Ca2+ channels, the authors show lowered Ca2+ sensitivity in the TKO mice. Additionally, they reveal convincing evidence for the connection between the increased frequency of spontaneous IPSC and the higher number of glycinergic synapses of the LSO in the TKO mice, revealed by immunolabeling against the glycinergic presynaptic markers GlyT2 or VGAT.

      We thank the reviewer for the thoughtful and thorough evaluation of the significance of investigating the role of neurexins in glycinergic transmission at the MNTB-LSO synapse, particularly in the context of auditory processing and sound localization. The positive feedback is greatly appreciated.

      Weaknesses:

      The major concern is novelty as this work on the effects of pan-neurexin deletion in a glycinergic synapse is quite consistent with the authors' prior work on glutamatergic synapses (Luo et al., 2020). The authors might want to further work out novel aspects and strengthen the comparative perspective. Conceptually, the authors might want to be more clear about interpreting the results on the altered dependence of release on voltage-gated Ca2+ influx (Ca2+ sensitivity, coupling).

      Regarding the reviewer’s concern about the novelty of our work, we acknowledge that our previous work explored the effects of pan-neurexin deletion on glutamatergic synapses (Luo et al., 2020). However, we would like to point out that the novelty of our present study stems from exploring how different types of synapses converge on the same mechanism of synaptic function, particularly in the context of neurexin-mediated regulation. Whereas our previous study focused on glutamatergic synapses, the current study examines glycinergic synapses, which represent a distinct population with unique properties and functions. Despite the differences between these synapse types, our findings reveal a commonality in the underlying mechanisms of synaptic regulation mediated by neurexins. This convergence of mechanisms across different synapse types highlights the fundamental role of neurexins in synaptic function and plasticity. By elucidating how neurexins regulate synaptic transmission at both excitatory and inhibitory synapses, we provide valuable insights into the general principles governing synaptic function. In addition, this comparative perspective may shed light on the complex interplay between excitatory and inhibitory neurotransmission, which is crucial for maintaining the balance of neuronal activity and network dynamics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      During the developmental period spanning P3-P12, the MNTB-LSO synapses undergo a transition from GABAergic to glycinergic transmission. It is well-established that Neurexin plays a role in modulating GABAergic transmission. In the authors' experimental system, AAV was injected at P0, likely impacting GABAergic transmission, including potentially influencing synapse number, before subsequently affecting glycinergic transmission. A thoughtful discussion of how the experimental interventions might have influenced this developmental process and glycinergic transmission would enhance the clarity and interpretation of their findings.

      We thank the reviewer for raising the interesting topic of the transmitter switch during neurodevelopment. Strong evidence from gerbils and rats demonstrates that the MNTB-LSO synapses shift from GABAergic to glycinergic transmission during early development. However, in a more recent study by Friauf and colleagues (Fisher et al., 2019), patch-clamp recordings in acute mouse brainstem slices at P4-P11, combined with pharmacological blockade of GABAA receptors and/or glycine receptors, clearly demonstrated no GABAergic synaptic component on LSO principal neurons, suggesting that the transmitter switch may be species-specific. We have added a discussion to our revision to clarify this topic.

      Reviewer #2 (Recommendations For The Authors):

      The data are compelling and report an intriguing functional phenotype. Mechanistic insight into how this phenotype manifests would significantly strengthen this study. For example, which neuroligin is found at these MNTB-LSO synapses?

      We agree that investigating the underlying molecular mechanisms, particularly the specific function of each variant of neurexins and their respective ligands on the postsynaptic neurons, is crucial. Exploring these mechanisms, which extend beyond the scope of our current study, would undoubtedly enhance our understanding of neurexin function at various synapses and foster advancements in the field.

      Does the TKO alter the ability of MNTB inputs to induce AP firing in LSO neurons?

      Activation of the MNTB inputs does not directly induce AP firing in LSO neurons, because the MNTB-LSO synapses are glycinergic and serve to inhibit neuronal activity.

      We think the reviewer meant to ask whether pan-neurexin deletion in the MNTB neurons alters their ability to influence the firing of LSO neurons. Indeed, the weakening of glycinergic transmission due to pan-neurexin ablation in MNTB neurons could potentially alter the excitation-inhibition (E/I) balance, thereby impacting the overall excitability of LSO neurons. We have conducted preliminary experiments to investigate this aspect and found that the E/I balance at LSO neurons was notably increased in TKO mice. We are currently preparing a manuscript to comprehensively address the role of neurexins at the auditory circuit and behavior levels.

      Additional calcium measurements using GECIs would provide insight into whether nanodomain calcium or total calcium is altered at these synapses.

      We appreciate the valuable suggestion provided by the reviewer. However, distinguishing between Ca2+ nanodomain and Ca2+ microdomain using Ca2+ imaging techniques requires advanced systems such as two-photon STED microscopy, which are beyond the scope of our current research.

      It is unclear why fluorescence intensity is quantified instead of the number of synaptic clusters in LSO. In addition to changes in synapse numbers, fluorescent intensity can indicate a number of other possible morphological changes.

      We appreciate the valuable suggestion from the reviewer. We have re-analyzed our imaging data to compare synaptic density. The results, as included in Fig.3f and 3h, confirm an increase in the number of glycinergic synapses after pan-neurexin deletion.

      The most robust synaptic phenotypes were produced by measuring light-evoked oIPSCs and the authors acknowledge that electrically-evoked eIPSCs might be contaminated by uninfected fibers or by other sources of glycinergic inputs. I suggest that IPSC PPRs, EGTA, and low Ca2+ experiments be performed using optogenetics.

      As discussed in our response to the Public Reviews, we acknowledge that optogenetic stimulation offers crucial advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. Additionally, following the reviewer’s suggestion, we have conducted new optogenetic experiments specifically to measure the paired-pulse ratio in control and Nrxn123 TKO mice. We included this new dataset in supplementary Figure S2; it is consistent with our results obtained with electrical fiber stimulation.

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to major concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      It is sometimes confusing which type of evoked stimulation is being used (e.g. PPR, EGTA, and low Ca2+ experiments). To aid in the interpretations of these experiments, it would help to clarify.

      We appreciate the reviewer's suggestion regarding the clarity of the evoked stimulation methods used in our experiments. We have revised the manuscript to provide clearer descriptions of the specific types of evoked stimulation employed in each experiment. Thank you for guiding us towards this clarification.

      The comparisons to Chen et al 2017 and the senior author's 2020 paper seem disjointed and do not contribute to the findings, which alone, are quite interesting. Given the prevailing notion that neurexins control different synaptic properties depending on the brain region and/or synapse studied, is it surprising that the findings observed here differ from previous studies of different synapses (glutamatergic and GABAergic)?

      By comparing previous studies at different types of neurons/synapses, our findings reveal a commonality in the underlying mechanisms of synaptic regulation mediated by neurexins. This convergence of mechanisms across different synapse types highlights the fundamental role of neurexins in synaptic function and plasticity. In addition, this comparative perspective may shed light on the complex interplay between excitatory and inhibitory neurotransmission, which is crucial for maintaining the balance of neuronal activity and network dynamics.

      Despite Nrxn3 being the most abundant Nrxn mRNA in MNTB neurons, the possible contributions of this highly expressed protein are not discussed.

      We thank the reviewer for bringing up this important point. The differential expression of individual neurexins indeed suggests that specific neurexins may play dominant roles in regulating synaptic transmission. While our study primarily focused on the collective impact of ablating all neurexins, we acknowledge the significance of exploring the specific contributions of individual neurexin isoforms in the future. Understanding the distinct roles of each neurexin isoform could provide valuable insights into the precise mechanisms underlying synaptic function and plasticity. We have added a discussion of this point to our revised manuscript (lines 223-230).

      Reviewer #3 (Recommendations For The Authors):

      • There are several instances of spaces missing and typos, please carefully check the manuscript.

      We greatly appreciate the reviewer's helpful feedback on the text that could be clarified or improved. We have meticulously edited the manuscript to address these concerns.

      • While studying the properties of IPSCs, apart from optogenetic stimulation, the authors performed experiments with electrical fiber stimulation. Their findings showed a slightly significant reduction of the IPSC amplitude and no effect on the IPSC kinetics when comparing the TKO and control. One weakness is the discrepancy between the results from the optogenetic and fiber stimulation experiments, which the authors attribute to inefficient transfection in the fiber stimulation experiments. The authors state that they tried to optimize their virus injection protocols. However, they do not elaborate on how the transfection rates could be improved in the discussion section. Moreover, it would be good to further address the reasons for the difference in amplitude between the control IPSCs in the optogenetic and fiber stimulation experiments.

      Echoing the suggestion by Reviewer 2 (see above), we acknowledge that optogenetic stimulation offers certain advantages, and we have provided a balanced discussion of the caveats associated with both methods in our manuscript. In addition, we have performed a new set of optogenetic experiments to measure the paired-pulse ratio in control and Nrxn123 TKO mice and included the results as supplementary Figure S2.

      For experiments involving EGTA and low Ca2+ manipulations, we opted for electrical stimulation due to major concerns regarding potential side effects of optogenetics, including the phototoxicity and photobleaching during prolonged light exposure.

      We added details of the virus injection strategy that optimized the transfection rates to the Methods section: “To enhance virus infection efficiency, we decreased the dosage per injection while increasing the frequency of injections. Additionally, we ensured the pipette remained immobilized for 20-30 seconds to guarantee virus absorption at injection sites. As a result of this strategy, we estimated that the vast majority of MNTB neurons were inoculated by AAVs.” See lines 288-290.

      • Abstract: "ablation of all neurexins in MNTB neurons reduced not only the amplitude but also altered the kinetics of the glycinergic synaptic transmission at LSO neurons."

      Changed as suggested.

      • Consider revising to "The synaptic dysfunctions primarily resulted from an altered dependence of release on voltage-gated Ca2+ influx."

      We appreciate the reviewer's suggestion, which helps improve the clarity of our manuscript. We have revised the phrasing as follows: "The synaptic dysfunctions primarily resulted from an impaired calcium sensitivity of release and a loosened coupling between voltage-gated calcium channels and synaptic vesicles."

      • Line 39 should be vertebrates.

      Revised as suggested.

      • Line 49 it would sound better to say "which further points to the diverse actions of neurexins in specific neurons."

      Revised as suggested.

      • Line 60 - this paragraph could include information about GABA signaling from the MNTB to the LSO, because on line 113 you mention that LSO neurons receive inhibitory GABAergic/glycinergic inputs, but you do not mention blocking GABA currents to isolate the glycinergic ones.

      We thank the reviewer for the thoughtful and detailed suggestion. We revised the text in line 60 to “In the mature mammalian auditory brainstem” and in line 113, we removed GABAergic to emphasize the nature of glycinergic synapse, particularly in the mouse brainstem where no GABAergic components are found (Fisher et al., 2019).

      • Line 72/73 it should be adeno-associated virus; line 73: "combining this with the RNAScope technique" sounds better.

      Changed as suggested.

      • Line 91 using the RNAScope technique; lines 97, 119 as a control; line 108 the functional organization.

      Changed as suggested.

      • Line 113 should be a pharmacological approach; line 122 optogenetically evoked.

      Changed as suggested.

      • Line 132, 160: the control.

      Changed as suggested.

      • Line 147 thus were infected; line 148 likely to be present but were obscured .

      Changed as suggested.

      • Line 154 which has been routinely used.

      Changed as suggested.

      • Line 155 It is not supposed to be Figure 2h but 2i; following that Figure 2i should be 2j; in my opinion, Figure 2i does not display a strong depression for the TKO mice.

      Changed as suggested.

      • Line 171 a better flow is achieved by saying: together these data show.

      Changed as suggested.

      • EC50 rather than IC50 of [Ca2+].

      Changed as suggested.

      • Line 180 it is better to say "we approached the matter by..."; line 183 while recording.

      Changed as suggested.

      • Line 203 were much stronger than the effect at control synapses; line 206 tightly clustering.

      Changed as suggested.

      • Line 212 sounds like they provide evidence for retina and spinal cord as well, should be made clear.

      Changed as suggested.

      • Line 289 previously.

      Changed as suggested.

      • Line 295 should be 30 min.

      Changed as suggested.

      • Line 336, 337 confocal microscope.

      Changed as suggested.

      • Please provide the number of data points also in figure captions or in the results section.

      Added in the captions as suggested.

      • Line 533, a better phrasing would be: the blocking effect of 0.2 mM Ca on IPSC amplitude.

      Changed as suggested.

      • Explain either in the methods or result section how was the EC50 of Ca2+ calculated.

      Added in the methods as suggested.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Comments to the Author):

      Summary:

      In this study, Xie and colleagues aimed to explore the function and potential mechanisms of the gut microbiota in a hamster model of severe leptospirosis. The results demonstrated that Leptospira infection was able to cause intestine damage and inflammation. Leptospira infection promoted an expansion of Proteobacteria, increased gut barrier permeability, and elevated LPS levels in the serum. Thus, they proposed an LPS-neutralization therapy which improved the survival rate of moribund hamsters combined with antibody therapy or antibiotic therapy.

      Strengths:

      The work is well-designed and the story is interesting to me. The gut microbiota is essential for immunity and systemic health. Many life-threatening pathogens, such as SARS-CoV-2 and other gut-damaged infection, have the potential to disrupt the gut microbiota in the later stages of infection, causing some harmful gut microbiota-derived substances to enter the bloodstream. It is emphasized that in addition to exogenous pathogenic pathogens, harmful substances of intestinal origin should also be considered in critically ill patients.

      Weaknesses:

      Q1: There are many serotypes of Leptospira, it is suggested to test another pathogenic serotype of Leptospira to validate the proposed therapy.

      That’s a constructive suggestion. We have tested another pathogenic serotype of Leptospira (L. interrogans serovar Autumnalis strain 56606) to verify the LPS-neutralization therapy combined with antibiotic therapy (Supplementary Fig. S9B). The results showed that the combination of the LPS-neutralization therapy with antibody therapy or antibiotic therapy also significantly improved the survival rate of hamsters infected by 56606.

      Q2: The authors should explain why the infective dose of leptospires was not consistent across the different experiments.

      Thank you for your comment. To examine the role of the gut microbiota in acute leptospirosis, an infective dose of 10^6 leptospires was chosen, whereas a dose of 10^7 leptospires was used in the other sections of the study. We also infected hamsters with 10^7 leptospires in this experiment; however, this dose might be excessive, as there was no significant difference in survival rate between the control group and the Abx-treated group. A previous study also highlighted that the infective dose is important when investigating the effect of sex on leptospirosis, as male hamsters infected with L. interrogans are more susceptible to severe leptospirosis than females after exposure to lower infectious doses (10^3 leptospires but not 10^4 leptospires) (1).

      Reference

      (1) GOMES C K, GUEDES M, POTULA H H, et al. Sex Matters: Male Hamsters Are More Susceptible to Lethal Infection with Lower Doses of Pathogenic Leptospira than Female Hamsters (J). Infect Immun, 2018, 86(10).

      Q3: In the discussion section, it is better to supplement the discussion of the potential link between the natural route of infection and leptospirosis.

      Thank you for your suggestion. We have added this to the discussion (lines 523-527 in the track change PDF version).

      Q4: Line 231, what is the solvent of thioglycolate?

      We have added this information to the manuscript (lines 242-243 in the track change PDF version).

      Q5: Lines 962-964, there are some mistakes which are not matched to Figure 7.

      Thank you for pointing that out, we have corrected it in the manuscript.

      Reviewer #2 (Comments to the Author):

      Summary:

      Severe leptospirosis in humans and some mammals is often fatal. In this article, the authors explored the role of the gut microbiota in severe leptospirosis. They found that Leptospira infection promoted a dysbiotic gut microbiota with an expansion of Proteobacteria, and that LPS neutralization therapy synergized with antileptospiral therapy to significantly improve survival rates in severe leptospirosis. This study is well-organized and has potentially important clinical implications not only for severe leptospirosis but also for other gut-damaging infections.

      Weaknesses:

      Q1: In the Introduction section and Discussion section, the authors should describe and discuss more about the differences in the effect of Leptospira infection between mice and hamsters, so that the readers can follow this study better.

      Thank you for your suggestion. We have added this to the manuscript (lines 62-66 in the track change PDF version).

      Q2: Lines 92-95, the authors should explain why they chose two different routines of infection.

      Thank you for your comment, we have explained it in the manuscript (line 100 in the track change PDF version).

      Q3: Line 179-180, the concentration of PMB and Dox is missed, and 0.016 μg/L is just ok.

      We have corrected it in the manuscript.

      Q4: "μL" or "μl" and "mL" or "ml" should be uniform in the manuscript.

      Thank you for your suggestion, we have revised it in the manuscript.

      Q5: In the culture of primary macrophages, how many cells are inoculated in the plates should be described clearly.

      We have supplemented it in the manuscript (line 250 in the track change PDF version).

      Q6: Line 271, it is better to list primers used for leptospiral detection in the text. Because it allows readers to find the information they need more directly.

      Thank you for your suggestion. We have added the primers to the manuscript (lines 281-284 in the track change PDF version).

      Q7: Line 366-369, Lactobacillus seems to be a kind of key bacteria during Leptospira infection. A previous study (doi: 10.1371/journal.pntd.0005870) also demonstrated that pre-treatment with Lactobacillus plantarum prevented severe pathogenesis in mice. The authors should discuss the potential probiotic for leptospirosis prevention.

      We have discussed it in the manuscript (lines 564-566 in the track change PDF version).

      Q8: Lines 450-451, not all concentrations of fecal filtration from two groups upregulated all gene expression mentioned in the text, the authors should correct it.

      Thank you for pointing that out. We have corrected it in the manuscript (lines 461-462 in the track change PDF version).

      Reviewer #3 (Comments to the Author):

      Summary:

      This is a well-prepared manuscript that presented interesting research results. The only defect is that the authors should further revise the English language.

      Strengths:

      The omics method produced unbiased results.

      Weaknesses:

      Q1: LPS neutralization is not a new method for treating leptospiral infection.

      Thank you for your comment. We agree that LPS neutralization is not a new method for treating leptospiral infection; however, most previous work appears to have focused on leptospiral LPS. In addition, Leptospira seems to be naturally resistant to polymyxin B (1). Recently, neutralization of gut-derived LPS has been applied in other diseases, where it significantly relieved disease (2-3). In this study, we found that Leptospira infection promoted an expansion of Proteobacteria, increased gut barrier permeability, and elevated LPS levels in the serum. Thus, we proposed an LPS-neutralization therapy which, combined with antibody therapy or antibiotic therapy, improved the survival rate of moribund hamsters.

      Reference

      (1) LIEGEON G, DELORY T, PICARDEAU M. Antibiotic susceptibilities of livestock isolates of leptospira (J). Int J Antimicrob Agents, 2018, 51(5):693-699.

      (2) MUNOZ L, BORRERO M J, UBEDA M, et al. Intestinal Immune Dysregulation Driven by Dysbiosis Promotes Barrier Disruption and Bacterial Translocation in Rats With Cirrhosis (J). Hepatology, 2019, 70(3):925-938.

      (3) ZHANG X, LIU H, HASHIMOTO K, et al. The gut-liver axis in sepsis: interaction mechanisms and therapeutic potential (J). Crit Care, 2022, 26(1):213.

      Q2: The authors should further revise the English language used in the text.

      Thank you for your suggestion, our manuscript has been polished by American Journal Experts (certificate number: 81C8-C5C1-9D5D-109D-3F23).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In their valuable study, Chen et al. aim to define the neuronal role of HMMR, a microtubule-associated protein typically associated with cell division. Their findings suggest that HMMR is necessary for proper neuronal morphology and the generation of polymerizing microtubules within neurites, potentially by promoting the function of TPX2. While the study is recognized as a first step in deciphering the influence of HMMR on microtubule organization in neurons, reviewers note the current work has important gaps and would benefit from further exploration of the mechanism of microtubule stability by HMMR, the link between HMMR-mediated microtubule generation and morphogenesis, and the physiological implications of disrupting HMMR during neuronal morphogenesis.

      Public Reviews:

      Reviewer #1 (Public Review):

      The microtubule cytoskeleton is essential for basic cell functions, enabling intracellular transport, and establishment of cell polarity and motility. Microtubule-associated proteins (MAPs) contribute to the regulation of microtubule dynamics and stability - mechanisms that are specifically important for the development and physiological function of neurons. Here, the authors aimed to elucidate the neuronal function of the MAP Hmmr, which they had previously identified in a quantitative study of the proteome associated with neuronal microtubules.

      The authors conduct well-controlled experiments to demonstrate the localization of endogenous as well as exogenous Hmmr on microtubules within the soma as well as all neurites of hippocampal neurons. Functional analysis using gain- and loss-of-function approaches demonstrates that Hmmr levels are crucial for neuronal morphogenesis, as the length of both dendrites and axons decreases upon loss of Hmmr and increases upon Hmmr overexpression. In addition to length alterations, the branching pattern of neurites changes with Hmmr levels. To uncover the mechanism of how Hmmr influences neuronal morphology, the authors follow the lead that Hmmr overexpression induces looped microtubules in the soma, indicative of an increase in microtubule stability. Microtubule acetylation indeed decreases and increases with Hmmr LOF and GOF, respectively. Together with a rescue of nocodazole-induced microtubule destabilization by Hmmr GOF, these results argue that Hmmr regulates microtubule stability. Highlighted by the altered movement of a plus-end-associated protein, Hmmr also has an effect on the dynamic nature of microtubules. The authors present evidence suggesting that the nucleation frequency of neuronal microtubules depends on Hmmr's ability to recruit the microtubule nucleator Tpx2. Together, these data add novel insight into MAP-mediated regulation of microtubules as a prerequisite for neuronal morphogenesis. While the data shown support the author's conclusions, the study also has several weaknesses:

      • The study appears incomplete as the initial proteomics analysis which is referenced as an entry into the study is not presented. This surely is the authors' choice, however, without presenting this data set, it would make more sense if the authors first showed the localization of Hmmr on neuronal microtubules and then started with the functional analysis.

      The reviewer suggests moving the Hmmr localization data in front of the loss- and gain-of-function data because we did not present the proteomics data. However, we still believe that placing the loss- and gain-of-function data at the beginning is the better arrangement, because it allows the audience to see the drastic changes in neuronal morphology when HMMR is depleted or overly abundant. It also provides a better link between HMMR’s localization on microtubules and its effect on the stability and dynamics of microtubules.

      • Neurite branching is quantified, but the methods used are not consistent (normalized branch density vs. Sholl analysis) and there is no distinction between alterations of branching in dendrites vs. axons. This information should be added as it could prove informative with respect to the physiological function of Hmmr in neurite branching.

      Sholl analysis is considered the gold standard in neurite branching analyses. However, in the knockdown experiment (Figure 1A-1E), HMMR-depleted neurons exhibited extremely short axons (<100 μm) and dendrites (<40 μm), which made Sholl analysis unsuitable for assessing their branching. That is why we used normalized branch density (Figure 1E) in the knockdown experiment and Sholl analysis (Figure 1J) in the overexpression experiment.

      Regarding the branching difference between axons and dendrites, only axons exhibit branches at 4 DIV. Therefore, the branching analysis focuses on axons rather than on dendrites. We have revised the manuscript to clarify this.
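      For readers outside the field, the idea behind Sholl analysis can be sketched in a few lines. This is a minimal illustration only, not the analysis pipeline used in the study: it assumes a neurite traced as short straight 2D segments, and the function name, coordinates, and radii are all hypothetical. It counts how many segments cross each concentric circle centered on the soma, which also shows why very short neurites yield few informative crossings.

```python
import math

def sholl_intersections(segments, soma, radii):
    """Count neurite segments crossing each concentric circle around the soma.

    Assumption (illustrative): segments are short relative to the ring spacing,
    so a segment crosses a circle exactly when its endpoints lie on opposite sides.
    """
    counts = []
    for r in radii:
        n = 0
        for p, q in segments:
            inside_p = math.dist(p, soma) < r
            inside_q = math.dist(q, soma) < r
            if inside_p != inside_q:
                n += 1
        counts.append(n)
    return counts

# Hypothetical Y-shaped axon (coordinates in micrometers): one trunk, two branches.
example_segments = [
    ((0, 0), (0, 50)),      # trunk
    ((0, 50), (-30, 90)),   # left branch
    ((0, 50), (30, 90)),    # right branch
]
example_counts = sholl_intersections(example_segments, (0, 0), [25, 75, 120])
print(example_counts)  # → [1, 2, 0]
```

      In this toy example the outermost ring lies beyond the neurite and records no crossings, so only the inner rings carry information; a neurite shorter than a few ring spacings gives a similarly sparse profile.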

      • The authors show that altered Hmmr levels affect neurite branching and identify an effect on microtubule stability and dynamics as a molecular mechanism. However, how branching correlates with or is regulated by Hmmr-mediated microtubule dynamics is neither addressed experimentally nor discussed by the authors. The physiological significance of altered neuronal morphogenesis also lacks discussion.

      • To discuss how branching correlates with or is regulated by HMMR-mediated microtubule dynamics, we have added the following paragraph to the Discussion section:

      “It has been shown that compromising microtubule nucleation in neurons by SSNA1 mutant overexpression prevents proper axon branching (Basnet et al., 2018). Additionally, dendritic branching in Drosophila sensory neurons depends on the orientation of microtubule nucleation. Nucleation that results in an anterograde microtubule growth leads to increased branching, while nucleation that results in a retrograde microtubule growth leads to decreased branching (Yalgin et al., 2015). These results demonstrate the importance of microtubule nucleation on neurite branching. It is conceivable that overexpressing a microtubule nucleation promoting protein such as HMMR results in an increase of branching complexity.”

      • To discuss the physiological significance of altered neuronal morphogenesis, we have added the following paragraph to the Discussion section:

      “Neurons are the communication units of the nervous system. The formation of their intricate shape is therefore crucial for the physiological function. Alterations in neuronal morphogenesis have a profound impact on how nerve cells communicate, leading to a variety of physiological consequences. These consequences include impaired neural circuit formation and function, compromised signal transmission between neurons, as well as altered anatomical structure of the CNS. Depending on the specific type and location of the morphogenetically altered neurons, the physiological consequences can include neurological disorders such as autism spectrum disorder (Berkel et al., 2012) and schizophrenia (Goo et al., 2023), as well as learning and memory deficits (Winkle et al., 2016). However, due to the involvement of HMMR on mitosis, most HMMR mutations are associated with familial cancers (based on ClinVar data).”

      • Multiple times, the manuscript lacks a rationale for an experimental approach, choice of cell type, time points, regions of interest, etc. Also, a meaningful description of the methods and for how data were analyzed is missing, making the paper hard to read for someone not directly from the field.

      We understand the reviewer’s comments regarding the lack of rationale for choosing the experimental approach, choice of cell type, time points, regions of interest, etc. As a result, we have added the rationales where appropriate to help readers from other fields to better understand the choice of cell type, time points, regions of interest, etc. A brief explanation is shown below:

      • Approach and timing: We employed both electroporation (immediate but milder expression) and lipofectamine transfection (delayed but stronger expression). We prioritized knocking down HMMR early in development, so electroporation was used. For overexpression experiments, we chose lipofectamine, which allows a high level of protein expression to be achieved.

      • Cell selection: Hippocampal neurons were chosen in experiments that involve morphological quantification due to their homogeneous morphology. On the other hand, cortical neurons were selected in experiments that require large amounts of neurons and/or experiments where we want to demonstrate the universality of a proposed hypothesis.

      • Regions of interest (ROIs): In our previous publication (Chen et al., 2017), it was discovered that a significant reduction of EB3 emanation frequency can be detected at the tip and the base of the neurite but not in the middle of the neurite in TPX2-depleted neurons. This difference is due to the presence of GTP-bound Ran GTPase (RanGTP) at the tip and the base of the neurite. Since RanGTP has also been shown to regulate the interaction between HMMR and TPX2 in the cell-free system (Scrofani et al., 2015), it is possible that the same phenomenon can be observed in HMMR-depleted neurons. This is why we examined those 3 ROIs in Figure 4.

      Reviewer #2 (Public Review):

      The mechanism of microtubule formation, stabilization, and organization in neurites is important for neuronal function. In this manuscript, the authors examine the phenotype of neurons following alteration in the level of the protein HMMR, a microtubule-associated protein with established roles in mitosis. Neurite morphology is measured as well as microtubule stability and dynamic parameters using standard assays. A binding partner of HMMR, TPX2, is localized. The results support a role for HMMR in neurons.

      The work presented in this manuscript seeks to determine if a MAP called HMMR contributes to microtubule dynamics in neurons. Several steps, including validation of the RNAi, additional statistical analysis, use of cells at the same age in culture, and better documentation in figures, would increase the impact of the work.

      In many places, the data can be improved which might make the story more convincing. As presented, the results show that HMMR is distributed as puncta on neurons with data coming from a single HMMR antibody, and some background staining that was not discussed. In the discussion the authors state that HMMR impacts microtubule stability, which was evaluated by the presence of post-translational modification and resistance to nocodazole; the data are suggestive but not entirely convincing. The discussion also states that HMMR increases the “amount” of growing microtubules which was measured as the frequency of comet appearance. The authors did not comment on how the number of growing microtubules results in the observed morphological changes.

      We actually tested several HMMR antibodies, including E-19 (Santa Cruz, sc-16170), EPR4054 (Abcam, ab124729), and a variety of antibodies provided by Prof. Eva Turley. E-19 performed the best in immunofluorescence (IF) staining and knockdown validation. The other antibodies either failed to detect HMMR in IF staining or generated excessive background signal. We understand that the final images were produced using a single antibody. However, because we meticulously validated this antibody and the localization of overexpressed HMMR is consistent with that of endogenous HMMR, we are confident in the data generated using this single antibody.

      We have added the following paragraph to the Discussion section to explain how the number of growing microtubules results in the observed morphological changes, such as an increase in axon branches:

      “It has been shown that compromising microtubule nucleation in neurons by SSNA1 mutant overexpression prevents proper axon branching (Basnet et al., 2018). Additionally, dendritic branching in Drosophila sensory neurons depends on the orientation of microtubule nucleation. Nucleation that results in an anterograde microtubule growth leads to increased branching, while nucleation that results in a retrograde microtubule growth leads to decreased branching (Yalgin et al., 2015). These results demonstrate the importance of microtubule nucleation on neurite branching. It is conceivable that overexpressing a microtubule nucleation promoting protein such as HMMR results in an increase of branching complexity.”

      Reviewer #1 (Recommendations for The Authors):

      (1) The manuscript jumps extensively between main figures and supplementary figures. Please check whether parts of the supplement could be moved to the main figures.

      We understand the frustration of moving back and forth between the main figures and supplementary figures. After examining the manuscript, we decided to combine Figure 2A with Figure S3.

      (2) In Figure 1, total neurite length between days 3 and 4 DIV does not appear to change - can this be true? Please check or else explain.

      We carefully re-examined our raw data and found that the total neurite length of 4 DIV hippocampal neurons expressing non-targeting shRNA (Figure 1B) and that of 3 DIV hippocampal neurons expressing AcGFP (Figure 1G) are indeed very similar. The explanation is that the 3 DIV hippocampal neurons used for Figure 1G were cultured at low density and in the presence of cortical neuron-conditioned neurobasal medium (as written in the Methods, Neuron culture and transfection section). The low-density culture with minimal overlapping neurites allowed us to better quantify total neurite length, because neurons expressing AcGFP-mHMMR sprouted long and highly branched axons. However, the addition of cortical neuron-conditioned neurobasal medium promoted neurite elongation. This is why the total neurite length of 4 DIV hippocampal neurons expressing non-targeting shRNA (Figure 1B) and that of 3 DIV hippocampal neurons expressing AcGFP (Figure 1G) are similar.

      (3) Groen et al. have shown that Hmmr also bundles microtubules, a mechanism that surely is important for neuronal microtubules. Please discuss.

      We thank the reviewer for pointing out that HMMR also bundles microtubules and have added this to our revised Discussion section:

      “It has been shown that the Xenopus HMMR homolog XRHAMM bundles microtubules in vitro (Groen et al., 2004). In addition, deleting proteins which promote microtubule bundling (e.g., doublecortin knockout, MAP1B/MAP2 double knockout) leads to impaired neurite outgrowth (Bielas et al., 2007; Teng et al., 2001). These observations are consistent with our data that overexpressing HMMR leads to the increased axon and dendrite outgrowth, while depleting it results in the opposite phenotype (Figure 1).”

      (4) Please explain why in Figure 4, cortical neurons were chosen for analysis and why and how the three different ROIs were picked.

      To answer the question why we chose cortical neurons for the analyses in Figure 4, it will be important to explain why we used hippocampal neurons for other figures. Primary hippocampal neurons have a high homogeneity in terms of their morphology. This uniform morphology allows more consistent morphological quantification. Figure 4, however, does not involve morphological quantification. We are more confident to conclude that HMMR regulates microtubule dynamics if this effect can be detected in the relatively heterogeneous cortical neurons. These are the reasons why we chose to analyze cortical neurons in Figure 4.

      In our previous publication (Chen et al., 2017), it was discovered that a significant reduction of EB3 emanation frequency can be detected at the tip and the base of the neurite but not in the middle of the neurite in TPX2-depleted neurons. This difference is due to the presence of GTP-bound Ran GTPase (RanGTP) at the tip of the neurite and in the soma. Since RanGTP has also been shown to regulate the interaction between HMMR and TPX2 in the cell-free system (Scrofani et al., 2015), it is possible that the same phenomenon can be observed in HMMR-depleted neurons. This is why we examined those 3 ROIs in Figure 4.

      (5) Microtubule looping has been shown to occur in regions prior to branch formation (e.g. Dent et al. 2004). As the authors identify increased looping upon Hmmr GOF, this should be discussed.

      We thank the reviewer for pointing out that microtubule looping occurs in regions of branch formation and have added this to our revised discussion:

      “It is worth noting that the elevated level of HMMR increases the branching density of axons (Figure 1J) and promotes the formation of looped microtubules (Figure 3A). This is consistent with the observations that looped microtubules are often detected in regions of axon branch formation (Dent et al., 1999; Dent and Kalil, 2001; Purro et al., 2008).”

      Reviewer #2 (Recommendations for The Authors):

      (1) The work seeks to gain insight into microtubule behavior in neurons, an important issue.

      (2) Several steps, including validation of the RNAi, additional statistical analysis, use of cells at the same age in culture, and better documentation in figures, would increase the impact of the work.

      (3) Figure 1 documents the results of experiments in which the HMMR protein was depleted using shRNA. A western blot of cell extracts from control and depleted cells is needed to verify that the protein level is reduced; alternatively, documentation of the reduction in RNA levels in treated cells could be provided. Neurite, axon, and dendrite length and branch density are measured. The neurite length is in microns, and the axon length is normalized to 100% of the non-treated cells. Please use the same for measures for easier comparison. Looking at the images in Figure 1, the length of the dendrites does not look different in the examples shown, whereas the axon appears shorter. This impression is not supported by the quantification. Are representative images shown? Additionally, the authors should report the values for each replicate of the experiment and compare the three averages rather than comparison of lengths from all measurements. A related issue is that the dendrites do not look longer in panel F, following overexpression of HMMR. For examples of using averages of replicates see: https://pubmed.ncbi.nlm.nih.gov/32346721/

      The reviewer mentioned that Western blot of cell extracts or RNA quantification from control and depleted cells are needed to verify that the protein level is reduced.

      Unfortunately, these assays are extremely difficult to perform in primary neurons due to the low transfection efficiency. We believe that the consistent knockdown phenotype from 3 different shRNA sequences (Figure 1A-D) and the immunofluorescence staining in depleted primary neurons (Figure S2) are sufficient to confirm that HMMR level is reduced.

      We revised Figure 1C, 1D, 1H, and 1I so that axon and dendrite lengths are all reported in microns.

      We selected another image for the non-targeting control in Figure 1A to better demonstrate the reduction of dendrite length when HMMR is knocked down.

      We thank the reviewer for the suggestion of comparing the three average values rather than comparing all measurements. We have performed statistical analyses for all our data using the average values and revised the graphs accordingly. While the P-values changed, our conclusions remain the same.
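
      The replicate-level comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis actually performed: the neurite lengths are hypothetical, the helper names `replicate_means` and `welch_t` are introduced here for illustration, and the use of Welch's t statistic is an assumption, since the response does not state which test was applied.

```python
from math import sqrt
from statistics import mean, variance


def replicate_means(replicates):
    """Collapse each replicate (a list of per-neuron lengths) to its mean."""
    return [mean(r) for r in replicates]


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical neurite lengths in microns: three independent cultures per
# condition, a few neurons per culture. These are NOT data from the study.
control = [[310.0, 295.0, 330.0], [280.0, 300.0, 290.0], [320.0, 305.0, 315.0]]
knockdown = [[240.0, 255.0, 230.0], [260.0, 245.0, 250.0], [235.0, 225.0, 245.0]]

# The unit of comparison is the culture, not the individual neuron.
ctrl_means = replicate_means(control)
kd_means = replicate_means(knockdown)
t_stat, dof = welch_t(ctrl_means, kd_means)
print(ctrl_means, kd_means, t_stat, dof)
```

      Collapsing each culture to a single mean makes the sample size equal to the number of independent experiments (three per condition in this sketch) rather than the number of neurons measured, which is why P-values can change while the direction of the effect stays the same.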

      We thank the reviewer for pointing out this discrepancy and have selected another image of the AcGFP control for Figure 1F to better demonstrate the increase of dendrite length when HMMR is overexpressed.

      (4) Given the changes in neurite morphology, the authors examine the localization of endogenous and overexpressed. The supplemental figures (see S2 and S3) show evidence that HMMR is present in a punctate pattern by conventional immunofluorescence. This is reasonable evidence that the protein is in a linear pattern along cytoskeletal microtubules and that the signal is present in puncta. Please move this to the main text, perhaps replacing Figure 2A, which is low magnification and very hard to see the HMMR staining. Additionally, the level of overexpression of HMMR is not mentioned. Please address this; were cells with similar levels of overexpression selected? Did the result depend on the overexpression? A related issue is the DIV for the cells - some are examined earlier and some at later times; does this impact the results? Please provide information or perform experiments with consistent timing. For the immunofluorescence, were multiple antibodies tried to see if the result was the same with each? Were different fixations, in addition to methanol, utilized?

      We have replaced Figure 2A with Figure S3 based on the reviewer’s suggestion.

      In the HMMR overexpression experiments, we used an HMMR antibody and immunofluorescence staining to confirm that overexpression was achieved. However, we did not quantify to what extent HMMR was overexpressed.

      We performed all the depletion experiments on 4 DIV to maximize knockdown efficiency and performed all the overexpression experiments on 3 DIV to prevent excessive axon fasciculation. Nonetheless, we examined the effect of HMMR depletion on neuronal morphology on 3 DIV. The trend of reduced total neurite length, axon length, and dendrite length can be observed, but no statistical significance can be detected. We also examined the effect of HMMR overexpression on neuronal morphology on 4 DIV and did observe an increase of total neurite length, axon length, and dendrite length. But the overlapping and bundled axons made reliable quantification extremely difficult.

      We actually tested multiple HMMR antibodies, such as E-19 (Santa Cruz, sc-16170), EPR4054 (Abcam, ab124729), and a variety of antibodies provided by Prof. Eva Turley. E-19 performed the best in immunofluorescence (IF) staining and knockdown validation. The other antibodies either failed to detect HMMR in IF staining or generated excessive background signal. We also tested various fixation methods, including 37°C formaldehyde fixation, -20°C methanol fixation, and 37°C formaldehyde followed by -20°C methanol fixation. All fixation methods generated similar IF staining patterns using the E-19 antibody, but 3.7% formaldehyde fixation produced the highest signal.

      (5) In Figure 2 C it is hard to see DAPI fluorescence. Are the white areas in the merge with bright cell nuclei? Is Figure 2C control or overexpressing cells? If this is endogenous, is there less signal in PLA compared with S4, which was in culture longer and is overexpressed prior to using PLA for detection?

      The white areas in Figure 2C that the reviewer mentioned are not cell nuclei; they are bubbles formed within the mounting medium.

      HMMR detected in Figure 2C is endogenous. We did not quantitatively compare the PLA signals in Figure 2C and those in Figure S4. This is because the PLA signals in Figure 2C are generated using anti-HMMR (to detect endogenous HMMR) and anti-β-III-tubulin antibodies while those in Figure S4 are generated using anti-AcGFP (to detect overexpressed AcGFP-mHMMR) and anti-β-III-tubulin antibodies. Since the affinity of the two antibodies (i.e., anti-HMMR and anti-AcGFP) toward their antigens is different, comparing the PLA signals is not informative.

      (6) The images of the endogenous HMMR (Fig S3) and the PLA with tubulin and HMMR antibodies are not the same (2C). The "dots" in PLA are widely separated; gauging from the marker bar length of 50 μm, the small clusters of dots are about 10 μm apart. In Figure S3, the puncta are much more closely spaced, appearing almost in a linear fashion along the microtubules. Enlarging the PLA image shows that each dot is very small - just a few pixels - please provide additional explanation including the minimal detection limit for the method, and why the images differ. If the standard immunofluorescence signal was enhanced, for example with the use of two secondaries, what is observed? Is the distribution of HMMR similar for both dendrites and axons? Microtubule polarity differs in these locations, so greater attention to this point seems of interest. There is a significant amount of punctate HMMR in the cytoplasm (or outside the cytoplasm?) in Figure S5; this is concerning. Please outline the cell edge for ease of visualization. What is the distribution of HMMR in a cell that has been treated with cold and/or nocodazole to disassemble the microtubules? is the signal lost?

      The images of the endogenous HMMR (Figure S3) and the PLA with tubulin and HMMR antibodies (Figure 2C) differ for the following reasons.

      • PLA utilizes two primary antibodies to target two different epitopes on HMMR and β-III-tubulin. It is conceivable that not every anti-HMMR antibody has the correct orientation and/or proximity (<40 nm) toward the anti-β-III-tubulin antibody to enable DNA amplification. This results in fewer PLA puncta than immunofluorescence signals.

      • The creator of PLA has pointed out that in situ PLA is a method based upon equilibrium reactions and several enzymatic steps. Therefore, only a fraction of the interacting molecules is detected (Weibrecht et al., 2010).

      We have not used signal enhancing immunofluorescence staining methods [e.g., using tertiary antibodies or tyramide signal amplification (TSA)] to detect HMMR. This is mainly because HMMR signal is strong enough to be detected using standard immunofluorescence staining.

      Regarding the question “Is the distribution of HMMR similar for both dendrites and axons?”, the reviewer raised a very important issue about the polarity difference of microtubules in axons (uniform) and dendrites (mixed). We were aware of this issue and very carefully examined the distribution and signal intensity of HMMR in axons vs dendrites. However, no differences were detected.

      The reviewer mentioned that “there is a significant amount of punctate HMMR in the cytoplasm (or outside the cytoplasm?) in Figure S5; this is concerning. Please outline the cell edge for ease of visualization.” Instead of outlining the cell edge, we have selected another image to facilitate the visualization of HMMR signals. There are indeed HMMR signals outside the cell. However, these outside signals are usually weaker and smaller in size compared to those inside the cell.

      After the examination of neurons expressing AcGFP-mHMMR with or without 100 nM nocodazole treatment, we did not notice any difference of AcGFP-mHMMR in distribution. We did not examine the distribution and signal intensity of the endogenous HMMR.

      (7) To determine if HMMR alters microtubule stability, the authors examine the distribution of acetylated tubulin and resistance to nocodazole-induced microtubule disassembly. In Figure 3 please show immunofluorescence images of the acetylated tubulin staining, not just the ratio images; the color is not obviously different in the various panels shown. For statistical analysis, see the comment above for Figure 1. For the nocodazole experiment, a similar change in neurite length following drug treatment was observed (Figure 3H), for the experimental and control, even though the starting length was greater in the overexpressing cells. Please consider the possibility that in both cases the microtubules are only partially resistant to nocodazole and that HMMR is not changing the fraction of microtubules that are sensitive to the drug. The cells were treated at 3 DIV; the authors note that more stable microtubules accumulate with time; how does time in culture impact stability? Often, acute treatment with a high concentration of nocodazole is used to assay microtubule stability; here the authors used a low (nM) concentration for 2 days (chronic). Why not use a higher concentration (1-10 μM) for a shorter incubation? The data show that overexpression of HMMR results in curved, buckled microtubules are these microtubules more acetylated and/or retained after nocodazole treatment?

      The reviewer suggested that we show immunofluorescence images of the acetylated tubulin staining, not just the ratio images. However, we still believe that showing the ratio images is the better approach, because microtubule density can differ from neuron to neuron. Showing acetylated tubulin alone may give a false impression when the overall microtubule density is higher or lower in a particular neuron. We realized that the “16 colors” pseudocolor scheme has the cyan color at the lower intensity, which can sometimes be confused with the white color at the higher intensity. Therefore, we changed the pseudocolor from “16 colors” to “fire” for Figure 3B and 3E to better visualize these images so that they appear more consistent with the quantitative data.
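
      The reasoning behind ratio images can be illustrated with a short Python sketch. It is a toy example under stated assumptions rather than the image-processing procedure used for Figure 3: the pixel intensities are hypothetical, and the function `ratio_image` and its `min_total` background threshold are names introduced here for illustration.

```python
def ratio_image(acetylated, total, min_total=1.0):
    """Pixelwise acetylated/total tubulin ratio.

    Pixels whose total-tubulin intensity is below min_total are treated as
    background and returned as None instead of a ratio.
    """
    return [
        [a / t if t >= min_total else None for a, t in zip(row_a, row_t)]
        for row_a, row_t in zip(acetylated, total)
    ]


# Hypothetical 2x2 intensity images (arbitrary units), NOT data from the study.
# The top-left and bottom-right pixels differ in acetylated signal (50 vs 80)
# only because total microtubule density differs (100 vs 160).
acetylated = [[50.0, 20.0], [0.0, 80.0]]
total = [[100.0, 100.0], [0.5, 160.0]]

ratios = ratio_image(acetylated, total)
print(ratios)
```

      In this toy case the two pixels with different raw acetylated-tubulin intensities yield the same ratio, which is the situation in which a raw acetylated-tubulin image could mislead.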

      The reviewer raised a very good question regarding the possibility that HMMR is not changing the fraction of microtubules that are sensitive to nocodazole. We repeated the experiment using a series of nocodazole concentrations. While the addition of nocodazole causes a concentration-dependent reduction of total neurite length in both AcGFP and AcGFP-mHMMR expressing neurons, there are subtle differences in the susceptibility of neurite length to the concentration of nocodazole. 1) 10 nM nocodazole treatment causes a significant reduction of neurite length in AcGFP expressing neurons, but not in AcGFP-mHMMR expressing neurons. This result indicates that AcGFP-mHMMR expression increases the tolerance of neurite elongation toward 10 nM nocodazole treatment. 2) In AcGFP expressing neurons, neurite length does not differ significantly between the 50 nM and 100 nM nocodazole treatments, suggesting that 50 nM nocodazole has reached maximal effectiveness. In AcGFP-mHMMR expressing neurons, 100 nM nocodazole further reduces the neurite length compared to the 50 nM group. These results argue against the possibility that HMMR does not change the fraction of microtubules that are sensitive to nocodazole. We have revised Figure 3H accordingly.

      The reviewer asked why we did not use the acute nocodazole treatment (μM concentration) to assess the effect of Hmmr on microtubule stability. This is because we used the neurite length as an indicator for microtubule stability. That is why the chronic treatment was chosen to produce a more detectable effect on neurite length.

      The reviewer asked whether the looped microtubules caused by HMMR overexpression are more acetylated and/or nocodazole resistant. While we do not have direct evidence to answer the reviewer’s question, we can deduce the answer from our observations. We noticed that looped microtubules are only present when HMMR is highly expressed (i.e., using lipofection to introduce HMMR-expressing plasmid) but not when HMMR is mildly expressed (i.e., using electroporation to introduce HMMR-expressing plasmid). From these observations, we infer that HMMR is more abundantly present on looped microtubules. Since HMMR overexpression leads to higher microtubule acetylation (Figure 3E), looped microtubules, which contain more HMMR, are most likely to be more acetylated.

      (8) An additional measure of microtubule dynamics is to measure the growth of microtubules using a live cell marker for microtubule plus ends. Such experiments were performed, using tagged EB3. The images are rather fuzzy. Parameters of microtubule dynamics were measured at three locations - is there data that the authors can cite about any differences in dynamics in control cells at these locations? They look very similar, so it is not clear why the different locations were used. It is not possible to learn much from the kymographs which look similar for all panels; I would remove these unless they can be changed or labeled to help the reader. Data is presented for three shRNA reagents. No data are presented to document the extent to which the protein is depleted with these reagents. This should be fixed. Alternatively, an RNAi pool could be utilized. Is there a control for off-target effects? For the analysis were all the comets used to generate the average values? What about a comparison of the average of each trial - not each comet?

      In our previous publication (Chen et al., 2017), it was discovered that a significant reduction of EB3 emanation frequency can be detected at the tip and the base of the neurite but not in the middle of the neurite in TPX2-depleted neurons. This difference is due to the presence of RanGTP at the tip and the base of the neurite. Since RanGTP has also been shown to regulate the interaction between HMMR and TPX2 in the cell-free system (Scrofani et al., 2015), it is possible that the same phenomenon can be observed in HMMR-depleted neurons. This is why we examined those 3 ROIs in Figure 4.

      We noticed that photobleaching causes the EB3-mCherry signal to diminish at later time points, which made it difficult to observe the differences amongst kymographs. In the revised Figure 4B and 4D, we removed the second half of all the kymographs to make the differences more obvious.

      The reviewer mentioned that there are no data documenting the extent to which the protein is depleted with the shRNAs. These data are shown in Figure S2, in which we quantified the HMMR protein level in the soma and along the neurite in neurons expressing different shRNA molecules.

      The reviewer asked whether there is a control for off-target effects. The answer is yes. We performed the rescue experiment to control for off-target effects, which is shown in Figure S1.

      We revised Figure 4 so that the dynamic properties of EB3 are quantified using the average of each experimental repetition.

      (9) In a final experiment, the authors examine the distribution of TPX2, a binding partner of HMMR. Include a standard immunofluorescence in addition to PLA to illustrate the distribution of TPX2. The quantification used was the inter puncta distance; please quantify the signal in control and treated cells.

      The reviewer asked us to include a standard immunofluorescence staining to illustrate the distribution of TPX2. We have done that in our previous publication (Chen et al., 2017) and TPX2 localizes primarily to the centrosome (https://www.nature.com/articles/srep42297/figures/2). In order to enhance the weak signal of TPX2 along the neurite, we actually needed to use PLA in that publication (https://www.nature.com/articles/srep42297/figures/3).

      Proximity ligation assay (PLA) generates fluorescent signals based on a local enzymatic reaction which catalyzes the amplification of a specific DNA sequence that can then be detected using a red fluorescent probe. Because this enzymatic reaction is not linear, neither the amount of amplified DNA nor the intensity of the fluorescence correlates with the strength of the interaction (Soderberg et al., 2006). As a result, quantification of PLA is typically done by counting the number of fluorescent puncta per unit area or, when PLA signals are too strong and have coalesced, by calculating the area containing fluorescent signal (not signal intensity) per unit area. That is why our quantification is based on the distance between PLA fluorescent puncta, not the fluorescent signal intensity.
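
      The two intensity-independent readouts mentioned above, inter-puncta distance and puncta per unit area, can be sketched in Python as follows. This is a minimal illustration and not the quantification pipeline used for the figures: the puncta coordinates and the region area are hypothetical, the function names are introduced here for illustration, and the sketch assumes that puncta centroids have already been detected and ordered along the neurite.

```python
from math import hypot


def inter_puncta_distances(points):
    """Distances between consecutive puncta, assumed ordered along the neurite."""
    return [
        hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    ]


def puncta_density(n_puncta, area_um2):
    """Number of puncta per square micron of the analyzed region."""
    return n_puncta / area_um2


# Hypothetical puncta centroids in microns, NOT data from the study.
puncta = [(0.0, 0.0), (3.0, 4.0), (9.0, 12.0), (9.0, 22.0)]

distances = inter_puncta_distances(puncta)
mean_distance = sum(distances) / len(distances)
density = puncta_density(len(puncta), area_um2=200.0)  # hypothetical area
print(distances, mean_distance, density)
```

      Both readouts depend only on where puncta are, not on how bright they are, which is consistent with the point that PLA fluorescence intensity does not scale with interaction strength.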

      References

      Basnet, N., H. Nedozralova, A.H. Crevenna, S. Bodakuntla, T. Schlichthaerle, M. Taschner, G. Cardone, C. Janke, R. Jungmann, M.M. Magiera, C. Biertumpfel, and N. Mizuno. 2018. Direct induction of microtubule branching by microtubule nucleation factor SSNA1. Nat. Cell Biol. 20:1172-1180.

      Berkel, S., W. Tang, M. Trevino, M. Vogt, H.A. Obenhaus, P. Gass, S.W. Scherer, R. Sprengel, G. Schratt, and G.A. Rappold. 2012. Inherited and de novo SHANK2 variants associated with autism spectrum disorder impair neuronal morphogenesis and physiology. Hum. Mol. Genet. 21:344-357.

      Bielas, S.L., F.F. Serneo, M. Chechlacz, T.J. Deerinck, G.A. Perkins, P.B. Allen, M.H. Ellisman, and J.G. Gleeson. 2007. Spinophilin facilitates dephosphorylation of doublecortin by PP1 to mediate microtubule bundling at the axonal wrist. Cell. 129:579-591.

      Chen, W.S., Y.J. Chen, Y.A. Huang, B.Y. Hsieh, H.C. Chiu, P.Y. Kao, C.Y. Chao, and E. Hwang. 2017. Ran-dependent TPX2 activation promotes acentrosomal microtubule nucleation in neurons. Sci. Rep. 7:42297.

      Dent, E.W., J.L. Callaway, G. Szebenyi, P.W. Baas, and K. Kalil. 1999. Reorganization and movement of microtubules in axonal growth cones and developing interstitial branches. J. Neurosci. 19:8894-8908.

      Dent, E.W., and K. Kalil. 2001. Axon branching requires interactions between dynamic microtubules and actin filaments. J. Neurosci. 21:9757-9769.

      Goo, B.S., D.J. Mun, S. Kim, T.T.M. Nhung, S.B. Lee, Y. Woo, S.J. Kim, B.K. Suh, S.J. Park, H.E. Lee, K. Park, H. Jang, J.C. Rah, K.J. Yoon, S.T. Baek, S.Y. Park, and S.K. Park. 2023. Schizophrenia-associated Mitotic Arrest Deficient-1 (MAD1) regulates the polarity of migrating neurons in the developing neocortex. Mol. Psychiatry. 28:856-870.

      Groen, A.C., L.A. Cameron, M. Coughlin, D.T. Miyamoto, T.J. Mitchison, and R. Ohi. 2004. XRHAMM functions in ran-dependent microtubule nucleation and pole formation during anastral spindle assembly. Curr. Biol. 14:1801-1811.

      Purro, S.A., L. Ciani, M. Hoyos-Flight, E. Stamatakou, E. Siomou, and P.C. Salinas. 2008. Wnt regulates axon behavior through changes in microtubule growth directionality: a new role for adenomatous polyposis coli. J. Neurosci. 28:8644-8654.

      Scrofani, J., T. Sardon, S. Meunier, and I. Vernos. 2015. Microtubule nucleation in mitosis by a RanGTP-dependent protein complex. Curr. Biol. 25:131-140.

      Soderberg, O., M. Gullberg, M. Jarvius, K. Ridderstrale, K.J. Leuchowius, J. Jarvius, K. Wester, P. Hydbring, F. Bahram, L.G. Larsson, and U. Landegren. 2006. Direct observation of individual endogenous protein complexes in situ by proximity ligation. Nat. Methods. 3:995-1000.

      Teng, J., Y. Takei, A. Harada, T. Nakata, J. Chen, and N. Hirokawa. 2001. Synergistic effects of MAP2 and MAP1B knockout in neuronal migration, dendritic outgrowth, and microtubule organization. J. Cell Biol. 155:65-76.

      Weibrecht, I., K.J. Leuchowius, C.M. Clausson, T. Conze, M. Jarvius, W.M. Howell, M. Kamali-Moghaddam, and O. Soderberg. 2010. Proximity ligation assays: a recent addition to the proteomics toolbox. Expert Rev Proteomics. 7:401-409.

      Winkle, C.C., R.H. Olsen, H. Kim, S.S. Moy, J. Song, and S.L. Gupton. 2016. Trim9 Deletion Alters the Morphogenesis of Developing and Adult-Born Hippocampal Neurons and Impairs Spatial Learning and Memory. J. Neurosci. 36:4940-4958.

      Yalgin, C., S. Ebrahimi, C. Delandre, L.F. Yoong, S. Akimoto, H. Tran, R. Amikura, R. Spokony, B. Torben-Nielsen, K.P. White, and A.W. Moore. 2015. Centrosomin represses dendrite branching by orienting microtubule nucleation. Nat. Neurosci. 18:1437-1445.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors set up a pipeline for automated high-throughput single-molecule fluorescence imaging (htSMT) in living cells and analysis of molecular dynamics.

      Strengths:

      htSMT reveals information on the diffusion and bound fraction of molecules, dose-response curves, relative estimates of binding rates, and temporal changes of parameters. It enables the screening of thousands of compounds in a reasonable time and proves to be more sensitive and faster than classical cell-growth assays. If the function of a compound is coupled to the mobility of the protein of interest, or affects an interaction partner, which modulates the mobility of the protein of interest, htSMT allows identifying the modulator and getting the first indication of the mechanism of action or interaction networks, which can be a starting point for more in-depth analysis.

      Weaknesses:

      While elegantly showcasing the power of high-throughput measurements, the authors disclose little information on their microscope setup and analysis procedures. Thus, reproduction by other scientists is limited. Moreover, a critical discussion about the limits of the approach in determining dynamic parameters, the mechanism of action of compounds, and network reconstruction for the protein of interest is missing. In addition, automated imaging and analysis procedures require implementing sensitive measures to assure data and analysis quality, but a description of such measures is missing.

      The reviewer rightly highlights both the power and complexity in high throughput assay systems, and as such the authors have spent significant effort in first developing quality control checks to support screening. We discuss some of these as part of the description and characterization of the platform. We added additional details into the manuscript to help clarify. The implementation of our workflow for image acquisition, processing and analysis relies heavily on the specifics of our lab hardware and software infrastructure. We have added additional details to the text, particularly in the Methods section, and believe we have added enough information that our results can be reproduced using the suite of tools that already exist for single molecule tracking.

      The reviewer also points out that all assays have limitations, and these have not been clearly identified as part of our discussion of the htSMT platform. We have also added some comments on the limitations of the current system and our approach.

      Reviewer #2 (Public Review):

      Summary:

      McSwiggen et al. present a high-throughput platform for SPT that allows them to identify pharmaceutical interactions with the diffusional behavior of receptors and, in turn, to identify potent new ligands and cellular mechanisms. The manuscript is well written and provides a solid new method and a proper experimental foundation.

      Strengths:

      The method capitalizes on and extends existing high-throughput toolboxes and is directly applied to multiple receptors and ligands. The outcomes are important and relevant for society. 10^6 cells and >400 ligands per is a significant achievement.

      The method can detect functionally relevant changes in transcription factor dynamics and accurately differentiate the ligand/target specificity directly within the cellular environment. This will be instrumental in screening libraries of compounds to identify starting points for the development of new therapeutics. Identifying hitherto unknown networks of biochemical signaling pathways will propel the field of single-particle live cell and quantitative microscopy in the area of diagnostics. The manuscript is well-written and clearly conveys its message.

      Weaknesses:

      There are a few elements that, if rectified, would improve the claims of the manuscript.

      The authors claim that they measure receptor dynamics. In essence, their readout is a variation in diffusional behavior that correlates with ligand binding. While ligand binding can result in altered dynamics and/or a shift in conformational equilibrium, SPT does not directly record protein structural dynamics, only their effect on diffusion. They should correct and elaborate on this.

      This is an excellent clarifying question, and we have tried to make it more explicit in the text. The reviewer is absolutely correct; we’re not using SPT to directly measure protein structural dynamics, but rather the interactions a given protein makes with other macromolecules within the cell. So when an SHR binds to ligand it adopts conformations that promote association with DNA and other protein-protein interactions relevant to transcription. This is distinct from assays that directly measure conformational changes of the protein.

      L 148 What do the authors mean 'No correlation between diffusion and monomeric protein size was observed, highlighting the differences between cellular protein dynamics versus purified systems'. This is not justified by data here or literature reference. How do the authors know these are individual molecules? Intensity distributions or single bleaching steps should be presented.

      The point we were trying to make is that the relative molecular weights of the monomeric proteins (138 kDa for Halo-AR, 102 kDa for ER-Halo, 122 kDa for Halo-GR, and 135 kDa for Halo-PR) are uncorrelated with their apparent free diffusion coefficients. Were we to make this measurement on purified protein in buffer, where diffusion is well described by the Stokes-Einstein equation, one would expect monomer size and diffusion to be related. We’ve clarified this point in the manuscript.
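
      To make the purified-protein expectation concrete, the sketch below computes the relative free diffusion coefficients that Stokes-Einstein scaling would predict from the four monomer molecular weights quoted above. It assumes a compact globular protein, so that the hydrodynamic radius scales as the cube root of the mass. That assumption and the function name `relative_diffusion` are illustrative only and are not taken from the manuscript.

```python
# Stokes-Einstein: D = k_B * T / (6 * pi * eta * r).
# Assumption (not from the manuscript): for a compact globular protein the
# hydrodynamic radius r scales as M**(1/3), so D scales as M**(-1/3).

# Monomer molecular weights quoted in the response, in kDa.
MONOMER_KDA = {
    "Halo-AR": 138.0,
    "ER-Halo": 102.0,
    "Halo-GR": 122.0,
    "Halo-PR": 135.0,
}


def relative_diffusion(masses_kda, reference):
    """Return the predicted D of each protein relative to the reference protein."""
    ref_mass = masses_kda[reference]
    return {name: (ref_mass / mass) ** (1.0 / 3.0) for name, mass in masses_kda.items()}


predicted = relative_diffusion(MONOMER_KDA, reference="ER-Halo")
for name, ratio in sorted(predicted.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: predicted D / D(ER-Halo) = {ratio:.2f}")
```

      Under this idealized scaling the predicted ordering simply follows mass, with a spread of roughly ten percent between the lightest and heaviest construct. That ordering is the relationship the response reports is absent in cells.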

      Along the same lines, the data in Figs 2 and 4 show that not only the immobile fraction is increased but also that the diffusion coefficient of the fast-moving (attributed to free) is reduced. The authors mention this and show an extended Fig 5 but do not provide an explanation.

      This is an area where there is still more work to do in understanding the estrogen receptor and other SHRs. As the reviewer says, we see not only an increase in chromatin binding but also a decrease in the diffusion coefficient of the “free” population. A potential explanation is that this is a greater prevalence of freely-diffusing homodimers of the receptor, or other protein-protein interactions (14-3-3, P300, CBP, etc) that can occur after ligand binding. Nothing in our bioactive compound screen shed light on this in particular, and so we can only speculate and have refrained from drawing further conclusions in the text.

      How do potential transient ligand binding and the time-dependent heterogeneity in motion (see comment above) contribute to this? Also, in line 216 the authors write "with no evidence" of transient diffusive states. How do they define transient diffusive states? While there are toolboxes to directly extract the existence and abundance of these either by HMM analysis or temporal segmentation, the authors do not discuss or use them.

      Throughout the analysis in this work, we consider all of the tracks within a 2-second FOV as representative of a single underlying population and have not looked at changes in dynamics within a single movie. As we show in the supplemental figures we added (see Figure 3, figure supplement 1), this appears to be a reasonable assumption, at least in the cases we’ve encountered in this manuscript. For experiments involving changes in dynamics over time, we added compound simultaneously with imaging and collected many 2-second FOVs in sequence to monitor changes in ER dynamics. In this case, when we refer to “transient states,” we are pointing out that we don’t observe any new states in the State Array diagram that exist at early time points but disappear at later time points.

      The reviewer suggests track-level analysis methods like hidden Markov models or variational Bayesian approaches, which have been used previously in the single-molecule community. These are very powerful techniques, provided the trajectories are long (typically 100s of frames). In the case of molecules that diffuse quickly and can diffuse out of the focal plane, we don’t have the luxury of such long trajectories. This was demonstrated previously (Hansen et al 2017, Heckert et al 2022), and so we’ve adopted the State Array approach to inferring state occupations from short trajectories. As the reviewer rightly points out, this approach potentially loses information about state transitions or changes over time, but as of now we are not aware of any robust methods that work on short trajectories.

      The authors discuss the methods for extracting kinetic information of ligand binding by diffusion. They should consider the temporal segmentation of heterogeneous diffusion. There are numerous methods published in journals or bioRxiv based on analytical or deep learning tools to perform temporal segmentation. This could elevate their analysis of Kon and Koff.

      We’re aware of a number of approaches for analyzing both high framerate SMT as well as long exposure residence time imaging. As we say above, we’re not aware of any methods that have been demonstrated to work robustly on short trajectories aside from the approaches we’ve taken. Similarly, for residence time imaging there are published approaches, but we’re not aware of any that would offer new insight into the experiments in this study. If the reviewer has specific suggestions for analytical approaches that we’re not aware of we would happily consider them.

      Reviewer #3 (Public Review):

      Summary:

      The authors aim to demonstrate the effectiveness of their developed methodology, which utilizes super-resolution microscopy and single-molecule tracking in live cells on a high-throughput scale. Their study focuses on measuring the diffusion state of a molecule target, the estrogen receptor, in both ligand-bound and unbound forms in live cells. By showcasing the ability to screen 5067 compounds and measure the diffusive state of the estrogen receptor for each compound in live cells, they illustrate the capability and power of their methodology.

      Strengths:

      Readers are well introduced to the principles in the initial stages of the manuscript with highly convincing video examples. The methods and metrics used (fbound) are robust. The authors demonstrate high reproducibility of their screening method (R2=0.92). They also showcase the great sensitivity of their method in predicting the proliferation/viability state of cells (R2=0.84). The outcome of the screen is sound, with multiple compounds clustering identified in line with known estrogen receptor biology.

      Weaknesses:

      • Potential overstatement on the relationship of low diffusion state of ER bound to compound and chromatin state without any work on chromatin level.

      We appreciate the reviewer’s caution about over-interpreting the relationship between an increase in the slowest diffusing states that we observe by SMT and bona fide engagement with chromatin. In the case of the estrogen receptor there is strong precedent in the literature showing increases in chromatin binding and chromatin accessibility (as measured by ChIP-seq and ATAC-seq) upon treatment with either estradiol or SERM/Ds. Taken together with the RNA-seq, we felt it reasonable to assume all the trajectories with a diffusion coefficient less than 0.1 µm²/sec were chromatin bound.
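
      As a rough illustration of how a bound fraction follows from that cutoff, the sketch below counts the share of trajectories whose diffusion coefficient falls below 0.1 µm²/sec. It is a simplification under stated assumptions: the authors' State Array analysis infers state occupations rather than assigning one diffusion coefficient per trajectory, and the function name and example values here are hypothetical.

```python
BOUND_THRESHOLD_UM2_PER_S = 0.1  # cutoff quoted in the response above


def fraction_bound(diffusion_coefficients, threshold=BOUND_THRESHOLD_UM2_PER_S):
    """Fraction of trajectories whose diffusion coefficient is below the threshold."""
    if not diffusion_coefficients:
        raise ValueError("need at least one trajectory")
    bound = sum(1 for d in diffusion_coefficients if d < threshold)
    return bound / len(diffusion_coefficients)


# Hypothetical per-trajectory diffusion coefficients in um^2/s (made-up values,
# not data from the study).
example = [0.02, 0.05, 0.08, 1.5, 3.2, 0.6, 0.03, 4.1]
f_bound = fraction_bound(example)
print(f"f_bound = {f_bound:.2f}")
```

      Whether every trajectory below the cutoff is truly chromatin bound is exactly the interpretive assumption discussed above; the code only makes the bookkeeping explicit.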

      • Could the authors clarify if the identified lead compound effects are novel at any level?

      Most of the compounds we characterize in the manuscript have not previously been tested in an SMT assay, but many are known to functionally impact the ER or other SHRs based on other biochemical and functional assays. We have not described here any completely novel ER-interacting compounds, but to our knowledge this is the first systematic investigation of a protein showing that both direct and indirect perturbation can be inferred by observing the protein’s motion. Especially for the HSP90 inhibitors, the observation that inhibiting this complex would so dramatically increase ER chromatin-binding as opposed to increasing the speed of the free population is counterintuitive and novel.

      • More video example cases on the final lead compounds identified would be a good addition to the current data package.

      Reviewer #1 (Recommendations For The Authors):

      General:

      • More information on the microscope setup and analysis procedures should be given. Since custom code is used for automated image registration, spot detection, tracking, and analysis of dynamics, this code should be made publicly available.

      Results:

      • line 97: more details about the robotic system and automatic imaging, imaging modalities, and data analysis procedures should be given directly in the text.

      Additional information added to text and methods

      • line 100: we generated three U2OS cell lines --> how?

      Additional information added to text and methods

      • line 101: ectopically expressing HaloTag fused proteins --> how much overexpression did cells show?

      The L30 promoter tends to produce fairly low expression levels. The same approach was used for all ectopic expression plasmids, and for the SHRs the expression levels were all comparable to endogenous levels. We have not checked this for H2B, CaaX and free Halo, but given that the dye concentration needed to achieve similar spot densities is within a 10-fold range for all constructs, it’s reasonable to say that those clonal cell lines will also have modest HaloTag expression.

      • line 107: Single-molecule trajectories measured in these cell lines yielded the expected diffusion coefficients --> how was data analysis performed?

      Additional information added to text and methods

      • line 109: how was the localization error determined?

      Additional information added to text and methods

      • line 155: define occupation-weighted average diffusion coefficient.

      Additional information added to text and methods

      • line 157: with 34% bound in basal conditions and 87% bound after estradiol treatment  contradicts figure 2b, where the bound fraction is up to 50% after estradiol treatment.

      Line 157 is the absolute fraction bound, figure 2b is change in fbound

      • line 205: Figure 2c is missing.

      Fixed

      • line 215: within minutes --> how was this data set obtained? which time bins were taken?

      Additional information added to text and methods

      • line 216: with no evidence of transient diffusive states  What is meant by transient diffusive state? It seems all time points have a diffusive component, which decreases over time.

      Additional information added to text and methods

      The diffusive peak decreases, the bound peak increases but no other peaks emerge during that time (e.g. neither super fast nor super slow)

      • line 225: it seems that fbound of GDC-0810 and GDC-0927 are rather similar in FRAP experiments, please comment, how was FRAP done?

      FRAP is in the methods section. The curves and recovery times are quite distinct, is the reviewer looking at

      • line 285: reproducibly: how often was this repeated?

      Information added to the manuscript

      • line 285: it would be necessary to name all of the compounds that were tested, e.g. with an ID number in the graph and a table. This also refers to extended data 7 and 8.

      Additional supplemental file with the list of bioactive compounds tested will be included.

      • line 290/1: what is meant by vendor-provided annotation was poorly defined?

      Additional information added to text and methods. Specifically, the “other” category is the most common category, and it includes both compounds with unknown targets/functions and compounds where the target and pathway are reasonably well documented. Hence, we applied our own analysis to better understand the list of active compounds.

      Figures:

      • fig. 2-6: detailed statistics are missing (number of measured cells, repetitions, etc.).

      We have added clarifying information, including an “experiment design and sample size” section in the Methods.

      • fig. 3: the authors need to give a list with details about the 5067 compounds tested.

      Additional supplemental file with the list of bioactive compounds tested will be included.

      • extended data 1c: time axis does not correspond to the 1.5s of imaging in the text, results line 127.

      Axes fixed

      • extended data 3: panel c and d are mislabeled.

      Panel labels fixed

      Methods:

      • line 746: HILO microscope: the authors need to explain how they can get such large fields of view using HILO

      Additional details added to the materials and methods. The combination of the power of the lasers, the size of the incident beam out of the fiber optic coupling device and the sCMOS camera are the biggest components that enable detection over a larger field of view.

      • line 761: it is common practice to publish the analysis code. Since the authors wrote their own code, they should publish it

      Our software contains proprietary information that we cannot yet release publicly. Comparable results can be achieved with HILO data using publicly-available tools like utrack. State Arrays code is distributed and the parameters used are listed in the M&M.

      Reviewer #2 (Recommendations For The Authors):

      The writing and presentation are coherent, concise, and easy to follow.

      The authors should consider justifying the following:

      Why is 1.5s imaging time selected? Topological and ligand variations may last significantly longer than this. The authors should present at least for one condition the same effect images for longer.

      Related to the similar comment above, we added a figure examining the jump length distribution as a function of frame. Over the 6 seconds of data collection the jump length distribution is unchanged, suggesting it is reasonable to consider all the trajectories within an FOV as representative of the same underlying dynamical states.

      The authors miss the k test or T test in their graphs.

      We chose to apply the Kruskal-Wallis test in the context of the bioactive screen to assess whether a grouping of compounds based on their presumed cellular target was significantly different from the control even when individual compounds might not by themselves rise to significance. In this case many of the pathway inhibitors are subtle and not necessarily obvious in their difference. In the other cases throughout the manuscript, whether two conditions are statistically distinguishable is rarely in question and of far less importance to the conclusions in the manuscript than the magnitude of the difference. We’ve added statistical tests where appropriate.
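
      For readers unfamiliar with the test, the following is a minimal sketch of the Kruskal-Wallis H statistic with the standard tie correction, written with only the Python standard library. The group names and values are invented for illustration and are not data from the screen. In practice the p-value would be obtained by comparing H to a chi-square distribution with (number of groups - 1) degrees of freedom, for example via a library routine such as `scipy.stats.kruskal`.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Average rank for each distinct value (tied values share the mean rank).
    rank_of = {}
    position = 0
    while position < n_total:
        end = position
        while end + 1 < n_total and pooled[end + 1] == pooled[position]:
            end += 1
        rank_of[pooled[position]] = (position + 1 + end + 1) / 2.0
        position = end + 1

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * rank_term - 3.0 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - ties / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


# Hypothetical change-in-f_bound values for a control group and one
# target-based compound grouping (made-up numbers).
control = [0.01, -0.02, 0.00, 0.03]
pathway = [0.06, 0.08, 0.05, 0.09]
h = kruskal_wallis_h(control, pathway)
print(f"H = {h:.3f}")
```

      Because the statistic depends only on ranks, a group of individually subtle compounds can still separate from the control as a set, which is the use case described above.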

      The overall integrated area of Fig 4a appears to reduce upon ligand addition. Data appear normalized but the authors should also add N (number of molecules) on top of the graphs.

      While the integrated area may appear to decrease, all State Array analysis is performed by first randomly sampling 10,000 trajectories from the assay well and inferring state distribution on those 10,000. This has been clarified in the figure legend and in the Methods.
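
      The fixed-size sampling step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, trajectories are stood in for by integers, and the handling of wells with fewer than 10,000 trajectories (returning all of them) is a guess rather than something stated in the response.

```python
import random


def subsample_trajectories(trajectories, n_sample=10_000, seed=None):
    """Randomly draw up to n_sample trajectories without replacement."""
    rng = random.Random(seed)
    if len(trajectories) <= n_sample:
        # Assumption: wells with too few trajectories are used in full.
        return list(trajectories)
    return rng.sample(trajectories, n_sample)


# Placeholder "trajectories" for two wells with different yields.
dense_well = list(range(25_000))
sparse_well = list(range(12_000))

sampled_dense = subsample_trajectories(dense_well, seed=0)
sampled_sparse = subsample_trajectories(sparse_well, seed=0)
print(len(sampled_dense), len(sampled_sparse))
```

      With both wells reduced to the same number of trajectories before state inference, differences between the resulting distributions reflect changes in occupancy rather than differences in how many molecules were detected.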

      Minor

      Extended Figure 3 legend c, d appear swapped and incorrectly named in the text.

      Panel labels fixed

      L 197 but this appears not to BE a general feature of SHRs (maybe missing Be).

      Error fixed

      L205 authors refer to Figure 2c, which does not exist.

      Panel reference fixed

      Reviewer #3 (Recommendations For The Authors):

      Among minor issues:

      In Figure 1B, if the authors could specify how they discriminate the specific cell lines from the mixed context, it would enhance clarity. Could they perform additional immunofluorescence to understand how the assignment is determined? Alternatively, could they also show the case with isolated cell lines in an unmixed context?

      Immunofluorescence would be a challenge given that there is not a good epitope to distinguish the three ectopically-expressed genes from each other or from endogenous proteins in the case of H2B and CaaX. We are really reliant on the single cell dynamics to determine the likely cell identity. That said, we’ve added graphs of a number of individual cell State Arrays from the same data graphed in 1A which support the notion that it’s reasonable to assume a cell’s identity given the observed dynamics.

      In Extended Figure 2F: possibly a ChIP-seq experiment would be more directly qualified to state the effect of ER ligand on ER ability to bind chromatin.

      This is true. Presumably ER that is competent at activating transcription of ER-responsive genes is also capable of binding DNA. ChIP would be the more direct measure, but would not address whether the protein was functional. We chose to balance measuring these two aspects of ER biology by pairing dynamics with the end-point transcription readout.

      In Figure 3: A representation with plate-by-plate orientation along the x-axis, with controls included in each plate, would be more appropriate to reflect the consistency of the controls used in the assay across different plates. Currently, all controls are pooled in one location, and we cannot appreciate how the controls vary from plate to plate.

      Figure added to the supplement

      Also in this figure, a general workflow of the screen down to segmentation/analysis would be a great add-on.

      New figure added to the supplement and reflected in the textual description of the platform

      In Extended Figures 3B and C an add-on of the positive and negative control would make the figure more convincing.

      Addressed as part of figure added to the supplement

      Is there any description of compound leads identified that is novel in nature in relation to impact on ER, and if so could it be stated more clearly in the text as novel finding?

      To our knowledge, the impact of HSP inhibition in increasing ER-chromatin association has never been described, nor has the link between ER dynamics and inhibition of post-translational modifying enzymes like the CDKs or mTOR. We added clarifying text to the manuscript.

    1. Author response:

      We thank the reviewers for their positive assessments and constructive feedback.

      In light of their comments, we will aim to improve the explanation of the methods and interpretation of results, as well as their relation to well-established literature in this research area.

      The major contributions of our work are threefold:

      • First, we introduce a novel way of analyzing codas that specifically targets subcoda structures by considering inter-click intervals within codas in terms of transition probabilities. By describing codas’ click patterns via Variable Length Markov Chains, we do not need to consider codas in their entirety, but we can detect coda subunits. This enables a new dimension for quantitatively comparing differences among various individuals, social units, and clans, which we term ‘vocal style’.

      • Using this approach, we reinforce findings from past research, including the idea that identity codas function as symbolic markers of vocal clan identity (Hersh et al., 2022; Sharma et al., 2024). More importantly, we offer new insights into the function of non-identity codas, which comprise the majority of coda types produced by sperm whales but have been largely uncharacterized. 

      • Our work reveals that non-identity coda vocal styles are more similar for spatially overlapped clans, and suggests that this similarity in style may be maintained by social learning across clan boundaries. This opens up a paradigm shift in our understanding of between-clan acoustic interactions.
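
      The idea of describing click patterns through variable-length contexts can be sketched in a few lines. The code below is a deliberately simplified stand-in, not the authors' method: it discretizes inter-click intervals into bins, counts next-symbol occurrences for contexts up to a maximum depth, and predicts from the longest context seen often enough. A real Variable Length Markov Chain prunes its context tree with a statistical criterion; the minimum-count rule, the bin width, the function names, and the example intervals are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def discretize(icis, bin_width=0.1):
    """Map inter-click intervals (seconds) to integer bin labels."""
    return [int(ici / bin_width) for ici in icis]


def fit_context_counts(sequences, max_depth=3):
    """Count next-symbol occurrences for every context of length 0..max_depth."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i, symbol in enumerate(seq):
            for depth in range(0, min(max_depth, i) + 1):
                context = tuple(seq[i - depth:i])
                counts[context][symbol] += 1
    return counts


def next_symbol_probs(counts, history, max_depth=3, min_count=2):
    """Predict from the longest suffix of history seen at least min_count times."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        context = tuple(history[len(history) - depth:])
        total = sum(counts[context].values()) if context in counts else 0
        if total >= min_count:
            return {s: c / total for s, c in counts[context].items()}
    return {}


# Made-up inter-click intervals (seconds) for three short codas.
codas = [
    [0.21, 0.22, 0.23, 0.41],
    [0.21, 0.22, 0.23, 0.42],
    [0.21, 0.41],
]
sequences = [discretize(coda) for coda in codas]
counts = fit_context_counts(sequences)
print(next_symbol_probs(counts, [2, 2, 2]))
print(next_symbol_probs(counts, [2]))
```

      Because prediction falls back to shorter contexts when a long one is rare, recurring sub-patterns can be captured without requiring whole codas to match, which is the property the first contribution above relies on.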

      From a broader perspective, our work builds on two well-established research areas: the form and function of sperm whale codas, and statistical generative models, specifically Variable Length Markov Chains on finite data spaces. Our methods, results, and interpretations are grounded in theories and concepts from these fields.

      For clarity, we will ensure that our terminology aligns with field standards and existing research. We will clearly introduce each key theory or concept at first mention and justify its relevance. In particular, we will clarify the definition and meaning of the distance between subcoda trees for a general audience. We agree with the reviewers’ comments on the broader implications and will refine our work accordingly.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      Both reviewers positively received the manuscript, in general. The agreement was that the manuscript presented valuable findings, using solid techniques and approaches, that shed additional light into how the canine distemper virus hemagglutinin might engage cellular receptors and how that engagement impacts host tropism. While both reviewers appreciated the X-ray crystallographic data, they also felt that the AFM experiments could have been performed at a higher standard and that the interpretation of the results ensuing from those AFM experiments could have been explained more thoroughly and in simpler terms. An additional missed opportunity of the current manuscript is the lack of comparison of the crystal structure to that of the already published cryo-EM structure, for context.

      Thank you very much for the constructive comments of the editor and reviewers. Following your comments, we have rewritten the text related to the AFM experiments in simpler terms as follows.

      “When CDV-H was loaded onto a mica substrate and scanned with a cantilever to acquire images of attached molecules, the CDV-H dimer was observed as two globules clustered together in most cases, but sometimes, each domain moved independently (Fig. 7B and Supplementary Movie). Time-course analysis of the dynamics of the representative CDV-H dimer showed that CDV-H could adopt both associated and dissociated forms (Fig. 7C). The distances between the domains were calculated by measuring those between the centers of mass of each domain. Finally, the distribution of distances between each head domain in the CDV-H dimers showed approximately 15 nm as a major peak (Fig. 7D). This is a reasonable length for the linker between the head domain dimers.” in Page 11, Lines 8-17.

      With regards to the structural comparison between cryo-EM structure published in Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120 and our crystal structure, we have compared these structures for Cα on page 6 and added the following text. “A recent cryo-EM structure of the wild-type CDV-H ectodomain revealed that the head dimer is located on one side of the stalk region in solution (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120)” in Page 14, Lines 22-24.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Fukuhara, Maenaka, and colleagues report a crystal structure of the canine distemper virus (CDV) attachment hemagglutinin protein globular domain. The structure shows a dimeric organization of the viral protein and describes the detailed amino-acid side chain interactions between the two protomers. The authors also use their best judgement to comment on predicted sites for the two cellular receptors - Nectin-4 and SLAM - and thus speculate on the CDV host tropism. A complementary AFM study suggests a breathing movement at the hemagglutinin dimer interface.

      Strengths:

      The study of CDV and related Paramyxoviruses is significant for human/animal health and is very timely. The crystallographic data seem to be of good quality.

      Thank you very much for the constructive comment of the reviewer.

      Weaknesses:

      While the recent CDV hemagglutinin cryo-EM structure is mentioned, it is not compared to the present crystal structure, and thus the context of the present study is poorly justified. Additionally, the results of the AFM experiment are not unexpected. Indeed, other paramyxoviral RBP/G proteins also show movement at the protomer interface.

      Thank you very much for the constructive comments of the reviewer. When we submitted our manuscript to eLife, the cryo-EM structure (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120) had been published only a week earlier and was not yet available to us. Following the comment of the reviewer, we have added text on the structural comparison between the cryo-EM structure and our crystal structure. We have also changed the text related to the AFM experiments to tone down the movement of the protomer interface as follows.

      “This observation raises the possibility that each head domain of CDV-H also dissociates and moves flexibly, as shown in the structure of Nipah virus (NiV)-G protein, previously (Science (2022) 375, 1373–1378).” in Page 11, Lines 4-6.

      Reviewer #2 (Public Review):

      Summary:

      The authors solved the crystal structure of the CDV H-protein head domain at 3.2 Å resolution to better understand the detailed mechanism of membrane fusion triggering. The structure clearly showed that the orientation of the H monomers in the homodimer was similar to that of measles virus H and different from other paramyxoviruses. The authors used the available co-crystal structures of the closely related measles virus H with the SLAM and Nectin-4 receptors to map the receptor binding site on CDV H. The authors also confirmed which N-linked sites were glycosylated in the CDV H protein and showed that both wildtype and vaccine strains of CDV H have the same glycosylation pattern. The authors documented that the glycans cover a vast majority of the H surface while leaving the receptor binding site exposed, which may in part explain the long-term success of measles virus and CDV vaccines. Finally, the authors used HS-AFM to visualize the real-time dynamic characteristics of CDV-H under physiological conditions. This analysis indicated that homodimers may dissociate into monomers, which has implications for the model of fusion triggering.

      The structural data and analysis were thorough and well-presented. However, the HS-AFM data, while very exciting, was not presented in a manner that could be easily grasped by readers of this manuscript. I have some suggestions for improvement.

      (1) The authors claim their structure is very similar to the recently published cryo-EM structure of CDV H. Can the authors provide us with a quantitative assessment of this statement?

      Thank you very much for the constructive comments of the reviewer. When we submitted our manuscript to eLife, the cryo-EM structure (Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120) had been published only a week earlier and was not yet available to us. Following the comment of the reviewer, we have added text on the structural comparison between the cryo-EM structure and our crystal structure. We have also changed the text related to the AFM experiments to tone down the movement of the protomer interface as follows.

      “This observation raises the possibility that each head domain of CDV-H also dissociates and moves flexibly, as shown in the structure of Nipah virus (NiV)-G protein, previously (Science (2022) 375, 1373–1378).” in Page 11, Lines 4-6.

      (2) The results for the HS-AFM are difficult to follow and it is not clear how the authors came to their conclusions. Can the authors better explain this data and justify their conclusions based on it?

      Thank you very much for the reviewer's constructive comments. Following your comments, we have rewritten the text related to the AFM experiments in simpler terms, as follows.

      “When CDV-H was loaded onto a mica substrate and scanned with a cantilever to acquire images of attached molecules, the CDV-H dimer was observed as two globules clustered together in most cases, but sometimes, each domain moved independently (Fig. 7B and Supplementary Movie). Time-course analysis of the dynamics of the representative CDV-H dimer showed that CDV-H could adopt both associated and dissociated forms (Fig. 7C). The distances between the domains were calculated by measuring those between the centers of mass of each domain. Finally, the distribution of distances between each head domain in the CDV-H dimers showed approximately 15 nm as a major peak (Fig. 7D). This is a reasonable length for the linker between the head domain dimers.” in Page 11, Lines 8-17.
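
      The distance measurement described in the quoted passage can be illustrated with a minimal sketch. It assumes each head domain has already been segmented into a boolean mask over one HS-AFM height frame and that the pixel size is known; the function names, the toy frame, and the 5 nm pixel size are hypothetical and are not taken from the authors' analysis.

```python
import math


def center_of_mass(frame, mask):
    """Height-weighted centroid (row, col) of the pixels selected by mask."""
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    for r, (heights, flags) in enumerate(zip(frame, mask)):
        for c, (height, keep) in enumerate(zip(heights, flags)):
            if keep:
                total += height
                row_sum += r * height
                col_sum += c * height
    if total <= 0:
        raise ValueError("selected region has no positive height signal")
    return row_sum / total, col_sum / total


def domain_distance_nm(frame, mask_a, mask_b, nm_per_pixel):
    """Distance in nm between the centroids of two masked domains."""
    row_a, col_a = center_of_mass(frame, mask_a)
    row_b, col_b = center_of_mass(frame, mask_b)
    return math.hypot(row_a - row_b, col_a - col_b) * nm_per_pixel


# Toy frame (hypothetical): two single-pixel "domains" three pixels apart.
frame = [[0.0] * 6 for _ in range(4)]
frame[2][1] = 1.0
frame[2][4] = 1.0
mask_a = [[col < 3 for col in range(6)] for _ in range(4)]
mask_b = [[col >= 3 for col in range(6)] for _ in range(4)]

distance = domain_distance_nm(frame, mask_a, mask_b, nm_per_pixel=5.0)
```

      Repeating this calculation for every frame of a movie would give the kind of time course and distance distribution summarized in Fig. 7C and 7D.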

      (3) The fusion triggering model in Figure 8 is ambiguous as to when H-F interactions are occurring and when they may be disrupted. The authors should clarify this point in their model.

      Thank you very much for the reviewer's constructive comments. Following your comments, we have revised Figure 8 and its legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) AFM experiments with SLAM or Nectin-4 immobilized on the cantilever would be much more informative.

      Thank you very much for the constructive comment of the reviewer. We will try this experiment in the next paper.

      (2) The authors should compare their crystal structure to that of the reported cryo-EM structure.

      We have added text comparing the cryo-EM structure published in Proc. Natl. Acad. Sci. U. S. A. (2023) 120, e2208866120 with our crystal structure.

      (3) Figure 1D - why does the beta2 MG negative control have such a high SPR signal?

      Thank you very much for the constructive comment of the reviewer. The immobilization levels for β2-microglobulin (beta2 MG), CDV-OP-H and CDV-5VD-H were similar: 1204.7 RU, 1235.7 RU, and 1504.5 RU, respectively. We applied relatively high concentrations (5 mM) of dNectin4 and hNectin4 onto the chip to determine low-affinity dissociation constants. As a result, the signals for beta2 MG (negative control) were high. Similarly high signals for beta2 MG were often observed in other SPR experiments with cell surface receptors in our previous paper, Kuroki et al., J. Immunol. 2019 Dec 15;203(12):3386-3394. doi: 10.4049/jimmunol.1900562. Therefore, we think that these SPR signals are not unusual.

      (4) Figure 1C - please indicate the Ve volume for the peak and add in Ve for standard.

      Thank you very much for the constructive comment of the reviewer. We have indicated the Ve volume for the peak and added in Ve for standard in Figure 1C.

      (5) The authors mention that one of the chains in the asymmetric unit was better resolved than the other. Please show regions of the atomic model fit to the electron density to convince the reader of the quality of your data.

      Thank you very much for the constructive comment of the reviewer. We have added new Supplementary figure 2 for comparison of electron density maps of chains A and B.

      (6) Table 2 indicates that the difference between Rw and Rf values is larger than 5% which indicates slight overfitting during refinement. Please provide details of your refinement strategy and attempt simulated annealing as a strategy to reduce this delta.

      Thank you very much for the constructive comment of the reviewer. We further introduced TLS and NCS parameters for the refinement. Consequently, the R/Rfree factors became 0.2645/0.3092. Simulated annealing had already been carried out. All the refinement statistics in Table 2 have been updated.
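
      As a quick check of the updated statistics against the reviewer's 5% criterion, the gap between Rfree and R can be computed directly from the two values quoted above. This is only illustrative arithmetic; the helper name is hypothetical.

```python
def r_factor_gap(r_work, r_free):
    """Difference between the free and working R-factors (as fractions)."""
    return r_free - r_work


# Values quoted in the response above.
gap = r_factor_gap(r_work=0.2645, r_free=0.3092)
gap_percent = round(gap * 100, 2)

# Reviewer's criterion: a gap above 5% hints at slight overfitting.
within_threshold = gap < 0.05
```

      With the updated values the gap is about 4.5 percentage points, which is below the 5% threshold raised by the reviewer.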

      Reviewer #2 (Recommendations For The Authors):

      (1) The authors' fusion triggering model was difficult to follow. For example, this sentence was difficult to understand: "The other possible models may include the monomer-dimer-tetramer transition facilitated by receptor binding for the fusion."

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have removed the above sentence and have added the detailed mechanism of the proposed model to the Discussion. Furthermore, we have revised Figure 8 and its legend so that readers can understand the model more clearly.

      (2) Figure 5A is not called out in the main text.

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have added the text as follows.

      “the crystal structure of MeV-H in complex with hNectin-4 showed that the H-SLAM interaction consists of three main sites (Fig. 5A) (Nat. Struct. Mol. Biol. (2013) 20, 67–72).” in Page 11, Lines 4-6.

      (3) Page 9, Line 4: interspaces? Perhaps interphases.

      Thank you very much for the constructive comment of the reviewer. We have changed the term “interspaces” to “internal spaces”.

      (4) Page 12, penultimate line: The authors mention "epitopes for anti-MeV-H Abs." Do they mean anti-CDV-H Abs?

      Thank you very much for the constructive comment of the reviewer. Following your comments, we have changed the “anti-MeV-H Abs” to “anti-morbillivirus H neutralizing antibodies”.

      (5) The paper will benefit from an English language editor to help clarify what the authors are trying to convey.

      Thank you very much for the constructive comment of the reviewer.

      We have asked an English proofreading company to check the manuscript.

    1. Author response:

      We are grateful to the reviewers for their interest and enthusiasm about the work, and deeply appreciate their constructive comments and suggestions. Our responses are below.

      (1) Do mice with BCR-ABL/MSI2-HOXA9 leukemia have an increased pool of leukemic stem cells (LSC), or do they have an increased propensity to develop blast cells? Is it the number of LSCs that has increased, or is it the function of LSC to give rise to the disease that has increased? It is not clear if the detected differences in Lineage-negative cells (Figure S1D) were detected in vitro in retrovirally transduced cells or were detected in vivo in transplanted mice. If the differences were detected in vitro, could the author confirm the same findings in vivo? This will greatly enhance the understanding of in vivo disease pathogenesis and could directly link the aggressivity of the disease (shortened survival) with an increased stem cell-like population.

      We find that BCR-ABL/MSI2-HOXA9 leads to a marked increase in Lineage negative (Lin-) cells, which contain the LSC fraction. Specifically, the LSC-containing fraction represented 14.1% of the BCR-ABL driven disease and 56.7% of the BCR-ABL and MSI2-HOXA9 driven disease (p<.0001). This suggests that MSI2-HOXA9 triggers the expansion of the undifferentiated LSC-containing pool. In addition, the blast frequency was also increased, albeit to a lesser extent, with 63.8% blasts (SEM 1.1) for BCR-ABL and 83.3% (SEM 3.1) for BCR-ABL/MSI2-HOXA9 (p=.0001). This suggests that the resulting aggressive disease seen with MSI2-HOXA9 is a consequence of a large increase in undifferentiated LSC-containing cells, as well as the resulting increase in the blast count. The Lin- cells were analyzed from fully established leukemias in vivo (Fig. S1D).

      (2) The authors suggest that BCR-ABL/MSI2-HOXA9 leads to the development of blast crisis-CML. One of the main characteristics of blast crisis-CML is drug resistance. Is BCR-ABL/MSI2-HOXA9 leukemia resistant to classical CML treatment drugs?

      The sensitivity to imatinib is a very interesting question. In general, while differentiated cells in CML are sensitive to imatinib, the more undifferentiated cells (LSCs) are resistant (1, 2). Based on the fact that therapy resistance in blast crisis is largely driven by the undifferentiated fraction of leukemia cells, and given that BCR-ABL/MSI2-HOXA9 driven disease harbors a larger fraction of these undifferentiated cells, we would predict that BCR-ABL/MSI2-HOXA9 leukemia would also be more resistant to imatinib. However, this would need to be experimentally demonstrated and is an important question to address.

      (3) The authors have emphasized the heightened expression of Polrmt in delineating the mitochondrial phenotype of BCR-ABL/MSI2-HOXA9 leukemia cells. However, the regulatory mechanism governing the expression of Polrmt by MSI2-HOXA9 has not been clearly demonstrated by the authors. Unveiling this mechanism would constitute a novel finding and significantly elevate the quality of the research.

      Since Polrmt and mitochondrial genes are transcribed in the nucleus, we explored whether MSI2-HOXA9 may control mitochondrial gene expression by triggering expression of Polrmt and other key transcription factors. Consistent with this possibility, MSI2-HOXA9 was preferentially found in the nucleus relative to MSI2. In addition, there were 10 occurrences of the minimal MSI2 RRM1 consensus binding sequence UAGU within the Polrmt transcript. While this is consistent with the possibility that Polrmt expression can be post-transcriptionally modulated by MSI2-HOXA9, this needs to be experimentally validated using CLIP-seq analysis with wild type MSI2 as well as the MSI2-HOXA9 fusion protein in the context of blast crisis CML.
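
      The motif count mentioned above can be reproduced in principle with a simple scan. The sketch below is a minimal illustration, not the authors' procedure: the function name is hypothetical, the toy sequence is not the Polrmt transcript, and counting overlapping matches is an assumption (UAGU can overlap itself, as in UAGUAGU).

```python
def count_motif(sequence, motif="UAGU"):
    """Count occurrences of motif in an RNA sequence, including overlaps.

    DNA-style input is accepted: T is converted to U before searching.
    """
    seq = sequence.upper().replace("T", "U")
    target = motif.upper().replace("T", "U")
    if not target:
        raise ValueError("motif must not be empty")
    count = 0
    start = seq.find(target)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are also counted.
        start = seq.find(target, start + 1)
    return count


# Toy sequence (hypothetical) with two overlapping sites and one separate site.
toy_transcript = "AUAGUAGUCCUAGU"
toy_count = count_motif(toy_transcript)
```

      A count of this kind only shows that candidate binding sites exist; as the response notes, actual binding would still need to be validated experimentally.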

      (4) Did the authors observe any survival differences between BCR-ABL/NUP98-HOXA9 and BCR-ABL/MSI2-HOXA9?

      In previous work from our lab we found that the median survival for BCR-ABL/NUP98-HOXA9 was 17 days, and for BCR-ABL/MSI2-HOXA9 was 18.5 days (p value of 0.22). This suggests that there is not a significant difference in survival times between the leukemias driven by the distinct alleles, and they may be equally aggressive.

      (1) MSI2-HOXA9 fusion is extremely rare as it has been only found in a handful of patients and it is not clear whether other MSI2 fusions function in a similar manner.

      We were very surprised and excited to see the large number of translocations in solid cancers that involve MSI2. Interestingly, MSI2 translocations occurred at both the N and the C terminus. Distinct translocations are likely to have unique roles in each disease context. For example, if MSI2’s 5′ end is part of a translocation, it may functionally contribute via its promoter to drive expression in immature cells and could thus activate oncogenic signals (e.g. controlled by the partner gene) in immature cells, which are inherently more susceptible to transformation (Eµ-myc is an example of such a translocation). If Msi2’s RRM domains are part of the fusion, they could bind and target RNAs aberrantly (such as in the wrong cell and at the wrong time) and lead to activation of downstream oncogenic mediators. To fully understand the role of each of these translocations in each specific cancer, we would need to experimentally test their impact by ectopic expression in the appropriate cell of origin and domain mapping the basis of any impact in the relevant cancer models, as we have done for MSI2-HOXA9 in blast crisis CML in the work we report here. While this is an intensive undertaking, it is nonetheless important future work as it will undoubtedly lead to new insight about MSI2 linked translocations in diverse solid cancers such as breast cancer and lung cancer.

      (2) The mechanism needs to be strengthened since MSI2 alone or the HOXA9 mutant may not be linked to the mitochondrial mechanism. (3) It is not clear that the mitochondrial pathway is sufficient for the MSI2-HOXA9 oncogenic mechanism.

      Our observation that MSI2-HOXA9 triggered changes in mitochondrial function was of particular interest as it was (to our knowledge) uncharted in the context of Msi2 signaling in cancer, thus leading us to explore this further. However, multiple other signals are likely downstream regulators and these may well act cooperatively with, or independently of, the heightened mitochondrial function we report here. Among these pathways, the most likely mediators include oncogenic programs related to the Wnt pathway including Wnt, Fzd 3 and Frat1, and those related to the Notch pathway including Tribbles and Hey1, as well as other stem cell genes such as Aldh1. These programs have been previously implicated in the regulation of myeloid leukemia (3-11) and could well mediate the impact of the MSI2-HOXA9 translocation. The relative contribution of mitochondrial metabolism and that of developmental and stem cell signals to the onset of MSI2-HOXA9 driven blast crisis CML is an important avenue of future work.

      References

      (1) Corbin, A. S., Agarwal, A., Loriaux, M., Cortes, J., Deininger, M. W. & Druker, B. J. 2011. Human chronic myeloid leukemia stem cells are insensitive to imatinib despite inhibition of BCR-ABL activity. J Clin Invest 121: 396-409. PMC3007128.

      (2) Graham, S. M., Jørgensen, H. G., Allan, E., Pearson, C., Alcorn, M. J., Richmond, L. & Holyoake, T. L. 2002. Primitive, quiescent, Philadelphia-positive stem cells from patients with chronic myeloid leukemia are insensitive to STI571 in vitro. Blood 99: 319-325.

      (3) Gurska, L. M., Ames, K. & Gritsman, K. 2019. Signaling Pathways in Leukemic Stem Cells. Adv Exp Med Biol 1143: 1-39. PMC7249489.

      (4) Narendra, G., Raju, B., Verma, H. & Silakari, O. 2021. Identification of potential genes associated with ALDH1A1 overexpression and cyclophosphamide resistance in chronic myelogenous leukemia using network analysis. Med Oncol 38: 123.

      (5) Ran, D., Schubert, M., Pietsch, L., Taubert, I., Wuchter, P., Eckstein, V., Bruckner, T., Zoeller, M. & Ho, A. D. 2009. Aldehyde dehydrogenase activity among primary leukemia cells is associated with stem cell features and correlates with adverse clinical outcomes. Exp Hematol 37: 1423-1434.

      (6) Reya, T., Duncan, A. W., Ailles, L., Domen, J., Scherer, D. C., Willert, K., Hintz, L., Nusse, R. & Weissman, I. L. 2003. A role for Wnt signalling in self-renewal of haematopoietic stem cells. Nature 423: 409-414.

      (7) Riether, C., Schürch, C. M., Bührer, E. D., Hinterbrandner, M., Huguenin, A. L., Hoepner, S., Zlobec, I., Pabst, T., Radpour, R. & Ochsenbein, A. F. 2017. CD70/CD27 signaling promotes blast stemness and is a viable therapeutic target in acute myeloid leukemia. J Exp Med 214: 359-380. PMC5294846.

      (8) Riether, C., Schürch, C. M., Flury, C., Hinterbrandner, M., Drück, L., Huguenin, A. L., Baerlocher, G. M., Radpour, R. & Ochsenbein, A. F. 2015. Tyrosine kinase inhibitor-induced CD70 expression mediates drug resistance in leukemia stem cells by activating Wnt signaling. Sci Transl Med 7: 298ra119.

      (9) Venton, G., Pérez-Alea, M., Baier, C., Fournet, G., Quash, G., Labiad, Y., Martin, G., Sanderson, F., Poullin, P., Suchon, P., Farnault, L., Nguyen, C., Brunet, C., Ceylan, I. & Costello, R. T. 2016. Aldehyde dehydrogenases inhibition eradicates leukemia stem cells while sparing normal progenitors. Blood Cancer J 6: e469. PMC5056970.

      (10) Yin, D. D., Fan, F. Y., Hu, X. B., Hou, L. H., Zhang, X. P., Liu, L., Liang, Y. M. & Han, H. 2009. Notch signaling inhibits the growth of the human chronic myeloid leukemia cell line K562. Leuk Res 33: 109-114.

      (11) Kang, Y. A., Pietras, E. M. & Passegué, E. 2020. Deregulated Notch and Wnt signaling activates early-stage myeloid regeneration pathways in leukemia. J Exp Med 217. PMC7062512.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Will the nanobody be available to the TB research community?

      Yes, we will make E11rv available upon request. Please see our materials availability statement.

      Reviewer #2 (Recommendations For The Authors):

      (1) It would be interesting to test the potential impact of residual ASB-14 contaminant on the biochemical behavior of ESAT-6-CFP10 heterodimer and ESAT-6 homodimer or tetramer and their hemolytic activity in comparison with the ones without ASB-14.

      We agree that this is an interesting line of questioning. Based on the study by Refai et al. that we cite in the text, ESAT-6 treated with nonionic detergents ASB-14 or LDAO, but not other common detergents, undergoes a conformational change that increases its cytotoxicity in cell assays, hemolytic activity, and ability to dimerize with CFP-10. What is not known at this point, is how similar the ASB-bound conformation is to anything seen physiologically.

      (2) Building on the progress in making anti-ESAT-6 nanobodies and their anti-Mtb effects in the cells, it could have been tested in human or mouse primary macrophages infected with Mtb and a mouse model of Mtb infection for its anti-Mtb efficiency.

      We thank the reviewer for this suggestion, and we agree that these would be very informative next steps for determining the therapeutic potential of anti-ESAT-6 nanobodies.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      Line 133: "It is well established that Mm-induced hemolysis is ESX-1 dependent, but our results suggest that Mtb must lack one or more factors necessary for efficient hemolysis.". I would tone this down a bit, as it is also known that M. tuberculosis escapes much later than M. marinum from the phagosome, which could indicate different kinetics.

      We thank the reviewer for their insightful comments. We agree that the kinetics of Mtb and Mm infection are quite different and that this may impact the hemolysis assay. As described by Augenstreich et al., some hemolysis by Mtb is observed at 48 hours, though the method of normalization makes it impossible to determine the absolute amount of hemolysis that occurred in their experiment. Our findings show only that the absolute amount of Mtb hemolysis in 2 hours is negligible, setting it apart from Mm. We have edited the wording of this statement in the manuscript to avoid any confusion.

      Line 155: "Because Mtb often exists in an acidified compartment". First of all, the reference used here does not discuss anything about Mtb, secondly, papers that do measure the acidification of Mtb-loaded phagosomes indicate that this acidification is very mild (typically to pH 6.2).

      We agree that this point should be articulated more precisely. We have added additional clarification that the pH of Mtb-containing compartments in macrophages can fall in a broad range depending on the activation state of the macrophages, and that these compartments in non-activated macrophages are typically only mildly acidic. We have updated our references to better describe the current state of knowledge on this topic.

      Line 339: "Whereas most of these functions rely only on the secretion of ESAT-6 into the cytoplasm, the ability of E11rv to access Mtb suggests that this communication is likely two-way." No, not necessary, there are many processes in which ESX-1 substrates affect the macrophage. This nanobody could affect EsxA functioning only once the bacteria reach the cytoplasm. I think checking phagosomal escape in these cells is therefore crucial.

      We agree that phagosomal escape and subsequent direct secretion of ESAT-6 into the cytoplasm is a reasonable alternative hypothesis. We have added this point to our discussion, and we agree that looking directly at phagosomal escape is an important next step.

      Figure 7 is not mentioned in the text (mistake for Fig 6).

      This has been corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study is highly interesting and the applied methods are target-oriented. The biophysical characterization of viable N-protein species and several representative N-protein mutants is supported by the data, including polarity, hydrophobicity, thermodynamic stability, CD spectra, particle size, and especially protein self-association. The physicochemical parameters for viable N-protein and related coronavirus are described for comparison in detail. However, the conclusion becomes less convincing that the interaction of peptides or motifs was judged by different biophysical results, with no more direct data about peptide interaction. Additionally, the manuscript could benefit from more results involving peptide interaction to support the author's opinions or make expression more accurate when concerning the interaction of motifs. Although the authors put a lot of effort into the study, there are still some questions to answer.

      We thank the Reviewer for this assessment and wholeheartedly agree that there are still many questions. The main thrust of the present work was not intended to unravel the detailed mechanistic origin of all observations, but rather to juxtapose the different observations made with different viable N-protein species across the mutant spectrum, in order to get a sense of how narrowly the biophysical phenotype is confined to ensure virus viability. Such a study has become possible for the first time with the unprecedented genomic database of SARS-CoV-2. This has led to observations of non-local effects of individual mutations that are not independent and non-additive relative to the effects of other mutations, and in that sense we have inferred ‘interactions’. These might be mediated by direct contacts or indirectly through altered chain configurations. In the revised manuscript we have clarified this point.

      Meanwhile, a number of documented direct physical intra-molecular and intra-dimer interactions provide a context to our study of mutation effects. The flexibility of the IDRs provides a rich variety of contacts that have been observed in molecular dynamics and single-molecule fluorescence studies (Rozycki & Boura, Biophys Chem. 2022 and Cubuk et al, Nat Commun 2021). We have previously carried out detailed hydrodynamic studies of self-association interfaces located in the leucine-rich region. More recently, NMR data just published by the Blackledge laboratory (Botova et al., bioRxiv 2024) extend the list of intra-molecular contacts with the observation of long-range intra-molecular interactions between the NTD and the CTD, NTD and the phosphorylated SR-rich region, and NTD and the previously studied leucine-rich region. The latter contacts require the C-terminal region of the linker to loop back onto the NTD, which may well introduce susceptibility to any of the linker mutations. However, detailed linker configurations are beyond the scope of the present work.

      With regard to the effects of the Omicron mutations in the N-arm IDR, we have shown hydrodynamic data directly demonstrating peptide self-association, and we are currently working on a more detailed functional follow-up study which we hope to communicate soon.

      Reviewer #2 (Public Review):

      Summary: This work focuses on the biochemical features of the SARS-CoV-2 Nucleocapsid (N) protein, which condenses the large viral RNA genome inside the virus and also plays other roles in the infected cell. The N protein of SARS-CoV-2 and other coronaviruses is known to contain two globular RNA-binding domains, the NTD and CTD, flanked by disordered regions. The central disordered linker is particularly well understood: it contains a long SR-rich region that is extensively phosphorylated in infected cells, followed by a leucine-rich helical segment that was shown previously by these authors to promote N protein oligomerization.

      In the current work, the authors analyze 5 million viral sequence variants to assess the conservation of specific amino acids and general sequence features in the major regions of the N protein. This analysis shows that disordered regions are particularly variable but that the general hydrophobic and charge character of these regions are conserved, particularly in the SR and leucine-rich regions of the central linker. The authors then construct a series of N proteins bearing the most prevalent mutations seen in the Delta and Omicron variants, and they subject these mutant proteins to a comprehensive array of biophysical analyses (temperature sensitivity, circular dichroism, oligomerization, RNA binding, and phase separation).

      Strengths:

      The results include a number of novel findings that are worthy of further exploration. Most notable are the analyses of the previously unstudied P13L mutation of the Omicron variant. The authors use ColabFold and sedimentation analysis to suggest that this mutation promotes the self-association of the disordered N-terminal region and stimulates the formation of N protein condensates. Although the affinity of this interaction is low, it seems likely that this mutation enhances viral fitness by promoting N-terminal interactions. The work also addresses the impact of another unstudied mutation, D63G, that is located on the surface of the globular NTD and has no significant effect on the properties analyzed here, raising interesting questions about how this mutation enhances viral fitness. Finally, the paper ends with studies showing that another common mutant, R203K/G204R, disrupts phase separation and might thereby alter N protein function in a way that enhances viral fitness.

      Thank you for highlighting the strengths of our paper.

      Weaknesses:

      In general, the results in the paper confirm previous ideas about the role of N protein regions. The key novelty of the paper lies in the identification of point mutations, notably P13L, that suggest previously unsuspected functions of the N-terminal disordered region in protein oligomerization. The paper would benefit from further exploration of these possibilities.

      We agree that the bioinformatic results confirm previous ideas about the role of the N protein regions. However, we believe our results go beyond the previous thinking in a crucial aspect, which is that we examine the full (so far known) mutant spectrum of N-protein. Properties previously inferred from the inspection of single consensus sequences can be misleading because of the quasispecies nature of RNA viruses. By considering the mutant spectrum we can obtain a sense for how significant differences in the physicochemical properties of the different regions are, and how much variation is possible without jeopardizing essential protein functions.

      With regard to the N-arm IDR mutations we believe this deserves a separate study focusing on the apparent N-arm function. Our rationale for presenting some initial N-arm results in the current paper was to highlight how the variability of N-protein species in the mutant spectrum can even include differences in the type and number of protein self-association interfaces.

      Reviewer #3 (Public Review):

      Nguyen, Zhao, et al. used bioinformatic analysis of mutational variants of SARS-CoV-2 Nucleocapsid (N) protein from the large genomic database of SARS-CoV-2 sequences to identify domains and regions of N where mutations are more highly represented and computationally determined the effects of these mutations on the physicochemical properties of the protein. They found that the intrinsically disordered regions (IDRs) of N protein are more highly mutated than structured regions and that these mutations can lead to higher variability in the physical properties of these domains. These computational predictions are compared to in vitro biophysical experiments to assess the effects of identified mutations on the thermodynamic stability, oligomeric state, particle formation, and liquid-liquid phase separation of a few exemplary mutants.

      The paper is well-written and easy to follow, and the conclusions drawn are supported by the evidence presented. The analyses and conclusions are interesting and will be of value to virologists, cell biologists, and biophysicists studying SARS-CoV-2 function and assembly. It would be nice if some further extrapolation or comments could be made regarding the effects of the observed mutations on the in vivo behavior and properties of the virus, but I appreciate that this is much higher-order than could be addressed with the approaches employed here.

      We thank the Reviewer for this positive assessment. With regard to the possible in vivo behavior of mutant species, we agree that this would require additional data beyond the scope of the present work.

      However, for the N:G215C mutant we can point to a very recent preprint by Kubinski et al. (bioRxiv 2024) that describes reverse genetics experiments where the isolated N:G215C mutation caused altered in vivo pathology, enhanced viral replication, and altered virion morphology. We have cited this work in the revised manuscript.

      As mentioned above, for the P13L mutation we hope to communicate a more detailed follow-up study that will allow us to extrapolate on its in vivo behavior.

      Recommendations For The Authors:

      Reviewer #1:

      (1) Given the structure organization of N-protein in Figure 1, the authors should explain why linker region 180-247 is different from linker (175-247) mentioned in the first result.

      We thank the reviewer for bringing up this point, which we agree deserves clarification. While often the NTD has been assigned a C-terminal limit of 180 (e.g., in the NMR structure by Dinesh et al, Plos Pathogens 2020), the last several residues in the NTD are already disordered and contain the S176/R177 pair and therefore may be ascribed to the beginning of the SR-rich portion of the linker. In order not to artificially truncate functional sequences of either NTD or linker, we have decided to allow the designations of the NTD and linker regions to overlap. We believe this is conservative in that possible NTD or linker properties extending into this transition region will be preserved. In order to explain this in the manuscript, we have modified Figure 1 and inserted a brief sentence “(Due to ambiguity in delineation between NTD and linker, designations overlapping in 175-180 were used to avoid artificial truncation and permit conservative evaluation of the properties of each domain.)”.

      (2) Please specify the "physicochemical requirements" in the fourth paragraph of the first result, and its physicochemical meaning and references.

      Thank you for pointing this out; we agree this was not well expressed. We have rephrased this (including new references) to “…we find that hydrophobicity is uniformly high and polarity correspondingly low in the folded NTD and CTD domains, which is consistent with the expectation that folded structures are stabilized by buried hydrophobic residues (Eisenberg and McLachlan, 1986; Kauzmann, 1959)”.

      (3) The authors should clarify the biological meaning of the net charge and phosphorylation charge in the first result, just like the description in the results of polarity and hydrophobicity.

      We agree this will improve readability, and have inserted an introductory sentence to the study of charges in the mutant spectrum: “Charges in proteins can control multiple properties related to electrostatic interactions, from functions of active sites to protein solubility, protein interactions, and conformational ensembles in IDRs (Garcia-Viloca et al., 2004; Gerstein and Chothia, 1996; Gitlin et al., 2006; Mao et al., 2010).”.

      (4) The authors should clarify the calculation method and meaning of the column "occurs in % of all genomes" in Table 2.

      We have inserted a footnote specifying that this is the “Percentage of all sequenced genomes carrying the specific mutation.”.

      (5) Please specify what information or conclusion we can get for the shift of the intrinsic fluorescent spectrum of N: D63G in the third result paragraph 2.

      We have rephrased the second sentence of this paragraph to “The presence of the N:D63G mutation in the NTD is highlighted in the shift of the intrinsic fluorescence quantum yield of this mutant in comparison to Nref ”. It confirms the structural prediction, which positions D63G at the protein surface near the NA binding site, and sets up the question whether this obligatory mutation of Delta-variant N-protein affects NA binding and thereby possibly assembly. Unexpectedly, we did not find any impact of the D63G mutation on NA binding, although we observed a modest impact on temperature-dependent particle formation by DLS.

      (6) The conclusion, "some epistatic interaction between mutation of the linker and N-arm" in the third result paragraph 4, is over-interpreted from the result of the CD spectra because they didn't detect peptide interaction between mutation of the linker and N-arm.

      Thank you for raising this point. We did not mean to make a strong conclusion here, and have now deleted this statement.

      (7) The parallel assay for N: G215C and Nδ in SV-AUC experiments is recommended to be conducted with other groups to avoid experimental error.

      I believe this may be a misunderstanding: we had in fact carried out SV-AUC experiments for all the mutants, as shown in Figure 5A. However, since all mutants except N:G215C and Nδ formed only dimers, like the reference protein, we did not comment on these in the results text. We have rectified this omission in the revision by inserting the sentence: “…The same behavior is observed for N:D63G, No, N:R203K/G204R, as well as N:P13L/Δ31-33 at low micromolar concentrations (Figure 5A). By contrast, the G215C mutation promotes the formation of higher oligomers…”

      With regard to experimental error, SV-AUC is an absolute method based on first principles and we have maintained our instruments by performing regular calibrations, using methods developed by us and colleagues at NIST, as described in the literature (Anal Biochem 2013, PLOS ONE 2018, Eur. Biophys. J. 2021). Previously we have critically examined the accuracy of s-values by SV-AUC before and after calibration in a large multi-laboratory study (PLOS ONE 2015), and found that the accuracy of s-values is ~1%. This allows detailed comparisons of results from different runs and different points in time. To alleviate any concerns we have now mentioned our calibration methods in the methods section.

      (8) The authors did not test the function of Nδ R203M mutation, so they should not mention about it like in the third result paragraph 5, which is over-interpreted from result 5A.

      We accept the criticism that we have not yet examined the R203M mutation in isolation. However, we believe some speculation is in order: Nδ consists of D63G, R203M, G215C, and D377Y, of which D63G is unlikely to impact oligomeric state based on our data of N:D63G. It is therefore reasonable to assume that R203M and/or D377Y interfere with the promotion of oligomerization that we have observed with N:G215C. In previous work, we have traced the 215C-induced oligomerization to the transient helix in the leucine-rich region of the linker 215-235 (Science Advances, 2023). Since 377Y is quite far away, the more proximal 203M appears to be the most plausible origin of the modulation of dimerization.

      In the revision we have more clearly outlined this speculation: “ Of the three additional mutations of Nδ relative to N:G215C, we speculate that D63G does not impact dimerization (as in N:D63G, Figure 5A), and that therefore either the distant D377Y and/or R203M might cause this reduction of helicity and oligomerization relative to N:G215C, noting that R203M is proximal to the L-rich region (215-235) reshaped by 215C. ”. Later we refer to this as “any potential inhibitory role suspected of the R203M mutation on self-association…”.

      (9) The description of LLPS formation lacks reference in the third result paragraph 6.

      Thank you. To improve the transition to this new paragraph in the results, we have inserted “As outlined in the introduction, …” and repeated the 8 references to the fact that N-protein undergoes LLPS. The two additional, separate references refer to just those published studies that examined the temperature-dependence of LLPS, which I believe is now clearer.

      (10) The authors did not test the interaction between the N-arm IDR mutation and linker IDR, it is not exponible that interaction promoted particle formation of No in the third result paragraph 8, which is over-interpreted from result 5B.

      We thank the Reviewer for raising this point. In fact, we did not want to imply a direct physical interaction (in terms of binding) between the N-arm IDR mutation and that in the linker. But clearly there are non-additive effects in particle formation, since P13L/Δ31-33 inhibits slightly and R203K/G204R inhibits almost completely, whereas the combination of the two (constituting No) promotes particle formation. We have rephrased this to “alter the effect of”, avoiding the term “interact with” so as not to suggest a picture of direct binding, and invoke instead the idea of epistatic interactions.

      (11) In the third result paragraph 9, why did the authors choose to examine the role of the N-arm mutations of the Omicron variants in greater detail? This reason should be added to the manuscript.

      Thank you for this suggestion. Naturally, we were curious how the defining N-arm mutations of Omicron variants could impact particle formation. Even though no obvious enhancement of self-association by either Omicron N-arm or linker mutations was observed at low micromolar concentrations in SV-AUC (Figure 5A), we knew from experience with the study of the leucine-rich transient helix in the linker IDR that even weak interfaces with mM Kd can be highly relevant in the context of multivalent assemblies (Science Advances, 2023). Therefore we followed the same roadmap and focused on IDR peptides with the goal to study them at higher concentrations that might reveal weak interactions.

      We have described this motivation as follows: “We were curious whether IDR mutations might alter particle formation through modulation of existing or introduction of new protein-protein interfaces. We focused on Omicron mutations as these are obligatory in all currently circulating strains, and specifically on N-arm mutations, which have recently been implicated in altered intramolecular interactions with NA-occupied NTD (Cubuk et al., 2023). Even though SV-AUC showed no indication of self-association of N:P13L/Δ31-33 at low micromolar concentrations, weak interactions with Kd > mM would not be detectable under these conditions yet could be highly relevant in the context of multi-valent complexes (Zhao et al., 2024). Following the roadmap used previously for the study of the weak self-association of the leucine-rich linker IDR (Zhao et al., 2023), we restricted the protein to the N-arm peptide such that it can be studied at much higher concentrations. To this end, we …”

      (12) Why were different proteins dissolved in either high-salt buffer or low-salt buffer for biophysical experiments? Did this affect the experimental results? Explanations and evidence are required.

      We appreciate this is an important point. Unfortunately, for practical reasons of available sample concentrations and quantities, it was not always possible to dialyze protein into both buffers. For example, the DSF data in Figure 4B show all proteins in low-salt buffer except N:R203K/G204R, which is in high-salt buffer. We had previously reported the absence of changes in Ti in DSF for Nref in the two buffers, which we have documented better in the revised manuscript by providing an additional Supplementary Figure S7: “As a buffer control, the difference in Ti for Nref in LS and HS buffer was measured and found to be within error of data acquisition (Supplementary Figure S7A).” This new Supplementary Figure provides an overlay of low-salt and high-salt DSF data for Nref, N:D63G, and No, which have variations in the Ti values for different buffers on the order of 0.1 °C. This is comparable to the precision of the measurement, and significantly smaller than the changes in Ti values between the different mutant protein species. Finally, we note that the one species for which we were unable to collect DSF data in low-salt buffer, N:R203K/G204R, was unremarkable relative to Nref, No, and N:P13L/Δ31-33.

      In the case of CD, the only species for which we could not collect spectra in low-salt buffer was No. Again, this spectrum was similar to the group including Nref, along with N:P13L/Δ31-33, and N:D63G. In the results we interpreted significant differences from Nref for N:G215C and N:R203K/G204R.

      Similarly, SV-AUC experiments were carried out in high-salt buffer, except Nref, Nδ , and N:G215C. In this case, we could observe a ≈ 5% difference in s-value for the same protein in different buffers, but the magnitude of this change is negligible compared to the ≈ 60-90% increase observed for altered oligomeric states. To clarify this we have inserted a sentence “Proteins for self-association studies were in buffer HS, except Nref, Nδ , and N:G215C were in LS, the latter causing a ≈5% increase in s-value (Supplementary Figure S7B).”, with the new Supplementary Figure S7B showing a comparison of sedimentation coefficient distributions of Nref and N:D63G in low- and high-salt buffers. Whether the small differences in s-values are indeed significant and reflective of salt-dependent conformational ensembles of IDRs will require a more detailed follow-up study, but is outside the scope of the present work.

      All other experiments were carried out with uniform buffer conditions for all protein species.

      (13) DLS data of N from other research suggests oligomers beyond dimer. Please address this discrepancy.

      Unfortunately several previous studies in the literature did not recognize the importance of eliminating nucleic acid contaminations in the N-protein preparations, and/or did not succeed in completely removing nucleic acid from the protein. We and others have repeatedly commented on this issue. For example, Tarczewska et al (IJBM 188 (2021) 391-403) clearly demonstrate this in much detail in a study dedicated to this problem.

      To clarify this point we have included a sentence in the paragraph describing the protein preparation: “…the ratio of absorbance at 260 nm and 280 nm of ~0.50-0.55 confirmed absence of nucleic acid. The latter is important to eliminate higher order N-protein oligomers induced by nucleic acid binding (Carlson et al., 2020; Tarczewska et al., 2021; Zhao et al., 2021)”.

      In order to strengthen the statement in the Results that the ancestral N-protein is dimeric we have added additional references from other labs that have carried out detailed biophysical analyses: “As reported previously, the ancestral N-protein at micromolar concentrations in NA-free form is a tightly linked dimer sedimenting at ≈4 S , without significant populations of higher oligomers (Forsythe et al., 2021; Ribeiro-Filho et al., 2022; Tarczewska et al., 2021; Zhao et al., 2022, 2021).”

      Reviewer #2:

      The key novel finding of the work lies in the evidence that P31L promotes N-terminal interactions. The paper would be strengthened by additional studies of the impact of P31L on the oligomerization of full-length N protein. The sedimentation analysis in Fig 6 shows that high concentrations of the N arm alone self-associate, while the analysis in Fig 5 argues that P31L does not have an effect on the oligomerization of the full-length protein. Perhaps there are specific conditions or mutation combinations that would provide evidence that P31L has an effect on protein behavior that might explain the prevalence of this mutation.

      We agree that the finding of P13L promoting N-terminal interactions is of great interest, and we thank the Reviewer for the suggestion to examine cross-correlations of N-arm mutations with other mutations as a tool to study its function and relevance.

      The observation of self-association in Figure 6 at high concentrations is not necessarily at odds with the absence of self-association at 100-fold lower concentrations. Rather, it seems to show that the interaction mediated by the N-terminal mutation P13L is weak, with an effective Kd in the mM range. It will likely not be possible to reach sufficiently high protein concentrations with the full-length protein to visualize the oligomerization of the N-terminal IDR. But even if it were possible to concentrate the protein enough, very likely other assembly processes would take place, including LLPS, obscuring potential P13L interfaces. Nonetheless we believe the protein-protein interface created by the N-arm IDR is highly relevant in the context of multi-valent complexes, where entropic co-localization enhances the effective N-arm IDR concentration that then can provide additional binding energy and strengthen the assembly of multi-protein complexes.
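      The concentration argument above can be illustrated with a short calculation. This is a minimal sketch, assuming for illustration only a simple monomer-dimer equilibrium and a hypothetical Kd of 1 mM; the actual stoichiometry and affinity of the N-arm self-association are not specified here, and the function name `dimer_fraction` is introduced purely for this example.

```python
import math

def dimer_fraction(total_conc_molar, kd_molar):
    """Fraction of protomers found in dimers for 2 M <-> D, with Kd = [M]^2 / [D].

    Assumes an ideal monomer-dimer equilibrium (an illustrative simplification).
    """
    # Mass balance: c_total = [M] + 2 [D] = [M] + 2 [M]^2 / Kd.
    # Rearranged: 2 [M]^2 + Kd [M] - Kd c_total = 0; take the positive root.
    monomer = (-kd_molar
               + math.sqrt(kd_molar ** 2 + 8 * kd_molar * total_conc_molar)) / 4
    return 1 - monomer / total_conc_molar

kd = 1e-3  # hypothetical Kd of 1 mM
for conc in (10e-6, 100e-6, 1e-3):
    print(f"{conc * 1e6:7.0f} uM total: {100 * dimer_fraction(conc, kd):5.1f}% in dimers")
```

      Under these assumptions, about half of the protomers are dimeric at 1 mM total concentration, whereas only roughly 2% are at 10 µM, which is why a weak interface can escape detection at low micromolar concentrations.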

      We are currently pursuing further experiments examining the properties and relevance of the N-arm mutations and intend to publish this in a separate study, so as not to distract from the thrust of the current work exploring the extent of the biophysical phenotype space.

      The R203K/G204R mutations have a surprising impact on LLPS in Figure 7: it is not clear how such limited mutations would alter the many nonspecific, multivalent interactions that presumably lead to phase separation. The paper would benefit from a more extensive analysis of LLPS in this mutant and in the P31L mutant, perhaps by performing the analysis at various protein concentrations and times.

      Following this recommendation we have expanded the study of LLPS of Figure 7 by comparison of two different time points for Nref, N:R203K/G204R, and N:P13L in a new Supplementary Figure S6. We have also quantified the droplet distributions as shown in the new Supplementary Figure S5. Both clearly confirm the strong inhibitory effect of the R203K/G204R mutation on LLPS under our experimental conditions. What this shows is not that this protein could not undergo LLPS per se, but that the phase boundaries have shifted such that under the experimental conditions we applied LLPS does not occur yet. (In this context it is interesting to note that ≈50,000 genomes in the GISAID database have R203K/G204R as the sole N-protein mutation, without impact on viral viability.)

      That individual point-mutations in IDRs can have significant impact on LLPS has been observed previously for several other proteins. Examples include SPOP [Bouchard et al., Mol Cell 72 (2018) 19-36.e8], SHP2 [Zhu et al., Cell 183 (2020) 490-502.e18], FUS [Niaki et al., Mol Cell 77 (2020) 82-94.e4], and CAPRIN1 [Kim et al., PNAS 118 (2021) 1-11]. The latter work applies NMR and reveals that promotion of LLPS is not uniform but centered in hot-spot residues of CAPRIN1.

      While the precise molecular mechanism for LLPS of the N-protein is unclear, we can speculate how the effect of 203K/204R might be amplified. As shown by the coarse-grained MD simulations from Rozycki & Boura (Biophys. Chem. 2022), the linker IDR is highly flexible and the 203/204 residues make transient contacts to other residues throughout the linker as well as to distinct sites on the NTD. Furthermore, recent NMR data from the Blackledge lab (Botova et al., bioRxiv 2024, doi:10.1101/2024.02.22.579423) have revealed intra-molecular interactions, including a state where the L-rich (C-terminal) portion of the linker IDR interacts with a site on the distant NTD. (We have included a reference to this preprint in the discussion.) This intra-molecular contact observed in NMR must cause significant chain compaction and may thereby modulate the accessibility of portions of the linker IDR available to inter-molecular interactions contributing to LLPS. The residues 203/204 are in the middle between the SR-rich and L-rich region where bending of the chain must occur to allow for the intra-molecular contacts. The 203K/204R mutation may alter the dynamics or population of this intra-molecular bound state, especially considering the introduction of a bulky positively charged R replacing G204.

      In summary, considering the dynamics of intra-molecular contacts and considering precedent of several other disordered proteins, we believe it is not unreasonable that the local mutation in the IDR R203K/G204R may cause a significant shift in LLPS phase boundaries. We note that this mutant also shows a very distinct behavior in the temperature-dependent DLS, entirely lacking particle formation below 70 °C. This observation seems consistent with altered inter-molecular interactions.

      Reviewer #3:

      I have only a few minor specific comments:

      (1) Page 4, last paragraph - typo: "The large number of structural and non-structural N-protein functions poses the question of how they are conserved...". This either needs a colon or to be changed to "... poses the question of how they are conserved...".

      Thank you – we have changed this sentence accordingly.

      (2) Page 7, 2nd and 3rd paragraphs of "Physicochemical properties" section: why is Figure2B discussed before Figure 2A?

      Initially when we present the results of polarity and hydrophobicity we refer more generally to Figure 2, as the two properties are so closely related. Later, in the section on related coronaviruses we do refer once more to Figure 2. Here we begin this section by discussing Figure 2B since in this plot the symbols for the different viruses are most recognizable.

      (3) Page 11, lines 1-2: "Since this is a tell-tale of weak protein..." -> "tell-tale sign of ...".

      We thank the reviewer for pointing this out and have fixed this sentence.

      (4) Further down in the same paragraph, the meaning of "SV-AUC" should be spelled out at its first use.

      We have double checked that SV-AUC is spelled out at its first use.

      (5) Figures 1 and 2. Is there a good reason that the color scheme for the IDRs (magenta and cyan) is so close to the color scheme for the identifying mutations of Omicron and Delta (magenta and blue)? This initially led me to try to search for some connection, and it remains unclear to me if there is.

      We apologize for this confusion. This was indeed a poor color choice, and we have rectified this in the revised manuscript by changing the colors of the identifying mutations of Omicron and Delta to dashed green and dotted red, respectively, so that there is no connection to the shading of the IDRs. Thank you very much for pointing this out!

      (6) Figure 1: The physical limits of the subdomains, e.g. SR-rich, L-rich, C-arm1, and N3 could be more clearly delineated with lines, or some other visual representation.

      Once more, we thank the reviewer for pointing this out. We have revised Figure 1 to indicate the limits between these subdomains.

      (7) Figures 4, 5, and 6: are there any kind of error bars or confidence intervals on these measurements?

      We appreciate this concern and have addressed it in different ways for the different methods.

      For the spectra of intrinsic fluorescence in Figure 4A, we have now plotted an overlay of three acquired spectra, from which the experimental error as a function of wavelength may be assessed. It is clear that the differences between Nref and N:D63G are far greater than the measurement error.

      With regard to DSF, we have provided an error estimate of 0.3 °C for the Ti-values, a value that we have revised from the previously reported errors of sequential replicates to now include Ti variation observed with different preparations of the same protein over long time periods.

      For CD spectra we have included a new Supplementary Figure S3 that shows standard deviations of triplicate measurements as a function of wavelength. Since an overlay including errors for all species would be too crowded, we have created separate plots for all species in comparison with Nref. (On this occasion we discovered a 3% error in the magnitude of the Nref spectrum due to previously incorrect conversion to MRE, which we have now fixed.)

      In SV-AUC, for data with typical signal-to-noise ratio, the statistical error is very small due to the large number (> 10^4) of raw data points included in the calculation of each c(s) trace, with each data point carrying a statistical error that is usually better than 1%. Therefore, the dominant error is systematic. In the past we have carried out large studies quantifying the accuracy of the major peaks of the sedimentation coefficient distributions, and found they are typically ≈1% in s-value and 1-2% for relative peak areas. In the AUC methods section we have now included the sentence “Typical accuracy of c(s) peaks are on the order of ≈1% for peak s-values and ≈1-2% for relative peak areas (Zhao et al., 2015).”

      Finally, for the temperature-dependent DLS data we have to resort to the scatter in the temperature-dependent Rh-values. The calculated Rh-values can exhibit fluctuations once particles start to form and the distribution becomes highly polydisperse. As is characteristic for DLS under those conditions, individual Rh-values can be dominated by adventitious diffusion of few large particles into the laser focal spot. Although customarily autocorrelation functions can be filtered out through software filters (e.g., setting baseline and amplitude thresholds), this still presents the largest source of error in the Rh-values. These are systematic for the individual autocorrelation functions. We believe that the variation of Rh-values at similar temperatures outside the transition region provides a reasonable estimate for the experimental error.

      (8) Figure 7: My most major comment. It would be good to somehow quantify the differences between these images. The claim is made that the LLPS droplets are different sizes, or for the P13L/Δ31-33 variant that droplets are coalescing or changing shape over time. It would be good to quantify this rather than rely on eyeballing the pictures.

      We are grateful to the Reviewer for this suggestion. As mentioned above, to improve the LLPS analysis we have now carried out segmentation of the images in Figure 7 to quantify the droplet numbers and areas. Histograms and statistical analyses are now provided in the new Supplementary Figure S5. In addition, we have added a comparison of the droplet numbers and sizes at two time-points for Nref, N:R203K/G204R, in addition to the previously shown N:P13L/Δ31-33, provided in the new Supplementary Figure S6. The results corroborate the previous conclusions, and depict how droplets in the N:P13L/Δ31-33 merge and grow in area more strongly than those from Nref.
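      For readers unfamiliar with this kind of quantification, the idea of counting droplets and measuring their areas can be sketched as follows. This is only an illustrative sketch and not the segmentation pipeline used for Supplementary Figure S5: it assumes a simple global intensity threshold and 4-connected pixel regions, and the function name `droplet_areas`, the toy image, and the threshold value are all hypothetical.

```python
from collections import deque

def droplet_areas(image, threshold):
    """Return the pixel area of each connected bright region (4-connectivity).

    `image` is a list of rows of intensities; pixels >= threshold count as droplet.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] < threshold or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected droplet.
            queue = deque([(r, c)])
            seen[r][c] = True
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas

# Toy 4x6 "micrograph" with three bright regions (made-up intensities).
toy_image = [
    [0, 9, 9, 0, 0, 0],
    [0, 9, 9, 0, 0, 8],
    [0, 0, 0, 0, 0, 8],
    [7, 0, 0, 0, 0, 0],
]
areas = droplet_areas(toy_image, threshold=5)
print(len(areas), "droplets with areas", areas)
```

      Histograms of such per-droplet areas at two time points would then show whether droplets merge and grow, as described above for N:P13L/Δ31-33.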

    1. Author response:

      eLife assessment

      This study represents a fundamental contribution to our understanding of how gene expression levels are controlled in bacteria. Through a series of compelling and careful experiments, relying on a mutant that blocks DNA replication but permits growth, and using various methods, the authors reveal how genome concentration rapidly becomes limiting for growth when replication is inhibited. This work contributes to our understanding of the contributions and limiting roles of DNA, mRNA, and ribosomes for growth in bacteria, and will be of considerable interest within both systems biology and microbial physiology.

      Thank you!

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Mäkelä et al. presents compelling experimental evidence that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. Specifically, the authors demonstrate that upon inhibition of DNA replication the single-cell growth rate continuously decreases, in direct proportion to the concentration of active ribosomes, as measured indirectly by single-particle tracking. The decrease of ribosomal activity with filamentation, in turn, is likely caused by a decrease of the concentration of mRNAs, as suggested by an observed plateau of the total number of active RNA polymerases. These observations are compatible with the hypothesis that DNA limits the total rate of transcription and thus translation. The authors also demonstrate that the decrease of RNAp activity is independent of two candidate stress response pathways, the SOS stress response and the stringent response, as well as an anti-sigma factor previously implicated in variations of RNAp activity upon variations of nutrient sources.

      Remarkably, the reduction of growth rate is observed soon after the inhibition of DNA replication, suggesting that the amount of DNA in wild-type cells is tuned to provide just as much substrate for RNA polymerase as needed to saturate most ribosomes with mRNAs. While previous studies of bacterial growth have most often focused on ribosomes and metabolic proteins, this study provides important evidence that chromosomal DNA has a previously underestimated important and potentially rate-limiting role for growth.

      Thank you for the excellent summary of our work.

      Strengths:

      This article links the growth of single cells to the amount of DNA, the number of active ribosomes and to the number of RNA polymerases, combining quantitative experiments with theory. The correlations observed during depletion of DNA, notably in M9gluCAA medium, are compelling and point towards a limiting role of DNA for transcription and subsequently for protein production soon after reduction of the amount of DNA in the cell. The article also contains a theoretical model of transcription-translation that contains a Michaelis-Menten type dependency of transcription on DNA availability and is fit to the data. While the model fits well with the continuous reduction of relative growth rate in rich medium (M9gluCAA), the behavior in minimal media without casamino acids is a bit less clear (see comments below).

      At a technical level, single-cell growth experiments and single-particle tracking experiments are well described, suggesting that different diffusive states of molecules represent different states of RNAp/ribosome activities, which reflect the reduction of growth. However, I still have a few points about the interpretation of the data and the measured fractions of active ribosomes (see below).

      Apart from correlations in DNA-deplete cells, the article also investigates the role of candidate stress response pathways for reduced transcription, demonstrating that neither the SOS nor the stringent response are responsible for the reduced rate of growth. Equally, the anti-sigma factor Rsd recently described for its role in controlling RNA polymerase activity in nutrient-poor growth media, seems also not involved according to mass-spec data. While other (unknown) pathways might still be involved in reducing the number of active RNA polymerases, the proposed hypothesis of the DNA substrate itself being limiting for the total rate of transcription is appealing.

      Finally, the authors confirm the reduction of growth in the distant Caulobacter crescentus, which lacks overlapping rounds of replication and could thus have shown a different dependency on DNA concentration.

      Weaknesses:

      There are a range of points that should be clarified or addressed, either by additional experiments/analyses or by explanations or clear disclaimers.

      First, the continuous reduction of growth rate upon arrest of DNA replication initiation observed in rich growth medium (M9gluCAA) is not equally observed in poor media. Instead, the relative growth rate is immediately/quickly reduced by about 10-20% and then maintained for long times, as if the arrest of replication initiation had an immediate effect but would then not lead to saturation of the DNA substrate. In particular, the long plateau of a constant relative growth rate in M9ala is difficult to reconcile with the model fit in Fig 4S2. Is it possible that DNA is not limiting in poor media (at least not for the cell sizes studied here) while replication arrest still elicits a reduction of growth rate in a different way? Might this have something to do with the naturally much higher oscillations of DNA concentration in minimal medium?

      We note that the total RNAP activity (abundance x active fraction) was also significantly reduced in poor media (Figure 3 -- supplement 4G and H) similarly to rich medium (Figure 3H). This is consistent with DNA being limiting. The main difference between rich and poor medium conditions is that the total ribosome activity in poor media (Figure 2 -- supplement 4G and H) was less affected in comparison to rich media (Figure 2H). Our interpretation of these results is that while DNA is limiting in all medium conditions (as shown by the RNAP data), changes in ribosome activity or mRNA degradation can compensate for the reduction in transcription in poor media and hence maintain better scaling of growth rates under DNA limitation. We understand how our current presentation made it confusing. We will reorganize the text and figures to better explain our results and interpretations. 

      The authors argue that DNA becomes limiting in the range of physiological cell sizes, in particular for M9gluCAA (Fig. 1BC). It would be helpful to know by how much (fold-change) the DNA concentration is reduced below wild-type (or multi-N) levels at t=0 in Fig 1B and how DNA concentration decays with time or cell area, to get a sense by how many-fold DNA is essentially 'overexpressed/overprovided' in wild-type cells.

      We will provide an estimate.

      Fig. 2: The distribution of diffusion coefficients of RpsB is fit to Gaussians on the log scale. Is this based on a model or on previous work or simply an empirical fit to the data? An exact analytical model for the distribution of diffusion constants can be found in the tool anaDDA by Vink, ..., Hohlbein Biophys J 2020. Alternatively, distributions of displacements are expressed analytically in other tools (e.g., in SpotOn).

      We use an empirical fit of a Gaussian mixture model (GMM) of three states to the data and extract the fractions of molecules in each state. This avoids making too many assumptions on the underlying processes, e.g. a Markovian system with Brownian diffusion. The model in anaDDA (Vink et al.) is currently limited to two-transitioning states with a maximal step number of 8 steps per track for a computationally efficient solution (longer tracks are truncated). Using a short subset of the trajectories is less accurate than using the entire trajectory and because of this, we consider full tracks with at least 9 displacements. Meanwhile, Spot-On supports a three-state model but it is still based on a semi-analytical model with a pre-calculated library of parameters created by fitting of simulated data. Neither of these models considers the effect of cell confinement, which plays a major role on single-molecule diffusion in small-sized cells such as bacteria. For these reasons, we opted to use an empirical fit to the data. We note that the fractions of active ribosomes in WT cells grown in different media, which we extracted from these diffusion measurements, are consistent with estimates obtained by others using similar or different approaches (Forchhammer and Lindahl 1971; Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014).
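      The kind of empirical three-state fit described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: it fits a one-dimensional Gaussian mixture to log10 diffusion coefficients by plain expectation-maximization, uses synthetic data with made-up state means, widths, and fractions, and ignores confinement and localization error. The function name `fit_gmm_1d` and all parameter values are assumptions for this example.

```python
import math
import random

def fit_gmm_1d(values, n_states=3, n_iter=100):
    """Fit a 1D Gaussian mixture by expectation-maximization (EM).

    Returns (weights, means, sds); weights are the state fractions.
    """
    data = sorted(values)
    n = len(data)
    mean_all = sum(data) / n
    sd_all = math.sqrt(sum((x - mean_all) ** 2 for x in data) / n)
    # Initialize means at evenly spaced quantiles, with equal weights and widths.
    means = [data[int((2 * k + 1) * n / (2 * n_states))] for k in range(n_states)]
    sds = [sd_all / n_states] * n_states
    weights = [1.0 / n_states] * n_states
    for _ in range(n_iter):
        # E-step: responsibility of each state for each data point.
        resp = []
        for x in data:
            p = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                 for w, m, s in zip(weights, means, sds)]
            total = sum(p)
            if total == 0.0:  # numerical underflow: fall back to equal shares
                resp.append([1.0 / n_states] * n_states)
            else:
                resp.append([pk / total for pk in p])
        # M-step: update fraction, mean, and width of each state.
        for k in range(n_states):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / n
    return weights, means, sds

# Synthetic log10(D) values for three hypothetical diffusive states.
rng = random.Random(0)
true_states = [(1000, -2.0), (600, -1.0), (400, 0.0)]  # (count, mean of log10 D)
log_d = []
for count, mean in true_states:
    log_d += [rng.gauss(mean, 0.2) for _ in range(count)]

weights, means, sds = fit_gmm_1d(log_d)
for w, m in zip(weights, means):
    print(f"state at log10(D) = {m:5.2f}: fraction {w:.2f}")
```

      In such a fit, the weight of the slowest state would play the role of the fraction of molecules assigned to that state; which state corresponds to active ribosomes is an interpretation that the fit itself does not provide.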

      The estimated fraction of active ribosomes in wild-type cells shows a very strong reduction with decreasing growth rate (down from 75% to 30%), twice as strong as measured in bulk experiments (Dai et al Nat Microbiology 2016; decrease from 90% to 60% for the same growth rate range) and probably incompatible with measurements of growth rate, ribosome concentrations, and almost constant translation elongation rate in this regime of growth rates. Might the different diffusive fractions of RpsB not represent active/inactive ribosomes? See also the problem of quantification above. The authors should explain and compare their results to previous work.

      We agree that our measured range is somewhat larger than the estimated range from Dai et al, 2016. However, they use different media, strains, and growth conditions. We also note that Dai et al did not make actual measurements of the active ribosome fraction. Instead, they calculate the “active ribosome equivalent” based on a model that includes growth rate, protein synthesis rate, RNA/protein abundance, and the total number of amino acids in all proteins in the cell. Importantly, our measurements show the same overall trend as Dai et al, 2016. Furthermore, our results are in quantitative agreement with previous experimental measurements that use ribosome profiling (Forchhammer and Lindahl 1971) or single-ribosome tracking (Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014), which, we believe, validates our approach. We will clarify this point in the revised manuscript.
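
      For readers comparing the two ranges quoted in this exchange, the arithmetic can be sketched as follows. The percentages are the ones quoted by the reviewer (75% to 30% for the tracking-based estimate, 90% to 60% for the range attributed to Dai et al., 2016); the helper name `reduction` is hypothetical.

```python
def reduction(high_pct, low_pct):
    """Return (drop in percentage points, drop relative to the starting value)."""
    return high_pct - low_pct, (high_pct - low_pct) / high_pct

tracking_points, tracking_relative = reduction(75, 30)  # single-molecule tracking
bulk_points, bulk_relative = reduction(90, 60)          # range attributed to Dai et al., 2016

print(f"tracking: {tracking_points} points, {tracking_relative:.0%} relative drop")
print(f"bulk:     {bulk_points} points, {bulk_relative:.0%} relative drop")
print(f"ratio of relative drops: {tracking_relative / bulk_relative:.1f}")
```

      In relative terms the drop is 60% versus about 33%, a ratio of 1.8, which is presumably the sense in which the reviewer describes the reduction as roughly twice as strong. In percentage points the difference is smaller (45 versus 30).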

      To measure the reduction of mRNA transcripts in the cell, the authors rely on the fluorescent dye SYTO RNAselect. They argue that 70% of the dye signal represents mRNA. The argument is based on the previously observed reduction of the total signal by 70% upon treatment with rifampicin, an RNA polymerase inhibitor (Bakshi et al 2014). The idea here is presumably that mRNA should undergo rapid degradation upon rif treatment while rRNA or tRNA are stable. However, work from Hamouche et al. RNA (2021) 27:946 demonstrates that rifampicin treatment also leads to a rapid degradation of rRNA. Furthermore, the timescale of fluorescent-signal decay in the paper by Bakshi et al. (half-life about 10 min) is not compatible with the previously reported rapid decay of mRNA (2-4 min) but rather compatible with the slower, still somewhat rapid, decay of rRNA reported by Hamouche et al. A bulk method to measure total mRNA as in the cited Balakrishnan et al. (Science 2022) would thus be a preferred method to quantify mRNA. Alternatively, the authors could also test whether the mass contribution of total RNA remains constant, which would suggest that rRNA decay does not contribute to signal loss. However, since rRNA dominates total RNA, this measurement requires high accuracy. The authors might thus tone down their conclusions on mRNA concentration changes while still highlighting the compelling data on RNAp diffusion.

      Thank you for bringing the Hamouche et al 2021 paper to our attention. We will address this point in the revised manuscript.

      The proteomics experiments are a great addition to the single-cell studies, and the correlations between distance from ori and protein abundance is compelling. However, I was missing a different test, the authors might have already done but not put in the manuscript: If DNA is indeed limiting the initiation of transcription, genes that are already highly transcribed in non-perturbed conditions might saturate fastest upon replication inhibition, while genes rarely transcribed should have no problem to accommodate additional RNA polymerases. One might thus want to test, whether the (unperturbed) transcription initiation rate is a predictor of changes in protein composition. This is just a suggestion the authors may also ignore, but since it is an easy analysis, I chose to mention it here.

      Thank you for the suggestion. We will provide the suggested analysis in the revised manuscript.

      Related to the proteomics, in l. 380 the authors write that the reduced expression close to the ori might reflect a gene-dosage compensatory mechanism. I don't understand this argument. Can the authors add a sentence to explain their hypothesis?

      We apologize for the confusion. This will be addressed in the revised manuscript.

      In Fig. 1E the authors show evidence that growth rate increases with cell length/area. While this is not a main point of the paper it might be cited by others in the future. There are two possible artifacts that could influence this experiment: a) segmentation: an overestimation of the physical length of the cell based on phase-contrast images (e.g., 200 nm would cause a 10% error in the relative rate of 2 um cells, but not of longer cells). b) time-dependent changes of growth rate, e.g., due to change from liquid to solid or other perturbations. To test for the latter, one could measure growth rate as a function of time, restricting the analysis to short or long cells, or measuring growth rate for short/long cells at selected time points. For the former, I recommend comparison of phase-contrast segmentation with FM4-64-stained cell boundaries.

      As the reviewer notes, the small increase in relative growth was just a minor observation that does not affect our story whether it is biologically meaningful or the result of a technical artefact. But we agree with the reviewer that others might cite it in future works and thus should be interpreted with caution.

      An artefact associated with time-dependent changes (e.g. changing from liquid cultures to more solid agarose pads) is unlikely for two reasons. 1. We show that varying the time that cells spend on agarose pads relative to liquid cultures does not affect the cell size-dependent growth rate results (Figure 1 -- supplement 5B). 2. We show that the growth rate is stable from the beginning of the time-lapse with no transient effects upon cell placement on agarose pads for imaging (Figure 1 -- supplement 5B). These results were described in the Methods section where they could easily be missed. We will revise the text to discuss these controls more prominently in the Results section.

      As for cell segmentation, we have run simulations and agree with the reviewer that a small overestimation of cell area (which is possible with any cell segmentation methods including ours) could lead to a small increase in relative growth with increasing cell areas. Since the finding is not important to our story, we will simply alert the readers to the possibility that the observation may be due to a small cell segmentation bias.
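
      The segmentation argument can be illustrated with a short sketch. It is not the simulation mentioned above; it assumes pure exponential growth of the true length and a constant length overestimate, and the function name and the chosen lengths are hypothetical. The 200 nm bias is the value from the reviewer's example.

```python
def apparent_relative_rate(measured_length_um, bias_um, true_rate=1.0):
    """Relative growth rate inferred from a length that is overestimated by bias_um.

    Assumes the true length grows exponentially: dL/dt = true_rate * L.
    A constant offset does not change dL/dt, so the inferred relative rate is
    (dL/dt) / measured_length = true_rate * (measured_length - bias) / measured_length.
    """
    true_length_um = measured_length_um - bias_um
    return true_rate * true_length_um / measured_length_um

bias_um = 0.2  # the 200 nm overestimate used in the reviewer's example
for measured in (2.0, 4.0, 8.0, 16.0):
    rate = apparent_relative_rate(measured, bias_um)
    print(f"measured length {measured:5.1f} um: apparent rate = {rate:.3f} x true rate")
```

      Under these assumptions a cell measured at 2 um appears to grow 10% more slowly than it does, and the error shrinks as cells get longer, so a constant segmentation bias alone would produce an apparent increase of relative growth rate with cell size.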

      Reviewer #2 (Public Review):

      In this work, the authors uncovered the effects of DNA dilution on E. coli, including a decrease in growth rate and a significant change in proteome composition. The authors demonstrated that the decline in growth rate is due to the reduction of active ribosomes and active RNA polymerases because of the limited DNA copy numbers. They further showed that the change in the DNA-to-volume ratio leads to concentration changes in almost 60% of proteins, and these changes mainly stem from the change in the mRNA levels.

      Thank you for the support and accurate summary!

      Reviewer #3 (Public Review):

      Summary:

      Mäkelä et al. here investigate genome concentration as a limiting factor on growth. Previous work has identified key roles for transcription (RNA polymerase) and translation (ribosomes) as limiting factors on growth, which enable an exponential increase in cell mass. While a potential limiting role of genome concentration under certain conditions has been explored theoretically, Mäkelä et al. here present direct evidence that when replication is inhibited, genome concentration emerges as a limiting factor.

      Strengths:

      A major strength of this paper is the diligent and compelling combination of experiment and modeling used to address this core question. The use of origin- and ftsZ-targeted CRISPRi is a very nice approach that enables dissection of the specific effects of limiting genome dosage in the context of a growing cytoplasm. While it might be expected that genome concentration eventually becomes a limiting factor, what is surprising and novel here is that this happens very rapidly, with growth transitioning even for cells within the normal length distribution for E. coli. Fundamentally, it demonstrates the fine balance of bacterial physiology, where the concentration of the genome itself (at least under rapid growth conditions) is no higher than it needs to be.

      Weaknesses:

      One limitation of the study is that genome concentration is largely treated as a single commodity. While this facilitates their modeling approach, one would expect that the growth phenotypes observed arise due to copy number limitation in a relatively small number of rate-limiting genes. The authors do report shifts in the composition of both the proteome and the transcriptome in response to replication inhibition, but while they report a positional effect of distance from the replication origin (reflecting loss of high-copy, origin-proximal genes), other factors shaping compositional shifts and their functional effects on growth are not extensively explored. This is particularly true for ribosomal RNA itself, which the authors assume to grow proportionately with protein. More generally, understanding which genes exert the greatest copy number-dependent influence on growth may aid both efforts to enhance (biotechnology) and inhibit (infection) bacterial growth.

      We agree but feel that identifying the specific limiting genes is beyond the scope of the study. However, to examine other potential contributing factors and identify limiting gene candidates, we plan to carry out new correlation analyses between our proteomic/transcriptomic datasets and published genome-wide datasets that report various variables under unperturbed conditions (e.g., mRNA/protein concentration, mRNA degradation rates, fitness cost, transcription/translation initiation rates, and essentiality).

      Overall, this study provides a fundamental contribution to bacterial physiology by illuminating the relationship between DNA, mRNA, and protein in determining growth rate. While coarse-grained, the work invites exciting questions about how the composition of major cellular components is fine-tuned to a cell's needs and which specific gene products mediate this connection. This work has implications not only for biotechnology, as the authors discuss, but potentially also for our understanding of how DNA-targeted antibiotics limit bacterial growth.

      Good point about the DNA-targeted antibiotics. Thank you!

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public Review): 

      As a reviewer for this manuscript, I recognize its significant contribution to understanding the immune response to saprophytic Leptospira exposure and its implications for leptospirosis prevention strategies. The study is well-conceived, addressing an innovative hypothesis with potentially high impact. However, to fully realize its contribution to the field, the manuscript would benefit greatly from a more detailed elucidation of immune mechanisms at play, including specific cytokine profiles, antigen specificity of the antibody responses, and long-term immunity. Additionally, expanding on the methodological details, such as immunophenotyping panels, qPCR normalization methods, and the rationale behind animal model choice, would enhance the manuscript's clarity and reproducibility. Implementing functional assays to characterize effector T-cell responses and possibly investigating the microbiota's role could offer novel insights into the protective immunity mechanisms. These revisions would not only bolster the current findings but also provide a more comprehensive understanding of the potential for saprophytic Leptospira exposure in leptospirosis vaccine development. Given these considerations, I believe that after substantial revisions, this manuscript could represent a valuable addition to the literature and potentially inform future research and vaccine strategy development in the field of infectious diseases. 

      We have been interested in understanding how both pathogenic and non-pathogenic Leptospira species affect each other in a mammalian reservoir host. With the current study we continue to elucidate the immune mechanisms engaged by pathogenic Leptospira interrogans versus non-pathogenic L. biflexa, as a follow up to our previous work (Shetty et al, 2021 PMID: 34249775, and Kundu et al 2022 PMID 35392072). We found that both species engaged partially overlapping myeloid immune cells and inflammatory signatures of infection. For example, some chemokines were increased, and macrophage and dendritic cells were engaged at 24h post inoculation with both species of Leptospira (PMID: 34249775). Thus, we questioned whether this robust innate immune response, raised to eliminate an immunogenic but rather non-pathogenic bacterium, could also help restrain L. interrogans pathogenesis. In this study we show that pre-exposure to L. biflexa before L. interrogans challenge mediates improved kidney homeostasis, mitigates leptospirosis severity and leads to increased shedding of L. interrogans in urine. This suggests an interspecies symbiotic commensalistic process that facilitates survival of the pathogenic species. These findings have high impact on the lives of millions of people in areas endemic for leptospirosis that are naturally exposed to non-pathogenic Leptospira species.

      We will expand on the methodological details and will update the introduction and discussion to include answers to questions raised by the three reviewers to further clarify the importance and impact of our study.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors try to achieve a method of protection against pathogenic strains using saprophytic species. It is undeniable that the saprophytic species, despite not causing the disease, activates an immune response. However, based on these results, using the saprophytic species does not significantly impact the animal's infection by a virulent species. 

      We separate the concept of exposure to a non-virulent bacterium that establishes a brief infection with engagement of an immune response (L. biflexa) from that of infection established by a virulent species of Leptospira that leads to pathogenesis (L. interrogans). While trying to understand how both pathogenic and non-pathogenic Leptospira species affect each other in a mammalian reservoir host, we previously found that L. biflexa induces immune responses that should affect immunity of populations naturally exposed to this spirochete. Thus, we designed this study to answer that question.

      Strengths: 

      Exposure to the saprophytic strain before the virulent strain reduces animal weight loss, reduces tissue kidney damage, and increases cellular response in mice.

      Weaknesses: 

      Even after the challenge with the saprophyte strain, kidney colonization and the release of bacteria through urine continue. Moreover, the authors need to determine the impact on survival if the experiment ends on the 15th. 

      Another novel and unexpected aspect of our findings in the single exposure experiment was that L. biflexa pre-exposure mediated a homeostatic environment in the kidney (lower ColA1, healthier renal physiology) that restrained pathogenesis of L. interrogans after challenge, which resulted in better health outcomes and increased shedding of L. interrogans in urine; in contrast, if the kidney is compromised (high ColA1) by L. interrogans (without L. biflexa pre-exposure) there was lower shedding of L. interrogans in urine. Interestingly, this suggests an interspecies symbiotic commensalistic process that facilitates survival of the pathogenic species. Thus, these data suggest that higher shedding of L. interrogans in urine may not be a hallmark of increased disease, but rather it could be the opposite.

      We will include these concepts in the updated discussion.

      We don’t think that extending this experiment to d21 or d28 would add relevant data to our findings. We provide survival curves for both experiments up to d15 post infection.

      Reviewer #3 (Public Review): 

      Summary: 

      Kundu et al. investigated the effects of pre-exposure to a non-pathogenic Leptospira strain in the prevention of severe disease following subsequent infection by a pathogenic strain. They utilized a single or double exposure method to the non-pathogen prior to challenge with a pathogenic strain. They found that prior exposure to a non-pathogen prevented many of the disease manifestations of the pathogen. Bacteria, however, were able to disseminate, colonize the kidneys, and be shed in the urine. This is an important foundational work to describe a novel method of vaccination against leptospirosis. Numerous studies have attempted to use recombinant proteins to vaccinate against leptospirosis, with limited success. The authors provide a new approach that takes advantage of the homology between a non-pathogen and a pathogen to provide heterologous protection. This will provide a new direction in which we can approach creating vaccines against this re-emerging disease. 

      Strengths: 

      The major strength of this paper is that it is one of the first studies utilizing a live non-pathogenic strain of Leptospira to immunize against severe disease associated with leptospirosis. They utilize two independent experiments (a single and double vaccination) to define this strategy. This represents a very interesting and novel approach to vaccine development. This is of clear importance to the field. 

      The authors use a variety of experiments to show the protection imparted by pre-exposure to the non-pathogen. They look at disease manifestations such as death and weight loss. They define the ability of Leptospira to disseminate and colonize the kidney. They show the effects infection has on kidney architecture and a marker of fibrosis. They also begin to define the immune response in both of these exposure methods. This provides evidence of the numerous advantages this vaccination strategy may have. Thus, this study provides an important foundation for future studies utilizing this method to protect against leptospirosis. 

      Weaknesses: 

      Although they provide some evidence of the utility of pretreatment with a non-pathogen, there are some areas in which the paper needs to be clarified and expanded. 

      The authors draw their conclusions based on the data presented. However, they state the graphs only represent one of two independent experiments. Each experiment utilized 3-4 mice per group. In order to be confident in the conclusions, a power analysis needs to be done to show that there is sufficient power with 3-4 mice per group. In addition, it would be important to show both experiments in one graph which would inherently increase the power by doubling the group size, while also providing evidence that this is a reproducible phenotype between experiments. Overall, this weakens the strength of the conclusions drawn and would require additional statistical analysis or additional replicates to provide confidence in these conclusions. 

      We will take these suggestions into consideration and will address as many of these issues as possible in the revised manuscript.
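
      To give a sense of what the requested power analysis would look like, here is a minimal sketch using a normal approximation to a two-sided, two-sample comparison. The standardized effect size of 1.5 is a hypothetical placeholder, not a value estimated from the manuscript, and the normal approximation is optimistic for groups this small (an exact calculation would use the noncentral t distribution).

```python
from statistics import NormalDist

def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    effect_size_d is the difference in group means divided by the common SD.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size_d * (n_per_group / 2) ** 0.5
    # Probability of exceeding the critical value in either tail.
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)

hypothetical_d = 1.5  # assumed effect size, for illustration only
for n in (3, 4, 6, 8):
    print(f"n = {n} per group: approximate power = {approx_power(hypothetical_d, n):.2f}")
```

      Comparing n = 4 with n = 8 shows how pooling two independent experiments of 3-4 mice per group, as the reviewer suggests, would change the approximate power for a given effect size.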

      A direct comparison between single and double exposure to the non-pathogen is not able to be determined. The ages of mice infected were different between the single (8 weeks) and double (10 weeks) exposure methods, thus the phenotypes associated with LIC infection are different at these two ages. The authors state that this is expected, but do not provide a reasoning for this drastic difference in phenotypes. It is therefore difficult to compare the two exposure methods, and thus determine if one approach provides advantages over the other. An experiment directly comparing the two exposure methods while infecting mice at the same age would be of great relevance to and strengthen this work. 

      Both experiments need to be analyzed as separate but complementary, as they provide different insights into L. interrogans pathogenesis and potential solutions to the problem. Optimal measurements of disease progression (weight loss, survival curves) require infection of mice at 8 weeks. Based on this, a new L. biflexa double exposure experiment would have to start when mice are 4 weeks old, which is just after weaning and before the mouse immune system is fully developed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is a valuable contribution to the electric fish community, and to studies of active sensing more generally, in that it provides evidence that a well-studied behavior (chirping) may serve in active sensing rather than communication. For the most part, the evidence is solid. In particular, the evidence showing increased chirping in more cluttered environments and the relationship between chirping and movement are convincing. Nevertheless, evidence to support the argument that chirps are mostly used for navigation rather than communication is incomplete.

      Thank you for the comment. In response to what seemed to be a generalized need for more evidence to support our hypothesis, we have extensively reviewed the manuscript, changed the existing figures and added new ones (3 new figures in the main text and 4 in the supplementary information section). Our edits include:

      (1) changes to the written text to remove categorical statements ruling out the possible communication function of chirps. When necessary, we have also added details on why we believe a social communication function of chirps could interfere with a role in electrolocation.

      (2) new experiments (and related figures) adding details on the behavioral correlates of chirping, on the effects of chirps on electric images (which are a way to represent current flow on the fish skin), and behavioral responses to ramp frequency playback EODs (used to test a continuous range of beat frequencies and fill the sampling gaps left by our experiments using real fish).

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors investigate the role of chirping in a species of weakly electric fish. They subject the fish to various scenarios and correlate the production of chirps with many different factors. They find major correlations between the background beat signals (continuously present during any social interactions) or some aspects of social and environmental conditions with the propensity to produce different types of chirps. By analyzing more specifically different aspects of these correlations they conclude that chirping patterns are related to navigation purposes and the need to localize the source of the beat signal (i.e. the location of the conspecific).

      We thank the Reviewer for the extensive feedback received. Hereby we respond to each of the points raised.

      We have better clarified that our intention is not to propose chirps as tools for “conspecific localization” intended as the pinpointing of its particular location. Instead, based on our observation of chirps being employed at very close ranges, we suggest that chirps may serve to assess other parameters related to “conspecific positioning” (which, in a wide sense, is still “electrolocation”), and that could be derived from the beat. These parameters might include size, relative orientation, or subtle changes in position during movement. While the experiments discussed in the manuscript do not provide a conclusive answer in this regard, we prioritize here the presentation of broader evidence for a different use of chirping. We are actively working on another manuscript that explores this aspect in more detail, but, due to space limitations, additional results had to be excluded.

      In the abstract we mention a role of chirps in the enhancement of “electrolocation”, but - as above mentioned - it is here meant only in a broad sense. In the introduction (at the very end) we propose chirps as self-directed signals (homeoactive sensing). In the result paragraph dedicated to the novel environment exploration experiment the following lines were added “Most chirps (90%) in fact are produced within a distance corresponding to 1% of the maximum field intensity (i.e. roughly 30 cm; Figure S12B), indicating that chirping occurs way below the threshold range for beat detection (i.e. roughly in the range of 60-120 cm, depending on the study; see appendix 1: Detecting beats at a distance) and likely does not represent a way to improve it”. We conclude this paragraph mentioning “This further corroborates the hypothesized role of chirps in beat processing.”. The last result paragraph (on chirping in cluttered environments) ends with “This supports the notion of chirps as self-referenced probing cues, potentially employed to optimize short-range aspects of conspecific electrolocation, such as conspecific size, orientation, and swimming direction - a hypothesis that will certainly be explored in future studies.”. In the discussion paragraph entitled “probing with chirps”, we do provide hints to possible mechanisms implied in the role of chirps in beat processing. As mentioned, we have planned to add further details in another manuscript, currently in preparation.

      The study provides a wealth of interesting observations of behavior and much of this data constitutes a useful dataset to document the patterns of social interactions in these fish. Some data, in particular the high propensity to chirp in cluttered environments, raise interesting questions. Their main hypothesis is a useful addition to the debate on the function of these chirps and is worth being considered and explored further. However, the data they provide do not support strong conclusion statements arguing that these chirps are used for localization purposes and are even less convincing at rejecting previously established hypotheses on the communication purpose of the chirps.

      We intentionally framed our aims a bit provocatively to underscore that, to date, the role of chirps in social communication has been supported solely by correlative evidence. While the evidence we provide to support the role of chirps as probes is also correlative, it opens at the same time critical questions on the long assumed role of chirps in social communication. In fact, chirping is strongly dependent on fish reciprocal positioning, highly constrained by beat frequency, and patterned in such ways that - in our opinion - makes the existence of links between chirp types and internal states less likely, as suggested instead by the current view. Moreover, the use of different chirp types does not appear specific to any of the social contexts analyzed but is primarily explained by DF (beat frequency). This observation, coupled with the analysis of chirp transitions (more self-referenced than reflecting an actual exchange between subjects), leads us to hypothesize with greater confidence that chirp production may be more related to sensing the environment, rather than transmitting information about a specific behavioral state.

      Nevertheless, the Reviewer's comment is valid. We've tempered the study's conclusions by introducing the possibility of chirps serving both communication and electrolocation functions, as stated in the conclusion paragraph: "While our results do not completely dismiss the possibility of chirps serving a role in electrocommunication—probing cues could, for instance, function as proximity signals to signal presence, deter approaches, or coordinate behaviors like spawning (Henninger et al., 2018).". Nonetheless, we do emphasize that our hypothesis is more likely to apply - based on our data. We refrain from categorically excluding a communicative function for chirps (between subjects), but we hypothesize that this communication - if occurring - may contain the same type of information as the self-directed signaling implied by the “chirps as probes” idea (i.e. spatial information).

      In response to the Reviewer's feedback, we've revised the end of the introduction, removing suggestions of conclusiveness: "Finally, by recording fish in different conditions of electrical 'visibility,' we provide evidence supporting a previously neglected role of chirps: homeoactive sensing." (edit: the word “validating” has been removed to give a less “conclusive” answer to the open functional questions about chirping).

      I would suggest thoroughly revising the manuscript to provide a neutral description of the results and leaving any speculations and interpretations for the discussion where the authors should be careful to separate strongly supported hypotheses from more preliminary speculations. I detail below several instances where the argumentation and/or the analysis are flawed.

      Following the Reviewer’s comment, we have revised the manuscript to emphasize the following points: 1) the need for a revision of the current view on chirping, 2) our proposal of an alternative hypothesis based on correlations between chirping and behavior, which were previously unexplored, and 3) our acknowledgment that while we offer evidence supporting a probing role of chirps (e.g., lack of behavioral correlation, DF-dependency, stereotypy in repeated trials, modulation by clutter and distance), we do not present here conclusive evidence for chirps detecting specific details of conspecific positioning. Neither do we categorically exclude a role of chirps in social communication.

      They analyze chirp patterning and show that, most likely, a chirp by an individual is followed by a chirp in the same individual. They argue that it is rare that a chirp elicits a "response" in the other fish. Even if there are clearly stronger correlations between chirps in the same individual, they provide no statistical analysis that discards the existence of occasional "response" patterns. The fact that these are rare, and that the authors don't do an appropriate analysis of probabilities, leads to this unsupported conclusion.

      We employed cross-correlation indices, calculated and assessed with a 3 standard deviation symmetrical boundary (which is a statistically sound and strict criterion). Median values were utilized to depict trends in each group/pair. To support our findings, we added new experiments and new figures: 1) a correlation analysis between chirps and behaviors, providing more convincing evidence of how chirps are employed during "scanning" swimming activity (backward swimming); 2) a text mining approach to underscore chirp-behavior correlations, employing alternative and statistically more robust methods.
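
      The kind of test described here can be sketched as follows: build a cross-correlogram of chirp times from two fish and flag lag bins that fall outside a symmetrical 3 standard deviation boundary. This is only an illustration under stated assumptions, not the analysis used in the manuscript: the bin width, lag window, use of the bin-count mean and SD as the reference, the function names, and the synthetic chirp times (in which one fish "responds" to 30% of the other's chirps after about 0.55 s) are all hypothetical.

```python
import random
import statistics

def cross_correlogram(times_a, times_b, max_lag=2.0, bin_width=0.1):
    """Count lags (t_b - t_a) within +/- max_lag seconds, in bins of bin_width."""
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for ta in times_a:
        for tb in times_b:
            lag = tb - ta
            if -max_lag <= lag < max_lag:
                index = min(int((lag + max_lag) / bin_width), n_bins - 1)
                counts[index] += 1
    return counts

def bins_outside_3sd(counts):
    """Indices of bins lying outside mean +/- 3 SD of all bin counts."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    return [i for i, c in enumerate(counts) if abs(c - mean) > 3 * sd]

# Synthetic chirp times in seconds (hypothetical, not data from the study).
random.seed(1)
fish_a = sorted(random.uniform(0, 600) for _ in range(200))
echoes = [t + random.uniform(0.52, 0.58) for t in random.sample(fish_a, 60)]
fish_b = sorted(echoes + [random.uniform(0, 600) for _ in range(100)])

counts = cross_correlogram(fish_a, fish_b)
flagged = bins_outside_3sd(counts)
print([round(-2.0 + 0.1 * i, 1) for i in flagged])  # left edges of flagged lag bins
```

      In this synthetic case the bin covering lags of 0.5 to 0.6 s stands out because responses were built in. With real recordings, the absence of any flagged bin at positive or negative lags would be the pattern consistent with chirps not being exchanged between fish, although a 3 SD criterion applied to bin counts is conservative and may miss rare responses.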

      One of the main pieces of evidence that chirps can be used to enhance conspecific localization is based on their "interference" measure. The measure is based on an analysis of "inter-peak intervals". This in itself is a questionable choice. The nervous system encodes all parts of the stimulus, not just the peak, and disruption occurring at other phases of the beat might be as relevant. The interference will be mostly affected by the summed duration of intervals between peaks in the chirp AM. They do not explain why this varies with beat frequency. It is likely that the changes they see are simply an artifact of the simplistic measure. A clear demonstration that this measure is not adequate comes from the observation in Fig7E-H. They show that the interference value changes as the signal is weaker. This measure should be independent of the strength of the signal. The method is based on detecting peaks and quantifying the time between peaks. The only reason this measure could be affected by signal strength is if noisy recordings affect how the peak detection occurs. There is no way to argue that this phenomenon would happen the same way in the nervous system. Furthermore, they qualitatively argue that patterns of chirp production follow patterns of interference strength. No statistical demonstration is done. Even the qualitative appraisal is questionable. For example, they argue that there are relatively few chirps being produced for DFs of 60 or -60 Hz. But these are DFs where they have only a very small sample size. The single pair of fish that they recorded at some of these frequencies might not have chirped by chance and a rigorous statistical analysis is necessary. Similarly, in Fig 5C they argue that the positions of the chirps fall on areas of the graph where the interferences are strongest (darker blue) but this is far from obvious and, again, not proven.

      We would like to clarify that the estimation of the effects of chirps on the beat (referred to as “beat interference”) was not intended to serve as the primary evidence supporting a different use of chirping. In fact, all the experiments conducted prior to that calculation already provide substantial evidence supporting the hypothesis we have proposed. To address the Reviewer’s concern and to avoid misleading interpretations, we have now moved this part to the Supplementary Information (see Figures S8 and S9), consistent with the secondary role of this approach. We also added the following statement to the result paragraph entitled “Chirps significantly interfere with the beat and enhance electric image contrast”: “Obviously, measuring chirp-triggered beat interferences by using an elementary outlier detection algorithm on the distribution of beat cycles does not reflect any physiological process carried out by the electrosensory system and can be therefore used only as an oversimplified estimate.”.

      Regarding the meaning of “beat interference” (as here estimated) from a perspective of brain physiology: chirp interference was calculated using the beat cycles as a reference. Beat peaks were used only to estimate beat cycle duration. Regardless of whether or not a beat peak is represented in the brain, beat cycle duration (estimated using the peaks) is the main determinant of p-unit rhythmic response to a beat. Regarding the effect of signal amplitude, this is also not very relevant. It is obvious that a chirp creates more - or less - interference based on the chirp FM and its duration (but also the sign of the DF and the magnitude of the amplitude modulation). If electroreceptor responses are entrained in waves of beat AMs and if “interference” is a measure of how such waves are scrambled, then “interference” is a measure of how chirps scramble waves of electroreceptor activity by affecting beat AMs.

      The interference fades with the signal (previous Figure 7, now Figure S12) because it is weighted by the signal strength (the signals used as carriers for chirps are recalculated based on real measurements of signal strength at different distances). Nonetheless, the Reviewer is right: mathematically speaking, the unweighted interference would not change at all, because it is just the result of an outlier detection algorithm. This outlier detection is set to a 1% threshold (percent of beat contrast).
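
      The amplitude independence of an unweighted, interval-based estimate can be illustrated with a minimal sketch. This is not the algorithm used in the study: the synthetic beat envelope, the Gaussian frequency excursion standing in for a chirp, the naive peak detector and the relative tolerance `rel_tol` are all assumptions chosen for illustration, and the sketch ignores the amplitude modulation that accompanies real chirps.

```python
import numpy as np

def beat_envelope(df_hz, chirp_fm_hz, fs=20000.0, duration=1.0,
                  chirp_time=0.5, chirp_sd=0.005, contrast=0.2):
    """Synthetic beat AM whose frequency is transiently raised by a Gaussian 'chirp'."""
    t = np.arange(0, duration, 1 / fs)
    # Instantaneous beat frequency: constant DF plus a brief frequency excursion.
    inst_df = df_hz + chirp_fm_hz * np.exp(-0.5 * ((t - chirp_time) / chirp_sd) ** 2)
    phase = 2 * np.pi * np.cumsum(inst_df) / fs
    return t, 1.0 + contrast * np.cos(phase)

def peak_times(t, x):
    """Times of local maxima, found by comparing each sample with its neighbours."""
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])
    return t[1:-1][is_peak]

def outlier_cycles(t, x, rel_tol=0.01):
    """Count beat cycles whose duration deviates from the median by more than rel_tol."""
    cycles = np.diff(peak_times(t, x))
    ref = np.median(cycles)
    return int(np.sum(np.abs(cycles - ref) > rel_tol * ref))
```

      Under these assumptions, calling `outlier_cycles` on an envelope and on the same envelope multiplied by a positive constant returns the same count, because peak timing does not depend on amplitude. Any dependence on signal strength therefore has to come from an explicit weighting step or from noise affecting peak detection.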

      Regarding the comparison “chirps vs interference”, we did not perform a statistical analysis because we only wanted to show a qualitative observation. Similar results can be obtained for slightly shorter or longer time windows, within certain limits (see added Figure S9, in the Supplementary Information). We hope that moving this analysis to the Supplementary Information makes it clear that this approach is not central to our argument.

      The Reviewer’s point on the DF sampling is correct. We have reconsidered the low chirping at 60 Hz as potentially the result of sampling bias and edited the respective result paragraph.

      They relate the angle at which one fish produces chirps relative to the orientation of the mesh enclosing. They argue that this is related to the orientation of electric field lines by doing a qualitative comparison with a simplified estimate of field lines. To be convincing this analysis should include a quantitative comparison using the exact same body position of the two fish when the chirps are emitted.

      We agree with the Reviewer: this type of experiment would be much better suited to illustrate the correlations between chirping and reciprocal positioning in fish. What we can see is that chirping occurs at certain orientations more often than others. This could have something to do with either field geometry or with locomotion in the particular test environment we have used. As mentioned earlier, we are currently editing a second manuscript which will include the type of analysis/experiment the Reviewer is thinking of. In this first study we preferred to focus on the broader behavioral correlates of chirping. We removed the mention of the field current lines because - we agree - the argument is vague as presented here.

      They show that the very vast majority of chirps in Fig 6 occur when the fish are within a few centimeters (e.g. very large first bin in Fig6E-Type2). This is a situation when the other fish signal will be strongest and localization will be the easiest. It is hard to understand why the fish would need a mechanism to enhance localization in these conditions (this is the opposite of difficult conditions e.g. the "cluttered" environment).

      Agreed. In fact, we do not explicitly propose chirps as a means to improve “electrolocation” (this word is used only broadly in the abstract) but instead as probes to extract spatial information (e.g. shape, motion, orientation) from a beat source. In a broader sense, all these spatial parameters contribute to any given instance of "localization." Because we were unable to explore all these aspects in greater detail, we chose to maintain a broader perspective. If chirps contribute to a better resolution of fine spatial attributes of conspecific locations, it is reasonable to expect higher chirping rates in proximity to the target fish.

      The argumentation aimed at rejecting the well-established role of chirps in communication is weak at best. First, they ignored some existing data when they argue that there is no correlation between chirping and behavioral interactions. Particularly, Hupe and Lewis (2008) showed a clear temporal correlation between chirps and a decrease in bites during aggressive encounters. It could be argued that this is "causal evidence" (to reuse their wording) that chirps cause a decrease in attacks by the receiver fish (see Fig 8B of the Hupe paper and associated significant statistics). Also, Oboti et al. argue that social interactions involve "higher levels of locomotion" which would explain the use of chirps since they are used to localize. But chirps are frequent in "chirp chamber" paradigms where no movement is involved. They also point out that social context covaries with beat frequency and thus that it is hard to distinguish which one is linked to chirping propensity and then say that it is hard to disentangle this from "biophysical features of EOD fields affecting detection and localization of conspecific fish". But they don't provide any proof that beat frequency affects detection and localization so their argument is not clear. Last, they argue that tests in one species shouldn't be extrapolated to other species. But many of the studies arguing for the role of chirps in communication were done on brown ghosts. In conclusion of this point, they do not provide any strong argument that rejects the role of chirps as a communication signal. A perspective that would be better supported by their data and consistent with past research would be to argue that, in addition to a role in communication, chirps could sometimes be used to help localize conspecifics.

      We did not intend to disregard the extensive body of literature supporting a role of chirps in social communication. Rather, the primary goal of this study was to present a valid alternative perspective to this prevailing view. The existence of a well-established hypothesis does not imply that new evidence cannot change it; it simply indicates that changing it may be challenging either because it's genuinely difficult or because the idea has not been thoroughly explored. Whatever the case may be, proposing new hypotheses, whether complementary or alternative to established theories, is a challenging undertaking for a single study. We judged that starting from broad correlations would be the most desirable approach.

      We did not ignore data from Hupé and Lewis 2008. We cited this study repeatedly and compared their findings to those of others, not only for the chirp-behavior correlation but also for chirping distance considerations. However, following the Reviewer’s comment, we now cite this study in the context of the recently added behavioral analysis (data from the PSTH plots could possibly confirm the observation of fewer chirps during attacks). We also cited the study by Triefenbach and Zakon 2008, which reports something along the same lines. See the statement: “Overall, these results provided mutually reinforcing evidence indicating that chirps are produced more often during locomotion or scanning-related motor activity and confirm previous reports of a lower occurrence of chirping during more direct aggressive contact (as shown also by Triefenbach and Zakon, 2008; Hupé and Lewis, 2008).”, in the result paragraph related to the behavioral correlates of chirping.

      In our study we make it clear how we distinguish causal evidence (i.e. providing evidence that A is required for B) from correlation (i.e. evidence for A simply occurring together with B). We also make it clear that we are not going to provide causal evidence but we are going to provide new evidence for correlations that were so far not considered, in order to propose a new unexplored function of chirps.

      The Reviewer's point on chirping during motion and while caged in a chirp chamber is valid. Indeed, at first we were also puzzled by this finding. However, under the “chirps as probes” paradigm, chirping in a chirp chamber can be explained by the need to obtain spatial information from an otherwise unreachable beat source (brown ghosts typically explore new environmental objects or conspecifics by actively swimming around them - something caged fish can’t do). The observation of chirping under conditions of limited movement (such as in a chirp chamber experiment) therefore does not contradict our hypothesis; rather, it can be used to support it. Further experiments are required - as rightly pointed out - to evaluate the effects of beat frequency on beat detection. We added a note about this in the “probing with chirps” discussion paragraph.

      The Reviewer's comment regarding generalization is unclear. We acknowledge that most studies are conducted in brown ghosts, as stated in the abstract. Our intention was to highlight that insights gained from this species have been applied to broaden the understanding of chirps in other species. Specifically, the "behavioral meaning idea" of chirping has been extended to other gymnotiform species producing EOD frequency modulations.

      Our study's aim is not to dismiss the idea of chirps being used for communication but to present an alternative hypothesis and to provide supporting evidence. While our results may not align well with the communication theory, our intention is not dismissal but rather engaging in a discussion and exploration of alternative perspectives.

      The discussion they provide on the possible mechanism by which chirps could help with localization of the conspecific is problematic. They imply that chirps cause a stronger response in the receptors. For most chirps considered here, this is not true. For a large portion of the beat frequencies shown in this paper, chirps will cause a de-synchronization of the receptors with no increase in firing rate. They cannot argue that this represents an enhanced response. They also discuss a role for having a broader frequency spectrum -during the chirp- in localization by making a parallel with pulse fish. There is no evidence that a similar mechanism could even work in wave-type fish.

      We have already commented on the “localization” idea in our previous responses. The Reviewer is right in saying that we have provided only vague descriptions of the potential mechanisms implied by our hypothesis. The studies by Benda and others (2005, 2006) demonstrate a clear synchronizing effect of chirps on p-unit firing rates, especially at low DFs (at ranges similar to those considered in this study). This synchronization could lead to an enhanced response at the electroreceptor level, as described in these very studies, which in turn would result in a higher probability of firing in downstream neurons (E-cells in the ELL).

      As also reported within the same works, chirps may also exert an opposite effect on p-units (i.e. desynchronization). This is what happens for large chirps at high DFs. Desynchronization may cause temporary lapses of p-unit firing, which in turn may lead to increased activity of I-cells in the ELL (which are indeed specifically tuned to p-unit lack of activity).

      So, in general, if we consider both ON and OFF pyramidal cells (in the ELL) and small and large chirps, we could state that chirps can be potentially used to enhance the activity of peripheral electrosensory circuits through different mechanisms, contingent on the chirp type and beat frequency. Unfortunately, space constraints limited our ability to dig into these details in the present study.

      However, to address the Reviewer’s rightful point, we now mention this in the manuscript: “Since the beat AMs generated by the chirps always trigger reliable responses in primary electrosensory circuits (pyramidal cells in the ELL respond to both increases and decreases in beat AM), any chirp-triggered AM causing a sudden change in p-unit firing could potentially amplify the downstream signal (Marsat and Maler, 2010) and thus enhance EI contrast.” (see result paragraph on beat interference and electric images).

      They write the whole paper as if males and females had been identified in their experiments. Although EOD frequency can provide some guess of the sex the method is unreliable. We can expect a non-negligible percentage of error in assigning sex.

      We agree and in fact, in the method section we state:

      “The limitation of this approach is that females cannot be distinguished from immature males with absolute certainty, since no post-mortem gonadal inspection was carried out.”

      To this we added:

      “Although a more accurate way to determine the sex of brown ghosts would be to consider other morphological features such as the shape of the snout, the body size, the occurrence of developing eggs, EOD frequency has been extensively used for this purpose.”

      Moreover, the consistent behavioral differences observed in low frequency fish, measured with those behavioral experiments aimed at assessing responses to playback stimuli and swimming behavior in novel environments, could also be caused by a younger age (as opposed to femaleness). However, the size ranges of our fish (an admittedly unreliable proxy of age) were all comparable, making this possibility perhaps less likely.

      Reviewer #2 (Public Review):

      Studying the weakly electric brown ghost knifefish, the authors provide evidence that 'chirps' (brief modulations in the frequency and amplitude of the ongoing electric signal) function in active sensing (specifically homeoactive sensing) rather than communication. This is a behavior that has been very well studied, including numerous studies on the sensory coding of chirps and the neural mechanisms for chirp generation. Chirps are largely thought to function in communication behavior, so this alternative function is a very exciting possibility that could have a great impact on the field. The authors do provide convincing evidence that chirps may function in homeoactive sensing. However, their evidence arguing against a role for chirps in communication is not as strong, and neglects a large body of research. Ultimately, the manuscript has great potential but suffers from framing these two possibilities as mutually exclusive and dismissing evidence in favor of a communicative function.

      We thank the Reviewer for the comment. Overall, we have edited the manuscript to soften our conclusions and avoid any strong categorical statement excluding the widely accepted role of chirps in social communication. We have added some new experiments with the aim of adding more detail to the behavioral correlates of chirping and to the DF dependency of the production of different types of chirps. Nonetheless, based on our results, we are inclined to conclude that the communication idea - although widely accepted - is not as well substantiated as it should be.

      Although we do not dismiss the bulk of literature supporting a role of chirps in social communication, we think that our hypothesis (i.e. decoding of spatial parameters from the beat) may be not fully compatible with the social communication hypothesis for the following reasons:

      (1) Chirp type dependency on DF makes chirps likely to be adaptive responses to beat frequency. While this idea is compatible with a role of chirps in the detection of beat parameters, their concurrent role in social communication would imply that fish interacting at given beat frequencies (DFs) would communicate only (or mainly) by delivering a very limited range of “messages”. For instance, assuming type 2 chirps are related to aggression (as widely suggested), are female-male pairs - with larger DFs - interacting less aggressively than same-sex pairs? Our experiments often suggested this is not the case. In addition, large DFs are not always indicative of opposite-sex interactions, while they are very often characterized by the emission of large chirps. Moreover, although opposite-sex interactions in the absence of breeding-like conditions cannot be considered truly courtship-related, large chirps are often considered courtship signals, regardless of the reproductive state of the emitting fish.

      (2) Chirping is highly affected by locomotion (consider female/male pairs with or without mesh divider) and distance (as shown in the novel environment exploration experiment). While the involvement of both parameters is compatible with a role of chirps in active sensing, a role of chirps in social communication implies that such signaling would occur only when fish are in very close proximity to each other. In this case, the beat is heavily distorted not only by fish position/locomotion but also by chirps. This means that when fish are close to each other, the two different types of information relayed by the beat (electrolocation and electrocommunication) would certainly interfere (this idea has been better phrased in the Introduction paragraph).

      (3) In our playback experiments we could not see any meaningful matching (e.g. angry-chirp → angry-chirp or sexy-chirp → approach) between playback chirps and evoked chirps, raising doubts about the meaning associated so far with the different types. Considering that playback experiments are typically used to assess signal meaning based on how animals respond to them, this result suggests quite strongly that such meaning cannot be assigned to chirps.

      (4) In playback experiments in which the same stimulus is provided multiple times, chirp type transitions (i.e. emission of a different chirp type after a given chirp) become predictable (as shown in the added playback experiments using ramping signals). This confirms that the choice to emit a given chirp type has something to do with beat frequency (or a change in this parameter) and not a communication of internal states. It would be otherwise unclear how a fish could change its internal state so quickly - and so reliably - even in the span of a few seconds.

      Despite this evidence against a semantic content of chirps in the context of social communication, we conclude the manuscript by noting that we are not providing strong evidence dismissing the communication hypothesis, and that both functions could coexist (see the example of “proximity signals” in the mating context given in the concluding paragraph).

      (1) The specific underlying question of this study is not made clear in the abstract or introduction. It becomes apparent in reading through the manuscript that the authors seek to test the hypothesis that chirps function in active sensing (specifically homeoactive sensing). This should be made explicitly clear in both the abstract and introduction, along with the rationale for this hypothesis.

      In the abstract we state “Despite the success of this model in neuroethology over the past seven decades, the underlying logic of their electric communication remains unclear. This study re-evaluates this view, aiming to offer an alternative, and possibly complementary, explanation for why these freshwater bottom dwellers emit electric chirps.”. This statement is meant as a summary of our aims. However, in order to convey a clearer message, we have revised the whole manuscript to more explicitly articulate our objectives. In particular we stress that with our experiments we intend to provide correlative evidence for a different role of chirps (previously unexplored) with the idea to stimulate a discussion and possibly a revision of the current theory about the functional role of chirps.

      In the introduction we have added a paragraph explaining our aim and also why we think that communicating through chirps could potentially interfere with efficient electrolocation: “Since both chirps and positional parameters (such as size, orientation or motion) can only be detected as perturbations of the beat (Petzold et al., 2016; Yu et al., 2012; Fotowat et al., 2013), and via the same electroreceptors, the inputs relaying both types of information are inevitably interfering. Moreover, as the majority of chirps are produced within a short range (< 50 cm; Zupanc et al., 2006; Hupé and Lewis 2008; Henninger et al., 2018; see appendix 1) this interference is likely to occur consistently during social interactions.

      Under the communication-hypothesis, the assumption that chirps and beats are conveying different types of information (i.e. semantic value as opposed to position and related geometrical parameters) is therefore leaving this issue unresolved.”.

      (2) My biggest issue with this manuscript is that it is much too strong in dismissing evidence that chirping correlates with context. This is captured in this sentence in the introduction, "We first show that the choice of different chirp types does not significantly correlate with any particular behavioral or social context." This very strong conclusion comes up repeatedly, and I disagree with it, for the following reasons:

      In your behavioral observations, you found sex differences in chirping as well as differences between freely interacting and physically separated fish. Your model of chirp variability found that environmental experience, social experience, and beat frequency (DF) are the most important factors explaining chirp variability. Are these not all considered "behavioral or social context"? Beat frequency (DF) in particular is heavily downplayed as being a part of "context" but it is a crucial part of the context, as it provides information about the identity of the fish you're interacting with.

      In your playback experiments, fish responded differently to small vs. large DFs, males chirped more than females, type 2 chirps became more frequent throughout a playback, and rises tended to occur at the end of a playback. These are all examples of context-dependent behavior.

      We agree with the Reviewer’s comment and we think that we have probably been unclear about the meaning of that statement. We also agree with the Reviewer about what is defined as “context”, and that a given beat frequency (DF) can in the end represent a “behavioral context” as well. In order to make it clearer, we have rephrased this statement and changed it to: “We first show that the relative number of different chirp types in a given recording does not significantly correlate with any particular behavioral or social context.”. This new form refers specifically to the observation that - in all different social conditions examined - the relative amounts of different types of chirps are unchanged (see Figure S2). The Reviewer may have interpreted our statement as suggesting that chirp type choice is random or unaffected by any social variable. We agree with the Reviewer that this is not the case. We also reported that sex differences in chirping are present, but we have emphasized they may have something to do with the propensity of the brown ghosts of either sex to swim/explore as opposed to seek refuge and wait (as suggested by our experiments in which FM pairs were either divided or freely interacting and our novel environment exploration experiments).

      We agree DF is important; in fact, it is the third most important factor explaining chirp variance in our model. In our fish pair recordings, we see a strong correlation of chirp total variance with tank experience (one naïve, one experienced, both fish equally experienced) and social context (novel to each other/familiar to each other, subordinate/dominant, breeding/non breeding, accessible/not accessible), although data clustering seems to distinguish “divided” vs “freely moving” conditions better than other variables (sex may also play a role because of the reversal of sexual dimorphism in chirp rates in precisely this case). However, we do not see a specific effect of these variables on the proportion of different types of chirps in any recording (see Figure S2).

      We also edited the beginning of the first result paragraph and changed it to “Thus, if behavioral meaning can be attributed to different types of chirps, as posed by the prevailing view (e.g., Hagedorn and Heiligenberg, 1985; Larimer and MacDonald, 1968; Rose, 2004), one should be able to identify clear correlations between behavioral contexts characterizing different internal states and the relative amounts of different types of chirp”, to emphasize we are here assessing the meaning of different types of chirps (not of the total amount of chirping in general).

      Further, you only considered the identity of interacting fish or stimulated fish, not their behavior during the interaction or during playback. Such an analysis is likely beyond the scope of this study, but several other studies have shown correlations between social behavior and chirping. In the absence of such data here, it is too strong to claim that chirping is unrelated to context.

      We agree with the Reviewer. In fact, this analysis was previously carried out but purposely left out in an attempt to limit the manuscript length. We have now made space for this experimental work, which has been added (see the new Figure 6).

      In summary, it is simply too strong to say that chirping does not correlate with context. Importantly, however, this does not detract from your hypothesis that chirping functions in homeoactive sensing. A given EOD behavior could serve both communication and homeoactive sensing. I actually suspect that this is quite common in electric fish. The two are not mutually exclusive, and there is no reason for you to present them as such. I recommend focusing more on the positive evidence for a homeoactive function and less on the negative evidence against a communication function.

      We aimed to clarify that our reference was to the lack of correlation between "chirp type relative numbers" and the analyzed context. Regarding the communication function, we tempered negative statements. However, as this study stems from evidence within the established paradigm of "chirps as communication signals", and aims at proposing an alternative hypothesis, eliminating all references to it could undermine the study's purpose.

      (3) The results were generally challenging to follow. In the first 4 sections, it is not made clear what the specific question is, what the approach to addressing that question is, and what specific experiment was carried out (the last two sections of the results were much clearer). The independent variables (contexts) are not clearly established before presenting the results. Instead they are often mentioned in passing when describing the results. They come across as an unbalanced hodgepodge of multiple factors, and it is not made clear why they were chosen. This makes it challenging to understand why you did what you did, the results, and their implications. For each set of major results, I recommend: First, pose a clear question. Then, describe the general approach to answering that question. Next, describe the specifics of the experimental design, with a rationale that appeals to the general approach described. Finally, describe the specific results.

      The introductory sentences of the first result paragraphs have been edited, rendering the aim of the experiments more explicit.

      (4) Results: "We thus predicted that, if behavioral meaning can be attributed to different types of chirps, as posed by the prevailing view (e.g., Hagedorn and Heiligenberg, 1985; Larimer and MacDonald, 1968; Rose, 2004)..." It should be made clear why this is the prevailing view, and this description should likely be moved to the introduction. There is a large body of evidence supporting this view and it is important to be complete in describing it, especially since the authors seem to seek to refute it.

      We understand the Reviewer’s question and we tried to express in the introduction the main reasons for why this is the current view. We state “Different types of chirps are thought to carry different semantic content based on their occurrence during either affiliative or agonistic encounters (Larimer and MacDonald 1968; Bullock 1969; Hopkins 1974; Hagedorn and Heiligenberg 1985; Zupanc and Maler 1993; Engler et al. 2000; Engler and Zupanc 2001; Bastian et al., 2001).”. To this we added: “Although supported mainly by correlative evidence, this idea gained popularity because it is intuitive and because it matches well enough with the numerous behavioral observations of interacting brown ghosts.”.

      We believe the prevailing view is based on intuition and a series of basic observed correlations repeated throughout the years. The crystallization of this idea is not due to negligence but mainly to technical limitations existing at the time of the first recordings. In order to assess the role of chirps in behaving fish, tight and precise temporal control over synchronized video-EOD recordings is most likely necessary, and this technical capability probably became available only well after the 1950s-60s, when electric communication was first described.

      (5) I am not convinced of the conclusion drawn by the analysis of chirp transitions. The transition matrices show plenty of 1-2 and 2-1 transitions occurring. Further, the cross-correlation analysis only shows that chirp timing between individuals is not phase-locked at these small timescales. It is entirely possible that chirp rates are correlated between interacting individuals, even if their precise timing is not.

      We agree with the Reviewer: chirp repertoires recorded in different social contexts are not devoid of reciprocal chirp transitions (i.e. fish 1 chirp - to - fish 2 chirp, or vice versa). Yet our point is to emphasize that their abundance is far more limited when compared to the self-referenced ones (i.e. 1-1 and 2-2). This is a fair concern and, in order to further address this point, we have added a whole new set of analyses and new experiments (see chirp-behavior correlations, PSTHs and more analysis based on more solid statistical methods; see Figure 6).
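
      The distinction between self-referenced (1-1, 2-2) and reciprocal (1-2, 2-1) transitions can be made concrete with a minimal sketch. It assumes that chirps are available as two lists of timestamps, one per fish, and it ignores chirp type; the function name is hypothetical and this is not the code used for the analyses in the manuscript.

```python
import numpy as np

def sender_transition_matrix(times_1, times_2):
    """Count consecutive chirp pairs by sender.

    Row index = sender of the earlier chirp, column index = sender of the next chirp
    (index 0 is fish 1, index 1 is fish 2).
    """
    times = np.concatenate([times_1, times_2])
    senders = np.concatenate([np.zeros(len(times_1), int), np.ones(len(times_2), int)])
    # Merge both trains into one sequence ordered in time.
    order = np.argsort(times, kind="stable")
    seq = senders[order]
    counts = np.zeros((2, 2), dtype=int)
    # Accumulate one count per consecutive (previous sender, next sender) pair.
    np.add.at(counts, (seq[:-1], seq[1:]), 1)
    return counts
```

      The diagonal of the returned matrix holds the self-referenced transitions and the off-diagonal entries hold the reciprocal ones, so their relative abundance can be compared directly.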

      Reviewer #3 (Public Review):

      Summary:

      This important paper provides the best-to-date characterization of chirping in weakly electric fish using a large number of variables. These include environment (free vs divided fish, with or without clutter), breeding state, gender, intruder vs resident, social status, locomotion state and social and environmental experience, as well as with playback experiments. It applies state-of-the-art methods for reducing dimensionality and finding patterns of correlation between different kinds of variables (factor analysis, K-means). The exceptional strength of the evidence, collated from a large number of trials with many controls, leads to the conclusion that a number of commonly accepted truths about which variable affects chirping must be carefully rewritten or nuanced. Based on their extensive analyses, the authors suggest that chirps are mainly used as probes that help detect beats and objects.

      Strengths:

      The work is based on completely novel recordings using interaction chambers. The amount of new data and associated analyses is simply staggering, and yet, well organized in presentation. The study further evaluates the electric field strength around a fish (via modelling with the boundary element method) and how its decay parallels the chirp rate, thereby relating the above variables to electric field geometry.

      The main conclusions are that the lack of any significant behavioural correlates for chirping, and the lack of temporal patterning in chirp time series, cast doubt on a communication goal for most chirps. Rather, the key determinants of chirping are the difference frequency between two interacting conspecifics as well as individual subjects' environmental and social experience. These conclusions by themselves will be hugely useful to the field. They will also allow scientists working on other "communication" systems to at least reconsider, and perhaps expand, the precise goal of the probes used in those senses. There are a lot of data summarized in this paper, and thorough referencing to past work. For example, the paper concludes that there is a lack of evidence for stereotyped temporal patterning of chirp time series, as well as of sender-receiver chirp transitions beyond the known increase in chirp frequency during an interaction.

      The alternative hypotheses that arise from the work are that chirps are mainly used as environmental probes for better beat detection and processing and object localization.

      The authors also advance the interesting idea that the sinusoidal frequency modulations caused by chirps are the electric fish's solution to the minute (and undetectable by neural wetware) echo-delays available to it, due to the propagation of electric fields at the speed of light in water.

      Weaknesses:

      My main criticism is that the alternative putative role for chirps as probe signals that optimize beat detection could be better developed. The paper could be clearer as to what that means precisely.

      We appreciate the Reviewer's kind comments. While we acknowledge that our exploration of chirp function in this study may be limited and not entirely satisfying, we made this decision due to space constraints, opting for a broader and diversified approach. We hope that future studies will build on these data and start filling the gaps. We are also working on another manuscript which is addressing this point more in detail.

      Nonetheless, we considered the Reviewer’s criticism and added not only a new figure (to show more explicitly what chirps can do to the perceived electric fields, as simulated by electric images) but also more descriptive parts explaining how we think chirps may act to improve the spatial resolution of beat processing (see the discussion paragraph “probing with chirps”). In this paragraph we explain more clearly how chirps could improve beat processing by phase shifting EODs and recovering possible blind spots on the fish skin caused by disruptive EOD interference (resulting in lower beat contrast). We also mention that the enhancement of electrosensory input triggered by chirps could be localized not only at the level of electroreceptors (consider the synchronizing effects small chirps have on p-units at low frequency beats) but also at the level of ON and OFF pyramidal cells in the ELL. Looked at from the perspective of these neurons, any chirp would enhance the activity of these input lines, yet in opposite ways.

      And there is an egg-and-chicken type issue as well, namely, that one needs a beat in order to "chirp" the beating pattern, but then how does chirping optimize the detection of the said beat? Perhaps the authors mean (as they wrote elsewhere in the paper) that the chirps could enhance electrosensory responses to the beat.

      Following the Reviewer’s comment, we have now revised several instances of the misleading phrasing identified.

      In the results on novel environment exploration: “If chirps enhance beat processing, for instance, chirping should occur within beat detection range but at a certain distance.”.

      “This, in turn, could be used to validate our beat-interference estimates as meaningfully related to beat processing.” and “In all this, rises may represent an exception as their locations are spread over larger distances and even in presence of obstacles potentially occluding the beat source (such as shelters, plants, or walls), all of which are conditions in which beat detection or beat processing could be more difficult (this, could be coherent with the production of rises right at the end of EOD playbacks; Figure S5).”

      Last result paragraph (clutter experiment): “Overall, these results indicate that chirping is significantly affected by the presence of environmental clutter partially disrupting - or simply obstructing - the processing of beat related information during locomotion”.

      In the probing with chirps discussion paragraph “In theory, chirps could also be used to improve electrolocation of objects as well (as opposed to the processing of the beat).”.

      In the conclusions: “optimizing the otherwise passive responses to the beat”.

      A second criticism is that the study links the beat detection to underwater object localization. I did not see a sufficiently developed argument in this direction, nor how the data provided support for this argument. It is certainly possible that the image on the fish's body of an object in the environment will be slightly modified by introducing a chirp on the waveform, as this may enhance certain heterogeneities of the object in relation to its environment. The thrust of this argument seems to derive more from the notion of Fourier analysis with pulse type fish (and radar theory more generally) that the higher temporal frequencies in the beat waveform induced by the chirp will enable a better spatial resolution of objects. It remains to be seen whether this is significant.

      The Reviewer is correct in noting that this point is not addressed in the manuscript. We introduced it as a speculative discussion point to mention alternative possibilities. These could be subject to further testing in future studies.

      I would also have liked to see a proposal for new experiments that could test these possible new roles.

      We have added clearer suggestions for future experiments throughout the discussion: these may be aimed at 1) improving playback experiments using more realistic copies of the brown ghost’s EODs (including harmonics), 2) assessing fish reciprocal positioning during chirping in better detail and 3) testing the use of chirping during target-reaching tasks in order to better assess the probing function of chirps.

      The authors should recall for the readers the gist of Bastian's 2001 argument that the chirp "can adjust the beat frequency to levels that are better detectable" in the light of their current. Further, at the beginning of the "Probing with chirps" section, the 3rd way in which chirps could improve conspecific localization mentions the phase-shifting of the EOD. The authors should clarify whether they mean that the tuberous receptors and associated ELL/toral circuitry could deal with that cue, or that the T_unit pathway would be needed?

      We thank the Reviewer for identifying this unclear point. We added reference to the p-units “Yet, this does not exclude the possibility that chirps could be used to briefly shift the EOD phase in order to avoid disruptive interferences caused by phase opposition (at the level of p-units)” in the above mentioned paragraph. We would prefer to omit a more detailed reference to t-units in order to avoid lengthy descriptions required to discuss the different electroreceptor types.

      On p.17 I don't understand what is meant by most chirps being produced, possibly aligned with the field lines, since field lines are everywhere. And what is one to conclude from the comparison of Fig.6D and 7A? Likewise it was not clear what is meant by chirps having a detectable effect on randomly generated beats.

      We agree on the valid point raised by the Reviewer and we have removed reference to current lines from the text.

      In the section on Inconsistencies between behaviour and hypothesized signal meaning, the authors could perhaps nuance the interpretation of the results further in the context of the unrealistic copy of natural stimuli using EOD mimics. In particular, Kelly et al. 2008 argued that electrode placement mattered in terms of representation of a mimic fish onto the body of a real fish, and thus, if I properly understand the set up here, the movement would cause the mimic to vary in quality. This may nevertheless be a small confounding issue.

      We agree with the Reviewer and added a comment at the beginning of the paragraph mentioned. “Nonetheless, it's plausible that playback stimuli, as employed in our study and others, may not faithfully replicate natural signals, thus potentially influencing the reliability of the observed behaviors. Future studies might consider replicating these findings using either natural signals or improved mimics, which could include harmonic components (excluded in this study).”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Abstract: "...is probably the most intensely studied species..." is a weak, unsupported, and unnecessary statement. Just state that it has been heavily studied, or is one of the most well-studied,...

      rephrased

      (2) Abstract: "...are thus used as references to specific internal states during recordings - of either the brain or the electric organ..." This was not clear to me.

      rephrased

      (3) Abstract: "...the logic underlying this electric communication..." It is not clear to me what the authors mean here by "logic".

      rephrased

      (4) I strongly recommend clearly defining homeoactive sensing and distinguishing it from alloactive sensing when this term is first introduced in the introduction. This is not a commonly used term. Most readers likely think they understand what is meant by the term active sensing; however, I recommend first defining it, and then distinguishing between these two different types of active sensing.

      rephrased

      (5) Introduction: "Together with a few other species (Rose, 2004),..." More than a few. There are hundreds of species with electric organs. It is certainly not a "unique" capability.

      rephrased

      (6) Introduction: "But the real advantage of active electrolocation can be appreciated in the context of social interaction." This is unclear. Why is this the "real advantage" of active electrolocation when an electrically silent fish could detect an electrically communicating fish just fine without interference? Active electrolocation is needed to detect objects that are not actively emitting an electric field. It is not needed to detect signaling individuals.

      rephrased

      (7) Introduction: why is active sensing using EODs limited to distances of 6-12 cm? Why does it not work at closer range?

      Here we meant to give a range based on published data. We rephrased it to “up to 12”.

      (8) Introduction: electric fields decay with the cube of the distance, as you show in appendix 1.

      rephrased

      (9) Introduction: it is not clear what is meant by "blurred EOD amplitude".

      rephrased (“noisy”)

      (10) Figure 2C is very challenging to interpret. I recommend spending more time in the manuscript walking the reader through this analysis and its presentation.

      We are grateful for the comment as we probably overlooked this point. We now added a small paragraph to explain these data in better detail.

      (11) Results: "This was done by calculating the ratio between the duration of the beat cycles affected by the chirp (beat interpeak intervals) and the total duration of the beat cycles detected within a fixed time window (roughly double the size of the maximum chirp duration, 700 ms)." This was not clear to me.

      We now rephrased to “Estimates of beat interference were made by calculating the ratio between the cumulative duration of the beat cycles affected by a given chirp (1 beat cycle corresponding to the beat comprised by two consecutive beat peaks, or - more simply - the beat inter-peak interval) over the cumulative duration of all the beat cycles within the time window used as a reference (700 ms; other analysis windows were tested Figure S9)” to clarify this method.

      (12) Results: "For each chirp, the interference values obtained for 4 different phases (90° steps) were averaged." Why was this done?

      To consider an average effect across phases. Although it is true that chirp parameters may have a different impact on the beat, depending on EOD phase, including this parameter in our figure/s would have considerably increased the volume of data reported giving too much emphasis to an analysis we judged not crucially important. In addition, since we did not consider EOD phase in our recordings, we opted for an average estimate encompassing different phase values.
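
      The ratio described in the response to point (11), and the phase averaging described in the response to point (12), can be sketched as follows. This is only an illustration under stated assumptions: beat peaks are taken as already detected, a beat cycle is treated as "affected" when it overlaps the chirp interval in time (a hypothetical criterion, since the authors' exact criterion is not given in this response), and all function and parameter names are invented for the example.

```python
def beat_interference(peak_times, chirp_start, chirp_end, window_start=0.0, window=0.7):
    """Cumulative duration of chirp-affected beat cycles divided by the
    cumulative duration of all beat cycles inside the reference window.

    peak_times: sorted beat-peak times in seconds.
    window: reference window length in seconds (700 ms, as in the text).
    Assumption: a cycle counts as affected if it overlaps the chirp interval.
    """
    window_end = window_start + window
    # One beat cycle is the interval between two consecutive beat peaks.
    cycles = [
        (start, end)
        for start, end in zip(peak_times[:-1], peak_times[1:])
        if start >= window_start and end <= window_end
    ]
    total = sum(end - start for start, end in cycles)
    if total == 0:
        return float("nan")  # no complete beat cycle inside the window
    affected = sum(
        end - start
        for start, end in cycles
        if start < chirp_end and end > chirp_start
    )
    return affected / total


def mean_over_phases(estimates):
    """Average the interference values obtained at the sampled EOD phases."""
    return sum(estimates) / len(estimates)


# Hypothetical 8 Hz beat (peaks every 125 ms) with a chirp from 200 to 300 ms.
peaks = [0.125 * k for k in range(6)]
print(beat_interference(peaks, chirp_start=0.2, chirp_end=0.3))
```

      In this toy case two of the five complete cycles overlap the chirp, so the estimate is the duration of those two cycles over the duration of all five.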

      (13) Discussion: "Third, observations in a few species are generalized to all other gymnotiforms without testing for species differences (Turner et al., 2007; Smith et al., 2013; Petzold et al., 2016)." I strongly disagree with this statement. First, the studies referenced here do explicitly compare chirps across species. Second, you only studied one species here, so it is not clear to me how this is a relevant concern in interpreting your findings.

      Here we have probably been unclear in the writing: the point we wanted to make is that the idea of chirps having semantic content has been generalized to other species without investigating the nature of their chirping with as much detail as done for brown ghosts.

      We have now rephrased the statement and changed it to: “Second, observations in a few species are generalized to all other gymnotiforms without testing whether chirping may have similar functions in other species (Turner et al., 2007; Smith et al., 2013; Petzold et al., 2016)”

      (14) Discussion: "The two beats could be indistinguishable (assuming that the mechanism underlying the discrimination of the sign of DF at low DFs, and thought to be the basis of the so called jamming avoidance response (JAR; Metzner, 1999), is not functional at higher DFs)." Why would you assume this?

      What we meant here is that it is unlikely that the two DFs are not discriminated by the same mechanisms implied in the JAR, even if the DF is higher than the levels at which usually JARs are detected (i.e. DF = 1-10 Hz?). To improve clarity, we rephrased this statement. “The two beats could be indistinguishable (assuming - perhaps not realistically - that the same mechanism involved in DF discrimination at lower DF values would not work in this case; Metzner, 1999)”.

      (15) Discussion: "...an idea which seems congruent with published electrophysiological studies..." How so?

      Rephrased to “Based on our beat interference estimates, we propose that the occurrence of the different types of chirps at more positive DFs (such as in male-to-female chirping) may be explained by their different effect on the beat (Figure 5D; Benda et al., 2006; Walz et al., 2013).”

      Reviewer #3 (Recommendations For The Authors):

      On p.2 there is a discrepancy between the quoted ranges for active sensing of objects, first 10-12 cm, and then 6-12 cm further down. And in the following paragraph right below this passage, electric fields are said to decay with the squared distance (appendix 1). That expression has a cos(theta) which is inversely proportional to the distance, and so one is really dealing, as expected for dipolar fields, with a drop-off that decays with the distance cubed.

      We thank the Reviewer for the comment. We have now corrected the mistake and added “cubed”. We also removed the imprecise reference to the range 6-12 cm, rephrasing it to “up to 12 cm”.
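
      For reference, the cubic drop-off mentioned by the Reviewer follows from the standard far-field expression for an ideal dipole. The notation below (dipole moment p, medium conductivity σ, distance r and angle θ from the dipole axis) is generic textbook notation assumed for this illustration; it is not copied from Appendix 1 of the manuscript.

```latex
\[
V(r,\theta) = \frac{p\cos\theta}{4\pi\sigma r^{2}}, \qquad
E_r = -\frac{\partial V}{\partial r} = \frac{2p\cos\theta}{4\pi\sigma r^{3}}, \qquad
E_\theta = -\frac{1}{r}\frac{\partial V}{\partial\theta} = \frac{p\sin\theta}{4\pi\sigma r^{3}}
\]
\[
\lvert\mathbf{E}\rvert = \sqrt{E_r^{2} + E_\theta^{2}}
= \frac{p}{4\pi\sigma r^{3}}\sqrt{1 + 3\cos^{2}\theta}
\]
```

      The potential thus falls off with the square of the distance, whereas the field magnitude falls off with its cube.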

      At the end of the section on Inconsistencies..., it is not clear what "activity levels" refers to. It should also be made clearer at the outset, and reminded in this section too, that for the authors, behavioural context does not include social experience, which is somewhat counter-intuitive.

      We have now specified that we meant “locomotor activity levels”. Regarding social experience, we did include it as “behavioral context”, and we have now made this clearer in the first result paragraph. We hope this resolves the confusion.

      The caption of Fig.8 could use more clarity in terms of what is being compared in (C) (and is "1*2p" a typo?)

      We corrected the typo and edited the figure to make the references more clear.

      The concept of "high self-correlation of chirp time series" is presented only in the Conclusion using those words. The word self-correlation is not used beforehand. This needs to be fixed so the reader knows clearly what is being referred to.

      Thank you for noting this. We have now changed the wording using the term “auto-correlation” and changed a statement at the beginning of the “interference” result paragraph accordingly, removing references to self-correlation.

    1. Author response: 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Meissner-Bernard et al present a biologically constrained model of telencephalic area of adult zebrafish, a homologous area to the piriform cortex, and argue for the role of precisely balanced memory networks in olfactory processing.

      This is interesting as it can add to recent evidence on the presence of functional subnetworks in multiple sensory cortices. It is also important in deviating from traditional accounts of memory systems as attractor networks. Evidence for attractor networks has been found in some systems, like in the head direction circuits in the flies. However, the presence of attractor dynamics in other modalities, like sensory systems, and their role in computation has been more contentious. This work contributes to this active line of research in experimental and computational neuroscience by suggesting that, rather than being represented in attractor networks and persistent activity, olfactory memories might be coded by balanced excitation-inhibitory subnetworks.

      Strengths:

      The main strength of the work is in: (1) direct link to biological parameters and measurements, (2) good controls and quantification of the results, and (3) comparison across multiple models.

      (1) The authors have done a good job of gathering the current experimental information to inform a biological-constrained spiking model of the telencephalic area of adult zebrafish. The results are compared to previous experimental measurements to choose the right regimes of operation.

      (2) Multiple quantification metrics and controls are used to support the main conclusions and to ensure that the key parameters are controlled for - e.g. when comparing across multiple models.

      (3) Four specific models (random, scaled I / attractor, and two variants of specific E-I networks - tuned I and tuned E+I) are compared with different metrics, helping to pinpoint which features emerge in which model.

      Weaknesses:

      Major problems with the work are: (1) mechanistic explanation of the results in specific E-I networks, (2) parameter exploration, and (3) the functional significance of the specific E-I model.

      (1) The main problem with the paper is a lack of mechanistic analysis of the models. The models are treated like biological entities and only tested with different assays and metrics to describe their different features (e.g. different geometry of representation in Fig. 4). Given that all the key parameters of the models are known and can be changed (unlike biological networks), it is expected to provide a more analytical account of why specific networks show the reported results. For instance, what is the key mechanism for medium amplification in specific E/I network models (Fig. 3)? How does the specific geometry of representation/manifolds (in Fig. 4) emerge in terms of excitatory-inhibitory interactions, and what are the main mechanisms/parameters? Mechanistic account and analysis of these results are missing in the current version of the paper.

      We agree with the reviewer that a mechanistic analysis of manifold geometry is of high interest and we will address this issue in our revisions. We are currently exploring approaches to better understand how amplification of activity is controlled in E/I assemblies, and how geometric modifications can be described in terms of elementary excitatory and inhibitory interactions. We expect these approaches to provide new mechanistic insights into representational manifolds.

      (2) The second major issue with the study is a lack of systematic exploration and analysis of the parameter space. Some parameters are biologically constrained, but not all the parameters. For instance, it is not clear what the justification for the choice of synaptic time scales is (with E synaptic time constants being larger than inhibition: tau_syn_i = 10 ms, tau_syn_E = 30 ms). How would the results change if these - and other unconstrained - parameters were varied? It is important to show how the main results, especially the manifold localisation, would change by doing a systematic exploration of the key parameters and performing some sensitivity analysis. This would also help to see how robust the results are, which parameters are more important and which parameters are less relevant, and to shed light on the key mechanisms.

      We varied neuronal and network parameters in the past and we are currently performing additional systematic parameter variations to further address this comment. Preliminary results indicate that networks with similar properties can be obtained with equal synaptic time constants and biophysical parameters for all E and I neurons, thus supporting the notion that representational geometry is determined primarily by connectivity. Results of parameter variations will be reported in the revised manuscript.

      (3) It is not clear what the main functional advantage of the specific E-I network model is compared to random networks. In terms of activity, they show that specific E-I networks amplify the input more than random networks (Fig. 3). But when it comes to classification, the effect seems to be very small (Fig. 5c). Description of different geometry of representation and manifold localization in specific networks compared to random networks is good, but it is more of an illustration of different activity patterns than proving a functional benefit for the network. The reader is still left with the question of what major functional benefits (in terms of computational/biological processing) should be expected from these networks, if they are to be a good model for olfactory processing and learning.

      One possibility for instance might be that the tasks used here are too easy to reveal the main benefits of the specific models - and more complex tasks would be needed to assess the functional enhancement (e.g. more noisy conditions or more combination of odours). It would be good to show this more clearly - or at least discuss it in relation to computation and function.

      We agree that further insights into potential benefits of manifold representations would be interesting. In the initial manuscript we performed analyses of pattern classification primarily to examine whether the structured E/I networks studied here can support pattern classification at all, given that they do not exhibit discrete attractor states or global pattern completion. As structured E/I networks still support pattern classification when activity is read out from neuronal subsets, we concluded that structured E/I networks are not in conflict with the general notion of pattern classification by autoassociation. In addition, manifold representations may support a variety of other computations that we discussed only superficially. In the revised manuscript we are planning to address this issue in more depth by additional discussion and analyses. In particular, we are planning to address the hypothesis that manifold geometry provides a continuous distance metric to analyze relationships between inputs and relevant stimuli (learned odors) in the presence of irrelevant stimulus components (non-learned odors).

      Reviewer #2 (Public Review):

      Summary:

      The authors conducted a comparative analysis of four networks, varying in the presence of excitatory assemblies and the architecture of inhibitory cell assembly connectivity. They found that co-tuned E-I assemblies provide network stability and a continuous representation of input patterns (on locally constrained manifolds), contrasting with networks with global inhibition that result in attractor networks.

      Strengths:

      The findings presented in this paper are very interesting and cutting-edge. The manuscript effectively conveys the message and presents a creative way to represent high-dimensional inputs and network responses. Particularly, the result regarding the projection of input patterns onto local manifolds and continuous representation of input/memory is very intriguing and novel. Both computational and experimental neuroscientists would find value in reading the paper.

      Weaknesses:

      Intuitively, classification (decodability) in discrete attractor networks is much better than in networks that have continuous representations. This could also be shown in Figure 5B, along with the performance of the random and tuned E-I networks. The latter networks have the advantage of providing network stability compared to the Scaled I network, but at the cost of reduced network salience and, therefore, reduced input decodability. The authors may consider designing a decoder to quantify and compare the classification performance of all four networks.

      As suggested by the reviewer, we will explicitly examine decodability by different types of networks in the revised manuscript.

      Networks featuring E/I assemblies could potentially represent multistable attractors by exploring the parameter space for their reciprocal connectivity and connectivity with the rest of the network. However, for co-tuned E-I networks, the scope for achieving multistability is relatively constrained compared to networks employing global or lateral inhibition between assemblies. It would be good if the authors mentioned this in the discussion. Also, the fact that reciprocal inhibition increases network stability has been shown before and should be cited in the statements addressing network stability (e.g., some of the citations in the manuscript, including Rost et al. 2018, Lagzi & Fairhall 2022, and Vogels et al. 2011 have shown this).

      We thank the reviewer for this comment and will revise the manuscript accordingly.

      Providing raster plots of the pDp network for familiar and novel inputs would help with understanding the claims regarding continuous versus discrete representation of inputs, allowing readers to visualize the activity patterns of the four different networks. (similar to Figure 1B).

      We will follow the suggestion by the reviewer and include raster plots of responses to both familiar and novel inputs in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      This work investigates the computational consequences of assemblies containing both excitatory and inhibitory neurons (E/I assembly) in a model with parameters constrained by experimental data from the telencephalic area Dp of zebrafish. The authors show how this precise E/I balance shapes the geometry of neuronal dynamics in comparison to unstructured networks and networks with more global inhibitory balance. Specifically, E/I assemblies lead to the activity being locally restricted onto manifolds - a dynamical structure in between high-dimensional representations in unstructured networks and discrete attractors in networks with global inhibitory balance. Furthermore, E/I assemblies lead to smoother representations of mixtures of stimuli while those stimuli can still be reliably classified, and allow for more robust learning of additional stimuli.

      Strengths:

      Since experimental studies do suggest that E/I balance is very precise and E/I assemblies exist, it is important to study the consequences of those connectivity structures on network dynamics. The authors convincingly show that E/I assemblies lead to different geometries of stimulus representation compared to unstructured networks and networks with global inhibition. This finding might open the door for future studies for exploring the functional advantage of these locally defined manifolds, and how other network properties allow to shape those manifolds.

      The authors also make sure that their spiking model is well-constrained by experimental data from the zebrafish pDp. Both spontaneous and odor stimulus triggered spiking activity is within the range of experimental measurements. But the model is also general enough to be potentially applied to findings in other animal models and brain regions.

      Weaknesses:

      I find the point about pattern completion a bit confusing. In Fig. 3 the authors argue that only the Scaled I network can lead to pattern completion for morphed inputs since the output correlations are higher than the input correlations. For me, this sounds less like the network can perform pattern completion but it can nonlinearly increase the output correlations. Furthermore, in Suppl. Fig. 3 the authors show that activating half the assembly does lead to pattern completion in the sense that also non-activated assembly cells become highly active and that this pattern completion can be seen for Scaled I, Tuned E+I, and Tuned I networks. These two results seem a bit contradictory to me and require further clarification, and the authors might want to clarify how exactly they define pattern completion.

      We believe that this comment concerns a semantic misunderstanding and apologize for any lack of clarity. The reviewer is correct that “pattern completion” in morphing experiments can be described as a nonlinear increase in output correlations in response to related inputs. This is different from the results obtained by simulated current injections because currents were targeted to subsets of assembly neurons and the analysis focused on firing rates within and outside assemblies. We referred to results of both experiments as “pattern completion” because this has been standard in the neurobiological and in the computer science literature, respectively. However, we agree that this can cause confusion and we will revise the manuscript to clarify this issue.

      The authors argue that Tuned E+I networks have several advantages over Scaled I networks. While I agree with the authors that in some cases adding this localized E/I balance is beneficial, I believe that a more rigorous comparison between Tuned E+I networks and Scaled I networks is needed: quantification of variance (Fig. 4G) and angle distributions (Fig. 4H) should also be shown for the Scaled I network. Similarly in Fig. 5, what is the Mahalanobis distance for Scaled I networks and how well can the Scaled I network be classified compared to the Tuned E+I network? I suspect that the Scaled I network will actually be better at classifying odors compared to the E+I network. The authors might want to speculate about the benefit of having networks with both sources of inhibition (local and global) and hence being able to switch between locally defined manifolds and discrete attractor states.

      As pointed out already in response to reviewer 1, we agree that the potential computational benefits of continuous manifold representations in comparison to discrete attractor states are an important point that merits further exploration and discussion. We are therefore planning to include a more in-depth discussion and to perform further analyses. The specific suggestions of the reviewer will be addressed.

      At a few points in the manuscript, the authors use statements without actually providing evidence in terms of a Figure. Often the authors themselves acknowledge this, by adding the term "not shown" to the end of the sentence. I believe it will be helpful to the reader to be provided with figures or panels in support of the statements.

      Thank you for this comment. We shall be happy to include additional data figures in the revised manuscript.

    1. Author response:

      eLife assessment

      The authors present an algorithm and workflow for the inference of developmental trajectories from single-cell data, including a mathematical approach to increase computational efficiency. While such efforts are in principle useful, the absence of benchmarking against synthetic data and a wide range of different single-cell data sets make this study incomplete. Based on what is presented, one can neither ultimately judge if this will be an advance over previous work nor whether the approach will be of general applicability.

      We thank the eLife editor for the valuable feedback. We wish to emphasize that both benchmarking against other methods and validation on a synthetic dataset (“dyntoy”) are indeed presented in the Supplementary Note, although we failed to sufficiently emphasize this in the main text.

      We will extend the benchmarking to more TI methods and we will improve the results and discussion sections to present those facts more clearly to the reader.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present tviblindi, a computational workflow for trajectory inference from molecular data at single-cell resolution. The method is based on (i) pseudo-time inference via expected hitting time, (ii) sampling of random walks in a directed acyclic k-NN graph where edges are oriented away from a cell of origin w.r.t. the involved nodes' expected hitting times, and (iii) clustering of the random walks via persistent homology. An extended use case on mass cytometry data shows that tviblindi can be used to elucidate the biology of T cell development.

      Strengths:

      - Overall, the paper is very well written and most (but not all, see below) steps of the tviblindi algorithm are explained well.

      - The T cell biology use case is convincing (at least to me: I'm not an immunologist, only a bioinformatician with a strong interest in immunology).

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point by point below.

      Weaknesses:

      - The main weakness of the paper is that a systematic comparison of tviblindi against other tools for trajectory inference (there are many) is entirely missing. Even though I really like the algorithmic approach underlying tviblindi, I would therefore not recommend to our wet-lab collaborators that they should use tviblindi to analyze their data. The only validation in the manuscript is the T cell development use case. Although this use case is convincing, it does not suffice for showing that the algorithm's results are systematically trustworthy and more meaningful (at least in some dimension) than trajectories inferred with one of the many existing methods.

      We have compared tviblindi to several trajectory inference methods (Supplementary Note, section 8.2: Comparison to state-of-the-art methods), namely Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We will add thorough and systematic comparisons to the other algorithms mentioned by the reviewers, and we will include an extended evaluation on publicly available datasets.

      We have also successfully used tviblindi to investigate human B-cell development in primary immunodeficiency (manuscript in revision) and double-negative T-cell development in ALPS (Autoimmune Lymphoproliferative Syndrome) by mass cytometry (project in progress).

      - The authors' explanation of the random walk clustering via persistent homology in the Results (subsection "Real-time topological interactive clustering") is not detailed enough, essentially only concept dropping. What does "sparse regions" mean here and what does it mean that "persistent homology" is used? The authors should try to better describe this step such that the reader has a chance to get an intuition how the random walk clustering actually works. This is especially important because the selection of sparse regions is done interactively. Therefore, it's crucial that the users understand how this selection affects the results. For this, the authors must manage to provide a better intuition of the maths behind clustering of random walks via persistent homology.

      To serve both types of reader, the biologist and the mathematician, we explain the mathematics in detail in the Supplementary Note, section 4. We will improve the Results text to better point the reader to the mathematical foundations in the Supplementary Note.

      - To motivate their work, the authors write in the introduction that "TI methods often use multiple steps of dimensionality reduction and/or clustering, inadvertently introducing bias. The choice of hyperparameters also fixes the a priori resolution in a way that is difficult to predict." They claim that tviblindi is better than the original methods because "analysis is performed in the original high-dimensional space, avoiding artifacts of dimensionality reduction." However, in the manuscript, tviblindi is tested only on mass cytometry data which has a much lower dimensionality than scRNA-seq data for which most existing trajectory inference methods are designed. Since tviblindi works on a k-NN graph representation of the input data, it is unclear if it could be run on scRNA-seq data without prior dimensionality reduction. For this, cell-cell distances would have to be computed in the original high-dimensional space, which is problematic due to the very high dimensionality of scRNA-seq data. Of course, the authors could explicitly reduce the scope of tviblindi to data of lower dimensionality, but this would have to be stated explicitly.

      In the manuscript we tested the framework on the scRNA-seq data from Park et al. 2020 (DOI: 10.1126/science.aay3224). To illustrate that tviblindi can work directly in the high-dimensional space, we successfully applied the framework to imputed 2000-dimensional data.

      The idea behind tviblindi is to be able to work without non-linear dimensionality reduction techniques, which reduce the data to a very low number of dimensions and whose effects on the data distribution are difficult to predict. On the other hand, using linear dimensionality reduction techniques that effectively suppress noise in the data, such as PCA, is good practice (see also the response to reviewer 2). We will emphasize this in the revised version and add the results of the corresponding analysis.

      - Also tviblindi has at least one hyper-parameter, the number k used to construct the k-NN graphs (there are probably more hidden in the algorithm's subroutines). I did not find a systematic evaluation of the effect of this hyper-parameter.

      A detailed discussion of the topic is presented in the Supplementary Note, section 8.1, where the Spearman correlation coefficient between pseudotime estimated using k=10 and k=50 nearest neighbors was 0.997. The number k does, however, affect the number of candidate endpoints. But even when a larger k causes spurious connections between unrelated cell fates, the topological clustering of random walks allows for the separation of different trajectories. We will expand the “sensitivity to hyperparameters” section, also in response to reviewer 2.
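
      The kind of sensitivity check described above, comparing pseudotime obtained with two values of k, can be sketched as a rank correlation. The snippet below is only an illustration, not the authors' code: the function names are hypothetical and the inputs are assumed to be two pseudotime vectors over the same cells.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)
```

      A value close to 1 means the two settings order the cells almost identically, which is the sense in which the reported coefficient indicates robustness of the pseudotime to k.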

      Reviewer #2 (Public Review):

      Summary:

      In Deconstructing Complexity: A Computational Topology Approach to Trajectory Inference in the Human Thymus with tviblindi, Stuchly et al. propose a new trajectory inference algorithm called tviblindi and a visualization algorithm called vaevictis for single-cell data. The paper utilizes novel and exciting ideas from computational topology coupled with random walk simulations to align single cells onto a continuum. The authors validate the utility of their approach largely using simulated data and establish known protein expression dynamics along CD4/CD8 T cell development in thymus using mass cytometry data. The authors also apply their method to track Treg development in single-cell RNA-sequencing data of human thymus.

      The technical crux of the method is as follows: The authors provide an interactive tool to align single cells along a continuum axis. The method uses expected hitting time (given a user input start cell) to obtain a pseudotime alignment of cells. The pseudotime gives an orientation/direction for each cell, which is then used to simulate random walks. The random walks are then arranged/clustered based on the sparse region in the data they navigate using persistent homology.
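
      The first two steps of this summary (pseudotime from expected hitting time, then random walks oriented by pseudotime) can be illustrated with a toy sketch. This is not the tviblindi implementation: the function names are hypothetical, the k-NN graph is built by brute force, and pseudotime is taken here as the expected number of steps for a simple random walk to first reach the origin, which is one common definition and may differ from the manuscript's exact formulation. The persistent homology clustering step is not covered.

```python
import math
import random


def knn_graph(points, k):
    """Undirected k-NN graph as a dict of neighbor sets (brute force, toy scale)."""
    n = len(points)
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            nbrs[i].add(j)
            nbrs[j].add(i)  # symmetrize so the walk can move both ways
    return nbrs


def hitting_time_pseudotime(nbrs, origin, tol=1e-10, max_sweeps=100_000):
    """Expected number of steps for a simple random walk to first reach `origin`.

    Solves h[origin] = 0 and h[i] = 1 + mean(h[j] for j in nbrs[i]) by
    Gauss-Seidel sweeps. Assumes the graph is connected.
    """
    h = {i: 0.0 for i in nbrs}
    for _ in range(max_sweeps):
        delta = 0.0
        for i in nbrs:
            if i == origin:
                continue
            new = 1.0 + sum(h[j] for j in nbrs[i]) / len(nbrs[i])
            delta = max(delta, abs(new - h[i]))
            h[i] = new
        if delta < tol:
            break
    return h


def directed_random_walk(nbrs, pseudotime, origin, rng):
    """Walk from `origin`, moving only along edges that increase pseudotime."""
    path = [origin]
    while True:
        cur = path[-1]
        forward = sorted(j for j in nbrs[cur] if pseudotime[j] > pseudotime[cur])
        if not forward:
            return path  # no forward edge left: a candidate endpoint
        path.append(rng.choice(forward))
```

      Because every step strictly increases pseudotime, the oriented graph has no cycles and each walk terminates at a node without forward neighbors.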

      We thank the reviewer for the feedback and suggestions, which we will accommodate. We respond point by point below.

      Strengths:

      The notion of using persistent homology to group random walks to identify trajectories in the data is novel.

      The strength of the method lies in the implementation details that make computationally demanding ideas such as persistent homology more tractable for large scale single-cell data.

      This enables the authors to make the method more user friendly and interactive allowing real-time user query with the data.

      Weaknesses:

      The interactive nature of the tool is also a weakness, as it allows for user bias, possibly leading to overfitting to a specific dataset.

      tviblindi is not designed as a fully automated TI tool (although it implements a fully automated module), but as a data driven framework for exploratory analysis of unknown data. There is always a risk of possible bias in this type of analysis - starting with experimental design, choice of hyperparameters in the downstream analysis, and an expert interpretation of the results. The successful analysis of new biological data involves a great deal of expert knowledge which is difficult to a priori include in the computational models.

      tviblindi tries to solve this challenge by intentionally overfitting the data and keeping the level of resolution at a single random walk. In this way we aim to capture all putative local relationships in the data. The on-demand aggregation of the walks using the global topology of the data allows researchers to use their expert knowledge to choose the right level of detail (as demonstrated in Figure 4 of the manuscript) while relying on the topological structure of the high-dimensional point cloud. At all times tviblindi allows the user to inspect the composition of the trajectory to assess the variance in the development, possible hubs on the KNN-graph, etc.

      The main weakness of the method is the lack of benchmarking on real data and of comparison to other methods. Trajectory inference is a very crowded field with many highly successful and widely used algorithms; the two most relevant ones (closest to this manuscript) are not only not benchmarked against, but also not cited. These include methods that specifically use persistent homology to discover trajectories (Rizvi et al., published in Nat Biotech 2017) and methods that specifically implement the idea of simulating random walks to identify stable states in single-cell data (e.g. CellRank, published in Lange et al., Nat Meth 2022), as well as many trajectory algorithms that take alternative approaches. The paper has much less benchmarking, demonstration on real data and comparison to the very many other previous trajectory algorithms published before it. Generally speaking, in a crowded field of previously published trajectory methods, I do not think this one approach will compete well against prior work, especially due to its inability to handle the noise typical in real-world data (as was even demonstrated in the little bit of application to real-world data provided).

      We provide comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We use two datasets: the artificial Dyntoy dataset and a real mass cytometry thymus+peripheral blood dataset. We thank the reviewer for suggesting specific methods. CellRank was excluded from the benchmarking as it was originally designed for RNA-velocity data (not available in mass cytometry data), but we will include the recent upgrade CellRank2 (preprint at doi.org/10.1101/2023.07.19.549685), which offers more flexibility.

      We will add further benchmarking as suggested by the reviewer in the course of revisions.

      Beyond the general lack of benchmarking, there are two issues that give me particular concern. As previously mentioned, the algorithm is highly susceptible to user bias and overfitting. The paper gives the example (Figure 4) of a trajectory which mistakenly shows that cells may pass from an apoptotic phase to a different developmental stage. To circumvent this mistake, the authors propose the interactive version of tviblindi that allows users to zoom in (increase resolution) and identify that there are in fact two trajectories in one. In this case, the authors show how the user can fix a mistake when the answer is known. However, the point of trajectory inference is to discover the unknown. With so many interactive options for the user to guide the result, the method is more user/bias driven than data-driven. So a rigorous and quantitative discussion of the robustness of the method, as well as of how to ensure data-driven inference and avoid overfitting, would be useful.

      Local directionality in expression data is a challenge which is not, to our knowledge, solved, and we are not sure it can be solved entirely, even theoretically. The random walks passing “through” the apoptotic phase are biologically infeasible, but they are an (unbiased) representation of what the data look like based on the diffusion model. This is a property of the data (or of the panel design), which has to be interpreted properly, rather than a mistake. Of note, except for Monocle3 (which does not provide directionality), the other tested methods did not discover this trajectory at all.

      The “zoom in” has in fact nothing to do with “passing through the apoptosis”. We show how the researcher can investigate the suggested trajectory to see if there is an additional structure of interest and/or relevance. This investigation is still data driven (although not fully automated). Anecdotally, in this particular case the branching was discovered by a bioinformatician who knew nothing about the presence of beta-selection in the data.

      We show that the trajectory of apoptosis of cortical thymocytes consists of 2 trajectories corresponding to 2 different checkpoints (beta-selection and positive/negative selection). This type of structure, where 2 (or more) trajectories share the same path for most of the time, then diverge only to be connected at a later moment (immediately from the point of view of the beta-selection failure trajectory), is a challenge for TI algorithms, and none of the tested methods gave a correct result. More importantly, there seems to be no clear way to focus on these kinds of structures (common origin and common fate) in TI methods.

      Of note, the “zoom in” is a recommended and convenient method to look for an inner structure, but it does not necessarily mean addition of further homological classes. Indeed, in this case the reason that the structure is not visible directly is the limitation of the dendrogram complexity (only branches containing at least 10% of simulated random walks are shown by default).

      In summary, tviblindi effectively handled all noise in the data that obscured biologically valid trajectories for other methods. We will improve the discussion of the robustness in the revised version.

      Second, the paper discusses the benefit of tviblindi operating in the original high dimensions of the data. This is perhaps adequate for mass cytometry data, where there is less of an issue of dropouts and the proteins may be chosen to be largely independent. But in the context of single-cell RNA-sequencing data, the massive undersampling of mRNA, as well as a high degree of noise (e.g. ambient RNA), introduces a very large degree of noise, so that modeling data in the original high dimensions leads to methods being fit to the noise. Therefore ALL other methods for trajectory inference work in a lower dimension, for very good reason; otherwise one is learning noise rather than signal. It would be great to have a discussion on the feasibility of the method as is for such noisy data and to provide users with guidance. We note that the example scRNA-seq data included in the paper is denoised using imputation, which will likely result in the trajectory inference being oversmoothed as well.

      We agree with the reviewer. In our manuscript we wanted to showcase that tviblindi can operate directly in high-dimensional space (thousands of dimensions) and we used MAGIC imputation for this purpose. This was not ideal. A more standard approach, which uses 30-50 PCs as input to the algorithm, resulted in equivalent trajectories. We will add this analysis to the study.

      In summary, the fact that tviblindi scales well with dimensionality of the data and is able to work in the original space does not mean that it is always the best option. We will emphasize in the revised paper that we aim to avoid the non-linear dimensional reduction techniques as a data preprocessing tool, as the effect of the reduction is difficult to predict. We will also discuss the preprocessing of scRNA-seq data in greater detail.

      Reviewer #3 (Public Review):

      Summary:

      Stuchly et al. proposed a single-cell trajectory inference tool, tviblindi, which was built on a sequential implementation of the k-nearest neighbor graph, random walk, persistent homology and clustering, and interactive visualization. The paper was organized around the detailed illustration of the usage and interpretation of results through the human thymus system.

      Strengths:

      Overall, I found the paper and method to be practical and needed in the field. Especially the in-depth, step-by-step demonstration of the application of tviblindi in numerous T cell development trajectories and how to interpret and validate the findings can be a template for many basic science and disease-related studies. The videos are also very helpful in showcasing how the tool works.

      Weaknesses:

      I only have a few minor suggestions that hopefully can make the paper easier to follow and the advantage of the method to be more convincing.

      (1) The "Computational method for the TI and interrogation - tviblindi" subsection under the Results is a little hard to follow without having a thorough understanding of the tviblindi algorithm procedures. I would suggest that the authors discuss the uniqueness and advantages of the tool after the detailed introduction of the method (moving it after the "Connectome - a fully automated pipeline").

      We thank the reviewer for the suggestion and we will accommodate it to improve readability of the text.

      Also, considering it is a computational tool paper, inevitably, readers are curious about how it functions compared to other popular trajectory inference approaches. I did not find any formal discussion until almost the end of the supplementary note (even that is not cited anywhere in the main text). Authors may consider improving the summary of the advantages of tviblindi by incorporating concrete quantitative comparisons with other trajectory tools.

      We provide comparisons of tviblindi and vaevictis in the Supplementary Note, section 8.2, where we compare it to Monocle3 (v1.3.1) Cao et al. (2019), Stream (v1.1) Chen et al. (2019), Palantir (v1.0.0) Setty et al. (2019), VIA (v0.1.89) Stassen et al. (2021) and PAGA (scanpy==1.9.3) Wolf et al. (2019). We use two datasets: the artificial Dyntoy dataset and a real mass cytometry thymus+peripheral blood dataset. We will also add CellRank2 to the comparisons and we will strengthen the message of the benchmarking results in the Discussion section.

      (2) Regarding the discussion of Figure 4, where the trajectory goes through the apoptotic stage and reconnects back to the canonical trajectory with counterintuitive directionality: it can be a checkpoint, as the authors interpret using their expert knowledge, or maybe a false discovery of the tool. Maybe the authors can consider running other algorithms on those cells to see which tracks they identify and whether the directionality matches that of tviblindi.

      We have indeed used the thymus dataset for comparison of all TI algorithms listed above. Except for Monocle3, they failed to discover the negative selection branch (Monocle3 does not offer directionality information). Therefore, a valid topological trajectory with incorrect (expert-corrected) directionality was partly or entirely missed by other algorithms.

      (3) The paper mainly focused on mass cytometry data and had a brief discussion on scRNA-seq. Can the tool be applied to multimodality data such as CITE-seq data that have both protein markers and gene expression? Any suggestions if users want to adapt to scATAC-seq or other epigenomic data?

      The analysis of multimodal data is the logical next step and is the topic of our current research. At this moment tviblindi cannot be applied directly to multimodal data. It is possible to use the KNN-graph based on multimodal data (such as weighted nearest neighbor graph implemented in Seurat) for pseudotime calculation and random walk simulation. However, we do not have a fully developed triangulation for the multimodal case yet.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by disassociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of the USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs.

      Excellent suggestions. The EM imaging indeed revealed an increase in enlarged cellular vesicles containing various contents in usp-50 mutants. However, the detailed molecular features of these vesicles remain unclear. Therefore, we plan to utilize ESCRT components for double staining with early or late endosome markers. This will enable us to accurately characterize the anomalous structures detected in the usp-50 mutants.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta which likely represents late endosomes are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths:

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses:

      -The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion.

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      -The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to which extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model.

      Excellent point. We plan to conduct additional genetic analyses, including the construction of double mutants between usp-50 and various rabex-5 mutations, to further elucidate the extent to which USP8 regulates endosome maturation via Rabex5.

      Reviewer #3 (Public Review):

      Summary:

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size in lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes facilitating endosome maturation.

      Weaknesses:

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approximation is somehow difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript.

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. In contrast, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it.

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Sup Figure 4, Sup Figure 5, and Sup Figure 7). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed enhanced RABX-5/Rabex5 signaling and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by cultured mammalian cell studies, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosome/late endosomes. To elucidate the underlying mechanisms, we investigated the formation of multivesicular bodies (MVBs), a process tightly linked to USP8 function. Extensive electron microscopy (EM) analysis indicated that MVB-like structures are largely intact in usp-50 mutant cells, suggesting that USP8/USP-50 likely regulate lysosome formation through alternative pathways in addition to their roles in MVB formation and ESCRT component function. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Interestingly, loss-of-function mutations in usp8 often lead to the enlargement of early endosomes, yet the mechanisms underlying this phenomenon remain unclear. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged MVB-like vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, while early endosomes are significantly enlarged. Given that Rab5 guanine nucleotide exchange factor (GEF), Rabex5, is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase of endosomal Rabex5 in response to usp8/usp-50 loss-of-function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Comprehensive genetics and biochemistry experiments demonstrated that USP8 acts through K323 site de-ubiquitination to dissociate Rabex5 from early endosomes and promotes the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

    1. Author response:

      We would like to thank all the reviewers and editors for their thoughtful and detailed comments, critiques and suggestions. We will revise our manuscript in accordance with all the points raised by the reviewers. Here we summarize some of the main points that we intend to address in our revised manuscript.

      The reviewers noted that we were not sufficiently careful in identifying possible exogenous cues that the mice might be using to locate the food and that we did not consider why such cues might be ineffective. As the reviewers point out, the mice may be ignoring the visual landmarks (and floor scratches) because they are not reliable cues and their relation to the food varies with the entrance the mice have used. In particular, a reviewer refers to papers that show that “in environments with 'unreliable' landmarks, place cells are not controlled by landmarks”. These papers were known to the authors but failed to make the final cut of our extensive discussion. This important point will be thoroughly addressed.

      Another critical point was that the mice often exhibited thigmotaxis. The literature on thigmotaxis was known to us and we will now directly refer to this point. We do note that the final average start-to-food trajectory (TEV) is directly to the food. In other words, the thigmotactic trajectories and “towards the center” trajectories effectively average out.

      There was a very cogent point about the difficulty of totally eliminating odor cues that we will now address. Finally, based on studies using a virtual reality environment, one reviewer questioned the use of “path integration” as a signal that encodes goal location. The relevance of path integration to spatial learning and performance is a very difficult issue that, to our knowledge, has never been entirely settled in the vast spatial learning literature. We do not think that our data can “settle” this issue but will try to at least be explicit about the complexity of the path integration hypothesis as it applies to both our own data and the virtual reality literature. In particular, we will discuss the potential roles of optic flow versus proprioceptive and vestibular inputs to a putative path integration mechanism.

      Finally, the reviewers raised many important technical points re statistics reporting and how the figures are presented. In our revision, we will completely comply with all these helpful critiques.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments provide compelling evidence that conditional deletion of Vglut2 in noradrenergic neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. This study provides an important contribution to our understanding of how noradrenergic neurons regulate respiratory homeostasis in conscious adult mice.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Chang et al. provide glutamate co-expression profiles in the central noradrenergic system and test the requirement of Vglut2-based glutamatergic release in respiratory and metabolic activity under physiologically relevant gas challenges. Their experiments show that conditional deletion of Vglut2 in NA neurons does not impact steady-state breathing or metabolic activity in room air, hypercapnia, or hypoxia. Their observations challenge the importance of glutamatergic signaling from Vglut2 expressing NA neurons in normal respiratory homeostasis in conscious adult mice.

      Strengths:

      The comprehensive Vglut1, Vglut2, and Vglut3 co-expression profiles in the central noradrenergic system and the combined measurements of breathing and oxygen consumption are two major strengths of this study. Observations from these experiments provide previously undescribed insights into (1) expression patterns for subtypes of the vesicular glutamate transporter protein in the noradrenergic system and (2) the dispensable nature of Vglut2-dependent glutamate signaling from noradrenergic neurons to breathing responses to physiologically relevant gas challenges in adult conscious mice.

      Weaknesses:

      Although the cellular expression profiles for the vesicular glutamate transporters are provided, the study fails to document that glutamatergic-based signaling originating from noradrenergic neurons is evident at the cellular level under normal, hypoxic, and/or hypercapnic conditions. This limits the reader's understanding of why conditional Vglut2 knockdown is dispensable for breathing under the conditions tested.

      We thank the reviewers for their positive evaluation of our work. First, we would like to highlight that multiple studies have provided anatomical evidence of innervation of multiple cardio-respiratory nuclei by Vglut2+ noradrenergic fibers. Thus, the anatomical substrates are present for noradrenergic-based Vglut2 signaling to either play a direct role in breathing control or, upon perturbation, to indirectly affect breathing through disrupted metabolic or cardiovascular control. We have included supplemental table 1 that summarizes central noradrenergic Vglut2+ innervations of respiratory and autonomic nuclei. Additionally, ultrastructural evidence shows asymmetric synaptic contacts assuming glutamatergic transmission between C1 neurons and LC, A1, A2 and the dorsal motor nucleus of the vagus (DMV) (Milner et al., 1989; Abbott et al., 2012; Holloway et al., 2013; DePuy et al., 2013).

      Functionally, electrophysiological evidence showed that photostimulating C1 neurons activates LC, A1, A2 noradrenergic neurons monosynaptically by releasing glutamate (Holloway et al., 2013; DePuy et al., 2013) and that optogenetic stimulation of LC neurons excites the downstream parabrachial nucleus (PBN) neurons by releasing glutamate. Thus, at least the glutamatergic signaling from C1 and LC noradrenergic neurons (two noradrenergic nuclei that have been shown to play a role in breathing control) is evident at the cellular level under normal conditions. Other evidence, highlighted in our manuscript, is more circumstantial.

      Reviewer #2 (Public Review):

      The authors characterized the recombinase-based cumulative fate maps for vesicular glutamate transporters (Vglut1, Vglut2 and Vglut3) expression and compared those maps to their real-time expression profiles in central NA neurons by RNA in situ hybridization in adult mice. The authors have revealed a new and intriguing expression pattern for Vglut2, along with an entirely uncharted co-expression domain for Vglut3 within central noradrenergic neurons. Interestingly, and in contrast to previous studies, the authors demonstrated that glutamatergic signaling in central noradrenergic neurons does not exert any influence on breathing and metabolic control either under normoxic/normocapnic conditions or after chemoreflex stimulation. Also, they showed for the first time the Vglut3-expressing NA population in C2/A2 nuclei. In addition, they were also able to demonstrate Vglut2 expression in anterior NA populations, such as LC neurons, by using more refined techniques, unlike previous studies.

      A major strength of the study is the use of a set of techniques to investigate the participation of NA-based glutamatergic signaling in breathing and metabolic control. The authors provided a full characterization of the recombinase-based cumulative fate maps for Vglut transporters. They performed real-time mRNA expression of Vglut transporters in central NA neurons of adult mice. Further, they evaluated the effect of knocking down Vglut2 expression in NA neurons using DBH-Cre; Vglut2 cKO mice on breathing and metabolic control in unanesthetized mice. Finally, they injected the AAV virus containing Cre-dependent Td tomato into LC of v-Glut2 Cre mice to verify the VGlut2 expression in LC-NA neurons. A very positive aspect of the article is that the authors combined ventilation with metabolic measurements. This integration holds particular significance, especially when delving into the exploration of respiratory chemosensitivity. Furthermore, the sample size of the experiments is excellent.

      Despite the clear strengths of the paper, some weaknesses exist. It is not clear in the manuscript if the experiments were performed in males and females and if the data were combined. I believe that the study would have benefited from a more comprehensive analysis exploring the sex specific differences. The reason I think this is particularly relevant is the developmental disorders mentioned by the authors, such as SIDS and Rett syndrome, which could potentially arise from disruptions in central noradrenergic (NA) function, exhibit varying degrees of sex predominance. Moreover, some of the noradrenergic cell groups are sexually dimorphic. For instance, female Wistar rats exhibit a larger LC size and more LC-NA neurons than male subjects (Pinos et al., 2001; Garcia-Falgueras et al., 2005). More recently, a detailed transcriptional profiling investigation has unveiled the identities of over 3,000 genes in the LC. This revelation has highlighted significant sexual dimorphisms, with more than 100 genes exhibiting differential expression within LC-NA neurons at the transcript level. Furthermore, this investigation has convincingly showcased that these distinct gene expression patterns have the capacity to elicit disparate behavioral responses between sexes (Mulvey et al., 2018). Therefore, the authors should compare the fate maps, Vglut transporters in males and females, at least considering LC-NA neurons. Even in the absence of identified sex differences, this information retains significant importance.

      All experiments contained both males and females as described in the original submission. In our analysis of breathing and metabolism, sex was included in the analysis and no significant phenotypic difference was observed. For the fate map and in situ experiments, we did not see obvious differences in the expression patterns in the three glutamate transporters between females and males, though the group size is small. Though all the anatomical and phenotypic data in this manuscript are presented as combined graphs, we have differentially labeled our data points by sex. The reviewer does raise important questions regarding possible sexual dimorphisms in the central noradrenergic system and whether such dimorphisms may extend to glutamate transporter co-expression. Our thorough interrogation of respiratory-metabolic parameters fails to reveal any sex specific differences in control or experimental mice. Thus, it is unclear if any of the previously described and cited dimorphisms are functionally relevant in this setting. Given the large differences in the real time expression and cumulative fate maps of Vglut2, a worthwhile interrogation of differential glutamate transporter expression would be best served by longitudinal studies with large group sizes across age as it is not clear what underlies the dynamic VGlut2 expression changes. Such changes may at times be greater in males and other times in females, driven by experience or physiological challenges etc., but resulting in averaged cumulative fatemaps that are similar between sexes. Such a longitudinal quantitative study of real-time and fatemapped cell populations across the central NA system would be of a scale that is beyond the scope of this report, especially when no phenotypic changes have been observed in our respiratory data.

      An important point well raised by the authors is that although suggestive, these experiments do not definitively rule out that NA-Vglut2 based glutamatergic signaling has a role in breathing control. Subsequent experiments will be necessary to validate this hypothesis.

      As noted, we discuss that we only address requirement, not sufficiency, of NA Vglut2 in breathing. Functional sufficiency experiments usually involve increasing the relevant output. However, these experiments can lead to non-specific, pleiotropic effects that would be difficult to disambiguate, even if done with high cellular specificity. Viral or genetic overexpression of Vglut2 in NA neurons may be a feasible approach. Conditional ablation of TH or DBH with concurrent chemo or optogenetic stimulation may also be informative. These approaches would require significant investments in mouse model generation and suffer additional experimental limitations.

      An improvement could be made in terms of measuring body temperature. Opting for implanted sensors over rectal probes would circumvent the need to open the chamber, thereby preventing alterations in gas composition during respiratory measurements. Further, what happens to body temperature phenotype in these animals under different gas exposures? These data should be included in the Tables.

      While surgical implantation of sensors would provide a more direct assessment of temperature, it requires components that were not available at the time of the study and addresses a question (temperature changes during a time course of gas exposure) that goes beyond the scope of the current work focused on respiratory response. As we have done for prior experiments (Martinez et al., 2019; Ray et al., 2011), the body temperature was measured immediately before and after measuring breathing only. Our flow-through system using inline gas sensors (AEI P-61B CO2 sensor and AEI N-22M O2 sensor) ensures that gas challenges were constant and consistent across all measurements. Any disruption in gas composition would have been noted by our software analysis system, Breathe Easy, and the data rejected. We did not observe any such perturbations.

      Is it plausible that another neurotransmitter within NA neurons might be released in higher amounts in DBH-Cre; Vglut2 cKO mice to compensate for the deficiency in glutamate and prevent changes in ventilation?

      We agree that compensation is always a possibility at the synaptic, cellular, and circuit levels that may involve a variety of transcriptional, translational, cellular, and circuit mechanisms (i.e., synaptic strength). This could be interrogated by combining multiple conditional alleles and recombinase drivers for various transmitters and receptors, but would, in our experience, take multiple years for the requisite breeding to be completed.

      Continuing along the same line of inquiry is there a possibility that Vglut2 cKO from NA neurons not only eliminates glutamate release but also reduces NA release? A similar mechanism was previously found in VGLUT2 cKO from DA neurons in previous studies (Alsio et al., 2011; Fortin et al., 2012; Hnasko et al., 2010). Additionally, does glutamate play a role in the vesicular loading of NA? Therefore, could the lack of effect on breathing be explained by the lack of noradrenaline and not glutamate?

      These are all excellent points, but prior studies suggest that reductions in NA signaling would itself have an apparent effect (Zanella et al., 2006; Kuo et al., 2016). Although several studies showed that LC and C1 NA neurons co-release noradrenaline and glutamate, no direct evidence yet makes clear that glutamate facilitates NA release or vice versa. However, it would be of great interest to test in the future whether reduced or absent NA compensated for loss of glutamate. We do fully acknowledge in the manuscript that any number of compensatory events could be at play in these findings.

      Reviewer #3 (Public Review):

      Summary:

      The authors, Y Chang and colleagues, have performed elegant studies in transgenic mouse models that were designed to examine glutamatergic transmission in noradrenergic neurons, with a focus on respiratory regulation. They generated 3 different transgenic lines, in which a red fluorophore was expressed in dopamine-B-hydroxylase (DBH; noradrenergic and adrenergic neurons) neurons that did not express a vesicular glutamate transporter (Vglut) and a green fluorophore in DBH neurons that did express one of either Vglut1, Vglut2 or Vglut3.

      Further experiments generated a transgenic mouse with knockout of Vglut2 in DBH neurons. The authors used plethysmography to measure respiratory parameters in conscious, unrestrained mice in response to various challenges.

      Strengths:

      The distribution of the Vglut expression is broadly in agreement with other studies, but with the addition of some novel Vglut3 expression. Validation of the transgenic results, using in situ hybridization histochemistry to examine mRNA expression, revealed potential modulation of Vglut2 expression during phases of development. This dataset is comprehensive, well-presented and very useful.

      In the physiological studies the authors observed that neither baseline respiratory parameters, nor respiratory responses to hypercapnia (5, 7, 10% CO2) or hypoxia (10% O2) were different between knockout mice and littermate controls. The studies are well-designed and comprehensive. They provide observations that are supportive of previous reports using similar methodology.

      Weaknesses:

      In relation to the expression of Vglut2, the authors conclude that modulation of expression occurs, such that in adulthood there are differences in expression patterns in some (nor)adrenergic cell groups. Altered sensitivity is provided as an explanation for different results between studies examining mRNA expression. These are likely explanations; however, the conclusion would really be definitive with inclusion of a conditional cre expressing mouse. Given the effort taken to generate this dataset, it seems to me that taking that extra step would be of value for the overall understanding of glutamatergic expression in these catecholaminergic neurons.

      The seemingly dynamic Vglut2 expression pattern across the NA system is intriguing. As noted in our comments to reviewer 2, a robust age-dependent interrogation would require a large magnitude study. The reviewer correctly points out that a temporally controlled recombinase fate mapping experiment would offer greater insight into the dynamic expression of Vglut2. We strongly agree with that idea and did work to develop a Vglut2-CreER targeted allele that, despite our many other successes in mouse genetic engineering (Lusk et al., 2022; Sun and Ray, 2016), did not succeed on the first attempt. We aim to complete the line in the near future so that we may better understand the Vglut2 expression pattern in central noradrenergic neurons in a time-specific and sex-specific manner.

      The respiratory physiology is very convincing and provides clear support for the view that Vglut2 is not required for modulation of the respiratory parameters measured and the reflex responses tested. It is stated that this is surprising. However, comparison with the data from Abbott et al., Eur J Neurosci (2014) in which the same transgenic approach was used, shows that they also observed no change in baseline breathing frequency. Differences were observed with strong, coordinated optogenetic stimulation, but, as discussed in this manuscript, it is not clear what physiological function this is relevant to. It just shows that some C1 neurons can use glutamate as a signaling molecule. Further, Holloway et al., Eur J Neurosci (2015), using the same transgenic mouse approach, showed that the respiratory response to optogenetic activation of Phox2 expressing neurons is not altered in DBH-Vglut2 KO mice. The conclusion seems to be that some C1 neuron effects are reliant upon glutamatergic transmission (C1DMV for example), and some not.

      We agree that activation of C1 neurons may be sufficient to modulate breathing when artificially stimulated and that such stimulation relies on glutamatergic transmission for its effect. This is why we find our results surprising and important in clarifying for the field that glutamatergic signaling in noradrenergic cells is dispensable for breathing and hypoxic and hypercapnic responses under physiological conditions.

      Further contrast is made in this manuscript to the work of Malheiros-Lima and colleagues (eLife 2020) who showed that the activation of abdominal expiratory nerve activity in response to peripheral chemoreceptor activation with cyanide was dependent upon C1 neurons and could be attenuated by blockade of glutamate receptors in the pFRG - i.e. the supposition that glutamate release from C1 neurons was responsible for the function. However, it is interesting to observe that diaphragm EMG responses to hypercapnia (10% CO2) or cyanide, and the expiratory activation to hypercapnia, were not affected by the glutamate receptor blockade. Thus, a very specific response is affected and one that was not measured in the current study.

      As we mention above, we do not dispute that glutamate signaling can be manipulated to create a response in non-physiological conditions – we suggest that framing the interpretation around the glutamatergic role in a model that better matches physiological conditions should inform our interpretation. Furthermore, we do include an examination of expiratory flow – which was not impacted by loss of glutamatergic activity in NA neurons – which would be likely to have been impacted if abdominal expiratory nerve activity was modified.

      These previously published observations are consistent with the current study, which provides a more comprehensive analysis of the role of glutamatergic contributions to respiratory physiology. A more nuanced discussion of the data and acknowledgement of the differences, which are not actually at odds, would improve the paper and place the information within a more comprehensive model.

      Thank you for the comments. As noted in the original and extended discussion, we respectfully disagree with the perspective that our results align with prior results.

      Recommendations for the authors:

      The three reviewers believe this is an important study. They have numerous suggestions for improvement of the manuscript (outlined below), but no new experiments are required. The Editor requests some nomenclature changes as indicated in attachment 1.

      Reviewer #1 (Recommendations For The Authors):

      Abstract/Introduction: Although the need for this study is obvious, it is important that the authors explicitly communicate to the reader the working hypothesis they held before the start of the work. In the current form, it is unclear whether the authors aimed to test the hypothesis that glutamatergic signaling from noradrenergic neurons is important to breathing or the hypothesis that glutamatergic signaling from noradrenergic neurons is not important to breathing. If it is the latter (that it is not important), then the study (related to the breathing measurements) is poorly justified and designed, as additional orthogonal approaches (e.g., actual measurements of glutamatergic signaling at the cellular level) are almost requisite. If the authors' hypothesis was originally based on existing literature suggesting that glutamatergic signaling from noradrenergic neurons is important to breathing, then the experimental design is appropriate.

      Thank you for the suggestion. The working hypothesis has been added in the abstract (line 24-25) and the introduction (line 92-94), making clear that we initially hypothesized that glutamatergic signaling from noradrenergic neurons is important in breathing.

      Results: While the steady-state measurements for breathing metrics are clearly important in defining how glutamatergic signaling may contribute to pulmonary function, glutamatergic signaling may have a greater role in the dynamics of breathing patterns (i.e., regularity of the breathing rhythms). Such traits can be described using SD1 and SD2 from Poincaré maps and/or entropy measurements. Such an analysis should be performed.

      Thank you for the suggestion. The dynamic patterns of respiratory rate (Vf), tidal volume (VT), minute ventilation (VE), inspiratory duration (TI), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/TI), expiratory flow rate (VT/TE) have been shown as Poincaré plots and quantified and tested using the SD1 and SD2 statistics in the supplemental figures of Figure 4-7.
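
      For readers unfamiliar with these statistics, the following is a minimal sketch of how SD1 and SD2 are commonly computed from a breath-by-breath series, assuming the standard definition in which each value is plotted against the next one. The function name and the example values are hypothetical; this is an illustration of the general technique, not the authors' analysis code.

```python
import math


def poincare_sd1_sd2(series):
    """Return (SD1, SD2) for a Poincare plot of x[n] versus x[n+1].

    SD1 is the spread perpendicular to the line of identity
    (breath-to-breath variability); SD2 is the spread along it
    (longer-term variability). Sample standard deviations are used.
    """
    pairs = list(zip(series[:-1], series[1:]))
    if len(pairs) < 2:
        raise ValueError("need at least three values to form two pairs")

    # Rotate each (x[n], x[n+1]) point by 45 degrees.
    across = [(nxt - cur) / math.sqrt(2) for cur, nxt in pairs]
    along = [(nxt + cur) / math.sqrt(2) for cur, nxt in pairs]

    def sample_sd(values):
        mean = sum(values) / len(values)
        return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

    return sample_sd(across), sample_sd(along)


# Hypothetical breath cycle durations (seconds), for illustration only.
sd1, sd2 = poincare_sd1_sd2([0.30, 0.32, 0.29, 0.31, 0.30, 0.33])
```

      A perfectly alternating series yields a large SD1 and an SD2 of zero, whereas a steadily drifting series yields the opposite, which is why the two statistics together describe the regularity of the rhythm.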

      Results: Analyses of Inspiratory time (Ti) and flow rate (i.e., Tidal Volume / Ti) should be assessed and included.

      Thank you for the suggestion. Inspiratory duration (Ti), expiratory duration (TE), breath cycle duration (TTOT), inspiratory flow rate (VT/Ti), and expiratory flow rate (VT/TE) have been included in the Figures 4-7.
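
      The relationships among these per-breath quantities can be sketched as follows. This assumes the conventional definitions (TTOT as the sum of TI and TE, frequency as 60 divided by TTOT in seconds, and minute ventilation as tidal volume times frequency); the function name, units, and example values are hypothetical and are not taken from the manuscript.

```python
def breath_metrics(ti_s, te_s, vt):
    """Derive per-breath timing and flow metrics.

    ti_s, te_s: inspiratory and expiratory durations in seconds.
    vt: tidal volume in whatever volume unit the recording uses.
    """
    if ti_s <= 0 or te_s <= 0:
        raise ValueError("durations must be positive")
    ttot_s = ti_s + te_s                 # breath cycle duration (TTOT)
    vf = 60.0 / ttot_s                   # respiratory rate, breaths per minute
    return {
        "TTOT": ttot_s,
        "Vf": vf,
        "VE": vt * vf,                   # minute ventilation, volume per minute
        "VT/TI": vt / ti_s,              # inspiratory flow rate
        "VT/TE": vt / te_s,              # expiratory flow rate
    }


# Hypothetical single breath, for illustration only.
metrics = breath_metrics(ti_s=0.1, te_s=0.2, vt=0.3)
```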

      Results/Methods: If similar analytical approaches were used in the current study as to that in Lusk et al. 2022, it appears that data was discontinuously sampled, rejecting periods of movement and only including periods of quiescent breathing. Were the periods of quiescent breathing different? Information should be provided to describe the total sampling duration included.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5 minutes of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5-minute epochs following initiation of the gas exposure separately, i.e., epoch 1 = 5-10min, epoch 2 = 10-15min, and epoch 3 = 15-20min. All breaths included as quiescent breathing were analyzed in the aggregate for each group and experimental condition; we did not compare individual periods of quiescent breathing within or across an animal(s)/group(s)/experimental condition(s). We have added the details in the Materials and Methods (line 637-642).
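
      The windowing rules described above can be sketched as follows. The condition labels, the function names, the half-open interval convention, and the requirement that the caller supply the total exposure duration are all assumptions made for illustration; the exposure durations themselves are not stated in this response.

```python
def analysis_windows(condition, exposure_min):
    """Return a list of (start, end) windows in minutes from gas onset.

    condition: "room_air", "hypercapnia", or "hypoxia" (hypothetical labels).
    exposure_min: total duration of the gas exposure, supplied by the caller.
    """
    if condition == "room_air":
        return [(0, exposure_min)]                    # entire condition
    if condition == "hypercapnia":
        return [(exposure_min - 5, exposure_min)]     # last 5 minutes only
    if condition == "hypoxia":
        return [(5, 10), (10, 15), (15, 20)]          # three 5-minute epochs
    raise ValueError(f"unknown condition: {condition}")


def select_breaths(breath_times_min, window):
    """Keep breaths whose onset time falls in the half-open window [start, end)."""
    start, end = window
    return [t for t in breath_times_min if start <= t < end]


# Hypothetical breath onset times (minutes), for illustration only.
epoch1 = select_breaths([1, 5, 7.5, 10, 19.9, 20], analysis_windows("hypoxia", 20)[0])
```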

      Results: As mice were conscious in this study, were sniff periods (transient periods of fast breathing, i.e.,>8Hz) included in the analysis?

      No, only regular quiescent breathing periods were included in the analysis.

      Discussion: The authors need to discuss the limitations of their findings.

      • How should the reader interpret the findings? Concluding that glutamatergic signaling is dispensable implies that it occurs in room air, hypoxia, and hypercapnia.

      We have edited our discussion for clarity to highlight our conclusions that Vglut2-based glutamatergic signaling from noradrenergic neurons is ultimately dispensable for baseline breathing and hypercapnia and hypoxic chemoreflex in unanesthetized and unrestrained mice.

      • Assuming that glutamatergic signaling is active during the conditions tested, then the authors should discuss what may be the potential compensations.

      We have provided additional discussion surrounding potential compensatory events that may have taken place and could result in the unchanged phenotype in the experimental group.

      • The authors need to discuss how age and state of consciousness may play a role in their finds. The current discussion gives the impression that their findings are broadly applicable in all cases, but the lack of differences in this study may not hold true under different conditions.

      The study was done in adult (6–8-week-old) unanesthetized and unrestrained mice. In the discussion (line 472-474), we highlight that in our unpublished results, loss of NA-expressed Vglut2 does not change the survival curve in P7 neonate mice undergoing repeated bouts of autoresuscitation until death. Thus, we believe that Vglut2-based glutamatergic signaling in central NA neurons is dispensable for baseline breathing and the hypercapnic and hypoxic chemoreflexes in unanesthetized and unrestrained mice across different ages. Otherwise, we do not imply that we have interrogated any other aspects of breathing in our discussion.

      Methods: Further description of the analysis window for the respiratory metrics should be provided. Were breath values for each condition taken throughout the entire condition? This is particularly important for hypoxia, where the stereotypical respiratory response is biphasic.

      For room air, the entire gas condition was used for data analysis. For hypercapnia (5% CO2, 7% CO2, 10% CO2), only the last 5min of the gas challenge period was used for data analysis. For hypoxia (10% O2), we analyzed the breathing trace of three 5min time periods separately (5-10min, 10-15min, and 15-20min) during the hypoxic challenge. As noted in our original manuscript, we graph and assess three 5min epochs during hypoxic exposure to capture the dynamic nature of the hypoxic ventilatory response. We have added the details in the Materials and Methods (line 637-642).

      Methods: How was consciousness determined?

      The conscious mice mentioned in the manuscript refer to the mice without anesthesia. We have replaced “awake” and “conscious” with “unanesthetized” in the text.

      Reviewer #2 (Recommendations For The Authors):

      Since no EEG/EMG recording was performed it would be more appropriate to remove "awake" and "conscious" throughout the manuscript and include the term "unanesthetized".

      Thank you for the suggestion. “Awake” and “conscious” have been replaced by “unanesthetized” in the text.

      Line 545: Why 32C? Isn't this temperature too high for animals?

      30-32°C is the thermoneutral zone for mice. It is the range of ambient temperature where mice can maintain a stable core temperature with their minimal metabolic rate (Gordon, 1985). Whole-body plethysmography uses the barometric technique to detect pressure oscillations caused by changes in temperature and humidity with each breathing act when an animal sits in a sealed chamber (Mortola et al., 2013). Thus, maintaining the chamber temperature near the thermoneutral zone during the plethysmography assay is required to maintain constancy in respiratory and metabolic parameters from trial to trial as well as to maintain linearity of ventilatory pressure changes due to humidification, rarefaction, and thermal expansion and contraction during inspiration and expiration (Ray et al., 2011). The chamber temperature that has been used for adult plethysmography has been set across a range 30-34°C (Hodges et al., 2008; Ray et al., 2011; Hennessy et al., 2017). We use 32°C in this manuscript which is consistent with previously published literature from other groups and our own work (Sun et al., 2017; Lusk et al., 2022).

      I would include the units of the physiological variables in the tables.

      Thank you for the suggestion. The units of the physiological variables have been added in all the tables.

      Reviewer #3 (Recommendations For The Authors):

      Why is the C3 group not considered in this study?

      The C3 adrenergic group, best characterized in rat, is seen only in rodents and not in many other species, including primates such as humans (Kitahama et al., 1994). Thus, the C3 group is not the focus of this study, in which we aim to discuss whether glutamate derived from noradrenergic neurons could be a potential therapeutic target for human respiratory disorders. The C3 adrenergic group is typically described as a population containing only about 30 neurons. We have added the fate map data and the adult expression pattern of the three vesicular glutamate transporters for the C3 group to the supplements of Figures 1 and 2 for reference.

      Sub CD/CV does not appear to be defined in the manuscript.

      Thank you for the point. The definition of sub CD/CV has been added in the text (line 126).

      The data on line 131-133 is interesting but could be described more effectively and clearly.

      Thank you for the suggestion. The text has been modified accordingly.

      The end of the paragraph at lines 140 onwards is rather repeated in the paragraph that starts at line 146.

      The repeated text has been removed accordingly.

      Whilst anterior and posterior are correct anatomical terms, for a quadruped, rostral and caudal are more widely used - particularly in the brainstem field. Is there a particular reason for using anterior/posterior?

      We followed the anatomical terminology of Robertson et al. (2013), who used anterior/posterior to describe C2/A2 and C1/A1.

      On the protocol lines included in Figures 4-7 it would be worth adding the test day. This seems a little strange. Why wait up to one week after the habituation to perform the stimulation? How many mice were left for each day between habituation and experimentation, and does this timing affect responses? Do mice forget the habituation after a period?

      Thank you for the point. We have added the test day for plethysmography in Figures 4-7. After the 5 days of habituation, we began the plethysmography recordings on the sixth day. A maximum of 6 mice can be assayed for plethysmography per day due to the limited number of barometric flow-through plethysmography and metabolic measurement systems we have. Thus, all animals were finished with plethysmography “within” one week of the last day of habituation. This protocol is consistent with our previously published work (Martinez et al., 2019; Lusk et al., 2022; Lusk et al., 2023). For the experiments in this manuscript, mice were assayed within 3 days after habituation. As noted in our methods and figures, each mouse is given as much as 40 min to acclimate to the chamber (determined by directly observed quiet breathing) before data acquisition. We have no reason to think, and no evidence, that testing order, and thus timing, was a factor. The detailed explanation of the plethysmography protocol has been added to the Materials and Methods section (lines 606-625).

      Please state clearly that each mouse is only exposed to one gas mixture (what I interpret is the case), or could one mouse be exposed to several different stimuli?

      Each mouse is exposed to only one gas challenge (5% CO2, 7% CO2, 10% CO2, or 10% O2) in a testing period. Testing periods for an individual mouse were separated by 24 h to allow for a full recovery. In each testing period, the mouse is kept under room air for 45 min, switched to one gas challenge for 20 min, and then switched back to room air for 20 min.

      With apologies if I missed this, but did each of the respiratory stimuli produce a statistically significant response in the control mice? For example, the response to 10%O2?

      Yes, each respiratory stimulus, including 5/7/10% CO2 and 10% O2, produced a statistically significant response in both mutant and control mice. We have labeled the statistical significance in Figures 4-7. Thank you for pointing this out.

      Line 312: Optogenetic stimulation induced an increase from 130 to 180 breaths per min (Abbott et al., EJN 2014). It is surprising that this is called "modest". Baseline respiratory frequency was presented.

      Thank you for the point. The word “modest” has been removed and the discussion has been changed accordingly (line 355-360).

      Line 338: This discussion is not sufficiently nuanced. It is the increased Dia amplitude (to KCN only, not 10%CO2 ) and the stimulation of active expiration, to both stimuli, that is blocked by kyn in pFRG. There is no effect of breathing frequency. The current study would not detect such differences in active expiration.

      Thank you for the suggestion. The discussion has been modified accordingly (line 382-388).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important paper, Blin and colleagues develop a high-throughput behavioral assay to test spontaneous swimming and olfactory preference in individual Mexican cavefish larvae. The authors present compelling evidence that the surface and cave morphs of the fish show different olfactory preferences and odor sensitivities and that individual fish show substantial variability in their spontaneous activity that is relevant for olfactory behaviour. The paper will be of interest to neurobiologists working on the evolution of behaviour, olfaction, and the individuality of behaviour.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors posed a research question about how an animal integrates sensory information to optimize its behavioral outputs and how this process evolved. Their data (behavioral output analysis with detailed categories in response to the different odors in different concentrations by comparing surface and cave populations and their hybrid) partially answer this tough question. They built a new low-disturbance system to answer the question. They also found that the personality of individual fish is a good predictor of behavioral outputs against odor response. They concluded that cavefish evolved to specialize their response to alanine and histidine while surface fish are more general responders, which was supported by their data.

      Strengths:

      With their new system, the authors could generate clearer results without mechanical disturbances. The authors characterize multiple measurements to score the odor response behaviors, and also brought a new personality analysis. Their conclusion that cavefish evolved as a specialist to sense alanine and histidine among 6 tested amino acids was well supported by their data.

      Weaknesses:

      The authors posed a big research question: How do animals evolve the processes of sensory integration to optimize their behavioral outputs? I personally feel that, to answer the questions about how sensory integration generates proper (evolved) behavior, the authors at least need to show the ecological relevance of their response. For the alanine/histidine preference in cavefish, they need data for the alanine and other amino acid concentrations in the local cave water and compare them with those of surface water.

      We agree with the reviewer. This is why, in the Discussion section, we had written: “…Such significant variations in odor preferences or value may be adaptive and relate to the differences in the environmental and ecological conditions in which these different animals live. However, the reason why Pachón cavefish have become “alanine specialists” remains a mystery and prompts analysis of the chemical ecology of their natural habitat. Of note, we have not found an odor that would be repulsive for Astyanax so far, and this may relate to their opportunist, omnivorous and detritivore regime (Espinasa et al., 2017; Marandel et al., 2020).” This is also why we are currently developing field work projects aimed at clarifying this question. However, such experiments and analyses are challenging, both practically and technically. We hope to reach some conclusions in the future.

      To complete the discussion we have also added an important hypothesis: “Alternatively, specialization for alanine may not need to be specific for an olfactory cue present only, or frequently, or in high amounts in caves. Bat guano for example, which is probably the main source of food in the Pachón cave, must contain many amino acids. Enhanced recognition of one of them - in the present case alanine but evolution may have randomly acted for enhanced recognition of another amino acid – should suffice to confer cavefish with augmented sensitivity to their main source of nutriment.”

      Also, as for "personality matters", I read that personality explains a large variation in surface fish. Also, thigmotaxis or wall-following cavefish individuals are exceeded to respond well to odorants compared with circling and random swimming cavefish individuals. However, I failed to understand the authors' point about how much percentages of the odorant-response variations are explained (PVE) by personality. Association (= correlation) was good to show as the authors presented, but showing proper PVE or the effect size of personality to predict the behavioral outputs is important to conclude "personality is matter"; otherwise, the conclusion is not so supported.

      From the above, I recommend the authors reconsider the title also their research questions well. At this moment, I feel that the authors' conclusions and their research questions are a little too exaggerated, with less supportive evidence.

      Thank you for this interesting suggestion, which we have fully taken into consideration. We have therefore now calculated and plotted PVE (the percentage of variation explained on the olfactory score) as a function of swimming speed or as a function of swimming pattern. The results are shown in modified Figure 8 of our revised ms and they suggest that the personality (here, swimming patterns or swimming speed) indeed predicts the olfactory response skills. Therefore, we would like to keep our title as we provide support for the fact that “personality matters”.
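
      For readers unfamiliar with PVE, the sketch below shows one common way to obtain it. The response above does not state how PVE was computed, so the choices here are assumptions: R² from a simple linear fit for a continuous predictor such as swimming speed, and eta-squared (between-group over total sum of squares) for a categorical predictor such as swimming pattern. All function and variable names are hypothetical.

```python
import numpy as np


def pve_continuous(x, y):
    """Percentage of variance in y explained by a straight-line fit on x (100 * R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)           # least-squares line
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 100.0 * (1.0 - ss_res / ss_tot)


def pve_categorical(groups, y):
    """Percentage of variance in y explained by group membership (100 * eta^2)."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float)
    grand_mean = y.mean()
    ss_tot = np.sum((y - grand_mean) ** 2)
    ss_between = sum(
        np.sum(groups == g) * (y[groups == g].mean() - grand_mean) ** 2
        for g in np.unique(groups)
    )
    return 100.0 * ss_between / ss_tot


# Hypothetical example: olfactory score versus baseline swimming pattern
patterns = ["WF", "WF", "R", "R", "C", "C"]
scores = [2.0, 1.8, 0.4, 0.6, 1.0, 1.2]
pve_pattern = pve_categorical(patterns, scores)
```

      Unlike a correlation test alone, a PVE value reports an effect size, which is the quantity the reviewer asked for.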

      Also, for the statistical method, Fisher's exact test is not appropriate for the compositional data (such as Figure 2B). The authors may quickly check it at https://en.wikipedia.org/wiki/Compositional_data or https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436.

      The authors may want to use centered log transformation or other appropriate transformations (R package could be: https://doi.org/10.1016/j.cageo.2006.11.017). After changing the statistical tests, the authors' conclusion may no longer be supported.

      Actually, in most cases, the distributions are so different (as seen by the completely different colors in the distribution graphs) that there is little doubt that swimming behaviors are indeed different between surface fish and cavefish, or between ‘before’ and ‘after’ odor stimulation. However, it is true that Fisher’s exact test is not fully appropriate because the data can be considered compositional. For this kind of data, the centered log-ratio transformation has been suggested. However, our dataset contains many zeros, which log transformations cannot handle directly.

      To help us deal with our data, the reviewer proposed that we consider the paper by Greenacre (2021) (https://www.annualreviews.org/doi/pdf/10.1146/annurev-statistics-042720-124436). In this paper, Greenacre clearly wrote: "Zeros in compositional data are the Achilles heel of the logratio approach (LRA)."
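
      The problem with zeros can be seen directly in the definition of the centered log-ratio (clr) transform, clr(x)_i = log(x_i) - mean_j(log(x_j)). The short sketch below is illustrative only, with made-up proportions: a single zero part sends log(0) to minus infinity, so the transformed composition is no longer finite.

```python
import numpy as np


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean of the logs."""
    x = np.asarray(composition, dtype=float)
    # log(0) is -inf; silence NumPy's warnings so the effect can be inspected
    with np.errstate(divide="ignore", invalid="ignore"):
        log_x = np.log(x)
        return log_x - log_x.mean()


# Hypothetical proportions of fish per swimming pattern
no_zero = clr([0.5, 0.3, 0.2])    # all parts positive: finite values
with_zero = clr([0.7, 0.3, 0.0])  # one pattern never observed: not finite
```

      Common workarounds replace zeros with a small pseudo-count before transforming, but with many zeros the result depends strongly on the value chosen for that pseudo-count.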

      Therefore, we have now tested our data using CA (Correspondence Analysis), which can deal with tables containing many zeros and is a reliable alternative to LRA (Cook-Thibeau, 2021; Greenacre, 2011).
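
      As a rough illustration of why CA tolerates zeros, the sketch below implements the textbook computation (singular value decomposition of the standardized residuals of a contingency table) in NumPy. It is not the code used for the analysis reported here; the count table is invented, and confidence ellipses are not computed. The only requirement on the table is that no row or column is entirely zero.

```python
import numpy as np


def correspondence_analysis(counts):
    """Simple CA of a contingency table (rows = groups, columns = categories).

    Returns row principal coordinates, column principal coordinates and the
    principal inertias (squared singular values). Zero cells are allowed;
    rows or columns that sum to zero are not.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()                        # correspondence matrix
    r = P.sum(axis=1)                      # row masses
    c = P.sum(axis=0)                      # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U * sv) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]
    return row_coords, col_coords, sv ** 2


# Hypothetical counts of fish per swimming pattern (columns) in three groups (rows)
counts = [
    [12, 0, 3, 0],
    [0, 9, 0, 4],
    [5, 2, 0, 1],
]
row_coords, col_coords, inertia = correspondence_analysis(counts)
```

      The total inertia (the sum of the principal inertias) equals the chi-square statistic of the table divided by the total count, so CA works on counts directly and never takes a logarithm.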

      The CA results are shown in Supplemental Figure 8. They fully confirm the difference in baseline swimming patterns between morphs, as well as the changes (or absence of changes) in behavioral patterns after odor stimulation suggested by the colored bar plots in the main figures, with confidence ellipses overlapping or not overlapping, depending on the case. Therefore, the CA method fully confirms and even strengthens our initial interpretations.

      Finally, we have kept our initial graphical representation in the ms (color-coded bar plots; the complete color code is now given in Suppl. Fig7), and CA results are shown in Suppl. Figure 8 and added in text.

      Reviewer #2 (Public Review):

      In their submitted manuscript, Blin et al. describe differences in the olfactory-driven behaviors of river-dwelling surface forms and cave-dwelling blind forms of the Mexican tetra, Astyanax mexicanus. They provide a dataset of unprecedented detail, that compares not only the behaviors of the two morphs but also that of a significant number of F2 hybrids, therefore also demonstrating that many of the differences observed between the two populations have a clear (and probably relatively simple) genetic underpinning.

      To complete the monumental task of behaviorally testing 425 six-week-old Astyanax larvae, the authors created a setup that allows for the simultaneous behavioral monitoring of multiple larvae and the infusion of different odorants without introducing physical perturbations into the system, thus biasing the responses of cavefish that are particularly fine-tuned for this sensory modality. During the optimization of their protocol, the authors also found that for cave-dwelling forms one hour of habituation was insufficient and a full 24 hours were necessary to allow them to revert to their natural behavior. It is also noteworthy that this extremely large dataset can help us see that population averages of different morphs can mask quite significant variations in individual behaviors.

      Testing with different amino-acids (applied as relevant food-related odorant cues) shows that cavefish are alanine- and histidine-specialists, while surface fish elicit the strongest behavioral responses to cysteine. It is interesting that the two forms also react differently after odor detection: while cave-dwelling fish decrease their locomotory activity, surface fish increase it. These differences are probably related to different foraging strategies used by the two populations, although, as the observations were made in the dark, it would be also interesting to see if surface fish elicit the same changes in light as well.

      Thank you for these nice comments.

      Further work will be needed to pinpoint the exact nature of the genetic changes that underlie the differences between the two forms. Such experimental work will also reveal how natural selection acted on existing behavioral variations already present in the SF population.

      Yes. Searching for genetic underpinnings of the sensory-driven behavioral differences is our current endeavor through a QTL study and we should be able to report it in the near future.

      It will be equally interesting, however, to understand what lies behind the large individual variation of behaviors observed both in the case surface and cave populations. Are these differences purely genetic, or perhaps environmental cues also contribute to their development? Does stochasticity provided by the developmental process has also a role in this? Answering these questions will reveal if the evolvability of Astyanax behavior was an important factor in the repeated successful colonization of underground caves.

      Yes. We will also address (at least partially) most of these questions in our current QTL study.

      Reviewer #3 (Public Review):

      Summary:

      The paper explores chemosensory behaviour in surface and cave morphs and F2 hybrids in the Mexican cavefish Astyanax mexicanus. The authors develop a new behavioural assay for the long-term imaging of individual fish in a parallel high-throughput setup. The authors first demonstrate that the different morphs show different basal exploratory swimming patterns and that these patterns are stable for individual fish. Next, the authors test the attraction of fish to various concentrations of alanine and other amino acids. They find that the cave morph is a lot more sensitive to chemicals and shows directional chemotaxis along a diffusion gradient of amino acids. For surface fish, although they can detect the chemicals, they do not show marked chemotaxis behaviour and have an overall lower sensitivity. These differences have been reported previously but the authors report longer-term observations on many individual fish of both morphs and their F2 hybrids. The data also indicate that the observed behavior is a quantitative genetic trait. The approach presented will allow the mapping of genes' contribution to these traits. The work will be of general interest to behavioural neuroscientists and those interested in olfactory behaviours and the individual variability in behavioural patterns.

      Strengths:

      A particular strength of this paper is the development of a new and improved setup for the behavioural imaging of individual fish for extended periods and under chemosensory stimulation. The authors show that cavefish need up to 24 h of habituation to display a behavioural pattern that is consistent and unlikely to be due to the stressed state of the animals. The setup also uses relatively large tanks that allow the build-up of chemical gradients that are apparently present for at least 30 min.

      The paper is well written, and the presentation of the data and the analyses are clear and to a high standard.

      Thank you for these nice comments.

      Weaknesses:

      One point that would benefit from some clarification or additional experiments is the diffusion of chemicals within the behavioural chamber. The behavioural data suggest that the chemical gradient is stable for up to 30 min, which is quite surprising. It would be great if the authors could quantify e.g. by the use of a dye the diffusion and stability of chemical gradients.

      OK. We had tested the diffusion of dyes in our previous setup and we also did in the present one (not shown). We think that, due to differences of molecular weight and hydrophobicity between the tested dyes and the amino acid molecules we are using, their diffusion does not constitute a proper read-out of actual amino acid diffusion. We anticipate that amino acid diffusion is extremely complex in the test box, possibly with odor plumes diffusing and evolving in non-gradient patterns, in the 3 dimensions of the box, and potentially further modified by the fish swimming through it, the flow coming from the opposite water injection side and the borders of the box. This is the reason why we have designed the assay with contrasting “odor side” and “water control side”. Moreover, our question here is not to determine the exact concentration of amino acid to which the fish respond, but to compare the responses in cavefish, surface fish and F2 hybrids. Finally and importantly, we have performed dose/response experiments whereby varying concentrations have been presented for 3 of the 6 amino acids tested, and these experiments clearly show a difference in the threshold of response of the different morphs.

      The paper starts with a statement that reflects a simplified input-output (sensory-motor) view of the organisation of nervous systems. "Their brains perceive the external world via their sensory systems, compute information and generate appropriate behavioral outputs." The authors' data also clearly show that this is a biased perspective. There is a lot of spontaneous organised activity even in fish that are not exposed to sensory stimulation. This sentence should be reworded, e.g. "The nervous system generates autonomous activity that is modified by sensory systems to adapt the behavioural pattern to the external world." or something along these lines.

      Done

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      In addition to my comments in the "weakness" section above, here are my other comments.

      How many times fish were repeatedly assayed and what the order (alanine followed by cysteine, etc) was, is not clear (Pg 24, Materials and Methods). I am afraid that fish memorize the prior experience to get better/worse their response to the higher conc of alanine, etc. Please clarify this point.

      Many fish were tested in different conditions on consecutive days, indeed. Most often, control experiments (eg, water/nothing; water/water; nothing/nothing) were followed by odor testing. In such cases, there is no risk that fish memorize prior experience and that such previous experience interferes with response to odor. In other instances, fish were tested with a low concentration of one amino acid, followed by a high concentration of another amino acid, which is also on the safe side. Of note, on consecutive days, the odors were always perfused on alternate sides of the test box, to avoid possibility of spatial memory. Finally, in the few cases where increasing concentrations of the same amino acids were perfused consecutively, 1) they were perfused on alternate sides, 2) if the fish does not detect a low concentration below threshold / does not respond, then prior experience should not interfere for responding to higher concentrations, and 3) we have evidence (unpublished, current studies) that when a fish is given increasing concentrations of the same amino acid above detection threshold, then the behavioral response is stable and reproducible (eg does not decrease or increase).

      Minor points:

      Thigmotaxis and wall following.

      Classically, thigmotaxis and wall following are treated as the same (Sharma et al., 2009; https://pubmed.ncbi.nlm.nih.gov/19093125/) but the authors discriminate it in thigmotaxis at X-axis and Y-axis because fish repeatedly swam back and forth on x-axis wall or y-axis wall. I understand the authors' point to discriminate WF and T but present them with more explanations (what the differences between them are) in the introduction and result sections.

      Done

      Pg5 "genetic architecture" in the introduction.

      "Genetic architecture" analysis needs a more genomic survey, such as GWAS, QTL mapping, and Hi-C. Phenotype differences in F2 generation can be stated as "genetic factor(s)" "genetic component(s)", etc. please revise.

      Done

      Pg10 At the serine treatment, the authors concluded that "...suggesting that their detection threshold for serine is lower than for alanine." I believe that the 'threshold for serine is higher' according to the authors' data. Their threshold-related statement is correct in Pg21 "as SF olfactory concentration detection threshold are higher than CF,..." So the statement on page 10 is just a mistake, I think. Please revise.

      Done (mistake indeed)

      Pg11 After explaining Fig5, the statement "In sum, the responses of the different fish types to different concentrations of different amino acids were diverse and may reflect complex, case-by-case, behavioral outputs" does not convey any information. Please revise.

      OK. Done: “In sum, the different fish types show diverse responses to different concentrations of different amino acids.”

      For the personality analysis (Fig 7)

      The index value needs more explanation. I read the materials and methods three times but am still confused. From the equation, the index does not seem to exceed 1.0, unless the "before score" was a negative value, and the "after score" value was positive. I could not get why the authors set a score of 1.5 as the threshold for the cumulative score of these different behavior index values (= individual score). Please provide more description. Currently, I am skeptical about this index value in Fig 7.

      Done, in results and methods.

      Pg15 the discussion section

      Please discuss well the difference between the authors' finding (cavefish respond 10^-4M for position and surface fish responded 10^-4 for thig-Y; Fig 4AB), and those in Hinaux et al. 2016 (cavefish responded 10^-10M alanine but surface fish responded 10^-5M or higher). It seems that surface fish could respond to the low conc of alanine as cavefish do, which is opposed to the finding in Hinaux 2016.

      The increase in NbrtY at population level for surface fish with 10^-4 M alanine (~10^-6 M in box) was most probably due to only a few individuals. In contrast to cavefish, all other parameters were unchanged in surface fish at this concentration. Moreover, at the individual level, only 3.2% of surface fish had significant olfactory scores (compared with 81.3% of cavefish). Thus, we think that, overall, this result does not contradict our previous findings in Hinaux et al. (2016) and solely represents the natural, unexplained variation inherent to the analysis of complex animal behaviors, even when we attempt to use the highest standards of controlled conditions.

      Of note, in the revised version, we have now included a full dose/response analysis on cavefish for alanine concentrations ranging from 10^-2 M to 10^-10 M. Alanine at 10^-5 M has significant effects (now shown in Suppl Fig2 and indicated in text; a column has been added for 10^-5 M in Summary Table 1). Lower concentrations have milder effects (described in text) but confirm the very low detection threshold of cavefish for this amino acid.

      Pg19, "In sum, CF foraging strategy has evolved in response to the serious challenge of finding food in the dark"

      My point is the same as explained in the 'weakness' section above: how this behavior is effective in the cave life, if they conclude so? Please explain or revise this statement.

      The present manuscript reports on experiments performed in “artificial” and controlled laboratory conditions. We are fully aware that these conditions are probably distantly related to conditions encountered in the wild. Note that we had written in original version (page 20) “…for 6-week old juveniles in a rectangular box - but the link may be more elusive when considering a fish swimming in a natural, complex environment.” As the reviewer may know, we also perform field studies in a more ethological approach of animal behaviors, thus we may be able to discuss this point more accurately in the future.

      Pg20 "To our knowledge, this is the first time individual variations are taken into consideration in Astyanax behavioral studies."

      This is wrong. Please see Fernandes et al., 2022. (https://pubmed.ncbi.nlm.nih.gov/36575431/).

      OK. The sentence is wrong if taken in its absolute sense, i.e., considering inter-individual variations of a given parameter (e.g., number of neuromasts per individual or number of approaches to vibrating rod in Fernandes et al., 2022). In this same sense, past Astyanax QTL studies on behaviors also took into account variations among F2 individuals. Here, we wanted to stress that personality was taken into consideration. The sentence has been changed: “To our knowledge, this is the first time individual temperament is taken into consideration in Astyanax behavioral studies.”

      Figure 2B and others.

      The order of categories (R, R-TX, etc) should match in all columns (SF, F2, and CF). Currently, the category orders seem random or the larger ratio categories at the bottom, which is quite difficult to compare between SF, F2, and CF. Also, the writings in Fig 2A (times, Y-axis labels, etc), and the bargraphs' writings are quite difficult to read in Fig 2B, Fig 3B 4H, 5GN, 6EFG. Also, no need to show fish ID in Fig 2C in the current way, but identify the fish data points of the fish in Fig 2D (SF#40, CF#65, and F2#26) in Fig 2C if the authors want to show fish ID numbers in the boxplots. Fish ID numbers in other boxplot figures are recommended to be removed too.

      We have thought a lot about how best to represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations (33 possibilities in total, see new Suppl Fig7), which are never the same in different plots/conditions because the individual fish tested are different. We decided that the best way was to represent, from bottom to top, the most used to the least used swimming patterns, and to use a color code that best matches the different combinations. It was impossible to give the full color code on each figure, so it was simplified, and we believe that the results are well conveyed by the graphs. We would like to keep them as they are. To respond (partially) to the reviewer’s concern, we have now added a full color code description in a new Supplemental Figure 7 (associated with Methods).

      Size of lettering has been modified in all pattern graphs like Fig2A. Thanks for the suggestion, it reads better now.

      Finally, we would like to keep the fish ID numbers because this contributes to conveying the message of the paper, that individuality matters.

      Raw data files were not easy to read in Excel or LibreOffice. Please convert them into the csv format to support the rigor in the authors' conclusion.

      We do not understand this request. Our very large dataset must be analysed with R, not Excel, for statistics, plotting and pattern analysis. However, raw data files can be opened in Excel with format conversion.

      Reviewer #2 (Recommendations For The Authors):

      I think most of the experimental procedures (with few exceptions, see below) are well-defined and nicely described, so the majority of my suggestions will be related to the visualization of the data. I think the authors have done a great job in presenting this complex dataset, but there are still some smaller tweaks that could be used to increase the legibility of the presented data.

      First and perhaps foremost, a better definition of the swimming pattern subsets is needed. I have no problem understanding the main behavioral types, but whereas the color codes for these suggest that there is continuous variance within each pattern, it is not clear (at least to me), what particular aspect(s) of the behaviors vary. Also, whereas the sidebars/legends suggest a continuum within these behaviors, the bar charts themselves clearly present binned data. I did not find a detailed description of how the binning was done. As this has been - according to the Methods section - a manual process, more clarity about the details of the binning would be welcome. I would also suggest using binned color codes for the legends as well.

      Done, in Results and Methods. We hope it is now clear that there is no “continuum”, but rather multiple combinations of discrete swimming patterns. The gradient aspect of the color code in the figures has been removed to avoid the idea of a continuum. According to the chosen color code, WF is in red, R in blue, T in yellow and C in green. Combinations are then represented by intermediate colors; for example, R+WF is purple. We have now added a full color code description for the swimming patterns and their combinations in a new Supplemental Figure 7 (associated with Methods).

      Also, to better explain the definition of the swimming patterns and the graphical representation, it now reads (in Methods):

      “The determination of baseline swimming patterns and swimming patterns after odor injection was performed manually based on graphical representations such as in Figure 2A or Figure 3A. Four distinctive baseline behaviors clearly emerged: random swim (R; defined as haphazard swimming with no clear pattern, covering entirely or partly the surface of the arena), wall following (WF; defined as the fish continuously following along the 4 sides of the box and turning around it, in a clockwise or counterclockwise fashion), large or small circles (C; self explanatory), and thigmotactism (T, along the X- or the Y-axis of the box; defined as the fish swimming back and forth along one of the 4 sides of the box). On graphical representations of swimming pattern distributions, we used the following color code: R in blue, WF in red, C in green, T in yellow. Of note, many fish swam according to combination(s) of these four elementary swimming patterns (see descriptions in the legends of Supplemental figures, showing many examples). To fully represent the diversity and the combinations of swimming patterns used by individual fish, we used an additional color code derived from the “basic” color code described above and where, for example R+WF is purple. The complete combinatorial color code is shown in Suppl. Fig7.”
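      As an illustration only, a combinatorial color code of this kind can be derived by averaging the colors of the elementary patterns. The sketch below is not the authors' palette (which is given in Suppl. Fig7): the RGB values, the `BASIC_COLORS` table, the `blend` function and the averaging rule are all assumptions made for the example.

```python
# Hypothetical RGB values for the four elementary swimming patterns.
# The real palette is the one shown in the authors' Suppl. Fig7.
BASIC_COLORS = {
    "R": (0, 0, 255),    # random swim: blue
    "WF": (255, 0, 0),   # wall following: red
    "C": (0, 128, 0),    # circles: green
    "T": (255, 255, 0),  # thigmotactism: yellow
}


def blend(combination):
    """Return the channel-wise average RGB for a label such as 'R+WF'."""
    parts = combination.split("+")
    channels = zip(*(BASIC_COLORS[p] for p in parts))
    # Integer division keeps every channel an int in the 0-255 range.
    return tuple(sum(channel) // len(parts) for channel in channels)


print(blend("R+WF"))  # equal parts blue and red, i.e. a purple
```

      With this averaging rule, a single pattern keeps its basic color, and any combination falls between the colors of its components, which matches the R+WF example given in the Methods text.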

      It would be also easier to comprehend the stacked bar charts, presenting the particular swimming patterns in each population, if the order of different swimming patterns was the same for all the plots (e.g. the frequency of WF always presented at the bottom, R on the top, and C and T in the middle). This would bring consistency and would highlight existing differences between SF, CF, and F2s. Furthermore, such a change would also make it much easier to see (and compare) shifts in behaviors.

      We have thought a lot about how best to represent the distributions of swimming patterns in graphs such as Fig 2B and others. The difficulty is due to the existence of many combinations, which are never the same in different plots/conditions because the individual fish tested are different. We decided to keep it as it currently stands, because we think re-doing all the graphs and figures would not significantly improve the representation. In fact, we think that the differences between morphs (dominant blue in SF, dominant red in CF) and between conditions (bar charts next to each other) are easy to interpret at first glance in the vast majority of cases. Moreover, they are now complemented by CA analyses (Suppl Figure 8).

      While the color coding of the timeline in the "3D" plots presented for individual animals is a nice feature, at the moment it is slightly confusing, as the authors use the same color palette as for the stacked bar charts, representing the proportionality of the particular swimming patterns. As the y-axis is already representing "time" here, the color coding is not even really necessary. If the authors would like to use a color scheme for aesthetic reasons, I would suggest using another palette, such as "grey" or "viridis".

      We would like to keep the graphical aspect of our figures as they are, for aesthetic reasons. To avoid confusion with stacked bar chart color code, we have added a sentence in Methods and in the legend of Figure 2, where the colors first appear:

      “The complete combinatorial color code is shown in Suppl. Figure 7. Of note, in all figures, the swimming pattern color code does not relate whatsoever with the time color code used in the 2D plus time representation of swimming tracks such as in Figure 2A”.

      I would also suggest changing the boxplots to violin-plots. Figure 7 clearly shows bimodality for F2 scores (something, as the authors themselves note, not entirely surprising given the probably polygenic nature of the trait), but looking at SF and CF scores I think there are also clear hints for non-normal distributions. If non-normal distribution of traits is the norm, violin-plots would capture the variance in the data in a more digestible way. (The existence of differently behaving cohorts within the population of both SF and CF forms would also help to highlight the large pre-existing variance, something that was probably exploited by natural selection as well, as mentioned briefly in the Discussion by the authors, too.)

      The bimodal distribution of scores shown by F2s in Figure 7B is indeed probably due to the polygenic nature of the trait. However, such a distribution is the exception rather than the norm. Moreover, the boxplot representations we have used throughout the figures include all the individual points, and outliers can be identified as they have the fish ID number next to them. This allows the reader to grasp the variance of the data. Again, redoing all graphs and figures would constitute a lot of work, for little gain in terms of conveying the results. Therefore, we choose not to replace the boxplots with violin plots.

      The summary data of individual scores in Table 1B shows some intriguing patterns, that warrant a bit further discussion, in my opinion. For example, we can see opposite trends in scores of SF and CF forms with increasing alanine concentration. Is there an easy explanation for this? Also, in the case of serine, the CF scores do not seem to respond in a dose-dependent manner and puzzlingly at 10^(-3)M serine concentration F2 scores are above those of both grandparental populations.

      That is true. However, we have no simple explanation for this. To begin responding to this question, we have now performed full dose/response experiments for alanine (concentrations tested from 10^(-2)M to 10^(-10)M on cavefish; these confirm that CF are bona fide “alanine specialists”) and for serine (10^(-2)M to 10^(-4)M tested on both morphs; these confirm that both morphs respond well to this amino acid). These complementary results are now included in text and figures (partially) and in the summary table 1.

      If anything is known about this, I would also welcome some discussion on how thigmotactic behavior, a marker of stress in SF, could have evolved to become the normal behavior of CF forms, with lower cortisol levels and, therefore lower anxiety.

      We actually think thigmotactism is a marker of stress in both morphs. See Pierre et al, JEB 2020, Figure S3A: in both SF and CF, thigmotaxis behavior decreases after long habituation times. In our hands, the only difference between the two morphs is that surface fish (at 5 months of age) express stress by thigmotactism but also by freezing and rapid erratic movements, while cavefish have a more restricted stress repertoire.

      This is why in the present paper we have carefully made the distinction between thigmotactism (= possible stress readout) and wall following (= exploratory behavior). Our finding that WF and large circles confer better olfactory response scores to cavefish strongly supports the different nature of these two swimming patterns. Then, why is swimming along the 4 walls of a tank fundamentally different from swimming along one wall? The question is open, although the number of changes of direction is probably an important parameter: in WF the fish always swims forward in the same direction, while in T the fish constantly changes direction when reaching the corner of the tank, which is similar to the erratic swimming of stressed surface fish.
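      To make the notion of "changes of direction" concrete, the sketch below counts how often a fish reverses its heading along a track. It is only an illustration under stated assumptions, not the authors' scoring method: the track format (a list of (x, y) positions), the `count_reversals` function and the rule "a reversal is a turn of more than 90 degrees between successive steps" are hypothetical.

```python
def count_reversals(track):
    """Count turns of more than 90 degrees between successive steps.

    `track` is a hypothetical list of (x, y) positions sampled over time.
    """
    steps = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(track, track[1:])]
    # Ignore samples where the fish did not move.
    steps = [s for s in steps if s != (0, 0)]
    # A negative dot product means the new step opposes the previous one.
    return sum(
        1
        for (ax, ay), (bx, by) in zip(steps, steps[1:])
        if ax * bx + ay * by < 0
    )


# Idealized wall following: one lap around a square, always moving forward.
wall_following = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2),
                  (1, 2), (0, 2), (0, 1), (0, 0)]
# Idealized thigmotactism: back and forth along a single wall.
thigmotactism = [(0, 0), (1, 0), (2, 0), (1, 0), (0, 0), (1, 0)]

print(count_reversals(wall_following), count_reversals(thigmotactism))
```

      On these idealized tracks, the 90-degree turns at the corners of the wall-following lap are not counted as reversals, whereas each about-turn along the single wall is, which is the distinction drawn in the reply above.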

      Finally two smaller suggestions:

      • When referring to multiple panels on the same figure it would be better to format the reference as "Figure 4D-G" instead of "Figure 4DEFG";

      Done

      • On page 4, where the introduction reads as "although adults have a similar olfactory rosette with 20-25 lamellae", in my opinion, it would be better to state that "while adults of the two forms have a similar olfactory rosette with 20-25 lamellae".

      Done

      Reviewer #3 (Recommendations For The Authors):

      Consider moving Figure 3 to be a supplement of Figure 4. This figure shows a water control and therefore best supplements the alanine experiment.

      We would like to keep this figure as a main figure: we consider it very important to establish the validity of our behavioral setup at the beginning of the ms, and to establish that in all the following figures we are recording bona fide olfactory responses.

      "sensory changes in mecano-sensory and gustatory systems " - mechano-sensory.

      Done

      Figure 2 legend: "(3) the right track is the 3D plus time (color-coded)" - shouldn't it be 2D plus time or 3D (x,y, time).

      True! Thanks for noting this, corrected.

      Figure 4 legend "E, Change in swimming patterns" should be H.

      Done

      "suggesting that their detection threshold for serine is lower than for alanine" - higher?

      Done

      In the behavioural plots, I assume that the "mean position" value represents the mean position along the X-axis of the chamber - this should be clarified and the axis label updated accordingly.

      That is correct and has been updated in Methods and Figures and legends.

      "speed, back and forth trips in X and Y, position and pattern changes (see Methods; Figure 7A)." - here it would be helpful to add an explanation like "to define an olfactory score for individual fish."

      This has been changed in Results and more detailed explanations on score calculations are now given in Methods.

      "possess enhanced mecanosensory lateral line" - mechanosensory.

      Done

    1. Author response:

      Reviewer #1 (Public Review):

      (1) Deleting ICP34.5 from the HSV construct has a very strong effect on HIV reactivation. Why is no eGFP readout given in Figure 1C as for WT HSV? The mechanism underlying increased activation by deleting ICP34.5 is only partially explored. Overexpression of ICP34.5 has a much smaller effect (reduction in reactivation) than deletion of ICP34.5 (strong activation); so the story seems incomplete.

      Thank you for your comment. (1) In Figure 1c, "HSV-wt" refers to the virus rescued from pBAC-GFP-HSV (as mentioned in the “Method” section), which itself carries GFP. Therefore, detecting GFP cannot distinguish between HSV infection and HIV reactivation, and we instead assess the reactivation effect by measuring the mRNA levels of the HIV LTR. (2) Our data indicate that overexpression of ICP34.5 inhibits the reactivation of the HIV latent reservoir, but this effect is not equivalent in magnitude to the activation observed with ICP34.5-deleted HSV-1. There are two possible reasons. First, the lentivirus used to overexpress ICP34.5 integrates randomly into the genome of the J-Lat cell line, which may itself activate HIV latency to some extent. Second, ICP34.5 mainly inhibits HIV reactivation through modulation of the host NF-κB or HSF1 pathways, while PMA, TNF-α, and ICP34.5-deleted HSV-1 can also reverse HIV latency through other mechanisms that have yet to be determined. Together, these factors could explain why overexpression has only a small inhibitory effect. We will further discuss this issue in the revised version. Thank you.

      (2) No toxicity data are given for deleting ICP34.5. How specific is the effect for HIV reactivation? An RNA seq analysis is required to show the effect on cellular genes.

      Thank you for your comment. We plan to conduct several experiments to demonstrate a reduction in HSV-1 replication after ICP34.5 deletion: (1) Determine the growth curve of ICP34.5-deleted HSV-1 in Vero cells. The growth curve of ICP34.5-deleted HSV-1 may be lower than that of wild-type HSV-1, which would demonstrate a reduction in HSV-1 replication after ICP34.5 deletion. (2) Measure the level of inflammatory factors in tumor cells after infection with ICP34.5-deleted HSV-1.

      We believe that the effect is specific, as we previously tested poxviruses and adenoviruses and found no activation of the latent reservoir. We therefore consider the activation observed with wild-type HSV-1 and with ICP34.5-deleted HSV-1 to be specific. We will add the relevant data in the revised version.

      In addition, we will provide the corresponding RNA-seq data to assess its effect on cellular genes.

      (3) The primate groups are too small and the results too variable to make averages. In Figure 5, the group with ART and saline has two slow rebounders. It is not correct to average those with a single quick rebounder. Here the interpretation is NOT supported by the data.

      We agree with you that this is a pilot study with a limited number of rhesus macaques. There were only 3 monkeys per group in this study, but our results were encouraging. Although the number of macaques was relatively limited, these nine macaques were distributed very carefully based on age, sex, weight and genotype. All SIV-infected macaques used in this study had a long history of SIV infection and had received several courses of ART, which mimics the treatment of chronic HIV-1 infection in humans. These macaques had been infected with SIVmac239 for more than 5 years, and macaques infected with highly pathogenic SIV have been well validated as a stringent model to recapitulate HIV-1 pathogenesis and persistence during ART in humans. Indeed, in our rhesus model, ART effectively suppressed SIV infection to undetectable levels in plasma, and upon ART discontinuation the virus rapidly rebounded, which is very similar to what is seen in ART-treated HIV patients. In our next projects, we will expand the number of animals and then move on to preclinical and clinical studies. Thank you for your understanding.

      Discussion

      HSV vectors are mainly used in cancer treatment partially due to induced inflammation. Whether these are suitable to cure PLWH without major symptoms is a bit questionable to me and should at least be argued for.

      We will provide more data on the safety assessment of the HSV-1 vector in SIV-infected macaques, and will also further discuss the potential of an inflammatory HSV vector in PLWH in the revised manuscript.

      Reviewer #2 (Public Review):

      (1) While the mechanism of ICP34.5 interaction and modulation of the NF-kB and HSF1 pathways are shown, this only proves ICP34.5 interactions but does not give away the mechanism of how the HSV-deltaICP-34.5 vector purges HIV-1 latency. What other components of the vector are required for latency reversal? Perhaps serial deletion experiments of the other ORFs in the HSV-deltaICP-34.5 vector might be revealing.

      We agree with your suggestion. In fact, we are currently further exploring some viral genes of HSV-1 that play a role in activation. We have found that the ICP0 gene of HSV-1 virus can activate HIV, and the specific mechanism is under investigation.

      (2) The efficacy of the HSV vaccine vectors was evaluated in Rhesus Macaque model animals. Animals were chronically infected with SIV (a parent of HIV), treated with ART, challenged with bi-functional HSV vaccine or controls, and discontinued treatment, and the resulting virus burden and immune responses were monitored. The animals showed SIV Gag and Env-specific immune responses, and delayed virus rebound (however rebound is still there), and below-detection viral DNA copies. What would make a more convincing argument to this reviewer will be data to demonstrate that after the bi-functional vaccine, the animals show overall reduction in the number of circulating latent cells. The feasibility of obtaining such a result is not clearly demonstrated.

      Thank you for your suggestion. We plan to conduct IPDA experiments to provide additional data on the overall reduction in the number of circulating latent cells in the animals.

      (3) The authors state that the reduced virus rebound detected following bi-functional vaccine delivery is due to latent genomes becoming activated and steady-state neutralization of these viruses by antibody response. This needs to be demonstrated. Perhaps cell-culture experiments from specimens taken from animals might help address this issue. In lab cultures one could create environments without antibody responses, under these conditions one would expect a higher level of viral loads to be released in response to the vaccine in question.

      We plan to use primary cells for related experiments to further validate the results of the cell experiments.

      (4) How do the authors imagine neutralizing HIV-1 envelope epitopes by a similar strategy? A discussion of this point may also help.

      Thank you for your comments. In fact, our study adopts the "shock and kill" strategy, with a focus on the "kill" aspect leaning towards T-cell therapy. Although the vaccine in the paper also utilizes Env antigen, we believe these antibodies are insufficient for neutralizing the mutated SIV virus. We strongly agree with your suggestion that in HIV/AIDS treatment, effective T-cell killing combined with broad-spectrum neutralizing antibodies would be more effective. This aligns with our findings, as our treatment has partially delayed viral rebound but with a relatively short duration of suppression. This may indicate insufficient killing activity. In future research, we will further consider the role of broad-spectrum neutralizing antibodies. Our revised manuscript will elaborate on this in the discussion section.

      (5) I thought the empty HSV-vector control also elicited somewhat delayed kinetics in virus rebound and neutralization, can the authors comment on why this is the case?

      We agree with you that the HSV-1 empty vector does exhibit a somewhat delayed rebound. The reason is that our treatment simultaneously uses both the HSV vector vaccine and ART. Although the empty HSV vector cannot elicit an SIV-specific CTL response, it effectively activates the latent SIV reservoirs, and these activated virions can then be partially killed by ART. Therefore, even without carrying antigens, the empty vector may achieve a slight delay.