  1. May 2024
    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife assessment

      This important study explores infants' attention patterns in real-world settings using advanced protocols and cutting-edge methods. The presented evidence for the role of EEG theta power in infants' attention is currently incomplete. The study will be of interest to researchers working on the development and control of attention.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper investigates the physiological and neural processes that relate to infants' attention allocation in a naturalistic setting. Contrary to experimental paradigms that are usually employed in developmental research, this study investigates attention processes while letting the infants be free to play with three toys in the vicinity of their caregiver, which is closer to a common, everyday life context. The paper focuses on infants at 5 and 10 months of age and finds differences in what predicts attention allocation. At 5 months, attention episodes are shorter and their duration is predicted by autonomic arousal. At 10 months, attention episodes are longer, and their duration can be predicted by theta power. Moreover, theta power predicted the proportion of looking at the toys, as well as a decrease in arousal (heart rate). Overall, the authors conclude that attentional systems change across development, becoming more driven by cortical processes.

      Strengths:

      I enjoyed reading the paper, I am impressed with the level of detail of the analyses, and I am strongly in favour of the overall approach, which tries to move beyond in-lab settings. The collection of multiple sources of data (EEG, heart rate, looking behaviour) at two different ages (5 and 10 months) is a key strength of this paper. The original analyses, which build onto robust EEG preprocessing, are an additional feat that improves the overall value of the paper. The careful consideration of how theta power might change before, during, and in the prediction of attention episodes is especially remarkable. However, I have a few major concerns that I would like the authors to address, especially on the methodological side.

      Points of improvement

      (1) Noise

      The first concern is the level of noise across age groups, periods of attention allocation, and metrics. Starting with EEG, I appreciate the analysis of noise reported in supplementary materials. The analysis focuses on a broad level (average noise in 5-month-olds vs 10-month-olds) but variations might be more fine-grained (for example, noise in 5mos might be due to fussiness and crying, while at 10 months it might be due to increased movements). More importantly, noise might even be the same across age groups, but correlated to other aspects of their behaviour (head or eye movements) that are directly related to the measures of interest. Is it possible that noise might co-vary with some of the behaviours of interest, thus leading to either spurious effects or false negatives? One way to address this issue would be for example to check if noise in the signal can predict attention episodes. If this is the case, noise should be added as a covariate in many of the analyses of this paper. 

      We thank the reviewer for this comment. We certainly have evidence that even the most state-of-the-art cleaning procedures (such as machine-learning trained ICA decompositions, as we applied here) are unable to remove eye movement artifact entirely from EEG data (Haresign et al., 2021; Phillips et al., 2023). (This applies to our data but also to others’ where confounding effects of eye movements are generally not considered.) Importantly, however, our analyses have been designed very carefully with this explicit challenge in mind. All of our analyses compare changes in the relationship between brain activity and attention as a function of age, and there is no evidence to suggest that different sources of noise (e.g. crying vs. movement) would associate differently with attention durations nor change their interactions with attention over developmental time. And figures 5 and 7, for example, both look at the relationship of EEG data at one moment in time to a child’s attention patterns hundreds or thousands of milliseconds before and after that moment, for which there is no possibility that head or eye movement artifact can have systematically influenced the results.

      Moving onto the video coding, I see that inter-rater reliability was not very high. Is this due to the fine-grained nature of the coding (20ms)? Is it driven by differences in expertise among the two coders? Or because coding this fine-grained behaviour from video data is simply too difficult? The main dependent variable (looking duration) is extracted from the video coding, and I think the authors should be confident they are maximising measurement accuracy.

      We appreciate the concern. To calculate IRR we used this function (Cardillo G. (2007) Cohen's kappa: compute the Cohen's kappa ratio on a square matrix. http://www.mathworks.com/matlabcentral/fileexchange/15365). Our “Observed agreement” was 0.7 (std= 0.15). However, we decided to report the Cohen's kappa coefficient, which is generally thought to be a more robust measure as it takes into account the agreement occurring by chance. We conducted the training meticulously (refer to response to Q6, R3), and we have confidence that our coders performed to the best of their abilities.
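
The difference between observed agreement and Cohen's kappa can be illustrated with a minimal Python sketch. It does not reproduce the MATLAB function cited above; the function names and the frame-by-frame labels are hypothetical and serve only to show how agreement expected by chance is subtracted out.

```python
from collections import Counter

def observed_agreement(coder_a, coder_b):
    """Proportion of frames on which both coders assigned the same label."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with p_e the chance agreement."""
    n = len(coder_a)
    p_o = observed_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal label proportions.
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for ten frames (not data from the study).
coder_a = ["toy", "toy", "toy", "away", "toy", "away", "toy", "toy", "away", "toy"]
coder_b = ["toy", "toy", "away", "away", "toy", "toy", "toy", "toy", "away", "toy"]

p_o = observed_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```

For these made-up labels the observed agreement is 0.8 while kappa is about 0.52, because both coders label most frames as "toy" and would therefore often agree by chance alone.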

      (2) Cross-correlation analyses

      I would like to raise two issues here. The first is the potential problem of using auto-correlated variables as input for cross-correlations. I am not sure whether theta power was significantly autocorrelated. If it is, could it explain the cross-correlation result? The fact that the cross-correlation plots in Figure 6 peak at zero, and are significant (but lower) around zero, makes me think that it could be a consequence of periods around zero being autocorrelated. Relatedly: how does the fact that the significant lag includes zero, and a bit before, affect the interpretation of this effect? 

      Just to clarify this analysis, we did include a plot showing autocorrelation of theta activity in the original submission (Figs 7A and 7B in the revised paper). These indicate that theta shows little to no autocorrelation, and we can see no way in which this might have influenced our results. From their comments, the reviewer seems rather to be thinking of phasic changes in the autocorrelation, and of the possibility that greater stability in theta during the time period around looks might have caused the cross-correlation result shown in 7E. Again, though, we can see no way in which this might be true, as the cross-correlation indicates that greater theta power is associated with a greater likelihood of looking, and this would not have been affected by changes in the autocorrelation.

      A second issue with the cross-correlation analyses is the coding of the looking behaviour. If I understand correctly, if an infant looked for a full second at the same object, they would get a maximum score (e.g., 1) while if they looked at 500ms at the object and 500ms away from the object, they would receive a score of e.g., 0.5. However, if they looked at one object for 500ms and another object for 500ms, they would receive a maximum score (e.g., 1). The reason seems unclear to me because these are different attention episodes, but they would be treated as one. In addition, the authors also show that within an attentional episode theta power changes (for 10mos). What is the reason behind this scoring system? Wouldn't it be better to adjust by the number of attention switches, e.g., with the formula: looking-time/(1+N_switches), so that if infants looked for a full second, but made 1 switch from one object to the other, the score would be .5, thus reflecting that attention was terminated within that episode? 

      We appreciate this suggestion. This is something we did not consider, and we thank the reviewer for raising it. In response to their comment, we have now rerun the analyses using the new measure (looking-time/(1+N_switches)), and we are reassured to find that the results remain highly consistent. Please see Author response image 1 below, where you can see the original results in orange and the new measure in blue at 5 and 10 months.
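
For concreteness, the switch-adjusted score can be sketched as follows. This is an illustrative Python sketch, not the analysis code: the frame labels, the 20 ms frame length (taken from the coding resolution mentioned above), and the choice to count a switch whenever two successive toy-directed frames target different objects are all assumptions.

```python
def adjusted_looking_score(frames, frame_s=0.02):
    """looking_time / (1 + n_switches) for one analysis window.

    frames: per-frame gaze labels, a toy name (e.g. "A") or None when the
            infant is not looking at any toy (hypothetical coding scheme).
    frame_s: duration of one coded frame in seconds (assumed to be 20 ms).
    """
    toy_frames = [f for f in frames if f is not None]
    looking_time = len(toy_frames) * frame_s
    # Assumption: a switch is any change of target between successive
    # toy-directed frames; looks away in between are ignored.
    n_switches = sum(1 for prev, cur in zip(toy_frames, toy_frames[1:]) if prev != cur)
    return looking_time / (1 + n_switches)

# Three hypothetical 1-second windows (50 frames each).
full_look = ["A"] * 50                 # one object for the whole second
half_away = ["A"] * 25 + [None] * 25   # 500 ms on a toy, 500 ms away
one_switch = ["A"] * 25 + ["B"] * 25   # 500 ms on each of two toys

scores = [adjusted_looking_score(w) for w in (full_look, half_away, one_switch)]
print(scores)
```

Under these assumptions the window with one switch scores the same as the window with 500 ms of looking away, which is the behaviour the reviewer's formula is meant to produce.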

      Author response image 1.

      (3) Clearer definitions of variables, constructs, and visualisations

      The second issue is the overall clarity and systematicity of the paper. The concept of attention appears with many different names. Only in the abstract, it is described as attention control, attentional behaviours, attentiveness, attention durations, attention shifts and attention episode. More names are used elsewhere in the paper. Although some of them are indeed meant to describe different aspects, others are overlapping. As a consequence, the main results also become more difficult to grasp. For example, it is stated that autonomic arousal predicts attention, but it's harder to understand what specific aspect (duration of looking, disengagement, etc.) it is predictive of. Relatedly, the cognitive process under investigation (e.g., attention) and its operationalization (e.g., duration of consecutive looking toward a toy) are used interchangeably. I would want to see more demarcation between different concepts and between concepts and measurements.

      We appreciate the comment and we have clarified the concepts and their operationalisation throughout the revised manuscript.

      General Remarks

      In general, the authors achieved their aim in that they successfully showed the relationship between looking behaviour (as a proxy of attention), autonomic arousal, and electrophysiology. Two aspects are especially interesting. First, the fact that at 5 months, autonomic arousal predicts the duration of subsequent attention episodes, but at 10 months this effect is not present. Conversely, at 10 months, theta power predicts the duration of looking episodes, but this effect is not present in 5-month-old infants. This pattern of results suggests that younger infants have less control over their attention, which mostly depends on their current state of arousal, but older infants have gained cortical control of their attention, which in turn impacts their looking behaviour and arousal.

      We thank the reviewer for the close attention that they have paid to our manuscript, and for their insightful comments.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript explores infants' attention patterns in real-world settings and their relationship with autonomic arousal and EEG oscillations in the theta frequency band. The study included 5- and 10-month-old infants during free play. The results showed that, in the 5-month-old group, a decline in HR forward-predicted attentional behaviors, while the 10-month-old group exhibited increased theta power following shifts in gaze, indicating the start of a new attention episode. Additionally, this increase in theta power predicted the duration of infants' looking behavior.

      Strengths:

      The study's strengths lie in its utilization of advanced protocols and cutting-edge techniques to assess infants' neural activity and autonomic arousal associated with their attention patterns, as well as the extensive data coding and processing. Overall, the findings have important theoretical implications for the development of infant attention.

      Weaknesses:

      Certain methodological procedures require further clarification, e.g., details on EEG data processing. Additionally, it would be beneficial to eliminate possible confounding factors and consider alternative interpretations, e.g., whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during the free play.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #3 (Public Review):

      Summary:

      Much of the literature on attention has focused on static, non-contingent stimuli that can be easily controlled and replicated--a mismatch with the actual day-to-day deployment of attention. The same limitation is evident in the developmental literature, which is further hampered by infants' limited behavioral repertoires and the general difficulty in collecting robust and reliable data in the first year of life. The current study engages young infants as they play with age-appropriate toys, capturing visual attention, cardiac measures of arousal, and EEG-based metrics of cognitive processing. The authors find that the temporal relations between measures are different at age 5 months vs. age 10 months. In particular, at 5 months of age, cardiac arousal appears to precede attention, while at 10 months of age attention processes lead to shifts in neural markers of engagement, as captured in theta activity.

      Strengths:

      The study brings to the forefront sophisticated analytical and methodological techniques to bring greater validity to the work typically done in the research lab. By using measures in the moment, they can more closely link biological measures to actual behaviors and cognitive stages. Often, we are forced to capture these measures in separate contexts and then infer in-the-moment relations. The data and techniques provide insights for future research work.

      Weaknesses:

      The sample is relatively modest, although this is somewhat balanced by the sheer number of data points generated by the moment-to-moment analyses. In addition, the study is cross-sectional, so the data cannot capture true change over time. Larger samples, followed over time, will provide a stronger test for the robustness and reliability of the preliminary data noted here. Finally, while the method certainly provides for a more active and interactive infant in testing, we are a few steps removed from the complexity of daily life and social interactions.

      We thank the reviewer for their suggestions and have addressed them in our point-by-point responses below.

      Reviewer #1 (Recommendations For The Authors):

      Here are some specific ways in which clarity can be improved:

      A. Regarding the distinction between constructs, or measures and constructs:

      i. In the results section, I would prefer to mention looking duration and heart rate as metrics that have been measured, while in the introduction and discussion, a clear 1-to-1 link between construct/cognitive process and behavioural or (neuro)psychophysical measure can be made (e.g., sustained attention is measured via looking durations; autonomic arousal is measured via heart-rate). 

      The way attention and arousal were operationalised is now clarified throughout the text, especially in the results.

      ii. Relatedly, the "attention" variable is not really measuring attention directly. It is rather measuring looking time (proportion of looking time to the toys?), which is the operationalisation, which is hypothesised to be related to attention (the construct/cognitive process). I would make the distinction between the two stronger.

      This distinction between looking and paying attention is clearer now in the revised manuscript as per R1 and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and pointed out its limitations (see pg.5).

      B. Each analysis should be set out to address a specific hypothesis. I would rather see hypotheses in the introduction (without direct reference to the details of the models that were used), and how a specific relation between variables should follow from such hypotheses. This would also solve the issue that some analyses did not seem directly necessary to the main goal of the paper. For example:

      i. Are ACF and survival probability analyses aimed at proving different points, or are they different analyses to prove the same point? Consider either making clearer how they differ or moving one to supplementary materials.

      We clarified this in pg. 4 of the revised manuscript.

      ii. The autocorrelation results are not mentioned in the introduction. Are they aiming to show that the variables can be used for cross-correlation? Please clarify their role or remove them.

      We clarified this in pg. 4 of the revised manuscript.

      C. Clarity of cross-correlation figures. To ensure clarity when presenting a cross-correlation plot, it's important to provide information on the lead-lag relationships and which variable is considered X and which is Y. This could be done by labelling the axes more clearly (e.g., the left-hand side of the x-axis specifies x leads y, right hand specifies y leads x) or adding a legend (e.g., dashed line indicates x leading y, solid line indicates y leading x). Finally, the limits of the x-axis are consistent across plots, but the limits of the y-axis differ, which makes it harder to visually compare the different plots. More broadly, the plots could have clearer labels, and their resolution could also be improved. 

      This information on what variable precedes/ follows was in the caption of the figures. However, we have edited the figures as per the reviewer’s suggestion and added this information in the figures themselves. We have also uploaded all the figures in higher resolution.
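
Because the sign convention for lags is easy to misread, a minimal Python sketch of a lagged cross-correlation may help. The convention used here (positive lag means x leads y), the function names, and the toy data are assumptions for illustration; the figures in the manuscript may use the opposite convention.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def cross_correlation(x, y, max_lag):
    """Return {lag: r}. A positive lag pairs x[t] with y[t + lag] (x leads y);
    a negative lag pairs x[t] with an earlier y (y leads x)."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        result[lag] = pearson(xs, ys)
    return result

# Toy data: y is a copy of x delayed by two samples, so x leads y by 2.
x = [0, 1, 0, 2, 5, 1, 0, 3, 1, 0, 4, 2, 0, 1, 3, 0]
y = [0, 0] + x[:-2]

cc = cross_correlation(x, y, max_lag=4)
peak_lag = max(cc, key=cc.get)
print(peak_lag)
```

With this convention the peak falls at a lag of +2, on the side of the axis where x leads y, which is the kind of labelling the reviewer asks for.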

      D. Figure 7 was extremely helpful for understanding the paper, and I would rather have it as Figure 1 in the introduction. 

      We have moved figure 7 to figure 1 as per this request.

      E. Statistics should always be reported, and effects should always be described. For example, results of autocorrelation are not reported, and from the plot, it is also not clear if the effects are significant (the caption states that red dots indicate significance, but there are no red dots. Does this mean there is no autocorrelation?).

      We apologise – this was hard to read in the original. We have clarified that there is no autocorrelation present in Fig 7A and 7D.

      And if so, given that theta is a wave, how is it possible that there is no autocorrelation (connected to point 1)? 

      We thank the reviewer for raising this point. Theta power measures oscillatory activity in the EEG within the 3-6Hz window (i.e. 3 to 6 oscillations per second). By contrast, we analysed the autocorrelation in the EEG data by looking at changes in theta power between consecutive 1-second-long windows. To say that there is no autocorrelation in the data means that, if there is more 3-6Hz activity within one particular 1-second window, there tends not to be significantly more 3-6Hz activity within the 1-second windows immediately before and after.
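
The distinction between an oscillating signal and the autocorrelation of its windowed power can be illustrated with a small synthetic example. This is a sketch under stated assumptions, not the analysis pipeline: the 64 Hz sampling rate, the pure 4 Hz sine, and the amplitude redrawn independently for each 1-second window are all made up for illustration.

```python
import cmath
import math
import random

FS = 64  # hypothetical sampling rate (Hz); each 1-second window has FS samples

def band_power(window, fs, f_lo=3, f_hi=6):
    """Power summed over the integer-Hz DFT bins from f_lo to f_hi (inclusive).
    Integer-Hz bins are exact here because the window lasts exactly 1 second."""
    n = len(window)
    total = 0.0
    for f in range(f_lo, f_hi + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * f * t / fs)
                    for t, v in enumerate(window))
        total += abs(coeff) ** 2 / n
    return total

def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at a positive integer lag."""
    m = sum(series) / len(series)
    num = sum((series[t] - m) * (series[t + lag] - m)
              for t in range(len(series) - lag))
    den = sum((v - m) ** 2 for v in series)
    return num / den

random.seed(0)
n_windows = 200
signal = []
for _ in range(n_windows):
    amp = random.uniform(0.0, 2.0)  # amplitude redrawn independently per window
    signal.extend(amp * math.sin(2 * math.pi * 4 * t / FS) for t in range(FS))

theta = [band_power(signal[i * FS:(i + 1) * FS], FS) for i in range(n_windows)]

raw_ac = autocorrelation(signal, FS // 4)  # lag of one 4 Hz cycle, in samples
power_ac = autocorrelation(theta, 1)       # lag of one 1-second window
print(f"raw signal: {raw_ac:.2f}, windowed theta power: {power_ac:.2f}")
```

In this construction the raw signal is strongly autocorrelated at the lag of one cycle, yet the window-by-window theta power is close to uncorrelated from one second to the next, because the amount of theta in each window is independent of its neighbours.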

      F. Alpha power is introduced later on, and in the discussion, it is mentioned that the effects that were found go against the authors' expectations. However, alpha power and the authors' expectations about it are not mentioned in the introduction. 

      We thank the reviewer for this comment. We have added a paragraph on alpha in the introduction (pg.4).

      Minor points:

      (1) At the end of the 1st page of the introduction, the authors state that: 

      “How children allocate their attention in experimenter-controlled, screen-based lab tasks differs, however, from actual real-world attention in several ways (32-34). For example, the real-world is interactive and manipulable, and so how we interact with the world determines what information we, in turn, receive from it: experiences generate behaviours (35).”

      I think there's more to this though - Lab-based studies can be made interactive too (e.g., Meyer et al., 2023, Stahl & Feigenson, 2015). What remains unexplored is how infants actively and freely initiate and self-structure their attention, rather than how they respond to experimental manipulations.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Stahl, A. E., & Feigenson, L. (2015). Observing the unexpected enhances infants' learning and exploration. Science, 348(6230), 91-94.

      We thank the reviewer for this suggestion and added their point in pg. 4.

      (2) Regarding analysis 4:

      a. In analysis 1 you showed that the duration of attentional episodes changes with age. Is it fair to keep the same start, middle, and termination ranges across age groups? Is 3-4 seconds "middle" for 5-month-olds? 

      We appreciate the comment. There are many ways we could have run these analyses and, in fact, in other papers we have done it differently, for example by splitting each look in 3, irrespective of its duration (Phillips et al., 2023).

      However, one aspect we took into account was the observation that 5-month-old infants exhibited more short looks than older infants. We recognized that dividing each look into 3 parts, regardless of its duration, might have impacted the results. Presumably, the activity during the middle and termination phases of a 1.5-second look differs from that of a look lasting over 7 seconds.

      Two additional factors that provided us with confidence in our approach were: 1) while the definition of "middle" was somewhat arbitrary, it allowed us to maintain consistency in our analyses across different age points; and 2) we obtained a comparable number of observations across the two time points (e.g. for “middle” we had 172 events at 5 months and 194 events at 10 months).

      b. It is recommended not to interpret lower-level interactions if more complex interactions are not significant. How are the interaction effects in a simpler model in which the 3-way interaction is removed? 

      We appreciate the comment. We tried to follow the same steps as in Xie et al. (2018). However, we have re-analysed the data without the 3-way interaction, and the significance of the results stayed the same. Please see Author response image 2 below (first: new analyses without the 3-way interaction; second: original analyses that included the 3-way interaction).

      Author response image 2.

      (3) Figure S1: there seems to be an outlier in the bottom-right panel. Do results hold excluding it? 

      We re-ran these analyses as per this suggestion and the results stayed the same (refer to SM pg. 2).

      (4) Figure S2 should refer to 10 months instead of 12.

      We thank the reviewer for noticing this typo; we have corrected it in the revised manuscript (see SM pg. 3). 

      (5) In the 2nd paragraph of the discussion, I found this sentence unclear: "From Analysis 1 we found that infants at both ages showed a preferred modal reorientation rate". 

      We clarified this on pg. 10 of the revised manuscript.

      (6) Discussion: many (infant) studies have used theta in anticipation of receiving information (Begus et al., 2016) surprising events (Meyer et al., 2023), and especially exploration (Begus et al., 2015). Can you make a broader point on how these findings inform our interpretation of theta in the infant population (go more from description to underlying mechanisms)? 

      We have expanded on this point about interpreting frequency bands on pg. 13 of the revised manuscript and thank the reviewer for bringing it up.

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants' preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397-12402.

      Meyer, M., van Schaik, J. E., Poli, F., & Hunnius, S. (2023). How infant‐directed actions enhance infants' attention, learning, and exploration: Evidence from EEG and computational modeling. Developmental Science, 26(1), e13259.

      Begus, K., Southgate, V., & Gliga, T. (2015). Neural mechanisms of infant learning: differences in frontal theta activity during object exploration modulate subsequent object recognition. Biology letters, 11(5), 20150041.

      (7) 2nd page of discussion, last paragraph: "preferred modal reorientation timer" is not a neural/cognitive mechanism, just a resulting behaviour. 

      We agree with this comment and thank the reviewer for bringing it to our attention. We clarified this on pg. 12 and pg. 13 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      I have a few comments and questions that I think the authors should consider addressing in a revised version. Please see below:

      (1) During preprocessing (steps 5 and 6), it seems like the "noisy channels" were rejected using the pop_rejchan.m function and then interpolated. This procedure is common in infant EEG analysis, but a concern arises: was there no upper limit for channel interpolation? Did the authors still perform bad channel interpolation even when more than 30% or 40% of the channels were identified as "bad" at the beginning with the continuous data? 

      We did state in the original manuscript that “participants with fewer than 30% channels interpolated at 5 months and 25% at 10 months made it to the final step (ICA) and final analyses”. In the revised version we have re-written this section in order to make this more clear (pg. 17).

      (2) I am also perplexed about the sequencing of the ICA pruning step. If the intention of ICA pruning is to eliminate artificial components, would it be more logical to perform this procedure before the conventional artifacts' rejection (i.e., step 7), rather than after? In addition, what was the methodology employed by the authors to identify the artificial ICA components? Was it done through manual visual inspection or utilizing specific toolboxes? 

      We agree that ICA is often run first. However, we rejected continuous data prior to ICA in order to remove the very worst sections of data (where almost all channels were affected), which can arise during times when infants fuss or pull the caps. This step was therefore applied at this point in the pipeline so that these sections of very bad data were not fed into the ICA. This is fairly widespread practice in cleaning infant data.

      Concerning the reviewer’s second question, of how ICA components were removed: this is described in considerable detail in the paper that we refer to in that section of the manuscript. It was done by training a classifier specially designed to clean naturalistic infant EEG data (Haresign et al., 2021), which has since been employed in similar studies (e.g. Georgieva et al., 2020; Phillips et al., 2023).

      (3) Please clarify how the relative power was calculated for the theta (3-6Hz) and alpha (6-9Hz) bands. Were they calculated by dividing the ratio of theta or alpha power to the power between 3 and 9Hz, or the total power between 1 (or 3) and 20 Hz? In other words, what does the term "all frequency bands" refer to in section 4.3.7? 

      We thank the reviewer for this comment, we have now clarified this in pg. 22.
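
The reviewer's question turns on which denominator is used, which a short Python sketch can make explicit. This response does not state the denominator, so the sketch takes it as a parameter; the 1/f-shaped spectrum, the function name, and the half-open band edges (so that the 6 Hz bin is not counted in both theta and alpha) are assumptions for illustration only.

```python
def relative_power(spectrum, band, total_range):
    """Band power divided by the power in total_range.

    spectrum: dict mapping frequency (Hz) to power (hypothetical values).
    band, total_range: (low, high) tuples, treated as half-open [low, high).
    """
    def summed(lo, hi):
        return sum(p for f, p in spectrum.items() if lo <= f < hi)
    return summed(*band) / summed(*total_range)

# Hypothetical 1/f-like power spectrum at integer frequencies from 1 to 20 Hz.
spectrum = {f: 1.0 / f for f in range(1, 21)}

THETA, ALPHA = (3, 6), (6, 9)

# Option 1: relative to the combined theta and alpha range only (3 to 9 Hz).
theta_narrow = relative_power(spectrum, THETA, (3, 9))
alpha_narrow = relative_power(spectrum, ALPHA, (3, 9))

# Option 2: relative to a broad range (here 1 to 20 Hz inclusive).
theta_wide = relative_power(spectrum, THETA, (1, 21))

print(f"{theta_narrow:.2f} {alpha_narrow:.2f} {theta_wide:.2f}")
```

The same theta power yields a noticeably different relative value depending on the denominator, which is why the definition of "all frequency bands" matters when comparing across studies.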

      (4) One of the key discoveries presented in this paper is the observation that attention shifts are accompanied by a subsequent enhancement in theta band power shortly after the shifts occur. Is it possible that this effect or alteration might be linked to infants' saccades, which are used as indicators of attention shifts? Would it be feasible to analyze the disparities in amplitude between the left and right frontal electrodes (e.g., Fp1 and Fp2, which could be viewed as virtual horizontal EOG channels) in relation to theta band power, in order to eliminate the possibility that the augmentation of theta power was attributable to the intensity of the saccades? 

      We appreciate the concern. Average saccade duration in infants is about 40ms (Garbutt et al., 2007). Our finding that the positive cross-correlation between theta and look duration is present not only when we examine zero-lag data but also when we examine how theta forwards-predicts attention 1-2 seconds afterwards seems therefore unlikely to be directly attributable to saccade-related artifact. Concerning the reviewer’s suggestion – this is something that we have tried in the past. Unfortunately, however, our experience is that identifying saccades based on the disparity between Fp1 and Fp2 is much too unreliable to be of any use in analysing data. Even if specially positioned HEOG electrodes are used, we still find the saccade detection to be insufficiently reliable. In ongoing work we are tracking eye movements separately, in order to be able to address this point more satisfactorily.

      (5) The following question is related to my previous comment. Why is the duration of the relationship between theta power and moment-to-moment changes in attention so short? If theta is indeed associated with attention and information processing, shouldn't the relationship between the two variables strengthen as the attention episode progresses? Given that the authors themselves suggest that "One possible interpretation of this is that neural activity associates with the maintenance more than the initiation of attentional behaviors," it raises the question of (is in contradiction to) why the duration of the relationship is not longer but declines drastically (Figure 6). 

      We thank the reviewer for raising this excellent point. Certainly we argue that this, together with the low autocorrelation values for theta documented in Fig 7A and 7D, challenges many conventional ways of interpreting theta. We are continuing to investigate this question in ongoing work.

      (6) Have the authors conducted a comparison of alpha relative power and HR deceleration durations between 5 and 10-month-old infants? This analysis could provide insights into whether the differences observed between the two age groups were partly due to varying levels of general arousal and engagement during free play.

      We thank the reviewer for this suggestion. Indeed, this is an aspect we investigated, but ultimately, given that our primary emphasis was on the theta frequency, and considering the length of the manuscript, we decided not to incorporate it. However, we have attached Author response image 3 below, showing that there was no significant interaction between HR and the alpha band.

      Author response image 3.

      Reviewer #3 (Recommendations For The Authors):

      (1) In reading the manuscript, the language used seems to imply longitudinal data or at the very least the ability to detect change or maturation. Given the cross-sectional nature of the data, the language should be tempered throughout. The data are illustrative but not definitive. 

      We thank the reviewer for this comment. We have now clarified that “Data was analysed in a cross-sectional manner” in pg15.

      (2) The sample size is quite modest, particularly in the specific age groups. This is likely tempered by the sheer number of data points available. This latter argument is implied in the text, but not as explicitly noted. (However, I may have missed this as the text is quite dense). I think more notice is needed on the reliability and stability of the findings given the sample. 

      We have clarified this in pg16.

      (3) On a related note, how was the sample size determined? Was there a power analysis to help guide decision-making for both recruitment and choosing which analyses to proceed with? Again, the analytic approach is quite sophisticated and the questions are of central interest to researchers, but I was left feeling maybe these two aspects of the study were out-sprinting the available data. The general impression is that the sample is small, but it is not until looking at table s7, that it is in full relief. I think this should be more prominent in the main body of the study.

      We have clarified this on pg. 16.

      (4) The manuscript devotes a few sentences to the relation between looking and attention. However, this distinction is central to the design of the study, and any philosophical differences regarding what take-away points can be generated. In my reading, I think this point needs to be more heavily interrogated.

      This distinction between looking and paying attention is clearer now in the revised manuscript, as per R1 and R3’s suggestions. We have also added a paragraph in the Introduction to clarify it and pointed out its limitations (see pg. 5).

      (5) I would temper the real-world attention language. This study is certainly a great step forward, relative to static faces on a computer screen. However, there are still a great number of artificial constraints that have been added. That is not to say that the constraints are bad--they are necessary to carry out the work. However, it should be acknowledged that it constrains the external validity. 

      We have added a paragraph to acknowledge the limitations of the setup on pg. 14.

      (6) The kappa on the coding is not strong. The authors chose to proceed nonetheless. Given that, I think more information is needed on how coders were trained, how they were standardized, and what parameters were used to decide they were ready to code independently. Again, with the sample size and the kappa presented, I think more discussion is needed regarding the robustness of the findings. 

      We appreciate the concern. As per our answer to R1, we chose to report the most stringent measure of inter-rater reliability, but other calculation methods (i.e., percent agreement) return higher scores (see response to R1).

      Regarding the training, we wrote an extensively detailed coding scheme, describing exactly how to code each look, that was handed to our coders. Throughout the initial months of training, we met with the coders on a weekly basis to discuss questions and individual frames that looked ambiguous. After each session, we would revise the coding scheme to incorporate additional details, aiming to make the coding process progressively less subjective. During this period, every coder analysed the same interactions, and inter-rater reliability (IRR) was assessed weekly, comparing their evaluations with mine (Marta). With time, the coders had fewer questions and IRR increased. At that point, we deemed them sufficiently trained and began assigning them different interactions from each other. Periodically, though, we all assessed the same interaction and met to review and discuss our coding outputs.

    1. eLife assessment

      This valuable manuscript reveals sex differences in bi-conditioning Pavlovian learning and conditional behavior. Males learn hierarchical context-cue-outcome associations more quickly, but females show more stable and robust task performance. These sex differences are related to cellular activation in the orbitofrontal cortex. Although the evidence for the claims is convincing, the claim of sex differences in context-dependent discrimination behaviour is overstated in places. Nevertheless, the results will be of interest to many behavioural neuroscientists, particularly those who investigate sex-specific behaviours.

    2. Reviewer #1 (Public Review):

      Summary:

      Peterson et al. present a series of experiments in which the Pavlovian performance (i.e. time spent at a food cup/port) of male and female rats is assessed in various tasks in which context/cue/outcome relationships are altered. The authors find no sex differences in context-irrelevant tasks, and no such differences in tasks in which the context signals that different cues will earn different outcomes. They do find sex differences, however, when a single outcome is given and context cues must be used to ascertain which cue will be rewarded with that outcome (Ctx-dep O1 task). Specifically, they find that males acquired the task faster, but that once acquired, performance of the task was more resilient in female rats against exposures to a stressor. Finally, they show that these sex differences are reflected in differential rates of c-fos expression in all three subregions of rat OFC (medial, lateral and ventral), in the sense that it is higher in females than males, and only in the animals subject to the Ctx-dep O1 task in which sex differences were observed.

      Strengths:

      • Well written
      • Experiments elegantly designed
      • Robust statistics
      • Behaviour is the main feature of this manuscript, rather than any flashy techniques or fashionable lab methodologies, and luckily the behaviour is done really well.
      • For the most part I think the conclusions were well supported, although I do have some slightly different interpretations to the authors in places.

      Weaknesses:

      The authors have done an excellent job of addressing all previous weaknesses. I have no further comments.

    3. Reviewer #2 (Public Review):

      Summary:

      A bidirectional occasion-setting design is used to examine sex differences in the contextual modulation of reward-related behaviour. It is shown that females are slower to acquire contextual control over cue-evoked reward seeking. However, once established, the contextual control over behaviour was more robust in female rats (i.e., less within-session variability and greater resistance to stress) and this was also associated with increased OFC activation.

      Strengths:

      The authors use sophisticated behavioural paradigms to study the hierarchical contextual modulation of behaviour. The behavioural controls are particularly impressive and do, to some extent, support the specificity of the conclusions. The analyses of the behavioural data are also elegant, thoughtful, and rigorous.

      Comments on revised version:

      In this revised version the authors have addressed the major weaknesses that I identified in my previous review.

    4. Reviewer #3 (Public Review):

      Summary:

      This manuscript reports an experiment that compared groups of rats on the acquisition and performance of a Pavlovian bi-conditional discrimination, in which the presence of one cue, A, signals that the presentation of one CS, X, will be followed by a reinforcer and a second CS, Y, will be nonreinforced. Periods of cue A alternated with periods of cue B, which signaled the opposite relationship: cue X is nonreinforced and cue Y is reinforced. This is a conditional discrimination problem in which the rats learned to approach the food cup in the presence of each CS conditional on the presence of the third background cue. The comparison groups consisted of the same conditional discrimination with the exception that each CS was paired with a different reinforcer. This makes the problem easier to solve as the background is now priming a differential outcome. A third group received simple discrimination training of X reinforced and Y nonreinforced in cues A and B, and the final group were trained with X and Y reinforced on half the trials (no discrimination). The results were clear that the latter two discrimination learning procedures resulted in rapid learning in comparison to the first. Rats required about 3 times as many 4-session blocks to acquire the bi-conditional discrimination as the other two discrimination groups. Within the biconditional discrimination group, female and male rats spent the same amount of time in the food cup during the rewarded CS, but females spent more time in the food cup during CS- than males. The authors interpret this as a deficit in discrimination performance in females on this task and use a measure that exaggerates the difference in CS+ and CS- responding (a discrimination ratio) to support their point. When tested after acute restraint stress, the male rats spent less time in the food cup during the reinforced CS in comparison to the female rats, but did not lose discrimination performance entirely.
      There was also some evidence of more fos positive cells in the orbitofrontal cortex in females. Overall, I think the authors were successful in documenting performance on the biconditional discrimination task; showing that it is more difficult to perform than other discriminations is valuable and consistent with the proposal that accurate performance requires encoding of conditional information (which the authors refer to as "context"). There is evidence that female rats spend more time in the food cup during CS-, but I hesitate to agree that this is an important sex difference. There is no cost to spending more time in the food cup during CS- and they spend much less time there than during CS+. Males and females also did not differ in their CS+ responding, suggesting similar levels of learning. A number of factors could contribute to more food cup time in CS-, such as smaller body size and more locomotor activity. The number of food cup entries during CS+ and CS- was not reported here. Nevertheless, I think the manuscript will make a useful contribution to the field and hopefully lead readers to follow up on these types of tasks. One area for development would be to test the associative properties of the cues controlling the conditional discrimination: can they be shown to have the properties of Pavlovian occasion setting stimuli? Such work would strengthen the justification/rationale for using the terms "context" and "occasion setter" to refer to these stimuli in this task in the way the authors do in this paper.

      Strengths:

      Nicely designed and conducted experiment.
      Documents performance difference by sex.

      Weaknesses:

      Overstatement of sex differences.
      Inconsistent, confusing, and possibly misleading use of terms to describe/imply the underlying processes contributing to performance.

    5. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive comments on our manuscript and their appreciation of the results. We provide point-by-point responses below. For your convenience, we highlight here the main changes to the manuscript.

      ·        More descriptive terminology for the contextual cues (Ctx.A / Ctx.noA is now referred to as LIGHT / DARK).

      ·        Schematic of experiment timeline highlighting the exclusion of non-discriminators following the initial acquisition period. This explains the absence of baseline sex differences post acquisition and clears up some misconceptions about lack of replicability.

      ·        New data (time in port preCS) showing that a prior reward does not cause continued presence in port.

      ·        Several text edits to address all the points raised by the reviewers.

      We hope that the editors and reviewers will be satisfied with this revised version and find the strength of the evidence more convincing.

      Reviewer #1 (Recommendations For The Authors):

      In relation to weaknesses points 1-4 in the public review:

      (1) With regards to the claim (page 4 of pdf), I think I can see what the authors are getting at when they claim "Only Ctx-dep.01 engages context-gated reward predictions", because the same reward is available in each context, and the animal must use contextual information to determine which cue will be rewarded. In other words, it has a discriminative purpose. In Ctx-dep.O1/O2, however, although the context doesn't serve a discriminative purpose in the sense that one cue will always earn a unique outcome, regardless of context, the fact that these cues are differentially rewarded in the different context means that animals may well form context-gated cue-outcome associations (e.g. CtxA-(CS1-O1), CtxnoA-(CS2-O2)). Moreover, the context is informative in this group in telling the animal which cue will be rewarded, even prior to outcome delivery, such that I don't think contextual information will fade to the background of the association and attention be lost to it in the way, say Mackintosh (1975) might predict. Therefore, I don't think this statement is correct.

      I suggest that the authors refine the statement to be more accurate.

      We agree with the reviewer: the context is absolutely relevant for rats trained in the Ctx-dep. O1/O2 task. We have edited the text in several places to make this clear. The question is how (by what mechanism) the context participates in the control of behavior in this group. The reviewer correctly points out that, just like rats trained in the Ctx-dep. O1 task, rats trained in the Ctx-dep. O1/O2 task might have formed context-gated cue-outcome associations. We now clearly acknowledge that in the text.

      However, because in this group the two outcomes are always encountered in different contexts, we argue that these rats could also have formed a direct association between the two contexts and the two outcomes. In other words, each context might directly evoke the expectation of a distinct reward outcome (prepare to drink, or prepare to eat). On a given trial, if the cue and context both tend to activate the same outcome representation, the converging cue+context excitation can add up. This would produce a context-sensitive response, but not via a hierarchical modulation process (unlike Ctx-dep O1). Arguably, this last associative mechanism is much simpler and might explain why almost all rats in the Ctx-dep. O1/O2 group learned the discrimination, and did so at a much faster rate.

      Therefore, while rats trained in Ctx-dep O1/O2 might engage a combination of associative processes to achieve context-sensitive behavior (including hierarchical associations), only rats in the Ctx-dep O1 critically and unambiguously rely on hierarchical associations to achieve context-sensitive behavior.

      (2) I think the results shown in Figure 1 are very interesting, and well supported by the statistics. It's so nice to see a significant interaction, as so many papers try to report these types of effects without it. However, I do wonder how specific the results are to contextual modulation. That is, should a discriminative discrete cue be used instead of each context (e.g. CS1 indicates CS2 earns O1, CS3 indicates CS4 earns O1), would female rats still be as slow to learn the discrimination?

      I am just curious as to whether the authors have thoughts on this.

      We have not tested this and are not aware of a paper that examined this question specifically.

      However, we would like to point out that in the suggested design (CS1→[CS2→O1]; CS3→[CS4→O1]) the discriminative cues (CS1 and CS3) would almost certainly also acquire substantial reward-predictive value, either because of their direct association with the reward, or via second-order conditioning. This would complicate the interpretation of the results in terms of hierarchical associations. Incorporating non-rewarded presentations of CS1 and CS3 alone (i.e. extinguishing those cues, as is sometimes done in occasion setting experiments) would be one way to reduce the reward expectation evoked by those cues, but this approach has some limitations. Indeed, as mentioned by Rescorla (2006), “During extinction, the net associative strength of a stimulus declines to the level of [a response] threshold, but further decrement stops at that point”. So while extinguished CS1 and CS3 might no longer evoke overt behavioral responses, these cues could retain a non-negligible subthreshold excitatory connection with the US. Individually, these cues might fail to evoke responding but could nonetheless increase responding during the CS1→CS2 trials (or CS3→CS4 trials), via simple summation (Rescorla, 2006: “the compound of two [extinguished] stimuli has a strength that exceeds the threshold and so evokes responding”).
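
      The summation argument can be made concrete with a toy calculation. The sketch below is purely illustrative: the threshold and the residual associative strengths are arbitrary assumed values, not estimates from any data set, and `evokes_response` is a hypothetical helper introduced only for this example.

```python
# Toy illustration of subthreshold summation (all values are assumptions).
RESPONSE_THRESHOLD = 1.0  # arbitrary units


def evokes_response(*strengths, threshold=RESPONSE_THRESHOLD):
    """Return True if the summed associative strength exceeds the threshold."""
    return sum(strengths) > threshold


# Hypothetical residual strengths left after extinction: each is below
# threshold on its own, so neither cue evokes responding when presented alone.
v_cs1 = 0.6
v_cs2 = 0.6

alone_cs1 = evokes_response(v_cs1)
alone_cs2 = evokes_response(v_cs2)
compound = evokes_response(v_cs1, v_cs2)

print(alone_cs1, alone_cs2, compound)
```

      With these assumed values, each cue alone stays silent while the compound crosses the threshold, which is the pattern that would mimic modulation without any hierarchical association.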

      This type of consideration is precisely why we opted for the behavioral task used in the study. In Ctx-dep. O1, the discriminative stimuli exert opposite effects on the two target cues, which rules out summation effects as a mechanism for context-sensitive behavior.
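
      Why opposite effects on the two target cues rule out summation can be checked by brute force. The sketch below is an illustration under stated assumptions, not an analysis from the manuscript: it assumes a purely additive rule (respond when context weight + cue weight exceeds a threshold), treats the two contexts as separate elements, and searches an arbitrary grid of weights for any setting that reproduces the four trial types (Light:X+, Light:Y-, Dark:X-, Dark:Y+). The function name and the grid are hypothetical.

```python
from itertools import product

# Trial types of the single-outcome task: (context, cue, rewarded)
TRIALS = [
    ("light", "X", True),
    ("light", "Y", False),
    ("dark", "X", False),
    ("dark", "Y", True),
]


def additive_solutions(grid):
    """Yield weight/threshold settings for which simple context + cue
    summation responds on rewarded trials and only on rewarded trials."""
    for w_light, w_dark, w_x, w_y, theta in product(grid, repeat=5):
        ctx = {"light": w_light, "dark": w_dark}
        cue = {"X": w_x, "Y": w_y}
        if all((ctx[c] + cue[s] > theta) == rewarded
               for c, s, rewarded in TRIALS):
            yield (w_light, w_dark, w_x, w_y, theta)


# Assumed search grid: -2.0 to 2.0 in steps of 0.5 for every parameter.
grid = [i / 2 for i in range(-4, 5)]
solutions = list(additive_solutions(grid))
print(len(solutions))  # prints 0
```

      The empty result is not an accident of the grid: Light:X+ and Dark:X- together require the Light weight to exceed the Dark weight, whereas Dark:Y+ and Light:Y- require the reverse, so no additive assignment can satisfy all four trial types.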

      (3) Pages 8-9 of pdf, where the biological basis or the delayed acquisition of contextual control in females is considered, I find this to be written from a place of assuming that what is observed in the males is the default behaviour. That is, although the estrous cycle and its effects on synaptic plasticity/physiology may well account for the results, is there not a similar argument to be made for androgens in males? Perhaps the androgens also somehow alter synaptic plasticity/physiology, leading to their faster speed, reduced performance stability, and increased susceptibility to stress.

      I would like the argument that female behaviour might be the default, and male behaviour the deviation to be considered in the discussion in addition to those already stated.

      We regret if we gave the impression that male behavior was the default. The paper is intended to report sex differences but we don’t view either sex as the default. To correct this impression, we have added a few sentences in the discussion to highlight male-hormonal factors as well as non-gonadal genetic factors that might have contributed to the observed sex differences.

      (4) In addition, the OFC - which is the brain region found to have differential expression of c-fos in males and females in Figure 5 - is not explicitly discussed with regard to the biological mechanisms of differences, which seems odd.

      I suggest OFC be discussed with regard to biological mechanisms of differences.

      We added a few sentences in the discussion to i) highlight the parallel between our study and human fMRI studies showing superior OFC activation in females during the regulation of emotional responses, ii) suggest a potential relationship between the reported sex differences (speed of acquisition, robustness of performance, and OFC activation in context-gated reward prediction), and iii) acknowledge our ignorance of the root causes of these sex differences.

      We wish we could offer a better answer. We have attempted to offer possible proximal explanations for the observed sex differences, but ultimately our work did not address the root causes of these behavioral and neural sex differences. Therefore we feel that further attempts to explain these differences would be too speculative.

      (5) I did wonder if the authors were aware that in the Rescorla-Wagner model, contextual stimuli are thought to summate with discrete cues to enter into the association with the outcome (i.e., the error term is between lambda and sigmaV, with sigmaV the 'summation' of all stimuli present on a trial, including contextual stimuli). Typically, this is not considered much, because the cue itself is so salient and more consistently paired with reward (whereas the ever-present context is often paired with no reward), but nevertheless, it is a part of the association. I'm not sure it's wrong to say that the background circumstances under which events occur are thought to play little role (as in the second sentence of the introduction), but I was wondering if the authors were aware of this fact when they wrote that.

      This sentence in the introduction was meant to introduce the distinction between eliciting stimuli and modulating contexts. Admittedly, this paints a naive picture, which we now acknowledge (we hope that the rest of the paper provides more nuance). As pointed out by this reviewer, the context is also a stimulus, and, just like any other stimulus, it is eligible for direct association with an outcome. The possibility of a direct context→outcome association is precisely the rationale for the Ctx-dep O1/O2 group.
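
      The summed error term raised in this comment can be sketched in a few lines. This is a minimal illustration of the standard Rescorla-Wagner update, in which every stimulus present on a trial, including the context, shares the error between lambda and the summed prediction. The learning rate, trial schedule, and stimulus names are assumptions chosen only for the example.

```python
def rescorla_wagner_update(weights, present, lam, alpha=0.1):
    """Apply one Rescorla-Wagner trial: all stimuli in `present` are
    updated by the shared error (lam minus the summed prediction)."""
    total_v = sum(weights[s] for s in present)
    error = lam - total_v
    updated = dict(weights)
    for s in present:
        updated[s] += alpha * error
    return updated


# Assumed schedule: the context is always present; the cue appears only on
# rewarded trials, which alternate with unrewarded context-alone periods.
w = {"context": 0.0, "cue": 0.0}
for _ in range(200):
    w = rescorla_wagner_update(w, ["context", "cue"], lam=1.0)
    w = rescorla_wagner_update(w, ["context"], lam=0.0)

print(w)
```

      Under this assumed schedule the cue ends up carrying nearly all of the associative strength while the context retains very little, even though the context takes part in every update. This matches the reviewer's remark that the context is part of the association but is usually overshadowed.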

      (6) Context-noA - Seems a little confusing for a name, why not just call it context B? NoA appears to imply that nothing happens in A or no outcome is available, whereas this is not always the case.

      We debated which terminology to use. We felt that “Context A vs. Context B” should perhaps be reserved for situations where the global context changes (e.g. two different conditioning boxes with different odors, floor texture etc., with proper counterbalancing procedures). We felt that “Context A vs noA” might be more appropriate here, as we are manipulating the local context by introducing (or removing) one single stimulus (the houselight). In this revised version we followed this reviewer’s advice and adopted a more descriptive, and hopefully less confusing, terminology: “Light vs Dark”.

      (7) Why is it that in the text the Ctx-dep O1/O2 is explained before simple and no discrimination, but in the Figure Ctx-dep O1/O2 is shown last? These should be consistent.

      Thanks for pointing that out. We have switched the order of task description to be consistent with the figures.

      (8) Page 6 (of pdf) - could the authors elaborate a little on why or how (or both) the delivery of reward can interfere with the expression of context-dependent discrimination? Do they just mean the performance of discrimination (e.g., animals will sit at the food port longer if there is food there because they are sitting there and eating it, which does not necessarily reflect the expectation of food based on cue presentations?), in which case it is not the discrimination itself that is being interfered with, just the measure of it. Perhaps the authors could elaborate by just inserting a sentence.

      We have added a few sentences to discuss this effect.

      The first clarification that we can make is that the reduced discrimination performance following reward is not simply due to animals’ continued presence in the reward port. We have added the time pre-cue to Fig. 3 B-F. This measure is not affected by previous reward history, showing that rats are leaving the port between trials.

      So what is driving this effect? At this stage, we are agnostic about the mechanism(s) for this effect. Kuchibhotla et al. (2019) —who first reported a similar effect— proposed a model in which recent rewards modify the threshold for behavioral responses (i.e. performance). In this model, a cue might evoke a weak reward prediction but evoke a strong behavioral response if presented after a reward. Additionally, we believe that learning factors might also contribute to the effect reported here. Indeed, the behavioral response on a given trial likely reflects the balance of hierarchical (context-dependent) associations vs. direct associations (Bradfield and Balleine, 2013). Naturally, this balance is dynamic and influenced by trial history. For instance, a Light:X+ trial might increase the value of cue X and promote responding during the following Dark:X- trial. The same logic could be applied to the influence of the context (e.g., Light:X+ trial might promote responding to a subsequent Light:Y- trial). We are currently working on a computational model that captures the dynamic interplay between hierarchical associations and direct associations. We hope that this model will provide some insight into the learning/performance mechanism for the effects reported here. However this computational work is still in the early stages and beyond the scope of the present study.

      (9) The lack of effect in the Ctx-dep O1/O2 groups in Figure 4 could be due to a lack of power - the group sizes are a lot smaller for this group than for Ctx-dep O1 where an interaction was detected. I think this should be at least addressed in the discussion (i.e., that this lack of effect is possibly due to less power here, as the effects are in the same direction).

      Good point. We now acknowledge this limitation in the text.

      Reviewer #2 (Recommendations For The Authors):

      (1) Please comment on the failure to replicate the sex differences across experiments. Perhaps this is due to some change in the training procedure that is briefly mentioned in the methods (a reduction in the number of rewarded trials) but it is unclear.

      The reviewer correctly observed that Fig. 3-5 do not show sex differences in the baseline condition. This is not because of a replication failure, but because non-discriminating subjects were excluded from the experiment at the end of the acquisition period (after 72 training sessions). We now clarify this in the Method and Results sections. We also added a schematic of the experiment timeline that highlights the exclusion of non-discriminators at the end of the acquisition period (Fig 1).

      On the topic of replicability, the data for Ctx-dep O1 was collected over 3 cohorts (over the course of 2 years) and the sex difference pattern was consistent.  For instance, the proportion of discriminators vs. non-discriminators for males and females trained in Ctx-dep O1, showed similar patterns across cohorts (see below).

      Author response table 1.

      (2) The design of this experiment makes it possible to analyse whether there is a differential outcome effect (DOE). The DOE would indeed predict better discrimination in group cxt-dep O1/O2 versus cxt-dep O1, which seems to be exactly what the authors observe although between-group statistics are not reported. Inspection of Figure 1 suggests that there may be a DOE in females but not in males. I wonder if the authors might consider reanalysing the data to check this.

      Indeed, there is clearly a differential outcome effect. We now point out this DOE in relation to the latency to achieve discrimination criterion (Fig. 2 C-D). Rats in the Ctx-dep. O1/O2 group acquired discrimination (reached criterion) much faster than rats in the Ctx-dep. O1 group.

      Following the reviewer’s suggestion, we provide here the results of targeted ANOVAs (focusing exclusively on Ctx-dep. O1 and Ctx-dep. O1/O2) to investigate a potential sex-dependent effect of DOE (i.e. Sex x Task interactions); see figure below. A three-way ANOVA (Sex x Task x Session) conducted on the discrimination index revealed a main effect of Task (F(1, 86) = 173.560, P < 0.001) and Session (F(2.678, 230.329) = 140.479, P < 0.001) and a marginal effect of Sex (F(1, 86) = 3.929, P = 0.051), but critically no Task x Sex or Task x Sex x Session interaction (P ≥ 0.504). A two-way ANOVA (Sex x Task) conducted on the sessions to criterion revealed a main effect of both factors (Sex: F(1, 63) = 9.52, P = 0.003; Task: F(1, 62) = 184.143, P < 0.001) but, critically, no Sex x Task interaction (P = 0.233). These results indicate that the use of two different outcomes clearly facilitated the acquisition of context-dependent discrimination (DOE), but this effect benefited both sexes equally. We thank the reviewer for recommending this analysis.

      Author response image 1.

      Differential outcome effect (DOE) affects males and females equally. A. Discrimination ratio over the acquisition period. B. Trials to criterion. Compared to animals trained with a single outcome (Ctx-dep. O1), introducing dissociable outcomes for the two types of rewarded trials (Ctx-dep. O1/O2) profoundly facilitated the acquisition of discriminated behavior. This effect benefited both sexes equally.

      (3) Some minor points for clarification that the authors may also wish to address:

      - Figure 3: is data presented from sessions 71-80 only or for all sessions? I didn't fully follow the explanation offered in the results section.

      That’s right. The data presented in Fig. 3 consider only sessions 71-80 in discriminator rats, when performance is globally stable. We have edited the text to make this clearer. These 10 sessions represent a total of 800 trials (10 sessions × 80 trials). The first trial of each session was not included in the analysis since it was not preceded by any trial. For the remaining 790 trials (10 sessions × 79 trials), we examined how the outcome of the previous trial (rewarded or nonrewarded) influenced responding on the next trial. This large sample size (790 trials / rat) was required to ensure that enough data were collected for each possible trial history scenario.
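
      The trial bookkeeping described above can be sketched as follows. This is a hedged illustration rather than the analysis code used for Fig. 3: the data layout (one list of `(rewarded, response)` pairs per session), the function name, and the synthetic alternating reward schedule are all assumptions.

```python
def split_by_previous_outcome(sessions):
    """Group each trial's response by whether the preceding trial in the
    same session was rewarded. The first trial of every session is dropped
    because it has no preceding trial."""
    by_history = {True: [], False: []}
    for trials in sessions:
        # Pair every trial with the one that follows it within the session.
        for (prev_rewarded, _), (_, response) in zip(trials, trials[1:]):
            by_history[prev_rewarded].append(response)
    return by_history


# Synthetic placeholder data: 10 sessions of 80 trials each, with an
# arbitrary alternating reward schedule and a dummy response value.
sessions = [[(i % 2 == 0, 0.0) for i in range(80)] for _ in range(10)]

history = split_by_previous_outcome(sessions)
n_analysed = len(history[True]) + len(history[False])
print(n_analysed)  # prints 790
```

      Pairing trials within each session, rather than across one concatenated list, is what keeps the first trial of a session from being treated as if it followed the last trial of the previous session.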

      - The authors argue that females are protected from the disrupting effect of stress. It might be useful if the authors offer further explanation as to what they mean by "protected".

      By “protected”, we simply mean “less sensitive”. We have reworded this sentence in that way. We do not claim to have an understanding of the precise mechanism for this sex-dependent effect (although our data point to a possible role of the OFC).

      - The authors state that "delivery of reward, while critical for learning, can also interfere with the expression of context-dependent discrimination". This statement should be explained in further detail. For instance, why should reward delivery specifically impair context-dependent discrimination but not other forms of discrimination?

      We have reworded this sentence to be more inclusive. Indeed, delivery of reward also interferes with other forms of discrimination, particularly when discrimination performance is not yet optimal. We have also added a paragraph to discuss the possible mechanisms by which reward might interfere with discrimination performance in our task.   

      Reviewer #3 (Recommendations For The Authors):

      I do not suggest additional experiments, but I do hope you continue the behavioral work to characterize what is being learned in the task. I think the approach is promising. I would suggest reporting the % time in port and port entries for the entire CS. There is no justification for only analyzing the response in the last 5s.

      We thank the reviewer for the encouragement.

      We opted to focus on the time in port for two main reasons:

      (1) This measure is relatively consistent across the two different reward outcomes (unlike the rate of port entries). Indeed, consistent with prior studies (Delamater et al., 2017), we observed that the type of reward (solid or liquid) influences the topography of the anticipatory magazine-directed behavior. Specifically, cues paired with pellets elicited significantly more port entries than cues paired with chocolate milk. The opposite pattern was observed for time in port: cues paired with chocolate milk elicited more sustained time in port compared to cues paired with pellets (see figure below). While these measures (port entries and time in port) show opposite biases for the two possible outcomes, the size of this bias is much smaller for the time in port (Cohen’s d effect size: port entries: 1.41; time in port: 0.62). As a result, the discrimination ratio calculated from time in port is consistent across the two outcomes (P = 0.078; effect size: 0.07), which is not the case for the discrimination ratio calculated from port entries (P = 0.007; effect size: 0.32; see figure below).

      (2) Unlike the rate of port entries, the time in port shows monotonic increase during training in these tasks. Indeed, we observed here and in past work (Keiflin et al., 2019), that the rate of port entries initially increases with training, but then slightly decreases; particularly for cues paired with liquid reward. In contrast, the time in port continues to increase, or remains high, with extended training. This is easy to understand if we consider the extreme case of a hypothetical rat that might enter the port once upon cue presentation and maintain continued presence in port for the whole cue duration. This rat would have a relatively low rate of port entry (a single port entry per trial) but a high time in port.
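
      The hypothetical rat described in point (2) can be illustrated with a short sketch that derives both measures from the same port-occupancy trace. Everything here is assumed for illustration: a 10 s cue sampled at 10 Hz, the two made-up traces, the convention that the rat is outside the port at cue onset, and the function name.

```python
def port_entries_and_time(in_port, sample_rate_hz):
    """Return (number of port entries, seconds in port) for a sequence of
    booleans sampled at `sample_rate_hz` during one cue presentation.
    The rat is assumed to be outside the port at cue onset."""
    entries = 0
    previous = False
    for sample in in_port:
        if sample and not previous:  # transition from outside to inside
            entries += 1
        previous = sample
    time_in_port = sum(in_port) / sample_rate_hz
    return entries, time_in_port


RATE_HZ = 10  # assumed sampling rate; each trace below spans 10 s

# Rat A: enters once after 1 s and stays until the end of the cue.
rat_a = [False] * 10 + [True] * 90
# Rat B: ten brief visits of 0.3 s, one at the start of each second.
rat_b = ([True] * 3 + [False] * 7) * 10

result_a = port_entries_and_time(rat_a, RATE_HZ)
result_b = port_entries_and_time(rat_b, RATE_HZ)
print(result_a)  # prints (1, 9.0)
print(result_b)  # prints (10, 3.0)
```

      In this made-up example the rat with the sustained visit has the lowest entry count but the highest time in port, so the two measures rank the animals in opposite order.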

      This is not to say that the rate of port entries is not a valid measure overall (we have used, and continue to use, this metric in other preparations). However, for the reasons explained above, we believe that the time in port is a better metric for reward anticipation in this specific study.

      Moreover, we chose to focus our analysis on the last 5s of the cue because that’s when anticipatory food cup behavior is more reliably observed (in our preparation >2/3 of the total time in port occurs during the last 5s of the cue) and less contaminated by orienting behaviors (Holland, 1977, 1980, 2000). For these reasons, analysis of the last portion of the cue is relatively common in Pavlovian anticipatory approach preparations (El-Amamy and Holland, 2007; Olshavsky et al., 2013; Esber et al., 2015; Holland, 2016a, 2016b; Schiffino and Holland, 2016; Gardner et al., 2017; Sharpe et al., 2021; Maes et al., 2020; Sharpe et al., 2020; Siemian et al., 2021; Kang et al., 2021). Reporting time in port during the same cue epoch facilitates comparisons between these studies.

      We have edited the text in the Method section to provide a brief justification for focusing our analyses on this cue epoch.

      Author response image 2.

      Outcome identity influences the topography of the conditioned response. A-B: Conditioned responding expressed as the number of port entries per trial (A) or time in port per trial (B) for rats trained in the simple discrimination task with a chocolate milk reward (n = 19) or a sucrose pellet (n = 16). Data show the average of the last three sessions. Compared to chocolate milk, pellets tend to produce more port entries. Conversely, chocolate milk tends to produce more time in port. However, the magnitude of this bias is smaller for the time in port. C-D: Discrimination ratio calculated from the number of port entries (C) or the time in port (D); the latter is not affected by the outcome identity. *P<0.05; **P<0.01; ***P<0.001, t-tests.

      The inconsistent use of terms is distracting throughout the paper. Is it discriminated or context-gated? Please provide a definition of your terms and then use them consistently. Is it a discriminative stimulus, a context, or an occasion setter? These all imply slightly different things and it would help the reader if you just used one term throughout the paper.

      Thanks for pointing that out. We have added a definition for “context-gated” and edited the text to keep the terminology consistent when appropriate. The words “discrimination”/“discriminated” still appear in the manuscript but without implying a mechanism (all tasks are variations of Pavlovian discrimination, in which the rats discriminate between rewarded and non-rewarded trials).

      As mentioned by this reviewer, the terms “context” and “occasion setter” are not synonymous. Therefore these terms still appear in the manuscript to refer to different concepts (e.g. in our task the visual stimulus is a context for all rats; this context acts as an occasion setter only for some rats).

      Minor:

      Intro, 2nd PP: "autism". This is abbreviated in the abstract but spelled out here. I suggest not abbreviating in the abstract and introducing abbreviations here, as you do with PTSD.

      Fixed as suggested.

      Have deficits in contextual modulation been distinguished from potential deficits in binary associative learning in autism, PTSD, and substance use disorders? This is implied, but there are no citations provided.

      We provide a list of references showing deficits in contextual modulation in these disorders.

      This does not mean that these disorders are reducible to deficits in contextual modulation and it does not exclude other forms of deficits in those disorders --including alterations in certain aspects of binary associative learning.

      "In positive occasion-setting, animals learn that a target cue (X) results in a reward outcome (+) only when that cue is accompanied by a contextual feature (A); the same cue presented in absence of this contextual feature remains without consequence (A:X+ / X-)." - there are words missing in this sentence.

      We apologize, but we fail to identify the missing word(s). Perhaps the reviewer could be more specific, and we will be happy to edit the sentence as needed.

      What is a contextual feature, is this redundant or can you provide a specific definition?

      We use the terminology “feature” and “target” as these are the standard terms in the description of occasion setting preparations (one stimulus, the “feature”, sets the occasion for responding, or not responding, to the “target” cue). By contextual feature, we meant that in this specific example the context was the feature. We have clarified this in the text. We believe that these terms are not redundant. Indeed, the context is not always a feature, and a feature is not necessarily a context (phasic cues can serve as “features”).

      Can you provide some background on studies of sex differences in simple associative learning? You imply these have been much more thoroughly studied than conditional discriminations.

      We added a few references as suggested.

      What is the rationale for studying stress?

      Stressful life events exacerbate several mental illnesses, potentially by impacting cognitive functions.

      Although the (sex-dependent) effects of stress on some cognitive functions are well established (e.g. working memory, selective attention, spatial navigation), the effect of stress on contextual modulation (a core dysfunction in certain mental illnesses) --and the possible sex-differences in this effect-- had not been formally tested. We added a few sentences in the results section (at the beginning of the stress section) to remind the reader why we tested the effect of stress in this task.

      Method/Results:

      Cues are not counterbalanced; the feature is visual and targets are auditory - this should be noted as a limitation in the discussion section.

      We now acknowledge this limitation in the discussion. Moreover, we believe that the new terminology for the context (Light vs. Dark, instead of A vs. noA in the original version) makes it abundantly clear that the “context” in this study was always visual.

      Summation is invoked to describe the discrimination with different outcomes, how is summation happening? This is not described. Perhaps incorporate the literature on conditional discriminations with differential outcomes (the "differential outcomes effect").

      We have edited the Results and Discussion sections to clarify how summation might contribute to discrimination with different outcomes. We have also added references for the differential outcomes effect (DOE) in this task.

      The stress effect is confounded with test order; comparing stress vs. baseline.

      We apologize, but we don’t understand this point. The “baseline” refers to the animal’s performance on the last training session before the acute stress manipulation (we have edited the text to make this clear). Animals are first trained in the task and then we examine how stress alters their performance in this learned task. We don’t see how this could induce a test order confound.

      Throughout the results section, it would be helpful to have the number of animals reported for each analysis.

      The number of animals for each part of the experiment is now reported in the text, as well as in the figures.

      Discussion:

      "For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that hierarchically modulates the associative strength between a target cue and its outcome." This is inaccurate. Occasion setters do not change or modulate the associative strength of a target cue. They modulate whether excitation or inhibition is expressed.

      We reworded the sentence as suggested: “For Ctx-dep. O1, context is an occasion-setter, i.e. a stimulus that modulates the response to a target cue”.

      "Together, these results indicate that the sex differences observed here are not attributable to simple associative, motivational, working-memory, or attentional processes, but are specific to the neurocomputational operations required for the hierarchical, contextual control of behavior." It should be noted here that the difference is one of degree, a quantitative difference, but not a difference in the qualitative features of the process.

      "Regardless of the precise mechanism, our results indicate that, compared to male rats, females ultimately achieved more stable contextual control over cued reward-seeking; their behavior remained context-regulated under stress or after recent rewards." Again this is a matter of degree.

      We absolutely agree. All the sex differences reported here are a matter of degree. In the framework of McCarthy et al. (2012), the reported effects are type 2 or type 3 sex differences, not type 1 sexual dimorphism. We made a few edits in the Discussion to clarify this point.

      Procedure:

      Please clarify the percentage of trials that were reinforced in the No Discrimination group.

      From sessions 1 to 32 (acquisition period), 50% of the trials were reinforced. Following this acquisition period, only 25% of the trials were reinforced to match all the other groups. We have edited the method section to clarify this point.

      Please provide the dimensions of the restraint tubes and the model number if available.

      This information is now included.

      References

      Bradfield LA, Balleine BW (2013) Hierarchical and binary associations compete for behavioral control during instrumental biconditional discrimination. J Exp Psychol Anim Behav Process 39:2–13.

      Delamater AR, Garr E, Lawrence S, Whitlow JW (2017) Elemental, configural, and occasion setting mechanisms in biconditional and patterning discriminations. Behav Processes 137:40–52.

      El-Amamy H, Holland PC (2007) Dissociable effects of disconnecting amygdala central nucleus from the ventral tegmental area or substantia nigra on learned orienting and incentive motivation. Eur J Neurosci 25:1557–1567.

      Esber GR, Torres-Tristani K, Holland PC (2015) Amygdalo-striatal interaction in the enhancement of stimulus salience in associative learning. Behav Neurosci 129:87–95.

      Gardner MPH, Conroy JS, Shaham MH, Styer CV, Schoenbaum G (2017) Lateral Orbitofrontal Inactivation Dissociates Devaluation-Sensitive Behavior and Economic Choice. Neuron 96:1192–1203.e4.

      Holland PC (1977) Conditioned stimulus as a determinant of the form of the Pavlovian conditioned response. J Exp Psychol Anim Behav Process 3:77–104.

      Holland PC (1980) CS-US interval as a determinant of the form of Pavlovian appetitive conditioned responses. J Exp Psychol Anim Behav Process 6:155–174.

      Holland PC (2000) Trial and intertrial durations in appetitive conditioning in rats. Anim Learn Behav 28:121–135.

      Holland PC (2016a) Enhancing second-order conditioning with lesions of the basolateral amygdala. Behav Neurosci 130:176–181.

      Holland PC (2016b) Effects of amygdala lesions on overexpectation phenomena in food cup approach and autoshaping procedures. Behav Neurosci 130:357–375.

      Kang M, Reverte I, Volz S, Kaufman K, Fevola S, Matarazzo A, Alhazmi FH, Marquez I, Iordanova MD, Esber GR (2021) Agency rescues competition for credit assignment among predictive cues from adverse learning conditions. Sci Rep 11:16187.

      Keiflin R, Pribut HJ, Shah NB, Janak PH (2019) Ventral tegmental dopamine neurons participate in reward identity predictions. Curr Biol 29:93–103.e3.

      Kuchibhotla KV, Hindmarsh Sten T, Papadoyannis ES, Elnozahy S, Fogelson KA, Kumar R, Boubenec Y, Holland PC, Ostojic S, Froemke RC (2019) Dissociating task acquisition from expression during learning reveals latent knowledge. Nat Commun 10:2151.

      Maes EJP, Sharpe MJ, Usypchuk AA, Lozzi M, Chang CY, Gardner MPH, Schoenbaum G, Iordanova MD (2020) Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 23:176–178.

      McCarthy MM, Arnold AP, Ball GF, Blaustein JD, De Vries GJ (2012) Sex differences in the brain: the not so inconvenient truth. J Neurosci 32:2241–2247.

      Olshavsky ME, Song BJ, Powell DJ, Jones CE, Monfils M-H, Lee HJ (2013) Updating appetitive memory during reconsolidation window: critical role of cue-directed behavior and amygdala central nucleus. Front Behav Neurosci 7:186.

      Rescorla RA (2006) Deepened extinction from compound stimulus presentation. J Exp Psychol Anim Behav Process 32:135–144.

      Schiffino FL, Holland PC (2016) Secondary visual cortex is critical to the expression of surprise-induced enhancements in cue associability in rats. Eur J Neurosci 44:1870–1877.

      Sharpe MJ, Batchelor HM, Mueller LE, Gardner MPH, Schoenbaum G (2021) Past experience shapes the neural circuits recruited for future learning. Nat Neurosci 24:391–400.

      Sharpe MJ, Batchelor HM, Mueller LE, Yun Chang C, Maes EJP, Niv Y, Schoenbaum G (2020) Dopamine transients do not act as model-free prediction errors during associative learning. Nat Commun 11:106.

      Siemian JN, Arenivar MA, Sarsfield S, Borja CB, Russell CN, Aponte Y (2021) Lateral hypothalamic LEPR neurons drive appetitive but not consummatory behaviors. Cell Rep 36:109615.

    1. eLife assessment

      This important study shows that a high autism quotient in neurotypical adults is associated with suboptimal motor planning and visual updating after eye movements, suggesting a disrupted efference copy mechanism. The implication is that abnormal visuomotor updating may contribute to sensory overload - a key symptom in autism spectrum disorder. The evidence presented is convincing, with few limitations, and should be of broad interest to neuroscientists at large.

    2. Reviewer #1 (Public Review):

      Summary:

      This study examines a hypothesized link between autism symptomatology and efference copy mechanisms. This is an important question for a number of reasons. Efference copy is both a critical brain mechanism that is key to rapid sensorimotor behaviors, and one that has important implications for autism given recent empirical and theoretical work implicating atypical prediction mechanisms and atypical reliance on priors in ASD.

      The authors test this relationship in two different experiments, both of which show larger errors/biases in spatial updating for those with heightened autistic traits (as measured by AQ in neurotypical (NT) individuals).

      Strengths:

      The empirical results are convincing - effects are strong, sample sizes are sufficient, and the authors also rule out alternative explanations (ruling out differences in motor behavior or perceptual processing per se).

      Weaknesses:

      My main residual concern is that the paper should be more transparent about both (1) that this study does not include individuals with autism, and (2) acknowledging the limitations of the AQ.

      On the first point, and I don't think this is intentional, there are several instances where the line between heightened autistic traits in the NT population and ASD is blurred or absent. For example, in the second sentence of the abstract, the authors state "Here, we examine the idea that sensory overload in ASD may be linked to issues with efference copy mechanisms". I would say this is not correct because the authors did not test individuals with ASD. I don't see a problem with using ASD to motivate and discuss this work, but it should be clear in key places that this was done using AQ in NT individuals.

      For the second issue, the AQ measure itself has some problems. For example, reference 38 in the paper (a key AQ paper) also shows that the AQ is skewed more male than modern estimates of ASD, suggesting that the AQ may not fully capture the full spectrum of ASD symptomatology.

      Of course, this does not mean that the AQ is not a useful measure (the present data clearly show that it captures something important about spatial updating during eye movements), but it should not be confused with ASD, and its limitations need to be acknowledged. My recommendation would be to do this in the title as well - e.g. note impaired visuomotor updating in individuals with "heightened autistic traits".

      Suggestions for improvement:

      - Figure 5 is really interesting. I think it should be highlighted a bit more, perhaps even with a model that uses the results of both tasks to predict AQ scores.

      - Some discussion of the memory demands of the tasks will be helpful. The authors argue that memory is not a factor, but some support for this is needed.

      - With 3 sessions for each experiment, the authors also have data to look at learning. Did people with high AQ get better over time, or did the observed errors/biases persist throughout the experiment?

    3. Reviewer #2 (Public Review):

      Summary:

      The idea that various clinical conditions may be associated, at least partially, with a disrupted corollary discharge mechanism has been present for a long time. In this paper, the authors draw a link between sensory overload, a characteristic of autism spectrum disorder, and a disturbance in the corollary discharge mechanism. The authors substantiate their hypothesis with strong evidence from both the motor and perceptual domains. As a result, they broaden the clinical relevance of the corollary discharge mechanism to encompass autism spectrum disorder.

      Public comments:

      The authors write:

      "Imagine a scenario in which you're watching a video of a fast-moving car on a bumpy road. As the car hits a pothole, your eyes naturally make quick, involuntary saccades to keep the car in your visual field. Without a functional efference copy system, your brain would have difficulty accurately determining the current position of your eye in space, which in turn affects its ability to anticipate where the car should appear after each eye movement."

      I appreciate the use of examples to clarify the concept of efference copy. However, I believe this example is more related to a gain-field mechanism, informing the system about the position of the eye with respect to the head, rather than an example of efference copy per se.

      Without an efference copy mechanism, the brain would have trouble accurately determining where the eyes will be in space after an eye movement, and it will have trouble predicting the sensory consequences of the eye movement. However, it can be argued that the gain-field mechanism would be sufficient to inform the brain about the current position of the eyes with respect to the head.

      The authors write:

      "In the double-step paradigm, two consecutive saccades are made to briefly displayed targets 21,22. The first saccade occurs without visual references, relying on internal updating to determine the eye's position."

      Maybe I missed something, but in the double-step paradigm the first saccade can occur without the help of visual references if no visual feedback is present, that is, when saccades are performed in total darkness. Was this the case for this experiment? I could not find details about room conditions in the methods. Please provide further details.

      In case saccades were not performed in total darkness, then the first saccade can be based on the remembered location of the first target presented, which can be derived from the retinotopic trace of the first stimulus, as well as from a contribution of the surroundings, that is, the remembered relative location of the first target with respect to the screen border along the horizontal meridian (i.e. allocentric cues).

      A similar logic could be applied to the second saccade. If the second saccade were based only on the retinotopic trace, without updating, then it would go up and 45 deg to the right, based on the example shown in Figure 1. With appropriate updating, the second saccade would go straight up. However, if saccades were not performed in total darkness, then the location of the second target could also be derived from its relationship with the surroundings (for example, the remembered distance from screen borders, i.e. allocentric cues).

      If saccades were not performed in total darkness, the results shown in Figures 2 and 3 could then be related to: i) differences in motor updating between AQ score groups; ii) differences in the use of allocentric cues between AQ score groups; iii) a combination of i) and ii). I believe this is a point worth mentioning in the discussion.
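
      The geometry behind the "up and 45 deg to the right" versus "straight up" contrast can be sketched with simple vector subtraction. The coordinates below are hypothetical (chosen so that the second target sits directly above the first); they are not taken from Figure 1, and the sketch assumes the first saccade lands exactly on the first target.

```python
from math import atan2, degrees


def subtract(a, b):
    """Component-wise difference of two 2-D points (a - b)."""
    return (a[0] - b[0], a[1] - b[1])


def direction_from_vertical(vector):
    """Angle in degrees measured from straight up; positive values are rightward."""
    x, y = vector
    return degrees(atan2(x, y))


# Hypothetical layout in degrees of visual angle (x rightward, y upward).
fixation = (0.0, 0.0)
target_1 = (10.0, 0.0)
target_2 = (10.0, 10.0)

# Without updating: the second saccade replays the retinal vector of target 2,
# which was encoded while the eye was still at fixation.
retinotopic_vector = subtract(target_2, fixation)

# With updating: the displacement produced by the first saccade is subtracted,
# assuming the eye landed on target 1.
updated_vector = subtract(target_2, target_1)

print(direction_from_vertical(retinotopic_vector))  # about 45 (up and to the right)
print(direction_from_vertical(updated_vector))      # 0 (straight up)
```

      Intermediate second-saccade directions between these two extremes would correspond to partial updating; whether allocentric cues contribute is a separate question that this sketch does not address.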

      The authors write:

      "According to theories of saccadic suppression, an efference copy is necessary to predict the occurrence of a saccade."

      I would also refer to alternative accounts, where saccadic suppression appears to arise as early as the retina, due to the interaction between the visual shift introduced by the eye movement, and the retinal signal associated with the probe used to measure saccadic suppression. This could potentially account for the scaling of saccadic suppression magnitude with saccade amplitude.

      Idrees, S., Baumann, M.P., Franke, F., Münch, T.A. and Hafed, Z.M., 2020. Perceptual saccadic suppression starts in the retina. Nature communications, 11(1), p.1977.

    4. Reviewer #3 (Public Review):

      Summary:

      This work examined efference copy related to eye movements in healthy adults who have high autistic traits. Efference copies allow the brain to make predictions about sensory outcomes of self-generated actions, and thus serve important roles in motor planning and maintaining visual stability. Consequently, disrupted efference copies have been posited as a potential mechanism underlying motor and sensory symptoms in psychopathology such as Autism Spectrum Disorder (ASD), but so far very few studies have directly investigated this theory. Therefore, this study makes an important contribution as an attempt to fill in this knowledge gap. The authors conducted two eye-tracking experiments examining the accuracy of motor planning and visual perception following a saccade, and found that participants with high autistic traits exhibited worse task performance (i.e., less accurate second saccade and biased perception of object displacement), consistent with their hypothesis of less impact of efference copies on motor and visual updating. Moreover, the motor and visual biases are positively correlated, indicative of a common underlying mechanism. These findings are promising and can have important implications for clinical intervention, if they can be replicated in a clinical sample.

      Strengths:

      The authors utilized well-established and rigorously designed experiments and sound analytic methods. This enables easy translations between similar work in non-human primates and humans and readily points to potential candidates for underlying neural circuits that could be further examined in follow-up studies (e.g., superior colliculus, frontal eye fields, mediodorsal thalamus). The finding of no association between initial saccade accuracy and level of autistic trait in both experiments also serves as an important control analysis and increases one's confidence in the conclusion that the observed differences in task performance were indeed due to disrupted efference copies, not confounding factors such as basic visual/motor deficits or issues with working memory. The strong correlation between the observed motor and visual biases further strengthens the claim that the findings from both experiments may be explained by the same underlying mechanism - disrupted efference copies. Lastly, the authors also presented a thoughtful and detailed mechanistic theory of how efference copy impairment may lead to ASD symptomatology, which can serve as a nice framework for more research into the role of efference copies in ASD.

      Weaknesses:

      Although the paper has a lot of strengths, the main weakness of the paper is that a direct link with sensory/motor symptoms cannot be established. As the authors have discussed, the most likely symptoms resulting from disrupted efference copies would be sensory overload and motor inflexibility. The measure used to quantify the level of autistic traits, Autistic Quotient (AQ), does not capture any sensory or motor characteristics of the Autism spectrum. Therefore, it is unknown whether those who scored high on AQ in this study experienced high, or even any, sensory or motor difficulties. In other words, more evidence is needed to demonstrate a direct link between disrupted efference copies and sensory/motor symptoms in ASD.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study tests the hypothesis that a high autism quotient in neurotypical adults is strongly associated with suboptimal motor planning and visual updating after eye movements, which in turn, is related to a disrupted efference copy mechanism. The implication is that such abnormal behavior would be exaggerated in those with ASD and may contribute to sensory overload - a key symptom in this condition. The evidence presented is convincing, with significant effects in both visual and motor domains, adequate sample sizes, and consideration of alternatives. However, the study would be strengthened with minor but necessary corrections to methods and statistics, as well as a moderation of claims regarding direct application to ASD in the absence of testing such patients.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study examines a hypothesized link between autism symptomatology and efference copy mechanisms. This is an important question for several reasons. Efference copy is both a critical brain mechanism that is key to rapid sensorimotor behaviors, and one that has important implications for autism given recent empirical and theoretical work implicating atypical prediction mechanisms and atypical reliance on priors in ASD.

      The authors test this relationship in two different experiments, both of which show larger errors/biases in spatial updating for those with heightened autistic traits (as measured by AQ in neurotypical (NT) individuals).

      Strengths:

      The empirical results are convincing - effects are strong, sample sizes are sufficient, and the authors also rule out alternative explanations (ruling out differences in motor behavior or perceptual processing per se).

      Weaknesses:

      My main concern is that the paper should be more transparent about both (1) that this study does not include individuals with autism, and (2) acknowledging the limitations of the AQ.

      On the first point, and I don't think this is intentional, there are several instances where the line between heightened autistic traits in the NT population and ASD is blurred or absent. For example, in the second sentence of the abstract, the authors state "Here, we examine the idea that sensory overload in ASD may be linked to issues with efference copy mechanisms". I would say this is not correct because the authors did not test individuals with ASD. I don't see a problem with using ASD to motivate and discuss this work, but it should be clear in key places that this was done using AQ in NT individuals.

      For the second issue, the AQ measure itself has some problems. For example, reference 38 in the paper (a key paper on AQ) also shows that those with high AQ skew more male than modern estimates of ASD, suggesting that the AQ may not fully capture the full spectrum of ASD symptomatology. Of course, this does not mean that the AQ is not a useful measure (the present data clearly show that it captures something important about spatial updating during eye movements), but it should not be confused with ASD, and its limitations need to be acknowledged. My recommendation would be to do this in the title as well - e.g. note impaired visuomotor updating in individuals with "heightened autistic traits".

      We thank the reviewer for the kind words. We now specify more carefully that our sample of participants consists of neurotypical adults scored for autistic traits and that none of them had been diagnosed with autism before participating in our experiment. Regarding the Autistic Quotient Questionnaire (AQ), on page 5 of the Introduction we now write:

      “The autistic traits of the whole population form a continuum, with ASD diagnosis usually situated on the high end 31-33. Moreover, autistic traits share a genetic and biological etiology with ASD 34. Thus, quantifying autistic-trait-related differences in healthy people can provide unique perspectives as well as a useful surrogate for understanding the symptoms of ASD 31,35.”

      In the Discussion (page 9) we now write:

      “It is essential to note that our participant pool lacked pre-existing diagnoses before engaging in the experiments and we must address limitations associated with the AQ questionnaire. The AQ questionnaire demonstrates adequate test-retest reliability 36, normal distribution of sum scores in the general population 50, and cross-cultural equivalence has been established in Dutch and Japanese samples 51-53. The AQ effectively categorizes individuals into low, average, and high degrees of autistic traits, demonstrating sensitivity for both group and individual assessments 54.

      However, evolving research underscores many aspects that are not fully captured by the self-administered questionnaire: for example, gender differences in ASD trait manifestation 55. Autistic females may exhibit more socially typical interests, often overlooked by professionals 56. Camouflaging behaviors, employed by autistic women to blend in, pose challenges for accurate diagnosis 57. Late diagnoses are attributed to a lack of awareness, gendered traits, and outdated assessment tools 58. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities 59, or motor skills in everyday situation (MOSES-test 60) becomes crucial for a comprehensive understanding of autistic traits.”

      Suggestions for improvement:

      - Figure 5 is really interesting. I think it should be highlighted a bit more, perhaps even with a model that uses the results of both tasks to predict AQ scores.

      We thank the reviewer for the suggestion. However, the sample size is relatively small for building a robust and generalizable model to predict AQ scores. Statistical models built on small datasets can be prone to overfitting, meaning that they might not accurately predict the AQ for new individuals.

      - Some discussion of the memory demands of the tasks will be helpful. The authors argue that memory is not a factor, but some support for this is needed. 

      The reviewer raises an important point regarding the potential for memory demands to influence our results. We have now also investigated the accuracy of the second saccade separately for the x and y dimensions. As also shown in Figure 3A, a motor bias was observed only in one dimension (x), weakening a memory-based account, which would imply a bias in both dimensions (for example, participants remembering the position of the target relative to both screen borders). We performed a t-test between our two subsamples of participants and indeed found a difference in saccade accuracy in the x dimension (p = 0.03) but not in the y dimension (p = 0.88).

      We now add these analyses in the Discussion on page 8.

      - With 3 sessions for each experiment, the authors also have data to look at learning. Did people with high AQ get better over time, or did the observed errors/biases persist throughout the experiment? 

      We thank the reviewer for pointing this out. On page 7 (Results) we now write:

      “Understanding how these biases might change over time could provide further insights into this mechanism. Specifically, we investigated whether participants exhibited any learning effects throughout the experiments. For data of Experiment 1 – motor updating – we divided our data into 10 separate bins of 30 trials each. We conducted a repeated-measures ANOVA with the within-subject factor “number of sessions” (two main sessions of 5 bins each, ~150 trials) and the between-subject factor “group” (lower vs upper quartile of the AQ distribution). We found no main effect of “number of sessions” (F(1,7) = 0.25, p = 0.66), a main effect of “group” (F(1,7) = 2.52, p = 0.015), and no interaction between the two subsamples of participants and the sessions tested (F(1,7) = 0.51, p = 0.49). Data of Experiment 2 – visual updating – were separated into 3 sessions. For each session we extracted the PSE and we conducted a repeated-measures ANOVA with within-subject factor “sessions” and between-subject factor “groups” (lower vs upper quartile of the AQ distribution). Here too we found no main effect of sessions (F(1,13) = 0.86, p = 0.39), a main effect of group (F(1,14) = 11.85, p = 0.004), and no interaction between the two subsamples of participants and the sessions tested (F(1,13) = 0.20, p = 0.73). In conclusion, the current study found no evidence of learning effects across the experimental sessions. However, a significant main effect of group was observed in both Experiment 1 (motor updating) and Experiment 2 (visual updating). Participants in the group with higher autistic traits performed systematically differently on the task compared with those in the group with lower autistic traits, regardless of the number of sessions completed.”
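
      The binning step described for Experiment 1 (10 bins of 30 trials, grouped into two sessions of 5 bins) can be sketched as follows. This is only an illustration of the data layout that would feed such an analysis: the function name and the per-trial values are hypothetical placeholders, and the ANOVA itself is not reproduced here.

```python
from statistics import mean


def bin_means(values, bin_size):
    """Mean of consecutive, non-overlapping bins; a trailing partial bin is dropped."""
    n_bins = len(values) // bin_size
    return [mean(values[i * bin_size:(i + 1) * bin_size]) for i in range(n_bins)]


# Placeholder per-trial saccade errors for one participant (NOT real data):
# 300 trials cycling through 0, 1, 2 so that every bin has the same mean.
trial_errors = [float(i % 3) for i in range(300)]

binned = bin_means(trial_errors, 30)            # 10 bins of 30 trials each
session_1, session_2 = binned[:5], binned[5:]   # two sessions of 5 bins (~150 trials)

print(len(binned), mean(session_1), mean(session_2))
```

      Comparing the session means (or the individual bins) within each AQ group is one simple way to look for a learning trend before running the full repeated-measures model.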

      Reviewer #2 (Public Review):

      Summary:

      The idea that various clinical conditions may be associated, at least partially, with a disrupted corollary discharge mechanism has been present for a long time.

      In this paper, the authors draw a link between sensory overload, a characteristic of autism spectrum disorder, and a disturbance in the corollary discharge mechanism. The authors substantiate their hypothesis with strong evidence from both the motor and perceptual domains. As a result, they broaden the clinical relevance of the corollary discharge mechanism to encompass autism spectrum disorder.

      The authors write:

      "Imagine a scenario in which you're watching a video of a fast-moving car on a bumpy road. As the car hits a pothole, your eyes naturally make quick, involuntary saccades to keep the car in your visual field. Without a functional efference copy system, your brain would have difficulty accurately determining the current position of your eye in space, which in turn affects its ability to anticipate where the car should appear after each eye movement."

      I appreciate the use of examples to clarify the concept of efference copy. However, I believe this example is more related to a gain-field mechanism, informing the system about the position of the eye with respect to the head, rather than an example of efference copy per se.

      Without an efference copy mechanism, the brain would have trouble accurately determining where the eyes will be in space after an eye movement, and it will have trouble predicting the sensory consequences of the eye movement. However it can be argued that the gain-field mechanism would be sufficient to inform the brain about the current position of the eyes with respect to the head. 

      We now use a different example. On page 3 of the Introduction, we write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      The authors write:

      "In the double-step paradigm, two consecutive saccades are made to briefly displayed targets 21, 22. The first saccade occurs without visual references, relying on internal updating to determine the eye's position."

      Maybe I have missed something, but in the double-step paradigm the first saccade can occur without the help of visual references if no visual feedback is present, that is, when saccades are performed in total darkness. Was this the case for this experiment? I could not find details about room conditions in the methods. Please provide further details.

      In case saccades were not performed in total darkness, then the first saccade can be based on the remembered location of the first target presented, which can be derived from the retinotopic trace of the first stimuli, as well as the contribution from the surroundings, that is: the remembered relative location of the first target with respect to the screen border along the horizontal meridian (i.e. allocentric cues).

      A similar logic could be applied to the second saccade. If the second saccade were based only on the retinotopic trace, without updating, then it would go up and 45 deg to the right, based on the example shown in Figure 1. With appropriate updating, the second saccade would go straight up. However, if saccades were not performed in total darkness, then the location of the second target could also be derived from its relationship with the surroundings (for example, the remembered distance from screen borders, i.e. allocentric cues).

      If saccades were not performed in total darkness, the results shown in Figures 2 and 3 could then be related to i) differences in motor updating between AQ score groups; ii) differences in the use of allocentric cues between AQ score groups; iii) a combination of i) and ii). I believe this is a point worth mentioning in the discussion.
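      The geometry behind this argument can be made concrete with a minimal sketch. The coordinates below are hypothetical (fixation at the origin, T1 10° to the right, T2 10° above T1) and are not taken from the manuscript; `updating_gain` is likewise an illustrative parameter, where 1.0 means the first saccade vector is fully subtracted from the retinotopic trace of T2 and 0.0 means no updating at all.

```python
import math


def second_saccade_vector(fixation, t1, t2, updating_gain=1.0):
    """Second-saccade vector, executed from the landing point at T1.

    updating_gain is a hypothetical parameter for illustration:
    1.0 = first saccade fully accounted for, 0.0 = raw retinotopic trace.
    """
    # Position of T2 relative to the initial fixation (retinotopic trace)
    retinotopic_t2 = (t2[0] - fixation[0], t2[1] - fixation[1])
    # Displacement produced by the first saccade
    first_saccade = (t1[0] - fixation[0], t1[1] - fixation[1])
    return (
        retinotopic_t2[0] - updating_gain * first_saccade[0],
        retinotopic_t2[1] - updating_gain * first_saccade[1],
    )


def tilt_from_vertical_deg(vector):
    """Signed tilt from straight up, positive toward the right."""
    return math.degrees(math.atan2(vector[0], vector[1]))


# Hypothetical layout in degrees of visual angle
fixation, t1, t2 = (0.0, 0.0), (10.0, 0.0), (10.0, 10.0)

no_updating = second_saccade_vector(fixation, t1, t2, updating_gain=0.0)
full_updating = second_saccade_vector(fixation, t1, t2, updating_gain=1.0)
```

      With these assumed positions, the un-updated vector is tilted to the right of vertical and the fully updated one points straight up; intermediate gains give intermediate horizontal biases. Allocentric cues are not modelled here, so the sketch cannot by itself separate the two accounts discussed above.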

      Thank you for raising the important issue of visual references in the double-step saccade task. Participants performed saccades in a dimly lit room where visual references, i.e. the screen borders, were barely visible. At the time we collected the data, a laboratory that allowed performing experiments in complete darkness was not at our disposal. We acknowledge the possibility that participants could have memorized the target locations relative to the screen borders. The bias of high AQ participants could then be attributed to differences in either encoding, memorization or decoding of the target location relative to the screen borders. However, the potentially abnormal use of visual references must reflect an altered remapping process since we did not find differences in saccade landing in the vertical dimension. A t-test between our groups of participants revealed a difference in saccade accuracy in the x dimension (p = 0.03) but not in the y dimension (p = 0.88). We thus agree that in addition to an altered efference copy signal in high AQ participants, altered use of visual references might also affect their saccadic remapping.

      In Discussion we now write: “Our findings suggest that a general memory deficit is unlikely to fully explain the observed bias in high-AQ participants' second saccades. As highlighted in Figure 3A, the bias was specific to the horizontal dimension, weakening the argument for a global memory issue affecting both vertical and horizontal encoding of target location. However, it's important to acknowledge that even under non-darkness conditions, participants might rely on a combination of internal updating based on the initial target location and visual cues from the environment, such as screen borders. This potential use of visual references could contribute to the observed bias in the high-AQ group. If high-AQ participants differed in their reliance on visual cues compared to the low-AQ group, it could explain the specific pattern of altered remapping observed in the horizontal dimension. This possibility aligns with our argument for an abnormal remapping process underlying the results. While altered efference copy signals remain a strong candidate, the potential influence of visual cues on remapping in this population warrants further investigation. Future studies could incorporate a darkness condition to isolate the effects of internal updating on the first saccade, and systematically manipulate the availability of visual cues throughout the task. This would allow for a more nuanced understanding of how internal updating and visual reference use interact in the double-step paradigm, particularly for individuals with varying AQ scores.”

      The authors write:

      "According to theories of saccadic suppression, an efference copy is necessary to predict the occurrence of a saccade."

      I would also refer to alternative accounts, where saccadic suppression appears to arise as early as the retina, due to the interaction between the visual shift introduced by the eye movement, and the retinal signal associated with the probe used to measure saccadic suppression. This could potentially account for the scaling of saccadic suppression magnitude with saccade amplitude.

      Idrees, S., Baumann, M.P., Franke, F., Münch, T.A. and Hafed, Z.M., 2020. Perceptual saccadic suppression starts in the retina. Nature communications, 11(1), p.1977. 

      We thank the reviewer. Now on page 4 of Introduction we write:

      “Some theories consider saccadic omission and saccadic suppression as resulting from an active mechanism. In this view an efference copy would signal the occurrence of a saccade, yielding a transient decrease in visual sensitivity20-22. Others however have pointed out the possibility that a purely passive mechanism suffices to induce saccadic omission23. A recent study has found evidence for saccadic suppression already in the retina. Idrees et al.24 demonstrated that retinal ganglion cells in isolated retinae of mice and pigs respond to saccade-like displacements, leading to the suppression of responses to additional flashed visual stimuli through visually triggered retinal-circuit mechanisms. Importantly, their findings suggest that perisaccadic modulations of contrast sensitivity may have a purely visual origin, challenging the need for an efference copy in the early stages of saccadic suppression. However, the suppression they measured lasted much longer than time-courses observed in behavioral data. An efference copy signal could thus be necessary to release perception from suppression.”

      Reviewer #3 (Public Review): 

      Summary:

      This work examined efference copy related to eye movements in healthy adults who have high autistic traits. Efference copies allow the brain to make predictions about sensory outcomes of self-generated actions, and thus serve important roles in motor planning and maintaining visual stability. Consequently, disrupted efference copies have been posited as a potential mechanism underlying motor and sensory symptoms in psychopathology such as Autism Spectrum Disorder (ASD), but so far very few studies have directly investigated this theory. Therefore, this study makes an important contribution as an attempt to fill in this knowledge gap. The authors conducted two eye-tracking experiments examining the accuracy of motor planning and visual perception following a saccade and found that participants with high autistic traits exhibited worse task performance (i.e., less accurate second saccade and biased perception of object displacement), consistent with their hypothesis of less impact of efference copies on motor and visual updating. Moreover, the motor and visual biases are positively correlated, indicative of a common underlying mechanism. These findings are promising and can have important implications for clinical intervention if they can be replicated in a clinical sample.

      Strengths:

      The authors utilized well-established and rigorously designed experiments and sound analytic methods. This enables easy translations between similar work in non-human primates and humans and readily points to potential candidates for underlying neural circuits that could be further examined in follow-up studies (e.g., superior colliculus, frontal eye fields, mediodorsal thalamus). The finding of no association between initial saccade accuracy and level of autistic trait in both experiments also serves as an important control analysis and increases one's confidence in the conclusion that the observed differences in task performance were indeed due to disrupted efference copies, not confounding factors such as basic visual/motor deficits or issues with working memory. The strong correlation between the observed motor and visual biases further strengthens the claim that the findings from both experiments may be explained by the same underlying mechanism - disrupted efference copies. Lastly, the authors also presented a thoughtful and detailed mechanistic theory of how efference copy impairment may lead to ASD symptomatology, which can serve as a nice framework for more research into the role of efference copies in ASD.

      Weaknesses:

      Although the paper has a lot of strengths, the main weakness of the paper is that a direct link with ASD symptoms (i.e., sensory overload and motor inflexibility as the authors suggested) cannot be established. First of all, the participants are all healthy adults who do not meet the clinical criteria for an ASD diagnosis. Although they could be considered a part of the broader autism phenotype, the results cannot be easily generalized to the clinical population without further research. Secondly, the measure used to quantify the level of autistic traits, Autistic Quotient (AQ), does not actually capture any sensory or motor symptoms of ASD. Therefore, it is unknown whether those who scored high on AQ in this study experienced high, or even any, sensory or motor difficulties. In other words, more evidence is needed to demonstrate a direct link between disrupted efference copies and sensory/motor symptoms in ASD.

      This is a valid point, and we thank the reviewer for raising it. Moving forward, complementing AQ evaluations in the general population with other questionnaires, such as those assessing camouflaging abilities (Hull, L., Mandy, W., Lai, MC., et al., 2019) or motor skills in everyday situations (MOSES-test; Hillus J, Moseley R, Roepke S, Mohr B. 2019), becomes crucial for a comprehensive understanding of autistic traits.

      We now address this point in Discussion page 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      - The pothole example in the introduction was really hard to follow. I wonder if there is a better example. 

      We now use a different example. On page 3 of the Introduction, we write:

      “During a tennis game, rapid oculomotor saccades are employed to track the high-velocity ball across the visual display. In the absence of a functional efference copy mechanism, the brain would encounter difficulty in anticipating the precise retinal location of the ball following each saccade. This could result in a transient period of visual disruption as the visual system adjusts to the new eye position. The efference copy, by predicting the forthcoming sensory consequences of the saccade, would bridge this gap and facilitate the maintenance of a continuous and accurate representation of the ball's trajectory.”

      - This is really minor; I would say that saccades are not the most frequent movement that humans perform. Some of the balance-related adjustments and even heartbeats are faster. Maybe just add "voluntary". 

      We thank the reviewer for the suggestion, now added.

      - "Severe consequences" on page 4 is a bit strong. If that were true, there would be pretty severe impairments in eye movement behavior in ASD, which I don't think is the case.

      We agree with the reviewer. We now eliminated the term “severe”.

      - The results section would read better if each experiment had a short paragraph reiterating its overall goal and the specific approach each experiment took to achieve that goal. 

      Now on page 5, for the first experiment, we write:

      “We investigated the influence of autistic traits on visual updating during saccadic eye movements using a classic double-step saccade task. This task relies on participants making two consecutive saccades to briefly presented targets. The accuracy of the second saccade serves as an indirect measure of how effectively the participant's brain integrated the execution of the first saccade into their internal representation of visual space. Participants were divided into quartiles based on the severity of their autistic traits, as assessed by the Autistic quotient questionnaire (cite). We hypothesized that individuals with higher autistic traits would exhibit greater difficulty in visual updating compared to those with lower autistic traits. This would be reflected in reduced accuracy of their second saccades in the double-step task. Figure 2C illustrates examples from participants at the extremes of the autistic trait distribution (Autistic quotient = 3, in orange and Autistic quotient = 31, in magenta). As shown, both participants were instructed to make saccades to the locations indicated by two brief target appearances (T1 and T2), as quickly and accurately as possible, following the order of presentation. However, successful execution of the second saccade requires accurate internal compensation for the first saccade, without any visual references or feedback available during the saccade itself.”

      On page 6, for experiment 2, we write:

      “With a trans-saccadic localization task, we explored how autistic traits affect the integration of eye movements into visual perception. Participants were presented with stimuli before and after a single saccade, creating an illusion of apparent motion. We measured the perceived direction of this displacement, which is influenced by how well the participant's brain accounts for the saccadic eye movement. We predicted that individuals with higher autistic traits would show a stronger bias in the perceived displacement direction, suggesting a less accurate integration of the eye movement into their visual perception.”

      - On page 6, the text about "vertical displacement" is confusing. The spatial displacements in this experiment were horizontal? 

      Yes, they were. The spatial displacement is horizontal, but the perceived trajectory (due to the saccade) is vertical. We now changed “vertical displacement” to “vertical trajectory”.

      - Page 6, grammatical problems in "while we report a slightly slant of the dots trajectory". 

      Thank you. Now fixed.

      - It would be helpful to discuss the apparent motion part of Experiment 2 in the main text. This important part is not made clear. 

      In the Introduction (page 4), we now write:

      “In this paradigm, one stimulus is shown before and another after saccade execution. Together these two stimuli produce the perception of “apparent motion”. If stimuli are placed such that the apparent motion path is orthogonal to the saccade path, then the orientation of the apparent motion path indicates how the saccade vector is integrated into vision. The apparent motion trajectory can only appear vertical if the movement of the eyes is perfectly accounted for, that is the retinotopic displacement is largely compensated, ensuring spatial stability. However, small biases of motion direction – implying under- (or over-) compensation of the eye movement – can indicate relative failures in this stabilization process. In a seminal study, Szinte and Cavanagh 27 found a slight over-compensation of the saccade vector leading to apparent motion slightly tilted against the direction of the saccade. More importantly, when efference copies are not available, i.e. localization occurring at the time of a second saccade in a double step task, a strong saccade under-compensation occurs 28.

      This phenomenon cannot be explained by perisaccadic mislocalization of flashed visual stimuli 29,30, but the two phenomena may be related in that they may both depend upon efference copy information.”

      - Figure 1 could be improved. For example, the text talks about the motor plan, but this is not clearly shown in the figure.

      We now added the motor plan into the model. Thank you.

      - Figure 2A, the scale is off (the pictures make it look like the horizontal movement was longer than the vertical). 

      Now fixed.

      - Figure 4, it would be helpful if the task was also described in the figure. 

      We thank the reviewer for the comment. We now tried to modify the figure by also adding the perceptual judgment task.

      - Figure 5A, the y-axis shows p(correct), but that is not what the y-axis shows (the legend makes the same mistake). 

      We apologize, it’s the proportion of time participants reported the second dot to be more to the right compared to the first one. We now changed the figure and the text accordingly.

      - A recent study on motion and eye movement prediction in ASD is very relevant to the work presented here.: Park et al. (2021). Atypical visual motion-prediction abilities in autism spectrum disorder. Clinical Psychological Science, 9(5), 944-960.

      Indeed. We now refer to the cited study in Discussion, on page 9.

      Reviewer #2 (Recommendations For The Authors):

      Statistics and plotting.

      I believe some of the reported statistics are not clear. For example, the authors write:

      "Saccade landing positions of participants in the lower quartile (mean degree ± SEM: 10.17 ± 0.50) did not deviate significantly from those in the upper quartile (mean degree ± SEM: 9.65 ± 0.77). This result was also confirmed by a paired sample t-test (t(7) = 0.66; p = 0.66, BF10 = 0.40)"

      Maybe I am missing something, but why use a paired-sample t-test when the upper and lower quartiles constitute different groups of participants? Shouldn't a two-sample t-test be used in this case?

      We apologize for the confusion. It is indeed a two-sample t-test.

      Along the same lines, I do not understand the link between the number of degrees of freedom reported in the t-test (7) and the number of participants reported in the study (41).

      This is also evident when looking at the scatterplot in Figure 3C. How many participants formed the averages and standard errors reported in Figures 3B and 3D? Please clarify.

      I have the same comment(s) also for the visual updating task (and related figures), where 13 degrees of freedom are reported in the t-tests. Please clarify. 

      We thank the reviewer for pointing this out. The number of participants reported in the scatter plots was indeed 42. However, we opted to compare the averages only in the lower and upper quartile of the AQ distribution to avoid dealing with a median split (which would imply a skewed distribution). Of our sample of participants in Exp 1, 8 fell into the lower quartile of the AQ distribution and 8 in the upper quartile (14 deg of freedom); from Exp 2, 8 participants fell in the lower and 7 in the upper (13 deg of freedom).

      We now fixed the values accordingly.
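
      The degrees of freedom quoted here follow from the standard pooled-variance (Student) two-sample t-test, where df = n1 + n2 − 2. The sketch below is an illustration only: it assumes equal variances, which the response does not state, and the sample values are made up rather than taken from the study.

```python
import math
from statistics import mean, variance


def pooled_two_sample_t(group_a, group_b):
    """Student's two-sample t statistic and its degrees of freedom.

    Assumes independent groups with equal variances (an assumption made
    for this sketch, not something stated in the response).
    """
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = (mean(group_a) - mean(group_b)) / standard_error
    return t_stat, df


# Made-up landing positions (deg) with the group sizes given in the response
lower_quartile = [10.4, 9.8, 10.9, 9.5, 10.1, 10.6, 9.9, 10.2]  # n = 8
upper_quartile_exp1 = [9.1, 10.3, 8.7, 9.9, 10.8, 9.4, 9.6, 9.2]  # n = 8
upper_quartile_exp2 = upper_quartile_exp1[:7]  # n = 7

_, df_exp1 = pooled_two_sample_t(lower_quartile, upper_quartile_exp1)
_, df_exp2 = pooled_two_sample_t(lower_quartile, upper_quartile_exp2)
```

      For comparison, a paired test on 8 pairs would have 8 − 1 = 7 degrees of freedom, which may be where the originally reported t(7) came from.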

      Reviewer #3 (Recommendations For The Authors):

      (1) The language can be a bit misleading (especially the title and abstract) as it wasn't always clear that the participants don't actually have clinical ASD. I'd suggest avoiding using words like "symptom" as that would indicate clinical severity, and using words like "traits/characteristics" instead for more precise language. 

      We apologize for the misleading terminology used. Now fixed.

      (2) In the Intro: "...perfect compensation results in a vertical trajectory, while small biases indicate stabilization issues23-25." This is a bit confusing without knowing the details of the paradigm. Consider clarifying or at least referring to Figure 4. 

      Thank you.

      (3) In the Results: "This result was also confirmed by a paired sample t-test (t(7) = 0.66;..." This is confusing as a two-sample t-test is the appropriate test here. Also, the degree of freedom seems very low - could the authors clarify how many participants are in each subgroup (i.e., low vs. high AQ quartile), for both experiments? 

      Of our sample of participants in Exp 1, 8 fell into the lower quartile of the AQ distribution and 8 in the upper quartile (14 deg of freedom); from Exp 2, 8 participants fell in the lower and 7 in the upper (13 deg of freedom).

      (4) In the Methods: Experiment 2: "The first dot could appear randomly above or below gaze level at a fixed horizontal location, halfway between the two fixations (x = 0, y = -5° or +5° depending on the trial). The second dot was then shown orthogonal to the first one at a variable horizontal location (x = 5° ± 2.5°)." This would mean that the position of the 2nd dot relative to the 1st one would be 2.5°-7.5°, but the task description in Results and Figure 5A would suggest the horizontal location of the second dot is x = 0° ± 2.5°. Which one is correct?

      The second option is the correct one. We now fixed the typo in the Methods part.

      (5) There is another study that examined oculomotor efference copies in children with ASD using a similar trans-saccadic perception task (Yao et al., 2021, Journal of Vision). In that study, they found a correlation between task performance and an ASD motor symptom (repetitive behavior). This seems quite relevant to the authors' hypothesis and discussion. 

      We thank the reviewer for the suggestion. We now added the mentioned paper in the discussion.

      (6) Please proofread the entire paper carefully as there were multiple grammatical and spelling errors.

      Thank you.

    1. eLife assessment

      This study offers a useful advance by introducing a cord blood DNA methylation score for maternal smoking effects, with the inclusion of cohorts from diverse backgrounds. However, the overall strength of evidence is deemed incomplete, due to concerns regarding low exposure levels and low statistical power, which hampers the generalisability of their findings. The study provides an interesting basis for future studies, but would benefit from the addition of more cohorts to validate the findings and a focus on more diverse health outcomes.

    2. Reviewer #2 (Public Review):

      Summary:

      The authors generated a DNA methylation score in cord blood for detecting exposure to cigarette smoke during pregnancy. They then asked if it could be used to predict height, weight, BMI, adiposity and WHR throughout early childhood.

      Strengths:

      The study included two cohorts of European ancestry and one of South Asian ancestry.

      Weaknesses:

      (1) The number of mothers who self-reported any smoking was very low, likely resulting in underpowered analyses.

      (2) Although it was likely that some mothers were exposed to second-hand smoke and/or pollution, data on this was not available.

      (3) One of the European cohorts and half of the South Asian cohort had DNA methylation measured on only 2500 CpG sites including only 125 sites previously linked to prenatal smoking.

    3. Reviewer #3 (Public Review):

      Summary:

      Deng et al. assess neonatal cord blood methylation profiles and the association with (self-reported) maternal smoking in multiple populations, including two European (CHILD, FAMILY) and one South Asian (START), via two approaches: 1) they perform an independent epigenome-wide association study (EWAS) and meta-analysis across the CHILD and FAMILY cohort, during which they also benchmark previously reported maternal-smoking associated sites, and 2) they generate new composite methylation risk scores for maternal smoking, and assess their performance and association with phenotypic characteristics in the three populations, in addition to previously described maternal smoking methylation risk scores.

      Strengths and weaknesses:

      Their meta-analysis across multiple cohorts and comparison with previous findings represents a strength. In particular the inclusion of a South Asian birth cohort is commendable as it may help to bolster generalizability. However, their conclusions are limited by several important weaknesses:

      (1) the low number of (self-reported) maternal smokers, particularly in their South Asian population, resulting in an inability to conduct benchmarking of maternal smoking sites in this cohort. As such, the inclusion of the START cohort in certain figures is not warranted (e.g., Figure 3) and the overall statement that smoking-associated MRS are portable across populations is not fully supported;

      (2) different methylation profiling tools were used: START and CHILD methylation profiles were generated using the more comprehensive 450K array while the FAMILY cohort blood samples were profiled using a targeted array covering only 3,000, as opposed to 450,000 sites, resulting in different coverage of certain sites which affects downstream analyses and MRS, and importantly, omission of potentially relevant sites as the array was designed in 2016 and substantial additional work into epigenetic traits has been conducted since then;

      (3) the authors train methylation risk scores (MRS) in CHILD or FAMILY populations based on sites that are associated with maternal smoking in both cohorts and internally validate them in the other cohort, respectively. Because the START cohort had insufficient numbers of self-reported maternal smokers, the authors cannot fully independently validate their MRS, thus limiting the strength of their results.

      Overall strength of evidence and conclusions:

      Despite these limitations, the study overall does explore the feasibility of using neonatal cord blood for the assessment of maternal smoking. However, their conclusion on generalizability of the maternal smoking risk score is currently not supported by their data as they were not able to validate their score in a sufficiently large number of maternal smokers and never smokers of South Asian populations.

      While their generalizability remains limited due to small sample numbers and previous studies with methylation risk scores exist, their findings may nonetheless provide the basis for future work into prenatal exposures which will be of interest to the research community. In particular their finding that the maternal smoking-associated MRS was associated with small birth sizes and weights across birth cohorts, including the South Asian birth cohort that had very few self-reported smokers, is interesting and the authors suggest these findings could be associated with factors other than smoking alone (e.g., pollution), which warrant further investigation and would be highly novel.

      Future exploration should also include a strong focus on more diverse health outcomes, including respiratory conditions that may have long-lasting health consequences.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time and the thoughtful reviews of our manuscript. The reviewers raised good points regarding the sample size and the low exposure in the South Asian cohort owing to their unique cultural and social practices. We recognize these as limitations of the paper and discuss them in the revised version. In the revised manuscript, we have taken the key suggestions by reviewers to 1) better illustrate the analytical flow and statistical methods, in particular, to show which datasets had been used in discovery, validation, and testing of the score – as a main figure in the manuscript and in the graphical abstract; 2) demonstrate there is no possibility of overfitting in our approach using statistical metrics of performance; 3) emphasize the goal was not for discovery (e.g. our own EWAS was not used for deriving the score), but to compare with existing EWASs and contrast the results from the white European and SA populations; and 4) supplement the analysis with previously derived maternal smoking, smoking, and air pollution methylation scores and explore additional health outcomes in relation to lung health in newborns. Finally, we would also like to take this opportunity to re-iterate that it was not our objective to derive the most powerful methylation score of smoking nor to demonstrate the causal role of maternal smoking on birth weight via DNAm. We have restructured the manuscript as well as the discussion to clarify this. Please find a point-by-point response to the comments below.

      Reviewer #1:

      The manuscript could benefit from a more detailed description of methods, especially those used to derive MRS for maternal smoking, which appears to involve overfitting. In particular, the addition of a flow chart would be very helpful to guide the reader through the data and analyses. The FDR correction in the EWAS corresponds to a fairly liberal p-value threshold. 

      We thank the reviewer for these good suggestions. In the revised manuscript, we have provided a flow chart as the new Figure 1, a more detailed description of the method (added a subsection “Statistical analysis” under Materials and Methods), as well as metrics including measures of fit such as AUC and adjusted R2 for each validation and testing dataset to illustrate there is no danger of overfitting (in new Supplementary Table 5).

      The choice to use FDR was indeed arbitrary as there has been no consensus on what significance threshold, if any, should be used in the context of EWAS. Here we simply followed the convention in previous studies to contrast the top associated signals for their effects between different populations and with reported effect sizes. Throughout the manuscript, we have removed the notion of significant associations and used the phrase “top associated signals” or “top associations” when discussing EWAS results for individual CpGs.

      Reviewer #2:

      (1) The number of mothers who self-reported any smoking was very low, much lower than in the general population and practically non-existent in the South Asian population. As a result, all analyses appeared to have been underpowered. It is possibly for this reason that the authors chose to generate their DNA methylation model using previously published summary statistics. The resulting score is not of great value in itself due to the low-powered dataset used to estimate covariance between CpG sites. In fact, a score was generated for a much larger, better-powered dataset several years ago (Reese, EHP, 2017, PMID 27323799). 

      We thank the reviewer for pointing out the low exposure in the South Asian population, which we believe is complementary to the literature on maternal smoking that has almost exclusively focused on white Europeans. However, the score was validated in the white European cohort (CHILD; current smoking 3.1%), whose smoking prevalence was reasonably consistent with the decline in maternal cigarette smoking from 7.2% in 2016 to 4.6% in 2021 (Martin, Osterman, & Driscoll, 2023). This is also consistent with the fact that CHILD participants were recruited from major metropolitan areas of Canada with relatively high SES and education as compared to FAMILY.

      We do agree with the reviewers that a higher prevalence of maternal smoking in the validating sample could potentially improve the power of the score. Our original analytical pipeline focused on CHILD as the validation dataset; FAMILY (see the new Figure 1) was used as the testing data. We alternatively provided an analytical scheme using FAMILY as the validation dataset, as it had a higher proportion of current smokers; however, this is limited by the number of CpGs available (128 in FAMILY vs. 2,619 in CHILD out of the 2,620 CpGs from (Joubert et al., 2016)). The results of all possible combinations of validation vs. testing and restriction of targeted array vs. HM450 are summarized in the new Supplementary Table 5 and Supplementary Figure 5.

      To clarify, our choice to construct the DNAm score using published summary statistics was not an ad-hoc decision due to the observed low power from the CHILD EWAS. We agree with the reviewer that our study was indeed underpowered and was not originally intended for EWAS discovery. Thus, we specifically proposed to adopt a multivariate strategy from the literature of polygenic risk scores. This approach enabled us to leverage well-powered association signals without individual-level access to data with a sample size of n > 5,000 (Joubert et al., 2016). In comparison, the Reese maternal smoking score (Reese et al., 2017) had a discovery sample size of only n = 1,057. Our score was not outperformed; in fact, its AUC in both FAMILY (external validating dataset; n=411) and CHILD (external testing dataset; n=352) was larger than that of the Reese score, as tabulated below (part of the new Supplementary Table 5).

      Author response table 1.

      Further, regarding the comment on the covariance matrix: lassosum via elastic-net and summary data indeed requires a reference covariance matrix that is consistent between the discovery data and external validation data. For moderately sized correlation/covariance values (r² > 0.1), a sample size of >100 is sufficiently powered to detect a difference from 0 and can thus be used for estimation. Similar to the linkage disequilibrium of genotype data, the CpGs also exhibit a block-wise correlation structure, and thus the theoretical framework of lassosum extends naturally to MRS.

      In the revised manuscript, we included the Reese score, as well as a few additional scores, to compare their predictiveness of smoking phenotypes in white European cohorts. We note that the applicability was limited in the FAMILY cohort, which was profiled using a targeted array on which only 7 out of 28 of the CpGs in the Reese score were available. As a result, though the Reese score had similar performance to our derived score in CHILD (0.94 vs. 0.95), its performance in FAMILY was compromised (0.72 vs. 0.89).
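
      As a rough check of the sample-size statement above, the power to detect a non-zero correlation can be approximated with Fisher's z-transform. This is a minimal sketch under the usual normal approximation and is not the calculation the authors report; the function name `correlation_power` and the two-sided 0.05 level are assumptions made for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate two-sided power to reject rho = 0 using Fisher's z-transform."""
    normal = NormalDist()
    # Under the alternative, atanh(r_hat) is roughly Normal(atanh(r), 1 / (n - 3)).
    shift = atanh(r) * sqrt(n - 3)
    z_crit = normal.inv_cdf(1 - alpha / 2)
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


# r^2 = 0.1 corresponds to |r| of about 0.32 (hypothetical illustration values).
power_n100 = correlation_power(sqrt(0.1), n=100)
power_n50 = correlation_power(sqrt(0.1), n=50)
print(f"n=100: {power_n100:.2f}, n=50: {power_n50:.2f}")
```

      Under this approximation the power at n = 100 is about 0.9, and it drops noticeably at n = 50, which is in line with the threshold quoted in the response.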

      (2) The conclusion that "even minimal smoking exposure in South Asian mothers who were not active smokers showed a DNAm signature of small body size and low birthweight in newborns" is not warranted because no analyses were performed to show that the association between DNA methylation and birth size/weight was driven by maternal smoking. 

      We thank the reviewer for this subtle point – it was not our intention to suggest there was a causal relationship between DNA methylation and birth size that was mediated by maternal smoking. We meant to suggest that the maternal smoking methylation score was consistently associated with negative outcomes in newborns of both white European and South Asian mothers, even though no maternal smoking was present in South Asian mothers. It is possible that the maternal smoking MRS was capturing a lot more than just smoking and second-hand smoking, such as other environmental exposures that also lead to oxidative stress. These together are associated with reduced birth size/weight.

      In the revised manuscript, we have modified the conclusion above to:

      “Notably, these results indicate a consistent association between the DNAm signature of maternal smoking and a small body size and low birthweight in newborns, in both white European mothers who exhibited some amount of smoking and in South Asian mothers who themselves were not active smokers.”

      (3) Although it was likely that some mothers were exposed to second-hand smoke and/or pollution, data on this was either non-existent or not included in this study. Including this would have allowed a more novel investigation of the effects of smoke exposure on the pregnancies of non-smoking mothers.

      We agree with this comment – second-hand smoking was captured by self-reported weekly smoking exposure by the mothers. We reported the association with smoking exposure and found that it was not consistently associated with our methylation scores across the cohorts (cohort-specific association p-values of 5.4×10⁻⁵, 3.4×10⁻⁵, and 0.58 for CHILD, FAMILY, and START; original Table 3), possibly due to the low exposure in the South Asian population (max weekly exposure was 42 hrs in contrast to 168 hrs in FAMILY and 98 hrs in CHILD). Meanwhile, air pollution data are currently not available. Here we additionally tested the association between maternal smoking and an air pollution methylation score, using key CpGs from the largest air pollution EWAS to date (Gondalia et al., 2021). However, there was no association between the air pollution score and any maternal smoking phenotypes (ps > 0.4).

      (4) One of the European cohorts and half of the South Asian cohort had DNA methylation measured on only 2500 CpG sites. This set of sites included only 125 sites previously linked to prenatal smoking. The resulting model of prenatal smoking was small (only 11 CpG sites). It is possible that a large model may have been more powerful.

      That is correct – also see our response to R2 comment #1. In our previous analysis, we validated two scores (one based on CpGs on the < 3,000 CpGs array and the other one for the full HM450K). The score with more CpGs indeed had slightly better performance. We included this as one of the limitations of the paper. Nevertheless, it does not impact the conclusion that the scores (based on a larger or smaller model) are transferrable to diverse populations and can be used to comparatively study the DNAm influence of maternal smoking in newborns.

      The following was added in the discussion:

      “First, the customized array with a limited number of CpGs (<3,000) was designed in 2016 and many large EWASs on smoking and maternal smoking conducted more recently had not been included.”

      (5) The health outcomes investigated are potentially interesting but there are other possibly more important outcomes of interest such as birth complications, asthma, and intellectual impairment which are known to be associated with prenatal smoking.

      We thank the reviewer for bringing up this point. One of the key health outcomes in the CHILD study was asthma, and data at later time points are available. However, we do not have similar outcomes collected in the other two studies (FAMILY and START), which focused on cardiometabolic health in young children. Thus, we did not initially include outcomes that were not available across all cohorts, as the intention was to contrast the effects between populations.

      We recognize that this is an important question and decided to provide the association results for asthma and allergy at available time points in CHILD, FAMILY, and START. We also included mode of delivery via emergency C-section as an additional proxy outcome of birth complications. However, none of these were marginally (p < 0.05) associated with the DNAm smoking score. These are now included in the updated Supplementary Table 8.

      Reviewer #1 (Recommendations For The Authors):

      (1) The number of samples in the South Asian birth cohort given in the abstract (n = 887) does not match the sample size of the START cohort from the results section (results, page 7, line 139, n = 880). It is also different from the final analytical dataset size from the methods section (page 17, line 386, n = 890). Please clarify. 

      We thank the reviewer for pointing this out. In the abstract, it was the final sample size used for EWAS (no missingness in smoking history). The 880 in the Results was a typo for 890, which contains three individuals with missing smoking data. These have been updated with the correct sample size for the START cohort that had full epigenome-wide methylation data (n = 504, and 503 with non-missing smoking history).

      (2) Page 3, line 54: "consistent signal from the GFI1 gene (ps < 5×10⁻⁵)". Is ps a typo? If not then it might be clearer to state how many sites this included. 

      No, these summarized the six CpG sites in the GFI1 gene as outlined in Table 2. We have clarified in the abstract to show the number of CpG sites included.

      (3) Please report effect sizes together with information about the statistical significance (p values). 

      We have updated the manuscript with (standardized) effect sizes whenever possible along with p-values.

      (4) Page 4, line 80. This paragraph could be improved by adding a sentence explaining DNA methylation. 

      We thank the reviewer for this suggestion. A sentence was included to introduce DNAm at the beginning of the second paragraph:

      “DNA methylation is one of the most commonly studied epigenetic mechanisms by which cells regulate gene expression, and is increasingly recognized for its potential as a biomarker (13).”

      (5) Page 4, line 84. Sentence difficult to understand, please rephrase: "Our recent systematic review of 17 cord blood epigenome-wide association studies (EWAS) demonstrated that out of the 290 CpG sites reported, 19 sites were identified in more than one study; all of them associated with maternal smoking". 

      We have revised to clarify the review was on cord blood EWAS with five outcomes: maternal diabetes, pre-pregnancy body mass index, diet during pregnancy, smoking, and gestational age.

      “Our recent systematic review of 17 cord blood epigenome-wide association studies (EWAS) found that out of the 290 CpG sites reported to be associated with at least one of the following: maternal diabetes, pre-pregnancy body mass index (BMI), diet during pregnancy, smoking, and gestational age, 19 sites were identified in more than one study and all of them associated with maternal smoking.”

      (6) Page 5, line 93. The second part of the sentence is not necessary: "The majority of cohort studies have focused on participants of European ancestry, but few were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans". 

      We have revised accordingly to:

      “Only a handful of cohort studies were designed to assess the influence of maternal exposures on DNA methylation changes in non-Europeans.”

      (7) Page 5, line 95. "It has been suggested that ancestral background could influence both systematic patterns of methylation (27), such as cell composition and smoking behaviours (28)". The sentence is slightly unclear. Could it be rephrased to say that cell composition differences may be present by ancestry, which can lead to differential DNAm patterns? 

      We have revised accordingly to:

      “It has been suggested that systematic patterns of methylation (Elliott et al., 2022), such as cell composition, could differ between individuals of different ancestral backgrounds, which could in turn confound the association between differential DNAm and smoking behaviours (Choquet et al., 2021).”

      (8) Page 5, line 108. How does reducing the number of predictors lead to more interpretable effect sizes? 

      This was meant as a general comment in the context of variable selection, whereby the fewer predictors there are, the effect size of each predictor becomes more interpretable. However, we recognize this comment might be irrelevant to the specific approaches we adopted. We have revised it to motivate methylation score as a powerful instrument for analysis:

      “Reducing the number of predictors and measurement noise in the data can lead to better statistical power and a more parsimonious instrument for subsequent analyses.”

      (9) Page 5, line 112. Health consequences seem a bit strong, given that the analysis describes correlations/associations. 

      We have revised it to “association with”:

      “In this paper, we investigated the epigenetic signature of maternal smoking on cord blood DNA methylation in newborns, as well as its influence on newborn and later life outcomes in one South Asian which refers to people who originate from the Indian subcontinent, and two predominantly European-origin birth cohorts.”

      Results

      (10) It would be very helpful to have a flow diagram to detail all of your analyses.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided a flow chart as the new Figure 1, updated the summary of analysis in Table 3, and added a new Supplementary Table 5 for the DNAm score derivation, as well as a more detailed description of the statistical analysis in the Materials and Methods under the subsection “Statistical analysis”.

      (11) Page 7, line 138. Please add a reference to the CHILD study. 

      We have added a reference of the CHILD study.

      (12) Tables in results and in supplemental data a) contain a mixture of fields describing the newborn and its mother (this is not true for Supplementary Table 2), b) lack column descriptions, c) lack descriptions of abbreviations and formatting used in tables, d) use different font types, e) lack descriptions of statistical tests that were used to obtain p-values, f) use inconsistent rounding. Please correct and add the missing information.

      We have consolidated the notation and nomenclature in all Tables and text. All numerical results are now rounded to 2 decimal places. The tests used were included in the Table headers as well as described in the Materials and Methods:

      “For continuous phenotypes, an analysis of variance (ANOVA) using the F-statistics or a two-sample t-test was used to compare the mean difference across the three cohorts or two groups, respectively. For categorical phenotypes, a chi-square test of independence was used to compare the difference in frequencies of observed categories. Note that three of the categories under smoking history in the START cohort had expected cell counts less than 5, and was thus excluded from the comparison, the reported p-value was for CHILD and FAMILY.”
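
      The expected-count check mentioned in the quoted passage can be sketched as follows. This is only an illustration of the standard rule of thumb (expected cell counts below 5 make the chi-square approximation unreliable); the helper names and the contingency table are hypothetical and do not reproduce the study data.

```python
def expected_counts(table):
    """Expected cell counts under independence for a rows-by-columns table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def sparse_cells(table, minimum=5):
    """Return (row, column) indices whose expected count falls below `minimum`."""
    expected = expected_counts(table)
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, value in enumerate(row)
        if value < minimum
    ]


# Hypothetical counts: rows = smoking category, columns = three cohorts.
toy_table = [
    [3, 3, 0],        # a rare category, e.g. current smokers
    [100, 100, 100],  # a common category, e.g. never smokers
]
flagged = sparse_cells(toy_table)
print(flagged)
```

      Cells flagged this way would be the ones excluded from (or handled separately in) a chi-square test of independence.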

      (13) Table 1. Sample sizes given in column descriptions do not add up to 1,650 (legend text).

      We thank the reviewer for pointing this out. The updated sample size is 1,267, based on the 352 CHILD samples, 411 FAMILY samples, and 504 START samples. Note that we did not remove those without full smoking history data, as Table 1 was intended for the epigenetic subsamples.

      (14) Page 7, line 156. Supplementary Tables are incorrectly numbered. In the text, Supplementary Table 4 comes after Supplementary Table 2.

      We thank the reviewer for catching this and have corrected the ordering of the Supplementary Tables and Figures. 

      (15) Page 7, line 158. "cell compositions" - do you mean estimated white cell proportions? 

      We have revised it to “estimated cord blood cell proportions” in the text throughout.

      (16) Smoking EWAS - do you see any overlap/directional consistency with the top findings from adult EWASs of smoking such as AHRR? 

      We annotated the top EWAS signals from the literature in the meta-analysis (new Figure 2; Supplementary Figures 1 and 3), but were only able to confirm associations in the GFI1 gene. The AHRR signals were also annotated, but fell below the FDR correction threshold, as seen in the new Figure 2 at the start of chromosome 5. We further added a new Supplementary Figure 3 to show the directional consistency with top findings (2,620 CpGs reported and 128 CpGs overlapped with our meta-analysis) from Joubert et al., 2016. The Pearson’s correlation coefficient with the meta-analyzed effect for maternal smoking was 0.72 and for smoking exposure was 0.60.

      We added the following to Results:

      “Further, we observed consistency in the direction of association for the 128 CpGs that overlapped between our meta-analysis and the 2,620 CpGs with evidence of association for maternal smoking (19) (Supplementary Figure 3). Specifically, the Pearson’s correlation coefficient for maternal smoking and weekly smoking exposure was 0.72 and 0.60, respectively.”
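
      A directional-consistency comparison of this kind can be sketched as a Pearson correlation plus a sign-concordance rate between two sets of effect estimates for the same CpGs. The effect sizes below are hypothetical placeholders, not values from Joubert et al. or from the meta-analysis, and the function names are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sign_concordance(x, y):
    """Fraction of pairs whose effects point in the same direction."""
    return sum((a > 0) == (b > 0) for a, b in zip(x, y)) / len(x)


# Hypothetical effect estimates for five overlapping CpGs.
published_effects = [-0.8, -0.5, 0.3, 0.6, -0.2]
meta_effects = [-0.6, -0.4, 0.1, 0.7, 0.1]

r = pearson_r(published_effects, meta_effects)
concordance = sign_concordance(published_effects, meta_effects)
print(f"r = {r:.2f}, same direction = {concordance:.0%}")
```

      Reporting both numbers is a design choice: the correlation reflects agreement in magnitude, while the concordance rate reflects agreement in direction only.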

      (17) Page 8, line 169. "also coincided with the GFI1 gene" this is a bit imprecise. Please report the correlation with the CpG from the maternal smoking analysis. 

      The CpG was inside the GFI1 gene, we have included the Pearson’s correlation with the top hit in the text below:

      “There were no CpGs associated with the ever-smoker status at an FDR of 0.05, though the top signal (cg09935388) was also mapped to the GFI1 gene (Pearson’s r2 correlation with cg12876356 = 0.75 and 0.68 in CHILD and FAMILY, respectively; Supplementary Figure 1).”

      (18) Page 8, line 171. Typo "ccg": "ccg01798813". 

      It has been corrected to “cpg01798813”.

      (19) Page 8, line 176. Please be clear about the phenotype used in these analyses. 

      The EWAS of weekly smoking exposure in START was removed in this version of the manuscript in light of the results and the reviewer’s comments, because this phenotype was skewed and would likely have produced only spurious results (also see response to comment #20).

      We have clarified the phenotypes for these results under “Epigenetic Association of Maternal Smoking in White Europeans” below:

      “The maternal smoking and smoking exposure EWASs in CHILD did not yield any CpGs after FDR correction (Supplementary Figure 3).”

      (20) What was the genomic inflation for the EWASs? 474 loci in the South Asian EWAS seems like a lot of findings. Perhaps a more robust method (e.g., OSCA MOMENT) might help to control the false positive rate. 

      The genomic inflation factor was moderate across the cohorts for smoking exposure: 1.02 in CHILD, 0.94 in FAMILY, and 1.00 in START. However, there was more inflation in the tail of the distribution in START than in the European cohorts. The empirical type I error rates at thresholds of 0.01, 0.001, and 0.00001 were high in START (1.7, 5.7, and 165 times the nominal level, respectively), in contrast to CHILD (1.06, 1.05, and 0.6 times) or FAMILY (1.6, 1.9, and 0 times). The smoking exposure EWAS based on START was thus removed, as these are likely false positives and there was very low smoking exposure to start with (11 reported weekly exposure between 2–42 hrs/week out of 462 with non-missing data). We have added the QQ-plots as well as the genomic inflation factor for the reported meta-analysis in the new Supplementary Figure 2. The following was added to the Results:

      “There was no noticeable inflation of empirical type I error in the association p-values from the meta-analysis, with the median of the observed association test statistic roughly equal to the expected median (Supplementary Figure 2).”
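
      The two diagnostics discussed in this response can be sketched as below: the genomic inflation factor as the ratio of the observed to the expected median 1-df chi-square statistic, and the empirical type I error expressed as a multiple of the nominal threshold. This is one common way to compute them and is assumed here rather than taken from the manuscript; the p-values are a synthetic uniform grid.

```python
from statistics import NormalDist, median

_normal = NormalDist()


def genomic_inflation(pvalues):
    """Lambda: median observed 1-df chi-square over its null median (~0.455)."""
    # Convert two-sided p-values (strictly between 0 and 1) to chi-square statistics.
    chi2 = [_normal.inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    null_median = _normal.inv_cdf(0.75) ** 2
    return median(chi2) / null_median


def type1_ratio(pvalues, alpha):
    """Observed fraction of p-values below alpha, as a multiple of alpha."""
    observed = sum(p < alpha for p in pvalues) / len(pvalues)
    return observed / alpha


# Synthetic, perfectly uniform p-values standing in for a well-calibrated EWAS.
uniform_p = [(i + 0.5) / 1000 for i in range(1000)]
lam = genomic_inflation(uniform_p)
ratio_01 = type1_ratio(uniform_p, 0.01)
print(f"lambda = {lam:.3f}, ratio at 0.01 = {ratio_01:.2f}")
```

      A lambda near 1 with tail ratios well above 1, as described for START, indicates inflation concentrated in the extreme p-values rather than across the whole distribution.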

      (21) What is the targeted array? I don't think it has been introduced prior to this point. 

      We introduced it in the Materials and Methods under subsection “Methylation data processing and quality controls”. Considering this comment and previous comments on the ordering of Tables and Figures, we have decided to place Materials and Methods after Introduction and before Results.

      (22) The MRS section is described poorly in the results section. It is not clear where the 11 or 114 CpGs come from.

      We now include an analytical summary of all scores (derived or external from literature) in the new Supplementary Table 5. Further, we updated the description of scores in Materials and Methods under the subsection “Using DNA Methylation to Construct Predictive Models for Maternal Smoking” to clarify the source and types of MRSs derived:

      “To evaluate whether the targeted GMEL-EPIC array design has comparable performance as the epigenome-wide array to evaluate the epigenetic signature of maternal smoking, a total of three MRSs were constructed, two using the 128 CpGs available in all cohorts – across the HM450K and targeted GMEL-EPIC arrays – and with either CHILD (n = 347 with non-missing smoking history) or FAMILY (n = 397) as the validation cohort, and another using 2,107 CpGs that were only available in CHILD and START samples with CHILD as the validation cohort. Henceforth, we referred to these derived maternal smoking scores as the FAMILY targeted MRS, CHILD targeted MRS, and the HM450K MRS, respectively.”
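
      To make the effect of array coverage concrete, a methylation score can be thought of as a weighted sum of CpG beta-values restricted to the probes available on a given array. The sketch below assumes that generic form; the probe names, weights, and beta-values are hypothetical and are not the lassosum-derived weights used in the study.

```python
def methylation_score(betas, weights):
    """Weighted sum over CpGs present in both the sample and the weight set.

    Returns the score and the number of CpGs that contributed to it.
    """
    shared = [cpg for cpg in weights if cpg in betas]
    score = sum(weights[cpg] * betas[cpg] for cpg in shared)
    return score, len(shared)


# Hypothetical weights for three probes; real weights would come from the fitted model.
toy_weights = {"cg_a": -2.0, "cg_b": 1.0, "cg_c": 0.5}

# A sample profiled on a targeted array that lacks probe cg_c.
targeted_sample = {"cg_a": 0.5, "cg_b": 0.25}

score, n_used = methylation_score(targeted_sample, toy_weights)
print(score, n_used)
```

      Tracking the number of contributing CpGs alongside the score makes it easy to see when a score is being computed from a small fraction of its intended probes.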

      (23) Page 9, line 187. "There was no statistically significant difference between the two scores in all samples (p = 1.00) or among non-smokers (p = 0.24).". How was the significance assessed? Please describe the models (outcome, covariates, model type) used for comparing the two models. It would also be good to report the correlation between the scores.

      We have added a subsection “Statistical analysis” under Materials and Methods that described the tests. The correlation between scores is now summarized as a heatmap across all cohorts in the new Supplementary Figure 6.

      “For each cohort, we contrasted the three versions of the derived scores using an analysis of variance analysis (ANOVA) along with pairwise comparisons using a two-sample t-test to examine how much information might be lost due to the exclusion of more than 10-fold CpGs at the validation stage. We also examined the correlation structure between all derived and external MRSs using a heatmap summarizing their pairwise Pearson’s correlation coefficient.”

      (24) Please include the number of samples in the training/validation and in the test set in the methods and in the results.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided a flow chart as the new Figure 1 and more detailed description of the method in the Materials and Methods. Please also see response to comment #22. The training sample size is based on Joubert et al., (2016), which is 5,647. For our main analyses, the validation sample with non-missing phenotypes remained the CHILD cohort (n=347), while the FAMILY (n=397) and START (n=503) samples were the independent testing data. We alternatively provided another scenario, in which the FAMILY sample was the validation cohort, while CHILD and START were the testing cohorts. The exact sample size and performance metrics for each scenario and score are clearly summarized in the new Supplementary Table 5.

      (25) Table 3. Please clarify the type of information contained in the four last columns (p-value?).

      Yes – these are the individual cohort p-values. We have taken the suggestion from comment #12 to fully describe all columns and fields.

      (26) Page 10, line 215: "The meta-analysis revealed no heterogeneity in the direction nor the effect size of associations between populations". Please quote/refer to the results. 

      In the revision, the heterogeneity p-values were quoted and the relevant tables (Supplementary Table 8) were added to this sentence.

      (27) Figure 2 has issues with x labels. Due to the low number of ever smokers in START, the boxplot may not be the best visualisation method. It would also benefit from listing n's per group.

      We appreciate this comment to improve the figure presentation. We increased the font size for the X-labels. The sample size for each group in START was also labeled in the new Figure 3 (previously Figure 2).

      Discussion

      (28) Studying the association between maternal smoking and cord blood DNAm is interesting from a biological perspective as it allows for assessing the immediate and long-term effects of maternal smoking on newborn health. However, in terms of calculating the MRS, what are the benefits of using cord blood over the mother's blood? We know that blood-based DNAm smoking score is a powerful predictor of long-term smoking status. 

      The reviewer raises an interesting point – abundant literature supports that DNAm changes are tissue-specific. While the mother’s blood DNAm smoking score reflects the long-term exposure to smoking in mothers, the cord blood DNAm captures the consequence of such long-term exposure for newborn health. One of the key results of our study is showing that established DNAm signatures of maternal smoking, which are known to mediate birth size and weight in white Europeans (these references were cited in the original manuscript), carry the same effect of reducing birth weight and size in the South Asian population. This is a critical finding from a DoHaD and public health perspective, as DNAm signatures of maternal smoking, irrespective of the smoking status of the mother, can influence the health trajectory of the newborns.

      We have expanded our discussion based on this suggestion to highlight the unique features of studying maternal smoking via different tissues and their implications. The following was added to the discussion:

      “There are several advantages of using a cord blood based biomarker from the DoHaD perspective. Firstly, cord blood provides a direct reflection of the in utero environment and fetal exposure to maternal smoking. Additionally, since cord blood is collected at birth, it eliminates potential confounding factors such as postnatal exposures that may affect maternal blood samples. Furthermore, studying cord blood DNAm allows for the assessment of epigenetic changes specifically relevant to the newborn, offering valuable information on the potential long-term health implications.”

      (29) Page 13, line 285: "Fourth" without "third".

      It has been revised accordingly.

      Methods 

      (30) The methods section does not contain all the details required to replicate the analysis. Whenever statistical analysis is conducted, this section should clearly describe the type of the analysis (linear regression, t-test, etc.) and name the dependent and independent variables. Sample sizes should also be given. 

      We added further details of test used and sample size for each analysis. We have also included a new “Statistical analysis” subsection under Materials and Methods.

      (31) Please describe MRS testing in the methods.

      We tested MRS with respect to binary and continuous smoking phenotypes using logistic and linear regression, respectively. The predictive value was assessed using the area under the ROC curve for the binary outcome and an adjusted R² for the continuous outcome. These were added to the new “Statistical analysis” subsection under Materials and Methods. See response to comments #22-24, and #30.
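
      The two performance metrics named above can be sketched in a few lines. The AUC is computed here from pairwise comparisons of scores (equivalent to the Mann-Whitney formulation), and the adjusted R² from the usual degrees-of-freedom correction. The data are hypothetical, and the sketch assumes the AUC is evaluated on the score itself; for a single-predictor logistic model with a positive coefficient this gives the same ranking as the fitted probabilities.

```python
def auc(scores, labels):
    """Probability that a random case scores higher than a random control."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    # Ties between a case and a control count as half a win.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a linear model with k predictors and n samples."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical scores for two non-smokers (0) and two smokers (1).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]

toy_auc = auc(toy_scores, toy_labels)
toy_adj = adjusted_r2(0.5, n=11, k=1)
print(toy_auc, round(toy_adj, 3))
```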

      (32) Please describe the methods used to compare the two versions of MRS for maternal smoking.

      It was a two-sample t-test, which was described in the Figure legends. We have now added this to the new “Statistical analysis” subsection under Materials and Methods.

      (33) Please describe testing the associations between MRS and Offspring Anthropometrics in more detail.

      We added further details on the regression model and the test for association in the methods. We have now added this to the new “Statistical analysis” subsection under Materials and Methods.

      (34) Meta analysing the 450k and GMEL arrays is going to substantially reduce the number of CpGs under investigation.

      We agree with the reviewer that this is not optimal for signal discovery. However, this is the only way we could synthesize evidence across the cohorts as FAMILY samples were only processed using the customized array. We added the following as a limitation of the study in the discussion.

      “First, the customized array with a limited number of CpGs (<3,000) was designed in 2016 and many large EWASs on smoking and maternal smoking conducted more recently had not been included.”

      (35) Page 16, line 364: GDM abbreviation was used in the results section (line 145), yet it is introduced in line 364. 

      Thank you for catching this, we have removed the duplicate.

      (36) Page 17, line 381: Given the stated importance of ancestry, why not restrict the sample to genetically confirmed groups?

      The reviewer has a valid point that ancestry, either perceived or genetic, can introduce additional heterogeneity due to potential differences in genetics, cultural and social practices, and lifestyles. Genetic data are indeed available for a subset of the individuals. In the original version of the manuscript, we used a stringent ancestry calling method by mapping all individuals with the 1000 Genomes samples from continental populations. The final definition was based on a combination of self-reported and genetically confirmed ancestry. However, if we restricted only to genetically confirmed groups, the sample size would be reduced to 312 (vs. 411), 268 (vs. 352), and 488 (vs. 504) in FAMILY, CHILD, and START, respectively.

      We compared the mean difference in the beta-values of the top associated CpGs and the derived MRS between those genetically confirmed vs. self-reported ancestral groups, and observed no material difference. These results are now included in the Supplementary Materials as part of the sensitivity analysis. Thus, given these considerations, we decided to use this complementary approach to retain the maximum number of samples while ensuring some aspect of ancestral homogeneity.

      “To maximize sample size in FAMILY and CHILD, we retained either self-identified or genetically confirmed Europeans based on available genetic data (Supplementary Table 1).”

      (37) Page 18, line 397: sensitivity analysis not sensitive analysis.

      Thank you for catching this, we have revised accordingly.

      (38) Page 18, line 409: smoking was rank transformed however, it would be good to see regression diagnostics for the lead loci in the EWAS to check that assumptions were met. 

      We thank the reviewer for this suggestion. Smoking exposure is indeed skewed and in fact heavily zero-inflated across the cohorts. The raw phenotype violated several model assumptions in terms of heteroskedasticity, outlying values (influential points), and linearity. After rank transformation, the diagnostics showed smaller deviations from model assumptions, although some violations remained to a lesser degree. We included a comparison of results before and after transformation and model diagnostics for the lead CpG using CHILD and FAMILY data in the Supplementary Materials. The following was added to the results:

      “As a sensitivity analysis, we repeated the analysis for the continuous smoking exposure under rank transformation vs. raw phenotype for the associated CpG in GFI1 and examined the regression diagnostics (Supplementary Material), and found that the model under rank-transformation deviated less from assumptions.”

      (39) Page 19, line 418: FDR seems quite a lenient threshold, especially when genome-wide significance thresholds exist. I would be inclined to view the EWAS findings as null.

      The choice to use FDR was indeed arbitrary, as there has been no consensus on what significance threshold, if any, should be used in the context of EWAS. The significance threshold for GWAS (Pe’er et al., 2008) probably does not apply directly to EWAS, as the number of effective tests will likely differ between genome-wide genetic variants and CpGs. The Bonferroni-corrected p-value threshold in this context would be 0.05/200,050 = 2.5×10⁻⁷, which is still less stringent than the GWAS significance threshold. We originally decided to follow the convention of previous studies and use FDR to select a subset of plausible associations, so that the top association signals could be contrasted for their effects between different populations and with reported effect sizes.

      We have revised the manuscript throughout by removing the notion of significant associations, and instead used the phrase “top associated signals” or “top associations” when discussing EWAS results for individual CpGs. The following was added to Materials and Methods to clarify the choice of our threshold:

      “For each EWAS or meta-analysis, the false discovery rate (FDR) adjustment was used to control multiple testing and we considered CpGs that passed an FDR-adjusted p-value < 0.05 to be relevant for maternal smoking.”
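
      As a rough illustration of the thresholds discussed in this response, the sketch below computes the Bonferroni cutoff for the 200,050 CpGs quoted above and applies an FDR adjustment to a handful of p-values. It assumes the Benjamini-Hochberg procedure (the response says only "FDR adjustment"), uses the conventional GWAS threshold of 5×10⁻⁸ (not stated in the response), and the p-values in `toy_p` are hypothetical, chosen only to show that an FDR cutoff can retain more signals than a Bonferroni cutoff. Function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Number of CpGs tested, as quoted in the response.
N_CPGS = 200_050
threshold = bonferroni_threshold(0.05, N_CPGS)

# Conventional GWAS threshold (an assumption; the response does not give it).
GWAS_THRESHOLD = 5e-8

# Hypothetical p-values for illustration only; these are not study results.
toy_p = [0.001, 0.012, 0.025, 0.041, 0.60]
toy_adj = benjamini_hochberg(toy_p)

n_pass_fdr = sum(p < 0.05 for p in toy_adj)
n_pass_bonferroni = sum(p < bonferroni_threshold(0.05, len(toy_p)) for p in toy_p)

print(f"Bonferroni cutoff for {N_CPGS} tests: {threshold:.3g}")
print(f"Toy example: {n_pass_fdr} pass FDR < 0.05, {n_pass_bonferroni} pass Bonferroni")
```

      In this toy set, three of the five p-values pass the FDR cutoff while only one passes Bonferroni, which is the sense in which FDR acts as a filter for plausible associations rather than a strict significance claim.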

      (40) I do not understand Supplementary Figure 6 - how have the data been standardised? Why not plot the CpGs on the beta-value scale?

      The standardized values were plotted as the reported p-values for the mean and variance equality tests (i.e. ANOVA F-test, Levene’s test, Anderson-Darling test) were based on these transformed values to reduce inflation due to non-normality. We have since removed this comparison and kept only the comparison of the overall score as the number of CpGs in the HM450k score (143 CpGs) for comparison is too high to be visually interpretable.

      (41) It is my understanding, that the MRS for maternal smoking was constructed using external weights projected and regularised using elastic net (effectively trained) in CHILD cohort. The results section discusses associations between maternal smoking history and outcomes in CHILD, FAMILY, and START. Training and testing the score in the same sample (cohort) may result in overfitting and therefore should not be implemented.

      The original MRS was constructed using external weights from an independent discovery sample (Joubert et al., 2016; n > 5,000) and the LASSO validation was done in CHILD (n = 352), external testing was in FAMILY and START. This was the lassosum framework whereby we leverage larger sample size from external studies to select more plausible CpGs as candidates to include in the model. Thus, training, validation, and testing were not done in the same samples. We have included a Figure 1 to illustrate the updated analytical flow and a graphical abstract to summarize the methods.

      (42) Is it a concern that the findings don't seem to replicate Joubert's results, which came from a much larger study?

      Replication is usually done in samples much larger than the discovery samples; thus, it is not a concern that we were unable to confirm all signals from Joubert et al. (2016). However, 6/7 of the top associations (FDR adjusted p-value < 0.05) in the meta-analysis were declared as significant in Joubert et al. (2016). In addition, the fact that we were able to use Joubert’s summary statistics to derive MRSs that were strongly associated with both smoking history and weekly exposure suggests shared signals. Also see the response to R1 comment #16 for a comparison of effect consistency.

      (43) Please check that all analysis scripts have been uploaded to Github and that the EWAS results are publicly available.

      We thank the reviewer for this suggestion. All updated scripts and EWAS results are available on Github. We are working to have the results also submitted to EWAS catalog.

      Reviewer #2 (Recommendations For The Authors):

      The impact of this study is reduced due to previous findings:

      (1) Previous studies have already shown that DNA methylation may mediate the effect of maternal smoking on birth size/weight (see e.g. https://doi.org/10.1098/rstb.2018.0120 and https://doi.org/10.1093/ije/dyv048).

      We thank the reviewer for this point and would like to take the opportunity to clarify that it was not our objective to examine whether there was a causal relationship between DNA methylation and birth size that was mediated by maternal smoking. One of the key aims of our study is to evaluate whether epigenetic associations – at individual CpGs and aggregated as a score – are consistent between white European and South Asian populations. One way to examine this is to use established DNAm signatures of maternal smoking, which are known to mediate birth size and weight in white Europeans (these references were cited in the original manuscript), and to confirm whether they also carry the same effect on birth outcomes in the South Asian population.

      Indeed, our results support that the maternal smoking methylation score was consistently associated with negative outcomes in newborns of both white European and South Asian mothers, even though maternal smoking was absent among South Asian mothers. Collectively, these findings point to the possibility that the maternal smoking MRS was capturing more than just smoking and second-hand smoking, and potentially also other environmental exposures that lead to oxidative stress. Together, these exposures are associated with health consequences, including reduced birth size/weight. One candidate for such an exposure is air pollution, as some of the maternal smoking CpGs were previously linked to air pollution. However, we were unable to assess this hypothesis directly without the air pollution data, and the air pollution methylation score was not associated with either smoking history (Supplementary Figure 5) or smoking exposure (p > 0.4 in CHILD, FAMILY and START).

      The following was added to Materials and Methods under the subsection Using DNA Methylation to Construct Predictive Models for Maternal Smoking:

      “To benchmark and compare with existing maternal smoking MRSs, we calculated the Reese score using 28 CpGs (48,49), Richmond score using 568 CpGs (49), Rauschert score using 204 CpGs (50), Joubert score using all 2,620 CpGs with evidence of association for maternal smoking (19), and finally a three-CpG score for air pollution (51). The details of these scores and score weight can be found in Supplementary Table 4.”

      The following was added to Results:

      “Both produced methylation scores that were significantly associated with maternal smoking history (ANOVA F-test p-values = 1.0×10⁻⁶ and 2.4×10⁻¹⁴ in CHILD and 6.9×10⁻¹⁶ and <2.2×10⁻¹⁶ in FAMILY), and the best among alternative scores for CHILD and FAMILY (Supplementary Table 5). With the exception of the air pollution MRS, all remaining scores were marginally associated with smoking history in both CHILD and FAMILY (Supplementary Figure 5).”

      (2) Due to the small study size and low levels of prenatal smoke exposure, the model derived here is of little value and is, in fact, superseded by a previously published model (PMID: 27323799). At the very least, the model should be evaluated here. A novel aspect of this study is the inclusion of a South Asian cohort. Unfortunately, smoke exposure is practically non-existent, so it is unclear how it can be used. The more interesting finding in this study is the possibility that environmental factors such as second-hand smoke or pollution may have similar effects on pregnancies as maternal smoking. Are these available? If so, they could be evaluated for associations with DNA methylation. This would be novel. 

      In the revised manuscript, we included the Reese score (Reese et al., 2017) and a few other maternal smoking scores for comparison. In the CHILD cohort, the performance was comparable to our derived score (AUC of 0.95 vs. 0.94 for Reese score), but its applicability was limited since the FAMILY dataset was profiled using a targeted array and only 7 out of 28 of the CpGs in the Reese score were available (AUC of 0.89 vs. 0.72 for Reese). As compared to the remaining scores from literature (see the new Supplementary Table 5 for complete results), Reese’s score has generally favorable performance.

      We did examine second-hand smoking in the original manuscript, showing a significant association with weekly maternal smoking exposure (original Table 3 and Supplementary Table 8). However, air pollution data is not available for assessment.

      (3) The other novel aspect is the evaluation of associations with outcomes later in life. Height and weight are interesting but impact could be gained by including other relevant outcomes such as birth complications, asthma, and intellectual impairment which are known to be associated with prenatal smoking. 

      We thank the reviewer for bringing up this point. One of the key health outcomes in the CHILD study was asthma, and data at later time points are available. However, we do not have similar outcomes collected in the other two studies (FAMILY and START), which focused on cardiometabolic health in young children. Thus, we did not initially include outcomes that were not available across all cohorts as the intention was to contrast the effects between populations.

      We recognize that this is an important question and decided to provide the association results for mother reported asthma and allergy, but based on different definitions as these outcomes cannot be harmonized across the cohorts. We also included mode of delivery via emergency C-section as an additional proxy outcome of birth complication.

      The following was added to Materials and Methods:

      “Mode of delivery (emergency c-section vs. other) was collected at the time of delivery.”

      “Additional phenotypes included smoking exposures (hours per week) at home, potential allergy based on mother reporting any of: eczema, hay fever, wheeze, asthma, food allergy (egg, cow milk, soy, other) for her child in FAMILY and START, and asthma based on mother’s opinion in CHILD (“In your opinion, does the child have any of the following? Asthma”).”

      The following was added to Results:

      “The maternal smoking MRS was consistently associated with increasing weekly smoking exposure in children reported by mothers at the 1-year (0.51±0.15, FDR adjusted p = 0.0052), 3-year (0.53±0.16, FDR adjusted p = 0.0052), and 5-year (0.40±0.15, FDR adjusted p = 0.021) visits with similar effects.”

      “We did not find any association with self-reported allergy or asthma in children at later visits (Supplementary Table 8). Further, there was no evidence of association between the MRS and any maternal outcomes (Supplementary Table 8).”

      REFERENCES:

      Gondalia, R., Baldassari, A., Holliday, K. M., Justice, A. E., Stewart, J. D., Liao, D., . . . Whitsel, E. A. (2021). Epigenetically mediated electrocardiographic manifestations of sub-chronic exposures to ambient particulate matter air pollution in the Women's Health Initiative and Atherosclerosis Risk in Communities Study. Environ Res, 198, 111211. doi:10.1016/j.envres.2021.111211

      Joubert, B. R., Felix, J. F., Yousefi, P., Bakulski, K. M., Just, A. C., Breton, C., . . . London, S. J. (2016). DNA Methylation in Newborns and Maternal Smoking in Pregnancy: Genome-wide Consortium Meta-analysis. Am J Hum Genet, 98(4), 680-696. doi:10.1016/j.ajhg.2016.02.019

      Martin, J. A., Osterman, M. J. K., & Driscoll, A. K. (2023). Declines in Cigarette Smoking During Pregnancy in the United States, 2016-2021. NCHS Data Brief(458), 1-8. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/36723453

      Reese, S. E., Zhao, S., Wu, M. C., Joubert, B. R., Parr, C. L., Haberg, S. E., . . . London, S. J. (2017). DNA Methylation Score as a Biomarker in Newborns for Sustained Maternal Smoking during Pregnancy. Environ Health Perspect, 125(4), 760-766. doi:10.1289/EHP333

    1. eLife assessment

      This important manuscript shows that axonal transport of Wnd is required for its normal degradation by the Hiw ubiquitin ligase. These are interesting findings supported by solid data. However, the summary and conclusions are over-interpreted and how Rab11 is involved in Golgi processing or axonal transport of Wnd is not resolved and would require additional experiments to support the claims. Alternatively, the authors should dial back on their interpretation.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Kim et al. describes a role for axonal transport of Wnd (a dual leucine zipper kinase) for its normal degradation by the Hiw ubiquitin ligase pathway. In Hiw mutants, the Wnd protein accumulates dramatically in nerve terminals compared to the cell body of neurons. In the absence of axonal transport, Wnd levels rise and lead to excessive JNK signaling that makes neurons unhappy.

      Strengths:

      Using GFP-tagged Wnd transgenes and structure-function approaches, the authors show that palmitoylation of the protein at C130 plays a role in this process by promoting Golgi trafficking and axonal localization of the protein. In the absence of this transport, Wnd is not degraded by Hiw. The authors also identify a role for Rab11 in the transport of Wnd, and provide some evidence that Rab11 loss-of-function neuronal degenerative phenotypes are due to excessive Wnd signaling. Overall, the paper provides convincing evidence for a preferential site of action for Wnd degradation by the Hiw pathway within axonal and/or synaptic compartments of the neuron. In the absence of Wnd transport and degradation, the JNK pathway becomes hyperactivated. As such, the manuscript provides important new insights into compartmental roles for Hiw-mediated Wnd degradation and JNK signaling control.

      Weaknesses:

      It is unclear if the requirement for Wnd degradation at axonal terminals is due to restricted localization of Hiw there, but it seems other data in the field argue against that model. The mechanistic link between Hiw degradation and compartmentalization is unknown.

    3. Reviewer #2 (Public Review):

      Summary:

      Utilizing transgene expression of Wnd in sensory neurons in Drosophila, the authors found that Wnd is enriched in axonal terminals. This enrichment could be blocked by preventing palmitoylation or inhibiting Rab1 or Rab11 activity. Indeed, subsequent experiments showed that inhibiting Wnd can prevent toxicity by Rab11 loss of function.

      Strengths:

      This paper evaluates in detail Wnd location in sensory neurons, and identifies a novel genetic interaction between Rab11 and Wnd that affects Wnd cellular distribution.

      Weaknesses:

      The authors report low endogenous expression of wnd, and expressing mutant hiw or overexpressing wnd is necessary to see axonal terminal enrichment. It is unclear if this overexpression model (which is known to promote synaptic overgrowth) would be relevant to normal physiology.

      Palmitoylation of the Wnd orthologue DLK in sensory neurons has previously been identified as important for DLK trafficking in a cell culture model.

      The authors find genetic interaction between Wnd and Rab11, but these studies are incomplete and they do not support the authors' mechanistic interpretation.

    1. eLife assessment

      Manley and Vaziri introduce an important new method for brain-wide imaging of cellular activity in zebrafish and provide evidence for the applicability of this technique. They use this method to explore the question of how neural variability gives rise to variability in behavior. The analyses used are mostly convincing, with some central results that are currently incomplete and difficult to interpret.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, Manley and Vaziri investigate whole-brain neural activity underlying behavioural variability in zebrafish larvae. They combine whole brain (single cell level) calcium imaging during the presentation of visual stimuli, triggering either approach or avoidance, and carry out whole brain population analyses to identify whole brain population patterns responsible for behavioural variability. They show that similar visual inputs can trigger large variability in behavioural responses. Though visual neurons are also variable across trials, they demonstrate that this neural variability does not degrade population stimulus decodability. Instead, they find that the neural variability across trials is in orthogonal population dimensions to stimulus encoding and is correlated with motor output (e.g. tail vigor). They then show that behavioural variability across trials is largely captured by a brain-wide population state prior to the trial beginning, which biases choice - especially on ambiguous stimulus trials. This study suggests that parts of stimulus-driven behaviour can be captured by brain-wide population states that bias choice, independently of stimulus encoding.

      Strengths:

      - The strength of the paper principally resides in the whole brain cellular level imaging in a well-known but variable behaviour.

      - The analyses are reasonable and largely answer the questions the authors ask.

      - Overall the conclusions are well warranted.

      Weaknesses:

      A more in-depth exploration of some of the findings could be provided, such as:

      - Given that thousands of neurons are recorded across the brain, a more detailed parcellation of where the neurons contribute to different population coding dimensions would be useful to better understand the circuits involved in different computations.

      - Given that the behaviour on average can be predicted by stimulus type, how does the stimulus override the brain-wide choice bias on some trials? In other words, a better link between the findings in Figures 2 and 3 would be useful for better understanding how the behaviour ultimately arises.

      - What other motor outputs do the noise dimensions correlate with?

      The dataset that the authors have collected is immensely valuable to the field, and the initial insights they have drawn are interesting and provide a good starting ground for a more expanded understanding of why a particular action is determined outside of the parameters experimenters set for their subjects.

    3. Reviewer #2 (Public Review):

      Overview

      In this work, Manley and Vaziri investigate the neural basis for variability in the way an animal responds to visual stimuli evoking prey-capture or predator-avoidance decisions. This is an interesting problem and the authors have generated a potentially rich and relevant data set. To do so, the authors deployed Fourier light field microscopy (fLFM) of larval zebrafish, improving upon prior designs and image processing schemes to enable volumetric imaging of calcium signals in the brain at up to 10 Hz. They then examined associations between neural activity and tail movement to identify populations primarily related to the visual stimulus, responsiveness, or turn direction - moreover, they found that the activity of the latter two populations appears to predict upcoming responsiveness or turn direction even before the stimulus is presented. While these findings may be valuable for future more mechanistic studies, issues with resolution, rigor of analysis, clarity of presentation, and depth of connection to the prior literature significantly dampen enthusiasm.

      Imaging

      - Resolution: It is difficult to tell from the displayed images how good the imaging resolution is in the brain. Given scattering and lensing, it is important for data interpretation to have an understanding of how much PSF degrades with depth.

      - Depth: In the methods it is indicated that the imaging depth was 280 microns, but from the images of Figure 1 it appears data was collected only up to 150 microns. This suggests regions like the hypothalamus, which may be important for controlling variation in internal states relevant to the behaviors being studied, were not included.

      - fLFM data processing: It is important for data interpretation that the authors are clearer about how the raw images were processed. The de-noising process specifically needs to be explained in greater detail. What are the characteristics of the noise being removed? How is time-varying signal being distinguished from noise? Please provide a supplemental with images and algorithm specifics for each key step.

      - Merging: It is noted that nearby pixels with a correlation greater than 0.7 were merged. Why was this done? Is this largely due to cross-contamination due to a drop in resolution? How common was this occurrence? What was the distribution of pixel volumes after aggregation? Should we interpret this to mean that a 'neuron' in this data set is really a small cluster of 10-20 neurons? This of course has great bearing on how we think about variability in the response shown later.

      - Bleaching: Please give the time constants used in the fit for assessing bleaching.

      Analysis

      - Slow calcium dynamics: It does not appear that the authors properly account for the slow dynamics of calcium-sensing in their analysis. Nuclear-localized GCaMP6s will likely have a kernel with a multiple-second decay time constant for many of the cells being studied. The value used needs to be given and the authors should account for variability in this kernel time across cell types. Moreover, by not deconvolving their signals, the authors allow for contamination of their signal at any given time with a signal from multiple seconds prior. For example, in Figure 4A (left turns), it appears that much of the activity in the first half of the time-warped stimulus window began before stimulus presentation - without properly accounting for the kernel, we don't know if the stimulus-associated activity reported is really stimulus-associated firing or a mix of stimulus and pre-stimulus firing. This also suggests that in some cases the signals from the prior trial may contaminate the current trial.

      - Partial Least Squares (PLS) regression: The steps taken to identify stimulus coding and noise dimensions are not sufficiently clear. Please provide a mathematical description.

      - No response: It is not clear from the methods description if cases where the animal has no tail response are being lumped with cases where the animal decides to swim forward and thus has a large absolute but small mean tail curvature. These should be treated separately.
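
      The slow-calcium-dynamics point raised under Analysis can be made concrete with a toy forward model. The sketch below is not the authors' pipeline: it assumes a single-exponential indicator kernel with a hypothetical 2 s decay constant, 10 Hz sampling (the upper imaging rate mentioned above), and one burst of activity placed 1 s before a hypothetical stimulus onset. It shows how much fluorescence from that pre-stimulus burst is still present at and after stimulus onset when the signal is not deconvolved.

```python
import math


def exponential_kernel(tau_s, dt_s, duration_s):
    """Discretized exponential decay kernel, exp(-t / tau), sampled every dt."""
    n = int(round(duration_s / dt_s))
    return [math.exp(-k * dt_s / tau_s) for k in range(n)]


def convolve_causal(signal, kernel):
    """Causal convolution: out[t] = sum_k kernel[k] * signal[t - k]."""
    out = [0.0] * len(signal)
    for t in range(len(signal)):
        for k in range(min(t + 1, len(kernel))):
            out[t] += kernel[k] * signal[t - k]
    return out


DT = 0.1          # seconds per sample (10 Hz)
TAU = 2.0         # assumed decay time constant in seconds (hypothetical)
N_SAMPLES = 100   # 10 s of data
STIM_INDEX = 50   # hypothetical stimulus onset at t = 5 s

# A single burst of activity 1 s before stimulus onset, and none afterwards.
firing = [0.0] * N_SAMPLES
firing[STIM_INDEX - 10] = 1.0

fluorescence = convolve_causal(firing, exponential_kernel(TAU, DT, 10.0))

at_onset = fluorescence[STIM_INDEX]           # residual signal at stimulus onset
two_s_after = fluorescence[STIM_INDEX + 20]   # residual signal 2 s into the trial

print(f"Residual at onset: {at_onset:.2f}, 2 s after onset: {two_s_after:.2f}")
```

      Under these assumptions, well over half of the pre-stimulus transient remains at stimulus onset, even though the simulated cell never fires during the stimulus window. With a longer decay constant or closely spaced trials the carry-over would be larger, which is why the kernel value and any deconvolution step matter for interpreting stimulus-associated activity.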

      Results

      - Behavioral variability: Related to Figure 2, within- and across-subject variability are confounded. Please disambiguate. It may also be informative on a per-fish basis to examine associations between reaction time and body movement.

      - Data presentation clarity: All figure panels need scale bars - for example, in Figure 3A there is no indication of timescale (or time of stimulus presentation). Figure 3I should also show the time series of the w_opt projection.

      - Pixel locations: Given the poor quality of the brain images, it is difficult to tell the location of highlighted pixels relative to brain anatomy. In addition, given that the midbrain consists of much more than the tectum, it is not appropriate to put all highlighted pixels from the midbrain under the category of tectum. To aid in data interpretation and better connect this work with the literature, it is recommended that the authors register their data sets to standard brain atlases and determine if there is any clustering of relevant pixels in regions previously associated with prey-capture or predator-avoidance behavior.

      Interpretation

      - W_opt and e_1 orthogonality: The statement that these two vectors, determined from analysis of the fluorescence data, are orthogonal, actually brings into question the idea that true signal and leading noise vectors in firing-rate state-space are orthogonal. First, the current analysis is confounding signals across different time periods - one could assume linearity all the way through the transformations, but this would only work if earlier sources of activation were being accounted for. Second, the transformation between firing rate and fluorescence is most likely not linear for GCaMP6s in most of the cells recorded. Thus, one would expect a change in the relationship between these vectors as one maps from fluorescence to firing rate.

      - Sources of variability: The authors do not take into account a fairly obvious source of variability in trial-to-trial response - eye position. We know that prey capture responsiveness is dependent on eye position during stimulus (see Figure 4 of PMID: 22203793). We also expect that neurons fairly early in the visual pathway with relatively narrow receptive fields will show variable responses to visual stimuli as the degree of overlap with the receptive field varies with eye movement. There can also be small eye-tracking movements ahead of the decision to engage in prey capture (Figure 1D, PMID: 31591961) that can serve as a drive to initiate movements in a particular direction. Given these possibilities indicating that the behavioral measure of interest is gaze, and the fact that eye movements were apparently monitored, it is surprising that the authors did not include eye movements in the analysis and interpretation of their data.

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, Manley and Vaziri designed and built a Fourier light-field microscope (fLFM) inspired by previous implementations but improved and built exclusively from commercially available components so others can more easily reproduce the design. They combined this with the design of novel algorithms to efficiently extract whole-brain activity from larval zebrafish brains.

      This new microscope was applied to the question of the origin of behavioral variability. In an assay in which larval zebrafish are exposed to visual dots of various sizes, the fish respond by turning left or right or not responding at all. Neural activity was decomposed into an activity that encodes the stimulus reliably across trials, a 'noise' mode that varies across trials, and a mode that predicts tail movements. A series of analyses showed that trial-to-trial variability was largely orthogonal to activity patterns that encoded the stimulus and that these noise modes were related to the larvae's behavior.

      To identify the origins of behavioral variability, classifiers were fit to the neural data to predict whether the larvae turned left or right or did not respond. A set of neurons that were highly distributed across the brain could be used to classify and predict behavior. These neurons could also predict spontaneous behavior that was not induced by stimuli above chance levels. The work concludes with findings on the distributed nature of single-trial decision-making and behavioral variability.

      Strengths:

      The design of the new fLFM microscope is a significant advance in light-field and computational microscopy, and the open-source design and software are promising to bring this technology into the hands of many neuroscientists.

      The study addresses a series of important questions in systems neuroscience related to sensory coding, trial-to-trial variability in sensory responses, and trial-to-trial variability in behavior. The study combines microscopy, behavior, dynamics, and analysis and produces a well-integrated analysis of brain dynamics for visual processing and behavior. The analyses are generally thoughtful and of high quality. This study also produces many follow-up questions and opportunities, such as using the methods to look at individual brain regions more carefully, applying multiple stimuli, investigating finer tail movements and how these are encoded in the brain, and the connectivity that gives rise to the observed activity. Answering questions about variability in neural activity in the entire brain and its relationship to behavior is important to neuroscience and this study has done that to an interesting and rigorous degree.

      Points of improvement and weaknesses:

      The results on noise modes may be a bit less surprising than they are portrayed. The orthogonality between neural activity patterns encoding the sensory stimulus and the noise modes should be interpreted within the confounds of orthogonality in high-dimensional spaces. In higher dimensional spaces, it becomes more likely that two random vectors are almost orthogonal. Since the neural activity measurements performed in this study are quite high dimensional, a more explicit discussion is warranted about the small chance that the modes are not almost orthogonal.
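
      The high-dimensional orthogonality confound can be illustrated numerically. The following sketch draws pairs of independent Gaussian random vectors and reports the mean absolute cosine similarity at several dimensionalities; the dimensions, number of pairs, and seed are arbitrary choices for illustration and are unrelated to the dataset under review. For random directions the cosine has a standard deviation of roughly 1/sqrt(d), so it shrinks toward zero as the dimension d grows.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_abs_cosine(dim, n_pairs, rng):
    """Average |cosine| over n_pairs of independent Gaussian random vectors."""
    total = 0.0
    for _ in range(n_pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine_similarity(u, v))
    return total / n_pairs


rng = random.Random(0)  # fixed seed so the run is repeatable
results = {dim: mean_abs_cosine(dim, 100, rng) for dim in (3, 100, 2000)}

for dim, value in results.items():
    print(f"dim={dim:5d}  mean |cos| = {value:.3f}")
```

      One possible use of such a simulation is as a null distribution: an observed angle between the stimulus-coding and noise dimensions is only informative to the extent that it is closer to orthogonal than random vectors of matched dimensionality would be.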

      The conclusion that sparsely distributed sets of neurons produce behavioral variability needs more investigation because the way the results are shown could lead to some misinterpretations. The prediction of behavior from classifiers applied to neural activity is interesting, but the results are insufficiently presented for two reasons.

      (1) The neurons that contribute to the classifiers (Figures 4H and J) form a sufficient set of neurons that predict behavior, but this does not mean that neurons outside of that set cannot be used to predict behavior. Lasso regularization was used to create the classifiers and this induces sparsity. This means that if many neurons predict behavior but they do so similarly, the classifier may select only a few of them. This is not a problem in itself but it means that the distributions of neurons across the brain (Figures 4H and J) may appear sparser and more distributed than the full set of neurons that contribute to producing the behavior. This ought to be discussed better to avoid misinterpretation of the brain distribution results, and an alternative analysis that avoids the confound could help clarify.

      (2) The distribution of neurons is shown in an overly coarse manner in only a flattened brain seen from the top, and the brain is divided into four coarse regions (telencephalon, tectum, cerebellum, hindbrain). This makes it difficult to assess where the neurons are and whether those four coarse divisions are representative or whether the neurons are in other non-labeled deeper regions. For these two reasons, some of the statements about the distribution of neurons across the brain would benefit from a more thorough investigation.
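
      The sparsity concern in point (1) can be sketched with a toy example. The code below is a deliberate simplification: it uses lasso-penalized linear regression rather than the classifiers in the study, a hand-written cyclic coordinate descent solver, and two simulated "neurons" that carry exactly the same signal. All names and parameter values are hypothetical. Because the two predictors are interchangeable, the penalized solution is not unique, and the solver assigns the full weight to whichever predictor it updates first.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(columns, y, lam, n_iter=50):
    """Cyclic coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1."""
    n = len(y)
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual with predictor j's own contribution left out.
            partial = [
                y[i] - sum(beta[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(c * c for c in col) / n
            beta[j] = soft_threshold(rho, lam) / scale
    return beta


rng = random.Random(1)
n_trials = 200

# Two hypothetical neurons with identical activity, both equally predictive.
shared = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
neuron_a = list(shared)
neuron_b = list(shared)

# Simulated behavioral readout driven by the shared signal plus noise.
behavior = [2.0 * s + rng.gauss(0.0, 0.5) for s in shared]

weights = lasso_coordinate_descent([neuron_a, neuron_b], behavior, lam=0.1)
print(f"weight on neuron_a: {weights[0]:.3f}, weight on neuron_b: {weights[1]:.3f}")
```

      Here the second neuron receives essentially zero weight despite being exactly as informative as the first, so a map of nonzero weights would understate how many cells carry the signal. Alternatives that avoid this confound include ridge or elastic-net penalties, or reporting per-neuron decoding performance alongside the classifier weights.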

    1. Reviewer #1 (Public Review):

      Summary:

      This research group has consistently performed cutting-edge research aiming to understand the role of hormones in the control of social behaviors, specifically by utilizing the genetically tractable teleost fish, medaka, and the current work is no exception. The overall claim they make, that estrogens modulate social behaviors in males and females, is supported, with important caveats. For one, there is no evidence these estrogens are generated by "neurons" as would be assumed by their main claim that it is NEUROestrogens that drive this effect. While indeed the aromatase they have investigated is expressed solely in the brain, in most teleosts, brain aromatase is only present in glial cells (astrocytes, radial glia). The authors should change this description so as not to mislead the reader. Below I detail more specific strengths and weaknesses of this manuscript.

      Strengths:

      • Excellent use of the medaka model to disentangle the control of social behavior by sex steroid hormones.

      • The findings are strong for the most part because deficits in the mutants are restored by the molecule (estrogens) that was no longer present due to the mutation.

      • Presentation of the approach and findings are clear, allowing the reader to make their own inferences and compare them with the authors'.

      • Includes multiple follow-up experiments, which lead to tests of internal replication and an impactful mechanistic proposal.

      • Findings are provocative not just for teleost researchers, but for other species since, as the authors point out, the data suggest mechanisms of estrogenic control of social behaviors may be evolutionarily ancient.

      Weaknesses:

      • As stated in the summary, the authors attribute the estrogen source to neurons and there isn't evidence this is the case. The impact of the findings doesn't rest on this either.

      • The d4 versus d8 esr2a mutants showed different results for aggression. The meaning and implications of this finding are not discussed, leaving the reader wondering.

      • Lack of attribution of previously published work from other research groups that would provide the proper context of the present study.

      • There are a surprising number of citations not included; some of the ones not included argue against the authors' claims that their findings were "contrary to expectation".

      • The experimental design for studying aggression in males has flaws. A standard test like a resident-intruder test should be used.

      • While they investigate males and females, there are fewer experiments and explanations for the female results, making it feel like a small addition or an aside.

      • The statistics comparing "experimental to experimental" and "control to experimental" aren't appropriate.

    2. Reviewer #2 (Public Review):

      The novelty of this study stems from the observations that neuro-estrogens appear to interact with brain androgen receptors to support male-typical behaviors. The study provides a step forward in clarifying the somewhat contradictory findings that, in teleosts and unlike other vertebrates, androgens regulate male-typical behaviors without requiring aromatization, but at the same time estrogens appear to also be involved in regulating male-typical behaviors. The authors manipulate the expression of one aromatase isoform, cyp19a1b, that is purported to be brain-specific in teleosts. Their findings are important in that brain estrogen content is sensitive to the brain-specific cyp19a1b deficiency, leading to alterations in both sexual behavior and aggressive behavior. Interestingly, these males have relatively intact fertility rates, despite the effects on the brain.

      That said, the framing of the study, the relevant context, and several aspects of the methods and results raise concerns. Two interpretations need to be addressed/tempered:

      (1) that the rescue of cyp19a1b deficiency by tank-applied estradiol is not necessarily a brain/neuro-estrogen mode of action, and

      (2) the large increases in peripheral and brain androgen levels in the cyp19a1b deficient animals imply some indirect/compensatory effects of lifelong cyp19a1b deficiency.

    3. Reviewer #3 (Public Review):

      Summary:

      Taking advantage of the existence in fish of two genes coding for estrogen synthase, the enzyme aromatase, one mostly expressed in the brain (Cyp19a1b) and the other mostly found in the gonads (Cyp19a1a), this study investigates the role of neuro-estrogens in the control of sexual and aggressive behavior in teleost fish. The constitutive deletion of Cyp19a1b reduced brain estrogen content by 87% in males and about 50% in females. It led to reduced sexual and aggressive behavior in males and reduced sexual behavior in females. These effects are reversed by adult treatment with estradiol thus indicating that they are activational in nature. The deletion of Cyp19a1b is associated with a reduced expression of the genes coding for the two androgen receptors, ara, and arb, in brain regions involved in the regulation of social behavior. The analysis of the gene expression and behavior of mutants of estrogen receptors indicates that these effects are likely mediated by the activation of the esr1 and esr2a isoforms. These results provide valuable insight into the role of neuro-estrogens in social behavior in the most abundant vertebrate taxa. While estrogens are involved in the organization of the brain and behavior of some birds and rodents, neuro-estrogens appear to play an activational role in fish through a facilitatory action of androgen signaling.

      Strengths:

      - Evaluation of the role of brain "specific" Cyp19a1 in male teleost fish, which as a taxon are more abundant and yet proportionally less studied than the most common birds and rodents. Therefore, evaluating the generalizability of results from higher vertebrates is important. This approach also offers great potential to study the role of brain estrogen production in females, an understudied question in all taxa.

      - Results obtained from multiple mutant lines converge to show that estrogen signaling drives aspects of male sexual behavior.

      - The comparative discussion of the age-dependent abundance of brain aromatase in fish vs mammals and its role in organization vs activation is important beyond the study of the targeted species.

      Weaknesses:

      - The new transgenic lines are under-characterized. There is no evaluation of the mRNA and protein products of Cyp19a1b and ESR2a.

      - The stereotypic sequence of sexual behavior is poorly described, in particular, the part played by the two sexual partners, such that the conclusions are not easily understandable, notably with regard to the distinction between motivation and performance. The behavior of females is only assessed from the perspective of the male, which raises questions about the interpretation of the reduced behavior of the males. At no point do the authors seem to consider that a reduced behavior of one sex could result from a reduced sensory perception from this sex or a reduced attractivity or sensory communication from the other sex.

      - Aspects of the methods are not detailed enough to allow proper evaluation of their quality or replication of the data.

      - It seems very dangerous to use the response to a mutant abnormal behavior (ESR2-KO females) as a test, given that it is not clear what is the cause of the disrupted behavior.

      - Most experiments are weakly powered (low sample size) and analyzed by multiple t-tests, whereas a two-way ANOVA could have been used in several instances. There is no mention of t or F values, or of degrees of freedom.

      - The variability of the mRNA content for the same target gene between experiments (genotype comparison vs E2 treatment comparison) raises questions about the reproducibility of the data (apparent disappearance of genotype effect).

      - The discussion confuses the effects of estrogens on sexual differentiation (developmental programming = permanent) and activation (= reversible activation of brain circuits in adulthood) of the brain and behavior. Whether sex differences in the circuits underlying social behaviors exist is not clear.
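      To make the statistical point above concrete, the following is a minimal sketch of a balanced two-way ANOVA that reports the F values and degrees of freedom the reviewer asks for. It is not the authors' analysis: the factor names (genotype, treatment) and the numbers are made-up toy values used only to show the computation, and a balanced design with equal cell sizes is assumed.

```python
def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (level_of_A, level_of_B) -> list of observations,
    with the same number of observations in every cell.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    assert all(len(v) == n for v in cells.values()), "balanced design required"

    def mean(xs):
        return sum(xs) / len(xs)

    all_obs = [x for v in cells.values() for x in v]
    grand = mean(all_obs)
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}
    cell_mean = {key: mean(v) for key, v in cells.items()}

    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AxB": n * sum(
            (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
            for a in a_levels for b in b_levels
        ),
        "error": sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v),
        "total": sum((x - grand) ** 2 for x in all_obs),
    }
    df = {
        "A": len(a_levels) - 1,
        "B": len(b_levels) - 1,
        "AxB": (len(a_levels) - 1) * (len(b_levels) - 1),
        "error": len(a_levels) * len(b_levels) * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    f_values = {term: (ss[term] / df[term]) / ms_error for term in ("A", "B", "AxB")}
    return ss, df, f_values


# Toy data (assumption): factor A = genotype, factor B = treatment, 4 animals per cell
toy = {
    ("wt", "vehicle"): [10, 12, 11, 13],
    ("wt", "E2"): [11, 12, 10, 13],
    ("mutant", "vehicle"): [4, 5, 6, 5],
    ("mutant", "E2"): [9, 11, 10, 12],
}
ss, df, f_values = two_way_anova(toy)
for term in ("A", "B", "AxB"):
    print(f"{term}: F({df[term]}, {df['error']}) = {f_values[term]:.2f}")
```

      A single model of this kind tests the genotype effect, the treatment effect, and their interaction together, instead of running a separate t-test for each pair of groups. Converting the F values to p-values would require an F distribution from a statistics library, which is omitted here.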

      Conclusions:

      Overall, the claims regarding the activational role of neuro-estrogens on male sexual behavior are supported by converging evidence from multiple mutant lines. The role of neuroestrogens on gene expression in the brain is mostly solid too. The data for females are comparatively weaker. Conclusions regarding sexual differentiation should be considered carefully.

    1. eLife assessment

      This is a potentially valuable study suggesting that neuronal-specific loss of function of the RNA splicing factor Ptbp1 in striatal neurons induces dopaminergic markers and alleviates motor defects in a 6-hydroxydopamine (6-OHDA) mouse model of Parkinson's Disease. If properly replicated, the claims of the manuscript are remarkable and identify a straightforward mechanism with therapeutic relevance for the treatment of motor deficits in Parkinson's Disease. However, while the rescue of motor deficits with Ptbp1 manipulation is solid, the strength of the evidence supporting the induction of a dopaminergic neuronal identity is incomplete. The study nevertheless addresses recent controversial literature on cell reprogramming in Parkinson's Disease and will be of interest to researchers with a focus on the application of gene therapy to rescue neurodegeneration.

    2. Reviewer #1 (Public Review):

      Summary:

      Recent years have seen spectacular and controversial claims that loss of function of the RNA splicing factor Ptbp1 can efficiently reprogram astrocytes into functional neurons that can rescue motor defects seen in 6-hydroxydopamine (6-OHDA)-induced mouse models of Parkinson's disease (PD). This latest study is one of a series that fails to reproduce these observations, but remarkably also reports that neuronal-specific loss of function of Ptbp1 both induces expression of dopaminergic neuronal markers in striatal neurons and rescues motor defects seen in 6-OHDA-treated mice. The claims, if replicated, are remarkable and identify a straightforward and potentially translationally relevant mechanism for treating motor defects seen in PD models. However, while the reported behavioral effects are strong and were collected without sample exclusion, other claims made here are less convincing. In particular, no evidence that Ptbp1 loss of function actually occurs in striatal neurons is provided, and the immunostaining data used to claim that dopaminergic markers are induced in striatal neurons is not convincing. Furthermore, no characterization of the molecular identity of Ptbp1-deficient striatal neurons is provided using single-cell RNA-Seq or spatial transcriptomics, making it difficult to conclude that these cells are indeed adopting a dopaminergic phenotype.

      Overall, while the claims of behavioral rescue of 6-OHDA-treated mice appear compelling, it is essential that these be independently replicated as soon as possible before further studies on this topic are carried out. Insights into the molecular mechanisms by which neuronal-specific loss of function of Ptbp1 induces behavioral rescue are lacking, however. Moreover, the claims of induction of neuronal identity in striatal neurons by Ptbp1 require considerable additional work to be convincing.

      Strengths of the study:

      (1) The effect size of the behavioral rescue in the stepping and cylinder tests is strong and significant, essentially restoring 6-OHDA-lesioned mice to control levels.

      (2) Since the neurotoxic effects of 6-OHDA treatment are highly variable, the fact that all behavioral data was collected blinded and that no samples were excluded from analysis increases confidence in the accuracy of the results reported here.

      Weaknesses of the study:

      (1) Neurons express relatively little Ptbp1. Indeed, cellular expression levels as measured by scRNA-Seq are substantially below those of astrocytes and other non-neuronal cell types, and Ptbp1 immunoreactivity has not been observed in either striatal or midbrain neurons (e.g. Hoang, et al. Nature 2023). This raises the question of whether any recovery of Th expression is indeed mediated by the loss of function of Ptbp1 rather than by off-target effects. AAV-mediated rescue of Ptbp1 expression could help clarify this.

      (2) It is not clear why dopaminergic neurons, which are not normally found in the striatum, are observed following Ptbp1 knockout. This is very similar to the now-debunked claims made in Zhou, et al. Cell 2020, but here performed using the hSyn rather than GFAP mini promoter to control AAV expression. While this is the most dramatic and potentially translationally relevant claim of the study, this claim is extremely surprising and lacks any clear mechanistic explanation for why it might happen in the first place. This observation is even more surprising in light of reports that antisense oligonucleotide-mediated knockdown of Ptbp1, which should have affected both neuronal and glial Ptbp1 expression, failed to induce expression of dopaminergic neuronal markers in the striatum (Chen, et al. eLife 2022). Selective loss of function of Ptbp1 in striatal and midbrain astrocytes likewise results in only modest changes in gene expression. It is critically important that this claim be independently replicated, and that additional data be provided to conclusively show that striatal neurons are indeed expressing dopaminergic markers.

      (3) More generally, since multiple spectacular and irreproducible claims of single-step glial-to-neuron reprogramming have appeared in high-profile journals in recent years, a consensus has emerged that it is essential to comprehensively characterize the identity of "transformed" cells using either single-cell RNA-Seq or spatial transcriptomics (e.g. Qian, et al. FEBS J 2021; Wang and Zhang, Dev Neurobiol 2022). These concerns apply equally to claims of neuronal subtype conversion such as those advanced here, and it is essential to provide these same datasets.

      (4) Low-power images are generally lacking for immunohistochemical data shown in Figures 3 and 4, which makes interpretation difficult. DAPI images in Figure 3C do not appear nuclear. Immunostaining for Th, DAT, and Dcx in Figure 4 shows a high background and is difficult to interpret.

      (5) Insights into the mechanism by which neuronal-specific loss of Ptbp1 function induces either functional recovery or dopaminergic markers in striatal neurons are lacking.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bock and colleagues describes the generation of an AAV-delivered adenine base editing strategy to knockdown PTBP1 and the behavioral and neurorestorative effects of specifically knocking down striatal or nigral PTBP1 in astrocytes or neurons in a mouse model of Parkinson's disease. The authors found that knocking down PTBP1 in neurons, but not astrocytes, and in striatum, but not nigra, results in the phenotypic reorganization of neurons to TH+ cells sufficient to rescue motor phenotypes, though insufficient to normalize responses to dopaminomimetic drugs.

      Strengths:

      The manuscript is generally well-written and adds to the growing literature challenging previous findings by Qian et al., 2020 and Zhou et al., 2020 indicating that astrocytic downregulation of PTBP1 can induce conversion to dopaminergic neurons in the midbrain and improve parkinsonian symptoms. The base editing approach is interesting and potentially more therapeutically relevant than previous approaches.

      Weaknesses:

      The manuscript has several weaknesses in approach and interpretation. In terms of approach, the animal model utilized, the 6-OHDA model, though useful to examine dopaminergic cell loss, exhibits accelerated neurodegeneration and none of the typical pathological hallmarks (synucleinopathy, Lewy bodies, etc.) compared to the typical etiology of Parkinson's disease, limiting its translational interpretation. In addition, there is no confirmation of a neuronal or astrocytic knockdown of PTBP1 in vivo; all base editing validation experiments were completed in cell lines. Finally, it is unclear why the base editing approach was used to induce loss-of-function rather than a cell-type specific knockout, if the goal is to assess the effects of PTBP1 loss in specific neurons. In terms of interpretation, the conclusion by the authors that PTBP1 knockdown has little likelihood to be therapeutically relevant seems overstated, particularly since they did observe a beneficial effect on motor behavior. We know that in PD, patients often display negligible symptoms until 50-70% of dopaminergic input to the striatum is lost, due to compensatory activity of remaining dopaminergic cells. Presumably, a small recovery of dopaminergic neurons would have an outsized effect on motor ability and may improve the efficacy of dopaminergic drugs, particularly levodopa, at lower doses, averting many problematic side effects. Since striatal dopamine was assessed by whole-tissue analysis, which is not necessarily reflective of synaptic dopamine availability, it is difficult to assess whether the ~10% increase in TH+ cells in the striatum was sufficient to improve dopamine function. However, the improvement in motor activity suggests that it was.

    4. Reviewer #3 (Public Review):

      This study explores the use of an adenine base editing strategy to knock down PTBP1 in astrocytes and neurons of a Parkinson's disease mouse model, as a potential AAV-BE therapy. The results indicate that editing Ptbp1 in neurons, but not astrocytes, leads to the formation of tyrosine hydroxylase (TH)+ cells, rescuing some motor symptoms.

      Several aspects of the manuscript stand out positively. Firstly, the clarity of the presentation. The authors communicate their ideas and findings in a clear and understandable manner, making it easier for readers to follow.

      The Materials and methods section is well-elaborated, providing sufficient detail for reproducibility.

      The logical flow of the manuscript makes sense, with each section building upon the previous one coherently.

      The ABE strategy employed by the authors appears sound, and the manuscript presents a coherent and well-supported argument.

      Positively, some of the data in this study effectively counteracts previous work in line with more recent publications, demonstrating the authors' ability to contribute to the ongoing conversation in the field.

      However, while the in vitro data yields promising results, it may have been overly optimistic to assume that the efficiencies observed in dividing cells will directly translate to in vivo conditions. This consideration is important given the added complexities of vector optimization, different cell types targeted in vitro versus in vivo, as well as unknown intrinsic limitations of the base editing technology.

      In addition, certain aspects of the manuscript would benefit from a more in-depth and comprehensive discussion rather than being only briefly touched upon. Such a discussion would enhance the relevance of the obtained results and provide the foundation for improvement when using similar approaches.

    1. eLife assessment

      This valuable study provides insights and strategies for assessing laminar structure in vivo in the visual cortex of the macaque monkey with high-density linear electrode arrays. The paper provides solid evidence demonstrating that signals in higher frequency bands, related to the discharge of action potentials, are of substantially better use for achieving well-resolved cortical layer identification than are signals in lower frequency bands typically associated with local field potentials and standard-practice Current Source Density (CSD) analyses. These findings are of interest to electrophysiologists seeking to make comparisons between cortical layers.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Zhang et al. present an electrophysiological method for identifying the layers of macaque visual cortex with a high-density Neuropixels 1.0 probe. They found several electrophysiological signal profiles suitable for high-resolution laminar discrimination and described a set of signal metrics for fine cortical layer identification.

      Strengths:

      There are two major strengths. One is the use of high-density electrodes. The Neuropixels 1.0 probe has electrodes at 20 um spacing, which can provide high resolution for cortical laminar identification. The second strength is the analysis. They found multiple electrophysiological signal profiles that can be used for laminar discrimination. Using this new method, they could identify the thinnest layer in macaque V1. The data support their conclusion.

      Weaknesses:

      While this electrophysiological strategy is much easier to perform than histological staining, even in awake animals, it provides only an indirect estimate of cortical layers. A parallel histological study could provide a direct match between the electrode signal features and cortical laminar locations. However, there are technical challenges: for example, distortions in both electrode penetration and tissue preparation may prevent a precise match between electrode locations and cortical layers. In this case, additional microwire electrodes bound to the Neuropixels probe could be used to inject current and mark the locations of different depths in the cortical tissue after recording.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper documents an attempt to accurately determine the locations and boundaries of the anatomically and functionally defined layers in macaque primary visual cortex using voltage signals recorded from a high-density electrode array that spans the full depth of cortex with contacts at 20 um spacing. First, the authors attempt to use current source density (CSD) analysis to determine layer locations, but they report a striking failure because the results vary greatly from one electrode penetration to the next and because the spatial resolution of the underlying local field potential (LFP) signal is coarse compared to the electrical contact spacing. The authors thus turn to examining higher frequency signals related to action potentials and provide evidence that these signals reflect changes in neuronal size and packing density, response latency and visual selectivity.

      Strengths:

      There is a lot of nice data to look at in this paper that shows interesting quantities as a function of depth in V1. Bringing all of these together offers the reader a rich data set: CSD, action potential shape, response power and coherence spectrum, and post-stimulus time response traces. Furthermore, data are displayed as a function of eye (dominant or non-dominant) and for achromatic and cone-isolating stimuli.

      This paper takes a strong stand in pointing out weaknesses in the ability of CSD analysis to make consistent determinations about cortical layering in V1. Many researchers have found CSD to be problematic, and the observations here may be important to motivate other researchers to carry out rigorous comparisons and publish their results, even if they reflect negatively on the value of CSD analysis.

      The paper provides a thoughtful, practical and comprehensive recipe for assigning traditional cortical layers based on easily-computed metrics from electrophysiological recordings in V1, and this is likely to be useful for electrophysiologists who are now more frequently using high-density electrode arrays.

      Weaknesses:

      Much effort is spent pointing out features that are well known, for example, the latency difference associated with different retinogeniculate pathways, the activity level differences associated with input layers, and the action potential shape differences associated with white vs. gray matter. These have been used for decades as indicators of depth and location of recordings in visual cortex as electrodes were carefully advanced. High density electrodes allow this type of data to now be collected in parallel, but at discrete, regular sampling points. Rather than showing examples of what is already accepted, the emphasis should be placed on developing a rigorous analysis of how variable vs. reproducible are quantitative metrics of these features across penetrations, as a function of distance or functional domain, and from animal to animal. Ultimately, a more quantitative approach to the question of consistency is needed to assess the value of the methods proposed here.

      Another important piece of information for assessing the ability to determine layers from spiking activity is to carry out post-mortem histological processing so that the layer determination made in this paper could be compared to anatomical layering.

      On line 162, the text states that there is a clear lack of consistency across penetrations, but why should there be consistency: how far apart in the cortex were the penetrations? How long were the electrodes allowed to settle before recording, how much damage was done to tissue during insertion? Do you have data taken over time - how consistent is the pattern across several hours, and how long was the time between the collection of the penetrations shown here?

      The impact of the paper is lessened because it emphasizes consistency but not in a consistent manner. Some demonstrations of consistency are shown for CSDs, but not quantified. Figure 4A is used to make a point about consistency in cell density, but across animals, whereas the previous text was pointing out inconsistency across penetrations. What if you took a 40 or 60 um column of tissue and computed cell density, then you would be comparing consistency across potentially similar scales. Overall, it is not clear how all of these different metrics compare quantitatively to each other in terms of consistency.

      In many places, the text makes assertions that A is a consistent indicator of B, but then there appear to be clear counterexamples in the data shown in the figures. There is some sense that the reasoning is relying too much on examples, and not enough on statistical quantities.
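      One simple way to put a number on the consistency discussed above is the mean pairwise Pearson correlation between depth profiles from different penetrations. The sketch below is only an illustration of that idea, not a method from the paper: the function names are hypothetical, the profiles are toy values, and it assumes the profiles have already been aligned to a common depth axis.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation between two equally long depth profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean_pairwise_correlation(profiles):
    """Average Pearson correlation over all pairs of penetrations."""
    pairs = list(combinations(profiles, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy profiles (assumption): one metric sampled at four depths in three penetrations
same_shape = [[1, 2, 3, 4], [2, 4, 6, 8], [0, 1, 2, 3]]
print(mean_pairwise_correlation(same_shape))
```

      Computing the same summary for each metric (CSD, unit density, spectral power, and so on) would allow their consistency to be compared on one scale, which is the kind of quantitative comparison the reviewer requests.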

      Overall

      Overall, this paper makes a solid argument in favor of using action potentials and stimulus driven responses, instead of CSD measurements, to assign cortical layers to electrode contacts in V1. It is nice to look at the data in this paper and to read the authors' highly educated interpretation and speculation about how useful such measurements will be in general to make layer assignments. It is easy to agree with much of what they say, and to hope that in the future there will be reliable, quantitative methods to make meaningful segmentations of neurons in terms of their differentiated roles in cortical computation. How much this will end up corresponding to the canonical layer numbering that has been used for many decades now remains unclear.

    4. Reviewer #3 (Public Review):

      Summary:

      Zhang et al. explored strategies for aligning electrophysiological recordings from high-density laminar electrode arrays (Neuropixels) with the pattern of lamination across cortical depth in macaque primary visual cortex (V1), with the goal of improving the spatial resolution of layer identification based on electrophysiological signals alone. The authors compare the current commonly used standard in the field - current source density (CSD) analysis - with a new set of measures largely derived from action potential (AP) frequency band signals. Individual AP band measures provide distinct cues about different landmarks or potential laminar boundaries, and together they are used to subdivide the spatial extent of array recordings into discrete layers, including the very thin layer 4A, a level of resolution unavailable when relying on CSD analysis alone for laminar identification. The authors compare the widths of the resulting subdivisions with previously reported anatomical measurements as evidence that layers have been accurately identified. This is a bit circular, given that they also use these anatomical measurements as guidelines limiting the boundary assignments; however, the strategy is overall sensible and the electrophysiological signatures used to identify layers are generally convincing. Furthermore, by varying the pattern of visual stimulation to target chromatically sensitive inputs known to be partially segregated by layer in V1, they show localized response patterns that lend confidence to their identification of particular sublayers.

      The authors compellingly demonstrate the insufficiency of CSD analysis for precisely identifying fine laminar structure, and in some cases its limited accuracy at identifying coarse structure. CSD analysis produced inconsistent results across array penetrations and across visual stimulus conditions and was not improved in spatial resolution by sampling at high density with Neuropixels probes. Instead, in order to generate a typical, informative pattern of current sources and sinks across layers, the LFP signals from the Neuropixels arrays required spatial smoothing or subsampling to approximately match the coarser (50-100 µm) spacing of other laminar arrays. Even with smoothing, the resulting CSDs in some cases predicted laminar boundaries that were inconsistent with boundaries estimated using other measures and/or unlikely given the typical sizes of individual layers in macaque V1. This point alone provides an important insight for others seeking to link their own laminar array recordings to cortical layers.
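      As background to the spacing issue described above, the standard one-dimensional CSD estimate is the negative second spatial difference of the LFP, scaled by tissue conductivity. The minimal sketch below suggests one plausible reason why a coarser effective spacing can help: the 1/h² factor amplifies contact-to-contact noise. It is not the authors' pipeline. The conductivity value is a placeholder scale factor, the units are arbitrary, and the LFP profile and noise are synthetic assumptions.

```python
def csd_profile(lfp, contact_spacing, step=1, conductivity=0.3):
    """Second-difference CSD estimate along a linear probe.

    lfp: potentials at equally spaced contacts.
    step: number of contacts between the points used for the difference,
          so the effective spacing is step * contact_spacing.
    conductivity: placeholder scale factor (assumption).
    """
    h = step * contact_spacing
    return [
        -conductivity * (lfp[i + step] - 2 * lfp[i] + lfp[i - step]) / h ** 2
        for i in range(step, len(lfp) - step)
    ]


spacing = 0.02   # 20 um expressed in mm
sigma = 0.3
n_contacts = 41

# Synthetic smooth profile V(z) = z^2, whose true CSD is the constant -2 * sigma
depths = [i * spacing for i in range(n_contacts)]
smooth = [z ** 2 for z in depths]
# Small noise that alternates in sign from one contact to the next
noisy = [v + 1e-3 * (-1) ** i for i, v in enumerate(smooth)]

fine = csd_profile(noisy, spacing, step=1, conductivity=sigma)    # 20 um effective spacing
coarse = csd_profile(noisy, spacing, step=5, conductivity=sigma)  # 100 um effective spacing

err_fine = max(abs(c + 2 * sigma) for c in fine)
err_coarse = max(abs(c + 2 * sigma) for c in coarse)
print("noise-induced CSD error, fine vs coarse spacing:", err_fine, err_coarse)
```

      With this particular alternating noise, the error at the 100 um effective spacing is 25 times smaller than at 20 um, at the cost of spatial resolution. Real recordings have other noise structure, so the factor here illustrates the scaling only and is not a prediction for the data in the paper.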

      They next offer a set of measures based on analysis of AP band signals. These measures include analyses of the density, average signal spread, and spike waveforms of single- and multi-units identified through spike sorting, as well as analyses of AP band power spectra and local coherence profiles across recording depth. The power spectrum measures in particular yield compact peaks at particular depths, albeit with some variation across penetrations, whereas the waveform measures most convincingly identified the layer 6-white matter transition. In general, some of the new measures yield inconsistent patterns across penetrations, and some of the authors' explanations of these analyses draw intriguing but rather speculative connections to properties of anatomy and/or responsivity. However, taken as a group, the set of AP band analyses appear sufficient to determine the layer 6-white matter transition with precision and to delineate intermediate transition points likely to correspond to actual layer boundaries.

      Strengths:

      The authors convincingly demonstrate the potential to resolve putative laminar boundaries using only electrophysiological recordings from Neuropixels arrays. This is particularly useful given that histological information is often unavailable for chronic recordings. They make a clear case that CSD analysis is insufficient to resolve the lamination pattern with the desired precision and offer a thoughtful set of alternative analyses, along with an order in which to consider multiple cues in order to facilitate others' adoption of the strategy. The widths of the resulting layers bear a sensible resemblance to the expected widths identified by prior anatomical measurements, and at least in some cases there are satisfying signatures of chromatic visual sensitivity and latency differences across layers that are predicted by the known connectivity of the corresponding layers. Thus, the proposed analytical toolkit appears to work well for macaque V1 and has strong potential to generalize to use in other cortical regions, though area-targeted selection of stimuli may be required.

      Weaknesses:

      The waveform measures, and in particular the unit density distribution, are likely to be sensitive to the criteria used for spike sorting, which differ widely among experimenters/groups, and this may limit the usefulness of this particular measure for others in the community. The analysis of detected unit density yields fluctuations across cortical depth which the authors attribute to variations in neural density across layers; however, these patterns seemed particularly variable across penetrations and did not consistently yield peaks at depths that should have high neuronal density, such as layer 2. Therefore, this measure has limited interpretability.

      More generally, although the sizes of identified layers comport with typical sizes identified anatomically, a more powerful confirmation would be a direct per-penetration comparison with histologically identified boundaries. Ultimately, the absence of this type of independent confirmation limits the strength of their claim that veridical laminar boundaries can be identified from electrophysiological signals alone.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Hoops et al. showed that Netrin-1 and UNC5c can guide dopaminergic innervation from nucleus accumbens to cortex during adolescence in rodent models. 

      We showed this with respect to Netrin-1 only. With respect to UNC5c, we showed that the timing of its expression suggests that it may be involved, but did not conduct the UNC5c manipulation experiments necessary to prove it. We state this clearly in the manuscript.

      They found that these dopamine axons project to the prefrontal cortex in a Netrin-1 dependent manner and knocking down Netrin-1 disrupted motor and learning behaviors in mice. 

      We would like to clarify that we did not show that learning or motor behaviors are affected. We showed that inhibitory control, measured in the Go/No-Go task, is altered in adulthood.

      Furthermore, the authors used hamsters, a seasonal model that is affected by the length of daylight, to demonstrate that the guidance of dopamine axons is mediated by environmental factors such as daytime length and in a sex-dependent manner.

      We agree with this characterization of our hamster experiments, but want to emphasize that it is the timing of the adolescent dopamine axon input to the prefrontal cortex that is impacted by daytime length in a sex-dependent manner.

      Regarding the cell type specificity of Netrin-1 expression, the authors began by stating "this question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where in the forebrain regions we examined Netrin-1+ cells are present." This statement contradicts the exact issue regarding the specificity issue I raised.

      We are not sure why the identities of the cell types expressing Netrin-1 are at issue. As a secreted protein, Netrin-1 can be attached to the extracellular cell surface or in the extracellular matrix, where it interacts with its receptors, which are embedded in the cell surfaces of growing axons (Finci et al., 2015; Rajasekharan & Kennedy, 2009). Netrin-1 is expressed by a wide variety of cell types, for example it is expressed in medium spiny neurons in the striatum of rodents as well as in cholinergic neurons (Shatzmiller et al., 2008). However, we cannot see why showing exactly what type(s) of cells have Netrin-1 on their surfaces, or have secreted them into the matrix, would be at issue for our study.

      They then went on to show the RNAscope data for Netrin-1 in Figure 2, which showed Netrin-1 mRNA was actually expressed quite ubiquitously in anterior cingulate cortex, dorsopeduncular cortex, infralimbic cortex, prelimbic cortex, etc. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      We agree that Netrin-1 mRNA is present throughout the forebrain. In particular, its presence in the regions mentioned by Reviewer #1 is a key component of our theory for how dopamine axons grow to the prefrontal cortex in adolescence.

      In addition, contrary to the authors' statement that Netrin-1 is a "secreted protein", the confocal images in Figure 1 in the rebuttal letter actually show Netrin-1 present in "granule-like" organelles inside the cytoplasm of neurons. 

      The rebuttal letter’s Figure 1 is not sufficient to determine the subcellular location of Netrin-1; however, we agree that it is likely that Netrin-1 is present in the cytoplasm of neurons. Indeed, its presence in vesicles in the cytoplasm is to be expected, as this is a common mechanism for cells to secrete proteins into the extracellular space (Glasgow et al., 2018). We are not sure whether Reviewer #1’s “granule-like” organelles are in fact secretory vesicles, and we do not think our immunohistochemical images are an appropriate method by which to answer this kind of question. In any case, a detailed characterization of the subcellular distribution of Netrin-1 is beyond the scope of our study.

      That Netrin-1 is a secreted protein is well-established in the literature (for example, see Glasgow et al., 2018). The confocal images we provide suggest, but do not prove, that it is likely Netrin-1 is present both extracellularly and intracellularly, which is entirely consistent with its synthesis, secretion, and function. It is also consistent with our methodology and findings. 

      Finally, the authors presented Figure 7 to indicate the location where virus expressing Netrin-1 shRNA might be located. Again, the brain region targeted was quite focal and most likely did not cover all the Netrin-1+ brain regions in Figure 2. 

      Figure 2 - this is referring to Author response image 2 of our first response to reviewers.

      Figure 7 - this is referring to Author response image 4 of our first response to reviewers.

      We agree with Reviewer #1’s characterization of our experiment. We intended to interrupt the Netrin-1 pathway to the prefrontal cortex, like removing a bridge along a road. The Netrin-1 signal remained intact along the dopamine axon’s route before and after the location of the viral injection, but it was lost at the site of the virus injection. This is like a road remaining intact on either side of a destroyed bridge, but becoming impassable at the location where the bridge was destroyed. We are glad that Reviewer #1 agrees our experimental design achieved the desired outcome (a focal reduction in Netrin-1 expression).

      Collectively, these results raised more questions regarding the specificity of Netrin-1 expression in brain regions that are behaviorally relevant to this study.

      We do not agree with this assessment. Our manipulation of Netrin-1 expression was highly localized and specific, as Reviewer #1 seems to acknowledge. We are not clear on what questions this might raise that would call into question our findings as described in our manuscript. We have now added the following paragraph to our manuscript:  

      “It remains unknown exactly what types of cells are expressing Netrin-1 along the dopamine axon route, and how this expression is regulated to produce the Netrin-1 gradients that guide the dopamine axons. It also remains unclear where the misrouted axons end up in adulthood. Future experiments aimed at addressing these questions will provide further valuable insight into the nature of the “Netrin-1 pathway”. Nonetheless, our results allow us to conclude that Netrin-1 expressing cells “pave the way” for dopamine axons growing to the medial prefrontal cortex.”

      With respect to the effectiveness of Netrin-1 knockdown in the animals in this study, the authors cited data in HEK293 cells (Cuesta et al., 2020. Figure 2a), which did not include any statistics, and previously published in vivo data in a separate, independent study (Cuesta et al., 2020. Figure 2c). They do not provide any data regarding the effectiveness of Netrin-1 knockdown in THIS study.

      Indeed, we understand the concerns of Reviewer #1 here. This issue was discussed at the time all the experiments (both in the current manuscript and in Cuesta et al. (2020)) were conducted, and we decided that it was sufficient to show the virus was capable of knocking down Netrin-1 in vitro and in vivo in the forebrain. These characterization experiments were published in the first manuscript to present results using the virus, which was Cuesta et al. (2020). However, all experiments from both manuscripts were conducted contemporaneously.

      We do not see how repeating the same characterization experiments again is useful. 

      Similar concerns regarding UNC5C knockdown (points #6, #7, and #8) were not adequately addressed.

      There is no UNC5c knockdown in this manuscript. Furthermore, points #6, #7 and #8 do not deal with UNC5c knockdown. Point #6 is regarding the Netrin-1 virus efficacy, which we discuss above. Points #7 and #8 are requesting numerous additional experiments that we feel are worthy of their own manuscripts, and we do not feel that they call into question the findings we present here. Rather, answering points #7 and #8 would further refine our understanding of how dopamine axons grow to the prefrontal cortex beyond our current manuscript.

      In brief, while this study provides a potential role of Netrin-1-UNC5C in target innervation of dopaminergic neurons and its behavioral output in risk-taking, the data lack sufficient evidence to firmly establish the cause-effect relationship.

      We do not claim a cause-effect relationship here or anywhere in the manuscript. Concrete establishment of a cause-effect relationship will require several more manuscripts worth of experiments.

      Reviewer #2 (Public Review):

      In this manuscript, Hoops et al., using two different model systems, identified key developmental changes in Netrin-1 and UNC5C signaling that correspond to behavioral changes and are sensitive to environmental factors that affect the timing of development. They found that Netrin-1 expression is highest in regions of the striatum and cortex where TH+ axons are travelling, and that knocking down Netrin-1 reduces TH+ varicosities in mPFC and reduces impulsive behaviors in a Go-No-Go test. 

      We want to point out that we examined the Netrin-1 expression in the septum rather than the striatum but otherwise feel the above description is accurate.

      Further, they show that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters, environmental effects on development are also sexually dimorphic. This study addresses an important question using approaches that link molecular, circuit and behavioral changes. Understanding developmental trajectories of adolescence, and how they can be impacted by environmental factors, is an understudied area of neuroscience that is highly relevant to understanding the onset of mental health disorders. I appreciated the inclusion of replication cohorts within the study.

      We appreciate Reviewer #2’s comments, which we feel accurately describe our experimental approach and findings, including their limitations.

      Reviewer #3 (Public Review):

      This study from the Flores group aims at understanding neuronal circuit changes during adolescence which is an ill-defined, transitional period involving dramatic changes in behavior and anatomy. They focus on DA innervation of the prefrontal cortex, and their interaction with the guidance cue Netrin-1. They propose DA axons in the PFC increase in the postnatal period, and their density is reduced in a Netrin 1 knockdown, suggesting that Netrin abets the development of this mesocortical pathway.

      We feel it necessary to point out that we are not the first to propose that dopamine axons in the prefrontal cortex increase in the postnatal period. This is well-established and was first documented in rodents in the 1980s (Kalsbeek et al., 1988). Otherwise we agree with Reviewer #3’s characterization.

      In such mice impulsivity gauged by a go-no go task is reduced. They then provide some evidence that Unc5c is developmentally regulated in DA axons. Finally they use an interesting hamster model, to study the effect of light hours on mesocortical innervation, and make some interesting observations about the timing of innervation and Unc5c expression, and the fact that females housed in winter day length conditions display an accelerated innervation of the prefrontal cortex.

      We agree with Reviewer #3’s characterization of our study and findings here.

      Comments on the revision. Several points were addressed; some remain to be addressed.

      (4) It's not clear to me that TH doesn't stain noradrenergic axons in the PFC. See Islam and Blaess, 2021, and references therein.

      Presuming that Reviewer #3 is referring to Islam et al. (2021), the review they cite supports our position that TH-stained axons in the forebrain are by-and-large dopamine axons.

      Nonetheless, Islam et al. do point out that it is important to keep in mind that TH-positive axons have a slight possibility of being noradrenaline axons. We are very conscious of this possibility and are careful to minimize this risk. As we state in the methods, we only examine axons that are morphologically consistent with dopamine axons and are localized to areas within the forebrain where dopamine axons are known to innervate, in addition to being TH-positive. The localization and morphology of noradrenaline axons in the forebrain is different from that of dopamine axons. This is stated in our methods on lines 76-94, where we describe in detail the differentiation between dopamine and norepinephrine axons and include a full list of relevant citations.

      (6) The Netrin knockdown data provided is from a previous study/samples.

      Indeed, however the experiments for the two manuscripts were conducted contemporaneously. We believe two sets of validation experiments are not required.

      (8) While the authors make the argument that the behavior is linked to DA, they still haven't formally tested it, in my opinion.

      We agree that we have not formally tested this link. However, we disagree that we claim to have established a formal link in our manuscript.

      (13) Fig 3, Unc5c levels are not yet quantified. Furthermore, I agree with the previous reviewer that Unc5C knockdown would corroborate key aspects of the model.

      We present UNC5c quantities for mice in our first response to reviewers (Figure 11 therein); however, we did not do so for the hamsters due to the time involved. We are planning further experiments with the hamsters and may include quantification of UNC5c in the nucleus accumbens at such time. However, we do not feel its absence from this manuscript calls into question our findings.

      With regards to the UNC5c knockdown, we agree it would be an informative extension of our findings here, but again we do not feel that it is necessary to corroborate our current findings.

      New - Developmental trajectory of prefrontal TH-positive axons from early adolescence to adulthood is similar in male and female rats (Willing et al., 2017). This needs discussion.

      Willing et al. (2017) reported an increase in prefrontal dopamine density during adolescence in male and female rats, with a non-significant trend towards an earlier increase in females.

      This is in line with our current results in mice indicating that the timing of dopamine axon targeting and growth is sex specific. We are currently testing this idea directly using intersectional viral tracing methods. We have now added the following sentence to the manuscript:

      “Differences in the precise timing of dopamine innervation to the PFC in adolescence have been suggested by findings reported in male and female rats (Willing et al., 2017)”.

      References

      Brignani, S., Raj, D. D. A., Schmidt, E. R. E., Düdükcü, Ö., Adolfs, Y., Ruiter, A. A. D., Rybiczka-Tesulov, M., Verhagen, M. G., Meer, C. van der, Broekhoven, M. H., Moreno-Bravo, J. A., Grossouw, L. M., Dumontier, E., Cloutier, J.-F., Chédotal, A., & Pasterkamp, R. J. (2020). Remotely Produced and Axon-Derived Netrin-1 Instructs GABAergic Neuron Migration and Dopaminergic Substantia Nigra Development. Neuron, 107(4), 684-702.e9. https://doi.org/10.1016/j.neuron.2020.05.037

      Cuesta, S., Nouel, D., Reynolds, L. M., Morgunova, A., Torres-Berrio, A., White, A., Hernandez, G., Cooper, H. M., & Flores, C. (2020). Dopamine axon targeting in the nucleus accumbens in adolescence requires Netrin-1. Frontiers in Cell and Developmental Biology, 8. https://doi.org/10.3389/fcell.2020.00487

      Finci, L., Zhang, Y., Meijers, R., & Wang, J. H. (2015). Signaling mechanism of the netrin-1 receptor DCC in axon guidance. Progress in Biophysics and Molecular Biology, 118(3), 153-160. https://doi.org/10.1016/j.pbiomolbio.2015.04.001

      Glasgow, S. D., Labrecque, S., Beamish, I. V., Aufmkolk, S., Gibon, J., Han, D., Harris, S. N., Dufresne, P., Wiseman, P. W., McKinney, R. A., Séguéla, P., Koninck, P. D., Ruthazer, E. S., & Kennedy, T. E. (2018). Activity-Dependent Netrin-1 Secretion Drives Synaptic Insertion of GluA1-Containing AMPA Receptors in the Hippocampus. Cell Reports, 25(1), 168-182.e6. https://doi.org/10.1016/j.celrep.2018.09.028

      Islam, K. U. S., Meli, N., & Blaess, S. (2021). The Development of the Mesoprefrontal Dopaminergic System in Health and Disease. Frontiers in Neural Circuits, 15, 746582. https://doi.org/10.3389/fncir.2021.746582

      Kalsbeek, A., Voorn, P., Buijs, R. M., Pool, C. W., & Uylings, H. B. M. (1988). Development of the Dopaminergic Innervation in the Prefrontal Cortex of the Rat. The Journal of Comparative Neurology, 269(1), 58–72. https://doi.org/10.1002/cne.902690105

      Rajasekharan, S., & Kennedy, T. E. (2009). The netrin protein family. Genome Biology, 10(9), 239. https://doi.org/10.1186/gb-2009-10-9-239

      Shatzmiller, R. A., Goldman, J. S., Simard-Émond, L., Rymar, V., Manitt, C., Sadikot, A. F., & Kennedy, T. E. (2008). Graded expression of netrin-1 by specific neuronal subtypes in the adult mammalian striatum. Neuroscience, 157(3), 621–636. https://doi.org/10.1016/j.neuroscience.2008.09.031

      Willing, J., Cortes, L. R., Brodsky, J. M., Kim, T., & Juraska, J. M. (2017). Innervation of the medial prefrontal cortex by tyrosine hydroxylase immunoreactive fibers during adolescence in male and female rats. Developmental Psychobiology, 59(5), 583–589. https://doi.org/10.1002/dev.21525

    2. eLife assessment

      This study addresses an important, understudied question using approaches that link molecular, circuit, and behavioral changes. The findings that Netrin-1 and UNC5c can guide dopaminergic innervation from the nucleus accumbens to the cortex during adolescence are solid. The data showing that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters environmental effects on development are also sexually dimorphic are also solid. Reviewers identified significant gaps in evidence for specificity of Netrin-1 expression, which, if filled, would strengthen the evidence for some of the claims. Future work would also benefit from Unc5C knockdown to corroborate the results and investigation of the cause-effect relationship. This paper will be of interest to those interested in neural development, sex differences, and/or dopamine function.

    3. Reviewer #1 (Public Review):

      In this study, Hoops et al. showed that Netrin-1 and UNC5c can guide dopaminergic innervation from nucleus accumbens to cortex during adolescence in rodent models. They found that these dopamine axons project to the prefrontal cortex in a Netrin-1 dependent manner and knocking down Netrin-1 disrupted motor and learning behaviors in mice. Furthermore, the authors used hamsters, a seasonal model that is affected by the length of daylight, to demonstrate that the guidance of dopamine axons is mediated by environmental factors such as daytime length and in a sex-dependent manner.

      Regarding the cell type specificity of Netrin-1 expression, the authors began by stating "this question is not the focus of the study and we consider it irrelevant to the main issue we are addressing, which is where in the forebrain regions we examined Netrin-1+ cells are present." This statement contradicts the exact issue regarding the specificity issue I raised. They then went on to show the RNAscope data for Netrin-1 in Figure 2, which showed Netrin-1 mRNA was actually expressed quite ubiquitously in anterior cingulate cortex, dorsopeduncular cortex, infralimbic cortex, prelimbic cortex, etc. In addition, contrary to the authors' statement that Netrin-1 is a "secreted protein", the confocal images in Figure 1 in the rebuttal letter actually show Netrin-1 present in "granule-like" organelles inside the cytoplasm of neurons. Finally, the authors presented Figure 7 to indicate the location where virus expressing Netrin-1 shRNA might be located. Again, the brain region targeted was quite focal and most likely did not cover all the Netrin-1+ brain regions in Figure 2. Collectively, these results raised more questions regarding the specificity of Netrin-1 expression in brain regions that are behaviorally relevant to this study.

      With respect to the effectiveness of Netrin-1 knockdown in the animals in this study, the authors cited data in HEK293 cells (Figure 5), which did not include any statistics, and previously published in vivo data in a separate, independent study (Figure 6). They do not provide any data regarding the effectiveness of Netrin-1 knockdown in THIS study.

      Similar concerns regarding UNC5C knockdown (points #6, #7, and #8) were not adequately addressed.

      In brief, while this study provides a potential role of Netrin-1-UNC5C in target innervation of dopaminergic neurons and its behavioral output in risk-taking, the data lack sufficient evidence to firmly establish the cause-effect relationship.

    4. Reviewer #2 (Public Review):

      In this manuscript, Hoops et al., using two different model systems, identified key developmental changes in Netrin-1 and UNC5C signaling that correspond to behavioral changes and are sensitive to environmental factors that affect the timing of development. They found that Netrin-1 expression is highest in regions of the striatum and cortex where TH+ axons are travelling, and that knocking down Netrin-1 reduces TH+ varicosities in mPFC and reduces impulsive behaviors in a Go-No-Go test. Further, they show that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters, environmental effects on development are also sexually dimorphic. This study addresses an important question using approaches that link molecular, circuit and behavioral changes. Understanding developmental trajectories of adolescence, and how they can be impacted by environmental factors, is an understudied area of neuroscience that is highly relevant to understanding the onset of mental health disorders. I appreciated the inclusion of replication cohorts within the study.

    5. Reviewer #3 (Public Review):

      This study from the Flores group aims at understanding neuronal circuit changes during adolescence which is an ill-defined, transitional period involving dramatic changes in behavior and anatomy. They focus on DA innervation of the prefrontal cortex, and their interaction with the guidance cue Netrin-1. They propose DA axons in the PFC increase in the postnatal period, and their density is reduced in a Netrin 1 knockdown, suggesting that Netrin abets the development of this mesocortical pathway. In such mice impulsivity gauged by a go-no-go task is reduced. They then provide some evidence that Unc5c is developmentally regulated in DA axons. Finally they use an interesting hamster model, to study the effect of light hours on mesocortical innervation, and make some interesting observations about the timing of innervation and Unc5c expression, and the fact that females housed in winter day length conditions display an accelerated innervation of the prefrontal cortex.

      Comments on the revision. Several points were addressed; some remain to be addressed.

      4. It's not clear to me that TH doesn't stain noradrenergic axons in the PFC. See Islam and Blaess, 2021, and references therein.

      6. The Netrin knockdown data provided is from a previous study/samples.

      8. While the authors make the argument that the behavior is linked to DA, they still haven't formally tested it, in my opinion.

      13. Fig 3, Unc5c levels are not yet quantified. Furthermore, I agree with the previous reviewer that Unc5C knockdown would corroborate key aspects of the model.

      New - Developmental trajectory of prefrontal TH-positive axons from early adolescence to adulthood is similar in male and female rats (Willing et al., 2017). This needs discussion.

      Editor's note: Should you choose to revise your manuscript, please include degrees of freedom in your statistical reporting.

    1. eLife assessment

      This important study uses state-of-the-art, multi-region two-photon calcium imaging to characterize the statistics of functional connectivity between visual cortical neurons. The evidence supporting the conclusions is incomplete; currently, alternative interpretations of the results cannot be ruled out. With new analyses strengthening the conclusions, the work would be of broad interest to neuroscientists interested in the visual cortex and inter-area communication.

    2. Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicate the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.
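
      The argument above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not an analysis from the manuscript: the Gaussian orientation tuning curves, the Gaussian-distributed shared gain, the additive private noise, and every parameter value and function name (`tuning_curve`, `simulate_noise_correlation`) are hypothetical choices made only for illustration. Two model neurons share a single trial-by-trial multiplicative gain and have no direct connection; noise correlation is computed from responses after subtracting each stimulus's mean.

```python
import numpy as np


def tuning_curve(pref_deg, stim_deg, width_deg=20.0, peak=10.0, baseline=0.5):
    """Hypothetical Gaussian orientation tuning (180-degree periodic)."""
    d = np.abs(stim_deg - pref_deg) % 180.0
    d = np.minimum(d, 180.0 - d)  # circular distance between orientations
    return baseline + peak * np.exp(-0.5 * (d / width_deg) ** 2)


def simulate_noise_correlation(pref_a, pref_b, n_trials=400,
                               gain_sd=0.3, noise_sd=1.0, seed=0):
    """Noise correlation of two unconnected neurons sharing one gain signal.

    Assumption: on each trial both neurons' tuned responses are scaled by
    the same gain (standing in for behavioral state), and each neuron
    also receives independent additive noise.
    """
    rng = np.random.default_rng(seed)
    stims = np.arange(0.0, 180.0, 22.5)  # 8 grating orientations
    resid_a, resid_b = [], []
    for s in stims:
        # Shared multiplicative gain, one value per trial, clipped at zero.
        gain = np.clip(1.0 + gain_sd * rng.standard_normal(n_trials), 0.0, None)
        ra = gain * tuning_curve(pref_a, s) + noise_sd * rng.standard_normal(n_trials)
        rb = gain * tuning_curve(pref_b, s) + noise_sd * rng.standard_normal(n_trials)
        # Remove the stimulus-driven mean so only trial-to-trial noise remains.
        resid_a.append(ra - ra.mean())
        resid_b.append(rb - rb.mean())
    return np.corrcoef(np.concatenate(resid_a), np.concatenate(resid_b))[0, 1]


nc_similar = simulate_noise_correlation(pref_a=45.0, pref_b=45.0)
nc_dissimilar = simulate_noise_correlation(pref_a=45.0, pref_b=135.0)
print(f"NC, same preferred orientation:       {nc_similar:.2f}")
print(f"NC, orthogonal preferred orientation: {nc_dissimilar:.2f}")
```

      In this toy model the similarly tuned pair shows a clearly higher noise correlation than the orthogonally tuned pair, even though neither pair is connected. The reason is that the shared gain scales the residuals by each neuron's tuned response, so their covariance grows with the overlap of the two tuning curves. The sketch says nothing about how large this effect is in the recorded data, which is the question raised above.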

    3. Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem to be addressed (but if it is addressed indirectly, I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at all at any time. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual stimuli explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

    4. Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They found that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NCs as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.
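
      The alternative hypothesis in (1a) can be illustrated with a small simulation. This is a toy model under stated assumptions, not a reanalysis of the data: every neuron's stimulus-evoked response is multiplied by the same trial-wise gain, independent additive Gaussian noise is added, and there is no connectivity of any kind between neurons. The tuning values, gain spread, and noise level are arbitrary choices.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate_nc(tuning_a, tuning_b, n_trials=2000, gain_sd=0.3, noise_sd=1.0, seed=0):
    """Noise correlation of two unconnected neurons that share a multiplicative gain.

    tuning_a, tuning_b: mean evoked response of each neuron to each stimulus.
    """
    rng = random.Random(seed)
    res_a, res_b = [], []
    for f_a, f_b in zip(tuning_a, tuning_b):  # one iteration per stimulus
        # One gain value per trial, shared by both neurons (e.g. behavioral state).
        gains = [1.0 + rng.gauss(0.0, gain_sd) for _ in range(n_trials)]
        r_a = [g * f_a + rng.gauss(0.0, noise_sd) for g in gains]
        r_b = [g * f_b + rng.gauss(0.0, noise_sd) for g in gains]
        mean_a, mean_b = sum(r_a) / n_trials, sum(r_b) / n_trials
        # Residuals around the stimulus-conditioned mean, pooled over stimuli.
        res_a += [v - mean_a for v in r_a]
        res_b += [v - mean_b for v in r_b]
    return pearson(res_a, res_b)


# Two stimuli; the first pair is cotuned, the second pair has opposite preferences.
nc_cotuned = simulate_nc([10.0, 1.0], [10.0, 1.0])
nc_different = simulate_nc([10.0, 1.0], [1.0, 10.0])
```

      Under these assumptions the expected noise correlation is roughly 0.8 for the cotuned pair and roughly 0.16 for the differently tuned pair, so tuning similarity alone produces a graded NC structure. Distinguishing this from connectivity would require measuring or regressing out the behavioral state, as the reviewer suggests.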

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. To support the authors' claims, it would be important to establish whether membership in the same group, rather than tuning similarity, is the defining feature of neuron pairs showing large NCs.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.
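
      One possible form of the explicit test requested in point (2) is sketched below. It is only an illustration under assumptions: pairwise NC is regressed on signal correlation (the continuous model), and the same-group indicator is then asked to explain what remains via a partial correlation. The data are synthetic and deliberately generated from a purely continuous model, with "groups" defined by thresholding tuning similarity, so the numbers say nothing about the actual recordings.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after an ordinary least-squares fit on the single predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(1)
n_pairs = 500
# Synthetic pairwise data: NC depends only on signal correlation (continuous model).
signal_corr = [rng.uniform(-1.0, 1.0) for _ in range(n_pairs)]
noise_corr = [0.3 * s + rng.gauss(0.0, 0.05) for s in signal_corr]
# Hypothetical "same group" label: pairs whose tuning is very similar.
same_group = [1.0 if s > 0.6 else 0.0 for s in signal_corr]

r2_continuous = pearson(signal_corr, noise_corr) ** 2
r2_group_only = pearson(same_group, noise_corr) ** 2
# Variance in NC explained by group membership once tuning similarity is accounted for.
r2_group_given_similarity = pearson(
    residuals(same_group, signal_corr), residuals(noise_corr, signal_corr)
) ** 2
```

      In this synthetic case the group label explains a sizeable share of NC on its own, yet almost nothing once tuning similarity is accounted for. Evidence for discrete channels would correspond to the opposite outcome in the real data: a group effect that survives conditioning on tuning similarity.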

    5. Author Response:

      We appreciate the constructive reviews. We have performed additional analysis to address reviewer concerns, and we will submit a full revision in the near future. Our new analysis confirms that the visual stimulus can account for about a third of the variance in population neural activity. Pupil dynamics only account for a small fraction of the trial-to-trial variability, less than six percent. Once we regress out the stimulus responses and the pupil dynamics, we can use the network activity to predict the trial-to-trial variability of single neuron responses, and about eight percent of the variance is explained. Thus it appears as though multiplicative gain cannot account for the results. As for the concerns about missing spikes, we would like to direct readers to the supplementary figure that addresses that concern. The analysis shows that the correlation measurements are robust to the imprecisions of spike inference from calcium imaging data. Finally, we would also like to take the opportunity to clarify that we make no claim as to the discreteness of tuning classes. The GMM analysis was performed to obtain a data-driven, granular categorization of neuron tuning, to support detailed statistical analysis. We take no position on the discreteness or lack thereof of these groups. We agree that it is an interesting question, and we are happy to provide additional analysis in the revision to address this question. Our main result on functional connectivity structure holds regardless of the discreteness of neuron tuning selectivity.

    1. eLife assessment

      The authors expand the concept of a new layer to trained immunity, which is currently being highlighted by several colleagues in the field. The work provides important hints to understand end-stage renal disease. Overall, the rational approach leads to experimental results that are solid.

    2. Reviewer #1 (Public Review):

      In this study, Kim et al. investigated the mechanism by which uremic toxin indoxyl sulfate (IS) induces trained immunity, resulting in augmented pro-inflammatory cytokine production such as TNF and IL-6. The authors claim that IS treatment induced epigenetic and metabolic reprogramming, and the aryl hydrocarbon receptor (AhR)-mediated arachidonic acid pathway is required for establishing trained immunity in human monocytes. They also demonstrated that uremic sera from end-stage renal disease (ESRD) patients can generate trained immunity in healthy control-derived monocytes.

      These are interesting results that introduce the important new concept of trained immunity and its importance in showing endogenous inflammatory stimuli-induced innate immune memory. Additional evidence proposing that IS plays a critical role in the initiation of inflammatory immune responses in patients with CKD is also interesting and a potential advance of the field.

      Comments on the revised version:

      In the revised manuscript, the authors have addressed almost all of the points raised by the reviewers and have revised the manuscript accordingly. The additional comments improved the manuscript and strengthened the overall impact of the paper.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Kim et al. investigated the mechanism by which uremic toxin indoxyl sulfate (IS) induces trained immunity, resulting in augmented pro-inflammatory cytokine production such as TNF and IL6. The authors claim that IS treatment induced epigenetic and metabolic reprogramming, and the aryl hydrocarbon receptor (AhR)-mediated arachidonic acid pathway is required for establishing trained immunity in human monocytes. They also demonstrated that uremic sera from end-stage renal disease (ESRD) patients can generate trained immunity in healthy control-derived monocytes.

      These are interesting results that introduce the important new concept of trained immunity and its importance in showing endogenous inflammatory stimuli-induced innate immune memory. Additional evidence proposing that IS plays a critical role in the initiation of inflammatory immune responses in patients with CKD is also interesting and a potential advance of the field. This study is in large part well done, but some components of the study are still incomplete and additional efforts are required to nail down the main conclusions.

      Thank you very much for your positive feedback.

      Specific comments:

      (1) Of greatest concern, there are concerns about the rigor of these experiments, whether the interpretation and conclusions are fully supported by the data. (1) Although many experiments have been sporadically conducted in many fields such as epigenetic, metabolic regulation, and AhR signaling, the causal relationship between each mechanism is not clear. (2) Throughout the manuscript, no distinction was made between the group treated with IS for 6 days and the group treated with the second LPS (addressed below). (3) Besides experiments using non-specific inhibitors, genetic experiments including siRNA or KO mice should be examined to strengthen and justify central suggestions.

      We are grateful for the invaluable constructive feedback provided. 

      (1) In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to investigate the causal relationship among the AhR pathway, epigenetic modifications, and metabolic rewiring in IS-induced trained immunity. Notably, metabolic rewiring, particularly the upregulation of aerobic glycolysis via the mTORC1 signaling pathway, stands as a pivotal mechanism underlying the induction of trained immunity through the modulation of epigenetic modifications (Riksen NP et al. Figure 1). Initially, we assessed the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci at day 6 after treatment with zileuton, an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor. Additionally, we evaluated the alteration in the activity of S6K, a downstream molecule of mTORC1, following zileuton treatment. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, albeit without inducing metabolic rewiring, in IS-induced trained immunity (Author response image 1). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG impacts epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional experimental findings, elucidating the correlation between the diverse mechanisms implicated in IS-induced innate immune memory (Fig. 7 in the revised manuscript). These data have been integrated into the revised manuscript as Figure 3D and 5I, and supplementary Figure 5I.

      (2) We apologize for any confusion arising from the unclear description regarding the distinction between the group treated with IS for 6 days and the group subjected to secondary lipopolysaccharide (LPS) stimulation. It is imperative to clarify that induction of trained immunity necessitates 1 day of IS stimulation followed by 5 days of rest, rendering the 6th day sample representative of a trained state. Subsequent to this, a 24-hour LPS stimulation is applied, designating the 7th day sample as a secondary LPS-stimulated cell. This clarification is now explicitly indicated throughout the entirety of Figure 1A and Figure 3A in the revised manuscript.

      (3) In accordance with your feedback, we performed siRNA knockdown of AhR and ALOX5 in primary human monocytes. AhR knockdown markedly attenuated the mRNA expression of TNF-α and IL-6, which are augmented in IS-trained macrophages. Similarly, knockdown of ALOX5 using ALOX5 siRNA abrogated the increase in TNF-α and IL-6 levels upon LPS stimulation in IS-trained macrophages (Author response image 2). Our experiments utilizing AhR siRNA corroborate the involvement of AhR in the expression of AA pathway-related molecules, such as ALOX5, ALOX5AP, and LTB4R1, in IS-induced trained immunity. These data have been incorporated into the revised manuscript as Figure 4E and 5G, and supplementary Figure 5H.  

      Author response image 1.

      Epigenetic modification is regulated by arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an inhibitor of ALOX5, or 2DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 Kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Author response image 2.

      Inhibition of IS-trained immunity by knockdown of AhR or ALOX5 in human monocytes. A-C. Human monocytes were transfected with siRNA targeting AhR (siAhR), ALOX5 (siALOX5), or negative control (siNC) for 1 day, followed by stimulation with IS for 24 hours. After a resting period of 5 days, cells were re-stimulated with LPS for 24 hours. mRNA expression levels of AhR and ALOX5 at 1 day after transfection, and TNF-α and IL-6 at 1 day after LPS treatment, were assessed using RT-qPCR. D. Human monocytes were transfected with AhR siRNA or negative control (NC) siRNA for 1 day, followed by stimulation with IS for 24 hours. After resting for 5 days, mRNA expression levels of ALOX5, ALOX5AP, and LTB4R1 were analyzed using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.  

      (2) The authors showed that IS-trained monocytes showed no change in TNF or IL-6, but increased the expression levels of TNF and IL-6 in response to the second LPS (Fig. 1B). This suggests that the different LPS responsiveness in IS-trained monocytes caused altered gene expression of TNF and IL6. However, the authors also showed that IS-trained monocytes without LPS stimulation showed increased levels of H3K4me3 at the TNF and IL-6 loci, as well as highly elevated ECAR and OCR, leading to no changes in TNF and IL-6. Therefore, it is unclear why or how the epigenetic and metabolic states of IS-trained monocytes induce different LPS responses. For example, increased H3K4me3 in HK2 and PFKP is important for metabolic rewiring, but why increased H3K4me3 in TNF and IL6 does not affect gene expression needs to be explained.

      We acknowledge the constructive critiques provided by the reviewer. While epigenetic modifications in the promoters of TNF-α, IL-6, HK2, and PFKP (Figure 3B and Supplementary Figure 3C in the revised manuscript), and metabolic rewiring (Figure 2A-D in the revised manuscript) were observed in IS-trained macrophages at 6 days prior to LPS stimulation, these macrophages do not exhibit an increase in TNF-α and IL-6 mRNA and protein levels before LPS stimulation. This lack of response is attributed to a 5-day resting period, allowing the macrophages to revert to a non-activated state, as depicted in Author response image 3 and 4. This phenomenon aligns with the concept of typical trained immunity.

      Trained immunity is characterized by the long-term functional reprogramming of innate immune cells, which is evoked by various primary insults and which leads to an altered response towards a second challenge after the return to a non-activated state. Metabolic and epigenetic reprogramming events during the primary immune response persist partially even after the initial stimulus is removed. Upon a secondary challenge, trained innate immune cells exhibit a more robust and more prompt response than the initial response (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      Numerous studies have demonstrated the observation of epigenetic modifications in the promoters of TNF-α and IL-6, and metabolic rewiring prior to LPS stimulation as a secondary challenge. However, cytokine production is contingent on LPS stimulation (Arts RJ et al. Glutaminolysis and Fumarate Accumulation Integrate Immunometabolic and Epigenetic Programs in Trained Immunity. Cell Metab. 2016 Dec 13;24(6):807-819; Arts RJW et al. Immunometabolic Pathways in BCG-Induced Trained Immunity. Cell Rep. 2016 Dec 6;17(10):2562-2571; Ochando J et al. Trained immunity - basic concepts and contributions to immunopathology. Nat Rev Nephrol. 2023 Jan;19(1):23-37). The prolonged presence of higher levels of H3K4me3 on immune gene promoters, even after returning to baseline, is associated with open chromatin and results in a more rapid and stronger response, such as cytokine production, upon a secondary insult (Netea MG et al. Defining trained immunity and its role in health and disease. Nat Rev Immunol. 2020 Jun;20(6):375-388).

      The results in Figure 1B may be interpreted as indicating different LPS responsiveness in IS-trained monocytes caused altered gene expression of TNF and IL-6. However, it is plausible that trained immune cells respond more robustly even to low concentrations of LPS. In fact, the aim of this experiment was to determine the appropriate LPS concentration.

      Author response image 3.

      The changes in mRNA and protein levels of TNF-α and IL-6 during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. Cells were stimulated with LPS for 24 hrs. Protein and mRNA levels were assessed by ELISA and RT-qPCR, respectively. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01, by two-tailed paired t-test.

      Author response image 4.

      The changes in mRNA of HK2 and PFKP induced by IS during induction of IS-trained immunity. Human monocytes were treated with or without IS (1 mM) for 24 hrs, followed by a 5-day resting period to induce trained immunity. mRNA levels were assessed by RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05 by two-tailed paired t-test.

      (3) The authors used human monocytes cultured in human serum without growth factors such as MCSF for 5-6 days. When we consider the short lifespan of monocytes (1-3 days), the authors need to explain the validity of the experimental model.

      We appreciate the reviewer’s constructive critiques. As pointed out by the reviewer, human circulating CD14+ monocytes exhibit a relatively short lifespan (1-3 days) when cultured in the absence of growth factors (Patel AA et al. The fate and lifespan of human monocyte subsets in steady state and systemic inflammation. J Exp Med. 2017 Jul 3;214(7):1913-1923). In this study, purified CD14+ monocytes were subjected to adherent culture for a duration of 7 days in RPMI1640 media supplemented with 10% human AB serum, a standard in vitro culture protocol widely employed in studies focusing on trained immunity (Domínguez-Andrés J et al. In vitro induction of trained immunity in adherent human monocytes. STAR Protoc. 2021 Feb 24;2(1):100365). In response to the reviewer's suggestions, we assessed cell viability on days 0, 1, 4, and 6, utilizing the WST assay. Despite a marginal reduction in cell viability observed at day 1, attributed to detachment from the culture plate, the cultured monocytes exhibited a notable enhancement in cell viability on days 4 and 6 when compared to days 0 or 1 (Author response image 5).

      It has been demonstrated that the adhesion of human monocytes to a cell culture dish leads to their activation and induces the synthesis of substantial amounts of IL-1β mRNA as observed in monocytes adherent to extracellular matrix components such as fibronectin and collagen.

      Morphologically, human adherent monocytes cultured with 10% human serum appear to undergo partial differentiation into macrophages by day 6, potentially explaining the observed lack of decrease in monocyte viability. Notably, Safi et al. have reported that adherent monocytes cultured with 10% human serum exhibit no significant difference in cell viability over a 7-day period when compared to cultures supplemented with growth factors such as M-CSF and IL-3 (Safi W et al. Differentiation of human CD14+ monocytes: an experimental investigation of the optimal culture medium and evidence of a lack of differentiation along the endothelial line. Exp Mol Med. 2016 Apr 15;48(4):e227).

      Author response image 5.

      Viability of human monocytes during the induction of trained immunity. Purified human monocytes were seeded on plates with RPMI1640 media supplemented with 10% human AB serum. Cell viability was assessed on days 0, 1, 4, and 6 utilizing the WST assay (Left panel). Cell morphology was examined under a light-inverted microscope at the indicated times (Right panel).

      (4) The authors' ELISA results clearly showed increased levels of TNF and IL-6 proteins, but it is well established that LPS-induced gene expression of TNF and IL-6 in monocytes peaked within 1-4 hours and returned to baseline by 24 hours. Therefore, authors need to investigate gene expression at appropriate time points.

      We appreciate the valuable constructive feedback provided by the reviewer. As indicated by the reviewer, the LPS-induced gene expression of TNF-α and IL-6 in IS-trained monocytes exhibited a peak within the initial 1 to 4 hours, followed by a decrease by the 24-hour time point, as illustrated in Author response image 6. Nevertheless, the mRNA expression levels of TNF-α and IL-6 were still elevated at the 24-hour mark. Furthermore, the protein levels of both TNF-α and IL-6 apparently increased 24 hours after LPS stimulation. Due to technical constraints, sample collection had to be conducted at a single time point, and the 24-hour post-stimulation interval was deemed optimal for this purpose.

      Author response image 6.

      Kinetics of protein and mRNA expression of TNF-α and IL-6 after treatment of LPS as secondary insult in IS-trained monocytes. IS-trained cells were re-stimulated by LPS (10 ng/ml) for the indicated time. The supernatant and lysates were collected for ELISA assay and RT-qPCR analysis, respectively. Bar graphs show the mean ± SEM. * = p <0.05 and **= p < 0.01, by two-tailed paired t-test.

      (5) It is a highly interesting finding that IS induces trained immunity via the AhR pathway. The authors also showed that the pretreatment of FICZ, an AhR agonist, was good enough to induce trained immunity in terms of the expression of TNF and IL-6. However, from this point of view, the authors need to discuss why trained immunity was not affected by kynurenic acid (KA), which is a well-known AhR ligand accumulated in CKD and has been reported to be involved in innate immune memory mechanisms (Fig. S1A).

      We appreciate the constructive criticism provided by the reviewer, and we comprehend the raised points. In our initial experiments, we hypothesized that kynurenic acid (KA), an aryl hydrocarbon receptor (AhR) ligand, might instigate trained immunity in monocytes, despite KA not being our primary target uremic toxin. However, our findings, as depicted in Fig. S1A, demonstrated that KA did not induce trained immunity. Notably, KA-treated monocytes exhibited induction of CYP1B1, an AhR-responsive gene, and elevated levels of TNF-α and IL-6 mRNA at 24 hours post-treatment, comparable to FICZ-treated monocytes. This observation underscores KA's role as an AhR ligand in human monocytes, as emphasized by the reviewer. 

      Of particular interest, proteins associated with the arachidonic acid pathway, such as ALOX5 and ALOX5AP - integral to the mechanisms underlying IS-induced trained immunity - did not exhibit an increase at day 6 following KA treatment, in contrast to the significant elevation observed with IS and FICZ treatments (Author response image 7). The rationale behind this disparity remains unknown, necessitating further investigation to elucidate the underlying factors. These data have been incorporated into the revised manuscript as Supplementary Figure 5C.

      Author response image 7.

      Divergent impact of AhR agonists, especially IS, FICZ, and KA, on the AhR-ALOX5 pathway. Purified monocytes underwent treatment with IS (1 mM), FICZ (100 nM), or KA (0.5 mM) for 1 day, followed by a 5-day resting period to induce trained immunity. Activation of AhR through ligand binding was assessed by examining the induction of CYP1B1, an AhR-responsive gene, and cytokines one day post-treatment. The expression of genes related to the arachidonic acid pathway, such as ALOX5, ALOX5AP, and LTB4R1, was analyzed via RT-qPCR six days after inducing trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      Indeed, it has been demonstrated that FICZ and TCDD, two high-affinity AhR ligands, exert opposite effects on T-cell differentiation, with TCDD inducing regulatory T cells and FICZ inducing Th17 cells. This dichotomy has been attributed to ligand-intrinsic differences in AhR activation (Ho PP et al. The aryl hydrocarbon receptor: a regulator of Th17 and Treg cell development in disease. Cell Res. 2008 Jun;18(6):605-8; Ehrlich AK et al. TCDD, FICZ, and Other High Affinity AhR Ligands Dose-Dependently Determine the Fate of CD4+ T Cell Differentiation. Toxicol Sci. 2018 Feb 1;161(2):310-320). These outcomes imply the involvement of an intricate interplay involving metabolic rewiring, epigenetic reprogramming, and the AhR-ALOX5 pathway in IS-induced trained immunity within monocytes.

      (6) The authors need to clarify the role of IL-10 in IS-trained monocytes. IL-10 is an anti-inflammatory cytokine that can be modulated by AhR, and its expression (Fig. 1E, Fig. 4D) may explain the inflammatory cytokine expression of IS-trained monocytes.

      We appreciate the reviewer’s valuable comment, recognizing its significant importance. IL-10, characterized by potent anti-inflammatory attributes, assumes a pivotal role in constraining the host immune response against pathogens. This function serves to mitigate potential harm to the host and uphold normal tissue homeostasis. In the context of atherosclerosis (Mallat Z et al. Protective role of interleukin-10 in atherosclerosis. Circ Res. 1999 Oct 15;85(8):e17-24.) and kidney disease (Wei W et al. The role of IL-10 in kidney disease. Int Immunopharmacol. 2022 Jul;108:108917), IL-10 exerts potent deactivating effects on macrophages and T cells, influencing various cellular processes that could impact the development and stability of atherosclerotic plaques. Additionally, it is noteworthy that IL-10-deficient macrophages exhibit an augmentation in the proinflammatory cytokine TNF-α (Smallie T et al. IL-10 inhibits transcription elongation of the human TNF gene in primary macrophages. J Exp Med. 2010 Sep 27;207(10):2081-8; Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). As emphasized by the reviewer, the reduced gene expression of IL-10 by IS-trained monocytes may contribute to the heightened expression of proinflammatory cytokines. We have thoroughly addressed and discussed this specific point in response to the reviewer's comment (Line 394-399 of page 18 in the revised manuscript).

      (7) The authors need to show H3K4me3 levels in TNF and IL6 genes in all conditions in one figure. (Fig. 2B). Comparing Fig. 2B and Fig. S2B, H3K4me3 does not appear to be increased at all by LPS in the IL6 region. 

      We are grateful for the constructive criticism provided by the reviewer. In response to the reviewer's comment, we endeavored to conduct an experiment demonstrating H3K4me3 enrichment on the promoters of TNF-α and IL-6 across all experimental conditions. However, due to limitations in the availability of purified human monocytes, we conducted an additional three independent experiments for ChIP-qPCR across all conditions. Despite encountering a notable variability among individuals, even within the healthy donor cohort, our results demonstrated an increase in H3K4me3 enrichment on the TNF-α and IL-6 promoters in IS-trained groups, irrespective of subsequent LPS treatment (Author response image 8).

      Author response image 8.

      Analysis of H3K4me3 enrichment on the promoters of TNFA and IL6 Loci in IS-trained macrophages. ChIP-qPCR was employed to assess the enrichment of H3K4me3 on the promoters of TNFA and IL6 loci before (day 6) and after LPS stimulation (day 7) in IS-trained macrophages. The normalization control utilized 2% input. Bar graphs show the mean ± SEM. The data presented are derived from three independent experiments utilizing samples from different donors.

      (8) The authors need to address the changes of H3K4me3 in the presence of MTA.

      We appreciate the constructive criticism provided by the reviewer. In response to the reviewer's feedback, we conducted an analysis of the changes in H3K4me3 in the presence of MTA, a general methyltransferase inhibitor, using identical conditions as depicted in Figure 2C of the original manuscript. Our findings revealed that MTA exerted inhibitory effects on the levels of H3K4me3, as isolated through the acid histone extraction method, which were otherwise increased by IS-training, as illustrated in Author response image 9. 

      Author response image 9.

      The reduction of H3K4me3 by MTA treatment in IS-trained macrophages. IS-trained cells were restimulated with LPS (10 ng/ml) as a secondary challenge for 24 hrs, followed by histone isolation and WB analysis for H3K4me3, histone 3 (H3), and β-actin. Blot data from two independent experiments with different donors are shown.

      (9) Interpretation of ChIP-seq results is not entirely convincing due to doubts about the quality of sequencing results. First, authors need to provide information on the quality of ChIP-seq data in reliable criteria such as Encode Pipeline. It should also provide representative tracks of H3K4me3 in the TNF and IL-6 genes (Fig. 2F). And in Fig. 2F, the author showed the H3K4me3 track of replicates, but the results between replicates were very different, so there are concerns about reproducibility. Finally, the authors need to show the correlation between ChIP-seq (Fig. 2) and RNA-seq (Fig. 5).

      We appreciate the constructive criticism provided by the reviewer. 

      As indicated by the reviewer, we evaluated sample read quality using the histone ChIP-seq standards of the ENCODE project, focusing on read depth, the PCR bottleneck coefficients (PBC1 and PBC2), and the non-redundant fraction (NRF). Five of the samples displayed moderate bottlenecking (0.5 ≤ PBC1 < 0.8, 1 ≤ PBC2 < 3) with acceptable complexity (0.5 ≤ NRF < 0.8). One sample showed mild bottlenecking (0.8 ≤ PBC1 < 0.9, 3 ≤ PBC2 < 10) with compliant complexity (0.8 ≤ NRF < 0.9). These quality metrics indicate that the ChIP-seq data meet at least the standards required for downstream analysis according to ENCODE project criteria (Author response image 10A).
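
      The library-complexity metrics named above can be illustrated with a short sketch. It assumes the usual ENCODE definitions (NRF = distinct read positions / total reads; PBC1 = positions with exactly one read / distinct positions; PBC2 = positions with exactly one read / positions with exactly two reads). The "moderate", "mild", "acceptable", and "compliant" bands are the ones quoted in this response; the outer labels ("severe", "none", "concerning", "ideal") are an assumption based on the ENCODE guidelines. The function names and toy reads are hypothetical, and this is not the pipeline the authors ran.

```python
from collections import Counter

def library_complexity(read_positions):
    """Return (NRF, PBC1, PBC2) given one hashable position key per mapped read."""
    counts = Counter(read_positions)
    total_reads = sum(counts.values())
    distinct = len(counts)                           # positions with >= 1 read
    m1 = sum(1 for c in counts.values() if c == 1)   # positions with exactly 1 read
    m2 = sum(1 for c in counts.values() if c == 2)   # positions with exactly 2 reads
    nrf = distinct / total_reads
    pbc1 = m1 / distinct
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2

def classify(value, cutoffs, labels):
    """Return the label of the first band whose upper cutoff exceeds value."""
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[-1]

PBC1_BANDS = ((0.5, 0.8, 0.9), ("severe", "moderate", "mild", "none"))
PBC2_BANDS = ((1.0, 3.0, 10.0), ("severe", "moderate", "mild", "none"))
NRF_BANDS = ((0.5, 0.8, 0.9), ("concerning", "acceptable", "compliant", "ideal"))

# Toy library: ten reads over eight positions (two positions are duplicated).
toy_reads = ["a", "a", "b", "c", "d", "d", "e", "f", "g", "h"]
nrf, pbc1, pbc2 = library_complexity(toy_reads)
print(classify(pbc1, *PBC1_BANDS), classify(pbc2, *PBC2_BANDS), classify(nrf, *NRF_BANDS))
```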

      To examine the differences in H3K4me3 enrichment patterns between the two groups, we normalized the read counts within ±2 kb of the TSS of human genes to CPM. We then compared the average values of IS-treated macrophages with those of controls and displayed the results in waterfall plots. In addition, we marked genes of interest in red, including genes related to the phenotypes of IS-trained macrophages (TNF and IL6), the activation of innate immune responses (XRCC5, IFI16, PQBP1), and the regulation of ornithine decarboxylase (OAZ3, PSMA3, PSMA1) (Author response image 10B and C). H3K4me3 peak tracks of the TNF and IL6 loci and the H3K4me3 enrichment pattern were also added as Supplementary Figure 3D and 3F in the revised manuscript.

      Next, to evaluate the consistency among replicates within a group, we computed Spearman's correlation coefficients on enrichment values expressed as counts per million (CPM), calculated using the edgeR R package. We analyzed two sets of regions: the 7,136 H3K4me3 peaks described in Figure 3E of the revised manuscript, and the regions within 2 kbp of transcription start sites (TSSs) in the hg19 human genome. The resulting Spearman's correlation coefficients and associated P-values demonstrated concordance between replicates, confirming reproducibility and consistent performance (Author response image 10D).
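
      As an illustration of the replicate-consistency check described above, the sketch below converts raw counts to CPM and computes Spearman's rank correlation between two replicates. It is written in plain Python rather than with edgeR, it uses raw library totals (edgeR's CPM calculation can additionally apply normalization factors), and the function names and count vectors are hypothetical.

```python
def cpm(counts):
    """Scale raw counts to counts per million of the library total."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical read counts in five regions for two replicates.
replicate_1 = [10, 200, 35, 0, 80]
replicate_2 = [12, 150, 40, 3, 60]
print(spearman(cpm(replicate_1), cpm(replicate_2)))
```

      Because Spearman's coefficient depends only on ranks, CPM scaling within a sample does not change it; the scaling matters when values are averaged or compared across samples.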

      Finally, the correlation between gene expression and H3K4me3 enrichment around transcription start sites (TSSs) has been reported in previous research (Reshetnikov VV et al. Data of correlation analysis between the density of H3K4me3 in promoters of genes and gene expression: Data from RNA-seq and ChIP-seq analyses of the murine prefrontal cortex. Data Brief. 2020 Oct 2;33:106365). To verify this association in our study, we applied Spearman's correlation for comparative analysis and conducted linear regression to determine whether a consistent global trend in RNA expression existed. In our analysis, count values from regions extending 2 kbp around the TSSs in the H3K4me3 ChIP-seq data were converted to counts per million (CPM) using the edgeR R package. These were then compared with the transcripts per million (TPM) values of the corresponding genes. Our results revealed a significant positive correlation, reinforcing the consistent relationship between H3K4me3 enrichment and gene expression (Author response image 10E and Supplementary Fig. 6D in the revised manuscript).
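
      The regression step described above can be sketched as an ordinary least-squares fit of expression against promoter enrichment. The sketch below is only an illustration: the log2(x + 1) transform is an assumption (the response does not state which scale was used), and the function name and per-gene values are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical per-gene values: H3K4me3 CPM around the TSS and RNA-seq TPM.
h3k4me3_cpm = [1.0, 3.0, 7.0, 15.0]
rna_tpm = [3.0, 7.0, 15.0, 31.0]

# Assumed transform: log2 with a pseudocount of 1 to tolerate zeros.
log_cpm = [math.log2(v + 1.0) for v in h3k4me3_cpm]
log_tpm = [math.log2(v + 1.0) for v in rna_tpm]

slope, intercept = fit_line(log_cpm, log_tpm)
print(slope, intercept)
```

      A positive slope in such a fit corresponds to the global trend described in the response, where higher promoter H3K4me3 goes with higher expression.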

      Author response image 10.

      The information on quality of ChIP-seq data and correlation between ChIP-seq and RNA-seq. A, information on quality of ChIP-seq data. B, H3K4me3 peak of promoter region on TNFA and IL6. C, The differences in H3K4me3 enrichment patterns between control group and IS-training group. D, The consistency among replicates within a group. E, Correlation between ChIP-seq and RNA-seq in IS-induced trained immunity.

      (10) AhR changes in the cell nucleus should be provided (Fig. 4A).

      We appreciate the constructive feedback from the reviewer. In response to the reviewer's suggestions, we investigated the nuclear translocation of AhR 6 days after the induction of IS-mediated trained immunity, as illustrated in Author response image 11. For this purpose, lysates from IS-trained monocytes were fractionated into nuclear and cytosolic fractions, and AhR protein was subsequently immunoblotted. The results demonstrate that IS-trained monocytes exhibited a higher level of AhR protein in the nucleus compared to non-trained monocytes. Notably, the nuclear translocation of AhR was significantly attenuated in IS-trained monocytes treated with GNF351. These findings imply that the activation of AhR, facilitated by the binding of IS, persisted partially up to 6 days, indicating that IS-mediated degradation of AhR was not fully recovered even on day 6 after the induction of IS training. Consequently, we have replaced Figure 4A in the revised manuscript.

      Author response image 11.

      The activation of AhR, facilitated by IS binding, persists partially up to 6 days during induction of trained immunity. Lysates of IS-trained cells treated with or without GNF351 were separated into nuclear and cytosolic fractions, followed by WB analysis for AhR protein (left panel). Band intensity in immunoblots was quantified by densitometry (right panel). β-actin was used as a normalization control. Bar graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test.
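
      For readers unfamiliar with the two-tailed paired t-test used in these legends, the sketch below shows the calculation on donor-matched values. It is a generic illustration with hypothetical numbers, not the authors' statistics workflow. The p-value is obtained by numerically integrating the Student t density with Simpson's rule.

```python
import math

def paired_t(before, after):
    """Return the paired t statistic and degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n), n - 1

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

# Hypothetical donor-matched densitometry values (arbitrary units).
untreated = [1.0, 2.0, 3.0]
treated = [2.0, 4.0, 3.5]
t_stat, dof = paired_t(untreated, treated)
print(t_stat, dof, two_tailed_p(t_stat, dof))
```

      Pairing matters here because donor-to-donor variability is large; the test is applied to within-donor differences rather than to the two groups independently.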

      (11) Do other protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, change the mRNA expression of ALOX5, ALOX5AP, and LTB4R1? In the absence of genetic studies, it is difficult to be certain of the ALOX5-related mechanism claimed by the authors.

      We are grateful for the constructive criticism provided by the reviewer. In response to the reviewer's comment, we investigated whether uremic toxins, specifically PBUTs such as PCS, HA, IAA, and KA, induce changes in the mRNA expression of ALOX5, ALOX5AP, and LTB4R1 in trained monocytes. Intriguingly, the examination revealed no discernible induction in the mRNA expression of these genes by PBUTs, with the exception of IS, as depicted in Author response image 12 of the letter. These findings once again underscore the implication of the AhR-ALOX5 pathway in the induction of trained immunity in monocytes by IS.

      Author response image 12.

      No obvious impact of PBUTs except IS on the expression of arachidonic acid pathway-related genes 6 days after treatment with PBUTs. Purified monocytes were treated with several PBUTs, including IS, PCS, HA, IAA, and KA, for 24 hrs, followed by a 5-day resting period to induce trained immunity. The mRNA expression of ALOX5, ALOX5AP, and LTB4R1 was quantified using RT-qPCR. Bar graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test.

      (12) Fig.6 is based on the correlated expression of inflammatory genes or AA pathway genes. It does not clarify any mechanisms the authors claimed in the previous figures. 

      We express our sincere appreciation for the constructive criticism provided by the reviewer, and we have taken careful note of the points raised. In response to the reviewer's feedback, we adopted two distinct approaches utilizing samples obtained from ESRD patients and IS-trained mice. Initially, we investigated the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of the ESRD patients presented in Figure 6E of the original manuscript. With the limited number of samples, the correlation between IS concentration and ALOX5 expression did not reach statistical significance, although it showed a positive trend (Author response image 13A). Subsequently, we examined the potential inhibitory effects of zileuton, an ALOX5 inhibitor, on the production of TNF-α and IL-6 in LPS-stimulated splenic myeloid cells derived from IS-trained mice. Our findings indicate that zileuton significantly inhibits the production of TNF-α and IL-6 induced by LPS in splenic myeloid cells from IS-trained mice (Author response image 13B). These data were added as Figure 6N of the revised manuscript (lines 350-354 on page 16 of the revised manuscript).

      Author response image 13.

      Assessment of the correlation between ALOX5 and the concentration of IS in ESRD patients, and investigation of ALOX5 effects in mouse splenic myeloid cells in IS-trained mice. A. Examination of the correlation between ALOX5 protein expression in monocytes and IS concentration in the plasma of ESRD patients. B. C57BL/6 mice were administered daily injections of 200 mg/kg IS for 5 days, followed by a resting period of another 5 days. Subsequently, IS-trained mice were sacrificed, and spleens were mechanically dissociated. Isolated splenic myeloid cells were subjected to ex vivo treatment with LPS (10 ng/ml), along with zileuton (100 µM). The levels of TNF-α and IL-6 in the supernatants were quantified using ELISA. The graphs show the mean ± SEM. * = p < 0.05, by two-tailed paired t-test between zileuton treatment group and no-treatment group.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor corrections to the figures

      (1) No indicators for the control group in Fig. 1B.

      We thank the reviewer for the comment. Accordingly, the control group is now indicated with (-).

      (2) The same paper is listed twice in the references section. (No. 19 and 28)

      We thank the reviewer for the comment. We deleted reference No. 28.

      Reviewer #2 (Public Review):

      Manuscript entitled "Uremic toxin indoxyl sulfate (IS) induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" presented some interesting findings. The manuscript strengths included use of H3K4me3-CHIP-Seq, AhR antagonist, IS treated cell RNA-Seq, ALOX5 inhibitor, MTA inhibitor to determine the roles of IS-AhR in trained immunity related to ESRD inflammation and trained immunity.

      Thank you very much for your positive feedback.

      Reviewer #2 (Recommendations For The Authors):

      However, the manuscript needs to be improved by fixing the following concerns.

      There are concerns:

      (1) The experiments in Figs. 1G, 1H and 1I need to have AhR siRNA, and siRNA control to demonstrate that the results in uremic toxins-containing serum-treated experiments were related to IS;

      We extend our gratitude to the reviewer for their invaluable comment, acknowledging its significant relevance to our study. In accordance with the reviewer's suggestion, we endeavored to conduct additional experiments utilizing AhR siRNA to elucidate the direct impact of IS present in the serum of end-stage renal disease (ESRD) patients on the induction of IS-mediated trained immunity. 

      Regrettably, owing to limitations in the availability of monocytes post-siRNA transfection, we were unable to establish a direct relationship between the observed outcomes in experiments utilizing uremic toxins-containing serum and IS in AhR siRNA knockdown monocytes. However, treatment with GNF351, an AhR antagonist, resulted in the inhibition of TNF-α production in trained monocytes exposed to uremic toxins-containing serum (Author response image 14).

      In our previous studies, we have already reported that uremic serum-induced TNF-α production in human monocytes is dependent on the AhR pathway, using GNF351 (Kim HY et al. Indoxyl sulfate (IS)-mediated immune dysfunction provokes endothelial damage in patients with end-stage renal disease (ESRD). Sci Rep. 2017 Jun 8;7(1):3057). Additionally, we have provided evidence of increased AhR pathway activity in monocytes derived from ESRD patients, as indicated by a significant reduction in AhR protein levels (Kim HY et al. Indoxyl sulfate-induced TNF-α is regulated by crosstalk between the aryl hydrocarbon receptor, NF-κB, and SOCS2 in human macrophages. FASEB J. 2019 Oct;33(10):10844-10858). It is noteworthy that other major protein-bound uremic toxins (PBUTs), such as PCS, HA, IAA, and KA, failed to induce trained immunity in human monocytes (Supplementary Figure 1A in the revised manuscript). Moreover, knockdown of AhR via siRNA effectively impeded the induction of IS-mediated trained immunity in human monocytes (Figure 4E in the revised manuscript).

      Taken collectively, our findings suggest a critical role for IS present in the serum of ESRD patients in the induction of trained immunity in human monocytes. 

      Author response image 14.

      Inhibition of uremic serum (US)-induced trained immunity by AhR antagonist, GNF351. Monocytes were pre-treated with or without GNF351 (AhR antagonist; 10 µM) for 1 hour, followed by treatment with pooled normal serum (NS) or uremic serum (US) at a concentration of 30% (v/v) for 24 hours. After a resting period of 5 days, cells were stimulated with LPS for 24 hours. The production of TNF-α and IL-6 in the supernatants was quantified using ELISA. The data presented are derived from three independent experiments utilizing samples from different donors.

      (2) Fig. 3 needs to be moved as Fig. 2

      We express appreciation for the constructive suggestion provided by the reviewer. In response to the reviewer's comment, the sequence of Figure 3 and Figure 2 was adjusted in the revised manuscript.

      (3, 4) The connection between bioenergetic metabolism pathways and H3K4me3 was missing; The connection between bioenergetic metabolism pathways and ALOX5 was missing;

      We appreciate the reviewer’s constructive criticism and fully understand the reviewer's points. In response to the reviewer's feedback, we conducted additional experiments employing appropriate inhibitors to elucidate the interrelation between bioenergetic metabolism and H3K4me3 and between bioenergetic metabolism and ALOX5. First, we assessed the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci on day 6 after treatment with 2-DG, a glycolysis inhibitor. Additionally, we evaluated the alteration in the activity of S6K, a downstream molecule of mTORC1, following treatment with zileuton, an inhibitor of ALOX5. Our findings indicate that AhR-dependent arachidonic acid (AA) signaling induces epigenetic modifications, albeit without inducing metabolic rewiring, in IS-induced trained immunity (Author response image 15). However, IS stimulation promotes mTORC1-mediated glycolysis in an AhR-independent manner. Notably, inhibition of glycolysis with 2-DG impacts epigenetic modifications. We have updated Figure 7 of the revised manuscript to incorporate these additional experimental findings, clarifying the relationships among the diverse mechanisms implicated in IS-induced innate immune memory (Fig. 7 in the revised manuscript).

      Author response image 15.

      Epigenetic modification is regulated by the arachidonic acid (AA) pathway and metabolic rewiring, but metabolic rewiring is not affected by the AA pathway. A-B. Monocytes were pre-treated with zileuton (ZLT), an inhibitor of ALOX5, or 2-DG, a glycolysis inhibitor, followed by stimulation with IS for 24 hours. After a resting period of 5 days, the enrichment of H3K4me3 on the promoters of the TNFA and IL6 loci was assessed. Normalization was performed using 2% input. C. Monocytes were pre-treated with zileuton (ZLT) and stimulated with IS for 24 hr. Cell lysates were immunoblotted for phosphorylated S6 kinase, with β-actin serving as a normalization control. Band intensities in the immunoblots were quantified using densitometry. D. A schematic representation of the mechanistic framework underlying IS-trained immunity. Bar graphs show the mean ± SEM. * = p < 0.05, ** = p < 0.01, and *** = p < 0.001 by two-tailed paired t-test.

      (5) It was unclear whether histone acetylations such as H3K27acetylation and H3K14 acetylation are involved in IS-induced epigenetic reprogramming or IS-induced trained immunity is highly histone methylation-specific.

      We appreciate the constructive comment provided by the reviewer. As highlighted by the reviewer, alterations in epigenetic histone marks, specifically H3K4me3 or H3K27ac, have been recognized as the underlying molecular mechanism in trained immunity. Due to limitations in the availability of trained cells, this study primarily focused on histone methylation. In response to the reviewer's inquiry, we briefly investigated the impact of histone acetylation on IS-induced trained immunity using C646, a histone acetyltransferase inhibitor (Author response image 16). Our experiments revealed that C646 treatment effectively hinders the production of TNF-α and IL-6 by IS-trained monocytes in response to LPS stimulation, comparable to the effects observed with MTA (5’-methylthioadenosine), a non-selective methyltransferase inhibitor. This suggests that histone acetylation also contributes to the epigenetic modifications associated with IS-induced trained immunity. We sincerely appreciate the valuable input from the reviewer.

      Author response image 16.

      The role of histone acetylation in epigenetic modifications in IS-induced trained immunity. Monocytes were pretreated with MTA (methylthioadenosine, methyltransferase inhibitor) or C646 (histone acetyltransferase p300 inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, trained cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      Reviewer #3 (Public Review):

      The manuscript entitled, "Uremic toxin indoxyl sulfate induces trained immunity via the AhR-dependent arachidonic acid pathway in ESRD" demonstrates that indoxyl sulfate (IS) induces trained immunity in monocytes via epigenetic and metabolic reprogramming, resulting in augmented cytokine production. The authors conducted well-designed experiments to show that the aryl hydrocarbon receptor (AhR) contributes to IS-trained immunity by enhancing the expression of arachidonic acid (AA) metabolism-related genes such as arachidonate 5-lipoxygenase (ALOX5) and ALOX5 activating protein (ALOX5AP). Overall, this is a very interesting study that highlights that IS-mediated trained immunity may have deleterious outcomes in augmented immune responses to the secondary insult in ESRD. Key findings would help to understand accelerated inflammation in CKD or ESRD.

      We greatly appreciate your positive feedback.

      Reviewer #3 (Recommendations for The Authors):

      This reviewer, however, has the following concerns.

      Major comments:

      (1) Figure 1B: IS is known to induce the expression of TNF-a and IL-6. This reviewer wonders why these molecules were not detected in the IS (+) LPS (-) condition.

      We appreciate the constructive comment provided by the reviewer. In our prior investigation, it was observed that the expression of TNF-α and IL-6 was induced 24 hours after IS treatment in human monocytes and macrophages (Couper KN et al. IL-10: the master regulator of immunity to infection. J Immunol. 2008 May 1;180(9):5771-7). In adherence to the trained immunity protocol, the medium was replaced at 24 hours post-IS treatment to eliminate IS, with a subsequent change after the 5-day resting period. TNF-α and IL-6 would probably have accumulated and been detected in the IS (+) LPS (-) culture supernatant had the medium not been changed at these time points. Our primary objective, however, was to ascertain the role of IS in the induction of trained immunity, prompting an investigation into whether IS contributes to an increase in the production of TNF-α and IL-6 in response to LPS stimulation as a secondary insult.

      (2) 1' stimulus is IS followed by 2' stimulus LPS/Pam3. It would be interesting to know what the immune profile is when other uremic toxin is used for secondary insult, this would be more relevant in clinical context of ESRD.

      The reviewer's insightful comment is greatly appreciated. To address this feedback, IS-trained macrophages were subjected to additional stimulation using protein-bound uremic toxins (PBUTs) as a secondary challenge. As illustrated in Author response image 17, the examined uremic toxins, namely p-cresyl sulfate (PCS), hippuric acid (HA), indole 3-acetic acid (IAA), and kynurenic acid (KA), failed to elicit the production of proinflammatory cytokines, specifically TNF-α and IL-6, by IS-trained monocytes.

      Author response image 17.

      No obvious effect of protein-bound uremic toxins (PBUTs) as secondary insults on the production of proinflammatory cytokines in IS-trained monocytes. IS-trained monocytes were re-stimulated with several PBUTs, namely IS (1 mM), PCS (1 mM), HA (2 mM), IAA (0.5 mM), and KA (0.5 mM), as a secondary challenge for 24 hrs. TNF-α and IL-6 in supernatants were quantified by ELISA. Data from two independent experiments with different donors are shown. ND indicates ‘not detected’.

      (3) The authors need to explain a rationale why RNA and protein data used different markers.

      We appreciate the constructive input provided by the reviewer. Given that TNF-α and IL-6 represent prototypical cytokines synthesized by trained monocytes in humans, we conducted a comprehensive analysis of their mRNA and protein levels. In human macrophages, the release of active IL-1β necessitates a second priming event, such as the presence of ATP. Consequently, we posited that assessing the mRNA levels of IL-1β would suffice to demonstrate the induction of trained immunity in our experimental protocol. Nevertheless, in response to the reviewer's comment, we proceeded to assess the protein levels of IL-1β, IL-10, and MCP-1, as illustrated in Author response image 18. These data have been incorporated into the revised manuscript as Supplementary Figure 1E.

      Author response image 18.

      Modulation of cytokine levels in IS-trained macrophages in response to secondary stimulation with LPS. Human monocytes were stimulated with IS for 24 hr, followed by a resting period of 5 days. On day 6, the cells were re-stimulated with LPS for 24 hr. The levels of each cytokine in the supernatants were quantified using ELISA. Bar graphs show the mean ± SEM. ** = p < 0.01 and *** = p < 0.001 by two-tailed paired t-test.

      (4) Epigenetic modification primarily involves histone modification and DNA methylation. The authors presented convincing data on histone modification (Figure 2), but did not provide any insights in the promoter DNA methylation status.

      We express our gratitude to the reviewer for providing valuable comments, which highlight a crucial aspect of our study. Despite the well-established primary role of DNA methylation in epigenetic modifications, recent suggestions propose that histone modifications, particularly H3K4me3 or H3K27ac, play a predominant role in the induction of trained immunity. In this context, our primary inquiry was focused on determining whether IS, as an endogenous insult, induces trained immunity in monocytes, and if so, whether IS-trained immunity is mediated through metabolic and epigenetic modifications - recognized as the major mechanisms underlying the generation of trained immunity. It is imperative to note that our study's primary objective did not encompass the identification of various epigenetic changes. In response to the reviewer's inquiry, we conducted a brief examination of the impact of DNA methylation using ZdCyd (5-aza-2’-deoxycytidine), a DNA methylation inhibitor, on IS-induced trained immunity. Our experimental findings indicate that ZdCyd treatment exerts no discernible effect on the production of TNF-α and IL-6 by IS-trained monocytes upon stimulation with LPS, as illustrated in Author response image 19. However, a recent study has shed light on the role of DNA methylation in BCG vaccine-induced trained immunity in human monocytes (Bannister S et al. Neonatal BCG vaccination is associated with a long-term DNA methylation signature in circulating monocytes. Sci Adv. 2022 Aug 5;8(31):eabn4002). Consequently, further investigations utilizing DNA methylation sequencing are warranted to elucidate whether DNA methylation is implicated in the induction of IS-trained immunity.

      Author response image 19.

      The effect of DNA methylation on IS-induced trained immunity. Monocytes were pretreated with ZdCyd (5-aza-2’-deoxycytidine, DNA methylation inhibitor), followed by treatment with IS 1 mM for 24 hrs. After resting for 5 days, cells were re-stimulated by LPS 10 ng/ml as secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and **= p < 0.01 by two-tailed paired t-test.

      (5) Metabolic rewiring in trained immunity cells undergo metabolic changes which involved intertwined pathways of glucose and cholesterol metabolism. The authors presented nice data on glucose pathway (Figure 3) but failed to show any changes related to cholesterol metabolism.

      We express our gratitude to the reviewer for providing valuable comments, which underscore a noteworthy observation. In the current investigation, our primary emphasis has been on glycolytic reprogramming, recognized as a principal mechanism for inducing trained immunity in monocytes. This focus stems from preliminary experiments wherein Fluvastatin, a cholesterol synthesis inhibitor, demonstrated no discernible impact on TNF-α production by IS-trained monocytes, as illustrated in Author response image 20. Intriguingly, Fluvastatin treatment exhibited a partial inhibitory effect on the production of IL-6 by IS-trained monocytes. Subsequent investigations are imperative to elucidate the role of cholesterol metabolism in the induction of IS-trained immunity.

      Author response image 20.

      The effect of cholesterol metabolism on IS-induced trained immunity. Monocytes were pretreated with Fluvastatin (cholesterol synthesis inhibitor, HMG-CoA reductase inhibitor), followed by treatment with 1 mM IS for 24 hrs. After resting for 5 days, cells were re-stimulated with 10 ng/ml LPS as a secondary insult. TNF-α and IL-6 in supernatants were quantified by ELISA. Bar graphs show the mean ± SEM. * = p < 0.05 and ** = p < 0.01 by two-tailed paired t-test.

      (6) Trained immunity involves neutrophils in addition to monocyte/macrophages. It is evident from the RNAseq data that neutrophil degranulation (Figure 5B) is the top enriched pathway. This reviewer wonders why the authors did not perform any assays on neutrophils.

      We thank the reviewer for this valuable comment. IS represents a major uremic toxin that accumulates in the serum of patients with chronic kidney disease (CKD), correlating with CKD progression and the onset of CKD-related complications, including cardiovascular diseases (CVD). Our prior investigations have demonstrated that IS promotes the production of TNF-α and IL-1β by human monocytes and macrophages. Additionally, macrophages pre-treated with IS exhibit a significant augmentation in TNF-α production when exposed to a low dose of lipopolysaccharide (LPS). Considering the pivotal role of proinflammatory macrophages and TNF-α, a principal cardiotoxic cytokine, in CVD pathogenesis, this study has primarily focused on elucidating the trained immunity of monocytes/macrophages. Consequently, all experiments were conducted using highly purified monocytes and monocyte-derived macrophages from both healthy controls and end-stage renal disease (ESRD) patients. The reviewer's observation regarding the potential involvement of neutrophils in trained immunity has been duly noted. Further investigations will be needed to explore the possible role of IS-trained neutrophils in the pathogenesis of CVD. Once again, we thank the reviewer for the valuable comment.

      (7) Figure 5C (GSEA plots): This reviewer is not sure if one can present the plots assigned with groups (eg. IS(T) vs Control). More details are required in the Methods related to this.

      We apologize for any ambiguity resulting from the previously unclear description of the methods concerning Gene Set Enrichment Analysis (GSEA) plots. To clarify, additional details on this aspect have been added to the Methods section of the revised manuscript.

      (8) In vivo data (Figure 6 I-M): Instead of serum profile and whole set of spleen myeloid cells, it would be interesting to see changes of markers on peritoneal macrophages or bone marrow-derived macrophages since the in vitro findings are on monocyte-derived macrophages.

      We appreciate the comment and the insightful suggestion provided by the reviewer. In response to the reviewer's feedback, we conducted additional in vivo experiments to examine the production of TNF-α and IL-6 in bone marrow-derived macrophages (BMDMs) derived from IS-trained mice. Upon LPS stimulation, we observed an increase in the production of TNF-α and IL-6 in spleen myeloid cells from IS-trained mice. However, no such increase in these cytokines was noted in BMDMs derived from the same mice (Author response image 21, A and B). In fact, we had already observed that the expression of ALOX5 was not elevated in BM cells derived from IS-trained mice, as presented in Figure 6L and M of the original manuscript (Author response image 21C).

      Recent studies have indicated that trained immunity can be induced in circulating immune cells, such as monocytes or resident macrophages (peripheral trained immunity), as well as in hematopoietic stem and progenitor cells (HSPCs) within the bone marrow (central trained immunity) (Kaufmann E et al. BCG Educates Hematopoietic Stem Cells to Generate Protective Innate Immunity against Tuberculosis. Cell. 2018 Jan 11;172(1-2):176-190.e19; Riksen NP et al. Trained immunity in atherosclerotic cardiovascular disease. Nat Rev Cardiol. 2023 Dec;20(12):799-811). It is plausible that central trained immunity in BM progenitor cells may not be elicited in our mouse model, which is relatively acute in nature. Further investigations are warranted to explore the role of IS in inducing central trained immunity, utilizing appropriate chronic disease models.

      We have included these additional data as supplementary figures in the revised manuscript (Suppl. Fig. 7, D and E, and lines 355-362 on page 16 of the revised manuscript).

      Author response image 21.

      Absence of trained immunity in bone marrow-derived macrophages (BMDMs) derived from IS-trained mice. A-B, IS was intraperitoneally injected daily for 5 days, followed by training for another 5 days. Isolated BM progenitor cells and spleen myeloid cells were differentiated or treated with LPS for 24 hr. The supernatants were collected for ELISA. C, The level of ALOX5 protein in BM cells isolated from IS-trained or control mice was analyzed by western blot. The graph illustrates the band intensity quantified by densitometry. Bar graphs show the mean ± SEM. * = p < 0.05 and **= p < 0.01, by unpaired t-test.

      (9) Figure 7: There are no data on signaling pathway(s) that links IS and epigenetic changes, the authors therefore may want to add "?" to the proposed mechanism.

      We extend our sincere appreciation to the reviewer for providing valuable feedback. In light of the constructive comments provided by three reviewers, we have undertaken a series of additional experiments. These efforts have enabled us to propose a more elucidating schematic representation of the proposed mechanism, free of any ambiguous elements (Figure 7 in the revised manuscript). We are grateful for your insightful input.

      (10) Demographic data (Table S2): ESRD patients have co-morbidities including diabetes (33% of subjects), CAD (28%). How did the authors factor out the co-morbidities in the overall context of their findings?

      We express gratitude to the reviewer for providing valuable comments, particularly on a noteworthy and significant aspect. The investigation employed an End-Stage Renal Disease (ESRD) Cohort involving approximately 60 subjects undergoing maintenance hemodialysis at Severance Hospital in Seoul, Korea. The subset of participants subjected to analysis consisted of stable individuals who provided informed consent and had not undergone hospitalization for reasons related to infection or acute events within the preceding three months.

      (11) There are no data on the purity of IS.

      According to the reviewer's suggestion, we have included information regarding the purity (99%) of IS in the Methods section.

      (12) Figure 6L: Immunoblot on b-actin were merged. This reviewer wonders how the authors analyzed these blots. 

      We express gratitude for the constructive criticism provided by the reviewer, and we acknowledge and comprehend the concerns raised. In response to the reviewer's comments, a reanalysis of the ALOX5 expression level in Figure 6M was conducted, employing immunoblot analysis on β-actin, as depicted in Figure 6L, with a short exposure time (Author response image 22).

      Author response image 22.

      ALOX5 protein exhibited an elevation in splenic myeloid cells obtained from IS-trained mice.

      (13) qPCR data throughout the manuscript have control group with no error bar. The authors may not set all controls arbitrarily equal to 1 (Example Figure 1H and I). Data should be normalized in a test standard way. The average of a single datapoint may be scaled to 1, but variation must remain within the control groups.

      We express gratitude to the reviewer for their valuable feedback, acknowledging a comprehensive understanding of their perspectives. Our qPCR assays predominantly investigated the impact of various treatments on the expression of specific target genes (e.g., TNF-α, IL-6, Alox5) within monocytes/macrophages obtained from the same donors.

      Subsequently, normalization of gene expression levels occurred relative to ACTINB expression, followed by relative fold-increase determination using the comparative CT method (ΔΔCT).

      Statistical significance was assessed through a two-tailed paired analysis in these instances. Additionally, a substantial portion of the qPCR data was validated at the protein level through ELISA and immunoblotting techniques.
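      The comparative CT calculation mentioned above can be sketched as follows. This is a minimal illustration with hypothetical Ct values (not data from the manuscript), and the choice of the control-group mean ΔCt as calibrator is an assumption made here. With that choice the control fold changes have a geometric mean of 1 while their donor-to-donor spread is retained, which is one common way to address the reviewer's point about error bars on the control group.

```python
from statistics import geometric_mean, mean, stdev


def fold_changes(ct_target, ct_reference, calibrator_dct):
    """Return per-sample fold changes, 2 ** -(dCt - calibrator dCt)."""
    return [
        2 ** -((target - reference) - calibrator_dct)
        for target, reference in zip(ct_target, ct_reference)
    ]


# Hypothetical Ct values for three donors (illustrative only).
control_target = [24.1, 24.6, 23.8]   # target gene, untreated
control_ref = [17.0, 17.2, 16.9]      # reference gene, untreated
treated_target = [22.0, 22.7, 21.9]   # target gene, treated
treated_ref = [17.1, 17.1, 17.0]      # reference gene, treated

# Calibrator: mean dCt of the control group (an assumption of this sketch).
calibrator = mean(t - r for t, r in zip(control_target, control_ref))

control_fc = fold_changes(control_target, control_ref, calibrator)
treated_fc = fold_changes(treated_target, treated_ref, calibrator)

print("control fold changes:", [round(fc, 3) for fc in control_fc])
print("treated fold changes:", [round(fc, 3) for fc in treated_fc])
print("control geometric mean:", round(geometric_mean(control_fc), 3))
print("control SD:", round(stdev(control_fc), 3))
```

      Because each control sample is scaled by the group calibrator rather than by itself, the control bar keeps a non-zero standard deviation.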

      Minor Comments:

      (1) Molecular weight markers are missing in immunoblots throughout the manuscript.

      According to the reviewer's comment, molecular weight markers have been added to the immunoblots.

      (2)  ESRD should be spelled out in the title.

      According to the reviewer's comment, we spelled out ESRD in the title.

    1. eLife assessment

      This important study expands generally upon our understanding of the role of hnRNP proteins in lncRNA function through analysis of ASAR genes that are present on all chromosomes and of profound significance. The findings provide convincing evidence linking ASARs with the phenomenon of RNA retention on chromosomes, including X inactivation, thereby providing an expanded context for studies in these areas. This manuscript will be of interest to researchers studying gene regulation and the interactions and functional roles of hnRNP and lncRNAs.

    2. Reviewer #1 (Public Review):

      Summary:

      Thayer et al build upon their prior findings that ASAR long noncoding RNAs (lncRNAs) are chromatin-associated and are implicated in control of replication timing. To explore the mechanism of function of ASAR transcripts, they leveraged the ENCODE RNA binding protein eCLIP datasets to show that a 7kb region of ASAR6-141 is bound by multiple hnRNP proteins. Deletion of this 7kb region resulted in delayed chromosome 6 replication. Furthermore, ectopic integration of the ASAR6-141 7kb region into autosomes or the inactive X-chromosome also resulted in delayed chromosome replication. They then use RNA FISH experiments to show that knockdown of these hnRNP proteins disrupts ASAR6-141 localization to chromatin and in turn replication timing.

      Strengths:

      Given prior publications showing HNRNPU to be important for chromatin retention of XIST and Firre, this work expands upon our understanding on the role of hnRNP proteins in lncRNA function.

      Weaknesses:

      The work presented is mechanistically interesting; however, one must be careful not to overinterpret the data as showing that hnRNP proteins can regulate chromosome replication directly.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper reports a role for a substantial number of RNA binding proteins (RBPs), in particular hnRNPs, in the function of ASAR "genes". ASARs are (very) long, non-coding RNAs (lncRNAs) that control allelic expression imbalance (e.g.: mono-allelic expression) and replication timing of their resident chromosomes. These relatively novel "genes" have recently been identified on all human autosomes and are of broad significance given their critical importance for basic chromosomal functions and stability. However, the mechanism(s) of ASAR function remain unclear. ASARs exhibit some functional relatedness to Xist RNA, including persistent association of the expressed RNA with its resident chromosome, and similarities in the composition of RNA sequences associated with ASARs, in particular Line1 RNAs. Recent findings that certain hnRNPs control the chromosome territory retention of Cot1-bearing RNAs (which includes Line1) led the authors to test the hypothesis that hnRNPs might regulate ASARs.

      Specific new findings in this paper:

      - Analysis of eCLIP (RNA-protein interaction) ENCODE data shows numerous interactions of the ASAR6-141 RNA with RBPs, including hnRNPs (e.g.: HNRNPU) that have been implicated in the retention of RNAs within local chromosome territories.

      - most of these interactions can be mapped to a 7kb region of the 185kb ASAR6-141 RNA

      - deletion of this 7kb region is sufficient to induce the DMC/DRT phenotype associated with deletion of the entire ASAR region

      - ectopic integration into mouse autosomes of the 7kb region is sufficient to cause DMC/DRT of the targeted autosome, and a similar effect upon ectopic integration into inactive X. This raises the question about integration into the active X, which was not mentioned. Is integration into the active X observed? Is it possible that integration might alter Xist expression confounding this interpretation?

      - Knockdown of RBPs that bind the 7kb region causes dissociation of ASAR6-141 RNA from its chromosome territory, and, remarkably, dissociation of Xist RNA from inactive X, and mis-colocalization of the ASAR6-141 and Xist RNAs. Depletion of these RBPs causes DMC/DRT on all autosomes.

      Strengths:

      These are compelling results suggesting shared mechanism(s) in the regulation of ASARs and Xist RNAs by RBPs that bind Cot1 sequences in these lncRNAs. The identification of these RBPs as shared effectors of ASARs and Xist that are required for RNA territory localization mechanistically links previously independent phenomena.

      The data are convincing and support the conclusions. The replication timing method is low resolution and is only a relative measure but seems adequate for the task at hand. The FISH experiments are convincing. The quality of the images is impressive.

      Links to other subfields like X-inactivation and RNA association with chromosome territories provide novel context, new protein players, and new phenotypes to examine.

      Weaknesses:

      The exact effects of knockdown experiments are unclear and may be indirect, which is acknowledged.

      The mechanism is not much clearer than before.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major recommendations

      (1) In lines 42-44 (abstract), the authors state that "ASARs function as essential RNA scaffolds for the assembly of hnRNP complexes that help maintain the structural integrity of each mammalian chromosome". Similar conclusions are restated in lines 138-140. Based on the data presented, it is evident that ASARs localization on chromatin is dependent on hnRNPs. However, there is insufficient evidence to conclude that ASARs cause the assembly of hnRNP complexes or that these hnRNP complexes are directly responsible for the regulation of chromosome replication. Please revise your claims.

      We have modified the text as follows: “Our results further demonstrate the role that ASARs play during the temporal order of genome-wide replication, and we propose that ASARs function as essential RNA scaffolds for the assembly of hnRNP complexes that help maintain the structural integrity of each mammalian chromosome.”

      (2) In the analysis in Figure 1C- F, it is unclear why XIST is used as a comparison to ASAR6-141. A more meaningful control would be to show that hnRNPs preferentially bind ASAR6-141 relative to all expressed transcripts. Also, some panels are missing the y-axis label.

      We have genetically validated 8 different ASAR genes for their role in controlling chromosome-wide replication timing. The only other gene known to control chromosome-wide replication timing is XIST, which also encodes a chromosome-associated lncRNA. Our analysis of publicly available eCLIP data (and previous literature on XIST-binding proteins) showed substantial overlap between RBPs that associate with ASARs and XIST. Hence, we anticipated that at least some RBP knockdowns would affect both lncRNAs, despite their contrasting functions. In addition, we routinely use XIST RNA as a positive control in RNA FISH assays, as the XIST RNA FISH protocol represents a robust and well validated chromosomal RNA FISH procedure.

      y-axis labels have been added to Figure 1.

      (3) In Figure 2K&L, it would be beneficial to quantify and normalize the BrdU incorporation, as ectopic integration of the sense 7kb region appears to result in overall higher BrdU incorporation in all chromosomes, not just chromosome 5.

      There are two main aspects of the BrdU incorporation assay that we use: 1) The BrdU incorporation banding pattern on each chromosome is unique to that chromosome, and the banding pattern is also representative of the time during S phase when the BrdU incorporation occurred, i.e. we detect a different banding pattern if BrdU is incorporated in early S phase versus late S phase. 2) The amount of BrdU incorporation can be used to measure the synchrony between chromosome homologs, but only within the same cell. Thus, we generate a ratio of BrdU incorporation in chromosome homologs in individual cells, then compare the ratio of incorporation into each chromosome pair in multiple cells (see Figure 2B-E). The overall BrdU incorporation into the chromosomes of different cells is quite variable; however, the banding pattern and ratio of BrdU incorporation in chromosome homologs in individual cells is comparable, unless we have disrupted or ectopically integrated an ASAR. Given the variability in overall BrdU incorporation detected between different cells in the population this is not a useful readout for measuring synchronous versus asynchronous replication between chromosome homologs.
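      The within-cell homolog ratio described above can be illustrated with a short sketch. The intensity values, the variable names, and the reading of a ratio near 1 as synchronous replication are assumptions made for illustration only; they are not measurements or thresholds from the manuscript.

```python
def homolog_ratio(intensity_a, intensity_b):
    """BrdU incorporation of homolog A relative to homolog B in one cell."""
    return intensity_a / intensity_b


# Hypothetical (homolog A, homolog B) BrdU intensities per cell, arbitrary units.
control_cells = [(100, 98), (310, 300), (55, 57)]
integrated_cells = [(100, 160), (290, 480), (60, 95)]

control_ratios = [homolog_ratio(a, b) for a, b in control_cells]
integrated_ratios = [homolog_ratio(a, b) for a, b in integrated_cells]

# Total incorporation per cell varies widely between cells in this toy data,
# which is why the comparison is made on the within-cell ratio instead.
control_totals = [a + b for a, b in control_cells]

print("control totals:", control_totals)
print("control ratios:", [round(r, 2) for r in control_ratios])
print("integrated ratios:", [round(r, 2) for r in integrated_ratios])
```

      In this toy data the total signal differs several-fold between cells, yet the within-cell ratio stays close to 1 for the control cells and departs from 1 for the hypothetical integrated cells.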

      (4) hnRNP proteins can regulate multiple aspects of RNA processing other than chromatin retention. Hence, it would be beneficial to rule out alternative hypotheses as to what the hnRNP knockdowns do to ASAR6-141, for example by assessing changes in RNA levels or splicing upon knockdown of hnRNPs using qPCR.

      We agree that direct roles for any of the hnRNP/RBPs that are critical for ASAR RNA localization and replication timing have not been established. However, our findings, combined with the observation that cells depleted of HNRNPU show reduced origin licensing in G1 and reduced origin activation frequency during S phase (PMID: 34888666), support a role for HNRNPU, either directly or indirectly, in DNA replication. Furthermore, we also found that depletion of the DNA replication fork remodeler HLTF or the deubiquitinase UCHL5 also results in mis-localization of ASAR RNAs, and results in asynchronous replication of every autosome pair, indicating that ASAR RNA mis-localization and asynchronous replication are not simply a phenotype associated with hnRNP depletions. A full mechanistic understanding of the role that ASAR RNAs play in combination with this relatively large and diverse set of hnRNP/RBPs will require a better understanding of the direct roles that each protein, and any higher order complexes that contain these proteins, play in regulating DNA synthesis, splicing, transcription, chromatin structure and/or ASAR RNA localization.

      (5) Both the disruption and ectopic expression of the 7kb region result in delayed chromosome replication. Would one not expect there to be opposing effects on replication timing? Please discuss.

      One puzzling set of observations is that loss of function mutations and gain of function mutations of ASAR genes result in a similar delayed replication timing and delayed mitotic condensation phenotype. We have detected delayed replication timing in human cells following genetic knockouts (loss of function) of eight different ASAR genes located on 5 different autosomes. We have also detected delayed replication timing on mouse chromosomes expressing transgenes (gain of function) from three different ASAR genes (ASAR6, ASAR6-141, and ASAR15). The ASAR transgenes ranged in size from an ~180kb BAC, to an ~3kb PCR product. One possible explanation for these observations is that ectopic integration of ASAR transgenes functions in a dominant negative manner by interfering with the endogenous “ASARs” on the integrated chromosomes. Consistent with this possibility is that we recently identified ASAR candidate genes on every human autosome (PMC9588035). Our favored model is that expression of ASAR transgenes integrated into mouse chromosomes disrupts the function of endogenous ASARs by "out-competing" them for shared RBPs. We also point out that a similar ectopic integration assay, using Xist transgenes, has been an informative assay for characterization of Xist functions, including the ability to delay replication timing and induce gene silencing on autosomes (reviewed in PMID:19898525). One intriguing observation (yet largely ignored by the X inactivation field) is that deletion of the Xist gene on either the active or inactive X chromosomes in somatic cells results in delayed replication timing of the X chromosomes (PMC1667074; PMC1456779). Thus, both loss of function and gain of function mutations of Xist result in a similar delayed replication timing phenotype. Given these parallels between Xist and ASAR gene mutation phenotypes we were curious to test the consequences of ASAR gain of function on the inactive X chromosome. In this manuscript, we integrated the ~7kb ASAR6-141 transgene into the inactive X chromosome, and detected a delayed replication timing phenotype on the integrated X chromosome. We also detected an association between Xist and ASAR RNAs using RNA FISH in interphase cells (Figure 4A and 4B), which supports the observations that ASAR RNAs and XIST RNA are bound by a partially overlapping set of hnRNP/RBPs (Figure 1D-F), and is consistent with the model that ASAR transgenes disrupt function by competition for shared RBPs. Dissecting the roles of the hnRNP/RBPs that interact with both ASAR and XIST RNAs will undoubtedly give important insights into both XIST and ASAR function, and how these poorly understood chromosomal phenotypes are generated.

      Minor recommendations

      (1) In Figure 1G, it would be informative to show where the LINE-1 element within ASAR6-141 is located to get a sense of what hnRNP proteins bind to it.

      There are numerous LINE-1 elements within the ASAR6-141 gene. The ~7kb RBPD does not contain LINE-1 sequences. Therefore, we did not detect significant hnRNP/RBP eCLIP peaks within LINE-1 sequences.

      (2) The rationale for ectopic integration of the 7kb region into the inactive X-chromosome is unclear. Is there something unique about the replication of the inactive X or were you interested in seeing whether the 7kb region could escape X-inactivation?

      Given the parallels between Xist and ASAR gene mutation phenotypes, i.e. loss of function and gain of function result in delayed replication timing (see above), we were curious to test the consequences of ASAR gene gain of function on the inactive X chromosome. One possibility was reversal of X inactivation and a shift to earlier replication timing. However, we detected delayed replication timing on the inactive X, and an enhanced XIST RNA FISH signal that overlapped with the ASAR RNA. This speaks to the comment of Reviewer 2 questioning: "Is it possible that integration might alter Xist expression confounding this interpretation?" The enhanced XIST RNA FISH signal suggests that the delayed replication of the inactive X is not due to reduced expression of XIST RNA.

    1. Joint Public Review:

      Xie et al. propose that the asymmetric segregation of the NuRD complex is regulated in a V-ATPase-dependent manner, and plays a crucial role in determining the differential expression of the apoptosis activator egl-1 and thus critical for the life/death fate decision.

      Remaining concerns are the following:

      The authors should provide the point-by-point response to the following issues. In particular, authors should provide clear reasoning as to why they did not address some of the following comments in the previous revisions. The next response should be directly answering to the following concerns.

      (1) Discussion should be added regarding the criticism that NuRD asymmetric segregation is simply a result of daughter cell size asymmetry. It is perfectly fine that the NuRD asymmetry is due to the daughter cell size difference (still the nucleus within the bigger daughter would have more NuRD, which can determine the fate of daughter cells). Once the authors add this clarification, some criticisms about 'control' may become irrelevant.

      (2) ZEN-4 is a kinesin that predominantly associates with the midzone microtubules and a midbody during mitosis. Given that midbodies can be asymmetrically inherited during cell division, ZEN-4 is not a good control for monitoring the inheritance of cytoplasmic proteins during asymmetric cell division. Other control proteins, such as a transcriptional factor that predominantly localizes in the cytoplasm during mitosis and enters into nucleus during interphase, are needed to clarify the concern.

      As for the pHluorin experiments, symmetric inheritance of GFP and mCherry is not appropriate evidence for estimating the level of pHluorin during asymmetric Q cell division. This issue remains unsolved.

      (3) The Q-Q plot (quantile-quantile plot) in Figure S10 can be used for visually checking normality of the data, but it does not guarantee that the distribution of each sample is normal and has a standard deviation comparable to that of the other samples. I recommend that the authors show the actual statistical comparison P-values for each case. The authors also need to show the number of replicate experiments for each figure panel.

      The authors left inappropriate graphs in the revised manuscript. In Figure 3E, some error bars are disconnected and others are stuck in the bars. In Figure S4C, the LIN-53 in QR.a/p graph shows lines disconnected from error bars.

      I am a bit confused by the error bars in Figure 2B. Each dot represents a fluorescent intensity ratio of either HDA-1 or LIN-53 between the two daughter cells in a single animal. Plots are shown with mean and SEM, but several samples (for example, the left end) exhibit an SEM error bar very close to the range between min and max. I might be misunderstanding this graph, but I am concerned that Figure 2B may contain some errors in representing these data sets. I would like to ask the authors to provide all values in a table format so that the reviewers can verify the statistical tests and graph representation.

      (4) The authors still do not provide evidence that the increase in sAnxV::GFP and Pegl-1gfp or the increase in H3K27ac at the egl-1 gene in hda-1(RNAi) and lin-53(RNAi) animals is not a consequence of global effects on development. Indeed, the images provided in Figure S7B demonstrate that there are global effects in these animals. No causal interactions have been demonstrated.

      (5) Figure 4: Due to the lack of appropriate controls for the co-IP experiment (Fig. 4), I remain unconvinced of the claim that the NuRD complex and V-ATPase specifically interact. Concerning the co-IP, the authors now mention that the co-IP was performed three times: "Assay was performed using three biological replicates. Three independent biological replicates of the experiment were conducted with similar results." However, the authors did not use ACT-4::GFP or GFP alone as controls for their co-IP as previously suggested. This is critical considering that the evidence for a specific HDA-1::GFP - V-ATPase interaction is rather weak (compare interactions between HDA-1::GFP and V-ATPase subunits in Fig 4B with those of HDA-1::GFP and subunits of NuRD in Fig S8B).

      (6) Based on Fig 5E, it appears that Bafilomycin treatment causes pleiotropic effects on animals (see differences in HDA-1::GFP signal in the three rows). The authors now state: "Although BafA1-mediated disruption of lysosomal pH homeostasis is recognized to elicit a wide array of intracellular abnormalities, we found no evidence of such pleiotropic effects at the organismal level with the dosage and duration of treatment employed in this study". However, the 'evidence' mentioned is not shown. It is critical that the authors provide this evidence.

    1. eLife assessment

      The authors use an approach that is in principle useful, comparative meta-analysis, to contribute to our understanding of life history evolution. The advance remains limited, as both the meta-analysis and the theoretical model are incomplete, and proper statistical and mechanistic descriptions of the simulations are lacking. A major concern is that the interpretation does not properly take into account the effect of well-characterised complexities in the relationship between clutch size and fitness in birds.

    2. Reviewer #2 (Public Review):

      I have read the re-submission of the manuscript "The optimal clutch size revisited: separating individual quality from the parental survival costs of reproduction" by LA Winder and colleagues.

      I have to say that I am quite disappointed not to see any formalisation of the mechanism that the authors have in mind to explain the results they have and to draw general conclusions from it. In my original review, I strongly recommended "improving the theoretical component of the analysis by providing a solid theoretical framework before, from it, drawing conclusions. This, at a minimum, requires [...] most importantly a mechanistic model describing the assumed relationships."

      Without it, it is impossible to follow, agree or disagree with the authors and learn something from the meta-analysis other than: the clutch size-annual survival relationship has opposite slopes for manipulated and natural populations. Such a set of equations (which would replace pages of verbose text) is necessary not only for the readers to be able to understand the authors' points and the simplifying assumptions, but also for the authors to ensure that their conclusions are sound. For these reasons this is a central part of such studies, see, e.g. (Walker et al., 2008). This is supposedly replaced here by a figure (figure 5), whose top-left part reads: "Parental survival costs of reproduction constrain intra-specific reproduction" - "no the effect size on fig 4 is too small". Figure 4 is the output of simulations where the authors have incorporated the mean effect on survival rate per egg from the manipulated populations into a model where they compute R0 for various increases in the annual fertility rate, and related decreases in annual survival rates, showing that along the slow-fast gradient, for balanced survival-reproduction (certainly not far from R0=1), R0 is not affected (or very little) by change in fertility-survival along the trade-off. Nowhere on this figure do we have any information implying that survival costs of reproduction do not constrain intra-specific reproduction. It is actually possible to build a simple mechanistic model with a trade-off mechanism that strongly affects the LRS and its variance between individuals and that would produce the exact same figure.
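      To make concrete the kind of calculation described for Figure 4, the sketch below computes lifetime reproductive output under a linear survival cost per extra egg. It is purely illustrative: it is not the authors' simulation nor a model proposed in this review, and the functional form (constant annual survival, so that the expected number of breeding seasons is 1/(1 - s)), the parameter names and all numerical values are assumptions.

```python
def lifetime_output(clutch, base_clutch, base_survival, cost_per_egg, recruits_per_egg):
    """Expected lifetime recruits: annual fertility times expected breeding seasons.

    Annual survival drops linearly by `cost_per_egg` for each egg above
    `base_clutch`; with constant annual survival s, the expected number of
    breeding seasons is 1 / (1 - s).
    """
    survival = base_survival - cost_per_egg * (clutch - base_clutch)
    if not 0.0 <= survival < 1.0:
        raise ValueError("annual survival must lie in [0, 1)")
    return recruits_per_egg * clutch / (1.0 - survival)


def relative_output(base_clutch, base_survival, cost_per_egg, extra_eggs=1):
    """Lifetime output with `extra_eggs` added, relative to the unmanipulated case."""
    # Scale recruits per egg so that the unmanipulated population has R0 = 1.
    recruits_per_egg = (1.0 - base_survival) / base_clutch
    baseline = lifetime_output(
        base_clutch, base_clutch, base_survival, cost_per_egg, recruits_per_egg
    )
    enlarged = lifetime_output(
        base_clutch + extra_eggs, base_clutch, base_survival, cost_per_egg, recruits_per_egg
    )
    return enlarged / baseline


# Hypothetical "fast" and "slow" life histories with the same per-egg survival cost.
fast = relative_output(base_clutch=10, base_survival=0.4, cost_per_egg=0.01)
slow = relative_output(base_clutch=2, base_survival=0.9, cost_per_egg=0.01)
print("fast species, one extra egg:", round(fast, 3))
print("slow species, one extra egg:", round(slow, 3))
```

      Writing the assumptions down in this form makes explicit that the output is a ratio (equal to 1 when nothing is changed) and shows which terms, such as effects on offspring, are left out.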

      This is compounded in this manuscript by the constant verbosity, imprecision and outright mistakes, with a general confusion between magnitudes and variation of magnitudes, which makes it very hard to read. Let us just look at two examples illustrating my points. In the abstract, the statement "... revealed that reproduction presented negligible costs, except when reproductive effort was forced beyond the level observed within species, to that seen between species" means nothing: what is the level of reproductive effort seen between species? I suppose the authors mean "forced beyond the maximum level observed within species, to that seen between species" or something like that. The caption of figure 4 reads: "Selection differentials (i.e., the difference in lifetime reproductive output between hypothetical control and brood-manipulated populations)". This cannot be how it was calculated, however: the difference between equal things is 0, not 1. These errors and all the other imprecisions and lengthy definitions, some of which are almost impossible to fathom, are the direct result of trying at all costs not to use a single equation, the most important tool in the study of ecology and trade-offs in particular, in a paper on costs of reproduction.

    3. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this potentially useful study, the authors attempt to use comparative meta-analysis to advance our understanding of life history evolution. Unfortunately, both the meta-analysis and the theoretical model are inadequate and proper statistical and mechanistic descriptions of the simulations are lacking. Specifically, the interpretation overlooks the effect of well-characterised complexities in the relationship between clutch size and fitness in birds.

      Public Reviews:

      We would like to thank the reviewers for their helpful comments, which have been considered carefully and have been valuable in progressing our manuscript. The following bullet points summarise the key points and our responses, though our detailed responses to specific comments can be found below:

      - Two reviewers commented that our data were not made available. Our data were provided upon submission and during the review process but were not made accessible to the reviewers. Our data and code are available at https://doi.org/10.5061/dryad.q83bk3jnk.

      - The reviewers have highlighted that some of our methodology was unclear and we have added all the requested detail to ensure our methods can be easily understood.

      - The reviewers highlight the importance of our conclusions, but also suggest some interpretations might be missing and/or are incomplete. To make clear how we objectively interpreted our data and the wider consequences for life-history theory we provide a decision tree (Figure 5). This figure makes clear where we think the boundaries are in our interpretation and how multiple lines of evidence converge to the same conclusions.

      Reviewer #1 (Public Review):

      This paper falls in a long tradition of studies on the costs of reproduction in birds and its contribution to understanding individual variation in life histories. Unfortunately, the meta-analyses only confirm what we know already, and the simulations based on the outcome of the meta-analysis have shortcomings that prevent the inferences on optimal clutch size, in contrast to the claims made in the paper.

      There was no information that I could find on the effect sizes used in the meta-analyses other than a figure listing the species included. In fact, there is more information on studies that were not included. This made it impossible to evaluate the data-set. This is a serious omission, because it is not uncommon for there to be serious errors in meta-analysis data sets. Moreover, in the long run the main contribution of a meta-analysis is to build a data set that can be included in further studies.

      It is disappointing that two referees comment on data availability, as we supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      The main finding of the meta-analysis of the brood size manipulation studies is that the survival costs of enlarging brood size are modest, as previously reported by Santos & Nakagawa on what I suspect to be mostly the same data set.

      We disagree that the main finding of our paper is the small survival cost of manipulated brood size. The major finding of the paper, in our opinion, is that the effect sizes for experimental and observational studies are in opposite directions, therefore providing the first quantitative evidence to support the influential theoretical framework put forward by van Noordwijk and de Jong (1986), that individuals differ in their optimal clutch size and are constrained to reproducing at this level due to a trade-off with survival. We further show that while the manipulation experiments have been widely accepted to be informative, they are not in fact an effective test of whether within-species variation in clutch size is the result of a trade-off between reproduction and survival.

      The comment that we are reporting the same finding as Santos & Nakagawa (2012) is a misrepresentation of both that study and our own. Santos & Nakagawa found an effect of parental effort on survival only in males who had their clutch size increased, but no effect for males who had their clutch size reduced and no survival effect on females for either increasing or reducing parental effort. However, we found an overall reduction in survival for birds who had brood sizes manipulated to be larger than their original brood (for both sexes and mixed sex studies combined). In our supplementary information, we demonstrate that the overall survival effect of a change in reproductive effort is close to zero for males, negative (though non-significant) for females and significantly negative for mixed sexes (which are not included in the Santos & Nakagawa study). Please also note that the Santos & Nakagawa study was conducted over 10 years ago. This means we added additional data (L364-365). Furthermore, meta-analyses are an evolving practice and we also corrected and improved on the overall analysis approach (e.g. L358-359 and L393-397, and see detailed SI).

      The paper does a very poor job of critically discussing whether we should take this at face value or whether instead there may be short-comings in the general experimental approach. A major reason why survival cost estimates are barely significantly different from zero may well be that parents do not fully adjust their parental effort to the manipulated brood size, either because of time/energy constraints, because it is too costly and therefore not optimal, or because parents do not register increased offspring needs. Whatever the reason, as a consequence, there is usually a strong effect of brood size manipulation on offspring growth and thereby presumably their fitness prospects. In the simulations (Fig.4), the consequences of the survival costs of reproduction for optimal clutch size were investigated without considering brood size manipulation effects on the offspring. Effects on offspring are briefly acknowledged in the discussion, but otherwise ignored. Assuming that the survival costs of reproduction are indeed difficult to discern because the offspring bear the brunt of the increase in brood size, a simulation that ignores the latter effect is unlikely to yield any insight in optimal clutch size. It is not clear therefore what we learn from these calculations.

      The reviewer’s comment is somewhat of a paradox. We take the best studied example of the trade-off between reproductive effort and parental survival, a key theme in life history and the biology of ageing, and subject this to a meta-analysis. The reviewer suggests we should interpret our finding as if there must be something wrong with the method or studies we included, rather than considering that the original hypothesis could be false or inflated in importance. We do not consider questioning the premise of the data over questioning a favoured hypothesis to necessarily be the best scientific approach here. In many places in our manuscript, we question and address, at length, the underlying data and their interpretation (L116-117, L165-167, L202-204 and L277-282). Moreover, we make it clear that we focus on the trade-off between current reproductive effort and subsequent parental survival, while being aware that other trade-offs could counter-balance or explain our findings (discussed on L208-210 & L301-316). Note that it is also problematic, when you do not find the expected response, to search for an alternative that has not been measured. In the case here, of potential trade-offs, there are endless possibilities of where a trade-off might operate between traits. We purposefully focus on the one well-studied and most commonly invoked trade-off. We clearly acknowledge, though, that when all possible trade-offs are taken into account a trade-off on the fitness level can occur and cite two famous studies (Daan et al., 1990 and Verhulst & Tinbergen 1991) that have shown just that (L314-316).

      So whilst we agree with the reviewer that the offspring may incur costs themselves, rather than costs being incurred by the parents, the aim of our study was to test for a general trend across species in the survival costs of reproductive effort. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest.

      What we do appreciate from the reviewer’s comment is that the interpretation of our findings is complex. Even though our in-text explanation includes the caveats the reviewer refers to, and discusses them at length, their inter-relationships are hard to appreciate from a text format. To improve this presentation and for ease of reading, we have added a decision tree (Figure 5), which represents the logical flow from the hypothesis being tested through to the overall conclusion that can be drawn from our results. We believe this clarifies which conclusions our results support. We emphasise again that the theory that trade-offs between reproductive effort and parental survival are the major driver of variation in offspring production was not supported, even though it is the one that practitioners in the field would be most likely to invoke, and our result is important for this reason.

      There are other reasons why brood size manipulations may not reveal the costs of reproduction animals would incur when opting for a larger brood size than they produced spontaneously themselves. Firstly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Secondly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      First, our results did show a survival cost of reproduction for brood manipulations (L107-123, Figure 1, Table 1). Note, however, that much theory is built on the immediate costs of reproduction and, as such, these costs are likely overinterpreted, meaning that our overall interpretation still holds, i.e. “parental survival trade-off is not the major determinative trade-off in life history within-species” (Figure 5).

      We agree with the reviewer that lifetime manipulations could be even more informative than single-year manipulations. Unfortunately, there are currently too few studies available to be able to draw generalisable conclusions across species for lifetime manipulations. This is, however, the reason we used lifetime change in clutch size in our fitness projections, which the reviewer seems to have missed – please see methods line 466-468, where we explicitly state that this is lifetime enlargement. Of course, such interpretations do not include an accumulation of costs that is greater than the annual cost, but currently there is no clear evidence that such an assumption is valid. Such a conclusion can also not be drawn from the study on jackdaws by Boonekamp et al (2014) as the treatments were life-long and, therefore, cannot separate annual from accrued (multiplicative) costs that are more than the sum of the annual costs incurred. Note that we have now included specific discussion of this study in response to the reviewer (L265-269).

      Details of how the analyses were carried out were opaque in places, but as I understood the analysis of the brood size manipulation studies, manipulation was coded as a covariate, with negative values for brood size reductions and positive values for brood size enlargements (and then variably scaled or not to control brood or clutch size). This approach implicitly assumes that the trade-off between current brood size (manipulation) and parental survival is linear, which contrasts with the general expectation that this trade-off is not linear. This assumption reduces the value of the analysis, and contrasts with the approach of Santos & Nakagawa.

      We thank the reviewer for highlighting a lack of clarity in places in our methods. We have added additional detail to the methodology section (see “Study sourcing & inclusion criteria” and “Extracting effect sizes”) in our revised manuscript. Note that our data and code were not shared with the reviewers, despite us supplying them upon submission and again during the review process; these would have explained much of the detail required.

      For clarity in our response, each effect size was extracted by performing a logistic regression with survival as a binary response variable and clutch size as the predictor, where clutch size was the absolute number of offspring in the nest (i.e., for a bird that laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). The clutch size was also standardised and, separately, expressed as a proportion of the species’ mean.

      We disagree that our approach reduces the value of our analysis. First, our approach allows a direct comparison between experimental and observational studies, which is the novelty of our study. Our approach does differ from Santos & Nakagawa but we disagree that it contrasts. Our approach allows us to take into consideration the severity of the change in clutch size, which Santos & Nakagawa do not. Therefore, we do not agree that our approach is worse at accounting for non-linearity of trade-offs than the approach used by Santos & Nakagawa. Arguably, the approach by Santos & Nakagawa is worse, as they dichotomise effort as increased or decreased, factorise their output and thereby inflate their number of outcomes, of which only 1 cell of 4 categories is significant (for males and females, increased and decreased brood size). The proof is in the pudding as well, as our results clearly demonstrate that the magnitude of the manipulation is a key factor driving the results, i.e. one offspring for a seabird is a larger proportion of care (and fitness) than one offspring for a passerine. Such insights were not achieved by Santos & Nakagawa’s method and, again, did not allow a direct quantitative comparison between quality (correlational) and experimental (brood size manipulation, i.e. “trade-off”) effects, which forms a central part of our argumentation (Figure 5). 

      Our analysis, alongside a plethora of other ecological studies, does assume that the response to our predictor variable is linear. However, it is common knowledge that there are very few (if any) truly linear relationships. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets the range of added chicks required to estimate a non-linear relationship was not available. The question also remains of what shape such a non-linear relationship should take, which is hard to determine a priori. There is also a real risk when fitting non-linear terms that they are spurious and overinterpreted, as they often present a better fit (denoting one df is not sufficient especially when slopes vary). We have added this detail to our discussion.

      The observational study selection is not complete and apparently no attempt was made to make it complete. This is a missed opportunity - it would be interesting to learn more about interspecific variation in the association between natural variation in clutch size and parental survival.

      We clearly state in our manuscript that we deliberately tailored the selection of studies to match the manipulation studies (L367-369). We paired species extracted for observational studies with those extracted in experimental studies to facilitate a direct comparison between observational and experimental studies, and to ensure that the respective datasets were comparable. The reviewer’s focus in this review seems to be solely on the experimental dataset. This comment dismisses the equally important observational component of our analysis and thereby fails to acknowledge one of the key questions being addressed in this study. Note that in our revised version we have edited the phylogenetic tree to indicate for which species we have both types of information, which highlights our approach to selecting observational data (Figure 3).

      Reviewer #2 (Public Review):

      I have read with great interest the manuscript entitled "The optimal clutch size revisited: separating individual quality from the costs of reproduction" by LA Winder and colleagues. The paper consists in a meta-analysis comparing survival rates from studies providing clutch sizes of species that are unmanipulated and from studies where the clutch sizes are manipulated, in order to better understand the effects of differences in individual quality and of the costs of reproduction. I find the idea of the manuscript very interesting. However, I am not sure the methodology used allows to reach the conclusions provided by the authors (mainly that there is no cost of reproduction, and that the entire variation in clutch size among individuals of a population is driven by "individual quality").

      We would like to highlight that we do not conclude that there is no cost of reproduction. Please see lines 336–339, where we state that our lack of evidence for trade-offs driving within-species variation in clutch size does not necessarily mean the costs of reproduction are non-existent. We conclude that individuals are constrained to their optima by the survival cost of reproduction. It is also an over-statement of our conclusion to say that we believe that variation in clutch size is only driven by quality. Our results show that unmanipulated birds that have larger clutch sizes also lived longer, and we suggest that this is evidence that some individuals are “better” than others, but we do not say, nor imply, that no other factors affect variation in clutch size. We have added Figure 5 to our manuscript to help the reader better understand what questions we can answer with our study and what conclusions we can draw from our results.

      I write that I am not sure, because in its current form, the manuscript does not contain a single equation, making it impossible to assess. It would need at least a set of mathematical descriptions for the statistical analysis and for the mechanistic model that the authors infer from it.

      We appreciate this comment, and have explained our methods in terms that are accessible to a wider audience. Note, however, that our meta-analysis is standard and based on logistic regression and standard meta-analytic practices. We have added the model formula to the model output tables.

      For the simulation, we simply simulated the resulting effects. We of course supplied our code for this along with our manuscript (https://doi.org/10.5061/dryad.q83bk3jnk), though as we mentioned above, we believe this was not shared with the reviewers despite us making this available for the review process. We therefore understand why the reviewer feels the simulations were not explained thoroughly. We have revised our methods section and added details which we believe make our methodology more clear without needing to consult the supplemental material. However, we have also added the equations used in the process of calculating our simulated data to the Supplementary Information for readers who wish to have this information in equation form.

      The text mixes concepts of individual vs population statistics, of within-individual vs among-individual measures, of allocation trade-offs and fitness trade-offs, etc., which means it would also require a glossary of the definitions the authors use for these various terms, in order to be evaluated.

      We would like to thank the reviewer for highlighting this lack of clarity in our text. Throughout the manuscript we have refined our terminology and indicated where we are referring to the individual level or the population level. The inclusion of our new Figure 5 (decision tree) should also help in this context, as it is clear on which level we base our interpretation and conclusions.

      This problem is emphasised by the following sentence to be found in the discussion "The effect of birds having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation". The "effect" is defined as the survival rate (see Fig 1). While it is relatively easy to intuitively understand what the "effect" is for the unmanipulated studies: the sensitivity of survival to clutch size at the population level, this should be mentioned and detailed in a formula. Moreover, the concept of effect size is not at all obvious for the manipulated ones (effect of the manipulation? or survival rate whatever the manipulation (then how could it measure a trade-off ?)? at the population level? at the individual level ?) despite a whole appendix dedicated to it. This absolutely needs to be described properly in the manuscript.

      Thank you for identifying this sentence for which the writing was ambiguous, our apologies. We have now rewritten this and included additional explanation. L282-290: ‘The effect on parental annual survival of having naturally larger clutches was significantly opposite to the result of increasing clutch size through brood manipulation, and quantitatively similar. Parents with naturally larger clutches are thus expected to live longer and this counterbalances the “cost of reproduction” when their brood size is experimentally manipulated. It is, therefore, possible that quality effects mask trade-offs. Furthermore, it could be possible that individuals that lay larger clutches have smaller costs of reproduction, i.e. would respond less in terms of annual survival to a brood size manipulation, but with our current dataset we cannot address this hypothesis (Figure 5).’

      We would also like to thank the reviewer for bringing to our attention the lack of clarity about the details of our methodology. We have added details to our methodology (see “Extracting effect sizes” section) to address this (see highlighted sections). For clarity, the effect size for both manipulated and unmanipulated nests was survival, given the brood size raised. We performed a logistic regression with survival as a binary response variable (i.e., number of individuals that survived and number of individuals that died after each breeding season), and clutch size was the absolute value of offspring in the nest (i.e., for a bird that laid a clutch size of 5 but was manipulated to have -1 egg, we used a clutch size value of 4). This allows for direct comparison of the effect size (survival given clutch size raised) between manipulated and unmanipulated birds.
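
      As an illustration only, and not the code deposited with the manuscript, the effect-size extraction described above can be sketched in Python. The sketch fits logit(survival) = b0 + b1 × brood size raised to grouped counts of survivors and deaths by Newton-Raphson. All counts below are hypothetical, and the function name `fit_logistic` is an assumption introduced for this example.

```python
import math

def fit_logistic(brood_sizes, survived, died, iterations=50):
    """Fit logit(p) = b0 + b1 * brood_size to grouped survival counts.

    Returns (b0, b1, standard error of b1). Newton-Raphson on the
    binomial log-likelihood; a sketch, not a general-purpose fitter.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) and information matrix entries (h00, h01, h11)
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, d in zip(brood_sizes, survived, died):
            n = s + d
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)
            g0 += s - n * p
            g1 += (s - n * p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * step = g
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    # Var(b1) is the (1, 1) entry of the inverse information matrix,
    # taken from the last iteration (the final step is negligible).
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1

# Hypothetical data: brood size raised is the absolute number of offspring
# in the nest (e.g. a clutch of 5 with one egg removed is recorded as 4).
brood_raised = [3, 4, 5, 6, 7]
survived = [30, 28, 25, 22, 18]   # parents alive after the breeding season
died = [20, 22, 25, 28, 32]       # parents that died

b0, b1, se_b1 = fit_logistic(brood_raised, survived, died)
print(f"slope (log-odds of survival per offspring): {b1:.3f} (SE {se_b1:.3f})")
```

      The slope `b1` is the per-offspring change in the log-odds of parental survival, so the same quantity can be estimated for manipulated and unmanipulated nests and compared directly.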

      Despite the lack of information about the underlying mechanistic model tested and the statistical model used, my impression is still that the interpretation in the introduction and discussion is not granted by the outputs of the figures and tables. Let's use a model similar to that of (van Noordwijk and de Jong, 1986): imagine that the mechanism at the population level is

      a.c_(i,q)+b.s_(i,q)=E_q

      Where c_(i,q) and s_(i,q) are respectively the clutch size and the survival rate for individual i which is of quality q, and E_q is the level of "energy" that an individual of quality q has available during the given time-step (and a and b are constants turning the clutch size and survival rate into energy cost of reproduction and energy cost of survival, and they are both quite "high" so that an extra egg (c_(i,q) is increased by 1) at the current time-step, decreases s_(i,q) markedly (E_q is independent of the number of eggs produced), that is, we have strong individual costs of reproduction). Imagine now that the variance of c_(i,q) (when the population is not manipulated) among individuals of the same quality group, is very small (and therefore the variance of s_(i,q) is very small also) and that the expectation of both are proportional to E_q. Then, in the unmanipulated population, the variance in clutch size is mainly due to the variance in quality. And therefore, the larger the clutch size c_(i,q) the higher E_q, and the higher the survival s_(i,q).

      In the manipulated populations however, because of the large a and b, an artificial increase in clutch size, for a given E_q, will lead to a lower survival s_(i,q). And the "effect size" at the population level may vary according to a,b and the variances mentioned above. In other words, the costs of reproduction may be strong, but be hidden by the data, when there is variance in quality; however there are actually strong costs of reproduction (so strong actually that they are deterministic and that the probability to survive is a direct function of the number of eggs produced)
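
      The argument above can be illustrated with a small simulation of the allocation equation a.c + b.s = E_q. This is a sketch under stated assumptions, not an analysis from the manuscript: the values of a and b, the range of E_q, the 50% allocation to reproduction, the noise in clutch size and the manipulation sizes are all arbitrary choices made for this example.

```python
import random

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)
A, B = 0.05, 1.0   # assumed energy cost per egg (a) and per unit survival (b)
N = 5000           # number of simulated individuals

natural_clutch, natural_survival = [], []
deltas, manipulated_survival = [], []
for _ in range(N):
    energy = random.uniform(0.6, 1.0)   # E_q: varies with individual quality
    # Unmanipulated clutch: about half the energy goes to eggs, with a
    # small amount of variation among individuals of the same quality.
    clutch = 0.5 * energy / A + random.gauss(0.0, 0.2)
    natural_clutch.append(clutch)
    natural_survival.append((energy - A * clutch) / B)
    # Brood manipulation: eggs added or removed while E_q stays fixed.
    delta = random.choice([-2, -1, 1, 2])
    deltas.append(delta)
    manipulated_survival.append((energy - A * (clutch + delta)) / B)

natural_slope = ols_slope(natural_clutch, natural_survival)
manipulation_slope = ols_slope(deltas, manipulated_survival)
print(f"unmanipulated: survival vs clutch size slope = {natural_slope:+.3f}")
print(f"manipulated:   survival vs eggs added slope  = {manipulation_slope:+.3f}")
```

      Under these assumptions the among-individual slope is positive because both clutch size and survival scale with E_q, while the slope against the number of eggs added is close to -a/b, since each added egg is paid for out of a fixed energy budget.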

      We would like to thank the reviewer for these comments. We have added detail to our methodology section so our models and rationale are more clear. Please note that our simulations only take the experimental effect of brood size on parental survival into account. Our model does not incorporate quality effects. The reviewer is right that the relationship between quality and the effects exposed by manipulating brood size can take many forms and this is a very interesting topic, but not one we aimed to tackle in our manuscript. In terms of quality we make two points: (1) overall quality effects connecting reproduction and parental survival are present, (2) these effects are opposite in direction to the effects when reproduction is manipulated and similar in magnitude. We do not go further than that in interpreting our results. The reviewer is correct, however, that we do suggest and repeat suggestions by others that quality can also mask the trade-off in some individuals or circumstances (L74-76, L95-98 & L286-289), but we do not quantify this, as it is dependent on the unknown relationship between quality and the response to the manipulation. A focussed set of experiments in that context would be interesting and there are some data that could get at this, i.e. the relationship between produced clutch size and the relative effect of the manipulation (now included L287-290). Such information is, however, not available for all studies and, although we explored the possibility of analysing this, currently this is not possible with adequate confidence and there is the possible complexity of non-linear effects. We have added this rationale in our revision (L259-265).

      Moreover, it seems to me that the costs of reproduction are a concept closely related to generation time. Looking beyond the individual allocative (and other individual components of the trade-off) cost of reproduction and towards a populational negative relationship between survival and reproduction, we have to consider the intra-population slow fast continuum (some types of individuals survive more and reproduce less (are slower) than other (which are faster)). This continuum is associated with a metric: the generation time. Some individuals will produce more eggs and survive less in a given time-period because this time-period corresponds to a higher ratio of their generation time (Gaillard and Yoccoz, 2003; Gaillard et al., 2005). It seems therefore important to me, to control for generation time and in general to account for the time-step used for each population studied when analysing costs of reproduction. The data used in this manuscript is not just clutch size and survival rates, but clutch size per year (or another time step) and annual (or other) survival rates.

      The reviewer is right that this is interesting. There is a longstanding unexplained difference in temperate (seasonal) and tropical reproductive strategies. Most of our data come from seasonal breeders, however. Although there is some variation in second brooding and such, these species mostly only produce one brood. We do agree that a wider consideration here is relevant, but we are not trying to explain all of life history in our paper. It is clearly the case that other factors will operate and the opportunity for trade-offs will vary among species according to their respective life histories. However, our study focuses on the two most fundamental components of fitness, longevity and reproduction, to test a major hypothesis in the field, and we uncover new relationships that contrast with previous influential studies and cast doubt on previous conclusions. We question the assumed trade-off between reproduction and annual survival. We show that quality is important and that the effect we find in experimental studies is so small that it can only explain between-species patterns but is unlikely to be the selective force that constrains reproduction within species. We do agree that there is a lot more work that can be done in this area. We hope we are contributing to the field, by questioning this central trade-off. We have incorporated some of the reviewer’s suggestions in the revision (L309-315). We have added Figure 5 to make clear where we are able to reach solid conclusions and the evidence on which these are based as clearly as possible in an easily accessible format.

      Finally, it is important to relate any study of the costs of reproduction in a context of individual heterogeneity (in quality for instance), to the general problem of the detection of effects of individual differences on survival (see, e.g., Fay et al., 2021). Without an understanding of the very particular statistical behaviour of survival, associated to an event that by definition occurs only once per life history trajectory (by contrast to many other traits, even demographic, where the corresponding event (production of eggs for reproduction, for example) can be measured several times for a given individual during its life history trajectory).

      Thank you for raising this point. The reviewer is right that heterogeneity can dampen or augment selection. Note that by estimating the effect of quality here we give an example of how heterogeneity can possibly do exactly this. We thank the reviewer for raising that we should possibly link this to wider effects of heterogeneity and we have added to our discussion of how our results play into the importance of accounting for among-individual heterogeneity (L252-256).

      References:

      Fay, R. et al. (2021) 'Quantifying fixed individual heterogeneity in demographic parameters: Performance of correlated random effects for Bernoulli variables', Methods in Ecology and Evolution, 2021(August), pp. 1-14. doi: 10.1111/2041-210x.13728.

      Gaillard, J.-M. et al. (2005) 'Generation time: a reliable metric to measure life-history variation among mammalian populations.', The American naturalist, 166(1), pp. 119-123; discussion 124-128. doi: 10.1086/430330.

      Gaillard, J.-M. and Yoccoz, N. G. (2003) 'Temporal Variation in Survival of Mammals: a Case of Environmental Canalization?', Ecology, 84(12), pp. 3294-3306. doi: 10.1890/02-0409.

      van Noordwijk, A. J. and de Jong, G. (1986) 'Acquisition and Allocation of Resources: Their Influence on Variation in Life History Tactics', American Naturalist, p. 137. doi: 10.1086/284547.

      Reviewer #3 (Public Review):

      The authors present here a comparative meta-analysis designed to detect evidence for a reproduction/survival trade-off, central to expectations from life history theory. They present variation in clutch size within species as an observation in conflict with expectations of optimisation of clutch size and suggest that this may be accounted for from weak selection on clutch size. The results of their analyses support this explanation - they found little evidence of a reproduction - survival trade-off across birds. They extrapolated from this result to show in a mathematical model that the fitness consequences of enlarged clutch sizes would only be expected to have a significant effect on fitness in extreme cases, outside of normal species' clutch size ranges. Given the centrality of the reproduction-survival trade-off, the authors suggest that this result should encourage us to take a more cautious approach to applying concepts of the trade-off in life history theory and optimisation in behavioural ecology more generally. While many of the findings are interesting, I don't think the argument for a major re-think of life history theory and the role of trade-offs in fitness maximisation is justified.

      The interest of the paper, for me, comes from highlighting the complexities of the link between clutch size and fitness, and the challenges facing biologists who want to detect evidence for life history trade-offs. Their results highlight apparently contradictory results from observational and experimental studies on the reproduction-survival trade-off and show that species with smaller clutch sizes are under stronger selection to limit clutch size.

      Unfortunately, the authors interpret the failure to detect a life history trade-off as evidence that there isn't one. The construction of a mathematical model based on this interpretation serves to give this possible conclusion perhaps more weight than is merited on the basis of the results of this, necessarily quite simple, meta-analysis. There are several potential complicating factors that could explain the lack of detection of a trade-off in these studies, which are mentioned and dismissed as unimportant (lines 248-250) without any helpful, rigorous discussion. I list below just a selection of complexities which perhaps deserve more careful consideration by the authors to help readers understand the implications of their results:

      We would like to thank the reviewer for their thoughtful response and summary of the findings that we also agree are central to our study. The reviewer also highlights areas where our manuscript could benefit from a deeper consideration and we have added detail accordingly to our revised discussion.

      We would like to highlight that we do not interpret the failure to detect a trade-off as evidence that there is not one. First, and importantly, we do find a trade-off but show this is only incurred when individuals produce a clutch beyond their optimal level. Second, we also state on lines 322-326 that the lack of evidence to support trade-offs being strong enough to drive variation in clutch size does not necessarily mean there are no costs of reproduction.

      The statement that we have constructed a mathematical model based on the interpretation that we have not found a trade-off is, again, factually incorrect. We ran these simulations because the opposite is true – we did find a trade-off. There is a significant effect of clutch size when manipulated on annual parental survival. We benefit from our unique analysis allowing for a quantitative fitness estimate from the effect size on annual survival (as this is expressed on a per-egg basis). This allowed us to ask whether this quantitative effect size can alone explain why reproduction is constrained, and we evaluate this using simulations. From these simulations we find that this effect size is too small to explain the constraint, so something else must be going on, and we do spend a considerable amount of text discussing the possible explanations (L202-215). Note that the possibly most parsimonious conclusion here is that costs of reproduction are not there, or simply small, so we also give that explanation some thought (L221-224 and L315-331).
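
      The reasoning in the paragraph above, asking whether a per-egg effect on annual survival is large enough to make a lifetime clutch enlargement unprofitable, can be sketched as follows. This is not the simulation code deposited on Dryad. It assumes constant annual survival, one breeding attempt per year and a geometric lifespan, and the baseline survival, clutch size and per-egg effects are hypothetical placeholders, not estimates from the meta-analysis. The function names are likewise introduced only for this example.

```python
import math

def annual_survival(baseline_survival, beta_per_egg, eggs_added):
    """Annual survival after shifting the log-odds by beta_per_egg per added egg."""
    logit = math.log(baseline_survival / (1.0 - baseline_survival))
    return 1.0 / (1.0 + math.exp(-(logit + beta_per_egg * eggs_added)))

def lifetime_output(clutch_size, baseline_survival, beta_per_egg, eggs_added):
    """Expected lifetime egg production when the enlargement applies every year.

    Assumes the bird breeds in its first season and then survives to each
    further season with constant probability p, so the expected number of
    breeding seasons is 1 / (1 - p).
    """
    p = annual_survival(baseline_survival, beta_per_egg, eggs_added)
    expected_seasons = 1.0 / (1.0 - p)
    return (clutch_size + eggs_added) * expected_seasons

# Hypothetical species: clutch of 5 and baseline annual survival of 0.5.
baseline = lifetime_output(5, 0.5, beta_per_egg=-0.05, eggs_added=0)
small_cost = lifetime_output(5, 0.5, beta_per_egg=-0.05, eggs_added=1)
large_cost = lifetime_output(5, 0.5, beta_per_egg=-1.0, eggs_added=1)

print(f"no enlargement:                   {baseline:.2f} eggs")
print(f"one extra egg, small per-egg cost: {small_cost:.2f} eggs")
print(f"one extra egg, large per-egg cost: {large_cost:.2f} eggs")
```

      With a small per-egg survival cost the extra egg more than compensates for the shorter expected lifespan, whereas a sufficiently large cost reverses this. The comparison therefore depends on the magnitude of the effect size and not only on its sign.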

      We are disappointed by the suggestion that we have dismissed complicating factors that could prevent detection of a trade-off, as this was not our intention. We were aiming to highlight that what we have demonstrated to be an apparent trade-off can be explained through other mechanisms, and that the trade-off between clutch size and survival is not as strong in driving within-species variation in clutch size as previously assumed. We have added further discussion to our revised manuscript to make this clear and give readers a better understanding of the complexity of factors associated with life-history theory, including the addition of a decision tree (Figure 5).

      • Reproductive output is optimised for lifetime reproductive success and so the consequences of being pushed off the optimum for one breeding attempt are not necessarily detectable in survival but in future reproductive success (and, therefore, lifetime reproductive success).

      We agree this is a valid point, which is mentioned in our manuscript in terms of alternative stages where the costs of reproduction might be manifested (L316-320). We would also like to highlight that, in our simulations, the change in clutch size (and subsequent survival cost) was assumed for the lifetime of the individual, for this very reason.
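
      To illustrate what assuming a clutch size change "for the lifetime of the individual" can mean, the following is a minimal sketch and not the simulation code used for the manuscript. It assumes a logistic relationship between clutch size and annual survival and a constant survival probability across years. The function names, the intercept and the per-egg slope are hypothetical values chosen only for illustration.

```python
import math


def annual_survival(clutch, intercept, slope_per_egg):
    """Annual survival probability from a logistic model.

    The log-odds of surviving to the next year change linearly with
    clutch size (slope expressed per egg).
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope_per_egg * clutch)))


def expected_lifetime_output(clutch, intercept, slope_per_egg):
    """Expected number of eggs over a lifetime when the same clutch is
    produced every year.

    With a constant annual survival s, the expected number of breeding
    seasons (counting the first) is the geometric sum 1 / (1 - s).
    """
    s = annual_survival(clutch, intercept, slope_per_egg)
    return clutch / (1.0 - s)


# Hypothetical parameter values, NOT estimates from the study.
INTERCEPT = 0.5
SLOPE_PER_EGG = -0.05

for clutch in range(4, 11):
    s = annual_survival(clutch, INTERCEPT, SLOPE_PER_EGG)
    output = expected_lifetime_output(clutch, INTERCEPT, SLOPE_PER_EGG)
    print(f"clutch={clutch:2d}  survival={s:.3f}  lifetime eggs={output:.2f}")
```

      Under these assumptions the survival cost is paid in every breeding season, so a sketch of this kind can show whether a given per-egg effect is large enough to offset the gain from laying more eggs.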

      • The analyses include some species that hatch broods simultaneously and some that hatch sequentially (although this information is not explicitly provided (see below)). This is potentially relevant because species which have been favoured by selection to set up a size asymmetry among their broods often don't even try to raise their whole broods but only feed the biggest chicks until they are sated; any added chicks face a high probability of starvation. The first point this observation raises is that the expectation of more chicks= more cost, doesn't hold for all species. The second more general point is that the very existence of the sequential hatching strategy to produce size asymmetry in a brood is very difficult to explain if you reject the notion of a trade-off.

      We agree with the reviewer that the costs of reproduction can be absorbed by the offspring themselves, and may not be equal across offspring (we also highlight this at L317-318 in the manuscript). However, we disagree that for some species the addition of more chicks does not equate to an increase in cost, though we do accept this might be less for some species. This is, however, difficult to incorporate into a sensible model as the impacts will vary among species and some species do also exhibit catch-up growth. So, without a priori knowledge on this, we kept our model simple to test whether the effect on parental survival (often assumed to be a strong cost) can explain the constraint on reproductive effort, and we conclude that it does not.

      We would also like to make clear that we are not rejecting the notion of a trade-off. Our study shows evidence that a trade-off between survival and reproductive effort probably does not drive within-species variation in clutch size. We do explicitly say this throughout our manuscript, and also provide suggestions of other areas where a trade-off may exist (L317-320). The point of our study is not whether trade-offs exist or not; it is whether there is a generalisable across-species trend for a trade-off between reproductive effort and survival – the most fundamental trade-off in our field but for which there is a lack of conclusive evidence within species. We believe the addition of Figure 5 to our revised manuscript also makes this more evident.

      • For your standard, pair-breeding passerine, there is an expectation that costs of raising chicks will increase linearly with clutch size. Each chick requires X feeding visits to reach the required fledge weight. But this is not the case for species which lay precocious chicks which are relatively independent and able to feed themselves straight after hatching - so again the relationship of care and survival is unlikely to be detectable by looking at the effect of clutch size but again, it doesn't mean there isn't a trade-off between breeding and survival.

      Precocial birds still provide a level of parental care, such as protection from predators. Though we agree that the level of parental care in provisioning food (and in some cases in all parental care given) is lower in precocial than altricial birds, this would only make our reported effect size for manipulated birds an underestimate. Again, we would like to draw the reviewer’s attention to the fact that we did detect a trade-off in manipulated birds and we do not suggest that trade-offs do not exist. The argument the reviewer suggests here does not hold for unmanipulated birds, as we found that birds that naturally lay larger clutch sizes have higher survival.

      • The costs of raising a brood to adulthood for your standard pair-breeding passerine is bound to be extreme, simply by dint of the energy expenditure required. In fact, it was shown that the basal metabolic rate of breeding passerines was at the very edge of what is physiologically possible, the human equivalent being cycling the Tour de France (Nagy et al. 1990). If birds are at the very edge of what is physiologically possible, is it likely that clutch size is under weak selection?

      If birds are at the very edge of what is physiologically possible, then indeed it would necessarily follow that if they increase the resource allocated in one area then expenditure in another area must be reduced. In many studies, however, the overall brood mass is increased when chicks are added and cared for in an experimental setting, suggesting that birds are not operating at their limit all the time. Our simulations show that if individuals increase their clutch size, the survival cost of reproduction counterbalances the fitness gained by increasing clutch size and so there is no overall fitness gain to producing more offspring. Therefore, selection on clutch size is constrained to the within-species level. We do not say in our manuscript that clutch size is under weak selection – we only ask why variation in clutch size is maintained if selection always favours high-producing birds.

      • Variation in clutch size is presented by the authors as inconsistent with the assumption that birds are under selection to lay the Lack clutch. Of course, this is absurd and makes me think that I have misunderstood the authors' intended point here. At any rate, the paper would benefit from more clarity about how variable clutch size has to be before it becomes a problem for optimality in the authors' view (lines 84-85; line 246). See Perrins (1965) for an exquisite example of how beautifully great tits optimise clutch size on average, despite laying between 5-12 eggs.

      We thank the reviewer for highlighting that our manuscript may be misleading in places; however, we are unsure which part of our conclusions the reviewer is referring to here. The question we pose is “Why don’t all birds produce a clutch size at the population optimum?”, which is central to the decades-long field of life-history theory. Why is variation maintained? As the reviewer outlines, there is extensive variability, with some birds laying half of what other birds lay.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Title: while the costs of reproduction are possibly important in shaping optimal clutch size, it is not clear what you can say about it given that you do not consider clutch / brood size effects on fitness prospects of the offspring.

      We have expanded on our discussion of how some costs may be absorbed by the offspring themselves. However, a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. We have focussed on the relationship between reproductive effort and survival because it is given the most weight in the field in terms of driving intra-specific variation in clutch size. We have altered our title to show we focus on the survival costs specifically: “The optimal clutch size revisited: separating individual quality from the parental survival costs of reproduction”.

      (2) L.11-12: I agree that this is true for birds, but this is phrased more generally here. Are you sure that that is justified?

      The trade-off between survival and reproductive effort has largely been tested experimentally through brood manipulations in birds as this provides a good system in which to test the costs and benefits of increasing parental effort. The work in this area has provided theory beyond just passerine birds, which are the most commonly manipulated group, to across-taxa theories. We are unaware of any study/studies that provide evidence that the reproduction/survival trade-off is generalisable across multiple species in any taxa. As such, we do believe this sentence is justified. An example is the lack of a consistent negative genetic correlation in populations of fruit flies, which has also been hailed as a lack-of-cost paradigm. Furthermore, some mutants that live longer do so without a cost on reproduction.

      (3) L.13-14: Not sure what you mean with this sentence - too much info lacking.

      We have added some detail to this sentence.

      (4) L.14: it is slightly awkward to say 'parental investment and survival' because it is the survival effect that is usually referred to as the 'investment'. Perhaps what you want to say is 'parental effort and survival'?

      We have replaced “parental investment” with “reproductive effort”.

      (5) L.15: you can omit 'caused'. Compared to control treatment or to reduced broods? Why not mention effects or lack thereof of brood reduction? And it would be good to also mention here whether effects were similar in the sexes.

      Please see our methodology where we state that we use clutch size as a continuous variable (we do not compare to control or reduced but include the absolute value of offspring in a logistic regression). The effects of a brood reduction are drawn from the same regression and so are opposite. Though we appreciate the detail here is lacking to fully comprehend our study, we would like to highlight this is the abstract and details are provided in the main text.
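
      As a hedged illustration of why enlargements and reductions can be read from one regression, the sketch below treats the change in offspring number as a continuous predictor on the log-odds scale. It is not the model fitted in the manuscript. The helper names, the baseline survival and the per-egg slope are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


def survival_after_manipulation(baseline_survival, slope_per_egg, eggs_changed):
    """Predicted survival after adding (positive) or removing (negative)
    offspring, with a single per-egg slope on the log-odds scale."""
    return inv_logit(logit(baseline_survival) + slope_per_egg * eggs_changed)


# Hypothetical values, NOT estimates from the study.
BASELINE = 0.5
SLOPE_PER_EGG = -0.1

enlarged = survival_after_manipulation(BASELINE, SLOPE_PER_EGG, +2)
reduced = survival_after_manipulation(BASELINE, SLOPE_PER_EGG, -2)
print(f"enlarged by 2: {enlarged:.3f}")
print(f"reduced by 2:  {reduced:.3f}")
```

      Because the slope is shared, an enlargement and a reduction of the same size shift the log-odds of survival by equal amounts in opposite directions.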

      (6) L. 15: I am not sure why you write 'however', as the finding that experimental and natural variation have opposite effects is in complete agreement with what is generally reported in the literature and will therefore surprise no one that is aware of the literature.

      We use “however” to highlight the change in direction of the effect size from the results in the previous sentence. We also believe that ours is the first study that provides a quantitative estimate of this effect and that previous work is largely theoretical. The reviewer states that this is what is generally reported but it is not reported in all cases, as some relationships between reproductive effort and survival are negative (for the quality measurement, in correlational space, see Figure 1).

      (7) L.16: saying 'opposite to the effect of phenotypic quality' seems difficult to justify, as clutch size cannot be equated with phenotypic quality. Perhaps simply say 'natural variation in clutch size'? If that is what you are referring to.

      Please note we are referring to effect sizes here, that is, the survival effect of a change in clutch size. By phenotypic quality we are referring to the fact that we find higher parental survival when natural clutch sizes are higher. It is not the case that we refer to quality only as having a higher clutch size. This is explicitly stated in the sentence you refer to. We have changed “effect” to “effect size” to highlight this further.

      (8) L.18: why do you refer to 'parental care' here? Brood size is not equivalent to parental care.

      Brood size manipulations are used to manipulate parental care. The effect on parental survival is expected to be incurred because of the increase in parental care. We have changed “parental care” to “reproductive effort” to reduce the number of terms we use in our manuscript.

      (9) L.18-19: suggest to tone down this claim, as this is no more than a meta-analytic confirmation of a view that is (in my view) generally accepted in the field. That does not mean it is not useful, just that it does not constitute any new insight.

      We are unaware of any other study which provides generalisable across-species evidence for opposite effects of quality and costs of reproduction. The work in this area is also largely theoretical and is yet to be supported experimentally, especially in a quantitative fashion. It is surprising to us that the reviewer considers there to be general acceptance in a field, rather than being influenced by rigorous testing of hypotheses, made possible by meta-analysis, the current gold standard in our field.

      (10) L.21: what does 'parental effort' mean here? You seem to use brood size, parental care, parental effort, and parental investment interchangeably but these are different concepts. Daan et al (1990, Behaviour), which you already cite, provide a useful graph separating these concepts. Please adjust this throughout the manuscript, i.e. replace 'reproductive effort' with wording that reflect the actual variable you use.

      We have not used the phrase “parental effort” in this sentence. We agree these are different concepts but in this context are intertwined. For example, brood size is used to manipulate parental care as a result of increased parental effort. We do agree the manuscript would benefit from consistent terminology and have adjusted this throughout.

      (11) L.23: perhaps add 'in birds' somewhere in this sentence? Some reference to the assumptions underlying this inference would also be useful. Two major assumptions being that birds adjusted their effort to the manipulation as they would have done had they opted for a larger brood size themselves, and that the costs of laying and incubating extra eggs can be ignored. And then there is the effect that laying extra eggs will usually delay the hatch date, which in many species reduces reproductive success.

      Though our study does exclusively use birds, birds have been used to test the survival/reproduction trade-off because they present a convenient system in which to experimentally test this. The conclusions from these studies have a broader application than in birds alone. We believe that although these details are important, they are not appropriate in the abstract of our paper.

      (12) L.26: how is this an explanation? It just repeats the finding.

      We intend to refer to all interpretations from all results presented in our manuscript. We have made this more clear by adjusting our writing.

      (13) L.27: I do not see this point. And 'reproductive output' is yet another concept, that can be linked to the other concepts in the abstract in different ways, making it rather opaque.

      We have changed “reproductive output” to “reproductive effort”.

      (14) L.33: here you are jumping from 'resources' to 'energetically' - it is not clear that energy is the only or main limiting resource, so why narrow this down to energy?

      We do not say energy is the only or main limiting resource. We simply highlight that reproduction is energetically demanding and so, intuitively, a trade-off with a highly energetically demanding process would be the focal place to observe a trade-off. We have, though, replaced “energetically” with “resource”.

      (15) L.35-36: this is new to me - I am not aware of any such claims, and effects on the residual reproductive value could also arise through effects on future reproduction. The authors you cite did not work on birds, or (in their own study systems) presented results that as far as I remember warrant such a general statement.

      The trade-off between reproduction and survival is seminal to the disposable soma theory, proposed by Kirkwood. Though Kirkwood’s work was largely not focussed on birds, it had fundamental implications for the field of evolutionary ecology because of the generalisable nature of his proposed framework. In particular, it has had wide-reaching influence on how the biology of aging is interpreted. The readership of the journal here is broad, and our results have implications for that field too. The work of Kirkwood (many of the papers on this topic have over 2000 citations each) has been perhaps overly influential in many areas, so a link to how that work should be interpreted is highly relevant. If the reviewer is interested in this topic the following papers by one of the co-authors and others could be of interest, some of which we could not cite in the main manuscript due to space considerations:

      https://www.science.org/doi/pdf/10.1126/sciadv.aay3047

      https://agingcelljournal.org/Archive/Volume3/stochasticity_explains_non_genetic_inheritance_of_lifespan/

      https://pubmed.ncbi.nlm.nih.gov/21558242/

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.13444

      https://www.nature.com/articles/362305a0

      https://www.cell.com/trends/ecology-evolution/fulltext/S0169-5347(12)00147-4

      https://www.cell.com/cell/pdf/S0092-8674(15)01488-9.pdf

      https://bmcbiol.biomedcentral.com/articles/10.1186/s12915-018-0562-z

      (16) L.42: this could be preceded with mentioning the limitations of observational data.

      We have added detail as to why brood manipulations are a good test for trade-offs and so this is now inherently implied.

      (17) L.42-43: why?

      We have added detail to this sentence.

      (18) L.45: do any of the references cited here really support this statement? I am certain that several do not - in these this statement is an assumption rather than something that is demonstrated. It may be useful to look at Kate Lessell's review on this that appeared in Etologia, I think in the 1990's. Mind however that 'reproductive effort' is operationally poorly defined for reproducing birds - provisioning rate is not necessarily a good measure of effort in so far as there are fitness costs.

      We have updated the references to support the sentence.

      (19) L.47: Given that you make this statement with respect to brood size manipulations in birds, it seems to me that the paper by Santos & Nakagawa is the only paper you should cite here. Given that you go on to analyze the same data it deserves to be discussed in more detail, for example to clarify what you aim to add to their analysis. What warrants repeating their analysis?

      Please first note that our dataset includes Santos & Nakagawa and additional studies, so it is not accurate to say we analyse the same data. Furthermore, we believe our study has implications beyond birds alone and so believe it is appropriate to cite the papers that do support our statement. We have added details to the methods to explicitly state what data is gathered from Santos & Nakagawa (it is only used to find the appropriate literature and data was re-extracted and re-analysed in a more appropriate way) and, separately, how we gathered the observational studies (see L352-381).

      (20) L.48: There are more possible explanations to this, which deserve to be discussed. For example, brood size manipulations may not have been that effective in manipulating reproductive effort - for example, effects on energy expenditure tend to be not terribly convincing. Secondly, the manipulations do not affect the effort incurred in laying eggs (which also biases your comparison with natural variation in clutch size). Thirdly, the studies by Boonekamp et al on Jackdaws found that while there was no effect of brood size manipulation on parental survival after one year of manipulation, there was a strong effect when the same individuals were manipulated in the same direction in multiple years. This could be taken to mean that costs are not immediate but delayed, explaining why single year manipulations generally show little effect on survival. It would also mean that most estimates of the fitness costs of manipulated brood size are not fit for purpose, because typically restricted to survival over a single year.

      Please see our response to this comment in the public reviews.

      Out of interest and because the reviewer mentioned “energy expenditure” specifically: There are studies that show convincing effects of brood size manipulation on parental energy expenditure. We do agree that there are also studies that show ceilings in expenditure. We therefore disagree that they “tend to be not terribly convincing”. Just a few examples:

      https://academic.oup.com/beheco/article/10/5/598/222025 (Figure 2)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/1365-2435.12321 (Figure 1)

      https://besjournals.onlinelibrary.wiley.com/doi/full/10.1046/j.1365-2656.2000.00395.x (but ceiling at enlarged brood).

      (21) L.48, "or, alternatively, that individuals may differ in quality": how do you see that happening when brood size is manipulated, and hence 'quality' of different experimental categories can be assumed to be approximately equal? This point does apply to observational studies, so I assume that that is what you had in mind, but that distinction should be clear (also on line 54).

      We have made it more clear that we determine if there are quality effects separate to the costs of reproduction found using brood manipulation studies.

      (22) L.50: Drent & Daan, in their seminal paper on "The prudent parent" (1980, Ardea) were among the earliest to make this point and deserve to be cited here.

      We have added this citation.

      (23) L.51, "relative importance": relative to what? Please be more specific.

      We have adjusted this sentence.

      (24) L.54: Vedder & Bouwhuis (2018, Oikos) go some way towards this point and should be explicitly mentioned with reference to the role of 'quality' effects on the association between reproductive output and survival.

      We have added this reference.

      (25) L.55: can you be more specific on what you want to do exactly? What you write here could be interpreted differently.

      We have added an explicit aim after this sentence to be more clear.

      (26) L.57: Here also a more specific wording would be useful. What does it mean exactly when you say you will distinguish between 'quality' and 'costs'?

      We have added detail to this sentence.

      (27) L.62: it should be clearer from the introduction that this is already well known, which will indirectly emphasize what you are adding to what we know already.

      We would argue this is not well known and has only been theorised but not shown empirically, as we do here.

      (28) L.62: you equate clutch size with 'quality' here - that needs to be spelled out.

      We refer to quality as the positive effect size of survival for a given clutch size, not clutch size alone. We appreciate this is not clear in this sentence and have reworded.

      (29) L.64: this looks like a serious misunderstanding to me, but in any case, these inferences should perhaps be left to the discussion (this also applies to later parts of this paragraph), when you have hopefully convinced readers of the claims you make on lines 62-63.

      We are unsure of what the reviewer is referring to as a misunderstanding. We have chosen this format for the introduction to highlight our results. If this is a problem for the editors we will change as required.

      (30) L.66: quantitative comparison of what?

      Comparison of species. We have changed the wording of this sentence.

      (31) L.67-69: this should be in the methods.

      We have used a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (32) L.74-88: suggest to (re)move this entire paragraph, presenting inferences in such an uncritical manner before presenting the evidence is inappropriate in my view. I have therefore refrained from commenting on this paragraph.

      We have chosen a modern format which highlights our result. We are happy to change the format should the editors wish us to.

      (33) L.271, "must detail variation in the number of raised young": it is not sufficiently clear what this means - what does 'detail' mean in this context? And what does 'number of raised young' mean? The number hatched or raised to fledging?

      We have now made this clear.

      (34) L271, "must detail variation in the number of raised young": looking at table S4, it seems that on the basis of this criterion also brood size manipulation studies where details on the number of young manipulated were missing are excluded. I see little justification for this - surely these manipulations can for example be coded as for example having the average manipulation size in the meta-analysis data set, thereby contributing to tests of manipulation effects, but not to variation within the manipulation groups?

      We have done in part what the reviewer describes. We are specifically interested in the manipulation size, so we required this to compare effect sizes across species and categories, a key advance of our study and outlined in many places in our manuscript. Note, however, that we only need comparative differences, and have used clutch size metrics more generally to obtain a mean clutch size for a species, as well as SD where required. Please also note that our supplement details exactly why studies were excluded from our analysis, as is the preferred practice in a meta-analysis.

      (35) L.271, "referred to as clutch size": the point of this simplification is not clear to me, while it is clearly confusing - why not refer to 'brood size' instead?

      Brood size and clutch size can be used interchangeably here because, in the observational studies, the individuals vary in the number of eggs produced, whereas for brood manipulations this obviously happens after hatching and brood is perhaps a more appropriate term, but we wanted to simplify the terminology used. However, we use clutch size throughout as the aim of our study is to determine why individuals differ in the number of offspring they produce, and so clutch size is the most appropriate term for that.

      (36) L.280: according to the specified inclusion criteria (lines 271/272) these studies should already be in the data set, so what does this mean exactly?

      Selection criteria refer to whether a given study should be kept for analysis or not. They do not refer to how studies were found. Please see lines 361-378 for details on how we found studies (additional details are also in the Supplementary Methods).

      (37) L.281: the use of 'quality' here is misleading - natural variation in clutch or brood size will have multiple causes, variation in phenotypic quality of the individuals and their environment (territories) is only one of the causes. Why not simply refer to what you are actually investigating: natural and experimental variation in brood size.

      We disagree: our study aims to separate quality effects from the costs of reproduction and we use observational studies to test for quality differences, though we make no inference about the mechanisms. We do not imply that the environment causes differences in quality, but that to directly compare observational and experimental groups, they should contain similar species. So, to be clear again, quality refers to the positive covariation of clutch size with survival. We feel that we explain this clearly in our study’s rationale and have also improved our writing in several sections on this to avoid any confusion (see responses to earlier comments by the three reviewers).

      (38) L.283, "in most cases": please be exact and say in xx out xx cases.

      We have added the number of studies for each category here.

      (39) L.283-285: presumably readers can see this directly in a table with the extracted data?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We believe the data are too large to include as a table in the main text and are not essential in understanding the paper. We do, however, believe all readers should have access to this information if they wish, and so it is publicly available.

      (40) L.293: there does not seem to be a table that lists the included studies and effect sizes. It is not uncommon to find major errors in such tables when one is familiar with the literature, and absence of this information impedes a complete assessment of the manuscript.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data are too large to include as a table in the main text and are not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      (41) L.293: from how many species?

      We have added this detail.

      (42) L.296, "longevity": this is a tricky concept, not usually reported in the studies you used, so please describe in detail what data you used.

      We have removed longevity as we did not use this data in our current version of the manuscript.

      (43) L. 298: again: where can I see this information?

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process but we believe this was not passed onto the reviewers.

      (44) L. 304, "we used raw data": I assume that for the majority of papers the raw data were not available, so please explain how you dealt with this. Or perhaps this applies to a selection of the studies only? Perhaps the experimental studies?

      By raw data, we mean the absolute value of offspring in the nest. We have changed the wording of this sentence and added detail on cases where the absolute value of offspring was not present for brood manipulation studies (L393-397).

      (45) L.304: When I remember correctly, Santos and Nakagawa examined effects of reducing and enlarging brood size separately, which is of importance because trade-off curves are unlikely to be linear and whether they are or not has major effects on the optimization process. But perhaps you tackled this in another way? I will read on.....

      You are correct that Santos & Nakagawa compared brood increases and reductions to control separately. Note that this only partially accounts for non-linearity and it does not take into account the severity of the change in brood size. By using a logistic regression of absolute clutch size, as we have done, we are able to directly compare brood manipulations with observational studies. Please see Supplementary Methods lines 11-12, where we have added additional detail as to why our approach is beneficial in this analysis.

      (46) L.319: what are you referring to exactly with "for each clutch size transformation"?

      We refer to the raw, standardised and proportional clutch size transformations. We have added detail here to be more clear.
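
      For readers unfamiliar with these terms, a minimal sketch of one common way to compute such transformations is given below. The exact definitions used in the manuscript are not restated in this response, so the formulas here are assumptions: standardised as the deviation from the mean in units of standard deviation, and proportional as the clutch size divided by the mean. The function name and the input values are hypothetical.

```python
from statistics import mean, stdev


def transform_clutch_sizes(clutch_sizes):
    """Return raw, standardised and proportional versions of clutch sizes.

    Assumed definitions (illustrative only):
      raw          - the absolute number of offspring
      standardised - (clutch - mean) / sample standard deviation
      proportional - clutch / mean
    """
    m = mean(clutch_sizes)
    sd = stdev(clutch_sizes)
    return {
        "raw": list(clutch_sizes),
        "standardised": [(c - m) / sd for c in clutch_sizes],
        "proportional": [c / m for c in clutch_sizes],
    }


# Hypothetical clutch sizes for a single species.
example = transform_clutch_sizes([4, 6, 8])
for name, values in example.items():
    print(name, [round(v, 3) for v in values])
```

      Expressing clutch size relative to a mean and standard deviation puts species with very different typical clutch sizes on a comparable scale.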

      (47) L.319: is there a cost of survival? Perhaps you mean 'survival cost'? This would be appropriate for the experimental data, but not for the observational data, where the survival variation may be causally unrelated to the brood size variation, even if there is a correlation.

      We have changed “cost of survival” to “effect of parental survival”. We only intend to imply causality for the experimental studies. For observational studies we do not suggest that increasing clutch size is causal for increasing survival, only correlative (and hence we use the phrase “quality”).

      (48) L.320: please replace "parental effort" with something like 'experimental change in brood size'.

      We have changed “parental effort” to “reproductive effort”.

      (49) L.321: due to failure of one or more eggs to hatch, and mortality very early in life, before brood sizes are manipulated, it is not likely that say an enlargement of brood size by 1 chick can be equated to the mean clutch size +1 egg / chick. For example, in the Wytham great tit study, as re-analysed by Richard Pettifor, a 'brood size manipulation' of unmanipulated birds is approximately -1, being the number of eggs / chicks lost between laying and the time of brood size manipulation. Would this affect your comparisons?

      Though we agree these are important factors in determining what a clutch/brood size actually is for a given individual/pair, as this can vary from egg laying to fledging, we do not believe that accounting for this (if it were possible to do so) would significantly affect our conclusions, as observational studies are comparable in that these birds would also likely experience early-life mortality of their offspring. It is also possible that parents already factor in this loss, and so a brood manipulation still changes the parental care effort an individual has to incur.

      (50) L.332: instead of "adjusted" perhaps say 'mean centred'?

      We have implemented this suggestion.

      (51) L.345: this statement surprised me, but is difficult to verify because I could not locate a list of the included studies. However, to my best knowledge, most studies reporting brood size manipulation effects on parental survival had this as their main focus, in contrast to your statement.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process but we believe this was not passed onto the reviewers by the journal, although supplied by us on several occasions. We regret that the reviewer was impeded by this unfortunate communication failure, but we did our best to make the data available to the reviewers during the initial review process.

      (52) L.361-362: this seems a realistic approach from an evolutionary perspective, but we know from the jackdaw study by Boonekamp that the survival effect of brood size manipulation in a single year is very different from the survival effect of manipulating as in your model, i.e. every year of an individual's life the same manipulation. For very short-lived species this possibly does not make much difference, but for somewhat longer-lived species this could perhaps strongly affect your results. This should be discussed, and perhaps also explored in your simulations?

      Note that the Boonekamp study does not separate whether the survival effects are additive or multiplicative. As such, we do not know whether the survival effects for a single year manipulation are just small and hard to detect, or whether the survival effects are multiplicative. Our simulations assumed that the brood enlargement occurred every year throughout their lives. We have added some text to the discussion on the point you raise.

      (53) L.360: what is "lifetime reproductive fitness"? Is this different from just "fitness"?

      We have changed “lifetime reproductive fitness” to “lifetime reproductive output”.

      (54) L.363: when you are interested in optimal clutch size, why not also explore effects of reducing clutch size?

      As we find that a reduction in clutch size leads to a reduction in survival (for experimental studies), we already know that these individuals would have a reduced fitness return compared to reproducing at their normal level, and so we would not learn anything from adding this into our simulations. The interest in using clutch size enlargements is to find out why an individual does not produce more offspring than it does, and the answer is that it would not have a fitness benefit (unless its clutch size and survival rate combination is out of the bounds of that observable in the wild).

      (55) Fig.1 - using 'parental effort' in the y-axis label is misleading, suggest to replace with e.g. "clutch or brood size". Using "clutch size" in the title is another issue, as the experimental studies typically changed the number of young rather than the number of eggs.

      We have updated the figure axes to say “clutch size” rather than “parental effort”. Please see response to comment 35 where we explain our use of the term “clutch size” throughout this manuscript.

      (56) L.93 - 108: I appreciate the analysis in Table 1, in particular the fact that you present different ways of expressing the manipulation. However, in addition, I would like to see the results of an analysis treating the manipulations as factor, i.e. without considering the scale of the manipulation. This serves two purposes. Firstly, I believe it is in the interest of the field that you include a detailed comparison with the results of Santos & Nakagawa's analysis of what I expect to be largely the same data (manipulation studies only - for this purpose I would also like to see a comparison of effect size between the sexes). Secondly, there are (at least) two levels of meta-analysis, namely quantifying an overall effect size, and testing variables that potentially explain variation in effect size. You are here sort of combining the two levels of analysis, but including the first level also would give much more insight in the data set.

      Our main intention here was to improve on how the same hypothesis was approached by Santos & Nakagawa. We did this by improving our analysis (on a by “egg” basis) and by adding additional studies (i.e. more data). In this process mistakes are corrected (as we re-extracted all data, and did not copy anything across from their dataset – which was used simply to ensure we found the same papers); more recent data were also added, including studies missed by Santos & Nakagawa. This means that the comparison with Santos & Nakagawa becomes somewhat irrelevant, apart from maybe technical reasons, i.e. pointing out mistakes or limitations in certain approaches. We would not be able to pinpoint these problems clearly without considering the whole dataset, yet Santos & Nakagawa only had a small subset of the data that were available to us. In short, meta-analysis is an iterative process and similar questions are inevitably analysed multiple times and updated. This follows basic meta-analytic concepts and Cochrane principles. Except where there is a huge flaw in a prior dataset or approach (like we sometimes found and highlighted in our own work, e.g. Simons, Koch, Verhulst 2013, Aging Cell), in itself a comparison of the kind the reviewer suggests distracts from the biology. With the dataset being made available others can make these comparisons, if required. On the sex difference, we provide a comparison of effect sizes separated between both sexes and mixed sex in Table S2 and Figure S1.

      (57) L.93 - 108: a thing that does not become clear from this section is whether experimentally reducing brood size affects parental survival similarly (in absolute terms) as enlarging brood size. Whether these effects are symmetric is biologically important, for example because of its effect on clutch size optimization. In the text you are specific about the effects of increasing brood size, but the effect you find could in theory be due entirely to brood size reduction.

      We have added detail to make it clear that a brood reduction is simply the opposite trend. We use linear relationships because they serve as a good approximation of the trend and provide a more rigorous test for an underlying relationship than would fitting nonlinear models. For many datasets there is not a range of chicks added for which a non-linear relationship could be estimated. The question also remains of what shape this non-linear relationship should take, which is hard to determine a priori.

      We have added some discussion on this to our manuscript (L278-282), in response to an earlier comment.

      (58) L.103-107: this is perhaps better deferred to the discussion, because other potential explanations should also be considered. For example, there have been studies suggesting that small birds were provisioning their brood full time already, and hence had no scope to increase provisioning effort when brood size was experimentally increased.

      We agree this is a discussion point but we believe it also provides an important context for why we ran our simulations, and so we believe this is best kept brief but in place. We agree the example you give is relevant but believe this argument is already contained in this section. See line 121-123 “...suggesting that costs to survival were only observed when a species was pushed beyond its natural limits”.

      (59) L.103-107: this discussion sort of assumes that the results in Table 1 differ between the different ways that the clutch/brood size variation is expressed. Is there any statistical support for this assumption?

      We are unsure of what the reviewer means here exactly. Note that in each of the clutch size transformations, experimental and observational effect sizes are significantly opposite. For the proportional clutch size transformation, experimental and observational studies are both separately significantly different from 0.

      (60) L.104: at this point, I would like to have better insight into the data set. Specifically, a scatter plot showing the manipulation magnitude (raw) plotted against control brood size would be useful.

      Our data and code can be accessed with the following link: https://doi.org/10.5061/dryad.q83bk3jnk. We did supply this information when we submitted our manuscript and again during the review process but we believe this was not passed onto the reviewers by the journal.

      Thank you for this suggestion, which is also useful for illustrating how manipulations are relatively stronger for species with smaller clutches, in line with our interpretation of the result presented in Figure 2. We have added Figure S1 which shows the strength of manipulation compared to the species average.

      (61) L. 107: this seems a bold statement - surely you can test directly whether effect size becomes disproportionally stronger when manipulations are outside the natural range, for example by including this characterization as a factor in the models in Table 1.

      It is hard to define exactly what the natural range is here, so it is not easy to factorise objectively, which is why we chose not to do this. However, it is clear that for species with small clutches the manipulation itself is often outside the natural range. Thank you for your suggestion to include a figure for this as it is clear manipulations are stronger in species with smaller clutches. We attribute this to species being forced outside their natural range. We consider our wording makes it clear that this is our interpretation of our findings and we therefore do not think this is a bold statement, especially as it fits with how we interpret our later simulations.

      (62) Fig.3, legend: the term 'node support' does not mean much to me, please explain.

      Node support is a value given in phylogenetic trees to indicate the confidence in a branch. In this case, values are given as a percentage and so can be read as how many times out of 100 the estimate of the phylogeny gives the same branching. Our values are low, as we have relatively few species in our meta-analysis.

      (63) Fig.3: it would be informative when you indicate in this figure whether the species contributed to the experimental or the observational data set or both.

      We have added into Fig 3 whether the species was observational, experimental or both.

      (64) L.139: the p-value refers to the interaction between species clutch size and treatment (observational vs. experimental), but it appears that no evidence is presented for the correlation being significant in either observational or experimental studies.

      We agree that our reporting of the effect size could be misinterpreted and have added detail here. The statistic provided indicates that the slopes are significantly different between observational and experimental studies, implying there are differences between the slopes of small and large clutch-laying species.

      (65) L.140: I am wondering to what extent these correlations, which are potentially interesting, are driven by the fact that species average clutch size was also used when expressing the manipulation effect. In other words, to what extent is the estimate on the Y-axis independent from the clutch size on the X-axis? Showing that the result is the same when using survival effect sizes per manipulation category would considerably improve confidence in this finding.

      We are unsure what the reviewer means by “per manipulation category”. Please also note that we have used a logistic regression to calculate our effect sizes of survival, given a unit increase in reproductive effort. So, for example, if a population contained birds that lay 2, 3 or 4 eggs, provided that the number of birds which survived and died in each category did not change, if we changed the number of eggs raised to 10, 11 or 12, respectively, then our effect size would be the same. In this way, our effect sizes are independent of the species’ average clutch size.
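
      The invariance described above can be illustrated with a minimal sketch. It assumes a plain two-parameter logistic regression fitted by Newton-Raphson, which is not necessarily the model specification used in the manuscript; the survival counts and the helper names (fit_logistic, expand) are hypothetical and chosen only for illustration.

```python
import math

def fit_logistic(x, y, iters=50):
    """Fit logit(p) = a + b*x by Newton-Raphson and return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Hessian of the negative log-likelihood
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b

# Hypothetical counts (not from the study): clutch size -> (survived, died)
counts = {2: (6, 4), 3: (5, 5), 4: (4, 6)}

def expand(counts, shift=0):
    """Turn grouped counts into one (clutch size, survived) row per bird."""
    x, y = [], []
    for clutch, (alive, dead) in counts.items():
        x += [clutch + shift] * (alive + dead)
        y += [1] * alive + [0] * dead
    return x, y

# Same survival counts, clutch sizes 2, 3, 4 versus 10, 11, 12
_, slope_small = fit_logistic(*expand(counts))
_, slope_large = fit_logistic(*expand(counts, shift=8))
print(slope_small, slope_large)
```

      In this sketch, shifting every clutch size by a constant changes only the fitted intercept; the slope, i.e. the change in log-odds of survival per additional egg, is unchanged.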

      (66) L.145: when I remember correctly, Santos & Nakagawa considered brood size reduction and enlargement separately. Can this explain the contrasting result? Please discuss.

      You are correct, in that Santos & Nakagawa compared reductions and enlargements to controls separately. However, we found some mistakes in the data extracted by Santos & Nakagawa that we believe explain the differences in our results for sex-specific effect sizes. We do not feel that highlighting these mistakes in the main text is fair, useful or scientifically relevant, as our approach is to improve the test of the hypothesis.

      (67) L.158-159: looking at table S2 it seems to me you have a whole range of estimates. In any case, there is something to be said for taking the estimates for females because it is my impression (and experience) that clutch size variation in most species is a sex-linked trait, in that clutch size tends to be repeatable among females but not among males.

      We agree that, in many cases, the female is the one that ultimately decides on the number of chicks produced. We did also consider using female effect sizes only, however, we decided against this for the following reasons: (1) many of the species used in our meta-analysis exhibit biparental care, as is the case for many seabirds, and so using females only would bias our results towards species with lower male investment; in our case this would bias the results towards passerine species. (2) it has also been shown that, as females in some species are operating at their maximum of parental care investment, it is the males who are able to adjust their workload to care for extra offspring. (3) we are ultimately looking at how many offspring the breeding adults should produce, given the effort it costs to raise them, and so even if the female chooses a clutch size completely independently of the male, it is still the effort of both parents combined that determines whether the parents gain an overall fitness benefit from laying extra eggs. (4) some studies did not clearly specify male or female parental survival and we would not want to reduce our dataset further.

      (68) L.158-168: please explain how you incorporated brood size effects on the fitness prospects of offspring, given that it is a very robust finding of brood size manipulation studies that this affects offspring growth and survival.

      We would argue this is near-on impossible to incorporate into our simulations. It is unrealistic to suggest that incorporating offspring growth into our simulations would add insight, as a change in offspring number rarely affects all offspring in the nest equally and there can even be quite stark differences; for example, this will be most evident in species that produce sacrificial offspring. This effect will be further confounded by catch-up growth, for example, and so it is likely that increased sibling competition from added chicks alters offspring growth trajectories, rather than absolute growth as the reviewer suggests. There are mixed results in the literature on the effect of altering clutch size on offspring survival, with an increased clutch size through manipulation often increasing the number of recruits from a nest. It would be interesting, however, to explore this further using estimates from the literature, but this is beyond our current scope, and would in our initial intuition not be very accurate. It would be interesting to explore how big the effect on offspring should be to constrain effect size strongly. Such work would be more theoretical. The point of our simple fitness projections here is to aid interpretation of the quantitative effect size we estimated.

      (69) L.163: while I can understand that you select the estimate of -0.05 for computational reasons, it has enormous confidence intervals that also include zero. This seems problematic to me. However, in the simulations, you also examined the results of selecting -0.15, which is close to the lower end of the 95% C.I., which seems worth mentioning here already.

      Thank you for this suggestion. Yes, indeed, our range was chosen based on the CI, and we have now made this explicit in the manuscript.

      (70) L.210: defined in this way, in my world this is not what is generally taken to be a selection differential. Is what you show not simply scaled lifetime reproductive success?

      As far as we are aware, a selection differential is the relative change between a given group and the population mean, which is what we have done here. We appreciate this is a slightly unusual context in which to place this, but it is more logical to consider the individuals who produce more offspring as carrying a potential mutation for higher productivity. However, we believe that “selection differential” is the best terminology for the statistic we present. We also detail in our methodology how we calculate this. We have adjusted this sentence to be more explicit about what we mean by selection differential.
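
      As a rough sketch of the definition given above (the relative change between a given group and the population mean), the calculation could look as follows. The function name and the numbers are hypothetical, and the exact calculation used in the manuscript is described in its methods rather than here.

```python
def relative_differential(group_mean, population_mean):
    """Relative change of a group's mean value from the population mean."""
    return (group_mean - population_mean) / population_mean

# Hypothetical lifetime reproductive output: 5.5 offspring for the group
# laying larger clutches versus a population mean of 5.0 offspring
differential = relative_differential(5.5, 5.0)
print(differential)
```

      A positive value would indicate that the group outperforms the population mean, and a value near zero that there is little or no benefit.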

      (71) L.177-180: is this not so because these parameter values are closest to the data you based your estimates on, which yielded a low estimate and hence you see that here also?

      We are unsure of what exactly the reviewer means here. The effect sizes for our exemplar species were predicted from each combination of clutch size and survival rate. Note that we used a range of effect sizes, higher than that estimated in our meta-analysis, to explore a large parameter space and that these same conclusions still hold.

      (72) L.191-194: these statements are problematic, because based on the assumption that an increase in brood size does not impact the fitness prospects of the offspring, and we know this assumption to be false.

      Though we appreciate that some cost is often absorbed by the offspring themselves, we are unaware of any evidence that these costs are substantial and large enough to drive within-species variation in reproductive effort, though for some specific species this may be the case. However, in terms of explaining a generalisable, across-species trend, the fitness costs incurred by a reduction in offspring quality are unlikely to be significantly larger than the survival costs to reproduce. We also find it highly unlikely the cost to fitness incurred by a reduction in offspring quality is large enough to counter-balance the effect of parental quality that we find in our observational studies. We do also discuss other costs in our discussion.

      (73) L.205: here and in other places it would be useful to be more explicit on whether in your discussion you are referring to observational or experimental variation.

      We have added this detail to our manuscript. Do note that many of our conclusions are drawn by the combination of results of experimental and observational studies. We believe the addition of Figure 5 makes this more clear to the reader.

      (74) L.225: this may be true (at least, when we overlook the misuse of the word 'quality' here), but I would expect some nuance here to reflect that there is no surprise at all in this result as this pattern is generally recognized in the literature and has been the (empirical) basis for the often-repeated explanation of why experiments are required to demonstrate trade-offs. On a more quantitative level, it is worth mentioning the paper of Vedder & Bouwhuis (2017, Oikos) that essentially shows the same thing, i.e. a positive association between reproductive output and parental survival.

      We have added some discussion on this point, including adding the citation mentioned. However, we would like to highlight that our results demonstrate that brood manipulations are not necessarily a good test of trade-offs, as they fail to recognise that individuals differ in their underlying quality. Though we agree that this result should not necessarily be a surprising one, we have also not found it to be the case that differences in individual quality are accepted as the reason that intra-specific clutch size is maintained – in fact, we find it is most commonly concluded that, when costs of reproduction are not identified, the costs must lie elsewhere – yet we cannot find conclusive evidence that the costs of reproduction (wherever they lie) are driving intra-specific variation in reproductive effort. Furthermore, some studies in our dataset have reported negative correlations between reproductive effort and survival (see observational studies, Figure 1).

      (75) L.225-226: perhaps present this definition when you first use the term.

      We have added more detail to where we first use and define this term to improve clarity (L57-58).

      (76) L.227-228, "currently unknown": this statement surprised me, given that there is a plethora of studies showing within-population variation in clutch size to depend on environmental conditions, in particular the rate at which food can be gathered.

      We mean to question that if an individual is “high quality”, why is it not selected for? We have rephrased, to improve clarity.

      (77) L.231: this seems no more than a special case of the environmental effect you mention above.

      We think this is a relevant special case, as it constitutes within-individual variation in reproduction that is mistaken for between-individual variation. This is a common problem in our field that we feel needs addressing. We only have between-individual variation here in our study on quality, and by highlighting this we show that there might not be any variation between individuals, but this could come about fully (doubtful) or partly (perhaps likely) due to terminal effects.

      (78) L235-236: but apparently depending on how experimental and natural variation was expressed? Please specify here.

      We are not sure what results the reviewer is referring to here, as we found the same effect (smaller clutch laying species are more severely affected by a change in clutch size) for both clutch size expressed as raw clutch size and standardised clutch size.

      (79) L.237: the concept of 'limits' is not very productive here, and it conflicts with the optimality approach you apply elsewhere. What you are saying here can also be interpreted as there being a non-linear relationship between brood size manipulation and parental survival, but you do not actually test for that. A way to do this would be to treat brood size reduction and enlargement separately. Trade-off curves are not generally expected to be linear, so this would also make more sense biologically than your current approach.

      We have replaced “limits” with “optima”. We believe our current approach of treating clutch size as a continuous variable, regardless of manipulation direction, is the best approach, as it allows us to directly compare with observational studies and between species that use different manipulations (now nicely illustrated by the reviewer’s suggested Figure S1). Also note that transforming clutch size to a proportion of the mean allows us to account for the severity in change in clutch size. We also do not believe that treating reductions and enlargements separately accounts for non-linearity, as either we are separating this into two linear relationships (one for enlargements and one for reductions) or we compare all enlargements/reductions to the control, as in Santos & Nakagawa 2012, which does not take into account the severity of the increase, which we would argue is worse for accounting for non-linearity. Furthermore, in the cases where the manipulation involved one offspring only, we also cannot account for non-linearity.

      (80) L.239: assuming birds are on average able to optimize their clutch size, one could argue that any manipulation, large or small, on average forces birds to raise a number of offspring that deviates from their natural optimum. At this point, it would be interesting to discuss in some detail studies with manipulation designs that included different levels of brood size reduction/enlargement.

      We agree with the reviewer that any manipulation is changing an individual’s clutch size away from its own individual optimum, which we have argued also means brood manipulations are not necessarily a good test of whether a trade-off occurs in the wild (naturally), as there could be interactions with quality – we have now edited to explicitly state this (L299-300).

      (81) L.242-244: when you choose to maintain this statement, please add something along the lines of "assuming there is no trade-off between number and quality of offspring".

      As explained above, though we agree that the offspring may incur some of the cost themselves, we are not aware of any evidence suggesting this trade-off is also large enough to drive intra-specific variation in clutch size across species. Furthermore, in the context here, the trade-off between number and quality of offspring would not change our conclusion – that the fitness benefit of raising more offspring is offset by the cost on survival. We have added detail on the costs incurred by offspring earlier in our discussion (L309-315). The addition of Figure 5 should help interpret these data.

      (82) L.253: instead of reference 30 the paper by Tinbergen et al in Behaviour (1990) seems more appropriate.

      We believe our current citation is relevant here but we have also added the Tinbergen et al (1990) citation.

      (83) L.253-254: such trade-offs may perfectly explain variation in reproductive effort within species if we were able to estimate cost-benefit relations for individuals. In fact, reference 29 goes some way to achieve this, by explaining seasonal variation in reproductive effort.

      We are unaware of any quantitative evidence that any combination of trade-offs explains intra-specific variation in reproductive effort, especially as a general across-species trend.

      (84) L.255: how does one demonstrate "between species life-history trade-offs"? The 'trade-off' between reproductive rate and survival we observe between species is not necessarily causal, and hence may not really be a trade-off but due to other factors - demonstrating causality requires some form of experimental manipulation.

      Between-species trade-offs are well established in the field, stemming from GC Williams’ seminal paper in 1966, and for example in r/K selection theory. It is possible to move from these correlations to testing for causation, and this is happening currently by introducing transgenes (genes from other species) that promote longevity into shorter-lived species (e.g., naked mole-rat genes into mice). As yet it is unclear what the effects on reproduction are.

      (85) L.256: it is quite a big claim that this is a novel suggestion. In fact, it is a general finding in evolutionary theory that fitness landscapes tend to be rather flat at equilibrium.

      It is important to note here that we simulate the effect size found, and hence this is the novel suggestion, that because the resulting fitness landscape is relatively flat there is no directional selection observed. We did not intend to suggest our interpretation of flat fitness landscapes is novel. We have changed the phrasing of this sentence to avoid misinterpretation.

      (86) L.259: why bring up physiological 'costs' here, given that you focus on fitness costs? Do you perhaps mean fitness costs instead of physiological costs? Furthermore, here and in the remainder of this paragraph it would be useful to be more specific on whether you are considering natural or experimental variation.

      The cost of survival is a physiological cost incurred by the reduction of self-maintenance as a result of lower resource allocation. This is one arm of fitness; we feel it would be confusing here to talk about costs to fitness, as we do not assess costs to future reproduction (which formed the large part of the critique offered by the reviewer). We would like to highlight that the aim of this manuscript was to separate costs of reproduction from the effects of quality, and this is why we have observational and experimental studies in one analysis, rather than separately. Our conclusion that we have found no evidence that the survival cost to reproduce drives within-species variation in clutch size comes both from the positive correlation found in the observational studies and our negligible fitness return estimates in our simulations. We therefore do not believe it is helpful to separate observational and experimental conclusions throughout our manuscript, as the point is that they are inherently linked. We hope that with the addition of Figure 5 this is clearer.

      (87) L.262: The finding that naturally more productive individuals tend to also survive better one could say is by definition explained by variation in 'quality', how else would you define quality?

      We agree, and hence we believe quality is a good term to describe individuals who perform highly in two different traits. Note that we also say the lack of evidence that trade-offs drive intra-specific variation in clutch size also potentially suggests an alternative theory, including intra-specific variation driven by differences in individual quality.

      Supplementary information

      (88) Table S1: please provide details on how the treatment was coded - this information is needed to derive the estimates of the clutch size effect for the treatments separately.

      We have added this detail.

      (89) Table S2: please report the number of effect sizes included in each of these models.

      We have added this detail.

      (90) Table S4: references are not given. Mentioning species here would be useful. For example, Ashcroft (1979) studied puffins, which lay a single egg, making me wonder what is meant when mentioning "No clutch or brood size given" as the reason for exclusion. A few more words to explain why specific studies were excluded would be useful. For example, what does "Clutch size groups too large" mean? It surprises me that studies are excluded because "No standard deviation reported for survival" - as the exact distribution is known when sample size and proportion of survivors is known.

      We have updated this table for more clarity.
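
      The reviewer's remark that the distribution is known once sample size and proportion of survivors are known can be sketched as below. This assumes survival is a simple binomial outcome, and the numbers are hypothetical; it is not a description of how excluded studies were actually assessed.

```python
import math

def binomial_proportion_se(survivors, n):
    """Standard error of a survival proportion under a binomial model."""
    p = survivors / n
    return math.sqrt(p * (1.0 - p) / n)

# Hypothetical study: 30 of 50 marked parents survived to the next season
se = binomial_proportion_se(30, 50)
print(se)
```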

      (91) Fig.S1: please plot different panels with the same scale (separately for observational and experimental studies). You could add the individual data points to these plots - or at least indicate the sample size for the different categories (female, male, mixed).

      We have scaled all panels to have the same y axis and added sample sizes to the figure legend.

      (92) Fig.S3: please provide separate plots for experimental and observational studies, as it seems entirely plausible that the risk of publication bias is larger for observational studies - in particular those that did not also include a brood size manipulation. At the same time, one can wonder what a potential publication bias among observational studies would represent, given that apparently you did not attempt to collect all studies that reported the relevant information.

      We have coloured the points for experimental and observational studies. Note that a study is an independent effect size and, therefore, does not indicate whether multiple data (i.e., both experimental and observational studies) came from the same paper. As we detail in the paper and above in our reviewer responses, we searched for observational studies from species used in the experimental studies to allow direct comparison between observational and experimental datasets.

      Reviewer #2 (Recommendations For The Authors):

      I strongly recommend improving the theoretical component of the analysis by providing a solid theoretical framework before, from it, drawing conclusions.

      This, at a minimum, requires a statistical model and most importantly a mechanistic model describing the assumed relationships.

      We thank the reviewer for highlighting that our aims and methodology are unclear in places. We have added detail to our model and simulation descriptions and have improved the description of our rationale. We also feel the failure of the journal to provide code and data to the reviewers has not helped their appreciation of our methodology and use of data.

      Because the field uses the same wording for different concepts and different wording for the same concept, a glossary is also necessary.

      We thank the reviewer for raising this issue. During the revision of this manuscript, we have simplified our terminology or given a definition, and we believe this is sufficient for readers to understand our terminology.

      Reviewer #3 (Recommendations For The Authors):

      • The files containing information of data extracted from each study were not available so it has not been possible to check how any of the points raised above apply to the species included in the study. The ms should include this file on the Supp. Info as is standard good practice for a comparative analysis.

      We supplied a link to our full dataset and the code we used in Dryad with our submitted manuscript. We were also asked to supply our data during the review process and we again supplied a link to our dataset and code, along with a folder containing the data and code itself. We received confirmation that the reviewers had been given our data and code. We support open science and it was our intention that our dataset should be fully available to reviewers and readers. We believe the data is too large to include as a table in the main text and is not essential in understanding the paper. Our data and code are at https://doi.org/10.5061/dryad.q83bk3jnk.

      • For clarity, refer to "the effect size of clutch size on survival" rather than simply "effect size". Figures 1 and 2 require cross-referencing with the main text to understand the y-axis.

      We have added detail to the figure legend to increase the interpretability of the figures.

      • Silhouettes in Figure 3 (or photos) would help readers without ornithological expertise to understand the taxonomic range of the species included in the analyses.

      We have added silhouettes into Figure 3.

      • Throughout the discussion: superscripts shouldn't be treated as words in a sentence so please add authors' names where appropriate.

      We have added author names and dates where required.

    1. eLife assessment

      This valuable paper presents a new protocol for quantifying tRNA aminoacylation levels by deep sequencing. The improved methods for discrimination of aminoacyl-tRNAs from non-acylated tRNAs, more efficient splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction, along with the use of an error-tolerating mapping algorithm to map the tRNA sequencing reads provide new tools for anyone interested in tRNA concentrations and functional states in different cells and organisms. The results and conclusions are solid, with well-designed tests to optimize the protocol under different conditions.

    2. Reviewer #1 (Public Review):

      The manuscript of Davidsen and Sullivan describes an improved tRNA-seq protocol to determine aminoacyl-tRNA levels. The improvements include: (i) optimizing the Whitfeld or oxidation reaction to select aminoacyl-tRNAs from oxidation-sensitive non-acylated tRNAs; (ii) using a splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction; (iii) using an error-tolerating mapping algorithm to map the tRNA sequencing reads that contain mismatches at modified nucleotides.

      The revised manuscript of Davidsen and Sullivan has addressed my concerns in the previous review. The authors performed an end-to-end comparison, which I requested - Fig. 2 and Fig S2. This is exactly what I meant, notwithstanding the differences between the methods that complicate the comparison of detectability. The manuscript is a strong methodological improvement of the tRNA quantification protocols!

    3. Reviewer #2 (Public Review):

      Davidsen and Sullivan present an improved method for quantifying tRNA aminoacylation levels by deep sequencing. By combining recent advances in tRNA sequencing with lysine-based chemistry that is more gentle on RNA, splint oligo-based adapter ligation, and full alignment of tRNA reads, they generate an interesting new protocol. The lab protocol is complemented by a software tool that is openly available on Github. Many of the points highlighted in this protocol are not new, but have been used in recent protocols such as Behrens et al. (2021) or McGlincy and Ingolia (2017). Nevertheless, a strength of this study is that the authors carefully test different conditions to optimize their protocol using a set of well-designed controls.

      The conclusions of the manuscript appear to be well supported by the data presented. However, the lack of benchmarking relative to other methods remains as a key criticism also after this revision.

      (1) The manuscript reports a different method to measure aminoacylation of tRNA. The main point that remains unsatisfactory is a better benchmarking of such aminoacylation measurements against the state of the art. In the current form of the revised manuscript it is not possible to estimate how much the results of this new protocol differ from alternative methods and in particular from Behrens et al. (2021). Here it will be helpful to perform experiments with samples similar to those (like HEK cells or yeast cells) used in the mim-tRNAseq study and not with H1299 cells.

      The claim that a comparison to every published protocol is not feasible is not a good argument for not performing any benchmarking experiments. Such benchmarking experiments are not meant to define the ground truth but are needed to estimate the difference in the outcome of different protocols. I agree with the authors that precision/reproducibility is essential when developing a new protocol. But the analysis and comparison should not stop there.

      (2) The reported protocol can not only be used for quantification of tRNA aminoacylation but it can also be used for tRNA quantification and analysis of tRNA modifications. It will increase the impact of this study if the authors benchmark the outcomes of their protocol with other tRNA sequencing protocols with samples similar to these papers, which will be important for certain research teams that are unlikely to implement two different tRNA sequencing methods.

      The authors decided not to perform further experiments in cell lines or mutants that allow a comparison to other published methods. In my opinion this limits the impact of the work. But as a reviewer I can only make recommendations. It is the authors' decision to take those or not.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a new protocol for quantifying tRNA aminoacylation levels by deep sequencing. The improved methods for discrimination of aminoacyl-tRNAs from non-acylated tRNAs, more efficient splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction, and the use of an error-tolerating mapping algorithm to map the tRNA sequencing reads provide new tools for anyone interested in tRNA concentrations and functional states in different cells and organisms. The results and conclusions are solid with well-designed tests to optimize the protocol under different conditions.

      Public Reviews:

      We thank both reviewers for suggestions, feedback and improvements. We address these pointwise below.

      Reviewer #1 (Public Review):

      Summary:

      The manuscript of Davidsen and Sullivan describes an improved tRNA-seq protocol to determine aminoacyl-tRNA levels. The improvements include: (i) optimizing the Whitfeld or oxidation reaction to select aminoacyl-tRNAs from oxidation-sensitive non-acylated tRNAs; (ii) using a splint-assisted ligation to modify the tRNAs' ends for the following RT-PCR reaction; (iii) using an error-tolerating mapping algorithm to map the tRNA sequencing reads that contain mismatches at modified nucleotides.

      Strengths:

      The two steps, the oxidation and the splint-assisted ligation, are yield-diminishing steps, thus the protocol of Davidsen and Sullivan is an important improvement of the current protocols to enhance the quantification of aminoacyl-tRNAs.

      Weaknesses:

      The oxidation and the selection of aminoacyl-tRNA is the first step in all protocols. Thereafter they differ on whether blunt ligation, hairpin (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim-tRNA-seq, LOTTE tRNA-seq), or splint ligation is used and finally what detection method is applied (i-tRAP, tRNA microarrays). What is the correlation to those alternative approaches (e.g. i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264) etc.)? What is the correlation with other approaches with which this improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

      We appreciate the fair assessment and fully agree that our work would benefit from a large comparison between all known tRNA-seq methods. We did directly compare many elements of our method to those of other methods (e.g. ligation efficiency and barcode bias); however, as noted by the reviewer we did not perform a direct end-to-end comparison with all other methods. An ideal comparison would require running several different sample conditions and technical replicates through our protocol and repeating the process across a half dozen or so other methods as they are described. Unfortunately, this approach is unlikely to be feasible since each method uses different oligos, reagents and kits, and all would have to be acquired at substantial cost. Some methods also rely on other detection methods such as microarrays, qPCR, or Illumina sequencing, which would also make this goal all the more onerous. There are also different pipelines for data processing that, in some instances, make the final results hard to compare. In short, this would be a monumental and expensive task to do comprehensively. We also worry that, even if these experiments were conducted such that some variables were concluded to be superior, they could still be challengeable based on perceived or actual protocol differences from the prior art. In summary, we think that an overall comparison with each method would be ideal, but practical concerns limit us to optimizing and comparing the variables that we found to be most prone to introducing bias in the results.

      For methods that measure tRNA expression levels (DM-tRNA-seq, YAMAT-seq, QuantM-seq, mim-tRNA-seq, LOTTE tRNA-seq etc.) there are some fundamental problems regarding absolute quantification using NGS that preclude simple comparisons. These problems are well known in the field of microRNA (Fuchs et al. (2012) [PMID: 25942392]) and arise due to several factors introduced during processing steps such as purification, ligation, reverse transcription and amplification. With the lack of a “true” quantitation benchmark it would be difficult to make quantitative claims from each. Therefore, in our own work we benchmark tRNA expression levels for sample-to-sample reproducibility (i.e. precision) as further explained in the response to reviewer #2.

      For comparison to methods that measure tRNA charge we did have an opportunity to compare our results with those of another study. To this end, we have added a figure comparing the baseline charge found using our method and the one used in Evans et al. (Revised manuscript Figure 2—figure supplement 9). This comparison finds broadly similar results for tRNA charge, including similar trends for a subset of Glu, Ser and Pro codons that are notable for their lowered basal tRNA charge.

      Reviewer #2 (Public Review):

      Davidsen and Sullivan present an improved method for quantifying tRNA aminoacylation levels by deep sequencing. By combining recent advances in tRNA sequencing with lysine-based chemistry that is more gentle on RNA, splint oligo-based adapter ligation, and full alignment of tRNA reads, they generate an interesting new protocol. The lab protocol is complemented by a software tool that is openly available on Github. Many of the points highlighted in this protocol are not new but have been used in recent protocols such as Behrens et al. (2021) or McGlincy and Ingolia (2017). Nevertheless, a strength of this study is that the authors carefully test different conditions to optimize their protocol using a set of well-designed controls.

      The conclusions of the manuscript appear to be well supported by the data presented. However, there are a few points that need to be clarified.

      We appreciate the acknowledgement of the strength of our aminoacylation controls and agree that our method is relying on many aspects of the mentioned prior work.  

      (1) One point that remains unsatisfactory is a better benchmarking against the state of the art. It is currently impossible to estimate how much the results of this new protocol differ from alternative methods and in particular from Behrens et al. (2021). Here it will be helpful to perform experiments with samples similar to those used in the mim-tRNAseq study and not with H1299 cells.

      We fully agree that more rigorous benchmarking would be desirable. As also noted in the response to reviewer #1, a full end-to-end comparison of methods would be ideal but would be onerous and expensive in practice, so we focused on optimizing the steps we found to be most prone to introducing bias in the data.

      We agree that Behrens et al. (2021) has substantial methodological overlap with our work and was instrumental in our efforts; however, the focus of their manuscript was largely on quantification of tRNA abundance and modifications, rather than the tRNA charge. In fact, tRNA charge was only determined for yeast in that study. Quantifying the abundance of short RNAs using NGS is very difficult (Fuchs et al. (2012) [PMID: 25942392]) and will likely require the use of a mixture of tRNAs as spike-in references for normalization (Bissels et al. (2009) [PMID: 19861428]). In the case of Behrens et al. (2021), they did not use a spike-in tRNA reference, but instead correlated gene copy number with their measured tRNA abundance. They also compare to Northern blotting for two tRNA transcripts, showing a directionally similar result; however, no quantitative claims can be made about measurement accuracy. Until a good method of normalizing tRNA quantification is found, we believe that sample-to-sample reproducibility (i.e. precision) is the most useful objective to optimize because this will allow detection of differential expression. Towards that end, we quantified the precision of our method (Figure 4 and its two supplementary figures) with associated statistics, which can be used to estimate the number of samples required to detect significance during differential expression analysis. For tRNA charge, quantification is easier, which is why we present statistics on both accuracy and precision. In this case we can better compare results across methods, and so we have added a comparison of our results to the charge quantification from Evans et al. (2017) (Figure 2—figure supplement 9).

      (2) While the protocol aims to implement an improved method for quantification of tRNA aminoacylation, it can also be used for tRNA quantification and analysis of tRNA modifications. It will increase the impact of this study if the authors benchmark the outcomes of their protocol with other tRNA sequencing protocols with samples similar to these papers, which will be important for certain research teams that are unlikely to implement two different tRNA sequencing methods. Are there any possible adaptations that would allow the analysis of tRNA fragments?

      The first part of this comment regarding comparison of methods is addressed in the response to the prior reviewer comment and in the response to reviewer 1. In the specific case of tRNA modifications, the issue is similar to abundance quantification in that a “true” reference of modified tRNA is likely necessary for proper quantification, alongside testing of each method simultaneously.

      Regarding tRNA fragments, our method is not suitable for this use case. This is because our adapter ligation step depends on an intact tRNA structure with either CCA or CC overhang on the 3’-end and thus we almost exclusively get reads with CCA/CC ends and no reads from fragments. This specificity is good for increasing charge quantification accuracy but not good for the method's versatility. For a more versatile method we recommend Watkins et al. (2022) [PMID: 35513407].

      (3) Like Behrens et al. (2021), Davidsen and Sullivan use TGIRT-III RT for their analyses. The enzyme is not currently available in a form suitable for tRNA-seq. It would be very helpful to test different new RT enzymes that are commercially available. The example of Maxima RT - Figure 2 Supp 6 - shows significantly lower performance than the presented TGIRT-III RT data. In lines 296-298, the authors mention improvements to the protocol by using ornithine. Why are these improvements not included?

      We share similar concerns that the TGIRT-III enzyme is no longer commercially available. It became unavailable while we were preparing this manuscript, reflected by the fact that almost all our figures are made using this enzyme. Others have discovered this too and Lucas et al. (2023) [PMID: 37024678] tested several RT polymerases using TapeStation as a readout for readthrough. As they reported that Maxima has good performance, we decided to test it on a full run with replicates. The results are outlined in Figure 2—figure supplement 6 and for resubmission we have added a table to the appendix that compares the alignment statistics. Unfortunately, the readthrough of the Maxima polymerase on cytoplasmic tRNAs is not as high as for TGIRT-III; however, interestingly it seems to have better performance for mitochondrial tRNAs (Figure 2—figure supplement 6). Regardless, in the initial paper submission we failed to evaluate whether this readthrough difference affected charge measurements. We have now fixed this by adding Figure 2—figure supplement 7, which shows that there are no differences in charge measurements between TGIRT-III and Maxima. Not surprisingly, there are substantial differences between polymerases when looking at relative tRNA abundance (which affirms the discussion above related to the difficulty of tRNA abundance quantification); however, the high sample-to-sample reproducibility remains intact with either polymerase. An exhaustive search for better polymerases is warranted but falls outside the scope of our work.

      Regarding the improvements suggested by us, using ornithine as a cleavage catalyst instead of lysine, we first learned about this possibility later and thus only want to make readers aware that other options exist. We have clarified the paragraph to make this clearer.

      (4) A technical concern: The samples are purified multiple times using a specific RNA purification kit. Did the authors test different methods to purify the RNA and does this influence the result of the method?

      In the past, we have relied exclusively on alcohol precipitation but during the development of this protocol we found it easier and more reproducible to use column-based purification when possible. However, as we have not made a direct comparison this remains anecdotal evidence. Nonetheless, to minimize any possible bias of column-based purification you will notice that we use columns with binding capacity 5x higher than the highest amount of RNA/DNA added to the column.

      (5) The study would benefit from an explicit step-by-step protocol, including the choice of adapters that are shown to work best in the protocol.

      This is a great point! We have included tables with all the oligos used (Supplementary file 1), a detailed step-by-step protocol with pictures of anticipated gel results (Supplementary file 2) and an overview of the RNA/DNA manipulations to make it clear where adapter sequences are located (Supplementary file 3). For the data processing we provide a comprehensive example in the Github repository. All this was included in our first submission of this manuscript (as well as on bioRxiv), but we suspect this was not readily accessible to the reviewers. We will make sure that these documents are available through eLife and have emphasized their existence in the main text of the manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      To stratify this improvement a comparison to the most common methods should be made. For example, how do the results with the improved protocol compare with i-tRAP (PMID 36283829), tRNA microarrays (PMID: 31263264), or with the approaches with which the improved protocol shares some steps (DM-tRNA-seq, mim-tRNA-seq)?

      Once again, we thank the reviewer for the good recommendations. The points about direct comparisons were discussed above.

      Reviewer #2 (Recommendations For The Authors):

      These are all great points; we address them below.

      Minor points:

      - Please use chemical conventions, e.g. for mcm5s2U and NaIO4 with superscript or subscript.

      Fixed.

      - Figure 2F: Glu GAA is only 82% charged; can this be due to mcm5s2U (Figure 3 supp 2) leading to a misalignment? What happens to Ser-NNN? Why is mitochondrial tRNA so much less charged?

      Regarding the Glu-GAA charge at baseline, we do not think this is an artifact of the mcm5s2U modification as it would then also be expected for Gln-CAA and Lys-AAA. The same occurs in the charge data in Evans et al. (2017) and they use a very different alignment strategy. Lastly, the charge titration and half-life experiments show no evidence of inaccuracy/bias for Glu-GAA.

      But the question remains – why is the charge of Glu-GAA so low? At this point our best guess is speculative. It may have something to do with the strong enrichment of Glu-GAA codons in the A site found by ribosome profiling on mouse embryonic stem cells (Ingolia et al. (2011) [PMID: 22056041]).

      - Spell out "clvg" or "dphs" in the figure legend of Figure 2 and others. Similar for other abbreviations in figures. They are not always explained in the legends.

      Fixed.

      - Figure 3 supp 2: Please use U instead of T in the anticodons. The labels are a bit confusing. Please clearly align to the tick (also for Figure 3C).

      Fixed.

      - Line 220-223. Which RT enzyme was used for Figure 3 supp 2? Does it make a difference?

      TGIRT-III was used. Only Figure 2—figure supplement 6 and Figure 2—figure supplement 7 (added for resubmission) show data with the Maxima polymerase. To address the second part of the question we have added a comparison between TGIRT-III and Maxima for mcm5s2U modification detection (Figure 3—figure supplement 3). Interestingly, there is a polymerase-specific signature for mcm5s2U modifications; however, more work would be required to determine which polymerase is best suited for detection of this and other modifications.

      - Figure 4 supp 1 and Figure 4 supp 2 change order.

      Fixed.

      Typos:

      - Figure 1 and Figure 1-figure supplement 1: In the periodate the "-" is in a small box (at least in my PDF viewer). Can this box be removed?

      - Line 175: duplicated verb.

      - Line 348: "moved".

      Thanks for catching these. They have now been fixed.

    1. eLife assessment

      This useful study implicates TRPV4 as a mediator of sweat, potentially based on TRPV4's expression and function on sweat glands. The data and methods are solid, with some limitations in terms of the approach. Overall, the work lends new insight into the physiologic basis of sweating using data from mice and humans.

    2. Joint Public Review:

      In this study, Kashio et al. examined the role of TRPV4 in regulating perspiration in mice. They find coexpression of TRPV4 with the chloride channel ANO1 and aquaporin 5, which implies possible coupling of heat sensing through TRPV4 to ion and water excretion through the latter channels. Calcium imaging of eccrine gland cells revealed that the TRPV4 agonist GSK101 activates these cells in WT mice, but not in TRPV4 KO. This effect is reduced with cold-stimulating menthol treatment. Temperature-dependent perspiration in mouse skin, either with passive heating or with ACh stimulation, was reduced in TRPV4 KO mice. Functional studies in mice - correlating the ability to climb a slippery slope to properly regulate skin moisture levels - reveal potential dysregulation of foot pad perspiration in TRPV4 KO mice, which had fewer successful climbing attempts. Lastly, a correlation of TRPV4 to hypohidrosis in humans was shown, as anhidrotic skin showed reduced levels of TRPV4 expression compared to normohidrotic or control skin.

      Overall this is an interesting study on how TRPV4 regulates perspiration.

      (1) The functional relationship between TRPV4 and ANO1 remains correlative.

      (2) Littermate controls were not used, but TRPV4ko were backcrossed onto the WT strain.

      (3) In general, the results support the authors' claims that TRPV4 activity is a necessary component of sweat gland secretion, which may have important implications for controlling perspiration; secretion from other glands where TRPV4 may be expressed remains a possibility given the lack of use of exocrine-specific knockouts.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Measurement of secreted amylase could be seen as direct evidence of sweating, however, how to determine the causal relationship between climbing behavior and sweating? Friction force may also be reduced when there is too much fingertip moisture.

      As the reviewer notes, measurement of secreted amylase can provide direct evidence of sweating, and we performed an iodine and starch reaction. Upon observing the involvement of TRPV4 in mouse foot pad perspiration, we then considered which type of behavioral analysis would be suitable to evaluate this perspiration. We agree with the reviewer’s point that friction force in the climbing test may be reduced by excessive sweating. However, we did not observe severe sweating in the absence of acetylcholine treatment. Accordingly, we interpreted that the increase in the climbing test failure rate for TRPV4KO mice could reflect the reduced friction force associated with the lack of TRPV4 activity.

      (2) For the human skin immunostaining, did the author use the same TRPV4 antibody as used in the mouse staining? Did they validate the specificity of the antibody for the human TRPV4 channel? 

      We used different antibodies for human and mouse samples. Since commercially available anti-TRPV4 antibodies do not work well with mouse samples, we generated our own anti-TRPV4 antibody and validated its specificity.

      (3) In lines 116-117, the authors tried to determine "the functional interaction of TRPV4 and ANO1 is involved in temperature-dependent sweating", however, they only used the TRPV4 ko mice and did not show any evidence supporting the relationship between TRPV4 and ANO1. 

      As the reviewer pointed out, based on the data presented in the original submission we cannot conclude that an interaction between TRPV4 and ANO1 is involved in perspiration. However, we think that the data for TRPV4KO mice presented in Figure 3 of the original version does indicate that TRPV4 is involved in perspiration. The finding that menthol and its related compounds, which inhibit the function of both TRPV4 and ANO1 (see our publication in Scientific Reports 7: 43132, 2017), blocked perspiration in both wild-type and TRPV4KO mice (original Figure 3C, D) indicates involvement of either TRPV4 or ANO1 in perspiration. In the revised version, we present results for additional iodine and starch reaction experiments using Ani9, a potent and specific ANO1 inhibitor. Ani9 drastically inhibited perspiration from mouse foot pads both at 25 °C and 35 °C. Based on these collective results, we concluded that both TRPV4 and ANO1, likely acting as a complex, are involved in perspiration. We present the new data with Ani9 in the revised Figure 3E, F.

      (4) Figure 3-4 is quite confusing. At 25˚C, no sweating difference was observed between TRPV4 and wt mice (Fig 3A-3D), suggesting both Ach-induced sweating and basal sweating are TRPV4-independent at 25˚C, however, the climbing test was done at 26-27 ˚C and the data showed a climbing deficit in TRPV4 ko mice. How to interpret the data is unclear. 

      Thank you for raising this point. In the iodine and starch reaction experiment, we initially observed no significant reduction in perspiration in the absence of acetylcholine at 25 °C, which is the same condition as in the climbing test, although we did detect less perspiration for TRPV4KO mice. In a trial using additional mice, we detected significantly less perspiration under control conditions without acetylcholine at 25 °C, which is consistent with the results of the climbing test. We have added this new data to the revised Figure 3A, B.

      (5) Were there any gender differences associated with sweating in mice? In Figure 3, the mouse number for behavior tests should be at least 5. 

      The TRPV4KO mice reproduced poorly and we were unable to obtain sufficient numbers of male and female mice to determine whether there were gender differences in sweating. However, according to the reviewer’s suggestion, and as mentioned above, we increased the number of experiments to obtain the results shown in the revised Figure 3. We did not observe a significant difference in sweating with the larger sample size, which supports our conclusions.

      (6) 8- to 21-week-old mice were used in the immunostaining, the time span is too long. 

      Given the difficulty in obtaining sufficient numbers of TRPV4KO mice, we used a somewhat wider age distribution to obtain samples for immunostaining. However, we did not observe age-dependent differences in immunostaining. We reference this point in the revised manuscript.

      (7) The authors used homozygous TRPV4 ko mice for all experiments. What are control mice? Are they littermates of the TRPV4 ko mice? 

      We did not use littermates for our in vivo experiments because the TRPV4KO mice reproduced poorly and the litter sizes were small. However, we did backcross the KO mice to the commercially available wild-type mice more than ten times. As such, we expect that the wild-type and TRPV4KO mice will have similar genetic backgrounds. In addition, we have published multiple studies that have successfully used this method, which we think supports the reliability of our results for experiments involving mice.

      Reviewer #2 (Public Review):

      (1) The coexpression data needs additional controls. In the TRPV4 KO mice, there appears to be staining with the TRPV4 Ab in TRPV4 KO mice below the epidermis. This pattern appears similar to that of the location of the secretory coils of the sweat glands (Fig 1A). Is the co-staining the authors note later in Figure 1 also seen in TRPV4 KOs? This control should be shown, since the KO staining is not convincing that the Ab doesn't have off-target binding. 

      We thank the reviewer for raising these concerns about immunostaining. As the reviewer notes, in the low power image the signals appeared to be weak and punctate signals were present in the basal region of glandular cells. Although we did not identify immunohistochemical conditions that produced no signal, tissue sections from WT mice stained with anti-TRPV4 antibody showed conspicuous apical signals for the glandular cells facing the lumen. Meanwhile, TRPV4KO tissues showed no signals at the apical region of the glandular cells, where the TRPV4-ANO1 interaction is expected to occur. By immunoblotting, we confirmed that no trace signals were present in the TRPV4KO tissues.

      (2) Are there any other markers besides CGRP for dark cells in mice to support the conclusion that mouse secretory cells have clear cell and dark cell properties? 

      We did not stain with other dark cell markers. Based on previous studies describing the differences between clear and dark cells in mouse eccrine glands, we think that dark and clear cells cannot be clearly discriminated, as we described in lines 93-96 of the Results. We identified secretory cells using CK8 and dark cells with CGRP, a marker of dark cells in human eccrine glands (Zancanaro et al. 1999 J Anat). Our result showed that CGRP immunostaining could not discriminate between clear and dark cells, which is consistent with a previous report showing that mouse secretory cells were assumed to be undifferentiated and primitive based on electron microscopic observation (Kurosumi et al. 1970 Arch Histol Jap).

      (3) The authors utilize menthol (as a cooling stimulus) in several experiments. In the discussion, they interpret the effect of menthol as potentially disrupting TRPV4-ANO1 interactions independent of TRPM8. Yet, the role of TRPM8, such as in TRPM8 KO mice, is not evaluated in this study.

      We performed the iodine and starch reaction experiments with TRPM8KO mice. In the TRPM8KO mice, the sweat spots did not differ from those seen for WT mice (p=0.63, t-test), and there was also a significant reduction in sweating with menthol treatment following acetylcholine stimulation that was similar to that seen for WT mice. These results would rule out the involvement of TRPM8 in a menthol-induced reduction in sweating. We have included this data in the revised Figure 3D.

      (4) Along those lines, the authors suggest that menthol inhibits eccrine function, which might lead to a cooling sensation. But isn't the cooling sensation of sweating from evaporative cooling? In which case, inhibiting eccrine function may actually impair cooling sensations.

      Menthol has a non-specific effect that activates TRPM8, TRPV3 and TRPA1, and inhibits TRPV1, TRPV4 and ANO1. Therefore, we did not carry out a climbing test with menthol in part because menthol-dependent TRPA1 activation decreased the propensity of the mice to climb. As the reviewer notes, TRPM8 activation following topical application of menthol may cause a cooling sensation elicited in sensory neurons beneath the skin. However, the comfortable cooling sensation could also be caused in part by decreased sweating. The relationship between a comfortable cooling sensation and less perspiration following menthol application may be difficult to determine, and we have mentioned this in the updated Discussion.

      (5) The climbing assay is interesting and compelling. The authors note performing this under certain temperature and humidity conditions. Presumably, there is an optimal level of skin moisture, where skin that is too dry has less traction, but skin that is too wet may also have less traction. It would bolster this section of the study to perform this assay under hot conditions (perhaps TRPV4 KO mice, with impaired perspiration, would outperform WT mice with too much sweating?), or with pharmacologic intervention using TRPV4 agonists or antagonists to more rigorously evaluate whether this model correlates to TRPV4 function in the setting of different levels of perspiration.

      We thank the reviewer for this suggestion. Upon detecting the involvement of TRPV4/ANO1 interaction in perspiration, we considered different behavioral analyses that can be performed to demonstrate whether the TRPV4/ANO1 interactions are involved in perspiration. As the reviewer suggested, there should be an optimal level of sweating. Therefore, we first set the room temperature at 26-27 ˚C and humidity at 35-50%. To our knowledge, this is the first demonstration of temperature-dependent sweating of mouse foot pads. In humans, palm sweating is often referred to as psychogenic sweating that is known to be regulated by sympathetic nerve activity. Here we tested whether foot pad sweating might be related to friction force wherein sufficient amounts of sweating could increase the friction force and in turn increase the success rate for the climbing test using a vinyl-covered slippery slope that was selected based on several trials to determine the optimal surface material and slope angles. As the reviewer suggests, the success rates could be affected by multiple factors, and hot temperatures likely induce more sweating that could increase the success rates in the climbing test. We will need to carry out additional experiments that are beyond the scope of this study to examine these temperature-dependent effects. Generally, sweating is regulated by sympathetic nerve activity that occurs in response to increased brain neuron excitation. However, here we raise for the first time the possibility that sweating might be regulated by local temperature sensation mediated through TRPV4 that may be effective for fine-tuning of perspiration activity. We have updated the Discussion to reference this possibility.

      (6) There are other studies (PMID 33085914, PMID 31216445) that have examined the role of TRPV4 in regulating perspiration. The presence of TRPV4 in eccrine glands is not a novel finding. Moreover, these studies noted that TRPV4 was not critical in regulating sweating in human subjects. These prior studies are in contradiction to the mouse data and the correlation to human anhidrotic skin in the present study. Neither of these studies is cited or discussed by the authors, but they should be. 

      We thank the reviewer for referencing these other studies concerning the possible involvement of TRPV4 in perspiration in humans. These studies focused on the vasodilating effects of TRPV4 and drew the conclusion that TRPV4 is not involved in sweating in humans, which is in contrast to our data for mice and humans. Multiple factors could explain the apparent difference between the two studies. For example, the parameters they examined differed from ours in that we assessed patients with AIGA, whereas the previous studies involved healthy volunteers. We have updated the Discussion to note the difference in the results of our and previous studies.   

      Reviewer #3 (Public Review):

      (1) Figure 2: The calcium imaging-based approach shows average traces from 6 cells per genotype, but it was unclear if all acinar cells tested with this technique demonstrated TRPV4-mediated calcium influx, or if only a subset was presented.

      “n = 6” does not indicate the number of cells, but rather 6 independent experiments that each had over 20 ROIs of sweat glands. We have clarified this point in the updated figure legend.

      (2) Figure 4: The climbing behavioral test shows a significant reduction in climbing success rate in TRPV4-deficient mice. The authors ascribe this to a lack of hind paw 'traction' due to deficiencies in hind paw perspiration, but important controls and evidence that could rule out other potential confounds were not provided or cited. 

      As noted in our response to Comment 5 made by Reviewer #2, we spent considerable time identifying optimal conditions that would delineate success rates in the climbing experiments. We are confident that TRPV4KO mice had significantly lower success rates than WT mice, but there are various factors that could affect the experimental outcomes. We reference these factors in the updated Discussion.

      (3) In general, the results support the authors' claims that TRPV4 activity is a necessary component of sweat gland secretion, which may have important implications for controlling perspiration as well as secretion from other glands where TRPV4 may be expressed. 

      As described above, the results we obtained in the climbing test can be affected by various factors. However, based on the consistency of the results obtained for the climbing test and the iodine and starch reaction assay, we think that our interpretation is correct. In terms of the involvement of TRPV4/ANO1 interactions in fluid secretion, we previously reported that the TRPV4/ANO1 complex is involved in cerebrospinal fluid secretion in the mouse choroid plexus (FASEB J. 2014) and in saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 2018). Together, these findings suggest that this mechanism is common to water efflux from exocrine glands.

      Reviewer #1 (Recommendations For The Authors):

      (1) An exocrine gland-specific trpv4 knockout mouse should be used, as TRPV4 is also expressed by muscles, global knockout TRPV4 may affect the TRPV4-dependent muscle strength and reduce the climbing ability in mice. 

      As the reviewer suggests, use of mice with TRPV4 knockout specific to exocrine glands would be preferable to mice having global TRPV4 knockout given that TRPV4 is expressed in multiple tissues. We agree with this suggestion, but we do not currently have such mice in hand. However, as mentioned above, we have reported the involvement of the TRPV4/ANO1 interaction in cerebrospinal fluid secretion from the choroid plexus in mice (FASEB J. 28: 2238-2248, 2014), as well as saliva and tear secretion in mouse salivary and lacrimal glands (FASEB J. 32: 1841-1854, 2018), suggesting that the TRPV4/ANO1 interaction could be widely involved in exocrine gland functions that involve water movement. We have updated the Discussion to reference this point.

      (2) The authors showed Calcium imaging data that Menthol inhibits TRPV4-dependent calcium influx. However, it is well known that menthol induces the sensation of cooling by activating TRPM8. More evidence, including patch clamp recordings, should be done to verify the inhibition effects of menthol on TRPV4 and ANO1. Moreover, Fig 3E-3F could only suggest that menthol-induced cooling sensation may affect sweating but not the inhibition effect of menthol on TRPV4 and ANO1 channels. 

      We agree that more evidence including patch-clamp recordings can verify the inhibitory effects of menthol on TRPV4 and ANO1. We did not include such experiments here since we previously showed that menthol and related agents indeed inhibit TRPV4- and ANO1-mediated currents (Sci. Rep. 7: 43132, 2017). We now cite this paper in the revised version.

      (3) Excepting the climbing test, are there any other better models to assess the sweating-related behaviors?

      When we detected the involvement of TRPV4/ANO1 interactions in perspiration, we considered different types of behavioral analyses that could be used to demonstrate TRPV4/ANO1-dependent perspiration. We think that the climbing experiment is the best test, particularly since foot pads are one of the few regions on mice that are not covered by fur and are thus amenable to evaluation of perspiration using an iodine and starch test.

      Reviewer #2 (Recommendations For The Authors):

      (1) I was confused by a section in the introduction on lines 59-60: How does Cl- efflux lead to the formation of a physical complex in cells with high intracellular Cl-? What is the physical complex? This seems like several disparate concepts combined together, which need to be clarified.

      We apologize for the incomplete descriptions of several of our previous works. We have amended the Introduction section in the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) TRPV4 is expressed by multiple other cell types in the skin (keratinocytes, macrophages etc.) which may have an impact on peripheral sensory function. Is there evidence that TRPV4-deficient animals have relatively normal sensory acuity and/or proprioception? Such evidence would lend more credibility to the reported findings in the climbing test. 

      As the reviewer points out, TRPV4 is expressed by multiple other cell types in the skin. To date we have found that TRPV4KO mice show no differences in sensory functions compared to WT mice. Whether TRPV4 is involved in proprioception is unclear, based on both our own observations and those that appear in the literature, although TRPV4 is clearly activated by mechanical stimuli. We previously compared the mechanical sensitivity of TRPV4 and Piezo1 in bladder epithelial cells, and found that Piezo1 shows much higher sensitivity relative to TRPV4 (J. Biol. Chem. 289: 16565-16575, 2014), which is consistent with the involvement of Piezo1, rather than TRPV4, in proprioception. Although TRPV4 is reported to be expressed in sensory neurons, we did not detect TRPV4-mediated responses in isolated rat and mouse DRG neurons, suggesting that TRPV4-positive sensory neurons are relatively rare.

      (2) The methods section refers to loading entire sweat glands with Fura-2 dye for calcium imaging, but the figure legend refers to sweat gland acinar cells. Resolving this ambiguity would help readers to interpret the data. 

      We apologize for this error and have made an appropriate correction in the revised manuscript.

      (3) Alternatively, could acute intraplantar injection of a TRPV4 antagonist (e.g. GSK205) in wild-type mice phenocopy the TRPV4-knockout mouse deficits, or could normal climbing behavior be restored in the TRPV4 knockout by adding artificial perspiration to their hindpaws?

      We thank the reviewer for raising this interesting possibility and suggesting use of TRPV4 agonists or antagonists in the climbing tests. We agree that results of such an experiment would support the involvement of TRPV4 in sweating. We tried to do such experiments using injection of TRPV4 regulators into mouse hindpaws. However, the injections themselves appeared to impact climbing ability, perhaps in part due to painful sensations associated with the injection. Similarly, menthol injection appeared to reduce climbing activity, likely through pain sensations associated with TRPA1 activation. As such, we did not pursue these experiments.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #2 (Recommendations For The Authors):

      We sincerely appreciate the time and efforts of the Reviewer.

      In light of your data showing that the IgG response is similar with and without CIN, it would be good to drop "and induce a broad, vaccination-like anti-tumor IgG response". This suggests a direct connection between CIN and the IgG response. In my opinion, the shorter title is equally strong and more correct.

      We edited this phrase in the originally submitted title for accuracy:

      Chromosomal instability induced in cancer can enhance macrophage-initiated immune responses that include anti-tumor IgG

      I agree that inducing CIN through other means can be left for a different study but in that case the abstract should more directly mention MSP1 inhibition since that is how CIN is always induced. Perhaps line 18: CIN is induced by MSP-1 inhibition in poorly immunogenic....

      Done as requested:

      “…Here, CIN is induced in poorly immunogenic B16F10 mouse melanoma cells using spindle assembly checkpoint MPS1 inhibitors…”


      The following is the authors’ response to the original reviews.

      eLife assessment

      This study highlights a valuable finding that chromosomal instability can change immune responses, in particular macrophage behaviours. The convincing results show that the use of CD47 targeting and anti-Tyrp1 IgG can overcome changes in the immune landscape in tumors and prolong survival of tumor-bearing mice. These findings reveal a new exciting dimension on how chromosomal instability can influence immune responses against tumors.

      We thank the Editors for their enthusiasm and appreciation for this work. We also want to highlight our thanks for their careful reading, support, and patience while handling this manuscript. While this work provides useful insight into potential therapeutic implications of chromosomal instability in the macrophage immunotherapy field, we also hope it elucidates some novel basic science to further explore how chromosomal instability has such interesting effects on the immune system.

      Public Reviews:

      Reviewer #1 (Public Review):

      The manuscript by Hayes et al. explored the potential of combining chromosomal instability with macrophage phagocytosis to enhance tumor clearance of B16-F10 melanoma. However, the manuscript suffers from substandard experimental design, some contradictory conclusions, and a lack of viable therapeutic effects.

      The authors suggest that early-stage chromosomal instability (CIN) is a vulnerability for tumorigenesis, CD47-SIRPa interactions prevent effective phagocytosis, and opsonization combined with inhibition of the CD47-SIRPa axis can amplify tumor clearance. While these interactions are important, the experimental methodology used to address them is lacking.

      Reviewer #1 (Recommendations For The Authors):

      First, early stages of the tumor are essentially being defined as before implantation. In all cases, the tumor cells were pre-treated with MPS1i or had a genetic knockout of CD47. This makes it difficult to see how this would translate clinically.

      We greatly appreciate the Reviewer’s interest in the topic and its potential, but our manuscript makes no claims of immediate clinical translation. Chromosomal instability (CIN) studies have to date not yet discovered or described whether and how CIN can affect macrophage function. To our knowledge, this is the first study to begin such characterizations with various MPS1i drugs to induce CIN. Many variations of the approach can be envisioned for future studies.

      Our Results include some key studies of cancer cells with wildtype levels of CD47, including in vivo tumor elimination (Fig.3E). Nonetheless, we do conduct some of our studies in a CD47 knockout context to remove this “brake” that generally impedes phagocytosis, with our goal being to better understand how CIN affects phagocytosis. As cited to some extent in our Introduction, there are many efforts in clinical trials to disrupt this macrophage checkpoint and others focused on macrophage immunotherapy. Whether CIN can be induced by clinically translatable drugs and specifically in cancer cells is beyond the scope of our studies.

      I would like to see the amount of CIN that occurs in WT B16F10 over the course of tumorigenesis (ie longer than 5 days). This is because I would assume that CIN would eventually occur in the WT B16F10 regardless of whether MPS1i is being given. And if that's the case, then the initiation of CIN at day 10 after implantation (for example) would still be considered "early stage" CIN. If the therapy is then initiated at this point, does the effect remain? Or put differently, how would the authors propose to induce the appropriate level of CIN in an established tumor? Why is pretreatment necessary?

      Untreated B16F10 cells fail to produce micronuclei over 12 days compared to MPS1i treated cells – as shown in a newly added panel in Fig. S1:

      Author response image 1.

      This helps support our decision to pre-treat cells with MPS1i to stimulate genomic instability and is described in the first section of Results:

      “…we saw >10-fold increases of micronuclei over the cell line’s low basal level (~1% of cells), and two other MPS1i inhibitors AZ3146 and BAY12-17389 confirm such effects (Fig. S1A). Micronuclei-positive cells can persist up to 12 days after treatment (Fig. S1B), while control cells maintain the low basal levels. The results suggest pre-treatment with MPS1i can simulate CIN in an experimental context even for 1-2 weeks, which may not typically occur at the same frequency during early tumor growth.”

      It is known that PD-1 expression inhibits tumor-associated macrophage phagocytosis (Nature, 2017). Does MSP1i (sic) treatment affect the population of PD-1+ tumor macrophages in vivo?

      We thank the Reviewer for bringing up an interesting point.

      Using the same tumor RNA-seq data that was used for Fig.1E, a heatmap of expression of PD-1 (gene Pdcd1) shows no consistent trend with MPS1i:

      Author response image 2.

      We also examined whether the secretome from CIN-afflicted cancer cells affects PD-1 expression in cultured macrophages, but we did not register any reads from our single-cell RNA-sequencing experiment for Pdcd1 in any of the macrophage clusters from Fig. 1H.

      Author response image 3.

      The Discussion section now includes a statement on this topic:

      “…B16F10 tumors are poorly immunogenic, do not respond to either anti-CD47 or anti-PD-1/PDL1 monotherapies, and show modest and variable cure rates (~20-40%; Dooling et al., 2023; Hayes et al., 2023) even when macrophages have been made maximally phagocytic according to notions above. We should note here that our whole-tumor RNA-seq data (Fig.1E) shows expression of PD-1 (gene Pdcd1) follows no consistent trend upon MPS1i treatment, and that Pdcd1 was not detected in our scRNA-seq data for macrophage cultures (Fig.1G) – motivating further study.”

      The authors must explain how the proposed therapy works since MPS1i increases tumor (cell) size, making it difficult for macrophages to phagocytose the tumor cells. It also reduces or suppresses Tyrp1 expression on the cancer cells, making it harder to opsonize. Since these were two main points for the rationale of this study, the authors need to reconcile them.

      We appreciate this comment and have re-organized this Results section to try to minimize confusion:

      CIN-afflicted, CD47-knockout tumoroids are eliminated by Macrophages

      To assess functional effects of macrophage polarization, we focused on a 3D “immuno-tumoroid” model in which macrophage activity can work (or not) over many days against a solid proliferating mass of cancer cells in non-adherent roundbottom wells (Fig. 2A) (Dooling et al., 2023). We used CD47 knockout (KO) B16F10 cells, which removes the inhibitory effect of CD47 on phagocytosis, noting that KO does not perturb surface levels of Tyrp1, which is targetable for opsonization with anti-Tyrp1 (Fig. S2A). BMDMs were added to pre-assembled tumoroids at a 3:1 ratio, and we first assessed surface protein expression of macrophage polarization markers. Consistent with our whole-tumor bulk RNA-sequencing and also single-cell RNA-sequencing of BMDM monocultures (Fig. 1E, 1I-J), BMDMs from immunotumoroids of MPS1i-treated B16F10 showed increased surface expression of M1-like markers MHCII and CD86 while showing decreased expression of M2-like markers CD163 and CD206 (Fig. 2B-C). Although these macrophages seemed poised for anticancer activity, the cancer cells showed decreased binding of anti-Tyrp1 (Fig. S2B) and ~20% larger size in flow cytometry (Fig. S2C). The latter likely reflects cytokinesis defects and poly-ploidy as acute effects of CIN induction (Chunduri & Storchová, 2019; Mallin et al., 2022). Such cancer cell changes might explain why standard 2D phagocytosis assays show BMDMs attached to rigid plastic engulf relatively few anti-Tyrp1 opsonized cancer cells pretreated with MPS1i versus DMSO (Fig. S2D). In such cultures, BMDMs use their cytoskeleton to attach and spread, competing with engulfment of large and poorly opsonized targets. Noting that tumors in vivo are not as rigid as plastic, our 3D immunotumoroids eliminate attachment to plastic, and large numbers of macrophages can cluster and cooperate in engulfing cancer cells in a cohesive mass (Dooling et al., 2023). 
      We indeed find CIN-afflicted tumoroids are eliminated by BMDMs regardless of anti-Tyrp1 opsonization (Fig. 2D-E), whereas anti-Tyrp1 is required for clearance of DMSO control tumoroids (Fig. 2D, S3B). Imaging also suggests that cancer CIN stimulates macrophages to cluster (compare Day-4 in Fig. 2D), which favors cooperative phagocytosis of tumoroids (Dooling et al., 2023), and occurs despite the lack of cancer cell opsonization and their larger cell size. The 3D immunotumoroid results with induced CIN are thus consistent with a more pro-phagocytic M1-type polarization (Fig.1J and 2B,C).

      The authors used varying numbers of tumor cells for the in vivo portions of the study; the first half of the manuscript uses 500,000 cells, while the latter half uses 200,000 cells. Why?

      The reason for the difference in numbers is now clarified in the Methods:

      For assessing immune infiltrates in early stages of tumor engraftment, when tumors are still small, we used a relatively high number of tumor cells (500,000 cells in Fig. 1D and Fig. 2F-G) to achieve sufficient cell numbers after dissociating the tumors, particularly for the slow-growing MPS1i-treated tumors. More specifically, with dissection, collagenase treatment, and passage through a filter to remove clumps, we would lose many cells, and yet needed 100,000 viable cells or more for bulk RNA-seq suspensions and for flow cytometry measurements. For all other studies, 200,000 cancer cells were injected.

      The authors need to report the tumor volumes and the total number of cells isolated from the day five tumors to avoid grossly inflating the effect (i.e. Fig 2G and 4G).

      We have added relevant numbers in the Methods:

      For day 5 post-challenge measurements, 100,000 to 200,000 live cells were collected. For in vivo tumor infiltrate studies in re-challenged mice, 10 million live cells were collected.

      Also, regarding tumor sizes and cell numbers, we have previously published relevant measurements in assessments of tumor growth. Please see:

      Brandon H Hayes, Hui Zhu, Jason C Andrechak, Lawrence J Dooling, Dennis E Discher, Titrating CD47 by mismatch CRISPR-interference reveals incomplete repression can eliminate IgG-opsonized tumors but limits induction of antitumor IgG, PNAS Nexus, Volume 2, Issue 8, August 2023, pgad243, https://doi.org/10.1093/pnasnexus/pgad243

      Dooling, L.J., Andrechak, J.C., Hayes, B.H. et al. Cooperative phagocytosis of solid tumours by macrophages triggers durable anti-tumour responses. Nat. Biomed. Eng 7, 1081–1096 (2023). https://doi.org/10.1038/s41551-023-01031-3

      In the present study, similar tumor growth curves are provided for transparency, but the Kaplan-Meier curves are the key pieces of data in Figs. 3-4. Lastly, regarding reporting total cell number harvested, we based our experiments on previously accepted measurements that also reported numbers out of total harvested cells. See:

      Cerezo-Wallis, D., Contreras-Alcalde, M., … Soengas, M.S., 2020. Midkine rewires the melanoma microenvironment toward a tolerogenic and immune-resistant state. Nat Med 26, 1865–1877. https://doi.org/10.1038/s41591-020-1073-3

      The figure titles need to be revised. For example, the title of Figure 1 claims that "MPS1i-induced chromosomal instability causes proliferation deficits in B16F10 tumors." However, the evidence provided is weak. The authors only present GSEA analysis of proliferation and no functional evidence of impairment. The authors need to characterize this proliferation deficit using in vitro studies and functional studies of macrophage polarization. I would suggest proliferation assays (crystal violet, MTT, Incucyte, etc) to measure the B16 growth over time with MPS1i treatment.

      We thank the Reviewer for pointing this out. In Fig.1 we have minimized information regarding proliferation because it is later quantified in Figs.2D,E, S3, and 3D-i:

      Fig.1F legend: Top downregulated hallmark gene sets in tumors comprised of MPS1i-treated B16F10 cells, showing downregulated DNA repair, cell cycle, and growth-related pathways, consistent with observations of slowed growth in culture and in vivo – as subsequently quantified.

      Then the authors could collect the tumor supernatant to culture with macrophages and determine polarization in vitro. I would also like to see functional studies of macrophage polarization (suppression assays, cytokine production, etc). Currently, the authors provide no functional studies.

      Fig.2B,C provides functional surface marker measurements of in vitro polarization toward anti-cancer M1 macrophages by MPS1i-pretreated tumor cells, consistent with gene expression in Fig.1G-J. Function is further shown as anti-cancer activity in Fig.2D,E, as now stated explicitly in the text:

      “…In our 3D tumoroid in vitro assays, we found that macrophages can suppress the growth of chromosomally unstable tumoroids and clear them, surprisingly both with and without anti-Tyrp1 (Fig. 2D-E), regardless of MPS1i concentration used for treatment. Such a result is consistent with M1-type polarization (Fig.1J and 2B,C), which tends to be more pro-phagocytic.”

      The authors claim that macrophages are the key effector cells, but they need to provide evidence for this claim.

      Other immune cells clearly contribute to the presented results because the IgG must eventually come from B cells. The text has been edited to indicate 'macrophages are key initiating-effector cells', and some evidence for this is the maximal survival of (WT B16 + Rev tumors) in Fig.3E upon treatment with Marrow Macrophages plus Macrophage-relevant SIRPa blockade and Macrophage-relevant IgG (via FcR). T cells do not have SIRPa or FcR.

      They can deplete macrophages and T and B cells to determine whether the effect remains or is ablated. This is the only definitive way to make this claim.

      To determine whether T and B cells might also be key initiating-effector cells, new experiments were done with mice depleted of T and B cells (per Fig.S9, below). We compared the growth of MPS1i vs DMSO treatments in these mice to results in mice with T and B cells (which should replicate our previous results in Fig.3D-i). We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells compared to mice with T and B cells. We have added to the text our conclusion that: T and B cells are not key initiating-effector cells. Whereas B cells are effector cells at least in terms of eventually making anti-tumor IgG, our results show that macrophages are key initiating-effector cells because macrophages certainly affect the growth of (WT B16 + Rev tumors) when more are added (Fig.3E).

      Author response image 4.

      Growth of CIN-afflicted wild-type (WT) tumors in T- and B-cell deficient mice and T- and B-cell replete mice. Similar growth delays for MPS1i-pretreated B16F10 cells in T- and B-cell deficient NSG mice and immunocompetent C57BL/6 mice. Both types of mice have functional macrophages. Parallel studies in vivo were done with WT B16F10 ctrl cells cultured 24 h in 2.5 μM MPS1i (reversine) or DMSO, then washed 3x in growth media for 5 min each and allowed to recover in growth media for 48 h. 200,000 cells in 100 μL PBS were injected subcutaneously into right flanks, and the standard size limit was used to determine survival curves. The C57BL/6 experiments were done independently here (by co-author L.J.D.) from the similar results (by B.H.H.) shown in Fig.3D-i, which provides evidence of reproducibility.

      The Results section final paragraph describes all of this:

      Macrophages seem to be the key initiating-effector cells, based in part on the following findings. First, macrophages with both SIRPα blockade and FcR-engaging, tumor-targeting IgG maximize survival of mice with WT B16 + Rev tumors (Fig. 3E) – noting that macrophages but not T cells express SIRPα and FcR’s. Despite the clear benefits of adding macrophages, to further assess whether T and B cells are key initiating-effector cells, new experiments were done with mice depleted of T and B cells. We compared the growth delay of MPS1i versus DMSO treatments in these mice to the delay in fully immunocompetent mice with T and B cells – with all studies done at the same time. We found that slower growth with Rev relative to DMSO was similar in mice without T and B cells when compared to immunocompetent C57 mice (Fig.S9). We conclude therefore that T and B cells are not key initiating-effector cells. At later times, B cells are likely effector cells at least in terms of making anti-tumor IgG, and T cells in tumor re-challenges are also increased in number (Fig. 4G-ii). We further note that in our earlier collaborative study (Harding et al., 2017) WT B16 cells were pre-treated by genome-damaging irradiation before engraftment in C57 mice, and these cells grew minimally – similar to MPS1i treatment – while untreated WT B16 cells grew normally at a contralateral site in the same mouse. Such results indicate that T and B cells in C57BL/6 mice are not sufficiently stimulated by genome-damaged B16 cells to generically impact the growth of undamaged B16 cells.

      Reviewer #2 (Public Review):

      Harnessing macrophages to attack cancer is an immunotherapy strategy that has been steadily gaining interest. Whether macrophages alone can be powerful enough to permanently eliminate a tumor is a high-priority question. In addition, the factors making different tumors more vulnerable to macrophage attack have not been completely defined. In this paper, the authors find that chromosomal instability (CIN) in cancer cells improves the effect of macrophage-targeted immunotherapies. They demonstrate that CIN tumors secrete factors that polarize macrophages to a more tumoricidal fate through several methods. The most compelling experiment is transferring conditioned media from MPS1-inhibited and control cancer cells, then using RNAseq to demonstrate that the MPS1-inhibited conditioned media causes a shift towards a more tumoricidal macrophage phenotype. In mice with MPS1-inhibited (CIN) B16 melanoma tumors, a combination of CD47 knockdown and anti-Tyrp1 IgG is sufficient for long-term survival in nearly all mice. This combination is a striking improvement from conditions without CIN.

      Like any interesting paper, this study leaves several unanswered questions. First, how do CIN tumors repolarize macrophages? The authors demonstrate that conditioned media is sufficient for this repolarization, implicating secreted factors, but the specific mechanism is unclear. In addition, the connection between the broad, vaccination-like IgG response and CIN is not completely delineated. The authors demonstrate that mice who successfully clear CIN tumors have a broad anti-tumor IgG response. This broad IgG response has previously been demonstrated for tumors that do not have CIN. It is not clear if CIN specifically enhances the anti-tumor IgG response or if the broad IgG response is similar to other tumors. Finally, CIN is always induced with MPS1 inhibition. To specifically attribute this phenotype to CIN it would be most compelling to demonstrate that tumors with CIN unrelated to MPS1 inhibition are also able to repolarize macrophages.

      Overall, this is a thought-provoking study that will be of broad interest to many different fields including cancer biology, immunology and cell biology.

      We thank the Reviewer for their enthusiastic and positive comments toward the manuscript.

      Our main purpose with this study has been discovery science oriented and mechanistic, with implications for improving macrophage immunotherapies. More experimentation needs to be done to further understand how this positive immune response emerges. However, we could address whether or not CIN enhances the anti-tumor IgG response by quantitative comparisons to our two other recent studies, and we conclude that it does not, per new edits in the Abstract and the Results. See attached PPT for full details and comparison.

      Abstract:

      “CIN does not greatly affect the level of the induced response but does significantly increase survival.”

      “…these results demonstrate induction of a generally potent anti-cancer antibody response to CIN-afflicted B16F10 in a CD47 KO context. Importantly, comparing these sera results for CIN-afflicted tumors to our recent studies of the same tumor model without CIN (Dooling et al., 2022; Hayes et al., 2022), we find similar levels of IgG induction (e.g. ~100-fold above naive on average for IgG2a/c), similar increases in phagocytosis by sera opsonization (e.g. equivalent to anti-Tyrp1), and similar levels of suppressed tumoroid growth – including the variability.

      However, median survival increased (21 days) compared to their naïve counterparts (14 days), supporting the initial hypothesis of prolonged survival and consistent not only with past results indicating major benefits of a prime-&-boost approach with anti-Tyrp1 (Dooling et al., 2022) but also with the noted similarities in induced IgG levels.”

      Future studies could certainly focus on trying to identify what secreted factors might be inducing the M1-like polarization (using ELISA assays for cytokine detection, for example). This could be important because a main finding here is that we achieve nearly a 100% success rate in clearing tumors when we combine CD47 ablation and IgG opsonization with cancer cell CIN. Previous studies were only able to achieve about 40% cures in mice when working with CD47 disruption and IgG opsonization alone, suggesting CIN in this experimental context does improve macrophage response.

      Lastly, we agree with the Reviewer that future studies should also address how CIN in general (not MPS1i-induced) affects tumor growth. The final paragraph of our Discussion at least cites support for consistent effects of M1-like polarization:

      “The effects of CIN and aneuploidy in macrophages certainly require further investigation. We did publish recently that M1-like polarization of BMDMs with IFNg priming is sufficient to suppress growth of B16 tumoroids with anti-Tyrp1 opsonization more rapidly than unpolarized/unprimed macrophages and much more rapidly than M2-like polarization of BMDMs with IL4 (Extended Data Fig.5a in Dooling et al., 2023); hence, anti-cancer polarization contributes in this assay.

      While the secretome from MPS1i-treated cancer cells has been found to trigger…”

      Nonetheless, we can only speculate that there is a threshold of CIN reached by a certain timepoint in tumor engraftment and growth. Natural CIN might not be enough, so we pursued a pharmacological approach consistent with ongoing pre-clinical studies (https://doi.org/10.1158/1535-7163.MCT-15-0500). Future studies should consider trying knockdown models to gradually accrue CIN in tumors or using more relevant pharmacological drugs that are known to induce CIN not associated with the spindle. We believe, however, that these are larger questions on their own and are beyond the scope of the foundational discoveries in this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      None

      We again thank the Reviewer for their support and enthusiasm for the manuscript. We made some additional changes and added more data to address questions posed by the other Reviewer, which we hope further improve the manuscript.

    2. eLife assessment

      The authors provide compelling evidence that MPS1 inhibition (leading to chromosomal instability or CIN in the cancer cells) increases phagocytosis and that tumors with CIN respond better to macrophage therapeutics. In this important study, they demonstrate particularly impressive survival rates for mouse models of CIN B16 tumors treated with adoptively transferred macrophages, CD47-SIRPα blockade, and anti-Tyrp1 IgG.

    1. eLife assessment

      This study provides a useful set of experiments showing the relative contribution of the noradrenergic system in reversing the sedation induced by midazolam. The evidence supporting the claims of the authors is solid, although specificity issues in the pharmacology and neural-circuit investigations narrow down the strengths of the conclusions. After dealing with these limitations, the paper will be of interest to medical biologists working on the neurobiology of anesthesia.

    2. Reviewer #1 (Public Review):

      In this study, Gu et al. investigated the role of the central noradrenaline system from LC to VLPO in the recovery of consciousness induced by midazolam. Combining pharmacology and optogenetics/chemogenetics, they found that the LC to VLPO NE circuits are essential for consciousness rebooting after midazolam; activation of this circuit strongly sped up the recovery process, dependent on alpha1 adrenergic receptors in the VLPO neurons. The topic is important and their findings are of some interest.

      However, substantial improvements are needed in the language, for grammar, clarity, and layout. There are significant experimental errors (see below 1-2). Further experiments are required to support their main conclusions.

      (1) One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO, while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. This raises concerns about their viral infection strategy since Ca activity was observed in their photometry recording. This means that DBH promoter could randomly label some non-NE neurons. Is DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      (2) A similar issue arises with chemogenetic activation in Fig. 5 L-R, the authors used TH-cre and DIO-Gq virus to label VLPO neurons. Were they labelling VLPO NE or DA neurons for recording? The authors have to clarify this.

      (3) Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      (4) In Figure 1A-D, in the measurement of the dosage-dependent effect of Mida in LORR, were they only performed one batch of testing? If more than one batch of mice were used, error bar should be presented in 1B. Also, the rationale of testing TH expression levels after Mid is not clear. Is TH expression level change related to NE activation specifically? If so, they should cite references.

      (5) Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time=0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting for 20s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D, they stated that it is the signal at the BLA; were they also recorded from the BLA?

    3. Reviewer #2 (Public Review):

      Summary:

      This article mainly explores the neural circuit mechanism of recovery of consciousness after midazolam administration and proves that the LC-VLPO NEergic neural circuit helps to promote recovery from midazolam, and that this effect is mainly mediated by α1 adrenergic receptors (α1-R).

      Strengths:

      This article uses innovative methods such as optogenetics and fiber optic photometry in the experimental methods section to make the stimulation of neuronal cells more precise and the stimulation intensity more accurate in experimental research. In addition, fiber optic photometry adds confidence to the results of calcium detection in mouse neuronal cells.

      This article explains the results from the entire system down to cells, and then cells gradually unfold to explain the entire mechanism. The entire explanation process is logical and orderly. At the same time, this article conducted a large number of rescue experiments, which greatly increased the credibility of the experimental conclusions.

      Throughout the full text and all conclusions, this article has elucidated the neural circuit mechanism of recovery of consciousness after midazolam administration and successfully verified that the LC-VLPO NEergic neural circuit helps promote the recovery of midazolam.

      The conclusions of this article are crucial to ameliorate the complications of midazolam abuse. It will pinpoint relevant regions involved in midazolam response, provide a perspective to help elucidate the dynamic changes in neural circuits in the brain during altered consciousness, and suggest a promising approach towards the goal of timely recovery from midazolam, opening new research avenues.

      At the same time, this article also has important clinical translation significance. The application of clinical drug midazolam and animal experiments have certain guiding significance for subsequent related clinical research.

    4. Author Response:

      We sincerely value the insightful and constructive feedback provided by the reviewers, which has been instrumental in identifying areas of our manuscript that required further clarification or amendment. Below are our responses detailing each comment.

      Reviewer 1:

      (1) One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO, while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. This raises concerns about their viral infection strategy since Ca activity was observed in their photometry recording. This means that DBH promoter could randomly label some non-NE neurons. Is DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      (1) In Figure 5, we found that the VLPO received the noradrenergic projection from LC, indicating that the recorded Ca2+ activity may come from the axon fibers of this projection. Similarly, Gunaydin et al. (2014) demonstrated that fiber photometry can be used to selectively record from neuronal projections.

      (2) Located in the inner membrane of noradrenergic and adrenergic neurons, DBH (dopamine-beta-hydroxylase) is an enzyme that catalyzes the conversion of dopamine to norepinephrine and therefore plays an important role in noradrenergic neurotransmission. DBH is a marker of noradrenergic neurons. Zhou et al. (2020) confirmed that their probe specifically labeled noradrenergic neurons by immunolabeling for DBH. Recently, the DBH promoter has been used in several studies (e.g., Han et al., 2024; Lian et al., 2023). DBH-Cre mice are widely used to specifically label noradrenergic neurons (e.g., Li et al., 2023; Breton-Provencher et al., 2022; Liu et al., 2024). As the reviewer noted, it is difficult to distinguish the role of NE or DA neurons when using the TH promoter in VLPO. Therefore, we used the DBH promoter, which provides more specific labeling. LC is the main noradrenergic nucleus of the central nervous system. In our study, we injected rAAV-DBH-GCaMP6m-WPRE (Figure 2 and 8) and rAAV-DBH-EGFP-S'miR-30a-shRNA(GABAA receptor)-3’-miR30a-WPRES (Figure 9) into the LC. The results showed that the DBH promoter could specifically label noradrenergic neurons in the LC, while non-specific labeling outside the LC was almost absent. As suggested, we will quantify the labeling efficiency of both DBH and TH-cre throughout the revised manuscript. This updated figure will provide a more rigorous analysis.

      (2) A similar issue arises with chemogenetic activation in Fig. 5 L-R, the authors used TH-cre and DIO-Gq virus to label VLPO neurons. Were they labelling VLPO NE or DA neurons for recording? The authors have to clarify this.

      As previously addressed in response to Comment #1, we acknowledge that it is difficult to distinguish the role of NE or DA neurons when using the TH promoter in VLPO. In the revised manuscript, we are considering conducting more restricted AAV injections into the VLPO to verify terminal expressions in the LC.

      (3) Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      As suggested, we will supplement the multi-label ISH of LC NE downstream neurons in the VLPO to reveal the types of neurons they modulate.  

      (4) In Figure 1A-D, in the measurement of the dosage-dependent effect of Mida in LORR, were they only performed one batch of testing? If more than one batch of mice were used, error bar should be presented in 1B. Also, the rationale of testing TH expression levels after Mid is not clear. Is TH expression level change related to NE activation specifically? If so, they should cite references.

      (1) As recommended, we will add error bars in the revised manuscript.

      (2) As the reviewer suggested, the use of TH as a marker of NE activation is controversial, so in the revised manuscript we will directly determine central norepinephrine content.

      (5) Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time=0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting for 20s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D, they stated that it is the signal at the BLA; were they also recorded from the BLA?

      (1) In this study, we used fiber-optic calcium signal recording, a fluorescence-based measurement of changes in calcium. The fluorescence signal is usually divided into segments according to behavior, and each segment is aligned so that the specific behavioral event is time=0. The mean calcium fluorescence signal in the 1.5 s or 1 s window before the behavioral event is taken as the baseline fluorescence intensity (F0). The difference between the fluorescence intensity at the time of the behavior and the baseline fluorescence intensity is then divided by the difference between the baseline fluorescence intensity and the offset value. That is, the value ΔF/F0 represents the change of calcium fluorescence intensity when the event occurs. The results of the analysis are commonly represented by two kinds of graphs, namely heat maps and peri-event plots (e.g., Cheng et al., 2022; Gan-Or et al., 2023; Wei et al., 2018). In Fig. 2, the time points for awake, midazolam injection, LORR and RORR in mice were respectively selected as time=0, while in Fig. 4, RORR in mice was selected as time=0. The 20 s duration of the selected traces was based on the length of a complete Ca2+ signal. We will explain the Ca2+ recording experiment more specifically in the revised manuscript.

      (2) Regarding the BLA, we sincerely apologize for our carelessness; the signals we recorded were from the LC rather than the BLA. We will carefully check and correct similar problems in the revised manuscript.
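
      The ΔF/F0 normalization described in point (1) above can be sketched in a few lines. This is only an illustrative reading of the response, not the authors' analysis code: the function name, the sampling rate, the offset parameter and the example trace are all hypothetical, and the formula assumed is (F − F0) / (F0 − offset), with F0 taken as the mean over the pre-event window.

```python
from statistics import mean


def delta_f_over_f0(trace, sample_rate_hz, event_index, baseline_s=1.5, offset=0.0):
    """Return dF/F0 for every sample of one peri-event segment.

    trace          : raw fluorescence samples (hypothetical data)
    sample_rate_hz : sampling rate of the recording (assumed, not from the source)
    event_index    : sample index of the behavioral event (time = 0)
    baseline_s     : length of the pre-event baseline window in seconds
    offset         : detector offset value subtracted from the baseline
    """
    n_baseline = int(round(baseline_s * sample_rate_hz))
    if n_baseline < 1 or event_index < n_baseline:
        raise ValueError("not enough samples before the event for the baseline")
    # F0: mean fluorescence in the window immediately before the event.
    f0 = mean(trace[event_index - n_baseline:event_index])
    denom = f0 - offset
    if denom == 0:
        raise ValueError("baseline equals offset; dF/F0 is undefined")
    # (F - F0) / (F0 - offset), following the description in the response.
    return [(f - f0) / denom for f in trace]


# Hypothetical segment: flat baseline of 100, stepping to 150 at the event.
segment = [100.0] * 30 + [150.0] * 30
dff = delta_f_over_f0(segment, sample_rate_hz=20, event_index=30, baseline_s=1.5)
```

      With a zero offset this reduces to the familiar (F − F0) / F0; a non-zero offset only rescales the result.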

      Reviewer 2:

      In figure legends, abbreviations in figure should be supplemented as much as possible. For example, "LORR" in Figure 1.

      As suggested, we will define the abbreviations used in the figures as fully as possible in the revised manuscript.

      References

      Gunaydin LA, Grosenick L, Finkelstein JC, et al. Natural neural projection dynamics underlying social behavior. Cell. 2014;157(7):1535-1551. doi:10.1016/j.cell.2014.05.017

      Zhou N, Huo F, Yue Y, Yin C. Specific Fluorescent Probe Based on "Protect-Deprotect" To Visualize the Norepinephrine Signaling Pathway and Drug Intervention Tracers. J Am Chem Soc. 2020;142(41):17751-17755. doi:10.1021/jacs.0c08956

      Han S, Jiang B, Ren J, et al. Impaired Lactate Release in Dorsal CA1 Astrocytes Contributed to Nociceptive Sensitization and Comorbid Memory Deficits in Rodents. Anesthesiology. 2024;140(3):538-557. doi:10.1097/ALN.0000000000004756

      Lian X, Xu Q, Wang Y, et al. Noradrenergic pathway from the locus coeruleus to heart is implicated in modulating SUDEP. iScience. 2023;26(4):106284. doi:10.1016/j.isci.2023.106284

      Li C, Sun T, Zhang Y, et al. A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice. Neuron. 2023;111(17):2727-2741.e7. doi:10.1016/j.neuron.2023.05.023

      Breton-Provencher V, Drummond GT, Feng J, Li Y, Sur M. Spatiotemporal dynamics of noradrenaline during learned behaviour. Nature. 2022;606(7915):732-738. doi:10.1038/s41586-022-04782-2

      Liu Q, Luo X, Liang Z, et al. Coordination between circadian neural circuit and intracellular molecular clock ensures rhythmic activation of adult neural stem cells. Proc Natl Acad Sci U S A. 2024;121(8):e2318030121. doi:10.1073/pnas.2318030121

      Cheng J, Ma X, Li C, et al. Diet-induced inflammation in the anterior paraventricular thalamus induces compulsive sucrose-seeking. Nat Neurosci. 2022;25(8):1009-1013. doi:10.1038/s41593-022-01129-y

      Gan-Or B, London M. Cortical circuits modulate mouse social vocalizations. Sci Adv. 2023;9(39):eade6992. doi:10.1126/sciadv.ade6992

      Wei YC, Wang SR, Jiao ZL, et al. Medial preoptic area in mice is capable of mediating sexually dimorphic behaviors regardless of gender. Nat Commun. 2018;9(1):279. doi:10.1038/s41467-017-02648-0

    1. eLife assessment

      Münker and colleagues use an optical tweezer setup to apply oscillatory forces to endocytosed/phagocytosed glass beads over a wide frequency range (from ~1 to 1000 Hz) and probe cytoplasmic material properties at multiple time scales in six different cell types. Using statistical methods and principal component analysis, they find that the active and passive mechanical properties of cells can be described by 6 parameters (from power law fits) that allow characterizing the viscous and elastic nature of the cytoplasmic material as well as an effective active energy driven by cellular metabolism. Overall, this is very well done and important work, using convincing and state-of-the-art methods, albeit with some limitations related to the way the beads are internalized.

    2. Reviewer #1 (Public Review):

      Summary:

      In this MS, Muenker and colleagues explore the intracellular mechanics of a range of adherent animal cells. The study is based on an optical tweezer setup, which allows oscillatory forces to be applied to endocytosed/phagocytosed glass beads over a large frequency range (from ~1 to 1000 Hz), making it possible to probe cytoplasm material properties at multiple time scales. By switching off the laser trap, the authors also record the positional fluctuations of beads to extract passive rheological signatures. The combination of both methods allows fitting 6 parameters (from power law fits) that characterize the viscous and elastic nature of the cytoplasm material as well as an effective active energy driven by cellular metabolism. Using these methodologies, the authors first establish/confirm, using HeLa cells, that the cytoplasm is more solid-like at low frequencies and more fluid-like at higher frequencies, and that these material states depend on both microtubules and the actin cytoskeleton. The manuscript then goes on to explore how these parameters evolve in 6 other cell types, including muscle cells, highly migratory cells, and epithelial cells. These results show, for instance, that muscle cells are much stiffer, while migratory cells are more fluid-like with an increased active energy. Finally, using statistical methods and principal component analysis, the authors establish mechanical fingerprints (activity, fluidity and resistance) that make it possible to distinguish a cell's mechanical state and relate it to its particular function.
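
      The summary above mentions power-law fits without stating their form. Purely as an illustration of the general idea, and assuming a single power law G(f) = A·f^α (the manuscript's actual model and its six parameters may well differ), a prefactor and exponent can be estimated by linear least squares in log-log space. The function name and the synthetic data below are hypothetical.

```python
import math


def fit_power_law(freqs_hz, values):
    """Fit values = A * f**alpha by least squares on log-transformed data.

    Returns (A, alpha). All inputs must be strictly positive.
    """
    xs = [math.log(f) for f in freqs_hz]
    ys = [math.log(v) for v in values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx                                # slope in log-log space
    prefactor = math.exp(y_mean - alpha * x_mean)    # intercept mapped back to A
    return prefactor, alpha


# Synthetic, noise-free example spanning the ~1 to 1000 Hz range quoted above.
freqs = [1.0, 10.0, 100.0, 1000.0]
modulus = [5.0 * f ** 0.3 for f in freqs]
A, alpha = fit_power_law(freqs, modulus)
```

      In this simple picture, a larger prefactor would correspond to a stiffer response and a larger exponent to more fluid-like behavior; real data would also require handling of noise and of any additional terms in the model.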

      Strengths:

      Overall this is a very well-executed work, which provides a large body of rigorous numbers and data to understand the regulation of cytoplasm mechanics and its relation to cell state/function.

      Weaknesses:

      A limit of the paper is that the biological mechanisms by which intracellular mechanics is modulated (e.g. among cell types) remain unexplored and only briefly discussed. Yet this limit is greatly offset by the rigor of the approach.

    3. Reviewer #2 (Public Review):

      Summary:

      By analyzing cells' frequency-dependent viscoelastic properties and intracellular activity through microrheology, Münker et al simplify the complex active mechanical state into six key parameters that constitute the mechanical fingerprint. They apply this concept to cells treated with cytoskeleton-inhibiting drugs. Additionally, a comprehensive statistical analysis across various cell types shows how cells coordinate their mechanical properties within a defined phase-space marked by activity, mechanical resistance, and fluidity.

      Strengths:

      (1) The distribution of the six parameters: they have been well characterized based on established theories, and they can be used to understand cell-type-specific biomechanical differences. The examples of muscle cells and immune cells were profound and informative.

      (2) Efforts to perform dimension reduction of parameter space into activity (E), fluidity (C1) and resistance (A) are insightful and will be helpful for future characterization of cell mechanics.

      Weaknesses:

      (1) The most difficult part of the method is the part with actin polymerization inhibition with cytochalasin B. The data shows that viscoelastic parameters as well as active energy parameters are unaffected by cytochalasin B. It is reasonable to expect that elasticity will reduce and fluidity will increase upon application of such a drug. The stiffness-reducing effect was observed only when CB was used with nocodazole most likely because of phagocytosis of the bead, which is governed by microtubule. The use of other actin-depolymerizing drugs such as latrunculin A would be needed to test actin's role in mechanical fingerprints. If actin's role is only explained by accompanying microtubule inhibition, it is not a convenient system to directly test the mechano-adaptation process.

      (2) Depolymerization of MT with nocodazole did not reduce the solid-like property A. Adding discussion and comparison with other papers in the literature using nocodazole will be helpful in understanding why.

      (3) Overall, the usefulness of the concept of mechanical fingerprints and comparisons with other cell mechanics studies (from other groups) will make this manuscript stronger.

    4. Reviewer #3 (Public Review):

      Summary:

      Cells and tissues are viscoelastic materials. However, metabolic processes that underlie survival, growth and migration render the cell active matter at non-equilibrium. These two facts contribute to the difficulty of probing mechanical properties, especially with sub-cellular resolution. However, the concept that the mechanical phenotype can be indicative of normal physiology necessitates approaches for defining the cellular phenotype. Here, Muenker et al. make a powerful argument for mapping intracellular mechanics using optical tweezer-based active microrheology. They present a suite of parameters towards a definition of a mechanical fingerprint. This is a compelling idea. There are some concerns, as detailed below.

      Strengths:

      These are technically challenging experiments and the authors provide systematic approaches to probe a system at non-equilibrium.

      Weaknesses:

      The importance of the mechanical fingerprint is diluted due to some missing controls needed for biological relevance.

    5. Author response:

      Reviewer 1:

      A limit of the paper is that the biological mechanisms by which intracellular mechanics is modulated (e.g. among cell types) remain unexplored and only briefly discussed. Yet this limit is greatly offset by the rigor of the approach.

      We thank the reviewer for the valuable feedback. The question regarding the biological mechanisms responsible for the different mechanical properties is, indeed, a highly important and interesting issue. In line with the reviewer, we consider this so important that it requires an extra, dedicated research focus, which is far beyond the scope of this article. By introducing the concept of the mechanical fingerprint, we provide in this work the framework to systematically investigate biological mechanisms but also the functional relevance of the intracellular mechanical properties in future studies. In the revised manuscript, we’ll elaborate on the discussion.

      Reviewer 2:

      The most difficult part of the method is the part with actin polymerization inhibition with cytochalasin B. The data shows that viscoelastic parameters as well as active energy parameters are unaffected by cytochalasin B. It is reasonable to expect that elasticity will reduce and fluidity will increase upon application of such a drug. The stiffness-reducing effect was observed only when CB was used with nocodazole most likely because of phagocytosis of the bead, which is governed by microtubule. The use of other actin-depolymerizing drugs such as latrunculin A would be needed to test actin’s role in mechanical fingerprints. If actin’s role is only explained by accompanying microtubule inhibition, it is not a convenient system to directly test the mechano-adaptation process.

      We thank the reviewer for the time and the instructive feedback. Our finding that actin depolymerization has no effect on the intracellular mechanics may appear unfamiliar, as many rheological studies performed on the cell’s cortex highlight the importance of actin for the mechanical properties of the whole cell. However, as the actin network is reported to be very sparse away from the cortex, it is not impossible that the mechanical properties may be dominated by other structures in the cytoplasm. Indeed, our findings are consistent with other studies that see no strong effect of actin depolymerization on the interphase intracellular mechanics (e.g. https://doi.org/10.1016/j.bpj.2023.04.011 or https://doi.org/10.1038/s41567-021-01368-z). Still, we fully agree with the reviewers that this is an important point. In a revised version we aim to investigate the effect of other actin-depolymerizing drugs and will try to perform immunostaining to visualize and further illuminate the potential compensation mechanism between actin and MT.

      Depolymerization of MT with nocodazole did not reduce the solid-like property A. Adding discussion and comparison with other papers in the literature using nocodazole will be helpful in understanding why.

      Again, we agree with the reviewer and propose to further study this point by performing additional immunostainings and by elaborating on the discussion, also including the results of other studies.

      Reviewer 3:

      The importance of the mechanical fingerprint is diluted due to some missing controls needed for biological relevance.

      We thank the reviewer for his valuable time and feedback. This comment is in line with the point already raised by reviewer 1 and highlights the important question of how the intracellular mechanical properties are related to the actual cell function. We fully agree with the reviewers that at this point we can only report on differences, but cannot claim a biological function that depends on the fingerprint. Although we think the alignment between function and the mechanical fingerprints allows the hypothesis that the biological system is tuning its mechanical properties for a specific function, we do not want to make any claim in this direction at the current state of our research. Hence, to answer these intriguing questions, carefully designed control experiments are required, as pointed out by the reviewer. However, this direction is not within the scope of this manuscript. Here, we establish the tools we’ll use in future studies to address these highly relevant questions. Therefore, we propose to discuss these important future directions in a revised manuscript.

    1. eLife assessment

      This valuable manuscript sets out to identify sleep/arousal phenotypes in larval zebrafish carrying mutations in Alzheimer's disease (AD)-associated genes. The authors provide detailed phenotypic data for F0 knockouts of each of 7 AD-associated genes and then compare the resulting behavioral fingerprints to those obtained from a large-scale chemical screen to generate new hypotheses about underlying molecular mechanisms. The data presented are solid, although extensive interpretation of pharmacological screen data does not necessarily reflect the limited mechanistic data. Nonetheless, the phenotypic characterization presented is comprehensive, and the authors develop a well-designed behavioral analysis pipeline that will provide considerable value for zebrafish neuroscientists.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits between genetic mutants with those obtained from a panel of small molecules to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in nighttime sleep loss, consistent with the well-supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen-1, an early-onset associated AD gene, which the authors find is principally responsible for the generation of AB40 and AB42 in zebrafish, also shows a slight increase in activity at night and slight decreases in nighttime sleep. Conversely, psen-2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are nighttime sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk? For those mutants that cause nighttime sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes? Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors. Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes. Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that nighttime sleep is disrupted in mutants for multiple late-onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

    3. Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 was more complete, for example, the overarching idea would be more compelling.

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

    4. Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed.

      A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      b. psen2KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early-life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just 1 drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

    5. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, Kroll et al. conduct an in-depth behavioral analysis of F0 knockouts of 4 genes associated with late-onset Alzheimer's Disease (AD), together with 3 genes associated with early-onset AD. Kroll and colleagues developed a web application (ZOLTAR) to compare sleep-associated traits between genetic mutants with those obtained from a panel of small molecules to promote the identification of affected pathways and potential therapeutic interventions. The authors make a set of potentially important findings vis-à-vis the relationship between AD-associated genes and sleep. First, they find that loss-of-function in late-onset AD genes universally results in nighttime sleep loss, consistent with the well-supported hypothesis that sleep disruption contributes to Alzheimer's-related pathologies. psen-1, an early-onset associated AD gene, which the authors find is principally responsible for the generation of AB40 and AB42 in zebrafish, also shows a slight increase in activity at night and slight decreases in nighttime sleep. Conversely, psen-2 mutations increase daytime sleep, while appa/appb mutations have no impact on sleep. Finally, using ZOLTAR, the authors identify serotonin receptor activity as potentially disrupted in sorl1 mutants, while betamethasone is identified as a potential therapeutic to promote reversal of psen2 knockout-associated phenotypes.

      This is a highly innovative and thorough study, yet a handful of key questions remain. First, are nighttime sleep loss phenotypes observed in all knockouts for late-onset AD genes in the larval zebrafish a valid proxy for AD risk?

      We cannot say, but it is an interesting question. We selected the four late-onset Alzheimer’s risk genes (APOE, CD2AP, CLU, SORL1) based on human genetics data and brain expression in zebrafish larvae, not based on their likelihood to modify sleep behaviour, which we could have tried by searching for overlaps with GWAS of sleep phenotypes, for example. Consequently, we find it remarkable that all four of these genes caused a night-time sleep phenotype when mutated. We also find it reassuring that knockout of appa/appb and psen2 did not cause a night-time sleep phenotype, which largely excludes the possibility that the phenotype is a technical artefact (e.g. caused by the F0 knockout method) or a property of every gene expressed in the larval brain.

      Having said that, it could still be a coincidence, rather than a special property of genes associated with late-onset AD. In addition to testing additional late-onset Alzheimer’s risk genes, the ideal way to answer this question would be to test in parallel a random set of genes expressed in the brain at this stage of development. From this random set, one could estimate the proportion of genes that cause a night-time sleep phenotype when mutated. One could then use that information to test whether late-onset Alzheimer’s risk genes are indeed enriched for genes that cause a night-time sleep phenotype when mutated.
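      The enrichment test described above can be sketched as follows. This is only an illustration under stated assumptions: the counts for the random gene set are hypothetical (no such screen has been run), the background rate is treated as known exactly, and a one-sided binomial tail stands in for whatever test would actually be chosen.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical random set of brain-expressed genes (NOT real data).
random_genes_tested = 50
random_genes_with_phenotype = 5
background_rate = random_genes_with_phenotype / random_genes_tested

# Late-onset risk genes from the study: 4 tested, 4 with a night-time sleep phenotype.
risk_genes_tested = 4
risk_genes_with_phenotype = 4

# Probability of seeing at least this many hits if risk genes behaved like random genes.
p_value = binomial_tail(risk_genes_with_phenotype, risk_genes_tested, background_rate)
print(f"background rate = {background_rate:.2f}, one-sided p = {p_value:.4f}")
```

      With a small random set, the uncertainty in the background rate matters; a Fisher's exact test on the 2x2 table of (risk vs. random) by (phenotype vs. no phenotype) would account for it and may be the more appropriate choice.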

      For those mutants that cause nighttime sleep disturbances, do these phenotypes share a common underlying pathway? e.g. Do 5-HT reuptake inhibitors promote sleep across all 4 late-onset genes in addition to psen1? Can 5-HT reuptake inhibitors reverse other AD-related pathologies in zebrafish? Can compounds be identified that have a common behavioral fingerprint across all or multiple AD risk genes? Do these modify sleep phenotypes?

      To attempt to answer these questions, we used ZOLTAR to generate predictions for all the knockout behavioural fingerprints presented in the study, in the same way as for sorl1 in Fig. 5 and Fig. 5–suppl. 1. Here are the indications, targets, and KEGG pathways which are shared by the largest number of knockouts:

      – Four indications are shared by 4/7 knockouts: “mydriasis” (dilated pupils, significant for psen1, apoea/apoeb, cd2ap, clu); “fragile X syndrome” (psen1, apoea/apoeb, cd2ap, sorl1); “insomnia” (psen2, apoea/apoeb, cd2ap, sorl1); “malignant essential hypertension” (appa/appb, psen1, apoea/apoeb, cd2ap).

      – Two targets are shared by 5/7 knockouts: “glycogen synthase kinase-3 alpha” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “neuronal acetylcholine receptor beta-2” (appa/appb, psen1, apoea/apoeb, cd2ap, clu).

      – Two KEGG pathways are shared by 5/7 knockouts: “cholinergic synapse” (psen1, apoea/apoeb, cd2ap, clu, sorl1) and “nitrogen metabolism” (appa/appb, psen1, psen2, cd2ap, clu).
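      The tally behind the list above can be sketched as a simple count of how many knockouts share each significant annotation. The per-knockout sets below are transcribed from the four indications listed in this response only; the data structure and variable names are assumptions for illustration and do not reflect ZOLTAR's actual output format.

```python
from collections import Counter

# Significant indications per knockout, restricted to the four indications
# named in the response (illustrative structure, not ZOLTAR's real output).
significant = {
    "appa/appb": {"malignant essential hypertension"},
    "psen1": {"mydriasis", "fragile X syndrome", "malignant essential hypertension"},
    "psen2": {"insomnia"},
    "apoea/apoeb": {
        "mydriasis",
        "fragile X syndrome",
        "insomnia",
        "malignant essential hypertension",
    },
    "cd2ap": {
        "mydriasis",
        "fragile X syndrome",
        "insomnia",
        "malignant essential hypertension",
    },
    "clu": {"mydriasis"},
    "sorl1": {"fragile X syndrome", "insomnia"},
}

n_knockouts = len(significant)

# Count, for each annotation, the number of knockouts in which it is significant.
counts = Counter(a for annotations in significant.values() for a in annotations)

# Most widely shared annotations first; ties broken alphabetically.
for annotation, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{annotation}: {n}/{n_knockouts} knockouts")
```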

      As a reminder, we hypothesised that loss of Sorl1 affected serotonin signalling based on the following annotations being significant: indication “depression”, target “serotonin transporter”, and KEGG pathway “serotonergic synapse”. All three are also significant for psen2 knockouts, but for no other knockout. ZOLTAR therefore does not predict serotonin signalling to be a major theme common to all mutants with a night-time sleep loss phenotype.

      While perhaps not surprising, we find it reassuring that insomnia appears among the indications shared by the largest number of knockouts. apoea/apoeb, cd2ap, and sorl1 also happen to be the knockouts with the largest loss in night-time sleep.

      Particularly interesting is cholinergic signalling appearing in the most common targets and KEGG pathways. Acetylcholine signalling is a major theme in research on Alzheimer’s disease. For example, the first four drugs ever approved by the FDA to treat Alzheimer’s disease were acetylcholinesterase inhibitors, which increase acetylcholine signalling by preventing its breakdown by acetylcholinesterase. These drugs are generally considered only to treat symptoms and not modify disease course, but this view has been called into question (Munoz-Torrero, 2008; Relkin, 2007). If, as ZOLTAR suggests, mutations in several Alzheimer’s risk genes affect cholinergic signalling early in development, this would point to a potential causal role of cholinergic disruption in Alzheimer’s disease.

      We see that literature also exists on the involvement of glycogen synthase kinase-3 in AD (Lauretti et al., 2020). We plan to explore these predictions further in a future study.

      Finally, the web-based platform presented could be expanded to facilitate comparison of other behavioral phenotypes, including stimulus-evoked behaviors.

      Yes, absolutely. The behavioural dataset we used (Rihel et al., 2010) did not measure responses to stimuli other than day/night light transitions, but the “SauronX” platform and dataset (Myers-Turnbull et al., 2022) seems particularly well suited for this. To provide some context, we and collaborators have occasionally used the dataset by Rihel et al. (2010) to generate hypotheses or find candidate drugs that reverse a behavioural phenotype measured in the sleep/wake assay (Ashlin et al., 2018; Hoffman et al., 2016). The present work was the occasion to enable a wider and more intuitive use of this dataset through the ZOLTAR app, which has already proven successful. Future versions of ZOLTAR will seek to incorporate larger drug datasets using more types of measurements.

      Finally, the authors propose but do not test the hypothesis that sorl1 might regulate localization/surface expression of 5-HT2 receptors. This could provide exciting / more convincing mechanistic support for the assertion that serotonin signaling is disrupted upon loss of AD-associated genes.

      5-HT receptor type 4a is another candidate as it was shown to interact with sorting nexin 27, a subunit of retromer (Joubert et al., 2004). We see that antibodies against human 5-HT receptor type 2 and 4a exist; whether they would work in zebrafish remains to be tested, and in our experience, the availability of antibodies suitable for immunohistochemistry in the zebrafish is a serious experimental roadblock.

      Despite these important considerations, this study provides a valuable platform for high-throughput analysis of sleep phenotypes and correlation with small-molecule-induced sleep phenotypes.

      Strengths:

      - Provides a useful platform for comparison of sleep phenotypes across genotypes/drug manipulations.

      - Presents convincing evidence that nighttime sleep is disrupted in mutants for multiple late-onset AD-related genes.

      - Provides potential mechanistic insights for how AD-related genes might impact sleep and identifies a few drugs that modify their identified phenotypes

      Weaknesses:

      - Exploration of potential mechanisms for serotonin disruption in sorl1 mutants is limited.

      - The pipeline developed can only be used to examine sleep-related / spontaneous movement phenotypes and stimulus-evoked behaviors are not examined.

      - Comparisons between mutants/exploration of commonly affected pathways are limited.

      Thank you for these excellent suggestions, please see our answers above.

      Reviewer #2 (Public Review):

      Summary:

      This work delineates the larval zebrafish behavioral phenotypes caused by the F0 knockout of several important genes that increase the risk for Alzheimer's disease. Using behavioral pharmacology, comparing the behavioral fingerprint of previously assayed molecules to the newly generated knockout data, compounds were discovered that impacted larval movement in ways that suggest interaction with or recovery of disrupted mechanisms.

      Strengths:

      This is a well-written manuscript that uses newly developed analysis methods to present the findings in a clear, high-quality way. The addition of an extensive behavioral analysis pipeline is of value to the field of zebrafish neuroscience and will be particularly helpful for researchers who prefer the R programming language. Even the behavioral profiling of these AD risk genes, regardless of the pharmacology aspect, is an important contribution. The recovery of most behavioral parameters in the psen2 knockout with betamethasone, predicted by comparing fingerprints, is an exciting demonstration of the approach. The hypotheses generated by this work are important stepping stones to future studies uncovering the molecular basis of the proposed gene-drug interactions and discovering novel therapeutics to treat AD or co-occurring conditions such as sleep disturbance.

      Weaknesses:

      - The overarching concept of the work is that comparing behavioral fingerprints can align genes and molecules with similarly disrupted molecular pathways. While the recovery of the psen2 phenotypes by one molecule with the opposite phenotype is interesting, as are previous studies that show similar behaviorally-based recoveries, the underlying assumption that normalizing the larval movement normalizes the mechanism still lacks substantial support. There are many ways that a reduction in movement bouts could be returned to baseline that are unrelated to the root cause of the genetically driven phenotype. An ideal experiment would be to thoroughly characterize a mutant, such as by identifying a missing population of neurons, and use this approach to find a small molecule that rescues both behavior and the cellular phenotype. If the connection to serotonin in the sorl1 was more complete, for example, the overarching idea would be more compelling.

      Thank you for this cogent criticism.

      On the first point, we were careful not to claim that betamethasone normalises the molecular/cellular mechanism that causes the psen2 behavioural phenotype. Having said that, yes, to a certain extent that would be the hope of the approach. As you say, not every compound that normalises the behavioural fingerprint will normalise the underlying mechanism, but the converse seems true: every compound that normalises the underlying mechanism should also normalise the behavioural fingerprint. We think this logic makes the “behaviour-first” approach innovative and interesting. The logic is to discover compounds that normalise the behavioural phenotype first, and only subsequently test whether they also normalise the molecular mechanism, akin to testing first whether a drug resolves the symptoms before testing whether it actually modifies disease course. While in practice testing thousands of drugs in sufficient sample sizes and replicates on a mutant line is challenging, the dataset queried through ZOLTAR provides a potential shortcut by shortlisting in silico compounds that have the opposite effect on behaviour.
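      The idea of shortlisting compounds with the opposite behavioural effect can be sketched as ranking drug fingerprints by their similarity to the knockout fingerprint, most anticorrelated first. This is a minimal sketch under stated assumptions: the fingerprints are invented six-value vectors, the compound names are placeholders, and cosine similarity is a stand-in measure, not necessarily the one ZOLTAR uses.

```python
from math import sqrt


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero fingerprints."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical knockout fingerprint: six behavioural parameters as z-scores
# relative to controls (invented values, not data from the study).
knockout = [-1.2, -0.8, 0.9, 0.1, -0.3, 0.4]

# Hypothetical drug fingerprints over the same six parameters.
compounds = {
    "compound_A": [1.0, 0.9, -0.7, 0.0, 0.2, -0.5],
    "compound_B": [-1.1, -0.6, 1.0, 0.2, -0.2, 0.3],
    "compound_C": [0.1, -0.2, 0.0, 0.9, -0.8, 0.1],
}

# Sort ascending: the most negative similarity (opposite phenotype) comes first.
ranked = sorted(compounds, key=lambda name: cosine(knockout, compounds[name]))
for name in ranked:
    print(f"{name}: cosine = {cosine(knockout, compounds[name]):+.2f}")
```

      Compounds at the top of such a ranking would be candidates for a rescue experiment; compounds at the bottom mimic the knockout and may instead hint at the affected pathway.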

      You mention a “reduction in movement bouts” but note here that the number of behavioural parameters tested is key to our argument. To take the two extremes, say the only behavioural parameter we measured in psen2 knockout larvae was time active during the day, then, yes, any stimulant used at the right concentration could probably normalise the phenotype. In this situation, claiming that the stimulant is likely to also normalise the underlying mechanism, or even that it is a genuine “phenotypic rescue”, would not be convincing. Conversely, say we were measuring thousands of behavioural parameters under various stimuli, such as swimming speed, position in the well, bout usage, tail movements, and eye angles, it seems almost impossible for a compound to rescue most parameters without also normalising the underlying mechanism. The present approach is somewhere in between: ZOLTAR uses six behavioural parameters for prediction (e.g. Fig 6a), but all 17 parameters calculated by FramebyFrame can be used to assess rescue during a subsequent experiment (Fig. 6c). For both, splitting each parameter into day and night increases the resolution of the approach, which partly answers your criticism. For example, betamethasone rescued the day-time hypoactivity without causing night-time hyperactivity, so we are not making the “straw man argument” explained above of using any broad stimulant to rescue the hypoactivity phenotype.

      Furthermore, for diseases where the behavioural defect is the primary concern, such as autism or bipolar disorder, perhaps this behaviour-first approach is all that is needed, and whether or not the compound precisely rescues the underlying mechanism is somewhat secondary. The use of lithium to prevent manic episodes in bipolar disorder is a good example. It was initially tested because mania was thought to be caused by excess uric acid and lithium can dissolve uric acid (Mitchell and Hadzi-Pavlovic, 2000). The theory is now discredited, but lithium continues to be used without a precise understanding of its mode of action. In this example, behavioural rescue alone, with tolerable secondary effects, is sufficient to be beneficial to patients, and whether it modulates the correct causal pathway is secondary.

      On the second point, we agree that first testing ZOLTAR on a mutant for which we have a fairly good understanding of the mechanism causing the behavioural phenotype could have been a productive approach. Note, however, that examples already exist in the literature. First, Hoffman et al. (2016) found that drugs generating behavioural fingerprints that positively correlate with the cntnap2a/cntnap2b double knockout fingerprint are enriched for NMDA and GABA receptor antagonists. In experiments analogous to our citalopram treatment (Fig. 5c,d), cntnap2a/cntnap2b knockout larvae were found to be overly sensitive to the NMDA receptor antagonist MK-801 and the GABAA receptor antagonist pentylenetetrazol (PTZ). Among other drugs tested, zolpidem, a GABAA receptor agonist, caused opposite effects on wild-type and cntnap2a/cntnap2b knockout larvae. Knockout larvae also had fewer GABAergic neurons in the forebrain. Second, Ashlin et al. (2018) found that the fingerprint of pitpnc1a knockout larvae clustered with anti-inflammatory compounds. Flumethasone, an anti-inflammatory corticosteroid, caused a lower increase in activity when added to knockout larvae compared to wild-type larvae. While these studies did not use precisely the same analysis that ZOLTAR runs, they used the same rationale and behavioural dataset to make these predictions (Rihel et al., 2010), which shows that approaches like ZOLTAR can point to causal processes.

      Related to your next point, we may reduce the discussion on sorl1 and serotonin and add some of the present arguments instead, depending on the results from testing a second SSRI (see next point).

      - The behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram is based on a small number of animals. The KO Euclidean distance measure is also more spread out than for the other datasets, and it looks like only five or so fish are driving the group difference. It also appears as though the numbers were also from two injection series. While there is nothing obviously wrong with the data, I would feel more comfortable if such a strong statement of a result from a relatively subtle phenotype were backed up by a higher N or a stable line. It is not impossible that the observed difference is an experimental fluke. If something obvious had emerged through the HCR, that would have also supported the conclusions. As it stands, if no more experiments are done to bolster the claim, the confidence in the strength of the link to serotonin should be reduced (possibly putting the entire section in the supplement and modifying the discussion). The discussion section about serotonin and AD is interesting, but I think that it is excessive without additional evidence.

      We mostly agree with this criticism. One could interpret the larger spread of the data for sorl1 larvae treated with 10 µM citalopram as evidence that the knockout larvae do indeed react differently to the drug at this dose. However, the result indeed does not survive removing the top 5 (p = 0.87) or top 3 (p = 0.18) sorl1 larvae.
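
      The robustness check described above (removing the most extreme knockout larvae and re-testing) can be sketched as follows. This is only an illustration under stated assumptions: the distances are made-up numbers, the function names are hypothetical, and a simple permutation test on the difference in means stands in for whichever test was used in the manuscript.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


def drop_top_k(values, k):
    """Return the values sorted in ascending order with the k largest removed."""
    return sorted(values)[: len(values) - k]


# Hypothetical Euclidean distances from the mean control fingerprint.
# These numbers are invented for illustration and are NOT the study's data.
knockout = [2.1, 2.4, 2.2, 2.6, 2.3, 2.5, 6.8, 7.4, 8.1, 7.9, 6.5]
control = [2.0, 2.3, 2.2, 2.5, 2.1, 2.4, 2.6, 2.2, 2.3, 2.4, 2.5]

for k in (0, 3, 5):
    trimmed = drop_top_k(knockout, k)
    p = permutation_p_value(trimmed, control)
    print(f"top {k} removed: n = {len(trimmed)}, p = {p:.3f}")
```

      In this toy dataset the group difference is carried entirely by five larvae, so the p-value rises sharply once they are removed, which is the pattern of fragility the reviewer is concerned about.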

      Given that the HCR did not reveal anything striking, we agree with you that too much of our argument relies on this result being robust. As you and reviewer #3 suggest, we plan on repeating this experiment with a different selective serotonin reuptake inhibitor (SSRI). If the other SSRI also shows a differential effect, this should strengthen the claim that ZOLTAR correctly predicted serotonin signalling as being affected by the loss of Sorl1, even if we did not discover the molecular mechanism.

      - The authors suggest two hypotheses for the behavioral difference between the sorl1 KO and scrambled at the higher dose of the citalopram. While the first is tested, and found to not be supported, the second is not tested at all ("Ruling out the first hypothesis, sorl1 knockouts may react excessively to a given spike in serotonin." and "Second, sorl1 knockouts may be overly sensitive to serotonin itself because post-synaptic neurons have higher levels of serotonin receptors."). Assuming that the finding is robust, there are probably other reasons why the mutants could have a different sensitivity to this molecule. However, if this particular one is going to be mentioned, it is surprising that it was not tested alongside the first hypothesis. This work could proceed without a complete explanation, but additional discussion of the possibilities would be helpful or why the second hypothesis was not tested.

      There are no strong scientific reasons why this hypothesis was not tested. The lead author (F Kroll) moved to a different lab and country so the project was finalised at that time. We do not plan on testing this hypothesis at this stage. However, we will adapt the wording to make it clear this is one possible alternative hypothesis which could be tested in the future, rather than the only alternative.

      - The authors claim that "all four genes produced a fairly consistent phenotype at night". While it is interesting that this result arose in the different lines, the second clutch for some genes did not replicate as well as others. I think the findings are compelling, regardless, but the sometimes missing replicability should be discussed. I wonder if the F0 strategy adds noise to the results and if clean null lines would yield stronger phenotypes. Please discuss this possibility, or others, in regard to the variability in some phenotypes.

      For the first part of this point, please see below our answer to Reviewer #3, point (2) c.

      Regarding the F0 strategy potentially adding variability, it is an interesting question which we tested in a larger dataset of behavioural recordings from F0 and stable knockouts for the same genes (unpublished). In summary, the F0 knockout method does not increase clutch-to-clutch or larva-to-larva variability in the assay. F0 knockout experiments found many more significant parameters and larger effect sizes than stable knockout experiments, but this difference could largely be explained by the larger sample sizes of F0 knockout experiments. In fact, larger sample sizes within individual clutches appear to be a major advantage of the F0 knockout approach over in-cross of heterozygous knockout animals, as they increase the sensitivity of the assay without causing substantial variability. We plan to report on this analysis in more detail in a separate paper, as we think it would dilute the focus of the present work.

      - In this work, the knockout of appa/appb is included. While APP is a well-known risk gene, there is no clear justification for making a knockout model. It is well known that the upregulation of app is the driver of Alzheimer's, not downregulation. The authors even indicate an expectation that it could be similar to the other knockouts ("Moreover, the behavioural phenotypes of appa/appb and psen1 knockout larvae had little overlap while they presumably both resulted in the loss of Aβ." and "Comparing with early-onset genes, psen1 knockouts had similar night-time phenotypes, but loss of psen2 or appa/appb had no effect on night-time sleep."). There is no reason to expect similarity between appa/appb and psen1/2. I understand that the app knockouts could unveil interesting early neurodevelopmental roles, but the manuscript needs to be clarified that any findings could be the opposite of expectation in AD.

      On “there is no reason to expect similarity […]”, we disagree. Knockout of appa/appb and knockout of psen1 will both result in loss of Aβ (appa/appb encode Aβ and psen1 cleaves Appa/Appb to release Aβ, cf. Fig. 3e). Consequently, a phenotype caused by the loss of Aβ, or possibly other Appa/Appb cleavage products, should logically be found in both appa/appb and psen1 knockouts.

      On “it is well known that the upregulation of APP is the driver of Alzheimer’s, not downregulation”; we of course agree. Among others, the examples of Down syndrome, APP duplication (Sleegers et al., 2006), or mouse models overexpressing human APP show definitively that overexpression of APP is sufficient to cause AD. Having said that, we would not be so quick to dismiss APP knockout as potentially relevant to the understanding of Alzheimer’s disease. Loss of soluble Aβ due to aggregation could contribute to pathology (Espay et al., 2023). Without getting too much into this intricate debate, links between levels of Aβ and risk of disease are often counter-intuitive too. For example, out of 138 PSEN1 mutations screened in vitro, 104 reduced total Aβ production and 11 even seemingly abolished the production of both Aβ40 and Aβ42 (Sun et al., 2017). In short, loss of soluble Aβ occurs in both AD and in our appa/appb knockout larvae, but the ideal approach would be to study zebrafish larvae with an in-frame deletion in the Aβ sequence within appa/appb.

      We will adapt the language to address your point. We would not want to imply, for example, that the absence of a night-time sleep phenotype for appa/appb is contradictory to the body of literature showing links between Aβ and sleep, including in zebrafish (Özcan et al., 2020). As you say, our experiment tested loss of App, including Aβ, while the literature typically reports on overexpression of APP, as in APP/PSEN1-overexpressing mice (Jagirdar et al., 2021).

      Reviewer #3 (Public Review):

      In this manuscript by Kroll and colleagues, the authors describe combining behavioral pharmacology with sleep profiling to predict disease and potential treatment pathways at play in AD. AD is used here as a case study, but the approaches detailed can be used for other genetic screens related to normal or pathological states for which sleep/arousal is relevant. The data are for the most part convincing, although generally the phenotypes are relatively small and there are no major new mechanistic insights. Nonetheless, the approaches are certainly of broad interest and the data are comprehensive and detailed.

      A notable weakness is the introduction, which overly generalizes numerous concepts and fails to provide the necessary background to set the stage for the data.

      Major points

      (1) The authors should spend more time explaining what they see as the meaning of the large number of behavioral parameters assayed and specifically what they tell readers about the biology of the animal. Many are hard to understand--e.g. a "slope" parameter.

      We agree that some parameters do not tell something intuitive about the biology of the animal. It would be easy to speculate. For example, the “activity slope” parameter may indicate how quickly the animal becomes tired over the course of the day. On the other hand, fractal dimension describes the “roughness/smoothness” of the larva’s activity trace (Fig. 2–suppl. 1a); but it is not obvious how to translate this into information about the physiology of the animal. We do not see this as an issue though. While some parameters do provide intuitive information about the animal’s behaviour (e.g. sleep duration or sunset startle as a measure of startle response), the benefit of having a large number of behavioural parameters is to compare behavioural fingerprints and assess rescue of the behavioural phenotype by small molecules (Fig. 6c). For this purpose, the more parameters the better. The “MoSeq” approach from Wiltschko et al., 2020 is a good example from literature that inspired our own Fig. 6c. While some of the “behavioural syllables” may be intuitive (e.g. running or grooming), it is probably pointless to try to explain the ‘meaning’ of the “small left turn in place with head motion” syllable (Wiltschko et al., 2020). Nonetheless, this syllable was useful to assess whether a drug specifically treats the behavioural phenotype under study without causing too many side effects. Unfortunately, ZOLTAR has to reduce the FramebyFrame fingerprint (17 parameters) to just six parameters to compare it to the behavioural dataset from Rihel et al., 2010, but here, more parameters would almost certainly translate into better predictions too, regardless of their intuitiveness.

      It is true however that we do not give much information on how some of the less intuitive parameters, such as activity slope or fractal dimension, are calculated or what they describe about the dataset (e.g. roughness/smoothness for fractal dimension). We will improve this in our revised version.

      (2) Because in the end the authors did not screen that many lines, it would increase confidence in the phenotypes to provide more validation of KO specificity. Some suggestions include:

      a. The authors cite a psen1 and psen2 germline mutant lines. Can these be tested in the FramebyFrame R analysis? Do they phenocopy F0 KO larvae?

      We unfortunately do not have those lines. We investigated the possibility of importing a psen2 knockout line from abroad, but the process of shipping live animals is becoming more and more cost and time prohibitive. However, we observed the same pigmentation phenotype for psen2 knockouts as reported by Jiang et al., 2018, which is at least a partial confirmation of phenocopying a loss-of-function stable mutant.

      b. psen2KO is one of the larger centerpieces of the paper. The authors should present more compelling evidence that animals are truly functionally null. Without this, how do we interpret their phenotypes?

      We disagree that there should be significant doubt about these mutants being truly functionally null, given the high mutation rate and presence of the expected pigmentation phenotype (Jiang et al., 2018, Fig. 3f and Fig. 3–suppl. 2). The psen2 F0 knockouts were virtually 100% mutated at three exons across the gene (mutation rates were locus 1: 100 ± 0%; locus 2: 99.99 ± 0.06%; locus 3: 99.85 ± 0.24%). Additionally, two of the three mutated exons had particularly high rates of frameshift mutations (locus 1: 97 ± 5%; locus 2: 88 ± 17% frameshift mutation rate). It is virtually impossible that a functional protein is translated given this burden of frameshift mutations. Phenotypically, in addition to the pigmentation defect, double psen1/psen2 F0 knockout larvae had curved tails, the same phenotype as caused by a high dose of the γ-secretase inhibitor DAPT (Yang et al., 2008). These double F0 knockouts were lethal, while knockout of psen1 or psen2 alone did not cause obvious morphological defects. Evidently, most larvae must have been psen2 null mutants in this experiment, otherwise functional Psen2 would have prevented early lethality.

      Translation of zebrafish psen2 can start at downstream start codons if the first exon has a frameshift mutation, generating a seemingly functional Psen2 missing the N-terminus (Jiang et al., 2020). Zebrafish homozygous for this early frameshift mutation had normal pigmentation, showing it is a reliable marker of Psen2 function even when it is mutated. This mechanism is not a concern here as the alternative start codons are still upstream of two of the three mutated exons (the alternative start codons discovered by Jiang et al., 2020 are in exon 2 and 3, but we targeted exon 3, exon 4, and exon 6).

      We understand that the zebrafish community may be cautious about F0 phenotyping compared to stably generated mutants. As mentioned to Reviewer 2, we are planning to assemble a paper that expressly examines F0s vs. stable mutants to allay some of these concerns. We would also suggest that our current manuscript, which combines CRISPR-F0 rapid screening with in silico pharmacological predictions, ultimately represents a first step in characterizing the functions of genes.

      c. Related to the above, for cd2AP and sorl1 KO, some of the effect sizes seem to be driven by one clutch and not the other. In other words, great clutch-to-clutch variability. Should the authors increase the number of clutches assayed?

      Correct, there is great clutch-to-clutch variability in this behavioural assay. This is not specific to our experiments. Even within the same strain, wild-type larvae from different clutches (i.e. non-siblings) behave differently (Joo et al., 2021). This is why it is essential to compare behavioural phenotypes within individual clutches (i.e., from a single pair of parents, one male and one female), as we explain in Methods (section Behavioural video-tracking) and in the documentation of the FramebyFrame package. We often see two different experimental designs in the literature: comparing non-sibling wild-type and mutant larvae, or pooling different clutches which include all genotypes (e.g., pooling multiple clutches from heterozygous in-crosses or pooling wild-type clutches before injecting them). The first experimental design causes false positive findings, as the clutch-to-clutch variability we and others (Joo et al., 2021) observe gets interpreted as a behavioural phenotype. The second experimental design should not cause false positives but will decrease the sensitivity of the assay by increasing the spread within genotypes. In both cases, the clutch-to-clutch variability is hidden, either by interpreting it as a phenotype (first case) or by adding it to animal-to-animal variability (second case). Our experimental design is technically more challenging as it requires obtaining large clutches from unique pairs of parents. However, this approach is better as it clearly separates the different sources of variability (clutch-to-clutch or animal-to-animal). As for every experiment, yes, a larger number of replicates would be better, but we do not plan to assay additional clutches at this time. Our work heavily focuses on the sorl1 and psen2 knockout behavioural phenotypes. The key aspects of these phenotypes were effectively tested in four clutches, as sorl1 knockouts were also tested in the citalopram experiment (Fig. 5), and psen2 knockouts were also tested in the small molecule rescue experiment (Fig. 6 and Fig. 6–suppl. 1). In the citalopram experiment, one H2O-treated sorl1 knockout clutch (n = 10) replicates the baseline recordings in Fig. 4–suppl. 5 fairly well; the other does not, but it had an especially low sample size (n = 6).

      We also plan to test another SSRI on sorl1 knockouts, so this point will be addressed.

      (3) The authors make the point that most of the AD risk genes are expressed in fish during development. Is there public data to comment on whether the genes of interest are expressed in mature/old fish as well? Just because the genes are expressed early does not at all mean that early-life dysfunction is related to future AD (though this could be the case, of course). Genes with exclusive developmental expression would be strong candidates for such an early-life role, however. I presume the case is made because sleep studies are mainly done in juvenile fish, but I think it is really a pretty minor point and such a strong claim does not even need to be made.

      This is a fair criticism but we do not make this claim, at least not from expression. The reviewer is probably referring to the following quote:

      “[…] most of these were expressed in the brain of 5–6-dpf zebrafish larvae, suggesting they play a role in early brain development or function,”

      which does not mention future risk of Alzheimer’s disease. We do suggest that these genes have a function in development. After all, every gene that plays a role in brain development must be expressed during development, so this wording seems reasonable. As noted, the primary goal was to check that the genes we selected were indeed expressed in zebrafish larvae before performing knockout experiments. Our discussion does raise the hypothesis that mutations in Alzheimer’s risk genes impact brain development and sleep early in life, but this argument primarily relies on our observation that knockout of late-onset Alzheimer’s risk genes causes sleep phenotypes in 7-day old zebrafish larvae and from previous work showing brain structural differences in infants and children at high genetic risk of Alzheimer’s disease (Dean et al., 2014; Quiroz et al., 2015), not solely on gene expression early in life.

      (4) A common quandary with defining sleep behaviorally is how to rectify sleep and activity changes that influence one another. With psen2 KOs, the authors describe reduced activity and increased sleep during the day. But how do we know if the reduced activity drives increased behavioral quiescence that is incorrectly defined as sleep? In instances where sleep is increased but activity during periods during wake are normal or elevated, this is not an issue. But here, the animals might very well be unhealthy, and less active, so naturally they stop moving more for prolonged periods, but the main conclusion is not sleep per se. This is an area where more experiments should be added if the authors do not wish to change/temper the conclusions they draw. Are psen2 KOs responsive to startling stimuli like controls when awake? Do they respond normally when quiescent? Great care must be taken in all models using inactivity as a proxy for sleep, and it can harm the field when there is no acknowledgment that overall health/activity changes could be a confound. Particularly worrisome is the betamethasone data in Figure 6, where activity and sleep are once again coordinately modified by the drug.

      This is a fair criticism. We agree it is a concern, especially in the case of psen2 as we claim that day-time sleep is increased while zebrafish are diurnal. We do not rely heavily on the day-time inactivity being sleep (the ZOLTAR predictions or the small molecule rescue do not change whether the parameter is called sleep or inactivity), but our choice of labelling may be misleading. We will try to test this claim by plotting the distribution of the inactive period durations. If psen2 knockout larvae indeed sleep more during the day compared to controls, we would predict that inactive periods longer than 1 minute increase disproportionately compared to shorter inactive periods.
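
      The proposed analysis of inactive period durations could be sketched as below. This is a minimal illustration, not the FramebyFrame implementation: the function names, the toy activity trace, and the frame rate are assumptions, and an inactive bout is counted as long when it lasts at least the one-minute threshold.

```python
from itertools import groupby


def inactive_bout_durations(activity, frames_per_second):
    """Duration in seconds of every run of consecutive zero-activity frames."""
    durations = []
    for is_inactive, run in groupby(activity, key=lambda value: value == 0):
        if is_inactive:
            n_frames = sum(1 for _ in run)
            durations.append(n_frames / frames_per_second)
    return durations


def fraction_long_bouts(durations, threshold_seconds=60.0):
    """Fraction of inactive bouts lasting at least threshold_seconds."""
    if not durations:
        return 0.0
    n_long = sum(1 for d in durations if d >= threshold_seconds)
    return n_long / len(durations)


# Toy trace sampled at 1 frame per second (hypothetical values):
# active frames are non-zero, inactive frames are zero.
trace = [3] * 5 + [0] * 90 + [2] * 3 + [0] * 20 + [1] * 2 + [0] * 61

bouts = inactive_bout_durations(trace, frames_per_second=1)
print(bouts)                       # [90.0, 20.0, 61.0]
print(fraction_long_bouts(bouts))  # two of the three bouts reach one minute
```

      Comparing the full distribution of these durations between knockout and control larvae, rather than a single summary fraction, would show whether the increase is concentrated in bouts above the sleep threshold.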

      To address, “are psen2 KO responsive to startling stimuli like controls when awake/when quiescent”, we can try to look at the behaviour of psen2 knockout larvae that were awake (i.e., moved in the preceding one minute) or ‘asleep’ (i.e., did not move in the preceding one minute) at the light transitions and count the proportion of psen2 knockout or control larvae which displayed a startle response. If most psen2 knockouts react to the light transition, it should at least exclude the concern that they are very unhealthy, as the reviewer suggests. This criticism seems challenging to address definitively by experiment, though. A possible approach could be to use a closed-loop system which, after one minute of inactivity, triggers a stimulus which is sufficient to startle an awake larva but not an asleep larva. If psen2 knockout larvae indeed sleep more during the day, the stimulus should usually not be sufficient to startle them. Note that how to calibrate this stimulus is also not straightforward. We do not plan to test this, but our analysis of the light transitions may provide a decent proxy.

      (5) The conclusions for the serotonin section are overstated. Behavioural pharmacology purports to predict a signaling pathway disrupted with sorl1 KO. But is it not just possible that the drug acts in parallel to the true disrupted pathway in these fish? There is no direct evidence for serotonin dysfunction - that conclusion is based on response to the drug. Moreover, it is just 1 drug - is the same phenotype present with another SSRI? Likewise, language should be toned down in the discussion, as this hypothesis is not "confirmed" by the results (consider "supported"). The lack of measured serotonin differences further raises concern that this is not the true pathway. This is another major point that deserves further experimental evidence, because without it, the entire approach (behavioral pharm screen) seems more shaky as a way to identify mechanisms. There are any number of testable hypotheses to pursue such as a) Using transient transgenesis to visualize 5HT neuron morphology (is development perturbed: cell number, neurite morphology, synapse formation); b) Using transgenic Ca reporters to assay 5HT neuron activity.

      Regarding the comment, “is it not just possible that the drug acts in parallel to the true disrupted pathway”, we think no, assuming we understand your question correctly. Key to our argument is the fact that sorl1 knockout larvae react differently to the drug than control larvae. As an example, take night-time sleep bout length, which was not affected by knockout of sorl1 (Fig. 4–suppl. 5). For the sake of the argument, say only dopamine signalling (the “true disrupted pathway”) was affected in sorl1 knockouts but that serotonin signalling was intact. Assuming that citalopram specifically alters serotonin signalling, then treatment should cause the same increase in sleep bout length in both knockouts and controls as serotonin signalling is intact in both. This is not what we see, however. Citalopram caused a greater increase in sleep bout length in sorl1 knockouts than in scrambled-injected larvae. In other words, the effect is non-additive, in the sense that citalopram did not add the same number of Z-scores to sorl1 knockouts and controls. We think this shows that serotonin signalling is somehow different in sorl1 knockouts. Nonetheless, we would concede that the experiment does not necessarily say much about the importance of the serotonin disruption caused by loss of Sorl1. It could be, for example, that the most salient consequence of loss of Sorl1 is cholinergic disruption (see reply to Reviewer #1 above) and that serotonin signalling is a minor theme.
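
      The non-additivity argument above amounts to a difference of differences, which can be written out as a short sketch. The Z-scores below are invented for illustration and the function names are hypothetical; a real analysis would also need a test of whether the interaction differs from zero.

```python
from statistics import mean


def drug_effect(treated, untreated):
    """Mean change in a parameter (in Z-scores) caused by the drug within one genotype."""
    return mean(treated) - mean(untreated)


def interaction(ko_drug, ko_water, ctrl_drug, ctrl_water):
    """Difference of differences.

    Zero when the drug adds the same number of Z-scores to both genotypes
    (additive effect); non-zero when knockouts respond differently.
    """
    return drug_effect(ko_drug, ko_water) - drug_effect(ctrl_drug, ctrl_water)


# Hypothetical night-time sleep bout lengths, as Z-scores relative to untreated controls.
ctrl_water = [0.1, -0.2, 0.0, 0.1]
ctrl_drug = [1.0, 0.8, 1.1, 1.1]
ko_water = [0.0, 0.1, -0.1, 0.0]  # no baseline phenotype for this parameter
ko_drug = [2.1, 1.9, 2.0, 2.0]

print(f"drug effect in controls:  {drug_effect(ctrl_drug, ctrl_water):.2f}")
print(f"drug effect in knockouts: {drug_effect(ko_drug, ko_water):.2f}")
print(f"interaction:              {interaction(ko_drug, ko_water, ctrl_drug, ctrl_water):.2f}")
```

      In this toy example the knockout has no baseline phenotype, yet the drug effect is twice as large in knockouts as in controls, which is the scenario the argument describes.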

      Furthermore, we agree with you and Reviewer #2 that the conclusions are overly confident. We will repeat this experiment with another SSRI as you suggest. Your suggestions to further test the serotonin system in the sorl1 knockouts are excellent as well, however we do not plan to pursue them at this stage.

      References:

      Ashlin TG, Blunsom NJ, Ghosh M, Cockcroft S, Rihel J. 2018. Pitpnc1a Regulates Zebrafish Sleep and Wake Behavior through Modulation of Insulin-like Growth Factor Signaling. Cell Rep 24:1389–1396. doi:10.1016/j.celrep.2018.07.012

      Chen D, Wang X, Huang T, Jia J. 2022. Sleep and Late-Onset Alzheimer’s Disease: Shared Genetic Risk Factors, Drug Targets, Molecular Mechanisms, and Causal Effects. Front Genet 13. doi:10.3389/fgene.2022.794202

      Cirrito JR, Disabato BM, Restivo JL, Verges DK, Goebel WD, Sathyan A, Hayreh D, D’Angelo G, Benzinger T, Yoon H, Kim J, Morris JC, Mintun MA, Sheline YI. 2011. Serotonin signaling is associated with lower amyloid-β levels and plaques in transgenic mice and humans. Proc Natl Acad Sci U S A 108:14968–14973. doi:10.1073/pnas.1107411108

      Dean DC, Jerskey BA, Chen K, Protas H, Thiyyagura P, Roontiva A, O’Muircheartaigh J, Dirks H, Waskiewicz N, Lehman K, Siniard AL, Turk MN, Hua X, Madsen SK, Thompson PM, Fleisher AS, Huentelman MJ, Deoni SCL, Reiman EM. 2014. Brain Differences in Infants at Differential Genetic Risk for Late-Onset Alzheimer Disease A Cross-sectional Imaging Study. JAMA Neurol 71:11–22. doi:10.1001/jamaneurol.2013.4544

      Eriksen JL, Sagi SA, Smith TE, Weggen S, Das P, McLendon DC, Ozols VV, Jessing KW, Zavitz KH, Koo EH, Golde TE. 2003. NSAIDs and enantiomers of flurbiprofen target γ-secretase and lower Aβ42 in vivo. J Clin Invest 112:440–449. doi:10.1172/JCI18162

      Espay AJ, Herrup K, Kepp KP, Daly T. 2023. The proteinopenia hypothesis: Loss of Aβ42 and the onset of Alzheimer’s Disease. Ageing Res Rev 92:102112. doi:10.1016/j.arr.2023.102112

      Hoffman EJ, Turner KJ, Fernandez JM, Cifuentes D, Ghosh M, Ijaz S, Jain RA, Kubo F, Bill BR, Baier H, Granato M, Barresi MJF, Wilson SW, Rihel J, State MW, Giraldez AJ. 2016. Estrogens Suppress a Behavioral Phenotype in Zebrafish Mutants of the Autism Risk Gene, CNTNAP2. Neuron 89:725–733. doi:10.1016/j.neuron.2015.12.039

      in ’t Veld BA, Ruitenberg A, Hofman A, Launer LJ, van Duijn CM, Stijnen T, Breteler MMB, Stricker BHC. 2001. Nonsteroidal Antiinflammatory Drugs and the Risk of Alzheimer’s Disease. N Engl J Med 345:1515–1521. doi:10.1056/NEJMoa010178

      Jagirdar R, Fu C-H, Park J, Corbett BF, Seibt FM, Beierlein M, Chin J. 2021. Restoring activity in the thalamic reticular nucleus improves sleep architecture and reduces Aβ accumulation in mice. Sci Transl Med 13:eabh4284. doi:10.1126/scitranslmed.abh4284

      Jiang H, Newman M, Lardelli M. 2018. The zebrafish orthologue of familial Alzheimer’s disease gene PRESENILIN 2 is required for normal adult melanotic skin pigmentation. PLOS ONE 13:e0206155. doi:10.1371/journal.pone.0206155

      Jiang H, Pederson SM, Newman M, Dong Y, Barthelson K, Lardelli M. 2020. Transcriptome analysis indicates dominant effects on ribosome and mitochondrial function of a premature termination codon mutation in the zebrafish gene psen2. PLOS ONE 15:e0232559. doi:10.1371/journal.pone.0232559

      Joo W, Vivian MD, Graham BJ, Soucy ER, Thyme SB. 2021. A Customizable Low-Cost System for Massively Parallel Zebrafish Behavioral Phenotyping. Front Behav Neurosci 14.

      Joubert L, Hanson B, Barthet G, Sebben M, Claeysen S, Hong W, Marin P, Dumuis A, Bockaert J. 2004. New sorting nexin (SNX27) and NHERF specifically interact with the 5-HT4a receptor splice variant: roles in receptor targeting. J Cell Sci 117:5367–5379. doi:10.1242/jcs.01379

      Lauretti E, Dincer O, Praticò D. 2020. Glycogen synthase kinase-3 signaling in Alzheimer’s disease. Biochim Biophys Acta Mol Cell Res 1867:118664. doi:10.1016/j.bbamcr.2020.118664

      Leng Y, Ackley SF, Glymour MM, Yaffe K, Brenowitz WD. 2021. Genetic Risk of Alzheimer’s Disease and Sleep Duration in Non-Demented Elders. Ann Neurol 89:177–181. doi:10.1002/ana.25910

      Mitchell PB, Hadzi-Pavlovic D. 2000. Lithium treatment for bipolar disorder. Bull World Health Organ 78:515–517.

      Munoz-Torrero D. 2008. Acetylcholinesterase Inhibitors as Disease-Modifying Therapies for Alzheimer’s Disease. Curr Med Chem 15:2433–2455. doi:10.2174/092986708785909067

      Muto V, Koshmanova E, Ghaemmaghami P, Jaspar M, Meyer C, Elansary M, Van Egroo M, Chylinski D, Berthomier C, Brandewinder M, Mouraux C, Schmidt C, Hammad G, Coppieters W, Ahariz N, Degueldre C, Luxen A, Salmon E, Phillips C, Archer SN, Yengo L, Byrne E, Collette F, Georges M, Dijk D-J, Maquet P, Visscher PM, Vandewalle G. 2021. Alzheimer’s disease genetic risk and sleep phenotypes in healthy young men: association with more slow waves and daytime sleepiness. Sleep 44. doi:10.1093/sleep/zsaa137

      Myers-Turnbull D, Taylor JC, Helsell C, McCarroll MN, Ki CS, Tummino TA, Ravikumar S, Kinser R, Gendelev L, Alexander R, Keiser MJ, Kokel D. 2022. Simultaneous analysis of neuroactive compounds in zebrafish. doi:10.1101/2020.01.01.891432

      Özcan GG, Lim S, Leighton PL, Allison WT, Rihel J. 2020. Sleep is bi-directionally modified by amyloid beta oligomers. eLife 9:e53995. doi:10.7554/eLife.53995

      Quiroz YT, Schultz AP, Chen K, Protas HD, Brickhouse M, Fleisher AS, Langbaum JB, Thiyyagura P, Fagan AM, Shah AR, Muniz M, Arboleda-Velasquez JF, Munoz C, Garcia G, Acosta-Baena N, Giraldo M, Tirado V, Ramírez DL, Tariot PN, Dickerson BC, Sperling RA, Lopera F, Reiman EM. 2015. Brain Imaging and Blood Biomarker Abnormalities in Children With Autosomal Dominant Alzheimer Disease: A Cross-Sectional Study. JAMA Neurol 72:912–919. doi:10.1001/jamaneurol.2015.1099

      Relkin NR. 2007. Beyond symptomatic therapy: a re-examination of acetylcholinesterase inhibitors in Alzheimer’s disease. Expert Rev Neurother 7:735–748. doi:10.1586/14737175.7.6.735

      Rihel J, Prober DA, Arvanites A, Lam K, Zimmerman S, Jang S, Haggarty SJ, Kokel D, Rubin LL, Peterson RT, Schier AF. 2010. Zebrafish Behavioral Profiling Links Drugs to Biological Targets and Rest/Wake Regulation. Science 327:348–351. doi:10.1126/science.1183090

      Sleegers K, Brouwers N, Gijselinck I, Theuns J, Goossens D, Wauters J, Del-Favero J, Cruts M, van Duijn CM, Van Broeckhoven C. 2006. APP duplication is sufficient to cause early onset Alzheimer’s dementia with cerebral amyloid angiopathy. Brain J Neurol 129:2977–2983. doi:10.1093/brain/awl203

      Sun L, Zhou R, Yang G, Shi Y. 2017. Analysis of 138 pathogenic mutations in presenilin-1 on the in vitro production of Aβ42 and Aβ40 peptides by γ-secretase. Proc Natl Acad Sci 114:E476–E485. doi:10.1073/pnas.1618657114

      Weggen S, Rogers M, Eriksen J. 2007. NSAIDs: small molecules for prevention of Alzheimer’s disease or precursors for future drug development? Trends Pharmacol Sci 28:536–543. doi:10.1016/j.tips.2007.09.004

      Wiltschko AB, Tsukahara T, Zeine A, Anyoha R, Gillis WF, Markowitz JE, Peterson RE, Katon J, Johnson MJ, Datta SR. 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nat Neurosci 23:1433–1443. doi:10.1038/s41593-020-00706-3

      Yang T, Arslanova D, Gu Y, Augelli-Szafran C, Xia W. 2008. Quantification of gamma-secretase modulation differentiates inhibitor compound selectivity between two substrates Notch and amyloid precursor protein. Mol Brain 1:15. doi:10.1186/1756-6606-1-15

    1. eLife assessment

      van Vliet and colleagues show a correlation between internal states of a convolutional neural network (CNN) trained on visual word stimuli with three specific components of evoked MEG potentials during reading in humans. The findings are useful, but the current results remain incomplete, without evidence that the CNN can produce any of the phenomena that the human visual system is known to have (e.g., feedback connections, sensitivity to word frequency), or that the model has comparable performance to human behaviour (i.e., similar task accuracy with a comparable pattern of mistakes).

    2. Reviewer #1 (Public Review):

      Summary:

      This study trained a CNN for visual word classification and supported a model that can explain key functional effects of the evoked MEG response during visual word recognition, providing an explicit computational account from detection and segmentation of letter shapes to final word-form identification.

      Strengths:

      This paper not only bridges an important gap in modeling visual word recognition, by establishing a direct link between computational processes and key findings in experimental neuroimaging studies, but also provides some conditions to enhance biological realism.

      Weaknesses:

      The interpretation of CNN results, especially the number of layers in the final model and its relationship with the processing of visual words in the human brain, needs to be further strengthened.

    3. Reviewer #2 (Public Review):

      van Vliet and colleagues present the results of a study correlating internal states of a convolutional neural network trained on visual word stimuli with evoked MEG potentials during reading.

      In this study, a standard deep learning image recognition model (VGG-11), pretrained on a large natural image set (ImageNet) so that it begins illiterate and then further trained on visual word stimuli, is applied to a set of predefined stimulus images to extract strings of characters from "noisy" words, pseudowords and real words. This methodology is used in hopes of creating a model that learns to apply the same nonlinear transforms that could be happening in different regions of the brain - which would be validated by studying the correlations between the weights of this model and neural responses. Specifically, the aim is that the model learns some vector embedding space, as quantified by the spread of activations across a layer's units (L2 norm after the ReLU activation function), for the different kinds of stimuli, that creates a parameterized decision boundary that is similar to amplitude changes at different times for a MEG signal. More importantly, the way that the stimuli are ordered or ranked in that space should be separable to the degree we see separation in neural activity. This study shows that the activation corresponding to five different broad classes of stimuli statistically correlates with three specific components in the ERP. However, I believe there are fundamental theoretical issues that limit the implications of the results of this study.
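      The comparison described above (a per-layer response defined as the L2 norm of the ReLU activations, correlated with evoked amplitudes) can be illustrated with a minimal sketch. This is not the authors' code: the function names, array shapes, and the synthetic data standing in for layer outputs and MEG amplitudes are all assumptions made for illustration.

```python
import numpy as np


def layer_response(pre_activations: np.ndarray) -> np.ndarray:
    """Summarize a layer as one number per stimulus.

    pre_activations has shape (n_stimuli, n_units); the summary is the
    L2 norm of the ReLU-rectified activations across units.
    """
    activations = np.maximum(pre_activations, 0.0)  # ReLU
    return np.linalg.norm(activations, axis=1)


def pearson_r(x, y) -> float:
    """Pearson correlation between two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


# Synthetic stand-ins (assumptions): 50 stimuli, one layer with 128 units,
# and an "evoked amplitude" constructed to follow the layer response.
rng = np.random.default_rng(0)
pre = rng.normal(size=(50, 128))
response = layer_response(pre)
meg_amplitude = 2.0 * response + rng.normal(scale=0.1, size=50)

r = pearson_r(response, meg_amplitude)
print(f"model-brain correlation for this layer: r = {r:.3f}")
```

      In a real analysis the synthetic arrays would be replaced by recorded layer outputs and measured component amplitudes for the same stimuli; the sketch only shows the shape of the computation under discussion.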

      As has been shown over many decades, many potential computational algorithms, with varied model architectures, can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that the late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      Another example of the mismatch between this model and the visual cortex is the lack of feedback connections in the model. Within the visual cortex, there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical-level processes feeds back to letter-level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for the reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in the visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point, it is unclear what novel contributions can be gleaned from correlating low-dimensional model weights from these computational models with human neural data.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors investigate the extent to which the responses of different layers of a vision model (VGG-11) can be linked to the cascade of responses (namely, type-I, type-II, and N400) in the human brain when reading words. To achieve maximal consistency, they add noisy activations to VGG and finetune it on a character recognition task. In this setup, they observe various similarities between the behavior of VGG and the brain when presented with various transformations of the words (added noise, font modification, etc).

      Strengths:

      - The paper is well-written and well-presented.

      - The topic studied is interesting.

      - The fact that the response of the CNN on unseen experimental contrasts such as adding noise correlated with previous results on the brain is compelling.

      Weaknesses:

      - The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to Figure 5).

      - The experiments only consider a rather outdated vision model (VGG).

    5. Author response:

      We thank the reviewers for their efforts. They have pointed out several shortcomings and made very helpful suggestions. Below, we briefly address the weak points that the reviewers brought up and outline what improvements we intend to make for the revised paper in response.

      Reviewer #1:

      The interpretation of CNN results, especially the number of layers in the final model and its relationship with the processing of visual words in the human brain, needs to be further strengthened.

      The results of our experimentation with the number of layers and the number of units in each layer can be found in the supplementary information. In the revised version, we will bring some of these results into the main text and discuss them more thoroughly.

      Reviewer #2:

      As has been shown over many decades, many potential computational algorithms, with varied model architectures, can perform the task of text recognition from an image. However, there is no evidence presented here that this particular algorithm has comparable performance to human behavior (i.e. similar accuracy with a comparable pattern of mistakes). This is a fundamental prerequisite before attempting to meaningfully correlate these layer activations to human neural activations. Therefore, it is unlikely that correlating these derived layer weights to neural activity provides meaningful novel insights into neural computation beyond what is seen using traditional experimental methods.

      We very much agree with the reviewer that a qualitative analysis of whether the model can explain experimental effects needs to happen before a quantitative analysis, such as evaluating model-brain correlation scores. In fact, this is one of the key points we wished to make.

      This starts with the observation that "traditional" models of reading (i.e., those that do not rely on deep learning) cannot explain some very basic human behavioral results, such as humans being able to recognize a word regardless of exact letter shape, size, and (up to a point) rotation. This is not so much a failure on the part of traditional models as it is a difference in focus. There are models of vision that focus on these low-level things, currently dominated by deep learning, but these are rarely evaluated in the context of reading, which has its own literature and well-known experimental effects. We believe the current version of the manuscript makes insufficiently clear what the goals of our modeling effort are exactly, which is something we will attempt to correct in the revision.

      Since our model only covers the first phase of reading, with a special focus on letter shape detection, we sought to compare it with neuroimaging data that can provide "snapshots" of the state of the brain during these early phases, rather than comparing it with behavioral results that occur at the very end. However, we very much make this comparison in the spirit hinted at by the reviewer. The different MEG components have a distinct "behavior" to them in the way they respond to different experimental conditions (Figure 2), and the model needs to replicate this behavior (Figure 4). Only then do we move on to a quantitative analysis.

      One example of a substantial discrepancy between this model and neural activations is that, while incorporating frequency weighting into the training data is shown to slightly increase neural correlation with the model, Figure 7 shows that no layer of the model appears directly sensitive to word frequency. This is in stark contrast to the strong neural sensitivity to word frequency seen in EEG (e.g. Dambacher et al 2006 Brain Research), fMRI (e.g. Kronbichler et al 2004 NeuroImage), MEG (e.g. Huizeling et al 2021 Neurobio. Lang.), and intracranial (e.g. Woolnough et al 2022 J. Neurosci.) recordings. Figure 7 also demonstrates that the late stages of the model show a strong negative correlation with font size, whereas later stages of neural visual word processing are typically insensitive to differences in visual features, instead showing sensitivity to lexical factors.

      We are glad the reviewer brought up the topic of frequency balancing, as it is a good example of the importance of the qualitative analysis. As the reviewer points out, frequency balancing during training only had a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing had a large impact. It is true that the model, even with frequency balancing, only captures letter- and bigram-frequency effects and not the word-frequency effects to which we know the N400 is sensitive. This could mean that N400 word-frequency effects are driven by mechanisms that our current model lacks, such as top-down effects from systems further up the processing pipeline.

      We agree with the reviewer that the late-stage sensitivity of the model to font size must be seen as a flaw. Of course, we say as much when we discuss this result in the paper. Important context for this flaw is that the main aim of the model is to reproduce the experimental effects of Vartiainen et al. (2011), which does not include manipulation of word length. The experimental contrasts in Figure 7 are meant to explore a bit beyond the boundaries of that particular study, but were never considered "failure points". When presenting a model, it's important to show its limitations too.

      Another example of the mismatch between this model and the visual cortex is the lack of feedback connections in the model. Within the visual cortex, there are extensive feedback connections, with later processing stages providing recursive feedback to earlier stages. This is especially evident in reading, where feedback from lexical-level processes feeds back to letter-level processes (e.g. Heilbron et al 2020 Nature Comms.). This feedback is especially relevant for the reading of words in noisy conditions, as tested in the current manuscript, as lexical knowledge enhances letter representation in the visual cortex (the word superiority effect). This results in neural activity in multiple cortical areas varying over time, changing selectivity within a region at different measured time points (e.g. Woolnough et al 2021 Nature Human Behav.), which in the current study is simplified down to three discrete time windows, each attributed to different spatial locations.

      In this study, we make a start in showing how deep learning techniques could be beneficial to enhance models of reading by showing how even a simple CNN, after a few enhancements, can account for several experimental MEG effects that we see in reading tasks, but are outside the focus of traditional models of reading. We never intended to claim that our model offers a complete view of all the processes involved. This is why we have dedicated a section in the Discussion to the various ways in which our simple CNN is incomplete as a model of reading. In this section we hint at the usage of recurrent connections, but the reviewer does an excellent job of highlighting the importance of top-down connections even in models focusing on early visual processes, which we are very happy to include in this section.

      The presented model needs substantial further development to be able to replicate, both behaviorally and neurally, many of the well-characterized phenomena seen in human behavior and neural recordings that are fundamental hallmarks of human visual word processing. Until that point, it is unclear what novel contributions can be gleaned from correlating low-dimensional model weights from these computational models with human neural data.

      The CNN model we present in this study is a small piece in a bigger effort to employ deep learning techniques to further enhance already existing models of reading. For our revision, we plan to expand on the question of where to go from here and outline our vision on how these techniques could help us better model the phenomena the reviewer speaks of. We agree with the reviewer that there is a long way to go, and we are excited to be a part of it.

      Reviewer #3:

      The paper is rather qualitative in nature. In particular, the authors show that some resemblance exists between the behavior of some layers and some parts of the brain, but it is hard to quantitatively understand how strong the resemblances are in each layer, and the exact impact of experimental settings such as the frequency balancing (which seems to only have a very moderate effect according to Figure 5).

      The large focus on a qualitative evaluation of the model is intentional. The ability of the model to reproduce experimental effects (Figure 4) is a prerequisite for any subsequent quantitative metrics (such as correlation) to be valid. The introduction of frequency balancing is a good example of this. As the reviewer points out, frequency balancing during training has only a moderate impact on correlation scores and from that point of view does not seem impactful. However, when we look at the qualitative evaluation, we see that with a large vocabulary, a model without frequency balancing fails to properly distinguish between consonant strings and (pseudo)words (Figure 4, 5th row). Hence, from the point of view of being able to reproduce experimental effects, frequency balancing has a large impact.

      That said, the reviewer is right to highlight the value of quantitative analysis. An important limitation of the "traditional" models of reading that do not employ deep learning is that they operate in unrealistically simplified environments (e.g. input as predefined line segments, words of a fixed length), which makes a quantitative comparison with brain data problematic. The main benefit that deep learning brings may very well be the increase in scale that makes more direct comparisons with brain data possible. In our revision we will attempt to capitalize on this benefit more. The reviewer has provided some helpful suggestions for doing so in their recommendations.

      The experiments only consider a rather outdated vision model (VGG).

      VGG was designed to use a minimal number of operations (convolution-and-pooling, fully-connected linear steps, ReLU activations, and batch normalization) and rely mostly on scale to solve the classification task. This makes VGG a good place to start our explorations and see how far a basic CNN can take us in terms of explaining experimental MEG effects in visual word recognition. However, we agree with the reviewer that it is easy to envision more advanced models that could potentially explain more. For our revision, we plan to expand on the question of where to go from here and outline our vision on what types of models would be worth investigating and how one may go about doing that in a way that provides insights beyond higher correlation values.

    1. eLife assessment

      This study provides useful insights for anyone focusing on exonic regions when investigating DNA fragmentation patterns (fragmentomics) in circulating tumor DNA (ctDNA) data for cancer detection. The method expands the DELFI method of Cristiano and colleagues (2019), but the datasets chosen are not ideal and the analysis remains incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors are looking to assess fragmentomics effects using the Delfi method in exonic regions (Exome sequencing). They argue that this is to make the test more cost effective by extracting this information from exome sequencing.

      Strengths:

      Well written and explained. Different ML approaches tried.

      Weaknesses:

      To assess fragmentomics in WES, it doesn't seem valid to downsample WGS. WES is generated by a different library preparation, so to answer this question it would be necessary to try this in WES samples. WES is generally sequenced at much higher coverage because this is necessary to assess mutation calls; therefore, the approach of combining seems flawed because these data were not generated by the same experiment.

      The authors do not really show why they included longer fragment sizes in their model that had previously been excluded from the original Delfi publication.

      As a proof of concept this is a good idea but really needs a bit of a rethink on the utility and impact.

    3. Reviewer #2 (Public Review):

      Apiwat Sangphukieo et al. have developed machine learning models, exomeDELFI and xDELFI, trained on 4 public datasets comprising 721 cfDNA samples. They demonstrate that the exomeDELFI model, which utilizes DNA from the whole exome, exhibits higher AUC values compared to the original DELFI model at equal whole-genome sequencing depth for distinguishing patients with and without cancer. Additionally, the xDELFI model, which integrates coverage of overall fragments, fragments within 3 fragment size thresholds (short, medium, long) and fragment size distribution (FSD) for a total of 2,952 features, shows improved prediction performance. Furthermore, the authors have devised a multiclass machine learning model capable of classifying the tissue of origin for eight cancer types, using distinct tissue-specific fragmentomic patterns in cfDNA from whole-exome regions.

      However, the conclusions drawn in this paper rely heavily on cross-validation of machine learning models constructed from hundreds of samples but employing thousands of features, posing a risk of overfitting. Thus, more rigorous validation is warranted.

      (1) The claim in line 18 is misleading. The authors assert that the high cost of whole-genome sequencing (WGS) has limited the clinical application of cfDNA, and therefore imply that their models are more cost-efficient because they use fewer DNA molecules, originating only from exomic regions. However, WGS is essential in their analysis. Instead of using whole-exome sequencing data, they extracted DNA molecules from WGS data that fall within exomic regions for feature extraction and downstream analysis, resulting in the same cost for DNA sequencing. In this regard, xDELFI, which selectively uses DNA from exomic regions, demonstrates inferior performance compared to the DELFI model using all WGS data (AUC: 0.896 vs. 0.920) at the same cost using the same WGS data.

      (2) The utilization of WGS data from 4 distinct datasets (Jiang et al., 2015, Snyder et al., 2016, Cristiano et al., 2019 and Sun et al., 2019) raises concerns about potential batch effects arising from different DNA library preparation kits (e.g., Kapa Library Preparation Kit (Kapa Biosystems); ThruPLEX DNA-seq kits (Rubicon Genomics); NEBNext DNA Library Prep Kit for Illumina (New England Biolabs); and KAPA HTP Library Preparation Kit (Kapa Biosystems), respectively). Each kit may induce varying pre-analytical effects on cfDNA fragmentomic features, as evidenced by differing size distribution profiles (e.g., in Fig.4 in Jiang et al., 2015, the cfDNA size distribution profiles show the major peak at ~166 bp with frequency of ~3%. However, in Fig.1B in Snyder et al., 2016, the major peak at ~166 bp is ~2%). To enhance the robustness of their models, the authors should develop a sophisticated normalization pipeline to mitigate batch effects and split training and testing sets without mixing any dataset. The authors should demonstrate that their model performs equally well between training and testing sets and across different datasets.

      (3) The uneven distribution of cancer patients across different datasets introduces another layer of complexity, potentially confounding the analysis of tissue of origin. In line 300, the authors find that liver, colorectal, and lung cancers had the highest prediction accuracy in their models. However, the cancer patient distribution is not even across different datasets (e.g., liver cancer patients are all from Jiang et al., 2015; colorectal cancer patients are mostly from Sun et al., 2019, and Cristiano et al., 2019; and lung cancer patients are mainly from Cristiano et al., 2019). The potential pre-analytical differences in each dataset, coupled with the predominance of particular cancer types in each dataset, underscore the importance of addressing these discrepancies to ensure the validity of tissue of origin predictions.

      (4) In Line 145, the authors mention selection of features used in the xDELFI model but did not specify the number of remaining features in each fragmentomic category post-selection. Providing this information would enhance the transparency and reproducibility of their methodology.
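      The dataset-wise split requested in point (2) can be sketched as a leave-one-dataset-out scheme, in which every sample from one source dataset is held out for testing while the model is trained on the remaining datasets. The sketch below is only an illustration of that idea: the function name, the dataset labels, and the sample counts are hypothetical and do not reflect the manuscript's actual data.

```python
import numpy as np


def leave_one_dataset_out(dataset_labels):
    """Yield (held_out, train_idx, test_idx), one split per source dataset.

    All samples from the held-out dataset go to the test set, so no
    dataset contributes samples to both training and testing.
    """
    labels = np.asarray(dataset_labels)
    for held_out in np.unique(labels):
        test_idx = np.flatnonzero(labels == held_out)
        train_idx = np.flatnonzero(labels != held_out)
        yield str(held_out), train_idx, test_idx


# Hypothetical per-sample source labels (counts are made up).
datasets = (
    ["Jiang2015"] * 3
    + ["Snyder2016"] * 2
    + ["Cristiano2019"] * 4
    + ["Sun2019"] * 2
)

splits = list(leave_one_dataset_out(datasets))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", len(train_idx), "test:", len(test_idx))
```

      Any batch-effect normalization would need to be fitted on the training indices only and then applied to the held-out dataset; otherwise information from the test set leaks into training.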

    1. eLife assessment

      Supported by solid evidence, this work provides valuable insights into theanine metabolism and regulation at single-cell resolution. The study paves the way for addressing the multicellular compartmentation of secondary metabolites in various plant systems, making it a valuable resource for future research.

    2. Reviewer #1 (Public Review):

      Summary:

      The study used root tips from semi-hydroponic tea seedlings. The strategy followed sequential steps to draw partial conclusions.

      Initially, protoplasts obtained from root tips were processed for scRNA-seq using the 10x Genomics platform. The sequencing data underwent pre-filtering at both the cell and gene levels, leading to 10,435 cells. These cells were then classified into eight clusters using t-SNE algorithms. The present study scrutinised cell typification through protein sequence similarity analysis of homologs of cell type marker genes. The analysis was conducted to ensure accuracy using validated genes from previous scRNA-seq studies and the model plant Arabidopsis thaliana. The cluster cell annotation was confirmed using in situ RT-PCR analyses. This methodology provided a comprehensive insight into the cellular differentiation of the sample under study. The identified clusters, spanning 1 to 8, have been accurately classified as xylem, epidermal, stem cell niche, cortex/endodermal, root cap, cambium, phloem, and pericycle cells.

      Then, the authors performed a pseudo-time analysis to validate the cell cluster annotation by examining the differentiation pathways of the root cells. Lastly, they created a differentiation heatmap from the xylem and epidermal cells and identified the biological functions associated with the highly expressed genes.

      Upon thoroughly analysing the scRNA-seq data, the researchers delved into the cell heterogeneity of nitrate and ammonium uptake, transport, and nitrogen assimilation into amino acids. The scRNA-seq data was validated by in situ RT-PCR. It allows the localisation of glutamine and alanine biosynthetic enzymes along the cell clusters and confirms that both constitute the primary amino acid metabolism in the root. Such investigation was deemed necessary due to the paramount importance of these processes in theanine biosynthesis since this molecule is synthesised from glutamine and alanine-derived ethylamine.

      Afterwards, the authors analysed the cell-specific expression patterns of the theanine biosynthesis genes, combining the same molecular tools. They concluded that theanine biosynthesis is more enriched in cluster 8 "pericycle cells" than glutamine biosynthesis (Lines 271-272). However, the statement made in Line 250 states that the highest expression levels of genes responsible for glutamine biosynthesis were observed in Clusters 1, 3, 4, 6, and 8, leading to an unclear conclusion.

      The regulation of theanine biosynthesis by the MYB transcription factor family is well-established. In particular, CsMYB6, a transcription factor expressed specifically in roots, has been found to promote theanine biosynthesis by binding to the promoter of the TSI gene responsible for theanine synthesis. However, their findings indicate that CsMYB6 expression is present in Cluster 3 (SCN), Cluster 6 (cambium cells), and Cluster 1 (xylem cells) but not in Cluster 8 (pericycle cells), which is known for its high expression of CsTSI. Similarly, their scRNA-seq data indicated that CsMYB40 and CsHHO3, which activate and repress CsAlaDC expression, respectively, did not show high expression in Cluster 1 (the cell cluster with high CsAlaDC expression). Based on these findings, the authors hypothesised that transcription factors and target genes are not necessarily always highly expressed in the same cells. Nonetheless, additional evidence is essential to substantiate this presumption.

      Lastly, the authors have discovered a novel transcription factor belonging to the Lateral Organ Boundaries Domain (LBD) family known as CsLBD37 that can co-regulate the synthesis of theanine and the development of lateral roots. The authors observed that CsLBD37 is located within the nucleus and can repress the CsAlaDC promoter's activity. To investigate this mechanism further, the authors conducted experiments to determine whether CsLBD37 can inhibit CsAlaDC expression in vivo. They achieved this by creating transiently CsLBD37-silenced or over-expression tea seedlings through antisense oligonucleotide interference and generation of transgenic hairy roots. Based on their findings, the authors hypothesise that CsLBD37 regulates CsAlaDC expression to modulate the synthesis of ethylamine and theanine.

      Additionally, the available literature suggests that the transcription factors belonging to the Lateral Organ Boundaries Domain (LBD) family play a crucial role in regulating the development of lateral roots and secondary root growth. Considering this, they confirmed that pericycle cells exhibit a higher expression of CsLBD37. A recent experiment revealed that overexpression of CsLBD37 in transgenic Arabidopsis thaliana plants led to fewer lateral roots than the wild type. From this observation, the researchers concluded that CsLBD37 regulates lateral root development in tea plants. I respectfully submit that the current conclusion may require additional research before it can be considered definitive.

      Further efforts should be made to investigate the signalling mechanisms that govern CsLBD37 expression to arrive at a more comprehensive understanding of this process. In the context of Arabidopsis lateral root founder cells, the establishment of asymmetry is regulated by LBD16/ASL18 and other related LBD/ASL proteins, as well as the AUXIN RESPONSE FACTORs (ARF7 and ARF19). This is achieved by activating plant-specific transcriptional regulators such as LBD16/ASL18 (Go et al., 2012, https://doi.org/10.1242/dev.071928). On the other hand, other downstream homologues of LBD genes regulated by cytokinin signalling play a role in secondary root growth (Ye et al., 2021, https://doi.org/10.1016/j.cub.2021.05.036). It is imperative to shed light on the hormonal regulation of CsLBD37 expression in order to gain a comprehensive understanding of its involvement in the morphogenic process.

      Strength:

      The manuscript showcases significant dedication and hard work, resulting in valuable insights that serve as a fundamental basis for generating knowledge. The authors skillfully integrated various tools available for this type of study and meticulously presented and illustrated every step involved in the survey. The overall quality of the work is exceptional, and it would be a valuable addition to any academic or professional setting.

      Weaknesses:

      In its current form, the article presents certain weaknesses that need to be addressed to improve its overall quality. Specifically, the authors' conclusions appear to have been drawn in haste without sufficient experimental data and a comprehensive discussion of the entire plant. It is strongly advised that the authors devote additional effort to resolving the abovementioned issues to bolster the article's credibility and dependability. This will ensure that the article is of the highest quality, providing readers with reliable and trustworthy information.

    3. Reviewer #2 (Public Review):

      Summary:

      In their manuscript, Lin et al. present a comprehensive single-cell analysis of tea plant roots. They measured the transcriptomes of 10,435 cells from tea plant root tips, leading to the identification and annotation of 8 distinct cell clusters using marker genes. Through this dataset, they delved into the cell-type-specific expression profiles of genes crucial for the biosynthesis, transport, and storage of theanine, revealing potential multicellular compartmentalization in theanine biosynthesis pathways. Furthermore, their findings highlight CsLBD37 as a novel transcription factor with dual regulatory roles in both theanine biosynthesis and lateral root development.

      Strengths:

      This manuscript provides the first single-cell dataset analysis of tea plant roots. It also enables detailed analysis of the specific expression patterns of the genes involved in theanine biosynthesis. Some of these gene expression patterns in roots were further validated through in situ RT-PCR. Additionally, their analysis identified a role for a novel TF gene, CsLBD37, in regulating theanine biosynthesis.

      Weaknesses:

      Several issues need to be addressed:

      (1) The annotation of single-cell clusters (1-8) in Figure 2 could benefit from further improvement. Currently, the authors utilize several key genes, such as CsAAP1, CsLHW, CsWAT1, CsIRX9, CsWOX5, CsGL3, and CsSCR, to annotate cell types. However, it is notable that some of these genes are expressed in only a limited number of cells within their respective clusters, such as CsAAP1, CsLHW, CsGL3, CsIRX9, and CsWOX5. It would be advisable to utilize other marker genes expressed in a higher percentage of cells or employ a combination of multiple marker genes for more accurate annotation.

      (2) Figure 3 could enhance clarity by displaying the trajectory of cell differentiation atop the UMAP, similar to the examples demonstrated by Monocle 3.

      (3) The identification of CsLBD37 primarily relies on bulk RNA-seq data. The manuscript could benefit from elaborating on the role of the single-cell dataset in this context.

      (4) The manuscript's conclusions predominantly rely on the expression patterns of key genes. This reliance might stem from the inherent challenges of tea research, which often faces limitations in exploring molecular mechanisms due to the lack of suitable genetic and molecular methods. The authors may consider discussing this point further in the discussion section.
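
      Point (1) above asks for marker genes that are expressed in a higher percentage of cells within each cluster. A minimal sketch of that check follows, assuming a per-cell expression vector for one marker and a per-cell cluster label. The function name `marker_coverage`, the detection threshold, and the toy values are all hypothetical illustrations and are not taken from the manuscript.

```python
import numpy as np


def marker_coverage(expr, clusters, cluster_id, threshold=0.0):
    """Return the fraction of cells in one cluster that express a marker.

    expr       -- per-cell expression values for a single marker gene
    clusters   -- per-cell cluster labels, same length as expr
    cluster_id -- the cluster to evaluate
    threshold  -- a cell counts as expressing if its value exceeds this
    """
    expr = np.asarray(expr, dtype=float)
    clusters = np.asarray(clusters)
    if expr.shape != clusters.shape:
        raise ValueError("expr and clusters must have the same length")
    in_cluster = clusters == cluster_id
    if not in_cluster.any():
        raise ValueError(f"no cells found in cluster {cluster_id!r}")
    return float((expr[in_cluster] > threshold).mean())


# Hypothetical toy data: eight cells, two clusters (not real measurements).
counts = [0, 3, 0, 0, 5, 1, 0, 2]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

coverage_c1 = marker_coverage(counts, labels, 1)  # 1 of 4 cells express
coverage_c2 = marker_coverage(counts, labels, 2)  # 3 of 4 cells express
print(coverage_c1, coverage_c2)
```

      A marker with low coverage in its assigned cluster, as in the first toy cluster, is the kind of case the reviewer suggests replacing or combining with additional markers.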

    4. Reviewer #3 (Public Review):

      Summary:

      Lin et al. performed an scRNA-seq-based study of tea roots, as an example, to elucidate the biosynthesis and regulatory processes for theanine, a root-specific secondary metabolite, and established the first map of tea roots, comprising 8 cell clusters. Their findings contribute to deepening our understanding of the regulation of the synthesis of important flavor substances in tea plant roots. They have presented some innovative ideas.

      It is notable that the authors - based on single-cell analysis results - proposed that TFs and target genes are not necessarily always highly expressed in the same cells. Many of the important TFs they previously identified, along with their target genes (CsTSI or CsAlaDC), were not found in the same cell cluster. Therefore, they proposed a model in which the theanine biosynthesis pathway occurs via multicellular compartmentation and does not require high co-expression levels of transcription factors and their target genes within the same cell cluster. Since it is not known whether the theanine content is absolutely high in the cell cluster 1 containing a high CsAlaDC expression level (due to the lack of cell cluster theanine content determination, which may be a current technical challenge), it is difficult to determine whether this non-coexpressing cell cluster 1 is a precise regulatory mechanism for inhibiting theanine content in plants. In fact, there are actually a small number of cells where TFs and CsAlaDC are simultaneously highly expressed, but the quantity is insufficient to form a separate cluster. However, these few cells may be sufficient to meet the current demands for theanine synthesis. This possibility may better align with some previous experiments and validation results in this study. Moreover, I feel that under normal conditions, plants may not mobilize a large number of cells to synthesize a particular substance. Perhaps, cell cluster 1 is actually a type of cell that inhibits the synthesis of theanine, aiming to prevent excessive theanine production? I do not oppose the model proposed by the author, but I feel there is a possibility as I mentioned. If it seems reasonable, the author may consider adding it to an appropriate position in the discussion.

    1. eLife assessment

      The authors present an important resource to quantify mitochondrial function across many organs in mice. The convincing conclusions are supported by the identification of processes that specifically differ between young and old, or between male and female mice. All reviewers point to the merit of this study in providing a comprehensive resource to contextualize mitochondrial functions across the body. Some further suggestions are made to clarify conclusions in terms of data normalization and the interpretation of comparative analyses between organs.

    2. Reviewer #1 (Public Review):

      In this study, Sarver and colleagues carried out an exhaustive analysis of the functioning of various components (Complex I/II/IV) of the mitochondrial electron transport chain (ETC) using a real-time cell metabolic analysis technique (commonly referred to as the Seahorse oxygen consumption rate (OCR) assay). The authors aimed to generate an atlas of ETC function in about 3 dozen tissue types isolated from all major mammalian organ systems. They used a recently published improvised method by which ETC function can be quantified in freshly frozen tissues. This method enabled them to collect data from almost all organ systems from the same mouse and use many biological replicates (10 mice/experiment) required for an unbiased and statistically robust analysis. Moreover, they studied the influence of sex (male and female) and aging (young adult and old age) on ETC function in these organ systems. The main findings of this study are (1) cells in the heart and kidneys have very active ETC complexes compared to other organ systems, (2) the sex of the mice has little influence on the ETC function, and (3) aging undermined the mitochondrial function in most tissues, but surprisingly, in some tissues aging promoted the activity of ETC complexes (e.g., quadriceps, plantaris muscle, and diaphragm). Although this study provides a comprehensive outlook on the ETC function in various tissues, the main caveat is that it's too technical and descriptive. The authors didn't invest much effort in putting their findings in the context of the biological function of the tissue analyzed, i.e., some tissues might be more glycolytic than others and have low ETC activity. Also, it is unclear what slight changes in the activity of one or the other ETC complex mean in terms of mitochondrial ATP production. Likely, these small changes reported do not affect the mitochondrial respiration. With such a detailed dataset, the study falls short of deriving more functionally relevant conclusions about the heterogeneity of mitochondrial function in various tissues. In the current format, the readers get lost in the large amount of data presented in a technical manner. Also, it is highly recommended that all the raw data and the values be made available as an Excel sheet (or other user-friendly formats) as a resource to the community.

      Major concerns

      (1) In this study, the authors used the method developed by Acin-Perez and colleagues (EMBO J, 2020) to analyze ETC complex activities in mitochondria derived from the snap-frozen tissue samples. However, the preservation of cellular/mitochondrial integrity in different types of tissues after being snap-frozen was not validated. Additionally, the conservation of mitochondrial respiration in snap-frozen tissues might differ, especially in those derived from old mice. For example, quadriceps (young male/female), plantaris (young male/female), intestinal segments (duodenum), and pancreas preparations show almost no activity (nearly flat OCR in Seahorse assays). For such a comprehensive study, the authors must at least validate those tissues where the OCR plots looked suboptimal, using mitochondrial preparations derived from fresh tissue. Since aging has been identified as the most important effector in this study, it is essential to validate how aging affects respiration in various fresh frozen tissues. Such analysis will ensure that the results presented are not due to the differential preservation of the mitochondrial respiration in the frozen tissue. In addition, such validations will further strengthen the conclusions and promote the broad usability of this "new" method.

      (2) In this study, the authors sampled the maximal activity of ETC complex I, II, and IV, but throughout the manuscript, they discussed the data in the context of mitochondrial function. However, it is unclear how the changes in CI, CII, and CIV activity affect overall mitochondrial function (if at all) and how small changes seen in the maximal activity of one or more complexes affect the efficiency and efficacy of ATP production (OxPhos). The authors report huge variability between the activity of different complexes - in some tissues all three complexes (CI, CII, and CIV) and often in others, just one complex was affected. For example, as presented in Figure 4, there is no difference in CI activity in the hippocampus and cerebellum, but there is a slight change in CII and CIV activity. In contrast, in heart atria, there is a change in the activity of CI but not in CII and CIV. However, the authors still suggest that there is a significant difference in mitochondrial activity (e.g., "Old males showed a striking increase in mitochondrial activity via CI in the heart atria....reduced mitochondrial respiration in the brain cortex..." - Lines 5-7, Page 9). Until and unless a clear justification is provided, the authors should not make these broad claims on mitochondrial respiration based on small changes in the activity of one or more complexes (CI/CII/CIV). With such a data-heavy and descriptive study, it is confusing to track what is relevant and what is not for the functioning of mitochondria.

      (3) What do differences in the ETC complex CI, CII, and CIV activity in the same tissue mean? What role does the differential activity of these complexes (CI, CII, and CIV) play in mitochondrial function? What do changes in Oxphos mean for different tissues? Does that mean the tissue (cells involved) shift more towards glycolysis to derive their energy? In the best world, a few experiments related to the glycolytic state of the cells would have been ideal to solidify their finding further. The authors could have easily used ECAR measurements for some tissues to support their key conclusions.

      (4) The authors further analyzed parameters that significantly changed across their study (Figure 7, 98 data points analyzed). The main caveat of such analysis is that some tissue types would be represented three or even more times (due to changes in the activity of all three complexes - CI, CII, and CIV, and across different ages and sexes), and some just once. Such a method of analysis will skew the interpretation towards a few over-represented organ/tissue systems. Perhaps the authors should separately analyze tissue where all three complexes are affected from those with just one affected complex.

      (5) The current protocol does not provide cell-type-specific resolution and will be unable to identify the cellular source of mitochondrial respiration. This becomes important, especially for those organ systems with tremendous cellular heterogeneity, such as the brain. The authors should discuss whether the observed changes result from an altered mitochondria respiratory capacity or if changes in proportions of cell types in the different conditions studied (young vs. aged) might also contribute to differential mitochondrial respiration.

      (6) Another critical concern of this study is that the same datasets were repeatedly analyzed and reanalyzed throughout the study with almost the same conclusion - namely, aging affects mitochondrial function, and sex-specific differences are limited to very few organs. Although this study has considerable potential, the authors missed the chance to add new insights into the distinct characteristics of mitochondrial activity in various tissue and organ systems. The authors should invest significant effort in putting their data in the context of mitochondrial function.
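
      Point (4) above suggests analyzing tissues in which all three complexes are affected separately from those with a single affected complex. The sketch below shows one way such a partition could be made, assuming each significant result is recorded as a (tissue, complex, comparison) tuple. The function names and the placeholder tissue labels are hypothetical and do not represent data from the study.

```python
from collections import defaultdict


def complexes_per_tissue(hits):
    """Map each tissue to the set of complexes with a significant change."""
    affected = defaultdict(set)
    for tissue, complex_name, _comparison in hits:
        affected[tissue].add(complex_name)
    return dict(affected)


def split_by_breadth(hits, all_complexes=("CI", "CII", "CIV")):
    """Separate tissues with all complexes affected from single-complex ones."""
    affected = complexes_per_tissue(hits)
    full_set = set(all_complexes)
    all_affected = sorted(t for t, c in affected.items() if c == full_set)
    single = sorted(t for t, c in affected.items() if len(c) == 1)
    return all_affected, single


# Placeholder records only (hypothetical, not results from the manuscript).
hits = [
    ("tissue_A", "CI", "old vs young"),
    ("tissue_A", "CII", "old vs young"),
    ("tissue_A", "CIV", "old vs young"),
    ("tissue_B", "CI", "old vs young"),
    ("tissue_C", "CII", "male vs female"),
    ("tissue_C", "CIV", "male vs female"),
]

all_affected, single = split_by_breadth(hits)
print(all_affected, single)
```

      In this toy input, tissue_A contributes three of the six records, which illustrates how one tissue can dominate a pooled analysis of significant data points.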

    3. Reviewer #2 (Public Review):

      Summary:

      The authors utilize a new technique to measure mitochondrial respiration from frozen tissue extracts, which goes around the historical problem of purifying mitochondria prior to analysis, a process that requires a fair amount of time and cannot be easily scaled up.

      Strengths:

      A comprehensive analysis of mitochondrial respiration across tissues, sexes, and two different ages provides foundational knowledge needed in the field.

      Weaknesses:

      While many of the findings are mostly descriptive, this paper provides a large amount of data for the community and can be used as a reference for further studies. As the authors suggest, this is a new atlas of mitochondrial function in the mouse. The inclusion of a middle-aged time point and a slightly older young point (3-6 months) would be beneficial to the study.

    4. Reviewer #3 (Public Review):

      The aim of the study was to map (a) whether different tissues exhibit different metabolic profiles (this is known already), (b) what differences are found between female and male mice, and (c) how the profiles change with age. In particular, the study recorded the activity of respirasomes, i.e. the concerted activity of mitochondrial respiratory complex chains consisting of CI+CIII2+CIV, CII+CIII2+CIV or CIV alone.

      The strength is certainly the atlas of oxidative metabolism in the whole mouse body, the inclusion of the two different sexes and the comparison between young and old mice. The measurement was performed on frozen tissue, which is possible as already shown (Acin-Perez et al, EMBO J, 2020).

      Weakness:

      The assay reveals the maximum capacity of enzyme activity, which is an artificial situation and may differ from in vivo respiration, as the authors themselves discuss. The material used was a very crude preparation of cells containing mitochondria and other cytosolic compounds and organelles. Thus, the conditions are not well defined and the respiratory chain activity was certainly uncoupled from ATP synthesis. Preparation of more pure mitochondria and testing for coupling would allow evaluation of additional parameters: P/O ratios, feedback mechanism, basal respiration, and ATP-coupled respiration, which reflect in vivo conditions much better. The discussion is rather descriptive and cautious and could lead to some speculations about what could cause the differences in respiration and also what consequences these could have, or what certain changes imply.

      Nevertheless, this study is an important step towards this kind of analysis.

    1. eLife assessment

      This valuable study partially succeeds in providing evidence to support the therapeutic potential of the plant-derived compound eugenol for ameliorating symptoms associated with Type 1 Diabetes, identifying Nuclear factor E2 - related factor (Nrf2) as a mediator of the effects induced by eugenol. Although the study provides some interesting data, the evidence for the proposed mechanism is currently incomplete.

    2. Reviewer #1 (Public Review):

      Summary

      Type 1 diabetes mellitus (T1DM) progression is accelerated by oxidative stress and apoptosis. Eugenol (EUG) is a natural compound previously documented as anti-inflammatory, anti-oxidative, and anti-apoptotic. In this manuscript by Jiang et al., the authors study the effects of EUG on T1DM in MIN6 insulinoma cells and a mouse model of chemically induced T1DM. The authors show that EUG increases nuclear factor E2-related factor 2 (Nrf2) levels. This results in a reduction of pancreatic beta-cell damage, apoptosis, oxidative stress markers, and a recovery of insulin secretion. The authors highlight these effects as indicative of the therapeutic potential of EUG in managing T1DM.

      Strengths

      Relevant, timely, and addresses an interesting question in the field. The authors consistently observe enhanced beta cell functionality following EUG treatment, which makes the compound a promising candidate for T1DM therapy.

      Weaknesses

      The in vivo experiments have too few biological replicates. With an n=3 (as all figure legends indicate) in complex mouse studies such as these, drawing robust conclusions becomes challenging. It is important to reproduce these results in a larger cohort, to validate the conclusions of the authors. Another big concern is the lack of quantifications and statistical analysis throughout the manuscript. Although the authors claim statistical significance in various experiments, the limited information provided makes it difficult to verify. The authors use vague and minimal descriptions of their experiments, which further reduces the reader's comprehension and the reproducibility of the experiments. Finally, the use of MIN6 cells as a model for pancreatic beta cells is a strong limitation of this study. Future studies should seek to reproduce these findings in a more translational model and use more relevant in vitro cell systems (e.g., islets).

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, the authors consider the effects of eugenol (EUG), a plant-produced substance known to reduce oxidative stress in various cellular contexts via Nrf2, in alleviating the effects of streptozotocin (STZ), a known rodent beta cell toxin. They claim that EUG treatment would be useful for T1D therapy.

      Strengths:

      The experiments shown are sufficiently clear and rather convincing in documenting that eugenol can revert the effects of streptozotocin on animal physiology as well as beta cell oxidative stress and cell death via activation of Nrf2.

      Weaknesses:

      In my view, there are major concerns with the basic premises of the manuscript.

      (1) While oxidative stress may be implicated in T1D, it is neither the primary nor the main reason for autoimmune beta cell destruction. In T1DM, ER stress rather than oxidative stress is the main intracellular mediator of cell death. Thus, the abstract statement that 'oxidative stress plays a major role in T1D' is an exaggeration.

      (2) Streptozotocin induces beta cell death through mechanisms that only partially overlap with autoimmune beta cell destruction. The main players, i.e., beta cell / immune system crosstalk and T-cell mediated cell death, are not present in the STZ model.

      In short, because the interplay between the immune system and beta cell-intrinsic factors that trigger and accelerate the disease is completely missing, STZ treatment cannot be used as a T1DM model when beta cell demise mechanisms are concerned. The statement that STZ-treated mice are, in this context, a T1DM model, is misleading.

      There are inconsistencies in the manuscript. Mechanistically, the manuscript remains at a rather superficial level, demonstrating that the eugenol effects are mediated by Nrf2 upregulation and a downregulation of its partner inhibitor protein Keap1. How does eugenol penetrate the cell? Is there a receptor that could potentially be targeted? Are there intermediary proteins that convey the effect to the Nrf2/Keap1 complex, or does eugenol directly disrupt their interaction? What are the direct downstream Nrf2 effectors? Besides, streptozotocin is also a powerful DNA alkylating agent. Are these effects mitigated by EUG?

    4. Reviewer #3 (Public Review):

      Summary:

      This study by Jiang et al. aims to establish the streptozotocin (STZ)-induced type 1 diabetes mellitus (T1DM) mouse model in vivo and the STZ-induced pancreatic β cell MIN6 cell model in vitro to explore the protective effects of Eugenol (EUG) on T1DM. The authors tried to elucidate the potential mechanism by which EUG inhibits the NRF2-mediated anti-oxidative stress pathway. Overall, this study is well executed with solid data, offering an intriguing report from animal studies for a potential new treatment strategy for T1DM.

      Strengths:

      The in vivo efficacy study is comprehensive and solid. Given that STZ-induced T1DM is a devastating and harsh model, the in vivo efficacy of this compound is really impressive.

      Weaknesses:

      The mechanism is linked with the anti-oxidant property of the compound, which is common to many natural compounds, such as flavonoids and polyphenols. However, compounds of this kind have rarely been successfully developed into therapeutics for clinical use. Indeed, if that is the case, Vitamin C or Vitamin E could be used here as the positive control.

    1. eLife assessment

      This valuable simulation study proposes a new coarse-grained model to explain the effects of CpG methylation on nucleosome wrapping energy and nucleosome positioning. The evidence to support the claims in the paper looks solid, although the novelty of the findings should be discussed in connection with the previous works. This work will be of interest to the researchers working on gene regulation and mechanisms of DNA methylation.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors used a coarse-grained DNA model (cgNA+) to explore how DNA sequences and CpG methylation/hydroxymethylation influence nucleosome wrapping energy and the probability density of optimal nucleosomal configuration. Their findings indicate that both methylated and hydroxymethylated cytosines lead to increased nucleosome wrapping energy. Additionally, the study demonstrates that methylation of CpG islands increases the probability of nucleosome formation.

      Strengths:

      The major strength of this method is that the model explicitly includes elastic constraints on the positions of phosphate groups facing a histone octamer, as DNA-histone binding site constraints. The authors claim that their model enhances the accuracy and computational efficiency and allows comprehensive calculations of DNA mechanical properties and deformation energies.

      Weaknesses:

      A significant limitation of this study is that the parameter sets for the methylated and hydroxymethylated CpG steps in the cgNA+ model are derived from all-atom molecular dynamics (MD) simulations that suggest that both methylated and hydroxymethylated cytosines increase DNA stiffness and nucleosome wrapping energy (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021). It could predispose the coarse-grained model to replicate these findings. Notably, conflicting results from other all-atom MD simulations, such as those by Ngo T in Nat. Commun. 2016, show that hydroxymethylated cytosines increase DNA flexibility, contrary to methylated cytosines. If the cgNA+ model was trained on these latter parameters or other all-atom force fields, different conclusions might be obtained regarding the effects of methylation and hydroxymethylation on nucleosome formation.

      Despite the training parameters of the cgNA+ model, the results presented in the manuscript indicate that methylated cytosines increase both DNA stiffness and nucleosome wrapping energy. However, when comparing nucleosome occupancy scores with predicted nucleosome wrapping energies and optimal configurations, the authors find that methylated CGIs exhibit higher nucleosome occupancies than unmethylated ones, which seems to contradict their findings from the same paper, which showed that increased stiffness should reduce nucleosome formation affinity. In the manuscript, the authors also admit that this conclusion "apparently runs counter to the (perhaps naive) intuition that high nucleosome forming affinity should arise for fragments with low wrapping energy". Previous all-atom MD simulations (Pérez A, et al. Biophys J. 2012; Battistini, et al. PLOS Comput Biol. 2021; Ngo T, et al. Nat. Commun. 2016) show that the stiffer DNA upon CpG methylation reduces the affinity of DNA to assemble into nucleosomes or destabilizes nucleosomes. Given these findings, the authors need to address and reconcile these seemingly contradictory results, as the influence of epigenetic modifications on DNA mechanical properties and nucleosome formation is a critical aspect of their study.

      Understanding the influence of sequence-dependent and epigenetic modifications of DNA on mechanical properties and nucleosome formation is crucial for comprehending various cellular processes. The authors' study, focusing on these aspects, will definitely garner interest from the DNA methylation research community.

    3. Reviewer #2 (Public Review):

      Summary:

      This study uses a coarse-grained model for double-stranded DNA, cgNA+, to assess nucleosome sequence affinity. cgNA+ coarse-grains DNA on the level of bases and accounts also explicitly for the positions of the backbone phosphates. It has been proven to reproduce all-atom MD data very accurately. It is also ideally suited to be incorporated into a nucleosome model because it is known that DNA is bound to the protein core of the nucleosome via the phosphates.

      It is still unclear whether this harmonic model parametrized for unbound DNA is accurate in describing DNA inside the nucleosome. Previous models by other authors, using more coarse-grained models of DNA, have been rather successful in predicting base pair sequence-dependent nucleosome behavior. This is at least the case as far as DNA shape is concerned whereas assessing the role of DNA bendability (something this paper focuses on) has been consistently challenging in all nucleosome models, to my knowledge.

      It is thus of major interest whether this more sophisticated model is also more successful in handling this issue. As far as I can tell the work is technically sound and properly accounts for not only the energy required in wrapping DNA but also entropic effects, namely the change in entropy that DNA experiences when going from the free state to the bound state. The authors make an approximation here which seems to me to be a reasonable first step.

      Of interest is also that the authors have the parameters at hand to study the effect of methylation of CpG-steps. This is especially interesting as it allows us to study a scenario where changes in the physical properties of base pair steps via methylation might influence nucleosome positioning and stability in a cell-type-specific way.

      Overall, this is an important contribution to the question of how the sequence affects nucleosome positioning and affinity. The findings suggest that cgNA+ has something new to offer. But the problem is complex, also on the experimental side, so many questions remain open.

      Strengths:

      The authors use their state-of-the-art coarse-grained DNA model which seems ideally suited to be applied to nucleosomes as it accounts explicitly for the backbone phosphates.

      Weaknesses:

      (1) According to the abstract, the authors consider two "scalar measures of the sequence-dependent propensity of DNA to wrap into nucleosomes". One is the bending energy and the other is the free energy. Specifically, in the latter, the authors take the difference between the free energies of the wrapped and the free DNA. Whereas the entropy of the latter can be calculated exactly, they assume that the bound DNA always has the same entropy (independent of sequence) in its more confined state. The problem is the way in which this is written (e.g. below Eq. 6), which is hard to understand. The authors should mention that the negative of Eq. 6 is what physicists call free energy, namely the free energy difference between bound and free DNA.

      (2) In Eq. 5 the authors introduce penalty coefficients c_i. They write that values are "set by numerical experiment to keep distances ... within the ranges observed in the PDB structure, while avoiding sterical clashes in DNA." This is rather vague, especially since it is unclear to me what type of sterical clashes might occur. Figure 1 shows then a comparison between crystal structures and simulated structures. They are reasonably similar but standard deviations in the fluctuations of the simulation are smaller than in the experiments. Why did the authors not choose smaller c_i-values to have a better fit? Do smaller values lead to unwanted large fluctuations that would lead to steric clashes between the two DNA turns? I also wonder what side views of the nucleosomes look like (experiments and simulations) and whether in this side view larger fluctuations of the phosphates can be observed in the simulation that would eventually lead to turn-turn clashes for smaller c_i-values.

    4. Reviewer #3 (Public Review):

      Summary:

      In this study, the authors utilize biophysical modeling to investigate differences in free energies and nucleosomal configuration probability density of CpG islands and nonmethylated regions in the genome. Toward this goal, they develop and apply the cgNA+ coarse-grained model, an extension of their prior molecular modeling framework.

      Strengths:

      The study utilizes biophysical modeling to gain mechanistic insight into nucleosomal occupancy differences in CpG and nonmethylated regions in the genome.

      Weaknesses:

      Although the overall study is interesting, the manuscript needs more clarity in places. Moreover, the rationale and conclusion for some of the analyses are not well described.

    1. eLife assessment

      This study presents valuable findings linking circHMGCS1 and miR-4521 in diabetes-induced vascular endothelial dysfunction. The evidence supporting the claims of the authors is solid, but addressing concerns around how certain experiments were performed and controlled could enhance clarity and further strengthen the study. The work will be of interest to biomedical scientists working with cardiovascular and/or RNA biology, particularly those studying diabetes.

    2. Reviewer #1 (Public Review):

      Summary:

      HMGCS1, 3-hydroxy-3-methylglutaryl-CoA synthase1 is predicted to be involved in Acetyl-CoA metabolic process and mevalonate-cholesterol pathway. To induce diet-induced diabetes, they fed wild-type littermates either a standard chow (Control) or a high fat-high sucrose (HFHG) diet, where the diet composition consisted of 60% fat, 20% protein, and 20% carbohydrate (H10060, Hfkbio, China). The dietary regimen was maintained for 14 weeks. Throughout this period, body weight and fasting blood glucose (FBG) levels were measured on a weekly basis. Although the authors induced diabetes with a diet also rich in fat, the cholesterol concentration or metabolism was not investigated. After the treatment, were the animals with endothelial dysfunction? How was the blood pressure of the animals?

      Strengths:

      To explore the potential role of circHMGCS1 in regulating endothelial cell function, the authors cloned exons 2-7 of HMGCS1 into lentiviral vectors for ectopic overexpression of circHMGCS1 (Figure S2). The authors could use this experiment as a proof of concept and investigate the glucose concentration in the cell culture medium. Does pLV-circ HMGCS1 transduction in HUVECs increase glucose release? (Line 163)

      Weaknesses:

      (1) Pg 20. The cells were transfected with miR-4521 mimics, miR-inhibitor, or miR-NC and incubated for 24 hours. Subsequently, the cells were treated with PAHG for another 24 hours.

      Were the cells transfected with Lipofectamine? The protocol or the Lipofectamine kit used should be described. The Lipofectamine protocol suggests using an incubation time of 72 hours. Why did the authors incubate for only 24 hours?

      If the authors did the mimic and inhibitor curves, these should be added to the supplementary figures. Please, describe the miRNA mimic and antagomir concentration used in cell culture.

      (2) Pg 20, line 507. Which miR-4521 agomiR was used to treat the animals?

      (3) Figure 1B. The RT-qPCR results are shown for only 5 circRNAs; however, the results show that 48 circRNAs were upregulated and 18 were downregulated (Figure S1D). Why were the other circRNAs not confirmed? The upregulated circRNAs with high expression are not necessarily those with the best differential expression between the control and PAHG groups. Furthermore, Figures 1A and S1D show downregulated circRNAs that also have high expression. Why were these circRNAs not confirmed?

      (4) Figure 1B shows the relative circRNAs expression. Were host genes expressed in the same direction?

      (5) Line 128. The circRNA RT-qPCR methodology was not described. The methodology should be described in detail in the Methods section.

      (6) Line 699. The relative gene expression was calculated using the 2^-ΔΔCt method. This is not correct: the miRNA and gene expression data are presented as a percentage of control.

      (7) Line 630. Detection of ROS for tissue and cells. The methodology for tissue was described, but not for cells.

      (8) Line 796. RNA Fluorescent In Situ Hybridization (RNA-FISH). Figure 1F shows that RNA-FISH confirmed the robust expression of cytoplasmic circHMGCS1 in HUVECs. However, in the methods, lines 804 and 805 state that the probes targeting circMAP3K5 and miR-4521 were applied to the sections. Hybridization was performed in a humid chamber at 37°C overnight. Is this correct?

      (9) Line 14. Fig 1-H. The authors state that qRT-PCR demonstrated that circHMGCS1 displayed a stable half-life exceeding 24 h, whereas the linear transcript HMGCS1 mRNA had a half-life of less than 8 h (Figure 1H). Several of the antibodies may contain trace amounts of RNases that could degrade target RNA and could result in loss of RNA hybridization signal or gene expression. Thus, all of the solutions should contain RNase inhibitors. The HMGCS1 mRNA could be degraded over the incubation time (0-24 h), leading to incorrect results. Moreover, the methods do not mention whether an RNase inhibitor was used. Could the authors please discuss this and provide the information?

      (10) Further experiments demonstrated that the overexpression of circHMGCS1 stimulated the expression of adhesion molecules (VCAM1, ICAM1, and ET-1) (Figures 2B and 2C), suggesting that circHMGCS1 is involved in VED. How were these genes expressed in the RNA-seq?

      (11) Line 256. By contrast, the combined treatment of circHMGCS1 and miR-4521 agomir did not significantly affect the body weight and blood glucose levels. OGTT and ITT experiments demonstrated that miR-4521 agomir considerably enhanced glucose tolerance and insulin resistance in diabetic mice (Figures 5C, 5D, and Figures S5B and S5C). Why did the miR-4521 agomir treatment considerably enhance glucose tolerance and insulin resistance in diabetic mice, but not affect the blood glucose levels?

      (12) For the pull-down experiments, the authors used biotin-coupled miR-4521 or its mutant probe to pull down circHMGCS1. This result only confirms the luciferase experiments shown in Figure 4A. The experiment that the authors need to perform is a pull-down using a biotin-labeled antisense oligo (ASO) targeting the circHMGCS1 backsplice junction sequence, followed by pulldown with streptavidin-conjugated magnetic beads to capture the associated miRNAs and RNA binding proteins (RBPs). The ASO pulldown assay can also be coupled to miRNA RT-qPCR and western blotting analysis to confirm the association of miRNAs and RBPs predicted to interact with the target circRNA.

      (13) Based on Figure 5, the authors suggest that miR-4521 can inhibit the occurrence of diabetes, whereas circHMGCS1 specifically dampens the function of miR-4521, weakening its protective effect against diabetes. In this context, what are the endogenous target genes of miR-4521 that could be regulating diabetes?

      (14) In the western blot of Figure 5, the β-actin band appears to be different from the genes analyzed. Was the same membrane used for the four proteins? The Ponceau S membrane should be provided.

      (15) Why did the authors use AAV9, given that AAV9 has a tropism for the liver, heart, and skeletal muscle, and not for the vascular endothelium?
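
      For reference on point (6), the arithmetic behind the 2^-ΔΔCt method and its relation to a "percentage of control" presentation can be sketched as follows. This is a minimal illustration only; the function name and the Ct values are hypothetical and are not taken from the manuscript.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ddCt method.

    All arguments are threshold cycle (Ct) values; the reference gene
    is a housekeeping gene measured in the same samples.
    """
    # Normalize the target to the reference gene within each group.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated group with the control group.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, chosen only to make the arithmetic easy to follow.
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
# The same quantity expressed as a percentage of control.
percent_of_control = 100.0 * fold_change
print(fold_change, percent_of_control)
```

      A fold change of 1 corresponds to 100% of control, so the two presentations differ only by a factor of 100 when both are derived from the same ΔΔCt values.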

    3. Reviewer #2 (Public Review):

      Summary:

      The authors observed an aggravated vascular endothelial dysfunction upon overexpressing circHMGCS1 and inhibiting miR-4521. This study discovered that circHMGCS1 promotes arginase 1 expression by sponging miR-4521, which accelerated the impairment of vascular endothelial function.

      Strengths:

      The study is systematic and establishes the regulatory role of the circHMGCS1-miR-4521 axis in diabetes-induced cardiovascular diseases.

      Weaknesses:

      (1) The authors selected miR-4521 as the target based on its reduced expression upon circHMGCS1 overexpression. Since the miRNA level is downregulated, the downstream target gene is expected to be upregulated even in the absence of circRNA. The changes in miRNA expression opposite to the levels of the target circRNA could arise through Target RNA-Directed MicroRNA Degradation. In addition, miRNAs can also be stabilized by circRNAs. Hence, selecting miRNA targets based on opposite expression patterns and concluding that the circRNA sponges the miRNA requires further evidence of direct interactions.

      (2) The majority of the experiments were performed with an overexpression vector, which can generate many linear RNAs along with circRNAs. The linear RNAs produced by the overexpression vectors can have an effect similar to that of the circRNA due to sequence identity.

      (3) Data on circHMGCS1 silencing and its effect on target miRNAs and mRNAs are lacking.

    1. eLife assessment

      This useful manuscript presents an interesting multi-modal omics analysis of lung adenocarcinoma patients with distinct clinical clusters, mutation hotspots, and potential risk factors identified in cases linked to air pollution. The findings show potential for high clinical and therapeutic impact. However, some of the conclusions are incomplete as they are based on correlative or suggestive findings, and would benefit from further functional investigation and validating approaches.

    2. Reviewer #1 (Public Review):

      Summary:

      This is a well-written and detailed manuscript showing important results on the molecular profile of 4 different cohorts of female patients with lung cancer.

      The authors conducted comprehensive multi-omic profiling of air-pollution-associated LUAD to study the roles of the air pollutant BaP. Utilizing multi-omic clustering and mutation-informed interface analysis, potential novel therapeutic strategies were identified.

      Strengths:

      The authors used several different methods to identify potential novel targets for therapeutic interventions.

      Weaknesses:

      Statistical test results need to be provided in comparisons between cohorts.

    3. Reviewer #2 (Public Review):

      Summary:

      Zhang et al. performed a proteogenomic analysis of lung adenocarcinoma (LUAD) in 169 female never-smokers from the Xuanwei area (XWLC) in China. These analyses reveal that XWLC is a distinct subtype of LUAD and that BaP is a major risk factor associated with EGFR G719X mutations found in the XWLC cohort. Four subtypes of XWLC were classified with unique features based on multi-omics data clustering.

      Strengths:

      The authors made great efforts in performing several large-scale proteogenomic analyses and characterizing molecular features of XWLCs. Datasets from this study will be a valuable resource to further explore the etiology and therapeutic strategies of air-pollution-associated lung cancers, particularly for XWLC.

      Weaknesses:

      (1) In analyzing and interpreting the datasets, the authors should provide more detailed descriptions of (i) data processing, (ii) the justification for the choice of analysis methods, and (iii) the justification for focusing on a few target genes/proteins in the datasets for further validation in the main text.

      (2) Importantly, although large datasets are provided, validation of key findings is minimal, and surprisingly there is no interrogation of XWLC drug response/efficacy based on the findings, which makes this manuscript descriptive and incomplete rather than conclusive. For example, testing the response of EGFR G719X-mutated XWLC tumors to afatinib combined with other drugs targeting activated kinases would be one way to validate the datasets and new therapeutic options.

      (3) The authors found that MAD1 and TPRN are novel therapeutic targets in XWLC. Are these two genes more frequently mutated in one subtype than in the other 3 XWLC subtypes? How could these mutations be targeted in patients?

      (4) In Figures 2a and b: while Figure 2a shows distinct genomic mutations among each LC cohort, Figure 2b shows similarity in affected oncogenic pathways (cell cycle, Hippo, NOTCH, PI3K, RTK-RAS, and WNT) between XWLC and TNLC/CNLC. Considering that different genomic mutations could converge into common pathways and biological processes, wouldn't these results indicate commonalities among XWLC, TNLC, and CNLC? How about other oncogenic pathways not shown in Figure 2b?

      (5) In Figure 2c, how and why were the four genes (EGFR, TP53, RBM10, KRAS) selected? What about other genes? In this regard, given tumor genome sequencing was done, it would be more informative to provide the oncoprints of XWLC, TSLC, TNLC, and CNLC for complete genomic alteration comparison.

      (6) Supplementary Table 11 shows the number of mutations at the interface and the length of the interface for a given protein-protein interaction pair. However, it does not indicate which mutation(s) in a given PPI interface are found in each LC cohort. For example, it fails to indicate whether the MAD1 R558H and TPRN H550Q mutations are found at significant frequency in each LC cohort.

      (7) Figures 7c and d show simulation data, not data from an actual binding assay. The authors should perform a biochemical binding assay with the proteins, or otherwise show that the mutation significantly alters the interaction, to support the conclusion.

    4. Reviewer #3 (Public Review):

      Summary:

      The manuscript from Zhang et al. utilizes a multi-omics approach to analyze lung adenocarcinoma cases in female never smokers from the Xuanwei area (XWLC cohort) compared with cases associated with smoking or other endogenous factors to identify mutational signatures and proteome changes in lung cancers associated with air pollution. Mutational signature analysis revealed a mutation hotspot, EGFR-G719X, potentially associated with BaP exposure, in 20% of the XWLC cohort. This correlated with predicted MAPK pathway activations and worse outcomes relative to other EGFR mutations. Multi-omics clustering, including RNA-seq, proteomics, and phosphoproteomics, identified 4 clusters within the XWLC cohort, with additional analysis of pathway activation, genetic differences, and radiomic features to investigate the clinical diagnostic and therapeutic potential of each subgroup. The study, which nicely combines multi-modal omics, presents potentially important findings that could inform clinicians with enhanced diagnosis and therapeutic strategies for more personalized or targeted treatments in lung adenocarcinoma associated with air pollution. The authors successfully identify four distinct clusters within the XWLC cohort, with distinct diagnostic characteristics and potential targets. However, many validating experiments must still be performed, and the data linking BaP exposure to XWLC subtypes are suggestive but insufficient to conclusively support this claim. Thus, while the manuscript presents important findings with the potential for significant clinical impact, the data presented are incomplete in supporting some of the claims and would benefit from validation experiments.

      Strengths:

      Integration of omics data from multiple modalities is a tremendous strength of the manuscript, allowing for cross-modal comparison/validation of results, functional pathway analysis, and a wealth of data to identify clinically relevant case clusters at the transcriptomic, translational, and post-translational levels. The inclusion of phosphoproteomics is an additional strength, as the functional and therefore biologically relevant actions of many pathways center on the activation of proteins and effectors via kinase and phosphatase activity, without necessarily altering the expression of the genes or proteins.

      Clustering analysis provides clinically relevant information with strong therapeutic potential both from a diagnostic and treatment perspective. This is bolstered by the individual microbiota, radiographic, wound healing, outcomes, and other functional analyses to further characterize these distinct subtypes.

      Visually the figures are well-designed and presented and for the most part easy to follow. Summary figures/histograms of proteogenomic data, and specifically highlighted genes/proteins are well presented.

      Molecular dynamics simulations and 3D binding analysis are nice additions.

      While I don't necessarily agree with the authors' interpretation of the microbiota data, the experiment and results are very interesting, and clustering information can be gleaned from this data.

      Weaknesses:

      Statistical methods for assessing significance may not always be appropriate.

      Necessary validating experiments are lacking for some of the major conclusions of the paper.

      Many of the conclusions are based on correlative or suggestive results, and the data are not always substantial enough to support them.

      Experimental design is not always appropriate, sometimes lacking necessary controls or showing large disparities in sample sizes.

      Conclusions are sometimes overstated without validating measures, such as in BaP exposure association with the identified hotspot, kinase activation analysis, or the EMT function.

    1. eLife assessment

      This valuable work provides novel insights into the substrate binding mechanism of a tripartite ATP-independent periplasmic (TRAP) transporter. The structural analysis is convincing, but evidence to support some of the conclusions regarding the mechanism is incomplete. This study will be of interest to the membrane transport and bacterial biochemistry communities.

    2. Reviewer #1 (Public Review):

      Summary:

      This manuscript reports the substrate-bound structure of SiaQM from F. nucleatum, which is the membrane component of a Neu5Ac-specific Tripartite ATP-independent Periplasmic (TRAP) transporter. Until recently, there was no experimentally derived structural information regarding the membrane components of the TRAP transporter, limiting our understanding of the transport mechanism. Since 2022, there have been 3 different studies reporting the structures of the membrane components of Neu5Ac-specific TRAP transporters. While it was possible to narrow down the binding site location by comparing the structures to proteins of the same fold, a structure with substrate bound has been missing. In this work, the authors report the Na+-bound state and the Na+ plus Neu5Ac state of FnSiaQM, revealing information regarding substrate coordination. In previous studies, 2 Na+ ion sites were identified. Here, the authors also tentatively assign a 3rd Na+ site. The authors reconstitute the transporter to assess the effects of mutating the binding site residues they identified in their structures. Of the 2 positions tested, only one of them appears to be critical to substrate binding.

      Strengths:

      The main strength of this work is the capture of the substrate-bound state of SiaQM, which provides insight into an important part of the transport cycle.

      Weaknesses:

      The main weakness is the lack of experimental validation of the structural findings. The authors identified the Neu5Ac binding site, but only tested 2 residues for their involvement in substrate interactions, which was very limited. The authors tentatively identified a 3rd Na+ binding site, which if true would be an impactful finding, but this site was not tested for its contribution to Na+ dependent transport, and the authors themselves report that the structural evidence is not wholly convincing. This lack of experimental validation undermines the confidence of the findings. However, the reporting of these new data is important as it will facilitate follow-up studies by the authors or other researchers.

    3. Reviewer #2 (Public Review):

      In this exciting new paper from the Ramaswamy group at Purdue, the authors provide a new structure of the membrane domains of a tripartite ATP-independent periplasmic (TRAP) transporter for the important sugar acid, N-acetylneuraminic acid or sialic acid (Neu5Ac). While there have been a number of other structures in the last couple of years (the first for any TRAP-T) this is the first to trap the structure with Neu5Ac bound to the membrane domains. This is an important breakthrough as in this system the ligand is delivered by a substrate-binding protein (SBP), in this case, called SiaP, where Neu5Ac binding is well studied but the 'hand over' to the membrane component is not clear. The structure of the membrane domains, SiaQM, revealed strong similarities to other SBP-independent Na+-dependent carriers that use an elevator mechanism and have defined Na+ and ligand binding sites. Here they solve the cryo-EM structure of the protein from the bacterial oral pathogen Fusobacterium nucleatum and identify a potential third (and theoretically predicted) Na+ binding site but also locate for the first time the Neu5Ac binding site. While this sits in a region of the protein that one might expect it to sit, based on comparison to other transporters like VcINDY, it provides the first molecular details of the binding site architecture and identifies a key role for Ser300 in the transport process, which their structure suggests coordinates the carboxylate group of Neu5Ac. The work also uses biochemical methods to confirm the transporter from F. nucleatum is active and similar to those used by selected other human and animal pathogens and now provides a framework for the design of inhibitors of these systems.

      The strengths of the paper lie in the locating of Neu5Ac bound to SiaQM, providing important new information on how TRAP transporters function. The complementary biochemical analysis also confirms that this is not an atypical system and that the results are likely true for all sialic acid-specific TRAP systems.

      The main weakness is the lack of follow-up on the identified binding site in terms of structure-function analysis. While Ser300 is shown to be important, only one other residue is mutated and a much more extensive analysis of the newly identified binding site would have been useful.

    4. Reviewer #3 (Public Review):

      The manuscript by Goyal et al. reports substrate-bound and substrate-free structures of a tripartite ATP-independent periplasmic (TRAP) transporter from F. nucleatum, a previously uncharacterized homolog. This is one of the most mechanistically fascinating transporter families, by means of its QM domain (the domain reported in this manuscript) operating as a monomeric 'elevator', and its P domain functioning as a substrate-binding 'operator' that is required to deliver the substrate to the QM domain; together, this is termed an 'elevator with an operator' mechanism. Remarkably, previous structures had not demonstrated the substrate Neu5Ac bound. In addition, the authors confirm the previously reported Na+ binding sites and report a new metal binding site in the transporter, which seems to be mechanistically relevant. Finally, they mutate the substrate binding site and use proteoliposomal uptake assays to show the mechanistic relevance of the proposed substrate binding residues.

      The structures are of good quality, the functional data is robust, the text is well-written, and the authors are appropriately careful with their interpretations. Determination of a substrate-bound structure is an important achievement and fills an important gap in the 'elevator with an operator' mechanism. Nevertheless, I have concerns with the data presentation, which in its current state does not intuitively demonstrate the discussed findings. Furthermore, the structural analysis appears limited, and even slight improvements in data processing and resulting resolution would greatly improve the authors' claims. I have several suggestions to hopefully improve the clarity and quality of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This structural and biochemical study of the mouse homolog of acidic mammalian chitinase (AMCase) enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments. The methods and analysis of data are solid, providing several lines of evidence to support the development of mechanistic hypotheses. While the findings and interpretation will be valuable to those studying AMCase in mice, the broader significance, including extension of the results to other species including human, remains unclear.

      Public Reviews:

      Reviewer #1 (Public Review):

      General comments:

      This paper investigates the pH-specific enzymatic activity of mouse acidic mammalian chitinase (AMCase) and aims to elucidate its function's underlying mechanisms. The authors employ a comprehensive approach, including hydrolysis assays, X-ray crystallography, theoretical calculations of pKa values, and molecular dynamics simulations to observe the behavior of mouse AMCase and explore the structural features influencing its pH-dependent activity.

      The study's key findings include determining kinetic parameters (Kcat and Km) under a broad range of pH conditions, spanning from strong acid to neutral. The results reveal pH-dependent changes in enzymatic activity, suggesting that mouse AMCase employs different mechanisms for protonation of the catalytic glutamic acid residue and the neighboring two aspartic acids at the catalytic motif under distinct pH conditions.
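
      As background for the kinetic parameters mentioned above, the relationship between kcat, Km, and the catalytic efficiency kcat/Km under the standard Michaelis-Menten model can be sketched as follows. This is an illustrative sketch only; the function names and numerical values are hypothetical and are not the authors' data or analysis code.

```python
def michaelis_menten_rate(substrate, kcat, km, enzyme):
    """Initial rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat / Km."""
    return kcat / km


# Hypothetical parameters at a single pH (arbitrary but consistent units).
kcat, km, enzyme = 10.0, 2.0, 1.0

# When [S] equals Km, the rate is half of the maximal rate kcat * [E].
half_max_rate = michaelis_menten_rate(km, kcat, km, enzyme)
efficiency = catalytic_efficiency(kcat, km)
print(half_max_rate, efficiency)
```

      Repeating such a fit at each pH gives one kcat, one Km, and one kcat/Km per condition, which is the kind of pH profile the review refers to.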

      The novelty of this research lies in the observation of structural rearrangements and the identification of pH-dependent mechanisms in mouse AMCase, offering a unique perspective on its enzymatic activity compared to other enzymes. By investigating the distinct protonation mechanisms and their relationship to pH, the authors reveal the adaptive nature of mouse AMCase, highlighting its ability to adjust its catalytic behavior in response to varying pH conditions. These insights contribute to our understanding of the pH-specific enzymatic activity of mouse AMCase and provide valuable information about its adaptation to different physiological conditions.

      Overall, the study enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments.

      Reviewer #2 (Public Review):

      Summary:

      In this study of the mouse homolog of acidic mammalian chitinase, the overall goal is to provide a mechanistic explanation for the unusual observation of two pH optima for the enzyme. The study includes biochemical assays to establish kinetic parameters at different solution pH, structural studies of enzyme/substrate complexes, and theoretical analysis of amino acid side chain pKas and molecular dynamics.

      Strengths:

      The biochemical assays are rigorous and nicely complemented by the structural and computational analysis. The mechanistic proposal that results from the study is well rationalized by the observations in the study.

      Weaknesses:

      The overall significance of the work could be made more clear. Additional details could be provided about the limitations of prior biochemical studies of mAMC that warranted the kinetic analysis. The mouse enzyme seems unique in terms of its behavior at high and low pH, so it remains unclear how the work will enhance broader understanding of this enzyme class. It was also not clear whether the findings can be used for therapeutic purposes, as detailed in the abstract, if the human enzyme works differently.

      We have edited the paper to address these concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) Regarding the pH profiles of mouse AMCase, previous studies have reported its activity at pH 2.0 and within the pH range of 3-7. In this paper, the authors conducted kinetic measurements and showed that pH 6.5 is optimal for kcat/Km. The authors emphasize the significance of mouse AMCase's activity in the neutral region, particularly at pH 6.5, for understanding its physiological relevance in humans. To provide a comprehensive overview, it would be valuable for the authors to summarize the findings from previous and current studies, discuss their implications for future pulmonary therapy in humans, and cite relevant literature. Additionally, the authors should highlight their research's specific contributions and novel findings, such as the determination of kinetic parameters (Kcat and Km) under different pH conditions. Emphasizing why previous studies may have required these observations and underscoring the importance of the present findings in addressing those knowledge gaps will help readers understand the significance of the study and its impact on the field of enzymology.

      We thank the reviewer for this comment. In keeping with the knowledge gaps addressed directly by this paper, we have not augmented the discussion of future pulmonary therapy in humans. We have summarized the present findings at the end of the introduction as follows:

      “We measured the mAMCase hydrolysis of chitin, which revealed significant activity increase under more acidic conditions compared to neutral or basic conditions. To understand the relationship between catalytic residue protonation state and pH-dependent enzyme activity, we calculated the theoretical pKa of the active site residues and performed molecular dynamics (MD) simulations of mAMCase at various pHs. We also directly observed conformational and chemical features of mAMCase between pH 4.74 to 5.60 by solving X-ray crystal structures of mAMCase in complex with oligomeric GlcNAcn across this range.”

      (2) Regarding the implications of the pKa values and Asp138 orientation for the pH optima, it would be valuable for the authors to discuss the variations in optimal activity by pH among GH-18 chitinases and investigate the underlying factors contributing to these differences. In particular, exploring the role of Asp138 orientation in chitotriosidase, another mammalian chitinase, would provide important insights. Chitotriosidase is known to be inactive at pH 2.0, and it would be interesting to investigate whether the observed orientation of Asp138 towards Glu140 in mouse AMCase for pH 2.0 activity is lacking in chitotriosidase.

      There are similar rotations of the two acidic residues in the literature on Chit1. The variety of crystal pH conditions and the lack of a straightforward mechanism for pKa shifts in AMCase make it difficult to draw a comparison to why Chit1 is inactive at low pH, but this is an interesting area for future study. For a fuller discussion, see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2760363/

      Furthermore, considering the lower activity of human AMCase at pH 2.0, it would be worthwhile to examine whether the Asp138 orientation towards Glu140, as observed in mouse AMCase, is also absent in human AMCase. Exploring this aspect will help determine if the orientation of Asp138 plays a critical role in pH-dependent activity in human AMCase.

      The situation for hAMCase is similar to Chit1, as the rotations observed here for mAMCase are also present. The issue is not whether Asp138 can rotate, but rather the relevant energetic penalties, as we discuss in the manuscript.

      (3) In a previous study by Okawa et al. (Loss and gain of human acidic mammalian chitinase activity by nonsynonymous SNPs. Mol Biol Evol 33, 3183-3193, 2016), it was reported that specific amino acid substitutions (N45D, D47N, and R61M) encoded by nonsynonymous single nucleotide polymorphisms (nsSNPs) in the N-terminal region of human AMCase had distinct effects on its chitinolytic activity. Introducing these three residues (N45D, D47N, and R61M) could activate human AMCase. This activation significantly shifted the optimal pH from 4-5 to 2.0.

      Considering the significant impact of these amino acid substitutions on the pH-dependent activity of human AMCase, the authors should discuss this point in the manuscript's discussion section. Incorporating the findings and relating them to the current study's observations on pH optima and Asp138 orientation can provide a comprehensive understanding of the factors influencing pH-dependent activity in AMCase.

      We added a citation and discuss how the mutations identified by this study could potentially shift the pKa of key catalytic residues:

      “Okawa et al identified how primate AMCase lost activity by integration of specific, potentially pKa-shifting, mutations relative to the mouse counterpart42b.”

      (4) To further strengthen the discussion, the authors could explore the ancestral insectivorous nature of placental mammals and the differences in chitinase activity between herbivorous and omnivorous species. Incorporating these aspects would add depth and relevance to the overall discussion of AMCase. AMCase is an enzyme known for its role in digesting insect chitin in the stomachs of various insectivorous and omnivorous animals, including bats, mice, chickens, pigs, pangolins, common marmosets, and crab-eating monkeys (refs. 1-7). However, in certain animals, such as dogs (carnivores) and cattle (herbivores), AMCase expression and activity are significantly low, leading to impaired chitin digestion (ref. 8). These observations suggest a connection between dietary habits and the expression and activity of the AMCase gene, ultimately influencing chitin digestibility across different animal species (ref. 8).

      (1) Strobel et al. (2013). Insectivorous bats digest chitin in the stomach using acidic mammalian chitinase. PLoS One 8, e72770.

      (2) Ohno et al. (2016). Acidic mammalian chitinase is a proteases-resistant glycosidase in mouse digestive system. Sci Rep 6, 37756.

      (3) Tabata et al. (2017). Gastric and intestinal proteases resistance of chicken acidic chitinase nominates chitin-containing organisms for alternative whole edible diets for poultry. Sci Rep 7, 6662.

      (4) Tabata et al. (2017). Protease resistance of porcine acidic mammalian chitinase under gastrointestinal conditions implies that chitin-containing organisms can be sustainable dietary resources. Sci Rep 7, 12963.

      (5) Ma et al. (2018). Acidic mammalian chitinase gene is highly expressed in the special oxyntic glands of Manis javanica. FEBS Open Bio 8, 1247-1255.

      (6) Tabata et al. (2019). High expression of acidic chitinase and chitin digestibility in the stomach of common marmoset (Callithrix jacchus), an insectivorous nonhuman primate. Sci Rep 9, 159.

      (7) Uehara et al. (2021). Robust chitinolytic activity of crab-eating monkey (Macaca fascicularis) acidic chitinase under a broad pH and temperature range. Sci Rep 11, 15470.

      (8) Tabata et al. (2018). Chitin digestibility is dependent on feeding behaviors, which determine acidic chitinase mRNA levels in mammalian and poultry stomachs. Sci Rep 8, 1461.

      This overall point is covered by our brief discussion on diet differences:

      “However, hAMCase is likely too destabilized at low pH to observe an increase in _k_cat. hAMCase may be under less pressure to maintain high activity at low pH due to humans’ noninsect-based diet, which contains less chitin compared to other mammals with primarily insect-based diets42.”

      (5) It is important for the authors to clearly state the limitations of their simulations and emphasize the need for experimental validation or additional supporting evidence. This will provide transparency and enable readers to understand the boundaries of the study's findings. A comprehensive discussion of limitations would contribute to a more robust interpretation of the results.

      We added a sentence to the discussion:

      “Our simulations have important limitations that could be overcome by quantum mechanical simulations that allow for changes in protonation state and improved consideration of polarizability.”

      Minor comments:

      (1) Regarding the naming of AMCase, it is important to accurately describe it based on its acidic isoelectric point rather than its enzymatic activity under acidic conditions, following the original paper (Reference #14: Boot, R. G. et al. Identification of a novel acidic mammalian chitinase distinct from chitotriosidase. J. Biol. Chem. 276, 6770-6778 (2001)).

      We have made this modification

      (2) In the introduction, providing more context regarding the terminology of acidic mammalian chitinase (AMCase) would be beneficial. While AMCase was initially discovered in mice and humans, subsequent research has revealed its presence in various vertebrates, including birds, fish, and other species. Therefore, it would be appropriate to include the alternative enzyme name, Chia (chitinase, acidic), in the introduction to reflect its broader distribution across different organisms. This clarification would enhance the readers' understanding of the enzyme's taxonomy and facilitate further exploration of its functional significance in diverse biological systems.

      We have made this modification

      (3) The authors mention that AMCase is active in tissues with neutral pHs, such as the lung. However, it is important to consider that the pH in the lung is lower, around 5, due to the presence of dissolved CO2 that forms carbonic acid. The lung microenvironment is known to vary, and specific regions or conditions within the lung may have slightly different pH levels. By addressing the pH conditions in the lungs and their relationship to AMCase's activity, the authors can enhance our understanding of the enzyme's function within its physiological context. A thorough discussion of the specific pH conditions in the lung and their implications for AMCase's activity would provide valuable insights into the enzyme's role in lung pathophysiology.

      To keep the focus on the insights we have made, we have elected not to expand this discussion.

      (4) It would be helpful for the authors to provide more information about the substrate or products of AMCase. The basic X-ray crystal structures used in this study are GlcNAc2 or GlcNAc3, known products of AMCase. Including details about the specific ligands involved in the enzymatic reactions would enhance the understanding of the study's focus.

      We are unclear about what this means - and since it is a minor comment, we have elected not to change the discussion of substrates here.

      (5) The authors should critically evaluate the inclusion of the term "chitin-binding" in the Abstract and Introduction. Suppose substantial evidence or discussion regarding the specific chitin-binding properties of the enzyme or its relevance to the immune response needs to be included. In that case, removing or modifying that statement might be appropriate.

      We are unclear about what this means - and since it is a minor comment, we have elected not to change the discussion of “chitin-binding” here.

      (6) The authors developed an endpoint assay to measure the activity of mouse AMCase across a broad pH range, allowing for direct measurement of kinetic parameters. The authors should provide a more detailed description of the methods used, including any specific modifications made to the previous assay, to ensure reproducibility and facilitate further research in the field. It is important to clearly show the novelty of their endpoint assay compared to previous methods employed in other reports. The authors should also explain how their modified endpoint assay differs from existing assays and highlight its advancements or improvements. This will help readers understand the unique features and contributions of the assay in the context of previous methods.

      We have included a detailed method description and figures already. See also our previous paper by Barad which includes other, related, assays.

      (7) The authors suggest that mouse AMCase may be subject to product inhibition, potentially due to its transglycosylation activity, which can affect the Michaelis-Menten model predictions at high substrate concentrations. However, the reviewer needed help understanding the specific impact of transglycosylation on the kinetic parameters. It would be helpful for the authors to provide a more appropriate and detailed explanation, clarifying how transglycosylation activity influences the kinetic behavior of AMCase and its implications for the observed results.

      The experiments to conclusively demonstrate this are beyond our current capabilities.

      (8) In the Abstract, the authors state, "We also solved high resolution crystal structures of mAMCase in complex with chitin, where we identified extensive conformational ligand heterogeneity." This reviewer suggests replacing "chitin" with "oligomeric GlcNAcn" throughout the text, specifically about biochemical experiments. It is important to accurately describe the experimental conditions and ligands used in the study.

      We have made these changes throughout the manuscript

      (9) In the introduction, the authors mention "a polymer of β(1-4)-linked N-acetyl-D-glucosamine (GlcNAc)". In this case, the letter "N" should be italicized to conform to the proper notation for the monosaccharide abbreviation.

      Corrected (and hopefully this would have been caught by the copy editor!)

      (10) In the introduction, the authors state, "In the absence of AMCase, chitin accumulates in the airways, leading to epithelial stress, chronic activation of type 2 immunity, and age-related pulmonary fibrosis5,6". It is recommended to clarify that "AMCase" refers to "acidic mammalian chitinase (AMCase)" in this context, as it is the first mention of the enzyme in the introduction.

      We moved that section so that it flows better and is introduced with the full name.

      (11) In the introduction, the authors state, "Mitigating the negative effects of high chitin levels is particularly important for mammalian lung and gastrointestinal health." This reviewer requests further clarification on the connection between chitin and gastrointestinal health. Please provide an explanation or reference to support this statement.

      We have modified this sentence to:

      “Chitin levels can be potentially important for mammalian lung and gastrointestinal health.”

      (12) In the introduction, the authors mention that "Acidic Mammalian Chitinase (AMCase) was originally discovered in the stomach and named for its high enzymatic activity under acidic conditions." It is recommended to include Reference #14 (Boot et al. J. Biol. Chem. 276, 6770-6778, 2001) as it provides the first report on mouse and human AMCase, contributing to the understanding of the enzyme.

      However, it is worth noting that while this paragraph primarily focuses on human tissues, Reference #14 primarily discusses mouse AMCase but also reports on human AMCase. Additionally, References #8 and #9 mainly discuss mouse AMCase. This creates confusion in the description of human and mouse AMCase within the paragraph.

      Considering that this paper aims to focus on the unique features of mouse AMCase, it is suggested that the authors provide a more specific and balanced description of both human and mouse AMCase throughout the main text.

      We have clarified the origin of the name AMCase and the results distinguish the two orthologs in the text with h or mAMCase.

      (13) Figure 1A in the Introduction section has been previously presented in several papers. The authors should consider moving this figure to the Results section and present an alternative figure based on their experimental results to enhance the novelty and impact of the study.

      We have considered this option, but prefer the original placement.

      (14) In the Results section, the authors mentioned, "Prior studies have focused on relative mAMCase activity at different pH18,20, limiting the ability to define its enzymological properties precisely and quantitatively across conditions of interest." It would be beneficial for the authors to include reference #14, the first report showing the pH profile of mouse AMCase, to support their statement.

      We have added this reference

      (15) Regarding the statement, "To overcome the pH-dependent fluorescent properties of 4MU-chitobioside, we reverted the assay into an endpoint assay, which allowed us to measure substrate breakdown across different pH (Supplemental Figure 1A)", the authors should provide a more detailed description of the improvements made to measure AMCase activity. Additionally, it would be helpful to include a thorough explanation of the figure legend for Supplementary Figure 1A to provide clarity to readers.

      We have included a detailed method description and figures already. See also our previous paper by Barad which includes other, related, assays.

      (16) Figure 1B shows that the authors used the AMCase catalytic domain. It would benefit the authors to explain the rationale behind this choice in the figure legend or the main text.

      This point is addressed in the text:

      “Previous structural studies on AMCase have focused on interactions between inhibitors like methylallosamidin and the catalytic domain of the protein.”

      (17) For Figures 1C-E, it is recommended that the authors include error bars in their results to represent the variability or uncertainty of the data. In Figure 1E, the authors should clarify the units of the Y-axis (e.g., sec-1 µM-1). Additionally, in Figure 1F, the authors should explain how the catalytic acidity is shown.

      We have added error bars and axis labels. Figure 1F is conceptual, so we are leaving it as is.

      (18) The authors stated, "These observations raise the possibility that mAMCase, unlike other AMCase homologs, may have evolved an unusual mechanism to accommodate multiple physiological conditions." It would be helpful for the authors to compare and discuss the pH-dependent AMCase activity of mouse AMCase with other AMCase homologs to support this statement.

      That is an excellent idea for future comparative studies, but beyond the scope of what we are examining in this paper.

      (19) The authors should explain Supplemental Figures 1B and C in the Results or Methods sections to provide context for these figures.

      We are unclear about what this means - and since it is a minor comment, we have elected not to change these sections.

      (20) Supplemental Figure 3 is missing any description. It would be important for the authors to include a mention of this figure in the main text before Supplemental Figure 4 to guide the readers.

      The full legend is in there now and the reference to Supplemental 4 was mislabeled.

      (21) For Supplemental Figure 4, the authors should explain the shape of the symbol used in the figure. Additionally, they should explain "apo" and "holoenzyme" in the context of this figure.

      Unclear what a shape means in this context - perhaps the confusion arises because these are violin plots showing distributions.

      (22) Table 1 requires a more detailed explanation of its contents. Additionally, Tables 2 and 3 need to be included. The authors should include these missing tables in the revised version and explain their contents appropriately.

      Table 1 is the standard crystallographic table - there isn’t much more detailed explanation that can be offered. Tables 2 and 3 were not transferred properly by BioRxiv but were included in the review packet as requested a day after submission.

      (23) In Figure 4, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers. Additionally, it is recommended to use D136, D138, and E140 instead of D1, D2, and E to label the respective parts. The authors should also explain the meaning of the symbol used in the figure.

      Since it is a minor comment, we have elected not to change these figures.

      (24) In Figure 5, it would be beneficial to enlarge Panels A-C to improve the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      (25) Similarly, in Figure 6, all panels should be enlarged to enhance the ease of comprehension for readers.

      Since it is a minor comment, we have elected not to change these figures.

      Reviewer #2 (Recommendations For The Authors):

      In general, I did not identify many detailed or technical concerns with the work. A few items for the authors to consider are listed below.

      (1) The interpretation of the crystallographic datasets seems complicated by the heterogeneity in the substrate component. It might be nice to see more critical analysis of the approach here. Are there other explanations or possible models that were considered? Do other structures of chitinases or other polysaccharide hydrolases exhibit the same phenomenon?

      We have tried in writing it to provide a very critical approach to this and it is quite likely that other structures contain unmodeled density containing similar heterogeneity (but it is just unmodeled).

      (2) It would be ideal to include more experimental validation of the proposed mechanism. Much of the manuscript includes theoretical validations (pKa estimation, dynamics, etc) - but it would be optimal to make an enzyme variant or do an experiment with a substrate analog.

      Yes - we agree that follow-on experiments are needed to fully test the mechanism and that those will be the subject of future work.

      (3) For an uninitiated reviewer, I think the major issue with this study is that the broader significance of the work and how it fits into the context of other work on these enzymes is not clear. It would be helpful to be more specific about what we know of mechanism from work on other enzymes to help the reader understand the motivation for this study.

      We have added a few additional references, guided by Reviewer 1's comments, that should help in this respect.

    2. eLife assessment

      This structural and biochemical study of the mouse homolog of acidic mammalian chitinase (AMCase) enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase, and it sheds light on its adaptation to different physiological pH environments. The methods and analysis of data are solid, providing several lines of evidence to support the development of mechanistic hypotheses. While the findings and interpretation will be valuable to those studying AMCase in mice, the broader significance, including extension of the results to other species including humans, remains less clear.

    3. Reviewer #1 (Public Review):

      General comments:

      This paper investigates the pH-specific enzymatic activity of mouse acidic mammalian chitinase (AMCase) and aims to elucidate its function's underlying mechanisms. The authors employ a comprehensive approach, including hydrolysis assays, X-ray crystallography, theoretical calculations of pKa values, and molecular dynamics simulations to observe the behavior of mouse AMCase and explore the structural features influencing its pH-dependent activity.

      The study's key findings include determining kinetic parameters (Kcat and Km) under a broad range of pH conditions, spanning from strong acid to neutral. The results reveal pH-dependent changes in enzymatic activity, suggesting that mouse AMCase employs different mechanisms for protonation of the catalytic glutamic acid residue and the neighboring two aspartic acids at the catalytic motif under distinct pH conditions.

      The novelty of this research lies in the observation of structural rearrangements and the identification of pH-dependent mechanisms in mouse AMCase, offering a unique perspective on its enzymatic activity compared to other enzymes. By investigating the distinct protonation mechanisms and their relationship to pH, the authors reveal the adaptive nature of mouse AMCase, highlighting its ability to adjust its catalytic behavior in response to varying pH conditions. These insights contribute to our understanding of the pH-specific enzymatic activity of mouse AMCase and provide valuable information about its adaptation to different physiological conditions.

      Overall, the study enhances our understanding of the pH-dependent activity and catalytic properties of mouse AMCase and sheds light on its adaptation to different physiological pH environments.

      Comments on revised version:

      In their revised manuscript, the authors have made significant efforts to address the reviewers' comments.

    1. eLife assessment

      This important manuscript presents several structures of the Kv1.2 voltage-gated potassium channel, based on state-of-the-art cryoEM techniques and algorithms. The authors present solid evidence for structures of DTX-bound Kv1.2 and of Kv1.2 in potassium-free solution (with presumably sodium ions bound within the selectivity filter). These structures advance our knowledge of the molecular basis of the channel inactivation process.

    2. Reviewer #1 (Public Review):

      In this manuscript by Wu et al., the authors present the high resolution cryoEM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure the authors have solved several structures of mutants or experimental conditions relevant to the slow inactivation process that these channels undergo and which is not yet completely understood.

      One of the main findings is the determination of the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results in similar mutants in different channels from Kv1.2 that indicate that inactivation is associated with an enlarged selectivity filter.

      Another interesting structure is the complex of Kv1.2 with the pore blocking toxin Dendrotoxin 1. The results shown in the revised version indicate that the mechanism of block is similar to that of related blocking-toxins, in which a lysine residue penetrates in the pore. Surprisingly, in these new structures, the bound toxin results in a pore with empty external potassium binding sites.

      The quality of the structural data presented in this revised manuscript is very high and allows for unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. In the revised version, the authors have addressed my previous specific comments, which are appended below.

      (1) In the main text's reference to Figure 2d residues W18' and S22' are mentioned but are not labeled in the insets.

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but very reduced open times and open probability at not very high voltages.

      (3) The structures of WT in the absence of K+ show a narrower selectivity filter; however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is constructed at such an angle that it looks as if the carbonyl distances are increased; perhaps this should be fixed. Also, it is not clear how the distances between carbonyls given in the text on page 12 are measured. Is it between adjacent or kitty-corner subunits?

      (4) It would be really interesting to know the authors' opinion on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate a local conformational change is the trigger for the main twisting events proposed here.

    3. Reviewer #2 (Public Review):

      Cryo-EM structures of the Kv1.2 channel in the open, inactivated, toxin complex and in Na+ are reported. The structures of the open and inactivated channels are merely confirmatory of previous reports. The structures of the dendrotoxin-bound Kv1.2 and the channel in Na+ are new findings that will be of interest to the general channel community.

      Review of the resubmission:

      I thank the authors for making the changes in their manuscript as suggested in the previous review. The changes in the figures and the additions to the text do improve the manuscript. The new findings from a further analysis of the toxin channel complex are welcome information on the mode of the binding of dendrotoxin.

      A few minor concerns:

      (1) Line 93-96, 352: I am not sure what the authors are referring to when they say NaK2P. It is either NaK or NaK2K. I don't think that it has been shown in the reference suggested that either of these channels changes conformation based on the K+ concentration. Please check if there is a mistake and that the Nichols et al. reference is what is being referred to.

      (2) Line 365: In the study by Cabral et al., Rb+ ions were observed by crystallography in the S1, S3 and S4 sites, not the S2 site. Please correct.

    4. Reviewer #3 (Public Review):

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a plethora of structural work, and the authors are commended on the breadth of the studies. The structural studies are well-executed. Although the findings are mostly confirmatory, they do add to the body of work on this and related channels. Notably, the authors present structures of DTx-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (which may contain sodium ions bound within the selectivity filter). These two structures add considerable new information. The DTx structure has been markedly improved in the revised version and the authors arrive at well-founded conclusions regarding its mechanism of block. Regarding the Na+ structure, the authors claim that the structure with sodium has "zero" potassium - I caution them against making this claim. It is likely that some K+ persists in their sample and that some of the density in the "zero potassium" structure may be due to K+ rather than Na+. This can be clarified by revisions to the text and discussion. I do not think that any additional experiments are needed. Overall, the manuscript is well-written, a nice addition to the field, and a crowning achievement for the Sigworth lab.

      Most of this reviewer's initial comments have been addressed in the revised manuscript. Some comments remain that could be addressed by revisions of the text.

      Specific comments on the revised version:

      Quotations indicate text in the manuscript.

      (1) "While the VSD helices in Kv1.2s and the inactivated Kv1.2s-W17'F superimpose very well at the top (including the S4-S5 interface described above), there is a general twist of the helix bundle that yields an overall rotation of about 3° at the bottom of the VSD."

      Comment: This seemed a bit confusing. I assume the authors aligned the complete structures - the differences they indicate seem to be slight VSD repositioning relative to the pore rather than differences between the VSD conformations themselves. The authors may wish to clarify. As they point out in the subsequent paragraph, the VSDs are known to be loosely associated with the pore.

      (2) Comment: The modeling of DTx into the density is a major improvement in the revision. Figure 3 displays some interactions between the toxin and Kv1.2 - additional side views of the toxin and the channel might allow the reader to appreciate the interactions more fully. The overall fit of the toxin structure into the density is somewhat difficult to assess from the figure. (The authors might consider using ChimeraX to display density and model in this figure.)

      (3) "We obtained the structure of Kv1.2s in a zero K+ solution, with all potassium replaced with sodium, and were surprised to find that it is little changed from the K+ bound structure, with an essentially identical selectivity filter conformation (Figure 4B and Figure 4-figure supplement 1)."

      Comment: It should be noted in the manuscript that K+ and Na+ ions cannot be distinguished by the cryo-EM studies - the densities are indistinguishable. The authors are inferring that the observed density corresponds to Na+ because the protein was exchanged from K+ into Na+ on a gel filtration (SEC) column. It is likely that a small amount of K+ remains in the protein sample following SEC. I caution the authors against claiming that there is zero K+ in solution without measuring the K+ content of the protein sample. Additionally, it should be considered that K+ may be present in the blotting paper used for cryo-EM grid preparation (our laboratory has noted, for example, a substantial amount of Ca2+ in blotting paper). The affinity of Kv1.2 for K+ has not been determined, to my knowledge - the authors note in the Discussion that the Shaker channel has "tight" binding for K+. It seems possible that some portion of the density in the selectivity filter could be due to residual K+. This caveat should be clearly stated in the main text and discussion. More extensive exchange into Na+, such as performing the entire protein purification in NaCl, or by dialysis (as performed for obtaining the structure of KcsA in low K+ by Y. Zhou et al. & MacKinnon 2001), would provide more convincing removal of K+, but I suspect that the Kv1.2 protein would not have sufficient biochemical stability without K+ to endure this treatment. One might argue that reduced biochemical stability in NaCl could be an indication that there was a meaningful amount of K+ in the final sample used for cryo-EM (or in the particles that were selected to yield the final high-resolution structure).

      (4) Referring to the structure obtained in NaCl: "The ion occupancy is also similar, and we presume that Kv1.2 is a conducting channel in sodium solution."

      Comment: Stating that "Kv1.2 is a conducting channel in sodium solution" and implying that conduction of Na+ is achieved by an analogous distribution of ion binding sites as observed for K+ are strong statements to make - and not justified by the experiments provided. Electrophysiology would be required to demonstrate that the channel conducts sodium in the absence of K+. More complete ionic exchange, better control of the ionic conditions (Na+ vs K+), and affinity measurements for K+ would be needed to determine the distribution of Na+ in the filter (as mentioned above). At minimum, the authors should revise and clarify the intended meaning of the statement "we presume that Kv1.2 is a conducting channel in sodium solution". As mentioned above, it seems possible/likely that a portion of the density in the filter may be due to K+.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript by Wu et al., the authors present the high-resolution cryoEM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure, the authors have solved several structures of mutants or experimental conditions relevant to the slow inactivation process that these channels undergo and which is not yet completely understood. 

      One of the main findings is the determination of the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results in similar mutants in different channels from Kv1.2 that indicate that inactivation is associated with an enlarged selectivity filter. 

      Another interesting structure is the complex of Kv1.2 with the pore-blocking toxin Dendrotoxin 1. The results show that the mechanism of the block is different from similar toxins, in which a lysine residue penetrates the pore deep enough to empty most external potassium binding sites. 

      The quality of the structural data presented in this manuscript is very high and allows for the unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. Specific comments are appended below. 

      (1) In the main text's reference to Figure 2d residues W18' and S22' are mentioned but are not labeled in the insets. 

      Now labeled in Fig. 2D

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but very reduced open times and open probability at not very high voltages. 

      Addressed in the Discussion, lines 480-490.

      (3) The structures of WT in the absence of K+ show a narrower selectivity filter, however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is constructed at such an angle that it looks as if the carbonyl distances are increased, perhaps this should be fixed. Also, it is not clear how the distances between carbonyls given in the text on page 12 are measured. Is it between adjacent or kitty-corner subunits? 

      We decided to remove mention of carbonyl distances, because at our resolutions the atoms are not resolved.

      (4) It would be really interesting to know the authors' opinions on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate a local conformational change is the trigger for the main twisting events proposed here. 

      We cite Sauer et al. (2011) for the idea that the intact selectivity filter is a strained conformation, and its relaxation yields the wide vestibule seen in NaK2K and Kv channels.  Lines 434-439.

      Reviewer #2 (Public Review): 

      There are four Kv1.2 channel structures reported: the open state, the C-type inactivated state, a dendrotoxin-bound state, and a structure in Na+. 

      A high-resolution crystal structure of the open state for a chimeric Kv1.2 channel was reported in 2007 and there is no new information provided by the cryoEM structure reported in this study. 

      The cryo-EM structure of the C-type inactivated state of the Kv1.2 channel was determined for a channel with the W to F substitution in the pore helix. A cryo-EM structure of the Shaker channel and a crystal structure of a chimeric Kv1.2 channel with an equivalent W to F mutation were reported in 2022. Cryo-EM structures of the C-type inactivated Kv1.3 channel are also available. All these previous structures have provided a relatively consistent structural view of the C-type inactivated state and there is no significant new information that is provided by the structure reported in this study. 

      A structure of the Kv1.2 channel blocked by dendrotoxin is reported. A crystal structure of charybdotoxin and the chimeric Kv1.2 channel was reported in 2013. Density for dendrotoxin could not be clearly resolved due to symmetry issues and so the definitive information from the structure is that dendrotoxin binds, similarly to charybdotoxin, at the mouth of the pore. A potential new finding is that there is a deeper penetration of the blocking Lys residue in dendrotoxin compared to charybdotoxin. It will however be necessary to use approaches to break the symmetry and resolve the electron density for the dendrotoxin molecule to support this claim and to make this structure significant.  

      We have now succeeded in breaking the symmetry and present in Fig. 3 a C1 structure of the toxin-channel complex. In the improved map we now see that our previous conclusion was wrong: the penetration of Lys5 cannot be much deeper than that seen in CTx and ShK structures. However, for reasons we do not yet understand, the pattern of ion-site occupancies in the blocked state is different in this structure than in the others. Fig. 3, Fig. 4E; text lines 559-568.

      The final structure reported is the structure of the Kv1.2 channel in K+ free conditions and with Na+ present. The structure of the KcsA channel by the MacKinnon group in 2001 showed a constricted filter and since then it has been falsely assumed by the K channel community that the lowering of K concentration leads to a constriction of the selectivity filter. There have been structural studies on the MthK and the NaK2K channels showing a lack of constriction in the selectivity filter in the absence of K+. These results have been generally ignored and the misconception of filter constriction/collapse in the absence of K+ still persists. The structure of the Kv1.2 channel in Na+ provided a clear example that loss of K+ does not necessarily lead to filter constriction.

      We are grateful to the reviewer for pointing out this serious omission. We now cite other work including from the Y. Jiang and C. Nichols labs showing examples of outer pore expansion and destabilization. P. 4, lines 90-104; lines 421-439.

      The structure in Na+ is significant while the other structures are either merely reproductions of previous reports or are not resolved well enough to make any substantial claims. 

      We now state more clearly the confirmatory nature of our Kv1.2 open structure (lines 71-74) and the similarities of the inactivated-channel structures (lines 193-196).

      Reviewer #3 (Public Review): 

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a large quantity of structural work on the Kv1.2 channel, and the authors should be commended on the breadth of the studies. The structural studies seem well-executed (this is hard to fully evaluate because the current manuscript is missing a data collection and refinement statistics table). The findings are mostly confirmatory, but they do add to the body of work on this and related channels. Notably, the authors present structures of DTX-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (with presumably sodium ions bound within the selectivity filter). These two structures add new information, but the studies seem somewhat underdeveloped - they would be strengthened by accompanying functional studies and further structural analyses. Overall, the manuscript is well-written and a nice addition to the field.

      The data collection and refinement table has been added (Fig. 4 supplement 3).

      We agree and regret the lack of functional studies. We have not been able to carry them out because work in our laboratory is winding down and the lab soon will be closing.

      Recommendations for the authors: 

      Reviewer #2 (Recommendations For The Authors): 

      (1) It is not obvious from the data shown how well the side chain positions in the inactivated state are defined by the electron density. These figures should be redone. Maybe the use of stereo would be useful. This will be particularly useful for the reader to decide if the small changes in, for example, the positioning of the carbonyl oxygens are believable. 

      Figure 2 – figure supplement 4 shows the stereo views.

      (2) The authors note the changes observed (though small) in the VSD which were not observed in other structures. The relevance of this observation is not described. Do these changes arise due to the different environments of detergents versus nanodisc etc. in the different structures?

      We’ve now inserted a note about the variety of environments and how this might be a cause of the difference: lines 280-285.

      Are there changes in the pore-VSD interface in the inactivated and the open channel structures and if yes, then do mutations at these residues affect inactivation?

      There is surprisingly little movement at the S4-S5 interface residues identified by Bassetto et al. (2022) as having effects on inactivation. Lines 262-267.

      (3) For the structures in Na+, it is important to provide analytical data showing the biochemical behavior of the channel. This is also true for the wild type and the W to F mutant channel. Size exclusion profiles should be included. 

      The SEC profile (noisy, but showing a clear peak) of the channel in Na+ is now shown in Fig. 4 supplement 1. Low expression of the W366F mutant produced even worse SEC results, but we include a representative micrograph of W366F in Na+ to show the monodisperse protein prep (Figure 5 – figure supplement 1).

      Reviewer #3 (Recommendations For The Authors): 

      Portions of text from the manuscript are indicated by quotations. 

      Introduction: "One goal of the current study was to examine the structure of the native Kv1.2 channel." 

      Comment, minor points: The authors refer to the Kv1.2 construct used for the structural studies as "native Kv1.2". I found this somewhat confusing because the word "native" suggests derived from a native source. The phrasing above also gives the impression that the structure by Wu et al is the first structure of Kv1.2. The Kv1.2 construct is essentially identical to the one used by Long et al in 2005 to determine the initial structure of Kv1.2 (PDB 2A79). The authors discuss a subsequent paddle-chimera Kv1.2-2.1 structure from 2007 (PDB 2R9R) in the introduction, but it would be prudent to mention the 2005 one of Kv1.2 as well. The open structure determined by Wu et al. is an improvement on the 2A79 structure in that the 2A79 structure was modeled as a poly-alanine model within the voltage sensor domain. Nevertheless, the Kv1.2-2.1 structure (2R9R) is highly similar to the 2A79 structure of Kv1.2. The 2007 structure indicated that Kv1.2-2.1 recapitulates structural features of Kv1.2. It is therefore not surprising that the open structure presented here is highly similar to that of both PDB 2A79 (Kv1.2) and PDB 2R9R (Kv1.2-2.1).  

      We failed to point out the high quality of the original Long et al. 2005 structure and its comparisons with the chimeric structure in Long et al. 2007. We now have tried to correct this: lines 70-74.

      Comment: The cryo-EM analyses suggest that a large percentage (most?) of the particles are missing the beta subunit. This should be commented on somewhere.      

      Now noted on lines 120-132: we pooled particles with and without beta subunits.

      Regarding ions in the selectivity filter, one-dimensional plots of the density would strengthen the analysis.

      Now included in Fig. 4.

      Also, one should mention caveats associated with identifying ions in cryo-EM maps and the added difficulty/uncertainty when the density is located along a symmetry axis (C4 axis, due to the possible build-up of noise). C1 reconstructions, showing density within the filter, if possible, would strengthen the analyses.

      You are correct. However, local resolution is highest in the selectivity filter region. Since the CTF-based filtering is constant over the whole structure, I think the SNR will be good on axis.

      Comment: The section on channel inactivation could be simplified by stating that the structure is highly similar to W17'F structures of other Kv channels. (And then discussing possible differences).  

      We now note, “overall conformational difference is identical…” p. 7, lines 193-196.

      "Salt bridges involving the S4 Arg and Lys residues are shifted slightly (Figure 2-figure supplement 3A-D). Arg300 (R3) is in close proximity to Glu226 on the S2 helix for the open channel, while R3 is closer to Glu183 in the S2 helix. The Glu226 side chain adopts a visible interaction with R4 in the inactivated state." 

      Comment: The density for these acidic amino acids seems weak, especially in the inactivated state. It seems like a stretch to make much of their possible conformational changes. 

      We’ve included stereo pairs in Fig. 2 – figure supplement 4.

      "By adding 100 nM α-DTx to detergent solubilized Kv1.2 protein we obtained a cryo-EM structure at 2.8 Å resolution of the complex." 

      Comment: 100 nM might be lower than the Kv concentration. The current methods are ambiguous on the concentration of Kv channel used for the DTx sample. From the methods, it seems possible that 100 nM DTX is a sub-stoichiometric amount relative to the channel. Regardless, the cryo-EM data seems to suggest that a large percentage of particles do not have DTx bound. This surely complicates the interpretation of density within the filter (which has partly been ascribed to a lysine side chain from DTx).

      The reviewer correctly points out a potentially serious problem. It turns out that the 100 nM figure we quoted was incorrect, and the actual concentration of toxin, >400 nM, was substantially greater than the protein concentration. This is confirmed by the small fraction (<1%) of 3D class particles that do not show the toxin density (lines 303-306).

      Comment: The methods on atomic structure building/refinement (Protein model building, refinement, and structural analysis) are sparse. A table is needed showing data collection and refinement statistics for each of the structures. This data should also provide average B factors for the ions in the filter. An example can be found in PMID 36224384. 

      Data collection and statistics are now in Fig. 4 – figure supplement 3.

      "In the selectivity filter of the toxin-bound channel (Figure 3E) a continuous density is seen to extend downward from the external site IS0 through to the boundary between IS1 and IS2. This density is well modeled by an extended Lys side chain from the bound toxin, with the terminal amine coordinated by the carbonyls of G27”.

      Comment: While there seems to be extra density in site IS0 from the figures, the density ascribed to lysine in the filter doesn't seem that distinct from those of ions in the open structure. 1-dimensional density plots and some degree of caution may be prudent. Could there, for example, be a mixture of toxin-bound and free channels in the dataset?

      Could the lysine penetrate to different depths? If the toxin binds with nM affinity, why are any channels missing the toxin? Have the authors modeled an atomic structure of the entire toxin bound to the channel to evaluate how plausible the proposed binding of the lysine is? Can the toxin be docked onto Kv1.2 with the deep positioning of the lysine and not clash with the extracellular surface of Kv1.2? 

      We also were concerned about these issues. We have been able to obtain a C1 reconstruction of the toxin-channel complex. In building the atomic model we found that indeed the Lys5 side chain could not penetrate as far as we had thought, and appears to be coordinated by the first carbonyl pair. Fig. 3; text lines 331-332. 

      "Toxin binding shrinks the distances between opposing carbonyl oxygens in the selectivity filter, forming a narrower tunnel into which the Lys side chain fits (Figure 3F). The second and fourth carbonyl oxygen distances are substantially reduced from 4.7 Å and 4.6 Å in an open state to 3.7 Å and 3.9 Å, respectively (Figure 4E). In a superposition of Kv1.2 open-state and α-DTX-bound P-loop structures, there is also an upward shift of the first three carbonyl groups by 0.7~1.0 Å (Figure 4F). " 

      Comment: I suspect the authors intend to refer to Figure 3F rather than 4. I would be cautious here. The refined positions of the carbonyl oxygens are almost certainly affected by the presence or absence of ions in the atomic model during refinement. The density and the resolution of the map may not be able to distinguish small changes to the positions of the carbonyl oxygens (and these differences/uncertainties are compounded by the C4 symmetry). 

      "On the other hand, the terminal amine of lysine in α-DTX is deeply wedged at the second set of carbonyls, narrowing both IS1 and IS2 while displacing ions from the sites (Figure 3-figure supplement 2A). CTX does not cause narrowing of the selectivity filter or displacements of the carbonyls (Figure 3-figure supplement 2B). "

      Comment: Again, caution would be prudent here.  

      We are very grateful to the reviewer for pointing out these problems. We have removed these statements that are weakly supported at our resolution level.

      "Shaker channels are able to conduct Na+ in the absence of K+ (Melishchuk et al., 1998)." 

      Comment: How about the Kv1.2 channel? Is Kv1.2 able to conduct Na+ in the absence of K+ ? This would certainly be relevant for interpreting the conformation of the filter and the density ascribed to Na+ for the structure in sodium.  

      We agree wholeheartedly, but unfortunately we are no longer capable of doing the measurements as our lab will soon close.

      "Ion densities are seen in the IS1, IS3, and IS4 ion binding sites, but the selectivity filter shows a general narrowing as would be expected for binding of sodium ions. The second, third, and fourth carbonyl oxygen distances are reduced from 4.7 Å, 4.7 Å, and 4.6 Å in the open state to 4.4 Å, 3.9 Å, and 4.5 Å, respectively. The rest of the channel structure is very little perturbed. " 

      Comment: The density for IS4 seems weak. To me, it looks like IS1 and IS3 are occupied, whereas IS2 and IS4 are much weaker. 1-dimensional density plots would be helpful. I would suggest caution in commenting too strongly on the "general narrowing" since the resolution of the maps, the local density, and the atomic structure refinement would be consistent with coordinate errors of 0.5 Å or more - and would be compounded (~ doubled) by measuring between symmetry-related atoms.  

      We present 1D plots in Fig. 4E. We no longer comment on “narrowing”

      "Finally, the snake toxin a-Dendrotoxin (DTx) studied here is seen to block Kv1.2 by insertion of a lysine residue into the pore." 

      Comment: Discussion (and references) should be given regarding what was known prior to this study on the mode of inhibition by DTx. 

      Discussion and references now added, lines 287-301.

      "On the other hand, a lengthy molecular-dynamics simulation of deactivation in the Kv1.2-2.1..." 

      Comment: I don't think mentioning this personal communication adds to the manuscript. 

      Actually, the original “personal communication” reference was there because the situation is complicated. Movie S3 accompanying the Jensen et al. paper shows deactivation and dewetting of the channel during a 250 µs simulation. In the movie there are ions visible in the selectivity filter for the first 50 µs, but after that the SF appears empty. Puzzled by this, we contacted Dr. Jensen, who explained that the movie was in error: ions remain in the SF throughout the entire 250 µs. We now cite Jensen (2012) along with the personal communication.

      "The difference between the open and inactivated Kv1.2 structures, like the difference in Kv1.2-2.1 (Reddi et al., 2022) and Shaker (Tan et al., 2022) can be imagined as resulting from a two-step process." 

      Comment: Confusing phrasing because the authors mean to compare their structure to inactivated structures of Kv1.2-2.1 and Shaker.

      Fixed, lines 220-222.

      "Molecular dynamics simulations by Tan et al. based on the Shaker-W17'F structure show that IS3 and IS4 are simultaneously occupied by K+ ions in the inactivated state." 

      Comment: I think that the word "show" is too strong. Perhaps "suggest" 

      The MD result seems to us to be unequivocal, that most of the time the two sites are occupied by ions.

      References are needed for the following statements:  

      -  "as well as the charge-transfer center phenylalanine"

      Now citing Tao et al. 2010, line 156.

      - "total gating charge movement in Shaker channels is larger, about 13 elementary charges per channel" 

      Now citing the review by Islas, 2015 (line 166-169).

      "The selectivity filter of potassium channels consists of an array of four copies of the extended loop (the P-loop) formed by a highly conserved sequence, in this case, TTVGYGD. Two residues anchor the outer half of the selectivity filter and are particularly important in inactivation mechanisms (Figure 2B, right panels). Normally, the tyrosine Y28' (Y377 in Kv1.2) is constrained by hydrogen bonds to residues in the pore helix and helix S6 and is key to the conformation of the selectivity filter. The final aspartate of the P-loop, D30' (D379 in Kv1.2) is normally located near the extracellular surface and has a side chain that also participates in H-bonds with W17' (W366 in Kv1.2) on the pore helix." 

      Citations added (Pless 2013, Sauer 2011) lines 211-214.

      - "During normal conduction, ion binding sites in the selectivity filter are usually occupied by K+ and water molecules in alternation." 

      Added Morais-Cabral et al. 2001, p. 17, lines 463-465.

    1. eLife assessment:

      This paper characterises a novel gene (Spar) and presents valuable findings in the field of insect biology and behaviour. The experiments are well designed, with attention to detail, showcasing the potential of the Drosophila melanogaster model and the use of online resources. The mixed approach presents a convincing argument for a genetic interaction between Alk and Spar.

    1. eLife assessment

      Receptor tyrosine kinases such as ALK play critical roles during appropriate development and behaviour and are nodal in many disease conditions, through molecular mechanisms that weren't completely understood. This manuscript identifies a previously unknown neuropeptide precursor as a downstream transcriptional target of Alk signalling in Clock neurons in the Drosophila brain. The experiments are well designed with attention to detail, the data are convincing, and the findings will be valuable to those interested in events downstream of signalling by receptor tyrosine kinases.

    1. Reviewer #3 (Public Review):

      In this manuscript, the authors explored the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in the Miiuy croaker. They found that MDA5 can serve as a substitute for RIG-I in detecting 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) when RIG-I is absent in Miiuy croaker. Furthermore, they observed MDA5's recognition of 5'ppp-RNA in chickens (Gallus gallus), a species lacking RIG-I. Additionally, the authors documented that MDA5's functionality can be compromised by m6A-mediated methylation and degradation of MDA5 mRNA, orchestrated by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker during SCRV infection. This impairment compromises the innate antiviral immunity of fish, facilitating SCRV's immune evasion. These findings offer valuable insights into the adaptation and functional diversity of innate antiviral mechanisms in vertebrates.

    2. eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses. Compared to an earlier version of the paper, the strength of evidence has improved but it is still partially incomplete due to a few key missing experiments and controls.

    3. Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conserved process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts. Additionally, it is noted that the main claims put forth in the manuscript are only partially supported by the data presented.

    4. Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as the teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      - One critical caveat in this study is that it does not address whether ppp-SCRV RNA induces IRF3-dimerization and type I IFN induction in an MDA5 dependent manner. The data demonstrate that mmiMDA5 can bind to triphosphorylated RNA (Fig. 4D). In addition, triphosphorylated RNA can dimerize IRF3 (4C). However, a key experiment that ties these two observations together is missing.
      - Specifically, although Fig. 4C demonstrates that 5'ppp-SCRV RNA induces dimerization (unlike its dephosphorylated or capped derivatives), this does not prove that this happens in an MDA5-dependent manner. This experiment should have been done in WT and siMDA5 MKC cells side-by-side to demonstrate that the IRF3 dimerization that is observed here is mediated by MDA5 and not by another (unknown) protein. The same holds true for Fig. 4J.
      - Fig 1C-D: these experiments are not sufficiently convincing, i.e. the difference in IRF3 dimerization between VSV-RNA and VSV-RNA+CIAP transfection is minimal.
      - Fig. 2N and 2O: why did the authors decide to use overexpression of MDA5 to assess the impact of STING on MDA5-mediated IFN induction? This should have been done in cells transfected with SCRV or polyIC (as in 2D-G) or in infected cells (as in 2H-K). In addition, it is a pity that the authors did not include an siMAVS condition alongside siSTING, to investigate the relative contribution of MAVS versus STING to the MDA5-mediated IFN response. Panel O suggests that the IFN response is completely dependent on STING, which is hard to envision.
      - Fig. 3F and 3G: where are the mock-transfected/infected conditions? Given that ectopic expression of hMDA5 is known to cause autoactivation of the IFN pathway, the baseline ISG levels should be shown (i.e. in the absence of a stimulus or infection). Normalization of the data does not reveal whether this is the case and is therefore misleading.
      - Fig. 4F and 4G: can the authors please indicate in the figure which area of the gel is relevant here? The band that runs halfway down the gel? If so, the effects described in the text are not supported by the data (i.e. the 5'OH-SCRV and 5'pppGG-SCRV appear to compete with Bio-5'ppp-SCRV as well as 5'ppp-SCRV).
      - My concerns about Fig. 5 remain unaltered. The fact that MDA5 is an ISG explains its increased expression and increased methylation pattern. The authors should at the very least mention in their text that MDA5 is an ISG and that their observations may be partially explained by this fact.

    5. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present evidence suggesting that MDA5 can substitute as a sensor for triphosphate RNA in a species that naturally lacks RIG-I. The key findings are potentially important for our understanding of the evolution of innate immune responses, but the evidence is incomplete, as additional biochemical and functional experiments are needed to unambiguously assign MDA5 as a bona fide sensor of triphosphate RNA in this model. This also leaves the title as overstating its case.

      We would like to thank the editorial team for these positive comments on our manuscript and the constructive suggestions to improve it. Following the suggestions and valuable comments of the referees, we have added substantial amounts of new data and analysis to substantiate our claims, and the manuscript, including the title, has been carefully revised to better reflect our conclusions. We are now happy to send you our revised manuscript; we hope it addresses your and the reviewers’ concerns satisfactorily and is now suitable for publication in eLife.

      Reviewer #1 (Public Review):

      This study offers valuable insights into host-virus interactions, emphasizing the adaptability of the immune system. Readers should recognize the significance of MDA5 in potentially replacing RIG-I and the adversarial strategy employed by 5'ppp-RNA SCRV in degrading MDA5 mediated by m6A modification in different species, further indicating that m6A is a conserved process in the antiviral immune response.

      However, caution is warranted in extrapolating these findings universally, given the dynamic nature of host-virus dynamics. The study provides a snapshot into the complexity of these interactions, but further research is needed to validate and extend these insights, considering potential variations across viral species and environmental contexts.

      We concur with the viewpoint that virus-host coevolution complicates the derivation of universal conclusions. To address this challenge, we incorporated additional experiments and data based on the suggestions of the reviewers. These experiments were carried out across diverse models, including two distinct vertebrate species (M. miiuy and G. gallus), two different viruses (SCRV and VSV), and the synthesis of corresponding 5’ppp-RNA probes. We believe that these supplementary data bolster the evidence supporting the immune replacement role of MDA5 in the recognition of 5'ppp-RNA in RIG-I deficient species (Figure 1C-1E, Figure 2O and 2P, Figure 4). Moreover, we have duly incorporated references in both the introduction and discussion sections to further support our conclusion that MDA5 in T. belangeri, a mammal lacking RIG-I, possesses the ability to detect RNA viruses posed as RIG-I agonists (doi: 10.1073/pnas.1604939113). Lastly, meticulous revisions have been undertaken in the manuscript, including adjustments to the title, to ensure harmonization with our research outcomes.

      Reviewer #2 (Public Review):

      This manuscript by Geng et al. aims to demonstrate that MDA5 compensates for the loss of RIG-I in certain species, such as teleost fish miiuy croaker. The authors use Siniperca chuatsi rhabdovirus (SCRV) and poly(I:C) to demonstrate that these RNA ligands induce an IFN response in an MDA5-dependent manner in M. miiuy derived cells. Furthermore, they show that MDA5 requires its RD domain to directly bind to SCRV RNA and to induce an IFN response. They use in vitro synthesized RNA with a 5'triphosphate (or lacking a 5'triphosphate as a control) to demonstrate that MDA5 can directly bind to 5'-triphosphorylated RNA. The second part of the paper is devoted to m6A modification of MDA5 transcripts by SCRV as an immune evasion strategy. The authors demonstrate that the modification of MDA5 with m6A is increased upon infection and that this causes increased decay of MDA5 and consequently a decreased IFN response.

      The key message of this paper, i.e. MDA5 can sense 5'-triphosphorylated RNA and thereby compensate for the loss of RIG-I, is novel and interesting, yet there is insufficient evidence provided to prove this hypothesis. Most importantly, it is crucial to test the capacity of in vitro synthesized 5'-triphosphorylated RNA to induce an IFN response in MDA5-sufficient and -deficient cells. In addition, a number of important controls are missing, as detailed below.

      To further support the notion that MDA5 is capable of detecting 5'ppp-RNA in species lacking RIG-I, we conducted additional experiments. Initially, we isolated the RNA from SCRV and VSV viruses. Subsequently, we synthesized 5'ppp-RNA probes that corresponded to the genome termini of SCRV and VSV in vitro. Then, these RNAs were treated with Calf intestinal phosphatase (CIAP) to generate dephosphorylated derivatives. Next, we separately tested the activation ability of various RNAs on IRF3 dimer and IFN response in MKC (M. miiuy kidney cell line) and DF-1 (G. gallus fibroblast cell line) cells, and determined that the immune activation ability of SCRV/VSV viruses depends on their triphosphate structure (Figure 1C-1E, Figure 4C and 4J). In addition, the knockdown of MDA5 inhibited the immune response mediated by SCRV RNA (Figure 2P and 2Q). Finally, we incorporated essential experimental controls (Figure 4B and 4I). We think that the inclusion of these supplementary experimental data significantly enhances the credibility and further substantiates our hypothesis.

      The authors describe an interaction between MDA5 and STING which, if true, is very interesting. However, the functional implications of this interaction are not further investigated in the manuscript. Is STING required to relay signaling downstream of MDA5?

      To better explore the role of STING in MDA5 signal transduction, we constructed a STING expression plasmid and synthesized specific siRNA targeting STING. Next, we found that co-expression of STING and MDA5 significantly enhances the MDA5-mediated IFN-1 response during SCRV virus infection (Figure 2N). Conversely, silencing of STING expression restored the MDA5-mediated IFN-1 response (Figure 2O). These findings provide important evidence for the critical involvement of STING in the immune signaling cascade mediated by MDA5 in response to 5'ppp-RNA viruses.

      The second part of the paper is quite distinct from the first part. The fact that MDA5 is an interferon-stimulated gene is not mentioned and complicates the analyses (i.e. is there truly more m6A modification of MDA5 on a per molecule basis, or is there simply more total MDA5 and therefore more total m6A modification of MDA5).

      For the experimental data analysis in Figure 5E and 5F, we first compared the m6A-IP group to the input group, and then normalized the control group (IgG group of 5E and Mock group of 5F) to a value of “1”. Given the observed variability in MDA5 expression levels within the input group of Mock and SCRV virus-infected cells, our analysis represents the actual m6A content of each MDA5 molecule. To enhance clarity, we have updated the label on the Y-axis in Figure 5E and 5F.
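
      The normalization described above can be illustrated with a minimal sketch. All numbers below are hypothetical and are not data from the manuscript, and the function name `relative_m6a_per_transcript` is an assumption made only for illustration. The sketch shows why dividing the m6A-IP signal by the input signal before scaling the control to 1 cancels a change in total transcript abundance, so that an ISG-driven rise in MDA5 mRNA alone would leave the reported value at 1.

```python
def relative_m6a_per_transcript(ip, inp, ip_ctrl, inp_ctrl):
    """Return IP/input enrichment of a sample, scaled so the control equals 1.

    ip, inp           -- m6A-IP and input signal for the sample (arbitrary units)
    ip_ctrl, inp_ctrl -- the same two measurements for the control (e.g. Mock)
    """
    if inp <= 0 or inp_ctrl <= 0 or ip_ctrl <= 0:
        raise ValueError("input and control signals must be positive")
    enrichment = ip / inp                 # m6A signal per unit of transcript
    enrichment_ctrl = ip_ctrl / inp_ctrl  # same quantity for the control
    return enrichment / enrichment_ctrl   # control is 1 by construction


# Hypothetical values, chosen only to illustrate the arithmetic.
mock_ip, mock_input = 2.0, 10.0

# Case A: transcript abundance triples, methylation per molecule unchanged.
case_a = relative_m6a_per_transcript(6.0, 30.0, mock_ip, mock_input)

# Case B: abundance triples and methylation per molecule doubles.
case_b = relative_m6a_per_transcript(12.0, 30.0, mock_ip, mock_input)

print(round(case_a, 3), round(case_b, 3))
```

      In this toy calculation only Case B departs from the control value, which is the distinction the reviewer asked about: a change in methylation per molecule versus a change in the amount of transcript.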

      Finally, it should be pointed out that several figures require additional labels, markings, or information in the figure itself or in the accompanying legend to increase the overall clarity of the manuscript. There are frequently details missing from figures that make them difficult to interpret and not self-explanatory. These details are sometimes not even found in the legend, only in the materials and methods section. The manuscript also requires extensive language editing by the editorial team or the authors.

      We acknowledge the valuable feedback from the reviewer and have made significant improvements to our manuscript based on the recommendations provided in the "Recommendation for the authors" section. Furthermore, we have conducted a thorough review of the entire article, resulting in substantial enhancements to the format, clarity, and overall readability of our manuscript.

      Reviewer#3 (Public Review):

      Summary: In this manuscript, the authors investigated the interaction between the pattern recognition receptor MDA5 and 5'ppp-RNA in a teleost fish called Miiuy croaker. They claimed that MDA5 can replace RIG-I in sensing 5'ppp-RNA of Siniperca chuatsi rhabdovirus (SCRV) in the absence of RIG-I in Miiuy croaker. The recognition of 5'ppp-RNA by MDA5 was also observed in the chicken (Gallus gallus), a bird species that lacks RIG-I. Additionally, they reported that the function of MDA5 can be impaired through m6A-mediated methylation and degradation of MDA5 mRNA by the METTL3/14-YTHDF2/3 regulatory network in Miiuy croaker under SCRV infection. This impairment weakens the innate antiviral immunity of fish and promotes the immune evasion of SCRV.

      Strengths: These findings provide insights into the adaptation and functional diversity of innate antiviral activity in vertebrates.

      Weaknesses: However, there are some major and minor concerns that need to be further addressed. Addressing these concerns will help the authors improve the quality of their manuscript. One significant issue with the manuscript is that the authors claim to be investigating the role of MDA5 as a substitute for RIG-I in recognizing 5'ppp-RNA, but their study extends beyond this specific scenario. Based on my understanding, it appears that sections 2.2, 2.3, 2.5, 2.6, and 2.7 do not strictly adhere to this particular scenario. Instead, these sections tend to investigate the functional involvement of Miiuy croaker MDA5 in the innate immune response to viral infection. Furthermore, the majority of the data is focused on Miiuy croaker MDA5, with only a limited and insufficient study on chicken MDA5. Consequently, the authors cannot make broad claims that their research represents events in all RIG-I deficient species, considering the limited scope of the species studied.

      We agree with the reviewer's perspective that functional analysis of MDA5 in M. miiuy may not adequately represent all species lacking RIG-I. To address this concern, we have incorporated additional experimental data utilizing different model systems, including two different vertebrate species (M. miiuy and G. gallus), two distinct viruses (SCRV and VSV), and the synthesis of two corresponding 5’ppp-RNA probes. While the functional characterization of G. gallus MDA5 remains relatively limited compared to M. miiuy, our current experimental findings provide support for two key observations. Firstly, the triphosphate structure of the VSV virus is pivotal in activating the innate immune response in G. gallus against the virus (Figure 1D and 4J). Secondly, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Consequently, although we cannot definitively establish the immune surrogate function of MDA5 in all RIG-I-deficient species, our research data further substantiates this hypothesis. Moreover, we have adopted a more cautious attitude in summarizing our experimental conclusions, thereby enhancing the rigor of our manuscript language.

      The current title of the article does not align well with its actual content. It is recommended that the focus of the research be redirected to the recognition function and molecular mechanism of MDA5 in the absence of RIG-I concerning 5'ppp-RNA. This can be achieved through bolstering experimental analysis in the fields of biochemistry and molecular biology, as well as enhancing theoretical research on the molecular evolution of MDA5. It is advisable to decrease or eliminate content related to m6A modification.

      Following the reviewer's recommendations, we have revised the title to emphasize that our main research focus is a teleost fish devoid of RIG-I. Furthermore, we have conducted additional molecular experiments to further elucidate the 5'ppp-RNA recognition function of MDA5 in RIG-I-deficient species. In an attempt to analyze the potential molecular evolution of MDA5 resulting from RIG-I deficiency, we collected MDA5 coding sequences from diverse vertebrates. However, due to multiple independent loss events of RIG-I in fish, fish with or without RIG-I genes in the phylogenetic tree cannot be effectively clustered separately, making it extremely difficult to perform this aspect of analysis. Consequently, we have regrettably opted to forgo the molecular evolution analysis of MDA5.

      The topic of our article is an antagonistic relationship between a fish receptor and RNA viruses. The MDA5 of fish that have lost RIG-I has evolved the ability to recognize 5’ppp-RNA viruses and mediate an IFN response to resist SCRV infection. Conversely, the m6A methylation mechanism gives SCRV a means to weaken the immune capacity of MDA5. We therefore believe that the latter part describes an important aspect of the arms race between the virus and its host, and should be retained.

      Additionally, the main body of the writing contains several aspects that lack rigor and tend to exaggerate, necessitating significant improvement.

      We appreciate the reviewer’s comment and have improved the manuscript addressing the points raised in the “Recommendation for the authors”. We have added corresponding experiments to strengthen the verification of the conclusions, and in addition, we are more cautious in summarizing the language of the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The evidential foundation within the Result 1 section appears somewhat tenuous.

      Firstly, the author derives conclusions regarding the phenomenon of RIG-I loss in lower vertebrates by referencing external literature and conducting bioinformatics analyses. It is pertinent to inquire whether the author considered fortifying these findings through additional WB/PCR experiments, particularly for evaluating RIG-I expression levels across diverse vertebrates, encompassing both lower and higher orders.

      Firstly, the species we analyzed are mostly model species with excellent genomic sequence information in the database. Secondly, RIG-I protein sequences (or at least some domain sequences) are relatively conserved in vertebrates. Evaluating the presence of RIG-I in these species through homology comparison is therefore highly credible, and we do not intend to conduct additional PCR/WB experiments to confirm it.

      Additionally, following the identification of RIG-I loss, the author postulates MDA5 as a substitute of RIG-I, grounding this speculation in the analysis of MDA5 and LGP2 protein structures. It is imperative to address whether the author could enhance the manuscript by supplying expression data for MDA5 and LGP2 across different vertebrates and elucidating further why MDA5 is posited as the compensatory mechanism for RIG-I loss.

      Like MDA5, LGP2 is also an interferon-stimulated gene, so both are likely to be highly sensitive to viral infection. We therefore think that comparing the expression data of these two genes would not be an effective way to evaluate their function. In mammals, the mechanisms by which LGP2 regulates RIG-I and MDA5 are complicated and ambiguous. To evaluate the potential function of LGP2 in M. miiuy, we constructed an LGP2 plasmid and synthesized an siRNA targeting LGP2. Our results indicate that mmiLGP2 can enhance the antiviral immune response mediated by mmiMDA5 (Figure 1H and 1I), which points to a regulatory role for mmiLGP2 in RLR signaling rather than a role as a compensatory receptor for RIG-I.

      Also, is it conceivable that other receptors contribute to this compensatory effect in lower vertebrates?

      5’ triphosphate short blunt-end double-strand RNA is the ligand of RIG-I as contained in the panhandle of negative-strand viral genomes. We mainly focus on the immune recognition and compensatory effects of other receptors on RIG-I loss, and MDA5, as the protein with the most similar structure, first attracted our attention. In addition, IFIT proteins have been reported to recognize triphosphate single-stranded RNA (doi: 10.1038/nature11783). However, we used SCRV and VSV RNA as viral models, both of which have negative stranded genomes and meet the ligand standards of RIG-I, rather than IFIT. Therefore, we excluded the IFIT protein from our research scope.

      (2) The article exclusively employs a singular type of 5'PPP-RNA virus and one specific lower vertebrate species, thereby potentially compromising the robustness of the assertion that this phenomenon is prevalent in lower vertebrates. To bolster this claim, could the author consider incorporating data from an alternative 5'PPP-RNA virus and a different lower vertebrate species?

      To address this concern, we have incorporated additional experimental data utilizing different model systems, including two different vertebrate species (M. miiuy and G. gallus) and two distinct viruses (SCRV and VSV). While the functional characterization of G. gallus MDA5 remains relatively limited compared to M. miiuy, our current experimental findings provide support for two key observations. Firstly, the triphosphate structure of VSV is pivotal in activating the innate immune response in G. gallus against the virus (Figure 1D and 4J). Secondly, G. gallus MDA5 can recognize 5’ppp-RNA (Figure 4I, 4K and 4L). Consequently, these experimental results further support the conservation of this immune compensation mechanism.

      (3) A nuanced consideration of the statement in Result 5 is warranted. Examination of the results under SCRV infection conditions suggests dynamic fluctuations in MDA5 expression levels, challenging the veracity of the statement implying "increased expression", which contradicts the proposed working model of this article.

      Because MDA5 acts as a receptor and plays a recognition immune role in the early stages of virus infection, the expression of MDA5 in the early stage of SCRV infection rapidly increases. In the later stage of infection, the expression of MDA5 may gradually decrease again due to the negative feedback mechanism in the host body to prevent excessive inflammation. However, compared to the uninfected group, the expression of MDA5 was significantly increased in the SCRV-infected group, so we believe that the term "increased expression" is not a problem. In addition, the m6A mechanism can weaken the function of MDA5, but it still cannot prevent the overall increase of MDA5 expression, which is not contradictory to the working model in this article.

      Additionally, the alterations in m6A levels in miiuy croaker under SCRV infection conditions warrant clarification. Could the author employ m6A dot blotting to supplement the findings related to total m6A levels?

      Our previous studies (doi: 10.4049/jimmunol.2200618) have suggested that the total m6A level is increased after SCRV infection in miiuy croaker. We cited this conclusion in the discussion of our manuscript.

      (4) It would be beneficial if the editors could assist the author in enhancing the language of the manuscript.

      We have carefully checked the full article and revised it with Grammarly, and we believe that the grammar, format, and readability of our article have been greatly improved.

      Reviewer #2 (Recommendations For The Authors):

      Figure 1

      (1) Figure 1B - some clarification needs to be added about this figure in the text. It is unclear what the main point is that the authors would like to convey.

      What we want to emphasize is that some species that possess RIG-I, such as zebrafish, have also experienced RIG-I loss events, but underwent whole-genome duplication events before the loss and thus preserved a copy of RIG-I. This indicates that loss events of RIG-I are very common in vertebrates and do not occur randomly. We have elaborated on this point in the results and discussion.

      (2) Figure 1C - is not very informative other than showing Mm MDA5 and LGP2 side-by-side. It would be more useful to show a comparison of human RIG-I/MDA5 alongside Mm and Gg MDA5. Are there any conserved/shared key residues between hRIG-I/hMDA5 versus mmMDA5?

      Homologous proteins often adopt the same or similar structure and function. We have added human RIG-I domain information to this figure (Figure 1F). Comparing the domain organization of human RIG-I with that of M. miiuy MDA5 and LGP2 shows that M. miiuy MDA5 has a structure similar to human RIG-I, making it the most likely candidate to compensate for the missing RIG-I. M. miiuy LGP2, by contrast, lacks the CARD domain, which is crucial for signal transduction, so we focused on M. miiuy MDA5. In addition, we collected MDA5 and RIG-I protein sequences from various vertebrates to identify key residues that may have evolved in M. miiuy MDA5 for recognizing 5'ppp-RNA. Unfortunately, no candidate residues were found in this comparison.

      Figure 2

      (1) Figure 2B - It would be important to demonstrate MDA5-Flag expression by immunoblot and compare MDA5-Flag overexpression to endogenous MDA5 expression using the anti-MDA5 antibody from panel 2A. If IF is used, more cells need to be visible in the field.

      After transfecting the MDA5 plasmid into MKC, endogenous MDA5 expression was detected using MDA5 antibodies. The results showed a significant increase in MDA5 protein levels, indicating that MDA5 antibodies can specifically recognize MDA5 protein. In addition, we retained the original immunofluorescence images to better demonstrate the subcellular localization of MDA5.

      (2) Figure 2C - The 1:1 stoichiometry of MDA5:MAVS (in the absence of any stimulus) is quite surprising. How does the interaction between MDA5 and MAVS change upon stimulation with an RNA ligand (SCRV, poly(I:C))?

      We do not believe that the actual stoichiometry between MDA5 and MAVS is 1:1 as described. In a Co-IP experiment, the apparent proportion of proteins in the complex depends on many factors. Firstly, the MDA5 plasmid in this study carries a 3× Flag tag, while MAVS carries only a 1× Myc tag, which makes antibody detection of MDA5-Flag more sensitive. In addition, Co-IP results are affected by factors such as the type of antibody and the number of recoveries, making it difficult to estimate the actual ratio of MDA5 to MAVS. For these reasons, and because measuring the strength of the MDA5-MAVS interaction after infection seems to be off-topic, we did not explore this point further.

      (3) Figure 2D - The interaction between MDA5 and STING is a very interesting finding but is not elaborated on in the paper (even though the interaction between MDA5 and STING is mentioned in the abstract). The manuscript would be strengthened if the interaction between MDA5 and STING is further investigated. For example, does the IFN response that is reported in panels 2E to 2H require the presence of STING? Does mmMDA5 signal via STING in response to a DNA ligand?

      We appreciate the referee's suggestion to study the mutual influence between MDA5 and STING. We found that co-expression of STING and MDA5 can enhance MDA5-mediated IFN-1 response during SCRV virus infection, while knocking down STING can restore MDA5-mediated IFN-1 (Figure 2N and 2O). This indicates that STING plays an important signaling role in the immune response of MDA5 to RNA viruses. We understand the importance of cGAS/STING pathways in identifying exogenous DNA, so exploring the MDA5 pathway for DNA ligand recognition is an interesting and meaningful perspective. But this seems to be detached from the theme of our article, so we didn't continue to explore this point.

      (4) Figures 2F and 2H - the authors demonstrate that SCRV induces a type I IFN response in an MDA5-dependent manner. While SCRV is a single-stranded negative-sense RNA virus that contains 5'ppp-RNA, it cannot be excluded that MDA5 is activated here in response to a double-stranded RNA intermediate of viral origin or even a host-derived RNA whose expression or modification is altered during infection. To demonstrate in an unambiguous manner that MDA5 senses 5'ppp-RNA, it is crucial to use the in vitro synthesized 5'ppp-RNA (and its dephosphorylated derivative as a control) from Fig. 4 in these experiments.

      We transfected in vitro-synthesized 5'ppp-SCRV and 5'ppp-VSV (and their dephosphorylated derivatives) into MKC cells and DF-1 cells, respectively. The results showed that the 5’ppp-RNAs significantly promoted the formation of IRF3 dimers, while their dephosphorylated derivatives did not (Figure 4C and 4J). In addition, we extracted viral RNA from SCRV and VSV and dephosphorylated it with calf intestinal phosphatase (CIAP). When these RNAs were transfected into MKC and DF-1 cells, the immune response induced by the viral RNAs was much higher than that induced by their dephosphorylated forms (Figure 1C-1E). These results indicate that the immune response activated by SCRV and VSV is indeed dependent on their triphosphate structure. Finally, the IRF3 dimerization and IFN induction activated by SCRV RNA can be inhibited by si-MDA5 (Figure 2P and 2Q), further demonstrating the involvement of MDA5 in the immune response mediated by 5’ppp-RNA ligands.

      (5) In mice and humans, MDA5 is known to collaborate with LGP2 to jointly induce an IFN response. Does M.miiuy express LGP2? If so, it would be informative to include a siRNA targeting LGP2 in the experiments in panel F. In mammals, LGP2 potentiates the response via MDA5 while it may inhibit RIG-I activation.

      M. miiuy expresses LGP2. We constructed an LGP2 plasmid and synthesized si-LGP2 to investigate the impact of LGP2 on MDA5-mediated immune processes (Figure 1G-1I). The results showed that LGP2 can enhance the IFN response mediated by MDA5 during SCRV infection, similar to its role in mammals.

      (6) Minor comment - Is the poly(I:C) used in this figure high or low molecular weight poly(I:C)? HMW poly(I:C) preferentially stimulates MDA5, while LMW poly(I:C) preferentially stimulates RIG-I.

      We used poly(I:C)-HMW as a positive control for activating MDA5. We have modified the relevant information in Figure 2 and its legend.

      Figure 3

      (1) Figure 3F/G - The normalization in this Figure is difficult to interpret. It would be better to split Figure 3G into 4 separate graphs and include the mock-infected cells alongside the infected samples (as done in Figure 2).

      To better demonstrate the function of the RD domain of MDA5 in M. miiuy, we have changed the experimental plan, as shown in Figure 3F. We measured the induction of antiviral factors by overexpression of MDA5 and MDA5-△RD under poly(I:C)-HMW stimulation. This indicates that the RD domain of MDA5 has a conserved function in the recognition of poly(I:C)-HMW in M. miiuy, and it can serve as a positive control for the recognition of SCRV by the RD domain.

      Figure 4

      (1) Figure 4B - A number of important controls are missing. Was the immunoprecipitation of RNA successful? This could be shown by running a fraction of the immunoprecipitated material on an RNA gel and/or by showing that the input RNA was depleted after IP. In addition, a control IP (Streptavidin beads without biotinylated RNA) is missing to ensure that MDA5 does not stick non-specifically to the Streptavidin resin.

      We appreciate the referee's suggestions. We reran this experiment and added a control pull-down with non-biotin-labeled RNA, and the results showed that MDA5 did not adsorb non-specifically onto the beads (Figure 4B). In addition, following the referee's suggestion, we tested the depletion of RNA before and after immunoprecipitation. The results showed that biotin-labeled RNA, but not non-biotin-labeled RNA, was adsorbed by the beads, indicating that the RNA precipitation was successful. However, we think that this is not necessary for the final presentation of the experimental results, so we did not show it in the figure.

      (2) Figure 4B - It is unclear why there is such a large molecular weight difference between endogenous MDA5 and MDA5-Flag (110 kDa versus 130/140 kDa). Why is there less MDA5-Flag retrieved than endogenous MDA5?

      After careful analysis, we believe that the significant difference in molecular weight between endogenous MDA5 and MDA5-Flag may be due to three reasons. Firstly, MDA5-Flag carries a 3× Flag tag. Secondly, as shown in the primer table, we inserted MDA5 between the NotI and XbaI cleavage sites of the pcDNA3.1 vector, which are located at a posterior position in the vector. This means that the Flag tag lies at some distance from the start codon of MDA5, and the intervening vector sequences can also be translated, increasing the molecular weight of the exogenous MDA5 protein. Finally, to facilitate amplification, the F-terminal primers of MDA5 contain a small portion of the 3'UTR sequence (excluding the stop codon). Together, these factors may have led to the significant difference in molecular weight. In addition, to supply the important experimental controls, we have conducted a new RNA pull-down experiment, as shown in Figure 4B.

      (3) Minor point: Figure 4B - please clarify in the figure whether RNA or protein is immunoprecipitated and via which tags.

      We have conducted a new RNA pull-down experiment as shown in Fig 4B, and we have clearly labeled the relevant information in the figure.

      (4) Figure 4E - the fraction of MDA5 that binds 5'ppp-RNA seems incredibly minor. And why is this experiment done using 5'OH-RNA as a competitor, rather than simply incubating MDA5 and 5'OH-RNA together and demonstrating that these do not form a complex?

      The proportion of MDA5 combined with 5’ppp-RNA is influenced by many conditions, including the concentration and purity of the probe and purified protein. In addition, the dosage ratio between the RNA probe and MDA5 protein in the EMSA experiment can also have a significant impact on the results. Therefore, it is not possible to accurately determine the actual binding force between MDA5 and RNA. In the EMSA experimental program, both cold probes (5’ppp-RNA) and mutated cold probes (5’OH-RNA and 5’pppGG-RNA) are crucial for demonstrating the specific binding between MDA5 and 5’ppp-RNA, as they can exclude false positive errors caused by factors such as the presence of biotin in the purified MDA5 protein itself.

      (5) Figure 4B/4C/4F - These experiments would be strengthened by including an MDA5 mutant that cannot bind to RNA. These mutants are well-described in mammals. If these residues are conserved, it is straightforward to generate this mutant.

      As shown in Figure 3, the MDA5 of M. miiuy has an RD domain that can recognize SCRV. We constructed MDA5-△RD mutant plasmids with a 6× His-tag and purified the protein for EMSA experiments (Figure 4E). The experimental results further indicate that MDA5, but not MDA5-△RD, can bind to 5’ppp-SCRV (Figure 4G). This further confirms the crucial role of the RD domain in recognizing the 5'ppp-RNA virus.

      (6) Minor point: Figure 4E: please clarify in which lanes MDA5 has been added.

      Thank you for the referee's suggestion. We have synthesized new 5'ppp-RNA probes (5’ppp-SCRV and its dephosphorylated derivatives) and reran this experiment, and the relevant information has been added to the figure (Figure 4F).

      Figure 5

      (1) Figure 5C - As MDA5 is an interferon-stimulated gene (as shown in panel G/H/I)) the increased MDA5 expression could simply explain the increase in the amount of m6A-MDA5 that is immunoprecipitated after infection. Could this figure be improved by doing a fold change between input vs m6A-IP OR uninfected vs SCRV-infected conditions? This would reveal whether the modification of MDA5 with m6A is really increased after infection.

      As shown in Figure 5F, our data indicate that the proportion of m6A-modified MDA5 does indeed increase after SCRV infection, and that the increase is not solely due to the increased expression of MDA5 itself.

      (2) Figure 5E/F - The y-axis is unclear: relative MDA5 m6A levels. Relative to what? Input? Mock infected?

      For experiments in Figure 5E/F, we first compared the m6A-IP group with the input group, and then normalized the control group (IgG group of 5E and Mock group of 5F) to “1”. We have replaced the Y-axis name with a clearer one (Figure 5E and 5F).
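
      The two-step normalization described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `relative_m6a_level` and all numeric values are hypothetical, and the inputs stand for whatever abundance measure was used for the m6A-IP and input fractions, which the response does not specify.

```python
def relative_m6a_level(sample_ip, sample_input, control_ip, control_input):
    """Two-step normalization (hypothetical helper, not the authors' code).

    Step 1: divide each m6A-IP signal by its own input, so that a change in
            total MDA5 abundance does not inflate the m6A signal.
    Step 2: divide by the control group's ratio, so the control equals 1.
    """
    if sample_input == 0 or control_input == 0 or control_ip == 0:
        raise ValueError("input and control values must be non-zero")
    sample_ratio = sample_ip / sample_input      # m6A-IP relative to input
    control_ratio = control_ip / control_input   # same ratio for the control
    return sample_ratio / control_ratio          # control group becomes 1


# Hypothetical arbitrary units: total MDA5 (input) doubles after infection,
# while the m6A-IP signal triples.
mock = relative_m6a_level(2.0, 4.0, 2.0, 4.0)   # control compared with itself
scrv = relative_m6a_level(6.0, 8.0, 2.0, 4.0)   # infected group vs. control
print(mock, scrv)
```

      Because each IP signal is divided by its own input first, a group whose IP signal rises only in proportion to total MDA5 would still come out at 1, which is the point of the reviewer's question.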

      (3) General comment - It is not mentioned in the text that MDA5 is an interferon-stimulated gene. This would account for the increase in expression (qPCR) after viral infection or poly(I:C) transfection, hence there is no novelty in this finding. In addition, the authors suggest that MDA5 increases at the protein level (by immunoblot) but the increase on these blots is not convincing (figure 5H/5I).

      We understand that the increase in expression of MDA5, as an interferon-stimulated gene, after viral infection is a common phenomenon. We present this result to further validate the m6A sequencing transcriptome data, and to demonstrate that although m6A modification interferes with MDA5 expression during viral infection, it cannot prevent the increase in the MDA5 mRNA level. In addition, we reran the experiment, and the results showed that the expression of MDA5 protein can indeed be specifically activated by SCRV and poly(I:C)-HMW.

      Figure 6

      (1) Figure 6E - What was the MOI of the virus used in this experiment? It is not mentioned in the figure legend.

      The MOI was 5; we have added this information to the figure legend.

      Figure 7

      (1) Figure 7J - This graphic is somewhat misleading and should be altered to better reflect the conclusions that are drawn in the manuscript. The graphic suggests that MAVS and STING interact, but this is not demonstrated in the paper. In addition, the paper does not demonstrate whether MAVS or STING (or both) are needed downstream of MDA5 to relay signalling. Finally, please draw an arrow from type I IFNs to increased expression of MDA5 to illustrate that MDA5 is an ISG.

      Thank you for the referee's suggestion. We have revised the graphic to more accurately match the conclusions of the manuscript (Figure 7J). Firstly, we have separated the STING protein from the MAVS protein. Secondly, arrows have been used to indicate that MDA5 is an IFN-stimulated gene. Finally, we have added relevant experiments that demonstrate the importance of STING (also known as MITA) in the signaling process of the MDA5-activated IFN response. In addition, the function of MAVS in binding to MDA5 and promoting its signal transduction is highly conserved, and it is well documented even in fish with RIG-I deficiency (doi: 10.1016/j.dci.2021.104235). Therefore, in Figure 7J, we still chose to show MAVS bound to MDA5 and acting as a downstream signal transducer of MDA5.

      Discussion

      (1) There is very little discussion about METTL and YTHDF proteins in the discussion despite the fact that the last 2 figures are entirely devoted to these proteins.

      Based on the referee's suggestion, we have added relevant content about METTL and YTHDF proteins in the discussion. In addition, the basic mechanism and function of METTL and YTHDF proteins were briefly described in the introduction.

      Reviewer #3 (Recommendations For The Authors):

      Please refer to the specific suggestions and recommendations. They include proposals for experimental additions, improved methodologies, and suggestions to resolve writing-related concerns.

      Major concerns

      (1) I suggest changing the article title to "Functional Replacement of RIG-I with MDA5 in Fish Miiuy Croaker", or a similar title, to make it more focused and closely aligned with the content of the article.

      Following the reviewer's recommendations, we have revised the title to emphasize that our primary research subject is a teleost fish that lacks RIG-I. In addition, we have changed “5’ppp-RNA” to “5’ppp-RNA virus” to emphasize the interaction between the virus and the receptor. We believe that the revised title is more in line with the content of the article.

      (2) Due to the inherent limitations in genome sequencing, assembly, and annotation for the Miiuy croaker, comprehensive annotation of immune-related genes remains incomplete. To address this critical gap, it is recommended that authors establish experimental protocols, such as Fluorescence In Situ Hybridization (FISH), to confirm the absence of RIG-I in the Miiuy croaker. They should simultaneously employ MDA5 probes as a positive control for validation purposes.

      The miiuy croaker has good chromosome-level genomic information (doi: 10.1016/j.aaf.2021.06.001). In addition, studies have shown that RIG-I is absent in the order Perciformes (doi: 10.1016/j.fsirep.2021.100012), to which the miiuy croaker belongs, so it has indeed lost the RIG-I gene. Therefore, we do not intend to use FISH to prove this.

      (3) Similarly, it is recommended that the authors first provide evidence of the presence of 5'ppp at the 5' terminus of the genome RNA of SCRV, as demonstrated in the study by Goubau et al. (doi: 10.1038/nature13590, Supplementary figure 1). This evidence is crucial before drawing conclusions about the compensatory role of MDA5 in recognizing 5'ppp RNA viruses, using SCRV as the viral model.

      As suggested by the referee, we extracted SCRV RNA from SCRV virus particles and assessed the 5’-phosphate-dependence of stimulation by SCRV RNA. Calf intestinal phosphatase (CIAP) treatment substantially reduced the stimulatory activity of SCRV RNA in MKC cells of M. miiuy (Figure 1C and 1E). In addition, similar results were obtained by transfecting VSV RNA isolated from VSV into DF-1 cells of G. gallus (Figure 1D). This evidence confirms that both SCRV and VSV RNA carry triphosphate molecular features, and indicates that birds and fish lacking RIG-I have other receptors that can recognize 5’ppp-RNA.

      (4) The 62-nucleotide (nt) 5'ppp-RNA utilized in this study was obtained from Vesicular Stomatitis Virus (VSV). In order to provide direct evidence, it is necessary to include a 62-nt 5'ppp-RNA that is directly derived from SCRV itself.

      We adopted this suggestion and synthesized a 67-nucleotide 5’ppp-SCRV RNA probe. We found that 5’ppp-SCRV activates dimerization of IRF3 and binds to MDA5 of M. miiuy in a 5’-triphosphate-dependent manner (Figure 4A-4F).

      (5) Given that RNAs with uncapped diphosphate (PP) groups at the 5′ end also activate RIG-I, similar to RNAs with 5′-PPP moieties, and the 5′-terminal nucleotide must remain unmethylated at its 2′-O position to allow RNA recognition by RIG-I, it is necessary for the authors to conduct additional experiments to supplement and validate these two distinguishing features of RIG-I in RNA recognition. This will provide more reliable evidence for the replacement of RIG-I by MDA5 in RNA recognition.

      Thank you for the reviewer's professional suggestions. We understand that exploring the binding of 5’pp-RNA and 2′-O-methylated RNA to MDA5 could further demonstrate the alternative function of MDA5. However, we think that the use of 5’ppp-RNA and its dephosphorylated derivatives fully demonstrates that the MDA5 of M. miiuy and G. gallus has evolved to recognize the 5’-triphosphate structure, like human RIG-I. Therefore, we do not intend to conduct any additional experiments.

      (6) In section 2.3, the authors assert that Miiuy croaker recognizes SCRV through its RD domain. This claim is supported by their data showing that cells overexpressed with the MDA5 ΔRD mutant lost the ability to inhibit SCRV replication. As a result, the authors draw the conclusion that "these findings provide evidence that MDA5 may recognize 5'-triphosphate-dependent RNA (5'ppp-RNA) through its RD domain." However, to strengthen their argument, the authors should first demonstrate that during SCRV infection, MDA5-mediated antiviral immune response is indeed initiated by recognizing the 5'ppp part of the SCRV RNA, rather than the double-strand part (which can exist in ssRNA virus) of the viral RNA, as this is naturally a ligand for MDA5. Additionally, the authors should treat the isolated SCRV RNA with CIP to remove the phosphate group and examine the binding of MDA5 with SCRV RNA before and after treatment. They should also transfect CIP-treated or untreated SCRV RNA into MDA5 knockdown and wild-type MKC cells to investigate the induction of antiviral signaling and levels of viral replication. Finally, the authors should verify the binding ability of the mutants with isolated SCRV RNA, with or without CIP treatment, to determine which domain of MDA5 is responsible for SCRV 5'ppp-RNA recognition.

      We understand the reviewer's concern that MDA5 may recognize SCRV by binding to dsRNA in the virus. Based on the reviewer's suggestion, we extracted SCRV RNA and obtained its dephosphorylated RNA using calf intestinal phosphatase (CIAP). Next, we transfected them into MDA5-knockdown and wild-type MKC cells, and detected the dimerization of IRF3 and the IFN response. The results indicate that SCRV RNA does indeed activate immunity in a triphosphate-dependent manner, and knockdown of MDA5 prevents immune activation by SCRV RNA (Figure 1C and 1E, Figure 2P and 2Q). Finally, we synthesized a 5'ppp-SCRV RNA probe and demonstrated that MDA5 binds to 5'ppp-SCRV through the RD domain (Figure 4E-4G). We believe that these results better demonstrate that MDA5 recognizes 5’ppp-RNA through its RD domain and address the concerns of the reviewers.

      (7) Similarly, merely presenting Co-IP data demonstrating the interaction between Miiuy croaker MDA5 and STING in overexpressed EPC cells does not justify the claim that "in vertebrates lacking RIG-I, MDA5 can utilize STING to facilitate signal transduction in the antiviral response". This is because interactions observed through overexpression may not accurately reflect the events occurring during viral infection or their actual antiviral functions. To provide more robust evidence, it is essential to conduct functional experiments after STING knockout (or at least knockdown). Furthermore, it is important to note that Miiuy Croaker alone cannot adequately represent all "vertebrates lacking RIG-I".

      We found that co-expression of STING and MDA5 can enhance MDA5-mediated IFN-1 response during SCRV virus infection, while knocking down STING can restore MDA5-mediated IFN-1 response (Figure 2N and 2O). This indicates that STING plays an important signaling role in the immune response of MDA5 to RNA viruses. In addition, loss of RIG-I is a common phenomenon in vertebrates, and STING of birds such as chickens (doi: 10.4049/jimmunol.1500638) and mammalian tree shrews (doi: 10.1073/pnas.1604939113) can also bind to MDA5, indicating that STING can indeed play a crucial role in MDA5 signaling in species with RIG-I deficiency. We have added this section to our discussion and elaborated on our observations in more cautious language.

      (8) In the manuscript, a series of experiments were conducted using an antibody (Beyotime Cat# AF7164) against endogenous MDA5. The corresponding immunogen for this MDA5 antibody is a recombinant fusion protein containing amino acids 1-205 of human IFIH1/MDA5 (NP_071451.2). However, the amino acid sequences of IFIH1/MDA5 differ substantially between humans and Miiuy croaker, which could introduce errors in the results. Therefore, it is essential to employ antibodies specifically designed for targeting Miiuy croaker's own MDA5 in the experiments.

      As shown in Figure 2B, endogenous MDA5 antibodies can detect the MDA5 portion that is forcibly overexpressed by plasmids, suggesting that the MDA5 antibody can indeed specifically recognize the MDA5 protein of M. miiuy.

      (9) It is recommended to investigate the phosphorylation of IRF3 in order to confirm the downstream signaling pathway during viral infection when MDA5 is knocked down or overexpressed.

      Due to the lack of available phosphorylation antibodies for fish IRF3, we used IRF3 dimer experiments to detect downstream signaling (Figure 1C and 1D, Figure 2P, Figure 4C and 4J).

      (10) The use of poly I:C as a mimic for dsRNA to investigate MDA5's recognition of 5'ppp-RNA in hosts lacking RIG-I, as well as the examination of the regulatory role of MDA5 m6A methylation upon activation by 5'ppp-RNA, may be inappropriate. Poly I:C does not possess 5'ppp, and while it has been identified as a ligand for MDA5 in various studies, MDA5 cannot serve as a substitute for RIG-I in recognizing poly (I:C). Therefore, the authors should utilize 5'ppp-dsRNA as the mimic and include the corresponding 5'ppp-dsRNA control without a 5'triphosphate as the negative control (both available from InvivoGen). This approach will specifically elucidate the mechanisms involved when MDA5 functions similarly to RIG-I in the recognition of 5'ppp-RNA.

      In our study, we used poly(I:C)-HMW, a known dsRNA mimetic that can be preferentially recognized by MDA5 rather than RIG-I, as a positive control for activating MDA5. What we want to demonstrate is that, like poly(I:C)-HMW (positive control), SCRV can also promote MDA5-mediated IFN immunity, further indicating the important role of MDA5 in 5’ppp-RNA virus invasion. We have clearly labeled the type of poly(I:C) in the figures and legends to avoid misunderstandings for readers.

      (11) In Figure 2, Figure 3, and Figure 6, the appearance of virus plaques is not readily apparent, and it is necessary to replace these images with clearer photographs. It appears that MKC or MPC cells are not appropriate for conducting plaque assays. To accurately assess viral proliferation, the authors should measure key indicators throughout the process, such as the production of positive-strand RNAs (+RNAs), replication intermediates (RF), and transcription of subgenomic RNAs. This approach is preferable to solely measuring the M and G protein genes from the virus genome as positive results can still be observed in contaminated cells.

      As pointed out by the reviewer, we also think that the virus plaque images in Figure 2K and Figure 3D are not clear enough, so we have replaced them with new clear images (Figure 2J and Figure 3D). But we think that other images can clearly display the proliferation of the SCRV virus, so we did not replace them. In addition, the primers we currently use do measure +RNA, so the replication level of the SCRV virus can be accurately evaluated without being affected by virus contamination. Because the regions where the two pairs of primers are located belong to the SCRV-M and SCRV-G protein genes, we label them as SCRV-M and SCRV-G to distinguish between the two pairs of genes. To avoid reader misunderstanding, we have modified the Y-axis label in the figures (Figure 2I and 2K, Figure 3E, Figure 6E and 6O).

      (12) There is a substantial disparity in the molecular size of M. miiuy MDA5 between endogenous and exogenously expressed proteins, as shown in Figure 2A and 2C-D. Please provide clarification.

      Please refer to the response to Reviewer 2's question regarding Figure 4B above.

      (13) The manuscript incorporates the evolutionary perspective, but lacks specific evolutionary analysis. Thus, it is essential to include relevant analysis to comprehend the evolutionary dynamics and positive selection on MDA5 and LGP2 in the absence of RIG-I in Miiuy croaker. This can be achieved through theoretical calculations using appropriate algorithms, such as the branch models and branch-site models based on the maximum-likelihood method implemented in the phylogenetic analysis by maximum likelihood (PAML) package.

      In fact, we have analyzed the molecular evolution of MDA5 and LGP2. Unfortunately, even when analyzing only the MDA5/LGP2 CDS sequences in fish, we found that the topologies of gene trees of MDA5/LGP2 were largely consistent with the species tree. Thus, species with or without RIG-I in the gene trees cannot effectively separate clusters, making it extremely difficult to analyze the molecular evolution of MDA5/LGP2 caused by RIG-I deficiency. Consequently, we gave up this aspect of analysis.

      (14) If the narrative regarding m6A methylation goes beyond the activation of MDA5 through recognition of 5'ppp-RNA and represents a regulatory mechanism for all MDA5 activation events, it is not relevant to the theme of "An arms race under RIG-I loss: 5'ppp-RNA and its alternative recognition receptor MDA5." Therefore, all investigations in this paper should focus solely on events when MDA5 recognizes 5'ppp-RNA. Any data associated with the broader regulatory mechanisms and m6A methylation of MDA5 should be excluded from this manuscript and instead be included in a separate study dedicated to exploring this specific topic.

      Our theme aims to showcase RNA viruses, rather than an interaction between 5'ppp-RNA and host virus receptors, which our current topic cannot accurately express. Therefore, we made two main changes. Firstly, we limited the study species to M. miiuy, although some studies on the functional substitution of MDA5 for RIG-I involved birds. Secondly, we changed “5’ppp-RNA” to “5’ppp-RNA virus”. We believe that the revised title is more in line with our current research content.

      (15) The running title appears to be hastily done.

      We modified it to “MDA5 recognizes 5’ppp-RNA virus in species lacking RIG-I”.

      (16) There are many descriptions that are not strongly related to the main theme of the article in the introduction section, making it lengthy and fragmented. Please focus on the research background of RIG-I and MDA5, including their structures, functions, and regulatory mechanisms, as well as the research progress on the compensatory effect of MDA5 in the absence of RIG-I and its evolutionary adaptation mechanism in other species.

      Based on the suggestions of the reviewers, we have removed some of the less relevant content in the introduction and added research progress on the compensatory effect of MDA5 in the evolutionary adaptation mechanism of tree shrews in the absence of RIG-I.

      (17) Lines 149-156 in the "Results" section include content that resembles an "Introduction". It is important to avoid duplicating information in the results section. Therefore, the authors are encouraged to revise this paragraph to ensure conciseness in the article.

      We have streamlined this section to enhance the article's conciseness and clarity.

      (18) In the "Results" section, at line 177, the authors assert, "As depicted in Figure 1F-1H," which should be corrected to Figure 2F-2H. Furthermore, the y-axis of the two figures on the right-hand side of Figure 2H represents the ISG15 genes. At line 182, "as demonstrated in Figure 1I-1L," should be revised as "as illustrated in Figure 2I-2L". The authors demonstrated a lack of attention to detail.

      Thank you to the reviewer for pointing out our errors, and we have made the necessary corrections.

      (19) In lines 197-198, the authors stated that "MDA5-ΔRD showed an inability to interact with SCRV." However, Figure 3D did not reveal any significant difference, thus it is advisable to repeat this experiment at least once.

      We have replaced this virus spot image with a new one (Figure 3D).

      (20) In lines 200-201 of the "2.3 RD domain is required for MDA5 to recognize SCRV" section, the authors report that the expression of antiviral genes was induced by the overexpression of both MDA5 and MDA5-ΔRD, even in the absence of infection (Figure 3F). Why does the expression of antiviral genes increase in the absence of viral RNA stimulation? Please provide a reasonable explanation.

      In the absence of viral infection, overexpression of viral receptor proteins may still transmit erroneous signaling, affecting the body's immunity. We speculate that because MDA5 and MDA5-ΔRD both retain the CARD domain, they can still induce the expression of antiviral factors without ligands, although this induction effect is much smaller than that of viral infection. However, in order to better demonstrate the function of the RD domain of MDA5 in M. miiuy, we have changed the experimental plan, as shown in Figure 3F. We detected the induction of antiviral factors by overexpression of MDA5 and MDA5-ΔRD under poly(I:C)-HMW stimulation. This indicates that the RD domain has a conserved function in the recognition of poly(I:C)-HMW in M. miiuy, and can serve as a positive control for the recognition of SCRV virus invasion by the RD domain of MDA5.

      (21) Please provide the GenBank accession number of M. miiuy MDA5.

      The GenBank accession number of M. miiuy MDA5 has been added to section 4.5 (Plasmids construction).

      (22) The content of lines 228-233 in the "Results" section bears resemblance to that of the "Introduction." To ensure the avoidance of information duplication, it is recommended to remove this paragraph from the results section.

      This section has been streamlined.

      (23) The bands of mmiMDA5 in the 5'ppp-RNA and dsRNA lanes in Figure 4B are weak and almost unobservable. Please replace them with clear images.

      We have rerun this experiment and replaced the images (Figure 4B).

      (24) In Figure 5G and at line 253, there are only results presented for the SCRV infection group, while no results are shown for the control group. This raises the question of why the control group results are missing. It is necessary to provide a reasonable explanation or correction for this issue.

      The "0 h" infection time point of the SCRV virus is the control group, and we have replaced it with a more intuitive image (Figure 5G).

      (25) In Figure 7C, it would be necessary to include the western blot result of YTHDF protein expression in order to verify the efficiency of YTHDF siRNA.

      In fact, we have attempted to detect the endogenous expression of YTHDF protein using available commercial antibodies. Unfortunately, only the YTHDF2 antibody can specifically recognize the endogenous protein expression of YTHDF2 in M. miiuy. In addition, the knockdown effect of si-YTHDF2 has been validated by YTHDF2 antibody (doi: 10.4049/jimmunol.2200618).

      (26) In line 422 of the "4.3 Cell culture and treatment" section, the paragraph raises a question regarding the nature of Miiuy croaker kidney cells (MKCs) and spleen cells (MPCs) - whether they are cell lines or freshly isolated cells (or primary cultures) derived from kidney and spleen tissues. If these cells are indeed cell lines, it is requested to provide detailed information about the sources and properties of the cells (such as whether they are epithelial cells or other mixed cell types) and the generations of propagation. Alternatively, if the cells were freshly isolated or primary cultures obtained from fish, the method for cell isolation should be provided. The source and stability of cells are extremely important for ensuring the repeatability and reliability of experimental outcomes.

      M. miiuy kidney cells (MKCs) and spleen cells (MPCs) are cell lines derived from the kidney and spleen tissues of M. miiuy, with passages ranging from 20 to 40 times. These details have been incorporated into section 4.3.

      (27) There are many inaccurate descriptions in the text, which employ concepts that are too broad. These descriptions need to be narrowed down to specific species or objects. Here are a few examples, along with the necessary revisions. Other similar instances should also be revised accordingly. For instance, in line 119, "fish MDA5" should be changed to "Miiuy croaker MDA5." Similarly, in line 166, "fish MDA5-mediated signaling pathway" should be changed to "Miiuy croaker MDA5-mediated signaling pathway." In line 174, "fish MDA5" should be revised to "Miiuy croaker MDA5." Additionally, in line 185, "antiviral responses of teleost" should be changed to "antiviral responses of Miiuy croaker." In line 197, "interact with SCRV" should be revised to "interact with 5'ppp-RNA of SCRV." In line 337, "loss of RIG-I in the vertebrate" should be modified to "loss of RIG-I in Miiuy croaker and chicken." Similarly, in line 338, "MDA5 of fish" should be changed to "MDA5 of Miiuy croaker." Lastly, in line 348, "RIG-I deficient vertebrates" should be revised to "RIG-I deficient Miichthys miiuy and Gallus gallus."

      Thank you for the reviewer's suggestions. We have made revisions to these inaccurate descriptions and reviewed the entire manuscript to address similar statements with broad concepts.

      (28) Finally, it should be noted that a similar discovery has already been reported in tree shrews (Ling Xu, et al., Proc Natl Acad Sci., 2016, 113(39):10950-10955). This article shares similarities with that research report, therefore it is necessary to discuss in detail the relationship between the two in the discussion and compare and analyze the evolutionary patterns of MDA5 from it.

      Based on the reviewer's suggestions, we have compared the similarities and differences between these two reports during the discussion and analyzed the evolutionary dynamics of MDA5 in these vertebrates lacking RIG-I.

      Minor concerns:

      We thank the reviewer for their meticulous examination of our manuscript. We have made revisions in response to the following suggestions.

      (1) At line 120, the sentence "SCRV(one 5'ppp-RNA virus)" should have a space between "SCRV" and "(one 5'ppp-RNA virus)". Please make this correction.

      Corrected.

      (2) At lines 147-148, the sentence "However, the downstream gene of TOPORSa is missing a RIG-I" is not accurate and needs modification.

      We have modified this sentence.

      (3) At line 184, "findings indicate" should be corrected to "findings indicated".

      Corrected.

      (4) At line 189, "a 5'ppp-RNA virus" should be deleted, as the text seems redundant.

      Deleted.

      (5) At line 198, "replication. (Figure 3C-3E)", please remove the punctuation between "replication" and "(Figure 3C-3E)".

      Corrected.

      (6) At line 416 in "Materials and methods" section, "4.2 Sample and challenge" should be corrected to "4.2 Fish and challenge".

      Corrected.

      (7) At line 419, the authors state that "The experimental procedure for SCRV infection was performed as described", please briefly describe the SCRV infection method and the infectious dose.

      Based on the reviewer's suggestions, we have added relevant descriptions of SCRV infection in section 4.2.

      (8) There are several formatting issues in the "Materials and Methods" section. For instance, in line 424, there is no space between the number and letter in "100 μg/ml" and "26 ℃" should be corrected to "26℃". Additionally, in line 430, "Cells" should be corrected to "cells".

      Corrected.

      (9) At line 446, "50 ng/ul" and "100 mU/ul" should be corrected to "50 ng/μl" and "100 mU/μl".

      Corrected.

      (10) At line 459, "primers 1)" should be corrected to "primers".

      Corrected.

      (11) At lines 461-464, in the description "For protein purification, MDA5 plasmids with 6× His tag was constructed based on pcDNA3", there seems to be no direct logical connection between protein purification and the plasmid construction. Please make the necessary corrections.

      Corrected.

      (12) At line 548, "cytoplasmic" should be corrected to "Cytoplasmic".

      Corrected.

      (13) At line 549, "5× 10⁷" should be corrected to "5 × 10⁷".

      Corrected.

      (14) At line 557, "MgCl2" should be corrected to "MgCl₂".

      Corrected.

      (15) At line 558, "6 %" should be corrected to "6%".

      Corrected.

      (16) At line 565, "50μg" should be corrected to "50 μg".

      Corrected.

      (17) At line 571, "300±50 bp." should be corrected to "300 ± 50 bp."

      Corrected.

      (18) At lines 592-593, the sentence "After several incubations, the m6A level was quantified colorimetrically at a wavelength of 450 nm" does not read smoothly, please improve it.

      Revised.

      (19) At line 786, "MDA5 recognize" should be corrected to "MDA5 recognized".

      Corrected.

      (20) At lines 788 and 798, "Pulldown" should be corrected to "Pull-down".

      Corrected.

      (21) At lines 790 and 796, "bluestaining" should be corrected to "blue staining".

      Deleted.

      (22) At line 825, "SCRV and infection" should be corrected to "SCRV infection".

      Corrected.

      (23) At lines 826-827, "SCRV (H) and poly(I:C) (I) infection" should be corrected to "SCRV infection (H) and poly(I:C) stimulation (I)".

      Corrected.

    1. eLife assessment

      This article presents important results describing how the gathering, integration, and broadcasting of information in the brain changes when consciousness is lost either through anesthesia or injury. They provide convincing evidence to support their conclusions, although the paper relies on a single analysis tool (partial information decomposition) and could benefit from a clearer explication of its conceptual basis, methodology, and results. The work will be of interest to both neuroscientists and clinicians interested in basic and clinical aspects of consciousness.

    1. Author response:

      We thank the reviewers for their help and their suggestions to make this manuscript more rigorous. We would like to post provisional author responses when eLife publishes the reviewed preprint; more detailed responses will be provided with the revised manuscript.

      • There are questions about choices made in the computational approach (architecture and type of generative model, training set).

      We will train a new generator model based on the current GAN architecture, but with ‘hybrid’ AMP/AVP training sets (Reviewer 1 and 3). Hence, we can directly compare the performance of the two generators. Based on our preliminary data, providing the GAN with more AVP sequences during training helped the designed peptides pass the AVP filter, at the cost of reducing the average AMPredictor scores. The new generator also increased the diversity of designed sequences.

      We also perturbed the detailed architecture of our deep learning models, including fully-connected graph edge encodings and different versions of ESM (e.g. esm1b_t33_650M_UR50S, esm2_t48_15B_UR50D, Reviewer 2). In the revised manuscript, we will report the effects of these modifications and suggest that the overall construct of GCN and GAN is suitable for a light-weight sequence label model, as demonstrated in Author response tables 1 and 2. For the generator, we suggest that, using our approach, we may have reached a plateau for the GAN sampling (Author response table 3).

      Author response table 1.

      Results of AMPredictor with different graph edge encodings

      Author response table 2.

      Results of AMPredictor with different ESM versions

      Author response table 3.

      Evaluation of generated sequences with different sampling numbers

      • There is an important concern about the small number of antimicrobial peptides tested, compared to other studies, and the origin of antiviral activities.

      We will address this concern by increasing the number of peptides tested in anti-microbial and anti-viral experiments. As reported in the current version of our manuscript, the first generation of GAN generated 128 unique designs and the top 2% (3 designs) were tested experimentally. The second generation of GAN will produce ~1024 designs (1-2 weeks) and the top 2% (~20 new sequences) will be tested. We are in the process of synthesis (2-3 weeks) and MIC measurement (1 week). The overall size of the tested sample will reach 20-30 sequences. We will focus on sequences with low similarity (< 30%) to any known AMPs, thus expanding the universe of functional peptides. We estimate that collecting these new data will take 6 weeks.
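
      The selection step described above (rank the generated designs, keep the top 2%, and favour sequences with low similarity to known AMPs) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names, the assumption that a lower predicted MIC ranks higher, and the use of Python's `difflib` ratio as a crude stand-in for an alignment-based similarity are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def top_fraction(scored, fraction=0.02):
    """Keep the best-ranked designs.

    `scored` is a list of (name, predicted_mic) pairs. Treating a lower
    predicted MIC as better is an assumption of this sketch.
    """
    n_keep = max(1, round(len(scored) * fraction))
    ranked = sorted(scored, key=lambda item: item[1])
    return ranked[:n_keep]


def max_similarity(seq, known):
    """Highest similarity of `seq` to any known sequence (0.0 to 1.0).

    difflib's ratio is only a rough proxy for sequence identity; a real
    analysis would use an alignment tool instead.
    """
    return max(
        (SequenceMatcher(None, seq, k, autojunk=False).ratio() for k in known),
        default=0.0,
    )


def is_novel(seq, known, threshold=0.30):
    """True when `seq` stays below the similarity threshold to all known AMPs."""
    return max_similarity(seq, known) < threshold
```

      With 128 designs and a fraction of 0.02 this keeps 3 sequences, and with 1024 designs it keeps 20, in line with the counts quoted above.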

    2. eLife assessment

      This study presents a useful pipeline for de novo design of antimicrobial peptides active both against bacteria and viruses. The method is based on deep learning, using a GAN generator and a regression tasked to predict antimicrobial activity. The evidence supporting the conclusions is promising but incomplete: three generated peptides are studied experimentally in vitro, and one is then tested in vivo in mice; the comparisons to other design methods could also be strengthened. This work will be of interest to the community working on machine learning for biomedical applications and specifically on antimicrobial peptides.

    3. Reviewer #1 (Public Review):

      This manuscript presents a pipeline incorporating a deep generative model and peptide property predictors for the de novo design of peptide sequences with dual antimicrobial/antiviral functions. The authors synthesized and experimentally validated three peptides designed by the pipeline, demonstrating antimicrobial and antiviral activities, with one leading peptide exhibiting antimicrobial efficacy in animal models. However, the manuscript, as it stands, has several major limitations on the computational side.

      Major issues:

      (1) The choice of GAN as the generative model. There are multiple deep generative frameworks (e.g., language models, VAEs, and diffusion models), and GANs are known for their training difficulty and mode collapse. Could the authors elaborate on the specific rationale behind choosing GANs for this task?

      (2) The pipeline is supposed to generate peptides showing dual properties. Why were antiviral peptides not used to train the GAN? Would adding antiviral peptides into the training lead to a higher chance of getting antiviral generations?

      (3) For the antimicrobial peptide predictor, where were the contact maps of peptides sourced from?

      (4) Morgan fingerprint can be used to generate amino acid features. Would it be better to concatenate ESM features with amino acid-level fingerprints and use them as node features of GNN?

      (5) Although the number of labeled antiviral peptides may be limited, the input features (ESM embeddings) should be predictive enough when coupled with shallow neural networks. Have the authors tried simple GNNs on antiviral prediction and compared the prediction performance to those of existing tools?

      (6) Instead of using global alignment to get match scores, the authors should use local alignment.

      (7) How novel are the validated peptides? The authors should run a sequence alignment to get the most similar known AMP for each validated peptide, and analyze whether they are similar.

      (8) Only three peptides were synthesized and experimentally validated. This is too few and unacceptable in this field currently. The standard is to synthesize and characterize several dozens of peptides at the very least to have a robust study.
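
      To illustrate the distinction raised in point (6), the sketch below scores the same pair of sequences with a global (Needleman-Wunsch) and a local (Smith-Waterman) alignment. It is a toy example under stated assumptions: the scoring values (+1 match, -1 mismatch, -1 linear gap), the function name, and the example sequences are hypothetical and are not taken from the manuscript, which would more plausibly use a substitution matrix and an established alignment tool.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Best alignment score of `a` and `b` with a linear gap penalty.

    local=False: Needleman-Wunsch (global), both sequences aligned end to end.
    local=True:  Smith-Waterman (local), best-scoring pair of substrings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Global alignment penalizes leading gaps; local alignment starts at 0.
    if not local:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


# A short motif embedded in a longer sequence: the local score rewards the
# shared motif, while the global score is dominated by end gaps.
motif = "KLLKK"
longer = "GGGGKLLKKGGGG"
local_score = align_score(motif, longer, local=True)
global_score = align_score(motif, longer)
```

      For sequences of very different lengths, the global score mostly reflects the length difference, which is one reason a local score can be more informative when looking for a shared motif.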

    4. Reviewer #2 (Public Review):

      Summary:

      This study marks a noteworthy advance in the targeted design of AMPs, leveraging a pioneering deep-learning framework to generate potent bifunctional peptides with specificity against both bacteria and viruses. The introduction of a GAN for generation and a GCN-based AMPredictor for MIC predictions is methodologically robust and a major stride in computational biology. Experimental validation in vitro and in animal models, notably with the highly potent P076 against a multidrug-resistant bacterium and P002's broad-spectrum viral inhibition, underpins the strength of their evidence. The findings are significant, showcasing not just promising therapeutic candidates, but also demonstrating a replicable means to rapidly develop new antimicrobials against the threat of drug-resistant pathogens.

      Strengths:

      The de novo AMP design framework combines a generative adversarial network (GAN) with an AMP predictor (AMPredictor), which is a novel approach in the field. The integration of deep generative models and graph-encoding activity regressors for discovering bifunctional AMPs is cutting-edge and addresses the need for new antimicrobial agents against drug-resistant pathogens. The in vitro and in vivo experimental validations of the AMPs provide strong evidence to support the computational predictions. The successful inhibition of a spectrum of pathogens in vitro and in animal models gives credibility to the claims. The discovery of effective peptides, such as P076, which demonstrates potent bactericidal activity against multidrug-resistant A. baumannii with low cytotoxicity, is noteworthy. This could have far-reaching implications for addressing antibiotic resistance. The demonstrated activity of the peptides against both bacterial and viral pathogens suggests that the discovered AMPs have a wide therapeutic potential and could be effective against a range of pathogens.

    5. Reviewer #3 (Public Review):

      Summary:

      Dong et al. described a deep learning-based framework of antimicrobial (AMP) generator and regressor to design and rank de novo antimicrobial peptides (AMPs). For generated AMPs, they predicted their minimum inhibitory concentration (MIC) using a model that combines the Morgan fingerprint, contact map, and ESM language model. For their selected AMPs based on predicted MIC, they also use a combination of antiviral peptide (AVP) prediction models to select AMPs with potential antiviral activity. They experimentally validated 3 candidates for antimicrobial activity against S. aureus, A. baumannii, E. coli, and P. aeruginosa, and their toxicity on mouse blood and three human cell lines. The authors select their most promising AMP (P076) for in vivo experiments in A. baumannii-infected mice. They finally test the antiviral activity of their 3 AMPs against viruses.

      Strengths:

      -The development of de novo antimicrobial peptides (AMPs) with the novelty of being bifunctional (antimicrobial and antiviral activity).

      -Novel, combined approach to AMP activity prediction from their amino acid sequence.

      Weaknesses:

      -I missed justification on why training AMPs without information of their antiviral activity would generate AMPs that could also have antiviral activity with such high frequency (32 out of 104).

      -The justification for AMP predictor advantages over previous tools lacks rationale, comparison with previous tools (e.g., with the very successful AMP prediction approach described by Ma et al. 10.1038/s41587-022-01226-0), and proper referencing.

      -Experimental validation of three de novo AMPs is a very low number compared to recent similar studies.

      -I have concerns regarding the in vivo experiments, including i) the short period of reported survival compared to recent studies (10.1038/s41587-022-01226-0, 10.1016/j.chom.2023.07.001, 10.1038/s41551-022-00991-2) and ii) although statistics have been provided in Figure 2 f and g, a log-scale y-axis would provide a better comparative representation of the different conditions.

      -I had difficulty reading the story because of the use of acronyms without referring to their full name for the first time, and incomplete annotation in figures and captions.

    1. eLife assessment

      This fundamental study addresses discrepancies in determining bacterial burden in osteomyelitis as determined by culture and enumeration using DNA. The authors present compelling data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in Staphylococcus aureus strains infecting osteocyte-like cells. The observations represent a substantial addition to the field of musculoskeletal infection, with possible broad applicability and clinical benefit to other infectious diseases.

    2. Reviewer #1 (Public Review):

      Summary:

      This work shows, based on basic laboratory investigations of in vitro grown bacteria as well as human bone samples, that conventional bacterial culture can substantially underrepresent the quantity of bacteria in infected tissues. This has often been mentioned in the literature, however, relatively limited data has been provided to date. This manuscript compares culture to a digital droplet PCR approach, which consistently showed greater levels of bacteria across the experiments (and for two different strains).

      Strengths:

      Consistency of findings across in vitro experiments and clinical biopsies. There are real-world clinical implications for the findings of this study.

      Weaknesses:

      No major weaknesses. Only 3 human samples were analyzed, although the results are compelling.

    3. Reviewer #2 (Public Review):

      In this study, the authors address discrepancies in determining the local bacterial burden in osteomyelitis between that determined by culture and enumeration by DNA-directed assay. Discrepancies between culture and other means of bacterial enumeration are long established and highlighted by Staley and Konopka's classic, "The great plate count anomaly" (1985). Here, the authors first present data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in S. aureus strains infecting osteocyte-like cells. They go on to demonstrate PCR evidence that S. aureus can be detected in bone samples from sites meeting a widely accepted clinico-pathological definition of osteomyelitis. They conclude their approach offers advantages in quantifying intracellular bacterial load in their in vitro "co-culture" system.

      WEAKNESSES

      (A) My main concern here is the significance of these results outside the model osteocyte system used by this group. Although they carefully avoid over-interpreting their results, there is a strong undercurrent suggesting their approach could enhance aetiologic diagnosis in osteomyelitis and that enumeration of the infecting pathogen might have clinical value. In the first place molecular diagnostics such as 16S rDNA-directed PCR are well established in identifying pathogens that don't grow. Secondly, it is hard to see how enumeration could have value beyond in vitro and animal model studies since serial samples will rarely be available from clinical cases.

      (B) I have further concerns regarding interpretation of the combined bacterial and host cell-directed PCRs against the CFU results. Significance is attached to the relatively sustained genome counts against CFU declines. On the one hand it must be clearly recognised that detection of bacterial genomes does not equate to viable bacterial cells with potential for further replication or production of pathogenic factors. Of equal importance is the potential contribution of extracellular DNA from lysed bacteria and host cells to these results. The authors must clarify what steps, if any, they have taken to eliminate such contributions for both bacteria and host cells. Even the treatment with lysostaphin may have coated their osteocyte cultures with bacterial DNA, contributing downstream to the ddPCR results presented.

      STRENGTHS

      (C) On the positive side, the authors provide clear evidence for the value of the direct buffer extraction system they used as well as confirming the utility of ddPCR for quantification. In addition, the successful application of MinION technology to sequence the EF-Tu amplicons from clinical samples is of interest.

      (D) Moreover, the phenomenology of the infection studies indicating greater DNA than CFU persistence and differences between the strains and the different MOI inoculations are interesting and well-described, although I have concerns regarding interpretation.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This work shows, based on basic laboratory investigations of in vitro-grown bacteria as well as human bone samples, that conventional bacterial culture can substantially underrepresent the quantity of bacteria in infected tissues. This has often been mentioned in the literature, however, relatively limited data has been provided to date. This manuscript compares culture to a digital droplet PCR approach, which consistently showed greater levels of bacteria across the experiments (and for two different strains).

      Strengths:

      Consistency of findings across in vitro experiments and clinical biopsies. There are real-world clinical implications for the findings of this study.

      Weaknesses:

      No major weaknesses. Only three human samples were analyzed, although the results are compelling.

      We only put in three examples of clinical diagnosis to showcase the application of this method particularly to osteomyelitis. For further validation, larger cohort studies are required, which are currently underway.

      Reviewer #2 (Public Review):

      In this study, the authors address discrepancies in determining the local bacterial burden in osteomyelitis between that determined by culture and enumeration by DNA-directed assay. Discrepancies between culture and other means of bacterial enumeration are long established and highlighted by Staley and Konopka's classic, "The great plate count anomaly" (1985). Here, the authors first present data demonstrating the emergence of discrepancies between CFU counts and genome copy numbers detected by PCR in S. aureus strains infecting osteocyte-like cells. They go on to demonstrate PCR evidence that S. aureus can be detected in bone samples from sites meeting a widely accepted clinicopathological definition of osteomyelitis. They conclude their approach offers advantages in quantifying intracellular bacterial load in their in vitro "co-culture" system.

      The publication related to “The great plate count anomaly (1985)” has been added to revised version as new reference #2.

      Weaknesses

      - My main concern here is the significance of these results outside the model osteocyte system used by this group. Although they carefully avoid over-interpreting their results, there is a strong undercurrent suggesting their approach could enhance aetiologic diagnosis in osteomyelitis and that enumeration of the infecting pathogen might have clinical value. In the first place, molecular diagnostics such as 16S rDNA-directed PCR are well established in identifying pathogens that don't grow. Secondly, it is hard to see how enumeration could have value beyond in vitro and animal model studies since serial samples will rarely be available from clinical cases.

      Indeed, we initiated this study for the purpose of trying to improve the diagnostic outcomes for osteomyelitis, in particular that associated with prosthetic joint infection (PJI) but also all other forms, as the current gold-standard diagnostic approaches for this type of infection, either bacterial culture or whole genome sequencing, are very time consuming and costly, and yet are not necessarily accurate. Our method has the benefits (not limited to) of achieving absolute quantification of bacterial load in a shortened time period (in the order of hours) in clinical bone specimens from infected patients. Many of the identified bacterial species in patients were not able to be diagnosed by standard bacterial culturing. Moreover, one of the problematic features of treating bone infection is that repetitive surgeries are usually needed, particularly in PJI, hence, serial clinical bone specimens from the same patient are in fact often available. Therefore, our method of being able to quantify bacterial load offers the advantage of monitoring the infected status throughout the treatment journey. In this study, we chose the tuf gene as the targeting sequence to amplify the bacterial signal instead of the well-established 16S PCR for the reason that tuf provides much better sequence discrimination between bacterial species. Therefore, the short PCR amplicon of just 271 bp used in our study, is able to give us a highly accurate taxonomic readout. By this approach, we again shorten the time required for diagnosis. In the last paragraph of the Discussion in the revised manuscript, extra text, a figure demonstrating the strong sequence diversity in tuf (Supplementary Figure 2) and an additional reference have been added to address the Reviewer’s concerns.

      - I have further concerns regarding the interpretation of the combined bacterial and host cell-directed PCRs against the CFU results. Significance is attached to the relatively sustained genome counts against CFU declines. On the one hand, it must be clearly recognised that the detection of bacterial genomes does not equate to viable bacterial cells with the potential for further replication or production of pathogenic factors. Of equal importance is the potential contribution of extracellular DNA from lysed bacteria and host cells to these results. The authors must clarify what steps, if any, they have taken to eliminate such contributions for both bacteria and host cells. Even the treatment with lysostaphin may have coated their osteocyte cultures with bacterial DNA, contributing downstream to the ddPCR results presented.

      We agree that concerns around the interpretation of any molecular readout need to be taken into account. We have yet to find a method that can definitively identify bacterial viability in a clinical setting in the absence of culture. However, PJI and osteomyelitis in general are characterised by a high percentage of culture-negative infection cases, calling for such molecular approaches. Commercially available, so-called “live/dead” bacterial PCR reagents exist that act as PCR signal inhibitors by penetrating the cell wall of compromised cells to prevent the PCR signal being generated from those cells. In our experience, while these can provide a certain level of added scrutiny in an experimental setting, they are not definitive because the reaction is often incomplete even in an idealised situation, and the reagent may also cancel signal from viable bacteria growing under conditions of stress, such as during antimicrobial treatment and host-derived stress imparted in intracellular or intra-tissue environments. Indeed, such stresses are likely contributors to clinical non-culturability. Whole genome sequencing would provide more certainty of bacterial viability by demonstrating genomic intactness but, as we discuss herein, this is a lengthy and costly process, and one which may prove difficult from host tissue with a low pathogen load. It should be noted that the significance of any diagnostic readout, including from culture, WGS or our method reported here, would need to be interpreted by the treating clinical team. We would argue that a rapid, practical molecular diagnostic method in the absence or even presence of culture would provide treating clinicians with an improved rationale for tailoring antimicrobial treatments.

      Strengths

      - On the positive side, the authors provide clear evidence for the value of the direct buffer extraction system they used as well as confirming the utility of ddPCR for quantification. In addition, the successful application of MinION technology to sequence the EF-Tu amplicons from clinical samples is of interest.

      - Moreover, the phenomenology of the infection studies indicating greater DNA than CFU persistence and differences between the strains and the different MOI inoculations are interesting and well-described, although I have concerns regarding interpretation.

    1. eLife assessment

      This manuscript by Vuong and colleagues reports on the kinetics of viremia in a large set of individuals from Vietnam. In the large cohort, all 4 dengue serotypes are represented and the authors try to correlate viraemia measured at various days from illness onset with thrombocytopaenia and severe dengue, according to the WHO 2009 classification scheme. These are fundamental findings that provide compelling evidence of the importance of measuring viremia early in the phase of the disease. These data will help to inform the design of studies of antiviral drugs against dengue.

    2. Reviewer #1 (Public Review):

      Summary:

      This manuscript by Vuong and colleagues reports a study that pooled data from 3 separate longitudinal studies that collectively spanned an observation period of over 15 years. The authors examined for correlation between viraemia measured at various days from illness onset with thrombocytopaenia and severe dengue, according to the WHO 2009 classification scheme. The motivation for this study is both to support the use of viraemia measurement as a prognostic indicator of dengue and also to, when an antiviral drug becomes licensed for use, guide the selection of patients for antiviral therapy. They found that the four DENVs show differences in peak and duration of viraemia and that viraemia levels before day 5 but not those after from illness onset correlated with platelet count and plasma leakage at day 7 onwards. They concluded that the viraemia kinetics call for early measurement of viraemia levels in the early febrile phase of illness.

      Strengths:

      This is a unique study due to the large sample size and longitudinal viraemia measurements in the study subjects. The data addresses a gap in information in the literature, where although it has been widely indicated that viraemia levels are useful when collected early in the course of illness, this is the first time anyone has systematically examined this notion. The inclusion of correlation between rate of viraemia decline and risk of severe dengue/plasma leakage further strengthens the relevance of this paper to those interested in anti-dengue therapeutic research and development.

      Weaknesses:

      The study only analysed data from dengue patients in Vietnam. Moreover, the majority of these patients had DENV-1 infection; few had DENV-4 infection. The data could thus be skewed by the imbalance in the prevalence of the different types of DENV during the period of observation. The use of patient-reported time of symptom onset as a reference point for viraemia measurement is pragmatic although there is subjectivity and thus noise in the data.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors have carried out a comprehensive analysis regarding the kinetics of viraemia and clinical disease severity.

      Strengths:

      The manuscript provides important information, especially regarding the time of clearance of the virus and disease severity.

      Weaknesses:

      Due to the lower number of patients with primary dengue, it is not possible to get an idea of viraemia kinetics and disease severity for the different serotypes during primary infection.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Vuong and colleagues reports a study that pooled data from 3 separate longitudinal studies that collectively spanned an observation period of over 15 years. The authors examined for correlation between viraemia measured at various days from illness onset with thrombocytopaenia and severe dengue, according to the WHO 2009 classification scheme. The motivation for this study is both to support the use of viraemia measurement as a prognostic indicator of dengue and also when an antiviral drug becomes licensed for use, to guide the selection of patients for antiviral therapy. They found that the four DENVs show differences in peak and duration of viraemia and that viraemia levels before day 5 but not those after from illness onset correlated with platelet count and plasma leakage at day 7 onwards. They concluded that the viraemia kinetics call for early measurement of viraemia levels in the early febrile phase of illness.

      Strengths:

      This is a unique study due to the large sample size and longitudinal viraemia measurements in the study subjects. The data addresses a gap in information in the literature, where although it has been widely indicated that viraemia levels are useful when collected early in the course of illness, this is the first time anyone has systematically examined this notion.

      Weaknesses:

      The study only analysed data from dengue patients in Vietnam. Moreover, the majority of these patients had DENV-1 infection; few had DENV-4 infection. The data could thus be skewed by the imbalance in the prevalence of the different types of DENV during the period of observation. The use of patient-reported time of symptom onset as a reference point for viraemia measurement is pragmatic although there is subjectivity and thus noise in the data.

      We acknowledge and appreciate your comments regarding the limitations of our study, including the pooled data from Vietnam and the use of symptom onset as a reference point for viremia kinetics. These points have been incorporated into the “Limitations” section.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript highlights very important findings in the field, especially in designing clinical trials for the evaluation of antivirals.

      Strengths:

      The study shows significant differences between the kinetics of viral loads between serotypes, which is very interesting and should be taken into account when designing trials for antivirals.

      Weaknesses:

      The kinetics of the viral loads based on disease severity throughout the illness are not described, and it would be important if this could be analyzed.

      In response to your suggestion, we have expanded our analysis to investigate the relationship between the rate of viremia decline and clinical outcomes. Our findings demonstrate that a faster rate of viremia decline is associated with a reduced risk of severe clinical outcomes. We have incorporated this new analysis into the revised manuscript, providing further details in the “Statistical Analysis” section (page 7) and presenting the results on pages 15 and in Figure 6.

      Reviewer #1 (Recommendations For The Authors):

      Several areas require additional attention. I have limited my comments on the findings as I am not a mathematician and cannot knowledgeably comment on the statistical modelling methods.

      Comment #1: Lines 83-84. Although viraemia level shows declining trends from illness onset and thus lessens its prognostic value, it remains unknown if a more rapid rate of decline in viraemia is associated with a reduced risk of severe dengue. This is the fundamental premise of antiviral drug development for the treatment of dengue. The authors are uniquely poised to show if this logic that underpins antiviral development is likely correct and perhaps even estimate the extent to which a decline in viraemia needs to occur for a measurable reduction in the risk of severe dengue. Could the authors consider such an analysis?

      We appreciate your valuable suggestion. In response, we have expanded our analysis to investigate the relationship between the rate of viremia decline and clinical outcomes. Utilizing a model of viremia kinetics with the assumption of a linear log-10 viremia decrease over time, we calculated the rate of decline for each patient. Our findings demonstrate that a faster rate of viremia decline is associated with a significantly reduced risk of severe clinical outcomes. We have incorporated this new analysis into the revised manuscript, providing further details in the “Statistical Analysis” section (page 7) and presenting the results on pages 15 and in Figure 6.
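
      As an illustration only, the per-patient rate of decline under a linear log-10 assumption could be sketched as below. This is not the authors' model: it uses ordinary least squares on each patient's own measurements, and the function name `decline_rates`, the input layout, and the handling of patients with fewer than two time points are all assumptions made for the example. Measurements below the assay detection limit are not treated specially here.

```python
from collections import defaultdict


def decline_rates(measurements):
    """Estimate a per-patient slope of log10 viremia over time.

    measurements: iterable of (patient_id, day_of_illness, log10_viremia).
    Returns {patient_id: slope}, in log10 units per day; a negative slope
    means viremia is declining. Hypothetical helper, ordinary least squares.
    """
    by_patient = defaultdict(list)
    for patient_id, day, log_v in measurements:
        by_patient[patient_id].append((day, log_v))

    rates = {}
    for patient_id, points in by_patient.items():
        n = len(points)
        if n < 2:
            continue  # a slope needs at least two time points
        mean_day = sum(d for d, _ in points) / n
        mean_log = sum(v for _, v in points) / n
        sxx = sum((d - mean_day) ** 2 for d, _ in points)
        if sxx == 0:
            continue  # all samples taken on the same day
        sxy = sum((d - mean_day) * (v - mean_log) for d, v in points)
        rates[patient_id] = sxy / sxx
    return rates


example = [
    ("p1", 2, 8.0), ("p1", 3, 7.0), ("p1", 4, 6.0),
    ("p2", 2, 7.0), ("p2", 4, 6.0),
    ("p3", 3, 5.0),  # single measurement, no slope estimated
]
rates = decline_rates(example)
```

      In this toy input, patient p1 loses one log-10 unit per day and p2 half a unit per day, so p1 would count as the faster decliner. A pooled analysis would more plausibly estimate these slopes jointly, for example with patient-level random effects, rather than one regression per patient.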

      Comment #2: Lines 101-102. Studies A and B were conducted in parallel, and several patients enrolled in study A from primary healthcare clinics were eventually also enrolled in study B upon hospitalization. It would be helpful to know how many patients from study A were included in study B. It would also be useful for the authors to indicate if such inclusion would constitute double-counting at any point in their analyses.

      To address potential confusion regarding patient overlap between studies A and B, we have provided further clarification in the revised manuscript’s Legend of Figure 1. Among confirmed dengue patients, 31 individuals enrolled in study A were later included in study B upon hospitalization. Of these, 9 had viremia measurements available in both studies and were consequently analysed in study A only. The remaining 22 lacked viremia data in study A but had measurements in study B, leading to their inclusion in study B in the analysis. We have taken meticulous care to ensure no patient data is double-counted.

      Comment #3: Lines 126-127. The definition of probable primary and secondary dengue from IgG measurements needs more detail. How was the anti-DENV IgG ELISA data from paired sera interpreted?

      To ensure clarity, we have moved the definitions of probable primary and secondary infections from the supplementary file (Appendix 2) to the main text of the revised manuscript (Methods section – Plasma viremia measurement, dengue diagnostics, and clinical endpoints – page 6): “A probable primary infection was defined by two negative/equivocal IgG results on separate samples taken at least two days apart within the first ten days of symptom onset, with at least one sample during the convalescent phase (days 6-10). A probable secondary infection was defined by at least one positive IgG result during the first ten days. Cases without time-appropriate IgG results were classified as indeterminate.”
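
      The quoted definition can be read as a small decision rule. The sketch below is one possible encoding, not the authors' code: the function name `classify_immune_status`, the result labels, the choice to number the day of symptom onset as day 1, and the choice to let any positive result take precedence are assumptions made for the example.

```python
def classify_immune_status(igg_results):
    """Classify a case from anti-DENV IgG results.

    igg_results: list of (day_of_illness, result), where result is one of
    "positive", "negative" or "equivocal". Hypothetical helper.
    """
    # Only results from the first ten days of illness are considered.
    window = [(day, res) for day, res in igg_results if 1 <= day <= 10]

    # At least one positive result in the window: probable secondary.
    if any(res == "positive" for _, res in window):
        return "probable secondary"

    # Two negative/equivocal results at least two days apart, with at
    # least one of them in the convalescent phase (days 6-10).
    days = sorted(day for day, res in window if res in ("negative", "equivocal"))
    for i, first in enumerate(days):
        for second in days[i + 1:]:
            far_enough = second - first >= 2
            convalescent = 6 <= first <= 10 or 6 <= second <= 10
            if far_enough and convalescent:
                return "probable primary"

    # No time-appropriate results.
    return "indeterminate"
```

      For instance, a negative result on day 3 followed by an equivocal result on day 7 would be classified as probable primary, whereas two negative results on days 2 and 4 would stay indeterminate because neither falls in the convalescent phase.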

      Comment #4: Lines 230-232 and Figure 4. The findings reported in Figure 4 are curious. Why is the platelet count highest (significantly?) for DENV-1 compared to other DENV-type infections at low viraemia levels on LM days 1-3? Does that also mean that DENV-3 and -4 infections have a greater impact on platelet counts at days 7-10 than DENV-1 and -2?

      In our analyses, we allowed the relation between viremia and platelet count to differ by serotype. Figure 4 shows the highest platelet counts for DENV-1 compared to other serotypes, especially at low viremia levels. Apparently, while DENV-1 on average has higher viremia (Figure 3), the same viremia level in DENV-1 compared to other serotypes is associated with a less severe disease course and higher platelet count. This does not necessarily imply that platelet count overall, uncorrected for viremia level, differs by serotype. Indeed, our unpublished analysis (shown below) indicates a modest influence of serotype on platelet count.

      Author response image 1.

      Comment #5: Figure 5. In a recent paper (Vuong et al, Clin Infect Dis 2021), the authors show elegantly that the viraemia levels on admission correlated with severe dengue. However, these correlations were different for each of the four DENV types and whether the infection was primary or secondary. Why wasn't the analysis in Figure 5 further stratified by their probable primary or secondary dengue status?

      We appreciate your feedback and have stratified Figure 5 by serotype and immune status as suggested. Please note that due to the limited number of severe dengue in primary infections (only 1 case in DENV-1) and plasma leakage in primary DENV-4 (see Appendix 4-table 1), the estimated probability of having these outcomes is nearly zero across all viremia levels within these subgroups.

      Comment #6: Line 279. The description in this line is at odds with the data in Figure 3A, which shows that DENV-2 could be detected over a longer period than DENV-1 as the one-step RT-qPCR assay has a lower detection limit than DENV-1.

      In response to your feedback, we have revised the description to clarify that DENV-1 exhibits higher viremia levels compared to DENV-2 and DENV-3 in the revised manuscript (page 18).

      Reviewer #2 (Recommendations For The Authors):

      Introduction

      Comment #1: Line 56: the authors state that viraemia is associated with dengue disease severity and cite their previous results. They then summarize the results of this study and others. The highlights of this paper should be described in more detail. It is important that the authors state the conclusions of their own paper, including that the association was not very strong and that the viral loads were lowest with DENV2, but DENV2 was associated with more severe disease.

      Thank you for your comment. To improve the introduction’s flow, we have removed that sentence in line 56 of the manuscript and have added the weak association in the next paragraph (pages 3-4).

      Comment #2: It would be important to cite smaller studies that show a delay in clearance of the virus being associated with more severe disease outcomes.

      Thanks for your suggestion. We have added information to the introduction (page 4), highlighting a study which found a slower rate of viral clearance to be associated with more severe outcomes (Wang et al., 2008). However, other studies have shown no association (Vaughn et al., 2000; Fox et al., 2011). This lack of conclusive evidence underscores the need for further research.

      Methods

      Comment #3: The authors highlight the possible discrepancies in comparing viral kinetics of two RT-PCR methods. Although it is not ideal to combine such results, the authors have analyzed them separately, providing valuable data.

      We appreciate your comment.

      Comment #4: Which tests were used to define the immune status as primary and secondary? What were the definitions?

      We have moved the definitions of probable primary and secondary infections from the supplementary file (Appendix 2) to the main text of the revised manuscript (Methods section – Plasma viremia measurement, dengue diagnostics, and clinical endpoints – page 6): “A probable primary infection was defined by two negative/equivocal IgG results on separate samples taken at least two days apart within the first ten days of symptom onset, with at least one sample during the convalescent phase (days 6-10). A probable secondary infection was defined by at least one positive IgG result during the first ten days. Cases without time-appropriate IgG results were classified as indeterminate.”

      Results

      Comment #5: It is interesting that DENV2 showed the slowest decline, but yet associated with overall lower viral loads during early illness and more severe disease outcomes. Could delayed clearance of the virus be associated with disease severity?

      We have expanded our analysis to investigate the relationship between the rate of viremia decline and clinical outcomes. Utilizing a model of viremia kinetics with the assumption of a linear log-10 viremia decrease over time, we calculated the rate of decline for each patient. Our findings demonstrate that a faster rate of viremia decline is associated with a significantly reduced risk of severe clinical outcomes. We have incorporated this new analysis into the revised manuscript, providing further details in the “Statistical Analysis” section (page 7) and presenting the results on pages 15 and in Figure 6.

      Comment #6: Were there any differences in the kinetics of viral loads in children vs adults? I.e. children, young adults and older adults (>60 or 50?). Or were there insufficient numbers for this comparison?

      To address this point, we have modified the reported results of Figure 3-D to show ages of 5, 10, 15, 25, and 50 years, representing children, adolescents, young adults, and older adults. Our analysis shows that viremia kinetics are largely similar across ages.

      Comment #7: Did any patients have comorbidities such as diabetes, obesity etc... if so, were there any differences in the viral loads?

      We appreciate your interest in the potential impact of comorbidities on viral loads. However, due to data limitations, we were unable to analyze this association. Only 6 patients had documented diabetes in the pooled dataset. In study C, 39 patients had obesity, whereas body mass index data is not available for studies A and B, although reports suggest a lower prevalence of obesity compared to study C.

      Comment #8: Were there any differences in the kinetics of the overall viral loads between DF/DHF/DSS or dengue with warning signs, without warning signs and severe dengue? Especially related to the time for viral clearance?

      Thank you for your suggestion. Such an analysis reverses time and the causal direction, while we are more interested in looking forward. Therefore, instead of analyzing viremia kinetics based on disease severity, we have added an analysis to investigate the relationship between the rate of decline in viremia and clinical outcomes, as shown in the response to your comment #5. Results show that a more rapid rate of viremia decline is associated with a reduced risk of more severe clinical outcomes. In addition, in this study, we selected two clinical outcomes: severe dengue and plasma leakage. The definitions are based on the WHO 2009 guidelines and standard endpoint definitions for dengue trials (Tomashek et al., 2018).

    1. Reviewer #1 (Public Review):

      Summary:

      Authors previously demonstrated that species-specific variation in primate CD4 impacts its ability to serve as a functional receptor for diverse SIVs. Here, Warren and Barbachano-Guerrero et al. perform population genetics analyses and functional characterization of great ape CD4 with a particular focus on gorillas, which are natural hosts of SIVgor. They first used ancestral reconstruction to derive the ancestral hominin and hominid CD4. Using pseudotyped viruses representing a panel of envelopes from SIVcpz and HIV strains, they find that these ancestral reconstructions of CD4 are more similar to human CD4 in terms of being a broadly susceptible entry receptor (in the context of mediating entry into Cf2Th cells stably expressing human CCR5). In contrast, extant gorilla and chimpanzee CD4 are functional entry receptors for a narrower range of HIV and SIVcpz isolates. Based on these differences, authors next surveyed gorilla sequences and identified several CD4 haplotypes, specifically in the region encoding the CD4 D1 domain, which directly contacts the viral glycoprotein and thus may impact the interaction. Consistent with this possibility, authors demonstrated that gorilla CD4 haplotypes are, on average, less capable of supporting entry than human CD4, and that some are largely unable to function as SIV entry receptors. Interestingly, individual residues found at key positions in the gorilla CD4 D1 when tested in the context of human CD4 reduce entry of some virions pseudotyped with diverse SIVcpz envelopes, suggesting that individual amino acids can in part explain the observed differences across gorilla CD4 haplotypes. Finally, the authors perform statistical tests to infer that CD4 from great apes with endemic SIV (i.e., chimpanzees and gorillas) but not non-reservoirs (i.e., orangutans, bonobos) or recent spillover hosts (i.e., humans), have been subject to selection as a result of pressure from endemic SIV.

      The conclusions of this paper are mostly well supported by data.

      Strengths:

      (1) The functional assays are appropriate to test the stated hypothesis, and the authors use a broad diversity of envelopes from HIV and SIVcpz strains. Authors also partially characterize one potential mechanism of gorilla CD4 resistance - receptor glycosylation at the derived N15 found in 5/6 gorilla haplotypes.

      (2) Ancestral reconstruction provides a particularly interesting aspect of the study, allowing authors to infer the ancestral state of hominid CD4 relative to modern CD4 from gorillas and chimpanzees. This, coupled with evidence supporting SIV-driven selection of gorilla CD4 diversity and the characterization of functional diversity of extant haplotypes provides several interesting findings.

      Weaknesses:

      (3) The major inference of the work is that SIV infection of gorillas drove the observed diversity in gorilla CD4. This is supported by the majority of SNPs being localized to the CD4 D1, which directly interacts with envelope, and the demonstrated functional consequences of that diversity for viral entry. However, SIVgor (to the best of my knowledge) only infects Western lowland gorillas (Gorilla gorilla gorilla), and one Gorilla gorilla diehli and three Gorilla beringei graueri individuals were included in the haplotype and allele frequency analyses. The presence of these haplotypes or the presence of similar allele frequencies in Eastern lowland and mountain gorillas would impact this conclusion. It would be helpful for the authors to clarify this point.

      (4) The authors appear to use a somewhat atypical approach to assess intra-population selection to compensate for relatively small numbers of NHP sequences (Fig. 6). However, they do not cite precedence for the robustness of the approach or the practice of grouping sequences from multiple species for the endemic vs other comparison. They also state in the methods that some genes encoded in the locus were removed from the analysis "because they have previously been shown to directly interact with a viral protein." This seems to undercut the analysis, and prevents alternative explanations for the observed diversity in CD4 (e.g., passenger mutations from selection at a neighboring locus).

      (5) Data in Figure 5 is graphed as % infected cells instead of virus titer (TDU/mL). It's unclear why this is the case, and prevents a comparison to data in Figure 2 and Figure 4.

      (6) The lack of pseudotyping with SIVgor envelope is a surprising omission from this study, that would help to contextualize the findings. Similarly, building gorilla CD4 haplotype SNPs onto the hominin ancestor (as opposed to extant human CD4) may provide additional insights that are meaningful towards understanding the evolutionary trajectory of gorilla CD4.

      Comments on revised version:

      In the revised manuscript, the authors more appropriately contextualize conclusions that can be made based on their data versus inferences, which are now much more clearly described in the discussion. The authors also included more references to substantiate claims, additional description of methodology, and provided well-reasoned responses to the weaknesses described in my primary review.

      Re: #3. As the authors point out, we do not know if eastern gorillas were at one time exposed to SIV. The authors use a variety of phylogenetic and functional approaches to infer that SIVcpz is the selective pressure shaping gorilla CD4. While I agree this is a highly likely scenario, the allelic diversity of CD4 across gorilla subpopulations raises multiple evolutionary scenarios consistent with the data.

      Re: #4. The explanation provided by the authors is reasonable. However, a demonstration that this approach is robust to potential factors that might skew the data (e.g., recombination) is argued but not tested. Part of the concern here is that the study is limited by very small sample sizes, and to the best of my knowledge, grouping sequences from multiple species to make claims about selection is not an established practice. The authors note in their response that they confirmed the existence of CD4 alleles in this study with those identified in 100 gorilla individuals from Russell et al. 2021 (unavailable to the authors at the time of submission) - a re-analysis that includes that data from Russell et al. 2021 would have strengthened the analyses.

    2. eLife assessment

      This study presents an important finding on how lentiviral infection has driven the diversification of the HIV/SIV entry receptor CD4. Using a combination of molecular evolution approaches coupled with functional testing of extant and ancestral reconstructions of great ape CD4, the authors provide solid evidence to support the idea that endemic simian immunodeficiency virus infection in gorillas has selected for gorilla CD4 alleles that are more resistant to SIV infection. Expanding the study to interrogate the evolution and function of additional primate CD4 sequences could yield more convincing evidence.

    3. Reviewer #2 (Public Review):

      Lentiviral infection of primate species has been linked to the rapid mutational evolution of numerous primate genes that interact with these viruses, including genes that inhibit lentiviruses as well as genes required for viral infection. In this manuscript, Warren et al. provide further support for the diversification of CD4, the lentiviral entry receptor, to resist lentiviral infection in great ape populations. This work builds on their prior publication (Warren et al. 2019, PMCID: PMC6561292 ) and that of other groups (e.g., Russell et al. 2021, PMCID: PMC8020793; Bibollet-Ruche et al. 2019, PMCID: PMC6386711) documenting both sequence and functional diversity in CD4, specifically within (1) the CD4 domain that binds to the lentiviral envelope and (2) great ape populations with endemic lentiviruses. Thus, the paper's finding that gorilla populations exhibit diverse CD4 alleles that differ in their susceptibility to lentiviral infection is well demonstrated both here and in a prior publication.

      Strengths:

      By reconstructing the CD4 sequence from the ancestor of gorillas and chimpanzees, the authors document that modern species have evolved more resistance to (admittedly modern) lentiviruses. They also deconstruct the molecular basis of this resistance by showing that one mutation, which adds a glycosylation site to CD4, is sufficient to confer lentiviral resistance to the susceptible human allele.

      Weaknesses:

      Warren et al. also pursue two novel lines of evidence to suggest that lentiviruses are the causative driver of great ape CD4 diversification, which seems likely from a logical perspective but is difficult to prove. First, they demonstrate that resistance to lentiviral infection is a derived trait in chimpanzees and gorillas, which have been co-evolving with endemic lentiviruses, but not in humans, which only recently acquired HIV. Nevertheless, these three examples are insufficient to prove that derived resistance is not stochastic or due to drift. The argument would be strengthened by demonstrating that bonobo and orangutan CD4, which also do not have endemic lentiviruses, resemble the ancestral and human susceptibility to great-ape-infecting lentiviruses.

      Second, Warren et al. provide a population genetic argument that only endemically infected primates exhibit diversifying selection, again arguing for endemic lentiviruses being the evolutionary driver. The authors compare SNP occurrence in CD4 to neighboring genes, demonstrating that non-synonymous SNP frequency is only elevated in endemically infected species. Moreover, these amino-acid-coding changes are significantly concentrated in the CD4 domain that binds the lentiviral envelope. This is a creative analysis to overcome the problem of very small sample sizes, with very few great ape individuals sequenced. However, the small number of species compared (2-4 in each group) also limits the power of the analysis. Expanding the analysis to Old World Monkey species that do or do not have endemic lentiviruses, as well as great apes, would strengthen this argument.

      Overall, this manuscript lends additional support to a well-documented example of a host-virus arms race: that of lentiviruses and the viral entry receptor.

    4. Author response:

      The following is the authors’ response to the original reviews.

      We are thankful for the comments and suggestions from the Editor and Reviewers about our manuscript submitted to the eLife Journal. We have addressed all the comments, and we think these modifications will help bring clarity to our message and be helpful to your readership. Here we include an outline of the corrections performed, as well as a detailed response to each of the reviewers' comments.

      Outline of corrections, per the Editor's and Reviewers' suggestions:

      ·        The title of the manuscript has been changed to reflect a more conservative conclusion.

      ·        Changes in the main manuscript text were made to enhance clarity, including the use of genetic terminology and naming.

      ·        Specific responses to some comments from the reviewers are included in this document. We combined some comments that would be better addressed together.

      Accompanying this letter is an updated version of our manuscript with the track changes feature enabled. Again, we are thankful for the comments and suggestions we received, and we hope this revised version of our manuscript will be accompanied by an updated assessment and public reviews and a final eLife Version of Record.

      Response to the public review and minor recommendations.

      From Reviewer #1:

      The major inference of the work is that SIV infection of gorillas drove the observed diversity in gorilla CD4. This is supported by the majority of SNPs being localized to the CD4 D1, which directly interacts with the envelope, and the demonstrated functional consequences of that diversity for viral entry. However, SIVgor (to the best of my knowledge) only infects Western lowland gorillas (Gorilla gorilla gorilla), and one Gorilla gorilla diehli and three Gorilla beringei graueri individuals were included in the haplotype and allele frequency analyses. The presence of these haplotypes or the presence of similar allele frequencies in Eastern lowland and mountain gorillas would impact this conclusion. It would be helpful for the authors to clarify this point.

      From Reviewer #1 (minor comment):

      Which subspecies of gorilla are the nsSNPs coming from? Gorilla gorilla diehli [n =1]; Gorilla beringei graueri [n = 3]) are not extant reservoirs of SIV and to my knowledge are not thought to have been, and so it's important to point out where the diversity is coming from if the authors are asserting that SIVgor drove this population-level diversity in gorilla CD4.

      We initially included genomic data from all the gorilla individuals available to maximize sensitivity to identify allelic variants. Although evidence points to eastern gorillas not being currently infected with SIV, our results show that all allelic variants identified have differential susceptibility to the HIV-1 and SIVcpz strains tested. The allelic variants we identified with this genomic data set match the variants identified by Russell et al (doi.org/10.1073/pnas.2025914118), including the ones found in eastern gorillas, and recapitulate that those variants have differential susceptibility to lentiviral entry, similar to the variants of western populations. Whether eastern gorillas have been exposed to lentiviruses in the past remains unknown.

      From Reviewer #1:

      The authors appear to use a somewhat atypical approach to assess intra-population selection to compensate for relatively small numbers of NHP sequences (Fig. 6). However, they do not cite precedence for the robustness of the approach or the practice of grouping sequences from multiple species for the endemic vs other comparison. They also state in the methods that some genes encoded in the locus were removed from the analysis "because they have previously been shown to directly interact with a viral protein." This seems to undercut the analysis and prevents alternative explanations for the observed diversity in CD4 (e.g., passenger mutations from selection at a neighboring locus).

      Given the nature of our samples, to detect any influence of natural selection acting on CD4, we chose to compare patterns of molecular evolution of CD4 to those of its neighboring loci. Comparisons of molecular evolution signatures across genomic regions are the basis of methods to detect positive selection (e.g., Sabeti DOI: 10.1038/nature01140). For our comparison, the neighboring loci represent our neutral standard for the genomic region in which CD4 resides. Our rationale is that demographic and neutral influences on the number and frequency of polymorphic sites in a region would equally affect all loci in that region. Because these neighboring loci are our neutral benchmark, we excluded before analysis other genes in this genomic region that interact with viruses. The logic is that these loci may be evolving under the influence of positive selection and would decrease the power of our comparison. None of the excluded loci are direct neighbors of CD4. This, together with the average recombination rate of the CD4 genomic region in humans, reduces the possibility that what we are observing at CD4 is due to selection acting at a neighboring locus. In addition, the classic population genetic method to detect positive selection, the McDonald-Kreitman test (McDonald DOI: 10.1038/351652a0), was originally presented combining polymorphism data across species. We assume that any effect on levels of diversity created by combining variability between species would equally affect all loci included in the study, not just CD4.

      From Reviewer #1:

      Data in Figure 5 is graphed as % infected cells instead of virus titer (TDU/mL). It's unclear why this is the case, and prevents a comparison to data in Figure 2 and Figure 4.

      From Reviewer #1 (minor comment):

      Figure 5: the data presentation is now shown as % infected cells instead of viral titer. This makes it difficult to compare data from Figure 5 to other figures. Can the authors please either justify this change, display data consistently or provide matched data displays as a Supplemental Figure?

      For the experiments presented in Figures 2 and 4, we used different volumes of infecting pseudoviruses, which allowed us to identify the linear range of infection. Then, based on the number of cells plated per experimental replicate, we calculated a virus titer. In follow-up experiments (Fig. 5), we used fixed volumes of virus that would infect ~10-20% of control (wild-type; wt) CD4-expressing cells. Comparisons were then made between wt and mutated CD4s, and these data are best presented in their raw form as percent cells infected. Although this change in method prevents direct comparison between the figures, we focused on the differences observed between the experimental conditions within each experimental panel.
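
      The relationship between the two readouts can be sketched as follows. This is an illustration only, not the authors' code: it assumes the common convention that a titer in transducing units per mL (TDU/mL) equals the number of infected cells divided by the inoculum volume, for a well within the linear range. The function name, parameter names, and example numbers are hypothetical.

```python
def titer_tdu_per_ml(fraction_infected, cells_plated, volume_ml):
    """Estimate a titer (TDU/mL) from a single well in the linear range.

    Assumed convention (not taken from the manuscript):
        titer = fraction_infected * cells_plated / volume_ml
    """
    if not 0.0 <= fraction_infected <= 1.0:
        raise ValueError("fraction_infected must be between 0 and 1")
    if cells_plated <= 0 or volume_ml <= 0:
        raise ValueError("cells_plated and volume_ml must be positive")
    return fraction_infected * cells_plated / volume_ml


# Hypothetical well: 15% GFP-positive cells, 40,000 cells plated, 50 uL of virus.
example_titer = titer_tdu_per_ml(0.15, 40_000, 0.05)
print(f"{example_titer:.3g} TDU/mL")
```

      Under this convention, a percent-infected value obtained at a single fixed volume can only be converted to a titer if the cell number and volume are known and the well lies in the linear range, which is one reason the two kinds of figures are not directly comparable.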

      From Reviewer #1:

      The lack of pseudotyping with SIVgor envelope is a surprising omission from this study, that would help to contextualize the findings.

      From Reviewer #2 (minor comment):

      The inclusion of HIV-1 but not SIVgor strains in Figures 2D/E is somewhat conspicuous since chimpanzee alleles certainly differ in susceptibility to SIVcpz (and SIVgor) strains per Russell et al. 2021. The authors should either test some SIVgor infections, cite published data on at least extant human/chimpanzee/gorilla CD4 susceptibility to SIVgor, or address why they did not include it.

      We agree that host susceptibility to SIVgor strains would have been an interesting question to explore. However, we opted to focus on the transmission of SIVcpz strains into gorilla populations for this study. It is worth mentioning that we have cloned SIVgor envelope genes from some strains into our expression system, but we were unable to recover infectious pseudoviruses using an HIV-1ΔEnv-GFP backbone. This suggests that HIV-1 may be incompatible with incorporating SIVgor Env into virus particles. Recently, Russell et al. (DOI: 10.1073/pnas.2025914118) managed to generate SIVgor Env pseudotyped virions using a different backbone (SIVcpzΔEnv-GFP) that was unavailable to us at the time of this study.

      From Reviewer #1:

      Similarly, building gorilla CD4 haplotype SNPs onto the hominin ancestor (as opposed to extant human CD4) may provide additional insights that are meaningful toward understanding the evolutionary trajectory of gorilla CD4.

      We decided to use the extant human CD4 as a backbone to test the effects of the individual amino acid variants found in the allelic diversity of the gorilla population, since the human protein is highly susceptible to all the HIV-1 and SIV strains tested and the expected phenotype is a loss of function. Since the D1 of the human and ancestral sequences for CD4 are almost identical (except for a change that is fixed in gorillas), and they showed similar levels of susceptibility to lentivirus entry, we expect that the phenotypes found would be the same if the gorilla SNPs were built into the ancestral CD4 backbone.

      From Reviewer #2:

      To bolster the argument that lentiviruses are indeed the causative driver of this diversification, which seems likely from a logical perspective but is difficult to prove, Warren et al. pursue two novel lines of evidence. First, the authors reconstruct ancestral CD4 genes that predate lentiviral infection of hominid populations. They then demonstrate that resistance to lentiviral infection is a derived trait in chimpanzees and gorillas, which have been co-evolving with endemic lentiviruses, but not in humans, which only recently acquired HIV. Nevertheless, the derived resistance could be stochastic or due to drift. This argument would be strengthened by demonstrating that bonobo and orangutan CD4, which also do not have endemic lentiviruses, resemble the ancestral and human susceptibility to great-ape-infecting lentiviruses.

      From Reviewer #2 (minor comment):

      The data presented in Figure 2, showing that chimp and gorilla (but not human) CD4 resistance to lentiviral infection is a derived trait, is very intriguing for suggesting that endemic lentiviruses are the causative driver of CD4 evolution. Nevertheless, this could be stochastic or due to genetic drift. Given the later emphasis on several other non-endemically infected species, the authors should at the very least include the sequences for bonobo and orangutan CD4 in the presented alignment (Fig 2B). Ideally, they would also test these orthologs to demonstrate that they are not resistant to lentiviruses infecting great apes (SIVcpz / HIV-1 / SIVgor). If they have also derived resistance, this would suggest a possible other evolutionary driver or genetic drift.

      Based on our analysis of polymorphic sites using available data from populations of apes, we strongly believe the accumulation of resistant polymorphisms in CD4 did not arise in a stochastic manner. The frequency and accumulation of these changes strongly correlate with the function of CD4 as a receptor for lentivirus entry. We agree that experimentally testing the CD4 protein from bonobo and orangutan would strengthen our conclusions; however, based on our genomic analyses, we decided to focus on the species that would present a higher level of variability in susceptibility to the lentiviruses tested, namely gorillas and chimpanzees.

      From Reviewer #2:

      Warren et al. provide a population genetic argument that only endemically infected primates exhibit diversifying selection, again arguing for endemic lentiviruses being the evolutionary driver. The authors compare SNP occurrence in CD4 to neighboring genes, demonstrating that non-synonymous SNP frequency is only elevated in endemically infected species. Moreover, these amino-acid-coding changes are significantly concentrated in the CD4 domain that binds the lentiviral envelope. This is a creative analysis to overcome the problem of very small sample sizes, with very few great ape individuals sequenced. The additional small number of species compared (2-3 in each group) also limits the power of the analysis; the authors could consider expanding their analysis to Old World Monkey species that do or do not have endemic lentiviruses, as well as great apes.

      The scope of this project was to evaluate the differential phenotype of the accumulated polymorphisms found in the ape branch of the primates. Although evaluating the accumulation of polymorphisms in a broader range of primates would generate interesting observations, this would likely require increasing the total number of primate species to include sampling along the speciation tree, many of which lack population level data.

      From Reviewer #1 (minor comment):

      Ancestral reconstruction methods and associated data tables should be included to indicate statistical support for assigned codons. A comment on ambiguity at relevant positions is needed. Similarly, given the polymorphic nature of gorilla and chimpanzee CD4, how confident are the authors in their ancestral reconstructions based on a single representative genome per species? Does this change when you include the broader panel of gorilla sequences? Is the ancestral reconstruction robust to other methods besides PAML?

      We used the PAML software package to reconstruct the ancestral hominin and hominid sequences of CD4 because it is a standard and well-recognized method for this purpose. For this analysis, we used the set of primate sequences selected for positive selection analyses (see methods), namely the longest isoform sequences for each of the available species that best aligned with human CD4. We feel that the best way to perform the ancestral state reconstruction was to use only these curated sequences instead of the population-level sequences, removing potential biases introduced by having different numbers of variants per species.

      From Reviewer #1 (minor comment):

      Page 10: "It seems that allele 2, which doesn't have this glycan, would be at a fitness disadvantage. In support of this, allele 2 is one of the least frequent alleles in the gorilla population that we surveyed (Figure 3B)." - this inference depends on the gorilla species that encode allele 2 and allele frequencies. There are statistical tests to address this inference.

      Population genetic statistics that test for skews in sample allele frequencies are not appropriate here due to the nature of the samples in this study. However, the reviewer is correct that our inference about allele frequency depends on the gorilla species in which we find this allele. Allele 2 is found in the Gorilla beringei graueri subspecies of gorilla included in this study. We only have data for three individuals (six alleles) from this subspecies, compared to 51 individuals (102 alleles) from Gorilla gorilla gorilla. As such, genetic subdivision between the gorilla subspecies could also produce the low frequency of allele 2 observed in our sample.

      From Reviewer #1 (minor comment):

      Page 11: "These results imply that the resistance to SIVcpz found in gorilla individuals is not dependent on single amino acids, but rather the cumulative effect of multiple SNPs." Would it be more relevant (or relevant in other ways) to test this statement by putting those mutations into the hominid ancestor? Testing individual residues in the context of human CD4 may be subject to epistasis or several other factors.

      We agree that combining several of the resistant SNPs in the susceptible human background would have strengthened our hypothesis, as all these amino acid changes are associated with increased resistance to at least one of the lentiviruses tested. However, the number of CD4 variants to test would increase significantly, and we feel that this approach was outside the scope of this manuscript.

      From Reviewer #1 (minor comment):

      Figure 6: If you perform this analysis on chimpanzee CD4 alone do you get the same result? Just gorillas? If you remove eastern/mountain gorillas? The very small numbers of non-human non-SIV-reservoir great apes may preclude a strong conclusion.

      We agree that our study is limited by the small number of available sequences from individuals of the studied species. If we remove a whole species or subspecies, the statistical power would be greatly reduced. Analyzing chimpanzees or gorillas alone (or removing a subspecies) would still show that each of those species accumulates SNPs in the D1 region of CD4, although with less statistical significance.

      From Reviewer #2 (minor comment):

      Related to Figure 2: It would strengthen the argument that resistance is a derived trait if the authors mapped the causative mutations from gorilla CD4 onto the ancestral hominin CD4. However, this experiment is not particularly critical, merely a suggestion.

      We appreciate this suggestion. We decided to use the human CD4 backbone as it is widely susceptible to lentiviral entry. The hominid and hominin ancestral sequences are almost identical to the human sequence in domain 1, except for a fixed mutation shared with the gorilla CD4. We expect that the SNPs observed in the gorilla population would also reduce susceptibility to lentivirus entry in the ancestral CD4 reconstructions.

      From Reviewer #2 (minor comment):

      Related to Figure 3B: It is difficult to make much of the allele frequency for 8 alleles in 32 individuals. Can the authors collate this with allele frequency for the referenced 100 individuals from Russell et al. 2021, to give a better sense of population frequency? This may allow the authors to better correlate allele frequency with SIVcpz resistance patterns in Figure 4, strengthening their argument that more resistant alleles should be over-represented in the population.

      At the time of our analysis, the data from Russell et al. (DOI: 10.1073/pnas.2025914118) were not available to collate or compare. When those data became available, we immediately compared the alleles and confirmed that the ones we found were also detected in the samples used in that study.

      From Reviewer #2 (minor comment):

      Related to Figure 6: As written, several methodological details should be clarified. How were human genomes selected to limit the sample size to 50?

      We selected a total of 50 human individuals in order to match the sample size of the largest group in Fig 6B (chimpanzee, n=50). We randomly selected 10 individuals from each of the 5 superpopulations [Africans (AFR), Admixed Americans (AMR), East Asians (EAS), Europeans (EUR) and South Asians (SAS)] defined by the 1000 Genomes Project.
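
      The sampling scheme described above amounts to stratified random sampling. A minimal sketch follows, assuming sample identifiers are already grouped by superpopulation; it is not the authors' script, and the function name, the fixed seed, and the sample identifiers in the usage example are hypothetical.

```python
import random

# Superpopulation codes as named in the response.
SUPERPOPULATIONS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def sample_per_superpopulation(ids_by_superpop, per_group=10, seed=0):
    """Draw `per_group` individuals at random, without replacement, from each superpopulation."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected = {}
    for pop in SUPERPOPULATIONS:
        candidates = sorted(ids_by_superpop[pop])  # sort so the result does not depend on input order
        if len(candidates) < per_group:
            raise ValueError(f"{pop} has fewer than {per_group} individuals")
        selected[pop] = rng.sample(candidates, per_group)
    return selected


# Hypothetical identifiers; real sample IDs would come from the project's sample panel.
pools = {pop: [f"{pop}_{i:03d}" for i in range(100)] for pop in SUPERPOPULATIONS}
chosen = sample_per_superpopulation(pools)
print(sum(len(ids) for ids in chosen.values()))  # 5 groups x 10 individuals = 50
```

      Recording the seed (or the resulting list of identifiers) would let a reader reproduce the exact subset of 50 genomes.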

      From Reviewer #2 (minor comment):

      Related to Figure 6: What comparison is being reported for the Mann-Whitney U test (CD4 vs. which gene)? Are the means shown in A an average of 2 (endemic) or 3 (non-endemic) species - if so, the authors should show the individual data points to give a clearer depiction of the data spread. In addition, it is not clear that a statistical test with sample sizes of 2 is meaningful, since Mann Whitney typically assumes n > 5. To strengthen this statistical argument, it may be necessary to include additional species that have (a) multiple genomes (or at least this locus) sequenced, and (b) have or lack lentiviral sequences. This may necessitate expanding the analysis to include Old World Monkeys (e.g. Rhesus Macaque Genome Project).

      In Figure 6, we use the Mann-Whitney U test to compare variation between CD4 and the neighboring loci. The average and SEM are for two endemic and four non-endemic species (the two orangutan datasets are from two distinct species, in contrast to the gorilla subspecies). It is true that our sample size is small for any statistical testing. For the Mann-Whitney U test, it is generally preferred to have n > 5 in each group, so we do run into problems with the endemically infected comparisons, as we only have two data points (chimpanzee and gorilla) for the CD4 group. For the uninfected species, CD4 has four data points.
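
      The practical consequence of very small groups can be made concrete with a short calculation. The sketch below is an illustration, not part of the authors' analysis: it assumes an exact two-sided Mann-Whitney U test with no tied values, in which case the smallest attainable p-value for group sizes n1 and n2 is 2 / C(n1 + n2, n1). The function names are hypothetical, and the size of the neighboring-loci comparison group is not stated in the response.

```python
from math import comb


def min_two_sided_exact_p(n1, n2):
    """Smallest p-value an exact two-sided Mann-Whitney U test can return (no ties).

    Under the null, each of the C(n1 + n2, n1) rank assignments is equally likely,
    and the most extreme outcome (complete separation) occurs in one assignment per tail.
    """
    return min(1.0, 2 / comb(n1 + n2, n1))


def smallest_comparison_group(n1, alpha=0.05):
    """Smallest n2 for which a group of size n1 could reach p < alpha at all."""
    n2 = 1
    while min_two_sided_exact_p(n1, n2) >= alpha:
        n2 += 1
    return n2


print(round(min_two_sided_exact_p(2, 4), 3))  # 2 vs 4 data points: 2/15, about 0.133
print(smallest_comparison_group(2))           # with 2 points in one group, the other needs at least 8
```

      In other words, a group with only two data points can reach p < 0.05 only if the comparison group has at least eight data points and the two groups are completely separated in rank.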

      From Reviewer #1 (minor comment):

      Page 6. "This suggests that the ancestral versions of CD4 in apes were susceptible to primate lentivirus entry" - The data show that tested virus pseudotyped with SIV/HIV envs can engage ancestral CD4 in the context of a canine cell line expressing human CCR5, but not necessarily that this interaction was sufficient for the process of entry per se, especially in the context of a gorilla (or hominid) cell. Some additional context would be useful for a broad readership.

      From Reviewer #1 (minor comment):

      Page 6: "but that selective pressures exerted by SIVs in the chimpanzee and gorilla lineages have led to the retention of mutations that confer resistance to primate lentivirus infection. This has not happened in humans where selective pressure by HIV-1 is too new" - this cannot be concluded from the data in Figure 1. It would be more appropriate as a Discussion point.

      From Reviewer #1 (minor comment):

      Page 14: "Natural tolerance is often required before a virus can establish itself long term in a host reservoir, and thus understanding it is key to understanding virus reservoirs in nature" - please provide a reference. This is one among several theories of long-term host-virus evolution dynamics/outcomes, and further discussion may benefit the broad readership of eLife.

      From Reviewer #1 (minor comment):

      Page 15: "There is a surprising outcome of virus-driven host evolution in that the divergence and diversity of these host genes ultimately comes at a detriment to the very viruses that drove this evolution." - it is not clear to this reviewer why this is surprising.

      From Reviewer #2 (minor comment):

      Related to Figure 5A: The authors suggest that the gorilla glycosylation site provides resistance to SIVcpz, based on TAN1.910, but in fact the glycosylated allele is no more resistant than the un-glycosylated allele to most SIVcpz strains (in Figure 4). The authors should acknowledge this more clearly in the text.

      From Reviewer #2 (minor comment):

      The title of this article (that infection "has driven selection") is somewhat overstated - though it seems very likely that lentiviruses are driving CD4 diversification, this is difficult to prove. The arguments presented here rely on very few data points: modern chimp and gorilla compared to ancestral CD4, and a population genetic analysis relying on 2 or 3 species with 10-50 individuals each. The authors should either bolster these arguments (see the above suggestions) and/or soften the claim in the title.

      Modifications to the main text of the manuscript have been made to enhance clarity on the subjects stated above.

    1. eLife assessment

      The humanized model of EAE represents a valuable model in which to evaluate mechanisms that may drive EAE-like processes in vivo. The data are solid given the revisions and expansion of numbers of mice to yield more statistical rigor. This model will be used by the greater community studying EAE.

    2. Joint Public Review:

      The premise of this work carries great potential. Namely, developing a humanized mouse system in which features of adaptive immunity that contribute to inflammatory demyelination can be interrogated will allow for traction into therapeutics currently unavailable to the field. Immediate questions stemming from the current study include the potential effect of ex vivo activation of PBMCs (or individual T and B cells) in vitro prior to transfer as well as the TCR and BCR repertoire of CNS vs peripheral lymphocytes before and after immunization. This group has been thoughtful and clever about their approach (e.g. use of subjects treated with natalizumab), which gives hope that fundamental aspects of pathogenesis will be uncovered by this form of modeling MS disease.

      Multiple sclerosis is an inflammatory and demyelinating disease of the central nervous system where immune cells play an important role in disease pathobiology. Increased incidence of disease in individuals carrying certain HLA class-II genes plus studies in animal models suggests that HLA-DRB1*15 restricted CD4 T cells might be responsible for disease initiation, and other immune cells such as B cells, CD8 T cells, monocytes/macrophages, and dendritic cells (DC) also contribute to disease pathology. However, evidence for a direct role of human immune cells in disease is lacking due to a lag between immune activation and the first sign of clinical disease. Therefore, there is an emphasis on understanding whether immune cells from HLA-DR15+ MS patients differ from HLA-DR15+ healthy controls in their phenotype and pro-inflammatory capacity. To overcome this, the authors have used severely immunodeficient B2m-NOG mice that lack B cells, T cells and NK cells and have defective innate immune responses, and engrafted PBMCs from 3 human donors (HLA-DR15+ MS and HI donors, HLA-DR13+ MS donor) in these B2m-NOG mice to determine whether they can induce CNS inflammation and demyelination like MS.

      The study's strength is the use of PBMCs from HLADRB1-typed MS subjects and healthy control, the use of NOG mice, the characterization of immune subsets (revealing some interesting observations), CNS pathology etc. Weaknesses are lack of phenotype in mice and no disease phenotype even in humanized mice immunized for disease using standard disease induction protocol employed in an animal model of MS, and lack of mechanistic data on why CD8 T cells are more enriched than CD4+ T cells. The last point is important as postmortem human MS patients' brain tissue had been shown to have more CD8+ T cells than CD4+ T cells.

      Thus, this work is an important step in the right direction, as previous humanized studies have not used HLA-DRB1 typed PBMCs; however, the weaknesses highlighted above are limitations of the model.

    3. Author response:

      The following is the authors’ response to the original reviews.

      We provide below a point-by-point reply to the Reviewers, and hope that our new manuscript will now meet the Reviewers’ concerns and the requirements for publication in eLife. 

      In summary, we have performed a new set of mouse humanization experiments using a new cohort of 4 additional HLA-DRB1*15-typed MS patients as donors, all presenting with highly active disease and under treatment with natalizumab. The new experiments aim to strengthen and further extend the findings of the original paper that HLA restriction rather than disease status plays an important role in the development of CNS inflammation. Additionally, we performed EAE using a revised protocol using lower amounts of peptide antigens to reduce the possibility of immune tolerance. Indeed, our original observations were further enriched with the finding that immunization increases infiltration of the CNS by human CD4 T cells, a finding consistent with EAE pathology, and that these human CD4 T cells co-localize with human CD8 T cells in the brain lesions. Further, we provide more detailed information concerning the EBV infection status of the PBMC donors used for humanization and find some first indications of relationships between the B cell engraftment in humanized mice, EBV status  of the donors and the development of brain lesions that might stimulate further investigation in future studies.   

      Point-by-point reply to reviewers:

      Reviewer 1:

      We thank Reviewer 1 for their valuable comments, and for their support of the overall approach as a model system. We have addressed the comments by providing additional requested information, as well as performing EAE with a revised protocol, as suggested. We believe the new results significantly upgrade the information gained from this study.

      (1) Throughout their paper, the authors never quantify the difference in CD4 vs CD8 T cell infiltration into the CNS. While repeatedly claiming that there are fewer CD4 T cells present than CD8 T cells within the CNS, this data is not included. Further, spinal cord numbers of CD4 and CD8 are not provided in lieu of CD3 T cell characterization.

      Reply: We have now included quantitative data for the differences in CD4 vs CD8 T cells in the brain and spinal cord of non-immunized and EAE immunized mice. Thus, in brain (Fig. 2E) and spinal cord (Fig. 3D) of non-immunized mice, and brain (Fig. 4D, E, L) and spinal cord (Fig. 5D) of immunized mice we show data for numbers of hCD8 and hCD4 T cells, and ratios of CD4 to CD8 at the borders and in the parenchyma. Notably, using a revised EAE protocol in the second set of experiments, we observed a marked increase in hCD4 T cell infiltration at the CNS borders and parenchyma, an observation consistent with successful EAE immunization.

      B cells don't make up any significant component of the cells transferred from HLA-DR15 donors. While the cells transferred from the HLA-DR13 donor are composed of a considerable number of B cells, the mice that received these cells didn't develop any signs of neurologic disease.

      In the second experiment using new DR15 MS donors, we observed significant B cell engraftment also in several groups of DR15 MS mice. With the additional groups of mice, we were able to see a relationship between B cell engraftment in DR13 and DR15 MS mice with indicators of recent or ongoing reactivation of EBV. This is an interesting preliminary observation that might be tested in future larger studies. 

      (2) Incomplete exploration of potential experimental autoimmune encephalomyelitis (EAE) modeling. Comparison of the susceptibility of B2m-NOG mice to EAE dependent on various peptide doses would be highly informative. Given that the number of hCD45+ in the periphery of NOG mice decreases following this immunization it would be prudent for the authors to determine if such a high peptide dose is truly ideal for EAE development in this mouse model.

      Reply: We thank the reviewer for this critical comment. In the second group of experiments (DR15 MS2-5), we revised the EAE protocol to use lower amounts of peptides in a single immunization, thereby greatly reducing the exposure of human T cells to antigen and the risk of tolerance/anergy. This resulted in (i) bypass of the reduction in proportions of hCD45 cells in the peripheral blood following immunization (Fig. 1A), and (ii) increased numbers of hCD4 T cells and hCD4/hCD8 T cell ratios at the borders and infiltrating the parenchyma of brain (Fig. 4D,E) and spinal cord (Fig. 5D).

      (3) The degree of myelin injury is not presented. The statement is repeatedly made that "demyelination was not observed in the brain or spinal cord" but no quantification of myelin staining is shown.  

      Reply: The reviewer refers to a pivotal feature (and limitation) of this particular humanized model. Despite significant T cell infiltration of white and grey matter regions of brain and spinal cord, there is no detectable demyelination. This has also been reported by an independent study using a similar humanized system (Zayoud et al., 2013). We have supplemented the figures with photomicrographs showing the presence of unperturbed myelin in the corpus callosum white matter T cell lesions (Fig. 4F, inset stained with Luxol fast blue), and a confocal micrograph in the same region double-immunostained for hCD45 immune cells and MBP (Fig. 4G).

      Minor points:

      Method of quantification (e.g. cells per brain slice in figures 2E; 4E) is not very quantitative and should be justified or more appropriately updated to be more rigorous in methodology.

      Reply: In the new figures, we have changed the method of quantification of brain parenchyma infiltrating cells from cells per brain slice to cells per mm² of tissue area (Fig. 2D, Fig. 4D).

      Fig. 4 data should be shown from un-immunized DR15 MS and DR15 HI mice.

      Reply: We now include the quantitative data from un-immunized mice compared to immunized mice in all groups (Fig. 4 C-E). 

      Reviewer 2:

      We thank Reviewer 2 for their very pertinent comments and overall for highlighting the importance of humanized mice as an approach for further understanding the pathobiology of MS. We also thank this reviewer for their positive comments concerning the study design, specifically the use of fresh PBMC isolated from HLADRB1-typed MS individuals and healthy control. The reviewer highlights 4 major weaknesses of the study that we have tried to address in order to increase the value of the study.

      (i) Lack of sufficient sample size (n=1 in each group) to make any conclusion.

      Reply: We have increased the sample size for the DR15 MS group from n=1 to n=5 by generating new humanized mice using PBMC freshly isolated from additional MS donors, all HLA-DRB1*15 with active RRMS and under treatment with natalizumab. Here we were able to capitalize on our excellent collaboration with neurologists at the neighboring University Hospital, which runs a large organized MS outpatient clinic, with HLADRB1-typed MS individuals that are closely monitored over the course of their disease and therapy. In this way, we were able to address the engraftment success of human immune cells and variability in CNS lesion development across mice generated from 5 different DR15 MS patients. We also monitored markers for EBV activation status in all the patients used for mouse humanization in this study.

      (ii) Lack of phenotype in mice.

      Reply: As already described in the results and addressed in the discussion, the B2m-NOG immunodeficient mouse strain used here is a state-of-the-art experimental tool for humanization studies, but unfortunately fails to support engraftment by human monocytes. We and previous groups (Zayoud et al., 2013) show that CNS lesions in humanized mice contain high numbers of hCD4 and CD8 T cells, accompanied by locally activated murine microglia and astrocytes, but lack human monocytes. The humanized mice contain large proportions of immature mouse CD11b+Ly6Chi monocytes in the periphery (Suppl. Table 4) but these cells are not recruited into the CNS in non-immunized or immunized humanized mice, potentially due to incompatible chemokine signals across mouse/human. The absence of human monocyte engraftment in this model is the most likely reason that lesions do not demyelinate, and this limitation of the currently available host mouse strains is one that needs to be addressed before full modelling of CNS demyelination by human immune cells can be achieved.

      (iii) No disease phenotype even in humanized mice immunized for disease using standard disease induction protocol employed in an animal model of MS.

      Reply: As described above, following the suggestion of reviewer 1 (point 2) we revised the EAE protocol to use lower amounts of peptides given as a single immunization. This resulted in increased numbers of hCD4 T cells and increased hCD4/hCD8 T cell ratios at the borders and infiltrating the parenchyma of brain (Fig. 1E, Fig. 2D) and spinal cord (Fig. 5D), all indicative of a successful EAE immunization. Although immunized mice showed lesions with mixed populations of hCD4 and hCD8 T cells, demyelination and therefore clinical symptoms were again not observed. As outlined in (ii) above, successful human monocyte engraftment would be fundamental for the development of demyelination and clinical symptoms in PBMC humanized mice, and new immunodeficient animal strains should be developed to achieve this.

      (iv) Mechanistic data on why CD8 T cells are more enriched than CD4+ T cells.

      Reply: The question of why hCD8 T cells are more enriched in the CNS than hCD4 cells is answered at least in part by the results from our new EAE experiments, which clearly show that immunization increases CNS infiltration by hCD4 T cells versus hCD8 T cells. In general, EAE protocols are designed to activate antigen-specific CD4 T cells and this is verified in the CNS of immunized humanized mice, where hCD4 T cells infiltrate to join hCD8 T cells in lesion areas. The predilection of hCD8 T cells for the CNS is obvious in non-immunized humanized mice, especially in the parenchyma (see Fig. 2E), and in MS patients, while hCD4 infiltration becomes important after EAE immunization. The humanized model system might therefore represent a unique tool for studying mechanisms underlying preferential hCD8 T cell involvement in MS neuroinflammation, a feature that is not accurately modelled in current EAE models. As this reviewer correctly points out, this is a very important point as postmortem MS patients’ brains have more CD8 T cells than CD4 T cells.

    1. eLife assessment

      In this valuable study, the authors use a computational model to investigate how recurrent connections influence the firing patterns of grid cells, which are thought to play a role in encoding an animal's position in space. The work suggests that a one-dimensional network architecture may be sufficient to generate the hexagonal firing patterns of grid cells, a possible alternative to attractor models based on recurrent connectivity between grid cells. However, the support for this proposal was incomplete, as some conclusions for how well the model dynamics are necessary to generate features of grid cell organization were not well supported.

    2. Reviewer #1 (Public Review):

      I'll begin by summarizing what I understand from the results presented, and where relevant how my understanding seems to differ from the authors' claims. I'll then make specific comments with respect to points raised in my previous review (below), using the same numbering. Because this is a revision I'll try to restrict comments here to the changes made, which provide some clarification, but leave many issues incompletely addressed.

      As I understand it the main new result here is that certain recurrent network architectures promote emergence of coordinated grid firing patterns in a model previously introduced by Kropff and Treves (Hippocampus, 2008). The previous work very nicely showed that single neurons that receive stable spatial input could 'learn' to generate grid representations by combining a plasticity rule with firing rate adaptation. The previous study also showed that when multiple neurons were synaptically connected their grid representations could develop a shared orientation, although with the recurrent connectivity previously used this substantially reduced the grid scores of many of the neurons. The advance here is to show that if the initial recurrent connectivity is consistent with that of a line attractor then the network does a much better job of establishing grid firing patterns with shared orientation.

      Beyond this point, things become potentially confusing. As I understand it now, the important influence of the recurrent dynamics is in establishing the shared orientation and not in its online generation. This is clear from Figure S3, but not from an initial read of the abstract or main text. This result is consistent with Kropff and Treves' initial suggestion that 'a strong collateral connection... from neuron A to neuron B... favors the two neurons to have close-by fields... Summing all possible contributions would result in a field for neuron B that is a ring around the field of neuron A.' This should be the case for the recurrent connections now considered, but the evidence provided doesn't convincingly show that attractor dynamics of the circuit are a necessary condition for this to arise. My general suggestion for the authors is to remove these kinds of claims and to keep their interpretations more closely aligned with what the results show.

      Major (numbered according to previous review)

      (1) Does the network maintain attractor dynamics after training? Results now show that 'in a trained network without feedforward Hebbian learning the removal of recurrent collaterals results in a slight increase in gridness and spacing'. This clearly implies that the recurrent collaterals are not required for online generation of the grid patterns. This point needs to be abundantly clear in the abstract and main text so the reader can appreciate that the recurrent dynamics are important specifically during learning.

      (2) Additional controls for Figure 2 to test that it is connectivity rather than attractor dynamics (e.g. drawing weights from Gaussian or exponential distributions). The authors provide one additional control based on shuffling weights. However, this is far from exhaustive and it seems difficult on this basis to conclude that it is specifically the attractor dynamics that drive the emergence of coordinated grid firing.

      (3) What happens if recurrent connections are turned off? The new data clearly show that the recurrent connections are not required for online grid firing, but this is not clear from the abstract and is hard to appreciate from the main text.

      (4) This is addressed, although the legend to Fig. S2D could provide an explanation / definition for the y-axis values.

      (5) Given the 2D structure of the network input it perhaps isn't surprising that the network generates 2D representations and this may have little to do with its 1D connectivity. The finding that the networks maintain coordinated grids when recurrent connections are switched off supports my initial concern and the authors' explanation, to me at least, remains confusing. I think it would be helpful to consider that the connectivity is specifically important for establishing the coordinated grid firing, but that the online network does not require attractor dynamics to generate coordinated grid firing.

      (6) Clarity of the introduction. This is somewhat clearer, but I wonder if it would be hard for someone not familiar with the literature to accurately appreciate the key points.

      (7) Remapping. I'm not sure why this is ill posed. It seems the proposed model cannot account for remapping results (e.g. Fyhn et al. 2007). Perhaps the authors could just clearly state this as a limitation of the model (or show that it can do this).

      Previous review:

      This study investigates the impact of recurrent connections on grid fields generated in networks trained by adjusting the strength of feedforward spatial inputs. The main result is that if the recurrent connections in the network are given a 1D continuous attractor architecture, then aligned grid firing patterns emerge in the network following training. Detailed analyses of the low dimensional dynamics of the resulting networks are then presented. The simulations and analyses appear carefully carried out.

      The feedforward model investigated by the authors (previously introduced by Kropff & Treves, 2008) is an interesting and important alternative to models that generate grid firing patterns through 2-dimensional continuous attractor network (CAN) dynamics. However, while both classes of model generate grid fields, in making comparisons the manuscript is insufficiently clear about their differences. In particular, in the CAN models grid firing is a direct result of their 2-D architecture, either a torus structure with a single activity bump (e.g. Guanella et al. 2007, Pastoll et al. 2013), or sheet with multiple local activity bumps (Fuhs & Touretzky, Burak & Fiete, 2009). In these models, spatial input can anchor the grid representations but is not necessary for grid firing. By contrast, in the feedforward models neurons transform existing spatial inputs into a grid representation. Thus, the two classes of model implement different computations; CANs path integrate, while the feedforward models transform spatial representations. A demonstration that a 1D CAN generates coordinated 2D grid fields would be surprising and important, but it's less clear why coordination between grids generated by the feedforward mechanism would be surprising. As written, it's unclear which of these claims the study is trying to make. If the former, then the conclusion doesn't appear well supported by the data as presented; if the latter, then the results are perhaps not so unexpected, and the imposed attractor dynamics may still not be relevant.

      Whichever claim is being made, it could be helpful to more carefully evaluate the model dynamics given predictions expected for the different classes of model. Key questions that are not answered by the manuscript include:

      - At what point is the 1D attractor architecture playing a role in the models presented here? Is it important specifically for training or is it also contributing to computation in the fully trained network?

      - Is an attractor architecture required at all for emergence of population alignment and gridness? Key controls missing from Figure 2 include training on networks with other architectures. For example, one might consider various architectures with randomly structured connectivity (e.g. drawing weights from exponential or Gaussian distributions).

      - In the trained models do the recurrent connections substantially influence activity in the test conditions? Or after training are the 1D dynamics drowned out by feedforward inputs?

      - What is the low dimensional structure of the input to the network? Can the apparent discrepancy between dimensionality of architecture and representation be resolved by considering structure of the inputs, e.g. if the input is a 2 dimensional representation of location then is it surprising that the output is too?

      - What happens to representations in the trained networks presented when place cells remap? Is the 1D manifold maintained as expected for CAN models, or does it reorganise?

    3. Reviewer #3 (Public Review):

      Summary:

      The paper proposes an alternative to the attractor hypothesis, as an explanation for the fact that grid cell population activity patterns (within a module) span a toroidal manifold. The proposal is based on a class of models that were extensively studied in the past, in which grid cells are driven by synaptic inputs from place cells in the hippocampus. The synapses are updated according to a Hebbian plasticity rule. Combined with an adaptation mechanism, this leads to patterning of the inputs from place cells to grid cells such that the spatial activity patterns are organized as an array of localized firing fields with hexagonal order. I refer to these models below as feedforward models.

      It has already been shown by Si, Kropff, and Treves in 2012 that recurrent connections between grid cells can lead to alignment of their spatial response patterns. This idea was revisited by Urdapilleta, Si, and Treves in 2017. Thus, it should already be clear that in such models, the population activity pattern spans a manifold with toroidal topology. The main new contributions in the present paper are (i) considering a form of recurrent connectivity that was not directly addressed before; (ii) applying topological analysis to simulations of the model; and (iii) interpreting the results as a potential explanation for the observations of Gardner et al.

      Strengths:

      The exploration of learning in a feedforward model, when recurrent connectivity in the grid cell layer is structured in a ring topology, is interesting. The insight that this not only aligns the grid cells in a common direction but also creates a correspondence between their intrinsic coordinate (in terms of the ring-like recurrent connectivity) and their tuning on the torus is interesting as well, and the paper as a whole may influence future theoretical thinking on the mechanisms giving rise to the properties of grid cells.

      Weaknesses:

      (1) In Si, Kropff and Treves (2012) recurrent connectivity was dependent on the head direction tuning, in addition to the location on a 2d plane, and therefore involved a ring structure. Urdapilleta, Si, and Treves considered connectivity that depends on the distance on a 2d plane. The novelty here is that the initial connectivity is structured uniquely according to latent coordinates residing on a ring.

      (2) The paper refers to the initial connectivity within the grid cell layer as one that produces an attractor. However, it is not shown that this connectivity, on its own, indeed sustains persistent attractor states. Furthermore, it is not clear whether this is even necessary to obtain the results of the model. It seems possible that (possibly weaker) connections with ring topology, that do not produce attractor dynamics but induce correlations between neurons with similar locations on the ring would be sufficient to align the spatial response patterns during the learning of feedforward weights.

      (3) Given that all the grid cells are driven by an input from place cells that span a 2d manifold, and that the activity in the grid cell network settles on a steady state which is uniquely determined by the inputs, it is expected that the manifold of activity states in the grid cell layer, corresponding to inputs that locally span a 2d surface, would also locally span a 2d plane. The result is not surprising. My understanding is that this result is derived as a prerequisite for the topological analysis, and it is therefore quite technical.

      (4) The modeling is all done in planar 2d environments, where the feedforward learning mechanism promotes the emergence of a hexagonal pattern in the single neuron tuning curve. Under the scenario in which grid cell responses are aligned (i.e. all neurons develop spatial patterns with the same spacing and orientation) it is already quite clear, even without any topological analysis that the emerging topology of the population activity is a torus.

      However, the toroidal topology of grid cells in reality has been observed by Gardner et al also in the wagon wheel environment, in sleep, and close to boundaries (whereas here the analysis is restricted to a sub-region of the environment, far away from the walls). There is substantial evidence based on pairwise correlations that it persists also in various other situations, in which the spatial response pattern is not a hexagonal firing pattern. It is not clear that the mechanism proposed in the present paper would generate toroidal topology of the population activity in more complex environments. In fact, it seems likely that it will not do so, and this is not explored in the manuscript.

      (5) Moreover, the recent work of Gardner et al. demonstrated much more than the preservation of the topology in the different environments and in sleep: the toroidal tuning curves of individual neurons remained the same in different environments. Previous works, that analyzed pairwise correlations under hippocampal inactivation and various other manipulations, also pointed towards the same conclusion. Thus, the same population activity patterns are expressed in many different conditions. In the present model, this preservation across environments is not expected. Moreover, the results of Figure 6 suggest that even across distinct rectangular environments, toroidal tuning curves will not be preserved, because there are multiple possible arrangements of the phases on the torus which emerge in different simulations.

      (6) In real grid cells, there is a dense and fairly uniform representation of all phases (see the toroidal tuning of grid cells measured by Gardner et al). Thus, the highly clustered phases obtained in the model (Fig. S1) seem incompatible with the experimental reality. I suspect that this may be related to the difficulty in identifying the topology of a torus in persistent homology analysis based on the transpose of the matrix M.

      (7) The motivations stated in the introduction came across to me as weak. As now acknowledged in the manuscript, attractor models can be fully compatible with distortions of the hexagonal spatial response patterns - they become incompatible with these spatial distortions only if one adopts a highly naive and implausible hypothesis that the attractor state is updated only by path integration. While attractor models are compatible with distortions of the spatial response pattern, it is very difficult to explain why the population activity patterns are tightly preserved across multiple conditions without a rigid two-dimensional attractor structure. This strong prediction of attractor models withstood many experimental tests - in fact, I am not aware of any data set where substantial distortions of the toroidal activity manifold were observed, despite many attempts to challenge the model. This is the main motivation for attractor models. The present model does not explain these features, yet it also does not directly offer an explanation for distortions in the spatial response pattern.

      (8) There is also some weakness in the mathematical description of the dynamics. Mathematical equations are formulated in discrete time steps, without a clear interpretation in terms of biophysically relevant time scales. It appears that there are no terms in the dynamics associated with an intrinsic time scale of the neurons or the synapses (a leak time constant and/or synaptic time constants). I generally favor simple models without lots of complexity, yet within this style of modelling, the formulation adopted in this manuscript is unconventional, introducing a difficulty in interpreting synaptic weights as being weak or strong, and a difficulty in interpreting the model in the context of other studies.

      In my view, the weaknesses discussed above limit the ability of the model, as it stands, to offer a compelling explanation for the toroidal topology of grid cell population activity patterns, and especially the rigidity of the manifold across environments and behavioral states. Still, the work offers an interesting way of thinking on how the toroidal topology might emerge.

    1. eLife assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors' claims is compelling. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

    2. Reviewer #1 (Public Review):

      Summary:

      This study examined the role of statistical learning in pain perception, suggesting that individuals' expectations about a sequence of events influence their perception of pain intensity. They incorporated the components of volatility and stochasticity into their experimental design and asked participants (n = 27) to rate the pain intensity, their prediction, and their confidence level. They compared two different inference strategies: Bayesian inference vs. heuristic-employing Kalman filters and model-free reinforcement learning. They showed that the expectation-weighted Kalman filter best explained the temporal pattern of participants' ratings. These results provide evidence for a Bayesian inference perspective on pain, supported by a computational model that elucidates the underlying process.

      Strengths:

      - Their experimental design included a wide range of input intensities and the levels of volatility and stochasticity. With elaborated computational models, they provide solid evidence that statistical learning shapes pain.

      Weaknesses:

      - Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain.

    3. Reviewer #3 (Public Review):

      The study investigated how statistical aspects of temperature sequences, such as manipulations of stochasticity (i.e., randomness of a sequence) and volatility (i.e., speed at which a sequence unfolded) influenced pain perception. Using an innovative stimulation paradigm and computational modelling of perceptual variables, this study demonstrated that perception is weighted by expectations. Overall, the findings support the conclusion that pain perception is mediated by expectations in a Bayesian manner. The provision of additional details during the review process strengthens the reliability of this conclusion. The methods presented offer tools and frameworks for further research in pain perception and can be extended to investigations into chronic pain processes.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable insight into a computational mechanism of pain perception. The evidence supporting the authors’ claims is solid, although the inclusion of 1) more diverse candidate computational models, 2) more systematic analysis of the temporal regularity effects on the model fit, and 3) tests on clinical samples would have strengthened the study. The work will be of interest to pain researchers working on computational models and cognitive mechanisms of pain in a Bayesian framework.

      Thank you very much again for considering the manuscript and judging it as a valuable contribution to understanding mechanisms of pain perception. We recognise the above-mentioned points of improvement and elaborate on them in the initial response to the reviewers.

      Response to the reviewers

      Reviewer 1:

      Reviewer Comment 1.1 — Selection of candidate computational models: While the paper juxtaposes the simple model-free RL model against a Kalman Filter model in the context of pain perception, the rationale behind this choice remains ambiguous. It prompts the question: could other RL-based models, such as model-based RL or hierarchical RL, offer additional insights? A more detailed explanation of their computational model selection would provide greater clarity and depth to the study.

      Initial reply: Thank you for this point. Our models were selected a-priori, following the modelling strategy from Jepma et al. (2018) and hence considered the same set of core models for clear extension of the analysis to our non-cue paradigm. The key question for us was whether expectations were used to weight the behavioural estimates, so our main interest was to compare expectation vs non-expectation weighted models.

      Model-based and hierarchical RL are very broad terms that can be used to refer to many different models, and we are not clear about which specific models the reviewer is referring to. Our Bayesian models are generative models, i.e. they learn the generative statistics of the environment (which is characterised by inherent stochasticity and volatility) and hence operate model-based analyses of the stimulus dynamics. In our case, this happened hierarchically and it was combined with a simple RL rule.

      Revised reply: We clarified our modelling choices in the ”Modelling strategy” subsection of the results section.
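
      To make the contrast between expectation-weighted and non-expectation-weighted models concrete, the following minimal sketch shows a model-free delta rule in which the percept is pulled toward the current expectation, in the spirit of the eRL model discussed in this reply. It is an illustration under stated assumptions, not the authors' implementation: the names `erl_step`, `run_erl`, `alpha` (learning rate) and `gamma` (expectation weight) are hypothetical, and response noise and confidence scaling are left out.

```python
def erl_step(expectation, stimulus, alpha, gamma):
    """One trial of an expectation-weighted delta rule (illustrative only)."""
    # The percept is a weighted mix of the actual input and the expectation.
    perception = (1.0 - gamma) * stimulus + gamma * expectation
    # Model-free update: move the expectation toward the percept by alpha.
    new_expectation = expectation + alpha * (perception - expectation)
    return perception, new_expectation


def run_erl(stimuli, alpha=0.3, gamma=0.4, initial_expectation=50.0):
    """Run the rule over a sequence; returns (percepts, predictions)."""
    expectation = initial_expectation
    percepts, predictions = [], []
    for stimulus in stimuli:
        predictions.append(expectation)  # prediction held before the stimulus
        perception, expectation = erl_step(expectation, stimulus, alpha, gamma)
        percepts.append(perception)
    return percepts, predictions
```

      Setting `gamma` to 0 in this sketch gives a plain delta rule without expectation weighting, which is the kind of comparison the reply describes as its main interest.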

      Reviewer Comment 1.2 — Effects of varying levels of volatility and stochasticity: The study commendably integrates varying levels of volatility and stochasticity into its experimental design. However, the depth of analysis concerning the effects of these variables on model fit appears shallow. A looming concern is whether the superior performance of the expectation-weighted Kalman Filter model might be a natural outcome of the experimental design. While the non-significant difference between eKF and eRL for the high stochasticity condition somewhat alleviates this concern, it raises another query: Would a more granular analysis of volatility and stochasticity effects reveal fine-grained model fit patterns?

      Initial reply: We are sorry that the reviewer finds shallow ”the depth of analysis concerning the effects of these variables on model fit”. We are not sure which analysis the reviewer has in mind when suggesting a ”more granular analysis of volatility and stochasticity effects” to ”reveal fine-grained model fit patterns”. Therefore, we find it difficult to improve our manuscript in this regard. We are happy to add analyses to our paper but we would be grateful for some specific pointers. We have already provided:

      •    Analysis of model-naive performance across different levels of stochasticity and volatility (section 2.3, figure 3, supplementary information section 1.1 and tables S1-2)

      •    Model fitting for each stochasticity/volatility condition (section 2.4.1, figure 4, supplementary table S5)

      •    Group-level and individual-level differences of each model parameter across stochasticity/volatility conditions (supplementary information section 7, figures S4-S5).

      •    Effect of confidence on scaling factor for each stochasticity/volatility condition (figure 5)

      Reviewer Comment 1.3 — Rating instruction: According to Fig. 1A, participants were prompted to rate their responses to the question, ”How much pain DID you just feel?” and to specify their confidence level regarding their pain. It is difficult for me to understand the meaning of confidence in this context, given that they were asked to report their *subjective* feelings. It might have been better to query participants about perceived stimulus intensity levels. This perspective is seemingly echoed in lines 100-101, ”the primary aim of the experiment was to determine whether the expectations participants hold about the sequence inform their perceptual beliefs about the intensity of the stimuli.”

      Initial reply: Thank you for raising this question, which allows us to clarify our paradigm. On half of the trials, participants were asked to report the perceived intensity of the previous stimulus; on the remaining trials, participants were requested to predict the intensity of the next stimulus. Therefore, we did query ”participants about perceived stimulus intensity levels”, as described at lines 49-55, 296-303, and depicted in figure 1.

      The confidence refers to the level of confidence that participants have regarding their rating - how sure they are. This is done in addition to their perceived stimulus intensity and it has been used in a large body of previous studies in any sensory modality.

      Reviewer Comment 1.4 — Relevance to clinical pain: While the authors underscore the relevance of their findings to chronic pain, they did not include data pertaining to clinical pain. Notably, their initial preprint seemed to encompass data from a clinical sample (https://www.medrxiv.org/content/10.1101/2023.03.23.23287656v1), which, for reasons unexplained, has been omitted in the current version. Clarification on this discrepancy would be instrumental in discerning the true relevance of the study’s findings to clinical pain scenarios.

      Initial reply: The preprint that the Reviewer is referring to was an older version of the manuscript in which we combined two different experiments, which were initially born as separate studies: the one that we submitted to eLife (done in the lab, with noxious stimuli in healthy participants) and an online study with a different statistical learning paradigm (without noxious stimuli, in chronic back pain participants). Unfortunately, the paradigms were different and not directly comparable. Indeed, following submission to a different journal, the manuscript was criticised for this reason. We therefore split the paper in two, and submitted the first study to eLife. We are now planning to perform the same lab-based experiment with noxious stimuli on chronic back pain participants. Progress on this front has been slowed down by the fact that I (Flavia Mancini) am on maternity leave, but it remains top priority once back to work.

      Reviewer Comment 1.5 — Paper organization: The paper’s organization appears a little bit weird, possibly due to the removal of significant content from their initial preprint. Sections 2.12.2 and 2.4 seem more suitable for the Methods section, while 2.3 and 2.4.1 are the only parts that present results. In addition, enhancing clarity through graphical diagrams, especially for the experimental design and computational models, would be quite beneficial. A reference point could be Fig. 1 and Fig. 5 from Jepma et al. (2018), which similarly explored RL and KF models.

      Initial reply: Thank you for these suggestions. We will consider restructuring the paper in the revised version.

      Revised reply: We restructured the introduction, results and parts of the methods. We followed the reviewer’s suggestion regarding enhancing clarity through graphical diagrams. We have visualised the experimental design in Figure 1D. Furthermore, we have visualised the two main computational models (eRL and eKF) in Figure 2, following Jepma et al. (2018). As a result, we have updated the notation in Section 4.4 to be clearer and consistent with the graphical representation (renaming the variable referring to observed thermal input from Ot to Nt).

      Reviewer Comment 1.6 — In lines 99-100, the statement ”following the work by [23]” would be more helpful if it included a concise summary of the main concepts from the referenced work.

      - It would be helpful to have descriptions of the conditions that Figure 1C is elaborating on.

      - In line 364, the ”N_t” in the sentence ”The observation on trial t, N_t”, should be O_t.

      Initial reply: Thank you for spotting these and for providing the suggestions. We will include the correction in the revised version.

      Revised reply: We have added the following regarding the lines 99-100:

      ”We build on the work by [23], who show that pain perception is strongly influenced by expectations as defined by a cue that predicts high or low pain. In contrast to the cue-paradigm from [23], the primary aim of our experiment was to determine whether the expectations participants hold about the sequence itself inform their perceptual beliefs about the intensity of the stimuli.”

      See comment in the previous reply, regarding the notation change from Ot to Nt.

      Reviewer 2:

      Reviewer Comment 2.1 — This is a highly interesting and novel finding with potential implications for the understanding and treatment of chronic pain where pain regulation is deficient. The paradigm is clear, the analysis is state-of-the-art, the results are convincing, and the interpretation is adequate.

      Initial reply: Thank you very much for these positive comments.

      Reviewer 3:

      Summary:

      I am pleased to have had the opportunity to review this manuscript, which investigated the role of statistical learning in the modulation of pain perception. In short, the study showed that statistical aspects of temperature sequences, with respect to specific manipulations of stochasticity (i.e., randomness of a sequence) and volatility (i.e., speed at which a sequence unfolded) influenced pain perception. Computational modelling of perceptual variables (i.e., multi-dimensional ratings of perceived or predicted stimuli) indicated that models of perception weighted by expectations were the best explanation for the data. My comments below are not intended to undermine or question the quality of this research. Rather, they are offered with the intention of enhancing what is already a significant contribution to the pain neuroscience field. Below, I highlight the strengths and weaknesses of the manuscript and offer suggestions for incorporating additional methodological details.

      Strengths:

      - The manuscript is articulate, coherent, and skilfully written, making it accessible and engaging.

      - The innovative stimulation paradigm enables the exploration of expectancy effects on perception without depending on external cues, lending a unique angle to the research.

      - By including participants’ ratings of both perceptual aspects and their confidence in what they perceived or predicted, the study provides an additional layer of information to the understanding of perceptual decision-making. This information was thoughtfully incorporated into the modelling, enabling the investigation of how confidence influences learning.

      - The computational modelling techniques utilised here are methodologically robust. I commend the authors for their attention to model and parameter recovery, a facet often neglected in previous computational neuroscience studies.

      - The well-chosen citations not only reflect a clear grasp of the current research landscape but also contribute thoughtfully to ongoing discussions within the field of pain neuroscience.

      Initial reply: We are really grateful for the reviewer’s insightful comments and for the useful guidance regarding our methodology. We are also thankful to the reviewer for highlighting the strengths of our manuscript. Below we respond to the individual weaknesses mentioned in the reviewer’s report.

      Reviewer Comment 3.1 — In Figure 1, panel C, the authors illustrate the stimulation intensity, perceived intensity, and prediction intensity on the same scale, facilitating a more direct comparison. It appears that the stimulation intensity has been mathematically transformed to fit a scale from 0 to 100, aligning it with the intensity ratings corresponding to either past or future stimuli. Given that the pain threshold is specifically marked at 50 on this scale, one could logically infer that all ratings falling below this value should be deemed non-painful. However, I find myself uncertain about this interpretation, especially in relation to the term ”arbitrary units” used in the figure. I would greatly appreciate clarification on how to accurately interpret these units, as well as an explanation of the relationship between these values and the definition of pain threshold in this experiment.

      Initial reply: Indeed, as detailed in the Methods section 4.3, the stimulation intensity was originally transformed from the 1-13 scale to 0-100 scale to match the scales in the participant response screens.

      Following the method used to establish the pain threshold, we set the stimulus intensity of 7 as the threshold on the original 1-13 scale. However, during the rating part of the experiment, several of the participants never or very rarely selected a value above 50 (their individually defined pain threshold), despite previously indicating a moment during the pain threshold procedure when a stimulus becomes painful. As a result, the re-scaled intensity values, as well as the perception ratings, both on the same 0-100 scale of arbitrary units, never go above the pain threshold. Please see all participant ratings and inputs in the Figure below. We see that it would be more illustrative to re-plot Figure 1 with a different exemplary participant, whose ratings go above the pain threshold, perhaps with an input intensity on the 1-13 scale on an additional right-hand-side y-axis. We will add this in the revised version as well as highlight the fact above.

      Importantly, while values below 50 are deemed non-painful by participants, the thermal stimulation still activates C-fibres involved in nociception, and we would argue that the modelling framework and analysis still applies in this case.
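
      The rescaling described in this reply can be sketched as a simple linear map from the 1-13 stimulus scale onto the 0-100 rating scale. This is only an illustration: the reply does not state the exact formula, so the linear form and the function name `rescale_intensity` are assumptions. Under that assumption, the threshold level of 7 lands exactly on 50, which matches the threshold value given above.

```python
def rescale_intensity(level, low=1, high=13):
    """Linearly map a stimulus level on the low..high scale onto 0..100."""
    if not low <= level <= high:
        raise ValueError(f"level {level} is outside {low}..{high}")
    return 100.0 * (level - low) / (high - low)
```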

      Revised reply: We re-plotted Figure 1E-F with a different exemplary participant, whose ratings go above the pain threshold. We also included all participant pain perception and prediction ratings, noxious input sequences and confidence ratings in the supplement in Figures S1-S3.

      Reviewer Comment 3.2 — The method of generating fluctuations in stimulation temperatures, along with the handling of perceptual uncertainty in modelling, requires further elucidation. The current models appear to presume that participants perceive each stimulus accurately, introducing noise only at the response stage. This assumption may fail to capture the inherent uncertainty in the perception of each stimulus intensity, especially when differences in consecutive temperatures are as minimal as 1°C.

      Initial reply: We agree with the reviewer that there are multiple sources of uncertainty involved in the process of rating the intensity of thermal stimuli, including perceptual uncertainty. In order to include an account of inaccurate perception, one would have to consider the different sources that contribute to it, of which there may be many. In our approach, we consider one, which is captured in the expectation-weighted models and most clearly exemplified in the expectation-weighted Kalman filter model (eKF). The model treats participants’ perception of the input as an imperfect indicator of the true level of pain. In this case, it turns out that perception is corrupted as a result of the expectation participants hold about the upcoming stimuli. The extent of this effect is partly governed by a subjective level of noise ϵ, which may also subsume other sources of uncertainty beyond the expectation effect. Moreover, the response noise ξ could also subsume any other unexplained sources of noise.

      Author response image 1.

      Stimulus intensity transformation

      Revised reply: We clarified our modelling choices in the ”2.2 Modelling strategy” subsection.
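
      The idea that the percept is an imperfect, expectation-weighted reading of the input can be illustrated with a generic scalar Kalman filter. The sketch below is the textbook form, not the authors' eKF: the function name `kalman_step` and the parameters `obs_noise_var` (standing in for the squared subjective noise ϵ) and `drift_var` (how fast the underlying level is assumed to change) are hypothetical, and response noise ξ and confidence are omitted.

```python
def kalman_step(prior_mean, prior_var, stimulus, obs_noise_var, drift_var):
    """One trial of a scalar Kalman filter (generic textbook form)."""
    # Kalman gain: how much the noisy input is trusted relative to the prior.
    gain = prior_var / (prior_var + obs_noise_var)
    # The posterior mean acts as the percept: input shrunk toward expectation.
    percept = prior_mean + gain * (stimulus - prior_mean)
    posterior_var = (1.0 - gain) * prior_var
    # Random-walk dynamics: the mean carries over, uncertainty grows.
    next_mean = percept
    next_var = posterior_var + drift_var
    return percept, next_mean, next_var
```

      In this sketch, a larger `obs_noise_var` lowers the gain, so the percept stays closer to the expectation; with zero observation noise the percept equals the input.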

      Reviewer Comment 3.3 — A key conclusion drawn is that eKF is a better model than eRL. However, a closer examination of the results reveals that the two models behave very similarly, and it is not clear that they can be readily distinguished based on model recovery and model comparison results.

      Initial reply: While the eKF appears to rank higher than the eRL in terms of LOOIC and sigma effects, we don’t wish to make sweeping statements regarding the significance of differences between eRL and eKF, but merely point to the trend in the data. We shall make this clearer in the revised version of the manuscript. However, the most important result is that the models involving expectation-weighting arguably capture the data better.

      Revised reply: We elaborated on the significance statements in the ”Modelling Results” subsection:

      • We considered at least a 2 sigma effect as indication of a significant difference. In each condition, the expectation weighted models (eKF and eRL) provided better fit than models without this element (KF and RL; approx. 2-4 sigma difference, as reported in Figure 5A-D). This suggests that regardless of the levels of volatility and stochasticity, participants still weigh perception of the stimuli with their expectation.

      and in the first paragraph of the Discussion:

      • When varying different levels of inherent uncertainty in the sequences of stimuli (stochasticity and volatility), the expectation and confidence weighted models fitted the data better than models weighted for confidence but not for expectations (Figure 5A-D). The expectation-weighted Bayesian (KF) model offered a better fit than the expectation-weighted, model-free RL model, although in conditions of high stochasticity this difference was short of significance. Overall, this suggests that participants’ expectations play a significant role in the perception of sequences of noxious stimuli.

      We are aware of the limitations and lack of clear guidance regarding using sigma effects to establish significance (as per the reviewer’s suggestion: https://discourse.mc-stan.org/t/loo-comparison-in-reference-to-standard-error/4009). Here we decided to use the above-mentioned threshold of 2 sigma as an indication of significance, but note the potential limitations of the inferences, especially when distinguishing between the eRL and eKF models.
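
      For readers unfamiliar with the "sigma effect" used here, the sketch below shows how a difference in expected log predictive density (ELPD) and its standard error are commonly computed from pointwise values, following the usual paired-difference approach. It is a hedged illustration rather than the authors' pipeline: the function name `sigma_effect` and its inputs are hypothetical, and the pointwise ELPD values are assumed to come from a separate leave-one-out procedure.

```python
import math
import statistics


def sigma_effect(elpd_pointwise_a, elpd_pointwise_b):
    """Return (elpd_diff, se_diff, ratio) for model A minus model B."""
    if len(elpd_pointwise_a) != len(elpd_pointwise_b):
        raise ValueError("both models need the same observations")
    diffs = [a - b for a, b in zip(elpd_pointwise_a, elpd_pointwise_b)]
    elpd_diff = sum(diffs)
    # Standard error of a sum of n paired differences.
    se_diff = math.sqrt(len(diffs)) * statistics.stdev(diffs)
    # The ratio is undefined when all pointwise differences are identical.
    ratio = elpd_diff / se_diff if se_diff > 0 else float("nan")
    return elpd_diff, se_diff, ratio
```

      Under the convention adopted in this reply, an absolute ratio of at least 2 would be read as an indication of a difference, with the caveat stated above that no threshold is universally agreed upon.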

      Reviewer Comment 3.4 — Regarding model recovery, the distinction between the eKF and eRL models seems blurred. When the simulation is based on the eKF, there is no ability to distinguish whether either eKF or eRL is better. When the simulation is based on the eRL, the eRL appears to be the best model, but the difference with eKF is small. This raises a few more questions. What is the range of the parameters used for the simulations?

      Initial reply: We agree that the distinction between eKF and eRL in the model recovery is not that clean-cut, which may in turn point to the similarity between the two models. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values.

      Reviewer Comment 3.5 — Is it possible that either eRL or eKF are best when different parameters are simulated? Additionally, increasing the number of simulations to at least 100 could provide more convincing model recovery results.

      Initial reply: It could be a possibility, but it would require further investigation and comparison of fits for different bins/ranges of parameters to see if there is any consistent advantage of one model over another in each bin. We will consider adding this analysis, and provide an additional 50 simulations to paint a more convincing picture.

      Revised reply: We increased the number of simulations per model pair to ≈ 100 (after rejecting fits based on diagnostics criteria - E-BFMI and divergent transitions) and updated the confusion matrix (Table S4). Although the confusion between eRL and eKF remains, the model recovery shows good distinction between expectation weighted vs non-expectation weighted (and Random) models, which supports our main conclusion in the paper.
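
      The confusion matrix mentioned in this reply can be thought of as a table of how often each candidate model wins the comparison when data are simulated from a known model. The sketch below shows one way to tabulate it; the function name `confusion_matrix` and the input format (pairs of simulated and best-fitting model labels) are assumptions made for illustration, and the fitting and comparison steps themselves are not shown.

```python
from collections import Counter


def confusion_matrix(outcomes, models):
    """outcomes: iterable of (simulated_model, best_fitting_model) pairs."""
    counts = Counter(outcomes)
    matrix = {}
    for simulated in models:
        total = sum(counts[(simulated, fitted)] for fitted in models)
        # Row-normalise so each row gives P(best fit | simulated model).
        matrix[simulated] = {
            fitted: (counts[(simulated, fitted)] / total if total else 0.0)
            for fitted in models
        }
    return matrix
```

      A row with mass spread across two models, as described for eRL and eKF, indicates that those two are hard to tell apart, whereas mass concentrated on the diagonal indicates good recovery.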

      Reviewer Comment 3.6 — Regarding model comparison, the authors reported that ”the expectation-weighted KF model offered a better fit than the eRL, although in conditions of high stochasticity, this difference was short of significance against the eRL model.” This interpretation is based on a significance test that hinges on the ratio between the ELPD and the surrounding standard error (SE). Unfortunately, there’s no agreed-upon threshold of SEs that determines significance, but a general guideline is to consider ”several SEs,” with a higher number typically viewed as more robust. However, the text lacks clarity regarding the specific number of SEs applied in this test. At a cursory glance, it appears that the authors may have employed 2 SEs in their interpretation, while only depicting 1 SE in Figure 4.

      Initial reply: Indeed, we considered 2 sigma effect as a threshold, however we recognise that there is no agreed-upon threshold, and shall make this and our interpretation clearer regarding the trend in the data, in the revision.

      Revised reply: We clarify this further, as per our revised response to Comment 3.3 above. We have also added the following statement in section 4.5.1 (Methods, Model comparison): ”There’s no agreed-upon threshold of SEs that determines significance, but the higher the sigma difference, the more robust is the effect.”

      Reviewer Comment 3.7 — With respect to parameter recovery, a few additional details could be included for completeness. Specifically, while the range of the learning rate is understandably confined between 0 and 1, the range of other simulated parameters, particularly those without clear boundaries, remains ambiguous. Including scatter plots with the simulated parameters on the x-axis and the recovered parameters on the y-axis would effectively convey this missing information.

      Furthermore, it would be beneficial for the authors to clarify whether the same priors were used for both the modelling results presented in the main paper and the parameter recovery presented in the supplementary material.

      Initial reply: Thanks for this comment and for the suggestions. To simulate the data for the model and parameter recovery analysis, we used the group means and variances estimated on the participant data to sample individual parameter values. The priors on the group and individual-level parameters in the recovery analysis were the same as in the fitting procedure. We will include the requested scatter plots in the next iteration of the manuscript.

      Revised reply: We included parameter recovery scatter plots for each model and parameter in the Supplement Figures S7-S11.
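
      Alongside scatter plots, parameter recovery is often summarised numerically. The sketch below computes the Pearson correlation and the mean bias between simulated and recovered values of one parameter. It is an illustrative helper only: the function name `recovery_summary` is hypothetical, and the source does not state which summary statistics, if any, accompany Figures S7-S11.

```python
import math


def recovery_summary(simulated, recovered):
    """Pearson correlation and mean bias between simulated and recovered values."""
    n = len(simulated)
    if n != len(recovered) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_s = sum(simulated) / n
    mean_r = sum(recovered) / n
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(simulated, recovered))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_r = sum((r - mean_r) ** 2 for r in recovered)
    correlation = cov / math.sqrt(var_s * var_r)
    bias = mean_r - mean_s  # positive: overestimated on average
    return correlation, bias
```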

      Reviewer Comment 3.8 — While the reliance on R-hat values for convergence in model fitting is standard, a more comprehensive assessment could include estimates of the effective sample size (bulk ESS and/or tail ESS) and the Estimated Bayesian Fraction of Missing Information (EBFMI), to show efficient sampling across the distribution. Consideration of divergences, if any, would further enhance the reliability of the results.

      Initial reply: Thank you very much for this suggestion, we will aim to include these measures in the revised version.

      Revised reply: We have considered the suggested diagnostics and include bulk and tail ESS values for each condition, model and parameter in the Supplement Tables S6-S9. We also report the number of chains with low E-BFMI (0), the number of divergent transitions (0) and the E-BFMI values per chain in Table S10.
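
      As background on the E-BFMI diagnostic reported here, the sketch below computes it for one chain from the sampler's per-iteration energy values, as the ratio of the summed squared energy changes between successive iterations to the summed squared deviations of the energy from its mean. The function names are hypothetical, the energies are assumed to have been extracted from the sampler output beforehand, and the 0.3 cut-off is an assumed default that should be checked against the sampler's own guidance.

```python
def e_bfmi(energies):
    """E-BFMI for one chain from its per-iteration Hamiltonian energies."""
    n = len(energies)
    if n < 2:
        raise ValueError("need at least two energy values")
    mean_energy = sum(energies) / n
    # Numerator: squared changes in energy between successive iterations.
    numerator = sum((energies[i] - energies[i - 1]) ** 2 for i in range(1, n))
    # Denominator: spread of the energy around its chain mean.
    denominator = sum((e - mean_energy) ** 2 for e in energies)
    return numerator / denominator


def chains_with_low_e_bfmi(chains, threshold=0.3):
    """Count chains whose E-BFMI falls below the (assumed) warning threshold."""
    return sum(1 for energies in chains if e_bfmi(energies) < threshold)
```

      Low values indicate that the energy changes only slowly from one iteration to the next relative to its overall spread, which suggests inefficient exploration of the posterior.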

      Reviewer Comment 3.9 — The authors write: ”Going beyond conditioning paradigms based in cuing of pain outcomes, our findings offer a more accurate description of endogenous pain regulation.” Unfortunately, this statement isn’t substantiated by the results. The authors did not engage in a direct comparison between conditioning and sequence-based paradigms. Moreover, even if such a comparison had been made, it remains unclear what would constitute the gold standard for quantifying ”endogenous pain regulation.”

      Initial reply: This is a valid point. Indeed, we do not compare paradigms in our study, and we will remove this statement in the future version.

      Revised reply: We have removed this statement from the revised version.

      Reviewer Comment 3.10 — In relation to the comment on model comparison in my public review, I believe the following link may provide further insight and clarify the basis for my observation. It discusses the use of standard error in model comparison and may be useful for the authors in addressing this particular point: https://discourse.mc-stan.org/t/loo-comparison-in-reference-to-standard-error/4009

      Initial reply: Thank you for this suggestion, we will consider the forum discussion in our manuscript.

    1. eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity.

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals.

      Weaknesses:

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to obtain robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis. In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis of the speed and the geometry of neural population trajectories for correct and incorrect trials. In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test the type III errors.
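
      One possible reading of the center-of-mass and scaling-factor suggestion is sketched below. The review does not define either quantity, so this is an assumption made for illustration: the center of mass is taken as the activity-weighted mean time of a cell's response, and the scaling factor as the ratio of a trial's center of mass to that of a reference template. The function names are hypothetical.

```python
def center_of_mass(times, activity):
    """Activity-weighted mean time of one cell's response in one trial."""
    total = sum(activity)
    if total <= 0:
        raise ValueError("activity must have a positive sum")
    return sum(t * a for t, a in zip(times, activity)) / total


def scaling_factor(times, trial_activity, template_activity):
    """Ratio of a trial's center of mass to that of a reference template."""
    return center_of_mass(times, trial_activity) / center_of_mass(times, template_activity)
```

      Under this reading, a factor above 1 means the cell's activity is shifted later in the trial than in the template, and a factor below 1 means it is shifted earlier.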

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions."

    3. Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in system neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain their rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not however intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do). While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns

      (#1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when the nosepokes are too short or when they exceed the required duration. So it is unclear whether the variation in nosepoke duration reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).

      (#2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (#3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      (#4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke.

      (#5) The authors should mention upfront in the main text (results section) the temporal resolution allowed by their Ca++ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

    4. Author response:

      eLife assessment

      This useful study reports how neuronal activity in the prefrontal cortex maps time intervals during which animals have to wait until reaching a reward and how this mapping is preserved across days. However, the evidence supporting the claims is incomplete as these sequential neuronal patterns do not necessarily represent time but instead may be correlated with stereotypical behavior and restraint from impulsive decision, which would require further controls (e.g. behavioral analysis) to clarify the main message. The study will be of interest to neuroscientists interested in decision making and motor control. 

      We thank the editors and reviewers for the constructive comments. In light of the questions mentioned by the reviewers, we plan to perform additional analyses in our revision, particularly aiming to address issues related to single-cell scalability, and effects of motivation and movement. We believe these additional data will greatly improve the rigor and clarity of our study. We are grateful for the review process of eLife.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the neural population activity patterns of the medial frontal cortex in rats performing a nose poking timing task using in vivo calcium imaging. The results showed neurons that were active at the beginning and end of the nose poking and neurons that formed sequential patterns of activation that covaried with the timed interval during nose poking on a trial-by-trial basis. The former were not stable across sessions, while the latter tended to remain stable over weeks. The analysis on incorrect trials suggests the shorter non-rewarded intervals were due to errors in the scaling of the sequential pattern of activity. 

      Strengths:

      This study measured stable signals using in vivo calcium imaging during experimental sessions that were separated by many days in animals performing a nose poking timing task. The correlation analysis on the activation profile to separate the cells in the three groups was effective and the functional dissociation between beginning and end, and duration cells was revealing. The analysis on the stability of decoding of both the nose poking state and poking time was very informative. Hence, this study dissected a neural population that formed sequential patterns of activation that encoded timed intervals. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      It is not clear whether animals had enough simultaneously recorded cells to perform the analyses of Figures 2-4. In fact, rat 3 had 18 responsive neurons, which is probably not enough to get robust neural sequences for the trial-by-trial analysis and the correct and incorrect trial analysis.

      We thank the reviewer for the comment. We would like to mention that the 18 cells plotted in Supplementary figure 1 were only from the duration cell category. To improve the clarity of our results, we are going to provide information regarding the number of cells from each rat in our revision. In general, we imaged more than 50 cells from each rat. We would also like to point to the data from individual trials in Supplementary figure 1B showing robust sequentiality.

      In addition, the analysis of behavioral errors could be improved. The analysis in Figure 4A could be replaced by a detailed analysis on the speed, and the geometry of neural population trajectories for correct and incorrect trials.

      We thank the reviewer for the suggestions. We are going to conduct the analysis as the reviewer recommended. We agree with the reviewer that better presentation of the neural activity will be helpful for the readers.

      In the case of Figure 4G, it is not clear why the density of errors formed two clusters instead of having a linear relation with the produced duration. It would be advisable to compute the scaling factor on neuronal population trajectories and single-cell activity, or to compute the center of mass, to test the type III errors.

      We would like to mention that the prediction errors plotted in this graph were calculated from two types of trials. The correct trials tended to show positive time estimation errors while the incorrect trials showed negative time estimation errors. We believe that the polarity switch between these two types suggested a possible use of this neural mechanism to time the action of the rats.

      In addition, we are going to perform the analysis suggested by the reviewer in our revision. We agree that different ways of analyzing the data would provide better characterization of the scaling effect.

      Due to the slow time resolution of calcium imaging, it is difficult to perform robust analysis on ramping activity. Therefore, I recommend downplaying the conclusion that: "Together, our data suggest that sequential activity might be a more relevant coding regime than the ramping activity in representing time under physiological conditions." 

      We agree with the reviewer and we have mentioned this caveat in our original manuscript. We are going to rephrase the sentence as the reviewer suggested during our revision.

      Reviewer #2 (Public Review):

      In this manuscript, Li and collaborators set out to investigate the neuronal mechanisms underlying "subjective time estimation" in rats. For this purpose, they conducted calcium imaging in the prefrontal cortex of water-restricted rats that were required to perform an action (nosepoking) for a short duration to obtain drops of water. The authors provided evidence that animals progressively improved in performing their task. They subsequently analyzed the calcium imaging activity of neurons and identified start, duration, and stop cells associated with the nose poke. Specifically, they focused on duration cells and demonstrated that these cells served as a good proxy for timing on a trial-by-trial basis, scaling their pattern of activity in accordance with changes in behavioral performance. In summary, as stated in the title, the authors claim to provide mechanistic insights into subjective time estimation in rats, a function they deem important for various cognitive conditions.

      This study aligns with a wide range of studies in systems neuroscience that presume that rodents solve timing tasks through an explicit internal estimation of duration, underpinned by neuronal representations of time. Within this framework, the authors performed complex and challenging experiments, along with advanced data analysis, which undoubtedly merits acknowledgement. However, the question of time perception is a challenging one, and caution should be exercised when applying abstract ideas derived from human cognition to animals. Studying so-called time perception in rats has significant shortcomings because, whether acknowledged or not, rats do not passively estimate time in their heads. They are constantly in motion. Moreover, rats do not perform the task for the sake of estimating time but to obtain rewards, as they are water restricted. Their behavior will therefore reflect their motivation and urgency to obtain rewards. Unfortunately, it appears that the authors are not aware of these shortcomings. These alternative processes (motivation, sensorimotor dynamics) that occur during task performance are likely to influence neuronal activity. Consequently, my review will be rather critical. It is not, however, intended to be dismissive. I acknowledge that the authors may have been influenced by numerous published studies that already draw similar conclusions. Unfortunately, all the data presented in this study can be explained without invoking the concept of time estimation. Therefore, I hope the authors will find my comments constructive and understand that as scientists, we cannot ignore alternative interpretations, even if they conflict with our a priori philosophical stance (e.g., duration can be explicitly estimated by reading a neuronal representation of time) and anthropomorphic assumptions (e.g., rats estimate time as humans do).
While space is limited in a review, if the authors are interested, they can refer to a lengthy review I recently published on this topic, which demonstrates that my criticism is supported by a wide range of timing experiments across species (Robbe, 2023). In addition to this major conceptual issue that casts doubt on most of the conclusions of the study, there are also several major statistical issues.

      Main Concerns 

      (1) The authors used a task in which rats must poke for a minimal amount of time (300 ms and then 1500 ms) to be able to obtain a drop of water delivered a few centimeters right below the nosepoke. They claim that their task is a time estimation task. However, they forget that they work with thirsty rats that are eager to get water sooner rather than later (there is a reason why they start with a short duration!). This task is mainly probing the animals' ability to wait (that is, impulse control) rather than time estimation per se. Second, the task does not require precise time estimation because there appear to be no penalties when the nosepokes are too short or when they exceed the required duration. So it is unclear whether the variation in nosepoke duration reflects motivational changes rather than time estimation changes. The fact that this behavioral task is a poor assay for time estimation and rather reflects impulse control is shown by the tendency of animals to perform nose-pokes that are too short, the very slow improvement in their performance (Figure 1, with most of the mice making short responses), and the huge variability. Not only do the behavioral data not support the claim of the authors in terms of what the animals are actually doing (estimating time), but this also completely annihilates the interpretation of the Ca++ imaging data, which can be explained by motivational factors (changes in neuronal activity occurring while the animals nose poke may reflect a growing sense of urgency to check if water is available).

      We would like to respond to the reviewer’s comments 1, 2 and 4 together since they all focus on the same issue. We thank the reviewer for the very thoughtful comments and for sharing his detailed reasoning from a recently published review (Robbe, 2023). Much of the discussion goes beyond the scope of this study, and we agree that whether there is an explicit representation of time (an internal clock) in the brain is a difficult question to answer, particularly by using animal behaviors. In fact, even with fully conscious humans and elaborate task designs, we think it remains difficult to clearly dissociate the neural substrate of “timing” from “motor”. In the end, it may well be that, as the reviewer cited from Bergson’s article, the experience of time cannot be measured.

      Studying the neural representation of any internal state may suffer from the same ambiguity. With all due respect, however, we would like to limit our response to the scope of our results. According to the reviewer, two alternative interpretations of the task-related sequential activity exist: (1) duration cells may represent fidgeting or orofacial movements, and (2) duration cells may represent the motivation or motion plan of the rats. To test the first alternative interpretation, we will perform a more comprehensive analysis of the behavioral data for all the limbs and visible body parts of the rat during the nose poke and analyze its periodicity across trials, although the orofacial movements may not be visible to us.

      Regarding the second alternative interpretation, we think our data in the original Figure 4G argue against it. In this graph, we plotted the decoding error of time using the duration cells’ activity against the actual duration of the trials. If the sequential activity of duration cells only represents motivation, then the errors should be distributed evenly across different trial times, or be linearly modulated by trial durations. The unimodal distribution we observed (Figure 4G and see Author response image 1 below for a re-plot without signs) suggests that the scaling factor of the sequential activity represents information related to time. And the fact that this unimodal distribution is centered at the time threshold of the task provides strong evidence for the active use of the scaling factor for time estimation. In order to further test the relationship to motivation, we will measure the time interval between exiting the nose poke and the start of licking the water reward as an independent measurement of motivation for each trial. We will analyze and report whether this measurement correlates with the nose poking durations in our data in the revision.

      Author response image 1.
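
      As a minimal sketch of the planned motivation check described above (not the authors' actual pipeline), the per-trial latency from nose-poke exit to first lick can be correlated with the nose-poke duration. The function names, variable names and timestamps below are hypothetical and chosen only for illustration; real data would come from the behavioral event logs.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def exit_to_lick_latency(poke_exit_times, first_lick_times):
    """Latency (s) from leaving the nose poke to the first lick, per trial."""
    return [lick - exit_ for exit_, lick in zip(poke_exit_times, first_lick_times)]


# Hypothetical per-trial timestamps in seconds (illustrative, not study data).
poke_onsets = [0.0, 10.0, 20.0, 30.0, 40.0]
poke_exits = [1.6, 11.2, 21.9, 31.5, 42.1]
first_licks = [2.1, 11.9, 22.3, 32.1, 42.5]

durations = [e - o for o, e in zip(poke_onsets, poke_exits)]
latencies = exit_to_lick_latency(poke_exits, first_licks)
print(f"r(duration, exit-to-lick latency) = {pearson_r(durations, latencies):.2f}")
```

      Under this sketch, a strong correlation between the two measures would be compatible with a motivational account of nose-poke duration, whereas a weak one would not by itself rule it out; computing the coefficient separately for each rat would also address the reviewer's point about interindividual variability.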

      Furthermore, whether the scaling sequential activity we report represents behavioral timing or true time estimation, the reviewer would agree that these activities correlate with the animal’s nose poking durations, and a previous study has shown that PFC silencing led to disruption of the mouse’s timing behavior (PMID: 24367075). The main surprising finding of the paper is that these duration cells are different from the start and end cells in terms of their coding stability. Thus, future studies dissecting the anatomical microcircuit of these duration cells may provide further clues regarding whether they receive inputs from thirst or reward-related brain regions. This may help partially resolve the “time” vs. “motor” debate the reviewer mentioned.

      (2) A second issue is that the authors seem to assume that rats are perfectly immobile and perform like some kind of robots that would initiate nose pokes, maintain them, and remove them in a very discretized manner. However, in this kind of task, rats are constantly moving from the reward magazine to the nose poke. They also move while nose-poking (either their body or their mouth), and when they come out of the nose poke, they immediately move toward the reward spout. Thus, there is a continuous stream of movements, including fidgeting, that will covary with timing. Numerous studies have shown that sensorimotor dynamics influence neural activity, even in the prefrontal cortex. Therefore, the authors cannot rule out that what the records reflect are movements (and the scaling of movement) rather than underlying processes of time estimation (some kind of timer). Concretely, start cells could represent the ending of the movement going from the water spout to the nosepoke, and end cells could be neurons that initiate (if one can really isolate any initiation, which I doubt) the movement from the nosepoke to the water spout. Duration cells could reflect fidgeting or orofacial movements combined with an increasing urgency to leave the nose pokes.

      (3) The statistics should be rethought for both the behavioral and neuronal data. They should be conducted separately for all the rats, as there is likely interindividual variability in the impulsivity of the animals.

      We thank the reviewer for the comment, yet we are not quite sure what specifically the reviewer is asking for. There is undoubtedly variance among individual animals. One of the core purposes of statistical comparison is to compare the group difference with the variance due to sampling. It appears that the reviewer would like us to conduct our analysis for each rat individually. We will conduct and report analyses for individual rats in Figure 1C, Figure 2C, G, K, and Figure 4F in our revised manuscript.

      (4) The fact that neuronal activity reflects an integration of movement and motivational factors rather than some abstract timing appears to be well compatible with the analysis conducted on the error trials (Figure 4), considering that the sensorimotor and motivational dynamics will rescale with the durations of the nose poke. 

      (5) The authors should mention upfront in the main text (results section) the temporal resolution allowed by their Ca++ probe and discuss whether it is fast enough with regard to the behavioral dynamics occurring in the task.

      We thank the reviewer for the suggestion. We mentioned the caveat of calcium imaging in the interpretation of our results in the original manuscript. We will add more text on this point during our revision. In terms of behavioral dynamics (start and end of the nose poke in this case), we think calcium imaging provides sufficient kinetics. However, analyses of the more refined dynamics, such as the reproducibility of the sequential activity or the precise representation of individual cells on the scaled duration, may benefit from improved time resolution.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please refer explicitly to the three types of cells in the abstract. 

      We will modify the abstract as suggested during revision.

      (2) Please refer to the work of Betancourt et al., 2023 Cell Reports, where a trial-by-trial analysis on the correlation between neural trajectory dynamics in MPC and timing behavior is reported. In that same paper the stability of neural sequences across task parameters is reported.

      We will cite and discuss this study in our revised paper.

      (3) Please state the number of studied animals at the beginning of the results section. 

      We will provide this information as requested. The number of animals was also plotted in Figure 1D for each analysis.

      (4) Why do the middle and right panels of Figure 2E show duration cells?

      Figure 2E was intended to show examples of duration cells’ activity. We included different examples of cells that peak at different points in the scaled duration. We believe these multiple examples give the readers a straightforward impression of these cells’ activity patterns.

      (5) Which behavioral sessions of Figure 1B were analyzed further?

      We will label the analyzed sessions in Figure 1B during our revision.

      (6) In Figure 3A-C please increase the time before the beginning of the trial in order to visualize properly the activation patterns of the start cells. 

      We thank the reviewer for the suggestion and will modify the figure accordingly during revision.

      (7) Please state what could be the behavioral and functional effect of the ablation of the cortical tissue on top of mPFC. 

      We thank the reviewer for the question. In our experience, mice with a lens implanted in mPFC did not differ observably from mice without surgery regarding the acquisition of the task and the distribution of the nose-poke durations. Although we cannot rule out effects on other cognitive processes, the mice appeared to be intact within the scope of our task. We will provide these behavioral data during our revision.

    1. eLife assessment

      This study presents a useful exploration of the complex relationship between structure and function in the developing human brain using a large-scale imaging dataset from the Human Connectome Project in Development and gene expression profiles from the Allen Brain Atlas. The evidence supporting the claims of the authors is solid, although the inclusion of more systematic analyses of structural and functional connectivity with respect to myelin measures and oligodendrocyte-related genes, and also more details regarding the imaging analyses, cognitive scores, and design and validation strategies, would have strengthened the paper. The work will be of interest to developmental biologists and neuroscientists seeking to elucidate structure-function relationships in the human brain.

    2. Reviewer #1 (Public Review):

      Summary:

      This work studies spatio-temporal patterns of structure-function coupling in developing brains, using a large set of imaging data acquired from children aged 5-22. Magnetic resonance imaging data of brain structure and function were obtained from a publicly available database, from which structural and functional features and measures were derived. The authors examined the spatial patterns of structure-function coupling and how they evolve with brain development. This work further sought correlations of brain structure-function coupling with behavior and explored evolutionary, microarchitectural and genetic bases that could potentially account for the observed patterns.

      Strength:

      The strength of this work is the use of currently available state-of-the-art analysis methods, along with a large set of high-quality imaging data, and comprehensive examinations of structure-function coupling in developing brains. The results are comprehensive and illuminating.

      Weakness:

      As with most other studies, transcriptomic and cellular architectures of structure-function coupling were characterized only on the basis of a common atlas in this work.

      The authors have achieved their aims in this study, and the findings provide mechanistic insights into brain development, which will inspire further basic and clinical studies along this line.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Lines 40-42: The sentence "The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies as well as individual differences in cognitive function, and is regulated by genes" is a misstatement. Regional variations of structure-function coupling do not really reflect differences in cognitive function among individuals, but inter-subject variations do.

      Thank you for your comment. We have made revisions to the sentence to correct its misstatement. Please see lines 40-43: “The coupling of structural connectome (SC) and functional connectome (FC) varies greatly across different cortical regions reflecting anatomical and functional hierarchies[1, 6-9] and is regulated by genes[6, 8], as well as its individual differences relates to cognitive function[8, 9].”

      (2) In Figure 1, the graph showing the relation between intensity and cortical depth needs explanation.

      Thank you for your comment. We have added necessary explanation, please see lines 133-134: “The MPC was used to map similarity networks of intracortical microstructure (voxel intensity sampled in different cortical depth) for each cortical node.”

      (3) Line 167: Change "increased" to "increase".

      We have corrected it, please see lines 173-174: “…networks significantly increased with age and exhibited greater increase.”

      (4) Line 195: Remove "were".

      We have corrected it, please see line 204: “…default mode networks significantly contributed to the prediction…”

      (5) Lines 233-240, Reproducibility analyses: Comparisons of parcellation templates were not made with respect to gene weights. Is there any particular reason?

      Thank you for your comment. We have quantified the gene weights based on HCPMMP using the same procedures. We identified a correlation (r = 0.25, p < 0.001) between the gene weights in HCPMMP and BNA. Given that this is a relatively weak correlation, we need to clarify the following points.

      Based on HCPMMP, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions[1]. The exclusion of 4 cortical regions that had an insufficient number of assigned samples may contribute to the relatively weak correlation of gene associations between templates. Moreover, the effect of different template resolutions on the results of human connectome-transcriptome association is still unclear.

      In brain connectome analysis, the choice of parcellation templates can indeed influence the subsequent findings to some extent. A methodological study[2] reported reference correlations of about 0.4~0.6 for white matter connectivity and 0.2~0.4 for white matter nodal properties between two templates (refer to Figures 4 and 5 in [2]). Therefore, the age-related coupling changes, which are a downstream analysis calculated from the multimodal connectome and correlated with gene expression profiles, may be influenced by the choice of templates.

      We have further supplemented the gene weight results obtained from HCPMMP to explicitly clarify the dependency on parcellation templates.

      Please see lines 251-252: “The gene weights of HCPMMP was consistent with that of BNA (r = 0.25, p < 0.001).”

      Author response image 1.

      The consistency of gene weights between HCPMMP and BNA.

      Please see lines 601-604: “Finally, we produced an averaged gene expression profile for 10,027 genes covering 176 left cortical regions based on HCPMMP and obtained the gene weights by PLS analysis. We performed Pearson's correlation analyses to assess the consistency of gene weights between HCPMMP and BNA.”

      Reviewer #2 (Recommendations For The Authors):

      Your paper is interesting to read and I found your efforts to evaluate the robustness of the results of different parcellation strategies and tractography methods very valuable. The work is globally easy to navigate and well written with informative good-quality figures, although I think some additional clarifications will be useful to improve readability. My suggestions and questions are detailed below (I aimed to group them by topic which did not always succeed so apologies if the comments are difficult to navigate, but I hope they will be useful for reflection and to incorporate in your work).

      * L34: 'developmental disorder'

      ** As far as I understand, the subjects in HCP-D are mostly healthy (L87). Thus, while your study provides interesting insights into typical brain development, I wonder if references to 'disorder' might be premature. In the future, it would be interesting to extend your approach to the atypical populations. In any case, it would be extremely helpful and appreciated if you included a figure visualising the distribution of behavioural scores within your population and in relationship to age at scan for your subjects (and to include a more detailed description of the assessment in the methods section) given that large part of your paper focuses on their prediction using coupling inputs (especially given a large drop of predictive performance after age correction). Such figures would allow the reader to better understand the cognitive variability within your data, but also potential age relationships, and generally give a better overview of your cohort.

      We agree with your comment that references to 'disorder' are premature. We have made revisions in the abstract and conclusion.

      Please see lines 33-34: “This study offers insight into the maturational principles of SC-FC coupling in typical development.”

      Please see lines 395-396: “Further investigations are needed to fully explore the clinical implications of SC-FC coupling for a range of developmental disorders.”

      In addition, we have included a more detailed description of the cognitive scores in the methods section and provided a figure to visualize the distributions of cognitive scores and in relationship to age for subjects. Please see lines 407-413: “Cognitive scores. We included 11 cognitive scores which were assessed with the National Institutes of Health (NIH) Toolbox Cognition Battery (https://www.healthmeasures.net/explore-measurement-systems/nih-toolbox), including episodic memory, executive function/cognitive flexibility, executive function/inhibition, language/reading decoding, processing speed, language/vocabulary comprehension, working memory, fluid intelligence composite score, crystal intelligence composite score, early child intelligence composite score and total intelligence composite score. Distributions of these cognitive scores and their relationship with age are illustrated in Figure S12.”

      Author response image 2.

      Cognitive scores and age distributions of scans.

      * SC-FC coupling

      ** L162: 'Regarding functional subnetworks, SC-FC coupling increased disproportionately with age (Figure 3C)'.

      *** As far as I understand, in Figure 3C, the points are the correlation with age for a given ROI within the subnetwork. Is this correct? If yes, I am not sure how this shows a disproportionate increase in coupling. It seems that there is great variability of SC-FC correlation with age across regions within subnetworks, more so than the differences between networks. This would suggest that the coupling with age is regionally dependent rather than network-dependent? Maybe you could clarify?

      Yes, the points in Figure 3C are the correlations with age for each ROI within the subnetwork. We have revised the description; please see lines 168-174: “Age correlation coefficients distributed within functional subnetworks were shown in Figure 3C. Regarding mean SC-FC coupling within functional subnetworks, the somatomotor (𝛽𝑎𝑔𝑒=2.39E-03, F=4.73, p=3.10E-06, r=0.25, p=1.67E-07, Figure 3E), dorsal attention (𝛽𝑎𝑔𝑒=1.40E-03, F=4.63, p=4.86E-06, r=0.24, p=2.91E-07, Figure 3F), frontoparietal (𝛽𝑎𝑔𝑒=2.11E-03, F=6.46, p=2.80E-10, r=0.33, p=1.64E-12, Figure 3I) and default mode (𝛽𝑎𝑔𝑒=9.71E-04, F=2.90, p=3.94E-03, r=0.15, p=1.19E-03, Figure 3J) networks significantly increased with age and exhibited greater increase.” In addition, we agree with your comment that the coupling with age is more likely region-dependent than network-dependent. We have added a description; please see lines 329-332: “We also found the SC-FC coupling with age across regions within subnetworks has more variability than the differences between networks, suggesting that the coupling with age is more likely region-dependent than network-dependent.” This is why our subsequent analysis focused on regional coupling.

      *** Additionally, we see from Figure 3C that regions within networks have very different changes with age. Given this variability (especially in the subnetworks where you show both positive and negative correlations with age for specific ROIs (i.e. all of them)), does it make sense then to show mean coupling over regions within the subnetworks which erases the differences in coupling with age relationships across regions (Figures 3D-J)?

      Given the interest in interpreting SC-FC coupling at the network level, we think it is still necessary to show the correlation of age with mean coupling at the subnetwork scale, although this removes variability at the regional scale. The results at both scales indicate that, in this age range, coupling mainly increases with age.

      *** Also, I think it would be interesting to show correlation coefficients across all regions, not only the significant ones (3B). Is there a spatially related tendency of increases/decreases (rather than a 'network' relationship)? Would it be interesting to show a similar figure to Figure S7 instead of only the significant regions?

      Following your comment, we have added a plot showing the correlation coefficients across all regions to Figure 3B. We made the same addition to the other figures (Figures S3-S6).

      Author response image 3.

      Age-related changes in SC-FC coupling. (A) Increases in whole-brain coupling with age. (B) Correlation of age with SC-FC coupling across all regions and significant regions (p<0.05, FDR corrected). (C) Comparisons of age-related changes in SC-FC coupling among functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict 1.5× IQR from the first or third quartile. (D-J) Correlation of age with SC-FC coupling across the VIS, SM, DA, VA, LIM, FP and DM. VIS, visual network; SM, somatomotor network; DA, dorsal attention network; VA, ventral attention network; LIM, limbic network; FP, frontoparietal network; DM, default mode network.

      *** For the quantification of MPC.

      **** L421: you reconstructed 14 cortical surfaces from the wm to pial surface. If we take the max thickness of the cortex to be 4.5mm (Fischl & Dale, 2000), the sampling is above the resolution of your anatomical images (0.8mm). Could you expand on what the interest is in sampling such a higher number of surfaces given that the resolution is not enough to provide additional information?

      The surface reconstruction was based on state-of-the-art equivolumetric surface construction techniques[3], which provide a simplified recapitulation of cellular changes across the putative laminar structure of the cortex. By referencing a 100-μm resolution Merker-stained 3D histological reconstruction of an entire post mortem human brain (BigBrain: https://bigbrain.loris.ca/main.php), a methodological study[4] systematically evaluated MPC stability with four to 30 intracortical surfaces at an anatomical image resolution of 0.7 mm, and selected 14 surfaces as the most stable solution. Importantly, that study showed that the in vivo approach can serve as a lower-resolution yet biologically meaningful extension of the histological work[4].

      **** L424: did you aggregate intensities over regions using mean/median or other statistics?

      It might be useful to specify.

      Thank you for your careful comment. We have revised the description in lines 446-447: “We averaged the intensity profiles of vertices over 210 cortical regions according to the BNA”.

      **** L426: personal curiosity, why did you decide to remove the negative correlation of the intensity profiles from the MPC? Although this is a common practice in functional analyses (where the interpretation of negatives is debated), within the context of cortical correlations, the negative values might be interesting and informative on the level of microstructural relationships across regions (if you want to remove negative signs it might be worth taking their absolute values instead).

      We agree with your comment that the interpretation of negative correlations in MPC is debated. Because MPC is a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study [4] that proposed the approach. As you note, the negative correlations might be informative. We will continue to explore what they reveal about microstructural relationships between regions.

      **** L465: could you please expand on the notion of self-connections, it is not completely evident what this refers to.

      We have revised the description in lines 493-494: “𝑁𝑐 is the number of connection (𝑁𝑐 = 245 for BNA)”.

      **** Paragraph starting on L467: did you evaluate the multicollinearities between communication models? It is possibly rather high (especially for the same models with similar parameters (listed on L440-444)). Such dependence between variables might affect the estimates of feature importance (given the predictive models only care to minimize error, highly correlated features can be selected as a strong predictor while the impact of other features with similarly strong relationships with the target is minimized thus impacting the identification of reliable 'predictors').

      We agree with your comment. The covariance structure (multicollinearity) among the communication models is likely to lead to unreliable predictor weights. In our study, we applied Haufe's inversion transform[5], which resolves this issue by computing the covariance between the predicted FC and each communication model in the training set. For more details on Haufe's inversion transform, please see [5]. We have clarified this in the manuscript; please see lines 497-499: “And covariance structure among the predictors may lead to unreliable predictor weights. Thus, we applied Haufe's inversion transform[38] to address these issues and identify reliable communication mechanisms.”
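
      As an illustration of the covariance step described above, the following minimal sketch fits an ordinary least-squares model to two nearly collinear predictors plus one noise predictor, then computes the covariance between each predictor and the model prediction. The data are synthetic, and the function name `haufe_transform` is a hypothetical helper for this sketch, not the code used in the study.

```python
import numpy as np

def haufe_transform(X, y_pred):
    """Covariance between each predictor column and the model prediction."""
    Xc = X - X.mean(axis=0)
    yc = y_pred - y_pred.mean()
    return Xc.T @ yc / (X.shape[0] - 1)

# Synthetic data (assumption): two nearly collinear predictors share one signal,
# the third predictor is pure noise.
rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
X = np.column_stack([
    signal + 0.05 * rng.normal(size=n),
    signal + 0.05 * rng.normal(size=n),
    rng.normal(size=n),
])
y = signal + 0.1 * rng.normal(size=n)

# Ordinary least squares with an intercept column
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
weights = coef[1:]            # raw regression weights (sensitive to collinearity)
y_pred = design @ coef
patterns = haufe_transform(X, y_pred)   # transformed, interpretable weights

print("raw weights:", weights)
print("Haufe patterns:", patterns)
```

      In this toy setting the two collinear predictors receive similar transformed weights regardless of how the regression splits the raw weights between them, which is the property that motivates the transform.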

      **** L474: I am not completely familiar with spin tests but to my understanding, this is a spatial permutation test. I am not sure how this applies to the evaluation of the robustness of feature weight estimates per region (if this was performed per region), it would be useful to provide a bit more detail to make it clearer.

      Following your comment, we have added more detail; please see lines 503-507: “Next, we generated 1,000 FC permutations through a spin test[86] for each nodal prediction in each subject and obtained random distributions of model weights. These weights were averaged over the group and were investigated the enrichment of the highest weights per region to assess whether the number of highest weights across communication models was significantly larger than that in a random discovery.”
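
      For readers unfamiliar with spin tests, the sketch below shows the general idea of a spatial permutation: region centroids on a sphere are rotated at random, and each region is reassigned the index of the nearest rotated region. It assumes centroids for a single hemisphere projected onto a unit sphere. The coordinates, the region count of 105 and the function names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3-D rotation matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix column signs so the draw is uniform
    if np.linalg.det(q) < 0:         # force a proper rotation (determinant +1)
        q[:, 0] = -q[:, 0]
    return q

def spin_permutations(coords, n_perm, seed=0):
    """Return an (n_perm, n_regions) array of region indices after random spins.

    coords: (n_regions, 3) region centroids on a unit sphere (one hemisphere).
    Nearest-neighbour reassignment can map two regions to the same index,
    so each row is a resampling rather than a strict permutation.
    """
    rng = np.random.default_rng(seed)
    perms = np.empty((n_perm, coords.shape[0]), dtype=int)
    for i in range(n_perm):
        rotated = coords @ random_rotation(rng).T
        dist = np.linalg.norm(coords[:, None, :] - rotated[None, :, :], axis=2)
        perms[i] = dist.argmin(axis=1)
    return perms

# Hypothetical example: 105 random centroids on the unit sphere
rng = np.random.default_rng(42)
coords = rng.normal(size=(105, 3))
coords /= np.linalg.norm(coords, axis=1, keepdims=True)
perms = spin_permutations(coords, n_perm=100)

# A null distribution for any regional map is obtained by re-indexing it
regional_map = rng.normal(size=105)
null_maps = regional_map[perms]      # shape (100, 105)
```

      Because the rotation preserves distances on the sphere, the surrogate maps keep the spatial autocorrelation of the original map, which is why this kind of null is preferred over shuffling regions independently.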

      **** L477: 'significant communication models were used to represent WMC...', but in L103 you mention you select 3 models: communicability, mean first passage, and flow graphs. Do you want to say that only 3 models were 'significant' and these were exactly the same across all regions (and data splits/ parcellation strategies/ tractography methods)? In the methods, you describe a lot of analysis and testing but it is not completely clear how you come to the selection of the final 3, it would be beneficial to clarify. Also, the final 3 were selected on the whole dataset first and then the pipeline of SC-FC coupling/age assessment/behaviour predictions was run for every (WD, S1, S2) for both parcellations schemes and tractography methods or did you end up with different sets each time? It would be good to make the pipeline and design choices, including the validation bit clearer (a figure detailing all the steps which extend Figure 1 would be very useful to understand the design/choices and how they relate to different runs of the validation).

      Thank you for your comment. In all reproducibility analyses, we used the same three models, which were selected in the main pipeline (probabilistic tractography and BNA parcellation). Following your comment, we produced a figure that shows the model-selection pipeline as an extension of Figure 1. For the description, please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).”

      Author response image 4.

      Pipeline of model selection and reproducibility analyses.

      **** Might the imbalance of features between structural connectivity and MPC affect the revealed SC-FC relationships (3 vs 1)? Why did you decide on this ratio rather than for example best WM structural descriptor + MPC?

      We understand your concern. The WMC communication models represent diverse geometric, topological, or dynamic factors. To describe the properties of WMC as fully as possible, we selected, from the 27 models and after controlling for covariance structure, three communication models that significantly predict FC. Relative to MPC, this does present a potential feature imbalance. However, the result still supports the conclusion that coupling models incorporating microarchitectural properties yield more accurate predictions of FC from SC[6, 7]. The relevant experiments are shown in Figure S2 below. Using only the best WM structural descriptor might lose some of the communication properties of WMC.

      **** L515: were intracranial volume and in-scanner head motion related to behavioural measures? These variables likely impact the inputs, do you expect them to influence the outcome assessments? Or is there a mistake on L518 and you actually corrected the input features rather than the behaviour measures?

      In-scanner head motion and intracranial volume are related to some age-adjusted behavioural measures, as shown in the following table. The regression of covariates from the cognitive measures followed two cognitive prediction studies [8, 9]. Please see lines 549-554: “Prior to applying the nested fivefold cross-validation framework to each behaviour measure, we regressed out covariates including sex, intracranial volume, and in-scanner head motion from the behaviour measure[59, 69]. Specifically, we estimated the regression coefficients of the covariates using the training set and applied them to the testing set. This regression procedure was repeated for each fold.”
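
      The fold-wise covariate regression quoted above can be sketched as follows. This is a minimal illustration on synthetic data, assuming three numeric covariates standing in for sex, intracranial volume and head motion. The helper names `fit_confounds` and `remove_confounds` are hypothetical, and the nested inner loop for hyperparameter tuning is omitted.

```python
import numpy as np

def fit_confounds(C_train, y_train):
    """Estimate covariate coefficients (with intercept) on the training set only."""
    design = np.column_stack([np.ones(len(C_train)), C_train])
    beta, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return beta

def remove_confounds(C, y, beta):
    """Subtract the covariate fit, using coefficients estimated elsewhere."""
    design = np.column_stack([np.ones(len(C)), C])
    return y - design @ beta

# Synthetic data (assumption): 200 subjects, 3 covariates, one behaviour measure
rng = np.random.default_rng(0)
n = 200
C = rng.normal(size=(n, 3))
y = C @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n)

# Five outer folds; coefficients come from the training folds and are
# applied unchanged to the held-out fold, so no test information leaks in.
folds = np.array_split(rng.permutation(n), 5)
y_resid = np.empty(n)
for k, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    beta = fit_confounds(C[train_idx], y[train_idx])
    y_resid[test_idx] = remove_confounds(C[test_idx], y[test_idx], beta)
```

      Estimating the coefficients inside each training fold, rather than once on the full sample, keeps the held-out behaviour scores independent of the covariate model.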

      Author response table 1.

      ** Additionally, in the paper, you propose that the incorporation of cortical microstructural (myelin-related) descriptors with white-matter connectivity to explain FC provides for 'a more comprehensive perspective for characterizing the development of SC-FC coupling' (L60). This combination of cortical and white-matter structure is indeed interesting, however the benefits of incorporating different descriptors could be studied further. For example, comparing results of using only the white matter connectivity (assessed through selected communication models) ~ FC vs (white matter + MPC) ~ FC vs MPC ~ FC. Which descriptors better explain FC? Are the 'coupling trends' similar (or the same)? If yes, what is the additional benefit of using the more complex combination? This would also add strength to your statement at L317: 'These discrepancies likely arise from differences in coupling methods, highlighting the complementarity of our methods with existing findings'. Yes, discrepancies might be explained by the use of different SC inputs. However, it is difficult to see how discrepancies highlight complementarity - does MPC (and combination with wm) provide additional information to using wm structural alone?

      Following your comment, we have added analyses of models that use only the myelin-related predictor or only WM connectivity to predict FC, and compared the results across models. Please see lines 519-521: “In addition, we have constructed the models using only MPC or SCs to predict FC, respectively. Spearman’s correlation was used to assess the consistency between spatial patterns based on different models.”

      Please see lines 128-130: “In addition, the coupling pattern based on other models (using only MPC or only SCs to predict FC) and the comparison between the models were shown in Figure S2A-C.” Please see lines 178-179: “The age-related patterns of SC-FC coupling based other coupling models were shown in Figure S2D-F.”

      Although the coupling patterns from the different models were spatially consistent, incorporating MPC together with SC connectivity improved the prediction of FC relative to models based on MPC or SC alone. For age-related changes in coupling, the differences between the models were further amplified. We agree with you that the complementarity cannot be explicitly quantified, and we have revised the description; please see line 329: “These discrepancies likely arise from differences in coupling methods.”
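
      The three-way comparison (MPC ~ FC, SCs ~ FC, MPC + SCs ~ FC) can be sketched on synthetic matrices as below. Two points are assumptions of this sketch rather than statements about the study: regional coupling is taken here as the Pearson correlation between empirical FC and its least-squares prediction, and all matrices are random. The Spearman helper assumes no ties.

```python
import numpy as np

def coupling(predictors, fc_row):
    """Pearson r between empirical FC and its least-squares prediction."""
    design = np.column_stack([np.ones(len(fc_row)), predictors])
    beta, *_ = np.linalg.lstsq(design, fc_row, rcond=None)
    return np.corrcoef(design @ beta, fc_row)[0, 1]

def spearman(a, b):
    """Spearman correlation for data without ties (rank, then Pearson)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return np.corrcoef(rank(a), rank(b))[0, 1]

# Synthetic matrices (assumption): 30 regions, one MPC matrix,
# three SC communication-model matrices, and an FC matrix built from them.
rng = np.random.default_rng(0)
n_regions = 30
mpc = rng.normal(size=(n_regions, n_regions))
scs = rng.normal(size=(3, n_regions, n_regions))
fc = 0.6 * mpc + 0.3 * scs.sum(axis=0) + rng.normal(size=(n_regions, n_regions))

r_mpc, r_sc, r_full = [], [], []
for i in range(n_regions):
    others = np.arange(n_regions) != i        # drop the self-connection
    x_mpc = mpc[i, others][:, None]           # (n_regions - 1, 1)
    x_sc = scs[:, i, others].T                # (n_regions - 1, 3)
    y = fc[i, others]
    r_mpc.append(coupling(x_mpc, y))
    r_sc.append(coupling(x_sc, y))
    r_full.append(coupling(np.hstack([x_mpc, x_sc]), y))
r_mpc, r_sc, r_full = map(np.array, (r_mpc, r_sc, r_full))

# Spatial consistency between the coupling maps of two models
consistency = spearman(r_mpc, r_full)
```

      With in-sample least squares, the combined model can never fit worse than either nested model, so a fair comparison of predictive benefit would need held-out data or a penalty for the extra predictors.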

      Author response image 5.

      Comparison results between different models. Spatial pattern of mean SC-FC coupling based on MPC ~ FC (A), SCs ~ FC (B), and MPC + SCs ~ FC (C). Correlation of age with SC-FC coupling across cortex based on MPC ~ FC (D), SCs ~ FC (E), and MPC + SCs ~ FC (F).

      ** For the interpretation of results: L31 'SC-FC coupling is positively associated with genes in oligodendrocyte-related pathways and negatively associated with astrocyte-related gene'; L124: positive myelin content with SC-FC coupling...and similarly on L81, L219, L299, L342, and L490:

      *** You use a T1/T2 ratio which is (in large part) a measure of myelin to estimate the coupling between SC and FC. Evaluation with SC-FC coupling with myelin described in Figure 2E is possibly biased by the choice of this feature. Similarly, it is possible that reported positive associations with oligodendrocyte-related pathways and SC-FC coupling in your work could in part result from a bias introduced by the 'myelin descriptor' (conversely, picking up the oligodendrocyte-related genes is a nice corroboration for the T1/T2 ratio being a myelin descriptor, so that's nice). However, it is possible that if you used a different descriptor of the cortical microstructure, you might find different expression patterns associated with the SC-FC coupling (for example using neurite density index might pick up neuronal-related genes?). As mentioned in my previous suggestions, I think it would be of interest to first use only the white matter structural connectivity feature to assess coupling to FC and assess the gene expression in the cortical regions to see if the same genes are related, and subsequently incorporate MPC to dissociate potential bias of using a myelin measure from genetic findings.

      Thank you for your insightful comments. In this paper, however, the core method of measuring coupling is to predict functional connectivity from multimodal structural connectivity, which may yield more information than a single modality. We agree with your comment that separating SCs and MPC and examining the genes associated with each could lead to interesting discoveries. We will continue to explore this in the future.

      ** Generally, I find it difficult to understand the interpretation of SC-FC coupling measures and would be interested to hear your thinking about this. As you mention on L290-294, how well SC predicts FC depends on which input features are used for the coupling assessment (more complex communication models, incorporating additional microstructural information etc 'yield more accurate predictions of FC' L291) - thus, calculated coupling can be interpreted as a measure of how well a particular set of input features explain FC (different sets will explain FC more or less well) ~ coupling is related to a measure of 'missing' information on the SC-FC relationship which is not contained within the particular set of structural descriptors - with this approach, the goal might be to determine the set that best, i.e. completely, explains FC to understand the link between structure and function. When you use the coupling measures for comparisons with age, cognition prediction etc, the 'status' of the SC-FC changes, it is no longer the amount of FC explained by the given SC descriptor set, but it's considered a descriptor in itself (rather than an effect of feature selection / SC-FC information overlap) - how do you interpret/argue for this shift of use?

      Thank you for your comment. In this paper, we obtain a reasonable SC-FC coupling by determining the optimal set of structural features to explain function. The coupling essentially measures the direct correspondence between structure and function. Studying the relationship of coupling with age and cognition therefore amounts to studying how this direct correspondence between structure and function relates to age and cognition.

      ** In a similar vein to the above comment, I am interested to hear what you think: on L305 you mention that 'perfect SC-FC coupling may be unlikely'. Would this reasoning suggest that functional activity takes place through other means than (and is therefore somehow independent of) biological (structural) substrates? For now, I think one can only say that we have imperfect descriptors of the structure so there is always information missing to explain function, this however does not mean the SC and FC are not perfectly coupled (only that we look at insufficient structural descriptors - limitations of what imaging can assess, what we measure etc). This is in line with L305 where you mention that 'Moreover, our results suggested that regional preferential contributions across different SCs lead to variations in the underlying communication process'. This suggests that locally different areas might use different communication models which are not reflected in the measures of SC-FC coupling that was employed, not that the 'coupling' is lower or higher (or coupling is not perfect). This is also a change in approach to L293: 'This configuration effectively releases the association cortex from strong structural constraints' - the 'release' might only be in light of the particular structural descriptors you use - is it conceivable that a different communication model would be more appropriate (and show high coupling) in these areas.

      Thank you for your insightful comments. We have changed the description; please see lines 315-317: “SC-FC coupling is dynamic and changes throughout the lifespan[7], particularly during adolescence[6,9], suggesting that perfect SC-FC coupling may require sufficient structural descriptors.”

      *Cognitive predictions:

      ** From a practical stand-point, do you think SC-FC coupling is a better (more accurate) indicator of cognitive outcomes (for example for future prediction studies) than each modality alone (which is practically easier to obtain and process)? It would be useful to check the behavioural outcome predictions for each modality separately (as suggested above for coupling estimates). In case SC-FC coupling does not outperform each modality separately, what is the benefit of using their coupling? Similarly, it would be useful to compare to using only cortical myelin for the prediction (which you showed to increase in importance for the coupling). In the case of myelin->coupling-> intelligence, if you are able to predict outcomes with the same performance from myelin without the need for coupling measures, what is the benefit of coupling?

      From the standpoint of predictive performance, we do not claim that SC-FC coupling is a better indicator than a single modality (voxel, network or other indicator). Our starting point was to assess whether SC-FC coupling is related to individual differences in cognitive performance, rather than to prove its predictive power over other measures. As you suggest, separating the modalities and comparing their power to predict cognition is a very interesting perspective. We will continue to explore this issue in future studies.

      ** The statement on L187 'suggesting that increased SC-FC coupling during development is associated with higher intelligence' might not be completely appropriate before age corrections (especially given the large drop in performance that suggests confounding effects of age).

      According to your comment, we have removed the statement.

      ** L188: it might be useful to report the range of R across the outer cross-validation folds as from Figure 4A it is not completely clear that the predictive performance is above the random (0) threshold. (For the sake of clarity, on L180 it might be useful for the reader if you directly report that other outcomes were not above the random threshold).

      According to your comment, we have added the range of R and revised the description; please see lines 195-198: “Furthermore, even after controlling for age, SC-FC coupling remained a significant predictor of general intelligence better than at chance (Pearson’s r=0.11±0.04, p=0.01, FDR corrected, Figure 4A). For fluid intelligence and crystal intelligence, the predictive performances of SC-FC coupling were not better than at chance (Figure 4A).”

      In a similar vein, in the text, you report Pearson's R for the predictive results but Figure 4A shows predictive accuracy - accuracy is a different (categorical) metric. It would be good to homogenise to clarify predictive results.

      We have made the corresponding changes in Figure 4.

      Author response image 6.

      Encoding individual differences in intelligence using regional SC-FC coupling. (A) Predictive accuracy of fluid, crystallized, and general intelligence composite scores. (B) Regional distribution of predictive weight. (C) Predictive contribution of functional networks. The boxes show the median and interquartile range (IQR; 25–75%), and the whiskers depict the 1.5× IQR from the first or third quartile.

      *Methods and QC:

      -Parcellations

      ** It would be useful to mention briefly how the BNA was applied to the data and if any quality checks were performed for the resulting parcellations, especially for the youngest subjects which might be most dissimilar to the population used to derive the atlas (healthy adults HCP subjects) ~ question of parcellation quality.

      We have added the description, please see lines 434-436: “The BNA[31] was projected on native space according to the official scripts (http://www.brainnetome.org/resource/) and the native BNA was checked by visual inspection.” 

      ** Additionally, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate. It might be useful to mention the above as limitations (which apply to most studies with similar focus).

      We have added your comment to the methodological issues, please see lines 378-379: “Third, the appropriateness of structurally defined regions for the functional analysis is also a topic of important debate.”

      - Tractography

      ** L432: it might be useful to name the method you used (probtrackx).

      We have added this name to the description, please see lines 455-456: “probabilistic tractography (probtrackx)[78, 79] was implemented in the FDT toolbox …”

      ** L434: 'dividing the total fibres number in source region' - dividing by what?

      We have revised the description, please see line 458: “dividing by the total fibres number in source region.”

      ** L436: 'connections in subcortical areas were removed' - why did you trace connections to subcortical areas in the first place if you then removed them (to match with cortical MPC areas I suspect)? Or do you mean there were spurious streamlines through subcortical regions that you filtered?

      There were two reasons. First, the WM connectivity needed to match the cortical regions of the MPC. Second, as we stated in the methodological issues, accurately resolving the connections of small structures within subcortical regions using whole-brain diffusion imaging and tractography techniques remains challenging[10, 11].

      ** Following on the above, did you use any exclusion masks during the tracing? In general, more information about quality checks for the tractography would be useful. For example, L437: did you do any quality evaluations based on the removed spurious streamlines? For example, were there any trends between spurious streamlines and the age of the subject? Distance between regions/size of the regions?

      We did not use any exclusion masks. We checked tractography quality by visual inspection and did not assess the relationship between spurious streamlines and age, distance between regions, or region size.

      ** L439: 'weighted probabilistic network' - this was weighted by the filtered connectivity densities or something else?

      The probabilistic network is weighted by the filtered connectivity densities.

      ** I appreciate the short description of the communication models in Text S1, it is very useful.

      Thank you for your comment.

      ** In addition to limitations mentioned in L368 - during reconstruction, have you noticed problems resolving short inter-hemispheric connections?

      We had not considered this issue and have added it to the limitations; please see lines 383-384: “In addition, the reconstruction of short connections between hemispheres is a notable challenge.”

      - Functional analysis:

      ** There is a difference in acquisition times between participants below and above 8 years (21 vs 26 min), does the different length of acquisition affect the quality of the processed data?

      We applied relatively strict quality control to ensure the quality of the processed data.

      ** L446 'regressed out nuisance variables' - it would be informative to describe in more detail what you used to perform this.

      We have provided more detail about the regression of nuisance variables, please see lines 476-477: “The nuisance variables were removed from time series based on general linear model.”
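
      A general linear model for nuisance removal can be sketched as follows. The response does not list the nuisance variables, so the six regressors below are hypothetical placeholders (for example, motion parameters), and the time series are synthetic.

```python
import numpy as np

def regress_out(timeseries, nuisance):
    """Remove nuisance signals from every column of `timeseries` by least squares.

    timeseries: (n_timepoints, n_regions)
    nuisance:   (n_timepoints, n_regressors)
    """
    design = np.column_stack([np.ones(nuisance.shape[0]), nuisance])
    beta, *_ = np.linalg.lstsq(design, timeseries, rcond=None)
    return timeseries - design @ beta

# Synthetic example (assumption): 300 volumes, 10 regions, 6 nuisance regressors
rng = np.random.default_rng(0)
nuisance = rng.normal(size=(300, 6))
neural = rng.normal(size=(300, 10))
observed = neural + nuisance @ rng.normal(size=(6, 10))

cleaned = regress_out(observed, nuisance)
```

      The residuals are orthogonal to every nuisance regressor by construction, so any signal that is linearly explained by the regressors is removed from the cleaned time series.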

      ** L450-452: it would be useful to add the number of excluded participants to get an intuition for the overall quality of the functional data. Have you checked if the quality is associated with the age of the participant (which might be related to motion etc). Adding a distribution of remaining frames across participants (vs age) would be useful to see in the supplementary methods to better understand the data you are using.

      We have added information on the exclusion of subjects during data processing, together with the distributions of motion and remaining frames and their correlations with age. Please see lines 481-485: “Quality control. The exclusion of participants in the whole multimodal data processing pipeline was depicted in Figure S13. In the context of fMRI data, we computed Pearson’s correlation between motion and age, as well as between the number of remaining frames and age, for the included participants aged 5 to 22 years and 8 to 22 years, respectively. These correlations were presented in Figure S14.”

      Author response image 7.

      Exclusion of participants in the whole multimodal data processing pipeline.  

      Author response image 8.

      Figure S14. Correlations between motion and age and number of remaining frames and age.

      ** L454: 'Pearson's correlation's... ' In contrast to MPC you did not remove negative correlations in the functional matrices. Why this choice?

      Whether to remove negative correlations from functional connectivity has long been controversial. Previous studies of SC-FC coupling[12-14] have widely retained negative functional connections, and we chose this strategy in order to retain more information. For MPC, a nascent approach to network modeling, we adopted the more conservative strategy of removing negative correlations, following the study [4] that proposed the approach.

      - Gene expression:

      ** L635, you focus on the left cortex, is this common? Do you expect the gene expression to be fully symmetric (given reported functional hemispheric asymmetries)? It might be good to expand on the reasoning.

      An important consideration regarding sample assignment is that only two of the six brains were sampled from both hemispheres, while the other four have samples from the left hemisphere only. This sparse sampling should be carefully considered when combining data across donors[1]. We have added a description; please see lines 569-571: “Restricting analyses to the left hemisphere will minimize variability across regions (and hemispheres) in terms of the number of samples available[40].”

      ** Paragraph of L537: you use evolution of coupling with age (correlation) and compare to gene expression with adults (cohort of Allen Human Brain Atlas - no temporal evolution to the gene expressions) and on L369 you mention that 'relative spatial patterns of gene expressions remain stable after birth'. Of course this is not a place to question previous studies, but would you really expect the gene expression associated with the temporary processes to remain stable throughout the development? For example, myelination would follow different spatiotemporal gradient across brain regions, is it reasonable to expect that the expression patterns remain the same? How do you then interpret a changing measure of coupling (correlation with age) with a gene expression assessed statically?

      We agree with your comment that spatial expression patterns are expected to vary across developmental periods. We have revised the previous description; please see lines 383-386: “Fifth, it is important to acknowledge that changes in gene expression levels during development may introduce bias in the results.”

      - Reproducibility analyses:

      ** Paragraph L576: are we to understand that you performed the entire pipeline 3 times (WD, S1, S2) for both parcellations schemes and tractography methods (~12 times) including the selection of communication models and you always got the same best three communication models and gene expression etc? Or did you make some design choices (i.e. selection of communication models) only on a specific set-up and transfer to other settings?

      The choice of communication model is established at the beginning, which we have clarified in the article, please see lines 106-108: “We used these three models to represent the extracortical connectivity properties in subsequent discovery and reproducibility analyses (Figure S1).” For reproducibility analyses (parcellation, tractography, and split-half validation), we fixed other settings and only assessed the impact of a single factor.

      ** Paragraph of L241: I really appreciate you evaluated the robustness of your results to different tractography strategies. It is reassuring to see the similarity in results for the two approaches. Did you notice any age-related effects on tractography quality for the two methods given the wide age range (did you check?)

      In our study, tractography quality was checked by visual inspection. Using quantitative tools to assess tractography quality in future studies could answer this question objectively.

      ** Additionally, I wonder how much of that overlap is driven by the changes in MPC which is the same between the two methods... especially given its high weight in the SC-FC coupling you reported earlier in the paper. It might be informative to directly compare the connectivity matrices derived from the two tracto methods directly. Generally, as mentioned in the previous comments, I think it would be interesting to assess coupling using different input settings (with WM structural and MPC separate and then combined).

      As suggested in your previous comment, we have examined the coupling patterns, coupling differences, correlations of coupling with age, and spatial correlations between the patterns based on different models, as shown in Figure S2. Please see our response to the previous comment for details.

      ** L251 - I also wonder if the random splitting is best adapted to validation in your case given you study relationships with age. Would it make more sense to make stratified splits to ensure a 'similar age coverage' across splits?

      In our study, we adopted a random splitting process, repeated 1,000 times, to minimize bias due to data partitioning. The stratification you mention is a reasonable method, and keeping the age distribution even across splits would likely yield higher validation similarity than our method. However, the similarity obtained with our validation method is sufficient to support the generalizability of our findings.
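
      The difference between the two splitting strategies discussed here can be illustrated with a minimal sketch. It is not the authors' code: the function names, the number of age bins, and the synthetic ages are all assumptions made for illustration. It contrasts a plain random split-half with a split that is stratified by age.

```python
import random


def random_split_half(n_subjects, rng):
    """Plain random split-half: shuffle all indices and cut in the middle."""
    idx = list(range(n_subjects))
    rng.shuffle(idx)
    mid = n_subjects // 2
    return idx[:mid], idx[mid:]


def stratified_split_half(ages, n_bins, rng):
    """Split-half with similar age coverage in both halves (hypothetical helper).

    Subjects are sorted by age, cut into consecutive age strata, and each
    stratum is divided at random between the two halves.
    Assumes a non-empty list of ages and n_bins >= 1.
    """
    order = sorted(range(len(ages)), key=lambda i: ages[i])
    bin_size = -(-len(order) // n_bins)  # ceiling division
    half_a, half_b = [], []
    extra_to_a = True  # alternate which half receives the odd subject
    for start in range(0, len(order), bin_size):
        stratum = order[start:start + bin_size]
        rng.shuffle(stratum)
        mid = len(stratum) // 2
        if len(stratum) % 2 == 1:
            if extra_to_a:
                mid += 1
            extra_to_a = not extra_to_a
        half_a.extend(stratum[:mid])
        half_b.extend(stratum[mid:])
    return half_a, half_b


def mean_age(indices, ages):
    """Mean age of the subjects listed in indices."""
    return sum(ages[i] for i in indices) / len(indices)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [rng.uniform(5, 22) for _ in range(200)]  # synthetic ages, not study data
    splits = {
        "random": random_split_half(len(ages), rng),
        "stratified": stratified_split_half(ages, 10, rng),
    }
    for name, (a, b) in splits.items():
        gap = abs(mean_age(a, ages) - mean_age(b, ages))
        print(name, "mean age gap between halves:", round(gap, 3))
```

      Repeating either split many times, as in the 1,000 repetitions described above, only requires calling the function in a loop with the same random generator.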

      Minor comments

      L42: 'is regulated by genes'

      ** Coupling (if having a functional role and being regulated at all) is possibly resulting from a complex interplay of different factors in addition to genes, for example, learning/environment, it might be more cautious to use 'regulated in part by genes' or similar.

      We have corrected it, please see line 42.

      L43 (and also L377): 'development of SC-FC coupling'

      ** I know this is very nitpicky and depends on your opinion about the nature of SC-FC coupling, but 'development of SC-FC coupling' gives an impression of something maturing that has a role 'in itself' (for example development of eye from neuroepithelium to mature organ etc.). For now, I am not sure it is fully certain that SC-FC coupling is more than a byproduct of the comparison between SC and FC, using 'changes in SC-FC coupling with development' might be more apt.

      We have corrected it, please see lines 43-44.

      L261 'SC-FC coupling was stronger ... [] ... and followed fundamental properties of cortical organization.' vs L168 'No significant correlations were found between developmental changes in SC-FC coupling and the fundamental properties of cortical organization'.

      **Which one is it? I think in the first you refer to mean coupling over all infants and in the second about correlation with age. How do you interpret the difference?

      Between the ages of 5 and 22 years, we found that the mean SC-FC coupling pattern has become similar to that of adults, consistent with the fundamental properties of cortical organization. However, the developmental changes in SC-FC coupling are heterogeneous and sequential, and their magnitude does not follow the mean coupling pattern.

      L277: 'temporal and spatial complexity'

      ** Additionally, communication models have different assumptions about the flow within the structural network and will have different biological plausibility (they will be more or less 'realistic').

      Here, temporal and spatial complexity are meant from a computational point of view.

      L283: 'We excluded a centralized model (shortest paths), which was not biologically plausible' ** But in Text S1 and Table S1 you specify the shortest paths models. Does this mean you computed them but did not incorporate them in the final coupling computations even if they were predictive?

      ** Generally, I find the selection of the final 3 communication models confusing. It would be very useful if you could clarify this further, for example in the methods section.

      We used all twenty-seven communication models (including shortest paths) to predict FC at the node level for each participant. We then identified three communication models that significantly predicted FC. The shortest-path model was excluded because it did not meet the significance criteria. We have added further methodological details to this section; please see lines 503-507.

      L332 'As we observed increasing coupling in these [frontoparietal network and default mode network] networks, this may have contributed to the improvements in general intelligence, highlighting the flexible and integrated role of these networks' vs L293 'SC-FC coupling in association areas, which have lower structural connectivity, was lower than that in sensory areas. This configuration effectively releases the association cortex from strong structural constraints imposed by early activity cascades, promoting higher cognitive functions that transcend simple sensori-motor exchanges'

      ** I am not sure I follow the reasoning. Could you expand on why it would be the decoupling promoting the cognitive function in one case (association areas generally), but on the reverse the increased coupling in frontoparietal promoting the cognition in the other (specifically frontoparietal)?

      To clarify: for general intelligence, increased coupling in the frontoparietal network could allow more effective information integration and enable efficient collaboration between different cognitive processes.

      * Formatting errors etc.

      L52: maybe rephrase?

      We have rephrased, please see lines 51-53: “The T1- to T2-weighted (T1w/T2w) ratio of MRI has been proposed as a means of quantifying microstructure profile covariance (MPC), which reflects a simplified recapitulation in cellular changes across intracortical laminar structure[6, 1215].”

      L68: specialization1,[20].

      We have corrected it.

      L167: 'networks significantly increased with age and exhibited greater increased' - needs rephrasing.

      We have corrected it.

      L194: 'networks were significantly predicted the general intelligence' - needs rephrasing.

      We have corrected it, please see lines 204-205: “we found that the weights of frontoparietal and default mode networks significantly contributed to the prediction of the general intelligence.”

      L447: 'and temporal bandpass filtering' - there is a verb missing.

      We have corrected it, please see line 471: “executed temporal bandpass filtering.”

      L448: 'greater than 0.15' - unit missing.

      We have corrected it, please see line 472: “greater than 0.15 mm”.

      L452: 'After censoring, regression of nuisance variables, and temporal bandpass filtering,' - no need to repeat the steps as you mentioned them 3 sentences earlier.

      We have removed it.

      L458-459: sorry I find this description slightly confusing. What do you mean by 'modal'? Connectional -> connectivity profile. The whole thing could be simplified, if I understand correctly your vector of independent variables is a set of wm and microstructural 'connectivity' of the given node... if this is not the case, please make it clearer.

      We have corrected it, please see line 488: “where 𝒔𝑖 is the 𝑖th SC profiles, 𝑛 is the number of SC profiles”.

      L479: 'values and system-specific of 480 coupling'.

      We have corrected it.

      L500: 'regular' - regularisation.

      We have changed it to “regularization”.

      L567: Do you mean that in contrast to probabilistic with FSL you use deterministic methods within Camino? For L570, you introduce communication models through 'such as': did you fit all models like before? If not, it might be clearer to just list the ones you estimated rather than introduce through 'such as'.

      We have changed the description to avoid ambiguity, please see lines 608-609: “We then calculated the communication properties of the WMC including communicability, mean first passage times of random walkers, and flow graphs (timescales=1).”
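
      For readers unfamiliar with the first of these measures, communicability in a weighted network is commonly defined as a normalized matrix exponential. The form below is that common definition, given here as an assumption: the response does not state which normalization the authors used. The symbols $W$ (weighted connectivity matrix with entries $w_{ij}$) and $s_i$ (strength of node $i$) are introduced only for illustration.

```latex
G = \exp\!\left(D^{-1/2}\, W\, D^{-1/2}\right)
  = \sum_{k=0}^{\infty} \frac{\left(D^{-1/2}\, W\, D^{-1/2}\right)^{k}}{k!},
\qquad
D = \operatorname{diag}(s_1, \dots, s_n), \quad s_i = \sum_{j} w_{ij}.
```

      Under this definition, the entry $G_{ij}$ sums over walks of all lengths between nodes $i$ and $j$, with walks of length $k$ down-weighted by $1/k!$.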

      Citation [12], it is unusual to include competing interests in the citation, moreover, Dr. Bullmore mentioned is not in the authors' list - this is most likely an error with citation import, it would be good to double-check.

      We have corrected it.

      L590: Python scripts used to perform PLS regression can 591 be found at https://scikitlearn.org/. The link leads to general documentation for sklearn.

      We have corrected it, please see lines 627-630: “Python scripts used to perform PLS regression can be found at https://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html#sklearn.cross_decomposition.PLSRegression.”

      P26 and 27 - there are two related sections: Data and code availability and Code availability - it might be worth merging into one section if possible.

      We have corrected it, please see lines 623-633.

      References

      (1) Arnatkeviciute A, Fulcher BD, Fornito A. A practical guide to linking brain-wide gene expression and neuroimaging data. Neuroimage. 2019;189:353-67. Epub 2019/01/17. doi: 10.1016/j.neuroimage.2019.01.011. PubMed PMID: 30648605.

      (2) Zhong S, He Y, Gong G. Convergence and divergence across construction methods for human brain white matter networks: an assessment based on individual differences. Hum Brain Mapp. 2015;36(5):1995-2013. Epub 2015/02/03. doi: 10.1002/hbm.22751. PubMed PMID: 25641208; PubMed Central PMCID: PMCPMC6869604.

      (3) Waehnert MD, Dinse J, Weiss M, Streicher MN, Waehnert P, Geyer S, et al. Anatomically motivated modeling of cortical laminae. Neuroimage. 2014;93 Pt 2:210-20. Epub 2013/04/23. doi: 10.1016/j.neuroimage.2013.03.078. PubMed PMID: 23603284.

      (4) Paquola C, Vos De Wael R, Wagstyl K, Bethlehem RAI, Hong SJ, Seidlitz J, et al. Microstructural and functional gradients are increasingly dissociated in transmodal cortices. PLoS Biol. 2019;17(5):e3000284. Epub 2019/05/21. doi: 10.1371/journal.pbio.3000284. PubMed PMID: 31107870.

      (5) Haufe S, Meinecke F, Gorgen K, Dahne S, Haynes JD, Blankertz B, et al. On the interpretation of weight vectors of linear models in multivariate neuroimaging. Neuroimage. 2014;87:96-110. Epub 2013/11/19. doi: 10.1016/j.neuroimage.2013.10.067. PubMed PMID: 24239590.

      (6) Demirtas M, Burt JB, Helmer M, Ji JL, Adkinson BD, Glasser MF, et al. Hierarchical Heterogeneity across Human Cortex Shapes Large-Scale Neural Dynamics. Neuron. 2019;101(6):1181-94 e13. Epub 2019/02/13. doi: 10.1016/j.neuron.2019.01.017. PubMed PMID: 30744986; PubMed Central PMCID: PMCPMC6447428.

      (7) Deco G, Kringelbach ML, Arnatkeviciute A, Oldham S, Sabaroedin K, Rogasch NC, et al. Dynamical consequences of regional heterogeneity in the brain's transcriptional landscape. Sci Adv. 2021;7(29). Epub 2021/07/16. doi: 10.1126/sciadv.abf4752. PubMed PMID: 34261652; PubMed Central PMCID: PMCPMC8279501.

      (8) Chen J, Tam A, Kebets V, Orban C, Ooi LQR, Asplund CL, et al. Shared and unique brain network features predict cognitive, personality, and mental health scores in the ABCD study. Nat Commun. 2022;13(1):2217. Epub 2022/04/27. doi: 10.1038/s41467-022-29766-8. PubMed PMID: 35468875; PubMed Central PMCID: PMCPMC9038754.

      (9) Li J, Bzdok D, Chen J, Tam A, Ooi LQR, Holmes AJ, et al. Cross-ethnicity/race generalization failure of behavioral prediction from resting-state functional connectivity. Sci Adv. 2022;8(11):eabj1812. Epub 2022/03/17. doi: 10.1126/sciadv.abj1812. PubMed PMID: 35294251; PubMed Central PMCID: PMCPMC8926333.

      (10) Thomas C, Ye FQ, Irfanoglu MO, Modi P, Saleem KS, Leopold DA, et al. Anatomical accuracy of brain connections derived from diffusion MRI tractography is inherently limited. Proc Natl Acad Sci U S A. 2014;111(46):16574-9. Epub 2014/11/05. doi: 10.1073/pnas.1405672111. PubMed PMID: 25368179; PubMed Central PMCID: PMCPMC4246325.

      (11) Reveley C, Seth AK, Pierpaoli C, Silva AC, Yu D, Saunders RC, et al. Superficial white matter fiber systems impede detection of long-range cortical connections in diffusion MR tractography. Proc Natl Acad Sci U S A. 2015;112(21):E2820-8. Epub 2015/05/13. doi: 10.1073/pnas.1418198112. PubMed PMID: 25964365; PubMed Central PMCID: PMCPMC4450402.

      (12) Gu Z, Jamison KW, Sabuncu MR, Kuceyeski A. Heritability and interindividual variability of regional structure-function coupling. Nat Commun. 2021;12(1):4894. Epub 2021/08/14. doi: 10.1038/s41467-021-25184-4. PubMed PMID: 34385454; PubMed Central PMCID: PMCPMC8361191.

      (13) Liu ZQ, Vazquez-Rodriguez B, Spreng RN, Bernhardt BC, Betzel RF, Misic B. Time-resolved structure-function coupling in brain networks. Commun Biol. 2022;5(1):532. Epub 2022/06/03. doi: 10.1038/s42003-022-03466-x. PubMed PMID: 35654886; PubMed Central PMCID: PMCPMC9163085.

      (14) Zamani Esfahlani F, Faskowitz J, Slack J, Misic B, Betzel RF. Local structure-function relationships in human brain networks across the lifespan. Nat Commun. 2022;13(1):2053. Epub 2022/04/21. doi: 10.1038/s41467-022-29770-y. PubMed PMID: 35440659; PubMed Central PMCID: PMCPMC9018911.

    1. eLife assessment

      This study addresses an important, understudied question using approaches that link molecular, circuit, and behavioral changes. The novel findings that Netrin-1 and UNC5c can guide dopaminergic innervation from the nucleus accumbens to the cortex during adolescence are solid. The data showing that the onset of Unc5 expression is sexually dimorphic in mice, and that in Siberian hamsters environmental effects on development are also sexually dimorphic are also solid. Reviewers identified some gaps in evidence for specificity of Netrin-1 expression, which, if filled, would strengthen the evidence for some of the claims. Future work would also benefit from Unc5C knockdown to corroborate the results and investigation of the cause-effect relationship. This paper will be of interest to those interested in neural development, sex differences, and/or dopamine function.

    1. eLife assessment

      The authors present a valuable computational platform, which aims to automate the workflow for coarse-grained simulations of biomolecules in the framework of the popular MARTINI model. The capability of the platform has been convincingly demonstrated by the application to a large number of proteins as well as macrocycles and polymers. On the other hand, because the developments have largely been based on the MARTINI model, some might argue that the general impact on the multi-scale simulation community is limited, leaving the support for the claimed significance incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step that any biomolecular simulation starts with. Given the wide array of chemical structures that need to be simulated, the varying quality of structural models obtained as inputs from various sources, and the diverse force fields and molecular dynamics engines employed for simulations, automation of this fundamental step is challenging, especially for complex systems and when high-throughput simulations are needed in computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out various types of fundamental functionalities that are commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ those components to re-assemble a pipeline called Martinize2 used in topology generation for simulations with a widely used coarse-grained (CG) model, MARTINI. This pipeline can fully recapitulate the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by the ability of the pipeline to faithfully generate topologies for two high-complexity benchmarking sets of proteins.

      Strengths:

      The main strength of this work is the use of concepts and algorithms associated with induced subgraph in graph theory to automate several key but non-trivial steps of topology generation such as the identification of monomer residue units (MRU), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters needed for describing interactions between MRUs. In addition, the documentation website provided by the authors is very informative, allowing users to get quickly started with Vermouth.

      Weaknesses:

      Although the Vermouth library is designed as a general tool for topology generation for molecular simulations, only its applications with MARTINI have been demonstrated in the current study. Thus, the claimed generality of Vermouth remains to be examined. The authors may consider pointing this out in their manuscript.

    3. Reviewer #2 (Public Review):

      This work introduces a Vermouth library framework to enhance software development within the Martini community. Specifically, it presents a Vermouth-powered program, Martinize2, for generating coarse-grained structures and topologies from atomistic structures. In addition to introducing the Vermouth library and the Martinize2 program, this paper illustrates how Martinize2 identifies atoms, maps them to the Martini model, generates topology files, and identifies protonation states or post-translational modifications. Compared with the prior version, the authors provide a new figure to show that Martinize2 can be applied to various molecules, such as proteins, cofactors, and lipids. To demonstrate the general application, Martinize2 was used for converting 73% of 87,084 protein structures from the template library, with failed cases primarily blamed on missing coordinates.

      I was hoping to see some fundamental changes in the resubmitted version. To my disappointment, the manuscript remains largely unchanged (even the typo I pointed out previously was not fixed). I do not doubt that Martinize2 and Vermouth are useful to the Martini community, and this paper will have some impact. The manuscript is very technical and limited to the Martini community. The scientific insight for the general coarse-grained modeling community is unclear. The goal of the work is ambitious (such as high-throughput simulations and whole-cell modeling), but the results show just a validation of Martinize2. This version does not reverse my previous impression that it is incremental. As I pointed out in my previous review (and no response from the authors), all the issues associated with the Martini model are still there, e.g. the need for ENM. In this shape, I feel this manuscript is suitable for a specialized journal in computational biophysics or stays as part of the GitHub repository.

    4. Reviewer #3 (Public Review):

      The manuscript by Kroon et al. describes two algorithms which, when combined, achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications. After the revisions provided by the authors, I recommend minor revision.

      The authors have addressed most of my concerns provided previously. Specifically, showcasing the capability of coarse-graining other types of molecules (Figure 7) is a useful addition, especially for the booming field of therapeutic macrocycles.

      My only additional concern is that to justify Martinize2 and Vermouth as a "high-throughput" method, the speed of these tools needs to be addressed in some form in the manuscript as a guideline to users.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors provide a new computational platform called Vermouth to automate topology generation, a crucial step that any biomolecular simulation starts with. Given the wide array of chemical structures that need to be simulated, the varying quality of structural models obtained as inputs from various sources, and the diverse force fields and molecular dynamics engines employed for simulations, automation of this fundamental step is challenging, especially for complex systems and when high-throughput simulations are needed in computer-aided drug design (CADD). To overcome this challenge, the authors develop a programming library composed of components that carry out various types of fundamental functionalities that are commonly encountered in topology generation. These components are intended to be general for any type of molecule and not to depend on any specific force field or MD engine. To demonstrate the applicability of this library, the authors employ those components to re-assemble a pipeline called Martinize2 used in topology generation for simulations with a widely used coarse-grained (CG) model, MARTINI. This pipeline can fully recapitulate the functionality of its original version, Martinize, but exhibits greatly enhanced generality, as confirmed by the ability of the pipeline to faithfully generate topologies for two high-complexity benchmarking sets of proteins.

      Strengths:

      The main strength of this work is the use of concepts and algorithms associated with induced subgraph in graph theory to automate several key but non-trivial steps of topology generation such as the identification of monomer residue units (MRU), the repair of input structures with missing atoms, the mapping of topologies between different resolutions, and the generation of parameters needed for describing interactions between MRUs.

      Weaknesses:

      Although the Vermouth library appears promising as a general tool for topology generation, the current manuscript provides insufficient information, and the documentation is too limited, for users to apply this library easily. A more detailed explanation of the various classes that are mentioned, such as Processor, Molecule, Mapping, and ForceField, is still needed, including the inputs, outputs, and associated operations of these classes. A simple demonstration of how these classes are applied would be of great help to users. The formats of the internal databases used to describe reference structures and force fields may also need to be clarified. This is particularly important when Vermouth needs to be adapted for other AA/CG force fields and other MD engines.

      We thank the reviewer for pointing out the strengths of the presented work and agree that one of the current limitations is the lack of documentation about the library. In the revision, we point more clearly to the documentation page of the Vermouth library, which contains more detailed information on the various processors. The format of the internal databases has also been added to the documentation page. Providing a simple demonstration of applications of these classes is a great suggestion, however, we believe that it is more convenient to provide those in the form of code examples in the documentation or for instance jupyter notebooks rather than in the paper itself.  

      The successful automation in Vermouth relies on reference structures that need to be pre-determined. In the case of the study of 43 small ligands, the reference structures and corresponding mappings to MARTINI-compatible representations for all these ligands have already been defined in the M3 force field and added to the Vermouth library. However, the authors need to comment on the scenario where significantly more ligands need to be considered and other force fields need to be used as CG representations, with a lack of reference structures and mapping schemes.

      We acknowledge that vermouth/martinize2 is not capable of automatically generating Martini mappings or parameters on the fly for unknown structures that are not part of the database. However, this capability is not the purpose of the program, which is rather to distribute and manage existing parameters. Unlike atomistic force fields, which frequently have automated topology builders, Martini parameters are usually obtained for a set of specific molecules at a time and benchmarked accordingly. As more parameters are obtained by researchers, they can be added to the vermouth library via the GitHub interface in a controlled manner. This process allows the database to grow, and in our opinion it will quickly grow beyond the currently implemented parameters. Furthermore, the API of Vermouth is set up in a way that it can easily interface with automated topology builders, which are currently being developed. Hence, in our view, this limitation does not diminish the applicability of vermouth to high-throughput applications with many ligands. The framework exists and works; only more parameters now have to be added.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript by Kroon, Grunewald, Marrink and coworkers presents the development of the Vermouth library for coarse-grained assignment and parameterization and an updated version of a Python script, the Martinize2 program, to build Martini coarse-grained (CG) models, primarily for protein systems.

      Strengths:

      In contrast to many mature and widely used tools to build all-atom (AA) models, there are few well-accepted programs for CG model constructions and parameterization. The research reported in this manuscript is among the ongoing efforts to build such tools for Martini CG modeling, with a clear goal of high-throughput simulations of complex biomolecular systems and, ultimately, whole-cell simulations. Thus, this manuscript targets a practical problem in computational biophysics. The authors see such an effort to unify operations like CG mapping, parameterization, etc. as a vital step from the software engineering perspective.

      Weaknesses:

      However, the manuscript in this shape is unclear in the scientific novelty and appears incremental upon existing methods and tools. The only "validation" (more like an example application) is to create Martini models with two protein structure sets (I-TASSER and AlphaFold). The success rate in building the models was only 73%, while the significant failure is due to incomplete AA coordinates. This suggests a dependence on the input AA models, which makes the results less attractive for high-throughput applications (for example, preparation/creation of the AA models can become the bottleneck). There seems to be an improvement in considering the protonation state and chemical modification, but convincing validation is still needed. Besides, limitations in the existing Martini models remain (like the restricted dynamics due to the elastic network, the electrostatic interactions or polarizability).

      We thank the reviewer for pointing out the strengths of the presented work, but respectfully disagree with the criticism that the presented work is only incremental upon existing methods and tools. All MD simulations of structured proteins regardless of the force field or resolution rely on a decent initial structure to produce valid results. Therefore, failure upon detection of malformed protein input structures is an essential feature for any high-throughput pipeline working with proteins, especially considering the computational cost of MD simulations. We note that programs such as the first version of Martinize generate reasonable-looking input parameters that lead to unphysical simulations and wasted CPU hours.

      The AlphaFold database, for which we surveyed 200,000 structures, contained only 7 problematic structures, which means that the success rate was 99% for this database. This example simply shows that users may have to add a step of fixing atomistic protein input structures if they seek to run a high-throughput pipeline. But at least they can be assured that martinize2 will check that no issues persist.

      Furthermore, we note that the manuscript does not aim to validate or improve the existing Martini (protein) models. All example cases presented in the paper are subject to the limitations of the protein models for the reason that martinize2 is only the program to generate those parameters. Future improvements in the protein model, which are currently underway, will immediately be available through the program to the broader community.  

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Kroon et al. describes two algorithms which, when combined, achieve high-throughput automation of "martinizing" protein structures with selected protonation states and post-translational modifications.

      Strengths:

      A large scale protein simulation was attempted, showing strong evidence that authors' algorithms work smoothly.

      The authors described the algorithms in detail and shared the open-source code under the Apache 2.0 license on GitHub. This allows both reproducibility and extended usefulness within the field. These algorithms are potentially impactful if the authors can address some of the issues listed below.

      We thank the reviewer for pointing out the strengths.  

      Weaknesses:

      One major caveat of the manuscript is that the authors claim their algorithms aim to "process any type of molecule or polymer, be it linear, cyclic, branched, or dendrimeric, and mixtures thereof" and "enable researchers to prepare simulation input files for arbitrary (bio)polymers". However, the examples provided by the manuscript only support one type of biopolymer, i.e. proteins. Despite the authors' recommendation of using polyply along with martinize2/vermouth, no concrete evidence has been provided to support the authors' claim. Therefore, the manuscript must be modified to either remove these claims or include new evidence.

      We acknowledge that the current manuscript is largely protein-centric. To some extent this results from the legacy of martinize version 1, which was also only used for proteins. However, to show that martinize2 also works for cyclic as well as branched molecules, we implemented two additional test cases and updated the former Figure 6, which is now Figure 7. Crown ether is used as an example of a cyclic molecule, whereas a small branched polyethylene molecule is a test case for branching. Needless to say, neither molecule is a protein or a biomolecule.

      Method descriptions of Martinize2 and the graph algorithms in the SI should be core content of the manuscript. I argue that Figure S1 and Figure S2 are more important than Figure 3 (protonation state). I recommend that the authors make a workflow chart combining Figures S1 and S2 to explain Martinize2 and the graph algorithms in the main text.

      The reviewer's critique is fair. Given the already rather large manuscript, we tried to strike a balance between describing benchmark test cases, some practical usage information (e.g. the Histidine modification), and the algorithmic library side of the program. In particular, we chose to add the figure on protonation state, because how to deal with protonation states—in particular, Histidines—was amongst the top three raised issues by users on our GitHub page. Due to this large community interest, we consider the figure equally important. However, we moved Figure S1 from the Supporting Information into the manuscript and annotated the already mentioned text with the corresponding panels to more clearly illustrate the underlying procedure. 

      In Figure 3 (protonation state), the figure itself and the captions are ambiguous about whether at the end the residue is simply renamed from HIS to HIP, or if hydrogen is removed from HIP to recover HIS.

      Using either of the two routes yields the same parameters in the end, which are for the protonated Histidine. In the second route, the extra hydrogen on Histidine is detected as an additional atom and therefore a different logic flow is triggered. Atoms are never removed, but only compounded to a base block plus modification atoms. We adjusted the figure caption to point this out more clearly.  

      In "Incorporating a Ligand small-molecule Database", the authors are calling for a community effort to build a small-molecule database. Some guidance on when the current database/algorithm combination does or does not work will help the community in contributing.

      Any small molecule not part of the database will not work. However, martinize2 will quickly identify if there are missing components of the system and alert the users. At that point, the users can decide to make their own files, guided by the new documentation pages.

      A speed comparison is needed to compare Martinize2 and Martinize.

      We respectfully disagree that a speed comparison is needed. We already noted in the Discussion of the manuscript that martinize2 is slower, since it performs more checks, is more general, and is not limited to a single protein model.

    1. eLife assessment

      This important study of artificial selection in microbial communities shows that the possibility of selecting a desired fraction of slow and fast-growing types is impacted by their initial fractions. The evidence, which relies on mathematical analysis and simulations of a stochastic model, is convincing. It highlights the tension between selection at the strain and the community level. This study should be of interest to researchers interested in ecology, both theoretical and experimental.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate with a simple stochastic model that the initial composition of the community is important in achieving a target frequency during the artificial selection of a community.

      Strengths:

      To my knowledge, the intra-collective selection during artificial selection has not been seriously theoretically considered. However, in many cases, the species dynamics during the incubation of each selection cycle are important and relevant to the outcome of the artificial selection experiment. Stochasticity from birth and death (demographic stochasticity) plays a big role in these species' abundance dynamics. This work uses a simple framework to tackle this idea meticulously.

      This work may or may not be related to hysteresis (path dependency). If this is true, maybe it would be nice to have a discussion paragraph talking about how this may be the case. Then, this work would even attract the interest of people studying dynamic systems.

      Weaknesses:

      (1) Connecting structure and function

      In typical artificial selection literature, most of them select the community based on collective function. Here in this paper, the authors are selecting a target composition. Although there is a schematic cartoon illustrating the relationship between collective function (y-axis) and the community composition in the main Figure 1, there is no explicit explanation or justification of what may be the origin of this relationship. I think giving the readers a naïve idea about how this structure-function relationship arises in the introduction section would help. This is because the conclusion of this paper is that the intra-collective selection makes it hard to artificially select a community that has an intermediate frequency of f (or s). If there is really evidence or theoretical derivation from this framework that indeed the highest function comes from the intermediate frequency of f, then the impact of this paper would increase because the conclusions of this stochastic model could allude to the reasons for the prevalent failures of artificial selection in literature.

      (2) Explain intra-collective and inter-collective selection better for readers.

      The abstract, the introduction, and the result section use the terms intra-collective and inter-collective selection without much explanation. A clear definition in the beginning would help the audience grasp the importance of this paper, because these concepts are at the core of this work.

      (3) Achievable target frequency strongly depending on the degree of demographic stochasticity.

      I would expect that the experimentalists would find these results interesting and would want to consider these results during their artificial selection experiments. The main Figure 4 indicates that the Newborn size N0 is a very important factor to consider during the artificial selection experiment. This would be equivalent to how much bottleneck is imposed on the artificial selection process in every iteration step (i.e., the ratio of serial dilution experiment). However, with a low population size, all target frequencies can be achieved, and therefore in these regimes, the initial frequency now does not matter much. It would be great for the authors to provide what the N0 parameter actually means during the artificial selection experiments. Maybe relative to some other parameter in the model. I know this could be very hard. But without this, the main result of this paper (initial frequency matters) cannot be taken advantage of by the experimentalists.

      (4) Consideration of environmental stochasticity.

      The success (gold area of Figure 2d) in this framework mainly depends on the size of the demographic stochasticity (birth-only model) during the intra-collective selection. However, during experiments, a lot of environmental stochasticity appears to be occurring during artificial selection. This may be out of the scope of this study. But it would definitely be exciting to see how much environmental stochasticity relative to the demographic stochasticity (variation in the Gaussian distribution of F and S) matters in succeeding in achieving the target composition from artificial selection.

      (5) Assumption about mutation rates

      If setting the mutation rates to zero does not change the result of the simulations and the conclusion, what is the purpose of having the mutation rates \mu? Also, is the unidirectional (S -> F -> FF) mutation realistic? I didn't quite understand how the mutations could fit into the story of this paper.

      (6) Minor points

      In Figure 3b, it is not clear to me how the frequency difference for the Intra-collective and the Inter-collective selection is computed.

      In Figure 5b, the gold region (success) near the FF is not visible. Maybe increase the size of the figure or have an inset for zoom-in. Why is the region not as big as the bottom gold region?
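      Point (3) above, on the Newborn size N0, can be made concrete with a back-of-the-envelope sketch. It is not the authors' model: it assumes deterministic exponential growth of both types within a cycle, no mutation, and binomial sampling of N0 cells when a Newborn is formed. The names `omega` (growth-rate advantage of the fast type), `tau` (cycle duration), and all function names are hypothetical and chosen only for illustration. Under these assumptions, the upward shift in fast-grower frequency during one cycle is compared with the spread that sampling introduces; inter-collective selection can only counteract the shift when the spread is comparable to it.

```python
import math


def frequency_after_growth(f0, omega, tau):
    """Fast-grower frequency after time tau, assuming both types grow
    exponentially and the fast type has a rate advantage omega."""
    grown = f0 * math.exp(omega * tau)
    return grown / (grown + (1.0 - f0))


def sampling_std(f, n0):
    """Standard deviation of the Newborn frequency when n0 cells are
    drawn binomially from a parent with fast-grower frequency f."""
    return math.sqrt(f * (1.0 - f) / n0)


def shift_to_spread_ratio(f0, omega, tau, n0):
    """Within-cycle frequency shift divided by the sampling spread.
    Large values mean sampling noise rarely undoes the shift."""
    f_adult = frequency_after_growth(f0, omega, tau)
    return (f_adult - f0) / sampling_std(f_adult, n0)


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper.
    for n0 in (10, 100, 1000, 10000):
        ratio = shift_to_spread_ratio(0.5, 0.1, 1.0, n0)
        print(f"N0 = {n0:>5}: shift / spread = {ratio:.3f}")
```

      In this sketch the ratio grows as the square root of N0, which is one way to read the reviewer's remark that a small Newborn size makes every target frequency reachable while a large one makes the initial frequency decisive.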

    3. Reviewer #2 (Public Review):

      The authors provide an analytical framework to model the artificial selection of the composition of communities comprised of strains growing at different rates. Their approach takes into account the competition between the targeted selection at the level of the meta-community and the selection that automatically favors fast-growing cells within each replicate community. Their main finding is a tipping point or path-dependence effect, whereby compositions dominated by slow-growing types can only be reached by community-level selection if the community does not start and never crosses into a range of compositions dominated by fast growers during the dynamics.

      These results seem to us both technically correct and interesting. We commend the authors on their efforts to make their work reproducible even when it comes to calculations via extensive appendices, though perhaps a table of contents and a short description of these appendices at the start of SI would help navigate them.

      The main limitation in the current form of the article is that it could clarify how its assumptions and findings differ from and improve upon the rest of the literature:

      - Many studies discuss the interplay between community-level evolution and species- or strain-level evolution. But "evolution" can be a mix of various forces, including selection, drift/randomness, and mutation/innovation.

      - This work's specificity is that it focuses strictly on constant community-level selection versus constant strain-level selection, all other forces being negligible (neither stochasticity nor innovation/mutation matter at either level, as we try to clarify now).

      - Regarding constant community-level selection, it is only briefly noted that "once a target frequency is achieved, inter-collective selection is always required to maintain that frequency due to the fitness difference between the two types" [pg. 3 {section sign}2]. In other words, action from the selector is required indefinitely to maintain the community in the desired state. This assumption is found in a fraction of the literature, but is still worth clarifying from the start as it can inform the practical applicability of the results.

      - More importantly, strain-level evolution also boils down here to pure selection with a constant target, which is less usual in the relevant literature. Here, (1) drift from limited population sizes is very small, with no meaningful counterbalancing of selection, (2) pure exponential regime with constant fitness, no interactions, no density- or frequency-dependence, (3) there is no innovation in the sense that available types are unchanging through time (no evolution of traits such as growth rate or interactions) and (4) all the results presented seem unchanged when mutation rate mu = 0 (as noted in Appendix III), meaning that the conclusions are not "about" mutation in any meaningful way.

      - Furthermore, the choice of mutation mechanism is peculiar, as it happens only from slow to fast grower: more commonly, one assumes random non-directional mutations, rather than purely directional ones from less fit to fitter (which is more of a "Lamarckian" idea). Given that mutation does not seem to matter here, this choice might create unnecessary opposition from some readers or could be considered as just one possibility among others.

      It would be helpful to have all these points stated clearly so that it becomes easy to see where this article stands in an abundant literature and contributes to our understanding of multi-level evolution, and why it may have different conclusions or focus than others tackling very similar questions.

      Finally, a microbial context is given to the study, but the assumptions and results are in no way truly tied to that context, so it should be clear that this is just for flavor.

    4. Reviewer #3 (Public Review):

      The authors address the process of community evolution under collective-level selection for a prescribed community composition. They mostly consider communities composed of two types that reproduce at different rates, and that can mutate one into the other. Due to such differences in 'fitness' and to the absence of density dependence, within-collective selection is expected to always favour the fastest grower, but the collective-level selection can oppose this tendency, to a certain extent at least. By approximating the stochastic within-generation dynamics and solving it analytically, the authors show that not only high frequencies of fast growers can be reproducibly achieved, aligned with their fitness advantage. Small target frequencies can also be maintained, provided that the initial proportion of fast growers is sufficiently small. In this regime, similar to the 'stochastic corrector' model, variation upon which selection acts is maintained by a combination of demographic stochasticity and of sampling at reproduction. These two regions of achievable target compositions are separated by a gap, encompassing intermediate frequencies that are only achievable when the bottleneck size is small enough or the number of communities is (disproportionately) larger.

      A similar conclusion, that stochastic fluctuations can maintain the system over evolutionary time far from the prevalence of the faster-growing type, is then confirmed by analyzing a three-species community, suggesting that the qualitative conclusions of this study are generalizable to more complex communities.

      I expect that these results will be of broad interest to the community of researchers who strive to improve community-level selection, but are often limited to numerical explorations, with prohibitive costs for a full characterization of the parameter space of such embedded populations. The realization that not all target collective functions can be as easily achieved and that they should be adapted to the initial conditions and the selection protocol is also a sobering message for designing concrete applications.

      A major strength of this work is that the qualitative behaviour of the system is captured by an analytically solvable approximation so that the extent of the 'forbidden region' can be directly and generically related to the parameters of the selection protocol.

      I however found the description of the results too succinct and I think that more could be done to unpack the mathematical results in a way that is understandable to a broader audience. Moreover, the phenomenon the authors characterize is of purely ecological nature. Here, mutations of the growth rate are, in my understanding, neither necessary (non-trivial equilibria can be maintained also when \mu =0) nor sufficient (community-level selection is necessary to keep the system far from the absorbing state) for the phenomenon described. Calling this dynamics community evolution reflects a widespread ambiguity, and is not ascribable just to this work. I find that here the authors have the opportunity to make their message clearer by focusing on the case where the 'mutation' rate \mu vanishes (Equations 39 & 40 of the SI) - which is more easily interpretable, at least in some limits - while they may leave the more general equations 3 & 4 in the SI. Combined with an analysis of the deterministic equations, that capture the possibility of maintaining high frequencies of fast growers, the authors could elucidate the dynamics that are induced by the presence of a second level of selection, and speculate on what would be the result of real open-ended evolution (not encompassed by the simple 'switch mutations' generally considered in evolutionary game theory), for instance discussing the invasibility (or not) of mutant types with slightly different growth rates.

      The single most important model hypothesis that I would have liked to be discussed further is that the two types do not interact. Species interactions are not only essential to achieve inheritance of composition in the course of evolution but are generally expected to play a key role even on ecological time scales. I hope the authors plan to look at this in future work.

    1. eLife assessment

      This important study implicates Semaphorin 4a in both mice and humans as a key suppressor of psoriatic inflammation. The data are in parts incomplete in defining the precise functionally relevant cellular source and mechanism. Nonetheless, this study brings new insight into psoriasis pathogenesis and a potential new therapeutic target.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Kume et al examined the role of the protein Semaphorin 4a in steady-state skin homeostasis and how this relates to skin changes seen in human psoriasis and imiquimod-induced psoriasis-like disease in mice. The authors found that human psoriatic skin has reduced expression of Sema4a in the epidermis. While Sema4a has been shown to drive inflammatory activation in different immune populations, this finding suggested Sema4a might be important for negatively regulating Th17 inflammation in the skin. The authors go on to show that Sema4a knockout mice have skin changes in key keratinocyte genes, increased gdT cells, and increased IL-17 similar to differences seen in non-lesional psoriatic skin, and that bone marrow chimera mice with WT immune cells and Sema4a KO stromal cells develop worse IMQ-induced psoriasis-like disease, further linking expression of Sema4a in the skin to maintaining skin homeostasis. The authors next studied downstream pathways that might mediate the homeostatic effects of Sema4a, focusing on mTOR given its known role in keratinocyte function. As with the immune phenotypes, Sema4a KO mice had increased mTOR activation in the epidermis in a similar pattern to mTOR activation noted in non-lesional psoriatic skin. The authors next targeted the mTOR pathway and showed rapamycin could reverse some of the psoriasis-like skin changes in Sema4a KO mice, confirming the role of increased mTOR in contributing to the observed skin phenotype.

      Strengths:

      The most interesting finding is the tissue-specific role for Sema4a, where it has previously been considered to play a mostly pro-inflammatory role in immune cells, this study shows that when expressed by keratinocytes, Sema4a plays a homeostatic role that when missing leads to the development of psoriasis-like skin changes. This has important implications in terms of targeting Sema4a pharmacologically. It also may yield a novel mouse model to study mechanisms of psoriasis development in mice separate from the commonly used IMQ model. The included experiments are well-controlled and executed rigorously.

      Weaknesses:

      A weakness of the study is the lack of tissue-specific Sema4a knockout mice (e.g. in keratinocytes only). The authors did use bone marrow chimeras, but only in one experiment. This work implies that psoriasis may represent a Sema4a-deficient state in the epidermal cells, while the same might not be true for immune cells. Indeed, in their analysis of non-lesional psoriasis skin, Sema4a was not significantly decreased compared to control skin, possibly due to compensatory increased Sema4a from other cell types. Unbiased RNA-seq of Sema4a KO mouse skin for comparison to non-lesional skin might identify other similarities besides mTOR signaling. Indeed, targeting mTOR with rapamycin reverses some of the skin changes in Sema4a KO mice, but not skin thickness, so other pathways impacted by Sema4a may be better targets if they could be identified. Utilizing WT→KO chimeras in addition to global KO mice in the experiments in Figures 6-8 would more strongly implicate the separate role of Sema4a in skin vs immune cell populations and might more closely mimic non-lesional psoriasis skin.

    3. Reviewer #2 (Public Review):

      Summary:

      Kume et al. found for the first time that Semaphorin 4A (Sema4A) was downregulated in both mRNA and protein levels in L and NL keratinocytes of psoriasis patients compared to control keratinocytes. In peripheral blood, they found that Sema4A is not only expressed in keratinocytes but is also upregulated in hematopoietic cells such as lymphocytes and monocytes in the blood of psoriasis patients. They investigated how the down-regulation of Sema4A expression in psoriatic epidermal cells affects the immunological inflammation of psoriasis by using a psoriasis mice model in which Sema4A KO mice were treated with IMQ. Kume et al. hypothesized that down-regulation of Sema4A expression in keratinocytes might be responsible for the augmentation of psoriasis inflammation. Using bone marrow chimeric mice, Kume et al. showed that KO of Sema4A in non-hematopoietic cells was responsible for the enhanced inflammation in psoriasis. The expression of CCL20, TNF, IL-17, and mTOR was upregulated in the Sema4AKO epidermis compared to the WT epidermis, and the infiltration of IL-17-producing T cells was also enhanced.

      Strengths:

      Decreased Sema4A expression may be involved in psoriasis exacerbation through epidermal proliferation and enhanced infiltration of Th17 cells, which helps understand psoriasis immunopathogenesis.

      Weaknesses:

      The mechanism by which decreased Sema4A expression may exacerbate psoriasis is unclear as yet.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This study investigated the role of CD47 and TSP1 in extramedullary erythropoiesis by utilization of both global CD47-/- mice and TSP1-/- mice. 

      Strengths:  

      Flow cytometry combined with spleen bulk and single-cell transcriptomics were employed. The authors found that stress-induced erythropoiesis markers were increased in CD47-/- spleen cells, particularly genes that are required for terminal erythroid differentiation. Moreover, a CD47-dependent erythroid precursor population was identified by spleen scRNA sequencing. In contrast, the same cells were not detected in TSP1-/- spleen. These findings provide strong evidence to support the conclusion that CD47 and TSP1 play differential roles in extramedullary erythropoiesis in mouse spleen.

      Weaknesses: 

      Methods and data analysis are appropriate. However, some clarifications are required. The discussion section needs to be expanded.  

      (1) The sex of mice that were used in the study is unknown.  

      (2) In the method of Single-cell RNA sequencing (page 10), it mentioned that single cell suspensions from mouse spleens were depleted of all mature hematopoietic cell lineages by passing through CD8a microbeads and CD8a+ T cell isolation Kit. As described, it is confusing what cell types are obtained for performing scRNAseq. More information is required for clarity.  

      (3) The constitutive CD47 knockout mouse model is utilized in this study. The observed accumulation of erythroid precursors in the spleens of CD47-/- mice suggests a chronic effect of CD47 on spleen function. Can the current findings be extrapolated to acute scenarios involving CD47 knockdown or loss, as this may have more direct relevance to the potential side effects associated with anti-CD47-mediated cancer therapy? Please expand on this topic in the discussion section.

      (1) The missing mouse sex information is incorporated into the revised manuscript. For flow cytometry, two male and two female mice of each genotype were used. For single cell RNA sequencing, two female mice and one male mouse of each genotype were used. For the bulk RNA sequencing, four male cd47−/− mice and four male wildtype mice were used.

      (2) We apologize for the confusing presentation, which has been corrected. The bulk RNA sequencing analysis identified elevated expression of erythropoietic genes in CD8+ spleen cells from cd47−/− versus wildtype mice that were obtained using magnetic bead depletion of all other lineages. Therefore, we used the same Miltenyi negative selection kit as the first step to prepare the cells for single cell RNA sequencing. These untouched cells were then depleted of most mature CD8 T cells using a Miltenyi CD8a(Ly2) antibody positive selection kit. An important consideration underlying this approach was recognizing that the commercial magnetic bead depletion kits used for preparing specific immune cell types are optimized to give relatively pure populations of the intended immune cells using wildtype mice. Our previous experience studying NK cell development in the cd47−/− mice taught us that NK precursors, which are rare in wildtype mouse spleens, accumulate in cd47−/− spleens and were not removed by the antibody cocktail optimized for wildtype spleen cells (Nath et al Front Immunol 2018). The present data indicate that erythroid precursors behave similarly.

      (3) The Discussion was edited as recommended. Anemia is a prevalent side effect of several CD47 therapeutic antibodies being developed for cancer therapy. This anemia would be expected to induce erythropoiesis in bone marrow and possibly at extramedullary sites. Human spleen cells are not accessible to directly evaluate extramedullary erythropoiesis in cancer patients, but analysis of circulating erythroid precursors or liquid biopsy methods could be useful to detect induction of extramedullary erythropoiesis by these therapeutics. We are currently investigating the ability of CD47 antibodies to directly induce erythropoiesis using a human in vitro model.

      Reviewer #2 (Public Review):

      Summary: 

      The authors used existing mouse models to compare the effects of ablating the CD47 receptor and its signaling ligand Thrombospondin. The CD47-KO model used in this study was generated by Kim et al, 2018, where hemolytic anemia and splenomegaly were reported. This study analyzes the cell composition of the spleens from CD47-KO and Thsp-KO, focusing on early hematopoietic and erythroid populations. The data broadly show that splenomegaly in the CD47-KO is largely due to an increase in committed erythroid progenitors as seen by Flow Cytometry and single-cell sequencing, whereas the Thsp-KO shows a slight depletion of committed erythroid progenitors but is otherwise similar to WT in splenic cell composition.

      Strengths:

      The techniques used are appropriate for the study and the data support the main conclusions of the study. This study provides novel insights into a putative role of Thsp-CD47 signaling in triggering definitive erythropoiesis in the mouse spleen in response to anemic stress and constitutes a good resource for researchers seeking to understand extramedullary erythropoiesis.  

      Weaknesses:

      The Flow cytometry data alone supports the authors' main conclusion and single-cell sequencing confirms them but does not add further information, other than those already observed in the Flow data. The single-cell sequencing analysis and presentation could be improved by using alternate clustering methods as well as separating the data by genotype and displaying them in order for readers to fully grasp the nuanced differences in marker expression between the genotypes. Further, it is not clear from the authors' description of their results whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model. The enrichment of cKit+ Ter119+ Sca1- cells in CD47-KO indicates that these are likely stress erythroid progenitors. Another CD47-KO mouse model (Lindberg et al 1996) has no reported erythroid defects and was also not examined in this study.  

      (1) The reviewer asked, “whether the increased splenic erythropoiesis is a direct consequence of CD47-KO or a response to the anemic stress in this mouse model.” Our data support both a direct role for CD47 and an indirect role resulting from the response to anemic stress. We cited our previous publications describing increased Sox2+ stem cells in spleens of Cd47 and Thbs1 knockout mice, but we neglected to emphasize another study where we found that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). Taken together, our published data demonstrate that loss of CD47 results in an intrinsic protection of hematopoietic stem cells from genotoxic stress. This function of CD47 is thrombospondin-1-dependent and is consistent with the up-regulation of early erythroid precursors in the spleens of both knockout mice but cannot explain why the Thbs1−/− mice have fewer committed erythroid precursors than wildtype. We cited studies that documented increased red cell turnover in cd47−/− mice but less red cell turnover in Thbs1−/− mice compared to wildtype mice. Increased red cell clearance in cd47−/− mice is mediated by loss of the “don’t eat me” function of CD47 on red cells. In wildtype mice, clearance is augmented by thrombospondin-1 binding to the clustered CD47 on aging red cells (Wang, Aging Cell 2020). Thus, anemic stress in the mouse strains studied here decreases in the order cd47−/− > WT > Thbs1−/−. This is consistent with the increased committed erythroid progenitors reported here in cd47−/− spleens and decreased committed progenitors in the Thbs1−/− spleens.

      (2) Based on the reviewer’s question regarding alternative mechanisms and the publication of Yang et al 2022 identifying a role for CD47 in stress erythropoiesis through transfer of mitochondria to erythroblasts, we asked whether cd47-/- erythroid precursors would show decreased mRNA expression for mitochondrial chromosome genes (new Figure 4−figure supplement 3C). Some of these mRNAs were more abundant in cd47-/- and thbs1-/- erythroid cells, which is the opposite of what we expected based on Yang 2022 but consistent with our previous publications identifying thrombospondin-1 and CD47 as negative regulators of mitochondrial homeostasis in muscle cells and T cells.

      (3) The cd47−/− mice used for the current study are the same strain as those reported by Lindberg et al in 1996, with additional backcrossing onto a C57BL/6 background.

      Recommendations For The Authors:

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses.  

      Significant efforts went into analyzing the type of erythroid progenitors by marker expression, but typical Flow cytometry strategies using Ter119 and CD44 combined with forward scatter can be used to stage the committed erythroid progenitors precisely.  

      We appreciate this suggestion to extend the flow data. However, the upcoming retirement of the PI required closing our breeding colony, and the mice are no longer available.  

      How can the difference between the erythroid phenotypes of the Lindberg et al 1996 CD47-KO (exon2 Neo knock-in) and Kim et al 2018 CD47-ko (exon1 26bp indel) be explained?  

      We are not convinced that the erythroid phenotypes of the Lindberg and Kim CD47-KO mice differ at the age used in our studies. Kim et al. focused on progressive hemolytic anemia and changes in T cells in spleen that emerge at 26 weeks age, whereas the mice used here were younger. The Lindberg and Kim mice have similar spleen enlargement at the age we used.

      Another manuscript under review from our lab suggests that cis-regulation of an adjacent colinear gene could contribute to some phenotypes observed when perturbing the Cd47 gene. The Lindberg mouse exhibits minimal perturbation of that adjacent gene, but we have no data regarding the Kim et al mouse. The reviewer’s question brought to our attention that we neglected to state in the Methods that the mice used here are the Lindberg mice, not the Kim mice. This omission is now corrected.

      The authors used the Lindberg mouse for their 2018 study on NK cells and observed splenomegaly. Did they check for extramedullary erythropoiesis there?

      Retrospective examination of the RNAseq data for the spleen cells enriched in NK precursors used in our 2018 publication (Nath, 2018) reveals significantly elevated expression for a majority of the extramedullary erythroid markers listed in Table 1, but they were generally less abundant than observed for the lineage-depleted spleen cells used in the present manuscript.   

      Author response table 1.

      To clarify the stress erythropoiesis issue, it might be helpful to examine the sc-seq data for the expression of specific stress erythropoiesis markers in CD47-KO. Targets of BMP4 and Hedgehog signaling can also be examined. Further colony assays can help determine if stress BFU-Es are prevalent in the CD47-KO spleens and depleted in Thsp-KO.

      As noted in Table 1, twelve of the genes we studied are established markers of stress-induced extramedullary erythropoiesis, and most of these were included in the scRNA seq data presented. Our previous publication demonstrated that bone marrow from cd47−/− mice subjected to the stress of ionizing radiation exhibited more colony forming units for erythroid (CFU-E) and burst-forming unit-erythroid (BFU-E) progenitors compared to bone marrow from irradiated wildtype mice (Maxhimer Sci Transl Med 2009). We have not performed colony formation assays using spleen.

      To address the reviewer’s question regarding BMP4 and hedgehog signaling we performed gene set enrichment analysis for known BMP4 and hedgehog signaling signatures. Using GSE26351_UNSTIM_VS_BMP_PATHWAY_STIM_HEMATOPOIETIC_PROGENITORS, cd47-/- cells in cluster 12 or their CD34+ or CD34- subsets did not show significant enrichment for BMP4 targets compared to WT. Thbs1-/- cells in clusters 12 and 14 showed marginally significant depletion of the BMP4 signature (p=0.04 and p=0.023, respectively). Using the KEGG_HEDGEHOG_SIGNALING_PATHWAY, we did not find any significant enrichment. However, only a few genes in this pathway were detectable in the scRNAseq data. These data suggest that BMP4 signaling may be regulated by thrombospondin-1, but properly testing this hypothesis would require achieving greater sequencing depth combined with a cell isolation method that better enriches the early hematopoietic progenitors that are known to utilize the BMP4 pathway.

      In the reclustering of erythroid progenitors in Figure 5, inclusion of Gata1 as a selection marker may help capture more of the early erythroid progenitors from the dataset and provide a more complete picture of the erythroid populations. 

      We thank the reviewer for suggesting inclusion of Gata1. We repeated the reclustering including Gata1 and found the selected cell count increased from 876 cells to 1007 cells. However, most of the increase was not in the erythroid cluster, which increased from 413 cells to 419 cells. Most of the increase represented Gata1+ T cells (548 cells including Gata1 versus 463 cells without). The revised manuscript presents genotype-dependent differential gene expression based on including Gata1 selection, but none of the specific conclusions were changed from the initial submission. The new Table 4 and Figure 7−figure supplement 1 enabled us to compare differential expression of erythropoietic genes obtained using supervised and unsupervised clustering and show that both methods yield comparable results.

      Just out of curiosity, was there an attempt to make a CD47 Thsp double KO? Is it viable?

      Cd47 KO mice are somewhat difficult breeders, and several previous attempts to cross with other transgenics have produced viable homozygous offspring that could not be propagated.

      Recommendations for improving the writing and presentation.

      Perhaps readers would find it more intriguing if the paper led with the single-cell sequencing showing enrichment of erythroid populations in CD47-KO, and later confirmed with Flow Cytometry (even if this was not necessarily the order in which the experiments were done). 

      We considered this suggestion but believe that some of the flow cytometry data are needed to understand why we focused on CD34+ and CD34- subsets and proliferation markers when analyzing the scRNAseq data.

      The single-cell sequencing data in Figure 3 might benefit from UMAP clustering as well. In addition, it would greatly help readers if the data points were separated by genotype and displayed after clustering. A similar analysis has been done in this paper: doi:10.1038/s41556-022-00898-9 by clustering different conditions together but displaying them separately by condition. 

      We initially explored tSNE and UMAP clustering and obtained similar results. We have added violin plots separated by genotype in Figure 4-figure supplement 2. We also included improved clusters separated by genotype in the revised Figure 3 panels C and D and for the reclustering in Figure 6D. UMAP plots provided better presentation for the reclustering (revised Figure 7). All data have been updated to the latest pipeline as noted in the Methods.

      Minor corrections to the text and figures.  

      Figure 4: Labels and plot legends are illegible in general, please relabel manually and if possible, redo plots with bigger font size and legends (relatively easy using ggplot2) 

      All figure panels were relabeled using larger fonts.

      Figure 5D: Individual plots are stacked randomly atop each other and in many cases, gene names are not visible. Please restack the layers and ensure that the gene names are visible 

      Panel D was made a separate figure with enlarged labels (now Figure 7).

      Supp Fig 2: Layout can be organized a little better. Consider splitting into two figures for better organization  

      The figure was split as recommended. Now Figure 1-figure supplement 2 and Figure 2-figure supplement 1.

      Abstract Line 10: "...mRNA expression of Kit, Ermap, and Tfrc, Induction of committed erythroid precursors is...". Replace comma after "Tfrc" with period   

      Done.

      Discussion Page 9 Line 8: "...WT spleens, s. mRNAs for some markers of committed erythroid cells including Nr3c1 mRNA...". Remove ", s" after spleens.   

      Done.

    2. eLife assessment

      This study presents a valuable finding on the cell composition in mouse spleen depleted for the CD47 receptor and its signaling ligand Thrombospondin in hematopoietic differentiation. The supporting evidence is convincing with analytical improvements on the individual contributions of the signaling components and with functional studies. This work has implications for the role of CD47/Thsp in extramedullary erythropoiesis in mouse spleen and will be of interest to medical biologists working on cell signaling, transfusion medicine, and cell therapy.

    3. Reviewer #1 (Public Review):

      Summary:

      This study investigated the role of CD47 and TSP1 in extramedullary erythropoiesis by utilization of both global CD47-/- mice and TSP1-/- mice.

      Strengths:

      Flow cytometry combined with spleen bulk and single-cell transcriptomics were employed. The authors found that stress-induced erythropoiesis markers were increased in CD47-/- spleen cells, particularly genes that are required for terminal erythroid differentiation. Moreover, a CD47-dependent erythroid precursor population was identified by spleen scRNA sequencing. In contrast, the same cells were not detected in TSP1-/- spleen. These findings provide strong evidence to support the conclusion that CD47 and TSP1 play differential roles in extramedullary erythropoiesis in mouse spleen. Furthermore, the relevance of the current finding to the prevalent side effect (anemia) of anti-CD47 mediated cancer therapy has been discussed in the Discussion section.

    4. Reviewer #3 (Public Review):

      The authors used existing mouse models to compare the effects of ablating the CD47 receptor and its signaling ligand Thrombospondin. They analyze the cell composition of the spleens from CD47-KO and Thsp-KO using Flow Cytometry and single cell sequencing and focus mostly on early hematopoietic and erythroid populations. The data broadly show that splenomegaly in the CD47-KO is largely due to an increase in committed erythroid progenitors, whereas the Thsp-KO shows a slight depletion of committed erythroid progenitors but is otherwise similar to WT in splenic cell composition. Thus, both of their datasets support the main conclusions of the study. One caveat of the single-cell dataset is that, insofar as the authors have explored and presented it, a clear picture of the mechanism driving extramedullary erythropoiesis in CD47-KO is lacking. This would be extremely valuable since one of the stated translational implications of this study is to assess and remedy the anemia caused by anti-CD47 therapy used in subtypes of AML. Nevertheless, this study provides novel insights into a putative role of Thsp-CD47 signaling in triggering definitive erythropoiesis in the mouse spleen in response to anemic stress and constitutes a good resource for researchers seeking to understand extramedullary erythropoiesis. This study also has generated data that will enable exploration of the possible adverse effects of using anti-CD47 therapies to treat AML.

    1. eLife assessment

      This valuable study describes a new type of NAD+ and Zn2+-independent protein lysine deacetylase in prokaryotes. These results extend the understanding of regulatory mechanisms related to bacterial lysine acetylation modifications; however, the experimental evidence is incomplete and does not fully support the conclusions made. The work will be of interest to microbiologists studying metabolism and post-translational modifications.

    2. Reviewer #1 (Public Review):

      Summary:

      This study by Wang et al. identifies a new type of deacetylase, CobQ, in Aeromonas hydrophila. Notably, the identification of this deacetylase reveals a lack of homology with eukaryotic counterparts, thus underscoring its unique evolutionary trajectory within the bacterial domain.

      Strengths:

      The manuscript convincingly illustrates CobQ's deacetylase activity through robust in vitro experiments, establishing its distinctiveness from known prokaryotic deacetylases. Additionally, the authors elucidate CobQ's potential cooperation with other deacetylases in vivo to regulate bacterial cellular processes. Furthermore, the study highlights CobQ's significance in the regulation of acetylation within prokaryotic cells.

      Weaknesses:

      While the manuscript is generally well-structured, some clarification and some minor corrections are needed.

    3. Reviewer #2 (Public Review):

      In recent years, many researchers have tried to identify new acetyltransferases and deacetylases by using specific antibody enrichment technologies and high-resolution mass spectrometry. This study adds to this effort. The authors studied a novel Zn2+- and NAD+-independent KDAC protein, AhCobQ, in Aeromonas hydrophila. They studied the biological function of AhCobQ by using a biochemistry method and used MS identification technology to confirm it. The results extend our understanding of the regulatory mechanism of bacterial lysine acetylation modifications. However, I find their conclusion to be a little speculative, and unfortunately, the evidence does not fully support it. In addition, regarding the figure arrangement, many of the supplementary figures are not mentioned, and tables are not all placed in context.

      Major concerns:

      -In the opinion of this reviewer, it is a little arbitrary to arrive at the title "Aeromonas hydrophila CobQ is a new type of NAD+- and Zn2+-independent protein lysine deacetylase in prokaryotes." This should be modified to delete "in prokaryotes", unless the authors obtain new or additional evidence for the existence of AhCobQ in other prokaryotes.

      -I was confused about the arrangement of the supplementary results. There are no citations for Figures S9-S19.

      -No data are included for Tables S1-S6.

      -The load control is not all integrated. All of the load controls with whole PAGE gel or whole membrane western blot results should be provided. Without these whole results, it is not convincing to come to the conclusion that the authors have.

      -The materials & methods section should be thoroughly reviewed. It is unclear to me what exactly the authors are describing in the method. All the experimental designs and protocols should be described in detail, including growth conditions, assay conditions, purification conditions, etc.

      -Relevant information should be included about the experiments performed in the figure legends, such as experimental conditions, replicates, etc. Often it is not clear what was done based on the figure legend description.

    4. Reviewer #3 (Public Review):

      Summary:

      This study reports on a novel NAD+ and Zn2+-independent protein lysine deacetylase (KDAC) in Aeromonas hydrophila, termed AhCobQ (AHA_1389). This protein is annotated as a CobQ/CobB/MinD/ParA family protein and does not show similarity with known NAD+-dependent or Zn2+-dependent KDACs. The authors show that AhCobQ has NAD+ and Zn2+-independent deacetylase activity with acetylated BSA by western blot and MS analyses. They also provide evidence that the 195-245 aa region of AhCobQ is responsible for the deacetylase activity, which is conserved in some marine prokaryotes and has no similarity with eukaryotic proteins. They identified target proteins of AhCobQ deacetylase by proteomic analysis and verified the deacetylase activity using site-specific acetyllysine-incorporated target proteins. Finally, they show that AhCobQ activates isocitrate dehydrogenase by deacetylation at K388.

      Strengths:

      The finding of a new type of KDAC has a valuable impact on the field of protein acetylation. The characteristics (NAD+ and Zn2+-independent deacetylase activity in an unknown domain) shown in this study are very unexpected.

      Weaknesses:

      (1) As the characteristics of AhCobQ are very unexpected, MSMS data would be needed to convince readers by exactly detecting deacetylation at the target site in deacetylase activity assays. The authors show the MSMS data in assays with acetylated BSA, but other assays only rely on western blot.

      (2) They prepared site-specific Kac proteins and used them in deacetylase activity assays. The incorporation of acetyllysine at the target site needs to be confirmed by MSMS and shown as supplementary data.

      (3) The authors imply that the 195-245 aa region of AhCobQ may represent a new domain responsible for deacetylase activity. The feature of the region would be of interest but is not sufficiently described in Figure 5. The amino acid sequence alignments with representative proteins with conserved residues would be informative. It would be also informative if the modeled structure predicted by AlphaFold is shown and the structural similarity with known deacetylases is discussed.

    1. eLife assessment

      This paper reports a large drug repurposing screen based on an in vitro culture platform to identify compounds that can kill Plasmodium hypnozoites. This valuable work adds to the current repertoire of anti-hypnozoites agents and uncovers targetable epigenetic pathways to enhance our understanding of this mysterious stage of the Plasmodium life cycle. The data presented here are based on solid methodology and represent a starting point for further investigation of epigenetic inhibitors to treat P. vivax infection. This paper will be of interest to Plasmodium researchers and more broadly to readers in the fields of host-pathogen interactions and drug development.

    2. Reviewer #1 (Public Review):

      Summary:

      Plasmodium vivax can persist in the liver of infected individuals in the form of dormant hypnozoites, which cause malaria relapses and are resistant to most current antimalarial drugs. This highlights the need to develop new drugs active against hypnozoites that could be used for radical cure. Here, the authors capitalize on an in vitro culture system based on primary human hepatocytes infected with P. vivax sporozoites to screen libraries of repurposed molecules and compounds acting on epigenetic pathways. They identified a number of hits, including hydrazinophthalazine analogs. They propose that some of these compounds may act on epigenetic pathways potentially involved in parasite quiescence. To provide some support to this hypothesis, they document DNA methylation of parasite DNA based on 5-methylcytosine immunostaining, mass spectrometry, and bisulfite sequencing.

      Strengths:

      -The drug screen itself represents a huge amount of work and, given the complexity of the experimental model, is a tour de force.

      -The screening was performed in two different laboratories, with a third laboratory being involved in the confirmation of some of the hits, providing strong support that the results were reproducible.

      -The screening of repurposing libraries is highly relevant to accelerate the development of new radical cure strategies.

      Weaknesses:

      -The manuscript is composed of two main parts, the drug screening itself and the description of DNA methylation in Plasmodium pre-erythrocytic stages. Unfortunately, these two parts are loosely connected. First, there is no evidence that the identified hits kill hypnozoites via epigenetic mechanisms. The hit compounds almost all act on schizonts in addition to hypnozoites, therefore it is unlikely that they target quiescence-specific pathways. At least one compound, colforsin, seems to selectively act on hypnozoites, but this observation still requires confirmation. Second, while the description of DNA methylation is per se interesting, its role in quiescence is not directly addressed here. Again, this is clearly not a specific feature of hypnozoites as it is also observed in P. vivax and P. cynomolgi hepatic schizonts and in P. falciparum blood stages. Therefore, the link between DNA methylation and hypnozoite formation is unclear. In addition, DNA methylation in sporozoites may not reflect epigenetic regulation occurring in the subsequent liver stages.

      -The mode of action of the hit compounds remains unknown. In particular, it is not clear whether the drugs act on the parasite or on the host cell. Merely counting host cell nuclei to evaluate the toxicity of the compounds is probably acceptable for the screen but may not be sufficient to rule out an effect on the host cell. A more thorough characterization of the toxicity of the selected hit compounds is required.

      -There is no convincing explanation for the differences observed between P. vivax and P. cynomolgi. The authors question the relevance of the simian model but the discrepancy could also be due to the P. vivax in vitro platform they used.

      -Many experiments were performed only once, not only during the screen (where most compounds were apparently tested in a single well) but also in other experiments. The quality of the data would be increased with more replication.

      -While the extended assay (12 days versus 8 days) represents an improvement of the screen, the relevance of adding inhibitors of core cytochrome activity is less clear, as under these conditions the culture system deviates from physiological conditions.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, inhibitors of the P. vivax liver stages are identified from the Repurposing, Focused Rescue, and Accelerated Medchem (ReFRAME) library as well as a 773-member collection of epigenetic inhibitors. This study led to the discovery that epigenetics pathway inhibitors are selectively active against P. vivax and P. cynomolgi hypnozoites. Several inhibitors of histone post-translational modifications were found among the hits and genomic DNA methylation mapping revealed the modification on most genes. Experiments were completed to show that the level of methylation upstream of the gene (promoter or first exon) may impact gene expression. With the limited number of small molecules that act against hypnozoites, this work is critically important for future drug leads. Additionally, the authors gleaned biological insights from their molecules to advance the current understanding of essential molecular processes during this elusive parasite stage.

      Strengths:

      -This is a tremendously impactful study that assesses molecules for the ability to inhibit Plasmodium hypnozoites. The comparison of various species is especially relevant for probing biological processes and advancing drug leads.

      -The SI is wonderfully organized and includes relevant data/details. These results will inspire numerous studies beyond the current work.

    4. Reviewer #3 (Public Review):

      Although this work represents a massive screening effort to find new drugs targeting P. vivax hypnozoites, the authors should balance their statement that they identified targetable epigenetic pathways in hypnozoites.

      • They should emphasize the potential role of the host cell in the presentation of the results and the discussion, as it is known that other pathogens modify the epigenome of the host cell (e.g., Toxoplasma, HIV) to prevent cell division. Also, hydrazinophthalazines target multiple pathways (notably modulation of calcium flux) and have been shown to inhibit DNA methyltransferase 1, which is lacking in Plasmodium.

      • In a drug repurposing approach, the parasite target might also be different than the human target.

      • The authors state that host-cell apoptotic pathways are downregulated in P. vivax infected cells (p. 5 line 162). Maybe the HDAC inhibitors and DNA-methyltransferase inhibitors are reactivating these pathways, leading to parasite death, rather than targeting parasites directly.

      It would make the interpretation of the results easier if the authors used EC50 in µM rather than pEC50 in tables and main text. It is easy to calculate when it is a single-digit number but more complicated with multiple digits.
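      For readers who want to move between the two scales, the conversion follows from the usual definition pEC50 = -log10(EC50 in mol/L), so EC50 in µM = 10^(6 - pEC50). A minimal sketch of that arithmetic is given below; the function name and the example values are illustrative assumptions and are not taken from the manuscript.

```python
def pec50_to_ec50_um(pec50: float) -> float:
    """Convert pEC50 (-log10 of EC50 in mol/L) to EC50 in micromolar.

    Assumes the conventional definition of pEC50; the function name is
    hypothetical and only used for illustration.
    """
    # EC50 [M] = 10 ** (-pEC50); multiply by 1e6 to express it in µM.
    return 10 ** (6 - pec50)


if __name__ == "__main__":
    # Illustrative values only, not data from the manuscript.
    for pec50 in (5.0, 6.0, 6.35):
        print(f"pEC50 {pec50:.2f} -> EC50 {pec50_to_ec50_um(pec50):.3g} µM")
```

      Whole-number pEC50 values map to powers of ten in µM, whereas fractional values such as 6.35 are harder to convert mentally, which is the point raised above.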

      The authors mention hypnozoite-specific effects but in most cases, compounds are as potent on hypnozoites as on schizonts. They should rather use "liver stage specific" to refer to increased activity against hypnozoites and schizonts compared to the host cell. The same comment applies to line 351 when referring to MMV019721. Following the same idea, it is a bit far-fetched to call MMV019721 "specific" when the highest concentration tested for cytotoxicity is less than twice the EC50 obtained against hypnozoites and schizonts.

      Page 5 lines 187-189, the authors state "...hydrazinophthalazines were inactive when tested against P. berghei liver schizonts and P. falciparum asexual blood stages, suggesting that hypnozoite quiescence may be biologically distinct from developing schizonts". The data provided in Figure 1B show that these hydrazinophthalazines are as potent in P. vivax schizonts as in P. vivax hypnozoites, so the distinct activity seems to be Plasmodium species specific and/or host-cell specific (primary human hepatocytes rather than cell lines for P. berghei) rather than hypnozoite vs schizont specific.

      Why choose to focus on cadralazine if abandoned due to side effects? Also, why test the pharmacokinetics in monkeys? As it was a marketed drug, were no data available in humans?

      In the counterscreen mentioned on page 6, the authors should mention that the activity of poziotinib in P. berghei and P. cynomolgi is equivalent to cell toxicity, so likely not due to parasite specificity.

      To improve the clarity and flow of the manuscript, could the authors make a summary table/figure for all the data obtained for poziotinib and hydrazinophthalazines in the different assays (8-day vs 12-day) and laboratory settings rather than separate tables in main and supplementary figures? They might also reorder the results section, notably moving the 12-day assay before the DNA methylation part.

      The isobologram plot shows an additive effect rather than a synergistic effect between cadralazine and 5-azacytidine, please modify the paragraph title accordingly. Please put the same axis scale for both fractional EC50 in the isobologram graph (Figure 2A).
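      The distinction drawn above can be made concrete with the sum of fractional EC50s that an isobologram plots. Under Loewe additivity, points on the diagonal (sum close to 1) indicate an additive interaction, points below it synergy, and points above it antagonism. The sketch below is a minimal illustration under that assumption; the function names, the tolerance band around 1, and the input values are hypothetical and not taken from the manuscript.

```python
def fractional_ec50_sum(ec50_a_combo: float, ec50_a_alone: float,
                        ec50_b_combo: float, ec50_b_alone: float) -> float:
    """Sum of fractional EC50s for drugs A and B (one isobologram point)."""
    return ec50_a_combo / ec50_a_alone + ec50_b_combo / ec50_b_alone


def classify_interaction(fic_sum: float, tolerance: float = 0.25) -> str:
    """Label an interaction from the fractional EC50 sum.

    The tolerance band around 1 is an assumed, illustrative cutoff;
    published thresholds differ between studies.
    """
    if fic_sum < 1 - tolerance:
        return "synergistic"
    if fic_sum > 1 + tolerance:
        return "antagonistic"
    return "additive"


if __name__ == "__main__":
    # Hypothetical EC50 values (same units within each drug), not manuscript data.
    total = fractional_ec50_sum(0.5, 1.0, 2.0, 4.0)
    print(total, classify_interaction(total))
```

      Plotting both fractional EC50 axes on the same 0 to 1 scale, as requested above, makes the position of each point relative to the additivity diagonal directly readable.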

      Concerning the immunofluorescence detection of 5mC and 5hmC, the authors should be careful with their conclusions. The Hoechst signal of the parasites is indistinguishable because of the high signal given by the hepatocyte nuclei. The signal obtained with the anti-5hmC in hepatocyte nuclei is higher than with the anti-5mC, thus if a low signal is obtained in hypnozoites and schizonts, it might be difficult to distinguish from the background. In blood stages (Figure S18), the best way to obtain a good signal is to lyse the red blood cell using saponin, before fixation and HCl treatment.

      To conclude that 5mC marks are the predominant DNA methylation mark in both P. falciparum and P. vivax, the authors should also mention that they compare different stages of the life cycle, which might have different methylation levels.

      Also, the authors conclude that "[...] 5mC is present at low level in P. vivax and P. cynomolgi sporozoites and could control liver stage development and hypnozoite quiescence". Based on the data shown here, nothing, except the presence of 5mC marks, supports that DNA methylation could be implicated in liver stage development or hypnozoite quiescence.

      How many DNA-methyltransferase inhibitors were present in the epigenetic library? Out of those, none were identified as hits; could the effect of the hydrazinophthalazines be linked not to DNMT inhibition but to another target pathway of these molecules, such as calcium transport?

      The authors state (line 344): "These results corroborate our hypothesis that epigenetic pathways regulate hypnozoites". This conclusion should be changed to "[...] that epigenetic pathways are involved in P. vivax liver stage survival" because:

      • The epigenetic inhibitors described here are as active on hypnozoites as on liver schizonts.

      • Again, we cannot rule out that the host cell plays a role in this effect and that the compound may not act directly on the parasite.

      The same comment applies to the quote in lines 394 to 396. There is no proof in the results presented here that DNA methylation plays any role in the anti-plasmodial activity of hydrazinophthalazines observed in the assay.

    1. eLife assessment

      The manuscript by Yang and coworkers presents valuable evidence that an in vitro blood-brain barrier model composed of endothelial cells, astrocytes, and neuroblastoma cells of human origin better resembles the in vivo condition. The presented results constitute solid evidence that GDNF induces the expression of VE-Cadherin and Claudin-5. Further, silencing of GDNF in the brain of mice altered blood-brain barrier properties. This provides a new perspective on the interaction between neurons and endothelial cells and this model can be used to screen the permeability of the blood-brain barrier to different drugs.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors established an in vitro triple co-culture BBB model and demonstrated its advantages compared with the mono or double co-culture BBB model. Further, the authors used their established in vitro BBB model and combined it with other methodologies to investigate the specific mechanism by which co-culture with not only astrocytes but also neurons enhanced the integrity of endothelial cells.

      Strengths:

      The results persuasively showed the established triple co-culture BBB model well mimicked several important characteristics of BBB compared with the mono-culture BBB model, including better barrier function and in vivo/in vitro correlation. The human-derived immortalized cells used made the model construction process faster and more efficient, and have a better in vivo correlation without species differences. This model is expected to be a useful high-throughput evaluation tool in the development of CNS drugs.

      Based on the previous experimental results, detailed studies investigated how co-culture with neurons and astrocytes promoted claudin-5 and VE-cadherin in endothelial cells, and the specific signaling mechanisms were also studied. Interestingly, the authors found that neurons also released GDNF to promote barrier properties of brain endothelial cells, as most current research has focused on the promoting effect of astrocyte-derived GDNF on BBB. Meanwhile, the authors also validated the functions of GDNF for BBB integrity in vivo by silencing GDNF in mouse brains. Overall, the experiments and data presented support their claim that, in addition to astrocytes, neurons also have a promoting effect on the barrier function of endothelial cells through GDNF secretion.

      Weaknesses:

      Although the authors demonstrated a highly usable model for predicting BBB permeability, the recorded TEER measurements are still far from the TEER values reported for the human BBB in vivo, and expression of transporters was not promoted by co-culture, which may make the model unsuitable for studying transporter-mediated drug transport across the BBB.

    3. Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues developed a new in vitro blood-brain barrier model that is relatively simple yet outperforms previous models. By incorporating a neuroblastoma cell line, they demonstrated increased electrical resistance and decreased permeability to small molecules.

      Strengths:

      The authors initially elucidated the soluble mediator responsible for enhancing endothelial functionality, namely GDNF. Subsequently, they elucidated the mechanisms by which GDNF upregulates the expression of VE-cadherin and Claudin-5. They further validated these findings in vivo, and demonstrated predictive value for molecular permeability as well. The study is meticulously conducted and easily comprehensible. The conclusions are firmly supported by the data, and the objectives are successfully achieved. This research is poised to advance future investigations in BBB permeability, leakage, dysfunction, disease modeling, and drug delivery, particularly in high-throughput experiments. I anticipate an enthusiastic reception from the community interested in this area. While other studies have produced similar results with tri-cultures (PMID: 25630899), this study notably enhances electrical resistance compared to previous attempts.

      Weaknesses:

      Considerable effort has been directed towards developing in vitro models that more closely resemble their in vivo counterparts, utilizing stem cell-derived NVU cells. Although these examples are currently rudimentary, they offer better BBB mimicry than Yang's study.

      Additionally, some instances might benefit from more robust statistical tests; nonetheless, I do not think this would significantly alter the experimental conclusions.

      Similar experiments with tri-cultures yielding analogous results have been reported by other authors (PMID: 25630899). TEER values are a bit higher than the aforementioned experiments; however, this study has values at least one order of magnitude lower than physiological levels.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents a valuable finding on the mechanism to promote distant metastasis in breast cancer. The evidence supporting the claims of the authors is convincing. The work will be of interest to medical biologists working on breast cancer.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths

      The paper has shown the expression of RGS10 is related to the molecular subtype, distant metastasis, and survival status of breast cancer. The study utilizes bioinformatic analyses, human tissue samples, and in vitro and in vivo experiments which strengthen the data. RGS10 was validated to inhibit EMT through a novel mechanism dependent on LCN2 and miR-539-5p, thereby reducing cancer cell proliferation, colony formation, invasion, and migration. The study elaborated the function of RGS10 in influencing the prognosis and biological behavior which could be considered as a potential drug target in breast cancer.

      Weakness

      The mechanism by which the miR-539-5p/RGS10/LCN2 axis may be related to the prognosis of cancer patients still needs to be elucidated. In addition, the sample size used is relatively limited. In particular, further exploration of the pathways and mechanisms related to LCN2 using organoid models, together with verification of the potential of RGS10 as a biomarker and therapeutic target for clinical translation, would make the data more convincing.

      Answer: Thank you for your comments and suggestions. In future research, we will utilize large clinical cohorts and organoid models to further explore relevant research mechanisms.

      Reviewer #2 (Public Review):

      Liu et al., by focusing on regulator of G protein signaling 10 (RGS10), reported that RGS10 expression was significantly lower in patients with breast cancer, compared with normal adjacent tissue. Genetic inhibition of RGS10 caused epithelial-mesenchymal transition and enhanced cell proliferation, migration, and invasion. These results suggest an inhibitory role of RGS10 in tumor metastasis. Furthermore, bioinformatic analyses determined signaling cascades for RGS10-mediated breast cancer distant metastasis. More importantly, both in vitro and in vivo studies evidenced that alteration of RGS10 expression by modulating its upstream regulator miR-539-5p affects breast cancer metastasis. Altogether, these findings provide insight into the pathogenesis of breast tumors and hence identify potential therapeutic targets in breast cancer.

      The conclusions of this study are mostly well supported by data. However, there is a weakness in the study that needs to be clarified.

      In Figure 2A, although some references support that SKBR3 and MCF-7 cells are poorly aggressive and less invasive, it cannot be concluded from examining only RGS10 expression in those cells that 'RGS10 acts as a tumor suppressor in breast cancer'. It would be better to introduce a horizontal comparison of the invasive ability of these 3 types of cells using an invasion assay.

      Answer: Thank you for your comments and suggestions. MDA-MB-231, SKBR3, and MCF-7 originate from triple-negative breast cancer (high invasiveness), HER2-overexpressing breast cancer (relatively weak invasiveness), and luminal-type breast cancer (relatively weak invasiveness), respectively. Previous studies have demonstrated the invasive ability of these 3 types of cells (PMID: 34390568).

      Reviewer #3 (Public Review):

      Distant metastasis is the major cause of death in patients with breast cancer. In this manuscript, Liu et al. show that RGS10 deficiency elicits distant metastasis via epithelial-mesenchymal transition in breast cancer. As a prognostic indicator of breast cancer, RGS10 regulates the progress of breast cancer and affects tumor phenotypes such as epithelial-mesenchymal transformation, invasion, and migration. The conclusions of this paper are mostly well supported by data, but some analyses need to be clarified.

      (1) Because diverse biomarkers have been identified for EMT, it is recommended to declare the advantages of using RGS10 as an EMT marker.

      Answer: Thank you for your comments. The dysregulation of RGS protein expression has been observed to be associated with various types of cancer (PMID: 26293348). Previous studies have shown that RGS10 knockdown can lead to chemotherapy resistance of ovarian cancer cells to paclitaxel, cisplatin, and vincristine. In colorectal tumors, the transcription of RGS10 is regulated by DNA methylation and histone deacetylation. As a key regulatory factor in the G protein signaling pathway, RGS10 is involved in tumor development, including survival, polarization, adhesion, chemotaxis, and differentiation. These findings suggest that RGS10 might be a marker for EMT in breast cancer.

      (2) The authors utilized databases to study the upstream regulatory mechanisms of RGS10. It is recommended to clarify why the authors focused on miRNAs rather than other epigenetic modifications.

      Answer: Thank you for your comments. miRNAs are short non-coding RNA molecules that bind to the 3' untranslated region (3' UTR) of the target mRNA to cause mRNA degradation or translation inhibition, thus regulating gene expression in cells. These small molecules play a crucial role in regulating the expression of cancer-related genes and can act as tumor promoters or tumor suppressors. To further clarify the molecular mechanism by which RGS10 affects the malignant biological behavior of breast cancer cells, we verified through bioinformatic prediction and in vitro experiments that miR-539-5p might be an upstream regulator of RGS10.

      (3) The role of miR-539-5p in breast cancer has been described in previous studies. Hence, it is recommended to provide detailed elaboration on how miR-539-5p regulates the expression of RGS10.

      Answer: Thank you for your comments. To verify the effect of miR-539-5p on the expression of RGS10, we transfected miR-539-5p mimic, miR-539-5p mimic NC, miR-539-5p inhibitor, and miR-539-5p inhibitor NC into SKBR3 cells and MDA-MB-231 cells, respectively, and verified the expression of RGS10 through RT-qPCR and Western blot experiments. The results showed that, compared with SKBR3 cells transfected with miR-539-5p mimic NC or wild-type SKBR3 cells, RGS10 mRNA and protein levels were significantly reduced in SKBR3 cells transfected with the miR-539-5p mimic. Conversely, after MDA-MB-231 cells were transfected with miR-539-5p inhibitor to inhibit the expression of miR-539-5p, RGS10 mRNA and protein levels in MDA-MB-231 cells were significantly increased (Fig. 3.4A-C, Fig. 3.5A-C). This indicates that miR-539-5p can target and regulate RGS10.

      (4) To enhance the clarity and interpretability of the Western blot results, it would be advisable to mark the specific kilodalton (kDa) values of the proteins.

      Answer: Thank you for your comments and suggestions. We have now marked the specific kilodalton (kDa) values of the proteins in the Western blots.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The function of RGS10 in breast cancer was identified in the paper. However, some major issues in this paper need to be specified:

      (1) From reading the introduction section and its references, RGS proteins participate in multiple essential cellular processes and may be tumor initiators or suppressors (Li et al., 2023). Because this article focuses on the significance of RGS10 in breast cancer, it is recommended to show how the function of RGS10 exhibits therapeutic significance in other types of cancer.

      Answer: Thanks for your comments and suggestions on our findings. The dysregulation of RGS protein expression has been observed to be associated with various types of cancer, especially ovarian cancer (PMID: 26293348). RGS10 expression in ovarian cancer cells has been found to be lower than in normal ovarian cells (PMID: 21044322). In addition, knocking down RGS10 has been found to enhance the viability of ovarian cancer cells and promote chemoresistance by activating the Rheb GTP/mTOR signaling pathway (PMID: 26319900). A study suggests that RGS10 mediates inflammation signaling regulation in SKOV-3 ovarian cancer cells, with high expression of TNF and COX-2 after RGS10 knockdown. In colorectal tumors, RGS10 transcription is regulated by DNA methylation and histone deacetylation (PMID: 35810565). RGS10 expression is also associated with poor prognosis in laryngeal cancer, hepatocellular carcinoma, and pediatric acute myeloid leukemia (PMID: 32776811, PMID: 26516143, PMID: 30538250).

      (2) The authors characterize RGS10 protein expression in the breast cancer cell lines MDA-MB-231, MCF7, and SKBR3 in vitro (Figure 2A). However, more information would strengthen the data, e.g., information on the expression of RGS10 protein and survival in public databases, as well as the correlation between RGS10 and Her-2 expression.

      Answer: Thanks for your comments. We have checked the correlation between RGS10 expression and the survival rate of Her-2 positive breast cancer patients in a public database. Although the difference is not statistically significant according to the p value, patients with high RGS10 expression tend to have a more favorable prognosis than patients with low RGS10 expression after the 100th month.

      Author response image 1.

      (3) Regarding the current situation of clinical trials in the RGS family, the potential to develop RGS10 for clinical translation is a driving factor for EMT.

      Answer: Thank you for your comments. The RGS (regulator of G protein signaling) gene family provides an important "braking" function for the receptor family of G protein-coupled receptors (GPCRs). GPCRs control hundreds of important functions in cells throughout the body and are the largest class of drug targets, with over one-third of FDA-approved drugs treating diseases by binding to GPCRs and altering their activity. When GPCRs are activated by hormones or neurotransmitters, they initiate signaling cascades within host cells through signal-carrying proteins called G proteins. The function of RGS proteins is to inactivate the G protein, thereby shutting down this signaling cascade, which limits G protein signal transduction and allows cells to reset and receive new incoming signals. Without RGS proteins, the signals triggered by GPCRs would inappropriately remain on, and signal transduction would become dysfunctional (PMID: 33007266). The potential to develop RGS10 as a driving factor of EMT is meaningful for clinical translation.

      (4) In Figure 3A, the paper showed that differential gene expression revealed 70 genes were significantly upregulated in RGS10-depleted SKBR3 cells. The authors didn't show any data on the expression of other EMT-related proteins in pathway analysis.

      Answer: Thank you for your comments. The enrichment analysis of RNA sequencing in RGS10-depleted SKBR3 cells identified factors that are highly correlated with EMT, such as TAGLN, TNFSF10, NDUFA4L2, CCN5, PHGDH, ST3GAL5, ANG, and LCN2.

      (5) In Figure 3B, the paper focuses on LCN2 in pathway analysis, however, the author did not elaborate on the significance of LCN2-related pathways in EMT.

      Answer: Thank you for your comments. Some studies have demonstrated the significance of LCN2-related pathways in EMT. It was confirmed that LCN2 upregulation triggered by PTEN insufficiency induces EMT to promote migration and invasion in MCF7 cells (PMID: 27466505). The activation of STAT3 contributes to an increase in LCN2 expression, which activates ERK pathway-dependent EMT, thus promoting lung metastasis of MDA-MB-231 breast cancer cells (PMID: 33473115). The silencing of LCN2 reduced the migration and invasion ability of SUM149 cells and the proportion of tumor stem cells, suggesting that LCN2 may mediate the invasion and metastasis of cancer cells by regulating the stemness of breast cancer cells. The biological effects of the LCN2 small-molecule inhibitors ZINC00640089 and ZINC00784494 on IBC cells have been confirmed. The siRNA-mediated silencing of LCN2 in IBC cells significantly reduces cell proliferation, viability, migration, and invasion (PMID: 34445288).

      (6) Minor: the author did not conduct a semi-quantitative analysis of the immunohistochemical results of RGS10.

      Answer: Thank you for your suggestion. We would like to demonstrate the qualitative analysis of RGS10 immunohistochemistry. The semi-quantitative analysis is not required in the paper.

      Reviewer #2 (Recommendations For The Authors):

      The role of RGS10 was well characterized in this study. However, some minor points need to be modified.

      (1) Page 15 line 296, description of cell proliferation was missing, please modify.

      Answer: Thank you for your comments. We have corrected the description of cell proliferation on Page 15 highlighted in red.

      (2) In Figure 2C, the title of the Y-axis was missing.

      Answer: Thank you for your comments. We have corrected the description of the Y-axis title in Figure 2C.

      (3) Describe the transfection reagent that was used in this study, and incorporate the description into the methods section.

      Answer: Thank you for your comments. We have added the description of the transfection reagent to the methods section.

      (4) The manuscript needs proofreading.

      Answer: Thank you for your comments. We have proofread the manuscript.

    2. Reviewer #2 (Public Review):

      Liu et al., by focusing on regulator of G protein signaling 10 (RGS10), reported that RGS10 expression was significantly lower in patients with breast cancer, compared with normal adjacent tissue. Genetic inhibition of RGS10 caused epithelial-mesenchymal transition and enhanced cell proliferation, migration, and invasion. These results suggest an inhibitory role of RGS10 in tumor metastasis. Furthermore, bioinformatic analyses determined signaling cascades for RGS10-mediated breast cancer distant metastasis. More importantly, both in vitro and in vivo studies evidenced that alteration of RGS10 expression by modulating its upstream regulator miR-539-5p affects breast cancer metastasis. Altogether, these findings provide insight into the pathogenesis of breast tumors and hence identify potential therapeutic targets in breast cancer.

      The conclusions of this study are mostly well supported by data.

    3. Reviewer #3 (Public Review):

      Distant metastasis is the major cause of death in patients with breast cancer. In this manuscript, Liu et al. show that RGS10 deficiency elicits distant metastasis via epithelial-mesenchymal transition in breast cancer. As a prognostic indicator of breast cancer, RGS10 regulates the progress of breast cancer and affects tumor phenotypes such as epithelial-mesenchymal transformation, invasion, and migration. The conclusions of this paper are mostly well supported by data.

    4. eLife assessment

      This valuable paper is the first to demonstrate that RGS10 is a biomarker for evaluating the prognosis of breast cancer. Preventing the loss of RGS10 could, in theory, provide a new strategy for the treatment of breast cancer. The evidence supporting the claims of the authors is solid, although inclusion of a larger number of patient samples and an animal model would have strengthened the study. The work will be of interest to clinicians working on breast cancer.

    5. Reviewer #1 (Public Review):

      The paper has shown that the expression of RGS10 is related to the molecular subtype, distant metastasis, and survival status of breast cancer. The study utilizes bioinformatic analyses, human tissue samples, and in vitro and in vivo experiments, which strengthen the data. RGS10 was validated to inhibit EMT through a novel mechanism dependent on LCN2 and miR-539-5p, thereby reducing cancer cell proliferation, colony formation, invasion, and migration. The study elaborated on the function of RGS10 in influencing prognosis and biological behavior, suggesting it could be considered a potential drug target in breast cancer.

    1. eLife assessment

      This study investigates the role of the Cadherin Flamingo (Fmi) in cell competition in developing tissues in Drosophila melanogaster. The findings are valuable in that they show that Fmi is required in winning cells in several competitive contexts. The evidence supporting the conclusions is solid, as the authors identify Fmi as a potential new regulator of cell competition; however, they do not delve into a mechanistic understanding of how this occurs.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind.

      The authors show that clonal loss of Fmi by an allele or by RNAi in the RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc over-expressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results. Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hs-FLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified, and thus neither is compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified, and the controls are missing (Myc over-expressing clones alone and Fmi clones alone). They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc over-expressing clones with and without Fmi. The pHH3 results support their conclusion that Myc over-expressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N). They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as showing that Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths:

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, and 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, some of the samples have few discs (e.g., for fmiE59 >> Myc only 5 discs and for fmiE59 vs >Myc only 4 discs are quantified, whereas other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      (4) There is a typo when referring to Figures 3C-D. It should be Figure 2C-D.

      (5) Figure 4 shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, for example by using an H99 deficiency (see above).

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development?

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

    3. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration, and all the points raised and claimed by the authors are very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells).

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      Strengths:

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that very specifically affects winner cells without any impact on losers, and can then completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term).

      Weaknesses:

      - The mechanistic understanding obviously remains quite limited at this stage, especially since the signaling does not go through the PCP pathway.