  1. Nov 2024
    1. Author response:

      Thank you for the positive and constructive feedback on our manuscript. We appreciate you highlighting the importance of our work advancing our understanding of the molecular etiology of acquired immunodeficiency syndrome (AIDS). To extend and further substantiate the observation that the CARD8 inflammasome is activated in response to viral protease during HIV-1 cell-to-cell transmission, we are in the process of completing additional experiments that are responsive to reviewer feedback, including:

      • Primary CD4+ T cell to monocyte-derived macrophage (MDM) transmission:  We have now repeated the cell-to-cell experiments with HIV-1 transfer from primary CD4+ T cells to primary monocyte-derived macrophages, and our findings are consistent with CARD8-dependent IL-1β release from HIV-1-infected macrophages in this more physiologic context. We are in the process of repeating these experiments with additional donors and will add these results to the revised manuscript.

      • Heterogeneity amongst blood donors: We have now repeated the cell-to-cell transfer and CARD8 knockout in MDMs with additional donors. While we continue to observe heterogeneity amongst donors, the key observation that CARD8 is required for inflammasome responses to HIV-1 infection is consistent. We note that some donors, including the one individual reported in the first submission, have markedly diminished CARD8 activity (to both HIV-1 and VbP).

      • Time course experiments: We did conduct a time course experiment when initially establishing these assays. We have now repeated these experiments with additional timepoints and in the presence or absence of the RT inhibitor nevirapine. The results of these experiments will be included in the revised manuscript.

      • The role of Gasdermin D: We are mostly interested in the release of IL-1β from the infected macrophages due to its potential contribution to myeloid-driven inflammation in PLWH. To date, there is no evidence that any pore-forming protein other than GSDMD can initiate IL-1β release (and pyroptosis) downstream of CARD8. Nonetheless, we will attempt this experiment with the Gasdermin D inhibitor, disulfiram.

      We believe these and other experiments will further support the importance of the CARD8 inflammasome in myeloid-driven inflammation in PLWH and look forward to submitting the revision.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors test whether the archerfish can modulate the fast response to a falling target.

      We have not tested whether archerfish can 'modulate the fast response'. We quantitatively test specific hypotheses on the rules used by the fish. For this, the accuracy of the decisions is analyzed with respect to specific points that can be calculated precisely in each experiment. The ill-defined term 'modulate' in no way captures what is done here. This assessment might explain the question, raised by the reviewer, of 'what is the difference of this study and Reinel, 2016' (i.e. Reinel and Schuster, 2016). In that study, all objects were strictly falling ballistically, and latency and accuracy of the turn decisions were determined when the initial motion was not only horizontal but had an additional vertical component of speed. The question of that study was whether the need to account for an additional variable (vertical speed) in the decision would affect its latency or accuracy. The study showed that archerfish then also rapidly turn to the later impact point. It also showed that accuracy and latency (defined in that study exactly as in the present study) were not changed by the added degree of freedom. This is a completely different question and by its very nature does not leave the realm of ballistics.

      By manipulating the trajectory of the target, they claim that the fish can modulate the fast response.

      While it is clear from the result that the fish can modulate the fast response, the experimental support for the argument that the fish can do it for a reflex-like behavior is inadequate. 

      This is disturbing: The manuscript is full of data that directly report response latency (a parameter that's critical in all experiments) and there are even graphical displays of the distribution of latency (Figs. 2, 5). How fast the responses are can also already be seen in the first video. Most importantly, the nature of the 40 ms limit was discovered and reported by our group in 2008 (Schlegel and Schuster, 2008, Fig. 4). For easy reference, we attach Schlegel and Schuster, 2008 with the relevant passages marked in yellow. But later studies also using high speed video (i.e. typically 500 fps) and simultaneously evaluating accuracy and kinematics (in the same ways as used here!) to address various questions repeatedly report and even graphically represent minimum latencies of 40 ms, e.g. Krupczynski and Schuster, 2013 (e.g. Fig. 2); Reinel and Schuster, 2014; Reinel and Schuster, 2016; Reinel and Schuster, 2018a, b (e.g. see Fig. 7 in the first part) and report how latency is increased as urgency is decreased (if the fish are too close or time of falling is increased), as temperature is decreased or as viewing conditions and their homogeneity across the tank change. Moreover, even a field study is available (Rischawy, Blum and Schuster, 2015) that shows why the speed is needed. This is because of massive competition, with at least some of the competitor fish also being able to turn to the later impact point. So, speed is an absolute necessity if competitors are around. Interestingly, when the fish are isolated, latency goes up and eventually the fish will no longer respond with C-starts (Schlegel and Schuster, 2008).

      Another aspect: considering the introduction, it would not even have mattered if not 40 ms but instead 150 ms were the time needed for an accurate start (which is not the case). That would still be faster than an Olympic sprinter responds to a gun shot. Moreover, please also note that we carefully talk of reflex-speed, not of a reflex-behavior (which is as easy to verify as any other of the false statements made).

      Strengths: 

      Overall, the question that the authors raised in the manuscript is interesting. 

      Given the statement of no difference between the present study and Reinel and Schuster, 2016, it is not clear what this assessment refers to.

      Weaknesses: 

      (1) The argument that the fish can modulate reflex-like behavior relies on the claim that the archerfish makes the decision in 40 ms. There is little support for the 40 ms reaction time.

      The 'little support' is a paper in Science in which this important aspect is directly analyzed (Fig. 4 of that paper) and that has been praised by folks like Yadin Dudai (e.g. in Faculty 1000). The support is also data on latency as presented in the present paper. Furthermore, additional publications are available on the reaction time (see above).

      The reaction time for the same behavior in Schlegel 2008, is 60-70 ms, and in Tsvilling 2012 about 75 ms, if we take the half height of the maximum as the estimated reaction time in both cases. If we take the peak (or average) of the distribution as an estimation of reaction time, the reaction time is even longer. This number is critical for the analysis the authors perform since if the reaction time is longer, maybe this is not a reflex as claimed.

      See above.

      In addition, mentioning the 40 ms in the abstract is overselling the result.

      See above.

      Just for completeness: Considering a very interesting point raised by reviewer 2 we add an additional panel to further emphasize the exciting point that accuracy and latency are unrelated in the start decisions. That point was already made in Fig.4 of the paper in Science but can be directly addressed.  

      The title is also not supported by the results. 

      No: the title is clearly supported by the results that are reported in the paper.

      (2) A critical technical issue of the stimulus delivery is not clear.

      The stimulus delivery is described in detail. Most importantly we emphasize (even mentioning frame rate) that all VR setups require experimental confirmation that they work for the species and for the behavior at hand. Ideally, they should elicit the same behavior (in all aspects) as a real stimulus does that the VR approach intends to mimic. Whether VR works in a given animal and for the behavior at hand in that animal cannot be known or postulated a priori. It must be shown in direct critical experiments. Such experiments and the need to perform them are described in detail in Figure 2 and in the text that is associated with that figure.

      The frame rate is 120 FPS and the target horizontal speed can be up to 1.775 m/s. This produces a target jumping on the screen 15 mm in each frame. This is not a continuous motion. Thus, the similarity between the natural system where the target experiences ballistic trajectory and the experiment here is not clear. Ideally, another type of stimulus delivery system is needed for a project of this kind that requires fast-moving targets (e.g. Reiser, J. Neurosci.Meth. 2008).
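      The per-frame displacement quoted in the reviewer's comment follows from dividing the target speed by the frame rate. A minimal sketch of that arithmetic, using only the two figures stated above (the function name is illustrative, not taken from the manuscript):

```python
def step_per_frame_mm(speed_m_per_s: float, frames_per_s: float) -> float:
    """Distance the target moves between two consecutive frames, in mm."""
    return speed_m_per_s / frames_per_s * 1000.0


# Figures quoted in the review: 1.775 m/s at 120 frames per second.
step = step_per_frame_mm(1.775, 120)
print(f"{step:.1f} mm per frame")  # prints "14.8 mm per frame"
```

      The result rounds to the roughly 15 mm per frame mentioned in the comment.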

      See above. It is quite funny that one of the authors of the present study had been involved in developing a setup with a complete panorama of 6000 LEDs (Strauss, Schuster and Götz, 1997; and appropriately cited in Reiser) that has been the basis for Reiser. This panorama was also used to successfully implement VR in freely walking Drosophila (Schuster et al., Curr. Biol., 2002). However, an LED based approach was abandoned because of insufficient spatial resolution (that, in archerfish, is very different from that of Drosophila).

      But the crucial point really is this: Just looking at Figure 2 shows that our approach could not have worked better in any way - it provided the input needed to cause turn decisions that are in all aspects just as those with real objects. Achieving this was not at all trivial and required enormous effort and many failed attempts. But it allows addressing our questions for the first time after 20 years of studying these interesting decisions.

      In addition, the screen is rectangular and not circular, so in some directions, the target vanishes earlier than others. It must produce a bias in the fish response but there is no analysis of this type. 

      Why 'must' it produce a bias? Is it not conceivable that you can only use a circular part of the screen? Briefly, and as could have been checked by quickly looking into the methods section, this is what we did. But still, why would it have mattered in our strictly randomized design? It could have mattered only in a completely silly way of running the experiments in which exclusively long trajectories are shown in one condition and exclusively short ones in another.

      (3) The results here rely on the ability to measure the error of response in the case of a virtual experiment. It is not clear how this is done since the virtual target does not fall.

      Well, of course it does not fall!!! That is the whole point that enables the study, and this is explained in connection with the glass plate experiment of Fig. 1 and quite some text is devoted to say that this is the starting point for the present analysis. The ballistic impact point is calculated (just as explained in our very first paper on the start decisions, Rossel, Corlija and Schuster, 2002) from the initial speed and height of the target, using simple high-school physics and the justification for that is also in that paper. This has been done already more than 20 years ago. How else could that paper have arrived at the conclusion that the fish turned to the virtual impact point even though nothing is falling? We also describe this for the readers of the present study, illustrate how accuracy is determined in Figures, in all videos and in an additional Supplementary Figure. Consulting the paper reveals that orientation of the fish is determined immediately at the end of stage 2 of its C-start and the error directly reports how close continuing in that direction would lead the fish to the (real or virtual) impact point. This measure has also been used since the first paper in 2002 in our lab and it is very useful because it provides an invariant measure that allows pooling all the different conditions (orientation and position of responding fish as well as direction, speed and height of target).
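      The 'simple high-school physics' referred to above can be sketched as follows. This is a hedged illustration, not the analysis code used in the study: it assumes drag-free free fall, a flat water surface, and an error defined as the signed perpendicular distance between the impact point and the line along the fish's heading at the end of stage 2. All function names and the numbers in the usage lines are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def fall_time(height_m, vertical_speed=0.0, g=G):
    """Time until an object starting at height_m reaches the surface.

    Positive root of height + vz*t - 0.5*g*t**2 = 0 (vz > 0 means upward).
    """
    return (vertical_speed + math.sqrt(vertical_speed**2 + 2.0 * g * height_m)) / g


def impact_point(start_xy, horizontal_velocity_xy, height_m, vertical_speed=0.0):
    """Ballistic landing point on the surface, projected from initial motion."""
    t = fall_time(height_m, vertical_speed)
    return (start_xy[0] + horizontal_velocity_xy[0] * t,
            start_xy[1] + horizontal_velocity_xy[1] * t)


def aim_error(fish_xy, heading_rad, target_xy):
    """Signed perpendicular distance from target_xy to the fish's heading line.

    Zero means that continuing straight along the heading passes through
    the target point; the sign tells on which side the line misses.
    """
    dx = target_xy[0] - fish_xy[0]
    dy = target_xy[1] - fish_xy[1]
    return math.cos(heading_rad) * dy - math.sin(heading_rad) * dx


# Hypothetical trial: target 0.3 m above the surface moving at 1.5 m/s along x.
landing = impact_point((0.0, 0.0), (1.5, 0.0), 0.3)
error = aim_error((0.2, -0.3), math.radians(60.0), landing)
```

      Because the same error can be computed against any reference point (a ballistic point, a deflected point, a landmark), one measure serves all conditions, which is the pooling property described above.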

      How do the authors validate that the fish indeed perceives the virtual target as the falling target?

      See above. The fish produce C-starts (whose kinematics are analyzed and reported in Figures), whose latency is measured (from onset of target motion to onset of C-start) and whose accuracy in aligning them to the calculated virtual impact point is measured (see above). Additionally, the errors are also analyzed to other points of interest, for instance landmarks, the ballistic landing point in the re-trained fish or points calculated on the basis of specific hypotheses in the generalization experiments.

      Since the deflection is at a later stage of the virtual trajectory, it is not clear what is the actual physics that governs the world of the experiment.

      As explained in the text, what we need is to substitute the ballistic connection with another fixed relation between initial target motion and the landing point. This other relation needs to produce a large error in the aims when they remain based on the ballistic virtual landing point. It is directly shown in the key experiments that the fish need not see the deflection but can respond appropriately to the initial motion after training (Figs. 3, 5 and corresponding paragraphs in the text as well as additional movies). Please also note that after training the decision is based on the initial movement. This is shown in the interspersed experiments in which nothing other than the initial (pre-deflection) movement was shown.

      Overall, the experimental setup is not well designed. 

      It is obviously designed well enough to mimic the natural situation in every aspect needed (see Fig. 2) and well enough to answer the questions we have asked.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript studies prey capture by archer fish, which observe the initial values of motion of aerial prey they made fall by spitting on them, and then rapidly turn to reach the ballistic landing point on the water surface. The question raised by the article is whether this incredibly fast decision-making process is hardwired and thus unmodifiable or can be adjusted by experience to follow a new rule, namely that the landing point is deflected by a certain amount from the expected ballistic landing point. The results show that the fish learn the new rule and use it afterward in a variety of novel situations that include height, side, and speed of the prey, and which preserve the speed of the fish's decision. Moreover, a remarkable finding presented in this work is the fact that fish that have learned to use the new rule can relearn to use the ballistic landing point for an object based on its shape (a triangle) while keeping simultaneously the 'deflected rule' for an object differing in shape (a disc); in other words, fish can master simultaneously two decision-making rules based on the different shape of objects.

      Strengths: 

      The manuscript relies on a sophisticated and clever experimental design that allows changing the apparent landing point of a virtual prey using a virtual reality system. Several robust controls are provided to demonstrate the reliability and usefulness of the experimental setup. 

      Overall, I very much like the idea conveyed by the authors that even stimuli triggering apparently hardwired responses can be relearned in order to be associated with a different response, thus showing the impressive flexibility of circuits that are sometimes considered mediating pure reflexive responses.

      Thank you so much for this precise assessment of what we have shown!

      This is the case - as an additional example - of the main component of the Nasonov pheromone of bees (geraniol), which triggers immediate reflexive attraction and appetitive responses, and which can, nevertheless, be learned by bees in association with an electric shock so that bees end up exhibiting avoidance and the aversive response of sting extension to this odorant (1), which is a fully unnatural situation, and which shows that associative aversive learning is strong enough to override preprogrammed responding, thus reflecting an impressive behavioral flexibility.

      That's very interesting, thanks.

      Weaknesses: 

      As a general remark, there is some information that I missed and that is mandatory in the analysis of behavioral changes. 

      Firstly, the variability in the performances displayed. The authors mentioned that the results reported come from 6 fish (which is a low sample size). How were the individual performances in terms of consistency? Were all fish equally good in adjusting/learning the new rule? How did errors vary according to individual identity? It seems to me that this kind of information should be available as the authors reported that individual fish could be recognized and tracked (see lines 620-635) and is essential for appreciating the flexibility of the system under study. 

      Secondly, the speed of the learning process is not properly explained. Admittedly, fish learn in an impressive way the new rule and even two rules simultaneously; yet, how long did they need to achieve this? In the article, Figure 2 mentions that at least 6 training stages (each defined as a block of 60 evaluated turn decisions, which actually shows that the standard term 'Training Block' would be more appropriate) were required for the fish to learn the 'deflected rule'. While this means 360 trials (turning starts), I was left with the question of how long this process lasted. How many hours, days, and weeks were needed for the fish to learn? And as mentioned above, were all fish equally fast in learning? I would appreciate explaining this very important point because learning dynamics is relevant to understanding the flexibility of the system. 

      First, it is very important to keep the question in mind that we wanted to clarify: Does the system have the potential to re-tune the decisions to other non-ballistic relations between the input variables and the output? This would have been established if one fish was found capable of doing that. However, we do have sufficient evidence to say that all six fish learned the new law and that at least one (actually four) individual was capable of simultaneously handling the two laws. We will explain this much better (hopefully) in our revised version. We also have to stress that not all archerfish might actually be able to do this and that not all archerfish might learn in the same way, at the same speed, or using the same strategies. These questions are extremely interesting and we therefore definitely will include all evidence that we have. If some individuals are better than others in quickly adjusting, then even observational learning could become a part of the story. However, we needed to make and document the first steps. Understanding these is essential and apparently is difficult enough.

      Reference: 

      (1) Roussel, E., Padie, S. & Giurfa, M. Aversive learning overcomes appetitive innate responding in honeybees. Anim Cogn 15, 135-141, doi:10.1007/s10071-011-0426-1 (2012). 

      Thanks for this reference!

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      This is an interesting manuscript tackling the issue of whether subcircuits of the cerebellum are differentially involved in processes of motor performance, learning, or learning consolidation. The authors focus on cerebellar outputs to the ventrolateral thalamus (VL) and to the centrolateral thalamus (CL), since these thalamic nuclei project to the motor cortex and striatum respectively, and thus might be expected to participate in diverse components of motor control and learning. In mice challenged with an accelerating rotarod, the investigators reduce cerebellar output either broadly, or in projection-specific populations, with CNO targeting DREADD-expressing neurons. They first establish that there are not major control deficits with the treatment regime, finding no differences in basic locomotor behavior, grid test, and fixed-speed rotarod. This is interpreted to allow them to differentiate control from learning, and their inter-relationships. These manipulations are coupled with chronic electrophysiological recordings targeted to the cerebellar nuclei (CN) to control for the efficacy of the CNO manipulation. I found the manuscript intriguing, offering much food for thought, and am confident that it will influence further work on motor learning consolidation. The issue of motor consolidation supported by the cerebellum is timely and interesting, and the claims are novel. There are some limitations to the data presentation and claims, highlighted below, which, if amended, would improve the manuscript.

      We thank the reviewer for the positive comments and insightful critiques.

      (1.1) Statistical analyses: There is too little information provided about how the Deming regressions, mean points, slopes, and intercepts were compared across conditions. This is important since in the heart of the study when the effects of inactivating CL- vs VL- projecting neurons are being compared to control performance, these statistical methods become paramount. Details of these comparisons and their assumptions should be added to the Methods section. As it stands I barely see information about these tests, and only in the figure legends. I would also like the authors to describe whether there is a criterion for significance in a given correlation to be then compared to another. If I have a weak correlation for a regression model that is non-significant, I would not want to 'compare' that regression to another one since it is already a weak model. The authors should comment on the inclusion criteria for using statistics on regression models.

      Currently the Methods indeed explain that groups are compared by testing differences of distributions of residuals of treatment and control groups around the Deming regression of the control groups: “To test if treatments altered the relationship between initial performance vs learning or daily vs overnight learning, we compared the distribution of signed distance to the control Deming regression line between groups.” But this shall indeed be explained in more detail.

      The performance on a given day depends on a cumulative process, so that the average measure of performance is not fully informative on what is learned or what is changed by a treatment (this is further explained in the text p9-10). The challenge is to deal with the multivariate relationships where initial performance, daily learning, and consolidated learning are interdependent. While in control groups these quantities show linear relationships, this is far less the case in treatment groups; this may indeed be due to the variability of the effect of the treatment (efficacy of viral injections), which adds to the intrinsic variability in the absence of treatment.

      To assess whether these relationships shift following treatments, we examine the extent to which treatment points in bivariate comparisons (initial performance x daily learning, daily learning x consolidated learning) are evenly distributed around the control group regression line. We take the presence of a significant difference in the distribution of residuals between the control and treatment group as an indication that the process represented in that comparison is disrupted by the treatment: e.g. if the residuals of the treatment group are lower than those of the control group in the initial performance x daily learning comparison, it indicates that learning is slower (or larger). If the residuals of the treatment group are lower than those of the control group in the daily learning x consolidated learning comparison, it indicates that consolidation is lower. This shall be clarified in a revised version.
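      The comparison described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the analysis code of the study: it assumes an error-variance ratio of 1 (orthogonal Deming regression) and uses a permutation test on group means as a stand-in for whichever test is applied to the residual distributions, which this response does not specify. All function names are hypothetical.

```python
import math
import random
from statistics import mean


def deming_fit(x, y, delta=1.0):
    """Deming regression of y on x; delta is the ratio of y- to x-error variance.

    Returns (slope, intercept). Undefined when the covariance of x and y is zero.
    """
    mx, my, n = mean(x), mean(y), len(x)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)
    return slope, my - slope * mx


def signed_distances(x, y, slope, intercept):
    """Signed perpendicular distance of each point to the line (delta = 1 geometry)."""
    norm = math.sqrt(1.0 + slope * slope)
    return [(yi - (intercept + slope * xi)) / norm for xi, yi in zip(x, y)]


def permutation_p(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

      In use, the line would be fitted on the control animals only, `signed_distances` would then be evaluated for control and treatment animals against that same line, and the two sets of distances would be passed to the test.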

      (1.2a) The introduction makes the claim that the cerebellar feedback to the forebrain and cortex are functionally segregated. I interpreted this to mean that the cerebellar output neurons are known to project to either VL or CL exclusively (i.e. they do not collateralize). I was unaware of this knowledge and could find no support for the claim in the references provided (Proville 2014; Hintzen 2018; Bostan 2013). Either I am confused as to the authors' meaning or the claim is inaccurate. This point is broader however than some confusion about citation.

      The references are not cited in the context of collaterals: “They [basal ganglia and cerebellum] send projections back to the cortex via anatomically and functionally segregated channels, which are relayed by predominantly non-overlapping thalamic regions (Bostan, Dum et al. 2013, Proville, Spolidoro et al. 2014, Hintzen, Pelzer et al. 2018).” Indeed, the thalamic compartments targeted by the basal ganglia and cerebellum are distinct, and in Proville et al. 2014 we showed some functional segregation of the cerebello-cortical projections (whisker vs orofacial ascending projections). We do not claim that there is a full segregation of the two pathways; there is indeed some known degree of collateralization (see below).

      (1.2b) The study assumes that the CN-CL population and CN-VL population are distinct cells, but to my knowledge, this has not been established. It is difficult to make sense of the data if they are entirely the same populations, unless projection topography differs, but in any event, it is critical to clarify this point: are these different cell types from the nuclei?; how has that been rigorously established?; is there overlap? No overlap? Etc. Results should be interpreted in light of the level of this knowledge of the anatomy in the mouse or rat.

      The study does not assume that CL-projecting and VAL-projecting neurons are entirely separate populations (it is in fact known that there is an overlap), but states that inhibition of neurons following retrograde infections from the CL and VAL does not produce identical results.

      There is indeed a paragraph devoted to the discussion of this point (middle paragraph p20). “Interestingly, both Dentate and Interposed nuclei contain some neurons with collaterals in both VAL and CL thalamic structures (Aumann and Horne 1996, Sakayori, Kato et al. 2019), suggesting that the effect on learning could be mediated by a combined action on the learning process in the striatum (via the CL thalamus) and in the cortex (via the VAL thalamus). However, consistent with (Sakayori, Kato et al. 2019), we found that the manipulations of cerebellar neurons retrogradely targeted either from the CL or from the VAL produced different effects in the task. This indicates that either the distinct functional roles of VAL-projecting of CL-projecting neurons reported in our study is carried by a subset of pathway-specific neurons without collaterals, or that our retrograde infections in VAL and CL preferentially targeted different cerebello-thalamic populations even if these populations had axon terminals in both thalamic regions.” In other words, we actually know from the literature that there is a degree of collateralization (CN neurons projecting to both VAL and CL, see refs cited above), but as the reviewer says, it does not seem logically possible that the exact same population would have different effects, which are very distinct during the first learning days. The only possible explanation is that the CN-CL and CN-VAL retrograde infections recruit somewhat different populations of neurons. This could be due to differences in density of collaterals in CL and VAL of neurons with collaterals in both regions, or presence of CL-projecting neurons without collaterals in VAL, and VAL-projecting neurons without collaterals in CL in addition to the (established) population of neurons with collaterals in both regions. The lesional approach of CN-thalamus neurons in Sakayori et al. 2019 also observed separate effects for CL and VL injections consistent with the differential recruitment of CN populations by retrograde infections.

      This should be improved in a revised version of the manuscript.

      (1.3) It is commendable that the authors perform electrophysiology to validate DREADD/CNO. So many investigators don't bother and I really appreciate these data. Would the authors please show the 'wash' in Figure 1a, so that we can see the recovery of the spiking hash after CNO is cleared from the system? This would provide confidence that the signal is not disappearing for reasons of electrode instability or tissue damage/ other.

      We do not have the wash data on the same day, but there is no significant change in the baseline firing rate across recording days.

      (1.4) I don't think that the "Learning" and "Maintenance" terminology is very helpful and in fact may sow confusion. I would recommend that the authors use a day range "Days 1-3 vs 4-7" or similar, to refer to these epochs. The terminology chosen begs for careful validation, definitions, etc, and seems unlikely to be uniform across all animals, thus it seems more appropriate to just report it straight, defining the epochs by day. Such original terminology could still be used in the Discussion, with appropriate caveats.

      This shall indeed be corrected in a revised version.

      (1.5) Minor, but, on the top of page 14 in the Results, the text states, "Suggesting the presence of a 'critical period' in the consolidation of the task". I think this is a non-standard use of 'critical period' and should be removed. If kept, the authors must define what they mean specifically and provide sufficient additional analyses to support the idea. As it stands, the point will sow confusion.

      This shall indeed be corrected in a revised version.

      Reviewer #2 (Public review):

      Summary:

      This study examines the contribution of cerebello-thalamic pathways to motor skill learning and consolidation in an accelerating rotarod task. The authors use chemogenetic silencing to manipulate the activity of cerebellar nuclei neurons projecting to two thalamic subregions that target the motor cortex and striatum. By silencing these pathways during different phases of task acquisition (during the task vs after the task), the authors report valuable findings of the involvement of these cerebellar pathways in learning and consolidation.

      Strengths:

      The experiments are well-executed. The authors perform multiple controls and careful analysis to solidly rule out any gross motor deficits caused by their cerebellar nuclei manipulation. The finding that cerebellar projections to the thalamus are required for learning and execution of the accelerating rotarod task adds to a growing body of literature on the interactions between the cerebellum, motor cortex, and basal ganglia during motor learning. The finding that silencing the cerebellar nuclei after a task impairs the consolidation of the learned skill is interesting.

      We thank the reviewer for the positive comments and insightful critiques below.

      Weaknesses:

      (2.1) While the controls for a lack of gross motor deficit are solid, the data seem to show some motor execution deficit when cerebellar nuclei are silenced during task performance. This deficit could potentially impact learning when cerebellar nuclei are silenced during task acquisition.

      One of our key controls is the test of the treatment on the fixed-speed rotarod, which provides the conditions closest to those of the accelerating rotarod (the main difference between the protocols being the slow, steady acceleration of rod rotation [+0.12 rpm per s] in the accelerating version).

      In the CN experiments, we found clear deficits in learning and consolidation while there was no effect on the fixed-speed rotarod (the performance of the DREADD-CNO group is even slightly better than that of some control groups), consistent with a separation of the effects on learning/consolidation from those on locomotion on a rotarod. Small but measurable deficits are, however, found at the highest speed of the fixed-speed rotarod in the CN-VAL group. There was no significant effect in the CN-CL group, even though the CN-CL group actually shows lower performance from the second day of learning; we believe this supports our claim that CN-CL inhibition affected the learning process more than motor coordination. In contrast, the CN-VAL group only showed significantly lower performance on day 4 of the accelerating rotarod, consistent with intact learning abilities. Of note, under CNO, CN-VAL mice could stay for more than a minute and a half at 20 rpm, while on average they fell from the accelerating rotarod as soon as it reached a speed of ~19 rpm (130 s).
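
      The correspondence between latency to fall and rod speed used in this argument can be checked with a short sketch. It assumes a strictly linear ramp at the quoted +0.12 rpm per s; the starting speed is not stated in the response, so it is inferred here from the quoted pair (130 s, ~19 rpm). The function names are hypothetical and only illustrate the arithmetic.

```python
ACCEL_RPM_PER_S = 0.12  # acceleration quoted in the response


def implied_start_speed(latency_s, speed_rpm, accel=ACCEL_RPM_PER_S):
    """Starting speed implied by one (latency, speed) pair on a linear ramp.

    Assumption: the rod accelerates linearly from t = 0.
    """
    return speed_rpm - accel * latency_s


def speed_at_fall(latency_s, start_rpm, accel=ACCEL_RPM_PER_S):
    """Rod speed (rpm) at the moment of the fall for a given latency (s)."""
    return start_rpm + accel * latency_s


# Infer the starting speed from the quoted "~19 rpm (130 s)".
start = implied_start_speed(130, 19)

# Speed reached at a latency of ~100 s, the value quoted for CN-CL mice on day 2.
print(round(speed_at_fall(100, start), 1))
```

      Under these assumptions a latency of ~100 s corresponds to roughly 15 rpm, in line with the figure quoted in the response to point 3.1.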

      The text currently states “The inhibition of CN-VAL neurons during the task also yielded lower levels of performance in the Maintenance stage,[[NB: day 5-7]] suggesting that these neurons contribute also to learning and retrieval of motor skills, although the mild defect in fixed speed rotarod could indicate the presence of a locomotor deficit, only visible at high speed.” Following the reviewers’ comment, we will nevertheless modify this sentence in the revised version of the MS to state that we cannot fully disambiguate execution effects from learning/retrieval effects at high speed for these mice.

      (2.2a) Separately, I find the support for two separate cerebello-thalamic pathways incomplete. The data presented do not clearly show the two pathways are anatomically parallel.

      As explained above (point 1.2a), it is already known that these pathways overlap to some degree (Discussion, p. 20), yet targeting them differentially affects behavior, consistent with separate contributions. A similar finding was obtained with a lesion-based (irreversible) approach in Sakayori et al. 2019.

      (2.2b) The difference in behavioral deficits caused by manipulating these pathways also appears subtle.

      While we agree that after 3-4 days of learning the difference in performance between the groups becomes elusive, we respectfully disagree with the reviewer that these differences are negligible in the early stages: the effects of inhibition on "learning rate" (i.e., the amount of learning for a given daily initial performance) and on consolidation (i.e., the overnight retention of the daily gain in performance) exhibit different profiles for the two groups (Fig. 3h vs 3k).
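
      To make the two quantities concrete, here is a minimal sketch of one possible way to compute a within-day gain and an overnight retention from per-trial latencies to fall. The formulas, function names and latency values are assumptions chosen for illustration; the exact definitions used for Fig. 3h and 3k are those of the manuscript, not this sketch.

```python
def within_day_gain(day_latencies):
    """Gain in latency to fall (s) between the first and last trial of a day."""
    return day_latencies[-1] - day_latencies[0]


def overnight_retention(day_latencies, next_day_latencies):
    """Fraction of a day's gain still present on the first trial of the next day.

    Assumed definition: (first trial next day - first trial today) / today's gain.
    Returns None when there is no positive gain to retain.
    """
    gain = within_day_gain(day_latencies)
    if gain <= 0:
        return None
    return (next_day_latencies[0] - day_latencies[0]) / gain


# Hypothetical latencies (s) for one mouse; not data from the study.
day1 = [40.0, 60.0, 80.0]
day2 = [70.0, 90.0, 100.0]

print(within_day_gain(day1))            # gain accumulated during day 1
print(overnight_retention(day1, day2))  # share of that gain kept overnight
```

      With such a decomposition, two groups can reach similar late performance while differing in how much they gain within a day and how much of that gain survives overnight.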

      Reviewer #3 (Public review)

      Summary:

      Varani et al present important findings regarding the role of distinct cerebellothalamic connections in motor learning and performance. Their key findings are that:

      (1) cerebellothalamic connections are important for learning motor skills

      (2) cerebellar efferents specifically to the central lateral (CL) thalamus are important for short-term learning

      (3) cerebellar efferents specifically to the ventral anterior lateral (VAL) complex are important for offline consolidation of learned skills, and

      (4) that once a skill is acquired, cerebellothalamic connections become important for online task performance.

      The authors went to great lengths to separate effects on motor performance from learning, for the most part successfully. While one could argue about some of the specifics, there is little doubt that the CN-CL and CN-VAL pathways play distinct roles in motor learning and performance. An important next step will be to dissect the downstream mechanisms by which these cerebellothalamic pathways mediate motor learning and adaptation.

      Strengths:

      (1) The dissociation between online learning through CN-CL and offline consolidation through CN-VAL is convincing.

      (2) The ability to tease learning apart from performance using their titrated chemogenetic approach is impressive. In particular, their use of multiple motor assays to demonstrate preserved motor function and balance is an important control.

      (3) The evidence supporting the main claims is convincing, with multiple replications of the findings and appropriate controls.

      We thank the reviewer for the positive comments and the insightful critiques below.

      Weaknesses:

      (3.1) Despite the care the authors took to demonstrate that their chemogenetic approach does not impair online performance, there is a trend towards impaired rotarod performance at higher speeds in Supplementary Figure 4f, suggesting that there could be subtle changes in motor performance below the level of detection of their assays.

      This is also discussed in point 2.1 above. In our view, the fixed-speed rotarod is a control very close to the accelerating rotarod condition, with very similar requirements between the two tasks (yet it is unfortunately rarely tested in accelerating rotarod studies). We do not exclude the presence of motor deficits, but our main argument is that they do not suffice to explain the differences observed in the accelerating rotarod. No detectable deficit was found in the CN group, while very clear deficits in learning/consolidation were observed. A mild deficit is significant only in the CN-VAL group, whereas the deficit in the fixed-speed rotarod is not significant for the CN-CL group, which shows the strongest deficit in the accelerating rotarod during the first days: e.g., on day 2, the CN-CL group is already below the control group, with latencies to fall of ~100 s (corresponding to an immediate fall at ~15 rpm), while the fixed-speed rotarod performance at 15 rpm of the control and CNO-treated groups shows an ability to stay more than 1 min at this speed. The text will be improved to clarify this point.

      (3.2) There is likely some overlap between CN neurons projecting to VAL and CL, somewhat limiting the specificity of their conclusions.

      There is indeed published evidence for some degree of anatomical overlap, but also for differential contributions of CN-VAL and CN-CL to the task. Our answer to this point is developed in points 1.2a and 2.2a above. Although this point was raised in the Discussion (p. 20), the text will be improved in the revised version of the MS to clarify our statement.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors successfully detected distinct mechanisms signalling prediction violations in the auditory cortex of mice. For this purpose, an auditory pure-tone local-global paradigm was presented to awake and anaesthetised mice. In awake rodents, the authors also evaluated interneuron cell types involved in responses to the interruption of the regularity imposed by local-global sequences. By performing two-photon calcium imaging and single-unit electrophysiology, the authors disentangled three phenomena underlying responses to violations of the distinct local-global regularity levels: stimulus-specific adaptation, surprise and surprise adaptation. Both stimulus-specific adaptation and surprise- or deviant-evoked responses are observable under anaesthesia. Altogether, this work advances our understanding of distinct predictive processes computing prediction violations upon the complexity of the regularity imposed by the auditory sequence.

      Strengths:

      It is an elegant study, beautifully executed.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      Reviewer #2 (Public review):

      Summary:

      Oddball responses are increases in sensory responses when a stimulus is encountered in an unexpected location in a sequence of predictable stimuli. There are two computational interpretations for these responses: stimulus-specific adaptation and prediction errors. In recent years, evidence has accumulated that a significant part of these sequence violation responses cannot be explained simply by stimulus-specific adaptation. The current work elegantly adds to this evidence by using a sequence paradigm based on two levels of sequence violations: "local" sequence violations of repetitions of identical stimuli, and "global" sequence violations of stimulus sequence patterns. The authors demonstrate that both local and global sequence violation responses are found in L2/3 neurons of the mouse auditory cortex. Using sequences with different inter-stimulus intervals, they further demonstrate that these sequence violation responses cannot be explained by stimulus-specific adaptation.

      Strengths:

      The work is based on a very clever use of a sequence violation paradigm (local-global paradigm) and provides convincing evidence for the interpretation that there are at least two types of sequence violation responses and that these cannot be explained by stimulus-specific adaptation. Most of the conclusions are based on a large dataset, and are compelling.

      Weaknesses:

      The final part of the paper focuses on the responses of VIP and PV-positive interneurons. The responses of VIP interneurons appear somewhat variable and difficult to interpret (e.g. VIP neurons exhibit omission responses in the A block, but not the B block). The conclusions based on these data appear less solid.

      We agree with the referee that the response modulations observed in VIP and PV-positive interneurons are weak and variable. This is indicated in the manuscript. Larger-scale recordings are probably necessary to fully ascertain the presence and distribution of omission responses.

      Reviewer #3 (Public review):

      Summary:

      In their manuscript entitled "Parallel mechanisms signal a hierarchy of sequence structure violations in the auditory cortex", Jamali et al. provide evidence for cellular-level mechanisms in the auditory cortex of mice for the encoding of predictive information on different temporal and contextual scales. The study design separates more clearly than previous studies between the effects of local and global deviants and separates their respective effects on the neuronal responses clearly through the use of various contextual conditions and short and long time scales. Further, it identifies a contribution from a small set of VIP interneurons to the detection of omitted sounds, and shows the influence of isoflurane anesthesia on the neural responses.

      Strengths:

      (1) The study provides a rather encompassing set of experimental techniques to study the cellular level responses, using two complementary recording techniques in the same animal and similar cortical location.

      (2) Comparisons between awake and anesthetized states are conducted in the same animals, which allows for a rather direct comparison of populations under different conditions, thus reducing sampling variability.

      (3) The set of paradigms is well developed and specifically chosen to provide appropriate and meaningful controls/comparisons, which were missing from previous studies.

      (4) The addition of cell-type specific recordings is valuable and in particular in combination with the contrast of awake and anesthetized animals provides novel insights into the cellular level representation of deviant signals, such as surprise, prediction error, and general adaptation.

      (5) The analysis and presentation of the data are clear and quite complete, yet remain succinct and perform insightful contrasts.

      (6) The study will have an impact on multiple levels, as it introduces important variations in the paradigm and analytical contrasts that both human and animal researchers can pick up and improve their studies. The cell-type-specific results are particularly intriguing, although these would likely require replication before being completely reliable. Further, the study provides a substantial and diverse dataset that others can explore.

      Weaknesses:

      (1) The responses from cells recorded via Neuropixel and 2p differ qualitatively, as noted by the authors, with NP-recorded cells showing much more inhibited/reduced responses between acoustic stimulations. The authors briefly qualify these differences as potentially indicating a sampling issue; however, this matter deserves more detailed consideration in my opinion. Specifically, the authors could try to compare the different depths at which these neurons were sampled or relate the locations in the cortex to each other (as the Neuropixel recordings were collected in the same animals, a subset of the 2p recordings could be compared to the Neuropixel recordings).

      We agree with the referee that the sampling issue, which we propose as a possible explanation for the large difference between our Neuropixel and two-photon imaging recordings, must be investigated more thoroughly. This is, however, largely outside the scope of this study. We have reported the depth and location of the Neuropixel recordings in Figure S2. The depth is similar for both techniques, covering mostly layers 2, 3 and 4. The locations span mostly the primary auditory cortex for both two-photon imaging and Neuropixel recordings. One Neuropixel recording is located in the ventral secondary auditory cortex. We could not find any evidence that the response to global violations in the Neuropixel data stems specifically from this particular recording.

      (2) The current study did not monitor the attentional state of the mouse in relation to the stimulus by either including a behavioral component or pupil monitoring, which could influence the neural responses to deviant stimuli and omissions.

      As reported by Bekinschtein et al. 2009, the attentional state influences responses to global violations in human subjects. It is extremely difficult to precisely compare attentional states in mice and human subjects. We have performed recordings in mice that had to attend to sound in order to detect a white-noise sound between the sequences and obtain a reward. This did not lead to an increased global violation response. However, as the sequences themselves did not predict reward in this context, they may divert attention. This result is therefore inconclusive and not worth including in our manuscript. If the sequences predict reward, there is a potential confound between violation responses and reward expectation or motor preparation signals. Pupil monitoring could be an alternative, which we did not investigate.

      (3) Given the complexity and variety of the paradigms, conditions, and analyzed cell-types, the manuscript could profit from a more visual summary figure that provides an easy-to-access overview of what was found.

      This is an excellent suggestion, although given the complexity and diversity of our observations it may be hard to fit everything in one understandable figure.

    1. Author response:

      We appreciate the insightful comments and suggestions, which will significantly improve our work. We will revise the manuscript to address the reviewers’ concerns. Here, we list some of the key aspects of those concerns and our preliminary plans to address them.

      Both reviewers pointed out that we did not sufficiently justify the chosen optogenetic stimulation frequencies. We acknowledge and concur with their assessment, and will discuss it more extensively from a biological perspective (e.g., the neural firing rates in the olfactory bulb, OB, anterior olfactory nucleus, AON, and piriform cortex, Pir, under natural odor stimulation and respiration rhythm). Reviewer #1 suggested using beta values (β) rather than the area under the BOLD signal profile (AUC) to quantify the fMRI activations, as they are more conventional for general linear model (GLM) analysis. We are aware of β values and have used them to quantify the amplitude of fMRI activations in our previous rodent fMRI studies (1-3). However, in this study, we chose to use AUC as it offers a more comprehensive measure of BOLD signal change over time, including shape, duration, and magnitude, thereby capturing the bulk of neural activities and their dynamics throughout the stimulation period. β primarily represents the peak amplitude of BOLD responses (i.e., the % BOLD signal change) (4) and can be constrained by the assumptions and limitations of the GLM analysis, such as the shape of the hemodynamic response function (HRF). AUC provides greater flexibility in capturing different aspects of neural responses across various brain regions, such as transient peaks and sustained responses.
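
      The distinction drawn here between AUC and β can be illustrated with a toy sketch. It uses a simple boxcar as the model regressor instead of an HRF-convolved one, a single-regressor least-squares fit with an intercept, and made-up time courses; all names and values are assumptions for illustration and do not reproduce the analysis pipeline of the study.

```python
def auc_trapezoid(signal, dt):
    """Area under a sampled time course using the trapezoidal rule."""
    return sum((a + b) / 2.0 * dt for a, b in zip(signal, signal[1:]))


def glm_beta(signal, regressor):
    """Least-squares slope of signal on one regressor, with an intercept term."""
    n = len(signal)
    mean_x = sum(regressor) / n
    mean_y = sum(signal) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(regressor, signal))
    sxx = sum((x - mean_x) ** 2 for x in regressor)
    return sxy / sxx


# Assumed model shape (boxcar) and two hypothetical responses of equal size:
# one follows the model exactly, the other is delayed by two samples.
model   = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
matched = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

for name, y in (("matched", matched), ("delayed", delayed)):
    print(name, "AUC =", auc_trapezoid(y, dt=1.0), "beta =", round(glm_beta(y, model), 3))
```

      In this toy case both responses have the same AUC, but the delayed one yields a much smaller β because it departs from the assumed model shape, which is the kind of model dependence the response refers to.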

      As mentioned by reviewer #1, correlating the adaptation of BOLD and electrophysiology signals at the brain region level would better substantiate our findings. We will pursue additional analysis to address this in our forthcoming responses. Reviewer #2 would like us to clarify the image and signal quality of our echo planar imaging (EPI)-based fMRI data, especially in regions close to the air-tissue interface such as the OB, Pir, entorhinal cortex and amygdala, as well as the methodology for some of the experimental protocols implemented in our study. We will show the raw EPI fMRI images from a representative animal and revise the results, discussion, and methods sections of the manuscript to address reviewer #2's concerns.

      In our forthcoming detailed responses to the reviewers' comments and recommendations, we will revise the text, figures, and captions accordingly to address and clarify the questions brought up by both reviewers.

      References

      (1) Gao, P.P., Zhang, J.W., Chan, R.W., Leong, A.T.L. & Wu, E.X. BOLD fMRI study of ultrahigh frequency encoding in the inferior colliculus. Neuroimage 114, 427-437 (2015).

      (2) Leong, A.T.L., Wong, E.C., Wang, X. & Wu, E.X. Hippocampus Modulates Vocalizations Responses at Early Auditory Centers. Neuroimage 270, 119943 (2023).

      (3) Gao, P.P., Zhang, J.W., Fan, S.J., Sanes, D.H. & Wu, E.X. Auditory midbrain processing is differentially modulated by auditory and visual cortices: An auditory fMRI study. Neuroimage 123, 22-32 (2015).

      (4) Goddard, E. & Mullen, K.T. fMRI representational similarity analysis reveals graded preferences for chromatic and achromatic stimulus contrast across human visual cortex. Neuroimage 215, 116780 (2020).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The emergence of Drosophila EM connectomes has revealed numerous neurons within the associative learning circuit. However, these neurons are inaccessible for functional assessment or genetic manipulation in the absence of cell-type-specific drivers. Addressing this knowledge gap, Shuai et al. have screened over 4000 split-GAL4 drivers and correlated them with identified neuron types from the "Hemibrain" EM connectome by matching light microscopy images to neuronal shapes defined by EM. They successfully generated over 800 split-GAL4 drivers and 22 split-LexA drivers covering a substantial number of neuron types across layers of the mushroom body associative learning circuit. They provide new labeling tools for olfactory and non-olfactory sensory inputs to the mushroom body; interneurons connected with dopaminergic neurons and/or mushroom body output neurons; potential reinforcement sensory neurons; and expanded coverage of intrinsic mushroom body neurons. Furthermore, the authors have optimized the GR64f-GAL4 driver into a sugar sensory neuron-specific split-GAL4 driver and functionally validated it as providing a robust optogenetic substitute for sugar reward. Additionally, a driver for putative nociceptive ascending neurons, potentially serving as optogenetic negative reinforcement, is characterized by optogenetic avoidance behavior. The authors also use their very large dataset of neuronal anatomies, covering many example neurons from many brains, to identify neuron instances with atypical morphology. They find many examples of mushroom body neurons with altered neuronal numbers or mistargeting of dendrites or axons and estimate that 1-3% of neurons in each brain may have anatomic peculiarities or malformations. Significantly, the study systematically assesses the individualized existence of MBON08 for the first time. 
This neuron is a variant shape that sometimes occurs instead of one of two copies of MBON09, and this variation is more common than that in other neuronal classes: 75% of hemispheres have two MBON09's, and 25% have one MBON09 and one MBON08. These newly developed drivers not only expand the repertoire for genetic manipulation of mushroom body-related neurons but also empower researchers to investigate the functions of circuit motifs identified from the connectomes. The authors generously make these flies available to the public. In the foreseeable future, the tools generated in this study will allow important advances in the understanding of learning and memory in Drosophila.

      Strengths:

      (1) After decades of dedicated research on the mushroom body, a consensus has been established that the release of dopamine from DANs modulates the weights of connections between KCs and MBONs. This process updates the association between sensory information and behavioral responses. However, understanding how the unconditioned stimulus is conveyed from sensory neurons to DANs, and the interactions of MBON outputs with innate responses to sensory context remains less clear due to the developmental and anatomic diversity of MBONs and DANs. Additionally, the recurrent connections between MBONs and DANs are reported to be critical for learning. The characterization of split-GAL4 drivers for 30 major interneurons connected with DANs and/or MBONs in this study will significantly contribute to our understanding of recurrent connections in mushroom body function.

      (2) Optogenetic substitutes for real unconditioned stimuli (such as sugar taste or electric shock) are sometimes easier to implement in behavioral assays due to the spatial and temporal specificity with which optogenetic activation can be induced. GR64f-GAL4 has been widely used in the field to activate sugar sensory neurons and mimic sugar reward. However, the authors demonstrate that GR64f-GAL4 drives expression in other neurons not necessary for sugar reward, and the potential activation of these neurons could introduce confounds into training, impairing training efficiency. To address this issue, the authors have elaborated on a series of intersectional drivers with GR64f-GAL4 to dissect subsets of labeled neurons. This approach successfully identified a more specific sugar sensory neuron driver, SS87269, which consistently exhibited optimal training performance and triggered ethologically relevant local searching behaviors. This newly characterized line could serve as an optimized optogenetic tool for sugar reward in future studies.

      (3) MBON08 was first reported by Aso et al. 2014, exhibiting dendritic arborization into both ipsilateral and contralateral γ3 compartments. However, this neuron could not be identified in the previously published Drosophila brain connectomes. In the present study, the existence of MBON08 is confirmed, occurring in one hemisphere of 35% of imaged flies. In brains where MBON08 is present, its dendrite arborization disjointly shares contralateral γ3 compartments with MBON09. This remarkable phenotype potentially serves as a valuable resource for understanding the stochasticity of neurodevelopment and the molecular mechanisms underlying mushroom body lobe compartment formation.

      Weaknesses:

      There are some minor weaknesses in the paper that can be clarified:

      (1) In Figure 8, the authors trained flies with a 20s, weak optogenetic conditioning first, followed by a 60s, strong optogenetic conditioning. The rationale for using this training paradigm is not explicitly provided.

      These experiments were designed to test if flies could maintain consistent performance with repetitive and intense LED activation, which is essential for experiments involving long training protocols or coactivation of other neurons inside a brain.

      In Figure 8E, if data for training with GR64f-GAL4 using the same paradigm is available, it would be beneficial for readers to compare the learning performance using newly generated split-GAL4 lines with the original GR64f-GAL4, which has been used in many previous research studies. It is noteworthy that in previously published work, repeating training test sessions typically leads to an increase in learning performance in discrimination assays. However, this augmentation is not observed in any of the split-GAL4 lines presented in Figure 8E. The authors may need to discuss possible reasons for this.

      As the reviewer pointed out, many previous studies, including ours, used the original Gr64f-GAL4 in olfactory conditioning. Figure 1H of Yamada et al., 2023 (https://doi.org/10.7554/eLife.79042) showed such a result, where first- and second-order olfactory conditioning were assayed. Indeed, the first-order conditioning scores gradually increased over repeated training. In that experiment, we used a low red LED intensity for the optogenetic activation. In Figure 8E of the present paper, the first memory test came after 3x pairing of a 20s odor with five 1s red LED pulses, without intermediate tests. Therefore, flies were already sufficiently trained to show a plateau memory level in “Test1”. In the revision of another recent report (Figure 1C-F of Aso et al., 2023; https://doi.org/10.7554/eLife.85756), we included the learning curve data of our best Gr64f-split-GAL4, SS87269. Under less saturating training conditions, SS87269 did show learning augmentation over repeated training.

      (2) In line 327, the authors state that in all samples, the β'1 compartment is arborized by MBON09. However, in Figure 11J, the probability of having at least one β'1 compartment not arborized is inferred to be 2%. The authors should address and clarify this conflict in the text to avoid misunderstanding.

      The chance of visualizing MBON08 in MCFO images was 21/209 in total (Figure 11I). If we assume that each of the four cells adopts the MBON08 developmental fate with this probability, we can calculate the probability of each case of MBON08/09 cell type composition. From this calculation, we inferred that approximately 2% of flies would lack innervation of the β'1 compartment in at least one hemisphere. However, we did not observe a lack of β'1 arborizations in 169 sample flies. If these MBONs independently develop into MBON08 at 21/209 odds, the chance of never observing two MBON08s in either hemisphere across all 169 samples is 3.29%. Therefore, some developmental mechanism may prevent the emergence of two MBON08s in the same hemisphere.
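
      The arithmetic behind these estimates can be reproduced with a short sketch. It assumes, as one reading of the paragraph above, two candidate cells per hemisphere that each adopt the MBON08 fate independently with probability 21/209, and that β'1 is left without innervation when both cells in a hemisphere become MBON08. Variable names are illustrative.

```python
p_mbon08 = 21 / 209            # per-cell chance of adopting the MBON08 fate
n_flies = 169                  # sample flies with no missing β'1 arborization

# Both cells of one hemisphere become MBON08.
p_hemisphere = p_mbon08 ** 2

# At least one of the two hemispheres of a fly has two MBON08s.
p_fly = 1 - (1 - p_hemisphere) ** 2

# Chance of seeing no such fly among all samples, without intermediate rounding.
p_never_exact = (1 - p_fly) ** n_flies

# Same calculation using the rounded 2% per-fly estimate.
p_never_rounded = (1 - 0.02) ** n_flies

print(f"per fly: {p_fly:.4f}")
print(f"never in {n_flies} flies (exact): {p_never_exact:.4f}")
print(f"never in {n_flies} flies (rounded 2%): {p_never_rounded:.4f}")
```

      Under these assumptions the per-fly probability is about 2%, and the quoted 3.29% follows from using that rounded value; carrying full precision gives a slightly lower figure of about 3.2%, which does not change the conclusion.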

      In the revised manuscript, we display the estimated probability of each case separately and annotate the actual observations on the right side.

      (3) In general, are the samples presented male or female? This sample metadata will be shown when the images are deposited in FlyLight, but it would be useful in the context of this manuscript to describe in the methods whether animals are all one sex or mixed sex, and in some example images (e.g. mAL3A) to note whether the sample is male or female.

      The samples presented in this study are of mixed sex, except for Figure 11I, where genders are specified. We provided metadata for the presented images in Supplementary File 7, and we added a paragraph to the Methods section:

      “Most samples were collected from females, though typically at least one male fly was examined for each driver line. While we noticed certain lines such as SS48900, exhibited distinct expression patterns in females and males, we did not particularly focus on sexual dimorphism, which is analyzed elsewhere (Meissner et al. 2024). Therefore, unless stated otherwise, the presented samples are of mixed gender.

      Detailed metadata, including gender information and the reporter used, can be found in Supplementary File 7.”

      Reviewer #2 (Public Review):

      Summary:

      The article by Shuai et al. describes a comprehensive collection of over 800 split-GAL4 and split-LexA drivers, covering approximately 300 cell types in Drosophila, aimed at advancing the understanding of associative learning. The mushroom body (MB) in the insect brain is central to associative learning, with Kenyon cells (KCs) as primary intrinsic neurons and dopaminergic neurons (DANs) and MB output neurons (MBONs) forming compartmental zones for memory storage and behavior modulation. This study focuses on characterizing sensory input as well as direct upstream connections to the MB both anatomically and, to some extent, behaviorally. Genetic access to specific, sparsely expressed cell types is crucial for investigating the impact of single cells on computational and functional aspects within the circuitry. As such, this new and extensive collection significantly extends the range of targeted cell types related to the MB and will be an outstanding resource to elucidate MB-related processes in the future.

      Strengths:

      The work by Shuai et al. provides novel and essential resources to study MB-related processes and beyond. The resulting tools are publicly available and, together with the linked information, will be foundational for many future studies. The importance and impact of this tool development approach, along with previous ones, for the field cannot be overstated. One of many interesting aspects arises from the anatomical analysis of cell types that are less stereotypical across flies. These discoveries might open new avenues for future investigations into how such asymmetry and individuality arise from development and other factors, and how it impacts the computations performed by the circuitry that contains these elements.

      Weaknesses:

      Providing such an array of tools leaves little to complain about. However, despite the comprehensive genetic access to diverse sensory pathways and MB-connected cell types, the manuscript could be improved by discussing its limitations. For example, the projection neurons from the visual system seem to be underrepresented in the tools produced (or almost absent). A discussion of these omissions could help prevent misunderstandings.

      We internally distributed efforts to produce split-GAL4 lines at Janelia Research Campus. The recent preprint (Nern et al., 2024; doi: https://doi.org/10.1101/2024.04.16.589741) described the full collection of split-GAL4 driver lines in the optic lobe including the visual projection neurons to the mushroom body. We cited this preprint in the revised manuscript by adding a short paragraph of discussion.

      “Although less abundant than the olfactory input, the MB also receives visual information from the visual projection neurons (VPNs) that originate in the medulla and lobula and are targeted to the accessory calyx (Vogt et al. 2016; Li et al. 2020). A recent preprint described the full collection of split-GAL4 driver lines in the optic lobe, which includes the VPNs to the MB (Nern et al. 2024).”

      Additionally, more details on the screening process, particularly the selection of candidate split halves and stable split-GAL4 lines, would provide valuable insights into the methodology and the collection's completeness.

      The details of our split-GAL4 design and screening procedures were described in previous studies (Aso et al., 2014; Dolan et al., 2019). The data and tools available for designing split-GAL4 lines changed over time, and we took different approaches accordingly. Many of the split-GAL4 lines presented in this study were designed and screened in parallel with the lines for MBONs and DANs in 2010-2014, when MCFO images of GAL4 drivers and the EM connectome were not yet available. With knowledge of where MBONs and DANs project, I (Y.A.) manually examined and annotated thousands of confocal stacks (Jenett et al., 2012; https://doi.org/10.1016/j.celrep.2012.09.011) to find candidate cell types that may connect with them.

      Later I used more advanced computational tools (Otsuna et al., 2018; doi: https://doi.org/10.1101/318006) and MCFO images aligned to the standard brain volume (Meissner et al., 2023; doi: 10.7554/eLife.80660). Now, if one needs to generate further split-GAL4 lines for cell types identified in EM connectome data, the NeuronBridge website (https://neuronbridge.janelia.org/) can be very helpful in providing a list of GAL4 drivers that may label the neuron of interest.

      Reviewer #3 (Public Review):

      Summary:

      Previous research on the Drosophila mushroom body (MB) has made this structure the best-understood example of an associative memory center in the animal kingdom. This is in no small part due to the generation of cell-type specific driver lines that have allowed consistent and reproducible genetic access to many of the MB's component neurons. The manuscript by Shuai et al. now vastly extends the number of driver lines available to researchers interested in studying learning and memory circuits in the fly. Its 800-plus collection of new cell-type specific drivers targets neurons that either provide input (direct or indirect) to MB neurons or that receive output from them. Many of the new drivers target neurons in sensory pathways that convey conditioned and unconditioned stimuli to the MB. Most drivers are exquisitely selective, and researchers will benefit from the fact that whenever possible, the authors have identified the targeted cell types within the Drosophila connectome. Driver expression patterns are beautifully documented and are publicly available through the Janelia Research Campus's Flylight database where full imaging results can be accessed. Overall, the manuscript significantly augments the number of cell type-specific driver lines available to the Drosophila research community for investigating the cellular mechanisms underlying learning and memory in the fly. Many of the lines will also be useful in dissecting the function of the neural circuits that mediate sensorimotor circuits.

      Strengths:

      The manuscript represents a huge amount of careful work and leverages numerous important developments from the last several years. These include the thousands of recently generated split-Gal4 lines at Janelia and the computational tools for pairing them to make exquisitely specific targeting reagents. In addition, the manuscript takes full advantage of the recently released Drosophila connectomes. Driver expression patterns are beautifully illustrated side-by-side with corresponding skeletonized neurons reconstructed by EM. A comprehensive table of the new lines, their split-Gal4 components, their neuronal targets, and other valuable information will make this collection eminently useful to end-users. In addition to the anatomical characterization, the manuscript also illustrates the functional utility of the new lines in optogenetic experiments. In one example, the authors identify a specific subset of sugar reward neurons that robustly promotes associative learning.

      Weaknesses:

      While the manuscript succeeds in making a mass of descriptive detail quite accessible to the reader, the way the collection is initially described - and the new lines categorized - in the text is sometimes confusing. Most of the details can be found elsewhere, but it would be useful to know how many of the lines are being presented for the first time and have not been previously introduced in other publications/contexts.

      We revised the text as below.

      “Among the 828 lines, a subset of 355 lines, collectively labeling at least 319 different cell types, exhibit highly specific and non-redundant expression patterns and are likely to be particularly valuable for behavioral experiments. Detailed information, including genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type, can be found in Supplementary File 1. A small subset of 40 lines from this collection have been previously used in studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023). All transgenic lines newly generated in this study are listed in Supplementary File 2 (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023).”

      And where can the lines be found at Flylight? Are they listed as one collection or as many?

      They are listed as one collection, the “Aso 2021” release. It is named “2021” because we released the images and started sharing lines in December 2021 without a descriptive paper. We added a sentence to the Methods section.

      “All splitGAL4 lines can be found at flylight database under “Aso 2021” release, and fly strains can be requested from Janelia or the Bloomington stock center.”

      Also, the authors say that some of the lines were included in the collection despite not necessarily targeting the intended type of neuron (presumably one that is involved in learning and memory). What percentage of the collection falls into this category?

      We do not have a sufficiently detailed record of the split-GAL4 screening to calculate how often unintended cell types were intersected, but it was rather rare. Those unintended cell types can still be part of the circuits for associative learning (e.g. olfactory projection neurons) or be totally unrelated cell types. For instance, among a new collection of split-LexA lines using the Gr43a-LexADBD hemidriver (Figure 7-figure supplement 2), one line specifically intersected T1 neurons in the optic lobe even though the AD line was selected to intersect sugar sensory neurons. We suspect that this is due to ectopic expression of Gr43a-LexADBD. Nonetheless, we included it in the paper because a cell-type-specific split-LexA driver for T1 will be useful irrespective of whether the Gr43a gene is expressed in T1.

      And what about the lines that the authors say they included in the collection despite a lack of specificity? How many lines does this represent?

      In short, about 100 lines in the collection lack the specificity needed for behavioral experiments.

      We ranked the specificity of the split-GAL4 drivers in Supplementary File 1. Rank 2 lines are ideal, rank 1 lines are less ideal but acceptable, and rank 0 lines are not suitable for activation screening in behavioral experiments. Of the 828 split-GAL4 lines reported here, 413, 305 and 103 lines fall into the rank 2, rank 1 and rank 0 categories, respectively. Seven lines are not ranked for specificity because only flipout expression data are available.
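      As a quick consistency check, the counts stated above can be tallied in a few lines of Python. This is only an illustrative sketch: the dictionary keys are labels chosen here for readability and are not column names from Supplementary File 1.

```python
# Counts as stated in the response above; the keys are illustrative labels,
# not identifiers taken from Supplementary File 1.
rank_counts = {"rank 2": 413, "rank 1": 305, "rank 0": 103, "unranked": 7}

# Total number of lines across all categories.
total = sum(rank_counts.values())

# Share of the collection in each category, as a percentage of all lines.
shares = {label: 100.0 * n / total for label, n in rank_counts.items()}

for label, n in rank_counts.items():
    print(f"{label}: {n} lines ({shares[label]:.1f}%)")
print(f"total: {total} lines")
```

      The four categories add up to the 828 lines of the collection, and the rank 0 share corresponds to the "about 100 lines" mentioned in the short answer.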

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      As mentioned elsewhere and in addition to the minor points below, it is advisable for the authors to elaborate on the details of the screening process. Furthermore, a discussion about the circuits not targeted by their research, such as the visual projection neurons, would be beneficial.

      See the response above to Reviewer #2’s public review.

      Line 32-33: The citations are very fly-centric. The authors might want to consider reviews on the MB of other insect species regarding learning and memory.

      We additionally cited the book chapter by Rybak and Menzel (2017) on the honey bee mushroom body.

      Line 43-44: Citations should be added, e.g. Séjourné et al. (2011), Pai et al. (2013), Plaçais et al. (2013).

      Citation added

      Line 50-52: Citation Hulse et al. (2021) should be added.

      Citation added

      Line 162: In this part, it might be valuable for the reader to understand which of these PNs are actually connecting with KCs.

      A full list of cell types within the MB is provided in Supplementary File 4 of the revised manuscript. See also the response to Reviewer 3, Lines 150-1.

      Line 179: Citation Burke et al. (2012) should be mentioned.

      Citation added

      Line 181: Thermogenic might be thermogenetic.

      Corrected

      Line 189: Citations add Otto et al. (2020) and Felsenberg et al. (2018).

      Citations added

      Line 208ff: The authors should consider discussing why they did not use other GR and IR promoters. For example, Gr5a is prominent in sugar-sensing, while Ir76b could be a reinforcement signal related to yeast food (Steck et al., 2018; Ganguly et al., 2017; see also Corfas et al., 2019 for local search).

      We focused on the Gr64f promoter because of its relatively broad expression and the successful use of Gr64f-GAL4 in fictive reward experiments. We also added split-LexA lines with the Gr43a and Gr66a promoters (Figure 7-figure supplement 2). Other gustatory sensory neurons could also provide reinforcement signals, but we did not have the bandwidth to cover them all.

      Line 319: Consider citing Linneweber et al. (2020) for a neurodevelopmental account of such individuality.

      We added a sentence and cited this reference.

      “On the other hand, the neurodevelopmental origin of neuronal morphology appeared to have functional significance on behavioral individuality (Linneweber et al. 2020).”

      Line 352: Citation add Hulse et al. (2021).

      Citation added

      Line 356ff: The utility and value of Split-LexA may not be apparent to non-expert readers. Moreover, how were LexADBDs chosen for creating these lines?

      We have added an introductory sentence at the beginning of the paragraph and explained that these split-LexA lines were a conversion of split-GAL4 lines that were published in 2014 and frequently used in studying the mushroom body circuit.

      “Split-GAL4 lines enable cell-type-specific manipulation, but some experiments require independent manipulation of two cell types. Split-GAL4 lines can be converted into split-LexA lines by replacing the GAL4 DNA binding domain with that of LexA (Ting et al., 2011). To broaden the utility of the split-GAL4 lines that have been frequently used since the publication in 2014 (Aso et al., 2014a), we have generated over 20 LexADBD lines to test the conversions of split-GAL4 to split-LexA. The majority (22 out of 34) of the resulting split-LexA lines exhibited very similar expression patterns to their corresponding original split-GAL4 lines (Figure 12).”

      Line 374: Italicize Drosophila melanogaster.

      Revised as suggested.

      Reviewer #3 (Recommendations For The Authors):

      Major Comments:

      As mentioned in the Public Review, the drivers are nicely classified in the various subsections of the manuscript, but the statements in the text summarizing how many lines there are in specific categories are often confusing. For example, line 129 refers to "drivers encompassing 111 cell types that connect with the DANs and MBONs", but Figure 1E indicates that 46 new cell types downstream of MBONs and upstream of DANs have been generated. This seems like a discrepancy.

      The 46 cell types in Figure 1E consider only the CRE/SMP/SIP/SLP area, where neurons downstream of MBONs and upstream of DANs are highly enriched, whereas the 111 cell types include all areas. To avoid confusion, we removed the “MBON downstream and DAN upstream” counting in Figure 1E in the revised manuscript.

      Also, at line 75 the MBON lines previously generated by Rubin and Aso (2023) are referred to as though they are separate from the 828 described "In this report." Supplementary file 1 suggests, however, that they are included as part of this report.

      Twenty-five lines generated in Rubin and Aso (2023) were initially included in Supplementary file 1 for the convenience of users, but they were not counted towards the 828 new lines described in this report. To avoid confusion, we removed these 25 lines from the revised manuscript. Now all lines listed in Supplementary file 1 were generated in this study (“Aso 2021” release), and if a line has been used in earlier studies or introduced in other contexts, for example the accompanying omnibus preprint (Meissner et al., 2024; doi: 10.1101/2024.01.09.574419), the citations are listed in the reference column.

      More generally, in lines 94-102 "828 useful lines based on their specificity, intensity and non-redundancy" are referred to, but they are subsequently subdivided into categories of lines with lower specificity (i.e. with off-target expression) and lines that did not target intended cell types (presumably ones unlikely to be involved in learning and memory). It would be useful to know how many lines (at least roughly) fall into these subcategories.

      See the response above to Reviewer #3’s public review.

      Finally, Figures 3B & C indicate cell types connected to DANs and MBONs and the number for which Split-Gal4 lines are available. The text (lines 136-7) states that the new collection "covers 30 of these major cell types (Figure 3C)," but Figure 3C clearly has more than 30 dots showing the drivers available. Presumably existing and new driver lines are being pooled, but this should either be explained or the two should be distinguished.

      “(Figure 3C)” was replaced with “(Supplementary File 3)” in the revised manuscript to correct the reference. Figure 3B & C are plots of all MB interneurons, not just the major cell types.

      Minor Comments:

      Although the paper is generally well written there are minor grammatical errors throughout (e.g. dropped articles, odd constructions, etc.) that somewhat detract from an otherwise smooth and enjoyable reading experience. A quick editing pass by a native speaker (i.e. any of several of the authors) could clean up these and numerous other small mistakes. A few examples: line 138 "presented" should be present; line 204: "contain off-targeted expressions" should be "have off-target expression;" line 219: "usage to substitute reward" is awkward at best and could be something like "use in generating fictive rewards"; line 326 "arborize[s]"; l. 331 "Based on the likelihood" should be something like "based on these observations"; line 349 "[is] likely to appear"; l. 352 "extensive connection[s]"; line 353 "has [a] strong influence;" l. 963 "Projections" should be singular; etc.

      All the mentioned examples have been corrected, and we have asked a native speaker to edit the revised manuscript.

      Lines 81-3: Is the lookup table referred to Suppl. File 1? A reference is desirable.

      Yes, the lookup table refers to Supplementary File 1, and a reference was added.

      Lines 111-2: what is a "non-redundant set of...cell types?" Cell types that are represented by a single cell (or bilateral pair)? Or does this sentence mean that of the 828 lines, 355 are specific to a single cell type, and in total 319 cell types are targeted? The statement is confusing.

      We revised the text as below.

      “Figure 1E provides an overview of the categories of covered cell types. Among the 828 lines, a subset of 355 lines, collectively labeling at least 319 different cell types, exhibit highly specific and non-redundant expression patterns and are likely to be particularly valuable for behavioral experiments. Detailed information, including genotype, expression specificity, matched EM cell type(s), and recommended driver for each cell type, can be found in Supplementary File 1. A small subset of 40 lines from this collection have been previously used in studies (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023). All transgenic lines newly generated in this study are listed in Supplementary File 2 (Aso et al., 2023; Dolan et al., 2019; Gao et al., 2019; Scaplen et al., 2021; Schretter et al., 2020; Takagi et al., 2017; Xie et al., 2021; Yamada et al., 2023).”

      Line 148: "MB major interneurons" is a confusing descriptor for postsynaptic partners of MBONs.

      We added a sentence to clarify the definition of the “MB major interneurons”.

      “In the hemibrain EM connectome, there are about 400 interneuron cell types that have over 100 total synaptic inputs from MBONs and/or synaptic outputs to DANs. Our newly developed collection of split-GAL4 drivers covers 30 types of these ‘major interneurons’ of the MB (Supplementary File 3).”

      Lines 150-1: Not sure what is meant by "have innervations within the MB." Sounds like cells are presynaptic to KCs, DANS, and MBONs, but Figure 3 Figure Supplement 1 indicates they include neurons that both provide and receive innervation to/from MB neurons. Please clarify.

      For clarification, in the revised manuscript we have included a full list of cell types within the MB in Supplementary File 4. Included are all neurons with >= 50 pre-synaptic connections or with >=250 post-synaptic connections in the MB roi in the hemibrain (excluding the accessory calyx). The cell types include KCs, MBONs, DANs, PNs, and a few other cell types. The coverage ratio was updated based on this list.
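      The inclusion rule described above (at least 50 pre-synaptic or at least 250 post-synaptic connections in the MB ROI) can be sketched as a simple filter. This is a minimal illustration, not the analysis code used for Supplementary File 4: the record layout, field names, cell type labels and counts are all hypothetical, and the counts are assumed to already exclude the accessory calyx.

```python
# Hypothetical per-neuron records: a cell type label plus pre- and
# post-synaptic connection counts inside the MB ROI (accessory calyx
# assumed to be excluded upstream). All names and values are invented.
neurons = [
    {"type": "type_A", "pre_in_mb": 820, "post_in_mb": 40},
    {"type": "type_B", "pre_in_mb": 10, "post_in_mb": 300},
    {"type": "type_C", "pre_in_mb": 49, "post_in_mb": 249},
    {"type": "type_D", "pre_in_mb": 50, "post_in_mb": 0},
]

MIN_PRE = 50    # threshold on pre-synaptic connections, from the response
MIN_POST = 250  # threshold on post-synaptic connections, from the response


def is_within_mb(neuron, min_pre=MIN_PRE, min_post=MIN_POST):
    """A neuron qualifies if it meets either threshold (inclusive)."""
    return neuron["pre_in_mb"] >= min_pre or neuron["post_in_mb"] >= min_post


def mb_cell_types(records):
    """Sorted, de-duplicated cell types of all qualifying neurons."""
    return sorted({n["type"] for n in records if is_within_mb(n)})


print(mb_cell_types(neurons))
```

      Because the two criteria are joined by "or", a neuron with few post-synaptic connections still qualifies if it meets the pre-synaptic threshold, and vice versa.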

      Also, in line 152, what does it mean that they "may have been overlooked previously?" this seems unnecessarily ambiguous. Were they overlooked or weren't they?

      Changed the text to “These lines offer valuable tools to study cell types that were previously not genetically accessible. Notably, SS85572 enables the functional study of LHMB1, which forms a rare direct pathway from the calyx and the lateral horn (LH) to the MB lobes (Bates et al., 2020).”

      Line 158 refers to PN cells within the MB, which are not mentioned anywhere else as MB components. What are these PNs and how do they differ from MBONs?

      See responses to Lines 150-1 for clarification of cell types within the MB.

      Line 188: not clear what is meant by "more continual learning tasks".

      We rephrased it as “more complex learning tasks” to avoid jargon.

      Line 235: Not clear why "extended training with high LED intensity" wouldn't promote the formation of robust memories. Is this for some reason unexpected based on previous experiments? Please explain.

      See responses to weakness #1 of the same reviewer

      Lines 317-9: It would be useful to state here that MBON08 and MBON09 are the two neurons labeled by MB083C.

      Revised as suggested.

      Line 368: Presumably the "lookup table" referred to is Supplementary File 1, but a reference here would be useful.

      Yes, it is Supplementary File 1, and a reference was added.

      Comments on Figures:

      Figure 1C The "Dopamine Neurons" label position doesn't align with the Punishment and Reward labels, which is a bit confusing.

      They are intentionally not aligned, because dopamine neurons are not reward/punishment per se. We intend to use the schematic to show that the punishment and reward are conveyed to the MB through the dopamine neuron layer, just as the output from the MB output neuron layer is used to guide further integration and actions. To keep the labels of “Dopamine neurons” and “MB Output Neurons” in a symmetrical position, we decided to keep the original figure unchanged. But we thank the reviewer for the kind suggestion.

      Figure 1F and Figure 1 - Figure Supplement 1: the light gray labels presumably indicate the (EM-identified) neuron labeled by each line, but this should be explicitly stated in the figure legends. It would also be useful in the legends to direct the reader to the key (Supplementary File 1) for decoding neuronal identities.

      Revised as suggested.

      Figure 2: For clarity, I'd recommend titling this figure "LM-EM Match of the CRE011-specific driver SS45245". This reduces the confusion of mixing and matching the driver and cell-type names. Also, it would be helpful to indicate (e.g. with labels above the figure parts) that A & B represent the MCFO characterization step and C & D represent the LM-EM matching step of the pipeline.

      Revised as suggested.

      Figure 6: For clarity, it would be useful to separately label the PN and sensory neuron groups. Also, for the sensory neurons at the bottom, what is the distinction between the cell names in gray and black font?

      Figure 6 was updated to separate the non-olfactory PN and sensory neuron groups. The gray was intended for olfactory receptor neuron cell types that are additionally labeled in the driver lines. To avoid confusion, the gray cell types were removed in the revised figure, and a clarification sentence was added to the legend.

      “Other than thermo-/hygro-sensory receptor neurons (TRNs and HRNs), SS00560 and MB408B also label olfactory receptor neurons (ORNs): ORN_VL2p and ORN_VC5 for SS00560, ORN_VL1 and ORN_VC5 for MB408B.”

      Figure 7A: It's unclear why the creation of 6 Gr64f-LexADBD lines is reported. Aren't all these lines the same? If not, an explanation would be useful.

      These six Gr64f-LexADBD lines differ in their insertion sites and in the presence or absence of the p10 translational enhancer. An explanation was added to the legend. The enhanced expression level with p10 can help compensate for the general tendency of split-LexA to be weaker than split-GAL4. Different insertion sites will be useful for avoiding transvection with split-GAL4s, which are mostly in attP40 and attP2.

      Figure 8F: It would help to include in the legend a brief description of each parameter being measured-essentially defining the y-axis label on the graphs as in Figure Supplement 2. Also, how is the probability of return calculated and what behavioral parameter does the change of curvature refer to?

      We added a brief description of the behavioral parameters to the legend of Figure 8F.

      “Return behavior was assessed within a 15-second time window. The probability of return (P return) is the percentage of flies that made an excursion (>10 mm) and then returned to within 3 mm of their initial position. Curvature is the ratio of angular velocity to walking speed.”
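      The two definitions in the legend can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the analysis code used for Figure 8F: each track is assumed to be a list of (x, y) positions in mm already cropped to the 15-second window, the denominator of P return is assumed to be all tracked flies, and all function and parameter names are hypothetical.

```python
import math


def made_return(track, excursion_mm=10.0, return_mm=3.0):
    """True if the fly moves more than excursion_mm away from its initial
    position and afterwards comes back to within return_mm of it.

    track: list of (x, y) positions in mm, assumed to cover one
    15-second window starting at the initial position.
    """
    x0, y0 = track[0]
    excursed = False
    for x, y in track[1:]:
        distance = math.hypot(x - x0, y - y0)
        if not excursed:
            # Still waiting for the excursion beyond the outer threshold.
            excursed = distance > excursion_mm
        elif distance <= return_mm:
            # Excursion already happened and the fly is back near the start.
            return True
    return False


def p_return(tracks, **thresholds):
    """Percentage of tracks showing an excursion followed by a return.
    Assumption: the denominator is all tracks, not only excursing flies."""
    return 100.0 * sum(made_return(t, **thresholds) for t in tracks) / len(tracks)


def curvature(angular_velocity, walking_speed):
    """Ratio of angular velocity to walking speed, e.g. (deg/s) / (mm/s),
    giving deg/mm. Assumes walking_speed is non-zero."""
    return angular_velocity / walking_speed
```

      Under these assumptions, a fly that leaves by more than 10 mm but never comes back within 3 mm does not count as a return, and neither does a fly that stays close to its initial position throughout the window.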

      Figure 9E: What are the parenthetical labels for lines SS49267, SS49300, and SS35008?

      They are EM bodyIDs. Figure legend was revised.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study compiles a wide range of results on the connectivity, stimulus selectivity, and potential role of the claustrum in sensory behavior. While most of the connectivity results confirm earlier studies, this valuable work provides incomplete evidence that the claustrum responds to multimodal stimuli and that local connectivity is reduced across cells that have similar long-range connectivity. The conclusions drawn from the behavioral results are weakened by the animals' poor performance on the designed task. This study has the potential to be of interest to neuroscientists.

      We thank the editor and the reviewers for their feedback on our work, which we have incorporated to help improve interpretation of our findings as outlined in the response below. While we agree with the editor that further work is necessary to provide a comprehensive understanding of claustrum circuitry and activity, this is true of most scientific endeavors and therefore we feel that describing this work as “incomplete” unfairly mischaracterizes the intent of the experiments performed which provide fundamental insights into this poorly understood brain region. Additionally, as identified in the main text, methods section, and our responses to the comments below, we disagree that the behavioral results are “weakened” by the performance of the animals. Our goal was to assess what information animals learned and used in an ambiguous sensory/reward environment, not to shape them toward a particular behavior and interpret the results solely based on their accuracy in performing the task.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper by Shelton et al. investigates some of the anatomical and physiological properties of the mouse claustrum. First, they characterize the intrinsic properties of claustrum excitatory and inhibitory neurons and determine how these different claustrum neurons receive input from different cortical regions. Next, they perform in vitro patch clamp recordings to determine the extent of intraclaustrum connectivity between excitatory neurons. Following these experiments, in vivo axon imaging was performed to determine how claustrum-retrosplenial cortex neurons are modulated by different combinations of auditory, visual, and somatosensory input. Finally, the authors perform claustrum lesions to determine if claustrum neurons are required for performance on a multisensory discrimination task.

      Strengths:

      An important potential contribution the authors provide is the demonstration of intra-claustrum excitation. In addition, this paper provides the first experimental data where two cortical inputs are independently stimulated in the same experiment (using 2 different opsins). Overall, the in vitro patch clamp experiments and anatomical data provide confirmation that claustrum neurons receive convergent inputs from areas of the frontal cortex. These experiments were conducted with rigor and are of high quality.

      We thank the reviewer for their positive appraisal of our work.

      Weaknesses:

      The title of the paper states that claustrum neurons integrate information from different cortical sources. However, the authors did not actually test or measure integration in the manuscript. They do show physiological convergence of inputs on claustrum neurons in the slice work. Testing integration through simultaneous activation of inputs was not performed. The convergence of cortical input has been recently shown by several other papers (Chia et al), and the current paper largely supports these previous conclusions. The in vivo work did test for integration because simultaneous sensory stimulations were performed. However, integration was not measured at the single cell (axon) level because it was unclear how activity in a single claustrum ROI changes in response to (for example) visual, tactile, and visual-tactile stimulations. Reading the discussion, I also see the authors speculate that the sensory responses in the claustrum could arise from attentional or salience-related inputs from an upstream source such as the PFC. In this case, claustrum cells would not integrate anything (but instead respond to PFC inputs).

      We thank the reviewer for raising this point. In response, we have provided a definition of “integration” in the manuscript text (lines 112-114, 353-354):

      “...single-cell responsiveness to more than one input pathway, e.g. being capable of combining and therefore integrating these inputs.”

      The reviewer’s point about testing simultaneous input to the claustrum is well made but not possible with the dual-color optogenetic stimulation paradigm used in our study, as noted in the Results and Discussion sections (see also Klapoetke et al., 2014, Hooks et al., 2015). The novelty of our paper comes from testing these connections in single CLA neurons, something not shown in other studies to date (Chia et al., 2020; Qadir et al., 2022), which average connectivity over many neurons.

      Finally, we disagree with the reviewer regarding whether integration was tested at the single-axon level and provide data and supplementary figures to this effect (Fig. 6, Supp. Fig. S14, lines 468-511). Although the possibility remains that sensory-related information may arise in the prefrontal cortex, as we note, there is still a large collection of studies (including this one) that document and describe direct sensory inputs to the claustrum (Olson & Graybiel, 1980; Sherk & LeVay, 1981; Smith & Alloway, 2010; Goll et al., 2015; Atlan et al., 2017; etc.). We have updated the wording of these sections to note that both direct and indirect sensory input integration is possible.

      The different experiments in different figures often do not inform each other. For example, the authors show in Figure 3 that claustrum-RSP cells (CTB cells) do not receive input from the auditory cortex. But then, in Figure 6 auditory stimuli are used. Not surprisingly, claustrum ROIs respond very little to auditory stimuli (the weakest of all sensory modalities). Then, in Figure 7 the authors use auditory stimuli in the multisensory task. It seems that these experiments were done independently and were not used to inform each other.

      The intention behind the current manuscript was to provide a deep characterisation of claustrum to inform future research into this enigmatic structure. In this case, we sought to test pathways in vivo that were identified as being weak or absent in vitro to confirm and specifically rule out their influence on computations performed by claustrum. We agree with the reviewer’s assessment that it is not surprising that claustrum ROIs respond weakly to auditory stimuli. Not testing these connections in vivo because of their apparent sparsity in vitro would have represented a critical gap in our knowledge of claustrum responses during passive sensory stimulation.

      One novel aspect of the manuscript is the focus on intraclaustrum connectivity between excitatory cells (Figure 2). The authors used wide-field optogenetics to investigate connectivity. However, the use of paired patch-clamp recordings remains the ground truth technique for determining the rate of connectivity between cell types, and paired recordings were not performed here. It is difficult to understand and gain appreciation for intraclaustrum connectivity when only wide-field optogenetics is used.

      We thank the reviewer for acknowledging the novelty of these experiments. We further acknowledge that paired patch-clamp recordings are the gold standard for assessing synaptic connectivity. Typically such experiments are performed in vitro, a necessity given that the ventral location of the claustrum precludes in vivo patching. In vitro slice preparations by their very nature sever connections and lead to an underestimate of connectivity, as noted in our Discussion. Kim et al. (2016) have done this experiment in coronal slices with the understanding that excitatory-excitatory connectivity would be local (<200 μm) and therefore preserved. We used a variety of approaches that enabled us to explore connectivity along the longitudinal axis of the brain (the rostro-caudal, i.e. “long”, axis of the claustrum), providing fresh insight into the circuitry embedded within this structure that would be challenging to examine using dual recordings. Further, our optogenetic method (CRACM; Petreanu et al., 2007) has been used successfully across a variety of brain structures to examine excitatory connectivity while circumventing artifacts arising from the slice axis.

      In Figure 2, CLA-rsp cells express Chrimson, and the authors removed cells from the analysis with short latency responses (which reflect opsin expression). But wouldn't this also remove cells that express opsin and receive monosynaptic inputs from other opsin-expressing cells, therefore underestimating the connectivity between these CLA-rsp neurons? I think this needs to be addressed.

      The total number of opsin-expressing CLA neurons in our dataset is 4/46 tested neurons. Assuming all of these neurons project to RSP, they would have accounted for 4/32 CLA-RSP neurons. Given the rate of monosynaptic connectivity observed in this study, these neurons would only contribute 2-3 additional connected neurons. Therefore, the exclusion of these neurons does not significantly impact the overall statistical accuracy of our connectivity findings.

      In Figure 5J the lack of difference in the EPSC-IPSC timing in the RSP is likely due to 1 outlier EPSC at 30 ms which is most likely reflecting polysynaptic communication. Therefore, I do not feel the argument being made here with differences in physiology is particularly striking.

      We thank the reviewer for their attention to detail about this analysis. We have performed additional statistics and found that leaving this neuron out does not affect the significance of the results (new p-value = 0.158, original p-value = 0.314, Mann-Whitney U test). We have removed this datapoint from the figure and our analysis.

      In the text describing Figure 5, the authors state "These experiments point to a complex interaction ....likely influenced by cell type of CLA projection and intraclaustral modules in which they participate". How does this slice experiment stimulating axons from one input relate to different CLA cell types or intra-claustrum circuits? I don't follow this argument.

      We have removed this speculation from the Results section.

      In Figure 6G and H, the blank condition yields a result similar to many of the sensory stimulus conditions. This blank condition (when no stimulus was presented) serves as a nice reference to compare the rest of the conditions. However, the remainder of the stimulation conditions were not adjusted relative to what would be expected by chance. For example, the response of each cell could be compared to a distribution of shuffled data, where time-series data are shuffled in time by randomly assigned intervals and a surrogate distribution of responses generated. This procedure is repeated 200-1000x to generate a distribution of shuffled responses. Then the original stimulus-triggered response (1s post) could be compared to shuffled data. Currently, the authors just compare pre/post-mean data using a Mann-Whitney test from the mean overall response, which could be biased by a small number of trials. Therefore, I think a more conservative and statistically rigorous approach is warranted here, before making the claim of a 20% response probability or 50% overall response rate.

      We appreciate the reviewer's thorough analysis and suggestion for a more conservative statistical approach. We acknowledge that responses on blank trials occur about 10% of the time, indicating that response probabilities around this level may not represent "real" responses. To address this, we will include the responses to the blank condition in the manuscript (lines 505-509). This will allow readers to make informed decisions based on the presented data.
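
      For illustration, the shuffle procedure suggested by the reviewer can be sketched as below. This is a minimal sketch and not the analysis used in the manuscript: the function name `shuffle_test`, its parameters, the use of a circular shift to build the surrogate distribution, and the one-sided p-value are all assumptions made for the example.

```python
import numpy as np


def shuffle_test(trace, stim_frames, post_frames, n_shuffles=1000, seed=0):
    """Compare a mean post-stimulus response with a circular-shift null.

    trace       : 1-D activity trace (e.g. dF/F) for one ROI (hypothetical input)
    stim_frames : frame indices of stimulus onsets
    post_frames : number of frames in the post-stimulus window
    """
    rng = np.random.default_rng(seed)
    trace = np.asarray(trace, dtype=float)
    stim_frames = np.asarray(stim_frames)

    def mean_response(x):
        # Average the signal in the window following each stimulus onset.
        return float(np.mean([x[s:s + post_frames].mean() for s in stim_frames]))

    observed = mean_response(trace)

    # Shift the whole trace by a random offset: this breaks the alignment to
    # the stimulus while preserving the temporal structure of the signal.
    shifts = rng.integers(post_frames, trace.size - post_frames, size=n_shuffles)
    null = np.array([mean_response(np.roll(trace, k)) for k in shifts])

    # One-sided p-value with the usual +1 correction for permutation tests.
    p_value = (1 + np.sum(null >= observed)) / (n_shuffles + 1)
    return observed, null, p_value
```

      Under these assumptions, an ROI would be called responsive only if its observed response exceeds most of the surrogate distribution, rather than on the basis of a pre/post comparison alone.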

      Regarding Figure 6, a more conventional way to show sensory responses is to display a heatmap of the z-scored responses across all ROIs, sorted by their post-stimulus response. This enables the reader to better visualize and understand the claims being made here, rather than relying on the overall mean which could be influenced by a few highly responsive ROIs.

      We apologize to the reviewer that our data in this figure was challenging to interpret. We have included an additional supplemental figure (Supp. Fig. S15) that displays the requested information.

      For Figure 6, it would also help to display some raw data showing responses at the single ROI level and the population level. If these sensory stimulations are modulating claustrum neurons, then this will be observable on the mean population vector (averaged df/f across all ROIs as a function of time) within a given experiment and would add support to the conclusions being made.

      We appreciate the reviewer’s desire to see more raw data – we would have included this in the figure given more space. However, the average df/f across all ROIs is shown as a time series with 95% confidence intervals in Fig. 6D.

      As noted by the authors, there is substantial evidence in the literature showing that motor activity arises in mice during these types of sensory stimulation experiments. It is foreseeable that at least some of the responses measured here arise from motor activity. It would be important to identify to what extent this is the case.

      While we acknowledge that some responses may arise from motor-related activity, addressing this comprehensively is beyond the scope of this paper. Given the extensive number of trials and recorded axonal segments, we believe that motor-related activity is unlikely to significantly impact the average response across all trials. Future studies focusing specifically on motor activity during sensory stimulation experiments would be needed to elucidate this aspect in detail.

      All claims in the results for Figure 6 such as "the proportion of responsive axons tended to be highest when stimuli were combined" should be supported by statistics.

      We have provided additional statistics in this section (lines 490-511) to address the reviewer’s comment.

      In Figure 7, the authors state that mice learned the structure of the task. How is this the case, when the number of misses is 5-6x greater than the number of hits on audiovisual trials (S Figure 19)? I don't get the impression that mice perform this task correctly. As shown in Figure 7I, the hit rate is exceptionally low on the audiovisual port in controls. I just can't see how control and lesion mice can have the same hit rate and false alarm rate yet have different d'. Indeed, I might be missing something in the analysis. However, given that both groups of mice are not performing the task as designed, I fail to see how the authors' claim regarding multisensory integration by the claustrum is supported. Even if there is some difference in the d' measure, what does that matter when hits are the least likely trial outcome here for both groups?

      We thank the reviewer for their comments and hope the following addresses their confusion about the performance of animals during our multimodal conditioning task.

      Firstly, as pointed out by the reviewer, the hit-rate (HR) is lower than false-alarm-rate (FR) but crucially only when assessed explicitly within-condition (e.g. just auditory or just visual stimulation). Given the multimodal nature of the assay, HR and FR could also be evaluated across different trials, unimodal and multimodal, for both auditory and visual stimuli. Doing so resulted in a net positive d', as observed by the reviewer. From this perspective, and as documented in the Methods (Multimodal Conditioning and Reversal Learning) and Supplemental Figures, mice do indeed learn the conditioning task and perform at above-chance levels.

      Secondly, as raised in the Discussion, an important caveat of this assay was that it was unnecessary for mice to learn the task structure explicitly but, rather, that they respond to environmental cues in a reward-seeking manner that indicated perception of a stimulus. "Performance" as it is quantified here demonstrates a perceptual difference between conditions that is observed through behavioral choice and timing, not necessarily the degree to which the mice have an understanding of the task per se.

      In the discussion, it is stated that "While axons responded inconsistently to individual stimulus presentations, their responsivity remained consistent between stimuli and through time on average...". I do not understand this part of the sentence. Does this mean axons are consistently inconsistent?

      The reviewer’s interpretation is correct – although recorded axons tended to have a preferred stimulus or combination of stimuli, they displayed variability in their responses (response probability), though little or no variability in their likelihood to respond over time (on average).

      In the discussion, the authors state their axon imaging results contrast with recent studies in mice. Why not actually do the same analysis that Ollerenshaw did, so this statement is supported by fact? As pointed out above, the criteria used to classify an axon as responsive to stimuli were very liberal in this current manuscript.

      While we appreciate this comment from the reviewer, we feel that it was not necessary to perform analyses similar to those of Ollerenshaw et al. in order to appreciate that methodological differences between these studies would have confounded any comparisons made, as we note in the Discussion.

      I find the discussion wildly speculative and broad. For example, "the integrative properties of the CLA could act as a substrate for transforming the information content of its inputs (e.g. reducing trial-to-trial variability of responses to conjunctive stimuli...)". How would a claustrum neuron responding with 10% reliability to a stimulus (or set of stimuli) play any role in reducing trial-to-trial variability of sensory activity in the cortex?

      We thank the reviewer for their feedback. We acknowledge the reviewer's concern regarding the speculative nature of our discussion. To address the specific point raised, while a neuron with a 10% reliability might appear limited in reducing trial-to-trial variability in sensory activity, it's possible that such neurons are responsive to a combination of stimuli or conditions not fully controlled or recorded in our current setup. For instance, variables like the animal’s attentional or motivational states could influence the responsiveness of claustrum neurons, thus integrating these inputs could theoretically modulate cortical processing. We have refined this section to clarify these points (now lines 810-813).

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Shelton et al. explore the organization of the Claustrum. To do so, they focus on a specific claustrum population, the one projecting to the retrosplenial cortex (CLA-RSP neurons). Using an elegant technical approach, they first described electrophysiological properties of claustrum neurons, including the CLA-RSP ones. Further, they showed that CLA-RSP neurons (1) directly excite other CLA neurons, in a 'projection-specific' pattern, i.e. CLA-RSP neurons mainly excite claustrum neurons not projecting to the RSP and (2) receive excitatory inputs from multiple cortical territories (mainly frontal ones). To confirm the 'integrative' property of claustrum networks, they then imaged claustrum axons in the cortex during single- or multi-sensory stimulations. Finally, they investigated the effect of CLA-RSP lesion on performance in a sensory detection task.

      Strengths:

      Overall, this is a really good study, using state-of-the-art technical approaches to probe the local/global organization of the Claustrum. The in-vitro part is impressive, and the results are compelling.

      We thank the reviewer for their positive appraisal of our work.

      Weaknesses:

      One noteworthy concern arises from the terminology used throughout the study. The authors claimed that the claustrum is an integrative structure. Yet, integration has a specific meaning, i.e. the production of a specific response by a single neuron (or network) in response to a specific combination of several input signals. In this study, the authors showed compelling results in favor of convergence rather than integration. On a lighter note, the in-vivo data are less convincing, and do not entirely support the claim of "integration" made by the authors.

      We thank the reviewer for their clarity on this issue. We absolutely agree that without clear definition in the study, interpretation of our data could be misconstrued for one of several possible meanings. We have updated our Introduction, Results, and Discussion text to reflect the definition of ‘integration’ we used in the interpretation of our work and hope this clarifies our intent to the reader.

      Reviewer #3 (Public Review):

      The claustrum is one of the most enigmatic regions of the cerebral cortex, with a potential role in consciousness and integrating multisensory information. Despite extensive connections with almost all cortical areas, its functions and mechanisms are not well understood. In an attempt to unravel these complexities, Shelton et al. employed advanced circuit mapping technologies to examine specific neurons within the claustrum. They focused on how these neurons integrate incoming information and manage the output. Their findings suggest that claustrum neurons selectively communicate based on cortical projection targets and that their responsiveness to cortical inputs varies by cell type.

      Imaging studies demonstrated that claustrum axons respond to both single and multiple sensory stimuli. Extended inhibition of the claustrum significantly reduced animals' responsiveness to multisensory stimuli, highlighting its critical role as an integrative hub in the cortex.

      However, the study's conclusions at times rely on assumptions that may undermine their validity. For instance, the comparison between RSC-projecting and non-RSC-projecting neurons is problematic due to potential false negatives in the cell labeling process, which might not capture the entire neuron population projecting to a brain area. This issue casts doubt on the findings related to neuron interconnectivity and projections, suggesting that the results should be interpreted with caution. The study's approach to defining neuron types based on projection could benefit from a more critical evaluation or a broader methodological perspective.

      We thank the reviewer for their attention to the methods used in our study. We acknowledge that there is an inherent bias introduced by false negatives as a result of incomplete labeling but contend that this is true of most modern tracing experiments in neuroscience, irrespective of the method used. Moreover, if false-negative biases are affecting our results, then they likely do so in the direction of supporting our findings: perfect knowledge of claustrum connectivity would likely enhance the effects seen by increasing the pool of neurons for which we find an effect. For example, our cortico-claustral connectivity findings in Figure 3 would likely have shown even larger effects had false-negative CLA-RSP neurons been positively identified.

      Where appropriate we have provided estimates of variability and certainty in our experimental findings and do not claim any definitive knowledge of the true rate and scope of claustrum connectivity.

      Nevertheless, the study sets the stage for many promising future research directions. Future work could particularly focus on exploring the functional and molecular differences between E1 and E2 neurons and further assess the implications of the distinct responses of excitatory and inhibitory claustrum neurons for internal computations. Additionally, adopting a different behavioral paradigm that more directly tests the integration of sensory information for purposeful behavior could also prove valuable.

      We thank the reviewer for their outlook on the future directions of our work. These avenues for study, we believe, would be very fruitful in uncovering the cell-type-specific computations performed by claustrum neurons.

      Recommendations for the authors:

      Reviewing Editor (Recommendations for the Authors):

      The editor recommends addressing the issues raised by the reviewers about the statistical significance of sensory response with respect to blank stimuli, and solving the issue generated by the exclusion of monosynaptically connected neurons in the connectivity study, to raise the assessment strength of evidence from incomplete to solid. Moreover, as the reported result stands, the behavioral task does not seem to be learned by the animals as the animals are above chance for visual and auditory but largely below chance level for multisensory. It seems that the animals do not perform a multisensory task. The authors should clarify this.

      Reviewer #1 (Recommendations For The Authors):

      Several references were missing from the manuscript, where mouse CLA-retrosplenial or CLA-frontal neurons were investigated and would be highly relevant to both the discussion of claustrum function and the context of the methodologies used here (Wang et al., 2023 Nat Comm; Nair et al., 2023 PNAS; Marriott et al., 2024 Cell Reports; Faig et al., 2024 Current Biology).

      Reviewer #2 (Recommendations For The Authors):

      Let me be clear, this is an excellent study, using state-of-the-art technical approaches to probe the local/global organization of the Claustrum. However, the study is somehow disconnected, with a fantastic in-vitro part, and, in my opinion, a less convincing in-vivo one.

      As stated in the public review, I'm concerned about the use of the term "integration", as, in my opinion, the data presented in this study (which I repeat are of excellent level) do not support that claim.

      Below are my main points regarding the article:

      (1) My main comment relates to the use of the term 'integration'. It might be a semantic debate, but I think that this is an important one. In my opinion, neural integration is the "summing of several neural input signals by a single neuron to produce an output signal that is some function of those inputs". As the authors state in the discussion, they were not able to "assess the EPSP response magnitude to the conjunction of stimuli due to photosensitivity of ChrimsonR opsins to blue light". Therefore, the authors did not specifically prove integration, but rather input convergence. This does not mean that the results presented are not important or of excellent quality, but I encourage the authors to either tone down the part on integration or to give a clear definition of what they call integration.

      (2) The in vivo imaging data are somehow confusing. First, the authors image two claustral populations simultaneously (the CLA-RSP and the CLA-ACA axons). I may be missing the information, but there is no evidence that these cells overlap in the CLA (no data in the supplement and existing literature only support partial overlap). Second, in the results part, the authors claim that 96% of the sensory-responsive axons displayed multisensory response. This, combined with the 47% of axons responsive to at least one stimulus should lead to a global response of around 45% of the axons in multisensory trials. Yet, in Figures 6F-G, one can see that the response probability is actually low (closer to 20%). To be honest, I cannot really understand how to make sense of these results. At first, I thought that most of the multisensory responsive axons show no response during multisensory stimulus (but one in the unimodal stimulus). This hypothesis is however unlikely, as response AUC is biased toward positivity in Figure 6H. Overall, I'm not totally convinced by the imaging data, and I think that the authors should be more cautious about interpreting their results (as they are in the discussion part, but less in the results part).

      (3) The TetTox approach used in the study ablates all neurons expressing the CRE in the CLA. If the hypothesis proposed by the authors is true, then ablating one subpopulation should not greatly impact the functioning of the whole CLA, as other neurons will likely "integrate" information coming from multiple cortices (Figures 3 and 4), and the local divergence (Figure 1) will then allow the broadcasting of this information back to multiple cortices. Do the authors think that such an approach deeply modified intra-claustral network connectivity? If this is not the case, shouldn't we expect less effect after lesioning a specific sub-population of CLA neurons?

      (4) The behavioral protocol is also confusing. If I understand correctly, the aim of the task was to probe the D-Prime factor, as all trials, whatever the response of the animal are rewarded. From the Figure 7I, one can see that the mice cannot properly answer to the audiovisual cues, clearly indicating that both groups show impaired response to this type of trial. The whole conclusion of the authors is therefore drawn from the D-Prime calculation. However, even if D-Prime should represent a measure of sensitivity (i.e. is unaffected by response bias), two assumptions need to be met: (1) the signal and noise distributions should be both normal, and (2) the signal and noise distributions should have the same standard deviation. However, these assumptions cannot be tested in the task used by the authors (one would need rating tasks). The authors might want to use nonparametric measures of sensitivity such as A' (see Pollack and Norman 1964).
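
      To make the two sensitivity measures discussed in point (4) concrete, the sketch below computes d' as the difference of z-transformed hit and false-alarm rates, and A' with a widely used computing formula. It is an illustration only: the function names, the clipping constant `eps`, and the example rates are assumptions and are not taken from the manuscript or its data.

```python
from statistics import NormalDist


def d_prime(hit_rate, fa_rate, eps=1e-3):
    """Parametric sensitivity: z(hit rate) - z(false-alarm rate)."""
    # Clip rates away from 0 and 1, where the z-transform is undefined.
    def clip(p):
        return min(max(p, eps), 1 - eps)

    z = NormalDist().inv_cdf
    return z(clip(hit_rate)) - z(clip(fa_rate))


def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity: 0.5 is chance, 1.0 is perfect."""
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    # Below-chance case (more false alarms than hits), mirrored around 0.5.
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))


# Hypothetical rates, for illustration only.
print(d_prime(0.8, 0.2), a_prime(0.8, 0.2))
```

      Unlike d', A' does not rely on the assumption of normal, equal-variance signal and noise distributions, which is the motivation for the reviewer's suggestion.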

      Reviewer #3 (Recommendations For The Authors):

      While the study is comprehensive, some of its conclusions are based on assumptions that potentially weaken their validity. A significant issue arises in the comparison between neurons that project to the retrosplenial cortex (RSC) and those that do not. This differentiation is based on retrograde labeling from a single part of the RSC. However, CTB labeling, the technique used, does not capture 100% of the neurons projecting to a brain area. The study itself demonstrates this by showing that injecting the dye into three sections of the RSC results in three overlapping populations of neurons in the claustrum. Therefore, limiting the injection to just one of these areas inevitably leads to many false negatives, i.e. neurons that project to the RSC but are not marked by the CTB. This issue recurs in the analysis of neurons projecting to both the RSC and the prelimbic cortex (PL), where assumptions about interconnectivity are made without a thorough examination of overlap between these populations. The incomplete labeling complicates the interpretation of the data and makes it difficult to draw firm conclusions from it.

      Minor.

      There is a reference to Figure 1D where claustrum->cortical connections are described. This should be 5D.

      This is a correct reference pointing back to our single-cell characterizations of CLA morphoelectric types.

      End of Page 22. Implies should be imply.

      This has been resolved in the manuscript text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting and valuable study that uses multiple approaches to understand the role of bursting involving voltage-gated calcium channels within the mediodorsal thalamus in the sedative-hypnotic effects of alcohol. Given its unique functional roles and connectivity pattern, the idea that the mediodorsal thalamus may have a fundamental role in regulating alcohol-induced transitions in consciousness state would be both important for researchers investigating thalamocortical dynamics and more broadly interesting for understanding brain function. In addition, the authors' examination of the role of the voltage-gated calcium channel Cav3.1 provides some evidence that burst-firing mediated by this channel in the thalamus is functionally important for behavioral-state transitions. While many previous studies have suggested an analogous role for sleep-state regulation, the evidence for an analogous role of this type of bursting in sedative-induced transitions is more limited. Despite the importance of these results, however, there is some concern that the manipulations and recording approaches employed by the authors may affect other thalamic nuclei adjacent to the MD, such as the central lateral nucleus, which has also been implicated in controlling state transitions. The evidence for a specific role of the mediodorsal thalamus is therefore somewhat incomplete, and so additional validation is needed.

      Strengths:

      This study employs multiple, complementary research approaches including behavioral assays, shRNA-based localized knockdown, single-unit recordings, and patterned optogenetic interventions to examine the role of activity in the mediodorsal thalamus in the sedative-hypnotic effects of alcohol. Experiments and analyses included in the manuscript generally appear well conceived and are also generally well executed. Sample sizes are sufficiently large and statistical analysis appears generally appropriate though in some cases additional quantification would be helpful. The findings presented are novel and provide some interesting insight into the role of the thalamus as well as voltage-gated calcium channels within this region in controlling behavioral state transitions induced by alcohol. In particular, the observed effects of selective knockout along with recordings in total knockout of the voltage-gated calcium channel, Cav3.1, which has previously been implicated in bursting dynamics as well as state transitions, particularly in sleep, together suggest that the transition of thalamic neurons to a bursting pattern of firing from a more constant firing is important for transition to the sedated state produced by ethanol intoxication. While previous studies have similarly implicated Cav3.1 bursting in behavioral state transitions, the direct optogenetic interventions and single-unit recordings provide valuable new insight. These findings may also have interesting implications for the relationship between sleep process disruption associated with ethanol dependence, although the authors do not appear to examine this directly or extensively discuss these implications of their findings.

      Weaknesses:

      A key claim of the study is that the mediodorsal thalamus is specifically important for the sedative-hypnotic effect of ethanol and that a transition to a bursting pattern of firing in this circuit facilitates these effects due to a loss of a more constant tonic firing pattern. Despite the generally clear observed effects across the included experiments, however, the evidence presented does not fully support that the mediodorsal thalamus, in particular, is involved. This distinction is important because some previous studies have suggested that another thalamic nucleus very close to the mediodorsal thalamus, the central-lateral thalamus, plays a role in preventing sedative-induced transitions. Despite its proximity to the mediodorsal thalamus, the central-lateral thalamus has a substantially different pattern of connectivity, so distinguishing which region is impacted is important for understanding the findings in the manuscript. While shRNA knockdown appears to be largely centered in the mediodorsal thalamus in the example shown (Figure 2), this is rather minimal evidence and it is also not well explained (indeed, the relevant panels do not even appear to be referenced in the text of the manuscript), and the consistency of the knockdown targeting is not quantified. Additional evidence should be provided to validate this approach. Similarly, while an example is shown for the expression of ChR2 (Fig. 5), there seems to be some spread of expression outside of the mediodorsal thalamus even in this example, raising a concern about how regionally specific this effect is.

      The recordings targeting the mediodorsal thalamus could provide evidence of a direct association between changes in activity specifically in this part of the thalamus with the behavioral measures but there are currently some issues with making this link. One difficulty is that, although lesions are shown in Figure S5 to validate recording locations, this figure is relatively unclear and the examples appear to be taken from a different anterior/posterior location compared to the reference diagram. A larger image and improved visualization of the overall set of lesion locations that includes multiple anterior/posterior coronal sections would be helpful. Moreover, even for these example images, it is difficult to evaluate whether these are in the mediodorsal thalamus, particularly given the small size of the image shown. Ideally, an example image that is more obviously in the mediodorsal thalamus would also be included. Finally, an assessment of the relationship between the approximate locations of recorded neurons across the tetrode arrays and the behavioral measures would be very helpful in supporting the unique role of the mediodorsal thalamus. The lack of these direct links, in combination with the histological issues, reduces the insight that can be gained from this study.

      In addition to the key experimental issues mentioned above, there are often problems in the text of the manuscript with reasoning or at least explanation as well as numerous minor issues with editing. The most substantial such issue is the lack of clarity in discussing the mediodorsal thalamus and other adjacent thalamic nuclei, such as the central-lateral nucleus, in the authors' discussion of previous findings. Given that at least one of the manuscripts cited by the authors (Saalmann, Front. Sys. Neuro. 2014) has directly claimed that central-lateral, rather than the mediodorsal, thalamus is important for arousal regulation related to a conscious state, this distinction should be addressed clearly in the discussion rather than papered over by grouping multiple thalamic nuclei as being medial. As part of this discussion, it would be important to consider additional relevant literature including Bastos et al., eLife, 2021 and Redinbaugh et al., Neuron, 2020 which are quite critical but currently do not appear to be cited. Considering additional literature relevant to the function of the mediodorsal thalamus would also be beneficial. While the methods employed generally seem sound, the description in the methods section is lacking in detail and is often difficult to follow. Analysis methods such as the burst index appear to only be given a brief explanation in the text and appear not to be mentioned in the methods section. Similarly, the staining method used in Figure 2 does not appear to be described in the methods section. The most substantial case is for the UMAP approach used in Figure 4-E which does not appear to be described in the methods or even described in the main text. The lack of detailed descriptions makes it difficult to evaluate the applicability and quality of the experimental and analytical approaches. Citations justifying the use of methods such as the approach to separate regular spiking and narrow spiking neuron subtypes are also needed.

      Beyond the problems with content and reasoning discussed above, there are also some relatively minor issues with the clarity of writing throughout the paper. For example, in the abstract the authors refer to "the ethanol resistance behavior in WT mice" but it is difficult to parse what they mean by this statement. Similarly, the next sentence "These results support that the maintenance..." while clearer, is not well phrased. Though individually minor, issues like this recur throughout the manuscript and sometimes make it difficult to follow, so the text should be revised to correct them. There are also some problems with labels such as the labels of A1/A2 in Figure 4, which appear to be incorrect. Also, S7 has no label on the B panels. Finally, some references are not included (only a label of [ref]).

      Reviewer #2 (Public Review):

      In the current study, Latchoumane and collaborators focus on the Cav3.1 calcium channels in the mediodorsal thalamic nucleus as critical players in the regulation of brain-states and ethanol resistance in mice. By combining behavioural, electrophysiological, and genetic techniques, they report three main findings. First, KO Cav3.1 mice exhibit resistance to ethanol-induced sedation and sustained tonic firing in thalamocortical units. Second, knocked-down Cav3.1 mice reproduce the same behaviour when the mediodorsal, but not the ventrobasal, thalamic nucleus is targeted. Third, either optogenetic or electric stimulation of the mediodorsal thalamus reduces ethanol-induced sedation in control animals.

      Overall, the study is well designed and performed, correctly controlled for confounds, and properly analysed. Nonetheless, it is important to address some aspects of the report. The results support the conclusions of the study. These results are likely to be relevant in the field of systems neuroscience, as they increase the molecular evidence showing how the thalamus regulates brain states.

      Reviewer #1 (Recommendations For The Authors):

      Aside from the additional quantification and clarification of the analysis discussed in the weakness section, in general, the experiments included in the manuscript seem reasonable. However, I would suggest one additional experiment as well as one control, both of which are relatively straightforward optogenetic experiments, that I feel would be helpful to further improve the study. First, as the authors note, the optogenetic interventions used do not directly address the relevance of the changes in bursting patterns observed in the knockout (KO), which are by far the most robust effect, with the changes in alcohol sensitivity. One approach that could help address this would be to use patterned suppression via inhibitory opsins (e.g. halorhodopsin) to "rescue" the periods of inhibition associated with bursting in the KO. Localizing this inhibition to the mediodorsal thalamus would also lend further credence to their claim that this nucleus is the relevant circuit for their observed effects. For the control, tonic activation of the ventrobasal nucleus, as the authors did for the mediodorsal nucleus, would be beneficial to rule out the possibility that the observed effect would occur with any thalamic nucleus. In addition to these experiments, I did not note the strategy for sharing data obtained through this study so this should be added.

      R1-1: A key claim of the study is that the mediodorsal thalamus is specifically important for the sedative-hypnotic effect of ethanol and that a transition to a bursting pattern of firing in this circuit facilitates these effects due to a loss of a more constant tonic firing pattern. Despite the generally clear observed effects across the included experiments, however, the evidence presented does not fully support that the mediodorsal thalamus, in particular, is involved. This distinction is important because previous studies have suggested that another thalamic nucleus very close to the mediodorsal thalamus, the central-lateral thalamus, plays a role in preventing sedative-induced transitions. Despite its proximity to the mediodorsal thalamus, the central-lateral thalamus has a substantially different pattern of connectivity so distinguishing which region is impacted is important for understanding the findings in the manuscript.

      R1-A1: The reviewer is right that CL has been proposed as another candidate structure with a causal influence on arousal and consciousness. We restricted our analysis to single units recorded from tetrodes located in the MD, identified using the lesion code explained in the Methods section and in our response to R1-3. We also quantified the Cav3.1 knock-down, which demonstrates that the KD was specific to the MD, bilaterally, and that CL and CM were minimally impacted by the knock-down (Fig. 2C and D). Moreover, the optogenetic stimulation (fiber incidence of 30 degrees, guaranteeing central rather than lateral coverage; fiber optic NA = 0.22) and the electric stimulation (bipolar twisted electrodes, 50 µA) were also selective and specific to the MD (Fig. S5). MD might not be the sole structure involved in brain state control towards sedation and “anesthetic states”, and CL might be a significant contributor as well; however, we show that CL involvement was largely irrelevant in our experiments (Fig. 2, S5, S9 and S11).

      R1-2: While sh-RNA knockdown appears to be largely centered in the mediodorsal thalamus in the example shown, (Figure 2) this is rather minimal evidence and it is also not well explained (indeed, the relevant panels do not even appear to be referenced in the text of the manuscript) and the consistency of the knockdown targeting is not quantified. Additional evidence should be provided to validate this approach.

      R1-A2: To address this important question, we have added a quantification panel to Fig. 2D. We quantified the intensity per area of Cav3.1 expression in subzones of 4 regions of interest: MD (left, right; 2 subzones each), Centro Medial (CM; 1 subzone in total), Centrolateral/Paraventricular nucleus (CL/PCN; left, right; 2 subzones each) and the submedial nucleus (SMT; left, right; used as a control for the intensity normalization; 1 subzone in total). This panel clearly illustrates that MD was knocked down bilaterally (p<0.001). CM (p<0.05) and CL (p<0.01) were also partially and unilaterally knocked down. This analysis confirms that our KD had a high specificity to MD.

      We added the relevant figure caption and text:

      [Result section, Cav3.1 silencing in the MD, but not VB, increased ethanol resistance in mice, paragraph 3]

      “We then characterized the change in Cav3.1 expression following the shControl and shCav3.1 knockdown injections in three test regions, MD (left and right), CM (centromedial nucleus) and CL (centrolateral nuclei, left and right side), and a negative control region, SMT (submedial thalamic nuclei, left and right side). The average intensity was obtained from two coronal brain slices for each mouse used in the experiment (see Methods sections, Cav3.1 Intensity quantification). Our results show that the targeting of the knockdown was very specific to the bilateral MD (p<0.001; Fig. 2D). We also observed a knock-down of the CM (p<0.05) and a marginal unilateral knock-down of the CL (p<0.01). Notably, we tested the correlation between the level of knock-down in MD and the total time in LOM and observed a significant association (Fig. 2D inset; R = 0.599, p = 0.018). This result highlights that the Cav3.1 knock-down was specific to MD and with an intensity associated with ethanol-induced loss of motion.”

      R1-3: One difficulty is that, although lesions are shown in Figure S5 to validate recording locations, this figure is relatively unclear and the examples appear to be taken from a different anterior/posterior location compared to the reference diagram. A larger image and improved visualization of the overall set of lesion locations that includes multiple anterior/posterior coronal sections would be helpful. Moreover, even for these example images, it is difficult to evaluate whether these are in the mediodorsal thalamus, particularly given the small size of the image shown. Ideally, an example image that is more obviously in the mediodorsal thalamus would also be included. Finally, an assessment of the relationship between the approximate locations of recorded neurons across the tetrode arrays and the behavioral measures would be very helpful in supporting the unique role of the mediodorsal thalamus.

      R1-A3: For Fig. S5, we redistributed the recording positions, derived from the tetrode lesion sites, over 3 representative coronal planes that best represent the implant positions. We also provided additional snapshots of tetrode locations. To identify the positions of the four tetrodes in each animal, we encoded each position with a different electrical lesion pattern: 1 lesion (tetrode 1); 2 lesions made while withdrawing the tetrode, at a 100 µm interval (tetrode 2); 3 lesions at 200 µm intervals (tetrode 3); and 4 lesions at 50 µm intervals (tetrode 4). Tetrodes found outside the delimited MD region were discarded post analysis. Unfortunately, a direct relationship between the location of individual recorded neurons and the behavioral measures cannot be established with tetrode recordings; a straight silicon probe, which maintains the spatial spacing between recording sites, would have been a better approach in that case, but it was not used in our study.

      R1-4: In addition to the key experimental issues mentioned above, there are often problems in the text of the manuscript with reasoning or at least explanation as well as numerous minor issues with editing. The most substantial such issue is the lack of clarity in discussing the mediodorsal thalamus and other adjacent thalamic nuclei, such as the central-lateral nucleus, in the authors' discussion of previous findings. Given that at least one of the manuscripts cited by the authors (Saalman, Front. Sys. Neuro. 2014) has directly claimed that central-lateral, rather than the mediodorsal, thalamus is important for arousal regulation related to a conscious state, this distinction should be addressed clearly in the discussion rather than papered over by grouping multiple thalamic nuclei as being medial. As part of this discussion, it would be important to consider additional relevant literature including Bastos et al., eLife, 2021 and Redinbaugh et al., Neuron, 2020 which are quite critical but currently do not appear to be cited. Considering additional literature relevant to the function of the mediodorsal thalamus would also be beneficial.

      R1-A4: We thank the reviewer for these comments and suggestions. We agree that the additional references mentioned by the reviewer are highly relevant and should be integrated in the manuscript. We have integrated the above-mentioned references and further developed the discussion of the role of MD relative to other thalamic nuclei (ILN and CL in particular). We believe that this better-referenced and clarified text greatly improves the manuscript.

      [introduction section, paragraph 3]

      “The centrolateral (CL) thalamic nucleus has been implicated in the modulation of arousal, behavior arrest 31, and improvement of level of consciousness during seizures 32. Notably, the direct electrical stimulation of the intralaminar nuclei (ILN) and, in particular CL, promoted hallmarks of arousal and awakening in primate under propofol and ketamine propofol anesthesia.”

      [Discussion section, paragraph 1]

      “In this work, we identified that the neural activity in MD plays a causal role in the maintenance of consciousness. Whole body Cav3.1 KO and MD-specific Cav3.1 KD mice showed resistance to loss of consciousness induced by hypnotic dose of ethanol. In WT mice, MD neurons demonstrated a reduced firing rate in natural (sleep) and ethanol-induced unconscious states compared to awake states. This neural activity reduction was impaired in KO mice. In particular, transition to an unconscious state was accompanied with a switch of firing mode from tonic firing to burst firing in WT mice whereas this modeshift disappeared in KO mice. Finally, optogenetic or electric stimulations of the MD after ethanol injection were sufficient to induce a resistance to loss of motion, supporting that the level of neural firing in the MD is critical to maintain conscious state and delay unconscious state. We showed that the expression of Cav3.1 t-type calcium channels in MD is a cellular modulator associated with this effect.”

      [Discussion section, MD is a modulator of consciousness, paragraph 2 and 3]

      “The MD is known to innervate limbic region, basal ganglia and medial prefrontal cortex 50 and increased activity in MD might modulate the stability of cortical UP states (e.g. awaken, aroused and attentive states) and synchronization 9,26. Thus, MD might be a major hub involved in cortical state control and brain state stabilization.

      Supporting the brain state stabilization theory and the ethanol resistance of Cav3.1 mutants, Choi et al.34 demonstrated that the loss of Cav3.1 T-type calcium channel reduced the bilateral coherence between PFC and MD under ketamine anesthesia and ethanol hypnosis, especially in the delta frequency bands. More importantly, under propofol anesthesia, Bastos et al.35 showed that intralaminar nucleus and MD stimulation led to increased wake-up subscore and arousal, together with an increase in cortico-cortical and thalamo-cortical slow (delta) frequency power.

      In the present study, we observed that Cav3.1 KD in the MD (Fig. 2A), but not in the VB (Fig. S3), increased ethanol resistance in mice and was associated with it (Fig. 2D). We found that MD neurons in Cav3.1 mutant mice exhibited tonic firing within the range of wakefulness (Fig. 3 and 4), indicative of resistance to ethanol and a wake-like brain state. In addition, we found a strong association between the normalized tonic firing in MD and arousal across brain states (i.e. walk to wake to sleep states), supporting that MD tonic firing could be interpreted both as a thalamic readout and a modulator of the brain state 11 (Fig. 3). Finally, direct optogenetic and electric MD stimulation increased resistance to loss of consciousness in WT mice (Fig. 5 and Fig. S10). To our knowledge, this is the first report demonstrating the causal involvement of the mediodorsal thalamic nucleus in the modulation of wakefulness and the resistance to ethanol-induced loss of consciousness in mice.”

      R1-5: While the methods employed generally seem sound, the description in the methods section is lacking in detail and is often difficult to follow. Analysis methods such as the burst index appear to only be given a brief explanation in the text and appear not to be mentioned in the methods section.

      R1-A5: We have added a clear definition in the supplementary method following the original work used:

      [Supplementary Method section, Single Unit recording, sorting and analysis, last paragraph]

      “The bursting index was derived as described in (Royer et al. 2012). Namely, the burst index was estimated from the spike auto-correlogram (1-ms bin size) by subtracting the mean value between 40 and 50 ms (baseline) from the peak measured between 0 and 10 ms. Positive burst amplitudes were normalized to the peak and negative amplitudes were normalized to the baseline to obtain indexes ranging from −1 to 1.” We also edited its mention in the text for clarity:

      [Result section, Lack of Cav3.1 in MD neurons removes thalamic burst in NREM sleep, paragraph 2]

      “[…] and a clear reduction in total bursting represented as bursting index (Fig. 3-B; ratio of spikes count <10 ms and >50 ms based on auto-correlogram).”
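      The burst index definition quoted above from the supplementary methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the positive-lag-only auto-correlogram, and the use of the maximum bin as the "peak" are assumptions made for illustration; only the 1-ms bin size, the 0-10 ms peak window, the 40-50 ms baseline window and the normalization rule come from the quoted text.

```python
def autocorrelogram(spike_times_ms, max_lag_ms=50, bin_ms=1.0):
    """Count spike pairs by positive time lag (assumed layout: 1-ms bins up to 50 ms)."""
    n_bins = int(max_lag_ms / bin_ms)
    counts = [0] * n_bins
    times = sorted(spike_times_ms)
    for i in range(len(times)):
        for j in range(i + 1, len(times)):
            lag = times[j] - times[i]
            if lag >= max_lag_ms:
                break  # times are sorted, so later spikes are even further away
            counts[int(lag // bin_ms)] += 1
    return counts


def burst_index(acg, bin_ms=1.0):
    """Peak (0-10 ms) minus mean baseline (40-50 ms), normalized to lie in [-1, 1]."""
    peak_bins = acg[int(0 / bin_ms):int(10 / bin_ms)]
    base_bins = acg[int(40 / bin_ms):int(50 / bin_ms)]
    peak = max(peak_bins)                       # assumption: "peak" = largest bin in the window
    baseline = sum(base_bins) / len(base_bins)  # mean value of the baseline window
    amplitude = peak - baseline
    if amplitude > 0:
        return amplitude / peak       # positive amplitudes normalized to the peak
    if amplitude < 0:
        return amplitude / baseline   # negative amplitudes normalized to the baseline
    return 0.0


# Hypothetical spike trains (times in ms), for illustration only.
bursty = [t + d for t in range(0, 1000, 200) for d in (0, 3, 6)]
regular = list(range(0, 500, 45))
bursty_index = burst_index(autocorrelogram(bursty))
regular_index = burst_index(autocorrelogram(regular))
```

      Under these assumptions, a train made only of short-interval bursts gives an index near 1, while a train firing regularly at intervals of about 45 ms gives an index near −1.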

      R1-6: Similarly, the staining method used in Figure 2 does not appear to be described in the methods section.

      R1-A6: The staining method can be found in the supplementary method of the paper. [supplementary method, Immunohistochemistry]

      R1-7: The most substantial case is for the UMAP approach used in Figure 4-E which does not appear to be described in the methods or even described in the main text.

      R1-A7: Regarding the method, the UMAP approach is described in the supplementary method document [Uniform Manifold Approximation and Projection (UMAP)]. We believe that only a succinct description was needed here considering the extent of the analysis. Regarding the inserts in the main text, we agree that the main text was lacking the description of these results and we have amended the main text to reflect a clear description of this result and what it entails. The following paragraph was added:

      [Result section, Under ethanol, MD neurons lacking Cav3.1 show no burst and a wake state-like neural activity, second to last paragraph]

      “Finally, we asked whether the firing modes and properties (tonic firing rate, burst firing rate; see supplementary methods) of single MD neurons would form a distinct qualitative representation of “brain states” using a low-dimensional UMAP representation (Uniform Manifold Approximation and Projection42). We observed that for the awake and active (i.e. walk) states, the brain state representation formed two adjacent clusters in which wild-type and mutant neurons were intermingled (Fig. 4E, left panel). For the REM and NREM states, the wild-type neurons formed 2 additional interconnected clusters, whereas the mutant neurons tended to overlap with the clusters attributed to the “awake” brain state (Fig. 4E, second to left panel). Ethanol-induced fLOM, similarly to the REM and NREM clusters, was distinct from the awake clusters in wild-type mice and overlapped with the NREM clusters (Fig. 4E, third to left panel). Here also, mutant MD neurons showed overlap with the awake clusters rather than the “low consciousness” brain states. These results indicate that the firing mode and properties could define a brain state representation that shows distinctions in levels of consciousness. Moreover, the mutant showed a representation of “low consciousness” states overlapping with wild-type “awake” states, consistent with the hypothesis of resistance to loss of consciousness.”

      R1-8: Citations justifying the use of methods such as the approach to separate regular spiking and narrow spiking neuron subtypes are also needed.

      R1-A8: We have added two references related to the observation of the two subpopulations of spiking neurons [Schiff and Reyes, 2012; Destexhe, 2008].

      R1-9: Beyond the problems with content and reasoning discussed above, there are also some relatively minor issues with the clarity of writing throughout the paper (for example, in the abstract the authors refer to "the ethanol resistance behavior in WT mice" but it is difficult to parse what they mean by this statement).

      R1-A9: We addressed this issue by editing and revising the manuscript for clarity and flow.

      R1-10: Similarly, the next sentence "These results support the maintenance..." while clearer, is not well phrased. Though individually minor, issues like this re-occur throughout the manuscript and sometimes make it difficult to follow so the text should be revised to correct them.

      R1-A10: We thank the reviewer for highlighting this point. We have edited the overall text to improve clarity and flow.

      [abstract] 

      These results suggest that maintaining MD neural firing at a wakeful level is sufficient to induce resistance to ethanol-induced hypnosis in WT mice.

      R1-11: There are also some problems with labels such as the labels of A1/A2 in Figure 4, which appear to be incorrect.

      R1-A11: We noted this issue and have rectified the figure for clarity.

      R1-12: Also, S7 has no label on the B panels.

      R1-A12: We thank the reviewer for pointing out this omission. We have added the y-label on the panel for clarity.

      R1-13: Finally, some references are not included (only a label of [ref]).

      R1-A13: We have added the missing references and thank the reviewer for pointing that out.

      Additional comments

      R1-14: Aside from the additional quantification and clarification of the analysis discussed in the weakness section, in general, the experiments included in the manuscript seem reasonable. However, I would suggest one additional experiment as well as one control, both of which are relatively straightforward optogenetic experiments, that I feel would be helpful to further improve the study. First, as the authors note, the optogenetic interventions used do not directly address the relevance of the changes in bursting patterns observed in the knockout (KO), which are by far the most robust effect, with the changes in alcohol sensitivity. One approach that could help address this would be to use patterned suppression via inhibitory opsins (e.g. halorhodopsin) to "rescue" the periods of inhibition associated with bursting in the KO.

      R1-A14: The reviewer proposes an interesting experiment, which we attempted to perform but which poses several technical challenges. First, KO mice do not show burst firing because they lack the Cav3.1 low-threshold calcium channel. Therefore, under ethanol, even if a rhythmic inhibition exists that would activate Cav3.1 channels and cause a rebound burst, KO neurons cannot produce it. An optogenetic inhibition would thus only accentuate the total inhibition and could induce an overall decrease in MD firing, resulting in an increase in LOM features. Alternatively, we showed that in WT mice given a low ethanol dose (where LOM induction is harder), increased rhythmic inhibition significantly increases LOM duration and marginally decreases latency to LOM (Fig. S12), indicating that increased inhibition could indeed support the hypothesis: “the stronger the decrease in MD firing, the faster and longer the LOM.” The only caveat of using WT mice here is that optogenetic inhibition might also induce rebound bursts post-inhibition. Injecting bursts alone did not alter the response to ethanol (Fig. S10). These results point to the loss of firing in MD as a main factor for LOM, with a potential contribution of bursts that requires concurrent inhibition/loss of firing.

      We agree that inhibition in KO would further validate this hypothesis, controlling for the role of burst. We regret that we are not in the capacity to perform additional experiments involving the KO mice.

      R1-15: For the control, tonic activation of the ventrobasal nucleus, as the authors did for the mediodorsal nucleus, would be beneficial to rule out the possibility that the observed effect would occur with any thalamic nucleus.

      R1-A15: We agree with the reviewer that we could have added an additional region control to the gain/loss of function experiments. We would even go further and suggest that a better control nucleus would be a high order nucleus such as PO or an unrelated sensory relay nucleus such as LGN. VB, being a motor relay nucleus, could also mediate movement initiation, which could make the results hard to interpret. Since a complete control study of Cav3.1 KD in all thalamic nuclei is outside the scope of this study, we opted not to redo these experiments and to keep the focus of the manuscript on the manipulation of MD activity rather than on the various available thalamic nuclei. We also do not claim that MD is the sole center able to initiate a switch in the loss of consciousness, and a more in-depth study on that matter would clearly be needed.

      R1-16: In addition to these experiments, I did not note the strategy for sharing data obtained through this study so this should be added.

      R1-A16: We have uploaded data and code for most figures at the following repository and provided a clearer statement regarding data sharing. We thank the reviewer for pointing out this missing element.

      The link for the repository is the following:

      It contains:

      - Excel spreadsheet file of all behavior values, including the newly quantified Cav3.1 expression in MD/CL/SMT

      - Excel spreadsheet follow-up of all MD cells (single unit; tetrode) analyzed

      - Folders for all groups studied with representative figures showing EEG power over time and normalized activity (WT vs KO for 2, 3 and 4 g/kg; MDshKD vs shCTR, VBshKD vs shCTR; CHR2 NOSTIM vs STIM; ESTIM Groups and ARCH NOSTIM vs STIM)

      - A1G LORRvsLOM and OPEN FIELD Matlab data

      - Matlab and ImageJ Codes: single unit analysis, characterization, brain state characterization, sleep stages, LOM, open field analysis and statistical analysis.

      We have added the data sharing subsection in the acknowledgements:

      “Part of the analyzed data and codes are available on the open access platform, Mendeley:

      Latchoumane, Charles-francois (2024), “Mediodorsal thalamic nucleus mediates resistance to ethanol through Cav3.1 T-type Ca2+ regulation of neural activity”, Mendeley Data, V1, doi: 10.17632/7fr427426m.1

      Additional data (large size recording and images) can be provided upon reasonable request.”

      Reviewer #2 (Recommendations For The Authors):

      R2-1. Consciousness is a contentious subject. Even in humans, there is still intense research on the topic, not to mention animals, about which we still know very little. Moreover, consciousness is not quantified in this study, as there is no standard metric to do so. Accordingly, talking about 'modulation', 'transition', 'level', or 'reduction' of consciousness can be misleading. Hence, it is probably safer to strictly refer to brain-states and/or stages of the sleep-wake cycle in this study and reframe it entirely around these concepts.

      R2-A1. The reviewer raises an important point, and we appreciate it. We agree that the definition of consciousness is rather loose and arguably difficult to pinpoint. Here, we settle on a definition that relies on the loss of motion and loss of righting reflex. This definition is widely accepted as the “verified” state in which the absence of responsiveness (to continuous stimuli, inducing reflex or discomfort) is observed and uninterrupted by jerks and spurious movements. Additional metrics would be EMG recording to quantify atonia and EEG recording to detect the settling of a dominant slow-wave frequency (~4 Hz; ethanol-induced sedation at theta rhythm), as shown in Fig. S1A. The driver of this 4 Hz frequency and its correlation has been investigated previously (e.g. Choi et al, PNAS, 2012), leading to the accepted link between LOM/LORR and loss of consciousness. Our data have the advantage of including single neuron recordings and of showing that LOM is a state where the lowest firing activity is present (Fig. S7AB), comparable to deep sleep state activity (Fig. 3D). The first LOM is the most important as it highlights the deepest loss of consciousness before the ethanol starts to be metabolized and cleared, which would be consistent between animals.

      As a result, we have edited the manuscript to clarify all mentions related to brain states and states of unconsciousness.

      R2-2. It is not clear why the authors focus on the mediodorsal nucleus. This should be better explained in the introduction and developed in the discussion.

      R2-A2. This comment converges with Reviewer 1's comments, and we have addressed this gap in the discussion as suggested. We addressed it together with the previous comment and believe it is now clearer.

      R2-3. The discussion mentions that 'increased activity in MD might modulate the stability of cortical UP state and synchronization' (pg 21). This point should be either further developed and put into context, or removed. In its current state, it does not seem to contribute much to the discussion of results.

      R2-A3. We understand that the wording “UP state” might not be clear enough. We have modified this sentence as follows to clarify that an UP state could be a state in which the animal is awake, aroused or attentive:

      [Discussion section, MD is a modulator of consciousness, first paragraph]

      “The MD is known to innervate limbic region, basal ganglia and medial prefrontal cortex 50 and increased activity in MD might modulate the stability of cortical UP states (e.g. awaken, aroused and attentive states) and synchronization 9,26. Thus, MD might be a major hub involved in cortical state control and brain state stabilization.”

      R2-4. The discussion states that 'mutant mice did not exhibit a decreased arousal level (i.e. increased locomotor activity)' (pg 23). This is confusing as decreased arousal should be reflected in decreased locomotor activity.

      R2-A4. We understand that the formulation of this sentence may be confusing, and we have edited this portion of the text in the revised version of the manuscript. To clarify, mutant mice do not exhibit reduced or increased arousal (not quantified, just observational), but they do show phenotypic hyperlocomotion. This contrasts with a lower basal firing rate in the MD, which, in our interpretation, is not synonymous with lower arousal. We believe that the relative change in MD firing determines the change in arousal, and that the absolute firing rate is not indicative of arousal in itself, only in comparison.

      [Discussion section, The lower variability in MD Firing reflects Ethanol Resistance in Cav3.1 mutant mice, paragraph 2]

      “Mutant RS neurons in MD showed an overall lower excitability and variability of firing in various natural conscious and unconscious states compared to wild type mice. Remarkably, Cav3.1 mutant mice exhibited a clear increased locomotor activity and an increased resistance to ethanol. The general lower firing rate and the high “arousal” observed in mutant mice suggests that the relative change from state to state in tonic firing in MD, and not the absolute value of firing, might be a better correlate of change in brain state in the mice.”

      R2-5. The methods (pg 27) state that two genetic backgrounds (129/svjae and C57BL/6J ) were used in the study. Authors should show whether there were significant differences between those backgrounds in the key parameters assessed in the study (particularly resistance to ethanol sedation).

      R2-A5. As mentioned in the method section, we only used the F1-background mice, which are the first-generation offspring produced by crossing 129/svjae and C57BL/6J strains. To produce F1 KO mice, we kept the heterozygote mice in two strains. We unfortunately did not study the particular difference of the respective KO of these two backgrounds; however, the pure C57BL/6J KO has been used in other studies by our group (Kim et al 2001; Na et al, 2008; Park et al., 2010). The F1 background allows us to work with mice that are less aggressive and can be handled with less inherent stress.

      R2-6. It would be convenient to produce a supplementary figure associated with Figure 1C to show the same data with averages per mouse. That is, 9 points for control and 9 points for KO mice. This also applies to all cases where data is not presented per mouse but pooled between animals.

      R2-A6. We have added a panel C in Figure S1 to show the scatter values for all the mice corresponding to Figure 1C. We have also generalized this presentation to all behavior graphics, showing all the animals in a scatter plot next to the boxplot. We believe that this presentation further increases the transparency of the manuscript. We have added the scatter plots for all mice in Fig. 1, Fig. 2, Fig. 5, Fig. S2, Fig. S3, Fig. S10 and Fig. S12.

      R2-7. It would be informative to make a supplementary figure associated with Figure 1D to compare baseline raw activity levels (i.e., baseline walking recording) between control and KO mice. That is, do KO and control mice cover comparable distances and at similar speeds during baseline conditions? Figure 1D and Figure 4A suggest that the variability of locomotor activity is larger in KO mice. Hence, this parameter should be quantified and reported.

      R2-A7. We thank the reviewer for this comment. We have addressed this question in the manuscript in two ways:

      - We first measured the overall hyperlocomotion of the mice using the total distance traveled in the open field in our mice cohorts (Fig. S4C). We observed that the KO mutants showed hyperlocomotion, but MD or VB knock-down mice did not, which indicates that the hyperlocomotion component is not specific to the two thalamic nuclei studied.

      - Using the forced walking task, we imposed on the animals a steady pace of roughly 6 cm/s. This assay normalizes the general walking behavior to a relatively fixed pace, making it comparable across all animals.

      The reviewer suggested reporting the mean and variance in walking of WT and KO during baseline (prior to the ethanol I.P. injection). We believe that the two points mentioned above are sufficient to describe the WT vs KO locomotion differences in a more quantitative way. Moreover, by construction, the normalized locomotion on the forced walking task will return similar means for the baseline; the standard deviation could potentially show differences, but these would remain inconclusive.

      R2-8. The legend in Figure 1 states that 'the loss of consciousness is evaluated using normalized moving index using either video analysis (differential pixel motion), on-head accelerometer-based motion, or neck electromyograms'. Authors should clarify whether these methods are equivalent and support it with data.

      R2-A8. We understand the reviewer's point and have modified the method description so that it aligns better with what was done. For most mice, video analysis was used to obtain the moving index. When video recording was not available (2 mice), an accelerometer attached to the animal's head stage allowed us to derive a moving index similar to the video moving index. The neck electromyogram was instead used in animals implanted with tetrodes, to identify sleep stages based on local field potential frequency and muscle tone. We have clarified the methods and Figure 1 accordingly to avoid this confusion. Since video and accelerometer signals were not recorded concurrently, we do not have the data to compute the correlation between the two measures; however, we observed no noticeable deviation in loss of motion between the two methods. We realize that this may be a weak argument, but in our observations video and accelerometers returned very similar timings for loss of motion (only a few comparative instances, insufficient for a statistical comparison).

      R2-9. How were spike bursts defined? The authors should try different criteria and verify the consistency of results.

      R2-A9. For in vivo single-unit recordings, we used a definition validated in our work and that of others: a silent period of at least 100 ms followed by a minimum of 3 spikes with:

      - An interspike interval of less than 4 ms for the first spike pair

      - An interspike interval of less than 20 ms for the remaining spike pairs
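
      The burst criteria listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function name, the use of millisecond timestamps in a sorted list, and the convention that the recording starts at t = 0 ms (used to judge the silence before the very first spike) are all hypothetical.

```python
def detect_bursts(spike_times_ms, min_silence_ms=100.0, first_isi_ms=4.0,
                  later_isi_ms=20.0, min_spikes=3, t_start_ms=0.0):
    """Return bursts as lists of spike times (ms) from a sorted spike train.

    Assumption: the recording starts at t_start_ms, so the silence before
    the first spike is measured from that time.
    """
    bursts = []
    n = len(spike_times_ms)
    i = 0
    while i < n - 1:
        previous = spike_times_ms[i - 1] if i > 0 else t_start_ms
        silent_before = spike_times_ms[i] - previous >= min_silence_ms
        first_pair_ok = spike_times_ms[i + 1] - spike_times_ms[i] < first_isi_ms
        if silent_before and first_pair_ok:
            # Extend the candidate burst while the following ISIs stay short.
            j = i + 1
            while j + 1 < n and spike_times_ms[j + 1] - spike_times_ms[j] < later_isi_ms:
                j += 1
            if j - i + 1 >= min_spikes:
                bursts.append(spike_times_ms[i:j + 1])
            i = j + 1
        else:
            i += 1
    return bursts


# Made-up spike train for illustration only.
spikes = [10, 150, 152, 160, 175, 400, 403, 600, 602, 610]
print(detect_bursts(spikes))
print(len(detect_bursts(spikes, min_spikes=2)))
```

      Lowering `min_spikes` to 2 in this toy example additionally accepts the isolated spike pair, which is the kind of ultra-short event the 3-spike minimum is meant to exclude.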

      We also repeated this analysis with a minimum of 2 spikes and with silent periods varied between 50 and 100 ms, without observing a significant deviation in the results. As shown in Figure S6B, most bursts detected with this approach contained <10 spikes. Figure S6C shows a clear distribution of first-spike ISIs within 2-4 ms, as observed in previous work (Desai and Varela, 2021; Alitto et al., 2019); importantly, the distribution is not abruptly truncated at the 4 ms cutoff, suggesting that the first ISI of thalamic bursts does indeed fall below 4 ms. Within-burst spike waveforms can be highly variable, and we chose a minimum of 3 rather than 2 spikes per burst to reduce false-positive detection of ultra-short bursts, which remain controversial in single-unit recordings (Gray et al. 1995).

      Minor:

      R2-10: Figure 4A2 'Cav3.1(+/+)' should presumably be Cav3.1(-/-).

      R2-A10: The reviewer is correct, and we have corrected the figure label.

      R2-11: Figure S2C legend states 'Post-hoc group comparison was performed using.' The sentence seems to be incomplete.

      R2-A11: We have completed the sentence for clarity.

      R2-12: In the methods (pg 29) virus concentration is reported as '107 TU/ul', which probably refers to 10e7.

      R2-A12: We have corrected it by superscripting the power 7.

      R2-13: Verify Fig 1C1 and correct Y-axis overlap between title and units.

      R2-A13: We edited the figure for clarity, thank you.

      R2-14: On page 24 there is a '[ref]' that probably stands for (a missing) reference.

      R2-A14: the missing reference has been added.

    1. Author response:

      We are glad that the reviewers found our work to be interesting and appreciate its contribution to enhancing ecological validity of attention research. We also agree that much more work is needed to solidify this approach, and that some of the results should be considered “exploratory” at this point, but appreciate the recognition of the novelty and scientific potential of the approach introduced here.

      We will address the reviewers’ specific comments in a revised version of the paper, and highlight the main points here:

      · We agree that the use of multiple different neurophysiological measures is both an advantage and a disadvantage, and that the abundance of results can make it difficult to tell a “simple” story. In our revision, we will make an effort to clarify what (in our opinion) are the most important results and provide readers with a more cohesive narrative.

      · The reviewers raised important additional points that we will discuss in the revised version: a) the similarities and differences between virtual and real classrooms; b) the utility of the methods and data to the community; and c) the implications of these results for educational neuroscience and ADHD research.

      · In the revision, we will also clarify several methodological aspects of the data analysis, as per the reviewers’ requests.

      · After final publication, the data will be made available for other researchers to use.

    1. Author response:

      Reviewer #1 (Public review):

      Weaknesses:

      (1) The heatmaps (for example, Figure 3A, B) are challenging to read and interpret due to their size. Is there a way to alter the visualization to improve interpretability? Perhaps coloring the heatmap by general anatomical region could help? We feel that these heatmaps are critical to the utility of the registration strategy, and hence, clear visualization is necessary.

      We thank the reviewers for this point on aesthetic improvement, and we agree that clearer visualization of our correlation heatmaps is important. To address this point, we have incorporated the capability of grouping “child” subregions in anatomical order by their more general “parent” region into the package function, plot_correlation_heatmaps(). Parent regions will be visually represented as smaller sub-facets in the heatmaps, and we will be submitting our full revised manuscript with these visual changes.

      (2) Additional context in the Introduction on the use of immediate early genes to label ensembles of neurons that are specifically activated during the various behavioral manipulations would enable the manuscript and methodology to be better appreciated by a broad audience.

      We thank the reviewers for this suggestion and will be revising parts of our Introduction to reflect the broader use and appeal of immediate early genes (IEGs) for studying neural changes underlying behavior.

      (3) The authors mention that their segmentation strategies are optimized for the particular staining pattern exhibited by each reporter and demonstrate that the manually annotated cell counts match the automated analysis. They mention that alternative strategies are compatible, but don't show this data.

      We thank the reviewers for this comment. We also appreciate that integration with alternative strategies is a major point of interest to readers, given that others may be interested in compatibility with our analysis and software package, rather than completely revising their own pre-existing workflows.

      This specific point on segmentation refers to the import_segmentation_custom() function in the package. As the field has not yet adopted a standard cell segmentation export format, this function still requires some data wrangling into an import format saved as a .txt file. However, we chose not to visually demonstrate this capability in the paper for a few reasons.

      i. A figure showing broad testing of many different segmentation algorithms (e.g., Cellpose, Vaa3d, Trainable Weka Segmentation) would mainly demonstrate the segmentation efficacy of these alternative approaches, which has already been well documented. Demonstrating import compatibility, however, is more a demonstration of the API, which is better shown in website documentation and tutorial notebooks.

      ii. Additionally, showing importation with one well-established segmentation approach is still a demonstration of a single use case. There would be a major burden-of-proof in establishing importation compatibility with all potential alternative platforms, their specific export formats, which may be slightly different depending on post-processing choices, and the needs of the experimenters (e.g., exporting one vs many channels, having different naming conventions, having different export formats). For example, output from Cellpose can take the form of a NumPy file (_seg.npy file), a .png, or Native ImageJ ROI archive output, and users can have chosen up to four channels. Until the field adopts a standardized file format, one flexible enough to account for all the variables of experimental interest, we currently believe it is more efficient to advise external groups on how to transform their specific data to be compatible with our generic import function.

      Internally, in collaborative efforts, we have validated the ability to import datasets generated from completely different workflows for segmentation and registration. We intend to release this documentation in upcoming updates to our package website, which we believe will demonstrate more effectively how to take advantage of our analysis package without adopting our entire workflow.

      (4) The authors provided highly detailed information for their segmentation strategy, but the same level of detail was not provided for the registration algorithms. Additional details would help users achieve optimal alignment.

      We apologize for this lack of detail. The registration strategy depends upon the WholeBrain package for registration to the Allen Mouse Common Coordinate Framework. While this strategy has been published and documented elsewhere, we will be revising our methods to better incorporate details of this approach.

      Reviewer #2 (Public review):

      Weaknesses:

      (1) While I was able to install the SMARTR package, after trying for the better part of one hour, I could not install the "mjin1812/wholebrain" R package as instructed in OSF. I also could not find a function to load an example dataset to easily test SMARTR. So, unfortunately, I was unable to test out any of the packages for myself. Along with the currently broken "tractatus/wholebrain" package, this is a good example of why I would strongly encourage the authors to publish SMARTR on either Bioconductor or CRAN in the future. The high standards set by Bioc/CRAN will ensure that SMARTR is able to be easily installed and used across major operating systems for the long term.

      We thank reviewers for pointing out this weakness; long-term maintenance of this package is certainly a mutual goal. An .RData file can be loaded either by double-clicking the file in a directory window or by using the load() function (e.g., load("directory/example.RData")). We will explicitly outline these directions in the online documentation and in our full revision.

      Moreover, we will submit our package to CRAN. Currently, SMARTR is not dependent on the WholeBrain package, which remains optional for the registration portion of our workflow. Ultimately, this independence will allow us to maintain the analysis and visualization portion of the package independently, and allow for submission to a more centralized software repository such as CRAN.

      (2) The package is quite large (several thousand lines include comments and space). While impressive, this does inherently make the package more difficult to maintain - and the authors currently have not included any unit tests. The authors should add unit tests to cover a large percentage of the package to ensure code stability.

      We appreciate this feedback and will add unit testing to improve the reliability of our package in the full revision.

      (3) Why do the authors choose to perform image segmentation outside of the SMARTR package using ImageJ macros? Leading segmentation algorithms such as CellPose and StarMap have well-documented APIs that would be easy to wrap in R. They would likely be faster as well. As noted in the discussion, making SMARTR a one-stop shop for multi-ensemble analyses would be more appealing to a user.

      We appreciate this feedback. We believe parts of our response to Reviewer 1, comment 3, are relevant to this point. The interfaces for CellPose and ClusterMap (which processes in situ transcriptomic approaches like STARmap) are both in Python, and there are currently ways to call Python from within R (https://rstudio.github.io/reticulate/index.html). We will certainly explore incorporating these APIs from R. However, we anticipate that this capability would amount to “translation” between programming languages and would not spare users from needing some familiarity with the capabilities of these Python packages, and thus with Python syntax.

      (4) Given the small number of observations for correlation analyses (n=6 per group), Pearson correlations would be highly susceptible to outliers. The authors chose to deal with potential outliers by dropping any subject per region that was> 2 SDs from the group mean. Another way to get at this would be using Spearman correlation. How do these analyses change if you use Spearman correlation instead of Pearson? It would be a valuable addition for the author to include Spearman correlations as an option in SMARTR.

      We thank reviewers for this suggestion and will provide a supplementary analysis of our results using Spearman correlations.
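
      To illustrate why the choice of coefficient matters at n = 6, the following sketch compares Pearson and Spearman correlations on a small made-up sample containing one extreme value. It is written in plain Python for self-containment and is not part of SMARTR; the function names and the data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Six made-up subjects; the last one is an extreme value in region B.
region_a = [1, 2, 3, 4, 5, 6]
region_b = [2, 4, 6, 8, 10, 100]
print(round(pearson(region_a, region_b), 3))
print(round(spearman(region_a, region_b), 3))
```

      Because Spearman works on ranks, the single extreme value changes only its rank position rather than the magnitude of its contribution, whereas it pulls the Pearson coefficient well below 1 even though the relationship is perfectly monotonic.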

      (5) I see the authors have incorporated the ability to adjust p-values in many of the analysis functions (and recommend the BH procedure) but did not use adjusted p-values for any of the analyses in the manuscript. Why is this? This is particularly relevant for the differential correlation analyses between groups (Figures 3P and 4P). Based on the un-adjusted p-values, I assume few if any data points will still be significant after adjusting. While it's logical to highlight the regional correlations that strongly change between groups, the authors should caution which correlations are "significant" without adjusting for multiple comparisons. As this package now makes this analysis easily usable for all researchers, the authors should also provide better explanations for when and why to use adjusted p-values in the online documentation for new users.

      We appreciate the feedback and will state more explicitly in our paper that our dataset is presented as a demonstrative and exploratory resource for readers; as such, we accept a high tolerance for false positives in order to reduce the risk of missing potentially interesting findings. As noted by Reviewer #2, it is still “logical to highlight the regional correlations that strongly change between groups.” We will further clarify in our methods that we chose to present uncorrected p-values when speaking of significance. We will also include more statistical detail in our online documentation regarding FDR correction. Ultimately, the decision to correct for multiple comparisons, and the choice of FDR threshold, should still be informed by standard statistical theory and the user-defined tolerance for false positives and false negatives. This will be influenced by factors such as the nature and purpose of the study and the quality of the dataset.
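
      For readers unfamiliar with the Benjamini-Hochberg (BH) procedure mentioned by the reviewer, a minimal sketch is given below. It is a generic illustration of BH-adjusted p-values, not the implementation used in SMARTR; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up p-values for four regional comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print([round(p, 4) for p in adj])
print(sum(p < 0.05 for p in raw), sum(p < 0.05 for p in adj))
```

      In this toy case three comparisons pass an unadjusted threshold of 0.05 but only one survives adjustment, which is the trade-off between false positives and missed findings described above.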

      (6) The package was developed in R3.6.3. This is several years and one major version behind the current R version (4.4.3). Have the authors tested if this package runs on modern R versions? If not, this could be a significant hurdle for potential users.

      We thank reviewers for pointing out concerns regarding versioning. Analysis and visualization capabilities are currently supported using R version 4.1+. The recommendation for R 3.6.3 is primarily for users interested in using the full workflow, which requires installation of the WholeBrain package. We anticipate supporting the visualization and network analysis capabilities with updated packages and R versions, while maintaining a legacy version for the full workflow presented in this paper.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors present a new application of the high-content image-based morphological profiling Cell Painting (CP) to single cell type classification in mixed heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels) and computational (e.g., different models, different cell regions) parameters and convincingly demonstrated that focusing on the nucleus and its surroundings contains sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiated neurons.

      Strengths:

      Automatically identifying single-cell types in heterogeneous mixed-cell populations holds great promise to characterize mixed-cell populations and to discover new rules of spatial organization and cell-cell communication. Although the current manuscript focuses on the application of quality control of iPSC cultures, the same approach can be extended to a wealth of other applications including an in-depth study of the spatial context. The simple and high-content assay democratizes use and enables adoption by other labs.

      The manuscript is supported by comprehensive experimental and computational validations that raise the bar beyond the current state of the art in the field of high-content phenotyping and make this manuscript especially compelling. These include (i) Explicitly assessing replication biases (batch effects); (ii) Direct comparison of feature-based (a la cell profiling) versus deep-learning-based classification (which is not trivial/obvious for the application of cell profiling); (iii) Systematic assessment of the contribution of each fluorescent channel; (iv) Evaluation of cell-density dependency; (v) Explicit examination of mistakes in classification; (vi) Evaluating the performance of different spatial contexts around the cell/nucleus; (vii) Generalization of models trained on cultures containing a single cell type (mono-cultures) to mixed co-cultures; (viii) Application to multiple classification tasks.

      I especially liked the generalization of classification from mono- to co-cultures (Figure 4C), and quantitatively following the gradual transition from NPC to Neurons (Figure 5H).

      The manuscript is well-written and easy to follow.

      Thank you for the positive appreciation of our work and constructive comments. 

      Weaknesses:

      I am not certain how useful/important the specific application demonstrated in this study is (quality control of iPSC cultures), this could be better explained in the manuscript. 

      To clarify the importance we have added an additional explanation to the introduction (page 3) and also come back to it in the discussion (page 17).

      Text from the introduction:

      “However, genetic drift, clonal and patient heterogeneity cause variability in reprogramming and differentiation efficiency10,11. The differentiation outcome is further strongly influenced by variations in protocol12. This can significantly impact experimental outcomes, leading to inconsistent and potentially misleading results and consequently, it hinders the use of iPSC-derived cell systems in systematic drug screening or cell therapy pipelines. This is particularly true for iPSC-derived neural cultures, as their composition, purity and maturity directly affect gene expression and functional activity, which is essential for modelling neurological conditions13,14. Thus, from a preclinical perspective, there is the need for a fast and cost-effective QC approach to increase experimental reproducibility and cell type specificity15. From a clinical perspective in turn, robust QC is required for safety and regulatory compliance (e.g., for cell therapeutic solutions). This need for improved standardization and QC is underscored by large-scale collaborative efforts such as the International Stem Cell Banking Initiative16, which focusses on clinical quality attributes and provides recommendations for iPSC validation testing for use as cellular therapeutics, or the CorEuStem network, aiming to harmonize iPSC practices across core facilities in Europe.”

      Text from the discussion: 

      “Many groups highlight the difficulty of reproducible neural differentiation and attribute this to culture conditions, cultivation time and variation in developmental signalling pathways in the source iPSC material43,44. Spontaneous neural differentiation has previously been shown to require approximately 80 days before mature neurons arise that can fire action potentials and show neural circuit formation. Although these differentiation processes display a stereotypical temporal sequence34, the exact timing and duration might vary. This variation negatively affects the statistical power when testing drug interventions and thus prohibits the application of iPSC-culture derivatives in routine drug screening. Current solutions (e.g., immunocytochemistry, flow cytometry, …) are often cost-ineffective, tedious, and incompatible with longitudinal/multimodal interrogation. CP is a much more cost-effective solution and ideally suited for this purpose. Routine CP-based QC could add confidence to and save costs for the drug discovery pipeline. We have shown that CP can be leveraged to capture the morphological changes associated with neural differentiation.”

      Another issue that I feel should be discussed more explicitly is how far can this application go - how sensitively can the combination of cell painting and machine learning discriminate between cell types that are more subtly morphologically different from one another?

      Thank you for this interesting question. The fact that an approach based on a subregion not encompassing the whole cell (the “nucleocentric” approach) can predict cell types equally well suggests that cell shape as such is not the defining factor for accurate cell type profiling. And, while neural progenitors, neurons and glia clearly have vastly different cell shapes, we have shown that cells with closer phenotypes, such as 1321N1 vs. SH-SY5Y or astrocytes vs. microglia, can be distinguished with equal performance. However, triggered by the reviewer’s question, we have now tested additional conditions with more subtle phenotypes, including the classification of 1321N1 vs. two related retinal pigment epithelial cell types with much more similar morphology (ARPE and RPE1 cells). We found that the CNN could discriminate these cells equally well and have added the results on page 8 and in Fig. 3D. To address this question from a different angle, we have also performed an experiment in which we changed cell states to assess whether discriminatory power remains high. Concretely, we exposed co-cultures of neurons and microglia to LPS to trigger microglial activation (more subtly visible as cytoskeletal changes and vacuole formation). This revealed that our approach still discriminates both cell types (neurons vs. microglia) with high accuracy, regardless of the microglial state. Furthermore, using a two-step approach, we could also distinguish LPS-treated (assumed to be activated) from unchallenged microglia (assumed to be more homeostatic), albeit with a lower accuracy. This experiment has been added as an extra results section (Cell type identification can be applied to mixed iPSC-derived neuronal cultures regardless of activation state, p12) and Fig. 7c. Finally, we have also added our take on what the possibilities could be for future applications in even more complex contexts such as tissue slice, 3D and live cell applications (page 17-18).

      Regarding evaluations, the use of accuracy, which is a measure that can be biased by class imbalance, is not the most appropriate measurement in my opinion. The confusion matrices are a great help, but I would recommend using a measurement that is less sensitive for class imbalance for cell-type classification performance evaluations.  

      Across all CNNs trained in this manuscript, the sample size of the input classes has always been equalized, ruling out any effects of class imbalance. Nevertheless, to follow the reviewer’s recommendation, we have now used the F-score to document performance, as it is less sensitive to such imbalance. For clarity, we have now also mentioned the input number (ROIs/class) in every figure.
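
      The difference between accuracy and the F-score under class imbalance can be seen in a small worked example. The sketch below uses made-up confusion-matrix counts for a hypothetical two-class problem and is not derived from the manuscript's data; the function name is illustrative.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced case: 50 cells of the class of interest, 950 others.
metrics = binary_metrics(tp=10, fp=5, fn=40, tn=945)
print({name: round(value, 3) for name, value in metrics.items()})
```

      Here the classifier misses most cells of the minority class, yet accuracy stays above 95% because the majority class dominates the count; the F-score reflects the poor recall.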

      Another issue is that the performance evaluation is calculated on a subset of the full cell population - after exclusion/filtering. Could there be a bias toward specific cell types in the exclusion criteria? How would it affect our ability to measure the cell type composition of the population?

      As explained in the M&M section, filtering was performed based on three criteria:

      (1) Nuclear size: objects with values below a threshold of 160 are considered to represent debris;

      (2) DAPI intensity: values below a threshold of 500 represent segmentation errors;

      (3) IF staining intensity: gates were set onto the intensity of the fluorescent markers used with post-hoc IF to only retain cells that are unequivocally positive for either marker and to avoid inclusion of double positive (or negative) cells in the ground truth training. 

      One could argue that the last criterion introduces a certain bias in that it does not consider part of the cell population. However, this is also not the purpose of our pioneering study that aims at identifying unique cell types for which ground truth is as pure and reliable as possible. Not filtering out these cells with a ‘dubious’ IF profile (e.g., cells that might be transitioning or are of a different type) would negatively affect the model by introducing noise. It is correct that the predictions are based only on these inputs and so cells of a subsequent test set will only be classified according to these labels. For example, in the neuronal differentiation experiment (Fig. 6G-H), cells are either characterized as NPC or as neurons, which leaves the transitioning (or undefined) cells in either category. Despite this simplification, the model adequately predicted the increase in neuron/NPC ratio with culture age. In future iterations, one could envision defining more refined cell (sub-)types in a population based on richer post-hoc information (e.g., through cyclic immunofluorescence or spatial single cell transcriptomics) or longitudinal follow-up of cell-state transitions using live imaging. This notion has been added to page 17 of the manuscript.
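
      The three filtering criteria can be expressed as a simple predicate, sketched below. The size and DAPI thresholds (160 and 500) are taken from the list above; everything else is an assumption made for illustration, including the dictionary keys, the two marker names and the gate values, none of which come from the manuscript.

```python
def keep_for_ground_truth(cell, size_min=160, dapi_min=500,
                          gate_a=1000, gate_b=1000):
    """Return True if a segmented object passes all three filters.

    gate_a and gate_b are hypothetical intensity gates for two
    post-hoc IF markers; the real gates are set per experiment.
    """
    if cell["nuclear_size"] < size_min:       # criterion 1: debris
        return False
    if cell["dapi_intensity"] < dapi_min:     # criterion 2: segmentation error
        return False
    positive_a = cell["marker_a"] >= gate_a
    positive_b = cell["marker_b"] >= gate_b
    # Criterion 3: keep only cells positive for exactly one marker.
    return positive_a != positive_b


# Made-up objects for illustration only.
cells = [
    {"nuclear_size": 120, "dapi_intensity": 900, "marker_a": 2000, "marker_b": 100},
    {"nuclear_size": 300, "dapi_intensity": 400, "marker_a": 2000, "marker_b": 100},
    {"nuclear_size": 300, "dapi_intensity": 900, "marker_a": 2000, "marker_b": 100},
    {"nuclear_size": 300, "dapi_intensity": 900, "marker_a": 2000, "marker_b": 3000},
    {"nuclear_size": 300, "dapi_intensity": 900, "marker_a": 50, "marker_b": 100},
]
kept = [keep_for_ground_truth(c) for c in cells]
print(kept)
print(sum(kept) / len(cells))
```

      Reporting the retained fraction per condition, as in the last line, is one way to check whether the filter removes some cell types more than others.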

      I am not entirely convinced by the arguments regarding the superiority of the nucleocentric vs. the nuclear representations. Could it be that this improvement is due to not being sensitive/ influenced by nucleus segmentation errors?

      The reviewer has a valid point that segmentation errors may occur. However, the algorithm we have used (the StarDist classifier) is very robust to nuclear segmentation errors. To verify its performance, we have now quantified segmentation errors in 20 images for 3 different densities and found a consistently low error rate (0.6-1.6%) without correlation to the culture density. Moreover, these errors include partial imperfections (e.g., a missed protrusion or bleb) as well as over-segmentations (one nucleus detected as several) and under-segmentations (several nuclei detected as one). The latter two will affect both the nuclear and nucleocentric predictions and should thus not affect the prediction performance. In the case of imperfect segmentations, there may be a specific impact on the nucleus-based predictions (which rely on blanking the non-nuclear part), but this alone cannot explain the significantly higher gain in accuracy for nucleocentric predictions (>5%). Therefore, we conclude that segmentation errors may contribute in part, but not exclusively, to the overall improved performance of nucleocentric input models. We have added this notion in the discussion (pages 14-15 and Suppl. Fig. 1E).

      GRADCAM shows cherry-picked examples and is not very convincing.

      To help convince the reviewer and illustrate the representativeness of the selected images, we have now randomly selected 10 images for each condition and density (using random seeds to avoid cherry-picking) and added these in Suppl. Fig. 3.

      There are many missing details in the figure panels, figure legend, and text that would help the reader to better appreciate some of the technical details, see details in the section on recommendations for the authors.

      Please see further for our specific adaptations.

      Reviewer #2 (Public Review):

      This study uses an AI-based image analysis approach to classify different cell types in cultures of different densities. The authors could demonstrate the superiority of the CNN strategy used with nucleocentric cell profiling approach for a variety of cell types classification. The paper is very clear and well-written. I just have a couple of minor suggestions and clarifications needed for the reader.

      The entire prediction model is based on image analysis. Could the authors discuss the minimal spatial resolution of images required to allow a good prediction? Along the same line, it would be interesting to the reader to know which metrics related to image quality (e.g. signal to noise ratio) allow a good accuracy of the prediction.

      Thank you for the positive and relevant feedback.

      The reviewer has a good point that it is important to portray the imaging conditions that are required for accurate predictions. To investigate this further, we have performed additional experiments that give a better view of the operating window in terms of resolution and SNR (manuscript page 7-8 and new figure panels Fig. 3B-C). The initial image resolution was 0.325 µm/pixel. To understand the dependency on resolution, we performed training and classifications for image data sets that were progressively binned. We found that a two-fold reduction in resolution did not significantly affect the F-score, but further degradation decreased the performance. At a resolution of 6.0 µm/pixel (20-fold binning), the F-score dropped to 0.79±0.02, comparable to the performance when only the DAPI (nuclear) channel was used as input. The effect of reduced image quality was assessed in a similar manner, by iteratively adding more Gaussian noise to the image. We found that prediction performance remains consistent above an SNR of 10 but starts to degrade below it. While this exercise provides a first impression of the current confines of our method, we do believe it is plausible that its performance can be extended to even lower-quality images, for example by using image restoration algorithms. We have added this notion in the discussion (page 14).
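
      The two degradations described above, binning and added Gaussian noise, can be sketched as follows. This is a plain-Python illustration on nested lists, not the authors' pipeline; the function names are hypothetical, binning is assumed to average non-overlapping blocks, and SNR is assumed here to mean mean signal divided by noise standard deviation, since the response does not define it.

```python
import random


def bin_image(image, factor):
    """Average non-overlapping factor x factor blocks of a 2D image."""
    rows = len(image) // factor
    cols = len(image[0]) // factor
    binned = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + dr][c * factor + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        binned.append(row)
    return binned


def add_gaussian_noise(image, target_snr, rng):
    """Add zero-mean Gaussian noise with sigma = mean signal / target_snr.

    Assumption: SNR is defined as mean signal over noise standard deviation.
    """
    pixels = [value for row in image for value in row]
    sigma = (sum(pixels) / len(pixels)) / target_snr
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in image]


# Made-up 4 x 4 image at the original sampling of 0.325 um/pixel.
image = [[1, 1, 3, 3],
         [1, 1, 3, 3],
         [5, 5, 7, 7],
         [5, 5, 7, 7]]
binned = bin_image(image, 2)
pixel_size_um = 0.325 * 2  # binning by k multiplies the pixel size by k
noisy = add_gaussian_noise(binned, target_snr=10, rng=random.Random(0))
print(binned, pixel_size_um)
```

      Passing a seeded `random.Random` instance keeps the noise reproducible across training runs.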

      The authors show that nucleocentric-based cell feature extraction is superior to feeding the CNN-based model for cell type prediction. Could they discuss what is the optimal size and shape of this ROI to ensure a good prediction? What if, for example, you increase or decrease the size of the ROI by a certain number of pixels?

      To identify the optimal input, we varied the size of the square region around the nuclear centroid from 0.6 to 150 µm for the whole dataset. Within the nuclear-to-cell window (12-30 µm), the change in average F-score is limited, but an important observation is the increasing error and the growing difference between precision and recall with increasing nucleocentric patch size, which will become detrimental in cases of class imbalance. The F-score is maximal for a box of 12-18 µm surrounding the nuclear centroid. In this “sweet spot”, precision and recall are also in balance. Therefore, we have selected this region for the actual density comparison experiment. We have added our results to the manuscript (page 9 and 15).
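
      A nucleocentric crop of a given physical size can be sketched as below. The pixel size of 0.325 µm/pixel is the value stated in this response; the function name, the row/column centroid convention and the zero padding at image borders are assumptions for illustration and may differ from the authors' implementation.

```python
def crop_patch(image, centroid_rc, box_um, um_per_pixel=0.325, fill=0):
    """Crop a square patch of about box_um micrometres around a centroid.

    Pixels falling outside the image are filled with `fill` (assumed padding).
    """
    half = max(1, round(box_um / um_per_pixel / 2))
    r0, c0 = centroid_rc
    n_rows, n_cols = len(image), len(image[0])
    patch = []
    for r in range(r0 - half, r0 + half):
        row = []
        for c in range(c0 - half, c0 + half):
            inside = 0 <= r < n_rows and 0 <= c < n_cols
            row.append(image[r][c] if inside else fill)
        patch.append(row)
    return patch


# Made-up uniform 100 x 100 image for illustration only.
image = [[1] * 100 for _ in range(100)]
for box_um in (12.0, 18.0):
    patch = crop_patch(image, (50, 50), box_um)
    print(box_um, len(patch), len(patch[0]))
```

      Sweeping `box_um` over a range of values and retraining on each patch size is one way to reproduce the kind of size scan described above.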

      It would be interesting for the reader to know the number of ROI used to feed each model and know the minimal amount of data necessary to reach a high level of accuracy in the predictions.

      The figures have now been adjusted so that the number of ROIs used as input to the model is listed. The minimal number of ROIs required to obtain high accuracy is tested in Figure 2C. By systematically increasing the number of input ROIs for both RF and CNN, we found that a plateau in prediction performance is reached at 5000 input ROIs per class. This is also documented in the results section (page 6).

      From Figure 1 to Figure 4 the authors show that the CNN-based approach is efficient in distinguishing 1321N1 vs SH-SY5Y cell lines. The last two figures are dedicated to showing two different applications of the technique: identification of different stages of neuronal differentiation (Figure 5) and different cell types (neurons, microglia, and astrocytes) in Figure 6. It would be interesting, for these two cases as well, to assess the superiority of the CNN-based approach compared to the more classical Random Forest classification. This would reinforce the universal value of the method proposed.

      To meet the reviewer’s request, we have now also compared CNN to RF for the classification of cells in iPSC-derived models (Figures 6 and 7). As expected, the CNN performed better in both cases. We have now added these results in Fig. 6 D and 7 C and pages 12 and 13 of the manuscript.

      Reviewer #3 (Public Review):

      Induced pluripotent stem cells, or iPSCs, are cells that scientists can push to become new, more mature cell types like neurons. iPSCs have a high potential to transform how scientists study disease by combining precision medicine gene editing with processes known as high-content imaging and drug screening. However, there are many challenges that must be overcome to realize this overall goal. The authors of this paper solve one of these challenges: predicting cell types that might result from potentially inefficient and unpredictable differentiation protocols. These predictions can then help optimize protocols.

      The authors train advanced computational algorithms to predict single-cell types directly from microscopy images. The authors also test their approach in a variety of scenarios that one may encounter in the lab, including when cells divide quickly and crowd each other in a plate. Importantly, the authors suggest that providing their algorithms with just the right amount of information beyond the cells' nuclei is the best approach to overcome issues with cell crowding.

      The work provides many well-controlled experiments to support the authors' conclusions. However, there are two primary concerns: (1) The model may be relying too heavily on the background and thus technical artifacts (instead of the cells) for making CNN-based predictions, and (2) the conclusion that their nucleocentric approach (including a small area beyond the nucleus) is not well supported, and may just be better by random chance. If the authors were to address these two concerns (through additional experimentation), then the work may influence how the field performs cell profiling in the future.

      Thank you very much for confirming the potential value of our work and raising these relevant items. To better support our claims we have now performed additional validations, which we detail below. 

      (1) The model may be relying too heavily on the background and thus technical artifacts (instead of the cells) for making CNN-based predictions 

      To address the first point, we have adapted the GradCAM images to show an overlay of the input crop and GradCAM heatmap to give a better view of the structures that are highlighted by the CNN. We further investigated the influence of the background on the prediction performance. Our finding that a CNN trained on a monoculture retains a relatively high performance on cocultures implies that the CNN uses the salient characteristics of a cell to recognize it in more complex heterogeneous environments. Assuming that the background can vary between experiments, the prediction of a pretrained CNN on a new dataset indicates that cellular characteristics are used for robust prediction.  When inspecting GradCAM images obtained from the nucleocentric CNN approaches (now added in Suppl. Fig. 3), we noticed that the nuclear periphery typically contributed the most (but not exclusively) to the prediction performance. When using only the nuclear region as input, GradCAMs were more strongly (but again not exclusively) directed to the background surrounding the nuclei. To train the latter CNN, we had cropped nuclei and set the background to a value of zero. To rule out that this could have introduced a bias, we have now performed the exact same training and classification, but setting the background to random noise instead (Suppl. Fig. 2). While this effectively diverted the attention of the GradCAM output to the nucleus instead of the background, the prediction performance was unaltered. We therefore assume that irrespective of the background, when using nuclear crops as input, the CNN is dominated by features that describe nuclear size. We observe that nuclear size is significantly different in both cell types (although intranuclear features also still contribute) which is also reflected in the feature map gradient in the first UMAP dimension (Suppl. Fig. 2). This notion has been added to the manuscript (page 9) and Suppl. Fig. 2. 
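
      The two background treatments compared above (zero versus random noise) can be sketched as follows. This is an illustration only: the function name is hypothetical, and the choice of uniform noise spanning the crop's intensity range is an assumption, since the response does not state which noise distribution was used.

```python
import numpy as np

def blank_background(crop, mask, mode="zero", rng=None):
    """Return a copy of crop in which all pixels outside mask are replaced.

    mode="zero"  sets background pixels to 0.
    mode="noise" fills them with uniform random values between the crop's
                 minimum and maximum (assumed noise model).
    """
    out = crop.astype(float)  # astype returns a copy, so crop is left untouched
    background = ~np.asarray(mask, dtype=bool)
    if mode == "zero":
        out[background] = 0.0
    elif mode == "noise":
        if rng is None:
            rng = np.random.default_rng()
        out[background] = rng.uniform(crop.min(), crop.max(), size=int(background.sum()))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return out

rng = np.random.default_rng(1)
crop = rng.uniform(10.0, 20.0, size=(5, 5))   # synthetic nuclear crop
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True                          # synthetic nuclear mask

zeroed = blank_background(crop, mask, mode="zero")
noised = blank_background(crop, mask, mode="noise", rng=rng)
```

      In both variants the pixels inside the mask are identical, so any difference in classifier behaviour can be attributed to how the background is filled.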

      (2) The conclusion that their nucleocentric approach (including a small area beyond the nucleus) is not well supported, and may just be better by random chance. 

      To address this second concern, which was also raised by reviewer 2, we have performed a more extensive analysis in which the patch size was varied from 0.6 to 120µm around the nuclear centroid (Fig. 4E and page 9 of the manuscript). We observed that there is little effect of in- or decreasing patch size on the average F-score within the nuclear to cell window, but that the imbalance between the precision and recall increases towards the larger box sizes (>18µm). Under our experimental conditions, the input numbers per class were equal, but this will not be the case in situations where the ground truth is unknown (and needs to be predicted by the CNN). Therefore, a well-balanced CNN is of high importance. This notion has been added to page 15 of the manuscript.
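
      To make the distinction between the average F-score and the precision/recall balance concrete, the per-class metrics can be computed as in the sketch below. It assumes the F-score is the usual F1 (harmonic mean of precision and recall); the labels are made up for illustration and are not data from the study.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 with one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Made-up predictions for eight cells, four per class (balanced input).
y_true = ["1321N1"] * 4 + ["SH-SY5Y"] * 4
y_pred = ["1321N1", "1321N1", "1321N1", "SH-SY5Y",
          "1321N1", "1321N1", "SH-SY5Y", "SH-SY5Y"]

for cell_line in ("1321N1", "SH-SY5Y"):
    p, r, f = precision_recall_f1(y_true, y_pred, positive=cell_line)
    print(f"{cell_line}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```

      A classifier that over-predicts one class shows a gap between precision and recall for that class even when the input is balanced, and this gap matters more once the class proportions in the data are unequal.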

      The main advantage of nucleocentric profiling over whole-cell profiling in dense cultures is that it relies on a more robust nuclear segmentation method and is less sensitive to differences in cell density (Suppl. Fig. 1D). In other words, in dense cultures, the segmentation mask will contain similar regional input as the nuclear mask and the nucleocentric crop will contain more perinuclear information which contributes to the prediction accuracy. Therefore, at high densities, the performance of the CNN on whole-cell crops decreases owing to poorer segmentation performance. A CNN that uses nucleocentric crops will be less sensitive to these errors. This notion has been added to pages 14-15 of the manuscript. 

      Additionally, the impact of this work will be limited, given the authors do not provide a specific link to the public source code that they used to process and analyze their data.

      The source code is now available on the GitHub page of the DeVos lab, under the following URL: https://github.com/DeVosLab/Nucleocentric-Profiling

      Recommendations for the authors:  

      Reviewing Editor (Recommendations For The Authors):

      Evaluation summary

      The authors present a new application of the high-content image-based morphological profiling Cell Painting (CP) to single cell type classification in mixed heterogeneous induced pluripotent stem cell-derived mixed neural cultures. Machine learning models were trained to classify single cell types according to either "engineered" features derived from the image or from the raw CP multiplexed image. The authors systematically evaluated experimental (e.g., cell density, cell types, fluorescent channels, replication biases) and computational (e.g., different models, different cell regions) parameters and argue that focusing on the nucleus and its surroundings contains sufficient information for robust and accurate cell type classification. Models that were trained on mono-cultures (i.e., containing a single cell type) could generalize for cell type prediction in mixed co-cultures, and describe intermediate states of the maturation process of iPSC-derived neural progenitors to differentiation neurons.

      Strengths:

      Automatically identifying single-cell types in heterogeneous mixed-cell populations is an important application and holds great promise. The simple and high-content assay democratizes use and enables adoption by other labs. The manuscript is supported by comprehensive experimental and computational validations. The manuscript is well-written and easy to follow.

      Weaknesses:

      The conclusion is that the nucleocentric approach (including a small area beyond the nucleus) is not well supported, and may just be better by random chance. If better supported by additional experiments, this may influence how the field performs cell profiling in the future. Model interpretability (GradCAM) analysis is not convincing. The lack of a public source code repository is also limiting the impact of this study. There are missing details in the figure panels, figure legend, and text that would help the reader to better appreciate some of the technical details.

      Essential revisions:

      To reach a "compelling" strength of evidence the authors are requested to either perform a comprehensive analysis of the effect of ROI size on performance, or tune down statements regarding the superior performance of their "nucleocentric" approach. Further addition of a public and reproducible source code GitHub repository will lead to an "exceptional" strength of evidence.

      To answer the main comment, we have performed an experiment in which we varied the size of the nucleocentric patch and quantified CNN performance. We have also evaluated the operational window of our method by varying the resolution and SNR and we have experimented with different background blanking methods. We have expanded our examples of GradCAM images and now also made our source code and an example data set available via GitHub.

      Reviewer #1 (Recommendations For The Authors):

      I think that an evaluation of how the excluded cells affect our ability to measure the cell type composition of the population would be helpful to better understand the limitations and practical measurement noise introduced by this approach. A similar evaluation of the excluded cells can also help to better understand the benefit of nucleocentric vs. cell representations by more convincingly demonstrating the case for the nucleocentric approach. In any case, I recommend discussing in more depth the arguments for using the nucleocentric representation and why it is superior to the nuclear representation.

      The benefits of nucleocentric representation over nuclear and whole-cell representation are discussed more in depth at pages 14-15 of the manuscript. 

      “The nucleocentric approach, which is based on more robust nuclear segmentation, minimizes such mistakes whilst still retaining input information from the structures directly surrounding the nucleus. At higher cell density, the whole-cell body segmentation becomes more error-prone, while also losing morphological information (Suppl. Fig. 1D). The nucleocentric approach is more consistent as it relies on a more robust segmentation and does not blank the surrounding region. This way it also buffers for occasional nuclear segmentation errors (e.g., where blebs or parts of the nucleus are left undetected).”

      It is not entirely clear to me why Figure 5 moves back to "engineered" features after previous figures showed the superiority of the deep learning approach. Especially, where Figure 6 goes again to DL. Dimensionality reduction can be also applied to DL-based classifications (e.g., using the last layer).

      Following up on the reviewer’s interesting comment, we extracted the embeddings from the trained CNN and performed UMAP dimensionality reduction. The results are shown in Fig. 3D, 6F and Supplementary Figure 1B and have been added to the manuscript on pages 6, 8 and 12. 

      We concluded that unsupervised dimensionality reduction using the feature embeddings could separate cell type clusters, where the distance between the clusters reflected the morphological similarity between the cell lines. 

      I would recommend including more comprehensive GRADCAM panels in the SI to reduce the concern of cherry-picking examples. What is the interpretation of the nucleocentric area?

      A more extensive set of GradCAM images has now been included in the supplementary material (Supplementary Figure 3), using the same random seeds for all conditions, thus avoiding any cherry-picking. We interpret the GradCAM maps on the nucleocentric crops as highlighting the structures surrounding the nucleus (reflecting ER, mitochondria, Golgi), indicating their importance in correct cell classification. This was added to the manuscript on pages 9 and 15.

      Missing/lacking details and suggestions in the figure panels and figure legend:

      - Scale bars missing in some of the images shown (e.g., Figure 2F, Figure 3D, Figure 4, Supplementary Figure 4), what are the "composite" channels (e.g., Figure 2F), missing x-label in Figure 3B. 

      These have now been added.

      - Terms that are not clear in the figure and not explained in the legend, such as FITC and cy3 energy (Figure 1C). 

      The figure has been adapted to better show the region, channel and feature. We have now added a Table (Table 5), detailing the definition of each morphological feature that is extracted. On page 27, information on feature extraction is noted.

      - Details that are missing or not sufficiently explained in the figure legends such as what each data point represents and what is Gini importance (Figure 1D) 

      We have added these explanations to the figure legends. The Gini importance, or mean decrease in impurity, reflects how strongly the decision tree splits that use this feature reduce node impurity, averaged across all random forest trees.
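
      As background on this metric: in standard random forest implementations, the Gini importance of a feature sums the impurity decrease of every split that uses that feature (weighted by the fraction of samples reaching the node) and averages the result over all trees. The sketch below computes the impurity decrease of a single split on made-up labels; the function names are illustrative.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class fractions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children

# Made-up node with five cells of each class.
parent = ["a"] * 5 + ["b"] * 5
perfect_split = impurity_decrease(parent, ["a"] * 5, ["b"] * 5)
partial_split = impurity_decrease(parent, ["a"] * 4 + ["b"], ["a"] + ["b"] * 4)
print(perfect_split, partial_split)
```

      A feature that produces cleaner child nodes yields a larger decrease per split, which is why the measure captures more than how often a feature is selected.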

      Is it the std shown in Figure 2C?

      Yes, this has now been added to the legend.  

      It is not fully clear what is single/mixed (Figure 2D)

      Clarification is added to the legend and in the manuscript on page 6.

      explain what is DIV 13-90 in the legend (Figure 5).

      DIV stands for days in vitro, here it refers to the days in culture since the start of the neural induction process. This has been added in the legend.

      and state what are img1-5 (Supplementary Figures 1B-C)

      Clarification has been added to the legend.

      - Supplementary Figure 1. What is the y-axis in panel C and how do the results align with the cell mask in panel B?

      The y-axis represents the intersection over union (IoU). The IoU quantifies the overlap between the ground truth (manually segmented ROI) and the ROI detected by the segmentation algorithm. It is defined as the area of the overlapping region divided by the area of the union of both regions. This clarification has been added to the legend.
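
      For two binary masks, the IoU can be computed as in the following minimal sketch. The function name is illustrative, and returning 1.0 when both masks are empty is a convention chosen here, not something stated in the manuscript.

```python
import numpy as np

def intersection_over_union(mask_true, mask_pred):
    """IoU of two binary masks: overlapping area divided by the area of their union."""
    a = np.asarray(mask_true, dtype=bool)
    b = np.asarray(mask_pred, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (convention)
    return float(np.logical_and(a, b).sum() / union)

# Two 4 x 4 squares on a 10 x 10 grid, shifted by two pixels in each direction.
manual = np.zeros((10, 10), dtype=bool)
manual[2:6, 2:6] = True
detected = np.zeros((10, 10), dtype=bool)
detected[4:8, 4:8] = True

iou = intersection_over_union(manual, detected)
print(f"IoU = {iou:.3f}")
```

      Here the squares share 4 pixels and together cover 28 pixels, so the IoU is 4/28.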

      - Supplementary Figure 1 and Methods. Please explain when CellPose and when StarDist were applied.

      This has been added to the supplementary figure and to the methods (page 24). For nuclear segmentation (nucleus and nucleocentric crops), StarDist was used. For whole-cell crops, cell segmentation with Cellpose was used.

      - Supplementary Figure 4C - the color code is different between nuclear and nucleocentric - this is confusing.

      We have changed the color code to correspond in both conditions in Fig. 1A.

      - Figure 3B - better to have a normalized measure in the x-axis (number of cells per area in um^2)

      We agree and have changed this.

      Suggestions and missing/lacking details in the text:

      • Line #38: "we then applied this" because it is the first time that this term is presented.

      This has been rephrased.

      • Line #88: a few words on what were the features extracted would be helpful.

      A short description has been added to pages 26-27 and a detailed definition of all features has been added in Table 5.

      -  Line #91: PCA analysis - the authors can highlight what (known) features were important to PC1 using the linear transformation that defined it.

      The 5 most important features of PC1 were (in order of decreasing importance): channel 1 dissimilarity, channel 1 homogeneity, nuclear perimeter, channel 4 dissimilarity and nuclear area.  
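
      A ranking of this kind can be obtained from the loadings of the first principal component. The sketch below shows one way to do this on synthetic data; the function name and feature names are hypothetical, and standardizing the features before PCA is an assumption about the preprocessing.

```python
import numpy as np

def top_pc1_features(X, names, k=5):
    """Rank features by the absolute value of their loading on the first principal component."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature (assumed preprocessing)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[0]  # first right singular vector = PC1 direction in feature space
    order = np.argsort(-np.abs(loadings))[:k]
    return [(names[i], float(loadings[i])) for i in order]

# Synthetic data: two features driven by one latent factor, two pure-noise features.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=500),
    -latent + 0.1 * rng.normal(size=500),
    rng.normal(size=500),
    rng.normal(size=500),
])
names = ["feature_0", "feature_1", "feature_2", "feature_3"]

ranking = top_pc1_features(X, names, k=2)
print(ranking)
```

      The sign of a loading only indicates the direction of the contribution along PC1, so the ranking uses absolute values.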

      - Line #92: Order of referencing Supplementary Figure 4 before referencing Supplementary Figure 13.

      The order of the Supplementary images was changed to follow the chronology. 

      • Line #96: Can the authors show the data supporting this claim?

      The unsupervised UMAP shown in fig. 1B is either color coded by cell type (left) or replicate (right). Based on this feature map, we observe clustering along the UMAP1 axis to be associated with the cell type. Variations in cellular morphology associated with the biological replicate are more visible along the UMAP2 axis. When looking at fig. 1C, the feature map reflecting the cellular area shows a gradient along the UMAP1 direction, supporting the assumption that cell area contributes to the cell type separation. On the other hand, the average intensity (Channel 2 intensity) has a gradient within the feature map along the UMAP2 direction. This corresponds to the pattern associated with the inter-replicate variability in panel B.

      - Line #108: what is "nuclear Cy3 energy"?

      This represents the local change of pixel intensities within the ROI in the nucleus in the 3rd channel dimension. This parameter reflects the texture within the nuclear region for the phalloidin and WGA staining. The definitions of all handcrafted features are added in table 5 of the manuscript.

      - Line #110-112: Can the authors show the data supporting this claim?

      The figure has been changed to include the results from a filtered and unfiltered dataframe (exclusion and inclusion of redundant features). Features could be filtered out if the correlation was above a threshold of 0.95. This has been added to page 6 of the manuscript and fig. 1D.  
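
      A correlation-based redundancy filter of this type can be sketched as below. The greedy rule (keep a feature only if its absolute Pearson correlation with every feature already kept is at most the threshold) and the function name are assumptions; the response only states the 0.95 threshold.

```python
import numpy as np

def drop_correlated_features(X, names, threshold=0.95):
    """Greedily keep features whose |Pearson r| with all previously kept features is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # feature-by-feature correlation matrix
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]

# Synthetic data: "a_scaled" is a near-duplicate of "a"; "b" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=200), b])

X_filtered, kept = drop_correlated_features(X, ["a", "a_scaled", "b"])
print(kept)
```

      With a greedy filter, which member of a correlated pair survives depends on the column order, so the order should be fixed when comparing filtered and unfiltered results.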

      - Line #115-116: please state the size of the mask.

      Added to the text (page 6). We used isotropic image crops of 60µm centred on individual cell centroids.

      - Lines 120-122: more details will make this more clear (single vs. mixed).

      This has been changed on page 6 of the manuscript.

      • Line #142: "(mimics)" - is it a typo?

      Tissue mimics refers to organoids/models that are meant to replicate the physiological behaviour.

      • Line #159: the bounding box for nucleocentric analysis is 15x15um (and not 60), as stated in the Methods.

      Thank you for pointing out this mistake. We have adapted this.

      - Line #165: what is the interpretation of what was important for the nucleocentric classification?

      The colour code in GradCAM images is indicative of the attention of the CNN (the more to the red, the more attention). In fig. 4D and Suppl. Fig. 3 the structures directly surrounding the nucleus receive high attention from the CNN trained on nucleocentric crops. This has been added to the manuscript page 9 and 15.

      • Section starting in line #172: not explicitly stated what model was used (nucleocentric?).

      Added in the legend of fig. 5. For these experiments, the full cell segmentation was still used. 

      - Section starting in line #199: why use a feature-based model rather than nucleocentric? A short sentence would be helpful.

      For CNN training, nucleocentric profiling was used. In response to a legitimate question of one of the reviewers, the feature-based UMAP analysis was replaced with the feature embeddings from the CNN. 

      - Line #213: Fig. 5B does not show transitioning cells.

      Thank you for pointing this out, this was a mistake and has been changed.

      Lines #218-220: not fully clear to some readers (culture condition as a weak label), more details can be helpful.

      We changed this at page 11 of the manuscript for clarity. 

      “This gating strategy resulted in a fractional abundance of neurons vs. total (neurons + NPC) of 36.4% in the primed condition and 80.0% in the differentiated condition (Fig. 6C). We therefore refer to the culture condition as a weak label as it does not take into account the heterogeneity within each condition (well).”

      -  Line #230: "increasing dendritic outgrowth" - what does it mean? Can you explicitly highlight this phenotype in Figure 5G?

      When the cells become more mature during differentiation, the cell body becomes smaller and the neurons form long, thin ramifications. This explanation has been added to page 12 of the manuscript.

      • Line #243: is it the nucleocentric CNN?

      Yes.

      • Lines #304-313, the authors might want to discuss other papers dealing with continuous (non-neural) differentiation state transitions (eg PMID: 38238594).  

      A discussion of the use of morphological profiling for longitudinal follow-up of continuous differentiation states has been added to the manuscript at page 18. 

      - Line #444: cellpose or stardist? How did the authors use both?

      Clarification has been added to Supplementary Figure 1 and the methods (page 24). StarDist was used for nuclear segmentation, whereas Cellpose was used for whole-cell segmentation. 

      • Line #470-474: I would appreciate seeing the performance on the full dataset without exclusions.

      Cells have been excluded based on 3 arguments: the absence of DAPI intensity, too small nuclear size and absence of ground truth staining. The first two arguments are based on the assumption that ROIs that contain no DAPI signal or are too small are errors in cell segmentation and therefore should not be taken along in the analysis. The third filtering step was based on the ground-truth IF signal. Not filtering out these cells with a ‘dubious’ IF profile (e.g., cells that might be transitioning or are of a different type) would negatively affect the model by introducing noise. It is correct that the predictions are based only on these inputs and so cells of a subsequent test set will only be classified according to these labels which might introduce bias. However, the model could predict increase in neuron/NPC ratio with culture age in absence of ground-truth staining (and thus IF-based filtering).

      Reviewer #2 (Recommendations For The Authors):

      Figure 1A: it would be interesting to the reader to see the SH-SY5Y data as well.

      This has been added in fig. 1A.

      Figure 3A: 95-100% image: showing images with the same magnification as the others would help to appreciate the cell density.

      Now fig. 4A. The figure has been changed to make sure all images have the same magnification. 

      Figure Supp 4 (line 132) is referred to before Figure Supp1 (line 152).

      The image order and numbering has been changed to solve this issue.

      Figure Supp 2 & 3 are not referred to in the text.

      This has been adjusted.

      Line 225: a statistical test would help to convince of the accuracy of these results (Figure 5C vs Figure 5F)?

      These figures represent the total ROI counts and thus represent a single number.

      Line 227: Could you explain to the reader, in a few words, what a dual SMAD inhibition is?

      This has been added to the manuscript at page 20. 

      “This dual blockade of SMAD signalling in iPSCs induces neural differentiation by synergistically causing the loss of pluripotency and a push towards the neuroectodermal lineage.”

      Reviewer #3 (Recommendations For The Authors):

      I have a few concerns and several comments that, if addressed, may strengthen conclusions, and increase clarity of an already technically sound paper.

      Concerns

      • The results presented in Figure 3 panel D, may indicate a critical error in data processing and interpretation that the authors must address. The GradCAM method highlights the background as having the highest importance. While it can be argued in the nucleocentric profiling method that GradCAM focuses on the nuclear membrane, the background is highly important even for the nuclear profiling method, which should provide little information. What procedure did the authors use for mask subtraction prior to CNN training? Could the segmentation algorithm be performing differently between cell lines? The authors interpret the GradCAM results to indicate a proxy for nuclear size, but then why did the CNN perform so much better than random forest using hand-crafted features that include this variable? The authors should also present size distributions between cell lines (and across seeding densities, in case one of the cell lines has different compaction properties with increasing density).

      Perhaps clarifying this sentence (lines 166-168) would help as well: "As nuclear area dropped with culture density, the dynamic range decreased, which could explain the increased error rate of the CNN for high densities unrelated to segmentation errors (Suppl. Fig. 4B)." What do the authors mean by "dynamic range" and it is not clear how Supplementary Figure 4B provides evidence for this? 

      The dynamic range refers to the difference between the minimum and maximum nuclear area. We expect this difference to decrease at higher density owing to the crowding that forces all nuclei to take on a more similar (smaller) size.

      More clarification on this has been added to page 9 of the manuscript.

      I certainly understand that extrapolating the GradCAM concern to the remaining single-cell images using only four (out of tens of thousands of options) is also dangerous, but so is "cherry-picking" these cells to visualize. Finally, I also recommend that the authors quantitatively diagnose the extent of the background influence according to GradCAM by systematically measuring background influence in all cells and displaying the results per cell line per density.

      To avoid cherry-picking of GradCAM images, we have now randomly selected 10 images for each condition and density (using the same random seeds throughout) and added these in Suppl. Fig. 3.

      In answer to this concern, we refer to the response above: 

      “To address the first point, we have adapted the GradCAM images to show an overlay of the input crop and GradCAM heatmap to give a better view of the structures that are highlighted by the CNN. We further investigated the influence of the background on the prediction performance. Our finding that a CNN trained on a monoculture retains a relatively high performance on cocultures implies that the CNN uses the salient characteristics of a cell to recognize it in more complex heterogeneous environments. Assuming that the background can vary between experiments, the prediction of a pretrained CNN on a new dataset indicates that cellular characteristics are used for robust prediction.  When inspecting GradCAM images obtained from the nucleocentric CNN approaches (now added in Suppl. Fig. 3), we noticed that the nuclear periphery typically contributed the most (but not exclusively) to the prediction performance. When using only the nuclear region as input, GradCAMs were more strongly (but again not exclusively) directed to the background surrounding the nuclei. To train the latter CNN, we had cropped nuclei and set the background to a value of zero. To rule out that this could have introduced a bias, we have now performed the exact same training and classification, but setting the background to random noise instead (Suppl. Fig. 2). While this effectively diverted the attention of the GradCAM output to the nucleus instead of the background, the prediction performance was unaltered. We therefore assume that irrespective of the background, when using nuclear crops as input, the CNN is dominated by features that describe nuclear size. We observe that nuclear size is significantly different in both cell types (although intranuclear features also still contribute) which is also reflected in the feature map gradient in the first UMAP dimension (Suppl. Fig. 2). This notion has been added to the manuscript (page 9) and Suppl. Fig. 2.”

      • The data supporting the conclusion about nucleocentric profiling outperforming nuclear and full-cell profiling is minimal. I am picking on this conclusion in particular, because I think it is a super cool and elegant result that may change how folks approach issues stemming from cell density disproportionately impacting profiling. Figures 3B and 3C show nucleocentric slightly outperforming full cell, and the result is not significant. The authors state in lines 168-170: "Thus, we conclude that using the nucleocentric region as input for the CNN is a valuable strategy for accurate cell phenotype identification in dense cultures." This is somewhat of a weak conclusion, that, with additional analysis, could be strengthened and add high value to the community. Additionally, the authors describe the nucleocentric approach insufficiently. In the methods, the authors state (lines 501-503): "Cell crops (60μm whole cell - 15μm nucleocentric/nuclear area) were defined based on the segmentation mask for each ROI." This is not sufficient to reproduce the method. What software did the authors use?

      Presumably, 60μm refers to a box size around cytoplasm? Much more detail is needed. Additionally, I suggest an analysis to confirm the impact of nucleocentric profiling, which would strengthen the authors' conclusions. I recommend systematically varying the subtraction (-30μm, -20μm, -10μm, 5μm, 0, +5μm, +10μm, etc.) and reporting the density-based analysis in Figure 3B per subtraction. I would expect to see some nucleocentric "sweet spot" where performance spikes, especially in high culture density. If we don't see this difference, then the non-significant result presented in Figures 3B and C is likely due to random chance. The authors mention "iterative data erosion" in the abstract, which might refer to what I am recommending, but do not describe this later.

      More detail was added to the methods describing the image crops given as input to the CNN (page 28 of the manuscript). 

      “Crops were defined based on the segmentation mask for each ROI. The bounding box was cropped out of the original image with a fixed patch size (60µm for whole cells, 18µm for nucleus and nucleocentric crops) surrounding the centroid of the segmentation mask. For the whole cell and nuclear crops, all pixels outside of the segmentation mask were set to zero. This was not the case for the nucleocentric crops. Each ROI was cropped out of the original morphological image and associated with metadata corresponding to its ground truth label.”
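
      The cropping rule quoted above can be illustrated with the following sketch. It is not the repository code: the function name is hypothetical, the pixel size of 0.325 µm is taken from the resolution stated earlier in this response, and zero-padding at the image border is an assumption made to keep the patch size fixed.

```python
import numpy as np

def crop_around_centroid(image, mask, patch_um, pixel_size_um=0.325, blank_outside_mask=False):
    """Cut a square patch of roughly patch_um micrometres centred on the mask centroid.

    image: (H, W) or (H, W, C) array; mask: (H, W) boolean mask of a single ROI.
    blank_outside_mask=True  mimics whole-cell and nuclear crops (outside pixels set to 0);
    blank_outside_mask=False mimics nucleocentric crops (surroundings kept).
    """
    half = int(round(patch_um / pixel_size_um / 2))
    mask = np.asarray(mask, dtype=bool)
    rows, cols = np.nonzero(mask)
    cy, cx = int(round(rows.mean())), int(round(cols.mean()))
    src = image.astype(float)  # astype returns a copy, so image is left untouched
    if blank_outside_mask:
        src[~mask] = 0.0
    # Zero-pad so that patches near the border keep the fixed size (assumption).
    pad = [(half, half), (half, half)] + [(0, 0)] * (src.ndim - 2)
    padded = np.pad(src, pad, mode="constant")
    # After padding, original pixel (cy, cx) sits at (cy + half, cx + half).
    return padded[cy:cy + 2 * half, cx:cx + 2 * half]

# Synthetic image with a disk-shaped "nucleus" of radius 20 pixels at the centre.
rng = np.random.default_rng(0)
image = rng.uniform(1.0, 2.0, size=(300, 300))
yy, xx = np.mgrid[:300, :300]
nucleus = (yy - 150) ** 2 + (xx - 150) ** 2 <= 20 ** 2

nucleocentric = crop_around_centroid(image, nucleus, patch_um=18)
nuclear = crop_around_centroid(image, nucleus, patch_um=18, blank_outside_mask=True)
print(nucleocentric.shape, nuclear.shape)
```

      The two crops have the same size and centre; they differ only in whether the perinuclear pixels are kept, which is the distinction between the nucleocentric and nuclear inputs.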

      To address this concern, we also refer to the answer above. 

      “We have performed a more extensive analysis in which the patch size was varied from 0.6 to 120µm around the nuclear centroid (Fig. 4E and page 9 of the manuscript). We observed that there is little effect of in- or decreasing patch size on the average F-score within the nuclear to cell window, but that the imbalance between the precision and recall increases towards the larger box sizes (>18µm). Under our experimental conditions, the input numbers per class were equal, but this will not be the case in situations where the ground truth is unknown (and needs to be predicted by the CNN). Therefore, a well-balanced CNN is of high importance. This notion has been added to page 12 of the manuscript.

      The main advantage of nucleocentric profiling over whole-cell profiling in dense cultures is that it relies on a more robust nuclear segmentation method and is less sensitive to differences in cell density (Suppl. Fig. 1D). In other words, in dense cultures, the segmentation mask will contain similar regional input as the nuclear mask and the nucleocentric crop will contain more perinuclear information which contributes to the prediction accuracy. Therefore, at high densities, the performance of the CNN on whole-cell crops decreases owing to poorer segmentation performance. A CNN that uses nucleocentric crops will be less sensitive to these errors. This notion has been added to pages 14-15 of the manuscript.”
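
      To make the point about precision/recall balance concrete, the sketch below computes both quantities and their harmonic mean from per-class counts. It assumes that the "F-score" in the response is the F1 score (equal weighting of precision and recall); the response does not state this explicitly, and the function name is illustrative.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one class from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall (assumed definition of "F-score").
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Two hypothetical classifiers: one balanced, one trading precision for recall.
balanced = precision_recall_f1(tp=80, fp=20, fn=20)
skewed = precision_recall_f1(tp=95, fp=45, fn=5)
```

      Two models can have similar F1 values while differing strongly in how precision and recall contribute, which is why the balance is reported alongside the average score.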

      Comments

      • There is a disconnect between the abstract and the introduction. The abstract highlights the nucleocentric model, but then it is not discussed in the introduction, which focuses on quality control. The introduction would benefit from some additional description of the single-cell or whole-image approach to profiling.

      We highlight the importance of QC of complex iPSC-derived neural cultures as an application of morphological profiling. We used single-cell profiling to facilitate cell identification in these mixed cultures where the whole-image approach would be unable to deal with the heterogeneity within the field of view. In the introduction, we added a description of the whole-image vs. single-cell approach to profiling (page 4). In the discussion (page 18), we further highlight the application of this single-cell profiling approach for QC purposes. 

      - Comments on Figure 1. It is unclear how panel B shows "without replicate bias". 

      In response to this comment, we refer to the answer above: “The unsupervised UMAP shown in fig. 1B is either color coded by cell type (left) or replicate (right). Based on this feature map, we observe clustering along the UMAP1 axis to be associated with the cell type. Variations in cellular morphology associated with the biological replicate are more visible along the UMAP2 axis. When looking at fig. 1C, the feature map reflecting the cellular area shows a gradient along the UMAP1 direction, supporting the assumption that cell area contributes to the cell type separation. On the other hand, the average intensity (Channel 2 intensity) has a gradient within the feature map along the UMAP2 direction. This corresponds to the pattern associated with the inter-replicate variability in panel B.” We added this notion to page 5 of the manuscript.

      The paper would benefit from a description of how features were extracted sooner.

      Information on the feature extraction was added to the manuscript at page 27. An additional table (table 5) has been added with the definition of each feature.  

      - Comments on Supplementary Figure 4. The clustering with PCA is only showing 2 dimensions, so it is not surprising UMAP shows more distinct clustering.

      We used two components for UMAP dimensionality reduction, so the data was also visualized in two dimensions. However, we agree that UMAP can show more distinct clustering as this method is non-linear.

      Why is Figure S4 the first referenced Supplementary Figure?

      This has been changed. 

      • Comments on Figure 2. Need discussion of the validation set - how was it determined? Panel E might have the answer I am looking for, but it is difficult to decipher exactly what is being done. The terminology needs to be defined somewhere, or maybe it is inconsistent. It is tough to tell. For example, what exactly are the two categories of model validation (cross-validation and independent testing)?

      Additional clarification has been added to the manuscript at pages 6-7 and figure 2.

      The metric being reported is accuracy for the independent replicate if the other two are used to train?

      Yes. 
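
      The independent-testing scheme confirmed here (train on two replicates, report accuracy on the third) can be sketched as a leave-one-replicate-out split. The function name and the integer replicate labels are assumptions for illustration only.

```python
def leave_one_replicate_out(replicate_ids):
    """Yield (held_out, train_idx, test_idx) once per distinct replicate.

    replicate_ids: one replicate label per cell/ROI, in dataset order.
    The held-out replicate is used only for testing; all others are used for training.
    """
    for held_out in sorted(set(replicate_ids)):
        train_idx = [i for i, r in enumerate(replicate_ids) if r != held_out]
        test_idx = [i for i, r in enumerate(replicate_ids) if r == held_out]
        yield held_out, train_idx, test_idx

# Six hypothetical ROIs from three biological replicates.
splits = list(leave_one_replicate_out([1, 1, 2, 2, 3, 3]))
```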

      Panel C is a very cool analysis. Panel F needs a description of how those images were selected, randomly?

      Added in the methods section (page 29). GradCAM analysis was used to visualize the regions used by the CNN for classification. This map is specific to each cell. Images are selected randomly out of the full dataset for visualization.

      They also need scale bars.

      Added to the figures. 

      Panel G would benefit from explicit channel labels (at least a legend would be good!).

      Explanation has been added to the legend. All color code and channel numbering are consistent with fig. 1A. 

      What do the dots and boxplots represent? The legend says, "independent replicates", but independent replicates of, I assume, different model initializations?

      Clarification has been added to the figure legends. For plots showing the performance of a CNN or RF classifier, each dot represents a different model initialization. Each classifier has been initialized at least 3 times. When indicated, the model training was performed with different random seeds for data splitting.

      • Comments on Figure 3. Panel A needs scale bar. See comment on Panel D in concern #1 described above. 

      This has been added.

      • Comments on Supplementary Figure 1. A reader will need a more detailed description in panel C. I assume that the grey bar is the average of the points, and the points represent different single cells?

      How many cells? How were these cells selected? 

      This information on the figure (now Suppl. Fig. 1D), has been added to the legend.

      “Left: Representative images of 1321N1 cells with increasing density alongside their cell and nuclear mask produced using Cellpose and StarDist, respectively. Images are numbered from 1-5 with increasing density. Upper right: The number of ROIs detected in comparison to the ground truth (manual segmentation). A ROI was considered undetected when the intersection over union (IoU) was below 0.15. Each bar refers to the image number on the left. The IoU quantifies the overlap between ground truth (manually segmented ROI) and the ROI detected by the segmentation algorithm. It is defined as the area of the overlapping region over the total area. IoU for increasing cell density for cell and nuclear masks is given in the bottom right. Each point represents an individual ROI. Each bar refers to the image number on the left.”
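
      The IoU criterion in this legend can be written out as a short sketch on boolean masks. It assumes that "total area" means the union of the two masks (the usual IoU definition) and uses the 0.15 threshold from the legend; the function names are illustrative.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks of the same shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: treat as no overlap
    return float(np.logical_and(a, b).sum() / union)

def is_detected(gt_mask, pred_mask, threshold=0.15):
    """An ROI counts as detected when its IoU with the ground truth reaches the threshold."""
    return iou(gt_mask, pred_mask) >= threshold
```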

      • Comments on Figure 4. More details on quenching are needed for a general audience. The markers chosen (EdU and BrdU) are generally not specific to cell type but to biological processes (proliferation), so it is confusing how they are being used as cell-type markers. 

      The base analogues were incorporated into each cell line prior to mixing them, i.e. when they were still growing in monoculture, so they could be labelled and identified after co-seeding and morphological profiling. Additional clarification has been added to the manuscript (page 26).

      It is also unclear why reducing CV is an important side-effect of finetuning. CV of what? The legend says, "model iterations", but what does this mean? 

      The dots in the violinplot are different CNN initializations. A lower variability between model initializations is an indicator of certainty of the results. Prior to finetuning, the results of the CNN were highly variable leading to a high CoV between the different CNNs. This means the outcome after finetuning is more robust.
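
      The variability measure discussed here can be sketched as the coefficient of variation across model initializations. The use of the sample standard deviation and the example scores are assumptions; the response does not give the exact formula or values.

```python
from statistics import mean, stdev

def coefficient_of_variation(scores):
    """Sample standard deviation divided by the mean of per-initialization scores."""
    return stdev(scores) / mean(scores)

# Hypothetical accuracies of three CNN initializations before and after fine-tuning.
before = coefficient_of_variation([0.55, 0.80, 0.65])
after = coefficient_of_variation([0.90, 0.92, 0.91])
```

      A lower value after fine-tuning would indicate that the result depends less on the particular initialization.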

      • Comments on Figure 5. This is a very convincing and well-described result, kudos! This provides another opportunity to again compare other approaches (not just nucleocentric). Additionally, since the UMAP space uses hand-crafted features, the authors could consider interpreting the specific morphology features impacted by the striking gradual shift to neuron population by fitting a series of linear models per individual feature. This might confirm (or discover) how exactly the cells are shifting morphology.

      The supervised UMAP on the handcrafted features did not highlight any features contributing to the separation. Using the supervised UMAP, the clustering is dominated by the known cell type. Unsupervised UMAP on the handcrafted features does not show any clustering. In response to a previous comment, we adapted the figure to show UMAP dimensionality reduction using the feature embeddings from the cell-based CNN. This unsupervised UMAP does show good cell type separation, but it does not use any directly interpretable shape descriptors.

      • General comments on Methods. The section on "ground truth alignment" needs more details. Why was this performed? 

      Following sequential staining and imaging rounds, multiple images were captured representing the same cell with different markers. Lifting the plate off the microscope stage and imaging in sequential rounds after several days results in small linear translations in the exact location of each image. These linear translations need to be corrected to align (or register) morphological with ground truth image data within the same ROI. This notion has been added to the manuscript at page 26. 
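
      One common way to estimate such a translation is phase correlation, sketched below with NumPy only. The response does not say which registration method the authors used, so this is an illustrative stand-in; it assumes a pure integer-pixel shift and treats the image as periodic.

```python
import numpy as np

def estimate_translation(reference, moving):
    """Estimate the integer (dy, dx) shift that aligns `moving` to `reference`.

    The returned shift is the one to apply to `moving`, e.g. with
    np.roll(moving, (dy, dx), axis=(0, 1)).
    """
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    cross /= np.abs(cross) + 1e-12           # keep phase only (phase correlation)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    shift = [p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape)]
    return int(shift[0]), int(shift[1])

# Synthetic check: shift a random image and recover the correction.
rng = np.random.default_rng(0)
ref = rng.random((32, 32))
mov = np.roll(ref, (3, -5), axis=(0, 1))
dy, dx = estimate_translation(ref, mov)
aligned = np.roll(mov, (dy, dx), axis=(0, 1))
```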

      Handcrafted features extracted using what software? 

      The complete analysis was performed in Python. All packages used are listed in table 4. Handcrafted features were extracted using the scikit-image package (regionprops and GLCM functions). This has been added to the manuscript at page 27.

      Software should be cited more often throughout the manuscript. 

      Lastly, the GitHub URL points to the DeVosLab organization, but should point to a specific repository. Therefore, I was unable to review the provided code. A well-documented and reproducible analysis pipeline should be included.

      A test dataset and source code are available on GitHub:  https://github.com/DeVosLab/Nucleocentric-Profiling

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Comment 1. In Figure 1, the MafB antibody (Sigma) was used to identify Renshaw cells at P5. However, according to the supplementary Figure 3D, the specificity of the MafB antibody (Sigma) is relatively low. The image of MafB-GFP, V1-INs, and MafB-IR at P5 should be added to the supplementary figure. The specificity of MafB-IR-Sigma in V1 neurons at P5 should be shown. This image also might support the description of the genetically labeled MafB-V1 distribution at P5 (page 8, lines 28-32). 

      We followed the reviewer’s suggestion and moved analyses of the MafB-GFP mouse to a supplemental figure (Fig S3). The characterization of MafB immunoreactivities is now in supplemental Figure S2 and the related text in results was also moved to supplemental to reduce technicalities in the main text. We added confocal images of MafB-GFP V1 interneurons at P5 showing immunoreactivities for both MafB antibodies, as suggested by the reviewer (Fig S2A,B). We agree with the reviewer that this strengthens our comparisons on the sensitivity and specificity of the two MafB antibodies used in this study. 

      As explained in the preliminary response we cannot show lack of immunoreactivity for MafB antibodies in MafB GFP/GFP knockout mice at P5 because MafB global KOs die at birth. This is why we used tissues from late embryos to check MafB immunoreactivities (Figure S2C and S2D). We made this point clearer in the text and supplemental figure legends.

      Comment 2. The proportion of genetically labeled FoxP2-V1 in all V1 is more than 60%, although immunolabeled FoxP2-V1 is approximately 30% at P5. Genetically labeled Otp-V1 included other nonFoxP2 V1 clades (Fig. 8L-M). I wonder whether genetically labeled FoxP2-V1 might include the other three clades. The authors should show whether genetically labeled FoxP2-V1 expresses other clade markers, such as pou6f2, sp8, and calbindin, at P5. 

      We included the requested data in Figure 3E-G. Lineage-labeled Foxp2-V1 neurons in our genetic intersection do not include cells from other V1-clades.

      Reviewer 2:

      Comment 1. The current version of the paper is VERY hard to read. It is often extremely difficult to "see the forest for the trees" and the reader is often drowned in methodological details that provide only minor additions to the scientific message. Non-specialists in developmental biology, but still interested in the spinal cord organization, especially students, might find this article challenging to digest and there is a high risk that they will be inclined to abandon reading it. The diversity of developmental stages studied (with possible mistakes between text and figures) adds a substantial complexity in the reading. It is also not clear at all why authors choose to focus on the Foxp2 V1 from page 9. Naively, the Pou6f2 might have been equally interesting. Finally, numerous discrepancies in the referencing of figures must also be fixed. I strongly recommend an in-depth streamlining and proofreading, and possibly moving some material to supplement (e.g. page 8, and elsewhere).

      The whole text was re-written and streamlined with most methodological discussion (including the section referred to by the reviewer) transferred to supplemental data. Nevertheless, enough details on samples, stats and methods were retained to maintain the rigor of the manuscript. 

      The reasons justifying a focus on Foxp2-V1 interneurons were fully explained in our preliminary response. Briefly, we are trying to elucidate V1 heterogeneity, and prior data showed that this is the most heterogeneous V1 clade (Bikoff et al., 2016), so it makes sense it was studied further. We agree that the Pou6f2 clade is equally interesting and is in fact the subject of several ongoing studies.

      Comment 2. … although the different V1 populations have been investigated in detail regarding their development and positioning, their functional ambition is not directly investigated through gain or loss of function experiments. For the Foxp2-V1, the developmental and anatomical mapping is complemented by a connectivity mapping (Fig 6s, 8), but the latter is fairly superficial compared to the former. Synapses (Fig 6) are counted on a relatively small number of motoneurons per animal, that may, or may not, be representative of the population. Likewise, putative synaptic inputs are only counted on neuronal somata. Motoneurons that lack axo-somatic contacts may still be contacted distally. Hence, while this data is still suggestive of differences between V1 pools, it is only little predictive of function.

      We fully answered the question on functional studies in the preliminary response. Briefly, we are currently conducting these studies using various mouse models that include chronic synaptic silencing using tetanus toxin, acute partial silencing using DREADDs, and acute cell deletion using diphtheria toxin. Each intervention reveals different features of Foxp2-V1 interneuron functions, and each model requires independent validation. Moreover, these studies are being carried out at three developmental stages: embryos, early postnatal period of locomotor maturation and mature animals. Obviously, this is all beyond the goals and scope of the present study. The present study is however the basis for better informed interpretations of results obtained in functional studies.

      Regarding the question on synapse counts, we fully explained in the preliminary response why we believe our experimental designs for synapse counting at the confocal level are among the most thorough that can be found in the literature. We counted a very large number of motoneurons per animal when adding all motor columns and segments analyzed in each animal. Statistical power was also enough to detect fundamental variation in synaptic density among motor columns.

      We focus our analyses on motoneuron cell bodies because analysis of full dendritic arbors on all motor columns present throughout all lumbosacral segments is not feasible. Please see Rotterman et al., 2014 (J. of Neuroscience; doi: 10.1523/JNEUROSCI.4768-13.2014) for evaluation of what this entails for a single motoneuron. We agree with the reviewer that analyses of V1 synapses over full dendrite arbors in specific motoneurons will be very relevant in further studies. These should be carried out now that we know which motor columns are of high interest. Nevertheless, inhibitory synapses exert the most efficient modulation of neuronal firing when they are on cell bodies, and our analyses clearly suggest a difference in cell body inhibitory synapse targeting between different V1 interneuron types that we find very relevant.

      Comment 3. I suggest taking with caution the rabies labelling (Figure 8). It is known that this type of Rabies vectors, when delivered from the periphery, might also label sensory afferents and their postsynaptic targets in the cord through anterograde transport and transneuronal spread (e.g., Pimpinella et al., 2022). Yet I am not sure authors have made all controls to exclude that labelled neurons, presumed here to be premotoneurons, could rather be anterogradely labelled from sensory afferents. 

      Over the years, we performed many extensive controls and validation of rabies virus transsynaptic tracing methods. These were presented at two SfN meetings (Gomez-Perez et al., 2015 and 2016; Program Nos. 242.08 and 366.06). Our validation of this technique was fully explained in our preliminary response. We also pointed out that the methods used by Pimpinella et al. have a very different design and therefore their results are not comparable to ours. In this study we injected the virus at P15 into leg muscles, and not directly into the spinal cord. In our hands, and as cited in Pimpinella et al., the rabies virus loses tropism for primary afferents with age when injected in muscle. The lack of primary afferent labeling in key lumbosacral segments (L4 and L5) is now illustrated in a new supplemental figure (Figure S6). This figure also shows some starter motoneurons. As explained in the text and in our previous response, these are few in number because of the reduced infection rate when using this method in mature animals (after P10).  

      Comment 4. The ambition to differentiate neuronal birthdate at a half-day resolution (e.g., E10 vs E10.5) is interesting but must be considered with caution. As the author explains in their methods, animals are caged at 7pm, and the plug is checked the next morning at 7 am. There is hence a potential error of 12h. 

      We agree with the reviewer, and we previously explicitly discussed these temporal resolution caveats. We have now further expanded on this in new text (see middle paragraph in page 5). Nevertheless, the method did reveal the temporal sequence of neurogenesis of V1 clades with close to 12-hour resolution.

      As explained in text and preliminary response this is because we analyzed a sufficient number of animals from enough litters and utilized very stringent criteria to count EdU positives. 

      Moreover, our results fit very well with current literature. The data agree with previous conclusions from Andreas Sagner's group (Institut für Biochemie, Friedrich-Alexander-Universität Erlangen-Nürnberg) on spinal interneuron (including V1) birthdates based on a different methodology (Delile J et al. Development. 2019 146(12):dev173807. doi: 10.1242/dev.173807. PMID: 30846445; PMCID: PMC6602353). In the discussion we compared in detail both the data and methods between the Delile article and our results. We also cite the Sagner 2024 review as requested later in the reviewer’s detailed comments. Our results also confirmed our previous report on the birthdates of V1-derived Renshaw cells and Ia inhibitory interneurons (Benito-Gonzalez A, Alvarez FJ J Neurosci. 2012 32(4):1156-70. doi: 10.1523/JNEUROSCI.3630-12.2012. PMID: 22279202; PMCID: PMC3276112). Finally, we recently received a communication notifying us that our neurogenesis sequence of V1s has been replicated in a different vertebrate species by Lora Sweeney’s group (Institute of Science and Technology Austria; direct email from this lab) and we shared our data with them for comparison. This manuscript is currently close to submission. Therefore, we are confident that despite the limitations of EdU birthdating we discussed, the conclusions we offered are strong and are being validated by other groups using different methods and species. We also want to acknowledge the positive comments of reviewer 3 regarding our birthdating study, indicating it is one of the most rigorous he or she has ever seen.

      Reviewer 3:

      Comment 1. My only criticism is that some of the main messages of the paper are buried in technical details. Better separation of the main conclusions of the paper, which should be kept in the main figures and text, and technical details/experimental nuances, which are essential but should be moved to the supplement, is critical. This will also correct the other issue with the text at present, which is that it is too long.

      Similar to our response to comment 1 from Reviewer 2 we followed the reviewers’ recommendations and greatly summarized, simplified and removed technical details from the main text, trying not to decrease rigor.  

      Reviewer #1 (Recommendations For The Authors):

      In Figure 1, the definition of the area to analyze MafB ventral and MafB dorsal is unclear. It should be described.

      This has been clarified in both text and supplemental figure S3.

      “We focused the analyses on the brighter dorsal and ventral MafB-V1 populations defined by boxes of 100 µm dorsoventral width at the level of the central canal (dorsal) or the ventral edge of the gray matter (ventral) (Supplemental Figure S3B).”

      Problems with figure citation.

      We apologize for the mistakes. All have been corrected. 

      Reviewer #2 (Recommendations For The Authors):

      As indicated in the public review, I'd recommend to substantially revise the writing, for clarity. As such, the paper is extremely hard to read. I would also recommend justifying the focus on Foxp2 neurons.

      Also, the scope of the present paper is not clearly stated in the introduction (page 4).

      Done. We also modified the introduction such that the exact goals are more clearly stated.

      I would also recommend toning down the interpretation that V1 clades constitute "unique functional subsets" (discussion and elsewhere). Functional investigation is not performed, and connectomic data is partial and only very suggestive.

      We include the following sentence at the end of the 1st paragraph in the discussion:

      “This result strengthens the conclusion that these V1 clades defined by their genetic make-up might represent distinct functional subtypes, although further validation is necessary in more functionally focused studies.”

      Different post-natal stages are used for different sections of the manuscript. This is often confusing, please justify each stage. From the beginning even, why is the initial birthdating (Figure 1) done here at p5, while the previous characterization of clades was done at p0? I am not sure to understand the justification that this was chosen "to preserve expression of V1 defining TFs". Isn't the sooner the better?

      The birthdating study was carried out at P5. P5 is a good time point because there is little variation in TF expression compared to P0, as demonstrated in the results. Furthermore, later tissue harvesting allows higher replicability since it is difficult to consistently harvest tissue the day a litter is born (P0). Also technically, it is easier to handle P5 tissue compared to P0. The analysis of VGLUT1 synapses was also done at P5 rather than later ages. This has two advantages: TF immunoreactivities are preserved at this age, and also corticospinal projections have not yet reached the lumbar cord, reducing interpretation caveats on the origins of VGLUT1 synapses in the ventral horn (although VGLUT1 synapses are still maturing at this age, see below).

      Other parts of the study focus on different ages selected to be most adequate for each purpose. To best study synaptic connectivity, it is best to study mature spinal cords after synaptic plasticity of the first week. For the tracing study we thoroughly explain in the text the reasons for the experimental design (see also below in detailed comments). For counting Foxp2-V1 interneurons and comparing them to motor columns we analyze mature animals. For testing our lineage labeling we use animals of all ages to confirm the consistency of the genetic targeting strategy throughout postnatal development and into adulthood.

      Figure 5: wouldn't it be worth quantifying and illustrating cellular densities, in addition to the average number of Foxp2 neurons, across lumbar segments (panel D & E)? Indeed, the size of - and hence total number of cells within - each lumbar segment might not be the same, with a significant "enlargement" from L2 to L4 (this is actually visible on the transverse sections). Hence, if the total number of cells is higher in these enlarged segments, but the total number of Foxp2-V1 is not, it may mean that this class is proportionally less abundant.

      We believe the critical parameter is the ratio of Foxp2-V1s to motoneurons. This informs how Foxp2-V1 interneurons vary according to the size of the motor columns and the number of motoneurons overall.

      The question asked by the reviewer would best be answered by estimating the proportion of Foxp2-V1 neurons to all NeuN labeled interneurons. This is because interneuron density in the spinal cord varies in different segments. We are not sure what this additional analysis will contribute to the paper.

      Why, in the Rabies tracing scheme (Fig 8), the Rabies injection is performed at p15? As the authors explain in the text, rabies uptake at the neuromuscular junction is weak after p10. It is not clear to me why such experiments weren't done all at early postnatal stages, with a "classical" co-injection of TVA and Rabies.

      First, we do not need TVA in this experiment because we are using B19-G coated virus and injecting it into muscles, not into the spinal cord directly.

      Second, enhanced tracing occurs when the AAV is injected a few days before rabies virus. This is because AAV transgene expression is delayed with respect to rabies virus infection and replication. We have performed full time courses and presented these data in one abstract to SfN: Gomez-Perez et al., 2015 Program Nos. 242. We believe full description of these technical details is beyond the scope of this manuscript that has already been considered too technical.

      Third, the justification of P15 timing of injections for anterograde primary afferent labeling and retrograde monosynaptic labeling of interneurons is fully explained in the text. 

      “To obtain transcomplementation of RVDG-mCherry with glycoprotein in LG motoneurons, we first injected the LG muscle with an AAV1 expressing B19-G at P4. We then performed RVDG and CTB injections at P15 to optimize muscle targeting and avoid cross-contamination of nearby muscles. Muscle specificity was confirmed post-hoc by dissection of all muscles below the knee. Analyses were done at P22, a timepoint after developmental critical windows through which Ia (VGLUT1+) synaptic numbers increase and mature on V1-IaINs (Siembab et al., 2010)” 

      Furthermore, CTB starts to decrease in intensity 7 days after injection because of intracellular degradation, and rabies virus labeling disappears because of cell death. Both limit the post-injection time available for analyses.

      Likewise, I am surprised not to see a single motoneuron in the rabies tracing (Fig 8), neither on histology nor on graphs. How can authors be certain that there was indeed rabies uptake from the muscle at this age, and that all labelled cells, presumed to be preMN, are not actually sensory neurons? It is known that Rabies vectors, when delivered from the periphery, might also label sensory afferents and their post-synaptic targets through anterograde transport and transneuronal spread (e.g., Pimpinella et al., 2022). This potential bias must be considered.

      This is fully explained in our previous response to the second reviewer’s general comments. We have also added a confocal image showing starter motoneurons as requested (Figure S6A).

      Please carefully inspect the references to figures and figure panels, which I suspect are not always correct.

      Thank you. We carefully revised the manuscript to correct these deficiencies and we apologize for them.

      Reviewer #3 (Recommendations For The Authors):

      Figure 1: Data here is absolutely beautiful and provides one of the most thorough studies, in terms of timepoints, number of animals analyzed, and precision of analysis, of EdU-based birth timing that has been published for neuron subtypes in the spinal cord so far. My only suggestion is to color code the early and late born populations (in for example, different shades of green for early; and blue for late, to better emphasize the differences between them). It is very difficult to differentiate between the purple, red and black colors in G-I, which this would also fix. The antibody staining for Pou6f2 (F) is also difficult to see; gain could be increased on these images or insets added for clarity.

      The choice of colors is adapted for optimal visualization by people with different degrees of color blindness. Shades of individual colors are always more difficult to discriminate. This is personally verified by the senior corresponding author of this paper who has some color discrimination deficits. Moreover, each line has a different symbol for the same purpose of easing differentiation.

      Figure 2: This is also a picture-perfect figure showing further diversity by birth time even within a clade. One small aesthetic comment is that the arrows are quite unclear and block the data. Perhaps the contours themselves could be subdivided by region and color coded by birth time-such that for example the dorsal contours that emerge in the MafB clade at E11 are highlighted in their own color. Some quantification of the shift in distribution as well as the relative number of neurons within each spatially localized group would also be useful. For MafB, for example, it looks as though the ventral cells (likely Renshaw) are generated at all times in the contour plots; in the dot plots however, it looks like the most ventral cells are present at e10.5. This is likely because the contours are measuring fractional representations, not absolute number. An independent measure of absolute number of ventral and dorsal, by for example, subdividing the spinal cord into dorsoventral bins, would be very useful to address this ambiguity.

      We believe density plots already convey the message of the shift in positions with birthdate. We are not sure how we can quantify this more accurately than showing the differences in cellular density plots. We used dorsoventral and mediolateral binning in our first paper decades ago (Alvarez et al., 2005). This has now been replaced by more rigorous density profiles that describe cell distributions better. Unfortunately, to obtain the most accurate density profiles we need to pool all cells from all animals, precluding statistical comparisons. This is because some groups have very few cells per animal (for example early born Sp8 or Foxp2 cells).

      Figure 3 and Figure 4: These, and all figures that compare the lineage trace and antibody staining, should be moved to the supplement in my opinion-as they are not for generalist readers but rather specialists that are interested in these exact tools. In addition, the majority of the text that relates to these figures should be transferred to the supplement as well. Figure 5: Another great figure that sets the stage for the analysis of FoxP2V1-to-MN synaptic connectivity, and provides basic information about the rostrocaudal distribution of this clade, by analyzing settling position by level. I have only minor comments. The grid in B obscures the view of the cells and should be removed. The motor neuron cell bodies in C would be better visible if they were red.

      We moved some of the images to supplemental (see new supplemental Fig S4). However, we also added new data to the figure as requested by reviewers (Fig 3E-G). We preserved our analyses of Foxp2 and non-Foxp2 V1s across ages and spinal segments because we think this information is critical to the paper. Finally, we want to prevent misleading readers into believing that Foxp2 is a marker that is unique to V1s. Therefore, we also preserved Figures 3H to 3J showing the non-V1 Foxp2 population in the ventral horn. 

      Figure 6: Very careful and quantitative analysis of V1 synaptic input to motor neurons is presented here. For the reader, a summary figure (similar to B but with V1s too) that schematizes V1 FoxP2 versus Renshaw cell connectivity with LMC, MMC, and PGC motor neurons at one level would be useful.

      Thanks for the suggestion. A summary figure has now been included (Figure 5G). 

      Figure 7: The goal of this figure is to highlight intra-clade diversity at the level of transcription factor expression (or maintenance of expression), birth timing and cell body position culminating in the clear and concise diagram presented in G. In panels A-F however, it takes extra effort to link the data shown to these I-IV subtypes. The figure should be restructured to better highlight these links. One option might be to separate the figure into four parts (one for each type): with the individual spatial, birth timing and TF data for each population extracted and presented in each individual part.

      We agree with the reviewer that this is a very busy figure. We tried to re-structure the figure following the suggestions of the reviewer and also several alternative options. All resulted in designs that were more difficult to follow than the original figure. We apologize for its complexity, but we believe this is the best organization to describe all the data in the simplest form.

      Figure 8: in A-D, the main point of the figure - that V1FoxP2Otp preferentially receive proprioceptive synapses is buried in a bunch of technical details. To make it easier for the reader, please:

      (1) add a summary as in B of the %FoxP2-V1 Otp+ cells (82%) with Vglut1 synapses to make the point stronger that the majority of these cells have synapses.

      We added this graph by extending the previous graph to include lineage labeled Foxp2-V1s with OTP or Foxp2 immunoreactivity. It is now Figure 7B.

      (2) Additionally, add a representative example that shows large numbers of proximal synapses on a FoxP2-V1 Otp+ cell.

      The image we presented before as Figure 8A was already immunostained for OTP, so we just added the OTP channel to the images. Now all this information is in panels that are subparts of Figure 7A.

      (3) Move the comparison between FoxP2-V1 and FoxP2AB+V1s to the supplement.

      We preserved the quantitative data on Foxp2-V1 lineage cells with Foxp2-immunoreactivity but made this a standalone figure, so it is not as busy.

      (4) Move J-M description of antibody versus lineage trace of Otp to supplement as ending with this confuses the main message of the paper (see comment above).

      All results for the Otp-V1 mouse model have now been placed in a supplemental figure (Figure 5S).

      Discussion: A more nuanced and detailed discussion of how the temporal pattern of subtype generation presented here aligns with the established temporal transcription factor code (nicely summarized in Sagner 2024) would be helpful to place their work in the broader context of the field.

      This aspect of the discussion was expanded on pages 20 and 21. We replaced the earlier cited review (Sagner and Briscoe, 2019, Development) with the updated Sagner 2024 review and further discussed the data in the context of the field and of neurogenesis waves throughout the neural tube, not only the spinal cord. We had previously carefully compared our data with the spinal cord data from Sagner’s group (Delile et al., 2019, Development). We have now further expanded this comparison in the discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript by Napoli et al, the authors study the intracellular function of cytosolic S100A8/A9, a soluble myeloid cell protein that operates extracellularly as an alarmin but whose intracellular function is not well characterized. Here, the authors utilize state-of-the-art intravital microscopy to demonstrate that adhesion defects observed in cells lacking S100A8/A9 (Mrp14-/-) are not rescued by exogenous S100A8/A9, thus highlighting an intrinsic defect. Based on this result, subsequent efforts were employed to characterize the nature of those adhesion defects.

      The authors thank reviewer #1 for his/her insightful comments and suggestions. Please find our point-by-point responses below.

      (1) Ex vivo characterization of the function of S100A8/A9 in adhesion, spreading, and calcium signaling requires at least one rescue experiment to support the direct role of these proteins in the biological processes under study.

      We thank the reviewer for this comment. We agree that rescue experiments would be helpful to confirm the direct role of intracellular S100A8/A9 in adhesion, spreading, and Ca2+ signaling. Although transfection of primary cells, especially neutrophils, poses challenges due to their short half-life, we now have undertaken additional in vitro rescue experiments. Specifically, we used extracellular S100A8/A9 and coated Ibidi flow chambers with E-selectin, ICAM-1 and CXCL1 alone or alongside S100A8/A9, and measured rolling and adhesion of blood neutrophils. Our data reveal that extracellular S100A8/A9 can induce increased adhesion in WT neutrophils but fails to rescue the adhesion defect in Mrp14-/- neutrophils (Author response image 1). This result corroborates our in vivo findings, emphasizing that the observed adhesion defect is due to the lack of intracellular S100A8/A9.

      Author response image 1.

      Extracellular S100A8/A9 does not rescue the adhesion defect in Mrp14-/- neutrophils. Analysis of the number of adherent leukocytes FOV⁻¹ normalized to the WBC of WT and Mrp14-/- mice. Whole blood was harvested through a carotid artery catheter and perfused with a high precision pump at constant shear rate using flow chambers coated with either E-selectin, ICAM-1 and CXCL1 or E-selectin, ICAM-1, CXCL1 and S100A8/A9. [mean+SEM, n=5 mice per group, 12 (WT) and 14 (Mrp14-/-) flow chambers, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      (2) There is room for improvement in the analysis of signaling pathways presented in Figures 3 H and I. Western blots and analyses are not convincing, in particular for p-Pax.

      We acknowledge the reviewer's concern regarding the clarity of the signaling pathway analysis, particularly the western blots for p-Paxillin. To address this, we have repeated the western blot experiments using murine neutrophils. Our new data confirm the defective paxillin phosphorylation upon CXCL1 stimulation and ICAM-1 binding in the absence of cytosolic S100A8/A9. We have now integrated these new findings with the original data and included the updated results in the manuscript (Figure 3I revised). These enhanced analyses provide a more robust and convincing demonstration of the signaling defects in Mrp14-/- neutrophils.

      (3) At least one western blot showing a knockdown of S100A8/A9 should be included towards the beginning of the result section.

      We appreciate the reviewer's suggestion to include a western blot demonstrating the knockout of S100A8/A9 early in the results section. In a recent publication by our group, we have already demonstrated the absence of S100A8/A9 at the protein level in Mrp14-/- neutrophils via western blotting ([1], please refer to Extended Data Fig. 1h). We agree that visual confirmation of the absence of S100A8/A9 protein is crucial for establishing the validity of our study.

      (4) The Ca2+ measurements at LFA-1 nanoclusters using the Mrp14-/- Lyz2xGCaMP5 are interesting. It is understood that the authors are correcting calcium levels by normalizing by LFA-1 cluster areas and that seems fine to me. The issue is that the total calcium signal seems decreased in Mrp14-/- cells compared to WT cells (Fig. 4E)... why is total Ca2+ low? Please discuss.

      We thank the reviewer for this insightful comment. Indeed, our observations reveal reduced overall Ca2+ levels in Mrp14-/- neutrophils compared to WT neutrophils. Initially, we noticed a general decrease in Ca2+ intensity (Author response image 2A-B) and lifetime in Mrp14-/- neutrophils (Author response image 2C-D). Further analysis indicated that these differences in Ca2+ levels are localized specifically to the LFA-1 nanocluster sites. In contrast, the cytosolic Ca2+ levels outside of the LFA-1 nanocluster areas were comparable between Mrp14-/- and WT neutrophils (Figure 4H-J). This suggests that the reduced total Ca2+ levels observed in Mrp14-/- neutrophils are primarily due to the impaired Ca2+ supply at the LFA-1 nanocluster areas. Our data support the notion that cytosolic S100A8/A9 plays a crucial role in actively supplying Ca2+ to LFA-1 nanoclusters during neutrophil crawling. In the absence of S100A8/A9, the increase in overall Ca2+ levels (summing both inside and outside LFA-1 nanocluster areas) is minimal, further highlighting the specific role of S100A8/A9 in maintaining localized Ca2+ concentrations at these crucial sites.

      Author response image 2.

      Overall Ca2+ levels in WT and Mrp14-/- neutrophils. (A) Representative confocal images of neutrophils from WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 mice, labeled with the Lyz2 tdTomato marker. The images illustrate overall cytosolic Ca2+ levels during neutrophil crawling on flow chambers coated with E-selectin, ICAM-1, and CXCL1 (scale bar=10μm). (B) Quantitative analysis of total cytosolic Ca2+ intensity in single cells from WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 neutrophils measured over three time intervals: min 0-1, 5-6 and 9-10 [mean+SEM, n=5 mice per group, 56 (WT) and 54 (Mrp14-/-) neutrophils, 2way ANOVA, Sidak’s multiple comparison]. (C) Representative traces and (D) single cell analysis of total Ca2+ lifetime over the first 5 minutes in WT Lyz2xGCaMP5 and Mrp14-/- Lyz2xGCaMP5 neutrophils crawling on E-selectin, ICAM-1, and CXCL1 coated flow chambers recorded with FLIM microscopy [mean+SEM, n=3 mice per group, 111 (WT) and 95 (Mrp14-/-) neutrophils, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      (5) Even if the calcium level outside LFA-1 nanoclusters is not significant (Figure 4J), the data at min 9-10 in Figure 4J seems to be affected by a single event that may be an outlier. Additional data may be needed here.

      We appreciate the reviewer’s attention to this detail. To address the concern regarding a potential outlier in the Ca2+ level measurements at 9-10 minutes in Figure 4J, we rigorously tested the dataset using the GraphPad outlier calculator. The analysis revealed that no data point was statistically identified as an outlier. Given that the current dataset is robust and the statistical analysis confirms the integrity of the data, we believe that the results accurately reflect the biological variability observed in our experiments. Therefore, we have not added additional data points at this stage but remain open to discussing this further.

      (6) Finally, even though there is less calcium at LFA-1 clusters, that does not necessarily mean that "cytosolic S100A8/A9 plays an important role in Ca2+ "supply" at LFA-1 adhesion spots" as proposed. S100A8/A9 may play an indirect role in calcium availability. The analysis of the subcellular localization of S100A8/A9 at LFA-1 clusters together with calcium dynamics in stimulated WT cells would help support the authors' interpretation, which although possibly correct, seems speculative at this point.

      We thank the reviewer for this insightful comment and fully agree that additional evidence regarding the subcellular localization of S100A8/A9 would strengthen our conclusions. Although live cell imaging of intracellular S100A8/A9 was initially challenging due to technical limitations, we have now performed additional experiments to address this issue. We conducted end-point measurements where we allowed WT neutrophils to crawl on E-selectin, ICAM-1, and CXCL1 coated flow chambers for 10 minutes. Following this, we fixed and permeabilized the cells to stain intracellular S100A9, along with LFA-1 and a cell tracker for segmentation. Confocal microscopy and subsequent single-cell analysis revealed a significant enrichment of S100A8/A9 at LFA-1 positive nanocluster areas compared to the surrounding cytosol (Figure 4K and 4L, new). This finding supports our hypothesis that S100A8/A9 plays a direct role in the localized supply of Ca2+ at LFA-1 adhesion spots, thus facilitating efficient neutrophil crawling under shear stress. These new data have been included in the revised manuscript, providing stronger evidence for our proposed mechanism.

      Reviewer #2:

      Napoli et al. provide a compelling study showing the importance of cytosolic S100A8/9 in maintaining calcium levels at LFA-1 nanoclusters at the cell membrane, thus allowing the successful crawling and adherence of neutrophils under shear stress. The authors show that cytosolic S100A8/9 is responsible for retaining stable and high concentrations of calcium specifically at LFA-1 nanoclusters upon binding to ICAM-1, and imply that this process aids in facilitating actin polymerisation involved in cell shape and adherence. The authors show early on that S100A8/9 deficient neutrophils fail to extravasate successfully into the tissue, thus suggesting that targeting cytosolic S100A8/9 could be useful in settings of autoimmunity/acute inflammation where neutrophil-induced collateral damage is unwanted.

      The authors appreciate reviewer #2's insightful comments and suggestions. Below are our detailed responses:

      (1) Extravasation is shown to be a major defect of Mrp14-/- neutrophils, but the Giemsa staining in Figure 1H seems to be quite unspecific to me, as neutrophils were determined by nuclear shape and granularity. It would have perhaps been more clear to use immunofluorescence staining for neutrophils instead as seen in Supplementary Figure 1A (staining for Ly6G or other markers instead of S100A9).

      We acknowledge the reviewer's concern. However, Giemsa staining is a well-established method in hematology, histology, cytology, and bacteriology, widely recognized for its ability to distinguish leukocyte subsets based on nuclear shape and cytoplasmic characteristics. This method is extensively documented in the literature [2-5]. Its advantages are the easy morphological discrimination of leukocytes based on nuclear and cytoplasmic shape and conformation (Author response image 3).

      Author response image 3.

      Giemsa staining of extravasated leukocyte subsets. (A) Representative image of Giemsa-stained cremaster muscle tissue post-TNF stimulation. The image clearly differentiates leukocyte subsets (white arrow = neutrophils, yellow arrow = eosinophils, red arrow = monocytes). Scale bar = 50µm.

      (2) The representative image for Mrp14-/- neutrophils used in Figure 4K to demonstrate Ripley's K function seems to be very different from that shown above in Figures 4C and 4F.

      The reviewer correctly observed that the cell in Figure 4K is different from those in Figures 4C and 4F. This is intentional, as Figure 4K is meant to show a representative image that accurately reflects the overall results of the experiments. We assure the reviewer that all cells analyzed in Figures 4C and 4F were also included in the analysis for Figure 4K.

      (3) Although the authors have done well to draw a path linking cytosolic S100A8/9 to actin polymerisation and subsequently the arrest and adherence of neutrophils in vitro, the authors can be more explicit with the analysis - for example, is the F-actin co-localized with the LFA-1 nanoclusters? Does S100A8/9 localise to the membrane with LFA-1 upon stimulation? Lastly, I think it would have been very useful to close the loop on the extravasation observation with some in vitro evidence to show that neutrophils fail to extravasate under shear stress.

      We thank the reviewer for this comment and questions. 

      Concerning the co-localization of F-actin with LFA-1 nanoclusters and S100A8/9 localization: We appreciate the reviewer's interest in the co-localization between F-actin and LFA-1. Unfortunately, due to the limitations of our GCaMP5 mouse model (with neutrophils labeled with td-Tomato and eGFP for LyzM and Ca2+), we could only stain for either LFA-1 or F-actin at a time. However, in our F-actin movies, we observed that F-actin predominantly localizes at the rear of the cell, while LFA-1 is more uniformly distributed at the plasma membrane.

      Regarding S100A8/A9 localization, as mentioned in response to Reviewer 1's sixth point, we have now conducted endpoint measurements. We stained neutrophils with cell tracker green CMFDA and LFA-1, allowed them to crawl on E-selectin, ICAM-1, and CXCL1-coated flow chambers, and then performed intracellular S100A9 staining after fixation and permeabilization. Our analysis shows higher S100A9 intensity at LFA-1 positive areas compared to LFA-1 negative areas (Figure 4K and 4L, new). This indicates that S100A8/A9 indeed concentrates Ca2+ at LFA-1 nanoclusters, supporting adhesion and post-arrest modification events under flow.

      Regarding the extravasation defect under shear stress: To address the reviewer's suggestion, we performed transwell migration assays under static conditions. Our results show no significant difference in transmigration between WT and Mrp14-/- neutrophils without flow, indicating that the extravasation defect in Mrp14-/- neutrophils is shear-dependent. This supports our hypothesis that S100A8/A9-mediated Ca2+ supply at LFA-1 nanoclusters is critical under flow conditions (Author response image 4).

      Author response image 4.

      Static transmigration assay. (A) Transmigration of WT and Mrp14-/- neutrophils in static transwell assays (3 µm pore size, 45 min migration time) showing spontaneous migration (PBS) or migration towards CXCL1. [mean+SEM, n=3 mice per group, 2way ANOVA, Sidak’s multiple comparison]. ns, not significant; *p≤0.05, **p≤0.01, ***p≤0.001.

      Additional References

      (1) Pruenster, M., et al., E-selectin-mediated rapid NLRP3 inflammasome activation regulates S100A8/S100A9 release from neutrophils via transient gasdermin D pore formation. Nature Immunology, 2023. 24(12): p. 2021-2031.

      (2) Kuwano, Y., et al., Rolling on E- or P-selectin induces the extended but not high-affinity conformation of LFA-1 in neutrophils. Blood, 2010. 116(4): p. 617-24.

      (3) Porse, B., Mouse Hematology – A Laboratory Manual. European Journal of Haematology, 2010. 84(6): p. 554-554.

      (4) Frommhold, D., et al., Protein C concentrate controls leukocyte recruitment during inflammation and improves survival during endotoxemia after efficient in vivo activation. Am J Pathol, 2011. 179(5): p. 2637-50.

      (5) Braach, N., et al., RAGE Controls Activation and Anti-Inflammatory Signalling of Protein C. PLOS ONE, 2014. 9(2): p. e89422.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study characterized the cellular and molecular mechanisms of spike timing-dependent long-term depression (t-LTD) at the synapses between excitatory afferents from lateral (LPP) and medial (MPP) perforant pathways to granule cells (GC) of the dentate gyrus (DG) in mice.

      Strengths:

      The electrophysiological experiments are thorough. The experiments are systematically reported and support the conclusions drawn.

      This study extends current knowledge by elucidating additional plasticity mechanisms at PP-GC synapses, complementing existing literature.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      Weaknesses:

      To more conclusively define the pivotal role of astrocytes in modulating t-LTD at MPP and LPP GC synapses through SNARE protein-dependent glutamate release, as posited in this study, the authors could adopt additional methods, such as alternative mouse models designed to regulate SNARE-dependent exocytosis, as well as optogenetic or chemogenetic strategies for precise astrocyte manipulation during t-LTD induction. This would provide more direct evidence of the influence of astrocytic activity on synaptic plasticity.

      We thank the reviewer for the suggestion. As stated in the manuscript and in Figure 4, we already used two different approaches (aBAPTA, to interfere with astrocyte calcium signalling, and dnSNARE mice, in which vesicular release is impaired) to determine the involvement of astrocytes in the discovered forms of LTD, and both approaches clearly indicated the requirement of astrocytes for t-LTD. In BAPTA-treated astrocytes and in dnSNARE mice, t-LTD was prevented. Notwithstanding this, and as suggested by the reviewer, we used two additional approaches to confirm astrocyte participation. We loaded astrocytes with the light chain of the tetanus toxin (TeTxLC), which is known to block exocytosis by cleaving the vesicle-associated membrane protein, an important part of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). In this experimental condition, we observed a clear lack of t-LTD at both (lateral and medial) pathways, thus confirming the requirement of astrocytes, the SNARE complex, and vesicular release for both types of t-LTD. In addition, to gain more insight into whether glutamate is released by astrocytes, we blocked glutamate release from astrocytes by loading them with Evans blue, which interferes with glutamate uptake into vesicles by inhibiting the vesicular glutamate transporter (VGLUT). In this experimental condition, t-LTD was again prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes.

      Reviewer #2 (Public Review):

      Summary:

      This work reports the existence of spike timing-dependent long-term depression (t-LTD) of excitatory synaptic strength at two synapses of the dentate gyrus granule cell, which are differently connected to the entorhinal cortex via either the lateral or medial perforant pathways (LPP or MPP, respectively). Using patch-clamp electrophysiological recording of t-LTD in combination with either pharmacology or a genetically modified mouse model, they provide information on the differences in the molecular mechanism underlying this t-LTD at the two synapses.

      Strengths:

      The two synapses analyzed in this study have been understudied. This new data thus provides interesting new information on a plasticity process at these synapses, and the authors demonstrate subtle differences in the underlying molecular mechanisms at play. Experiments are in general well controlled and provide robust data that are properly interpreted.

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      Weaknesses:

      • Caution should be taken in the interpretation of the results to extrapolate to adult brain as the data were obtained in P13-21 days old mice, a period during which synapses are still maturing and highly plastic.

      We thank the reviewer for noting this. Our experiments were intentionally performed in young animals (P13-21), precisely because this is a critical period of plasticity. We indicate this in the methods, results, and discussion sections, and discuss it in some detail in the latter.

      • In experiments where the drug FK506 or thapsigargin are loaded intracellularly, the concentrations used are as high as for extracellular application. Could there be an error of interpretation when stating that the targeted actors are necessarily in the post-synaptic neuron? Is it not possible for the drug to diffuse out of the cell as it is evident that it can enter the cell when applied extracellularly?

      We thank the reviewer for raising this point. While these compounds could in principle cross cell membranes and reach other cells, this would require a relatively long time. In addition, to have any effect, the compound would have to reach other cells at a concentration the same as, or close to, that in the pipette. Furthermore, even if a compound crossed the cell membrane within the duration of an experiment, it would then be exposed to the extracellular fluid, where it would be diluted and most probably washed out. For all these reasons, we do not consider this very plausible. Notwithstanding this, and as suggested, we have repeated the experiments using lower concentrations of thapsigargin (1 µM) and FK506 (1 µM) and have obtained the same results. These data are now included in Figure 3 and in the text.

      • The experiments implicating glutamate release from astrocytes in t-LTD would require additional controls to better support the conclusions made by the authors. As the data stand, it is not clear, how the authors identified astrocytes to load BAPTA and if dnSNARE expression in astrocytes does not indirectly perturb glutamate release in neurons.

      We thank the reviewer for raising this point. We now indicate how astrocytes were identified for BAPTA loading. We reply to this in detail in the “Recommendations for the authors” from reviewer 2.

      Significance:

      While this is the first report of t-LTD at these synapses, this plasticity process has been mechanistically well investigated at other synapses in the hippocampus and in the cortex. Nevertheless, this new data suggests that mechanistic differences in the induction of t-LTD at these two DG synapses could contribute to the differences in the physiological influence of the LPP and MPP pathways.

      Reviewer #3 (Public Review):

      Coatl et al. investigated the mechanisms of synaptic plasticity of two important hippocampal synapses, the excitatory afferents from lateral and medial perforant pathways (LPP and MPP, respectively) of the entorhinal cortex (EC) connecting to granule cells of the hippocampal dentate gyrus (DG). They find that these two different EC-DG synaptic connections in mice show a presynaptically expressed form of long-term depression (LTD) requiring postsynaptic calcium, eCB synthesis, CB1R activation, astrocyte activity, and metabotropic glutamate receptor activation. Interestingly, LTD at MPP-GC synapses requires ionotropic NMDAR activation whereas LTD at LPP-GC synapses is NMDAR independent. Thus, they discovered two novel forms of t-LTD that require astrocytes at EC-GC synapses. Although plasticity of EC-DG granule cell (GC) synapses has been studied using classical protocols, this is the first analysis of the synaptic plasticity induced by spike timing-dependent protocols at these synapses. Interestingly, the data also indicate that t-LTD at each type of synapse requires different group I mGluRs, with LPP-GC synapses dependent on mGluR5 and MPP-GC t-LTD requiring mGluR1.

      The authors performed a detailed analysis of the coefficient of variation of the EPSP slopes and of miniature responses. Using different approaches (failure rate, PPRs, CV, and mEPSP frequency and amplitude analysis), they demonstrate a decrease in the probability of neurotransmitter release and a presynaptic locus for these two forms of LTD at both types of synapses. By using elegant electrophysiological experiments and taking advantage of the conditional dominant-negative (dn) SNARE mice, in which doxycycline administration blocks exocytosis and impairs vesicle release by astrocytes, they demonstrate that both LTD forms require the release of gliotransmitters from astrocytes. These data add in an interesting way to the ongoing discussion on whether LTD induced by STDP participates in refining synapses, potentially weakening excitatory synapses under the control of different astrocytic networks. The conclusions of this paper are mostly well supported by data, but some aspects of the results must be clarified and extended.
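
      As background for the coefficient-of-variation argument mentioned above, the following minimal sketch shows how a CV analysis is commonly set up. It assumes a simple binomial release model (mean = n·p·q, 1/CV² = n·p/(1−p)), which may differ from the exact analysis used in the manuscript; all function names and parameter values are hypothetical.

```python
from statistics import mean, stdev


def inverse_cv_squared(slopes):
    """Return 1/CV^2 = (mean / SD)^2 for a list of EPSP slopes."""
    return (mean(slopes) / stdev(slopes)) ** 2


def cv_analysis(baseline, post):
    """Normalize the mean and 1/CV^2 after induction to their baseline values."""
    norm_mean = mean(post) / mean(baseline)
    norm_inv_cv2 = inverse_cv_squared(post) / inverse_cv_squared(baseline)
    return norm_mean, norm_inv_cv2


def binomial_point(n, p, q, n0, p0, q0):
    """Predicted (normalized mean, normalized 1/CV^2) under the assumed binomial model.

    n = release sites, p = release probability, q = quantal size;
    n0, p0, q0 are the baseline values. Quantal size cancels out of 1/CV^2.
    """
    norm_mean = (n * p * q) / (n0 * p0 * q0)
    norm_inv_cv2 = (n * p / (1 - p)) / (n0 * p0 / (1 - p0))
    return norm_mean, norm_inv_cv2


# Hypothetical scenarios that both halve the mean response:
presynaptic = binomial_point(n=10, p=0.2, q=1.0, n0=10, p0=0.4, q0=1.0)   # p halved
postsynaptic = binomial_point(n=10, p=0.4, q=0.5, n0=10, p0=0.4, q0=1.0)  # q halved

# A point on or below the diagonal (normalized 1/CV^2 <= normalized mean)
# is conventionally read as consistent with a presynaptic change.
for label, (m, v) in (("presynaptic", presynaptic), ("postsynaptic", postsynaptic)):
    print(label, round(m, 3), round(v, 3), "on/below diagonal:", v <= m)
```

      Under these assumptions, halving release probability moves the point below the diagonal (0.5, 0.375), whereas halving quantal size leaves 1/CV² unchanged (0.5, 1.0). `cv_analysis` applies the same normalization to measured slopes.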

      We thank the reviewer for the positive assessment of our work and the constructive suggestions to improve the manuscript.

      (1) It should be clarified whether present results are obtained with or without the functional inhibitory synapse activation. It is not clear if GABAergic synapses are blocked or not. If GABAergic synapses are not blocked authors must discuss whether the LTD of the EPSPs is due to a decrease in glutamatergic receptor activation or an increase in GABAergic receptor activation. Moreover, it should be recommended to analyze not only the EPSPs but also the EPSCs to address whether the decrease in synaptic transmission is caused by a decrease in the input resistance or by a decrease in the space constant (lambda).

      We thank the reviewer for raising these points. GABAergic inhibition was not blocked in our experiments. The observed forms of t-LTD seem to be due to a decrease in glutamate release probability, as indicated in the manuscript, mediated by the mechanism we uncover and describe here. To determine whether GABA receptors have any role in these forms of t-LTD, we repeated the experiments in the presence of the GABAA and GABAB receptor antagonists bicuculline and SCH50911, respectively. Blocking GABA receptors did not prevent or affect t-LTD at LPP- or MPP-GC synapses, which was still present with a magnitude similar to that of controls. These results indicate that these receptors are not involved in these forms of t-LTD. They are now included in the results section (page 8) and as a new Figure S1. In our experiments, no changes in input resistance or space constant were observed and, importantly, no changes were observed in the amplitude or slope of EPSPs in the control pathway that we routinely include in our experiments and that does not undergo the plasticity protocol.

      (2) The authors show that thapsigargin loaded in the postsynaptic neuron prevents the induction of LTD at both synapses. Analyzing the effects of blocking postsynaptic IP3Rs (heparin in the patch pipette) and ryanodine receptors (ruthenium red in the patch pipette) is recommended for a deeper analysis of the mechanism implicated in the induction of these novel forms of LTD in the hippocampus.

      We thank the reviewer for this suggestion. We repeated the experiments loading the postsynaptic cell with heparin and ruthenium red using the patch pipette. In these experimental conditions, we observed that t-LTD was not affected by the heparin treatment (ruling out a role for IP3Rs), but that it was prevented by the ruthenium red treatment (indicating the requirement of ryanodine receptors). These data are now included in the text (page 12) and in Figure 3a, b, e, f.

      (3) Authors nicely demonstrate that CB1R activation is required in these forms of LTD by blocking CB1Rs with AM251, however an interesting unanswered question is whether CB1R activation is sufficient to induce this synaptic plasticity. This reviewer suggests studying whether applying puffs of the CB1R agonist, WIN 55,212-2, could induce these forms of LTD.

      We thank the reviewer for this suggestion. We repeated the experiments adding WIN 55,212-2 as suggested. Activation of CB1Rs by puffs of the agonist WIN 55,212-2 onto the astrocyte directly induced LTD at both LPP- and MPP-GC synapses. These data are now included in the text (page 14) and in Figure 3c, d, g, h.

      (4) Finally, adding a last figure with a cartoon summarizing the proposed model of action in these novel forms of LTD would add a positive value and would help the reading of the manuscript, especially in those aspects related with the discussion of the results.

      We thank the reviewer for the suggestion. We now include a figure showing the proposed mechanisms (Figure 5).

      The extension of these results would improve the manuscript, which provides interesting results showing two novel forms of presynaptic t-LTD in the brain synapses with different action mechanisms probably implicated in the different aspects of information processing.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There are just a few aspects that could be clarified to bolster the authors' conclusions.

      The author centered the conclusion of their study on the role of astrocytic activity in regulating these two forms of plasticity (see title). To strengthen the evidence that astrocytes are key regulators of t-LTD at MPP and LPP GC synapses by regulating SNARE protein-dependent glutamate release, additional complementary approaches should be considered, such as other mouse models enabling the control of SNARE-dependent exocytosis and/or optogenetic/chemogenetic tools to selectively manipulate astrocytes during the induction of t-LTD, thereby directly assessing the impact of astrocytic activity on synaptic plasticity. Implementing calcium imaging or glutamate sensors to visualize the dynamics of astrocytic calcium signaling and glutamate release during t-LTD could be also considered.

      We thank the reviewer for the suggestion. As stated in the manuscript and in Figure 4, we had already used two different approaches (aBAPTA to interfere with astrocyte calcium signalling, and dnSNARE mice, which have impaired vesicular release) to determine the involvement of astrocytes in the newly described forms of LTD, and both approaches clearly indicated that astrocytes are required for t-LTD. In BAPTA-treated astrocytes and in dnSNARE mice, t-LTD was prevented. Nevertheless, as suggested by the reviewer, we used two additional approaches to confirm astrocyte participation. First, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which is known to block exocytosis by cleaving the vesicle-associated membrane protein, an important part of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, we observed a clear lack of t-LTD in both the lateral and medial pathways, confirming that astrocytes, the SNARE complex and vesicular release are required for both types of t-LTD. Second, to gain more insight into whether glutamate is released by astrocytes, we blocked astrocytic glutamate release by loading the astrocytes with Evans blue, which interferes with glutamate uptake into vesicles by inhibiting the vesicular glutamate transporter (VGLUT). Under this condition, t-LTD was again prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This information is now included in the text (pages 14 and 15) and in Figure 4.

      • How were astrocytes identified to be loaded with BAPTA? The author should clarify this methodological aspect and provide confocal images of patched astrocytes situated 50-100 um from the recorded neuron.

      We thank the reviewer for the comment. We now include this information in the Methods section (page 6) and in Figure S3. Astrocytes were identified by their rounded morphology under differential interference contrast microscopy, and were characterized by low membrane potential, low membrane resistance and passive responses (no action potentials) to both negative and positive current injection.

      • Please provide confocal images of EGFP expression in the DG astrocytes of dnSNARE mice both on and off Dox, to verify transgene expression in astrocytes

      We thank the reviewer for this suggestion. We now include an image of GFP expression in DG astrocytes of dnSNARE mice off Dox (Fig. S3). The animals were never given doxycycline, from birth onwards, so the transgenes were continuously expressed and exocytosis should therefore be blocked in astrocytes.

      Minor points:

      Lines 250-253: It is mentioned that TTX is added at baseline, washed out for the t-LTD experiment, and then reapplied post t-LTD. I suggest clarifying the timing and rationale for this application for a broad audience.

      We thank the reviewer for the suggestion. We now include some information related to the timing and rationale of the experiment phases (page 9).

      The discussion is quite detailed and provides a comprehensive overview of the study's findings. To enhance clarity and impact, the authors might consider to,

      • add subheadings and bullet points for key findings. This will improve readability.

      • this section could benefit from streamlining to avoid redundancy.

      • some sentences could be made more concise without losing meaning.

      We thank the reviewer for these suggestions. We now include subheadings in the discussion section to improve readability and have made some sentences more concise and simple without losing meaning.

      In figure legends, consistency with capitalization should be maintained, for example in the statistical significance notation, ***P < 0.001" or ***p < 0.001")

      We now use p<0.001 in the legend of Figure 4 for consistency.

      Reviewer #2 (Recommendations For The Authors):

      Major:

      • All results were obtained in young still quite immature synapses. To strengthen the significance of the findings, the authors could repeat some of the main experiments in adult mice (8 weeks and beyond). If not, they should state clearly that these mechanisms were only evidenced in early post-natal conditions.

      We thank the reviewer for noticing this. Our experiments were intentionally performed in young animals (P13-21), because this is known to be a critical period of plasticity. As the reviewer suggests, we now state this in the Methods (page 5), Results (page 8), and Discussion (page 19) sections, and discuss it in some detail in the latter.

      • Lines 246-249 and fig 1f,p: Authors need to perform a statistical test on these two graphs to support their claim that 'A plot of CV-2 versus the change in the mean evoked EPSP 246 slope (M) before and after t-LTD mainly yielded points below the diagonal line at LPP-GC and MPP-GC synapses'.

      This may not have been clear in the previous version. We found an error in one of the graphs (some points were missing), which we have corrected. In addition, as suggested by the reviewer, we performed a regression analysis that confirms the stated conclusions. This is now included in the text (page 9). Specifically, we have added the mean values ± SEM and the linear regression of the data for LPP-GC (Mean = 0.607 ± 0.054 vs 1/CV² = 0.439 ± 0.096, R² = 0.337; n = 14) and MPP-GC synapses (Mean = 0.596 ± 0.056 vs 1/CV² = 0.461 ± 0.090, R² = 0.168; n = 13), respectively. Data points falling on the dotted horizontal line (1/CV² = 1) indicate no change in the probability of release; in contrast, data points falling below the dotted diagonal line suggest a change in release probability parameters (for review, see Brock et al., 2020, Front Synaptic Neurosci 12, 11).
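      To illustrate the coefficient-of-variation analysis described above, the following is a minimal sketch only, not the analysis code used for the manuscript. It assumes that each cell contributes a list of EPSP slopes before and after the protocol, that 1/CV² is computed as mean²/variance using the sample variance, and that both quantities are normalized to baseline. The function names and the example values are hypothetical.

```python
from statistics import mean, variance


def cv_analysis(baseline, after):
    """Return (M, R) for one cell.

    M = normalized mean EPSP slope (after / baseline).
    R = normalized 1/CV^2 (after / baseline), with 1/CV^2 = mean^2 / variance.
    Assumes sample variance; the manuscript may use a different estimator.
    """
    m_base, m_after = mean(baseline), mean(after)
    inv_cv2_base = m_base ** 2 / variance(baseline)
    inv_cv2_after = m_after ** 2 / variance(after)
    return m_after / m_base, inv_cv2_after / inv_cv2_base


def below_diagonal(m, r):
    """True when the point (M, R) lies below the identity line R = M."""
    return r < m


# Hypothetical EPSP slopes (arbitrary units) for a single cell.
baseline = [1.0, 1.2, 0.8, 1.1, 0.9]
after = [0.6, 0.2, 1.0, 0.3, 0.9]

m, r = cv_analysis(baseline, after)
print(f"M = {m:.3f}, normalized 1/CV^2 = {r:.3f}, below diagonal: {below_diagonal(m, r)}")
```

      In this sketch, a pure scaling of every response by the same factor leaves the normalized 1/CV² at 1 (the horizontal line), whereas a drop in 1/CV² that exceeds the drop in the mean places the point below the diagonal.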

      • We are not sure that the experiment with the MK801 provided in the patch pipet can be interpreted correctly (Figure 2 a,b and e,f). How sure are the authors that, when applying MK801 in the patch pipet, it can reach its binding site within the pore? The concentration of MK801 is also very high (500 microM) and used at the same concentration extracellularly and intracellularly. Why did the authors not use lower concentration when applied intracellularly?

      We thank the reviewer for raising this point. MK801 loaded postsynaptically through the pipette does reach the pore: when we record NMDA currents from postsynaptic neurons loaded with MK801, these currents are blocked. We now include in the text (page 10) a control experiment showing the effect of postsynaptic MK801 on NMDA currents. NMDA currents were recorded at +40 mV, with AMPAR and GABAR blocked by NBQX and bicuculline. Regarding the concentration, the affinity at the internal site has been described as much lower (by several orders of magnitude) than at the extracellular site (Sun et al., 2018, Neuropharmacology, 143, 122-129), and the concentrations used here have been used extensively in previous studies. It is clear that the concentrations used in the present work blocked NMDAR currents but did not prevent LTD.

      • Linked to the point above, for the intracellular application of FK506 and thapsigargin, the concentrations used extracellularly and intracellularly are identical. The authors could have used lower concentrations for the intracellular application. Also, how can they be sure of the correct interpretation of these data as the drug essentially reaching a post-synaptic target when applied intracellularly? If the drug can enter the neuron, why could it not diffuse out of the neuron especially when loaded at a high concentration? Maybe using a lower concentration when applied intracellularly could at least partially address this issue.

      It is evident that it can enter the cell when applied extracellularly?

      We thank the reviewer for raising this point. While these compounds could in principle cross the cell membrane and pass to other cells, this would require a relatively long time. Additionally, to have any effect, the compound would have to reach other cells at the same concentration as in the pipette, or at a relatively high fraction of it. Furthermore, even if a compound were able to cross the cell membrane during the course of an experiment, it would then be exposed to the extracellular fluid, where it would be diluted and most probably washed out. For all these reasons, we do not consider this scenario very plausible. Nevertheless, we have repeated the experiments using lower concentrations of thapsigargin (1 µM) and FK506 (1 µM) and obtained the same results. These data are now included in Figure 3, and the numbers in the text have been updated (pages 12-13).

      • The data supporting the possibility of glutamate release by astrocytes as a main source of glutamate to promote t-LTD needs to be strengthened. In experiment Figure a-h, it is not clear how the authors recognize astrocytes to patch. No details are provided in the methods or in the main text. If we understand correctly, it is only by performing a current steps protocol to ensure that the patched cell did not produce action potentials. If this was the case, the authors need to be more specific and provide details of this protocol. More importantly, the one trace that was provided in Figures 4a and 4f suggests, albeit by a rough estimation that we made with a ruler, that the highest current step only depolarized the cell to about -40 mV. This is not sufficient to ensure that the recorded cell is not a neuron. The authors should increase their steps to high depolarizing currents to ensure that the patched cell is not a neuron. Better yet, they should load the cell with an dye to process the slice after the electrophysiological recording for immunohistochemistry to ensure that it was indeed an astrocyte. Alternatively, they can try to aspirate the cell content at the end of the recording to perform a qPCR for astrocyte markers eg. GFAP.

      We thank the reviewer for the comment. We now include information on how astrocytes were identified (a point also raised by Reviewer 1) in the Methods section (page 6) and in Figure S3. Astrocytes were identified by their rounded morphology under differential interference contrast microscopy and by eGFP fluorescence (astrocytes from dnSNARE mice), and were characterized by low membrane potential, low membrane resistance and passive responses (no action potentials) to both negative and positive current injection.

      We agree with the reviewer that the step protocol in Figures 4a and 4f may not have been completely clear. We have revised this and now state more clearly that we applied pulses that depolarized astrocytes beyond -20 mV, with no action potentials observed at any point. We also now show this in Figure S3.

      • Related to the point above, the use of the model expressing dnSNARE in astrocytes is elegant. Yet, to really interpret the data obtained in these slices as a lack of vesicle release (and most importantly glutamate) we think that the authors should ensure that glutamate release from nearby neurons is not impacted. They could patch nearby neurons in dnSNARE slices and test PPR or synaptic fatigue when stimulating either the LPP or MPP. The authors should avoid overinterpretation of these results. As it stands, it is not evident that dnSNARE expression does not perturb other mechanisms within the astrocyte that in turn perturb pre-synaptic glutamate release. Adding back glutamate as puffs does not help to disentangle this issue.

      To gain more insight into whether glutamate is released by astrocytes, we blocked astrocytic glutamate release by loading the astrocytes with Evans blue, which interferes with glutamate uptake into vesicles by inhibiting the vesicular glutamate transporter (VGLUT). Under this condition, as indicated above, t-LTD was prevented, indicating that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This is included in the text (page 15) and in Figure 4d, e, i, j.

      In addition, we loaded astrocytes with the light chain of tetanus toxin (TeTxLC), which is known to block exocytosis by cleaving the vesicle-associated membrane protein, an important part of the SNARE complex (Schiavo et al., 1992, Nature 359, 832-835). Under this condition, we observed a clear lack of t-LTD in both the lateral and medial pathways, confirming that astrocytes, the SNARE complex and vesicular release are required for both types of t-LTD. These data indicate that t-LTD requires Ca2+-dependent exocytosis of glutamate from astrocytes. This information is now included in the text (page 14) and in Figure 4.

      Minor points:

      • line 107, did the authors mean t-LTP and t-LTD? we don't understand STDP mentioned here.

      We meant to say t-LTP. This is now corrected.

      • line 108: should STDP be replaced by t-LTD as the authors only focused on this plasticity mechanism.

      We agree; we now indicate t-LTD.

      • line 131-132 : it is not clear when the animals were fed with doxycycline. If it was from birth, then the 'not' should be removed. Otherwise the authors should clearly state when the doxycyline was provided.

      DOX was not provided, which means that the transgene was continuously expressed and exocytosis should therefore be blocked in astrocytes. We now state this more clearly on page 5 (Methods section).

      • line 223 : which hippocampal synapses? needs to be stated

      As suggested, this is now stated in the text, for both hippocampal and cortical synapses: Schaffer collateral (SC)-CA1 synapses in the hippocampus and layer 4 to layer 2/3 (L4-L2/3) synapses in the cortex (page 8).

      • line 273: what do the authors mean when writing 'from'? We don't understand the data provided on this line.

      We thank the reviewer for noticing this. It refers to the average amplitude of NMDAR-mediated currents before and after D-AP5 or MK801. We now express this more clearly (page 10, from 57±8 pA to 6±5 pA).

      • line 286 : why do the authors point out work on GluN2B and GluN3A only here when they first investigate GluN2A contribution to t-LTD? what about previous data on GluN2A?

      We have now rephrased this to make it clear. We wanted to indicate that, according to the available data, presynaptic NMDARs at MPP-GC synapses contain GluN2B and GluN3A subunits and that, to our knowledge, no data indicate that they contain GluN2A subunits.

      • line 428 : what do the authors mean by 'not least' ?

      This is a typo and we have removed that from the text.

      Reviewer #3 (Recommendations For The Authors):

      My only suggestion for improving data presentation in the manuscript would be to split some figures of the paper. In my opinion, the figures are too dense and therefore difficult to follow for the broad audience of eLife readers. In addition, a real image of the recorded dentate granule cells in the slice showing also the location of the real stimulation electrodes would significantly improve the presentation of Figure 1.

      We thank the reviewer for the suggestion, but we would prefer to keep the figures organized as they are. While we agree that some of them are rather large, keeping the information for the two pathways in the same figure makes it easier to compare the lateral and medial pathways. Nevertheless, we have tried to make the figures clearer by using a columnar layout for each pathway, which we think makes the pathways easier to compare. As the reviewer suggests, we now include in Figure 1 a real image of the recorded dentate granule cells in the slice, also showing the location of the stimulation electrodes. We agree that this improves the presentation of the figure and thank the reviewer for the suggestion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The reviewers found this manuscript to present convincing evidence for associative and non-associative behaviors elicited in male and female mice during a serial compound stimulus Pavlovian fear conditioning task. The work adds to ongoing efforts to identify multifaceted behaviors that reflect learning in classic paradigms and will be valuable to others in the field. The reviewers do note areas that would benefit from additional discussion and some minor gaps in data reporting that could be filled by additional analyses or experiments.

      We thank the reviewers and the editors for their thoughtful and constructive critiques of our manuscript. We have updated our manuscript with data from additional experiments as suggested by the reviewers, and we have significantly edited the text and figures to reflect these additions. Our detailed, point-by-point responses are below.

      Reviewer #1 (Public Review):

      The main goal of the study was to tease apart the associative and non-associative elements of cued fear conditioning that could influence which defensive behaviors are expressed. To do this, the authors compared groups conditioned with paired, unpaired, or shock only procedures followed by extinction of the cue. The cue used in the study was not typical; serial presentation of a tone followed by a white noise was used in order to assess switches in behavior across the transition from tone to white noise. Many defensive behaviors beyond the typical freezing assessments were measured, and both male and female mice were included throughout. The authors found changes in behavioral transitions from freezing to flight during conditioning as the tone transitioned into white noise, and a switch in freezing during extinction such that it became high during the white noise as flight behavior decreased. Overall, this was an interesting analysis of transitions in defensive behaviors to a serially presented cue consisting of two auditory stimuli during conditioning and then extinction.

      We thank the Reviewer for their supportive insight.

      There are some concerns regarding the possibility that the white noise is more innately aversive than the tone, inducing more escape-like behaviors compared to a tone, especially since the shock only group also showed increased escape-like behaviors during the white noise versus tone. This issue would have been resolved by adding a control group where the order of the auditory stimuli was reversed (white noise->tone).

      We appreciate this concern, and we have added two additional groups to address this possibility. We have conducted the same experimental paradigm with 2 reverse-SCS groups (WN—tone), one with paired (new PA-R group), and one with unpaired (new UN-R group), presentations to shock during conditioning. These experiments revealed that during conditioning day 2 in both reverse order groups, WN causes reductions in freezing and increases in locomotor activity (see revised Figure 2D), an effect that is stronger in the UN-R compared to the PA-R group. This locomotor effect is neither darting nor escape jumping in the PA-R group (revised Figure 3G, I; Figure 4G). In the UN-R group, WN induces more activity than the PA-R group (Figure 2D), including some jumping at WN onset (Figure 3H), but no darting (Figure 4G). It is worth noting that WN does not elicit defensive behavior before conditioning at the sound intensity we use (75dB; see Fadok et al. 2017, Borkar et al. 2020, Borkar et al. 2024). Together, these results suggest that WN is an inherently more salient stimulus than tone, and it can elicit defensive behaviors in shock-sensitized mice through non-associative mechanisms. Indeed, stimulus salience is a key factor in this paradigm for inducing activity (see Hersman et al. 2020).

      While the more complete assessment of defensive behaviors beyond freezing is welcomed, the main conclusions in the discussion are overly focused on the paired group and the associative elements of conditioning, which would likely not be surprising to the field. If the goal, as indicated in the title, was to tease apart the associative and non-associative elements of conditioning and defensive behaviors, there needs to be a more emphasized discussion and explicit identification of the non-associative findings of their study, as this would be more impactful to the field.

      We have rewritten the Discussion to provide a greater emphasis on the findings of the study that are more related to non-associative mechanisms. For example, we argue that cue-salience and changes in stimulus intensity can induce non-associative increases in locomotor behavior and tail rattling in shock-sensitized mice.

      Reviewer #2 (Public Review):

      Summary:

      The authors examined several defensive responses elicited during Pavlovian conditioning using a serial compound stimulus (SCS) as the conditioned stimulus (CS) and a shock unconditioned stimulus (US) in male and female mice. The SCS consisted of tone pips followed by white noise. Their design included 3 treatment groups that were either exposed to the CS and US in a paired fashion, in an unpaired fashion, or only exposed to the shock US. They compared freezing, jumping, darting, and tail rattling across all groups during conditioning and extinction. During conditioning, strong freezing responses to the tone pips followed by strong jumping and darting responses to the white noise were present in the paired group but less robust or not present in the unpaired or shock only groups. During extinction, tone-induced freezing diminished while the jumping was replaced by freezing and darting in the paired group. Together, these findings support the idea that associative pairings are necessary for conditioned defensive responses.

      Strengths:

      The study has strong control groups including a group that receives the same stimuli in an unpaired fashion and another control group that only receives the shock US and no CS to test the associative value of the SCS to the US. The authors examine a wide variety of defensive behaviors that emerge during conditioning and shift throughout extinction: in addition to the standard freezing response, jumping, darting, and tail rattling were also measured.

      We thank the Reviewer for their supportive appraisal of this study’s strengths.

      Weaknesses:

      This study could have greater impact and significance if additional conditions were added (e.g., using other stimuli of differing salience during the SCS), and determining the neural correlates or brain regions that are differentially recruited during different phases of the task across the different groups.

      In the revised manuscript, we have conducted experiments with 2 reverse-SCS groups (WN—tone): one with paired (new PA-R group), and one with unpaired (new UN-R group), presentations to shock during conditioning. These experiments revealed that during conditioning day 2 in both reverse order groups, WN causes reductions in freezing and increases in locomotor activity (see revised Figure 2D), an effect that is stronger in the UN-R compared to the PA-R group. This locomotor effect is neither darting nor escape jumping in the PA-R group (revised Figure 3G, I; Figure 4G). In the UN-R group, WN induces more activity than the PA-R group (Figure 2D), including some jumping at WN onset (Figure 3H), but no darting (Figure 4G). Indeed, stimulus salience is a key factor in this paradigm for inducing activity (see Hersman et al. 2020). Together, these results suggest that WN is an inherently more salient stimulus than tone, and it can elicit defensive behaviors in shock-sensitized mice through non-associative mechanisms. It is worth noting that WN does not elicit defensive behavior before conditioning at the sound intensity we use (75dB; see Fadok et al. 2017, Borkar et al. 2020, Borkar et al. 2024).

      We agree that determining the neuronal correlates and brain regions that are involved in defensive ethograms at various stages within this paradigm is of great importance, but we feel that those experiments are beyond the scope of the current study, which is focused on identifying behavioral differences based on associative and non-associative factors.

      Reviewer #1 (Recommendations For The Authors):

      In LINES 72-73, authors say they used a "truly random procedure" as one of their control groups. Then in LINES 113-116, they describe this group as "unpaired" where the "SCS could not reliably predict footshock". Combined, it is unclear if this group is random or unpaired. The "truly random procedure" is defined, by the cited Rescorla paper, as "the two events are programmed entirely randomly and independently in such a way that some "pairings" of CS and US may occur by chance alone". So, truly random would indicate that the shock may occur during the cue, while unpaired indicates the shock was explicitly unpaired from the cue. If the authors used a random procedure, the groups need to be labeled as random, not unpaired, and the # of cues that happened to coincide with footshock per animal needs to be reported somewhere. If the authors used an unpaired procedure (which appears to be the case based on 40-60s ITI between SCS and footshock being reported), it needs to be clearer and consistent throughout that it was explicitly unpaired, as well as removing the claim in LINE 72-73 that they used a "truly random procedure".

      We did indeed use an explicitly unpaired procedure. We have adjusted the text and figures to better reflect this, and we removed any mentions of randomness with regards to the presentations of SCS and footshock.

      Despite the lack of significant sex differences, it would still be helpful if data panels with individual data points (e.g. Fig 2E-J), were presented as identifiable by sex (e.g. closed vs open circles for males vs females).

      The revised manuscript now compares four or five groups per figure, making data presentation complicated. Providing the individual data points in each panel reduces figure clarity, therefore, we feel it is best to present the data as box-and-whisker plots without them. However, the source data files for each figure are available to the reader and the data are clearly labeled to be identifiable by sex.

      Is it not odd that all groups showed similar levels of contextual freezing during the 3min baseline? If shocks are unsignaled in the UN and SO groups, one would expect higher levels of contextual freezing compared to a paired group.

      We are not certain why one would expect higher levels of contextual freezing in the UN and SO groups compared to the PA group at the beginning of conditioning day 2. Another study also looked at baseline freezing in a contextual fear group (which is the same as shock only in our study) and in an auditory cued fear conditioning group within the conditioning context, and their data show that freezing during the baseline period is equivalent between groups (Sachella et al., 2022).

      During baseline on Extinction Day 1, it does seem that the unpaired and SO groups tend to have higher freezing levels than the paired groups. Author response image 1 shows baseline freezing during the first 3 minutes of extinction day 1. After two days of conditioning in the conditioned flight paradigm, contextual freezing either is, or trends toward being, significantly higher in the UN, UN-R, and SO groups than in the PA and PA-R groups.

      Author response image 1.

      Baseline Freezing levels for all groups during the first extinction session. Baseline period is defined as the first 180 seconds of the session, before any auditory stimulus was presented. PA, Paired; UN, Unpaired; SO, Shock Only; PA-R, Paired Reverse; UN-R, Unpaired Reverse. *p<0.05, **p<0.01, ****p<0.0001.

      Do the tone and WN elicit similar levels of defensive behaviors in a naïve mouse? Or have the authors tested WN followed by tone? Is there a potential issue that the WN may be innately aversive which is then amplified with training? i.e. does a tone preferentially induce freezing while WN induces active behaviors, regardless of which sensory stimulus is temporally closer to the shock? If the change in behavior is really due to the pairing and temporal proximity to shock, then there should be increased jumps, etc to the tone if trained with WN->tone.

      WN can indeed be used as an aversive stimulus under certain conditions and at sufficiently high decibel levels. In the conditioned flight paradigm, WN is presented at 75dB, which is below the threshold for eliciting an acoustic startle response in a C57BL/6J mouse (Fadok et al. 2009). Also, during pre-exposure, when animals are naïve to the SCS, tone and WN stimuli do not elicit defensive behaviors (see Fadok et al. 2017, Borkar et al. 2020, 2024).

      As suggested by the Reviewer, during revision we have included reverse-SCS paired (PA-R) and unpaired (UN-R) groups to test for the role of stimulus salience and stimulus order on defensive ethograms. During conditioning day 2, the PA-R group exhibited little freezing to the WN, with a slightly elevated activity index, and they exhibited robust freezing during tone (revised Figure 2A-H). The activity during the WN in the PA-R group was significantly lower than that of the PA group (Figure 2L). The PA-R group also did not respond to WN with escape jumps or darting (Figure 3I, 4G). The UN-R group displayed greater activity during the WN than the UN and PA-R groups, but less activity than the PA group (Figure 2D, H). The UN-R group did not dart but this group displayed some jumping at WN onset (Figure 3H), like what was observed in the UN group.

      These data suggest that WN has inherent, salient properties that can induce some non-associative activity after the mouse has been sensitized by shock (see also Hersman et al. 2020 for more detailed analysis of stimulus salience in the conditioned flight paradigm). However, only in the PA group is robust flight behavior (comprised of high numbers of escape jumps and darting) observed. Therefore, both stimulus salience and temporal order are important for eliciting transitions from freezing to flight.

      Fig 3G/4G are hard for me to understand. The figure legends say they're survival graphs but the y-axis labels "Latency to initial jump/dart (% of cohort)" confuses me. What is the purpose of these graphs? Perhaps they are not needed. Or consider presenting them similar to Fig 7C, D as those were more intuitive and faster for me to grasp.

      We had intended these plots to show that a greater proportion of the paired group jumps and darts during WN compared to the unpaired group, and that the percentage of the cohort that jumps and darts increases across conditioning trials. Because these graphs were not clear, we have removed them, and we have replaced them with graphs comparing total cohort percentages that jumped (Figure 3I) or darted (Figure 4G) over the whole CD2 session.

      For the extinction data, I did not see within group analyses for within or between session fear extinction to the tone. So, for the paired group, were the last 4 trials of Ext 1 significantly lower than the first 4 trials? If not, then they did not show within-session extinction. Also, for the paired group, were the last 4 trials of Ext 1 significantly different than the first 4 trials of Ext 2? This would test for long-term retention and spontaneous recovery.

      In the original submission and in the revised manuscript, we calculated a delta change score for freezing during tone in the early versus late blocks of 4 trials, and then we statistically compared these differences across groups (Figure 5C, D). This allowed us to assess between-group differences in changes to tone-evoked freezing during extinction. Freezing to tone did decrease significantly over the first extinction session for the paired group (Early Ext1 vs Late Ext1, paired t-test, t(31) = 6.23, p<0.0001), and when comparing late Ext1 and early Ext2, we found that tone-evoked freezing did significantly increase (Late Ext1 vs Early Ext2, paired t-test, t(31) = 5.26, p<0.0001). This increase in cue-induced freezing between days of extinction is characteristic of C57BL/6J mice (Hefner et al., 2008). Our study did not test for more distal timepoints, so we cannot comment on the efficacy of long-term retention or spontaneous recovery.
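
      For readers who want to see the within-group comparison spelled out, the following is a minimal sketch of a delta change score and a paired t statistic. The function names and the freezing values are hypothetical illustrations and are not data or code from the study; the sketch only assumes one early-block and one late-block freezing value per animal.

```python
import math
import statistics


def delta_scores(early, late):
    """Per-animal change in freezing: late block minus early block."""
    if len(early) != len(late):
        raise ValueError("need one early and one late value per animal")
    return [l - e for e, l in zip(early, late)]


def paired_t(early, late):
    """Return (t, df) for a paired t-test on early minus late differences."""
    if len(early) != len(late):
        raise ValueError("need one early and one late value per animal")
    diffs = [e - l for e, l in zip(early, late)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical percent-freezing values for four animals (illustration only).
early_ext1 = [80.0, 70.0, 90.0, 60.0]
late_ext1 = [50.0, 45.0, 70.0, 40.0]

deltas = delta_scores(early_ext1, late_ext1)
t_stat, df = paired_t(early_ext1, late_ext1)
print(deltas, round(t_stat, 2), df)
```

      A positive t here indicates lower freezing in the late block than in the early block; the p-value would then be read from a t distribution with the returned degrees of freedom.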

      For the conditioning and extinction data across Figs 2, 5 and 6, what I gather from them is that freezing is high to the tone and low to the WN during conditioning, and then low to the tone, and high to the WN across extinction. Then for activity levels I see they are low to the tone and high to the WN during conditioning, and then low to the WN during extinction. The piece that is missing is what are activity levels like to the tone during extinction. Are they low like in conditioning and remain low in extinction? Or do they increase across extinction as freezing decreases? As I was going through these graphs I drew myself out step function summaries of the freezing and activity levels between tone/WN for conditioning vs extinction; maybe the authors could consider a summary figure.

      We thank the Reviewer for their interest. We found that within the paired group, activity to tone remained low throughout both days of extinction (though it increased within each session) and did not return to normal activity levels. We present these data in Author response image 2. We thank the Reviewer for the suggestion of a summary figure, but we feel there are too many axes of classification (between-group, within-group, multiple behaviors, tone/WN, conditioning/extinction) to coherently present our findings in a single figure.

      Author response image 2.

      Trial-by-trial plot of activity index during the tone period of SCS across both extinction sessions for the PA group. SCS, Serial compound stimulus; Ext, extinction; PA, Paired.

      In the discussion (LINE 592-3), they discuss that shock sensitization in the SO group may prime a stressed animal to dart more readily to WN upon stimulus transition. Should this not also happen during the transition of silence to tone? What is special about a transition between two auditory stimuli that would result in panic-like behavior in an animal that only received shock presentations? This also gets back to an earlier concern above regarding the potentially innate aversiveness of the WN.

      After 2 days of shock sensitization, we observe that mice exhibit freezing to the tone during the first three trials of extinction day 1 (Figure 5A). This non-associative freezing response is like that observed in other studies of non-associative fear processing (please see Kamprath and Wotjak, 2004). As trials progress during extinction day 1, mice do become mildly activated during the tone (Author response image 3). The transition to WN in the shock-only group during extinction induces non-associative darting responses, but it does not induce escape jumping behavior (Figure 7). We hypothesize that the innate salience of the WN is a vital factor contributing to these escalated responses. The importance of stimulus salience in conditioned flight was also demonstrated by Hersman et al., 2020 for SCS conditioning, and by Furuyama et al., 2023 for single tone conditioning. Just as with conditional freezing responses (Kamprath and Wotjak, 2004), we believe that conditional flight is controlled by summative components, one being associative and the other non-associative.

      Author response image 3.

      Trial-by-trial plot of activity index during the tone period of SCS across both extinction sessions for the SO group. SCS, Serial compound stimulus; Ext, extinction; SO, Shock Only.

      In the discussion (LINE 583), they say that the development of explosive defensive behaviors are "not achievable with traditional single-cue Pavlovian conditioning paradigms". The authors should include a caveat here that the current study did not compare their results to a group of mice that received just WN-shock pairings.

      We thank the reviewer for this comment. This statement was meant to highlight that traditional paradigms do not offer an element of signaling the temporal imminence of threat, only its inevitability. It was not our intention to state that defensive escape behaviors were unachievable in single-cue conditioning paradigms, and we regret not making this clear. Indeed, the supplement of Fadok et al. 2017 shows that WN-shock conditioning is capable of inducing flight, Furuyama et al. 2023 shows that tone-shock conditioning is capable of inducing flight under specific parameters, and Gruene et al. 2015 demonstrates that single CS-US pairings induce conditional darting behaviors in female rats. We have adjusted the text to better reflect our intent.  

      Minor comment to LINE 613-5: Speaking as someone who has done fear conditioning in both mice and rats, tail rattling may be specific to mice (I have seen this often) and likely not observable in rats (never seen it).

      We thank the Reviewer for this information. We have adjusted our text to mainly discuss mouse-specific tail rattling.

      Reviewer #2 (Recommendations For The Authors):

      The research questions in this study are novel and bring new insight to the field. However, there are some issues that can be addressed to improve the overall quality of the study, namely, the reader is left wanting to know more, especially about how neural circuits contribute to these different defensive behaviors during this task. Below are some recommendations for the authors that would greatly improve the impact and significance of this study.

      (1) What are the neural correlates or circuits recruited during these different defensive behaviors across the course of conditioning and extinction? How might they differ between the PA and UN groups? What differences might emerge when an animal is shifting their defensive behavior from freezing to darting, for example? Answering these questions would require intensive additional experiments, therefore more discussion of possible neural mechanisms that might be recruited during this task would be appreciated, given the scope of the subject area.

      We agree that understanding the neural circuits recruited during these behaviors and across conditioning and extinction is of vital importance. We are actively working on these questions, and we have published on the role of central amygdala circuits (Fadok et al. 2017) as well as on top-down control of flight by the medial prefrontal cortex (Borkar et al. 2024). Because the current manuscript is focused on learning mechanisms influencing defensive behavior, we would prefer to focus our discussion on that, rather than speculating on possible neural mechanisms. However, we have added a statement in the Discussion (LINES 706-707) emphasizing that future studies should investigate the neuronal mechanisms contributing to threat associations and different defensive behaviors.

      (2) Were any vocalizations observed during conditioning or extinction phases? If not, could you speculate how type and occurrence of vocalizations might correlate with the different defensive responses observed?

      Audible vocalizations were only observed during footshock presentations (squeaks). Unfortunately, we do not have the proper specialized recording equipment to monitor the full spectrum of mouse vocalizations, especially those in the ultrasonic range. Thus, we cannot speculate on the nuances of vocalizations in mice with respect to this behavioral paradigm. To the best of our knowledge, mice have not been reported to emit specific ultrasonic calls during conditioned threat like those of rats. That said, it would be of interest to determine if mice emit different vocalizations during different defensive behaviors.

      (3) The transition from freezing to flight during the SCS is thought to be due to the close proximity of threat imminence between the WN CS and shock US. What if you switched the order of the SCS stimuli to WN followed by tone stimuli? If the salience of the WN stimulus is truly driving the jumping behavior, then it would be observed even if the WN stimulus preceded the pure tone stimulus and that would bring additional evidence that it is the associative value of the stimuli rather than its salience that's driving the defensive behaviors. What do you predict you would observe in rodents that were given a WN-tone SCS paired and unpaired in the same design of this study?

      As suggested by the reviewer, we collected data from reverse-SCS paired and unpaired groups and reported our findings within the manuscript. Our detailed findings are also discussed above. Overall, we find that a combination of stimulus salience and temporal proximity, and a summation of non-associative and associative mechanisms, are necessary to elicit explosive flight behavior (escape jumping and darting).

      References

      Borkar CD, Dorofeikova M, Le QE, Vutukuri R, Vo C, Hereford D, Resendez A, Basavanhalli S, Sifnugel N, Fadok JP (2020) Sex differences in behavioral responses during a conditioned flight paradigm. Behavioural Brain Research 389:112623.

      Borkar CD, Stelly CE, Fu X, Dorofeikova M, Le QE, Vutukuri R, Vo C, Walker A, Basavanhalli S, Duong A, Bean E, Resendez A, Parker JG, Tasker JG, Fadok JP (2024) Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625: 743-749.

      Fadok JP, Krabbe S, Markovic M, Courtin J, Xu C, Massi L, Botta P, Bylund K, Müller C, Kovacevic A, Tovote P, Lüthi A (2017) A competitive inhibitory circuit for selection of active and passive fear response. Nature 542:96-100.

      Furuyama T, Imayoshi A, Iyobe T, Ono M, Ishikawa T, Ozaki N, Kato N, Yamamoto R (2023) Multiple factors contribute to flight behaviors during fear conditioning. Scientific Reports 13:10402. 

      Gruene TM, Flick K, Stefano A, Shea SD, Shansky RM (2015) Sexually divergent expression of active and passive conditioned fear responses in rats. eLife 4:e11352.

      Hefner K, Whittle N, Juhasz J, Norcross M, Karlsson RM, Saksida LM, Bussey TJ, Singewald N, Holmes A (2008) Impaired Fear Extinction Learning and Cortico-Amygdala Circuit Abnormalities in a Common Genetic Mouse Strain. Journal of Neuroscience 6:8074-8085.

      Hersman S, Allen D, Hashimoto M, Brito SI, Anthony T (2020) Stimulus salience determines defensive behaviors elicited by aversively conditioned serial compound auditory stimuli. eLife 9:e53803.

      Kamprath K and Wotjak CT (2004) Nonassociative learning processes determine expression and extinction of conditioned fear in mice. Learning & Memory 11:770-786.

      Sachella TE, Ihidoype MR, Proulx CD, Pafundo DE, Medina JH, Mendez P & Piriz J (2022) A novel role for the lateral habenula in fear learning. Neuropsychopharmacology 47:1210-1219.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the Reviewer for all their effort and suggestions over multiple drafts. Their comments have encouraged us to read and think more deeply about the issue under discussion (BLA spiking in response to CS/US inputs), and to find the papers whose contents we think provide a potential solution. We agree that there is more to understand about the mechanisms underlying associative learning in the BLA. We offer our paper as providing a new way of understanding the role of circuit dynamics (rhythms) in guiding associative learning via STDP. As we pointed out in our response to the previous review, the issue highlighted by the Reviewer is an issue for the entire field of associative learning in BLA: our discussion of the issue suggests why the experimentally observed BLA spiking in response to CS inputs, performed in the absence of US inputs (as done in the papers cited by the Reviewer), may not be what occurs in the presence of the US. Since our explanation involves the role of neuromodulators, such as ACh and dopamine, the suggestion is open to further testing.


      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Public Review’s only objection: “Deficient in this study is the construction of the afferent drive to the network, which does elicit activities that are consistent with those observed to similar stimuli. It still remains to be demonstrated that their mechanism promotes plasticity for training protocols that emulate the kinds of activities observed in the BLA during fear conditioning.”

      Recommendations for the Authors: “The authors have successfully addressed most of my concerns. I commend them for their thorough response. The one nagging issue is the unrealistic activation used to drive CS and US activation in their network. While I agree that their stimulus parameters are consistent with a contextual fear task, or one that uses an olfactory CS, this was not the focus of their study as originally conceived. Moreover, the types of activation observed in response to auditory cues, which is the focus of their study, do not follow what is reported experimentally. Thus, I stand by the critique that the proposed mechanism has not been demonstrated to work for the conditioning task which the authors sought to emulate (Krabbe et al. 2019). Frustratingly, addressing this is simple: run the model with ECS neurons driven so that they fire bursts of action potentials every ~1 sec for 30 sec, and with the US activation noncontiguous with that. If the model does not produce plasticity in this case, then it suggests that the mechanisms embedded in the model are not sufficient, and more work is needed to identify them. While 'memory' effects are possible that could extend the temporal contiguity of the CS and US, the authors need to provide experimental evidence for this occurring in the BLA under similar conditions if they want to invoke it in their model. 

      (1) Fair response. I accept the authors arguments and changes. 

      (2) The authors rightly point out that the simulated afferents need not perfectly match the time courses of the peripheral inputs, since the amygdala receives them indirectly via the thalamus, cortex, etc. However, it is known how amygdala neurons respond to such stimuli, so it behooves the authors to incorporate that fact into their model.

      Quirk et al. 1997 show that the response to the tone plummets after the first 100 ms in Figs 5A and 6B. The Herry et al. 2007 paper emphasizes the transient response to tone pips, with spiking falling back to a low Poisson baseline firing rate outside of the time when the pip is delivered.

      Regarding potential metabotropic glutamate activation, the stimulus in Whittington et al. 1995 was electrical stimulation at 100 Hz that would synchronously activate a large volume of tissue, which is far outside the physiological norm. I appreciate that metabotropic glutamate receptors may play a role here, but ultimately the model depends upon spiking activity for the plastic process to occur, and to the best of my knowledge the spiking activity in BLA in response to a sustained, unconditioned tone, is brief (see also Quirk, Repa, and Ledoux 1995). Perhaps a better justification for the authors would be Bordi and Ledoux 1992, which found that 18% of auditory responsive neurons showed a 'sustained' response, but the sustained response neurons appear to show much weaker responses than those with transient ones (Fig 2).  I am willing to say that their paper IS relevant to contextual fear, but that is not what the authors set out to do. 

      (3) Fair response. 

      (4) Very good response! 

      Minor points: All points were addressed.”

      We thank Reviewer 1 (R1) for the positive feedback and also for pointing out that, in R1’s opinion, there is still a nagging issue related to the activation in response to CS we modeled. In Krabbe et al. (2019), the CS is a pulsed input and the US is delivered right after the CS offset. The current objection of R1 is that we are instead modeling the CS and US as continuous and overlapping. R1 suggested that we use the actual inputs and see whether they produce the desired outputs. The answer is simple: it will not work because we need the effects of CS and US on pyramidal cells to overlap. We note that the fear learning community appears to agree with us that such contingency is necessary for synaptic plasticity (Sun et al., 2020; Palchaudhuri et al., 2024). To the best of our understanding, the source of that overlap is not yet understood, and the gap has been widely noted (Sun et al., 2020). We do note, however, that STDP may not be the only kind of plasticity in fear learning (Li et al., 2009; Kim et al., 2013, 2016).

      It is important to emphasize that it is not the aim of our paper to model the origin of the overlap. Rather, our intent is to demonstrate the roles of brain rhythms in producing the appropriate timing for STDP, assuming that ECS and F cells can continue to be active after the offset of CS and US, respectively. This assumption is very close to how the field now treats the plasticity, even for auditory fear conditioning (Sun et al., 2020). Thus, our methodology does not contradict known results. However, the question raised by R1 is indeed very interesting, if not the point of our paper. Hence, below we give details about why our hypothesis is reasonable.

      Several papers (Quirk, Repa and LeDoux, 1995; Herry et al., 2007; Bordi and LeDoux, 1992) show that the pips in auditory fear conditioning increase the activity of some BLA neurons: after an initial transient, the overall spike rate is still higher than baseline activity. As R1 points out, we did not model the transient increase in BLA spiking activity that occurs in response to each pip in the auditory fear conditioning paradigm. However, we did model the low-level sustained activity that occurs in between pips of the CS in the absence of US (Quirk, Repa and LeDoux, 1995, Fig. 2) and after CS offset (see Fig. 2B, left-hand part, of our manuscript). We read the data of Quirk et al., 1995 as suggesting that the low-level activity can be sustained for some indefinite time after a pip (the recording was cut off at 500 ms with no noticeable decrease in activity). As such, even if the pips and the US do not overlap in time, as in Krabbe et al. (2019), the spiking of the ECS can be sustained after CS offset and thus overlap with US, a condition necessary in our model for plasticity through STDP. Fig. 3 of Herry et al., 2007 shows that BLA neurons respond to a pip at the population level with a transient increase in spiking and then return to a baseline Poisson firing rate. However, a subset of cells continues to fire at an increased-over-baseline rate after the transient effect wears off (Fig. 3C, top few neurons), and this increased rate extends to the end of the recording time (here ~300 ms). These are the cells we consider to be ECS in our model. Fig. 5A of Quirk et al., 1997 also shows sustained low-level activity of neurons in BLA in response to a pip. The low-level activity is shown to increase after fear learning, as is also the case in our model since ECS now entrains F so that there are more pyramidal cells spiking in response to CS. The question remains as to whether the spiking is sustained long enough and at a high enough rate for STDP to take place when the US is presented sometime after the end of the CS.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during US due to recording interference from the shock. However, evidence seems to suggest that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (Muller et al., 2013; McDonald and Mott, 2021). Thus, ACh from BF should elicit a depolarization in pyramidal cells. Indeed, the pairing of ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7 – 10 s (Unal et al., 2015). This should induce higher spiking rates and more sustained activity in the ECS and F neurons during and after the presentation of US, thus ensuring a concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Other modulators, including dopamine, may also play a role in producing the sustained activity. Activation of US leads to increased dopamine release in the BLA (Harmer and Phillips, 1999; Suzuki et al., 2002). D1 receptors are known to increase the membrane excitability of BLA projection neurons by lowering their spiking threshold (Kröner et al., 2005). Thus, the activation of the US can lead to continued and higher firing rates of ECS and F. The effect of dopamine can last up to 20 minutes (Kröner et al., 2005). For CS-positive neurons, the ACh modulation coming from the firing of US may lead to a temporary extension of firing that is then amplified and continued by dopaminergic effects.

      Hence, we suggest that the problem raised by R1 may be solved by considering the roles of ACh and dopamine in the BLA. The involvement of neuromodulators is consistent with the suggestion of Sun et al. (2020). The model we have may be considered a “minimal” model that puts in by hand the overlap in activity due to the neuromodulation without explicitly modeling it. As R1 says, it is important for us to give the motivation for our hypotheses. We have used the simplest way to model overlap without assumptions about timing specificity in the overlap.
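
      The timing argument can be illustrated with a small interval sketch. It takes the 30-second CS from R1's suggested protocol and a 2-second US delivered at CS offset; the 5-second tail of sustained ECS activity is a hypothetical value chosen only for illustration, not a parameter from our model or from the cited recordings.

```python
def overlap(a, b):
    """Length in seconds of the intersection of two (start, end) windows."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def extend(window, tail):
    """Lengthen a window by a tail of sustained activity after its offset."""
    return (window[0], window[1] + tail)


CS = (0.0, 30.0)   # CS window, as in R1's suggested protocol
US = (30.0, 32.0)  # 2-second US delivered right after CS offset

# Hypothetical duration of sustained ECS spiking after CS offset.
ECS_TAIL = 5.0

stimulus_overlap = overlap(CS, US)                   # the stimuli themselves
effect_overlap = overlap(extend(CS, ECS_TAIL), US)   # their effects on spiking

print(stimulus_overlap, effect_overlap)
```

      In this sketch the stimuli never overlap, yet the CS-evoked activity and the US do once any sustained tail is assumed, which is the distinction between inputs and their effects that we rely on.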

      To account for these points in the manuscript, we first specified that we consider the effects of the US and CS inputs on the neuronal network as overlapping, while the actual inputs may not overlap. To do that, we added the following text:

      (1) In the introduction: 

      “In this paper, we aim to show 1) How a variety of BLA interneurons (PV, SOM and VIP) lead to the creation of these rhythms and 2) How the interaction of the interneurons and the rhythms leads to the appropriate timing of the cells responding to the US and those responding to the CS to promote fear association through spike-timing-dependent plasticity (STDP). Since STDP requires overlap of the effects of the CS and US, and some conditioning paradigms do not have overlapping US and CS, we include as a hypothesis that the effects of the CS and US overlap even if the CS and US stimuli do not. In the Discussion, we suggest how neuromodulation by ACh and/or dopamine can provide such overlap. We create a biophysically detailed model of the BLA circuit involving all three types of interneurons and show how each may participate in producing the experimentally observed rhythms and interacting to produce the necessary timing for the fear learning.”

      (2) In the Result section “With the depression-dominated plasticity rule, all interneuron types are needed to provide potentiation during fear learning”:

      “The 40-second interval we consider has both ECS and F, as well as VIP and PV interneurons, active during the entire period: an initial bout of US is known to produce a long-lasting fear response beyond the offset of the US (Hole and Lorens, 1975) and to induce the release of neuromodulators. The latter, in particular acetylcholine and dopamine that are known to be released upon US presentation (Harmer and Phillips, 1999; Suzuki et al., 2002; Rajebhosale et al., 2024), may induce more sustained activity in the ECS, F, VIP, and PV neurons during and after the presentation of US, thus ensuring a concomitant activation of those neurons necessary for STDP to take place (see “Assumptions and predictions of the model” in the Discussion).”

      (3) In the Discussion section “Synaptic plasticity in our model”:

      “Synaptic plasticity is the mechanism underlying the association between neurons that respond to the neutral stimulus CS (ECS) and those that respond to fear (F), which instantiates the acquisition and expression of fear behavior. One form of experimentally observed long-term synaptic plasticity is spike-timing-dependent plasticity (STDP), which defines the amount of potentiation and depression for each pair of pre- and postsynaptic neuron spikes as a function of their relative timing (Bi and Poo, 2001; Caporale and Dan, 2008). All forms of STDP require that there be an overlap in the firing of the pre- and postsynaptic cells. In some fear learning paradigms, the US and the CS do not overlap. We address this below under “Assumptions and predictions of the model”, showing how the effects of US and CS on the spiking of the relevant neurons can overlap even in the absence of overlap of US and CS.”
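
      As an illustration of the passage above, the following is a minimal sketch of a generic pair-based STDP rule with exponential windows. The amplitudes, time constants, and spike times are hypothetical and are not the values used in our model; the depression amplitude is simply chosen larger than the potentiation amplitude to mimic a depression-dominated rule.

```python
import math

# Hypothetical parameters, chosen only for illustration.
A_PLUS = 0.01     # potentiation amplitude
A_MINUS = 0.02    # depression amplitude (larger: depression-dominated)
TAU_PLUS = 0.02   # potentiation time constant, in seconds
TAU_MINUS = 0.02  # depression time constant, in seconds


def stdp_dw(t_pre, t_post):
    """Weight change for one pre/post spike pair as a function of timing."""
    dt = t_post - t_pre
    if dt > 0:   # pre fires before post: potentiation
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:   # post fires before pre: depression
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0


def total_dw(pre_spikes, post_spikes):
    """Sum the pairwise contributions over all pre/post spike pairs."""
    return sum(stdp_dw(tp, tq) for tp in pre_spikes for tq in post_spikes)


# Presynaptic spikes lead postsynaptic spikes by 5 ms on each 25 ms cycle.
pre = [0.0, 0.025, 0.05]
post = [0.005, 0.03, 0.055]
net_change = total_dw(pre, post)

# Spikes separated by seconds contribute essentially nothing.
far_change = total_dw([0.0], [5.0])

print(net_change > 0, abs(far_change) < 1e-12)
```

      With tight pre-before-post timing the net change is positive even under a depression-dominated rule, whereas spikes that are far apart in time leave the weight essentially unchanged, which is why overlapping activity is required.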

      To fully present our reasoning about the origin of the overlap of the effects of US and CS, we modified and added to the last paragraph of the Discussion section “Assumptions and predictions of the model”, which now reads as follows:

      “Finally, our model requires the effect of the CS and US inputs on the BLA neuron activity to overlap in time in order to instantiate fear learning through STDP. Such a hypothesis, that learning uses spike-timing-dependent plasticity, is common in the modeling literature (Bi and Poo, 2001; Caporale and Dan, 2008; Markram et al., 2011). Current paradigms of fear conditioning include examples in which the CS and US stimuli do not overlap (Krabbe et al., 2019). Such a condition might seem to rule out the mechanisms in our paper. Nevertheless, the argument below suggests that the effects of the CS and US can cause an overlap in neuronal spiking of ECS, F, VIP, and SOM, even when CS and US inputs do not overlap.

      Experimental recordings cannot speak to the rate of spiking of BLA neurons during US due to recording interference from the shock. However, evidence suggests that ECS activity should increase during the US due to the release of acetylcholine (ACh) from neurons in the basal forebrain (BF) (Rajebhosale et al., 2024). Pyramidal cells of the BLA robustly express M1 muscarinic ACh receptors (McDonald and Mott, 2021). Thus, ACh from BF should elicit a depolarization in pyramidal cells. Indeed, the pairing of ACh with even low levels of spiking of BLA neurons results in a membrane depolarization that can last 7 – 10 s (Unal et al., 2015).   Other modulators, including dopamine, may also play a role in producing the sustained activity. Activation of US leads to increased dopamine release in the BLA (Harmer and Phillips, 1999; Suzuki et al., 2002). D1 receptors are known to increase the membrane excitability of BLA projection neurons by lowering their spiking threshold (Kröner et al., 2005). Thus, neuromodulator release should induce higher spiking rates and more sustained activity in the ECS and F neurons during and after the presentation of US, thus ensuring a concomitant activation of ECS and fear (F) neurons necessary for STDP to take place. Thus, the activation of the US can lead to continued and higher firing rates of ECS and F. The effect of dopamine can last up to 20 minutes (Kröner et al., 2005). For CS-positive neurons, the ACh modulation coming from the firing of US may lead to a temporary extension of firing that is then amplified and continued by dopaminergic effects.

      Hence, we suggest that a solution to the problem apparently posed by the non-overlap US and CS in some paradigms of auditory fear conditioning (Krabbe et al., 2019) may be solved by considering the roles of ACh and dopamine in the BLA. The model we have may be considered a “minimal” model that puts in by hand the overlap in activity due to the neuromodulation without explicitly modeling it. We have used the simplest way to model overlap without assumptions about timing specificity in the overlap. We note that, even though ECS and F neurons have the ability to fire continuously when ACh and dopamine are involved, the participation of the interneurons enforces periodic silence needed for the depression-dominated STDP.”

      In the Discussion (in section “Involvement of other brain structures”), we also acknowledged that the overlap between the effects of US and CS in the BLA may be provided by other brain structures by writing the following:

      “In our model, the excitatory projection neurons and VIP and PV interneurons show sustained activity during and after the US presentation, thus allowing potentiation through STDP to take place. The medial prefrontal cortex and/or the hippocampus may provide the substrates for the continued firing of the BLA neurons after the 2-second US stimulation. We also discuss below that this network sustained activity may originate from neuromodulator release induced by US (see section “Assumptions and predictions of the model” in the Discussion).”

      We also improved our discussion about the (Grewe et al., 2017) paper, which questions Hebbian plasticity in the context of fear conditioning based on several critiques. We included a new section in the Discussion entitled “Is STDP needed in fear conditioning?” to discuss those critiques and how our model may address them, which reads as follows:

      “Is STDP needed in fear conditioning? The study in (Grewe et al., 2017) questions the validity of the Hebbian model in establishing associative learning during fear conditioning. There are several critiques we discuss here. The first critique is that Hebbian plasticity does not explain the experimental finding showing that both upregulation and downregulation of stimulus-evoked responses are present between coactive neurons. The upregulation is provided by our model, so the issue is the downregulation, which is not addressed by our model. However, our model highlights that coactivity alone does not create potentiation; the fine timing of the pre- and postsynaptic spikes determines whether there is potentiation or depression. Here, we find that PING networks are instrumental in setting up the fine timing for potentiation. We suggest that networks not connected to produce the PING may undergo depression when coactive.

      The second critique raised by (Grewe et al., 2017) is that Hebbian plasticity alone does not explain why most of the cells exhibiting enhanced responses to the CS did not react to the US before fear conditioning. They suggest that neuromodulators may provide a third condition (besides the activity of the pre- and postsynaptic neurons) that changes the plasticity rule. Our model also does not explicitly address this experimental finding since it requires F to be initially activated by US in order for the fear association to be established. We agree that the fear cells described in (Grewe et al. 2017) may be depolarized by the US without reaching the spiking threshold; however, with neuromodulation provided during the fear training, the same input can lead to spiking, enabling the conditions for Hebbian plasticity. Our discussions above about how neuromodulators affect excitability are relevant to this point. We do not exclude that other forms of plasticity may play a role during fear conditioning in cells not initially activated by the US, but this is not the topic of our modeling study.

      The third critique raised by (Grewe et al., 2017) is that Hebbian plasticity cannot explain why the majority of cells that were US- and CS-responsive before training have a reduced CS-evoked response afterward. The reduced response happens over multiple exposures of CS without US; this can involve processes similar to those present in fear extinction, which require plasticity in further networks, especially involving the infralimbic cortex (Milad and Quirk, 2002; Burgos-Robles et al., 2007). An extension of our model could investigate such mechanisms. In the fourth critique, (Grewe et al., 2017) suggests that the Hebbian plasticity rule cannot easily account for the reduction of the responses of many CS+-responsive cells, but not of the CS−-responsive cells. We suggest that the circuits involving paradigms similar to fear extinction do not involve the CS- cells.

      Overall, we agree with (Grewe et al., 2017) that neuromodulators play a crucial role in fear conditioning, especially in prolonging the US- and CS-encoding activity (see section “Assumptions and predictions of the model” in the Discussion), or even in changing the details of the plasticity rule. A possible follow-up of our work involves investigating how fear ensembles form and are modified through fear conditioning and later stages. This follow-up work may involve using a tri-conditional rule, as suggested in (Grewe et al., 2017), in which the potential role of neuromodulators is taken into account in the plasticity rule in addition to the pre- and postsynaptic neuron activity. Another direction is to investigate a possible relationship between neuromodulation and a depression-dominated Hebbian rule.”

      Finally, we made additional minor changes to the manuscript:

      (1) In the Results section “Interneurons interact to modulate fear neuron output”, we specified the following:

      “The US input on the pyramidal cell and VIP interneuron is modeled as a Poisson spike train at ~ 50 Hz and an applied current, respectively. In the rest of the paper, we will use the words “US” as shorthand for “the effects of US”.” 

      (2) In the Results section “Interneuron rhythms provide the fine timing needed for depression-dominated STDP to make the association between CS and fear”, we also reported the following:

      “Similarly to the US, in the rest of the paper, we will use the words “CS” as shorthand for “the effects of CS”. In our simulations, CS is modeled as a Poisson spike train at ~ 50 Hz, independent of the US input. Thus, we hypothesize that the time structure of the inputs sometimes used for the training (e.g., a series of auditory pips) is not central to the formation of the plasticity in the network.”  
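
      The input description above can be illustrated with a minimal sketch of two independent homogeneous Poisson spike trains at about 50 Hz. This is not the simulation code used in the manuscript: the function name `poisson_spike_train`, the seeds, and the choice of a 2-second CS window (the 2-second value is stated in the text only for the US) are assumptions made for illustration.

```python
import random

def poisson_spike_train(rate_hz, duration_s, seed=None):
    """Return sorted spike times (in seconds) of a homogeneous Poisson process.

    Inter-spike intervals of a Poisson process are exponentially
    distributed with mean 1 / rate_hz, so spikes are generated by
    accumulating exponential waiting times until the window ends.
    """
    rng = random.Random(seed)
    spikes = []
    t = rng.expovariate(rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.expovariate(rate_hz)
    return spikes

# Hypothetical inputs: separate seeds make the two trains independent.
us_spikes = poisson_spike_train(rate_hz=50.0, duration_s=2.0, seed=1)
cs_spikes = poisson_spike_train(rate_hz=50.0, duration_s=2.0, seed=2)
```

      Because the two trains are drawn independently, neither carries any temporal structure related to the other, which is the property the quoted text relies on.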

      Reviewer #2 (Public Reviews):

      The authors of this study have investigated how oscillations may promote fear learning using a network model. They distinguished three types of rhythmic activities and implemented an STDP rule to the network aiming to understand the mechanisms underlying fear learning in the BLA. 

      After the revision, the fundamental question, namely, whether the BLA networks can or cannot intrinsically generate any theta rhythms, is still unanswered. The authors added this sentence to the revised version: "A recent experimental paper, (Antonoudiou et al., 2022), suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings under certain conditions, such as reduced inhibitory tone." In the cited paper, the authors studied gamma oscillations, and when they applied 10 µM gabazine to the BLA slices, they observed rhythmic oscillations at theta frequencies. 10 µM gabazine does not reduce the GABA-A receptor-mediated inhibition but eliminates it, resulting in rhythmic population bursts driven solely by excitatory cells. Thus, the results by Antonoudiou et al., 2022 contrast with, and do not support, the present study, which claims that rhythmic oscillations in the BLA depend on the function of interneurons. Thus, there is still no convincing evidence that BLA circuits can intrinsically generate theta oscillations in intact brain or acute slices. If one extrapolates from the hippocampal studies, then this is not surprising, as the hippocampal theta depends on extrahippocampal inputs, including, but not limited to, the entorhinal afferents and medial septal projections (see Buzsaki, 2002). Similarly, respiratory-related 4 Hz oscillations are also driven by extrinsic inputs. Therefore, at present, it is unclear which kind of physiologically relevant theta rhythm in the BLA networks has been modelled.

      In our public reply to the Reviewer’s point, we reported the following:

      (1) We respectfully disagree that (Antonoudiou et al., 2022) contrasts with our study. (Antonoudiou et al., 2022) is a slice study showing that the BLA theta power (3-12 Hz) increases with gabazine compared to baseline. With all GABAergic currents omitted due to gabazine, the LFP is composed of excitatory currents and intrinsic currents. In our model, the high theta (6-12 Hz) comes from the spiking activity of the SOM cells, which increase their activity if the inhibition from VIP cells is removed. Thus, the model produces high theta in the presence of gabazine (see Fig. 1 in our replies to the Reviewers’ public comments). The model also shows that a PING rhythm is produced without gabazine, and that this rhythm goes away with gabazine because PING requires feedback inhibition from PV to fear cells. Thus, the high theta increase and gamma reduction with gabazine in the (Antonoudiou et al., 2022) paper can be reproduced in our model.

      (2) We agree that (Antonoudiou et al., 2022) alone is not sufficient evidence that the BLA can produce low theta (3-6 Hz); we discussed a new paper (Bratsch-Prince et al., 2024) that provides further evidence of the BLA's ability to produce low theta and of the circumstances under which it does so. The authors reported that intrinsic BLA theta is produced in slices with ACh stimulation (without needing external glutamate input), which, in vivo, would be provided by the basal forebrain (Rajebhosale et al., eLife, 2024) in response to salient stimuli. The low theta depends on muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the VIP neurons in our model (Krabbe 2017; Mascagni and McDonald, 2003). We suspect that the low theta produced in (Bratsch-Prince et al., 2024) is the same as the low theta in our model. In future work, we will aim to show that ACh activates the BLA VIP cells, which are essential to the low theta generation in the network.

      In the manuscript, we added to and modified the Discussion section “Where the rhythms originate, and by what mechanisms”. This text aims to better discuss (Antonoudiou et al. 2022) and introduce (Bratsch-Prince et al., 2024) with its connection to our hypothesis that the theta oscillations can be produced within the BLA. The new version is:

      “Where the rhythms originate, and by what mechanisms. A recent experimental paper (Antonoudiou et al., 2022) suggests that the BLA can intrinsically generate theta oscillations (3-12 Hz) detectable by LFP recordings when inhibition is totally removed due to gabazine application. They draw this conclusion in mice by removing the hippocampus, which can volume conduct to BLA, and noticing that other nearby brain structures did not display any oscillatory activity. In our model, we note that when inhibition is removed, both AMPA and intrinsic currents contribute to the network dynamics and the LFP. Thus, interneurons with their specific intrinsic currents (i.e., D-current in the VIP interneurons, and NaP- and H- currents in SOM interneurons) can indeed affect the model LFP and support the generation of theta and gamma rhythms (Fig. 6G).

      Another slice study, (Bratsch-Prince et al., 2024), shows that BLA is intrinsically capable of producing a low theta rhythm with ACh stimulation and without needing external glutamate input. ACh is produced in vivo by the basal forebrain in response to US (Rajebhosale et al., 2024). Although we did not explicitly include the BF and ACh modulation of BLA in our model, we implicitly include the effect of ACh in BLA by increasing the activity of the VIP cells, which then produce the low theta rhythm. Indeed, low theta in the BLA is known to depend on the muscarinic activation of CCK interneurons, a group of interneurons that overlaps with the class of VIP neurons in our model (Mascagni and McDonald, 2003; Krabbe et al., 2018). 

      Although the BLA can produce these rhythms, this does not rule out that other brain structures also produce the same rhythms through different mechanisms, and these can be transmitted to the BLA. Specifically, it is known that the olfactory bulb produces and transmits the respiratory-related low theta (4 Hz) oscillations to the dorsomedial prefrontal cortex, where it organizes neural activity (Bagur et al., 2021). Thus, the respiratory-related low theta may be captured by BLA LFP because of volume conduction or through the BLA's extensive communication with the prefrontal cortex. Furthermore, high theta oscillations are known to be produced by the hippocampus during various brain functions and behavioral states, including during spatial exploration (Vanderwolf, 1969) and memory formation/retrieval (Raghavachari et al., 2001), which are both involved in fear conditioning. Similarly to the low theta rhythm, the hippocampal high theta can manifest in the BLA. It remains to be understood how these other rhythms may interact with the ones described in our paper. However, we emphasize that there is also evidence (as discussed above) that these rhythms arise within the BLA.”

      Reviewer #2 (Recommendations for the Authors):

      (1) Three different types of VIP interneurons with distinct firing patterns have been revealed in the BLA (Rhomberg et al., 2018). Does the generation of rhythmic activities depend on the firing features of VIP interneurons? Does it matter whether VIP interneurons fire bursts of action potentials or discharge more regularly?

      (2) The authors used data for modeling SST interneurons obtained e.g., in the hippocampus. However, there are studies in the BLA where the intrinsic characteristics of SST interneurons have been reported (Unal et al., 2020; Guthman et al., 2020; Vereczki et al., 2021). Have the authors considered using results of studies that were conducted in the BLA? 

      We thank the Reviewer for their questions, which have helped us further improve our manuscript in response to similar queries from Reviewer 3 in the previous review round. In more detail:

      (1) Although other electrophysiological types exist (Sosulina et al., 2010), we hypothesized that the electrophysiological type of VIP neurons that display intrinsic stuttering is the type that would be involved in mediating low theta oscillations during fear conditioning. This is because VIP intrinsic stuttering in cortical neurons is thought to involve the D-current, which helps create low theta bursting oscillations in the neuronal spiking patterns (Chartove et al., 2020). We think that the other subtypes of VIP interneurons are not essential for the low theta oscillatory dynamics observed during fear conditioning and, thus, did not provide an essential constraint for the phenomena we are trying to capture. VIP interneurons in our network must fire bursts at low theta to be effective in creating the pauses in ECS and F spiking needed for potentiation; single spikes at theta are not sufficient to create these pauses.

      (2) In our model, we used results from a study conducted in the BLA (Sosulina et al., 2010). SOM cells in the BLA display several physiological types. We chose to include in our model the type showing early adaptation in response to a depolarizing current and inward (outward) rectification upon the initiation (release) of a hyperpolarizing current. We hypothesize that this type can produce high theta oscillations, a prominently observed rhythm in the BLA. (Unal et al., 2020) found two populations of SOM cells in the BLA, which had been previously recorded in (Sosulina et al., 2010), including the type we chose to model. This SOM cell type shows a low-threshold spiking profile characterized by spike frequency adaptation and a voltage sag indicative of the H-current used in our model. (Guthman et al., 2020) also found a population of SOM cells with hyperpolarization-induced sag.

      Our model also uses a NaP-current for which there are no data in the BLA. However, this current is known to exist in hippocampal SOM cells, and NaP- and H-currents together can produce high theta in hippocampal cells. It is standard practice in modeling to use the best available substitute for unknown currents, although it is unfortunate to have to do so. We also note that models can be considered proofs of principle that can be supported or refuted by further experimental work. Both (Guthman et al., 2020) and (Vereczki et al., 2021) also uncover further heterogeneity among BLA SOM interneurons involving more than electrophysiology. We hypothesize that the level of heterogeneity revealed by these three studies is not key to the question we are asking (where the crucial ingredients are the rhythms) and, therefore, it was not included in our minimal model.

      We modified the Discussion section titled “Assumptions and predictions of the model” as follows:

      “Our model, which is a first effort towards a biophysically detailed description of the BLA rhythms and their functions, does not include the neuron morphology, many other cell types, conductances, and connections that are known to exist in the BLA; models such as ours are often called “minimal models” and constitute most biologically detailed models. For example, although there is considerable variability in the activity patterns of both VIP cells and SOM cells (Sosulina et al., 2010; Guthman et al., 2020; Ünal et al., 2020; Vereczki et al., 2021), our focus was specifically on those subtypes that generate critical rhythms within the BLA. Such minimal models are used to maximize the insight that can be gained by omitting details whose influence on the answers to the questions addressed in the model are believed not to be qualitatively important. We note that the absence of these omitted features constitutes hypotheses of the model: we hypothesize that the absence of these features does not materially affect the conclusions of the model about the questions we are investigating. Of course, such hypotheses can be refuted by further work showing the importance of some omitted features for these questions and may be critical for other questions. Our results hold when there is some degree of heterogeneity of cells of the same type, showing that homogeneity is not a necessary condition.”

      (3) The authors may double-check the reference list, as e.g., Cuhna-Reis et al., 2020 is not listed. 

      We thank the Reviewer for spotting this. We checked the reference list and all the references are now listed.

      Finally, we acknowledge that we made other changes to the manuscript, unrelated to the reviewers’ questions, to improve clarity. More specifically:

      (1) We included a section titled “Significance” after the abstract and keywords, which reads as follows:

      “Our paper accounts for the experimental evidence showing that amygdalar rhythms exist, suggests network origins for these rhythms, and points to their central role in the mechanisms of plasticity involved in associative learning. It is one of the few papers to address high-order cognition with biophysically detailed models, which are sometimes thought to be too detailed to be adequately constrained. Our paper provides a template for how to use information about brain rhythms to constrain biophysical models. It shows in detail, for the first time, how multiple interneurons help to provide time scales necessary for some kinds of spike-timing-dependent plasticity (STDP). It spells out the conditions under which such interactions between interneurons are needed for STDP and why. Finally, our work helps to provide a framework by which some of the discrepancies in the fear learning literature might be reevaluated. In particular, we discuss issues about Hebbian plasticity in fear learning; we show in the context of our model how neuromodulation might resolve some of those issues. The model addresses issues more general than that of fear learning since it is based on interactions of interneurons that are prominent in the cortex, as well as the amygdala.”

      (2) The Results section “Physiology of the interneuron types is critical to their role in depression-dominated plasticity”, which is now titled “Mechanisms by which interneurons contribute to potentiation in depression-dominated plasticity”, now reads as follows:

      “Mechanisms by which interneurons contribute to potentiation during depression-dominated plasticity. The PV cell is necessary to induce the correct pre-post timing between ECS and F needed for long-term potentiation of the ECS to F conductance. In our model, PV has reciprocal connections with F and provides lateral inhibition to ECS. Since the lateral inhibition is weaker than the feedback inhibition, PV tends to bias ECS to fire before F. This creates the fine timing needed for the depression-dominated rule to instantiate plasticity. If we used the classical Hebbian plasticity rule (Bi and Poo, 2001) with gamma frequency inputs, this fine timing would not be needed and ECS to F would potentiate over most of the gamma cycle, and thus we would expect random timing between ECS and F to lead to potentiation (Fig. S4). In this case, no interneurons are needed (see Discussion “Synaptic plasticity in our model” for the potential necessity of the depression-dominated rule).

      In this network configuration, the pre-post timing for ECS and F is repeated robustly over time due to coordinated gamma oscillations (PING, as shown in Fig. 4A, Fig. 1C) arising through the reciprocal interactions between F and PV (Feng et al., 2019). PING can arise only when PV is in a sufficiently low excitation regime such that F can control PV activity (Börgers et al., 2005), as in Fig. 4A. However, although such a low excitation regime establishes the correct fine timing for potentiation, it is not sufficient to lead to potentiation (Fig. 4A, Fig. S2C): the depression-dominated rule leads to depression rather than potentiation unless the PING is periodically interrupted. During the pauses, made possible only in the full network by the presence of VIP and SOM, the history-dependent build-up of depression decays back to baseline, allowing potentiation to occur on the next ECS/F active phase. (The detailed mechanism of how this happens is in the Supplementary Information, including Fig. S2). Thus, a network without the other interneuron types cannot lead to potentiation. Though a low excitation level for a PV cell is necessary to produce a PING, a higher excitation level is necessary to produce a pause in the ECS and F. This higher excitation level is consistent with the experimental literature showing a strong activation of PV after the onset of CS (Wolff et al., 2014). The higher excitation happens when the VIP cell is silent, whereas a low excitation level is achieved when the VIP cell fires and partially inhibits the PV cell (Fig. 4B, Fig. S2D). The interruption in the ECS and F activity requires the participation of another interneuron, the SOM cell (Figs. 2B, S2): the pauses in inhibition from the VIP periodically interrupt ECS and F firing by releasing PV and SOM from inhibition and thus indirectly silencing ECS and F. 
      Without these pauses, depression dominates (see SI section “ECS and F activity patterns determine overall potentiation or depression”).”
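
      The role of fine timing under a depression-dominated rule can be illustrated with a minimal pair-based STDP sketch. It is a simplification under stated assumptions and not the rule implemented in the manuscript: it omits the history-dependent build-up of depression described above, and the amplitudes, time constants, and function names are hypothetical values chosen only so that depression outweighs potentiation at every lag.

```python
import math

# Hypothetical parameters, chosen only for illustration (times in ms).
A_PLUS, TAU_PLUS = 1.0, 14.0     # potentiation amplitude and time constant
A_MINUS, TAU_MINUS = 1.5, 34.0   # depression amplitude and time constant

def stdp_dw(dt_ms):
    """Weight change for one spike pair, with dt_ms = t_post - t_pre.

    Pre-before-post (dt_ms > 0) potentiates; post-before-pre or
    simultaneous spikes (dt_ms <= 0) depress.
    """
    if dt_ms > 0:
        return A_PLUS * math.exp(-dt_ms / TAU_PLUS)
    return -A_MINUS * math.exp(dt_ms / TAU_MINUS)

def mean_dw(dts_ms):
    """Average weight change over a list of spike-pair lags."""
    return sum(stdp_dw(dt) for dt in dts_ms) / len(dts_ms)

# ECS (pre) reliably leads F (post) by 3 ms, as a rhythm such as PING might impose.
locked = mean_dw([3.0] * 50)

# Lags spread evenly over a 25 ms window, standing in for random timing.
unlocked = mean_dw([-12.5 + 0.5 * i for i in range(51)])
```

      With these assumed parameters, consistent pre-before-post timing gives a positive mean weight change, whereas evenly spread lags give a negative one, which mirrors the argument that coactivity alone is not sufficient for potentiation under a depression-dominated rule.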

      We also removed a supplementary figure (Fig. S2).

      (3) We wanted to state clearly and motivate our choice to extend the low theta range to 2-6 Hz and the high theta range to 6-14 Hz, compared to the 3-6 Hz and 6-12 Hz ranges, respectively, in the BLA experimental literature. Our main reason for extending the ranges was that the peaks of low and high theta power in the VIP and SOM cells, respectively (the cells that generate these oscillations), occurred at the borders of the experimental ranges. Thus, in order to include the peaks of the model LFP, we lowered the low theta range by 1 Hz and increased the high theta range by 2 Hz.

      We present a new supplementary figure (Fig. S1) containing the power spectra of the VIP interneuron, which is the source of low theta in our model, and of the SOM interneuron, which is the source of high theta.

      We mention Fig. S1 in the Results section “Rhythms in the BLA can be produced by interneurons”, where we added the following text:

      • “In the baseline condition, the condition without any external input from the fear conditioning paradigm (Fig. 1B, top), our VIP neurons exhibit short bursts of gamma activity (~38 Hz) at low theta frequencies (~2-6 Hz) (peaking at ~3.5 Hz) (see Fig. S1A).”

      • “In our baseline model, SOM cells have a natural frequency of ~12 Hz (Fig. 1B, middle; Fig. S1B), which is at the upper limit of the experimental high theta range; this motivates our choice to extend the high theta range up to 14 Hz in order to include the peak.”

      Knowing the natural frequencies of VIP and SOM interneurons from the Results section “Rhythms in the BLA can be produced by interneurons”, we specified more clearly that we quantify the change of power in the low and high theta ranges around the power peaks in those ranges. Specifically, we changed some sentences in the first paragraph of the Results section “Increased low-theta frequency is a biomarker of fear learning” as follows:

      “We find that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also find that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase when considering the peak of the low theta power, and no significant variation in the high theta power again when considering the peak of the high theta power (Fig. 6 C,D,E).”
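
      The idea of modeling the LFP as a linear sum of currents and then locating the power peak within each theta band can be sketched as follows. This is an illustrative example on synthetic data, not the analysis pipeline of the manuscript: the two sinusoidal "currents" (at 3.5 Hz and 12 Hz, the peak frequencies quoted above), the sampling rate, the frequency grid, and the helper names `power_at` and `peak_in_band` are all assumptions.

```python
import math

FS = 200.0   # sampling rate in Hz (assumed)
N = 800      # number of samples, i.e. 4 s of signal

def tone(freq_hz, amp):
    """Synthetic stand-in for one current: a sinusoid sampled at FS."""
    return [amp * math.sin(2 * math.pi * freq_hz * k / FS) for k in range(N)]

# Hypothetical currents; the model LFP is taken as their sample-wise sum.
currents = [tone(3.5, 1.0), tone(12.0, 0.8)]
lfp = [sum(samples) for samples in zip(*currents)]

def power_at(signal, freq_hz):
    """Power of the signal at one frequency, from a direct Fourier sum."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * k / FS)
             for k, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * k / FS)
             for k, x in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2

def peak_in_band(signal, f_lo, f_hi, df=0.5):
    """Frequency of maximal power on a grid from f_lo to f_hi (inclusive)."""
    steps = int(round((f_hi - f_lo) / df))
    freqs = [f_lo + i * df for i in range(steps + 1)]
    return max(freqs, key=lambda f: power_at(signal, f))

low_peak = peak_in_band(lfp, 2.0, 6.0)     # extended low theta range
high_peak = peak_in_band(lfp, 6.0, 14.0)   # extended high theta range
```

      In this toy signal the high theta peak sits at 12 Hz, the upper border of the 6-12 Hz experimental range, which illustrates why a band that ends exactly at the peak is a poor window for quantifying power around it.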

      Finally, we made a few other small changes:

      In the Introduction, we mention the following: “We also note that there is not uniformity on the exact frequencies associated with low and high theta, e.g., ((Lorétan et al., 2004) used 2-6 Hz for low theta). Here, we use 2-6 Hz for the low theta range and 6-14 Hz for the high theta range.”

      In Fig. 6D,E (see point (3)), we reran the statistics using a smaller interval for high theta (11.5-13 Hz) to focus around the peak. Our initial result, showing a significant change in low theta between pre- and post-fear conditioning and no change in high theta, still holds.

      In Fig. 6 of the Results section “Increased low-theta frequency is a biomarker of fear learning”, we switched the order of panels F and G. This change allows us to first focus on the AMPA currents, which are the major contributors to the low theta power increase, and to specify which AMPA current drives that increase. After that, we present the power spectrum of the GABA currents as well.

      The corresponding text in the Results section now reads as follows:

      “We find that fear conditioning leads to an increase in low theta frequency power of the network spiking activity compared to the pre-conditioned level (Fig. 6 A,B); there is no change in the high theta power. We also find that the LFP, modeled as the linear sum of all the AMPA, GABA, NaP-, D-, and H- currents in the network, similarly reveals a low theta power increase when considering the peak of the low theta power, and no significant variation in the high theta power again when considering the peak of the high theta power (Fig. 6 C,D,E). These results are consistent with the experimental findings in (Davis et al., 2017). Specifically, the newly potentiated AMPA synapse from ECS to F ensures F is active after fear conditioning, thus generating strong currents in the PV cells to which it has strong connections (Fig. 6F). It is the AMPA currents to the PV interneurons that are directly responsible for the low theta increase; it is the newly potentiated ECS to F synapse that paces the AMPA currents in the PV interneurons to go at low theta. Thus, the low theta increase is due to added excitation provided by the new learned pathway.”

      (4) In the Discussion section “Assumptions and predictions of the model”, we specified the following:

      “Our model predicts that blockade of D-current in VIP interneurons (or silencing VIP interneurons) will both diminish low theta and prevent fear learning. Finally, the model assumes the absence of significantly strong connections from the excitatory projection cells ECS to PV interneurons, unlike the ones from F to PV. Including those synapses would alter the PING rhythm created by the interactions between F and PV, which is crucial for fine timing between ECS and F needed for LTP.”

      (5) Finally, to broaden the potential interest of our study, we added the following sentences:

      At the conclusion of the abstract:

      “The model makes use of interneurons commonly found in the cortex and, hence, may apply to a wide variety of associative learning situations.”

      At the conclusion of the introduction:

      “Finally, we note that the ideas in the model may apply very generally to associative learning in the cortex, which contains similar subcircuits of pyramidal cells and interneurons: PV, SOM and VIP cells.” 

      Also, changes in the emphasis of the paper led us to remove the following from the abstract: “Finally, we discuss how the peptide released by the VIP cell may alter the dynamics of plasticity to support the necessary fine timing.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The manuscript could be improved by addressing the following issues.

      (1) Fig. 3: The analgesic effects after astrocyte ablation appear to recover after one week. Is this due to repopulation of astrocytes?

      Although we did not detect proliferation of astrocytes, we hypothesize that the recovery is likely related to microglial phagocytosis of astrocyte debris after astrocyte ablation. Microglia are known to phagocytose cell debris. Diphtheria toxin-mediated cell ablation caused death and fragmentation of the astrocytes labeled by AAV2/5-GfaABC1D-Cre. We hypothesize that microglia phagocytosed the astrocyte fragments and were thereby stimulated to activate type I interferon signaling. Once phagocytosis of the debris ended, type I interferon signaling also declined, and this reduced signaling may have been accompanied by the recurrence of pain.

      (2) Fig. 3: Please justify the large sample size of n=30-36. Is this sample size based on previous studies or statistical estimation?

      The number of mice was based on our previous report [1], and the larger number of mice also helps ensure that the pain data are reliable. In addition to exploring differences between the sexes, we needed to obtain samples at different time points for different experiments.

      (3) Please try to plot individual data points for some critical time points to demonstrate data distribution. It is also helpful to plot male and female data points separately for some time points.

      Individual data points have been plotted as you requested and added to the supplementary material.

      (4) It is unclear if the same number of males and females were used in this study, as females were typically used for SCI studies. I wonder if you can use repeated measures Two-Way ANOVA for statistical analysis.

      The numbers of males and females were not the same, but both were sufficient for statistical analysis. In addition, breeding transgenic mice yields both male and female mice, and using both makes rational use of the animals. Indeed, previous studies have shown that female mice are more commonly used in pain studies. Although we did not observe a sex difference in this study, previous studies have reported that sex is one of the factors underlying differences in pain. Following your suggestion, we adopted two-way ANOVA for the statistical analysis and updated the statistical methods accordingly; because the results were consistent with the previous ones, we did not modify the statistics shown in the figures.

      (5) Fig. 3C, D: The effects of astrocyte ablation on mechanical pain are mild, compared to thermal pain. Electronic von Frey apparatus may be difficult for mice. It works very well for rats and large animals.

      Since the animals in this study were all mice, we cannot comment on how the electronic von Frey apparatus performs in rats and large animals. In our experience, however, the electronic von Frey apparatus is very suitable for mouse experiments. In particular, our apparatus can achieve an accuracy of 0.01 g, which allows us to detect subtle changes in pain sensitivity and may be more accurate and intuitive than previous approaches.

      (6) Fig. 2B: In the figure legend it states n = 3 biological repeats. There are many more dots in each column. Are these individual animals or spinal cord sections?

      As described in our Methods, n = 3 biological repeats means three biological repeats per group, i.e., three mice per group with three IF per mouse. We took three or more values in each ascending tract (depending on the size of each ascending tract in the lumbar enlargement). This yields the larger number of data points shown in Figure 2, which also makes the results more reliable.

      (7) Fig. 4C: It appears that GFAP is increased by toxin treatment. Please explain this result.

      This figure quantifies astrocyte activation in the lesion area (T9-10), not in the lumbar enlargement.

      Reviewer #2 (Recommendations For The Authors):

      Specific Comments:

      RNA-Sequencing Analysis: The strength of the RNA-sequencing data in elucidating the impact of astrocyte elimination is compelling. While the focus on IFN signaling is well-supported, the manuscript overlooks other differentially expressed genes. A deeper analysis or at least a discussion of these genes could enrich the study's conclusions, offering a more holistic view of the underlying mechanisms.

      We did not examine the other differentially expressed genes in depth. Instead, we focused on the most significantly differentially expressed genes, because these have a more pronounced effect on pain.

      Q2: Figure Presentation: Consolidating Figures 1-3 could increase the clarity of the result presentation, reducing distractions from the main narrative. Certain aspects, such as the comparison of different tracts in Figure 2B and the body weight data in Figure 3C, seem tangential and might be better suited for supplementary materials.

      The comparison of astrocyte activation in different ascending tracts of the lumbar enlargement explains the relationship between astrocyte activation and pain, and lays the foundation for the subsequent astrocyte elimination. The weight data are also important, reflecting not only the changes during overall recovery after spinal cord injury, but also the effect of astrocyte elimination on the overall condition of the mice. Thus, the weight data together with the pain test results make it more intuitive for the reader to understand how the overall condition of the mice changes after astrocyte elimination.

      Q3: Schematic Clarity: The schematic in Figure 1A is confusing, particularly in distinguishing between transgenic mice and viral constructs. The inconsistent naming of Cre recombinase (alternatively referred to as Cre, CRE, and sometimes DRE) further complicates understanding. Standardizing these elements would greatly enhance clarity for the readers.

      As described in the Methods, Gt(ROSA)26Sorem1(CAG-LSL-RSR-tdTomato-2A-DTR)Smoc mice contain both a LoxP-stop-LoxP sequence and a Rox-stop-Rox sequence. Crossing these mice with C57BL/6JSmoc-Tg(CAG-Dre)Smoc mice removes the Rox-stop-Rox sequence. The offspring can then be crossed with mice expressing Cre recombinase, or treated with AAV2/5-GfaABC1D-Cre, to remove the LoxP-stop-LoxP sequence and induce the expression of tdTomato and DTR.

      Q4: Pathway Analysis: The discussion of the signal pathway analysis in Figure 8 leans heavily on speculation without direct evidence from the study. Distinguishing clearly between findings and literature-derived hypotheses is crucial. A more detailed discussion that properly cites sources for each pathway element would strengthen the manuscript.

      In response to your comment, we have added this figure to the supplementary figures.

      Q5: Statistical Analysis: The use of one-way ANOVA, despite presenting data in groups, is misaligned with the data's structure. Employing two-way ANOVA followed by post-hoc comparisons is appropriate for statistical analysis.

      Following your suggestion, we adopted two-way ANOVA for the statistical analysis and updated the statistical methods section accordingly. The results were consistent with the previous ones, so we did not modify the statistical results shown in the figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      We appreciate the valuable and constructive comments of Reviewer #1 on our manuscript. We have addressed the public review comments from Reviewer #1 in our response to the recommendations for the authors, as the public review comments largely overlap with those recommendations.

      Reviewer #1 (Recommendations For The Authors):

      (1.1) Figure 1 did not use a mock-infected control for the development of R-loops but only a time before infection. I think it would have been a good control to have that after the same time of infection non-infected cells did not show increases in R-loops and this is not a product of the cell cycle.

      We prepared our DRIPc-seq library using cell extracts harvested at 0, 3, 6, and 12 h post-infection (hpi), all at the same post-seeding time point. Each sample was infected with HIV-1 virus in a time-dependent manner. Therefore, it is unlikely that the host cellular R-loop induction observed in our DRIPc-seq results was due to R-loop formation during the cell cycle. In Lines 93–95 of the Results section of the revised manuscript, we have provided a more detailed description of our DRIPc-seq library experimental scheme. Thank you. 

      (1.2) Figure 2 should have included a figure showing the proportion of DRIPc-seq peaks located in different genome features relative to one another instead of whether they were influenced by time post-infection. Figure 2C was performed in HeLa cells, but primary T cell data would have been more relevant as primary CD4+ T cells are more relevant to HIV infection.

      We have included a new figure presenting the relative proportion of DRIPc-seq peaks mapped to different genomic features at each hpi (Fig. 2C of the revised manuscript). We found that the proportion of DRIPc-seq peaks mapped to various genomic compartments remained consistent over the hours following the HIV-1 infection. This further supports our original claim that HIV-1 infection does not induce R-loop enrichment at specific genomic features but that the accumulation of R-loops after HIV-1 infection is widely distributed.

      We considered HeLa cells the primary in vitro infection model; therefore, we conducted RNA-seq only on HeLa cells. However, we agree with the reviewer's opinion that data from primary CD4+ T cells may be more physiologically relevant. Nevertheless, as demonstrated in the new figure (Fig. 2C of the revised manuscript), HIV-1 infection did not significantly alter the proportion of R-loop peaks mapped to specific genomic compartments, such as gene body regions, in HeLa, primary CD4+ T, and Jurkat cells. Therefore, we anticipate no clear correlation between changes in gene expression levels and R-loop peak detection upon HIV-1 infection, even in primary T cells. Thank you.

      (1.3) Figure 5G is very hard to see when printed, is there a change in brightness or contrast that could be used? The arrows are helpful but they don't seem to be pointing to much.

      We have highlighted the intensity of the PLA foci and magnified the images in Fig. 5G in the revised manuscript. While editing the images according to your suggestion, we found a misannotation regarding the multiplicity of infection in the number of PLA foci per nucleus quantification analysis graph in Fig. 5G of the original manuscript. We have corrected this issue and hope that it is now much clearer. 

      (1.4) The introduction provided a good background for those who may not have a comprehensive understanding of DNA-RNA hybrids and R-loops, but the rationale that integration in non-expressed sequence implies that R-loops may be involved is very weak and was not addressed experimentally. A better rationale would have been to point out that, although integration in genes is strongly associated with gene expression, the association is not perfect, particularly in that some highly expressed genes are, nonetheless, poor integration targets.

      In accordance with the reviewer's comment, we revised the Introduction. We have deleted the statement and reference in the introduction "... the most favored region of HIV-1 integration is an intergenic locus, ...”, which may overstate the relevance of the R-loop in HIV-1 integration events in non-expressed sequences. Instead, we introduced a more recent finding that high levels of gene expression do not always predict high levels of integration, together with the corresponding citation (Lines 46– 47 of the revised manuscript), according to the reviewer’s suggestion in the reviewer's public review 2)-(a).

      (1.5) The discussion was seriously lacking in connecting their conclusions regarding R-loop targeting of integration to how integration works at the structural level, where it is very clear that concerted integration on the two DNA strands ca 5 bp apart is essential to correct, 2-ended integration. It is very difficult to visualize how this would be possible with the triple-stranded R-loop as a target. The manuscript would be greatly strengthened by an experiment showing concerted integration into a triple-stranded structure in vitro using PICs or pure integrase.

      We believe that some misleading statements in our original manuscript led to a misunderstanding of our interpretation of the putative role of R-loop structures in HIV-1 integration site selection. Based primarily on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins and lead to HIV-1 viral genome integration into regions in the vicinity of host genomic R-loops. On carefully revising our manuscript, we found that the title, abstract, and discussion of our original manuscript include phrases, such as “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loops in HIV-1 integration site selection. We replaced these phrases. For example, we used phrases, such as, “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the inconvenience caused by the unclear and nonspecific details of our findings.

      Using multiple biochemical experiments, we successfully demonstrated the interaction between the cellular R-loop and HIV-1 integrase proteins in cells and in vitro (Fig. 5 of the revised manuscript). However, we could not validate whether the center of the triple-stranded R-loops is the exact site of HIV-1 integration, where the strand transfer reaction by integrase occurs. This is because an R-loop can be multi-kilobase in size (1, 2); therefore, we displayed a large-scale genomic region (30-kb windows) to present the integration sites surrounding the R-loop centers. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity upon R-loop induction in designated regions following DOX treatment (Fig. 3C and 3D of the revised manuscript). In addition, we quantified site-specific integration events in R-loop regions, and found that a greater number of integration events occurred in designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript).

      We agree with the reviewer that an experiment showing the concerted integration of purified PICs into a triple-stranded structure in vitro would greatly strengthen our manuscript. We attempted the purification of viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S) procured from the NIH HIV reagent program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we could not purify the nucleic acid-bound protein complexes for in vitro integration assays. However, we believe that pgR-poor and pgR-rich cell line models provide a strong advantage in specificity of our primer readouts. Compounded with our in cellulo observation, we believe that our work provides strong evidence for a causative relationship between R-loop formation/R-loop sites and HIV-1 integration.

      Additionally, in the Discussion section of the revised manuscript, we have expanded our discussion of how genomic R-loops contribute to molding the host genomic environment for HIV-1 integration site selection, and of a potential explanation for how R-loops drive integration over long-range genomic regions. Thank you.

      (1.6) There are serious concerns with the quantitation of integration sites used here, which should be described in detail following line 503 but isn't. In Figure 3, E-G, they are apparently shown as reads per million, while in Figure 4B as "sites (%)" and in 4C as "log10 integration frequency." Assuming the authors mean what they say, they are using the worst possible method for quantitation. Counting reads from restriction enzyme-digested, PCR-digested DNA can only mislead. At the numbers provided (MOI 0.6, 10 µg DNA assayed) there would be about 1 million proviruses in the samples assayed, so the probability of any specific site being used more than once is very low, and even less when one considers that a 10% assay efficiency is typical of integration site assays. Although the authors may obtain millions of reads per experiment, the number of reads per site is an irrelevant value, determined only by technical artefacts in the PCR reactions, most significantly the length of the amplicons, a function of the distance from the integration site to the nearest MstII site, further modified by differences in Tm. Better is to collapse identical reads to 1 per site, as may have been done in Figure 4B, however, the efficiency of integration site detection will still be inversely related to the length of the amplicon. Indeed, if the authors were to plot the read frequency against distance to the nearest MstII site, it is likely that they would get plots much like those in Figure 4B.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Lines 621–631 of the revised manuscript). We primarily followed the HIV-1 integration site sequencing data processing methods previously described by Li et al., mBio, 2020; 11(5) (4).

      While it may be correct that the HIV-1 integration event cannot occur more than once at a given site, our Fig. 3E, 4C, and 4D of the revised manuscript present the number of integration-site sequencing read counts expressed in reads-per-million (RPM) units or as log10-normalized values. Based on the number of mapped reads from the integration site sequencing results, we can infer that there was an integration event at this site, whether it was a single or multiple event.

      We believe that the original y-axis annotation, “Integration frequency,” may be misleading, as it can be interpreted as the probability of any specific site being used for HIV-1 integration. Therefore, we changed it to “number of mapped read” for clarity (Fig. 3E–G, 4C and 4D, and the corresponding figure legends of the revised manuscript). We apologize for any confusion. Thank you.
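
      The two ways of counting discussed in this exchange (read counts normalized to reads per million, optionally log10-transformed, versus identical reads collapsed to one per site) can be sketched as follows. This is a minimal illustration only: the site tuples, the function names, and the pseudocount are hypothetical, and it does not reproduce the pipeline of Li et al. that the response cites.

```python
import math
from collections import Counter


def reads_per_million(site_counts):
    """Scale per-site read counts so that they sum to one million."""
    total = sum(site_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return {site: count * 1_000_000 / total for site, count in site_counts.items()}


def log10_values(values, pseudocount=1.0):
    """log10-transform values; the pseudocount (an assumption) avoids log10(0)."""
    return {site: math.log10(v + pseudocount) for site, v in values.items()}


def unique_sites(reads):
    """Collapse identical reads so that each site is counted once."""
    return set(reads)


# Hypothetical mapped reads, each reduced to (chromosome, position, strand).
reads = [("chr1", 1000, "+")] * 3 + [("chr2", 500, "-")]
counts = Counter(reads)
rpm = reads_per_million(counts)
sites = unique_sites(reads)
```

      In this toy input the first site contributes three of four reads but only one of two unique sites, which is the distinction between read abundance and integration events raised in comment (1.6).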

      Other points:

      (1.7) Overall: There are numerous grammatical and usage errors, especially in agreement of subject and verb, and missing articles, sometimes multiple times in the same sentence. These must be corrected prior to resubmission.

      The revised manuscript was edited by a professional editing service. Thank you.

      (1.8) Line 126-134: A striking result, but it needs more controls, as discussed above, including a dose-response analysis.

      We determined the doses of the NVP and RAL inhibitors in HeLa cells by identifying the minimum dose of each drug that sufficiently inhibited HIV-1 infection (Author response image 1). The primary objective of this experiment was to assess R-loop formation while reverse transcription or integration in the HIV-1 life cycle was blocked; therefore, we do not think that a dose-response analysis of the inhibitors is required.

      Author response image 1.

      (A and B) Representative flow cytometry histograms of VSV-G-pseudotyped HIV-1-EGFP-infected HeLa cells at an MOI of 1, harvested at 48 hpi. The cells were treated with DMSO, the indicated doses of nevirapine (NVP) (A) or indicated doses of raltegravir (RAL) (B) for 24 h before infection. 

      (1.9) Line 183: Please tell us what ECFP is and why it was chosen. Is there a reference for its failure to form R-loops?

      Ibid: The human AIRN gene is a very poor target for HIV integration in PBMC.

      A high GC skew value (> 0) is a predisposing factor for R-loop formation at the transcription site. This is because a high GC skew causes a newly synthesized RNA strand to hybridize to the template DNA strand, and the non-template DNA strand remains looped out in a single-stranded conformation (5) (Ref 36 in the revised manuscript). The ECFP sequence has a low GC skew value and has previously been used as a negative control sequence for R-loop formation (6) (Ref 17 of the revised manuscript). We have added this description and the corresponding references to Lines 188–192 of the revised manuscript.

      The human AIRN gene (RefSeq DNA sequence: NC_000006.12) sequence possesses a GC skew value of -0.04, in a window centered at base 2186, while the mouse AIRN (mAIRN) sequence is characterized by a GC skew value of 0.213. The ECFP sequence gave a GC skew value of -0.086 in our calculation. We anticipated that the human AIRN gene region would not form a stable R-loop, and indeed, it did not show R-loop enrichment upon HIV-1 infection in our DRIPc-seq data analysis of multiple cell types (Author response image 2).
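
      For readers unfamiliar with the metric, GC skew is conventionally computed as (G − C)/(G + C) over a sequence or a window of it. The sketch below shows that standard formula. The window size, the step, and the strand used for the values quoted above are not stated in the response, so the function names and parameters here are illustrative assumptions.

```python
def gc_skew(seq):
    """Return (G - C) / (G + C) for a DNA sequence; 0.0 if it has no G or C."""
    seq = seq.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(seq, window, step):
    """GC skew of each full-length window, returned as (start index, skew) pairs."""
    return [
        (start, gc_skew(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: a G-rich half followed by a C-rich half.
profile = windowed_gc_skew("GGGGCCCC", window=4, step=4)
```

      A positive value means G outnumbers C on the strand analyzed, so the sign depends on which strand (template or non-template) is supplied.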

      Author response image 2.

      Genome browser screenshot over the chromosomal regions in 20-kb windows centered on human AIRN showing results from DRIPc-seq in the indicated HIV-1-infected cells (blue, 0 hpi; yellow, 3 hpi; green, 6 hpi; red, 12 hpi).

      (1.10) Line 190: You haven't shown dependence. Associated is a better word.

      Thank you for the suggestion. Following the reviewer’s suggestion, we have changed “R-loop-dependent site-specific HIV-1 integration events...” to “R-loop-associated site-specific HIV-1 integration events...” (Line 198 of the revised manuscript).

      (1.11) Line 239: What happened to P1? What is the relationship of the P and N regions to genes?

      We have added superimpositions of the P1 chromatin region on DRIPc-seq and the HIV-1 integration frequency to Figure 4C of the revised manuscript. We observed a relevant integration event within the P1 R-loop region, but to a lesser extent than in the P2 and P3 R-loop regions, perhaps because the P1 region has relatively less R-loop enrichment than the P2 and P3 regions, as examined by DRIP-qPCR in S3A Fig. of the revised manuscript.

      Genome browser screenshots with annotations of accommodating genes in the P and N regions are shown in S2A–E Fig. of the revised manuscript, and RNA-seq analysis of the relative gene expression levels of the P1-3 and N1,2 R-loop regions are shown in S4 Table of the revised manuscript. Thank you.

      (1.12) Line 261: But the binding affinity of integrase to the R-loop is somewhat weaker than to double-stranded DNA according to Figure 5A.

      Nucleic acid substrates were loaded at the same molarity, and the percentage of the unbound fraction was calculated by dividing the intensity of the unbound fraction in each lane by the intensity of the unbound fraction in the lane with 0 nM integrase in the binding reaction. The calculated percentages of the unbound fraction from three independent replicate experiments are shown in Fig. 5A, right of the revised manuscript. In our analysis and measurements, the integrase proteins showed higher binding affinities to the R-loop and R-loop comprising nucleic acid structures than to dsDNA in vitro. We hope that this explanation clarifies this point. 
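
      The normalization described above can be written out as a short sketch: each lane's unbound-band intensity is divided by the intensity of the 0 nM integrase lane and expressed as a percentage. The intensity values and function name below are made up for illustration, and averaging across the three replicates is omitted.

```python
def percent_unbound(intensity_by_conc):
    """Express each lane's unbound-band intensity as a percentage of the 0 nM lane.

    intensity_by_conc maps integrase concentration (nM) to band intensity.
    """
    reference = intensity_by_conc[0]
    if reference <= 0:
        raise ValueError("the 0 nM lane must have a positive intensity")
    return {
        conc: 100.0 * intensity / reference
        for conc, intensity in intensity_by_conc.items()
    }


# Hypothetical band intensities (arbitrary units) for one replicate.
replicate = {0: 200.0, 100: 150.0, 400: 50.0}
unbound = percent_unbound(replicate)
```

      Under this normalization, a substrate whose unbound percentage falls faster with increasing integrase concentration is the one bound with higher apparent affinity.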

      (1.13) Line 337: "accumulate". This is a not uncommon misinterpretation of the results of studies on the distribution of intact proviruses in elite controllers. The only possible correct interpretation of the finding is that proviruses form everywhere else but cells containing them are eliminated, most likely by the immune system.

      Thank you for the suggestion. We have changed Line 337 of the original manuscript to “... HIV-1 proviruses in heterochromatic regions are not eliminated but selected by immune system,” in Lines 361-363 of the revised manuscript.

      (1.14) Line 371 How many virus particles per cell does this inoculum amount to?

      We determined the amount of GFP reporter virus required to transduce ∼50% of WT Jurkat T cells, corresponding to an approximate MOI of 0.6. We repeatedly obtained 30–50% VSV-G-pseudotyped HIV-1-EGFP-positive cells when constructing the HIV-1 integration site sequencing libraries for Jurkat T cells.
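
      The relationship between the fraction of reporter-positive cells and MOI is often approximated with a Poisson model, in which the fraction of cells receiving at least one infection event is 1 − exp(−MOI). The response does not say how the MOI was derived, so the model and the function names below are assumptions; the sketch only illustrates that conversion.

```python
import math


def fraction_infected(moi):
    """Fraction of cells with at least one infection event under a Poisson model."""
    return 1.0 - math.exp(-moi)


def moi_from_fraction(fraction):
    """Invert the Poisson model: MOI implied by an observed infected fraction."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -math.log(1.0 - fraction)


at_moi_0_6 = fraction_infected(0.6)
moi_range = (moi_from_fraction(0.30), moi_from_fraction(0.50))
```

      Under this assumption an MOI of 0.6 corresponds to about 45% infected cells, and the reported 30–50% range corresponds to an MOI of roughly 0.36 to 0.69.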

      (1.15) Line 503 and Figures 3 and 4: There must be a clear description of how integration events are quantitated.

      Detailed methods for integration site sequencing data processing are described in the Materials and Methods section of the revised manuscript (Lines 621–631 of the revised manuscript). We primarily followed the HIV-1 integration site sequencing data processing methods previously described in Li et al., mBio, 2020; 11(5) (4).

      Reviewer #2 (Public Review):

      Retroviral integration in general, and HIV integration in particular, takes place in dsDNA, not in R-loops. Although HIV integration can occur in vitro on naked dsDNA, there is good evidence that, in an infected cell, integration occurs on DNA that is associated with nucleosomes. This review will be presented in two parts. First, a summary will be provided giving some of the reasons to be confident that integration occurs on dsDNA on nucleosomes. The second part will point out some of the obvious problems with the experimental data that are presented in the manuscript.

      We appreciate your comments. We have carefully addressed the concerns expressed as follows (your comments are in italics):  

      (2.1) 2017 Dos Passos Science paper describes the structure of the HIV intasome. The structure makes it clear that the target for integration is dsDNA, not an R-loop, and there are very good reasons to think that structure is physiologically relevant. For example, there is data from the Cherepanov, Engelman, and Lyumkis labs to show that the HIV intasome is quite similar in its overall structure and organization to the structures of the intasomes of other retroviruses. Importantly, these structures explain the way integration creates a small duplication of the host sequences at the integration site. How do the authors propose that an R-loop can replace the dsDNA that was seen in these intasome structures?

      We do appreciate the current understanding of the HIV-1 integration site selection mechanism and the known structure of the dsDNA-bound intasome. Our study proposes an R-loop as another contributor to HIV-1 integration site selection. Recent studies providing new perspectives on HIV-1 integration site targeting motivated our current work. For instance, Ajoge et al., 2022 (7) indicated that a guanine-quadruplex (G4) structure formed in the non-template DNA strand of the R-loop influences HIV-1 integration site targeting. Additionally, I. K. Jozwik et al., 2022 (8) showed retroviral integrase protein structure bound to B-to-A transition in target DNA. R-loop structures are a prevalent class of alternative non-B DNA structures (9). We acknowledge the current understanding of HIV-1 integration site selection and explore how R-loop interactions may contribute to this knowledge in the Discussion section of our manuscript. 

      Primarily based on our current data, we believe that R-loop structures are bound by HIV-1 integrase proteins and lead to HIV-1 viral genome integration into regions in the vicinity of host genomic R-loops, but we do not claim that R-loops completely replace dsDNA as the target for HIV-1 integration. An R-loop can be multi-kilobase in size, and the R-loop peak length varies widely depending on the immunoprecipitation and library construction methods (1, 2); therefore, we could not validate whether the center of triple-stranded R-loops is the exact site of HIV-1 integration where the strand transfer reaction by integrase occurs. Accordingly, we replaced phrases such as, “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loops in HIV-1 integration site selection, with phrases, such as, “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the inconvenience caused by the unclear and non-specific details of our findings. Nevertheless, we believe that we validated R-loop-mediated HIV-1 integration in R-loop-forming regions using our pgR-poor and pgR-rich cell line models. We quantified site-specific integration events in the R-loop regions, and found that a greater number of integration events occurred in designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E–G of the revised manuscript).

      dsDNA may have been the sole intasome target demonstrated in vitro, possibly because only dsDNA has been considered as a substrate for in vitro intasome assembly. We hope that our work will initiate and advance future investigations on target-bound intasome structures by considering R-loops as potential new targets for integrase proteins and intasomes.

      (2.2) As noted above, concerted (two-ended) integration can occur in vitro on a naked dsDNA substrate. However, there is compelling evidence that, in cells, integration preferentially occurs on nucleosomes. Nucleosomes are not found in R loops. In an infected cell, the viral RNA genome of HIV is converted into DNA within the capsid/core which transits the nuclear pore before reverse transcription has been completed. Integration requires the uncoating of the capsid/core, which is linked to the completion of viral DNA synthesis in the nucleus. Two host factors are known to strongly influence integration site selection, CPSF6 and LEDGF. CPSF6 is involved in helping the capsid/core transit the nuclear pore and associate with nuclear speckles. LEDGF is involved in helping the preintegration complex (PIC) find an integration site after it has been released from the capsid/core, most commonly in the bodies of highly expressed genes. In the absence of an interaction of CPSF6 with the core, integration occurs primarily in the lamin-associated domains (LADs). Genes in LADs are usually not expressed or are expressed at low levels. Depending on the cell type, integration in the absence of CPSF6 can be less efficient than normal integration, but that could well be due to a lack of LEDGF (which is associated with expressed genes) in the LADs. In the absence of an interaction of IN with LEDGF (and in cells with low levels of HRP2) integration is less efficient and the obvious preference for integration in highly expressed genes is reduced. Importantly, LEDGF is known to bind histone marks, and will therefore be preferentially associated with nucleosomes, not R-loops. LEDGF fusions, in which the chromatin binding portion of the protein is replaced, can be used to redirect where HIV integrates, and that technique has been used to map the locations of proteins on chromatin. 
      Importantly, LEDGF fusions in which the chromatin binding component of LEDGF is replaced with a module that recognizes specific histone marks direct integration to those marks, confirming integration occurs efficiently on nucleosomes in cells. It is worth noting that it is possible to redirect integration to portions of the host genome that are poorly expressed, which, when taken with the data on integration into LADs (integration in the absence of a CPSF6 interaction) shows that there are circumstances in which there is reasonably efficient integration of HIV DNA in portions of the genome in which there are few if any R-loops.

      Although R-loops may not wrap around nucleosomes, long and stable R-loops likely cover stretches of DNA corresponding to multiple nucleosomes (10). For example, R-loops are associated with high levels of histone marks, such as H3K36me3, which LEDGF recognizes (2, 11). R-loops dynamically regulate the chromatin architecture. Possibly by altering nucleosome occupancy, positioning, or turnover, R-loop structures relieve superhelical stress and are often associated with open chromatin marks and active enhancers (2, 10). These features are also distributed over HIV-1 integration sites (12). In the Discussion section of the revised manuscript, we explored the R-loop molding mechanisms in the host genomic environment for HIV-1 integration site selection and its potential collaborative role with LEDGF/p75 and CPSF6 governing HIV-1 integration site selection. 

      On carefully revising our original manuscript in light of the reviewer's comment, we recognized the need to tone down our statements. We found that the title, abstract, and discussion of our original manuscript include phrases, such as, “HIV-1 targets R-loops for integration,” which may overstate our finding on the role of R-loops in HIV-1 integration site selection. We replaced these phrases. For example, we used phrases, such as “HIV-1 favors vicinity regions of R-loop for the viral genome integration,” in the revised manuscript. We apologize for the inconvenience caused by the unclear and non-specific details of our findings.

      (2.3) Given that HIV DNA is known to preferentially integrate into expressed genes and that R-loops must necessarily involve expressed RNA, it is not surprising that there is a correlation between HIV integration and regions of the genome to which R loops have been mapped. However, it is important to remember that correlation does not necessarily imply causation.

      We understand the reviewer's concern regarding the possibility of a coincidental correlation between the R-loop regions and HIV-1 integration sites, particularly when the interpretation of this correlation is primarily based on a global analysis. 

      Therefore, we designed pgR-poor and pgR-rich cell lines, which we believe are suitable models for distinguishing between integration events driven by transcription and the presence of R-loops. Although the two cell lines showed comparable levels of transcription at the designated region upon DOX treatment via TRE promoter activation (Fig. 3B of the revised manuscript), only pgR-rich cells formed R-loops at the designated regions (Fig. 3C of the revised manuscript). When infected with HIV-1, pgR-rich cells, but not pgR-poor cells, showed higher infectivity after DOX treatment (Fig. 3D of the revised manuscript). Moreover, we quantified site-specific integration events in the R-loop regions, and found that a greater number of integration events occurred in designated regions of the pgR-rich cellular genome upon R-loop induction by DOX treatment, but not in pgR-poor cells (Fig. 3E of the revised manuscript). Therefore, we concluded that transcriptional activation without an R-loop (in pgR-poor cells) may not be sufficient to drive HIV-1 integration. We believe that our work provides strong evidence for a causative relationship between R-loop formation/R-loop sites and HIV-1 integration. We hope that our explanation addresses your concerns. Thank you.

      If we consider some of the problems in the experiments that are described in the manuscript:

      (2.4) In an infected individual, cells are almost always infected by a single virion and the infecting virion is not accompanied by large numbers of damaged or defective virions. This is a key consideration: the claim that infection by HIV affects R-loop formation in cells was done with a VSVg vector in experiments in which there appears to have been about 6000 virions per cell. Although most of the virions prepared in vitro are defective in some way, that does not mean that a large fraction of the defective virions cannot fuse with cells. In normal in vivo infections, HIV has evolved in ways that avoid signaling the infected cell of its presence. To cite an example, carrying out reverse transcription in the capsid/core prevents the host cell from detecting (free) viral DNA in the cytoplasm. The fact that the large effect on R-loop formation which the authors report still occurs in infections done in the absence of reverse transcription strengthens the probability that the effects are due to the massive amounts of virions present, and perhaps to the presence of VSVg, which is quite toxic. To have physiological relevance, the infections would need to be carried out with virions that contain HIV even under circumstances in which there is at most one virion per cell.

      Our virus production and in vitro and ex vivo HIV-1 infection experimental conditions, designed for infecting cell types such as HeLa cells and primary CD4+ T cells with VSV-G-pseudotyped HIV, were based on a comprehensive review of numerous references. At the very beginning of this study, we tested HIV-1-specific host genomic R-loop induction using empty virion particles (virus-like particles, VLP) or other types of viruses (non-retrovirus, SeV; retroviruses, FMLV and FIV), all produced with a VSV G protein donor. To prevent viral spread in culture, we could not include a control omitting the VSV G protein or using the natural HIV-1 envelope protein. We observed that despite all types of virus stocks being prepared using VSV-G, only cells infected with HIV-1 viruses showed R-loop signal enrichment (Author response image 3). Therefore, we omitted the control for the VSV G protein in subsequent analyses, such as DRIPc-seq. We have also revised our manuscript to provide a clearer description of the experimental conditions. In particular, we now clearly state that we used VSV-G-pseudotyped HIV-1 in this study, throughout the abstract, results, and discussion sections of the revised manuscript. Thank you.

      Author response image 3.

      (A) Dot blot analysis of the R-loop in gDNA extracts from HIV-1 infected U2OS cells with MOI of 0.6 harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal). (B) Dot blot analysis of the R-loop in gDNA extracts from HeLa cells infected with 0.3 MOI of indicated viruses. The infected cells were harvested at 6 hpi. The gDNA extracts were incubated with or without RNase H in vitro before membrane loading (anti-S9.6 signal).

      HIV-1 co-infection may also be expected in cell-free HIV-1 infections. However, it was previously suggested that the average number of infection events varies within 1.02 to 1.65 based on a mathematical model that estimates the frequency of multiple infections with the same virus (Figure 4c of Ito et al., Sci. Rep, 2017; 6559) (13). 
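
      As a rough illustration of why the expected number of infection events per infected cell stays close to one at low MOI, the following is a minimal sketch that assumes infection events per cell follow a Poisson distribution with mean equal to the MOI. This is a simplifying assumption made here for illustration only; it is not the model of Ito et al. (13), and the function names are hypothetical. It treats MOI as infectious events per cell, not as the number of physical particles per cell.

```python
import math


def fraction_infected(moi: float) -> float:
    """P(at least one infection event) for a Poisson model with mean `moi`."""
    return 1.0 - math.exp(-moi)


def mean_events_per_infected_cell(moi: float) -> float:
    """Mean of a zero-truncated Poisson: E[k | k >= 1] = moi / (1 - exp(-moi))."""
    if moi <= 0:
        raise ValueError("moi must be positive")
    return moi / fraction_infected(moi)


# MOI values quoted in this response (0.3, 0.6 and 2); illustrative only.
for moi in (0.3, 0.6, 2.0):
    print(
        f"MOI {moi}: fraction infected = {fraction_infected(moi):.3f}, "
        f"mean events per infected cell = {mean_events_per_infected_cell(moi):.3f}"
    )
```

      Under this assumption the mean number of events per infected cell depends only on the MOI, so it says nothing about non-infectious particles that may still fuse with cells.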

      (2.5) Using the Sso7d version of HIV IN in the in vitro binding assays raises some questions, but that is not the real question/problem. The real problem is that the important question is not what/how HIV IN protein binds to, but where/how an intasome binds. An intasome is formed from a combination of IN bound to the ends of viral DNA. In the absence of viral DNA ends, IN does not have the same structure/organization as it has in an intasome. Moreover, HIV IN (even Sso7d, which was modified to improve its behavior) is notoriously sticky and hard to work with. If viral DNA had been included in the experiment, intasomes would need to be prepared and purified for a proper binding experiment. To make matters worse, there are multiple forms of multimeric HIV IN and it is not clear how many HIV INs are present in the PICs that actually carry out integration in an infected cell.

      As the reviewer has noted, HIV IN, even with Sso7d tagging, is difficult. We attempted the purification of viral DNA (vDNA)-bound PICs using either Sso7d-tagged HIV-1 integrase proteins or non-tagged HIV-1 integrase proteins (F185K/C280S), procured from the NIH HIV reagent program (HRP-20203), following the method described by Passos et al., Science, 2017; 355 (89-92) (3). Despite multiple attempts, we were unable to purify the vDNA-bound IN protein complexes for in vitro assays. However, through multiple biochemical experiments, we believe that we have successfully demonstrated the interaction between cellular R-loops and HIV-1 integrase proteins both in cells and in vitro (Fig. 5A–F of the revised manuscript). We also observed a close association between integrase proteins and host cellular R-loops in HIV-1-infected cells, using a fluorescent recombinant virus (HIV-IN-EGFP) with intact IN-EGFP PICs (Fig. 5G of the revised manuscript).

      (2.6) As an extension of comment 2, the proper association of an HIV intasome/PIC with the host genome requires LEDGF and the appropriate nucleic acid targets need to be chromatinized.

      The interaction between cellular R-loops and HIV-1 integrase proteins in HeLa cells endogenously expressing LEDGF/p75 was examined using reciprocal immunoprecipitation assays in Fig. 5C–F and Figs. S6B and S6D of the revised manuscript. In addition, as discussed in more detail in our response to comment [2-8], we observed a close association between host cellular R-loops and HIV-1 integrase proteins by PLA assay in HIV-1-infected HeLa cells.

      (2.7) Expressing any form of IN, by itself, in cells to look for what IN associates with is not a valid experiment. A major factor that helps to determine both where integration takes place and the sites chosen for integration is the transport of the viral DNA and IN into the nucleus in the capsid core. However, even if we ignore that important part of the problem, the IN that the authors expressed in HeLa cells won't be bound to the viral DNA ends (see comment 2), even if the fusion protein would be able to form an intasome. As such, the IN that is expressed free in cells will not form a proper intasome/PIC and cannot be expected to bind where/how an intasome/PIC would bind.

      As discussed in more detail in our response to comment [2-8], we believe that our PLA experiment using the pVpr-IN-EGFP virus, which has previously been examined for virion integrity, as well as the IN-EGFP PICs (14), demonstrated a close association between host cellular R-loops and HIV-1 integrase proteins in HIV-1-infected cells. 

      (2.8) As in comment 1, for the PLA experiments presented in Figure 5 to work, the number of virions used per cell (which differs from the MOI measured by the number of cells that express a viral marker) must have been high, which is likely to have affected the cells and the results of the experiment. However, there is the additional question of whether the IN-GFP fusion is functional. The fact that the functional intasome is a complex multimer suggests that this could be a problem. There is an additional problem, even if IN-GFP is fully functional. During a normal infection, the capsid core will have delivered copies of IN (and, in the experiments reported here, the IN-GFP fusion) into the nucleus that are not part of the intasome. These "free" copies of IN (here IN-GFP) are not likely to go to the same sites as an intasome, making this experiment problematic (comment 4).

      The HIV-IN-EGFP virus stock was produced by polyethylenimine-mediated transfection of HEK293T cells with 6 µg of pVpr-IN-EGFP, 6 µg of HIV-1 NL4-3 noninfectious molecular clone (pD64E; NIH AIDS Reagent Program 10180), and 1 µg of pVSV-G, as previously described (14) and as detailed in the Materials and Methods section of our manuscript. The pVpr-IN-EGFP vector used to produce the HIV-IN-EGFP virus stock was provided by the Anna Cereseto group (Albanese et al., PLOS ONE, 2008; 3(6); Ref 34 of the revised manuscript). It was previously reported that the HIV-IN-EGFP virions produced by IN-EGFP trans-incorporation through Vpr are intact and infective viral particles (Figure 1 of Albanese et al., PLOS ONE, 2008; 3(6)). Therefore, we believe that the HIV-IN-EGFP used in our PLA experiments was functional.

      Additionally, Albanese et al. showed that the EGFP signal of HIV-IN-EGFP virions colocalizes with the viral matrix (p17MA) and capsid (p24CA) proteins, as well as with newly synthesized cDNA produced by reverse transcriptase, which they labeled and visualized (14). In addition, the fluorescent recombinant virus (HIV-IN-EGFP) was structurally intact at the nuclear level (Figure 6 of Albanese et al., PLOS ONE, 2008; 3(6)). Therefore, given the integrity of the HIV-IN-EGFP virions and of the IN-EGFP PICs, we believe that our PLA results are unlikely to be misleading in the way the reviewer is concerned about.

      Furthermore, the in vitro HIV-1 infection setting of our PLA experiments was carefully determined based on multiple studies that performed image-based assays on HIV-1-infected cells. For instance, Albanese et al. infected 4 × 10⁴ cells with viral loads equivalent to 1.5 or 3 µg of HIV-1 p24 for their immunofluorescence analysis (14). We titrated the fluorescent HIV-1 virus stocks by both determining the multiplicity of infection (MOI) and quantifying the HIV-1 p24 antigen content (Author response image 4). By our calculation, we infected 5 × 10⁴ HeLa cells with viral loads equivalent to 1.3 µg of HIV-1 p24, which is indicated as an MOI of 2 in Fig. 5G of our manuscript, for our PLA experiments.

      Image-based assays often require an increased and enhanced signal for statistical robustness. For example, Achuthan et al. infected cells with VSV-G-pseudotyped HIV-1 at an approximate MOI of 350 for vDNA and PIC visualization (15). Therefore, we believe that our experimental conditions for the PLA experiments, which we carefully designed based on frequently cited previous studies, are reasonable. We hope that our discussion sufficiently addresses the reviewer’s concern.

      Author response image 4.

      Gating strategy used to determine HIV-1-infectivity in HeLa cells at 48 hpi. Cells were infected with a known p24 antigen content in the stock of the VSV-G-pseudotyped HIV-1-EGFP-virus. The percentages of GFP-positive cell population are indicated.

      (2.9) In the Introduction, the authors state that the site of integration affects the probability that the resulting provirus will be expressed. Although this idea is widely believed in the field, the actual data supporting it are, at best, weak. See, for example, the data from the Bushman lab showing that the distribution of integration sites is the same in cells in which the integrated proviruses are, and are not, expressed. However, given what the authors claim in the introduction, they should be more careful in interpreting enzyme expression levels (luciferase) as a measure of integration efficiency in experiments in which they claim proviruses are integrated in different places.

      We thank the reviewer for the constructive comment. We have changed the statement in Lines 41–42 in the Introduction section of our original manuscript to “The chromosomal landscape of HIV-1 integration influences proviral gene expression, persistence of integrated proviruses, and prognosis of antiretroviral therapy.” (Lines 39-41 of the revised manuscript). We believe that this change tones down the link between the site of integration and the provirus expression level.

      The piggyBac transposase randomly inserts the “cargo (transposon)” into TTAA chromosomal sites of the target genome, generating efficient insertions at different genomic loci (16, 17). We believe that this random insertion of the pgR-poor/rich vector mediated by the piggyBac system prevents our analysis of R-loop-mediated HIV-1 integration sites from being confounded by a genomic locus bias of the vector insertion. Therefore, Figure 3 in our manuscript does not claim any relevance between the site of integration and the resulting provirus expression levels. Instead, as noted in Line 214 of the revised manuscript, using the luciferase reporter HIV-1 virus, we attempted to examine HIV-1 infection in cells with an “extra number of R-loops” in the host cellular genome. We observed that pgR-rich cells showed higher luciferase activity upon DOX treatment than pgR-poor cells (Fig. 3D of the revised manuscript). We believe that this is because a greater number of HIV-1 integration events may occur in pgR-rich cells, where DOX-inducible de novo R-loop regions are introduced. This has been further examined in Fig. 3E–G of the revised manuscript. We hope this explanation clarifies Figure 3. Thank you.

      (2.10) Using restriction enzymes to create an integration site library introduces biases that derive from the uneven distribution of the recognition sites for the restriction enzymes.

      As described in the Materials and Methods section, we adopted a sequencing library construction method using a previously established protocol (18, 19). Although we recognize the advantages of DNA fragmentation by sonication, in in vitro or ex vivo HIV-1 infection settings, where the multiplicity of infection is carefully determined based on multiple references, more copies of integrated viral sequences are expected compared to that in samples from infected patients (18). Therefore, in these settings, restriction enzyme-based DNA fragmentation and ligation-mediated PCR sequencing are well-established methods that provide significant data sources for HIV-1 integration site sequencing (15, 20-22). Furthermore, our data showing the proportion of integration sites over R-loop regions (Fig. 4B of the revised manuscript) are presented alongside the respective random controls (i.e., proportion of integration sites within the 30-kb windows centered on randomized DRIPc-seq peaks, gray dotted lines; control comparisons between randomized integration sites with DRIPc-seq peaks, black dotted lines; and randomized integration sites with randomized DRIPc-seq peaks, gray solid lines), which do not show such a correlation between the HIV-1 integration sites and nearby areas of the R-loop regions. Therefore, we believe that our results from the integration site sequencing data analysis are unlikely to be biased.
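
      The comparison against randomized controls described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the manuscript: it assumes toy coordinates on a single linear chromosome, uniform random placement of control peaks, and hypothetical function names (`fraction_in_windows`, `randomized_control`).

```python
import random
from bisect import bisect_left, bisect_right

WINDOW = 30_000  # 30-kb window centered on each peak, as stated in the text


def fraction_in_windows(sites, peak_centers, window=WINDOW):
    """Fraction of integration sites within window/2 of at least one peak center."""
    if not sites:
        return 0.0
    centers = sorted(peak_centers)
    half = window // 2
    hits = 0
    for site in sites:
        # Indices bounding the centers that lie in [site - half, site + half].
        lo = bisect_left(centers, site - half)
        hi = bisect_right(centers, site + half)
        if hi > lo:
            hits += 1
    return hits / len(sites)


def randomized_control(sites, n_peaks, genome_length, n_iter=100, seed=0):
    """Mean fraction obtained when peak centers are placed uniformly at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        centers = [rng.randrange(genome_length) for _ in range(n_peaks)]
        total += fraction_in_windows(sites, centers)
    return total / n_iter


# Toy data (hypothetical coordinates, for illustration only).
toy_sites = [100_000, 500_000, 900_000]
toy_peaks = [105_000, 880_000]
observed = fraction_in_windows(toy_sites, toy_peaks)
control = randomized_control(toy_sites, len(toy_peaks), genome_length=1_000_000)
print(f"observed = {observed:.3f}, randomized control = {control:.3f}")
```

      A real analysis would work per chromosome and would typically preserve peak lengths and mappability when randomizing; those refinements are omitted here.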

      Reviewer #3 (Public Review):

      In this manuscript, Park and colleagues describe a series of experiments that investigate the role of R-loops in HIV-1 genome integration. The authors show that during HIV-1 infection, R-loops levels on the host genome accumulate. Using a synthetic R-loop prone gene construct, they show that HIV-1 integration sites target sites with high R-loop levels. They further show that integration sites on the endogenous host genome are correlated with sites prone to R-loops. Using biochemical approaches, as well as in vivo co-IP and proximity ligation experiments, the authors show that HIV-1 integrase physically interacts with R-loop structures.

      My primary concern with the paper is with the interpretations the authors make about their genome-wide analyses. I think that including some additional analyses of the genome-wide data, as well as some textual changes can help make these interpretations more congruent with what the data demonstrate. Here are a few specific comments and questions:

      We are grateful for the time and effort spent on our behalf and for the reviewer’s appreciation of the novelty of our work, in particular, R-loop induction by HIV-1 infection and the correlation between host R-loops and the genomic site of HIV-1 integration. In the following sections, we provide our responses to your comments and suggestions. Your comments are in italics. We have carefully addressed the following issues.

      (3.1) I think Figure 1 makes a good case for the conclusion that R-loops are more easily detected in HIV-1-infected cells by multiple approaches (all using the S9.6 antibody). The authors show that their signals are RNase H sensitive, which is a critical control. For the DRIPc-Seq, I think including an analysis of biological replicates would greatly strengthen the manuscript. The authors state in the methods that the DRIPc pulldown experiments were done in biological replicates for each condition. Are the increases in DRIPc peaks similar across biological replicates? Are genomic locations of HIV-1-dependent peaks similar across biological replicates? Measuring and reporting the biological variation between replicate experiments is crucial for making conclusions about increases in R-loop peak frequency. This is partially alleviated by the locus-specific data in Figure S3A. However, a better understanding of how the genome-wide data varies across biological replicates will greatly enhance the quality of Figure 1.

      DRIPc-seq experiments were conducted with two biological replicates. To define consensus DRIPc-seq peaks using these two replicates, we used two methods applicable to ChIP-seq analysis: the irreproducible discovery rate (IDR) method and sequencing data pooling. We found that the sequencing data pooling method yielded significantly more DRIPc-seq peaks than consensus peak identification through IDR, and we decided to utilize R-loop peaks from pooled sequencing data for our downstream analyses, as described in the figure legends and Materials and Methods of the revised manuscript. 

      As noted by the reviewer, it is important to verify whether the increasing trend in the number of R-loop peaks and genomic locations of HIV-1 dependent R-loops were consistently observed across the two biological replicates. Therefore, we independently performed R-loop calling on each replicate of the sequencing data of primary CD4+ T cells from two individual donors to verify that the increase in R-loop numbers was consistent (Author response image 5). Additionally, the overlap of the R-loop peaks between the two replicates was statistically significant across the genome (Author response table 1). Thank you.
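
      The replicate comparison described above can be illustrated with a minimal sketch that counts how many peaks called in one replicate overlap a peak in the other. It assumes peaks are given as (chromosome, start, end) tuples with half-open coordinates; the function names and the toy peaks are hypothetical, and the quadratic scan is for clarity only. It does not reproduce the chi-square test reported in Author response table 1.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlapping(peaks_a, peaks_b):
    """Number of peaks in peaks_a that overlap at least one peak in peaks_b."""
    return sum(1 for a in peaks_a if any(overlaps(a, b) for b in peaks_b))


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a recovered in peaks_b (0.0 if peaks_a is empty)."""
    if not peaks_a:
        return 0.0
    return count_overlapping(peaks_a, peaks_b) / len(peaks_a)


# Toy peak sets standing in for two replicates (hypothetical coordinates).
rep1 = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr1", 400, 500), ("chr2", 150, 160)]
print(f"rep1 peaks recovered in rep2: {overlap_fraction(rep1, rep2):.2f}")
print(f"rep2 peaks recovered in rep1: {overlap_fraction(rep2, rep1):.2f}")
```

      For genome-scale peak sets, a sorted sweep or a dedicated interval tool would replace the nested scan.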

      Author response image 5.

      Bar graph indicating DRIPc-seq peak counts for HIV-1-infected primary CD4+ T cells harvested at the indicated hours post infection (hpi). Pre-immunoprecipitated samples were untreated (−) or treated (+) with RNase H, as indicated. Each dot corresponds to an individual data set from two biologically independent experiments.

      Author response table 1.

      DRIPc-seq peak length and Chi-square p-value in CD4+ T cells from individual donors 1 and 2

      (3.2) I think that the conclusion that R-loops "accumulate" in infected cells is acceptable, given the data presented. However, in line 134 the authors state that "HIV-1 infection induced host genomic R-loop formation". I suggest being very specific about the observation. Accumulation can happen by (a) inducing a higher frequency of the occurrence of individual R-loops and/or (b) stabilizing existing R-loops. I'm not convinced the authors present enough evidence to claim one over the other. It is altogether possible that HIV-1 infection stabilizes R-loops such that they are more persistent (perhaps by interactions with integrase?), and therefore more easily detected. I think rephrasing the conclusions to include this possibility would alleviate my concerns.

      We thank the reviewer for the thoughtful discussion of our manuscript. Following the reviewer's suggestion, we have now changed Line 134 to “HIV-1 infection induces host genomic R-loop enrichment” (Lines 132-133 of the revised manuscript) and added a new concluding sentence suggesting a possible explanation for the R-loop signal enrichment upon HIV-1 infection (Lines 133–135 of the revised manuscript).

      (3.3) A technical problem with using the S9.6 antibody for the detection of R-loops via microscopy is that it cross-reacts with double-stranded RNA. This has been addressed by the work of Chedin and colleagues (as well as others). It is absolutely essential to treat these samples with an RNA:RNA hybrid-specific RNase, which the authors did not include, as far as their methods section states. Therefore, it is difficult to interpret all of the immunofluorescence experiments that depend on S9.6 binding.

      We understand the reviewer's concern regarding the cross-reactivity of the S9.6 antibody with more abundant dsRNA, particularly in imaging applications. We carefully designed the experimental and analytical methods for R-loop detection using microscopy. For example, we pre-extracted the cytoplasmic fraction before staining with the S9.6 antibody and quantified the R-loop signal by subtracting the nucleolar signal. Both of these steps were taken to eliminate the possibility of misdetecting R-loops via microscopy because of the prominent cytoplasmic and nucleolar S9.6 signals, which primarily originate from ribosomal RNA. In addition, we included R-loop negative control samples in our microscopy analysis that were subjected to intensive RNase H treatment (60 U/mL RNase H for 36 h) and observed a significant reduction in the S9.6 signal (Figure 1E of the revised manuscript). RNase H-treated samples served as essential and widely accepted negative controls for R-loop detection.

      We would like to point out that recent studies have reported strong intrinsic specificity of the S9.6 antibody for DNA:RNA hybrid duplexes over dsDNA and dsRNA, along with structural elucidation of how the S9.6 antibody recognizes hybrids (23, 24). Therefore, our interpretation of host cellular R-loop enrichment after HIV-1 infection using S9.6 antibodies in multiple biochemical approaches is well supported. Nevertheless, we agree with the reviewer's opinion that additional negative controls for the detection of R-loops via microscopy, such as RNase T1- and RNase III-treated samples, could improve the robustness and accuracy of R-loop imaging data (25).

      (3.4) Given that there is no clear correlation between expression levels and R-loop peak detection, combined with the data that show increased detection of R-loop frequency in non-genic regions, I think it will be important to show that the R-loop forming regions are indeed transcribed above background levels. This will help alleviate possible concerns that there are technical errors in R-loop peak detection.

      Figures S5D and S5E in the revised manuscript show the relative gene expression levels of the R-loop-forming positive regions (P1-3) and the referenced R-loop-positive loci (RPL13A and CALM3). The gene expression levels of these R-loop-forming regions were significantly higher than those of the ECFP or mAIRN genes without DOX treatment, which can be considered background levels of transcription in cells. Thank you.

      (3.5) In Figures 4C and D the hashed lines are not defined. It is also interesting that the integration sites do not line up with R-loop peaks. This does not necessarily directly refute the conclusions (especially given the scale of the genomic region displayed), but should be addressed in the manuscript. Additionally, it would greatly improve Figure 4 to have some idea about the biological variation across replicates of the data presented 4A.

      We thank the reviewer for the thoughtful comment on our study. First, we added an annotation for the dashed lines in the figure legends of Figures 4C and 4D in the revised manuscript.

      We agree with the reviewer's interpretation of the relationship between the integration sites and R-loop peaks. Primarily based on our current data, we believe R-loop structures are bound by HIV-1 integrase proteins and lead HIV-1 viral genome integration into the “vicinity” regions of the host genomic R-loops. We displayed a large-scale genomic region (30-kb windows) to present integration sites surrounding R-loop centers because an R-loop can be multi-kilobase in size (1, 2). Depending on the immunoprecipitation and library construction methods, the R-loop peaks varied in size, and the peak length showed a wide distribution (Figure 3B of Malig et al., 2020, Figure 1B of Sanz et al., 2016, and Figure 2A of the revised manuscript). Therefore, presenting integration site events within a wide window of R-loop peaks could be more informative and better reflect the current understanding of R-loop biology.

      R-loop formation recruits diverse chromatin-binding protein factors, such as H3K4me1, p300, CTCF, RAD21, and ZNF143 (Figure 6A and 6B of Sanz et al., 2016) (26), which allow R-loops to exhibit enhancer and insulator chromatin states and to act as distal regulatory elements (26, 27). We have demonstrated physical interactions between host cellular R-loops and HIV-1 integrase proteins (Figure 5 of the revised manuscript); therefore, we believe that this ‘distal regulatory element-like feature’ of the R-loop can be a potential explanation for how R-loops drive integration over long-range genomic regions.

      According to your suggestion, we added this explanation, together with the relevant literature, to the Discussion section of the revised manuscript.

      Author response image 6 shows the biological variation across replicates of the data presented in Figure 4A. The integration site sequencing data for Jurkat cells were adopted from SRR12322252 (4), which consists of the integration site sequencing data of HIV-1-infected wild-type Jurkat cells with one biological replicate. We hope that our explanations and discussion have successfully addressed your concerns. Thank you.

      Author response image 6.

      Bar graphs showing the quantified number of HIV-1 integration sites per Mb pair in total regions of 30-kb windows centered on DRIPc-seq peaks from HIV-1 infected HeLa cells and primary CD4+ T cells (magenta) or non-R-loop region in the cellular genome (gray). Each dot corresponds to an individual data set from two biologically independent experiments.

      (3.6) The authors do not adequately describe the Integrase mutant that they use in their biochemical experiments in Figure 5A. Could this impact the activity of the protein in such a way that interferes with the interpretation of the experiment? The mutant is not used in subsequent experiments for Figure 5 and so even though the data are consistent with each other (and the conclusion that Integrase interacts with R-loops) a more thorough explanation of why that mutant was used and how it impacts the biochemical activity of the protein will help the interpretation of the data presented in Figure 5.

      We appreciate the reviewer’s suggestions. In our EMSA analysis, we purified and used Sso7d-tagged HIV-1 integrase proteins with an active-site amino acid substitution, E152Q. First, we used the Sso7d-tagged HIV-1 integrase protein, as it has been suggested in previous studies that the fusion of small domains, such as Sso7d (DNA binding domain), can significantly improve the solubility of HIV integrase proteins without affecting their ability to assemble with substrate nucleic acids or their enzymatic activity (Figure 1B of Li et al., PLOS ONE, 2014; 9(8)) (28, 29). We used an integrase protein with an active-site amino acid substitution, E152Q, in our mobility shift assay, because the primary goal of this experiment was to examine the ability of the protein to bind or form a complex with different nucleic acid substrates. We reasoned that abolishing the enzymatic activity of the integrase protein, such as the 3'-processing that cleaves DNA substrates, would be more appropriate for our experimental objective. This Sso7d-tagged HIV-1 integrase with the E152Q mutation has also been used to elucidate the structural model of the integrase complex with a nucleic acid substrate by cryo-EM (3) and has been shown not to disturb substrate binding. Based on the reviewer’s comments, we have added a description of the E152Q mutant integrase protein in Lines 268–270 of the revised manuscript. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      The paper suffers from many grammatical errors, which sometimes interfere with the interpretations of the experiments. In the view of this reviewer, the manuscript must be carefully revised prior to publication. For example, lines 247-248 "Intasomes consist of HIV-1 viral cDNA and HIV-1 coding protein, integrases." It is unclear from this sentence whether there are multiple integrases or multiple proteins that interact with the viral genome to facilitate integration. This makes the subsequent experiments in Figure 5 difficult to interpret. There are many other examples, too numerous to point out individually.

      We thoughtfully revised the original manuscript, making our best efforts to provide clearer details of our findings. We believe that we have made substantial changes to the manuscript, including Lines 247–248 of the original manuscript that the reviewer noted. Furthermore, the revised manuscript was edited by a professional editing service. Thank you.

      (1) M. Malig, S. R. Hartono, J. M. Giafaglione, L. A. Sanz, F. Chedin, Ultra-deep Coverage Single-molecule R-loop Footprinting Reveals Principles of R-loop Formation. J Mol Biol 432, 2271-2288 (2020).

      (2) L. A. Sanz et al., Prevalent, Dynamic, and Conserved R-Loop Structures Associate with Specific Epigenomic Signatures in Mammals. Mol Cell 63, 167-178 (2016).

      (3) D. O. Passos et al., Cryo-EM structures and atomic model of the HIV-1 strand transfer complex intasome. Science 355, 89-92 (2017).

      (4) W. Li et al., CPSF6-Dependent Targeting of Speckle-Associated Domains Distinguishes Primate from Nonprimate Lentiviral Integration. mBio 11,  (2020).

      (5) P. A. Ginno, Y. W. Lim, P. L. Lott, I. Korf, F. Chedin, GC skew at the 5' and 3' ends of human genes links R-loop formation to epigenetic regulation and transcription termination. Genome Res 23, 1590-1600 (2013).

      (6) S. Hamperl, M. J. Bocek, J. C. Saldivar, T. Swigut, K. A. Cimprich, Transcription-Replication Conflict Orientation Modulates R-Loop Levels and Activates Distinct DNA Damage Responses. Cell 170, 774-786 e719 (2017).

      (7) H. O. Ajoge et al., G-Quadruplex DNA and Other Non-Canonical B-Form DNA Motifs Influence Productive and Latent HIV-1 Integration and Reactivation Potential. Viruses 14,  (2022).

      (8) I. K. Jozwik et al., B-to-A transition in target DNA during retroviral integration. Nucleic Acids Res 50, 8898-8918 (2022).

      (9) F. Chedin, C. J. Benham, Emerging roles for R-loop structures in the management of topological stress. J Biol Chem 295, 4684-4695 (2020).

      (10) F. Chedin, Nascent Connections: R-Loops and Chromatin Patterning. Trends Genet 32, 828-838 (2016).

      (11) P. B. Chen, H. V. Chen, D. Acharya, O. J. Rando, T. G. Fazzio, R loops regulate promoter-proximal chromatin architecture and cellular differentiation. Nat Struct Mol Biol 22, 999-1007 (2015).

      (12) A. R. Schroder et al., HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521-529 (2002).

      (13) Y. Ito et al., Number of infection events per cell during HIV-1 cell-free infection. Sci Rep 7, 6559 (2017).

      (14) A. Albanese, D. Arosio, M. Terreni, A. Cereseto, HIV-1 pre-integration complexes selectively target decondensed chromatin in the nuclear periphery. PLoS One 3, e2413 (2008).

      (15) V. Achuthan et al., Capsid-CPSF6 Interaction Licenses Nuclear HIV-1 Trafficking to Sites of Viral DNA Integration. Cell Host Microbe 24, 392-404 e398 (2018).

      (16) X. Li et al., piggyBac transposase tools for genome engineering. Proc Natl Acad Sci U S A 110, E2279-2287 (2013).

      (17) Y. Cao et al., Identification of piggyBac-mediated insertions in Plasmodium berghei by next generation sequencing. Malar J 12, 287 (2013).

      (18) E. Serrao, P. Cherepanov, A. N. Engelman, Amplification, Next-generation Sequencing, and Genomic DNA Mapping of Retroviral Integration Sites. J Vis Exp,  (2016).

      (19) K. A. Matreyek et al., Host and viral determinants for MxB restriction of HIV-1 infection. Retrovirology 11, 90 (2014).

      (20) G. A. Sowd et al., A critical role for alternative polyadenylation factor CPSF6 in targeting HIV-1 integration to transcriptionally active chromatin. Proc Natl Acad Sci U S A 113, E1054-1063 (2016).

      (21) B. Lucic et al., Spatially clustered loci with multiple enhancers are frequent targets of HIV-1 integration. Nat Commun 10, 4059 (2019).

      (22) P. K. Singh, G. J. Bedwell, A. N. Engelman, Spatial and Genomic Correlates of HIV-1 Integration Site Targeting. Cells 11,  (2022).

      (23) C. Bou-Nader, A. Bothra, D. N. Garboczi, S. H. Leppla, J. Zhang, Structural basis of R-loop recognition by the S9.6 monoclonal antibody. Nat Commun 13, 1641 (2022).

      (24) Q. Li et al., Cryo-EM structure of R-loop monoclonal antibody S9.6 in recognizing RNA:DNA hybrids. J Genet Genomics 49, 677-680 (2022).

      (25) J. A. Smolka, L. A. Sanz, S. R. Hartono, F. Chedin, Recognition of RNA by the S9.6 antibody creates pervasive artifacts when imaging RNA:DNA hybrids. J Cell Biol 220,  (2021).

      (26) L. A. Sanz, F. Chedin, High-resolution, strand-specific R-loop mapping via S9.6-based DNA-RNA immunoprecipitation and high-throughput sequencing. Nat Protoc 14, 1734-1755 (2019).

      (27) M. Merkenschlager, D. T. Odom, CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152, 1285-1297 (2013).

      (28) M. Li, K. A. Jurado, S. Lin, A. Engelman, R. Craigie, Engineered hyperactive integrase for concerted HIV-1 DNA integration. PLoS One 9, e105078 (2014).

      (29) M. Li et al., A Peptide Derived from Lens Epithelium-Derived Growth Factor Stimulates HIV-1 DNA Integration and Facilitates Intasome Structural Studies. J Mol Biol 432, 2055-2066 (2020).

  2. Oct 2024
    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      […] Strengths:

      The study has several important strengths: (i) the work on GDA stability and competition of GDA with point mutations is a very promising area of research and the authors contribute new aspects to it, (ii) rigorous experimentation, (iii) very clearly written introduction and discussion sections. To me, the best part of the data is that deletion of lon stimulates GDA, which has not been shown with such clarity until now.

      Weaknesses:

      The minor weaknesses of the manuscript are a lack of clarity in parts of the results section (Point 1) and the methods (Point 2).

      We thank the reviewer for their comments and suggestions on our manuscript. We also appreciate the succinct summary of key findings that the Reviewer has taken cognisance of in their assessment, in particular the association of the Lon protease with the propensity for GDAs as well as its impact on their eventual fate. Going ahead, we plan to revise the manuscript for greater clarity as suggested by Reviewer #1.

      Reviewer #2 (Public review):

      […] The study does what any bold and ambitious study should: it contains large claims and uses multiple sorts of evidence to test those claims.

      Weaknesses:

      While the general argument and conclusion are clear, this paper is written for a bacterial genetics audience that is familiar with the manner of bacterial experimental evolution. From the language to the visuals, the paper is written in a boutique fashion. The figures are even difficult for me - someone very familiar with proteostasis - to understand. I don't know if this is the fault of the authors or the modern culture of publishing (where figures are increasingly packed with information and hard to decipher), but I found the figures hard to follow with the captions. But let me also consider that the problem might be mine, and so I do not want to unfairly criticize the authors.

      For a generalist journal, more could be done to make this study clear, and in particular, to connect to the greater community of proteostasis researchers. I think this study needs a schematic diagram that outlines exactly what was accomplished here, at the beginning. Diagrams like this are especially important for studies like this one that offer a clear and direct set of findings, but conduct many different sorts of tests to get there. I recommend developing a visual abstract that would orient the readers to the work that has been done.

      Next, I will make some more specific suggestions. In general, this study is well done and rigorous, but doesn't adequately address a growing literature that examines how proteostasis machinery influences molecular evolution in bacteria.

      While this paper might properly test the authors' claims about protein quality control and evolution, the paper does not engage a growing literature in this arena and is generally not very strong on the use of evolutionary theory. I recognize that this is not the aim of the paper, however, and I do not question the authors' authority on the topic. My thoughts here are less about the invocation of theory in evolution (which can be verbose and not relevant), and more about engagement with a growing literature in this very area.

      The authors mention Rodrigues 2016, but there are many other studies that should be engaged when discussing the interaction between protein quality control and evolution.

      A 2015 study demonstrated how proteostasis machinery can act as a barrier to the usage of novel genes: Bershtein, S., Serohijos, A. W., Bhattacharyya, S., Manhart, M., Choi, J. M., Mu, W., ... & Shakhnovich, E. I. (2015). Protein homeostasis imposes a barrier to functional integration of horizontally transferred genes in bacteria. PLoS genetics, 11(10), e1005612

      A 2019 study examined how Lon deletion influenced resistance mutations in DHFR specifically: Guerrero RF, Scarpino SV, Rodrigues JV, Hartl DL, Ogbunugafor CB. The proteostasis environment shapes higher-order epistasis operating on antibiotic resistance. Genetics. 2019 Jun 1;212(2):565-75.

      A 2020 study did something similar: Thompson, Samuel, et al. "Altered expression of a quality control protease in E. coli reshapes the in vivo mutational landscape of a model enzyme." Elife 9 (2020): e53476.

      And there's a new review (preprint) on this very topic that speaks directly to the various ways proteostasis shapes molecular evolution:

      Arenas, Carolina Diaz, Maristella Alvarez, Robert H. Wilson, Eugene I. Shakhnovich, and C. Brandon Ogbunugafor. "Proteostasis is a master modulator of molecular evolution in bacteria."

      I am not simply attempting to list studies that should be cited, but rather, this study needs to be better situated in the contemporary discussion on how protein quality control is shaping evolution. This study adds to this list and is a unique and important contribution. However, the findings can be better summarized within the context of the current state of the field. This should be relatively easy to implement.

      We thank the reviewer for their encouraging assessment of our manuscript. We appreciate that the manuscript may not be accessible for a general readership in its present form. We plan to revise the manuscript, in part by modifying figures and adding schematics, to afford greater clarity. We also appreciate the concern regarding situating this study in the context of other published work that relates proteostasis and molecular evolution. Indeed, this was a particularly difficult aspect for us given the different kinds of literature that were needed to make sense of our study. We plan on revising the manuscript by incorporating the references that the Reviewer has pointed out.

      Reviewer #3 (Public review):

      […] Strengths:

      The major strength of this paper is identifying an example of antibiotic resistance evolution that illustrates the interplay between the proteolytic stability and copy number of an antibiotic target in the setting of antibiotic selection. If the weaknesses are addressed, then this paper will be of interest to microbiologists who study the evolution of antibiotic resistance.

      Weaknesses:

      Although the proposed mechanism is highly plausible and consistent with the data presented, the analysis of the experiments supporting the claim is incomplete and requires more rigor and reproducibility. The impact of this finding is somewhat limited given that it is a single example that occurred in a lon strain and compensatory mutations for evolved antibiotic resistance mechanisms are described. In this case, it is not clear that there is a functional difference between the evolution of copy number versus any other mechanism that meets a requirement for increased "expression demand" (e.g. promoter mutations that increase expression and protein stabilizing mutations).

      We thank the reviewer for their in-depth assessment of our work and appreciate their concerns regarding reproducibility and rigor in analysis of our data. We will incorporate this feedback and provide the necessary clarifications in the revised version of our manuscript.

    1. Author Response:

      We would like to thank the reviewers for the comprehensive and insightful reviews of our manuscript. We highly value your constructive comments and suggestions and are preparing revisions that will enhance both the clarity and robustness of our study. Below is an outline of the changes we will implement in response to the points you raised.

      All three reviewers expressed concerns regarding the robustness of our conclusions about the relationship between task-related theta activity and aperiodic changes. We will revise the manuscript to present these conclusions more cautiously, stating that the findings indicate a potential contribution of aperiodic activity to what is traditionally interpreted as theta activity. While our results emphasize the importance of distinguishing between periodic and aperiodic components, further research is necessary to fully understand this relationship. We will conduct additional control analyses, including a comparison of the scalp topographies of theta and aperiodic components, to better understand the relationship between aperiodic and periodic (theta) activity.

      In response to Reviewer #1's request for greater transparency in our reporting of methodological details, we will provide key clarifications. We will add a clear statement noting that the primary results are based on data from middle-aged to older adults, some of whom had subjective cognitive complaints (SCC). However, it is important to note that no differences were observed between the SCC group and the control group regarding periodic or aperiodic changes in power. Additionally, the main findings were replicated in a sample of middle-aged adults.

      To address potential confounding factors, we will include an analysis contrasting response-related ERPs with the identified aperiodic components. However, we do not entirely agree with the assertion that this will necessarily clarify the results. ERPs are not inherently distinct from aperiodic (or periodic) activity; they may reflect changes in aperiodic (or periodic) power. In our view, examining aperiodic and periodic power, ERPs, or time-frequency decomposition with baseline correction provides different perspectives on the same data. Nonetheless, the combined analyses and their results are intended to guide future researchers toward the most suitable approach for interpreting this data.

      Reviewer #3 raised concerns regarding the task's effectiveness in evoking theta power and the ability of the spectral parameterization method (specparam) to adequately quantify background activity around theta bursts. To address these concerns, we will include additional visualizations demonstrating that the task reliably elicited theta (and delta) activity. Regarding the reviewer's concerns about specparam and theta bursts, it is important to clarify that specparam, in the form we used, does not incorporate time information; rather, it can be applied to any power spectral density (PSD), independent of how the PSD is derived. Specparam’s performance therefore depends on the methods used to estimate frequency content. For time-frequency decomposition, we employed superlets (https://doi.org/10.1038/s41467-020-20539-9), which have been shown to resolve short bursts of activity more effectively than other methods. To our knowledge, superlets provide the highest resolution in terms of both time and frequency. Moreover, to improve stability, we performed spectral parameterization on trial-averaged power (in contrast to the approach in https://doi.org/10.7554/eLife.77348). Nonetheless, we will conduct a simulation to test whether specparam can reliably resolve low-frequency peaks over the 1/f activity.
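
      As a rough illustration of the kind of simulation mentioned above, the following minimal sketch builds a noiseless spectrum from a 1/f-like aperiodic component plus one Gaussian theta peak and then tries to recover both. It does not use specparam itself: the straight-line fit in log-log space, the excluded 3-9 Hz band, and all parameter values (offset, exponent, peak frequency, height, width) are assumptions chosen for illustration only, and a real test would add noise and run the actual fitting tool.

```python
import math


def simulate_psd(freqs, offset, exponent, peak_cf, peak_height, peak_sd):
    """Return log10 power: a 1/f-like aperiodic line plus one Gaussian peak."""
    log_power = []
    for f in freqs:
        aperiodic = offset - exponent * math.log10(f)
        peak = peak_height * math.exp(-((f - peak_cf) ** 2) / (2 * peak_sd ** 2))
        log_power.append(aperiodic + peak)
    return log_power


def fit_aperiodic(freqs, log_power, exclude=(3.0, 9.0)):
    """Least-squares line of log10(power) on log10(freq), skipping the band
    where the peak is expected (a simplification, not specparam's algorithm)."""
    xs, ys = [], []
    for f, p in zip(freqs, log_power):
        if not (exclude[0] <= f <= exclude[1]):
            xs.append(math.log10(f))
            ys.append(p)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, -slope  # (offset, exponent)


def find_peak(freqs, log_power, offset, exponent):
    """Locate the largest residual above the fitted aperiodic component."""
    residual = [
        p - (offset - exponent * math.log10(f)) for f, p in zip(freqs, log_power)
    ]
    idx = max(range(len(residual)), key=residual.__getitem__)
    return freqs[idx], residual[idx]


# Hypothetical ground truth: exponent 1.5 and a theta peak at 6 Hz.
freqs = [1.0 + 0.5 * i for i in range(79)]  # 1 to 40 Hz in 0.5 Hz steps
log_power = simulate_psd(
    freqs, offset=1.0, exponent=1.5, peak_cf=6.0, peak_height=0.4, peak_sd=1.0
)
offset_hat, exponent_hat = fit_aperiodic(freqs, log_power)
peak_freq, peak_height_hat = find_peak(freqs, log_power, offset_hat, exponent_hat)
print(f"exponent: {exponent_hat:.3f}, peak at {peak_freq} Hz")
```

      Comparing the recovered exponent and peak against the simulated ground truth, across a range of peak heights and noise levels, would indicate when a low-frequency peak can no longer be separated from the aperiodic slope.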

      Reviewer #2 suggested that the manuscript would benefit from a more detailed account of the effects. In response, we will include more detailed quantifications of the analyzed effects, such as model error and R² values.

      We believe that the planned revisions will strengthen the manuscript and address the primary concerns raised by the reviewers. We sincerely appreciate your thoughtful feedback and look forward to submitting an improved version of the manuscript soon.

      Once again, thank you for your time and expertise in reviewing our work.

      Sincerely,

      Andraž Matkovič & Tisa Frelih

    1. Author response:

      The following is the authors’ response to the original reviews.

      We greatly appreciate reviewer 2's comments, which offer insightful and clearly reasoned assessments of this study, including a much-appreciated reframing and evaluation of the study’s advances in the sleep field. It is a constructive review and provides considerable added value to this study by better defining the biological significance of the findings, including both advances and limitations.

      Reviewer 2 nicely summarized the work as “…highlight(ing) the accumulation and resolution of sleep need centered on the strength of excitatory synapses onto excitatory neurons.” The reviewer succinctly placed one of the main electrophysiological findings in context of one of the sleep field’s most prevalent views, “that LTP associated with wake, leads to the accumulation of sleep need by increasing neuronal excitability, and by the "saturation" of LTP capacity.” It has been speculated that “This saturation subsequently impairs the capacity for further ongoing learning. This new data provides a satisfying mechanism of this saturation phenomenon (and its restoration by recovery sleep) by introducing the concept of silent synapses.” We want to emphasize that sleep need and its resolution involve more than just homeostasis of excitatory synaptic strength but may also be extended to include homeostasis of excitatory synaptic potential to undergo LTP (a homeostasis of meta-plasticity), with implications for learning and memory.

      Reviewer 2 also identified another advance made by this study, summarized as, “The new snRNAseq dataset indicates the sleep need is primarily seen (at the transcriptional level) in excitatory neurons, consistent with a number of other studies.” References for these studies are nicely provided by the reviewer. Our analysis of this data extends the evidence for transcriptional sleep-need-driven changes, observed by us and others in excitatory neurons, to more particularly involve the excitatory neurons in layers 2-5, targeting intra-telencephalic neurons.

      Reviewer 2 importantly noted, “New snRNAseq analysis indicates that SD drives the expression of synaptic shaping components (SSCs) consistent with the excitatory synapse as a major target for the restorative basis of sleep function”, and that “SD-induced gene expression is also enriched for autism spectrum disorder (ASD) risk genes”. These comments are well appreciated as they emphasize that, beyond identification of the major target cell type of sleep function, the major sleep-target gene-ontological characteristics are starting to be addressed.

      Reviewer 2 commented on the molecular sleep model, making a key observation that “SD-induced gene expression in excitatory neurons overlaps with genes regulated by the transcription factor MEF2C and HDAC4/5 (Figure 4),” and accurately discusses the significance with respect to the proposed model.

      We are in complete agreement with the observation that the molecular sleep model presented is not “definitively supported by the new data and in this regard should be viewed as a perspective…”. One of the more glaring gaps in supporting evidence is the absence of understanding of the role of HDAC4/5 (part of the SIK3-HDAC4/5 pathway) in sleep need modulation of excitatory synapses. Resolution of this issue might be approached by assessment of the synaptic effects of constitutively nuclear HDAC4/5. The current study provides a first step in the assessment by showing a correlation between HDAC4/5 and MEF2c target genes and a subset of differentially expressed synaptic shaping component (SSC) genes that modulate excitatory synapse strength and phenotype. However, the functional studies have yet to be completed. Complementary studies on SD-induced SSC-DEGs (identified in this study) are also needed for follow-up characterization of their sleep need induced functional impact (both strength and meta-plasticity modulation) on the most relevant excitatory synapses (as identified in the current study).

      We agree with both reviewers 1 and 2 that, “Additional work is also needed to understand the mechanistic links between SIK3-HDAC4/5 signaling and MEF2C activity”. Reviewer 2 clarifies the key unresolved issue as, “cnHDAC4/5 suppresses NREM amount and NREM SWA but had no effect on the NREM-SWA increase following SD (Zhou et al., Nature 2022). Loss of MEF2C in CaMKII neurons had no effect on NREM amount and suppressed the increase in NREM-SWA following SD (Bjorness et al., 2020)”. One may conclude with reviewer 2, “These instances indicate that cnHDAC4/5 and loss of MEF2C do not exactly match suggesting additional factors are relevant in these phenotypes.”

      An understanding of the mechanism(s) responsible for the relationship between sleep need and SWA is critical to the evaluation of sleep need’s correlation with sleep DEGs and synaptic transmission, including “additional factors” as suggested by reviewer 2. SWA might result from a decrease of cortical glutamatergic neurotransmission below some threshold, which might occur in response to prolonged waking (possibly in response to waking activity-induced local increases of adenosine?), rather than being a cause of, or being intimately involved in, resolving sleep need.

      An increase of SWA in association with SD can result directly from an acute SD-induced increase in local adenosine concentration. This will elicit an ADORA1-mediated down-regulation of glutamate excitatory neurotransmission in the cortex (Bjorness et al., 2016) and in cholinergic arousal centers (Rainnie et al., 1994; Porkka-Heiskanen et al., 1997; Portas et al., 1997; Li et al., 2023). When MEF2c is derepressed by chronic loss of HDAC4 function, SWA is facilitated (Kim et al., 2022). It is plausible that loss of HDAC4 function contributes to the increased SWA by downscaling glutamate excitatory transmission (independent of sleep need). This is expected to result from derepressed, MEF2c mediated sleep-gene expression.  

      Similarly, over-expression of constitutively active HDAC4 (cnHD4) can contribute to chronic upscaling of cortical glutamate synaptic strength to depress SWA (again, independent of sleep need). Thus, facilitation or depression of SWA correlates with up or down scaling effects on cortical glutamate neurotransmission, respectively, even in the absence of a direct effect on sleep need (Figure 4D). Many reagents that reduce the excitability of glutamate pyramidal cells by various mechanisms, including anesthetics like isoflurane, barbiturates or benzodiazepines in addition to those activating ADORA1, increase SWA. Thus, SWA may reflect generalized cortical glutamate synaptic activity whether modulated by sleep function or by other agents. Finally, it is important to acknowledge that direct evidence for this proposed link of SWA to cortical glutamate transmission is still lacking and requires further investigation.

      Still other factors that can have a role in mediating some of the mismatch between cnHD4/5 DEGs and Mef2c-cKO DEGs include the broader over-expression of AAV-cnHD4 compared to CaMKII-driven Cre KO of Mef2c. The cnHD4 overexpression can increase arousal center activity in the hypothalamus and other arousal areas to interfere with SWA, but not to the exclusion of SD-DEG repression resulting from a repression of MEF2c-mediated sleep gene expression.

      The critique by reviewer 1 raises a number of important technical issues with this study. A key, potentially critical issue raised by reviewer 1 is our method of experimental sleep deprivation (ESD). The reviewer suggests that “…neuronal activity/induction of plasticity”, peculiar to the ESD methodology employed in this study, “…rather than sleep/wake states are responsible for the observed results…”.

      In this study, a slow-moving treadmill (SMTM; 0.1 km/hour, as stated in the methods), requiring locomotion to avoid bumping into the back wall of a false-bottomed plexiglass cage, was used to induce ESD. A mouse in its home cage typically moves much faster than 0.1 km/hour, and the mouse is able to eat and drink freely while in the cage (see file: video 1). Furthermore, our observations using a beam-break cage indicate that mice spontaneously travel comparable or longer distances over 6 hours than the treadmill moves (during the ESD of 6 hours). Finally, our EEG recordings of mice on the active treadmill show 100% waking while it is on (Bjorness et al., 2009), whereas prevention of NREM sleep (including transition time) using the “gentle handling” (GH) technique depends on the diligence of the experimenter.

      The accommodation (one week prior to ESD) included exposure to the running treadmill for 30 minutes at ~ZT = 2 and ZT = 14 hours (now spelled out in the “Materials & Methods” section). Thus, the likelihood of motor learning seems vanishingly small.

      As with all ESD methods, there must be some associated increase in sensory and motor neuronal activity to drive arousal and prevent transition to sleep. For example, the more widely employed GH method of ESD involves sensory stimulation (tactile and/or auditory) of sufficient intensity to induce postural change from that associated with sleep to that associated with wake (often involving some locomotion). As with the SMTM, both sensory and motor systems are likely to be engaged. Unlike the SMTM method, the stimulation used in GH varies from mouse to mouse and from experimenter to experimenter, as it is applied only when the experimenter judges the mouse to be falling asleep. It can even be argued that the varied and unpredictable ways in which these interactions happen cause plastic changes with a higher likelihood than the constant slow motion of a treadmill – the mice know how to walk, after all. In other protocols, novel objects are introduced to the animals – those will certainly trigger plastic processes – something that is avoided by using, for sleep deprivation, a slow-running treadmill to which the mouse has been accommodated.

      The changes induced by the SMTM technique are reproducible, and the technique induces arousal by somatic stimulation of sufficient intensity to induce natural motor activity, as with GH. All ESD methods induce motor activity, and it is reasonable to speculate that induced motor activity is essential for effective ESD for the prolonged durations (>4 hours in mice) that elicit high sleep need. Electrophysiological assessments of SD-evoked increases in mEPSC amplitude and frequency using GH-ESD (Liu et al., 2010) are similar in all respects to our observations of the response to SMTM-ESD (Bjorness et al., 2020). Further studies might directly address a comparison of SMTM-ESD to GH-ESD as suggested by reviewer 1 but are regrettably outside the scope and resources of our study.

      The model presented in Figure 4C is consistent with the experimental findings with respect to the observed electrophysiological changes (including loss of silent synapses and increased AMPA/NMDA ratio after ESD of 6 hours) and altered gene expression that includes enrichment of SSC genes, many of which (7 candidates are listed) can affect both AMPA/NMDA ratio and silent synapses. No claim of mechanism linking the changed expression to altered AMPAR or NMDAR activity can be made at this point, even as to polarity of gene expression, related to electrophysiological outcome. Furthermore, some transcripts may involve receptor trafficking while others more directly affect activated receptor function. To help illustrate the complexity of interpreting gene up-regulation, consider the following hypothetical scenario. If a gene like upregulated Grin3a acts rapidly, it may facilitate reduction of NMDAR function (decreasing plasticity) during ESD, whereas upregulation of a gene like Kif17, if acting in a more delayed manner, might enhance NMDAR surface expression and activity (increasing silent synapses) in response to ESD, during recovery sleep. Relevant references, consistent with these various outcomes are supplied in the manuscript but further investigation is clearly needed, or as reviewer 2 so aptly commented, this work “…provides a framework to stimulate further research and advances on the molecular basis of sleep function”.  

      Several issues are raised by reviewer 1 concerning the electrophysiological methodology and statistical assessment. In regard to the former, we closely followed established protocols employed in the frontal neocortex (Myme et al., 2003). We did not include the details for series resistance monitoring. Series resistance values ranged between 8 and 15 MOhm, and experiments with changes larger than 25% were not used for further analyses. Thank you for bringing this oversight on our part to our attention. This essential information, which is unfailingly gathered for all our whole cell recordings, is now added to the version of record.
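
      The exclusion rule stated above can be sketched as a simple filter. This is a minimal illustration only: the recording names and resistance values are hypothetical, and the assumption that the change is measured relative to the initial series resistance is ours, not stated in the response.

```python
MAX_RELATIVE_CHANGE = 0.25  # exclude recordings whose series resistance changes by more than 25%


def relative_change(rs_start_mohm, rs_end_mohm):
    """Fractional change in series resistance, relative to the starting value
    (assumed reference point; the response does not specify it)."""
    return abs(rs_end_mohm - rs_start_mohm) / rs_start_mohm


# Hypothetical recordings: name -> (series resistance at start, at end), in MOhm.
recordings = {
    "cell_a": (10.0, 11.5),  # 15% change
    "cell_b": (9.0, 12.0),   # about 33% change
    "cell_c": (14.0, 15.0),  # about 7% change
}

kept = [
    name
    for name, (rs_start, rs_end) in recordings.items()
    if relative_change(rs_start, rs_end) <= MAX_RELATIVE_CHANGE
]
print(kept)
```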

      The -90 mV holding potential was chosen according to precedent (Myme et al., 2003). It increases driving force and permits lower stimulus strength for the same response size – reducing the likelihood for polysynaptic responses. Experiments with multiple response peaks at -90 mV were not included in the analysis. The -90 mV holding potential also increases NMDA receptor Mg++ block resulting in a minimally contaminated AMPA response. This information is now added to our submitted version of record.

      The statistical assessments shown in Table 1 refer to two sets of data measured from 3X2=6 different cohorts for each sleep condition (CS, SD, RS): 1) AMPA & NMDA EPSCs and 2) AMPA/NMDA FR ratios (FRR; now bolded in row 1, second tab, Table S1). As stated in the results section, “A two-way ANOVA analysis showed a significant interaction between AMPA matched to NMDA EPSC response for each neuron, and sleep condition (F (2, 21) = 7.268, p<0.004; Figure 1 A, C, E). When considered independently, neither the effect of sleep condition nor of EPSC subtype reached significance at p<0.05 (Figure 1 C)”.  

      As noted by reviewer 1, we inadvertently dropped one of the data points from the RS FR and FR ratio (FRR) statistical analysis (raw data in the third tab of Table S1, statistical data in fourth and fifth tab and illustrated in figure 1 F). Thanks to this appreciated, rigorous review, we can correct the oversight (using raw data unchanged in Table S1, third tab). The Table S1 and figure 1 F are now corrected for the version of record. For better clarity, we now use two tabs, the fourth and fifth tabs, respectively of Table S1, for separate stat analyses of FR and FRR data.

      The significance of the AMPA/NMDA FRR across sleep conditions was assessed with the Kruskal-Wallis test, a non-parametric method. The two-stage linear step-up procedure of Benjamini, Krieger, and Yekutieli (BKY) was used to control the FDR across multiple sleep conditions. The non-parametric Kruskal-Wallis test, however, is usually less powerful than tests presuming normal distributions, like the one-way ANOVA and Holm-Sidak’s test. We have now added a re-analysis of FRR across CS, SD and RS conditions using a standard one-way ANOVA (Table S1, tab5). The results now read, “The difference between sleep conditions and FRR is significant (F (2, 19) = 11.3, Table S1, tab5). Multiple comparisons (Holm-Sidak, Table S1, tab5) indicate the near absence of silent synapses was reversed by either CS or RS (SD/CS; p<0.0011 and SD/RS: p<0.0006; Table S1, tab 5; Figure 1 F).” These analyses compare well to the non-parametric assessment using the Kruskal-Wallis test (significant at p= 0.0006) with BKY correction for multiple comparison analysis to give for CS-SD, p<= 0.0262 and for RS-SD, p<= 0.0006 (statistics also shown in Table S1, tab5). [Also shown in tab5 is the “standard approach of correcting for family wise error rate”, namely, Dunn’s test. It is more conservative but less powerful than the BKY correction. In general, the tradeoff of greater power for less conservatism is better tolerated when many comparisons are made; however, it can be argued that in the present analysis type 2 errors are also potentially misleading and thus not well tolerated.] The modifications of our statistical analyses, inspired by reviewer 1, did not affect the interpretation of the data nor the conclusions.
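
      For readers less familiar with how an F statistic and its degrees of freedom (such as F (2, 19)) arise, the following minimal sketch computes a one-way ANOVA by hand. The FRR values are invented for illustration and are not the data in Table S1; the group sizes (7, 8 and 7) are an assumption chosen only so that the degrees of freedom come out as (2, 19). The sketch does not compute p-values or the Holm-Sidak and BKY corrections.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical failure-rate ratios for the three sleep conditions.
cs = [1.8, 2.1, 1.6, 2.4, 1.9, 2.2, 2.0]
sd = [1.0, 1.1, 0.9, 1.2, 1.0, 1.1, 0.95, 1.05]
rs = [2.0, 2.3, 1.7, 2.5, 2.1, 1.9, 2.2]

f_stat, df_between, df_within = one_way_anova([cs, sd, rs])
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

      Three conditions give 3 - 1 = 2 between-group degrees of freedom, and 22 observations give 22 - 3 = 19 within-group degrees of freedom.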

      Bjorness TE, Kelly CL, Gao T, Poffenberger V, Greene RW (2009) Control and function of the homeostatic sleep response by adenosine A1 receptors. The Journal of neuroscience : the official journal of the Society for Neuroscience 29:1267-1276.

      Bjorness TE, Dale N, Mettlach G, Sonneborn A, Sahin B, Fienberg AA, Yanagisawa M, Bibb JA, Greene RW (2016) An Adenosine-Mediated Glial-Neuronal Circuit for Homeostatic Sleep. The Journal of neuroscience : the official journal of the Society for Neuroscience 36:3709-3721.

      Bjorness TE, Kulkarni A, Rybalchenko V, Suzuki A, Bridges C, Harrington AJ, Cowan CW, Takahashi JS, Konopka G, Greene RW (2020) An essential role for MEF2C in the cortical response to loss of sleep in mice. Elife 9.

      Kim SJ et al. (2022) Kinase signalling in excitatory neurons regulates sleep quantity and depth. Nature 612:512-518.

      Li B, Ma C, Huang YA, Ding X, Silverman D, Chen C, Darmohray D, Lu L, Liu S, Montaldo G, Urban A, Dan Y (2023) Circuit mechanism for suppression of frontal cortical ignition during NREM sleep. Cell 186:5739-5750 e5717.

      Liu ZW, Faraguna U, Cirelli C, Tononi G, Gao XB (2010) Direct evidence for wake-related increases and sleep-related decreases in synaptic strength in rodent cortex. The Journal of neuroscience : the official journal of the Society for Neuroscience 30:8671-8675.

      Myme CI, Sugino K, Turrigiano GG, Nelson SB (2003) The NMDA-to-AMPA ratio at synapses onto layer 2/3 pyramidal neurons is conserved across prefrontal and visual cortices. Journal of neurophysiology 90:771-779.

      Porkka-Heiskanen T, Strecker RE, Thakkar M, Bjorkum AA, Greene RW, McCarley RW (1997) Adenosine: a mediator of the sleep-inducing effects of prolonged wakefulness. Science 276:1265-1268.

      Portas CM, Thakkar M, Rainnie DG, Greene RW, McCarley RW (1997) Role of adenosine in behavioral state modulation: a microdialysis study in the freely moving cat. Neuroscience 79:225-235.

      Rainnie DG, Grunze HC, McCarley RW, Greene RW (1994) Adenosine inhibition of mesopontine cholinergic neurons: implications for EEG arousal. Science 263:689-692.

    1. Author response

      We appreciate the positive comments and constructive suggestions from the editors and reviewers, which will help us improve our manuscript. We will implement the changes as requested by the reviewers, focusing primarily on revising and clarifying the following aspects:

      First, we will clarify the use of biological and technical replicates in each experiment and provide more details about the statistical analyses conducted. Additionally, we plan to include a schematic representation of the experimental design.

      Second, we will explain the experiment conducted to rule out hormonal effects or differences in the oocyte maturation method used. We will also indicate the concentration of OVGP1 in the oviduct and explain why we selected OVGP1 as the probable cause of species specificity.

      Third, by addressing all of the reviewers' suggestions, we aim to resolve any concerns, inconsistencies, or minor errors identified by the reviewers.

      We are committed to addressing all the issues raised by the reviewers and believe that the manuscript will greatly benefit from the insightful suggestions and invaluable contributions of the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The paper begins with phenotyping the DGRP for post-diapause fecundity, which is used to map genes and variants associated with fecundity. There are overlaps with genes mapped in other studies and also functional enrichment of pathways including most surprisingly neuronal pathways. This somewhat explains the strong overlap with traits such as olfactory behaviors and circadian rhythm. The authors then go on to test genes by knocking them down effectively at 10 degrees. Two genes, Dip-gamma and sbb, are identified as significantly associated with post-diapause fecundity, and they also find the effects to be specific to neurons. They further show that the neurons in the antenna but not the arista are required for the effects of Dip-gamma and sbb. They show that removing the antenna has a diapause-specific lifespan-extending effect, which is quite interesting. Finally, ionotropic receptor neurons are shown to be required for the diapause-associated effects.

      Strengths and Weaknesses:

      Overall I find the experiments rigorously done and interpretations sound. I have no further suggestions except an ANOVA to estimate the heritability of the post-diapause fecundity trait, which is routinely done in the DGRP and offers a global parameter regarding how reliable phenotyping is. A minor point is I cannot find how many DGRP lines are used.

      Thank you for the suggestions. We screened 193 lines and we will add that information to the methods. Additionally, we will add the heritability estimate of the post-diapause fecundity trait.
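      As a rough illustration of the kind of estimate requested, the sketch below computes broad-sense heritability from a one-way ANOVA with DGRP line as the factor. It assumes a balanced design (the same number of replicate measurements per line), which may not match the actual data, and the function name and input layout are hypothetical rather than taken from the manuscript.

```python
from statistics import mean


def broad_sense_heritability(line_measurements):
    """Estimate H^2 = var_line / (var_line + var_error) from a balanced one-way ANOVA.

    line_measurements: one list of replicate fecundity values per line
    (hypothetical layout, assumed balanced for this sketch).
    """
    sizes = {len(values) for values in line_measurements}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes the same number of replicates per line")
    n = sizes.pop()               # replicates per line
    k = len(line_measurements)    # number of lines
    if n < 2 or k < 2:
        raise ValueError("need at least two lines and two replicates per line")

    grand_mean = mean(v for values in line_measurements for v in values)
    line_means = [mean(values) for values in line_measurements]

    # Mean squares among lines and within lines.
    ms_between = n * sum((m - grand_mean) ** 2 for m in line_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2
        for values, m in zip(line_measurements, line_means)
        for v in values
    ) / (k * (n - 1))

    # Among-line variance component; negative estimates are truncated to zero.
    var_line = max((ms_between - ms_within) / n, 0.0)
    total = var_line + ms_within
    if total == 0:
        raise ValueError("no variation in the data")
    return var_line / total


# Made-up values for two lines with two replicates each, for illustration only.
h2_example = broad_sense_heritability([[1, 2], [3, 4]])
```

      An unbalanced design would need an adjusted replicate coefficient or a mixed-model fit instead of the simple division by n used here.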

      Reviewer #2 (Public Review):

      Summary

      In this study, Easwaran and Montell investigated the molecular, cellular, and genetic basis of adult reproductive diapause in Drosophila using the Drosophila Genetic Reference Panel (DGRP). Their GWAS revealed genes associated with variation in post-diapause fecundity across the DGRP, and they performed RNAi screens on these candidate genes. They also analyzed the functional implications of these genes, highlighting the role of genes involved in neural and germline development. In addition, in conjunction with other GWAS results, they noted the importance of the olfactory system within the nervous system, which was supported by genetic experiments. Overall, their solid research uncovered new aspects of adult diapause regulation and provided a useful reference for future studies in this field.

      Strengths:

      The authors used whole-genome sequenced DGRP to identify genes and regulatory mechanisms involved in adult diapause. The first Drosophila GWAS of diapause successfully uncovered many QTL underlying post-diapause fecundity variations across DGRP lines. Gene network analysis and comparative GWAS led them to reveal a key role for the olfactory system in diapause lifespan extension and post-diapause fecundity.

      Weaknesses:

      (1) I suspect that there may be variation in survivorship after long-term exposure to cold conditions (10ºC, 35 days), which could also be quantified and mapped using genome-wide association studies (GWAS). Since blocking Ir21a neuronal transmission prevented flies from exiting diapause, it is possible that natural genetic variation could have a similar effect, influencing the success rate of exiting diapause and post-diapause mortality. If there is variation in this trait, could it affect post-diapause fecundity? I am concerned that this could be a confounding factor in the analysis of post-diapause fecundity. However, I also believe that understanding phenotypic variation in this trait itself could be significant in regulating adult diapause.

      We agree that it is possible that the ability to endure cool temperatures per se may influence post-diapause fecundity. However, cool temperature is the essential diapause-inducing condition in Drosophila, so it is not obvious how to separate those effects experimentally, and we agree that phenotypic variation in the cool-sensitivity trait itself could be significant in regulating diapause.

      (2) On p.10, the authors conclude that "Dip-γ and sbb are required in neurons for successful diapause, consistent with the enrichment of this gene class in the diapause GWAS." While I acknowledge that the results support their neuronal functions, I remain unconvinced that these genes are required for "successful diapause". According to the RNAi scheme (Figure 4I), Dip-γ and sbb are downregulated only during the post-diapause period, but still show a significant effect, comparable to that seen in the nSyb Gal4 RNAi lines (Figure 4K).

      Our definition of successful diapause is the ability to produce viable adult progeny post-diapause, which requires that the flies enter, maintain, and exit diapause, alive and fertile. We will restate our conclusion to say that Dip-γ and sbb are required for post-diapause fecundity.

      In addition, two other RNAi lines (SH330386, 80461) that did not show lethality did not affect post-diapause fecundity.

      We interpret those results to mean that those RNAi lines were not effective since Dip-γ and sbb are known to be essential.

      Notably, RNAi (27049, KK104056) substantially reduced non-diapause fecundity, suggesting impairment of these genes affects fecundity in general regardless of diapause experience. Therefore, the reduced post-diapause fecundity observed may be a result of this broader effect on fecundity, particularly in a more "sensitized" state during the post-diapause period, rather than a direct regulation of adult diapause by these genes.

      Ubiquitous expression of RNAi lines #27049 or #KK104056 was lethal, so we included the tubGAL80ts repressor to prevent RNAi from taking effect during development. Flies had to be shifted to 30 °C to inactivate the repressor and thereby activate the RNAi. At 30 °C, fecundity of the controls (GFP RNAi lines #9331, KK60102) was also lower (average non-diapause fecundity = 12 and 19, respectively) and similar to that of #27049 or #KK104056. We also assessed the knockdown using Repo GAL4 and nSyb GAL4 and did not find a significant difference or decline in the non-diapause fecundity for #27049 and #KK104056 as compared to a nonspecific RNAi control (#54037).

      (3) The authors characterized 546 genetic variants and 291 genes associated with phenotypic variation across DGRP lines but did not prioritize them by significance. They did prioritize candidate genes with multiple associated variants (p.9 "Genes with multiple SNPs are good candidates for influencing diapause traits."), but this is not a valid argument, likely due to a misunderstanding of LD among variants in the same gene. A gene with one highly significantly associated variant may be more likely to be the causal gene in a QTL than a gene with many weakly associated variants in LD. I recommend taking significance into account in the analysis.

      We agree with the reviewer, and in Supplemental Table S3 we list top-associated SNPs in order from the lowest (most significant) p-value. Most of the top-associated genes from this analysis were uncharacterized CG numbers for which there were insufficient tools available for validation purposes. Nevertheless, there is overlap between the highly significant genes by p-value and those with multiple SNPs. Amongst the top 15 genes with multiple associated SNPs, CG18636 and CR15280 ranked 3rd by p-value, CG7759 ranked 4th, CG42732 ranked 10th, and Drip ranked 30th (all above the conservative Bonferroni threshold of 4.8e-8), while three Sbb-associated SNPs also appear in Table 3 above the standard e-5 threshold.

      Reviewer #3 (Public Review):

      Summary:

      Drosophila melanogaster of North America overwinters in a state of reproductive diapause. The authors aimed to measure 'successful' D. melanogaster reproductive diapause and reveal loci that impact this quantitative trait. In practice, the authors quantified the number of eggs produced by a female after she exited 35 days of diapause. The authors claim that genes involved with olfaction in part contribute to some of the variation in this trait.

      Strengths:

      The work used the powerful platform of the fly DGRP/GWAS. The work tried to verify some of the candidate loci with targeted gene manipulations.

      Weaknesses:

      Some context is needed. Previous work from 2001 established that D. melanogaster reproductive diapause in the laboratory suspends adult aging but reduces post-diapause fecundity. The work from 2001 showed the extent fecundity is reduced is proportional to diapause duration. As well, the 2001 data showed short diapause periods used in the current submission reduce fecundity only in the first days following diapause termination; after this time fecundity is greater in the post-diapause females than in the non-diapause controls.

      The 2001 paper by Tatar et al. reports the number of eggs laid after 3, 6, or 9 weeks in diapause conditions. Thus the diapause conditions used in this study (35 days or 5 weeks) are neither short nor long, rather intermediate. Does the reviewer have a specific concern?

      In this context, the submission fails to offer a meaningful concept for what constitutes 'successful diapause'. There is no biological rationale or relationship to the known patterns of post-diapause fecundity. The phenotype is biologically ambiguous.

      We have unambiguously defined successful diapause as the ability to produce viable adult progeny post-diapause. Other groups have measured % of flies that arrest ovarian development or % of post-diapause flies with mature eggs in the ovary, or # eggs laid post-diapause; however, we suggest that # of viable adult progeny produced post-diapause is more meaningful than the other measurements from the point of view of perpetuating the species.

      I have a serious concern about the antenna-removal design. These flies were placed on cool/short days two weeks after surgery. Adults at this time will not enter diapause, which must be induced soon after eclosion. Two-week-old adults will respond to cool temperatures by 'slowing down', but they will continue to age on a time scale of day-degrees. This is why the control group shows age-dependent mortality, which would not be seen in truly diapaused adults. Loss of antennae increases the age-dependent mortality of these cold adults, but this result does not reflect an impact on diapause.

      We carried out the lifespan study under two different conditions. We either removed the antenna and moved the flies directly to 10 °C, or we removed the antenna and allowed a “wound healing” period prior to moving the flies to 10 °C (out of concern that the flies might die quickly because wound healing may be impaired at 10 °C). In both cases, antenna removal shortened lifespan. Furthermore, the lifespan extension at 10 °C was similar regardless of whether flies had experienced two weeks at 25 °C or not.

      • Appraisal of whether the authors achieved their aims, and whether the results support their conclusions.

      The work falls well short of its aim because the concept of 'successful diapause' is not biologically established. The paper studies post-diapause fecundity, and we don't know what that means. The loci identified in this analysis segregate for a minimally constructed phenotype. The results and conclusions are orthogonal.

      It is unclear to us why the reviewer has such a negative opinion of measuring post-diapause fecundity, specifically the ability to produce viable progeny post-diapause. The value of this measurement seems obvious from the point of view of perpetuating the species.

      • The likely impact of the work on the field, and the utility of the methods and data to the community.

      The work will have little likely impact. Its phenotype and operational methods are weakly developed. It lacks insight based on the primary literature on post-diapause. The community of insect diapause investigators are not likely to use the data or conclusions to understand beneficial or pest insects, or the impact of a changing climate on how they over-winter.

      The reviewer has not explained why his/her opinion is so negative.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Perform an ANOVA to estimate heritability.

      We will do this.

      (2) List the number of DGRP lines tested.

      193

      Reviewer #2 (Recommendations For The Authors):

      [Minor suggestions]

      (1) Check Drosophila italics

      We will do this.

      (2) It would be informative to include the number of DGRP lines used in this study in the Results and Methods section.

      We will include the information that we assessed 193 DGRP lines.

      (3) Figure 1C - several dots are missing at the top of the line.

      We will correct.

      (4) Figures 1E, F - Why use a discontinuous histogram for continuous distribution? Consider using a continuous histogram (e.g. Lafuente et al. (2018) Figure 1C).

      We will do this.

      (5) Figure 1F - Why have fewer bins than panel E?

      Figure 1F shows normalized post-diapause fecundity. Individual post-diapause fecundity was normalized to the mean non-diapause fecundity. The normalized individual post-diapause fecundity values were then averaged to obtain the mean normalized post-diapause fecundity for each DGRP line. The bins therefore differ from those in panel E. Please refer to Supplemental Table S1.
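      The normalization described above can be sketched in a few lines. This is only an illustration of the arithmetic as stated in the response; the function name and the example counts are hypothetical and are not taken from Supplemental Table S1.

```python
from statistics import mean


def mean_normalized_fecundity(post_diapause_counts, non_diapause_counts):
    """Mean normalized post-diapause fecundity for one DGRP line.

    Each individual post-diapause value is divided by the line's mean
    non-diapause fecundity, and the normalized values are then averaged.
    """
    baseline = mean(non_diapause_counts)
    if baseline == 0:
        raise ValueError("mean non-diapause fecundity is zero; cannot normalize")
    normalized = [count / baseline for count in post_diapause_counts]
    return mean(normalized)


# Made-up progeny counts for a single line, for illustration only.
example_value = mean_normalized_fecundity([10, 20, 30], [40, 60])
```

      Because the normalized values are ratios rather than raw progeny counts, they fall on a different scale from the values in panel E, which is consistent with the two histograms using different bins.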

      (6) Figure 2D - It would be informative to have fold enrichment stats.

      The following will be added in the methods section: The Gene Ontology (GO) categories and Q-values from the false discovery rate (FDR)-corrected hypergeometric test for enrichment are reported. Additionally, coverage ratios for the number of annotated genes in the displayed network versus the number of genes with that annotation in the genome are provided. GeneMANIA estimates Q-values using the Benjamini-Hochberg procedure.
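      For readers unfamiliar with these statistics, the sketch below shows a generic hypergeometric enrichment p-value, a coverage ratio, and Benjamini-Hochberg Q-values. It is a textbook-style illustration under stated assumptions, not GeneMANIA's actual code; all function names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail probability P(X >= k).

    N: annotated genes in the genome (background size)
    K: genes in the genome carrying the GO annotation
    n: genes in the displayed network
    k: genes in the network carrying the GO annotation
    """
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def coverage_ratio(k, K):
    """Annotated genes in the network relative to annotated genes in the genome."""
    return k / K


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Made-up p-values, for illustration only.
q_example = benjamini_hochberg([0.01, 0.04, 0.03])
```

      Fold enrichment, which the reviewer asked about, could be derived from the same four counts as (k / n) / (K / N); whether the authors report it in that form is not stated in the response.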

      (7) Supplementary table (Table S5) or supplemental table (other supplementary tables)? Need consistency (to Supplementary?)

      We will change ‘Supplementary Table S5’ to ‘Supplemental Table S5’.

      (8) Figure 5D,E - unused ticks on the x-axis.

      The unused ticks on the x-axis will be removed from Figures 5D and E.

      Reviewer #3 (Recommendations For The Authors):

      • Suggestions for improved or additional experiments, data or analyses.

      The authors cannot redo the GWAS with an alternative trait that might better reflect 'successful diapause', and I am not even sure what such a trait would involve or mean. Given this limitation, the authors should consider how they can conduct additional experiments to better define, justify, and elaborate how post-diapause reproduction relates to the mechanisms, processes, depth, and 'success' of diapause.

      We agree that it is entirely unclear what trait would be a better measure of successful diapause. Other investigators might have chosen to measure something different but there is no reason why a different choice would be a better choice. We do not believe that this is a “limitation.” We believe that we have unambiguously defined and justified post-diapause reproduction as a measurement of successful diapause with respect to perpetuating the species through a stressful period.

      • Recommendations for improving the writing and presentation.

      The mechanics of the writing are fine, aside from some typos/grammar issues. But, the paper is conceptually superficial and tautological. It claims to provide a 'stringent criterion' for 'successful diapause', then measures an unjustified trait, then claims this demonstrates variation for 'successful diapause'.

      We respectfully disagree with this opinion.

      This story is conducted without reference to prior, primary literature or on the mechanisms of reproductive diapause. The presentation may be improved by considering the literature and precedence for what and how reproductive diapause is induced, maintained, and terminated ... in many insects as well as Drosophila.

      We will revisit our citations of the literature and apologize for any inadvertent omissions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In our initial submission, reviewers highlighted that the major limitations of our study were related to both the number of minibinders tested as well as the number of optimizations we evaluated for improving minibinder function. In this revision, we have focused on expanding the minibinders tested. To do so, we selected two previously published minibinders against the epidermal growth factor receptor (EGFR). Selection of EGFR as a target enabled us to evaluate two minibinders that bind at different sites, unlike the previously evaluated binders LCB1 and LCB3 which both bind the same interface on SARS-CoV-2 Spike. Further, using EGFR as a target enabled us to qualitatively compare the efficacy of minibinder-coupled chimeric antigen receptors against an existing anti-EGFR CAR. We believe the results here demonstrate broader generalizability of our approach across binding sites, targets, and minibinders. We hope this addition is sufficient to convince future would-be users of these tools to attempt synthetic receptor engineering using minibinders against their protein of choice.

      Reviewers made comments about the presentation of flow data and the use of statistics throughout the manuscript. We did not modify how flow data are presented as the density plots we used are common throughout the field. We have opted to not include statistics – we believe that in the case of most of the experiments we show, our findings are obvious. In cases where statistics would be helpful for discerning whether subtle effects are real – for example, comparing the linker-based optimizations or comparing the anti-EGFR CARs – we believe that other experimental factors like construct expression are sufficient confounds that even in the presence of statistically significant effects we would be leading readers astray to make such claims about our data. As such, we have sought to limit the claims we make and hope that reviewers and audience agree we do not overinterpret our data without statistical support.

      On more minor points, both reviewers addressed the differences between Figures 5A and 5C. As we noted in our figure legend and in the previous response to reviews, these differences arise because the data originate from different time points of the same assay. Reviewer #2 believed we should be more staid in our comments about linker optimality, which we have addressed by changing the referenced line in the discussion. Otherwise, we have made no modifications to figures or text beyond the addition of new data.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We addressed the issue of “tolerability” in our answers to Reviewer 2 and in the revised manuscript where we had added data concerning tolerability, see the paragraph in the Results Section, page 11:

      “Finally, tolerability studies were performed with the administration of up to 20 and 40 mg/kg eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412) with n=3 for these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect such as a decrease in locomotor activity. We thus report a very interesting therapeutic index since the maximal tolerated dose (MTD) was > 40 mg/kg eq. NT, while the maximum effect is observed at a 10x lower dose of 4 mg/kg eq. NT and an ED50 established at 0.69 mg/kg as shown in Figure 1G.”

      We have slightly modified the paragraph above to emphasize that the tolerability studies were performed in “naïve mice”. 

      “Finally, tolerability studies were performed in naïve mice with the administration of up to 20 and 40 mg/kg eq. NT (i.e. 25.8 and 51.6 mg/kg of VH-N412) with n=3 for these doses. The rectal temperature of the animals did not fall below 32.5 to 33.2°C, similar to the temperature induced with the 4 mg/kg eq. NT dose. We observed no mortality or notable clinical signs other than those associated with the rapid HT effect such as a decrease in locomotor activity. We thus report a very interesting therapeutic index since the maximal tolerated dose (MTD) was > 40 mg/kg eq. NT, while the maximum effect is observed at a 10x lower dose of 4 mg/kg eq. NT and an ED50 established at 0.69 mg/kg as shown in Figure 1G.”

      We propose to add a sentence in the Results section, page 11, relative to the fact that we can also induce severe hypothermia in rats using conjugates similar to VH-N412.

      We also added in the Discussion section (page 38) that we could induce hypothermia with different conjugates in mice, rats and pigs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Some of the figures are of rather poor quality. For example, the H&E and Sirius Red stainings in Figures 3 and 4 are quite poor so it is difficult to see what is going on in the muscles. The authors should take note of another publication on dy3K/dy3K mice of similar age (PMID: 31586140) where such images are of much higher quality. Similarly, the Western blot for laminin-alpha2 (Figure 4B) of the wild-type mouse needs improvement. If the single laminin-alpha2 protein is not detected, there is an issue with the denaturation buffer used to load the protein.

      Thank you for the valuable suggestions. We have read the study on dy3K/dy3K mice of similar age (PMID: 31586140), which showed dystrophic changes in dy3K/dy3K muscle throughout the disease course using images of the whole muscle and of representative muscle areas. We have generated new, higher-quality figures that include the whole muscle and representative muscle areas for the H&E and Sirius Red stainings. However, because the images are large, we have placed them in the new Figure supplement 2 and Figure supplement 3. We have also changed the denaturation buffer used to load the protein and repeated the Western blot for laminin α2; the results for the wild-type mice (n = 3) and dyH/dyH mice (n = 3) are shown in Figure 4B.

      (2) My biggest concern is, however, the many overstatements in the manuscript and the over-interpretation of the data. This already starts with the first sentence in the abstract where the authors write: "Understanding the underlying pathogenesis of LAMA2-related muscular dystrophy (LAMA2-MD) have been hampered by lack of genuine mouse model." This is not correct as the dy3K/dy3K, generated in 1997 (PMID: 9326364), are also Lama2 knockout mice; there are also other strains (dyW/dyW mice) that are severely affected and there are the dy2J/dy2J mice that represent a milder form of LAMA2-MD. Similarly, the last two sentences of the abstract "This is the first reported genuine model simulating human LAMA2-MD. We can use it to study the molecular pathogenesis and develop effective therapies." are a clear overstatement. The mechanisms of the disease are well studied and the above-listed mouse models have been amply used to develop possible treatment options. The overinterpretation concerns the results from transcriptomics. The fact that Lama2 is expressed in particular cell types of the brain does not at all imply that Lama2 knockout mice have a defect in the blood-brain barrier as the authors state. If there are no functional data, this cannot be stated. Indications for a blood-brain barrier defect come from work in dy3K/dy3K mice (PMID: 25392494) and this needs to be written like this.

      Thank you for your comment, and we apologize for the overstatements in the manuscript. We have carefully reconsidered our previous statements and corrected them accordingly. We have changed the first sentence in the abstract to "Our understanding of the molecular pathogenesis of LAMA2-related muscular dystrophy (LAMA2-MD) requires improving". We have also replaced the last two sentences in the abstract with "In summary, this study provided useful information for understanding the molecular pathogenesis of LAMA2-MD".

      We also agree that the fact that "Lama2 is expressed in particular cell types of the brain does not at all imply that Lama2 knockout mice have a defect in the blood-brain barrier", and that the indications for a blood-brain barrier defect come from work in dy3K/dy3K mice (PMID: 25392494). Therefore, as suggested, we have replaced the overstatement with "It was reported that the deficiency of laminin α2 in astrocytes and pericytes was associated with a defective blood-brain barrier (BBB) in the dy3K/dy3K mice (Menezes et al., 2014). The defective BBB presented with altered integrity and composition of the endothelial basal lamina, reduced pericyte coverage, and hypertrophic astrocytic endfeet lacking appropriately polarized aquaporin4 channels."

      (3) Finally, the bulk RNA-seq data also needs to be presented in a disease context. The authors, again, mix up changes in expression with functional impairment. All gene expression changes are interpreted as direct evidence of an involvement of the cytoskeleton. In fact, changes in the cytoskeleton are more likely a consequence of the severe muscle phenotype and the delay in muscle development. This is particularly possible as muscle samples from 14-day-old mice are compared; a stage at which muscle still develops and grows tremendously. Thus, all the data need to be interpreted with caution.

      Thank you for your comment. We have revised the over-interpretation of the bulk RNA-seq data, and have replaced the last sentence in the Results with "These observations provided important data for the impaired muscle cytoskeleton and abnormal muscle development which were associated with the muscle pathology consequence of severe dystrophic changes in the dyH/dyH mice."

      (4) In summary, the authors need to improve data presentation and, most importantly, they need to tone down the interpretation and they must be fully aware that their work is not as novel as they present it.

      Thank you for your comments and valuable suggestions. We have revised the previous overstatements and the interpretation of the results. We are sorry that we failed to clearly present our rationale for making this mouse model. Indeed, there are many existing mouse models, all of which are important to research in the field. One of the reasons we wished to create dyH/dyH was to make a mouse model without any trace of engineering (e.g., inserted bacterial elements for knockout). By doing so, we hoped to provide a novel model suited to the development of gene-editing-based gene therapy. To this end, dyH/dyH was created to reflect the mutation hotspot region in the Chinese population. We hope you will agree with our points and see that we were not trying to belittle previous models but were simply trying to provide a different option. The overstatements were largely rooted in language barriers, and we have tried to make our statements more cautious and acceptable to readers.

      Reviewer #2 (Public Review):

      (1) The major weakness is the manuscript reads like this was the first-ever knockout mouse model generated for LAMA2-CMD. There are in fact many Lama2 knockout mice (dy, dy2J, dy3k, dyW, and more) which have all been extensively studied with publications. It is important for the authors to comment on these other published studies that have generated these well-studied mouse lines. Therefore, there is a lack of background information on these other Lama2 null mice.

      Thank you for your comment. We have added background information on these other Lama2 null mice with the sentences "The most common mouse models for LAMA2-MD are the dy/dy, dy3k/dy3k, dyw/dyw and dy2J/dy2J mice (Xu et al., 1994; Michelson et al., 1995; Miyagoe et al., 1997; Kuang et al., 1998; Sunada et al., 1995). Among them, the dy/dy, dy3k/dy3k, dyw/dyw mice present severe muscular dystrophy, and dy2J/dy2J mice show mild muscular dystrophy and peripheral neuropathy (Gawlik and Durbeej, 2020). The mutation of the dy/dy mice has been still unclear (Xu et al., 1994; Michelson et al., 1995). The dy3k/dy3k mice were generated by inserting a reverse Neo element in the 3' end of exon 4 of Lama2 gene in 1997 (Miyagoe et al., 1997), and the dyw/dyw mice were created with an insertion of lacZ-neo in the exon 1 of Lama2 gene in 1998 (Kuang et al., 1998). The dy2J/dy2J mice were generated in 1970 by a spontaneous splice donor site mutation which resulted in a predominant transcript with a 171 base in-frame deletion, leading to the expression of a truncated laminin α2 with a 57 amino acid deletion (residues 34-90) and a substitution of Gln91Glu (Sunada et al., 1995). They were established in the pre-gene therapy era, leaving trace of engineering, such as bacterial elements in the Lama2 gene locus, thus unsuitable for testing various gene therapy strategies. Moreover, insufficient transcriptomic data of the muscle and brain of LAMA2-CMD mouse models limits the understanding of disease hallmarks. Therefore, there is a need to create new appropriate mouse models for LAMA2-CMD based on human high frequently mutated region using the latest gene editing technology such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9."

      (2) The phenotypes of dyH/dyH are similar to, if not identical to dy/dy, dy2J/dy2J, dy3k/dy3k, dyW/dyW including muscle wasting, muscle weakness, compromised blood-brain barrier, and reduced life expectancy. This should be addressed, and a comparison made with Lama2 deficient mice in published literature.

      Thank you for your comment. We have added Table supplement 3 to compare dyH/dyH with other Lama2-deficient mice. We have also added the following statement to the Discussion: "Compared with other Lama2 deficient mice including dy/dy, dy2J/dy2J, dy3k/dy3k and dyW/dyW, the phenotype of the dyH/dyH mice presented with a very severe muscular dystrophy, which was similar to that of the dy3k/dy3k mice (Table supplement 3)."

      (3) Recent published studies (Chen et al., Development (2023), PMID 36960827) show loss of Itga7 causes disruption of the brain-vascular basal lamina leading to defects in the blood-brain barrier. This should be referenced in the manuscript since this integrin is a major Laminin-211/221 receptor in the brain and the mouse model appears to phenocopy the dyH/dyH mouse model.

      Thank you for your great suggestion. We have cited the published study (Chen et al., Development (2023), PMID 36960827) and added the following statements to the Discussion: "As reported, the aberrant BBB function was also associated with the adhesion defect of alpha7 integrin subunit in astrocytes to laminins in the Itga7-/- mice (Chen et al., 2023). In this study, loss of communications involving the laminins’ pathway between laminin α2 and integrins were predicted between vascular and leptomeningeal fibroblasts and astrocytes in the dyH/dyH brain, providing more evidence for the impaired BBB due to laminin α2 deficiency."

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Improve the data presentation (as mentioned above). Make a new picture of the histology; repeat the Western blots. Discuss the RNA-seq data with more caution and present it in a more attractive way. Tone down the wording.

      Thank you for your recommendations. We have revised the overstatements and improved the RNA-seq data interpretation as suggested. We have also made a new picture of the histology and repeated the Western blots.

      Reviewer #2 (Recommendations For The Authors)

      (1) There are many grammatical errors within the manuscript. The manuscript should be carefully proofread.

      Thank you for your recommendations. We have carefully corrected the grammatical errors within the manuscript.

      (2) Figure 2: The animal numbers used in this analysis were not indicated. Please include this number in the figure legend.

      Thank you for your recommendations. We have added animal numbers in the figure legends wherever applicable.

      (3) Figure 2: The forelimb grip strength is informative but has limitations. Ex vivo or in vivo muscle contractility is the gold standard for measuring muscle strength.

      Thank you for your recommendations. We agree that ex vivo or in vivo muscle contractility is the gold standard for measuring muscle strength, and we would very much like to perform this experiment. However, we regret that this test has not been completed, for the following reasons: (1) Forelimb grip strength is a classic and still commonly used method for measuring mouse muscle strength in studies of different muscular dystrophies, such as LAMA2-MD (Amelioration of muscle and nerve pathology of Lama2-related dystrophy by AAV9-laminin-αLN linker protein. JCI Insight. 2022;7(13):e158397. PMID: 35639486), Duchenne muscular dystrophy (Investigating the role of dystrophin isoform deficiency in motor function in Duchenne muscular dystrophy. J Cachexia Sarcopenia Muscle. 2022;13(2):1360-1372. PMID: 35083887), and facioscapulohumeral muscular dystrophy (Systemic delivery of a DUX4-targeting antisense oligonucleotide to treat facioscapulohumeral muscular dystrophy. Mol Ther Nucleic Acids. 2021;26:813-827. PMID: 34729250), among others. (2) Forelimb grip strength is also used in human studies (PMID: 32366821; PMID: 29313844; PMID: 34499663, etc.). For these reasons, we used forelimb grip strength to measure muscle strength and have not completed the supplementary experiment of ex vivo or in vivo muscle contractility.

      (4) Figure 3: Muscle fibrosis should be measured with a hydroxyproline assay.

      Thank you for your recommendations. We agree that the hydroxyproline assay is one of the most classic methods for evaluating collagen content as a measure of muscle fibrosis. However, we used Sirius Red staining for the following reasons: (1) Muscle fibrosis measured by Sirius Red staining can be observed more directly, and other pathological features can also be observed and compared through muscle pathology. (2) Sirius Red staining is also a classic and still commonly used method for measuring muscle fibrosis, and has been reported in mouse studies of muscle disorders, such as PMID: 22522482 (Losartan, a therapeutic candidate in congenital muscular dystrophy: studies in the dy(2J) /dy(2J) mouse. Ann Neurol. 2012;71(5):699-708.), PMID: 34337906 (Aging-related hyperphosphatemia impairs myogenic differentiation and enhances fibrosis in skeletal muscle. J Cachexia Sarcopenia Muscle. 2021;12(5):1266-1279.), and PMID: 28798156 (Phosphodiesterase 4 inhibitor and phosphodiesterase 5 inhibitor combination therapy has antifibrotic and anti-inflammatory effects in mdx mice with Duchenne muscular dystrophy. FASEB J. 2017;31(12):5307-5320.), among others. Therefore, we used Sirius Red staining to measure muscle fibrosis in this study.

      (5) Figure 8: The N=3 is very low which could result in type I or II statistical errors. A larger sample size will reduce the chance of statistical errors.

      Thank you for your recommendations. We have performed a supplementary experiment in which the number of animals in each group was increased to 6 (3 male and 3 female) to reduce the chance of statistical errors. The results were consistent with the previous data in Figure 8.

      (6) Power analysis to estimate experimental animal numbers should be reported in the manuscript.

      Thank you for your recommendations. We referred to a previous study (Power and sample size. Nature Methods. 2013;10:1139–1140), which states: “The distributions show effect sizes d = 1, 1.5 and 2 for n = 3 and α = 0.05. Right, power as function of d at four different α values for n = 3”, and “If we average seven measurements (n = 7), we are able to detect a 10% increase in expression levels (μ_A = 11, d = 1) 84% of the time with α = 0.05.” On this basis, the estimated number of experimental animals was 3 to 7. Moreover, should a larger number of experimental animals become available, we would retain the data.
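
      The relationship between effect size, sample size and power quoted above can be sketched numerically. The following is a minimal illustration, not the authors' calculation: it assumes a one-sided, one-sample z-test with known standard deviation, an assumption chosen here because it reproduces the quoted 84% figure. The function name `z_test_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(d, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test (assumed test, known SD).

    d     -- standardized effect size (difference in means / SD)
    n     -- number of measurements averaged
    alpha -- type I error rate
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)       # critical value under H0
    return NormalDist().cdf(d * sqrt(n) - z_crit)  # P(reject H0 | effect d)


if __name__ == "__main__":
    for n in (3, 7):
        for d in (1.0, 1.5, 2.0):
            print(f"n={n}, d={d}: power={z_test_power(d, n):.2f}")
```

      Under these assumptions, n = 7 with d = 1 gives a power of about 0.84, in line with the quoted sentence, whereas n = 3 with d = 1 gives a power of only about 0.53.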

      (7) It is unclear if the studies were performed with adequate rigor. Were those scoring outcome measures blinded to the treatment groups?

      Thank you for your recommendations. Those scoring the outcome measures were not blinded to the groups, which were defined by genotype. In practice, the dyH/dyH mice were easy to distinguish from the WT/Het mice because of their small body size.

      (8) Authors should appropriately cite previous studies that have generated Lama2 null mice.

      Thank you for your recommendations. We have cited previous studies that have generated Lama2 null mice with the sentence “The most common mouse models for LAMA2-MD are the dy/dy, dy3k/dy3k, dyw/dyw and dy2J/dy2J mice (Xu et al., 1994; Michelson et al., 1995; Miyagoe et al., 1997; Kuang et al., 1998; Sunada et al., 1995)”.

      (9) The number of animals should be increased to reduce the chance of statistical error.

      Thank you for your recommendations. We have performed a supplementary experiment in which the number of animals in each group was increased to reduce the chance of statistical error.

      (10) A power analysis should be performed to determine the number of experimental animals.

      Thank you for your recommendations. We have performed a power analysis to determine the number of experimental animals as mentioned above.

      (11) There are many grammatical errors within the manuscript. The manuscript should be carefully proofread.

      Thank you for your recommendations. We have carefully corrected the grammatical errors within the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1:

      (1) General comment: The evidence for these highly novel, potentially interesting roles (of the exocyst) would need to be more compelling to support direct involvement.

      We wish to thank the reviewer for his/her comments, and for considering that the proposed functions are highly novel and potentially interesting. To strengthen the evidence supporting the new roles of the exocyst, we have performed a number of additional experiments that are depicted in novel figures or figure panels of the new version of the manuscript. Particularly, we aimed at providing further support of the direct involvement of the exocyst in different steps of the regulated secretory pathway. Please see the details below.

      (2) For instance, the localization of exocyst to Golgi or to granule-granule contact sites does not seem substantial.

      We have performed quantitative colocalization studies, as suggested by the reviewer, to further substantiate our initial findings. We have carefully analysed GFP-Sec15 distribution in relation to the Golgi complex and secretory Glue granules at relevant time points of salivary gland development. Overall, we found that GFP-Sec15 distribution is dynamic during salivary gland development. Before Glue synthesis (72 h AEL), Sec15 was observed in close association (defined as a distance equal to or less than 0.6 µm) with the Golgi complex (please see Author response image 1 below). This association was lost once Glue granules had begun to form (96 h AEL). Importantly, we do not see relevant association between GFP-Sec15 and the ER (please see Author response image 2). These observations support our conclusion that the exocyst plays a role at the Golgi complex. New images supporting these conclusions, as well as quantitative data, have been included in Figure 5 of the new version of the manuscript. In addition, real time imaging, as well as 3D reconstruction analyses, confirming the close association between Sec15 and Golgi cisternae are now included in the manuscript. Please see Supplementary Videos 1-3. These new data are described in text lines 200-210 of the Results section and text lines 359-368 of the Discussion section.

      Interestingly, at the time when Sec15-Golgi association is lost (96 h AEL), Sec15 foci associate instead with newly formed secretory granules (< 1µm diameter). This association persists during secretory granule maturation (100-116 h AEL), when Sec15 foci localize specifically in between neighbouring, immature secretory granules. When maturation has ended and Glue granule exocytosis begins (116-120 h AEL), this localization between granules is lost. These observations are consistent with a role of the exocyst in homotypic fusion during SG maturation. We have included new images showing that association between Sec15 and secretory granules is dynamic and depends on the developmental stage. We have quantified this association both during maturation and at a stage when SGs are already mature. We have in addition performed a 3D reconstruction analysis of these images to confirm the close association between Sec15 and immature SGs. These new data are now depicted in Figure 7BC, Supplementary Videos 4-5, and described in text lines 216-221 of the Results section. In addition, a lower magnification image is provided below in this letter (Author response image 3), quantifying the proportion of Sec15 foci localized in between SGs (yellow arrows) relative to the total number of Sec15 foci (yellow arrows + green arrowheads).

      Author response image 1.

      Criteria used to define Sec15 foci that were “associated” or “not associated” with the trans-Golgi network in the experiments of Figure 5C-E of the manuscript. When the distance between maximal intensities of GFP-Sec15 and Golgi-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated” (upper panels). When the distance was more than 0.6 µm, the signals were considered “not associated” (lower panels).

      Author response image 2.

      Criteria used to define Sec15 foci that were “associated” or “not associated” with the ER in the experiments of Figure 5A-B of the manuscript. When the distance between maximal intensities of GFP-Sec15 and KDEL-RFP signals was equal to or less than 0.6 µm, the signals were considered “associated”. When the distance was more than 0.6 µm, the signals were considered “not associated”.
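
      The distance criterion described in the two captions above can be expressed as a short sketch. This is an illustration only: the 0.6 µm threshold comes from the captions, while the function names, the coordinate representation of intensity maxima, and the example coordinates are hypothetical and do not come from the manuscript.

```python
from math import dist

ASSOCIATION_THRESHOLD_UM = 0.6  # threshold stated in the captions


def is_associated(sec15_peak_um, marker_peak_um,
                  threshold_um=ASSOCIATION_THRESHOLD_UM):
    """True if two intensity maxima lie within the threshold distance.

    Each peak is an (x, y) or (x, y, z) position in micrometres.
    """
    return dist(sec15_peak_um, marker_peak_um) <= threshold_um


def percent_associated(peak_pairs):
    """Percentage of (Sec15 peak, marker peak) pairs scored as associated."""
    if not peak_pairs:
        raise ValueError("no foci to score")
    hits = sum(is_associated(a, b) for a, b in peak_pairs)
    return 100.0 * hits / len(peak_pairs)


if __name__ == "__main__":
    # Hypothetical example coordinates (µm), for illustration only.
    pairs = [((0.0, 0.0), (0.3, 0.4)),   # 0.5 µm apart -> associated
             ((1.0, 1.0), (1.0, 1.7)),   # 0.7 µm apart -> not associated
             ((2.0, 2.0), (2.1, 2.0))]   # 0.1 µm apart -> associated
    print(f"{percent_associated(pairs):.1f}% associated")
```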

      Author response image 3.

      (A) GFP-Sec15 foci (cyan) and SGs (red) are shown in cells bearing immature SGs or (B) mature SGs. Yellow arrows indicate GFP-Sec15 foci localized in between SGs; green arrowheads indicate GFP-Sec15 foci that are not in between SGs. (C) Quantification of the percentage (%) of Sec15 foci localized in between SGs relative to the total number of Sec15 foci in cells filled with immature SGs (ISG) vs cells with mature SGs (MSG).

      It is interesting to mention that previous evidence from mammalian cultured cells (Yeaman et al, 2001) shows that the exocyst localizes both at the trans-Golgi network and at the plasma membrane, weighing in favour of our claim that the exocyst is required at various steps of the exocytic pathway. Thus, the exocyst may play multiple roles in the secretion pathway in other biological models as well. This concept has now been included in the Discussion section of the revised version of the manuscript (lines 359-368).

      To make the conclusions of our work clearer, in the revised version of the manuscript, we have now included a graphical abstract, summarizing the dynamic localization of the exocyst in relation to the processes of SG biogenesis, maturation and exocytosis reported in our work. 

      (3) Instead, it is possible that defects in Golgi traffic and granule homotypic fusion are not due to direct involvement of the exocyst in these processes, but secondary to a defect in canonical exocyst roles at the plasma membrane. A block in the last step of glue exocytosis could perhaps propagate backward in the secretory pathway to disrupt Golgi complexes or cause poor cellular health due to loss of cell polarity or autophagy.

      We thank the reviewer for these thoughtful comments. We have performed a number of additional experiments to assess “cellular health” or to identify possible defects in cell polarity after knock-down of exocyst subunits. These new data have been included in new supplementary figures 5 and 6 of the revised version of the manuscript (please see below). 

      In our view, the precise localization of GFP-Sec15 at the Golgi complex (Figure 5C-E), as well as in between immature secretory granules (Figure 7B-D), argues in favour of a direct involvement of the exocyst in SG biogenesis and homofusion respectively. 

      We truly appreciate the reviewer's comment raising the possibility that the defects we observe at early steps of the pathway (SG biogenesis and SG maturation) may actually stem from a backward effect of the role of the exocyst in SG-plasma membrane tethering. We wish to respectfully point out that the processes of biogenesis, maturation and plasma membrane tethering/fusion of SGs do not occur simultaneously in the Drosophila larval salivary gland in vivo, as they do in other secretory model systems (e.g. cell culture). In this regard, the experimental model is unique in terms of synchronization. In each cell of the salivary gland, the three processes (biogenesis, maturation and exocytosis) occur sequentially and are controlled by developmental cues. At the developmental stage when SGs fuse with the plasma membrane, SG biogenesis has already ceased many hours earlier: SG biogenesis occurs at 96-100 hours after egg lay (AEL), SG maturation takes place at 100-112 hours AEL, and SG-plasma membrane fusion happens only when all SGs have undergone maturation and are ready to fuse with the plasma membrane at 116-120 h AEL. Thus, in our view it is not conceivable that a defect in SG-plasma membrane tethering/fusion (116-120 h AEL) could retroactively affect the processes of SG biogenesis or SG maturation, which occurred earlier in development (96-112 h AEL).

      As suggested by the reviewer, we have analysed several markers of cellular health and cell polarity, comparing conditions of exocyst subunit silencing (exo70RNAi, sec3RNAi or exo84RNAi) with wild type controls (whiteRNAi). These new data are depicted in Supplementary Figures 5 and 6, and described in lines 172-179 of the Results section of the revised version of the manuscript. Of note, for these experiments we applied silencing conditions that block secretory granule maturation, resulting in mostly immature SGs. Our analyses included the subcellular distribution of 1) PI(4,5)P2, 2) the tetraspanin CD63, 3) Rab11, 4) filamentous actin, and 5) CD8. We have also compared 6) nuclear size and general nuclear morphology, 7) the number and distribution of mitochondria, and the morphology and subcellular distribution of the 8) cis- and 9) trans-Golgi networks. Finally, 10) we have compared basal autophagy in salivary cells with or without knocking down exocyst subunits. The markers that we analysed behaved similarly to those of control salivary glands, suggesting that the observed defects in regulated exocytosis indeed reflect different roles of the exocyst in the secretory pathway, rather than poor cellular health or impaired cell polarity.

      Our conclusions are in line with previous studies in which apico-basal polarity, Golgi complex morphology and distribution, as well as apical membrane trafficking were also evaluated in exocyst mutant backgrounds, finding no anomalies (Jafar-Nejad et al, 2005). 

      Conversely, in studies in which apical polarity was disturbed by interfering with Crumbs levels, SG biogenesis, maturation and exocytosis were not affected (Lattner et al, 2019), indicating that these processes do not necessarily interfere with one another.

      (4) Final recommendation: In the absence of stronger evidence for these other exocyst roles, I would suggest focusing the study on the canonical role (interesting, as it was previously reported that Drosophila exocyst had no function in the salivary gland and limited function elsewhere [DOI: 10.1034/j.1600-0854.2002.31206.x]), and leave the alternative roles for discussion and deeper study in the future.  

      We appreciate the reviewer's recommendation. However, we believe that the major strength of our work is the discovery of non-canonical roles of the exocyst complex, unrelated to its function as a tethering complex for vesicle-plasma membrane fusion. We believe that in the new version of our manuscript, we provide stronger evidence supporting the two novel roles of the exocyst:

      a) Its participation in maintaining the normal structure of the Golgi complex, and b) Its function in secretory granule maturation.

      Reviewer 2:

      (5) General comment: A key strength is the breadth of the assays and study of all 8 exocyst subunits in a powerful model system (fly larvae). Many of the assays are quantitated and roles of the exocyst in early phases of granule biogenesis have not been ascribed. 

      We are grateful that the reviewer appreciates the novelty of our contribution.

      (6) However there are several weaknesses, both in terms of experimental controls, concrete statements about the granules (better resolution), and making a clear conceptual framework. Namely, why do KD of different exocysts have different effects on presumed granule formation

      The reviewer has raised a point that is central to the interpretation of all our data throughout the manuscript. The short answer is that the extent of RNAi-dependent silencing of exocyst subunits determines the phenotype: 

      1) Maximum silencing affects Golgi complex morphology and prevents SG biogenesis. 2) Intermediate silencing blocks SG maturation, without affecting Golgi complex morphology and SG biogenesis. 3) Weak silencing blocks SG tethering and fusion with the plasma membrane, without affecting Golgi complex morphology, SG biogenesis or SG maturation. 

      In other words, 1) Low levels of exocyst subunits are sufficient for normal Golgi complex morphology and SG biogenesis. 2) Intermediate levels of exocyst subunits are sufficient for SG maturation (and also sufficient for SG biogenesis). 3) High levels of exocyst subunits are required for SG tethering and subsequent fusion with the plasma membrane. 

      Based on the above notion, we have exploited the fact that temperature can fine-tune the level of Gal4/UAS-dependent transcription, thereby achieving different levels of silencing, as shown by Norbert Perrimon et al in their seminal paper “the level of RNAi knockdown can also be altered by using Gal4 lines of various strengths, rearing flies at different temperatures, or via coexpression of UAS-Dicer2” (Perkins et al, 2015). 

      We found in our system that indeed, by applying appropriate silencing conditions (RNAi line and temperature) to any of the eight subunits of the exocyst, we have been able to obtain one of the three alternative phenotypes: Impaired SG biogenesis, or impaired SG maturation, or impaired SG tethering/fusion with the plasma membrane.
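
      The dose-dependence argument above can be made concrete with a small sketch. It is purely illustrative: the response only states an ordering of requirements (Golgi structure and SG biogenesis need the least exocyst activity, SG maturation an intermediate amount, and tethering/fusion the most). The numeric thresholds, the function name and the normalized "remaining level" scale below are hypothetical assumptions, not measured values.

```python
# Hypothetical thresholds on a 0-1 scale of remaining exocyst subunit level.
# Only their ordering (biogenesis < maturation < fusion) reflects the model.
BIOGENESIS_MIN = 0.2   # assumed: below this, Golgi structure/SG biogenesis fail
MATURATION_MIN = 0.5   # assumed: below this, SG maturation fails
FUSION_MIN = 0.8       # assumed: below this, SG tethering/fusion fails


def predicted_phenotype(remaining_level):
    """Map a remaining subunit level (0 = full silencing, 1 = wild type)
    to the earliest step of the pathway expected to fail."""
    if not 0.0 <= remaining_level <= 1.0:
        raise ValueError("remaining_level must be between 0 and 1")
    if remaining_level < BIOGENESIS_MIN:
        return "impaired SG biogenesis"            # maximum silencing
    if remaining_level < MATURATION_MIN:
        return "impaired SG maturation"            # intermediate silencing
    if remaining_level < FUSION_MIN:
        return "impaired SG tethering/fusion"      # weak silencing
    return "normal secretion"


if __name__ == "__main__":
    for level in (0.1, 0.35, 0.65, 0.95):
        print(f"remaining level {level:.2f}: {predicted_phenotype(level)}")
```

      Because the steps occur sequentially, the sketch reports only the earliest step that fails: when biogenesis is blocked, later defects cannot be observed in the same cell.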

      These concepts are summarized below in Author response image 4. Please see also at point 26, the general comment of Reviewer #3. 

      We have conducted qRT-PCR assays to provide experimental support to the notions summarized above in Author response image 4. We measured the remaining levels of mRNAs of some of the exocyst subunits, after inducing RNAi-mediated silencing at different temperatures, or with different RNAi transgenic lines. The remaining RNA levels after silencing correlate well with the observed phenotypes, following the predictions of Author response image 4 and summarized in Author response image 5. These new data are now shown in Supplementary Figure 2 of the revised version of the manuscript, and described in lines 153-159 at the Results section.

      (7) Why does just overexpression of a single subunit (Sec15) induce granule fusion?

      The reviewer raises a very important point. Based on available data from the literature, Sec15 behaves as a seed for assembly of the holocomplex and it also mediates the recruitment of the holocomplex to SGs through its interaction with Rab11 (Escrevente et al, 2021; Bhuin and Roy, 2019; Wu et al, 2005; Zhang et al, 2004; Guo et al, 1999). Thus, overexpression of Sec15 is expected to enhance exocyst assembly, thereby potentiating the activities carried out by the complex in the cell, including SG homofusion. In the revised version of the manuscript we have also performed the overexpression of Sec8, finding that, unlike Sec15, Sec8 fails to induce homotypic fusion. These results were expected, as they confirm that Sec8 does not behave as a seed for mounting the whole complex. These new data have been included in Figure 7E-H, and are described in text lines 221-229 of the Results section. 

      Author response image 4.

      Conceptual model of RNAi expression at different temperatures , remaining levels of mRNA/protein levels and phenotypes obtained at each temperature.

      Author response image 5.

      qRT-PCR assays presented in Supplementary Figure 2 are shown in combination with the phenotypes observed at each of the conditions analyzed. Note the correlation between phenotypes and the extent of mRNA downregulation.

      (8) While the paper is fascinating, the major comments need to be addressed to really be able to make better sense of this work, which at present is hard to disentangle direct vs. secondary effects, especially as much of the TGN seems to be altered in the KDs.  

      We hope that our response to point 6) has helped to clarify this important point raised by the Reviewer. After applying silencing conditions where normal structure of the trans-Golgi network is impaired, SG biogenesis does not occur. Thus, since SGs do not form, it is not conceivable to detect defects in SG maturation or SG fusion with the plasma membrane in the same cell.

      (9) The authors conveniently ascribe many of the results to the holocomplex, but their own data (Fig. 4 and Fig. 6) are at odds with this.

      This is another central point of our work, so we thank the reviewer for his/her comment. In Figures 4A, 7A and 9A of the revised version of the manuscript, we show that, by inducing appropriate levels of silencing of any of the 8 subunits of the exocyst, each of the three alternative phenotypic manifestations can occur. In our opinion, this argues in favour of a function for the whole exocyst complex in each of the three specific activities proposed in our study: 1) SG biogenesis, 2) SG maturation, and 3) SG tethering/fusion with the plasma membrane. In detailed characterizations of these three phenotypes performed throughout the study, we decided to induce silencing of just two or three of the subunits of the exocyst, assuming that the whole complex accounts for the mechanisms involved.

      Major comments

      (10) Resolution not sufficient. Identification of "mature secretory granules" (MSG) in Fig. 3 is based on low-resolution images in which the MSG are not clearly seen (see control in Fig. 3A) and rather appear as a diffuse haze, and not as clear granules. There may be granules here, but as shown it is not clear. Thus it would be helpful to acquire images at higher resolution (at the diffraction limit, or higher) to see and count the MSG.

      We thank the reviewer for raising this point, as it may not be straightforward for the reader to identify the SGs throughout the figures of our study. To make it clearer, in Figure 3A (magnified insets on the right), we have outlined individual SGs with a green dotted line, and included diagrams (far right), which we hope will help the identification of SGs. In Figure 3B, we show that after silencing Sec84, a mosaic phenotype was observed: in some cells SGs fail to undergo maturation and remain smaller than normal. In other cells of this mosaic phenotype, biogenesis of SGs was impaired and the fluorescent cargo remained trapped in a mesh-like structure (which we later show corresponds to the ER). The dotted line marks individual SGs, and the diagrams included on the right are intended to help the interpretation of the phenotype. The mesh-like structures where Sgs3-GFP was retained are also marked with a dotted line, and schematized on the right. These new schemes are described in the Figure 3 caption of the revised version of the manuscript.

      We wish to mention that all the confocal images depicted in this figure and throughout the manuscript have been captured at high resolution, with a theoretical resolution limit of 168-177 nm (d = λ/2NA). Given that secretory granules range from 0.8 to 7 µm in diameter, the resolution is more than sufficient to clearly resolve these structures.

      (11) Note: the authors are not clear on which objective was used. Maybe the air objective as the resolution appears poor).  

      In this particular figure, we have utilized a Plan-Apochromat 63X/1.4NA oil objective of the inverted Carl Zeiss LSM 880 confocal microscope (mentioned in materials and methods).
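
      The resolution estimate given in the response to point 10 follows from the Abbe diffraction limit, d = λ/(2·NA), where λ is the wavelength of light and NA the numerical aperture of the objective. The sketch below works through this formula for the 1.4 NA objective mentioned above. The 488 nm wavelength is an assumption made for illustration (a common choice for GFP imaging); the response does not state which wavelengths were used, and the function name is hypothetical.

```python
def abbe_limit_nm(wavelength_nm, numerical_aperture):
    """Lateral diffraction limit d = wavelength / (2 * NA), in nanometres."""
    if wavelength_nm <= 0 or numerical_aperture <= 0:
        raise ValueError("wavelength and NA must be positive")
    return wavelength_nm / (2 * numerical_aperture)


if __name__ == "__main__":
    NA = 1.4                      # Plan-Apochromat 63X/1.4NA oil objective
    assumed_wavelength_nm = 488   # assumption: not stated in the response
    d = abbe_limit_nm(assumed_wavelength_nm, NA)
    print(f"d = {d:.1f} nm")      # about 174 nm

    smallest_granule_nm = 800     # 0.8 µm, lower end of the stated SG size range
    print(f"smallest SG is {smallest_granule_nm / d:.1f}x the limit")
```

      With NA = 1.4, a limit between 168 and 177 nm corresponds to wavelengths of roughly 470 to 496 nm, so the smallest granules (0.8 µm) are more than four times larger than the estimated limit.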

      (12) They need to prove that the diffuse Sgs3-GFP haze is indeed due to MSG.  

      If we interpret correctly the concern of the reviewer, what he/she calls “diffuse haze” is actually the distribution of Sgs3-GFP within individual SGs, which, as previously reported by other authors, is not homogeneous at this stage (Syed et al. 2022). We hope that the diagrams that we have included in Figure 3 A, B (point 10) will help the readers interpreting the images.   

      (13) Related it is unclear what are the granule structures that correspond to Immature secretory granules (ISG) and cells with mesh-like structures (MLS)?

      We are confident that the diagrams now included in Figure 3A and B will help the interpretation, and particularly to identify immature granules and the mesh-like structure generated after silencing of exocyst subunits.

      (14) Similarly, Sgs3 images of KD of 8 exocyst subunits were interpreted to be identical, in Fig. 4, but the resolution is poor.

      We hope that the issue related to the resolution of our images has been properly addressed in the response to point 10) of this letter. In Figure 4A, we show that after silencing of any of the 8 subunits (with the appropriate conditions), in all cases SG biogenesis was impaired, and Sgs3-GFP was instead retained in a mesh-like structure. Images obtained after silencing different exocyst subunits are of course not identical, but in all cases, a mesh-like structure has replaced the formation of SGs (Figure 4A). Hopefully, the diagrams now included in Figure 3A and B help the correct interpretation of the phenotypes throughout the study.

      To demonstrate that the structure in which Sgs3-GFP was retained upon exocyst complex knockdown corresponds to the ER, we performed a colocalization analysis between Sgs3-GFP and the ER markers GFP-KDEL or Bip-sfGFP-HDEL, after which we calculated Pearson's coefficient, which indicated substantial colocalization (Figure 4B-G and Supplementary Figures 7 and 8). These new data are described in lines 196-199 of the revised version of the manuscript. To facilitate the visualization of the results, in the revised version of the manuscript we have included magnified cropped areas of the images shown in Figure 4A.

      (15) What is remarkable is a highly variable effect of different subunit KD on the percentage of cells with MLS (Fig. 4C). Controls = 100%, Exo70 = ~75% (at 19 deg), Sec3 = ~30%, Sec10 = 0%, Exo84 = 100% ... This is interesting, for the functional exocyst is an octameric holocomplex, thus why the huge subunit variability in the phenotypes? The trivial explanation is either: i) variable exocyst subunit KD (not shown) or ii) variability between experiments (no error bars are shown). Both should be addressed by quantification of the KD of different proteins and secondly by replicating the experiments.

      We agree with the reviewer's statement. We believe that both variability of KD efficiency (i) and variability between experiments (ii) contribute to the variable effect observed after knocking down the different subunits. As detailed in the response to point 6), we have performed qRT-PCR determinations to confirm that the severity of the phenotype depends on the efficiency of RNAi-mediated silencing. We chose to analyse in detail the effect on the subunits exo70 and sec3, which were those with the highest phenotypic differences between the three silencing temperatures utilized. We found that, as expected, the levels of silencing were temperature-dependent, being higher at 29°C and lower at 19°C. These data are included in Supplementary Figure 2, described in lines 153-159 of the Results section, and summarized in Author response images 4 and 5 of this rebuttal letter.

      We thank the reviewer for his/her comment on the replication of experiments and statistics. We failed to include detailed numerical information in the original submission, such as the number of replicas and standard deviations of the data depicted in Figure 3C and Supplementary Figure 1, so we apologize for this omission. In the revised version of the manuscript, we have included a table (Supplementary Table 3) in which all the raw data of Figure 3C and Supplementary Figure 1, including standard deviations, are now depicted.

      (16) If their data holds up then the underlying mechanism here needs to be considered.

      (Note: there is some precedent from the autophagy field of differential exocyst effects)

      Our proposed mechanism is essentially that the holocomplex is required for multiple processes along the secretory pathway. Each of these actions (Golgi structure maintenance, SG maturation and SG tethering/fusion with the plasma membrane) requires a different amount of holocomplex activity, which is why each phenotype manifests at a different level of RNAi-mediated silencing (Author response image 4 of this letter). The model predicts that Golgi structure maintenance requires minimal levels of complex activity, and that is why strong knock-down of exocyst subunits is required to obtain this phenotype. In line with our results, it has been reported that other tethering complexes of the CATCHR family are also required for keeping Golgi cisternae stuck together (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). One possibility is that the exocyst may play a redundant role in the maintenance of the normal structure of the Golgi complex, along with other CATCHR complexes. This potential redundancy could explain why severe exocyst knock-down is required to observe structural anomalies at this organelle. At the other end of the spectrum, we propose that tethering/fusion with the plasma membrane is very susceptible to even a slight reduction of complex activity, so that mild RNAi-mediated silencing is sufficient to provoke defects in this process. This proposed model is depicted in Author response image 4 and discussed in lines 395-405 of the Discussion section.

      (17) In the salivary glands the authors state that the exocyst is needed for Sgs3-GFP exit from the ER. First, Pearson's coefficient should be shown so as to quantitate the degree of ER localizations of all KDs.

      We thank the reviewer for this comment, which helped us to strengthen the observation that when SG biogenesis is impaired, Sgs3-GFP remains trapped in the ER. In the revised version of the manuscript, we have calculated Pearson’s coefficient to assess colocalization between ER markers (GFP-KDEL or Bip-sfGFP-HDEL) and Sgs3-GFP in salivary gland cells that express sec15RNAi. The Pearson’s coefficient was around 0.6 for both ER markers, indicating that colocalization with Sgs3-GFP was substantial (Supplementary Figure 8, text lines 196-199 of the Results section).
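
      As an illustration of the quantity reported above, the following is a minimal sketch of how a Pearson's coefficient between two fluorescence channels can be computed. It is not the authors' analysis pipeline: the function name `pearson_coefficient`, the flat lists of pixel intensities and the synthetic values are all assumptions made for the example.

```python
import math
import random


def pearson_coefficient(pixels_a, pixels_b):
    """Pearson's r between two equally sized sequences of pixel intensities."""
    if len(pixels_a) != len(pixels_b) or len(pixels_a) < 2:
        raise ValueError("need two equally sized sequences with at least 2 pixels")
    mean_a = sum(pixels_a) / len(pixels_a)
    mean_b = sum(pixels_b) / len(pixels_b)
    # Covariance and variances (unnormalised; the normalisation cancels out).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(pixels_a, pixels_b))
    var_a = sum((a - mean_a) ** 2 for a in pixels_a)
    var_b = sum((b - mean_b) ** 2 for b in pixels_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / math.sqrt(var_a * var_b)


# Synthetic, made-up intensities: the "cargo" channel partially follows the
# "ER" channel, so the two signals are positively but not perfectly correlated.
rng = random.Random(0)
er_channel = [rng.random() for _ in range(4096)]
cargo_channel = [0.6 * e + 0.4 * rng.random() for e in er_channel]

r = pearson_coefficient(er_channel, cargo_channel)
print(f"Pearson's r = {r:.2f}")
```

      Values close to 1 indicate that the two signals vary together pixel by pixel, values near 0 indicate no linear relationship, and negative values indicate mutual exclusion. In practice the calculation is usually restricted to a region of interest rather than the whole field.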

      (18) Second, there should be some rescue performed (if possible) to support specificity. 

      As suggested by the reviewer, we have performed a rescue experiment of the phenotype provoked by the expression of sec15 RNAi, which consisted of the retention of Sgs3-GFP in the endoplasmic reticulum. Expression of Sec15-GFP substantially reverted the ER retention phenotype, rescuing SG biogenesis and also SG maturation in most cells (over 60% of the cells). These new data are now shown in Supplementary Figure 4, and described in lines 168-171 of the Results section.

      (19) Third, importantly other proteins that should traffic to the PM need to be shown to traffic normally so as to rule out a non-specific effect.

      We have addressed this issue (also mentioned by Reviewer #1), by analyzing the localization of a number of polarization markers, finding that the overall polarization of the cell was not affected by loss of function of exocyst subunits. Please, see our response to the point 3) raised by Reviewer #1. The new data showing cell polarization markers are shown in Supplementary Figure 6 of the revised version of the manuscript, and described on text lines 172-179 of the Results section.

      (20) It is unclear from their model (Fig. 5) why after exocyst KD of Sec15 the cis-Golgi is more preserved than the TGN, which appears as large vacuoles. This is not quantitated and not shown for the 8 subunits.

      We thank the reviewer for this relevant comment. We agree that the phenotype of either sec15 or sec3 loss-of-function cells manifests differently with cis-Golgi and trans-Golgi markers. While the cis-Golgi marker looked fragmented and aggregated, the trans-Golgi marker adopted a swollen appearance. However, in our view, the different appearance of the two markers does not necessarily imply that one compartment is more preserved than the other. In the revised version of the manuscript, we have quantified the penetrance of the phenotypes provoked by sec15 or sec3 silencing, using both cis-Golgi and trans-Golgi markers. In both cases, the penetrance was high, although even higher with the trans-Golgi marker. These new data are now depicted in Supplementary Figure 9 of the revised version of the manuscript.

      It is interesting to mention that in HeLa cells, as well as in the retinal epithelial cell line hTERT, Golgi phenotypes similar to those we have described here have been reported after loss-of-function of other tethering complexes, which were shown to keep the Golgi cisternae held together, including the COG and GARP complexes (D'Souza et al, 2020; Khakurel and Lupashin, 2023; Liu et al, 2019). As throughout our work, not every aspect of the analysis included the silencing of all eight subunits. In this case, we chose to silence Sec3 and Sec15. Please note that we have modified the model depicted in Figure 6E-F, to highlight the cis- and trans-Golgi phenotypes upon exocyst knock-down, as well as the localization of the exocyst in cisternae of the Golgi complex.

      (21) Acute/Chronic control: It would be nice to acutely block the exocyst so as to better distinguish if the effects observed are primary or secondary effects (e.g. on a recycling pathway).

      We thank the reviewer for raising this important issue. To address this point, and to be able to induce silencing of exocyst subunits at specific time intervals of larval development, we utilized a strategy based on a thermosensitive variant of the Gal4 inhibitor Gal80 (Gal80ts) (Lee and Luo, 1999). We blocked Gal4 activity (and therefore RNAi expression) by maintaining the larvae at 18 °C during the 1st and 2nd instars (until 120 hours after egg lay), and then induced the activity of Gal4 specifically at the 3rd larval instar by raising the temperature to 29 °C, a condition in which Gal80ts becomes inactive. After silencing the expression of sec3 or sec15 at the 3rd larval instar only, the phenotype was very similar to that observed after chronic silencing of exocyst subunits (larvae maintained at 29 °C throughout development, where Gal4 was never inhibited). These observations suggest that the defects observed in the secretory pathway after knock-down of exocyst subunits reflect genuine functions of the exocyst in this pathway, rather than a secondary effect derived from impaired development of the salivary glands at early larval stages. These new results are now shown in Supplementary Figure 3, and described in manuscript lines 160-171 of the Results section.

      (22) Granule homotypic fusion. Strangely over-expression of just one subunit, Sec15-GFP, made giant secretory granules (SG) that were over 8 microns big! Why is that, especially if normally the exocyst is normally a holocomplex. Was this an effect that was specific to Sec15 or all exocyst subunits? Is the Sec15 level rate limiting in these cells? It may be that a subcomplex of Sec15/10 plays earlier roles, but in any case this needs to be addressed across all (or many) of the exocyst subcomplex members.

      Please, see our response to point 7) of this letter. Sec15 is believed to act as a seed for the formation of the whole complex.

      (23) In summary, there are clearly striking effects on secretory granule biogenesis by dysfunction of the exocyst, however right now it is hard to disentangle effects on ER-Golgi traffic, loss of the TGN, and a problem in maturation or fusion of granules.

      As discussed in detail in our response to point 3 raised by Reviewer #1, the secretory pathway is highly synchronized in each of the cells of the Drosophila salivary gland. SG biogenesis, SG maturation and SG fusion with the plasma membrane never occur simultaneously in the same cell. Thus, in a cell in which ER-Golgi traffic is impaired (and SG biogenesis does not occur), SGs do not exist, and therefore, they cannot exhibit defects in the process of maturation or fusion with the plasma membrane. In summary, we believe that our work has shown that in Drosophila larval salivary glands the exocyst holocomplex is required for (at least) three functions along the secretory pathway: 1) To maintain the appropriate Golgi complex architecture, thus enabling ER-Golgi transport; 2) For secretory granule maturation: both homotypic fusion and acquisition of maturation factors; 3) For secretory granule exocytosis: secretory granule tethering to enable subsequent fusion with the plasma membrane. As mentioned above (point 6 of this letter), these three functions require different amounts of the holocomplex, and therefore can be revealed by inducing different levels of silencing.

      (24) It is also confusing if the entire exocyst holocomplex or subcomplex plays a key role 

      The fact that, by silencing any of the subunits (under the appropriate conditions), it is possible to obtain any of the 3 phenotypes (impaired SG biogenesis, impaired SG maturation or impaired SG fusion with the plasma membrane) argues in favour of a function of the complex as a whole in each of these three functions.

      Reviewer 3:

      (25) General comment: Freire and co-authors examine the role of the exocyst complex during the formation and secretion of mucins from secretory granules in the larval salivary gland of Drosophila melanogaster. Using transgenic lines with a tagged Sgs3 mucin the authors KD expression of exocyst subunit members and observe a defect in secretory granules with a heterogeneity of phenotypes. By carefully controlling RNAi expression using a Gal4-based system the authors can KD exocyst subunit expression to varying degrees. The authors find that the stronger the inhibition of expression of exocyst the earlier in the secretory pathway the defect. The manuscript is well written, the model system is physiological, and the techniques are innovative.

      We appreciate the reviewer’s assessment of our work.

      (26) My major concern is that the evidence underlying the fundamental claim of the manuscript that "the exocyst complex participates" in multiple secretory processes lacks direct evidence.

      We thank the reviewer for raising this important issue. We believe that the analysis of Sec15 subcellular localization during salivary gland development (Figures 5, 7B-D and 9E-F), in combination with the detailed analysis of the phenotypes provoked by loss-of-function of each of the exocyst subunits, provides evidence supporting multiple functions of the exocyst in the secretory pathway. We have also included 3D reconstructions and videos of GFP-Sec15 colocalization with Golgi and SG markers to support exocyst localization associated with these structures (Supplementary Videos 1-7; text lines 200-210, 216-221 and 303-305).

      (27) It is clear from multiple lines of evidence, which are discussed by the authors, that exocyst is essential for an array of exocytic events. The fundamental concern is that loss of homeostasis on the plasma membrane proteome and lipidome might have severe pleiotropic effects on the cell.

      We agree with the reviewer that this is an important point that needed to be addressed. As discussed in detail above at the response to point 3 raised by Reviewer #1, we have analysed several plasma membrane markers (including a PI(4,5)P2 lipid reporter), and found that overall, plasma membrane integrity and polarity were not substantially affected (Supplementary Figure 6). In addition, we have analyzed several markers of general cellular “health” that indicate that salivary gland cells do not seem to be distressed by the reduction of exocyst complex activity (Supplementary Figure 5). These new data are described in lines 172-179 of the Results section.

      (28) Perhaps the authors have more evidence that exocyst is important for homotypic fusion of the SGs, as supported by the localisation of Sec15 on the fusion sites.

      We believe that the fact that, by silencing any of the exocyst subunits (under the appropriate conditions), immature smaller-than-normal granules were observed, argues in favour of the exocyst as a whole participating in SG homotypic fusion (Figure 7A). In addition, we have included more images, quantifications, 3D reconstructions and videos of GFP-Sec15 localized just at the contact sites between immature SGs. We have quantified and compared GFP-Sec15 localization at immature SGs vs its localization at mature SGs, finding that it localizes preferentially at immature SGs, supporting a role of the exocyst as a tethering complex during homotypic fusion (shown in Figure 7B-C and Supplementary Videos 4-6, and described in lines 216-221 of the Results section). Please see also our response to point 2 raised by Reviewer #1 in this rebuttal letter, and Author response image 3 above in this letter.

      (29) The second question that I think is important to address is, what exactly do the varying RNAi levels correspond to in terms of experiments, and have these been validated? Due to the fundamental claim being that the severity of the phenotype being correlated with the level of KD, I think validation of this model is absolutely essential.  

      We thank the Reviewer for raising this important point, and agree it was lacking in the original version of our manuscript. As discussed in our response to the point 6) raised by Reviewer #2, we have performed qRT-PCR determinations for exo70 and sec3 mRNA levels after inducing silencing of these subunits at different temperatures, or with different RNAi transgenic lines. The remnant mRNA levels correlate well with the observed phenotypes. Please see Supplementary Figure 2 of the revised manuscript, and Author response image 5 of this rebuttal letter; described in lines 155-159 of the Results section. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      -  The authors assert in the discussion that exocyst involvement in constitutive secretion is well documented. This is based on a very recent study in mammalian culture cells. Therefore, I would not dismiss the issue as completely settled. Furthermore, a previous study of Drosophila sec10 reported no roles outside the ring gland (DOI: 10.1034/j.1600-0854.2002.31206.x).

      We have included these observations in the Discussion section. Lines 326-329.

      -  A salivary gland screening by Julie Brill's lab reported exocyst components as hits (DOI: 10.1083/jcb.201808017).

      We have referred to this paper in the Discussion section. Lines 326-329.

      -  It should be explained in more detail what is measured in graphs 7C, F, and others quantifying fluorescence around secretory granules. Looking at the images, the decrease in Rab1 and Rab11 seems less convincing.

      We have made a clearer description of how fluorescence intensity was measured in the Methods section lines 558-561. Also, we have uploaded a source data file in which the raw data of each experiment used for quantifications are disclosed. 

      Please note that the data indicates that Rab11 levels are higher in sec5 (Figure 8J-L) and sec3 (supplementary Figure 11M-R).

      Reviewer #2 (Recommendations For The Authors):

      No major issues.

      Writing - The authors should better frame their interpretations of other studies of the exocyst that include the role in autophagy, Palade body trafficking, and differential roles of the subunits.

      We have discussed these specific points in the Discussion section, lines 348-355 and 409-410.

      Minor - Fig. 6A: Why are variable temperatures (19-29 deg C used for the 8 KD experiments)?

      Please show it all at the same temperature (control too).

      The need for the usage of specific temperatures to obtain specific phenotypes with each of the RNAi lines used was explained in point 6 of this letter.

      Reviewer #3 (Recommendations For The Authors):

      In the abstract, the authors refer to the exocytic process and go on to describe secretory granule biogenesis and exocytosis. However, there are many exocytic processes aside from secretory granule biogenesis, and I think the authors should clarify this.

      Corrected in the Abstract. Lines 19-21

      Page 17 Thomas, 2021 reference, there is a glitch with the reference.

      Thanks for noticing. Fixed.

      References

      Bhuin T, Roy JK. Developmental expression, co-localization and genetic interaction of exocyst component Sec15 with Rab11 during Drosophila development. Exp Cell Res. 2019 Aug 1;381(1):94-104. doi: 10.1016/j.yexcr.2019.04.038. Epub 2019 May 7. PMID: 31071318.

      D'Souza Z, Taher FS, Lupashin VV. Golgi inCOGnito: From vesicle tethering to human disease. Biochim Biophys Acta Gen Subj. 2020 Nov;1864(11):129694. doi: 10.1016/j.bbagen.2020.129694. Epub 2020 Jul 27. PMID: 32730773; PMCID: PMC7384418.

      Escrevente C, Bento-Lopes L, Ramalho JS, Barral DC. Rab11 is required for lysosome exocytosis through the interaction with Rab3a, Sec15 and GRAB. J Cell Sci. 2021 Jun 1;134(11):jcs246694. doi: 10.1242/jcs.246694. Epub 2021 Jun 8. PMID: 34100549; PMCID: PMC8214760.

      Guo W, Roth D, Walch-Solimena C, Novick P. The exocyst is an effector for Sec4p, targeting secretory vesicles to sites of exocytosis. EMBO J. 1999 Feb 15;18(4):1071-80. doi: 10.1093/emboj/18.4.1071. PMID: 10022848; PMCID: PMC1171198.

      Jafar-Nejad H, Andrews HK, Acar M, Bayat V, Wirtz-Peitz F, Mehta SQ, Knoblich JA, Bellen HJ. Sec15, a component of the exocyst, promotes notch signaling during the asymmetric division of Drosophila sensory organ precursors. Dev Cell. 2005 Sep;9(3):351-63. doi: 10.1016/j.devcel.2005.06.010. PMID: 16137928.

      Khakurel A, Lupashin VV. Role of GARP Vesicle Tethering Complex in Golgi Physiology. Int J Mol Sci. 2023 Mar 23;24(7):6069. doi: 10.3390/ijms24076069. PMID: 37047041; PMCID: PMC10094427.

      Lattner J, Leng W, Knust E, Brankatschk M, Flores-Benitez D. Crumbs organizes the transport machinery by regulating apical levels of PI(4,5)P2 in Drosophila. Elife. 2019 Nov 7;8:e50900. doi: 10.7554/eLife.50900. PMID: 31697234; PMCID: PMC6881148.

      Lee T, Luo L. Mosaic analysis with a repressible cell marker for studies of gene function in neuronal morphogenesis. Neuron. 1999 Mar;22(3):451-61. doi: 10.1016/s0896-6273(00)80701-1. PMID: 10197526.

      Liu S, Majeed W, Grigaitis P, Betts MJ, Climer LK, Starkuviene V, Storrie B. Epistatic Analysis of the Contribution of Rabs and Kifs to CATCHR Family Dependent Golgi Organization. Front Cell Dev Biol. 2019 Aug 2;7:126. doi: 10.3389/fcell.2019.00126. PMID: 31428608; PMCID: PMC6687757.

      Perkins LA, Holderbaum L, Tao R, Hu Y, Sopko R, McCall K, Yang-Zhou D, Flockhart I, Binari R, Shim HS, Miller A, Housden A, Foos M, Randkelv S, Kelley C, Namgyal P, Villalta C, Liu LP, Jiang X, Huan-Huan Q, Wang X, Fujiyama A, Toyoda A, Ayers K, Blum A, Czech B, Neumuller R, Yan D, Cavallaro A, Hibbard K, Hall D, Cooley L, Hannon GJ, Lehmann R, Parks A, Mohr SE, Ueda R, Kondo S, Ni JQ, Perrimon N. The Transgenic RNAi Project at Harvard Medical School: Resources and Validation. Genetics. 2015 Nov;201(3):843-52. doi: 10.1534/genetics.115.180208. Epub 2015 Aug 28. PMID: 26320097; PMCID: PMC4649654.

      Wu S, Mehta SQ, Pichaud F, Bellen HJ, Quiocho FA. Sec15 interacts with Rab11 via a novel domain and affects Rab11 localization in vivo. Nat Struct Mol Biol. 2005 Oct;12(10):879-85. doi: 10.1038/nsmb987. Epub 2005 Sep 11. PMID: 16155582.

      Yeaman C, Grindstaff KK, Wright JR, Nelson WJ. Sec6/8 complexes on trans-Golgi network and plasma membrane regulate late stages of exocytosis in mammalian cells. J Cell Biol. 2001 Nov 12;155(4):593-604. doi: 10.1083/jcb.200107088. Epub 2001 Nov 5. PMID: 11696560; PMCID: PMC2198873.

      Zhang XM, Ellis S, Sriratana A, Mitchell CA, Rowe T. Sec15 is an effector for the Rab11 GTPase in mammalian cells. J Biol Chem. 2004 Oct 8;279(41):43027-34. doi: 10.1074/jbc.M402264200. Epub 2004 Jul 29. PMID: 15292201.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We summarized the main changes:

      (1) In the Introduction part, we give a general definition of habitat fragmentation to avoid confusion, as reviewers #1 and #2 suggested.

      (2) We clarify the two aspects of the observed “extinction”: “true dieback” and “emigration”, as reviewers #2 and #3 suggested.

      (3) In the Methods part, we 1) clarify the reason for testing the temporal trend in colonization/extinction dynamics and describe how to select islands as reviewer #1 suggested; 2) describe how to exclude birds from the analysis as reviewer #2 suggested.

      (4) In the Results part, we modified and rearranged Figures 4-6 as reviewers #1, #2 and #3 suggested.

      (5) In the Discussion part, we 1) discuss the multiple aspects of the metric of isolation for future research as reviewer #3 suggested; 2) provide concrete evidence about the relationship between habitat diversity or heterogeneity and island area and 3) provide a wider perspective about how our results can inform conservation practices in fragmented habitats as reviewer #2 suggested.

      eLife Assessment

      This important study enhances our understanding of how habitat fragmentation and climate change jointly influence bird community thermophilization in a fragmented island system. The evidence supporting some conclusions is incomplete, as while the overall trends are convincing, some methodological aspects, particularly the isolation metrics and interpretation of colonization/extinction rates, require further clarification. This work will be of broad interest to ecologists and conservation biologists, providing crucial insights into how ecosystems and communities react to climate change.

      We sincerely extend our gratitude to you and the esteemed reviewers for acknowledging the importance of our study and for raising these concerns. We have clarified the rationale behind our analysis of temporal trends in colonization and extinction dynamics, as well as the choice of distance to the mainland as the isolation metric. Additionally, we further discuss the multiple aspects of the metric of isolation for future research and provide concrete supporting evidence about the relationship between habitat diversity or heterogeneity and island area.

      Incorporating these valuable suggestions, we have thoroughly revised our manuscript, ensuring that it now presents a more comprehensive and nuanced account of our research. We are confident that these improvements will further enhance the impact and relevance of our work for ecologists and conservation biologists alike, offering vital insights into the resilience and adaptation strategies of communities facing the challenges of climate change.

      Reviewer #1 (Public Review):

      Summary:

      This study reports on the thermophilization of bird communities in a network of islands with varying areas and isolation in China. Using data from 10 years of transect surveys, the authors show that warm-adapted species tend to gradually replace cold-adapted species, both in terms of abundance and occurrence. The observed trends in colonisations and extinctions are related to the respective area and isolation of islands, showing an effect of fragmentation on the process of thermophilization.

      Strengths:

      Although thermophilization of bird communities has been already reported in different contexts, it is rare that this process can be related to habitat fragmentation, despite the fact that it has been hypothesized for a long time that it could play an important role. This is made possible thanks to a really nice study system in which the construction of a dam has created this incredible Thousand Islands lake. Here, authors do not simply take observed presence-absence as granted and instead develop an ambitious hierarchical dynamic multi-species occupancy model. Moreover, they carefully interpret their results in light of their knowledge of the ecology of the species involved.

      Response: We greatly appreciate your recognition of our study system and the comprehensive approach and careful interpretation of results. 

      Weaknesses:

      Despite the clarity of this paper on many aspects, I see a strong weakness in the authors' hypotheses, which obscures the interpretation of their results. Looking at Figure 1, and in many sentences of the text, a strong baseline hypothesis is that thermophilization occurs because of an increasing colonisation rate of warm-adapted species and extinction rate of cold-adapted species. However, there does not need to be a temporal trend! Any warm-adapted species that colonizes a site has a positive net effect on CTI; similarly, any cold-adapted species that goes extinct contributes to thermophilization.

      Thank you very much for these thoughtful comments. The understanding depends on the time frame of the study and specifically, whether the system is at equilibrium. We think your claim is based on this background: if the system is not at equilibrium, then CTI can shift simply by having differential colonization (or extinction) rates for warm-adapted versus cold-adapted species. We agree with you in this case.

      On the other hand, if a community is at equilibrium, then there will be no net change in CTI over time. Imagine we have an archipelago where the average colonization of warm-adapted species is larger than the average colonization of cold-adapted species, then over time the archipelago will reach an equilibrium with stable colonization/extinction dynamics where the average CTI is stable over time. Once it is stable, then if there is a temporal trend in colonization rates, the CTI will change until a new equilibrium is reached (if it is reached).

      For our system, the question then is whether we can assume that the system is or has ever been at equilibrium. If it is not at equilibrium, then CTI can shift simply by having differential colonization (or extinction) rates for warm-adapted versus cold-adapted species. If the system is at equilibrium (at the beginning of the study), then CTI will only shift if there is a temporal change or trend in colonization or extinction rates.
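
      The equilibrium argument above can be made concrete with a small deterministic sketch. It is only an illustration under stated assumptions, not the dynamic multi-species occupancy model used in the study: two hypothetical species groups (warm- and cold-adapted), made-up colonization and extinction rates, made-up thermal indices, and an occurrence-weighted CTI are all invented for the example.

```python
def simulate_cti(p_warm, p_cold, years, col_warm=0.2, ext_warm=0.1,
                 col_cold=0.1, ext_cold=0.1, col_warm_trend=0.0,
                 sti_warm=20.0, sti_cold=12.0):
    """Return the yearly CTI of a two-group island community.

    p_warm, p_cold: initial fraction of islands occupied by each group.
    col_*/ext_*: yearly colonization/extinction probabilities (hypothetical).
    col_warm_trend: yearly increase in warm-adapted colonization.
    """
    cti_series = []
    for year in range(years):
        # Occupancy-weighted mean thermal index of the community.
        cti = (p_warm * sti_warm + p_cold * sti_cold) / (p_warm + p_cold)
        cti_series.append(cti)
        # Classic colonization-extinction update: empty sites are colonized,
        # occupied sites go extinct.
        c_warm = min(1.0, col_warm + col_warm_trend * year)
        p_warm = p_warm + c_warm * (1 - p_warm) - ext_warm * p_warm
        p_cold = p_cold + col_cold * (1 - p_cold) - ext_cold * p_cold
    return cti_series


# With constant rates, each group's equilibrium occupancy is col / (col + ext).
eq_warm = 0.2 / (0.2 + 0.1)
eq_cold = 0.1 / (0.1 + 0.1)

# (a) At equilibrium, constant rates: CTI stays flat despite unequal rates.
at_equilibrium = simulate_cti(eq_warm, eq_cold, years=10)
# (b) Not at equilibrium, constant rates: CTI shifts without any trend in rates.
off_equilibrium = simulate_cti(0.3, eq_cold, years=10)
# (c) At equilibrium, rising warm-adapted colonization: CTI shifts.
trending = simulate_cti(eq_warm, eq_cold, years=10, col_warm_trend=0.01)

for label, series in [("equilibrium", at_equilibrium),
                      ("off equilibrium", off_equilibrium),
                      ("trend in rates", trending)]:
    print(f"{label:>16}: CTI change = {series[-1] - series[0]:+.3f}")
```

      In this toy setting, differential rates alone move CTI only when the starting occupancies are away from equilibrium, whereas a temporal trend in rates moves CTI even from an equilibrium start, which is the distinction drawn in the response.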

      Habitat fragmentation can affect biomes for decades after dam formation. The “relaxation effect” (Gonzalez, 2000) refers to the fact that the continent acts as a potential species pool for island communities. Under relaxation, some species will be filtered out over time, mainly through the selective extinction of species that are highly sensitive to fragmentation. Meanwhile, for a 100-hectare patch, it takes about ten years to lose 50% of bird species; the smaller the patch area, the shorter the time required (Ferraz et al., 2003; Haddad et al., 2015). This study was conducted 50 to 60 years after the formation of the TIL, so the system has a high probability of having reached “equilibrium” through the “relaxation effect” (Si et al., 2014). We have no way of knowing exactly whether our system is actually at “equilibrium”. Thus, testing for changing rates of colonization-extinction over time is actually a much stronger test of thermophilization, which makes our inference more robust.

      We add a note to the legend of Figure 1 on Lines 781-786:

      “CTI can also change simply due to differential colonization-extinction rates by thermal affinity if the system is not at equilibrium prior to the study. In our study system, we have no way of knowing whether our island system was at equilibrium at the onset of the study; thus, focusing on changing rates of colonization-extinction over time presents a much stronger test of thermophilization.”

      We hope this statement can make it clear. Thank you again for this meaningful question.

      Another potential weakness is that fragmentation is not clearly defined. Generally, fragmentation sensu lato involves both loss of habitat area and changes in the spatial structure of habitats (i.e. fragmentation per se). Here, both area and isolation are considered, which may be slightly confusing for the readers if not properly defined.

      Thank you for reminding us of that. Habitat fragmentation in this study involves both habitat loss and fragmentation per se. We have clarified the general definition in the Introduction on Lines 61-63:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      Reviewer #2 (Public Review):

      Summary:

      This study addresses whether bird community reassembly in time is related to climate change by modelling a widely used metric, the community temperature index (CTI). The authors first computed the temperature index of 60 breeding bird species thanks to distribution atlases and climatic maps, thus obtaining a measure of the species realized thermal niche.

      These indices were aggregated at the community level, using 53 survey transects of 36 islands (repeated for 10 years) of the Thousand Islands Lake, eastern China. Any increment of this CTI (i.e. thermophilization) can thus be interpreted as a community reassembly caused by a change in climate conditions (given no confounding correlations).
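
      For readers unfamiliar with the metric, the aggregation step can be sketched as follows. This is a minimal illustration, assuming the common definitions of CTI as either the abundance-weighted mean or the occurrence-based (unweighted) mean of species temperature indices; the species names, STI values and counts are hypothetical and are not taken from the study.

```python
def community_temperature_index(sti, abundance, weighted=True):
    """Mean species temperature index (STI) of the species present.

    sti: mapping species -> STI (e.g. in degrees Celsius).
    abundance: mapping species -> count on one transect in one year.
    weighted: if True, weight each species by its abundance;
              if False, use presence/absence only.
    """
    present = {sp: n for sp, n in abundance.items() if n > 0}
    if not present:
        raise ValueError("no species recorded")
    if weighted:
        total = sum(present.values())
        return sum(sti[sp] * n for sp, n in present.items()) / total
    return sum(sti[sp] for sp in present) / len(present)


# Hypothetical STI values and counts for three made-up species.
sti = {"cold_sp": 12.0, "mid_sp": 16.0, "warm_sp": 20.0}
year_1 = {"cold_sp": 10, "mid_sp": 5, "warm_sp": 5}
year_10 = {"cold_sp": 5, "mid_sp": 5, "warm_sp": 10}

for label, counts in [("year 1", year_1), ("year 10", year_10)]:
    by_abundance = community_temperature_index(sti, counts, weighted=True)
    by_occurrence = community_temperature_index(sti, counts, weighted=False)
    print(f"{label}: abundance-based = {by_abundance:.1f}, "
          f"occurrence-based = {by_occurrence:.1f}")
```

      In this made-up case the same three species are present in both years, so the occurrence-based index cannot change, while the abundance-based index rises as the warm-adapted species becomes more common.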

      Using a mix of Bayesian and frequentist mixed-effect models, the authors show an increment of CTI at the island level, driven by both extinction (or emigration) of cold-adapted species and colonization of newly adapted warm-adapted species. Less isolated islands displayed higher colonization and extinction rates, confirming that dispersal constraints (created by habitat fragmentation per se) on colonization and emigration are the main determinants of thermophilization. The authors also had the opportunity to test for habitat amount (here island size). They show that the lack of microclimatic buffering resulting from less forest amount (a claim backed by understory temperature data) exacerbated the rates of cold-adapted species extinction while fostering the establishment of warm-adapted species.

      Overall these findings are important to range studies as they reveal the local change in affinity to the climate of species comprising communities while showing that the habitat fragmentation VS amount distinction is relevant when studying thermophilization. As is, the manuscript lacks a wider perspective about how these results can be fed into conservation biology, but would greatly benefit from it. Indeed, this study shows that in a fragmented reserve context, habitat amount is very important in explaining trends of loss of cold-adapted species, hinting that it may be strategic to prioritize large habitats to conserve such species. Areas of diverse size may act as stepping stones for species shifting range due to climate change, with small islands fostering the establishment of newly adapted warm-adapted species while large islands act as refugia for cold-adapted species. This study also shows that the removal of dispersal constraints with low isolation may help species relocate to the best suitable microclimate in a heterogenous reserve context.

      Thank you very much for your valuable feedback. We greatly appreciate your recognition of the scientific question, the extensive dataset and the diverse approach. In particular, you provided constructive suggestions and examples on how to extend the results to conservation guidance. This is something we cannot ignore in the manuscript. We have added a paragraph to the end of the Discussion, stating how our results can inform conservation, on Lines 339-347:

      ‘Overall, our findings have important implications for conservation practices. First, we confirmed the role of isolation in limiting range shifting. Better connected landscapes should be developed to remove dispersal constraints and facilitate species’ relocation to the best suitable microclimate. Second, small patches can foster the establishment of newly adapted warm-adapted species while large patches can act as refugia for cold-adapted species. Therefore, preserving patches of diverse sizes can act as stepping stones or shelters in a warming climate depending on the thermal affinity of species. These insights are an important supplement to the previous emphasis on the role of habitat diversity in fostering (Richard et al., 2021) or reducing (Gaüzère et al., 2017) community-level climate debt.’

      Strength:

      The strength of the study lies in its impressive dataset of bird resurveys, that cover 10 years of continued warming (as evidenced by weather data), 60 species in 36 islands of varying size and isolation, perfect for disentangling habitat fragmentation and habitat amount effects on communities. This distinction allows us to test very different processes mediating thermophilization; island area, linked to microclimatic buffering, explained rates for a variety of species. Dispersal constraints due to fragmentation were harder to detect but confirms that fragmentation does slow down thermophilization processes.

      This study is a very good example of how the expected range shift at the biome scale of the species materializes in small fragmented regions. Specifically, the regional dynamics the authors show are analogous to what processes are expected at the trailing and colonizing edge of a shifting range: warmer and more connected places display the fastest turnover rates of community reassembly. The authors also successfully estimated extinction and colonization rates, allowing a more mechanistic understanding of CTI increment, being the product of two processes.

      The authors showed that regional diversity and CTI computed only by occurrences do not respond in 10 years of warming, but that finer metrics (abundance-based, or individual islands considered) do respond. This highlights the need to consider a variety of case-specific metrics to address local or regional trends. Figure Appendix 2 is a much-appreciated visualization of the effect of different data sources on Species thermal Index (STI) calculation.

      The methods are long and diverse, but they are documented enough so that an experienced user with the use of the provided R script can follow and reproduce them.

      Thank you very much for your profound Public Review. We greatly appreciate your recognition of the scientific question, the extensive dataset and the diverse approach. 

      Weaknesses:

      While the overall message of the paper is supported by data, the claims are not uniformly backed by the analysis. The trends of island-specific thermophilization are very credible (Figure 3); however, the variable nature of bird observations (partly compensated by an impressive number of resurveys) propagates a lot of error into the estimation of species-specific trends in occupancy, abundance change, and the extinction and colonization rates. This materializes into a weak relationship between STI and their respective occupancy and abundance change trends (Figure 4a, Figure 5, respectively), showing that species do not uniformly contribute to the trend observed in Figure 3. This is further shown by the results presented in Figure 6, which present in my opinion the topical finding of the study. While many species' rate responses to island area are significant, the isolation effect on colonization and extinction rates can only be interpreted as a trend as only a few species have a significant effect. The actual effect on the occupancy change rates of species is hard to grasp, and this trend has a potentially low magnitude (see below).

      Thank you very much for pointing out this shortcoming. The R² between STI and species' occupancy trends is relatively small (R² = 0.035), whereas the R² between STI and abundance change trends is larger (R² = 0.123), which is appreciable in the context of ecological research. The R² values between STI and colonization rate trends (R² = 0.083) and extinction rate trends (R² = 0.053) are also relatively small. A low R² indicates that the current model cannot be used for prediction and that factors other than STI may influence species-specific occupancy trends. Nonetheless, the standardized coefficient estimates are not minor and the trends are significant, indicating that species-specific responses are at least related to STI.

      The number of species that have significant interaction terms for isolation (Figure 6) is indeed low. Although there is uncertainty in the estimation of relationships, there are also consistent trends in response to habitat fragmentation of colonization of warm-adapted species and extinction of cold-adapted species. This is especially true for the effect of isolation, where on islands nearer to the mainland, warm-adapted species (15 out of 15 investigated species) increased their colonization probability at a higher rate over time, while most cold-adapted species (21 out of 23 species) increased their extinction probability at a higher rate. We now better highlight these results in the Results and Discussion.

      While being well documented, the myriad of statistical methods used by the authors hampers the interpretation of the figures, as the posterior means presented in Figure 4b and Figure 6 need to be transformed again by an inverse logit (logit⁻¹) and fed into the equation of the respective model to make sense of them. I suggest a rewording of the caption to limit its dependence on the method section for interpretation.

      Thank you for this suggestion. The value on the Y axis indicates the posterior mean of each variable (year, area, isolation and their interaction effects) extracted from the MSOM model, where logit(extinction rate) or logit(colonization rate) was the response variable. All variables were standardized before analysis to make them comparable, so interpretation is quite straightforward: positive values indicate a positive influence while negative values indicate a negative influence. Because the goal of Figure 6 is to display the direction (negative or positive) of each effect, we did not back-transform the estimates. Following your advice, we modified the caption of Figure 6 (now renumbered as Figure 5, following a comment from Reviewer #3 to move Figure 5 to Figure 4c). The modified title and legend of Figure 5 are on Lines 817-820:

      “Figure 5. Posterior estimates of logit-scale parameters related to cold-adapted species’ extinction rates and warm-adapted species’ colonization rates. Points are species-specific posterior means on the logit-scale, where parameters >0 indicate positive effects (on extinction [a] or colonization [b]) and parameters <0 indicate negative effects...”
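
      For readers who do want probabilities rather than logit-scale effects, the back-transformation mentioned by the reviewer can be sketched as follows. This is a minimal illustration, not the authors' code: the baseline and effect values are hypothetical, and the model is reduced to a single year effect.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse logit (logistic) function: maps a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical values for illustration only (not estimates from the study).
baseline_logit = -1.0  # assumed logit(extinction rate) at average covariate values
year_effect = 0.5      # assumed standardized posterior mean of the year effect

p_baseline = inv_logit(baseline_logit)
p_one_sd_later = inv_logit(baseline_logit + year_effect)

print(f"baseline probability: {p_baseline:.3f}")
print(f"probability one SD of year later: {p_one_sd_later:.3f}")
```

      A positive logit-scale parameter always raises the back-transformed probability and a negative one always lowers it, which is why the sign alone is enough to read the figure.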

      By using a broad estimate of the realized thermal niche, a common weakness of thermophilization studies is the inability to capture local adaptation in species' physiological or behavioral response to a rise in temperature. The authors however acknowledge this limitation and provide specific examples of how species ought to evade high temperatures in this study region.

      We appreciate your recognition. This is a common problem in STI studies. We hope that future studies can take more detail about the microclimate of species’ true habitat across regions into consideration when calculating STI. Although challenging, focusing on a smaller portion of a species’ distribution range may make this more achievable.

      Reviewer #3 (Public Review):

      Summary:

      Juan Liu et al. investigated the interplay between habitat fragmentation and climate-driven thermophilization in birds in an island system in China. They used extensive bird monitoring data (9 surveys per year per island) across 36 islands of varying size and isolation from the mainland covering 10 years. The authors use extensive modeling frameworks to test a general increase in the occurrence and abundance of warm-dwelling species and vice versa for cold-dwelling species using the widely used Community Temperature Index (CTI), as well as the relationship between island fragmentation in terms of island area and isolation from the mainland on extinction and colonization rates of cold- and warm-adapted species. They found that indeed there was thermophilization happening during the last 10 years, which was more pronounced for the CTI based on abundances and less clearly for the occurrence-based metric. Generally, the authors show that this is driven by an increased colonization rate of warm-dwelling and an increased extinction rate of cold-dwelling species. Interestingly, they unravel some of the mechanisms behind this dynamic by showing that warm-adapted species increased while cold-dwelling decreased more strongly on smaller islands, which is - according to the authors - due to lowered thermal buffering on smaller islands (which was supported by air temperature monitoring done during the study period on small and large islands). They argue, that the increased extinction rate of cold-adapted species could also be due to lowered habitat heterogeneity on smaller islands. With regards to island isolation, they show that also both thermophilization processes (increase of warm and decrease of cold-adapted species) were stronger on islands closer to the mainland, due to closer sources to species populations of either group on the mainland as compared to limited dispersal (i.e. range shift potential) in more isolated islands.

      The conclusions drawn in this study are sound, and mostly well supported by the results. Only a few aspects leave open questions and could quite likely be further supported by the authors themselves thanks to their apparent extensive understanding of the study system.

      Strengths:

      The study questions and hypotheses are very well aligned with the methods used, ranging from field surveys to extensive modeling frameworks, as well as with the conclusions drawn from the results. The study addresses a complex question on the interplay between habitat fragmentation and climate-driven thermophilization which can naturally be affected by a multitude of additional factors than the ones included here. Nevertheless, the authors use a well-balanced method of simplifying this to the most important factors in question (CTI change, extinction, and colonization, together with habitat fragmentation metrics of isolation and island area). The interpretation of the results presents interesting mechanisms without being too bold on their findings and by providing important links to the existing literature as well as to additional data and analyses presented in the appendix.

      We very much appreciate your positive and constructive comments and suggestions. Thank you for your recognition of the scientific question, the modeling approach and the conclusions.

      Weaknesses:

      The metric of island isolation based on the distance to the mainland seems a bit too oversimplified as in real life the study system rather represents an island network where the islands of different sizes are in varying distances to each other, such that smaller islands can potentially draw from the species pools from near-by larger islands too - rather than just from the mainland. Thus a more holistic network metric of isolation could have been applied or at least discussed for future research. The fact, that the authors did find a signal of island isolation does support their method, but the variation in responses to this metric could hint at a more complex pattern going on in real-life than was assumed for this study.

      Thank you for this meaningful question. Isolation can be measured in different ways in the study region. We chose the distance to the mainland as a measure of isolation based on the results of a previous study in our system, which provided evidence that the colonization and extinction rates of breeding bird species were best fitted using distance to the nearest mainland over other distance-based measures (distance to the nearest landmass, distance to the nearest bigger landmass) (Si et al., 2014). Besides, their results produced almost identical patterns of the relationship between isolation and colonization/extinction rate (Si et al., 2014). That is why we only selected “Distance to the mainland” in our current analysis, and we do find some consistent patterns as expected. The plants on all islands were cleared out about 60 years ago due to dam construction, with all bird species coming from the mainland as the original species pool through a process called “relaxation”. This could be the reason why distance to the nearest mainland is the best predictor.

      We agree with you that it’s still necessary to consider more aspects of “isolation” at least in discussion for future research. In our Discussion, we address these on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      Further, the link between larger areas and higher habitat diversity or heterogeneity could be presented by providing evidence for this relationship. The authors do make a reference to a paper done in the same study system, but a more thorough presentation of it would strengthen this assumption further.

      Thank you very much for this question. We now add more details about the relationship between island area and habitat diversity or heterogeneity, based on a related study in the same system. The observed number of species significantly increased with increasing island area (slope = 4.42, R² = 0.70, p < .001), as did the rarefied species richness per island (slope = 1.03, R² = 0.43, p < .001), species density (slope = 0.80, R² = 0.33, p = .001) and the rarefied species richness per unit area (slope = 0.321, R² = 0.32, p = .001). We added this supporting evidence on Lines 317-321:

      “We thus suppose that habitat heterogeneity could also mitigate the loss of these relatively cold-adapted species as expected. Habitat diversity, including the observed number of species, the rarefied species richness per island, species density and the rarefied species richness per unit area, all increased significantly with island area instead of isolation in our system (Liu et al., 2020)”

      Despite the general clear patterns found in the paper, there were some idiosyncratic responses. Those could be due to a multitude of factors which could be discussed a bit better to inform future research using a similar study design.

      Thank you for these suggestions. We added a summary statement about the reasons for idiosyncratic responses on Lines 334-338:

      “Overall, these idiosyncratic responses reveal several possible mechanisms in regulating species' climate responses, including resource demands and biological interactions like competition and predation. Future studies are needed to take these factors into account to understand the complex mechanisms by which habitat loss mediates species range shifts.”

      Reviewer #1 (Recommendations For The Authors):

      (1) Figure 1: I disagree that there should be a temporal trend in colonisation/extinction dynamics.

      Thank you again for these thoughtful comments. We have explained this in detail in the response to the Public Review.

      (2) L 485-487: As explained before I disagree. I don't see why there needs to be a temporal trend in colonization and extinction.

      Thank you again for these thoughtful comments. Because we can’t guarantee that the study system has reached equilibrium, changing rates of colonization and extinction over time are actually a much stronger test of thermophilization. A more detailed explanation is given in the response to the Public Review.

      (3) L 141: which species' ecological traits?

      Sorry for the confusion. The traits included continuous variables (dispersal ability, body size, body mass and clutch size) and categorical variables (diet, active layer, residence type). Specifically, we tested the correlation between STI and dispersal ability, body size, body mass and clutch size using the Pearson correlation test. We also tested the difference in STI between trait groups using the Wilcoxon signed-rank test for the three categorical variables: diet (carnivorous/omnivorous/herbivorous), active layer (canopy/mid/low), and residence type (resident species/summer visitor). There was no significant difference between any two groups for any of the three categorical variables (p > 0.2). We added these details on Lines 141-145:

      “No significant correlation was found between STI and species’ ecological traits; specifically, the continuous variables of dispersal ability, body size, body mass and clutch size (Pearson correlations for each, |r| < 0.22), and the categorical variables of diet (carnivorous/omnivorous/herbivorous), active layer (canopy/mid/low), and residence type (resident species/summer visitor)”

      (4) L 143: CTIoccur and CTIabun were not defined before.

      Because CTIoccur and CTIabun are first defined in the Methods (section 4.4), we changed the sentence to a more general statement here on Lines 147-150:

      “At the landscape scale, considering species detected across the study area, occurrence-based CTI (CTIoccur; see section 4.4) showed no trend (posterior mean temporal trend = 0.414; 95% CrI: -12.751, 13.554) but abundance-based CTI (CTIabun; see section 4.4) showed a significant increasing trend.”

      (5) Figure 4: what is the dashed vertical line? I assume the mean STI across species?

      Sorry for the unclear description. The vertical dashed line indicates the median value of STI for 60 species, as a separation of warm-adapted species and cold-adapted species. We have added these details on Lines 807-809:

      “The dotted vertical line indicates the median of STI values. Cold-adapted species are plotted in blue and warm-adapted species are plotted in orange.”

      (6) Figure 6: in the legend, replace 'points in blue' with 'points in blue/orange' or 'solid dots' or something similar.

      Thank you for this suggestion. We changed it to “points in blue/orange” on Line 823.

      (7) L 176-176: unclear why the interaction parameters are particularly important for explaining the thermophilization mechanism: if e.g. colonization rate of warm-adapted species is constantly higher in less isolated islands, (and always higher than the extinction rate of the same species), it means that thermophilization is increased in less isolated islands, right?

      Thank you for this question. This is also related to the question of why we use temporal trends in colonization and extinction rates to test for thermophilization mechanisms. Changing colonization and extinction rates over time are actually a much stronger test of thermophilization (for more details, see our responses to the Public Review and to Recommendations 1 and 2).

      Based on this, the two main driving processes of thermophilization mechanism include the increasing colonization rate of warm-adapted species and the increasing extinction rate of cold-adapted species with year. The interaction effect between island area (or isolation) and year on colonization rate (or extinction rate) can tell us how habitat fragmentation mediates the year effect. For example, if the interaction term between year and isolation is negative for a warm-adapted species that increased in colonization rate with year, it indicates that the colonization rate increased faster on less isolated islands. This is a signal of a faster thermophilization rate on less-isolated islands.
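
      The reasoning about the sign of the interaction term can be made concrete with a small sketch. It assumes a simplified linear predictor of the form logit(rate) = b0 + b_year·year + b_iso·isolation + b_int·year·isolation with standardized covariates; the coefficient values are hypothetical and are not estimates from the study.

```python
def year_slope(b_year: float, b_interaction: float, isolation_z: float) -> float:
    """Logit-scale change in rate per unit of standardized year at a given
    standardized isolation, under the assumed linear predictor
    b0 + b_year*year + b_iso*isolation + b_interaction*year*isolation."""
    return b_year + b_interaction * isolation_z


# Hypothetical coefficients for a warm-adapted species (illustration only).
b_year = 0.4          # colonization rate increases with year
b_interaction = -0.2  # negative year x isolation interaction

slope_near = year_slope(b_year, b_interaction, isolation_z=-1.0)  # less isolated island
slope_far = year_slope(b_year, b_interaction, isolation_z=1.0)    # more isolated island

print(f"year effect on a less isolated island: {slope_near:.2f}")
print(f"year effect on a more isolated island: {slope_far:.2f}")
```

      With a positive year effect and a negative interaction, the year slope is steeper at low isolation, which corresponds to the faster thermophilization on less isolated islands described above.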

      (8) L201-203: this is only little supported by the results that actually show that there is NO significant interaction for most species.

      Thank you for this comment. Although most species showed a non-significant interaction effect, the overall trend is relatively consistent; this is especially true for the effect of isolation. To emphasize the “trend” instead of a “significant effect”, we slightly modified this sentence with more rigorous wording on Lines 205-208:

      “We further found that habitat fragmentation influences two processes of thermophilization: colonization rates of most warm-adapted species tended to increase faster on smaller and less isolated islands, while the loss rates of most cold-adapted species tended to be exacerbated on less isolated islands.”

      (9) Section 2.3: can't you have a population-level estimate? I struggled a bit to understand all the parameters of the MSOM (because of my lack of statistical/mathematical proficiency) so I cannot provide more advice here.

      Thank you for this suggestion. We understand that you are referring to an overall estimate across all species for each variable. From the MSOM, we obtain a standardized estimate of every variable (year, area, isolation, interaction) for each species separately. Because the divergent or consistent responses among species are what we are interested in, we did not go further and calculate a population-level estimate.

      (10) L 291: a dot is missing.

      Done. Thank you for your correction.

      (11) L 305, 315: a space is missing

      Done

      (12) L 332: how were these islands selected?

      Thank you for this question. The 36 islands were selected along a gradient of island area and isolation, spread across the whole lake region. The selection ensured that there was no significant correlation between island area and isolation (Pearson correlation coefficient r = -0.21, p = 0.21). The seven largest of the 36 islands are also the only islands larger than 30 ha in the whole lake region. We have modified this in the Methods on Lines 360-363:

      “We selected 36 islands according to a gradient of island area and isolation with a guarantee of no significant correlation between island area and isolation (Pearson r = -0.21, p = 0.21). For each island, we calculated island area and isolation (measured in the nearest Euclidean distance to the mainland) to represent the degree of habitat fragmentation.”

      (13) L 334: "Distance to the mainland" was used as a metric of isolation, but elsewhere in the text you argue that the observed thermophilization is due to interisland movements. It sounds contradictory. Why not include the average or shortest distance to the other islands?

      Thank you very much for raising this comment. Yes, “Distance to the mainland” was the only metric we used for isolation. We carefully checked the manuscript to find where “interisland movement” appears and causes this misunderstanding. It must come from Discussion 3.1 (on Lines 217-221): “Notably, when tested on the landscape scale (versus on individual island communities), only the abundance-based thermophilization trend was significant, indicating thermophilization of bird communities was mostly due to inter-island occurrence dynamics, rather than exogenous community turnover.”

      Sorry, the word “inter-island” is not exactly what we wanted to express here. We wanted to express that “the thermophilization was mostly due to occurrence dynamics within the region, rather than exogenous community turnover outside the region”. We have changed the sentence in the Discussion on Lines 217-221:

      “Notably, when tested on the landscape scale (versus on individual island communities), only the abundance-based thermophilization trend was significant, indicating thermophilization of bird communities was mostly due to occurrence dynamics within the region, rather than exogenous community turnover outside the region.”

      In addition, we would like to explain why we use distance to the mainland. We chose the distance to the mainland as a measure of isolation based on the results of a previous study in our system, which provided evidence that the colonization and extinction rates of breeding bird species were best fitted using distance to the nearest mainland over other distance-based measures (distance to the nearest landmass, distance to the nearest bigger landmass) (Si et al., 2014). Besides, their results produced almost identical patterns of the relationship between isolation and colonization/extinction rate (Si et al., 2014). That is why we only selected “Distance to the mainland” in our current analysis, and we do find some consistent patterns as expected. The plants on all islands were cleared out about 60 years ago due to dam construction, with all bird species coming from the mainland as the original species pool through a process called “relaxation”. This may be the reason why distance to the nearest mainland is the best predictor.

      In the Discussion, we added the following text on other measures of isolation on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      (14) L 347: you write 'relative' abundance but this measure is not relative to anything. Better write something like "we based our abundance estimate on the maximum number of individuals recorded across the nine annual surveys".

      Thank you for this suggestion, we have changed the sentence on Lines 377-379:

      “We based our abundance estimate on the maximum number of individuals recorded across the nine annual surveys.”

      (15) L 378: shouldn't the formula for CTIoccur be (equation in latex format):

      CTI_{occur, j, t} = \frac{\sum_{i=1}^{N_{j,t}} STI_{i}}{N_{j,t}}

      Where Nj,t is the total number of species surveyed in the community j in year t

      Thank you very much for this careful check, we have revised it on Lines 415, 417:

      “where Nj,t is the total number of species surveyed in the community j in year t.”
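
      As a minimal sketch of the occurrence-based formula discussed here, CTIoccur for one community and year is the unweighted mean STI of the species recorded. The two STI values below are the extremes reported in this response; the function name and the example community are hypothetical and are not taken from the authors' R script.

```python
from typing import Dict, List


def cti_occur(sti_by_species: Dict[str, float], species_present: List[str]) -> float:
    """Occurrence-based CTI: mean STI over the N species recorded in a
    community j in year t (each species counted once, regardless of abundance)."""
    if not species_present:
        raise ValueError("CTI is undefined for a community with no species")
    total = sum(sti_by_species[name] for name in species_present)
    return total / len(species_present)


# STI values (degrees C) for the coldest and warmest species named in the response.
sti = {"Cuculus canorus": 9.30, "Prinia inornata": 27.20}

# Hypothetical community containing both species in a given year.
community = ["Cuculus canorus", "Prinia inornata"]
print(f"CTIoccur = {cti_occur(sti, community):.2f}")
```

      Because every species present contributes equally, this index changes only when species colonize or go locally extinct, not when their abundances shift.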

      Reviewer #2 (Recommendations For The Authors):

      (1) Line 76: "weakly"

      Done. Thank you for your correction.

      (2) Line 98: I suggest a change to this sentence: "For example, habitat fragmentation renders habitats to be too isolated to be colonized, causing sedentary butterflies to lag more behind climate warming in Britain than mobile ones"

      Thank you for this modification, we have changed it on Lines 99-101.

      (3) Line 101: remove either "higher" or "increasing"

      Done, we have removed “higher”. Thank you for this advice.

      (4) Line 102: "benefiting from near source of"

      Done.

      (5) Line 104: "emigrate"

      Done.

      (6) Introduction: I suggest making it more explicit what process you describe under the word "extinction". At first read, I thought you were only referring to the dieback of individuals, but you also included emigration as an extinction process. It also needs to be reworded in Fig 1 caption.

      Thank you for this suggestion. Yes, we can’t distinguish in our system between local extinction and emigration. The observed “extinction” of cold-adapted species over 10 years may involve two processes that usually occur in order: first “emigration”, and then, if individuals can neither emigrate nor withstand the change, “real local dieback”. It should also be included in the legend of Figure 1, as you said. We have modified the legend on Lines 780-781:

      “Note that extinction here may include both the emigration of species and then the local extinction of species.”

      There is also one part in the Discussion that mentions this on Lines 287-291: “While we cannot truly distinguish in our system between local extinction and emigration, we suspect that given two islands equal except in isolation, and if both lose suitability due to climate change, individuals can easily emigrate from the island nearer to the mainland, while individuals on the more isolated island would be more likely to be trapped in place until the species went locally extinct due to a lack of rescue”.

      (7) I also suggest differentiating habitat fragmentation (distances between islands) and habitat amount (area) as explained in Fahrig 2013 (Rethinking patch size and isolation effects: the habitat amount hypothesis) and her later paper. This will help the reader understand what lies behind the general trend of fragmentation: fragmentation per se and habitat amount reduction.

      Thank you for this suggestion! Habitat fragmentation in this study involves both habitat loss and fragmentation per se. We now give a general definition of habitat fragmentation on Lines 61-63:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      (8) Line 136: does the "+-" refer to the standard deviation or the confidence interval? I suggest being explicit about it once at the start of the results.

      Thank you for the reminder. The "+-" refers to the standard deviation (SD). The modified sentence is now on Lines 135-139:

      “The number of species detected in surveys on each island across the study period averaged 13.37 ± 6.26 (mean ± SD) species, ranging from 2 to 40 species, with an observed gamma diversity of 60 species. The STI of all 60 birds averaged 19.94 ± 3.58 ℃ (mean ± SD) and ranged from 9.30 ℃ (Cuculus canorus) to 27.20 ℃ (Prinia inornata), with a median STI of 20.63 ℃ (Appendix 1—figure 2; Appendix 1—figure 3).”

      (9) Line 143: please specify the unit of thermophilization.

      The unit of thermophilization rate is the change in degrees per unit year, because in all analyses predictor variables were z-transformed to make their effects comparable. We have added this on Line 151:

      “When measuring CTI trends for individual islands (expressed as °/ unit year)”

      (10) Line 289: check if no word is missing from the sentence.

      The sentence is: “In our study, a large proportion (11 out of 15) of warm-adapted species increasing in colonization rate and half (12 out of 23) of cold-adapted species increasing in extinction rate were changing more rapidly on smaller islands.”

      We have already defined, in both the Methods and the Results, the species included in testing the third prediction: 15 warm-adapted species that increased in colonization rate and 23 cold-adapted species that increased in extinction rate. We therefore removed this redundant information and rewrote the sentence as below on Lines 300-302:

      “In our study, the colonization rate of a large proportion of warm-adapted species (11 out of 15) and the extinction rate of half of cold-adapted species (12 out of 23) were increasing more rapidly on smaller islands.”

      (11) Line 319: I really miss a concluding statement of your discussion, your results are truly interesting and deserve to be summarized in two or three sentences, and maybe a perspective about how it can inform conservation practices in fragmented settings.

      Thank you for this profound suggestion both in Public Review and here. We have added a paragraph to the end of the Discussion, stating how our results can inform conservation, on Lines 339-347:

      “Overall, our findings have important implications for conservation practices. Firstly, we confirmed the role of isolation in limiting range shifting. Better connected landscapes should be developed to remove dispersal constraints and facilitate species’ relocation to the best suitable microclimate. Second, small patches can foster the establishment of newly adapted warm-adapted species while large patches can act as refugia for cold-adapted species. Therefore, preserving patches of diverse sizes can act as stepping stones or shelters in a warming climate depending on the thermal affinity of species. These insights are an important supplement to the previous emphasis on the role of habitat diversity in fostering (Richard et al., 2021) or reducing (Gaüzère et al., 2017) community-level climate debt.”

      (12) Line 335: I suggest " ... the islands has been protected by forbidding logging, ..."

      Thanks for this wonderful suggestion. Done. The new sentence is now on Lines 365-366:

      “Since lake formation, the islands have been protected by forbidding logging, allowing natural succession pathways to occur.”

      (13) Line 345: this speed is unusually high for walking, check the speed.

      Sorry for the carelessness, it should be 2.0 km/h. It has been corrected on Lines 375-376:

      “In each survey, observers walked along each transect at a constant speed (2.0 km/h) and recorded all the birds seen or heard on the survey islands.”

      (14) Line 351: you could add a sentence explaining why that choice of species exclusion was made. Was made from the start of the monitoring program or did you exclude species afterward?

      We excluded them afterward. We excluded non-breeding species, nocturnal and crepuscular species, high-flying species passing over the islands (e.g., raptors, swallows) and strongly water-associated birds (e.g., cormorants). These species were recorded during monitoring: some were on the shore of an island or flying high above it, and some nocturnal species were spotted only by accident.

      We described more details about how to exclude species on Lines 379-387:

      “We excluded non-breeding species, nocturnal and crepuscular species, high-flying species passing over the islands (e.g., raptors, swallows) and strongly water-associated birds (e.g., cormorants) from our record. First, our surveys were conducted during the day, so some nocturnal and crepuscular species, such as the owls and nightjars were excluded for inadequate survey design. Second, wagtail, kingfisher, and water birds such as ducks and herons were excluded because we were only interested in forest birds. Third, birds like swallows, and eagles who were usually flying or soaring in the air rather than staying on islands, were also excluded as it was difficult to determine their definite belonging islands. Following these operations, 60 species were finally retained.”

      (15) Line 370: I suggest adding the range and median of STI.

      Thanks for this good suggestion. The range and mean ± SD of STI were already in the Results, so we added the median of STI there as well. The new sentence is now in the Results on Lines 137-139:

      “The STI of all 60 birds averaged 19.94 ± 3.58 ℃ (mean ± SD) and ranged from 9.30 ℃ (Cuculus canorus) to 27.20 ℃ (Prinia inornata), with a median of 20.63 ℃ (Appendix 1—figure 2; Appendix 1—figure 3).”

      (16) Figure 4.b: Is it possible to be more explicit about what that trend is? the coefficient of the regression Logit(ext/col) ~ year + ...... ?

      Thank you for this advice. Your understanding is right: we can interpret it as the coefficient of the ‘year’ effect in the model. More specifically, the ‘year’ effect or temporal trend here is the ‘posterior mean’ of the posterior distribution of ‘year’ in the MSOM (Multi-species Occupancy Model), in the context of the Bayesian framework. We modified this sentence on Lines 811-813:

      “ Each point in (b) represents the posterior mean estimate of year in colonization, extinction or occupancy rate for each species.”

      (17) Figure 6: is it possible to provide an easily understandable meaning of the prior presented in the Y axis? E.g. "2 corresponds to a 90% probability for a species to go extinct at T+1", if not, please specify that it is the logit of a probability.

      Thank you for this question both in Public Review and here. The value on the Y axis indicates the posterior mean of each variable (year, area, isolation and their interaction effects) extracted from the MSOM model, where the logit(extinction rate) or logit(colonization rate) was the response variable. All variables were standardized before analysis to make them comparable. So, positive values indicate positive influence while negative values indicate negative influence. Because the goal of Figure 6 is to display the negative/positive effect, we didn’t back-transform them. Following your advice, we thus modified the caption of Figure 6 (now renumbered as Figure 5, following a comment from Reviewer #3, to move Figure 5 to Figure 4c). The modified title and legends of Figure 5 are on Lines 817-820:

      “Figure 5. Posterior estimates of logit-scale parameters related to cold-adapted species’ extinction rates and warm-adapted species’ colonization rates. Points are species-specific posterior means on the logit-scale, where parameters >0 indicate positive effects (on extinction [a] or colonization [b]) and parameters <0 indicate negative effects.”
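      The reviewer's question about reading logit-scale values as probabilities can be illustrated with a minimal sketch of the inverse-logit transform. The function name `inv_logit` and the example values are illustrative assumptions, not taken from the authors' code. Note also that the points in Figure 5 are standardized effect coefficients, so back-transforming one of them alone does not give a species' extinction or colonization probability; the sketch only shows how a full linear predictor on the logit scale maps to a probability.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale to a probability in (0, 1)."""
    # Two branches keep the computation numerically stable for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Illustrative logit-scale values (hypothetical, not study estimates).
for value in (-2.0, 0.0, 2.0):
    print(f"logit = {value:+.1f} -> probability = {inv_logit(value):.3f}")
```

      Under this mapping, a linear predictor of 2 corresponds to a probability of about 0.88, close to the 90% the reviewer used as an example, and a value of 0 corresponds to 0.5.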

      (18) Line 773: points in blue only are significant? I suggest "points in color".

      Thank you for your reminder. Points in blue and orange are all significant. We have revised the sentence on Line 823:

      “Points in blue/orange indicate significant effects.”

      These are all small suggestions that may help you improve the readability of the final manuscript. I warmly thank you for the opportunity to review this impressive study.

      We appreciate your careful review and profound suggestions. We believe these modifications will improve the final manuscript.

      Reviewer #3 (Recommendations For The Authors):

      I have a few minor suggestions for paper revision for your otherwise excellent manuscript. I wish to emphasize that it was a pleasure to read the manuscript and that I especially enjoyed a very nice flow throughout the ms from a nicely rounded introduction that led well into the research questions and hypotheses all the way to a good and solid discussion.

      Thank you very much for your review and recognition. We have carefully checked all recommendations and addressed them in the manuscript.

      (1) L 63: space before the bracket missing and I suggest moving the reference to the end of the sentence (directly after habitat fragmentation does not seem to make sense).

      Thank you very much for this suggestion. The missing space was added, and the reference has been moved to the end of the sentence. We also added a general definition of habitat fragmentation. The new sentence is on Lines 61-64:

      “Habitat fragmentation, usually defined as the shifts of continuous habitat into spatially isolated and small patches (Fahrig, 2003), in particular, has been hypothesized to have interactive effects with climate change on community dynamics.”

      (2) L 102: I suggest to write "benefitting ..." instead.

      Done.

      (3) L 103: higher extinction rates (add "s").

      Done.

      (4) L 104: this should probably say "emigrate" and "climate warming".

      Done.

      (5) L 130-133: this is true for emigration (more isolated islands show slower emigration). But what about increased local extinction, especially for small and isolated islands? Especially since you mentioned later in the manuscript that often emigration and extinction are difficult to identify or differentiate. Might be worth a thought here or somewhere in the discussion?

      Thank you for this good question. I would like to answer it in two aspects:

      Yes, we cannot distinguish between true local extinction and emigration. The observed local “extinction” of cold-adapted species over 10 years may involve two processes that usually occur in order: first “emigration” and then, if the species can neither emigrate nor withstand the conditions, “real local dieback”. On remote islands, dispersal limitation would force cold-adapted species to tolerate local conditions for longer before truly going extinct, while on less isolated islands the same species could easily emigrate and find more suitable habitat. Consequently, it is harder for us to observe “extinction” on more isolated islands, and easier to observe “fake extinction” on less isolated islands due to emigration. As a result, the observed extinction rate is expected to increase more sharply on less remote islands and more moderately for the same species on remote islands.

      We have modified the legend of Figure 1 on Lines 780-781:

      “Note that extinction here may include both the emigration of species and then the local extinction of species.”

      There is also one part in the Discussion that mentions this on Lines 287-291: “While we cannot truly distinguish in our system between local extinction and emigration, we suspect that given two islands equal except in isolation, if both lose suitability due to climate change, individuals can easily emigrate from the island nearer to the mainland, while individuals on the more isolated island would be more likely to be trapped in place until the species went locally extinct due to a lack of rescue”.

      Besides, regarding your question “But what about increased local extinction, especially for small and isolated islands?”, I think you are referring to the “high extinction rate per se on remote islands”. We want to test the “trend” of extinction rate on a temporal scale, rather than the extinction rate per se on a spatial scale. Even if a species has a high extinction rate on remote islands, that rate can still change more slowly over time.

      I hope these answers solve the problem.

      (6) L 245: I think this is the first time the acronym appears in the ms (as the methods come after the discussion), so please write the full name here too.

      Thank you for pointing this out. I realized “Thousand Island Lake” appears for the first time in the last paragraph of the Introduction, so we added “TIL” there on Lines 108-109:

      “Here, we use 10 years of bird community data in a subtropical land-bridge island system (Thousand Island Lake, TIL, China, Figure 2) during a period of consistent climatic warming.”

      (7) L 319: this section could end with a summary statement on idiosyncratic responses (i.e. some variation in the responses you found among the species) and the potential reasons for this, such as e.g. the role of other species traits or interactions, as well as other ways to measure habitat fragmentation (see main comments in public review).

      Thank you for this suggestion both in Public Review and here. We added a summary statement about the reasons for idiosyncratic responses on Lines 334-338:

      “Overall, these idiosyncratic responses reveal several possible mechanisms in regulating species' climate responses, including resource demands and biological interactions like competition and predation. Future studies are needed to take these factors into account to understand the complex mechanisms by which habitat loss mediates species range shifts.”

      We only strengthen “habitat loss” here, because idiosyncratic responses mainly come from the mediating effect of habitat loss. For the mediating effect of isolation, the response is relatively consistent (see Page 8, Lines 183-188): “In particular, the effect of isolation on temporal dynamics of thermophilization was relatively consistent across cold- and warm-adapted species (Figure 5a, b); specifically, on islands nearer to the mainland, warm-adapted species (15 out of 15 investigated species) increased their colonization probability at a higher rate over time, while most cold-adapted species (21 out of 23 species) increased their extinction probability at a higher rate”.

      (8) L 333: what about the distance to other islands? it's more of a network than a island-mainland directional system (Figure 2). You could address this aspect in the discussion.

      Thank you for this good question again. Isolation can be measured in different ways in the study region. We chose distance to the mainland because it was the best predictor of colonization and extinction rates of breeding birds in the study region, and it produced results similar to those of other distance-based measures, including distance to the nearest landmass and distance to the nearest larger landmass (Si et al., 2014). We still agree with you that it’s necessary to consider more aspects of “isolation”, at least in the discussion, for future research. In the Discussion, we addressed these points on Lines 292-299. For more details, refer to the response to the Public Review.

      (9) Figure 2: Is B1 one of the sampled islands? It is clearly much larger than most other islands and I think it could thus serve as an important population source for many of the adjacent smaller islands? Thus, the nearest neighbor distance to B1 could be as important in addition to the distance to the mainland?

      Yes, B1 is one of the sampled islands and is also the biggest island. In previous research in our study system, we tried distance to the nearest landmass, to the nearest larger landmass and to the nearest mainland, and they produced similar results (for more details, refer to the response to the Public Review). We agree with you that the nearest neighbor distance to B1 could be a potentially important measure, but this needs further research. In our Discussion, we address this on Lines 292-299:

      “As a caveat, we only consider the distance to the nearest mainland as a measure of fragmentation, consistent with previous work in this system (Si et al., 2014), but we acknowledge that other distance-based metrics of isolation that incorporate inter-island connections could reveal additional insights on fragmentation effects. The spatial arrangement of islands, like the arrangement of habitat, can influence niche tracking of species (Fourcade et al., 2021). Future studies should take these metrics into account to thoroughly understand the influence of isolation and spatial arrangement of patches in mediating the effect of climate warming on species.”

      (10) L 345: 20km/h walking seems impressively fast? I assume this is a typo.

      Sorry for the carelessness, it should be 2.0 km/h. It has been corrected on Lines 375-376:

      “In each survey, observers walked along each transect at a constant speed (2.0 km/h) and recorded all the birds seen or heard on the survey islands.”

      (11) L 485: I had difficulties fully understanding the models that were fitted here and could not find them in the codes you provided (which were otherwise very well documented!). Could you explain this modeling step in a bit more detail?

      Thank you for your recognition! According to Line 485 in the online PDF version (Methods part 4.6.3), it says: “An increasing colonization trend of warm-adapted species and increasing extinction trend of cold-adapted species are two main expected processes that cause thermophilization (Fourcade et al., 2021). To test our third prediction about the mediating effect of habitat fragmentation, we selected warm-adapted species that had an increasing trend in colonization rate (positive year effect in colonization rate) and cold-adapted species that had an increasing extinction rate (positive year effect in extinction rate)…..”

      We carefully checked the code at the Figshare link and found that the MSOM JAGS code had not been uploaded before. Very sorry for that. It can now be found in the document [MOSM.R] at https://figshare.com/s/7a16974114262d280ef7. We hope the code, together with the modeling process in section 4.5 of the Methods, helps in understanding the whole modeling process. Besides, we would like to explain how we determined the temporal trend in colonization or extinction of each species, as related to Line 485. Let’s take the model of species-specific extinction rate as an example:

      In this model, “Island” was a random effect and “Year” was added as a random slope, thus allowing the “year effect” (that is, the temporal trend) of a species’ extinction rate to vary with “island”. Further, the interaction effect between island variables (isolation, area) was added to test whether the “year effect” was related to island area or isolation.

      Because we are only interested in warm-adapted species that have a positive temporal trend in colonization and cold-adapted species that have a positive temporal trend in extinction, which are the two main processes underlying thermophilization, we chose warm-adapted species that have a positive year effect in colonization and cold-adapted species that have a positive year effect in extinction. We hope this explanation and the JAGS code help if you are confused about this part.

      Hope these explanations can make it clearer.

      (12) Figure 1: to me, it would be more intuitive to put the landscape configuration in the titles of the panels b, c, and d instead of "only" the mechanisms. E.g. they could be: a) fragmented islands with low climate buffering; b) small islands with low habitat heterogeneity; c) isolated islands with dispersal limitations?

      It is also slightly confusing that the bird communities are above "island" in the middle of the three fragmented habitats - which all look a bit different in terms of tree species and structure which makes the reader first think that it has something to do with the "new" species community. so maybe worth rethinking how to illustrate the three fragmented islands?

      We would like to thank you for your nice proposition. Firstly, it’s a good idea to put the landscape configuration in the title of the panels b, c, d. The new title (a) is “Fragmented islands with low climate buffering”, title (b) is “Small islands with low habitat heterogeneity”, and title (c) is “Isolated patches with dispersal limitations”.

      Second, we realized that putting the “bird community” above “island” in the middle of the three patches is a bit confusing. Actually, we wanted to show bird communities only on that one island in the middle. The other two patches are only there to represent a fragmented background. To avoid misunderstanding, we added a sentence in the legend of Figure 1 on Lines 778-780:

      “The three distinct patches signify a fragmented background and the community in the middle of the three patches was selected to exhibit colonization-extinction dynamics in fragmented habitats.”

      (13) Figure 4: please add the description of the color code for panel a.

      Sorry for the unclear description. The vertical dashed line indicates the median value of STI for 60 species, as a separation of warm-adapted species and cold-adapted species. We have added these details on Lines 807-809:

      “The dotted vertical line indicates the median of STI values. Cold-adapted species are plotted in blue and warm-adapted species are plotted in orange.”

      (14) Figure 5: You could consider adding this as panel c to Figure 4 as it depicts the same thing as in 4a but for CTI-abundance.

      Thank you for this advice. We have moved the original Figure 5 to Figure 4c. Previous Figure 6 thus turned into Figure 5. All corresponding citations in the main text were checked to adapt to the new index. The new figure is now on Lines 801-815:

      References

      Ferraz, G., Russell, G. J., Stouffer, P. C., Bierregaard Jr, R. O., Pimm, S. L., & Lovejoy, T. E. (2003). Rates of species loss from Amazonian forest fragments. Proceedings of the National Academy of Sciences, 100(24), 14069-14073. doi:10.1073/pnas.2336195100

      Fourcade, Y., WallisDeVries, M. F., Kuussaari, M., van Swaay, C. A., Heliölä, J., & Öckinger, E. (2021). Habitat amount and distribution modify community dynamics under climate change. Ecology Letters, 24(5), 950-957. doi:10.1111/ele.13691

      Gaüzère, P., Princé, K., & Devictor, V. (2017). Where do they go? The effects of topography and habitat diversity on reducing climatic debt in birds. Global Change Biology, 23(6), 2218-2229. doi:10.1111/gcb.13500

      Gonzalez, A. (2000). Community relaxation in fragmented landscapes: the relation between species richness, area and age. Ecology Letters, 3(5), 441-448. doi:10.1046/j.1461-0248.2000.00171.x

      Haddad, N. M., Brudvig, L. A., Clobert, J., Davies, K. F., Gonzalez, A., Holt, R. D., . . . Collins, C. D. (2015). Habitat fragmentation and its lasting impact on Earth’s ecosystems. Science Advances, 1(2), e1500052. doi:10.1126/sciadv.1500052

      Richard, B., Dupouey, J. l., Corcket, E., Alard, D., Archaux, F., Aubert, M., . . . Macé, S. (2021). The climatic debt is growing in the understorey of temperate forests: Stand characteristics matter. Global Ecology and Biogeography, 30(7), 1474-1487. doi:10.1111/geb.13312

      Si, X., Pimm, S. L., Russell, G. J., & Ding, P. (2014). Turnover of breeding bird communities on islands in an inundated lake. Journal of Biogeography, 41(12), 2283-2292. doi:10.1111/jbi.12379

    1. Author Response:

      Reviewer #1 (Public review):

      Summary:

      Fallah and colleagues characterize the connectivity between two basal ganglia output nuclei, the SNr and GPe, and the pedunculopontine nucleus, a brainstem nucleus that is part of the mesencephalic locomotor region. Through a series of systematic electrophysiological studies, they find that these regions target and inhibit different populations of neurons, with anatomical organization. Overall, SNr projects to PPN and inhibits all major cell types, while the GPe inhibits glutamatergic and GABAergic PPN neurons, and preferentially in the caudal part of the nucleus. Optogenetic manipulation of these inputs had opposing effects on behavior - SNr terminals in the PPN drove place aversion, while GPe terminals drove place preference.

      Strengths:

      This work is a thorough and systematic characterization of a set of relatively understudied circuits. They build on the classic notions of basal ganglia connectivity and suggest a number of interesting future directions to dissect motor control and valence processing in brainstem systems.

      We thank the reviewers for these positive comments.

      Weaknesses:

      Characterization of the behavioral effects of manipulations of these PPN input circuits could be further parsed, for a better understanding of the functional consequences of the connections demonstrated in the ephys analyses.

      We will further analyze our behavioral data to reveal more nuanced functional effects.

      All the cell type recording studies showing subtle differences in the degree of inhibition and anatomical organization of that inhibition suggest a complex effect of general optogenetic manipulation of SNr or GPe terminals in the PPN. It will be important to determine if SNr or GPe inputs onto a particular cell type in PPN are more or less critical for how the locomotion and valence effects are demonstrated here.

      This is a really interesting future direction and we will expand on these points in the discussion.

      Reviewer #2 (Public review):

      Summary:

      Fallah et al carefully dissect projections from SNr and GPe - two key basal ganglia nuclei - to the PPN, an important brainstem nucleus for motor control. They consider inputs from these two areas onto 3 types of downstream PPN neurons: GABAergic, glutamatergic, and cholinergic neurons. They also carefully map connectivity along the rostrocaudal axis of the PPN.

      Strengths:

      The slice electrophysiology work is technically well done and provides useful information for further studies of PPN. The optogenetics and behavioral studies are thought-provoking, showing that SNr and GPe projections to PPN play distinct roles in behavior.

      We appreciate the reviewer’s positive evaluation.

      Weaknesses:

      Although the optogenetics and behavioral studies are intriguing, they are somewhat difficult to fit together into a specific model of circuit function. Perhaps the authors can work to solidify the connection between these two arms of the work.

      We will expand on these topics in the discussion.

      (1) Male and female mice are used, but the authors do not discuss any analysis of sex differences. If there are no sex differences, it is still useful to report data disaggregated by sex in addition to pooled data.

      While we do not have sufficient n for a well-powered analysis of sex differences in behavior, we find that both male and female mice increase movement in response to SNr axon stimulation and decrease movement in response to GPe axon stimulation. We will expand on this further in the revised manuscript.

      (2) There is some lack of clarity in the current manuscript on the ages used - 2-5 months vs "at least 7 weeks." Is 7 weeks the time of virus injection surgery, then recordings 3 weeks later (at least 10 weeks)? Please clarify if these ages apply equally to electrophysiological and behavioral studies. If the age range used for the test is large, it may be useful to analyze and report if there are age-related effects.

      7 weeks is the youngest age at which mice used for electrophysiology were injected, and all were used for electrophysiology between 2-5 months. For behavior, the youngest mice used were 11 weeks old at time of behavior (8 weeks old at injection). Mice in the GPe-stimulated condition were 110 ± 7.4 SEM days old and mice in the SNr-stimulated condition 132 ± 23.4 SEM days old. We will add these details to the revised manuscript.

      In addition, we have correlated distance traveled at baseline and during stimulation with age for both SNr and GPe stimulated conditions. Baseline distance traveled did not correlate with age, but there was a trend toward more movement during stimulation with older mice in the SNr axon stimulation group. We will discuss this in the revised manuscript.

      (3) Were any exclusion criteria applied, e.g. to account for missed injections?

      All injection sites and implant sites were within our range of acceptability, so we did not exclude any mice for missed injections.

      (4) 28-34degC is a fairly wide range of temperatures for electrophysiological recording, which could affect kinetics.

      This is an important consideration. We have checked our main measurement of current amplitude in the condition where we found significant differences between rostral and caudal PPN (SNr to Vglut2 PPN neurons) against temperature and found no correlation (Pearson’s r value = -0.0076). Similarly, we found no correlation between baseline (pre-opto) firing frequency and temperature (r = -0.068).
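      The kind of check described above can be sketched in a few lines. This is a minimal illustration of computing Pearson's r between recording temperature and current amplitude, not the authors' analysis code; the function name `pearson_r` and all data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical recordings: bath temperature (deg C) and current amplitude (pA).
temps_c = [28.0, 29.5, 31.0, 32.5, 34.0]
amps_pa = [120.0, 95.0, 130.0, 90.0, 125.0]
r = pearson_r(temps_c, amps_pa)
print(f"Pearson r = {r:.3f}")
```

      An r value near zero, as reported in the response, would indicate no linear relationship between temperature and the measured amplitude within the sampled range.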

      (5) It would be good to report the number of mice used for each condition in addition to n=cells. Statistically, it would be preferable not to assume that each cell from the same mouse is an independent measurement and to use a nested ANOVA.

      For electrophysiology, the number of mice used in each experiment was 6 (3 male, 3 female). In the manuscript ‘N’ represents number of mice and ‘n’ represents number of cells. Because of the unpredictability of how many healthy cells can be recorded from one mouse, our data were planned to be collected with n=cells, and are underpowered for a nested ANOVA. However, rostral and caudal data were collected from the same mice. While we do not have sufficient paired data for each parameter, analyzing one of our main and most important findings with a paired comparison (with biological replicates being mice) shows a statistically significant difference in the inhibitory effect of SNr axon stimulation on firing rate between rostral and caudal glutamatergic neurons (p=0.031, Wilcoxon signed rank test).
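      For readers unfamiliar with paired tests on small samples, the sketch below computes an exact two-sided Wilcoxon signed-rank p-value by enumerating all sign patterns. It is an illustration only: the function name `exact_signed_rank_p` and the per-mouse differences are hypothetical, and it assumes no zero or tied differences. With six pairs, the smallest p-value this exact test can return is 2/64 = 0.03125, reached when all six differences share the same sign. The response does not state whether the reported p = 0.031 arose in this way.

```python
from itertools import product


def exact_signed_rank_p(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value.

    Assumes no zero differences and no tied absolute differences,
    so the ranks are simply 1..n.
    """
    n = len(diffs)
    # Rank the differences by absolute size (1 = smallest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under the null, each rank is equally likely to carry a + or - sign.
    null = [
        sum(r for r, s in zip(range(1, n + 1), signs) if s)
        for signs in product((0, 1), repeat=n)
    ]
    total = len(null)
    p_low = sum(w <= w_plus for w in null) / total
    p_high = sum(w >= w_plus for w in null) / total
    return min(1.0, 2 * min(p_low, p_high))


# Hypothetical per-mouse differences (rostral minus caudal), one per mouse.
diffs = [0.8, 1.5, 2.1, 0.4, 1.1, 2.9]
p_value = exact_signed_rank_p(diffs)
print(f"exact two-sided p = {p_value:.5f}")
```

      Treating each mouse as one paired observation, as the authors describe, avoids counting cells from the same animal as independent measurements.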

      Reviewer #3 (Public review):

      Summary:

      The study by Fallah et al provides a thorough characterization of the effects of two basal ganglia output pathways on cholinergic, glutamatergic, and GABAergic neurons of the PPN. The authors first found that SNr projections spread over the entire PPN, whereas GPe projections are mostly concentrated in the caudal portion of the nucleus. Then the authors characterized the postsynaptic effects of optogenetically activating these basal ganglia inputs and identified the PPN's cell subtypes using genetically encoded fluorescent reporters. Activation of inputs from the SNr inhibited virtually all PPN neurons. Activation of inputs from the GPe predominantly inhibited glutamatergic neurons in the caudal PPN, and to a lesser extent GABAergic neurons. Finally, the authors tested the effects of activating these inputs on locomotor activity and place preference. SNr activation was found to increase locomotor activity and elicit avoidance of the optogenetic stimulation zone in a real-time place preference task. In contrast, GPe activation reduced locomotion and increased the time in the RTPP stimulation zone.

      Strengths:

      The evidence of functional connectivity of SNr and GPe neurons with cholinergic, glutamatergic, and GABAergic PPN neurons is solid and reveals a prominent influence of the SNr over the entire PPN output. In addition, the evidence of a GPe projection that preferentially innervates the caudal glutamatergic PPN is unexpected and highly relevant for basal ganglia function.

      Opposing effects of two basal ganglia outputs on locomotion and valence through their connectivity with the PPN.

      Overall, these results provide an unprecedented cell-type-specific characterization of the effects of basal ganglia inputs in the PPN and support the well-established notion of a close relationship between the PPN and the basal ganglia.

      We thank the reviewer for their positive comments.

      Weaknesses:

      The behavioral experiments require further analysis as some motor effects could have been averaged out by analyzing long segments.

      We will further analyze our motor effects in the revised manuscript.

      Additional controls are needed to rule out a motor effect in the real-time place preference task.

      This is an important point. Our use of unilateral stimulation in the RTPP task reduces potential motor effects, and our supplemental videos show that the mice can easily escape and enter the stimulated zone. However, we can't completely rule out a motor component. To delve into this further, we analyzed mouse speed in the RTPP task. We find that in both SNr and GPe stimulation conditions, the maximum speed of the mouse is not different in the stimulated vs unstimulated zone. We will further analyze mouse speed at the transition into and out of the stimulated zone to identify any acute motor effects in this experiment.

      Importantly, the location of the stimulation is not reported even though this is critical to interpret the behavioral effects.

      The implant locations were generally over the middle-to-rostral PPN and we will clarify this in the revised manuscript. These locations are shown in figure 7B.

      There are some concerns about the possible recruitment of dopamine neurons in the SNr experiments.

      We are very interested in this possibility and plan to discuss this with more clarity in a revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations For The Authors): 

      This is not a recommendation. While reading old literature, I found some interesting facts. The shape of the neurocranium in monotremes, birds, and mammals, at least in early stages, resembles the phenotype of dact1/2, wnt11f2, or syu mutants. For more details, see DeBeer's 'The Development of the Vertebrate Skull', 1937, Plate 137.

      Thank you for pointing this out. It is indeed interesting.

      Minor Comments: 

      • Lines 64, 66, and 69: same citation without interruption: Heisenberg, Brand et al. 1996

      Revised line 76. 

      • Lines 101 and 102: same citation without interruption: Li, Florez et al. 2013 

      Revised line 118.

      • Lines 144, 515, 527, and 1147: should be wnt11f2 instead of wntllf2 - if not, then explain 

      Revised lines 185, 625, 640,1300.

      • Lines 169 and 171: incorrect figure citation: Fig 1D - correct to Fig 1F 

      Revised lines 217, 219.

      • Line 173: delete (Fig. S1) 

      Revised line 221.

      • Line 207: indicate that both dact1 and dact2 mRNA levels increased, noting a 40% higher level of dact2 mRNA after deletion of 7 bp in the dact2 gene 

      Revised line 265.

      • Line 215: Fig 1F instead of Fig 1D 

      Revised line 217.

      • Line 248: unify naming of compound mutants to either dact1/2 or dact1/dact2 compound mutants 

      Revised to dact1/2 throughout.

      • Line 259: incorrect figure citation: Fig S1 - correct to Fig S2D/E 

      Revised line 324.

      • Line 302: correct abbreviation position: neural crest (NCC) cell - change to neural crest cell (NCC) population 

      Revised line 380.

      • Line 349: repeating kny mut definition from line 70 may be unnecessary 

      Revised line 434.

      • Line 351: clarify distinction between Fig S1 and Fig S2 in the supplementary section 

      Revised line 324.

      • Line 436: refer to the correct figure for pathways associated with proteolysis (Fig 7B) 

      Revised line 530.

      • Line 446-447: complete the sentence and clarify the relevance of smad1 expression, and correct the use of "also" in relation to capn8 

      Revised line 567.

      • Line 462: clarify that this phenotype was never observed in wildtype larvae, and correct figure reference to exclude dact1+/- dact2+/- 

      Revised lines 563, 568.

      • Line 463: explain the injection procedure into embryos from dact1/2+/- interbreeding 

      Revised line 565.

      • Lines 488 and 491: same citation without interruption: Waxman, Hocking et al. 2004 

      Revised line 591.

      • Line 502: maintain consistency in referring to TGF-beta signaling throughout the article 

      Revised throughout.

      • Line 523: define CNCC; previously used only NCC 

      Revised to cranial NCC throughout.

      • Line 1105: reconsider citing another work in the figure legend 

      Revised line 1249.

      • Line 1143: consider using "mutant" instead of "mu" 

      Revised line 1295.

      • Fig 2A/B: indicate the number of animals used ("n") 

      N is noted on line 1274.

      • Fig 2C, D, E: ensure uniform terminology for control groups ("wt" vs. "wildtype") 

      Revised in figure.

      • Fig 7C: clarify analysis of dact1/2-/- mutant in lateral plate mesoderm vs. ectoderm 

      Revised line 1356.

      • Fig 8A: label the figure to indicate it shows capn8, not just in the legend 

      Revised.

      • Fig 8D: explain the black/white portions and simplify to highlight important data 

      Revised.

      • Fig S2: add the title "Figure S2" 

      Revised.

      • Consider omitting the sentence: "As with most studies, this work has contributed some new knowledge but generated more questions than answers." 

      Revised line 720.

      Reviewer #2 (Recommendations For The Authors): 

      Major comments: 

      (1) The authors have addressed many of the questions I had, including making the biological sample numbers more transparent. It might be more informative to use n = n/n, e.g. n = 3/3, rather than just n = 3. Alternatively, that information can be given in the figure legend or in the form of penetrance %. 

      The compound heterozygote breeding and phenotyping analyses were not carried out in such a way that we can comment on the precise % penetrance of the ANC phenotype, as we did not dissect every ANC and genotype every individual that resulted from the triple heterozygote in crossings. We collected phenotype/genotype data until we obtained at least three replicates.

      We did genotype every individual resulting from dact1/2 dHet crosses to correlate genotype to the phenotype of the embryonic convergent extension phenotype and narrowed ethmoid plate (Fig. 2A, Fig. 3) which demonstrated full penetrance.

      (2) The description of the expression of dact1/2 and wnt11f2 is not consistent with what the images are showing. In the revised figure 1 legend, the author says "dact2 and wnt11f2 transcripts are detected in the anterior neural plate" (line 1099), but it's hard to see wnt11f2 expression in the anterior neural plate in 1B. The authors then again said "wnt11f2 is also expressed in these cells", referring to the anterior neural plate and polster (P), notochord (N), paraxial and presomitic mesoderm (PM) and tailbud (TB). However, other than the notochord expression, other expression is actually quite dissimilar between dact2 and wnt11f2 in 1C. The authors should describe their expression more accurately and take that into account when considering their function in the same pathway. 

      We have revised these sections to more carefully describe the expression patterns. We have added references to previous descriptions of wnt11 expression domains.

      (3) Similar to (2), while the Daniocell was useful in demonstrating that expression of dact1 and dact2 are more similar to expression of gpc4 and wnt11f2, the text description of the data is quite confusing. The authors stated "dact2 was more highly expressed in anterior structures including cephalic mesoderm and neural ectoderm while dact1 was more highly expressed in mesenchyme and muscle" (lines 174-176). However, the Daniocell seems to show more dact1 expression in the neural tissues than dact2, which would contradict the in situ data as well. I think the problem is in part due to the dataset containing cells from many different stages, and it might be helpful to include a plot of the cells at different stages, as well as the cell types, both of which are available from the Daniocell website. 

      We have revised the text to focus the Daniocell analysis on the overall and general expression patterns. Line 220.

      (4) The authors used the term "morphological movements" (line 337) to describe the cause of dact1/2 phenotypes. Please clarify what this means. Is it cell movement? Or is it the shape of the tissues? What does "morphological movements" really mean and how does that affect the formation of the EP by the second stream of NCCs? 

      We have revised this sentence to improve clarity. Line 416.

      (5) In the first submission, only 1 out of 142 calpain-overexpressing animals phenocopied dact1/2 mutants and that was a major concern regarding the functional significance of calpain 8 in this context. In the revised manuscript, the authors demonstrated that more embryos developed the phenotype when they are heterozygous for both dact1/2. While this is encouraging, it is interesting that the same phenomenon was not observed in the dact1-/-; dact2+/- embryos (Fig. 6D). The authors did not discuss this and should provide some explanation. The authors should also discuss sufficiency vs requirement tested in this experiment. However, given that this is the most novel aspect of the paper, performing experiments to demonstrate requirements would be important. 

      We have added a statement regarding the non-effect in dact1-/-;dact2+/- embryos. Lines 568-570. We have also added discussion of sufficiency vs necessity/requirement testing. Lines 676-679.

      (6) Related to (5), the authors cited figure 8c when mentioning 0/192 gfp-injected embryos developed EP phenotypes. However, figure 8c is dact1/2 +/- embryos. The numbers also don't match the numbers in Figure 8d. Please add relevant/correct figures. 

      The text has been revised to distinguish between our overexpression experiment in wildtype embryos (data not shown) versus overexpression in dact1/2 double het in cross embryos (Fig 8).

      Minor comments: 

      (1) Fig 1 legend line 1106 "the midbrain (MP)" should be MB 

      Revised line 1250.

      (2) Wntllf2, instead of wnt11f2, (i.e. the letter "l" rather than the number "1") was used in 4 instances, line 144, 515, 527, 1147 

      Revised lines 185, 625, 640, 1300.

      (3) The authors replaced ANC with EP in many instances, but ANC is left unchanged in some places and it's not defined in the text. It's first mentioned in line 170.

      Revised line 218.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript gives a broad overview of how to write NeuroML, and a brief description of how to use it with different simulators and for different purposes - cells to networks, simulation, optimization, and analysis. From this perspective, it can be an extremely useful document to introduce new users to NeuroML.

      We are glad the reviewer found our manuscript useful.

      However, the manuscript itself seems to lose sight of this goal in many places, and instead, the description at times seems to target software developers. For example, there is a long paragraph on the board and user community. The discussion on simulator tools seems more for developers, not users. All the information presented at the level of a developer is likely to be distracting to eLife readership.

      To make the paper less developer focussed and more accessible to the end user we have shortened the long paragraphs on the board and user community (and moved some of this text to the Methods section; lines: 524-572 in the document with highlighted changes). We have also made the discussion on simulator tools more focussed on the user (lines 334-406). However, we believe some information on the development and oversight of NeuroML and its community base are relevant to the end user, so we have not removed these completely from the main text.

      Strengths:

      The modularity of NeuroML is indeed a great advantage. For example, the ability to specify the channel file allows different channels to be used with different morphologies without redundancy. The hierarchical nature of NeuroML also is commendable, and well illustrated in Figures 2a through c.

      The number of tools available to work with NeuroML is impressive.

      The abstract, beginning, and end of the manuscript present and discuss incorporating NeuroML into research workflows to support FAIR principles.

      Having a Python API and providing examples using this API is fantastic. Exporting to NeuroML from Python is also a great feature.

      We are glad the reviewer appreciated the design of NeuroML and its support for FAIR principles.

      Weaknesses:

      Though modularity is a strength, it is unclear to me why the cell morphology isn't also treated similarly, i.e., specify the morphology of a multi-compartmental model in a separate file, and then allow the cell file to specify not only the files containing channels, but also the file containing the multi-compartmental morphology, and then specify the conductance for different segment groups. Also, after pynml_write_neuroml2_file, you would not have a super long neuroML file for each variation of conductances, since there would be no need to rewrite the multi-compartmental morphology for each conductance variation.

      We thank the reviewer for highlighting this shortcoming in NeuroML2. We have now added the ability to reference externally defined (e.g. in another file) <morphology> and <biophysicalProperties> elements from <cells>. This has enabled the morphologies and/or specification of ionic conductances to be separated out and enables more streamlined analysis of cells with different properties, as requested. Simulators NEURON, NetPyNE and EDEN already support this new form. Information on this feature has been added to https://docs.neuroml.org/Userdocs/ImportingMorphologyFiles.html#neuroml2 and also mentioned in the text (lines 188-190).

      This would be especially important for optimizations, if each trial optimization wrote out the neuroML file, then including the full morphology of a realistic cell would take up excessive disk space, as opposed to just writing out the conductance densities. As long as cell morphology must be included in every cell file, then NeuroML is not sufficiently modular, and the authors should moderate their claim of modularity (line 419) and building blocks (551).

      We believe the new functionality outlined above addresses this issue, as a single file containing the <morphology> element could be referenced, while a much smaller file, containing the channel distributions in a <biophysicalProperties> element would be generated and saved on each iteration of the optimisation.
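
      As a rough illustration of the disk-space argument above, the following Python sketch writes one shared morphology document and several small per-trial documents that only reference it. It uses the standard library only and is not validated against the NeuroML schema. The `morphology` and `biophysicalProperties` attributes on the cell follow our reading of the feature described in the response; the file name `shared_morphology.nml`, the ids, the channel name `na_chan` and the conductance values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def shared_morphology(n_segments):
    """Standalone <morphology> document, written once and reused by every trial."""
    root = ET.Element("neuroml", id="shared_morphology")
    morph = ET.SubElement(root, "morphology", id="morph0")  # hypothetical id
    for i in range(n_segments):
        seg = ET.SubElement(morph, "segment", id=str(i))
        if i == 0:
            # The root segment carries its own proximal point.
            ET.SubElement(seg, "proximal", x="0", y="0", z="0", diameter="2")
        else:
            # Later segments start where their parent ends.
            ET.SubElement(seg, "parent", segment=str(i - 1))
        ET.SubElement(seg, "distal", x=str(10 * (i + 1)), y="0", z="0", diameter="2")
    return ET.tostring(root, encoding="unicode")


def trial_document(trial, g_na):
    """Small per-trial document: conductances plus a cell that refers to the shared morphology."""
    root = ET.Element("neuroml", id=f"trial_{trial}")
    ET.SubElement(root, "include", href="shared_morphology.nml")  # hypothetical file name
    bp = ET.SubElement(root, "biophysicalProperties", id=f"bp_{trial}")
    mp = ET.SubElement(bp, "membraneProperties")
    ET.SubElement(
        mp, "channelDensity", id="na_all", ionChannel="na_chan",
        condDensity=f"{g_na} mS_per_cm2", erev="50 mV", ion="na",
    )
    # Assumed form of the external references described in the response.
    ET.SubElement(
        root, "cell", id=f"cell_{trial}",
        morphology="morph0", biophysicalProperties=f"bp_{trial}",
    )
    return ET.tostring(root, encoding="unicode")


morph = shared_morphology(1000)
trials = [trial_document(k, g) for k, g in enumerate([100.0, 120.0, 140.0])]
print("morphology bytes:", len(morph))
print("per-trial bytes:", [len(t) for t in trials])
```

      In this sketch the per-trial documents contain no segment data at all, so their size does not grow with the complexity of the reconstructed cell.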

      In addition, this is very important for downloading NeuroML-compliant reconstructions from NeuroMorpho.org. If the cell morphology cannot be imported, then the user has to edit the file downloaded from NeuroMorpho.org, and provenance can be lost.

      While the NeuroMorpho.Org website does support converting reconstructed morphologies in SWC format to NeuroML, this export feature is no longer supported on most modern browsers due to it being based on Java Applet technologies. However, a desktop version of this application, CVApp, is actively maintained (https://github.com/NeuroML/Cvapp-NeuroMorpho.org), and we have updated it to support export of the SWC to the standalone <morphology> element form of NeuroML discussed above. Additionally, a new Python application for conversion of SWC to NeuroML is in development and will be incorporated into PyNeuroML (Google Summer of Code 2024). Our documentation has been updated with the recommended use of SWC in NeuroML based modelling here: https://docs.neuroml.org/Userdocs/Software/Tools/SWC.html

      We have also included URLs to the tool and the documentation in the paper (lines: 473-474).

      SWC files, however, cannot be used “as is” for modelling since they only include information (often incomplete—for example a single point may represent a soma in SWC files) on the points that make the cell, but not on the sections/segments/cables that these form. Therefore, NeuroML and other simulation tools, including NEURON, must convert these into formats suitable for simulation. The suggested pipeline for use of NeuroMorpho SWC files would therefore be to convert them to NeuroML, check that they represent the intended compartmentalisation of the neuron and then use them in models.
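
      The point-versus-segment distinction can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used by CVApp or PyNeuroML: it assumes the usual seven-column SWC layout (id, type, x, y, z, radius, parent, with parent -1 for the root and type 1 for soma), joins each point to its parent to form a segment, and flags the single-point soma case mentioned above. The function names and the sample data are hypothetical.

```python
SAMPLE_SWC = """\
# id type x y z radius parent
1 1 0 0 0 5 -1
2 3 5 0 0 1 1
3 3 10 0 0 1 2
4 3 5 5 0 1 2
"""


def parse_swc(text):
    """Read SWC points into a dict keyed by point id, skipping comments and blank lines."""
    points = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pid, ptype, x, y, z, radius, parent = line.split()[:7]
        points[int(pid)] = {
            "type": int(ptype),
            "xyz": (float(x), float(y), float(z)),
            "radius": float(radius),
            "parent": int(parent),
        }
    return points


def points_to_segments(points):
    """Turn each non-root point into a (parent_id, point_id) segment."""
    return [(p["parent"], pid) for pid, p in points.items() if p["parent"] != -1]


def has_single_point_soma(points):
    """True when the soma (type 1) is represented by exactly one point."""
    return sum(1 for p in points.values() if p["type"] == 1) == 1


points = parse_swc(SAMPLE_SWC)
segments = points_to_segments(points)
print(segments)
print("single-point soma:", has_single_point_soma(points))
```

      A single-point soma yields no soma segment of its own here, which is one reason a converted morphology should be checked before it is used in a model.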

      To ensure that provenance is maintained in all NeuroML models (including conversions from other formats), NeuroML supports the addition of RDF annotations using the COMBINE annotation specifications in model files: https://docs.neuroml.org/Userdocs/Provenance.html. We have added this information to the paper (lines: 464-465).

      Also, Figure 2d loses the hierarchical nature by showing ion channels, synapses, and networks as separate main branches of NeuroML.

      While an instance of an ion channel is on a segment, in a cell, in a population (and hence there is a hierarchy between them), in terms of layout in a NeuroML file the ion channel is defined at the “top level” so that it can be referenced and used by multiple cells, the cell definitions are also defined top level, and used in multiple populations, etc. There are multiple ways to depict these relationships between entities, and we believe Fig 2d complements Fig 2a-c (which is more hierarchical), by emphasising the different categories of entities present in NeuroML files. We have modified the caption of Figure 2d to clarify that it shows the main categories of elements included in the NeuroML standard in their respective hierarchies.

      In Figure 5, the difference between the core and native simulator is unclear.

      We have modified the figure and text (line 341) to clarify this. We now say “reference” simulators instead of “core”. This emphasises that jNeuroML and pyLEMS are intended as reference implementations in each of their languages of how to interpret NeuroML models, as opposed to high performance simulators for research use. We have also updated the categorization of the backends in the text accordingly.

      What is involved in helper scripts?

      Simulators such as NetPyNE can import NeuroML into their own internal format, but require some boilerplate code to do this (e.g. the NetPyNE script calls the importNeuroML2SimulateAnalyze() method with appropriate parameters). The NeuroML tools generate short scripts that use this boilerplate code. We have renamed “helper scripts” to “import scripts” for clarity (Figure 5 and its caption).

      I thought neurons could read NeuroML? If so, why do you need the export simulator-specific scripts?

      The NEURON simulator does have some NeuroML functionality (it can export cells, though not the full network, to NeuroML 2 through its ModelView menu), but does not natively support reading/importing of NeuroML in its current version. But this is not a problem as jNeuroML/PyNeuroML translates the NeuroML model description into NEURON’s formats: Python scripts/HOC/Nmodl which NEURON then executes.

      As NEURON is the simulator which allows simulation of the widest range of NeuroML elements, we have (in agreement with the NEURON developers) concentrated on incorporating the best support for NeuroML import/export in the latest (easy to install/update) releases of PyNeuroML, rather than adding this to the NEURON source code. NEURON’s core features have been very stable for years and many versions of the simulator are used by modellers - installing the latest PyNeuroML gives them the latest NEURON support without having to reinstall the latter.

      In addition, it seems strange to call something the "core" simulation engine, when it cannot support multi-compartmental models. It is unclear why "other simulators" that natively support NeuroML cannot be called the core.

      We agree that this terminology was confusing. As mentioned above, we have changed “core simulator” to “reference simulator”, to emphasise the roles of these simulation engine options.

      It might be more helpful to replace this sort of classification with a user-targeted description. The authors already state which simulators support NeuroML and which ones need code to be exported. In contrast, lines 369-370 mention that not all NeuroML models are supported by each simulator. I recommend expanding this to explain which features are supported in each simulator. Then, the unhelpful separation between core and native could be eliminated.

      As suggested, we have grouped the simulators in terms of function and removed the core/ non-core distinction. We have also added a table (Table 3) in the appendices that lists what features each simulation engine supports and updated the text to be more user focussed (lines: 348-394).

      The body of the manuscript has so much other detail that I lose sight of how NeuroML supports FAIR. It is also unclear who is the intended audience. When I get to lines 336-344, it seems that this description is too much detail for the eLife audience. The paragraph beginning on line 691 is a great example of being unclear about who is the audience. Does someone wanting to develop NeuroML models need to understand XSD schema? If so, the explanation is not clear. XSD schema is not defined and instead explains NeuroML-specific aspects of XSD. Lines 734-735 are another example of explaining to code developers (not model developers).

      We have modified these sentences to be more suitable for the general eLife audience: we have moved the explanation of how the different simulator backends are supported to the more technically detailed Methods section (lines 882-942).

      While the results sections focus on documenting what users can do with NeuroML, the Methods sections include information on “how” the NeuroML and software ecosystem function. While the information in the methods sections may not be required by users who want to use the standard NeuroML model elements, those users looking to extend NeuroML with their own model entities and/or contribute these for inclusion in the NeuroML standard will require some understanding of how the schema and component types work.

      We have tried to limit this information to the bare minimum, pointing to online documentation where appropriate. XSD schemas are, for example, briefly introduced at the beginning of the section “The NeuroML XML Schema”. We have also included a link to the W3C documentation on XSD schemas as a footnote (line 724).

      Reviewer #2 (Public Review):

      Summary:

      Developing neuronal models that are shareable, reproducible, and interoperable allows the neuroscience community to make better use of published models and to collaborate more effectively. In this manuscript, the authors present a consolidated overview of the NeuroML model description system along with its associated tools and workflows. They describe where different components of this ecosystem lie along the model development pathway and highlight resources, including documentation and tutorials, to help users employ this system.

      Strengths:

      The manuscript is well-organized and clearly written. It effectively uses the delineated model development life cycle steps, presented in Figure 1, to organize its descriptions of the different components and tools relating to NeuroML. It uses this framework to cover the breadth of the software ecosystem and categorize its various elements. The NeuroML format is clearly described, and the authors outline the different benefits of its particular construction. As primarily a means of describing models, NeuroML also depends on many other software components to be of high utility to computational neuroscientists; these include simulators (ones that both pre-date NeuroML and those developed afterwards), visualization tools, and model databases.

      Overall, the rationale for the approach NeuroML has taken is convincing and well-described. The pointers to existing documentation, guides, and the example usages presented within the manuscript are useful starting points for potential new users. This manuscript can also serve to inform potential users of features or aspects of the ecosystem that they may have been unaware of, which could lower obstacles to adoption. While much of what is presented is not new to this manuscript, it still serves as a useful resource for the community looking for information about an established, but perhaps daunting, set of computational tools.

      We are glad the reviewer appreciated the utility of the manuscript.

      Weaknesses:

      The manuscript in large part catalogs the different tools and functionalities that have been produced through the long development cycle of NeuroML. As discussed above, this is quite useful, but it can still be somewhat overwhelming for a potential new user of these tools. There are new user guides (e.g., Table 1) and example code (e.g. Box 1), but it is not clear if those resources employ elements of the ecosystem chosen primarily for their didactic advantages, rather than general-purpose utility. I feel like the manuscript would be strengthened by the addition of clearer recommendations for users (or a range of recommendations for users in different scenarios).

      To make Table 1 more accessible to users and provide recommendations we have added the following new categories: Introductory guides aimed at teaching the fundamental NeuroML concepts; Advanced guides illustrating specific modelling workflows; and Walkthrough guides discussing the steps required for converting models to NeuroML. Box 1 has also been improved to clearly mark API and command line examples.

      For example, is the intention that most users should primarily use the core NeuroML tools and expand into the wider ecosystem only under particular circumstances? What are the criteria to keep in mind when making that decision to use alternative tools (scale/complexity of model, prior familiarity with other tools, etc.)? The place where it seems most ambiguous is in the choice of simulator (in part because there seem to be the most options there) - are there particular scenarios where the authors may recommend using simulators other than the core jNeuroML software?

      The interoperability of NeuroML is a major strength, but it does increase the complexity of choices facing users entering into the ecosystem. Some clearer guidance in this manuscript could enable computational neuroscientists with particular goals in mind to make better strategic decisions about which tools to employ at the outset of their work.

      As mentioned in the response to Reviewer 1, the term “core simulator” for jNeuroML was confusing, as it suggested that this is a recommended simulation tool. We have changed the description of jNeuroML to a “reference simulator” to clarify this (Figure 5 and lines 341, 353).

      In terms of giving specific guidance on which simulator to use, we have focussed on their functionality and limitations rather than recommending a specific tool (as simulator independent standards developers we are not in a position to favour particular simulators). While NEURON is the most widely used simulator currently, other simulation options (e.g. EDEN) have emerged recently which provide quite comprehensive NeuroML support and similar performance. Our approach is to document and promote all supported tools, while encouraging innovation and new developments. The new Table 3 in the Appendix gives a guide to assist users in choosing which simulator may best suit their needs and we have updated the text to include a brief description (lines 348-394).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I do not understand what the $comments mean in Box 1. It isn't until I get further in the text that I realize that those are command line equivalents to the Python commands.

      We thank the reviewer for highlighting this confusion. We’ve now explicitly marked the API usage and command line usage example columns to make this clearer. We have also used “>” instead of “$” now to indicate the command line.

      In Figure 9 Caption "Examples of analysis functions ..", the word analysis seems a misnomer, as these graphs all illustrate the simulation output and graphing of existing variables. I think analysis typically refers to the transformation of variables, such as spike counts and widths.

      To clarify this we have changed the caption to “Examples of visualizing biophysical properties of a NeuroML model neuron”.

      Figure 10: Why is the pulse generator part of a model? Isn't that the input to a model?

      Whether the input to the model is described separately from the NeuroML biophysical description or combined with it is a choice for the researcher. This is possible because in NeuroML any entity which has time varying states can be a NeuroML element, including the current pulse generator. In this simple example the input is contained within the same file (and therefore <neuroml> element) as the cell. However, this does not need to be the case. The cell could be fully specified in its own NeuroML file and then this can be included in other files which add different inputs to facilitate different simulation scenarios. The Python scripting interface facilitates these types of workflows.

      In the interest of modularity, can stim information be stored in a separate file and "included"?

      Yes, as mentioned above, the stimulus could be stored in a separate file.

      I find it strange to use a cell with mostly dimensionless numbers as an example. I think it would be more helpful to use a model that was more physiological.

      In choosing an example model type to use to illustrate the use of LEMS (Fig 12), NeuroML (Fig 10), XML Schema (Fig 11), the Python API (Fig 13) and online documentation (Fig 15), we needed an example which showed a sufficiently broad range of concepts (dimensional parameters, state variables, time derivatives), but which is sufficiently compact to allow a concise depiction of the key elements in figures, that fit in a single page (e.g. Fig 12). We felt that the Hindmarsh Rose model, while not very physiological, was well suited for this purpose (explaining the underlying technologies behind the NeuroML specification). The simplicity of the Hindmarsh Rose model is counterbalanced in the manuscript by the detailed models of neurons and circuits in Figures 7 & 9. The latter shows a morphologically and biophysically detailed cortical L5b pyramidal cell model.
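
      For readers unfamiliar with the Hindmarsh Rose model, a minimal Python sketch of its three state variables and time derivatives is given below. It uses the common textbook form of the equations with forward Euler integration; the parameter values, initial state and step size are illustrative assumptions and are not taken from the NeuroML component type or the manuscript figures.

```python
import math


def hindmarsh_rose(current=3.0, a=1.0, b=3.0, c=1.0, d=5.0,
                   r=0.006, s=4.0, x_rest=-1.6, dt=0.01, steps=20000):
    """Integrate the textbook Hindmarsh Rose equations with forward Euler.

    dx/dt = y - a*x^3 + b*x^2 - z + I
    dy/dt = c - d*x^2 - y
    dz/dt = r*(s*(x - x_rest) - z)
    """
    x, y, z = -1.6, -10.0, 2.0  # assumed initial state
    xs = []
    for _ in range(steps):
        dx = y - a * x**3 + b * x**2 - z + current
        dy = c - d * x**2 - y
        dz = r * (s * (x - x_rest) - z)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs.append(x)
    return xs


trace = hindmarsh_rose()
print("samples:", len(trace))
print("x range:", min(trace), max(trace))
```

      All quantities here are dimensionless, which is why the model is compact enough for the figures, and also why it is not very physiological.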

      In lines 710-714, it is unclear what is being validated. That all parameters are defined? Using the units (or lack thereof) defined in the schema?

      Validation against the schema is “level 1” validation where the model structure, parameters, parameter values and their units, cardinality, and element positioning in the model hierarchy are checked. We have updated the paragraph to include this information and to also point to Figure 6 where different levels of validation are explained.

      Lines 740 to 746 are confusing. If 1-1 between XSD and LEMS (1st sentence) then how can component types be defined in LEMS and NOT added to the standard? Which is it? 1-1 or not 1-1?

      For the curated model elements included in the NeuroML standard, there will be a 1-1 correspondence between their component type definitions in LEMS and type definitions in the XSD schema. New user defined component types (e.g. a new abstract cell model) can be specified in LEMS as required, and these do not need to be included in the XSD schema to be loaded/simulated. However, since they are not present in the schema definition of the core/curated elements, they cannot be validated against it (level 1 validation). We have modified the text to make this clearer (line: 778).

      Nonetheless, if the new type is useful for the wider community, it can be accepted by the Editorial Board, and at that stage it will be incorporated into the core types, and added to the Schema, to be part of “valid NeuroML”.

      Figure 12. select="synapses[*]/i" is not explained. Does /i mean that iSyn is divided by i, which is current (according to the sentence 3 lines after 766) or perhaps synapse number?

      We thank the reviewer for highlighting this confusion. We have now explained the construct in the text (lines 810-812). It denotes “select the i (current) values from all Attachments which have the id ‘synapses’”. These multiple values should be reduced down to a single value through addition, as specified by the attribute: reduce="add".
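
      The behaviour described above can be mimicked in plain Python. This is only a sketch of the semantics as stated in the response, not how a LEMS interpreter is implemented: attachments are modelled as a dictionary of lists, the path handling covers just the `name[*]/field` pattern, and the function name `select_and_reduce` is hypothetical. Support for a "multiply" reduction is included on the assumption that it mirrors "add".

```python
import math


def select_and_reduce(attachments, select, reduce="add"):
    """Collect one field from every attachment in a collection and reduce to a single value."""
    collection, field = select.split("[*]/")  # e.g. "synapses[*]/i" -> ("synapses", "i")
    values = [item[field] for item in attachments.get(collection, [])]
    if reduce == "add":
        return sum(values)
    if reduce == "multiply":
        return math.prod(values)
    raise ValueError(f"unsupported reduce: {reduce}")


# Three synapses attached to a cell, each exposing a current i (arbitrary units).
attachments = {"synapses": [{"i": 0.5}, {"i": 0.25}, {"i": -0.25}]}
i_syn = select_and_reduce(attachments, "synapses[*]/i", reduce="add")
print(i_syn)
```

      So the "/i" is a path separator selecting the current of each synapse, not a division.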

      The line after 766 says that "DerivedVariables, variables whose values depend on other variables". You should add "and that are not derivatives, which are handled separately" because by your definition derivatives are derived variables.

      Thank you. We have updated the text with your suggestion.

      Reviewer #2 (Recommendations For The Authors):

      - Figure 9: I found it somewhat confusing to have the header from the screenshot at the top ("Layer 5 Burst Accommodating Double Bouquet Cell (5)") not match the morphology shown at the bottom. It's not visually clear that the different panels in Figure 9 may refer to unrelated cells/models.

      Thank you for pointing this out. We have replaced the NeuroML-DB screenshot with one of the same Layer 5b pyramidal cells shown in the panels below it.

      Additional change:

      Figure 7c (showing the NetPyNE-UI interface) has been replaced. Previously, this displayed a 3D model which had been created in NetPyNE itself, but now shows a model which has been created in NeuroML and imported for display/simulation in NetPyNE-UI, and therefore better illustrates NeuroML functionality.

    1. Author response:

      To Reviewer #1:

      Thank you for your kind words regarding the novelty, study design, and evidence presented. We will clarify our language when describing fuzzy local-linear regression discontinuity analysis. We thank you for this feedback as our goals are to introduce these methods to a neuroscientific audience. Lastly, we will respond and clarify the methodological points, including post-selection inference, bandwidths, and Bayesian analysis in version 2.

      To Reviewers #2 and #3:

      We thank you both for your constructive feedback, specifically in highlighting 1) the scope of the intervention and 2) the UKB-neuro healthy volunteer bias. In the next manuscript version, we will expand our discussion of plausible reasons for not finding an effect, weighing up the strengths and limitations of our study in three respects: statistical (RD power), design-based (lack of representativeness vs. large sample), and mechanistic (the impact, or lack thereof, of one year of education on neural plasticity decades later). As we believe the approach of natural experiments with RD designs has considerable promise for the field of population cognitive neuroscience beyond this particular study, we will address each of these points within a broader section on how to optimize the insight, power, and inferences gained in future work within and beyond Biobank. Moreover, we will situate our discussion of the magnitude of the educational intervention within a broader discussion of cognitive training versus education, and of short- versus long-term effects. We believe revising the manuscript will improve interpretation for the reader and thank you for your in-depth feedback. Lastly, we will provide a point-by-point response in the next version.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      The conserved AAA-ATPase PCH-2 has been shown in several organisms including C. elegans to remodel classes of HORMAD proteins that act in meiotic pairing and recombination. In some organisms the impact of PCH-2 mutations is subtle but becomes more apparent when other aspects of recombination are perturbed. Patel et al. performed a set of elegant experiments in C. elegans aimed at identifying conserved functions of PCH-2. Their work provides such an opportunity because in C. elegans meiotically expressed HORMADs localize to meiotic chromosomes independently of PCH-2. Work in C. elegans also allows the authors to focus on nuclear PCH-2 functions as opposed to cytoplasmic functions also seen for PCH-2 in other organisms. 

      The authors performed the following experiments: 

      (1) They constructed C. elegans animals with SNPs that enabled them to measure crossing over in intervals that cover most of four of the six chromosomes. They then showed that double crossovers, which were common on most of the four chromosomes in wild-type, were absent in pch-2. They also noted shifts in crossover distribution in the four chromosomes. 

      (2) Based on the crossover analysis and previous studies they hypothesized that PCH-2 plays a role at an early stage in meiotic prophase to regulate how SPO-11 induced double-strand breaks are utilized to form crossovers. They tested their hypothesis by performing ionizing irradiation and depleting SPO-11 at different stages in meiotic prophase in wild-type and pch-2 mutant animals. The authors observed that irradiation of meiotic nuclei in zygotene resulted in pch-2 nuclei having a larger number of nuclei with 6 or greater crossovers (as measured by COSA-1 foci) compared to wild type. Consistent with this observation, SPO-11 depletion, starting roughly in zygotene, also resulted in pch-2 nuclei having an increase in 6 or more COSA-1 foci compared to wild type. The increased number at this time point appeared beneficial because a significant decrease in univalents was observed. 

      (3) They then asked if the above phenotypes correlated with the localization of MSH-5, a factor that stabilizes crossover-specific DNA recombination intermediates. They observed that pch-2 mutants displayed an increase in MSH-5 foci at early times in meiotic prophase and an unexpectedly higher number at later times. They conclude based on the differences in early MSH-5 localization and the SPO-11 and irradiation studies that PCH-2 prevents early DSBs from becoming crossovers and early loading of MSH-5. By analyzing different HORMAD proteins that are defective in forming the closed conformation acted upon by PCH-2, they present evidence that MSH-5 loading was regulated by the HIM-3 HORMAD. 

      (4) They performed a crossover homeostasis experiment in which DSB levels were reduced. The goal of this experiment was to test if PCH-2 acts in crossover assurance. Interestingly, in this background PCH-2 negative nuclei displayed higher levels of COSA-1 foci compared to PCH-2 positive nuclei. This observation and a further test of the model suggested that "PCH-2's presence on the SC prevents crossover designation." 

      (5) Based on their observations indicating that early DSBs are prevented from becoming crossovers by PCH-2, the authors hypothesized that the DNA damage kinase CHK-2 and PCH-2 act to control how DSBs enter the crossover pathway. This hypothesis was developed based on their finding that PCH-2 prevents early DSBs from becoming crossovers and previous work showing that CHK-2 activity is modulated during meiotic recombination progression. They tested their hypothesis using a mutant synaptonemal complex component that maintains high CHK-2 activity that cannot be turned off to enable crossover designation. Their finding that the pch-2 mutation suppressed the crossover defect (as measured by COSA-1 foci) supports their hypothesis. 

      Based on these studies the authors provide convincing evidence that PCH-2 prevents early DSBs from becoming crossovers and controls the number and distribution of crossovers to promote a regulated mechanism that ensures the formation of obligate crossovers and crossover homeostasis. As the authors note, such a mechanism is consistent with earlier studies suggesting that early DSBs could serve as "scouts" to facilitate homolog pairing or to coordinate the DNA damage response with repair events that lead to crossing over. The detailed mechanistic insights provided in this work will certainly be used to better understand functions for PCH-2 in meiosis in other organisms. My comments below are aimed at improving the clarity of the manuscript. 

      We thank the reviewer for their concise summary of our manuscript and their assessment of our work as “convincing” and providing “detailed mechanistic insight.”

      Comments 

      (1) It appears from reading the Materials and Methods that the SNPs used to measure crossing over were obtained by mating Hawaiian and Bristol strains. It is not clear to this reviewer how the SNPs were introduced into the animals. Was crossing over measured in a single animal line? Were the wild-type and pch-2 mutations made in backgrounds that were isogenic with respect to each other? This is a concern because it is not clear, at least to this reviewer, how much of an impact crossing different ecotypes will have on the frequency and distribution of recombination events (and possibly the recombination intermediates that were studied). 

      We will clarify these issues in the Materials and Methods of an updated preprint. The control and pch-2 mutants were isogenic in either the Bristol or Hawaiian backgrounds. Control lines were the original Bristol and Hawaiian lines and pch-2 mutants were originally made in the Bristol line and backcrossed at least 3 times before analysis. Hawaiian pch-2 mutants were made by backcrossing pch-2 mutants at least 7 times to the Hawaiian background and verifying the presence of Hawaiian SNPs on all chromosomes tested in the recombination assay. To perform the recombination assays, these isogenic lines were crossed to generate the relevant F1s.

      (2) The authors state that in pch-2 mutants there was a striking shift of crossovers (line 135) to the PC end for all of the four chromosomes that were tested. I looked at Figure 1 for some time and felt that the results were more ambiguous. Map distances seemed similar at the PC end for wildtype and pch-2 on Chrom. I. While the decrease in crossing over in pch-2 appeared significant for Chrom. I and III, the results for Chrom. IV, and Chrom. X. seemed less clear. Were map distances compared statistically? At least for this reviewer the effects on specific intervals appear less clear and without a bit more detail on how the animals were constructed it's hard for me to follow these conclusions. 

      We hope that the added details above make the results of these assays clearer. Map distances were compared and did not reach statistical significance, except where indicated. While we agree that the comparisons between control animals and pch-2 mutants may seem less clear with individual chromosomes, we argue that more general patterns become clear when analyzing multiple chromosomes. Indeed, this is why we expanded our recombination analysis beyond Chromosome III and the X Chromosome, as reported in Deshong et al., 2014. 

      (3) Figure 2. I'm curious why non-irradiated controls were not tested side-by-side for COSA-1 staining. It just seems like a nice control that would strengthen the authors' arguments. 

      We will add these controls in the updated preprint.

      (4) Figure 3. It took me a while to follow the connection between the COSA-1 staining and DAPI staining panels (12 hrs later). Perhaps an arrow that connects each set of time points between the panels or just a single title on the X-axis that links the two would make things clearer. 

      We will make changes in the updated preprint to make this figure more clear.

      Reviewer #2 (Public review): 

      Summary: 

      This paper has some intriguing data regarding the different potential roles of Pch-2 in ensuring crossing over. In particular, the alterations in crossover distribution and Msh-5 foci are compelling. My main issue is that some of the models are confusingly presented and would benefit from some reframing. The role of Pch-2 across organisms has been difficult to determine; the ability to separate pairing and synapsis roles in worms provides a great advantage for this paper. 

      Strengths: 

      Beautiful genetic data, clearly made figures. Great system for studying the role of Pch-2 in crossing over. 

      We thank the reviewers for their constructive and useful summary of our manuscript and the analysis of its strengths. 

      Weaknesses: 

      (1) For a general audience, definitions of crossover assurance, crossover eligible intermediates, and crossover designation would be helpful. This applies to both the proposed molecular model and the cytological manifestation that is being scored specifically in C. elegans. 

      We will make these changes in an updated preprint.

      (2) Line 62: Is there evidence that DSBs are introduced gradually throughout the early prophase? Please provide references. 

      We will reference Woglar and Villeneuve 2018 and Joshi et al. 2015 to support this statement in the updated preprint.

      (3) Do double crossovers show strong interference in worms? Given that the PC is at the ends of chromosomes don't you expect double crossovers to be near the chromosome ends and thus the PC? 

      Despite their rarity, double crossovers do show interference in worms. However, the PC is limited to one end of the chromosome. Therefore, even if interference ensures the spacing of these double crossovers, the preponderance of one of these crossovers toward one end (and not both ends) suggests something functionally unique about the PC end.

      (4) Line 155 - if the previous data in Deshong et al is helpful it would be useful to briefly describe it and how the experimental caveats led to misinterpretation (or state that further investigation suggests a different model etc.). Many readers are unlikely to look up the paper to find out what this means. 

      We will add this to the updated preprint.

      (5) Line 248: I am confused by the meaning of crossover assurance here - you see no difference in the average number of COSA-1 foci in Pch-2 vs. wt at any time point. Is it the increase in cells with >6 COSA-1 foci that shows a loss of crossover assurance? That is the only thing that shows a significant difference (at the one time point) in COSA-1 foci. The number of DAPI bodies shows the loss of Pch-2 increases crossover assurance (fewer cells with unattached homologs). So this part is confusing to me. How does reliably detecting foci vs. DAPI bodies explain this? 

      We apologize for the confusion and will make this clearer in an updated preprint. The reviewer is correct that we do not see a difference in the average number of GFP::COSA-1 foci at all time points in this experiment, even though we do see a difference in the number of DAPI stained bodies (an increase in crossover assurance in pch-2 mutants). What we meant to convey is that because of PCH-2’s dual role in regulating crossover formation (inhibiting it in early prophase, guaranteeing assurance later), the average number of GFP::COSA-1 foci at all time points also reflects this later role, resulting in this average being lower than if PCH-2 only inhibited crossovers early in meiotic prophase. We have shown that this later role does not significantly affect the average number of DAPI stained bodies, allowing us to see the role of PCH-2 in early meiotic prophase on crossover formation more clearly.

      (6) Line 384: I am confused. I understand that in the dsb-2/pch-2 mutant there are fewer COSA-1 foci. So fewer crossovers are designated when DSBs are reduced in the absence of PCH-2. How then does this suggest that PCH-2's presence on the SC prevents crossover designation? Its absence is preventing crossover designation at least in the dsb-2 mutant. 

      We will also make this more clear in an updated preprint, as well as provide additional evidence to support this claim. In this experiment, we had identified three possible explanations for why PCH-2 persists on some nuclei that do not have GFP::COSA-1 foci: 1) PCH-2 removal is coincident with crossover designation; 2) PCH-2 removal depends on crossover designation; and 3) PCH-2 removal facilitates crossover designation. The decrease in the number of GFP::COSA-1 foci in dsb-2::AID;pch-2 mutants argues against the first two possibilities, suggesting that the third might be correct. We have additional evidence that we will include in an updated preprint that should provide stronger support and make this more clear.

      (7) Discussion Line 535: How do you know that the crossovers that form near the PCs are Class II and not the other way around? Perhaps early forming Class I crossovers give time for a second Class II crossover to form. In budding yeast, it is thought that synapsis initiation sites are likely sites of crossover designation and class I crossing over. Also, the precursors that form class I and II crossovers may be the same or highly similar to each other, such that Pch-2's actions could equally affect both pathways. 

      We do not know that the crossovers that form near the PC are Class II but hypothesize that they are, based on the close, functional relationship that exists between Class I crossovers and synapsis and the apparent antagonistic relationship that exists between Class II crossovers and synapsis. We agree that Class I and Class II crossover precursors are likely to be the same or highly similar and to exhibit extensive crosstalk that may complicate straightforward analysis, and that PCH-2 is likely to affect both, as strongly suggested by our GFP::MSH-5 analysis. We present this hypothesis based on the apparent relationship between PCH-2 and synapsis in several systems but agree that it needs to be formally tested. We will make this argument clearer in an updated preprint.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript describes an in-depth analysis of the effect of the AAA+ ATPase PCH-2 on meiotic crossover formation in C. elegans. The authors reach several conclusions, and attempt to synthesize a 'universal' framework for the role of this factor in eukaryotic meiosis. 

      Strengths: 

      The manuscript makes use of the advantages of the 'conveyor belt' system within the C. elegans reproductive tract, to enable a series of elegant genetic experiments. 

      We thank this reviewer for the useful assessment of our manuscript and the articulation of its strengths.

      Weaknesses: 

      A weakness of this manuscript is that it heavily relies on certain genetic/cell biological assays that can report on distinct crossover outcomes, without clear and directed control over other aspects and variables that might also impact the final repair outcome. Such assays are currently out of reach in this model system. 

      In general, this manuscript could be more generally accessible to non-C. elegans readers. Currently, the manuscript is hard to digest for non-experts (even if meiosis researchers). In addition, the authors should be careful to consider alternative explanations for certain results. At several steps in the manuscript, results could ostensibly be caused by underlying defects that are currently unknown (for example, can we know for sure that pch-2 mutants do not suffer from altered DSB patterning, and how can we know what the exact functional and genetic interactions between pch-2 and HORMAD mutants tell us?). Alternative explanations are possible and it would serve the reader well to explicitly name and explain these options throughout the manuscript. 

      We will make the manuscript more accessible to non-C. elegans readers and discuss alternate explanations for specific results in an updated preprint.

    1. Author response:

      Reviewer 1:

      There are no significant weaknesses to signal in the manuscript. However, in order to fully conclude that there is no obvious advantage for the linguistic dimension in neonates, it would have been most useful to test a third condition in which the two dimensions were pitted against each other, that is, in which they provide conflicting information as to the boundaries of the words comprised in the artificial language. This last condition would have allowed us to determine whether statistical learning weighs linguistic and non-linguistic features equally, or whether phonetic content is preferentially processed.

      We appreciate the reviewer's suggestion that a stream with conflicting information would provide valuable insights. In the present study, we started with a simpler case involving two orthogonal features (i.e., phonemes and voices), with one feature being informative and the other uninformative, and we found similar learning capacities for both. Future work should explore whether infants, and humans more broadly, can simultaneously track regularities in multiple speech features. However, creating a stream with two conflicting statistical structures is challenging. To use neural entrainment, the two features must lead to segmentation at different chunk sizes so that their effects lead to changes in power/PLV at different frequencies, for instance, using duplets for the voice dimension and triplets for the linguistic dimension (or vice versa). Consequently, the two dimensions would not be directly comparable within the same participant in terms of the number of distinguishable syllables/voices, memory demand, or SNR, given the 1/f decrease in amplitude of background EEG activity. This would require comparisons between two distinct groups, counterbalancing chunk size and linguistic versus non-linguistic dimension. Considering the test phase, words for one dimension would have been part-words for the other dimension. As we are measuring differences and not preferences, interpreting the results would also have been difficult. Additionally, it may be difficult to find a sufficient number of clearly discriminable voices for such a design (triplets imply 12 voices). Therefore, an entirely different experimental paradigm would need to be developed.
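      The frequency argument above can be made concrete with a minimal sketch. It assumes the 4 Hz syllabic rate stated elsewhere in this response and an idealized 1/f law for background EEG amplitude; the function names are hypothetical and are not taken from the authors' analysis code.

```python
def chunk_rate_hz(syllable_rate_hz: float, chunk_size: int) -> float:
    """Frequency at which chunks of `chunk_size` syllables recur in the stream."""
    return syllable_rate_hz / chunk_size


def relative_background_amplitude(freq_hz: float, reference_hz: float) -> float:
    """Background amplitude at freq_hz relative to reference_hz,
    under an idealized 1/f law (an assumption, not a measured spectrum)."""
    return reference_hz / freq_hz


SYLLABLE_RATE_HZ = 4.0  # syllabic rate stated in this response

duplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 2)   # 2 Hz
triplet_rate = chunk_rate_hz(SYLLABLE_RATE_HZ, 3)  # about 1.33 Hz

# Background activity is larger at the lower (triplet) frequency, so the
# signal-to-noise ratio differs between the two chunk sizes.
ratio = relative_background_amplitude(triplet_rate, duplet_rate)  # about 1.5
```

      Under these assumptions the two dimensions would be tagged at different frequencies with different background noise levels, which is why they would not be directly comparable within one participant.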

      If such a design were tested, one possibility is that the regularities for the two dimensions are calculated in parallel, in line with the idea that the calculation of statistical regularities is a ubiquitous implicit mechanism (see Benjamin et al., 2024, for a proposed neural mechanism). Yet, similar to our present study, possibly only phonetic features would be used as word candidates. Another possibility is that only one informative feature would be explicitly processed at a time due to the serial nature of perceptual awareness, which may prioritise one feature over the other.

      Note: The reviewer’s summary contains a typo: syllabic rate (4 Hz) –not 2 Hz, and word rate (2 Hz) –not 4 Hz.

      Reviewer 2:

      N400: I am skeptical regarding the interpretation of the phoneme-specific ERP effect as a precursor of the N400 and would suggest toning it down. While the authors are correct in that infant ERP components are typically slower and more posterior compared to adult components, and the observed pattern is hence consistent with an adult N400, at the same time, it could also be a lot of other things. On a functional level, I can't follow the author's argument as to why a violation in phoneme regularity should elicit an N400, since there is no evidence for any semantic processing involved. In sum, I think there is just not enough evidence from the present paradigm to confidently call it an N400.

      The reviewer is correct that we cannot definitively determine the type of processing reflected by the ERP component that appears when neonates hear a triplet after exposure to a stream with phonetic regularities. We interpreted this component as a precursor to the N400, based on prior findings in speech segmentation tasks without semantic content, where a ~400 ms component emerged when adult participants recognised pseudowords (Sander et al., 2002) or during structured streams of syllables (Cunillera et al., 2006, 2009). Additionally, the component we observed had a similar topography and timing to those labelled as N400 in infant studies, where semantic processing was involved (Parise et al., 2010; Friedrich & Friederici, 2011).

      Given our experimental design, the difference we observed must be related to the type of regularity during familiarisation (either phonemes or voices). Thus, we interpreted this component as reflecting lexical search, a process which could be triggered by a linguistic structure but which would not be relevant to a non-linguistic regularity such as voices. However, we are open to alternative interpretations. In any case, this difference between the two streams reveals that computing regularities based on phonemes versus voices does not lead to the same processes. We will revise and tone down the corresponding part of the discussion to clarify that it is just a possible interpretation of the results.  

      Female and male voices: Why did the authors choose to include male and female voices? While using both female and male stimuli of course leads to a higher generalizability, it also introduces a second dimension for one feature that is not present for the other (i.e., phoneme for Experiment 1 and voice identity plus gender for Experiment 2). Hence, couldn't it also be that the infants extracted the regularity with which one gender voice followed the other? For instance, in List B, in the words, one gender is always followed by the other (M-F or F-M), while in 2/3 of the part-words, the gender is repeated (F-F and M-M). Wouldn't you expect the same pattern of results if infants learned regularities based on gender rather than identity?

      We used three female and three male voices to maximise acoustic variability. The streams were synthesised using MBROLA, which provides a limited set of artificial voices. Indeed, there were not enough French voices of acceptable quality, so we also used two Italian voices (the phonemes used existed in both Italian and French).

      Voices differ in timbre, and female voices tend to be higher pitched. However, it is sometimes difficult to categorise low-pitched female voices and high-pitched male voices. Given that gender may be an important factor in infants' speech perception (newborns, for instance, prefer female voices at birth), we conducted tests to assess whether this dimension could have influenced our results.  

      We first quantified the transitional probabilities matrices during the structured stream of Experiment 2, considering that there are only two types of voices: Female and Male.  

      For List A, all transition probabilities are equal to 0.5 (P(M|F), P(F|M), P(M|M), P(F|F)), resulting in flat TPs throughout the stream (see Author response image 1, top). Therefore, we would not expect neural entrainment at the word rate (2 Hz), nor would we anticipate ERP differences between the presented duplets in the test phase.

      For List B, P(M|F)=P(F|M)=0.66 while P(M|M)=P(F|F)=0.33. However, this does not produce a regular pattern of TP drops throughout the stream (see Author response image 1, bottom). As a result, strong neural entrainment at 2 Hz was unlikely, although some degree of entrainment might have occasionally occurred due to some drops occurring at a 2 Hz frequency. Regarding the test phase, all three Words and only one Part-word presented alternating patterns (TP=0.6). Therefore, the difference in the ERPs between Words and Part-words in List B might be attributed to gender alternation.  
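      As an illustration of how such gender-level transition probabilities can be estimated, the following minimal sketch counts consecutive pairs in a label sequence. The toy sequence is hypothetical and is not the actual List A or List B stream; only the counting procedure is the point.

```python
from collections import Counter


def transition_probabilities(labels):
    """Estimate P(next | current) from consecutive pairs in `labels`."""
    pair_counts = Counter(zip(labels, labels[1:]))
    current_counts = Counter(labels[:-1])
    tps = {}
    for (current, following), count in pair_counts.items():
        tps.setdefault(current, {})[following] = count / current_counts[current]
    return tps


# Hypothetical toy stream of voice genders (F = female, M = male).
toy_stream = list("FMMF" * 3)
toy_tps = transition_probabilities(toy_stream)

for current in sorted(toy_tps):
    for following in sorted(toy_tps[current]):
        print(f"P({following}|{current}) = {toy_tps[current][following]:.2f}")
```

      Applied to the real streams, the same counting would yield the flat 0.5 matrix described for List A and the 0.66/0.33 pattern described for List B.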

      However, it seems unlikely that gender alternation alone explains the entire pattern of results, as the effect is inconsistent and appears in only one of the lists. To rule out this possibility, we analysed the effects in each list separately.

      Author response image 1.

      Transition probabilities (TPs) across the structured stream in Experiment 2, considering voices processed by gender (Female or Male). Top: List A. Bottom: List B.

      We computed the mean activation within the time windows and electrodes of interest and compared the effects of word type and list using a two-way ANOVA. For the difference between Words and Part-words over the positive cluster, we observed a main effect of word type (F(1,31) = 5.902, p = 0.021), with no effects of list or interactions (p > 0.1). Over the negative cluster, we again observed a main effect of word type (F(1,31) = 10.916, p = 0.0016), with no effects of list or interactions (p > 0.1). See Author response image 2.  

      Author response image 2.

      Difference in ERP voltage (Words – Part-words) for the two lists (A and B); W=Words; P=Part-words.

      We conducted a similar analysis for neural entrainment during the structured stream on voices. A comparison of entrainment at 2 Hz between participants who completed List A and List B showed no significant differences (t(30) = -0.27, p = 0.79). A test against zero for each list indicated significant entrainment in both cases (List A: t(17) = 4.44, p = 0.00036; List B: t(13) = 3.16, p = 0.0075). See Author response image 3.

      Author response image 3.

      Neural entrainment at 2 Hz during the structured stream of Experiment 2 for Lists A and B.
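      The entrainment tests reported above can be sketched as follows, assuming standard Student t-tests (the response does not state which variant was used). Under that assumption, t(17) and t(13) correspond to 18 and 14 participants, and 18 + 14 - 2 = 30 matches the t(30) of the between-list comparison. The input values below are placeholders, not the study's data.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    t_stat = (mean - popmean) / (sd / math.sqrt(n))
    return t_stat, n - 1


def independent_df(n_a, n_b):
    """Degrees of freedom of a pooled-variance two-sample t-test."""
    return n_a + n_b - 2


# Placeholder values only, used to show the computation.
t_stat, df = one_sample_t([1.0, 2.0, 3.0, 4.0, 5.0])

# Group sizes implied by the reported degrees of freedom.
n_list_a = 17 + 1
n_list_b = 13 + 1
df_between = independent_df(n_list_a, n_list_b)  # 30
```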

      Words entrainment over occipital electrodes: Do you have any idea why the duplet entrainment effect occurs over the electrodes it does, in particular over the occipital electrodes (which seems a bit unintuitive given that this is a purely auditory experiment with sleeping neonates).

      Neural entrainment might be considered as a succession of evoked responses induced by the stream. After applying an average reference in high-density EEG recordings, the auditory ERP in neonates typically consists of a central positivity and a posterior negativity with a source located at the electrical zero in a single-dipole model (i.e. approximately in the superior temporal region; Dehaene-Lambertz & Dehaene, 1994). In adults, because of the average reference (i.e. the sum of voltages is equal to zero at each time point) and because the electrodes cannot capture the negative pole of the auditory response, the negativity is distributed around the head. In infants, however, the brain is higher within the skull, allowing for a more accurate recording of the negative pole of the auditory ERP (see Author response image 4 for the location of electrodes in an infant head model).  
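      The average-reference property mentioned above (voltages summing to zero at each time point) can be shown with a minimal sketch. The data layout and the voltage values are illustrative assumptions, not the authors' preprocessing pipeline.

```python
def average_reference(samples):
    """Re-reference EEG data to the average of all electrodes.

    `samples` is a list of time points, each a list of per-electrode voltages.
    """
    referenced = []
    for voltages in samples:
        mean_voltage = sum(voltages) / len(voltages)
        referenced.append([v - mean_voltage for v in voltages])
    return referenced


# Illustrative voltages for three electrodes at two time points.
raw = [[1.0, 2.0, 6.0], [0.5, -0.5, 3.0]]
rereferenced = average_reference(raw)

# After re-referencing, the voltages at each time point sum to zero,
# so a positivity at some sites must be balanced by negativity at others.
sums = [sum(row) for row in rereferenced]
```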

      Besides the posterior electrodes, we can see some entrainment on more anterior electrodes that probably corresponds to the positive pole of the auditory ERP.

      Author response image 4.

      International 10–20 sensors' location on the skull of an infant template, with the underlying 3-D reconstruction of the grey-white matter interface and projection of each electrode to the cortex. Computed across 16 infants (from Kabdebon et al., NeuroImage, 2014). The O1, O2, T5, and T6 electrodes project lower than in adults.

      Reviewer 3:

      (1) While it's true that voice is not essential for language (i.e., sign languages are implemented over gestures; the use of voices to produce non-linguistic sounds, like laughter), it is a feature of spoken languages. Thus I'm not sure if we can really consider this study as a comparison between linguistic and non-linguistic dimensions. In turn, I'm not sure that these results show that statistical learning at birth operates on non-linguistic features, voices being a linguistic dimension at least in spoken languages. I'd like to hear the authors' opinions on this.

      On one hand, it has been shown that statistical learning (SL) operates across multiple modalities and domains in human adults and animals. On the other hand, SL is considered essential for infants to begin parsing speech. Therefore, we aimed to investigate whether SL capacities at birth are more effective on linguistic dimensions of speech, potentially as a way to promote language learning.

      We agree with the reviewer that voices play an important role in communication (e.g., for identifying who is speaking); however, they do not contribute to language structure or meaning, and listeners are expected to normalize across voices to accurately perceive phonemes and words. Thus, voices are speech features but not linguistic features. Additionally, in natural speech, there are no abrupt voice changes within a word as in our experiment; instead, voice changes typically occur on a longer timescale and involve only a limited number of voices, such as in a dialogue. Therefore, computing regularities based on voice changes would not be useful in real-life language learning. We considered that contrasting syllables and voices was an elegant way to test SL beyond its linguistic dimension, as the experimental paradigm is identical in both experiments.  

      Along the same line, in the Discussion section, the present results are interpreted within a theoretical framework showing statistical learning in auditory non-linguistic (string of tones, music) and visual domains as well as visual and other animal species. I'm not sure if that theoretical framework is the right fit for the present results.

      (2) I'm not sure whether the fact that we see parallel and independent tracking of statistics in the two dimensions of speech at birth indicates that newborns would be able to do so in all the other dimensions of the speech. If so, what other dimensions are the authors referring to?

      The reviewer is correct that demonstrating the universality of SL requires testing additional modalities and acoustic dimensions. However, we postulate that SL is grounded in a basic mechanism of long-term associative learning, as proposed in Benjamin et al. (2024), which relies on a slow decay in the representation of a given event. This simple mechanism, capable of operating on any representational output, accounts for many types of sequence learning reported in the literature (Benjamin et al., in preparation). We will revise the discussion section to clarify this theoretical framework.

      (3) Lines 341-345: Statistical learning is an evolutionarily ancient learning mechanism but I do not think that the present results are showing it. This is a study on human neonates and adults, there are no other animal species involved therefore I do not see a connection with the evolutionary history of statistical learning. It would be much more interesting to make claims on the ontogeny (rather than phylogeny) of statistical learning, and what regularities newborns are able to detect right after birth. I believe that this is one of the strengths of this work.

      We did not intend to make claims about the phylogeny of SL. Since SL appears to be a learning mechanism shared across species, we use it as a framework to suggest that SL may arise from general operational principles applicable to diverse neural networks. Thus, while it is highly useful for language acquisition, it is not specific to it. We will revise this section to tone down our claims.  

      (4) The description of the stimuli in Lines 110-113 is a bit confusing. In Experiment 1, e.g., "pe" and "tu" are both uttered by the same voice, correct? ("random voice each time" is confusing). Whereas in Experiment 2, e.g., "pe" and "tu" are uttered by different voices, for example, "pe" by yellow voice and "tu" by red voice. If this is correct, then I recommend the authors to rephrase this section to make it more clear.

      To clarify, in Experiment 1, the voices were randomly assigned to each syllable, with the constraint that no voice was repeated consecutively. This means that syllables within the same word were spoken by different voices, and each syllable was heard with various voices throughout the stream. As a result, neonates had to retrieve the words based solely on syllabic patterns, without relying on consistent voice associations or specific voice relationships.

      In Experiment 2, the design was orthogonal: while the syllables were presented in a random order, the voices followed a structured pattern. Similar to Experiment 1, each syllable (e.g., “pe” and “tu”) was spoken by different voices. The key difference is that in Experiment 2, the structured regularities were applied to the voices rather than the syllables. For example, the “green” voice was always followed by the “red” voice, but each voice uttered different syllables.

      We will revise the methods section to clarify these important points.

      (5) Line 114: the sentence "they should compute a 36 x 36 TPs matrix relating each acoustic signal, with TPs alternating between 1/6 within words and 1/12 between words" is confusing as it seems like there are different acoustic signals. Can the authors clarify this point?

      Thank you for highlighting this point. To clarify, our suggestion is that neonates might not track regularities between phonemes and voices as separate features. Instead, they may treat each syllable-voice combination as a distinct item—for example, "pe" spoken by the "yellow" voice is one item, while "pe" spoken by the "red" voice is another. Under this scenario, there would be a total of 36 unique items (6 syllables × 6 voices), and infants would need to track regularities between these 36 combinations.

      We will rephrase this sentence in the manuscript to make it clearer.
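
      The item count in this scenario can be sketched as follows. This is only an illustration: "pe", "tu", and the "yellow", "red", and "green" voice labels come from the text above, while the remaining syllable and voice labels are placeholders assumed for the example.

```python
from itertools import product

# "pe" and "tu" appear in the response; the other syllable labels are placeholders.
syllables = ["pe", "tu", "syl3", "syl4", "syl5", "syl6"]
# "yellow", "red", and "green" appear in the response; the rest are placeholders.
voices = ["yellow", "red", "green", "voice4", "voice5", "voice6"]

# Each syllable-voice pairing is treated as one distinct item.
items = list(product(syllables, voices))
n_items = len(items)                 # 6 syllables x 6 voices
tp_matrix_cells = n_items * n_items  # cells in an item-by-item TP matrix

print(n_items, tp_matrix_cells)
```

      Under this reading, "pe" in the "yellow" voice and "pe" in the "red" voice occupy different rows of the matrix, which is why the matrix is 36 x 36 rather than 6 x 6.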

    1. Author response:

      The following is the authors’ response to the original reviews.

      A summary of changes

      (1) Line 93: “positive effect” to “positive contribution”, as suggested by reviewer 2.

      (2) Lines 147-148: the null hypothesis to test “equal interspecific and intraspecific interactions”, as indicated by reviewers 2 and 4.

      (3) Lines 155-162: removed to reduce duplication with the additive partitioning, as suggested by reviewer 2.

      (4) Lines 186-188: added “the estimated competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates”, as suggested by reviewer 3.  

      (5) Lines 219-222: added “The community positive effect can be further partitioned by mechanisms of positive interactions (resource partitioning and facilitation), and facilitative effect can be classified as mutualism (+/+), commensalism (+/0), or parasitic (+/–) based on species specific assessments”.  

      (6) Lines 377-386: added options for determining maximum competitive growth response in some extreme scenarios of species mixtures.

      (7) Figure 1: modified to show the variations of competitive growth response with relative competitive ability from minimum (null expectation) to maximum (competitive exclusion).    

      A summary of four reviewers’ questions and authors’ response

      (1) A summary of authors’ responses. Reviewers did not seem to understand our work. They indicated that our model is inadequate for hypothesis testing. The fact is, as we note below, that our model allows for more hypothesis testing than the additive partitioning model. They suggested that one of our model components, the competitive growth response, needs to be further partitioned. However, this term represents only the competition effect and cannot be split any further. Reviewers criticized us for misunderstanding the additive components while they suggested the same logic to test some intuitive ideas. They did not seem to know that the effects of competitive interactions vary with assessment methods, which differ between competition and biodiversity research. Our work seeks to harmonise definitions between these two fields and bridge the gap. The reviewers acknowledged that the additive components (i.e., the selection effect and complementarity effect) do not have clear biological meanings; however, they did not acknowledge that the additive components are used extensively for determining mechanisms of species interactions in biodiversity research. There is hardly any research that uses the additive partitioning model without linking the additive components to specific mechanisms of species interactions (i.e., positive SE to competition and positive CE to positive interactions).

      (2) Additive partitioning and underlying mechanisms. Some reviewers acknowledged that additive partitioning is not meant for determining mechanisms of species interactions and therefore argued that it should not be criticized for the lack of biological meaning of the additive components. However, they insisted that additive partitioning is useful in quantifying net biodiversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions or testing the idea that “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”. Do these views contradict each other? How can the additive partitioning that is not designed for determining mechanisms of species interactions provide meaningful explanations for outputs of species interactions, e.g., “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”?

      Reviewers did not seem to realize that these ideas are equivalent to the suggestions that CE represents the effects of positive interactions and SE the effects of competitive interactions, that the quantification of net biodiversity effects does not require the two additive components, and that the null hypothesis existed long before the additive partitioning (see de Wit, 1960, de Wit et al., 1966). It is generally agreed that CE and SE result from mathematical calculations and do not have clear biological meanings in terms of linkages to specific mechanisms of species interactions responsible for observed net biodiversity effects or changes in ecosystem function (Loreau and Hector, 2012; Bourrat et al., 2023). Calling mixed effects of species interactions (e.g., CE and SE) mechanisms is misleading.

      Model structure: incomplete or inadequate for hypothesis testing. Other than positive, negative, and competition interactions, two reviewers wanted to have more specific interactions such as microclimate amelioration and negative feedback from species-specific pests and pathogens. The determination of these specific mechanisms requires more investigations and cannot be simply made through partitioning growth and yield data. However, the effects of these interactions will be captured in our definition of species interactions.  Reviewers did not seem to know that the additive partitioning would also not allow identifying these specific positive species interactions.

      Inspired by the mathematical form of additive partitioning, two reviewers suggested that our model (presumably equation 4) is incomplete and the second term, i.e., competitive growth response needs to be further explored or partitioned. The second term represents deviations from the null expectation, due to species differences in growth and competitive ability or competition effect. We do not know why and how this term can be further partitioned and what any subcomponents would mean.   

      Our competitive partitioning model is based on two hypotheses: first, the null hypothesis to test the equivalence of interspecific and intraspecific interactions. This hypothesis is the same as the additive partitioning model. Second, the competitive hypothesis, which tests the dominance of positive or negative species interactions in a community. Thus, our model allows for more hypothesis testing than the current additive partitioning model.     

      (3) Types of species interactions. We follow the definition of species interactions generally used in biodiversity research (see Loreau and Hector, 2001), i.e., positive interactions (or complementarity) include resource partitioning and facilitation, negative interactions include interference competition, and competitive interactions include resource competition. One reviewer suggested that resource partitioning is a byproduct of competition and should not be part of positive species interactions, which may be true for long-term evolution of species co-existence but not for biodiversity experiments of decade duration at most. Two reviewers suggested that positive interactions should also include microclimate amelioration or negative feedback from species-specific pests and pathogens. We agree and these are included in our definition.

      (4) Significance of partial density monocultures. We used partial and full density monocultures and species competitive ability to determine what species can possibly achieve in mixture under the competitive hypothesis that constituent species share an identical niche but differ in growth and competitive ability. We did not use partial monocultures to test the effects of density on biodiversity effects. As with the additive partitioning, the competitive partitioning model is not designed for comparing yields across different densities. We added text at lines 186-188 to indicate that the estimated competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates.

      Similarly, we do not use the partial density monoculture to supplant the replacement series design. Partial density monocultures only supplement the “replacement series” design, which does not provide estimates of facilitative effects and competitive growth responses that would occur in mixtures. It is crucial to know that one experimental approach is simply not enough for determining underlying mechanisms of species interactions responsible for changes in ecosystem function.

      (5) Competition effect in competition and biodiversity research. Due to different methods used, competition effect in competition research has different ecological meanings from that in biodiversity research. In competition research, species performance in mixture is compared with that in partial density monocultures and therefore competition effect is generally negative, as suggested by reviewer 4. In biodiversity research, comparison is between mixture and full density monocultures. The resulting competition effect can be positive or negative for both individual species and community productivity defined by species composition and full density monoculture yields.

      Therefore, we cannot use the results of competition research based on additive series design to describe effects of competitive interactions on ecosystem productivity based on replacement series design.

      Reviewer #1 (Public Review):

      [Editors' note: this is an overall synthesis from the Reviewing Editor in consultation with the reviewers.]

      The three reviews expand our critique of this manuscript in some depth and complementary directions. These can be synthesized in the following main points (we point out that there is quite a bit more that could be written about the flaws with this study; however, time constraints prevented us from further elaborating on the issues we see):

      (1) It is unclear what the authors want to do.

      As indicated by the title, our objective is to “partition changes in ecosystem productivity by effects of species interactions”, i.e., partitioning net biodiversity effects estimated from the null expectation into components associated with positive, negative, or competition interspecific interactions.

      It seems their main point is that the large BEF literature and especially biodiversity experiments overstate the occurrence of positive biodiversity effects because some of these can result from competition.

      We demonstrated through ecological theories and simulation/experiment data that competition is a major source of the net biodiversity effects estimated with the additive partitioning model. We know that competition effect varies with mixture attributes. Future research will determine the average effect of competitive interactions on biodiversity effects in the large BEF literature.

      Because reduced interspecific relative to intraspecific competition in mixture is sufficient to produce positive effects in mixtures (if interspecific competition = 0 then RYT = S, where S is species richness in mixture -- this according to the reciprocal yield law = law of constant final yield), they have a problem accepting NE > 0 as true biodiversity effect (see additive partitioning method of Loreau & Hector 2001 cited in manuscript).

      We have no problem accepting NE>0 as a true positive biodiversity effect. However, NE>0 can also result from competitive interactions based on the null expectation and needs to be partitioned by effects of species interactions.
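
      The quantities in this exchange (relative yield RY, relative yield total RYT, and the net effect NE against the null expectation) can be illustrated with a minimal sketch. The yield values and function names below are hypothetical and are not taken from the manuscript; the null expectation is assumed to be RY = 1/S for every species, i.e., equal planting proportions in a replacement series.

```python
def relative_yield_total(observed, mono):
    """Sum of relative yields RY_i = (yield in mixture) / (full-density monoculture yield)."""
    return sum(o / m for o, m in zip(observed, mono))

def net_effect(observed, mono):
    """Observed mixture yield minus the null expectation (assumed RY = 1/S per species)."""
    s = len(mono)
    expected = sum(m / s for m in mono)
    return sum(observed) - expected

# Hypothetical full-density monoculture yields for S = 2 species.
mono = [10.0, 20.0]

# Null case: each species yields exactly its planted share of its monoculture.
null_mixture = [5.0, 10.0]
# Reviewer's limiting case: no interspecific competition, so each species
# yields as much in mixture as in its own monoculture.
no_interspecific = [10.0, 20.0]

ryt_null = relative_yield_total(null_mixture, mono)
ryt_free = relative_yield_total(no_interspecific, mono)
ne_null = net_effect(null_mixture, mono)
ne_free = net_effect(no_interspecific, mono)
```

      In this toy case RYT equals 1 under the null expectation and equals S when interspecific competition is absent, so NE is positive in the second case even though no facilitation was built into the numbers.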

      (2) The authors next claim, without justification, that additive partitioning of NE is flawed and theoretically and biologically meaningless.

      The additive partitioning model is based on the covariance equation (or Price equation) that has nothing to do with biodiversity partitioning (Bourrat et al., 2023). Biological meaning was arbitrarily assigned to CE and SE. We made clear that the additive partitioning model is mathematically sound but does not have the biological meanings for which it has been used.

      They misinterpret the CE component as biological niche partitioning and the SE component as biological dominance.

      We did not. Loreau and Hector (2001) clearly indicated positive CE for positive interactions and positive SE for competitive interactions, which is generally how the two components have been used in the last twenty years.

      They do not seem to accept that the additive partitioning is a logically and mathematically sound derivation from basic principles that cannot be contested.

      We have no problem with the mathematical form of additive partitioning but only oppose the ecological meanings assigned to CE and SE, simply because CE and SE both result from all species interactions (see Loreau and Hector, 2001; Bourrat et al., 2023). The reviewer seemed to hold the contradictory view that the additive components are biologically meaningless but derived from basic biological principles.

      (3) The authors go on to introduce a method to calculate species-level overyielding (RY > 1/S in replacement series experiments) as a competitive growth response and multiply this with the species monoculture biomass relative to the maximum to obtain competitive expectation. This method is based on resource competition and the idea that resource uptake is fully converted into biomass (instead of e.g. investing it in allelopathic chemical production).

      Correct, but we did not assume “resource uptake is fully converted into biomass”.

      (4) It is unclear which experiments should be done, i.e. are partial-density monocultures planted or simply calculated from full-density monocultures? At what time are monocultures evaluated? The framework suggests that monocultures must have the full potential to develop, but in experiments, they are often performing very poorly, at least after some time. I assume in such cases the monocultures could not be used.

      Both partial and full density monocultures are needed, along with mixtures to separate NE by species interactions. Calculating competitive growth responses from density-size relationships can be an alternative, given the lack of partial density monocultures in current biodiversity experiments, but is not preferred.

      Similar to additive partitioning, our model can (and should) be applied to all developmental stages of an experiment to examine how interactions evolve through time.   

      (5) There are many reasons why the ideal case of only resource competition playing a role is unrealistic. This excludes enemies but also differential conversion factors of resources into biomass and antagonistic or facilitative effects. Because there are so many potential reasons for deviations from the null model of only resource competition, a deviation from the null model does not allow conclusions about underlying mechanisms.

      The competitive expectation is only a hypothesis, just as the null expectation. The difference between competitive and null expectations represents a competitive effect resulting from species differences in growth and competitive ability, while the deviation of observed yields from the competitive expectation indicates positive or negative effect (see lines 201-219).

      Furthermore, this is not a systematically developed partitioning, but some rather empirical ad hoc formulation of a first term that is thought to approximate competitive effects as understood by the authors (but again, there already are problems here). The second residual term is not investigated. For a proper partitioning approach, one would have to decompose overyielding into two (or more) terms and demonstrate (algebraically) that under some reasonable definitions of competitive and non-competitive interactions, these end up driving the respective terms.

      The first term represents the null expectation assuming equal interspecific and intraspecific interactions, i.e., absence of positive, negative, and competition effects. The second residual term represents competition effect, due to species differences in growth and competitive ability. The meaning of second residual term is clear and does not need to be further partitioned or investigated.

      In fact, our competitive partitioning also has several components including null expectation, competitive growth response, and observed yield, plus partial density monocultures for species assessment, or null expectations, competitive expectations, and observed yields for community level assessment, although different from the additive partitioning.

      (6) Using a simplistic simulation to test the method is insufficient. For example, I do not see how the simulation includes a mechanism that could create CE in additive partitioning if all species would have the same monoculture yield. Similarly, they do not include mechanisms of enemies or antagonistic interactions (e.g. allelopathy).

      The simulation model we used is developed from real world data and can only do what is available in the model in terms of species and their growth under different conditions. We cannot go beyond data limitation. The model is empirical and has been shown to accurately estimate yield in the aspen-spruce forest condition. We would also note that we use experimental data as well (Table 2).

      (7) The authors do not cite relevant literature regarding density x biodiversity experiments, competition experiments, replacement-series experiments, density-yield experiments, additive partitioning, facilitation, and so on.

      We cited literature relevant to biodiversity partitioning since we are not aiming to cover everything. The reviewer may not be aware that most of the research areas listed are actually included in our work, such as additive and replacement-series experiment designs, additive partitioning, facilitation, competition studies, and density-yield relationships. Our competitive partitioning model is based on biological principles, while the additive partitioning model is based only on a mathematical equation.

      Overall, this manuscript does not lead further from what we have already elaborated in the broad field of BEF and competition studies and rather blurs our understanding of the topic.

      The results of competition studies based on additive series design are not really used in the broad field of BEF based on replacement series design. The effects of competitive interactions on BEF are never clearly defined using the results of competition studies. Our work is filling that gap.  

      Reviewer #2 (Public Review):

      This manuscript is motivated by the question of what mechanisms cause overyielding in mixed-species communities relative to the corresponding monocultures. This is an important and timely question, given that the ultimate biological reasons for such biodiversity effects are not fully understood.

      As a starting point, the authors discuss the so-called "additive partitioning" (AP) method proposed by Loreau & Hector in 2001. The AP is the result of a mathematical rearrangement of the definition of overyielding, written in terms of relative yields (RY) of species in mixtures relative to monocultures. One term, the so-called complementarity effect (CE), is proportional to the average RY deviations from the null expectations that plants of both species "do the same" in monocultures and mixtures. The other term, the selection effect (SE), captures how these RY deviations are related to monoculture productivity. Overall, CE measures whether relative biomass gains differ from zero when averaged across all community members, and SE, whether the "relative advantage" species have in the mixture, is related to their productivity. In extreme cases, when all species benefit, CE becomes positive. When large species have large relative productivity increases, SE becomes positive. This is intuitively compatible with the idea that niche complementarity mitigates competition (CE>0), or that competitively superior species dominate mixtures and thereby drive overyielding (SE>0).
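
      The rearrangement described in this paragraph can be sketched numerically. The code below follows the standard form of the Loreau & Hector (2001) partition, NE = N × mean(ΔRY) × mean(M) + N × cov(ΔRY, M), with a population covariance; the yields and the function name are hypothetical and serve only to show that CE and SE sum to NE.

```python
def additive_partition(observed, mono, expected_ry=None):
    """Return (CE, SE, NE) for one mixture.

    observed    -- yield of each species in the mixture
    mono        -- full-density monoculture yield of each species (M)
    expected_ry -- expected relative yields; defaults to 1/N for each species
    """
    n = len(mono)
    if expected_ry is None:
        expected_ry = [1.0 / n] * n
    # Deviation of observed relative yield from its expected value.
    delta_ry = [o / m - e for o, m, e in zip(observed, mono, expected_ry)]
    mean_dry = sum(delta_ry) / n
    mean_m = sum(mono) / n
    # Population covariance (divide by N), as required for the identity to hold.
    cov = sum((d - mean_dry) * (m - mean_m) for d, m in zip(delta_ry, mono)) / n
    ce = n * mean_dry * mean_m   # complementarity effect
    se = n * cov                 # selection effect
    ne = sum(observed) - sum(e * m for e, m in zip(expected_ry, mono))
    return ce, se, ne

# Hypothetical two-species mixture.
ce, se, ne = additive_partition(observed=[6.0, 14.0], mono=[10.0, 20.0])
```

      The sketch makes the point under debate concrete: CE and SE are computed from yields alone, so the same pair of values can arise from different combinations of underlying interactions.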

      The reviewer needs to know that these ideas are based on the same logic that positive CE represents the effects of positive interactions and positive SE represents the effects of competitive interactions. CE>0 or SE>0 can result from many different scenarios of species interactions, not necessarily “niche complementarity mitigates competition” or “competitively superior species dominate mixtures”. CE>0 and SE>0 can occur alone or together. We simply cannot tell underlying mechanisms of overyielding from mathematical calculations (CE and SE), as suggested by this reviewer later.

      The reviewer criticizes us while using the same logic themselves.

      However, it is very important to understand that CE and SE capture the "statistical structure" of RY that underlies overyielding. Specifically, CE and SE are not the ultimate biological mechanisms that drive overyielding, and never were meant to be. CE also does not describe niche complementarity. Interpreting CE and SE as directly quantifying niche complementarity or resource competition, is simply wrong, although it sometimes is done. The criticism of the AP method thus in large part seems unwarranted. The alternative methods the authors discuss (lines 108-123) are based on very similar principles.

      The reviewer actually supports our point. However, CE and SE have been largely used as biological mechanisms, positive CE as the results of complementary interactions and positive SE as the results of competitive interactions (see Loreau and Hector, 2001).  

      We have no problem with the "statistical structure" of AP; it is simply a covariance equation. It is important to know that CE and SE do not provide information on overyielding beyond NE in terms of underlying mechanisms of species interactions. Any attempt to investigate mechanisms of overyielding with CE or SE can easily go wrong.

      Our competitive partitioning model incorporates effects of competitive interactions into the conventional null expectation and allows for separating different effects of species interactions. In comparison, the additive partitioning model does not have this capacity and is not even designed for this purpose, as suggested by this and other reviewers.

      The authors now set out to develop a method that aims at linking response patterns to "more true" biological mechanisms.

      Assuming that "competitive dominance" is key to understanding mixture productivity, because "competitive interactions are the predominant type of interspecific relationships in plants", the authors introduce "partial density" monocultures, i.e. monocultures that have the same planting density for a species as in a mixture. The idea is that using these partial density monocultures as a reference would allow for isolating the effect of competition by the surrounding "species matrix".

      Correct.

      The authors argue that "To separate effects of competitive interactions from those of other species interactions, we would need the hypothesis that constituent species share an identical niche but differ in growth and competitive ability (i.e., absence of positive/negative interactions)." - I think the term interaction is not correctly used here, because clearly competition is an interaction, but the point made here is that this would be a zero-sum game.

      We did not say that competition is not an interaction; we only want to separate the effect of competition from those of other species interactions.

      The authors use the ratio of productivity of partial density and full-density monocultures, divided by planting density, as a measure of "competitive growth response" (abbreviated as MG). This is the extra growth a plant individual produces when intraspecific competition is reduced.

      Correct.

      We added text at lines 377-386 discussing options to determine MG in some uncommon scenarios of species mixtures.

      Here, I see two issues: first, this rests on the assumption that there is only "one mode" of competition if two species use the same resources, which may not be true, because intraspecific and interspecific competition may differ. Of course, one can argue that then somehow "niches" are different, but such a niche definition would be very broad and go beyond the "resource set" perspective the authors adopt. Second, this value will heavily depend on timing and the relationship between maximum initial growth rates and competitive abilities at high stand densities.

      First, the "competitive effect" focusses on resource competition and other forms of competition (presumably interference competition) are included in the negative interactions.

      Second, competitive growth response varies over time and with density, and so do NE, CE, SE, and interspecific interactions.

      The authors then progress to define relative competitive ability (RC), and this time simply use monoculture biomass as a measure of competitive ability. To express this biomass in a standardized way, they express it as the difference from the mean of the other species and then divide by the maximum monoculture biomass of all species.

      I have two concerns here: first, if competitive ability is the capability of a species to preempt resources from a pool also accessed by another species, as the authors argued before, then this seems wrong because one would expect that a species can simply be more productive because it has a broader niche space that it exploits. This contradicts the very narrow perspective on competitive ability the authors have adopted. This also is difficult to reconcile with the idea that specialist species with a narrow niche would outcompete generalist species with a broad niche. Second, I am concerned by the mathematical form. Standardizing by the maximum makes the scaling dependent on a single value.

      First, growth conditions are controlled in biodiversity experiments, i.e., monocultures and mixtures share the same resource space. Species have no opportunity to exploit resources outside the experimental area. For example, if less productive species on normal soils outperform more competitive species on saline/alkaline soil, these “less productive species” are considered “more productive”.

      Second, as discussed in our paper (lines 367-376; Figure 1), more research is needed to determine relationships between species traits (biomass or height) and relative competitive ability. By then, scaling by the maximum would not be needed. There has been quite a lot of research on such relationships; we should leave it to subject experts to determine what would be most appropriate for the species studied.

      As a final step, the authors calculate a "competitive expectation" for a species' biomass in the mixture, by scaling deviations from the expected yield by the product MG ⨯ RC. This would mean a species does better in a mixture when (1) it benefits most from a conspecific density reduction, and (2) has a relatively high biomass.

      Put simply, the assumption would be that if a species is productive in monoculture (high RC), it effectively does not "see" the competitors and then grows like it would be the sole species in the community, i.e. like in the partial density monoculture.

      Correct. If species competitive ability differs substantially, the more competitive species in the mixture would grow as it does in its partial density monoculture. This extra growth should not be treated as a source of positive biodiversity effects, simply because it does not result from positive species interactions.

      Overall, I am not very convinced by the proposed method.

      (1) The proposed method seems not very systematic but rather "ad hoc". It also is much less a partitioning method than the AP method because the other term is simply the difference. It would be good if the authors investigated the mathematical form of this remainder and explored its properties: when does complementarity occur? Would it capture complementarity and facilitation?

      AP is by no means systematic. Remember, AP is based on the covariance equation (or Price equation) that has nothing to do with species interactions, other than its nice-looking mathematical form (Bourrat et al., 2023). Ecological meanings are subjectively given to CE and SE. Therefore, CE and SE reflect what we call them, not what they really mean.

      The remainder measures deviations from the null expectation, due only to the competition effect, and cannot be partitioned any further. The remainder would be positive for more competitive species and negative for less competitive species in mixture relative to their full density monoculture. The deviation of observed yields from competitive expectations indicates dominance of positive or negative species interactions. All these are clearly outlined at lines 201-221.

      (2) The justification for the calculation of MG and RC does not seem to follow the very strict assumptions of what competition (in the absence of complementarity) is. See my specific comments above.

      We do not see why not.

      (3) Overall, the manuscript is hard to read. This is in part a problem of terminology and presentation, and it would be good to use more systematic terms for "response patterns" and "biological mechanisms".

      To help understand the variations of competitive growth response with relative competitive ability, the x axis of Figure 1 is labelled with null expectation, competitive expectation, and competitive exclusion from minimum to maximum deviation of competitive ability from community average.

      We have followed terms used in biodiversity partitioning and changing terms can be confusing.  

      Examples:

      - on line 30, the authors write that CE is used to measure "positive" interactions and SE to measure "competitive interactions", and later name "positive" and "negative" interactions "mechanisms of species interactions". Here the authors first use "positive interaction" as any type of effect that results in a community-level biomass gain, but then they use "interaction" with reference to specific biological mechanisms (e.g. one species might attract a parasite that infests another species, which in turn may cause further changes that modify the growth of the first and other species).

      There are some differences in meaning, but that is what CE and SE have generally been used for. Using different terms can be confusing and does not help in understanding the problems with AP.

      - on line 70, the authors state that "positive interaction" increases productivity relative to the null expectation, but it is clear that an interaction can have "negative" consequences for one interaction partner and "positive" ones for the other. Therefore, "positive" and "negative" interactions, when defined in this way, cannot be directly linked to "resource partitioning" and "facilitation", and "species interference" as the authors do. Also, these categories of mechanisms are still simple. For example, how do biotic interactions with enemies classify, see above?

      We are explaining the effects of competitive interactions on species yield, and ultimately on community yield, which can be linked to “resource partitioning”, “facilitation”, and “species interference”.

      More specific species interactions require detailed biological investigation and cannot be determined through partitioning of biomass production.  

      - line 145: "Under the null hypothesis, species in the mixture are assumed to be competitively equivalent (i.e., absence of interspecific interactions)". This is wrong. The assumption is that there are interspecific interactions, but that these are the same as the intraspecific ones. Weirdly, what follows is a description of the AP method, which does not belong here. This paragraph would better be moved to the introduction where the AP method is mentioned. Or omitted, since it is basically a repetition of the original Loreau & Hector paper.

      As suggested, “absence of interspecific interactions” was replaced with “equal interspecific and intraspecific interactions”.

      We have removed lines 155-162 to reduce duplication. However, our method is based on the null expectation, which needs to be introduced even though it is part of AP.

      Other points:

      - line 66: community productivity, not ecosystem productivity.

      Both community productivity and ecosystem productivity are used in biodiversity research, although their meanings can differ slightly. Of the two, ecosystem productivity is more common.

      - line 68: community average responses are with respect to relative yields - this is important!

      - line 64: what are "species effects of species interactions"?

      We searched and did not find “species effects of species interactions”.

      - line 90: here "competitive" and "productive" are mixed up, and it is important to state that "suffers more" refers to relative changes, not yield changes.

      It, in fact, refers to yield changes. For example, less productive species, at active growth, are more responsive to changes in competition, while more productive species, at inactive growth (i.e., aging), are less responsive to changes in competition.   

      - line 92: "positive effect of competitive dominance": I don't understand what is meant here.

      The phrase was modified to “positive contribution of competitive dominance to ecosystem productivity based on the null expectation”.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript by Tao et al. reports on an effort to better specify the underlying interactions driving the effects of biodiversity on productivity in biodiversity experiments. The authors are especially concerned with the potential for competitive interactions to drive positive biodiversity-ecosystem functioning relationships by driving down the biomass of subdominant species. The authors suggest a new partitioning schema that utilizes a suite of partial density treatments to capture so-called competitive ability. While I agree with the authors that understanding the underlying drivers of biodiversity-ecosystem functioning relationships is valuable - I am unsure of the added value of this specific approach for several reasons.

      Strengths:

      I can find a lot of value in endeavouring to improve our understanding of how biodiversity-ecosystem functioning relationships arise. I agree with the authors that competition is not well integrated into the complementarity and selection effect and interrogating this is important.

      Weaknesses:

      (1) The authors start the introduction very narrowly and do not make clear why it is so important to understand the underlying mechanisms driving biodiversity-ecosystem functioning relationships until the end of the discussion.

      There are different ways to start an introduction; we believe that starting with the problems of the current approach is the most effective way to outline the study’s objective.

      (2) The authors criticize the existing framework for only incorporating positive interactions but this is an oversimplification of the existing framework in several ways:

      We did not criticize the existing framework for only incorporating positive interactions. We criticize the existing framework, because it is not based on mechanisms of species interactions, but is extensively used to determine underlying mechanisms driving biodiversity-ecosystem functioning relationships.

      a. The existing partitioning scheme incorporates resource partitioning which is an effect of competition.

      Resource partitioning means that species utilize resources differently, while competition means that species use the same resources. The statement that “resource partitioning is an effect of competition” does not hold in biodiversity experiments, which are often short in duration and conducted under controlled conditions.

      b. The authors neglect the potential that negative feedback from species-specific pests and pathogens can also drive positive BEF and complementarity effects but is not a positive interaction, necessarily. This is discussed in Schnitzer et al. 2011, Maron et al. 2011, Hendriks et al. 2013, Barry et al. 2019, etc.

      We did not. The feedback effect will be reflected in the differences between observed yields and competitive expectations if species in mixtures have different pests and pathogens relative to monocultures. The additive partitioning does not identify these feedback effects either.

      c. Hector and Loreau (and many of the other citations listed) do not limit competition to SE because resource partitioning is a byproduct of competition.

      Positive SE has been largely interpreted as the result of competition including Hector and Loreau (2001) and many others. It needs to be clear that neither of the additive components can be linked to specific mechanisms of species interactions. 

      Does “resource partitioning is a byproduct of competition” mean that species change their niche to avoid competition? If this is what the reviewer means, it may occur through long-term evolution, but not in short-term biodiversity experiments. Hector and Loreau (2001) clearly indicated that their complementarity effect includes both resource partitioning and facilitation.   

      (3) It is unclear how this new measure relates to the selection effect, in particular. I would suggest that the authors add a conceptual figure that shows some scenarios in which this metric would give a different answer than the traditional additive partition. The example that the authors use where a dominant species increases in biomass and the amount that it increases in biomass is greater than the amount of loss from it outcompeting a subdominant species is a general example often used for a selection effect. When exactly would you see a difference between the two?

      a. Just a note - I do think you should see a difference between the two if the species suffers from strong intraspecific competition and has therefore low monoculture biomass but this would tend to also be a very low-density monoculture in practice so there would potentially be little difference between a low density and high-density monoculture because the individuals in a high-density monoculture would die anyway. So I am not sure that in practice you would really see this difference even if partial density plots were incorporated.

      Linking the new measure to SE or CE would be difficult (see the many comparisons in the Tables and Figures in our manuscript), as SE and CE are derived from a mathematical equation and do not represent specific mechanisms of species interactions (Hector and Loreau 2012; Bourrat et al., 2023).

      (4) One of the tricky things about these endeavors is that they often pull on theory from two different subfields and use similar terminology to refer to different things. For example - in competition theory, facilitation often refers to a positive relative interaction index (this seems to be how the authors are interpreting this) while in the BEF world facilitation often refers to a set of concrete physical mechanisms like microclimate amelioration. The truth is that both of these subfields use net effects. The relative interaction index is also a net outcome as is the complementarity effect even if it is only a piece of the net biodiversity effect. Trying to combine these two subfields to come up with a new partitioning mechanism requires interrogating the underlying assumptions of both subfields which I do not see in this paper.

      Agreed. Microclimate amelioration is also part of the positive effect and will be reflected in the difference between observed yield and competitive expectation. We cannot separate the two mechanisms of positive species interactions without investigating the influence of microclimate on growth and yield.

      (5) The partial density treatment does not isolate competition in the way that the authors indicate. All of the interactions that the authors discuss are density-dependent including the mechanism that is not discussed (negative feedback from species-specific pests and pathogens). These partial density treatment effects therefore cannot simply be equated to competition as the authors indicate.

      We use partial-density monocultures to determine the maximum competitive growth response, which is an effect of density-dependent intraspecific interactions, and we use species' competitive ability to determine the level of that maximum response that species can achieve in mixtures. There may be changes in species-specific pests and pathogens from partial- to full-density monocultures, which will be captured in the competitive growth responses of individuals. We added text at lines 186-188 to indicate that the estimated maximum competitive growth response would also include the effects of density-dependent pests, pathogens, or microclimates.

      a. Additionally - the authors use mixture biomass as a stand-in for competitive ability in some cases but mixture biomass could also be determined by the degree to which a plant is facilitated in the mixture (for example).

      We used monoculture biomass, not mixture biomass, to assess competitive ability.

      (6) I found the literature citation to be a bit loose. For example, the authors state that the additive partition is used to separate positive interactions from competition (lines 70-76) and cite many papers but several of these (e.g. Barry et al. 2019) explicitly do not say this.

      Barry et al. (2019) defined CE as overproduction from monocultures, an effect of positive interactions.  

      (7) The natural take-home message from this study is that it would be valuable for biodiversity experiments to include partial density treatments but I have a hard time seeing this as a valuable addition to the field for two reasons:

      a. In practice - adding in partial density treatments would not be feasible for the vast majority of experiments which are already often unfeasibly large to maintain.

      The reviewer suggested that quantity is more important than quality. Without partial-density monocultures, no one can separate the different effects of species interactions; as Loreau and Hector, the reviewers, and many others have noted, these effects cannot be clearly differentiated with a replacement series design alone. Unreliable scientific findings are not valuable.

      b. The density effect would likely only be valuable during the establishment phase of the experiment because species that are strongly limited by intraspecific competition will die in the full-density plots resulting in low-density monocultures. You can see this in many biodiversity experiments after the first years. Even though they are seeded (or rarely planted) at a certain density, the density after several years in many monocultures is quite low.

      True. High or low density also depends on individual size; if individuals do not get enough resources, density is high. Therefore, density effect can be strong even as density drops substantially from initial levels.  

      Reviewer #4 (Public Review):

      Summary:

      This manuscript claims to provide a new null hypothesis for testing the effects of biodiversity on ecosystem functioning. It reports that the strength of biodiversity effects changes when this different null hypothesis is used. This main result is rather inevitable. That is, one expects a different answer when using a different approach. The question then becomes whether the manuscript’s null hypothesis is both new and an improvement on the null hypothesis that has been in use in recent decades.

      It needs to be clear that we use two hypotheses: the null hypothesis, which is currently used with AP, and the competitive hypothesis, which is new in this manuscript. The null hypothesis helps determine changes in ecosystem productivity from all species interactions, while the competitive hypothesis helps partition changes in ecosystem productivity by mechanisms of species interactions, i.e., positive, negative, or competitive interactions.

      Strengths:

      In general, I appreciate studies like this that question whether we have been doing it all wrong and I encourage consideration of new approaches.

      Weaknesses:

      Despite many sweeping critiques of previous studies and bold claims of novelty made throughout the manuscript, I was unable to find new insights. The manuscript fails to place the study in the context of the long history of literature on competition and biodiversity and ecosystem functioning. The Introduction claims the new approach will address deficiencies of previous approaches, but after reading further I see no evidence that it addresses the limitations of previous approaches noted in the Introduction. Furthermore, the manuscript does not reproducibly describe the methods used to produce the results (e.g., in Table 1) and relies on simulations, claiming experimental data are not available when many experiments have already tested these ideas and not found support for them. Finally, it is unclear to me whether rejecting the ‘new’ null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. I will elaborate on each of these points below.

      First, there are many biodiversity experiments, but those with partial-density monocultures are rare. We found only one greenhouse experiment. We therefore had to use simulations to illustrate different scenarios of species interactions, to demonstrate how our approach works and how it differs from AP.

      Because of the different methods used, the results of the long history of competition research (generally based on an additive series design) cannot be used to define the effects of competitive interactions in biodiversity research (generally based on a replacement series design). This may be the reason that few competition researchers were cited in Loreau and Hector (2001).

      Our approach requires two hypotheses, null and competitive, and the meanings of deviations from these hypotheses are outlined at lines 201-221 for both individual-species and community-level assessments. Distinguishing changes in ecosystem productivity by species interactions would be of great interest to “ecologists, agronomists, conservationists, or others”.

      The critiques of biodiversity experiments and existing additive partitioning methods are overstated, as is the extent to which this new approach addresses its limitations. For example, the critique that current biodiversity experiments cannot reveal the effects of species interactions (e.g., lines 37-39) isn't generally true, but it could be true if stated more specifically. That is, this statement is incorrect as written because comparisons of mixtures, where there are interspecific and intraspecific interactions, with monocultures, where there are only intraspecific interactions, certainly provide information about the effects of species interactions (interspecific interactions). These biodiversity experiments and existing additive partitioning approaches have limits, of course, for identifying the specific types of interactions (e.g., whether mediated by exploitative resource competition, apparent competition, or other types of interactions). However, the approach proposed in this manuscript gets no closer to identifying these specific mechanisms of species interactions. It has no ability to distinguish between resource and apparent competition, for example. Thus, the motivation and framing of the manuscript do not match what it provides. I believe the entire Introduction would need to be rewritten to clarify what gap in knowledge this proposed approach is addressing and what would be gained by filling this knowledge gap.

      Our approach helps determine the underlying mechanisms of species interactions, i.e., positive (resource partitioning or facilitation), negative, or competitive interactions. I am not sure how much further we need to go in identifying more specific mechanisms. If resource and apparent competition refer to resource and interference competition, our approach can tease them apart.

      I recommend that the Introduction instead clarify how this study builds on and goes beyond many decades of literature considering how competition and biodiversity effects depend on density. This large literature is insufficiently addressed in this manuscript. This fails to give credit to previous studies considering these ideas and makes it unclear how this manuscript goes beyond the many previous related studies. For example, see papers and books written by de Wit, Harper, Vandermeer, Connolly, Schmid, and many others. Also, note that many biodiversity experiments have crossed diversity treatments with a density treatment and found no significant effects of density or interactions between density and diversity (e.g., Finn et al. 2013 Journal of Applied Ecology). Thus, claiming that these considerations of density are novel, without giving credit to the enormous number of previous studies considering this, is insufficient.

      There is a misunderstanding here. Our approach is not designed to test density effects. The same density is held across full-density monocultures and mixtures. We use partial-density monocultures to determine what species may competitively achieve in a full-density mixture, without positive or negative interspecific interactions.

      Replacement series designs emerged as a consensus for biodiversity experiments because they directly test a relevant null hypothesis. This is not to say that there are no other interesting null hypotheses or study designs, but one must acknowledge that many designs and analyses of biodiversity experiments have already been considered. For example, Schmid et al. reviewed these designs and analyses two decades ago (2002, chapter 6 in Loreau et al. 2002 OUP book) and the overwhelming consensus in recent decades has been to use a replacement series and test the corresponding null hypothesis.

      This reflects some mistaken impressions. We are not trying to supplant the “replacement series” design with the “additive series” design; we use “additive series” designs to supplement the “replacement series” design for partitioning changes in ecosystem productivity by mechanisms of species interactions, which would not be possible with the “replacement series” design alone, as suggested by many, including the reviewers.

      It is unclear to me whether rejecting the 'new' null hypothesis presented in the manuscript would be of interest to ecologists, agronomists, conservationists, or others. Most biodiversity experiments and additive partitions have tested and quantified diversity effects against the null hypothesis that there is no difference between intraspecific and interspecific interactions. If there was no less competition and no more facilitation in mixtures than in monocultures, then there would be no positive diversity effects. Rejecting this null hypothesis is relevant when considering coexistence in ecology, overyielding in agronomy, and the consequences of biodiversity loss in conservation (e.g., Vandermeer 1981 Bioscience, Loreau 2010 Princeton Monograph). This manuscript proposes a different null hypothesis and it is not yet clear to me how it would be relevant to any of these ongoing discussions of changes in biodiversity.

      Our method begins with the null expectation: that intraspecific and interspecific interactions are equivalent. We then propose the competitive hypothesis as a second, non-exclusive hypothesis, which tests the dominance of positive or negative species interactions. As shown by its name, the additive partitioning model has been advocated for partitioning biodiversity effects by some ecological mechanisms (CE and SE). The ecological meanings of deviations from the two hypotheses are outlined at lines 201-221 for both individual-species and community-level assessments.

      The claim that all previous methods 'are not capable of quantifying changes in ecosystem productivity by species interactions and species or community level' is incorrect. As noted above, all approaches that compare mixtures, where there are interspecific interactions, to monocultures, where there are no species interactions, do this to some extent. By overstating the limitations of previous approaches, the manuscript fails to clearly identify what unique contribution it is offering, and how this builds on and goes beyond previous work.

      The reviewer implies that a partial truth equals the whole truth. The same argument can also be applied to additive partitioning, if relative yield total or response ratio provides a kind of comparison between mixture and monocultures. Our statement is correct in the sense that previous approaches are not designed to separate changes in ecosystem productivity by species interactions, as indicated by other reviewers. Additive partitioning is built on the Price equation (covariance equation), whose relevance to biodiversity partitioning has never been biologically demonstrated (Bourrat et al., 2023).

      We made clear that our work builds on and goes beyond the null expectation with the addition of the competitive expectation.

      The manuscript relies on simulations because it claims that current experiments are unable to test this, given that they have replacement series designs (lines 128-131). There are, however, dozens of experiments where the replacement series was repeated at multiple densities, which would allow a direct test of these ideas. In fact, these ideas have already been tested in these experiments and density effects were found to be nonsignificant (e.g., Finn et al. 2013).

      This is beside the point. Again, we are not testing density effects. Partial density is used to determine the competitive growth responses that species may achieve in mixture based on their relative competitive ability. We used simulations because partial-density monocultures have been used in only one experimental study, which has been included in our study.

      It seems that the authors are primarily interested in trees planted at a fixed density, with no opportunity for changes in density, and thus only changes in the size of individuals (e.g., Fig. 1). In natural and experimental systems, realized density differs from the initial planted density, and survivorship of seedlings can depend on both intraspecific and interspecific interactions. Thus, the constrained conditions under which these ideas are explored in this manuscript seem narrow and far from the more complex reality where density is not fixed.

      We use fixed density only for convenience. In biodiversity experiments, density can increase or decrease over time from initial levels. However, initial density is generally used in the evaluation of species interactions. If the interest is in community productivity, density change does not need to be considered. Again, we are not testing density effects.

      Additional detailed comments:

      It is unclear to me which 'effects' are referred to on line 36. For example, are these diversity effects or just effects of competition? What is the response variable?

      It means the effect of competitive interactions on productivity and should be clear based on previous sentences.

      The usefulness of the approach is overstated on line 52. All partitioning approaches, including the new one proposed here, give the net result of many types of species interactions and thus cannot 'disentangle underlying mechanisms of species interactions.'

      We are not sure how many types of species interactions the reviewer is referring to. If mechanisms of species interactions are grouped into three categories (positive, negative, and competitive), as has been done in biodiversity research, our approach can tease them apart.

      The weaknesses of previous approaches are overstated throughout the manuscript, including in lines 60-61. All approaches provide some, but not all insights. Sweeping statements that previous approaches are not effective, without clarifying what they can and can't do, is unhelpful and incorrect. Also, these statements imply that the approach proposed here addresses the limitations of these previous approaches. I don't yet see how it does so.

      The weaknesses of previous approaches are not overstated in terms of separating changes in ecosystem productivity by species interactions. As pointed out by other reviewers, none of the previous approaches is designed for quantifying changes in ecosystem productivity by species interactions.

      The definitions given for the CE and SE on line 71 are incorrect. Competition affects both terms and CE can be negative or have nothing to do with positive interactions, as noted in many of the papers cited.

      We are not trying to define CE and SE but only point out how CE and SE have been generally used in biodiversity research (see recent publication by Feng et al., 2022).

      The proposed approach does not address the limitations noted on lines 73 and 74.

      It does in terms of sources of net biodiversity effect, whether from positive, negative or competitive interactions.

      The definition of positive interactions in lines 77 and 78 seems inconsistent with much of the literature, which instead focuses on facilitation or mutualism, rather than competition when describing positive interactions.

      Much of the literature supports our definition (see Loreau and Hector, 2001). In biodiversity research, positive interactions include resource partitioning and facilitation. What we are trying to point out is that competition affects species and community level assessments based on the null expectation and needs to be separated.

      Throughout the manuscript, competition is often used interchangeably with resource competition (e.g., line 82) and complementarity is often attributed to resource partitioning (e.g., line 77). This ignores apparent competition and partitioning enemy-free niche space, which has been found to contribute to biodiversity effects in many studies.

      If apparent competition refers to interference competition, it is included in negative interaction. Changes in species-specific pests and pathogens in mixture will be captured in positive or negative effects through facilitation or interference.  

      In what sense are competitive interactions positive for competitive species (lines 82-83)? By definition, competition is an interaction that has a negative effect. Do you mean that interspecific competition is less than intraspecific competition? I am having a very difficult time following the logic.

      I am glad the reviewer raised this question, which may confuse many others and has never been clearly discussed. It all depends on how the comparison is made. If species performance in mixture is compared with that in partial-density monocultures, as is done in competition research, the competition effect is negative for all species. If the comparison is made between mixture and full-density monocultures, as is done in biodiversity research, the competition effect should be positive for more competitive species and negative for less competitive species, with resources flowing from less to more competitive species in mixture relative to full-density monocultures.

      Therefore, the definitions of competitive interactions based on the additive series design in competition research cannot be used to describe competitive interactions based on the replacement series design in biodiversity research. In biodiversity research, the effects of competitive interactions have never been clearly defined at the species or community level and are mixed up with those of other species interactions.

      Results are asserted on lines 93-95, but I cannot find the methods that produced these results. I am unable to evaluate the work without a repeatable description of the methods.

      We have added references for the sources of these data.

      The description of the null hypothesis in the common additive partitioning approach on lines 145-146 is incorrect. In the null case, it does not assume that there are no interspecific interactions, but rather that interspecific and intraspecific interactions are equivalent.

      Correct, changes have been made as suggested.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      I recommend to:

      - re-organize the presentation of the material (see my concerns in the public review section). The manuscript is very difficult to read.

      Changes have been made to help with understanding of our approach. Figure 1 was modified to show the variations of competitive growth response with relative competitive ability from minimum (null expectation) to maximum (competitive exclusion).

      - explore the mathematical form of the remainder term. It seems important to understand that the remainder captures terms unrelated to competition as defined in the present scope.

      The remainder measures deviations from the null expectation, due to species differences in growth and competitive ability or competition effect. The term has clear meaning, positive for more competitive species and negative for less competitive species (lines 202-204), and does not need to be further explored or partitioned. The deviations of observed yields from competitive expectations are outlined in lines 205-221.  

      Reviewer #4 (Recommendations For The Authors):

      The authors should be sure to include reproducible methods and share any data and code.

      Both simulation and experimental data are shared through supplementary tables. Calculations are included in Excel spreadsheets and do not require program coding.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer 1 (Public Review):

      Summary:

      The authors present a mean-field model that describes the interplay between (protein) aggregation and phase separation. Different classes of interaction complexity and aggregate dimensionality are considered, both in calculations concerning (equilibrium) phase behavior and kinetics of assembly formation.

      Strengths:

      The present work is, although purely theoretical, of high interest to understanding biological processes that occur as a result of a coupling between protein aggregation and phase separation. Of course, such processes are abundant, in the living cell as well as in in-vitro experiments. I appreciate the consideration of aggregates with various dimensionality, as well as the categorization into different ”interaction classes”, together with the mentioning of experimental observations from biology. The model is convincing and underlines the complexity associated with the distribution of proteins across phases and aggregates in the living cell.

      Weaknesses:

      There are a few minor weaknesses.

      Reviewer 2 (Public Review):

      This work deals with a very difficult physical problem: relating the assembly of building blocks on a molecular scale to the appearance of large, macroscopic assemblies. This problem is particularly difficult to treat, because of the large number of units involved, and of the complex way in which these units-monomers-interact with each other and with the solvent. In order to make the problem treatable, the authors recur to a number of approximations: Among these, there is the assumption that the system is spatially homogeneous, i.e., its features are the same in all regions of space. In particular, the homogeneity assumption may not hold in biologically relevant systems such as cells, where the behavior close to the cell membrane may strongly differ from the one in the bulk. As a result, this hypothesis calls for a cautious consideration and interpretation of the results of this work. Another notable simplification introduced by the authors is the assumption that the system can only follow two possible behaviors: In the first, each monomer interacts equally with the solvent; no matter the size of the cluster of which it is part. In the second case, monomers in the bulk of a cluster and monomers at the assembly boundary interact with the solvent in a different way. These two cases are considered not only because they simplify the problem, but also because they are inspired by biologically relevant proteins.

      With these simplifications, the authors trace the phase diagram of the system, characterizing its phases for different fractions of the volume occupied by the monomers and solvent, and for different values of the temperature. The results qualitatively reproduce some features observed in recent experiments, such as an anomalous distribution of cluster sizes below the system saturation threshold, and the gelation of condensed phases above such threshold.

      Reviewer 3 (Public Review):

      Summary:

      The authors combine classical theories of phase separation and self-assembly to establish a framework for explaining the coupling between the two phenomena in the context of protein assemblies and condensates. By starting from a mean-field free energy for monomers and assemblies immersed in solvent and imposing conditions of equilibrium, the authors derive phase diagrams indicating how assemblies partition into different condensed phases as temperature and the total volume fraction of proteins are varied. They find that phase separation can promote assembly within the protein-rich phase, providing a potential mechanism for spatial control of assembly. They extend their theory to account for the possibility of gelation. They also create a theory for the kinetics of self-assembly within phase separated systems, predicting how assembly size distributions change with time within the different phases as well as how the volumes of the different phases change with time.

      Strengths:

      The theoretical framework that the authors present is an interesting marriage of classic theories of phase separation and self-assembly. Its simplicity should make it a powerful general tool for understanding the thermodynamics of assembly coupled to phase separation, and it should provide a useful framework for analyzing experiments on assembly within biomolecular condensates.

      The key advance over previous work is that the authors now account for how self-assembly can change the boundaries of the phase diagram.

      A second interesting point is the explicit theoretical consideration for the possibility that gelation (i.e. self-assembly into a macroscopic aggregate) could account for widely observed solidification of condensates. While this concept has been broadly discussed, to date I have yet to see a rigorous theoretical analysis of the possibility.

      The kinetic theory in sections 5 and 6 is also interesting as it extends on previous work by considering the kinetics of phase separation as well as those of self-assembly.

      Weaknesses:

      A key point the authors make about their theory is that it allows, as opposed to previous research, to study non-dilute limits. It is true that they consider gelation when the 3D assemblies become macroscopic. However, dilute solution theory assumptions seem to be embedded in many aspects of their theory, and it is not always clear where else the non-dilute limits are considered. Is it in the inter-species interaction χij? Why then do they never explore cases for which χij is nonzero in their analysis?

      We explicitly consider that monomers and aggregates are non-dilute with respect to solvent. This is evident in accounting for the mixing entropy of all components, including the solvent. Moreover, we account for interactions among the monomers and the different aggregates with the solvent. We consider the case where each monomeric unit, independent of the aggregate it is part of, interacts the same way with the solvent. Please note that this case corresponds to a non-dilute scenario where interactions indeed drive phase separation.

      The connection between this theory and biological systems is described in the introduction but lost along the main text. It would be very helpful to point out, for instance, that the presence of phase separation might induce aggregation of proteins. This point is described formally at the end of Section 3, but a more qualitative connection to biological systems would be very useful here.

      We thank the referee for the useful comment. We now mention this in the introduction (line 80) and point out the biological relevance of assembly formation and localization via the presence of phase separation (lines 268 and 283).

      Building on the previous point, it would be helpful to give an intuitive sense of where the equations derived in the Appendices and presented in the main text come from and to spell out clear physical interpretations of the results. For example, it would be helpful to point out that Eq. 4 is a form of the law of mass action, familiar from introductory chemistry. It would be useful to better explain how the current work extends on existing previous work from these authors as well as others. Along these lines, closely related work by W. Jacobs and B. Rogers [O. Hedge et al. 2023, https://arxiv.org/abs/2301.06134; T. Li et al. 2023, https://arxiv.org/abs/2306.13198] should be cited in the introduction. The results discussed in the first paragraph of Section 3 on assembly size distributions in a homogeneous system are well-known from classic theories of self-assembly. This should be acknowledged and appropriate references should be added; see for instance, Rev. Mod. Phys. 93, 025008 and Statistical Thermodynamics Of Surfaces, Interfaces, And Membranes by Sam Safran. Equation 14 for the kinetic of volume fractions is given with reference to Bauermann et al. 2022, but it should be accompanied by a better intuitive interpretation of its terms in the main text. In particular, how should one understand the third term in this equation? Why does the change in volume impact the change of volume fraction in this way?

      We thank the referee for the suggestions. We have included the missing references, with a particular emphasis on DNA nanostars that inhibit phase separation in DNA liquids in the definition of class II. We added intuitive explanations of the main equations, such as Eqs. (4),(8),(14), (17), and (18). Notice that, according to Mysels, Karol J., J. Chem. Educ., 33, 178 (1956) (https://pubs-acs-org.sire.ub.edu/doi/epdf/10.1021/ed033p178) we refer to (18) as the law of mass action.

      The discussion in the last paragraph of Section 6 should be clarified. How can the total amount of protein in both phases decrease? This would necessarily violate either mass or volume conservation. Also, the discussion of why the volume is non-monotonic in time is not clear.

      A decrease in the total amount of protein in both phases does not violate mass conservation if the volumes of the phases vary accordingly. In particular, the volume of the denser phase should grow. In the case presented, however, the total protein amount in the dense phase decreases, while that in the dilute phase increases. For this reason, we revised the paragraph and now explain the results in more detail (see lines starting from 407). The nonmonotonic volume change is indeed a puzzling finding that, as we now state in the manuscript, requires further investigation. Given the lack of analytical approaches available to tackle the complex kinetics in the presence of coexisting phases, we believe that this analysis goes beyond the scope of the present paper.
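      The bookkeeping behind this conservation argument can be written out explicitly. The following is a sketch under the assumption of an incompressible two-phase system with fixed total volume. The phase labels I (dense) and II (dilute) and the symbols V^I, V^II are notation introduced here for illustration, not taken from the manuscript.

```latex
% Volume and protein conservation for two coexisting phases
V = V^{\mathrm{I}} + V^{\mathrm{II}}, \qquad
\phi_{\mathrm{tot}}\, V = \phi^{\mathrm{I}}_{\mathrm{tot}}\, V^{\mathrm{I}}
                        + \phi^{\mathrm{II}}_{\mathrm{tot}}\, V^{\mathrm{II}}

% Solving for the dense-phase volume (lever rule)
\frac{V^{\mathrm{I}}}{V} =
\frac{\phi_{\mathrm{tot}} - \phi^{\mathrm{II}}_{\mathrm{tot}}}
     {\phi^{\mathrm{I}}_{\mathrm{tot}} - \phi^{\mathrm{II}}_{\mathrm{tot}}}
```

      As an illustrative example with made-up numbers, take a total volume fraction of 0.3. If the phase volume fractions go from (0.8, 0.2) to (0.7, 0.15), the dense-phase volume share rises from 1/6 to 3/11, so both volume fractions can decrease without violating conservation. The protein amount in each phase is the product of its volume fraction and its volume, which is why amounts and volume fractions need not change in the same direction.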

      Recommendations for the authors

      Reviewer 1 (Recommendations For The Authors):

      Line 96: I feel a mentioning/definition/explanation and perhaps some discussion on the parameter M (limiting aggregate size) would have been in place in the introduction of Equation (1). Furthermore, in the usual interpretation, Flory interaction parameters (symbolized χ) are dimensionless, as, classically, they represent an exchange energy (normalized by kT), defined on a monomeric basis. Here they seem to carry the dimension of energy.

      We thank the reviewer for the observation. We have included a brief comment on M and mentioned that we use χ parameters that carry the dimension of energy such that, varying kBT, we scale at the same time the term containing interaction propensities (χ) and the one containing internal energies (e_int). See the comment on line 127.

      Line 150: The choice of ρi = i physically implies that a single protein is assumed to have the same molecular volume as a solvent molecule. This may be a bit of a stretch. This assumption leads to an overestimation of the translational entropy of the aggregates (first term in Equation (1)). Acknowledging that ρ1 ≫ ρs would give a pronounced desymmetrization of the phase diagram (I suspect).

      Indeed, in the case of monomers only, the assumption leads to a symmetric phase diagram which may be unrealistic. Once assemblies form, however, the phase diagram becomes asymmetric and for this reason we decided to assume ρi = i, simplifying the theoretical analysis. We have added a clarifying sentence in the manuscript, see line 163.
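      The symmetry point raised here can be checked with a minimal sketch of the textbook binary Flory-Huggins mixture. This is a stand-in for the monomer-only limit, not the authors' multi-component free energy. It assumes a dimensionless χ (in units of kBT, unlike the energy-valued χ of the manuscript), and a hypothetical parameter n for the ratio of solute to solvent molecular volume.

```python
import math

def fh_free_energy(phi, chi, n=1.0):
    """Binary Flory-Huggins free energy density in units of kBT.

    phi : solute volume fraction (0 < phi < 1)
    chi : dimensionless interaction parameter (assumption: units of kBT)
    n   : solute-to-solvent molecular volume ratio (hypothetical parameter)
    """
    return (phi / n) * math.log(phi) + (1.0 - phi) * math.log(1.0 - phi) \
        + chi * phi * (1.0 - phi)

def fh_critical_point(n=1.0):
    """Critical volume fraction and critical chi for size ratio n."""
    phi_c = 1.0 / (1.0 + math.sqrt(n))
    chi_c = 0.5 * (1.0 + 1.0 / math.sqrt(n)) ** 2
    return phi_c, chi_c
```

      For n = 1 the free energy is unchanged under phi -> 1 - phi and the critical point sits at phi = 1/2, which is the symmetric diagram mentioned in the response. For n > 1 the critical point moves to lower volume fraction, which is the desymmetrization the reviewer anticipates for a protein larger than a solvent molecule.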

      Furthermore, the pictures in Figure 1a-c suggest the presence of a disordered residue, the degree of swelling of which might affect binding strength (see for instance: https://doi.org/10.3389/fnmol.2022.962526).

      We added a comment on the possible coupling between internal free energies and interaction propensities, such as the swelling mechanism that affects binding sites, and included the reference above (line 215).

      Line 154-156: It’s unclear what is meant with ”an internal bond that keeps each assembly together”. How should this be interpreted on an intuitive physical level?

      We apologise for being unclear. We meant the internal bonds that lead to the formation of assemblies. We have now rephrased this sentence in the main text (lines starting from 169).

      Line 254: The fact that ϕsg is defined below does not mean it does not fall out of the air here. The same holds for the consideration of the limit M →∞. Ideally, the main text should stand on its own, in particular with respect to physical intuitiveness, as well as the necessity and interest of discussion topics. Technical details, derivations and additional information can be in an appendix.

      We agree with the referee and added some physical insights about the limit. We now also state clearly in the main text (line 298) that ϕsg is affected by temperature and the free energy of internal bonds.

      Line 257: ”Since we do not explicitly include the solvent in assembly formation we will consider the gel as a phase without solvent and thus ϕtot = 1”. I’m not sure if I can agree with this. I would say, a gel, certainly in biological context, almost per definition contains a large fraction of solvent, i.e. here water. The situation ”ϕtot = 1” would rather be a solid precipitate. Is gelation properly captured by this model?

      We thank the referee for this very relevant observation. We now state in the main text that the model predicts a macroscopic assembly which we call ’the gel phase’, in agreement with previous literature. Then, to clarify, we added the sentence ”Please note that, since we do not explicitly include the solvent in assembly formation (see reaction scheme in Fig.1a), in our model the gel corresponds to a phase without solvent, ϕtot = 1. To account for biological gels that can be rich in water, our theory can be straightforwardly extended by incorporating the solvent into the reaction scheme.”, see main text line 300.

      Line 268: Shouldn’t ”solvent” be ”solution”? If fsol is given by Equation (1), surely not only the solvent is considered.

      Indeed, this is a typo, and we now use the term ’solution’ instead of ’solvent.’

      Line 273: At this stage, the only information provided in the main text is that ω∞ is ”a constant that does not affect chemical nor phase equilibrium, except in the limit M →∞” (see lines 153-154). This is a little bit too abstract for me. Again, the main text should stand on its own, meaning the reader should not have to rely on an appendix to at least have an intuitive physical understanding of any modeling or input parameter discussed in the main text.

      We thank the reviewer for pointing this out. We now comment on the physical interpretation of ω∞ in the main text, see lines from 320 on.

      Figure 4. appears in Equation (39) but it is not defined.

      We thank the reviewer for pointing this out. We have reshaped appendix 6A, making use of chemical activities and clarified the origin of the rate .

      Line 317. I don’t fully understand the intention of the remark on the model being adaptable for ”primary and secondary nucleation”. How/in what way is this different from association and dissociation? For instance, classical nucleation theory is based on association and dissociation of monomeric units to and from clusters.

      We agree that the kinetic rate coefficients kij (appearing in the association and dissociation rates ∆rij, Eq. 17) in our manuscript already depend on assembly length, see Appendix 6 B, where we now clarified their definition. Please note, however, that secondary nucleation is a special kind of association, for which the kinetic rate coefficients corresponding to associations of small assemblies, i.e. kij with i, j ≪ M, explicitly depend on the presence of large assemblies with sizes l ≫ 1. In our manuscript, we have not accounted for such a dependence. We now make this aspect clear in the manuscript, see Appendix 6 B.

      Line 321. Why is ∆rij called the ”monomer exchange rate”? In line 318 the same parameter is defined as the ”reaction rate for the formation of a (i+j)-mer”. Why should these be the same?

      We thank the reviewer for spotting this typo.

      Line 323. Why do these calculations use M = 15?

      The exploration of a 15-dimensional phase space is already numerically challenging. We are currently working on a generalization of the numerical scheme to work with larger values of M but, to discuss the fundamental physical principles, we kept M = 15.

      Reviewer 2 (Recommendations For The Authors):

      The manuscript presents several issues, on both the scientific and presentational level, which need to be carefully addressed. Please find below a list of the points that need to be addressed by the authors, divided into major and minor points. Major issues:

      • A general, major concern about the results in the paper is the homogeneity assumption. I do understand that repeating the whole analysis presented in the manuscript by allowing for spatial inhomogeneities partially goes beyond the scope of this paper. However, the authors should at least discuss how such inhomogeneities may alter the results in a qualitative way, and treat explicitly the presence of inhomogeneity in one prototypical case treated in the manuscript. Namely, what happens if the volume fractions and relative molecular volumes in the free energy (1) depend on space, e.g., ϕi → ϕi(x)?

      We would like to stress that, in the present paper, we do account for spatial inhomogeneities. Indeed, in the case of phase separation, we consider systems which are divided into two phases, characterized by different values of the assemblies’ volume fractions ϕi. We do, however, consider the system to be homogeneous inside the phases, implying a jump in the value of the volume fraction at the interface between the two phases. In this sense, the analysis we carry out is valid in the thermodynamic limit, where gradients of the volume fractions ϕi(x) within the phases can be neglected. On the other hand, considering the full spatial problem, i.e. solving the equations for M = 15 spatially varying fields, would be numerically extremely challenging.

      • The authors’ results relate molecular assembly (a phenomenon at the molecular scale) to phase separation (a mesoscopic or macroscopic phenomenon). The authors should stress the conceptual importance of this connection between scales, and present their results from the perspective of a multi-scale model.

      We thank the reviewer for pointing this out. We now emphasize the multi-scale feature of our model in the introduction (line 80).

      • Starting from Section 1, the reader is not well guided through the sections that follow. The authors should provide an outline of the line of thought that they are going to follow in the following sections, and logically connect each section to the next one with a short paragraph at the end of each section. This paragraph should summarize what has been addressed in the current section, and the connection with the topic that will be addressed in the next one.

      We agree with the reviewer and have added a transitioning sentence at the end of each paragraph.

      • ’We focus on linear assemblies (d = 1)’: Given the striking differences of the results between d = 1 and d > 1 shown above, the authors should discuss what happens for d > 1 as well.

      • ’In figure Fig. 5a, we show the initial and final equilibrium binodals (black and coloured curve, respectively), for the case of linear assemblies (d = 1) belonging to class 1’: Again, show what happens for d > 1.

      We agree with the reviewer, the kinetics in d > 1 would be definitely interesting. However, in this case, one assembly can become macroscopic (i.e. M must be set to ∞). This requires some substantial modification in the kinetic scheme, like introducing an absorbing boundary condition for monomers ’sucked in’ the gel. We prefer to leave this for future work, and now state it explicitly in the manuscript (line 383).

      • ’This difference arises because, within class 2, monomers in the bulk of an assembly have reduced interaction propensity with respect to the boundary ones. As a consequence, the formation of large clusters shifts the onset of phase separation to higher ϕtot values.’: To prove this argument, the authors should show Fig. 2g and h for d > 1. In fact, by varying d, the effect of the boundary vs. bulk also varies.

      We prefer to discuss the thermodynamics of d > 1 in section 4 on gelation. There we present only a single phase diagram so as not to blow up the discussion on equilibrium too much.

      • ’referring for simplicity to systems belonging to Class 1’: The authors should do the same analysis for Class 2.

      We agree with the reviewer. However, again not to blow up the discussion on equilibrium, we leave it for future work.

      • ’other, implying that the corresponding Flory-Huggins parameter χij vanishes’: Why?

      The explanation based on a lattice model is reported in Appendix 2, and is now more clearly referenced (line 185).

      Minor issues:

      • Eq. (10): Here the authors should explain in the main text, possibly in a simple and intuitive way, why the number of monomers i and the space dimension d enter the right-hand side of this equation in this particular way.

      We thank the reviewer for pointing this out. We added the physical origin of the scaling with dimension in Eq. (10) and in Eq. (8), as pointed out by reviewer 3.

      • ’The second and fifth terms of fsol characterize the internal free energies’: What do you mean by ’characterize the internal free energies’? Please clarify.

      As we now state more clearly (lines 114-120), these two contributions include the internal free energies ωs and ωi, stemming from the free energy of internal bonds that lead to assembly formation.

      • ’depend on the scaling form of the’: Scaling with respect to what ? Please clarify.

      We have now clarified that the scaling is with respect to the assembly size i.

      • Figure 2 is way too dense: it should be split into two figures, and the legend of each of the two figures should be expanded to properly guide the reader to understand the figures.

      We understand the reviewer’s point of view. To avoid altering the present flow, we decided not to split the figure, but we have included shaded boxes to better guide the reader.

      • ’this is a consequence of the gelation transition’: Please clarify

      • ’and this limitation can be dealt with by introducing explicitly the infinite-sized gel in the free energy’: Why? Please clarify.

      We have now rephrased these sentences, hopefully in a clearer way. We now state: ’We know that this divergence is physical, and is caused by the gelation transition. This limitation can be dealt with by introducing explicitly a term in the free energy that accounts for an infinite-sized assembly (the gel)’, see lines 320-322.

      • Figure 4: Add plots of panels d, e, h and i with log scale on the y axis to make explicit an eventual exponential behavior, and revise the text accordingly

      Not to further complicate Figure 4, we preferred to display the logarithmic plots of the equilibrium distribution in the appendix, see Figure A3-1.

      • ’... an equilibrium distribution which monotonously decreases with assembly size’: It is not the distribution that decreases but the cluster volume fraction, please rephrase.

      We thank the reviewer for pointing this out and have now rephrased this sentence (line 394).

      Reviewer 3 (Recommendations For The Authors):

      I could not obtain the exact form of Eq 29 in App 3. Can the authors elaborate on this calculation? App 3: What does it mean that the binodal agrees well with ϕsg? And doesn’t ϕsg depend on temperature through phi tilde? What temperature is this result for?

      We apologise for the unclear explanation. We now state in detail that Eq. (29) is obtained by plugging the expression of ϕi given in Eq. (24) into Eq. (1), in the main text. The dependence of ϕ1 on ϕtot is expressed in Eq. (26), and we have omitted linear terms in ϕtot, since they do not affect phase equilibrium (see lines 802-809). Moreover, ϕsg depends indeed on kBT. We refer to the comparison between the full curve ϕsg in the kBT−ϕtot plane, and the branch of the binodal between the triple point (indicated now with a cross) and ϕtot = 1. The two curves are close, as expected since both correspond to the boundary between homogeneous mixtures and the gel state, obtained with different methods.

      The references to Figures in the appendices are confusing. Please make it clear whether Figures in the main text or the appendices are being referenced. On a related note, the Appendix figures seem to be placed in appendices whose text describes something else - Appendix 2, Figure 1 should be moved to Appendix 3; Appendix 3, Figure 1 should be moved to Appendix 4; etc.

      We revised the appendix, corrected the figure positions and clarified their references.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This research offers an in-depth exploration and quantification of social vocalization within three families of Mongolian gerbils. In an enlarged, semi-natural environment, the study continuously monitored two parent gerbils and their four pups from P14 to P34. Through dimensionality reduction and clustering, a diverse range of gerbil call types was identified. Interestingly, distinct sets of vocalizations were used by different families in their daily interactions, with unique transition structures exhibited across these families. The primary results of this study are compelling, although some elements could benefit from clarification.

      Strengths:

      Three elements of this study warrant emphasis. Firstly, it bridges the gap between laboratory and natural environments. This approach offers the opportunity to examine natural social behavior within a controlled setting (such as specified family composition, diet, and life stages), maintaining the social relevance of the behavior. Secondly, it seeks to understand short-timescale behaviors, like vocalizations, within the broader context of daily and life-stage timescales. Lastly, the use of unsupervised learning precludes the injection of human bias, such as pre-defined call categories, allowing the discovery of the diversity of vocal outputs.

      Weaknesses:

      (1) While the notable differences in vocal clusters across families are convincing, the drivers of these differences remain unclear. Are they attributable to "dialect," call usage, or specific vocalizing individuals (e.g., adults vs. pups)? Further investigation, via a literature review or additional observation, into acoustic differences between adult and pup calls is recommended. Moreover, a consistent post-weaning decrease in the bottom-left cluster (Fig. S3) invites interpretation: could this reflect drops in pup vocalization?

      Thank you for bringing up this point of clarification. Without knowledge of individual vocalizers, we are unable to rigorously assess pronunciation differences between individuals; however, we can obtain a clear proxy for dialect by observing usage differences between families. We’ve added the following text (blue) in the Discussion to help clarify:

      “To address whether gerbils also exhibit family specific vocal features, we compared GMM-labeled vocal cluster usages across the three recorded families and showed differences in vocal type usage (Figure 3). The differences in this study align with the definition of human vocal dialect, which is a regional or social variety of language that can differ in pronunciation, grammatical, semantic and/or language use differences (Henry et al., 2015). This definition of dialect is inclusive of both pronunciation differences (e.g. a Bostonian’s characteristic pronunciation of “car” as “cah”) and usage differences (e.g. a Bostonian’s preferential usage of the words “Go Red Sox” vs. a New Yorker’s preferential usage of the words “Go Yankees”). In our case, vocal clusters can be rarely observed in some families yet highly over-expressed in others (e.g. analogous to language usage differences in humans), or highly expressed in both families, but contain subtle spectrotemporal variations (Figure 3D, Family 1 cluster 11 vs. Family 3 clusters 2, 18, 30; e.g. analogous to pronunciation differences in humans).”

      Indeed, our recordings obtained after pup removal could suggest that adults may use fewer low frequency calls (bottom left cluster in UMAP). However, this dataset does not permit a proper assessment of post-weaning pup calls. In fact, our results and the literature show that adults are likely to use low frequency calls, but only during social interactions with pups or other adults. For example, Furuyama et al. 2022 describe a number of low frequency call types used by adults in agonistic social interactions, which look similar to a low frequency call type used by pups described in Silberstein et al. 2023. Similarly, Ter-Mikaelian et al. 2012 (their Figure 6) recorded several types of sonic vocalizations during adult social interaction. To our knowledge, it has not been shown whether gerbil pups and adults produce distinct call types. It is a challenging problem to solve, as animals placed in isolation (i.e. an experimental condition for which the identity of the vocalizer is known) vocalize infrequently and, in the limited number of calls they do emit, do not use the full range of vocalizations described in the literature (RP personal observations). To properly address this question, one would need to elicit full use of the vocal repertoire through free social interaction, then attribute calls to individual vocalizers via sound source localization and/or head-mounted microphones. We are currently pursuing both of these technical challenges, but this is outside the scope of this manuscript.

      Although the literature reflects the limitations discussed above, we have added a brief paragraph to the Discussion (limitations section) that addresses the reviewer’s question about the development of vocalizations:

      “Although we were not able to attribute vocalizations to individual family members, we did seek to determine the importance of family structure by comparing audio recordings before and after removal of the pups at P30. The results show a clear effect of family integrity, and the sudden reduction of sonic calls following pup removal (Figure S3) could suggest that these vocalizations are produced selectively by pups.

      However, there is ample evidence that adult gerbils also produce sonic vocalizations. For example, a number of low frequency call types are used by adults during a range of social interactions (Ter-Mikaelian et al., 2012; Furuyama et al., 2022), some of which are similar to a low frequency call type used by pups (Silberstein et al., 2023). Vocalization patterns of developing gerbils depend on isolation or staged interactions. Thus, when gerbil pups are recorded during isolation, ultrasonic vocalization rate declines and sonic vocalizations increase for animals that are in a high arousal state (De Ghett 1974, Silberstein et al., 2023). As gerbils progress from juvenile to adolescent development (P17-55) a significant increase in ultrasonic vocalization rate is observed during dyadic social encounters, with a distinct change in usage pattern that depends upon the sex of each animal (Holman & Seale 1991, Holman et al. 1995). The development of vocalization types has been assessed in another member of the Gerbillinae subfamily, called fat-tailed gerbils (Pachyuromys duprasi), during isolation and handling. Here, the number of ultrasonic vocalization syllable types increases from neonatal to adult animals (Zaytseva et al. 2019), while some very low frequency sonic call types were rarely observed after P20 (Zaytseva et al. 2020). By comparison, mouse syllable usage changes during development, but pups produced 10 of the 11 syllable types produced by adults (Grimsley et al. 2011). In summary, our understanding of the maturation of vocalization usage remains limited by our inability to obtain longitudinal data from individual animals within their natural social setting. For example, when recorded in their natural environment, chimpanzees display a prolonged maturation of vocalization complexity, such as the probability of a unique utterance in a sequence, with the greatest changes occurring when animals begin to experience non-kin social interactions (Bortolato et al. 2023).”

      (2) Developmental progression, particularly during pre-weaning periods when pup vocal output remains unstable, might be another factor influencing cross-family vocal differences. Representing data from this non-stationary process as an overall density map could result in the loss of time-dependent information. For instance, were dominating call types consistently present throughout the recording period, or were they prominent only at specific times? Displaying the evolution of the density map would enhance understanding of this aspect.

      This is a great suggestion. Thank you for bringing it up. To address this, we have added an additional figure (Figure 4) to the main text (Note that the former Figure 4 is now Figure 5). New text associated with this new figure was added to the Results and Discussion sections:

      Results

      “Vocal usage differences remain stable across days of development

      It is possible that the observed vocal usage differences could result from varying developmental progression of vocal behavior or overexpression of certain vocal types during specific periods within the recording. To assess the potential effect of daily variation on family specific vocal usage, we visualized density maps of vocal usage across days for each of the families (Figure 4A). There are two noteworthy trends: 1.) the density map remains coarsely stable across days (rows) and 2.) the maps look distinct across families on any given day (columns). This is a qualitative approximation for the repertoire’s stability, but does not take into account variation of call type usage (as defined by GMM clustering of the latent space). Figure 4B shows the normalized usage of each cluster type over development for each family. Cluster usages during the period of “full family, shared recording days” (postnatal days beneath the purple bars) are stable across days within families, as is apparent from the horizontal striations in the plot, though each family maintains this stability by using a unique set of call types. This is addressed empirically in Figure 4C, which shows clearly separable PCA projections of the cluster usages shown in Figure 4B (purple days). Finally, we computed the pairwise Maximum Mean Discrepancy (MMD) between latent distributions of vocalizations from individual recording days for each of the families (Figure 4D). This shows that across-family repertoire differences are substantially larger than within-family differences. This is visualized in a multidimensional scaling projection of the MMD matrix in Figure 4E.”

      Discussion

      “The described family differences collapse data from multiple days into a single comparison; however, it is possible that factors such as vocal development and/or high usage of particular vocal types during specific periods of the recording could explain family differences. Therefore, we took advantage of the longitudinal nature of our dataset to assess whether repertoire differences remain stable across time. First, we visualized vocal repertoire usage across days as either UMAP probability density maps (Figure 4A) or daily GMM cluster usages (Figure 4B). Though qualitative, one can appreciate that family repertoire usage remains stable across days and appears to differ on a consistent daily basis across families. To formally quantify this, we first projected GMM cluster usages from Figure 4B into PC space and show that family GMM cluster usage patterns are highly separable, regardless of postnatal day (Figure 4C). If families had used a more overlapping set of call types, then the projections would have appeared intermixed. Next, we performed a cluster-free analysis by computing the pairwise MMD distance between VAE latent distributions of vocalizations from each family and day (Figure 4D). This analysis shows very low MMD values across days within a family (i.e. the repertoire is highly consistent with itself), and high MMD values across families/days (greater than would be expected by chance; see shuffle control in Figure S2D). The relative differences in this matrix are made clear in Figure 4E, which provides additional evidence that family vocal repertoires remain stable across days and are consistently different from other families. Taken together, we believe that this is compelling evidence that differences in vocal repertoires between families are not driven by dominating call types during specific phases in the recording period; rather, families consistently emit characteristic sets of call types across days. This opens up the possibility to assess repertoire differences over much shorter time periods (e.g. 24 hours) in future studies.”
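For readers unfamiliar with the cluster-free comparison, the following is a minimal sketch of a squared-MMD estimate between two sets of latent vectors. It is not the authors' implementation: the Gaussian (RBF) kernel, the fixed `bandwidth`, the biased estimator, and the synthetic two-dimensional "latents" are all assumptions made purely for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel between two vectors (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def sample_latents(rng, center, n, dim=2):
    """Hypothetical stand-in for VAE latent vectors from one family and day."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
family_a_day1 = sample_latents(rng, 0.0, 100)
family_a_day2 = sample_latents(rng, 0.0, 100)  # same distribution as day 1
family_b_day1 = sample_latents(rng, 3.0, 100)  # shifted distribution

within = mmd_squared(family_a_day1, family_a_day2)
across = mmd_squared(family_a_day1, family_b_day1)
print(f"within-family MMD^2: {within:.4f}, across-family MMD^2: {across:.4f}")
```

Computing this quantity for every pair of family/day samples would give a matrix of the kind described for Figure 4D, in which small within-family entries and large across-family entries indicate stable, family-specific repertoires.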

      (3) Family-specific vocalizations were credited to the transition structure, a finding that may seem obvious if the 1-gram (i.e., the proportion of call types) already differs. This result lacks depth unless it can be demonstrated that, firstly, the transition matrix provides a robust description of the data, and secondly, different families arrange the same set of syllables into unique sequences.

      Thank you for these important suggestions. We agree that it is true that the 2-gram transition structure must vary based on the 1-gram structure. To determine whether this influences the interpretation of the finding, we have added Figure S5 and the following text in the Results section:

      “To determine whether differences in 1-gram structure contribute to differences in the transition (2-gram) structure, we performed a number of controls. Although subtle, vertical streaks are clearly present in shuffled transition matrices that correspond to 1-gram usages (Figure S5A-B). Given the shuffled data structure, we sought to determine whether the observed transition probabilities differed significantly from chance levels. We randomly shuffled label sequences 1000 times independently for each family to generate a null transition matrix distribution. Using these null distributions and the observed transition probabilities, we computed a p-value for each transition using a one-sample t-test and created a binary transition matrix indicating which transitions happen above chance levels (Figure S5C, black pixels, p <= 0.05 after post hoc Benjamini-Hochberg multiple comparisons correction). As is made clear in Figure S5C, most transitions for each family occur significantly above chance levels, despite the inherent 1-gram structure. Moreover, by looking at transitions from a highly used cluster type that is used in roughly the same proportion across families (cluster 12), we show that families arrange the same sets of vocal clusters into unique sequences (Figure S5D). We believe that this provides compelling evidence that the 1-gram structure does not change the interpretation of the main claim that transition structure varies by family.”

      To address your second point, we inspected frequent transitions from individual syllables to all other syllables using bigram transition probability graphs. This revealed a common trend that across all families, many shared and unshared transitions existed, suggesting that families use the same sets of syllables to make unique transition patterns. Figure S5D shows a single syllable example of the phenomenon, with red lines indicating the shared transition types between families and black showing transition patterns not shared between families (i.e. unique family-specific transitions, or lack thereof).
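The shuffle control described above can be sketched roughly as follows. This is an illustrative outline, not the authors' code. Note one deliberate substitution: the response describes a one-sample t-test against the null distribution, whereas this sketch uses an empirical permutation p-value so that it needs only the standard library. The function names and the toy label sequence are hypothetical.

```python
import random


def transition_matrix(labels, n_clusters):
    """Row-normalized bigram transition probabilities P(next | current)."""
    counts = [[0] * n_clusters for _ in range(n_clusters)]
    for cur, nxt in zip(labels, labels[1:]):
        counts[cur][nxt] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def permutation_pvalues(labels, n_clusters, n_shuffles=1000, seed=0):
    """Empirical p-value per transition: share of shuffles at or above observed."""
    rng = random.Random(seed)
    observed = transition_matrix(labels, n_clusters)
    exceed = [[0] * n_clusters for _ in range(n_clusters)]
    shuffled = list(labels)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # preserves 1-gram usage, destroys ordering
        null = transition_matrix(shuffled, n_clusters)
        for i in range(n_clusters):
            for j in range(n_clusters):
                if null[i][j] >= observed[i][j]:
                    exceed[i][j] += 1
    return [[(e + 1) / (n_shuffles + 1) for e in row] for row in exceed]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking which hypotheses the BH procedure rejects."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            max_rank = rank
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= max_rank
    return reject


# Toy sequence with strict ordering 0 -> 1 -> 2 -> 0 and equal 1-gram usage.
labels = [0, 1, 2] * 100
n_clusters = 3
pvals = permutation_pvalues(labels, n_clusters)
flat = [p for row in pvals for p in row]
flags = benjamini_hochberg(flat)
significant = [flags[i * n_clusters:(i + 1) * n_clusters] for i in range(n_clusters)]
print(significant)
```

Because shuffling keeps each cluster's overall usage fixed, any transition flagged here exceeds what the 1-gram structure alone would produce, which is the point the control is meant to establish.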

      Reviewer #2 (Public Review):

      Peterson et al. perform a series of behavioral experiments to study the repertoire and variance of Mongolian gerbil vocalizations across social groups (families). A key strength of the study is the use of a behavioral paradigm which allows for long term audio recordings under naturalistic conditions. This experimental set-up results in the identification of additional vocalization types. In combination with state of the art methods for vocalization analysis, the authors demonstrate that the distribution of sound types and the transitions between these sound types across three gerbil families is different. This is a highly compelling finding which suggests that individual families may develop distinct vocal repertoires. One potential limitation of the study lies in the cluster analysis used for identifying distinct vocalization types. The authors use a Gaussian Mixture Model (GMM) trained on a variational autoencoder-derived latent representation of vocalizations to classify recorded sounds into clusters. Through the analysis the authors identify 70 distinct clusters and demonstrate a differential usage of these sound clusters across families. While the authors acknowledge the inherent challenges in cluster analysis and provide additional analyses (i.e. maximum mean discrepancy, MMD), additional analysis would increase the strength of the conclusions. In particular, analysis with different cluster sizes would be valuable. An additional limitation of the study is that due to the methodology that is used, the authors cannot provide any information about the bioacoustic features that contribute to differences in sound types across families, which limits interpretations about how the animals may perceive and react to these sounds in an ethologically relevant manner.

      The conclusions of this paper are well supported by data, but certain parts of the data analysis should be expanded and more fully explained.

      • Can the authors comment on the potential biological significance of the 70 sound clusters? Does each cluster represent a single sound type? How many vocal clusters can be attributed to a single individual? Similarly, can the authors comment on the intra-individual and inter-individual variability of the sound types within and across families?

      Previous work documenting the Mongolian gerbil repertoire (Ter-Mikaelian 2012, Kobayasi 2012) has revealed ~12 vocalization types that vary with social context. Our thinking is that we are capturing these ~12 (plus a few more, as illustrated in Figure 2C) as well as individual or family-specific variations of some call types. Although the number of discrete call types is likely less than 70, it’s plausible that variation due to vocalizer identity pushes some calls into unique clusters. This idea is supported by the fact that both naked mole rats and Mongolian gerbils have been shown to exhibit individual-specific variation in vocalizations, though only in single call types (Barker 2021, Figure 1; Nishiyama 2011, Table I). The current study is not ideal to test this prediction, as we cannot attribute each vocalization to individual family members. Using our 4-mic array, we attempted to apply established sound source localization techniques to assign vocalizations to individuals (Neunuebel 2015), but the technique failed, presumably due to high amounts of reverberation in the arena. We are currently developing a custom deep learning based sound localization algorithm, and had hoped to extract individual animal vocalizations from our data set (part of the reason why this manuscript has taken longer than expected to return!), but the performance is not yet satisfactory for large groups of animals. We have added text to the Methods sections with the context outlined above to further justify the use of ~70 clusters.

      • As a main conclusion of the paper rests on the different distribution of sound clusters across families, it is important to validate the robustness of these differences across different cluster parameters. Specifically, the authors state that "we selected 70 clusters as the most parsimonious fit". Could the authors provide more details about how this was fit? Specifically, could the authors expand upon what is meant by "prior domain knowledge about the number of vocal types...". If the authors chose a range of cluster values (i.e. 10, 30, 50, 90) does the significance of the results still hold?

      Thank you for the suggestion, this is an important point that we have addressed with new analyses in the revision (see GMM clustering methods and new Figure S4). The prior domain knowledge referenced is with respect to the information known about the Mongolian gerbil vocal types provided in the response above. We have made this more clear in the discussion.

      We mainly based our selection of the number of clusters on the elbow method applied to the GMM held-out log likelihood (Figure S2C). The likelihood begins to plateau at around 70 clusters, though it’s clear that there are a number of reasonable cluster sizes. To assess whether cluster size has an effect on interpretation of the family differences result, we added Figure S4, where we varied the number of GMM clusters used and compared cluster usage differences across families (Figure S4A). We quantified pairwise family differences in cluster usage by computing the sum of the absolute value of differential cluster usages, for each GMM cluster value (Figure S4B). We find that relative usage differences remain unchanged across the range of cluster values used, indicating that GMM cluster size does not bias the finding.
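As a rough illustration of the usage-difference metric described above (the sum of absolute differences in normalized cluster usage between two families), here is a minimal sketch. The function names and the toy label lists are hypothetical, and in practice the cluster labels would come from a GMM fit with a given number of components.

```python
from collections import Counter


def cluster_usage(labels, n_clusters):
    """Fraction of vocalizations assigned to each cluster (sums to 1)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(n_clusters)]


def usage_difference(labels_a, labels_b, n_clusters):
    """Sum over clusters of |usage_a - usage_b|; ranges from 0 to 2."""
    usage_a = cluster_usage(labels_a, n_clusters)
    usage_b = cluster_usage(labels_b, n_clusters)
    return sum(abs(a - b) for a, b in zip(usage_a, usage_b))


# Toy cluster assignments for three hypothetical families.
family_1 = [0, 0, 0, 1, 2, 2]
family_2 = [0, 1, 1, 1, 2, 2]
family_3 = [0, 0, 0, 1, 2, 2]

print(usage_difference(family_1, family_2, n_clusters=3))
print(usage_difference(family_1, family_3, n_clusters=3))
```

Repeating this calculation for label sets produced with different numbers of clusters, and checking that the ordering of pairwise family differences is preserved, mirrors the robustness check the response describes.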

      • While VAEs are powerful tools for analyzing complex datasets in this case they are restricted to analysis of spectrogram images. Have the authors identified any acoustic differences (i.e. in pitch, frequency, and other sound components) across families?

      Though it’s true that this VAE is limited to spectrograms, the VAE latent space has been shown to correspond to real acoustic features such as frequency and duration, and contain a higher representational capacity than traditional acoustic features (Goffinet 2021, Figure 2). Therefore, clustering of the latent space necessarily means that vocalizations with similar acoustic features are clustered together regardless of their family identity.

      Despite this, your point is well taken that there could be systematic differences in certain acoustic features for specific call types. We are not able to ascertain this with the current dataset. This is addressed in Barker 2021 by recording a single call type (soft chirp) from individuals within and across families. Mongolian gerbils have been shown to exhibit individual differences in the initial, terminal, minimum, and maximum frequency of the ultrasonic up-frequency modulated call type (Figure 2, top right green; Nishiyama 2011, Figure 1A ). Therefore it’s possible that family-specific differences exist for that particular call type. To assess whether other call types show family or individual differences, it’s necessary to either 1.) elicit all call types from an animal in isolation or 2.) determine vocalizer identity in social-vocal interactions. The problem with the former idea is that gerbils only produce up-frequency modulated USVs in isolation and there is no known way to elicit the full vocal repertoire in single animals. The latter idea would allow for full use of the vocal repertoire, but requires invasive techniques (e.g., skull-implanted microphones, or awake-behaving laryngeal nerve recordings) that permit assignment of vocalizations to individuals during a natural social interaction. We are actively exploring solutions to both problems.

      It’s likely that future studies will look deeper into acoustic differences between individuals and families. Therefore, we have added acoustic feature quantification of vocalizations in each of the GMM clusters as a reference (Figure S6).

      Reviewer #3 (Public Review):

      Summary:

      In this study, Peterson et al. longitudinally record and document the vocal repertoires of three Mongolian gerbil families. Using unsupervised learning techniques, they map the variability across these groups, finding that while overall statistics of, e.g., vocal emission rates and bout lengths are similar, families differed markedly in their distributions of syllable types and the transitions between these types within bouts. In addition, the large and rich data are likely to be valuable to others in the field.

      Strengths:

      - Extensive data collection across multiple days in multiple family groups.

      - Thoughtful application of modern analysis techniques for analyzing vocal repertoires.

      - Careful examination of the statistical structure of vocal behavior, with indications that these gerbils, like naked mole rats, may differ in repertoire across families.

      Weaknesses:

      - The work is largely descriptive, documenting behavior rather than testing a specific hypothesis.

      - The number of families (N=3) is somewhat limited.

      We agree that the number of families is relatively small. However, our new analysis of vocal repertoire by postnatal day (Figure 4) demonstrates that the finding is quite robust. A high sample-size study was outside the scope of this initial observational study given the difficulty of obtaining and processing longitudinal data of this scale. In light of new analyses in Figure 4, we are confident that future studies will not need so much data to characterize family-specific differences. A single 24-hour recording should be sufficient, making comparison of many more families relatively straightforward.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Several minor concerns:

      (1) The three thresholds used for vocalization segmentation lack explanation.

      Figure 1C's first vocal event appears to define the first gap via the gray threshold (th_2, as the trace does not cross the black line) and the second gap via the black threshold (th_1 or th_3). And this is not addressed in the Methods section.

      Thank you for bringing this to our attention. We agree, this is presented in an unnecessarily complicated way. We have updated the methods section describing the thresholding procedure.

      “Sound onsets are detected when the amplitude exceeds 'th_3' (black dashed line, Figure 1C), and sound offset occurs when there is a subsequent local minimum e.g., amplitude less than 'th_2' (gray dashed line, Figure 1C), or 'th_1' (black dashed line, Figure 1C), whichever comes first. In this specific use case, th_2 (5) will always come before th_1 (2), therefore the gray dashed line will always be the offset. A subsequent onset will be marked if the sound amplitude crosses th_2 or th_3, whichever comes first. For example, the first sound event detected in Figure 1C shows the sound amplitude rising above the black dashed line (th_3) and marks an onset. Subsequently, the amplitude trace falls below the gray dashed line (th_2) and an offset is marked. Finally, the amplitude rises above th_2 without dipping below th_3 and an onset for a new sound event is marked. Had the amplitude dipped below th_3, a new sound event onset would be marked when the amplitude trace subsequently exceeded th_3 (e.g. between sound event 2 and 3, Figure 1C). The maximum and minimum syllable durations were selected based on published duration ranges of gerbil vocalizations (Ter-Mikaelian et al. 2012, Kobayasi & Riquimaroux, 2012).”
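The onset/offset logic walked through in the Figure 1C example can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the segmentation code used in the study: it uses only two thresholds, a lower one (`th_low`, playing the role the example gives to th_3) and a higher one (`th_high`, playing the role of th_2), and it omits the local-minimum test, th_1, and the minimum/maximum duration filters. The threshold values and the amplitude trace are made up.

```python
def segment(amplitude, th_low, th_high):
    """Return (onset, offset) sample-index pairs; offsets are exclusive.

    Simplified rules (assumptions, see the paragraph above):
      * from silence, an onset is marked when the trace rises above th_low;
      * an offset is marked when the trace falls back below th_high after
        having reached it, or falls below th_low;
      * after an offset, a new onset is marked on rising above th_high,
        unless the trace first dipped below th_low, in which case rising
        above th_low is enough.
    """
    events = []
    in_sound = False
    reached_high = False
    armed_low = True  # True once the trace has been below th_low
    onset = None
    for i, a in enumerate(amplitude):
        if in_sound:
            reached_high = reached_high or a >= th_high
            if (reached_high and a < th_high) or a < th_low:
                events.append((onset, i))
                in_sound = False
                armed_low = a < th_low
        elif a < th_low:
            armed_low = True
        elif (armed_low and a > th_low) or a > th_high:
            onset = i
            in_sound = True
            reached_high = a >= th_high
            armed_low = False
    if in_sound:
        events.append((onset, len(amplitude)))
    return events


# Made-up amplitude trace with three sound events.
trace = [0, 1, 3, 6, 8, 6, 4, 3, 6, 7, 4, 1, 0, 3, 6, 4, 1]
events = segment(trace, th_low=2, th_high=5)
print(events)
```

In this toy trace, the second event starts when the amplitude climbs back above the higher threshold without first dropping below the lower one, while the third starts on crossing the lower threshold after a full dip, matching the two cases distinguished in the quoted Methods text.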

      (2) The determination of multi-syllabic calls could be explained further. In Figure 1C, for instance, do syllables separated by short gaps (e.g., the first syllable and the rest of the first group, and the third group in this example) belong to the same call or different calls?

      We have added an operational definition of mono vs. multisyllabic calls in the Results section:

      “Vocalizations occur as either single syllables bounded by silence (monosyllabic) or consist of combinations of single syllables without a silent interval (multisyllabic).”

      Under this definition, the examples you mentioned in Figure 1C are considered monosyllabic. One could reasonably expand the definition to include calls separated by less than X ms of silence for example, however we choose not to do that in this study. A deeper understanding of the phonation mechanisms for different gerbil vocalization types would be helpful to more rigorously determine the distinction between mono vs. multisyllabic vocalizations.

      (3) Labeling the calls shown in Fig. 3D in the latent feature space would help highlight within-family diversity and between-family similarities.

      Great suggestion. We have updated Figure 3 to include where in UMAP space each family’s preferred clusters are.

      (4) In the introduction, the statement, "Therefore, our study considers the possibility that there is a diversity of vocalizations within the gerbil family social group" doesn't naturally follow from the previous example. This could be rephrased.

      Agreed, thank you. We revised this section of the introduction to flow better.

      Reviewer #2 (Recommendations For The Authors):

      While outside the scope of the current study the authors may consider the following experiments and analysis for future studies:

      • Do vocal repertoires retain their family signatures across subsequent generations of pups? (i.e. if vocalizations are continually monitored during second or third litters of the same parents).

      • Do the authors observe any long-term changes in family repertoires related to the developmental trajectory of the pups? Are there changes in individual pup vocal features or sound type usage throughout development?

      Thank you for these great suggestions. Given that naked mole rats learn vocalizations through cultural transmission, it would be interesting to see whether other subterranean species with complex social structures (gerbils, voles, rats) have similar abilities. A straightforward way to assess this possibility could be as you suggest — are latent distributions of vocalizations from multi-generational families closer together than cross-family differences? If true, this would provide compelling evidence to investigate further.

      We partially address your second suggestion in our response to Reviewer 1 and in Figure 4, which shows that the family repertoire remains stable throughout this particular period of development. This doesn’t rule out the possibility that there could be other phases of development that undergo more vocal change. Your final suggestion is an area that we are actively researching and eager to know the answer to. A follow-up question: could differences in pup vocal features contribute to differential care by parents?

      Reviewer #3 (Recommendations For The Authors):

      In all, I found the paper clearly written and the figures easy to follow. One small suggestion:

      Figure 1: I can't see the black and gray thresholds described in the caption very well. Perhaps a zoom-in to the first 0.15s or so of the normalized amplitude plot would better display these.

      Agreed, thank you. We added a zoom-in to Figure 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Unckless and colleagues address the issue of the maintenance of genetic diversity of the gene diptericin A, which encodes an antimicrobial peptide in the model organism Drosophila melanogaster.

      Strengths:

      The data indicate that flies homozygous for the dptA S69 allele are better protected against some bacteria. By contrast, male flies homozygous for the R69 allele better resist starvation than flies homozygous for the S69 allele.

      Weaknesses:

      -I am surprised by the inconsistency between the data presented in Fig. 1A and Fig. S2A for the survival of male flies after infection with P. rettgeri. I am not convinced that the data presented support the claim that females have lower survival rates than males when infected with P. rettgeri (lines 176-182).

      The two figures are pasted above (1A left, S2A right). The reviewer is correct that the two experiments look different in terms of overall outcomes for males, though qualitatively similar. These two experiments were performed by different researchers, and as much as we attempt to infect consistently from researcher to researcher, some have heavier hands than others. It is true that the genotype that has the largest sex effect is the arginine line (blue) where females (in this experiment) are as bad as the null allele, and males are more intermediate. Also note that the experiments in S2A (male and female) were done in the same block so they are the better comparison. We’ve reflected this in the manuscript.

      - The data in Fig. 2 do not seem to support the claim that female flies with either the dptA S69 or the R69 alleles have a longer lifespan than males (lines 211-215). A comment on the [delta] dpt line, which is one of the CRISPR edited lines, would be welcome.

      We’ve reworded this section based on these comments.

      - The data in Fig. 2B show that male flies with the dptA S69 or R69 alleles have the same lifespan when poly-associated with L. plantarum and A. tropicalis, which contradicts the claim of the authors (lines 256-260).

      This is correct – the effect is only in females. It has been corrected.

      Reviewer #2 (Public Review):

      Summary: In this study, the authors delve into the mechanisms responsible for the maintenance of two diptericin alleles within Drosophila populations. Diptericin is a significant antimicrobial peptide that plays a dual role in fly defense against systemic bacterial infections and in shaping the gut bacterial community, contributing to gut homeostasis.

      Strengths: The study unquestionably demonstrates the distinct functions of these two diptericin alleles in responding to systemic infections caused by specific bacteria and in regulating gut homeostasis and fly physiology. Notably, these effects vary between male and female flies.

      Weaknesses: Although the findings are highly intriguing and shed light on crucial mechanisms contributing to the preservation of both diptericin alleles in fly populations, a more comprehensive investigation is warranted to dissect the selection mechanisms at play, particularly concerning diptericin's roles in systemic infection and gut homeostasis. Unfortunately, the results from the association study conducted on wild-caught flies lack conclusive evidence.

      It is true that the wild fly association study is mostly a negative result. We’ve backed off the claim about the Morganella association.

      Major Concerns:

      Lines 120-134: The second hypothesis is not adequately defined or articulated. Please revise it to provide more clarity. Additionally, it should be explicitly stated that the first part of the first hypothesis (pathogen specificity), i.e., the superior survival of the S allele in Providencia infections compared to the R allele, has been previously investigated and supported by the results in the Unckless et al. 2016 paper. The current study aims to additionally investigate the opposite scenario: whether the R allele exhibits better survival in a different infection. Please consider revising to emphasize this point.

      We’ve reworded this section and added references to both the Unckless et al. 2016 and Hanson et al. 2023 papers.

      Figures and statistical analyses: It is essential to present the results of significant differences from the statistical analyses within Figures 1B, 2B, and 3. Additionally, please include detailed descriptions of the statistical analysis methods in the figure legends. Specify whether the error bars represent standard error or standard deviation, particularly in Figure 3, where assays were conducted with as few as 3 flies.

      We have added statistical details as requested.

      Lines 317-318 (as well as 320-328): The data related to P. rettgeri appear somewhat incomplete, and the authors acknowledge that bacterial load varies significantly, and this bacterium establishes poorly in the gut. These data may introduce more noise than clarity to the study. Please consider revising these sections by either providing more data, refining the presentation, or possibly removing them altogether.

      The fact that P. rettgeri establishes poorly in the gut in wildtype flies is the result of several unpublished experiments in the Lazzaro and Unckless labs. We don’t have this as a figure because it was not directly tested in these experiments. We’ve added a note that it is personal observation and we’ve reworked the discussion in the second section.

      Lines 335-387 and Figure 4: Although these results are intriguing and suggest interactions between functional diptericin and fly physiology, some mediated by the gut microbiome, they remain descriptive and do not significantly contribute to our understanding of the mechanism that maintains the diptericin alleles.

      While the reviewer is correct that these experiments do not elucidate mechanism, they do strongly suggest (based on the controlled nature of the experiments) that the physiological tradeoffs are due to Diptericin genotype. The disagreement is the level of “mechanism”. At the evolutionary level, the demonstration of a physiological cost of a protective immune allele is sufficient to explain the maintenance of alleles. However, we have not determined (and did not attempt to determine) why Diptericin genotype influences these traits. That will have to wait for future experiments.

      Lines 399-400: The contrast between this result and statement and the highly reproducible data presented in Figures 2-4 should be discussed.

      We’ve added some discussion to this section including a reference to the “inconstancy” of the Drosophila gut microbiome.

      Lines 422-429 and Figure 5D: The conclusion regarding an association between diptericin alleles and Morganellaceae bacteria is not clearly supported by Figure 5D and lacks statistical evidence.

      We’ve changed this to just be suggestive.

      Reviewer #3 (Public Review):

      Summary:

      This paper investigates the evolutionary aspects around a single amino acid polymorphism in an immune peptide (the antimicrobial peptide Diptericin A) of Drosophila melanogaster. This polymorphism was shown in an earlier population genetic study to be under long-term balancing selection. Using flies with different AA at this immune peptide it was found that one allelic form provides better survival of systemic infections by a bacterial pathogen, but that the alternative allele provides its carriers a longer lifespan under certain conditions (depending on the microbiota). It is suggested that these contrasting fitness effects of the two alleles contribute to balance their long-term evolutionary fate.

      Strengths:

      The approach taken and the results presented are interesting and show the way forward for studying such polymorphisms experimentally.

      Weaknesses:

      (1) A clear demonstration (in one experiment) of the antagonistic effect of the two isolated selection pressures is not provided.

      The study is overwhelming with many experiments and countless statistical tests. The overall conclusion of the many experiments and tests suggests that "dptS69 flies survive systemic infection better, while dptS69R flies survive some opportunistic gut infections better." (line 444-446). Given the number of results, different experiments, and hundreds of tests conducted, how can we make sure that the result is not just one of many possible combinations? I suggest experimentally testing this conclusion in one experiment (one may call this the "killer-experiment") with the relevant treatments being conducted at the same time, side by side, and the appropriate statistical test being conducted by a statistical test for a treatment x genotype interaction effect.

      This is a nice idea but would not work in practice since the fly lines used are different (gnotobiotic vs conventional) and gnotobiotics have to be derived from axenic lines that need a few generations to recover from the bleaching treatment.

      (2) The implication that the two forms of selection acting on the immune peptide are maintained by balancing selection is not supported.

      The picture presented about how balancing selection is working is rather simplistic and not convincing. In particular, it is not distinguished between fluctuating selection (FL) and balancing selection (BL). BL is the result of negative frequency-dependent selection. It may act within populations (e.g. Red Queen type processes, mating types) or between populations (local adaptation). FL is a process that is sometimes suggested to produce BL, but this is only the case when selection is negative frequency dependent. In most cases, FL does not lead to BL.

      The presented study is introduced with a framework of BL, but the aspects investigated are all better described as FL (as the title says: "A suite of selective pressures ..."). The two models presented in the introduction (lines 62 to 69; two pathogens, cost of resistance) are both examples for FL, not for BL.

      We’ve added a discussion of how fluctuating selection and balancing selection relate at the end of the discussion.

      Finally, no evidence is presented that the different selection pressures suggested to select on the different allelic forms of the immune peptide are acting to produce a pattern of negative frequency dependence.

      We are not arguing for negative frequency dependent selection. We assume throughout that the Dpt allele does not drive the overall frequency of P. rettgeri in populations, since it is a ubiquitous microbe. Evolution within D. melanogaster therefore has little to no effect on the density of the pathogen.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Minor Comments:

      Line 31: Rewrite the sentence mentioning "homozygous serine" for improved clarity, especially since the S/R polymorphism of Diptericin has not been introduced yet.

      This has been changed to be vague in terms of specific alleles and just refers to “one allele” vs the other.

      Lines 87-94: Consider reorganizing this paragraph to maintain a logical flow of the discussion on the Drosophila immune system and the IMD pathway.

      We explored other orders, but we think that as is (IMD to AMPs in general to AMPs in Drosophila) makes the most sense here.

      Line 99: Provide an explanation of balancing selection for a broader readership, differentiating it from other modes of selection.

      We added a brief discussion but note that the intro has significant discussion of balancing selection.

      Lines 105-106: Please provide a proper reference. Additionally, ensure that the Unkless et al. 2016 paper is correctly referenced, both in lines 111 and 138-141.

      This has been added.

      Lines 138-141: It would be beneficial to state that the previous study by Unkless et al. 2016 did not control for genetic background, which is why the assay was redone with gene editing.

      This has been added.

      Lines 296-303: Clarify the source of the survival observations and consider incorporating this data into Figure 2 for improved visualization.

      We’ve clarified that this is Figure 2.

      Lines 390-394: Explain the distinctions between vials and cages, particularly in terms of food consumption, exposure to bacteria, etc., which can be relevant to gut homeostasis.

      We’ve added a discussion of why these two approaches are complementary.

      Reviewer #3 (Recommendations For The Authors):

      Statistics

      Statistical results are limited to the presentation of p-values (several hundred of them!). For a proper assessment of the statistical analyses, one would also want to see the models used and the test statistics obtained.

      The statistical tests done are often unclear. For example, in several experiments, pools of 3 trials (blocs) of multiple animals were tested. The blocs need to be included in the model. Likewise, it seems that multiple delta-dpt fly genotypes were produced. Apparently, they were not distinguished later. Were they considered in the statistical analyses? By contrast, two lines of dptS69R flies were reported to show differences. What concept was applied to test for line difference in some cases and not in others?

      In the same dataset (i.e. data resulting from one experiment), it seems that mostly multiple tests were done. For example, in one case each treatment was contrasted to the dptS69 flies. It is generally not acceptable to break down one dataset in multiple subsets and conduct tests with each subtest. One single model for each experiment should be done. This may then be followed by post-hoc tests to see which treatments differ from each other.

      We’ve attempted to clarify these statistical approaches throughout.

      Minor points

      In the legend of Figure 3 it says: "A) monoassociations where each plot represents a different experiment,". This is unclear to me. First, how many plots are there: 3 or 12? Second, what does "experiment" mean? Are these treatments, or entirely different experiments? How was this statistically taken into account?

      We’ve changed this to “different condition” which is clearer. We performed statistical analysis independently for each condition and we’ve now discussed that.

      Fig. 5D. It is suggested in the text ("Most intriguing", line 426) and the figure legend that the abundance of Morganellaceae in wild-caught flies differs among genotypes. This is not visible in the figure and not convincingly shown in the text. No stats are given.

      We’ve now added that these differences are not significant.

      Line 458-461: This sentence is unclear.

      We’ve attempted to clarify.

      What is a "a traditional adaptive immune system"?

      We’ve reworded to “an adaptive immune system”.

      There are several typos in the manuscript. Please correct.

      We’ve attempted to fix typos throughout.

      Bold statements are often without references.

      We’ve attempted to add appropriate references throughout.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      R1-01 - Does ank-G-GFP label all isoforms (190, 270 and 480kDa) of ankG? From the images of the AIS and noR it appears that the large forms (270 and 480 kDa) are probably tagged with GFP. Did the authors check for puncta along dendrites and in dendritic spines, which are thought to be formed by the small (190 kDa) isoform? Perhaps a western blot to show that Ank-G-GFP labels all isoforms would be a useful addition to this study.

      We believe that AnkG-GFP indeed labels the major Ank3 transcripts in the brain, including the 190, 270, and 480 kDa isoforms, based both on known mRNA exon usage and on Western blot analysis (data not shown). Thus, theoretically, this model would be useful for examining the localization of 190 kD ankyrin-G to dendritic spines. While we attempted to examine this in sections from tissue, it was difficult to separate punctate ankyrinG-GFP labeling from the background. However, these experiments were done in genetic crosses that would label most pyramidal neurons in a given area (i.e. CaMKIIa-Cre). Given the Cre-dependence of this model, future experiments could utilize sparse transduction with a Cre virus that also fills neurons with soluble fluorophores (i.e. mCherry or tdTomato) to mark isolated neurons and identify dendritic spines, as exemplified in Fig. 2D. This would allow examination of subcellular localization of ankyrin-G within single pyramidal cells before and after induction of synaptic plasticity.

      R1-02 - In Figure 2, does all the native Ank-G get replaced by Ank-G-GFP? In Fig. 2E the GFP signal along the AIS of CamKII +ve neurons does not appear to be very homogeneous compared to the βIV-spectrin label. Have the authors carried out more experiments like those in 2F, using antibodies that label AnkG together with the GFP fluorescence of the labeled AnkG? It would also be informative to know if, as one might expect, the total levels of ankG-GFP correlate with the levels of ankG at the AIS.

      We agree that this is an important point and conducted additional experiments to address your concerns. Of course, we cannot exclude that some unmodified ankyrin-G remains in the AIS or other structures. We expect the turnover of the protein to be rather slow, and native ankyrin-G likely remains to some degree. However, our quantification demonstrates that the ankyrin-G-GFP labeling is sufficiently homogeneous to accurately represent AIS size, indicating proportional levels of GFP to native ankyrin-G. Animals were crossed with a CaMKIIa-Cre driver line and ex vivo slices were imaged live and after immunolabeling. We found a strong correlation between live ankyrin-G-GFP (patch clamp chamber), postfix ankyrin-G-GFP, postfix ankyrin-G, and βIV-spectrin immunosignals of the same AIS. Furthermore, our measurements of AIS length using the intrinsic GFP signal in combination with ankyrin-G, or βIV-spectrin antibodies showed significant overlap (see R1-03). We have now included these graphs as supplemental Fig. S2 in the manuscript (pp. 8-9, ll. 173-177).

      R1-03 - Does the length and position of the AIS change when Ank-G is tagged with GFP? This seems like important information that is needed to make sure that there are no structural differences in AIS morphology when compared to native Ank-G.

      This is a very important point. We used the βIV-spectrin signal to compare the length of AIS with and without GFP modification in acute slices after patch-clamp recordings (N= 3 animals, 27 GFP+ and 48 GFP- AIS). As secondary control, we plotted the measurements of 160 AIS from a Thy1-GFP mouse line (N = 3 animals, 160 AIS). We found no significant difference in the length and position of the βIV-spectrin signal between GFP positive and negative AIS (p=0.3364 unpaired t-test, p=0.6138 non-parametric Mann-Whitney test, respectively). We have now included this analysis as Supplemental Fig. S2A in the manuscript (pp. 8-9, ll. 173-177). 
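
      For readers unfamiliar with the non-parametric comparison mentioned above, the following is a minimal sketch of a two-sided Mann-Whitney U test using a normal approximation. It is not the authors' analysis code: the AIS length values are invented, the function name is hypothetical, and the approximation omits tie and continuity corrections, so a statistics package should be preferred for real data.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value by normal approximation).

    Assumption: no tie or continuity correction is applied, which is only
    reasonable for moderately large samples with few ties.
    """
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which x exceeds y; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical AIS lengths in micrometres (illustrative values only).
gfp_positive = [28.1, 30.4, 26.9, 31.2, 29.0]
gfp_negative = [27.5, 30.9, 28.8, 29.6, 32.0, 26.4]
u_stat, p_value = mann_whitney_u(gfp_positive, gfp_negative)
```

      A large p-value from such a test indicates that no difference was detected between groups; it does not by itself demonstrate that the groups are equivalent.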

      R1-04 - How was node length measured in Figure 3? Was this done using the endogenous ank-G signal? In this figure, it would be informative to also quantify the number of noRs with a Nav1.6 stain. Perhaps even check if there are correlations between Ank-G-GFP and Nav1.6 levels. In this figure, it appears that comparisons are carried out between Ank-G-GFP +ve and -ve neurons in the same cryosections, from Ank-G-GFP mice crossed with CamKIIa-Cre. I worry that this may not be comparing the same types of axons. What cells do the CamKIIa -ve axons belong to? Also, the labels on the bar graph are confusing - perhaps GFP+ve and GFP-ve would be clearer?

      The reviewer raises an important point. We forgot to declare the signal which was used to measure node length in the manuscript. We have corrected this error and clearly state now in the Fig. 3C legend that we used the ankyrin-G signal to quantify node length. Furthermore, using CaMKIIa-Cre mediated expression triggers ankyrin-G-GFP only in a genetically defined subset of neurons. Nodes that do not belong to this subgroup might very well have different node properties. Yet, we cannot assign potential differences in node length to the presence or absence of the GFP label, since we do not have an independent labeling technique for the very same subset of neurons. Since node lengths were similar and showed the same spread of lengths in our sample (Fig. 3C), we assume that the GFP label probably does not affect node length to a significant degree. We have now discussed this limitation in the Results (p. 7, ll. 159-165) and Methods sections (p. 30, ll. 644-645) and provide Supplementary Fig. S1 for more clarity. As suggested by the reviewer, we have measured mean fluorescence intensities between 91 GFP+ and 141 GFP- nodes using automated image processing in Imaris. The nodes were again defined by the ankyrin-G signal. We found no difference in length and ellipticity between the groups. We repeated this analysis and compared fluorescence intensities of Nav1.6 and ankyrin-G antibodies and again found no statistical differences between both groups. As suggested by the reviewer, we investigated whether ankyrin-G-GFP interferes with the fluorescence intensities of sodium channels (Nav1.6) and ankyrin-G in general. While the GFP signal showed a strong correlation with ankyrin-G, we found no interdependence with the Nav1.6 signal, indicating that the GFP label does not interfere with the general molecular composition of the nodes. We included these new analyses in Supplemental Fig. S1 (p. 7, ll. 159-165).

      R1-05 - In Figure 4 it would also be important to show the distribution of AIS molecules along the AIS, compared to the GFP signal, to establish whether this spatial arrangement of AIS-specific molecules remains intact. For example, Nav1.6 has been described as a more distally-located channel. As the authors point out, the example in A appears to show precisely this feature, but there is no quantification. The same applies to Kv1.2. This would also allow the authors to provide some quantification across multiple AISs, rather than just example images.

      We agree that quantifying and comparing AIS-associated proteins would be informative. We measured the intensity profiles of Nav1.6 and Kv2.1 in neighboring AIS and found no preferences for either end of the AIS, neither of GFP-positive nor GFP-negative AIS. We want to note that not all neurons exhibit a distal localization of Nav1.6 and hypothesize that our samples (neocortex layer II) also fall into this group. We included this new graph as Supplemental Fig. S2D and E in the manuscript (p. 9, ll. 180-184).

      R1-08 - In Figure 4, did the +Cre condition result in all cells showing a GFP-labelled AIS? If not, were the autocorrelations for +Cre-treated neurons done specifically on cells that expressed AnkG-GFP?

      We assume the reviewer refers to the autocorrelation in Figure 6. In this in vitro paradigm, we used virus-induced Cre expression which triggered ankyrin-G-GFP in almost all neurons. The orange boxplots describe the autocorrelation of all ankyrin-G, using a C-terminal antibody as in Fig. 6C, but in neurons that also express ankyrin-G-GFP. The green samples use the GFP signal of ankyrin-G-GFP. We clarified this in the graph and legend of Fig. 6C (pages 14-15).

      R1-09 - As mentioned above in Figure 3, the comparisons in Figure 5 (GFP +ve and -ve neurons) may not be comparing like-for-like neurons. I imagine that many of the CamKII-ve cells in the cortex and hippocampus will be GABAergic interneurons, whereas presumably all of the CamKII+ve neurons will be pyramidal cells. Have the authors made sure that they are comparing across the same cell types? The fact that the number of axo-axonic synapses is similar across the two populations (Fig. 5B) does suggest that similar neuron types (presumably pyramidal cells) were compared in the hippocampus, but some other way of making sure would be a nice addition.

      We agree with the reviewer that the grey and green boxes are not sampled from the same subset of neurons, since only CaMKIIa-positive principal cells will express ankyrin-G-GFP. However, we are confident that the selected AIS belong to pyramidal neurons in both cases. Principal neurons can be well distinguished from interneurons not only by the size, shape, and position of their somas but also by the length and thickness of their AIS. We have performed previous studies on the AIS of interneurons using genetic GAD and parvalbumin markers. Thus, we are confident that the plots in 5A and 5B are sampled from pyramidal neurons, though certainly from genetically different subsets. We now highlight and discuss this limitation in the result section (p. 11, ll. 215-217) and modified the graph in Fig. 5A and 5B for clarity.

      R1-10 - In Figure 6, what was the promoter for the DCre and Cre+ lentivirus? Was this also driven by CamKIIa? In culture it is not always easy to be sure of neuronal identity - did the authors try to bias their analysis to specific neuronal types?

      Indeed, the nature of the promoter was not stated in the legend or method section, which we have now corrected. We used lentiviral FUW-nGFP-Cre and FUW-nGFP-ΔCre constructs to trigger ankyrin-G-GFP expression. Both viruses use the CMV (Cytomegalovirus) promoter, which drives constitutively high levels of gene expression in a wide range of cell types, including neuronal cells. The majority of neurons in dissociated hippocampal cultures are excitatory, especially larger cells with larger AIS, which were preferably used in the analysis. Thus, we cannot claim that AIS nanostructure is intact in cultured interneurons, but this is also true for in vivo conditions in general. Since mice did not show any obvious behavioral phenotypes, we are positive that interneuron functionality is preserved. We also note that the parallel expression of nuclear GFP in the infected neurons was undesired, but did not impact STED imaging due to that technique’s high resolution.

      R1-11 - The ability to visualize the plasticity of the AIS in real-time is an important advance in the field. The loss of proximal Ank-G-GFP signal upon local application of 15 mM KCl is particularly interesting. The fact that neighboring AISs are not affected is surprising - do the authors know how local their KCl application was? Also, although the neighboring AISs are a nice control, the one control lacking here is the local application of normal solution (preferably 15 mM NaCl to account for osmolarity changes) to make sure that this does not affect the properties of the AIS.

      We used KCl puffs in previous, unrelated experiments where we observed that only cells directly in front of the pipette are visibly depolarized by an acute KCl puff (measured by patch-clamp). Due to technical limitations, patched and live imaged neurons were generally in the first 2-5 cell layers of the brain slice, which is well perfused by the constant flow of oxygenated ACSF. KCl is thus quickly diluted and carried away. We have visualized the concentration gradients via puff application by puffing the fluorescent marker fluorescein in the same recording condition. The cone of fluorescence was only visible in front of the pipette and vanished in less than a second post-pressure application. To verify that it is indeed KCl and not the mechanical stress that led to the loss of proximal Ank-G-GFP, one would indeed need an ACSF puff control, which we did for other studies. However, this is not the point we wanted to make. Instead of studying live single-cell AIS plasticity, we want to demonstrate that such investigations are generally possible using the ankyrin-G-GFP line.

      Author response image 1.

      R1-12 - The ability to be able to image AISs in vivo is another important finding. Were the authors able to image noRs as well?

      We believe that this is indeed the case. The panels in Figure 9C contain densely labeled puncta that also remain in position from week 1 to week 2. These are likely nodes of Ranvier, although we do not have the means to verify their presence at this time.

      Reviewer #2:

      R2-01 - Are there indeed different Ank-G-GFP isoforms expressed in this model and could they correspond to classical neuronal Ank-G isoforms?

      This is an important issue that was also raised by reviewer #1. Please consult the respective section R1-01 above for our response.

      R2-02 - What is the rationale of doing Ank-G co-labelling in the case of Ank-G-GFP expression, rather than Pan-Nav staining for example? The co-staining with Nav1.6 antibody, when present, is however convincing.

      We used the co-labeling to emphasize that the ankyrin-G-GFP construct allows reliable investigation of the whole AIS. This is why we wanted to demonstrate that the ankyrin-G-GFP signal overlaps with other AIS markers, as well as all ankyrin-G in general (including potentially remaining native and unlabeled ankyrin-G). This was also a point raised by Reviewer 1, which is why we provided some additional graphs (see response R1-02). However, we agree that staining with another independent marker, such as Nav1.6 or βIV-spectrin was necessary.

      R2-03 - Figure 2D and F: what is the rationale for not using betaIV-Spectrin staining as in the other panels of this figure? Furthermore, could betaIV-Spectrin localization be affected by Ank-G-GFP expression, as betaIV-Spectrin is known to depend on Ank-G for its AIS targeting? Are there any other AIS markers, whose localization is known to be independent of Ank-G, that could have been used?

      We have compiled this figure from a multitude of different experimental setups from different labs to showcase the reliability and robustness of the ankyrin-G-GFP label. This is why the type of staining is not consistent among panels. However, we provide some quantification on the possible impact of ankyrin-G-GFP expression on the βIV-spectrin signal and the composition of the AIS in general. The STED image verifies that the basic subcellular arrangement of the cytoskeleton, including βIV-spectrin, remains intact (Fig. 6). Most AIS markers are at least in some way dependent on ankyrin-G expression, but FGF14 and neurofascin may be the most independent candidates (Fig. 4).

      R2-04 - Did the authors measure the mean AIS length and distance from cell soma in Ank-G-GFP-expressing neurons versus non-expressing ones (considering the same neuronal subtypes) to assess whether these were unaffected by Ank-G-GFP expression?

      This is an important point that was also raised by Reviewer 1 (see also our comments to R1-03). We have included this analysis now in the manuscript as Supplemental Fig. S2A (pp. 8-9, ll. 173-177).

      R2-05 - Figure 5C: the microglial staining and 3D reconstruction could have been clearer.

      We have modified the image and 3D rendering to make Figure 5C clearer to the reader. We hope that our changes suffice.

      R2-06 - Figure 8: do hippocampal neurons retain their electrophysiological properties after 20 DIV? It could strengthen this part of the work to have access to the electrophysiological data mentioned in the text. 

      This is an important issue. We did not perform any electrophysiological recordings in OTCs in the course of this study. Panel E uses acute hippocampal slices like in Fig. 7. We have performed patch-clamp experiments up to DIV 10 for an unrelated study (see graph for action potential firing, Author response image 2). There are not many studies performing electrophysiology in slice cultures due to the formation of a glial scar on top of the slices. However, multielectrode array (MEA) recordings demonstrated that hippocampal organotypic slice cultures remain viable and show electric activity past DIV 20 (though with decreased viability and activity). We kindly refer to the following publications on that matter:

      Author response image 2.

      Sample traces of action potentials triggered by current injections

      Gong W, Senčar J, Bakkum DJ, Jäckel D, Obien ME, Radivojevic M, Hierlemann AR. Multiple Single-Unit Long-Term Tracking on Organotypic Hippocampal Slices Using High-Density Microelectrode Arrays. Front Neurosci. 2016 Nov 22;10:537. doi: 10.3389/fnins.2016.00537. PMID: 27920665; PMCID: PMC5118563.

      Mohajerani MH, Cherubini E. Spontaneous recurrent network activity in organotypic rat hippocampal slices. Eur J Neurosci. 2005 Jul;22(1):107-18. doi: 10.1111/j.1460-9568.2005.04198.x. PMID: 16029200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In the manuscript, the authors explore the mechanism by which Taenia solium larvae may contribute to human epilepsy. This is an extremely important question to address because T. solium is a significant cause of epilepsy and is extremely understudied. Advances in determining how T. solium may contribute to epilepsy could have significant impact on this form of epilepsy. Excitingly, the authors convincingly show that Taenia larvae contain and release glutamate sufficient to depolarize neurons and induce recurrent excitation reminiscent of seizures. They use a combination of cutting-edge tools including electrophysiology, calcium and glutamate imaging, and biochemical approaches to demonstrate this important advance. They also show that this occurs in neurons from both mice and humans. This is relevant for pathophysiology of chronic epilepsy development. This study does not rule out other aspects of T. solium that may also contribute to epilepsy, including immunological aspects, but demonstrates a clear potential role for glutamate.

      Strengths:

      - The authors examine not only T. solium homogenate, but also excretory/secretory products which suggests glutamate may play a role in multiple aspects of disease progression.

      - The authors confirm that the human relevant pathogen also causes neuronal depolarization in human brain tissue

      - There is very high clinical relevance. Preventing epileptogenesis/seizures possibly with Glu-R antagonists or by more actively removing glutamate as a second possible treatment approach in addition to/replacing post-infection immune response.

      - Effects are consistent across multiple species (rat, mouse, human) and methodological assays (GluSnFR AND current clamp recordings AND Ca imaging)

      - High K content (comparable levels to high-K seizure models) of larvae could have also caused depolarization. Adequate experiments to exclude K and other suspected larvae contents (i.e. Substance P).

      Weaknesses:

      - Acute study is limited to studying depolarization in slices and it is unclear what is necessary/sufficient for in vivo seizure generation or epileptogenesis for chronic epilepsy.

      - There is likely a significant role of the immune system that is not explored here. This issue is adequately addressed in the discussion, however, and the glutamate data is considered in this context.

      Discuss impact:

      - Interfering with peri-larval glutamate signaling may hold promise to prevent ictogenesis and chronic epileptogenesis as this is a very understudied cause of epilepsy with unknown mechanistic etiology.

      Additional context for interpreting significance:

      - High medical need as most common adult onset epilepsy in many parts of the world

      We thank Reviewer 1 for their positive and thorough assessment of our manuscript. We have elected to respond to and address the following aspects from their “Recommendations For The Authors” below:

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments/analysis:

      -   Fig 4a-c: Larva on a slice and not next to it? Negative results maybe because its E/S products are just washed away (assuming submerged recording chamber/conditions)? Experiments and negative results described here do not seem conclusive. Should be discussed at least?

      We agree with the reviewer and have added the following sentence to the relevant section of the Results: ‘Our submerged recording setup might have led to swift diffusion or washout of released glutamate, possibly explaining the lack of observable changes.’

      Writing & presentation:

      - Data is not always reported consistently in text and figures, examples:

      - Results in text are reported varyingly without explanation:

      - Mean and/or median? SEM or SD and/or IQR? Stat info included in text or not? i.e. lines 130/131 vs. 160/161

      Results and data are now presented in a more uniform fashion. We report medians and IQRs, sample size, statistical test result, statistical test used in that order.
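
      The reporting order described above (median and IQR, sample size, test result, test used) can be sketched as a small helper. This is a hedged illustration rather than the authors' code: the function names, the output format, and the choice of the "inclusive" quartile method are assumptions, and different quartile conventions give slightly different IQRs.

```python
from statistics import median, quantiles


def summarize(values):
    """Return (median, first quartile, third quartile, sample size)."""
    # Assumption: quartiles use the 'inclusive' method of statistics.quantiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3, len(values)


def report(values, p_value, test_name):
    """Format a result in a fixed order: median (IQR), N, p-value, test used."""
    med, q1, q3, n = summarize(values)
    return (
        f"median {med:.2f} (IQR {q1:.2f}-{q3:.2f}), "
        f"N = {n}, p = {p_value:.4f}, {test_name}"
    )


# Hypothetical measurements and p-value, for illustration only.
line = report([1.0, 2.0, 3.0, 4.0, 5.0], 0.0312, "Mann-Whitney test")
```

      Routing every reported value through one formatter is one simple way to keep text and figure legends consistent.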

      - Larval release data interrupts reading flow, lines 246-252 double up results presented in Fig 5F.

      This section has now been significantly abbreviated and reads as follows: ‘T. crassiceps larvae released a relatively constant median daily amount of glutamate, ranging from 41.59 – 60.15 ug/20 larvae, which showed no statistically significant difference across days one to six. Similarly, T. crassiceps larvae released a relatively constant median daily amount of aspartate, ranging from 9.431 – 14.18 ug/20 larvae, which showed no statistically significant difference across days one to six.’

      - Results in figures are reported in different styles:

      Results have now been made uniform, reporting medians and IQRs, followed by sample size, statistical test result, statistical test used, and figure number, in that order.

      - Fig 6: E/S glu concentration seems to be significantly higher in solium vs crassiceps (about 6fold higher in solium). Should be discussed at least.

      Given the small sample size from T. solium (see response below), we do not draw attention to this difference and instead simply make the point that T. solium larvae contain and release glutamate.

      - In this context - N=1 may be sufficient for proof of principle (release) but seems too small of a cohort to describe non-constant release of glu over days (Fig 6D). Is initial release on day 1, no release and recovery in the following days reproducible? Is very high glu content of E/S content (15-fold higher in comparison to solium homogenate AND 6-fold higher in comparison to crassiceps homogenate and E/S content). Not sure if Fig 6D is adding relevant information, especially since it is based on n = 1

      We agree that N=1 is only sufficient for proof of principle. However, it is worth noting that the measurements still reflect the cumulative release from 20 larvae. Nonetheless, the statement in the text has been simplified to say: ‘These results demonstrate that T. solium larvae continually release glutamate and aspartate into their immediate surroundings.’ This focuses on the point that the larvae release glutamate and aspartate continuously, and acknowledges that we cannot draw conclusions about the variability over days.

      Methods:

      - Human slices, mention cortex - what part, patient data would be interesting. I.e. etiology of epilepsy, epilepsy duration 

      In the Materials and Methods section “Brain slice preparation” we have now added a table with the requested information.

      - For Taenia solium: How were they acquired and used in these experiments?

      In the Materials and Methods section “Taenia maintenance and preparation of whole cyst homogenates and E/S products” we describe how Taenia solium larvae were acquired and used.

      - Was access resistance monitored? Add exclusion criteria for patch experiments

      Figure supplement tables containing the basic properties for each cell recording have been added for each figure and the following statements were added to the electrophysiology section of the Methods: ‘Basic properties of each cell were recorded (supplementary files 1, 2, 3, 4, 6).’ and ‘Cells were excluded from analyses if the Ra was greater than 80 Ω or if the resting membrane potential was above –40 mV.’  
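
      The exclusion rule quoted above can be expressed as a simple filter. The sketch below is illustrative only: the `Recording` type, its field names, and the example cells are hypothetical, and the access resistance threshold is applied in whatever unit the Methods use.

```python
from dataclasses import dataclass

RA_LIMIT = 80.0          # cells with Ra greater than this are excluded
VREST_LIMIT_MV = -40.0   # cells resting above this potential are excluded


@dataclass
class Recording:
    cell_id: str
    access_resistance: float      # Ra, in the unit used in the Methods
    resting_potential_mv: float   # resting membrane potential in mV


def passes_criteria(rec):
    """Keep a cell only if neither exclusion criterion applies."""
    return (
        rec.access_resistance <= RA_LIMIT
        and rec.resting_potential_mv <= VREST_LIMIT_MV
    )


def filter_recordings(recordings):
    """Return the recordings that are retained for analysis."""
    return [rec for rec in recordings if passes_criteria(rec)]


# Hypothetical cells, for illustration only.
cells = [
    Recording("cell_a", 20.0, -65.0),   # retained
    Recording("cell_b", 95.0, -60.0),   # excluded: Ra too high
    Recording("cell_c", 30.0, -35.0),   # excluded: depolarized resting potential
]
kept = filter_recordings(cells)
```

      Applying the criteria in code before any analysis makes the exclusion step reproducible and easy to report alongside the per-cell supplementary tables.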

      - Cannot see any reference to mouse slices in methods? Also, mouse organotypic cultures (for AAV?)? Or only acute slices from mice and organotypic hip cultures from rats? Seems to have been mouse and rat organotypic cultures? But not clear without further clarification in methods.

      We have now added the following clarification to the methods: ‘For experiments using calcium and glutamate imaging mouse hippocampal organotypic brain slices were used. For all other experiments rat hippocampal organotypic brain slices were used. A subset of experiments used acute human cortical brain slices and are specified.’

      - How long after the wash-in phase was the wash-out phase data collected?

      For wash-in recordings drugs were washed in for 8 mins before recordings were made. Drugs were washed out for at least 8 mins before wash-out recordings were made. This information has been added to the Materials and Methods section.

      - In general, the M&M section seems to have been written hastily - author's internal remarks "supplier?" are still present.

      The M&M section has been thoroughly proofread for errors and internal remarks removed or corrected.

      - A little more information on the clinical subjects would be appreciated. I.e. duration of epilepsy? Localization? What cortex? Usual temporal lobe or other regions?

      We have now added a table with this information to the Materials and Methods section “Brain slice preparation”.

      Minor corrections text/figures:

      - e.g. 3D,F,H,J show individual data points, that's great, but maybe add mean/median marker (as results are reported like this in text), like in fig 4G,I and others

      Figures 3D,F,H & J have been revised to include median and IQR.

      - Only one patient mentioned in acknowledgements, but 2 in methods and text

      We apologize for this oversight and now acknowledge both patients in the acknowledgements.

      - Fig 1 B-F individual puffs are described as increasing - consistent with cellular effects (1st puff depolarizes, 2nd puff elicits 1 AP, 3rd puff elicits AP burst)  However, dilution ratio of homogenate or puff concentrations are not mentioned (or potentially longer than 20 ms puffs for 2nd and 3rd stimulus?) in text or figures. Seems to be enough space to indicate in figure as well (i.e. multiple or thicker arrows for subsequent puffs or label with homogenate dilution/concentration in figure).

      We state in the results section associated with Fig. 1 that increasing the amount of homogenate delivered was achieved by increasing the pressure applied to the ejection system. We now include this information in the figure legend.

      - Figure legend describes 30 ms puff for Ca imaging whereas ephys data (from text) is 20 ms puff. Was Ca imaging performed in acute mouse hippocampal slices (as figure text suggests) or were those organotypic hippocampal cultures from mice?

      Ca2+  imaging was performed in mouse hippocampal organotypic brain slice cultures. The figure text for Fig. 1 E) states “widefield fluorescence image of neurons in the dentate gyrus of a mouse hippocampal organotypic brain slice culture expressing the genetically encoded Ca2+ reporter GCAMP6s...”

      - 11.4 mM K is reported for homogenate in text only. How variable is that? How many n? No SD reported in text and no individual data points reported since this experiment is not represented as a figure.

      This has been clarified in the text by adding (N = 1, homogenate prepared from >100 larvae).

      - Same results (effect of 11.4 mM K on Vm) described twice in one paragraph, compare lines 126-131 with 131-136.

      The repetition has been removed.

      - Line 182 - example for consistency: decide IQR or SD/SEM

      To improve consistency, we have changed to median and IQR throughout.
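
      For readers less familiar with these summary statistics, a minimal sketch of computing a median and IQR follows. The values are made up for illustration and the function name `median_iqr` is hypothetical; quartile conventions differ between software packages, and the "inclusive" method used here is only one of them, not necessarily the one used in the manuscript.

```python
import statistics


def median_iqr(values):
    """Return (median, Q1, Q3) using the 'inclusive' quartile convention."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q1, q3


# Made-up depolarization amplitudes (mV), for illustration only.
example = [2.1, 3.4, 3.9, 4.8, 7.5]
med, q1, q3 = median_iqr(example)
print(f"median {med} mV (IQR {q1} to {q3} mV)")
```

      Reporting the quartile method alongside the values makes the summary reproducible across analysis tools.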

      - Neuronal recordings are reported as hippocampal pyramidal neurons (i.e. line 222) but some recordings were made from dentate granule cells - please clarify which neurons were recorded in ephys, ca imaging, GluSnFr imaging

      For each experiment we describe which type of neurons were recorded from. For rodent recordings these were hippocampal pyramidal neurons except in the case of the Ca2+ imaging example where the widefield recording was over the dentate gyrus subfield.

      - Line 309: "should" seems to be an extra word

      We have removed the word ‘should’ and made the sentence shorter and clearer. It now reads: ‘Given our finding that cestode larvae contain and release significant quantities of glutamate, it is possible that homeostatic mechanisms for taking up and metabolizing glutamate fail to compensate for larval-derived glutamate in the extracellular space. Therefore, similar glutamate-dependent excitotoxic and epileptogenic processes that occur in stroke, traumatic brain injury and CNS tumors are likely to also occur in NCC.’

      Reviewer #2 (Public Review):

      Since neurocysticercosis is associated with epilepsy, the authors wish to establish how cestode larvae affect neurons. The underlying hypothesis is that the larvae may directly excite neurons and thus favor seizure genesis.

      To test this hypothesis, the authors collected biological materials from larvae (from either homogenates or excretory/secretory products), and applied them to hippocampal neurons (rats and mice) and human cortical neurons.

      This constitutes a major strength of the paper, providing a direct reading of larvae's biological effects. Another strength is the combination of methods, including patch clamp, Ca, and glutamate imaging.

      We thank Reviewer 2 for their review of the strengths and weaknesses of our manuscript. We respond to the identified weaknesses below.

      There are some weaknesses:

      (1) The main one relates to the statement: "Together, these results indicate that T. crassiceps larvae homogenate results not just in a transient depolarization of cells in the immediate vicinity of application, but can also trigger a wave of excitation that propagates through the brain slice in both space and time. This demonstrates that T. crassiceps homogenate can initiate seizure-like activity under suitable conditions."

      The only "evidence" of propagation is an image at two time points. It is one experiment, and there is no quantification. Either increase n's and perform a quantification, or remove such a statement.

      We acknowledge that the data is from one experiment, with the intention of demonstrating that it is plausible for intense depolarization of a subset of neurons to result in the initiation and propagation of seizure-like activity to nearby neurons under suitable conditions. However, we agree that it is prudent to remove this statement and have done so.

      Likewise, there is no evidence of seizure genesis. A single cell recording is shown. The presence of a seizure-like event should be evaluated with field recordings.

      In this experiment the Ca2+ imaging demonstrates activity spreading from the site of the restricted homogenate puff to all surrounding neurons. Furthermore, the whole-cell recording is typical of a slice-wide seizure-like event.

      (2) Control puff experiments are lacking for Fig 1. Would puffing ACSF also produce a depolarization, and even firing, as suggested in Fig. 2D? This is needed for at least one species.

      We agree and have added this data for the rat and mouse neuron in a new Figure 1-figure supplement 1.

      (3) What is the rationale to use a Cs-based solution? Even in the presence of TTX and with blocking K channels, the depolarization may be sufficient to activate Ca channels (LVGs), which would further contribute to the depolarization. Why not perform voltage clamp recordings to directly measure the current?

      The intention of the Cs-based solution was to block K+ channels and reduce the effect of moderately raised K+ in the homogenate to isolate the contribution of other causative agents of depolarization (i.e. glutamate / aspartate). We agree that performing voltage clamp recordings would have been useful for directly recording the currents responsible for depolarization. 

      (4) Why did you use organotypic slices? Since you wish to model adult epilepsy, it would have been more relevant to use fresh slices from adult rats/mice. At least, discuss the caveat of using a network still in development in vitro.

      Recordings were performed 6–14 days post culture, which is equivalent to postnatal days (P) 12 to 22. Previous work has shown that neurons in the organotypic hippocampal brain slice are relatively mature (Gähwiler et al., 1997). For example, they possess mature Cl- homeostasis mechanisms at this point, as evidenced by their hyperpolarizing EGABA (Raimondo et al., 2012).

      (5) Please include both the number of slices and number of cells recorded in each condition. This is the standard (the number of cells is not enough).

      This has now been added to all relevant sections of the results text.  

      (6) Please provide a table with the basic properties of cells (Rin, Rs, etc.). This is standard to assess the quality of the recordings.

      Tables containing the basic properties for each cell recording have been created for each figure (as Figure supplements) and the following statement was added to the electrophysiology section of the Methods: ‘Basic properties of each cell were recorded (see Figure supplements).’

      (7) Please provide a table on patient's profile. This is standard when using human material. Were these TLE cases (and "control" cortex) or epileptogenic cortex?

      We have now added a basic table of the patients’ profiles to the Materials and Methods section.

      Globally, the authors achieved their aims. They show convincingly that larvae material can depolarize neurons, with glutamate (and aspartate) as the most likely candidates.

      This is important not only because it provides mechanistic insight but also potential therapeutic targets. The result is impactful, as the authors use quasi-naturalistic conditions, to assess what might happen in the human brain. The experimental design is appropriate to address the question. It can be replicated by any interested person.

      We thank Reviewer 2 for their enthusiastic and constructive assessment of our manuscript. We have elected to respond to and address the following aspects from their “Recommendations For The Authors” below:

      Reviewer #2 (Recommendations For The Authors):

      lines 132 and following are a repetition of those above

      These have been removed.

      line 151 Fig "2" missing

      This has been added.

      187, 190 should be E, F not C, D

      This has been changed in the text.  

      481, 482 supplier?

      This has been corrected and the correct suppliers described.

      Reviewer #3 (Public Review):

      This paper has high significance because it addresses a prevalent parasitic infection of the nervous system, Neurocysticercosis (NCC). The infection is caused by larvae of the parasitic cestode Taenia solium. It is a leading cause of epilepsy in adults worldwide.

      To address the effects of cestode larvae, homogenates and excretory/secretory products of larvae were added to organotypic brain slice cultures of rodents or layer 2/3 of human cortical brain slices from patients with refractory epilepsy.

      We thank Reviewer 3 for their helpful comments and suggestions for improvement which we address below.

      A self-made pressure ejection system was used to puff larvae homogenate (20 ms puff) onto the soma of patched neurons. The mechanical force could have caused depolarization, so a vehicle control is critical. On line 150 they appear to have used saline in this regard, and clarification would be good. Were the controls here (and aCSF elsewhere) done with the low Mg2+o aCSF like the larvae homogenates?

      We agree and have added examples where aCSF alone was pressure ejected onto the same rat and mouse neurons in a new Figure 1-figure supplement 1. In Figure 1, the aCSF used for pressure ejection was the same as that used to bathe the slices. In Figure 2D-G, either PBS (in which the larval homogenates were prepared) or growth medium (in which the larval E/S products were contained) was used as the comparative control.

      They found that neurons depolarized after larvae homogenate exposure and the effect was mediated by glutamate but not nicotinic receptors for acetylcholine (nAChRs), acid-sensing channels or substance P. To address nAChRs, they used 10 μM mecamylamine, and for ASICs 2 mM amiloride, which seems like a high concentration. Could the concentrations be confirmed for their selectivity?

      We did not independently verify the selectivity of the antagonist concentrations used in our study. However, the persistence of depolarizations despite the use of high concentrations of mecamylamine (10 μM) and amiloride (2 mM) provides strong evidence that neither nAChRs nor ASICs are primarily responsible for mediating these responses. The high concentrations used, while potentially raising concerns about specificity, actually strengthen our conclusion that these receptor types are not involved in the observed effect.

      Glutamate receptor antagonists, used in combination, were 10 μM CNQX, 50 μM D-AP5, and 2 mM kynurenic acid. These concentrations are twice what most use. Please discuss.

      We intentionally used higher-than-typical concentrations of glutamate receptor antagonists in our experimental design. Our rationale for this approach was to ensure maximal blockade of glutamate receptors, thereby minimizing the possibility of residual receptor activity confounding our results.

      Also, it would be very interesting to know if the glutamate receptor is AMPA, kainic acid, or NMDA. Were metabotropic antagonists ever tested? That would be logical because CNQX/D-AP5/kynurenic acid did not block all of the depolarization.

      We appreciate the reviewer's interest in the specific glutamate receptor subtypes involved in our study. Our research primarily focused on ionotropic glutamate receptors as a group, without differentiating the individual contributions of AMPA, kainate, and NMDA receptors. This approach, while broad, allowed us to establish the involvement of glutamatergic signalling in the observed effects. We acknowledge that we did not investigate metabotropic glutamate receptors in this study. Importantly, we demonstrate later in our manuscript that the larval products contain both glutamate and aspartate. Therefore, the precise nature of the glutamate-dependent depolarization observed using a particular experimental preparation would depend on the specific types of neurons exposed to the homogenate and the expression profile of different glutamate receptor subtypes on these neurons.

      They also showed the elevated K+ in the homogenate (~11 mM) could not account for the depolarization. However, the experiment with K+ was not done in a low Mg2+o buffer (Or was it -please clarify). 

      The experiment with 11.39 mM K+, as well as the experiment with T. crass. homogenate using a cesium-based internal solution and added TTX, were both done in standard aCSF containing 2 mM Mg2+.

      They also confirmed that only small molecules led to the depolarization after filtering out very large molecules. That supports the conclusion that glutamate - which is quite small - could be responsible. It is logical to test substance P because the Intro points out prior work links the larvae and seizures by inflammation and implicates substance P. However, why focus on nAChRs and ASIC?

      These were chosen as they are ionotropic receptors which mediate depolarization and hence could conceivably be responsible for the homogenate-induced depolarization we observed.

      The depolarizations caused seizure-like events in slices. The slices were exposed to a proconvulsant buffer though - low Mg2+o. This buffer can cause spontaneous seizure-like events so it is important to know what the buffer did alone.

      We agree that a low Mg2+ buffer solution alone can elicit seizure-like events in organotypic slices. However, the timing of the onset of the seizure-like event in the example presented in Figure 1 strongly suggests that it was triggered by the T. crass. homogenate puff. Nonetheless, on the suggestion of the other reviewers we have reduced emphasis on our experimental evidence for the ability of T. crass. homogenate to elicit seizure-like events.

      They suggest the effects could underlie seizure generation in NCC. However, there is only one event that is seizure-like in the paper and it is just an inset. Were others similar? How frequent were they? How long?

      Please see the response above as well as our response to Reviewer 1 who raised a similar concern.

      Using Glutamate-sensing fluorescent reporters they found the larvae contain glutamate and can release it, a strength of the paper.

      Fig. 4. Could an inset be added to show the effects are very fast? That would support an effect of glutamate.

      We have not added an inset. However, given the scale bar (500 ms) for the trace provided, the response is very fast.  

      Why is aspartate relatively weak and glutamate relatively effective as an agonist?

      Glutamate generally has a higher affinity for glutamate receptors compared to aspartate. This is particularly true for AMPA and kainate receptors, where glutamate is the primary endogenous agonist. Similarly, iGluSnFR has a higher sensitivity for glutamate than for aspartate (Marvin et al., 2013).

      Could some of the variability in Fig 4G be due to choice of different cell types? That would be consistent with Fig 5B where only a fraction of cells in the culture showed a response to the larvae nearby. 

      Whilst differences in cell types could contribute to the variability in Fig 4G, all the responses were recorded from hippocampal pyramidal neurons and hence it is more likely that the variability is a function of other sources of variation, including differences in iGluSnFR expression, depth of the cell imaged, and the proximity of the puffer pipette. In Fig. 5B we think the lack of response may be due to the fact that any glutamate released by the live larvae was not able to reach the iGluSnFR neurons at sufficient concentrations due to the nature of our submerged recording setup. We have added the following sentence to the results: ‘Our submerged recording setup might have led to swift diffusion or washout of released glutamate, possibly explaining the lack of observable changes.’

      On what basis was the ROI drawn in Fig. 5B.

      The ROI drawn in Fig. 5B was selected to include all iGluSnFR-expressing neurons in the brain slice that were captured in the field of view.

      Also in 5B, I don't see anything in the transmitted image. What should be seen exactly?

      We agree that it is difficult to resolve much in the transmitted image. However, both the brain slice on the left and a T. crass. larva on the right are visible and outlined with a green and an orange dashed line, respectively.

      Human brain slices were from temporal cortex of patients with refractory epilepsy. Was the temporal cortex devoid of pathology and EEG abnormalities? This area may be quite involved in the epilepsy because refractory epilepsy that goes to surgery is often temporal lobe epilepsy. Please discuss the limitations of studying the temporal cortex of humans with epilepsy since it may be more susceptible to depolarizations of many kinds, not just larvae.

      We acknowledge the important limitations of using temporal cortex tissue from patients with refractory epilepsy. While we aimed to use visually normal tissue, we recognize that the tissue may have underlying pathology or functional abnormalities not visible to the naked eye. It may also be more susceptible to induced depolarizations due to epilepsy-related changes in neuronal excitability. Despite these limitations, we believe our human tissue recordings still provide valuable evidence that the larval homogenates can induce depolarization in human as well as rodent neurons.

      Please discuss the limitations of the cultures - they are from very young animals and cultured for 6-14 days.

      We acknowledge the potential limitations of our experimental model using organotypic hippocampal slice cultures from young animals. The use of relatively immature tissue may not fully represent the adult nervous system due to developmental differences in receptor expression, synaptic connections, and network properties. The 6-14 day culture period, while allowing some maturation, may induce changes that differ from the in vivo environment, including alterations in cellular physiology and network reorganization. Despite these limitations, this model provides a valuable balance between preserved local circuitry and experimental accessibility. Future studies comparing results with acute adult slices and in vivo models would be beneficial to validate and extend our findings.

      References:

      Gähwiler, B.H. et al. (1997) ‘Organotypic slice cultures: a technique has come of age.’, Trends in Neurosciences, 20(10), pp. 471–7.

      Marvin, J.S. et al. (2013) ‘An optimized fluorescent probe for visualizing glutamate neurotransmission.’, Nature Methods, 10(2), pp. 162–70. Available at: https://doi.org/10.1038/nmeth.2333.

      Raimondo, J.V. et al. (2012) ‘Optogenetic silencing strategies differ in their effects on inhibitory synaptic transmission.’, Nature Neuroscience, 15(8), pp. 1102–4. Available at: https://doi.org/10.1038/nn.3143.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors describe a method to probe both the proteins associated with genomic elements in cells, as well as 3D contacts between sites in chromatin. The approach is interesting and promising, and it is great to see a proximity labeling method like this that can map both proteins and 3D contacts. It utilizes DNA oligomers, which will likely make it a widely adopted method. However, the manuscript over-interprets its successes, which is likely due to the limited appropriate controls and the lack of any validation experiments. I think the study requires better proteomic controls, and some validation experiments of the "new" proteins and 3D contacts described. In addition, toning down the claims made in the paper would assist those looking to implement one of the various available proximity labeling methods and would make this manuscript more reliable to non-experts.

      Strengths:

      (1) The mapping of 3D contacts for 20 kb regions using proximity labeling is beautiful.

      (2) The use of in situ hybridization will probably improve background and specificity.

      (3) The use of fixed cells should prove enabling and is a strong alternative to similar, living cell methods.

      Weaknesses:

      (1) A major drawback to the experimental approach of this study is the "multiplexed comparisons". Using the mtDNA as a comparator is not a great comparison - there is no reason to think the telomeres/centromeres would look like mtDNA as a whole. The mito proteome is much less complex. It is going to provide a large number of false positives. The centromere/telomere comparison is ok, if one is interested in what's different between those two repetitive elements. But the more realistic use case of this method would be "what is at a specific genomic element"? A purely nuclear-localized control would be needed for that. Or a genomic element that has nothing interesting at it (I do not know of one). You can see this in the label-free work: non-specific, nuclear GO terms are enriched, likely due to the random plus non-random labeling in the nucleus. What would a Telo vs general nucleus GSEA look like? (GSEA should be used for quantitative data, not GO). That would provide some specificity. Figures 2G and S4A are encouraging, but a) these proteins are largely sequestered in their respective locations, and b) no validation by an orthogonal method like ChIP or Cut and Run/Tag is used.

      You can also see this in the enormous number of "enriched" proteins in the supplemental volcano plots. The hypothesis-supporting ones are labeled, but do the authors really believe all of those proteins are specific to the loci being looked at? Maybe compared to mitochondria, but it's hard to believe there are not a lot of false positives in those blue clouds. I believe the authors are more seeing mito vs nucleus + Telo than the stated comparison. For example, if you have no labeling in the nucleus in the control (Figures 1C and 2C) you cannot separate background labeling from specific labeling. Same with mito vs. nuc+Telo. It is not the proper control to say what is specifically at the Telo.

      I would like to see a Telo vs nuclear control and a Centromere vs nuc control. One could then subtract the background from both experiments, then contrast Telo vs Cent for a proper, rigorous comparison. However, I realize that is a lot of work, so rewriting the manuscript to better and more accurately reflect what was accomplished here, and its limitations, would suffice.

      (2) A second major drawback is the lack of validation experiments. References to literature are helpful but do not make up for the lack of validation of a new method claiming new protein-DNA or DNA-DNA interactions. At least a handful of newly described proximal proteins need to be validated by an orthogonal method, like ChIP qPCR, other genomic methods, or gel shifts if they are likely to directly bind DNA. It is ok to have false positives in a challenging assay like this. But it needs to be well and clearly estimated and communicated.

      (3) The mapping of 3D contacts for 20 kb regions is beautiful. Some added discussion on this method's benefits over HiC-variants would be welcomed.

      (4) The study claims this method circumvents the need for transfectable cells. However, the authors go on to describe how they needed tons of cells, now in solution, to get it to work. The intro should be more in line with what was actually accomplished.

      (5) Comments like "Compared to other repetitive elements in the human genome...." appear to circumvent the fact that this method is still (apparently) largely limited to repetitive elements. Other than Glopro, which did analyze non-repetitive promoter elements, most comparable methods looked at telomeres. So, this isn't quite the advancement you are implying. Plus, the overlap with telomeric proteins and other studies should be addressed. However, that will be challenging due to the controls used here, discussed above.

      We thank the Reviewer for their careful reading of the manuscript and constructive suggestions. We plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

      Reviewer #2 (Public review):

      Summary

      Liu and MacGann et al. introduce the method DNA O-MAP that uses oligo-based ISH probes to recruit horseradish peroxidase for targeted proximity biotinylation at specific DNA loci. The method's specificity was tested by profiling the proteomic composition at repetitive DNA loci such as telomeres and pericentromeric alpha satellite repeats. In addition, the authors provide proof-of-principle for the capture and mapping of contact frequencies between individual DNA loop anchors.

      Strengths

      Identifying locus-specific proteomes still represents a major technical challenge and remains an outstanding issue (1). Theoretically, this method could benefit from the specificity of ISH probes and be applied to identify proteomes at non-repetitive DNA loci. This method also requires significantly fewer cells than other ISH- or dCas9-based locus-enrichment methods. Another potential advantage to be tested is the lack of cell line engineering that allows its application to primary cell lines or tissue.

      Weaknesses

      The authors indicate that DNA O-MAP is superior to other methods for identifying locus-specific proteomes. Still, no proof exists that this method could uncover proteomes at non-repetitive DNA loci. Also, there is very little validation of novel factors to confirm the superiority of the technique regarding specificity.

      The authors first tested their method's specificity at repetitive telomeric regions, and like other approaches, expected low-abundant telomere-specific proteins were absent (for example, all subunits of the telomerase holoenzyme complex). Detecting known proteins while identifying noncanonical and unexpected protein factors with high confidence could indicate that DNA O-MAP does not fully capture biologically crucial proteins due to insufficient enrichment of locus-specific factors. The newly identified proteins in Figure 1E might still be relevant, but independent validation is missing entirely. In my opinion, the current data cannot be interpreted as successfully describing local protein composition.

      Finally, the authors could have discussed the limitations of DNA O-MAP and made a fair comparison to other existing methods (2-5). Unlike targeted proximity biotinylation methods, DNA O-MAP requires paraformaldehyde crosslinking, which has several disadvantages. For instance, transient protein-protein interactions may not be efficiently retained on crosslinked chromatin. Similarly, some proteins may not be crosslinked by formaldehyde and thus will be lost during preparation (6).

      (1) Gauchier M, van Mierlo G, Vermeulen M, Dejardin J. Purification and enrichment of specific chromatin loci. Nat Methods. 2020;17(4):380-9.

      (2) Dejardin J, Kingston RE. Purification of proteins associated with specific genomic Loci. Cell. 2009;136(1):175-86.

      (3) Liu X, Zhang Y, Chen Y, Li M, Zhou F, Li K, et al. In Situ Capture of Chromatin Interactions by Biotinylated dCas9. Cell. 2017;170(5):1028-43 e19.

      (4) Villasenor R, Pfaendler R, Ambrosi C, Butz S, Giuliani S, Bryan E, et al. ChromID identifies the protein interactome at chromatin marks. Nat Biotechnol. 2020;38(6):728-36.

      (5) Santos-Barriopedro I, van Mierlo G, Vermeulen M. Off-the-shelf proximity biotinylation for interaction proteomics. Nat Commun. 2021;12(1):5015.

      (6) Schmiedeberg L, Skene P, Deaton A, Bird A. A temporal threshold for formaldehyde crosslinking and fixation. PLoS One. 2009;4(2):e4636.

      We thank the Reviewer for their constructive feedback on our work. As noted above, we plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

      Reviewer #3 (Public review):

      Significance of the Findings:

      The study by Liu et al. presents a novel method, DNA-O-MAP, which combines locus-specific hybridisation with proximity biotinylation to isolate specific genomic regions and their associated proteins. The potential significance of this approach lies in its purported ability to target genomic loci with heightened specificity by enabling extensive washing prior to the biotinylation reaction, theoretically improving the signal-to-noise ratio when compared with other methods such as dCas9-based techniques. Should the method prove successful, it could represent a notable advancement in the field of chromatin biology, particularly in establishing the proteomes of individual chromatin regions - an extremely challenging objective that has not yet been comprehensively addressed by existing methodologies.

      Strength of the Evidence:

      The evidence presented by the authors is somewhat mixed, and the robustness of the findings appears to be preliminary at this stage. While certain data indicate that DNA-O-MAP may function effectively for repetitive DNA regions, a number of the claims made in the manuscript are either unsupported or require further substantiation. There are significant concerns about the resolution of the method, with substantial biotinylation signals extending well beyond the intended target regions (megabases around the target), suggesting a lack of specificity and poor resolution, particularly for smaller loci. Furthermore, comparisons with previous techniques are unfounded since the authors have not provided direct comparisons with the same mass spectrometry (MS) equipment and protocols. Additionally, although the authors assert an advantage in multiplexing, this claim appears overstated, as previous methods could achieve similar outcomes through TMT multiplexing. Therefore, while the method has potential, the evidence requires more rigorous support, comprehensive benchmarking, and further experimental validation to demonstrate the claimed improvements in specificity and practical applicability.

      We thank the Reviewer for providing detailed critiques of our manuscript. As noted above, we plan to substantially revise the framing and presentation of the manuscript to address the concerns raised by all three reviewers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The crystal structure of the Sld3CBD-Cdc45 complex presented by Li et al. is a novel contribution that significantly advances our understanding of CMG formation during the rate-limiting step of DNA replication initiation. This structure provides insights into the intermediate steps of CMG formation. The study builds upon previously known structures of Sld3 and Cdc45 and offers new perspectives into how Cdc45 is loaded onto MCM DH through Sld3-Sld7. The most notable finding is the structural difference in Sld3CBD when bound to Cdc45, particularly the arrangement of the α8-helix, which is essential for Cdc45 binding and may also pertain to its metazoan counterpart, Treslin. Additionally, the conformational shift in the DHHA1 domain of Cdc45 suggests a possible mechanism for its binding to MCM2NTD.

      Strengths:

      The manuscript is generally well-written, with a precise structural analysis and a solid methodological section that will significantly advance future studies in the field. The predictions based on structural alignments are intriguing and provide a new direction for exploring CMG formation, potentially shaping the future of DNA replication research.

      Weaknesses:

      The main weakness of the manuscript lies in the lack of experimental validation for the proposed Sld3-Sld7-Cdc45 model. Specifically, the claim that Sld3 binding to Cdc45-MCM does not inhibit GINS binding, a finding that contradicts previous research, is not sufficiently substantiated with experimental evidence. To strengthen their model, the authors must provide additional experimental data to support this mechanism. Also, the authors have not compared the recently published Cryo-EM structures of the metazoan CMG helicases with their predicted models to see if Sld3/Treslin does not cause any clash with the GINS when bound to the CMG. Still, the work holds great potential in its current form but requires further experiments to confirm the authors' conclusions.

      We appreciate the reviewers’ careful reading and the comments.

      The structure of Sld3CBD-Cdc45 showed that the Sld3CBD-binding site on Cdc45 is distinct from the regions of Cdc45 that bind GINS and MCM, indicating that Sld3CBD, MCM, and GINS bind to separate sites of Cdc45 on the CMG complex. The SCMG-DNA model confirmed this binding arrangement but did not show whether the binding of Sld3 to Cdc45 affects the recruitment of GINS (by GINS-Dpb11-Sld2) for CMG formation. We will modify our manuscript and discuss this point. We will also compare the recently published cryo-EM structures of the metazoan CMG helicases with our predicted models to confirm our conclusions, and we will try to conduct the experiments as suggested.

      Reviewer #2 (Public review):

      Summary

      The manuscript presents valuable findings, particularly in the crystal structure of the Sld3CBD-Cdc45 interaction and the identification of additional sequences involved in their binding. The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is novel, and the results provide insights into potential conformational changes that occur upon interaction. However, the work remains incomplete as several main claims are only partially supported by experimental data, particularly the proposed model for Sld3 interaction with GINS on the CMG. Additionally, the single-stranded DNA binding data from different species do not convincingly advance the manuscript's central arguments.

      Strengths

      (1) The Sld3CBD-Cdc45 structure is a novel contribution, revealing critical residues involved in the interaction.

      (2) The model structures generated from the crystal data are well presented and provide valuable insights into the interaction sequences between Sld3 and Cdc45.

      (3) The experiments testing the requirements for interaction sequences are thorough and conducted well, with clear figures supporting the conclusions.

      (4) The conformational changes observed in Sld3 and Cdc45 upon binding are interesting and enhance our understanding of the interaction.

      (5) The modeling of the Sld7-Sld3CBD-CDC45 subcomplex is a new and valuable addition to the field.

      Weaknesses

      (1) The proposed model for Sld3 interacting with GINS on the CMG needs more experimental validation and conflicts with published findings. These discrepancies need more detailed discussion and exploration.

      (2) The section on the binding of Sld3 complexes to origin single-stranded DNA needs significant improvement. The comparisons between Sld3-CBD, Sld3CBD-Cdc45, and Sld7-Sld3CBD-Cdc45 involve complexes from different species, limiting the comparisons' value.

      (3) The authors' model proposing the release of Sld3 from CMG based on its binding to single-stranded DNA is unclear and needs more elaboration.

      We appreciate your positive comments. As suggested, we will try to improve the experiments and the manuscript, and we will discuss in more detail the interaction between Sld3 and GINS on the CMG, the ssDNA-binding section, our reasons for using different species for comparison, and the Sld3-release proposal.

      Reviewer #3 (Public review):

      Summary:

      The paper by Li et al. describes the crystal structure of a complex of Sld3-Cdc45-binding domain (CBD) with Cdc45 and a model of the dimer of an Sld3-binding protein, Sld7, with two Sld3-CBD-Cdc45 for the tethering. In addition, the authors showed the genetic analysis of the amino acid substitution of residues of Sld3 in the interface with Cdc45 and biochemical analysis of the protein interaction between Sld3 and Cdc45 as well as DNA binding activity of Sld3 to the single-strand DNAs of the ARS sequence.

      Strengths:

      The authors provided a nice model of an intermediate step in the assembly of an active Cdc45-MCM-GINS (CMG) double hexamers at the replication origin, which is mediated by the Sld3-Sld7 complex. The dimer of the Sld3-Sld7 complexes tethers two MCM hexamers together for the recruitment of GINS-Pol epsilon on the replication origin.

      Weaknesses:

      The biochemical analysis should be carefully evaluated with more quantitative ways to strengthen the authors' conclusion.

      We thank you for your positive assessment. We will provide more quantitative information and try to quantify the experiments as suggested.

    1. Author response:

      Reviewer 1:

      (1) I think the article is a little too immature in its current form. I'd recommend that the authors work on their writing. For example, the objectives of the article are not completely clear to me after reading the manuscript, composed of parts where the authors seem to focus on SGCs, and others where they study "engram" neurons without differentiating the neuronal type (Figure 5). The next version of the manuscript should clearly establish the objectives and sub-aims.

      Our overarching focus was to identify whether intrinsic physiology and circuit connectivity of SGCs contribute to their unique overrepresentation in neurons labeled as part of a behaviorally relevant dentate engram. Since our systematic analysis of “engram SGCs” did not support the proposal that engram SGCs drive robust feedforward excitation of engram GCs or feedback inhibition of non-engram GCs, we examined an alternative hypothesis that inputs drive recruitment of neurons, regardless of subtype (in figure 5). These are sparsely labeled neurons, with mixed populations of GCs and SGCs undergoing paired recordings. Since the focus of the experiment was input correlation between two simultaneously recorded neurons, we did not report the individual cell types. We regret that this caused confusion and will clarify this issue in the revised manuscript.

      (2) In addition, some results are not entirely novel (e.g., the disproportionate recruitment as well as the distinctive physiological properties of SGCs), and/or based on correlations that do not fully support the conclusions of the article. In addition to re-writing, I believe that the article would benefit from being enriched with further analyses or even additional experiments before being resubmitted in a more definitive form.

      We would like to note that while we and others have previously reported the distinctive SGC physiology, this study is the first to compare physiological properties of SGCs labeled as part of an engram to unlabeled SGCs. That was the thrust of the data presented which may have been missed and will be emphasized in the revision. Similarly, while others have shown higher SGC recruitment in dentate engrams, we had to validate this in the dentate dependent behaviors that we adopted in this study. We also note that the proportional SGC recruitment in our study, based on morphometric classification, differs from what was reported previously. These aspects of study, which were considered confirmatory, represent the necessary validation needed to proceed with the novel cell-type specific paired recordings and optogenetic analyses of engram neurons presented in subsequent sections of the manuscript. We will emphasize these considerations in the revised manuscript.

      Reviewer 2:

      (1) The authors conclude that SGCs are disproportionately recruited into cfos assemblies during the enriched environment and Barnes maze task given that their classifier identifies about 30% of labelled cells as SGCs in both cases and that another study using a different method (Save et al., 2019) identified less than 5% of an unbiased sample of granule cells as SGCs. To make matters worse, the classifier deployed here was itself established on a biased sample of GCs patched in the molecular layer and granule cell layer, respectively, at even numbers (Gupta et al., 2020). The first thing the authors would need to show to make the claim that SGCs are disproportionately recruited into memory ensembles is that the fraction of GCs identified as SGCs with their own classifier is significantly lower than 30% using their own method on a random sample of GCs (e.g. through sparse viral labelling). As the authors correctly state in their discussion, morphological samples from patch-clamp studies are problematic for this purpose because of inherent technical issues (i.e. easier access to scattered GCs in the molecular layer).

      We regret that there seems to be some confusion about use of a classifier. We did NOT use any automated classifier in this study. All cell type classifications in the study were conducted by experienced investigators examining cell morphology and classifying cells based on established morphometric criteria. In our prior study (Gupta et al., 2020) we had conducted an automated cluster analysis that was able to classify GCs and SGCs as different cell types. The principal components underlying the automated clustering in Gupta et al 2020 were consistent with the major criteria identified in prior morphology-based analyses by us and others (including Williams et al 2010 and Save et al., 2019). To date, in the absence of a validated molecular marker, morphometry from recorded and filled cells or sparsely labeled neurons is the only established method to classify SGCs. This was the approach we adopted, and this will be further clarified in the revisions.

      (2) The authors claim that recurrent excitation from SGCs onto GCs or other SGCs is irrelevant because they did not find any connections in 32 simultaneous recordings (plus 63 in the next experiment). Without a demonstration that other connections from SGCs (e.g. onto mossy cells or interneurons) are preserved in their preparation and if so at what rates, it is unclear whether this experiment is indicative of the underlying biology or the quality of the preparation. The argument that spontaneous EPSCs are observed is not very convincing as these could equally well arise from severed axons (in fact we would expect that the vast majority of inputs are not from local excitatory cells). The argument on line 418 that SGCs have compact axons isn't particularly convincing either given that the morphologies from which they were derived were also obtained in slice preparations and would be subject to the same likelihood of severing the axon. Finally, even in paired slice recordings from CA3 pyramidal cells the experimentally detected connectivity rates are only around 1% (Guzman et al., 2016). The authors would need to record from a lot more than 32 pairs (and show convincing positive controls regarding other connections) to make the claim that connectivity is too low to be relevant.

      As noted in our discussion, we are fully cognizant that potential SGC to GC connections may have been missed by the nature of slice physiology experiments and made every effort to limit this possibility. As noted in the manuscript, we only analyzed GC/SGC pairs where hilar axon collaterals of the neurons were recovered. We do not claim that SGC to GC/SGC connections are irrelevant; rather, we indicate that these connections, if present, are sparse and unlikely to drive engram refinement. Interestingly, wide field optical stimulation, designed to activate multiple labeled engram neurons and axon terminals including those of SGCs whose somata were outside the slice, did not lead to EPSCs in other unlabeled GCs or SGCs, suggesting the lack of robust SGC to GC/SGC synaptic connectivity. While we have previously published paired recordings from interneurons to GCs (Proddutur et al 2023), we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses would serve as an added control in the revised manuscript.

      (3) Another troubling sign is the fact that optogenetic GC stimulation rarely ever evokes feedback inhibition onto other cells, which contrasts with both other in vitro (e.g. Braganza et al., 2020) and in vivo (Stefanelli et al., 2016) studies. Without a convincing demonstration that monosynaptic connections between SGCs/GCs and interneurons in both directions are preserved at least at the rates previously described in other slice studies (e.g. Geiger et al., 1997, Neuron; Hainmueller et al., 2014, PNAS; Savanthrapadian et al., 2014, J. Neurosci), the notion that this setting could be closer to naturalistic memory processing than the in vivo experiments in Stefanelli et al. (e.g. lines 443-444) strikes me as odd. In any case, the discussion should clearly state that compromised connectivity in the slice preparation is likely a significant confound when comparing these results.

      We would like to note that our data are consistent with the Braganza 2020 study, as we explain below. Moreover, we would like to point out that the demonstration of “feedback inhibition” in the Stefanelli study was NOT in engram or behaviorally labeled neurons, nor was it in vivo. As we explain below, the physiological assay in Stefanelli was in slices and in a cohort of GCs with virally driven ChR2 expression. Thus, we are fully confident that our experimental paradigm better reflects a behavioral engram. As noted in response (2), we have previously published paired monosynaptic connections from interneurons to GCs (Proddutur et al 2023) and find the connectivity consistent with published data. However, we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses or recruitment of feedback inhibition by focal activation of GCs would serve to allay concerns regarding slice preparation. We also submit that we already discuss the potential concerns regarding compromised connectivity in slice preparations.

      Regarding the lack of optically evoked feedback inhibition, we would like to point out that the Braganza 2020 study examined focal optogenetic activation of GCs, where a high density of GCs was labeled using a Prox-cre line. They reported that about 2-4% of these densely labeled cells need to be recruited to evoke feedback IPSCs. Our experimental condition, where ChR2 was expressed in behaviorally labeled neurons, leads to sparse labeling much less than the focal 4% needed to evoke IPSCs in the Braganza study. We do not claim that feedback inhibition cannot be activated by focal activation of a cohort of GCs and even show an example of paired recording with feedback GC inhibition of an SGC. Our conclusion is that the few sparsely labeled neurons during a behavioral episode do not support robust feedback inhibition proposed to mediate engram refinement. We submit that our findings are fully consistent with the sparse GC driven feedback inhibition, and the need to activate a cohort of focal GCs to recruit feedback inhibition, reported in Braganza 2020.

      Regarding the Stefanelli study, we maintain that our behaviorally relevant in vivo labeling approach is more naturalistic than the DREADD and Channelrhodopsin driven artificial “engrams” generated in the Stefanelli study. Of note, we used cFOS driven TRAP mice to label, in vivo, neurons active during a behavior and then undertook slice physiology studies in these mice a week later. In contrast, the slice physiology data demonstrating putative feedback inhibition in the Stefanelli study (Fig 5) used wildtype mice injected with AAV CAMKII-cre and AAV-DIO-ChR2. Thus, unlike our study, the physiological data demonstrating feedback inhibition in the Stefanelli study was not obtained in a behaviorally labeled engram. Apart from the one set of histological experiments using AAV-SARE-GFP to demonstrate increased GFP labeling of SST neurons in behavior, all other data presented in the Stefanelli study are based on artificially generated engrams, where optogenetic activation or silencing of granule cells was used to manipulate the numbers of neurons active during a task, followed by histological analysis of cFOS staining or behaviors. Thus, the physiological experiments in Stefanelli et al. (2016), generated by wide field activation of a large cohort of GCs labeled by focal virally driven ChR2 expression, were similar to the wide field optical stimulation studies in the Braganza 2020 study, and were NOT conducted in a behavioral engram. The strength of our study is in the use of behaviorally tagged engram neurons for analysis, and our findings in sparsely labeled neurons are consistent with the reports in Braganza 2020. We will further clarify in our discussion that the data presented in the Stefanelli study do NOT represent a natural behavior generated engram.

      (4) Probably the most convincing finding in this study is the higher zero-time lag correlation of spontaneous EPSCs in labelled vs. unlabeled pairs. Unfortunately, the fact that the authors use spontaneous EPSCs to begin with, which likely represent a mixture of spontaneous release from severed axons, minis, and coordinated discharge from intact axon segments or entire neurons, makes it very hard to determine the meaning and relevance of this finding. At the bare minimum, the authors need to show if and how strongly differences in baseline spontaneous EPSC rates between different cells and slices are contributing to this phenomenon. I would encourage the authors to use low-intensity extracellular stimulation at multiple foci to determine whether labelled pairs really share higher numbers of input from common presynaptic axons or cells compared to unlabeled pairs as they claim. I would also suggest the authors use conventional Cross correlograms (CCG; see e.g. English et al., 2017, Neuron; Senzai and Buzsaki, 2017, Neuron) instead of their somewhat convoluted interval-selective correlation analysis to illustrate co-dependencies between the event time series. The references above also illustrate a more robust approach to determining whether peaks in the CCGs exceed chance levels.

      We appreciate the comment and can provide additional data on the EPSC frequency in individual labeled and unlabeled cells in the revised manuscript. As indicated in the manuscript, we constrained our analysis to cell pairs with comparable EPSC frequency in order to avoid additional confounds in analysis. We have additional experiments to show that over 50% of the sEPSCs represent action potential driven events, which we will include in the revised manuscript. We thank the reviewer for the suggestion to explore alternative methods of analysis, including CCGs, to further strengthen our findings.
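
      For readers unfamiliar with the conventional cross-correlogram (CCG) the reviewer refers to, a minimal sketch follows. It is an illustration only and not the analysis used in the manuscript: the function name, the event times, the bin width, and the lag window are all hypothetical, and no test against chance level is included.

```python
def cross_correlogram(ref_times, target_times, bin_width, max_lag):
    """Histogram of lags (target - reference) within [-max_lag, max_lag).

    All arguments share one time unit (milliseconds in the example below).
    Bin i covers lags from -max_lag + i * bin_width up to, but not including,
    -max_lag + (i + 1) * bin_width.
    """
    n_bins = int(round(2 * max_lag / bin_width))
    counts = [0] * n_bins
    for r in ref_times:
        for t in target_times:
            lag = t - r
            if -max_lag <= lag < max_lag:
                index = int((lag + max_lag) // bin_width)
                counts[min(index, n_bins - 1)] += 1  # guard against float rounding
    return counts


# Hypothetical sEPSC onset times (ms) in two simultaneously recorded cells.
cell_a = [100, 250, 400]
cell_b = [101, 251, 520]

ccg = cross_correlogram(cell_a, cell_b, bin_width=10, max_lag=50)
print(ccg)
```

      In this toy example two events in the second cell follow events in the first cell by 1 ms, so both counts fall in the bin that starts at zero lag. Whether such a peak exceeds chance would still need a separate control, for example jittered or shuffled event times.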

      (5) Finally, one of the biggest caveats of the study is that the ensemble is labelled a full week before the slice experiment and thereby represents a latent state of a memory rather than encoding, consolidation, or recall processes. The authors acknowledge that in the discussion but they should also be mindful of this when discussing other (especially in vivo) studies and comparing their results to these. For instance, Pignatelli et al 2018 show drastic changes in GC engram activity and features driven by behavioral memory recall, so the results of the current study may be very different if slices were cut immediately after memory acquisition (if that was possible with a different labelling strategy), or if animals were re-exposed to the enriched environment right before sacrificing the animal.

      As noted by the reviewer, we fully acknowledge and are cognizant of the concern that slices prepared a week after labeling may not reflect ongoing encoding. Although our data show that labeled cells are reactivated in higher proportion during recall, we have discussed this caveat and will include alternative experimental strategies in the discussion.

      Reviewer 3:

      (1) Engram cells are (i) activated by a learning experience, (ii) physically or chemically modified by the learning experience, and (iii) reactivated by subsequent presentation of the stimuli present at the learning experience (or some portion thereof), resulting in memory retrieval. The authors show that exposure to Barnes Maze and the enriched environment-activated semilunar granule cells and granule cells preferentially in the superior blade of the dentate gyrus, and a significant fraction were reactivated on re-exposure. However, physical or chemical modification by experience was not tested. Experience modifies engram cells, and a common modification is the Hebbian, i.e., potentiation of excitatory synapses. The authors recorded EPSCs from labeled and unlabeled GCs and SGCs. Was there a difference in the amplitude or frequency of EPSCs recorded from labeled and unlabeled cells?

      We agree that we did not examine the physical or chemical modifications by experience. Although we constrained our sEPSC analysis to cell pairs with comparable sEPSC frequency, we will include data on sEPSC parameters in labeled and unlabeled cells in the revised manuscript.

      (2) The authors studied five sequential sections, each 250 μm apart across the septotemporal axis, which were immunostained for c-Fos and analyzed for quantification. Is this an adequate sample? Also, it would help to report the dorso-ventral gradient since more engram cells are in the dorsal hippocampus. Slices shown in the figures appear to be from the dorsal hippocampus.

      We thank the reviewer for the comment. We analyzed sections along the dorso-ventral gradient. As explained in the methods, there is considerable animal to animal variability in the number of labeled cells, which was why we had to use matched littermate pairs in our experiments. This variability could render it difficult to tease apart dorsoventral differences.

      (3) The authors investigated the role of surround inhibition in establishing memory engram SGCs and GCs. Surprisingly, they found no evidence of lateral inhibition in the slice preparation. Interneurons, e.g., PV interneurons, have large axonal arbors that may be cut during slicing. Similarly, the authors point out that some excitatory connections may be lost in slices. This is a limitation of slice electrophysiology.

      We agree that slice physiology has limitations and discuss this caveat. As noted in response (2), we have previously published paired monosynaptic connections from interneurons to GCs (Proddutur et al 2023) and find the connectivity consistent with published data. However, we agree that recordings demonstrating the presence of SGC/GC to hilar neuron synapses or recruitment of feedback inhibition by focal activation of GCs would serve to allay concerns regarding slice preparation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The study by Chikermane and colleagues investigates the functional, structural, and dopaminergic network substrates of cortical beta oscillations (13-30 Hz). The major strength of the work lies in the methodology taken by the authors, namely a multimodal lesion network mapping. First, using invasive electrophysiological recordings from healthy cortical territories of epileptic patients they identify regions with the highest beta power. Next, they leverage open-access MRI data and PET atlases and use the identified high-beta regions as seeds to find (1) the whole-brain functional and structural maps of regions that form the putative underlying network of high-beta regions and (2) the spatial distribution of dopaminergic receptors that show correlation with nodal connectivity of the identified networks. These steps are achieved by generating aggregate functional, structural, and dopaminergic network maps using lead-DBS toolbox, and by contrasting the results with those obtained from high-alpha regions.

      The main findings are:

      (1) Beta power is strongest across frontal, cingulate, and insular regions in invasive electrophysiological data, and these regions map onto a shared functional and structural network. (2) The shared functional and structural networks show significant positive correlations with dopamine receptors across the cortex and basal ganglia (which is not the case for alpha, where correlations are found with GABA).

      Nevertheless, a few clarifications regarding the choice of high-power electrodes and distributions of functional connectivity maps (i.e., strength and sign across cortex and sub-cortex) can help with understanding the results.

      We thank the reviewer for this critical expert assessment. 

      Reviewer #1 (Recommendations For The Authors):

      To potentially enhance the quality of the manuscript in the current version, I kindly ask the authors to address the following points:

      Major:

      (A) Power analysis of electrophysiological data

      (1) How were significant peaks identified exactly? I understand that the authors used FOOOF methodology to estimate periodic components of brain activity.

      Thank you for pointing us to this lack of clarity. The application of FOOOF consists of fitting a one-over-f curve that delineates the aperiodic component, followed by the definition of Gaussians to fit periodic activity. This allows for extraction of periodic peak power estimates that are corrected for the offset and exponent of the one-over-f, or non-oscillatory aperiodic, component in the spectrum (further information can be found here https://fooof-tools.github.io/fooof/auto_tutorials/plot_02-FOOOF.html). We included all peaks that could be fitted using the process.
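
      The decomposition described here can be sketched in a few lines. The code below does not call the FOOOF library; it only evaluates the kind of model that is fitted, assuming an aperiodic component without a knee and a Gaussian width given as a standard deviation. All parameter values are hypothetical. It shows why the peak height (pw) is read as power above the aperiodic component.

```python
import math


def aperiodic(freq, offset, exponent):
    """Aperiodic (one-over-f) component in log10 power, assuming no knee."""
    return offset - exponent * math.log10(freq)


def gaussian(freq, cf, pw, sd):
    """One periodic peak: centre frequency cf, height pw, standard deviation sd."""
    return pw * math.exp(-((freq - cf) ** 2) / (2 * sd ** 2))


def model_log_power(freq, offset, exponent, peaks):
    """Full model: aperiodic component plus the sum of all Gaussian peaks."""
    return aperiodic(freq, offset, exponent) + sum(
        gaussian(freq, cf, pw, sd) for cf, pw, sd in peaks
    )


# Hypothetical fit for one channel: an alpha peak and a larger beta peak.
offset, exponent = 1.0, 1.5
peaks = [(10.0, 0.4, 1.5), (20.0, 0.8, 3.0)]

beta_cf, beta_pw, _ = peaks[1]
height_above_aperiodic = (
    model_log_power(beta_cf, offset, exponent, peaks)
    - aperiodic(beta_cf, offset, exponent)
)
print(round(height_above_aperiodic, 6))
```

      At the beta centre frequency the modelled spectrum exceeds the aperiodic curve by essentially the beta peak height, because the distant alpha Gaussian contributes almost nothing there. Two channels with different offsets or exponents can therefore be compared on peak height alone.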

      How about aperiodic components (Figure 1, PSD plots)? 

      We share the interest in aperiodic activity with the reviewer. However, given that the primary aim of this study was the description of beta oscillations and the methodology and results presentation is already very complex, we did not include the analysis of aperiodic activity in this manuscript. This could be done in the future and it would surely be interesting to visualize the whole brain connectomic fingerprints of aperiodic exponent and offset. With regard to the purely anatomical description of nonoscillatory aperiodic activity we would like to refer to Figure 8 in Frauscher et al. Brain 2018 (https://doi.org/10.1093/brain/awy035) where this is described. We have decided not to include additional information on this matter, because a) we felt that this would further convolute the results and discussion without directly addressing any of the hypotheses and aims that we set out to tackle and b) the interpretation of aperiodic activity is still a matter of intense research with conflicting results, which warrants very careful considerations of many aspects that again would go beyond the scope of this paper. 

      In addition, to what degree would the results change if one identified the peaks relative to sites with no peak, similar to Frauscher et al. 

      Beta activity, the oscillation of interest in our analysis, is ubiquitous in the brain. In fact, of 1772 channels, only 21 channels did not exhibit a beta peak detectable with FOOOF. Thus, a comparison of 1751 against 21 would not yield meaningful results. We have therefore decided to focus on the channels in which beta activity is the strongest and dominant observable oscillation.

      If the FOOOF approach has some advantages, these should be pointed out or discussed.

      FOOOF indeed has the advantage that it provides an objective and reproducible estimation of peak oscillatory activity that accounts for differences in aperiodic activity. To the best of our knowledge, there is no other approach that is nearly as well documented, validated and computationally reproducible. 

      Changes in manuscript: We have now further clarified the definition of peak amplitudes in the results and methods section and have discussed the use of alternative measures in the limitations section of our manuscript.

      Results: “The frequency band with the highest peak amplitude was identified using the extracted peak parameter (pw) for each channel and depicted as the dominant rhythm for the respective localisation (Figure 1).”

      Methods: “Peak height was extracted using the pw parameter, which depicts peak amplitude after subtraction of any aperiodic activity.”

      Discussion: “Alternative approaches could yield different results, e.g. reusing channels for each peak that is observable and contrasting them to channels where such peak was not present. However, in our study the majority of channels exhibited beta activity, even if peaks were of low amplitude, which we believe would have led to less interpretable results.”

      (2) How exactly do the authors deal with channels with more than one peak? Some elaboration on this and how this could potentially impact the results would be appreciated. Sorry if I have missed it.

      Indeed, a description of this was lacking so we are very thankful that the reviewer pointed this out. The maximum peak amplitude method was a winner-takes-all approach where in the case of multiple peaks, the peak with the higher amplitude was chosen. This method of course has drawbacks in the form of lost or disregarded peaks and remains a limitation to this study. 

      Changes in manuscript: We have now clarified this in the methods and results sections, which now read: 

      Methods: “In case of multiple peaks within the same region, we used only the highest peak amplitude.”

      Results: “In case of multiple peaks within the same frequency band, we focused the analysis on the peak with the highest amplitude.”

      And added the following to the Limitations section of the discussion: 

      “Another limitation in our study is the fact that the statistical approach for the comparison of beta and alpha networks and even for multiple peaks within the same frequency band follows a winner takes all logic that is, by definition, a simplification, as most areas will contribute to more than one spatiospectrally distinct oscillatory network. Specifically, while multiple peaks within or across frequency bands could be present in each channel, we decided to allocate this channel to only the frequency band containing the highest peak amplitude.” 
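
      The winner-takes-all allocation described above can be illustrated with a short sketch. This is not the authors' code: the 13-30 Hz beta range is taken from the text, whereas the theta and alpha band edges, the channel names, and the peak values are assumptions made for illustration.

```python
# Band edges in Hz as half-open intervals [low, high).
# Beta (13-30 Hz) follows the text; theta and alpha edges are assumed.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}


def band_of(cf, bands=BANDS):
    """Return the name of the band containing centre frequency cf, or None."""
    for name, (low, high) in bands.items():
        if low <= cf < high:
            return name
    return None


def dominant_band(peaks, bands=BANDS):
    """Winner-takes-all: the band of the single highest-amplitude peak.

    peaks is a list of (centre_frequency_hz, pw) tuples for one channel.
    Lower peaks, within or across bands, are discarded.
    """
    labelled = [(pw, band_of(cf, bands)) for cf, pw in peaks]
    labelled = [(pw, band) for pw, band in labelled if band is not None]
    if not labelled:
        return None
    return max(labelled, key=lambda item: item[0])[1]


# Hypothetical channels with their fitted peaks.
channels = {
    "ch1": [(10.0, 0.9), (21.0, 0.5)],  # alpha peak higher than beta peak
    "ch2": [(18.0, 0.3), (24.0, 0.7)],  # two beta peaks, the higher one is kept
    "ch3": [],                          # no fitted peak
}
assignment = {name: dominant_band(p) for name, p in channels.items()}
print(assignment)
```

      The first channel shows the simplification named in the limitation: it carries a clear beta peak, yet it is allocated to alpha only and does not contribute to the beta map.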

      (B) Network mapping

      (1) Knowing that fMRI data are preprocessed by regressing the global signal, there are negative correlations across the functional networks. Unfortunately, the distribution, sign, and strength of the correlations are not quantitatively shown in any of the plots. Thus, it is unclear whether, e.g., corticocortical vs. subcortico-cortical correlations differ in strength and/or sign. I think this additional information is important for better understanding the up/down-regulation of beta, e.g., by DA signaling. Some discussion around this point in addition would be insightful, I think.

      The referee is touching upon a very important and difficult point, which we have considered very carefully. Global signal regression is a controversial topic and the neurophysiological basis of negative correlations remains to be elucidated. We can justify our use of this approach based on an expert consensus described in Murphy & Fox 2017 (https://doi.org/10.1016%2Fj.neuroimage.2016.11.052), which highlights that global signal regression can improve both the specificity of positive correlations and the correspondence to anatomical connectivity. The truth, however, is that we relied on it because it is the more commonly used and validated approach in lesion network and DBS connectivity mapping and is implemented in the Lead Mapper pipeline. Indeed, all connectivity estimates are shown in Supplementary figure 3. We remain hesitant to raise the focus to these points, because of the uncertain underlying neural correlates. However, when looking at the values, it is interesting to note that most key regions of interest exhibit positive connectivity values.

      Changes in manuscript: We now point to the supplement containing all connectivity values in the results section more prominently: “All connectivity values including their sign are shown in figures as brain region averages parcellated with the automatic anatomical labelling atlas in supplementary figures 2&3.”

      (2) I assume no thresholding is applied to the functional connectivity maps (in a graph-theoretical sense). Please clarify (this is also related to the comment above, in particular, the strength of correlations).

      Indeed, we demonstrate SPM maps using family wise error corrected stats in figure 2, but all further analyses were performed on unthresholded maps as correctly pointed out by the referee. 

      Changes in manuscript: 

      Results: “Specifically, we analysed to what degree the spatial uptake patterns of dopamine, as measurable with fluorodopa (FDOPA; cohort average of 12 healthy subjects) and other dopamine signalling related tracers that bind D1/D2 receptors (average of N=17/44 respectively healthy subjects) or the dopamine transporter (DAT; cohort average of N=180 healthy subjects) were correlated with the unthresholded MRI connectivity maps.”

      Methods: “This parcellation was applied to both PET and unthresholded structural and functional connectivity maps using SPM and custom code.”

      Minor

      (1) Methods, Connectivity analysis: The description of (mass-univariate) GLM analysis is confusing. The maps underwent preprocessing? Which preprocessing steps are meant here? What is the dependent variable and what are the predictors exactly?

      We thank the reviewer for catching this error in our methods and apologise for the confusion. Indeed, we used t-tests without further preprocessing instead of a GLM.

      Changes in manuscript: The respective section has been removed from the methods section and intermediate steps have been clarified. The section now reads: “To investigate differences between beta dominant and alpha dominant functional connectivity networks, a two sample t-test was calculated for the condition where beta was greater than alpha and vice versa using SPM. Here, the connectivity maps from each dominant channel (1005 beta functional connectivity maps and 397 alpha connectivity maps) were compared. Estimation of model parameters yielded t-values for each voxel, indicating the strength and direction of differences between the two contrasts (beta > alpha, alpha > beta). To address the issue of multiple comparisons, we applied Family-Wise Error (FWE) correction, adjusting significance thresholds such that only voxels with p < 0.05 would be included.”
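      The logic of a voxel-wise two-sample t-test with family-wise error control can be illustrated with a short Python sketch. It is not the SPM procedure: SPM derives FWE-corrected thresholds from random field theory, whereas this example swaps in a max-statistic permutation threshold, which is a different and simpler way to control the family-wise error rate. All function names, the synthetic data and the group sizes are assumptions made for illustration.

```python
import random
from statistics import mean, variance


def t_stat(a, b):
    """Pooled-variance two-sample t statistic for the contrast a > b."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def t_map(group_a, group_b):
    """One t-value per voxel; each group is a list of maps (lists of voxel values)."""
    n_vox = len(group_a[0])
    return [
        t_stat([m[v] for m in group_a], [m[v] for m in group_b])
        for v in range(n_vox)
    ]


def fwe_threshold(group_a, group_b, n_perm=200, alpha=0.05, seed=0):
    """Max-statistic permutation threshold controlling FWE at level alpha."""
    rng = random.Random(seed)
    pooled = group_a + group_b
    n_a = len(group_a)
    max_ts = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # break the link between map and group label
        max_ts.append(max(t_map(shuffled[:n_a], shuffled[n_a:])))
    max_ts.sort()
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_ts[idx]


# Synthetic example: 20 maps per group, 5 voxels, true effect only in voxel 0.
rng = random.Random(1)
beta_maps = [[rng.gauss(3.0 if v == 0 else 0.0, 1.0) for v in range(5)] for _ in range(20)]
alpha_maps = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(20)]

t_obs = t_map(beta_maps, alpha_maps)
thr = fwe_threshold(beta_maps, alpha_maps)
significant = [v for v, t in enumerate(t_obs) if t > thr]  # contrast beta > alpha
print(significant)
```

      The reverse contrast (alpha > beta) follows by swapping the two groups. The permutation threshold is chosen here only because it needs no distributional assumptions and fits in a few lines.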

      (2) I encourage the authors to find a better (visual) way of reporting Table 1, to make the main observations easier to grasp and compare (maybe a two-dimensional bar plot? Or color-coding the cells?)

      Reply: Thank you for your suggestion to improve the table. The new table has been adjusted according to the recommended changes to make it more readable.

      Reviewer #2 (Public Review):

      Summary:

      This is a very interesting paper that leveraged several publicly available datasets: invasive cortical recording in epilepsy patients, functional and structural connectomic data, and PET data related to dopaminergic and GABAergic synapses. These were combined to create a unified hypothesis of beta band oscillatory activity in the human brain. They show that beta frequency activity is ubiquitous, not just in sensorimotor areas, and cortical regions where beta predominated had high connectivity to regions high in dopamine re-uptake.

      Strengths:

      The authors leverage and integrate three publicly available human brain datasets in a creative way. While these public datasets are powerful tools for human neuroscience, it is innovative to combine these three types of data into a common brain space to generate novel findings and hypotheses. Findings are nicely controlled by separately examining cortical regions where alpha predominates (which have a different connectivity pattern). GABA uptake from PET studies is used as a control for the specificity of the relationship between beta activity and dopamine uptake. There is much interest in synchronized oscillatory activity as a mechanism of brain function and dysfunction, but the field is short on unifying hypotheses of why particular rhythms predominate in particular regions. This paper contributes nicely to that gap. It is ambitious in generating hypotheses, particularly that modulation of beta activity may be used as a "proxy" for modulating phasic dopamine release.

      Weaknesses:

      As the authors point out, the use of normative data is excellent for exploring hypotheses but does not address or explore individual variations which could lead to other insights. It is also biased to resting state activity; maps of task-related activity (if they were available) might show different findings.

      The figures, results, introduction, and methods are admirably clear and succinct but the discussion could be both shorter and more convincing.

      Reviewer #2 (Recommendations For The Authors):

      The tone of the discussion is excessively lofty and abstract, and hard to follow in places. Specific examples in comments to authors below.

      We thank the reviewer for their positive assessment and their constructive feedback on the discussion. Also in light of the other reviewers' comments, we have made a sincere effort to shorten, restructure and improve the discussion. Additionally, we have addressed all the specific comments the reviewer had below. We appended each change to the manuscript where appropriate below and have addressed all comments in the main text. That said, we see this paper and its discussion as providing our most up-to-date and personal perspective on a correct and generalizable concept of the interplay of beta oscillations and dopamine. Providing a concept that is so generalizable is very challenging and so far very few authors have even attempted this. One notable exception is the “status quo” concept by Fries & Engel. While we will do our very best to address the comments, we have decided not to deviate from our initial ambition to provide a discussion on a generalizable concept. Naturally such a concept must be very complex and therefore it will be hard to understand in parts. Through the revision, we hope that the readability and comprehensibility have improved, while it provides an in-depth perspective and hypothesis on how beta oscillations, dopamine and their brain circuits may facilitate brain function. Nevertheless, we want to express our honest gratitude for the thoroughness with which the reviewer has read and scrutinized our paper. The review clearly shows that the reviewer had the ambition to follow and understand what we were trying to convey, which can be rare nowadays. We are truly thankful for this.

      The first sentence is not quite true, as invasive neurophysiology was not, and cannot be, done in healthy humans. "The present study combined three openly available datasets of invasive neurophysiology, MRI connectomics, and molecular neuroimaging in healthy humans to characterise the spatial distribution of brain regions exhibiting resting beta activity, their shared circuit architecture, and its correlation with molecular markers of dopamine signaling in the human brain."

      Changes in manuscript: We have now removed the “healthy” from the respective sentence.

      "Our results motivate to conceptualise the capacity to generate.... This is not clear.

      Changes in manuscript: “Our results suggest that one common denominator of brain regions that generate beta activity, is their affiliation with beta oscillations as a feature that arises from a largescale global brain network that is modulated by dopamine.”

      "Similarly, the robust beta modulation that is elicited by voluntary action in sensorimotor cortex and its correlation with motor symptoms of Parkinson's disease is long known" - the association between movement-related cortical beta desynchronization and Parkinson's motor signs is not well described - could the authors specify and reference this?

      We thank the reviewer for pointing out this lack of clarity. We meant that independently beta is known for “movement” and for “movement disorders” and not “movement in movement disorders”. That said, there are some studies that suggest that beta ERD is altered in PD (e.g. https://doi.org/10.1093/cercor/bht121), but saying that this is “long known” would be an overstatement and was not our intention. We rephrased this sentence accordingly.

      Changes in manuscript: The sentence now reads: “Moreover, the robust beta modulation that is elicited by voluntary action in sensorimotor cortex and its correlation with motor symptoms of Parkinson’s disease is long known.”

      "...first fast-cyclic voltammetry experiments that allowed for combined measurement of dopamine release with invasive neurophysiology have provided first evidence that beta band oscillations in healthy non-human primates can differentially link dopamine release, beta oscillations and reward and motor control, depending on the contextual information and striatal domain" - This is not very clear - not sure what "differentially link" signifies.

      I think the fact that this is not easy to understand signifies the complexity that we and the authors of the cited paper from Ann Graybiel’s lab aimed to communicate. In fact, we stayed very close to the phrasing used in their paper to try and avoid confusion (Title: “Dopamine and beta-band oscillations differentially link to striatal value and motor control” - https://doi.org/10.1126/sciadv.abb9226). The specific results go beyond the scope of the discussion but are very interesting, so I would be happy if our paper would inspire readers to look it up.

      Changes in manuscript: We have now adapted the sentence to “In line with this more complex picture, direct measurement of dopamine concentration in non-human primates revealed specific interactions between dopamine release, beta oscillations, reward value and motor control, depending on contextual information and striatal domain. This shows that the relationship of dopamine and beta activity is not solely associated with either reward or movement and depends on where in the striatum beta activity is recorded.”

      "In fact, one could argue that it can be contextualised in a recently described framework of neural reinforcement, that serves to orchestrate the re-entrance and refinement of neural population dynamics for the production of neural trajectories" - this is not clear - for example what is a neural trajectory? What is meant by "re-entrance and refinement"?

      A neural trajectory refers to the path that the activity of a neural population takes through a high-dimensional space over time. It can be obtained through multivariate analysis of population activity with dimensionality reduction techniques, such as PCA. The concept of low-dimensional representations of high-dimensional neural activity has gained a lot of attention in computational neuroscience ever since high-channel count recordings of neural population activity have become available (an early and prominent example is Churchland et al., 2012 Nature https://doi.org/10.1038/nature11129 , while a more recent example is Safaie et al., Nature 2023 https://doi.org/10.1038/s41586-023-06714-0). The review we refer to by Rui Costa and colleagues (Athalye, V. R., Carmena, J. M. & Costa, R. M. Neural reinforcement: re-entering and refining neural dynamics leading to desirable outcomes. Curr Opin Neurobiol 60, 145–154 (2020) https://doi.org/10.1016/j.conb.2019.11.023) suggests that dopamine may serve to modulate the likelihood of a specific pattern to emerge and re-enter the cortex – basal ganglia loop, for the “reliable production of neural trajectories driving skillful behavior on-demand”. We believe that this concept could be revolutionary in our understanding of dopaminergic modulation and disorders and, together with our colleague Alessia Cavallo, we have written an invited perspective on this topic (https://doi.org/10.1111/ejn.16222), which may help further clarify the topic.

      Changes in manuscript: We realize that this aspect may sound a bit unclear or far away from the data in this manuscript. However, given that we have spent more than a decade thinking about beta oscillations and how they can be conceptualized, we would prefer not to entirely change our points and rather bet on the possibility that the concepts become more widely accepted and well-known. Nevertheless, we have now adapted the text to make this a bit more clear:

      “We hypothesise that this “status quo” hypothesis could be equally or maybe even more adequately posed on the neural level. Namely, it could provide insights to what degree a certain activity pattern or synaptic connection is to be strengthened or weakened, in light of neural learning. We propose that this putative function can be contextualised in a recently described framework of neural reinforcement, that serves to orchestrate the re-entrance and refinement of neural population dynamics for the production of neural trajectories.”

      "....after which it was quickly translated to first experimental studies using cortical or subcortical beta signals in human patients44." - reference 44 only deals with the use of subcortical beta, not cortical, in adaptive control.

      The reviewer is right, in fact there is no study using motor cortex beta for adaptive DBS yet, but different studies have used different markers (especially gamma) since then. 

      Changes in manuscript: We have rephrased and added citations accordingly: “This approach, also termed adaptive DBS, was first demonstrated based on cortical beta activity that was used to adapt pallidal DBS in the MPTP non-human primate model of PD43. It was quickly translated to first experimental studies using subcortical beta signals in human patients44, followed by further research using more complex cortical and subcortical sensing setups and biomarker combinations45,46.”

      The paragraph headed " Implications for neurotechnology" is quite long and should be condensed and focused. It doesn't seem to support the last sentence, "....targeted interventions that can increase and decrease beta activity, as recently shown through phase specific modulation45 could be utilised to mimic phasic dopamine release as a neuroprosthetic approach to alter neural reinforcement38." - I don't quite follow the logic. The authors have clearly shown that beta-related circuits tend to be those linked to dopamine modulation, and may subserve tasks for which reinforcement learning is an important mechanism. However the logic of how modulation of beta activity can "substitute" for modulation of dopamine isn't clear. That would seem to require that the mechanism by which dopamine produces reinforcement, is via an effect on beta oscillation properties (phase, amplitude, frequency). Is there evidence for this? If so it should be better spelled out.

      We realize that this is very speculative at this point. Indeed, we believe that subthalamic DBS can mimic dopaminergic control and in the future there may be new treatment avenues, e.g. using neurochemical interfaces for which beta could be informative to mimic dopamine release, but ultimately explaining this would be very complex, so we have removed the sentence. With regard to the remaining text in the section, we considered shortening / condensing but felt that this paragraph is highly relevant for the ongoing development of neurotechnology and therefore decided to only remove the first and last sentences.

      Changes in manuscript: We have removed the first and last sentences.

      "While the abovementioned prospects are promising we should cautiously consider the limitations of our study." - an unnecessary sentence to start a "limitations" section, its clearly a paragraph about limitations. In general, authors should go thru discussion and reduce verbosity; it is not nearly as well edited as the rest of the paper.

      Agreed. 

      Changes in manuscript: We removed the sentence. 

      Reviewer #3 (Public Review):

      Summary:

      In this paper, Chikermane et al. leverage a large open dataset of intracranial recordings (sEEG or ECoG) to analyze resting state (eyes closed) oscillatory activity from a variety of human brain areas. The authors identify a dominant proportion of channels in which beta band activity (12-30Hz) is most prominent and subsequently seek to relate this to anatomical connectivity data by using the sEEG/ECoG electrodes as seeds in a large set of MRI data from the human connectome project. This reveals separate regions and white matter tracts for alpha (primarily occipital) and beta (prefrontal cortex and basal ganglia) oscillations. Finally, using a third available dataset of PET imaging, the authors relate the parcellated signals to dopamine signaling as estimated by spatial uptake patterns of dopamine, and reveal a significant correlation between the functional connectivity maps and the dopamine reuptake maps, suggesting a functional relationship between the two.

      Strengths:

      Overall, I found the paper well justified, focused on an important topic, and interesting. The authors' use of 3 different open datasets was creative and informative, and it significantly adds to our understanding of different oscillatory networks in the human brain, and their more elusive relation with neuromodulator signaling networks by adding to our knowledge of the association between beta oscillations and dopamine signaling. Even my main comments about the lack of a theta network analysis and discussion points are relatively minor, and I believe this paper is valuable and informative.

      Weaknesses:

      The analyses were adequate, and the authors cleverly leveraged these different datasets to build an interesting story. The main aspect I found missing (in addition to some discussion items, see below) was an examination of the theta network. Theta oscillations have been involved in a number of cognitive processes including spatial navigation and memory, and have been proposed to have different potential originating brain regions, and it would be informative to see what their anatomical networks (e.g. as in Figure 2) look like under the authors' analyses.

      The authors devote a significant portion of the discussion to relating their findings to a popular hypothesis for the function of beta oscillations, the maintenance of the "status quo", mostly in the context of motor control. As the authors acknowledge, given the static nature of the data and lack of behavior, this interpretation remains largely speculative and I found it a bit too far-reaching given the data shown in the paper. In contrast, I missed a more detailed discussion on the growing literature indicating a role for beta in mood (e.g. in Kirkby et al. 2018), especially given the apparent lack of hippocampal and amygdala involvement in the paper, which was surprising.

      We thank the reviewer for their insightful review of our manuscript. One of the aims of our paper was to provide the ground for a circuit-based conceptualization of beta activity, which does not primarily relate to behavior. Practically, we have the ambition to provide a generalizable concept that can be applied to all behavioral domains including mood. The reason we focus on the “status quo” hypothesis is that it is one of the very few, if not the only, generalizable concepts of the function of beta oscillations. Through our paper and the discussion, we have to redirect this concept towards a less cognitive/behavioral and more anatomical network based domain, while acknowledging principles that may overlap. We realize that this is very ambitious and this endeavour is necessarily very complex and not easy to communicate. In light of the reviewer's comments, we have made an effort to improve the discussion as best we could without straying too far from our initial aim. We are thankful for the suggested reference, which we have now added to the discussion in the section where we have previously discussed beta as biomarker for mood, also noting the absence of beta dominant channels in amygdala and hippocampus. Here it should be clarified, however, that a) only three channels were located in the amygdala, of which one exhibited beta activity, so we should be cautious not to overinterpret this result and b) most channels exhibited beta and just because beta wasn’t dominant, it doesn’t mean that beta is not present or important in these brain areas. Absence of evidence is not evidence for absence with the way we approached the analysis. We are thankful for the interesting reference, which we have now included in our discussion. Notably the study used a complex network analysis, which we could not perform because we did not have parallel recordings from these areas in multiple patients. This is now noted in the limitations.

      Changes in manuscript: “For example, it was shown that beta is implicated in working memory28, utilisation of salient sensory cues29, language processing30, motivation31, sleep32, emotion recognition33, mood34 and may even serve as a biomarker for depressive symptom severity in the anterior cingulate cortex35” and “One impactful study reported that beta oscillatory sub-networks of amygdala and hippocampus could reflect human variations in mood34. This is interesting, but highlights another relevant limitation of our study, namely that recordings in different areas were stemming from different patients and thus, such sub-network analyses on the oscillatory level could not be conducted.”

      Major comment:

      • Although the proportion of electrodes with theta-dominant oscillations was lower (~15%) than alpha (~22%) or beta (~57%), it would be very valuable to also see the same analyses the authors carried out in these frequency bands extended to theta oscillations.

      We agree with the reviewer and appreciate the interest in other frequency bands: theta, alpha and gamma. Our primary interest was to provide a network concept of beta activity, but we anticipated that interest would go beyond that frequency band. However, we also had to limit ourselves to what is communicable and comprehensible. The key aim for us was to provide a data-driven circuit description of beta activity that can lay the ground for a generalizable concept of where beta oscillations emerge. Reproducing all analyses for every frequency band would clutter both the results and the discussion. Moreover, the honest truth is that funding and individual career plans of the researchers currently do not allow us to allocate time for a reanalysis of all data, which would be a significant effort. Therefore, we have decided to just add the topography of theta and gamma channels as a supplement. In case the reviewer is interested in a collaboration on extending this project to other frequency bands and circuits, we would like to invite them to get in touch and perhaps this could be a new collaborative project. Until then, we have extended our limitations section to note that this would be important work for the future.

      Changes in manuscript: 

      We have added and cited the new supplementary figure for the results from theta in the results section, which now reads: 

      “Further information on the topography of theta channels is shown in supplementary figure 1.”

      We would like to add that a sensible interpretation of results from gamma dominant channels is unlikely to be possible given the low count of channels with prominent resting activity in this frequency band. We have added the following text to the limitations section: “The aim of this study was to elucidate the circuit architecture of beta oscillations, which is why insights from this study for other frequency bands are limited. Future research investigating the specific circuits of theta, alpha and gamma oscillations and their relationship with neurotransmitter uptake could yield new important insights on the networks underlying human brain rhythms.”

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      • Results: "we performed non-parametric Spearman's correlations between the structural and functional connectivity maps of beta networks with neurotransmitter uptake". This is a significantly complex analysis that requires more detail for the reader to evaluate. There is more detail in the Figure 3 legend but still insufficient. The Methods offer more detail, but I found the description of the parcellation to be vague and I would appreciate a more detailed description.

      We thank the reviewer for bringing the insufficient explanation of the methods used to calculate the correlations in our analysis to our attention. We have now made an effort to provide more detail in the relevant paragraphs.

      Changes in manuscript: We have now made changes to both the Results and Methods sections and added the following explanations respectively:

      Results: “Next, we resliced the beta network map and the PET images to allow for a meaningful comparison, using a combined parcellation with 476 brain regions that include cortex19, basal ganglia20, and cerebellum21. Here, each parcel – which was a collection of voxels belonging to a particular brain region – from the connectivity map was correlated with the same parcel containing average neurotransmitter uptake from the respective PET scan (see Figure 3A). In this way nonparametric Spearman’s correlations between PET intensity and structural and functional connectivity maps of beta networks were obtained, which indicate to what degree the spatial distribution of connectivity is similar to the distribution of neurotransmitter uptake.”

      Methods: “A custom master parcellation in MNI space was created in Matlab using SPM functions by combining three existing parcellations to include cortical regions19, structures of the basal ganglia20 and cerebellar regions21. Regions that were (partially) overlapping between the atlases were only selected once. The final compound parcellation had 476 regions in total. This parcellation was applied to both PET and structural and functional connectivity maps using SPM and custom code. This allowed for the calculation of spatial correlations, providing a statistical measure of spatial similarity of the PET intensity and MRI connectivity distributions. For this, Spearman’s ranked correlations were used to calculate correlations between the PET images, such as the dopamine aggregate map and both functional and structural beta connectivity networks (Figure 3). The analysis was repeated for individual tracers, showing similar results (Supplementary figure 2). Finally, to validate these results, a control analysis was performed using a GABA PET scan from the same open dataset of neurotransmitter uptake following the same pipeline (Figure 2A, 2B).”
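      The parcel-wise spatial correlation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Matlab/SPM code used in the study: maps are represented as flat lists of voxel values, the parcellation as a list of integer labels with 0 meaning "outside the atlas", and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def parcel_means(voxel_values, voxel_labels):
    """Average the voxel values within each parcel (label 0 is ignored)."""
    by_parcel = defaultdict(list)
    for value, label in zip(voxel_values, voxel_labels):
        if label != 0:
            by_parcel[label].append(value)
    return {label: mean(vals) for label, vals in by_parcel.items()}


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))


def spatial_similarity(conn_voxels, pet_voxels, labels):
    """Correlate parcel-averaged connectivity with parcel-averaged PET uptake."""
    conn = parcel_means(conn_voxels, labels)
    pet = parcel_means(pet_voxels, labels)
    shared = sorted(conn.keys() & pet.keys())
    return spearman([conn[p] for p in shared], [pet[p] for p in shared])


# Toy example with three parcels and one voxel outside the atlas.
labels = [1, 1, 2, 2, 3, 3, 0]
connectivity = [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 9.9]
pet_uptake = [1.0, 1.0, 4.0, 4.0, 9.0, 9.0, 0.0]
rho = spatial_similarity(connectivity, pet_uptake, labels)
print(round(rho, 6))
```

      Because the correlation is computed on ranks across parcels, it measures whether parcels with stronger connectivity also tend to show higher tracer uptake, without assuming a linear relationship between the two.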

      • All of the recordings were taken in an eyes-closed condition. This is likely to affect the power of alpha oscillations; the authors should comment on this.

      We agree with the reviewer that this will likely have influenced the results. However, given that the key result of our paper is the abundance and circuit topography of beta oscillations, it is unlikely that increased alpha in some channels will have led to false positive results for beta. If anything, it may have increased the contrast, leading to a more conservative estimate of which channels truly show strong beta dominance. On the other hand, we should acknowledge that this limitation can affect the interpretation of the alpha result. This is another reason for us to primarily focus on beta in the discussion and results presentation.

      Changes in manuscript: We now comment on this in the results:

      “It should be noted that alpha recordings were performed with eyes closed, which is known to increase alpha power and may influence the generalizability of the alpha maps to an eyes open condition. However, given that our primary use of alpha was to act as a control, we believe that this should not affect the interpretability of the key findings of our study.”

      • Although the relative proportion of theta and gamma channels is lower, it would be interesting to see the distribution of channels in a SOM figure.

      As described above, we have now added supplementary figure 1 that accommodates the topography but not the network analyses.

      • Figure legend - typo - "Neither, alpha nor beta" - no comma needed.

      Now fixed, thank you for pointing us to this lapse!

      • Results: "Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with current neurophysiology approaches" not entirely accurate; suggest rephrasing it to "Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with non-invasive neurophysiology approaches"

      Thank you for suggesting the alternative formulation. 

      Changes in manuscript: The text has been modified as per the suggestion and now reads “Here, we aimed to investigate the whole brain circuit representation of beta activity, which is impossible with non-invasive neurophysiology approaches”.

      • Results - typo - "cortical brain areas, that exhibit resting beta activity share a common brain network" - no comma needed.

      Thank you for the suggestion; the comma has been removed to improve the flow of the sentence.

    1. Author response:

      eLife Assessment

      This useful study presents the first detailed and comprehensive description of brain sulcus anatomy of a range of carnivoran species based on a robust manual labeling model allowing species comparisons. Although the database is recognized and the method for reconstructing cortical surfaces is convincing, the evidence supporting the conclusions is incomplete due to the lack of appropriate quantitative measurements and analyses. Considering additional specimens to assess intraspecies variations, as well as exploring the functional correlates of interspecies differences would increase the scope of the study. Setting an instructive foundation for comparative anatomy, this study will be of interest to neuroscientists and neuroimaging researchers interested in that field, as well as in brain morphology and sulcal patterns, their phylogeny, and ontogeny in relation to functional development and behaviour. 

      We are pleased that our primary objective of creating a comprehensive framework to navigate carnivoran brains is considered successfully achieved and that our work is expected to be of broad interest to various disciplines, as it provides the foundation for future investigations into carnivoran brain organization.

      As we will set out below, a description of the major sulci is an appropriate measure for large-scale comparative anatomy: it is stable enough within each species not to require a large N, provides suitable variability across species, and can be related to other aspects of between-species diversity. We will include a number of additional species to increase the scope of the study, as suggested. Although a quantitative assessment of functional correlates is, in principle, beyond the scope of this first foundational paper, we will provide a first step towards this as well. We emphasize, however, that this was a secondary outcome, emerging after the first application of the framework.

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The paper by Boch and colleagues, entitled Comparative Neuroimaging of the Carnivore Brain: Neocortical Sulcal Anatomy, compares and describes the cortical sulci of eighteen carnivore species, and sets a benchmark for future work on comparative brains. 

      Based on previous observations, electrophysiological, histological and neuroimaging studies and their own observations, the authors establish a correspondence between the cortical sulci and gyri of these species. The different folding patterns of all brain regions are detailed, put into perspective in relation to their phylogeny as well as their potential involvement in cortical area expansion and behavioral differences. 

      Strengths: 

      This is a pioneering article, very useful for comparative brain studies and conducted with great seriousness and based on many past studies. The article is well-written and very didactic. The different protocols for brain collection, perfusion, and scanning are very detailed. The images are self-explanatory and of high quality. The authors explain their choice of nomenclature and labels for sulci and gyri on all species, with many arguments. The opening on ecology and social behavior in the discussion is of great interest and helps to put into perspective the differences in folding found at the level of the different cortexes. In addition, the authors do not forget to put their results into the context of the laws of allometry. They explain, for example, that although the largest brains were the most folded and had the deepest folds in their dataset, they did not necessarily have unique sulci, unlike some of the smaller, smoother brains. 

      Weaknesses: 

      The article is aware of its limitations, not being able to take into account inter-individual variability within each species, inter-hemispheric asymmetries, or differences between males and females. However, this does not detract from their aim, which is to lay the foundations for a correspondence between the brains of carnivores so that navigation within the brains of these species can be simplified for future studies. This article does not include comparisons of morphometric data such as sulci depth, sulci wall surface, or thickness of the cortical ribbon around the sulci. 

      We thank the reviewer for their overwhelmingly positive evaluation of our work. As noted by the reviewer, our primary aim was to establish a framework for navigating carnivoran brains to lay the foundation for future research. We are pleased that this objective is deemed successfully achieved.

      As the reviewer points out, we do not quantify within-species inter-individual differences. This is a conscious choice; we aimed to emphasize breadth of species over individuals, as is standard in large-scale comparative anatomy (cf. Heuer et al., 2023, eLife; Suarez et al., 2022, eLife). Following the logic of phylogenetic relationships, the presence of a particular sulcus in related species is also a measure of reliability. We felt safe in this choice, as previous work in both primates and carnivorans has shown that differences in major sulci across individuals are a matter of degree rather than a case of presence or absence (Connolly, 1950, External morphology of the primate brain, C.C. Thomas; Hecht et al., 2019, J Neurosci; Kawamuro, 1971, Acta Anat; Kawamuro & Naito, 1977, Acta Anat). In our revised manuscript, we aim to include some additional individuals of selected species as supplementary material, further illustrating this point.

      We feel that measures such as sulci depth, sulci wall surface, or thickness of the cortical ribbon are measures that vary more across individuals and we have therefore not included them in the study. In addition, these are measures that are not generally used as between-species comparative measures, whereas sulcal patterning is (cf. Amiez et al., 2019, Nat Comms; Connolly, 1950; Miller et al., 2021, Brain Behav Evol; Radinsky 1975, J Mammal; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J. Comp Neurol).

      Reviewer #2 (Public review): 

      Summary: 

      The authors have completed MRI-based descriptions of the sulcal anatomy of 18 carnivoran species that vary greatly in behaviour and ecology. In this descriptive study, different sulcal patterns are identified in relation to phylogeny and, to some extent, behaviour. The authors argue that the reported differences across families reflect behaviour and electrophysiology, but these correlations are not supported by any analyses. 

      Strengths: 

      A major strength of this paper is using very similar imaging methods across all specimens. Often papers like this rely on highly variable methods so that consistency reduces some of the variability that can arise due to methodology. 

      The descriptive anatomy was accurate and precise. I could readily follow exactly where on the cortical surface the authors were referring. This is not always the case for descriptive anatomy papers, so I appreciated the efforts the authors took to make the results understandable for a broader audience.

      I also greatly appreciate the authors making the images open access through their website. 

      Weaknesses: 

      Although I enjoyed many aspects of this manuscript, it is lacking in any quantitative analyses that would provide more insights into what these variations in sulcal anatomy might mean. The authors do discuss inter-clade differences in relation to behaviour and older electrophysiology papers by Welker, Campos, Johnson, and others, but it would be more biologically relevant to try to calculate surface areas or volumes of cortical fields defined by some of these sulci. For example, something like the endocast surface area measurements used by Sakai and colleagues would allow the authors to test for differences among clades, in relation to brain/body size, or behaviour. Quantitative measurements would also aid significantly in supporting some of the potential correlations hinted at in the Discussion. 

      Although quantitative measurements would be helpful, there are also some significant concerns in relation to the specimens themselves. First, almost all of these are captive individuals. We know that environmental differences can alter neocortical development in humans and nonhuman animals and that domestication affects neocortical volume and morphology. Whether captive breeding affects neocortical anatomy might not be known, but it can affect other brain regions and overall brain size and could affect sulcal patterns. Second, despite using similar imaging methods across specimens, fixation varied markedly across specimens. Fixation is unlikely to affect the ability to recognize deep sulci, but variations in shrinkage could nevertheless affect overall brain size and morphology, including the ability to recognize shallow sulci. Third, the sample size = 1 for every species examined. In humans and nonhuman animals, sulcal patterns can vary significantly among individuals. In domestic dogs, it can even vary greatly across breeds. It, therefore, remains unclear to what extent the pattern observed in one individual can be generalized for a species, let alone an entire genus or family. The lack of accounting for inter-individual variability makes it difficult to make any firm conclusions regarding the functional relevance of sulcal patterns.

      We thank the reviewer for their assessment of our work. The primary aim of this study was to establish a framework for navigating carnivoran brains by providing a comprehensive overview of all major neocortical sulci across eighteen different species. Given the inconsistent nomenclature in the literature and the lack of standardized criteria (“recipes”) for identifying the major sulci, we specifically focused on homogenizing the terminology and creating recipes for their identification. Moreover, we also generated digital surfaces of all brains and will also add sulcal masks to further facilitate future research building on our framework. We are pleased to hear that we succeeded in our primary objective.

      We respectfully disagree with the reviewer on two counts, where we believe the reviewer has not fully taken into account the scope of the current work.

      The first is with respect to individual differences. To the best of our knowledge, differences between captive and wild animals, or indeed between individuals, do not affect the presence or absence of any major sulci. No differences in sulcal patterns were detected between captive and (semi-)wild macaques (cf. Sallet et al., 2011, Science; Testard et al., 2022, Sci Adv), different dog breeds (Hecht et al., 2019 J Neurosci) or foxes selectively bred to simulate domestication, compared to controls (Hecht et al., 2021 J. Neurosci). Indeed, we do not find major differences between wolf-like canid species, suggesting that a difference between individuals of the same species is even more unlikely. Nevertheless, we agree with the reviewer that building up a database like ours will benefit from providing as much information about the samples as possible to enable these issues to be tested. We, therefore, will update our table to indicate whether the animals were from captive or wild populations. Moreover, we aim, where possible, to include both wild and captive animals of the same species if they are available in our revision.

      The second is in the quantification of structure/function relationships. We believe the sulci atlases themselves are the main deliverables of this project. We felt it prudent to include some qualitative descriptions of the relationship between sulci as we observed them and behaviours as known from the literature as an illustration of the possibilities that this foundational work opens up. This approach also allowed us to confirm previous findings based on observations from a less diverse range of carnivoran species and families (Radinsky 1968 J Comp Neurol; Radinsky 1969, Ann N Y Acad Sci; Welker & Campos 1963 J Comp Neurol; Welker & Seidenstein, 1959 J Comp Neurol). However, a full statistical framework for analysis is beyond the scope of this paper. Our group has previously worked on methods to quantitatively compare brain organization across species; indeed, we have developed a full framework for doing so (Mars et al., 2021, Annu Rev Neurosci), based on the idea that brains that differ in size and morphology should be compared based on anatomical features in a common feature space. Previously, we have used white matter anatomy (Mars et al., 2018, eLife) and spatial transcriptomics (Beauchamp et al., 2021, eLife). The present work presents the foundation for this approach to be expanded to sulcal anatomy, but the full development of this approach will be the topic of future communications.

      Nevertheless, we aim to include a first-step quantitative analysis of the relationship between the presence and absence of particular sulci and the two behaviours of interest in our manuscript.

      We also would like to emphasize that we strongly believe that looking at measures of brain organization at a more detailed level than brain size or relative brain size is informative. Indeed, studies looking at correlations between brain size and particular behavioural variables, although very prominent in the literature, have found it very difficult to distinguish between competing behavioural hypotheses (Healy, 2021, Adaptation and the brain, OUP). In contrast, connectivity has a much more direct relationship to behavioural differences across species (Bryant et al., 2024, bioRxiv), as does sulcal anatomy (Amiez et al., 2019, Nat Comms; Miller et al., 2021, Brain Behav Evol). Moreover, such measures are less sensitive to the effects of fixation since that will affect brain size but not the presence or absence of a sulcus.

      Following the reviewer’s recommendations, we will endeavour to include an even broader range of species in the revised version.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The authors address a fundamental question for cell and tissue biology using the skin epidermis as a paradigm and ask how stratifying self-renewing epithelia induce differentiation and upward migration in basal dividing progenitor cells to generate suprabasal barrier-forming cells that are essential for a functional barrier formed by such an epithelium. The authors show for the first time that an increase in intracellular actomyosin contractility, a hallmark of barrier-forming keratinocytes, is sufficient to trigger terminal differentiation. Hence the data provide in vivo evidence of the more general interdependency of cell mechanics and differentiation. The data appear to be of high quality and the evidence is strengthened through a combination of different genetic mouse models, RNA sequencing, and immunofluorescence analysis.

      To generate and maintain the multilayered, barrier-forming epidermis, keratinocytes of the basal stem cell layer differentiate and move suprabasally accompanied by stepwise changes not only in gene expression but also in cell morphology, mechanics, and cell position. Whether any of these changes is instructive for differentiation itself and whether consecutive changes in differentiation are required remains unclear. Also, there are few comprehensive data sets on the exact changes in gene expression between different states of keratinocyte differentiation. In this study, through genetic fluorescence labeling of cell states at different developmental time points the authors were able to analyze gene expression of basal stem cells and suprabasal differentiated cells at two different stages of maturation: E14 (embryonic day 14) when the epidermis comprises mostly two functional compartments (basal stem cells and suprabasal so-called intermediate cells) and E16 when the epidermis comprises three (living) compartments where the spinous layer separates basal stem cells from the barrier-forming granular layer, as is the case in adult epidermis. Using RNA bulk sequencing, the authors developed useful new markers for suprabasal stages of differentiation like MafB and Cox1. The transcription factor MafB was then shown to inhibit suprabasal proliferation in a MafB transgenic model.

      The data indicate that early in development at E14 the suprabasal intermediate cells resemble, in terms of RNA expression, the barrier-forming granular layer at E16, suggesting that keratinocytes can undergo either stepwise (E16) or more direct (E14) terminal differentiation.

      Previous studies by several groups found an increased actomyosin contractility in the barrier-forming granular layer and showed that this increase in tension is important for epidermal barrier formation and function. However, it was not clear whether contractility itself serves as an instructive signal for differentiation. To address this question, the authors use a previously published model to induce premature hypercontractility in the spinous layer by using spastin overexpression (K10-Spastin) to disrupt microtubules (MT) thereby indirectly inducing actomyosin contractility. A second model activates myosin contractility more directly through overexpression of a constitutively active RhoA GEF (K10-Arhgef11CA). Both models induce late differentiation of suprabasal keratinocytes regardless of the suprabasal position in either spinous or granular layer indicating that increased contractility is key to induce late differentiation of granular cells. A potential weakness of the K10-spastin model is the disruption of MT as the primary effect which secondarily causes hypercontractility. However, their previous publications provided some evidence that the effect on differentiation is driven by the increase in contractility (Ning et al., Cell Stem Cell, 2021). Moreover, the data are confirmed by the second model directly activating myosin through RhoA. These previous publications already indicated a role for contractility in differentiation but were focused on early differentiation. The data in this manuscript focus on the regulation of late differentiation in barrier-forming cells. These important data help to unravel the interdependencies of cell position, mechanical state, and differentiation in the epidermis, suggesting that an increase in cellular contractility in most apical positions within the epidermis can induce terminal differentiation. Importantly, the authors show that despite contractility-induced nuclear localization of the mechanoresponsive transcription factor YAP in the barrier-forming granular layer, YAP nuclear localization is not sufficient to drive premature differentiation when forced to the nucleus in the spinous layer.

      Overall, this is a well-written manuscript and a comprehensive dataset. Only the RNA sequencing result should be presented more transparently providing the full lists of regulated genes instead of presenting just the GO analysis and selected target genes so that this analysis can serve as a useful repository. The authors themselves have profited from and used published datasets of gene expression of the granular cells. Moreover, some of the previous data should be better discussed. The authors state that forced suprabasal contractility in their mouse models induces the expression of some genes of the epidermal differentiation complex (EDC). However, in their previous publication, the authors showed that major classical EDC genes such as filaggrin and loricrin are actually not regulated (Muroyama and Lechler, eLife, 2017). This should be discussed better and necessitates including the full list of regulated genes to show what exactly is regulated.

      We thank all the reviewers for their suggestions and comments.

      Thank you especially for the reminder to include gene lists. We had an Excel document with all this data but neglected to upload it with the initial manuscript decision. This includes all the gene signatures for the different cell compartments across development. We will also include a page that lists all EDC genes and whether they were up-regulated in intermediate cells and cells in which contractility was induced. Further, we note that all the RNA-Seq datasets are available for use on GEO.

      In our previous publication, we indeed included images showing a lack of change in loricrin and filaggrin in the embryos where spastin was expressed in the differentiated epidermis. Consistent with this, there is no change in Lor mRNA levels by RNA-Seq (it is one of the rare EDC genes that is unchanged). In contrast, Flg mRNA was up in the RNA-Seq, though we didn’t see a dramatic change in protein levels. We have not further pursued whether this reflects translational regulation. That said, our data clearly show that other genes associated with granular fate were increased in the contractile skin.

      Reviewer #2 (Public review): 

      Summary: 

      The manuscript from Prado-Mantilla and co-workers addresses mechanisms of embryonic epidermis development, focusing on the intermediate layer cells, a transient population of suprabasal cells that contributes to the expansion of the epidermis through proliferation. Using bulk RNA-seq, they show that these cells are transcriptionally distinct from the suprabasal spinous cells and identify specific marker genes for these populations. They then use transgenesis to demonstrate that one of these selected spinous layer-specific markers, the transcription factor MafB, is capable of suppressing proliferation in the intermediate layers, providing a potential explanation for the shift of suprabasal cells into a non-proliferative state during development. Further, lineage tracing experiments show that the intermediate cells become granular cells without a spinous layer intermediate. Finally, the authors show that the intermediate layer cells express higher levels of contractility-related genes than spinous layers and overexpression of cytoskeletal regulators accelerates the differentiation of spinous layer cells into granular cells.

      Overall the manuscript presents a number of interesting observations on the developmental stage-specific identities of suprabasal cells and their differentiation trajectories and points to a potential role of contractility in promoting differentiation of suprabasal cells into granular cells. The precise mechanisms by which MafB suppresses proliferation, how the intermediate cells bypass the spinous layer stage to differentiate into granular cells, and how contractility feeds into these mechanisms remain open. Interestingly, while the mechanosensitive transcription factor YAP appears differentially active in the two states, it is shown to be downstream rather than upstream of the observed differences in mechanics.

      Strengths: 

      The authors use a nice combination of RNA sequencing, imaging, lineage tracing, and transgenesis to address the suprabasal to granular layer transition. The imaging is convincing and the biological effects appear robust. The manuscript is clearly written and logical to follow.

      Weaknesses: 

      While the data overall support the authors' claims, there are a few minor weaknesses that pertain to the role of contractility. The choice of spastin overexpression to modulate contractility is not ideal as spastin has multiple roles in regulating microtubule dynamics and membrane transport which could also be potential mechanisms explaining some of the phenotypes. Use of Arhgef11 overexpression mitigates this effect to some extent but overall it would have been more convincing to manipulate myosin activity directly. It would also be important to show that these manipulations increase the levels of F-actin and myosin II as shown for the intermediate layer. It would also be logical to address if further increasing contractility in the intermediate layer would enhance the differentiation of these cells.

      We agree with the reviewer that the development of additional tools to precisely control myosin activity will be of great use to the field. That said, our series of publications has clearly demonstrated that ablating microtubules results in increased contractility and that this phenocopies the effects of Arhgef11-induced contractility (Ning et al, Cell Stem Cell 2021). Further, we showed that these phenotypes were rescued by myosin inhibition with blebbistatin. Our prior publications also showed a clear increase in junctional acto-myosin through expression of either spastin or Arhgef11, as well as increased staining for the tension sensitive epitope of alpha-catenin (alpha-18) (also in Ning et al, 2021). We are not aware of tools that allow direct manipulation of myosin activity that currently exist in mouse models.

      The gene expression analyses are relatively superficial and rely heavily on GO term analyses which are of course informative but do not give the reader a good sense of what kind of genes and transcriptional programs are regulated. It would be useful to show volcano plots or heatmaps of actual gene expression changes as well as to perform additional analyses, for example gene set enrichment and/or transcription factor enrichment analyses, to better describe the transcriptional programs.

      We will include an Excel document that lists all the gene signatures. Additionally, all of our data are deposited in GEO for others to perform their own analyses.

      Claims of changes in cell division/proliferation are made exclusively by quantifying EdU incorporation. It would be useful to more directly look at mitosis. At minimum, Y-axis labels should be changed from "% Dividing cells" to "% EdU+ cells" to more accurately represent findings.

      We will change the axis label to precisely match our analysis.  

      Despite these minor weaknesses the manuscript is overall of high quality, sheds new light on the fundamental mechanisms of epidermal stratification during embryogenesis, and will likely be of interest to the skin research community. 

      Reviewer #3 (Public review): 

      Summary: 

      This is an interesting paper by Lechler and colleagues describing the transcriptomic signature and fate of intermediate cells (ICs), a transient and poorly defined embryonic cell type in the skin. ICs are the first suprabasal cells in the stratifying skin and unlike later-developing suprabasal cells, ICs continue to divide. Using bulk RNA seq to compare ICs to spinous and granular transcriptomes, the authors find that IC-specific gene signatures include hallmarks of granular cells, such as genes involved in lipid metabolism and skin barrier function that are not expressed in spinous cells. ICs were assumed to differentiate into spinous cells, but lineage tracing convincingly shows ICs differentiate directly into granular cells without passing through a spinous intermediate. Rather, basal cells give rise to the first spinous cells. They further show that transcripts associated with contractility are also shared signatures of ICs and granular cells, and overexpression of two contractility inducers (Spastin and ArhGEF-CA) can induce granular and repress spinous gene expression. This contractility-induced granular gene expression does not appear to be mediated by the mechanosensitive transcription factor, Yap. The paper also identifies new markers that distinguish IC and spinous layers and shows the spinous signature gene, MafB, is sufficient to repress proliferation when prematurely expressed in ICs.

      Strengths:

      Overall this is a well-executed study, and the data are clearly presented and the findings convincing. It provides an important contribution to the skin field by characterizing the features and fate of ICs, a much-understudied cell type, at high levels of spatial and transcriptomic detail. The conclusions challenge the assumption that ICs are spinous precursors through compelling lineage tracing data. The demonstration that differentiation can be induced by cell contractility is an intriguing finding and adds to a growing list of examples where cell mechanics influence gene expression and differentiation.

      Weaknesses: 

      A weakness of the study is an over-reliance on overexpression and sufficiency experiments to test the contributions of MafB, Yap, and contractility in differentiation. The inclusion of loss-of-function approaches would enable one to determine if, for example, contractility is required for the transition of ICs to granular fate, and whether MafB is required for spinous fate. Second, whether the induction of contractility-associated genes is accompanied by measurable changes in the physical properties or mechanics of the IC and granular layers is not directly shown. The inclusion of physical measurements would bolster the conclusion that mechanics lies upstream of differentiation.

      We agree that loss of function studies would be useful. For MafB, these have been performed in cultured human keratinocytes, where loss of MafB and its ortholog cMaf results in a phenotype consistent with loss of spinous differentiation (Lopez-Pajares, Dev Cell 2015). Due to the complex genetics involved, generating these double mutant mice is beyond the scope of this study. Loss of function studies of myosin are also complicated by genetic redundancy of the non-muscle type II myosin genes, as well as the role for these myosins in actin cross linking in addition to contractility. In addition, we have found that these myosins are quite stable in the embryonic intestine, with loss of protein delayed by several days from the induction of recombination. Therefore, elimination of myosins by embryonic day e14.5 with our current drivers is not likely possible. Thus, generation of inducible inhibitors of contractility is a valuable future goal.

      A number of recent papers have used AFM of skin sections to probe tissue rigidity. We have not attempted these studies and are unclear about the spatial resolution and whether, in the very thin epidermis at these stages, we could spatially resolve differences. That said, we previously assessed the macro-contractility of tissues in which myosin activity was induced and demonstrated that there was a significant increase in this over a tissue-wide scale (Ning et al, Cell Stem Cell, 2021).

      Finally, whether the expression of granular-associated genes in ICs provides them with some sort of barrier function in the embryo is not addressed, so the role of ICs in epidermal development remains unclear. Although not essential to support the conclusions of this study, insights into the function of this transient cell layer would strengthen the overall impact.  

      By traditional dye penetration assays, there is no epidermal barrier at the time that intermediate cells exist. One interpretation of the data is that cells are beginning to express mRNAs (and in some cases, proteins) so that they are able to rapidly generate a barrier as they become granular cells. We have attempted experiments to ablate intermediate cells with DTA expression - this resulted in inefficient and delayed cell death and thus did not yield strong conclusions. Our findings that transcriptional regulators of granular differentiation (such as Grhl3 and Hopx) are also present in intermediate cells should allow future analysis of the effects of their ablation on the earliest stages of granular differentiation from intermediate cells.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This paper aims to address the establishment and maintenance of neural circuitry in the case of a massive loss of neurons. The authors used genetic manipulations to ablate the principal projection neurons, the mitral/tufted cells, in the mouse olfactory bulb. Using diphtheria toxin (Tbx21-Cre:: loxP-DTA line) the authors ablated progressively large numbers of M/T cells postnatally. By injecting diphtheria toxin (DT) into the Tbx21-Cre:: loxP-iDTR line, the authors were able to control the timing of the ablation in the adult stage. Both methods led to the successful elimination of a majority of M/TCs by 4 months of age. The authors made a few interesting observations. First, they found that the initial pruning of the remaining M/T cell primary dendrite was unaffected. However, in adulthood, a significant portion of these cells extended primary dendrites to innervate multiple glomeruli. Moreover, the incoming olfactory sensory neuron (OSN) axons, as examined for those expressing the M72 receptor, showed a divergent innervation pattern as well. The authors conclude that M/T cell density is required to maintain the dendritic structures and the olfactory map. To address the functional consequences of eliminating a large portion of principal neurons, the authors conducted a series of behavioral assays. They found that learned odor discrimination was largely intact. On the other hand, mating and aggression were reduced. The authors concluded that learned behaviors are more resilient than innate ones.

      The study is technically sound, and the results are clear-cut. The most striking result is the contrast between the normal dendritic pruning during early development and the expanded dendritic innervation in adulthood. It is a novel discovery that can lead to further investigation of how the single-glomerulus dendritic innervation is maintained. The authors conducted a few experiments to address potential mechanisms, but it is inconclusive, as detailed below. It is also interesting to see that the massive neuronal loss did not severely impact learned odor discrimination. This result, together with previous studies showing nearly normal odor discrimination in the absence of large portions of the olfactory bulb or scrambled innervation patterns, attests to the redundancy and robustness of the sensory system. The discussion should take into account these other studies in a historical context.

      Main comments:

      (1) In previous studies, it has been concluded that dendritic pruning unfolds independently, regardless of the innervation pattern or activity of the OSNs. The new observation bolsters this conclusion by showing that a loss of neighboring M/T cells does not affect the developmental process. A more nuanced discussion comparing the results of these studies would strengthen the paper.

      We thank the reviewer for the suggestion. We now include an extended discussion citing relevant previous works in the manuscript (Lines 351-374).

      (2) The authors propose that a certain density of M/T is required to prevent the divergent innervation of primary dendrites, but the evidence is not sufficient to support this proposal. The experiment with low-dose DT injection to ablate a smaller portion of M/T cells did not change the percentage of cells innervating two or more glomeruli. The authors suggest that a threshold must be met, but this threshold is not determined.  

      In our experiments using high-dose DT, we hypothesized that there may be many empty glomeruli (glomeruli not innervated by M/T cells), and as a result, that some of the remaining M/T cells could branch their apical dendrite tuft into multiple empty glomeruli. To test this hypothesis, we carried out another experiment using a lower dose of DT. In this experiment, the fraction of remaining M/T cells was 25% (~10,000 M/T cells), which was higher than with the high DT dose (5%, or around 2,000 M/T cells), but still significantly lower than in wild type mice (~40,000 M/T cells). With around 2,000 glomeruli and 10,000 M/T cells per bulb, it could be expected that each glomerulus would be innervated by ~5 M/T cells (on average). However, we found that the percentage of M/T cells projecting to multiple glomeruli (around 40%) was similar when either 10,000 or 2,000 M/T cells remained in the bulb. In addition, it is important to emphasize that even in wt animals with a full set of M/T cells, a small percentage of M/T cells still innervate more than one glomerulus (Lin et al., 2000). Together, these observations suggest that the innervation of multiple glomeruli by M/T cells is not simply due to the presence of empty glomeruli, and that our hypothesis was not correct.

      We have added a comment explaining this issue in the Results section (Lines 200-203).
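
      The expected-occupancy arithmetic in the response above can be reproduced with a minimal sketch. It uses only the approximate cell and glomerulus counts quoted in the response and assumes, purely for illustration, that the remaining M/T cells are spread evenly across glomeruli; the function and variable names are hypothetical.

```python
# Approximate counts quoted in the author response (round numbers, not measurements).
GLOMERULI_PER_BULB = 2_000
WILD_TYPE_MT_CELLS = 40_000


def fraction_remaining(remaining_mt_cells, wild_type=WILD_TYPE_MT_CELLS):
    """Fraction of the wild-type M/T population that is still present."""
    return remaining_mt_cells / wild_type


def mt_cells_per_glomerulus(remaining_mt_cells, glomeruli=GLOMERULI_PER_BULB):
    """Average M/T cells per glomerulus, assuming an even spread (an assumption)."""
    return remaining_mt_cells / glomeruli


conditions = {"wild type": 40_000, "low-dose DT": 10_000, "high-dose DT": 2_000}
for name, cells in conditions.items():
    print(
        f"{name}: {fraction_remaining(cells):.0%} remaining, "
        f"about {mt_cells_per_glomerulus(cells):.0f} M/T cells per glomerulus"
    )
```

      Under these assumptions the low-dose condition still leaves several M/T cells per glomerulus on average, which is why a similar rate of multi-glomerular innervation at both doses argues against the empty-glomerulus explanation.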

      (3) The authors suggest that neural activity is not required for this plasticity. The evidence was derived primarily from naris occlusion and neuronal silencing using Kir2.1. While the results are consistent with the notion, it is a rather narrow interpretation of how neural activity affects circuit configuration. Perturbation of neural activity also entails an increase in firing. Inducing the activity of the neurons may alter this plasticity. Silencing per se may induce a homeostatic response that expands the neurite innervation pattern to increase synaptic input to compensate for the loss of activity. Thus, further silencing the cells may not reduce multiglomerular innervation, but an increased activity may.

      The experiments with Kir2.1 demonstrate that the structural plasticity observed after reducing the total number of M/T cells in an animal is not regulated by the firing of action potentials in the remaining cells. Instead, this experiment indicates that the observed structural plasticity may be regulated by other types of mechanisms (including increased synaptic excitation as suggested by the reviewer) that do not require the firing of action potentials in M/T cells.

      We now have included a comment regarding this point (Lines 243-247).  

      (4) There is a discrepancy between this study and the one by Fujimoto et al. (Developmental Cell; 2023), which shows that not only glutamatergic inputs to the primary dendrite can facilitate pruning of remaining dendrites but also Kir2.1 overexpression can significantly perturb dendritic pruning. This discrepancy is not discussed by the authors.

      We agree that it would be useful to contrast these two works.

      In our experiments, performed in adult animals, we blocked sensory input by performing naris occlusion before we induced ablation of M/T cells. In a separate experiment, also in adult animals, we expressed the Kir2.1 channel to reduce the ability of neurons to fire action potentials. With both types of manipulations, we observed that the ablation of a large fraction of M/T cells still caused the remaining M/T cells to maintain a single apical dendrite that sprouts several new tufts towards multiple glomeruli. A recent paper (Fujimoto et al., 2023) in which Kir2.1 was expressed in a large percentage of M/T cells starting during embryonic development showed that these “silent” M/T cells failed to prune their arbors to a single dendrite. In aggregate, these observations indicate that action potentials are necessary for the normal pruning that occurs during perinatal development (Fujimoto et al., 2023), but are not required for the expansion of dendritic trees caused by ablating a large fraction of M/T cells in adult animals (our current manuscript).

      We have now explained the differences between both studies in the manuscript (Lines 427-439).

      (5) An alternative interpretation of the discrepancy between the apparent normal pruning by P10 and expanded dendritic innervation in adulthood is that there are more cells before P10, when ~25% of M/T cells are present, but at a later date only 1-3% are present.

      The relationship between the number of M/T cells and single glomerulus innervation has not been explored during postnatal development. It would be important to test this hypothesis.

      We agree with this comment, and in lines 375-381 we discuss the discrepancy between normal refinement during development, and dendritic sprouting in adults.

      Cre is expressed in M/T cells and induces DTA expression starting around P0. The elimination of M/T cells starts at this time and continues until P10, by which time more than 75% of M/T cells have been eliminated. At P21, more than 90% of M/T cells have been eliminated, and their number remains stable thereafter.

      Pruning of the dendrites of M/T cells starts at P0 and it is mostly complete by P10. Therefore, it is possible that between P0 and P7, when dendrites are being pruned, the number of M/T cells remaining in the bulb is still over a threshold that does not interfere with the process of normal dendrite pruning. We agree that it would be very informative to perform additional experiments in the future where a large set of M/T cells could be ablated before pruning occurs (ideally before P0). 

      (6) The authors attribute the change in the olfactory map to the loss of M/T cells. Another obvious possibility is that the diffused projection is a response to the change in the olfactory bulb size. With less space to occupy, the axons may be forced to innervate neighboring glomeruli. It is not known how the total number of glomeruli is affected. This question could be addressed by tracking developmental changes in bulb volume and glomerular numbers.

      Certainly, this is a possibility, and we have now included a comment in this regard in the manuscript (Lines 473-480).

      We believe that there are three likely scenarios that could account for these observations:

      (a) After ablating M/T cells, the tufts of the remaining M/T cells sprout into multiple glomeruli, and this causes the axons of OSNs to project into multiple glomeruli.

      (b) Ablating M/T cells may cause changes in other OB cells that make synapses in the glomeruli (ETCs, PGCs, sAC, etc…), and the misrouting of OSN axons that we observed in our experiments may be a secondary effect caused by the elimination of M/T cells.

      (c) After ablating the majority of M/T cells, the olfactory bulb gets reduced in size, and the axons of OSNs find it difficult to precisely converge on a target that now has become smaller. As a result, the axons of OSNs fail to converge on single glomeruli.

      (7) The retained ability to discriminate odors upon reinforced training is not surprising in light of a number of earlier studies. For example, Slotnick and colleagues have shown that rats losing ~90% of the OB can retain odor discrimination. Weiss et al have shown that humans without an olfactory bulb can perform normal olfactory tasks. Gronowitz et al have used theoretical prediction and experimental results to demonstrate that perturbing the olfactory map does not have a major impact on olfactory discrimination. Fleischmann et al have shown that mice with a monoclonal nose can discriminate odors. The authors should discuss their results in these contexts.

      We apologize for this important oversight - we now include a more elaborate discussion, including the relevant references as suggested, in the manuscript (Lines 483-496).

      (8) It should be noted that odor discrimination resulting from reinforcement training does not mean normal olfactory function. It is a highly artificial situation as the animals are overtrained. It should not be used as a measure of the robustness of the olfactory sense. Natural odor discrimination (without training), detection threshold, and innate appetitive/aversive response to certain odors may be affected. These experiments were not conducted.

      We agree that the standard tests commonly used to measure olfactory function require substantial training, and thus, are quite artificial. However, these tests are used because they allow a more precise quantification of olfactory function than those relying on natural behaviors.  

      We have now included a few sentences to address this point in the results (Lines 321-322) and discussion sections (Lines 541-543).

      (9) The social behaviors were conducted using relatively coarse measures (vaginal plug and display of aggression). Moreover, these behaviors are most likely affected by the disruption of the AOB mitral cells and have little to do with the dendritic pruning process described in the paper. It is misleading to lump social behaviors with innate responses to odors.

      This point follows the same logic as the previous one. The olfactory tests that rely on natural behaviors are quite coarse and difficult to quantify. In contrast, the olfactory tests using apparatuses such as olfactometers can be quantified with precision, but they are artificial. We agree that some of the naturalistic behaviors that we studied such as mating or aggression may depend to a large extent on the AOB (although it is possible that the MOB may also be involved in these tasks to a degree). In our initial version of the manuscript, we commented on the anticipated relative involvement of the MOB and AOB in the studied tasks, but we have now added some additional sentences to make this point clearer. In addition, we now add a comment indicating that it is possible that the abnormal behaviors could simply be due to a reduction in the number of AOB M/T cells (~98.5% and ~ 85% elimination of M/T cells in the AOB in Tbx::DTA and Tbx::iDTR mice, respectively), regardless of the abnormal dendritic pruning of main OB M/T cells (Lines 530-534).

      See Figure 5E - M/T cells in AOB (Lines 1238-1239). 

      Reviewer #2 (Public Review):

      The authors make the interesting observation that the developmental refinement of apical M/T cell dendrites into individual glomeruli proceeds normally even when the majority of neighboring M/T cells are ablated. At later stages, the remaining neurons develop additional dendrites that invade multiple glomeruli ectopically, and similarly, OSN inputs to glomeruli lose projection specificity as well. The authors conclude that the normal density of M/T neurons is not required for developmental refinement, but rather for maintaining specific connectivity in adults.

      The observations are indeed quite striking; however, the authors' conclusions are not entirely supported by the data.

      (1) It is unclear whether the expression of diphtheria toxin that eventually leads to the ablation of the large majority of M/T neurons compromises the cell biology of the remaining ones.

      DT is an extremely potent toxin that kills cells by inhibiting protein translation, and it has been demonstrated that the presence of a single DT molecule in a cell is sufficient to kill it, because of its highly efficient catalytic activity. Accordingly, previous experiments have shown that DT kills cells within a few hours after its appearance in the cytoplasm (Yamaizumi et al., 1978). In other words, all the published evidence suggests that if a cell is exposed to the action of DT, that cell will die shortly. There is no evidence that cells exposed to DT can survive and experience long-term effects. Finally, previous works have not observed any long-term changes in neurons directly caused by the actions of DT (Johnson et al., 2017).

      (2) The authors interpret the growth of ectopic dendrites later in life as a lack of maintenance of dendrite structure; however, maybe the observed changes reflect actually adaptations that optimize wiring for extremely low numbers of M/T neurons. The finding that olfactory behavior was less affected than predicted supports this interpretation.

      We do not know the cellular or molecular mechanisms that explain why reducing the density of M/T cells is followed by the growth of ectopic dendrites from the remaining M/T cells. We agree that the functional outcome of growing ectopic dendrites may result in an optimization of wiring in the bulb and could explain why olfactory function is relatively preserved. We now include a comment regarding this possibility (Lines 513-525).   

      (3) The number of remaining M/T neurons is much higher at P10 than later. Can the relatively large number of remaining neurons (or their better health status) be the reason that dendrites refine normally at the early developmental stages rather than a (currently unknown) developmental capacity that preserves refinement?

      We thank the reviewer for the suggestion, which was also raised by reviewer 1. 

      We agree with this comment, and in lines 375-381 we discuss the discrepancy between normal refinement during development, and dendritic sprouting in adults.

      Cre is expressed in M/T cells and induces DTA expression starting around P0. The elimination of M/T cells starts at this time and continues until P10, by which time more than 75% of M/T cells have been eliminated. At P21, more than 90% of M/T cells have been eliminated, and their number remains stable thereafter.

      Pruning of the dendrites of M/T cells starts at P0 and it is mostly complete by P10. Therefore, it is possible that between P0 and P7, when dendrites are being pruned, the number of M/T cells remaining in the bulb is still over a threshold that does not interfere with the process of normal dendrite pruning. We agree that it would be very informative to perform additional experiments in the future where a large set of M/T cells could be ablated before pruning occurs (ideally before P0). 

      (4) While the effect of reduced M/T neuron density on both M/T dendrites and OSN axons is described well, the relationship between both needs to be characterized better: Is one effect preceding the other or do they occur simultaneously? Can one be the consequence of the other?

      Previous works have demonstrated that disrupting the topographic projection of the OSN axons has no effect on the structure of the apical dendrite of M/T cells (Ma et al., 2014; Nishizumi et al., 2019). Our experiments ablating a large fraction of M/T cells suggest that they are necessary for the correct targeting of OSN axons into the bulb. However, our experiments do not allow us to tell apart these 2 scenarios: 

      (a) the ablation of a large fraction of M/T cells directly causes the sprouting of the apical dendrite of M/T cells, and that this sprouting in turn causes the abnormal projection of OSN axons onto the bulb. 

      (b) the ablation of a large fraction of M/T cells first causes the axons of OSN to project abnormally onto multiple glomeruli in the bulb, and this in turn causes the dendrite of remaining M/T cells to sprout onto multiple glomeruli. 

      We now include a comment in the manuscript explaining this point (Lines 473-492).

      (5) Page 7: the observation that not all neurons develop additional dendrites is not a sign of differences between cell types, it may be purely stochastic.

      This is correct, and we mention these 2 scenarios in the discussion (Lines 407-408).

      (6) Page 8: the fact that activity blockade did not affect the formation of ectopic dendrites does not suggest that the process is not activity-dependent: both manipulations have the same effect and may just mask each other.

      The experiments with Kir2.1 demonstrate that the structural plasticity observed after reducing the total number of M/T cells in an animal is not regulated by the firing of action potentials in the remaining cells. Instead, this experiment indicates that the observed structural plasticity may be regulated by other types of mechanisms (including increased synaptic excitation as suggested by the reviewer) that do not require the firing of action potentials in M/T cells.

      We now have included a comment regarding this point (Lines 243-247).  

      (7) It remains unclear how the observed structural changes can explain the behavioral effects.

      We agree that the relationship between structural changes and behavior was not appropriately explained in our manuscript. Our manipulations cause two kinds of major changes in the olfactory system: one primary change and several secondary changes. The primary change is a large reduction in the number of M/T cells both in the MOB and AOB. This reduction in M/T cell number triggers significant secondary changes in the connectivity of the bulb, including an abnormal projection of OSNs onto the OB, and the growth of ectopic dendrites from the remaining M/T cells into multiple glomeruli.

      The behavioral abnormalities displayed by these mice are ultimately caused by the reduction in the number of M/T cells, but it is likely that the secondary structural changes could regulate some of the behavioral phenomena that we observed. For example, in principle, it is possible that the ectopic dendrites innervating several glomeruli could help the bulb to perceive smells with a much reduced number of M/T cells. On the other hand, this promiscuous growth of dendrites into multiple glomeruli could make it more difficult for the animals to discriminate between smells. The same argument could be made about the fact that OSN axons project onto multiple glomeruli: we simply do not know if this change helps or makes it more difficult for the animal to detect smells.

      We now include a comment regarding this issue (Lines 513-525).   

      Reviewer #1 (Recommendations For The Authors):

      Additional experiments and a more thorough discussion of the results, as suggested in the public review, would significantly strengthen the paper. Below are some specific parts that need to be addressed.

      There is a lack of information on how M/T cell numbers are quantified. Without the information, it is difficult to evaluate the claim. Using the tdTomato signal may miss cells that are not labeled due to the transgenic effect. 

      Although we cannot conclude that we are identifying the complete set of M/T cells (because the transgenic lines may fail to label some M/T cells), the number of M/T cells that we observed is similar to that previously reported (Richard et al., 2010). This concern has been included in the Results section (Lines 121-124).

      A more detailed description of M/T cell quantification has been added to the Methods section (Lines 627-632).

      There is a lack of information on the timeline of treatment and how measurement of the olfactory bulb volume is conducted.

      We now include a more detailed description of how the volume of the OB was measured in the methods (Lines 621-623).

      The volume measurement is inconsistent with the pictures shown. In Figure 1, supplemental data 2 panels B and C, it appears that the bulbs in DTA and DTR mice are about half in length in each dimension. This would translate into ~1/8 of the volume of the control mice.

      We measured the volume of the bulbs based on the Neurolucida reconstructions, and we observed that in both DTA and iDTR mice the volumes of their bulbs are roughly 50% compared to a wild type mouse. In Figure 1 - figure supplement 2 the sections that were shown for wild type, DTA and iDTR mice were not taken at the same position in the bulb, and this gave the impression that the bulbs from DTA and iDTR were much smaller than they really are. We now show sections for these three animals at equivalent positions in the bulb. 
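
      The scaling argument behind this exchange can be checked with a short sketch. It assumes, only for illustration, that the bulb shrinks uniformly in all three dimensions, so that volume scales with the cube of the linear scale; the function names are hypothetical.

```python
def volume_ratio_from_linear_scale(linear_scale):
    """Relative volume when every linear dimension is multiplied by linear_scale."""
    return linear_scale ** 3


def linear_scale_from_volume_ratio(volume_ratio):
    """Relative linear size implied by a relative volume, under uniform shrinkage."""
    return volume_ratio ** (1.0 / 3.0)


# Reviewer's reading of the original panels: each dimension about half as long.
print(volume_ratio_from_linear_scale(0.5))

# Authors' measurement: volume roughly 50% of wild type.
print(round(linear_scale_from_volume_ratio(0.5), 2))
```

      Under this assumption, a bulb with half the wild type volume would be only about 20% shorter along each axis, so sections taken at equivalent positions should look far less shrunken than the original panels suggested.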

      Figure 1 E and F have no legend.

      We apologize for this mistake - we have now added the legend for Figures 1E and F (Lines 1009-1013).

      Figure 3, supplemental data 2, it is not clear what the readers should be looking at. The data is confusing even for experts in the field. The authors should describe the figures more clearly, pointing out what they are supposed to show.

      We apologize for this, and we have now added a more detailed description of Figure 3 - figure supplement 2 (Lines 1153-1167).

      In several figures, it is not clearly written what the comparisons were for where there are indications of statistical significance above the bars.

      We have now included a more detailed description of the statistical comparisons in the figure legends.

      AAV serotype should be specified.

      The AAV serotype used to label M/T cells was AAV-PHP.eB. We have added this information to the methods section of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Minor points

      Page 5, para 2: "The decrease in neuronal plasticity with age": it is unclear what "the decrease" refers to.

      We have changed this sentence in the text to make it clear:

      “The decrease in structural plasticity of M/T cells after apical dendrite refinement (Mizrahi and Katz, 2003),….”

      Line 146-148

      Is there a quantification of the effect of Kir2.1 overexpression alone (example shown in Figure 3D)?

      We did an experiment in iDTR animals in which a fraction of M/T cells expressed Kir2.1, and we split these animals into 2 groups: (a) animals that received an injection of DT, and (b) animals that did not receive any DT. We quantified the effect of Kir2.1 on M/T cells from animals that received DT injection (with an ablation of around 90% of M/T cells), and we did not observe any clear statistically significant differences between cells expressing Kir2.1 and neurons that did not express Kir2.1 from other iDTR animals that also received DT injections. We did not quantify the possible effects of Kir2.1 in the group of animals that did not receive DT because on a first inspection we did not observe any clear differences between Kir2.1 cells and neighboring wild type cells.

      References

      Fujimoto S, Leiwe MN, Aihara S, Sakaguchi R, Muroyama Y, Kobayakawa R, Kobayakawa K, Saito T, Imai T. 2023. Activity-dependent local protection and lateral inhibition control synaptic competition in developing mitral cells in mice. Dev Cell S1534-5807(23)00237-X. doi:10.1016/j.devcel.2023.05.004

      Johnson RE, Tien N-W, Shen N, Pearson JT, Soto F, Kerschensteiner D. 2017. Homeostatic plasticity shapes the visual system’s first synapse. Nat Commun 8:1220. doi:10.1038/s41467-017-01332-7

      Lin DM, Wang F, Lowe G, Gold GH, Axel R, Ngai J, Brunet L. 2000. Formation of precise connections in the olfactory bulb occurs in the absence of odorant-evoked neuronal activity. Neuron 26:69–80. doi:10.1016/s0896-6273(00)81139-3

      Ma L, Wu Y, Qiu Q, Scheerer H, Moran A, Yu CR. 2014. A developmental switch of axon targeting in the continuously regenerating mouse olfactory system. Science 344:194–197. doi:10.1126/science.1248805

      Nishizumi H, Miyashita A, Inoue N, Inokuchi K, Aoki M, Sakano H. 2019. Primary dendrites of mitral cells synapse unto neighboring glomeruli independent of their odorant receptor identity. Commun Biol 2:1–12. doi:10.1038/s42003-018-0252-y

      Richard MB, Taylor SR, Greer CA. 2010. Age-induced disruption of selective olfactory bulb synaptic circuits. Proc Natl Acad Sci U S A 107:15613–15618. doi:10.1073/pnas.1007931107

      Yamaizumi M, Mekada E, Uchida T, Okada Y. 1978. One molecule of diphtheria toxin fragment A introduced into a cell can kill the cell. Cell 15:245–250. doi:10.1016/0092-8674(78)90099-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In their manuscript, "Nicotine enhances the stemness and tumorigenicity in intestinal stem cells via Hippo-YAP/TAZ and Notch signal pathway", authors Isotani et al claimed that this study identifies a NIC-triggered pathway regulating the stemness and tumorigenicity of ISCs and suggest the use of DBZ as a potential therapeutic strategy for treating intestinal tumors. However, the presented data do not support the primary claims.

      Weaknesses:

      My main reservation is that the quality of the results presented in the manuscript may not fully substantiate their conclusions. For instance, in Figure 2 A and B, it is challenging to discern a healthy organoid. This is significant, as the entirety of Figure 2 and several panels in Figures 3 - 5 are based on these organoid assays. Additionally, there seems to be a discrepancy in the quality of results from the western blot, as the lanes of actin do not align with other proteins (Figure 6B).

      We count organoids directly under the microscope, as described previously (Igarashi et al., Cell 2016; Igarashi et al., Aging Cell 2019). When counting, we can clearly discern which organoids are alive and which are dead. Hence, we will detail the method and use arrows to indicate live and dead organoids in our revised version (Figure 2A and B).

      Moreover, as Reviewer 1 pointed out, the number of organoids originating from intestinal or colonic crypts can be affected by dead organoids, as in Figure 2A and 2B. However, almost all colonies from isolated intestinal stem cells (ISCs) (Figure 2C and D) are alive, so colony counts are less affected by dead colonies in the experiments using isolated ISCs. Since all organoid data in Figures 3-5 are based on the same method as Figure 2C and D, the data quality of Figures 3-5 cannot be affected by dead colonies.

      Finally, to improve the data quality of Figure 6B, we repeated this experiment and replaced the panel with a new figure.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Isotani et al characterizes the hyperproliferation of intestinal stem cells (ISCs) induced by nicotine treatment in vivo. Employing a range of small molecule inhibitors, the authors systematically investigated potential receptors and downstream pathways associated with nicotine-induced phenotypes through in vitro organoid experiments. Notably, the study specifically highlights a signaling cascade involving α7-nAChR/PKC/YAP/TAZ/Notch as a key driver of nicotine-induced stem cell hyperproliferation. Utilizing a Lgr5CreER Apcfl/fl mouse model, the authors extend their findings to propose a potential role of nicotine in stem cell tumorigenesis. The study posits that Notch signaling is essential during this process.

      Strengths and Weaknesses:

      One noteworthy research highlight in this study is the indication, as shown in Figure 2 and S2, that the trophic effect of nicotine on ISC expansion is independent of Paneth cells. In the Discussion section, the authors propose that this independence may be attributed to distinct expression patterns of nAChRs in different cell types. To further substantiate these findings, it is suggested that the authors perform tissue staining of various nAChRs in the small intestine and colon. This additional analysis would provide more conclusive evidence regarding how stem cells uniquely respond to nicotine. It is also recommended to present the staining of α7-nAChR from different intestinal regions. This will provide insights into the primary target sites of nicotine in the gut tract. Additionally, it is recommended that the authors consider rephrasing the conclusion in this section (lines 123-124). The current statement implies that nicotine does not affect Paneth cells, which may be inaccurate based on the suggestion in line 275 that nicotine might influence Paneth cells through α2β4-nAChR. Providing a more nuanced conclusion would better reflect the complexity of nicotine's potential impact on Paneth cells.

      It was difficult to obtain nAChR antibodies suitable for immunostaining. Hence, we instead performed qPCR of nAChRs in ISCs and Paneth cells isolated from whole small intestine (new Figure 3C), although this method cannot reveal differences in nAChR expression between intestinal regions. Although comparatively high expression of α7-nAChR and α8-nAChR was observed in both ISCs and Paneth cells, no significant difference between ISCs and Paneth cells was observed (Figure 3C).

      Interestingly, nicotine up-regulated only the expression of α7-nAChR in ISCs, suggesting a specific response of α7-nAChR to nicotine (Figures 3C and D). We rephrased the conclusion of the paragraph according to the reviewer’s suggestion.

      As shown in the same result section, the effect of nicotine on ISC organoid formation appears to be independent of CHIR99021, a Wnt activator. Despite this, the authors suggest a potential involvement of Wnt/β-catenin activation downstream of nicotine in Figure 4F. In the Lgr5CreER Apcfl/fl mouse model, it is known that APC loss results in a constitutive stabilization of β-catenin, thus the hyperproliferation of ISCs by nicotine treatment in this mouse model is likely beyond Wnt activation. Therefore, it is recommended that the authors reconsider the inclusion of Wnt/β-catenin as a crucial signaling pathway downstream of nicotine, given the experimental evidence provided in this study.

      We appreciate this important suggestion. Certainly, Wnt/β-catenin was activated in nicotine-treated ISCs. However, as the reviewer points out, the hyperproliferation of ISCs by nicotine treatment is likely beyond Wnt activation. According to the reviewer’s suggestion, we removed Wnt/β-catenin as a crucial signaling pathway downstream of nicotine (Figure 5G).

      In Figure 4, the authors investigate ISC organoid formation with a panPKC inhibitor, revealing that PKC inhibition blocks nicotine-induced ISC expansion. It's noteworthy that PKC inhibitors have historically been used successfully to isolate and maintain stem cells by promoting self-renewal. Therefore, it is surprising to observe no effect or reversal effect on ISCs in this context. A previous study demonstrated that the loss of PKCζ leads to increased ISC activity both in vivo and in vitro (DOI: 10.1016/j.celrep.2015.01.007). Additionally, to strengthen this aspect of the study, it would be beneficial for the authors to present more evidence, possibly using different PKC inhibitors, to reproduce the observed results with Gö 6983. This could help address potential concerns or discrepancies and contribute to a more comprehensive understanding of the role of PKC in nicotine-induced ISC expansion.

      Gö 6983 is a pan-PKC inhibitor of PKCα, PKCβ, PKCγ, PKCδ and PKCζ, with IC50 values of 7 nM, 7 nM, 6 nM, 10 nM and 60 nM, respectively. Since we used Gö 6983 at a concentration of 10 nM in our experiment, we consider that PKCζ may not be a possible target of nicotine. Additionally, we tested 5 nM Sotrastaurin, another pan-PKC inhibitor, which is not expected to affect PKCζ. The result observed with Gö 6983 was reproduced with Sotrastaurin (Supplemental Figure 3E).
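      The dose argument above can be made concrete with a rough calculation. The sketch below assumes a simple one-site inhibition model with a Hill coefficient of 1, in which the fraction of enzyme inhibited is [I] / ([I] + IC50). This model, and the function and variable names, are illustrative assumptions and are not taken from the manuscript; only the IC50 values and the 10 nM concentration come from the response above.

```python
# Rough estimate of fractional inhibition at a given inhibitor concentration,
# assuming a one-site model with Hill coefficient 1 (an illustrative assumption).

# IC50 values (nM) for Gö 6983 as quoted in the response above.
IC50_NM = {"PKCα": 7.0, "PKCβ": 7.0, "PKCγ": 6.0, "PKCδ": 10.0, "PKCζ": 60.0}


def fraction_inhibited(conc_nm, ic50_nm):
    """Fraction of enzyme activity inhibited under the simple one-site model."""
    return conc_nm / (conc_nm + ic50_nm)


def inhibition_profile(conc_nm, ic50_table=IC50_NM):
    """Map each isoform to its estimated fractional inhibition at conc_nm."""
    return {name: fraction_inhibited(conc_nm, ic50) for name, ic50 in ic50_table.items()}


if __name__ == "__main__":
    for isoform, frac in inhibition_profile(10.0).items():
        print(f"{isoform}: {frac:.0%} inhibited at 10 nM")
```

      Under these assumptions, 10 nM inhibits PKCδ by about half and PKCζ by roughly one seventh, which is in line with the reasoning that PKCζ is only weakly affected at this dose.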

      An additional avenue that could enhance the clinical relevance of the study is the exploration of human datasets. Specifically, leveraging scRNA-seq datasets of the human intestinal epithelium (DOI: 10.1038/s41586-021-03852-1) could provide valuable insights. Analyzing the expression patterns of nAChRs across diverse regions and cell types in the human intestine may offer a potential clinical implication.

      We analyzed the distribution pattern of nAChRs using scRNA-seq datasets of the human intestinal epithelium (DOI: 10.1038/s41586-021-03852-1). Consistent with the mouse data (Figure 3C), the expression of human α7-nAChR is higher than that of other nAChRs. The difference in expression between ISCs and Paneth cells is not as clear as in mouse (Supplemental Figure 4A and B). From the mouse and human data, we speculate that the induction of a specific nAChR by nicotine, rather than the distribution of nAChRs, is the essence of the ISC response to nicotine.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript could benefit from addressing a few minor points to enhance its quality before publication:

      (1) Ensure all images are presented in higher resolution to improve visual clarity.

      We replaced all images with higher-resolution versions.

      (2) Quantify Western blot results accurately for rigor and precision in data representation.

      We quantified all blots.

      (3) Include error bars in control groups where missing, particularly in Figures 3C and 4D, to enhance data interpretation.

      We included error bars in the control groups in the new Figures 3C and 4D.

      (4) The layout of Figure S3B, S4A and S4B should be corrected.

      We corrected the layout of those Figures.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Petty and Bruno investigate how response characteristics in the higher-order thalamic nuclei POm (typically somatosensory) and LP (typically visual) change when a stimulus (whisker air puff or visual drifting grating) of one or the other modality is conditioned to a reward. Using a two-step training procedure, they developed an elegant paradigm, where the distractor stimulus is completely uninformative about the reward, which is reflected in the licking behavior of trained mice. While the animals seem to take to the tactile stimulus more readily, they can also associate the reward with the visual stimulus, ignoring tactile stimuli. In trained mice, the authors recorded single-unit responses in both POm and LP while presenting the same stimuli. The authors first focused on POm recordings, finding that in animals with tactile conditioning POm units specifically responded to the air puff stimulus but not the visual grating. Unexpectedly, in visually conditioned animals, POm units also responded to the visual grating, suggesting that the responses are not modality-specific but more related to behavioral relevance. These effects seem not to be homogeneously distributed across POm, whereas lateral units maintain tactile specificity and medial units respond more flexibly. The authors further ask if the unexpected cross-modal responses might result from behavioral activity signatures. By regressing behavior-coupled activity out of the responses, they show that late activity indeed can be related to whisking, licking, and pupil size measures. However, cross-modal short latency responses are not clearly related to animal behavior. Finally, LP neurons also seem to change their modality-specificity dependent on conditioning, whereas tactile responses are attenuated in LP if the animal is conditioned to visual stimuli.

      The authors make a compelling case that POm neurons are less modality-specific than typically assumed. The training paradigm, employed methods, and analyses are mostly to the point, well supporting the conclusions. The findings importantly widen our understanding of higher-order thalamus processing features with the flexibility to encode multiple modalities and behavioral relevance. The results raise many important questions on the brain-wide representation of conditioned stimuli. E.g. how specific are the responses to the conditioned stimuli? Are thalamic cross-modal neurons recruited for the specific conditioned stimulus or do their responses reflect a more global shift of attention from one modality to another? 

      To elaborate on higher-order thalamic activity in relationship to conditioned behavior, a trial-by-trial analysis would be very useful. Is neuronal activity predictive of licking and at which relative timing? 

      To elaborate on the relationship between neuronal activity and licking, we have created a new supplementary figure (Figure S1), where we present the lick latency of each mouse on the day of recording. We also perform more in-depth analysis of neural activity that occurs before lick onset, which is presented in a new main figure (new Figure 4). 

      Furthermore, I wonder why the (in my mind) major and from the data obvious take-away, "POm neurons respond more strongly to visual stimuli if visually conditioned", is not directly tested in the summary statistics in Figure 3h.

      We have added a summary statistic to Figure 3h and to the Results section (lines 156-157) comparing the drifting grating responses in visually and tactilely conditioned mice.  

      The remaining early visual responses in POm in visually conditioned mice after removing behavior-linked activity are very convincing (Figure 5d). It would help, however, to see a representation of this on a single-neuron basis side-by-side. Are individual neurons just coupled to behavior while others are independent, or is behaviorally coupled activity a homogeneous effect on all neurons on top of sensory activity?

      In lieu of a new figure, we have performed a new analysis of individual neurons to classify them as “stimulus tuned” and/or “movement tuned.” We find that nearly all POm cells encode movement and arousal regardless of whether they also respond to stimuli. This is presented in the Results under the heading “POm correlates with arousal and movement regardless of conditioning” (Lines 219-231).

      The conclusions on flexible response characteristics in LP in general are less strongly supported than those in POm. First, the differentiation between POm and LP relies heavily on the histological alignment of labeled probe depth and recording channel, possibly allowing for wrong assignment. 

      We appreciate the importance of differentiating between POm, LP, and surrounding regions to accurately assign a putative cell to a brain region. The method we employed (aligning an electrode track to a common reference atlas) is widely used in rodent neuroscience, especially in regions like POm and LP which are difficult to differentiate molecularly (for example, see Sibille, Nature Communications, 2022; and Schröder, Neuron, 2020). 

      Furthermore, it seems surprising, but is not discussed, that putative LP neurons have such strong responses to the air puff stimuli, in both conditioning cases. In tactile conditioning, LP air puff responses seem to be even faster and stronger than POm. In visual conditioning, drifting grating responses paradoxically seem to be later than in tactile conditioning (Fig S2e). These differences in response changes between POm and LP should be discussed in more detail and statements of "similar phenomena" in POm and LP (abstract) should be qualified.  

      We have further developed our analysis and discussion of LP activity. Our analysis of LP stimulus response latencies is now presented in greater detail in Figure S3, and we have expanded the results section accordingly (lines 266-275). We have also expanded the discussion section to both address these new analyses and speculate on what might drive these surprising “tactile responses” in LP.

      Reviewer #2 (Public Review): 

      Summary  

      This manuscript by Petty and Bruno delves into the still poorly understood role of higher-order thalamic nuclei in the encoding of sensory information by examining the activity in the Pom and LP cells in mice performing an associative learning task. They developed an elegant paradigm in which they conditioned head-fixed mice to attend to a stimulus of one sensory modality (visual or tactile) and ignore a second stimulus of the other modality. They recorded simultaneously from POm and LP, using 64-channel electrode arrays, to reveal the context-dependency of the firing activity of cells in higher-order thalamic nuclei. They concluded that behavioral training reshapes activity in these secondary thalamic nuclei. I have no major concerns with the manuscript's conclusions, but some important methodological details are lacking and I feel the manuscript could be improved with the following revisions.

      Strengths 

      The authors developed an original and elegant paradigm in which they conditioned head-fixed mice to attend to a stimulus of one sensory modality, either visual or tactile, and ignore a second stimulus of the other modality. As a tactile stimulus, they applied gentle air puffs on the distal part of the vibrissae, ensuring that the stimulus was innocuous and therefore non-aversive, which is crucial in their study. 

      It is commonly viewed that the first-order thalamus performs filtering and re-encoding of the sensory flow; in contrast, the computations taking place in high-order nuclei are poorly understood. They may contribute to cognitive functions. By integrating top-down control, high-order nuclei may participate in generating updated models of the environment based on sensory activity; how this can take place is a key question that Petty and Bruno addressed in the present study.

      Weaknesses  

      (1) Overall, methods, results, and discussion, involving sensory responses, especially for the Pom, are confusing. I have the feeling that throughout the manuscript, the authors are dealing with the sensory and non-sensory aspects of the modulation of the firing activity in the Pom and LP, without a clear definition of what they examined. Making subsections in the results, or a better naming of what is analyzed could convey the authors' message in a clearer way, e.g., baseline, stim-on, reward.  

      We thank Reviewer 2 for this suggestion. We have adjusted the language throughout the paper to more clearly state which portions of a given trial we analyzed. We now consistently refer to “baseline,” “stimulus onset,” and “stimulus offset” periods. 

      In line #502 in Methods, the authors defined "Sensory Responses. We examined each cell's putative sensory response by comparing its firing rate during a "stimulus period" to its baseline firing rate. We first excluded overlapping stimuli, defined as any stimulus occurring within 6 seconds of a stimulus of a different type. We then counted the number of spikes that occurred within 1 second prior to the onset of each stimulus (baseline period) and within one second of the stimulus onset (stimulus period). The period within +/-50ms of the stimulus was considered ambiguous and excluded from analysis." 

      Considering that the responses to whisker deflection, while weak and delayed, were shown to occur, when present, before 50 ms in the Pom (Diamond et al., 1992), it is not clear what the authors mean and consider as "Sensory Responses"? 

      We have addressed this important concern in three ways. First, we have reanalyzed our data to include the 50ms pre- and post-stimulus time windows that were previously excluded. This did not qualitatively change our results, but updated statistical measurements are reflected in the Results and the legends of figures 3 and 7. Second, we have created a new figure (new Figure 4) which provides a more detailed analysis of early POm stimulus responses at a finer time scale. Third, we have amended the language throughout the paper to refer to “stimulus responses” rather than “sensory responses” to reflect how we cannot disambiguate between bottom-up sensory input and top-down input into POm and LP with our experimental setup. We refer only to “putative sensory responses” when discussing low-latency (<100ms) stimulus responses.
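      The baseline-versus-stimulus comparison quoted from the Methods above can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the use of NumPy, and the treatment of a gap of exactly 6 seconds as non-overlapping are assumptions, and the statistical test applied to the resulting counts is not specified in the quoted text, so it is left out. In line with the reanalysis described above, no ±50 ms exclusion window is applied.

```python
import numpy as np


def non_overlapping(onsets, other_onsets, min_gap=6.0):
    """Keep onsets at least min_gap seconds away from every stimulus of the other type."""
    onsets = np.asarray(onsets, dtype=float)
    other = np.asarray(other_onsets, dtype=float)
    if other.size == 0:
        return onsets
    # Distance from each onset to the nearest stimulus of the other type.
    nearest = np.abs(onsets[:, None] - other[None, :]).min(axis=1)
    return onsets[nearest >= min_gap]


def window_counts(spike_times, onsets, window=1.0):
    """Per-trial spike counts in [onset - window, onset) and [onset, onset + window)."""
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    onsets = np.asarray(onsets, dtype=float)
    start = np.searchsorted(spikes, onsets - window, side="left")
    middle = np.searchsorted(spikes, onsets, side="left")
    end = np.searchsorted(spikes, onsets + window, side="left")
    baseline = middle - start  # spikes in the second before stimulus onset
    stimulus = end - middle    # spikes in the second after stimulus onset
    return baseline, stimulus


if __name__ == "__main__":
    spikes = [9.2, 9.5, 10.1, 10.3, 10.9, 30.5]
    puffs = non_overlapping([10.0, 30.0], other_onsets=[33.0])
    print(puffs, window_counts(spikes, puffs))
```

      The paired per-trial counts returned by `window_counts` could then be passed to whichever paired test is appropriate for the data.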

      Precise wording may help to clarify the message. For instance, in line #134: "Of cells from tactilely conditioned mice, 175 (50.4%) significantly responded to the air puff, as defined by having a firing rate significantly different from baseline within one second from air puff onset (Figure 3d, bottom)", the phrase "significantly responded to the air puff" could be written "significantly increased (or modified if some decreased) their firing rate within one second after the air puff onset (baseline: ...)". This will avoid any confusion with the sensory responses per se.

      We have made this specific change suggested by the reviewer (lines 145-146) and made similar adjustments to the language throughout the manuscript to better communicate our analysis methods. 

      (2) To extend the previous concern, the latency of the modulation of the firing rate of the Pom cells for each modality and each conditioning may be an issue. This latency, given in Figure S2, is rather long, i.e. particularly late latencies for the whisker system, which is completely in favor of non-sensory "responses" per se and the authors' hypothesis that sensory-, arousal-, and movement-evoked activity in Pom are shaped by associative learning. Latency is a key point in this study. 

      Therefore, 

      - latencies should be given in the main text, and Figure S2 could be considered for a main figure, at least panels c, d, and e, could be part of Figure 3. 

      - the Figure S2b points out rather short latency responses to the air puff, at least in some cells, in addition to late ones. The manuscript would highly benefit from an analysis of both early and late latency components of the "responses" to air puffs and drifting grating in both conditions. This analysis may definitely help to clarify the authors' message. Since the authors performed unit recordings, these data are accessible.

      - it would be highly instructive to examine the latency of the modulation of Pom cells firing rate in parallel with the onset of each behavior, i.e. modification of pupil radius, whisking amplitude, lick rate (Figures 1e, g and 3a, b). The Figure 1 does not provide the latency of the licks in conditioned mice.

      - the authors mention in the discussion low-latency responses, e.g., line #299: "In both tactilely and visually conditioned mice, movement could not explain the increased firing rate at air puff onset. These low-latency responses across conditioning groups is likely due in part to "true" sensory responses driven by S1 and SpVi."; line #306: "Like POm, LP displayed varied stimulus-evoked activity that was heavily dependent on conditioning. LP responded to the air puff robustly and with low latency, despite lacking direct somatosensory inputs."  But which low-latency responses do the authors refer to? Again, this points out that a robust analysis of these latencies is missing in the manuscript but would be helpful to conclude.

      We have moved our analysis of stimulus response latency in POm to new Figure 4 in the main text and have expanded both the Results and Discussion sections accordingly. We have also analyzed the lick latency on the day of recording, included in a new supplemental Figure S1. 

      (3) Anatomical locations of recordings in the dorsal part of the thalamus. Line #122 "Our recordings covered most of the volume of POm but were clustered primarily in the anterior and medial portions of LP (Figure 2d-f). Cells that were within 50 µm of a region border were excluded from analysis." 

      How did the authors distinguish the anterior boundary of the LP with the LD nucleus just more anterior to the LP, another higher-order nucleus, where whisker-responsive cells have been isolated (Bezdudnaya and Keller, 2008)? 

      Cells within 50µm of any region boundary were excluded, including those at the border of LP and LD. We also reviewed our histology images by eye and believe that our recordings were all made posterior of LD. 

      (4) The mention in the Methods about the approval by an ethics committee is missing.  All the surgery (line #381), i.e., for the implant, the craniotomy, as well as the perfusion, are performed under isoflurane. But isoflurane induces narcosis only and not proper anesthesia. The mention of the use of analgesia is missing. 

      We thank Reviewer 2 for drawing our attention to this oversight. All experiments were conducted under the approval of the Columbia University IACUC. Mice were treated with the global analgesics buprenorphine and carprofen, the local analgesic bupivacaine, and anesthetized with isoflurane during all surgical procedures. We have amended the Methods section to include this information (Lines 458-470).

      Reviewer #3 (Public Review): 

      Petty and Bruno ask whether activity in secondary thalamic nuclei depends on the behavioral relevance of stimulus modality. They recorded from POm and LP, but the weight of the paper is skewed toward POm. They use two cohorts of mice (N=11 and 12), recorded in both nuclei using multi-electrode arrays, while being trained to lick to either a tactile stimulus (air puff against whiskers, first cohort) or a visual stimulus (drifting grating, second cohort), and ignore the respective other. They find that both nuclei, while primarily responsive to their 'home' modality, are more responsive to the relevant modality (i.e. the modality predicting reward). 

      Strengths: 

      The paper asks an important question, it is timely and is very well executed. The behavioral method using a delayed lick index (excluding impulsive responses) is well worked out. Electrophysiology methods are state-of-the-art with information about spike quality in Figure S1. The main result is novel and important, convincingly conveying the point that encoding of secondary thalamic nuclei is flexible and clearly includes aspects of the behavioral relevance of a stimulus. The paper explores the mapping of responses within POm, pointing to a complex functional structure, something that has been reported/suggested in earlier studies. 

      Weaknesses: 

      Coding: It does not become clear to which aspect of the task POm/LP is responding. There is a motor-related response (whisking, licking, pupil), which, however, after regressing it out leaves a remaining response that the authors speculate could be sensory.

      Learning: The paper talks a lot about 'learning', although it is only indirectly addressed. The authors use two differently (over-)trained mice cohorts rather than studying e.g. a rule switch in one and the same mouse, which would allow us to directly assess whether it is the same neurons that undergo rule-dependent encoding. 

      We disagree that our animals are “overtrained,” as every mouse was fully trained within 13 days. We agree that it would be interesting to study a rule-switch type experiment, but such an experiment is not necessary to reveal the profound effect that conditioning has on stimulus responses in POm and LP. 

      Mapping: The authors treat and interpret the two nuclei very much in the same vein, although there are clear differences. I would think these differences are mentioned in passing but could be discussed in more depth. Mapping using responses on electrode tracks is done in POm but not LP.

      The mapping of LP responses by anatomical location is presented in the supplemental Figure S4 (previously S3). We have expanded our discussion of LP and how it might differ from POm.

      Reviewer #1 (Recommendations For The Authors):  

      Minor writing issues: 

      122 ...67 >LP< cells?

      301 plural "are”

      We have fixed these typos.

      Figure issues

      *  3a,b time ticks are misaligned and the grey bar (bottom) seems not to align with the visual/tactile stimulus shadings.

      *  legend to Figure 3b refers to Figure 1c which is a scheme, but if 1g is meant, this mouse does not seem to have a session 12? 

      *  3c,e time ticks slightly misaligned. 

      *  5e misses shading for the relevant box plots, assuming it should be like Figure 3h.  

      We thank Reviewer 1 for pointing out these errors. We have adjusted Figures 1, 3, and 5 accordingly.

      Analyses 

      I am missing a similar summary statistics for LP as in Figure 3h 

      We have added a summary box chart of LP stimulus responses (Figure 7g), similar to that of POm in Figure 3. We have also performed similar statistical analyses, the results of which are presented in the legend for Figure 7. 

      Reviewer #2 (Recommendations For The Authors): 

      More precisions are required for the following points: 

      (1) The mention of the use of analgesia is missing and this is not a minor concern. Even if the recordings are performed 24 hours after the surgery for the craniotomy and screw insertion and several days after the main surgery for the implant, taking into account the pain of the animals during surgeries is crucial first for ethical reasons, and second because it may affect the data, especially in Pom cells: pain during surgery may induce the development of allodynia and/or hyperalgesia phenomena, and Pom responses to sensory stimuli were shown to be more robust in behavioral hyperalgesia (Masri et al., 2009).  

      We neglected to include details on the analgesics used during surgery and post-operation recovery in our original manuscript. Mice were administered buprenorphine, carprofen, and bupivacaine immediately prior to the head plate surgery and were treated with additional carprofen during recovery. Mice were similarly treated with analgesics for the craniotomy procedure. Mice were carefully observed after craniotomy, and we saw no evidence of pain or discomfort. Furthermore, mice performed the behavior at the same level pre- and post-craniotomy (now presented in Figure 1j), which also indicates that they were not in any pain. 

      (2) The head-fixed preparation is only poorly described.

      Line #414: "Prior to conditioning, mice were habituated to head fixation and given ad libitum water in the behavior apparatus for 15-25 minutes." 

      And line #425 "Mice were trained for one session per day, with each session consisting of an equal number of visual stimuli and air puffs. Sessions ranged from 20-60 minutes and about 40-120 of each stimulus. " 

      More details should be given about the head-fixation training protocol. Are 15-25 minutes the session time duration, 60 minutes, or other time duration? How long does it take to get mice well trained to the head fixation, and on which criteria?  

      Line #389: "Mice were then allowed to recover for 24 hours, after which the sealant was removed and recordings were performed. At the end of experiments,"

      The timeline is not clear: is there one day or several days of recordings? 

      We have expanded on our description of the head fixation protocol in the Methods. We describe in more detail how mice were habituated to head fixation, the timing of water restriction, and the start of conditioning/training (Habituation and Conditioning, lines 492-500).

      (4) Line #411: "Mice were deprived of water 3 days prior to the start of conditioning" followed by line #414 "Prior to conditioning, mice were habituated to head fixation and given ad libitum water in the behavior apparatus for 15-25 minutes".

      If I understood correctly, the mice were then not fully water-deprived for 3 days since they received water while head-fixed. This point may be clarified. 

      We addressed these concerns in the changes to the Methods section mentioned in the preceding point (3).

      (5) Line #157: "Modality selectivity varies with anatomical location in Pom" while the end of the previous paragraph is "This suggests that POm encoding of reward and/or licking is insensitive to task type, an observation we examine further below."

      The authors then come to anatomical concerns before coming back to what the Pom may encode in the following section. This makes the story quite confusing and hard to follow even though pretty interesting.  

      We have reordered our Figures and Results to improve the flow of the paper and remove this point of confusion. We now present results on the encoding of movement before analyzing the relationship between POm stimulus responses and anatomical location. What was old Figure 5 now precedes what was old Figure 4.

      (6) Licks Analysis. Line #99 "However, this mouse also learned that the air puff predicted a lack of reward in the shaping task, as evidenced by withholding licking upon the onset of the air puff. The mouse thus displayed a positive visual lick index and a negative tactile lick index, suggesting that it attended to both the tactile and visual stimuli (Figure 1f, middle arrow)."

      Line #105 "All visually conditioned mice exhibited a similar learning trajectory (Figure 1i left, 1j left)". 

      Interestingly, the authors revealed that mice withheld licking upon the onset of the air puff in the visual conditioning, which they did not do at the onset of the drifting grating in the tactile conditioning. This withholding was extinguished after the 8th session, which the authors interpret as the mice finally ignoring the air puff. Is this effect significant, is there a significant withholding licking upon the onset of the air puff on the 12 tested mice? 

      The withholding of licking was significant (assessed with a sign-rank test) in visually conditioned mice prior to switching to the full version of the task. Indeed, it was the abolishment of this effect after conditioning with the full version of the task that was our criterion for when a mouse was fully trained. We have elaborated on this in the Habituation and Conditioning section in the Methods.

      (1) Throughout the manuscript "Touch" is used instead of passive whisker deflection, and may be confusing with "active touch" for the whisker community readers. I recommend avoiding using "touch" instead of "passive whisker deflection".

      We appreciate that “touch” can be an ambiguous term in some contexts. However, we have limited our use of the word to refer to the percept of whisker deflection; we do not describe the air puff stimulus as a “touch.” We respectfully would like to retain the use of the word, as it is useful for comparing somatosensory stimuli to visual stimuli.

      (2) Line #395: "Air puffs (0.5-1 PSI) were delivered through a nozzle (cut p1000 pipet tip, approximately 3.5mm diameter aperture)".

      Are air puffs of <1 PSI applied, not <1 bar?  

      We thank Reviewer 3 for pointing out this inaccuracy. The air puffs were indeed between 0.5 and 1 bar, not PSI. We have addressed this in the Methods.

      (3) Line #441: "In the full task, the stimuli and reward were identical, but stimuli were presented at uncorrelated and less predictable intervals."  Do the authors mean that all stimuli are rewarded?  

      The stimuli and reward were identical between the shaping and full versions of the task. In the full version of the task, the unrewarded stimulus was truly uncorrelated with reward, rather than anticorrelated. 

      (4) Line #445 "for a mean ISI of 20 msec." ISI is not defined, I guess that it means interstimulus interval. Even if pretty obvious, to avoid any confusion for future readers, I would recommend using another acronym, especially in a manuscript about electrophysiology, since ISI is a dedicated acronym for inter-spike interval. 

      We have defined the acronym ISI as “inter-stimulus interval” when first introduced in the results (Line 82) and in the Methods (Line 511).

      (5) Line #416 "In the first phase of conditioning ("shaping"), mice were separated into two cohorts: a "tactile" cohort and a "visual" cohort. Mice were presented with tactile stimuli (a two-second air puff delivered to the distal whisker field) and visual stimuli (vertical drifting grating on a monitor). Throughout conditioning, mice were monitored via webcam to ensure that the air puff only contacted the whiskers and did not disturb the facial fur nor cause the mouse to blink, flinch, or otherwise react - ensuring the stimulus was innocuous. The stimulus types were randomly ordered. In the visual conditioning cohort, the visual stimulus was paired with a water reward (8-16µL) delivered at the time of stimulus offset. In the tactile conditioning cohort, the reward was instead paired with the offset of the air puff. Regardless of the type of conditioning, stimulus type was a balanced 50:50 with an inter-stimulus interval of 8-12 seconds (uniform distribution)." 

      The mention of the "full version of the task" will be welcome in this paragraph to clarify what the task is for the mouse in the Methods part.

      We have more clearly defined the full version of the task in a later paragraph (line 506). We believe this addresses the potential confusion caused by the original description of the conditioning paradigm. 

      (6) Line #467: "Units were assigned to the array channel on which its mean waveform was largest". 

      Should it read mean waveform "amplitude"? 

      This is correct, we have adjusted the statement accordingly. 

      (7) Line #482 "The eye camera was positioned on the right side of the face and recorded at 60 fps." Then line #487 "The trace of pupil radius over time was smoothed over 5 frames (8.3 msec).” 5 frames, with a 60fps, represent then 83 ms and not 8.3 ms.

      We have corrected this error.  
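      The arithmetic behind this correction is a one-line conversion from frames to time. The helper below is purely illustrative (its name is hypothetical and does not come from the manuscript); it shows that 5 frames at 60 fps span about 83 ms, as the reviewer states.

```python
def smoothing_window_ms(n_frames, fps):
    """Duration in milliseconds spanned by n_frames at a frame rate of fps."""
    return 1000.0 * n_frames / fps


if __name__ == "__main__":
    # 5 frames at 60 fps: 5 / 60 s, i.e. about 83.3 ms rather than 8.3 ms.
    print(f"{smoothing_window_ms(5, 60):.1f} ms")
```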

      (8) Line #121: "257 POm cells and 67 cells from 12 visually conditioned mice" 

      67 LP cells, LP is missing 

      We have corrected this error. 

      (9) Line #354: "A consistent result of attention studies in humans and nonhuman primates is the enhancement of cortical and thalamic sensory responses to an attended visual stimuli. Here, we show not just enhancement of sensory responses to stimuli within a single modality, but also across modalities. It is worth investigating further how secondary thalamus and high-order sensory cortex encode attention to stimuli outside of their respective modalities. Our surprising conclusion that the nuclei are equivalently activated by behaviorally relevant stimuli is nevertheless compatible with these previous studies."  Since higher-order thalamic nuclei are integrative centers of many cortical and subcortical inputs, they cannot be viewed simply as relay nuclei, and there is therefore no "surprising" conclusion in these results. Not surprising, but still an elegant demonstration of the context-dependent activity/responses of the Pom/LP cells. 

      We disagree. Visual stimuli activating strong POm responses and tactile stimuli activating strong LP responses - however they do it - is a surprising result. We agree that higher-order thalamic nuclei are integrative centers, but exactly what they integrate and what the integrated output means is still poorly understood.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The models described are not fundamentally novel, essentially a random intercept model (with a warping function), and some flexible covariate effects using splines (i.e., additive models).

      We respectfully but strongly disagree with the reviewer’s assessment of the novelty of our work. The models referred to by the reviewer as “random intercept models … and some flexible covariate effects” seem to relate to the estimation of normative models derived cross-sectionally, as developed in and adopted from previous work, not to the work presented here. To be clear, the contributions of this work are: (i) a principled methodology to make statistical predictions for individual subjects in longitudinal studies based on a novel z-diff score, (ii) an approach to transfer information from large-scale normative models estimated on large-scale cross-sectional data to longitudinal studies, (iii) an extensive theoretical analysis of the properties of this approach, and (iv) empirical evaluation on an unpublished psychosis dataset. Put simply, we provide the ability to estimate within-subject change in normative models, which until now have only shown a subject’s position in the normative range at a given timepoint. With the exception of reference [13] cited in the main text, we are not aware of any available methods that can achieve this. Based on this feedback, combined with the feedback of Reviewer 2, we have now improved our introduction: we clearly state our contribution right from the outset of the manuscript and have shortened the introduction to make it more concise. In this work, we are trying to be very transparent in showing the reader that our method builds on a previously peer-reviewed model.

      The assumption of constant quantiles is very strong, and limits the utility of the model to very short term data.

      We now provide an extensive theoretical analysis of our approach (section 2.1.3), where we show that this assumption is actually not strictly necessary and that our approach yields valid inferences even under much milder assumptions. More specifically, we first provide a mathematical grounding for the assumption we made in the initial submission, then generalise our method to a wider class of residual processes and show that our original assumption of constant quantiles is not too restrictive. We also provide a simulation study to show how the practitioner can evaluate the validity and implications of this assumption on a case-by-case basis. This generalisation is described in depth in section 2.1.3.

      The schizophrenia example leads to a counter-intuitive normalization of trajectories, which leads to suspicions that this is driven by some artifact of the data modeling/imaging pipelines.

      We understand that the observed normalisation effects might appear surprising. As we outlined in our provisional response, we would like to emphasise that there is increasing evidence that the old neurodegenerative view of psychosis is an oversimplification and that trajectories of cortical thickness are highly variable across different individuals after the first psychotic episode. More specifically, we have shown in an independent sample and with different methodology that individuals treated with second-generation antipsychotics and with careful clinical follow-up can show normalisation of cortical thickness atypicalities after the first episode (https://www.medrxiv.org/content/10.1101/2024.04.19.24306008v2, now accepted in Schizophrenia Bulletin). These results are well-aligned with the results we show in this manuscript. We now added remarks on this topic into the discussion. We would also like to re-emphasise that the data were processed with the utmost rigour using state of the art processing pipelines including quality control, which we have reported as transparently as possible. The confidence that the results are not ‘driven by some artifact of the data modeling/imaging pipelines’ is also supported by the fact that analysis of a group of healthy controls did not show any significant z-diffs (see Discussion section), neither frontally nor elsewhere. If the reviewer believes there are additional quality control checks that would further increase confidence in our findings, we would welcome the reviewer to provide specific details.

      The method also assumes that the cross-sectional data is from a "healthy population" without describing what this population is (there is certainly every chance of ascertainment bias in large scale studies as well as small scale studies). This issue is completely elided over in the manuscript.

      Indeed, we do not describe the cross-sectional population used for training the models, as these models were already trained and published with an in-depth description of the datasets used for the training (https://elifesciences.org/articles/72904). We now make this more explicit in section 2.1.1 of the manuscript (page 7), and also more explicitly acknowledge the possibility of ascertainment bias in the simulation section 2.1.4. However, we would like to emphasise that such ascertainment bias is not in any way specific to the analyses we report. In fact, it is present in all studies that utilise large-scale cohorts such as UK Biobank. Indeed, we are currently working on another manuscript to address this question in detail, but given the complexity of this problem and the fact that many publicly available legacy studies simply do not record sufficient demographic information, e.g. to assess racial bias properly, we believe that this is beyond the scope of the current work.

      Reviewer #2 (Public Review):

      The organization and clarity of this manuscript need enhancement for better comprehension and flow. For example, in the first few paragraphs of the introduction, the wording is quite vague. A lot of information was scattered and repeated in the latter part of the introduction, and the actual challenges/motivation of this work were not introduced until the 5th paragraph.

      As noted above in our response to Reviewer 1, we significantly pruned the introduction, stating our objective in the first paragraph and elaborating on the topic later in the text. We hope that it is now less repetitive and easier to follow.

      There are no simulation studies to evaluate whether the adjustment of the cross-sectional normative model to longitudinal data can make accurate estimations and inferences regarding the longitudinal changes. Also, there are some assumptions involved in the modeling procedure, for example, the deviation of a healthy control from the population over time is purely caused by noise and constant variability of error/noise across x_n, and these seem to be quite strong assumptions. The presentation of this work's method development would be strengthened if the authors can conduct a formal simulation study to evaluate the method's performance when such assumptions are violated, and, ideally, propose some methods to check these assumptions before performing the analyses.

      This comment encouraged us to zoom out from our original assumption and generalise our method to a wider class of residual processes (stationary Gaussian processes) in section 2.1.3. We now present a theoretical analysis of our model to show that our original assumption (of stable quantiles plus noise) is actually not necessary for valid inference in our method, which broadens the applicability of our method. Of course, we also discuss in what way the original assumption is restrictive and how it aligns with the more general dynamics. We also include a simulation study to evaluate the method's performance and elucidate the role of the more general dynamics in section 2.1.4.
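
      The "stable quantiles plus noise" assumption discussed above can be illustrated with a small simulation. The sketch below is a simplified toy model, not the manuscript's z-diff definition or its simulation study: it assumes each healthy subject has a fixed position `q` in the normative range and that each visit adds independent Gaussian noise with a known standard deviation `sigma`. All names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_standardized_diffs(n_subjects, sigma, seed=0):
    """Toy model: z_k = q + e_k with e_k ~ N(0, sigma^2), independent per visit.

    Under this assumption z2 - z1 ~ N(0, 2 * sigma^2), so dividing the
    difference by sqrt(2) * sigma should give a standard normal variable.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_subjects):
        q = rng.gauss(0.0, 1.0)           # stable subject-specific position
        z1 = q + rng.gauss(0.0, sigma)    # visit 1 z-score
        z2 = q + rng.gauss(0.0, sigma)    # visit 2 z-score
        diffs.append((z2 - z1) / (2 ** 0.5 * sigma))
    return diffs


diffs = simulate_standardized_diffs(20000, sigma=0.5)
mean = statistics.fmean(diffs)
sd = statistics.stdev(diffs)
# Share of healthy subjects flagged at the two-sided 5% level.
frac = sum(abs(d) > 1.96 for d in diffs) / len(diffs)
print(mean, sd, frac)
```

      If the assumption holds, the mean should be close to 0, the standard deviation close to 1, and roughly 5% of healthy subjects should exceed |1.96|. Departures from these values in real control data would suggest that the assumed residual process is too simple.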

      The proposed "z-diff score" still falls in the common form of z-score to describe the individual deviation from the population/reference level, but now is just specifically used to quantify the deviation of individual temporal change from the population level. The authors need to further highlight the difference between the "z-score" and "z-diff score", ideally at its first mention, in case readers get confused (I was confused at first until I reached the latter part of the manuscript). The z-score can also be called a measure of "standardized difference" which kind of collides with what "z-diff" implies by its name.

      We have added a note on the difference between the z-score and the z-diff score to the last paragraph of the introduction.

      Explaining that one component of the variance is related to the estimation of the model and the other is due to prediction would be helpful for non-statistical readers.

      We now added an interpretation of the z-score in the original model below equation 7.

      It would be easier for the non-statistical reader if the authors consistently used precision or variance for all variance parameters. Probably variance would be more accessible.

      This was a very useful observation; we have unified the notation and now use only variance.

      The functions psi were never explicitly described. This would be helpful to have in the supplement with a reference to that in the paper.

      Indeed, while describing the original model we had to make choices about how to condense the necessary information from the original model so that we can build upon it. As the phi function is only used for data transformation in the original model, we did not further elaborate on it, however, we now refer to the specific section of the original paper of Fraza et al. 2021 where it is described more in detail (https://www.sciencedirect.com/science/article/pii/S1053811921009873).

      What is the goal of equations (13) and (14)? The authors should clarify what the point of writing these equations is prior to showing the math. It seems like it is to obtain an estimate of \sigma_{\xi}^2, which the reader only learns at the end.

      We corrected the formatting.

      What is the definition of "adaption" as used to describe equation (15)? In this equation, I think norm on subsample was not defined.

      We added a more detailed description of the adaptation after equation 15.

      "(the sandwich part with A)" - maybe call this an inner product so that it is not confused with a sandwich variance estimator. This is a bit unclear. Equation (8) does have the inner product involving A and \beta^{-1} does include variability of \eta. It seems like you mean that equation (8) incorrectly includes variability of \eta and does not have the right term vector component of the inner product involving A, but this needs clarifying.

      We now changed the formulation to be less confusing and also explicitly clarified the caveat regarding the difference of z-scores.

      One challenge with the z-diff score is that it does not account for whether a person sits above or below zero at the first time point. It might make it difficult to interpret the results, as the results for a particular pathology could change depending on what stage of the lifespan a person is in. I am not sure how the authors would address those challenges.

      We agree with the outlined limitation in interpreting overall trends when the position at visit one differs between subjects. However, this is a much broader challenge and is not specific to our approach. This effect is generally independent of the lifespan, but may further interact with the typical lifespan of disease. When the z-scores are taken in the context of the cross-sectional normative models, it is possible to identify the overall trend of an illness across the lifespan, and an individual patient’s z-diffs that are not in line with what this typical group trajectory would predict may, e.g., correspond to early/late onset of their individual atrophy. We now make these considerations explicit in the discussion section.

      Reviewer #2 (Recommendations For The Authors):

      Other minor suggestions to help improve the text:...

      We thank Reviewer #2 for the list of minor suggestions to improve the text, all of which we implemented in the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This important work presents a new methodology for the statistical analysis of fiber photometry data, improving statistical power while avoiding the bias inherent in the choices that are necessarily made when summarizing photometry data. The reanalysis of two recent photometry data sets, the simulations, and the mathematical detail provide convincing evidence for the utility of the method and the main conclusions, however, the discussion of the re-analyzed data is incomplete and would be improved by a deeper consideration of the limitations of the original data. In addition, consideration of other data sets and photometry methodologies including non-linear analysis tools, as well as a discussion of the importance of the data normalization are needed.

      Thank you for reviewing our manuscript and giving us the opportunity to respond and improve our paper. In our revision, we have strived to address the points raised in the comments, and implement suggested changes where feasible. We have also improved our package and created an analysis guide (available on our Github - https://github.com/gloewing/fastFMM and https://github.com/gloewing/photometry_fGLMM), showing users how to apply our methods and interpret their results. Below, we provide a detailed point-by-point response to the reviewers.

      Reviewer #1:

      Summary:

      Fiber photometry has become a very popular tool in recording neuronal activity in freely behaving animals. Despite the number of papers published with the method, as the authors rightly note, there are currently no standardized ways to analyze the data produced. Moreover, most of the data analyses are confined to simple measurements of averaged activity and, by doing so, erase valuable information encoded in the data. The authors offer an approach based on functional linear mixed modeling, where beyond changes in overall activity various functions of the data can also be analyzed. More in-depth analysis, more variables taken into account, and better statistical power all lead to higher quality science.

      Strengths:

      The framework the authors present is solid and well-explained. By reanalyzing formerly published data, the authors also further increase the significance of the proposed tool opening new avenues for reinterpreting already collected data.

      Thank you for your favorable and detailed description of our work!

      Weaknesses:

      However, this also leads to several questions. The normalization method employed for raw fiber photometry data is different from lab to lab. This imposes a significant challenge to applying a single tool of analysis.

      Thank you for these important suggestions. We agree that many data pre-processing steps will influence the statistical inference from our method. Note, though, that this would also be the case with standard analysis approaches (e.g., t-tests, correlations) applied to summary measures like AUCs. For that reason, we do not believe that variability in pre-processing is an impediment to widespread adoption of a standard analysis procedure. Rather, we would argue that the sensitivity of analysis results to pre-processing choices should motivate the development of statistical techniques that reduce the need for pre-processing, and properly account for structure in the data arising from experimental designs. For example, even without many standard pre-processing steps, FLMM provides smooth estimation results across trial timepoints (i.e., the “functional domain”), has the ability to adjust for between-trial and -animal heterogeneity, and provides a valid statistical inference framework that quantifies the resulting uncertainty. We appreciate the reviewer’s suggestion to emphasize and further elaborate on our method from this perspective. We have now included the following in the Discussion section:

      “FLMM can help model signal components unrelated to the scientific question of interest, and provides a systematic framework to quantify the additional uncertainty from those modeling choices. For example, analysts sometimes normalize data with trial-specific baselines because longitudinal experiments can induce correlation patterns across trials that standard techniques (e.g., repeated measures ANOVA) may not adequately account for. Even without many standard data pre-processing steps, FLMM provides smooth estimation results across trial time-points (the “functional domain”), has the ability to adjust for between-trial and -animal heterogeneity, and provides a valid statistical inference approach that quantifies the resulting uncertainty. For instance, session-to-session variability in signal magnitudes or dynamics (e.g., a decreasing baseline within-session from bleaching or satiation) could be accounted for, at least in part, through the inclusion of trial-level fixed or random effects. Similarly, signal heterogeneity due to subject characteristics (e.g., sex, CS+ cue identity) could be incorporated into a model through inclusion of animal-specific random effects. Inclusion of these effects would then influence the width of the confidence intervals. By expressing one’s “beliefs” in an FLMM model specification, one can compare models (e.g., with AIC). Even the level of smoothing in FLMM is largely selected as a function of the data, and is accounted for directly in the equations used to construct confidence intervals. This stands in contrast to “trying to clean up the data” with a pre-processing step that may have an unknown impact on the final statistical inferences.”

      Does the method that the authors propose work similarly efficiently whether the data are normalized in a running average dF/F as it is described in the cited papers? For example, trace smoothing using running averages (Jeong et al. 2022) in itself may lead to pattern dilution.

      By modeling trial signals as “functions”, the method accounts for and exploits correlation across trial timepoints and, as such, any pre-smoothing of the signals should not negatively affect the validity of the 95% CI coverage. Pre-smoothing will, however, change the inferential results and the interpretation of the data; this is not unique to FLMM and applies to many other statistical procedures.
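
      The reviewer's concern about "pattern dilution" can be made concrete with a toy example. The following sketch applies a centered running average to a synthetic trace containing a single one-sample transient. The trace, the window length, and the function name `moving_average` are all illustrative assumptions and are not taken from Jeong et al. (2022) or from the manuscript.

```python
def moving_average(x, window):
    """Centered running average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


# Synthetic trace: flat baseline with a single one-sample transient.
trace = [0.0] * 21
trace[10] = 1.0

smoothed = moving_average(trace, window=5)
print(max(trace), max(smoothed))  # peak amplitude before and after smoothing
```

      In this toy case the 5-sample window spreads the transient over five samples and reduces its peak to one fifth of the original amplitude, which is why pre-smoothing changes the interpretation of the signal even when it does not invalidate the inference.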

      The same question applies if the z-score is calculated based on various responses or even baselines. How reliable is the method if the data are non-stationary and the baselines undergo major changes between separate trials?

      Adjustment for trial-to-trial variability in signal magnitudes or dynamics could be accounted for, at least in part, through the inclusion of trial-level random effects. This heterogeneity would then influence the width of the confidence intervals, directly conveying the effect of the variability on the conclusions being drawn from the data. This stands in contrast to “trying to clean up the data” with a pre-processing step that may have an unknown impact on the final statistical inferences. Indeed, non-stationarity (e.g., a decreasing baseline within-session) due to, for example, measurement artifacts (e.g., bleaching) or behavioral causes (e.g., satiation, learning) should, if possible, be accounted for in the model. As mentioned above, one can often achieve the same goals that motivate pre-processing steps by instead applying specific FLMM models (e.g., that include trial-specific intercepts to reflect changes in baseline) to the unprocessed data. One can then compare model criteria in an objective fashion (e.g., with AIC) and quantify the uncertainty associated with those modeling choices. Even the level of smoothing in FLMM is largely selected as a function of the data, and is accounted for directly in the equations used to construct confidence intervals. In sum, our method provides both a tool to account for challenges in the data, and a systematic framework to quantify the additional uncertainty that accompanies accounting for those data characteristics.

      Finally, what is the rationale for not using non-linear analysis methods? Following the paper’s logic, non-linear analysis can capture more information that is diluted by linear methods.

      This is a good question that we imagine many readers will be curious about as well. We have added in notes to the Discussion and Methods Section 4.3 to address this (copied below). We thank the reviewer for raising this point, as your feedback also motivated us to discuss this point in Part 5 of our Analysis Guide.

      Methods

      “FLMM models each trial’s signal as a function that varies smoothly across trial time-points (i.e., along the “functional domain”). It is thus a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (straight line). FLMM and other functional data analysis methods model data as functions, when there is a natural ordering (e.g., time-series data are ordered by time, imaging data are ordered by x-y coordinates), and are assumed to vary smoothly along the functional domain (e.g., one assumes values of a photometry signal at close time-points in a trial have similar values). Functional data analysis approaches exploit this smoothness and natural ordering to capture more information during estimation and inference.”

      Discussion

      “In this paper, we specified FLMM models with linear covariate–signal relationships at a fixed trial time-point across trials/sessions, to compare the FLMM analogue of the analyses conducted in (Jeong et al., 2022). However, our package allows modeling of covariate–signal relationships with non-linear functions of covariates, using splines or other basis functions. One must consider, however, the tradeoff between flexibility and interpretability when specifying potentially complex models, especially since FLMM is designed for statistical inference.”

      Reviewer #2:

      Summary:

      This work describes a statistical framework that combines functional linear mixed modeling with joint 95% confidence intervals, which improves statistical power and provides less conservative statistical inferences than in previous studies. As recently reviewed by Simpson et al. (2023), linear regression analysis has been used extensively to analyze time series signals from a wide range of neuroscience recording techniques, with recent studies applying them to photometry data. The novelty of this study lies in 1) the introduction of joint 95% confidence intervals for statistical testing of functional mixed models with nested random-effects, and 2) providing an open-source R package implementing this framework. This study also highlights how summary statistics as opposed to trial-by-trial analysis can obscure or even change the direction of statistical results by reanalyzing two other studies.

      Strengths:

      The open-source package in R using a similar syntax as the lme4 package for the implementation of this framework on photometry data enhances the accessibility, and usage by other researchers. Moreover, the decreased fitting time of the model in comparison with a similar package on simulated data, has the potential to be more easily adopted.

      The reanalysis of two studies using summary statistics on photometry data (Jeong et al., 2022; Coddington et al., 2023) highlights how trial-by-trial analysis at each time-point on the trial can reveal information obscured by averaging across trials. Furthermore, this work also exemplifies how session and subject variability can lead to opposite conclusions when not considered.

      We appreciate the in-depth description of our work and, in particular, the R package. This is an area where we put a lot of effort, since our group is very concerned with the practical experience of users.

      Weaknesses:

      Although this work has reanalyzed previous work that used summary statistics, it does not compare with other studies that use trial-by-trial photometry data across time-points in a trial. As described by the authors, fitting pointwise linear mixed models and performing t-tests with a Benjamini–Hochberg correction, as in Lee et al. (2019), has some caveats. Using joint confidence intervals has the potential to improve statistical robustness; however, this is not directly shown with temporal data in this work. Furthermore, it is unclear how FLMM differs from the pointwise linear mixed modeling used in this work.

      Thank you for making this important point. We agree that this offers an opportunity to showcase the advantages of FLMM over non-functional data analysis methods, such as the approach applied in Lee et al. (2019). As mentioned in the text, fitting entirely separate models at each trial timepoint (without smoothing regression coefficient point and variance estimates across timepoints), and applying multiple comparisons corrections as a function of the number of time points has substantial conceptual drawbacks. To see why, consider that applying this strategy with two different sub-sampling rates requires adjustment for different numbers of comparisons, and could thus lead to very different proportions of timepoints achieving statistical significance. In light of your comments, we decided that it would be useful to provide a demonstration of this. To that effect, we have added Appendix Section 2 comparing FLMM with the method in Lee et al. (2019) on a real dataset, and show that FLMM yields far less conservative and more stable inference across different sub-sampling rates. We conducted this comparison on the delay-length experiment (shown in Figure 6) data, sub-sampled at evenly spaced intervals at a range of sampling rates. We fit either a collection of separate linear mixed models (LMM) followed by a Benjamini–Hochberg (BH) correction, or FLMM with statistical significance determined with both Pointwise and Joint 95% CIs. As shown in Appendix Tables 1-2, the proportion of timepoints at which effects are statistically significant with FLMM Joint CIs is fairly stable across sampling rates. In contrast, the percentage is highly inconsistent with the BH approach and is often highly conservative. This illustrates a core advantage of functional data analysis methods: borrowing strength across trial timepoints (i.e., the functional domain), can improve estimation efficiency and lower sensitivity to how the data is sub-sampled. 
      A multiple comparisons correction may, however, yield stable results if one first smooths both the regression coefficient point estimates and the variance estimates. Such an approach would essentially constitute a functional mixed model estimation strategy that uses a multiple comparisons correction instead of a joint CI. We have now added a description of this experiment in Section 2.4 (copied below).
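
      To illustrate why a pointwise Benjamini–Hochberg (BH) correction can depend on how many timepoints are tested, the sketch below implements the standard BH step-up procedure and applies it to two sets of toy p-values. The p-values are invented for illustration and are not from the Appendix analysis; the "fine" grid simply adds five more timepoints without an effect to the "coarse" grid.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per test: True where the BH step-up procedure rejects."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Toy p-values on a coarse grid of 5 timepoints (hypothetical values).
coarse = [0.004, 0.015, 0.028, 0.5, 0.8]
# The same timepoints plus 5 additional null-like timepoints from finer sampling.
fine = coarse + [0.3, 0.6, 0.7, 0.9, 0.45]

print(sum(benjamini_hochberg(coarse)))  # 3 timepoints significant
print(sum(benjamini_hochberg(fine)))    # 1 timepoint significant
```

      The p-values at the first three timepoints are identical in both cases, yet two of them lose significance once more timepoints enter the correction. This is only a toy demonstration of the dependence on the number of tests; the actual comparison across sampling rates is the one reported in the Appendix.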

      “We further analyze this dataset in Appendix Section 2, to compare FLMM with the approach applied in Lee et al. (2019) of fitting pointwise LMMs (without any smoothing) and applying a Benjamini–Hochberg (BH) correction. Our hypothesis was that the Lee et al. (2019) approach would yield substantially different analysis results, depending on the sampling rate of the signal data (since the number of tests being corrected for is determined by the sampling rate). The proportion of timepoints at which effects are deemed statistically significant by FLMM joint 95% CIs is fairly stable across sampling rates. In contrast, that proportion is both inconsistent and often low (i.e., highly conservative) across sampling rates with the Lee et al. (2019) approach. These results illustrate the advantages of modeling a trial signal as a function, and conducting estimation and inference in a manner that uses information across the entire trial.”

      In this work, the FLMM applications included only one or two covariates. However, in complex behavioral experiments, where variables are correlated, more than two may be needed (see Simpson et al. (2023); Engelhard et al. (2019); Blanco-Pozo et al. (2024)). It is not clear from this work how computationally feasible it would be to fit such complex models, which would also include more complex random effects.

      Thank you for bringing this up, as we endeavored to create code that is able to scale to complex models and large datasets. We agree that highlighting this capability in the paper will strengthen the work. We now state in the Discussion section that “[T]he package is fast and maintains a low memory footprint even for complex models (see Section 4.6 for an example) and relatively large datasets.” Methods Section 4.6 now includes the following:

      Our fastFMM package scales to the dataset sizes and model specifications common in photometry. The majority of the analyses presented in the Results Section (Section 2) included fairly simple functional fixed and random effect model specifications because we were implementing the FLMM versions of the summary measure analyses presented in Jeong et al. (2022). However, we fit the following FLMM to demonstrate the scalability of our method with more complex model specifications:

      We use the same notation as the Reward Number model in Section 4.5.2, with the additional variable TL_{i,j,l} denoting the Total Licks on trial j of session l for animal i. In a dataset with over 3,200 total trials (pooled across animals), this model took ∼1.2 min to fit on a MacBook Pro with an Apple M1 Max chip with 64GB of RAM. Model fitting had a low memory footprint. This can be fit with the code:

      model_fit = fui(photometry ~ session + trial + iri + lick_time + licks + (session + trial + iri + lick_time + licks | id), parallel = TRUE, data = photometry_data)

      This provides a simple illustration of the scalability of our method. The code (including timing) for this demonstration is now included on our Github repository.

      Reviewer #3:

      Summary:

      Loewinger et al. extend a previously described framework (Cui et al., 2021) to provide new methods for statistical analysis of fiber photometry data. The methodology combines functional regression with linear mixed models, allowing inference on complex study designs that are common in photometry studies. To demonstrate its utility, they reanalyze datasets from two recent fiber photometry studies into mesolimbic dopamine. Then, through simulation, they demonstrate the superiority of their approach compared to other common methods.

      Strengths:

      The statistical framework described provides a powerful way to analyze photometry data and potentially other similar signals. The provided package makes this methodology easy to implement and the extensively worked examples of reanalysis provide a useful guide to others on how to correctly specify models.

      Modeling the entire trial (functional regression) removes the need to choose appropriate summary statistics, removing the opportunity to introduce bias, for example in searching for optimal windows in which to calculate the AUC. This is demonstrated in the re-analysis of Jeong et al., 2022, in which the AUC measures presented masked important details about how the photometry signal was changing.

      Meanwhile, using linear mixed methods allows for the estimation of random effects, which are an important consideration given the repeated-measures design of most photometry studies.

      We would like to thank the reviewer for the deep reading and understanding of our paper and method, and the thoughtful feedback provided. We agree with this summary, and will respond in detail to all the concerns raised.

      Weaknesses:

      While the availability of the software package (fastFMM), the provided code, and worked examples used in the paper are undoubtedly helpful to those wanting to use these methods, some concepts could be explained more thoroughly for a general neuroscience audience.

      Thank you for this point. While we went to great effort to explain things clearly, our efforts to be concise likely resulted in some lack of clarity. To address this, we have created a series of analysis guides for a more general neuroscience audience, reflecting our experience working with researchers at the NIH and the broader community. These guides walk users through the code, its deployment in typical scenarios, and the interpretation of results.

      While the methodology is sound and the discussion of its benefits is good, the interpretation and discussion of the re-analyzed results are poor:

      In section 2.3, the authors use FLMM to identify an instance of Simpson’s Paradox in the analysis of Jeong et al. (2022). While this phenomenon is evident in the original authors’ metrics (replotted in Figure 5A), FLMM provides a convenient method to identify these effects while illustrating the deficiencies of the original authors’ approach of concatenating a different number of sessions for each animal and ignoring potential within-session effects.

      Our goal was to demonstrate that FLMM provides insight into why the opposing within- and between-session effects occur: the between-session and within-session changes appear to occur at different trial timepoints. Thus, while the AUC metrics applied in Jeong et al. (2022) are enough to show the presence of Simpson’s paradox, it is difficult to hypothesize why the opposing within-/between-session effects occur. An AUC analysis cannot determine at what trial timepoints (relative to licking) those opposing trends occur.
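
      To illustrate this point with a deliberately simplified, noise-free simulation (not data or code from the manuscript; the amplitudes, session counts, and variable names are all hypothetical), the sketch below builds trials whose early window grows across sessions while the late window shrinks within a session. Ordinary least squares stands in for FLMM here, so there are no random effects and no confidence intervals.

```python
from statistics import mean

# Hypothetical design: 3 sessions x 20 trials, 50 timepoints per trial.
n_sessions, n_trials, n_time = 3, 20, 50


def slope(x, y):
    """Ordinary least-squares slope of y on a single regressor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# One row per trial: session index, within-session trial index, signal trace.
sessions, trials, traces = [], [], []
for s in range(n_sessions):
    for k in range(n_trials):
        early_amp = 1.0 + 0.5 * s   # early window grows across sessions
        late_amp = 1.0 - 0.02 * k   # late window shrinks within a session
        trace = [early_amp if t < n_time // 2 else late_amp for t in range(n_time)]
        sessions.append(s)
        trials.append(k)
        traces.append(trace)

# Sessions and trials are fully crossed, so the two regressors are
# uncorrelated and each simple slope equals its multiple-regression slope.

# (a) Summary-statistic analysis: regress the per-trial AUC on each covariate.
auc = [sum(trace) for trace in traces]
auc_session_slope = slope(sessions, auc)
auc_trial_slope = slope(trials, auc)

# (b) Pointwise analysis: the same regressions at every trial timepoint.
session_slope_t = [slope(sessions, [tr[t] for tr in traces]) for t in range(n_time)]
trial_slope_t = [slope(trials, [tr[t] for tr in traces]) for t in range(n_time)]

print("AUC slopes (session, trial):", auc_session_slope, auc_trial_slope)
print("Session slope, first vs last timepoint:", session_slope_t[0], session_slope_t[-1])
print("Trial slope, first vs last timepoint:", trial_slope_t[0], trial_slope_t[-1])
```

      The AUC regression reports a positive session slope and a negative trial slope but cannot say where in the trial each arises. The pointwise slopes show that the session effect is confined to the early window and the trial effect to the late window.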

      The discussion of this result is muddled. Having identified the paradox, there is some appropriate speculation as to what is causing these opposing effects, particularly the decrease in sessions. In the discussion and appendices, the authors identify (1) changes in satiation/habituation/motivation, (2) the predictability of the rewards (presumably by the click of a solenoid valve) and (3) photobleaching as potential explanations of the decrease within days. Having identified these effects, but without strong evidence to rule all three out, the discussion of whether RPE or ANCCR matches these results is probably moot. In particular, the hypotheses developed by Jeong et al. were for a random (unpredictable) rewards experiment, whereas the evidence points to the rewards being sometimes predictable. The learning of that predictability (e.g. over sessions) and variation in predictability (e.g. by attention level to sounds of each mouse) significantly complicate the analysis. The FLMM analysis reveals the complexity of analyzing what is apparently a straightforward task design.

      While we are disappointed to hear the reviewer felt our initial interpretations and discussion were poor, the reviewer brings up an excellent point re: potential reward predictability that we had not considered. They have convinced us that acknowledging this alternative perspective will strengthen the paper, and we have added it into the Discussion. We agree that the ANCCR/RPE model predictions were made for unpredictable rewards and, as the reviewer rightly points out, there is evidence that the animals may sense the reward delivery. After discussing extensively with the authors of Jeong et al. (2022), it is clear that they went to enormous trouble to prevent the inadvertent generation of a CS+, and it is likely changes in pressure from the solenoid (rather than a sound) that may have served as a cue. Regardless of the learning theory one adopts (RPE, ANCCR or others), we agree that this potential learned predictability could, at least partially, account for the increase in signal magnitude across sessions. As this paper is focused on analysis methods, we feel that we can contribute most thoughtfully to the dopamine–learning theory conversation by presenting this explanation in detail, for consideration in future experiments. We have substantially edited this discussion and, as per the reviewer’s suggestion, have qualified our interpretations to reflect the uncertainty in explaining the observed trends.

      If this paper is not trying to arbitrate between RPE and ANCCR, as stated in the text, the post hoc reasoning of the authors of Jeong et al 2022 provided in the discussion is not germane. Arbitrating between the models likely requires new experimental designs (removing the sound of the solenoid, satiety controls) or more complex models (e.g. with session effects, measures of predictability) that address the identified issues.

      Thank you for this point. We agree with you that, given the scope of the paper, we should avoid any extensive comparison between the models. To address your comment, we have now removed portions of the Discussion that compared RPE and ANCCR. Overall, we agree with the reviewer, and think that future experiments will be needed for conclusively testing the accuracy of the models’ predictions for random (unpredicted) rewards. While we understand that our description of several conversations with the Jeong et al., 2022 authors could have gone deeper, we hope the reviewer can appreciate that inclusion of these conversations was done with the best of intentions. We wish to emphasize that we also consulted with several other researchers in the field when crafting our discussion. We do commend the authors of Jeong et al., 2022 for their willingness to discuss all these details. They could easily have avoided acknowledging any potential incompleteness of their theory by claiming that our results do not invalidate their predictions for a random reward, because the reward could potentially have been predicted (due to an inadvertent CS+ generated from the solenoid pressure). Instead, they emphasized that they thought their experiment did test a random reward, to the extent they could determine, and that our results suggest components of their theory that should be updated. We think that engagement with re-analyses of one’s data, even when findings are at odds with an initial theoretical framing, is a good demonstration of open science practice. For that reason as well, we feel providing readers with a perspective on the entire discussion will contribute to the scientific discourse in this area.

      Finally, we would like to reiterate that this conversation is happening at least in part because of our method: by analyzing the signal at every trial timepoint, it provides a formal way to test for the presence of a neural signal indicative of reward delivery perception. Ultimately, this was what we set out to do: help researchers ask questions of their data that may have been harder to ask before. We believe that having a demonstration that we can indeed do this for a “live” scientific issue is the most appropriate way of demonstrating the usefulness of the method.

      Of the three potential causes of within-session decreases, the photobleaching arguments advanced in the discussion and expanded greatly in the appendices are not convincing. The data being modeled is a processed signal (∆F/F) with smoothing and baseline correction and this does not seem to have been considered in the argument. Furthermore, the photometry readout is also a convolution of the actual concentration changes over time, influenced by the on-off kinetics of the sensor, which makes the interpretation of timing effects of photobleaching less obvious than presented here and more complex than the dyes considered in the cited reference used as a foundation for this line of reasoning.

      We appreciate the nuance of this point, and we have made considerable efforts in the Results and Discussion sections to caution that alternative hypotheses (e.g., photobleaching) cannot be definitively ruled out. In response to your criticism, we have consulted with more experts in the field regarding the potential for bleaching in this data, and it is not clear to us why photobleaching would be visible in one time-window of a trial, but not at another (less than a second away), despite high ∆F/F magnitudes in both time-windows. We do wish to point out that the Jeong et al. (2022) authors were also concerned about photobleaching as a possible explanation. At their request, we analyzed data from additional experiments, collected from the same animals. In most cases, we did not observe signal patterns that seemed to indicate photobleaching. Given the additional scrutiny, we do not think that photobleaching is more likely to invalidate results in this particular set of experiments than it would be in any other photometry experiment. While the role of photobleaching may be more complicated with this sensor than others in the references, that citation was included primarily as a way of acknowledging that it is possible that non-linearities in photobleaching could occur. Regardless, your point is well taken and we have qualified our description of these analyses to express that photobleaching cannot be ruled out.

      Within this discussion of photobleaching, the characterization of the background reward experiments used in part to consider photobleaching (appendix 7.3.2) is incorrect. In this experiment (Jeong et al., 2022), background rewards were only delivered in the inter-trial-interval (i.e. not between the CS+ and predicted reward as stated in the text). Both in the authors’ description and in the data, there is a 6s before cue onset where rewards are not delivered and while not described in the text, the data suggests there is a period after a predicted reward when background rewards are not delivered. This complicates the comparison of this data to the random reward experiment.

      Thank you for pointing this out! We removed the parenthetical on page 18 of the appendix that incorrectly stated that rewards can occur between the CS+ and the predicted reward.

      The discussion of the lack of evidence for backpropagation, taken as evidence for ANCCR over RPE, is also weak.

      Our point was initially included to acknowledge that, although our method yields results that conflict with the conclusions described by Jeong et al., 2022 on data from some experiments, on other experiments our method supports their results. Again, we believe that a critical part of re-analyzing shared datasets is acknowledging both areas where new analyses support the original results, as well as those where they conflict with them. We agree with the reviewer that qualifying our results so as not to emphasize support for/against RPE/ANCCR will strengthen our paper, and we have made those changes. We have qualified the conclusions of our analysis to emphasize they are a demonstration of how FLMM can be used to answer a certain style of question with hypothesis testing (how signal dynamics change across sessions), as opposed to providing evidence for/against the backpropagation hypothesis.

      A more useful exercise than comparing FLMM to the methods and data of Jeong et al., 2022, would be to compare against the approach of Amo et al., 2022, which identifies backpropagation (data publicly available: DOI: 10.5061/dryad.hhmgqnkjw). The replication of a positive result would be more convincing of the sensitivity of the methodology than the replication of a negative result, which could be a result of many factors in the experimental design. Given that the Amo et al. analysis relies on identifying systematic changes in the timing of a signal over time, this would be particularly useful in understanding if the smoothing steps in FLMM obscure such changes.

      Thank you for this suggestion. Your thoughtful review has convinced us that focusing on our statistical contribution will strengthen the paper, and we made changes to further emphasize that we are not seeking to adjudicate between RPE/ANCCR. Given the length of the manuscript as it stands, we could only include a subset of the analyses conducted on Jeong et al., 2022, and had to relegate the results from the Coddington et al., data to an appendix. Realistically, it would be hard for us to justify including analyses from a third dataset, only to have to relegate them to an appendix. We did include numerous examples in our manuscript where we already replicated positive results, in a way that we believe demonstrates the sensitivity of the methodology. We have also been working with many groups at NIH and elsewhere using our approach, in experiments targeting different scientific questions. In fact, one paper that extensively applies our method, and compares the results with those yielded by standard analysis of AUCs, is already published (Beas et al., 2024). Finally, in our analysis guide we describe additional analyses, not included in the manuscript, that replicate positive results. Hence there are numerous demonstrations of FLMM’s performance in less controversial settings. We take your point that our description of the data supporting one theory or the other should be qualified, and we have corrected that. Specifically for your suggestion of Amo et al. 2022, we have not had the opportunity to personally reanalyze their data, but we are already in contact with other groups who have conducted preliminary analyses of their data with FLMM. We are delighted to see this, in light of your comments and our decision to restrict the scope of our paper. We will help them and other groups working on this question to the extent we can.

      Recommendations for the Authors:

      Reviewer #2:

      First, I would like to commend the authors for the clarity of the paper, and for creating an open-source package that will help researchers more easily adopt this type of analysis.

      Thank you for the positive feedback!

      I would suggest the authors consider adding to the manuscript either some evidence or some intuition on how feasible it would be to use FLMM for very complex model specifications, in terms of computational cost and model convergence.

      Thank you for this suggestion. As we described above in response to Reviewer #2’s Public Reviews, we have added in a demonstration of the scalability of the method. Since our initial manuscript submission, we have further increased the package’s speed (e.g., through further parallelization). We are releasing the updated version of our package on CRAN.

      From my understanding, this package might potentially be useful not just for photometry data but also for two-photon recordings for example. If so, I would also suggest the authors add to the discussion this potential use.

      This is a great point. Our updated manuscript Discussion includes the following:

      “The FLMM framework may also be applicable to techniques like electrophysiology and calcium imaging. For example, our package can fit functional generalized LMMs with a count distribution (e.g., Poisson). Additionally, our method can be extended to model time-varying covariates. This would enable one to estimate how the level of association between signals, simultaneously recorded from different brain regions, fluctuates across trial time-points. This would also enable modeling of trials that differ in length due to, for example, variable behavioral response times (e.g., latency-to-press).”

      Reviewer #3:

      The authors should define ’function’ in context, as well as provide greater detail of the alternate tests that FLMM is compared to in Figure 7.

      We include a description of the alternate tests in Appendix Section 5.2. We have updated the Methods Section (Section 4) to introduce the reader to how ‘functions’ are conceptualized and modeled in the functional data analysis literature. Specifically, we added the following text:

      “FLMM models each trial’s signal as a function that varies smoothly across trial time-points (i.e., along the “functional domain”). It is thus a type of non-linear modeling technique over the functional domain, since we do not assume a linear model (straight line). FLMM and other functional data analysis methods model data as functions, when there is a natural ordering (e.g., time-series data are ordered by time, imaging data are ordered by x-y coordinates), and are assumed to vary smoothly along the functional domain (e.g., one assumes values of a photometry signal at close time-points in a trial have similar values). Functional data analysis approaches exploit this smoothness and natural ordering to capture more information during estimation and inference.”
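
      As a rough illustration of how smoothness along the functional domain can be exploited, the sketch below fits a separate regression at each trial timepoint and then smooths the resulting coefficient curve. It is an assumption-laden toy: the data are simulated, a difference in group means replaces a mixed model, a moving average replaces a penalized smoother, and none of the names come from the fastFMM package.

```python
import math
import random

random.seed(0)
n_trials, n_time = 200, 100
times = [j / (n_time - 1) for j in range(n_time)]
beta_true = [math.sin(math.pi * t) for t in times]  # smooth true effect over the trial

# Hypothetical binary covariate (e.g., a cue type), alternating across trials.
x = [i % 2 for i in range(n_trials)]

# Each trial: covariate effect plus independent noise at every timepoint.
signal = [
    [x[i] * beta_true[j] + random.gauss(0.0, 1.0) for j in range(n_time)]
    for i in range(n_trials)
]


def group_mean(j, level):
    """Mean signal at timepoint j among trials with covariate value `level`."""
    vals = [signal[i][j] for i in range(n_trials) if x[i] == level]
    return sum(vals) / len(vals)


# Step 1: pointwise fits. With one binary covariate, the least-squares
# slope at each timepoint is the difference between the two group means.
beta_raw = [group_mean(j, 1) - group_mean(j, 0) for j in range(n_time)]


# Step 2: smooth the coefficient curve along the functional domain.
def moving_average(values, half_width):
    """Centered moving average; windows are truncated at the edges."""
    out = []
    for j in range(len(values)):
        window = values[max(0, j - half_width): j + half_width + 1]
        out.append(sum(window) / len(window))
    return out


beta_smooth = moving_average(beta_raw, half_width=5)


def mse(estimate):
    """Mean squared error against the true coefficient curve."""
    return sum((e - b) ** 2 for e, b in zip(estimate, beta_true)) / n_time


err_raw, err_smooth = mse(beta_raw), mse(beta_smooth)
print("MSE of raw pointwise estimates:", err_raw)
print("MSE after smoothing:", err_smooth)
```

      Because neighboring timepoints share information when the true effect is smooth, the smoothed curve should track the true coefficient more closely than the raw pointwise estimates in this simulation.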

      Given the novelty of estimating joint CIs, the authors should be clearer about how this should be reported and how this differs from pointwise CIs (and how this has been done in the past).

      We appreciate your pointing this out, as the distinction is nuanced. Our manuscript includes a description of how joint CIs enable one to interpret effects as statistically significant for time-intervals as opposed to individual timepoints. Unlike joint CIs, assessing significance with pointwise CIs suffers from multiple-comparisons problems. As a result of your suggestion, we have added a short discussion of this to our analysis guide (Part 1), entitled “Pointwise or Joint 95% Confidence Intervals.” The Methods section of our manuscript also includes the following:

      “The construction of joint CIs in the context of functional data analysis is an important research question; see Cui et al. (2021) and references therein. Each point at which the pointwise 95% CI does not contain 0 indicates that the coefficient is statistically significantly different from 0 at that point. Compared with pointwise CIs, joint CIs take into account the autocorrelation of signal values across trial time-points (the functional domain). Therefore, instead of interpreting results at a specific timepoint, joint CIs enable joint interpretations at multiple locations along the functional domain. This aligns with interpreting covariate effects on the photometry signals across time-intervals (e.g., a cue period) as opposed to at a single trial time-point. Previous methodological work has provided functional mixed model implementations for either joint 95% CIs for simple random-effects models (Cui et al., 2021), or pointwise 95% CIs for nested models (Scheipl et al., 2016), but to our knowledge, do not provide explicit formulas or software for computing joint 95% CIs in the presence of general random-effects specifications.”
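
      The multiple-comparisons issue can be sketched with a generic simulation, shown below. This is not the formula derived in the manuscript: it assumes a standardized coefficient estimate with AR(1) correlation across 50 timepoints (the correlation value 0.9 and all names are hypothetical) and obtains a joint critical value as the 95th percentile of the maximum absolute value along the path.

```python
import math
import random

random.seed(1)
n_time, n_draws, rho = 50, 5000, 0.9  # hypothetical settings


def null_path():
    """One standardized coefficient path under the null, AR(1)-correlated over time."""
    z = [random.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho ** 2)  # keeps unit variance at every timepoint
    for _ in range(n_time - 1):
        z.append(rho * z[-1] + scale * random.gauss(0.0, 1.0))
    return z


# Maximum absolute deviation along each simulated path, sorted ascending.
max_abs = sorted(max(abs(v) for v in null_path()) for _ in range(n_draws))

q_pointwise = 1.96                                 # usual normal quantile
q_joint = max_abs[math.ceil(0.95 * n_draws) - 1]   # empirical 95th percentile of the maximum

# Share of null paths that leave each band at one or more timepoints.
share_pointwise = sum(m > q_pointwise for m in max_abs) / n_draws
share_joint = sum(m > q_joint for m in max_abs) / n_draws

print("Pointwise critical value:", q_pointwise, "share of paths escaping:", share_pointwise)
print("Joint critical value:", q_joint, "share of paths escaping:", share_joint)
```

      In this setup each band's half-width is its critical value times the standard error, so the joint band is wider. In exchange, it keeps the chance of a false positive anywhere in the interval near 5%, whereas the pointwise band controls that rate only at each timepoint separately.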

      The authors identify that many photometry studies are complex nested longitudinal designs, using the cohort of 8 animals used in five task designs of Jeong et al. 2022 as an example. The authors miss the opportunity to illustrate how FLMM might be useful in identifying the effects of subject characteristics (e.g. sex, CS+ cue identity).

      This is a fantastic point and we have added the following into the Discussion:

      “...[S]ignal heterogeneity due to subject characteristics (e.g., sex, CS+ cue identity) could be incorporated into a model through inclusion of animal-specific random effects.”

      In discussing the delay-length change experiment, it would be more accurate to say that proposed versions of RPE and ANCCR do not predict the specific change.

      Good point. We have made this change.

      Minor corrections:

      Panels are mislabeled in Figure 5.

      Thank you. We have corrected this.

      The Crowder (2009) reference is incorrect, being a review of the book with the book presumably being the correct citation.

      Good catch, thank you! Corrected.

      In Section 5 (first appendix), the authors could include the alternate spelling ’fibre photometry’ to capture any citations that use British English spelling.

      This is a great suggestion, but we did not have time to recreate these figures before re-submission.

      Section 7.4 is almost all quotation, though unevenly using the block quotation formatting. It is unclear why such a large quotation is included.

      Thank you for pointing this out. We have removed this Appendix section (formerly Section 7.4) as the relevant text was already included in the Methods section.

      References

      Sofia Beas, Isbah Khan, Claire Gao, Gabriel Loewinger, Emma Macdonald, Alison Bashford, Shakira Rodriguez-Gonzalez, Francisco Pereira, and Mario A Penzo. Dissociable encoding of motivated behavior by parallel thalamo-striatal projections. Current Biology, 34(7):1549–1560, 2024.

      Erjia Cui, Andrew Leroux, Ekaterina Smirnova, and Ciprian Crainiceanu. Fast univariate inference for longitudinal functional models. Journal of Computational and Graphical Statistics, 31:1–27, 07 2021. doi: 10.1080/10618600.2021.1950006.

      Huijeong Jeong, Annie Taylor, Joseph R Floeder, Martin Lohmann, Stefan Mihalas, Brenda Wu, Mingkang Zhou, Dennis A Burke, and Vijay Mohan K Namboodiri. Mesolimbic dopamine release conveys causal associations. Science, 378(6626):eabq6740, 2022. doi: 10.1126/science.abq6740. URL https://www.science.org/doi/abs/10.1126/science.abq6740.

      Rachel S Lee, Marcelo G Mattar, Nathan F Parker, Ilana B Witten, and Nathaniel D Daw. Reward prediction error does not explain movement selectivity in dms-projecting dopamine neurons. eLife, 8:e42992, apr 2019. ISSN 2050-084X. doi: 10.7554/eLife.42992. URL https://doi.org/10.7554/eLife.42992.

      Fabian Scheipl, Jan Gertheiss, and Sonja Greven. Generalized functional additive mixed models. Electronic Journal of Statistics, 10(1):1455 – 1492, 2016. doi: 10.1214/16-EJS1145. URL https://doi.org/10.1214/16-EJS1145.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Freas et al. investigated whether the exceedingly dim polarization pattern produced by the moon can be used by animals to guide a genuine navigational task. The sun and moon have long been celestial beacons for directional information, but they can be obscured by clouds, canopy, or the horizon. However, even when hidden from view, these celestial bodies provide directional information through the polarized light patterns in the sky. While the sun's polarization pattern is famously used by many animals for compass orientation, until now it has never been shown that the extremely dim polarization pattern of the moon can be used for navigation. To test this, Freas et al. studied nocturnal bull ants by placing a linear polarizer in the homing path of freely navigating ants, shifted 45 degrees relative to the moon's natural polarization pattern. They recorded the homing direction of an ant before entering the polarizer, under the polarizer, and again after leaving the area covered by the polarizer. The results very clearly show that ants walking under the linear polarizer change their homing direction by about 45 degrees in comparison to the homing direction under the natural polarization pattern and change it back after leaving the area covered by the polarizer again. These results can be repeated throughout the lunar month, showing that bull ants can use the moon's polarization pattern even under crescent moon conditions. Finally, the authors show that the degree to which the ants change their homing direction is dependent on the length of their home vector, just as it is for the solar polarization pattern.

      The behavioral experiments are very well designed, and the statistical analyses are appropriate for the data presented. The authors' conclusions are nicely supported by the data and clearly show that nocturnal bull ants use the dim polarization pattern of the moon for homing, in the same way many animals use the sun's polarization pattern during the day. This is the first proof of the use of the lunar polarization pattern in any animal.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to understand whether polarised moonlight could be used as a directional cue for nocturnal animals homing at night, particularly at times of night when polarised light is not available from the sun. To do this, the authors used nocturnal ants, and previously established methods, to show that the walking paths of ants can be altered predictably when the angle of polarised moonlight illuminating them from above is turned by a known angle (here +/- 45 degrees).

      Strengths: 

      The behavioural data are very clear and unambiguous. The results clearly show that when the angle of downwelling polarised moonlight is turned, ants turn in the same direction. The data also clearly show that this result is maintained even for different phases (and intensities) of the moon, although during the waning cycle of the moon the ants' turn is considerably less than may be expected.

      Weaknesses: 

      The final section of the results - concerning the weighting of polarised light cues into the path integrator - lacks clarity and should be reworked and expanded in both the Methods and the Results (also possibly with an extra methods figure). I was really unsure of what these experiments were trying to show or what the meaning of the results actually are.

      Rewrote these sections and added a figure panel to Figure 6.

      Impact: 

      The authors have discovered that nocturnal bull ants while homing back to their nest holes at night, are able to use the dim polarised light pattern formed around the moon for path integration. Even though similar methods have previously shown the ability of dung beetles to orient along straight trajectories for short distances using polarised moonlight, this is the first evidence of an animal that uses polarised moonlight in homing. This is quite significant, and their findings are well supported by their data.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript presents a series of experiments aimed at investigating orientation to polarized lunar skylight in a nocturnal ant, the first report of its kind that I am aware of.

      Strengths: 

      The study was conducted carefully and is clearly explained here. 

      Weaknesses: 

      I have only a few comments and suggestions, that I hope will make the manuscript clearer and easier to understand.

      Time compensation or periodic snapshots 

      In the introduction, the authors compare their discovery with that in dung beetles, which have only been observed to use lunar skylight to hold their course, not to travel to a specific location as the ants must. It is not entirely clear from the discussion whether the authors are suggesting that the ants navigate home by using a time-compensated lunar compass, or that they update their polarization compass with reference to other cues as the pattern of lunar skylight gradually shifts over the course of the night - though in the discussion they appear to lean towards the latter without addressing the former. Any clues in this direction might help us understand how ants adapted to navigate using solar skylight polarization might adapt use to lunar skylight polarization and account for its different schedule. I would guess that the waxing and waning moon data can be interpreted to this effect.

      Added a paragraph discussing this distinction in mechanisms and the limits of the current data set in untangling them. An interesting topic for a follow-up, to be sure.

      Effects of moon fullness and phase on precision 

      As well as the noted effect on shift magnitudes, the distributions of exit headings and reorientations also appear to differ in their precision (i.e., mean vector length) across moon phases, with somewhat shorter vectors for smaller fractions of the moon illuminated. Although these distributions are a composite of the two distributions of angles subtracted from one another to obtain these turn angles, the precision of the resulting distribution should be proportional to the original distributions. It would be interesting to know whether these differences result from poorer overall orientation precision, or more variability in reorientation, on quarter moon and crescent moon nights, and to what extent this might be attributed to sky brightness or degree of polarization.

      See below for response to this and the next reviewer comment

      N.B. The Watson-Williams tests for difference in mean angle are also sensitive to differences in sample variance. This can be ruled out with another variety of the test, also proposed by Watson and Williams, to check for unequal variances, for which the F statistic is F = [(n2-1)*(n1-R1)] / [(n1-1)*(n2-R2)] or its inverse, whichever is >1.

      We have looked at the amount of variance from the mean heading direction in terms of both the shifts and the reorientations and found no significant difference in variance between all relevant conditions. It is possible (and probably likely) that with a higher n we might find these differences but with the current data set we cannot make statistical statements regarding degradations in navigational precision.  

      As an additional analysis to address the Watson-Williams test’s sensitivity to changes in variance, we have added var test comparisons for each of the comparisons, which is a well-established test to compare variance changes. None of these were significantly different, suggesting the observed differences in the WW tests are due to changes in the mean vector and not the distribution. We have added this test to the text.
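
      For readers who want to compute the F statistic given in the reviewer's note, a minimal sketch follows. It implements only that formula, not the var test reported in this response; R1 and R2 are taken to be the resultant vector lengths of the two samples, and the example headings are invented.

```python
import math


def resultant_length(angles_deg):
    """Length R of the summed unit vectors for a sample of headings (degrees)."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s)


def concentration_f(sample1_deg, sample2_deg):
    """F = [(n2-1)(n1-R1)] / [(n1-1)(n2-R2)], or its inverse, whichever is >= 1.

    Raises ZeroDivisionError if a sample has all-identical headings (n - R = 0).
    """
    n1, n2 = len(sample1_deg), len(sample2_deg)
    r1, r2 = resultant_length(sample1_deg), resultant_length(sample2_deg)
    f = ((n2 - 1) * (n1 - r1)) / ((n1 - 1) * (n2 - r2))
    return f if f >= 1.0 else 1.0 / f


# Invented headings: one tightly clustered sample, one more dispersed.
tight = [350, 355, 0, 5, 10]
loose = [300, 330, 0, 30, 60]
f_stat = concentration_f(tight, loose)
print("F statistic:", f_stat)
```

      The statistic is symmetric in the two samples because the larger of F and 1/F is returned. Judging its significance would still require the appropriate F reference distribution, which this sketch leaves out.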

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I have only very few minor suggestions to improve the manuscript: 

      (1) While I fully agree with the authors that their study, to the best of my knowledge, provides the first proof (in any animal) of the use of the moon's polarization pattern, the many repetitions of this fact disturb the flow of the text and could be cut at several instances. 

      Yes, it is indeed repeated to an annoying degree. 

      We have removed these beyond bookending mentions (Abstract and Discussion).

      (2) In my opinion, the authors did not change the "ambient polarization pattern" when using the linear polarization filter (e.g., l. 55, 170, 177 ...). The linear polarizer presents an artificial polarization pattern with a much higher degree of polarization in comparison to the ambient polarization pattern. I would suggest re-phrasing this, to emphasize the artificial nature of the polarization pattern under the polarizer.

      We have made these suggested changes throughout the text to clarify. We no longer say the ambient pattern was   

      (3) Line 377: I do not see the link between the sentence and Figure 7 

      Changed where in the discussion we refer to Figure 7.

      (4) Figure 7 upper part: In my opinion, the upper part of Figure 7 does not add any additional value to the illustration of the data as compared to Figure 5 and could be cut.

      We thought it might be easier for some readers to see the shifts as a dial representation with the shift magnitude converted to 0-100% rather than the shifts in Figure 5. This makes it somewhat like a graphical abstract summarising the whole study.

      I agree that Figure 5 tells the same story but a reader that has little background in directional stats might find Figure 7 more intuitive. This was the intent at least.

      If it becomes a sticking point, then we can remove the upper portion.  

      Reviewer #2 (Recommendations For The Authors): 

      MINOR CORRECTIONS AND QUERIES 

      Line 117: THE majority 

      Corrected

      Lines 129-130: Do you have a reference to support this statement? I am unaware of experiments that show that homing ants count their steps, but I could have missed it.

      We have added the references that unpack the ant pedometer.  

      Line 140: remove "the" in this line. 

      Removed

      Line 170: We need more details here about the spectral transmission properties of the polariser (and indeed which brand of filter, etc.). For instance, does it allow the transmission of UV light?

      Added

      Line 239: "...tested identicALLY to ...." 

      Corrected

      Lines 242-258 (Vector testing): I must admit I found the description of these experiments very difficult to follow. I read this section several times and felt no wiser as a result. I think some thought needs to be given to better introduce the reader to the rationale behind the experiment (e.g., start by expanding lines 243-246, and maybe add a methods figure that shows the different experimental procedures).

      I have rewritten this section of the methods to state the experimental rationale clearly and to make the methodology easier to follow.

      Also added a methods panel to Figure 6.

      Line 247: "reoriented only halfway". What does this mean? Do you mean with half the expected angle?

      Yes, this is a bit unclear. We have altered for clarity:

      ‘only altered their headings by about half of the 45° e-vector shift (25.2° ± 3.7°), despite being tested on near-full-moon nights.’

      Results section (in general): In Figure 1 (which is a very nice figure!) you go to all the trouble of defining b degrees (exit headings) and c degrees (reorientation headings), which are very intuitive for interpreting the results, and then you totally abandon these convenient angles in favour of an amorphous Greek symbol Phi (Figs. 2-6) to describe BOTH exit and reorientation headings. Why?? It becomes even more confusing when headings described by Phi can be typically greater than 300 degrees in the figures, but they are never even close to this in the text (where you seem to have gone back to using the b degrees and c degrees angles, without explicitly saying so). Personally, I think the b degrees and c degrees angles are more intuitive (and should be used in both the text and the figures), but if you do insist on using Phi then you should use it consistently in both the text and the figures. 

      Replaced Phi with b° and c° for both figures and in the text.

      Finally, for reorientation angles in Figure 4A, you say that the angle is 16.5 degrees. This angle should have been 143.5 degrees to be consistent with other figures. 

      Yes, the reorientation was erroneously copied from the shift data (it is identical in both the +45 shift and reorientation for Figure 4A). This has now been corrected

      Line 280, and many other lines: Wherever you refer to two panels of the same figure, they should be written as (say) Figure 2A, B not Figure 2AB.

      Changed as requested throughout the text.

      Line 295 (Waxing lunar phases): For these experiments, which nest are you using? 1 or 2?

      We have added that this is nest 1. 

      Figure 3B: The title of this panel should be "Waxing Crescent Moon" I think. 

      Ah yes, this is incorrect in the original submission. I have fixed this.

      Lines 312-313: Here it sounds as though the ants went right back to the full +/- 45 degrees orientations when they clearly didn't (it was -26.6 degrees and 189.9 degrees). Maybe tone the language down a bit here.

      Changed this to make clear the orientation shift is only ‘towards’ the ambient lunar e-vector.

      Line 327: Insert "see" before "Figure 5" 

      Added

      Line 329: See comment for Line 295. 

      We have added that this is nest 1. 

      Lines 357-373 (Vector testing): Again, because of the somewhat confusing methods section describing these experiments, these results were hard to follow, both here and in the Discussion. I don't really understand what you have shown here. Re-think how you present this (and maybe re-working the Methods will be half the battle won). 

      I have rewritten these sections to try to make clear that these are ants tested with different vector lengths (6m vs. 2m) at the same location. Hopefully this is much clearer, but if these portions remain confusing, a full rename of the conditions may be in order. Something like long vector and short vector would help, but comes with the problem of not truly describing the purpose of the test, which is to control for location; hence the current condition names. As it stands, I hope the new clarifications adequately describe the reasoning while keeping the condition names. Of course, I am happy to make more changes here, as making this clear to readers is important for driving home that the path integrator is in play.

      See the current change to the Results as an example: ‘Both foragers with a long ~6m remaining vector (Halfway Release), or a short ~2m remaining vector (Halfway Collection & Release), tested at the same location, exhibited significant shifts to the right of initial headings when the e-vector was rotated clockwise +45°.’

      Line 361: I think this should be 16.8 not 6.8 

      Yes, you are correct. Fixed in text (16.8).

      Line 365: I think this should be -12.7 not 12.7 

      Yes, you are correct. Fixed in text (–12.7).

      Line 408: "morning twilight". Should this be "morning solar twilight"? Plus "M midas" should be "M. midas"

      Added and fixed respectively.

      Line 440. "location" is spelt wrong. 

      Fixed spelling.

      Line 444: "...WITH longer accumulated vectors, ..." 

      Added ‘with’ to sentence. 

      Line 447: Remove "that just as"

      Removed.

      Line 448: "Moonlight polarised light" should be "Polarised moonlight" 

      Corrected.

      Lines 450-453: This sentence makes little sense scientifically or grammatically. A "limiting factor" can't be "accomplished". Please rephrase and explain in more detail.

      This sentence has been rephrased:

      ‘The limiting factors to lunar cue use for navigation would instead be the ant’s detection threshold to either absolute light intensity, polarization sensitivity and spectral sensitivity. Moonlight is less UV rich compared to direct sunlight and the spectrum changes across the lunar cycle (Palmer and Johnsen 2015).’

      Line 474: Re-write as "... due to the incorporation of the celestial compass into the path integrator..."

      Added.

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments 

      Line 84 I am not sure that we can infer attentional processes in orientation to lunar skylight, at least it has not yet been investigated.

      Yes, this is a good point. We have changed ‘attend’ to ‘use’.  

      Line 90 This description of polarized light is a little vague; what is meant by the phrase "waves which occur along a single plane"? (What about the magnetic component? These waves can be redirected, are they then still polarized? Circular polarization?). I would recommend looking at how polarized light is described in textbooks on optics.

      Response: We have rewritten the polarised light section to be clearer using optics and light physics for background. 

      Line 92 The phrase "e-vector" has not been described or introduced up to this point.

      We now introduce e-vector and define it. 

      ‘Polarised light comprises light waves which occur along a single plane and are produced as a by-product of light passing through the upper atmosphere (Horváth & Varjú 2004; Horváth et al., 2014). The scattering of this light creates an e-vector pattern in the sky, which is arranged in concentric circles around the sun or moon's position with the maximum degree of polarisation located 90° from the source. Hence when the sun/moon is near the horizon, the pattern of polarised skylight is particularly simple with uniform direction of polarisation approximately parallel to the north-south axes (Dacke et al., 1999, 2003; Reid et al. 2011; Zeil et al., 2014).’

      Happy to make further changes as well.  

      Line 107 Diurnal dung beetles can also orient to lunar skylight if roused at night (Smolka et al., 2016), provided the sky is bright enough. Perhaps diurnal ants might do the same?

      Added the diurnal dung beetles mention as well as the reference.

      Also, a very good suggestion using diurnal bull ants.

      Line 146 Instead of lunar calendar the authors appear to mean "lunar cycle". 

      Changed

      Line 165 In Figure 1B, it looks like visual access to the sky was only partly "unobstructed". Indeed foliage covers at least part of the sky right up to the zenith.

      We have added that the sky is partially obstructed. 

      Line 179 This could also presumably be checked with a camera? 

      For this testing we tried to keep equipment to a minimum for a single researcher walking to and from the field site given the lack of public transport between 1 and 4am. But yes, for future work a camera based confirmation system would be easier. 

      Line 243 The abbreviation "PI" has not been described or introduced up to this point.

      Changed to ‘path integration derived vector lengths….’

      Line 267 The method for comparing the leftwards and rightwards shifts should be described in full here (presumably one set of shifts was mirrored onto the other?).

      We have added the below description to indicate the full description of the mirroring done to counterclockwise shifts.

      ‘To assess shift magnitude between −45° and +45° foragers within conditions, we calculated the mirror of each shift in the −45° condition, allowing shift magnitude comparisons within each condition. Each −45° shift was mirrored across the 0° to 180° axis and then compared with the corresponding unaltered +45° condition.’
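
      The mirroring step described above can be sketched as follows. This is only an illustration under stated assumptions: headings are in degrees on a 0-360° circle, reflection across the 0° to 180° axis maps an angle θ to (360° − θ), and the function names and sample values are hypothetical rather than taken from the study's analysis scripts.

```python
import math


def mirror_across_0_180(angle_deg):
    """Reflect a heading across the 0-180 degree axis (theta -> 360 - theta)."""
    return (-angle_deg) % 360.0


def circular_mean_deg(angles_deg):
    """Mean direction of a set of headings, returned in [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


# Hypothetical counterclockwise shifts from a -45 degree condition.
minus45_shifts = [330.0, 340.0, 320.0]

# After mirroring, the shifts lie on the clockwise side and can be
# compared with the shifts from the matching +45 degree condition.
mirrored = [mirror_across_0_180(a) for a in minus45_shifts]
print(mirrored, circular_mean_deg(mirrored))
```

      The mirrored values keep their magnitude and only change side, which is what allows a direct comparison of shift size between the two rotation directions.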

      Discussion Might the brightness and spectrum of lunar skylight also play a role here?

      We have added a section to the discussion to mention the aspects of moonlight which may be important to these animals, including the spectrum, brightness and polarisation intensity.  

      Line 451 The sensitivity threshold to absolute light intensity would not be the only limiting factor here. Polarization sensitivity and spectral sensitivity may also play a role (moonlight is less UV rich than sunlight and the spectrum of twilight changes across the lunar cycle: Palmer & Johnsen, 2015). 

      Added this clarification.

      Line 478 Instead of the "masculine ordinal" symbol used (U+006F) here a degree symbol (U+00B0) should be used.

      Ah thank you, we have replaced this everywhere in the text.  

      Line 485 It should be possible to calculate the misalignment between polarization pattern before and after this interruption of celestial cues. Does the magnitude of this misalignment help predict the size of the reorientation?

      Reorientations are highly correlated with the shift size under the filter, which makes sense as larger shifts mean that foragers need to turn back more to reorient to both the ambient pattern and to return to their visual route. Reorientation sizes do not show a consistent reduction compared to under-the-filter shifts when the lunar phase is low and is potentially harder to detect.

      I have reworked this line in the text, as I do not think there is much evidence for misalignment. It might be more precise to say that overnight periods where the moon is not visible may adversely impact the path integrator estimate, though the full impact of this celestial cue gap, and whether other cues might also play a role, is currently unknown.

      Line 642 "from their" should be "relative to" 

      Changed as requested

      Figure 1B Some mention should be made of the differences in vegetation density. 

      Added a sentence to the figure caption discussing the differences in both vegetation along the horizon and canopy cover.

      Figures 2-6 A reference line at 0 degrees change might help the reader to assess the size of orientation changes visually. Confidence intervals around the mean orientation change would also help here.

      We have now added circular grid lines and confidence intervals to the circular plots. These should help make the heading changes clear to readers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study is a companion to a paper introducing a theoretical framework and methodology for identifying Cancer Driving Nucleotides (CDNs). While the evidence that recurrent SNVs or CDNs are common in true cancer driver genes is solid, the evidence that many more undiscovered cancer driver mutations will have CDNs, and that this approach could identify these undiscovered driver genes with about 100,000 samples, is limited. 

      Same criticism as in the eLife assessment of eLife-RP-RA-2024-99340 (https://elifesciences.org/reviewed-preprints/99340). Hence, please refer to the responses to the companion paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      The study investigates Cancer Driving Nucleotides (CDNs) using the TCGA database, finding that these recurring point mutations could greatly enhance our understanding of cancer genomics and improve personalized treatment strategies. Despite identifying 50-150 CDNs per cancer type, the research reveals that a significant number remain undiscovered, limiting current therapeutic applications, and underscoring the need for further larger-scale research.

      Strengths:

      The study provides a detailed examination of cancer-driving mutations at the nucleotide level, offering a more precise understanding than traditional gene-level analyses. The authors found a significant number of CDNs remain undiscovered, with only 0-2 identified per patient out of an expected 5-8, indicating that many important mutations are still missing. The study indicated that identifying more CDNs could potentially significantly impact the development of personalized cancer therapies, improving patient outcomes.

      Weaknesses:

      The study is constrained by relatively small sample sizes for each cancer type, which reduces the statistical power and robustness of the findings. ICGC and other large-scale WGS datasets are publicly available but were not included in this study.

      Thanks. We have indeed used all public data, including GENIE (Figure 7 of the companion paper), ICGC and other integrated resources such as COSMIC. The main study is based on TCGA because it is unbiased for estimating the probability of CDN occurrences. In many datasets, the numerators are given but the denominators are not (the number of patients with the mutation / the total number of patients surveyed). In GENIE, we observed that E(u) estimated from the given sequencing panels is much smaller than in TCGA. This might be due to the selective reporting of nonsynonymous mutations, since synonymous mutations are generally considered irrelevant in tumorigenesis.

      To be able to identify rare driver mutations, more samples are needed to improve the statistical power, which is well-known in cancer research. The challenges in direct functional testing of CDNs due to the complexity of tumor evolution and unknown mutation combinations limit the practical applicability of the findings.

      We fully agree. We have now added a few sentences making clear that the theory allows us to see how much more can be gained by each stepwise increase in sample size. For example, when the sample size reaches 10^6, further increases will yield almost no gain in confidence of CDNs identified (see the figures of eLife-RP-RA-2024-99340). As pointed out in our provisional responses, an important strength of this pair of studies is that the results are testable. The complexity lies in the combination of mutations required for tumorigenesis, and the identification of such combinations is the main goal and strength of this pair of studies. We have added a few sentences to this effect.

      While the importance of large sample sizes in identifying cancer drivers is well-recognized, the analytical framework presented in the companion paper (https://elifesciences.org/reviewed-preprints/99340) goes a step further by quantitatively elucidating the relationship between sample size and the resolution of CDN detection.

      The question is very general as it is about multigene interactions, or epistasis. The challenges apply to all aspects of evolutionary biology, for example, the genetics of reproductive isolation (Wu and Ting 2004). The issue of epistasis is difficult because most, if not all, of the underlying mutations have to be identified in order to carry out functional tests. While the full identification is rarely feasible, it is precisely the objective of the CDN project. When the sample size increases to 100,000 for a cancer type, all point mutations for that cancer type should be identifiable.

      The QC of the TCGA data was not very strict, i.e., "patients with more than 3000 coding region point mutations were filtered out as potential hypermutator phenotypes", it would be better to remove patients beyond +/- 3*S.D from the mean number of mutations for each cancer type. Given some point mutations with >3 hits in the TCGA dataset, they were just false positive mutation callings, particularly in the large repeat regions in the human genome.

      Thanks. The GDC data portal offers data calls from multiple pipelines, enabling us to select mutations detected by at least two pipelines. While including patients with hypermutator phenotypes could introduce potential noise, as shown in Eq. 10 of the main text, our method for defining the upper limit of i* is relatively robust to fluctuations in the E(u) of the corresponding cancer population. Since readers may often ask about this, we have expanded the Methods section somewhat to emphasize this point.
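
      To make the two filtering rules in this exchange concrete, here is a minimal sketch comparing the fixed cutoff (more than 3000 coding-region point mutations) with the reviewer's suggested mean ± 3 S.D. rule. The per-patient counts and the function names are hypothetical, and this is not the pipeline used in the study.

```python
import statistics


def keep_by_fixed_cutoff(counts, cutoff=3000):
    """Keep patients whose mutation count does not exceed a fixed cutoff."""
    return [c for c in counts if c <= cutoff]


def keep_by_sd_rule(counts, k=3.0):
    """Keep patients within k sample standard deviations of the mean count."""
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)
    return [c for c in counts if abs(c - mean) <= k * sd]


# Hypothetical coding-region point-mutation counts for one cancer type;
# the last patient stands in for a hypermutator.
counts = [50, 80, 120, 60, 90, 200, 150, 70, 110, 100, 95, 85, 130, 140, 5000]

print(len(keep_by_fixed_cutoff(counts)), len(keep_by_sd_rule(counts)))
```

      In this toy example both rules drop only the extreme patient. The S.D. rule is itself sensitive to extreme values, because outliers inflate both the mean and the S.D. that define the cutoff.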

      The codes for the statistical calculation (i.e., calculation of Ai_e, et al) are not publicly available, which makes the findings hard to be replicated.

      We have now updated the section of “Data Availability” in both papers. The key scripts for generating the major results are available at: https://gitlab.com/ultramicroevo/cdn_v1.

      Reviewer #2 (Public Review):

      Summary:

      The study proposes that many cancer driver mutations are not yet identified but could be identified if they harbor recurrent SNVs. The paper leverages the analysis from Paper #1 that used quantitative analysis to demonstrate that SNVs or CDNs seen 3 or more times are more likely to occur due to selection (ie a driver mutation) than they are to occur by chance or random mutation.

      Strengths:

      Empirically, mutation frequency is an excellent marker of a driver gene because canonical driver mutations typically have recurrent SNVs. Using the TCGA database, the paper illustrates that CDNs can identify canonical driver mutations (Figure 3) and that most CDNs are likely to disrupt protein function (Figure 2). In addition, CDNs can be shared between cancer types (Figure 4).

      Weaknesses:

      Driver alteration validation is difficult, with disagreements on what defines a driver mutation, and how many driver mutations are present in a cancer. The value proposed by the authors is that the identification of all driver genes can facilitate the design of patient-specific targeting therapies, but most targeted therapies are already directed towards known driver genes. There is an incomplete discussion of oncogenes (where activating mutations tend to target a single amino acid or repeat) and tumor suppressor genes (where inactivating mutations may be more spread across the gene). Other alterations (epigenetic, indels, translocations, CNVs) would be missed by this type of analysis.

      The above paragraph has three distinct points. We shall respond one by one.

      First, …  can facilitate the design of patient-specific targeting therapies, but most targeted therapies are already directed towards known driver genes…

      In the Discussion, we state the following, which shows that only a few of the best-known driving mutations have been targeted. It is accurate to say that < 5% of the CDNs we have identified are on the current targeting list. Furthermore, the list we have compiled is < 10% of what we expect to find.

      Direct functional test of CDNs would be to introduce putative cancer-driving mutations and observe the evolution of tumors. Such a task of introducing multiple mutations that are collectively needed to drive tumorigenesis has been done only recently, and only for the best-known cancer driving mutations (Ortmann et al. 2015; Takeda et al. 2015; Hodis et al. 2022). In most tumors, the correct combination of mutations needed is not known. Clearly, CDNs, with their strong tumorigenic strength, are suitable candidates.

      Second, “There is an incomplete discussion of oncogenes (where activating mutations tend to target a single amino acid or repeat) and tumor suppressor genes (where inactivating mutations may be more spread across the gene).”

      We sincerely thank the reviewer for this insightful comment. Below are two new paragraphs in the Discussion pertaining to the point:

      In this context, we should comment on the feasibility of targeting CDNs that may occur in either oncogenes (ONCs) or tumor suppressor genes (TSGs). It is generally accepted that ONCs drive tumorigenesis thanks to gain-of-function (GOF) mutations whereas TSGs derive their tumorigenic powers from loss-of-function (LOF) mutations. It is worthwhile to point out that, since LOF mutations are likely to be more widespread on a gene, CDNs are biased toward GOF mutations. The often even distribution of nonsense mutations along the length of TSGs provides such evidence. As gene targeting aims to diminish gene functions, GOF mutations are perceived to be targetable whereas LOF mutations are not. By extension, ONCs should be targetable but TSGs should not. This last assertion is not true because mutations on TSGs may often be of the GOF kind as well.

      The data often suggest that missense mutations on TSGs are of the GOF kind. If missense mutations are far more prevalent than nonsense mutations in tumors, the missense mutations cannot possibly be LOF mutations. (After all, it is not possible to lose more functions than nonsense mutations.) For example, AAA to CAA (K to Q) is a missense mutation while AAA to TAA (K to stop) is a nonsense mutation. In a separate study (referred to as the escape-route analysis), we found many cases where the missense mutations on TSGs are more prevalent (> 10X) than nonsense mutations. Another well-known example is the distribution of nonsense mutations on TSGs. For example, on APC, a prominent TSG, nonsense mutations are far more common in the middle 20% of the gene than in the rest (Zhang and Shay 2017; Erazo-Oliveras et al. 2023). The pattern suggests that even these nonsense mutations could have GOF properties. 
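
      The missense/nonsense distinction drawn here can be checked against the standard genetic code. The sketch below is illustrative only: the function name is hypothetical, and the table is the standard nuclear code written out in the usual TCAG codon order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' is stop.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_point_mutation(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"      # amino acid replaced by a stop codon
    if ref_aa == "*":
        return "stop-loss"     # stop codon replaced by an amino acid
    return "missense"          # one amino acid replaced by another


print(classify_point_mutation("AAA", "CAA"))  # K -> Q, missense
print(classify_point_mutation("AAA", "TAA"))  # K -> stop, nonsense
print(classify_point_mutation("AAA", "AAG"))  # K -> K, synonymous
```

      Tallying these classes per gene is one simple way to compare the prevalence of missense and nonsense mutations on a TSG.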

      The following response is about the clinical implications of our CDN analysis. Canonical targeted therapy often relies on tyrosine kinase inhibitors (TKIs) (Dang et al. 2017; Danesi et al. 2021; Waarts et al. 2022). Theoretically, any intervention that suppresses the expression of gain-of-function (GOF) CDNs could have therapeutic value in cancer treatment. This leads us to a discussion of oncogenes versus TSGs in the context of GOF / LOF (loss-of-function) mutations. First, not all mutations on oncogenes have an oncogenic effect, and truncated mutations in oncogenes are often subject to negative selection (Bányai et al. 2021); the identification of CDNs within oncogenes is therefore crucial for developing effective cancer treatment guidelines. Second, while TSGs are generally believed to promote cancer development via loss-of-function mutations, research suggests that certain mutations within TSGs can have GOF-like effects, such as the dominant-negative effect of truncated TP53 mutations (Marutani et al. 1999; de Vries et al. 2002; Gerasimavicius et al. 2022). Characterizing driver mutations as GOF or LOF mutations could potentially expand the scope of targeted cancer therapy. We will address this issue in a third study in preparation.

      The method could be more valuable when applied to the noncoding genome, where driver mutations in promoters or enhancers are relatively rare, or as yet to be discovered. Increasingly more cancers have had whole genome sequencing. Compared to WES, criteria for driver mutations in noncoding regions are less clear, and this method could potentially provide new noncoding driver CDNs. Observing the same mutation in more than one cancer specimen is empirically unusual, and the authors provide a solid quantitative analysis that indicates many recurrent mutations are likely to be cancer-driver mutations.

      Again, we are grateful for the comments which prompt us to expand a paragraph in Discussion, reproduced below.

      The CDN approach has two additional applications. First, it can be used to find CDNs in non-coding regions. Although the number of whole genome sequences at present is still insufficient for systematic CDN detection, the preliminary analysis suggests that the density of CDNs in non-coding regions is orders of magnitude lower than in coding regions. Second, CDNs can also be used in cancer screening with the advantage of efficiency as the targeted mutations are fewer. For the same reason, the false negative rate should be much lower too. Indeed, the false positive rate should be far lower than the gene-based screen which often shows a false positive rate of >50% (supplement File S1).

      Again, we are grateful that Reviewer #2 has addressed the potential value of our study in finding cancer drivers in non-coding regions. A major challenge in this area lies in defining the appropriate L value as presented in Eq. 10. In the main text, we used a gamma distribution to account for the variability of mutation rates across sites in coding regions. For non-coding regions, we will categorize the regions based on biological annotations. The goal is to set different i* cutoffs for different genomic regions (such as heterochromatin / euchromatin, GC-rich regions or centromeric regions), and to avoid false positive CDN calls in repeated regions (Elliott and Larsson 2021; Peña et al. 2023).

      References

      Bányai L, Trexler M, Kerekes K, Csuka O, Patthy L. 2021. Use of signals of positive and negative selection to distinguish cancer genes and passenger genes. Elife 10:e59629.

      Danesi R, Fogli S, Indraccolo S, Del Re M, Dei Tos AP, Leoncini L, Antonuzzo L, Bonanno L, Guarneri V, Pierini A, et al. 2021. Druggable targets meet oncogenic drivers: opportunities and limitations of target-based classification of tumors and the role of Molecular Tumor Boards. ESMO Open 6:100040.

      Dang CV, Reddy EP, Shokat KM, Soucek L. 2017. Drugging the “undruggable” cancer targets. Nat Rev Cancer 17:502–508.

      Elliott K, Larsson E. 2021. Non-coding driver mutations in human cancer. Nat Rev Cancer 21:500–509.

      Erazo-Oliveras A, Muñoz-Vega M, Mlih M, Thiriveedi V, Salinas ML, Rivera-Rodríguez JM, Kim E, Wright RC, Wang X, Landrock KK, et al. 2023. Mutant APC reshapes Wnt signaling plasma membrane nanodomains by altering cholesterol levels via oncogenic β-catenin. Nat Commun 14:4342.

      Gerasimavicius L, Livesey BJ, Marsh JA. 2022. Loss-of-function, gain-of-function and dominant-negative mutations have profoundly different effects on protein structure. Nat Commun 13:3895.

      Hodis E, Triglia ET, Kwon JYH, Biancalani T, Zakka LR, Parkar S, Hütter J-C, Buffoni L, Delorey TM, Phillips D, et al. 2022. Stepwise-edited, human melanoma models reveal mutations’ effect on tumor and microenvironment. Science 376:eabi8175.

      Marutani M, Tonoki H, Tada M, Takahashi M, Kashiwazaki H, Hida Y, Hamada J, Asaka M, Moriuchi T. 1999. Dominant-negative mutations of the tumor suppressor p53 relating to early onset of glioblastoma multiforme. Cancer Res 59:4765–4769.

      Ortmann CA, Kent DG, Nangalia J, Silber Y, Wedge DC, Grinfeld J, Baxter EJ, Massie CE, Papaemmanuil E, Menon S, et al. 2015. Effect of Mutation Order on Myeloproliferative Neoplasms. N Engl J Med 372:601–612.

      Peña MV de la, Summanen PAM, Liukkonen M, Kronholm I. 2023. Chromatin structure influences rate and spectrum of spontaneous mutations in Neurospora crassa. Genome Res. 33:599–611.

      Takeda H, Wei Z, Koso H, Rust AG, Yew CCK, Mann MB, Ward JM, Adams DJ, Copeland NG, Jenkins NA. 2015. Transposon mutagenesis identifies genes and evolutionary forces driving gastrointestinal tract tumor progression. Nat Genet 47:142–150.

      de Vries A, Flores ER, Miranda B, Hsieh H-M, van Oostrom CThM, Sage J, Jacks T. 2002. Targeted point mutations of p53 lead to dominant-negative inhibition of wild-type p53 function. Proceedings of the National Academy of Sciences 99:2948–2953.

      Waarts MR, Stonestrom AJ, Park YC, Levine RL. 2022. Targeting mutations in cancer. J Clin Invest 132:e154943.

      Wu C-I, Ting C-T. 2004. Genes and speciation. Nat Rev Genet 5:114–122.

      Zhang L, Shay JW. 2017. Multiple Roles of APC and its Therapeutic Implications in Colorectal Cancer. JNCI: Journal of the National Cancer Institute 109:djw332.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment This valuable paper reports a theoretical framework and methodology for identifying Cancer Driving Nucleotides (CDNs), primarily based on single nucleotide variant (SNV) frequencies. A variety of solid approaches indicate that a mutation recurring three or more times is more likely to reflect selection rather than being the consequence of a mutation hotspot. The method is rigorously quantitative, though the requirement for larger datasets to fully identify all CDNs remains a noted limitation. The work will be of broad interest to cancer geneticists and evolutionary biologists. 

      The key criticism, “the requirement for larger datasets to fully identify all CDNs remains a noted limitation”, is also found in both reviews. We have clarified the issue in the main text, and the relevant parts are copied below. The response below also addresses many comments in the reviews. In addition, the Discussion of eLife-RP-RA-2024-99341 has been substantially expanded to answer the questions of Reviewer 2.

      We shall answer the boldface comment in three ways. First, it can be answered using GENIE data. Fig. 7 of the main text (eLife-RP-RA-2024-99340) shows that, when n increases from ~ 1000 to ~ 9,000, the numbers of discovered CDNs increase by 3 – 5 fold, most of which come from the two-hit class. Hence, the power of discovering more CDNs with larger datasets is evident. By extrapolation, a sample size of 100,000 should be able to yield 90% of all CDNs, as calculated here. (Fig. 7 also addresses the queries of whether we have used datasets other than TCGA. We indeed have used all public data, including GENIE and COSMIC.) 

      Second, the power of discovering more cancer driver genes by our theory is evident even without using larger datasets. Table 3 of the companion study (eLife-RP-RA-2024-99341) shows that, averaged across cancer types, the conventional method would identify 45 CDGs while the CDN method tallies 258 CDGs. The power of the CDN method is demonstrated. This is because the conventional approach has to identify CDGs (cancer driver genes) in order to identify the CDNs they carry. However, many CDNs occur in non-CDGs and are thus missed by the conventional approach. In Supplementary File S2, we have included a full list of CDNs discovered in our study, along with population allele frequency annotations from gnomAD. The distribution patterns of these CDNs across different cancer types show their pan-cancer properties as further explored in the companion paper.

      Third, while many, or even most CDNs occur in non-CDGs and are thus missed, the conventional approach also includes non-CDN mutations in CDGs. This is illustrated in Fig. 5 of the companion study (eLife-RP-RA-2024-99341) that shows the adverse effect of misidentifications of CDNs by the conventional approach. In that analysis, the gene-targeting therapy is effective if the patient has the CDN mutations on EGFR, but the effect is reversed if the EGFR mutations are non-CDN mutations.

      Reviewer #1 (Public Review):

      The authors developed a rigorous methodology for identifying all Cancer Driving Nucleotides (CDNs) by leveraging the concept of massively repeated evolution in cancer. By focusing on mutations that recur frequently in pan-cancer, they aimed to differentiate between true driver mutations and neutral mutations, ultimately enhancing the understanding of the mutational landscape that drives tumorigenesis. Their goal was to call a comprehensive catalogue of CDNs to inform more effective targeted therapies and address issues such as drug resistance.

      Strengths

      (1) The authors introduced a concept of using massively repeated evolution to identify CDNs. This approach recognizes that advantageous mutations recur frequently (at least 3 times) across cancer patients, providing a lens to identify true cancer drivers.

      (2) The theory showed the feasibility of identifying almost all CDNs if the number of sequenced patients increases to 100,000 for each cancer type.

      Weaknesses

      (1) The methodology remains theoretical and no novel true driver mutations were identified in this study.

      We are grateful for this criticism and address it below.

      The second part of the criticism (no novel true driver mutations were identified in this study) has been answered in the long responses to the eLife assessment above. The first part, “The methodology remains theoretical”, is somewhat unclear; it may simply lead into the second part. Just in case, we interpret the word “theoretical” to mean “the lack of experimental proof” and answer below.

      As Reviewer #1 noted, a common limitation of theoretical and statistical analyses of cancer drivers is the need to validate their selective advantage through in vitro or in vivo functional testing. This concern is echoed by both reviewers in the companion paper (eLife-RP-RA-2024-99341), prompting us to consider the methodology for functional testing of potential cancer drivers. An intuitive approach would involve introducing putative driver mutations into normal cells and observing phenotypic transformation in vitro and in vivo. In a recent stepwise-edited human melanoma model, Hodis et al. demonstrated that disease-relevant phenotypes depend on the “correct” combinations of multiple driver mutations (Hodis et al. 2022). Other high-throughput strategies can be broadly categorized into two approaches: (1) introducing candidate driver mutations into pre-malignant model systems that already harbor a canonical mutant driver (Drost and Clevers 2018; Grzeskowiak et al. 2018; Michels et al. 2020) and (2) introducing candidate driver mutations into growth factor-dependent cell models and assessing their impact on resulting fitness (Bailey et al. 2018; Ng et al. 2018). The underlying assumption of these strategies is that the fitness outcomes of candidate driver mutations are influenced by pre-existing driver mutations and the specific pathways or cancer hallmarks being investigated. This confines the functional test of potential cancer driver mutations to conventional cancer pathways. A comprehensive identification of CDNs is therefore crucial to overcome these limitations. In conjunction with other driver signal detection methods, our study aims to provide a more comprehensive profile of driver mutations, thereby enabling the functional testing of drivers involved in non-conventional cancer evolution pathways.

      (2) Different cancer types have unique mutational landscapes. The methodology, while robust, might face challenges in uniformly identifying CDNs across various cancers with distinct genetic and epigenetic contexts.

      We appreciate the comment. Indeed, different cancer types should have different genetic and epigenetic landscapes. In that case, one might have expected CDNs to be poorly shared among cancer types. However, as reported in Fig. 4 of the companion study, the sharing of CDNs across cancer types is far more common than the sharing of CDGs (Cancer Driving Genes). We suggest that CDNs have a much higher resolution than CDGs, in which the signals are diluted by non-driver mutations. In other words, although the mutational landscape may be cancer-type specific, the pan-cancer selective pressure may be sufficiently high to permit the detection of CDN sharing among cancer types.

      Below, we respond in greater detail. Epigenetic factors, such as chromatin states, methylation/acetylation levels, and replication timing, can provide valuable insights when analyzing mutational landscapes at a regional scale (Stamatoyannopoulos et al. 2009; Lawrence et al. 2013; Makova and Hardison 2015; Baylin and Jones 2016; Alexandrov et al. 2020; Abascal et al. 2021; Sherman et al. 2022). However, at the site-specific level, the effectiveness of these covariates in predicting mutational landscapes depends on their integration into a detailed model. Overemphasizing these covariates could lead to false negatives for known driver mutations (Hess et al. 2019; Elliott and Larsson 2021). In Figure 3B of the main text, we illustrate the discrepancy between the mutation rate predictions from Dig and empirical observation. Ideally, no covariates would be needed with sufficiently large sample sizes: each mutable genomic site would then carry enough mutations to reach statistical significance and, consequently, synonymous mutations would suffice to characterize the mutational landscape. In this sense, the integration of mutational covariates represents a compromise under current sample sizes. In our study, the effect of unique mutational landscapes is captured by E(u), the mean mutation rate for each cancer type. We further accounted for the variability of site-level mutability using a gamma distribution. The primary goal of our study is to determine the upper limit of mutation recurrences under mutational mechanisms only. While selection acts blindly with respect to genomic features, mutational hotspots should exhibit common characteristics determined by their underlying mechanisms. In the main text, we attempted to identify such shared features among CDNs. Until these mutational mechanisms are fully understood, CDNs should be considered as potential driver mutations.
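      As an illustration only, the idea of an upper limit on recurrence under mutation alone can be sketched with a gamma-Poisson mixture. The sketch assumes that the per-site rate u follows a gamma distribution with mean E(u) and that, given u, the number of patients carrying a hit at that site is Poisson with mean n·u. The function names and all numerical values below are hypothetical and are not taken from the manuscript or its scripts.

```python
import math


def neg_binom_pmf(i, n_patients, mean_u, shape):
    """P(a site is hit in exactly i of n patients) under mutation only.

    Assumes u ~ Gamma(shape, scale=mean_u/shape) and hits | u ~ Poisson(n*u),
    whose marginal is a negative binomial distribution.
    """
    m = n_patients * mean_u          # expected hits per site
    p = m / (m + shape)              # negative binomial "success" probability
    log_pmf = (
        math.lgamma(i + shape) - math.lgamma(shape) - math.lgamma(i + 1)
        + i * math.log(p) + shape * math.log1p(-p)
    )
    return math.exp(log_pmf)


def expected_sites_at_least(i_min, n_sites, n_patients, mean_u, shape,
                            max_terms=1000):
    """Expected number of sites with recurrence >= i_min under neutrality.

    The upper tail is summed directly to avoid the cancellation error of
    computing 1 - P(X < i_min) when the tail is very small.
    """
    tail = sum(
        neg_binom_pmf(i, n_patients, mean_u, shape)
        for i in range(i_min, i_min + max_terms)
    )
    return n_sites * tail


if __name__ == "__main__":
    # Hypothetical inputs: number of sites, patients, mean rate, gamma shape.
    for i_min in (2, 3, 4):
        expected = expected_sites_at_least(i_min, 2.0e7, 1000, 1.0e-6, 0.5)
        print(i_min, expected)
```

      Under these assumptions, a recurrence cutoff can be read off as the smallest i for which the expected number of neutral sites falls well below one; sites observed at or above that cutoff are then candidates for selection-driven recurrence.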

      (3) L223, the statement "In other words, the sequences surrounding the high-recurrence sites appear rather random.". Since it was a pan-cancer analysis, the unique patterns of each cancer type could be strongly diluted in the pan-cancer data.

      We now state that the analyses of mutation characteristics have been applied to the individual cancer types, and that no pattern deviating from randomness was found. Nevertheless, it may be argued that, with the exception of those with sufficiently large sample sizes such as lung and breast cancers, most datasets do not have the power to reject the null hypothesis. To alleviate this concern, we applied the ResNet and LSTM/GRU methods for the discovery of potential mutation motifs within each cancer type. These methods are more powerful than the one originally used, but the results are the same: no cancer type yields a mutation pattern that can reject the null hypothesis of randomness (see below).

      As a positive control, we used these methods for the discovery of splicing sites of human exons. When the sequences are aligned with the splicing site situated in the center (position 51 in the following plot), the sequence motif looks like:

      Author response image 1.

      5-prime

      Author response image 2.

      3-prime

      However, to account for the potential influence of distance from the mutant site in motif analysis, we randomly shuffled the splicing sites within a specified window around the alignment center, and their sequence logo now looks like:

      Author response image 3.

      5-prime shuffled

      Author response image 4.

      3-prime shuffled

      Author response image 5.

      random sequences from coding regions

      The classification results of the shuffled 5-prime (donor), 3-prime (acceptor) and random sequences from coding regions (Random CDS) are presented in Author response table 1 (the accuracy for the aligned results, which is approximately 99%, is not shown here).

      Author response table 1.

      With the results from these positive controls (splicing site motifs) validating our methodology, we applied the same model structure to train and test for potential mutational motifs at CDN sites. All models achieved approximately 50% accuracy in CDN motif analysis, suggesting that the sequence contexts surrounding CDN sites are not significantly different from other coding regions of the genome. This further implies that the recurrence of mutations at CDN sites is more likely driven by selection than by mutational mechanisms.

      Note that this preliminary analysis may be limited by insufficient training data for CDN sites. Future studies will require larger sample sizes and more sophisticated models to address these limitations.

      (4) To solidify the findings, the results need to be replicated in an independent dataset.

      Figure 7 validates our CDN findings using the GENIE dataset, which primarily consists of targeted sequencing data from various panels. By focusing on the same genomic regions sequenced by GENIE, we observed a 3-5 fold increase in the number of discovered CDNs as sample size increased from approximately 1000 to 9000. Moreover, the majority of CDNs identified in TCGA were confirmed as CDNs in GENIE.

      (5) The key scripts and the list of key results (i.e., CDN sites with i{greater than or equal to}3) need to be shared to enable replication, validation, and further research. So far, only CDN sites with i{greater than or equal to}20 have been shared.

      We have now updated the “Data Availability” section in the main text. The corresponding scripts for key results are available on GitLab at: https://gitlab.com/ultramicroevo/cdn_v1.

      (6) The versions of data used in this study are not clearly detailed, such as the specific version of gnomAD and the version and date of TCGA data downloaded from the GDC Data Portal.

      The versions of data sources have now been updated in the revised manuscript.

      Recommendations For The Authors:

      (1) L119, states "22.7 million nonsynonymous sites," but Table 1 lists the number as 22,540,623 (22.5 million). This discrepancy needs to be addressed for consistency.

      (2) Figure 2B, there is an unexplained drop in the line at i = 6 and 7 (from 83 to 45). Clarification is needed on why this drop occurs.

      (3) Figure 3A, for the CNS type, data for recurrence at 8 and 9 are missing. An explanation should be provided for this absence.

      (4) L201, the title refers to "100-mers," but L218 mentions "101-mers." This inconsistency needs to be corrected to ensure clarity and accuracy.

      (5) Figures 6 and 7 currently lack titles. Titles should be added to these figures to improve readability.

      Thanks. All corrections have been incorporated into the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose that cancer-driver mutations can be identified by Cancer Driving Nucleotides (CDNs). CDNs are defined as SNVs that occur frequently in genes. There are many ways to define cancer driver mutations, and the strengths and weaknesses are the reliance on statistics to define them.

      Strengths:

      There are many well-known approaches and studies that have already identified many canonical driver mutations. A potential strength is that mutation frequencies may be able to identify as yet unrecognized driver mutations. They use a previously developed method to estimate mutation hotspots across the genome (Dig, Sherman et al 2022). This publication has already used cancer sequence data to infer driver mutations based on higher-than-expected mutation frequencies. The advance here is to further illustrate that recurrent mutations (estimated at 3 or more mutations (CDNs) at the same base) are more likely to be the result of selection for a driver mutation (Figure 3). Further analysis indicates that mutation sequence context (Figure 4) or mutation mechanisms (Figure 5) are unlikely to be major causes for recurrent point mutations. Finally, they calculate (Figure 6) that most driver mutations identifiable by the CDN approach could be identified with about 100,000 to one million tumor coding genomes.

      Weaknesses:

      The manuscript does provide specific examples where recurrent mutations identify known driver mutations but do not identify "new" candidate driver mutations. Driver mutation validation is difficult and at least clinically, frequency (ie observed in multiple other cancer samples) is indeed commonly used to judge if an SNV has driver potential. The method would miss alternative ways to trigger driver alterations (translocations, indels, epigenetic, CNVs).

      Nevertheless, the value of the manuscript is its quantitative analysis of why mutation frequencies can identify cancer driver mutations.

      Recommendations For The Authors

      Whereas the analysis of driver mutations in WES has been extensive, the application of the method to WGS data (ie the noncoding regions) would provide new information.

      We appreciate Reviewer #2's suggestion of applying our method to noncoding regions. Currently, the background mutation model is based on site-level mutations in coding regions, which hinders its direct application to other mutation types such as CNVs, translocations and indels. We acknowledge that the proportion of patients with driver events involving CNVs (73%) is comparable to that of coding point mutations (76%), as reported in the PCAWG analysis (Fig. 2A from Campbell et al., 2020). In future studies, we will attempt to establish a CNV-based background mutation rate model to identify positive selection signals driving tumorigenesis.

      References

      Abascal F, Harvey LMR, Mitchell E, Lawson ARJ, Lensing SV, Ellis P, Russell AJC, Alcantara RE, Baez-Ortega A, Wang Y, et al. 2021. Somatic mutation landscapes at single-molecule resolution. Nature:1–6.

      Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, et al. 2020. The repertoire of mutational signatures in human cancer. Nature 578:94–101.

      Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B, et al. 2018. Comprehensive Characterization of Cancer Driver Genes and Mutations. Cell 173:371-385.e18.

      Baylin SB, Jones PA. 2016. Epigenetic Determinants of Cancer. Cold Spring Harb Perspect Biol 8:a019505.

      Campbell PJ, Getz G, Korbel JO, Stuart JM, Jennings JL, Stein LD, Perry MD, Nahal-Bose HK, Ouellette BFF, Li CH, et al. 2020. Pan-cancer analysis of whole genomes. Nature 578:82–93.

      Drost J, Clevers H. 2018. Organoids in cancer research. Nat Rev Cancer 18:407–418.

      Elliott K, Larsson E. 2021. Non-coding driver mutations in human cancer. Nat Rev Cancer 21:500–509.

      Grzeskowiak CL, Kundu ST, Mo X, Ivanov AA, Zagorodna O, Lu H, Chapple RH, Tsang YH, Moreno D, Mosqueda M, et al. 2018. In vivo screening identifies GATAD2B as a metastasis driver in KRAS-driven lung cancer. Nat Commun 9:2732.

      Hess JM, Bernards A, Kim J, Miller M, Taylor-Weiner A, Haradhvala NJ, Lawrence MS, Getz G. 2019. Passenger Hotspot Mutations in Cancer. Cancer Cell 36:288-301.e14.

      Hodis E, Triglia ET, Kwon JYH, Biancalani T, Zakka LR, Parkar S, Hütter J-C, Buffoni L, Delorey TM, Phillips D, et al. 2022. Stepwise-edited, human melanoma models reveal mutations’ effect on tumor and microenvironment. Science 376:eabi8175.

      Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, Carter SL, Stewart C, Mermel CH, Roberts SA, et al. 2013. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499:214–218.

      Makova KD, Hardison RC. 2015. The effects of chromatin organization on variation in mutation rates in the genome. Nat Rev Genet 16:213–223.

      Michels BE, Mosa MH, Streibl BI, Zhan T, Menche C, Abou-El-Ardat K, Darvishi T, Członka E, Wagner S, Winter J, et al. 2020. Pooled In Vitro and In Vivo CRISPR-Cas9 Screening Identifies Tumor Suppressors in Human Colon Organoids. Cell Stem Cell 26:782-792.e7.

      Ng PK-S, Li J, Jeong KJ, Shao S, Chen H, Tsang YH, Sengupta S, Wang Z, Bhavana VH, Tran R, et al. 2018. Systematic Functional Annotation of Somatic Mutations in Cancer. Cancer Cell 33:450-462.e10.

      Sherman MA, Yaari AU, Priebe O, Dietlein F, Loh P-R, Berger B. 2022. Genome-wide mapping of somatic mutation rates uncovers drivers of cancer. Nat Biotechnol 40:1634–1643.

      Stamatoyannopoulos JA, Adzhubei I, Thurman RE, Kryukov GV, Mirkin SM, Sunyaev SR. 2009. Human mutation rate associated with DNA replication timing. Nat Genet 41:393–395.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      The authors proposed a framework to estimate the posterior distribution of parameters in biophysical models. The framework has two modules: the first MLP module is used to reduce data dimensionality and the second NPE module is used to approximate the desired posterior distribution. The results show that the MLP module can capture additional information compared to manually defined summary statistics. By using the NPE module, the repetitive evaluation of the forward model is avoided, thus making the framework computationally efficient. The results show the framework has promise in identifying degeneracy. This is an interesting work.

      We thank the reviewer for the positive comments made on our manuscript. 

      Reviewer #1 (Recommendations For The Authors): 

      I have some minor comments. 

      (1) The uGUIDE framework has two modules, MLP and NPE. Why are the two modules trained jointly? The MLP module is used to reduce data dimensionality. Given that the number of features for different models is all fixed to 6, why does one need different MLPs? This module should, in principle, be general-purpose and independent of the model used.

      The MLP must be trained together with the NPE module to maximise inference performance in terms of accuracy and precision. Although the number of features predicted by the MLP was fixed to six, the characteristics of these six features can be very different, depending on the chosen forward model and the available data, as we showed in Appendix 1 Figure 1. Training the MLP independently of the NPE would result in suboptimal performance of µGUIDE, with potentially higher bias and variance of the predicted posterior distributions. We have now added these considerations in the Methods section.

      (2) The authors mentioned at L463 that all the 3 models use 6 features. From L445 to L447, it seems model 3 has 7 unknown parameters. How can one use 6 features to estimate 7 unknowns? 

      Thank you for pointing out the lack of clarity regarding the parameters to estimate in this section. Model 3 is a three-compartment model, whose parameters of interest are the signal fraction and diffusivity from water diffusing in the neurite space (fn and Dn), the neurites orientation dispersion index (ODI), the signal fraction in cell bodies (fs), a proxy to soma radius and diffusivity (Cs), and the signal fraction and diffusivity in the extracellular space (fe and De). The signal fractions are constrained by the relationship fn + fs + fe = 1, hence fe is calculated from the estimated fn and fs. This leaves us with 6 parameters to estimate: fn, Dn, ODI, fs, Cs, De. We clarified it in the revised version of the paper. 
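      The reduction from seven parameters to six can be shown with a minimal sketch. The function name and the tolerance are hypothetical choices made for this illustration; only the constraint fn + fs + fe = 1 comes from the text above.

```python
def extracellular_fraction(fn, fs, tol=1e-12):
    """Derive fe from the estimated fn and fs using fn + fs + fe = 1.

    Raises ValueError if the two estimated fractions are not valid
    (each must lie in [0, 1] and their sum must not exceed 1).
    """
    if not (0.0 <= fn <= 1.0 and 0.0 <= fs <= 1.0):
        raise ValueError("fn and fs must both lie in [0, 1]")
    fe = 1.0 - fn - fs
    if fe < -tol:
        raise ValueError("fn + fs exceeds 1, so fe would be negative")
    return max(fe, 0.0)  # clip tiny negative rounding errors to zero


if __name__ == "__main__":
    # Hypothetical estimates of the neurite and soma signal fractions.
    print(extracellular_fraction(0.5, 0.2))
```

      Because fe is fully determined by fn and fs, it does not need its own output from the inference network; only fn, Dn, ODI, fs, Cs and De remain free.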

      (3) L471, Rician noise is not a proper term. Rician distribution is the distribution of pixel intensities observed in the presence of noise. And Rician distribution is the result of magnitude reconstruction. See "Noise in magnitude magnetic resonance images" published in 2008. I assume that real-valued Gaussian noise is added to simulated data. 

      We apologize for the confusion. We added Gaussian noise to the real and imaginary parts of the simulated signals and then used the magnitude of this noisy complex signal for our experiments. We rephrased the sentence for more clarity.
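      The noise procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name is hypothetical, and it assumes that the noise-free signal is normalised to 1 at b = 0, so that the noise standard deviation is taken as 1/SNR.

```python
import math
import random


def add_rician_noise(signal, snr, rng=None):
    """Return the magnitude of a signal corrupted by complex Gaussian noise.

    Assumption: the noise-free signal is normalised (S(b=0) = 1), so the
    standard deviation of the noise in each channel is sigma = 1 / snr.
    """
    rng = rng or random.Random()
    sigma = 1.0 / snr
    noisy = []
    for s in signal:
        real = s + rng.gauss(0.0, sigma)   # noise on the real part
        imag = rng.gauss(0.0, sigma)       # noise on the imaginary part
        noisy.append(math.hypot(real, imag))  # magnitude reconstruction
    return noisy


if __name__ == "__main__":
    # Hypothetical normalised signal decay, seeded for reproducibility.
    clean = [1.0, 0.6, 0.3, 0.1]
    print(add_rician_noise(clean, snr=50, rng=random.Random(0)))
```

      Taking the magnitude of the noisy complex signal is what produces Rician-distributed intensities; the added noise itself is Gaussian in each channel.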

      (4) L475, why thinning is not used in MCMC? In figure 3, the MCMC results are more biased than uGUIDE, is it related to no thinning in MCMC? 

      We followed the recommendations by Harms et al. (2018) for the MCMC experiments. They analysed the impact of thinning (among other parameters) on the estimated posterior distributions. Their findings indicate that thinning is unnecessary and inefficient, and they recommend using more samples instead. For further details, we refer the reviewer to their publication, along with the theoretical works they cite. We have now added this note in the Methods section.

      (5) Did the authors try model-fitting methods with different initializations to get a distribution of the parameters? Like the paper "Degeneracy in model parameter estimation for multi‐compartmental diffusion in neuronal tissue". For the in vivo data, it is informative to see the model-fitting results.

      No, we did not try model-fitting methods with different initializations because such methods provide only a partial description of the solution landscape, which can be interpreted as a partial posterior distribution. Although this approach can help to highlight the problem of degeneracy, it does not provide a complete description of all potential solutions. In contrast, MCMC estimates the full posterior distribution, offering a more accurate and precise characterization of degeneracies and uncertainties compared to model-fitting methods with varying initializations. Hence, we decided to use MCMC as benchmark. We have now added these considerations to the Discussion section. 

      Reviewer #2 (Public Review): 

      Summary: 

      The authors improve the work of Jallais et al. (2022) by including a novel module capable of automatically learning feature selection from different acquisition protocols inside a supervised learning framework. Combining the module above with an estimation framework for estimating the posterior distribution of model parameters, they obtain rich probabilistic information (uncertainty and degeneracy) on the parameters in a reasonable computation time. 

      The main contributions of the work are: 

      (1) The whole framework allows the user to avoid manually defining summary statistics, which may be slow and tedious and affect the quality of the results. 

      (2) The authors tested the proposal by tackling three different biophysical models for brain tissue and using data with characteristics commonly used by the diffusion-MR microstructure research community. 

      (3) The authors validated their method well with the state-of-the-art. 

      The main weakness is: 

      (1) The methodology was tested only on scenarios with a signal-to-noise ratio (SNR) equal to 50. It is interesting to show results with lower SNR and without noise that the method can detect the model's inherent degenerations and how the degeneration increases when strong noise is present. I suggest expanding the Figure in Appendix 1 to include this information. 

      The authors showed the utility of their proposal by computing complex parameter descriptors automatically in an achievable time for three different and relevant biophysical models. 

      Importantly, this proposal promotes tackling, analysing, and considering the degenerated nature of the most used models in brain microstructure estimation. 

      We thank the reviewer for these positive remarks. 

      Concerning the main weakness highlighted by the reviewer: In our submitted work, we presented results both without noise and with a signal-to-noise ratio (SNR) equal to 50 (similar to the SNR in the experimental data analysed). Figure 5 shows exemplar posterior distributions obtained in a noise-free scenario, and Table 1 reports the number of degeneracies for each model on 10000 noise-free simulations. These results highlight that the presence of degeneracies is inherent to the model definition. Figures 3, 6 and 7 present results considering an SNR of 50. We acknowledge that results with lower SNR were not included in the initial submission. To address this, we added a figure in the appendix illustrating the impact of noise on the posterior distributions. Specifically, Figure 1A of Appendix 2 shows posterior distributions estimated from signals generated using an exemplar set of model parameters with varying noise levels (no noise, SNR=50 and SNR=25). Figure 1B presents uncertainty values obtained on 1000 simulations for each noise level. We observe that, as the SNR decreases, uncertainty increases. Noise in the signal contributes to irreducible variance. The confidence in the estimates therefore decreases as the noise level increases.

      Reviewer #2 (Recommendations For The Authors):  

      Some suggestions: 

      Panel A of Figure 2 may deserve a better explanation in the Figure's caption. 

      We agree that the description of panel A of figure 2 was succinct and added more explanation in the figure’s caption.  

      The caption of Figure 3 should mention that the panel's titles are the parameters of the used biophysical models. 

      We added in the caption of figure 3 that the names of the model parameters are indicated in the titles of the panels. We apologise for the confusion it may have created.

      In equation (3), the authors should indicate the summation index. 

      We apologise for not putting the summation index in equation 3. We added it in the revised version.

      In line 474, the authors should discuss if the systematic use of the maximum likelihood estimator as an initializer for the sampling does not bias the computed results. 

      Concerning the MCMC estimations, we followed the recommendations from Harms et al. (2018). They investigated starting the sampling from the maximum likelihood estimator (MLE) and concluded that doing so allows the chain to start in the stationary distribution of the Markov chain, removing the need for some burn-in. Additionally, they showed that initializing the sampling from the MLE has the advantage of removing salt-and-pepper-like noise from the resulting mean and standard deviation maps. We have now added this note in the Methods section.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Tubert C. et al. investigated the role of dopamine D5 receptors (D5R) and their downstream potassium channel, Kv1, in the striatal cholinergic neuron pause response induced by thalamic excitatory input. Using slice electrophysiological analysis combined with pharmacological approaches, the authors tested which receptors and channels contribute to the cholinergic interneuron pause response in both control and dyskinetic mice (in the L-DOPA off state). They found that activation of Kv1 was necessary for the pause response, while activation of D5R blocked the pause response in control mice. Furthermore, in the L-DOPA off-state of dyskinetic mice, the absence of the pause response was restored by the application of clozapine. The authors claimed that (1) the D5R-Kv1 pathway contributes to the cholinergic interneuron pause response in a phasic dopamine concentration-dependent manner, and (2) clozapine inhibits D5R in the L-DOPA off state, which restores the pause response.

      Strengths:

      The electrophysiological and pharmacological approaches used in this study are powerful tools for testing channel properties and functions. The authors' group has well-established these methodologies and analysis pipelines. Indeed, the data presented were robust and reliable.

      Thank you for your comments.

      Weaknesses:

      Although the paper has strengths in its methodological approaches, there is a significant gap between the presented data and the authors' claims.

      There was no direct demonstration that the D5R-Kv1 pathway is dominant when dopamine levels are high. The term 'high' is ambiguous, and it raises the question of whether the authors believe that dopamine levels do not reach the threshold required to activate D5R under physiological conditions.

      We acknowledge that further work is necessary to clarify the role of the D5R in physiological conditions. While we haven’t found effects of the D1/D5 receptor antagonist SCH23390 on the pause response in control animals (Fig. 3), it is still possible that dopamine levels reach the threshold to stimulate D5R when burst firing of dopaminergic neurons contributes to dopamine release. We believe the pause response depends, among other factors, on the relative stimulation levels of SCIN D2 and D5 receptors, which is likely not an all-or-nothing phenomenon. To reduce ambiguity, we will change the labels referring to dopamine levels in Figure 6F.

      Furthermore, the data presented in Figure 6 are confusing. If clozapine inhibits active D5R and restores the pause response, the D5R antagonist SCH23390 should have the same effect. The data suggest that clozapine-induced restoration of the pause response might be mediated by other receptors, rather than D5R alone.

      Thank you for letting us clarify this issue. Please note that the levels of endogenous dopamine 24 h after the last L-DOPA challenge in severe parkinsonian mice are expected to be very low. In the absence of an agonist, a pure D1/D5 antagonist would not exert an effect, as demonstrated with SCH23390 alone, which did not have an impact on the SCIN response to thalamic stimulation (Fig. 6). While clozapine can also act as a D1/D5 receptor antagonist, its D1/D5 effects in absence of an agonist are attributed to its inverse agonist properties (PMID: 24931197). Notably, SCH23390 prevented the effect of clozapine, allowing us to conclude that ligand-independent D1/D5 receptor-mediated mechanisms are involved in suppressing the pause response in dyskinetic mice. We will make the point clearer in the Discussion.

      Reviewer #2 (Public review):

      Summary:

      This manuscript by Tubert et al presents the role of the D5 receptor in modulating the striatal cholinergic interneuron (CIN) pause response through D5R-cAMP-Kv1 inhibitory signaling. Their model elucidates the on / off switch of CIN pause, likely due to the different DA affinity between D2R and D5R. This machinery may be crucial in modulating synaptic plasticity in cortical-striatal circuits during motor learning and execution. Furthermore, the study bridges their previous finding of CIN hyperexcitability (Paz et al., Movement Disorder 2022) with the loss of pause response in LID mice.

      Strengths:

      The study had solid findings, and the writing was logically structured and easy to follow. The experiments are well-designed, and they properly combined electrophysiology recording, optogenetics, and pharmacological treatment to dissect/rule out most, if not all, possible mechanisms in their model.

      Thank you for your comments.

      Weaknesses:

      The manuscript is overall satisfying with only some minor concerns that need to be addressed. Manipulation of intracellular cAMP (e.g. using pharmacological analogs or inhibitors) can add additional evidence to strengthen the conclusion.

      Thank you for the suggestion. While we acknowledge that we are not providing direct evidence of the role of cAMP, we chose not to conduct these experiments because cAMP levels influence several intrinsic and synaptic currents beyond Kv1, significantly affecting  membrane oscillations and spontaneous firing, as shown in Paz et al. 2021. However, we are modifying the manuscript so there is no misinterpretation about our findings in the current work.

      Reviewer #3 (Public review):

      Summary:

      Tubert et al. investigate the mechanisms underlying the pause response in striatal cholinergic interneurons (SCINs). The authors demonstrate that optogenetic activation of thalamic axons in the striatum induces burst activity in SCINs, followed by a brief pause in firing. They show that the duration of this pause correlates with the number of elicited action potentials, suggesting a burst-dependent pause mechanism. The authors demonstrated this burst-dependent pause relied on Kv1 channels. The pause is blocked by SKF81297 and partially by sulpiride and mecamylamine, implicating D1/D5 receptor involvement. The study also shows that ZD7288 does not reduce the duration of the pause and that lesioning dopamine neurons abolishes this response, which can be restored by clozapine.

      Weaknesses:

      While this study presents an interesting mechanism for SCIN pausing after burst activity, there are several major concerns that should be addressed:

      (1) Scope of the Mechanism:

      It is important to clarify that the proposed mechanism may apply specifically to the pause in SCINs following burst activity. The manuscript does not provide clear evidence that this mechanism contributes to the pause response observed in behavioral animals. While the thalamus is crucial for SCIN pauses in behavioral contexts, the exact mechanism remains unclear. Activating thalamic input triggers burst activity in SCINs, leading to a subsequent pause, but this mechanism may not be generalizable across different scenarios. For instance, approximately half of TANs do not exhibit initial excitation but still pause during behavior, suggesting that the burst-dependent pause mechanism is unlikely to explain this phenomenon. Furthermore, in behavioral animals, the duration of the pause seems consistent, whereas the proposed mechanism suggests it depends on the prior burst, which is not aligned with in vivo observations. Additionally, many in vivo recordings show that the pause response is a reduction in firing rate, not complete silence, which the mechanism described here does not explain. Please address these in the manuscript.

      Thank you for your valuable feedback. While the absence of an initial burst in some TANs in vivo may suggest the involvement of alternative or additional mechanisms, it does not exclude the participation of Kv1 currents. We have seen that subthreshold depolarizations induced by thalamic inputs are sufficient to produce an afterhyperpolarization (AHP) mediated by Kv1 channels (see Tubert et al., 2016, PMID: 27568555). Although such subthreshold depolarizations are not captured in current recordings from behaving animals, intracellular in vivo recordings have demonstrated an intrinsically generated AHP after subthreshold depolarization of SCIN caused by stimulation of excitatory afferents (PMID: 15525771). Additionally, when pause duration is plotted against the number of spikes elicited by thalamic input (Fig. 1G), we found that one elicited spike is followed by an interspike interval 1.4 times longer than the average spontaneous interspike interval. We acknowledge the potential involvement of additional factors, including a decrease of excitatory thalamic input coinciding with the pause, followed by a second volley of thalamic inputs (Fig. 1G-J, after observations by Matsumoto et al., 2001; PMID: 11160526), as well as the timing of elicited spikes relative to ongoing spontaneous firing (Fig. 1D-E). Dopaminergic modulation (Fig. 3) and regional differences among striatal regions (PMID: 24559678) may also contribute to the complexity of these dynamics.

      (2) Terminology:

      The use of "pause response" throughout the manuscript is misleading. The pause induced by thalamic input in brain slices is distinct from the pause observed in behavioral animals. Given the lack of a clear link between these two phenomena in the manuscript, it is essential to use more precise terminology throughout, including in the title, bullet points, and body of the manuscript.

      While we acknowledge that our study does not include in vivo evidence, we believe ex vivo preparations have been instrumental in elucidating the mechanisms underlying the responses observed in vivo. We also agree with previous ex vivo studies in using consistent terminology. However, we will clarify the ex vivo nature of our work in the abstract and bullet points for greater transparency.

      (3) Kv1 Blocker Specificity:

      It is unclear how the authors ruled out the possibility that the Kv1 blocker did not act directly on SCINs. Could there be an indirect effect contributing to the burst-dependent pause? Clarification on this point would strengthen the interpretation of the results.

      Thank you for letting us clarify this issue. In our previous work (Tubert et al., 2016) we showed that the Kv1.3 and Kv1.1 subunits are selectively expressed in SCIN throughout the striatum. Moreover, GABAergic transmission is blocked in our preparations. We are adding a sentence to the manuscript to make this clearer.

      (4) Role of D1 Receptors:

      While it is well-established that activating thalamic input to SCINs triggers dopamine release, contributing to SCIN pausing (as shown in Figure 3), it would be helpful to assess the extent to which D1 receptors contribute to this burst-dependent pause. This could be achieved by applying the D1 agonist SKF81297 after blocking nAChRs and D2 receptors.

      Thank you for letting us clarify this point. We show that blocking D2R or nAChR reduces the pause only for strong thalamic stimulation eliciting 4 SCIN spikes (Figure 3G), whereas the D1/D5 agonist SKF81297 is able to reduce the pause induced by weaker stimulation as well (Figure 3C). This may indicate that nAChR-mediated dopamine release induced by thalamic-induced bursts more efficiently activates D2R compared to D5R. We speculate that, in this context, lack of D5R activation may be necessary to keep normal levels of Kv1 currents necessary for SCIN pauses.

      (5) Clozapine's Mechanism of Action:

      The restoration of the burst-dependent pause by clozapine following dopamine neuron lesioning is interesting, but clozapine acts on multiple receptors beyond D1 and D5. Although it may be challenging to find a specific D5 antagonist or inverse agonist, it would be more accurate to state that clozapine restores the burst-dependent pause without conclusively attributing this effect to D5 receptors.

      Thank you for your insightful observation. We acknowledge the difficulty of targeting dopamine receptors pharmacologically due to the lack of highly selective D1/D5 inverse agonists. We used SCH23390, which is a highly selective D1/D5 receptor antagonist devoid of inverse agonist effects, to block clozapine’s ability to restore SCIN pauses (Figure 6C). This indicates that the restoration of SCIN pauses by clozapine depends on D1/D5 receptors. Furthermore, in a previous study, we demonstrated that clozapine’s effect on restoring SCIN excitability in dyskinetic mice (a phenomenon mediated by Kv1 channels in SCIN; Tubert et al., 2016) was not due to its action on serotonin receptors (Paz, Stahl et al., 2022). While our data do not rule out the potential contribution of other receptors, such as muscarinic acetylcholine receptors, we believe they strongly support the role of D1/D5 receptors. To reflect this, we will add a statement discussing the potential contribution of receptors beyond D1/D5.

    1. Author response:

      We thank the editor and reviewers for their feedback. We believe we can address the substantive criticisms in full, first, by providing a more explicit theoretical basis for the method. Second, we believe that criticisms based on assumptions about phase consistency across time points are not well founded and can be answered. Finally, in response to some reviewer comments, we will improve the surrogate testing of the method.

      We will enhance the theoretical justification for the application of higher-order singular value decomposition (SVD) to the problem of irregular sampling of the cortical area. The initial version of the manuscript was written to allow informal access to these ideas (if possible), but the reviewers find a more rigorous account appropriate. We will add an introduction to modern developments in the use of functional SVD in geophysics, meteorology & oceanography (e.g., empirical orthogonal functions) and quantitative fluid dynamics (e.g., dynamic mode decomposition) and computational chemistry. Recently SVD has been used in neuroscience studies (e.g., cortical eigenmodes). To our knowledge, our work is the first time higher-order SVD has been applied to a neuroscience problem. We use it here to solve an otherwise (apparently) intractable problem, i.e., how to estimate the spatial frequency (SF) spectrum on a sparse and highly irregular array with broadband signals.

      We will clarify the methodological strategy in more formal terms in the next version of the paper. But essentially SVD allows a change of basis that greatly simplifies quantitative analysis. Here it allows escape from estimating the SF across millions of data-points (triplets of contacts, at each sample), each of which contains multiple overlapping signals plus noise (noise here defined in the context of SF estimation) and is inter-correlated across a variety of known and unknown observational dimensions. Rather than simply average over samples, which would wash out much of the real signal, SVD allows the signals to be decomposed in a lossless manner (up to the choice of number of eigenvectors at which the SVD is truncated). The higher-order SVD we have implemented reduces the size of the problem to allow quantification of SF over hundreds of components, each of which is guaranteed certain desirable properties, i.e., they explain known (and largest) amounts of variance of the original data and are orthonormal. This last property allows us to proceed as if the observations are independent. SF estimates are made within this new coordinate system.
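
      As an illustration of the change of basis described above, the following is a minimal sketch of an ordinary (matrix) truncated SVD applied to synthetic unit-length complex phase values. It is not the higher-order SVD used in the manuscript; the array sizes, the synthetic phase gradient, and the function name `truncated_svd` are assumptions made for illustration only. It shows the two properties mentioned: the retained components are orthonormal, and each explains a known, decreasing share of the total signal energy.

```python
import numpy as np


def truncated_svd(data, k):
    """Return the leading k SVD components and the energy fraction of each.

    data: 2-D array (samples x contacts), real or complex.
    """
    u, s, vh = np.linalg.svd(data, full_matrices=False)
    # Squared singular values give each component's share of total energy
    # (this equals explained variance if the data are mean-centred).
    explained = s**2 / np.sum(s**2)
    return u[:, :k], s[:k], vh[:k, :], explained[:k]


# Hypothetical data: 500 samples of phase at 12 contacts along one axis.
rng = np.random.default_rng(0)
n_samples, n_contacts = 500, 12
x = np.linspace(0.0, 1.0, n_contacts)
offsets = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
noise = 0.3 * rng.standard_normal((n_samples, n_contacts))
phase = offsets + 2.0 * np.pi * 0.5 * x[None, :] + noise
data = np.exp(1j * phase)  # unit-length complex phase values

u, s, vh, explained = truncated_svd(data, k=4)

# Rows of vh are the spatial components; their Gram matrix should be identity.
gram = vh @ vh.conj().T
print("orthonormal:", np.allclose(gram, np.eye(4)))
print("energy fraction per component:", np.round(explained, 3))
```

      Keeping all components instead of the first four reproduces the input exactly, which is the sense in which the decomposition is lossless up to the truncation point.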

      We will also more concretely formalise the relation between Fourier analysis and previous observations of eigenvectors of phase that are smooth gradients.

      We will very briefly review Fourier methods designed to deal with non-uniform sampling. The problems these methods are designed for fall into the non-uniform part of the spectrum from uniform–non-uniform–irregular–highly-irregular–noise. They are highly suited to, for example, interpolating between EEG electrodes to produce a uniform array for application of the fast Fourier transform (Alamia et al., 2023). However, a survey across a range of applied maths fields suggests that no method exists for the degree of irregular sampling found in the sEEG arrays at issue here. In particular, the sparseness of the contact coverage presents an insurmountable hurdle to standard methods. While there exist methods for sparse samples (e.g., Margrave & Ferguson, 1999; Ying 2009), these require well-defined oscillatory behavior, e.g., for seismographic analysis. Given the problems of highly irregular sampling, sparseness of sampling and broadband, nonstationary signals, we have attempted a solution via the novel methods introduced in the current manuscript. We were able to leverage previous observations regarding the relation between eigenvectors of cortical phase and Fourier analysis, as we outline in the manuscript.

      We will extend the current 1-dimensional surrogate data to better demonstrate that the method does indeed correctly detect the ordinal relations in power on different parts of the SF spectrum. We will include the effects of a global reference signal. Simulations of cortical activity are an expensive way to achieve this goal. While the first author has published in this area, such simulations are partly a function of the assumptions put into them (i.e., spatial damping, boundary conditions, parameterization of connection fields). We will therefore use surrogate signals derived from real cortical activity to complete this task.

      Some more specific issues raised:
      (1) Application of the method to general neuroscience problems:
      The purpose of the manuscript was to estimate the SF spectrum of phase in the cortex, in the range where it was previously not possible. The purpose was not specifically to introduce a new method of analysis that might be immediately applicable to a wide range of available data-sets. Indeed, the specifics of the method are designed to overcome an otherwise intractable disadvantage of sEEG (irregular spatial sampling) in order to take advantage of its good coverage (compared to ECoG) and low volume conduction compared to extra-cranial methods. On the other hand, the developing field of functional SVD would be of interest to neuroscientists, as a set of methods to solve difficult problems, and therefore of general interest. We will make these points explicit in the next version of the manuscript. In order to make the method more accessible, we will also publish code for the key routines (construction of triplets of contacts, Morlet wavelets, calculation of higher-order SVD, calculation of SF).

      (2) Novelty:
      We agree with the third reviewer: if our results can convince, then the study will have an impact on the field. While there is work that has been done on phase interactions at a variety of scales, such as from the labs of Fries, Singer, Engels, Nauhaus, Logothetis and others, it does not quantify the relative power of the different spatial scales. Additionally, the research of Freeman et al. has quantified only portions of the SF spectrum of the cortex, or used EEG to estimate low SFs. We would appreciate any pointers to the specific literature the current research contributes to, namely, the SF spectrum of activity in the cortex.

      (3) Further analyses:
      The main results of the research are relatively simple: monotonically falling SF-power with SF; this effect occurs across the range of temporal frequencies. We provide each individual participant’s curves in the supplementary Figures. By visual inspection, it can be seen that the main result of the example participant is uniformly recapitulated. One is rarely in this position in neuroscience research, and we will make this explicit in the text.

      The research stands or falls by the adequacy of the method to estimate the SF curves. For this reason most statistical analyses and figures were reserved for ruling out confounds and exploring the limits of the methods. However, for the sake of completeness, we will now include the SF vs. SF-power correlations and significance in the next version, for each participant at each frequency.
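
      A per-participant, per-frequency test of "monotonically falling SF-power with SF" could take the following form. This is a hedged sketch only: the response does not say which correlation statistic or significance test will be reported, so the Spearman rank correlation and the permutation test used here are assumptions, as are the function names and the synthetic SF and power values.

```python
import numpy as np


def spearman_rho(x, y):
    """Spearman rank correlation (assumes no tied values)."""
    rank_x = np.argsort(np.argsort(x)).astype(float)
    rank_y = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rank_x, rank_y)[0, 1]


def permutation_p(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the rank correlation."""
    rng = np.random.default_rng(seed)
    observed = abs(spearman_rho(x, y))
    hits = sum(
        abs(spearman_rho(x, rng.permutation(y))) >= observed - 1e-12
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


# Hypothetical spectrum for one participant at one temporal frequency:
# power falls monotonically as spatial frequency rises.
sf = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0])
sf_power = sf ** -2.0

rho = spearman_rho(sf, sf_power)
p_value = permutation_p(sf, sf_power)
print(f"rho = {rho:.2f}, permutation p = {p_value:.4f}")
```

      A rank-based statistic is used here because the claim concerns ordinal relations in power across the SF spectrum rather than a specific functional form.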

      Since the main result was uniform across participants, and since we did not expect that there was anything of special significance about the delayed free recall task, we conclude that more participants or more tasks would not add to the result. As we point out in the manuscript, each participant is a test of the main hypothesis. The result is also consistent with previous attempts to quantify the SF spectrum, using a range of different tasks and measurement modalities (Barrie et al., 1996; Ramon & Holmes 2015; Alexander et al., 2019; Alexander et al., 2016; Freeman et al., 2003; Freeman et al. 2000). The search for those rare sEEG participants with larger coverage than the maximum here is a matter of interest to us, but will be left for a future study.

      (4) Sampling of phase and its meaningfulness:
      The wavelet methods used in the present study have excellent temporal resolution but poor frequency resolution. We additionally oversample the frequency range to produce visually informative plots (usually in the context of time by frequency plots, see Alexander et al., 2006; 2013; 2019). But it is not correct that the methods for estimating phase assume a narrow frequency band. Rather, the poor frequency resolution of short time-series Morlet wavelets means the methods are robust to the exact shape of the waveforms; the signal need be only approximately sinusoidal; to rise and fall. The reason for using methods that have excellent resolution in the time-domain is that previous work (Alexander et al., 2006; Patten et al. 2012) has shown that traveling wave events can last only one or two cycles, i.e., are not oscillatory in the strict sense but are non-stationary events. So while short time-window Morlet wavelets have a disadvantage in terms of frequency resolution, this means they precisely do not have the problem of assuming narrow-band sinusoidal waveforms in the signal. We strongly disagree that our analysis requires very strong assumptions about oscillations (see last point in this section).
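
      The trade-off described in point (4) can be sketched with a short complex Morlet wavelet. This is an illustrative sketch, not the authors' routine: the function name `morlet_phase`, the choice of two cycles, the three-sigma truncation, and the test signal are all assumptions. A two-cycle wavelet has a spectral width of roughly half its centre frequency, which is why it tolerates signals that are only approximately sinusoidal.

```python
import numpy as np


def morlet_phase(signal, fs, freq, n_cycles=2.0):
    """Estimate instantaneous phase at `freq` with a short Morlet wavelet.

    signal: 1-D real array; fs: sampling rate (Hz); freq: centre frequency (Hz).
    n_cycles sets the temporal width: fewer cycles give better time
    resolution and poorer frequency resolution.
    """
    sigma_t = n_cycles / (2.0 * np.pi * freq)      # Gaussian width in seconds
    half = int(np.ceil(3.0 * sigma_t * fs))        # truncate at 3 sigma
    t = np.arange(-half, half + 1) / fs
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    wavelet /= np.sum(np.abs(wavelet))
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.angle(analytic)


# Hypothetical test signal: 1 s of a 10 Hz cosine sampled at 1 kHz.
fs, freq, phi0 = 1000.0, 10.0, 0.7
time = np.arange(0.0, 1.0, 1.0 / fs)
signal = np.cos(2.0 * np.pi * freq * time + phi0)

estimated = morlet_phase(signal, fs, freq)
true_phase = 2.0 * np.pi * freq * time + phi0
# Circular difference, evaluated away from the edges of the recording.
error = np.angle(np.exp(1j * (estimated - true_phase)))[200:800]
print("max interior phase error (rad):", np.abs(error).max())
```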

      Our hypothesis was about the SF spectrum of the phase. When the measurement of phase is noise-like at some location, frequency and time, then this noise will not substantially contribute to the low SF parts of the spectrum compared to high SFs. Our hypothesis also concerned whether it was reasonable to interpret the existing literature on low SF waves in terms of cortically localised waves or small numbers of localised oscillators. This required us to show that low SFs dominate, and therefore that this signal must dominate any extra-cranial measurements of apparent low SF traveling waves. It does not require us to demonstrate that the various parts of the SF spectrum are meaningful in the sense of functionally significant. This has been shown elsewhere (see references to traveling waves in manuscript, to which we will also add a brief survey of research on phase dynamics).

      The calculation of phase can be bypassed altogether to achieve the initial effect described in the introduction to the methods (Fourier-like basis functions from SVD). The observed eigenvectors, increasing in spatial frequency with decreasing eigenvalues, can be reproduced by applying Gaussian windows to the raw time-series (D. Alexander, unpublished observation). For example, undertaking an SVD on the raw time-series windowed over 100ms reproduces much the same spatial eigenvectors (except that they come in pairs, recapitulating the real and imaginary parts of the signal). This reproducibility is in comparison to first estimating the phase at 10Hz using Morlet wavelets, then applying the SVD to the unit-length complex phase values.

      (5) Other issues to be addressed and improved:
      • clarity on which experiments were analyzed (starting in the abstract)
      • discussion of frequencies above 60Hz and caution in interpretation due to spike-waveform artefact or as a potential index of multi-unit spiking
      • discussion of whether the ad hoc, quasi-random sampling achieved by sEEG contacts somehow inflates the low SF estimates

      References (new)
      Patten TM, Rennie CJ, Robinson PA, Gong P (2012) Human Cortical Traveling Waves: Dynamical Properties and Correlations with Responses. PLoS ONE 7(6): e38392. https://doi.org/10.1371/journal.pone.0038392
      Margrave GF, Ferguson RJ (1999) Wavefield extrapolation by nonstationary phase shift. Geophysics 64(4): 1067-1078.
      Ying Y (2009) Sparse Fourier Transform via Butterfly Algorithm. SIAM Journal on Scientific Computing 31(3): 1678-1694.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper examines changes in relaxation time (T1 and T2) and magnetization transfer parameters that occur in a model system and in vivo when cells or tissue are depolarized using an equimolar extracellular solution with different concentrations of the depolarizing ion K+. The motivation is to explain T2 changes that have previously been observed by the authors in an in vivo model with neural stimulation (DIANA) and to try to provide a mechanism to explain those changes.

      Strengths:

      The authors argue that the use of various concentrations of KCl in the extracellular fluid depolarizes or hyperpolarizes the cell pellets used and that this change in membrane potential is the driving force for the T2 (and T1-supplementary material) changes observed. In particular, they report an increase in T2 with increasing KCl concentration in the extracellular fluid (ECF) of pellets of SH-SY5Y cells. To offset the increasing osmolarity of the ECF due to the increase in KCl, the NaCl molarity of the ECF is proportionally reduced. The authors measure the intracellular voltage using patch clamp recordings, which is a gold standard. With 80 mM of KCl in the ECF, a change in T2 of the cell pellets of ~10 ms is observed with the intracellular potential recorded as about -6 mV. A very large T1 increase of ~90 ms is reported under the same conditions. The PSR (ratio of hydrogen protons on macromolecules to free water) decreases by about 10% at this 80 mM KCl concentration. Similar results are seen in a Jurkat cell line and similar, but far smaller changes are observed in vivo, for a variety of reasons discussed. As a final control, T1 and T2 values are measured in the various equimolar KCl solutions. As expected, no significant changes in T1 and T2 of the ECF were observed for these concentrations.

      Weaknesses:

      While the concepts presented are interesting, and the actual experimental methods seem to be nicely executed, the conclusions are not supported by the data for a number of reasons. This is not to say that the data isn't consistent with the conclusions, but there are other controls not included that would be necessary to draw the conclusion that it is membrane potential that is driving these T1 and T2 changes. Unfortunately for these authors, similar experiments conducted in 2008 (Stroman et al. Magn. Reson. in Med. 59:700-706) found similar results (increased T2 with KCl) but with a different mechanism, that they provide definite proof for. This study was not referenced in the current work.

      It is well established that cells swell/shrink upon depolarization/hyperpolarization. Cell swelling is accompanied by increased light transmittance in vivo, and this should be true in the pellet system as well. In a beautiful series of experiments, Stroman et al. (2008) showed in perfused brain slices that the cells swell upon equimolar KCl depolarization and the light transmittance increases. The time course of these changes is quite slow, of the order of many minutes, both for the T2-weighted MRI signal and for the light transmittance. Stroman et al. also show that hypoosmotic changes produce the exact same timecourse as the KCl depolarization changes (and vice versa for the hyperosmotic changes - which cause cell shrinkage). Their conclusion, therefore, was that cell swelling (not membrane potential) was the cause of the T2-weighted changes observed, and that these were relatively slow (on the scale of many minutes).

      What are the implications for the current study? Well, for one, the authors cannot exclude cell swelling as the mechanism for T2 changes, as they have not measured that. It is however well established that cell swelling occurs during depolarization, so this is not in question. Water in the pelletized cells is in slow/intermediate exchange with the ECF, and the solutions for the two-compartment relaxation model for this are well established (see Menon and Allen, Magn. Reson. in Med. 20:214-227 (1991)). The T2 relaxation times should be multiexponential (see point (3) further below). The current work cannot exclude cell swelling as the mechanism for T2 changes (it is mentioned in the paper, but not dealt with). Water entering cells dilutes the protein structures, changes rotational correlation times of the proteins in the cell and is known to increase T2. The PSR confirms that this is indeed happening, so the data in this work is completely consistent with the Stroman work and completely consistent with cell swelling associated with depolarization. The authors should have performed light scattering studies to demonstrate the presence or absence of cell swelling. Measuring intracellular potential is not enough to clarify the mechanism.

      We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed changes in T2, PSR, and T1, especially in pelletized cells. For this reason, we already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes, though this study did not present the magnitude of the cell volume changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008). When discussing the contribution of the cell volume changes to the observed MR parameter changes, we will additionally discuss the work of Stroman et al. in the revised manuscript.

      In addition, we acknowledge that the title and main conclusion of the original manuscript may be misleading, as we did not separately consider the effect of cell volume changes on MR parameters. To more accurately reflect the scope and results of this study, and to take Reviewer 2’s suggestion into account, we will adjust the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and will also revise the relevant phrases in the main text.

      Finally, when [K+]-induced membrane potential changes are involved, factors other than cell volume changes also appear to influence T2 changes. Our ongoing study shows that there are differences in T2 changes (for the same volume changes) between two different situations: pure osmotic volume changes vs. [K+]-induced volume changes (e.g., hypoosmotic vs. depolarization). Furthermore, this study suggests that mechanisms such as changes in free (primarily intracellular) and bound water within a voxel play an important role in generating this T2 difference. Our group is preparing a manuscript for this follow-up study and will report on it shortly.

      So why does it matter whether the mechanism is cell swelling or membrane potential? The reason is response time. Cell swelling due to depolarization is a slow process, slower than hemodynamic responses that characterize BOLD. In fact, cell swelling under normal homeostatic conditions in vivo is virtually non-existent. Only sustained depolarization events typically associated with non-naturalistic stimuli or brain dysfunction produce cell swelling. Membrane potential changes associated with neural activity, on the other hand, are very fast. In this manuscript, the authors have convincingly shown a signal change that is virtually the same as what was seen in the Stroman publication, but they have not shown that there is a response that can be detected with anything approaching the timescale of an action potential. So one cannot definitely say that the changes observed are due to membrane potential. One can only say they are consistent with cell swelling, regardless of what causes the cell swelling.

      For this mechanism to be relevant to explaining DIANA, one needs to show that the cell swelling changes occur within a millisecond, which has never been reported. If one knows the populations of ECF and pellet, the T2s of the ECF and pellet and the volume change of the cells in the pellet, one can model any expected T2 changes due to neuronal activity. I think one would find that these are minuscule within the context of an action potential, or even bulk action potential.

      In the context of cell swelling occurring at rapid response times, if we define cell swelling simply as an “increase in cell volume,” there are several studies reporting transient structural (or volumetric) changes (e.g., ~nm diameter change over ~ms duration) in neuron cells during action potential propagation (Akkin et al., Biophys J 93:1347-1353, 2007; Kim et al., Biophys J 92:3122-3129, 2007; Lee et al., IEEE Trans Biomed Eng 58:3000-3003, 2011; Wnek et al., J Polym Sci Part B: Polym Phys 54:7-14, 2015; Yang et al., ACS Nano 12:4186-4193, 2018). These studies show a good correlation between membrane potential changes and cell volume changes (even if very small) at the cellular level within milliseconds.

      As mentioned in our first response above, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (e.g., T2 and PSR) when using ionic solutions that modulate membrane potential. Identifying T2 changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be further addressed in future studies.

      There are a few smaller issues that should be addressed.

      (1) Why were complicated imaging sequences used to measure T1 and T2? On a Bruker system it should be possible to do very simple acquisitions with hard pulses (which will not need dictionaries and such to get quantitative numbers). Of course, this can only be done sample by sample and would take longer, but it avoids a lot of complication to correct the RF pulses used for imaging, which leads me to the 2nd point.

      We appreciate the reviewer’s suggestion regarding imaging sequences. We would like to clarify that dictionaries were used for fitting in vivo T2 decay data, not in vitro data. Sample-by-sample nonlocalized acquisition with hard pulses may be applicable for in vitro measurements. However, for in vivo measurements, a slice-selective multi-echo spin-echo sequence was necessary to acquire T2 maps within a reasonable scan time. Our choice of imaging sequence was guided by the need to spatially resolve MR signals from specific regions of interest while balancing scan time constraints.

      (2) Figure S1 (H) is unlike any exponential T2 decay I have seen in almost 40 years of making T2 measurements. The strange plateau at the beginning and the bump around TE = 25 ms are odd. These could just be noise, but the fitted curve exactly reproduces these features. A monoexponential T2 decay cannot, by definition, produce a fit shaped like this.

      The T2 decay curves in Figure S1(H) indeed display features that deviate from a simple monoexponential decay. In our in vivo experiments, we used a multi-echo spin-echo sequence with slice-selective excitation and refocusing pulses. In such sequences, the echo train is influenced by stimulated echoes and imperfect slice profiles. These features are inherent to the pulse sequence rather than artifacts or fitting errors (Hennig, Concepts Magn Reson 3:125-143, 1991; Lebel and Wilman, Magn Reson Med 64:1005-1014, 2010; McPhee and Wilman, Magn Reson Med 77:2057-2065, 2017). Therefore, we fitted the T2 decay curve using the technique developed by McPhee and Wilman (2017).

      (3) As noted earlier, layered samples produce biexponential T2 decays and monoexponential T1 decays. I don't quite see how this was accounted for in the fitting of the data from the pellet preparations. I realize that these are spatially resolved measurements, but the imaging slice shown seems to be at the boundary of the pellet and the extracellular media and there definitely should be a biexponential water proton decay curve. Only 5 echo times were used, so this is part of the problem, but it does mean that the T2 reported is a population fraction weighted average of the T2 in the two compartments.

      We understand the reviewer’s concern regarding potential biexponential decay due to the presence of different compartments. In our experiments, we carefully positioned the imaging slice sufficiently remote from the pellet-media interface. This approach ensures that the signal predominantly arises from the cells (and interstitial fluid), excluding the influence of extracellular media above the cell pellet. We will clearly describe the imaging slice in the revised manuscript. As mentioned in our Methods section, for in vitro experiments, we repeated a single-echo spin-echo sequence with 50 different echo times. While Figure 1C illustrates data from five echo times for visual clarity, the full dataset with all 50 echo times was used for fitting. We will clarify this point in the revised manuscript to avoid any misunderstanding.
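
      The exchange above can be illustrated with a minimal sketch, which is not the authors' fitting procedure. It assumes two non-exchanging compartments and a log-linear monoexponential fit; the T2 values, the echo spacing, and the function names are hypothetical and chosen only for illustration. Under these assumptions, a slice containing only pellet signal returns the pellet T2, whereas a slice that mixes pellet and medium returns an apparent T2 lying between the two compartment values.

```python
import math


def biexponential_signal(te_ms, fraction_a, t2_a_ms, t2_b_ms):
    """Signal at echo time te_ms from two non-exchanging water compartments."""
    return (fraction_a * math.exp(-te_ms / t2_a_ms)
            + (1.0 - fraction_a) * math.exp(-te_ms / t2_b_ms))


def fit_monoexponential_t2(te_list_ms, signals):
    """Log-linear least-squares fit of S = S0 * exp(-TE / T2); returns T2 in ms."""
    n = len(te_list_ms)
    logs = [math.log(s) for s in signals]
    mean_te = sum(te_list_ms) / n
    mean_log = sum(logs) / n
    numerator = sum((te - mean_te) * (y - mean_log)
                    for te, y in zip(te_list_ms, logs))
    denominator = sum((te - mean_te) ** 2 for te in te_list_ms)
    slope = numerator / denominator
    return -1.0 / slope


# Hypothetical compartment T2 values and echo times (illustration only).
T2_PELLET_MS = 80.0
T2_MEDIUM_MS = 400.0
echo_times_ms = [10.0 * k for k in range(1, 51)]  # 50 echo times, 10 to 500 ms

# Slice far from the interface: signal from the pellet compartment only.
pure = [biexponential_signal(te, 1.0, T2_PELLET_MS, T2_MEDIUM_MS)
        for te in echo_times_ms]
# Slice at the interface: assumed 70% pellet and 30% medium.
mixed = [biexponential_signal(te, 0.7, T2_PELLET_MS, T2_MEDIUM_MS)
         for te in echo_times_ms]

t2_pure = fit_monoexponential_t2(echo_times_ms, pure)
t2_mixed = fit_monoexponential_t2(echo_times_ms, mixed)
print(f"pure pellet slice: apparent T2 = {t2_pure:.1f} ms")
print(f"mixed slice:       apparent T2 = {t2_mixed:.1f} ms")
```

      The sketch uses noise-free data, so it says nothing about how many echo times are needed in practice to separate two components.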

      (4) Delta T1 and T2 values are presented for the pellets in wells, but no absolute values are presented for either the pellets or the KCl solutions that I could find.

      As requested by the reviewer, we will include the absolute values in the revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      Min et al. attempt to demonstrate that magnetic resonance imaging (MRI) can detect changes in neuronal membrane potentials. They approach this goal by studying how MRI contrast and cellular potentials together respond to treatment of cultured cells with ionic solutions. The authors specifically study two MRI-based measurements: (A) the transverse (T2) relaxation rate, which reflects microscopic magnetic fields caused by solutes and biological structures; and (B) the fraction or "pool size ratio" (PSR) of water molecules estimated to be bound to macromolecules, using an MRI technique called magnetization transfer (MT) imaging. They see that depolarizing K+ and Ba2+ concentrations lead to T2 increases and PSR decreases that vary approximately linearly with voltage in a neuroblastoma cell line and that change similarly in a second cell type. They also show that depolarizing potassium concentrations evoke reversible T2 increases in rat brains and that these changes are reversed when potassium is renormalized. Min et al. argue that this implies that membrane potential changes cause the MRI effects, providing a potential basis for detecting cellular voltages by noninvasive imaging. If this were true, it would help validate a recent paper published by some of the authors (Toi et al., Science 378:160-8, 2022), in which they claimed to be able to detect millisecond-scale neuronal responses by MRI.

      Strengths:

      The discovery of a mechanism for relating cellular membrane potential to MRI contrast could yield an important means for studying functions of the nervous system. Achieving this has been a longstanding goal in the MRI community, but previous strategies have proven too weak or insufficiently reproducible for neuroscientific or clinical applications. The current paper suggests remarkably that one of the simplest and most widely used MRI contrast mechanisms, T2-weighted imaging, may indicate membrane potentials if measured in the absence of the hemodynamic signals that most functional MRI (fMRI) experiments rely on. The authors make their case using a diverse set of quantitative tests that include controls for ion and cell type-specificity of their in vitro results and reversibility of MRI changes observed in vivo.

      Weaknesses:

      The major weakness of the paper is that it uses correlational data to conclude that there is a causational relationship between membrane potential and MRI contrast. Alternative explanations that could explain the authors' findings are not adequately considered. Most notably, depolarizing ionic solutions can also induce changes in cellular volume and tissue structure that in turn alter MRI contrast properties similarly to the results shown here. For example, a study by Stroman et al. (Magn Reson Med 59:700-6, 2008) reported reversible potassium-dependent T2 increases in neural tissue that correlate closely with light scattering-based indications of cell swelling. Phi Van et al. (Sci Adv 10:eadl2034, 2024) showed that potassium addition to one of the cell lines used here likewise leads to cell size increases and T2 increases. Such effects could in principle account for Min et al.'s results, and indeed it is difficult to see how they would not contribute, but they occur on a time scale far too slow to yield useful indications of membrane potential. The authors' observation that PSR correlates negatively with T2 in their experiments is also consistent with this explanation, given the inverse relationship usually observed (and mechanistically expected) between these two parameters. If the authors could show a tight correspondence between millisecond-scale membrane potential changes and MRI contrast, their argument for a causal connection or a useful correlational relationship between membrane potential and image contrast would be much stronger. As it is, however, the article does not succeed in demonstrating that membrane potential changes can be detected by MRI.

      We appreciate the reviewer’s comments. We agree that changes in cell volume due to depolarization and hyperpolarization significantly contribute to the observed MR parameter changes. For this reason, we have already noted in the Discussion section of the original manuscript that cell volume changes influence the observed MR parameter changes. In this regard, we thank the reviewer for introducing the work by Stroman et al. (Magn Reson Med 59:700-706, 2008) and Phi Van et al. (Sci Adv 10:eadl2034, 2024). When discussing the contribution of cell volume changes to the observed MR parameter changes, we will additionally discuss the work of both Stroman et al. and Phi Van et al. in the revised manuscript.

      In addition, this study does not address rapid dynamic membrane potential changes on the millisecond scale, which we explicitly discussed as one of the limitations of this study in the Discussion section of the original manuscript. For this reason, we do not claim in this study that we provide the reader with definitive answers about the mechanisms involved in DIANA. Rather, as a first step toward addressing the mechanism of DIANA, this study confirms that there is a good correlation between changes in membrane potential and measurable MR parameters (although on a slow time scale) when using ionic solutions that modulate membrane potential. Identifying T2 changes that occur during millisecond-scale membrane potential changes due to rapid neural activation will be further addressed in future studies.

      Together, we acknowledge that the title and main conclusion of the original manuscript may be misleading. To more accurately reflect the scope and results of this study and to consider the reviewer’s suggestion, we will adjust the title to “Responses to membrane potential-modulating ionic solutions measured by magnetic resonance imaging of cultured cells and in vivo rat cortex” and will also revise the relevant phrases in the main text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Molnar, Suranyi and colleagues have probed the genomic stability of Mycobacterium smegmatis in response to several anti-tuberculosis drugs as monotherapy and in combination. Unlike the study by Nyinoh and McFadden (http://dx.doi.org/10.1002/ddr.21497), which should be cited, the authors use a sub-lethal dose of antibiotic. While this is motivated by sound technical considerations, the biological and therapeutic rationale could be further elaborated.

      In the mutation accumulation experiments, we needed to ensure continuous and reproducible growth of a small number of colonies across multiple passages. This technical requirement necessitated the use of sublethal drug concentrations. However, sublethal doses also have biological relevance. Noncompliance with prescribed antibiotic regimens and the presence of antibiotic residues in food due to the extensive use of antibiotics in agricultural mass production are two obvious sources of prolonged exposure to sublethal antibiotics.

      The results the authors obtain are in line with papers examining the genomic mutation rate in vitro and from patient samples in Mycobacterium tuberculosis, in vitro in Mycobacterium smegmatis and in vitro in Mycobacterium tuberculosis (although the study by HL David (PMID: 4991927) is not cited). The results are confirmatory of previous studies.

      The two cited studies, along with several others, did not distinguish between genetic mutations and phenotypic responses to drug exposure (the fluctuation test alone is not suitable for this). Therefore, their objectives are not comparable to ours, which specifically investigated whether resistant colonies carry adaptive mutations. Nevertheless, we acknowledge the relevance of these studies and have now cited them in the appropriate sections in the text.

      It is therefore puzzling why the authors propose the opposite hypothesis in the paper (i.e., antibiotic exposure should increase mutation rates) merely to tear it down later. This straw-man style is entirely unnecessary.

      The phenomenon of stress-inducible mutagenesis in bacterial evolution remains a topic of heated debate. The emergence of genetically encoded resistance may stem from either microevolution or the dissemination of pre-existing variants from polyclonal infections under drug pressure. We believe that the Introduction presents both of these hypotheses in a balanced manner to elucidate the rationale behind our mutation accumulation investigations.  

      The results on the nucleotide pools are interesting, but the statistically significant data is difficult to identify as presented, and therefore the new biological insights are unclear.

      We now indicate statistical significance in the figure, in addition to the detailed statistical analysis of all dNTP measurements provided in Table S5.

      Finally, the authors show that a fluctuation assay generates mutations with higher frequencies than the genetic stability assays, confirming the well-known effect of phenotypic antibiotic resistance.

      What we show is that the fluctuation assay generated bacteria that tolerated the applied antibiotic without developing mutations. Conclusions about mutation rates are often drawn from fluctuation assays without confirming genetic-level changes, a discrepancy that persists despite these assays accounting for both phenotypic and genotypic alterations. By combining genome sequencing with fluctuation assays, our approach emphasizes the importance of distinguishing between these changes. While fluctuation assays remain valuable, inexpensive, and simple tools for evaluating the response of bacterial populations to various selective environments, they should not be considered definitive indicators of genetic changes.

      Recommendations For The Authors:

      The quality of the figures can be significantly improved. In Figure 1, cell lengths can be shown on separate histograms or better still as violin plots to enable better comparisons.

      Thank you for the suggestion. We have revised the data presentation accordingly.

      Details for statistical tests should be provided in the figure legend.  

      Statistical details are now added in the figure legend.

      In Figure 2, the number of data points is not mentioned.

      Statistical information is now added to the new Figure 2, which has been revised extensively based on suggestions from all Referees.

      The data in Figure 3 would be much easier to comprehend as a heatmap.  

      The figure we provided is a color gradient table representing different gene expression levels, along with numerical data and statistical significance indicated within the color boxes, expanding the information content of a traditional heatmap. In response to the Referee's suggestion, we also prepared a hierarchical clustering heatmap, demonstrating that the grouping of rows and columns based on functional information in the original figure is consistent with the clustering pattern observed in the heatmap (Figure S5). As the original figure is more informative and better structured, we have included the new figure in the supplementary materials.

      No statistical tests are provided for Figure 4.

      We now indicate statistical significance in the figure and describe the statistical analysis in the figure legend, as suggested. Additionally, Table S5 is dedicated to the statistical analysis of the dNTP data.  

      Reviewer #2 (Public Review):

      In this study, the authors assess whether selective pressure from drug chemotherapy influences the emergence of drug resistance through the acquisition of genetic mutations or phenotypic tolerance. I commend the authors on their approach of utilizing the mutation accumulation (MA) assay as a means to answer this and whole genome sequencing of clones from the assay convincingly demonstrates low mutation rates in Mycobacteria when exposed to sub-inhibitory concentrations of antibiotics. Also, quantitative PCR highlighted the upregulation of DNA repair genes in Mycobacteria following drug treatment, implying the preservation of genomic integrity via specific repair pathways.

      Even though the findings stem from M. smegmatis exposure to antibiotics under in vitro conditions, this is still relevant in the context of the development of drug resistance so I can see where the authors' train of thought was heading in exploring this. However, I think important experiments to perform to more fully support the conclusion that resistance is largely associated with phenotypic rather than genetic factors would have been to either sequence clones from the ciprofloxacin tolerance assay (to show absence/ minimal genetic mutations) or to have tested the MIC of clones from the MA assay (to show an increase in MIC).

      Thank you for acknowledging the values of the manuscript and for the insightful suggestions for improvement. We agree on the necessity to directly connect the mutation accumulation experiments with the tolerance assay, and we have performed both suggested additional experiments.  

      (1) We repeated the ciprofloxacin tolerance assay (Figure S6) using a large number of plates to gather enough cells for genomic DNA extraction and whole genome sequencing. The sequencing confirmed the absence of mutations in bacteria grown in both 0.3 and 0.5 µg/ml ciprofloxacin. We integrated this result in the revised manuscript text, while the sequencing data are available at the European Nucleotide Archive (ENA) under project number PRJEB71590.

      (2) We resuscitated three different clones from the MA assays stored at -80°C and tested the MIC of the respective drugs. The results are presented in Figure 2C. Except for EMB, we observed an increase in MIC values across the treatments.

      There seems to be a disconnect between making these conclusions from experiments conducted under different conditions, or perhaps the authors can clarify why this was done.  

      Molecular biology analysis methods are not easily compatible with long-term mutation accumulation experiments, or at least we could not establish the necessary conditions. When DNA or RNA extraction was required, we had to adjust the experimental scale for further analysis, which could be done in liquid culture. We believe that the suggested critical back-and-forth control experiments have significantly improved the comparability of the results.

      With regards to the sub-inhibitory drug concentration applied, there is significant variation in the viability as calculated by CFUs following the different treatments and there is evidence that cell death greatly affects the calculation of mutation rate (PMCID: PMC5966242). For instance, the COMBO treatment led to 6% viability whilst the INH treatment led to 80% cell viability. Are there any adjustments made to take this into account?

      We agree with and have been aware of the notion that cell death affects the calculation of the mutation rate. We included treatment optimization data on agar plates (Table 1 and Figure S2), which now demonstrate that the applied subinhibitory drug concentrations resulted in ≤10% viability across all treatments in the MA assay. This minimizes the potential discrepancy in the mutation rate calculation caused by variable cell death.  

      It would also be useful to the reader to include a supplementary table of the SNPs detected from the lineages of each treatment - to determine if at any point rifampicin treatment led to mutations in rpoB, isoniazid to katG mutations, etc.  

      Overall, while this study is tantalizingly suggestive of phenotypic tolerance playing a leading role in drug resistance (and perhaps genetic mutations a sub-ordinate role) a more substantial link is needed to clarify this.

      The SNPs identified from the lineages of each treatment are compiled in the 'unique_muts.xls' file within the Figshare document bundle that was originally enclosed with the manuscript. In response to your suggestion, we have now added a simplified version of this data set in Table S2, listing the detected SNPs. Notably, no confirmed adaptive mutation developed in our experiments; rifampicin treatment did not result in mutations in rpoB, nor did isoniazid lead to mutations in katG.

      Recommendations For The Authors:

      I would suggest moving Figure 1 to the supplementary - it shows that cell wall targeting drugs cause cell shortening and DNA replication targeting drugs cause cell elongation as would be expected and this is simply a secondary observation, not one that is central to the paper.  

      We agree that this is not a novel or unexpected observation. However, we used it as an indicator of drug effectiveness, particularly for bacteriostatic cell wall-targeting drugs in liquid culture that induced moderate cell death. Following Reviewer 1's suggestions, we extensively revised the figure to better convey our intended message. We believe the updated version now more clearly demonstrates the drugs' impact, and for this reason, we have opted to keep it in the main text.

      Figure 2 and Table 2 show the same data so this can be combined as a paneled figure or one moved to the supplementary. It would be useful to include a diagram of how the MA assay was conducted, similar to the CIP tolerance assay figure.

      Thank you for the suggestions. We have added a diagram to Figure 2 explaining the MA assay (Figure 2A), as well as the MIC experiment conducted on the MA cells (Figure 2C). To avoid redundancy, Table 2 has been removed.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes how antibiotics influence genetic stability and survival in Mycobacterium smegmatis. Prolonged treatment with first-line antibiotics did not significantly impact mutation rates. Instead, adaptation to these drugs appears to be mediated by upregulation of DNA repair enzymes. While this study offers robust data, findings remain correlative and fall short of providing mechanistic insights.

      Strengths:

      The strength of this study is the use of genome-wide approaches to address the specific question of whether or not mycobacteria induce mutagenic potential upon antibiotic exposure.

      Weaknesses:

      The authors suggest that the upregulation of DNA repair enzymes ensures a low mutation rate under drug pressure. However, this suggestion is based on correlative data, and there is no mechanistic validation of their speculations in this study.

      Furthermore, as detailed below, some of the statements made by the authors are not substantiated by the data presented in the manuscript.

      Finally, some clarifications are needed for the methodologies employed in this study. Most importantly, reduced colony growth should be demonstrated on agar plates to indicate that the drug concentrations calculated from liquid culture growth can be applied to agar surface growth. Without such validations, the lack of induced mutation could simply be due to the fact that the drug concentrations used in this study were insufficient.

      Thank you for appreciating the manuscript's merits and for the instructive suggestions. We agree that demonstrating reduced colony growth on agar plates is important to validate the relevance of the drug concentrations used in the study. In response, we have added the treatment optimization data on agar plates in Figure S2 and reorganized Table 1 to show the decrease in CFU achieved with the applied subinhibitory drug concentrations.

      We acknowledge that the observed upregulation of DNA repair enzymes and the low mutation rates under drug pressure represent correlative data. We removed the reference to mechanism from the abstract and avoided presenting the qPCR results as a mechanistic explanation in the text. We have only raised the possibility that correlation could be a causal relationship: "The observed upregulation of the relevant DNA repair enzymes might account for the low mutation rate even under drug pressure." We recognize the necessity for a new series of targeted experiments to provide mechanistic explanations. We added the following text to the Discussion:

      “The observed activation of DNA repair processes likely mitigates mutation pressure, ensuring genome stability. However, to confirm this hypothesis, these investigations should be conducted using genetically modified DNA repair mutant strains.”

      In the current manuscript, we aim to convincingly demonstrate that long-term antibiotic pressure did not induce the occurrence of new adaptive mutations.

      Recommendations For The Authors:

      Additional specific comments are:

      Page 2. Do not italicize "Mycobacteria", which is not considered a scientific name.

      Corrected.

      Page 4. "Bacto pepcone" is a typo.

      Corrected.

      Page 6. "Quiagen" is a typo.

      Corrected.

      Page 9. In Table 1, RIF being described as a protein synthesis inhibitor is misleading.

      Corrected.

      Page 9. The statement "Specifically, following RIF, CIP, and MMC treatments, we observed cells elongating by more than twofold, whereas INH and EMB treatments led to a reduction in cell length." cannot be justified by Figure 1, as the cell length information is not conveyed in this figure.

      Thank you for pointing this out, the revised Figure 1 conveys the cell length information.

      Page 10. If the experiment shown in Figure S1 was done in an acidic growth condition, the figure legend should clearly indicate the fact. Additionally, the assay condition should be described in detail in the Methods section.

      Thank you, the required information is now included in both the figure legend and the Methods section.

      Page 10. If PZA does not work against M. smegmatis, it seems pointless to add it to the COMBO treatment. Please clarify why it was included in the drug combination experiment.

      We added the following text to clarify the use of PZA: “Regardless of its inefficacy as a monotherapy, we included PZA in the combination treatment, as we could not rule out the possibility that PZA interacts with the other three drugs or that PZA elimination mechanisms are equally active in M. smegmatis under this regimen.”

      Page 10. Generation times calculated from liquid culture cannot be applied to colony growth on an agar plate. The growth behaviors on a solid surface will be totally different from planktonic suspension growth. The numbers of generations indicated here will be inaccurate.

      You are absolutely right. We conducted an experiment to calculate the number of generations on plates under the same conditions as used in the MA assay. We found, indeed, a different (doubled) generation time from what was determined in liquid culture. We have adjusted the mutation rates accordingly.
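
      The effect of this correction can be sketched with simple arithmetic. The snippet below is only an illustration and assumes that the per-generation mutation rate is computed as mutations divided by (genome size × number of generations); the authors' exact formula is not given here. The mutation count, experiment duration, generation times, and function names are hypothetical, and the genome size is an approximate figure. With a fixed experiment duration, doubling the generation time halves the number of generations and therefore doubles the estimated rate.

```python
def generations_elapsed(total_hours, generation_time_hours):
    """Number of generations completed over the whole experiment."""
    return total_hours / generation_time_hours


def mutation_rate(mutations, genome_size_bp, generations):
    """Mutations per base pair per generation."""
    return mutations / (genome_size_bp * generations)


# Hypothetical inputs (illustration only, not values from the manuscript).
GENOME_SIZE_BP = 7_000_000      # approximate size of the M. smegmatis genome
MUTATIONS_DETECTED = 10         # assumed mutation count across lineages
TOTAL_HOURS = 2400.0            # assumed total duration of the MA assay
LIQUID_GENERATION_TIME_H = 3.0  # assumed generation time in liquid culture
PLATE_GENERATION_TIME_H = 6.0   # assumed (doubled) generation time on agar

rate_liquid = mutation_rate(
    MUTATIONS_DETECTED, GENOME_SIZE_BP,
    generations_elapsed(TOTAL_HOURS, LIQUID_GENERATION_TIME_H))
rate_plate = mutation_rate(
    MUTATIONS_DETECTED, GENOME_SIZE_BP,
    generations_elapsed(TOTAL_HOURS, PLATE_GENERATION_TIME_H))

print(f"rate with liquid-culture generation time: {rate_liquid:.2e}")
print(f"rate with plate generation time:          {rate_plate:.2e}")
print(f"fold change: {rate_plate / rate_liquid:.1f}")
```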

      Page 12. Was the experiment shown in Figure 3 done in a liquid culture? If so, the transcriptional profile could be different from the experiment shown in Figure 2, which was done on an agar plate.

      Yes, the experiment shown in Figure 3 was conducted in liquid culture. We acknowledge that the transcriptional profile could differ from the experiment shown in Figure 2, which was performed on an agar plate. However, technical limitations required us to use liquid cultures for these experiments.

      Page 14. Regarding the statement "INH and EMB coincided with a decreased concentration of these [dCTP and dTTP] nucleotides", by examining Table S5, I do not see any statistical reductions in dCTP and dTTP levels.

      Thank you for bringing this to our attention. We have made the necessary corrections to ensure that the text and data are now aligned.

      Page 14. Similarly to the comment above, the statement "RIF, CIP and MMC treatments promoted an increase in the dCTP and dTTP pools" is misleading as each drug seems to increase either dCTP or dTTP, not both.

      Same as above.

      Page 14. The authors state, "a larger overall dNTP pool size coincides with a larger cell size and vice versa (Figure 4H)". Please indicate the unit of the pool size for the graph shown in Figure 4H. According to the legend, I assume that it refers to the concentration. The term "pool size" may be misleading as it implies quantity rather than concentration.

      Page 15. Figure 4H is impossible to understand. The left y-axis label looks as if it is a ratio of cell length to volume. There is no point in having these three data on a single graph. Please separate them into individual graphs. Also, what is the spacing between the tick marks? The data also seem inconsistent with the values given in Table S1. For example, the mean volume of COMBO is larger than the control (according to Table S1), and yet the graph in Figure 4H indicates that COMBO's relative length is less than 1.

      Thank you for your feedback. We have corrected these and created what we hope is a clearer figure.

      Figure S1. Clarify what the gray shade in the graph represents.

      The gray shade was unnecessary, so we removed it when recoloring the figure to ensure a more coherent color scheme across the different treatments.

      Figure S1. Relative viability cannot be determined by OD600. CFU needs to be determined to assess cell viability.

      Thank you. We changed the incorrect term viability to growth inhibition.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This work describes the induction of SIV-specific NAb responses in rhesus macaques infected with SIVmac239, a neutralization-resistant virus. Typically, host NAb responses are not detected in animals infected with SIVmac239. In this work, seventy SIVmac239-infected macaques were retrospectively screened for NAb responses and a subset of nine animals were identified as NAb-inducers. The viral genomes from 7/9 animals that induced NAb responses were found to encode a nonsynonymous mutation in the Nef gene (amino acid G63E). In contrast, the Nef G63E mutation was found in only 2/19 NAb non-inducers, implying that the Nef G63E mutation is selected in NAb inducers. Measurement of Nef G63E frequencies in plasma viruses suggested that Nef G63E selection preceded NAb induction. The Nef G63E mutation was found to mediate escape from Nef-specific CD8+ T-cell responses. To examine the functional phenotype of the Nef G63E mutant, its effect on downmodulation of Nef-interacting host proteins was examined. Infection of rhesus and cynomolgus macaque CD4+ T cell lines with WT or Nef G63E mutant SIV suggested that the Nef mutant reduces S473 phosphorylation of AKT. Using a flow cytometry-based proximity ligation assay, it was shown that the Nef G63E mutation reduced binding of Nef to PI3K p85/p110 and mTORC2 GβL/mLST8 and MTOR components, the kinase complex responsible for AKT-S473 phosphorylation. In vitro B-cell Nef invasion and in vivo imaging/flow cytometry-based assays were employed to suggest that Nef from infected cells can target Env-specific B cells. Lastly, it was determined that NAb inducers have significantly higher Env-specific B-cell responses after Nef G63E selection when compared to NAb non-inducers. Finally, a corollary was drawn between the Nef G63E-associated B-cell/NAb induction phenotype and activated PI3K delta syndrome (APDS), which is caused by activating GOF mutations in PI3K, to suggest that Nef G63E-mediated induction of NAb response is reciprocal to APDS.
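
      As a rough check on the enrichment described above (7/9 inducers versus 2/19 non-inducers), the counts can be placed in a 2×2 table and tested with a one-sided Fisher's exact test. This is a sketch only: the summary does not state which test the authors used, so the choice of test and the function name are assumptions, and only the four counts come from the text.

```python
from math import comb


def hypergeom_upper_tail(k_observed, group_size, carriers_total, population):
    """P(X >= k_observed), where X counts carriers among group_size animals
    drawn without replacement from a population with carriers_total carriers.
    This is the one-sided Fisher's exact test p-value for a 2x2 table."""
    denominator = comb(population, group_size)
    upper = min(group_size, carriers_total)
    numerator = sum(
        comb(carriers_total, k)
        * comb(population - carriers_total, group_size - k)
        for k in range(k_observed, upper + 1)
    )
    return numerator / denominator


# Counts taken from the summary above.
inducers_with_g63e, inducers_total = 7, 9
non_inducers_with_g63e, non_inducers_total = 2, 19

carriers = inducers_with_g63e + non_inducers_with_g63e  # animals with G63E
animals = inducers_total + non_inducers_total           # animals sequenced

p_one_sided = hypergeom_upper_tail(
    inducers_with_g63e, inducers_total, carriers, animals)
print(f"one-sided Fisher exact p = {p_one_sided:.2e}")
```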

      Strengths:

      This study aims to understand the viral-host interaction that governs NAb induction in SIVmac239-infected macaques - this could enable identification of determinants important for induction of NAb responses against hard-to-neutralize tier-2/3 HIV variants. The finding that SIV-specific B-cell responses are induced following Nef G63E CD8+ T-cell escape mutant selection argues for an evolutionary trade-off between CTL escape and NAb induction. Exploitation of such a cellular-humoral immune axis could be important for HIV/AIDS vaccine efforts.

      Although more validation and mechanistic basis are needed, the corollary between PI3K hyperactive signaling during autoimmune disorders and Nef-mediated abrogated PI3K signaling could help identify novel targets and modalities for targeting immune disorders and viral infections.

      We are grateful for the supportive and insightful comments. The work did seem to unintendedly highlight a conceptual link between extrinsic and intrinsic immune perturbations. We will keep working on both wings, aiming to evoke synergisms.

      Weaknesses:

      Although the paper does have strengths in principle, its weakness is that the mechanistic basis of Nef-mediated induction of NAb responses is not directly examined. For example, it remains unclear whether SIVmac239 with an engineered G63E mutation in Nef would induce faster and more potent NAb responses. A macaque challenge study is needed to address this point.

      We appreciate the point. We do have certain difficulties in availability of macaques for de novo experiments. As partially discussed in ver1, the identified Nef phenotype selected post-acute infection confers an enhanced CD4+ T cell-killing effect (revised Fig 4F), and it is likely that de novo infection with the mutant would redirect the trajectory of infection to rapid disease/AIDS progression accompanying generalized immune failure by boosting acute-phase CD4 destruction. In other words, mutant de novo infection may not necessarily be directly discussable as an attempt for reconstitution. It appears equally critical to understand the mutant in vitro on an immunosignaling basis, and in the current work we have focused on depicting this as the first step. We will work on reconstitution experiments with emphasis on pharmacology in our future study.

      As presented, the central premise of the paper involves infected cell-generated Nef (WT or G63E mutant) being targeted to adjacent Env-specific B cells. However, it remains unclear how this transfer takes place. Direct evidence demonstrating CD4+ T cell-associated and/or cell-free Nef being transferred to B cells is needed to address this concern.

      We appreciate the point, also pointed out by Reviewers 2 and 3. We have performed three sets of in vitro reconstitution experiments graphically/functionally addressing how Nef transfer from CD4+ T cells to B cells can be modulated (new Fig 6) and edited text accordingly.

      The interaction between Nef and PI3K signaling components (p85, p110, GβL/mLST8, and MTOR) has been explored using PLA assay; however, this requires validation using additional biochemical and/or immunoprecipitation-based approaches. For example, is Nef (WT or mutant form) sufficient to affect PI3K-induced phosphorylation of Akt in an in vitro kinase assay? Moreover, details regarding the binding events of WT vs mutant Nef with PI3K signaling components are lacking in this study. Lastly, it is unclear whether the interaction of Nef with PI3K signaling components is a conserved function of all primate lentiviruses or an SIV-specific phenotype.

      We appreciate the point. Co-immunoprecipitation analysis via pulldown with the mTORC2-intrinsic cofactor Sin1 (revised Fig 4E), showing decreased G63E-Nef binding, should add robustness to the statement when combined with the initial manipulation results (Fig 4C). As Sin1 is intrinsic to mTORC2 and not mTORC1, the results should be strengthened. Phosflow may be a standard readout nowadays for pAkt itself. Regarding sequence variation, conservation will be addressed in future studies. We have briefly mentioned this in the revision (Lines 390-391).

      It has been previously reported that the region of Nef encoding glycine at position 63 is not conserved in HIV-1 (Schindler et al, Journal of Virology 2004). Thus, does HIV-1 Nef also function in induction of NAb responses in humans? Or is the observed phenotype specific to SIV?

      We appreciate the point, and do not have an answer at the moment. We will explore in our HIV-1-infected patient cohort (Hau et al, AIDS 2022) and on other occasions whether corresponding phenotypes may exist. We have mentioned this point in the revised manuscript (Lines 392-393).

      Reviewer #2 (Public Review):

      It is well known that human and simian immunodeficiency viruses (HIV and SIV, respectively) evolved numerous mechanisms to compromise effective immune responses but the underlying mechanisms remain incompletely understood. Here, Yamamoto and Matano examined the humoral immune response in a large number of rhesus macaques infected with the difficult-to-neutralize SIVmac239 strain. They identified a subgroup of animals that showed significant neutralizing Ab responses. Sequence analyses revealed that in most of these animals (7/9) but only a minority in the control group (2/19) SIVmac variants containing a CD8+ T-cell escape mutation of G63E/R in the viral Nef gene emerged. They further show that this change attenuates the ability of Nef to stimulate PI3K/Akt/mTORC2 signaling. The authors propose that this SIVmac239 nAb induction is reciprocal to antibody dysregulation caused by a previously identified human PI3K gain-of-function (Ref). Altogether, the results suggest that PI3K signaling plays a key role in B-cell maturation and generation of effective nAb responses.

      Strengths of the study are that the authors analyzed a large number of SIVmac-infected macaques to unravel the biological significance of the known effect of the interaction of Nef with PI3K/Akt/mTORC2 signaling. This is interesting and may provide a novel means to improve humoral immune responses to HIV. Weaknesses are that only G63E and not G63R, which also emerged in most animals, was examined in most functional assays. Some effects of the G63E mutation seem modest and comparison to a grossly nef-defective SIVmac construct would be desirable to better assess the impact of the mutation on Nef-mediated stimulation of PI3K. While the impact of this Nef mutation on PI3K and the association with improved nAb responses are largely convincing, the results on the potential impact of soluble Nef on neighboring B cells are much less clear. SIVmac239 infects and manipulates helper CD4 T cells and these are essential for the activation and differentiation of B cells into antibody-producing plasma cells and effective humoral immune responses. Without additional functional evidence that Nef indeed specifically targets and manipulates B cells, these results and conclusions should be presented with much greater caution. Finally, the presentation of the results and conclusions is partly very convoluted and difficult to comprehend. Editing to improve clarity is highly recommended.

      We are very grateful for the supportive and visionary review and suggestions. Experiments have been performed to address the points raised. This work inevitably drew on several disciplines to arrive at the overall schematic (NAbs, B cells, CD4+T, CD8+T, viral escape, immunosignaling, IEI as extrapolation & microscopy implementations), and some sections were indeed convoluted. We have streamlined certain portions and edited the writing throughout, and hope that it is now more straightforward.

      Reviewer #2 (Recommendations For The Authors):

      As outlined in the public review, I found the results potentially very interesting but parts of the manuscript much more complex and confusing than necessary. In addition, the methods on the potential impact of soluble Nef on neighboring B cells in vivo were difficult to assess, and altogether this part was not convincing. I have the following specific suggestions:

      We are very grateful for the scholarly review, and for the encouraging and suggestive comments on this orphan work. In the revision we designed experiments to address the properties of Nef transfer and so add to the understanding of the in vivo B-cell data. Recommendations have been addressed as follows.

      (1) Title: "AIDS virus-neutralizing antibody induction reciprocal to a PI3K gain-of-function disease". I think this title hardly reflects the data; SIVmac causes simian AIDS and is not the "AIDS virus"; the 2nd part is more appropriate for discussion than for the title (and the abstract).

      We appreciate the point. The original intent of the title was to conceptually bridge two differing fields of virus-host interaction and inborn errors of immunity/immunosignaling on an original article basis. Certain papers (Mudd et al, Nature 2012 etc) do utilize the term AIDS virus, and we similarly chose the term for simplification to non-virologists at initial submission.

      That being said, we understand the scholarly point raised, and feel that the initial aim can be well attained by retaining the key host effector PI3K in the title, as in the revised submission titled “SIV-specific neutralizing antibody induction following selection of a PI3K drive-attenuated nef variant”.

      (2) Abstract and throughout: As the authors show, SIVmac is not generally "neutralization resistant"; difficult to neutralize is more appropriate and should be used throughout. Also, the abstract and other parts are more complicated than necessary.

      We appreciate the point. HIV/SIV Env immunology work utilizes “neutralization-resistant” for SIVmac239 (e.g., Mason et al, PLoS Pathog 2016), and autologous titer positivity of ~10% at this size of examination does appear low amongst lentiviruses. Nevertheless, as recommended, “difficult-to-neutralize” better describes the nature, and we have switched the term accordingly.

      Linked with title modification, we reflected the comment on abstract structure and switched the main introductory sentence (Here we…) to a more data-based one instead of depicting extrapolation, and have modified phrasings in the latter half.

      (3) The intro seems a bit biased. Immune evasion due to mutations and proviral integration that play key roles in viral persistence are not mentioned. nAbs are not known to efficiently control HIV or SIV replication in vivo (not even in the present study). Thus, a more "balanced" presentation of the role of nAbs in vivo is desirable.

      We agree with the comment. The Introduction in the ver1 submission was compressed to display only examples of humoral immune perturbation across persistence-prone viral infections, and it is indeed much better to lay out the multiscale strategies by which lentiviruses establish viral persistence. We have appended two sets of texts, one on the fundamental integrating retroviral life cycle and another on the wide spectrum of accessory protein-driven perturbation. As pointed out, the current endogenous induction is of course not early enough to exert a suppressive impact on replication, as is seen with exogenous Ab passive infusions. We have accordingly modified the text to improve the balance.

      (4) Lines 73-76: rephrase for clarity.

      We acknowledge the comment and have rephrased accordingly.

      (5) Line 92: "linked with sustained Env-specific B-cell responses after the mutant Nef selection". After or during in one case; the time frame varies enormously and this should be discussed.

      We appreciate the comment. The six Nef-G63E mutant-selecting NAb inducers subjected to B-cell analysis were the ones that showed precedence in Fig 2D (mutant before induction). That being said, we modified text as suggested (Line 104 in revised uploaded text). Text related to temporal deviation has been appended (Lines 378-383 in revised uploaded text).

      (6) The authors should discuss G63R and include it in the functional analyses.

      We appreciate the comment. Discussion on Nef-G63R in ver1 submission was kept minimal because statistical significance for selection was marginal. We generated a Nef-G63R mutant and results are appended in Fig 4-Figure Supplement 2.

      (7) Lines 124/5: conservation only applies to SIVsmm/mac Nefs and this region is also frequently deleted/length-variable in primary HIV-1 Nefs.

      We appreciate the comment. We modified description of the region accordingly (Lines 139-141 in revised text).

      (8) Lines 153-155: Statement doesn't seem to make sense. The triple mutant Nef SIVmac construct was not attenuated for replication but specifically disrupted in CD3 down-modulation.

      We acknowledge the comment. It had meant that the consequent plasma viral load showed a trend of decrease (as in the Graphical Abstract of the work) which should (in a simplistic view) influence antigenicity for humoral immune responses. Yet it is very true that virological replicative capacity was comparable with wild-type as in Fig.1. We have taken down the related text and rephrased it (Ref remains cited in introduction).

      (9) Lines 178/9: levels in PI3K gain-of-function mice "with full disease phenotype (Avery et al., 2018)". This needs more information, e.g. what disease exactly are they talking about?

      We are grateful for the correction, and have appended text introducing the mentioned congenital disease in advance in the Introduction section. A detailed description is also appended in the Discussion section.

      (10) Lines 186/7: "Env-stimulating high-MOI infection also accelerated phenotype appearance, with enhanced 50% reduction (Figure 4C, right)". Modify text and corresponding figure for clarity.

      We acknowledge the comment. We revised as: “A high-MOI SIV infection, comprising higher initial concentration of extracellular Env stimuli, also accelerated phenotype appearance from day 3 to day 1 post-infection with stronger pAkt reduction”.

      (11) The validity of the results described in the section "Targeting of lymph node Env-specific B cells by Nef in vivo" was difficult to assess. Altogether, however, I didn't find them convincing, especially since a negative control (e.g. macaques infected with nef-deleted SIVmac) is missing.

      We acknowledge the comment. As a pure experimental control, whole-Nef deletion may help to establish a baseline for subtraction. Within this work, the staining per se should at least be highly specific (the mAb has been verified in multiple other applications, and the cytometry panel was designed for minimal spillover into the AF488 channel). On an in vivo basis, direct comparison may be somewhat confounded by the fact that loss of other pleiotropic effects of Nef appears to dominate upon Nef deletion, as a set of reduced viremia, robust CD8 responses, killer CD4 responses and increased binding Ab titers (Johnson et al, J Virol 1997, Gauduin et al, J Exp Med 2006, Fukazawa et al, Nat Med 2012, Adnan et al, PLoS Pathog 2016 etc) leading to an altered trajectory. We will work on refinement of the methodology in future studies.

      (12) Lines 309-319: This paragraph made little sense to me (as did lines 328-331).

      We acknowledge the comment and have edited both sections.

      Reviewer #3 (Additional Reviewer):

      In this manuscript, Hiroyuki Yamamoto et al examined virus-specific antibody responses and identified a subgroup of nine individuals, out of seventy rhesus macaques of Burmese origin infected with SIVmac239, that develop neutralizing antibodies (NAb). The authors propose the emergence of a nef mutant (Nef-G63E) that impacts on B cell maturation resulting in PI3K gain-of-function.

      My major concerns are:

      The authors addressed, from different aspects, the role of the emergence of the Nef-G63E mutant in individuals developing NAb. The manuscript is confusing and the rationale is not always clearly stated. This reflects the two aspects of the manuscript: (i) NAb identification in a subgroup of macaques and (ii) the identification of this nef mutation.

      We are grateful for the comprehensive and scholarly comments. As pointed out, the work did need to confront potential bifurcation of the influence of the obtained viral immunosignaling phenotype for CD4-intrinsic (which might be your specialty) and B-cell-intrinsic impact. Based on your suggestions we have acquired additional data and revised the manuscript as attached.

      The authors used both males (n=57) and females (n=13). However, there is no indication related to the sex regarding NAb inducers versus non-NAb Inducers. The notion of "highly pathogenic" is certainly not correct (see the introduction). Pathogenicity is also depending on monkey origin. Thus, cynomolgus are less sensitive to SIVmac239 or SIVmac251 compared to rhesus macaques (Ling B Aids 2002; Reimann KA, J Virol 2005; Cumont MC, J Virol 2008), or to pigtails used in US. Indeed, the authors used Burmese macaques, and therefore the dynamics of pathogenicity is different to rhesus macaque (Indian origin) housed in US. How many animals have been sacrificed out of the 61 animals? Herein, the animals are surviving longer (more than one year), and therefore the notion of "highly pathogenic" merits to be modulated.

      We appreciate the comment. We have accordingly appended sex information (M/F: 8/1 versus 49/12 in NAb inducers vs non-inducers, p > 0.99 by Fisher’s exact test) in the methods section. As pointed out there are differences in the frequency and rate of AIDS progression among macaques of differing origin, whereas we have also previously reported reproducible AIDS progression dependent on MHC-I genotypes in the Burmese rhesus macaques utilized (Nomura, Yamamoto et al., J Virol 2012). Adhering to advice, we have attenuated the term to “pathogenic” in the revised manuscript and appended one reference showing pathogenesis gradation from a cell-death perspective (Cumont 2008).
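
      The sex comparison quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a two-sided Fisher's exact test on the 2x2 table of counts given in the response (8 M / 1 F inducers versus 49 M / 12 F non-inducers); the function name is illustrative and is not taken from the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (an assumption, not the authors' code): sums the
    hypergeometric probabilities of all tables with the same margins that
    are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, total)


# Counts quoted in the response: rows are NAb inducers and non-inducers,
# columns are males and females.
p_value = fisher_exact_two_sided(8, 1, 49, 12)
print(f"p = {p_value:.4f}")
```

      With these counts the observed table is the most probable one under fixed margins, so the two-sided p-value is consistent with the reported p > 0.99.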

      Furthermore, no indication is provided regarding CD4 T cell dynamics, or CD8 T cells. In particular, the extent of T cell immunodeficiency may compromise humoral response. Therefore, this data needs to be shown. Indeed, previous reports have indicated that early CD4 T cell depletion is associated with defective humoral response. Furthermore, Tfh cell depletion was reported in several immune tissues, which are essential for B cell immune response like the spleen. Thus, this should be discussed as an alternative mechanism to the absence of NAb. Indeed, the authors found higher and persistent env-specific plasmablast cells in NAb inducers than that observed in non-NAb inducers figure 6. Why to have selected twelve individuals out of 61 individuals for assessing anti-env response (Supplemental S3 for figure 1, panel 1), and only eleven for western blots. The explanation in the text is absent. This requires to be clearly stated. See lines 108-110.

      We appreciate the comment. As in other sections, this study utilized available cryopreserved samples from a retrospective cohort, also having heterogeneity in data acquisition along the way. We acknowledge that some supplemental data are particularly limited in information, which is also a reason they are presented in SI. We felt that one important core was to secure samples for Nef-G63E-selecting NAb inducers versus viremic non-inducers, for which we acquired six versus twelve in the B-cell analysis.

      We (Nakane et al, PLoS ONE 2013) and others (Hirsch et al, J Virol 2004) have already reported, on a western blotting basis, that SIV-infected rapid progressors tend to manifest serological failure (impaired binding Ab-WB bands). Therefore, to compare quantitative traits at this basal stage (Fig 1), we judged that comparing NAb inducers with non-rapid-progressing (>60 wk survival) non-inducers would be an appropriate criterion. We have mentioned this in the revised manuscript (results/methods). Additionally, we have replaced the immunoblotting result with one that includes one more non-inducer (n = 12) to strengthen the results. Please note that there are lot-to-lot deviations in strip-coated antigen (e.g., gp160) but the result is comparable (now covers 12/13 of animals with >60-wk survival).

      The authors indicated the frequencies of the Nef-G63E mutant in figure 2 panel C. However, no information is given in the legend about the number of NAb non-inducers used to calculate this frequency. The authors indicated in line 127, "only in two of the nineteen NAb non-inducers, including one rapid progressor". Thus, different numbers of individuals are used throughout the manuscript. For the readers, it needs to be clarified what each number refers to. This is not homogeneous across the text and the analyses performed.

      We appreciate the comment, and have appended the number in the revised Fig 2C. As aforementioned, heterogeneity of sample number in different sections is indeed a limitation of the work, and we have mentioned this in the Discussion.

      Please clarify the rationale behind the sentence in lines 140-142: "NAb induction is not associated with these MHC-I genotypes (P = 0.25 by Fisher's exact test, data not shown) but with the Nef-G63E mutation itself".

      We appreciate the comment. We have rephrased it as:

      “Ten of nineteen NAb non-inducers also had either of these alleles (Figure 1-figure supplement 1). This did not significantly differ with the NAb inducer group (P = 0.25 by Fisher’s exact test, data not shown), indicating that NAb induction was not simply linked with possession of these MHC-I genotypes but instead required furthermore specific selection of the Nef-G63E mutation.” (Lines 159-162).

      In supplemental figure 3, only 7 individuals have been tested, while the authors indicated "Ten of nineteen NAb non-inducers also had either of these alleles". Why only seven? In NAb Burmese monkeys, the authors indicate specific T cells capable of recognizing the WT nef peptide, but not the G63E mutant peptide. Thus, nef is immunogenic in vivo, generating T cells despite being mutated.

      In contrast, non-NAb-inducers demonstrate the absence of nef-specific T cells (supplemental figure 3, except R01-011 panel A). Although the authors propose an escape mutant for CD8 T cells, this is not associated with the absence of immunogenicity and not with a difference in viral load in comparison to NAb inducers (panel C). Therefore, the conclusions merit revision. Thus, this part of the manuscript is confusing. Please clarify the rationale for linking NAb and Nef-specific CD8 T cells.

      We appreciate the comment. Seven of the eight non-inducers that were positive for the allele and did not select for the Nef-G63E mutant were available for analysis. The relative contribution of this single Nef62-70 epitope-specific CTL response is speculated not to largely impact viral control, among the many induced. This is discussed at a basic level in a previous paper (Nomura, Yamamoto et al., J Virol 2012), which is more suggestive of an MHC-I haplotype-level correlation with plasma viral load. We assume that the CTL pressure-driven selection of the Nef-G63E mutant was a rather pure immunosignaling trigger under persistent viremia. We appended this in the revised text (Line 172).

      In the next part of the manuscript, the authors assessed the function of this Nef-G63E mutant. The rationale for introducing ferritin in this part of the document is not clear to the reader. Furthermore, a subgroup for each (NAb+ versus NAb-) is shown: 4 for NAbneg versus 6 for NAbpos.

      We appreciate the point. As introduced, Swingler et al Cell Host Microbe 2008 reported HIV-infected macrophage-derived ferritin as a potentially B cell-disrupting factor. In that paper, viral load, ferritin and binding antibody titers positively correlated. Current data shows that SIVmac239-specific NAb induction is distinct from such kinetics already versus viral load (Fig 3-Supplement 1C), and ferritin levels were measured for some available samples more simply for confirmation. We appended three more available samples in the NAb- group. (The six NAb+/G63E animals correspond to the ones with B-cell data in Figure 7.) Statistical results appear unaffected and robust, as shown in this version. The revised manuscript incorporates appended explanation for the former.

      Similarly, whereas the authors observed a role of nef mutant on pAkt Ser473 (less induced) in comparison to WT, the authors suggest that this may have an impact on T cell survival.

      We appreciate the point. In the first submission we obtained peripheral memory Tfh decrease, whereas it is true that this is indirect. In the current revision we have addressed apoptotic cell death, shown to increase with Nef-G63E mutation (Figure 4F).

      The rationale for analyzing CXCR3-CXCR5+PD-1+ memory follicular Th (Tfh) is not clear. Moreover, the references used are not adequately cited. Indeed, these papers show an expansion. See the literature for a depletion (Xu H, J Immunol. 2015; Moukambi F, PLoS Pathog. 2015; Yamamoto T, Sci Transl Med. 2015; Xu H, J Immunol. 2018; Moukambi F, Mucosal Immunol. 2019).

      We appreciate these points on in vivo CD4+ T cells.

      Peripheral memory Tfh was reported to correlate with Ab cross-reactivity in one human cohort (Locci et al, Immunity 2013) and we concisely examined the subset in the current NAb induction. We mentioned this in the revised manuscript.

      Moukambi F et al, PLoS Pathog 2015 & Mucosal Immunol 2019 are demonstrative work on acute-phase destruction. We have cited the suggested non-neonatal/vaccine-related ones, including these two, in the revised manuscript. The biphasic dysregulation of Th (acute-phase destruction and chronic-phase adverse hyper-expansion) may indeed have a unique role with the current phenotype, which is beyond the aim of the current analysis. We have briefly mentioned this in the Discussion.

      Then, the authors assess the potential B-cell-intrinsic influence of the G63E-Nef phenotype. The rationale here is clearly indicated, making sense with figure 1. Furthermore, this part is clearer. The dot-plots merit revision and the markers used should be better stated. The authors indicate that Nef invasion upregulates pAkt Ser473, assuming aberrant PI3K/mTORC2 signaling. What is the impact of the Nef-G63E mutant on pAkt Ser473 using an in vitro model of transfer? This is not addressed for comparison.

      We appreciate the remarks/suggestions, also pointed out by Reviewers 1 and 2. We have performed three sets of in vitro reconstitution experiments visually and functionally addressing how Nef transfer to B cells can be modulated (new Fig 6), and edited text accordingly.

      Minor points are:

      - the presence of references in the legend.

      - some Ab clones are in the table; however, they are not used, such as CD38 and CD138, which are well known to be non-valid B cell markers for monkeys.

      We appreciate the suggestions.

      Mentions of references have been removed from the legends (Fig. 1, Fig. 3) and moved to the corresponding Methods section (Fig. 1).

      We also understood this well in advance (CD38/CD138), and incorporated them in the memory B-cell panel just to check whether they ever behave in a specific pattern. As expected, no notable behavior was observed in these NAb inducers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This valuable study examines the effects of NFKB2 mutations on pituitary gland development through hypothalamic-pituitary organoids. The evidence supporting the main conclusions is solid, although analysis of additional clones to exclude inter-clone variability would strengthen the conclusions. Insight into the mechanism of action of NFKB2 during pituitary development is incomplete. This work will be of interest to endocrinologists and biologists working on pituitary gland development and disease.

      We agree with these considerations and the summary and thank the Editors for their assessment. Although we indeed share the idea that reproduction of the experiments on a second clone would be a useful confirmatory step, we have not been able to reach this goal within a reasonable time frame for the reason mentioned above (unavailability of the main research engineer knowledgeable in the challenging methods involved in organoid differentiation) and due to the long turnaround time of this kind of experiment (3 months for the whole differentiation starting from iPSC). We therefore decided to publish on a single clone while we are still aiming at reproducing our results on at least a second one and will hopefully be able to provide these additional data in a subsequent revised version. We now acknowledge this limitation in the final part of the Discussion.

      Revised text: “Conversely, a limitation of this model is the long duration of the differentiation period (approximately 3 months) and the fact that not all hiPSC clones lead to full differentiation of hypothalamo-pituitary organoids despite similar conditions of culture. For these reasons, we could not include confirmation of our results on an independent clone in the present paper.”

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      NFKB mutations are thought to be one of the causes of pituitary dysfunction, but until now they could not be reproduced in mice and their pathomechanism was unknown. The authors used the differentiation of hypothalamic-pituitary organoids from human pluripotent stem cells to recapitulate the disease in human iPS cells carrying the NFKB mutation.

      Strengths:

      The authors achieved their primary goal of recapitulating the disease in human cells. In particular, the differentiation of the pituitary gland is closely linked to the adjacent hypothalamus in embryology, and the authors have again shown that this method is useful when the hypothalamus is suspected to be involved in pituitary abnormalities caused by genetic mutations.

      Weaknesses:

      On the other hand, the pathomechanism is still not fully understood. This study provides some clues to the pathomechanism, but further analysis of NFKB expression and experiments investigating the relevant factors in more detail may help to clarify it further.

      We thank this reviewer for acknowledging that we've reached our primary objective, in particular the fact that the HPO (hypothalamo-pituitary organoid) model allows recapitulation of the disease in human cells, including hypothalamic-pituitary interactions. Regarding the pathophysiological mechanism of the disease, we must admit that it remains incompletely understood. However, we have analysed more samples by RT-qPCR and further analysed RNASeq data from NFKB2 KI organoids, which provided more insight into the different levels at which NFKB2 may play a role. We have now provided several additional figures derived from these analyses, including a synthetic figure to summarize the most relevant observed effects (Fig. 14).

      Reviewer #2 (Public Review):

      We also thank this reviewer for the detailed analysis of our manuscript, and for the valuable comments, suggestions and questions that are addressed point by point below.

      Summary:

      DAVID syndrome is a rare autosomal dominant disorder characterized by variable immune dysfunction and variable ACTH deficiency. Nine different families have been reported, and all have heterozygous mutations in NFKB2. The mechanism of NFKB2 action in the immune systems has been well-studied, but nothing is known about its role in the pituitary gland.

      The DAVID mutations cluster in the C-terminus of NFKB2 and interfere with cleavage and nuclear translocation. The mutations are likely dominant negative, by affecting dimer function. ACTH deficiency can be life-threatening in neonates and adults; thus, understanding the mechanism of NFKB2 action in pituitary development and/or function is important.

      The authors use CRISPR/Cas gene editing of human iPSC-derived pituitary-hypothalamic organoids to assess the function of NFKB2 and TBX19 in pituitary development. Mutations in TBX19 are the most common, known cause of pituitary ACTH deficiency, and the mechanism of action has been studied in mice, which phenocopy the human condition. Thus, the TBX19 organoids can serve as a positive control. The Nfkb2<Lym1/Lym1> mouse model has a p.Y868* mutation that impairs cleavage of NFKB2 p100, and the immune phenotype mimics the patients with DAVID mutations, but no pituitary phenotype was evident. Thus, a human organoid model might be the only approach suitable to discover the etiology of the pituitary phenotype.

      Overall, the authors have selected an important problem, and the results suggest that the pituitary insufficiency in DAVID syndrome is caused by a developmental defect rather than an autoimmune hypophysitis condition. The use of gene editing in human iPSC-derived hypothalamic-pituitary organoids is significant, as there is only one example of this previously, namely studies on OTX2. Only a few laboratories have demonstrated the ability to differentiate iPSC or ES cells to these organoids, and the authors have improved the efficiency of differentiation, which is also significant.

      The strength of the evidence is excellent. However, the two ACTH-deficient organoid models use a single genetically engineered clone, and the potential for variability amongst clones makes the conclusions less compelling. Since the authors obtained two independent clones for NFKB2 it is not clear why only one clone was studied.

      We experienced difficulties obtaining an hiPSC population devoid of spontaneous differentiation while purifying this second clone, and did not want to delay the start of the experiments. This clone will be analysed in a follow-up study.

      Finally, the effect of TBX19 on early pituitary fate markers is somewhat surprising given the phenotype of the knockout mice and patients with mutations. Thus, the use of a single clone for that study is also worrisome.

      We agree that the effect of the TBX19 mutant on early pituitary progenitor development is rather puzzling. In our model, TBX19 is expressed throughout the whole experiment, although it is at very low levels in undifferentiated hiPSCs compared to peak expression (over 50-fold difference).

      During the CRISPR-Cas9 gene editing, we obtained a clone with a homozygous one-base insertion at the cutting site, leading to a frameshift and a premature stop codon 48 bases downstream. This would result in an expected protein of 163 amino acids instead of 488, but with potentially still functional DNA-binding ability. This mutation had a similar effect on LHX3 and PITX1 as the TBX19 KI mutation, although it was even more severe. Our most likely explanation is that the two TBX19 mutants we generated have dominant negative effects. In contrast to the mouse, little is known about TBX19 expression in early human pituitary development, but scRNA-seq data on human embryonic pituitaries (Zhang et al.) show low expression in undifferentiated pituitary progenitors between 7 and 9 weeks of gestation. Therefore, early expression of these dominant negative proteins could perturb differentiation in the organoids. Future development of hiPSC lines with total absence of TBX19 should help clarify these questions.

      Strengths:

      The authors make mutations in TBX19 and NFKB2 that exist in affected patients. The TBX19 p.K146R mutation is recessive and causes isolated ACTH deficiency. Mutations in this gene account for 2/3 of isolated ACTH deficiency cases. The NFKB2 p.D865G mutation is heterozygous in a patient with recurrent infections and isolated ACTH deficiency. NFKB2 mutations are a rare cause of ACTH deficiency, and they can be associated with the loss of other pituitary hormones in some cases. However, all reported cases are heterozygous.

      The developmental studies of organoid differentiation seem rigorous in that 200 organoids were generated for each hiPSC line, and 3-10 organoids were analyzed for each time point and genotype. Differentiation analysis relied on both RNA transcript measurements and immunohistochemistry of cleared organoids using light sheet microscopy. Multiple time points were examined, including seven times for gene expression at the RNA level and two times in the later stages of differentiation for IHC.

      TBX19 deficient organoids exhibit reduced levels of PITX1, LHX3, and POMC (ACTH precursor) expression at the RNA and IHC level, and there are fewer corticotropes in the organoids, as ascertained by POMC IHC.

      The NFKB2 deficient organoids have a normal expression of the early pituitary transcription factor HESX1, but reduced expression of PITX2, LHX3, and POMC. Because there is no immune component in the organoid, this shows that NFKB2 mutations can affect corticotrope differentiation to produce POMC. RNA sequencing analysis of the organoids reveals potential downstream targets of NFKB2 action, including a potential effect on epithelial-to-mesenchymal-like transition and selected pituitary and hypothalamic transcription factors and signaling pathways.

      Weaknesses:

      There could be variation between individual iPSC lines that is unrelated to the genetically engineered change. While the authors check for off-target effects of the guide RNA at predicted sites using WGS, a better control would be to have independently engineered clones or to correct the engineered clone to wild type and show that the phenotypic effects are reversed.

      All NFKB2 patients are heterozygous for what appear to be dominant negative mutations that affect protein cleavage and nuclear localization of processed protein as homo- or heterodimers. The organoids are homozygous for this mutation. Supplemental Figure 4 indicates that one heterozygous clone and two homozygous mutant clones were obtained. Analysis of these additional clones would give more strength to the conclusions, showing reproducibility and the effect of mutant gene dosage.

      The main goal of this work was to evaluate whether and how the NFKB2 D865G mutation affects the development of hypothalamic-pituitary organoids, in order to determine whether these organoids would constitute a valuable model for studying DAVID syndrome.

      We thank this reviewer for noting that we identified an important question and used appropriate, novel and not yet widely used methods to address it, including CRISPR/Cas9 genome editing of iPSCs and disease modelling in iPSC-derived HPOs, which had not previously been reported by a team other than the one that initially described them. This allowed us to confirm our working hypothesis that DAVID syndrome is caused by a developmental defect rather than by autoimmune hypophysitis. We also agree that analysing more clones, generated from the same or different hiPSC lines and carrying homozygous, heterozygous or corrected mutations, will be necessary in the future.

      Reviewer #3 (Public Review):

      We also thank this reviewer for the detailed analysis of our manuscript and for the valuable comments, suggestions and questions, which are addressed point by point below.

      Summary:

      This manuscript by Mac et al addresses the causes of pituitary dysfunction in patients with DAVID syndrome, which is caused by mutations in the NFKB2 gene and leads to ACTH deficiency. The authors seek to determine whether the mutation directly leads to altered pituitary development, as opposed to an autoimmune defect, by mutating human iPSCs and then establishing organoids that differentiate into pituitary tissue. They first seek to validate the system using a well-characterised mutation of the transcription factor TBX19, which also results in ACTH deficiency in patients. Then they characterise altered pituitary cell differentiation in mutant NFKB2 organoids and show that these lack corticotrophs, which would lead to ACTH deficiency.

      Strengths:

      The conclusion of the paper that ACTH deficiency in DAVID syndrome is independent of an autoimmune input is strong.

      Weaknesses:

      (1) The authors correctly emphasise the importance of establishing the validity of an iPSC-based model in being able to recapitulate in vivo dysfunctional pituitary development through characterisation of a TBX19 knock-in mutation. Whilst this leads to the expected failure of functional corticotroph differentiation, other aspects of the normal pituitary differentiation pathway upstream of corticotroph commitment seem to have been affected in surprising ways. In particular, the loss of LHX3 and PITX1 in TBX19 mutant organoids compared with wild type requires explanation, especially as the mutant protein would only be expected to be expressed in a small proportion of anterior pituitary lineage cells.

      If the developmental expression profile of key transcription factors in mutant organoids does not recapitulate that which occurs in vivo, any interpretation of the relevance of expression differences in the NFKB2 organoids to the mechanism(s) leading to corticotroph function in vivo has to be questionable.

      See response to Reviewer #2

      It is notable that the manipulation of iPSC cells used to generate mutants through CRISPR/Cas9 editing is not applied to the control iPSC line. It is possible that these manipulations lead to changes to the iPSC cells that are independent of the mutations introduced and this may change the phenotype of the cells. A better control would have been an iPSC line with a benign knock-in (such as GFP into the ROSA26 locus).

      We agree that the issue of off-target mutations should be addressed. However, we performed whole genome sequencing on TBX19 KI and did not observe any pathogenic variants other than the intended edit. We also checked that clones isolated during the screening procedure that proved negative for editing still had the ability to generate pituitary cells. We chose to use the original isogenic hiPSC line as the control because it could be compared to both TBX19 KI and NFKB2 KI simultaneously, thereby reducing the workload and cost of the experiments. Any other knock-in mutation, such as GFP into the ROSA26 locus, would carry the same risk of off-target mutations, but presumably at other sites in the genome.

      (2) In the results section of the manuscript the authors acknowledge that hypothalamic tissue in the NFKB2 mutant organoid may be having an effect on the development of pituitary tissue. However, in the discussion the emphasis is entirely on pituitary autonomous mechanisms such as pituitary HESX1 expression or POMC gene regulation; in the conclusion of the abstract, a direct role for NFKB2 in pituitary differentiation is described. Whilst the data here may suggest a non-immune mediated alteration in pituitary function in DAVID syndrome, if this is due to alteration of the developing hypothalamus then this is not direct. A fuller discussion of the potential hypothalamic contribution and/or further characterisation of this aspect is warranted.

      We agree with this reviewer that the contributions of both the developing hypothalamic and pituitary tissues should be taken into account. We performed more experiments and analysed the effect of both mutations on the expression of hypothalamic growth factors. These results are displayed in the new Figure 10. The role of the hypothalamus is now clearly mentioned and highlighted in the Discussion.

      (3) qRT-PCR data presented in Figure 6A shows negligible alteration of HESX1 expression at all time points in NFKB2 mutant organoids. This is not consistent with the 2-fold increase in HESX1 expression described in day 48 organoids found by bulk RNA sequencing.

      How do the authors reconcile these results and why is one result focused on in the discussion where a potential mechanism for a blockade of normal pituitary cell differentiation is suggested? Further confirmation of HESX1 expression is required.

      In the previous version of the manuscript, the HESX1 fold-change ratio between NFKB2 KI and WT at d48 was 2.06 (p=0.22). However, the type of representation used for expression kinetics (values relative to the expression peak in WT) and the scale used made it difficult to see. In the new version of the manuscript, we analysed more samples from the same experiments, and the new figure (now 6B) shows a significant increase of HESX1 expression (Fc = 2.46, p=0.019) in NFKB2 KI.

      Also, the qPCR results come from at least two different experiments, whereas the RNA-seq results come from a single one. For RT-qPCR, 6 HPOs per genotype were picked and further analysed. As we found that only 60-70% of organoids show signs of pituitary cell differentiation, we chose to preselect organoids based on RT-qPCR expression of selected markers (SOX2, HESX1, PITX1, LHX3, TBX19, POU1F1 and POMC), in order to avoid sending “empty” HPOs for bulk RNA-seq. We compared the HESX1 expression ratios obtained by the two techniques on the same samples (the ones used for RNA-seq) and found values of 2.19 (p=0.03) and 1.83 (p=0.061) for RNA-seq and RT-qPCR respectively. This is illustrated in Supplementary Figure 7. Our new results thus clearly demonstrate the increase in HESX1 expression in NFKB2 KI from d27 to d75.
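      For readers unfamiliar with how an RT-qPCR fold change (the "Fc" values quoted above) is typically derived, the following is a minimal sketch. It assumes the commonly used 2^-ΔΔCt calculation against a reference gene; the response does not state which method the authors used, and the function name and all Ct values below are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fold_change_ddct(target_ct_mut, ref_ct_mut, target_ct_wt, ref_ct_wt):
    """Mutant/WT fold change by the 2^-ddCt method (assumed, not confirmed by the source).

    Each argument is a list of Ct values, one per organoid, with target and
    reference Ct values paired by position.
    """
    # dCt = Ct(target gene) - Ct(reference gene), averaged per genotype
    dct_mut = mean(t - r for t, r in zip(target_ct_mut, ref_ct_mut))
    dct_wt = mean(t - r for t, r in zip(target_ct_wt, ref_ct_wt))
    # ddCt = dCt(mutant) - dCt(WT); a lower Ct means more transcript
    ddct = dct_mut - dct_wt
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target amplifies one cycle earlier in the mutant.
fc = fold_change_ddct(
    target_ct_mut=[24.0, 24.0],
    ref_ct_mut=[18.0, 18.0],
    target_ct_wt=[25.0, 25.0],
    ref_ct_wt=[18.0, 18.0],
)
print(fc)
```

      Under this scheme, a one-cycle difference after normalisation corresponds to a two-fold change, which gives a sense of the scale of the HESX1 ratios discussed above.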

      (4) Throughout, the authors focus on POMC gene expression and ACTH antibody immunopositivity as being indicative of corticotroph cell identity. In the human fetal pituitary melanotrophs are present and most ACTH antibodies are unable to distinguish these cells from corticotrophs. Is the antibody used specifically for ACTH rather than other products of the POMC gene? It is unlikely that all the ACTH-positive cells are melanotrophs, nevertheless, it is important to know what the proportions of the 2 POMC-positive cell types are. This could be distinguished by looking for the expression of NeuroD1, which would also define whether corticotrophs are committed but not fully differentiated in the NFKB2 mutant organoids. In support of an effect on corticotrophs, it is notable that CRHR1 expression (which would be expected to be restricted to this cell type) is reduced by 84% in bulk RNAseq data (Table 1) and this may be an indicator of the loss of corticotrophs in the model.

      The antibody we used is directed against ACTH. In HPOs, PAX7 expression was barely detected during the whole experiment. Moreover, although PCSK2 transcripts were observed, their expression started very early (d27) and remained constant, suggesting that this gene is expressed in hypothalamic cells rather than pituitary cells. All these observations suggest that melanotrophs are very unlikely to be present in HPOs.

      (5) Notwithstanding the caveats about whether the organoid model recapitulates in vivo pituitary differentiation (see 1 above) and whether the bulk RNAseq accurately reflects expression levels (see 3 above), there are potentially some extremely interesting changes in gene expression shown in Table 1 which warrant further discussion. For example, there is a 25-fold reduction in POU1F1 expression which may be expected to reflect a loss of somatotrophs in the organoid (and possibly lactotrophs) and highlights the importance of characterising the effect of NFKB2 on other anterior pituitary cell types within the organoid. If somatotrophs are affected, this may be relevant to the organoids as a model of DAVID syndrome as GH deficiency has been described in some individuals with NFKB2 mutations. The huge increase in CGA expression may reflect a switch in cell fate to gonadotrophs, as has been described with a loss of TPIT in the mouse. These are examples of the changes that warrant further characterisation and discussion.

      We performed a more in-depth analysis of other pituitary lineages (mainly somatotrophs). We confirmed the strong reduction in PROP1 and POU1F1 expression in NFKB2 KI organoids. Although the strong increase in CGA expression in the mutant may raise the possibility of a redirection towards gonadotroph lineage, the lack of change in NR5A1 expression may suggest otherwise.

      These results are now illustrated in figure 12 and discussed in a full paragraph.

      (6) How do the authors explain the lack of effect of NFKB2 mutation on global NFKB signalling?

      The most likely explanation is that p100/p52 is not involved in controlling the expression of other members of NFKB signalling. Therefore, the absence of global alteration of NFKB signaling pathway shows that mutant p100/p52 protein is directly responsible for the observed phenotype.

      Recommendations for the authors:

      Reviewing editor summary of recommendation to authors:

      The use of hypothalamic-pituitary organoids can provide a fundamental understanding of pituitary gland development and differentiation. Their use to study human pituitary insufficiency is important, gaining insight into the aetiology of disease and if it implicates the hypothalamus or anterior pituitary. To this end, there is only one other example of their use in the literature, where Matsumoto et al, (2019), used OTX2-mutant hypothalamic-pituitary organoids to understand the aetiology of pituitary hypoplasia driven by OTX2 mutations. This being the second example of using gene editing in human iPSC-derived hypothalamic-pituitary organoids, these studies have improved the efficiency of differentiation previously published by Suga et al. (2011) for ES cells, and Matsumoto et al. (2019) for iPS cells. In addition, it has solidified that this method is useful, especially when studying hypothalamic involvement in human pituitary anomalies, due to the concerted development of these two structures.

      The reviewers recognise the valuable insight provided into the mechanism of NFKB2 action during pituitary development and how this human organoid model might be one of the few or only approaches suitable to discover the aetiology of the pituitary phenotype.

      The reviewers agree that both the evidence provided from the organoid model, as well as the characterisation of the phenotype are incomplete. In particular, the strength of evidence would be improved by analysing additional independent clones for both NFKB2 as well as TBX19 gene-edited iPSCs. Additionally, analysis of NFKB2 expression both in vivo and in the organoids, as well as analysis for the NFKB2 targets put forward, would be a lot more informative to help understand this phenotype.

      The main recommendations discussed are summarised here and the reviewers have elaborated on these points in their individual reviews:

      The two ACTH-deficient organoid models use a single genetically engineered clone, and the potential for variability amongst clones, unrelated to the mutation, makes the conclusions less compelling. Two independent homozygous clones were obtained for NFKB2 but only one was used, so analysis of the second clone would strengthen the findings. A heterozygous clone was also obtained and given all NFKB2 patients are heterozygous for what appears to be dominant negative mutations, the heterozygous clone ought to be analysed. Analyses of these additional clones would give more strength to the conclusions, showing reproducibility and the effect of mutant gene dosage. The reviewers provide excellent suggestions for alternative controls for the engineered iPSC lines in their specific comments.

      The effect of TBX19 mutation on early pituitary fate markers LHX3 and PITX1 is surprising given the phenotype of the knockout mice and patients with mutations. If the developmental profile of essential transcription factors does not recapitulate the in vivo expression in this well-characterised mutant, this brings the organoid model into question. Thus, analysis of a further clone for the study of mutant TBX19 would be crucial. The validity of this control affects the interpretations relying on expression differences in the NFKB2-mutant organoids.

      The study has implicated NFKB2 in pituitary development, but more insight is needed to fully understand disease pathogenesis. The authors presented potential downstream targets of NFKB2 action, including transcription factors and key signalling pathway components; further analyses of NFKB2 expression and experiments investigating the relevant factors in more detail will help elucidate this point.

      Discerning between the hypothalamus and pituitary tissue is fundamental to interpreting phenotypes: (i) To pinpoint the primary tissue affected by NFKB2 deficiency, staining for NFKB2 during development in vivo will determine if this is expressed both in the developing hypothalamus and anterior pituitary gland or only one of these tissues. (ii) Using markers of hypothalamus and pituitary to discern between these two tissues in organoids, will provide a lot of valuable information where expression changes are presented. This would help discern the contribution of the developing hypothalamus as this is still unclear and has not been discussed. Knowing which tissue compartments NFKB2 is expressed in the organoids would also be of great value.

      The organoids provide an opportunity to characterise the effects of NFKB2 on other pituitary cell types, since the bulk RNAseq presents intriguing changes indicating that not only corticotrophs may be affected. This may be of relevance to patients, which can have additional pituitary hormone deficiencies. If NFKB2 is expressed in the pituitary, demonstrating expression in the different cell types in vivo as well as in the organoids would help interpret the phenotype. Is this expressed only in corticotrophs/corticotroph precursors, or in additional endocrine cells?

      We agree with these considerations and the summary and thank the Editors for their assessment. Although we share the view that reproducing the experiments on a second clone would be a useful confirmatory step, we have not been able to reach this goal within a reasonable time frame for the reason mentioned above (unavailability of the main research engineer knowledgeable in the challenging methods involved in organoid differentiation) and because of the long turnaround time of this kind of experiment (3 months for the whole differentiation starting from hiPSCs). We therefore decided to publish on a single clone while still aiming to reproduce our results on at least a second one, and we will hopefully be able to provide these additional data in a subsequent revised version. We now acknowledge this limitation in the final part of the Discussion.

      We have analysed more samples by RT-qPCR and further analysed RNA-seq data from NFKB2 KI organoids, which provided more insight into the different levels at which NFKB2 may play a role. Specifically, we now show the effect of the NFKB2 mutation on hypothalamic growth factors and pituitary progenitor differentiation (figure 10), different stages of corticotroph maturation (figure 11) and effects on PROP1/POU1F1-dependent lineages (figure 12). We compared our results with publicly available ChIP-seq data concerning p52 transcriptional targets (figure 13). We have now provided several additional figures derived from these analyses, including a synthetic figure summarizing the most relevant observed effects (Fig. 14).

      Reviewer #1 (Recommendations For The Authors):

      In organoids, it is essential to stain for NFKB: is it the hypothalamus or the pituitary that expresses NFKB, and if the pituitary, is it the corticotroph itself or the surrounding cells? If immunostaining is not available, FISH or RNAscope can be used to look at expression.

      Figure 7 shows stronger expression of p100/p52 in pituitary progenitors, and some expression in the hypothalamic part of the organoid. Owing to the current lack of biological material and the length of the experimental procedure, we could not yet determine which differentiated cell types express p100/p52, but this is clearly something we will look at in further experiments.

      Regarding Figure 7, NFKB2 (D865G/D865G) shows no LHX3 expression already at day 48. It would be better to look at expression including PITX1 at an earlier time point to see at what point differentiation is impaired.

      RT-qPCR results show no statistically significant changes in PITX1 (Fc=0.58, p=0.25) or LHX3 (Fc = 0.15; p=0.22) expression at d27, although there was a tendency towards downregulation.

      Is it really just a species difference that NFKB2-deficient mice do not have abnormal pituitary function? This needs to be discussed in the manuscript.

      Nfkb2 Lym1/Lym1 mice and the NFKB2 KI model have different but functionally very similar mutations, as they both lead to abnormal processing of p100 and a strong reduction of p52 content. In mice, these mutations are more severe than the complete absence of the Nfkb2 gene product, and they have been called “super repressors”. It is therefore surprising that no pituitary phenotype has been observed in mice. In our opinion, this constitutes a strong argument in favour of an inter-species difference, at least for the pathogenicity of this type of mutation.

      This point is now addressed in the Discussion.

      Just looking at changes in gene expression by qPCR and bulk RNA-seq does not give enough information about localisation. We wish RNA-seq had at least been separated by FACS first. For example, FACS can separate the anterior pituitary and hypothalamus by EpCAM positivity/negativity (PMID: 35903276), so we would like to see gene expression in such separated samples.

      This is a pertinent suggestion. We are aware of these techniques and we hope we will be able to include them in future studies.

      For Figures 2 and 6, just looking at changes in gene expression by qPCR does not provide localisation information, so either (1) immunostaining for LHX3 and NKX2.1 should be shown in each aggregate as in FigS3, or (2) qPCR should be performed on the FACSed cells.

      PITX1, LHX3 (as confirmed by our immunofluorescence data) and HESX1 are only expressed in non-neural tissue. TBX19 could be expressed in the hypothalamic part of the organoid, but we observed very little immunostaining outside the outermost layers of organoids (i.e. pituitary tissue). The antibody we used to detect corticotrophs only recognizes ACTH, and therefore only marks pituitary cells.

      In addition, pathway and gene ontology analyses should be performed.

      Pathway and gene ontology analyses have been performed. However, as organoids consist of two different tissues, the analysis of over 4800 differentially expressed genes did not give very informative results, apart from an impairment of retinoic acid signalling that we are currently investigating.

      Reviewer #2 (Recommendations For The Authors):

      The differentiation of iPSC to organoids could be variable. The authors indicate that 200 organoids were analyzed for each line, and 3-10 organoids were analyzed per time point, genotype, and assay. Is it clear that 100% of the organoids differentiate to produce corticotropes? Please clarify.

      In our experiments, almost 90% of organoids give rise to non-neural ectoderm, as demonstrated by PITX1 expression. However, depending on experiments, only 60-70% of organoids give rise to pituitary progenitors (LHX3+) and subsequently to corticotropes. This has been clarified in the text.

      For TBX19, it seems surprising that there is an effect on PITX1 and LHX3 expression, since TBX19 expression is normally activated after these genes are expressed. An effect of TBX19 on EMT would also be surprising as the knockout mice do not have dysmorphology of the stem cell niche. The only evidence for an effect is the reduced IHC for E-cadherin. If this is an important point, the authors should examine other EMT markers such as Zeb2. The TBX19 knockout mice appear to form corticotropes based on the expression of NeuroD1, even though they lack TBX19 and POMC expression. It would be reassuring to see that NeuroD1 is normally expressed in the TBX19 mutant organoids.

      We agree that the effect of the TBX19 mutant on early pituitary progenitor development is rather puzzling. In our model, TBX19 is expressed throughout the whole experiment, although it is at very low levels in undifferentiated hiPSCs compared to peak expression (over 50-fold difference).

      During CRISPR-Cas9 gene editing, we obtained a clone with a homozygous one-base insertion at the cutting site, leading to a frameshift and a premature stop codon 48 bases downstream. This would result in an expected protein of 163 amino acids instead of 488, but with potentially still functional DNA-binding ability. This mutation had a similar effect on LHX3 and PITX1 as the TBX19 KI mutation, although it was even more severe. Our most likely explanation is that the two TBX19 mutants we generated have dominant negative effects. In contrast to the mouse, little is known about TBX19 expression in early human pituitary development, but scRNA-seq data on human embryonic pituitaries (Zhang et al.) show low expression in undifferentiated pituitary progenitors between 7 and 9 weeks of gestation. Therefore, early expression of these dominant negative proteins could perturb differentiation in the organoids. Future development of hiPSC lines with total absence of TBX19 should help clarify these questions.

      Apart from the lack of change in ZEB2 expression in TBX19 KI (Fc = 1.15; p = 0.35), we did not look further for changes in EMT markers in TBX19 KI. However, we added a more detailed analysis of EMT marker expression in NFKB2 KI based on RNA-seq results (see table 2).

      Due to lack of material, we could not confirm NEUROD1 expression by immunostaining. However, RT-qPCR showed there was no change in NEUROD1 expression in TBX19 KI (Fc = 0.81; p = 0.64).

      NFKB2 IHC was markedly reduced in NFKB2 D865G/D865G organoids. Based on previous experiments, the mutant protein should be expressed but not activated by proteolytic cleavage. It is possible that the antibody has a different affinity for the mutant protein and/or the uncleaved protein may be unstable. Can this be clarified? The mRNA for mutant NFKB2 appears unchanged in Table 1.

      This is puzzling indeed. We did not notice any change in NFKB2 from d27 to d105, and no significant change either between WT and NFKB2 KI. Although the antibody we used recognizes both p100 and p52, we cannot rule out the possibility that p100/p52 is degraded by pathways other than the proteasome. Another possibility is that p100 interactions with other proteins may decrease the accessibility of the epitope to the antibody.

      The RNA sequencing data from the NFKB2 organoids is intriguing. It suggests that the NFKB2 mutation may have a modest effect on Tbx19 transcription but not Neurod1. It also suggests there are hypothalamic effects, i.e. altered expression of hypothalamic markers in mutant organoids. Is NFKB2 expressed in the developing hypothalamus? Can normal NEUROD1 IHC be confirmed? It is also intriguing that there may be an effect on EMT. However, there seem to be some discrepancies in the direction of effect on these markers. Please clarify.

      This is related to the point just above. p100/p52 is described as a ubiquitously expressed protein. We think that it is expressed in the hypothalamic part of the organoids, but at a lower level than in pituitary progenitors.

      As mentioned before, we could not yet confirm NEUROD1 expression by immunostaining, but RT-qPCR clearly showed there was no change in NEUROD1 expression in TBX19 KI (Fc = 0.81; p = 0.64) or NFKB2 KI (Fc = 0.88; p = 0.5). However, we investigated other markers of different stages of corticotroph differentiation (see figure 11) and found that the later stages are most affected.

      Concerning the EMT, we also found changes in the expression of other markers that are shown in Table 2 and discussed further in the text.

      Cytokines have been proposed to play important roles in pituitary differentiation, i.e. IL6. Is there any evidence for an altered cytokine or chemokine expression in the NFKB2 organoids?

      We did not see any change in IL6 expression in NFKB2 KI (Fc = 2.34; p = 0.55), but RNA-seq shows a strong increase in IL6R (Fc = 8.89; p = 2.13e-09). At this point, however, the relevance of these observations remains elusive.

      Minor:

      Some patients with DAVID syndrome have pituitary hypoplasia. The authors measure organoid size and find no differences based on genotype. However, each organoid probably has a variable amount of tissue differentiated to pituitary and hypothalamic fates, therefore, the volume of the whole organoid may not be a good proxy for the amount of pituitary tissue.

      We are aware of this issue. However, for most pituitary genes measured by RT-qPCR (PITX1, LHX3, TBX19), the deltaCt values did not drastically vary for a given time point/genotype, suggesting a stable pituitary/hypothalamic ratio.

      Figure 9 shows whole transcriptome data for the NFKB2 organoids, and Table 1 lists the data for selected genes. There appears to be disagreement between the significance cut-offs used in the figure and the table. Please adjust.

      We removed the fold-change cut-offs to improve clarity.

      elife120868_0_supp_2945725_rxl2z4. "haft" appears several times, but it should be "half".

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript provides strong evidence that the molecular function of SLC35G1, an orphan human SLC transporter, is citrate export at the basolateral membrane of intestinal epithelial cells. Multiple lines of evidence, including radioactive transport experiments, immunohistochemical staining, gene expression analysis, and siRNA knockdown are combined to deduce a model of the physiological role of this transporter.

      Strengths:

      The experimental approaches are comprehensive, and together establish a strong model for the role of SLC35G1 in citrate uptake. The observation that chloride inhibits uptake suggests an interesting mechanism that exploits the difference in chloride concentration across the basolateral membrane.

      Weaknesses:

      Some aspects of the results would benefit from a more thorough discussion of the conclusions and/or model.

      For example, the authors find that SLC35G1 prefers the dianionic (singly protonated) form of citrate, and rationalize this finding by comparison with the substrate selectivity of the citrate importer NaDC1. However, this comparison has weaknesses when considering the physiological pH for SLC35G1 and NaDC1. NaDC1 binds citrate at a pH of ~5.4 (the pKa of citrate is 5.4, so there is a lot of dianionic citrate present under physiological circumstances). SLC35G1 binds citrate under pH conditions of ~7.5, where a very small amount of dianionic citrate is present. The data clearly show a pH dependence of transport, and the authors rule out proton coupling, but the discrepancy between the pH dependence and the physiological expectations should be addressed/commented on.

      Thank you for your insightful comment. Citrate exists mostly in its trianionic form under near neutral pH conditions in biological fluids, as you pointed out. Its dianionic form represents only a small portion (about 1/100) of total citrate due to the pKa. However, significant SLC35G1-specific uptake was observed under near neutral pH conditions (Figure 1G). Therefore, although SLC35G1-mediated citrate transport is less efficient under physiologically relevant near neutral pH conditions, it could still play a role particularly in the intestinal absorption process, in which the concentration gradient of dianionic citrate could be maintained by continuous supply by NaDC1-mediated apical uptake.
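The "about 1/100" estimate in the reply above can be reproduced with the Henderson-Hasselbalch relation. The following is a minimal sketch, assuming that the pKa of 5.4 quoted by the reviewer governs the dianion/trianion equilibrium and that more protonated citrate species can be neglected near neutral pH; the function names are illustrative and not taken from the manuscript.

```python
def dianion_to_trianion_ratio(ph: float, pka: float = 5.4) -> float:
    """Henderson-Hasselbalch: [HCit2-] / [Cit3-] = 10 ** (pKa - pH)."""
    return 10.0 ** (pka - ph)


def dianion_fraction(ph: float, pka: float = 5.4) -> float:
    """Fraction of citrate present as the dianion, considering only the
    dianion/trianion pair (assumption: other species are negligible)."""
    ratio = dianion_to_trianion_ratio(ph, pka)
    return ratio / (1.0 + ratio)


if __name__ == "__main__":
    for ph in (5.4, 5.5, 7.4):
        print(f"pH {ph}: dianion fraction = {dianion_fraction(ph):.4f}")
```

Under these assumptions the dianion accounts for half of the citrate at pH 5.4 and for roughly 1% at pH 7.4, which matches the order of magnitude given in the reply.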

      The rationale for the series of compounds tested in Figure 1F, which includes metabolites with carboxylate groups, a selection of drugs including anion channel inhibitors and statins, and bile acids, is not described. Moreover, the lessons drawn from this experiment are vague and should be expanded upon. It is not clear what, if anything, the compounds that reduce citrate uptake have in common.

      Thank you for highlighting the need for clarity regarding the compounds tested in Figure 1F. The tested compounds were TCA cycle intermediates (fumarate, α-ketoglutarate, malate, pyruvate, and succinate) as substrate candidate carboxylates analogous to citrate, diverse anionic compounds (BSP, DIDS, probenecid, pravastatin, and taurocholate) as those that might be substrates or inhibitors, and diverse cationic compounds (cimetidine, quinidine, and verapamil) as those that are least likely to interact with SLC35G1. Among them, certain anionic compounds significantly reduced SLC35G1-specific citrate uptake, suggesting that they may interact with SLC35G1. However, we could not identify any structural features commonly shared by these compounds, except that they have anionic moieties. We acknowledge that it requires further elaboration to clarify such structural features. We have revised the relevant section on p. 3 (line 25 - 32) to include these.

      The transporter is described as a facilitative transporter, but this is not established definitively. For example, another possibility could involve coupling citrate transport to another substrate, possibly even chloride ion.

      Thank you for your insightful comment regarding the nature of SLC35G1's transport mechanism. While we have described SLC35G1 as a facilitative transporter based on our current data, we acknowledge that this has not been definitively proven, as you pointed out, and we cannot exclude the possibility that its sensitivity to extracellular Cl- might imply its operation as a citrate/Cl- exchanger. To examine the possibility, we would need to manipulate the chloride ion gradient across the plasma membrane. Particularly, generating an outward Cl- gradient to see if it could enhance citrate uptake could be a potential strategy. However, current techniques do not allow us to effectively generate the Cl- gradient, thus preventing us from conclusively verifying this possibility. We recognize the importance of further investigating this aspect in future studies. Your suggestion highlights an important area for additional research to fully understand the transport mechanism of SLC35G1. We have additionally commented on this issue on p. 4 (line 1 – 3).

      Reviewer #2 (Public Review):

      Summary:

      The primary goal of this study was to identify the transport pathway that is responsible for the release of dietary citrate from enterocytes into blood across the basolateral membrane.

      Strengths:

      The transport pathway responsible for the entry of dietary citrate into enterocytes was already known, but the transporter responsible for the second step remained unidentified. The studies presented in this manuscript identify SLC35G1 as the most likely transporter that mediates the release of absorbed citrate from intestinal cells into the serosal side. This fills an important gap in our current knowledge of the transcellular absorption of dietary citrate. The exclusive localization of the transporter in the basolateral membrane of human intestinal cells and the human intestinal cell line Caco-2 and the inhibition of the transporter function by chloride support this conclusion.

      Weaknesses:

      (i) The substrate specificity experiments have been done with relatively low concentrations of potential competing substrates, considering the relatively low affinity of the transporter for citrate. Given that NaDC1 brings in not only citrate as a divalent anion but also other divalent anions such as succinate, it is possible that SLC35G1 is responsible for the release of not only citrate but also other dicarboxylates. But the substrate specificity studies show that the dicarboxylates tested did not compete with citrate, meaning that SLC35G1 is selective for citrate (2-), but this conclusion might be flawed because of the low concentration of the competing substrates used in the experiment.

      Thank you for your valuable comment on our substrate specificity experiments. As you pointed out, we cannot rule out the possibility that dicarboxylates might be recognized by SLC35G1 with low affinity as the tested concentration was relatively low. However, at the concentration of 200 μM, competing substrates with an affinity comparable to that of citrate could inhibit SLC35G1-specific citrate uptake by about 30%. Therefore, it is likely that the compounds that did not exhibit significant effect have no affinity or at least lower affinity than citrate to SLC35G1. Further studies should explore a broader range of concentrations for potential substrates including those with lower affinity. It would help clarify the substrate recognition characteristics of SLC35G1 and if it indeed has a unique preference for citrate over dicarboxylates. We have additionally mentioned that on p. 3, line 32 – 35.
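The "about 30%" figure in the reply above is consistent with simple competitive inhibition. The sketch below assumes Michaelis-Menten kinetics with a purely competitive inhibitor, a citrate Km of 0.5 mM (the value cited in the reviewer's recommendation), and a tracer citrate concentration of 1 μM; the tracer concentration and the function name are assumptions made for illustration, not details confirmed for this particular experiment.

```python
def fractional_inhibition(s_um: float, km_um: float,
                          i_um: float, ki_um: float) -> float:
    """Fraction by which a competitive inhibitor reduces uptake.

    v_i / v_0 = (Km + S) / (Km * (1 + I / Ki) + S)
    All concentrations are in micromolar.
    """
    v_ratio = (km_um + s_um) / (km_um * (1.0 + i_um / ki_um) + s_um)
    return 1.0 - v_ratio


if __name__ == "__main__":
    # Competitor at 200 uM with Ki equal to the citrate Km (500 uM).
    inhibition = fractional_inhibition(s_um=1.0, km_um=500.0,
                                       i_um=200.0, ki_um=500.0)
    print(f"Expected inhibition: {inhibition:.1%}")
```

With these inputs the expected reduction is a little under 30%. A competitor with a much larger Ki would produce a correspondingly smaller effect, which is why a lack of inhibition at 200 μM argues for lower affinity than citrate rather than for no interaction at all.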

      (ii) The authors have used MDCK cells for assessment of the transcellular transfer of citrate via SLC35G1, but it is not clear whether this cell line expresses NaDC1 in the apical membrane as the enterocytes do. Even though the authors expressed SLC35G1 ectopically in MDCK cells and showed that the transporter localizes to the basolateral membrane, the question as to how citrate actually enters the apical membrane for SLC35G1 in the other membrane to work remains unanswered.

      Thank you for highlighting this important aspect of our study. The mechanism of apical citrate entry in MDCKII cells is unknown, although NaDC1 or a similar transporter may be involved. However, this set of experiments has successfully demonstrated the basolateral localization of SLC35G1 and its operation for citrate efflux. Attempts to clarify the apical entry mechanism may need to be included in future studies for more detailed characterization of the model system using MDCKII cells. This would help in fully understanding the transcellular transport system for citrate. Investigation using Caco-2 cells or MDCKII cells double transfected with NaDC1 and SLC35G1 would also need to be included in future studies to gain more definitive insights into the transcellular transport mechanism for citrate in the intestine, delineating the suggested cooperative role of NaDC1 and SLC35G1. We would be grateful for your understanding of our handling of this issue.

      (iii) There is one other transporter that has already been identified for the efflux of citrate in some cell types in the literature (SLC62A1, PLoS Genetics; 10.1371/journal.pgen.1008884), but no mention of this transporter has been made in the current manuscript.

      Thank you for bringing up the relevance of SLC62A1, which has recently been identified as a citrate efflux transporter in some cell types (PLoS Genet, 16, e1008884, 2020). We have now included comments on this transporter in Introduction (p. 2).

      Reviewer #3 (Public Review):

      Summary:

      Mimura et al describe the discovery of the orphan transporter SLC35G1 as a citrate transporter in the small intestine. Using a combination of cellular transport assays, they show that SLC35G1 can mediate citrate transport in small intestinal cell lines. Furthermore, they investigate its expression and localization in both human tissue and cell lines. Limited evidence exists to date on both SLC35G1 and citrate uptake in the small intestine, therefore this study is an important contribution to both fields. However, the main claims by the authors are only partially supported by experimental evidence.

      Strengths:

      The authors convincingly show that SLC35G1 mediates uptake of citrate which is dependent on pH and chloride concentration. Putting their initial findings in a physiological context, they present human tissue expression data of SLC35G1. Their Transwell assay indicates that SLC35G1 is a citrate exporter at the basolateral membrane.

      Weaknesses:

      Further confirmation and clarification are required to claim that the SLC indeed exports citrate at the basolateral membrane as concluded by the authors. Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120 mM at the basolateral side). The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect. Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Thank you for highlighting these important points. We used the Cl--rich medium in transcellular transport studies, as stated in the relevant section in Materials and Methods (p. 6, line 2 – 5). The Cl- concentration (144 mM) was comparable to the physiological concentration in extracellular body fluids. To clarify that experimental condition, we have additionally noted that in the text (p. 4, line 9) and the legends of Figs. 1K and 1L. The results indicate that basolaterally localized SLC35G1 can mediate citrate export effectively under the Cl--rich extracellular condition. The transport mechanism regulated by Cl- is unclear, but it is difficult to further clarify the mechanism at this time. We recognize the importance of further investigating this aspect in future studies, including the possibility that SLC35G1 might be a citrate/Cl- exchanger, as pointed out by Reviewer #1 (3rd comment).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The figures are very tiny and difficult to see. The inset in Figure 1C is much too small to be readable. I suggest enlarging the panels.

      Thank you for your feedback. As advised, we have enlarged the panels to improve visibility.

      Line 74: "certain anionic compounds signficantly inhibited SLC35G1-specific citrate uptake, indicating they are also recognized by SLC35G1." This sentence should be reworded since the mechanism is not clear. The word "reduced" would be a better option than "inhibited." Are there other interpretations besides SLC35G1 binding to explain the observations?

      Thank you for your suggestion. We have reworded the sentence to improve clarity (p. 3, line 30). It may be possible to speculate that they interact with SLC35G1, but the mechanisms are not clear yet.

      The manuscript is vague about how the transporter was discovered. If a screen of orphan transporters was performed to identify a citrate transporter, this should be described.

      Thank you for pointing out the need for more details regarding the discovery of the transporter. We have added some detailed description at the beginning of Results and Discussion (p. 3).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for the authors:

      (1) For transcellular transport of citrate and the role of SLC35G1, it would be better to use Caco-2 cells cultured on Transwells because these cells express NaDC1 in the apical membrane and the authors have shown that SLC35G1 is expressed in the basolateral membrane in this cell line. The mechanism for the entry of citrate into MDCK cells used in the present manuscript is not known. If the authors prefer to use MDCK cells because of their superior use for polarization, they can use a double transfection (NaDC1 and SLC35G1) to differentially express the two transporters in the apical versus basolateral membranes and then use the cells for transcellular transport of citrate.

      Please refer to our reply to your second review comment.

      (2) The substrate specificity experiments should use concentrations higher than 0.2 mM for competing dicarboxylates because the Km for citrate is only 0.5 mM. It is likely that NaDC1 brings in citrate and other dicarboxylates into enterocytes and then SLC35G1 mediates the efflux of these metabolic intermediates into blood.

      Please refer to our reply to your first review comment.

      (3) One major aspect of the transport function of this newly discovered citrate efflux transporter that has not been explored is the role of membrane potential in the transport function. The transporter is not coupled to Na or K or even H; so then the transport of citrate via this transporter must be electrogenic. Of course, this would be perfect for the transporter to function in the efflux of citrate because of the inside-negative membrane potential, but the authors need to show that the transporter is electrogenic. This can be examined through Caco-2 cells and/or MDCK cells expressing SLC35G1 and examining the impact of changes in membrane potential (valinomycin and K) on the transport of citrate.

      Thank you for your suggestion. As shown in Figure 1D, the use of K-gluconate in place of Na-gluconate, which induces plasma membrane depolarization, had no impact on the specific uptake of citrate, suggesting that SLC35G1-mediated citrate transport is independent of membrane potential. We have additionally mentioned this on p. 3 (line 21 – 24).

      (4) The localization studies mention Na/K ATPase component as a basolateral membrane marker, but the text describes it as BCRP. This needs to be corrected.

      Thank you for pointing out the mistake. We have corrected that. The marker was ATP1A1.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120mM at basolateral side). Why was this chloride concentration not mimicked accordingly in the Transwell assay?

      (2) The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect.

      (3) Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Please refer to our reply to your review comments.

      Related to the localization of SLC35G1:

      (4) The polyclonal antibody against SLC35G1 should be validated to prove the specificity. This should be relatively straightforward given the authors have SLC35G1 knockdown cells.

      Thank you for your suggestion. To validate the specificity of the polyclonal antibody against SLC35G1, we prepared HEK293 cells transiently expressing SLC35G1 and SLC35G1 tagged with a FLAG epitope at the C-terminus (SLC35G1-FLAG). In the immunostained images, whereas only SLC35G1-FLAG was stained with the anti-FLAG antibody, both SLC35G1 and SLC35G1-FLAG were stained with the anti-SLC35G1 antibody, indicating that the anti-SLC35G1 antibody can recognize SLC35G1. In addition, the localization patterns of SLC35G1-FLAG observed with both antibodies were consistent, indicating furthermore that the anti-SLC35G1 antibody can recognize SLC35G1 specifically. Based on all these, the specificity of the anti-SLC35G1 antibody was validated.

      Author response image 1.

      (5) To strengthen the data on the localization of SLC35G1, the cell lines should be co-stained with a plasma membrane marker as well, not just in tissue with ATP1A1. In polarized cells co-staining with apical and basolateral markers should be applied.

      SLC35G1 was indicated to be localized to the basolateral membrane geometrically in both polarized MDCKII and Caco-2 cells. This finding aligns with its basolateral localization indicated by its colocalization with ATP1A1 in the human small intestinal section. We consider these results sufficient to support the basolateral localization characteristics of SLC35G1.

      General points:

      (6) In the abstract the authors mention that they focus on highly expressed orphan transporters in the small intestine as candidates. However, no other candidates are mentioned or discussed in the study. Consequently, this should be rephrased.

      Thank you for the advice. Also taking into consideration the third recommendation point by Reviewer #1, we have added some detailed description at the beginning of Results and Discussion (p. 3).

      (7) As far as mentioned there is exactly one (other) publication on SLC35G1 (10.1073/pnas.1117231108). The authors should discuss this only publication with functional data on SLC35G1 in more detail. How do the authors integrate their findings with the existing knowledge? For example, why did the authors not investigate the impact of Ca2+ on SLC35G1 transport?

      Thank you for your suggestion. SLC35G1 was indicated to be mainly localized to the endoplasmic reticulum (ER) in the earlier study, in which SLC35G1 was tagged with GFP. One possibility is that SLC35G1 was misdirected to the ER because of the modification used in that study. We have additionally mentioned this possibility in the relevant section (p. 3, line 9 – 11). We have also revised a relevant sentence on p. 3 (line 5).

      With regard to another point, that GFP-tagged SLC35G1 was indicated to interact with STIM1, we additionally examined the effect of STIM1 on SLC35G1-mediated citrate uptake. As shown in the accompanying figure, coexpression of HA-tagged STIM1 did not affect the elevated citrate uptake induced by FLAG-tagged SLC35G1, indicating that STIM1 has no impact on the citrate transport function of SLC35G1 at the plasma membrane.

      Author response image 2.

      (A) Effect of the coexpression of HA-tagged STIM1 on [14C]citrate (1 μM) uptake by FLAG-tagged SLC35G1 transiently expressed in HEK293 cells. The uptake was evaluated for 10 min at pH 5.5 and 37°C. Data represent the mean ± SD of three biological replicates. Statistical differences were assessed using ANOVA followed by Dunnett’s test. *, p < 0.05 compared with the control (gray bar). (B) Western blot analysis was conducted by probing for the HA and FLAG tags, using the whole-cell lysate samples (10 µg protein aliquots) prepared from cells expressing HA-STIM1 and/or FLAG-SLC35G1. The blots of β-actin are shown for reference.

      (8) Generally, the introduction could provide more background.

      In response to your suggestion and also to the third review comment from Reviewer #2, we have now additionally included comments on SLC62A1, which has recently been reported as a citrate efflux transporter in some cell types, in Introduction.

      Minor points:

      (9) There is a typo in Figure 1D: manniotol instead of mannitol.

      Thank you for pointing that out. We have corrected the typo in Figure 1D.

      (10) Figure 1J: The resolution is low and the localization to the basolateral membrane is not conclusive based on this image. It seems rather localized at the whole membrane and intracellularly too.

      Thank you for your feedback. We have enhanced the resolution of the image and also enlarged it to improve clarity and make the basolateral membrane localization more discernible.

      (11) Figure 1K: Clarification is needed if the experiment was performed in the Transwell plate. Based on the results from the pH titration experiment, it is expected that there is no uptake at pH7.4. Therefore, this experiment does not seem to provide additional evidence or support the conclusions drawn related to cellular polarization.

      Please refer to our reply to your review comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Galanti et al. present an innovative new method to determine the susceptibility of large collections of plant accessions towards infestations by herbivores and pathogens. This work resulted from an unplanned infestation of plants in a greenhouse that was later harvested for sequencing. When these plants were extracted for DNA, associated pest DNA was extracted and sequenced as well. In a standard analysis, all sequencing reads would be mapped to the plant reference genome and unmapped reads, most likely originating from 'exogenous' pest DNA, would be discarded. Here, the authors argue that these unmapped reads contain valuable information and can be used to quantify plant infestation loads.

      For the present manuscript, the authors re-analysed a published dataset of 207 sequenced accessions of Thlaspi arvense. In this data, 0.5% of all reads had been classified as exogenous reads, while 99.5% mapped to the T. arvense reference genome. In a first step, however, the authors repeated read mapping against other reference genomes of potential pest species and found that a substantial fraction of 'ambiguous' reads mapped to at least one such species. Removing these reads improved the results of downstream GWAs, and is in itself an interesting tool that should be adopted more widely.

      The exogenous reads were primarily mapped to the genomes of the aphid Myzus persicae and the powdery mildew Erysiphe cruciferarum, from which the authors concluded that these were the likely pests present in their greenhouse. The authors then used these mapped pest read counts as an approximate measure of infestation load and performed GWA studies to identify plant gene regions across the T. arvense accessions that were associated with higher or lower pest read counts. In principle, this is an exciting approach that extracts useful information from 'junk' reads that are usually discarded. The results seem to support the authors' arguments, with relatively high heritabilities of pest read counts among T. arvense accessions, and GWA peaks close to known defence genes. Nonetheless, I do feel that more validation would be needed to support these conclusions, and given the radical novelty of this approach, additional experiments should be performed.

      A weakness of this study is that no actual aphid or mildew infestations of plants were recorded by the authors. They only mention that they anecdotally observed differences in infestations among accessions. As systematic quantification is no longer possible in retrospect, a smaller experiment could be performed in which a few accessions are infested with different quantities of aphids and/or mildew, followed by sequencing and pest read mapping. Such an approach would have the added benefit of allowing causally linking pest read count and pest load, thereby going beyond correlational associations.

      On a technical note, it seems feasible that mildew-infested leaves would have been selected for extraction, but it is harder to explain how aphid DNA would have been extracted alongside plant DNA. Presumably, all leaves would have been cleaned of live aphids before they were placed in extraction tubes. What then is the origin of aphid DNA in these samples? Are these trace amounts from aphid saliva and faeces/honeydew that were left on the leaves? If this is the case, I would expect there to be substantially more mildew DNA than aphid DNA, yet the absolute read counts for aphids are actually higher. Presumably read counts should only be used as a relative metric within a pest organism, but this unexpected result nonetheless raises questions about what these read counts reflect. Again, having experimental data from different aphid densities would make these results more convincing.

      We agree with the reviewer that additional aphid counts at the time of (or prior to) sequencing would have been ideal, but unfortunately we do not have these data. However, compared to such counts one strength of our sequencing-based approach is that it (presumably) integrates over longer periods than a single observation (e.g. if aphid abundances fluctuated, or winged aphids visited leaves only temporarily), and that it can detect pathogens even when invisible to our eyes, e.g. before a mildew colony becomes visible. Moreover, the key point of our study is that we can detect variation in pest abundance even in the absence of count data, which are really time consuming to collect.

      Conducting a new experiment, with controlled aphid infestations and continuous monitoring of their abundances, to test for correlation between pest abundance and the number of detected reads would require resequencing at least 30-50% of the collection for the results to be reliable. It would be a major experimental study in itself.

      Regarding the origin of aphid reads and the differences in read-counts between e.g. aphids and mildew, we believe this should not be of concern. DNA contamination is very common in all kinds of samples, but these reads are simply discarded in other studies. For example, although we collected and handled samples using gloves, MG-RAST detected human reads (Hominidae, S2 Table), possibly from handling the plants during transplanting or phenotyping 1-2 weeks before sequencing. Therefore, although we did remove aphids from the leaves at collection, aphid saliva or temporary presence on leaves must have been enough to leave detectable DNA traces. Additionally, the fact that the M. persicae load strongly correlates with the Buchnera aphidicola load (R² = 0.86, S6 Table) is reassuring. This obligate aphid symbiont is expected to be found in high amounts when sequencing aphids (see e.g. The International Aphid Genomics Consortium (2010)).

      The higher amount of aphid compared to mildew reads can probably be explained by aphids having expanded more than mildew at the time of plant collection, but most importantly, as already mentioned by the reviewer, the read-counts were meant to compare plant accessions rather than pests to one another. We are interested in relative, not absolute, values. Comparisons between pest species are a challenge because they can be influenced by several factors such as the availability of sequences in the MG-RAST database and the DNA extraction kit used, which is plant-specific and might bias towards certain groups. All these potential biases are not a concern when comparing different plants as they are equally subject to these biases.

      Reviewer #2 (Public Review):

      Summary:

      Galanti et al investigate genetic variation in plant pest resistance using non-target reads from whole-genome sequencing of 207 field lines spontaneously colonized by aphids and mildew. They calculate significant differences in pest DNA load between populations and lines, with heritability and correlation with climate and glucosinolate content. By genome-wide association analyses they identify known defence genes and novel regions potentially associated with pest load variation. Additionally, they suggest that differential methylation at transposons and some genes are involved in responses to pathogen pressure. The authors present in this study the potential of leveraging non-target sequencing reads to estimate plant biotic interactions, in general for GWAS, and provide insights into the defence mechanisms of Thlaspi arvense.

      Strengths:

      The authors ask an interesting and important question. Overall, I found the manuscript very well-written, with a very concrete and clear question, a well-structured experimental design, and clear differences from previous work. Their important results could potentially have implications and utility for many systems in phenotype-genotype prediction. In particular, I think the use of unmapped reads for GWAS is intriguing.

      Thank you for appreciating the originality and potential of our work.

      Weaknesses:

      I found that several of the conclusions are incomplete, not well supported by the data and/or some methods/results require additional details to be able to be judged. I believe these analyses and/or additional clarifications should be considered.

      Thank you very much for the supportive and constructive comments. They helped us to improve the manuscript.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The authors address an interesting and significant question, with a well-written manuscript that outlines a clear experimental design and distinguishes itself from previous work. However, some conclusions seem incomplete, lacking sufficient support from the data, or requiring additional methodological details for proper evaluation. Addressing these limitations through additional analyses or clarifications is recommended.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      - So far it is not clear to me how read numbers were normalised and quantified. For instance, Figure 1C only reports raw read numbers. In L149: "Prior to these analyses, to avoid biases caused by different sequencing depths, we corrected the read counts for the total numbers of deduplicated reads in each library and used the residuals as unbiased estimates of aphid, mildew and microbe loads". Was library size considered? Is the load the ratio of exogenous to non-exogenous reads? It is described in L461, but according to this, read counts were normalised and duplicated reads were removed. Why were read counts used, as opposed to total coverage or a count of bases? I cannot follow how variation in sequencing quality was considered. I can imagine that samples with higher sequencing depth will tend to have higher exogenous reads (just higher resolution and power to detect something in a lower proportion).

      Correcting for sequencing depth/library size is indeed very important. As the reviewer noted, we had explained how we did this in the methods section (L464), and we now also point to it in the results (L151):

      “Finally, we log transformed all read counts to approximate normality, and corrected for the total number of deduplicated reads by extracting residuals from the following linear model, log(read_count + 1) ∼ log(deduplicated_reads), which allowed us to quantify non-Thlaspi loads, correcting for the sequencing depth of each sample.”

      We showed the uncorrected read-counts only in Fig 1 to illustrate the orders of magnitude but used the corrected read-counts (also referred to as “loads”) for all subsequent analyses.

      In our view, theoretically, the best metric to correct the number of reads of a specific contaminant organism, is the total number of DNA fragments captured. Importantly, this is not well reflected by the total number of raw reads because of PCR and optical duplicates occurring during library prep and sequencing. For this reason we estimated the total number of reads captured multiplying total raw reads (after trimming) by the deduplication rate obtained from FastQC (methods L409-411). This metric reflects the amount of DNA fragments sampled better than the raw reads. Also it better reflects MG-RAST metrics as this software also deduplicates reads (Author response image 1 below). We also removed duplicates in our strict mappings to the M. persicae and B. aphidicola genomes.

      Coverage is not a good option for correction, because it is defined for a specific reference genome and many of the read-counts output by MG-RAST do not have a corresponding full assembly. Moreover, coverage and base counts are influenced by read size, which depends on library prep and is not included in the read-counts produced by MG-RAST.

      Author response image 1.

      Linear correlations between the number of MG-RAST reads post-QC and either total (left) or deduplicated (right) reads from fastq files of four full samples (not only unmapped reads).

      - The general assumption is that plants with different origins will have genetic variants or epigenetic variations associated with pathogen resistance, which can be tracked in a GWAS. However, plants from different regions will also have all variants associated with their origin (isolation by state as presented in the manuscript). In line 169: "Having established that our method most likely captured variation in plant resistance, we were interested in the ecological drivers of this variation". It is not clear to me how variation in plant resistance is differentiated from geographical variation (population structure). in L203: "We corrected for population structure using an IBS matrix and only tested variants with Minor Allele Frequency (MAF) > 0.04 (see Methods).". However, if resistant variants are correlated with population structure as shown in Table 1, how are they differentiated? In my opinion, the analyses are strongly limited by the correlation between phenotype and population structure.

      The association of any given trait with population structure is surely a very important aspect in GWAS studies and when looking at correlations of traits with environmental variables. If a trait is strongly associated with population structure, then disentangling variants associated with population structure vs. the ones associated with the trait can indeed be challenging, a good example being flowering time in A. thaliana (e.g. Brachi et al. 2013).

      In our case, although the pest and microbiome loads are associated with population structure to some extent, this association is not very strong. This can be observed for example in Fig. 1C, where there is no clear separation of samples from different regions. This means that we can correct for population structure (in both GWAS and correlations with climatic variables) without removing the signals of association. It is possible that other associations were missed if specific variants were indeed strongly associated with structure, but these would be unreliable within our dataset, so it is prudent to exclude them.

      - Similarly, in L212: "we still found significant GWA peaks for Erysiphales but not for other types of exogenous reads (excluding isolated, unreliable variants) (Figure 3A and S3 Figure)." In a GWA analysis, multiple variants will constitute an association peak (as shown for instance in main Figure 3A) only when the peak is accentuated by linkage disequilibrium around the region under selection (or around the variant explaining phenotypic variation in this case). However, in this case, I suspect there is a strong component of population structure (which still needs to be corroborated as suggested in the previous comment). But if variants are filtered by population structure, the only variants considered are those polymorphic within populations. In this case, I do not think clear peaks are expected, since most of the signal correlated with population has been removed. Under this scenario, I wonder how informative the analyses are.

      As mentioned above, the traits we analyse (aphid and mildew loads) are only partially associated with population structure. This is evident from Fig. 1C (see answer above) but also from the SNP-based heritability (Table 1, last column), which measures the proportion of variance explained by genetic population structure. Although some variance is explained (i.e. the reviewer is correct that there is some association), there is still plenty of leftover variance to be used for GWAS and correlations with environmental variables. The fact that we still find GWAS peaks confirms this, as otherwise they would be lost through the population structure correction included in our mixed model.

      - How were heritability values calculated? Were related individuals filtered out? I suggest adding more detail in both the inference of heritability and the kinship matrix (IBS matrix). Currently missing in methods (for heritability I only found the mention of an R package in the caption of Table 1).

      We somehow missed this in the methods and thank the reviewer for noticing. We have now added this paragraph to the chapter “Exogenous reads heritability and species identification”:

      “To test for variation between populations we used a general linear model with population as a predictor. To measure SNP-based heritability, i.e. the proportion of variance explained by kinship, we used the marker_h2() function from the R package heritability (Kruijer and Kooke 2019), which uses a genetic distance matrix as predictor to compute REML-estimates of the genetic and residual variance. We used the same IBS matrix as for GWAS and for the correlations with climatic variables.”

      We also added the reference to the R package heritability to the Table 1 caption.

      - Figure 2C. in line 188: "Although the baseline levels of benzyl glucosinolates were very low and probably sometimes below the detection level, plant lines where benzyl glucosinolate was detected had significantly lower aphid loads (over 70% less reads) in the glasshouse (Figure 3C)". It is not clear to me how to see these values in Figure 2C. From the boxplot, the difference in aphid loads between detected and not detected benzyl seems significantly lower. From the boxplot distribution is not clear how this difference is statistically significant. It rather seems like a sampling bias (a lot of non-detected vs low detected values). Is the difference still significant when random subsampling of groups is considered?

      Here the “70% less reads” refers to the uncorrected read-counts directly (difference in means between samples where benzyl-GS were detected vs. not). We agree with the reviewer that this is confusing when referred to figure 2C which depicts the corrected M. persicae load (residuals). We therefore removed that information.

      Regarding the significance of the difference, we re-calculated the p value with Welch's t-test, which accounts for unequal variances, and with a bootstrap t-test. Both tests still found a significant difference. We now report the p value of Welch’s t-test.
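      For readers unfamiliar with the two tests, the sketch below computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and a bootstrap p value obtained by resampling each group after shifting both to a common mean. This is one common way to build a bootstrap t-test and is only an assumption about the general approach, not the authors' implementation; the load values are invented for illustration.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def bootstrap_t_pvalue(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap p value for a difference in means (unequal variances)."""
    rng = random.Random(seed)
    t_obs, _ = welch_t(a, b)
    # Impose the null hypothesis by shifting both groups to the pooled mean.
    grand = statistics.fmean(list(a) + list(b))
    a0 = [x - statistics.fmean(a) + grand for x in a]
    b0 = [x - statistics.fmean(b) + grand for x in b]
    extreme = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))
        rb = rng.choices(b0, k=len(b0))
        if statistics.variance(ra) == 0 and statistics.variance(rb) == 0:
            continue  # degenerate resample: t is undefined, count as not extreme
        t_star, _ = welch_t(ra, rb)
        if abs(t_star) >= abs(t_obs):
            extreme += 1
    return (extreme + 1) / (n_boot + 1)


# Hypothetical corrected aphid loads (residuals) for the two groups.
detected = [-1.2, -0.8, -1.5, -0.3, -0.9, -1.1]
not_detected = [0.4, 1.3, -0.2, 0.9, 2.1, 0.1, 1.6, -0.5, 0.7, 1.0]
t_stat, dof = welch_t(detected, not_detected)
p_boot = bootstrap_t_pvalue(detected, not_detected)
```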

      - I think additional information regarding the read statistics needs to be improved. At the moment some sections are difficult to follow. I found this information mainly in Supplementary Table 1. I could not follow the difference in the manuscript and supplementary materials between read (read count), fragment, ambiguous fragments, target fragments, etc. I didn't find information regarding mean coverage per sample and relative plant vs parasite coverage. This lack of clarity led me to some confusion. For instance, in L207: "We suspected that this might be because some non-Thlaspi reads were very similar to these highly conserved regions and, by mapping there, generated false variants only in samples containing many non-Thlaspi reads". I find it difficult to follow how non-Thlaspi reads will interfere with genotyping. I think the fact that the large pick is lost after filtering reads is already quite insightful. However, in principle I would expect the relative coverage between non-Thlaspi:Thlaspi reads to be rather low in all cases. I would say below 1%. Thus, genotyping should be relatively accurate for the plant variants for the most part. In particular, considering genotyping was done with GATK, where low-frequency variants (relative coverage) should normally be called reference allele for the most part.

      We agree with the reviewer that some clarification over these points is necessary! We modified Supplementary Table 1 to include coverage information for all samples before and after removal of ambiguous reads and explained thoroughly how each value in the table was obtained. Regarding reads and fragments, we define each fragment as having two reads (R1 and R2). The classification into Target, Ambiguous and Unmapped reads was based on fragments, so we used that term in the table, but referring to reads has the same meaning in this context as for example an unmapped read is a read whose fragment was classified as unmapped.

      We did not include the pest coverage specifically, because this cannot be calculated for any of the read counts obtained with MG-RAST as this tool is mapping to online databases where genome size is not necessarily known. What is more meaningful instead are the read counts, which are in Supplementary tables 2 and 6. Importantly as mentioned in other answers, if different taxa are differently represented in the databases this does not affect the comparison of read counts across different samples, but only the comparison of different taxa which was not used for any further analyses.

      Regarding the ambiguous reads causing unreliable variants, these occur only in very few regions of the Thlaspi genome that are highly conserved in evolution or of very low complexity. In these regions, reads generated from either plant or, for instance, aphid DNA can map, but the ones from aphid might contain variants when mapping to the Thlaspi reference genome (L207 and L300). The reviewer is right that there is only a very small difference in average coverage when removing those ambiguous reads (~1X, S1 Table), but that is not true for those few regions where coverage changes massively when removing ambiguous reads, as shown on the right-side Y axes of S2 Figure. These unreliable variants are therefore not low-frequency and are not removed by GATK.

      - L215. I am not very convinced by the enrichment analyses, justified with a reference (52). For instance, how many of the predicted peaks are not close to resistance genes? How was the randomisation done? At the moment, the manuscript reads rather anecdotally by describing only those peaks that effectively are "close" to resistance genes. For instance, if random windows (let's say 20kb windows) are sampled along the genome, how often are there resistance genes in those random windows, and how does the random sampling compare with the observed peaks (windows)?

      Enrichment is by definition an increase in the proportion of true positives (observed frequency: proportion of significant SNPs located close to a priori candidate genes) compared to the background frequency (proportion of all SNPs located close to a priori candidate genes). So the background likelihood of SNPs to fall close to a priori candidate genes (i.e. the occurrence of a priori candidate genes in randomly sampled windows, as suggested by the reviewer) is already taken into account as the background frequency. We now explain more extensively how enrichment is calculated in the relevant methods section (L545-549), but it is an extensively used method, established in a large body of literature, so it can be found in many papers (e.g. Atwell et al. 2010, Brachi et al. 2010, Kawakatsu et al. 2016, Kerdaffrec et al. 2017, Sasaki et al. 2015, 2019, 2022, Galanti et al. 2022, Contreras-Garrido et al. 2024).

      Although we had already calculated an upper bound for the FDR based on the a priori candidates, as in previous literature, we now further calculated the significance of the enrichment for the Bonferroni-corrected -log(p) threshold for Erysiphales. Calculating significance requires adopting a genome rotation scheme that preserves the LD structure of the data, as described in the previously mentioned literature (eg. Kawakatsu et al. 2016, Sasaki et al. 2022). Briefly, we calculated a null distribution of enrichments by randomly rotating the p values and a priori candidate status of the genetic variants within each chromosome, for 10 million permutations. We then assessed significance by comparing the observed enrichment to the null distribution. We found that the enrichment at the Bonferroni corrected -log(p) threshold is indeed significant for Erysiphales (p = 0.016). We added this to the relevant methods section and the code to the github page.
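      The enrichment ratio and the rotation-based null distribution described in the two paragraphs above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code on the authors' github page: it thresholds on raw p values rather than -log(p), rotates the candidate labels relative to the p values (equivalent to rotating one against the other), and uses 1,000 rotations instead of 10 million. All names and toy values are hypothetical.

```python
import random


def enrichment(pvalues, is_candidate, threshold):
    """Ratio of candidate frequency among significant SNPs to background frequency."""
    sig = [c for p, c in zip(pvalues, is_candidate) if p <= threshold]
    if not sig:
        return float("nan")
    observed = sum(sig) / len(sig)
    background = sum(is_candidate) / len(is_candidate)
    return observed / background


def rotate(values, offset):
    """Circularly shift a list, keeping neighbours (and thus LD blocks) adjacent."""
    return values[offset:] + values[:offset]


def rotation_pvalue(chromosomes, threshold, n_rot=1000, seed=0):
    """chromosomes: list of (pvalues, is_candidate) pairs, SNPs ordered by position."""
    rng = random.Random(seed)
    all_p = [p for pvals, _ in chromosomes for p in pvals]
    all_c = [c for _, cand in chromosomes for c in cand]
    observed = enrichment(all_p, all_c, threshold)
    hits = 0
    for _ in range(n_rot):
        rotated = []
        for _, cand in chromosomes:
            # Independent random rotation within each chromosome.
            rotated += rotate(cand, rng.randrange(len(cand)))
        if enrichment(all_p, rotated, threshold) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_rot + 1)


# Toy data: two chromosomes, True marks SNPs near an a priori candidate gene.
chr1 = ([0.5, 0.3, 0.8, 1e-6, 1e-7, 0.2, 0.9, 0.4, 0.6, 0.7],
        [False, False, False, True, True, True, False, False, False, False])
chr2 = ([0.9, 0.1, 0.5, 0.7, 0.3, 0.6, 0.8, 0.2],
        [False, False, True, False, False, False, False, False])
obs_enrichment, p_rot = rotation_pvalue([chr1, chr2], threshold=1e-5)
```

      Rotating within chromosomes, instead of shuffling SNPs freely, keeps neighbouring variants together, which is why this scheme preserves the LD structure of the data.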

      In addition, many other genes very close (few kb max) to significant SNPs were not annotated with the “defense response” GO term but still had functions relatable to it. Some examples are CAR8, involved in ABA signalling, PBL7 in stomata closure and SRF3 in cell wall building and stress response  (Fig 3D). This means that our enrichment is actually most likely underestimated compared to if we had a more complete functional annotation.

      - L247. Additional information is needed regarding sampling. It is not clear to me why methylation analyses are restricted to 20 samples, contrary to whole genome analyses.

      The sampling is best described in the original paper (on natural DNA methylation variation; Galanti et al. 2022), although the most important parts are repeated in the first chapter of the methods.

      Regarding the methylation analyses, they are not restricted to 20 samples. Only the DMR calling was restricted to the 20 vs. 20 samples with the most divergent values (of pest loads) to identify regions of variation. This analysis was used to subset the genome to potential regions associated with pest presence rather than thoroughly testing actual methylation variants associated with pest presence. The latter was done in the second step, EWAS, which was based on the whole dataset with the exclusion of samples with a high non-conversion rate. This left 188 samples for EWAS. We added this number in the new manuscript (L251 and L571).

      To clarify, we made a few additions to the results (L250) and methods (last two subchapters) sections, where we explain the above.
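      The first step described above, picking the 20 samples with the lowest and the 20 with the highest loads for DMR calling, amounts to a simple ranking. A minimal sketch, with hypothetical sample names and loads:

```python
def extreme_groups(loads, k=20):
    """Return the k sample IDs with the lowest and the k with the highest load."""
    ranked = sorted(loads, key=loads.get)
    if len(ranked) < 2 * k:
        raise ValueError("need at least 2 * k samples for non-overlapping groups")
    return ranked[:k], ranked[-k:]


# Hypothetical corrected loads for 50 samples.
sample_loads = {f"sample_{i:03d}": float((i * 37) % 50) for i in range(50)}
low_group, high_group = extreme_groups(sample_loads, k=20)
```

      The remaining samples are not discarded: in the workflow described above they re-enter in the second step, the EWAS on the full dataset.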

      - No clear association with TEs: in L364: "Erysiphales load was associated with hypomethylated Copia TEs upstream of MAPKKK20, a gene involved in ABA-mediated signaling and stomatal closure. Since stomatal closure is a known defense mechanism to block pathogen access (21), it is tempting to conclude that hypomethylation of the MAPKKK20 promoter might induce its overexpression and consequent stomatal closure, thereby preventing mildew access to the leaf blade. Overall, we found associations between pathogen load and TE methylation that could act both in cis (eg. Copia TE methylation in MAPKKK20 promoter) and in trans, possibly through transposon reactivation (eg. LINE, Helitron, and Ty3/Gypsi TEs isolated from genes)." I find the whole discussion related to transposable elements, first, rather anecdotical, and second very speculative. To claim: "Overall, we found associations between pathogen load and TE methylation", I believe a more detailed analysis is needed. For instance, how often there is an association? In general, there are some rather anecdotical examples, several of which are presented as association with pathogen load on the basis of being "in proximity" to a particular region/pick. The same regions contain multiple other genes and annotations, but the authors limit the discussion to the particular gene or TE concordant with the hypothesis. This is for both the discussion and results sections.

      Here we are referring to associations in a purely statistical sense. The fact that “Overall, we found associations between pathogen load and TE methylation” is simply a conclusion drawn from Fig. 4b, without implying any causality. Some methylation variants are statistically associated with the traits (aphid or mildew loads), and whether they are true positives or causal is of course more difficult to assess.

      Regarding the methylation variants associated with mildew load in proximity of MAPKKK20, those are the only two significant ones, located close to each other and close to many other variants that, although not significant, have low P-values (Author response image 2 below), so it is the most obvious association warranting further exploration. The reviewer is correct that there are other genes flanking the large DMR that covers the TEs (Fig. 4D), but the DMR is downstream of these genes, so less likely to affect their transcription.

      Author response image 2.

      Regarding all other associations found with M. persicae load, we stated that these are not really reliable due to a skewed P-value distribution (L269, S5B Fig), but we think that for future reference it is still worth reporting the closeby genes and TEs.

      We slightly changed the wording of the passage the reviewer is citing above to make it clearer that we are only offering potential explanations for the associations we observe with TE methylation, and that we by no means state that TE reactivation is surely what is happening.

      - One conclusion in the manuscript is that DMRs have been mostly the result of hypomethylation. This is shown for instance in supplementary Figure 4. However, no general statistic is shown of methylation distribution (not only restricted to DMRs). Was the ratio methylation over de-methylation proportional along the genome? Thus the finding in DMRs is out of the genome-wide distribution? Or on the contrary, the DMRs are just a random sampling of the global distribution. The same for different annotated regions. For instance, I would expect that in general coding regions would be less methylated (not restricted to DMRs).

      Complete and exhaustive analyses of the methylomes were already published in the original manuscript (Galanti et al 2022). However, the variation among these methylomes is complex and influenced by multiple factors including genetic background and environment of origin, and talking about these things would have been beyond the scope of our paper. In this paper, we just took advantage of the existing methylome information to identify the few genomic regions that are consistently differentially methylated between samples with extreme values of pest loads. As for the GWAS, the phenotypes are only partially associated with population structure, so the 20 samples with the lowest and the 20 with the highest pathogen loads are not e.g. all Swedish vs. all German but they are a mixture, which allowed us to correct for population structure running EWAS with a mixed model that includes a genetic distance matrix.

      In this study we called DMRs between two defined groups: samples with the lowest amounts of pathogen DNA (not-infected; the “control” group) vs. samples with the highest amounts of pathogens (infected or the “treatment” group), so we could define a directionality (“hyper” vs. “hypo” methylation). However, this is not the case for population DMRs called between many different combinations of populations. This is why the hyper- and hypomethylated regions found here cannot be compared to the genome-wide averages, which are influenced by other factors than the pathogens. Even with relaxed thresholds we indeed found very few DMRs associated to pathogen presence here.

      Specifically about coding regions, the reviewer is correct that they are less methylated, especially because T. arvense has largely lost gene body methylation (Nunn et al. 2021, Galanti et al. 2022), but this is unrelated and was discussed in the original publication (Galanti et al. 2022).

      Minor comments:

      - Figure 1B: it would be good to add also percentage values.

      As the figure is already tightly packed, we would rather keep it simple. The chart gives a good impression of the frequencies of the different kingdoms and of several relevant groups. Also, as explained in a previous answer, comparing different taxonomic groups could be imprecise (as opposed to comparing the same group between different samples), so exact percentages seem unnecessary. If needed, the exact percentages can still be calculated from S2 Table.

      - L159: It is not clear to me what "enemy variation" is referring to here.

      We are referring to variation in enemy densities (attack rates) in the field, that could potentially be carried over to the greenhouse to cause the patterns of infection we observed. We changed it to “variation in enemy densities” to make it more clear.

      - L259: "In accordance with previous studies (8,9), most DMRs were hypomethylated in the affected samples, indicating that genes needed for defense might be activated through demethylation". Not clear to me what "affected samples" is referring to. Samples with lower load?

      Affected samples have a higher load of pathogen reads. We changed it to “infested” to make it more clear.

      - L336. Figure should be Fig 3E.

      We fixed it, thanks for noticing.

      ADDITIONAL CHANGES

      We updated reference 43 to point to the published paper rather than the preprint.

      We corrected the phenotype names in S3 Fig, to make them consistent with the rest of the manuscript and increased font size on the axes to make it more readable.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      The contribution of individual residues is shown in Figure 3c, which highlights one of the strengths of this RBM implementation - it is interpretable in a physically meaningful way. However, there are several decisions here, the justification of which is not entirely clear.

      i) Some of the residues in Fig 3c are stated as "relevant" for aminoacylated PG production. But is this the only such hidden unit? Or are there others that are sparse, bimodal, and involve "relevant" AA?

      Thanks for bringing this important question to our attention. In fact, this was the only hidden unit involving the combination of positions 152 and 212. Although we don't have knowledge of all relevant amino acids for this catalytic process, the residues we uncovered were shown through experimental analysis to be critical for the catalytic function of two MprF variants. Since our protein of interest involves this function, any domain that did not contain these residues was excluded. We can't rule out that the domains we excluded from further analysis could be performing similar catalytic functions, but we found it unlikely considering that the amino acids found in the negative portion of the weight were chemically unlikely to form a complex with the amino acid lysine. We have clarified in the text that this selection is probably a subset of all important amino acids; however, this selection provided predictive power.

      ii) In order to filter the sequences for the second stage, only those that produce an activation over +2.0 in this particular hidden unit were taken. How was this choice made?

      The +2.0 was chosen as it ensured that the bimodal distribution was split into two distinct distributions.

      iii) How many sequences are in the set before and after this filtering? On the basis of the strength of the results that follow I expect that there are good reasons for these choices, but they should be more carefully discussed.  

      We started with 11,507 sequences and after filtering we had 7,890 to train our model with.  We think this number still maintains robust statistics. This is noted in the Dataset acquisition and pre-processing section of the Methods section.
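      The filtering step discussed in (ii) and (iii) keeps only sequences whose activation on the selected hidden unit exceeds +2.0. A minimal sketch, assuming the activations have already been computed; the sequence identifiers and values are hypothetical:

```python
def filter_by_activation(activations, threshold=2.0):
    """Keep sequence IDs whose hidden-unit activation is strictly above the threshold."""
    return [seq_id for seq_id, value in activations.items() if value > threshold]


# Hypothetical activations of four sequences on the selected hidden unit.
unit_activations = {"seq_a": 3.1, "seq_b": -4.2, "seq_c": 2.0, "seq_d": 5.6}
kept = filter_by_activation(unit_activations)
```

      With a bimodal activation distribution, a threshold placed in the gap between the two modes assigns each sequence unambiguously to one of them.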

      iv) Do the authors think that this gets all of the aminoacylated PG enzymes? Or are some missed?

      This is an interesting question that prompted us to do further analysis. We have added a new supplemental figure providing more details to this question. Based on the Uniprot derived annotations and the Pfam domain-based analysis of these sequences, the large majority of sequences that were excluded were proteins which included the LPG_synthase_C domain but not the transmembrane flippase domain required by the MprF class of enzymes, and were instead accompanied by different domains which  seem less relevant to our enzyme of interest.  It is true though, and related to question (i), that variants which might retain the functionality despite losing experimentally determined key catalytic residues could have been excluded by this method, but such sequences could still be reasonably excluded due to their dissimilarity with MprF from Streptococcus agalactiae.

      However, some similar criticisms from the last point occur here as well, namely the selection of which weights should be used to classify the enzymes' function. Again the approach is to identify hidden unit activations that are sparse (with respect to the input sequence), have a high overall magnitude, and "involve residues which could be plausibly linked to the lipid binding specificity."

      (i) Two hidden units are identified as useful for classification, but how many candidates are there that pass the first two criteria? Indeed, how many hidden units are there?

      We note in the Model training section of the methods that our final model had 300 hidden units in total. As to the first part of your question, rather than systematically testing the predictive power of all other hidden units for this task, we decided to use the weights that we did because of their connection to a proposed lipid binding pocket found through Autodocking experiments. While another weight might provide predictive power, it might lack this critical secondary information. Moreover, the direction of our research necessitated finding weights which first satisfied our lipid-binding pocket plausibility before using these weights to propose MprF variants to test for our novel functionality. Given the limited information we had early in the research process, to go in reverse would have provided too many options for experimental testing with reduced mechanistic justification. We included a brief explanation of our rationale in the section “Restricted Boltzmann Machines can provide sensitive, rational guidance for sequence classification” in the updated manuscript.

      ii) The criterion "involve residues which could be plausibly linked to the lipid binding specificity" is again vague. Do all of the other candidate hidden units *not* involve significant contributions from substrate-binding residues? Maybe one of the other units does a better job of discriminating substrate specificity. (As indicated in Figure 8, there are examples of enzymes that confound the proposed classification.) Why combine the activations of two units for the classification, instead of 1 or 3 or...?

      In fact, it is true that the other hidden units do not involve significant contribution to substrate-binding residues, and we will clarify this. The weights found through this RBM methodology are biased to be probabilistically independent, meaning that the residues and amino acids implicated by each weight are not shared among the other weights through the design of the model. We will update the Model Weight selection section to clarify that the weights we chose had more significantly weighted residues overlapping with the residues near the lipid-binding region than the other weights we checked. We combined these two because they were the only ones which had both overlap with these residues and predictive power of lipid activity with the few sequences we had detailed knowledge of at the time of decision (Figure 5b).

      The Model Weight section reads as follows:

      “Weights were chosen which involved sequence coordinates implicated in our function of interest. Specifically, locations identified through Autodock (Hebecker et al., 2015) where the lipid was likely to interact, and a small radius around this region to select a small set of coordinates. We chose the only weights which had both overlap with multiple residues in this chosen radius and predictive power (separation) for the three examples we had to start with.”
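      One way to make the overlap criterion in the quoted Model Weight section concrete is to count how many of a hidden unit's most strongly weighted positions fall inside the set of pocket residues. The sketch below ranks positions by the L1 norm of their weights; the ranking rule, the `top_n` cutoff, and all positions and values are illustrative assumptions rather than the authors' procedure.

```python
def pocket_overlap(unit_weights, pocket_positions, top_n=10):
    """Count the top_n highest-L1-norm positions that lie in the binding pocket.

    unit_weights maps a sequence position to {amino_acid: weight} for one hidden unit.
    """
    norms = {pos: sum(abs(w) for w in aa_weights.values())
             for pos, aa_weights in unit_weights.items()}
    top = sorted(norms, key=norms.get, reverse=True)[:top_n]
    return len(set(top) & set(pocket_positions))


# Hypothetical weights for two hidden units and a hypothetical pocket.
unit_a = {10: {"K": 2.0, "E": -1.5}, 11: {"R": 1.0}, 50: {"A": 0.1}}
unit_b = {50: {"L": 2.5}, 60: {"V": 1.5}, 10: {"K": 0.05}}
pocket = {10, 11, 12}
overlap_a = pocket_overlap(unit_a, pocket, top_n=2)
overlap_b = pocket_overlap(unit_b, pocket, top_n=2)
```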

      Author Recommendations:

      The manuscript will likely be read by many membrane biologists/biochemists, and they might like to better understand how the RBM might be useful in their own approach. Here are some suggestions along these lines. The overall goal is to explain the RBM in *plain English* - the mathematical description in Eqs 2-4 is not easily interpretable.

      (1a) Explain that the RBM is a two-layer structure, in which one layer is the "visible" elements of the input sequence, and the other is called "hidden units." Connections are only made between visible and hidden units, but all such connections are made.

      (1b) The strengths of these connections are called "weights", and are determined in a statistical way based on a large set of input sequences. Once parametrized, the RBM is capable of capturing correlations among many positions in an input sequence - a significant advantage over the DCA approach.

      We agree with this assessment, and have updated the section of the text where we introduce the RBM with a non-technical explanation of what this method is doing. It reads as:

      “The design of this RBM can be seen in Figure 4, where the model architecture is represented by purple dots and green triangles. The dots are the “visible” layer, which take in input sequences and encode them into the “hidden” layer, where each triangle represents a separate hidden unit. The lines connecting the visible and hidden layers show that each hidden unit can see all the visible units (the statistics are global), but they cannot see any of the other hidden units, meaning the hidden units are mutually independent. This global model with mutually independent hidden units (see also the marginal distribution form shown in Equation 3) has the following useful properties: higher-order couplings between...”

      (1c) Although strictly true that the DCA model is a Boltzmann machine, it's not a typical Boltzmann machine, because all of the units are visible. Typically a Boltzmann machine would also include hidden units, in order to increase its capacity/power. 

      We have clarified the relationship between DCA and Boltzmann machines, and this section now reads as:

      This class of models is closely related to another model termed the Boltzmann machine. The Boltzmann machine formulation is closely related to the Potts model from physics, which was successfully applied in biology to elucidate important residues in protein structure and function (Morcos et al., 2011), and another example being the careful tuning of enzyme specificity in bacterial two-component regulatory systems (Cheng et al., 2014; Jiang et al., 2021). The Boltzmann machine-like formulation from Morcos et al. (2011), termed Direct Coupling Analysis (DCA), stores patterns...

      (1d) Throughout, the authors refer to the activation of the hidden units as weights, but this is not a typical usage of this terminology. Connections between units are weights and have two subscripts. Given an input sequence, the sum over these weights for a given hidden unit is its activation (Eq. 1). I suggest aligning the description with the typical usage in order to make the presentation easier to follow. Hereafter I will refer to these hidden unit activations as simply activations. 

      We agree with you, the hidden units are a collection of edge weights. We have modified the terminology in the text and in our figures to consistently refer to the collections of weights as hidden units and refer to the hidden unit outputs given a sequence input as activations.
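      The distinction drawn in (1d) between weights and activations can be illustrated with a small sketch: each hidden unit holds one weight per (position, amino acid) pair, and its activation for a given sequence is the sum of the weights selected by that sequence. The data layout, the toy weights, and the four-residue sequences are hypothetical; any bias terms or further details of the manuscript's Eq. 1 are not reproduced here.

```python
def hidden_unit_activation(sequence, weights):
    """Sum the weights a sequence selects for one hidden unit.

    weights[i] maps an amino acid to its weight at position i; amino acids
    not listed at a position contribute 0.
    """
    return sum(weights[i].get(aa, 0.0) for i, aa in enumerate(sequence))


# Toy hidden unit over a 4-residue sequence, sparse in positions 0 and 2.
toy_weights = [{"K": 1.5, "A": -1.0}, {}, {"D": 1.2, "G": -0.8}, {}]
activation_pos = hidden_unit_activation("KLDV", toy_weights)
activation_neg = hidden_unit_activation("ALGV", toy_weights)
```

      A sparse hidden unit like this one is easy to interpret because only a few positions can move its activation, and the sign of the activation indicates which set of residues the sequence carries.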

      (1e) How many hidden units are there?

      The final model was trained with 300 hidden units.

      (2) It is redundant to say that lipids are both amphiphiles and hydrophobic...amphiphile already means hydrophobic plus hydrophilic. 

      This is true; we have edited the manuscript to reflect this.

      (3) What does this mean, and what's the point of this remark? "They [lipids] are relatively smaller than other complex biomolecules, such as proteins, thereby allowing a larger portion of their surface to interact with other macromolecules." 

      We have removed this sentence.

      Reviewer 2 (Author Recommendations):

      While the idea of filtering out a part of the sequence data obtained with BLAST makes sense per se, it would be nice if the authors could comment on the nature of the sequences corresponding to the left peak in Figure 3b. It is hypothesised in conclusion that these sequences could lack any catalytic function. Could the authors experimentally check that this is the case or provide further evidence for this hypothesis?

      Yes, in this revision we provide further evidence in a new Supplementary Figure S2. At the time, we performed domain analysis of the sequences we excluded; most of these sequences lacked the flippase domain associated with MprF function and were instead combined with different domains. On this basis we excluded them due to their lack of relevance to the MprF from Streptococcus agalactiae that we were interested in. Although there is a possibility that some relevant sequences were excluded, our assessment is that we gained specificity by reducing the set of sequences.

      A key step in the RBM-based approach is the identification of "meaningful" hidden units, i.e. whose values are related to biological function. In Methods, the authors explain how they selected these units based on the L1 norms of the weights and the region of interaction with the lipid. While these criteria are reasonable, I wonder whether they are too stringent. In particular, one could think that regions in the proteins not in direct contact with the lipid could also be important for binding. It is known for instance that the length of loops can affect flexibility and help regulate activity in some catalytic enzymes. So my question is: if one relaxes the criterion about the coordinates of large weight values, what happens? Are other potentially interesting hidden units identified?

      We completely agree that other regions of the protein are likely involved in determining enzyme specificity, and that focusing solely on regions that interact with the lipid may miss important contributions to the catalytic function; we hypothesize that the flippase domain itself and its interaction with the catalytic domain are involved, especially considering the concerted mechanism by which they must operate. We are currently investigating these hypotheses, and they will be the subject of future work. As an initial step, we present this current work with restricted information that led to concrete predictions. We focused on the lipid binding pocket because it was one of just a few pieces of information we had from the start, but as the reviewer suggests, we plan to follow up our research to try to identify other relevant hidden units and domains.
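
      The selection criteria discussed here, and the effect of relaxing them, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the manuscript: hidden units are ranked by the L1 norm of their weights, and a unit is kept only if a chosen fraction of that norm falls on positions in the lipid binding pocket. The function names, `top_k`, and `pocket_fraction` are hypothetical.

```python
def l1_norm(unit):
    """Sum of absolute weights of one hidden unit (unit[i][a] layout)."""
    return sum(abs(w) for position in unit for w in position)


def select_hidden_units(weights, pocket_positions, top_k=10, pocket_fraction=0.5):
    """Keep high-norm hidden units whose weight mass sits in the pocket.

    pocket_positions: indices of sequence positions assumed to contact the lipid.
    pocket_fraction:  minimum share of a unit's L1 norm that must fall on
                      those positions; 0.0 disables the positional criterion.
    """
    ranked = sorted(range(len(weights)),
                    key=lambda mu: l1_norm(weights[mu]),
                    reverse=True)
    selected = []
    for mu in ranked[:top_k]:
        total = l1_norm(weights[mu])
        if total == 0.0:
            continue  # a unit with no weight carries no information
        in_pocket = sum(abs(w) for i in pocket_positions for w in weights[mu][i])
        if in_pocket / total >= pocket_fraction:
            selected.append(mu)
    return selected


# Toy weights: 3 hidden units, 2 positions, 2 symbols per position.
toy_weights = [
    [[1.0, 0.0], [0.0, 0.0]],  # unit 0: all mass at position 0
    [[0.0, 0.0], [2.0, 0.0]],  # unit 1: all mass at position 1
    [[0.3, 0.0], [0.1, 0.0]],  # unit 2: mostly position 0
]
strict = select_hidden_units(toy_weights, pocket_positions=[0], top_k=3)
relaxed = select_hidden_units(toy_weights, pocket_positions=[0], top_k=3,
                              pocket_fraction=0.0)
```

      Setting `pocket_fraction` to 0.0 corresponds to the relaxation the reviewer asks about: units with large weights outside the pocket, such as unit 1 in the toy example, are then retained.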

      From a purely machine-learning point of view, it would be good to see more about cross-validation of the model. More precisely, could the authors show the log-likelihood of test set data compared to the one of training sequence data?

      We agree this is an important piece of information. We will update our methods section with this information. We performed a parameter sweep to search for the parameters we used in our final model, and in that testing, with a random 80/20 training/test split, we had a training log probability loss of -0.91 and a test loss of -0.98. However, for our final model we used all available data and did not perform a split; the final result did not change dramatically by including the additional data, and the weight structure and composition were consistent with the results presented in the paper.
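
      The comparison described above can be sketched as a random 80/20 split followed by averaging a per-sequence score on each subset. This is a minimal sketch: `score_fn` is a hypothetical stand-in for whatever per-sequence log probability the trained model reports, and the split procedure shown is an assumption rather than the code used for the parameter sweep.

```python
import random


def train_test_split(sequences, test_fraction=0.2, seed=0):
    """Shuffle the sequences and hold out a fraction of them as a test set."""
    rng = random.Random(seed)
    shuffled = list(sequences)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def mean_score(sequences, score_fn):
    """Average a per-sequence score (e.g. a log probability) over a set."""
    return sum(score_fn(s) for s in sequences) / len(sequences)


# Toy data standing in for an alignment of 100 sequences.
toy_sequences = [f"seq{i}" for i in range(100)]
train_set, test_set = train_test_split(toy_sequences)
```

      Similar mean scores on `train_set` and `test_set` would indicate limited overfitting, which is the interpretation given to the reported values of -0.91 and -0.98.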

      Reviewer 3 (Public Review):

      In many of the analyzed strains, the presence of the lipid species Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG is correlated to the presence of the MprF enzyme(s), but one should keep in mind that a multitude of other membrane proteins are present that in theory could be involved in the synthesis as well. Therefore, there is no direct evidence that the MprF enzymes are linked to the synthesis of these lipid species. Although, it is unlikely that other enzymes are involved, this weakens the connection between the observed lipids and the type of MprF. 

      While there are a number of proteins found on the membrane that could play a role, we have specifically used a background strain that has a transposon in mprF that makes the bacteria incapable of synthesizing Lys-lipids (Figure 7B) unless complemented back with a functional MprF (Figure 7D-E). This led us to conclude that MprF is responsible for Lys-lipid synthesis.

      Related to this, in a few cases MprF activity is tested, but the manuscript does not contain any information on protein expression levels. Heterologous expression of membrane proteins is in general challenging and due to various reasons, proteins end up not being expressed at all. As an example, the absence of activity for the E. faecalis MprF1 and E. faecium MprF2 could very well be explained by the entire absence of the protein.

      The genes were expressed on the same plasmid to control for expression. While we did not run a western blot to examine expression levels, the plasmid backbone was used as a control for protein expression. Previous research supports E. faecalis MprF1 and E. faecium MprF2 not synthesizing Lys-lipids and instead most likely playing a different role in the cell membrane.

      The title is somewhat misleading. The sequence statistics and machine learning categorized the MprFs, but the identification of a novel lipid species was a coincidence while checking/confirming the categorization. 

      We believe the title is appropriate given that the identification of Enterococcus dispar was through computational methods that led to the discovery of Lys-Glc2-DAG. In other words, the categorization of potential organisms that produce lipids related to MprF was driven by the predictions of the computational method. We agree that the discovery was unexpected, but it would not have happened without the organisms suggested by the methodology presented here.

      Please read the manuscript one more time to correct textual errors.  

      The example of the role of LPS in delivering siRNA to targeted cancer cells is a bit farfetched as LPS is very different from the lipids that are being discussed here. I would rather focus on the role of Lysyl-lipids in antibiotic resistance in the introduction.  

      We included LPS here to explain that natural lipids/components of the bacterial cell membrane could be used for drug delivery systems. While it is true LPS is quite different from Lys-lipid compounds, our goal was to create an emphasis on how the bacterial domain is a rich untapped source of lipids that could be used in biotechnology.  In this way we wanted our statement to be more broadly about bacterial lipids and the importance of their continued study for diverse applications like pharmaceuticals.

      The MS identification of Lys-Glc2-DAG is convincing, especially in combination with the fragmentation data, but the ion counts suggest low abundance. The observation would be strengthened if the identification of Lysyl-Glc2-DAG with different acyl-chain configurations has been observed. This should be then mentioned or visualized in the manuscript. 

      We agree and have added an updated Figure 8A to demonstrate the presence of different acyl-chain configurations in Enterococcus dispar.  

      Further analysis of the Enterococcus strains shows the presence of the three lipids Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG, although the Lys-Glc-DAG is only detected in trace amounts. This raises questions on the specificity of the MprF for the substrate Glc-DAG. If the ratio of Glc2-DAG compared to Glc-DAG abundance is similar to the ratio of Lys-Glc2-DAG vs. Lys-Glc-DAG abundance, this would strengthen the observation that the enzyme has equal affinity. However, if there is a rather large amount of Glc-DAG but a small amount of Lys-Glc-DAG, the production of Lys-Glc-DAG might be a side-reaction.

      The reviewer raises a relevant point of discussion; however, a clear resolution will have to be part of future work, as we did not use spike-in controls when completing lipid extractions. Because of this, it is not possible for us to compare lipid levels across different samples. We now include a note clarifying this in the discussion section.

      The plotting of the MprF sequence variants using the chosen RBM weights reveals a rather complex distribution over the quadrants (Figure 8). It is rather unclear in Figure 8 why only 1 sequence is plotted for Enterococcus faecalis and faecium, while 2 different MprFs are present (and tested) for these two organisms. This should be clarified.  

      We agree this can be a source of confusion. We have clarified in the text that only the functional alleles were plotted in Figure 8 and that all Enterococcal alleles are plotted in Figure S3 regardless of function.

    1. Author response:

      Reviewer 1:

      The role of Fgf signaling in gliogenesis and Foxg1 in neurogenesis is well known. It is not clear if Fgf18 is a direct target of Foxg1.

      We agree with the reviewer: Fgf signaling is an established pro-gliogenic pathway (Duong et al 2019) and Foxg1 overexpression is known to promote neurogenesis in cultured neural stem cells (Branacaccio et al 2019). Our study links these two mechanisms, as the Reviewer has summarized: (a) we demonstrate that FOXG1 works via modulating Fgf signaling cell-autonomously within progenitors by regulating the levels of Fgfr3; (b) loss of Foxg1 in postmitotic neurons results in the upregulation of Fgf ligand expression (possibly via indirect mechanisms), and this non-cell-autonomously increases Fgf signaling in progenitors. Our study is entirely performed in vivo.

      Proposed revision: We will revise the manuscript to reflect that Fgf18 may be an indirect target of FOXG1 in postmitotic neurons.

      Reviewer 2:

      It wasn't clear to me why the authors chose postnatal day 14 to examine the effects of Foxg1 deletion at E15 - this is a long time window, giving time for indirect consequences of Foxg1 deletion to influence development and thereby potentially complicating the interpretation of findings. For example, the authors show that there is no increased proliferation of astrocytes or death of neurons lacking Foxg1 shortly after cre-mediated deletion, but it remains formally possible (if perhaps unlikely) that these processes could be affected later during the time window. The rationale underlying the choice of this time point should be explained.

      I don't agree with the statement in the very last sentence of the results section that "neurogenesis is not possible in the absence of [Foxg1]" as there are multiple reports in the literature demonstrating the presence of neurons in Foxg1-/- mice (eg: Xuan et al., 1995; Hanashima et al., 2002, Martynoga et al., 2005, Muzio and Mallamaci 2005). Perhaps the statement refers specifically to late-born cortical neurons. This point also arises in the discussion section.

      Proposed revisions:

      (a) We will revise the manuscript to explain why we chose postnatal day 14 to examine the effects of Foxg1 deletion at E15.

      ● We have examined the transcriptomic dysregulation after Foxg1 deletion at E17.5, which is a reasonable period to identify potential direct targets. Furthermore, FOXG1 occupies the Fgfr3 locus in ChIP-seq performed at E15.5. Together, these support the interpretation that Fgfr3 is a direct target of Foxg1.

      ● As the Reviewer notes, we have investigated the possibility of increased proliferation of astrocytes and death of neurons and found no evidence that suggests these phenomena occur in the 3 days after loss of Foxg1. Cortical neurons are postmitotic and differentiated by E18.5, the stage at which we examined CC3 staining and found no difference in cell death in control and mutants (Supplementary Figure S2C, C’). The majority of progenitors (PAX6+ve cells) that lose Foxg1 at E15.5 express the gliogenic transcription factor NFIA by E18.5 (Figure 2C, C’), but hardly any express intermediate (neurogenic) progenitor marker TBR2 (Supplementary Figure S2B, B’). It is therefore unlikely that neurons are born from Foxg1 mutant progenitors and then die at a later stage.

      ● The cellular consequences of loss of Foxg1 require additional time to detect e.g. it takes ~ 5 days for GFAP to be detected in astrocytes once they are born. The P14 timepoint permits the assessment of oligogenesis which begins after astrogliogenesis and therefore permits a comprehensive assessment of the lineage of E15.5 Foxg1 null progenitors.

      (b) Thank you for pointing out that the last sentence of the results section implied (incorrectly) that ALL neurogenesis is not possible in the absence of Foxg1. We will modify this (and the discussion) to reflect that this applies to E14/15 progenitors and late-born cortical neurons.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2 expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons including MB Kenyon cells, DAL neurons, and possibly DANs.

      We are very pleased that Reviewer 1 found our connectivity analysis a strength.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DN2-GAL4. They note that they identified at least 12 DNT-2 plus neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 is labeling more than 12 neurons. Is there glial expression?

      Indeed, we claimed that DNT-2 is expressed in at least 12 neurons (see line 141, page 6 of original manuscript), which means more than 12 could be found. The membrane tethered reporters we used – UAS-FlyBow1.1, UASmcD8-RFP, UAS-MCFO, as well as UAS-DenMark:UASsyd-1GFP – gave a consistent and reproducible pattern. However, with DNT-2GAL4>UAS-Histone-YFP more nuclei were detected that were not revealed by the other reporters. We have also found with other GAL4 lines that the patterns produced by different reporters can vary. This could be due to the signal strength (e.g. His-YFP is very strong) and perdurance of the reporter (e.g. the turnover of His-YFP may be slower than that of the other fusion proteins).

      We did not test for glial expression, as it was not directly related to the question addressed in this work.

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression also causes an increase in the number of TH neurons. It is unclear whether TH RNA increases due to expression/cell or the number of TH neurons in the head.

      Figure 3H shows that over-expression of DNT-2 FL increased the number of Dcp1+ apoptotic cells in the brain, but not significantly (p=0.0939). The ability of full-length neurotrophins to induce apoptosis, and of cleaved neurotrophins to promote cell survival, is well documented in mammals. We had previously shown that DNT-2 is naturally cleaved, and that over-expression of DNT-2 does not induce apoptosis in the various contexts tested before (McIlroy et al 2013 Nature Neuroscience; Foldi et al 2017 J Cell Biol; Ulian-Benitez et al 2017 PLoS Genetics). Similarly, throughout this work we did not find DNT-2FL to induce apoptosis.

      Instead, in Figure 3G we show that over-expression of DNT-2FL causes a mild yet statistically significant increase in the number of TH+ cells. This is an important finding that supports the plastic regulation of PAM cell number. We thank the Reviewer for highlighting this point, as we had forgotten to add the significance star in the graph. In this context, we cannot rule out the possibility that the increase in TH mRNA observed when we over-express DNT-2FL is due to an increase in cell number instead. Unfortunately, it is not possible for us to separate these two processes at this time. Either way, the result would still be the same: an increase in dopamine production when DNT-2 levels rise.

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult-specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

      Indeed, we did find Dcp1+ cells among TH-negative cells too (although not widely throughout the brain). This is not surprising, as DNT-2 neurons have large arborisations that can reach a wide range of targets; DNT-2 is secreted, and could reach beyond its immediate targets; Toll-6 is expressed in a vast number of cells in the brain; and DNT-2 can also bind promiscuously to at least Toll-7 and other Keks, which are also expressed in the adult brain (Foldi et al 2017 J Cell Biology; Ulian-Benitez et al 2017 PLoS Genetics; Li et al 2020 eLife). Together with the findings by McLaughlin et al 2019, our findings further support the notion that DNT-2 is a neuroprotective factor in the adult brain. It will be interesting to find out what other neuron types DNT-2 maintains.

      We would like to thank Reviewer 1 for their positive comments on our work and their interesting and valuable feedback.

      Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

      We would like to thank Reviewer 2 for their very positive assessment of our approach to investigate structural circuit plasticity. We are delighted that this Reviewer found our cellular resolution impressive. We are also very pleased that Reviewer 2 found that our work demonstrates that DNT-2 and its receptors regulate the structure of dopaminergic circuits in the adult fly brain. This is already a very important finding that contributes to demonstrating that, rather than being hardwired, the adult fly brain is plastic, like the mammalian brain.

      We are very pleased that this Reviewer acknowledges that this work provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. We provide a molecular mechanism and proof of principle, and we demonstrate a direct link between the function of DNT-2 and its receptors in circuit plasticity, and a suggestive link to neuronal activity. Finding out the direct link to lived experience is a big task, beyond the scope of this manuscript, and we will be testing this with future projects. Nevertheless, it is important to place our findings within this context, as it opens opportunities for discovery by the neuroscience community.

      We would like to thank Reviewer 2 for the positive and thoughtful evaluation of our work, and for their feedback.

      Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin Toll-6 and its ligands, DNT-2 and kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      We are delighted that Reviewer 3 found our work solid, novel, interesting and with important findings. We are also very pleased that this Reviewer found that all necessary controls have been carried out.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Thank you for pointing out that glutamate can be inhibitory. In mammals, the neurotrophin BDNF has an important function in glutamatergic synapses, thus we were intrigued by a potential evolutionary conservation. Our evidence that DNT-2A neurons could be excitatory is indirect, yet supportive: exciting DNT-2 neurons with optogenetics resulted in an increase in GCaMP in PAMs (data not shown); over-expression of DNT-2 in DNT-2 neurons increased TH mRNA levels; optogenetic activation of DNT-2 neurons results in the Dop2R-dependent downregulation of cAMP levels in DNT-2 neurons. Dop2R signals in response to dopamine, which would be released only if dopaminergic neurons had been excited. Accordingly, glutamate released from DNT-2 neurons would have been rather unlikely to inhibit DANs.

      cAMP is a second messenger that enables the activation of PKA. PKA phosphorylates many target proteins, amongst which are various channels. This includes the voltage gated calcium channels located at the synapse, whose phosphorylation increases their opening probability. Thus, a rise in cAMP could facilitate neurotransmitter release, and a downregulation would have the opposite effect. Other targets of PKA include CREB, leading to changes in gene expression. Conceivably, a decrease in PKA activity could result in the downregulation of DNT-2 expression in DNT-2 neurons. This negative feedback loop would restore the homeostatic relationship between DNT-2 and dopamine levels.

      Our data demonstrate that DNT-2 and PAM neurons do influence each other, not merely potentially. We have provided data showing that: DNT-2 and PAMs are connected through circuitry; the DNT-2 receptors Toll-6 and kek-6 are expressed in DANs, including in PAMs; alterations in the levels of DNT-2 (both loss and gain of function) and loss of function for the DNT-2 receptors Toll-6 and Kek-6 alter PAM cell number, PAM dendritic complexity and synaptogenesis in PAMs; and alterations in the levels of DNT-2, Toll-6 and kek-6 in adult flies alter the dopamine-dependent behaviours of climbing, locomotion in an arena, and learning and long-term memory. These data firmly demonstrate that the two neuron types, DNT-2 and PAMs, influence each other.

      We have also shown that over-expression of DNT-2 in DNT-2 neurons increases TH mRNA levels, whereas activation of DNT-2 neurons decreases cAMP levels in DNT-2 neurons in a dopamine/Dop2R-dependent manner. These data show a functional interaction between DNT-2 and PAM neurons.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Figures 5C and D do contain Y-axis labels; all graphs in the main manuscript and in the supplementary files contain Y-axis labels.

      In fact, we did use a method to precisely quantify the lengths and branching patterns of individual dendritic arborisations, volume of individual boutons and bouton counting. These analyses were carried out using Imaris software. For dendritic branching patterns, the “Filament Autodetect” function was used. Here, dendrites were analysed by tracing semi-automatically each dendrite branch (ie manual correction of segmentation errors) to reconstruct the segmented dendrite in volume. From this segmented dendrite, Imaris provides measurements of total dendrite volume, number and length of dendrite branches, terminal points, etc. For bouton size and number, we used the Imaris “Spot” function. Here, a threshold is set to exclude small dots (eg of background) that do not correspond to synapses/boutons. All samples and genotypes are treated with the same threshold, thus the analysis is objective and large sample sizes can be analysed effectively. We have already provided a description of the use of Imaris in the methods section.

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

      We thank Reviewer 3 for raising this interesting point. It is not possible to prove which of the four DNT-2A neurons per hemibrain, which we visualised with DNT-2>MCFO, were the same neurons in every individual brain we looked at. This is because in every brain we have looked at, the somas of the neurons were not located in exactly the same location. Furthermore, the arborisation patterns are also different and unique for each individual brain. Thus, there is natural variability in the position of the soma and in the arborisation patterns. Such variability presumably results from the combination of developmental and activity-dependent plasticity.

      We would like to thank Reviewer 3 for the very positive evaluation of our work and the interesting and valuable feedback.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Here the authors present their evidence linking the mitochondrial uniporter (MCU-1) and olfactory adaptation in C. elegans. They clearly demonstrate a behavioral defect of mcu-1 mutants in adaptation over 60 minutes and present evidence that this gene functions in the AWC primary sensory neurons at, or close to, the time of adaptation. 

      Strengths: 

      The paper is very well organized and their approach to unpacking the role of mcu-1 mutants in olfactory adaptation is very reasonable. The authors lean into diverse techniques including behavior, genetics, and pharmacological manipulation in order to flesh out their model for how MCU-1 functions in AWC neurons with respect to olfaction. 

      Weaknesses: 

      I would like to see the authors strengthen the link between mitochondrial calcium and olfactory adaptation. The authors present some gCaMP data in Figure 5 but it is unclear to me why this tool is not better utilized to explore the mechanism of MCU-1 activity. I think this is very important as the title of the paper states that "mitochondrial calcium modulates.." behavior in AWC and so it would be nice to see more evidence to support this direct connection. I would also like to see the authors place their findings into a model based on previous findings and perhaps examine whether mcu-1 is required for EGL-4 nuclear translocation, which would be straightforward to examine. 

      We agree that observing calcium levels inside the mitochondria would conclusively demonstrate that mitochondrial calcium directly impacts neuropeptide secretion and behavior. We will try to do this with a mitochondrially targeted calcium indicator. We will also better integrate our findings with existing models in the literature, such as EGL-4 nuclear localization in AWC in response to prolonged odor exposure. Thank you for your comments.

      Reviewer #2 (Public review): 

      Summary: 

      In their manuscript, "Mitochondrial calcium modulates odor-mediated behavioural plasticity in C. elegans", Lee et al. aim to link a mitochondrial calcium transporter to higher-order neuronal functions that mediate memory and aversive learning behaviours. The authors characterise the role of the mitochondrial calcium uniporter, and a specific subunit of this complex, MCU-1, within a single chemosensory neuron (AWCOFF) during aversive odor learning in the nematode. By genetically manipulating mcu-1 as well as using pharmacological activators and blockers of MCU activity, the study presents compelling evidence that the activity of this individual mitochondrial ion transporter in AWCOFF is sufficient to drive animal behaviour through aversive memory formation. The authors show that perturbations to mcu-1 and MCU activity prevent aversive learning to several chemical odors associated with food absence. The authors propose a model, experimentally validated at several steps, whereby an increase in MCU activity during odor conditioning stimulates mitochondrial calcium influx and an increase in mitochondrial reactive oxygen species (mtROS) production, triggering the release of the neuropeptide NLP-1 from AWC, all of which are required to mediate future avoidance behaviour of the chemical odor. 

      Strengths: 

      Overall, the authors provided robust evidence that mitochondrial function, mediated through MCU activity, contributes to behavioural plasticity. They also demonstrated that ectopic MCU activation or mtROS during odor exposure could accelerate learning. This is quite profound, as it highlights the importance of mitochondrial function in complex neuronal processes beyond their general roles in the development and maintenance of neurons through energy homeostasis and biosynthesis, amongst their other cell-non-specific roles. 

      Weaknesses: 

      While the manuscript is generally robust, there are some concerns that should be addressed to improve the strength of the proposed model: 

      (1) Throughout the manuscript, it is implied that MCU activation caused by odor conditioning changes mitochondrial calcium levels. However, there is no direct experimental evidence of this. For example, the authors write on p.10 "This shows that H2O2 production occurs downstream of MCU activation and calcium influx into the mitochondria", and on p. 11, the statement that prolonged exposure to odors causes calcium influx. Because this is a key element of the proposed model, experimental evidence would be required to support it. 

      We are planning to measure mitochondrial calcium levels directly by using a mitochondrially targeted calcium indicator. We agree that this is a key element of our model.

      (2) Some controls missing, e.g. a heat-shock-only control in WT and mcu-1 (non-transgenic) background in Figure 1h is required to ensure the heat-shock stress does not interfere with odor learning. 

      We will conduct the experiments again with necessary controls.

      (3) Lee et al propose that mcu-1 is required at the adult stage to accomplish odor learning because inducing mcu-1 expression at larval stages did not rescue the phenotype of mcu-1 mutants during adulthood. However, the requirement of MCU for odor learning was narrowed down to a 15' window at the end of odor conditioning (Figure 5c). Is it possible that MCU-1 protein levels decline after larval induction so that MCU-1 is no longer present during adulthood when odor conditioning is performed? 

      Yes, we also noted that the early induction of MCU-1 is not effective in restoring learning, and hypothesized that MCU-1 protein may be subject to high turnover. It may be that MCU-1 induced during larval stages no longer exists by the time odor conditioning is performed, although we have not confirmed this. We had a brief sentence noting this in the discussion section, but we will discuss this further in the revision. Thank you.

      (4) There is a limited learning effect observable after 30 minutes, and a very pronounced effect in all animals after 90 minutes. The authors very carefully dissect the learning mechanism at 60 minutes of exposure and distinguish processes that are relevant at 60 minutes from those important at 30 minutes. Some explanation or speculation as to why the processes crucial at the 60-minute mark are redundant at 90 minutes of exposure would be important. 

      We think this is in line with Reviewer #1’s comment that we should discuss our findings more in relation to existing models in the literature. We will do this in our revision.

      (5) Given the presumably ubiquitous function of mcu-1/MCU in mitochondrial calcium homeostasis, it is remarkable that its perturbation impacts only a very specific neuronal process in AWC at a very specific time. The authors should elaborate on this surprising aspect of their discovery in the discussion. 

      We will discuss the implication further in our revised manuscript.

      (6) Associated with the above comment, it remains possible that mcu-1 is required in coelomocytes for their ability to absorb NLP-1::Venus (Figure 3B), and the AWC-specific role of mcu-1 for this phenotype should be determined. 

      To confirm that mcu-1 is not required for coelomocyte uptake, we can stimulate NLP-1::Venus secretion in mcu-1 worms by adding H2O2, then check whether Venus is detected in the coelomocytes. We will include this in our revised manuscript. Thank you for your comments.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript reports a role for the mitochondrial calcium uniporter gene (mcu-1) in regulating associative learning behavior in C. elegans. This regulation occurs by mcu-1-dependent secretion of the neuropeptide NLP-1 from the sensory neuron AWC. The authors report a post-developmental role for mcu-1 in AWC to promote learning. The authors further show that odor conditioning leads to increases in NLP-1 secretion from AWC, and that interfering with mcu-1 function reduces NLP-1 secretion. Finally, the authors show that NLP-1 secretion increases when ROS levels in AWC are genetically or pharmacologically elevated. The authors propose that mitochondrial calcium entry through MCU-1 in response to odor conditioning leads to the generation of ROS and the subsequent increase in neuropeptide secretion to promote conditioned behavior. 

      Strengths: 

      (1) The authors show convincingly that genetically or pharmacologically manipulating MCU function impacts chemotaxis in a conditioned learning paradigm. 

      (2) The demonstration that the secretion of a specific neuropeptide can be up-regulated by MCU, ROS and odor conditioning is an important and interesting advance that addresses mechanisms by which neuropeptide secretion can be regulated in vivo. 

      Weaknesses: 

      (1) The authors' conclusion that mcu-1 functions in the AWC-on neuron is not adequately supported by their rescue experiments. The promoter they use for rescue drives expression in a number of additional neurons besides AWC-on that are themselves implicated in adaptation, leaving open the possibility that mcu-1 may function non-autonomously instead of autonomously in AWC to regulate this behavior. 

      We recognized this as well, and we now have a promoter construct more specific to AWCON (str-2). Using this more specific promoter, we will confirm that the role of mcu-1 is indeed AWCON-specific in our revised manuscript.

      (2) The authors conclude MCU promotes neuropeptide release from AWC by controlling calcium entry into mitochondria, but they did not directly examine the effects of altered MCU function on calcium dynamics either in mitochondria or in the soma, even though they conducted calcium imaging experiments in AWC of wild type animals. Examination of calcium entry in mitochondria would be a direct test of their model.

      We agree. As we stated above in our responses to Reviewers #1 and #2, we will include the mitochondrial calcium data in our revised manuscript.

      (3) The authors' conclusion that mitochondrial-derived ROS produced by MCU activation drives neuropeptide release does not appear to be experimentally supported. A major weakness of this paper is that experiments addressing whether mcu-1 activity indeed produces ROS are not included, leaving unanswered the question of whether MCU is the endogenous source of ROS that drives neuropeptide secretion.

      We can confirm this using the mitochondrially targeted redox indicator roGFP, and we will be sure to include the data in the revised manuscript. Thank you for your comments.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing both from dynamical and information theory aspects of how habituation can be realized. The authors identify that negative feedback provided with a slow storage mechanism is sufficient to explain habituation.

      Strengths:

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work.

      Weaknesses:

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery.

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model.

      We are really grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; and 5) intensity sensitivity. Here, we are following the same terminology employed in bioRxiv 2024.08.04.606534, the paper highlighted by the referee. Regarding hallmark 6) subliminal accumulation, we believe that our model can capture it as well, but more analyses are needed to substantiate this claim. We will include the discussion of these points in the revised version.

      Notably, in line with the discussion in bioRxiv 2024.08.04.606534, we also think that feature 10) long-term habituation, is ambiguous and its appearance might be simply related to the other features discussed above. In the revised version, we will detail our take on this aspect in relation to the presented model.

      All other hallmarks require the presence of multiple stimuli and, as a consequence, they cannot be observed within our model, but are interesting lines of research for future investigations. We believe that this addition will help clarify the validity of the model and the relevance of our result, consequently improving the quality of our manuscript.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses; the subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed?

      The referee is correct, but this is solely a consequence of the specific set of parameters we selected. We made this choice solely for visualization purposes. In the next version, when different emerging behaviors characterizing habituation are discussed, we will also present a set of parameters for which habituation can be better appreciated, justifying our new choice.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as defined in bioRxiv 2024.08.04.606534 for example, we can say that the system is habituated after a few stimuli for the set of parameters selected in the first version of the manuscript. We will also discuss this aspect in the Supplemental Material of the revised version, as it will also be important to appreciate the hallmarks of habituation listed above.
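
      The distinction between a mathematically asymptotic steady state and a threshold-based notion of "habituated" can be illustrated with a minimal sketch. The code below is not the authors' stochastic model: it is a deterministic toy with a fast readout `u` and a slow storage variable `s` that suppresses the response, and every name and parameter value (`simulate_peaks`, `habituation_pulse`, `kappa`, `tau_s`, the 1% threshold) is an assumption chosen for illustration. The threshold criterion is likewise hypothetical and is not the definition used in the cited preprint.

```python
def simulate_peaks(n_pulses=30, t_on=1.0, t_off=1.0, tau_u=0.1,
                   tau_s=20.0, kappa=5.0, h_on=1.0, dt=0.001):
    """Return the peak readout reached during each of n_pulses square pulses.

    Toy dynamics (assumed, for illustration only):
        du/dt = (h(t) / (1 + kappa * s) - u) / tau_u   # fast readout
        ds/dt = (u - s) / tau_s                        # slow storage
    """
    u, s = 0.0, 0.0
    peaks = []
    steps_on = round(t_on / dt)
    steps_off = round(t_off / dt)
    for _ in range(n_pulses):
        peak = 0.0
        for step in range(steps_on + steps_off):
            h = h_on if step < steps_on else 0.0
            du = (h / (1.0 + kappa * s) - u) / tau_u
            ds = (u - s) / tau_s
            u += dt * du  # forward Euler step
            s += dt * ds
            peak = max(peak, u)
        peaks.append(peak)
    return peaks


def habituation_pulse(peaks, eps=0.01):
    """First pulse (1-based) whose peak differs from the previous peak
    by a relative amount smaller than eps; None if never reached."""
    for n in range(1, len(peaks)):
        if abs(peaks[n] - peaks[n - 1]) / peaks[n - 1] < eps:
            return n + 1
    return None


peaks = simulate_peaks()
print("first peak:", round(peaks[0], 3), "last peak:", round(peaks[-1], 3))
print("habituated (1% criterion) at pulse:", habituation_pulse(peaks))
```

      In this toy, the peak response keeps changing by ever smaller amounts at every pulse, so the periodic steady state is only reached asymptotically, while a finite threshold declares the system habituated after a finite number of pulses. Changing `kappa` or `tau_s` changes both the depth of the response decrement and the pulse at which the threshold is crossed.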

      The same is true for the information content (Figure 2f) - already at the first pulse, I_U,H ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above.

      The point about information is more subtle. We can definitely choose a set of parameters for which the information gain is higher and we will show it in the Supplemental Material of the revised version. However, as the reviewer correctly points out, it is difficult to give an interpretation of the specific value of I_U,H for such a minimal model.

      We also remark that, since the readout population and the receptor both undergo fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus and, as such, the mutual information presents a discontinuous behavior resembling the dynamics of the readout.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. We will discuss analogies and differences in the revised version of the main text. The main difference is the fact that information-theoretic aspects of habituation are not discussed in the presented references, while the idea of this work is to elucidate exactly the interplay between information gain and habituation dynamics.

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation.

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is an important and delicate aspect to discuss. We considered the mutual information with a prolonged stimulation when building the Pareto front, by maximizing this quantity while minimizing the dissipation. The observation that the Pareto front lies in the vicinity of the maximum of the information gain hints at the fact that reducing the information gain by increasing the mutual information at each stimulation will require more energy. However, we did not thoroughly explore this aspect by considering all sources of dissipation and the fact that habituation is, anyway, a dynamical phenomenon. In the revised version, we will clarify this point, extending our analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain mutual information, multiple observations of the same stimulus must be reflected in accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid any confusion between the usual definition of (perfect) adaptation and habituation. At any rate, we will add this clarification in the revised version.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the referee for giving us the opportunity to deepen this aspect of the manuscript. We decided to minimize \delta Q_R since this dissipation is unavoidable. In fact, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R. Conversely, the dissipation associated with the storage is always zero in the limit of a fast memory. However, we know that such a limit is pathological and leads to no habituation. As a consequence, in the revised version we will discuss other choices for our optimization approach, along with their potentialities and limitations.

      The dependence of the Pareto front on the stimulus strength is shown in the Supplemental Material, but not in relation to habituation and information gain. We will strengthen this part in the revised version of the manuscript, elaborating more on the connection between optimality, information gain, and dynamical behavior.
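
      For readers less familiar with the two-objective optimization discussed here, the following minimal sketch shows how a Pareto front can be extracted from candidate parameter sets when one quantity (mutual information) is maximized and the other (dissipation) is minimized. The function name `pareto_front` and the numerical values are hypothetical placeholders, not results from the manuscript.

```python
def pareto_front(points):
    """Return the non-dominated (information, dissipation) pairs.

    A point is kept if no other point has dissipation that is lower or
    equal together with information that is higher or equal (with at
    least one strict inequality). Exact duplicates are reported once.
    """
    # Sort by increasing dissipation; break ties by decreasing information.
    ordered = sorted(points, key=lambda p: (p[1], -p[0]))
    front = []
    best_info = float("-inf")
    for info, diss in ordered:
        # Keep a point only if it improves on every cheaper point.
        if info > best_info:
            front.append((info, diss))
            best_info = info
    return front


# Hypothetical (information, dissipation) values for six parameter sets.
candidates = [(0.2, 0.1), (0.5, 0.3), (0.4, 0.5),
              (0.9, 1.0), (0.9, 1.2), (0.1, 0.1)]
front = pareto_front(candidates)
print(front)
```

      Swapping the dissipation measure, for example using a different dissipation term or a sum of terms, only changes the second entry of each pair; the extraction step stays the same, which makes it straightforward to compare how the front moves under different choices.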

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels?

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, the fact that, without any explicit biological details, our minimal model is able to capture the features of a complex neural system just by looking at the PCs is non-trivial. The 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a-priori expectations on what it should represent. Depending on the behavior of higher-order PCs, we may include them in the revised version if any interesting results arise.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment.

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination.

      We thank the referee for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed:

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the referee for this suggestion. The revised version will present a modified abstract in line with the reviewer’s proposal.

      (2) Several clarifications are needed on the treatment of energy dissipation.

      - When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the referee for this typo. Indeed, \sigma sets the energy scale of the feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., together with \kappa in Eq. (1). We will fix this issue in the revised version. Moreover, we will check the entire manuscript to be sure that all formulas are consistent.

      - I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on <H>, however, is not fully clear. If the environment were static and the memory block was absent, the term with <H> would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence. By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript). In this case, the receptor is a 2-state, 1-pathway system and, as such, it always satisfies an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript does not hold anymore and the receptor does not exhibit any dissipation. Our choice to model two different pathways has been biologically motivated. We will make this crucial aspect clearer in the revised manuscript.
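
      The statement that a 2-state, 1-pathway system cannot dissipate at steady state, whereas two differently driven pathways can, is a standard result and can be checked numerically. The sketch below uses the usual steady-state entropy production rate of a Markov jump process (in units of k_B per unit time); it is not the manuscript's definition of \delta Q_R, and the function name `entropy_production_rate` and the rate values are assumptions chosen for illustration.

```python
import math


def entropy_production_rate(pathways):
    """Steady-state entropy production of a two-state system.

    pathways: list of (k_up, k_down) pairs, one per pathway, giving the
    rates of the 0 -> 1 and 1 -> 0 transitions along that pathway.
    """
    k_up_total = sum(up for up, _ in pathways)
    k_down_total = sum(down for _, down in pathways)
    # Steady-state occupation probabilities of states 1 and 0.
    p1 = k_up_total / (k_up_total + k_down_total)
    p0 = 1.0 - p1
    rate = 0.0
    for up, down in pathways:
        forward = up * p0     # probability flux 0 -> 1 along this pathway
        backward = down * p1  # probability flux 1 -> 0 along this pathway
        rate += (forward - backward) * math.log(forward / backward)
    return rate


# One pathway: fluxes balance, so there is no dissipation.
single = entropy_production_rate([(2.0, 1.0)])
# Two pathways with different rate ratios: a net cycle current appears.
driven = entropy_production_rate([(2.0, 1.0), (1.0, 3.0)])
# Two pathways with the same rate ratio: detailed balance is restored.
balanced = entropy_production_rate([(2.0, 1.0), (4.0, 2.0)])
print(single, driven, balanced)
```

      With a single pathway the forward and backward fluxes cancel exactly at steady state. Dissipation appears only when the two pathways impose different rate ratios, which is the situation described above where the external driving and the storage act on separate pathways.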

      - Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      In the current version of the manuscript, we employed the scheme of a controlled birth and death process to model the coupled process of readout and storage production. Since we are not dealing with a detailed biochemical underlying network, we used this coarse-grained description to capture the main features of the dynamics. In this sense, the considered reactions produce and destroy a molecule from a certain pool even if they are controlled in different ways by the readout. However, we completely agree with the point of view of the referee and will analyze our results following their suggestion.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics?

      The initial stimulus is indeed stochastic with an average constant in time. The model response depends on the pre-stimulus level, since it also sets the stationary storage concentration before the first “strong” stimulation arrives. This dependence is not crucial for our result but deserves proper discussion, as the referee correctly pointed out. We will clarify this point in the revised version of this study.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity here. Actually, Δ⟨S⟩ is not strictly zero, but equal to 0.15% at the final point. However, due to rounding this appears as 0% in the plot, and we will fix it in the revised version. Let us note that the fact that Δ⟨S⟩ is small signals a nonlinear dependence of Δ⟨U⟩ on Δ⟨S⟩, but no contradiction. We will clarify this aspect in the revised version.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigates a dietary intervention that employs a smartphone app to promote meal regularity, which may be useful. Despite no observed changes in caloric intake, the authors report significant weight loss. While the concept is very interesting and deserves to be studied due to its potential clinical relevance, the study's rigor needs to be revised, notably for its reliance on self-reported food intake, a highly unreliable way to assess food intake. Additionally, the study theorizes that the intervention resets the circadian clock, but the study needs more reliable methods for assessing circadian rhythms, such as actigraphy.

      Thank you for the positive yet critical feedback on our manuscript. We are pleased with the assessment that our study is very interesting and deserves to be continued. We have addressed the points of criticism mentioned and discussed the limitations of the study in more detail in the revised version than before.

      Nevertheless, we would like to note that one condition for our study design was that the participants were able to carry out the study in their normal everyday environment. This means that it is not possible to fully objectively record food intake - especially not over a period of eight weeks. In our view, self-reporting of food intake is therefore unavoidable and also forms the basis of comparable studies on chrononutrition. We believe that recording data with a smartphone application at the moment of eating is a reliable means of recording food consumption and is better suited than questionnaires, for example, which have to be completed retrospectively. Objectivity could be optimized by transferring photographs of the food consumed. However, even this only provides limited protection against underreporting, as photos of individual meals, snacks, or second servings could be omitted by the participants. Sporadic indirect calorimetric measurements can help to identify under-reporting, but this cannot replace real-time self-reporting via smartphone application.

      Our data show that at the behavioral level, the rhythms of food intake are significantly less variable during the intervention. Our assumption that precise mealtimes influence the circadian rhythms of the digestive system is not new and has been confirmed many times in animal and human studies. It can therefore be assumed that comparable effects also apply to the participants in our study. Of course, a measurement of physiological rhythms is also desirable for a continuation of the study. However, we suspect that cellular rhythms in tissues of the digestive tract in particular are decisive for the changes in body weight. The characterization of these rhythms in humans is at best indirectly possible via blood factors. Reduced variability of the sleep-wake rhythm, which is measured by actigraphy, may result from our intervention, but in our view is not the decisive factor for the optimization of metabolic processes.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Reviewer #1 (Public Review):

      The authors Wilming and colleagues set out to determine the impact of regularity of feeding per se on the efficiency of weight loss. The idea was to determine whether individuals who consume 2-3 meals within individualized time frames lose weight, as opposed to those who exhibit stochastic feeding patterns throughout the circadian period.

      The methods are rigorous, and the research is conducted using a two-group, single-center, randomized-controlled, single-blinded study design. The participants were aged between 18 and 65 years old, and a smartphone application was used to determine preferred feeding times, which were then used as defined feeding times for the experimental group. This adds strength to the study since restricting feeding within preferred/personalized feeding windows will improve compliance and study completion. Following a 14-day exploration phase and a 6-week intervention period in a cohort of 100 participants (inclusive of both the controls and the experimental group that completed the study), the authors conclude that when meals are restricted to 45min or less durations (MTVS of 3 or less), this leads to efficient weight loss. Surprisingly, the study excludes the impact of self-reported meal composition on the efficiency of weight loss in the experimental group. In light of this, it is important to follow up on this observation and develop rigorous study designs that will comprehensively assess the impact of changes (sustained) in dietary composition on weight loss. The study also reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Perhaps the most important observation is that personalized interventions that cater to individual circadian needs will likely result in more significant weight loss than when interventions are mismatched with personal circadian structures.

      We would like to thank the reviewer for the positive assessment of our study.

      (1) One concern for the study is its two-group design; however, single-group cross-over designs are tedious to develop, and an adequate 'wash-out' period may be difficult to predict.

      A cross-over design would of course be highly desirable and, if feasible, would provide more robust data than a two-group design. However, we have strong doubts about its feasibility. One difficulty is determining the length of the washout period needed to avoid carry-over effects of metabolic changes. Another is that participants who start with the TTE intervention are likely to pay attention, consciously or unconsciously, to adhering to certain eating times in the next phase, when they are asked to eat at the times they did before the study.

      In a sense, however, our study fulfills at least one arm of the cross-over design. During the follow-up period, some participants, by their own admission, started eating at more irregular times again, which is comparable to the mock treatment of the control subjects. These participants gained weight again.

      (2) A second weakness is not considering the different biological variables and racial and ethnic diversity and how that might impact outcomes. In sum, the authors have achieved the aims of the study, which will likely help move the field forward.

      In the meantime, we have at least added analyses regarding the age and gender of the participants and found no correlations with weight loss. The sample size of this pilot study was too small for a reliable analysis of the influence of ethnic diversity. If the study is continued with a larger sample size, this type of analysis will certainly come into play.

      We are pleased with the assessment that we have achieved our goals and are helping to advance the field.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the effects of the timing of dietary occasions on weight loss and well-being with the aim of explaining if a consistent, timely alignment of dietary occasions throughout the days of the week could improve weight management and overall well-being. The authors attributed these outcomes to a timely alignment of dietary occasions with the body's own circadian rhythms. However, the only evidence the authors provided for this hypothesis is the assumption that the individual timing of dietary occasions of the study participants identified before the intervention reflects the body's own circadian rhythms. This concept is rooted in understanding of dietary cues as a zeitgeber for the circadian system, potentially leading to more efficient energy use and weight management. Furthermore, the primary outcome, body weight loss, was self-reported by the study participants.

      Strengths:

      The innovative focus of the study on the timing of dietary occasions rather than daily energy intake or diet composition presents a fresh perspective in dietary intervention research. The feasibility of the diet plan, developed based on individual profiles of the timing of dietary occasions identified before the intervention, marks a significant step towards personalised nutrition.

      We thank the reviewer for the generally positive assessment of our study and for sharing the view that our personalized approach represents an innovative step in chrononutrition.

      Weaknesses:

      (1) Several methodological issues detract from the study's credibility, including unclear definitions not widely recognized in nutrition or dietetics (e.g., "caloric event"), lack of comprehensive data on body composition, and potential confounders not accounted for (e.g., age range, menstrual cycle, shift work, unmatched cohorts, inclusion of individuals with normal weight, overweight, and obesity).

      We have replaced the term "caloric event" with "calorie intake occasion" and otherwise revised our manuscript with regard to other terminology in order to avoid ambiguity.

      We agree with the reviewer that the determination of body composition is a very important parameter to be investigated. Such investigations will definitely be part of the future continuation of the study. In this pilot study, we aimed to clarify in principle whether our intervention approach shows effects. Since we believe that this is certainly the case, we would like to address the question of what exactly the physiological mechanisms are that explain the observed weight loss in the future.

      Part of these future studies will also include other parameters in the analyses. However, in response to the reviewer's suggestions, we have already completed analyses regarding age and gender of the participants, which show that both variables have no influence on weight loss.

      In our view, the menstrual cycle should not have a major influence on the effectiveness of a 6-week intervention.

      The inclusion of shift workers is not a problem from our point of view. If their work shifts allow them to follow their personal eating schedule, we see no violation of our hypothesis. If this is not the case, as our data in Fig. 1G show, we do not expect any weight loss. Nevertheless, the reviewer is of course right that shift work can generally be a confounding factor and have an influence on weight loss success. To our knowledge, none of the 100 participants evaluated were shift workers. In a continuation of the study, however, shift work should be an exclusion criterion. Yet, our intervention approach could be of great interest for shift workers in particular, as they may be at a particularly high risk of obesity due to irregular eating times. A separate study with shift workers alone could therefore be of particular interest.

      The mismatch in baseline BMI between the remaining 67 EG and 33 CG participants is discussed in detail in the section "3.1 Limitations". Although this is a limitation, it does not raise much doubt about the effectiveness of the intervention, as a subgroup analysis shows that intervention subjects lose more weight than control subjects of the same BMI.

      The inclusion of a wide BMI range was intentional. Our hypothesis is that reduced temporal variability in eating times optimizes metabolism and therefore excess body weight is lost (which we would like to investigate specifically in future studies). We hypothesize that people living with a high BMI will experience greater optimization than people with a lower BMI. Our data in Figs. 1H and S2I suggest that this assumption is correct.

      (2) The primary outcome's reliance on self-reported body weight and subsequent measurement biases further undermines the reliability of the findings.

      Self-reported data are always more prone to error than objectively measured data. Because of the Covid-19 pandemic, we were severely restricted in terms of direct contact with the participants during the study. Nevertheless, the initial body weight (at T0), the body weight after the end of the exploration phase (at T1) and the final body weight (at T2) were measured in video calls in the (virtual) presence of the study staff. These are the measurement points that were decisive for our analyses. Intermediate self-reported measurement points were not considered for analyses. We have added in the Materials & Methods section that video calls were undertaken to minimize the risk of misreporting.

      (3) Additionally, the absence of registration in clinical trial registries, such as the EU Clinical Trials Register or clinicaltrials.gov, and the multiple testing of hypotheses which were not listed a priori in the research protocol published on the German Register of Clinical Trials impede the study's transparency and reproducibility.

      Our study was registered in the DRKS (German Clinical Trials Register) in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trials Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: “The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place.

      Furthermore, in our view, we did not provide less information on planned analyses than is usual, and all our analyses were covered by the information in the study registry. We stated the hypothesis in the study register that “strict adherence to [personalized] mealtimes will lead to a strengthening of the circadian system in the digestive tract and thus to an optimization of the utilization of nutrients and ultimately to the adjustment of body weight to an individual ideal value.”

      In our view, numerous analyses are necessary to test this hypothesis. We investigated whether it is the adherence to eating times that is related to the observed weight loss (Fig. 1), or possibly other variables resulting from adherence to the meal schedule (Fig. 3). In addition, we analyzed whether the intervention optimized the utilization of nutrients, which we did based on the food composition and number of calories during the exploration and intervention phases (Fig. 2). We investigated whether the personalization of meal schedules plays a role (Fig. 3). And we attempted to analyze whether the adjustment of body weight to an individual ideal value occurs by correlating the original BMI with weight loss. Only the hypothesis that the circadian system in the digestive tract is strengthened has not yet been directly investigated, a fact that is listed as a limitation. It can nevertheless be assumed that this has happened, as the Zeitgeber “food” has lost significant variability as a result of the intervention. The analyses on general well-being are covered in the study protocol by the listing of secondary endpoints.

      Beyond that, we did not analyze any hypotheses that were not formulated a priori.

      For these reasons, we see no restriction in transparency, reproducibility or requirements and regulations.

      Achievement of Objectives and Support for Conclusions:

      (4) The study's objectives were partially met; however, the interpretation of the effects of meal timing on weight loss is compromised by the weaknesses mentioned above. The evidence only partially supports some of the claims due to methodological flaws and unstructured data analysis.

      We hope that we have been able to dispel uncertainties regarding some interpretations through supplementary analyses and the addition of some methodological details.

      Impact and Utility:

      (5) Despite its innovative approach, significant methodological and analytical shortcomings limit the study's utility. If these issues were addressed, the research could have meaningful implications for dietary interventions and metabolic research. The concept of timing of dietary occasions in sync with circadian rhythms holds promise but requires further rigorous investigation.

      We are pleased with the assessment that our data to date are promising. We hope that the revised version will already clarify some of the doubts about the data available so far. Furthermore, we absolutely agree with the reviewer: the present study serves to verify whether our intervention approach is potentially effective for weight loss, which we believe is the case. In the next steps, we plan to include extensive metabolic studies and to address the limitations of the present study.

      Reviewer #3 (Public Review):

      The authors tested a dietary intervention focused on improving meal regularity in this interesting paper. The study, a two-group, single-center, randomized, controlled, single-blind trial, utilized a smartphone application to track participants' meal frequencies and instructed the experimental group to confine their eating to these times for six weeks. The authors concluded that improving meal regularity reduced excess body weight despite food intake not being altered and contributed to overall improvements in well-being.

      The concept is interesting, but the need for more rigor is of concern.

      We would like to thank the reviewer for the interest in our study.

      (1) A notable limitation is the reliance on self-reported food intake, with the primary outcome being self-reported body weight/BMI, indicating an average weight loss of 2.62 kg. Despite no observed change in caloric intake, the authors assert weight loss among participants.

      As already described above in the responses to Reviewer #2, the body weight assessment took place in video calls in the (virtual) presence of study staff, so that the risk of misreporting is minimized. We have added this information to the manuscript.

      When recording food intake, we had to weigh up the risk of misreporting against the risk of a lack of validity in a permanently monitored setting. It was important to us to investigate the effectiveness of the intervention in the participants' everyday environment and not in a laboratory setting in order to be able to convincingly demonstrate its applicability in everyday life. The restriction of self-reporting is therefore unavoidable in our view and must be accepted. It can possibly be reduced by photographing the food, but even this is not a complete protection against underreporting, as there is no guarantee that everything that is ingested is actually photographed.

      However, our analyses show that the reporting behavior of individual participants did not change significantly between the exploration and intervention phases. We do not assume that participants who underreported did so only during the exploration phase (eating more than reported in this phase) and then reported correctly in the intervention phase (when they would indeed have consumed fewer calories). We discuss this point in the section "3.1 Limitations".

      (2) The trial's reliance on self-reported caloric intake is problematic, as participants tend to underreport intake; for example, in the NEJM paper (DOI: 10.1056/NEJM199212313272701), some participants underreported caloric intake by approximately 50%, rendering such data unreliable and hence misleading. More rigorous methods for assessing food intake are available and should have been utilized. Merely acknowledging the unreliability of self-reported caloric intake is insufficient as it would still leave the reader with the impression that there is no change in food intake when we actually have no idea if food intake was altered. A more robust approach to assessing food intake is imperative. Even if a decrease in caloric intake is observed through rigorous measurement, as I am convinced a more rigorous study would unveil testing this paradigm, this intervention may merely represent another short-term diet among countless others that show that one may lose weight by going on a diet, principally due to heightened dietary awareness.

      The risks of self-reporting, our considerations, and our analysis of participants' reporting behavior and caloric intake over the course of the study are discussed in detail both in our responses above and in the manuscript.

      With regard to the reviewer's second argument, we have largely adapted the study protocol of the control group to that of the experimental group. Apart from the fact that the control subjects were not given guidelines on eating times and were instead only given a very rough time window of 18 hours for food intake, the content of the sessions and the measurement methods were the same in both groups. This means that the possibility of increased nutritional awareness was equally present in both groups, but only the participants in the experimental group lost a significant amount of body weight.

      In future continuations of the study, further follow-up after an even longer period than four weeks (e.g. after 6 months) can be included in the protocol in order to examine whether the effects can be sustained over a longer period.

      (3) Furthermore, the assessment of circadian rhythm using the MCTQ, a self-reported measure of chronotype, may not be as reliable as more objective methods like actigraphy.

      The MCTQ is a validated means of determining chronotype and its results are significantly associated with the results of actigraphic measurements. In our view, the MCTQ is sufficient to test our hypothesis that matching the chronobiological characteristics of participants is beneficial. Nevertheless, measurements using actigraphy could be of interest, for example to correlate the success of weight loss with parameters of the sleep-wake rhythm.

      (4) Given the potential limitations associated with self-reported data in both dietary intake and circadian rhythm assessment, the overall impact of this manuscript is low. Increasing rigor by incorporating more objective and reliable measurement techniques in future studies could strengthen the validity and impact of the findings.

      The body weight data were not self-reported; the measurements were taken in the presence of study staff. Although optimization might be possible (see above), we do not currently see any feasible way other than self-report of recording all calorie intake occasions in the natural environment of the participants over a period of several weeks (or possibly longer, as noted by the reviewer). For the future continuation of the study, we are planning occasional indirect calorimetry measurements that can provide information about the actual amount of food consumed in different phases of the study. These can reveal errors in the self-report but will not be able to replace daily data collection by means of self-report.

      Reviewer #1 (Recommendations For The Authors):

      Summary:

      This interesting and timely study by Wilming and colleagues examines the effect of regularity vs. irregularity of feeding on body weight dynamics and BMI. A rigorous assessment of the same in humans needs to be improved, which this study provides. The study is well-designed, with a 14-day exploration phase followed by 6 weeks of intervention, and it is commendable to see the number of participants (100) who completed the study. Incorporation of a follow-up assessment 4 weeks after the conclusion of the study shows maintained weight loss in a subset of Experimental Group (EG) participants who continue with regular meals. There are several key observations, including particular meal times (lunch and dinner), which, when restricted to 45min or less in duration (MTVS of 3 or less), will lead to efficient weight loss, as well as correlations between baseline BMI and weight loss. The authors also exclude the impact of self-reported meal composition on the efficiency of weight loss in the EG group in the context of this study. The study reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Finally, the authors highlight an important point: to provide attention to personalized feeding and circadian windows and that personalized interventions that cater to individual circadian structures will result in more significant weight loss. This is an important concept that needs to be brought to light. There are only a few minor comments listed below:

      Minor comments:

      (1) The authors may provide explanations for the reduction in the MTVS in the EG and the increase in the same for the Control Group (CG). The increases in MTVS in CG are surprising (lines 105-106) because it is assumed that there is no difference in CG eating patterns prior to and during the study.

      As the reviewer correctly states, our assumption was that there should be no change in the MTVS before and during the study. However, we could not rule this out, because in the meetings with the study staff the subjects were not given any indication regarding the regularity of food intake within the fixed time window, i.e. they were not instructed to continue eating exactly as before. Such an instruction might have led the participants to try to adhere to a schedule as precisely as possible. As a result, there was a statistically significant worsening of the MTVS in the CG. This worsening was less than 0.6 MTVS, i.e. a time span of only approx. ± 7.5 min, and remained within MTVS 3. Since there were no correlations between the measured MTVS and the weight of the subjects in the CG, and a change of about half an MTVS value has only a rather minor effect on weight, we do not attribute great significance to the observed deterioration in the MTVS.

      (2) There would be greater clarity for the readers if the authors clearly defined the study design in detail at the outset of the study, e.g., in section 2.1.

      We have included a brief summary of the study design at the end of the introduction so that the reader is already familiar with it at the beginning of the manuscript without having to switch to the material and methods section.

      (3) The data in Fig S2H are important and inform readers that the regularity of lunch and dinner is more related to body weight changes than breakfast. These data should be incorporated in the Main Figure. In addition, analyses of Table S7 data (indicating that an MTVS of no greater than 3, or -/+45mins of the meal-timing window, is associated with efficient weight loss) should be represented in a figure panel in the Main Figures.

      As suggested by the reviewer, we have moved Fig. S2H to the main Fig. 1. In addition, Table S7 is now no longer inserted as a supplementary table but as main Table 1 in the manuscript.

      (4) The authors state in lines 222-223 that "weight changes of participants were not related to one of these changes in eating characteristics (Fig. 3B-D, Tab. S6)", referring to the shortening of feeding windows as noted in the EG group. This is a rather simplistic statement, which should be amended to include that weight changes may not relate to changes in eating characteristics per se but likely relate to changes in metabolic programming, for instance, energy expenditure increases, which have been shown to associate with these changes in eating characteristics. This is important to note.

      We have changed the wording at this point to make clear that, in the results section, we are referring only to the results of the mathematical analysis, which showed no correlation between the eating time window and weight loss in our sample. In addition, we now explicitly mention the change in metabolic programming, correctly noted by the reviewer, in the discussion at the end of section 3.

      (5) Please provide more background and details on the attributes that define individual participant chronotypes in the manuscript before discussing datasets, e.g., mSP and mEP. This is in relation to the narrative in lines 228-230: "Indeed, our data show that the later the chronotype of participants (measured by the MCTQ mid-sleep phase, mSP [24]), the later their mid-eat phase (mEP) on weekends (Fig. 3E, Tab. S6), with the mSP and mEP being almost antiphasic on average (Fig. 3F, Tab. S10)." This will help readers unfamiliar with circadian biology/chronobiology research understand the contents of this manuscript, particularly Fig 3.

      In the revised version, we explain the chronobiology terms introduced in this chapter in more detail so that they are easier to understand.

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify Terminology: Define or avoid using ambiguous terms such as "caloric event" to prevent confusion, especially for readers less familiar with chronobiology. Consider providing clear explanations or opting for more widely understood terms.

      We have replaced "caloric event" with “calorie intake occasion” and explain various chronobiology terms better, so that hopefully readers from other disciplines can now follow the text more easily.

      (2) Detailed Methodological Descriptions: Improve the transparency of your methods, especially concerning the measurement of primary and secondary outcomes. Address the concerns raised about the reliability of self-reported weight and the potential biases in measurement methods.

      In the section "3.1 Limitations", we have examined the aspect of the reliability of self-reported data and our measures to reduce this uncertainty in more detail. We have also added further details on the measurement of outcomes in the materials and methods section.

      (3) Address Participant Selection Criteria: Reevaluate the inclusion criteria and consider discussing the implications on the study's findings of the broad age range, the inclusion of shift work, unmatched cohorts, and inclusion of individuals with normal weight, overweight, and obesity. Provide a subgroup analysis or discuss how BMI might have influenced the results. Even though this is an additional post-hoc analysis, it would directly address one of the major weaknesses of the study design.

      We have supplemented the analyses and now show in Fig. S2G that neither age nor gender had any influence on weight loss as a result of the intervention. To our knowledge, none of the 100 participants evaluated were shift workers. Even if shift workers were part of the study without our knowledge, we do not consider this to be a problem as long as their shifts allow them to keep to certain eating times. The mismatch in baseline BMI between the remaining 67 EG and 33 CG participants is discussed in detail in the section "3.1 Limitations". Our previous analysis in Fig. S2I already showed that there is a negative correlation between baseline BMI and weight loss. This is an interesting result, as it shows that people with a high BMI particularly benefit from the intervention. In addition, we already showed in Fig. S2J in a subgroup analysis that in all strata the BMI of EG subjects decreased more than that of CG subjects, even if they had the same initial BMI. We do not consider the wide dispersion of the BMIs of the included participants to be a weakness of the study design. On the contrary, it allows us to make a statement about which target group the intervention is particularly suitable for.

      (4) Improve Statistical Analysis: If not already done, involve a biostatistician to review the statistical analyses, particularly concerning post-hoc tests, correlation analyses, and the handling of measurement biases. Ensure that deviations from the original study protocol are clearly documented and justified.

      All analyses were decided on together with a statistician, who also checked and approved them.

      (5) Data Interpretation and Speculation: Limit speculation and clearly distinguish between findings supported by your data from hypotheses and future directions. Ensure that discussions about the implications of meal timing on metabolism are supported by evidence with adequate references and clearly state where further research is needed.

      We have revised the discussion and, especially through the detailed discussions of the limitations, we have emphasized more clearly what has been achieved and what still needs to be proven in future studies.

      (6) Clinical Trial Registration: Address the lack of registration in the EU Clinical Trials Register and clinicaltrials.gov. Discuss its potential implications on the study's transparency and how it aligns with current requirements and regulations.

      Our study was registered in the DRKS (German Clinical Trials Register) in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trials Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: “The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place before it began and see no restriction in transparency or requirements and regulations.

      (7) Use of Sensitive and Current Terminology: Update the manuscript to reflect the latest recommendations regarding the language used to describe obesity and patients living with obesity. This ensures respect and accuracy in reporting and aligns with contemporary standards in the field.

      We updated the manuscript accordingly.

      (8) Strengthen the Introduction: Expand the literature review to include more recent and relevant studies that contextualise your work within the broader field of chrononutrition. This could help clarify how your study builds upon or diverges from existing research.

      We have included further studies in the introduction that aim to reduce body weight by restricting food intake to certain time periods. We have also more clearly contrasted the designs of these studies with the design of our study.

      (9) Clarify Discrepancies and Errors: Address any inconsistencies, such as the discrepancy in meal timing instructions (90 minutes reported in the conclusion vs. 60 minutes reported in the methods), and ensure all figures, tables, and statistical analyses are correctly referenced and described.

      The first point mentioned by the reviewer is not an inconsistency. To ensure the feasibility of the intervention, each participant was initially given a time window of +/- 30 minutes (60 min) from the specified eating time. Our later analyses show that even a time window of +/- 45 minutes (90 min) around the specified eating time is sufficient to lose weight efficiently (see results in Table 1).
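      The relationship between the two figures can be illustrated with a short sketch. It is only an illustration of the arithmetic described above, not part of the study's analysis code: the function names and the "HH:MM" time format are assumptions made for this example, and windows that cross midnight are deliberately not handled.

```python
from datetime import datetime


def window_width_minutes(half_width_minutes: float) -> float:
    """Total width of a symmetric window of +/- half_width_minutes (hypothetical helper)."""
    return 2 * half_width_minutes


def within_window(target: str, actual: str, half_width_minutes: float) -> bool:
    """Return True if `actual` lies within +/- half_width_minutes of `target`.

    Both times are "HH:MM" strings on the same day (an assumption of this sketch).
    """
    fmt = "%H:%M"
    t = datetime.strptime(target, fmt)
    a = datetime.strptime(actual, fmt)
    deviation = abs((a - t).total_seconds()) / 60
    return deviation <= half_width_minutes


if __name__ == "__main__":
    # +/- 30 min as instructed (60 min in total), +/- 45 min as found sufficient (90 min in total)
    print(window_width_minutes(30), window_width_minutes(45))
    # A meal 40 min after the specified time misses the 60-min window but falls inside the 90-min one
    print(within_window("12:00", "12:40", 30), within_window("12:00", "12:40", 45))
```

      In this sketch, a meal eaten 40 minutes after the specified time lies outside the window participants were instructed to keep but inside the wider window that the later analysis found to be sufficient.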

      We have checked all references to figures, tables and statistical analyses and updated them if necessary.

      (10) Discuss Limitations and Bias: More thoroughly discuss the limitations of your study, including the potential impacts of biases and how they were mitigated. Additionally, consider the effects of including shift workers and how this choice impacts the applicability of your findings.

      Section “3.1 Limitations” has now been supplemented by a number of points and discussions. As described above, we do not consider the inclusion of shift workers to be a limitation as long as they are able to adhere to the specifications of the eating time plan. We cannot derive any indications to the contrary from our data.

      (11) Consider Publishing Separate Manuscripts: If the study encompasses a wide range of outcomes or post-hoc analyses, consider separating these into distinct publications to allow for a more focused and detailed exploration of each set of findings.

      We will take this advice into consideration for future publications on the continuation of the study. As this is a pilot study that is intended to clarify whether and to what extent the intervention is effective, we believe it makes sense to report all the data in a single publication.

      (12) By addressing these recommendations, the authors can significantly improve their manuscript's clarity, reliability, and impact. This would not only support the dissemination of their findings but also would contribute valuable insights into the growing field of chrononutrition.

      We hope that we have satisfactorily answered, discussed and implemented the points mentioned by the reviewer in the manuscript, so that clarity, reliability, and impact have been increased and it can offer a valuable contribution to the named field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The report describes the control of the activity of the RNA-activated protein kinase, PKR, by the Vaccinia virus K3 protein. Repressive binding of K3 to the kinase prevents phosphorylation of its recognised substrate, EIF2α (the α subunit of the Eukaryotic Initiation Factor 2). The interaction of K3 is probed by saturation mutation within four regions of PKR chosen by modelling the molecules' interaction. They identify K3-resistant PKR variants, which show that the K3/EIF2α-binding surface of the kinase is malleable. This is reasonably interpreted as indicating the potential adaptability of this antiviral protein to combat viral virulence factors.

      Strengths:

      This is a well-conducted study that probes the versatility of the antiviral response to escape a viral inhibitor. The experimentation is very diligent, generating and screening a large number of variants to recognise the malleability of residues at the interface between PKR and K3.

      Weaknesses:

      (1) These are minor. The protein interaction between PKR and K3 has been previously well-explored through phylogenetic and functional analyses and molecular dynamics studies, as well as with more limited site-directed mutational studies using the same experimental assays.

      Accordingly, these findings largely reinforce what had been established rather than making major discoveries.

      First, thank you for your thoughtful feedback. We agree that our results are concordant with previous findings and recognize the importance of emphasizing what we find novel in our results. We have revised the introduction (lines 65-74 of the revised_manuscript.pdf) to emphasize three findings of interest: (1) the PKR kinase domain is largely pliable across its substrate-binding interface, a remarkable quality that is most fully revealed through a comprehensive screen, (2) we were able to differentiate variants that render PKR nonfunctional from those that are susceptible to Vaccinia K3, and (3) we observe a strong correlation between PKR variants that are resistant to K3 WT and K3-H47R.

      There are some presumptions:

      (2) It isn't established that the different PKR constructs are expressed equivalently so there is the contingency that this could account for some of the functional differences.

      This is an excellent point. We have revised the manuscript to raise this caveat in the discussion (lines 247-251). One indirect reason to suppose that expression differences among our PKR variants are not a dominant source of variation is that we did not observe much variation in kinase activity in the absence of K3.

      (3) Details about the conformation of PKR used to model the interaction aren't given so it isn't clear how accurately the model captures the active kinase state. This is important for the interaction with K3/EIF2α.

      We have expanded on Supplemental Figure 12 and our description of the AlphaFold2 models in the Materials and Methods section (lines 573-590). We clarify that these models may not accurately capture the phosphoacceptor loop of eIF2α (residues Glu49-Lys60) and the PKR β4-5 linker (Asp338-Asn350) as these are highly flexible regions that are absent in the existing crystal structure complex (PDB 2A1A) and have low AlphaFold2 confidence scores (pLDDT < 50). We also noted, in the Materials and Methods section and in the caption of Figure 1, that the modeled eIF2α closely resembles the crystal structure of standalone yeast eIF2α, which places the Ser51 phosphoacceptor site far from the PKR active site. Thus, we expect there are additional undetermined PKR residues that contact eIF2α.

      (4) Not all regions identified to form the interface between PKR and K3 were assessed in the experimentation. It isn't clear why residues between positions 332-358 weren't examined, particularly as this would have made this report more complete than preceding studies of this protein interaction.

      Great question. We designed and generated the PKR variant library based on the vaccinia K3 crystal structure (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A), in which PKR residues 338-350 are absent. After the genesis of the project, we generated the AlphaFold2-predicted complex of PKR and vaccinia K3, and have become very interested in the β4-β5 linker, a region that is highly diverse across PKR homologs and includes residues 332-358. However, this region remains unexamined in this manuscript.

      Reviewer #2 (Public Review):

      Chambers et al. (2024) present a systematic and unbiased approach to explore the evolutionary potential of the human antiviral protein kinase R (PKR) to evade inhibition by a poxviral antagonist while maintaining one of its essential functions.

      The authors generated a library of 426 single-nucleotide polymorphism (SNP)-accessible non-synonymous variants of PKR kinase domain and used a yeast-based heterologous virus-host system to assess PKR variants' ability to escape antagonism by the vaccinia virus pseudo-substrate inhibitor K3. The study identified determinant sites in the PKR kinase domain that harbor K3-resistant variants, as well as sites where variation leads to PKR loss of function. The authors found that multiple K3-resistant variants are readily available throughout the domain interface and are enriched at sites under positive selection. They further found some evidence of PKR resilience to viral antagonist diversification. These findings highlight the remarkable adaptability of PKR in response to viral antagonism by mimicry.
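      To make the term "SNP-accessible" concrete, the following minimal sketch enumerates the amino acids reachable from a codon by a single nucleotide change. It uses the standard genetic code; the function name `snp_accessible` is hypothetical, and excluding stop codons is an assumption made here, not a statement of how the library was actually designed.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def snp_accessible(codon: str) -> set[str]:
    """Amino acids reachable from `codon` by one nucleotide change.

    Synonymous changes and stop codons are excluded (an assumption of this sketch).
    """
    original = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != original and aa != "*":
                reachable.add(aa)
    return reachable


# A lysine codon (AAA) can reach arginine (AGA) with one change,
# the kind of substitution seen in K296R.
print(sorted(snp_accessible("AAA")))
```

      Because each codon has only nine single-nucleotide neighbours, a SNP-accessible library covers a subset of the 19 possible substitutions at each site, which is why such a library is smaller than a full saturation mutagenesis library.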

      Significance of the findings:

      The findings are important with implications for various fields, including evolutionary biology, virus-host interfaces, genetic conflicts, and antiviral immunity.

      Strength of the evidence:

      Convincing methodology using state-of-the-art mutational scanning approach in an elegant and simple setup to address important challenges in virus-host molecular conflicts and protein adaptations.

      Strengths:

      Systematic and Unbiased Approach:

      The study's comprehensive approach to generating and characterizing a large library of PKR variants provides valuable insights into the evolutionary landscape of the PKR kinase domain. By focusing on SNP-accessible variants, the authors ensure the relevance of their findings to naturally occurring mutations.

      Identification of Key Sites:

      The identification of specific sites in the PKR kinase domain that confer resistance or susceptibility to a poxvirus pseudosubstrate inhibition is a significant contribution.

      Evolutionary Implications:

      The authors performed meticulous comparative analyses throughout the study between the functional variants from their mutagenesis screen ("prospective") and the evolutionarily-relevant past adaptations ("retrospective").

      Experimental Design:

      The use of a yeast-based assay to simultaneously assess PKR capacity to induce cell growth arrest and susceptibility/resistance to various VACV K3 alleles is an efficient approach. The combination of this assay with high-throughput sequencing allows for the rapid characterization of a large number of PKR variants.

      Areas for Improvement:

      (5) Validation of the screen: The results would be strengthened by validating results from the screen on a handful of candidate PKR variants, either using a similar yeast heterologous assay, or - even more powerfully - in another experimental system assaying for similar function (cell translation arrest) or protein-protein interaction.

      Thank you for your thoughtful feedback. We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and found that the results generally support our original findings. We have revised the manuscript to include these validation experiments (lines 117-119 of the revised_manuscript.pdf, Supplemental Figure 4).

      (6) Evolutionary Data: Beyond residues under positive selection, the screen would allow the authors to also perform a comparative analysis with PKR residues under purifying selection. Because they are assessing one of the most conserved ancestral functions of PKR (i.e. cell translation arrest), it may also be of interest to discuss these highly conserved sites.

      This is a great point. We do find that there are regions of the PKR kinase domain that are not amenable to genetic perturbation, namely in the glycine-rich loop and active site. We contrast the PKR functional scores at conserved residues under purifying selection with those under positive selection in Figure 2E (lines 141-143).

      (7) Mechanistic Insights: While the study identifies key sites and residues involved in vaccinia K3 resistance, it could benefit from further investigation into the underlying molecular mechanisms. The study's reliance on a single experimental approach, deep mutational scanning, may introduce biases and limit the scope of the findings. The authors may acknowledge these limitations in the Discussion.

      We agree that further investigation into the underlying molecular mechanisms is warranted and we have revised the manuscript to acknowledge this point in the discussion (lines 284-288).

      (8) Viral Diversity: The study focuses on the viral inhibitor K3 from vaccinia. Expanding the analysis to include other viral inhibitors, or exploring the effects of PKR variants on a range of viruses would strengthen and expand the study's conclusions. Would the identified VACV K3-resistant variants also be effective against other viral inhibitors (from pox or other viruses)? Or in the context of infection with different viruses? Without such evidence, the authors should check that the manuscript is specific about its conclusions.

      This is a fantastic question that we are interested in exploring in our future studies. In the manuscript we note a strong correlation between PKR variants that evade vaccinia wild-type K3 and the K3-H47R enhanced allele, but we are curious to know if this holds when tested against other K3 orthologs such as variola virus C3. That said, we have revised the manuscript to clarify this limitation to our findings and specify vaccinia K3 where appropriate.

      Reviewer #3 (Public Review):

      Summary:

      -  This study investigated how genetic variation in the human protein PKR can enable sensitivity or resistance to a viral inhibitor from the vaccinia virus called K3.

      -  The authors generated a collection of PKR mutants and characterized their activity in a high-throughput yeast assay to identify 1) which mutations alter PKR's intrinsic biochemical activity, 2) which mutations allow for PKR to escape from viral K3, and 3) which mutations allow for escape from a mutant version of K3 that was previously known to inhibit PKR more efficiently.

      -  As a result of this work, the authors generated a detailed map of residues at the PKR-K3 binding surface and the functional impacts of single mutation changes at these sites.

      Strengths:

      -  Experiments assessed each PKR variant against three different alleles of the K3 antagonist, allowing for a combinatorial view of how each PKR mutant performs in different settings.

      -  Nice development of a useful, high-throughput yeast assay to assess PKR activity, with highly detailed methods to facilitate open science and reproducibility.

      -  The authors generated a very clean, high-quality, and well-replicated dataset.

      Weaknesses:

      (9) The authors chose to focus solely on testing residues in or near the PKR-K3 predicted binding interface. As a result, there was only a moderately complex library of PKR mutants tested. The residues selected for investigation were logical, but this limited the potential for observing allosteric interactions or other less-expected results.

      First, we greatly appreciate all your feedback on the manuscript, as well as raising this particular point. We agree that this is a moderately complex library of PKR variants, from which we begin to uncover a highly pliable domain with a few specific sites that cannot be altered. We have revised the manuscript to raise this limitation (lines 284-288 of the revised_manuscript.pdf) and encourage additional exploration of the PKR kinase domain.

      (10) For residues of interest, some kind of independent validation assay would have been useful to demonstrate that this yeast fitness-based assay is a reliable and quantitative readout of PKR activity.

      We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and generally found that the results support our original findings. We have revised the manuscript to include this validation experiment (lines 117-119, Supplemental Figure 4).

      (11) As written, the current version of the manuscript could use more context to help a general reader understand 1) what was previously known about these PKR and K3 variants, 2) what was known about how other genes involved in arms races evolve, or 3) what predictions or goals the authors had at the beginning of their experiment. As a result, this paper mostly provides a detailed catalog of variants and their effects. This will be a useful reference for those carrying out detailed, biochemical studies of PKR or K3, but any broader lessons are limited.

      Thank you for bringing this to our attention. We have revised the introduction of the manuscript to provide more context regarding previous work demonstrating an evolutionary arms race between PKR and K3 and how single residue changes alter K3 resistance (lines 51-64).

      (12) I felt there was a missed opportunity to connect the study's findings to outside evolutionary genetic information, beyond asking if there was overlap with PKR sites that a single previous study had identified as positively selected. For example, are there any signals of balancing selection for PKR? How much allelic diversity is there within humans, and are people typically heterozygous for PKR variants? Relatedly, although PKR variants were tested in isolation here, would the authors expect their functional impacts to be recessive or dominant, and would this alter their interpretations? On the viral diversity side, how much variation is there among K3 sequences? Is there an elevated evolutionary rate, for example, in K3 at residues that contact PKR sites that can confer resistance? None of these additions are essential, but some kind of discussion or analysis like this would help to connect the yeast-based PKR phenotypic assay presented here back to the real-world context for these genes.

      We appreciate this suggestion to extend our findings to a broader evolutionary context. There is little allelic diversity of PKR in humans, with all nonsynonymous variation listed in gnomAD being rare. (PKR shows sequence diversity in comparisons across species, including across primates.) Thus, barring the possibility of variation being present in under-studied populations, there is unlikely to be balancing selection on PKR in humans. Our expectation is that beneficial mutations in PKR for evading a pseudosubstrate inhibitor would be dominant, as a small amount of eIF2α phosphorylation is capable of halting translation (Siekierka, PNAS, 1984). There is a recent report citing PKR missense variants associated with dystonia that can be dominantly or recessively inherited (Eemy et al. 2020 PMID 33236446). Elde et al. 2009 (PMID 19043403) note that poxvirus K3 homologs are under positive selection, but no specific residues have been reported to be under positive selection. The lack of allelic diversity in PKR in humans notwithstanding, PKR could experience future selection in the human population as evidenced by its rapid evolution in primates, so we fully agree that a connection to the real-world context is useful. We have noted these topics in the discussion section (lines 289-294).

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticisms but ask for some clarifications and make some comments about the perceived weaknesses.

      (13)  If the authors disagree with my summation that the findings largely replicate what was known, could they detail how the findings differ from what was known about this protein interaction and the major new insights stemming from the study? Currently, the abstract is a little philosophical rather than listing the explicit discoveries of the study.

      Thank you again for raising the need for us to clearly convey the novelty of our findings. We have revised the final paragraph in our introduction as described in comment #1.

      (14) As the experimental approach is well reported it is unnecessary to confirm the proposed activity by, for instance, measures of Sui2 phosphorylation. However, previous reports have recognised that point mutants of PKR can be differentially expressed. The impact of this potential effect is unknown in the current experimentation as there are no measures of the expression of the different mutant PKR constructs. The large number of constructs used makes this verification onerous. The potential impact could be ameliorated by redundantly replacing each residue (hoping different residues have different effects on expression). Still, this limitation of the study should be acknowledged in the text.

      We greatly appreciate this comment and agree that this should be made clear in the text, which we have added to the discussion of the manuscript (lines 247-251).

      (15) Preceding findings and the modeling in this report recognise an involvement in the kinase insert region (residues 332 to 358) in PKR's interaction with K3 but this region is excluded from the analysis. These residues have been largely disregarded in the preceding analysis (it is absent from the molecular structure of the kinase) so its inclusion here might have lent a more novel aspect or delivered a more complete investigation. Is there a justification for excluding this flexible loop?

      The PKR variant library was designed based on the crystal structure of K3 (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A). After the library was designed and made, we obtained complete predicted structures of PKR in complex with eIF2α and K3, which largely agree with the crystal structures but contain the additional flexible loops that were not captured in them. Though the library studied here does not explore variation in the kinase insert region, we are very interested in doing so in our future studies.

      (16)  Could the explanation of the 'PKR functional score' be clarified? The description given within the legend of SF1 was helpful, so could this be replicated earlier in the main body of the text when introducing these experiments? e.g. As PKR activity is toxic to yeast, the number of cells in the pool expressing the functional PKR will decrease over time. Thus the associated barcode read count will also decrease, while the read count for the nonfunctional PKR will increase. This is termed the PKR function score, which will be relatively lower for cells transformed with less active PKR than those with more active PKR.

      Thank you for suggesting this clarification. We have revised the manuscript to clarify our definition of the PKR functional score (lines 106-109).
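      The logic described in the comment above (barcodes linked to active PKR are depleted from the pool, while those linked to inactive PKR are enriched) can be sketched as follows. This is an illustration only: the function `depletion_score`, the pseudocount, and the read counts are hypothetical, and the formula is a generic log-ratio, not necessarily the exact functional score defined in the manuscript.

```python
import math


def depletion_score(count_start, total_start, count_end, total_end, pseudocount=0.5):
    """Negative log2 fold change in barcode frequency between two timepoints.

    Higher values mean stronger depletion from the pool, i.e. a more
    active (more growth-inhibiting) PKR variant under this sketch.
    """
    freq_start = (count_start + pseudocount) / total_start
    freq_end = (count_end + pseudocount) / total_end
    return -math.log2(freq_end / freq_start)


# Hypothetical read counts out of one million reads per timepoint
active = depletion_score(1000, 10**6, 250, 10**6)     # barcode depleted over time
inactive = depletion_score(1000, 10**6, 4000, 10**6)  # barcode enriched over time

print(f"active-like variant:   {active:.2f}")
print(f"inactive-like variant: {inactive:.2f}")
```

      Under this convention a depleted barcode receives a positive score and an enriched barcode a negative one, matching the orientation in which less active PKR scores lower than more active PKR.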

      (17)  Another suggestion to clarify this term is to modify the figures. Currently, the intent of the first simulated graph in Fig 1E is clear but the inversion of the response (shown by the transposition of the colours) in the next graph (to the right) is less immediately obvious. Accordingly, the orientation of the 'PKR functional score' is uncertain. Could the authors add text to the rightmost graphic in Figure 1E by, for instance, indicating the PKR activity in the vertical column with text such as 'less active' (at the bottom), 'WT' (in the centre), and 'more activity' (at the top)? Also, the position of the inactive K296R mutant might be added to Figure 2A complementing the positioning of the active WT kinase in the first data graph of this kind.

      We appreciate your specific feedback to improve the figures of the manuscript. We have made adjustments to Figure 1E to clarify how we derive the PKR functional scores.

      (18) The authors don't use existing structures of PKR in their modelling. However, there is no information about the state of the PKR molecule used for modelling. Specific elements of the kinase domain affect its interaction with K3 so it would be informative to know the orientation of these elements in the model. Could the authors detail the state of pivotal kinase elements in their models? This could involve the alignment of the N- and C-lobes, the orientation of kinase spines (C- and R-spines), and the phosphorylation status of residues in the activation loop, or at least the position of this loop in relationship to that adopted in the active dimeric kinase (e.g. PDB-2A1A, 3UIU or 6D3L). Alternatively, crystallographic structures of active and inactive PKR could be overlayed with the theoretical structure used for modelling (as supplementary information).

      We have revised the manuscript to describe the alignment of the predicted PKR-K3 complex with active and inactive PKR, and we have extended Supplemental Figure 12 with an overlay of the predicted structures with existing structures. We have also added a supplemental data file containing the RMSD values of PKR (from the predicted PKR-K3 complex) aligned to active (PDB 2A1A) and inactive (PDB 3UIU) or unphosphorylated (PDB 6D3L) PKR (5_Structure-Alignment-RMSD-Values.xlsx). We have also provided the AlphaFold2 best model predictions for the PKR-eIF2α complex (6_AF2_PKR-KD_eIF2a.pdb) and PKR-K3 complex (7_AF2_PKR-KD_VACV-K3.pdb). Looking across the RMSD values, the AlphaFold2 model of PKR most closely resembles unphosphorylated PKR (PDB 6D3L), though we note the activation loop is absent from PDB 6D3L and 3UIU. We also aligned the Ser51 phosphoacceptor loop of the AlphaFold2 eIF2α model to PDB 1Q46, and we see that the model reflects the pre-phosphorylation state. This loop is expected to interact with the PKR active site, which is not captured in our model, and we state this explicitly in the caption of Figure 1 (lines 665-668).
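      For readers unfamiliar with the metric, the RMSD between two structures is the root of the mean squared distance between matched atoms. The sketch below assumes the two coordinate sets have already been superposed and matched atom-for-atom (for example, Cα atoms of equivalent residues); it does not perform the alignment itself, and the function name `rmsd` and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and the atoms are
    listed in matching order; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (not taken from any PDB entry)
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(model, reference), 3))
```

      A lower RMSD against one reference structure than another indicates that the model sits closer to that conformational state, which is the comparison summarised in the RMSD table described above.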

      (19) Could some specific residue in Figure 7 be labelled (numbered) to orient the findings? Also, the key in this figure doesn't title the residues coloured white (RE red/black/blue). The white also isn't distinguished from the green (outside the regions targeted for mutagenesis).

      Excellent suggestion. We have revised this figure to include labels for the sites to orient the reader and clarify our categorization of PKR residues in the kinase domain.

      (20)  Regarding the discussion, the authors adopt the convention of describing K3 as a pseudosubstrate. Although I realize it is common to refer to K3 as a pseudosubstrate, it isn't phosphorylated and binds slightly differently to PKR so alternative descriptors, such as 'a competitive binder', would more accurately present the protein's function. Possibly for this reason, the authors declared an expectation that evolution pressures should shift K3 to precisely mimic EIF2α. However, closer molecular mimicry shouldn't be expected for two reasons. The first is a risk of disrupting other interactions, such as the EIF2 complex. Secondly, equivalent binding to PKR would demote K3 to merely a stoichiometric competitor of EIF2α. In this instance, effective inhibition would require very high levels of K3 to compete with equivalent binding by EIF2α. This would be demanding particularly upon induction of PKR during the interferon response. To be an effective inhibitor K3 has to bind more avidly than EIF2α and merely requires a sufficient overlap with the EIF2α interface on PKR to disrupt this alternative association. This interpretation predicts that K3 is under pressure to bind PKR by a different mechanism than EIF2α.

      We appreciate your thoughtful point about the usage of the term pseudosubstrate. Ultimately, we’ve decided to continue using the term due to its historical usage in the field. The question of the optimal extent of mimicry in K3 is a fascinating one, and we greatly appreciate your thoughts. We wholly agree that the possibility of K3 having superior PKR binding relative to eIF2α would be preferable to perfect mimicry. In our Ideas and Speculation section, we propose that benefits towards increasing PKR affinity may need to be balanced against potential loss of host range resulting from overfitting to a given host’s PKR. However, the possibility that reduced mimicry could be selected to avoid disruption of eIF2 function had not occurred to us; thank you for pointing it out!

      (21) The discussion of the 'positive selection' of sites is also interesting in this context. To what extent has the proposed positive selection been quantified? My understanding is that all of the EIF2α kinases are conserved and so demonstrate lower levels of residue change than might be expected by random mutagenesis i.e. variance is under negative selection. The relatively higher rate of variance in PKR orthologs compared to other EIF2α kinases could reflect some relaxation of these constraints, rather than positive selection. Greater tolerance of change may stem from PKR's more sporadic function in the immune response (infrequent and intermittent presence of its activating stimuli) rather than the ceaseless control of homeostasis by the other EIF2α kinases. Also, induction of PKR during the immune response might compensate for mutations that reduce its activity. I believe that the entire clade of extant poxviruses is young relative to the divergence between their hosts. Accordingly, genetic variance in PKR predates these viruses. Although a change in PKR may become fixed if it affords an advantage during infection, such an advantage to the host would be countered by the much higher mutation rates of the virus. This would appear to diminish the opportunity for a specific mutation to dominate a host population and, thereby, to differentiate host species. Rather, pressure to elude control by a rapidly evolving viral factor would favour variation at sites where K3 binds. This speculation offers an alternative perspective to the current discussion that the variance in PKR orthologs stems from positive selection driven by viral infection.

      We appreciate this stimulating feedback for discussion. Three of the four eIF2α kinases (HRI, PERK, and GCN2) appear to be under purifying selection (Elde et al. 2009, PMID 19043403), which stands in contrast to PKR. Residues under positive selection have been found throughout PKR, including the dsRNA binding domains, linker region, and the kinase domain. Importantly, the selection analyses from Elde et al. and Rothenburg et al. concluded that positive selection at these sites is more likely than relaxed selection. We agree that poxviruses are young, though we would guess that viral pseudosubstrate inhibition of PKR is ancient. Many viral proteins have been reported to directly interact with PKR, including herpes virus US11, influenza A virus NS1A, hepatitis C virus NS5A, and human immunodeficiency virus Tat. The PKR kinase domain does contain residues under purifying selection that are conserved among all four eIF2α kinases, but it also contains residues under positive selection that interface with the natural substrate eIF2α. Our work suggests that PKR is genetically pliable across several sites in the kinase domain, and we are curious to know if this pliability would hold at the same sites across the other three eIF2α kinases.

      (22) The manuscript is very well written but has a small number of typos; e.g. an aberrant 'e' ln 7 of the introduction, capitalise the R in ranavirus on the last line of the fourth paragraph of the discussion, and eIF2α (EIF2α?) is occasionally written as eIFα in the materials&methods.

      Thank you for bringing these typos to our attention! We’ve deleted the aberrant ‘e’ in the introduction, capitalized ‘Ranavirus’ in the discussion (line 265), and corrected ‘eIFα’ to ‘eIF2α’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional minor edits or revisions:

      (23) Paragraph 3 of the Introduction gives the impression that most of the previous work on the PKR-virus arms race is speculative. However, it is one of the best-described and most convincing examples of virus-host arms races. Can the authors edit the paragraph accordingly?

      Thank you for bringing this to our attention. We have revised the third paragraph and strengthened the description of the evolutionary arms race between PKR and viral pseudosubstrate antagonists.

      (24) Introduction: PKR has "two" double-stranded RNA binding domains. Can the authors update the text accordingly?

      We have updated the manuscript to clarify PKR has two dsRNA binding domains (lines 44-45).

      (25) The authors test here for one of the key functions of PKR: cell growth/translation arrest. Because of PKR pleiotropy, the manuscript may be edited accordingly: For example, statements such as "We found few genetic variants render the PKR kinase domain nonfunctional" are too speculative as they may retain other (not tested here) functions.

      This is a great suggestion. We have revised the manuscript to specify our definition of nonfunction in the context of our experimental screen (lines 86-92 and 106-109) and to acknowledge this limitation of the screen (lines 304-307).

      (26) The authors should specify "vaccinia" K3 whenever appropriate.

      We appreciate this comment and have revised the manuscript to specify vaccinia K3 where appropriate (e.g. lines 62, 66, 70, 80, 108, and 226).

      (27) Ref for ACE2 diversification may include Frank et al 2022 PMID: 35892217.

      Thank you for pointing us to this paper, we have included it as a reference in the manuscript (line 277).

      (28) Positive selection of PKR as referred to by the authors corresponds to analyses performed in primates. As shown by several studies, the sites under positive selection may vary according to host orders. Can the authors specify this ("primate") in their manuscript? And/or shortly discuss this aspect.

      Thank you for raising this point. In the manuscript we performed our analysis using vertebrate sites under positive selection as identified in Rothenburg et al. 2009 PMID 19043413 (line 51 and figure legends). We performed the same analysis using sites under positive selection in primates (as identified by Elde et al. 2009 PMID 19043403) and again found a significant difference in PKR functional scores versus K3. We have revised the manuscript to clarify our use of vertebrate sites under positive selection (lines 80-81).

      (29) "We view deep mutational scanning experiments as a complementary approach to positive selection": The authors should edit this and acknowledge previous and similar work of other antiviral factors, in particular one of the first studies of this kind on MxA (Colon-Thillet et al 2019 PMID: 31574080), and TRIM5 (Tenthorey et al 2020 PMID: 32930662).

      Thank you for raising these two papers, which we acknowledge in the revised manuscript (line 299).

      (30) We believe Figure S7 brings important results and should be placed in the Main.

      We appreciate this suggestion, and have moved the contents of the former supplementary Figure 7 to the main text, in Figure 6.

      (31) The title may specify "poxvirus".

      Thank you for the suggestion to specify the nature of our experiment, we have adjusted the title to: Systematic genetic characterization of the human PKR kinase domain highlights its functional malleability to escape a poxvirus substrate mimic (line 3).

      Reviewer #3 (Recommendations For The Authors):

      (32) No line numbers or page numbers are provided, which makes it difficult to comment.

      We sincerely apologize for this oversight and have included line numbers in our revised manuscript as well as the tracked changes document.

      (33) In the introduction, I recommend defining evolutionary arms races more clearly for a broad audience.

      Thank you for this suggestion. We have revised the manuscript in the first and third paragraphs to more clearly introduce readers to the concept of an evolutionary arms race.

      (34) The introduction could use a clearer statement of the question being considered and the gap in knowledge this paper is trying to address. Currently, the third paragraph includes many facts about PKR and the fourth paragraph jumps straight into the approach and results. Some elaboration here would convey the significance of the study more clearly. As is, the introduction reads a bit like "We wanted to do deep mutational scanning. PKR seemed like an ok protein to look at", rather than conveying a scientific question.

      This is a great suggestion to improve the introduction section. We have heavily revised the third and fourth paragraphs of the introduction to clarify the motivation, approach, and significance of our work.

      (35) Relatedly, did the authors have any hypotheses at the start of the experiment about what kinds of results they expected? e.g. What parts of PKR would be most likely to generate escape mutants? Would resistant mutants be rare or common? etc? This would help the reader to understand which results are expected vs. surprising.

      These are all great questions. We have revised the introduction of the manuscript to point out that previous studies have characterized a handful of PKR variants that evade vaccinia K3, and these variants were made at sites found to be under positive selection (lines 60-64).

      (36) A description of the different K3 variants and information about why they were chosen for study should also be added to the Introduction. It was not until Figure 5 that the reader was told that K3-H47R was the same as the 'enhanced' K3 allele you are testing.

      Thank you for bringing this to our attention, we have revised the introduction to clarify the experimental conditions (lines 65-67) and specify K3-H47R as the enhanced allele earlier in the manuscript (line 100).

      (37) Does every PKR include just a single point mutation? It would be nice to see data about the number and types of mutations in each PKR window added to Supplemental Figure 1.

      Thank you for the suggestion to improve this figure. Every PKR variant that we track has a single point mutation that generates a nonsynonymous mutation. In our PacBio sequencing of the PKR variant library we identified a few off-target variants or sequences with multiple variants, but we identified the barcodes linked to those constructs and discarded those variants in our analysis. We have revised Supplemental Figure 1 to include the number and types of mutations made at each PKR window.
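
      The filtering step described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the toy wild-type sequence, and the barcode table are all hypothetical, and the sketch assumes each barcode has already been linked to a translated protein sequence.

```python
def substitutions(wild_type, variant):
    """List (position, wt_residue, variant_residue) for each mismatch (1-based)."""
    return [
        (i + 1, wt, var)
        for i, (wt, var) in enumerate(zip(wild_type, variant))
        if wt != var
    ]


def keep_single_substitutions(wild_type, barcode_to_protein):
    """Keep barcodes whose protein differs from wild type at exactly one residue.

    Wild-type sequences, multi-site variants, and length-changing (off-target)
    constructs are discarded.
    """
    kept = {}
    for barcode, protein in barcode_to_protein.items():
        if len(protein) != len(wild_type):
            continue  # insertion/deletion: treated as off-target
        diffs = substitutions(wild_type, protein)
        if len(diffs) == 1:
            pos, wt, var = diffs[0]
            kept[barcode] = f"{wt}{pos}{var}"
    return kept


# Hypothetical toy data (not real PKR sequence or barcodes).
WILD_TYPE = "MKTAY"
LIBRARY = {
    "AAAC": "MKTAY",   # wild type, no substitution
    "AAGT": "MKSAY",   # single substitution T3S
    "CCGT": "MRSAY",   # two substitutions, discarded
    "GGTA": "MKTAF",   # single substitution Y5F
    "TTTT": "MKTA",    # truncated, discarded
}

filtered = keep_single_substitutions(WILD_TYPE, LIBRARY)
print(filtered)  # {'AAGT': 'T3S', 'GGTA': 'Y5F'}
```

      In this sketch the comparison is made at the protein level, so synonymous nucleotide changes would appear as wild type and be dropped along with multi-site variants.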

      (38) In terms of the paper's logical flow, personally, I would expect to begin by testing which variants break PKR's function (Figure 3) and then proceeding to see which variants allow for K3 escape (Figure 2). Consider swapping the order of these sections.

      Thank you for this suggestion, and we can appreciate how the flow of the manuscript may be improved by swapping Figures 2 and 3. We have decided to maintain the current order of the figures because we use Figure 3 to emphasize the distinction of PKR sites that are nonfunctional versus susceptible to vaccinia K3.

      (39) Figure 3A seems like a less-informative version of Figure 4A, recommend combining these two. Same comment with Figure 5A and Figure 6A.

      We appreciate this specific feedback for the figures. Though there are similarities between figure panels (e.g. 3A and 4A) we use them to emphasize different points in each figure. For example, in Figure 3 we emphasize the general lack of variants that impair PKR kinase activity, and in Figure 4 we distinguish kinase-impaired variants from K3-susceptible variants. For this reason, and given space constraints, we have chosen to maintain the figures separately. We did decide to move the former Figure 6 to the supplement.

      (40) In general, it felt like there was a lot of repetition/re-graphing of the same data in Figures 3-6. I recommend condensing some of this, and/or moving some of the panels to supplemental figures.

      Thank you for your suggestion, we have revised the manuscript and have moved Figure 6 to Supplemental Figure 7.

      (41) In contrast, Supplemental Figure 7 is helpful for understanding the distribution of the data. Recommend moving to the main text.

      This is a great recommendation, and we have moved Supplemental Figure 7 into Figure 6.

      (42) How do the authors interpret an enrichment of positively selected sites in K3-resistant variants, but not K3-H47R-resistant variants? This seems important. Please explain.

      Thank you for this suggestion to improve the manuscript; we agree that this observation warranted further exploration. We found a strong correlation in PKR functional scores between K3 WT and K3-H47R, and accordingly we find that sites under positive selection that are resistant to K3 WT are also resistant to K3-H47R. The lack of enrichment at positively selected sites appears to be caused by a collapsed dynamic range between PKR wild-type-like and nonfunctional variants in the K3-H47R screen. We have revised the manuscript to clarify this point (lines 202-204).

      (43) Discussion: The authors compare and contrast between PKR and ACE2, but it would be worth mentioning other examples of genes involved in antiviral arms races wherein flexible, unstructured loops are functionally important and are hotspots of positive selection (e.g. MxA, NLRP1, etc).

      We greatly appreciate this suggestion to improve the discussion. We note this contrast between the PKR kinase domain and the flexible linkers of MxA and NLRP1 in the revised manuscript (lines 273-274).

      (44) Speculation section: What is the host range of the vaccinia virus? Is it likely to be a generalist amongst many species' PKRs (and if so, how variable are those PKRs)? Would be worth mentioning for context if you want to discuss this topic.

      Thank you for raising this question. Vaccinia virus is the most well studied of the poxviruses, having been used as a vaccine to eradicate smallpox, and serves as a model poxvirus. Vaccinia virus has a broad host range, and though the name vaccinia derives from the Latin word “vacca” for cow, the virus's origin remains uncertain (Smith 2007 https://doi.org/10.1007/978-3-7643-7557-7_1). The natural host of vaccinia virus is unknown, though there is some evidence to suggest it may be native to rabbits. It does appear to be a generalist and a general inhibitor of vertebrate PKRs.

      (45) Many papers in this field discuss interactions between PKR and K3L, rather than K3. I understand that this is a gene vs. protein nomenclature issue, but consider matching the K3L literature to make this paper easier to find.

      Thank you for bringing this to our attention. We have revised the manuscript to specify that vaccinia K3 is expressed from the K3L gene in both the abstract (line 26) and the introduction (line 56) to help make this paper easier to find when searching for “K3L” literature.

      (46) Which PKR sequence was used as the wild-type background?

      This is a great question. We used the predominant allele circulating in the human population, represented by GenBank m85294.1:31-1686. We cite this sequence in the Methods (line 421) and have added it to the results section as well (line 84).

      (47) Figure 1C: the black dashed line is difficult to see. Recommend changing the colors in 1A-1C.

      Thank you for this suggestion, we have changed the dashed lines from black to white to make them more distinguishable.

      (48) Figure 1D: Part of the point of this figure is to convey overlaps between sites under selection, K3 contact sites, and eIF2alpha contact sites, but at this scale, many of the triangles overlap. It is therefore impossible to tell if the same sites are contacted vs. nearby sites. Perhaps the zoomed-in panels showing each of the four windows in the subsequent figures are sufficient?

      Thank you for bringing this to our attention. We have scaled the triangles down to reduce their overlap in Figure 1D and list all sites of interest (predicted eIF2α and vaccinia contacts, conserved sites, and positive selection sites) in the Materials and Methods section “Predicted PKR complexes and substrate contacts”.

      (49) Figure 1E: under "1,293 Unique Combinations", there is a line between the PKR and K3 variants, which makes it look like they are expressed as a fusion protein. I believe these proteins were expressed from the same plasmid, but not as a fusion, so I recommend re-drawing. Then in the graph, the y-axis says "PKR abundance", but from the figure, it is not clear that this refers to relative abundance in a yeast pool. Perhaps "yeast growth" or similar would be clearer?

      Thank you for the specific feedback to improve Figure 1. We have made the suggested edits to clarify that PKR and vaccinia K3 are not fused but each is expressed from their own promoter. We have also changed the y-axis from “PKR Abundance” to “Yeast Growth”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Correct capitalization errors, ensuring the first letter of each sentence is capitalized.

      Thank you for your comment. We have corrected capitalization errors.

      (2) Ensure that all technical terms and abbreviations are introduced in full when first mentioned and consistently used throughout the text.

      Thank you for your comment. We have checked and corrected the issue.

      (3) Review the manuscript for grammatical errors and improve sentence structures to enhance readability.

      Thank you for your comment. We have checked and corrected the issue.

      (4) Ensure all figures referenced in the text, such as Fig. 3G, are appropriately discussed and integrated into the narrative.

      Thank you for your comment. We have discussed and integrated Fig. 3G into the narrative (Page 12, Line 162-166).

      (5) Maintain consistent formatting, including first-line indentation and spacing before paragraphs, to improve the document's visual coherence.

      Thank you for your comment. We have checked and corrected the issue.

      (6) Provide additional explanations for the selection criteria of final model variables, particularly the rationale behind choosing the λ_1se criterion in the LASSO regression.

      Thank you for your comment. We have provided explanations for choosing the λ_1se criterion in the LASSO regression (Page 25, Line 315-316; Page 27, Line 363-364).
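
      For readers unfamiliar with the λ_1se criterion, the usual "one-standard-error" rule can be sketched as below. This is a minimal illustration of the general convention, assuming per-λ cross-validation means and standard errors are already available; the function name and all numbers are hypothetical and are not taken from the manuscript.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimises the mean cross-validated error; lambda_1se is the
    largest penalty whose mean error lies within one standard error of that
    minimum, which yields a sparser model.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    lambda_1se = max(
        lam for lam, err in zip(lambdas, cv_mean) if err <= threshold
    )
    return lambdas[best], lambda_1se


# Hypothetical cross-validation results for five candidate penalties.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_mean = [0.30, 0.25, 0.26, 0.28, 0.40]
cv_se = [0.02, 0.02, 0.02, 0.02, 0.03]

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lambda_min, lambda_1se)  # 0.01 0.05
```

      Choosing the larger penalty trades a small, statistically indistinguishable increase in cross-validated error for fewer retained predictors.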

      (7) Conduct validation studies with cohorts from other high-altitude regions to assess the generalizability and robustness of the prediction models.

      Thank you for your comment. The lack of validation in cohorts from other high-altitude regions is a weakness of this study. In our follow-up study, we will conduct external validation with cohorts from additional high-altitude regions to assess the generalizability and robustness of our prediction models.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Bockorny, Muthuswamy, and Huang et al. performed proteomics analysis of plasma extracellular vesicles (EVs) from pancreatic ductal adenocarcinoma (PDAC) patients and patients with benign pancreatic diseases (chronic pancreatitis and intraductal papillary mucinous neoplasm, IPMN) to develop a 7-EV protein signature that predicts PDAC. Moreover, the authors identified PSMB4, RUVBL2, and ANKAR as being associated with metastasis. These studies provide important insight into alterations of EVs during PDAC progression, and the data supporting the prediction of PDAC with EV protein signatures are solid. However, there are certain concerns regarding the rigor and novelty of the data analysis and interpretation, as well as the clinical implications, as detailed below.

      (1) Plasma EVs were characterized by transmission electron microscopy and nanoparticle tracking analysis to confirm their morphology and size. The authors should also include an analysis of putative EV markers (e.g., tetraspanins, syntenin, ALIX, etc.) to confirm that the analyzed particles are EVs.

      We thank the reviewer for this comment. In the previous study from our co-authors who developed the EVtrap method (PMID:32396726), they used electron microscopy and NTA, as well as quantification of typical EV protein markers, such as CD9, to confirm that particles isolated using EVtrap had typical characteristics of extracellular vesicles. As such, these experiments were not replicated here. We added the following statement to the manuscript:

      “Previous analyses using electron microscopy and nanoparticle tracking also confirmed that the vast majority of particles isolated by EVtrap had diameters between 100-200 nm, consistent with exosomes (PMID:32396726). In addition, EVtrap isolates demonstrate higher abundance of CD9, a common exosome marker, as compared to isolates from other traditional EV isolation methods such as size exclusion chromatography and ultracentrifugation (PMID:32396726).”

      (2) The authors identified multiple over-expressed proteins in PDAC based on their fold change and p-value; however, due to the heterogeneity of PDAC, it is necessary to show a heatmap displaying their abundance in all samples. High fold change does not necessarily indicate consistently high abundance in all PDAC samples.

      We thank the reviewer for this suggestion. We have now included the heatmap in the new Supplementary Figure 3.

      (3) PSMB4, RUVBL2, and ANKAR were identified as being associated with metastasis. The authors state that they intended to distinguish early and late-stage cancer samples, but it is unclear why they chose to compare metastatic and non-metastatic samples, as the non-metastatic group also includes late-stage cancer samples. This sentence should be rephrased to more accurately reflect the sample types profiled.

      We thank the reviewer for pointing this out. We would like to clarify that the analyses shown in Figures 3B and 3C pertain to patients with Metastatic vs Non-Metastatic disease, not early versus late stage. We edited the text to ensure this information is clear.

      (4) Non-metastatic and metastatic patients were separated based on global protein abundance. The samples within each group display significant heterogeneity, with some samples displaying similar patterns although they were classified into different groups (Figure 3A), and the samples within the same group, particularly the metastasis group, did not consistently exhibit similar patterns of protein abundance. The authors should clarify this point.

      We thank the reviewer for this comment. EV proteomic expression is not expected to show exactly the same pattern across the samples of each group. The purpose of the heatmap in Figure 3 is to show enrichment for patterns of expression, but we acknowledge that not all samples from the same group have the same proteome pattern.

      We added this statement in the discussion section:

      “As expected, the EV proteomic profiles of PDAC patients exhibited significant heterogeneity. While the above mentioned markers exhibited strong association with disease states at population levels, their abundances in individual patients varied significantly. Those observations highlight the need to develop multi-protein panels for pancreatic cancer diagnosis and prognosis.”

      (5) The authors performed the survival analysis on a set of EV proteins but did not specify the origin of these markers or how many markers were examined. The authors should show their abundances across different groups, such as different stages and metastasis status.

      We thank the reviewer for the comments. The goal of this experiment was not to identify EV proteins that performed similarly well for diagnosis and prognostication. In Figures 3A, 3B and 3C, we identified EV proteins that had better performance for diagnosis of metastatic disease. In these experiments we made a comparative analysis between patients with metastasis versus non-metastasis. In the experiment depicted in Figure 3D, the goal was to identify EV markers that had better performance in prognosticating outcomes as measured by overall survival, out of the markers identified in the previous experiments from Figure 3A. We would like to further clarify that, based on our observations and those of others, it has become clear that EV profiles from cancer patients are highly heterogeneous, and we do not anticipate that a single marker will have sufficient test performance for cancer diagnosis or prognosis assessment when measured in isolation. Rather, we anticipate that a panel of markers may yield better performance for diagnosis while a different combination of EV markers may have better performance for prognosis assessment.

      (6) The classification model yielded a 100% accuracy, which may refer to AUC, in their discovery cohort, but it decreased to 89% in the independent cohort. This suggests that the authors have encountered overfitting issues with their model, where it performed well on the discovery cohort but did not generalize well to the independent cohort. The authors should clarify this point. The AUC score of the 7-EV signature is 0.89 and is not equivalent to prediction accuracy. In order to demonstrate prediction accuracy, the authors should show the confusion matrix of training and testing data as well as other evaluation metrics, such as accuracy, precision, and recall.

      We thank the reviewer for providing these insightful comments. As you noted, the 7-biomarker signature machine learning model attained 100% accuracy within the internal Discovery Cohort, raising concerns about potential overfitting. We acknowledge that the difference in AUROC of 0.11 in the external validation cohort surpasses the typical reported range of ~0.06-0.09; nonetheless, the model demonstrated an AUROC of 0.89 in an independent patient cohort. Moreover, the use of an alternate technology to measure protein abundance in the validation dataset underscores the model’s reproducibility and validity. We have provided the model metrics for both the internal and external validation cohorts. For these, please see updated Supplementary Figure 7, as well as the new Supplementary Figure 6 and Supplementary Figure 8. We also amended the discussion section to acknowledge that the validation cohort had a limited sample size and that proteins were measured using a different method. Those factors likely contributed to the lower accuracy of predictions in the validation cohort. We addressed these limitations in the discussion section of the manuscript.
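
      The distinction the reviewer draws between AUC and prediction accuracy can be illustrated with a small sketch. The labels, scores, and 0.5 threshold below are hypothetical and are not data from the study; the AUROC is computed with the standard pairwise (Mann-Whitney) formulation.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 4 cancer (1) and 4 benign (0) samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
print(auc)      # 0.875
```

      Here the same scores give an AUROC of 0.875 but an accuracy of 0.75 at the chosen cutoff, which is why the two numbers should be reported separately.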

      (7) The authors should include more details of their model and the process of selection of signatures to enhance the reproducibility and transparency of their methods.

      We thank the reviewer for their valuable comments. To enhance clarity, we have incorporated additional information regarding the method employed for biomarker signature identification into the ‘Methods Section’ on page 23. We note that Supplementary Table 7a provides details on ‘Sensitivity, Specificity, Precision, and AUC’ for the 16 markers included in the external validation study. Additionally, Supplementary Table 7b presents the contingency table for the 7-biomarker signature, offering insights into model accuracy for both the Internal-Discovery and External Validation cohorts.

      Reviewer #2 (Public Review):

      The authors intended to identify a protein signature in extracellular vesicles of serum to distinguish pancreatic ductal adenocarcinoma from benign pancreatic diseases.

      A major strength of the work presented is the valuable profiling of a significant number of patient samples, with a rich cohort of patients with pancreatic cancer, benign pancreatic diseases, and healthy controls. However, despite the strong cohorts presented, the numbers of patient samples for benign pancreatic diseases as well as controls were very limited.

      Also, the method used to isolate vesicles, EVTrap, recognizes double bilayers, which means that it can detect cellular debris and apoptotic bodies, which are very common in the circulation of patients that are undergoing chemotherapy. It would be important to identify the patients that are therapy naïve and the ones that are not because of this possible bias.

      We thank the Reviewer for these comments. We want to point out that the experiments presented in Supplementary Figure 1 (Transmission electron microscopy images and Nanoparticle tracking analysis) confirm that the vesicles isolated with EVTrap are not cellular debris and apoptotic bodies. Rather, these structures are in the nano range expected for exosomes. This is further supported by the additional work from our co-author and collaborator describing the development of EVtrap and its performance in isolating exosomes when compared to other traditional methods such as ultracentrifugation and size exclusion chromatography (PMID:32396726).

      As per the Reviewer’s request, we have provided an additional heatmap figure depicting which patients are treatment naïve to differentiate them from those who have received treatment (revised Figure 2C).

      Additionally, the transmission electron microscopy data reflect this heterogeneity of the samples, also with little identification of double bilayered vesicles. It would be important to identify some extracellular vesicles markers in those preparations to strengthen the quality of the samples analyzed.

      We appreciate the comment from the Reviewer and acknowledge the importance of identifying exosome markers on the isolate from EVtrap. These experiments have already been done and are reported in the original paper describing the development of this method by our co-authors in a separate work. In the manuscript PMID: 30080416, our collaborators demonstrated the detection of CD9, a well-known exosome marker, using Western Blot from isolates using EVtrap or ultra-centrifugation, a traditional technique to isolate exosomes. This work showed that EVtrap yielded much higher recovery rate of exosomes with lower contamination from soluble proteins. We did not repeat these already published experiments, but we amended our manuscript to reference these results.

      What is more, previously published work with this same methodology identifies around 2000 proteins per sample. It would be important to explain why in this study there seems to be a reduction in more than 50% of the amount of proteins identified in the vesicles.

      We thank the Reviewer for pointing out this important detail. In the previous work in which EVtrap was developed by our co-authors, the blood samples were processed using a different protocol, with shorter centrifugation (2,500g for 10 min) (PMID: 32396726). In the current work, we employed three centrifugation steps. As detailed in the Methods section of the manuscript, blood samples were first centrifuged at 1,300g for 15 min, and plasma was removed from the top, carefully avoiding the cell pellet. The plasma was then centrifuged at 2,500g for 15 min and again removed from the top, carefully avoiding the cell pellet, followed by a third centrifugation at 2,500g for 15 min. This more extensive centrifugation process was intended to further increase the removal of platelets, apoptotic bodies, and other large particles and aggregates. Accordingly, we anticipate that the additional centrifugation steps decreased the contamination of our isolates but may have also decreased the amount of exosome proteins, hence the lower number of exosome proteins identified in our study as compared to the original study from our co-authors (PMID: 32396726).

      One of the proteins that constantly surges on the analysis is KRT20. It would be important to proceed with the analysis by first filtering out possible contaminants of the proteomics, of which keratins are the most common ones.

      We thank the Reviewer for this comment. We would like to point out that we do believe that KRT20 is, in fact, cancer related and not a contaminant. This is supported by our results presented in this manuscript showing enrichment of KRT20 in PDAC cases, and lower expression in benign samples. If this protein were a contaminant, its expression would be found uniformly in all samples, and there would be no apparent reason for different expression between malignant vs benign cases, as all samples were processed following the same procedures. In addition, increased expression of KRT20 in PDAC tissues has also been reported by others. For instance, in a study by Schmitz-Winnthal (PMID: 16364723), the authors showed that Cytokeratin 20 (KRT20) was expressed in 76% of PDAC patients and expression of KRT20 was associated with poor survival after surgical resection. Based on these observations, we believe that the KRT20 identified in our study is indeed a tumor associated EV protein rather than contamination.

      Finally, none of the 7-extracellular vesicle protein signatures has been validated by other techniques, such as western blot, in extracellular vesicles isolated by other, standard, methods, such as size exclusion chromatography.

      A distinct technique for protein analysis was done but not a different method of isolation of these vesicles. This would strengthen the results and the origin of the proteins.

      We appreciate the Reviewer’s comment. We would like to again emphasize that the goal of this manuscript was not to compare the performance of EVtrap with other traditional EV isolation approaches such as ultracentrifugation and size exclusion chromatography. The main goal of the study is to determine proteomic profiles of EVs isolated from clinical samples and provide such information to the research community for further studies. As the Reviewer points out, proteins in EVs are highly heterogeneous, which highlights the complexity of EV biology and the interpatient heterogeneity of pancreatic cancer. We do not anticipate that the development of EV-based markers for pancreatic cancer diagnosis can be achieved by a single team, but rather by a community of researchers. We hope the information presented in the current study will help other researchers identify additional candidates for validation in future work. Nonetheless, we edited the manuscript to discuss the limitation of not doing cross-validation of protein detection using a different method.

      The conclusions that are reached do not fully meet the proposed aims of the identification of a protein signature in circulating extracellular vesicles that could improve early detection of the disease. The authors did not demonstrate the superiority of detection of these proteins in extracellular vesicles versus simply performing an ELISA, nor their superiority with respect to the current standard procedure for diagnosis.

      We would like to clarify to the Reviewer that the goal of this manuscript was not to prove superiority of the EV signature biomarker in diagnosing pancreatic cancer as compared to current standard of care (SOC) practice, i.e., CT scans, endoscopic ultrasound and CA19-9. In order to prove such superiority, one would require a large, randomized phase III trial with several hundred patients. This was not the aim of our discovery EV proteomics study, and we double checked our manuscript to ensure no such claim was made. Rather, we aimed at developing a new pipeline for discovery of new EV biomarkers, and we believe we were able to show that this approach was successful in discovering a new class of biomarkers based on proteins expressed on extracellular vesicles that have predominant expression in patients with pancreatic cancer. Future studies should continue to advance this field with the goal of improving on the current standard of care diagnostic methods.

      The authors also suggest that profiling of circulating extracellular vesicles provides unique insights into systemic immune changes during pancreatic cancer development. How this is better than a regular hemogram is not clear.

      We would like to clarify that the overall goal of this study is to provide patient-relevant information for the research community to further investigate the biology of extracellular vesicles. By the statement 'unique insights into systemic immune changes' we referred to the fact that we discovered EVs carrying proteins involved in immune responses. Previous studies have shown that EVs play important roles in cell-cell communication, and the discoveries from our study provide candidates for future studies on cellular mechanisms underlying immune regulation during pancreatic cancer development.

      Finally, it would be important to determine how this signature compares with many others described in the literature that have the exact same aim. Why and how would this one be better?

      We would like to again clarify that comparing the diagnostic performance of the EV biomarkers discovered in the study against standard of care methods (CA19-9, ctDNA, CT scan) was beyond the scope of this discovery EV proteomics work. We reviewed the manuscript to ensure that no claims were made regarding superiority over point-of-care tests available in the clinic.

      Reviewer #3 (Public Review):

      This work investigates the use of extracellular vesicles (EVs) in blood as a noninvasive 'liquid biopsy' to aid in the differentiation of patients with pancreatic cancer (PDAC) from those with benign pancreatic disease and healthy controls, an important clinical question where biopsies are frequently non-diagnostic. The use of extracellular vesicles as biomarkers of disease has been gaining interest in recent history, with a variety of published methods and techniques, looking at a variety of different compositions ('the molecular cargo') of EVs particularly in cancer diagnosis (Shah R, et al, N Engl J Med 2018; 379:958-966).

      This study adds to the growing body of evidence in using EVs for earlier detection of pancreatic cancer, identifying both new and known proteins of interest. Limitations in studying EVs, in general, include dealing with low concentrations in circulation and identifying the most relevant molecular cargo. This study provides validation of assaying EVs using the novel EVtrap method (Extracellular Vesicles Total Recovery And Purification), which the authors show to be more efficient than current standard techniques and potentially more scalable for larger clinical studies.

      The strength of this study is in its numbers - the authors worked with a cohort of 124 cases, 93 of which were PDAC samples, which is considered large for an EV study (Jia, E. et al. BMC Cancer 22, 573 (2022)). The benign disease group (n=20, between chronic pancreatitis and IPMNs) and healthy control groups (n=11) were relatively small, but the authors were not only able to identify candidate biomarkers for diagnosis that clearly stood out in the PDAC cohort, but also validate them in an independent cohort of 36 new subjects.

      Proteins they have identified as associated with pancreatic cancer over benign disease included PDCD6IP, SERPINA12, and RUVBL2. They were even able to identify a set of EV proteins associated with metastasis and poorer prognosis, which include the proteins PSMB4, RUVBL2 and ANKAR and CRP, RALB and CD55. Their 7-EV protein signature yielded an 89% prediction accuracy for the diagnosis of PDAC against a background of benign pancreatic diseases that is compelling and comparable to other studies in the literature (Jia, E. et al. BMC Cancer 22, 573 (2022)).

      The limitations of this study are its containment within a single institution - further studies are warranted to apply the authors' 7-EV protein PRAC panel to multiple other cases at other institutions in a larger cohort.

      We are very thankful to the Reviewer for the positive feedback. We are similarly optimistic that EV-based biomarkers will assist future researchers to develop better diagnostic assays for patients with pancreatic cancer, as well as other tumor types lacking accurate blood-based tests.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e and h) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      The manuscript has improved with the new corrections. I appreciate the authors' attention to the minor comments, which have been fully solved. The authors have not, however, provided additional experimental evidence that uORF-mediated translation of Raf-1 mRNA depends on an intact eIF3 complex, nor have they addressed the consequences of such regulation for cell physiology. While I understand that this is a subject of follow-up research, the authors could have at least included their explanations/speculations regarding major comments 2-4, which in my opinion could have been useful for the reader.

      Our explanations/speculations regarding major comments 2 and 3 were included in the Discussion. We apologize for this misunderstanding as we thought that we were supposed to explain our ideas only in the responses. We did not discuss comment 4, however, as we are really not sure what the true effect is and did not want to go into wild speculations in our manuscript. We thank this reviewer for his insightful comments and understanding.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The authors report the potential translational regulation of Raf kinase by re-initiation. It would be interesting to show that Raf is indeed regulated by uORF-mediated translation, and that this is dependent on an intact eIF3 complex. Analyzing the potential consequences of Raf1 regulation for cancer cell proliferation or apoptosis would be a plus.

      We agree that this is an interesting and likely possibility. In fact, another clue that translation of Raf1 is regulated by uORFs comes from Bohlen et al. 2023 (PMID: 36869665) where they showed that RAF1 translation is dependent on PRRC2 proteins (that promote leaky scanning through these uORFs). We noted in the discussion that our results from eIF3d/e/hKD and the PRRC2A/B/CKD partly overlap. It is a subject of our follow-up research to investigate whether eIF3 and PRRC2 co-operate together to regulate translation of this important mRNA. 

      (2) The authors show that eIF3 d/e -but not 3h- has an effect on cell proliferation. First, this indicates that proliferation does not fully correlate with eIF3 integrity. Depletion of eIF3d does not affect the integrity of eIF3, yet the effects on proliferation are similar to those of eIF3e. What is the possibility that changes in proliferation reflect functions of eIF3d outside the eIF3 complex? What could be the real consequences of disturbing eIF3 integrity for the mammalian cell? Please, discuss.

      Yes, proliferation does not fully correlate with eIF3 integrity. Downregulation of eIF3 subunits that leads to disintegration of the eIF3 YLC core (a, b, c, g, i) has a more detrimental effect on growth and translation than downregulation of the peripheral subunits (e, k, l, f, h, m). Our previous studies (Wagner et al. 2016, PMID: 27924037 and Herrmannová et al. 2020, PMID: 31863585) indicate that the YLC core of eIF3 can partially support translation even without its peripheral subunits. In this respect eIF3d (as a peripheral subunit) is an amazing exception, suggesting it may have some specialized function(s). Whether this function resides outside of the eIF3 complex or not we do not know, but we do not think so, mainly because in the absence of eIF3e, its interaction partner, eIF3d gets rapidly degraded. Therefore, it is not very likely that eIF3d exists alone outside of the eIF3 complex with moonlighting functions elsewhere. We think that eIF3d, as a head-interacting subunit close to an important head ribosomal protein RACK1 (a landing pad for regulatory proteins), is a target of signaling pathways, which may make it important for translation of specific mRNAs. In support of these thoughts, eIF3d (in the context of entire eIF3) together with DAP5 were shown to promote translation by an alternate cap-dependent (eIF4F-independent) mechanism (Lee et al. 2016, PMID: 27462815; de la Parra et al. 2018, PMID: 30076308). In addition, the eIF3d function (also in the context of entire eIF3) was proved to be regulated by stress-triggered phosphorylation (Lamper et al. 2020, PMID: 33184215).

      (3) Figure 6D: Surprisingly, reduced levels of ERK1/2 upon eIF3d/e-KD are compensated by increased phosphorylation of ERK1/2 and net activation of c-Jun. Please comment on the functional consequences of buffering mechanisms that the cell deploys in order to counteract compromised eIF3 function. Why would the cell activate precisely the MAPK pathway to compensate for a compromised eIF3 function?

      This we do not know. We can only speculate that when translation is compromised, cells try to counteract it in two ways: 1) they produce more ribosomes to increase translational rates and 2) activate MAPK signaling to send pro-growth signals, which can in the end further boost ribosome biogenesis.

      (4) Regarding DAP-sensitive transcripts, can the authors discuss in more detail the role of eIF3d in alternative cap-dependent translation versus re-initiation? Are these transcripts being translated by a canonical cap- and uORF-dependent mechanism or by an alternative cap-dependent mechanism?

      This is indeed not an easy question. On one hand, it was shown that DAP5 facilitates translation re-initiation after uORF translation in a canonical cap-dependent manner. This mechanism is essential for translation of the main coding sequence (CDS) in mRNAs with structured 5' leaders and multiple uORFs (Weber et al. 2022, PMID: 36473845; David et al., 2022, PMID: 35961752). On the other hand, DAP5 was proposed to promote alternative, eIF4F-independent but cap-dependent translation, as it can substitute the function of the eIF4F complex in cooperation with eIF3d (de la Parra et al., 2018, PMID: 30076308; Volta et al., 2021, PMID: 34848685). Overall, these observations paint a picture too complex for us to propose a clear scenario of what is going on between these two proteins on individual mRNAs. We speculate that both mechanisms are taking place and that the specific mechanism of translation initiation differs for differently arranged mRNAs.

      Minor comments:

      (5) Figure S2C: why is there a strong reduction of the stop codon peak for 3d and 3h KDs?

      We have checked the Ribowaltz profiles of all replicates (in the Supplementary data we are showing only a representative replicate I) and the stop codon peak differs a lot among the replicates. We think that this way of plotting was optimized for calculation and visualization of P-sites and triplet periodicity and thus is not suitable for this type of comparison among samples. Therefore, we have performed our own analysis where the 5’ ends of reads are used instead of P-sites and triplicates are averaged and normalized to CDS (see below please), so that all samples can be compared directly in one plot (same as Fig. S13A but for stop codon). We can see that the stop codon peak really differs and is the smallest for eIF3hKD. However, these changes are in the range of 20% and we are not sure about their biological significance. We therefore refrain from drawing any conclusions. In general, reduced stop codon peak may signal faster termination or increased stop codon readthrough, but the latter should be accompanied by an increased ribosome density in the 3’UTR, which is not the case. A defect in termination efficiency would be manifested by an increased stop codon peak, instead.

      Author response image 1.

       

      (6) Figures 5 and S8: Adding a vertical line at 'zero' in all cumulative plots will help the reader understand the author's interpretation of the data. 

      We have added a dashed grey vertical line at zero as requested. However, for interpretation of these plots, the reader should focus on the colored curve and whether it is shifted in respect to the grey curve (background) or not. Shift to the right indicates increased expression, while shift to the left indicates decreased expression. The reported p-value then indicates the statistical significance of the shift.
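
      As a reading aid for such cumulative plots, the sketch below evaluates an empirical cumulative distribution function (ecdf) at the vertical line at zero. It is an illustration only: the function name `ecdf_at` and the toy log2 fold-change values are hypothetical and are not taken from the study. A curve shifted to the right of the background has a smaller fraction of genes at or below zero.

```python
from bisect import bisect_right


def ecdf_at(values, x):
    """Return the fraction of values that are <= x (the ecdf evaluated at x)."""
    ordered = sorted(values)
    return bisect_right(ordered, x) / len(ordered)


# Toy log2 fold changes (hypothetical numbers, for illustration only).
background = [-0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.4, 0.5]
gene_set = [-0.1, 0.1, 0.3, 0.4, 0.6, 0.8]

# Height of each curve where it crosses the vertical line at zero.
background_at_zero = ecdf_at(background, 0.0)
gene_set_at_zero = ecdf_at(gene_set, 0.0)

# A lower value at zero means the curve lies to the right of the background,
# i.e. more genes in the set have positive fold changes.
shifted_right = gene_set_at_zero < background_at_zero
```

      Comparing the two curves at a single point is only a visual aid; the significance of a shift is assessed over the whole distribution.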

      (7) The entire Figure 2 are controls that can go to Supplementary Material. The clustering of Figure S3B could be shown in the main Figure, as it is a very easy read-out of the consistent effects of the KDs of the different eIF3 subunits under analysis.

      We have moved the entire Figure 2 to Supplementary Material as suggested (the original panels can be found as Supplementary Figures 1B, 1C and 3A). Figure S3B is now the main Figure 2E. 

      (8) There are 3 replicates for Ribo-Seq and four for RNA-Seq. Were these not carried out in parallel, as it is usually done in Ribo-seq experiments? Why is there an extra replicate for RNA-Seq?

      Yes, the three replicates were carried out in parallel. We have decided to add the fourth replicate in RNA-Seq to increase the data robustness as the RNA-Seq is used for normalization of FP to calculate the TE, which was our main analyzed metric in this article. We had the option to add the fourth replicate as we originally prepared five biological replicates for all samples, but after performing the control experiments, we selected only the 3 best replicates for the Ribo-Seq library preparation and sequencing.
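
      To make the normalization concrete, the sketch below shows the arithmetic behind a change in translational efficiency (TE): the log2 fold change in footprints (FP) minus the log2 fold change in total mRNA. It is a simplified illustration with hypothetical counts and a hypothetical function name, not the statistical pipeline used in the study, which would also model replicates and dispersion.

```python
import math


def log2_te_change(fp_kd, fp_ctrl, rna_kd, rna_ctrl):
    """Return log2FC(TE) = log2FC(FP) - log2FC(RNA) for one gene.

    Inputs are normalized, non-zero abundance values for the knockdown (kd)
    and control (ctrl) conditions.
    """
    log2fc_fp = math.log2(fp_kd / fp_ctrl)
    log2fc_rna = math.log2(rna_kd / rna_ctrl)
    return log2fc_fp - log2fc_rna


# Footprints double while mRNA is unchanged: TE goes up.
te_up = log2_te_change(fp_kd=200, fp_ctrl=100, rna_kd=100, rna_ctrl=100)

# Footprints and mRNA both double: TE is unchanged.
te_flat = log2_te_change(fp_kd=200, fp_ctrl=100, rna_kd=200, rna_ctrl=100)

# mRNA doubles while footprints stay flat: TE goes down (an offsetting pattern).
te_down = log2_te_change(fp_kd=100, fp_ctrl=100, rna_kd=200, rna_ctrl=100)
```

      Because the RNA-Seq estimate enters every TE value as the denominator, noise in it propagates to all TE calls, which is why an extra RNA-Seq replicate can help.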

      (9) Please, add another sheet in Table S2 with the names of all genes that change only at the translation (RPF) levels.

      As requested, we have added three extra sheets (one for each downregulation) for differential FP with Padjusted <0.05 in the Spreadsheet S2. We also provide the complete unfiltered differential expression data (sheet named “all data”), so that readers can filter out any relevant data based on their interest.

      (10) Page 5, bottom: ' ...we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules...'. This is not true for eIF3d, as shown in Fig1B and mentioned in Results.

      This reviewer is correct. By this generalized statement, we were trying to summarize our previous results from Wagner et al., 2014, PMID: 24912683; Wagner et al., 2016, PMID: 27924037 and Herrmannova et al., 2020, PMID: 31863585. The eIF3d downregulation is the only exception that does not affect expression of any other eIF3 subunit. Therefore, we have rewritten this paragraph accordingly: “We recently reported a comprehensive in vivo analysis of the modular dynamics of the human eIF3 complex (Wagner et al, 2020; Wagner et al, 2014; Wagner et al., 2016). Using a systematic individual downregulation strategy, we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules leading to the formation of partial eIF3 subcomplexes with limited functionality (Herrmannova et al, 2020). eIF3d is the only exception in this respect, as its downregulation does not influence expression of any other eIF3 subunit.”

      (11) Page 10, bottom: ' The PCA plot and hierarchical clustering... These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d.' This is already obvious in the polysome profiles of Figure S2C.

      We agree that this result is surely not surprising given the polysome profile and growth phenotype analyses of eIF3hKD. But still, we think that the PCA plot and hierarchical clustering results represent valuable controls. Nonetheless, we rephrased this section to note that this result agrees with the polysome profiles analysis: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      (12) Page 12: ' As for the eIF3dKD "unique upregulated" DTEGs, we identified one interesting and unique KEGG pathway, the ABC transporters (Supplementary Figure 5A, in green).' This sentence is confusing, as there are more pathways that are significant in this group, so it is unclear why the authors consider it 'unique'.

      The eIF3dKD “unique upregulated” group comprises genes with increased TE only in eIF3dKD but not in eIF3eKD or eIF3hKD (500 genes, Fig 2G). All these 500 genes were examined for enrichment in the KEGG pathways, and the top 10 significant pathways were reported (Fig S6A). However, 8 out of these 10 pathways were also significantly enriched in other gene groups examined (e.g. eIF3d/eIF3e common). Therefore, the two remaining pathways (“ABC transporters” and “Other types of O-glycan biosynthesis”) are truly unique for eIF3dKD. We wanted to highlight the ABC transporters group in particular because we find it rather interesting (for the reasons mentioned in the article). We have corrected the sentence in question to avoid confusion: “Among the eIF3dKD “unique upregulated” DTEGs, we identified one interesting KEGG pathway, the ABC transporters, which did not show up in other gene groups (Supplementary Figure 6A, in green). A total of 12 different ABC transporters had elevated TE (9 of them are unique to eIF3dKD, while 3 were also found in eIF3eKD), 6 of which (ABCC1-5, ABCC10) belong to the C subfamily, known to confer multidrug resistance with alternative designation as multidrug resistance protein (MRP1-5, MRP7) (Sodani et al, 2012).

      Interestingly, all six of these ABCC transporters were upregulated solely at the translational level (Supplementary Spreadsheet S2).”    

      (13) Note typo ('Various') in Figure 4A.

      Corrected

      (14) The introduction could be shortened.

      This is a very subjective requirement. In fact, when this manuscript was reviewed in NAR, we were asked by two reviewers to expand it substantially. Because a number of various research topics come together in this work, e.g. translational regulation, the eIF3 structure and function, MAPK/ERK signaling, we are convinced that all of them demand a comprehensive introduction for non-experts in each of these topics. Therefore, with all due respect to this reviewer, we did not ultimately shorten it.

      Reviewer #2 (Recommendations For The Authors):

      - In Figure 2, it would be useful to know why eIF3d is destabilized by eIF3e knockdown - is it protein degradation and why do the eIF3d/e knockdowns not more completely phenocopy each other when there is the same reduction to eIF3d as in the eIF3d knockdown sample?

      Yes, we do think that protein degradation lies behind the eIF3d destabilization in the eIF3eKD, but we have not yet directly demonstrated this. However, we have shown that eIF3d mRNA levels are not altered in eIF3eKD and that Ribo-Seq data indicate no change in TE or FP for eIF3d-encoding mRNA in eIF3eKD. Nonetheless, it is important to note (and we discuss it in the article) that eIF3d levels in eIF3dKD are lower than eIF3d levels in eIF3eKD (please see Supplementary Figure 1C). In fact, we believe that this is one of the main reasons for the eIF3d/e knockdowns differences.

      - The western blots in Figures 4 and 6 show modest changes to target protein levels and would be strengthened by quantification.

      We have added the quantifications as requested by this reviewer and the reviewer 3.

      - For Figure 4, this figure would be strengthened by experiments showing if the increase in ribosomal protein levels is correlated with actual changes to ribosome biogenesis.

      As suggested, we performed polysome profiling in the presence of EDTA to monitor changes in the 60S/40S ratio, indicating a potential imbalance in the biogenesis of individual ribosome subunits. We found that it was not affected (Figure 3G). In addition, we performed the same experiment, normalizing all samples to the same number of cells (cells were carefully counted before lysis). In this way, we confirmed that eIF3dKD and eIF3eKD cells indeed contain a significantly increased number of ribosomes, in agreement with the western blot analysis (Figure 3H).

      - In Figure 6, there needs to be a nuclear loading control.

      This experiment was repeated with Lamin B1 used as a nuclear loading control – it is now shown as Fig. 5F.

      - For Figure 8, these findings would be strengthened using luciferase reporter assays where the various RNA determinants are experimentally tested. Similarly, 5′ TOP RNA reporters would have been appreciated in Figure 4.

      This is indeed a logical continuation of our work, which represents the current work in progress of one of the PhD students. We apologize, but we consider this time- and resource-demanding analysis out of scope of this article.

      Reviewer #3 (Recommendations For The Authors):

      (1) Within the many effects observed, it is mentioned that eIF3d is known to be overexpressed while eIF3e is underexpressed in many cancers, but knockdown of either subunit decreases MDM2 levels, which would be expected to increase P53 activity and decrease tumor cell transformation. In contrast, they also report that 3e/3d knockdown dramatically increases levels of cJUN, presumably due to increased MAPK activity, and is expected to increase protumor gene expression. Additional discussion is needed to clarify the significance of the findings, which are a bit confusing.

      This is indeed true. However, considering the complexity of eIF3, the largest initiation factor among all, as well as the broad portfolio of its functions, it is perhaps not so surprising that the observed effects are complex and may seem even contradictory in respect to cancer. To acknowledge that, we expanded the corresponding part of discussion as follows: “Here, we demonstrate that alterations in the eIF3 subunit stoichiometry and/or eIF3 subcomplexes have distinct effects on the translatome; for example, they affect factors that play a prominent (either positive or negative) role in cancer biology (e.g., MDM2 and cJUN), but the resulting impact is unclear so far. Considering the complex interactions between these factors as well as the complexity of the eIF3 complex per se, future studies are required to delineate the specific oncogenic and tumor suppressive pathways that play a predominant role in mediating the effects of perturbations in the eIF3 complex in the context of neoplasia.”

      (2) There are places in the text where the authors refer to changes in transcriptional control when RNA levels differ, but transcription versus RNA turnover wasn't tested, e.g. page 16 and Figure S10, qPCR does not confirm "transcriptional upregulation in all three knockdowns" and page 19 "despite apparent compensatory mechanisms that increase their transcription."

      This is indeed true, the sentences in question were corrected. The term “increased mRNA levels” was used instead of transcriptional upregulation (increased mRNA stabilization is also possible).

      (3) Similarly, the authors suggest that steady-state LARP1 protein levels are unaffected based on ribosome footprint counts (page 21). It is incorrect to assume this, because ribosome footprints can be elevated due to stalling on RNA that isn't being translated and doesn't yield more protein, and because levels of translated RNA/synthesized proteins do not always reflect steady-state protein levels, especially in mutants that could affect lysosome levels and protein turnover. Also page 12, 1st paragraph suggests protein production is down when ribosome footprints are changed.

      Yes, we are well aware of this known limitation of Ribo-seq analysis. Therefore, the steady-state protein levels of our key hits were verified by western blotting. In addition, we have removed the sentence about LARP1 because it was based on Ribo-Seq data only without experimental evaluation of the steady-state LARP1 protein levels.

      (4) The translation buffering effect is not clear in some Figures, e.g. S6, S8, 8A, and B. The authors show a scheme for translationally buffered RNAs being clustered in the upper right and lower left quadrants in S4H (translation up with transcript level down and v.v.), but in the FP versus RNA plots, the non-TOP RNAs and 4E-P-regulated RNAs don't show this behavior, and appear to show a similar distribution to the global changes. Some of the right panels in these figures show modest shifts, but it's not clear how these were determined to be significant. More information is needed to clarify, or a different presentation, such as displaying the RNA subsets in the left panels with heat map coloring to reveal whether RNAs show the buffered translation pattern defined in purple in Figure S4H, or by reporting a statistical parameter or number of RNAs that show behavior out of total for significance. Currently the conclusion that these RNAs are translationally buffered seems subjective since there are clearly many RNAs that don't show changes, or show translation-only or RNA-only changes.

      We would like to clarify that S4H does not indicate a necessity for changes in FPs in the buffered subsets. Although opposing changes in total mRNA and FPs are classified as buffering, often we also consider the scenario where there are changes to the total mRNA levels not accompanied by changes in ribosome association.

      In figure S6, the scatterplots indicate a high density of genes shifted towards negative fold changes on the x-axis (total mRNA). This is also reflected in the empirical cumulative distribution functions (ecdfs) for the log2 fold changes in total mRNA in the far right panels of A and B, and the lack of changes in log2 fold change for FPs (middle panels). Similarly, in figure S8, the scatterplots indicate a density of genes shifted towards positive fold changes on the x-axis for total mRNA. The ecdfs also demonstrate that there is a significant directional shift in log2 fold changes in the total mRNA that is not present to a similar degree in the FPs, consistent with translational offsetting. It is rightly pointed out that not all genes in these sets follow the same pattern of regulation. We have revised the title of Supplementary Figure S6 (now S7) to reflect this. However, we would like to emphasize that these figures are not intended to communicate that all genes within these sets of interest are regulated in the same manner, but rather that when considered as a whole, the predominant effect seen is that of translational offsetting (directional shifts in the log2 fold change distribution of total mRNA that are not accompanied by similar shifts in FP mRNA log2 fold changes).

      The significance of these differences was determined by comparing the ecdfs of the log2 fold changes for the genes belonging to a particular set (e.g. non-TOP mTOR-sensitive, p-eIF4E-sensitive) against all other expressed genes (background) using a Wilcoxon rank sum test. This allows identification of significant shifts in the distributions that have a clear directionality (if there is an overall increase, or decrease in fold changes of FPs or total mRNA compared to background). If log2 fold changes are different from background, but without a clear directionality (equally likely to be increased or decreased), the test will not yield a significant result. This approach allows assessment of the overall behavior of gene signatures within a given dataset in a manner that is completely threshold-independent, such that it does not rely on classification of genes into different regulatory categories (translation only, buffering, etc.) based on significance or fold-change cut-offs (as in S4H). Therefore, we believe that this unbiased approach is well-suited for identifying cases when there are many genes that follow similar patterns of regulation within a given dataset.
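
      The behavior described above can be illustrated with a minimal, self-contained sketch of a two-sided rank sum test using the normal approximation with a tie correction. The function name `rank_sum_test` and the toy log2 fold changes are hypothetical and are not the analysis code or data of the study; in practice an established implementation such as `wilcox.test` in R or `scipy.stats.mannwhitneyu` in Python would normally be used.

```python
from statistics import NormalDist


def rank_sum_test(gene_set, background):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    Returns (z, p): z > 0 means the gene set tends to have larger values
    than the background.
    """
    n1, n2 = len(gene_set), len(background)
    n = n1 + n2
    # Pool both groups, remembering each value's original position.
    pooled = sorted((v, i) for i, v in enumerate(list(gene_set) + list(background)))
    ranks = [0.0] * n
    tie_term = 0.0
    start = 0
    while start < n:
        end = start
        while end + 1 < n and pooled[end + 1][0] == pooled[start][0]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # average 1-based rank of the tie group
        for k in range(start, end + 1):
            ranks[pooled[k][1]] = avg_rank
        t = end - start + 1
        tie_term += t**3 - t
        start = end + 1
    w = sum(ranks[:n1])  # rank sum of the gene set
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:  # all values identical: no evidence of a shift
        return 0.0, 1.0
    z = (w - mean_w) / var_w**0.5
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Toy log2 fold changes (hypothetical numbers, for illustration only).
background = [-0.3, -0.2, -0.1, 0.0, 0.05, 0.1, 0.2, 0.3]

# Directional shift: every gene in the set is increased relative to background.
shifted_set = [0.7, 0.8, 0.9, 1.0, 1.1, 1.3]
z_shift, p_shift = rank_sum_test(shifted_set, background)

# No directionality: large changes, but equally up and down.
symmetric_background = [-0.5, -0.1, 0.0, 0.1, 0.5]
spread_set = [-2.0, -1.0, 1.0, 2.0]
z_spread, p_spread = rank_sum_test(spread_set, symmetric_background)
```

      In the second toy case the gene set differs strongly from the background in spread, yet its rank sum equals the expected value, so the test reports no shift. The normal approximation is a simplification; exact p-values are preferable for very small sets.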

      (5) Page 10-"These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d" ...These results suggest that eIF3h has less impact on the translatome, not that it does so differently. If it were changing translation by a different mechanism, I would not expect it to cluster with control.

      This sentence was rewritten as follows: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      Other minor issues:

      (1) There are some typos: Figure 2 leves, Figure 4 variou,

      Corrected.

      (2) Figure 3, font for genes on volcano plot too small

      Yes, maybe, however the resolution of this image is high enough to enlarge a certain part of it at will. In our opinion, a larger font would take up too much space, which would reduce the informativeness of this graph.

      (3) Figure S5, highlighting isn't defined.

      The figure legend for S5A (now S6A) states: “Less significant terms ranking 11 and below are in grey. Terms specifically discussed in the main text are highlighted in green.” Perhaps it was overlooked by this reviewer.

      (4) At several points the authors refer to "the MAPK signaling pathway", suggesting there is a single MAPK that is affected, e.g in the title, page 3, and other places when it seems they mean "MAPK signaling pathways" since several MAPK pathways appear to be affected.

      We apologize for any terminological inaccuracies. There are indeed several MAPK pathways operating in cells. In our study, we focused mainly on the MAPK/ERK pathway. The confusion probably stems from the fact that the corresponding term in the KEGG pathway database is labeled "MAPK signaling pathway" and this term, although singular, includes all MAPK pathways. We have carefully reviewed the entire article and have corrected the term used accordingly to either: 1) MAPK pathways in general, 2) the MAPK/ERK pathway for this particular pathway, or 3) "MAPK signaling pathway", where the KEGG term is meant.

      (5) Some eIF3 subunit RNAs have TOP motifs. One might expect 3e and 3h levels to change as a function of 3d knockdown due to TOP motifs but this is not observed. Can the authors speculate why the eIF3 subunit levels don't change but other TOP RNAs show TE changes? Is this true for other translation factors, or just for eIF3, or just for these subunits? Could the Western blot be out of linear range for the antibody or is there feedback affecting eIF3 levels differently than the other TOP RNAs, or a protein turnover mechanism to maintain eIF3 levels?

      This is indeed a very interesting question. In addition to the mRNAs encoding ribosomal proteins, we examined all TOP mRNAs and added an additional sheet to the S2 supplemental spreadsheet with all TOP RNAs listed in (Philippe et al., 2020, PMID: 32094190). According to our Ribo-Seq data, we could expect to see increased protein levels of eIF3a and eIF3f in eIF3dKD and eIF3eKD, but this is not the case, as judged from extensive western blot analysis performed in (Wagner et al. 2016, PMID: 27924037). Indeed, we cannot rule out the involvement of a compensatory mechanism monitoring and maintaining the levels of eIF3 subunits at steady-state – increasing or decreasing them if necessary, which could depend on the TOP motif-mediated regulation. However, we think that in our KDs, all non-targeted subunits that lose their direct binding partner in eIF3 due to siRNA treatment become rapidly degraded. For example, co-downregulation of subunits d, k and l in eIF3eKD is very likely caused by protein degradation as a result of a loss of their direct binding partner – eIF3e. Since we showed that the yeast eIF3 complex assembles co-translationally (Wagner et al. 2020, PMID: 32589964), and there is no reason to think that mammalian eIF3 differs in this regard, our working hypothesis is that free subunits that are not promptly incorporated into the eIF3 complex are rapidly degraded, and the presence or absence of the TOP motif in the 5’ UTR of their mRNAs has no effect. As for the other TOP mRNAs, translation factors eEF1B2, eEF1D, eEF1G, eEF2 have significantly increased FPs in both eIF3dKD and eIF3eKD, but we did not check their protein levels by western blotting to conclude anything specific.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This study delineates an important set of uninjured and injured periosteal snRNAseq data that provides an overview of periosteal cell responses to fracture healing. The authors also took additional steps to validate some of the findings using immunohistochemistry and transplantation assays. This study will provide a valuable publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

      Strengths: 

      (1) This is the first single-nuclei atlas of periosteal cells that are obtained without enzymatic cell dissociation or targeted cell purification by FACS. This integrated snRNAseq dataset will provide additional opportunities for the community to revisit the expression of many periosteal cell markers that have been reported to date.

      (2) The authors delved further into the dataset using cutting-edge algorithms, including CytoTrace, SCENIC, Monocle, STRING, and CellChat, to define the potential roles of identified cell populations in the context of fracture healing. These additional computation analyses generate many new hypotheses regarding periosteal cell reactions.

      (3) The authors also sought to validate some of the computational findings using immunohistochemistry and transplantation assays to support the conclusion.

      Weaknesses: 

      (1) The current snRNAseq datasets contain only a small number of nuclei (1,189 nuclei at day 0, 6,213 nuclei on day 0-7 combined). It is unclear if the number is sufficient to discern subtle biological processes such as stem cell differentiation. 

      We analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, revealing the diversity of cell populations in uninjured periosteum and post-injury, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPC/fibrogenic cells that are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform sc/snRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single cell datasets containing a lower number of cells reached major conclusions including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in (Debnath et al. 2018), 300 in (Ambrosi et al. 2021), around 175 in (Remark et al. 2023)).

      (2) The authors' designation of Sca1+CD34+ cells as SSPCs is not sufficiently supported by experimental evidence. It will be essential to demonstrate stem/progenitor properties of Sca1+CD34+ cells using independent biological approaches such as CFU-F assays. In addition, the putative lineage trajectory of SSPCs toward IIFCs, osteoblasts, and chondrocytes remains highly speculative without concrete supporting data. 

      We performed additional analyses to further support that Sca1+ SSPCs display stem/progenitor properties. We performed CFU assays with Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells (Figure 2F-G). We showed that Prx1-GFP+ SCA1+ cells display significantly increased CFU potential compared to Prx1-GFP+ SCA1- cells. In addition, we isolated and transplanted Prx1-GFP+ Sca1+ and Prx1-GFP+ Sca1- periosteal cells at the fracture site of wild-type mice (Figure 2H). Only Sca1+ cells contributed to callus formation, reinforcing that Sca1+ cells are the SSPC population mediating bone repair. 

      The differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses all point to Sca1+ cells as the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ pSSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fibrogenic fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supplementary figure 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs isolated from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C). 

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrated the distribution of the cell populations by time point (Fig. 5D). We can observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      (3) The designation of POSTN+ clusters as injury-induced fibrogenic cells (IIFCs) is not fully supported by the presented data. The authors' snRNAseq datasets (Figure 1d) demonstrate that there are many POSTN+ cells prior to injury, indicating that POSTN+ cells are not specifically induced in response to injury. It has been widely recognized that POSTN is expressed in the periosteum without fracture. This raises a possibility that the main responder of fracture healing is POSTN+ cells, not SSPCs as they postulate. The authors cannot exclude the possibility that Sca1+CD34+ cells are mere bystanders and do not participate in fracture healing. 

      IIFCs are a population of cells that express high levels of ECM-related genes, including Postn, Aspn and collagens. We did not claim that Postn expression is specific to IIFCs. While Postn is detected in the uninjured periosteum, snRNAseq analyses and RNAscope experiments showed that the expression of Postn is limited to a small number of cells in the cambium layer of the periosteum (Fig 4B, Figure 4 – Supplementary figure 1B). These Postn-expressing cells in the uninjured periosteum are not SSPCs, as they do not co-localize with the Pi16+ and Sca1+ cells detected in the fibrous layer (Fig 4, Figure 4 – Supplementary figure 1A, Figure 6 – Supplementary figure 1). These Postn-expressing cells are undergoing osteogenic differentiation as shown by the correlation between Runx2 and Postn expression (Fig. 4 – Supplementary Figure 1C). After fracture, we observed a strong increase in ECM-related gene expression, specifically in the IIFC population. We now show the strong increase of Postn expression after injury (Fig. 4 – Supplementary Figure 1D-E, Figure 6 – Supplementary figure 1E). 

      As mentioned in our response above, we now show that SCA1+ cells form cartilage and bone after fracture, while SCA1- cells (including the POSTN+ population) from the uninjured periosteum did not contribute. These data reveal that Sca1+ CD34+ cells are the main SSPC population mediating bone healing and that POSTN+ IIFCs are a transient stage of SSPC differentiation. We added the following text to the Results section: “Pi16-expressing SSPCs are located within the fibrous layer, while we observed few POSTN+ cells in the cambium layer (Fig. 4 – Supplementary Fig. 1A). Postn expression is weak in uninjured periosteum and is limited to differentiating cells. Postn expression is strongly increased in response to fracture, specifically in IIFCs (Fig. 4 – Supplementary Fig. 1B-E).”

      (4) Detailed spatial organization of Sca1+CD34+ cells and POSTN+ cells in the uninjured periosteum with respect to the cambium layer and the fibrous layer is not demonstrated. 

      We performed RNAscope experiments to locate Pi16-expressing and Postn-expressing cells in the uninjured periosteum. We observed that Pi16-expressing cells are in the external fibrous layer of the periosteum while Postn-expressing cells are located along the cortex in the cambium layer. The data are added in Fig 4B and Fig. 4- Supplementary Figure 1 and mentioned in the result section “Pi16-expressing SSPCs were located within the fibrous layer, while Postn-expressing cells were found in the cambium layer and corresponded to Runx2-expressing osteogenic cells (Fig. 4 – Supplementary Fig. 1A-C).”.

      (5) Interpretation of transplantation experiments in Figure 5 is not straightforward, as the authors did not demonstrate the purity of Prx1Cre-GFP+SCA1+ cells and Prx1Cre-GFP+CD146- cells to pSSPCs and IIFCs, respectively. It is possible that these populations contain much broader cell types beyond SSPCs or IIFCs.  

      We agree with the reviewer that our methodology for cell transplantation required more justification and validation. We decided to use a transgenic mouse line to be able to trace the cells in vivo after grafting. Prx1 marks limb mesenchyme during development and the Prx1Cre mouse model allows labeling of all SSPCs contributing to callus formation. Therefore, we used Prx1Cre; R26mTmG mice as donors for SSPC and IIFC isolation (Duchamp de Lageneste et al. 2018; Logan et al. 2002). Prx1 does not mark immune and endothelial cells but can label pericytes and fibroblastic populations (Duchamp de Lageneste et al. 2018; Logan et al. 2002; Julien et al. 2021). In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells (Fig 3-Supplementary figure 2, Fig 6-Supplementary figure 1). We sorted GFP+ Sca1+ cells from uninjured periosteum of Prx1Cre; R26mTmG mice to isolate only SSPCs and exclude endothelial cells and pericytes. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detected IIFCs but no SSPCs, chondrocytes or osteoblasts at this stage of repair. To eliminate Prx1-derived pericytes, we sorted GFP+CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrate the purity of SSPC and IIFC isolation by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, which confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E). 
      We made the following changes in the text: “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, that correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1)”.

      Reviewer #2 (Public Review):

      Summary: 

      The authors described cell type mapping was conducted for both WT and fracture types. Through this, unique cell populations specific to fracture conditions were identified. To determine these, the most undifferentiated cells were initially targeted using stemness-related markers and CytoTrace scoring. This led to the identification of SSPC differentiating into fibroblasts. It was observed that the fibroblast cell type significantly increased under fracture conditions, followed by subsequent increases in chondrocytes and osteoblasts.

      Strengths: 

      This study presented the injury-induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation. 

      Weaknesses: 

      This study endeavored to elucidate the role of IIFC through snRNAseq analysis and in vivo observation. However, such validation alone is insufficient to confirm that IIFC is an osteochondrogenic progenitor, and additional data presentation is required.  

      As mentioned in the response to Reviewer 1, the differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses altogether showed that Sca1+ cells are the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ SSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supp 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C). 

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrate the distribution of the cell populations by time point (Fig. 5D). We can observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses strongly support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      We made the following changes in the text:

      - Line 81-87: “We performed in vitro CFU assays with sorted GFP+SCA1+ and GFP+SCA1- cells isolated from the periosteum of Prx1Cre; R26mTmG mice, as Prx1 labels all SSPCs contributing to the callus formation1. Prx1-GFP+ SCA1+ showed increased CFU potential, confirming their stem/progenitor property (Fig 2F-G). Then, we grafted Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells at the fracture site of wild-type mice. Only SCA1+ cells formed cartilage and bone after fracture indicating that SCA1+ cells correspond to periosteal SSPCs with osteochondrogenic potential (Fig 2H).”

      - Line 120-122: “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2).”

      - Line 170-172: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).”

      - Line 277-278: “Following this unique fibrogenic step, IIFCs do not undergo cell death but undergo either osteogenesis or chondrogenesis”

      - Line 281-283: “During bone repair, this initial fibrogenic process is an integral part of the SSPC differentiation process, and a transitional step prior to osteogenesis and chondrogenesis.”

      Reviewer #3 (Public Review): 

      In this manuscript, the authors explored the transcriptional heterogeneity of the periosteum with single nuclei RNA sequencing. Without prior enrichment of specific populations, this dataset serves as an unbiased representation of the cellular components potentially relevant to bone regeneration. By describing single-cell cluster profiles, the authors characterized over 10 different populations in combined steady state and post-fracture periosteum, including stem cells (SSPC), fibroblast, osteoblast, chondrocyte, immune cells, and so on. Specifically, a developmental trajectory was computationally inferred using the continuum of gene expression to connect SSPC, injury-induced fibrogenic cells (IIFC), chondrocyte, and osteoblast, showcasing the bipotentials of periosteal SSPCs during injury repair. Additional computational pipelines were performed to describe the possible gene regulatory network and the expected pathways involved in bone regeneration. Overall, the authors provided valuable insights into the cell state transitions during bone repair and proposed sets of genes with possible involvements in injury response. 

      While the highlights of the manuscript are the unbiased characterization of periosteal composition, and the trajectory of SSPC response in bone fracture response, many of the conclusions can be more strongly supported with additional clarifications or extensions of the analysis.  

      (1) As described in the method section, both the steady-state data and full dataset underwent integration before dimensional reduction and clustering. It would be appreciated if the authors could compare the post-integration landscapes of uninjured cells between steady state and full dataset analysis. Specifically, fibroblasts were shown in Figure 1C and 1E, and such annotations did not exist in Figure 2B. Will it be possible that the original 'fibroblasts' were part of the IIFC population? 

      As suggested, we now identified the fibroblast population from the uninjured periosteum in the integration of datasets from all time points (Figure 5B and Fig. 5 – Supplementary Figure 2). We identified 4 fibroblast populations in the uninjured periosteum: Luzp2+, Cldn1+, Hsd11b1+ and Csmd1+ fibroblasts. Luzp2+ and Cldn1+ fibroblasts are clustering distinctly from the other populations in the integrated dataset. Hsd11b1+ fibroblasts blend with SSPCs and IIFCs in the integrated dataset probably due to the low cell number. Finally, Csmd1+ fibroblasts are clustering at the interface between SSPCs and IIFCs likely because they correspond to differentiating cells both in the uninjured periosteum and in response to fracture. We modified the resolution of clustering in our subset dataset, in order to represent Luzp2+ and Cldn1+ fibroblasts as an isolated cluster (Figure 5B, cluster 10). In addition, both pseudotime (Fig. 5B) and gene regulatory network analyses (Fig. 7D), show that the fibroblast populations are distinct from the activation trajectory of SSPCs. We added the following sentence to the text “Fibroblasts from uninjured periosteum (Hsd11b1+, Cldn1+ and Luzp2+ cells corresponding to cluster 10 of Fig. 5B) clustered separately from the other populations, suggesting the absence of their contribution to bone healing.”

      (2) According to Figure 2, immune cells were taking a significant abundance within the dataset, specifically during days 3 & 5 post-fracture. It will be interesting to see the potential roles that immune cells play during bone repair. For example, what are the biological annotations of the immune clusters (B, T, NK, myeloid cells)? Are there any inflammatory genes or related signals upregulated in these immune cells? Do they interact with SSPC or IIFC during the transition?   

      In this manuscript, we report the overall dataset and focus our analyses on the response of SSPCs to injury and their differentiation trajectories. We did not include detailed analyses of the immune cell populations, which are beyond the scope of this manuscript and are part of another study (Hachemi et al., bioRxiv, 2024).

      (3) The conclusion of Notch and Wnt signaling in IIFC transition was not sufficiently supported by the analysis presented in the manuscript, which was based on computational inferences. It will be great to add in references supporting these claims or provide experimental validations examining selected members of these pathways.

      The role of Wnt and Notch in bone repair has been widely studied and both signaling pathways are known to be regulators of SSPC differentiation (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017; Matsushita et al. 2020; Steven Minear et al. 2010; Steve Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010). It was previously shown that Notch inactivation at early stages of repair leads to bone non-union while Notch inactivation in chondrocytes and osteoblasts does not significantly affect healing, confirming its role in SSPC differentiation before osteochondral commitment (Wang et al. 2016). Wnt was shown to be a critical driver of osteogenesis (Matsushita et al. 2020; Steve Minear et al. 2010; Steven Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010), as Wnt inhibition impairs bone formation and Wnt overactivation increases bone formation (Pinzone et al. 2009; Balemans and Van Hul 2007). The role of Wnt is specific to osteogenic engagement as Wnt inhibition promotes chondrogenesis (Hsieh et al. 2023; C.-L. Wu et al. 2021; Ruscitto et al. 2023). A study by Lee et al. recently confirmed the successive activation and crosstalk of Notch and Wnt pathways during osteogenic differentiation of SSPCs during bone healing (Lee et al. 2021). They showed a peak of Notch activation at day 3 post-injury followed by a progressive decrease that parallels an increase of Wnt signaling inducing osteogenic differentiation. These studies correlate with the sequential activation of Notch and Wnt observed in our snRNAseq analyses. Our analyses now reveal how this sequential activation of Notch and Wnt relates to the fibrogenic and osteogenic phases of SSPC differentiation, respectively. We clarified this in the discussion and added the references above to support our claims. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) The manuscript is well-written overall. However, the authors often oversimplify outcomes and overstate the results. Some of the statements (delineated below) need to be recalibrated to be in line with the presented data. 

      In addition to the suggested conclusions, we also toned down the following ones to avoid overstating our results:

      Line 24: suggesting a crucial paracrine role of this transient IIFC population

      Line 227: suggesting their central role in mediating cell interactions after fracture

      Line 243: IIFCs produce paracrine factors that can regulate SSPCs

      - Line 77 (86): The authors should add "might" before "correspond to". 

      We provided new sets of data including CFU experiments and transplantation assays to reinforce our conclusion. We replaced “correspond to” with “encompass”.

      - Line 102: SSPCs are obviously not "absent" in day 3 snRNAseq (Figure 2d). The percentage dropped (only) 75%, according to Figure 2e, which is far from disappearance. Overall, immunohistochemical staining is often dichotomous with snRNAseq designations. The authors should more carefully describe the results. 

      We agree that our statement may not reflect the data shown, as we observe a strong decrease in the percentage of cells in SSPC clusters but still detect a few cells in these clusters. However, when we looked at the presence of Sca1+ Pi16+ cells at different time points, we confirmed the absence of cells expressing SSPC signature genes (Sca1, Pi16, Cd34) at day 3 post-injury. Due to the clustering resolution of the combined integration, some cells in the SSPC clusters might not be Sca1+ Pi16+. We now show these results in Fig. 4 – Supplementary Figure 2. We changed the text accordingly (line 120): “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2)”.

      - Line 134: The authors need to clearly state that GFP+IIFCs were isolated based on Prx1CreGFP+CD146-. The authors did not clearly demonstrate the relationship between POSTN+ cells and CD146- cells, which poses concerns about the interpretation of transplantation experiments. 

      As mentioned above in response to reviewer 1-public review, we have clarified and provided additional information on our strategy to isolate SSPCs and IIFCs. We used the Prx1Cre; R26mTmG mice to mark all SSPCs and their derivatives with the GFP reporter in order to trace these populations after cell grafting. In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells. We sorted GFP+Sca1+ cells to exclude endothelial cells. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detect IIFCs but no SSPCs, chondrocytes or osteoblasts at this time point. However, we also detected pericytes that can be Prx1-derived. To eliminate potential pericyte contamination, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrate the purity of SSPC and IIFC isolation by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, which confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E). We made the following changes in the text (line 153): “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, that correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1)”.

      - Line 211: It is obvious from Figure 8F that ligand expression was not "specific" to the IIFC phase. The data only shows a slight enrichment of ligand score. 

      We corrected the text to “ligand expression was increased during the IIFC phase”.

      (2) Some of the computational predictions are incongruent with the known lineage trajectory. For example, in vivo lineage tracing experiments, including but not limited to, PLoS Genet. 2014. 10:e1004820, demonstrate that some of the chondrocytes within fracture callus can differentiate into osteoblasts. This is incompatible with the authors' conclusion that osteoblasts and chondrocytes represent two different terminal stages of cell differentiation in fracture healing. How do the authors reconcile this apparent inconsistency? 

      In this manuscript, we generated datasets corresponding to the initial stages of bone repair until day 7 post-injury. Therefore, our analyses encompass SSPC activation stages and engagement into osteogenesis and chondrogenesis. The results show that a portion of osteoblasts in the fracture callus differentiate directly from IIFCs via intramembranous ossification. The reviewer is correct to mention that osteoblasts have also been shown to derive from transdifferentiation of chondrocytes, which occurs at later stages of repair during the active phase of endochondral ossification (Julien et al. 2020; Aghajanian and Mohan 2018; Zhou et al. 2014; Hu et al. 2017). This process of chondrocyte to osteoblast transdifferentiation is not represented in our integrated dataset and capturing it may require adding later time points. However, when we analyzed the days 5 and 7 datasets independently of days 0 and 3, we were able to identify a cluster of hypertrophic chondrocytes (expressing Col10a1) connecting the clusters of chondrocytes and osteoblasts. This suggests that in this cluster, hypertrophic chondrocytes are undergoing transdifferentiation into osteoblasts, as shown in Author response image 1. Additional time points are needed in a future study to perform in-depth analyses of chondrocyte transdifferentiation. 

      Author response image 1.

      Periosteum-derived chondrocytes undergo cartilage to bone transformation. A. UMAP projection of the subset of SSPCs, IIFCs, osteoblasts and chondrocytes in the integration of days 5 and 7 post-fracture datasets. B. Feature plots of Acan, Col10a1 and Ibsp expression.  C. UMAP projection separated by time points. D. Percentage of cells in the hypertrophic/differentiating chondrocyte cluster.

      (3) The authors did not cite some of the studies that described the roles of Notch signaling in fracture healing, for example, J Bone Miner Res. 2014. 29:1283-94. The authors should test the specificity of Notch signaling activities to IIFCs (POSTN+ cells) in vivo. 

      The role of Notch in the activation of SSPCs during bone repair has been investigated in several studies (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017). Notch dynamics were previously described, with a peak at day 3 post-injury before a reduction when cells engage in osteogenesis and chondrogenesis (Lee et al. 2021; Dishowitz et al. 2012; Matthews et al. 2014). Notch plays a role in the early steps of SSPC activation prior to osteochondral differentiation, as Notch inactivation in chondrocytes and osteoblasts does not affect bone repair (Wang et al. 2016). We added the references listed above to emphasize the correlation between our results and previous reports on the role of Notch and made changes in the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions 

      (1) This research utilized snRNA seq for the basic hypothesis formation; however, the number of nuclei acquired was quite limited. Therefore, please explain the rationale for employing snRNA seq instead of scRNA seq, which includes cytoplasm, and additionally provide the markers used for cell type mapping in the scRNA analysis.  

      As mentioned in our response to reviewer #1 above, we analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPCs/fibrogenic cells, which are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform scRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single cell datasets containing fewer cells reached major conclusions, including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, and around 175 in Remark et al. 2023).

      Several studies have shown that snRNAseq provides data quality equivalent to scRNAseq in terms of cell type identification, number of detected genes and downstream analyses (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021). While snRNAseq does not allow the detection of cytoplasmic RNA, there are several advantages to using this technique:

      (1) better representation of the cell types. To perform scRNAseq, a step of enzymatic digestion is needed. This usually leads to an overrepresentation of cell types loosely attached to the ECM (immune cells, endothelial cells) and a reduced representation of cell types strongly attached to the ECM, such as chondrocytes and osteoblasts. In addition, large or multinucleated cells like hypertrophic chondrocytes and osteoclasts are too big to be sorted and encapsulated using 10X technology. Here, we optimized a protocol to mechanically isolate nuclei from dissected tissues that allows us to capture the diversity of cell types in periosteum and fracture callus.

      (2) higher recovery of nuclei. We performed both isolation of cells and nuclei from periosteum in our study and observed that nuclei extraction is the most efficient way to isolate cells from the periosteum and the fracture callus.

      (3) reduction of isolation time and cell stress. Previous studies showed that enzymatic digestion causes cell stress and induces stem cell activation (Machado et al. 2021; van den Brink et al. 2017). Therefore, we decided to perform snRNAseq to analyze the transcriptome of the intact periosteum without digestion-induced bias.

      We added this sentence in the result section: “Single nuclei transcriptomics was shown to provide results equivalent to single cell transcriptomics, but with better cell type representation and reduced digestion-induced stress response (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021)”.

      The list of genes used for cell type mapping is presented in Figure 3 – Supplementary figure 1. We added a detailed dot plot as Figure 3 – Supplementary figure 2.

      (2) During the fracture healing process of long bones, the influx of fibroblasts is a relatively common occurrence, and the fibrous callus that forms during bone repair and regeneration is reported to disappear over time. Therefore, inferring that IIFC differentiates into osteo- and chondrogenic cells based solely on their simultaneous appearance in the same time and space is challenging. More detailed validation is necessary, beyond what is supported by bioinformatics analysis. 

      The first step of bone repair is the formation of a fibrous callus, before cartilage and bone formation. There are no data in the literature demonstrating that an influx of fibroblasts occurs at the fracture site. Several studies now show that cells involved in callus formation are recruited locally (i.e. from the bone marrow, the periosteum and the skeletal muscle surrounding the fracture site) (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Julien et al. 2022; Matthews et al. 2021). The contribution of locally activated SSPCs to the fibrous callus is less well understood. Lineage tracing shows that GFP+ cell populations traced in Prx1Cre-GFP mice include SSPCs, IIFCs, chondrocytes and osteoblasts.

      The timing of the cell trajectories observed in our dataset correlates with the timing of callus formation previously described in the literature: day 3 post-fracture mostly contains IIFCs, while chondrocytes and osteoblasts appear from day 5 post-fracture. We conclude that IIFCs differentiate into osteochondrogenic cells based on multiple lines of evidence besides their simultaneous appearance in time and space:

      - In silico trajectory analyses identify a trajectory from SSPCs to osteochondrogenic cells via IIFCs. We added an analysis to show that our pseudotime trajectory parallels the timepoints of the dataset, confirming that the differentiation trajectory follows the timing of cell differentiation (Figure 5D).

      - We show that IIFCs start to express chondrogenic and osteogenic genes prior to engaging in chondrogenesis and osteogenesis. In addition, we detected activation of osteo- and chondrogenic-specific transcription factors in IIFCs. This shows a differentiation continuum between SSPCs, IIFCs, and osteochondrogenic cells (Figures 6-8).

      - Using a transplantation assay, we showed that IIFCs form cartilage and bone, reinforcing the osteochondrogenic potential of this population (Figure 6B).

      - IIFCs do not undergo apoptosis. We assessed the expression of apoptosis-related genes by IIFCs and did not detect expression. This was confirmed by cleaved caspase 3 immunostaining showing that a very low percentage of cells in the early fibrotic tissue undergo apoptosis. 
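
      As an illustration of the first point above, the agreement between pseudotime and sampling day can be summarized with a rank correlation. The sketch below is not the analysis code used in the study: the `days` and `pseudotime` values are hypothetical toy data, and Spearman's rho is simply one reasonable choice of statistic for this kind of check.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: sampling day and inferred pseudotime for 8 nuclei.
days = [0, 0, 3, 3, 5, 5, 7, 7]
pseudotime = [0.02, 0.10, 0.31, 0.28, 0.55, 0.62, 0.90, 0.97]

rho = spearman(days, pseudotime)
print(f"Spearman rho between sampling day and pseudotime: {rho:.3f}")
```

      A rho close to 1 would indicate that the inferred ordering of nuclei follows the experimental time course; a value near 0 would indicate that pseudotime is unrelated to sampling day.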

      Therefore, the idea that the initial fibrous callus is replaced by a new influx of SSPCs or committed progenitors is not supported by recent literature and is not observed in our dataset containing all cell types from the periosteum and fracture site. Overall, our bioinformatic analyses combined with our in vivo validation strongly support that IIFCs are differentiating into chondrocytes and osteoblasts during bone repair. Additional in vivo functional studies will aim to further validate the trajectory and investigate the critical factors regulating this process.

      (3) The influx of most osteogenic progenitors to the bone fracture site typically appears after postfracture day 7. It's essential to ascertain whether the osteogenic cells observed at the time of this study differentiated from IIFC or migrated from surrounding mesenchymal stem cells. 

      As mentioned above, there is no clear evidence in the literature indicating an influx of osteoprogenitors. Cells involved in callus formation are recruited locally and predominantly from the periosteum (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Matthews et al. 2021; Julien et al. 2022). Our datasets therefore include all cell populations that form the callus. Other sources of SSPCs include the surrounding muscle, which contributes mostly to cartilage, and bone marrow, which contributes a low percentage of the callus osteoblasts in the medullary cavity (Julien et al. 2021; Jeffery et al. 2022). We provide evidence that IIFCs give rise to osteogenic cells using our bioinformatic analyses and in vivo transplantation assay (listed in the response above). As indicated in our response to reviewer #1, the steps leading to osteogenic differentiation observed in our dataset reflect the first step of callus ossification and correspond to the process of intramembranous ossification (up to day 7 post-injury). Endochondral ossification also contributes to osteoblasts, including through the transdifferentiation of chondrocytes into osteoblasts (Julien et al. 2020; Zhou et al. 2014; Hu et al. 2017). While this process mostly occurs around day 14 post-fracture, we begin to detect this transition in our integrated day 5-day 7 dataset, as shown in Author response image 1.

      (4) It's crucial to determine whether the IIFC appearing at the fracture site contributes to the formation of the callus matrix or undergoes apoptosis during the fracture healing process.

      In the early steps of bone repair, the callus is mostly composed of extracellular matrix (ECM). IIFCs express high levels of ECM genes, including Postn, Aspn and collagens (Col3a1, Col5a1, Col8a1, Col12a1) (Figure 3 – Supplementary Figures 1-2 and Fig. 7 – Supplementary Figure 1B). IIFCs express the highest levels of matrix-related genes compared to the other cell types in the fracture environment (e.g. immune cells, endothelial cells, Schwann cells and pericytes), as now shown in Fig. 7 – Supplementary Figure 1A. Therefore, IIFCs are the main contributors to the callus matrix.

      We investigated if IIFCs undergo apoptosis. We observed that only a low percentage of IIFCs express apoptosis-related genes and are positive for cleaved caspase 3 immunostaining at days 3, 5 and 7 of bone repair. This shows that IIFCs do not undergo apoptosis and reinforces our model in which IIFCs further differentiate into osteoblasts and chondrocytes. We added these data in Fig. 7 – Supplementary Figure 2 and added the sentence in the results section “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).” 

      (5) Results from the snRNA seq highlight the paracrine role of IIFC, and verification is needed to ensure that the effect this has on surrounding osteogenic lineages is not misinterpreted.  

      To assess cell-cell interactions, we used tools such as Connectome and CellChat to infer and quantify intercellular communication networks between cell types. Studies showed the robustness of these tools combined with in vivo validation (Sinha et al. 2022; Alečković et al. 2022; Li et al. 2023). Here we used these tools to illustrate the paracrine profile of IIFCs, but in vivo validation would be required using gene inactivation to assess the requirement of individual paracrine factors. We performed extensive analyses of the crosstalk between immune cells and SSPCs using our dataset in another study combined with in vivo validation, showing the robustness of the tool and the dataset (Hachemi et al. 2024). We adjusted our conclusions to reflect our analyses: “suggesting a crucial paracrine role of this transient IIFC population during fracture healing”, “suggesting their central role in mediating cell interactions after fracture”, “suggesting that SSPCs can receive signals from IIFC”. 
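
      To make this kind of inference concrete, the sketch below scores a ligand-receptor pair as the product of the mean ligand expression in the sending population and the mean receptor expression in the receiving population. This is a deliberately simplified illustration and not the algorithm implemented in Connectome or CellChat; the gene names (`LigandA`, `ReceptorA`) and expression values are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical normalized expression values per nucleus, grouped by cell type.
expression = {
    "IIFC": {"LigandA": [2.0, 3.0, 2.5], "ReceptorA": [0.1, 0.0, 0.2]},
    "SSPC": {"LigandA": [0.2, 0.1, 0.0], "ReceptorA": [1.0, 1.5, 2.0]},
}


def interaction_score(sender, receiver, ligand, receptor):
    """Mean ligand expression in the sender times mean receptor expression in the receiver."""
    ligand_level = mean(expression[sender][ligand])
    receptor_level = mean(expression[receiver][receptor])
    return ligand_level * receptor_level


# Directional scores: the same gene pair can score very differently in each direction.
iifc_to_sspc = interaction_score("IIFC", "SSPC", "LigandA", "ReceptorA")
sspc_to_iifc = interaction_score("SSPC", "IIFC", "LigandA", "ReceptorA")

print(f"IIFC -> SSPC: {iifc_to_sspc:.2f}")
print(f"SSPC -> IIFC: {sspc_to_iifc:.2f}")
```

      A high score in one direction only suggests that a population could act as a paracrine source for another; as noted above, such a score is an inference from expression and does not replace in vivo validation.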

      References

      Aghajanian, Patrick, and Subburaman Mohan. 2018. “The Art of Building Bone: Emerging Role of Chondrocyte-to-Osteoblast Transdifferentiation in Endochondral Ossification”. Bone Research 6 (1): 19. https://doi.org/10.1038/s41413-018-0021-z.

      Alečković, Maša, Simona Cristea, Carlos R. Gil Del Alcazar, Pengze Yan, Lina Ding, Ethan D. Krop, Nicholas W. Harper, et al. 2022. “Breast Cancer Prevention by Short-Term Inhibition of TGFβ Signaling”. Nature Communications 13 (1): 7558. https://doi.org/10.1038/s41467-022-35043-5.

      Ambrosi, Thomas H., Owen Marecic, Adrian McArdle, Rahul Sinha, Gunsagar S. Gulati, Xinming Tong, Yuting Wang, et al. 2021. “Aged Skeletal Stem Cells Generate an Inflammatory Degenerative Niche”. Nature 597 (7875): 256‑62. https://doi.org/10.1038/s41586-021-03795-7.

      Baccin, Chiara, Jude Al-Sabah, Lars Velten, Patrick M. Helbling, Florian Grünschläger, Pablo Hernández-Malmierca, César Nombela-Arrieta, Lars M. Steinmetz, Andreas Trumpp, and Simon Haas. 2020. “Combined Single-Cell and Spatial Transcriptomics Reveal the Molecular, Cellular and Spatial Bone Marrow Niche Organization”. Nature Cell Biology 22 (1): 38‑48. https://doi.org/10.1038/s41556-019-0439-6.

      Balemans, Wendy, and Wim Van Hul. 2007. “The Genetics of Low-Density Lipoprotein Receptor-Related Protein 5 in Bone: A Story of Extremes”. Endocrinology 148 (6): 2622‑29. https://doi.org/10.1210/en.2006-1352.

      Brink, Susanne C van den, Fanny Sage, Ábel Vértesy, Bastiaan Spanjaard, Josi Peterson-Maduro, Chloé S Baron, Catherine Robin, and Alexander van Oudenaarden. 2017. “Single-Cell Sequencing Reveals Dissociation-Induced Gene Expression in Tissue Subpopulations”. Nature Methods 14 (10): 935‑36. https://doi.org/10.1038/nmeth.4437.

      Cao, Junjie, Yalin Wei, Jing Lian, Lunyun Yang, Xiaoyan Zhang, Jiaying Xie, Qiang Liu, Jinyong Luo, Baicheng He, and Min Tang. 2017. “Notch Signaling Pathway Promotes Osteogenic Differentiation of Mesenchymal Stem Cells by Enhancing BMP9/Smad Signaling”. International Journal of Molecular Medicine 40 (2): 378‑88. https://doi.org/10.3892/ijmm.2017.3037.

      Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Single-Cell Transcriptional Landscape of Mammalian Organogenesis”. Nature 566 (7745): 496‑502. https://doi.org/10.1038/s41586-019-0969-x.

      Colnot, Céline. 2009. “Skeletal Cell Fate Decisions Within Periosteum and Bone Marrow During Bone Regeneration”. Journal of Bone and Mineral Research 24 (2): 274‑82. https://doi.org/10.1359/jbmr.081003.

      Debnath, Shawon, Alisha R. Yallowitz, Jason McCormick, Sarfaraz Lalani, Tuo Zhang, Ren Xu, Na Li, et al. 2018. “Discovery of a Periosteal Stem Cell Mediating Intramembranous Bone Formation”. Nature 562 (7725): 133‑39. https://doi.org/10.1038/s41586-018-0554-8.

      Ding, Jiarui, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, et al. 2020. “Systematic Comparison of Single-Cell and Single-Nucleus RNA-Sequencing Methods”. Nature Biotechnology 38 (6): 737‑46. https://doi.org/10.1038/s41587-020-0465-8.

      Dishowitz, Michael I., Shawn P. Terkhorn, Sandra A. Bostic, and Kurt D. Hankenson. 2012. “Notch Signaling Components Are Upregulated during Both Endochondral and Intramembranous Bone Regeneration”. Journal of Orthopaedic Research 30 (2): 296‑303. https://doi.org/10.1002/jor.21518.

      Duchamp de Lageneste, Oriane, Anaïs Julien, Rana Abou-Khalil, Giulia Frangi, Caroline Carvalho, Nicolas Cagnard, Corinne Cordier, Simon J. Conway, and Céline Colnot. 2018. “Periosteum Contains Skeletal Stem Cells with High Bone Regenerative Potential Controlled by Periostin”. Nature Communications 9 (1): 773. https://doi.org/10.1038/s41467-018-03124-z.

      Hsieh, Chen-Chan, B. Linju Yen, Chia-Chi Chang, Pei-Ju Hsu, Yu-Wei Lee, Men-Luh Yen, Shaw-Fang Yet, and Linyi Chen. 2023. “Wnt Antagonism without TGFβ Induces Rapid MSC Chondrogenesis via Increasing AJ Interactions and Restricting Lineage Commitment”. iScience 26 (1): 105713. https://doi.org/10.1016/j.isci.2022.105713.

      Hu, Diane P., Federico Ferro, Frank Yang, Aaron J. Taylor, Wenhan Chang, Theodore Miclau, Ralph S. Marcucio, and Chelsea S. Bahney. 2017. “Cartilage to Bone Transformation during Fracture Healing Is Coordinated by the Invading Vasculature and Induction of the Core Pluripotency Genes”. Development 144 (2): 221‑34. https://doi.org/10.1242/dev.130807.

      Jeffery, Elise C., Terry L.A. Mann, Jade A. Pool, Zhiyu Zhao, and Sean J. Morrison. 2022. “Bone Marrow and Periosteal Skeletal Stem/Progenitor Cells Make Distinct Contributions to Bone Maintenance and Repair”. Cell Stem Cell 29 (11): 1547-1561.e6. https://doi.org/10.1016/j.stem.2022.10.002.

      Julien, Anais, Anuya Kanagalingam, Ester Martínez-Sarrà, Jérome Megret, Marine Luka, Mickaël Ménager, Frédéric Relaix, and Céline Colnot. 2021. “Direct contribution of skeletal muscle mesenchymal progenitors to bone repair”. Nature Communications 12 (1): 2860. https://doi.org/10.1038/s41467-021-22842-5.

      Julien, Anais, Simon Perrin, Oriane Duchamp de Lageneste, Caroline Carvalho, Morad Bensidhoum, Laurence Legeai-Mallet, and Céline Colnot. 2020. “FGFR3 in Periosteal Cells Drives Cartilage-to-Bone Transformation in Bone Repair”. Stem Cell Reports 15 (4): 955‑67. https://doi.org/10.1016/j.stemcr.2020.08.005.

      Julien, Anais, Simon Perrin, Ester Martínez-Sarrà, Anuya Kanagalingam, Caroline Carvalho, Marine Luka, Mickaël Ménager, and Céline Colnot. 2022. “Skeletal Stem/Progenitor Cells in Periosteum and Skeletal Muscle Share a Common Molecular Response to Bone Injury”. Journal of Bone and Mineral Research, June, jbmr.4616. https://doi.org/10.1002/jbmr.4616.

      Kang, Sona, Christina N. Bennett, Isabelle Gerin, Lauren A. Rapp, Kurt D. Hankenson, and Ormond A. MacDougald. 2007. “Wnt Signaling Stimulates Osteoblastogenesis of Mesenchymal Precursors by Suppressing CCAAT/Enhancer-Binding Protein α and Peroxisome Proliferator-Activated Receptor γ”. Journal of Biological Chemistry 282 (19): 14515‑24. https://doi.org/10.1074/jbc.M700030200.

      Komatsu, David E., Michelle N. Mary, Robert Jason Schroeder, Alex G. Robling, Charles H. Turner, and Stuart J. Warden. 2010. “Modulation of Wnt Signaling Influences Fracture Repair”. Journal of Orthopaedic Research 28 (7): 928‑36. https://doi.org/10.1002/jor.21078.

      Hachemi, Yasmine, Simon Perrin, Maria Ethel, Anais Julien, Julia Vettese, Blandine Geisler, Christian Göritz, and Céline Colnot. 2024. “Multimodal Analyses of Immune Cells during Bone Repair Identify Macrophages as a Therapeutic Target in Musculoskeletal Trauma”. https://doi.org/10.1101/2024.04.29.591608.

      Kraus, Jessica M., Dion Giovannone, Renata Rydzik, Jeremy L. Balsbaugh, Isaac L. Moss, Jennifer L. Schwedler, Julien Y. Bertrand, et al. 2022. “Notch Signaling Enhances Bone Regeneration in the Zebrafish Mandible”. Development 149 (5): dev199995. https://doi.org/10.1242/dev.199995.

      Lee, S., L. H. Remark, A. M. Josephson, K. Leclerc, E. Muiños Lopez, D. J. Kirby, Devan Mehta, et al. 2021. “Notch-Wnt Signal Crosstalk Regulates Proliferation and Differentiation of Osteoprogenitor Cells during Intramembranous Bone Healing”. Npj Regenerative Medicine 6 (1): 29. https://doi.org/10.1038/s41536-021-00139-x.

      Li, Jiaoduan, Dongyan Cao, Lixin Jiang, Yiwen Zheng, Siyuan Shao, Ai Zhuang, and Dongxi Xiang. 2023. “ITGB2-ICAM1 Axis Promotes Liver Metastasis in BAP1-Mutated Uveal Melanoma with Retained Hypoxia and ECM Signatures”. Cellular Oncology (Dordrecht), December. https://doi.org/10.1007/s13402-023-00908-4.

      Logan, Malcolm, James F. Martin, Andras Nagy, Corrinne Lobe, Eric N. Olson, and Clifford J. Tabin. 2002. “Expression of Cre Recombinase in the Developing Mouse Limb Bud Driven by a Prxl Enhancer”. Genesis 33 (2): 77‑80. https://doi.org/10.1002/gene.10092.

      Machado, Léo, Perla Geara, Jordi Camps, Matthieu Dos Santos, Fatima Teixeira-Clerc, Jens Van Herck, Hugo Varet, et al. 2021. “Tissue Damage Induces a Conserved Stress Response That Initiates Quiescent Muscle Stem Cell Activation”. Cell Stem Cell 28 (6): 1125-1135.e7. https://doi.org/10.1016/j.stem.2021.01.017.

      Matsushita, Yuki, Mizuki Nagata, Kenneth M. Kozloff, Joshua D. Welch, Koji Mizuhashi, Nicha Tokavanich, Shawn A. Hallett, et al. 2020. “A Wnt-Mediated Transformation of the Bone Marrow Stromal Cell Identity Orchestrates Skeletal Regeneration”. Nature Communications 11 (1): 332. https://doi.org/10.1038/s41467-019-14029-w.

      Matthews, Brya G, Danka Grcevic, Liping Wang, Yusuke Hagiwara, Hrvoje Roguljic, Pujan Joshi, Dong-Guk Shin, Douglas J Adams, and Ivo Kalajzic. 2014. “Analysis of αSMA-Labeled Progenitor Cell Commitment Identifies Notch Signaling as an Important Pathway in Fracture Healing”. Journal of Bone and Mineral Research 29 (5): 1283‑94. https://doi.org/10.1002/jbmr.2140.

      Matthews, Brya G, Sanja Novak, Francesca V Sbrana, Jessica L Funnell, Ye Cao, Emma J Buckels, Danka Grcevic, and Ivo Kalajzic. 2021. “Heterogeneity of Murine Periosteum Progenitors Involved in Fracture Healing”. eLife 10 (February): e58534. https://doi.org/10.7554/eLife.58534.

      Minear, Steve, Philipp Leucht, Samara Miller, and Jill A Helms. 2010. “rBMP Represses Wnt Signaling and Influences Skeletal Progenitor Cell Fate Specification during Bone Repair”. Journal of Bone and Mineral Research 25 (6): 1196‑1207. https://doi.org/10.1002/jbmr.29.

      Minear, Steven, Philipp Leucht, Jie Jiang, Bo Liu, Arial Zeng, Christophe Fuerer, Roel Nusse, and Jill A. Helms. 2010. “Wnt Proteins Promote Bone Regeneration”. Science Translational Medicine 2 (29). https://doi.org/10.1126/scitranslmed.3000231.

      Novak, Sanja, Emilie Roeder, Benjamin P. Sinder, Douglas J. Adams, Chris W. Siebel, Danka Grcevic, Kurt D. Hankenson, Brya G. Matthews, and Ivo Kalajzic. 2020. “Modulation of Notch1 Signaling Regulates Bone Fracture Healing”. Journal of Orthopaedic Research 38 (11): 2350‑61. https://doi.org/10.1002/jor.24650.

      Pinzone, Joseph J., Brett M. Hall, Nanda K. Thudi, Martin Vonau, Ya-Wei Qiang, Thomas J. Rosol, and John D. Shaughnessy. 2009. “The Role of Dickkopf-1 in Bone Development, Homeostasis, and Disease”. Blood 113 (3): 517‑25. https://doi.org/10.1182/blood-2008-03-145169.

      Remark, Lindsey H., Kevin Leclerc, Malissa Ramsukh, Ziyan Lin, Sooyeon Lee, Backialakshmi Dharmalingam, Lauren Gillinov, et al. 2023. “Loss of Notch Signaling in Skeletal Stem Cells Enhances Bone Formation with Aging”. Bone Research 11 (1): 50. https://doi.org/10.1038/s41413-023-00283-8.

      Ruscitto, Angela, Peng Chen, Ikue Tosa, Ziyi Wang, Gan Zhou, Ingrid Safina, Ran Wei, et al. 2023. “Lgr5-Expressing Secretory Cells Form a Wnt Inhibitory Niche in Cartilage Critical for Chondrocyte Identity”. Cell Stem Cell 30 (9): 1179-1198.e7. https://doi.org/10.1016/j.stem.2023.08.004.

      Selewa, Alan, Ryan Dohn, Heather Eckart, Stephanie Lozano, Bingqing Xie, Eric Gauchat, Reem Elorbany, et al. 2020. “Systematic Comparison of High-Throughput Single-Cell and Single-Nucleus Transcriptomes during Cardiomyocyte Differentiation”. Scientific Reports 10 (1): 1535. https://doi.org/10.1038/s41598-020-58327-6.

      Sinha, Sarthak, Holly D. Sparks, Elodie Labit, Hayley N. Robbins, Kevin Gowing, Arzina Jaffer, Eren Kutluberk, et al. 2022. “Fibroblast Inflammatory Priming Determines Regenerative versus Fibrotic Skin Repair in Reindeer”. Cell 185 (25): 4717-4736.e25. https://doi.org/10.1016/j.cell.2022.11.004.

      Wang, Cuicui, Jason A. Inzana, Anthony J. Mirando, Yinshi Ren, Zhaoyang Liu, Jie Shen, Regis J. O’Keefe, Hani A. Awad, and Matthew J. Hilton. 2016. “NOTCH Signaling in Skeletal Progenitors Is Critical for Fracture Repair”. The Journal of Clinical Investigation 126 (4): 1471‑81. https://doi.org/10.1172/JCI80672.

      Wen, Fei, Xiaojie Tang, Lin Xu, and Haixia Qu. 2022. “Comparison of Single‑nucleus and Single‑cell Transcriptomes in Hepatocellular Carcinoma Tissue”. Molecular Medicine Reports 26 (5): 339. https://doi.org/10.3892/mmr.2022.12855.

      Wu, Chia-Lung, Amanda Dicks, Nancy Steward, Ruhang Tang, Dakota B. Katz, Yun-Rak Choi, and Farshid Guilak. 2021. “Single Cell Transcriptomic Analysis of Human Pluripotent Stem Cell Chondrogenesis”. Nature Communications 12 (1): 362. https://doi.org/10.1038/s41467-020-20598-y.

      Wu, Haojia, Yuhei Kirita, Erinn L. Donnelly, and Benjamin D. Humphreys. 2019. “Advantages of Single-Nucleus over Single-Cell RNA Sequencing of Adult Kidney: Rare Cell Types and Novel Cell States Revealed in Fibrosis”. Journal of the American Society of Nephrology 30 (1): 23‑32. https://doi.org/10.1681/ASN.2018090912.

      Zhong, Leilei, Lutian Yao, Robert J. Tower, Yulong Wei, Zhen Miao, Jihwan Park, Rojesh Shrestha, et al. 2020. “Single Cell Transcriptomics Identifies a Unique Adipose Lineage Cell Population That Regulates Bone Marrow Environment”. eLife 9 (April): e54695. https://doi.org/10.7554/eLife.54695.

      Zhou, Xin, Klaus von der Mark, Stephen Henry, William Norton, Henry Adams, and Benoit de Crombrugghe. 2014. “Chondrocytes Transdifferentiate into Osteoblasts in Endochondral Bone during Development, Postnatal Growth and Fracture Healing in Mice”. Edited by Matthew L. Warman. PLoS Genetics 10 (12): e1004820. https://doi.org/10.1371/journal.pgen.1004820.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and we will review our primary sources to clarify the trait classifications. We will also reclassify the species according to the expertise of this reviewer and perform our analysis again. 

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, and will create a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models (SDMs), Joint Species Distribution Models, or various kinds of Occupancy Models. However, none of these approaches leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict things well; occupancy models often perform less well than SDMs and do not capture how things change over time; Briscoe et al. 2021, Global Change Biology). Employing such techniques would make all conclusions speculative rather than directly observable.

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature and that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models showing that range and phenology changes, respectively, are contrary to the expectations that arise from differences in sampling. 100 km quadrats are a reasonable “middle ground” in terms of the effects of sampling, and we will add a reference to the methods section to clarify this.

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We will replace the terms specialist and generalist with specific predictions based on traits.

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We will review this text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers.

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of conservation concern (e.g. Duchenne et al. 2021, Ecology Letters). We will revise the discussion to acknowledge potential differences in outcomes.

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 
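
      The null models themselves are reported in Figures S1 and S2 and are not reproduced in this response. As a rough illustration of the general idea only, the sketch below shows one generic permutation-style null: records from both periods are pooled and randomly reassigned to periods of the original sizes, so any resulting shift in the northern limit reflects sample size alone. The function names, the choice of "mean of the 10 northernmost records" as the range limit, and the number of draws are all assumptions made for this example.

```python
import random


def northern_limit(latitudes, top_n=10):
    """Mean latitude of the top_n northernmost records (an assumed definition)."""
    top = sorted(latitudes, reverse=True)[:top_n]
    return sum(top) / len(top)


def null_shift_distribution(early, late, n_draws=1000, top_n=10, seed=0):
    """Northern-limit shifts expected from sampling effort alone.

    Period labels are shuffled while keeping the number of records per
    period fixed, so a larger late-period sample can still push the
    apparent limit north by chance.
    """
    rng = random.Random(seed)
    pooled = list(early) + list(late)
    shifts = []
    for _ in range(n_draws):
        rng.shuffle(pooled)
        fake_early = pooled[:len(early)]
        fake_late = pooled[len(early):]
        shifts.append(northern_limit(fake_late, top_n) - northern_limit(fake_early, top_n))
    return shifts


def fraction_at_least(null_shifts, observed_shift):
    """Share of null draws with a shift at least as large as the observed one."""
    return sum(s >= observed_shift for s in null_shifts) / len(null_shifts)
```

      An observed shift that exceeds nearly all null draws would not be explained by increased sampling under these assumptions.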

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or separately to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel, and we failed to remove the reference to that panel when we simplified the figure. 

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, the captions and results in particular contain many inconsistencies and lack important detail. Given the multitude of relationships that were tested (the influence of traits), the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We will revise the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them. We will carefully review all figures and captions, and we will make changes to improve the clarity of the text and the presentation of results.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Mäkelä et al. presents compelling experimental evidence that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. Specifically, the authors demonstrate that upon inhibition of DNA replication the single-cell growth rate continuously decreases, in direct proportion to the concentration of active ribosomes, as measured indirectly by single-particle tracking. The decrease of ribosomal activity with filamentation, in turn, is likely caused by a decrease of the concentration of mRNAs, as suggested by an observed plateau of the total number of active RNA polymerases. These observations are compatible with the hypothesis that DNA limits the total rate of transcription and thus translation. The authors also demonstrate that the decrease of RNAp activity is independent of two candidate stress response pathways, the SOS stress response and the stringent response, as well as an anti-sigma factor previously implicated in variations of RNAp activity upon variations of nutrient sources.

      Remarkably, the reduction of growth rate is observed soon after the inhibition of DNA replication, suggesting that the amount of DNA in wild-type cells is tuned to provide just as much substrate for RNA polymerase as needed to saturate most ribosomes with mRNAs. While previous studies of bacterial growth have most often focused on ribosomes and metabolic proteins, this study provides important evidence that chromosomal DNA has a previously underestimated important and potentially rate-limiting role for growth. 

      Thank you for the excellent summary of our work.

      Strengths: 

      This article links the growth of single cells to the amount of DNA, the number of active ribosomes and to the number of RNA polymerases, combining quantitative experiments with theory. The correlations observed during depletion of DNA, notably in M9gluCAA medium, are compelling and point towards a limiting role of DNA for transcription and subsequently for protein production soon after reduction of the amount of DNA in the cell. The article also contains a theoretical model of transcription-translation that contains a Michaelis-Menten type dependency of transcription on DNA availability and is fit to the data. While the model fits well with the continuous reduction of relative growth rate in rich medium (M9gluCAA), the behavior in minimal media without casamino acids is a bit less clear (see comments below). 

      At a technical level, single-cell growth experiments and single-particle tracking experiments are well described, suggesting that different diffusive states of molecules represent different states of RNAp/ribosome activities, which reflect the reduction of growth. However, I still have a few points about the interpretation of the data and the measured fractions of active ribosomes (see below). 

      Apart from correlations in DNA-deplete cells, the article also investigates the role of candidate stress response pathways for reduced transcription, demonstrating that neither the SOS nor the stringent response are responsible for the reduced rate of growth. Equally, the anti-sigma factor Rsd recently described for its role in controlling RNA polymerase activity in nutrient-poor growth media, seems also not involved according to mass-spec data. While other (unknown) pathways might still be involved in reducing the number of active RNA polymerases, the proposed hypothesis of the DNA substrate itself being limiting for the total rate of transcription is appealing. 

      Finally, the authors confirm the reduction of growth in the distant Caulobacter crescentus, which lacks overlapping rounds of replication and could thus have shown a different dependency on DNA concentration. 

      Weaknesses: 

      There are a range of points that should be clarified or addressed, either by additional experiments/analyses or by explanations or clear disclaimers. 

      First, the continuous reduction of growth rate upon arrest of DNA replication initiation observed in rich growth medium (M9gluCAA) is not equally observed in poor media. Instead, the relative growth rate is immediately/quickly reduced by about 10-20% and then maintained for long times, as if the arrest of replication initiation had an immediate effect but would then not lead to saturation of the DNA substrate. In particular, the long plateau of a constant relative growth rate in M9ala is difficult to reconcile with the model fit in Fig 4S2. Is it possible that DNA is not limiting in poor media (at least not for the cell sizes studied here) while replication arrest still elicits a reduction of growth rate in a different way? Might this have something to do with the naturally much higher oscillations of DNA concentration in minimal medium?

      The reviewer is correct that there are interesting differences between nutrient-rich and -poor conditions. They were originally noted in the discussion, but we understand how our original presentation made it confusing. We reorganized the text and figures to better explain our results and interpretations. In the revised manuscript, the data related to the poor media are now presented separately (new Figure 6) from the data related to the rich medium (Figures 1-3). The total RNAP activity (abundance x active fraction) is significantly reduced in poor media (Figure 6A-B) similarly to rich medium (Figure 3H). Thus, DNA is limiting for transcription across conditions. However, the total ribosome activity in poor media (Figure 6C-D) and thus the growth rate (Figure 6E-F) were less affected in comparison to rich media (Figure 2H and 1C). Our interpretation of these results is that while DNA is limiting for transcription in all tested nutrient conditions (as shown by the total active RNAP data), post-transcriptional buffering activities compensate for the reduction in transcription in poor media, thereby maintaining a better scaling of growth rates under DNA limitation. 

      The authors argue that DNA becomes limiting in the range of physiological cell sizes, in particular for M9glCAA (Fig. 1BC). It would be helpful to know by how much (fold-change) the DNA concentration is reduced below wild-type (or multi-N) levels at t=0 in Fig 1B and how DNA concentration decays with time or cell area, to get a sense by how many-fold DNA is essentially 'overexpressed/overprovided' in wild-type cells. 

      We now provide crude estimates in the Discussion section. The revised text reads: “Crude estimations suggest that ≤ 40% DNA dilution is sufficient to negatively affect transcription (total RNAP activity) in M9glyCAAT, whereas the same effect was observed after less than 10% dilution in nutrient-poor media (M9gly or M9ala) (see Materials and Methods).” We obtained these numbers based on calculations and estimates described in the Materials and Methods section and Appendix 1 (Appendix 1 – Table 1).

      Fig. 2: The distribution of diffusion coefficients of RpsB is fit to Gaussians on the log scale. Is this based on a model or on previous work or simply an empirical fit to the data? An exact analytical model for the distribution of diffusion constants can be found in the tool anaDDA by Vink, ..., Hohlbein Biophys J 2020. Alternatively, distributions of displacements are expressed analytically in other tools (e.g., in SpotOn). 

      We use an empirical fit of a three-state Gaussian mixture model (GMM) to the data and extract the fractions of molecules in each state. This avoids making too many assumptions about the underlying processes, e.g. a Markovian system with Brownian diffusion. The model in anaDDA (Vink et al.) is currently limited to two transitioning states with a maximal step number of 8 steps per track for a computationally efficient solution (longer tracks are truncated). Using a short subset of the trajectories is less accurate than using the entire trajectory and because of this, we consider full tracks with at least 9 displacements. Meanwhile, Spot-On supports a three-state model but it is still based on a semi-analytical model with a pre-calculated library of parameters created by fitting of simulated data. Neither of these models considers the effect of cell confinement, which plays a major role in single-molecule diffusion in small-sized cells such as bacteria. For these reasons, we opted to use an empirical fit to the data. We note that the fractions of active ribosomes in WT cells, which we extracted from these diffusion measurements, are consistent with the range of estimates obtained by others using similar or different approaches (Forchhammer and Lindahl 1971; Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014). 
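
      To make the empirical approach concrete, the following minimal sketch fits a one-dimensional Gaussian mixture to log-scale diffusion coefficients by expectation-maximization and returns the fraction of molecules in each state. It is an illustration under stated assumptions, not the analysis code used for the manuscript: the function names, starting values, and simulated data are hypothetical, and confinement effects are ignored.

```python
import math
import random


def fit_gmm_1d(values, init_means, init_sigma=0.5, n_iter=100):
    """Fit a 1D Gaussian mixture by expectation-maximization (EM).

    values: log10 diffusion coefficients, one per track.
    init_means: starting guess for the mean of each state (assumed by the user).
    Returns (weights, means, sigmas), one entry per state.
    """
    n_states = len(init_means)
    means = list(init_means)
    sigmas = [init_sigma] * n_states
    weights = [1.0 / n_states] * n_states
    for _ in range(n_iter):
        # E-step: responsibility of each state for each data point.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                for w, m, s in zip(weights, means, sigmas)
            ]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate the weight, mean, and width of each state.
        for k in range(n_states):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-6)
    return weights, means, sigmas


def simulate_log_d(seed=0):
    """Synthetic log10(D) values from three states (arbitrary test values)."""
    rng = random.Random(seed)
    spec = [(-2.0, 300), (-1.0, 180), (0.0, 120)]  # (state mean, number of tracks)
    return [rng.gauss(mu, 0.2) for mu, n in spec for _ in range(n)]
```

      Calling `fit_gmm_1d(simulate_log_d(), init_means=[-2.3, -1.2, 0.2])` returns the estimated state fractions as the first element; on real data, the results depend on the starting values and on how well the states are separated.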

      The estimated fraction of active ribosomes in wild-type cells shows a very strong reduction with decreasing growth rate (down from 75% to 30%), twice as strong as measured in bulk experiments (Dai et al Nat Microbiology 2016; decrease from 90% to 60% for the same growth rate range) and probably incompatible with measurements of growth rate, ribosome concentrations, and almost constant translation elongation rate in this regime of growth rates. Might the different diffusive fractions of RpsB not represent active/inactive ribosomes? See also the problem of quantification above. The authors should explain and compare their results to previous work. 

      We agree that our measured range is somewhat larger than the estimated range from Dai et al, 2016. However, they used different media, strains, and growth conditions. We also note that Dai et al did not make actual measurements of the active ribosome fraction. Instead, they calculated the “active ribosome equivalent” based on a model that includes growth rate, protein synthesis rate, RNA/protein abundance, and the total number of amino acids in all proteins in the cell. Importantly, our measurements show the same overall trend (a ~30% decrease) as Dai et al, 2016. Furthermore, our results are within the range of previous experimental estimates from ribosome profiling (Forchhammer and Lindahl 1971) or single-ribosome tracking (Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014). We clarified this point in the revised manuscript. 

      To measure the reduction of mRNA transcripts in the cell, the authors rely on the fluorescent dye SYTO RNAselect. They argue that 70% of the dye signal represents mRNA. The argument is based on the previously observed reduction of the total signal by 70% upon treatment with rifampicin, an RNA polymerase inhibitor (Bakshi et al 2014). The idea here is presumably that mRNA should undergo rapid degradation upon rif treatment while rRNA or tRNA are stable. However, work from Hamouche et al. RNA (2021) 27:946 demonstrates that rifampicin treatment also leads to a rapid degradation of rRNA. Furthermore, the timescale of fluorescent-signal decay in the paper by Bakshi et al. (half-life about 10 min) is not compatible with the previously reported rapid decay of mRNA (2-4 min) but rather compatible with the slower, still somewhat rapid, decay of rRNA reported by Hamouche et al. A bulk method to measure total mRNA as in the cited Balakrishnan et al. (Science 2022) would thus be a preferred method to quantify mRNA. Alternatively, the authors could also test whether the mass contribution of total RNA remains constant, which would suggest that rRNA decay does not contribute to signal loss. However, since rRNA dominates total RNA, this measurement requires high accuracy. The authors might thus tone down their conclusions on mRNA concentration changes while still highlighting the compelling data on RNAp diffusion. 

      Thank you for bringing the Hamouche et al 2021 paper to our attention. To address this potential issue, we have performed fluorescence in situ hybridization (FISH) microscopy using a 16S rRNA probe (EUB338) to quantify rRNA concentration in 1N cells. We found that the rRNA signal only slightly decreases with cell size (i.e., genome dilution) compared to the RNASelect signal (e.g., a ~5% decrease for rRNA signal vs. 50% for RNASelect for a cell size range of 4 to 10 µm²). We have revised the text and added a figure to include the new rRNA FISH data (Figure 4). In addition, as a control, we validated our rRNA FISH method by comparing the intracellular concentration of 16S rRNA in poor vs. rich media (new Figure 4 – Figure supplement 3).

      The proteomics experiments are a great addition to the single-cell studies, and the correlations between distance from ori and protein abundance is compelling. However, I was missing a different test, the authors might have already done but not put in the manuscript: If DNA is indeed limiting the initiation of transcription, genes that are already highly transcribed in non-perturbed conditions might saturate fastest upon replication inhibition, while genes rarely transcribed should have no problem to accommodate additional RNA polymerases. One might thus want to test, whether the (unperturbed) transcription initiation rate is a predictor of changes in protein composition. This is just a suggestion the authors may also ignore, but since it is an easy analysis, I chose to mention it here. 

      We did not find any correlation when we examined the potential relation between RNA slopes and mRNA abundance (from our first CRISPRi oriC time point) or the transcription initiation rate (from Balakrishnan et al., 2022, PMID: 36480614) across genes. These new plots are presented in Figure 7 – Figure supplement 2B. In contrast, we found a small but significant correlation between RNA slopes and mRNA decay rates (from Balakrishnan et al., 2022, PMID: 36480614), specifically for genes with short mRNA lifetimes (new Figure 7F). This effect is consistent with our model prediction (Figure 5 – Figure supplement 2). 

      Related to the proteomics, in l. 380 the authors write that the reduced expression close to the ori might reflect a gene-dosage compensatory mechanism. I don't understand this argument. Can the authors add a sentence to explain their hypothesis? 

      We apologize for the confusion. While performing additional analyses for the revisions, we realized that while the proteins encoded by genes close to oriC tend to display subscaling behavior, this is not true at the mRNA level (new Figure 7 – Figure supplement 3B). In light of this result, we no longer have a hypothesis for the observed negative correlation at the protein level (originally Figure 5D, now Figure 7 – Figure supplement 3A). The text was revised accordingly.  

      In Fig. 1E the authors show evidence that growth rate increases with cell length/area. While this is not a main point of the paper it might be cited by others in the future. There are two possible artifacts that could influence this experiment: a) segmentation: an overestimation of the physical length of the cell based on phase-contrast images (e.g., 200 nm would cause a 10% error in the relative rate of 2 µm cells, but not of longer cells). b) time-dependent changes of growth rate, e.g., due to change from liquid to solid or other perturbations. To test for the latter, one could measure growth rate as a function of time, restricting the analysis to short or long cells, or measuring growth rate for short/long cells at selected time points. For the former, I recommend comparison of phase-contrast segmentation with FM4-64-stained cell boundaries.

      As the reviewer notes, the small increase in relative growth was just a minor observation that does not affect our story, whether it is biologically meaningful or the result of a technical artefact. But we agree with the reviewer that others might cite it in future works and that it should therefore be interpreted with caution.

      An artefact associated with time-dependent changes (e.g. changing from liquid cultures to more solid agarose pads) is unlikely for two reasons. 1. We show that varying the time that cells spend on agarose pads relative to liquid cultures does not affect the cell size-dependent growth rate results (Figure 1 – supplement 5A). 2. We show that the growth rate is stable from the beginning of the time-lapse with no transient effects upon cell placement on agarose pads for imaging (Figure 1 – supplement 1). These results were described in the Methods section where they could easily be missed. We revised the text to discuss these controls more prominently in the Results section.

      As for cell segmentation, we have run simulations and agree with the reviewer that a small overestimation of cell area (which is possible with any cell segmentation method, including ours) could lead to a small increase in relative growth with increasing cell areas (new Figure 1 – Figure supplement 3). Since the finding is not important to our story, we simply revised the text and added the simulation results to alert the readers to the possibility that the observation may be due to a small cell segmentation bias.
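
      The arithmetic behind this kind of bias can be sketched in a few lines. The example below is not the simulation shown in the figure supplement; it assumes, purely for illustration, exponential growth in length and a constant additive overestimate of cell length, with hypothetical function and parameter names.

```python
def apparent_relative_growth_rate(true_length_um, true_rate, length_offset_um):
    """Relative growth rate inferred from a biased length measurement.

    Assumes the true length grows as dL/dt = true_rate * L while the
    segmentation reports L + length_offset_um. The absolute growth is
    measured correctly, but it is divided by an inflated length.
    """
    measured_length = true_length_um + length_offset_um
    return true_rate * true_length_um / measured_length


def bias_profile(lengths_um, length_offset_um, true_rate=1.0):
    """Apparent rate (as a fraction of the true rate) for several cell lengths."""
    return [
        apparent_relative_growth_rate(length, true_rate, length_offset_um) / true_rate
        for length in lengths_um
    ]
```

      With a 0.2 µm offset, `bias_profile([2.0, 4.0, 10.0], 0.2)` gives ratios of about 0.91, 0.95, and 0.98, so under these assumptions the apparent relative growth rate rises with cell size even though the true rate is constant.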

      Reviewer #2 (Public Review): 

      In this work, the authors uncovered the effects of DNA dilution on E. coli, including a decrease in growth rate and a significant change in proteome composition. The authors demonstrated that the decline in growth rate is due to the reduction of active ribosomes and active RNA polymerases because of the limited DNA copy numbers. They further showed that the change in the DNA-to-volume ratio leads to concentration changes in almost 60% of proteins, and these changes mainly stem from the change in the mRNA levels. 

      Thank you for the support and accurate summary!

      Reviewer #3 (Public Review): 

      Summary: 

      Mäkelä et al. here investigate genome concentration as a limiting factor on growth.

      Previous work has identified key roles for transcription (RNA polymerase) and translation (ribosomes) as limiting factors on growth, which enable an exponential increase in cell mass. While a potential limiting role of genome concentration under certain conditions has been explored theoretically, Mäkelä et al. here present direct evidence that when replication is inhibited, genome concentration emerges as a limiting factor. 

      Strengths: 

      A major strength of this paper is the diligent and compelling combination of experiment and modeling used to address this core question. The use of origin- and ftsZ-targeted CRISPRi is a very nice approach that enables dissection of the specific effects of limiting genome dosage in the context of a growing cytoplasm. While it might be expected that genome concentration eventually becomes a limiting factor, what is surprising and novel here is that this happens very rapidly, with growth transitioning even for cells within the normal length distribution for E. coli. Fundamentally, it demonstrates the fine balance of bacterial physiology, where the concentration of the genome itself (at least under rapid growth conditions) is no higher than it needs to be. 

      Thank you!

      Weaknesses: 

      One limitation of the study is that genome concentration is largely treated as a single commodity. While this facilitates their modeling approach, one would expect that the growth phenotypes observed arise due to copy number limitation in a relatively small number of rate-limiting genes. The authors do report shifts in the composition of both the proteome and the transcriptome in response to replication inhibition, but while they report a positional effect of distance from the replication origin (reflecting loss of high-copy, origin-proximal genes), other factors shaping compositional shifts and their functional effects on growth are not extensively explored. This is particularly true for ribosomal RNA itself, which the authors assume to grow proportionately with protein. More generally, understanding which genes exert the greatest copy number-dependent influence on growth may aid both efforts to enhance (biotechnology) and inhibit (infection) bacterial growth. 

      We agree but feel that identifying the specific limiting genes is beyond the scope of the study. This said, we carried out additional experiments and analyses to address the reviewer’s comment and identify potential contributing factors and limiting gene candidates. First, we examined the intracellular concentration of 16S ribosomal RNA (rRNA) by rRNA FISH microscopy and found that it decays much slower than the bulk of mRNAs as measured using RNASelect staining (new Figure 4 and Figure 4 – Figure supplements 1 and 3). We found that the rRNA signal is far more stable in 1N cells than the RNASelect signal, the former decreasing by only ~5% versus ~50% for the latter in response to the same range of genome dilution (Figure 4C). Second, we carried out new correlation analyses between our proteomic/transcriptomic datasets and published genome-wide datasets that report various variables under unperturbed conditions (e.g., mRNA abundance, mRNA degradation rates, fitness cost, transcription initiation rates, essentiality for viability); see new Figure 7E-G and Figure 7 – Figure supplement 2. In the process, we found that genes essential for viability tend, on average, to display superscaling behavior (Figure 7G). This suggests that cells have evolved mechanisms that prioritize expression of essential genes over nonessential ones during DNA-limited growth. Furthermore, this analysis identified a small number of essential genes that display strong negative RNA slopes (Figure 7C, Datasets 1 and 2), indicating that the concentration of their mRNA decreases rapidly relative to the rest of the transcriptome upon genome dilution. These essential genes with strong subscaling behavior are candidates for being growth-limiting. 

      The text and figures were revised to include these new results.

      Overall, this study provides a fundamental contribution to bacterial physiology by illuminating the relationship between DNA, mRNA, and protein in determining growth rate. While coarse-grained, the work invites exciting questions about how the composition of major cellular components is fine-tuned to a cell's needs and which specific gene products mediate this connection. This work has implications not only for biotechnology, as the authors discuss, but potentially also for our understanding of how DNA-targeted antibiotics limit bacterial growth. 

      Thank you!

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Below are my comments. 

      (1) I noticed that a paper by Li et al. on bioRxiv has found similar results to this work ("Scaling between DNA and cell size governs bacterial growth homeostasis and resource allocation," https://doi.org/10.1101/2021.11.12.468234), including the linear growth of E. coli when the DNA concentration is low. This relevant reference was not cited or discussed in the current manuscript. 

      We agree that authors should cite and discuss relevant peer-reviewed literature. But broadly speaking, we feel that extending this responsibility to all preprints (and by extension any online material) that have not been reviewed is a bit dangerous. It would effectively legitimize unreviewed claims and risk their propagation in future publications. We think that while imperfect, the peer-reviewing process still plays an important role. 

      Regarding the specific 2021 preprint that the reviewer pointed out, we think that the presented growth rate data are quite noisy and that the experiments lack a critical control (multi-N cells), making interpretation difficult. Their report that plasmid-borne expression is enhanced when DNA is severely diluted is certainly interesting and makes sense in light of our measurements that the activities, but not the concentrations, of RNA polymerases and ribosomes are reduced in 1N cells. However, we do not know why this preprint has not yet been published since 2021. There could be many possible reasons for this. Therefore, we feel that it is safer to limit our discussion to peer-reviewed literature.

      (2) I think the kinetic Model B in the Appendix has been studied in previous works, such as Klumpp & Hwa, PNAS 2008, https://doi.org/10.1073/pnas.0804953105

      Indeed, Klumpp & Hwa 2008 modeled the kinetics of RNA polymerase and promoter association prior to our study, but their model differs from ours. Their model is based on Michaelis-Menten-type (MM) functions in which the RNAP is analogous to the “substrate” and the promoter to the “enzyme” in the MM equation. In contrast, our model uses functions based on the law of mass action (instead of MM-type functions). We have revised the text, included the Klumpp & Hwa 2008 reference, and revised the Materials & Methods section to clarify these points. 
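
      The general distinction between the two functional forms can be illustrated with a generic one-step binding equilibrium. The sketch below does not reproduce Model B or the Klumpp & Hwa equations; the function names, the dissociation constant `k_d`, and the simplification to a single binding step are assumptions for this example. The MM-type form treats all RNAP as free, whereas the mass-action form conserves both RNAP and promoters.

```python
import math


def bound_mm_type(rnap_total, promoter_total, k_d):
    """MM-type occupancy: promoters saturate hyperbolically with RNAP.

    Treats the free RNAP concentration as equal to the total, which is a
    good approximation only when RNAP is in large excess over promoters.
    """
    return promoter_total * rnap_total / (k_d + rnap_total)


def bound_mass_action(rnap_total, promoter_total, k_d):
    """Equilibrium complex concentration from the law of mass action.

    Solves (rnap_total - c) * (promoter_total - c) = k_d * c for the
    physically meaningful (smaller) root c, conserving both species.
    """
    b = rnap_total + promoter_total + k_d
    return (b - math.sqrt(b * b - 4.0 * rnap_total * promoter_total)) / 2.0
```

      The two forms agree when RNAP greatly exceeds promoters and diverge when their concentrations are comparable, which is the regime where the choice of function matters.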

      (3) On lines 284-285, if I understand correctly, the fractions of active RNAPs and active ribosomes are relative to the total protein number. It would be helpful if the authors could mention this explicitly to avoid confusion. 

      The fractions of active RNAPs and active ribosomes are expressed as the percentage of the total RNAPs and ribosomes. We have revised the text to be more explicit. Thank you.

      (4) On line 835, I am not sure what the bulk transcription/translation rate means. I guess it is the maximum transcription/translation rate if all RNAPs/ribosomes are working according to Eq. (1,2). It would be helpful if the authors could explain the meaning of r_1 and r_2 more explicitly. 

      Our apology for the lack of clarity. We have added the following equations:

      (5) Regarding the changes in protein concentrations due to genome dilution, a recent theoretical paper showed that it may come from the heterogeneity in promoter strengths (Wang & Lin, Nature Communications 2021). 

      In the Wang and Lin model, the heterogeneity in promoter strength predicts that the “mRNA production rate equivalent”, which is the mRNA abundance multiplied by the mRNA decay rate, will correlate with the RNA slopes. However, we found these two variables to be uncorrelated (see below). The Spearman correlation coefficient ρ was 0.02 with a p-value of 0.24, indicating non-significance (NS).

      Author response image 1.

      The mRNA production rate equivalent (mRNA abundance at the first time point after CRISPRi oriC induction multiplied by the mRNA degradation rate measured by Balakrishnan et al., 2022, PMID: 36480614, expressed in transcript counts per minute) does not correlate (Spearman correlation’s p-value = 0.24) with the RNA slope in 1N-rich cells.  Data from 2570 genes are shown (grey markers, Gaussian kernel density estimation - KDE), and their binned statistics (mean +/- SEM, ~280 genes per bin, orange markers). 

      In addition, we found no significant correlation between RNA slopes and mRNA abundance or transcription initiation rate. These plots are now included in Figure 7E and Figure 7 – Figure supplement 2B. Thus, the promoter strength does not appear to be a predictor of the RNA (and protein) scaling behavior under DNA limitation. 
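
      For readers who wish to run this type of comparison on their own gene-level tables, the sketch below computes a Spearman rank correlation (with tied values given average ranks) and equal-count binned means with standard errors, using only the Python standard library. It is a generic illustration rather than the analysis code behind the figures; the function names are hypothetical and no p-value is computed.

```python
import math
import statistics


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(sxx * syy)


def binned_stats(x, y, n_bins):
    """Sort by x, split into equal-count bins, return (mean_x, mean_y, sem_y) per bin.

    Each bin needs at least two points for the standard error to be defined.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    stats = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        ys = [p[1] for p in chunk]
        sem = statistics.stdev(ys) / math.sqrt(len(ys))
        stats.append((statistics.fmean(p[0] for p in chunk), statistics.fmean(ys), sem))
    return stats
```

      Here `x` would hold a per-gene predictor such as the mRNA production rate equivalent and `y` the corresponding RNA slopes.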

      Reviewer #3 (Recommendations For The Authors): 

      One general area that could be developed further is analysis of changes in the proteome/transcriptome composition, given that there may be specific clues here as to the phenotypic effects of genome concentration limitation. Specifically: 

      • In Figure 5D, the authors demonstrate an effect of origin distance on sensitivity to replication inhibition, presumably as a copy number effect. However, the authors note that the effect was only slight and postulated a compensatory mechanism. Due to the stability of proteins, one should expect relatively small effects - even if synthesis of a protein stopped completely, its concentration would only decrease twofold with a doubling of cell area (slope = -1, if I'm interpreting things correctly). It would be helpful to display the same information shown in Figure 5D at the mRNA level, since I would anticipate that higher mRNA turnover rates mean that effects on transcription rate should be felt more rapidly. 

      We thank the reviewer for this suggestion. To our surprise, we found that there is no correlation between gene location relative to the origin and RNA slope across genes. This suggests that the observed correlation between gene location and protein slopes does not occur at the mRNA level. Given that we do not have an explanation for the underlying mechanism, we decided to present these data (the original data in Figure 5D and the new data for the RNA slope) in a supplementary figure (Figure 7 – Figure supplement 3).

      • Related to this, did the authors see any other general trends? For example, do highly expressed genes hit saturation faster, making them more sensitive to limited genome concentration? 

      We found that the RNA slopes do not correlate with mRNA abundance or transcription initiation rates. However, they do correlate with mRNA decay. That is, short-lived mRNAs tend to have negative RNA slopes. The new analyses have been added as Figure 7E-F and Figure 7 – Figure supplement 2B. The text has been revised to incorporate this information. 

      • Presumably loss of growth is primarily driven by a subset of genes whose copy number becomes limiting. Previously, it has been reported that there is a wide variety among "essential" genes in their expression-fitness relationship - i.e. how much of a reduction in expression you need before growth is reduced (e.g. PMID 33080209). It would be interesting to explore the shifts in proteome/transcriptome composition to see whether any genes particularly affected by restricted genome concentration are also especially sensitive to reduced expression - overlap in these datasets may reveal which genes drive the loss of growth. 

      This is a very interesting idea – thank you! We did not find a correlation between the protein/RNA slope and the relative gene fitness as previously calculated (PMID 33080209), as shown below.

      Author response image 2.

      The relative fitness of each gene (data by Hawkins et al., 2020, PMID: 33080209, median fitness from the highest sgRNA activity bin) plotted versus the gene-specific RNA and protein slopes that we measured in 1N-rich cells after CRISPRi oriC induction. More than 260 essential genes are shown (262 RNA slopes and 270 protein slopes, grey markers), and their binned statistics (mean +/- SEM, 43-45 essential genes per bin, orange markers). The Spearman correlations (ρ) with p-values above 10⁻³ are considered not significant (NS). In our analyses, we only considered correlations significant if they have a Spearman correlation p-value below 10⁻¹⁰.

      However, while doing this suggested analysis, we noticed that the essential genes that were included in the aforementioned study have RNA slopes above zero on average. This led us to compare the RNA slope distributions of essential genes relative to all genes (now included in Figure 7G). We found that they tend to display superscaling behavior (positive RNA slopes), suggesting the existence of regulatory mechanisms that prioritize the expression of essential genes over less important ones when genome concentration becomes limiting for growth. The text has been revised to include this new information.

      Other suggestions: 

      • In Figure 3 the authors report that total RNAP concentration increases with increasing cytoplasmic volume. This is in itself an interesting finding as it may imply a compensatory mechanism - can the authors offer an explanation for this? 

      We do not have a straightforward explanation. But we agree that it is very interesting and should be investigated in future studies given that this superscaling behavior is common among essential genes. 

      • The explanation of the modeling within the main text could be improved. Specifically, equations 1 and 2, as well as a discussion of models A and B (lines 290-301), do not explicitly relate DNA concentration to downstream effects. The authors provide the key information in Appendix 1, but for a general reader, it would be helpful to provide some intuition within the main text about how genome concentration influences transcription rate (i.e. via 𝛼RNAP).  

      We apologize for the lack of clarity. We have added information that hopefully improves clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mAC) can be enhanced or inhibited by certain unsaturated fatty acids (FA) in an isoform-specific fashion. Thus, with IC50s in the 10-20 micromolar range, oleic acid effects a 3-fold stimulation of membrane preparations of mAC isoform 3 (mAC3) but does not act on mAC5. Oleic acid also enhanced the Gαs-stimulated activities of isoforms 2, 7, and 9, while mAC1 was slightly attenuated and isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed in AC catalytic domain constructs in which TM domains are not present. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Gαs-stimulated mAC1 and 4 cyclase activity were significantly attenuated in the 20-40 micromolar range by arachidonic acid, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While the activity of mAC5 was not affected by unsaturated FAs, neutral anandamide attenuated Gαs-stimulation of mAC5 and 6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors were able to show that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Weaknesses:

      (1) Important methodological details regarding the treatment of mAC membrane preps with fatty acids are missing.

      We will address this issue in more detail.

      (2) It is not evident that fatty acid regulators can be considered as "signaling molecules" since it is not clear (at least to this reviewer) how concentrations of free fatty acids in plasma or endocytic membranes are hormonally or otherwise regulated.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #2 (Public review):

      Summary:

      The authors extend their earlier findings with bacterial adenylyl cyclases to mammalian enzymes. They show that certain aliphatic lipids activate adenylyl cyclases in the absence of stimulatory G proteins and that lipids can modulate activation by G proteins. Adding lipids to cells expressing specific isoforms of adenylyl cyclases could regulate cAMP production, suggesting that adenylyl cyclases could serve as 'receptors'.

      Strengths:

      This is the first report of lipids regulating mammalian adenylyl cyclases directly. The evidence is based on biochemical assays with purified proteins, or in cells expressing specific isoforms of adenylyl cyclases.

      Weaknesses:

      It is not clear if the concentrations of lipids used in assays are physiologically relevant. Nor is there evidence to show that the specific lipids that activate or inhibit adenylyl cyclases are present at the concentrations required in cell membranes. Nor is there any evidence to indicate that this method of regulation is seen in cells under relevant stimuli.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands, which act via AC membrane linkers to modify and control Gs-stimulated AC activity, either towards enhancement or inhibition of ACs, in a family- and ligand-specific manner. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way from having a hypothesis that ACs could be receptors, to a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, and the paper has a clear conceptual story and several strong lines of evidence.

      Weaknesses:

      (1) At the beginning of the results section, the authors say "We have expected lipids as ligands". It is not quite clear why these could not have been other substances. Is it because they were expected to bind in the lipophilic membrane anchors? Various lipophilic and hydrophilic ligands are known for GPCRs, which also have transmembrane domains. Maybe 1-2 additional sentences could be helpful here.

      Will be done as suggested.

      (2) In stably transfected HEK cells expressing mAC3 or mAC5, they have used only one dose of isoproterenol (2.5 uM) for submaximal AC activation. The reference 28 provided here (PMID: 33208818) did not specifically look at Iso and endogenous beta2 adrenergic receptors expressed in HEK cells. As far as I remember from the old pharmacological literature, this concentration is indeed submaximal in receptor binding assays but regarding AC activity and cAMP generation (which happen after signal amplification with a so-called receptor reserve), lower Iso amounts would be submaximal. When we measure cAMP, these are rather 10 to 100 nM but no more than 1 uM at which concentration response dependencies usually saturate. Have the authors tried lower Iso concentrations to prestimulate intracellular cAMP formation? I am asking this because, with lower Iso prestimulation, the subsequent stimulatory effects of AC ligands could be even greater.

      The best way to address this issue is to establish a concentration-response curve for Iso-stimulated cAMP formation using the permanently transfected cells. We note that in the past isoproterenol concentrations used in biochemical or electrophysiological experiments differed substantially.
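
      A concentration-response curve of this kind is commonly described with a Hill (four-parameter logistic) model. The sketch below is only an illustration of why the choice of isoproterenol concentration matters; the function names are hypothetical, and the EC50 of 30 nM used in the usage note is a placeholder chosen from within the 10 to 100 nM range mentioned by the reviewer, not a measured value.

```python
def hill_response(conc, bottom, top, ec50, hill=1.0):
    """Four-parameter logistic (Hill) model; conc and ec50 in the same units, conc > 0."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


def fraction_of_max(conc, ec50, hill=1.0):
    """Response as a fraction of the maximal effect (bottom = 0, top = 1)."""
    return hill_response(conc, 0.0, 1.0, ec50, hill)


# Placeholder EC50 (assumption, not data): 30 nM.
# 2.5 uM = 2500 nM would then sit close to the top of the curve.
near_saturation = fraction_of_max(2500.0, ec50=30.0)
half_maximal = fraction_of_max(30.0, ec50=30.0)
```

      With these placeholder parameters, 2.5 μM would give about 99% of the maximal response, whereas a concentration near the EC50 would leave more room for an enhancing ligand to act. The measured curve in the transfected cells would be needed to say where 2.5 μM actually falls.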

      (3) The authors refer to HEK cell models as "in vivo". I agree that these are intact cells and an important model to start with. It would be very nice to see the effects of the new ligands in other physiologically relevant types of cells, and how they modulate cAMP production under even more physiological conditions. Probably, this is a topic for follow-up studies.

      The last sentence is correct.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors have achieved their aims to a very high degree, and their results nicely support their conclusions. There is only one point (various classical GPCR concentrations, please see above) that would be beneficial to address.

      Without any doubt, this is a groundbreaking study that will have profound implications in the field for the next years/decades. Since it is now clear that mammalian adenylyl cyclases are receptors for aliphatic fatty acids and anandamide, this will change our view on the whole signaling pathway and initiate many new studies looking at the biological function and pathophysiological implications of this mechanism. The manuscript is outstanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It is not clear from the methods section how free FAs were applied to membrane preparations or HEK293 cells. Were FAs solubilized in organic solvents, or introduced as micelles?

      The requested info is inserted into the M&M section.

      Could the authors comment on what is known about the concentration of oleic acid and other non-saturated fatty acids in plasma membranes relative to those required to produce allosteric effects on cyclase activity?

      This info is now included in the last paragraph of the discussion.

      It would be worthwhile to test the effect of FAs on basal (not Gαs-stimulated) activity of mACs.

      This has been carried out with mAC isoforms 2, 3, 7, and 9, in which oleic acid enhances Gsα-stimulated activity. Due to the low levels of basal activities, interpretable data were not obtained.

      Do triglycerides esterified with oleic acid stimulate mAC3 and other sensitive isoforms?

      Experiments were done with triolein and 2-oleoyl-glycerol (the answer is no). The data are presented in Fig. 3 and in the appendix Figs. 8, 9, 14; structural formulas in appendix 2 Fig. 4 were updated.

      Does the quantity plotted on the vertical axis of Figure 1, right panel represent "Fractional Stimulation by Oleic acid" rather than simply "Fold Stimulation"? Clearly, as shown in the two left-most panels, Gαs stimulates both mAC and mAC5. Rather it seems that the ratio (oleic acid stimulation) / (Gαs stimulation) remains constant. This observation supports the statement in the discussion that "We suppose that in mAC3 the equilibrium of two differing ground states favors a Gαs-unresponsive state and the effector oleic acid concentration-dependently shifts this equilibrium to a Gαs-responsive state". It could also be said that the effect of oleic acid is additive, and in constant proportion to that of Gαs.

      This comment certainly is related to Fig. 2:

      The ratio would be (Gsα + oleic acid stimulation) / (Gsα-stimulation), i.e., fractional stimulation by addition of oleic acid is identical to fold stimulation.

      We have amended the legend to fig. 2C for clarification.

      The last sentence is wrong because oleic acid alone does not stimulate.

      It is stated on page 3, 2nd to last line that "The action of oleic acid on mAC3 was instantaneous...". Since the earliest time point is taken at 5 minutes, the claim that the action of the lipid is instantaneous cannot be made. Information about kinetics would be useful to have, since it is possible that the lipid must be released from a micelle and be incorporated into the AC membrane fraction before it is active.

      The first point is 3 min.

      We deleted the word “instantaneous” and added the correlation coefficients for both conditions in the legend to appendix 2; fig. 1 for clarification.

      The data spread in Figure 4 and other figures showing similar data is significant, to the extent that the computed value for EC50 may not be of high precision. Authors should cite the correlation coefficient for the overall fit and uncertainty for the EC50 value (in addition to significances by t-test of individual data points).

      This will not add valuable information. Pearson's correlation coefficients are only for linear relationships.

      (cf. N.N. Kachouie, W. Deebani (2020) Association Factor for Identifying Linear and Nonlinear Correlations in Noisy Conditions. Entropy 22:440)

      The "switch" between relatively low potency and high efficacy in membrane preps to high potency and low efficacy in cells is remarkable. Could this have a methodological basis or is it reflective of the mechanism by which FAs access mACs in membrane preps vs. cell membranes, or perhaps some biochemical transformation of the lipid in cells?

      Honestly, we do not know.

      The authors should note that there is some precedence for this work:

      J Nakamura, N Okamura, S Usuki, S Bannai. Inhibition of adenylyl cyclase activity in brain membrane fractions by arachidonic acid and related unsaturated fatty acids. Arch Biochem Biophys. 2001 May 1;389(1):68-76. doi: 10.1006/abbi.2001.2315.

      The effects of FA deficiencies on AC and related activities have been noted:

      Alam SQ, Mannino SJ, Alam BS, McDonough K Effect of essential fatty acid deficiency on forskolin binding sites, adenylate cyclase, and cyclic AMP-dependent protein kinase activity, the levels of G proteins and ventricular function in rat heart. J Mol Cell Cardiol. 1995 Aug;27(8):1593-604. doi: 10.1016/s0022-2828(95)90491-3. PMID: 8523422

      The latter publications are supportive of, and provide context to, the authors' findings.

      Both references are mentioned and cited.

      Minor points:

      The significance of the coloring scheme in Figure 5C bar graph should be stated in the legend.

      Done.

      In the introduction, it is stated that "The protein displayed two similar catalytic domains (C1 and C2) and two dissimilar hexahelical membrane anchors (TM1 and TM2)". In both cases, the respective domains can be said to be similar in overall fold, but - certainly in the case of the catalytic domains - different in amino acid sequence in functionally important regions of the domain.

      Done: Changed wording.

      Regarding the statement in the introduction that "The domain architecture, TM1-C1-TM2-C2, clearly indicated a pseudoheterodimeric protein composed of two concatenated bacterial precursor proteins": the authors refer to the fact that mammalian enzymes are pseudoheterodimers, whereas bacterial type III cyclases are dimers of identical subunits.

      Done.

      Reviewer #2 (Recommendations for the authors):

      The title need not state that a 'new class of receptors' has been identified. There is no direct evidence that the lipids bind to the enzymes, and the affinities can only be surmised from the EC50 graphs. To call a protein a receptor requires evidence to show that the binding is specific by showing that binding can be inhibited by a large excess of 'unlabelled' ligand. This could have been done by procuring labelled lipids for experimental verification.

      As is well known, lipids easily bind to proteins. In this study no purified proteins were used. Therefore, binding assays most likely would result in unreliable data.

      The paper would have benefitted from showing sequence alignments in the TM domains of the ACs discussed in the paper. Further, a phylogenetic tree of mammalian ACs would also reveal which enzymes from other species may be regulated similarly to those described in the paper. This would be important for researchers who use other model organisms to study cAMP signalling.

      Such data are in multiple papers accessible in the literature. Where deemed appropriate we inserted references.

      Figures 1A and 1B show data from only two experiments. A third experiment would have been useful in order to show the statistical significance of the data.

      At this stage more experiments would not have affected further experimental plans.

      Statements made in the text (for example, the last paragraph on page 6) state only the mean value and not the SDs. This would have been important to include even if the data is shown in the appendix. The same is true in the Legend of Figure 2. Why have the authors decided to use SEM and not SDs?

      The reason is specified in M&M.

      Concentrations of lipids used in biochemical assays are in the micromolar range. This suggests that we have moderate affinity binding, more in the range of an enzyme for a substrate rather than a receptor-ligand interaction.

      We happen to disagree. Clearly, the differential activities, enhancing or attenuating Gsα-stimulated mAC activities, are most plausibly explained by mAC receptor properties. mACs have enzyme activities using fatty acids as substrates.

      The authors add lipids to cells and show changes in cAMP levels in their presence and absence. They also discuss how these extracellular lipids could be produced. Do you think this is necessary in vivo, though? Could the lipids present in membranes naturally act as regulators? Do specific lipid concentrations differ in different cell types, suggesting tissue-specific regulation of these mammalian ACs?

      These are things that could be discussed in the manuscript.

      The last paragraph of the discussion deals with these questions.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The extra macrochaetae (emc) gene encodes the only Inhibitor of DNA binding protein (Id protein) in Drosophila. Its best-known function is to inhibit proneural genes during development. However, the emc mutants also display non-proneural phenotypes. In this manuscript, the authors examined four non-proneural phenotypes of the emc mutants and reported that they are all caused by inappropriate non-apoptotic caspase activity. These non-neuronal phenotypes are: reduced growth of imaginal discs, increased speed of the morphogenetic furrow, and failure to specify R7 photoreceptor neurons and cone cells during eye development. Double mutants between emc and either H99 (which deletes the three pro-apoptotic genes reaper, grim, and hid) or the initiator caspase dronc suppress these mutant phenotypes of emc, suggesting that the cell death pathway and caspase activity are mediating these emc phenotypes. In previous work, the authors have shown that emc mutations elevate the expression of ex, which activates the SWH pathway (aka the Hippo pathway). One known function of the SWH pathway is to inhibit Yorkie, which controls the transcription of the inhibitor of apoptosis, Diap1. Consistently, in emc clones the levels of Diap1 protein are reduced, which might explain why caspase activity is increased in emc clones, giving rise to the four non-neural phenotypes of emc mutants.

      However, this increased caspase activity is not causing ectopic apoptosis, hence the authors propose that this is non-apoptotic caspase activity. In the last part of the manuscript, the authors ruled out that Wg, Dpp, and Hh signaling are the target of caspases, but instead identified Notch signaling as the target of caspases, specifically the Notch ligand Delta. Protein levels of Delta are increased in emc clones in an H99- and dronc-dependent manner. The authors conclude that caspase-dependent non-apoptotic signaling underlies multiple roles of emc that are independent of proneural bHLH proteins.

      Strengths:

      Overall, this is an interesting manuscript and the findings are intriguing. It adds to the growing number of non-apoptotic functions of apoptotic proteins and caspases in particular. The manuscript is well written and the data are usually convincingly presented.

      Weaknesses:

      (1)  One major concern I have is the observation by the authors in Figure 3C in which protein levels of Diap1 are still reduced in emc H99 double mutant clones. If Diap1 is still reduced in these clones, shouldn't caspases still be derepressed? Given that emc H99 double mutants rescue all emc phenotypes examined, the observation that Diap1 levels are still reduced in emc H99 clones is inconsistent with the authors' model. The authors need to address this inconsistency.

      The effect of H99 emc clones on Diap1 protein levels is consistent with our conclusions.  The reviewer’s concern probably relates to previous work that shows that RHG proteins act by antagonizing DIAP1, so that Diap1 is epistatic to RHG (PMID:10481910), and that RHG proteins affect DIAP1 protein levels, and in particular that HID promotes DIAP1 ubiquitylation leading to its destruction (PMID:12021767).  First, epistasis means that in the absence of DIAP1, RHG levels do not affect cell survival.  DIAP1 protein is not absent in emc/emc eye clones, however, it is reduced.  It is not only possible but expected that RHG levels would affect survival when DIAP1 levels are only reduced.  Secondly, we did not see a difference in DIAP1 levels between H99/H99 clones and H99/+ cells within the same specimen, suggesting that rpr, grim and hid might not affect DIAP1 levels. It is possible that Hid protein only affects DIAP1 levels when overexpressed, as in the aforementioned paper (PMID:12021767), and that physiological RHG levels affect DIAP1 activity.  The H99 deficiency also eliminates Rpr and Grim, which may affect DIAP1 without ubiquitylating it. In our experiments, however, there are no cells completely wild type for the H99 region for comparison in the same specimen, so our results do not rule out the H99 deletion having a dominant effect on DIAP1 levels both inside and outside the clones.  What our data clearly showed is that emc affected DIAP1 levels independently of any potential RHG effect, and we hypothesized this was through diap1 transcription, because we showed previously that emc affects yki, a transcriptional regulator of the diap1 gene, but we have not demonstrated transcriptional regulation of diap1 directly in emc clones.  We modified the manuscript to better delineate these issues (lines 275-284).    

      (2) Are Diap1 protein levels reduced in all emc clones, including clones anterior to the furrow? This is difficult to see in Figure 3B. It is also recommended to look in emc mosaic wing discs.

      We now mention that DIAP1 levels were only reduced in emc clones posterior to the morphogenetic furrow, not anterior to the morphogenetic furrow or in emc clones in wing imaginal discs (lines 284-5 and Figure 3 supplement 1).

      (3) The authors speculate that Delta may be a direct target of caspase cleavage (Figure 9B), but then rule it out for a good reason. However, I assume that the increased protein levels of Delta in emc clones (Figure 7) are the results of increased transcription. In that case, shouldn't caspases control the transcriptional machinery leading to Delta expression?

      Thank you for suggesting that caspases control the transcription of Dl.  We added this possibility to the manuscript (lines 499-500).  At one time there was a Dl-LacZ transcriptional reporter, which would have made it straightforward to assess Dl transcription in emc clones, but this strain does not seem to exist now.  We have not attempted in situ hybridization to Dl transcripts in mosaic discs.  

      (4) How does caspase activity in emc clones cause reduced growth? Is this also mediated through Delta signaling?

      We do not know which caspase target is responsible for reduced growth in wing discs.

      (5) Figure 1M: Is there a similar result with emc dronc mosaics?

      The emc dronc clones do not show as dramatic a growth advantage in a Minute background.  This is consistent with the smaller effect of emc dronc in the non-Minute background also (Figure 1N).  We mention this in the revised paper (lines 232-3).     

      Reviewer #2 (Public Review):

      Id proteins are thought to function by binding and antagonizing basic helix-loop-helix (bHLH) transcription factors but new findings demonstrate roles for emc including in tissues where no proneural (Drosophila bHLH) genes are known to function. The authors propose a new mechanism for developmental regulation that entails restraining new/novel non-apoptotic functions of apoptotic caspases.

      Specifically, the data suggest that loss of emc leads to reduced expression of diap1 and increased apoptotic caspase activity, which does not induce apoptosis but elevates Delta expression to increase N activity and cause developmental defects. Indeed, many of the phenotypes of emc mutant clones can be rescued by a chromosomal deficiency that reduces caspase activation or by mutations in the initiator caspase Dronc. A related manuscript that shows that loss of emc results in increased da, linked previously to diap1 expression, provides supporting data. There is increasing appreciation that apoptotic caspases have non-apoptotic roles. This study adds to the emerging field and should be of interest to readers.

      The data, for the most part, support the conclusions but I do have concerns about some of the data and the interpretations that should be addressed.

      Reviewer #3 (Public Review):

      The work extends earlier studies on the Drosophila Id protein EMC to uncover a potential pathway that explains several tissue-scale developmental abnormalities in emc mutants. It also describes a non-apoptotic role for caspases in cell biology.

      Strengths:

      The work adds to an emerging new set of functions for caspases beyond their canonical roles as cell death mediators. This novelty is a major strength as well as its reliance on genetic-based in vivo study. The study will be of interest to those who are curious about caspases in general.

      Weaknesses:

      The manuscript relies on imaging experiments using genetic mosaic imaginal discs. It is for the most part a qualitative analysis, showing representative samples with a small number of mutant clones in each. Although the senior author has a long track record of using experiments like this to rigorously discover regulatory mechanisms in this system, it is straightforward in 2023 to use Fiji and other image analysis tools to measure fluorescence. Such measurements could be done for all replicate clones of a given genotype as well as genetic control sampling. These could be presented in plots that would not only provide quantitative and statistical measurements, but will be more reader-friendly to those who are not fly people.

      We added quantification of anti-Delta and anti-Diap1 levels to the manuscript (Figures 3E and 7E).  We agree that this facilitates statistical confirmation of the results and may be more accessible to non-experts.  We do have concerns that these quantifications might be given too much weight.  For example, we cannot measure the background level of anti-DIAP1 labeling by labeling diap1 null mutant cells, because such cells do not survive.  Although we measure ~20% reduction in emc clones in the eye disc, and none in the wing disc, both measures could be underestimates if some of the labeling is non-specific, as is very possible.  We discuss this in the Methods (lines 166-9).
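
      The concern that a measured reduction underestimates the true reduction when part of the labeling is non-specific can be made concrete with a short calculation. This is a sketch under a simple additive-background assumption (measured intensity = specific signal + constant background); the function name and the 50% background fraction in the usage note are hypothetical and are not values from the manuscript.

```python
def specific_reduction(measured_ratio, background_fraction):
    """Fractional reduction of the specific signal inside a clone.

    measured_ratio: clone intensity / control intensity (e.g. 0.8 for a 20% drop).
    background_fraction: share of the control intensity that is non-specific (0 <= b < 1).
    Derivation: m = r * (1 - b) + b, so r = (m - b) / (1 - b) and reduction = 1 - r.
    """
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    specific_ratio = (measured_ratio - background_fraction) / (1.0 - background_fraction)
    return 1.0 - specific_ratio


# With no background, a measured ratio of 0.8 is a 20% reduction.
no_background = specific_reduction(0.8, 0.0)
# If half of the control signal were background (hypothetical), the same
# measurement would correspond to a 40% reduction in specific signal.
half_background = specific_reduction(0.8, 0.5)
```

      The measured ~20% reduction is therefore a lower bound on the reduction in specific DIAP1 labeling whenever the background is greater than zero.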

      Likewise, more details are needed to describe how clone areas were measured in Figure 1. Did they measure each clone and its twin spot, and then calculate the area ratio for each clone and its paired twin spot? This would be the correct way to analyze the data, yielding many independent measurements of the ratio. And doing so would obviate the need to log transform the data which is inexplicable unless they were averaging clones and twins within a disc and making replicates. More explanation is needed and if they indeed averaged, then they need to calculate the ratios pairwise for each clone and twin.

      We added details of clone size measurements and analysis to the methods (lines 141-6). Although it might be useful to compare individual clones and corresponding twin spots, the only rigorous way to associate individual clones with individual twin spots, or even to determine what is one clone and what is one twin spot, is to use recombination rates low enough that significantly less than one recombination occurs per disc. This would require many more dissections and we did not do this. We now clarify in the manuscript that the analysis is indeed based on the ratio of total area of clones and twin spots with replicates, and that log-transformation serves to improve the normality of the ratio data, making them suitable for parametric significance testing, not because clones and twin spots were summed from each sample. We consulted with a statistician over this approach.
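
      The analysis described above (one clone/twin-spot area ratio per replicate, log-transformed before parametric testing) can be sketched as follows. The specific test shown, a one-sample t statistic against a log ratio of zero, and the base-10 logarithm are assumptions made for illustration; the response does not state which test or base was used, and the function names are hypothetical.

```python
import math


def log_area_ratios(replicates):
    """replicates: (total_clone_area, total_twin_spot_area) per disc; returns log10 ratios."""
    return [math.log10(clone / twin) for clone, twin in replicates]


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for H0: mean(values) == mu (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - mu) / (sd / math.sqrt(n)), n - 1
```

      On the log scale a ratio of 2 and a ratio of 0.5 are equally far from "no difference" (log ratio 0), which is one reason log-transformed ratios tend to be closer to normally distributed than raw ratios.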

      Reviewer #1 (Recommendations For The Authors):

      Lines 319/320: "Frizzled-3 RFP expression was not changed in in emc clones (Figure 4A)". This was actually not shown in Fig 4A (in fact this result was not shown at all). Fig 4A shows the result for emc nkd3 which the authors incorrectly assigned to Figure 4B (line 324).

      We apologize for labeling Figure 4A and 4B incorrectly.

      The title of Figure 6 is inaccurate. The title does not indicate what is shown in this figure. A more accurate title would be: Notch activity and function in emc mutant clones.

      We provided a new title for Figure 6. 

      Reviewer #2 (Recommendations For The Authors):

      There is no information on how reproducible the data is. How many discs were examined in each experiment and in how many technical or biological replicates? Can fluorescence signals be quantified within and outside the clones and presented to illustrate reproducibility and significance? This is especially needed for Fig 7, which shows key data that N ligand Delta is elevated in emc clones but dronc and H99 mutations rescue this phenotype. I can see that the Dl signal is brighter in the GFP- emc clone in Fig 7B but I can also see a brighter Dl signal in the small clone and perhaps also in the large clone in C. The difference between B and C could be simply disc-to-disc variation, which should be addressed with quantification and presentation of all data points.

      We added the number of samples to each figure legend.  We quantified the fluorescence signals for Figures 3 and 7.  Quantification shows that the difference between 7B and 7C is highly significant, not disc to disc variation.

      Fig 2B does not support the conclusion. It is supposed to show premature Sens expression and therefore abnormal morphogenetic furrow progression in emc clones. But the yellow arrow is pointing to GFP+ (wild type) cells and it is within this GFP+ region that most premature Sens expression is seen.

      We relocated the arrows in Figure 2B to point precisely to the premature differentiation.  When the morphogenetic furrow is accelerated in emc mutant (GFP-) tissue, it does not stop when wild-type (GFP+) tissue is encountered again; it continues at a normal pace.  Accordingly, emc+ regions that are anterior to emc- regions can also experience accelerated differentiation (please see lines 594-8).

      Fig 1 shows that while H99 deficiency restores the growth of emc clones to wild type level (Fig 1N), placing these in the Minute background made emc clones grow better than emc wild type but Minute neighbors (Fig 1M). The latter cells were nearly absent, suggesting elimination through cell competition. For the rest of the figures, some experiments are done in the Minute background (e.g., emc H99 clones in Fig 2D) while others are not in the Minute background (e.g., emc H99 clones in Fig 7D). Why the switch between backgrounds from experiment to experiment?

      Figure 2D shows emc H99 clones in a Minute background so that it can be compared with panels 2A-C, which show clones of other genotypes in a Minute background.  These clones almost take over the eye disc.  In Figure 7D, it was important to show the Dl expression pattern in a substantial wild type region, which could only be shown using the non-Minute background.  We have no indication that a Minute background changes the properties of the non-Minute clone, other than allowing its greater growth.

      The first 3 paragraphs of the Introduction are overly detailed and read more like a review article. These could be made more concise to focus on the founding data for this manuscript, which are the published findings that emc mutations elevate ex expression (line 129) and that ex mutants show elevated diap1 expression (line 125). These do not show up until the very end of the Introduction.

      We shortened the Introduction to focus more rapidly on the topics relevant to these experiments.

      In several places, the space between the end of the sentence and the citation is missing (e.g., lines 57, 68, and 75).

      The spacing of citations was fixed.

      Line 247. 'morphogenetic furrow that found each ommatidia...' should use a word besides 'found.'

      We corrected line 247.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors show that inhibiting caspases rescues the growth defect of emc clones. However, they did not find excessive TUNEL staining in emc clones that would explain why the clones would be so small - excessive cell death. How reliable was their TUNEL staining in being able to detect excessive apoptosis (only negative data was shown)? Could they induce excessive cell death using radiation or some other means to ensure the assay is robust? If death is not occurring in emc clones, a deficiency worth addressing is that they do not discuss or explore how the caspases then inhibit clone growth. Is it expanded cell cycle times, or smaller cells? And that phenotype does not fit with their end model of Delta being the only moderator of emc since it is not playing a significant role in tissue growth anterior to the furrow. One would assume using the commercial antibody against activated caspase would be another readout for emc clones and this would bolster their claim that excessive caspase activation occurs in the emc cells.

      We have added Dcp1 staining in Figure 2 supplement 3 to show that TUNEL staining is reliable.

      (2) Figure 3D has really large emc clones when GMR-Diap is present. But the large clones are anterior to the furrow where Diap would not be overexpressed. Is this just an unusual sample with a coincidentally big emc M+ clone? It speaks to my concerns about the qualitative nature of the data.

      We replaced Figure 3D with an example of smaller clones.  Nowhere have we suggested that  GMR-DIAP1 affects clone size.

      (3) Figure 9B is very speculative and not appropriate since the authors have zero data to support that cleavage mechanism. It is fit for the next paper if the idea is correct. The panel should be removed.

      We did not intend Figure 9B to imply that we think Dl itself is the relevant target of non-apoptotic caspases.  Since apparently we gave that impression, we removed this to a supplemental figure.  We still think it is worth showing that Dl does not contain predicted caspase sites expected to activate signaling. 

      (4) Figure 9A could be made more clear. Their pathway represents the mutant cells in the mosaic disc. Why not also outline what you think is happening in the emc+ cells as well?

      It is difficult to make a comparable diagram for normal cells, because none of this pathway happens in normal cells.  We modified the figure legend to indicate this (lines 677-8).

      (5) The one emc ci clone they show spanning the furrow has a very non-continuous furrow advance phenotype. This is unlike the emc clones where the furrow advance is graded about the clone. And it resembles the SuH clones they show. This result and the synergistic effect on clone sizes they mention need more discussion and thought put into it. It argues ci is doing something with respect to emc action. Loss of ci might not rescue size and furrow advance but actually, it makes it worse! This is interesting and might suggest an inhibitory role for ci in emc or a parallel role for ci in mediating growth and progression that is redundant with emc.

      We agree that aspects of the emc ci phenotype are not clear.  We discuss this in the revised manuscript (lines 373-5).  

      (6) Related to point 7, it is a weak argument for non-autonomy that graded furrow advance in emc clones is evidence for emc acting nonautonomously through Delta. Its weakness is combined with its lack of significance relative to the other findings. It should be deleted as should the SuH data.

      We agree that the evidence that emc affects morphogenetic furrow progression non-autonomously is not compelling and have revised the manuscript to soften this conclusion (lines 426-7).  We do not want to remove this idea, because it does in fact have significance for other findings.  Specifically, it supports the idea that the emc effect in the morphogenetic furrow is due to trans-activation by Delta, whereas  the effect on R7 and cone cell differentiation is due to autonomous cis-inhibition.  We think this is important to keep in the paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest. I do have some concerns with the way that the project has been conceptualized, which I share below.

      Thank you for acknowledging the strengths and novelty of our study. We have now addressed the conceptual issues raised; please see below in the specific comments.

      (2) The authors should provide careful working definitions of what exactly they think is occurring in the brain following sensory deprivation. Characterizing these changes as 'large-scale neural reorganization' and 'compensatory adaptation' gives the impression that the authors believe that there is good evidence in support of significant structural changes in the pathways between brain areas - a viewpoint that is not broadly supported (see Makin and Krakauer, 2023). The authors report changes in connectivity that amount to differences in coordinated patterns of BOLD signal across voxels in the brain; accordingly, their data could just as easily (and more parsimoniously) be explained by the unmasking of connections to the auditory cortex that are present in typically hearing individuals, but which are more obvious via MR in the absence of auditory inputs.

      We thank the Reviewer for the suggestion to clarify and better support our stance regarding reorganization. We indeed believe that the adaptive changes in the auditory cortex in deafness represent real functional recruitment for non-auditory functions, even given the relatively limited large-scale anatomical connectivity changes. This is supported by animal work showing causal evidence for the involvement of deprived auditory cortices in non-auditory tasks, in a way that is not found in hearing controls (e.g., Lomber et al., 2010, Meredith et al., 2011, reviewed in Alencar et al., 2019; Lomber et al., 2020). Whether the word “reorganization” should be used has indeed been debated recently (Makin and Krakauer, 2023). Beyond terminology, we do agree that the changes in recruitment seen in the brains of people with deafness or blindness are largely based on the typical anatomical connectivity at birth. We also agree that at the group level, there is poor evidence of large-scale anatomical connectivity differences in deprivation. However, we think there is more than ample evidence that the unmasking and, more importantly, the re-weighting of non-dominant inputs gives rise to functional changes. This is supported by the relatively weaker reorganization found in late-onset deprivation as compared to early-onset deprivation. If unmasking of existing connectivity without any additional functional changes were sufficient to elicit the functional responses to atypical stimuli (e.g., non-visual in blindness and non-auditory in deafness), one would expect there to be no difference between early- and late-onset deprivation in response patterns. Therefore, we believe that the reliance of these changes on innate, pre-existing inputs and integration is the mechanism of reorganization, not a reason to deny that it is reorganization.
Specifically, in the case of this manuscript, we report the change in variability of FC from the auditory cortex, which is greater in deafness than in typically hearing controls. This is not an increase in response per se, but rather more divergent values of FC from the auditory cortex, which are harder to explain in terms of ‘unmasking’ alone, unless one assumes unmasking is particularly variable. The mechanistic explanation for our findings is that in the absence of auditory input’s fine-tuning and pruning of the connectivity of the auditory cortex, more divergent connectivity strength remains among the deaf. Thus, auditory input not only masks non-dominant inputs but also prunes/deactivates exuberant connectivity, in a way that generates a more consistently connected auditory system. We have added a shortened version of these clarifications to the discussion (lines 351-372).

      (3) I found the argument that the deaf use a single modality to compensate for hearing loss, and that this might predict a more confined pattern of differential connectivity than had been previously observed in the blind to be poorly grounded. The authors themselves suggest throughout that hearing loss, per se, is likely to be driving the differences observed between deaf and typically-hearing individuals; accordingly, the suggestion that the modality in which intentional behavioral compensation takes place would have such a large-scale effect on observed patterns of connectivity seems out of line.

      Thank you for your critical insight regarding our rationale on modality use and its impact on connectivity patterns in the deaf compared to the blind. After some thought, we agree that the argument presented may not be sufficiently strong and could distract from the main findings of our study. Therefore, we have decided to remove this claim from our revised manuscript.

      (4) The analyses highlighting the areas observed to be differentially connected to the auditory cortex and areas observed to be more variable in their connectivity to the auditory cortex seem somewhat circular. If the authors propose hearing loss as a mechanism that drives this variability in connectivity, then it is reasonable to propose hypotheses about the directionality of these changes. One would anticipate this directionality to be common across participants and thus, these areas would emerge as the ones that are differently connected when compared to typically hearing folks.

      We are a little uncertain how to interpret this concern.  If the question was about the logic leading to our statement that variability is driven by hearing loss, then yes, we indeed were proposing hearing loss as a mechanism that drives this variability in connectivity to the auditory cortex; we regret this was unclear in the original manuscript. This logic parallels the proposal made with regard to the increased variability in FC in blindness; deprivation leads to more variable outcomes, due to the lack of developmental environmental constraints (Sen et al., 2022). Specifically, we first analyzed the differences in within-group variability between deaf and hearing individuals (Fig. 1A), followed by examining the variability ratio (Fig. 1B) in the same regions that demonstrated differences. The first analysis does not specify which group shows higher variability; therefore, the second analysis is essential to clarify the direction of the effect and identify which group, and in which regions, exhibits greater variability. We have clarified this in the revised manuscript (lines 125-127): “To determine which group has larger individual differences in these regions (Figure 1B), we computed the ratio of variability between the two groups (deaf/hearing) in the areas that showed a significant difference in variability (Figure 1A)”. Nevertheless, this comment can also be interpreted as predicting that any change in FC due to deafness would lead to greater variability. In this case, it is also important to mention that while we would expect regions with higher variability to also show group differences between the deaf and the hearing (Figure 2), our analysis demonstrates that variability is present even in regions without significant group mean differences. Similarly, many areas that show a difference between the groups in their FC do not show a change in variability (for example, the bilateral anterior insula and sensorimotor cortex). 
In fact, the correlation between the regions with higher FC variability (Figure 1A) and those showing FC group differences (Figure 2B) is significant but rather modest, as we now acknowledge in our revised manuscript (lines 324-328). Therefore, increased FC and increased variability of FC are not necessarily linked. 
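
      To make the distinction between a mean difference and a variability difference concrete, the following is a minimal sketch under stated assumptions, not our analysis code. It assumes that variability is summarized as the between-subject variance of FC within each group and that the ratio is deaf/hearing, as in the quoted sentence above; the FC values and function name are hypothetical, and the statistical test used in the manuscript is not reproduced here.

```python
import statistics


def variability_ratio(deaf_fc, hearing_fc):
    """Ratio of between-subject FC variance (deaf / hearing) for one region."""
    return statistics.variance(deaf_fc) / statistics.variance(hearing_fc)


# Hypothetical FC values for one region, one value per subject.
# Both groups have the same mean (0.2) but different spread.
deaf = [0.10, 0.45, -0.20, 0.60, 0.05]
hearing = [0.05, 0.35, 0.10, 0.30, 0.20]

mean_difference = statistics.mean(deaf) - statistics.mean(hearing)
ratio = variability_ratio(deaf, hearing)  # > 1 means the deaf group is more variable
```

      In this toy example the group means are identical while the ratio is well above 1, which is the kind of dissociation between mean FC and FC variability described in the response.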

      (5) While the authors describe collecting data on the etiology of hearing loss, hearing thresholds, device use, and rehabilitative strategies, these data do not appear in the manuscript, nor do they appear to have been included in models during data analysis. Since many of these factors might reasonably explain differences in connectivity to the auditory cortex, this seems like an omission.

      We thank the Reviewer for their comment regarding the inclusion of these variables in our manuscript. We have now included additional information in the main text and a supplementary table in the revised manuscript that elaborates further on the etiology of hearing loss and all individual information that characterizes our deaf sample. Although we initially intended to include individual factors (e.g., hearing threshold, duration of hearing aid use, and age of first use) in our models, this was not feasible for the following reasons: 1) for some subjects, we only have a level of hearing loss rather than specific values, which we could not use quantitatively as a nuisance variable (it was typical in such testing to ascertain the threshold of loss as belonging to a deafness level, such as “profound”, and not necessarily go into more elaborate testing to identify the specific threshold), and 2) this information was either not collected for the hearing participants (e.g., hearing threshold) or does not apply to them (e.g., age of hearing aid use), which made it impossible to use the complete model with all these variables. Modeling the groups separately with different variables would also be inappropriate. Last, the distribution of the values and the need for a large sample to rigorously assess a difference in variability also precluded subdividing the group into subgroups based on these values.

      Therefore, we opted for a different way to control for the potential influence of these variables on FC variability in the deaf. We tested the correlation between the FC from the auditory cortex and each of these parameters in the areas that showed increased FC variability in deafness (Figures 1A, B), to see if it could account for the increased variability. This ROI analysis did not reveal any significant correlations (all p > .05, prior to correction for multiple comparisons; see Figures S4, S5, and S6 for scatter plots). The maximal variability explained in these ROIs by the hearing factors was r² = 0.096, whereas the FC variability (Figure 1B) was increased by a factor of at least 2 in the deaf. Therefore, it does not seem like these parameters underlie the increased variability in deafness. To test if these variables had a direct effect on FC variability in other areas in the brain, we also directly computed the correlation between FC and each factor individually. At the whole-brain level, the results indicate a significant correlation between AC-FC and hearing threshold, as well as a correlation between AC-FC and the age of hearing aid use onset, but not for the duration of hearing aid use (Figure S3). While these may be interesting on their own, and are added to the revised manuscript, the regions that show significant correlations with hearing threshold and age of hearing aid use are not the same regions that exhibit FC variability in the deaf (Figures 1A, B).
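
      The arithmetic behind this argument can be sketched as follows. This is an illustrative calculation only, and it rests on two assumptions that are not stated in the manuscript: that the increase of at least 2 refers to a ratio of between-subject variances (deaf/hearing), and that the variance explained by a hearing factor is removed from the deaf group only. The function name is hypothetical.

```python
def residual_variance_ratio(ratio, r_squared):
    """Variance ratio left after removing the share of numerator-group
    variance that a covariate explains (r_squared)."""
    return ratio * (1.0 - r_squared)


# Values quoted in the response: ratio of at least 2, maximal r^2 of 0.096.
remaining = residual_variance_ratio(2.0, 0.096)  # 2.0 * 0.904 = 1.808
```

      Under these assumptions, even the most explanatory hearing factor would leave the variability in the deaf group at roughly 1.8 times that of the hearing group.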

      Overall, these findings suggest that although some of these factors may influence FC, they do not appear to be the driving factors behind FC variability. Finally, in terms of rehabilitative strategies, only one deaf subject reported having received long-term oral training from teachers. This participant started this training at age 2, as now described in the participants’ section. We thank the reviewer for raising this concern and allowing us to show that our findings do not stem from simple differences ascribed to auditory experience in our participants. 

      Reviewer #2 (Public Review):

      (1) The paper has two main merits. Firstly, it documents a new and important characteristic of the re-organization of the brains of the deaf, namely its variability. The search for a well-defined set of functions for the deprived auditory cortex of the deaf has been largely unsuccessful, with several task-based approaches failing to deliver unanimous results. Now, one can understand why this was the case: most likely there isn't one fixed, well-defined set of functions supported by an identical set of areas in every subject, but rather a variety of functions supported by various regions. In addition, the paper extends the authors' previous findings from blind subjects to the deaf population. It demonstrates that the heightened variability of connectivity in the deprived brain is not exclusive to blindness, but rather a general principle that applies to other forms of deprivation. On a more general level, this paper shows how sensory input is a driver of the brain's reproducible organization.

      We thank the Reviewer for their observations regarding the merits of our study. We appreciate the recognition of the novelty in documenting the variability of brain reorganization in deaf individuals. 

      (2) The method and the statistics are sound, the figures are clear, and the paper is well-written. The sample size is impressively large for this kind of study.

      We thank the Reviewer for their positive feedback on the methodology, statistical analysis, clarity of figures, and the overall composition of our paper. We are also grateful for the acknowledgment of our large sample size, which we believe significantly strengthens the statistical power and the generalizability of our findings.

      (3) The main weakness of the paper is not a weakness, but rather a suggestion on how to provide a stronger basis for the authors' claims and conclusions. I believe this paper could be strengthened by including in the analysis at least one of the already published deaf/hearing resting-state fMRI datasets (e.g. Andin and Holmer, Bonna et al., Ding et al.) to see if the effects hold across different deaf populations. The addition of a second dataset could strengthen the evidence and convincingly resolve the issue of whether delayed sign language acquisition causes an increase in individual differences in functional connectivity to/from Broca's area. Currently, the authors may not have enough statistical power to support their findings.

      We thank the Reviewer for their constructive suggestion to reinforce the robustness of our findings. While we acknowledge the potential value of incorporating additional datasets to strengthen our conclusions, the datasets mentioned (Andin and Holmer, Bonna et al., Ding et al.) are not publicly available, which limits our ability to include them in our analysis. Additionally, datasets that contain comparable groups of delayed and native deaf signers are exceptionally rare, further complicating the possibility of their inclusion. Furthermore, to discern individual differences within these groups effectively, a substantially larger sample size is necessary. As such, we were unfortunately unable to perform this additional analysis. This is a challenge we acknowledge in the revised manuscript (lines 442-445), especially when the group is divided into subcategories based on the level of language acquisition, which indeed reduces our statistical power. We have, however, now integrated the individual task accuracy and reaction time parameters as nuisance variables in the variability analyses; all the results are fully replicated when accounting for task difficulty. We also report that there was no difference in activation for this task between the groups that could affect our findings.

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. That said, we are exploring collaborations and other avenues to access comparable datasets that might enable a more powerful analysis in future work. This feedback is very important for guiding our ongoing efforts to verify and extend our conclusions.

      (4) Secondly, the authors could more explicitly discuss the broad implications of what their results mean for our understanding of how the architecture of the brain is determined by the genetic blueprint vs. how it is determined by learning (page 9). There is currently a wave of strong evidence favoring a more "nativist" view of brain architecture, for example, face- and object-sensitive regions seem to be in place practically from birth (see e.g. Kosakowski et al., Current Biology, 2022). The current results show what is the role played by experience.

      We thank the Reviewer for highlighting the need to elaborate on the broader implications of our findings in relation to the ongoing debate of nature vs. nurture. We agree that this discussion is crucial and have expanded our manuscript to address this point more explicitly. We now incorporate a more detailed discussion of how our results contribute to understanding the significant role of experience in shaping individual neural connectivity patterns, particularly in sensory-deprived populations (lines 360-372).

      Reviewer #3 (Public Review):

      Summary:

      (1) This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      -  The manuscript is well written.

      -  The methods are clearly described and appropriate.

      -  Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes.

      -  The results are interesting and novel.

      We thank the Reviewer for their positive and detailed feedback. Their acknowledgment of the clarity of our methods and the novelty of our results is greatly appreciated.

      Weaknesses:

      (2) Analyses were conducted for task-based data rather than resting-state data. It was unclear whether groups differed in task performance. If congenitally deaf individuals found the task more difficult this could lead to changes in FC.

      We thank the Reviewer for their observation regarding possible task performance differences between deaf and hearing participants and their potential effect on the results. Indeed, there was a difference in task accuracy between these groups. To account for this variation and ensure that our findings on functional connectivity were not confounded by task performance, we have now included individual task accuracy and reaction time as nuisance variables in our analyses. This approach allowed us to control for any performance differences. The results now presented in the revised manuscript include these two nuisance variables (accuracy and reaction time) and completely align with our original conclusions, highlighting increased variability in deafness, which is found both in the deaf group as a whole and when equating language experience and comparing the hearing and native signers. The correlation between variability and group differences also remains significant, but its significance is slightly decreased, a moderate effect we acknowledge in the revised manuscript (see comment #4). The differences between the delayed signers and native signers are also retained (Figure 3), now aligning better with language-sensitive regions, as previously predicted. The inclusion of the task difficulty predictors also introduced an additional finding in this analysis, a significant cluster in the right aIFG. Therefore, the inclusion of these predictors reaffirms the robustness of the conclusions drawn about FC variability in the deaf population.
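
      One common way to treat accuracy and reaction time as nuisance variables is to regress them out of the FC values before assessing variability. The following is a minimal sketch of that idea under stated assumptions; the manuscript's actual model may differ. It removes an intercept and each covariate from the FC values by ordinary least squares, implemented here by orthogonalizing the covariates. All values and names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def residualize(y, covariates):
    """Return the OLS residuals of y after regressing out an intercept
    and every covariate (each covariate is a list as long as y)."""
    n = len(y)
    basis = []
    # Build an orthogonal basis for the intercept plus covariates (Gram-Schmidt).
    for column in [[1.0] * n] + [list(c) for c in covariates]:
        v = list(column)
        for b in basis:
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        if dot(v, v) > 1e-12:  # skip columns that are linearly dependent
            basis.append(v)
    # Project y off every basis vector; what remains is the residual.
    resid = list(y)
    for b in basis:
        coef = dot(resid, b) / dot(b, b)
        resid = [ri - coef * bi for ri, bi in zip(resid, b)]
    return resid


# Hypothetical per-subject values for one region.
fc = [0.31, 0.12, 0.45, 0.08, 0.27, 0.39]
accuracy = [0.95, 0.80, 0.90, 0.70, 0.85, 0.92]   # proportion correct
reaction_time = [0.61, 0.74, 0.58, 0.81, 0.69, 0.63]  # seconds

fc_adjusted = residualize(fc, [accuracy, reaction_time])
```

      The adjusted values are uncorrelated with both covariates by construction, so any remaining group difference in their spread cannot be a linear effect of task accuracy or reaction time.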

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state if we had access to such data, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. We have also addressed this point in our manuscript (lines 442-451).

      (3) No differences in overall activation between groups were reported. Activation differences between groups could lead to differences in FC. For example, lower activation may be associated with more noise in the data, which could translate to reduced FC.

      We thank the reviewer for noting the potential implications of overall activation differences on FC. In our analysis of the activation for words, we found no significant clusters showing a group difference between the deaf and hearing participants (p < .05, cluster-corrected for multiple comparisons) - we also added this information to the revised manuscript (lines 542-544). This suggests that the differences in FC observed are not confounded by variations in overall brain activation between the groups under these conditions.

      (4) Figure 2B shows higher FC for congenitally deaf individuals than normal-hearing individuals in the insula, supplementary motor area, and cingulate. These regions are all associated with task effort. If congenitally deaf individuals found the task harder (lower performance), then activation in these regions could be higher, in turn, leading to FC. A study using resting-state data could possibly have provided a clearer picture.

      We thank the Reviewer for pointing out the potential impact of task difficulty on the FC differences observed in our study. As addressed in our response to comment #2, task accuracy and reaction times were incorporated as nuisance variables in our analysis. Further, these areas showed no difference in activation between the groups (see response to comment #3 above). Notably, the regions in question still showed higher FC in congenitally deaf individuals even when controlling for these performance differences. Additionally, these findings are consistent with results from studies using resting-state data in deaf populations, further validating our observations. Specifically, using resting-state data, Andin & Holmer (2022) have shown higher FC for deaf (compared to hearing) individuals from auditory regions to the cingulate cortex, insular cortex, cuneus and precuneus, supramarginal gyrus, supplementary motor area, and cerebellum. Moreover, Ding et al. (2016) have shown higher FC for the deaf between the STG and the anterior insula and dorsal anterior cingulate cortex. This suggests that the observed FC differences are likely reflective of genuine neuroplastic adaptations rather than mere artifacts of task difficulty. Although we wish we could augment our study with resting-state data analyzed similarly, we could not at present acquire or access such a dataset. We acknowledge this limitation of our study (lines 442-451) in the revised manuscript and intend to test whether similar results are found with resting-state data in the future.

      (5) The correlation between the FC map and the FC variability map is 0.3. While significant using permutation testing, the correlation is low, and it is not clear how great the overlap is.

      We acknowledge that the correlation coefficient of 0.3, while statistically significant, indicates a moderate overlap. It's also worth noting that, using our new models that include task performance as a nuisance variable, this value has decreased somewhat, to 0.24 (which is still highly significant). It is important to note that the visual overlap between the maps is not a good estimate of the correlation, which was performed on the unthresholded maps, to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This correlation is meant to suggest a trend rather than a strong link, but especially due to its consistency with the findings in blindness, we believe this observation merits further investigation and discussion. As such, we kept it in the revised manuscript while moderating our claims about its strength.

      Reviewer #1 (Recommendations For The Authors):

      (1) Page 4: Does auditory cortex FC variability..." FC is not yet defined.

      Corrected, thanks.

      (2) Page 4: "It showed lower variability..." What showed this?

      Clarified, thanks.

      (3) Page 11: "highlining the importance" should read "highlighting the importance".

      Corrected, thanks.

      (4) Page 11: Do you really mean to suggest functional connectivity does not vary as a function of task? This would not seem well supported.

      We do not suggest that FC doesn’t vary as a function of task, and have revised this section (lines 447-451). 

      (5) Page 12: "there should not to be" should read "there should not be".

      Corrected, thanks.

      (6) Page 12: "and their majority" should read "and the majority".

      Corrected, thanks.

      Reviewer #2 (Recommendations For The Authors):

      Major

      (1) Although this is a lot of work, I nonetheless have another suggestion on how to test if your results are strong and robust. Perhaps you could analyze your data using an ROI/graph-theory approach. I am not an expert in graph theory analysis, but for sure there is a simple and elegant statistic that captures the variability of edge strength variability within a population. This approach could not only validate your results with an independent analysis and give the audience more confidence in their robustness, but it could also provide an estimate of the size of the effect size you found. That is, it could express in hard numbers how much more variable the connections from auditory cortex ROI's are, in comparison to the rest of the brain in the deaf population, relative to the hearing population.

      We thank the Reviewer for suggesting the use of graph theory as a method to further validate our findings. While we see the potential value in this approach, we believe it may be beyond the scope of the current paper, and merits a full exploration of its own, which we hope to do in the future. However, we understand the importance of showing the uniqueness of the connectivity of the auditory cortex ROI as compared to the rest of the brain. So, in order to bolster our results, we conducted an additional analysis using control regions of interest (ROIs). Specifically, we calculated the inter-individual variability using all ROIs from the CONN Atlas (except auditory and language regions) as the control seed regions for the FC. We showed that the variability of connectivity from the auditory cortex is uniquely increased in deafness, as compared to these control ROIs (Figure S1). This additional analysis supports the specificity of our findings to the auditory cortex in the deaf population. We aim to integrate more analytic approaches, including graph theory methods, in our future work.

      Minor

      (1) Some citations display the initial of the author in addition to the last name, unless there is something I don't know about the citation system, the initial shouldn't be there.

      This is due to the citation style we're using (APA 7th edition, as suggested by eLife), which requires including the first author's initials in all in-text citations when citing multiple authors with the same last name.  

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors provide behavioral data and results for overall neural activation.

      Thanks. We have added these to the revised manuscript. Specifically, we report that there was no difference in the activation for words (p < .05, cluster-corrected for multiple comparisons) between the deaf and hearing participants. Further, we report the behavioral averages for accuracy and reaction time for each group, and have now used these individual values explicitly as nuisance variables in the revised analyses.

      (2) For the correlation between FC and FC variability, it seemed a bit odd that the permuted data were treated additionally (through Gaussian smoothing). I understand the general logic (i.e., to reintroduce smoothness), but this approach provides more smoothing to the permutation than the original data. It is hard to know what this does to the statistical distribution. I recommend using a different approach or at least also reporting the p-value for non-smoothed permutation data.

      In response to this suggestion and to ensure transparency in our results, we have now included also the p-value for the non-smoothed permutation data in our revised manuscript (still highly significant; p < .0001). Thanks for this proposal.
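
      To make the non-smoothed variant concrete, the following is a minimal sketch of a permutation test for the spatial correlation between two unthresholded maps. It is an illustration under stated assumptions, not the pipeline used in the manuscript: the maps are assumed to be flattened into equal-length lists of voxel values, and the function names (`pearson_r`, `permutation_p_value`), the number of permutations, and the seed are hypothetical choices.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(map_a, map_b, n_perm=200, seed=0):
    """Two-sided permutation p-value for the correlation of two maps.

    Assumption: voxels of map_b are shuffled freely, with no smoothing
    applied afterwards, so spatial autocorrelation is not preserved.
    """
    rng = random.Random(seed)
    observed = pearson_r(map_a, map_b)
    shuffled = list(map_b)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(map_a, shuffled)) >= abs(observed):
            n_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical toy maps: map_b is a linear function of map_a.
toy_a = [float(v) for v in range(50)]
toy_b = [2.0 * v + 1.0 for v in toy_a]
r_obs, p_val = permutation_p_value(toy_a, toy_b)
```

      Because free shuffling destroys the spatial smoothness of the original maps, the null distribution in this sketch is narrower than one built from re-smoothed permutations; the two p-values therefore answer slightly different questions, which is why reporting both is informative.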

      (3) For the map comparison, a plot with different colors, showing the FC map, the FC variability map, and one map for the overlap on the same brain may be helpful.

      We thank the Reviewer for their suggestion to visualize the overlap between the maps. However, we performed the correlation analysis using the unthresholded maps, as mentioned in the methods section of our manuscript, specifically to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This is why the maps displayed in the figures, which are thresholded for significance, may not appear to match perfectly, and may actually obscure the correlation across the brain. This methodological detail is crucial for interpreting the relationship and overlap between these maps accurately but also explains why the visualization of the overlap is, unfortunately, not very informative.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary

      The authors asked if parabrachial CGRP neurons were only necessary for a threat alarm to promote freezing or were necessary for a threat alarm to promote a wider range of defensive behaviors, most prominently flight.

      Major Strengths of Methods and Results

      The authors performed careful single-unit recording and applied rigorous methodologies to optogenetically tag CGRP neurons within the PBN. Careful analyses show that single-units and the wider CGRP neuron population increases firing to a range of unconditioned stimuli. The optogenetic stimulation of experiment 2 was comparatively simpler but achieved its aim of determining the consequence of activating CGRP neurons in the absence of other stimuli. Experiment 3 used a very clever behavioral approach to reveal a setting in which both cue-evoked freezing and flight could be observed. This was done by having the unconditioned stimulus be a "robot" traveling along a circular path at a given speed. Subsequent cue presentation elicited mild flight in controls and optogenetic activation of CGRP neurons significantly boosted this flight response. This demonstrated for the first time that CGRP neuron activation does more than promote freezing. The authors conclude by demonstrating that bidirectional modulation of CGRP neuron activity bidirectionally affects freezing in a traditional fear conditioning setting and affects both freezing and flight in a setting in which the robot served as the unconditioned stimulus. Altogether, this is a very strong set of experiments that greatly expand the role of parabrachial CGRP neurons in threat alarm.

      We would like to sincerely thank the reviewer for the positive and insightful comments on our work. We greatly appreciate the acknowledgment of our new behavioral approach, which allowed us to observe a dynamic spectrum of defensive behaviors in animals. Our use of the robot-based paradigm, which enables the observation of both freezing and flight, has been instrumental in expanding our understanding of how parabrachial CGRP neurons modulate diverse threat responses. We are pleased that the reviewer found this methodological innovation to be a valuable contribution to the field.

      Weaknesses

      In all of their conditioning studies the authors did not include a control cue. For example, a sound presented the same number of times but unrelated to US (shock or robot) presentation. This does not detract from their behavioral findings. However, it means the authors do not know if the observed behavior is a consequence of pairing. Or is a behavior that would be observed to any cue played in the setting? This is particularly important for the experiments using the robot US.

      We appreciate the reviewer’s insightful comment regarding the absence of a control cue in our conditioning studies. First, we would like to mention that, in response to Reviewer 3, we have updated how we present our flight data by following methods from previously published papers (Fadok et al., 2017; Borkar et al., 2024). Instead of counting flight responses, we calculated flight scores as the ratio of the velocity during the CS to the average velocity in the 7 s before the CS on the conditioning day (or 10 s for the retention test). This method better captures both the speed and duration of fleeing during the CS. With this updated approach, we observed a significant difference in flight scores between the ChR2 and control groups, even during conditioning, which may partly address the reviewer’s concern about whether the observed behavior is a consequence of CS-US pairing.

      However, we agree with the reviewer that including an unpaired group would provide stronger evidence, and in response, we conducted an additional experiment with an unpaired group. In this unpaired group, the CS was presented the same number of times, but the robot US was delivered randomly within the inter-trial interval. The unpaired group did not exhibit any notable conditioned freezing or flight responses. We believe that this additional experiment, now reflected in Figure 3, further strengthens our conclusion that the fleeing behavior is driven by associative learning between the CS and US, rather than a reaction to the cue itself.

      The authors make claims about the contribution of CGRP neurons to freezing and fleeing behavior, however, all of the optogenetic manipulations are centered on the US presentation period. Presently, the experiments show a role for these neurons in processing aversive outcomes but show little role for these neurons in cue responding or behavior organizing. Claims of contributions to behavior should be substantiated by manipulations targeting the cue period.

      We appreciate the reviewer’s constructive comments. We would like to emphasize that our primary objective in this study was to investigate whether activating parabrachial CGRP neurons—thereby increasing the general alarm signal—would elicit different defensive behaviors beyond passive freezing. To this end, we focused on manipulating CGRP neurons during the US period rather than the cue period.

      Previous studies have shown that CGRP neurons relay US signals, and direct activation of CGRP neurons has been used as the US to successfully induce conditioned freezing responses to the CS during retention tests (Han et al., 2015; Bowen et al., 2020). In our experiments, we also observed that CGRP neurons responded exclusively to the US during conditioning with the robot (Figure 1F), and stimulating these neurons in the absence of any external stimuli elicited strong freezing responses (Figure 2B). These findings, collectively, suggest that activation of CGRP neurons during the CS period would predominantly result in freezing behavior.

      Therefore, we manipulated the activity of CGRP neurons during the US period to examine whether adjusting the perceived threat level through these neurons would result in diverse defensive behaviors when paired with the chasing robot. We observed that enhancing CGRP neuron activity while animals were chased by the robot at 70 cm/s made them react as if chased at a higher speed (90 cm/s), leading to increased fleeing behaviors. While this may not fully address the role of these neurons in cue responding or behavior organizing, we found that silencing CGRP neurons with tetanus toxin (TetTox) abolished fleeing behavior even when animals were chased at high speeds (90 cm/s), which usually elicits fleeing without CGRP manipulation (Figure 5). This supports the conclusion that CGRP neurons are necessary for processing fleeing responses.

      In summary, manipulating CGRP neurons during the US period was essential for effectively investigating their role in adjusting defensive responses, thereby expanding our understanding of their function within the general alarm system. We hope this clarifies our experimental design and addresses the concern the reviewer has raised.

      Appraisal

      The authors achieved their aims and have revealed a much greater role for parabrachial CGRP neurons in threat alarm.

      Discussion

      Understanding neural circuits for threat requires us (as a field) to examine diverse threat settings and behavioral outcomes. A commendable and rigorous aspect of this manuscript was the authors' decision to use a new behavioral paradigm and measure multiple behavioral outcomes. Indeed, this manuscript would not have been nearly as impactful had they not done that. This novel behavior was combined with excellent recording and optogenetic manipulations - a standard the field should aspire to. Studies like this are the only way that we as a field will map complete neural circuits for threat.

      We sincerely thank the reviewer for their positive and encouraging comments. We are grateful for the acknowledgment of our efforts in employing a novel behavioral paradigm to study diverse defensive behaviors. We are pleased that our work contributes to advancing the understanding of neural circuits involved in threat responses.

      Reviewer #3 (Public Review):

      Strengths:

      The study used optogenetics together with in vivo electrophysiology to monitor CGRP neuron activity in response to various aversive stimuli including robot chasing to determine whether they encode noxious stimuli differentially. The study used an interesting conditioning paradigm to investigate the role of CGRP neurons in the PBN in both freezing and flight behaviors.

      Weakness:

      The major weakness of this study is that the chasing robot threat conditioning model elicits weak unconditioned and conditioned flight responses, making it difficult to interpret the robustness of the findings. Furthermore, the conclusion that the CGRP neurons are capable of inducing flight is not substantiated by the data. No manipulations are made to influence the flight behavior of the mouse. Instead, the manipulations are designed to alter the intensity of the unconditioned stimulus.

      We sincerely thank the reviewer for the thoughtful and constructive comments on our manuscript. In response to this feedback, we revisited our analysis of the flight responses and compared our methods with those used in previous literature examining similar behaviors.

      We reviewed a study investigating sex differences in defensive behavior using rats (Gruene et al., 2015). In that study, the CS was presented for 30 s, and active defensive behavior – referred to as ‘darting’ – was quantified as ‘Dart rate (dart/min)’. This was calculated by doubling the number of darts counted during the 30-s CS presentation to extrapolate to a per-min rate. The highest average dart rate observed was approximately 1.5. Other relevant studies using mice quantified active defensive behavior by calculating a flight score, defined as the ratio of the average speed during each CS to the average speed during the 10 s pre-CS period (Fadok et al., 2017; Borkar et al., 2024). This method captures multiple aspects of flight behavior during CS presentation, including overall velocity, number of bouts, and duration of fleeing. Moreover, it accounts for each animal’s individual velocity prior to the CS, reflecting how fast the animals were fleeing relative to their baseline activity.
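
      The two measures described above can be sketched as follows. This is an illustrative sketch only, not the analysis code used in any of the cited studies: it assumes the animal's speed is available as a list of evenly sampled values in cm/s, and the function names, the sampling scheme, and the example numbers are hypothetical.

```python
from statistics import mean


def dart_rate_per_min(n_darts, cs_duration_s=30.0):
    """Extrapolate a dart count during the CS to a per-minute rate.

    For a 30-s CS this is equivalent to doubling the count.
    """
    return n_darts * 60.0 / cs_duration_s


def flight_score(speed_cm_s, cs_onset_idx, cs_len, baseline_len):
    """Mean speed during the CS divided by mean speed in the pre-CS window.

    speed_cm_s   : evenly sampled speed trace (assumed input format)
    cs_onset_idx : index of the first CS sample
    cs_len       : number of samples in the CS
    baseline_len : number of pre-CS samples used as baseline
    """
    if cs_onset_idx < baseline_len:
        raise ValueError("not enough pre-CS samples for the baseline window")
    baseline = speed_cm_s[cs_onset_idx - baseline_len:cs_onset_idx]
    during_cs = speed_cm_s[cs_onset_idx:cs_onset_idx + cs_len]
    baseline_mean = mean(baseline)
    if baseline_mean == 0:
        raise ValueError("baseline speed is zero; score is undefined")
    return mean(during_cs) / baseline_mean


# Hypothetical trace sampled at 1 Hz: 10 s of baseline at 2 cm/s,
# followed by a 5-s CS during which the animal moves at 6 cm/s.
trace = [2.0] * 10 + [6.0] * 5
score = flight_score(trace, cs_onset_idx=10, cs_len=5, baseline_len=10)
```

      A score above 1 means the animal moved faster during the CS than during its own baseline, which is how the ratio normalizes for individual differences in locomotion. The baseline window length is a parameter here because the responses mention both a 7 s and a 10 s pre-CS window depending on the session.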

      In our original analysis, we quantified flight responses by counting rapid fleeing movements, defined as movements exceeding 8 cm/s. This approach was consistent with our previous study using the same robot paradigm to observe unique patterns of defensive behavior related to sex differences (Pyeon et al., 2023). Based on our earlier findings, where this approach effectively identified significant differences in defensive behaviors, we believed that this method was appropriate for capturing conditioned flight behavior within our specific experimental context. However, prompted by the reviewer's insightful comments, we recognized that our initial method might not fully capture the robustness of the flight responses. Therefore, we re-analyzed our data using the flight score method described by Fadok and colleagues, which provides a more sensitive measure of fleeing during the CS.

      Re-analyzing our data revealed a more robust flight response than previously reported, demonstrating that additional CGRP neuron stimulation promoted flight behavior in animals during conditioning, addressing the concern that the data did not substantiate the role of CGRP neurons in inducing flight. In addition, we would like to emphasize the findings from our final experiment, where silencing CGRP neurons, even under high-threat conditions (90 cm/s), prevented animals from exhibiting flight responses. This demonstrates that CGRP neurons are necessary in influencing flight responses.

      We have updated all flight data in the manuscript and revised the relevant figures and text accordingly. We appreciate the opportunity to enhance our analysis. The reviewer's insightful observation led us to adopt a better method for quantifying flight behavior, which substantiates our conclusion about the role of CGRP neurons in modulating defensive responses.

      Borkar, C.D., Stelly, C.E., Fu, X., Dorofeikova, M., Le, Q.-S.E., Vutukuri, R., et al. (2024). Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625(7996), 743-749.

      Bowen, A.J., Chen, J.Y., Huang, Y.W., Baertsch, N.A., Park, S., and Palmiter, R.D. (2020). Dissociable control of unconditioned responses and associative fear learning by parabrachial CGRP neurons. Elife 9, e59799.

      Fadok, J.P., Krabbe, S., Markovic, M., Courtin, J., Xu, C., Massi, L., et al. (2017). A competitive inhibitory circuit for selection of active and passive fear responses. Nature 542(7639), 96-100.

      Gruene, T.M., Flick, K., Stefano, A., Shea, S.D., and Shansky, R.M. (2015). Sexually divergent expression of active and passive conditioned fear responses in rats. Elife 4, e11352.

      Han, S., Soleiman, M.T., Soden, M.E., Zweifel, L.S., and Palmiter, R.D. (2015). Elucidating an affective pain circuit that creates a threat memory. Cell 162(2), 363-374.

      Pyeon, G.H., Lee, J., Jo, Y.S., and Choi, J.-S. (2023). Conditioned flight response in female rats to naturalistic threat is estrous-cycle dependent. Scientific Reports 13(1), 20988.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript from So et al. describes what is suggested to be an improved protocol for single-nuclei RNA sequencing (snRNA-seq) of adipose tissue. The authors provide evidence that modifications to the existing protocols result in better RNA quality and nuclei integrity than previously observed, with ultimately greater coverage of the transcriptome upon sequencing. Using the modified protocol, the authors compare the cellular landscape of murine inguinal and perigonadal white adipose tissue (WAT) depots harvested from animals fed a standard chow diet (lean mice) or those fed a high-fat diet (mice with obesity). 

      Strengths: 

      Overall, the manuscript is well-written, and the data are clearly presented. The strengths of the manuscript rest in the description of an improved protocol for snRNA-seq analysis. This should be valuable for the growing number of investigators in the field of adipose tissue biology that are utilizing snRNA-seq technology, as well as those other fields attempting similar experiments with tissues possessing high levels of RNAse activity. 

      Moreover, the study makes some notable observations that provide the foundation for future investigation. One observation is the correlation between nuclei size and cell size, allowing for the transcriptomes of relatively hypertrophic adipocytes in perigonadal WAT to be examined. Another notable observation is the identification of an adipocyte subcluster (Ad6) that appears "stressed" or dysfunctional and likely localizes to crown-like inflammatory structures where proinflammatory immune cells reside. 

      Weaknesses:  

      Analogous studies have been reported in the literature, including a notable study from Savari et al. (Cell Metabolism). This somewhat diminishes the novelty of some of the biological findings presented here. Moreover, a direct comparison of the transcriptomic data derived from the new vs. existing protocols (i.e. fully executed side by side) was not presented. As such, the true benefit of the protocol modifications cannot be fully understood. 

      We agree with the reviewer’s comment on the limitations of our study. Following the reviewer's suggestion, we performed a new analysis by integrating our data with those from the study by Emont et al. Please refer to the Recommendation for authors section below for further details.

      Reviewer #2 (Public Review):

      Summary: 

      In the present manuscript So et al utilize single-nucleus RNA sequencing to characterize cell populations in lean and obese adipose tissues. 

      Strengths: 

      The authors utilize a modified nuclear isolation protocol incorporating VRC that results in higher-quality sequencing reads compared with previous studies. 

      Weaknesses:  

      The use of VRC to enhance snRNA-seq has been previously published in other tissues. The snRNA-seq data sets presented in this manuscript, when compared with numerous previously published single-cell analyses of adipose tissue, do not represent a significant scientific advance. 

      Figure 1-3: The snRNA-seq data obtained by the authors using their enhanced protocol does not represent a significant improvement in cell profiling for the majority of the highlighted cell types including APCs, macrophages, and lymphocytes. These cell populations have been extensively characterized by cytoplasmic scRNA-seq which can achieve sufficient sequencing depth, and thus this study does not contribute meaningful additional insight into these cell types. The authors note an increase in the number of rare endothelial cell types recovered, however this is not translated into any kind of functional analysis of these populations. 

      We acknowledge the reviewer's comments on the limitations of our study, particularly the lack of extension of our snRNA-seq data into functional studies of new biological processes. However, this manuscript has been submitted as a Tools and Resources article. As an article of this type, we provide detailed information on our snRNA-seq methods and present a valuable resource of high-quality mouse adipose tissue snRNA-seq data. In addition, we demonstrate that our improved method offers novel biological insights, including the identification of subpopulations of adipocytes categorized by size and functionality. We believe this study offers powerful tools and significant value to the research community.

      Figure 4: The authors did not provide any evidence that the relative fluorescent brightness of GFP and mCherry is a direct measure of the nuclear size, and the nuclear size is only a moderate correlation with the cell size. Thus sorting the nuclei based on GFP/mCherry brightness is not a great proxy for adipocyte diameter. Furthermore, no meaningful insights are provided about the functional significance of the reported transcriptional differences between small and large adipocyte nuclei. 

      To address the reviewer's point, we analyzed the Pearson correlation coefficient for nucleus size vs. adipocyte size and found R = 0.85, indicating a strong positive correlation. In addition, we performed a new experiment to determine the correlation between nuclear GFP intensity and adipocyte nucleus size, finding a strong correlation with R = 0.91. These results suggest that nuclear GFP intensity can be a strong proxy for adipocyte size. Furthermore, we performed gene ontology analysis on genes differentially regulated between large and small adipocyte nuclei. We found that large adipocytes promote processes involved in insulin response, vascularization and DNA repair, while inhibiting processes related to cell migration, metabolism and the cytoskeleton. We have added these new data as Figures 4E, S6E, S6G, and S6H (page 11).
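
      For reference, the Pearson coefficient reported above has the standard form below. The symbols are assumptions introduced here for illustration: $x_i$ stands for one measured quantity per cell (for example nucleus size), $y_i$ for the paired quantity (for example adipocyte size), and $n$ for the number of paired measurements.

```latex
R = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
          \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

      Since $R^2$ is the fraction of variance in one variable that a linear fit on the other accounts for, R = 0.85 corresponds to roughly 72% and R = 0.91 to roughly 83%.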

      Figure 5-6: The Ad6 population is highly transcriptionally analogous to the mAd3 population from Emont et al, and is thus not a novel finding. Furthermore, in the present data set, the authors conclude that Ad6 are likely stressed/dying hypertrophic adipocytes with a global loss of gene expression, which is a well-documented finding in eWAT > iWAT, for which the snRNA-seq reported in the present manuscript does not provide any novel scientific insight. 

      As the reviewer pointed out, a new analysis integrating our data with the previous study found that Ad3 from our study is comparable to mAd3 from Emont et al. in gene expression profiles. However, significant discrepancies in population size and changes in response to obesity were observed, likely due to differences in technical robustness. The dysfunctional cellular state of this population, with compromised RNA content, may have hindered accurate capture in the previous study, while our protocol enabled precise detection. This underscores the importance of our improved snRNA-seq protocol for accurately understanding adipocyte population dynamics. We have revised the manuscript to include new data in Figure S7 (page 14).

      Reviewer #3 (Public Review): 

      Summary:  

      The authors aimed to improve single-nucleus RNA sequencing (snRNA-seq) to address current limitations and challenges with nuclei and RNA isolation quality. They successfully developed a protocol that enhances RNA preservation and yields high-quality snRNA-seq data from multiple tissues, including a challenging model of adipose tissue. They then applied this method to eWAT and iWAT from mice fed either a normal or high-fat diet, exploring depot-specific cellular dynamics and gene expression changes during obesity. Their analysis included subclustering of SVF cells and revealed that obesity promotes a transition in APCs from an early to a committed state and induces a pro-inflammatory phenotype in immune cells, particularly in eWAT. In addition to SVF cells, they discovered six adipocyte subpopulations characterized by a gradient of unique gene expression signatures. Interestingly, a novel subpopulation, termed Ad6, comprised stressed and dying adipocytes with reduced transcriptional activity, primarily found in eWAT of mice on a high-fat diet. Overall, the methodology is sound, the writing is clear, and the conclusions drawn are supported by the data presented. Further research based on these findings could pave the way for potential novel interventions in obesity and metabolic disorders, or for similar studies in other tissues or conditions. 

      Strengths:  

      • The authors developed a robust snRNA-seq technique that preserves the integrity of the nucleus and RNA across various tissue types, overcoming the challenges of existing methods. 

      • They identified adipocyte subpopulations that follow adaptive or pathological trajectories during obesity. 

      • The study reveals depot-specific differences in adipose tissues, which could have implications for targeted therapies. 

      Weaknesses: 

      • The adipose tissues were collected after 10 weeks of high-fat diet treatment, lacking the intermediate time points for identifying early markers or cell populations during the transition from healthy to pathological adipose tissue. 

      We agree with the reviewers regarding the limitations of our study. To address the reviewer’s comment, we revised the manuscript to include this in the Discussion section (page 17).  

      • The expansion of the Ad6 subpopulation in obese iWAT and gWAT is interesting. The author claims that Ad6 exhibited a substantial increase in eWAT and a moderate rise in iWAT (Figure 4C). However, this adipocyte subpopulation remains the most altered in iWAT upon obesity. Could the authors elaborate on why there is a scarcity of adipocytes with ROS reporter and B2M in obese iWAT?

      We observed an increase in the levels of H2DCFA reporter and B2M protein fluorescence in adipocytes from iWAT of HFD-fed mice, although this increase was much less compared to eWAT, as shown in Figure 6B (left panel). These increases in iWAT were not sufficient for most cells to exceed the cutoff values used to determine H2DCFA and B2M positivity in adipocytes during quantitative analysis. We have revised the manuscript to clarify these results (page 13).
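
      The distinction drawn above, between a shift in mean fluorescence and the fraction of cells scored as positive, can be illustrated with a small sketch. All intensity values and the cutoff below are hypothetical numbers chosen for illustration and are not data from the study; `fraction_positive` is likewise a hypothetical helper name.

```python
from statistics import mean


def fraction_positive(intensities, cutoff):
    """Fraction of cells whose fluorescence intensity exceeds the cutoff."""
    return sum(value > cutoff for value in intensities) / len(intensities)


# Hypothetical per-cell intensities (arbitrary units) and cutoff.
CUTOFF = 20.0
lean_iwat = [10.0, 12.0, 11.0, 9.0]
hfd_iwat = [15.0, 18.0, 16.0, 22.0]
hfd_ewat = [30.0, 45.0, 18.0, 50.0]

summary = {
    name: (mean(values), fraction_positive(values, CUTOFF))
    for name, values in [
        ("lean iWAT", lean_iwat),
        ("HFD iWAT", hfd_iwat),
        ("HFD eWAT", hfd_ewat),
    ]
}
```

      In this toy example the mean intensity of the HFD iWAT group rises well above the lean group, yet only one cell in four crosses the cutoff, whereas most HFD eWAT cells do. A thresholded count can therefore stay low even when the underlying signal has clearly increased.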

      • While the study provides extensive data on mouse models, the potential translation of these findings to human obesity remains uncertain. 

      To address the reviewer’s point, we expanded our discussion on the differences in adipocyte heterogeneity between mice and humans. We attempted to identify human adipocyte subclusters that resemble the metabolically unhealthy Ad6 adipocytes found in mice in our study; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible sub cell types across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency in adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested points to address: 

      (1) The authors suggest that their improved protocol for maintaining RNA/nucleus integrity results in a more comprehensive analysis of adipose tissue heterogeneity. The authors compare the quality of their snRNA-seq data to those generated in prior studies (e.g., Savari et al.). What is not clear is whether additional heterogeneity/clusters can be observed due directly to the protocol modifications. A direct head-to-head comparison of the protocols executed in parallel would of course be ideal; however, integrating their new dataset with the corresponding data from Savari et al. could help address this question and help readers understand the benefits of this new protocol vs. existing protocols. 

      The data from Savari et al. are of significantly lower quality, likely because they were generated using earlier versions of the 10X Genomics system, and this study lacks iWAT data. To address the reviewer’s point, we instead integrated our data with those from the other study by Emont et al. (2022), which used comparable tissue types and experimental systems. The integrated analysis confirmed the improved representation of all cell types present in adipose tissues in our study, with higher quality metrics such as increased Unique Molecular Identifiers (UMIs) and the number of genes per nucleus. These results indicate that our protocol offers significant advantages in generating a more accurate representation of each cell type and their gene expression profiles. New data are included in Figure S2 (page 7).
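
      The two quality metrics named above, UMIs per nucleus and genes detected per nucleus, can be illustrated with a minimal sketch. It assumes a small dense count matrix with one row per nucleus and one column per gene; the function names and the toy numbers are hypothetical and are not taken from either dataset or from the authors' pipeline.

```python
from statistics import median

def per_nucleus_qc(counts):
    """Return (UMIs per nucleus, genes detected per nucleus).

    counts: list of rows, one row per nucleus, each holding UMI counts per gene.
    """
    umis = [sum(row) for row in counts]                       # total UMIs in each nucleus
    genes = [sum(1 for c in row if c > 0) for row in counts]  # genes with at least one UMI
    return umis, genes

def summarize(counts):
    """Median of both metrics, a common way to compare datasets."""
    umis, genes = per_nucleus_qc(counts)
    return {"median_umis": median(umis), "median_genes": median(genes)}

# Toy matrices (hypothetical numbers, for illustration only).
dataset_a = [[5, 0, 2, 1], [3, 1, 0, 0], [8, 2, 2, 1]]
dataset_b = [[1, 0, 0, 0], [2, 0, 1, 0], [0, 0, 1, 0]]

print(summarize(dataset_a))
print(summarize(dataset_b))
```

      In practice these values would be computed on sparse matrices with a dedicated single-cell toolkit; the sketch only makes the definitions of the two metrics explicit.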

      (2) The exact frequency of the Ad6 population in eWAT of mice maintained on HFD is a little unclear. From the snRNA-seq data, it appears that roughly 47% of the adipocytes are in this "stressed state." In Figure 6, it appears that greater than 75% of the adipocytes express B2M (Ad6 marker) and greater than 75% of adipocytes are suggested to be devoid of measurable PPARg expression. The latter seems quite high as PPARg expression is essential to maintain the adipocyte phenotype. Is there evidence of de-differentiation amongst them (i.e. acquisition of progenitor cell markers)? Presenting separate UMAPs for the chow vs. HFD state may help visualize the frequency of each adipocyte population in the two states. Inclusion of the stromal/progenitor cells in the visualization may help understand if cells are de-differentiating in obesity as previously postulated by the authors. Related to Point # 1 above, is this population observed in prior studies and at a similar frequency?

      To address the reviewer’s point, we analyzed the expression of adipocyte progenitor cell (APC) markers, such as Pdgfra, in the Ad6 population. We did not detect significant expression of APC markers, suggesting that Ad6 does not represent dedifferentiating adipocytes. Instead, they are likely stressed and dying cells characterized by an aberrant state of transcription with a global decline.

      When integrating our data with the datasets by Emont et al., we observed an adipocyte population in the previous study, mAd3, comparable to Ad6 in our study, with similar marker gene expression and lower transcript abundance. However, the population size of mAd3 was much smaller than that of Ad6 in our data and did not show consistent population changes during obesity. This discrepancy may be due to different technical robustness; the dysfunctional cellular state of this population, with its severely compromised RNA contents, may have made it difficult to accurately capture using standard protocols in the previous study, while our protocol enabled robust and precise detection. We added new data in Figure S6I and S7 (page 14) and revised the Discussion (page 17).

      Additional points  

      (1) The authors should be cautious in describing subpopulations as "increasing" or "decreasing" in obesity as the data are presented as proportions of a parent population. A given cell population may be "relatively increased." 

      To address the reviewer's point, we revised the manuscript to clarify the "relative" changes in cell populations during obesity in the relevant sections (pages 8, 9, 10, 11, and 15).
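
      The distinction between a relative and an absolute change can be made concrete with a small worked example. The counts below are hypothetical and chosen only to show that a subpopulation's share of the parent population can double while its absolute number of cells stays the same.

```python
def proportions(counts):
    """Convert absolute cell counts into fractions of the parent population."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Hypothetical adipocyte counts per depot (not data from the study).
chow = {"Ad1": 600, "Ad6": 100, "other": 300}
hfd = {"Ad1": 200, "Ad6": 100, "other": 200}

# Ad6 has the same absolute count in both conditions,
# yet its proportion rises because other populations shrink.
print(proportions(chow)["Ad6"], proportions(hfd)["Ad6"])
```

      Here the Ad6 fraction rises from 0.1 to 0.2 purely because the other populations contract, which is why "relatively increased" is the safer description when only proportions are available.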

      (2) The authors should also be cautious in ascribing "function" to adipocyte populations based solely on their expression signatures. Statements such as those in the abstract, "...providing novel insights into the mechanisms orchestrating adipose tissue remodeling during obesity..." should probably be toned down as no such mechanism is truly demonstrated. 

      To address the reviewer's point, we revised the manuscript by removing or replacing the indicated terms or phrases with more suitable wording in the appropriate sections (pages 2, 10, 12, and 14).

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors might consider expanding a discussion on the potential implications of their findings, especially the newly identified adipocyte subpopulations and depot-specific differences for human studies. 

      To address the reviewer’s point, we attempted to identify human adipocyte subclusters that resembled our dysfunctional Ad6 adipocytes in mice; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible sub cell types across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency in adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      (2) typo: "To generate diet-induced obesity models". 

      We revised the manuscript to correct it.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors examined the hypothesis that plasma ApoM, which carries sphingosine-1-phosphate (S1P) and activates vascular S1P receptors to inhibit vascular leakage, is modulated by SGLT2 inhibitors (SGLT2i) during endotoxemia. They also propose that this mechanism is mediated by SGLT2i regulation of LRP2/megalin in the kidney and that this mechanism is critical for endotoxin-induced vascular leak and myocardial dysfunction. The hypothesis is novel and potentially exciting. However, the authors' experiments lack critical controls, lack rigor in multiple aspects, and overall do not support the conclusions.

      Thank you for these comments. We have now directly tested this hypothesis, which remains an innovative explanation of how SGLT2i can reduce vascular leak, by using proximal tubule-specific inducible megalin/Lrp2 knockout mice.

      Reviewer #2 (Public Review):

      Apolipoprotein M (ApoM) is a plasma carrier for the vascular protective lipid mediator sphingosine 1-phosphate (S1P). The plasma levels of S1P and its chaperones ApoM and albumin rapidly decline in patients with severe sepsis, but the mechanisms for such reductions and their consequences for cardiovascular health remain elusive. In this study, Ripoll and colleagues demonstrate that the sodium-glucose co-transporter inhibitor dapagliflozin (Dapa) can preserve serum ApoM levels as well as cardiac function after LPS treatment of mice with diet-induced obesity. They further provide data to suggest that Dapa preserves serum ApoM by increasing megalin-mediated reabsorption of ApoM in renal proximal tubules and that ApoM improves vascular integrity in LPS treated mice. These observations put forward a potential therapeutic approach to sustain vascular protective S1P signaling that could be relevant to other conditions of systemic inflammation where plasma levels of S1P decrease. However, although the authors are careful with their statements, the study falls short of directly implicating megalin in ApoM reabsorption and of ApoM/S1P depletion in LPS-induced cardiac dysfunction and the protective effects of Dapa.

      The observations reported in this study are exciting and potentially of broad interest. The paper is well written and concise, and the statements made are mostly supported by the data presented. However, the mechanism proposed and implied is mostly based on circumstantial evidence, and the paper could be substantially improved by directly addressing the role of megalin in ApoM reabsorption and serum ApoM and S1P levels and the importance of ApoM for the preservation of cardiac function during endotoxemia. Some observations that are not necessarily in line with the model proposed should also be discussed.

      The authors show that Dapa preserves serum ApoM and cardiac function in LPS-treated obese mice. However, the evidence they provide to suggest that ApoM may be implicated in the protective effect of Dapa on cardiac function is indirect. Direct evidence could be sought by addressing the effect of Dapa on cardiac function in LPS treated ApoM deficient and littermate control mice (with DIO if necessary).

      The authors also suggest that higher ApoM levels in mice treated with Dapa and LPS reflect increased megalin-mediated ApoM reabsorption and that this preserves S1PR signaling. This could be addressed more directly by assessing the clearance of labelled ApoM, by addressing the impact of megalin inhibition or deficiency on ApoM clearance in this context, and by measuring S1P as well as ApoM in serum samples.

      Methods: More details should be provided in the manuscript for how ApoM deficient and transgenic mice were generated, on sex and strain background, and on whether or not littermate controls were used. For intravital microscopy, more precision is needed on how vessel borders were outlined and if this was done with or without regard for FITC-dextran. Please also specify the type of vessel chosen and considerations made with regard to blood flow and patency of the vessels analyzed. For statistical analyses, data from each mouse should be pooled before performing statistical comparisons. The criteria used for choice of test should be outlined as different statistical tests are used for similar datasets. For all data, please be consistent in the use of post-tests and in the presentation of comparisons. In other words, if the authors choose to only display test results for groups that are significantly different, this should be done in all cases. And if comparisons are made between all groups, this should be done in all cases for similar sets of data.

      Thank you for these comments. We have now tested the direct role of Lrp2 with respect to SGLT2i in vivo and in vitro, and our study now shows that Lrp2 is required for the effect of dapagliflozin on ApoM. ApoM deficient and transgenic mice were previously described and published by our group (PMID: 37034289) and others (PMID: 24318881), and littermate controls were used throughout our manuscript. We agree that the effect on cardiac function is likely indirect in these models, and as yet we do not have the tools in the LPS model to separate potential endothelial protective vs cardiac effects. In addition, since the ApoM knockout has multiple abnormalities that include hypertension, secondary cardiac hypertrophy, and an adipose/browning phenotype, all of which may influence its response to Dapa in terms of cardiac function, these studies will be challenging to perform and will require additional models that are beyond the scope of this manuscript.

      For intravital microscopy, vessel borders were outlined blindly without regard for FITC-dextran. We believe it is important to show multiple blood vessels per mouse since, as the reviewer points out, there is quite a bit of vessel heterogeneity. These tests were performed in the collaborator’s laboratory. Data analysis was blinded, and the collaborator was unaware of the study hypothesis at the time the measurements were performed and analyzed. They have previously reported that this is a valid method to show cremaster vessel permeability (PMID: 26839042).

      We have updated our methods section and the figure legends to clearly indicate the statistical analyses we used. For two-group comparisons we used Student’s t-test. For comparisons among multiple groups, one-way ANOVA with Sidak's correction for multiple comparisons was used throughout the paper when the data were normally distributed, and the Kruskal-Wallis test was used when the data were not normally distributed.

      Reviewer #3 (Public Review):

      The authors have performed well-designed experiments that elucidate the protective role of Dapa in an LPS model of sepsis. This model shows that Dapa works, in part, by increasing expression of the receptor LRP2 in the kidney, which maintains circulating ApoM levels. ApoM binds to S1P, which then interacts with the S1P receptor, stimulating cardiac function and epithelial and endothelial barrier function, thereby maintaining intravascular volume and cardiac output in the setting of severe inflammation. The authors used many experimental models, including transgenic mice, as well as several rigorous and reproducible techniques to measure the relevant parameters of cardiac, renal, vascular, and immune function. Furthermore, they employ a useful inhibitor of S1P function to show pharmacologically the essential role for this agonist in most but not all the benefits of Dapa. A strength of the paper is the identification of the pathway responsible for the cardioprotective effects of SGLT2is that may yield additional therapeutic targets. There are some weaknesses in the paper, such as studying only male mice and not providing a power analysis to justify the number of animals used throughout their experimentation. Overall, the paper should have a significant impact on the scientific community because the SGLT2i drugs are likely to find many uses in inflammatory diseases and metabolic diseases. This paper provides support for an important mechanism by which they work in conditions of severe sepsis and hemodynamic compromise.

      Thank you for these comments.

    1. Author response:

      Reviewer #1 (Public Review):

      This paper proposes a novel framework for explaining patterns of generalization of force field learning to novel limb configurations. The paper considers three potential coordinate systems: cartesian, joint-based, and object-based. The authors propose a model in which the forces predicted under these different coordinate frames are combined according to the expected variability of produced forces. The authors show, across a range of changes in arm configurations, that the generalization of a specific force field is quite well accounted for by the model.

      The paper is well-written and the experimental data are very clear. The patterns of generalization exhibited by participants - the key aspect of the behavior that the model seeks to explain - are clear and consistent across participants. The paper clearly illustrates the importance of considering multiple coordinate frames for generalization, building on previous work by Berniker and colleagues (JNeurophys, 2014). The specific model proposed in this paper is parsimonious, but there remain a number of questions about its conceptual premises and the extent to which its predictions improve upon alternative models.

      A major concern is with the model's premise. It is loosely inspired by cue integration theory but is really proposed in a fairly ad hoc manner, and not really concretely founded on firm underlying principles. It's by no means clear that the logic from cue integration can be extrapolated to the case of combining different possible patterns of generalization. I think there may in fact be a fundamental problem in treating this control problem as a cue-integration problem. In classic cue integration theory, the various cues are assumed to be independent observations of a single underlying variable. In this generalization setting, however, the different generalization patterns are NOT independent; if one is true, then the others must inevitably not be. For this reason, I don't believe that the proposed model can really be thought of as a normative or rational model (hence why I describe it as 'ad hoc'). That's not to say it may not ultimately be correct, but I think the conceptual justification for the model needs to be laid out much more clearly, rather than simply by alluding to cue-integration theory and using terms like 'reliability' throughout.

      We thank the reviewer for bringing up this point. We see and treat this problem of finding the combination weights not as a cue integration problem but as an inverse optimal control problem. In this case, there can be several solutions to the same problem, i.e., what forces are expected in untrained areas, which can co-exist and give the motor system the option to switch or combine them. This is similar to other inverse optimal control problems, e.g. combining feedforward optimal control models to explain simple reaching. However, compared to these problems, which fit the weights between different models, we proposed an explanation for the underlying principle that sets these weights for the dynamics representation problem. We found that basing the combination on each motor plan's reliability can best explain the results. In this case, we refer to ‘reliability’ as execution reliability and not sensory reliability, which is common in cue integration theory. We have added further details explaining this in the manuscript.

      “We hypothesize that this inconsistency in results can be explained using a framework inspired by an inverse optimal control framework. In this framework the motor system can switch between or combine different solutions. That is, the motor system assigns different weights to each solution and calculates a weighted sum of these solutions. Usually, to support such a framework, previous studies found the weights by fitting the weighted sum solution to behavioral data (Berret, Chiovetto et al. 2011). While we treat the problem in the same manner, we propose the Reliable Dynamics Representation (Re-Dyn) mechanism that determines the weights instead of fitting them. According to our framework, the weights are calculated by considering the reliability of each representation during dynamic generalization. That is, the motor system prefers certain representations if the execution of forces based on this representation is more robust to distortion arising from neural noise. In this process, the motor system estimates the difference between the desired generalized forces and generated generalized forces while taking into consideration noise added to the state variables that equivalently define the forces.”

      A more rational model might be based on Bayesian decision theory. Under such a model, the motor system would select motor commands that minimize some expected loss, averaging over the various possible underlying 'true' coordinate systems in which to generalize. It's not entirely clear without developing the theory a bit exactly how the proposed noise-based theory might deviate from such a Bayesian model. But the paper should more clearly explain the principles/assumptions of the proposed noise-based model and should emphasize how the model parallels (or deviates from) Bayesian-decision-theory-type models.

      As we understand the reviewer's suggestion, the idea is to estimate the weight of each coordinate system based on minimizing a loss function that considers the cost of each weight multiplied by a posterior probability that represents the uncertainty in this weight value. While this is an interesting idea, we believe that in the current problem, there are no ‘true’ weight values. That is, the motor system can use any combination of weights which will be true due to the ambiguous nature of the environment. Since the force field was presented in one area of the entire workspace, there is no observation that will allow us to update prior beliefs regarding the force nature of the environment. In such a case, the prior beliefs might play a role in the loss function, but in our opinion, there is no clear rationale for choosing unequal priors except guessing or fitting prior probabilities, which will resemble any other previous models that used fitting rather than predictions.

      Another significant weakness is that it's not clear how closely the weighting of the different coordinate frames needs to match the model predictions in order to recover the observed generalization patterns. Given that the weighting for a given movement direction is over-parametrized (i.e., there are 3 variable weights (allowing for decay) predicting a single observed force level), it seems that a broad range of models could generate a reasonable prediction. It would be helpful to compare the predictions using the weighting suggested by the model with the predictions using alternative weightings, e.g. a uniform weighting, or the weighting for a different posture. In fact, Fig. 7 shows that uniform weighting accounts for the data just as well as the noise-based model in which the weighting varies substantially across directions. A more comprehensive analysis comparing the proposed noise-based weightings to alternative weightings would be helpful to more convincingly argue for the specificity of the noise-based predictions being necessary. The analysis in the appendix was not that clearly described; it seemed to compare various potential fitted mixtures of coordinate frames, but did not compare these to the noise-based model predictions.

      We agree with the reviewer that fitted global weights, that is, an optimal weighted average of the three coordinate systems, should outperform most of the models that are based on prediction instead of fitting the data. As we showed in Figure 7 of the submitted version of the manuscript, we used the optimal fitted model to show that our noise-based model is indeed not optimal but can predict the behavioral results and not fall too short of a fitted model. When trying to fit a model across all the reported experiments, we indeed found a set of values that gives equal weights for the joint and object coordinate systems (0.27 for both), and a lower value for the Cartesian coordinate system (0.12). Considering these values, we indeed see how the reviewer can suggest a model that is based on equal weights across all coordinate systems. While this model will not perform as well as the fitted model, it can still generate satisfactory results.

      To better understand if a model based on global weights can explain the combination between coordinate systems, we perform an additional experiment. In this experiment, a model that is based on global fitted weights can only predict one out of two possible generalization patterns while models that are based on individual direction-predicted weights can predict a variety of generalization patterns. We show that global weights, although fitted to the data, cannot explain participants' behavior. We report these new results in Appendix 2.

      “To better understand if a model based on global weights can explain the combination between coordinate systems, we perform an additional experiment. We used the idea of experiment 3 in which participants generalize learned dynamics using a tool. That is, the arm posture does not change between the training and test areas. In such a case, the Cartesian and joint coordinate systems do not predict a shift in generalized force pattern while the object coordinate system predicts a shift that depends on the orientation of the tool. In this additional experiment, we set a test workspace in which the orientation of the tool is 90° (Appendix 2- figure 1A). In this case, for the test workspace, the force compensation pattern of the object based coordinate system is in anti-phase with the Cartesian/joint generalization pattern. Any globally fitted weights (including equal weights) can produce either a non-shifted or 90° shifted force compensation pattern (Appendix 2- figure 1B). Participants in this experiment (n=7) showed similar MPE reduction as in all previous experiments when adapting to the trigonometric scaled force field (Appendix 2- figure 1C). When examining the generalized force compensation patterns, we observed a shift of the pattern in the test workspace of 14.6° (Appendix 2- figure 1D). This cannot be explained by the individual coordinate system force compensation patterns or any combination of them (which will always predict either a 0° or 90° shift, Appendix 2- figure 1E). However, calculating the prediction of the Re-Dyn model we found a predicted force compensation pattern with a shift of 6.4° (Appendix 2- figure 1F). The intermediate shift in the force compensation pattern suggests that any global based weights cannot explain the results.”
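
      The claim that any global weighting yields either a non-shifted or a 90° shifted pattern can be checked with a short numerical sketch. It assumes, purely for illustration, that the force compensation profile varies as cos(2θ) with movement direction θ, so that a 90° tool rotation puts the object-based profile in anti-phase; the exact profile used in the study may differ, and all function names are hypothetical. The Cartesian and joint weights are lumped into a single "unshifted" weight because both predict no shift in this design.

```python
import math

def combined_pattern(theta_deg, w_unshifted, w_object, tool_shift_deg=90.0):
    """Globally weighted sum of an unshifted profile and the object-based profile.

    Assumed profile: cos(2*theta). The object-based profile is the same
    function shifted by the tool orientation (90 degrees gives anti-phase).
    """
    base = math.cos(math.radians(2 * theta_deg))
    shifted = math.cos(math.radians(2 * (theta_deg - tool_shift_deg)))
    return w_unshifted * base + w_object * shifted

def pattern_shift_deg(w_unshifted, w_object, n_dirs=360):
    """Estimate the shift of the combined pattern from its second harmonic."""
    c = s = 0.0
    for k in range(n_dirs):
        theta = 360.0 * k / n_dirs
        f = combined_pattern(theta, w_unshifted, w_object)
        c += f * math.cos(math.radians(2 * theta))
        s += f * math.sin(math.radians(2 * theta))
    # Phase of the cos(2*theta) component, halved to express it as a shift in theta.
    phase = math.degrees(math.atan2(s, c)) / 2
    return round(phase) % 180  # undefined when the two weights are exactly equal

for w_unshifted, w_object in [(0.6, 0.4), (2 / 3, 1 / 3), (0.4, 0.6)]:
    print(w_unshifted, w_object, pattern_shift_deg(w_unshifted, w_object))
```

      Under this assumption the weighted sum collapses to (w_unshifted - w_object)·cos(2θ), so only the sign can change with the weights. An intermediate shift such as the observed 14.6° therefore requires weights that vary with movement direction.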

      With regard to the suggestion that weighting is changed according to arm posture, two of our results lower the possibility that posture governs the weights:

      (1) In experiment 3, we tested generalization while keeping the same arm posture between the training and test workspaces, and we observed different force compensation profiles across the movement directions. If arm posture in the test workspaces affected the weights, we would expect identical weights for both test workspaces. However, any set of weights that can explain the results observed for workspace 1 will fail to explain the results observed in workspace 2. To better understand this point we calculated the global weights for each test workspace for this experiment and we observed an increase in the weight for the object coordinate system (0.41 vs. 0.5) and a reduction in the weights for the Cartesian and joint coordinate systems (0.29 vs. 0.24). This suggests that the arm posture cannot explain the generalization pattern in this case.

      (2) In experiments 2 and 3, we used the same arm posture in the training workspace and either changed the arm posture (experiment 2) or did not change the arm posture (experiment 3) in the test workspaces. While the arm posture for the training workspace was the same, the force generalization patterns were different between the two experiments, suggesting that the arm posture during the training phase (adaptation) does not set the generalization weights.

      Overall, this shows that it is not specifically the arm posture in either the test or the training workspaces that set the weights. Of course, all coordinate models, including our noise model, will consider posture in the determination of the weights.

      Reviewer #2 (Public Review):

      Leib & Franklin assessed how the adaptation of intersegmental dynamics of the arm generalizes to changes in different factors: areas of extrinsic space, limb configurations, and 'object-based' coordinates. Participants reached in many different directions around 360°, adapting to velocity-dependent curl fields that varied depending on the reach angle. This learning was measured via the pattern of forces expressed upon the channel wall of "error clamps" that were randomly sampled from each of these different directions. The authors employed a clever method to predict how this pattern of forces should change if the set of targets was moved around the workspace. Some sets of locations resulted in a large change in joint angles or object-based coordinates, but Cartesian coordinates were always the same. Across three separate experiments, the observed shifts in the generalized force pattern never corresponded to a change that was made relative to any one reference frame. Instead, the authors found that the observed pattern of forces could be explained by a weighted combination of the change in Cartesian, joint, and object-based coordinates across test and training contexts.

      In general, I believe the authors make a good argument for this specific mixed weighting of different contexts. I have a few questions that I hope are easily addressed.

      Movements show different biases relative to the reach direction. Although very similar across people, this function of biases shifts when the arm is moved around the workspace (Ghilardi, Gordon, and Ghez, 1995). These biases are thought to arise from several factors that would change across the different test and training workspaces employed here (Vindras & Viviani, 2005). My concern is that the baseline biases differ across these contexts and that the observed change in the force pattern across contexts reflects a change in underlying biases rather than generalization. Baseline force channel measurements were taken in the different workspace locations and conditions, so these could be used to show whether such biases are meaningfully affecting the results.

      We agree with the reviewer and we followed their suggested analysis. In the following figure (Author response image 1) we plotted the baseline force compensation profiles in each workspace for each of the four experiments. As can be seen in this figure, the baseline force compensation is very close to zero and differs significantly from the force compensation profiles after adaptation to the scaled force field.

      Author response image 1.

      Baseline force compensation levels for experiments 1-4. For each experiment, we plotted the force compensation for the training, test 1, and test 2 workspaces.

      Experiment 3, Test 1 has data that seems the worst fit with the overall story. I thought this might be an issue, but this is also the test set for a potentially awkwardly long arm. My understanding of the object-based coordinate system is that it's primarily a function of the wrist angle, or perceived angle, so I am a little confused why the length of this stick is also different across the conditions instead of just a different angle. Could the length be why this data looks a little odd?

      Usually, force generalization is tested by physically moving the hand in unexplored areas. In experiment 3 we tested generalization using a tool which, as far as we know, was not tested in the past in a similar way to the present experiment. Indeed, the results look odd compared to the results of the other experiments, which were based on the ‘classic’ generalization idea. While we have some ideas regarding possible reasons for the observed behavior, it is out of the scope of the current work and still needs further examination.

      Based on the reviewer’s comment, we improved the explanation in the introduction regarding the idea behind the object-based coordinate system:

      “we could represent the forces as belonging to the hand or a hand-held object using the orientation vector connecting the shoulder and the object or hand in space (Berniker, Franklin et al. 2014).” The reviewer is right in their observation that the predictions of the object-based reference frame will look the same if we change the length of the tool. The object-based generalized forces, specifically the shift in the force pattern, depend only on the object's orientation but not its length (equation 4).

      The manuscript is written and organized in a way that focuses heavily on the noise element of the model. Other than it being reasonable to add noise to a model, it's not clear to me that the noise is adding anything specific. It seems like the model makes predictions based on how many specific components have been rotated in the different test conditions. I fear I'm just being dense, but it would be helpful to clarify whether the noise itself (and inverse variance estimation) are critical to why the model weights each reference frame how it does or whether this is just a method for scaling the weight by how much the joints or whatever have changed. It seems clear that this noise model is better than weighting by energy and smoothness.

      We have now included further details of the noise model and added to Figure 1 to highlight how noise can affect the predicted weights. In short, we agree with the reviewer that there are multiple ways to add noise to the generalized force patterns. We chose a simple option in which we simulate possible distortions to the state variables that set the direction of movement. Once we calculated the variance of the force profiles due to this distortion, one possible way to combine them is to use an inverse variance estimator. Note that it has been shown that an inverse variance estimator is an ideal way to combine signals (e.g., Shahar, D.J. (2017) https://doi.org/10.4236/ojs.2017.72017). However, we do not claim or try to provide evidence for this specific way of calculating the weights. Instead, we suggest that giving greater weight to the less variable force representation can predict both the current experimental results and past results.
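
      For illustration only, the inverse variance weighting described above can be sketched as follows. This is a minimal sketch and not the analysis code used in the study: the function names, the three scalar force predictions, and the variance values are hypothetical, and the predictions are treated as scalars rather than full force profiles.

```python
def inverse_variance_weights(variances):
    """Return weights proportional to 1/variance, normalized to sum to 1."""
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    inverse = [1.0 / v for v in variances]
    total = sum(inverse)
    return [x / total for x in inverse]


def combine(estimates, variances):
    """Inverse-variance weighted average of scalar estimates."""
    weights = inverse_variance_weights(variances)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical values (arbitrary units): one force prediction per
# representation (Cartesian, joint, object) and its simulated variance.
predictions = [1.0, 0.2, -0.5]
variances = [0.5, 0.25, 1.0]

weights = inverse_variance_weights(variances)
combined = combine(predictions, variances)
```

      In this toy example the representation with the smallest variance (the second one) receives the largest weight, which is the qualitative property the response relies on.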

      Are there any force profiles for individual directions that are predicted to change shape substantially across some of these assorted changes in training and test locations (rather than merely being scaled)? If so, this might provide another test of the hypotheses.

      In experiments 1-3, in which there is a large shift of the force compensation curve, we found directions in which the generalized force was flipped in direction. That is, clockwise force profiles in the training workspace could change into counterclockwise profiles in the test workspace. For example, in experiment 2, for movement at 157.5° the force profile was clockwise for the training workspace (with a force compensation value of 0.43), whereas movement in the same direction was counterclockwise for test workspace 1 (force compensation equal to -0.48). Importantly, we found that the noise-based model could predict this change.

      Author response image 2.

      Results of experiment 2. Force compensation profiles for the training workspace (grey solid line) and test workspace 1 (dark blue solid line). Examining the nature of the force for the 157.5° direction, we found a change in the force applied by the participants (from clockwise to counterclockwise forces). This was supported by a change in force compensation value (0.43 vs. -0.48). The noise-based model can predict this change, as shown by the predicted force compensation profile (green dashed line).

      I don't believe the decay factor that was used to scale the test functions was specified in the text, although I may have just missed this. It would be a good idea to state what this factor is where relevant in the text.

      We added an equation describing the decay factor (new equation 7 in the Methods section) according to this suggestion and Reviewer 1's comment on the same issue.

      Reviewer #3 (Public Review):

      The author proposed the minimum variance principle in the memory representation in addition to two alternative theories of the minimum energy and the maximum smoothness. The strength of this paper is the matching between the prediction data computed from the explicit equation and the behavioral data taken in different conditions. The idea of the weighting of multiple coordinate systems is novel and is also able to reconcile a debate in previous literature.

      The weakness is that, although each model is based on an optimization principle, the derivation process is not written in the method section. The authors did not write about how they can derive these weighting factors from these computational principles. Thus, it is not clear whether these weighting factors are relevant to these theories or just hacking methods. Suppose the author argues that this is the result of the minimum variance principle. In that case, the authors should show a process of how to derive these weighting factors as a result of the optimization process to minimize these cost functions.

      The reviewer brings up a very important point regarding the model. As shown below, it is not trivial to derive these weights using an analytical optimization process. We demonstrate one issue with this optimization process.

      The force representation can be written as (similar to equation 6):

      We formulated the problem as minimizing the variance of the force according to the weights w:

      In this case, the variance of the force is the variance-covariance matrix which can be minimized by minimizing the matrix trace:

      We will start by calculating the variance of the force representation in the joint coordinate system:

      Here, the force variance is the result of a complex function that includes the joint angles as random variables. Expanding the last expression, although very complex, is still possible. In the resulting expression, some of the terms require calculating the variance of nested trigonometric functions of the random joint angles, for example:

      In the vast majority of these cases, analytical solutions do not exist. Similar issues can also arise when calculating the variance of complex products of trigonometric functions, such as in the case of multiplication of Jacobians (and inverse Jacobians).

      To overcome this problem, we turned to numerical solutions which simulate the variance due to the different state variables.
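
      As a hedged illustration of what such a numerical approach can look like, the sketch below estimates by Monte Carlo sampling the variance of a nested trigonometric function of a noisy angle. It is not the simulation used in the study: the function `nested_trig`, the Gaussian noise model, the 45° mean angle, and the noise levels are all assumptions chosen only to show the idea.

```python
import math
import random
import statistics


def simulate_variance(func, mean_angle, noise_sd, n_samples=20000, seed=0):
    """Monte Carlo estimate of Var[func(theta)] for theta ~ N(mean_angle, noise_sd^2)."""
    rng = random.Random(seed)
    samples = [func(rng.gauss(mean_angle, noise_sd)) for _ in range(n_samples)]
    return statistics.variance(samples)


def nested_trig(theta):
    # Hypothetical nested term; it stands in for the kind of expression
    # whose variance has no simple closed form.
    return math.cos(math.sin(theta))


# Same mean joint angle, two assumed levels of angular noise (in radians).
var_small = simulate_variance(nested_trig, math.radians(45), math.radians(2))
var_large = simulate_variance(nested_trig, math.radians(45), math.radians(10))
```

      Larger noise on the state variable yields a larger simulated variance of the output, and it is this simulated variance that would then feed a weighting scheme such as the inverse variance estimator.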

      In addition, I am concerned that the proposed model can cancel the property of the coordinate system by the predicted variance, and it can work for any coordinate system, even one that is not used in the human brain. When the applied force is given in Cartesian coordinates, the directionality in the generalization ability of the memory of the force field is characterized by the kinematic relationship (Jacobian) between the Cartesian coordinate and the coordinate of interest (Cartesian, joint, and object) as shown in Equation 3. At the same time, when a displacement (epsilon) is considered in a space and a corresponding displacement is linked with kinematic equations (e.g., joint displacement and hand displacement in 2 joint arms in this paper), the generated variances in different coordinate systems are linked to each other by the kinematic equation (Jacobian). Thus, how a small noise in a certain coordinate system generates the hand force noise (sigma_x, sigma_j, sigma_o) is also characterized by the kinematics (Jacobian). Thus, when the predicted forcefield (F_c, F_j, F_o) was divided by the variance (F_c/sigma_c^2, F_j/sigma_j^2, F_o/sigma_o^2), the directionality of the generalization force which is characterized by the Jacobian is canceled by the directionality of the sigmas which is characterized by the Jacobian. Thus, as it has been read out from Fig*D and E top, the weight in E-top of each coordinate system is always the inverse of the shift of force from the test force by which the directionality of the generalization is always canceled.

      Once this directionality is canceled, no matter how the weighted sum is computed, it can replicate the memorized force. Thus, this model always works to replicate the test force no matter which coordinate system is assumed. Thus, I am suspicious of the falsifiability of this computational model. This model is always true no matter which coordinate system is assumed. Even if they use, for instance, the robot coordinate system, which is directly linked to the participant's hand with the kinematic equation (Jacobian), they can replicate this result. But in this case, the model would be nonsense. The falsifiability of this model was not explicitly written.

      As explained above, calculating the variability of the generalized forces given the random nature of the state variable is a complex function that is not summarized using a Jacobian. Importantly, the model is unable to reproduce or replicate the test force arbitrarily. In fact, we have already shown this (see Appendix 1- figure 1): when we attempt to explain the data with only a single coordinate system (or a combination of two coordinate systems), we are completely unable to replicate the test data despite using this model. For example, in experiment 4, when we don’t use the joint-based coordinate system, the model predicts zero shift of the force compensation pattern while the behavioral data show a shift due to the contribution of the joint coordinate system. Any arbitrary model (similar to the random model we tested, please see the response to Reviewer 1) would be completely unable to recreate the test data. Our model instead makes very specific predictions about the weighting between the three coordinate systems and therefore completely specified force predictions for every possible test posture. We added this point to the Discussion:

      “The results we present here support the idea that the motor system can use multiple representations during adaptation to novel dynamics. Specifically, we suggested that we combine three types of coordinate systems, where each is independent of the other (see Appendix 1- figure 1 for comparison with other combinations). Other combinations that include a single or two coordinate system can explain some of the results but not all of them, suggesting that force representation relies on all three with specific weights that change between generalization scenarios.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewers 1 and 2: concern about transcriptional changes in endothelial cells (ECs) in culture.

      We have now addressed this concern by FACS-sorting ECs (Fig. 7A revised) and comparing our data with previous studies (S. Fig. 1C). Our major claim was the epigenetic repression of EC genes, including those involved in BBB formation and angiogenesis, during later development. To further strengthen our claim, we knocked out HDAC2 during the later stages of development to prevent this epigenetic repression. As shown in the first version of the manuscript, this knockout results in enhanced angiogenesis and a leaky BBB.

      In the revised version, we have FACS-sorted CD31+ ECs from E-17.5 WT and HDAC2 ECKO mice, followed by ultra-low mRNA sequencing. Confirming the epigenetic repression via HDAC2, the HDAC2-deleted ECs showed high expression of BBB genes such as ZO-1, OCLN, MFSD2A, and GLUT1, and activation of the Wnt signaling pathway as indicated by the upregulation of Wnt target genes such as Axin2 and APCDD1. Additionally, to validate the increased angiogenesis phenotype observed, angiogenesis-related genes such as VEGFA, FLT1, and ENG were upregulated.

      Since the transcriptomics of brain ECs during developmental stages has already been published in Hupe et al., 2017, we did not attempt to replicate this. However, we compared our differentially regulated genes from E-13.5 versus adult stages with the transcriptome changes during development reported by Hupe et al., 2017. We found a significant overlap in important genes such as CLDN5, LEF1, ZIC3, and MFSD2A (S. Fig. 1C).

      As pointed out by the reviewer, culture-induced changes cannot be ruled out from our data. We have included a statement in the manuscript: "Even though we used similar culture conditions for both embryonic and adult cortical ECs, culture-induced changes have been reported previously and should be considered as a varying factor when interpreting our results."

      Reviewer-1 Comment 2- An additional concern is that for many experiments, siRNA knockdowns are performed without validation of the efficacy of the knockdown.

      We have now provided the protein expression data for HDAC2 and EZH2 in the revised manuscript Supplementary Figure- 2A.

      Reviewer-1 Comment 3- Some experiments in the paper are promising, however. For example, the knockout of HDAC2 in endothelial cells resulting in BBB leakage was striking. Investigating the mechanisms underlying this phenotype in vivo could yield important insights.

      We appreciate your positive comment. The in vivo HDAC2 knockout experiment serves as a validation of our in vitro findings, demonstrating that the epigenetic regulator HDAC2 can control the expression of endothelial cell (EC) genes involved in angiogenesis, blood-brain barrier (BBB) formation, and maturation. To investigate the mechanism behind the underlying phenotype of HDAC2 ECKO, we performed mRNA sequencing on HDAC2 ECKO E-17.5 ECs and discovered that vascular and BBB maturation is hindered by preventing the epigenetic repression of BBB, angiogenesis, and Wnt target genes (Fig. 7A). As a result, the HDAC2 ECKO phenotype showed increased angiogenesis and BBB leakage. This strengthens our hypothesis that HDAC2-mediated epigenetic repression is critical for BBB and vascular maturation.

      Reviewer 2 Comment-2 The use of qPCR assays for quantifying ChIP and transcript levels is inferior to ChIPseq and RNAseq. Whole genome methods, such as ChIPseq, permit a level of quality assessment that is not possible with qPCR methods. The authors should use whole genome NextGen sequencing approaches, show the alignment of reads to the genome from replicate experiments, and quantitatively analyze the technical quality of the data.

      We appreciate the reviewer's comment. While whole-genome methods like ChIP-seq offer comprehensive and high-throughput data, ChIP-qPCR assays remain valuable tools due to their sensitivity, specificity, and suitability for validation and targeted analysis. Our ChIP analysis identifies the crucial roles of HDAC2 and PRC2, two epigenetic enzymes, in CNS endothelial cells (ECs). In vivo data presented in Figure 4 further support this finding through observed phenotypic differences. We concur that a comprehensive analysis of HDAC2 and PRC2 target genes in ECs is essential. This analysis is currently underway and will be the subject of a separate publication due to the extensive nature of the data.

      Reviewer 2 Comment-3 Third, the observation that pharmacologic inhibitor experiments and conditional KO experiments targeting HDAC2 and the Polycomb complex perturb EC gene expression or BBB integrity, respectively, is not particularly surprising as these proteins have broad roles in epigenetic regulation in a wide variety of cell types.

      We appreciate the comments from the reviewers. Our results provide valuable insights into the specific epigenetic mechanisms that regulate BBB genes. It is important to recognize that different cell types possess distinct, stage-specific epigenetic landscapes and regulatory mechanisms. Rather than having broad roles across diverse cell types, it is more likely that HDAC2 (even though there are several other classes and subtypes of HDACs) and the Polycomb complex exhibit specific functions within the context of EC gene expression or BBB integrity.

      Moreover, the significance of our findings is enhanced by the fact that epigenetic modifications are often reversible with the assistance of epigenetic regulators. This makes them promising targets for BBB modulation. Targeting epigenetic regulators can have a widespread impact, as these mechanisms regulate numerous genes that collectively have the potential to promote vascular repair.

      A practical advantage is that FDA-approved HDAC2 inhibitors, as well as PRC2 inhibitors (such as those mentioned in clinical trials NCT03211988 and NCT02601950), are already available. This facilitates the repurposing of drugs and expedites their potential for clinical translation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists;

      and (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different than the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity (a chi-squared test) and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the literature (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, as used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, as used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC respectively. Moreover, the influence of response variability and biases in response distribution estimation due to limited sampling has not usually been accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth at DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but instead is an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, these percentages, in our small sample (n = 4), were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney; see Author response image 1 below). In consequence, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the neurons reaching significance in these ANOVA tests were indeed meaningfully sensitive to azimuth or whether this was due to chance.

      Author response image 1.

      Percentage of the neuropixels recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).

      We reasoned that the markedly variable responses observed from DCIC units, which frequently failed to respond in many trials (Fig. 3D, 4A), in combination with the limited number of trial repetitions we could collect, result in under-sampled response distribution estimations. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency of the collected responses on stimulus azimuth. To do this, we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which constitutes an advantage of the Chi-square approach, but we note the important consideration that these response categories are arbitrary.
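
      To make the binning step concrete, here is a minimal, purely illustrative sketch of a Chi-square test of dependence between stimulus azimuth and binned responses. It is not the analysis code from the study: the bin edges, the toy response values, and the use of a permutation p-value (instead of the Chi-square distribution) are assumptions made to keep the example self-contained.

```python
import random
from collections import Counter


def bin_response(value, edges):
    """Return the index of the response category; edges are ascending cut points."""
    return sum(value >= e for e in edges)


def chi_square_statistic(labels, categories):
    """Pearson chi-square statistic for the labels x categories contingency table."""
    n = len(labels)
    observed = Counter(zip(labels, categories))
    row_totals = Counter(labels)
    col_totals = Counter(categories)
    stat = 0.0
    for r, r_n in row_totals.items():
        for c, c_n in col_totals.items():
            expected = r_n * c_n / n
            stat += (observed.get((r, c), 0) - expected) ** 2 / expected
    return stat


def permutation_p_value(labels, categories, n_perm=2000, seed=0):
    """P-value from shuffling the categories relative to the stimulus labels."""
    rng = random.Random(seed)
    observed = chi_square_statistic(labels, categories)
    shuffled = list(categories)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_statistic(labels, shuffled) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Hypothetical toy data: 10 trials at each of three azimuths (degrees),
# with made-up response magnitudes for a single unit.
azimuths = [-90] * 10 + [0] * 10 + [90] * 10
responses = ([0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
             + [0, 1, 0, 2, 1, 0, 1, 0, 2, 1]
             + [3, 2, 4, 3, 0, 5, 3, 2, 4, 3])
edges = [1, 3]  # arbitrary categories: no response, weak, strong
categories = [bin_response(r, edges) for r in responses]

stat, p_value = permutation_p_value(azimuths, categories)
```

      Changing `edges` changes the contingency table and can change the outcome, which is the arbitrariness of the response categories noted above.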

      Altogether, we acknowledge that our Chi-square approach to define azimuth sensitivity is not free of limitations and despite enabling the interrogation of azimuth sensitivity at DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, which is a very different context to our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore we did not perform analyses or interpretations of our results based on the behavioral task from Lauer et al. (2011) and only made the qualitative observation that the errors match for discussion.

      We believe it is also important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated with a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated with an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, making further analyses to compare the experimental design of Lauer et al. (2011) and ours even more challenging for valid interpretation.

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, due to the differences the reviewer precisely describes in the encoding of azimuth at CNIC where a much larger majority of neurons show stronger azimuth sensitivity and response reliability with respect to our observations at DCIC. By this qualitative similarity of encoding format we specifically mean the similar occurrence of activity patterns from azimuth sensitive subpopulations of neurons in both CNIC and DCIC, which carry sufficient information about the stimulus azimuth for a sufficiently accurate prediction with regard to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that performance of their decoder was higher when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”
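      The comparison between simultaneously acquired and decorrelated responses can be illustrated with a minimal sketch. It assumes the simplest form of decorrelation, the trial scrambling the reviewer describes: within each azimuth class, the trial order is permuted independently for every unit. The "modeled decorrelated datasets" in the manuscript may have been generated differently. The function name, the trials-by-units list layout and the seed parameter are all hypothetical.

```python
import random


def shuffle_trials_within_class(responses, labels, seed=0):
    """Scramble trial order per unit within each stimulus class.

    responses: list of trials, each a list with one response value per unit.
    labels:    stimulus class (e.g. azimuth in degrees) of each trial.

    Each unit keeps exactly the same set of responses for each class, so its
    single-unit tuning is preserved, but the pairing of responses across units
    on a given trial (the source of noise correlations) is broken.
    """
    rng = random.Random(seed)
    n_units = len(responses[0])
    shuffled = [list(trial) for trial in responses]  # do not mutate the input
    for cls in sorted(set(labels)):
        trial_idx = [t for t, lab in enumerate(labels) if lab == cls]
        for unit in range(n_units):
            values = [responses[t][unit] for t in trial_idx]
            rng.shuffle(values)  # independent permutation for every unit
            for t, value in zip(trial_idx, values):
                shuffled[t][unit] = value
    return shuffled


# Toy example: two units whose responses co-vary perfectly within each class.
toy_responses = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
toy_labels = [0, 0, 0, 90, 90, 90]
decorrelated = shuffle_trials_within_class(toy_responses, toy_labels, seed=1)
```

      Under the null hypothesis that noise correlations carry no azimuth information, a decoder that assumes independence should perform equally on `toy_responses` and `decorrelated`, because the per-unit, per-class response distributions are identical in both.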

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80kHz) was 44.53 dB and for neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences in the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting, because it is also known to be an auditory structure to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provide an additional layer of complexity in the processing and analysis of such datasets. This will be undoubtedly useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to imagine how many informative such units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain.

      We assumed that the levels of heating by excitation light measured at the neocortex in Prevedel et al. (2016) were representative for DCIC also. Nevertheless, we recognize this approximation might not be very accurate, due to the differences in tissue architecture and vascularization between these two brain areas, just to name a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. In consequence, imaging sessions were kept to 25 min in duration, limiting the number of trials collected. However, we cannot rule out that with more extensive habituation prior to experiments the imaging sessions could be prolonged without these signs of discomfort, or that an influence from our custom setup, like potential heating of the brain by illumination light, might be the cause of the observed distress. Nevertheless, we note that previous work has shown that ~200mW average power is a safe regime for imaging in the cortex by keeping brain heating minimal (Prevedel et al., 2016), without producing the lasting damages observed by immunohistochemistry against apoptosis markers above 250mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed because it assumes that the responses of the simultaneously recorded neurons are independent and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model would represent the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be verified by an equal decoding outcome from correlated or decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible that would be least likely to find a contribution of noise correlations. Other relevant reasons that justify our choice of the naive Bayesian classifier are its robustness against the limited numbers of trials we could collect in comparison to other more “data hungry” classifiers like SVM, KNN, or artificial neuronal nets. We did perform preliminary tests with alternative classifiers but the obtained decoding errors were similar when decoding the whole population activity (Author response image 2A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors observed with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observe a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Author response image 2B). Sentences detailing the logic of classifier choice are now included in the results section at page 10 and at the last paragraph of page 18 (see responses to Reviewer 1).
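      The independence assumption referred to above can be written out explicitly. The following is the generic textbook form of a naive Bayes decoder, not a transcription of the manuscript's equations. The symbols are assumptions introduced here: $\theta$ is the stimulus azimuth, $r_i$ the single-trial response of unit $i$, and $N$ the number of units used for decoding.

```latex
% Naive Bayes posterior: the population likelihood factorizes across units,
% so no term depends on the joint (correlated) variability of unit pairs.
P(\theta \mid r_1, \dots, r_N) \;\propto\; P(\theta) \prod_{i=1}^{N} P(r_i \mid \theta),
\qquad
\hat{\theta} \;=\; \arg\max_{\theta} \; P(\theta) \prod_{i=1}^{N} P(r_i \mid \theta)
```

      Because each factor $P(r_i \mid \theta)$ depends on one unit only, the fitted likelihoods are the same for simultaneously recorded and decorrelated data. A difference in decoding error between the two can therefore only arise from how the single-trial response patterns presented to the decoder differ.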

      Author response image 2.

      A) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using different classifiers (blue; KNN: K-nearest neighbors; SVM: support vector machine ensemble) and chance level distribution (gray) on the complete populations of imaged units. Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using a Bayes classifier (naive approximation for computation efficiency) to decode the single-trial response patterns from the 31 top ranked units in the simultaneously imaged datasets across mice (cyan), modeled decorrelated datasets (orange) and the chance level distribution associated with our stimulation paradigm (gray). Vertical dashed lines show the medians of cumulative distributions. K.S. w/Sidak: Kolmogorov-Smirnov with Sidak.

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than the ones recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of the form and a more quantitative comparison of the diversity of these functions because the signals compared are quite different, as calcium indicator signal is subject to non linearities due to Ca2+ binding cooperativity and low pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses statistical response dependency to stimulus azimuth, which does not rely on features from a descriptive statistic like mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth shows “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added to the Discussion sentences clarifying that a relationship between these two variables remains to be determined and it also remains to be determined if the DCIC indeed performs a bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, a primary sensory representation should theoretically be even more precise than the behavioral performance as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling decoding performance decreased. Although Boffi and colleagues display that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a naïve Bayesian decoder which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are independent. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations - it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be least likely to find a contribution of correlated noise. Also, we chose this decoder due to its robustness against limited numbers of trials collected, in comparison to “data hungry” non linear classifiers like KNN or artificial neuronal nets. Lastly, we observed that small populations of noisy, unreliable (do not respond in every trial) DCIC neurons can encode stimulus azimuth in passively listening mice matching the discrimination error of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percent of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded us from making any quantification of the unreliability (failures to respond) and variability in the response distributions from the units we recorded, as we feared they could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. In consequence we referred to the observed response variance altogether as neuronal noise. Considering these points, our datasets are publicly available for exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations from the top ranked unit subpopulation, which likely carries the bulk of the sound location information. Noise correlations can be defined as correlation in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the rest of the population, which is not in the top ranked unit percentage, is really responding and showing response variation with which to evaluate this correlation, or is simply not responding at all and showing unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more - are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y) or are they also modulated around their mean rates on similar trials (e.g., unit x and y respond and both are responding more than their mean response rate). A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, to evaluate noise correlations we used the non parametric Kendall tau correlation coefficient which is a measure of pairwise rank correlation or ordinal association in the responses to each azimuth. Positive rank correlation would represent neurons more likely responding together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
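      As an illustration of the rank-based measure described above, the following minimal sketch computes a Kendall tau coefficient between the single-trial responses of two units to the same azimuth. It assumes the tau-b variant, which corrects for ties. The response does not state which variant or software was used, and the function name and example values are hypothetical.

```python
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation between two equal-length sequences.

    x, y: responses of two units across repeated trials of one azimuth.
    Returns a value in [-1, 1], or NaN if either sequence is constant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of trials")
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # trial pair tied in unit x
            if dy == 0:
                ties_y += 1  # trial pair tied in unit y
            if dx * dy > 0:
                concordant += 1  # both units rank the two trials the same way
            elif dx * dy < 0:
                discordant += 1  # the units rank the two trials oppositely
    n_pairs = n * (n - 1) // 2
    denominator = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Hypothetical responses of two units on four trials of the same azimuth.
unit_x = [0.1, 0.4, 0.2, 0.9]
unit_y = [1.0, 3.0, 2.0, 5.0]
tau = kendall_tau_b(unit_x, unit_y)  # identical trial ranking in both units
```

      Because only the ordering of trials enters the calculation, no assumption about the shape of the response distributions is needed, which is the property the response appeals to.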

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound intervals and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses - especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of eye-position-IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We attempted eye tracking and pupillometry in our videos without success. We suspect that this is because the pupil was generally overly dilated under the low visible-light illumination we used, which was necessary to protect the PMT of our custom scope.

      It is likely that DCIC population activity is integrating un-analyzed signals, such as those associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However, investigating if and how these signals are integrated into stimulus azimuth coding requires extensive behavioral testing and experimentation, which is beyond the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like the ones from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups that relied on “causal and behavioral perturbations” and implicated DCIC in the experience-dependent plasticity induced by sound location learning (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than two dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing simple and interpretable toy models challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not test their model based on Mahalanobis distances, as we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.

      Significance:

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General:

      The manuscript is generally well written, but could benefit from a quick proof by a native English speaker (e.g., "the" inferior colliculus is conventionally used with its article). The flow of arguments is also generally easy to follow, but I would kindly ask the authors to consider elaborating or clarifying the following points (including those already mentioned in my public review).

      (1) Choice of model:

      There are countless ways one can construct a decoder or classifier that can predict a presented sensory stimulus based on a population neuronal response. Given the assumptions of independence as mentioned in my public review, I would ask the authors to explicitly justify their choice of a naïve Bayesian classifier.

      A section detailing the logic of classifier choice is now included in the results section on page 10 and in the last paragraph of page 18 of the revised manuscript.

      (2) Number of imaging repetitions:

      For particularly noisy datasets, 14 repetitions is indeed quite few. I reckon this was not the choice of the authors, but rather limited by the inherent experimental conditions. Despite minimisation of required average laser power during the development of s-TeFo imaging, the authors still required almost 200 mW (which is still quite a lot of exposure). Although 14 repetitions for 13 azimuthal locations every 5 s is at face value a relatively short imaging session (~15 min.), at 191 mW, with the desire to image mice multiple times, I could imagine that this is a practical limitation the authors faced (to avoid excessive tissue heating or photodamage, which was assessed in the original Nature Methods article, but not here). Nevertheless, this logic (or whatever logic they had) should be explained for non-imaging experts in the readership.

      This is now addressed in the answers to the public reviews.

      (3) Redundancy:

      It is honestly unclear to me what the authors mean by this. I don't speculate that they mean there are "redundant" (small) populations of neurons that sufficiently encode azimuth, but I'm actually not certain. If that were the case, I believe this would need further clarification, since redundant representations would be inconsistent with the general (perhaps surprising) finding that large populations are not required in the DCIC, which is thought to be the case at earlier processing stages.

      In the text we are referring to the azimuth information being redundantly distributed across DCIC top ranked units. We do not mention redundant “populations of neurons”.

      (4) Correspondence of decoding accuracy with psychometric functions in mice: While this is an interesting coincidental observation, it should not be interpreted that the neuronal detection threshold in the DCIC is somehow responsible for its psychometric counterpart (which is an interesting yet exceedingly complex question). Although I do not believe the authors intended to suggest this, I would personally be cautious in the way I describe this correspondence. I mention this because the authors point it out multiple times in the manuscript (whereas I would have just mentioned it once in passing).

      This is now clarified in the revised manuscript.

      (5) Noisy vs. sparse:

      I'm confident that the authors understand the differences between these terms, both in concept (stochastic vs. scattered) and in context (neuronal vs. experimental), but I personally would be cautious in the way I use them in the description of the study. Indeed, auditory neuronal signals are to my knowledge generally thought to be both sparse and noisy, which is in itself interesting, but the study also deals with substantial experimental (recording) noise, and I think it's important for the readership to understand when "noise" refers to the recordings (in particular the imaging data) and to neuronal activity. I mention this specifically because "noisy" appears in the title.

      We have clarified this issue at the bottom of page 5 by adding the following sentences to the revised manuscript:

      “In this section we used the word “noise” to refer to the sound stimuli used and recording setup background sound levels or recording noise in the acquired signals. To avoid confusion, from now on in the manuscript the word “noise” will be used in the context of neuronal noise, which is the trial-to-trial variation in neuronal responses unrelated to stimuli, unless otherwise noted.”

      (6)  More details in the Methods:

      The Methods section is perhaps the least-well structured part of the present manuscript in my view, and I encourage the authors to carefully go through it and add the following information (in case I somehow missed it).

      a. Please also indicate the number of animals used here.

      Added.

      b. How many sessions were performed on each mouse?

      This is already specified in the methods section on page 25:

      “mice were imaged a total of 2-11 times (sessions), one to three times a week.”

      We added for clarification:

      “The datasets analyzed and reported here come from the imaging session in which we observed maximal calcium sensor signal (peak AAV expression) and the maximum number of detected units.”

      c. For the imaging experiments, was it possible to image the same units from session to session?

      This is not possible for sTeFo 2P data due to low spatial resolution which makes precisely matching neuron ROIs across sessions challenging.

      d. Could the authors please add more detail to the analyses of the videos (to track facial movements) or provide a reference?

      Added citation.

      e. The same goes for the selection of subcellular regions of interest that were used as "units."

      Added to page 25:

      “We used the CaImAn package (Giovannucci et al., 2019) for automatic ROI segmentation through constrained non negative matrix factorization and selected ROIs (Units) showing clear Ca transients consistent with neuronal activity, and IC neuron somatic shape and size (Schofield and Beebe, 2019).”

      Specific: In order to maximise the efficiency of my comments and suggestions (as there are no line numbers), my numerated points are organised in sequential order.

      (1) Abstract: I wouldn't personally motivate the study with the central nucleus of the IC (i.e. I don't think this is necessary). I think the authors can motivate it simply with the knowledge gaps in spatial coding throughout the auditory system, in which such large data sets such as the ones presented here are of general value.

      (2) Page 4: 15-50 kHz "white" noise is incorrect. It should be "band-passed" noise.

      Changed.

      (3) Supplemental figure 1, panel A: Since the authors could not identify cell bodies unequivocally from their averaged volume timeseries data, it would be clearer to the readership if larger images are shown, so that they can evaluate (speculate) for themselves what subcellular structures were identified as units. Even better would be to include a planar image through a cross-section. As mentioned above, not everything determined for the cortex or hippocampus can be assumed to be true for the DCIC.

      The raw images and segmentations are publicly available for detailed inspections.

      (4) Supplemental figure 2, panel A: This panel requires further explanation, in particular the panel on the right. I assume that to be a simple subtraction of sequential frames, but I'm thrown off by the "d(Grey)" colour bar. Also, if "grey" refers to the neutral colour, it is conventionally spelled "gray" in US-American English.

      Changed.

      (5) Supplemental figure 2, panel B: I'm personally curious why the animals exhibited movement just prior to a stimulus. Did they learn to anticipate the presentation of a sound after some habituation? Is that somehow a pre-emptive startle response? We observe that in our own experiments (but as we stochastically vary the inter-trial-intervals, the movement typically occurs directly after the stimulus). I don't suggest the authors dwell on this, but I find it an interesting observation.

      It is indeed interesting, but we can’t conclude much about it without comparing it to random inter-trial-intervals.

      (6) Supplemental figure 3: I personally find these data (decoding of all electrophysiological data) of central relevance to the study, since it mirrors the analyses presented for its imaging data counterpart and encourage the authors to move it to the main text.

      Changed.

      (7) Page 12: Do the authors have any further analyses of spatial tuning functions? We all know they can be parametrically obscure (i.e., bi-lobed, non-monotonic, etc.), but having these parameters (even if just in a supplemental figure) would be informative for the spatial auditory community.

      We dedicated significant effort to attempt to parametrize and classify the azimuth response dependency functions from the recorded DCIC cells in an unbiased way. Nevertheless, given the observed response noise and the “obscure” properties of spatial tuning functions mentioned by the reviewer, we could only reach the general qualitative observation of having a more frequent contralateral selectivity.

      (8) Page 14 (end): Here, psychometric correspondence is referenced. Please add the Lauer et al., (2011) reference, or, as I would, remove the statement entirely and save it for the discussion (where it is also mentioned and referenced).

      Changed.

      (9) Figure 5, Panels B and C: Why don't the authors report the Kruskal-Wallis tests (for increasing number of units training the model), akin to e.g., Panel G of Figure 4? I think that would be interesting to see (e.g., if the number of required units to achieve statistical significance is the same).

      Within-class randomization produced a moderate effect on decoder performance, achieving statistical significance at similar numbers of units, as seen in figure 5 panels B and C. We did not include these plots to avoid cluttering the figure with dense distributions and obscuring the differences between the distributions shown.

      (10) Figure 5, Panels B and C (histograms): I see a bit of skewedness in the distributions (even after randomisation). Where does this come from? This is just a small talking point.

      We believe this is potentially due to more than one distribution of pairwise correlations combined into one histogram (like in a Gaussian mixture model).

      (11) Page 21: Could the authors please specify that the Day and Delgutte (2013) study was performed on rabbits? Since rabbits have an entirely different spectral hearing range compared to mice, spatial coding principles could very well be different in those animals (and I'm fairly certain such a study has not yet been published for mice).

      Specified.

      (12) Page 22: I'd encourage the authors to remove the reference to Rayleigh's duplex theory, since mice hardly (if at all) use interaural time differences for azimuthal sound localisation, given their generally high-frequency hearing range.

      That sentence is meant to discuss, beyond the mouse model and in light of previous reports, an exciting outlook of our findings: a hypothetical functional relationship between the tonotopy in DCIC and the spatial distribution of azimuth-sensitive DCIC neurons. We have now clarified this in the text.

      (13) Page 23: I believe the conventional verb for gene delivery with viruses is still "transduce" (or "infect", but not "induce"). What was the specific "syringe" used for stereotactic injections? Also, why were mice housed separately after surgery? This question pertains to animal welfare.

      Changed. We used a 10 ml syringe to generate positive or negative pressure, coupled to the glass needle through silicone tubing via a three-way Luer T valve. Single housing was chosen to prevent mice from compromising each other’s implants. This can therefore be seen as a refinement of our method to maximize the chances of successful imaging per implanted mouse.

      (14) Page 25: Could the authors please indicate the refractory period violation time window here? I had to find it buried in the figure caption of Supplementary figure 1.

      Added.

      (15) Page 27: What version of MATLAB was used? This could be important for reproduction of the analyses, since The Mathworks is infamously known to add (or even more deplorably, modify) functions in particular versions (and not update older ones accordingly).

      Added.

      Reviewer #3 (Recommendations For The Authors):

      Overall I thought this was a nice manuscript and a very interesting dataset. Here are some suggestions and minor corrections:

      You may find this work of interest - 'A monotonic code for sound azimuth in primate inferior colliculus' 2003, Groh, Kelly & Underhill.

      We thank the reviewer for pointing out this extremely relevant reference, which we regrettably failed to cite. It is now included in the revised version of the manuscript.

      In your introduction, you state "our findings point to a functional role of DCIC in sound location coding". Though your results show that there is azimuthal information contained in a subset of DCIC units there's no evidence in the manuscript that shows a functional link between this representation and sound localization.

      This is now addressed in the answers to the public reviews.

      I found the variability in your DCIC population quite striking - especially during the intersound intervals. The entrainment of the population in the imaging dataset suggests some type of input activating the populations - maybe these are avenues for further probing the variability here:

      (1) I'm curious if you can extract eye movements from your video. Work from Jennifer Groh shows that some cells in the primate inferior colliculus are sensitive to different eye positions (Groh et al., 2001). With recent work showing eye movements in rodents, it may explain some of the variance in the DCIC responses.

      This is now addressed in the answers to the public reviews.

      (2) I was also curious if the motor that moves the speaker made noise. It could be possible that some of the 'on going' activity is a sound-evoked response.

      We were careful to set the stepper motor speed so that it produced low-frequency noise, within a band mostly outside the hearing range of mice (<4 kHz). Nevertheless, we cannot fully rule out that a very quiet but perhaps very salient component of the motor noise influenced the activity during the inter-trial periods. The motor was stationary and quiet for a period of at least one stimulus duration before and during stimulus presentation.

      (3) Was the sound you present frozen or randomly generated on each trial? Could there be some type of structure in the noise you presented that sometimes led cells to respond to a particular azimuth location but not others?

      The sound presented was frozen noise. This is now clarified in the methods section.

      It may be useful to quantify the number of your units that had refractory period violations.

      Our manual curation of sorted units was very stringent to avoid mixing differently tuned neurons. The single units analyzed had very infrequent refractory period violations, in less than ~5% of the spikes, considering a 2 ms refractory period.

      Was the video recording contralateral or ipsilateral to the recording?

      The side of the face ipsilateral to the imaged IC was recorded. Added to methods.

      I was struck by the snout and ear movements - in the example shown in Supplementary Figure 2B it appears as they are almost predicting sound onset. Was there any difference in ear movements in the habituated and non-habituated animals? Also, does the placement of the cranial window disturb any of the muscles used in ear movement?

      Mouse snout movements appear to be quite active, perhaps reflecting arousal (Stringer et al., 2019). We cannot rule out that the cranial window implantation disturbed ear movement, but while moving the head-fixed mouse we observed what could be considered normal ear movements.

      Did you correlate time-point by time-point in the average population activity and movement or did you try different temporal lags/leads in case the effect of the movements was delayed in some way?

      Point by point, due to the 250 ms time resolution of imaging.

      Are the video recordings only available during the imaging? It would be nice to see the same type of correlations in the neuropixel-acquired data as well.

      Only during imaging. For Neuropixels recordings, we were skeptical about face videography, as we suspected that face movements were likely influenced by the acute nature of the preparation procedure. Our cranial window preparation, on the other hand, involved a recovery period of at least 4 weeks. We were therefore inclined to perform videographic interrogation of face movements on these mice instead.

      If you left out more than 1 trial do you think this would help your overfitting issue (e.g. leaving out 20% of the data)?

      Due to the relatively small number of trial repetitions collected, fitting the model with an even smaller training dataset is unlikely to help overfitting and will likely decrease decoder performance.
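
      To make the trade-off concrete, the following minimal sketch (an illustration only, not the study's cross-validation code) enumerates held-out folds over the trial repetitions of one azimuth and reports how many repetitions remain for training. It assumes the 14 repetitions per azimuth mentioned in the reviews and that trials are held out per azimuth; the function name and the fold layout are hypothetical.

```python
from math import ceil


def leave_k_out_splits(n_trials, k):
    """Yield (train_idx, test_idx) for consecutive folds of k held-out trials."""
    if not 0 < k < n_trials:
        raise ValueError("k must be between 1 and n_trials - 1")
    for start in range(0, n_trials, k):
        test = list(range(start, min(start + k, n_trials)))
        held_out = set(test)
        train = [i for i in range(n_trials) if i not in held_out]
        yield train, test


n_repetitions = 14  # repetitions per azimuth, as quoted in the reviews

# Leave-one-out: every fold trains on 13 repetitions per azimuth.
loo_train_sizes = [len(train) for train, _ in leave_k_out_splits(n_repetitions, 1)]

# Holding out roughly 20% (3 of 14 trials) leaves only 11 or 12 for training.
k20 = ceil(0.2 * n_repetitions)
holdout_train_sizes = [len(train) for train, _ in leave_k_out_splits(n_repetitions, k20)]
```

      Under these assumptions, a larger held-out fraction removes two more of an already small number of training repetitions per azimuth, which is the concern raised in the response above.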

      It would be nice to see a confusion matrix - even though azimuthal error and cumulative distribution of error are a fine way to present the data - a confusion matrix would tell us which actual sounds the decoder is confusing. Just looking at errors could result in some funky things where you reduce the error generally but never actually estimate the correct location.

      We considered confusion matrices early on in our study, but they were not easily interpretable or insightful, likely due to the relatively low discrimination ability of the mouse model, with ±30° error after extensive training. We therefore reasoned that in passively listening mice (and likely in trained mice too) with limited trial repetitions, an undersampled and diffuse confusion matrix is expected, which is not an ideal means of visualizing and comparing decoding errors. Hence we relied on cumulative error distributions.
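
      The two summaries discussed here can be contrasted with a minimal sketch, offered as an illustration rather than as the code used in the study. Given true and decoded azimuths, it tabulates confusion counts and the cumulative distribution of absolute decoding error. The function names and the example azimuth values are hypothetical, and the error is assumed to be a plain absolute difference in degrees with no wrap-around.

```python
from collections import Counter


def confusion_counts(true_az, decoded_az):
    """Count how often each (true, decoded) azimuth pair occurs."""
    return Counter(zip(true_az, decoded_az))


def cumulative_error_distribution(true_az, decoded_az):
    """Return [(error, fraction of trials with absolute error <= error), ...]."""
    errors = [abs(t - d) for t, d in zip(true_az, decoded_az)]
    n = len(errors)
    return [
        (level, sum(1 for e in errors if e <= level) / n)
        for level in sorted(set(errors))
    ]


# Hypothetical single-trial decoding results, in degrees of azimuth.
true_az = [-30, 0, 30, 60]
decoded_az = [0, 0, 60, 60]

confusion = confusion_counts(true_az, decoded_az)
cdf = cumulative_error_distribution(true_az, decoded_az)
```

      With 13 azimuth classes the confusion table has 169 cells, so a small number of trials per class spreads thinly across it, whereas the cumulative error distribution pools all trials into a single curve.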

      Do your top-ranked units have stronger projections onto your 10-40 principal components?

      It would be interesting to know if the components are mostly taking into account those 30ish percent of the population that is dependent upon azimuth.

      Inspection of PC loadings across units ranked based on response dependency to stimulus azimuth does not show a consistent stronger projection of top ranked units onto the first 10-40 principal components (Author response image 3).

      Author response image 3.

      PC loading matrices for each recorded mouse. The units recorded in each mouse are ranked in descending order of response dependency to stimulus azimuth based on the p value of the chi-square test. Units above the red dotted line display a chi-square p value < 0.05; units below this line have p values >= 0.05.

      How much overlap is there in the tuning of the top-ranked units?

      This varies considerably from mouse to mouse and between imaging and electrophysiology, which makes it hard to generalize, since it might depend on the unique DCIC population sampled in each mouse.

      I'm not really sure I follow what the nS/N adds - it doesn't really measure tuning but it seems to be introduced to discuss/extract some measure of tuning.

      nS/N is used to quantify how noisy neurons are, independent of how sensitive their responses are to the stimulus azimuth.

      Is the noise correlation - observed to become more positive - for more contralateral stimuli a product of higher firing rates due to a more preferred stimulus presentation or a real effect in the data? Was there any relationship between distance and strength of observed noise correlation in the DCIC?

      We observed a consistent and homogeneous trend of pairwise noise correlation distributions either shifted or tailed towards more positive values across stimulus azimuths, for imaging and electrophysiology datasets (Author response image 4). The lower firing frequency observed in Neuropixels recordings in response to ipsilateral azimuths could have affected the statistical power of the comparison between the pairwise noise correlation coefficient distribution and its randomized chance level, but the overall histogram shapes qualitatively support this consistent trend across azimuths.

      Author response image 4.

      Distribution histograms for the pairwise correlation coefficients (Kendall tau) from pairs of simultaneously recorded top ranked units across mice (blue) compared to the chance level distribution obtained through randomization of the temporal structure of each unit’s activity to break correlations (purple). Vertical lines show the medians of these distributions. Imaging data comes from n = 12 mice and neuropixels data comes from n = 4 mice.

      Typos:

      'a population code consisting on the simultaneous" > should on be of?

      'half of the trails' > trails should be trials?

      'referncing the demuxed channels' > should it be demixed?

      Corrected.

    1. Author response:

      Reviewer #1 (Public Review):

      Padilha et al. aimed to find prospective metabolite biomarkers in serum of children aged 6-59 months that were indicative of neurodevelopmental outcomes. The authors leveraged data and samples from the cross-sectional Brazilian National Survey on Child Nutrition (ENANI-2019), and an untargeted multisegment injection-capillary electrophoresis-mass spectrometry (MSI-CE-MS) approach was used to measure metabolites in serum samples (n=5004) which were identified via a large library of standards. After correlating the metabolite levels against the developmental quotient (DQ), or the degree to which age-appropriate developmental milestones were achieved as evaluated by the Survey of Well-being of Young Children, serum concentrations of phenylacetylglutamine (PAG), cresol sulfate (CS), hippuric acid (HA) and trimethylamine-N-oxide (TMAO) were significantly negatively associated with DQ. Examination of the covariates revealed that the negative associations of PAG, HA, TMAO and valine (Val) with DQ were specific to younger children (-1 SD or 19 months old), whereas creatinine (Crtn) and methylhistidine (MeHis) had significant associations with DQ that changed direction with age (negative at -1 SD or 19 months old, and positive at +1 SD or 49 months old). Further, mediation analysis demonstrated that PAG was a significant mediator for the relationship of delivery mode, child's diet quality and child fiber intake with DQ. HA and TMAO were additional significant mediators of the relationship of child fiber intake with DQ.

      Strengths of this study include the large cohort size and study design allowing for sampling at multiple time points along with neurodevelopmental assessment and a relatively detailed collection of potential confounding factors including diet. The untargeted metabolomics approach was also robust and comprehensive allowing for level 1 identification of a wide breadth of potential biomarkers. Given their methodology, the authors should be able to achieve their aim of identifying candidate serum biomarkers of neurodevelopment for early childhood. The results of this work would be of broad interest to researchers who are interested in understanding the biological underpinnings of development and also for tracking development in pediatric populations, as it provides insight for putative mechanisms and targets from a relevant human cohort that can be probed in future studies. Such putative mechanisms and targets are currently lacking in the field due to challenges in conducting these kinds of studies, so this work is important.

      However, in the manuscript's current state, the presentation and analysis of data impede the reader from fully understanding and interpreting the study's findings.

      Particularly, the handling of confounding variables is incomplete. There is a different set of confounders listed in Table 1 versus Supplementary Table 1 versus Methods section Covariates versus Figure 4. For example, Region is listed in Supplementary Table 1 but not in Table 1, and Mode of Delivery is listed in Table 1 but not in Supplementary Table 1. Many factors are listed in Figure 4 that aren't mentioned anywhere else in the paper, such as gestational age at birth or maternal pre-pregnancy obesity.

      We thank the reviewer for their comment. We would like to clarify that the tables initially had different variables because they serve different purposes. Table 1 aims to characterize the sample on variables directly related to the children’s and mothers’ features and their nutritional status. Supplementary File 1 (previously named Supplementary Table 1) summarizes the sociodemographic distribution of the development quotient. Neither table concerned the metabolite-DQ relationships and their potential covariates; they only provide context for subsequent analyses by characterizing the sample and the outcome. Instead, the covariates included in the regression models were selected using the Directed Acyclic Graph presented in Figure 1.

      To avoid this potential confusion, however, we included the same variables in Table 1 and Supplementary File 1 (page 38), and we discuss the selection of model covariates in Figure 4 in more detail here in the letter and in the manuscript.

      The authors utilize the directed acyclic graph (DAG) in Figure 4 to justify the further investigation of certain covariates over others. However, the lack of inclusion of the microbiome in the DAG, especially considering that most of the study findings were microbial-derived metabolite biomarkers, appears to be a fundamental flaw. Sanitation and micronutrients are proposed by the authors to have no effect on the host metabolome, yet sanitation and micronutrients have both been demonstrated in the literature to affect microbiome composition which can in turn affect the host metabolome.

      Thank you for your comment. We appreciate that the use of DAG and lack of the microbiome in the DAG are concerns. This has been already discussed in reply #1 to the editor that has been pasted below for convenience:

      Thank you for the comment and suggestions. It is important to highlight that there is no data on microbiome composition. We apologize if there was an impression such data is available. The main goal of conducting this national survey was to provide qualified and updated evidence on child nutrition to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is more explicitly stated as a study limitation in the revised manuscript on page 17, lines 463-467:

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs of collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data were considered a priority, as there were already blood specimens collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations we had to perform the analysis in a subset of our sample, which was still representative and large enough to test our hypothesis with adequate study power (more details below).

      We would like to argue that there is no evidence that microbiome and factors affecting microbiome composition are confounders on the association between serum metabolome and child development. First, one should revisit the properties of a confounder according to the epidemiology literature that in short states that confounding refers to an alternative explanation for a given conclusion, thus constituting one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, which have reported how these bioactive metabolites in circulation are co-metabolized by commensal gut microbiota, and may play a role in neurodevelopment and cognition as mediated by environmental exposures early in life.

      In fact, the literature on the association between microbiome and infant development is very limited. We performed a search using terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). The authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criterion for including confounders in the directed acyclic graph (DAG) was evidence from systematic reviews or meta-analyses, not from single isolated studies.

      In summary, we would like to highlight that there is no microbiome data in ENANI-2019 and in the event such data was present, we are confident that based on the current stage of the literature, there is no evidence to consider such construct in the DAG, as this procedure recommends that only variables associated with the exposure and the outcome should be included. Please find more details on DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome; rather, these constructs were not included in the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”
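      The selection rule described in the passage above (adjust for variables related to both the metabolome and DQ, while avoiding variables on the metabolite-DQ pathway) can be illustrated with a minimal sketch. The edges below are hypothetical placeholders, not the DAG from the manuscript, and the rule implemented is a deliberate simplification (common causes that are not downstream of the exposure), not the full back-door criterion applied by dedicated DAG software.

```python
from collections import defaultdict


def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def candidate_confounders(edges, exposure, outcome):
    """Simplified rule: keep nodes that cause the exposure and also cause the
    outcome through a path that does not pass through the exposure, and that
    are not themselves downstream of the exposure (mediators/colliders)."""
    nodes = {n for edge in edges for n in edge}
    downstream = descendants(edges, exposure)
    # Dropping every edge that touches the exposure removes paths through it.
    edges_without_exposure = [e for e in edges if exposure not in e]
    selected = set()
    for node in nodes - {exposure, outcome}:
        if node in downstream:
            continue
        causes_exposure = exposure in descendants(edges, node)
        causes_outcome = outcome in descendants(edges_without_exposure, node)
        if causes_exposure and causes_outcome:
            selected.add(node)
    return selected


# Hypothetical toy graph, for illustration only.
toy_edges = [
    ("child_age", "metabolite"), ("child_age", "DQ"),
    ("diet_quality", "metabolite"), ("diet_quality", "DQ"),
    ("metabolite", "mediator"), ("mediator", "DQ"),
    ("metabolite", "DQ"),
    ("unrelated", "DQ"),
    ("instrument", "metabolite"),
]

confounders = sorted(candidate_confounders(toy_edges, "metabolite", "DQ"))
print(confounders)
```

      In this toy graph the mediator, the variable affecting only DQ, and the variable affecting only the metabolite are all left out of the adjustment set, which mirrors the reasoning given for excluding variables on the causal pathway.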

      Additionally, the authors emphasized as part of the study selection criteria the following, "Due to the costs involved in the metabolome analysis, it was necessary to further reduce the sample size. Then, samples were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions related to iron metabolism, such as anemia and nutrient deficiencies. The selection process aimed to represent diverse health statuses, including those with no conditions, with specific deficiencies, or with combinations of conditions. Ultimately, through a randomized process that ensured a balanced representation across these groups, a total of 5,004 children were selected for the final sample (Figure 1)."

      Therefore, anemia and nutrient deficiencies are assumed by the reader to be important covariates, yet, the data on the final distribution of these covariates in the study cohort is not presented, nor are these covariates examined further.

      Thank you for the comments. We apologize for the misunderstanding and will amend the text to make our rationale clearer in the revised version of the manuscript.

      We believed the original text was clear enough in stating that the sampling process was designed to maintain the representativeness of the original sample. This sampling process considered anemia and nutritional deficiencies, among other variables. However, we did not aim to include all relevant covariates of the DQ-metabolome relationship; these were decided using the DAG, as described in the manuscript and in other sections of this letter. Therefore, we would like to emphasize that our description of the sampling process does not assume that anemia and nutritional deficiencies are important covariates for the DQ-metabolome relationship.

      We rewrote this text part, page 11, lines 279-285:

      “Due to the costs involved in the metabolome analysis, it was necessary to reduce the sample size that is equivalent to 57% of total participants from ENANI-2019 with stored blood specimens. Therefore, the infants were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions such as anemia and micronutrient deficiencies. The selection process aimed to represent diverse health statuses to the original sample. Ultimately, 5,004 children were selected for the final sample through a random sampling process that ensured a balanced representation across these groups (Figure 2).”

      The inclusion of specific covariates in Table 1, Supplementary Table 1, the statistical models, and the mediation analysis is thus currently biased as it is not well justified.

      We appreciate the reviewer’s comment. However, it would have been ideal to receive a comment/critique with clearer and more straightforward argumentation, so we have tried to address it based on our interpretation.

      Please refer to our response to item #1 above regarding the variables in the tables and figures. The covariates in the statistical models were selected using the DAG, which is a cutting-edge procedure that aims to avoid bias and overfitting, a common situation when confounders are adjusted for without a clear rationale. We elaborate on the advantages of using the DAG in response to item #6 and in page 9 of the manuscript. The statistical models we use follow the best practices in the field when dealing with a large number of collinear predictors and a continuous outcome (see our response to the editor’s 4th comment). Finally, the mediation analyses were done to explore a few potential explanations for our results from the PLSR and multiple regression analyses. We only ran mediation analyses for plausible mechanisms for which the variables of interest were available in our data. Please see our response to reviewer 3’s item #1 for a more detailed explanation on the mediation analysis.

      Finally, it is unclear what the partial-least squares regression adds to the paper, other than to discard potentially interesting metabolites found by the initial correlation analysis.

      Thank you for the question. As explained in response to the editor’s item #4, PLS-based analyses are among the most commonly used analyses for parsing metabolomic data (Blekherman et al., 2011; Wold et al., 2001; Gromski et al. 2015). This procedure is especially appropriate for cases in which there are multiple collinear predictor variables as it allows us to compare the predictive value of all the variables without relying on corrections for multiple testing. Testing each metabolite in separate correlations corrected for multiple comparisons is less appropriate because the correlated nature of the metabolites means the comparisons are not truly independent and would cause the corrections (which usually assume independence) to be overly strict. As such, we only rely on the correlations as an initial, general assessment that gives context to subsequent, more specific analyses. Given that our goal is to select the most predictive metabolites, discarding the less predictive metabolites is precisely what we aim to achieve. As explained above and in response to the editor’s item #4, the PLSR allows us to reach that goal without introducing bias in our estimates or losing statistical power.  

      Reviewer #2 (Public Review):

      A strength of the work lies in the number of children Padilha et al. were able to assess (5,004 children aged 6-59 months) and in the extensive screening that the Authors performed for each participant. This type of large-scale study is uncommon in low-to-middle-income countries such as Brazil.

      The Authors employ several approaches to narrow down the number of potentially causally associated metabolites.

      Could the Authors justify on what basis the minimum dietary diversity score was dichotomized? Were sensitivity analyses undertaken to assess the effect of this dichotomization on associations reported by the article? Consumption of each food group may have a differential effect that is obscured by this dichotomization.

      Thank you for the observation. We would like to emphasize that the child's diet quality was assessed using the minimum dietary diversity (MDD) indicator proposed by the WHO (World Health Organization & United Nations Children’s Fund (UNICEF), 2021). This guideline proposes the cutoff used in the present study. We understand the reviewer’s suggestion to use the consumption of healthy food groups as an evaluation of diet quality, but we chose to follow the WHO proposal to assess dietary diversity. This indicator is widely accepted and used as a marker and provides comparability and consistency with other published studies.

      Could the Authors specify the statistical power associated with each analysis?

      We are not aware of power calculation procedures for PLS-based analyses. However, given our large sample size, we do not believe power was an issue in the analyses. For our regression analyses, which typically have 4 predictors, we had 95% power to detect an f-squared of 0.003 and an r of 0.05 in a two-sided correlation test considering an alpha level of 0.05.

      New text, page 11, lines 296-298:

      “Given the size of our sample, statistical power is not an issue in our analyses. Considering an alpha of 0.05 for a two-sided test, a sample size of 5000 has 95% power to detect a correlation of r = 0.05 and an effect of f2 = 0.003 in a multiple regression model with 4 predictors.”
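      As a rough check on the figures quoted above, power can be approximated with standard large-sample formulas. The sketch below is not the authors' calculation: it assumes the Fisher z-transform for the correlation test and, for the regression effect, a test of a single coefficient (one numerator degree of freedom) with noncentrality f2 × n. Dedicated power software may give slightly different values.

```python
from math import atanh, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def two_sided_power(shift, alpha=0.05):
    """Power of a two-sided z-test whose statistic is centered at `shift`."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def power_correlation(r, n, alpha=0.05):
    """Approximate power to detect correlation r (Fisher z-transform)."""
    return two_sided_power(atanh(r) * sqrt(n - 3), alpha)


def power_single_coefficient(f2, n, alpha=0.05):
    """Approximate power for one regression coefficient with effect size f2.

    Assumes a 1-df test and a large sample, so the F-test behaves like a
    squared z-test with noncentrality f2 * n.
    """
    return two_sided_power(sqrt(f2 * n), alpha)


power_r = power_correlation(r=0.05, n=5000)
power_f2 = power_single_coefficient(f2=0.003, n=5000)
print(f"correlation: {power_r:.3f}, single coefficient: {power_f2:.3f}")
```

      Both approximations land in the mid-90% range for a sample of 5,000, which is in line with the statement that power is not a limiting factor at this sample size.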

      Could the Authors describe in detail which metric they used to measure how predictive PLSR models are, and how they determined what the "optimal" number of components were?

      We chose the model with the fewest components that maximized R2 and minimized the root mean squared error of prediction (RMSEP). In the training data, the model with 4 components had a lower RMSEP but also a lower R2; we therefore chose the model with 3 components, which had a higher R2 than the 4-component model and a lower RMSEP than the 2-component model. However, the number of components in the model did not meaningfully change the rank order of the metabolites on the VIP index.

      New text, page 8, lines 220-224:

      “To better assess the predictiveness of each metabolite in a single model, a PLSR was conducted. PLS-based analyses are the most commonly used analyses when determining the predictiveness of a large number of variables as they avoid issues with collinearity, sample size, and corrections for multiple-testing (Blekherman et al., 2011; Wold et al., 2001; Gromski et al. 2015).”

      New text, page 12, lines 312-314:

      “In PLSR analysis, the training data suggested that three components best predicted the data (the model with three components had the highest R2, and the root mean square error of prediction (RMSEP) was only slightly lower with four components). In comparison, the test data showed a slightly more predictive model with four components (Figure 3—figure supplement 2).”

      The Authors use directed acyclic graphs (DAG) to identify confounding variables of the association between metabolites and DQ. Could the dataset generated by the Authors have been used instead? Not all confounding variables identified in the literature may be relevant to the dataset generated by the Authors.

      Thank you for the question. The answer is most likely no: the current dataset should not be used to define confounders, as these must be identified from the literature. DAGs have been widely explored as a valid tool for justifying the choice of confounding factors in regression models in epidemiology, because they allow for a clear visualization of causal relationships and clarify the complex relationships between exposure and outcome. Besides, DAGs demonstrate the authors' transparency by acknowledging factors reported as important but not included/collected in the study. This has already been discussed in reply #1 to the editor, which is pasted below for convenience.

      Thank you for the comment and suggestions. It is important to highlight that there is no data on microbiome composition. We apologize if there was an impression such data is available. The main goal of conducting this national survey was to provide qualified and updated evidence on child nutrition to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is more explicitly stated as a study limitation in the revised manuscript on page 17, lines 463-467:

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs of collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data were considered a priority, as there were already blood specimens collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations we had to perform the analysis in a subset of our sample, which was still representative and large enough to test our hypothesis with adequate study power (more details below).

      We would like to argue that there is no evidence that microbiome and factors affecting microbiome composition are confounders on the association between serum metabolome and child development. First, one should revisit the properties of a confounder according to the epidemiology literature that in short states that confounding refers to an alternative explanation for a given conclusion, thus constituting one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, which have reported how these bioactive metabolites in circulation are co-metabolized by commensal gut microbiota, and may play a role in neurodevelopment and cognition as mediated by environmental exposures early in life.

      In fact, the literature on the association between microbiome and infant development is very limited. We performed a search using terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). The authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criterion for including confounders in the directed acyclic graph (DAG) was evidence from systematic reviews or meta-analyses, not from single isolated studies.

      In summary, we would like to highlight that there is no microbiome data in ENANI-2019 and in the event such data was present, we are confident that based on the current stage of the literature, there is no evidence to consider such construct in the DAG, as this procedure recommends that only variables associated with the exposure and the outcome should be included. Please find more details on DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome; rather, these constructs were not included in the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”

      Were the systematic reviews or meta-analyses used in the DAG performed by the Authors, or were they based on previous studies? If so, more information about the methodology employed and the studies included should be provided by the Authors.

      Thank you for the question. The reviews or meta-analyses used in the DAG have been conducted by other authors in the field. This has been laid out more clearly in our methods section.

      New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”

      Approximately 72% of children included in the analyses lived in households with a monthly income superior to the Brazilian minimum wage. The cohort is also biased towards households with a higher level of education. Both of these measures correlate with developmental quotient. Could the Authors discuss how this may have affected their results and how generalizable they are?

      Thank you for your comment. This has already been discussed in reply #6 to the editor, which is pasted below for convenience.

      Thank you for highlighting this point. The ENANI-2019 is a population-based household survey with national coverage and representativeness for macroregions, sex, and one-year age groups (< 1; 1-1.99; 2-2.99; 3-3.99; 4-5). Furthermore, income quartiles of the census sector were used in the sampling. The study included 12,524 households, 14,588 children, and 8,829 infants with blood drawn.

      Due to the costs involved in metabolome analysis, it was necessary to further reduce the sample size to around 5,000 children, equivalent to 57% of the ENANI-2019 participants with stored blood specimens. To avoid a biased sample and preserve representativeness and generalizability, the 5,004 selected children were drawn from the 8,829 available samples so as to keep the original distribution according to age group (6 to 11 months, 12 to 23 months, and 24 to 59 months) and some health conditions related to iron metabolism, e.g., anemia and nutrient deficiencies. Within these groups, children were randomly selected to constitute the final sample, which aimed to represent all children with blood drawn. Hence, our efforts were directed at preserving the characteristics and representativeness of the original sample.
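      The subsampling logic described above can be sketched as proportional stratified random sampling. Only the totals (8,829 and 5,004) come from the text; the stratum labels and counts below are hypothetical, and the largest-remainder allocation is one reasonable choice rather than the procedure actually used in ENANI-2019.

```python
import random


def proportional_allocation(stratum_sizes, total_sample):
    """Split `total_sample` across strata in proportion to their sizes,
    using largest-remainder rounding so the counts sum exactly."""
    population = sum(stratum_sizes.values())
    exact = {k: n * total_sample / population for k, n in stratum_sizes.items()}
    allocation = {k: int(v) for k, v in exact.items()}
    shortfall = total_sample - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - allocation[k], reverse=True)
    for key in by_remainder[:shortfall]:
        allocation[key] += 1
    return allocation


def stratified_sample(units_by_stratum, total_sample, seed=0):
    """Draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    sizes = {k: len(units) for k, units in units_by_stratum.items()}
    allocation = proportional_allocation(sizes, total_sample)
    return {k: rng.sample(units, allocation[k]) for k, units in units_by_stratum.items()}


# Hypothetical stratum counts (age group x health condition) summing to 8,829.
hypothetical_counts = {
    ("6-11 months", "no condition"): 900,
    ("6-11 months", "anemia or deficiency"): 400,
    ("12-23 months", "no condition"): 1700,
    ("12-23 months", "anemia or deficiency"): 700,
    ("24-59 months", "no condition"): 3900,
    ("24-59 months", "anemia or deficiency"): 1229,
}
units = {k: list(range(n)) for k, n in hypothetical_counts.items()}

sample = stratified_sample(units, total_sample=5004)
sampling_fraction = 5004 / 8829
print(f"sampling fraction: {sampling_fraction:.2%}")
```

      Because each stratum is sampled at the same fraction, the share of every age-by-condition group in the subsample matches its share among the children with blood drawn, up to rounding.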

      The ENANI-2019 study does not appear to present a bias towards higher socioeconomic status. Evidence from two major Brazilian population-based household surveys supports this claim. The 2017-18 Household Budget Survey (POF) reported an average monthly household income of 5,426.70 reais, while the Continuous National Household Sample Survey (PNAD) reported that in 2019, the nominal monthly per capita household income was 1,438.67 reais. In comparison, ENANI-2019 recorded a household income of 2,144.16 reais and a per capita income of 609.07 reais in infants with blood drawn, and 2,099.14 reais and 594.74 reais, respectively, in the serum metabolome analysis sample.

      In terms of maternal education, the 2019 PNAD-Education survey indicated that 48.8% of individuals aged 25 or older had at least 11 years of schooling. Applying the same metric to ENANI-2019, we found that 56.26% of mothers aged ≥25 years of infants with blood drawn had 11 or more years of education, as did 51.66% of those in the metabolome analysis sample. Although these figures are slightly higher, they remain within a reasonable range for population studies.

      It is well known that higher income and maternal education levels can influence child health outcomes, and acknowledging this, ENANI-2019 employed rigorous sampling methods to minimize selection biases. This included stratified and complex sampling designs to ensure that underrepresented groups were adequately included, reducing the risk of skewed conclusions. Therefore, the evidence strongly suggests that the ENANI-2019 sample is broadly representative of the Brazilian population in terms of both socioeconomic status and educational attainment.

      Further to this, could the Authors describe how inequalities in access to care in the Brazilian population may have affected their results? Could they have included a measure of this possible discrepancy in their analyses?

      Thank you for the concern.

      The truth is that we are not in a position to answer this question because our study focused on gathering data on infant nutritional status and there is very limited information on access to care to allow us to hypothesize. Another important piece of information is that this national survey used sampling procedures that aimed to make the sample representative of the 15 million Brazilian infants under 5 years. Therefore, the sample is balanced according to socio-economic strata, so there is no evidence to make us believe inequalities in access to health care would have played a role.

      The Authors state that the results of their study may be used to track children at risk for developmental delays. Could they discuss the potential for influencing policies and guidelines to address delayed development due to malnutrition and/or limited access to certain essential foods?

      The point raised by the reviewer is very relevant. Recognizing that dietary and microbial derived metabolites involved in the gut-brain axis could be related to children's risk of developmental delays is the first step to bringing this topic to the public policy agenda. We believe the results can contribute to the literature, which should be used to accumulate evidence to overcome knowledge gaps and support the formulation and redirection of public policies aimed at full child growth and development; the promotion of adequate and healthy nutrition and food security; the encouragement, support, and protection of breastfeeding; and the prevention and control of micronutrient deficiencies.  

      Reviewer #3 (Public Review):

      The ENANI-2019 study provides valuable insights into child nutrition, development, and metabolomics in Brazil, highlighting both challenges and opportunities for improving child health outcomes through targeted interventions and further research.

      Readers might consider the following questions:

      (1) Should investigators study the families through direct observation of diet and other factors to look for a connection between food taken in and gut microbiome and child development?

      As mentioned before, the ENANI-2019 did not collect data on stool derived microbiome. However, there is data on child dietary intake with 24-hour recall that can be further explored in other studies.

      (2) Can an examination of the mother's gut microbiome influence the child's microbiome? Can the mother or caregiver's microbiome influence early childhood development?

      The questions raised by the reviewer are interesting and have been explored by other authors. However, we do not have microbiota data from either the child or the mother/caregiver.

      (3) Is developmental quotient enough to study early childhood development? Is it comprehensive enough?

      Yes, we are confident it is comprehensive enough.

      According to the World Health Organization, the term Early Childhood Development (ECD) refers to the cognitive, physical, language, motor, social and emotional development between 0 and 8 years of age. The SWYC milestones assess the domains of cognition, language/communication and motor skills. Therefore, the instrument has enough content validity to represent ECD.

      The SWYC is recommended for screening ECD by the American Academy of Pediatrics. Furthermore, we assessed the internal consistency of the SWYC milestones questionnaire using ENANI-2019 data and Cronbach's alpha. The findings indicated satisfactory reliability (0.965; 95% CI: 0.963–0.968).
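      For readers unfamiliar with the internal-consistency statistic mentioned above, Cronbach's alpha compares the sum of the item variances with the variance of the total score. The sketch below uses a small hypothetical response matrix, not ENANI-2019 data, and does not reproduce the confidence interval reported in this letter.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(item_scores[0])
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 6 children x 3 milestone items (1 = achieved).
toy_responses = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]

alpha = cronbach_alpha(toy_responses)
print(f"Cronbach's alpha: {alpha:.3f}")
```

      Values close to 1 indicate that the items vary together across respondents, which is the sense in which the reported value of 0.965 is described as satisfactory.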

      The SWYC is a screening instrument and indicates whether ECD is within the expected range. If one of the above-mentioned domains is not achieved as expected, the child may be at risk of ECD delay. Therefore, DQ<1 indicates that a child has not reached the expected ECD for the age group. We cannot say that children with DQ≥1 have full ECD, since we do not assess the socio-emotional domains. However, DQ can track the risk of ECD delay.

      References

      Blekherman, G., Laubenbacher, R., Cortes, D. F., Mendes, P., Torti, F. M., Akman, S., ... & Shulaev, V. (2011). Bioinformatics tools for cancer metabolomics. Metabolomics, 7, 329-343.

      Gromski, P. S., Muhamadali, H., Ellis, D. I., Xu, Y., Correa, E., Turner, M. L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis–a marriage of convenience or a shotgun wedding. Analytica chimica acta, 879, 10-23.

      Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and intelligent laboratory systems, 58(2), 109-130.

      LUIZ, RR., and STRUCHINER, CJ. Inferência causal em epidemiologia: o modelo de respostas potenciais [online]. Rio de Janeiro: Editora FIOCRUZ, 2002. 112 p. ISBN 85-7541-010-5. Available from SciELO Books http://books.scielo.org.

      GREENLAND, S. & ROBINS, J. M. Identifiability, exchangeability, and epidemiological confounding. International Journal of Epidemiology, 15(3):413-419, 1986.

      Freitas-Costa NC, Andrade PG, Normando P, et al. Association of development quotient with nutritional status of vitamins B6, B12, and folate in 6–59-month-old children: Results from the Brazilian National Survey on Child Nutrition (ENANI-2019). The American journal of clinical nutrition 2023;118(1):162-73. doi: https://doi.org/10.1016/j.ajcnut.2023.04.026

      Sheldrick RC, Schlichting LE, Berger B, et al. Establishing New Norms for Developmental Milestones. Pediatrics 2019;144(6) doi: 10.1542/peds.2019-0374 [published Online First: 2019/11/16]

      Drachler Mde L, Marshall T, de Carvalho Leite JC. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test. Paediatric and perinatal epidemiology 2007;21(2):138-53. doi: 10.1111/j.1365-3016.2007.00787.x [published Online First: 2007/02/17]

      VanderWeele, TJ. Principles of confounder selection. Eur J Epidemiol 34, 211–219 (2019). https://doi.org/10.1007/s10654-019-00494-6

      David G. Kleinbaum, Lawrence L. Kupper; Hal Morgenstern. Epidemiologic Research: Principles and Quantitative Methods. 1991

      Yan R, Liu X, Xue R, Duan X, Li L, He X, Cui F, Zhao J. Association between internet exclusion and depressive symptoms among older adults: panel data analysis of five longitudinal cohort studies. EClinicalMedicine 2024;75. doi: 10.1016/j.eclinm.2024.102767.

      Zhong Y, Lu H, Jiang Y, Rong M, Zhang X, Liabsuetrakul T. Effect of homemade peanut oil consumption during pregnancy on low birth weight and preterm birth outcomes: a cohort study in Southwestern China. Glob Health Action. 2024 Dec 31;17(1):2336312.

      Aristizábal LYG, Rocha PRH, Confortin SC, et al. Association between neonatal near miss and infant development: the Ribeirão Preto and São Luís birth cohorts (BRISA). BMC Pediatr. 2023;23(1):125. Published 2023 Mar 18. doi:10.1186/s12887-023-03897-3

      Al-Haddad BJS, Jacobsson B, Chabra S, et al. Long-term risk of neuropsychiatric disease after exposure to infection in utero. JAMA Psychiatry. 2019;76(6):594-602. doi:10.1001/jamapsychiatry.2019.0029

      Chan, A.Y.L., Gao, L., Hsieh, M.HC. et al. Maternal diabetes and risk of attention-deficit/hyperactivity disorder in offspring in a multinational cohort of 3.6 million mother–child pairs. Nat Med 30, 1416–1423 (2024).

      Hernan MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

      Greenland S; Pearl J; Robins JM. Confounding and collapsibility in causal inference. Statist Sci. 14 (1) 29 - 46 1999. https://doi.org/10.1214/ss/1009211805

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary: 

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact outcompeted (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors greatly depends on the clone being able to grow to a certain size. The authors show instances of discrete clones where larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones prevented from competition. Other studies show that in mammalian cells, RasV12, scrib clones are capable of outcompeting the surrounding tissue, such as in Kohashi et al (2021), where cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi by an allele or by RNAi in the RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc over-expressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results. 

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hsFLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified and thus they are not compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified and the controls are missing (Myc over-expressing clones alone and Fmi clones alone). 

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4. We include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1 where we showed that fminull clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fminull clones or twin-spots.

      Regarding the quantification of apoptosis, we did not provide a quantification, in part because we observe a very clear visual difference between groups (Fig. 4A-K), and in part because it is challenging to come up with a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fminull UAS-Myc clones, we observe vastly more cell death within the fminull UAS-Myc clones and modest death in nearby wildtype cells, and consequently a much higher ratio of cell death inside vs outside the clone. Because of the somewhat arbitrary nature of quantification, and the dramatic difference, we initially chose not to provide a quantification. However, given the request, we chose an arbitrary distance from the clone boundary in which to consider dying cells and counted the numbers for each condition. We view this as a very soft quantification, but we nevertheless report it in a way that captures the phenomenon in the revised manuscript. 

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N). 

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason clones are bigger. As shown in (de la Cova et al, 2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to increased presence of WT cells. Also, (Moreno and Basler, 2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have shown these same results. It is well established then that increased cell proliferation transforms Myc clones into supercompetitors and that in the absence of cell competition, Myc-overexpressing discs produce instead wings larger than usual. 

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here. 

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike WT clones that have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to rate of cell division. 

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples (e.g., fmiE59 >> Myc, only 5 discs and fmiE59 vs >Myc only 4 discs are quantified but other samples have more than 10 discs). I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. If represented linearly, 1 means equal ratios of A and B, while 2A/B is 2 and A/2B is 0.5. And the higher the ratio difference between A and B, the starker this effect becomes, making a linear scale deceiving to the eye, especially when decreased ratios are shown. Using log(ratios), a value of 0 means equal ratios, and increased and decreased ratios deviate equally from 0.
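
      The symmetry argument above can be checked with a short numerical sketch. This is an illustration only, not the authors' analysis code: the areas are hypothetical, the helper name is made up for the example, and base 2 is assumed because the response does not state which logarithm base was used.

```python
import math


def log2_ratio(area_a: float, area_b: float) -> float:
    """Return log2(A/B); 0 means the two areas are equal (assumed base 2)."""
    return math.log2(area_a / area_b)


# Hypothetical clone areas in arbitrary units, not data from the manuscript.
doubled = (2.0, 1.0)   # A is twice B
halved = (1.0, 2.0)    # A is half of B

# On a linear scale the two cases sit at unequal distances from 1.
linear_doubled = doubled[0] / doubled[1]   # 2.0, i.e. 1.0 above equality
linear_halved = halved[0] / halved[1]      # 0.5, i.e. 0.5 below equality

# On a log scale the same two cases sit at equal distances from 0.
log_doubled = log2_ratio(*doubled)   # +1.0
log_halved = log2_ratio(*halved)     # -1.0

print(linear_doubled, linear_halved, log_doubled, log_halved)
```

      A two-fold increase and a two-fold decrease therefore appear as equal and opposite deviations only on the log scale, which is the property the response relies on.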

      Statistically, either analyzing a standardized number of discs for all conditions or a variable number not determined beforehand has no effect on the p-value, as long as the variable n number is not manipulated by p-hacking techniques, such as increasing the n of samples until a significant p-value has been obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs due to the inherent though modest variability of our automated cell counts, and we analyzed all the discs that we obtained from a given experiment, never “cherry-picking” examples. For the sake of transparency, all our graphs show individual values in addition to the distributions so that the reader knows the n values at a glance.

      (5) Figure 4 - shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, like an H99 deficiency, etc (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We updated all panels replacing Cas3 by Dcp-1. 

      As described above, cell death is a well-established consequence of Myc overexpression, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in “would-be” winners, and to what extent it reduces “would-be” winners’ ability to eliminate competitors are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that, despite that fact, they behave as winners and eliminate WT neighboring cells. As mentioned in our response to comment (1), WT clones generated in a 3x and 4x Myc background are eliminated and removed from the tissue, and blocking cell death increases the size of WT “loser” clones adjacent to Myc-overexpressing clones. 

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development? 

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We include this information and the relevant reference (Brown et al, 2014) in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary: 

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to delve into mechanism and are currently directing our efforts to exploring how Fmi regulates competition, but the size of the project and the required experiments are outside the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition. 

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree for several reasons. First, the effect of Fmi loss is specific to winners; loss of Fmi has no effect on its own or in losers confronting winners in competition. Second, in the RasV12 tumor model, loss of Fmi did not perturb whole eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so too is proliferation, and only in winners in competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryo stage, one cannot create whole disc tumors mutant for fmi. As an approximation to this condition, we have introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, there remains essentially a whole disc harboring fminull tumor. Indeed, this shows that whole fminull tumors overgrow similar to control tumors, confirming that the lack of Fmi only affects clonal tumors. We provide those results in the updated manuscript (Figure 1 Suppl 2 C-D).

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we considered worthwhile to examine. We performed immunostaining for Fmi in clones to determine whether its levels change during competition. Fmi is expressed ubiquitously at apical plasma membranes throughout the disc, and this was unchanged by competition, including inside >>Myc clones and at the clone boundary, where competition is actively happening. We provide these results as a new supplementary figure (Figure 5 Suppl 1) in the updated manuscript.

      Reviewer 3:

      Summary: 

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      Reviewer #2 (Public Review):

      M. El Amri et al. investigated the functions of Marcks and Marcks like 1 during spinal cord (SC) development and regeneration in Xenopus laevis. The authors rigorously performed loss-of-function experiments with morpholino knock-down and CRISPR knock-out, combined with rescue experiments, in the developing spinal cord in embryos and during regeneration at the tadpole stage.

      For the assays in the developing spinal cord, a unilateral approach (knock-down/out only one side of the embryo) allowed the authors to assess the gene functions by direct comparing one-side (e.g. mutated SC) to the other (e.g. wild type SC on the other side). For the assays in regenerating SC, the authors microinject CRISPR reagents into 1-cell stage embryo. When the embryo (F0 crispants) grew up to tadpole (stage 50), the SC was transected. They then assessed neurite outgrowth and progenitor cell proliferation. The validation of the phenotypes was mostly based on the quantification of immunostaining images (neurite outgrowth: acetylated tubulin, neural progenitor: sox2, sox3, proliferation: EdU, PH3), that are simple but robust enough to support their conclusions. In both SC development and regeneration, the authors found that Marcks and Marcksl1 were necessary for neurite outgrowth and neural progenitor cell proliferation.

      The authors performed rescue experiments on morpholino knock-down and CRISPR knock-out conditions by Marcks and Marcksl1 mRNA injection for SC development and pharmacological treatments for SC development and regeneration. The unilateral mRNA injection rescued the loss-of-function phenotype in the developing SC. To explore the signalling role of these molecules, they rescued the loss-of-function animals with pharmacological reagents. They used S1P (PLD activator), FIPI (PLD inhibitor), NMI (PIP2 synthesis activator) and ISA-2011B (PIP2 synthesis inhibitor). The authors found the activator treatment rescued neurite outgrowth and progenitor cell proliferation in loss-of-function conditions. From these results, the authors proposed PIP2 and PLD are the mediators of Marcks and Marcksl1 for neurite outgrowth and progenitor cell proliferation during SC development and regeneration. The results of the rescue experiments are particularly important to assess gene functions in loss-of-function assays; therefore, the conclusions are solid. In addition, they performed gain-of-function assays by unilateral Marcks or Marcksl1 mRNA injection showing that the injected side of the SC had more neurite outgrowth and proliferative progenitors. The conclusions are consistent with the loss-of-function phenotypes and the rescue results. Importantly, the authors showed the linkage of the phenotype and functional recovery by behavioral testing, which clearly showed that the crispants with SC injury swam a shorter distance than wild types with SC injury at 10 days post surgery.

      Prior to the functional assays, the authors analyzed the expression pattern of the genes by in situ hybridization and immunostaining in the developing embryo and regenerating SC. They confirmed that the amount of protein expression was significantly reduced in the loss-of-function samples by immunostaining with the specific antibodies that they made for Marcks and Marcksl1. Although the expression patterns during embryogenesis are mostly known from previous work, the data provided appropriate information to readers about the expression and showed the efficiency of the knock-out as well.

      MARCKS family genes have been known to be expressed in the nervous system. However, few studies focus on the function in nerves. This research introduced these genes as new players during SC development and regeneration. These findings could attract broader interests from the people in nervous disease model and medical field. Although it is a typical requirement for loss of function assays in Xenopus laevis, I believe that the efficient knock-out for four genes by CRISPR/Cas9 was derived from their dedication of designing, testing and validation of the gRNAs and is exemplary.

      Weaknesses:

      (1) Why did the authors choose Marcks and Marcksl1? The authors mentioned that these genes were identified with a recent proteomic analysis of comparing SC regenerative tadpole and non-regenerative froglet (Line (L) 54-57). However, although it seems the proteomic analysis was their own dataset, the authors did not mention any details to select promising genes for the functional assays (this article). In the proteomic analysis, there must be other candidate genes that might be more likely factors related to SC development and regeneration based on previous studies, but it was unclear what the criteria to select Marcks and Marcksl1 was.

      To highlight the rationale for selecting these proteins, we reworded the sentence as follows: “A recent proteomic screen … after SCI identified a number of proteins that are highly upregulated at the tadpole stage but downregulated in froglets (Kshirsagar, 2020). These proteins included Marcks and Marcksl1, which had previously been implicated in the regeneration of other tissues (El Amri et al., 2018) suggesting a potential role for these proteins also in spinal cord regeneration.”

      (2) Gene knock-out experiments with F0 crispants,

      The authors described that they designed and tested 18 sgRNAs to find the most efficient and consistent gRNA (L191-195). However, it cannot guarantee the same phenotypes practically, due to, for example, different injection timing, different strains of Xenopus laevis, etc. Although the authors mentioned the concerns of mosaicism by themselves (L180-181, L289-292) and immunostaining results nicely showed uniformly reduced Marcks and Marcksl1 expression in the crispants, they did not refer to this issue explicitly.

      To address this issue, we state explicitly in line 208-212: “We also confirmed by immunohistochemistry that co-injection of marcks.L/S and marcksl1.L/S sgRNA, which is predicted to edit all four homeologs (henceforth denoted as 4M CRISPR) drastically reduced immunostaining for Marcks and Marcksl1 protein on the injected side (Fig. S6 B-G), indicating that protein levels are reduced in gene-edited embryos.”

      (3) Limitations of pharmacological compound rescue

      In the methods part, the authors describe that they performed titration experiments for the drugs (L702-704), which is a minimal requirement for this type of assay. However, it is known that even a well-characterized drug, if used at different concentrations, could target different molecules (Gujral TS et al., 2014 PNAS). Therefore, it is difficult to eliminate possibilities of side effects and off-target effects by testing only a few compounds.

      As explained in the responses to reviewer 1, we have completely rewritten and toned down our presentation of the pharmacological result and explicitly mention in our discussion now the possibility of side effects.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. Generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.

      We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.

      We agree that the sample size being smaller than planned due to the pandemic restrictions is a weakness for this study, and hope that future studies into cholinergic effects on motivation in humans will use larger sample sizes. They should also ensure women are not excluded from sample populations, which will become even more important if the research progresses to clinical populations.

      Reviewer #3 (Public review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within subject pharmacological design and a task well designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.

      Weaknesses:

      A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to covid). Nonetheless, it is worth stating explicitly that this sample size is relatively small for the effect sizes typically observed in such studies highlighting the need for future confirmatory studies.

      We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.

      We agree that the small sample size is a weakness of the study, and hope that future work into cholinergic modulation of motivation can involve larger samples to replicate and extend this work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments and clarifying the analysis sections. Women can be included in such studies by performing a pregnancy test before each test session, but I understand how this could have added to the pandemic limitations. Best of luck with your future work!

      Thank you for your time in reviewing this paper, and your helpful comments.

      Reviewer #3 (Recommendations for the authors):

      The authors have done a great job at addressing my concerns and I think that the manuscript is now very solid. That said, I have one minor concern.

      Thank you for your time in reviewing this paper, and your helpful comments.

      For descriptions of mass univariate analyses and cluster correction, I am still a bit confused on exactly what terms were in the regression. In one place, the authors state:

      On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model 'variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)'.

      I take this to mean that the regression model includes a voltage regressor and a three-way interaction term, along with participant level intercept terms.

      However, elsewhere, the authors state:

      "We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant."

      I take this to mean that the regression model included regressors for incentive, distractorPresent, THP, along with their 2 and 3 way interactions. I think that this seems like the more reasonable model - but I just want to 1) verify that this is what the authors did and 2) encourage them to articulate this more clearly and consistently throughout.

      We apologise for the lack of clarity about the whole-brain regression analyses.

      We used Wilkinson notation for this formula, where ‘A*B’ denotes ‘A + B + A:B’, so all main effects and lower-order interaction terms were included in the regression, as your second interpretation says. The model written out in full would be:

      'variable ~ 1 + voltage + incentive + distractorPresent + THP + incentive:distractorPresent + incentive:THP + distractorPresent:THP + incentive:distractorPresent:THP + (1 | participant)'

      We will clarify this in the Version of Record.
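
      As an illustration of the Wilkinson expansion described in this response, the following minimal Python sketch expands a crossed term such as ‘A*B*C’ into its main effects and interactions. The helper name expand_star is hypothetical and is not part of the authors' analysis code; it only handles plain ‘*’ crossings.

```python
from itertools import combinations


def expand_star(term):
    """Expand a Wilkinson crossing such as 'a*b*c' into main effects
    and all interactions, written with ':' (hypothetical helper)."""
    factors = [f.strip() for f in term.split("*")]
    expanded = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            expanded.append(":".join(combo))
    return expanded


terms = expand_star("incentive*distractorPresent*THP")
# Join the expanded terms the way they would appear in a formula.
print(" + ".join(terms))
```

      The three-way crossing yields seven terms: three main effects, three two-way interactions, and one three-way interaction.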


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors used a motivated saccade task with distractors to measure response vigor and reaction time (RT) in healthy human males under placebo or muscarinic antagonism. They also simultaneously recorded neural activity using EEG with event-related potential (ERP) focused analyses. This study provides evidence that the muscarinic antagonist Trihexyphenidyl (THP) modulates the motivational effects of reward on both saccade velocity and RT, and also increases the distractibility of participants. The study also examined the correlational relationships between reaction time and vigor and manipulations (THP, incentives) with components of the EEG-derived ERPs. While an interesting correlation structure emerged from the analyses relating the ERP biomarkers to behavior, it is unclear how these potentially epiphenomenal biomarkers relate to relevant underlying neurophysiology.

      Strengths:

      This study is a logical translational extension from preclinical findings of cholinergic modulation of motivation and vigor and the CNV biomarker to a normative human population, utilizing a placebo-controlled, double-blind approach.

      While framed in the context of Parkinson's disease where cholinergic medications can be used, the authors do a good job in the discussion describing the limitations in generalizing their findings obtained in a normative and non-age-matched cohort to an aged PD patient population.

      The exploratory analyses suggest alternative brain targets and/or ERP components that relate to the behavior and manipulations tested. These will need to be further validated in an adequately powered study. Once validated, the most relevant biomarkers could be assessed in a more clinically relevant population.

      Weaknesses:

      The relatively weak correlations between the main experimental outcomes provide unclear insight into the neural mechanisms by which the manipulations lead to behavioral manifestations outside the context of the ERP. It would have been interesting to evaluate how other quantifications of the EEG signal through time-frequency analyses relate to the behavioral outcomes and manipulations.

      The ERP correlations to relevant behavioral outcomes were not consistent across manipulations demonstrating they are not reliable biomarkers to behavior but do suggest that multiple underlying mechanisms can give rise to the same changes in the ERP-based biomarkers and lead to different behavioral outcomes.

      We thank the reviewer for their review and their comments.

      We agree that these ERPs may not be reliable biomarkers yet, given the many-to-one mapping we observed where incentives and THP antagonism both affected the CNV in different ways, and hope that future studies will help clarify the use and limitations of the CNV as a potential biomarker of invigoration.

      Our original hypothesis was specifically about the CNV as an index of preparatory behaviour, but we plan to look at potential changes to frequency characteristics in future work. We have included this in the discussion of future investigations. (page 16, line 428):

      “Future investigations of other aspects of the EEG signals may illuminate us. Such studies could also investigate other potential signals that may be more sensitive to invigoration and/or muscarinic antagonism, including frequency-band power and phase-coherence, or measures of variability in brain signals such as entropy, which may give greater insight into processes affected by these factors.”

      Reviewer #2 (Public Review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. The generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.

      We thank the reviewer for their review, and their comments.

      We agree that our study was underpowered, not reaching our target of 27 participants due to pandemic restrictions halting our recruitment, and hope that future studies into muscarinic antagonism in motivation will have larger sample sizes, and include male and female participants across a range of ages, to assess generalisability.

      We only included men to prevent the chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category Class C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”

      While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.

      We have updated the Methods/Drugs section to explain this (page 17, line 494):

      “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”

      And we refer to this in the Methods/Participants section (page 18, line 501):

      “We recruited 27 male participants (see Drugs section above),…”

      We agree that future work is needed to replicate this in different samples, and that this work cannot tell us the mechanism by which the drug is dampening invigoration, but we think that showing these effects do occur and can be linked to anticipatory/preparatory activity rather than overall reward sensitivity is a useful finding.

      Reviewer #3 (Public Review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within-subject pharmacological design and a task well-designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.

      Weaknesses:

      In full disclosure, I have previously reviewed this manuscript in another journal and the authors have done a considerable amount of work to address my previous concerns. However, I have a few remaining concerns that affect my interpretation of the current manuscript.

      Some of the EEG signals (figures 4A&C) have profiles that look like they could have ocular, rather than central nervous, origins. Given that this is an eye movement task, it would be useful if the authors could provide some evidence that these signals are truly related to brain activity and not driven by ocular muscles, either in response to explicit motor effects (ie. Blinks) or in preparation for an upcoming saccade.

      We thank the reviewer for re-reviewing the manuscript and for raising this issue.

      All the EEG analyses (both ERP and whole-brain) examine the preparation period between the ready-cue and target appearance, when no eye-movements are required. We reject trials with blinks or saccades over 1 degree in size, as detected by the Eyelink software according to the sensitive velocity and acceleration criteria specified in the manuscript (Methods/Eye-tracking, page 19, line 550). This means that there should be no overt eye movements in the data. However, microsaccades and ocular drift are still possible within this period, which indeed could drive some effects. To measure this, we counted the number of microsaccades (<1 degree in size) in the preparation period between incentive cue and the target onset, for each trial. Further, we measured the mean absolute speed of the eye during the preparation period (excluding the periods during microsaccades) for each trial.
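
      For illustration only, the two trial-wise eye-movement covariates described above could be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: the function name, the gaze samples in degrees, the sampling rate, and the representation of microsaccades as inclusive sample-index ranges are all hypothetical.

```python
from math import hypot


def mean_drift_speed(x, y, sample_rate_hz, microsaccades):
    """Mean absolute eye speed (deg/s) over one trial's preparation period,
    skipping any sample-to-sample step that touches a microsaccade.

    x, y           : gaze position per sample, in degrees (assumed units)
    microsaccades  : list of (start_index, end_index) inclusive ranges
    """
    excluded = set()
    for start, end in microsaccades:
        excluded.update(range(start, end + 1))
    speeds = []
    for i in range(1, len(x)):
        if i in excluded or (i - 1) in excluded:
            continue  # step overlaps a microsaccade, so it is not drift
        step = hypot(x[i] - x[i - 1], y[i] - y[i - 1])
        speeds.append(step * sample_rate_hz)
    return sum(speeds) / len(speeds) if speeds else float("nan")


# Toy trial: slow drift with one abrupt jump at sample 3.
x = [0.00, 0.01, 0.02, 0.50, 0.51, 0.52]
y = [0.0] * 6
microsaccades = [(3, 3)]
n_microsaccades = len(microsaccades)  # first covariate: count per trial
speed = mean_drift_speed(x, y, 1000, microsaccades)  # second covariate
print(n_microsaccades, speed)
```

      Each covariate would then enter the trial-wise regression as an additional predictor alongside the voltage term.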

      We have run a control analysis to check whether including ocular drift speed or number of microsaccades as a covariate in the whole-brain regression analysis changes the association between EEG and the behavioural metrics at frontal or other electrodes. Below we show these ‘variable ~ EEG’ beta-coefficients when controlling for each eye-movement covariate, in the same format as Figure 4. We did not run the permutation testing on this due to time/computational costs (it takes >1 week per variable), so p-values were not calculated, only the beta-coefficients. The beta-coefficients are almost unchanged, both in time-course and topography, when controlling for either covariate.  The frontal associations to velocity and distractor pull remain, suggesting they are not due to these eye movements.

      We have added this figure as a supplemental figure.

      For additional clarity in this response, we also plot the differences between these covariate-controlled beta-coefficients, and the true beta-coefficients from figure 4 (please note the y-axis scales are -0.02:0.02, not -0.15:0.15 as in Figure 4 and Figure 4-figure supplement 2). This shows that the changes to the associations between EEG and velocity/distractor-pull were not frontally-distributed, demonstrating eye-movements were not driving these effects. Relatedly, the RT effect’s change was frontally-distributed, despite Figure 4 showing the true relationship was central in focus, again indicating that effect was also not related to these eye movements.

      Author response image 1.

      Difference in beta-coefficients when eye-movement covariates are included. This is the difference from the beta-coefficients shown in Figure 4, please note the smaller y-axis limits.

      The same pattern was seen if we controlled for the change in eye-position from the baseline period (measured by the eye-tracker) at each specific time-point, i.e., controlling for the distance the eye had moved from baseline at the time the EEG voltage is measured. The topographies and time-course plots were almost identical to the above ones:

      Author response image 2.

      Controlling for change in eye-position at each time-point does not change the regression results. Left column shows the beta-coefficients between the variable and EEG voltage, and the right column shows the difference from the main results in Figure 4 (note the smaller y-axis limits for the right-hand column).

      Therefore, we believe the brain-behaviour regressions are independent of eye-movements. We have included the first figure presented here as an additional supplemental figure, and added the following to the text (page 10, line 265):

      “An additional control analysis found that these results were not driven by microsaccades or ocular drift during the preparation period, as including these as trial-wise covariates did not substantially change the beta-coefficients (Figure 4 – Figure Supplement 2).”

      For other EEG signals, in particular, the ones reported in Figure 3, it would be nice to see what the spatial profiles actually look like - does the scalp topography match that expected for the signal of interest?

      Yes, the CNV is a central negative potential peaking around Cz, while the P3a is slightly anterior to this (peaking between Cz and FCz). We have added the topographies to the main figure (see point below).

      This is the topography of the mean CNV (1200:1500ms from the preparation cue onset), which is maximal over Cz, as expected.

      The P3a’s topography (200:280ms after preparation cue) is maximal slightly anterior to Cz, between Cz and FCz.

      A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to COVID). That said, they only report the sample size in one place in the methods rather than through degrees of freedom in their statistical tests conducted throughout the results. In part because of this, I am not totally clear on whether the sample size for each analysis is the same - or whether participants were removed for specific analyses (ie. due to poor EEG recordings, for example).  

      We apologise for the lack of clarity here. All 20 participants were included in all analyses, although the number of trials included differed between behavioural and EEG analyses. We only excluded trials with EEG artefacts from the EEG analyses, not from the purely behavioural analyses such as Figures 1&2, although trials with blinks/saccades were removed from behavioural analyses too. Removing the EEG artefactual trials from the behavioural analyses did not change the findings, despite the lower power. The degrees of freedom in the figure supplement tables are the total number of trials (less 8 fixed-effect terms) included in the single-trial / trial-wise regression analyses we used.

      We have clarified this in the Methods/Analysis (page 20, line 602):

      “Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.”

      And we state the number of participants and trials in the start of the behavioural results (page 3, line 97):

      “We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT.”

      and EEG results section (page 7, line 193):

      “We used single-trial linear mixed-effects regression to see the effects of Incentive and THP on each ERP (20 participants, 16627 trials; Distractor was included too, along with all interactions, and a random intercept by participant).”

      Beyond this point, but still related to the sample size, in some cases I worry that results are driven by a single subject. In particular, the interaction effect observed in Figure 1e seems like it would be highly sensitive to the single subject who shows a reverse incentive effect in the drug condition.

      Repeating that analysis after removing the participant with the large increase in saccadic RT with incentives did not remove the incentive*THP interaction effect, although it did weaken slightly from (β = 0.0218, p = .0002) to (β = 0.0197, p = .0082). This is likely because, while that participant did have slower RTs for higher incentives on THP, they were also slower for higher incentives under placebo (and similarly for distractor present/absent), making them less of an outlier in terms of effects than in raw RT terms. Below, Author response image 3 shows the mean figure without that participant, and Author response image 4 shows that participant separately.

      Author response image 3.

      Author response image 4.

      There are not sufficient details on the cluster-based permutation testing to understand what the authors did or whether it is reasonable. What channels were included? What metric was computed per cluster? How was the null distribution generated?

      We apologise for not giving sufficient details of this, and have updated the Methods/Analysis section to include these details, along with a brief description in the Results section.

      To clarify here, we adapted the DMGroppe Mass Univariate Testing toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour – i.e. does adding the voltage at this time/channel explain additional variance in the variable not captured in our main behavioural analyses. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution of cluster mass (across times/channels per iteration), and calculated the p-value as the proportion of this distribution further from zero than the absolute true t-statistics (two-tailed test).

      We have given greater detail for this in the Methods/Analysis section (page 20, line 614):

      “We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.”

      And we have added a brief explanation to the Results section also (page 9, line 246):

      “We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant. This analysis therefore asks whether trial-to-trial neural variability predicts behavioural variability. To assess significance, we used cluster-based permutation tests (DMGroppe Mass Univariate toolbox; Groppe, Urbach, & Kutas, 2011), shuffling the trials within each condition and person, and repeating it 2500 times, to build a null distribution of ‘cluster mass’ from the t-statistics (Bullmore et al., 1999; Maris & Oostenveld, 2007) which was used to calculate two-tailed p-values with a family-wise error rate (FWER) of .05 (see Methods/Analysis for details).”
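
      To make the logic of the cluster-mass permutation test concrete, here is a deliberately simplified Python sketch. It is not the toolbox implementation or the authors' code. It assumes a single channel, so clusters form over adjacent time samples only; it uses a simple regression of the behavioural variable on voltage, with no incentive, distractor, or THP covariates and no random effects; it uses a fixed cluster-forming threshold of |t| > 2; and it uses synthetic data. All function names are hypothetical.

```python
import random
from statistics import mean


def slope_t(x, y):
    """t-statistic for the slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    resid_ss = sum((b - my - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope / ((resid_ss / (n - 2) / sxx) ** 0.5)


def max_cluster_mass(t_values, threshold):
    """Largest |sum of t| over runs of adjacent, same-sign, supra-threshold samples."""
    best, run, sign = 0.0, 0.0, 0
    for t in t_values:
        s = (t > threshold) - (t < -threshold)
        if s != 0 and s == sign:
            run += t  # extend the current cluster
        else:
            run, sign = (t if s != 0 else 0.0), s  # start a new cluster or reset
        best = max(best, abs(run))
    return best


def cluster_permutation_p(voltage, behaviour, strata, n_perm=200, threshold=2.0, seed=0):
    """voltage[trial][time]. Whole voltage time courses are shuffled across
    trials within each stratum (standing in for condition-by-person cells)."""
    rng = random.Random(seed)
    n_time = len(voltage[0])

    def stat(volt):
        ts = [slope_t([row[t] for row in volt], behaviour) for t in range(n_time)]
        return max_cluster_mass(ts, threshold)

    observed = stat(voltage)
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    null = []
    for _ in range(n_perm):
        shuffled = list(voltage)
        for idx in groups.values():
            perm = idx[:]
            rng.shuffle(perm)
            for src, dst in zip(idx, perm):
                shuffled[dst] = voltage[src]
        null.append(stat(shuffled))
    # The +1 correction is a common convention assumed here, not taken from the response.
    p = (1 + sum(m >= observed for m in null)) / (1 + n_perm)
    return observed, p


# Synthetic data: voltage tracks behaviour only at time samples 3 to 6.
rng = random.Random(1)
n_trials, n_time = 120, 10
strata = [i % 4 for i in range(n_trials)]
behaviour = [rng.gauss(0, 1) for _ in range(n_trials)]
voltage = [[(0.8 * behaviour[i] if 3 <= t <= 6 else 0.0) + rng.gauss(0, 1)
            for t in range(n_time)] for i in range(n_trials)]
observed, p = cluster_permutation_p(voltage, behaviour, strata)
print(observed, p)
```

      Shuffling breaks the trial-wise link between voltage and behaviour while keeping each stratum's trials together, so the maximum cluster mass per permutation gives a null distribution that controls the family-wise error rate across time samples.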

      The authors report that "muscarinic antagonism strengthened the P3a" - but I was unable to see this in the data plots. Perhaps it is because the variability related to individual differences obscures the conditional differences in the plots. In this case, event-related difference signals could be helpful to clarify the results.

      We thank the reviewer for spotting this wording error. This should refer to the incentive effect weakening the P3a, as no other significant effects were found on the P3a, as stated correctly in the previous paragraph. We have corrected this in the manuscript (page 9, line 232):

      “This suggests that while incentives strengthened the incentive-cue response and the CNV and weakened the P3a, muscarinic antagonism strengthened the CNV,”

      The reviewer’s suggestion for difference plots is very valuable, and we have added these to Figure 3, as well as increasing the y-axis scale for figure 3c to show the incentives weakening the P3a more clearly, and adding the topographies suggested in an earlier comment. The difference waves for Incentive and THP effects show that both are decreasing voltage, albeit with slightly different onset times – Incentive starts earlier, thus weakening the positive P3a, while both strengthen the negative CNV. The Incentive effects within THP and Placebo separately illustrate the THP*Incentive interaction.

      We have amended the Results text and figure (page 7, line 200):

      “The subsequent CNV was strengthened (i.e. more negative; Figure 3d) by incentive (β = -.0928, p < .0001) and THP (β = -0.0502, p < .0001), with an interaction whereby THP decreased the incentive effect (β = 0.0172, p = .0213). Figure 3h shows the effects of Incentive and THP on the CNV separately, using difference waves, and Figure 3i shows the incentive effect grows more slowly in the THP condition than the Placebo condition.”

      For mediation analyses, it would be useful in the results section to have a much more detailed description of the regression results, rather than just reporting things in a binary did/did not mediate sort of way. Furthermore, the methods should also describe how mediation was tested statistically (ie. What is the null distribution that the difference in coefficients with/without moderator is tested against?).

      We have added a more detailed explanation of how we investigated mediation and mediated moderation, and now report the mediation effects for all tests run and the permutation-test p-values.

      We had been using the Baron & Kenny (1986) method, based on 4 tests outlined in the updated text below, which gives a single measure of change in absolute beta-coefficients when all the tests have been met, but without any indication of significance; any reduction found after meeting the other 3 tests indicates a partial mediation under this method. We now use permutation testing to generate a p-value for the likelihood of finding an equal or larger reduction in the absolute beta-coefficients if the CNV were not truly related to RT. This found that the CNV’s mediation of the Incentive effect on RT was highly significant, while the Mediated Moderation of CNV on THP*Incentive was weakly significant.
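
      The permutation test for mediation described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: it uses ordinary least squares rather than a mixed-effects model, omits the Incentive*CNV interaction and the random intercept, and runs on synthetic data with arbitrary effect signs. The function names are hypothetical.

```python
import random
from statistics import mean


def incentive_beta(x, y, m=None):
    """OLS coefficient of x when predicting y, optionally controlling for m."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    sxy = sum(a * b for a, b in zip(dx, dy))
    if m is None:
        return sxy / sxx
    mm = mean(m)
    dm = [c - mm for c in m]
    smm = sum(c * c for c in dm)
    sxm = sum(a * c for a, c in zip(dx, dm))
    smy = sum(c * b for c, b in zip(dm, dy))
    # Closed-form partial coefficient for a two-predictor regression.
    return (sxy * smm - sxm * smy) / (sxx * smm - sxm ** 2)


def mediation_permutation(x, m, y, participant, n_perm=500, seed=0):
    """Change in |beta| for x when m is added, with a one-tailed permutation
    p-value obtained by shuffling m within each participant."""
    rng = random.Random(seed)
    total = abs(incentive_beta(x, y))
    observed = abs(incentive_beta(x, y, m)) - total  # negative means reduction
    groups = {}
    for i, p in enumerate(participant):
        groups.setdefault(p, []).append(i)
    count = 0
    for _ in range(n_perm):
        shuffled = list(m)
        for idx in groups.values():
            vals = [m[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                shuffled[i] = v
        change = abs(incentive_beta(x, y, shuffled)) - total
        if change <= observed:  # equal or more negative than observed
            count += 1
    return observed, count / n_perm


# Synthetic data: 20 participants x 20 trials; the mediator carries part of the effect.
rng = random.Random(1)
participant = [i // 20 for i in range(400)]
incentive = [rng.choice([0.0, 1.0]) for _ in range(400)]
cnv = [x + rng.gauss(0, 1) for x in incentive]
rt = [0.5 * x + 0.5 * m + rng.gauss(0, 1) for x, m in zip(incentive, cnv)]
change, p = mediation_permutation(incentive, cnv, rt, participant)
print(change, p)
```

      Because the mediator is shuffled while the same trials stay in every model, each permutation gives the change in coefficient expected when the mediator has no trial-wise relationship to the outcome.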

      During this re-analysis, we noticed that we had different trial-numbers in the different regression models, as EEG-artefactual trials were not excluded from the behavioural-only model (‘RT ~ 1 + Incentive’). However, this causes issues with the permutation testing as we are shuffling the ERPs and need the same trials included in all the mixed-effects models. Therefore, we have redone these mediation analyses, including only the trials with valid ERP measures (i.e. no artefactual trials) in all models. This has changed the beta-coefficients we report, but not the findings or conclusions of the mediation analyses. We have updated the figure to have these new statistics.

      We have updated the text to explain the methodology in the Results section (page 12, line 284):

      “We have found that neural preparatory activity can predict residual velocity and RT, and is also affected by incentives and THP. Finally, we ask whether the neural activity can explain the effects of incentives and THP, through mediation analyses. We used the Baron & Kenny (1986) method to assess mediation (see Methods/Analysis for full details). This tests whether the significant Incentive effect on behaviour could be partially reduced (i.e., explained) by including the CNV as a mediator in a mixed-effects single-trial regression. We measured mediation as the reduction in (absolute) beta-coefficient for the incentive effect on behaviour when the CNV was included as a mediator (i.e., RT ~ 1 + Incentive + CNV + Incentive*CNV + (1 | participant)). This is a directional hypothesis of a reduced effect, and to assess significance we ran a permutation-test, shuffling the CNV within participants, and measuring the change in absolute beta-coefficient for the Incentive effect on behaviour. This generates a distribution of mediation effects where there is no relationship between CNV and RT on a trial (i.e., a null distribution). We ran 2500 permutations, and calculated the proportion with an equal or more negative change in absolute beta-coefficient, equivalent to a one-tailed test. We ran this mediation analysis separately for the two behavioural variables of RT and residual velocity, but not for distractor pull as it was not affected by incentive, so failed the assumptions of mediation analyses (Baron & Kenny, 1986; Muller et al., 2005). We took the mean CNV amplitude from 1200:1500ms as our Mediator.

      Residual velocity passed all the assumption tests for Mediation analysis, but no significant mediation was found. That is, Incentive predicted velocity (β=0.1304, t(1,16476)=17.3280, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted velocity when included alongside Incentive (β=0.0015, t(1,16475)=1.9753, p=.0483). However, including CNV did not reduce the Incentive effect on velocity, and in fact strengthened it (β=0.1318, t(1,16475)=17.4380, p<.0001; change in absolute coefficient: Δβ=+0.0014). Since there was no mediation (reduction), we did not run permutation tests on this.

      However, RT did show a significant mediation of the Incentive effect by CNV: Incentive predicted RT (β=-0.0868, t(1,16476)=-14.9330, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted RT when included alongside Incentive (β=0.0127, t(1,16475)=21.3160, p<.0001). The CNV mediated the effect of Incentive on RT, reducing the absolute beta-coefficient (β=-0.0752, t(1,16475)=-13.0570, p<.0001; change in absolute coefficient: Δβ= -0.0116). We assessed the significance of this change via permutation testing, shuffling the CNV across trials (within participants) and calculating the change in absolute beta-coefficient for the Incentive effect on RT when the permuted CNV was included as a mediator. We repeated this 2500 times to build a null distribution of Δβ, and calculated the proportion with equal or stronger reductions for a one-tailed p-value, which was highly significant (p<.0001). This suggests that the Incentive effect on RT is partially mediated by the CNV’s amplitude during the preparation period, and this is not the case for residual velocity.

      We also investigated whether the CNV could explain the cholinergic reduction in motivation (THP*Incentive interaction) on RT – i.e., whether the CNV mediated the THP moderation. We measured Mediated Moderation as suggested by Muller et al. (2005; see Methods/Analysis for full explanation): Incentive*THP was associated with RT (β=0.0222, t(1,16474)=3.8272, p=.0001); and Incentive*THP was associated with CNV (β=0.1619, t(1,16474)=2.1671, p=.0302); and CNV*THP was associated with RT (β=0.0014, t(1,16472)=2.4061, p=.0161). Mediated Moderation was measured by the change in absolute Incentive*THP effect when THP*CNV was included in the mixed-effects model (β=0.0214, t(1,16472)=3.7298, p=.0002; change in beta-coefficient: Δβ= -0.0008), and permutation-testing (permuting the CNV as above) found a significant effect (p=.0132). This indicates cholinergic blockade changes how incentives affect preparatory negativity, and how this negativity reflects RT, which can explain some of the reduced invigoration of RT. However, this was not observed for saccade velocity.”

      And we have updated the Methods/Analysis section with a more detailed explanation too (page 21, line 627):

      “For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires 4 tests be met for the outcome (behavioural variable, e.g. RT), mediator (ERP, e.g., CNV) and the treatment (Incentive):

      (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))

      (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))

      (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))

      (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)

      The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or smaller than the true values (as Mediation is a one-tailed prediction).

      Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):

      (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))

      Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or smaller than the true change.”
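
      The permutation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: it uses plain Python, it replaces the random intercept (1 | participant) with within-participant mean-centring to avoid a mixed-model dependency, and all function names are hypothetical. It returns the change in the absolute Incentive coefficient (Δβ) and a one-tailed p-value, computed as the proportion of permuted Δβ values that are equal to or more negative than the observed one.

```python
import random

def demean_within(values, groups):
    """Subtract each participant's mean (assumed stand-in for a random intercept)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def slope_simple(x, y):
    """OLS slope of y ~ x for mean-centred data (model #1)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def slope_with_covariate(x, m, y):
    """OLS coefficient of x in y ~ x + m for mean-centred data (model #3)."""
    sxx = sum(a * a for a in x)
    smm = sum(a * a for a in m)
    sxm = sum(a * b for a, b in zip(x, m))
    sxy = sum(a * b for a, b in zip(x, y))
    smy = sum(a * b for a, b in zip(m, y))
    return (sxy * smm - sxm * smy) / (sxx * smm - sxm ** 2)

def shuffle_within(values, groups, rng):
    """Permute values across trials, separately within each participant."""
    idx_by_group = {}
    for i, g in enumerate(groups):
        idx_by_group.setdefault(g, []).append(i)
    out = list(values)
    for idx in idx_by_group.values():
        shuffled = [values[i] for i in idx]
        rng.shuffle(shuffled)
        for i, v in zip(idx, shuffled):
            out[i] = v
    return out

def mediation_permutation_test(treatment, mediator, outcome, groups,
                               n_perm=2500, seed=0):
    """Return (delta_beta, one-tailed p) for the reduction in |treatment effect|."""
    rng = random.Random(seed)
    x = demean_within(treatment, groups)
    m = demean_within(mediator, groups)
    y = demean_within(outcome, groups)
    b_total = abs(slope_simple(x, y))
    observed = abs(slope_with_covariate(x, m, y)) - b_total
    null = []
    for _ in range(n_perm):
        m_perm = shuffle_within(m, groups, rng)  # breaks the mediator-outcome link
        null.append(abs(slope_with_covariate(x, m_perm, y)) - b_total)
    p_value = sum(d <= observed for d in null) / n_perm
    return observed, p_value
```

      The same function could be applied to mediated moderation by passing the Incentive*THP term as the treatment and adding the ERP*THP term to the model, which this two-predictor sketch does not cover.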

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) The analysis section could benefit from greater detail. For example, how exactly did they assess that the effects of the drug on peak velocity and RT were driven by non-distracting trials? Ideally, for every outcome, the analysis approach used should be detailed and justified.

      We apologise for the confusion. To clarify, we found a 2-way interaction (incentive*THP) on both residual velocity and saccadic RT; this pattern was stronger in distractor-absent trials for residual velocity, and stronger in distractor-present trials for saccadic RT, as can be seen in Figure 1d&e. However, as there was no significant 3-way interaction (incentive*THP*distractor) for either metric, and the 2-way interaction effects were in the same direction in distractor present/absent trials for both metrics, we think these effects were relatively unaffected by distractor presence.

      We have updated the Results section to make this clearer: (page 3, line 94):

      We measured vigour as the residual peak velocity of saccades within each drug session (see Figure 1c & Methods/Eye-tracking), which is each trial’s deviation of velocity from the main sequence. This removes any overall effects of the drug on saccade velocity, while still allowing incentives and distractors to have different effects within each drug condition. We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT. As predicted, residual peak velocity was increased by incentives (Figure 1d; β = 0.1266, p < .0001), while distractors slightly slowed residual velocity (β = -0.0158, p = .0294; see Figure 1 – Figure supplement 1 for full behavioural statistics). THP decreased the effect of incentives on velocity (incentive * THP: β = -0.0216, p = .0030), indicating that muscarinic blockade diminished motivation by incentives. Figure 1d shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was absent; the 3-way (distractor*incentive*THP) interaction was not significant (p > .05), suggesting that the distractor-present trials had the same effect but weaker (Figure 1d).

      Saccadic RT (time to initiation of saccade) was slower when participants were given THP (β = 0.0244, p < .0001), faster with incentives (Figure 1e; β = -0.0767, p < .0001), and slowed by distractors (β = 0.0358, p < .0001). Again, THP reduced the effects of incentives (incentive*THP: β = 0.0218, p = .0002). Figure 1e shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was present; as the 3-way (distractor*incentive*THP) interaction was not significant and the direction of effects was the same in the two, it suggests the effect was similar in both conditions. Additionally, the THP*Incentive interactions were correlated between saccadic RT and residual velocity at the participant level (Figure 1 – Figure supplement 2).

      We have given more details of the analyses performed in the Methods section and the results, as requested by you and the other reviewers (page 20, line 602):

      Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.

      We used single-trial linear-mixed effects models to analyse our data, including participant as a random effect of intercept, with the formula ‘~1 + incentive*distractor*THP + (1 | participant)’. We z-scored all factors to give standardised beta coefficients.
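
      To illustrate why z-scoring yields standardised coefficients, the sketch below z-scores a predictor and an outcome and fits a simple slope. It is a hedged toy example only: it uses ordinary least squares with one predictor rather than the mixed-effects model above, assumes the sample standard deviation (n-1), assumes both variables are z-scored, and uses hypothetical function names. The resulting slope equals the raw slope multiplied by SD(x)/SD(y).

```python
import statistics

def zscore(values):
    """Centre on the mean and scale by the sample SD (n-1 denominator, an assumption)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def ols_slope(x, y):
    """Slope of the least-squares line y ~ 1 + x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardised_slope(x, y):
    """Slope after z-scoring both variables (a standardised beta)."""
    return ols_slope(zscore(x), zscore(y))
```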

      For the difference-wave cluster-based permutation tests (Figure 3 – Figure supplement 4), we used the DMGroppe Mass Univariate toolbox (Groppe et al., 2011), with 2500 permutations, to control the family-wise error rate at 0.05. This was used for looking at difference waves to test the effects of incentive, THP, and the incentive*THP interaction (using difference of difference-waves), across all EEG electrodes.

      We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.
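
      The 'cluster mass' statistic can be sketched in a simplified form as below. This is an illustration only, not the Mass Univariate toolbox implementation: it handles a single time series of t-statistics (temporal adjacency only, with no electrode neighbourhoods), the cluster-forming threshold is a free parameter, and the function names are hypothetical. Each cluster is a run of contiguous samples whose t-values exceed the threshold with the same sign, and its mass is the sum of those t-values; the p-value is the proportion of permutation-derived maximum masses at least as extreme as the observed mass.

```python
def cluster_masses(t_values, threshold):
    """Sum t-values over each run of contiguous, same-sign, supra-threshold samples."""
    masses, current, sign = [], 0.0, 0
    for t in t_values:
        s = (t > threshold) - (t < -threshold)  # +1, -1, or 0 (sub-threshold)
        if s != 0 and s == sign:
            current += t  # extend the current cluster
        else:
            if sign != 0:
                masses.append(current)  # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

def max_abs_cluster_mass(t_values, threshold):
    """Largest absolute cluster mass; one value per permutation builds the null."""
    return max((abs(m) for m in cluster_masses(t_values, threshold)), default=0.0)

def cluster_p_value(observed_mass, null_max_masses):
    """Proportion of null maxima at least as far from zero as the observed mass."""
    hits = sum(n >= abs(observed_mass) for n in null_max_masses)
    return hits / len(null_max_masses)
```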

      For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires 4 tests be met for the outcome (behavioural variable, e.g. RT), mediator (ERP, e.g., CNV) and the treatment (Incentive):

      (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))

      (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))

      (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))

      (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)

      The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or more negative than the true value (as Mediation is a one-tailed prediction). For this mediation analysis, we only included trials with valid ERP measures, even for the models without the ERP included (e.g., model #1), to keep the trial-numbers and degrees of freedom the same.

      Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):

      (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))

      Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or more negative than the true change.

      (2) Please explain why only men were included in this study. We are all hoping that men-only research is a practice of the past.

      We only included men to prevent any chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category Class C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”

      While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.

      We have updated the Methods/Drugs section to explain this (page 17, line 494):

      “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”

      And we have referenced this in the Methods/Participants section (page 18, line 501):

      “Our sample size calculations suggested 27 participants would detect a 0.5 effect size with .05 sensitivity and .8 power. We recruited 27 male participants (see Drugs section above)”

      (3) Please explain acronyms (eg EEG) when first used.

      Thank you for pointing this out. We have explained EEG at first use in the abstract and the main text, along with FWER, M1r, and ERP, which had also been missed at first use.

      Reviewer #3 (Recommendations For The Authors):

      The authors say: "Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and increased the pull of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity." But I found this statement to be misleading since the primary effects of the drug seem to have been to decrease the frequency of distractor-repulsed saccades... so "decreased push" would probably be a better analogy than "increased pull".

      Thank you for noticing this, we agree, and have changed this to (page 5, line 165):

      “Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and decreased the repulsion of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity.”

      I don't see anything in EEG preprocessing about channel rejection and interpolation. Were these steps performed? There are very few results related to the full set of electrodes.

      We did not reject or interpolate any channels, as visual inspection found no obvious outliers in terms of noisiness, and no channels had standard deviations (across time/trials) higher than our standard cutoff (of 80). The artefact rejection was applied across all EEG channels, so any trials with absolute voltages over 200uV in any channel were removed from the analysis. On average 104/120 trials were included (having passed this check, along with eye-movement artefact checks) per condition per person, and we have added the range of these, along with totals across conditions to the Analysis section and a statement about channel rejection/interpolation (page 20, line 588):

      “Epochs were from -200:1500ms around the preparation cue onset, and were baselined to the 100ms before the preparation cue appeared. Visual inspection found no channels with outlying variance, so no channel rejection or interpolation was performed. We rejected trials from the EEG analyses where participants blinked or made saccades (according to EyeLink criteria above) during the epoch, or where EEG voltage in any channel was outside -200:200μV (muscle activity). On average 104/120 trials per condition per person were included (SD = 21, range = 21-120), and 831/960 trials in total per person (SD=160, range=313-954). A repeated-measures ANOVA found there were no significant differences in number of trials excluded for any condition (p > .2).”
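
      The voltage-based trial rejection described above can be sketched as follows. This is a minimal illustration under assumptions, not the study's preprocessing code: each epoch is assumed to be a list of channels, each holding voltages in μV, the blink and saccade checks are omitted, and the function names are hypothetical. A trial is kept only if every sample in every channel lies within ±200 μV.

```python
def is_clean(epoch, limit_uv=200.0):
    """True if no sample in any channel falls outside -limit_uv..+limit_uv."""
    return all(abs(v) <= limit_uv for channel in epoch for v in channel)

def clean_trial_indices(epochs, limit_uv=200.0):
    """Indices of trials that pass the voltage check in every channel."""
    return [i for i, epoch in enumerate(epochs) if is_clean(epoch, limit_uv)]
```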

    1. Author response:

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs valence.

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) that used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, by the food or shock US. These were defined as salience based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031) in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food or shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types, and the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed 13% of neurons responded to the food associated cue and 23% responded to the shock associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure to define the responsive units. The number of CS responsive units is less than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559) who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011) who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question - whether the central amygdala signals salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and authors sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valence stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that there are no differences in the order of Pavlovian paradigms (fear - shock vs. shock - fear), which is an interesting result, and convincingly presented given their counterbalanced experimental design.

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation, the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence", whose operational definitions here differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and a more balanced discussion on this topic would be nice to see. I also felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli; as stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006, Zhu et al., 2018, doi: 10.1126/science.aat0481, and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells across the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analysis of valence and salience encoding to the different CSs is presented in Figure 5G, Figure 5 – Supplement 1C-D, and Figure 5 – Supplement 2N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 – Supplement 2J-O, and Figure 5 – Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 – Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells that respond to only one stimulus are difficult to classify as valence encoding, as opposed to being specific to the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Many of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023 for instance shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::Cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to what was conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food or shock responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).
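
      As an illustration only (not the authors' code), one way to arrive at 8 response profiles is to assume that each cell is scored as activated, inhibited, or unresponsive to each of the two USs, and that cells unresponsive to both are excluded. The state names, the `classify` helper, and the three group labels below are assumptions made for this sketch; the same-direction and opposite-direction groups follow the salience and valence designations described in the responses above.

```python
from itertools import product

# Assumed scoring: each cell gets one of three states per US.
STATES = ("activated", "inhibited", "none")


def classify(food, shock):
    """Label a (food, shock) response pair; labels are hypothetical."""
    if food == "none" and shock == "none":
        return "unresponsive"
    if food == "none" or shock == "none":
        return "single-US"            # responds to only one US
    if food == shock:
        return "same-direction"       # designated salience in the response
    return "opposite-direction"       # opposite signs for food and shock


# Enumerate all 3 x 3 combinations and drop the unresponsive one.
profiles = [
    (food, shock)
    for food, shock in product(STATES, STATES)
    if classify(food, shock) != "unresponsive"
]

# Count how many profiles fall into each group.
counts = {}
for food, shock in profiles:
    label = classify(food, shock)
    counts[label] = counts.get(label, 0) + 1

print(len(profiles), counts)
```

      Under these assumptions the 9 possible combinations minus the doubly unresponsive one leave 8 profiles: four single-US, two same-direction, and two opposite-direction.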

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they are dependent upon the consumption of the sucrose pellet. Because of these differences the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about the differences in kinetics. In our experimental design we counterbalanced the animals that received fear conditioning first then food conditioning, or food conditioning then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this we restricted our analysis to the last day of reward conditioning and the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.
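
      The decoding analysis suggested in point (5) can be sketched as follows. This is a minimal illustration under stated assumptions, not an analysis from the manuscript: a nearest-centroid classifier with leave-one-out cross-validation stands in for whatever decoder would actually be used, each trial is assumed to be a list of per-cell responses, and the function names `loo_accuracy` and `shuffle_test` are hypothetical.

```python
import random


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        # Build per-label sums from every trial except the held-out one.
        sums, counts = {}, {}
        for j, (x, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(x))
            for d, value in enumerate(x):
                acc[d] += value
            counts[label] = counts.get(label, 0) + 1
        # Predict the label whose centroid is closest to the held-out trial.
        predicted = min(
            sums,
            key=lambda lab: sum(
                (X[i][d] - sums[lab][d] / counts[lab]) ** 2
                for d in range(len(X[i]))
            ),
        )
        correct += predicted == y[i]
    return correct / len(X)


def shuffle_test(X, y, n_shuffles=200, seed=0):
    """Compare decoding accuracy against a shuffled-label null distribution."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    null = []
    for _ in range(n_shuffles):
        permuted = y[:]
        rng.shuffle(permuted)          # break the trial-to-label mapping
        null.append(loo_accuracy(X, permuted))
    # Add-one correction keeps the p-value strictly above zero.
    p_value = (sum(a >= observed for a in null) + 1) / (n_shuffles + 1)
    return observed, p_value
```

      A small p-value here would indicate that food-cue and shock-cue trials can be told apart from population activity better than expected by chance, which is a claim about decodability rather than about valence or salience as such.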

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.
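
      For readers unfamiliar with circular-shift testing, the general idea can be sketched as below. The statistic (mean activity at event samples), the number of shifts, and the function names are assumptions chosen for illustration; the manuscript's exact implementation is not described in this exchange. Circularly shifting a trace keeps its temporal autocorrelation intact while breaking its alignment to the behavioral events, which is what makes the shifted traces a suitable null.

```python
import random


def event_mean(trace, events):
    """Mean of the trace over the samples where events is non-zero."""
    values = [x for x, e in zip(trace, events) if e]
    return sum(values) / len(values)


def circular_shift_pvalue(trace, events, n_shifts=1000, seed=0):
    """One-sided p-value for event-locked activation via circular shifts."""
    rng = random.Random(seed)
    n = len(trace)
    observed = event_mean(trace, events)
    exceed = 0
    for _ in range(n_shifts):
        k = rng.randrange(1, n)            # random non-zero offset
        shifted = trace[k:] + trace[:k]    # wrap the trace around
        if event_mean(shifted, events) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (exceed + 1) / (n_shifts + 1)
```

      A cell would be called activated by an event when this p-value falls below a chosen threshold; testing for inhibition would use the opposite tail.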

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to reviewer 2, we longitudinally tracked cells across the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. Author response:

      We agree with reviewer #1 to remove the mGluR6b data. It is indeed a weakness and is too preliminary. We will gladly remove it from the revised version.

      We will address the issue of the bulk responses (depicted in Figures 5 and 6) by showing the significance data, arguing that although we cannot prove that prey-detection is increased for lower intensities, the bulk effect is significant, so prey detection is effectively stronger.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Previous work demonstrated a strong bias in the percept of an ambiguous Shepard tone as either ascending or descending in pitch, depending on the preceding contextual stimulus. The authors recorded human MEG and ferret A1 single-unit activity during presentation of stimuli identical to those used in the behavioral studies. They used multiple neural decoding methods to test if context-dependent neural responses to ambiguous stimulus replicated the behavioral results. Strikingly, a decoder trained to report stimulus pitch produced biases opposite to the perceptual reports. These biases could be explained robustly by a feed-forward adaptation model. Instead, a decoder that took into account direction selectivity of neurons in the population was able to replicate the change in perceptual bias.
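
      The ambiguity at the heart of this paradigm can be made concrete with a small sketch. Because Shepard tones are defined by pitch class, a step between two of them can be read as the shorter way up or the shorter way down around the pitch-class circle, and a half-octave step is equally far in both directions. The function below is illustrative only; it assumes pitch classes are given in semitones within an octave, and its name and signature are hypothetical.

```python
def pitch_class_step(pc_from, pc_to, steps_per_octave=12):
    """Signed shortest step between two pitch classes.

    Returns a positive value for an ascending step, a negative value for a
    descending step, 0 for no change, and None when both directions are
    equally short (the ambiguous half-octave case).
    """
    up = (pc_to - pc_from) % steps_per_octave   # distance going upward
    down = steps_per_octave - up                # distance going downward
    if up == 0:
        return 0
    if up == down:
        return None
    return up if up < down else -down


print(pitch_class_step(0, 3))   # shorter path is upward
print(pitch_class_step(0, 9))   # shorter path is downward
print(pitch_class_step(0, 6))   # half octave: no shorter path
```

      In the ambiguous case the stimulus itself does not favor either direction, so any consistent bias in the reported direction has to come from context, such as the preceding sequence.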

      Strengths:

      This study explores an interesting and important link between neural activity and sensory percepts, and it demonstrates convincingly that traditional neural decoding models cannot explain percepts. Experimental design and data collection appear to have been executed carefully. Subsequent analysis and modeling appear rigorous. The conclusion that traditional decoding models cannot explain the contextual effects on percepts is quite strong.

      Weaknesses:

      Beyond the very convincing negative results, it is less clear exactly what the conclusion is or what readers should take away from this study. The presentation of the alternative, "direction aware" models is unclear, making it difficult to determine if they are presented as realistic possibilities or simply novel concepts. Does this study make predictions about how information from auditory cortex must be read out by downstream areas? There are several places where the thinking of the authors should be clarified, in particular, around how this idea of specialized readout of direction-selective neurons should be integrated with a broader understanding of auditory cortex.

      While we have not used the term "direction aware", we think the reviewer refers generally to the capability of our model to use a cell's direction selectivity in the decoding. In accordance with the reviewer's interpretation, we did indeed mean that the decoder assumes that a neuron does not only have a preferred frequency, but also a preferred direction of change in frequency (ascending/descending), which is what we use to demonstrate that the decoding in this way aligns with the human percept. We have adapted the text in several places to clarify this, in particular expanding the description in the Methods substantially.

      Reviewer #2 (Public Review):

      The authors aim to better understand the neural responses to Shepard tones in auditory cortex. This is an interesting question as Shepard tones can evoke an ambiguous pitch that is manipulated by a preceding adapting stimulus, therefore it nicely disentangles pitch perception from simple stimulus acoustics.

      The authors use a combination of computational modelling, ferret A1 recordings of single neurons, and human EEG measurements.

      Their results provide new insights into neural correlates of these stimuli. However, the manuscript submitted is poorly organized, to the point where it is near impossible to review. We have provided Major Concerns below. We will only be able to understand and critique the manuscript fully after these issues have been addressed to improve the readability of the manuscript. Therefore, we have not yet reviewed the Discussion section.

      Major concerns

      Organization/presentation

      The manuscript is disorganized and therefore difficult to follow. The biggest issue is that in many figures, the figure subpanels often do not correspond to the legend, the main body, or both. Subpanels described in the text are missing in several cases.

      We have gone linearly through the text and checked that all figure subpanels are referred to in the text and the legend. As far as we can tell, this was already the case for all panels, with the exception of two subpanels of Fig. 5.

      Many figure axes are unlabelled.

      We have carefully checked the axes of all panels and all but two (Fig. 5D) were labeled. As is customary, certain panels inherit the axis label from a neighboring panel, if the label is the same, e.g. subpanels in Fig. 6F or Fig. 5E, which helps to declutter the figure. We hope that with this clarification, the reviewer can understand the labels of each panel.

      There is an inconsistent style of in-text citation between figures and the main text. The manuscript contains typos and grammatical errors. My suggestions for edits below therefore should not be taken as an exhaustive list. I ask the authors to consider the following only a "first pass" review, and I will hopefully be able to think more deeply about the science in the second round of revisions after the manuscript is better organized.

      While we are puzzled by the severity of issues that R2 indicates (see above, and R3 qualifies it as "well written", and R1 does not comment on the writing negatively), we have carefully gone through all specific issues mentioned by R2 and the other reviewers. We hope that the revised version of the paper with all corrections and clarifications made will resolve any remaining issues.

      Frequency and pitch

      The terms "frequency" and "pitch" seem to be used interchangeably at times, which can lead to major misconceptions in a manuscript on Shepard tones. It is possible that the authors confuse these concepts themselves at times (e.g. Fig 5), although this would be surprising given their expertise in this field. Please check through every use of "frequency" and "pitch" in this manuscript and make sure you are using the right term in the right place. In many places, "frequency" should actually be "fundamental frequency" to avoid misunderstanding.

      Thanks for pointing this out. We have checked every occurrence and modified where necessary.

      Insufficient detail or lack of clarity in descriptions

      There seems to be insufficient information provided to evaluate parts of these analyses, most critically the final pitch-direction decoder (Fig 6), which is a major finding. Please clarify.

      Thanks for pointing this out. We have extended the description of the pitch-direction decoder and highlighted its role for interpreting the results.

      Reviewer #3 (Public Review):

      Summary:

      This is an elegant study investigating possible mechanisms underlying the hysteresis effect in the perception of perceptually ambiguous Shepard tones. The authors make a fairly convincing case that the adaptation of pitch direction sensitive cells in auditory cortex is likely responsible for this phenomenon.

      Strengths:

      The manuscript is overall well written. My only slight criticism is that, in places, particularly for non-expert readers, it might be helpful to work a little bit more methods detail into the results section, so readers don't have to work quite so hard jumping from results to methods and back.

      Following this excellent suggestion, we have added more brief method sketches to the Results section, hopefully addressing this concern.

      The methods seem sound and the conclusions warranted and carefully stated. Overall I would rate the quality of this study as very high, and I do not have any major issues to raise.

      Thanks for your encouraging evaluation of the work.

      Weaknesses:

      I think this study is about as good as it can be with the current state of the art. Generally speaking, one has to bear in mind that this is an observational, rather than an interventional study, and therefore only able to identify plausible candidate mechanisms rather than making definitive identifications. However, the study nevertheless represents a significant advance over the current state of knowledge, and about as good as it can be with the techniques that are currently widely available.

      Thanks for your encouraging evaluation of our work. The suggestion of an interventional study has also been on our minds, however, this appears rather difficult, as it would require a specific subset of cells to be inhibited. The most suitable approach would likely be 2p imaging with holographic inhibition of a subset of cells (using ArchT for example), that has a preference for one direction of pitch change, which should then bias the percept/behavior in the opposite direction.

      Reviewer #1 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) What is the timescale used to compute direction selectivity in neural tuning? How does it compare to the timing of the Shepard tones? The basic idea of up versus down pitch is clear, the intuition for the role of direction tuning and its relation to stimulus dynamics could be laid out more clearly. Are the authors proposing that there are two "special" populations of A1 neurons that are treated differently to produce the biased percept? Or is there something specific about the dynamics of the Shepard stimuli and how direction selective neurons respond to them specifically? It would help if the authors could clarify if this result links to broader concepts of dynamic pitch coding in general or if the example reported here is specific (or idiosyncratic) to Shepard tones.

      We propose that the findings here are not specific to Shepard tones. To the contrary, only basic properties of auditory cortex neurons, i.e. frequency preference, frequency-direction (i.e. ascending or descending) preference, and local adaptation in the tuning curve, suffice. Each of these properties has been demonstrated many times before and we only verified this in the lead-up to the results in Fig. 6. While the same effects should be observable with pure tones, the lack of ambiguity in the perception of direction of a frequency step for pure tone pairs would make them less noticeable here. Regarding the time-scale of the directional selectivity, we relied on the sequencing of tones in our paradigm, i.e. 150 ms spacing. The SSTRFs were discretized at 50 ms, and include only the bins during the stimulus, not during the pause. The directional tuning, i.e. differences in the SSTRF above and below the preferred pitch class for stimuli before the last stimulus, typically extended only one stimulus back in time. We have clarified this in more detail now, in particular in the added Methods section on the directional decoder.

      (2) (p. 9) "weighted by each cell's directionality index ... (see Methods for details)" The direction-selective decoder is interesting and appears critical to the study. However, the details of its implementation are difficult to locate. Maybe Fig. 6A contains the key concepts? It would help greatly if the authors could describe it in parallel with the other decoders in the Methods.

      We have expanded the description of the decoder in the Methods as the reviewer suggests.

      LESSER CONCERNS

      p. 1. (L 24) "distances between the pitch representations...." It's not obvious what "distances" means without reading the main paper. Can some other term or extra context be provided?

      We have added a brief description here.

      p. 2. (L 26) "Shepard tones" Can the authors provide a citation when they first introduce this class of stimuli?

      Citation has been added.

      p. 3 (L 4) "direction selective cells" Please define or provide context for what has a direction. Selective to pitch changes in time?

      Yes, selective to pitch changes in time is what is meant. We have further clarified this in the text.

      p. 4 (L 9-19). This paragraph seems like it belongs in the Introduction?

      Given the concerns raised by R2 about the organization of the manuscript we prefer to keep this 'road-map' in the manuscript, as a guidance for the reader.

      p. 4 (L 32) "majority of cells" One might imagine that the overlap of the bias band and the frequency tuning curve of individual neurons might vary substantially. Was there some criterion about the degree of overlap for including single units in the analysis? Does overlap matter?

      We are not certain which analysis the reviewer is referring to. Generally, cells were not excluded based on the overlap between a particular Bias band and their (Shepard) tuning curve. There are several reasons for this: The bias was located in 4 different, overlapping Shepard tone regions, and all sounds were Shepard tones. Therefore, all cells overlapped with their (Shepard) tuning curve with one or multiple of the Biases. For decoding analysis, all cells were included, as both a response and the lack of a response contribute to the decoding. If the reviewer is referring only to the analysis of whether a cell adapts, then the same argument applies as above, i.e. this was an average over all Bias sequences, and therefore every responding cell was driven to respond by the Bias, and therefore it was possible to also assess whether it adapted its response for different positions inside the Bias. We acknowledge that the limited randomness of the Bias sequences in combination with the specific tuning of the cells could in a few cases create response patterns over time that are not indicative of the actual behavior for repeated stimulation. However, since the results are rather clear, with 91% of cells adapting, we do not think this would significantly change the conclusions.

      p. 5 (L 17) "desynchronization ... behaving conditions" The logic here is not clear. Is less desynchronization expected during behavior? Typically, increased attention is associated with greater desynchronization.

      Yes, we reformulated the sentence to: While this difference could be partly explained by desynchronization which is typically associated with active behavior or attention [30], general response adaptation to repeated stimuli is also typical in behaving humans [31].

      p. 7 (L 5) "separation" is this a separation in time?

      Yes, added.

      p. 7 (L 33) "local adaptation" The idea of feedforward adaptation biasing encoding has been proposed before, and it might be worth citing previous work. This includes work from Nelken specifically related to SSA. Also, this model seems similar to the one described in Lopez Espejo et al (PLoS CB 2019).

      Thanks for pointing this out. We think, however, that neither of these publications suggested this very narrow way of biasing, which we consider biologically implausible. We have therefore not added either of these citations.

      p. 11 (L. 17) The cartoon in Fig. 6G may provide some intuition, but it is quite difficult to interpret. Is there a way to indicate which neuron "votes" for which percept?

      This is an excellent idea, and we have now added the purported perceptual relation of each cell in the diagram.

      p. 12 (L. 8). "classically assumed" This statement could benefit from a citation. Or maybe "classically" is not the right word?

      We have changed 'classically' to 'typically', and now cite classical works from Deutsch and Repp. We think this description makes sense, as the whole concept of bistable percepts has been interpreted as being equidistant (in added or subtracted semitone steps) from the first tone, see e.g. Repp 1997, Fig.2.

      p. 12 (L. 12) "...previous studies" of Shepard tone percepts? Of physiology?

      We have modified it to 'Relation to previous studies of Shepard tone percepts and their underlying physiology', since this section deals with both.

      p. 12 (L. 25) "compatible with cellular mechanisms..." This paragraph seems key to the study and to Major Concern 1, above. What are the dynamics of the task stimuli? How do they compare with the dynamics of neural FM tuning and previously reported studies of bias? And can the authors be more explicit in their interpretation - should direction selective neurons respond preferentially to the Shepard tone stimuli themselves? And/or is there a conceptual framework where the same neurons inform downstream percepts of both FM sweeps and both normal (unbiased) and biased Shepard tones?

      The reviewer raises a number of different questions, which we address below:

      - Dynamics of the task stimuli in relation to previously reported cellular biasing: The timescales tested in the studies mentioned are similar to what we used in our bias, e.g. Ye et al 2010 used FM sweeps that lasted for up to 200ms, which is quite comparable to our SOA of 150ms.

      - Preferred responses to Shepard tones: no, we do not think that there should be preferred responses to Shepard tones, but rather that responses to Shepard tones can be thought of as the combined responses to the constituent tones.

      - Conceptual framework where the same neurons inform about FM sweeps and both normal (unbiased) and biased Shepard tones: Our perspective on this question is as follows: To our knowledge, the classical approach to population decoding in the auditory system, i.e. weighted based on preferred frequency, has not been directly demonstrated to be read out inside the brain, and certainly not demonstrated to be read out in only this way in all areas of the brain that receive input from the auditory cortex. Rather, it has achieved its credibility by being linked directly with animal performance or by its match with the presented stimuli. However, these approaches were usually geared towards a representation that can be estimated based on constituent frequencies. Additional response properties of neurons, such as directional selectivity, have been documented and analyzed before, but have not been used for explaining the percept. We agree that our use of this cellular response preference in the decoding implicitly assumes that the brain could utilize this as well; however, this seems just as likely or unlikely as the use of the preferred frequency of a neuron. Therefore we do not think that this decoding is any more speculative than the classical decoding. In both cases, subsequent neurons would have to implicitly 'know' the preference of the input neuron, and weigh its input correspondingly.

      We have added all the above considerations to the discussion in an abbreviated form.

      p. 15 (L. 15). Is there a citation for the drive system?

      There is no publication, but an old repository, where the files are available, which we cite now: https://code.google.com/archive/p/edds-array-drive/

      p. 16 (L. 24) "position in an octave" It is implied but not explicitly stated that the Shepard tones don't contain the fundamental frequency. Can the authors clarify the relationship between the neural tuning band and the bands of the stimulus. Did a single stimulus band typically fall in a neuron's frequency tuning curve? If not 1, how many?

      Yes, it is correct that the concept of fundamental frequency does not cleanly apply to Shepard tones: each Shepard tone is composed of octave-spaced pure tones, but the lowest tone is placed outside the hearing range of the animal and outside the amplitude envelope (across frequencies). Therefore one or more constituent tones of the Shepard tone can fall into the tuning curve of a neuron and contribute to driving the neuron (or inhibiting it, if they fall within an inhibitory region of the tuning curve). The number of constituent tones that fall within the tuning curve depends on the tuning width of the neurons. The distribution of tuning widths to Shepard tones is shown in Fig. S1E, which indicates that many neurons had rather narrow tuning (close to the center), but many were also tuned widely, indicating that they would be stimulated by multiple constituent tones of the Shepard tone. As the tuning bandwidth (Q30: 30dB above threshold) of most cortical neurons in the ferret auditory cortex (see e.g. Bizley et al. Cerebral Cortex, 2005, Fig.12) is below 1, this means that typically not more than 1 tone fell into the tuning curve of a neuron. However, we also observed multimodal tuning curves w.r.t. Shepard tones, which suggests that some neurons were stimulated by 2 or more constituent tones (again consistent with the existence of more broadly tuned neurons; see same citation). We have added this information partly to the manuscript in the caption of Fig. S1E.
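
      As a hedged illustration of the point above, the following minimal sketch builds the octave-spaced constituents of one Shepard tone and counts how many fall inside a tuning band of a given width. The base frequency, number of octaves, Gaussian envelope, and the function names are all assumptions made for illustration; they are not the stimulus parameters of the study, and the bandwidth is assumed to be expressed in octaves.

```python
import math

def shepard_components(pitch_class_st, base_hz=27.5, n_octaves=9,
                       center_hz=1000.0, sigma_oct=1.5):
    """Return (frequency_hz, amplitude) pairs for one Shepard tone.

    All default values are illustrative assumptions, not the study's settings.
    """
    components = []
    for k in range(n_octaves):
        # Octave-spaced constituents, shifted by the pitch class in semitones.
        freq = base_hz * 2.0 ** (k + pitch_class_st / 12.0)
        # Bell-shaped envelope on a log-frequency axis (assumed Gaussian).
        dist_oct = math.log2(freq / center_hz)
        amp = math.exp(-0.5 * (dist_oct / sigma_oct) ** 2)
        components.append((freq, amp))
    return components

def constituents_in_band(components, best_freq_hz, bandwidth_oct):
    """Count constituents within +/- bandwidth_oct / 2 of a best frequency."""
    half = bandwidth_oct / 2.0
    return sum(1 for freq, _ in components
               if abs(math.log2(freq / best_freq_hz)) <= half)

tone = shepard_components(0.0)
narrow = constituents_in_band(tone, 440.0, 0.8)  # band narrower than one octave
broad = constituents_in_band(tone, 440.0, 2.5)   # band spanning several octaves
```

      Because the constituents are exactly one octave apart, a band narrower than one octave can contain at most one of them, whereas a broader band can contain several.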

      p. 17 (L. 32). "Fig 4" Correct figure ref? This figure appears to be a schematic rather than one displaying data.

      Thanks for pointing this out, changed to Fig. 5.

      p. 18 (L. 25). "assign a pitchclass" Can the authors refer to a figure illustrating this process?

      Added.

      p. 19 (L. 17). Is mu the correct symbol?

      Thanks. We changed it to phi_i, as in the formula above.

      p. 19 (L 19). "convolution" in time? Frequency?

      Thanks for pointing this out, the term convolution was incorrect in this context. We have replaced it by "weighted average" and also adapted and simplified the formula.

      p. 19 (L 25) "SSTRF" this term is introduced before it is defined. Also it appears that "SSTRF" and "STRF" are sometimes interchanged.

      Apologies, we have added the definition, and also checked its usage in each location.

      p. 23 (Fig 2) There is a mismatch between panel labels in the figure and in the legend. Bottom right panel (B3), what does time refer to here?

      Thanks for pointing these out, both fixed.

      p. 24 (L 23) "shifts them away" away from what?

      We have expanded the sentence to: "After the bias, the decoded pitchclass is shifted from their actual pitchclass away from the biased pitchclass range ... "

      p. 25 (L 7) "individual properties" properties of individual subjects?

      Thanks for pointing this out, the corresponding sentence has been clarified and citations added.

      p. 26 (L 20) What is plotted in panel D? The average for all cells? What is n?

      Yes, this is an average over cells, the number of cells has now been added to each panel.

      p. 28 (L 3) How to apply the terms "right" "right" "middle" to the panel is not clear. Generally, this figure is quite dense and difficult to interpret.

      We have changed the caption of Panel A and replaced the location terms with the symbols, which helps to directly relate them to the figure. We have considered different approaches of adding or removing content from the figure to help make it less dense, but none of them seemed to help. For lack of better options we have left it in its current form.

      MINOR/TYPOS

      p. 3 (L 1) "Stimulus Specific Adaptation" Capitalization seems unnecessary

      Changed.

      p. 4 (L 14) "Siple"

      Corrected.

      p. 9 (L 10) "an quantitatively"

      Corrected

      p. 9 (L 20) "directional ... direction ... directly ... directional" This is a bit confusing as "direct" seems to mean several different things in its different usages.

      We have gone through these sentences, and we think the terms are now more clearly used, especially since the term 'direction' occurs in several different forms, as it relates to different aspects (cells/percept/hypothesis). Unfortunately, some repetition is necessary to maintain clarity.

      Reviewer #2 (Recommendations For The Authors):

      Detailed critique

      Stimuli

      It would be very useful if the authors could provide demos of their stimuli on a website. Many readers will not be familiar with Shepard tones and the perceptual result of the acoustical descriptions are not intuitive. I ended up coding the stimuli myself to get some intuition for them.

      We have created some sample tones and sequences and uploaded them with the revision as supplementary documents.

      Abstract

      P1 L27 'pitch and...selective cells' - The authors haven't provided sufficient controls to demonstrate that these are "pitch cells" or "selective" to pitch direction. They have only shown that they are sensitive to these properties in their stimuli. Controls would need to be included to ensure that the cells aren't simply responding to one frequency component in the complex sound, for example. This is not really critical to the overall findings, but the claim about pitch "selectivity" is not accurate.

      Fair point. We have removed the word 'selective' in both occurrences.

      Introduction

      P2 L14-17: I do not follow the phonetic example provided. The authors state that the second syllable of /alga/ and /arda/ are physically identical, but how is this possible that ga = da? The acoustics are clearly different. More explanation is needed, or a correction.

      Apologies for the slightly misleading description, it has now been corrected to be in line with the original reference.

      P2,L26-27: Should the two uses of "frequency" be "F0" and "pitch" here? The tones are not separated in frequency by half an octave, but "separated in [F0]" by half an octave, correct? Their frequency ranges are largely overlapping. And the second 'frequency', which refers to the percept, should presumably be "pitch".

      Indeed. This is now corrected.

      P3 L2-6: Unclear at this point in the manuscript what is the difference between the 3 percepts mentioned: perceived pitch-change direction, Shepard tone pitches, and "their respective differences". (It becomes clear later, but clarification is needed here).

      We have tried a few reformulations, however, it tends to overload the introduction with details. We believe it is preferable to present the gist of the results here, and present the complete details later in the MS.

      P3 L6-7 What does it mean that the MEG and single unit results "align in direction and dynamics"? These are very different signals, so clarification is needed.

      We have phrased the corresponding sentence more clearly.

      Results

      Throughout: Choose one of 'pitch class', 'pitchclass', or 'pitch-class' and use it consistently.

      Done.

      P4L12 - would be helpful at this point to define 'repulsive effect'

      We have added another sentence to clarify this term.

      P4, L14 "simple"

      Done

      P4, L12 - not clear here what "repulsive influence" means

      See above.

      P4, L17 - alternative to which explanation? Please clarify. In general, this paragraph is difficult to interpret because we do not yet have the details needed to understand the terms used and the results described. In my opinion, it would be better to omit this summary of the results at the very beginning, and instead reveal the findings as they come, when they can be fully explained to the Reader.

      We agree, but we also believe that a rather general description here is useful for providing a roadmap to the results. However, we have added a half-sentence to clarify what is meant by alternative.

      P4 L30 - text says that cells adapt in their onset, sustained and offset responses, but only data for onset responses are shown (I think - clarification needed for fig 2A2). Supp figure shows only 1 example cell of sustained and offset, and in fact there is no effect of adaptation in the sustained response shown there.

      Regarding the effect of adaptation and whether it can be discerned from the supplementary figure: the shown responses are for 10 repetitions of one particular Bias sequence. Since the response of the cell will depend on its tuning and the specific sequence of the Shepard tones in this Bias, it is not possible to assess adaptation for a given cell from this figure. We assess the level of adaptation by averaging all biases (similar to what is shown in Fig. 2A2) per cell, and then fitting an exponential to the average, separately by response type. The step direction of the exponential, relative to the spontaneous rate, is then used to assess the kind of adaptation. The vast majority of cells show adaptation. We have added this information to the Methods of the manuscript.
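
      The procedure described above could be sketched as follows. This is only an illustration under stated assumptions: the model form r(n) = r_inf + amp * exp(-n / tau), the grid search over tau, and the rule used to label a cell as adapting (the fitted response moves toward the spontaneous rate) are hypothetical choices, not the fitting routine used in the study.

```python
import math

def fit_exponential(rates, taus=None):
    """Least-squares fit of r(n) = r_inf + amp * exp(-n / tau).

    tau is found by grid search; r_inf and amp follow from a closed-form
    linear regression for each candidate tau. Grid values are assumptions.
    """
    n = len(rates)
    if n < 3:
        raise ValueError("need at least 3 positions to fit an exponential")
    if taus is None:
        taus = [0.25 * i for i in range(1, 81)]  # in units of stimulus positions
    best = None
    for tau in taus:
        x = [math.exp(-i / tau) for i in range(n)]
        mx = sum(x) / n
        my = sum(rates) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, rates))
        amp = sxy / sxx
        r_inf = my - amp * mx
        sse = sum((r_inf + amp * xi - yi) ** 2 for xi, yi in zip(x, rates))
        if best is None or sse < best[0]:
            best = (sse, r_inf, amp, tau)
    _, r_inf, amp, tau = best
    return r_inf, amp, tau

def is_adapting(rates, spontaneous_rate):
    """Assumed rule: the fitted response ends closer to spontaneous than it starts."""
    r_inf, amp, _ = fit_exponential(rates)
    start = r_inf + amp  # fitted response at the first position (n = 0)
    return abs(r_inf - spontaneous_rate) < abs(start - spontaneous_rate)

# Synthetic bias-averaged response of one hypothetical cell (spikes/s per position).
decaying = [5.0 + 20.0 * math.exp(-i / 2.0) for i in range(10)]
```

      In this sketch, `is_adapting(decaying, 2.0)` labels the synthetic cell as adapting, because its fitted response decays toward the assumed spontaneous rate over the sequence.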

      P4, L32 - please state the statistical test and criterion (alpha) used to determine that 91% of cells decreased their responses throughout the Bias sequence. Was this specifically for onset responses?

      Thanks for pointing this out, test and p-value added. Adaptation was observed for onset, sustained and offset responses, in all cases with the vast majority showing an adapting behavior, although the onset responses were adapting the most.

      P4 L36 - "response strength is reduced locally". What does "locally" mean here? Nearby frequencies?

      We have added a sentence here to clarify this question.

      Figure 1 - this appears to be the wrong version of the figure, as it doesn't match the caption or results text. It's not possible to assess this figure until these things are fixed. Figure 1A schematic of definition of f(diff) does not correspond to legend definition.

      As far as we can tell, it is all correct, only the resolution of the figure appears to be rather low. This has been improved now.

      Fig 2 A2 - is this also onset responses only?

      Yes, added to the caption.

      Fig 2 A3 - add y-axis label. The authors are comparing a very wide octave band (5.5 octaves) to a much narrower band (0.5 octaves). Could this matter? Is there something special about the cut-off of 2.5 octaves in the 2 bands, or was this an arbitrary choice?

      Interesting question.... essentially our stimulus design left us only with this choice, i.e. comparing the internal region of the bias with the boundary region of the bias, i.e. the test tones. The internal region just corresponds to the bias, which is 5 st wide, and therefore the range is here given as 2.5 st relative to its center, while the test tones are at the boundary, as they are 3 st from the center. The axis for the bias was mislabelled, and has now been corrected. The y-axis label is matched with the panel to the left, but has now been added to avoid any confusion.

      Fig 2A4 - does not refer to ferret single unit data, as stated in the text (p5L8). Nor does supp Fig2, as stated. Also, the figure caption does not match the figure.

      Apologies, this was an error in the code that led to this mislabelling. We have corrected the labels, which also added back the recovery from the Bias sequence in the new Panel A4.

      P5 l9 - Figure 3 is not understandable at this point in the text, and should not be referred to here. There is a lot going on in Fig 3, and it isn't clear what you are referring to.

      Removed.

      P5 L12 - by Fig 2 B1, I assume you mean A4? Also, F2B1 shows only 1 subject, not 2.

      Yes, mislabeled by mistake, and corrected now.

      Fig2B2 -What is the y-axis?

      Same as in the panel to its left, added for clarity.

      Stimuli: why are tones presented at a faster rate to ferrets than to humans?

      The main reason is that the response analysis in MEG requires more spacing in time than the neuronal analysis in the ferret brain.

      P5 L6 - there is no Fig 5 D2? I don't think it is a good idea to get the reader to skip so far ahead in the figures at this stage anyway, even if such a figure existed. It is confusing to jump around the manuscript

      Changed to 'see below'

      P5 L8 - There is no Figure 2A4, so I don't know whether this time constant is accurate.

      This was in reference to a panel that had been removed before, but we have added it back now.

      P5 L16: "in humans appears to be more substantial (40%) than for the average single units under awake conditions". One cannot directly compare magnitude of effects in MEG and single unit signals in this way and assume it is due to behavioural state. You are comparing different measures of neural activity, averaged over vastly different numbers of neurons, and recorded from different species listening to different stimuli (presentation rates).

      Yes, that's why the next sentence is: "However, comparisons between the level of adaptation in MEG and single neuron firing rates may be misleading, due to the differences in the signal measured and subsequent processing.", and all statements in the preceding sentences are phrased as 'appears' and 'may'. We think we have formulated this comparison with an appropriate level of uncertainty. Further, the main message here is that adaptation is taking place in both active and passive conditions.

      P5 L25 -I do not see any evidence regarding tuning widths in Fig s2, as stated in the text.

      Corrected to Fig. S1.

      P5 l26 - Do not skip ahead to Fig 5 here. We aren't ready to process that yet.

      OK, reference removed.

      P5 l27 - Do you mean because it could be tuning to pitch chroma, not height?

      Yes, that is a possible interpretation, although it could also arise from a combination of excitatory and inhibitory contributions across multiple octaves.

      P5 l33 - remove speculation about active vs passive for reasons given above.

      Removed.

      P6L2-6 'In the present...5 semitone step' - This is an incorrect interpretation of the minimal distance hypothesis in the context of the Shepard tone ambiguity. The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low. Each constituent frequency of a single tone can therefore be perceived either as a harmonic of some lower fundamental frequency or as an independent tone. The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high). The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect. The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept.

      The reviewer here refers to a “minimal distance hypothesis”, which, without a literature reference, is hard for us to fully interpret. However, some responses are given below:

      - "The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low." This statement appears to be based on some misconception: due to the octave spacing (rather than multiples/harmonics of a lowest frequency), the Shepard tones cannot be interpreted as usual harmonic tones would be. It is correct that the lowest tone in a Shepard tone is not audible, due to the envelope and the fact that it could in principle be arbitrarily small... hence, speaking about an F0 is really not well-defined in the case of a Shepard tone. The closest one could get to it would be to refer to the constituent tone that is both in the audible range and in the non-zero amplitude envelope. But again, since the envelope is fading out the highest and lowest constituent tones, it is not as easy to refer to the lowest one as F0 (as it might be much quieter than the next higher constituent).

      - "The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high)." This may relate to some known psychophysics, but we are unable to interpret it with certainty.

      - "The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect." We are unsure how the reviewer reaches this conclusion.

      - "The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept." Again, in the absence of a reference to the MDH, we are unsure of the implied rationale. We agree that this is a possible interpretation of distance, however, we believe that our interpretation of distance (i.e. distances between constituent tones) is also a possible interpretation.

      Fig 4: Given that it comes before Figure 3 in the results text, these should be switched in order in the paper.

      Switched.

      PCA decoder: The methods (p18) state that the PCA uses the first 3 dimensions, and that pitch classes are calculated from the closest 4 stimuli. The results (P6), however, state that the first 2 principal components are used, and classes are computed from the average of 10 adjacent points. Which is correct, or am I missing something?

      Thanks for pointing this out, we have made this more concrete in the Methods to: "The data were projected to the first three dimensions, which represented the pitch class as well as the position in the sequence of stimuli (see Fig. 4A for a schematic). As the position in the Bias sequence was not relevant for the subsequent pitch class decoding, we only focussed on the two dimensions that spanned the pitch circle." Regarding the number of stimuli that were averaged, this might be a slight misunderstanding: each Shepard tone was decoded/projected without averaging. However, to then assign an estimated pitch class, we first had to establish an axis (here going around the circle), where each position along the axis was associated with a pitch class. This was done by stepping in 0.5 semitone steps, and finding the location in decoded space that corresponded to the median of the Shepard tones within +/- 0.25st. To increase the resolution, this circular 'axis' of 24 points was then linearly interpolated to a resolution of 0.05st. We have updated the text in the Methods accordingly. The mention of 10 points for averaging in the Results was correct, as there were 240 tones in all bias stimuli, and 24 bins in the pitch circle. The mention of an average over 4 tones in the Methods was a typo.
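
      The construction of the circular pitch class axis described above can be sketched as follows. This is a minimal illustration, assuming the decoded responses are already available as 2D points with known pitch classes; the function names, the synthetic unit-circle data, and the nearest-point rule used to assign a pitch class are assumptions and not the study's code.

```python
import math
from statistics import median

def circ_dist(a, b, period=12.0):
    """Shortest distance between two pitch classes on the circle (semitones)."""
    d = abs(a - b) % period
    return min(d, period - d)

def build_pitch_circle(points, pitch_classes, step=0.5, half_width=0.25, fine=0.05):
    """Median anchors every `step` st, then linear interpolation to `fine` st."""
    n_anchor = round(12.0 / step)  # 24 anchors for 0.5 st steps
    anchors = []
    for k in range(n_anchor):
        center = k * step
        members = [p for p, pc in zip(points, pitch_classes)
                   if circ_dist(pc, center) <= half_width]
        # Coordinate-wise median of the tones in this bin (raises if bin is empty).
        anchors.append((median(x for x, _ in members),
                        median(y for _, y in members)))
    n_sub = round(step / fine)  # 10 interpolated positions per segment
    circle = []
    for k in range(n_anchor):
        (x0, y0), (x1, y1) = anchors[k], anchors[(k + 1) % n_anchor]
        for j in range(n_sub):
            t = j / n_sub
            circle.append((k * step + j * fine,
                           x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return circle

def decode_pitch_class(point, circle):
    """Assign the pitch class of the nearest position on the interpolated axis."""
    px, py = point
    return min(circle, key=lambda c: (c[1] - px) ** 2 + (c[2] - py) ** 2)[0]

# Synthetic stand-in for 240 decoded bias tones lying on a unit circle.
pcs = [(i + 0.5) * 0.05 for i in range(240)]
pts = [(math.cos(2 * math.pi * pc / 12), math.sin(2 * math.pi * pc / 12)) for pc in pcs]
circle = build_pitch_circle(pts, pcs)
```

      With 240 evenly spread tones and 24 bins of +/- 0.25 st, each anchor in this synthetic example is the median of 10 tones, which matches the count given in the response.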

      Fig 3A: axes of pink plane should be PC not PCA

      Done.

      Fig 3B: the circularity in the distribution of these points is indeed interesting! But what do the authors make of the gap in the circle between semitones 6-7? Is this showing an inherent bias in the way the ambiguous tone is represented?

      While we cannot be certain, we think that this represents an inhomogeneous sampling from the overall set of neural tuning preferences, and that if we had recorded more/all neurons, the circle would be complete and uniformly sampled (which it already nearly is, see Fig.4C, which used to be Fig. 3C).

      Fig 3B (lesser note): It'd be preferable to replace the tint (bright vs. dark) differentiation of the triangles to be filled vs. unfilled because such a subtle change in tint is not easily differentiable from a change in hue (indicating a different variable in this plot) with this particular colour palette

      We have experimented with this suggestion, and it didn't seem to improve the clarity. However, we have changed the outline of the test-pair triangles to white, which now visually separates them better.

      P6 l32 - Please indicate if cross-validation was used in this decoder, and if so, what sort. Ideally, the authors would test on a held-out data set, or at least take a leave-one-out approach. Otherwise, the classifier may be overfit to the data, and overfitting would explain the exceptional performance (r=.995) of the classifier.

      Cross-validation was not used, as the purpose of the decoder is here to create a standard against which to compare the biased responses in the ambiguous pair, which were not used for training of the decoder. We agree that if we instead used a cross-validated decoder (which would only apply to the local average to establish the pitch class circle) the correlation would be somewhat lower, however, this is less relevant for the main question, i.e. the influence of the Bias sequence on the neural representation of the ambiguous pair. We have added this information to the corresponding section.

      Fig 3D: I understood that these pitch classifications shown by the triangles were carried out on the final ambiguous pair of stimuli. I thought these were always presented at the edges of the range of other stimuli, so I do not follow how they have so many different pitchclass values on the x-axis here.

      There were 4 Biases, centered at 0,3,6 or 9 semitones, and covering [-2.5,2.5]st relative to this center. Therefore the edges of the bias ranges (3st away from their centers) happen to be the same as the centers, e.g. for the Bias centered at 3, the ambiguous pair would be a 0-6 or 6-0 step. Therefore there are 4 locations for the ambiguous tones on the x-axis of Fig. 4D (previously 3D).
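
      The arithmetic behind this statement can be checked with a short sketch. The Bias centers and the 3 st offset are taken from the response above; the function and variable names are hypothetical.

```python
BIAS_CENTERS_ST = (0, 3, 6, 9)  # Bias centers in semitones, as stated above
EDGE_OFFSET_ST = 3              # test tones lie 3 st from the Bias center
OCTAVE_ST = 12                  # pitch classes wrap around after one octave

def ambiguous_pair(center_st):
    """Pitch classes of the two test tones flanking a Bias centered at center_st."""
    low = (center_st - EDGE_OFFSET_ST) % OCTAVE_ST
    high = (center_st + EDGE_OFFSET_ST) % OCTAVE_ST
    return low, high

pairs = {c: ambiguous_pair(c) for c in BIAS_CENTERS_ST}
test_tone_positions = sorted({pc for pair in pairs.values() for pc in pair})
```

      For the Bias centered at 3 st, the pair is (0, 6), and across all four Biases the test tones occupy only the pitch classes 0, 3, 6 and 9, i.e. the same positions as the Bias centers.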

      Figure 4: This demonstration of the ambiguity of Shepard pairs may be misleading. The actual musical interval is never ambiguous, as this figure suggests. Only the ascending vs descending percept is ambiguous. Therefore the predictions of the ferret A1 decoding (Fig 3D) and the model in Fig 5 are inconsistent with perception in two ways. One (which the authors mention) is the direction of the bias shift (up vs down). Another (not mentioned here) is that one never experiences a shift in the shepard tone at a fraction of a semitone - the musical note stays the same, and changes only in pitch height, not pitch chroma.

      We are unsure of the reviewer’s direction with this question. In particular the second point is not clear to us: "...one (who?) never (in this experiment? in real life?) experiences a bias shift in the Shepard tone at a fraction of a semitone" (why is this relevant in the current experiment?). Pitch chroma would actually be a possible replacement for pitch class, but the previous Shepard tone literature has referred to it as pitch class.

      P7 l12 - omit one 'consequently'

      Changed to 'Therefore'.

      P7 l24 - I encourage the authors to not use "local" and "global" without making it clear what space they refer to. One tends to automatically think of frequency space in the auditory system, but I think here they mean f0 space? What is a "cell close to the location of the bias"? Cells reside in the brain. The bias is in f0 space. The use of "local" and "global" throughout the manuscript is too vague.

      Agreed, the reference here was actually to the cell's preferred pitch class, not its physical location (which one might arguably be able to disambiguate, given the context). We have changed the wording, and also checked the use of global/local throughout the manuscript. The main use of 'global/local' is now in reference to the range of adaptation, and is properly introduced on first mention.

      P7 L26 -there is no Fig 5D1. Do you mean the left panel of 5D?

      Thanks. Changed.

      FigS3 is referred to a lot on p7-8. Should this be moved to the main text?

      The main reason why we kept it in the supplement is that it is based on a more static model, which is intended to illustrate the consequences of different encoding schemes. In order to not confuse the reader about these two models, we prefer to keep it in the supplement, which - for an online journal - makes little difference since the reader can just jump ahead to this figure in the same way as any other figure.

      Fig 5C, D - label x-axis.

      Added.

      Fig 5E - axis labels needed. I don't know what is plotted on x and y, and cannot see red and green lines in left plot

      Thanks for noticing this, colors corrected, axes labeled.

      Page 8 L3-15 - If I follow this correctly, I think the authors are confusing pitch and frequency here in a way that is fundamental to their model. They seem to equate tonotopic frequency tuning to pitch tuning, leading to confused implications of frequency adaptation on the F0 representation of complex sounds like Shepard tones. To my knowledge, the authors do not examine pure tone frequency tuning in their neurons in this study. Please clarify how you propose that frequency tuning like that shown in Fig 5A relates to representation of the F0 of Shepard tones. Or...are the authors suggesting these neural effects have little to do with pitch processing and instead are just the result of frequency tuning for a single harmonic of the Shepard tones?

      We agree that it is not trivial to describe this well, while keeping the text uncluttered, in particular, because often tuning properties to stimulus frequency contribute to tuning properties of the same neuron for pitch class, although this can be more or less straightforward: specifically, for some narrowly tuned cells, the Shepard tuning is simply a reflection of their tuning to a single octave range of the constituent tones (see Fig. S1). For more broadly tuned cells, multiple constituent tones will contribute to the overall Shepard tuning, which can be additive, subtractive, or more complex. The assumption in our approach is that we can directly estimate the Shepard tuning to evaluate the consequence for the percept. While this may seem artificial, as Shepard tones do not typically occur in nature, the same argument could be made against pure tones, on which classical tuning curves and associated decodings are often based. Relating the Shepard tuning to the classical tuning would be an interesting study in itself, although arguably relating the tuning of one artificial stimulus to another. Regarding the terminology of pitch, pitch class and frequency: The term pitch class is commonly used in the field of Shepard tones, and - as we indicated in the beginning of the results: "the term pitch is used interchangeably with pitch class as only Shepard tones are considered in this study". We agree that the term pitch, which describes the perceptual convergence/construction of a tone-height from a range of possible physical stimuli, needs to be separated from frequency as one contributor/basis for the perception of a pitch. However, we think that the term pitch can - despite its perceptual origin - also be associated with neuron/neural responses, in order to investigate the neural origin of the pitch percept. 
      At the same time, the present study is not targeted to study pitch encoding per se, as this would require the use of a variety of stimuli leading to consistent pitch percepts. Therefore, pitch (class) is here mainly used as a term to describe the neural responses to Shepard tones, based on the previous literature, and the fact that Shepard tones are composite stimuli that lead to a pitch percept. The last sentence has been added to the manuscript for clarity.

      P7-9: I wasn't left with a clear idea of how the model works from this text. I assume you have layers of neurons tuned to frequency or f0 (based on the real data?), which are connected in some way to produce some sort of output when you input a sound? More detail is needed here. How is the dynamic adaptation implemented?

      The detailed description of the model can be found in the Methods section. We have gone through the corresponding paragraph and have tried to clarify the description of the model by introducing a high-level description and the reference to the corresponding Figure (Fig. 5A) in the Results.

      Fig6A: Figure caption can't be correct. In any case, these equations cannot be understood unless you define the terms in them.

      We have clarified the description in the caption.

      Fig 6/directionality analysis: Assuming that the "F" in the STRFs here is Shepard tone f0, and not simple frequency?

      We have changed the formula in the caption and the axis labels now.

      Fig 6C - y-axis values

      In the submission, these values were left out on purpose because the result has an arbitrary scale: only whether a value is larger or smaller than 0 matters for evaluating the decoded directionality (at the current level of granularity). An interesting refinement would be to relate the decoded values to animal performance. We have now scaled the values arbitrarily to fit within [-1,1], but we would like to emphasize that only their relative scale matters here, not their absolute scale.
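
      As an illustration only (this is not taken from the manuscript), one simple way to bring values into [-1,1] while preserving their sign and relative scale is to divide by the largest absolute value. The function name `scale_to_unit_range` is hypothetical, and the response does not state which scaling the authors actually applied.

```python
import numpy as np

def scale_to_unit_range(values):
    """Divide by the largest absolute value so the result lies in [-1, 1].

    Assumption: a plain peak normalization; signs and ratios are unchanged,
    so the comparison against 0 is unaffected.
    """
    values = np.asarray(values, dtype=float)
    peak = np.max(np.abs(values))
    if peak == 0:
        # All-zero input: nothing to scale.
        return values.copy()
    return values / peak
```

      Because the scaling factor is positive, a value that was above 0 before scaling remains above 0 afterwards.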

      Fig 6E - can't both be abscissa (caption). I might be missing something here, but I don't see the "two stripes" in the data that are described in the caption.

      Thank you. The typo is fixed. The stripes are most clearly visible in the right panel of Fig. 6E, red and blue, diagonally from top left to bottom right.

      Fig 6G -I have no idea what this figure is illustrating.

      This panel is described in the text as follows: "The resulting distribution of activities in their relation to the Bias is, hence, symmetric around the Bias (Fig. 6G). Without prior stimulation, the population of cells is unadapted and thus exhibits balanced activity in response to a stimulus. After a sequence of stimuli, the population is partially adapted (Fig. 6G right), such that a subsequent stimulus now elicits an imbalanced activity. Translated concretely to the present paradigm, the Bias will locally adapt cells. The degree of adaptation will be stronger, if their tuning curve overlaps more with the biased region. Adaptation in this region should therefore most strongly influence a cell’s response. For example, if one considers two directional cells, an up- and a down-selective cell, cocentered in the same frequency location below the Bias, then the Bias will more strongly adapt the up-cell, which has its dominant, recent part of the SSTRF more inside the region of the Bias (Fig. 6G right). Consistent with the percept, this imbalance predicts the tone to be perceived as a descending step relative to the Bias. Conversely, for the second stimulus in the pair, located above the Bias, the down-selective cells will be more adapted, thus predicting an ascending step relative to the previous tone."

      I might be just confused or losing steam at this point, but I do not follow what has been done or the results in Fig 6 and the accompanying text very well at all. Can this be explained more clearly? Perhaps the authors could show spike rate responses of an example up-direction and down-direction neuron? Explain how the decoder works, not just the results of it.

      We agree that we are presenting something new here. However, it is conceptually not very different from decoding based on preferred frequencies. We have attempted to provide two illustrations of how the decoder works (Fig. 6A) and how it then leads to the percept using prototypical examples of cellular SSTRFs (Fig. 6G). We have added a complete, but accessible description to the Methods section. Showing firing rates of neurons would unfortunately not be very telling, given the usual variability in neural response and the fact that our paradigm did not have a lot of repetitions (but instead a lot of conditions), which would be able to average out the variability on a single neuron level.

      Discussion - I do not feel I can adequately critique the author's interpretation of the results until I understand their results and methods better. I will therefore save my critique of the discussion section for the next round of revisions after they have addressed the above issues of disorganization and clarity in the manuscript.

      We hope that the updated version of the manuscript provides the reviewer now with this possibility.

      Methods

      P15L7 - gender of human subjects? Age distribution? Age of ferrets?

      We have added this information.

      P16L21 - What is the justification for randomizing the phase of the constituent frequencies?

      The purpose of the randomization was to prevent idiosyncratic phase relationships for particular Shepard tones, which would depend in an orderly fashion on the included base-frequencies if non-randomized, and could have contributed to shaping the percept for each Shepard tone in a way that was only partly determined by the pitch class of the Shepard tone. Added to the section.
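
      The phase randomization described above can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: the sampling rate, base frequency, number of octaves, tone duration and the flat amplitude across constituents (no spectral envelope) are all assumptions, and `shepard_tone` is a hypothetical name.

```python
import numpy as np

def shepard_tone(pitch_class_st, duration_s=0.1, fs=44100, f_base=27.5,
                 n_octaves=9, rng=None):
    """Sum of octave-spaced sinusoids, each with an independent random phase.

    All default parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(round(duration_s * fs))) / fs
    # Octave-spaced constituent frequencies for this pitch class (in semitones).
    freqs = f_base * 2.0 ** (np.arange(n_octaves) + pitch_class_st / 12.0)
    freqs = freqs[freqs < fs / 2]  # keep constituents below the Nyquist frequency
    # One starting phase per constituent, drawn uniformly from [0, 2*pi).
    phases = rng.uniform(0.0, 2.0 * np.pi, size=freqs.size)
    wave = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :]
                  + phases[:, None]).sum(axis=0)
    # Normalize the peak amplitude to 1.
    return wave / np.max(np.abs(wave))
```

      Drawing fresh phases for each tone means that two tones of the same pitch class differ in waveform but share the same constituent frequencies, so any systematic phase relationship between constituents is removed.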

      P17L6 - what are the 2 randomizations? What is being randomized?

      Pitch classes and position in the Bias sequence. Added to the section.

      P16 Shepard Tuning section - What were the durations of the tones and the time between tones within a trial?

      Thanks, added!

      Equations - several undefined terms in the equations throughout the manuscript.

      Thanks. We have gone through the manuscript and all equations and have introduced additional definitions where they had been missing.

      Reviewer #3 (Recommendations For The Authors):

      P3L10: "passive" and "active" conditions come totally out of the blue. Need introducing first. (Or cut. If adaptation is always seen, why mention the two conditions if the difference is not relevant here?)

      We have added a sentence to the preceding paragraph that should clarify this. The reason for mentioning it is that otherwise a possible counter-argument could be made that adaptation does not occur in the active condition, which was not tested in ferrets (but presents an interesting avenue for future research).

      P3L14 "siple" typo

      Corrected.

      P4L1 "behaving humans" you should elaborate just a little here on what sort of behavior the participants engaged in.

      Thanks for pointing this out. We have clarified this by adding an additional sentence directly thereafter.

      P4 adaptation: I wonder whether it would be useful to describe the Bias condition a bit more here before going into the observations. The reader cannot know what to expect unless they jump ahead to get a sense of what the Bias looks like in the sense of how many stimuli are in it, and how similar they are to each other. Observations such as "the average response strength decreases as a function of the position in the Bias sequence" are entirely expected if the Bias is made up of highly repetitive material, but less expected if it is not. I appreciate that it can be awkward to have Methods after Results, but with a format like that, the broad brushstroke Methods should really be incorporated into the Results and only the tedious details should be reserved for the Methods to avoid readers having to jump back and forth.

      Agreed, we have inserted a corresponding description before going into the details of the results.

      Related to this (perhaps): Bottom of P4, top of P5: "significantly less reduced (33%, p=0.0011, 2 group t-test) compared to within the bias (Fig. 2 A3, blue vs. red), relative to the first responses of the bias" ... I am at a loss as to what the red and blue symbols in Fig 2 A3 really show, and I wonder whether the "at the edges" to "within the Bias" comparison were to make sense if at this stage I had been told more about the composition of the Bias sequence. Do the ambiguous ('target') tones also occur within the Bias? As I am unclear about what is compared against what I am also not sure how sound that comparison is.

      We have added an extended description of the Bias to the beginning of this section of the manuscript. For your reference: the Shepard tones that made up the ambiguous tones were not part of the Bias sequence, as they are located at 3st distance from the center of the Bias (above and below), while the Bias has a range of only +/- 2.5st.

      Fig 2: A4 B1 B2 labels should be B1 B2 B3

      Corrected.

      Fig 2 A2, A3: consider adjusting y-axis range to have less empty space above the data. In A3 in particular, the "interesting bit" is quite compressed.

      Done, however, while still matching the axes of A2 and A3 for better comparability.

      I am under the strong impression that the human data only made it into Fig 2 and that the data from Fig 3 onwards are animal data only. That is of course fine (MEG may not give responses that are differentiated enough to perform the sort of analyses shown in the later figures). But I do think that somewhere this should be explicitly stated.

      Yes, the reviewer's observation is correct. The decoding analyses could not be conducted on the human MEG data and were therefore not pursued further. The MEG data are included in the paper to demonstrate that even in humans and active conditions, the local adaptation is present, which is a key contributor to the two decoding models. We now state this explicitly when starting the decoding analysis.

      P5L2 "bias" not capitalized. Be consistent.

      All changed to capitalized.

      P5L8 reference to Fig 2 A4: something is amiss here. From legend of Fig 2 it seems clear that panel A4 label is mislabeled B1. Maybe some panels are missing to show recovery rates?

      Apologies for this residual text from a previous version of the manuscript. We have gone through all references and corrected them.

      P6L7 comma after "decoding".

      Changed.

      Fig 3, I like this analysis. What would be useful / needed here though is a little bit more information about how the data were preprocessed and pooled over animals. Did you do the PCA separately for each animal, then combine, or pool all units into a big matrix that went into the PCA? What about repeat presentations? Was every trial a row in the matrix, or was there some averaging over repeats? (In fact, were there repeats??)

      Thanks for bringing up these relevant aspects, which were partly insufficiently detailed in the manuscript. Briefly, cells were pooled across animals and we only used cells that could meaningfully contribute to the decoding analysis, i.e. had auditory responses and different responses to different Shepard tones. Regarding the responses, as stated in the Methods, "Each stimulus was repeated 10 times", and we computed average responses across these repetitions. Single trials were not analyzed separately. We have added this information in the Methods, and refer to it in the Results.

      Also, there doesn't appear to be a preselection of units. We would not necessarily expect all cortical neurons to have a meaningful "best pitch" as they may be coding for things other than pitch. Intuitively I suspect that, perhaps, the PCA may take care of that by simply not assigning much weight to units that don't contribute much to explained variance? In any event I think it should be possible, and would be of some interest, to pull out of this dataset some descriptive statistics on what proportion of units actually "care about pitch" in that they have a lot (or at least significantly more than zero) of response variance explained by pitch. Would it make sense to show a distribution of %VE by pitch? Would it make sense to only perform the analysis in Fig 3 on units that meet some criterion? Doing so is unlikely to change the conclusion, but I think it may be useful for other scientists who may want to build on this work to get a sense of how much VE_pitch to expect.

      We fully agree with the reviewer, which is why this information is already presented in Supplementary Fig. 1, which details the tuning properties of the recorded neurons. Overall, we recorded from 1467 neurons across all ferrets, out of which 662 were selected for the decoding analysis based on their driven firing rate (i.e. whether they responded significantly to auditory stimulation) and whether they showed a differential response to different Shepard tones. The thresholds for auditory response and tuning to Shepard tones were not very critical: setting the thresholds low led to quantitatively the same result, albeit with more noise. Setting the thresholds very high reduced the set of cells included in the analysis and eventually made the results less stable, as the cells no longer covered the entire range of preferences to Shepard tones. We agree that the PCA-based preprocessing would also automatically exclude many of the cells that were already excluded with the more concrete criteria beforehand. We have added further information on this issue in the Methods section under the heading 'Unit selection'.
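
      The general shape of such a pipeline (average over repetitions, select units, then PCA on the pooled matrix) can be sketched as below. This is a simplified illustration under stated assumptions and not the authors' analysis: fixed thresholds `min_drive` and `min_tuning` stand in for the significance criteria described above, and the function name, argument names and default values are hypothetical.

```python
import numpy as np

def select_and_reduce(responses, spont_rate, min_drive=1.0, min_tuning=0.5,
                      n_components=3):
    """responses: array (n_units, n_stimuli, n_repeats) of firing rates.
    spont_rate: array (n_units,) of spontaneous rates.

    Assumption: simple rate thresholds replace the statistical tests
    for auditory drive and Shepard tuning.
    """
    mean_resp = responses.mean(axis=2)                      # average over repetitions
    driven = mean_resp.mean(axis=1) - spont_rate            # stimulus-driven rate per unit
    tuning = mean_resp.max(axis=1) - mean_resp.min(axis=1)  # modulation across stimuli
    keep = (driven > min_drive) & (tuning > min_tuning)
    X = mean_resp[keep].T                                   # rows: stimuli, columns: units
    X = X - X.mean(axis=0, keepdims=True)                   # center each unit
    U, S, Vt = np.linalg.svd(X, full_matrices=False)        # PCA via SVD
    scores = U[:, :n_components] * S[:n_components]         # stimulus coordinates
    var_explained = S ** 2 / np.sum(S ** 2)
    return keep, scores, var_explained[:n_components]
```

      In this sketch, units that pass neither criterion never enter the PCA; units that pass but vary little across stimuli would in any case receive small weights in the leading components.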

      P9 "tones This" missing period.

      Changed.

      P10L17 comma after "analysis"

      Changed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Some critical comments are provided below:

      (1) The data quality still needs to be improved. There are many outliers in the experimental data shown in some figures, e.g. Figure 2D-G. The presence of these outliers makes the results unreliable. The author should thoroughly review the data analysis in the manuscript. In addition, a couple of western blot bands, such as IL-1β in Figure 3C, are not clear enough, please provide clearer western blot results again to support the conclusion.

      Following our comparative analysis, we have determined that these data do not affect our conclusions. Moreover, our experimental design included a total of six mice per group, with all mouse samples being subjected to testing.

      (2) As shown in Figure 1G-I, foot thickness and IL-1β content in foot tissues of the Aged+Abx group were significantly reduced, but there was no difference in serum uric acid level. In addition, the Abx-untreated group should be included at all ages.

      Thank you for your comment. We have included this data in Supplemental Material 4.

      (3) Since FMT (Figure 4) and butyrate supplementation (Figure 8) have different effects on uric acid synthesis enzyme and excretion, different mechanisms may lie behind these two interventions. Transplantation with significantly enriched single strains from young mice, such as Bifidobacterium and Akkermansia, is the more reliable approach to reveal the underlying mechanism between gut microbiota and gout.

      Thank you for your comment. Due to the involvement of multiple bacterial genera in gout and hyperuricemia, and the practical challenge of testing all strains, our focus shifted to the functional implications and metabolism of the microbiota. Experimental validation confirmed that butyrate exerts a dual-therapeutic effect in mitigating gout and hyperuricemia.

      (4) In Figure 2F, the results showed the IL-1β, IL-6, and TNF-α content in serum, which was inconsistent with the authors' manuscript description (Line 171).

      Thank you for your comment. The modifications to the results have been implemented.

      (5) Figures 2F-H duplicate Supplementary Figures S1B-D. The authors should prepare the article more carefully to avoid such mistakes.

      Thank you for your comment. We have corrected it in the manuscript.

      (6) In lines 202-206, the authors stated that the elevated serum uric acid levels in the Young+Old or Young+Aged groups, but there is no difference in the results shown in Figure 4A.

      Thank you for your comment. We have corrected it in the manuscript.

      (7) Please visualize the results in Table 2 in a more intuitive manner.

      The results have been presented in Table 2 with a more intuitive visual format. The detailed information is presented in Supplement 4.

      (8) The heatmap in Figure 7A cannot strongly support the conclusion "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group". The author should re-represent the visual results and provide a reasonable explanation. In addition, please provide the ordinate unit of Supplementary Figure 7A-H.

      Thank you for your comment. Figure 7A and Supplementary Figure 7A-H together illustrate "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group", and the specific units of short-chain fatty acids have been annotated in the manuscript.

      (9) Uncropped original full-length western blot should be provided.

      Thank you for your comment. We have made relevant notes in the paper.

      Reviewer #1 (Recommendations For The Authors):

      Gout, a prevalent form of arthritis among the elderly, exhibits an intricate relationship with age and gut microbiota. The authors found that gut microbiota plays a crucial role in determining susceptibility to age-related gout. They observed that age-related gut microbiota regulated the activation of the NLRP3 inflammasome pathway and modulated uric acid metabolism. "Younger" microbiota has a positive impact on the gut microbiota structure of old or aged mice, enhancing butanoate metabolism and butyric acid content. Finally, they found butyric acid exerts a dual effect, inhibiting inflammation in acute gout and reducing serum uric acid levels. This work's insights emphasize the potential of "young" gut microbiome in mitigating senile gout. The whole study was interesting, but there were some minor errors in the overall writing of the paper. The author should carefully check the spelling of the words in the text and the case consistency of the group names.

      Questions:

      (1) Line 118, line 142, and elsewhere 24 months in the same format as before.

      Thank you for your comment. We have corrected it in the manuscript.

      (2) Line 123, "Old and Aged group" should be plural.

      Thank you for your suggestion. We have corrected it in the manuscript.

      (3) Why does line 133 mention the use of ABX? Please add a brief explanation.

      Thank you for your suggestion. The aim of utilizing ABX is to establish the link between gut microbiota, age, and gout.

      (4) Lines 172-175, the description of TNF does not match the description of the result figure, may be the picture placement error, please correct this.

      Thank you for your careful review. The error has been corrected and the accurate result has been inserted into the original manuscript.

      (5) Lines 183-185 and lines 193-195, Pro-Caspase-1 and Pro-IL activate excess write.

      Thank you for your careful review. We have corrected the error at the original location.

      (6) Line 400, the text should not be written as increased.

      Thank you for your careful review. We have corrected the error at the original location.

      (7) "ns" needs to be added in the legend to indicate that there is no significant difference.

      Thank you for your careful review. We have corrected the error at the original location.

      (8) Lines 1080-1084 "Old or Aged control group and the old or aged group", group names should be case-sensitive.

      Thank you for your suggestion. We have made the correct modification to the group names.

      (9) Lines 1072-1073, "Representative western blot images of foot tissue NLRP3 pathways proteins" add band density.

      Thank you for your suggestion. We have corrected the error on lines 1072-1073 of the article.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) In Figures 1G-H, the Aged+PBS group with antibiotic treatment shows a significant reduction in foot swelling and IL-1β compared to the Young+PBS and Old+PBS groups. The authors state that age-related changes in the gut microbiota exacerbate gout. However, why does only the Aged+PBS group improve with antibiotic treatment? It seems that butyrate alone cannot explain this phenomenon.

      We utilize antibiotics for treatment in order to establish the relationship between gut microbiota, age, and gout. Different age groups are directly given antibiotics for treatment. We found that after clearing the gut microbiota and then stimulating with MSU, the trend of inflammation factors changing with age disappears.

      (2) In Figure 2, the fecal transplantation from young mice improved the infiltration of inflammatory cells and inflammatory cytokines in the Old and Aged groups. However, in Supplementary Figure 1A, there is no improvement observed in the percentage of foot swelling. Is it appropriate to conclude that inflammation was improved even though foot swelling was not suppressed?

      Although we did not observe changes in the swelling of the mice's feet, there were changes in the inflammatory cell infiltration and inflammation factors in the slices. We rely on a comprehensive assessment of various indicators to determine whether the inflammatory condition has improved or worsened.

      (3) In line #249, the authors state that "the fecal microbiota from mice in the young group promotes uric acid elimination, inhibits reabsorption, and may contribute to the integrity of the intestinal barrier structure." However, Supplementary Figure 3F-H shows no significant alterations in Occludin and ZO-1 mRNA expression levels among all groups. Therefore, it is difficult to conclude that the fecal microbiota from the young group promotes the integrity of the intestinal barrier structure. A functional barrier assay, such as oral administration of FITC-dextran, would be necessary to verify the authors' conclusion.

      In Supplementary Figure 3F-H, we observed that the mRNA expression of Occludin and ZO-1 increased but showed no significant difference. However, after the elderly mice were transplanted with the intestinal microbiota of young mice, the mRNA expression of JAMA showed a significant upward trend. Additionally, due to the scarcity of old mice, we were unable to perform the oral administration of FITC-dextran. However, we supplemented with immunohistochemical slices of ZO-1 and Occludin to support our viewpoint.

      (4) In Figure 4, when comparing the young+PBS group with the old+PBS or aged+PBS groups, there are hardly any differences in the proteins involved in uric acid synthesis (ADA, GDA, XOD) or the genes involved in uric acid transport (URAT1, GLUT9, OAT1, OAT3, ABCG2). Since no changes in uric acid synthesis or transport pathways are observed with aging, it is questionable to conclude that fecal transplantation from young mice improves these pathways and lowers blood uric acid levels.

      In the calculation process, we used different age groups of the control group as references, instead of directly using young mice. We then compared the data of mice of different ages, and the results are in Supplementary Material 4.

      (5) In line 276, the authors describe "the Young +Old and Young+Aged groups tended to be closer to the Old+PBS and Aged+PBS groups, and the Old+Young and Aged+young groups tended to be closer to the Young+PBS group (Figure 5D)". Please conduct a statistical analysis.

      (6) In line 298, the authors hypothesize that butyrate might be the key molecule responsible for controlling gout, as Bifidobacterium and Akkermansia were abundant in the Young group, and the butyrate pathway was prominent. However, neither Bifidobacterium nor Akkermansia are butyrate-producing bacteria. Thus, the conclusion appears to be biased toward butyrate, raising questions about this interpretation.

      Upon comparison, we discovered other bacterial genera that produce butyrate, such as Lachnoclostridium. Additionally, reports in the literature (PMID: 38126785, 26420851) have indicated that Bifidobacteria combined with other genera can enhance the production of butyrate. Meanwhile, Akkermansia, particularly the species Akkermansia muciniphila, has been found to confer several beneficial traits, as evidenced by preclinical studies. These traits include promoting the growth of butyrate-producing bacteria through the production of acetate, which leads to a decrease in the loss of the colonic bilayer and subsequent reduction in inflammation (PMID: 35468952). Based on the predicted results of microbiome functions, we observed that the Butanoate_metabolism of the microbiota in young mice and in the elderly mice that received young mouse microbiota was enhanced. Considering that Lachnoclostridium can produce butyrate, and that Bifidobacteria and Akkermansia can promote the production of butyrate by the intestinal microbiota, we speculated that butyrate might play a role in gout and hyperuricemia.

      (7) In Supplementary Figure 7, acetic acid and propionic acid also show the same behavior as butyric acid. It is possible that these metabolites may also affect the development of gout.

      Thank you for your suggestion. Indeed, Figure 7 does show a similar trend for acetic and propionic acids as for butyric acid. However, considering the predictive data of microbial function and the non-targeted metabolomic data, there is an enhancement of Butanoate_metabolism in both young mice and elderly mice receiving young mouse intestinal microbiota transplants. Therefore, we prioritized butyrate as the subject of our study. Due to the scarcity of elderly mice, we are unable to conduct subsequent experiments with acetic and propionic acids, which is one of the limitations of this study. This work will be addressed in our follow-up research.

      (8) In Figure 6, the secondary bile acid biosynthesis pathway was also changed. However, there is little mention of secondary bile acid in the discussion section. Please carefully discuss other possibilities besides butyrate.

      Thank you for your suggestion. We have incorporated a discussion about secondary bile acids into the relevant section of our manuscript.

      (9) In line #330, the authors state, 'the metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway (Figure 6A-D).' However, there does not appear to be much difference in the butanoate metabolism pathway. Specifically, in Figure 6C, the butanoate metabolism pathway in the Old group does not differ from that in the Young group. Please explain in more detail whether the butanoate metabolism pathway is relevant in the Old group.

      The metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway; however, the non-targeted metabolomics did not reveal the extent of their enrichment.

      (10) In Figure 7, the authors measured the levels of short-chain fatty acids in the Young and Aged groups. They found butyrate in the feces of mice in the Young group was higher than that in the Aged group. However, I wonder whether the Old group also had low levels of butyrate or not.

      In the experiment, we selected three representative groups to verify the hypothesis that butyrate may play a significant role in gout and hyperuricemia. Subsequently, we found that supplementing 18-month-old and 24-month-old mice with butyrate indeed reduced blood uric acid levels and alleviated gout symptoms. Since 18-month-old mice are difficult to obtain, we only conducted microbiome sequencing and non-targeted metabolomic analysis.

      Minor issues:

      (11) In line 74, what does MSU stand for? Please describe the abbreviation.

      In line 74, MSU refers to Monosodium urate crystals.

      (12) In line 136, please insert a space between "IL-1β" and "and".

      Thank you for your suggestion. We have corrected the error in the article.

      (13) In line 570, please describe the method of butyrate administration and also correct the grammatical errors.

      Thank you for your suggestion. We have corrected the error in the article.

      (14) Change the title of x axis in Figure 2F-H, "Serum ~" to "Peritoneal fluid ~", according to the legend.

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (15) In line 302, "succinates" should be "butyric acid or butyrate".

      Thank you for your suggestion. We have corrected this error in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed the results of IL-1β levels in foot tissues in Figure 1C and Figure 1H, and serum IL-1β, IL-6, and TNF-α levels in Figure 2F-H. Could the authors also provide the results of IL-6 and TNF-α in foot tissue in Figure 1?

      Thank you for your suggestion. We have added the results of IL-6 and TNF-α in foot tissue in supplementary material 4.

      (2) There are some errors in the reference citation format, such as missing page numbers.

      Thank you for your careful review. We have revised the references in our manuscript.

      (3) There are too many writing errors in the manuscript, which greatly affect the understanding of the text. The manuscript must be carefully revised to improve its readability. It's recommended that a professional English writer or native speaker proofread the paper before submission. Some errors, but not limited to these errors, are listed below.

      a. Line 107: The abbreviation for "short-chain fatty acid" should be SCFA, not SFCA.

      Thank you for your careful review. We have corrected this error in the manuscript.

      b. Line 136: There is a missing space between "IL-1β" and "and".

      Thank you for your careful review. We have corrected this error in the manuscript.

      c. Line 145, the phrase "on gout on gout", and line 471, "that transplantation" are repeated.

      Thank you for your careful review. We have corrected this error in the manuscript.

      d. Line 152: "Age+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected this error in the manuscript.

      e. In Figure 1e, "Aded+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected the error in Figure 1e.

      f. Line 152: The phrase "by via" is repeated.

      Thank you for your suggestion. We have deleted the phrase "by via" in line 152.

      g. "16S rDNA" in line 92 is inconsistent with the "16S rRNA" in line 652.

      Thank you for your suggestion. We have corrected this inconsistency so that the terminology is uniform throughout the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Tleiss et al. demonstrate that while commensal Lactiplantibacillus plantarum freely circulates within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis are blocked in the anterior midgut, where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. This effect induces muscle contraction, which is marked by the formation of TARM structures (thoracic alary-related muscles). This muscle contraction-related blocking happens early after infection (15 min). In contrast, the clearance of bacteria is carried out by the IMD pathway, possibly through antimicrobial peptide production, while this pathway is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of anterior sequestration of bacteria in larval defense.

      This important work substantially advances our understanding of the process of pathogen clearance by identifying a new mode of pathogen eradication from the insect gut. The evidence supporting the authors' claims is solid and would benefit from more rigorous experiments.

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We have linked the adult phenotype to the larval model to explore the ROS/TrpA1/Dh31 axis in both contexts.  As highlighted in the discussion, however, there are key behavioral differences between larvae and adult flies. Unlike larvae, which remain in the food environment, adult flies have the ability to move away. This difference could impact the relevance of gut muscle contraction and bacterial clearance mechanisms between the two stages. Specifically, in larvae, the rapid ejection of gut contents due to muscle contraction poses a unique risk: larvae may inadvertently re-ingest the expelled material within minutes, which could influence their immune defenses. We have clarified this distinction and our hypothesis in the final section of the discussion, as it emphasizes the adaptive nature of this mechanism in larvae.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      To address this, we have provided new data (Movie 5), in which larvae were fed a lower dose of Bt-GFP at 1.3 × 10^10 CFU/mL. In this video, we observe that when larvae ingest fewer bacteria, no blockage occurs, and the bacteria are able to reach the posterior midgut. As the bacterial load is lower, the fluorescence signal is weaker, but the movie clearly shows the excretion of bacteria. Importantly, under these conditions, no larval death was observed. These findings suggest that below a certain bacterial threshold, the pathogenicity is insufficient to: (1) trigger the blockage response, and (2) kill the larvae. In such cases, bacteria are likely eliminated through normal peristaltic movements rather than through the blockage mechanism described in our study.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      As mentioned in our previous response, we hypothesize that the larvae’s ability to resist low concentrations of pathogenic bacteria is likely due to being below the threshold of virulence. At lower bacterial doses, the pathogenic load is insufficient to trigger the blockage mechanism or cause larval death. In these cases, it is probable that classical peristaltic movements of the gut efficiently eliminate the bacteria, preventing them from colonizing the posterior midgut or causing significant harm. Thus, the larvae rely on standard gut motility and immune mechanisms, rather than the blockage response, to clear lower doses of bacteria.

      Why is this model only applied to high-dose infections? 

      The reason this model primarily applies to high-dose infections is that lower concentrations of pathogenic bacteria do not trigger the blockage mechanism. As we mentioned in the manuscript, for low bacterial concentrations, where the GFP signal remains detectable, wild-type larvae are still able to resist live bacteria in the posterior part of the intestine.

      Regarding the bacterial doses used in our experiments, it's important to clarify that we calculate the bacterial load based on colony-forming units (CFU). In our setup, there are approximately 5 × 10^4 CFU per midgut. For each experiment, we prepare 500 µl of contaminated medium containing 4 × 10^10 CFU. Fifty larvae are placed into this 500 µl of medium, meaning each larva ingests around 5 × 10^4 CFU within one hour of feeding.

      This leads us to two key points:

      (1) Continuous feeding might trigger the blockage response even at lower doses, as extended exposure to bacteria could lead to higher accumulation within the gut.

      (2) Other defense mechanisms, such as the production of reactive oxygen species (ROS) or classical peristaltic movements, could be sufficient to eliminate lower bacterial doses (around 10^3 CFU or below).

      We also refer to the newly provided Movie 5, where larvae fed with Bt-GFP at 1.3 × 10^10 CFU/mL show no blockage at low ingestion levels and successfully eliminate the bacteria.
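
      As a rough check on the dose arithmetic described above, the following minimal sketch uses only the figures quoted in this response (4 × 10^10 CFU in 500 µl of medium, 50 larvae, and about 5 × 10^4 CFU ingested per larva). The function and variable names are illustrative assumptions, and the "share ingested" value is a derived quantity that is not reported in the manuscript.

```python
def dose_summary(total_cfu, medium_ul, n_larvae, ingested_per_larva):
    """Return (CFU per mL of medium, CFU available per larva, share ingested)."""
    # Concentration of the contaminated medium, converting µl to mL.
    concentration_per_ml = total_cfu / (medium_ul / 1000.0)
    # CFU nominally available to each larva if the medium were shared equally.
    available_per_larva = total_cfu / n_larvae
    # Fraction of that nominal share actually found in one midgut.
    share_ingested = ingested_per_larva / available_per_larva
    return concentration_per_ml, available_per_larva, share_ingested


# Figures quoted in the response: 4e10 CFU in 500 µl, 50 larvae, 5e4 CFU per midgut.
conc, available, share = dose_summary(4e10, 500, 50, 5e4)
print(f"{conc:.1e} CFU/mL; {available:.1e} CFU available per larva; share ingested {share:.2e}")
```

      With these inputs, the medium works out to 8 × 10^10 CFU/mL and 8 × 10^8 CFU nominally available per larva, so the roughly 5 × 10^4 CFU recovered per midgut corresponds to a very small fraction (about 6 × 10^-5) of the bacteria present in the food.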

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that it’s after 4 to 6 hours that the quantity of bacteria decreases. We fixed this in the text.

      What happened during this period? 

      During the 4 to 6-hour period, several defense mechanisms are activated. ROS play a bacteriostatic and bacteriolytic role, helping to control bacterial growth. Concurrently, the IMD pathway is activated, leading to the transcription, translation, and secretion of antimicrobial peptides. These AMPs exert both bacteriostatic and bacteriolytic effects, contributing to the eventual clearance of the pathogenic bacteria.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We have provided new data (Supplementary Figure 6) that include RT-qPCR analysis of the whole larval gut in wild-type, TrpA1- and Dh31- genetic backgrounds after feeding with Lp, Ecc15, Bt, or yeast only. We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences between the genotypes tested.

      Additionally, we included new imaging data (Supplementary Figure 11) from AMP reporter larvae (Dpt-Cherry) fed with fluorescent Lp or Bt. In larvae infected with Bt, which is blocked in the anterior part of the gut, the dpt gene is predominantly induced in this region, indicating strong IMD pathway activity in response to Bt infection. Conversely, in larvae fed with Lp-GFP, the Dpt-Cherry reporter shows weak expression in the anterior midgut, and is barely detectable in the posterior midgut where Lp-GFP establishes itself. This aligns with previous findings by Bosco-Drayon et al. (2012), which demonstrated low AMP expression in the posterior midgut due to the presence of negative regulators of the IMD pathway, such as amidases and Pirk.

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      Based on our new data (Supplementary Figure 11), we observe that Dpt-RFP expression is primarily localized in the anterior midgut, and likely at the beginning of the acidic region, in larvae infected with Bt, Ecc and Lp.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      We observe that bacteria are also unevenly distributed along the gut in wild-type larvae fed Lp. This suggests that the transit time in the anterior part of the gut may be relatively short due to active peristalsis, which would make this region function as a "checkpoint" for bacteria that are not supposed to be blocked. Indeed, we confirmed that peristalsis is active during our intoxication experiments, which could explain the rapid movement of bacteria through the anterior midgut.

      In contrast, bacteria tend to remain longer in the posterior midgut, which corresponds to the absorptive functions of intestinal cells in this region. This would explain why we observe more bacteria in the posterior midgut for Lp in control larvae and for Ecc15 and Bt in the TrpA1- and Dh31- mutants. Although a few bacteria are still found in the anterior midgut, they are consistently in much lower numbers compared to the posterior, as shown in Figures 1A and 3A of our manuscript.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We investigated whether the ROS/TrpA1/Dh31 axis influences AMP expression by performing RT-qPCR on the whole gut of larvae in wild-type, TrpA1-, and Dh31- genetic backgrounds. Larvae were fed with Lp, Ecc, Bt, or yeast (new data: Supplementary Figure 6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the different genotypes.

      Additionally, we provide imaging data from AMP reporter larvae (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These results further confirm that the ROS/TrpA1/Dh31 axis does not significantly affect AMP expression in our experimental conditions.

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key driving force for the blocking phenotype and killing phenotype?

      We agree that the TARM structures are a fascinating aspect of this study and acknowledge the interest in their potential role in the blocking and killing phenotypes. While we are keen to explore the specific contributions of these structures during bacterial intoxication, the current genetic tools available for manipulating TARMs target both TARM T1 and T2 simultaneously, as demonstrated by Bataillé et al., 2020 (Fig. 2). Of note, these muscles are essential for proper gut positioning in larvae, and their absence leads to significant defects in food intake and transit, which would confound the results of our intoxication experiments (see Fig. 6 from Bataillé et al., 2020).

      Therefore, while TARMs are likely involved in these processes, the current limitations in selectively targeting them prevent us from definitively testing their role in bacterial blocking and killing at this stage. We hope to address this in future studies as more refined genetic tools become available.

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      To determine whether the ROS/TrpA1/Dh31 axis is required for the formation of TARM structures, we examined larval guts from control, TrpA1-, and Dh31- mutant backgrounds. Our new data (Supplementary Figure 8) show that the TARM T2 structures are still present in the mutants, indicating that the formation of these structures does not depend on the ROS/TrpA1/Dh31 axis.

      Reviewer #2 (Public Review):

      This article describes a novel mechanism of host defense in the gut of Drosophila larvae. Pathogenic bacteria trigger the activation of a valve that blocks them in the anterior midgut where they are subjected to the action of antimicrobial peptides. In contrast, beneficial symbiotic bacteria do not activate the contraction of this sphincter, and can access the posterior midgut, a compartment more favorable to bacterial growth.

      Strengths:

      The authors decipher the underlying mechanism of sphincter contraction, revealing that ROS production by Duox activates the release of DH31 by enteroendocrine cells that stimulate visceral muscle contractions. The use of mutations affecting the Imd pathway or lacking antimicrobial peptides reveals their contribution to pathogen elimination in the anterior midgut.

      Weaknesses:

      The mechanism allowing the discrimination between commensal and pathogenic bacteria remains unclear.

      Based on our findings, we hypothesize that ROS play a crucial role in this discrimination process, with uracil release by pathogenic or opportunistic bacteria potentially serving as a key signal.

      To test whether uracil could trigger this discrimination, we conducted experiments where Lp was supplemented with uracil. However, our results show that uracil supplementation alone was not sufficient to induce the blockage response (new data: Supplementary Figure 5). This suggests that while uracil may be a factor in bacterial discrimination, it is likely not the sole trigger, and additional bacterial factors or signals may be required to activate the blockage mechanism. 

      The use of only two pathogens and one symbiotic species may not be sufficient to draw a conclusion on the difference in treatment between pathogenic and symbiotic species.

      To address this concern, we performed additional intoxication experiments using Escherichia coli OP50, a bacterium considered innocuous and commonly used as a standard food source for C. elegans in laboratory settings. The results, presented in our updated data (new data: Fig 1B), show that E. coli OP50, despite being from the same genus as Ecc, does not trigger the blockage response. This further supports our conclusion that the gut’s discriminatory mechanism is specific to pathogenic bacteria, and not merely based on bacterial genus.

      We can also wonder how the process of sphincter contraction is affected by the procedure used in this study, where larvae are starved. Does the sphincter contraction occur in continuous feeding conditions? Since larvae are continuously feeding, is this process physiologically relevant?

      In our intoxication protocol, the larvae are exposed to contaminated food for 1 hour, during which the blockage ratio is quantified. Since this period involves continuous feeding with the contaminated food, we do not consider the larvae starved during the quantification process. Our observations show differences in the blockage response depending on the bacterial contaminant and the genetic background of the host. Additionally, we were able to trigger the blocking phenomenon using exogenous hCGRP.

      Regarding the experimental setup for movie observations, it is true that larvae are immobilized on tape in a humid chamber, which is not a fully physiological context. However, in the new movie we provide (Movie 3), co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green) shows that both are initially blocked, followed by the posterior release of Dextran once the bacterial clearance begins.

      Furthermore, to address the question of continuous exposure, we extended the exposure period to 20 hours instead of 1 hour. Even after prolonged exposure, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This supports the physiological relevance of the sphincter contraction and its ability to function under continuous feeding conditions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We used the adult phenotype as the starting point for a candidate approach targeting the ROS/TrpA1/Dh31 axis in larvae. As already mentioned in the discussion, larvae stay in the food whereas adult flies can move away. If larvae eject their gut content, they may re-ingest it within minutes. We have clarified this idea in the last part of the discussion.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      We provide a video of larvae fed Bt-GFP at 1.3 × 10^10 CFU/mL (new data: Movie 5). When larvae ingest fewer bacteria, there is no blockage and the bacteria can reach the posterior midgut. Note that the fluorescence is weak because of the low amount of bacteria ingested. The movie shows excretion of the bacteria. There is also no larval death. Together, these results suggest that below a given threshold, the virulence of the bacteria is too weak to (1) trigger a blockage and (2) kill the larva. The bacteria are likely eliminated through classical peristalsis.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      Maybe we are below the threshold of virulence. See our response just above.

      Why is this model only applied to high-dose infections? 

      As mentioned in the manuscript, lower concentrations do not trigger the blockage, and at lower concentrations where the GFP signal is still detectable, wild-type animals resist the presence of live bacteria within the posterior part of the intestine.

      Regarding the doses, the CFU count should be considered. There are around 5 × 10^4 CFU per midgut. In our experimental procedure, we calculate the amount of bacteria for 500 µl of contaminated medium (i.e. 4 × 10^10 CFU per 500 µl of medium). Around 50 larvae are then deposited in the 500 µl of contaminated medium. Under these conditions, one larva ingests about 5 × 10^4 CFU. Moreover, larvae are fed for only 1 h.

      Therefore, (1) continuous feeding may also trigger blockage even at lower doses, and (2) the other defense mechanisms (such as ROS) or peristalsis may be sufficient to eliminate lower doses (i.e. 10^3 CFU or below). See the new Movie 5 that we provide with Bt-GFP at 1.3 × 10^10 CFU/mL.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that it’s after 4 to 6 hours that the quantity of bacteria decreases. We fixed this in the text.

      What happened during this period? 

      During this period, the following take place: ROS activity (bacteriostatic and bacteriolytic), IMD activation, AMP transcription, translation and secretion, and then the bacteriostatic and bacteriolytic activity of the AMPs.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We provide new whole-gut RT-qPCR data from larvae in wild-type, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored three different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (Dpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11) showing that, with Bt blocked in the anterior part of the intestine, the dpt gene is mainly induced in this area. Note that in the larva infected with Lp-GFP, the Dpt-Cherry reporter is weakly expressed in the anterior midgut. In the posterior midgut, where Lp-GFP is established, Dpt-Cherry is barely detectable. This observation is in line with the previous observation made by Bosco-Drayon et al. (2012), demonstrating the low level of AMP expression in the posterior midgut due to the expression of IMD negative regulators such as amidases and Pirk. In the larva infected with Bt-GFP, note the obvious expression of Dpt-Cherry in the anterior midgut, colocalizing with the bacteria (new data: SUPP11).

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      In control animals fed Bt, Ecc and Lp, we see Dpt-RFP in the anterior midgut and likely at the beginning of the acidic region. See the new data (SUPP11 images) provided for the previous remark.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      The same is true for Lp in wild-type larvae: the bacteria are not evenly distributed. It is as if the transit time in the anterior part were very short owing to peristalsis, which would fit a checkpoint area for bacteria that are not supposed to be blocked. Indeed, peristalsis is active during our intoxications. The bacteria then stay longer in the posterior part, consistent with the absorptive function of the intestinal cells in this area. With Lp in control larvae, or Ecc and Bt in TrpA1- and Dh31- mutants, a few bacteria are always present in the anterior midgut, but far fewer than in the posterior. See our Figures 1A and 3A.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We provide whole-gut RT-qPCR data from larvae in wild-type, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored three different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: SUPP11).

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key driving force for the blocking phenotype and killing phenotype?

      Indeed, we would like to explore the roles of these structures and their putative requirement upon bacterial intoxication using driver lines developed by the team that studied these muscles in vivo. However, the genetic tools currently available target TARMs T1 and T2 at the same time (see Fig. 2 from Bataillé et al., 2020). Moreover, these TARMs are primarily crucial for the correct positioning of the gut within the larva, and their absence leads to a global food intake and transit defect that would bias the outcomes of our intoxication protocol (see Fig. 6 from Bataillé et al., 2020).

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      We provide images of larval guts from control, TrpA1 and Dh31 mutants demonstrating the presence of the TARM T2 structures despite the mutations (new data: SUPP8). In addition, we provide representative movies of peristalsis in the intestines of Dh31 mutants fed with or without Ecc to illustrate that muscular activity is not abolished (new data: Movie 9 and Movie 10).

      Minor points:

      (1) Why not use the Pros-Gal4/UAS-Dh31 strain in Figure 3B in addition to hCGRP?

      We opted for exogenous hCGRP addition because it allowed us precise timing control over Dh31 activation. Overexpression of Dh31 from embryogenesis or early larval stages could have significant and unintended effects on intestinal physiology, potentially confounding the results. While temporal control using TubG80ts could be an alternative, our focus was on identifying the specific cells responsible for the phenomenon.

      To achieve this, we perturbed Dh31 production via RNAi, specifically targeting a limited number of enteroendocrine cells (EECs) using the DJ752-Gal4 driver, as described by Lajeunesse et al., 2010. Our new data (Supplementary Figure 4) demonstrate that Dh31 expression in this subset of cells is indeed necessary for the blockage phenomenon.

      (2) Section title (line 287) refers to mortality, but no mortality data is in the figure.

      We agree that the title referenced mortality, whereas no mortality data was presented in this section. We have updated the title to better reflect the data discussed in this part of the manuscript.

      (3) It may be better to combine ROS-related contents in the same figure.

      While it is technically feasible to consolidate the ROS-related content into one figure, doing so would require splitting essential data, such as the Gal4 controls for the RNAi assays and parts of the survival phenotype data. We believe that the current structure of the study, which first explores the molecular aspects of the phenomenon and then demonstrates its relevance to the animal’s survival, provides a clearer and more logical flow. For these reasons, we prefer to maintain the current figure layout.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendation

      (1) Other wild-type backgrounds should be added (including the w Drosdel background of the AMP14 deficient flies) to check the robustness of the phenotype.

      To address the concern regarding the robustness of the phenotype across different wild-type backgrounds, we have tested additional genetic backgrounds, including w1, the isogenized w1118 and Oregon animals.

      The results (new data: Figure 1C) demonstrate that Lp is able to transit freely to the posterior part of the intestine in all backgrounds, while Ecc and Bt are blocked in the anterior part. These findings confirm the robustness of the phenotype across different wild-type strains.

      (2) Although we recognize that this may be limited by the number of GFP-expressing species, other commensal and pathogenic bacteria should be tested in this assay (e.g. E. faecalis and Acetobacter).

      We performed new intoxication experiments using Escherichia coli OP50, a well-established innocuous bacterial strain. The data, presented in Figure 1B (new data), show that E. coli OP50, despite being from the same genus as Ecc, does not trigger the blockage response. This further supports our hypothesis that the blockage phenomenon is specific to pathogenic bacteria and not simply related to the bacterial genus.

      (3) It is important to test whether sphincter closure also occurs in continuous feeding conditions. This does not mean repeating all the experiments but just shows that this mechanism can take place in conditions where larvae are kept in a vial with food.

      While the movies we provide involve larvae immobilized on tape in a humid chamber, which is not a fully physiological context, we now provide new data (Movie 3) showing that, after co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green), both substances are initially blocked in the anterior midgut. Later, the dextran is released posteriorly once bacterial clearance has begun.

      Additionally, we extended the feeding period in our experiments from 1 hour to 20 hours to simulate more continuous exposure to contaminated food. Even under these prolonged conditions, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This confirms that the sphincter mechanism can function in continuous feeding conditions as well.

      (4) What are the molecular determinants discriminating innocuous from pathogenic bacteria? Addressing this point will increase the impact of the article. The fact that Relish mutants have normal valve constriction suggests that peptidoglycan recognition is not involved. Is there a sensing of pathogen virulence factors? 

      Our data suggest that uracil could be a key molecular determinant in discriminating between innocuous and pathogenic bacteria, as previously described by the W-J Lee team in several studies on adult Drosophila. However, in our experiments, exogenous uracil addition using the blue dye protocol (Keita et al., 2017) did not induce any significant changes in the larvae. Similarly, uracil supplementation in adult flies failed to trigger the Ecc expulsion and gut contraction phenotype, as reported by Benguettat et al., 2018. 

      To further investigate this, we tested the addition of uracil during Lp-GFP intoxication. In these experiments, we did not observe any blockage of Lp (new data: Supplementary Figure 5). These results suggest that uracil might not be the sole trigger for the blockage response, or we may not be providing uracil exogenously in the most effective way. Alternatively, there could be other pathogen-specific virulence factors that contribute to this discrimination mechanism.

      To address this question, the authors should infect larvae with Ecc15 evf- mutants or Ecc15 lacking uracil production. 

      Thank you for your suggestion to use Ecc15 evf- mutants or Ecc15 lacking uracil production to explore the role of uracil in bacterial discrimination. While we have provided some data using uracil supplementation (new data: Supplementary Figure 5), we agree that testing mutants like PyrE would be an important next step. Unfortunately, we currently lack access to fluorescent PyrE or Ecc15 evf- mutants.

      We are planning to address this by developing a new protocol involving fluorescent beads alongside bacteria. This approach will allow us to test several bacterial strains in parallel and better define the size threshold of the valve. However, we do not have the relevant data yet, but this will be a key focus of our future work.

      Similarly, does feeding heat-killed Ecc15 or Bt induce sequestration in the anterior midgut (larvae may be fed dextran-FITC at the same time to track bacteria)?

      Unfortunately, in our attempts to test heat-killed or ethanol-killed fluorescent Ecc15 for these experiments, we encountered an issue: while we were able to efficiently kill the bacteria, we lost the GFP signal required to track their position in the gut. This made it challenging to assess whether sequestration in the anterior midgut occurs with non-viable bacteria.

      Is uracil or Bt toxin feeding sufficient to induce valve closure? 

      As previously mentioned, uracil is a strong candidate for bacterial discrimination, and we have tested its role by adding exogenous uracil during Lp-GFP intoxication. However, in these experiments, Lp was not blocked (new data: Supplementary Figure 5). This suggests that uracil alone may not be sufficient to induce valve closure, or it may not be the only factor involved. It is also possible that our method of exogenous uracil supplementation may not be effectively mimicking the endogenous conditions.

      Regarding Bt, we used vegetative cells without Cry toxins in our experiments. Cry toxins are only produced during sporulation and are enclosed in crystals within the spore. The Bt strain we used, 4D22, has been deleted for the plasmids encoding Cry toxins. As a result, there were no Cry toxins present in the Bt-GFP vegetative cells used in our assays. This has been clarified in the Materials and Methods section of the manuscript.

      Would Bleomycin induce the same phenotype? 

      Indeed, Bleomycin, as well as paraquat, has been shown to damage the gut and trigger intestinal cell proliferation in adult Drosophila through mechanisms involving TrpA1. Testing whether Bleomycin induces a similar phenotype in larvae would indeed be interesting.

      However, one challenge we face in our intoxication protocol is that larvae tend to stop feeding when chemicals are added to their food mixture. We encountered similar difficulties in our DTT experiments, which were challenging to set up for this reason. Consequently, we aim to avoid approaches that might impair the general feeding activity of the larvae, as it can significantly affect the outcomes of our experiments.

      Could this process of sphincter closure be more related to food poisoning?

      If gut damage were the primary trigger for sphincter closure, we would indeed expect the blockage phenomenon to occur later following bacterial exposure. However, in our experiments, we observe the blockage occurring early after bacterial contact, suggesting that damage may not be the main trigger for this response.

      That said, we have not yet tested bacterial mutants lacking toxins, nor have we tested a direct damaging agent such as Bleomycin, as proposed. These would be valuable future experiments to explore the potential role of gut damage more thoroughly in this process.

      (5) Is Imd activation normal in trpA1 and DH31 mutants? The authors could use a diptericin reporter gene to check if Diptericin is affected by a lack of valve closure in trpA1.

      To address this, we performed RT-qPCR on whole larval guts from wt, TrpA11 and Dh31KG09001 genetic backgrounds. Larvae were fed with Lp, Ecc, Bt or yeast only (new data: SUPP6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the genotypes.

      Additionally, we provide imaging data from AMP reporter animals (pDpt-Cherry) in a wildtype background, fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These images also support the conclusion that Diptericin expression is not significantly affected by a lack of valve closure in trpA1 and Dh31 mutants.

      (6) Are the 2-6 DH31 positive cells the same cells described by Zaidman et al., Developmental and Comparative Immunology 36 (2012) 638-647?

      The cells identified as hemocytes in the midgut junctions by Zaidman et al. are likely the same cells we describe in our study, as they are located in the same region and are Dh31 positive. We have added a reference to this paper and included lines in the manuscript acknowledging this connection.

      Although confirming whether these cells are Hml+, Dh31+, and TrpA1+ would clarify their exact identity, this falls outside the scope of our current study. However, the possibility that these cells play a role in physical barrier immunity and also possess a hemocyte identity is indeed intriguing, and we hope future research will explore this further.

      Minor points

      (1) The mutations should be appropriately labelled with the allele name.

      This has been fixed in the main text, in Fig Legends, and in figures. 

      (2) Line 230-231: the sentence is unclear to me.

      We simplified the sentence and no longer refer to the expulsion in larvae.

      (3) Discussion: although the discussion is already a bit long, it would be interesting to see if this process is likely to happen/has been described in other insects (mosquito, Bactrocera, ...).

      We reviewed the available literature but were unable to find specific examples describing the blockage phenomenon in other insects. Most studies we found focused on symbiotic bacteria rather than pathogenic or opportunistic bacteria. However, as mentioned in our manuscript, the anterior localization of opportunistic or pathogenic bacteria has been observed in Drosophila by independent research groups.

      (4) Line 546: add the Caudal Won-Jae Lee paper to state the posterior midgut is less microbicidal.

      We added the reference in the appropriate place and noted that it concerns adults.

      (5)  Figure 6 indicates what the cells are, shown by the arrow.

      The sentence ‘the arrows point to TARMs’ is present in the legend of Figure 6.

      (6) Does the sphincter closure depend on hemocytes?

      As mentioned above, the cells we identify as TrpA1+ in the midgut junction may be the same cells described by Zaidman et al., 2012, and earlier by Lajeunesse et al., 2010. Inactivating hemocytes using the Hml-Gal4 driver may also affect these Dh31+ cells, as they share similarities with hemocytes, as pointed out by Zaidman et al. However, distinguishing between hemocytes and Dh31+/TrpA1+ cells would require a genetic intersectional approach, which is beyond the scope of our current study.

      Nevertheless, the possibility that these cells play a dual role in immunity (through blockage) and share characteristics with hemocytes while functioning as enteroendocrine cells (EECs) is quite intriguing and deserves further exploration in future studies.

    1. Author response:

      Reviewer #1 (Public review):

      Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively these markers are referred to as the "neuron pool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicate that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecule (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus, it seems likely that the neuron pool will miss many neurons that only produce glutamate, glycine or a neuropeptide.

      In response to your comments, we agree that our initial statement regarding the "neuron pool" overstated the extent of neuronal coverage provided by the five selected markers. We have revised the sentence as “The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx”. 

      Furthermore, we chose the five neurotransmitter systems (cholinergic, GABAergic, octopaminergic, dopaminergic, and serotonergic) based on their well-characterized roles in planarian neurobiology and the availability of reliable markers. However, we acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling, which have been documented in S. mediterranea. We will also add the content about other neuron types in our revised manuscript “Additionally, there is considerable diversity among glutamatergic, glycinergic, and peptidergic neurons in planarians. Many neurons in S. mediterranea express more than one neurotransmitter or neuropeptide, which adds further complexity to the system.”

      The authors use their technique to image the neural network of the CNS using antibodies raised against Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater than normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma.

      Thank you for your detailed feedback. While we did not perform segmentation of all neuron fibers, we were able to segment more isolated fibers that were not densely packed within the neural tracts. We use 120 nm resolution to segment neurons along the three axes. Our data show the presence of both contralateral and ipsilateral projections of visual neurons. Although Figure 1C-F shows data from one planarian, we imaged three independent specimens to confirm the consistency of these observations. In the revised manuscript, we will include a discussion on the limitations of TLSM in reconstructing neural networks, particularly when it comes to resolving fibers within densely packed regions of the nerve tracts.

      The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations. 

      We have removed the statement "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." We changed this statement into “These results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is not likely from the octopaminergic, GABAergic, dopaminergic and serotonergic neurons. Since our neuron pool may not include glutamatergic, glycinergic, and peptidergic neurons, we would like to add the possibility that the non-linear dynamics may be from cholinergic neurons or other neurons not included in our staining.”

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) The proprietary nature of the microscope, protected by a patent, limits the technical details provided, making the method hard to reproduce in other labs. 

      Thank you for your comment. We understand the importance of reproducibility and transparency in scientific research. We would like to point out that the detailed design and technical specifications of the TLSM are publicly available in our published work: Chen et al., Cell Reports, 2020. Additionally, the protocol for C-MAP, including the specific experimental steps, is comprehensively described in the methods section of this paper. We believe that these resources should provide sufficient information for other labs to replicate the method.

      (2) The resolution of the analyses is mostly limited to the cellular level, which does not fully leverage the advantages of expansion microscopy. Previous applications of expansion microscopy have revealed finer nanostructures in the planarian nervous system (see Fan et al. Methods in Cell Biology 2021; Wang et al. eLife 2021). It is unclear whether the current protocol can achieve a comparable resolution. 

      Thank you for raising this important point. The strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. While our current analysis focused on cellular-level structures, our method can achieve comparable or better resolution and we will add this information in the revised manuscript.

      (3) The data largely corroborate past observations, while the novel claims are insufficiently substantiated. 

      A few major issues with the claims: 

      (4) Line 303-304: While 6G10 is a widely used antibody to label muscle fibers in the planarian, it doesn't uniformly mark all muscle types (Scimone et al. Nature 2017). For a more complete view of muscle fibers, it is important to use a combination of antibodies targeting different fiber types or a generic marker such as phalloidin. This raises fundamental concerns about all the conclusions drawn from Figures 4 and 6 about differences between various muscle types. Additionally, the authors should cite the original paper that developed the 6G10 antibody (Ross et al. BMC Developmental Biology 2015).

      We appreciate the reviewer’s insightful comments and acknowledge that 6G10 does not uniformly label all muscle fiber types. We agree that this limitation should be recognized in the interpretation of our results. We will revise the manuscript to explicitly state the limitations of using 6G10 alone for muscle fiber labeling and highlight the need for additional markers. We will also clarify that the primary objective of our study was not to distinguish all muscle fiber types but rather to demonstrate the application of our 3D tissue reconstruction method in addressing traditional research questions. Nonetheless, we agree that expanding the labeling strategy in future studies would allow for a more thorough investigation of muscle fiber diversity. We will ensure all citations are properly revised and updated in our next version.

      (5) Lines 371-379: The claim that DV muscles regenerate into longitudinal fibers lacks evidence. Furthermore, previous studies have shown that TFs specifying different muscle types (DV, circular, longitudinal, and intestinal) both during regeneration and homeostasis are completely different (Scimone et al., Nature 2017 and Scimone et al., Current Biology 2018). Single-cell RNAseq data further establishes the existence of divergent muscle progenitors giving rise to different muscle fibers. These observations directly contradict the authors' claim, which is only based on images of fixed samples at a coarse time resolution. 

      Thank you for your valuable feedback. Our intent was not to suggest that DV muscles regenerate into longitudinal fibers. Our observations focused on the wound site, where DV muscle fibers appear to reconnect, and longitudinal fibers, along with other muscle types, gradually regenerate to restore the structure of the injured area. We will revise the relevant sections of the manuscript to clarify this dynamic process more accurately.

      (6) Line 423: The manuscript lacks evidence to claim glia guide muscle fiber branching. 

      We will remove this statement from the revised version. Instead, we will focus on describing our observations of the connections between glial cells and muscle fibers.

      (7) Lines 432/478: The conclusion about neuronal and muscle guidance on glial projections is similarly speculative, lacking functional evidence. It is possible that the morphological defects of estrella+ cells after bcat1 RNAi are caused by Wnt signaling directly acting on estrella+ cells independent of muscles or neurons. 

      We understand that this approach is insufficient and we will revise the manuscript to more clearly state the limitations of our data. We will describe our observations as preliminary and suggest that further experiments are required.

      (8) Finally, several technical issues make the results difficult to interpret. For example, in line 125, cell boundaries appear to be determined using nucleus images; in line 136, the current resolution seems insufficient to reliably trace neural connections, at least based on the images presented. 

      We use two setups for imaging cells and neuron projections. For cellular resolution imaging, we utilized a 1× air objective with a numerical aperture (NA) of 0.25 and a working distance of 60 mm (OLYMPUS MV PLAPO). The voxel size used was 0.8×0.8×2.5 µm³. This configuration resulted in a resolution of 2×2×5 µm³ and a spatial resolution of 0.5×0.5×1.25 µm³ with 4× isotropic expansion. Alternatively, for sub-cellular imaging, we employed a 10×0.6 SV MP water immersion objective with 0.8 NA and a working distance of 8 mm (OLYMPUS). The voxel size used in this configuration was 0.26×0.26×0.8 µm³. As a result of this configuration, we achieved a resolution of 0.5×0.5×1.6 µm³ and a spatial resolution of 0.12×0.12×0.4 µm³ with a 4.5× isotropic expansion. The higher resolution achieved with sub-cellular imaging allows us to observe finer structures and trace neural connections.
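
      The relationship between the optical resolution and the post-expansion spatial resolution quoted above can be checked with a short calculation. The sketch below is illustrative only and assumes that the effective resolution is simply the optical resolution divided by the linear expansion factor along each axis; the function name effective_resolution is hypothetical and is not taken from the authors' software. Under that assumption, the sub-cellular setup gives roughly 0.11×0.11×0.36 µm³, which is consistent with the rounded figure of 0.12×0.12×0.4 µm³ reported in the response.

```python
def effective_resolution(optical_um, expansion_factor):
    """Divide each axis of the optical resolution (in micrometres)
    by the linear expansion factor, rounding to two decimals.

    Assumption: expansion is isotropic, so one factor applies to x, y and z.
    """
    return tuple(round(axis / expansion_factor, 2) for axis in optical_um)


# Cellular setup: 2 x 2 x 5 um optical resolution, 4x expansion
cellular = effective_resolution((2.0, 2.0, 5.0), 4.0)

# Sub-cellular setup: 0.5 x 0.5 x 1.6 um optical resolution, 4.5x expansion
subcellular = effective_resolution((0.5, 0.5, 1.6), 4.5)

print("cellular:", cellular)
print("sub-cellular:", subcellular)
```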

      Regarding your question about cell boundaries, we will revise the manuscript to specify that the boundaries we identified are those of each nucleus, rather than entire cells. This distinction will be made clear in the revised version.

      Reviewer #3 (Public review): 

      Weaknesses: 

      (1) The work would have been strengthened by a more careful consideration of previous literature. Many papers directly relevant to this work were not cited. Such omissions do the authors a disservice because in some cases, they fail to consider relevant information that impacts the choice of reagents they have used or the conclusions they are drawing. 

      For example, when describing the antibody they use to label muscles (monoclonal 6G10), they do not cite the paper that generated this reagent (Ross et al PMCID: PMC4307677), and instead, one of the papers they do cite (Cebria 2016) that does not mention this antibody. Ross et al reported that 6G10 does not label all body wall muscles equivalently, but rather "predominantly labels circular and diagonal fibers" (which is apparent in Figure S5A-D of the manuscript being reviewed here). For this reason, the authors of the paper showing different body wall muscle populations play different roles in body patterning (Scimone et al 2017, PMCID: PMC6263039, also not cited in this paper) used this monoclonal in combination with a polyclonal antibody to label all body wall muscle types. Because their "pan-muscle" reagent does not label all muscle types equivalently, it calls into question their quantification of the different body wall muscle populations throughout the manuscript. It does not help matters that their initial description of the body wall muscle types fails to mention the layer of thin (inner) longitudinal muscles between the circular and diagonal muscles (Cebria 2016 and citations therein). 

      Ipsilateral and contralateral projections of the visual axons were beautifully shown by dye-tracing experiments (Okamoto et al 2005, PMID: 15930826). This paper should be cited when the authors report that they are corroborating the existence of ipsilateral and contralateral projections. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript. We acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling. We will also add the content about other neuron types in our revised version.

      (2) The proportional decrease of neurons with growth in S. mediterranea was shown by counting different cell types in macerated planarians (Baguna and Romero, 1981; https://link.springer.com/article/10.1007/BF00026179) and earlier histological observations cited there. These results have also been validated by single-cell sequencing (Emili et al, bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.11.01.565140v). Allometric growth of the planaria tail (the tail is proportionately longer in large vs small planaria) can explain this decrease in animal size. The authors never really discuss allometric growth in a way that would help readers unfamiliar with the system understand this. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript.

      (3) In some cases, the authors draw stronger conclusions than their results warrant. The authors claim that they are showing glial-muscle interactions, however, they do not provide any images of triple-stained samples labeling muscle, neurons, and glia, so it is impossible for the reader to judge whether the glial cells are interacting directly with body wall muscles or instead with the well-described submuscular nerve plexus. Their conclusion that neurons are unaffected by beta-cat or inr-1 RNAi based on anti-phospho-Ser/Thr staining (Fig. 6E) is unconvincing. They claim that during regeneration "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373). They provide no evidence for such switching of muscle cell types, so it is unclear why they say this. 

      We acknowledge that some of our conclusions were overclaimed given the current data, and we appreciate the opportunity to clarify and refine these claims in the revised manuscript. Regarding the statement that "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373), as addressed in our previous response, this phrasing was unclear. Our intent was not to imply that DV muscles switch into longitudinal fibers. Instead, we observed that muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure. We will revise this section to better describe the dynamic changes observed during regeneration.

      (4) The authors show how their automated workflow compares to manual counts using PI-stained specimens (Figure S1T). I may have missed it, but I do not recall seeing a similar ground truth comparison for their muscle fiber counting workflow. I mention this because the segmented image of the posterior muscles in Figure 4I seems to be missing the vast majority of circular fibers visible to the naked eye in the original image. 

      Thank you for raising this important point. We will include a ground truth comparison of our automated muscle fiber counting with manual counts in the supplementary figures. Regarding the observation of missing circular fibers in Figure 4I, we agree that the segmentation appears to have missed a significant number of circular fibers in this particular image. This may have been due to limitations in the current parameters of the segmentation algorithm, especially in distinguishing fibers in regions of varying intensity or overlap. We are revisiting the segmentation parameters to improve the accuracy of detecting circular fibers, and we will provide an updated version of Figure 4I in the revised manuscript.

      (5) It is unclear why the abstract says, "We found the rate of neuron cell proliferation tends to lag..." (line 25). The authors did not measure proliferation in this work and neurons do not proliferate in planaria. 

      Thank you for bringing this to our attention. We intended to convey the increase in neuron number during homeostasis. We will revise the abstract to correct this wording and instead describe the increase in neuron numbers due to progenitor cell differentiation during homeostasis.

      (6) It is unclear what readers are to make of the measurements of brain lobe angles. Why is this a useful measurement and what does it tell us? 

      The measurement of brain lobe angles is intended to provide a quantitative assessment of the growth and morphological changes of the planarian brain during regeneration. Additionally, the relevance of brain lobe angles has been explored in previous studies, such as Arnold et al., Nature, 2016, further supporting its use as a meaningful parameter.

      (7) The authors repeatedly say that this work lets them investigate planarians at the single-cell level, but they don't really make the case that they are seeing things that haven't already been described at the single-cell level using standard confocal microscopy. 

      Thank you for your comment. We agree that single-cell level imaging has been previously achieved in planarians using conventional confocal microscopy. However, our goal was to extend the application of expansion microscopy by combining C-MAP with tiling light sheet microscopy (TLSM), which allows for faster and high-resolution 3D imaging of whole-mount planarians. This combination offers several key advantages over traditional confocal microscopy. For example, it enables high-throughput imaging across entire organisms with a level of detail and speed that is not easily achieved using confocal methods. This approach allows us to investigate the planarian nervous system at multiple developmental and regenerative stages in a more comprehensive manner, capturing large-scale structures while preserving fine cellular details. The ability to rapidly image whole planarians in 3D with this resolution provides a more efficient workflow for studying complex biological processes. We believe this distinction is significant and represents an advance over previous methods. We will clarify this point in the manuscript to better distinguish our approach from standard techniques.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this article, the authors described mouse models of Becker muscular dystrophy. They created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Implementing this point would improve the impact of the work.

      We thank the reviewer for their constructive suggestions and comments on this work. Muscle hypertrophy is shown with growth in dystrophin-deficient skeletal muscle in mdx mice; thus, we did not pay attention to the factors associated with muscle atrophy in BMD mice. As the reviewer suggested, the examination of the association between type IIa fiber reduction and muscle atrophy is important, and the result is considered to be helpful in resolving the cause of type IIa fiber reduction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the cross-sectional areas (CSA) of muscles and compare them with the changes in the proportion of type IIa fibers.

      (2) Evaluate the expression levels of Murf1 and Atrogin1 as markers of muscle atrophy using RT-PCR.

      Reviewer #2 (Public review):

      Summary

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors have inadequate experiments to support their hypothesis that the decay of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fibers is needed, which could clarify the mechanisms behind these observations.

      We thank the reviewer for these positive comments and the very important suggestion about type IIa fiber reduction and capillary change around muscle fibers in BMD mice. In the cardiotoxin-induced muscle degeneration and regeneration model, type IIa and IIx fibers showed delayed recovery compared with type IIb fibers. However, this delayed recovery of type IIa and IIx fibers could not explain the selective reduction limited to type IIa fibers in BMD mice. Therefore, we considered vascular dysfunction as the reason for the selective type IIa fiber reduction, and we found morphological capillary changes from a “ring pattern” to a “dot pattern” around type IIa fibers in BMD mice. However, the association between selective type IIa fiber reduction and the capillary change around muscle fibers in BMD mice remains unclear due to the lack of information about capillaries around type IIx and IIb fibers. The reviewer pointed out this insufficient evaluation of capillaries around other muscle fibers (except for type IIa fibers), and this suggestion is very helpful for explaining the association between selective type IIa fiber reduction and vascular dysfunction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the changes in capillary formation around other muscle fibers, except for type IIa fibers (e.g., type IIx and IIb fibers).

      (2) Evaluate the endothelial area around other muscle fibers, except for type IIa fibers.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single-lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such a batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies, as is often done in research on rare tumors.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about SiNET6 classification (it is labeled in Fig. 4a in a misleading manner). In the revised manuscript, we will correct the labeling in Fig. 4a and clarify that SiNET6 is not assigned to any subtype. We will further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.

      (2) Results: Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we will note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) is consistently observed in both platforms, as shown in Fig. 4a, and therefore appears to be reliable despite the limitations of our work. We will clarify this consistency in the revised manuscript. 

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We agree with this comment and will add the need for additional validation for this finding in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      As you can see from the assessment (which is unchanged from before) and the reviews included below, the reviewers felt that the revisions did not yet address all of the major concerns. There was agreement that the strength of evidence would be upgraded to "solid" by addressing, at minimum, the following: 

      (1) Which of the results are significant for individual monkeys; and 

      (2) How trials from different target contrasts were analyzed 

      In this revision, we have addressed the two primary editorial recommendations:

      (1) We apologize if this information was not clear in the previous version. We have updated Table 1 to highlight clearly the significant results for individual monkeys. Six of our key results – pupil diameter (Fig 2B), microsaccades (Fig 2D), decoding performance for narrow-spiking units (Fig 3A), decoding performance for broad-spiking units (Fig 3B), target-evoked firing rate for all units (Fig 3E) and target-evoked firing rate for broad-spiking units (Fig 3F) – are significant for individual animals and therefore give us high confidence in our results. Please also note that we present all results for individual animals in the Supplementary figures accompanying each main figure.

      (2) We have updated the manuscript and methods to explain how trials of each contrast were included in each analysis, and how contrast normalization was performed for the analysis in Figure 3. In addition, we discuss this point in the Discussion section, which we quote below:

      “Non-target stimulus contrasts were slightly different between hits and misses (mean: 33.1% in hits, 34.0% in misses, permutation test, 𝑝 = 0.02), but the contrast of the target was higher in hits compared to misses (mean: 38.7% in hits, 27.7% in misses, permutation test, 𝑝 = 1.6e−31). To control for potential effects of stimulus contrast, firing rates were first normalized by contrast before performing the analyses reported in Figure 3. For all other results, we considered only non-target stimuli, which had very minor differences in contrast (<1%) across hits and misses. In fact, this minor difference was in the opposite direction of our results with mean contrast being slightly higher for misses. While we cannot completely rule out any other effects of stimulus contrast, the normalization in Figure 3 and minor differences for non-target stimuli should minimize them.”
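
      For readers unfamiliar with the procedure, the kind of permutation test named in the quoted passage can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function name `permutation_test`, the two-sided difference-of-means statistic, the number of permutations, and the contrast values in the usage example are all hypothetical.

```python
import random


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Hypothetical helper for illustration; the statistic and settings
    are assumptions, not taken from the manuscript.
    """
    rng = random.Random(seed)
    n_a = len(a)
    observed = sum(a) / n_a - sum(b) / len(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic target contrasts (percent); made-up values for illustration only.
    hit_contrasts = [38.0 + 0.1 * i for i in range(30)]
    miss_contrasts = [27.0 + 0.1 * i for i in range(30)]
    diff, p = permutation_test(hit_contrasts, miss_contrasts, n_perm=2000)
    print(f"difference in means = {diff:.2f}, p = {p:.4g}")
```

      With a finite number of permutations the smallest p-value this sketch can return is 1/(n_perm + 1), so a value as small as the one quoted above would require a different way of estimating the tail probability.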

      Reviewer #1 (Public Review): 

      Summary: 

      In this study, Nandy and colleagues examine neural, physiological and behavioral correlates of perceptual variability in monkeys performing a visual change detection task. They used a laminar probe to record from area V4 while two macaque monkeys detected a small change in stimulus orientation that occurred at a random time in one of two locations, focusing their analysis on stimulus conditions where the animal was equally likely to detect (hit) or not-detect (miss) a briefly presented orientation change (target). They discovered two behavioral and physiological measures that are significantly different between hit and miss trials - pupil size tends to be slightly larger on hits vs. misses, and monkeys are more likely to miss the target on trials in which they made a microsaccade shortly before target onset. They also examined multiple measures of neural activity across the cortical layers and found some measures that are significantly different between hits and misses. 

      Strengths: 

      Overall the study is well executed and the analyses are appropriate (though several issues still need to be addressed as discussed in Specific Comments). 

      Thank you.

      Weaknesses: 

      My main concern with this study is that, with the exception of the pre-target microsaccades, the correlates of perceptual variability (differences between hits and misses) appear to be weak, potentially unreliable and disconnected. The GLM analysis of predictive power of trial outcome based on the behavioral and neural measures is only discussed at the end of the paper. This analysis shows that some of the measures have no significant predictive power, while others cannot be examined using the GLM analysis because these measures cannot be estimated in single trials. Given these weak and disconnected effects, my overall sense is that the current results provide limited advance to our understanding of the neural basis of perceptual variability. 

      Please see our response above to item #1 of the editorial recommendation. Six of our key results are individually significant in both animals giving us high confidence about the reliability and strength of our results. 

      Regarding the reviewer’s comment about the GLM, we note (also stated in the manuscript) that among the measures that we could estimate reliably on a single trial basis, two of these – pre-target microsaccades and input-layer firing rates – were reliable signatures of stimulus perception at threshold. This analysis does not imply that the other measures – Fano Factor, PPC, inter-laminar population correlations, SSC (which are all standard tools in modern systems neuroscience, and which cannot be estimated on a single-trial basis) – are irrelevant. Our intent in including the GLM analyses was to complement the results reported from these across-trial measures (Figs 4-7) with the predictive power of single-trial measures.

      While no study is entirely complete in itself, we have attempted to synthesize our results into a conceptual model as depicted in Fig 8.

      Reviewer #2 (Public Review): 

      Strengths: 

      The experiments were well-designed and executed with meticulous control. The analyses of both behavioural and electrophysiological data align with the standards in the field. 

      Thank you.

      Weaknesses: 

      Many of the findings appear to be subtle differences and incremental compared to previous literature, including the authors' own work. While incremental findings are not necessarily a problem, the manuscript lacks clear statements about the extent to which the dataset, analysis, and findings overlap with the authors' prior research. For example, one of the main findings, which suggests that V4 neurons exhibit larger visual responses in hit trials (as shown in Fig. 3), appears to have been previously reported in their 2017 paper. 

      We respectfully disagree with the assessment that the findings reported here are incremental over the results reported in our prior study (Nandy et al., 2017). In the previous study, we compared the laminar profile of neural modulation due to the deployment of attention, i.e., the main comparison points were the attend-in and the attend-away conditions while controlling for visual stimulation. In this study, we go one step further and home in on the attend-in condition and investigate the differences in the laminar profile of neural activity (and two additional physiological measures: pupil and microsaccades) when the animal either correctly reports or fails to report a stimulus with equal probability. We thus control for both the visual stimulation and the cued attention state of the animal. While there are parallels to our previous results (as the reviewer correctly noted), the results reported here cannot be trivially predicted from our previous results. Please also note that we discuss our new results in the context of prior results, from both our group and others, in the manuscript (lines 310-332).

      Furthermore, the manuscript does not explore potentially interesting aspects of the dataset. For instance, the authors could have investigated instances where monkeys made 'false' reports, such as executing saccades towards visual stimuli when no orientation change occurred, which allows for a broader analysis that considers the perceptual component of neural activity over pure sensory responses. Overall, the manuscript lacks broad interest in its current form.

      We appreciate the reviewer’s feedback on analyzing false alarm trials. Our focus for this study was to investigate the behavioral and neural correlates accompanying a correct or incorrect perception of a target stimulus presented at perceptual threshold. False alarm trials, by definition, do not include a target presentation. Moreover, false alarm rates rapidly decline with duration into a trial, with high rates during the first non-target presentation and rates close to zero by the time of the eighth presentation (see figure). Investigating false alarms will thus involve a completely different form of analysis than we have undertaken here. We therefore feel that while analyzing false alarm trials will be an interesting avenue to pursue in the future, it is outside the scope of the present study.

      Author response image 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      New Experiments

      (1) Activation-dependent dynamics of PKA with the RIα regulatory subunit, adding to the answer to Reviewers 1 and 2. To determine the dynamics of all PKA isoforms, we have added experiments that used PKA-RIα as the regulatory subunit. We found differential translocation between PKA-C (co-expressed with PKA-RIα) and PKA-RIα (Figure 1–figure supplement 3), similar to the results when PKA-RIIα or PKA-RIβ was used.

      (2) PKA-C dynamics elicited by a low concentration of norepinephrine, addressing Reviewer 3’s comment. We have found that PKA-C (co-expressed with RIIα) exhibited similar translocation into dendritic spines in the presence of a 5x lower concentration (2 μM) of norepinephrine, suggesting that the translocation occurs over a wide range of stimulus strengths (Figure 1–figure supplement 2).

      Reviewer #1 (Public Review):

      Summary:

      This is a short self-contained study with a straightforward and interesting message. The paper focuses on settling whether PKA activation requires dissociation of the catalytic and regulatory subunits. This debate has been ongoing for ~ 30 years, with renewed interest in the question following a publication in Science, 2017 (Smith et al.). Here, Xiong et al demonstrate that fusing the R and C subunits together (in the same way as Smith et al) prevents the proper function of PKA in neurons. This provides further support for the dissociative activation model - it is imperative that researchers have clarity on this topic since it is so fundamental to building accurate models of localised cAMP signalling in all cell types. Furthermore, their experiments highlight that C subunit dissociation into spines is essential for structural LTP, which is an interesting finding in itself. They also show that preventing C subunit dissociation reduces basal AMPA receptor currents to the same extent as knocking down the C subunit. Overall, the paper will interest both cAMP researchers and scientists interested in fundamental mechanisms of synaptic regulation.

      Strengths:

      The experiments are technically challenging and well executed. Good use of control conditions, e.g., untransfected controls in Figure 4.

      We thank the reviewer for their accurate summarization of the position of the study in the field and for the positive evaluation of our study.

      Weaknesses:

      The novelty is lessened given the same team has shown dissociation of the C subunit into dendritic spines from RIIbeta subunits localised to dendritic shafts before (Tillo et al., 2017). Nevertheless, the experiments with RII-C fusion proteins are novel and an important addition.

      We thank the reviewer for noticing our earlier work. The first part of the current work is indeed an extension of previous work, as we have articulated in the manuscript. However, this extension is important because recent studies suggested that the majority of PKA-RIIβ is axonally localized. The primary PKA subtypes in the soma and dendrite are likely PKA-RIβ or PKA-RIIα. Although it is conceivable that the results from PKA-RIIβ can be extended to the other subunits, given the current debate in the field regarding PKA dissociation (or not), it remains important to conclusively demonstrate that these other regulatory subunit types also support PKA dissociation within intact cells in response to a physiological stimulant. To complete the survey for all PKA-R isoforms, we have now added data for PKA-RIα (New Experiment #1), as it is also expressed in the brain (e.g., https://www.ncbi.nlm.nih.gov/gene/5573). Additionally, as the reviewer points out, our second part is a novel addition to the literature.

      Reviewer #2 (Public Review):

      Summary:

      PKA is a major signaling protein that has been long studied and is vital for synaptic plasticity. Here, the authors examine the mechanism of PKA activity and specifically focus on addressing the question of PKA dissociation as a major mode of its activation in dendritic spines. This would potentially allow us to determine the precise mechanisms of PKA activation and address how it maintains spatial and temporal signaling specificity.

      Strengths:

      The results convincingly show that PKA activity is governed by the subcellular localization in dendrites and spines and is mediated via subunit dissociation. The authors make use of organotypic hippocampal slice cultures, where they use pharmacology, glutamate uncaging, and electrophysiological recordings.

      Overall, the experiments and data presented are well executed. The experiments all show that at least in the case of synaptic activity, the distribution of PKA-C to dendritic spines is necessary and sufficient for PKA-mediated functional and structural plasticity.

      The authors were able to persuasively support their claim that PKA subunit dissociation is necessary for its function and localization in dendritic spines. This conclusion is important to better understand the mechanisms of PKA activity and its role in synaptic plasticity.

      We thank the reviewer for their positive evaluation of our study.

      Weaknesses:

      While the experiments are indeed convincing and well executed, the data presented is similar to previously published work from the Zhong lab (Tillo et al., 2017, Zhong et al 2009). This reduces the novelty of the findings in terms of re-distribution of PKA subunits, which was already established. A few alternative approaches for addressing this question: targeting localization of endogenous PKA, addressing its synaptic distribution, or even impairing within intact neuronal circuits, would highly strengthen their findings. This would allow us to further substantiate the synaptic localization and re-distribution mechanism of PKA as a critical regulator of synaptic structure, function, and plasticity.

      We thank the reviewer for noticing our earlier work. The first part of the current work is indeed an extension of previous work, as we have articulated in the manuscript. However, this extension is important because recent studies suggested that the majority of PKA-RIIβ is axonally localized. The primary PKA subtypes in the soma and dendrite are likely PKA-RIβ or PKA-RIIα. Although it is conceivable that the results from PKA-RIIβ can be extended to the other subunits, given the current debate in the field regarding PKA dissociation (or not), it remains important to conclusively demonstrate that these other regulatory subunit types also support PKA dissociation within intact cells in response to a physiological stimulant. To complete the survey for all PKA-R isoforms, we have now added data for PKA-RIα (New Experiment #1), as it is also expressed in the brain (e.g., https://www.ncbi.nlm.nih.gov/gene/5573). Additionally, as Reviewer 1 points out, our second part is a novel addition to the literature.

      We also thank the reviewer for suggesting the experiments to examine PKA’s synaptic localization and dynamics as a key mechanism underlying synaptic structure and function. We agree that this is a very interesting topic. At the same time, we feel that this mechanistic direction is open-ended at this time and beyond what we try to conclude within this manuscript: prevention of PKA dissociation in neurons affects synaptic function. Therefore, we will save the suggested direction for future studies. We hope the reviewer understands.

      Reviewer #3 (Public Review):

      Summary:

      Xiong et al. investigated the debated mechanism of PKA activation using hippocampal CA1 neurons under pharmacological and synaptic stimulations. Examining the two PKA major isoforms in these neurons, they found that a portion of PKA-C dissociates from PKA-R and translocates into dendritic spines following norepinephrine bath application. Additionally, their use of a non-dissociable form of PKA demonstrates its essential role in structural long-term potentiation (LTP) induced by two-photon glutamate uncaging, as well as in maintaining normal synaptic transmission, as verified by electrophysiology. This study presents a valuable finding on the activation-dependent re-distribution of PKA catalytic subunits in CA1 neurons, a process vital for synaptic functionality. The robust evidence provided by the authors makes this work particularly relevant for biologists seeking to understand PKA activation and its downstream effects essential for synaptic plasticity.

      Strengths:

      The study is methodologically robust, particularly in the application of two-photon imaging and electrophysiology. The experiments are well-designed with effective controls and a comprehensive analysis. The credibility of the data is further enhanced by the research team's previous works in related experiments. The conclusions of this paper are mostly well supported by data. The research fills a significant gap in our understanding of PKA activation mechanisms in synaptic functioning, presenting valuable insights backed by empirical evidence.

      We thank the reviewer for their positive evaluation of our study.

      Weaknesses:

      The physiological relevance of the findings regarding PKA dissociation is somewhat weakened by the use of norepinephrine (10 µM) in bath applications, which might not accurately reflect physiological conditions. Furthermore, the study does not address the impact of glutamate uncaging, a well-characterized physiologically relevant stimulation, on the redistribution of PKA catalytic subunits, leaving some questions unanswered.

      We agree with the Reviewer that testing under physiological conditions is critical, especially given the current debate in the literature. That is why we tested PKA dynamics induced by the physiological stimulant, norepinephrine. It has been suggested that, near the release site, local norepinephrine concentrations can be as high as tens of micromolar (Courtney and Ford, 2014). Based on this study, we have chosen a mid-range concentration (10 μM). At the same time, in light of the Reviewer’s suggestion, we have now also tested PKA-RIIα dissociation at a 5x lower concentration of norepinephrine (2 μM; New Experiment #2). The activation and translocation of PKA-C is also readily detectable under this condition, to a degree comparable to when 10 μM norepinephrine was used.

      Regarding the suggested glutamate uncaging experiment, it is extremely challenging because of finite signal-to-noise ratios in our experiments. From our past studies, we know that activated PKA-C can diffuse three-dimensionally, with a fraction as membrane-associated proteins and the other as cytosolic proteins. Although we have evidence that its membrane affinity allows it to become enriched in dendritic spines, it is not known (and is unlikely) that activated PKA-C is selectively targeted to a particular spine. Glutamate uncaging of a single spine presumably would locally activate a small number of PKA-C molecules. It will be very difficult to trace the 3D diffusion of this small number of molecules in the presence of surrounding resting-state PKA-C molecules. Finally, we hope the reviewer agrees that, regardless of the result of the glutamate uncaging experiment, the above new experiment (New Experiment #2) already indicates that certain physiologically relevant stimuli can drive PKA-C dissociation from PKA-R and translocation to spines, supporting our conclusion.

      Reviewer #2 (Recommendations For The Authors):

      It was a pleasure reading your paper, and the results are well-executed and well-presented.

      My main and only recommendations are two ways to further expand the scope of the findings.

      First, I believe addressing the endogenous localization of PKA-C subunit before and after PKA activation would be highly important to validate these claims. Overexpression of tagged proteins often shows vastly different subcellular distribution than their endogenous counterparts. Recent technological advances with CRISPR/Cas9 gene editing (Suzuki et al Nature 2016 and Gao et al Neuron 2019 for example) which the Zhong lab recently contributed to (Zhong et al 2021 eLife) allow us to tag endogenous proteins and image them in fixed or live neurons. Any experiments targeting endogenous PKA subunits that support dissociation and synaptic localization following activation would be very informative and greatly increase the novelty and impact of their findings.

      We agree that addressing the endogenous PKA dynamics is important. However, despite recent progress, endogenous labeling using CRISPR-based methods remains challenging and requires extensive optimization. This is especially true for signaling proteins, whose endogenous abundance is often low. We have tried to label PKA catalytic subunits and regulatory subunits using both the homologous recombination-based method SLENDR and our own non-homologous end joining-based method CRISPIE. We did not succeed, in part because it is very difficult to see any signal under wide-field fluorescence conditions, which makes it difficult to screen different constructs when optimizing parameters. It is also possible that, at the endogenous abundance, the label is simply not bright enough to be seen. Nevertheless, for both PKA type Iβ and type IIα that we studied in this manuscript, we have correlated the measured parameters (specifically, Spine Enrichment Index or SEI) with the overexpression level (Figure 1-figure supplement 1). We found that they are not strongly correlated with the expression level under our conditions. By extrapolating to non-overexpression conditions, our conclusion remains valid.

      To overcome the inability to label endogenous PKA subunits using CRISPR-based methods, we have also attempted to label PKA-Cα using a conditional knock-in method called ENABLED that we previously developed. In preliminary results, we found that endogenously labeled PKA was very dim. However, in a subset of cells that were bright enough to be quantified, the PKA catalytic subunit indeed translocated to dendritic spines upon stimulation (see Additional Fig. 1 on the next page), corroborating our results using overexpression. These results, however, are not ready to be published because characterization of the mouse line takes time and, at this moment, the signal-to-noise ratio remains low. We hope that the reviewer can understand.

      Author response image 1.

      Endogenous PKA-Cα translocates to dendritic spines upon activation.

      Second, experiments which would advance and validate these findings in vivo would be highly valuable. This could be achieved in a number of ways - one would be overexpression of tagged PKA versions and examining sub-cellular distribution before and after physiological activation in vivo. Another possibility is in vivo perturbation - one would speculate that disruption or tethering of PKA subunits to the dendrite would lead to cell-specific functional and structural impairments. This could be achieved in a similar manner to the in vitro experiments, with a PKA KO and replacement strategy of the tethered C-R plasmid, followed by structural or functional examination of neurons.

      I would like to state that these experiments are not essential in my opinion, but any improvements in one of these directions would greatly improve and extend the impact and findings of this paper.

      We thank the reviewer for the suggestion and the understanding. The suggested in vivo experiments are fascinating. However, in vivo imaging of dendritic spine morphology is already in itself challenging. The difficulty greatly increases when trying to detect partial, likely transient translocation of a signaling protein. It is also very difficult to knock down endogenous PKA while simultaneously expressing the R-C construct in a large number of cells to achieve detectable circuit or behavioral effect (and hope that compensation does not happen over weeks). We hope the reviewer agrees that these experiments would be their own project and go beyond the time and scope of the current study.

      Reviewer #3 (Recommendations For The Authors):

      Please elaborate on the methods used to visualize PKA-RIIα and PKA-RIβ subunits.

      As suggested, we have now included additional details for visualizing PKA-Rs in the text. Specifically, we write (pg. 5): “…, as visualized using expressed PKA-R-mEGFP in separate experiments (Figs. 1A-1C).”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The authors examined the salt-dependent phase separation of the low-complexity domain of hnRNPA1 (A1-LCD). Using all-atom molecular dynamics simulations, they identified four distinct classes of salt dependence in the phase separation of intrinsically disordered proteins (IDPs), which can be predicted based on their amino acid composition. However, the simulations and analysis, in their current form, are inadequate and incomplete.

      Strengths: 

      The authors attempt to unravel the mechanistic insights into the interplay between salt and protein phase separation, which is important given the complex behavior of salt effects on this process. Their effort to correlate the influence of salt on the low-complexity domain of hnRNPA1 (A1-LCD) with a range of other proteins known to undergo salt-dependent phase separation is an interesting and valuable topic. 

      Weaknesses: 

      (1) The simulations performed are not sufficiently long (Figure 2A) to accurately comment on phase separation behavior. The simulations do not appear to have converged well, indicating that the system has not reached a steady state, rendering the analysis of the trajectories unreliable.

      We have extended the simulations for an additional 500 ns, to 1500 ns. The last 500 ns show reasonably good convergence (see Figure 2A).

      (2) The majority of the data presented shows no significant alteration with changes in salt concentration. However, the authors have based conclusions and made significant comments regarding salt activities. The absence of error bars in the data representation raises questions about its reliability. Additionally, the manuscript lacks sufficient scientific details of the calculations.  

      We have now included error bars. With the error bars, the salt dependences of all the calculated properties (except for Rg) show a clear trend. Additionally, we have expanded the descriptions of our calculations (p. 15-16).

      (3) In Figures 2B and 2C, the changes in the radius of gyration and the number of contacts do not display significant variations with changes in salt concentration. The change in the radius of gyration with salt concentration is less than 1 Å, and the number of contacts does not change by at least 1. The authors' conclusions based on these minor changes seem unfounded. 

      The variation of ~ 1 Å for the calculated Rg is similar to the counterpart for the experimental Rg. As for the number of contacts, note that this property is presented on a per-residue basis, so a value of 1 means that each residue picks up one additional contact, or each protein chain gains a total of 131 contacts, when the salt concentration is increased from 50 to 1000 mM.
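      The per-residue arithmetic in this reply can be written out explicitly. The sketch below assumes the 131-residue length of A1-LCD stated later in this response; the symbols \Delta n_{\text{res}} (change in contacts per residue) and \Delta N_{\text{chain}} (change in contacts per chain) are introduced here only for illustration and are not the manuscript's notation.

```latex
\Delta N_{\text{chain}}
  = \Delta n_{\text{res}} \times N_{\text{res}}
  = 1 \times 131
  = 131
```

      A change that looks small on a per-residue axis therefore corresponds to on the order of a hundred additional contacts for each chain.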

      Reviewer #2 (Public Review): 

      This is an interesting computational study addressing how salt affects the assembly of biomolecular condensates. The simulation data are valuable as they provide a degree of atomistic details regarding how small salt ions modulate interactions among intrinsically disordered proteins with charged residues, namely via Debye-like screening that weakens the effective electrostatic interactions among the polymers, or through bridging interactions that allow interactions between like charges from different polymer chains to become effectively attractive (as illustrated, e.g., by the radial distribution functions in Supplementary Information). However, this manuscript has several shortcomings: 

      (i) Connotations of the manuscript notwithstanding, many of the authors' concepts about salt effects on biomolecular condensates have been put forth by theoretical models, at least back in 2020 and even earlier. Those earlier works afford extensive information such as considerations of salt concentrations inside and outside the condensate (tie-lines). But the authors do not appear to be aware of this body of prior works and therefore missed the opportunity to build on these previous advances and put the present work with its complementary advantages in structural details in the proper context.

      (ii) There are significant experimental findings regarding salt effects on condensate formation [which have been modeled more recently] that predate the A1-LCD system (ref.19) addressed by the present manuscript. This information should be included, e.g., in Table 1, for sound scholarship and completeness. 

      (iii) The strengths and limitations of the authors' approach vis-à-vis other theoretical approaches should be discussed with some degree of thoroughness (e.g., how the smallness of the authors' simulation system may affect the nature of the "phase transition" and the information that can be gathered regarding salt concentration inside vs. outside the "condensate" etc.). Accordingly, this manuscript should be revised to address the following. In particular, the discussion in the manuscript should be significantly expanded by including references mentioned below as well as other references pertinent to the issues raised. 

      (1) The ability to use atomistic models to address the questions at hand is a strength of the present work. However, presumably because of the computational cost of such models, the "phase-separated" "condensates" in this manuscript are extremely small (only 8 chains). An inspection of Fig.1 indicates that while the high-salt configuration (snapshot, bottom right) is more compact and droplet-like than the low-salt configuration (top right), it is not clear that the 50 mM NaCl configuration can reasonably correspond to a dilute or homogeneous phase (without phase separation) or just a condensate with a lower protein concentration because the chains are still highly associated. One may argue that they become two droplets touching each other (the chains are not fully dispersed throughout the simulation box, unlike in typical coarse-grained simulations of biomolecular phase separation). While it may not be unfair to argue from this observation that the condensed phase is less stable at low salt, this raises critical questions about the adequacy of the approach as a stand-alone source of theoretical information. Accordingly, an informative discussion of the limitation of the authors' approach and comparisons with results from complementary approaches such as analytical theories and coarse-grained molecular dynamics will be instructive, even imperative, especially since such results exist in the literature (please see below).

      We now discuss the limitations of our all-atom simulations and also other approaches (p. 13; see below).

      (2) The aforementioned limitation is reflected by the authors' choice of using Dmax as a sort of phase separation order parameter. However, no evidence was shown to indicate that Dmax exhibits a two-state-like distribution expected of phase separation. It is also not clear whether a Dmax value corresponding to the linear dimension of the simulation box was ever encountered in the authors' simulated trajectories such that the chains can be reliably considered to be essentially fully dispersed as would be expected for the dilute phase. Moreover, as the authors have noted in the second paragraph of the Results, the variation of Dmax with simulation time does not show a monotonic rank order with salt concentration. The authors' explanation is equivalent to stipulating that the simulation system has not fully equilibrated, inevitably casting doubt on at least some of the conclusions drawn from the simulation data.

      First, with the extended simulations, the Dmax values converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM). Second, as we now state (p. 13), our low-salt simulations mimic a homogeneous solution whereas our high-salt simulations mimic the dense phase of a phase-separated system. The intermediate-salt simulations also mimic the dense phase but at a somewhat lower concentration (hence the intermediate Dmax value).

      (3) With these limitations, is it realistic to estimate possible differences in salt concentration between the dilute and condensed phases in the present work? These features, including tie-lines, were shown to be amenable to analytical theory and coarse-grained molecular dynamics simulation (please see below).  

      The differences in salt effects that we report do not represent those between two phases. Rather, as explained in the preceding reply, they represent differences between a homogeneous solution at low salt and the dense phase at higher salt. We also acknowledge salt effects calculated by analytical theory and coarse-grained simulations (p. 13).

      (4) In the comparison in Fig.2B between experimental and simulated radius of gyration as a function of [NaCl], there is an outlier among the simulated radii of gyration at [NaCl] ~ 250 mM. An explanation should be offered.  

      After extending the simulations and analyzing the last 500 ns, the Rg data no longer show an outlier though still have some fluctuations from one salt concentration to another.

      (5) The phenomenon of no phase separation at zero and low salt and phase separation at higher salt has been observed for the IDP Caprin1 and several of its mutants [Wong et al., J Am Chem Soc 142, 2471-2489 (2020) [https://pubs.acs.org/doi/full/10.1021/jacs.9b12208], see especially Fig.9 of this reference]. This work should be included in the discussion and added to Table 1.

      We now have added Caprin1 to Table 1 (new ref 26) and discuss this paper (p. 13).

      (6) The authors stated in the Introduction that "A unifying understanding of how salt affects the phase separation of IDPs is still lacking". While it is definitely true that much remains to be learned about salt effects on IDP phase separation, the advances that have already been made regarding salt effects on IDP phase separation are more abundant than this narrative conveys. For instance, an analytical theory termed rG-RPA was put forth in 2020 to provide a uniform (unified) treatment of salt, pH, and sequence-charge-pattern effects on polyampholytes and polyelectrolytes (corresponding to the authors' low net charge and high net charge cases). This theory offers a means to predict salt-IDP tie-lines and a comprehensive account of salt effect on polyelectrolytes resulting in a lack of phase separation at extremely low salt and subsequent salt-enhanced phase separation (similar to the case the authors studied here) and in some cases re-entrant phase separation or dissolution [Lin et al., J Chem Phys 152, 045102 (2020) [https://doi.org/10.1063/1.5139661]]. This work is highly relevant and it already provided a conceptual framework for the authors' atomistic results and subsequent discussion. As such, it should definitely be a part of the authors' discussion.

      We now cite this paper (new ref 34) in Introduction (p. 4). We also discuss its results for Caprin1 (new ref 18; p. 13).

      (7) Bridging interactions by small ions resulting in effective attractive interactions among polyelectrolytes leading to their phase separation have been demonstrated computationally by Orkoulas et al., Phys Rev Lett 90, 048303 (2003) [https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.90.048303]. This result should also be included in the discussion. 

      We now cite this paper (new ref 41; p. 11).

      (8) More recently, the salt-dependent phase separations of Caprin1, its RtoK variants and phosphorylated variant (see item #5 above) were modeled (and rationalized) quite comprehensively using rG-RPA, field-theoretic simulation, and coarse-grained molecular dynamics [Lin et al., arXiv:2401.04873 [https://arxiv.org/abs/2401.04873]], providing additional data supporting a conceptual perspective put forth in Lin et al. J Chem Phys 2020 (e.g., salt-IDP tie-lines, bridging interactions, reentrance behaviors etc.) as well as in the authors' current manuscript. It will be very helpful to the readers of eLife to include this preprint in the authors' discussion, perhaps as per the authors' discretion along the manner in which other preprints are referenced and discussed in the current version of the manuscript. 

      We now cite this paper (new ref 18) and discuss it along with new ref 26 in Discussion (p. 13).

      Reviewer #3 (Public Review): 

      Summary: 

      This study investigates the salt-dependent phase separation of A1-LCD, an intrinsically disordered region of hnRNPA1 implicated in neurodegenerative diseases. The authors employ all-atom molecular dynamics (MD) simulations to elucidate the molecular mechanisms by which salt influences A1-LCD phase separation. Contrary to typical intrinsically disordered protein (IDP) behavior, A1-LCD phase separation is enhanced by NaCl concentrations above 100 mM. The authors identify two direct effects of salt: neutralization of the protein's net charge and bridging between protein chains, both promoting condensation. They also uncover an indirect effect, where high salt concentrations strengthen pi-type interactions by reducing water availability. These findings provide a detailed molecular picture of the complex interplay between electrostatic interactions, ion binding, and hydration in IDP phase separation. 

      Strengths: 

      Novel Insight: The study challenges the prevailing view that salt generally suppresses IDP phase separation, highlighting A1-LCD's unique behavior. 

      Rigorous Methodology: The authors utilize all-atom MD simulations, a powerful computational tool, to investigate the molecular details of salt-protein interactions. 

      Comprehensive Analysis: The study systematically explores a wide range of salt concentrations, revealing a nuanced picture of salt effects on phase separation. 

      Clear Presentation: The manuscript is well-written and logically structured, making the findings accessible to a broad audience. 

      Weaknesses: 

      Limited Scope: The study focuses solely on the truncated A1-LCD, omitting simulations of the full-length protein. This limitation reduces the study's comparative value, as the authors note that the full-length protein exhibits typical salt-dependent behavior. A comparative analysis would strengthen the manuscript's conclusions and broaden its impact.

      Perhaps we did not impress on the reviewer how expensive the all-atom MD simulations on A1-LCD were: the systems each contained half a million atoms and the simulations took many months to complete. That said, we agree with the reviewer that, ideally, a comparative study on a protein showing the typical screening class of salt dependence would have made our work more complete. However, we are confident of the conclusions for several reasons. First, the three salt effects – charge neutralization, bridging, and strengthening of pi-types of interactions – revealed by the all-atom simulations are physically sound and well-supported by other studies. Second, these effects led us to develop a unified picture for the salt dependence of homotypic phase separation, in the form of a predictor for the classes of salt dependence based on amino-acid composition. This predictor works well for nearly 30 proteins. Third, recent studies using analytical theory and coarse-grained simulations (new ref 18) also strongly support our conclusions.

      Reviewer #1 (Recommendations For The Authors): 

      (1) In Figure 1, the color scheme should be updated and the figure remade, as the current set of color choices makes it very difficult to distinguish the magenta spheres.  

      We have increased the sizes of ions in Figure 1 to make them distinguishable.

      (2) Within the framework of atomistic simulations, the influence of salt concentration alteration on protein conformational plasticity is worth investigating. This could be correlated (with proper details) with the effect of salt-concentration-modulated protein aggregation behavior. 

      We now use RMSF to measure conformational plasticity, which shows a clear salt-dependent trend with a 27% reduction in fluctuations from 50 mM to 1000 mM NaCl (new Fig. S1).

      (3) The authors should mention the protein concentrations employed in the simulations and whether these are consistent with experimentally used concentrations.  

      We have mentioned the initial concentration (3.5 mM). We now further state that this concentration is maintained in the low-salt simulations, indicating absence of phase separation, but is increased to 23 mM in the high-salt simulations, indicating phase separation. The latter value is consistent with the measured concentrations in the dense phase (last two paragraphs of p. 5).
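      The link between chain number, volume, and molar concentration quoted in this reply can be illustrated with a short calculation. This is a minimal sketch, not the authors' procedure: the function names are hypothetical, a cubic volume is assumed purely for illustration, and the actual volumes used to obtain the 3.5 mM and 23 mM values are not given in this response.

```python
# Illustrative only: relate a chain count and a volume to a molar concentration.
# Assumption: the chains occupy a cubic volume (not stated in the response).

AVOGADRO = 6.02214076e23        # molecules per mole
ANGSTROM3_PER_LITER = 1.0e27    # 1 L = 1e-3 m^3 = 1e27 cubic angstroms


def molar_concentration_mM(n_chains: int, volume_A3: float) -> float:
    """Concentration in mM of n_chains molecules in a volume given in cubic angstroms."""
    liters = volume_A3 / ANGSTROM3_PER_LITER
    return n_chains / (AVOGADRO * liters) * 1.0e3


def cubic_edge_A(n_chains: int, conc_mM: float) -> float:
    """Edge length (angstroms) of a cube holding n_chains at the given mM concentration."""
    liters = n_chains / (AVOGADRO * conc_mM * 1.0e-3)
    return (liters * ANGSTROM3_PER_LITER) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Eight chains, at the two concentrations quoted in the reply.
    for conc in (3.5, 23.0):
        print(f"{conc:5.1f} mM -> cube edge of about {cubic_edge_A(8, conc):.0f} angstroms")
```

      Under these assumptions, eight chains at 3.5 mM correspond to a cube roughly 156 Å on a side, whereas 23 mM corresponds to a much smaller volume, which is the sense in which the higher value indicates condensation.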

      (4) It would be useful to test the salt effect for at least two extreme salt concentrations at various protein concentrations, consistent with experimental protein concentration ranges.  

      In simulation studies of short peptides (ref 37), we have shown that the initial concentration does not affect the final concentration in the dense phase, as expected for phase-separation systems. We expect that the same will be true for the A1-LCD system at intermediate and high salt where phase separation occurs. Though this expectation could be tested by simulations at a different initial protein concentration, such simulations would be expensive but unlikely to yield new physical insight.

      (5) Importantly, the simulations do not appear to have converged well enough (Figure 2A). The authors should extend the simulation trajectories to ensure the system has reached a steady state.  

      We extended the simulations for an additional 500 ns, and they now appear to show convergence. In Figure 2A, the Dmax values now converge to a tiered rank order, with successively decreasing values from low salt (50 mM) to intermediate salt (150 and 300 mM) to high salt (500 and 1000 mM).

      (6) The authors mention "phase separation" in the title, but with only a 1 μs simulation trajectory, it is not possible to simulate a phenomenon like phase separation accurately. Since atomistic simulations cannot realistically capture phase separation on this timescale, a coarse-grained approach is more suitable. To properly explore salt effects in the context of phase separation, long timescale simulation trajectories should be considered. Otherwise, the data remain unreliable. 

      Our all-atom simulations revealed rich salt effects that might have been missed in coarse-grained simulations. It is true that coarse-grained models allow the simulations of the phase separation process, but as we have recently demonstrated (refs 36 and 37), all-atom simulations on the μs timescale are also able to capture the spontaneous phase separation of peptides and small IDPs. A1-LCD is much larger than those systems, so we had to use a relatively small chain number (8 chains here vs 64 used in ref 37 and 16 used in ref 37). Still, we observe the condensation into a dense phase at high salt. We discuss the pros and cons of all-atom vs. coarse-grained simulations on p. 13.

      (7) In Figure 5E, the plot does not show that g(r) has reached 1. If it does, the authors should show the full curve. The same issue remains with supplementary figures 1, 2, 3, etc.  

      We now show the approach to 1 in the insets of Figs. S2, S3, S4, and 5E.

      (8) None of the data is represented with error bars. The authors should include error bars in their data representations. 

      We have now included error bars in all graphs that report average values.

      (9) The authors state that "the net charge of the system reduces to only +8 at 1000 mM NaCl (Figure 3C)" but do not explain how this was calculated. 

      We now add this explanation in methods (p. 16).

      (10). The authors mention "similar to the role played by ATP molecules in driving phase separation of positively charged IDPs." However, ATP can inhibit aggregation, and its induction of phase separation is concentration-dependent. Given ATP's large aromatic moiety, its comparison to ions is not straightforward and is more complex. This comparison can be at best avoided. 

      In this context we are comparing the bridging capability of ATP molecules in driving phase separation of positively charged IDPs in ref 36 to the bridging capability of the ions here. In ref 36 the authors show ATP bridging interactions between protein chains similar to what we show here with ions.

      (11) Many calculations are vaguely represented. The process for calculating the number of bridging ions, for example, is not well documented. The authors should provide sufficient details to allow for the reproducibility of the data. 

      We have now expanded the methods section to include more detailed information on calculations done.

      Reviewer #3 (Recommendations For The Authors): 

      Include error bars or standard deviations for all results averaged over four replicates, particularly for the number of ions and contacts per residue. This would provide a clearer picture of the data's reliability and variability. 

      We have now included error bars in all graphs that report averaged values.

      Strengthen the support for the conclusion that "each Arg sidechain often coordinates two Cl- ions, multiple backbone carbonyls often coordinate a single Na+ ion." While Fig. 3A clearly demonstrates Arg-Cl- coordination, the Na+ coordination claim for a 131-residue protein requires further clarification. Consider including the integration profile of radial distribution functions for Na+ ions to bolster this assertion.

      We now report the number of Na+ ions that coordinate with multiple backbone carbonyls (p. 7) as well as the number of Na+ ions that bridge between A1-LCD chains via coordination with multiple backbone carbonyls (p. 9). Please note that Figure 4A right panel displays an example of Na+ coordinating with multiple backbone carbonyls.

      Address the following typographical errors in the main text:

      • Page 11, line 25: "distinct classes of sat dependence" should be "distinct classes of salt dependence"

      • Page 14, line 9: "for Cl- and 3.0 and 5.4 A" should be "for Cl- and 3.0 and 5.4 Å"

      • Page 14, line 18: "As a control, PRDFs for water were also calculated" should be "As a control, RDFs for water were also calculated" (assuming PRDF was meant to be RDF)

      We have now corrected these typos.

      Consider expanding the study to include simulations of the full-length protein to provide a more comprehensive comparison between the truncated A1-LCD and the complete protein's behavior in various salt concentrations. 

      As we explained above, even with eight chains of A1-LCD, which has 131 residues, the systems already contain half a million atoms each and the all-atom simulations took many months to complete. Full-length A1 has 314 residues so a multi-chain system would be too large to be feasible for all-atom simulations.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Summary:

      Crosslinking mass spectrometry has become an important tool in structural biology, providing information about protein complex architecture, binding sites and interfaces, and conformational changes. One key challenge of this approach is the quantitation of crosslinking data to interrogate differential binding states and distributions of conformational states.

      Here, Luo and Ranish present a novel class of isobaric crosslinkers ("Qlinkers"), conduct proof-of-concept benchmarking experiments on known protein complexes, and show example applications on selected target proteins. The data are solid and this could well be an exciting, convincing new approach in the field if the quantitation strategy is made more comprehensive and the quantitative power of isobaric labeling is fully leveraged as outlined below. It's a promising proof-of-concept, and potentially of broad interest for structural biologists.

      Strengths:

      The authors demonstrate the synthesis, application, and quantitation of their "Q2linkers", enabling relative quantitation of two conditions against each other. In benchmarking experiments, the Q2linkers provide accurate quantitation in mixing experiments. Then the authors show applications of Q2linkers on MBP, Calmodulin, selected transcription factors, and polymerase II, investigating protein binding, complex assembly, and conformational dynamics of the respective target proteins. For known interactions, their findings are in line with previous studies, and they show some interesting data for TFIIA/TBP/TFIIB complex formation and conformational changes in pol II upon Rbp4/7 binding.

      Weaknesses:

      This is an elegant approach but the power of isobaric mass tags is not fully leveraged in the current manuscript.

      First, "only" Q2linkers are used. This means only two conditions can be compared. Theoretically, higher-plexed Qlinkers should be accessible and would also be needed to make this a competitive method against other crosslinking quantitation strategies. As it is, two conditions can still be compared relatively easily using LFQ - or stable-isotope-labeling based approaches. A "Q5linker" would be a really useful crosslinker, which would open up comprehensive quantitative XLMS studies.

      We agree that a multiplexed Qlinker approach would be very useful. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Second, the true power of isobaric labeling, accurate quantitation across multiple samples in a single run, is not fully exploited here. The authors only show differential trends for their interaction partners or different conformational states and do not make full quantitative use of their data or conduct statistical analyses. This should be investigated in more detail, e.g. examine Qlinker quantitation of MBP incubated with different concentrations of maltose or Calmodulin incubated with different concentrations of CBPs. Does Qlinker quantitation match ratios predicted using known binding constants or conformational state populations? Is it possible to extract ratios of protein populations in different conformations, assembly, or ligand-bound states?

      With these two points addressed this approach could be an important and convincing tool for structural biologists.

      We agree that multiplexed Qlinkers would open the door to exciting avenues of investigation such as studying conformational state populations.  We plan to conduct the suggested experiments when multiplexed Qlinkers are available.

      Reviewer #2 (Public review):

      The regulation of protein function heavily relies on the dynamic changes in the shape and structure of proteins and their complexes. These changes are widespread and crucial. However, examining such alterations presents significant challenges, particularly when dealing with large protein complexes in conditions that mimic the natural cellular environment. Therefore, much emphasis has been put on developing novel methods to study protein structure, interactions, and dynamics. Crosslinking mass spectrometry (CSMS) has established itself as such a prominent tool in recent years. However, doing this in a quantitative manner to compare structural changes between conditions has proven to be challenging due to several technical difficulties during sample preparation. Luo and Ranish introduce a novel set of isobaric labeling reagents, called Qlinkers, to allow for a more straightforward and reliable way to detect structural changes between conditions by quantitative CSMS (qCSMS).

      The authors do an excellent job describing the design choices of the isobaric crosslinkers and how they have been optimized to allow for efficient intra- and inter-protein crosslinking to provide relevant structural information. Next, they do a series of experiments to provide compelling evidence that the Qlinker strategy is well suited to detect structural changes between conditions by qCSMS. First, they confirm the quantitative power of the novel-developed isobaric crosslinkers by a controlled mixing experiment. Then they show that they can indeed recover known structural changes in a set of purified proteins (complexes) - starting with single subunit proteins up to a very large 0.5 MDa multi-subunit protein complex - the polII complex.

      The authors give a very measured and fair assessment of this novel isobaric crosslinker and its potential power to contribute to the study of protein structure changes. They show that indeed their novel strategy picks up expected structural changes, changes in surface exposure of certain protein domains, changes within a single protein subunit but also changes in protein-protein interactions. However, they also point out that not all expected dynamic changes are captured and that there is still considerable room for improvement (many not limited to this crosslinker specifically but many crosslinkers used for CSMS).

      Taken together the study presents a novel set of isobaric crosslinkers that indeed open up the opportunity to provide better qCSMS data, which will enable researchers to study dynamic changes in the shape and structure of proteins and their complexes. However, in its current form, some aspects of the study should be expanded upon in order for the research community to assess the true power of these isobaric crosslinkers. Specifically:

      Although the authors do mention some of the current weaknesses of their isobaric crosslinkers and qCSMS in general, more detail would be extremely helpful. Throughout the article a few key numbers (or even discussions) that would allow one to better evaluate the sensitivity (and the applicability) of the method are missing. This includes:

      (1) Throughout all the performed experiments it would be helpful to provide information on how many peptides are identified per experiment and how many have actually a crosslinker attached to it.

      As the goal of the experiments is to maximize identification of crosslinked peptides which tend to have higher charge states, we targeted ions with charge states of 3+ or higher in our MS acquisition settings for CLMS, and ignored ions with 2+ charge states, which correspond to many of the normal (i.e., not crosslinked) peptides that are identified by MS. As a result, normal peptides are less likely to be identified by the MS procedure used in our CLMS experiments compared to MS settings typically used to identify normal peptides. Our settings may also fail to identify some mono-modified peptides. Like most other CLMS methods, the total number of identified crosslinked peptide spectra is usually less than 1% of the total acquired spectra and we normally expect the crosslinked species to be approximately 1% of the total peptides. 

      We added information about the number of crosslinked and monolinked peptides identified in the pol I benchmarking experiments (line 173). The number of crosslinks and monolinks identified in the pol II +/- α-amanitin experiment, the TBP/TFIIA/TFIIB experiment and the pol II experiment +/- Rpb4/7 are also provided.

      (2) Of all the potential lysines that can be modified - how many are actually modified? Do the authors have an estimate for that? It would be interesting to evaluate in a denatured sample the modification efficiency of the isobaric crosslinker (as an upper limit as here all lysines should be accessible) and then also in a native sample. For example, in the MBP experiment, the authors report the change of one mono-linked peptide in samples containing maltose relative to the one not containing maltose. The authors then give a great description of why this fits to known structural changes. What is missing here is a bit of what changes were expected overall and which ones the authors would have expected to pick up with their method and why have they not been picked up. For example, were they picked up as modified by the crosslinker but not differential? I think this is important to discuss appropriately throughout the manuscript to help the reader evaluate/estimate the potential sensitivity of the method. There are passages where the authors do an excellent job doing that - for example when they mention the missed site that they expected to see in the initial the pol II experiments (lines 191 to 207). This kind of "power analysis" should be heavily discussed throughout the manuscript so that the reader is better informed of what sensitivity can be expected from applying this method.

      Regarding the Pol II complex experiment described in Figures 4 and 5, out of the 277 lysine residues in the complex, 207 were identified as monolinked residues (74.7%), and 817 crosslinked pairs out of 38,226 potential pairs (2.1%) were observed. The ability of CLMS to detect proximity/reactivity changes may be impacted by several factors including 1) the (low) abundance of crosslinked peptides in complex mixtures, 2) the presence of crosslinkable residues in close proximity with appropriate orientation, and 3) the ability to generate crosslinked peptides by enzymatic digestion that are amenable to MS analysis (i.e., the peptides have appropriate m/z’s and charge states, the peptides ionize well, the peptides produce sufficient fragment ions during MS2 analysis to allow confident identification). Future efforts to enrich crosslinked peptides prior to MS analysis may improve sensitivity.
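
      The coverage figures quoted above can be reproduced with a short calculation. This is a sketch that assumes the number of potential pairs is the count of unordered pairs of distinct lysine residues; that assumption reproduces the 38,226 figure given in the response.

```python
from math import comb

# Counts quoted in the response for the Pol II complex experiment.
total_lysines = 277
monolinked_residues = 207
crosslinked_pairs = 817

# Unordered pairs of distinct lysines: C(277, 2).
potential_pairs = comb(total_lysines, 2)

monolink_coverage = monolinked_residues / total_lysines
pair_coverage = crosslinked_pairs / potential_pairs

print(f"potential pairs:   {potential_pairs}")
print(f"monolink coverage: {100 * monolink_coverage:.1f}%")
print(f"crosslink coverage: {100 * pair_coverage:.1f}%")
```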

      It is very difficult to estimate the modification efficiency of Qlinker (or many other crosslinkers) based on peptide identification results. One major reason for this is that trypsin is not able to cleave after a crosslinker-modified lysine residue.  As a result, the peptides generated after the modification reaction have different lengths, compositions, charge states, and ionization efficiencies compared to unmodified peptides. These differences make it very difficult to estimate the modification efficiencies based on the presence/absence of certain peptide ions, and/or the intensities of the modified and unmodified versions of a peptide. Also, 2+ ions which correspond to many normal (i.e., unmodified) peptides were excluded by our MS acquisition settings.

      It is also very difficult to predict which structural changes are expected and which crosslinked peptides and/or modified peptides can be observed by MS.  This is especially true when the experiment involves proteins containing unstructured regions such as the experiments involving Pol II, and TBP, TFIIA and TFIIB. Since we are at the early stages of using qCLMS to study structural changes, we are not sure which changes we can expect to observe by qCLMS. Additional applications of Qlinker-CLMS are needed to better understand the types of structural changes that can be studied using the approach.

      We hope that our discussions of some of the limitations of CLMS for detecting conformational/reactivity changes provide the reader with an understanding of the sensitivity that can be expected with the approach.  At the end of the paragraph about the pol II α-amanitin experiment we say, “Unfortunately, no Q2linker-modified peptides were identified near the site where α-amanitin binds. This experiment also highlights one of the limitations of residue-specific, quantitative CLMS methods in general. Reactive residues must be available near the region of interest, and the modified peptides must be identifiable by mass spectrometry.” In the section about Rpb4/7-induced structural changes in pol II we describe the under-sampling issue. And in the last paragraph we reiterate these limitations and say, “This implies that this strategy, like all MS-based strategies, can only be used for interpretation of positively identified crosslinks or monolinks. Sensitivity and under sampling are common problems for MS analysis of complex samples.”

      (3) It would be very helpful to provide information on how much better (or not) the Qlinker approach works relative to label-free qCLMS. One is missing the reference to a potential qCLMS gold standard (data set) or if such a dataset is not readily available, maybe one of the experiments could be performed by label-free qCLMS. For example, one of the differential biosensor experiments would have been well suited.

      We agree with the reviewer that it will be very helpful to establish gold standard datasets for CLMS. As we further develop and promote this technology, we will try to establish a standardized qCLMS.

      Reviewer #1 (Recommendations for the authors):

      Only a very minor point:

      I may have missed it but it's not really clear how many independent experiments were used for the benchmarking quantitation and mixing experiments for Figure 1. What is the reproducibility across experiments on average and on a per-peptide basis?

      Otherwise, I think the approach would really benefit from at least "Q5linkers" or even "Q10linkers", if possible. And then conduct detailed quantitative studies, either using dilution series or maybe investigating the kinetics of complex formation.

      We used a sample of BSA crosslinked peptides to optimize the MS settings, establish the MS acquisition strategies and test the quantification schemes.  The data in Figure 1 is based on one experiment, in which we used ~150 µg of purified pol I complexes from a 6 L culture. We added this information to the Figure 1 legend. We also provide information about the reproducibility of peptide quantification by plotting the observed and expected ratios for each monolinked and crosslinked peptide identified in all of the runs in Figure S3.

      We agree with the reviewer that the Qlinker approach would be even more attractive if multiplex Qlinker reagents were designed. The multiplexed Qlinkers are more difficult and more expensive to synthesize. We are currently working on different schemes for synthesizing multiplexed Qlinkers.

      Reviewer #2 (Recommendations for the authors):

      In addition to the public review I have the following recommendations/questions:

      (1) The first part of the results section where the synthesis of the crosslinker is explained is excellent for mass spec specialists, but problematic for general readers - either more info should be provided (e.g. b1+ ions - most readers will have no idea why that is) - or potentially it could be simplified here and the details shifted to Materials and Methods for the expert reader. The same is true below for the length of spacer arms.

      However - in general this level of detail is great - but can impact the ease of understanding for the more mass spec affine but not expert reader.

      We have added the following sentence to assist the general reader: A b1+ ion is an ion with a charge state of +1 corresponding to the first N-terminal amino acid residue after breakage of the first peptide bond (lines 126-128).

      (2) The Calmodulin experiment (lines 239 to 257) - it is a very nice result that they see the change in the crosslinked peptide between residues K78-K95, but the monolinks are not just detected as described in the text but actually go 2 fold up. This would have been actually a bit expected if the residues are now too far away to be still crosslinked that the monolinks increase. In this case, this counteraction of monolinks to crosslinked sites can also be potentially used as a "selection criteria" for interesting sites that change. Is that a possible interpretation or do the authors think that upregulation of the monolinks is a coincidence and should not be interpreted?

      We agree with the reviewer that both monolinks and crosslinks can be used as potential indicators for some changes. However, it is much more difficult to interpret the abundance information from monolinks because, unlike crosslinks, there is little associated structural/proximity information with monolinks. Because it is difficult to understand the reason(s) for changes in monolink abundance, we concentrate on changes in crosslink abundances, which provide proximity/structural information about the crosslinked residues.

      (3) Lines 267 to 274: a small thing but the structural information provided is quite dense I have to say. Maybe simplify or accompany with some supplemental figures?

      We agree that the structural information is a bit dense especially for readers who are not familiar with the pol II system.  We added a reference to Figure 3c (line 177) to help the reader follow the structural information. 

      As qCLMS is still a relatively new approach for studying conformational changes, the utility of the approach for studying different types of conformational changes is still unclear. Thus, one of the goals of the experiments is to demonstrate the types of conformational changes that can be detected by Q2linkers.  We hope that the detailed descriptions will help structural biologists understand the types of conformational changes that can be detected using Qlinkers.

      (4) Line 280: explain maybe why the sample was fractionated by SCX (I guess to separate the different complexes?).

      SCX was used to reduce the complexity of the peptide mixtures. As the samples are complex and crosslinked peptides are of low abundance compared to normal peptides, SCX can separate the peptides based on their positive charges.  Larger peptides and peptides with higher charge states, such as crosslinked peptides, tend to elute at higher salt concentration during SCX chromatography.  The use of SCX to fractionate complex peptide mixtures is described in the “General crosslinking protocol and workflow optimization” section of the Methods, and we added a sentence to explain why the sample was fractionated by SCX (lines 278-279).

      (5) Lines 354 to 357: "This suggests that the inability to identify most of these crosslinked peptides in both experiments is mainly due to under-sampling during mass spectrometry analysis of the complex samples, rather than the absence of the crosslinked peptides in one of the experiments."

      This is an extremely important point for the interpretation of missing values - have the authors tried to also collect the mass spec data with DIA which is better in recovery of the same peptide signals between different samples? I realize that these are isobaric samples so DIA measurements per se are not useful as the quantification is done on the reporter channels in the MS2, but it would at least give a better idea if the missing signals were simply not picked up for MS2 as claimed by the authors or the modified peptides are just not present. Another possibility is for the authors to at least try to use a "match between the run" function as can be done in Maxquant. One of the strengths of the method is that it is quantitative and two states are analyzed together, but as can be seen in this experiment, more than two states might want to be compared. In such cases, the under-sampling issue (if that is indeed the cause) makes interpretation of many sites hard (due to missing values) and it would be interesting if for example, an analysis approach with a "match between the runs" function could recover some of the missing values.

      We agree that undersampling/missing values is an important issue that needs to be addressed more thoroughly. This also highlights the importance of qCLMS, as conclusions about structural changes based on the presence/absence of certain crosslinked species in database search results may be misleading if the absence of a species is due to under-sampling. We have not tried to collect the data with DIA since we would lose the quantitative information. It would be interesting to see if match between runs can recover some of the missing values. While this could provide evidence to support the under-sampling hypothesis, it would not recover the quantitative information.

      We recommend performing label swap experiments and focusing downstream analysis on the crosslinks/monolinks that are identified in both experiments. Future development of multiplexed Qlinker reagents should help to alleviate under-sampling issues. See response to Reviewer #1.

      (6) Lines 375 to 393 (the whole paragraph): extremely detailed and not easy to follow. Is that level of detail necessary to drive home that point or could it be visualized in enough detail to help follow the text?

      We agree that the paragraph is quite detailed, but we feel that the level of detail is necessary to describe the types of conformational changes that can be detected by the quantitative crosslinking data, and also to illustrate the challenges of interpreting the structural basis for some crosslink abundance changes even when high resolution structural data exists.

      To make it easier to follow, we added a sentence to the legend of Figure 5b. “In the holo-pol II structure (right), Switch 5 bending pulls Rpb1:D1442 away from K15, breaking the salt bridge that is formed in the core pol II structure (left). The increase in the abundances of the Rpb1:15-Rpb6:76 and Rpb1:15-Rpb6:72 crosslinks in holo-pol II is likely attributed to the salt bridge between K15 and D1442 in core pol II which impedes the NHS ester-based reaction between the epsilon amino group of K15 and the crosslinker.”

      (7) Final paragraph in the results section - lines 397 and 398: "All of the intralinks involving Rpb4 are more abundant in holo-pol II as expected." If I understand that experiment correctly the intralinks with Rpb4 should not be present at all as Rpb4 has been deleted. Is that due to interference between the 126 and 127 channels in MS2? If so, then this also sets a bit of the upper limit of quantitative differences that can be seen. The authors should at least comment on that "limitation".

      Yes, we shouldn’t detect any Rpb4 peptides in the sample derived from the Rpb4 knockout strain. The signal from Rpb4 peptides in the ∆Rpb4 sample is likely due to co-eluting ions. To clarify, we changed the text to:

      All of the intralinks involving Rpb4 are more abundant in the holo-pol II sample (even though we don’t expect any reporter ion signal from Rpb4 peptides derived from the ∆Rpb4 pol II sample, we still observed reporter ion signals from the channel corresponding to the ∆Rpb4 sample, potentially due to the presence of low abundance, co-eluting ions) (lines 395-399).

      (8) Materials and Methods - line 690: I am probably missing something but why were two different mass additions to lysine added to the search (I would have expected only one for the crosslinker)?

      The 297 Da modification is for monolinked peptides, in which one end of the crosslinker is hydrolyzed and a water molecule (18 Da) is added. The 279 Da modification is for crosslinks and sometimes for looplinks (crosslinks involving two lysine residues on the same tryptic peptide).
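
      The relationship between the two mass additions can be written out as a small sketch. It uses the nominal (integer) masses quoted in the response; exact monoisotopic values are not given there and are not assumed here. The dictionary is a hypothetical illustration, not the configuration format of any particular search engine.

```python
# Nominal mass additions to lysine, as quoted in the response.
CROSSLINK_DA = 279   # crosslinks and looplinks: both ends of the linker reacted
WATER_DA = 18        # water added when the unreacted end is hydrolyzed

# A monolink is the same linker with its free end hydrolyzed.
MONOLINK_DA = CROSSLINK_DA + WATER_DA

# Hypothetical summary of the two lysine modifications used in the search.
lysine_modifications_da = {
    "crosslink_or_looplink": CROSSLINK_DA,
    "monolink_hydrolyzed": MONOLINK_DA,
}

for name, mass in lysine_modifications_da.items():
    print(f"{name}: +{mass} Da")
```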

    1. Author response:

      Review #1:

      Also, they observed no difference in the binding free energy of phosphatidylserine with wild TREM2-Ig and mutant TREM2-Ig, which is a bit inconsistent with the previous report with experiment studies by Journal of Biological Chemistry 293, (2018), Alzheimer's and Dementia 17, 475-488 (2021), Cell 160, 1061-1071 (2015).

      We directly note this contrast with experimental findings in the body of our work, particularly given the known limitations of free energy calculations in MD simulations, as outlined in the Limitations section. Our claim is that the loss of function in the R47H variant extends beyond decreased binding affinities and also impacts binding patterns. As stated in our manuscript: ‘Our observations for both sTREM2 and TREM2 indicate that R47H-induced dysfunction may result not only from diminished ligand binding but also an impaired ability to discriminate between different ligands in the brain, proposing a novel mechanism for loss-of-function.’

      Perhaps the authors made significant efforts to run a number of simulations for multiple models, which is nearly 17 microseconds in total; none of the simulations has been repeated independently at least a couple of times, which makes me uncomfortable to consider this finding technically true. Most of the important conclusions that authors claimed, including the opposite results from previous research, have been made on the single run, which raises the question of whether this observation can be reproduced if the simulation has been repeated independently. Although the authors stated the sampling number and length of MD simulations in the current manuscript as a limitation of this study, it must be carefully considered before concluding rather than based on a single run.

      The reviewer raises an interesting point regarding the repetition of individual simulations, a consideration we carefully evaluated during the design of this study. However, we believe our approach—running multiple independent models of the same system—offers a more rigorous methodology than simply repeating simulations of the same docked model. This strategy allows us to sample several distinct starting configurations, thereby minimizing biases introduced by docking algorithms and single-model reliance.

      In our study, we demonstrate that within the 150 ns timescale of our protein/ligand (PL) simulations, the relatively small ligands are able to move from their initial docking positions to a specific binding site. While ideally, replicates of these independent models would further strengthen the findings, this was not computationally feasible given the unprecedented total duration of our simulations. Importantly, our conclusions are seldom based on the results of a single protein/PL simulation.

      Moreover, the ergodic hypothesis suggests that over sufficiently long timescales, simulations will explore all accessible states. Additionally, we have performed several replicate simulations of our WT and R47H Ig-like domain models in solution, specifically to investigate CDR2 loop dynamics.

      In this case, since the system involves only the protein and lacks the independent replicates seen in the protein/PL simulations, these runs were chosen to effectively capture the stochastic nature of CDR2 loop movement.

      sTREM2 shows a neuroprotective effect in AD, even with the mutations with R47H, as evidenced by authors based on their simulation. sTREM2 is known to bind Aβ within the AD and reduce Aβ aggregation, whereas R47H mutant increases Aβ aggregation. I wonder why the authors did not consider Aβ as a ligand for their simulation studies. As a reader in this field, I would prefer to know the protective mechanism of sTREM2 in Aβ aggregation influenced by the stalk domain.

      Our initial approach for this study used Aβ as a ligand rather than phospholipids. However, we noted the difficulties in simulating Aβ, particularly in choosing relevant Aβ structures and oligomeric states (n-mers). We believe that phospholipids represent an equally pertinent ligand for TREM2, given its critical role in lipid sensing and metabolism. Furthermore, there is growing recognition in the AD research community of the need to move beyond Aβ and focus on other understudied pathological mechanisms.

      In a similar manner, why is only one mutation, "R47H", considered for the study? There are more severe mutations reported to disrupt tethering between these CDRs, such as T66M. Although this "T66M" is not associated with AD, I guess the stalk domain protective mechanism would not be biased among different diseases. Therefore, it would be interesting to see whether the findings are true for this T66M.

      In most previous studies, the mechanism of CDR destabilization by the mutant was explored, for example through changes in secondary structure and residue-wise interloop interaction patterns. This is not considered in this manuscript, nor are the detailed residue-wise interactions that are changed by the mutant or that are important for "ligand binding" or the "stalk domain".

      These are both excellent points that deserve extensive investigation. While R47H is the most common and prolific mutation in literature, an extensive catalog of other mutations is important to explore. We are currently preparing two separate publications that will delve into these gaps in more detail, as addressing them was beyond the scope of the present study.

      The comparison between the wild and mutant and other different complex structures must be determined by particular statistical calculations to state the observed difference between different structures is significant. Since autocorrelation is one of the major concerns for MD simulation data for predicting statistical differences, authors can consider bootstrap calculations for predicting statistical significance.

      We are currently working to address this comment to strengthen the validity of our results and statistical conclusions in the revised manuscript.  
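
      One common way to bootstrap autocorrelated time series such as MD observables is to resample contiguous blocks of frames rather than individual frames. The following is a minimal sketch of a moving-block bootstrap for the difference in means between two trajectories. It is not the authors' analysis; the block length, the number of resamples, and the synthetic input series are all assumptions chosen for illustration, and the block length should exceed the correlation time of the observable.

```python
import random
from statistics import mean


def block_bootstrap_means(series, block_len, n_resamples, rng):
    """Moving-block bootstrap distribution of the mean of one series."""
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_blocks = -(-n // block_len)      # ceiling division
    max_start = n - block_len
    means = []
    for _ in range(n_resamples):
        resample = []
        for _ in range(n_blocks):
            start = rng.randint(0, max_start)   # inclusive on both ends
            resample.extend(series[start:start + block_len])
        means.append(mean(resample[:n]))        # trim to the original length
    return means


def diff_confidence_interval(a, b, block_len, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile interval for mean(a) - mean(b) from independent block bootstraps."""
    rng = random.Random(seed)
    means_a = block_bootstrap_means(a, block_len, n_resamples, rng)
    means_b = block_bootstrap_means(b, block_len, n_resamples, rng)
    diffs = sorted(x - y for x, y in zip(means_a, means_b))
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic stand-ins for a per-frame observable from two systems.
wild_type = [0.0 + ((i % 7) - 3) * 0.1 for i in range(200)]
variant = [1.0 + ((i % 5) - 2) * 0.1 for i in range(200)]

low, high = diff_confidence_interval(wild_type, variant, block_len=10)
print(f"95% interval for the difference in means: [{low:.3f}, {high:.3f}]")
```

      If the resulting interval excludes zero, the difference is unlikely to be explained by sampling noise alone under the stated assumptions; it does not address systematic differences between independent starting models.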

      Review #2:

      The authors state that reported differences in ligand binding between the TREM2 and sTREM2 remain unexplained, and the authors cite two lines of evidence. The first line of evidence, which is true, is that there are differences between lipid binding assays and lipid signaling assays. However, signaling assays do not directly measure binding. Secondly, the authors cite Kober et al 2021 as evidence that sTREM2 and TREM2 showed different affinities for Abeta1-42 in a direct binding assay. Unfortunately, when Kober et al measured the binding of sTREM2 and Ig-TREM2 to Abeta they reported statistically identical affinities (Kd = 3.8 ± 2.9 µM vs 5.1 ± 3.7 µM) and concluded that the stalk did not contribute measurably to Abeta binding.

      We appreciate the reviewer’s insight and acknowledge the need to clarify our interpretation of Kober et al. (2021). We will adjust and refocus how we reference this evidence from Kober et al. in our revised manuscript. 

      In line with these findings, our energy calculations reveal that sTREM2 exhibits weaker—but still not statistically significant—binding affinities for phospholipids compared to TREM2. These results suggest that while overall binding affinity might be similar, differences in binding patterns or specific lipid interactions could still contribute to functional differences observed between TREM2 and sTREM2.
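
      The two Kd values quoted in the reviewer's comment above can be compared with a rough calculation. This sketch assumes that each ± value is a standard error, that the two measurements are independent, and that a normal approximation is adequate; none of this is stated in the text, so the result is only an order-of-magnitude check.

```python
from math import sqrt


def z_score_difference(value_a, err_a, value_b, err_b):
    """Absolute difference divided by the combined uncertainty.

    Assumes independent measurements whose errors add in quadrature.
    """
    combined_err = sqrt(err_a ** 2 + err_b ** 2)
    return abs(value_a - value_b) / combined_err


# Kd values (in µM) as quoted in the reviewer's comment.
z = z_score_difference(3.8, 2.9, 5.1, 3.7)
print(f"z = {z:.2f}")
```

      Under these assumptions the difference is well below the conventional threshold of about 1.96 for significance at the 5% level, which is in line with the reviewer's description of the affinities as statistically identical.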

      The authors appear to take simulations of the Ig domain (without any stalk) as a surrogate for the full-length, membrane-bound TREM2. They compare the Ig domain to a sTREM2 model that includes the stalk. While it is fully plausible that the stalk could interact with and stabilize the Ig domain, the authors need to demonstrate why the full-length TREM2 could not interact with its own stalk and why the isolated Ig domain is a suitable surrogate for this state.

      We believe that this is a major limitation of all computational work on TREM2 to date, and of experimental work which only presents the Ig-like domain. This is extensively discussed in the limitations section of our paper. Hence, we are currently working toward a manuscript that will be the first biologically relevant model of TREM2 in a membrane and will challenge the current paradigm of using the Ig-like domain as an experimental surrogate for TREM2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      PPARgamma is a nuclear receptor that binds to orthosteric ligands to coordinate transcriptional programs that are critical for adipocyte biogenesis and insulin sensitivity. Consequently, it is a critical therapeutic target for many diseases, but especially diabetes. The malleable nature and promiscuity of the PPARgamma orthosteric ligand binding pocket have confounded the development of improved therapeutic modulators. Covalent inhibitors have been developed but they show unanticipated mechanisms of action depending on which orthosteric ligands are present. In this work, Shang and Kojetin present a compelling and comprehensive structural, biochemical, and biophysical analysis that shows how covalent and noncovalent ligands can co-occupy the PPARgamma ligand binding pocket to elicit distinctive preferences of coactivator and corepressor proteins. Importantly, this work shows how the covalent inhibitors GW9662 and T0070907 may be unreliable tools as pan-PPARgamma inhibitors despite their widespread use.

      Strengths:

      - Highly detailed structure and functional analyses provide a comprehensive structure-based hypothesis for the relationship between PPARgamma ligand binding domain co-occupancy and allosteric mechanisms of action.

      - Multiple orthogonal approaches are used to provide high-resolution information on ligand binding poses and protein dynamics.

      - The large number of x-ray crystal structures solved for this manuscript should be applauded along with their rigorous validation and interpretation.

      Weaknesses

      - Inclusion of statistical analysis is missing in several places in the text.

      - Functional analysis beyond coregulator binding is needed.

      We added additional statistical analyses as recommended (Source Data 1, a Microsoft Excel spreadsheet).

      Related to functional analysis, we cite studies from our previous publication (Hughes et al. Nature Communications 2014 5:3571) where we demonstrated that the covalent inhibitor ligands (GW9662 and T0070907) do not block the activity of other ligands using a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes. Our study here expands on this finding and other published studies showing the structural mechanism for the lack of blocking activity by the covalent inhibitors.

      Reviewer #2 (Public Review):

      Summary:

      The flexibility of the ligand binding domain (LBD) of NRs allows various modes of ligand binding leading to various cellular outcomes. In the case of PPARγ, it's known that two ligands can co-bind to the receptor. However, whether a covalent inhibitor functions by blocking the binding of a non-covalent ligand, or co-binds in a manner that weakens the binding of a non-covalent ligand remains unclear. In this study, the authors first used TR-FRET and NMR to demonstrate that covalent inhibitors (such as GW9662 and T0070907) weaken but do not prevent non-covalent synthetic ligands from binding, likely via an allosteric mechanism. The AF-2 helix can exchange between active and repressive conformations, and covalent inhibitors shift the conformation toward a transcriptionally repressive one to reduce the orthosteric binding of the non-covalent ligands. By co-crystal studies, the authors further reveal the structural details of various non-covalent ligand binding mechanisms in a ligand-specific manner (e.g., an alternate binding site, or a new orthosteric binding mode by altering covalent ligand binding pose).

      Strengths:

      The biochemical and biophysical evidence presented is strong and convincing.

      Weaknesses:

      However, the co-crystal studies were performed by soaking non-covalent ligands into LBD pre-crystallized with a covalent inhibitor. Since the covalent inhibitors would shift the LBD toward a transcriptionally repressive conformation, which reduces orthosteric binding of non-covalent ligands, if the sequence was reversed (i.e., soaking a covalent inhibitor into LBD pre-crystallized with a non-covalent ligand), would a similar conclusion be drawn? Additional discussion will broaden the implications of the conclusion.

      This is an interesting point, which we now expand upon in a new (third) paragraph of the discussion in our revised manuscript:

      “In our previous study, we observed synthetic and natural/endogenous ligand co-binding via co-crystallography where preformed crystals of PPARγ LBD bound to unsaturated fatty acids (UFAs) were soaked with a synthetic ligand, which pushed the bound UFA to an alternate site within the orthosteric ligand-binding pocket [8]. In the scenario of synthetic ligand cobinding with a covalent inhibitor, it is possible that soaking a covalent inhibitor into preformed crystals where the PPARγ LBD is already bound to a non-covalent ligand may prove to be difficult. The covalent inhibitor would need to flow through solvent channels within the crystal lattice, which may not be a problem. However, upon reaching the entrance surface to the orthosteric ligand-binding pocket, it may be difficult for the covalent inhibitor to gain access to the region of the orthosteric pocket required for covalent modification as the larger non-covalent ligand could block access. This potential order of addition problem may not be a problem for studies in solution or in cells, where the non-covalent ligand can more freely exchange in and out of the orthosteric pocket and over time the covalent reaction would reach full occupancy.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - IC50 or EC50 values are not reported for the coregulator interaction assays. R² for the fit should also be reported where Ki and IC50 values are disclosed.

      We now report fitting statistics and IC50/EC50 values when possible in Figure 2B and Source Data 1, along with R² values for the fit. We note that some data do not show binding curves that are complete or robust enough to fit faithfully to a dose-response equation.
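
      As an illustration of the fit statistic discussed here, the following is a minimal sketch of how R² can be computed for a dose-response curve. It assumes a four-parameter logistic model in its inhibition form; the response does not state which dose-response equation or fitting software was used, so the model form, the function names (`four_pl`, `r_squared`), the parameter values, and the data points are all hypothetical.

```python
def four_pl(x, bottom, top, ic50, hill):
    """Four-parameter logistic (inhibition form, assumed here):
    the response falls from `top` toward `bottom` as x increases."""
    return bottom + (top - bottom) / (1.0 + (x / ic50) ** hill)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations and responses; NOT data from the study.
conc = [0.1, 1.0, 10.0, 100.0, 1000.0]
observed = [98.0, 92.0, 49.0, 10.0, 2.0]

# Assumed, already-fitted parameters (a real analysis would estimate these).
params = dict(bottom=0.0, top=100.0, ic50=10.0, hill=1.0)
predicted = [four_pl(c, **params) for c in conc]

print(round(r_squared(observed, predicted), 3))
```

      An incomplete curve, one that never reaches a lower or upper plateau, leaves `bottom`, `top`, or `ic50` poorly constrained, which is consistent with the note above that some data sets could not be fit faithfully.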

      - Reporter gene or qPCR should be performed for the combinations of covalent and noncovalent ligands to show how these molecules impact transcriptional activities rather than just coregulator binding profiles.

      We previously performed a PPARγ transcriptional reporter assay and gene expression analysis in 3T3-L1 preadipocytes to demonstrate that cotreatment of a covalent inhibitor (GW9662 or T0070907) with a non-covalent ligand does not block activity of the non-covalent ligand and showed cobinding-induced activation relative to DMSO control (Hughes et al., 2014 Nature Communications). We did not specifically mention this in our original manuscript, but we now call this out in the first paragraph of the results section.

      - A structure figure showing the different helix 12 orientations should be included in the introduction. Likewise, a description in the discussion of how the overall structure of the LBD changes as a result of the cobinding, or a summary model, would be helpful.

      Our revised manuscript includes a structure figure called out in the introduction describing the active and repressive helix 12 PPARγ LBD conformations (new Figure 1). There are no major changes to the overall structure of the LBD compared to the active conformation that crystallized, so we did not include a summary model figure but we do refer readers to our previous paper (Shang and Kojetin, Structure 2021 29(9):940-950) in the penultimate paragraph of the discussion. We also added the following sentence to the crystallography results section related to the overall LBD changes:

      “The structures show high structural similarity to the transcriptionally active LBD conformation with rmsd values ranging from 0.77–1.03 Å (Supplementary Table S2).”

      A typo in paragraph 3 of the discussion says "long-live" when it should probably say "long-lived."

      We corrected this typo.

      Reviewer #2 (Recommendations For The Authors):

      It's interesting that ligand-specific binding mode of non-covalent ligands was observed. Would modifications of the chemical structure of a covalent inhibitor alter the allosteric binding behavior of non-covalent ligands in a predictive manner? If so, how can such SAR be used to guide the design of covalent inhibitors to more broadly and effectively inhibit agonists of various chemical structures? Discussion on this topic could be valuable.

      This is an interesting point, which we now discuss in the penultimate and last paragraphs of the discussion:

      “Another way to test this structural model could be through the use of covalent PPARγ inverse agonist analogs with graded activity [23], where one might posit that covalent inverse agonist analogs that shift the LBD conformational ensemble towards a fully repressive LBD conformation may better inhibit synthetic ligand cobinding.”

      “It may be possible to use the crystal structures we obtained to guide structure-informed design of covalent inhibitors that would physically block cobinding of a synthetic ligand. This could be the potential mechanism of a newer generation covalent antagonist inhibitor we developed, SR16832, that more completely inhibits alternate site ligand binding of an analog of MRL20, rosiglitazone and the UFA docosahexaenoic acid (DHA) [21] and thus may be a better choice for the field to use as a covalent ligand inhibitor of PPARγ.”

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript focuses on the role of the deubiquitinating enzyme USP-50/USP8 in endosome maturation. The authors aimed to clarify how this enzyme drives the conversion of early endosomes into late endosomes. Overall, they did achieve their aims in shedding light on the precise mechanisms by which USP-50/USP8 regulates endosome maturation. The results support their conclusions that USP-50 acts by disassociating RABX-5 from early endosomes to deactivate RAB-5 and by recruiting SAND-1/Mon1 to activate RAB-7. This work is commendable and will have a significant impact on the field. The methods and data presented here will be useful to the community in advancing our understanding of endosome maturation and identifying potential therapeutic targets for diseases related to endosomal dysfunction. It is worth noting that further investigation is required to fully understand the complexities of endosome maturation. However, the findings presented in this manuscript provide a solid foundation for future studies.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths: 

      The major strengths of this work lie in the well-designed experiments used to examine the effects of USP-50 loss. The authors employed confocal imaging to obtain a picture of the aftermath of the USP-50 loss. Their findings indicated enlarged early endosomes and MVB-like structures in cells deficient in USP-50/USP8.

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses: 

      Specifically, there is a need for further investigation to accurately characterize the anomalous structures detected in the usp-50 mutant. Also, the correlation between the presence of these abnormal structures and ESCRT-0 is yet to be addressed, and the current working model needs to be revised to prevent any confusion between enlarged early endosomes and MVBs. 

      Excellent suggestions. USP8 has been identified as a protein associated with ESCRT components, which are crucial for endosomal membrane deformation and scission, leading to the formation of intraluminal vesicles (ILVs) within multivesicular bodies (MVBs). In usp-50 mutants, we observed a significant reduction in the punctate signals of HGRS-1::GFP and STAM-1 (Figure 1G and H; and Figure 1-figure supplement 1B), indicating a disruption in ESCRT-0 complex localization (Author response image 1). Additionally, lysosomal structures are markedly reduced in these mutants. In contrast, we found that early endosomes, as marked by FYVE, RAB-5, RABX-5, and EEA1, are significantly enlarged in usp-50 mutants. Electron microscopy (EM) imaging further revealed an increase in large cellular vesicles containing various intraluminal structures. Given the reduction in lysosomal structures and the enlargement of early endosomes in usp-50 mutants, these enlarged vesicles are likely aberrant early endosomes rather than late endosomal or lysosomal structures. To address potential confusion, we have revised the manuscript according to the reviewer's comments and updated the model to accurately reflect these observations.

      Reviewer #2 (Public Review): 

      Summary: 

      In this study, the authors examine how the deubiquitinase USP8 regulates endosome maturation in C. elegans and mammalian cells. The authors have isolated USP8 mutant alleles in C. elegans and used multiple in vivo reporter lines to demonstrate the impact of USP8 loss-of-function on endosome morphology and maturation. They show that in USP8 mutant cells, the early endosomes and MVB-like structures are enlarged while the late endosomes and lysosomal compartments are reduced. They elucidate that USP8 interacts with Rabx5, a guanine nucleotide exchange factor (GEF) for Rab5, and show that USP8 likely targets a specific lysine residue of Rabx5 to dissociate it from early endosomes. They also find that the localization of USP8 to early endosomes is disrupted in Rabx5 mutant cells. They observe that in both Rabx5 and USP8 mutant cells, the Rab7 GEF SAND-1 puncta, which likely represent late endosomes, are diminished, although Rabex5 is accumulated in USP8 mutant cells. The authors provide evidence that USP8 regulates endosomal maturation in a similar fashion in mammalian cells. Based on their observations they propose that USP8 dissociates Rabex5 from early endosomes and enhances the recruitment of SAND-1 to promote endosome maturation.

      We thank this reviewer for the instructive suggestions and encouragement.

      Strengths: 

      The major highlights of this study include the direct visualization of endosome dynamics in a living multi-cellular organism, C. elegans. The high-quality images provide clear in vivo evidence to support the main conclusions. The authors have generated valuable resources to study mechanisms involved in endosome dynamics regulation in both the worm and mammalian cells, which would benefit many members of the cell biology community. The work identifies a fascinating link between USP8 and the Rab5 guanine nucleotide exchange factor Rabx5, which expands the targets and modes of action of USP8. The findings make a solid contribution toward the understanding of how endosomal trafficking is controlled. 

      We thank this reviewer for the instructive suggestions and encouragement.

      Weaknesses: 

      - The authors utilized multiple fluorescent protein reporters, including those generated by themselves, to label endosomal vesicles. Although these are routine and powerful tools for studying endosomal trafficking, these results cannot tell whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion. 

      Good suggestion. Indeed, to test whether the endogenous proteins (Rab5, Rabex5, Rab7, etc.) are affected in the same fashion as fluorescent protein reporters, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Figure 5-figure supplement 1, Figure 5-figure supplement 2, and Figure 6-figure supplement 1). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion.

      - The authors clearly demonstrated a link between USP8 and Rabx5, and they showed that cells deficient in both factors displayed similar defects in late endosomes/lysosomes. However, the authors didn't confirm whether and/or to which extent USP8 regulates endosome maturation through Rabx5. Additional genetic and molecular evidence might be required to better support their working model. 

      Excellent point. To test whether USP-50 regulates endosome maturation through RABX-5, we performed additional genetic analyses. In rabx-5(null) mutant animals, the morphology of 2xFYVE-labeled early endosomes is comparable to that of wild-type controls (Figure 4H and I). Introducing the rabx-5(null) mutation into usp-50(xd413) backgrounds resulted in a significant suppression of the enlarged early endosome phenotype characteristic of usp-50(xd413) mutants (Figure 4H and I). These findings suggest that USP-50 may modulate the size of early endosomes through its interaction with RABX-5.

      Reviewer #3 (Public Review): 

      Summary: 

      The authors were trying to elucidate the role of USP8 in the endocytic pathway. Using C. elegans epithelial cells as a model, they observed that when USP8 function is lost, the cells have a decreased number and size of lysosomes. Since USP8 was already known to be a protein linked to ESCRT components, they looked into what role USP8 might play in connecting lysosomes and multivesicular bodies (MVB). They observed fewer ESCRT-associated vesicles but an increased number of "abnormal" enlarged vesicles when USP8 function was lost. At this specific point, it's not clear what the objective of the authors was. What would have been their hypothesis addressing whether the reduced lysosomal structures in USP8 (-) animals were linked to MVB formation? Then they observed that the abnormally enlarged vesicles, marked by the PI3P biosensor YFP-2xFYVE, are bigger but in the same number in USP8 (-) compared to wild-type animals, suggesting homotypic fusion. They confirmed this result by knocking down USP8 in a human cell line, and they observed enlarged vesicles marked by YFP-2xFYVE as well. At this point, there is quite an important issue. The use of YFP-2xFYVE to detect early endosomes requires the transfection of the cells, which has already been demonstrated to produce differences in the distribution, number, and size of PI3P-positive vesicles (doi.org/10.1080/15548627.2017.1341465). The enlarged vesicles marked by YFP-2xFYVE would not necessarily be due to the loss of USP8. In any case, it appears relatively clear that USP8 localizes to early endosomes, and the authors claim that this localization is mediated by Rabex-5 (or Rabx-5). They finally propose that USP8 dissociates Rabx-5 from early endosomes facilitating endosome maturation.

      Weaknesses: 

      The weaknesses of this study are, on one side, that the results are almost exclusively dependent on the overexpression of fusion proteins. While useful in the field, this strategy does not represent the optimal way to dissect a cell biology issue. On the other side, the way the authors construct the rationale for each approximation is somehow difficult to follow. Finally, the use of two models, C. elegans and a mammalian cell line, which would strengthen the observations, contributes to the difficulty in reading the manuscript. 

      The findings are useful but do not clearly support the idea that USP8 mediates Rab5-Rab7 exchange and endosome maturation. In contrast, they appear to be incomplete and open new questions regarding the complexity of this process and the precise role of USP8 within it.

      We thank this reviewer for the insightful comments. Fluorescence-fused proteins serve as potent tools for visualizing subcellular organelles both in vivo and in live settings. Specifically, in epidermal cells of worms, the tissue-specific expression of these fused proteins is indispensable for studying organelle dynamics within living organisms. This approach is necessitated by the inherent limitations of endogenously tagged proteins, whose fluorescence signals are often weak and unsuitable for live imaging or genetic screening purposes. Acknowledging concerns raised by the reviewer regarding potential alterations in organelle morphology due to overexpression of certain fused proteins, we supplemented our approach with the utilization of endogenous markers. These markers, including Rab5, RAB-5, Rabex5, RABX-5, and EEA1 for early endosomes, as well as RAB-7, Mon1a, and Mon1b for late endosomes, were instrumental in our investigations (refer to Figure 3, Figure 6, Figure 5-figure supplement 1, Figure 5-figure supplement 2, and Figure 6-figure supplement 1). Our comprehensive analysis, employing various methodologies such as tissue-specific fused proteins, CRISPR/Cas9 knock-in, and antibody staining, consistently highlights the critical role of USP8 in early-to-late endosome conversion. Specifically, we discovered that the recruitment of USP-50/USP8 to early endosomes depends on Rabex5. However, instead of stabilizing Rabex5, the recruitment of USP-50/USP8 leads to its dissociation from endosomes, concomitantly facilitating the recruitment of the Rab7 GEF SAND-1/Mon1. In cells with loss-of-function mutations in usp-50/usp8, we observed enhanced RABX-5/Rabex5 signaling and mis-localization of SAND-1/Mon1 proteins from endosomes. Consequently, this disruption impairs endolysosomal trafficking, resulting in the accumulation of enlarged vesicles containing various intraluminal contents and rudimentary lysosomal structures.

      Through an unbiased genetic screen, verified by cultured mammalian cell studies, we observed that loss-of-function mutations in usp-50/usp8 result in diminished lysosome/late endosomes. Electron microscopy (EM) analysis indicated that usp-50 mutation leads to abnormally enlarged vesicles containing various intraluminal structures in worm epidermal cells. USP8 is known to regulate the endocytic trafficking and stability of numerous transmembrane proteins. Given that lysosomes receive and degrade materials generated by endocytic pathways, we hypothesized that the abnormally enlarged vesicular structures observed in usp-50 or usp8 mutant cells correspond to the enlarged vesicles coated by early endosome markers. Indeed, in the absence of usp8/usp-50, the endosomal Rab5 signal is enhanced, while early endosomes are significantly enlarged. Given that Rab5 guanine nucleotide exchange factor (GEF), Rabex5, is essential for Rab5 activation, we further investigated its dynamics. Additional analyses conducted in both worm hypodermal cells and cultured mammalian cells revealed an increase of endosomal Rabex5 in response to usp8/usp-50 loss-of-function. Live imaging studies further demonstrated active recruitment of USP8 to newly formed Rab5-positive vesicles, aligning spatiotemporally with Rabex5 regulation. Through systematic exploration of putative USP-50 binding partners on early endosomes, we identified its interaction with Rabex5. Comprehensive genetics and biochemistry experiments demonstrated that USP8 acts through K323 site de-ubiquitination to dissociate Rabex5 from early endosomes and promotes the recruitment of the Rab7 GEF SAND-1/Mon1. In summary, our study began with an unbiased genetic screen and subsequent examination of established theories, leading to the formulation of our own hypothesis. Through multifaceted approaches, we unveiled a novel function of USP8 in early-to-late endosome conversion.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Within Figures 1K-N, diverse anomalous structures were detected in the usp-50 mutant. Further scrutiny is needed to definitively characterize these structures, particularly as the images in Figures 1M and 1L exhibit notable similarities to lamellar bodies.

      We thank the reviewer for the insightful question regarding the resemblance between the vesicles observed in our study and lamellar bodies (LBs). Lamellar bodies are specialized organelles involved in lipid storage and secretion [1], prominently studied in keratinocytes of the skin and alveolar type II (ATII) epithelial cells in the lung [2]. These organelles contain not only lipids but also cell-type specific proteins and lytic enzymes. Due to their acidic pH and functional similarities, LBs are classified as lysosome-related organelles (LROs) or secretory lysosomes [3,4]. In usp-50 mutants, we observed a considerable number of abnormal vesicles, some of which contain threadlike membrane structures and exhibit morphological similarities to LBs (Figure 2O). However, further analysis with a comprehensive panel of lysosome-related markers demonstrated a significant reduction in lysosomal structures within these mutants. In contrast, vesicles marked by early endosome markers, such as FYVE, RAB-5, RABX-5, and EEA1, were notably enlarged. These results suggest that the enlarged vesicles observed in usp-50 mutants are more likely aberrant early endosomes rather than true lamellar bodies. We have revised the manuscript to reflect these findings and to clearly differentiate between these structures and lysosome-related organelles.

      (2) The correlation between the presence of these abnormal structures and ESCRT-0 remains unaddressed, thus the assertion that USP-50 regulates endolysosome trafficking in conjunction with ESCRT-0 lacks empirical support.

      We thank the reviewer for the valuable suggestions. We apologize for any confusion and appreciate the opportunity to clarify our findings. The ESCRT machinery is essential for driving endosomal membrane deformation and scission, which leads to the formation of intraluminal vesicles (ILVs) within multivesicular bodies (MVBs). Recent research has shown that the absence of ESCRT components results in a reduction of ILVs in worm gut cells [5]. In wild type animals, the ESCRT-0 components HGRS-1 and STAM-1 display a distinct punctate distribution (Figure 1G and H). However, in usp-50 mutants, the punctate signals of HGRS-1::GFP and STAM-1::GFP are significantly reduced (Figure 1G and H; and Figure 1-figure supplement 1B), indicating a role for USP-50 in stabilizing the ESCRT-0 complex. Our TEM analysis revealed an accumulation of abnormally enlarged vesicles containing intraluminal structures in usp-50 mutants. When we examined a panel of early endosome and late endosome/lysosome markers, we found that early endosomes are significantly enlarged, while late endosomal/lysosomal structures are markedly reduced in these mutants. This suggests that the abnormal structures observed in usp-50 mutants are likely enlarged early endosomes rather than classical MVBs. To further investigate whether the reduction in ESCRT components contributes to the late endosome/lysosome defects, we analyzed stam-1 mutants. In these mutants, the size of RAB-7-coated vesicles was reduced (Author response image 1C), and the lysosomal marker LAAT-1 indicated a reduction in lysosomal structures (Author response image 1B). These results highlight the importance of the ESCRT complex in late endosome/lysosome formation. However, the morphology of early endosomes, as marked by 2xFYVE, remained similar to that of wild type in stam-1 mutants (Author response image 1A). Therefore, while reduced ESCRT-0 components may contribute to the late endosome/lysosome defects observed in usp-50 mutants, the enlargement of early endosomes in these mutants may involve additional mechanisms. We have revised the manuscript to incorporate these insights and to address the reviewer's comments more comprehensively.

      Author response image 1.

      (A) Confocal fluorescence images of hypodermis expressing YFP::2xFYVE to detect EEs in L4 stage animals in wild type and stam-1(ok406) mutants. Scale bar: 5 μm. (B) Confocal fluorescence images of hypodermal cell 7 (hyp7) expressing the LAAT-1::GFP marker to highlight lysosome structures in 3-day-old adult animals. Compared to wild type, LAAT-1::GFP signal is reduced in stam-1(ok406) animals. Scale bar: 5 μm. (C) The reduction of punctate endogenous GFP::RAB-7 signals in stam-1(ok406) animals. Scale bar: 10 μm.

      (3) Endosomal dysfunction typically leads to significant alterations in the spatial arrangement of marker proteins across distinct endosomes. In the manuscript, the authors examined the distribution and morphology of early endosomes, multivesicular bodies (MVBs), late endosomes, and lysosomes in a usp-50 deficient background primarily through single-channel confocal imaging. By employing two color images showing RAB-5 and RAB-7, in conjunction with HGRS-1, a more comprehensive picture of the aftermath of USP-50 loss can be obtained.

      Good suggestions. We have conducted a double-labeling analysis to examine the distribution of RAB-5 and RAB-7 in conjunction with HGRS-1. In wild type animals, HGRS-1 exhibits a punctate distribution that is partially co-localized with both RAB-5 and RAB-7. In contrast, in usp-50 mutants, the punctate signal of HGRS-1 is significantly reduced, along with its co-localization with RAB-5 and RAB-7 (Author response image 2). These results suggest that, in the absence of USP-50, the stabilization of ESCRT-0 components on endosomes is compromised.

      Author response image 2.

      ESCRT-0 is adjacent to both early endosomes and late endosomes. (A) Confocal fluorescence images of wild-type and usp-50(xd413) hypodermis at L4 stage co-expressing HGRS-1::GFP (hgrs-1 promoter) and endogenous wrmScarlet::RAB-5. (B) HGRS-1 and RAB-5 puncta were analyzed to produce Manders overlap coefficient M1 (HGRS-1/RAB-5) and M2 (RAB-5/HGRS-1) (N=10). (C) Confocal fluorescence images of wild-type and usp-50(xd413) hypodermis at L4 stage co-expressing HGRS-1::GFP (hgrs-1 promoter) and endogenous wrmScarlet::RAB-7. (D) HGRS-1 and RAB-7 puncta were analyzed to produce Manders overlap coefficient M1 (HGRS-1/RAB-7) and M2 (RAB-7/HGRS-1) (N=10). Scale bar: 10 μm for (A) and (C).
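
      For readers unfamiliar with the metric in panels (B) and (D), the following is a minimal sketch of how Manders overlap coefficients are commonly computed from two intensity channels. The legend does not specify the software, background subtraction, or thresholds used, so the function name `manders`, the zero thresholds, and the pixel values below are illustrative assumptions, not the authors' pipeline.

```python
def manders(ch1, ch2, thr1=0.0, thr2=0.0):
    """Manders coefficients for two equally sized, flattened intensity channels.

    M1: fraction of total channel-1 intensity found in pixels where
        channel 2 is above its threshold.
    M2: fraction of total channel-2 intensity found in pixels where
        channel 1 is above its threshold.
    Thresholds default to 0 (an assumption; real analyses usually set
    them from background estimates).
    """
    if len(ch1) != len(ch2):
        raise ValueError("channels must have the same number of pixels")
    total1, total2 = sum(ch1), sum(ch2)
    m1 = sum(a for a, b in zip(ch1, ch2) if b > thr2) / total1
    m2 = sum(b for a, b in zip(ch1, ch2) if a > thr1) / total2
    return m1, m2


# Hypothetical 4-pixel example; e.g. ch1 = HGRS-1 signal, ch2 = RAB-5 signal.
hgrs1 = [0.0, 2.0, 4.0, 0.0]
rab5 = [1.0, 3.0, 0.0, 0.0]
m1, m2 = manders(hgrs1, rab5)
print(m1, m2)
```

      Because M1 and M2 are normalized by different channels, they need not be equal: a sparse marker can overlap almost entirely with an abundant one (high M1) while accounting for only a small share of the abundant marker's signal (low M2).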

      (4) The authors observed enlarged early endosomes in cells depleted of usp-50/usp8, along with enlarged MVB-like structures identified through TEM. The potential identity of these structures as the same organelle could be determined using CLEM.

      We thank the reviewer for the valuable suggestion. Our TEM analysis identified a large number of abnormally enlarged vesicles with various intraluminal structures accumulated in usp-50 mutants. As the reviewer correctly noted, CLEM (correlative light and electron microscopy) would be an ideal approach to further characterize these structures. We have been attempting to implement CLEM in C. elegans for a few years. Given that CLEM relies on fluorescence markers, in this study we focused on two tagged proteins, RAB-5 and RABX-5, whose vesicles are enlarged in usp-50 mutants. Unfortunately, we encountered significant challenges with this approach, as the GFP-tagged RAB-5 and RABX-5 signals did not survive the electron microscopy procedure. Attempts to align EM sections with the residual GFP signal yielded results that were not convincing. Consequently, we concentrated our analysis on a panel of molecular markers, including 2xFYVE, RAB-5, RABX-5, RAB-7, and LAAT-1. These markers consistently indicated that early endosomes are specifically enlarged in usp-50 mutants, while late endosomal/lysosomal structures are notably reduced. Thus, the abnormal structures identified in usp-50 mutants via TEM are likely to be enlarged early endosomes rather than classical MVBs. We have revised the manuscript to reflect these findings and to clarify this point.

      (5) The working model depicted in Figure 6Y (right) requires revision, as it has the potential to mislead readers into mistaking enlarged early endosomes for multivesicular bodies (MVBs).

      We thank the reviewer for the excellent suggestion. We have revised the model to clarify that it is the enlarged early endosomes, rather than MVBs, that are observed in usp-50 mutants.

      Reviewer #2 (Recommendations For The Authors):

      (1) Is there any change of Rabx5 protein level in USP8/USP50 mutant cells?

      Good question. In the absence of usp-50/usp8, we indeed observed a noticeable increase in the signal of Rabex5 on endosomes. To determine whether usp-50/usp8 affects the protein level of Rabex5, we investigated the endogenous levels of RABX-5 using the RABX-5::GFP knock-in line. Compared to wild-type controls, we found an elevated level of RABX-5::GFP protein in usp-50(xd413) mutants (Author response image 3). This suggests that USP-50 may play a role in the destabilization of RABX-5/Rabex5 in vivo.

      Author response image 3.

      The endogenous RABX-5 protein level is increased in usp-50 mutants. (A) The RABX-5::GFP KI protein level is increased in usp-50(xd413). (B) Quantification of endogenous RABX-5::GFP protein level in wild type and usp-50(xd413) mutant animals.

      (2) It is interesting that "The rabx-5(null) animals are healthy and fertile and do not display obvious morphological or behavioral defects.", which seems contrary to its role in regulating USP8 localization and endosome maturation.

      It has been previously documented that rabx-5 functions redundantly with rme-6, another RAB-5 GEF in C. elegans, to regulate RAB-5 localization in oocytes [6]. RNA interference (RNAi) targeting rabx-5 in a rme-6 mutant background results in synthetic lethality, whereas neither rabx-5 nor rme-6 single mutants are essential for worm viability. RME-6 co-localizes with clathrin-coated pits, while Rabex-5 is localized to early endosomes. Rabex-5 forms a stable complex with Rabaptin-5 and is part of a large EEA1-positive complex on early endosomes, whereas RME-6 does not interact with Rabaptin-5 (RABN-5) or EEA-1. These findings suggest that while RME-6 and RABX-5 may function redundantly, they likely play distinct roles in regulating intracellular trafficking processes. In the absence of RABX-5, USP-50 appears to lose its endosomal localization, although the size of the early endosome remains comparable to that of wild type. This observation contrasts with the phenotype associated with USP-50 loss-of-function, in which the early endosome is notably enlarged. These results suggest that residual USP-50 present in the endosomes is sufficient to maintain its role in the endocytic pathway. Conversely, the complete absence of USP-50 likely disrupts the transition of early endosomes to late endosomes, indicating a crucial role of USP-50 in this conversion process. It is also noteworthy that, although loss-of-function of rabx-5 does not result in obvious changes to early endosomes, increasing the gene expression level of rabx-5/Rabex-5 alone is sufficient to cause enlargement of early endosomes (Author response image 4). Indeed, we observed that loss-of-function mutations in usp-50/usp8 lead to abnormally enlarged early endosomes, accompanied by an enhanced signal of endosomal RABX-5. When the rabx-5(null) mutation was introduced into usp-50 mutant animals, the enlarged early endosome phenotype seen in usp-50 mutants was significantly suppressed (Figure 4H and I). This implies that maintaining a lower level of Rab5 GEF may be crucial for endolysosomal trafficking.

      (3) Does Rabx5 mutation have any impact on early endosomes?

      To address the question, we utilized the CRISPR/Cas9 technique to create a molecular null for rabx-5 (Figure 4E). In the rabx-5(null) mutant animals, we found that the 2xFYVE-labeled early endosomes are indistinguishable from wild type (Figure 4H and 4I). Given that rabx-5 functions redundantly with rme-6, another RAB-5 GEF in C. elegans, it is likely that the regulation of early endosome size involves a cooperative interaction between RABX-5 and RME-6.

      (4) The authors observed a reduction of ESCRT-0 components in USP8 mutant cells, could this contribute to the late endosome/lysosome defects?

      Good suggestion. In wild-type animals, the two ESCRT-0 components, HGRS-1 and STAM-1, exhibit a distinct punctate distribution (Figure 1G and H). However, in usp-50 mutants, the punctate signals of HGRS-1::GFP and STAM-1::GFP are significantly diminished (Figure 1G and H; and Figure 1-figure supplement 1B), which aligns with the role of USP-50 in stabilizing the ESCRT-0 complex. To investigate whether the reduction in ESCRT components might contribute to defects in late endosome/lysosome formation, we examined stam-1 mutants. In stam-1 mutants, we observed a reduction in the size of RAB-7-coated vesicles (Author response image 1). Further, when we introduced the lysosomal marker LAAT-1::GFP into stam-1 mutants, we found a substantial decrease in lysosomal structures compared to wild-type animals (Author response image 1). This suggests that the ESCRT complex is essential for proper late endosome/lysosome formation. In contrast, the morphology of early endosomes, as indicated by the 2xFYVE marker, appeared normal in stam-1 mutants, similar to wild-type animals (Author response image 1). This implies that while a reduction in ESCRT-0 components may contribute to the late endosome/lysosome defects observed in usp-50 mutants, the early endosome enlargement phenotype in usp-50 mutants may involve additional mechanisms.

      (5) Rabx5 is accumulated in USP8 mutant cells, I am very curious about the phenotype of USP8-Rabx5 double mutants. Could over-expression of Rabx5 (wild type or mutant forms) cause any defects?

      Excellent suggestions. To address the question, we employed the CRISPR/Cas9 technique to create a molecular null for rabx-5 (Figure 4E). In the rabx-5(null) mutant animals, we observed that the punctate USP-50::GFP signal became diffusely distributed (Figure 4F and G). This suggests that rabx-5 is necessary for the endosomal localization of USP-50. Interestingly, in rabx-5(null) mutant animals, the 2xFYVE-labeled early endosomes appeared similar to those in wild-type animals (Figure 4H and I). When rabx-5(null) was introduced into usp-50 mutant animals, the enlarged early endosome phenotype observed in usp-50 was significantly suppressed (Figure 4H and I). This finding indicates that usp-50 indeed functions through rabx-5 to regulate early endosome size. Additionally, we constructed strains overexpressing either wild-type or K323R mutant RABX-5. Our results showed that overexpression of wild-type RABX-5 led to early endosome enlargement (as indicated by YFP::2xFYVE labeling) (Author response image 4A, B and D). In contrast, overexpression of the K323R mutant RABX-5 did not result in noticeable early endosome enlargement (Author response image 4A, C and D). Together, these data are consistent with our model that USP-50 may regulate RABX-5 by deubiquitinating the K323 site.

      Author response image 4.

      (A-C) Overexpression of wild-type RABX-5 causes enlarged EEs (labeled by YFP::2xFYVE), while the RABX-5(K323R) mutant form does not. (D) Quantification of the volume of individual YFP::2xFYVE vesicles. Data are presented as mean ± SEM. ****P<0.0001. ns, not significant. One-way ANOVA with Tukey’s test.

      (6) Rabx5 could be ubiquitinated at K88 and K323, and Rabx5-K323R showed different activity when compared with the wild-type protein in USP8 mutant cells. Could the authors provide evidence that USP8 could remove the ubiquitin modification from K323 in Rabx5 protein?

      We appreciate the reviewer's insightful suggestions. To explore the potential of USP-50 in removing ubiquitin modifications from lysine 323 on the RABX-5 protein, we undertook a series of experiments. Initially, we sought to determine whether USP-50 influences the ubiquitination level of RABX-5 in vivo. However, due to the low expression levels of USP-50, we encountered challenges in obtaining adequate amounts of USP-50 protein from worm lysates. To overcome this, we expressed USP-50::4xFLAG in HEK293 cells for subsequent affinity purification. Concurrently, we utilized anti-GFP agarose beads to purify RABX-5::GFP from worms expressing the rabx-5::gfp construct. We then incubated RABX-5::GFP with USP-50::4xFLAG for varying durations and performed immunoblotting with an anti-ubiquitin antibody. As shown in Author response image 5A, our results revealed a decrease in the ubiquitination level of RABX-5 in the presence of USP-50, suggesting that USP-50 directly deubiquitinates RABX-5. Previous studies have indicated that only a minor fraction of recombinant RABX-5 undergoes ubiquitination in HeLa cells, which is believed to have functional significance (7). Our findings are consistent with this observation, as only a small fraction of RABX-5 in worms is ubiquitinated. Rabex-5 is known to interact with both K63- and K48-linked poly-ubiquitin chains. To further elucidate whether USP-50 specifically targets K48- or K63-linked ubiquitination at the K323 site of RABX-5, we incubated various HA-tagged ubiquitin mutants with either wild-type or K323R mutant RABX-5 protein. Our results indicated that the K323R mutation reduces K63-linked ubiquitination of RABX-5 (Author response image 5). This experiment was repeated multiple times with consistent results. Additionally, while overexpression of wild-type RABX-5 led to an enlargement of early endosomes, as evidenced by YFP::2xFYVE labeling, overexpression of the K323R mutant did not produce a noticeable effect on endosome size (Author response image 4). Collectively, these findings indicate that RABX-5 is subject to ubiquitin modification in vivo and that USP-50 plays a significant role in regulating this modification at the K323 site.

      Author response image 5.

      (A) RABX-5::GFP protein was purified from worm lysates using anti-GFP antibody. FLAG-tagged USP-50 was purified from HEK293T cells using anti-FLAG antibody. Purified RABX-5::GFP was incubated with USP-50::4xFLAG for the indicated times (0, 15, 30, 60 min), followed by immunoblotting using antibodies against ubiquitin, FLAG or GFP. In the presence of USP-50::4xFLAG, the ubiquitination level of RABX-5::GFP is decreased. (B) Quantification of RABX-5::GFP ubiquitination level from three independent experiments. (C) HEK293T cells were transfected with HA-Ub or the indicated mutants and 4xFLAG-tagged RABX-5 or the RABX-5 K323R mutant for 48 h. The cells were subjected to pull-down using FLAG beads, followed by immunoblotting using antibodies against HA or FLAG.

      (7) The authors described "the almost identical phenotype of usp-50/usp8 and sand-1/Mon1 mutants", found protein-protein interaction between USP8 and sand-1, and showed that sand1-GFP signal is diminished in USP8 mutant cells. These observations fit with the possibility that USP8 regulates the stability of sand-1 to promote endosomal maturation. Could this be tested and integrated into the current model?

      We are grateful for the insightful comments provided by the reviewer. Rab5, known to be activated by Rabex-5, plays a crucial role in the homotypic fusion of early endosomes. Rab5 effectors also include the Rab7 GEF SAND-1/Mon1-Ccz1 complex. Rab7 activation by the SAND-1/Mon1-Ccz1 complex is essential for the biogenesis and positioning of late endosomes (LEs) and lysosomes, and for the fusion of endosomes and autophagosomes with lysosomes. The Mon1-Ccz1 complex is able to interact with Rabex-5, causing dissociation of Rabex-5 from the membrane, which probably terminates the positive feedback loop of Rab5 activation and then promotes the recruitment and activation of Rab7 on endosomes. In our study, we identified an interaction between USP-50 and the Rab5 GEF, RABX-5. In the absence of USP-50, we observed an increased endosomal localization of RABX-5 and the formation of abnormally enlarged early endosomes. This phenotype is reminiscent of that seen in sand-1 loss-of-function mutants, which also exhibit enlarged early endosomes and a concomitant reduction in late endosomes/lysosomes. Notably, USP-50 also interacts with SAND-1, suggesting a potential role in regulating its localization. We could propose several models to elucidate how USP-50 might influence SAND-1 localization, including:

      (1) USP-50 may stabilize SAND-1 through direct de-ubiquitination.

      (2) In the absence of USP-50, the sustained presence of RABX-5 could lead to continuous Rab5 activation, which might hinder or delay the recruitment of SAND-1.

      (3) USP-50 could facilitate SAND-1 recruitment by promoting the dissociation of RABX-5.

      We are actively investigating these models in our laboratory. Due to space constraints, a more detailed exploration of how USP-50 regulates SAND-1 stability will be presented in a separate publication.

      References:

      (1) Schmitz, G., and Müller, G. (1991). Structure and function of lamellar bodies, lipid-protein complexes involved in storage and secretion of cellular lipids. J Lipid Res 32, 1539-1570.

      (2) Dietl, P., and Frick, M. (2021). Channels and Transporters of the Pulmonary Lamellar Body in Health and Disease. Cells-Basel 11. https://doi.org/10.3390/cells11010045.

      (3) Raposo, G., Marks, M.S., and Cutler, D.F. (2007). Lysosome-related organelles: driving post-Golgi compartments into specialisation. Current opinion in cell biology 19, 394-401. https://doi.org/10.1016/j.ceb.2007.05.001.

      (4) Weaver, T.E., Na, C.L., and Stahlman, M. (2002). Biogenesis of lamellar bodies, lysosome-related organelles involved in storage and secretion of pulmonary surfactant. Semin Cell Dev Biol 13, 263-270. https://doi.org/10.1016/s1084952102000551.

      (5) Ott, D.P., Desai, S., Solinger, J.A., Kaech, A., and Spang, A. (2024). Coordination between ESCRT function and Rab conversion during endosome maturation. bioRxiv, 2024.05.14.594104. https://doi.org/10.1101/2024.05.14.594104.

      (6) Sato, M., Sato, K., Fonarev, P., Huang, C.J., Liou, W., and Grant, B.D. (2005). Caenorhabditis elegans RME-6 is a novel regulator of RAB-5 at the clathrin-coated pit. Nature cell biology 7, 559-569. https://doi.org/10.1038/ncb1261.

      (7) Mattera, R., Tsai, Y.C., Weissman, A.M., and Bonifacino, J.S. (2006). The Rab5 guanine nucleotide exchange factor Rabex-5 binds ubiquitin (Ub) and functions as a Ub ligase through an atypical Ub-interacting motif and a zinc finger domain. The Journal of biological chemistry 281, 6874-6883. https://doi.org/10.1074/jbc.M509939200.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      How plants perceive their environment and signal during growth and development is of fundamental importance for plant biology. Over the last few decades, nanodomain organisation of proteins localised within the plasma membrane has emerged as a way of organising proteins involved in signalling pathways. Here, the authors addressed how a non-surface-localised signal (viral infection) was resisted by PM-localised signalling proteins and the effect of nanodomain organisation during this process. This is valuable work as it describes how an intracellular process affects signalling at the PM, where most previous work has focused on the other way round, PM signalling affecting downstream responses in the plant. They identify CPK3 as a specific calcium-dependent protein kinase which is important for inhibiting viral spread. The authors then go on to show that CPK3 diffusion in the membrane is reduced after viral infection and study the interaction between CPK3 and the remorins, which are a group of scaffold proteins important in nanodomain organisation. The authors conclude that there is an interdependence between CPK3 and remorins to control their dynamics during viral infection in plants.

      Strengths:

      The dissection of which CPK was involved in the viral propagation was masterful and very conclusive. Identifying CPK3 through knockout time course monitoring of viral movement was very convincing. The inclusion of overexpression, constitutively active and non-functional point-mutation lines further added to that.

      Weaknesses:

      My main concerns with the work are twofold.

      (1) Firstly, the imaging described and shown is not sufficient to support the claims made. The PM-localised form and its non-PM-localised form look similar, and no PM stain or marker construct was used to support the distinction. The sptPALM data conclusions are nice and fit the narrative. However, no raw data or movie is shown, only representative tracks. Therefore, the data quality cannot be verified. In addition, the number of single-particle events visualised per experiment is not reported; only the number of cells imaged is reported. Therefore, it is impossible for the reader to appreciate the number of single-molecule behaviours obtained and hence the quality of the data.

      (2) Secondly, remorins are involved in a lot of nanodomain-controlled processes at the PM. The authors have not conclusively demonstrated that during viral infection the remorin effects seen are solely due to its interaction with CPK3. The sptPALM imaging of REM1.2 in a cpk3 knockout line goes part way to solve this, but more evidence would strengthen it in my opinion. How do we know that the entire PM protein dynamics and organisation are not altered during viral infection? Or that CPK3 and REM are not at very distant ends of a signalling cascade? Negative control experiments are required here utilising other PM-localised proteins which have no role during viral infection. In addition, if the interaction is specific, the transiently expressed CPK3-CA construct (shown to form nanodomains) should be expressed with REM1.2-mEOS to show that the alterations in single-particle behaviour occur due to specific activations of CPK3 and REM1.2 in the absence of PlAMV viral infection and are not an artefact of whole-PM changes in dynamics during viral infection.

      In addition, displaying more information throughout the manuscript (such as raw particle tracking movies and numbers of tracks measured) on the already generated data would strengthen the manuscript further.

      Overall, I think this work has the potential to be a very strong manuscript but additional reporting of methods and data are required and additional lines of evidence supporting interaction claims would significantly strengthen the work and make it exceptional.

      Reviewer #2 (Public Review):

      Summary:

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent.

      Strengths:

      The paper contains novel, important information.

      Weaknesses:

      The interpretation of some experimental data is not justified, and the proposed model is not fully based on the available data.

      Reviewer #3 (Public Review):

      Summary:

      This study examined the role that the activation and plasma membrane localisation of a calcium-dependent protein kinase (CPK3) plays in plant defence against viruses. The authors clearly demonstrate that the ability to hamper the cell-to-cell spread of the virus PlAMV is not common to other CPKs which have roles in defence against different types of pathogens, but appears to be specific to CPK3 in Arabidopsis. Further, they show that lateral diffusion of CPK3 in the plasma membrane is reduced upon PlAMV infection, with CPK3 likely present in nanodomains. This stabilisation, however, depends on one of its phosphorylation substrates, the Remorin scaffold protein REM1.2. However, when REM1.2 lateral diffusion was tracked, it showed an increase in movement in response to PlAMV infection. These contrary responses to PlAMV infection were further demonstrated to be interdependent, which led the authors to propose a model in which activated CPK3 is stabilised in nanodomains in part by its interaction with REM1.2, which it binds and phosphorylates, allowing REM1.2 to diffuse more dynamically within the membrane.

      The likely impact of this work is that it will lead to closer examination of the formation of nanodomains in the plasma membrane and dissection of their role in immunity to viruses, as well as further investigation into the specific mechanisms by which CPK3 and REM1.2 inhibit the cell-to-cell spread of viruses.

      Strengths:

      The paper provided compelling evidence about the roles of CPK3 and REM1.2 through a combination of logical reverse genetics experiments and advanced microscopy techniques, particularly in single particle tracking.

      Weaknesses:

      There is a lack of evidence for the downstream pathways, specifically whether the role that CPK3 has in cytoskeletal organisation may play a role in the plant's defence against viral propagation. Also, there is limited discussion about the localisation of the nanodomains and whether there is any overlap with plasmodesmata (PD), which, as plant viruses utilise PD to move from cell to cell, seems an obvious avenue to investigate.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Viral spread work in CPK mutants with time courses is beautiful!

      Regarding my public points on my issues with the imaging:

      - Figure 2A shows 'PM' localisation of CPK3 and 'non-PM' imaging of CPK3-G2A. The images are near identical both showing cell outlines and cytoplasmic strands. Here a PM marker (such as Lti6B) tagged with a different fluorophore or PM stain should be used in conjunction with surface views (such as in Figure 2C) to show it really is at the PM and the G2A line is not.

      Impaired membrane localization of CPK3-G2A is documented in Mehlmer et al., 2010 using microsomal fractionation. Although the main purpose of Figure 2A is to show correct expression of the constructs in the lines used for PlAMV propagation (Figure 2B), we replaced the images with wider-view pictures that are more representative of the subcellular localization of CPK3 and CPK3-G2A.

      - Regarding Figure 2C, this is extremely noisy and PM heterogeneity is barely observable over the noise from the system (looking at the edges of the surface imaged). You mention low resolution was an issue. I notice from the methods you have taken confocal images on a Zeiss 880 with Airyscan. These images must be confocal, but if imaged with Airyscan the PM heterogeneity would be much clearer (see work from John Runions' lab).

      Indeed, these are tangential-view images obtained with a Zeiss 880 with Airyscan. Based on tessellation analysis (Figure 2H-J), CPK3 is rather homogeneously distributed and forms NDs of around 70 nm in diameter. Objects of such size cannot be resolved using pixel reassignment methods such as Airyscan. Note also that the AtREMs in our study are less heterogeneously distributed than what was described in the literature for StREM1.3.

      - Regarding all sptPALM data. At least an example real data image and video is required otherwise the data can’t be assessed. The work of Alex Martiniere (sptPALM) or Alex Jonson (TIRF) all show raw data so the reader can appreciate the quality of the data. In addition, number of events (particles tracked) has to be shown in the figure legend, not just number of cells otherwise was one track performed per cell? Or 10,000? Obviously where the N sits in this range gives the reader more or less confidence of the data.

      We agree and have added example videos of sptPALM experiments to the supplementary data. We have also indicated the number of tracked particles in the figure legends.

      - On a slight technical aside, how do you know the cells being imaged for sptPALM with PlAMV are actually infected with the virus? In Fig 2C you use a GFP-tagged version, but in sptPALM you use a non-tagged one. I think a sentence in the methods on this would help clarify.

      PlAMV-GFP was used for the sptPALM experiments, and cell infection was assessed during the PALM experiment. This is now specified in the corresponding figures and methods.

      - I also have a concern over some of the representative images showing the same things between different figures. Your clustering data in 3F looks very convincing. However, in Figure 2H the mock and PlAMV-GFP look very similar. How is Figure 3F so different for the same experiment? Especially considering the scale bars are the same for both figures. The same goes for CPK3-mRFP1.2 in Fig 2C and 3A: the same thing is being imaged, at the same scale (scale bars same size), but the images are extremely different.

      Figure 2 data were generated using CPK3 stably expressed in A. thaliana, while Figure 3 data were obtained upon transient over-expression of CPK3 in N. benthamiana. We do not have a clear explanation for such a difference in CPK3 PM behavior; it could lie in differences in PM composition or actin organization between those two species. This point is now addressed in the discussion.

      - Line 193&194 - you state that the CA CPK3 is reminiscent of the CPK3 upon PlAMV expression. I don't agree: while CPK3CA is less mobile (2D), the MSD shows it is in between CPK3 and CPK3 + PlAMV. Therefore, can’t the opposite also be true? That overall the behaviour of CPK3-CA is reminiscent of WT CPK. I think this needs rewording.

      We agree and have reworded that part.

      - Line 651 - what numerical aperture are you using for the lens during confocal microscopy? This is fundamentally important information directly related to the reproducibility of the work. You report it for the sptPALM.

      The numerical aperture is now indicated in the methods.

      Regarding my bigger point about specific interactions between CPK3 and remorin during viral infection to strengthen your claim the following need doing. I am not suggesting you do all of these but at least two would significantly enhance the paper.

      (1) Image a non-related PM protein during viral infection using sptPALM and demonstrate that its behaviour is not altered (such as Lti6b). This would show the effects on remorin behaviour are specific to CPK3 and not a wholesale PM alteration in dynamics due to viral infection.

      (2) Two colour SPT imaging of CPK3 and REM1.2. You show in absence of proteins (knockouts effect on each other) but your only interaction data is from a kinase assay (where CPK1 and 2 also interact, even though they are not localised at the same place) and colocalisation data (see below). A two colour SPT imaging experiment showing interaction and clustering of CPK3 and REM1.2 with each other and the change in their behaviours when viral infected and simultaneously imaged would address all of my concerns.

      - On another note, the co-localisation data (fig 5 sup 4) needs additional analysis. I would expect most PM proteins to show the results you show as the data is very noisy. In order to improve I would zoom in to fill the field of view and then determine correlation and also when one image is rotated 90 degrees (as described in Jarsch et al., plant cell) to enhance this work.

      (3) In the absence of viral infection, but in the presence of CPK3-CA, is sptPALM REM1.2 behaviour in the PM altered? If so, then the interaction is specific, changes in remorin dynamics are not due to wholesale PM changes during viral infection, and the manuscript is substantially strengthened.

      (4) Building on from 3), if you have a CPK3 mutated with both CPK3-CA and G2A this would be constitutively active and non-PM localised and as such should not affect Remorin behaviour if your model is true, this would strengthen the case significantly but I appreciate is highly artificial and would need to be done transiently.

      Regarding the first point, since the role of PM proteins involved in potexvirus infection is barely assessed, picking a non-related PM protein might be tricky. The data obtained with mEOS3.2-REM1.2 expressed in cpk3 null-mutant point towards a specific role of CPK3 in PlAMV-induced REM1.2 diffusion and not a general alteration of PM protein behavior.

      Regarding the second point, we already reported the in vivo interaction between AtCPK3CA and AtREM1.2/AtREM1.3 by BiFC in N. benthamiana (Perraki et al., 2018), and AtCPK3 was shown to co-IP with AtREM1.2 (Abel et al., 2021). While we agree on the relevance of doing dual-color sptPALM with CPK3 and REM1.2, it is so far technically challenging and we would not be able to implement this in a timely manner. For the colocalization, although the whole cell is displayed in the figure, the analysis was performed on ROIs to fill the field of analysis.

      We agree with the relevance of adding the colocalization analysis of randomized images (mTagBFP2 channel rotated 90 degrees); this is now added to Figure 5 – figure supplement 5.
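
      As an illustration of the rotation control described above, the following is a minimal sketch and not the authors' analysis pipeline. It assumes two square, equally sized ROIs held as NumPy arrays and uses Pearson's correlation as the colocalization measure; the function names (`pearson`, `rotation_control`) and the synthetic images are hypothetical.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient between two images of equal shape."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a @ b) / np.sqrt((a @ a) * (b @ b)))


def rotation_control(ch1, ch2):
    """Return (observed r, control r), where the control rotates ch2 by 90 degrees.

    Rotating one channel preserves its intensity statistics but breaks any
    pixel-wise spatial relationship with the other channel.
    """
    ch1 = np.asarray(ch1)
    ch2 = np.asarray(ch2)
    if ch1.shape != ch2.shape or ch1.ndim != 2 or ch1.shape[0] != ch1.shape[1]:
        raise ValueError("expects two square 2D ROIs of equal shape")
    return pearson(ch1, ch2), pearson(ch1, np.rot90(ch2))


# Synthetic example (hypothetical data): two channels sharing one pattern plus noise.
rng = np.random.default_rng(0)
shared = rng.random((64, 64))
ch1 = shared + 0.1 * rng.random((64, 64))
ch2 = shared + 0.1 * rng.random((64, 64))
r_obs, r_rot = rotation_control(ch1, ch2)
```

      In such a control, an observed coefficient that clearly exceeds the rotated one argues that the measured correlation reflects spatial co-distribution rather than noise or overall signal density.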

      Finally, for the third and fourth points, spt-PALM analysis of REM1.2 in presence of CPK3-CA and CPK3-CA-G2A was performed (Figure 5 - figure supplement 4). The results suggest a specific role of CPK3-CA in REM1.2 diffusion.

      Minor points:

      Line 59 - form, I think you mean from.

      Line 63 - Reference needed after latter.

      Line 68 - Reference required after viral infection.

      Line 85 - Propose not proposed.

      Line 156 - Allowed us to not allows to.

      Line 204 - add we previously 'demonstrated'

      Line 622 and 623 - You say lines obtained from Thomas Ott. This is very odd phrasing considering he is an author. I appreciate citing the work producing the lines but maybe reword this

      These points were corrected, thank you.

      Reviewer #2 (Recommendations For The Authors):

      The paper provides evidence that CPK3 plays a role in plant virus infection, and reports that viral infection is accompanied by changes in the dynamics of CPK3 and REM1.2, the phosphorylation substrate of CPK3, in the plasma membrane. In addition, the dynamics of the two proteins in the PM are shown to be interdependent. The paper contains novel, important information that can undoubtedly be published in eLife. However, I have some concerns that should be addressed before it can be accepted for publication.

      Major concerns

      When the authors say that CPK3 plays a role in viral propagation, it should be clarified what is meant by 'propagation': replication of the viral genome, its cell-to-cell transport, or long-distance transport via the phloem. By default, readers will tend to assume the former meaning. In my opinion, the term 'propagation' is misleading and should be avoided.

      We purposely chose the term “propagation” because it sums replication and cell-to-cell movement. Nevertheless, we previously showed that group 1 StREM1.3 doesn’t alter PVX replication (Raffaele et al., 2009 The Plant Cell). In this paper, as we do not investigate the role of AtREM1.2 or AtCPK3 in the replication of the viral PlAMV genome, we cannot state that these proteins are strictly involved in cell-to-cell movement of the virus.

      The authors show that viral infection is associated with decreased diffusion of CPK3 and increased diffusion of REM1.2 in the PM. However, it remains unclear whether these changes are related to partial resistance to viral infection involving CPK3 and REM1.2, or whether they are simply a consequence of viral infection that may lead to altered PM properties and altered dynamics of PM-associated proteins. Therefore, the model presented in Fig. 6 appears to be entirely speculative, as it postulates that changes in CPK3 and REM1.2 dynamics are the cause of suppressed virus 'propagation'. In addition, the model implies that a decrease in CPK3 mobility leads to activation of its kinase activity. This view is not supported by experimental data (see my next comment). The model should be deleted (both as the figure and its description in the Discussion) or substantially reworked so that it finally relies on existing data.

      For the first point, the results obtained from the additional experiments proposed by reviewer #1 support the hypothesis of a direct impact of CPK3 on REM1.2 diffusion (Figure 5 - figure supplement 4).

      We agree with the second point and reworked the model to remove the link between CPK3 activation and its increased diffusion.

      The statement that 'changes in CPK3 dynamics upon PlAMV infection are linked to its activation' (line 194) is based on flawed logic, and the conclusion in this section of Results ('changes in CPK3 dynamics upon PlAMV infection are linked to its activation') is incorrect, as it is not supported by experimental data. In fact, the authors show that CPK3 dynamics and clustering upon viral infection are somewhat reminiscent of the behavior of a CPK3 deletion mutant, which is a constitutively active protein kinase. However, this partial similarity cannot be taken as evidence that CPK3 dynamics upon PlAMV infection are related to its activation. Furthermore, the authors emphasize the similarity of the mutant and CPK3 in infected cells without taking into account a drastic difference in their localization (Fig. 3A, middle and right panels), showing that the reduced dynamics of the compared proteins may have different causes. I suggest the removal of the section 'CPK3 activation leads to its confinement in PM ND' from the paper, as the results included in this section are not directly related to other data presented.

      The PM lateral organization of PM-bound CPKs in their native or constitutively active form, as well as the role of lipids in this phenomenon, has never been shown before. We believe that this section contains relevant information for the community. We kept the section but reworded it to temper the correlation made between CPK3 PM organization upon viral infection and its activation.

      Line 270 - 'group 1 REMs might play a role in CPK3 domain stabilization upon viral infection'. This is an overstatement. The size of the CPK3-containing NDs may have no correlation with their stability.

      We reworded the sentence.

      Minor points

      Line 204 - "we previously that".

      Line 234 and hereafter - "the D" sounds strange. Suggest using "the diffusion coefficient".

      This was reworded.

      Reviewer #3 (Recommendations For The Authors):

      The authors have previously demonstrated that there was an increase in REM1.2 localisation to plasmodesmata under viral challenge. It would be useful to see if there was any co-localisation of REM1.2 and CPK3 with plasmodesmata in response to PlAMV and how this is affected in the mutants. This could be carried out relatively simply using aniline blue.

      These experiments were added to the supplementary data as Figure 2 – figure supplement 2 and Figure 4 – figure supplement 4. No enrichment of CPK3 or REM1.2 at plasmodesmata could be observed upon PlAMV infection.

      Fig 3 supplementary figure 2 would be better incorporated into the main body of Figure 3 as this underpins discussion on the involvement of lipids such as sterols in the formation of nanodomains.

      We moved Figure 3 – Supplementary figure 2 to the main body of Figure 3.

      Minor corrections:

      Whilst the paper is generally well written there are a number of grammatical errors:

      Line 1 & 2: Title doesn't quite read correctly, suggest a rewording for clarity.

      L31: Insert "a" after only

      L33: Replace "are playing" with "play"

      L34: Begin sentence "Viruses are intracellular pathogens and as such the role..."

      L59: replace "form" with "from"

      L63: Insert "was demonstrated" after REM1.2)

      L85: Replace "proposed" with "propose"

      L86: replace "encouraging to explore" with "which will encourage further exploration of "

      L129: replace "we'll focus on" with "we concentrated on"

      L131: insert "an" before ATP

      L138: change "among" to "amongst"

      L156: change "allows to analyse" to "allows the analysis of"

      L204: Insert "showed" after previously.

      L232: "root seedlings" should this be the roots of seedlings?

      L235: insert "to" after "as"

      L280: insert "a" after "only"

      L281: change " to play" with "as playing": change CA to superscript

      L307: Insert "was" after "transcription"

      L320: change "display" to "displaying"

      L321: change "form" to "forms"

      L340: "hampering" should come before viral

      L365: insert "us" after "allow"

      Thank you, these were corrected.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      This paper provides a computational model of a synthetic task in which an agent needs to find a trajectory to a rewarding goal in a 2D-grid world, in which certain grid blocks incur a punishment. In a completely unrelated setup without explicit rewards, they then provide a model that explains data from an approach-avoidance experiment in which an agent needs to decide whether to approach or withdraw from a jellyfish in order to avoid a pain stimulus. Both models include components that are labelled as Pavlovian; hence the authors argue that their data show that the brain uses a Pavlovian fear system in complex navigational and approach-avoid decisions. 

      We thank the reviewer for their thoughtful comments. To clarify, the grid-world setup was used as a didactic tool/testbed to understand the interaction between Pavlovian and instrumental systems (lines 80-81) [Dayan et al., 2006], specifically in the context of safe exploration and learning. It helps us delineate the Pavlovian contributions during learning, which is key to understanding the safety-efficiency dilemma we highlight. This approach generates a hypothesis about outcome uncertainty-based arbitration between these systems, which we then test in the approach-withdrawal VR experiment, which builds on foundational studies of Pavlovian biases [Guitart-Masip et al., 2012, Cavanagh et al., 2013].

      Although the VR task does not explicitly involve rewards, it provides a specific test of our hypothesis regarding flexible Pavlovian fear bias, similar to how others have tested flexible Pavlovian reward bias without involving punishments (e.g., Dorfman & Gershman, 2019). Both the simulation and VR experiment models are derived from the same theoretical framework and maintain an algebraic mapping, differing only in task-specific adaptations (e.g., different action sets, and temporal difference learning for multi-step decisions in the grid world vs. the Rescorla-Wagner rule for single-step decisions in the VR task). This is also true for Dayan et al. [2006], who bridge Pavlovian bias in a Go-No Go task (negative auto-maintenance pecking task) and a grid world task. Therefore, we respectfully disagree that the two setups are completely unrelated and that both models include components merely labelled as Pavlovian.
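      To make the shared structure concrete, the following is a minimal illustrative sketch rather than the exact model from the manuscript. It assumes a single-step approach/withdraw task, a Rescorla-Wagner update for learned values, and a fixed linear weighting `omega` between instrumental values and a Pavlovian withdrawal bias. The function names, the form of the bias, and the softmax choice rule are assumptions introduced here for illustration.

```python
import math

ACTIONS = ("approach", "withdraw")


def rescorla_wagner(value, outcome, alpha):
    """One Rescorla-Wagner step: move the estimate toward the observed outcome."""
    return value + alpha * (outcome - value)


def combined_propensities(q_instrumental, pavlovian_punishment, omega):
    """Linearly mix instrumental values with a Pavlovian withdrawal bias.

    omega = 0 gives a purely instrumental agent; omega = 1 a purely Pavlovian one.
    """
    # Assumption: the Pavlovian bias favours "withdraw" in proportion to the
    # learned punishment expectation and is zero for "approach".
    bias = {"approach": 0.0, "withdraw": pavlovian_punishment}
    return {
        a: (1.0 - omega) * q_instrumental[a] + omega * bias[a] for a in ACTIONS
    }


def softmax_policy(propensities, beta=1.0):
    """Convert propensities to choice probabilities (inverse temperature beta)."""
    top = max(propensities.values())
    exps = {a: math.exp(beta * (w - top)) for a, w in propensities.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Hypothetical values: approaching is instrumentally better, but the cue
# carries a learned punishment expectation.
q = {"approach": 1.0, "withdraw": 0.0}
policy_low = softmax_policy(combined_propensities(q, 1.0, omega=0.0))
policy_high = softmax_policy(combined_propensities(q, 1.0, omega=0.9))
```

      In this sketch, raising `omega` shifts choice probability toward withdrawal even though the instrumental values are unchanged. In a multi-step grid world, the same mixing step would sit on top of temporal difference learning instead of the single-step update.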

      We will rephrase parts of the manuscript, particularly the Methods and Discussion, to prevent our main message from being misconveyed and to clarify that our main focus is on Pavlovian fear bias in safe exploration and learning (as also summarised by reviewers #2 and #3), rather than on its role in complex navigational decisions. We also acknowledge the need for future work to capture more sophisticated safe behaviours, such as escapes and sophisticated planning, which span different aspects of the threat-imminence continuum [Mobbs et al., 2020], and we will highlight these as avenues for future research.

      In the first setup, they simulate a model in which a component they label as Pavlovian learns about punishment in each grid block, whereas a Q-learner learns about the optimal path to the goal, using a scalar loss function for rewards and punishments. Pavlovian and Q-learning components are then weighed at each step to produce an action. Unsurprisingly, the authors find that including the Pavlovian component in the model reduces the cumulative punishment incurred, and this increases as the weight of the Pavlovian system increases. The paper does not explore to what extent increasing the punishment loss (while keeping reward loss constant) would lead to the same outcomes with a simpler model architecture, so any claim that the Pavlovian component is required for such a result is not justified by the modelling. 

      Thank you for this comment. We acknowledge that our paper does not compare the Pavlovian fear system to a purely instrumental system with varying punishment sensitivity. Instead, our model assumes the coexistence of these two systems and demonstrates the emergent safety-efficiency trade-off from their interaction. It is possible that similar behaviours could be modelled using an instrumental system alone. In light of the reviewer’s comment, we will soften our claims regarding the necessity of the Pavlovian system, despite its known existence.

      We also encourage the reviewer to consider the Pavlovian system as a biologically plausible implementation of punishment sensitivity. Unlike punishment sensitivity (scaling of the punishments), which has not been robustly mapped to neural substrates in fMRI studies, the neural substrates for the Pavlovian fear system (e.g., the limbic loop) are well known (see Supplementary Fig. 16).

      Additionally, we point out that varying reward sensitivities while keeping punishment sensitivity constant allows our PAL agent to differentiate from an instrumental agent that combines reward and punishment into a single feedback signal. As highlighted in lines 136-140 and the T-maze experiment (Fig. 3 A, B, C), the Pavlovian system maintains fear responses even under high reward conditions, guiding withdrawal behaviour when necessary (e.g., ω = 0.9 or 1), which is not possible with a purely instrumental model if the punishment sensitivities are fixed. This is a fundamental point.

      We will revise our discussion and results sections to reflect these clarifications.

      In the second setup, an agent learns about punishments alone. "Pavlovian biases" have previously been demonstrated in this task (i.e. an overavoidance when the correct decision is to approach). The authors explore several models (all of which are dissimilar to the ones used in the first setup) to account for the Pavlovian biases. 

      Thank you. We respectfully disagree with the statement that the models used in the experimental setup are dissimilar to the ones used in the first setup. Due to differences in the nature of the task setup, the action set differs, but the model equations and the theory are the same and align closely, as described in our response above. The only additional difference is the use of a baseline bias in human experiments and the RLDDM model, where we also model reaction times with drift rates, a behaviour that is not often simulated in grid world simulations. We will improve our Methods section to ensure that model similarity is highlighted.

      Strengths: 

      Overall, the modelling exercises are interesting and relevant and incrementally expand the space of existing models. 

      We thank reviewer #1 for acknowledging the relevance of our models in advancing the field. We would like to further highlight that, to the best of our knowledge, this is the first time reaction times in Pavlovian-Instrumental arbitration tasks have been modelled using RLDDM, which adds a novel dimension to our approach.

      Weaknesses: 

      I find the conclusions misleading, as they are not supported by the data. 

      First, the similarity between the models used in the two setups appears to be more semantic than computational or biological. So it is unclear to me how the results can be integrated. 

      We acknowledge the dissimilarity between the task setups (grid-world vs. approach-withdrawal). However, we believe these setups are computationally similar and may be biologically related, as suggested by prior work like Dayan et al. [2006], which integrates Go-No Go and grid-world tasks. Just as that work bridged findings in the appetitive domain, we aim to integrate our findings in the aversive domain. We will provide a more integrated interpretation in the discussion section of the revised manuscript.

      Dayan, P., Niv, Y., Seymour, B., and Daw, N. D. (2006). The misbehavior of value and the discipline of the will. Neural networks, 19(8):1153–1160.

      Secondly, the authors do not show "a computational advantage to maintaining a specific fear memory during exploratory decision-making" (as they claim in the abstract). Making such a claim would require showing an advantage in the first place. For the first setup, the simulation results will likely be replicated by a simple Q-learning model when scaling up the loss incurred for punishments, in which case the more complex model architecture would not confer an advantage. The second setup, in contrast, is so excessively artificial that even if a particular model conferred an advantage here, this is highly unlikely to translate into any real-world advantage for a biological agent. The experimental setup was developed to demonstrate the existence of Pavlovian biases, but it is not designed to conclusively investigate how they come about. In a nutshell, who in their right mind would touch a stinging jellyfish 88 times in a short period of time, as the subjects do on average in this task? Furthermore, in which real-life environment does withdrawal from a jellyfish lead to a sting, as in this task? 

      Thank you for your feedback. As mentioned above, we invite the reviewer to think of Pavlovian fear systems as one way in which the brain might implement punishment sensitivity. Secondly, such a system provides a separate punishment memory that cannot be overwritten by higher rewards (see also Elfwing and Seymour, 2017, and Wang et al., 2021).

      Elfwing, S., & Seymour, B. (2017, September). Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm. In 2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob) (pp. 140-147). IEEE. 

      Wang, J., Elfwing, S., & Uchibe, E. (2021). Modular deep reinforcement learning from reward and punishment for robot navigation. Neural Networks, 135, 115-126.

      Simulation setups such as grid-worlds are common test-beds for algorithms in reinforcement learning [Sutton and Barto, 2018].

      Any experimental setup faces a choice between a constrained experiment designed to test and model a specific effect and a less constrained exploratory experiment that is more difficult to model. Here we chose the former, building upon previous foundational experiments on Pavlovian bias in humans [Guitart-Masip et al., 2012, Cavanagh et al., 2013]. The condition where withdrawal from a jellyfish leads to a sting, though less realistic, was included to balance the four cue-outcome conditions. Overall, the task was designed to isolate, to the best of our ability, the effect we wanted to test: Pavlovian fear bias in choices and reaction times. In a free operant task, it is very likely that other components not included in our model could compete for control.

      Crucially, simplistic models such as the present ones can easily solve specifically designed lab tasks with low dimensionality but they will fail in higher-dimensional settings. Biological behaviour in the face of threat is utterly complex and goes far beyond simplistic fight-flight-freeze distinctions (Evans et al., 2019). It would take a leap of faith to assume that human decision-making can be broken down into oversimplified sub-tasks of this sort (and if that were the case, this would require a meta-controller arbitrating the systems for all the sub-tasks, and this meta-controller would then struggle with the dimensionality). 

      We agree that safe behaviours, such as escapes, involve more sophisticated computations. We do not propose Pavlovian fear bias as the sole computation for safe behaviour, but rather as one of many possible contributors. Given the known existence of the Pavlovian withdrawal bias, we simply study its possible contribution. We will include in our discussion that such behaviours likely occupy different parts of the threat-imminence continuum [Mobbs et al., 2020].

      Dean Mobbs, Drew B Headley, Weilun Ding, and Peter Dayan. Space, time, and fear: survival computations along defensive circuits. Trends in cognitive sciences, 24(3):228–241, 2020.

      On the face of it, the VR task provides higher "ecological validity" than previous screen-based tasks. However, in fact, it is only the visual stimulation that differs from a standard screen-based task, whereas the action space is exactly the same. As such, the benefit of VR does not become apparent, and its full potential is foregone. 

      We thank the reviewer for their comment. We selected the action space to build on existing models [Guitart-Masip et al., 2012, Cavanagh et al., 2013] that capture Pavlovian biases, and we also wanted to minimize participant movement for EEG data collection. Unfortunately, despite restricting movement to just the arm, the EEG data were still too noisy to yield any substantial results. We will explore more free-operant paradigms in future work.

      On the issue of the difference between VR and lab-based tasks, we note the reviewer's point. Note, however, that desktop monitor-based tasks lack the sensorimotor congruency between the action and the outcome. Second, it is also arguable that the background context is important in fear conditioning, as it may help set the tone of the fear system to make aversive components easier to distinguish.

      If the authors are convinced that their model can - then data from naturalistic approach-avoidance VR tasks is publicly available, e.g. (Sporrer et al., 2023), so this should be rather easy to prove or disprove. In summary, I am doubtful that the models have any relevance for real-life human decision-making. 

      We thank the reviewer for their thoughtful input. We do not claim our model is the best fit for all naturalistic VR tasks, as they require multiple systems across the threat-imminence continuum [Mobbs et al., 2020] and are beyond the scope of the current work. However, we believe our findings on outcome-uncertainty-based arbitration of Pavlovian bias could inform future studies and may be relevant for testing differences in patients with mental disorders, as noted by reviewer #2. At a general level, most well-controlled laboratory-based tasks need to bridge a sizeable gap to applicability in real-life naturalistic behaviour, although the principle of using carefully designed tasks to isolate individual factors is well established.

      Finally, the authors seem to make much broader claims that their models can solve safety-efficiency dilemmas. However, a combination of a Pavlovian bias and an instrumental learner (study 1) via a fixed linear weighting does not seem to be "safe" in any strict sense. This will lead to the agent making decisions leading to death when the promised reward is large enough (outside perhaps a very specific region of the parameter space). Would it not be more helpful to prune the decision tree according to a fixed threshold (Huys et al., 2012)? So, in a way, the model is useful for avoiding cumulatively excessive pain but not instantaneous destruction. As such, it is not clear what real-life situation is modelled here. 

      We thank the reviewer for their comments and ideas. In our discussion lines 257-264, we discuss other works which identify similar safety-efficiency dilemmas, in different models. Here, we simply focus on the safety-efficiency trade-off arising from the interactions between Pavlovian and instrumental systems. It is important to note that the computational argument for the modular system with separate rewards and punishments explicitly protects (up to a point, of course) against large rewards leading to death because the Pavlovian fear response is not over-written by successful avoidance in recent experience. Note also that in animals, reward utility curves are typically convex. We will clarify this in the discussion section.

      We completely agree that in certain scenarios, pruning decision trees could be more effective, especially with a model-based instrumental agent. Here we utilise a model-free instrumental agent, which leads to a simpler model - which is appreciated by some readers such as reviewer #2. Future work can incorporate model-based methods.

      A final caveat regarding Study 1 is the use of a PH associability term as a surrogate for uncertainty. The authors argue that this term provides a good fit to fear-conditioned SCR but that is only true in comparison to simpler RW-type models. Literature using a broader model space suggests that a formal account of uncertainty could fit this conditioned response even better (Tzovara et al., 2018). 

      We thank the reviewer for bringing this to our attention. We will discuss Tzovara et al. (2018) in the Discussion of our revised manuscript.
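      For readers unfamiliar with the associability term referred to in the comment above, the following is a minimal sketch of a standard Pearce-Hall-style hybrid update, in which associability tracks the recent magnitude of prediction errors. It is given for illustration only: the parameter names `kappa` and `eta` and the initial values are assumptions, and this is not necessarily the exact parameterisation used in the manuscript.

```python
def pearce_hall_step(value, associability, outcome, kappa, eta):
    """One trial of a Pearce-Hall-style hybrid learning rule.

    The value update is scaled by the current associability, and the
    associability itself moves toward the absolute prediction error.
    """
    delta = outcome - value  # prediction error on this trial
    new_value = value + kappa * associability * delta
    new_associability = eta * abs(delta) + (1.0 - eta) * associability
    return new_value, new_associability


# Hypothetical run: the same outcome on every trial. As the outcome becomes
# predictable, prediction errors shrink and associability declines.
value, associability = 0.0, 1.0
for _ in range(20):
    value, associability = pearce_hall_step(
        value, associability, outcome=1.0, kappa=0.5, eta=0.3
    )
```

      In this sketch, associability stays high while outcomes are surprising and falls once they are well predicted, which is why it can serve as a simple surrogate for outcome uncertainty.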

      Reviewer #2 (Public review): 

      Summary: 

      The authors tested the efficiency of a model combining Pavlovian fear valuation and instrumental valuation. This model is amenable to many behavioral decision and learning setups - some of which have been or will be designed to test differences in patients with mental disorders (e.g., anxiety disorder, OCD, etc.). 

      Strengths: 

      (1) Simplicity of the model which can at the same time model rather complex environments. 

      (2) Introduction of a flexible omega parameter. 

      (3) Direct application to a rather advanced VR task. 

      (4) The paper is extremely well written. It was a joy to read. 

      Weaknesses: 

      Almost none! In very few cases, the explanations could be a bit better. 

      We thank reviewer #2 for their positive feedback and thoughtful recommendations. We will ensure that, in our revision, we clarify the explanations in the few instances where they may not be sufficiently detailed, as noted.

      Reviewer #3 (Public review): 

      Summary: 

      This paper aims to address the problem of exploring potentially rewarding environments that contain danger, based on the assumption that an independent Pavlovian fear learning system can help guide an agent during exploratory behaviour such that it avoids severe danger. This is important given that otherwise later gains seem to outweigh early threats, and agents may end up putting themselves in danger when it is advisable not to do so. 

      The authors develop a computational model of exploratory behaviour that accounts for both instrumental and Pavlovian influences, combining the two according to uncertainty in the rewards. The result is that Pavlovian avoidance has a greater influence when the agent is uncertain about rewards. 

      Strengths: 

      The study does a thorough job of testing this model using both simulations and data from human participants performing an avoidance task. Simulations demonstrate that the model can produce "safe" behaviour, where the agent may not necessarily achieve the highest possible reward but ensures that losses are limited. Interestingly, the model appears to describe human avoidance behaviour in a task that tests for Pavlovian avoidance influences better than a model that doesn't adapt the balance between Pavlovian and instrumental based on uncertainty. The methods are robust, and generally, there is little to criticise about the study. 

      Weaknesses: 

      The extent of the testing in human participants is fairly limited but goes far enough to demonstrate that the model can account for human behaviour in an exemplar task. There are, however, some elements of the model that are unrealistic (for example, the fact that pre-training is required to select actions with a Pavlovian bias would require the agent to explore the environment initially and encounter a vast amount of danger in order to learn how to avoid the danger later). The description of the models is also a little difficult to parse. 

      We thank reviewer #3 for their thoughtful feedback and useful recommendations, which we will take into account while revising the manuscript.

      We acknowledge the complexity of specifying Pavlovian bias in the grid world and appreciate the opportunity to elaborate on how this bias is modelled. In the human experiment, the withdrawal action is straightforwardly biased, as noted, while in the grid world, we assume a hardwired encoding of withdrawal actions for each state/grid. This innate encoding of withdrawal actions could be represented in the dPAG [Kim et al., 2013]. We implement this bias using pre-training, which we assume would be a product of evolution. Alternatively, this could be interpreted as deriving from an appropriate value initialization where the gradient over initialized values determines the action bias. Such aversive value initialization, driving avoidance of novel and threatening stimuli, has been observed in the tail of the striatum in mice, which is hypothesized to function as a Pavlovian fear/threat learning system [Menegas et al., 2018].

      Additionally, we explored the possibility of learning the action bias on the fly by tracking additional punishment Q-values instead of pre-training, which produced similar cumulative pain and step plots. While this approach is redundant, and likely not how the brain operates, it demonstrates an alternative algorithm.

      We thank the reviewer for pointing out these potentially unrealistic elements, and we will revise the manuscript to clarify and incorporate these explanations and improve the model descriptions.

      Eun Joo Kim, Omer Horovitz, Blake A Pellman, Lancy Mimi Tan, Qiuling Li, Gal Richter-Levin, and Jeansok J Kim. Dorsal periaqueductal gray-amygdala pathway conveys both innate and learned fear responses in rats. Proceedings of the National Academy of Sciences, 110(36):14795–14800, 2013.

      William Menegas, Korleki Akiti, Ryunosuke Amo, Naoshige Uchida, and Mitsuko Watabe-Uchida. Dopamine neurons projecting to the posterior striatum reinforce avoidance of threatening stimuli. Nature neuroscience, 21(10):1421–1430, 2018.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      The study significantly advances our understanding of how exosomes regulate filopodia formation. Filopodia play crucial roles in cell movement, polarization, directional sensing, and neuronal synapse formation. McAtee et al. demonstrated that exosomes, particularly those enriched with the protein THSD7A, play a pivotal role in promoting filopodia formation through Cdc42 in cancer cells and neurons. This discovery unveils a new extracellular mechanism through which cells can control their cytoskeletal dynamics and interaction with their surroundings. The study employs a combination of rescue experiments, live-cell imaging, cell culture, and proteomic analyses to thoroughly investigate the role of exosomes and THSD7A in filopodia formation in cancer cells and neurons. These findings offer valuable insights into fundamental biological processes of cell movement and communication and have potential implications for understanding cancer metastasis and neuronal development. 

      Weaknesses: 

      The conclusions of this study are in most cases supported by data, but some aspects of data analysis need to be better clarified and elaborated. Some conclusions need to be better stated and according to the data observed. 

      We appreciate the reviewer's recognition of the impact of our study.  We will address the concerns about data analysis and statement of our conclusions in our full response to reviewers.

      Reviewer #2 (Public review): 

      Summary: 

      The authors show that small EVs trigger the formation of filopodia in both cancer cells and neurons. They go on to show that two cargo proteins, endoglin, and THSD7A, are important for this process. This possibly occurs by activating the Rho-family GTPase CDC42. 

      Strengths: 

      The EV work is quite strong and convincing. The proteomics work is well executed and carefully analyzed. I was particularly impressed with the chick metastasis assay that added strong evidence of in vivo relevance. 

      Weaknesses: 

      The weakest part of the paper is the Cdc42 work at the end of the paper. It is incomplete and not terribly convincing. This part of the paper needs to be improved significantly.

      We appreciate the reviewer's recognition of the impact of our study.  Indeed, more work needs to be done to clarify the role of Cdc42 in the induction of filopodia by exosome-associated THSD7A.  We anticipate that this will be a separate manuscript, delving in-depth into how exosome-associated THSD7A interacts with recipient cells to activate Cdc42 and carrying out a variety of assays for Cdc42 activation.

      Reviewer #3 (Public review): 

      Summary: 

      The authors identify a novel relationship between exosome secretion and filopodia formation in cancer cells and neurons. They observe that multivesicular endosomes (MVE)-plasma membrane (PM) fusion is associated with filopodia formation in HT1080 cells and that MVEs are present in filopodia in primary neurons. Using overexpression and knockdown (KD) of Rab27/HRS in HT1080 cells, melanoma cells, and/or primary rat neurons, they found that decreasing exosome secretion reduces filopodia formation, while Rab27 overexpression leads to the opposite result. Furthermore, the decreased filopodia formation is rescued in the Rab27a/HRS KD melanoma cells by the addition of small extracellular vesicles (EVs) but not large EVs purified from control cells. The authors identify endoglin as a protein unique to small EVs secreted by cancer cells when compared to large EVs. KD of endoglin reduces filopodia formation and this is rescued by the addition of small EVs from control cells and not by small EVs from endoglin KD cells. Based on the role of filopodia in cancer metastasis, the authors then investigate the role of endoglin in cancer cell metastasis using a chick embryo model. They find that injection of endoglin KD HT1080 cells into chick embryos gives rise to less metastasis compared to control cells - a phenotype that is rescued by the co-injection of small EVs from control cells. Using quantitative mass spectrometry analysis, they find that thrombospondin type 1 domain containing 7a protein (THSD7A) is downregulated in small EVs from endoglin KD melanoma cells compared to those from control cells. They also report that THSD7A is more abundant in endoglin KD cell lysate compared to control HT1080 cells and less abundant in small EVs from endoglin KD cells compared to control cells, indicating a trafficking defect. 
      Indeed, using immunofluorescence microscopy, the authors observe THSD7A-mScarlet accumulation in CD63-positive structures in endoglin KD HT1080 cells, compared to control cells. Finally, the authors determine that exosome-secreted THSD7A induces filopodia formation in a Cdc42-dependent mechanism. 

      Strengths: 

      (1) While exosomes are known to play a role in cell migration and autocrine signaling, the relationship between exosome secretion and the formation of filopodia is novel. 

      (2) The authors identify an exosomal cargo protein, THSD7A, which is essential for regulating this function. 

      (3) The data presented provide strong evidence of a role for endoglin in the trafficking of THSD7A in exosomes. 

      (4) The authors associate this process with functional significance in cancer cell metastasis and neurological synapse formation, both of which involve the formation of filopodia. 

      (5) The data are presented clearly, and their interpretation appropriately explains the context and significance of the findings. 

      Weaknesses: 

      (1) A better characterization of the nature of the small EV population is missing: 

      It is unclear why the authors chose to proceed to quantitative mass spectrometry with the bands in the Coomassie from size-separated EV samples, as there are other bands present in the small EV lane but not the large EV lane. This is important to clarify because it underlies how they were able to identify THSD7A as a unique regulator of exosome-mediated filopodia formation. Is there a reason why the total sample fractions were not compared? This would provide valuable information on the nature of the small and large EV populations. 

      We would like to clarify that there are two sets of proteomics data in the manuscript. The first was comparing bands from a Coomassie gel from two samples: small EVs and large EVs from B16F1 cells. In this proteomics experiment, we identified endoglin as present in small EVs, but not large EVs. For this experiment, we only sent 4 bands from the small EV lane, chosen based on their obvious banding pattern difference on the Coomassie gel.

      In the second proteomics experiment, we used quantitative iTRAQ proteomics to compare small EVs purified from B16F1 control (shScr) and endoglin KD (shEng1 and shEng2) cell lines. In this experiment, we sent total protein extracted from small EV samples for analysis. So, these samples included the entire EV content, not just selected bands from a gel. In this experiment, we identified THSD7A as reduced in the shEng small EVs.

      (2) Data analysis and quantification should be performed with increased rigor: 

      a) Figure 1C - The optical and temporal resolution are insufficient to conclusively characterize the association between exosome secretion and filopodia. Specifically, the 10-second interval used in the image acquisitions is too close to the reported 20-second median time between exosome secretion and filopodia formation. Two-5 sec intervals should be used to validate this. It would also be important to correlate the percentage of filopodia events that co-occur with exosome secretion. Is this a phenomenon that occurs with most or only a small number of filopodia? Additionally, resolution with typical confocal microscopy is subpar for these analyses. TIRF microscopy would offer increased resolution to parse out secretion events. As the TIRF objective is listed in the Methods section, figure legends should mention which images were acquired using TIRF microscopy. 

      We acknowledge that the frame rate naturally limits our estimates of the timing of filopodia formation after exosome secretion. We set out to show a relationship between exosome secretion and filopodia formation, based on their proximity in timing. While our data set shows a median time interval of 20 seconds, the true median could be between 10 and 30 seconds, based on our frame rate. Regardless of the exact timing, our data show that exosome secretion is rapidly followed by filopodia formation events.
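      The range quoted above follows from simple interval arithmetic, sketched below under the assumption that each event is only known to have occurred at some point within the frame interval preceding the frame in which it was first seen. The function name is hypothetical and introduced here only for illustration.

```python
def latency_bounds(observed_latency_s, frame_interval_s):
    """Bounds on the true delay between two events seen in discrete frames.

    Each event may have occurred up to one frame interval before the frame in
    which it was first detected, so the true delay can differ from the
    observed delay by up to one frame interval in either direction.
    """
    lower = max(0.0, observed_latency_s - frame_interval_s)
    upper = observed_latency_s + frame_interval_s
    return lower, upper


# Observed median of 20 s with one frame every 10 s.
bounds = latency_bounds(20.0, 10.0)
```

      With these inputs the bounds are 10 and 30 seconds, which is why a shorter acquisition interval would narrow the estimate.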

      To address the question of the percentage of filopodia events that are preceded by exosome secretion, the reviewer is correct in stating that we might need TIRF microscopy to get an accurate calculation of this number. Nonetheless, we will review our live imaging data for this experiment to determine if this calculation is possible. Again, we will be limited by the frame rate we used to capture the images, so we could be missing secretion events taking place between the 10-second time intervals. Regardless, for the secretion events that we visualized, we always observed subsequent filopodia formation.

      No TIRF imaging was used in this manuscript.  A TIRF objective was used for selected neuron imaging (see methods); however, it was used for spinning disk confocal microscopy, not for TIRF imaging.  We will clarify this in the methods.

      b) Figure 2 - It would be important to perform further analysis to concretely determine the relationship between exosome secretion and filopodia stability. Are secretion events correlated with the stability of filopodia? Is there a positive feedback loop that causes further filopodia stability and length with increased secretion? Furthermore, is there an association between the proximity of secretion with stability? Quantification of filopodia more objectively (# of filopodia/cell) would be helpful. 

      Our data show that manipulation of general exosome secretion, via Hrs knockdown, affects both de novo filopodia formation and filopodia stability (Fig 2g,h). Interestingly, knockdown of endoglin only affects de novo filopodia formation, while filopodia stability is unaffected (Fig 4g,h). These results suggest that filopodia stability is dependent upon exosome cargoes besides endoglin/THSD7A. Such cargoes might include other extracellular matrix molecules, such as fibronectin. We previously showed that exosomes promote nascent cell adhesion and rapid cell migration, through exosome-bound fibronectin (Sung et al., Nature Communications, 6:7164, 2015). We also previously found that inhibition of exosome secretion affects the persistence of invadopodia, which are filopodia-dependent structures (Hoshino et al., Cell Reports, 5:1159-1168, 2013). We agree that this is an interesting research direction, and perhaps future work could focus on exosomal factors that are responsible for filopodia persistence.

      With regard to the way we plotted the filopodia data, we plotted the cancer cell data as filopodia per cell area so that it matched the neuron data, which was plotted as filopodia per 100 μm of dendrite distance. Since the neurons cannot be imaged as a whole cell, the quantification is based on the length of the dendrite in the image. We found that graphing the cancer cell data as filopodia per cell gave similar results to filopodia per cell area, as there were no significant differences in cell area between conditions and experiments. We plan to include a new supplementary figure showing the data in Figure 2 plotted as filopodia per cell to show that this quantification gives the same results.

      c) Figure 6 - Why use different gel conditions to detect THSD7A in small EVs from B16F1 cells vs HT1080 and neurons? Why are there two bands for THSD7A in panels C and E? It is difficult to appreciate the KD efficiency in E. The absence of a signal for THSD7A in the HT1080 shEng small EVs that show a signal for endoglin is surprising. The authors should provide rigorous quantification of the westerns from several independent experimental repeats. 

      Detection of THSD7A via Western blot was, unfortunately, not straightforward. Due to the large size (~260 kDa) of THSD7A, its low level of expression in cancer cells, as well as the inconsistency of commercially available THSD7A antibodies, we had to troubleshoot multiple conditions. We found that it was much easier to detect THSD7A in the human fibrosarcoma cell line HT1080 than in the mouse B16F1 cells, both in the cell lysates and in the small EVs. We were usually unable to detect THSD7A using these same conditions for the mouse melanoma B16F1 samples, but were successful using native gel conditions. We also detected THSD7A in rat primary neuron samples. All these samples were from different source organisms (human, mouse, rat) and from either cell lysates or extracellular vesicles, further complicating the analyses. Expression and maturation of THSD7A in these different cell types and compartments could involve different post-translational modifications, such as glycosylation, thus requiring different methods to detect THSD7A on Western blots and leading to different banding patterns. Based on our THSD7A trafficking data, we believe that in control cells, most of the THSD7A is trafficked and secreted via small EVs. As you can see in Figure 7A, the band for THSD7A in the shScr cell lysate is relatively light and also shows a double band similar to Figure 6E (both HT1080 samples).

      With regard to the level of knockdown of THSD7A in the Western blot shown in Figure 6E, the normalized level is quantitated below the bands.  If you compare that quantitation to the filopodia phenotypes in the same panel, they are quite concordant.  Figures 7B and 7C show quantification of triplicate Western blots, highlighting the significant accumulation of THSD7A in shEng cell lysates, as well as significant small EV secretion of THSD7A in control and WT rescued conditions.

      (3) The study lacks data on the cellular distribution of endoglin and THSD7A: 

      a) Figure 6 - Is THSD7A expected to be present in the nucleus as shown in panel D (label D is missing in the Figure)? It is not clear if this is observed in neurons. A Western of endogenous THSD7A on cell fractions would clarify this. The authors should further characterize the cellular distribution of THSD7A in both cell types. Similarly, the cellular distribution of endoglin in the cancer cells should be provided. This would help validate the proposed model in Figure 8.

      The image in figure 6D shows an HT1080 cell stained with phalloidin-Alexa Fluor 488 to visualize F-actin with or without expression of THSD7A-mScarlet.  In order to fully visualize the thin filopodia protrusions, the cellular plane of focus of the images for this panel was purposely taken at the bottom of the cell, where the cell is attached to the coverslip glass. Thus, we interpret the red signal across the cell body as THSD7A-mScarlet expression on the plasma membrane underneath the cell, not in the nucleus. The neuron images only include the dendrite portion of the neurons; therefore, there is no nucleus present in the neuronal images.

      b) Figure 7 - Although the western blot provides convincing evidence for the role of endoglin in THSD7A trafficking, the microscopy data lack resolution as well as key analyses. While differences between shSCR and shEng cells are clear visually, the insets appear to be zoomed digitally which decreases resolution and interferes with interpretation. It would be crucial to show the colocalization of endoglin and THSD7A within CD63-positive MVE structures. What are the structures in Figure 7E shSCR zoom1? It would be important to rule out that these are migrasomes using TSPAN4 staining. More information on how the analysis was conducted is needed (i.e. how extracellular areas were chosen and whether the images are representative of the larger population). A widefield image of shSCR and shEng cells and DAPI or HOECHST staining in the higher magnification images should be provided. Additionally, the authors should quantify the colocalization of external CD63 and mScarlet signals from many independently acquired images (as they did for the internal signals in panel F). Is there no external THSD7A signal in the shEng cells?

      The images for Figure 7E were taken with high resolution on a confocal microscope.  Insets for Figure 7E were zoomed in so that readers could see the tiny structures.  Zoom 1 in Figure 7E shows areas of extracellular deposition. In these areas, we can see small punctate depositions that are positive for CD63 and/or THSD7A-mScarlet. Our interpretation of this staining is that the cells are secreting heterogeneous small EVs that are then attached to the glass coverslip. The images and zooms in Fig 7E were chosen to be representative and indeed reveal that there is more extracellular deposition of THSD7A-mScarlet outside the control shScr cells compared to the shEng cells, consistent with more export of THSD7A into small EVs from shScr cells when compared to those of shEng cells (Fig 7A,B). However, we did not quantify this difference, as these experiments were conducted with transient transfection of THSD7A-mScarlet and it is challenging to determine which cell the extracellular THSD7A-mScarlet came from, complicating any quantitative analysis on a per-cell basis.  Quantification of internal THSD7A localization is much more straightforward in this experimental regime.  Indeed, in Figure 7F we assessed internal colocalization of THSD7A-mScarlet and CD63, which we obtained by choosing only cells that were visually positive for THSD7A-mScarlet in each transient transfection and omitting all extracellular signals. Quantifying the extracellular colocalization of THSD7A and CD63 could certainly be a future direction for this project and would require establishing cells that stably express THSD7A-mScarlet.

    1. Author response:

      We thank the reviewers for their thoughtful feedback and valuable comments. We plan to fully address their concerns by including the following experiments and analyses:

      Reviewer 1 suggested exploring data scaling trends for encoding models, as successful scaling would justify larger datasets for language ECoG studies. To estimate scaling effects, we will develop encoding models on subsets of our data.

      Reviewer 2 expressed uncertainty about the baseline for model-brain correlation and recommended adding control LLMs with randomly initialized weights. In response, we will generate embeddings using untrained LLMs to establish a more robust baseline for encoding results.

      Reviewer 2 also proposed incorporating control regressors such as word frequency and phonetic features of speech. We will re-run our modeling analysis using control regressors for word frequency, 8 syntactic features (e.g., part of speech, dependency, prefix/suffix), and 3 phonetic features (e.g., phonemes, place/manner of articulation) to assess how much these features contribute to encoding performance.

      Reviewer 3 raised concerns that the “plateau in maximal encoding performance” was actually a decline for the largest models. We will add significance tests in Figure 2B to clarify this issue.

      Reviewer 3 also noted that in Supplementary Figure 1A, the decline in encoding performance was more pronounced when using PCA to reduce embedding dimensionality, in contrast to the trend observed when using ridge regression. To address this, we will attempt to replicate the observed scaling trends in Figure 2B using PCA combined with OLS.

      Additionally, we will provide a point-by-point response and revise the manuscript with updated analyses and figures in the near future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Colomb et al have further explored the mechanisms of action of a family of three immunomodulatory proteins produced by the murine gastrointestinal nematode parasite Heligmosomoides polygyrus bakeri. The family of HpARI proteins binds to the alarmin interleukin 33 and depending on family members, exhibits differential activities, either suppressive or enhancing. The present work extends previous studies by this group showing the binding of DNA by members of this family through a complement control protein (CCP1) domain. Moreover, they identify two members of the family that bind via this domain in a non-specific manner to the extracellular matrix molecule heparan sulphate through a basic charged patch in CCP1. The authors thus propose that binding to DNA or heparan sulphate extends the suppressive action of these two parasite molecules, whereas the third family member does not bind and consequently has a shorter half-life and may function via diffusion.

      Strengths: 

      A strength of the work is the multifaceted approach to examining and testing their hypotheses, using a well-established and well-defined family of immunomodulatory molecules using multiple approaches including an in vivo setting. 

      Weaknesses: 

      There are a few weaknesses of the approach. Perhaps some discussion and speculation as to how these three family members might operate in concert during Heligmosomoides polygyrus bakeri infection would help place the biology of these molecules in context for the reader, e.g. when and where they are produced. 

      We agree that the roles of these proteins during infection require further study and are not fully elucidated here. We have added further discussion to the manuscript on their potential roles during infection (track changes manuscript, lines 277 – 283).

      Reviewer #2 (Public Review): 

      Summary: 

      Colomb et al. investigated here the heparin-binding activity of the HpARI family proteins from H. polygyrus. HpARIs bind to IL-33, a pleiotropic cytokine, and modulate its activities. HpARI1/2 has suppressive functions, while HpARI3 can enhance the interaction between IL-33 and its receptor. This study builds upon their previous observation that HpARI2 binds DNA via its CCP1 domain. Here, the authors tested the CCP1 domain of HpARIs in binding heparan sulfate, an important component of the extracellular matrix, and found that 1/2 bind heparan, but 3 cannot, which is related to their half-lives in vivo. 

      Strengths: 

      The authors use a comprehensive multidisciplinary approach to assess the binding and their effects in vivo, coupled with molecular modeling. 

      Weaknesses: 

      (1) Figure 1C should include Western. 

      We apologise for this oversight, and now include an uncropped western blot image as a Figure 1, Figure Supplement 1.

      (2) Figure 1E: Why does HpARI1 stop binding DNA at 50%? 

      It is currently unclear why HpARI1 does not bind to all DNA in the EMSA assay; however, this was our repeated finding. With our revised findings we can now state definitively that HpARI1 has a lower affinity for HS compared to HpARI2, and in each of our assays (EMSA (Fig 1D-E), size exclusion chromatography (Fig 4A), HS-bead pull-down (Fig 4B), lung cell surface binding (Fig 4C) and ITC (Fig 4D)) HpARI1 always shows a weaker response compared to HpARI2. We hypothesise that HpARI1 binds more weakly to DNA/HS to allow it to diffuse further from the site of deposition, but we have yet to demonstrate this during infection. We add further discussion of this point (track changes manuscript, lines 262 – 266).

      (3) ITC binding experiment with HpARI1? Also, the ITC results from HpARI2 do not seem to saturate, thus it is difficult to really determine the affinity. 

      We have now included HpARI1-HS ITC, and re-ran the HpARI2 experiment to saturation (Fig 4D-E).

      (4) It would be helpful to add docking results from HpARI1. 

      We have now included HpARI1-HS docking, in Figure 5B.

      (5) Some conclusions are speculative and need to remain in the Discussion. e.g.: a) That HpARI3 may be able to diffuse farther 

      We have rewritten these points to remove the speculation on localisation from the abstract (lines 18-19) and introduction (line 78).

      b) That DNA/HS may trap HpARI1/2 at the infection site. 

      Likewise, these points have been rewritten in the abstract and introduction as above, and we have made it clearer that this is a model that we are proposing in the discussion (lines 277-283).

      Reviewer #1 (Recommendations For The Authors): 

      The paper is well-written and the data well-presented. I have one small comment that the authors may like to consider. In the discussion, second paragraph, line 17, perhaps, "evolved" rather than "developed". 

      Thank you for this suggestion; we have made this change (line 248).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      To help support the authors' conclusions more strongly, I am including a series of concerns regarding the manuscript, as well as some suggestions that could be useful to address these issues:

      (1) The main results of this study derive from the use of auxin-inducible degron (AID)-tagged proteins. Despite the great advantages of the AID strategy to conditionally deplete proteins, the AID tag can affect the normal function of a protein. In fact, some of the AID-labeled DDC components generated in this work are shown to be hypomorphic. Hence, the manuscript would have benefited from the additional confirmation of some of the observations using a different way to eliminate the proteins (e.g., temperature-sensitive mutants).

      Most ts mutants are also hypomorphic; hence, we do not see much advantage in their use. The addition of the AID to these proteins alone does not interfere with the ability to sustain checkpoint arrest, as demonstrated in Figure S1. Instead, we found that by overexpressing Rad9-AID we could demonstrate that inactivating Rad9 after 15 h had the same effect as inactivating Ddc2, significantly strengthening our finding that the DDC checkpoint becomes dispensable while the SAC takes over.

      (2) In cells depleted of Rad53-AID, the deletion of CHK1 stimulates an earlier release from a mitotic arrest induced by two DSBs (Figures 2D and 3C). Likewise, the authors claim that a faster escape from the cell cycle block can also be observed when upstream factors such as Ddc2, Rad9, or Rad24 are depleted in the absence of CHK1 (Figures 2A-C and Figures 3D-F). However, this earlier release from the cell cycle arrest, if at all, is only slightly noticeable in a Rad9-AID background (Figures 2B and 3E). In this sense, it is also worth pointing out that Rad9-AID chk1Δ (Figure 3E) and Rad24-AID chk1Δ (Figure 3F) cells were only evaluated up to 7 h, while in all other instances, cells were followed for 9 h, which hinders a fair assessment of the differences in the release from the cell cycle arrest.

      As noted above, we have now been able to examine Rad9 over the long-time frame.

      (3) Although only 25% of the cells depleted for Dun1 remained in G2/M arrest 7 h following the induction of two DSBs, it is shocking that Rad53 was nonetheless still phosphorylated after the cells had escaped the cell cycle blockage (Figure 4A).

      This persistence of Rad53 phosphorylation is also seen with the inactivation of Mad2, allowing escape in spite of continued Rad53 phosphorylation.

      (4) Generation of Rad9-AID2 and Rad24-AID2 strains did not fully restore the function of these proteins, since most cells had adapted 24 h after induction of two DSBs (Figure S1C). Nonetheless, Rad9-AID2 and Rad24-AID2 are still likely more stable than their AID counterparts, and hence the authors could have instead used the AID2 proteins for the experiments in Figure 2 to better evaluate the role of Rad9 and Rad24 in the maintenance of the DDC-dependent arrest.

      We note again that we have found a way to study Rad9 up to 24 h. 

      (5) Deletion of BFA1 has been shown to promote the escape from a cell cycle arrest triggered by telomere uncapping (Wang et al. 2000, Hu et al. 2001, Valerio-Santiago et al. 2013). Likewise, while cells carrying the cdc5-T238A allele cannot adapt to a checkpoint arrest induced by one irreparable DSB, BFA1 deletion rescues the adaptation defect of this mutant CDC5 allele (Rawal et al., 2016). The authors show how, using AID-degrons of Bfa1 and Bub2, that only Bub2, but not Bfa1, is required to maintain a prolonged cell cycle arrest after the induction of two DSBs. To reinforce this point, and as shown for mad2Δ cells (Figure S6A), the authors could perform a complete time course using both the Bfa1-AID and a bfa1Δ mutant to demonstrate that they do indeed show the same behavior in terms of the adaptation to a two DSB-induced cell cycle arrest.

      We thank the reviewer for noting these other instances where bfa1Δ promoted an escape from arrest. We tested a 2-DSB bfa1 deletion strain; the data have been added to Figure S9E-F. We did not observe a difference in the percentage of cells escaping arrest between the 2-DSB bfa1 deletion and the 2-DSB BFA1-AID strains.

      (6) Bypass or adaptation of a checkpoint-induced cell cycle arrest in S. cerevisiae often leads to cells entering a new cell cycle without doing cytokinesis and, hence, to the accumulation of rebudded cells. However, the experiments shown in the manuscript only account for G1 or budded cells with either one or two nuclei. Do any of the mutants show cytokinesis problems and subsequent rebudding of the cells? If so, this should have been also noted and quantified in the corresponding assays.

      In the cases we have studied we have not seen instances where the cells re-bud without completing mitosis (at least as assessed by the formation of budded cells with two distinct DAPI staining masses). In the morphological assays we have done, we score the continuation of the cell cycle by the appearance of multiple buds, G1, and small budded cells. In our adaptation assays when cells escaped G2/M arrest they formed microcolonies indicating no short-term deficiency in cell division.

      (7) The location of the DSB relative to the centromere of a chromosome seems to be a factor that determines the capacity of the SAC to sustain a prolonged cell cycle arrest. The authors discuss the possibility that the DSB could somehow affect the structure of the kinetochore. Did they evaluate whether Mad1 or Mad2 were more actively recruited to kinetochores in those strains that more strongly trigger the SAC after induction of the DSBs?

      We have not attempted to follow Mad1/2 recruitment. ChIP-seq could be used to monitor Mad1/2 localization at the 16 centromeres in response to DSBs and the spread of γ-H2AX across the centromere. Our previous data showed that γ-H2AX could spread across the centromere region and could create a change that would be detected by Mad1/2. This change does not, however, affect the mitotic behavior of a strain in which the H2A genes have been modified to the possibly phosphomimetic H2A-S129E allele.

      (8) The authors could speculate in the discussion about the reasons that could explain why the DDC is required for the maintenance of checkpoint arrest at early stages but then becomes dispensable for the preservation of a prolonged DSB-induced cell cycle arrest, which is instead sustained at later stages by the SAC.

      Our suggestion is that cells would have adapted, but modification of the centromere region engages SAC.

      Finally, some minor issues are:

      (1) The lines in the graphs that display the results from adaptation assays (e.g., Figures 1B and 1E) or cell and nuclear morphology (e.g., Figures 1D and 1G) are too thick. This makes it sometimes difficult to distinguish the actual percentages of cells in each category, particularly in the experiments monitoring nuclear division.

      Fixed.

      (2) While both the adaptation assay and the analysis of nuclear division in Figures 1E and 1G, respectively, show a complete DDC-dependent arrest at 4h, the Western blot in Figure 1F suggests that Rad53 is not phosphorylated at that time point. Do these figures represent independent experiments? Ideally, the analysis of cell budding and nuclear division, which is performed in liquid cultures, and the Western blot displaying Rad53 phosphorylation should correspond to the same experiment.

      Cell budding in liquid cultures and adaptation assays were performed in triplicate with 3 biological replicates and the collective results are shown in each graph showing the percentage of large-budded cells. Western blot samples were collected in each liquid culture experiment. The western blot in 1G is a representative western blot.

      (3) It is somewhat confusing that the blots for the proteins are not displayed in the same order in Figures 2A (Rad53 at the top) and 2B or 2C (Rad53 in the middle).

      Fixed. We place Rad53, the relevant protein, at the top.

      Reviewer #2 (Recommendations For The Authors):

      (1) Yeast with the two breaks responds to DNA damage checkpoint (DDC) until sometimes 4-15 h post DNA damage. Since the auxin-induced degradation does not completely deplete all the tagged proteins in cells, the results should be more carefully considered and not to interpret if the checkpoint entry or maintenance depends on each target protein's ability to induce Rad53 phosphorylation. It should be theoretically possible if checkpoint maintenance requires only a modest amount of checkpoint factors especially because the experiments involve the induction of one or two DSBs. The low levels of DDC factors may be insufficient for Rad53 activation but could still be effective for cell cycle arrest. Indeed, the Haber group showed that the mating type switch did not induce Rad53 phosphorylation but still invoked detectable DNA damage response. To test such possibilities, the authors might consider employing yet another marker for DDC such as H2A or Chk1 phosphorylation besides Rad53 autophosphorylation. Alternatively, the authors might check if auxin-induced depletion also disrupts break-induced foci formation for checkpoint maintenance or their enrichment at DNA breaks using ChIP assays at various points post-damage.

      DAPI staining of Ddc2-AID cells shows that when IAA is added 4 h after DSB induction (Figure S3A), cells escape G2/M arrest, as evidenced by the increase in large-budded cells with 2 DAPI signals, small budded cells, and G1 cells. Overexpression of Ddc2 can sustain the checkpoint past 24 h, but without SAC proteins like Mad2 they will eventually adapt (Figure S6B).

      That Rad9-AID or Rad24-AID in the absence of added auxin (but in the presence of TIR1) is unable to sustain arrest suggests to us that low levels of Rad9 or Rad24 are not sufficient to maintain arrest.  As the reviewer notes, normal MAT switching doesn’t cause Rad53 phosphorylation or arrest, though early damage-induced events such as H2A phosphorylation do occur.  But our point is that Rad9 or Ddc2 is needed to maintain arrest only up to a certain point, after which they become superfluous and a different checkpoint arrest is imposed. At that point apparently a low level of these proteins plays no obvious role.

      (2) It is interesting that DDC no longer responds to the damage signaling after 15 h of DSB-induced prolonged checkpoint arrest after two DNA double-strand breaks. Is this also applicable to other adaptation mutants? The results might improve the broad impact of the current conclusions. It is also possible that the transition from DDC to SPC depends on simply the changes in signaling or in part due to the molecular changes in the status of DNA breaks or its flanking regions. Indeed, the proposed model suggests that the spreading of H2A phosphorylation to centromeric regions induces SAC and thus mitotic arrest. The authors could measure H2A phosphorylation near the centromere using ChIP assays at various intervals post-DNA damage. It is particularly interesting if depletion of Ddc2 at 15 h post DNA damage does not alter the level of H2A phosphorylation at or near centromere.

      Our previous data have suggested that the involvement of the SAC in prolonging DSB-induced arrest involved post-translational modification of centromeric chromatin such as the Mec1- and Tel1-dependent phosphorylation of the histone H2A (Dotiwala). In budding yeast there is also a similar DSB-induced modification of histone H2B (Lee et al.). To ask if there is an intrinsic activation of the SAC if the regions around centromeres were modified by checkpoint kinase phosphorylation, we examined cell cycle progression in strains in which histone H2A or histone H2B was mutated to their putative phosphomimetic forms (H2A-S129E and H2B-T129E).  As shown in Figure S11, there was no effect on the growth rate of these strains, or of the double mutant, suggesting that cells did not experience a delay in entering mitosis because of these modifications. We note that although histone H2A-S129E is recognized by an antibody specific for the phosphorylation of histone H2A-S129, the mutation to S129E may not be fully phosphomimetic. 

      (3) It is puzzling why Rad9-AID or Rad24-AID are proficient for DDC establishment but cannot sustain permanent arrest in the two break cells. It appears Rad53 phosphorylation for DDC is weaker in cells expressing Rad9-AID or Rad24-AID according to Fig.2B and C even though their protein level before IAA treatment is still robust. This might also explain why the results of depleting Rad53 and Rad9 are very different. It also raises concern if the effect of Rad24 depletion on checkpoint maintenance is in part due to the weaker checkpoint establishment. It might be necessary to use the AID2 system to redo Rad24 depletion to exclude such a possibility.

      We believe that the AID mutants are very sensitive to the low level of IAA present in yeast.  The instability of the protein is entirely dependent on the TIR1 SCF factor, so the proteins themselves are not intrinsically defective; they are just subject to degradation.  Overexpressing Rad9 allowed us to evaluate its role at late time points. 

      (4) It is intriguing that the switch from DDC to SAC might take place at around 12 h when yeasts with a single unrepairable break ignore DDC and resume cell cycling (so-called "adaptation"). Since 4h and 15h are far apart and the transition point from DDC to SAC likely takes place between these two points, it will be very helpful to analyze and compare cell cycle exit after 24 h by treating IAA at multiple points between 4-15h.

      When we add IAA to Mad2-AID and Mad1-AID 4 h after DSB induction, cells remain arrested for up to 12 h after DSB induction. At 15 h cells begin to exit checkpoint arrest, indicating that the handoff of checkpoint arrest must occur between 12 and 15 h after DSB induction. If we degraded DNA damage checkpoint proteins at any point before Mad2, Mad1, and Bub2 begin to contribute to checkpoint arrest, arrested cells would likely adapt in a manner similar to when IAA was added 4 h after DSB induction.

      (5) Some of the Western blot quality is poor. For instance, in Figure 6C, Mad1-AID level after IAA addition is not compelling especially because the TIR level (the loading control) is also very low.

      In Figure 6C, while the relative levels of TIR1 are similar in the IAA treated and untreated samples, there is no detectable amount of Mad1-AID in the IAA treated samples, indicating that Mad1-AID was successfully degraded with the AID system.

      (6) Fig. 8 is complex. It might be helpful to define the different types of arrows in the figure. The legend also has a spelling error, Rad23 should be Rad24.

      We’ve defined what each arrow means in the legend and corrected the spelling error in the figure legend.

      Reviewer #3 (Recommendations For The Authors):

      Major concerns:

      Much of the manuscript states that two unrepairable DSBs lead to a long and severe G2/M arrest. Two main cytological approaches are used to make this statement: bud size and number on plates after micromanipulation (microcolony assay), and cell and nuclear morphology in liquid cultures. While the latter gives a clear pattern that can be assigned to a G2/M block as expected by DDC, i.e. metaphase-like mononucleated cells with large buds, the former can only tell whether cells eventually reach a second S phase (large budded cells on the plate can be in a proper G2/M arrest, but can also be in an anaphase block or even in the ensuing G1). The authors always performed the microcolony assay, but there are several cases where the much more informative budding/DAPI assay is missing. These include Dun1-AID and others, but more importantly chk1Δ and its combinations with DDC proteins. Incidentally, for the microcolony assay, it is more accurate to label the y-axis of the corresponding graphs (and in the figure legends and main text) with something like "large budded cells"; "G2/M arrested cells" is misleading.

      Figures have been updated to more accurately reflect what we are measuring.

      The results obtained with the Bfa1/Bub2 partner are intriguing. These two proteins form a complex whose canonical function is to prevent exit from mitosis until the spindle is properly aligned, acting in a distinct subpathway within the SAC that blocks MEN rather than anaphase onset. The data presented by the authors suggest that, on the one hand, both SAC subpathways work together to block the cell cycle. However, why does canonical SAC (Mad1/Mad2) inactivation not lead to a transition from G2/M (metaphase-like) arrested cells to anaphase-like arrest maintained by Bfa1-Bub2? Since Bfa1-Bub2 is a target of DDC, is it possible that DDC knockdown also inactivates this checkpoint, allowing adaptation? On the other hand, can the authors provide more data to confirm and strengthen their claim of a Bfa1-independent Bub2 role in prolonged arrest? Perhaps long-term protein localization and PTM changes. Bub2-independent roles for Bfa1 have been reported, but not vice versa, to the best of my knowledge.

      In the mitotic exit network Bfa1/Bub2 prime activation of the pathway by bringing Tem1 to spindle pole bodies. Phosphorylation of Bfa1 causes Tem1 to be released and phosphorylate Cdc5 to trigger exit by MEN. It has been shown that DNA damage, in a cdc13-1 ts mutant, phosphorylates Bfa1 in a Rad53 and Dun1 dependent manner. This phosphorylation of Bfa1 could release Tem1 and prime cells to exit checkpoint arrest when cells pass through anaphase. Looking at Tem1 localization to spindle pole bodies and interactions with Bfa1/Bub2 in response to DNA damage might give insight into why cells don’t experience an anaphase-like arrest when they are released by either deactivation of the DNA damage checkpoint or SAC.

      We have previously shown that a deletion of bub2 in a 1-DSB background shortens DSB-induced checkpoint arrest. Deletion of bfa1 in a 2-DSB background showed ~70-80% of cells stuck in a large-budded state as measured through an adaptation assay tracking the morphology of G1 cells on a YP-Gal plate and DAPI staining. Deletion or degradation of bfa1 might not release cells from arrest because Mad2/Mad1 prevent cells from transitioning into anaphase. Our DAPI data for Bub2-AID show an increase in cells with 2 DAPI signals (transition into anaphase) and in small-budded cells, indicating that degradation of Bub2 is releasing cells into anaphase and allowing cells to complete mitosis.

      Further suggestions:

      It would be richer if authors could provide more than one experimental replicate in some panels (e.g., S1A,B; S4A; and S6B).

      S1C confirms that Rad9-AID and Rad24-AID will adapt by 24 h even with the point mutant TIR1(F74G) which has lower basal degradation than TIR1. S4A has been updated with additional experimental replicates. The 48 h timepoint after DSB induction was to show the importance of Mad2 even when Ddc2 is overexpressed.

      Figure 1: Rearrange figure panels when they are first mentioned in the text. For example, it makes more sense to have the plate adaptation assay as panel B for both 1-DSB and 2-DSB strains, budding plus DAPI as panel C, and Rad53 as panel D.

      These figures have been rearranged in the order that they are mentioned in the paper.

      Figure 5: Correct Ph-5-IAA in the Rad53 WBs (it should be 5-Ph-IAA).

      This has been corrected.

      Figure S2: The straight line under the "+IAA" text box is misleading. I think it should also cover the "-2" time point, right? Also, check the figure legend. Information is missing and does not correspond to the figure layout.

      This has been corrected.

      Figure S3: Perhaps "Cell cycle profile as determined by budding and DAPI staining" is a better and more accurate legend title.

      The legend title has been updated to “Cell cycle profile as determined by budding and DAPI staining in Ddc2-AID and Rad53-AID mutants ± IAA 4 h after galactose.”

      Figure S5: Detection of both Rad53 and Ddc2 in the same blot could lead to misinterpretation as hyperphosphorylated Rad53 appears to coincide with Ddc2 migration.

      Figure S5A-B are representative western blots where Rad53 was probed to show activation of the DNA damage checkpoint by Rad53 phosphorylation. When measuring the relative abundance of Ddc2 we did not probe all blots for Rad53.

      Table S1: Include the post-hoc test used for comparisons after ANOVA.

      A Sidak post-hoc test was used in PRISM for the one-way ANOVA test. PRISM listed the Sidak post-hoc test as the recommended test to correct for multiple comparisons. A column has been added to S. Table 1 to show which post-hoc test was used.
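
      For readers unfamiliar with the Sidak correction, the adjustment itself can be sketched in a few lines. This is only an illustration of the correction step, not of how PRISM computes the individual comparison p-values; the function names and the raw p-values below are hypothetical placeholders, not values from S. Table 1.

```python
def sidak_alpha(alpha, m):
    """Per-comparison significance threshold that keeps the
    family-wise error rate at `alpha` across m comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjusted_p(p_values):
    """Sidak-adjusted p-values for a family of m = len(p_values) comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons (placeholders).
raw = [0.004, 0.020, 0.300]
adjusted = sidak_adjusted_p(raw)

for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f} -> Sidak-adjusted p = {q:.4f}")
print(f"per-comparison threshold at alpha = 0.05: {sidak_alpha(0.05, len(raw)):.5f}")
```

      An adjusted p-value below 0.05 is equivalent to the raw p-value falling below the per-comparison threshold, so either form can be reported.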

      Page 10, line 4: The putative additive effect of chk1 knockout with Dun1 depletion should also be compared to chk1 alone (in Figure 3A).

      We address the additive effect of chk1 knockout with Dun1-AID depletion in a later section on Page 11, line 6. Since we had not explored possible effects from downstream targets of Rad53 for prolonging checkpoint arrest when Rad53 was depleted, we did not mention the effect of the chk1 knockout on Dun1 depletion.

      Page 14, second paragraph, line 4: "Figure 6A-D", is it not?

      Figure S6A is measuring checkpoint arrest in a deletion of mad2 in a 2-DSB strain. Figure 6A-D shows how degradation of Mad2-AID and Mad1-AID after the handoff of arrest causes cells to exit the checkpoint in a Rad53 independent manner.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary: The authors investigated the function of Microrchidia (MORC) proteins in the human malaria parasite Plasmodium falciparum. Recognizing MORC's implication in DNA compaction and gene silencing across diverse species, the study aimed to explore the influence of PfMORC on transcriptional regulation, life cycle progression and survival of the malaria parasite. Depletion of PfMORC leads to the collapse of heterochromatin and thus to the killing of the parasite. The potential regulatory role of PfMORC in the survival of the parasite suggests that it may be central to the development of new antimalarial strategies.

      Strengths: The application of the cutting-edge CRISPR/Cas9 genome editing tool, combined with other molecular and genomic approaches, provides a robust methodology. Comprehensive ChIP-seq experiments indicate PfMORC's interaction with sub-telomeric areas and genes tied to antigenic variation, suggesting its pivotal role in stage transition. The incorporation of Hi-C studies is noteworthy, enabling the visualization of changes in chromatin conformation in response to PfMORC knockdown.

      We greatly appreciate the overall positive feedback and recognition of our efforts. Our application of CRISPR/Cas9 genome editing tools, coupled with complementary cellular and functional approaches, sheds light on the importance of _Pf_MORC in maintaining chromatin structural integrity in the parasite and highlights this protein as a promising target for novel therapeutic intervention.

      Weaknesses: Although disruption of PfMORC affects chromatin architecture and stage-specific gene expression, determining a direct cause-effect relationship requires further investigation.

      Our conclusions were made on the basis of multiple, unbiased molecular and functional genomic assays that point to the relevance of the _Pf_MORC protein in maintaining the parasite’s chromatin landscape. Although we do not claim to have precise evidence on the step-by-step pathway in which _Pf_MORC is involved, we bring forth first-hand evidence of its role in heterochromatin binding and gene regulation, and of its association with major TFs as well as chromatin remodeling and modifying enzymes. We agree, however, with the comment regarding the lack of direct effects of _Pf_MORC KD and have since provided additional evidence by performing ChIP-seq experiments against H3K9me3 and H3K9ac during KD. Our new results are presented in Fig. 5. We showed that the level of H3K9me3 decreased significantly during _Pf_MORC KD.

      Furthermore, while numerous interacting partners have been identified, their validation is critical and understanding their role in directing MORC to its targets or in influencing the chromatin compaction activities of MORC is essential for further clarification. In addition, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      Validation of the identified interacting partners is indeed critical and essential to understanding their role in directing MORC to its targets. Our protein pull down experiments have been done using several biological replicates. Several of the interacting partners have also been identified and published by other labs and collaborators. To confirm our results, we completed a direct comparison of our work with previous published work. Results have now been incorporated into the revised manuscript to confirm the identified interacting partners and the accuracy of the data we obtained in our experiment. Molecular validation of novel proteins identified in our protein pull down requires generation of tagged lines and may take a few more years but will be submitted for publication in a follow up manuscript.

      Reviewer #2 (Public Review):

      Summary: This paper, titled "Regulation of Chromatin Accessibility and Transcriptional Repression by PfMORC Protein in Plasmodium falciparum," delves into the PfMORC protein's role during the intra-erythrocytic cycle of the malaria parasite, P. falciparum. Le Roch et al. examined PfMORC's interactions with proteins, its genomic distribution in different parasite life stages (rings, trophozoites, schizonts), and the transcriptome's response to PfMORC depletion. They conducted a chromatin conformation capture on PfMORC-depleted parasites and observed significant alterations. Furthermore, they demonstrated that PfMORC depletion is lethal to the parasite.

      Strengths: This study significantly advances our understanding of PfMORC's role in establishing heterochromatin. The direct consequences of the PfMORC depletion are addressed using chromatin conformation capture.

      We appreciate the Reviewer’s comments and reflection on the importance of our work.

      Weaknesses: The study only partially addressed the direct effects of PfMORC depletion on other heterochromatin markers.

      Here again, we agree with the reviewer’s comment and have performed additional experiments to delve deeper into the multifaceted roles of _Pf_MORC. We have performed additional ChIP-sequencing analysis on _Pf_MORC depleted conditions focusing on known heterochromatin and euchromatin markers H3K9me3 and H3K9ac respectively. We hope our new results presented in figure 5 will shed light on the more direct implications of _Pf_MORC on heterochromatin and gene silencing.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • Why does MORC, which was used in the pull-down, seem to be only minimally enriched in the volcano plot, while a series of proteins (marked in red) and AP2 (highlighted in green) are enriched with log2 fold changes exceeding 15?

      We apologize for the confusion. MORC was detected with the highest number of peptides (97 and 113) and spectra (1041 and 1177), confirming the efficiency of our pull-down. However, considering the relatively large size of the MORC protein (295 kDa) and its weak detection in the control (5 and 7 peptides; 16 and 43 spectra), the Log2 FoldChange and Z-statistic after normalization are minimal compared to smaller proteins that were not identified in the control samples.
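
      The arithmetic behind this point can be illustrated with a minimal sketch. It assumes a simple pseudocount-based log2 ratio of mean spectral counts, which is not the normalization used for the volcano plot and will not reproduce the plotted values; the prey protein and its counts are hypothetical, whereas the MORC counts are those quoted above.

```python
import math


def log2_fold_change(bait_counts, control_counts, pseudocount=1.0):
    """Log2 ratio of mean spectral counts (bait vs. control).

    A pseudocount avoids division by zero for proteins that are
    absent from the control; this is an illustrative choice only.
    """
    bait_mean = sum(bait_counts) / len(bait_counts)
    control_mean = sum(control_counts) / len(control_counts)
    return math.log2((bait_mean + pseudocount) / (control_mean + pseudocount))


# Spectral counts for MORC quoted in the response above.
morc_fc = log2_fold_change([1041, 1177], [16, 43])

# Hypothetical prey: roughly ten-fold fewer spectra, none in the control.
prey_fc = log2_fold_change([100, 120], [0, 0])

print(f"MORC log2 fold change:              {morc_fc:.2f}")
print(f"hypothetical prey log2 fold change: {prey_fc:.2f}")
```

      Even with far fewer spectra, a protein that is undetected in the control receives the larger fold change, because the ratio is driven by the denominator rather than by the absolute abundance in the pull-down.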

      Additionally, can you explain why these proteins appear to be enriched at the same fold? 

      We can postulate that these proteins form a complex with a ratio of 1:1. Two of these three proteins are described to interact with MORC in several publications, supporting a strong interaction between them.

      Variations in the interactome could result from the washing buffer's stringency.

      We agree that the IP conditions could affect the detection of the interactome as well as the parasite stage used. As indicated below, the overlap with previous publications and the presence of AP2 TFs and chromatin remodelers strongly support our results.

      It would be highly appropriate for the authors, similar to the co-submitted article (Maneesh Kumar Singh et al.), to present their mass spectrometry data in relation to previous purifications in Plasmodium (Bryant et al. 2020; Subudhi et al. 2023; Hillier et al. 2019) and also in Toxoplasma (Farhat et al. 2020). It would be good if authors could also put their results into perspective in light of the following pre-prints:

      We agree with the reviewer’s comment. In this revised manuscript, we compared our IP-MS data to previously published manuscripts. Key proteins including AP2-P (PF3D7_1107800) and HDAC1 were indeed identified in several experiments, validating our initial findings of the formation of large complexes with MORC. However, it is important to highlight that the MORC protein was not used as the bait protein in previously published papers, and thus some discrepancies can be observed.

      Given the tendency of MORCs to form multiple complexes with AP2 factors, have you explored whether specific AP2s are conserved between Plasmodium and Toxoplasma, within the phylum?

      P. falciparum encodes 27 putative AP2s, while T. gondii has over 60 AP2s, making direct comparison challenging. Some Plasmodium AP2s have multiple counterparts in T. gondii, and conservation is typically limited to the AP2 binding domains. Attempts to identify sequence homology among AP2s and the regions of conservation have been performed (PMID: 30959972, PMID: 16040597). Although this information would provide interesting insight, we believe exploring this topic at this time would diverge from our primary objectives. It would be more appropriate to address this in future studies.

      Could this conservation be identified either through phylogenetic means or by using tools such as AlphaFold, especially considering not just the AP2 domains but also any existing ACDC domains?

      Although this may reveal important information regarding the association between MORC proteins and AP2 domains, we believe investigating the conservation between AP2 across apicomplexan parasites may prove too challenging and is beyond the scope of this work.

      Most of the genes are depicted without their immediate surroundings (Fig. 2d and Fig S2c, d). For instance, the promoter region of AP2g is not shown (Fig. 2d). It is therefore very challenging to determine the presence or absence of MORC upstream or downstream; considering that this factor, which can create DNA loop protrusions, might bind at a distance from the genes in question.

      All gene coverage plots, including AP2-G, show 500 bp up- and downstream of the displayed gene. We have modified our figure legends to make sure that this information is provided.

      Upon examining Figure S3, it is evident that the authors have indicated a decline in PfMORC expression, represented as percentages over two unique time frames. The methodology behind this quantification remains ambiguous. It's essential for the authors to specify whether normalization was done using a loading control. As a benchmark, Singh et al. (2021) in their Figure 4 transparently used GAPDH as a loading control and included an untreated sample in their western blot analysis.

      We thank the Reviewer for bringing this to our attention. Our initial quantification was performed using ImageJ. To address the Reviewer’s comment, we have repeated the experiment. Our quantitative analysis was performed with Bio-Rad ImageLab software using aldolase expression as a loading control (50% of the MORC loading). This information has now been incorporated into the supplementary figures (Figure S3).
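
      Normalization to a loading control of this kind can be sketched as follows. This is a generic illustration rather than the ImageLab workflow itself; the function name and all band intensities are hypothetical placeholders, not measurements from Figure S3.

```python
def relative_protein_level(target_kd, loading_kd, target_ctrl, loading_ctrl):
    """Target band intensity normalized to the loading control,
    expressed relative to the untreated (control) lane."""
    kd_ratio = target_kd / loading_kd
    ctrl_ratio = target_ctrl / loading_ctrl
    return kd_ratio / ctrl_ratio


# Hypothetical band intensities in arbitrary units (placeholders only).
level = relative_protein_level(
    target_kd=300.0,     # MORC band, knockdown lane
    loading_kd=1000.0,   # aldolase band, knockdown lane
    target_ctrl=1000.0,  # MORC band, control lane
    loading_ctrl=1000.0, # aldolase band, control lane
)
print(f"MORC level relative to control: {level:.0%}")
```

      Because the loading control is loaded at the same fraction of the MORC loading in every lane, that constant factor cancels in the final ratio.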

      There's a striking observation that, despite significant degradation of PfMORC (as depicted in Figures S1 and S3), only the upper band in the western blot diminishes. This inconsistency needs addressing, as it can raise questions about the interpretation of the results.

      We agree with the reviewer's comment. We experienced some challenges when performing a western blot on such a large protein (295 kDa). Our initial attempts required long exposures that may have highlighted non-specific signals from smaller proteins. To address the reviewer’s comment, we have performed the experiment one more time and made the necessary changes to our WB protocol. Our new result better reflects the expected downregulation of _Pf_MORC. These changes have been incorporated into our manuscript and Fig. S3.

      Recommendations for improving the writing and presentation.

      MORC KD quantification and consistency with previous findings (Figure S3): When comparing their results with those from another study (Singh et al. 2021), it's critical to ensure that the experimental conditions, especially the methodology for KD and the quantification of protein levels, are similar. If not, a direct comparison might be misleading.

      We greatly appreciate the suggestions and have made efforts to redesign the MORC KD quantifications according to the reviewer’s recommendations.

      While the manuscript mentions the level of KD, it does not delve into the functional consequences of such a decrease in protein levels. It would be of interest to understand how this level of KD affects the parasite's biology, especially in the context of the paper's main findings.

      We have addressed this question by looking at the changes in chromatin structure in WT versus KD parasites upon aTc removal. We have also validated this initial result by designing an additional ChIP-seq experiment against histone marks in WT versus KD parasites upon aTc removal. Our findings showed a significant decrease in H3K9me3 coverage in heterochromatin regions, specifically at genes associated with antigenic variation and invasion. These findings suggest that PfMORC at least partially regulates gene silencing and chromatin arrangement. The manuscript has been edited accordingly.

      Concluding page 5, the authors present an interpretation of their findings that suggests a multi-faceted role of PfMORC in regulating stage-specific gene families, particularly the gametocyte-related genes and merozoite surface proteins. While the narrative they present is intriguing, several concerns arise:

      Over-reliance on correlation: The authors draw a direct line between the levels of PfMORC binding and the function of these genes in the parasite's life cycle. However, a mere correlation between PfMORC binding and stage-specific gene activity does not necessarily imply causation. They would need to provide experimental evidence showing that manipulation of PfMORC levels directly impacts these genes' expression.

      We agree with the reviewer's comment. We have, however, partially addressed this issue by comparing our ChIP-seq, RNA-seq and Hi-C experiments. We concluded that several of the transcriptional changes observed were due to an indirect effect of PfMORC KD and were most likely induced by a cell cycle arrest and partial collapse of the chromatin structure. The collapse of the heterochromatin structure was validated using our Hi-C experiment. To address the reviewer’s additional concerns, we have included additional ChIP-seq experiments targeting histone marks to confirm our initial hypothesis. The results of these additional experiments have been incorporated into the revised version of the manuscript.

      Ambiguity surrounding "low levels" and "high levels": The terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. Without quantification or a clear benchmark, these descriptions remain vague.

      We agree with the reviewer that the terms "low levels" and "high levels" of PfMORC binding are qualitative and could be subject to interpretation. We have, however, quantified our change in DNA binding using normalized reads (RPKM). In trophozoite and schizont stages, most of the genes contain a mean of <0.5 RPKM normalized reads per nucleotide of _Pf_MORC binding within their promoter region, whereas antigenic gene families such as _var_ and _rifin_ contain ~1.5 and 0.5 normalized reads, respectively (Fig. 2b). Similar results are also obtained for the gametocyte-specific transcription factor AP2-G, which contains levels of _Pf_MORC binding similar to what is observed in _var_ genes (Fig. 2c and S2c, d).
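
      For reference, the standard RPKM definition (reads per kilobase of region per million mapped reads) can be written as a short sketch. The exact per-nucleotide normalization used for Fig. 2b is not specified here, so this shows only the textbook formula; the function name and the example read counts are hypothetical.

```python
def rpkm(reads_in_region, region_length_bp, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and library size must be positive")
    return reads_in_region * 1e9 / (region_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads over a 1 kb promoter region
# in a library of 10 million mapped reads (placeholder numbers).
example = rpkm(reads_in_region=500, region_length_bp=1000, total_mapped_reads=10_000_000)
print(f"RPKM = {example:.1f}")
```

      Normalizing by both region length and library size is what allows binding levels to be compared across genes of different sizes and across samples sequenced to different depths.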

      Shift in Binding Sites: The observed minor switch in PfMORC binding sites from gene bodies to intergenic and promoter regions is mentioned, but without context on how these shifts impact gene expression or any comparative analysis with other proteins showing similar shifts. The claim that this shift implicates PfMORC as an "insulator" is a leap without direct evidence.

      We apologize for the confusion. We have compared our ChIP-seq with RNA-seq results at different time points of the cell cycle and demonstrated that the shift observed has an effect on gene expression. We have edited the manuscript to clarify these results.

      Overextension of PfMORC's Role: The authors suggest that PfMORC moves to the regulatory regions around the TSS to guide RNA Polymerase and transcription factors. This is a substantial claim and would require additional experiments to validate. Simply observing binding in a region is insufficient to assign a specific functional role, especially one as critical as guiding RNA Polymerase. Historically, the MORC family has been primarily linked with gene silencing across Apicomplexan, plants, and metazoans. On page 7, the authors noted a minimal overlap between the ChIP-seq and RNA-seq signals (Fig. 4e). They also acknowledged that the pronounced gene expression shifts at schizont stages result from a combination of direct and indirect impacts of PfMORC degradation, which could cause cell cycle arrest and potential heterochromatin disintegration, rather than just decreased PfMORC binding. Therefore, the authors should adjust their conclusions in the manuscript to more accurately represent the multifaceted functions of MORC in the parasite.

      We agree with the reviewer's comment and have edited the manuscript accordingly.  

      DISCUSSION:

      The authors concluded that "Using a combination of ChIP-seq, protein knock down, RNA-seq and Hi-C experiments, we have demonstrated that the MORC protein is essential for the tight regulation of gene expression through chromatin compaction, preventing access to gene promoters from TFs and the general transcriptional machinery in a stage specific manner."

      Again, the assertion that MORC protein is essential for tight regulation of gene expression, based purely on correlational data (e.g., ChIP-seq showing binding doesn't prove functionality), assumes causality which might not be fully substantiated. The phrase "preventing access to gene promoters from TFs and the general transcriptional machinery in a stage-specific manner" needs also validation. Asserting that MORC is essential for this function might oversimplify the process and overlook other critical contributors.

      We agree with the reviewer’s comments and the conclusion has since been edited accordingly.

      The discussion is quite poor. It would be pertinent to put MORC in perspective within the broader picture of regulatory mechanisms of chromatin state at telomeres and var genes. For instance, how do SIR2 and HDAC1 (associated with MORC) divide the task of deacetylation? Or the contribution of HP1 and other non-coding RNAs.

      We agree with the reviewer’s suggestion. However, in order to put MORC in perspective within a broader picture, we would need to measure changes in localization of several molecular components regulating heterochromatin in WT versus KD condition. This will require access to several molecular tools and specific antibodies that we do not currently have. We have addressed these issues in our discussion.  

      Minor corrections to the text and figures.

      Figure 1d: Could you provide the ID for each AP2 directly on the volcano plot? While some IDs are referenced in the manuscript, visual representation in the plot would facilitate a clearer understanding of their enrichment levels.

      ID for unknown AP2 proteins have been added on the volcano plot.

      I recommend presenting Figure S2b as a panel within a primary figure. This change would offer readers a more quantitative understanding of the distinct differences between developmental stages. Notably, there seems to be a limited number of genes in common when considering the total, and there is an apparent lack of enrichment in the ring stage.

      This has been done.

      The captions are very minimally detailed. An effort must be made to better describe the panels as well as which statistical tests were used. 

      We have improved the figure legends and added the number of biological replicates as well as the statistical test used in each figure legend.

      Figure 1A: The protein diagram with its domains does not take scale into account.

      The figure has been modified.

      Reviewer #2 (Recommendations For The Authors):

      (1) The study lacks a direct link between PfMORC's inferred function and the state of heterochromatin in the genome post-depletion.

      We agree with the reviewer's comment and have included additional ChIP-seq experiments to measure changes in histone marks in the PfMORC-depleted parasite line. We show a significant decrease in histone H3K9me3 marks in the PfMORC KD condition.

      Conducting ChIP-seq on well-known heterochromatin markers such as H3K9me3, HP1, or H3K36me2/3 could shed light on the consequences of PfMORC depletion on global heterochromatin and its boundaries.

      With no access to an anti-HP1 antibody with reasonable affinity, we have not been able to study the impact of MORC KD on HP1 but have successfully observed the impact on H3K9me3 marks. These results have been added to the revised manuscript (Fig. 5).

      (2) The authors should conduct a more comprehensive analysis of PfMORC's genomic localization, comparing it to ApiAP2 binding (interacting proteins) and histone modifications. This would provide valuable insights.

      We have performed a more comprehensive genome-wide analysis of MORC binding through ChIP-seq on WT and MORC-KD conditions. Our results show that _Pf_MORC localizes to heterochromatin with significant overlap with H3K9-trimethylation (H3K9me3) marks, at or near _var_ gene regions. When _Pf_MORC was downregulated, H3K9me3 was detected at a lower level, validating a possible role of _Pf_MORC in gene repression. Regarding the comparison with AP2 binding, our proteomics datasets have shown extensive MORC binding with several AP2 proteins.

      (3) RNA-seq data reveals that only a few genes are affected after 24 hours of PfMORC depletion, with an equivalent number of up-regulated and down-regulated genes. The reasons behind down-regulation resulting from a heterochromatin marker depletion are not clearly established.

      We agree with the reviewer’s comment. At this stage (24 hours), _Pf_MORC depletion is limited and the effects at the transcriptional level are quite restricted. Furthermore, the down-regulation of genes is most likely an indirect effect of a cell cycle arrest. We have edited the manuscript to address this comment.

      The relationship between this data and the partial depletion of PfMORC needs further discussion.

      We agree with the reviewer and have improved our discussion in the revised version of the manuscript.

      (4) The authors did not compare their ChIP-seq data with the genes found downregulated in the RNA-seq data. Examining the correlation between these datasets would enhance the study.

      We apologize for the confusion. We have compared ChIP-seq and RNA-seq data and identified a very limited number of overlapping genes indicating that most of the changes observed in gene expression are in fact most likely indirect due to a cell cycle arrest and a collapse of the chromatin. We have edited the manuscript to clarify this issue.

      (5) The discussion section is relatively concise and does not fully address the complexity of the data, warranting further exploration.

      We have improved the discussion section in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors previously showed in cell culture that Su(H), the transcription factor mediating Notch pathway activity, was phosphorylated on S269 and they found that a phospho-deficient Su(H) allele behaves as a moderate gain of Notch activity in flies, notably during blood cell development. Since a downregulation of Notch signaling was proposed to be important for the production of a specialized blood cell type (lamellocytes) in response to wasp parasitism, the authors hypothesized that Su(H) phosphorylation might be involved in this cellular immune response.

      Consistent with their hypothesis, the authors show that Su(H)S269A knock-in flies display a reduced response to wasp parasitism and that Su(H) is phosphorylated upon infestation. Using in vitro kinase assays and a genetic screen, they identify the PKCa family member Pkc53E as the putative kinase involved in Su(H) phosphorylation and they show that Pkc53E can bind Su(H). They further show that Pkc53E deficit or its knock-down in larval blood cells results in similar blood cell phenotypes as Su(H)S269A, including a reduced response to wasp parasitism, and their epistatic analyses indicate that Pkc53E acts upstream of Su(H).

      Strengths

      The manuscript is well presented and the experiments are sound, with a good combination of genetic and biochemical approaches and several clear phenotypes which back the main conclusions. Notably Su(H)S269A mutation or Pkc53E deficiency strongly reduces lamellocyte production and the epistatic data are convincing.

      Weaknesses

      The phenotypic analysis of larval blood cells remains rather superficial. Looking at melanized cells is a crude surrogate to quantify crystal cell numbers as it is biased toward sessile cells (with specific location) and does not bring information concerning the percentage of blood cells differentiated along this lineage.

      In Su(H)S269A knock-in or Pkc53E zygotic mutants, the increase in crystal cells in uninfected conditions and the decreased capacity to induce lamellocytes following infection could have many origins which are not investigated. For instance, premature blood cell differentiation could promote crystal cell differentiation and reduce the pool of lamellocytes progenitors. These mutations could also affect the development and function of the posterior signaling center in the lymph gland, which plays a key role in lamellocyte induction.

      Similarly, the mild decrease on resistance to wasp infestation (Fig. 2A) could reflect a constitutive reduction in blood cell numbers in Su(H)S269A larvae rather than a defective down-regulation of Notch activity.

      We fully agree with the reviewer that sessile crystal cell counts are a coarse approach to capturing hemocytes. However, they allowed the screening of numerous genotypes in the course of our kinase candidate screen. We recorded the hemocyte numbers in the various genetic backgrounds and with regard to wasp infestation. There was no significant difference between Su(H)S269A and the Su(H)gwt control, independent of infection. This is in agreement with earlier observations of unchanged plasmatocyte numbers in N or Su(H) mutants compared to the wild type (Duvic et al., 2002). We noted, however, a small drop in hemocyte numbers in Su(H)S269D and a strong one in Pkc53ED28 mutants in both conditions relative to control. Presumably, Pkc53E has a more general role in blood cell development, which we have not further analysed. The results were included in the new Figure 1_S1 and Figure 9_S1 supplements. Based on the link between hemocyte numbers and wasp resistance (e.g. McGonigle et al., 2017), we cannot exclude that the lowered resistance of Pkc53ED28 mutants to wasp attacks is partly due to reduced hemocyte numbers, although we did not see significant differences among Su(H)S269A, Pkc53ED28 and the double mutant. We have included this notion in the text.

      Lamellocytes arise in response to external challenges like parasitoid wasp infestation by trans-differentiation from larval plasmatocytes, and by maturation of lamellocyte precursors in the lymph gland, yet barely in the Su(H)S269A and Pkc53ED28 mutants.

      We find it hard to envisage, however, that a premature differentiation of plasmatocytes into crystal cells could, in our case, deplete the pool of lamellocyte progenitors in the hemolymph (is there a precedent?). Crystal cells make up about 5% of the hemocyte pool; they are increased at most 2-fold in the Su(H)S269A and Pkc53E mutants. Even if these extra crystal cells (now ~10%) had arisen by premature differentiation, there should still be enough plasmatocytes (~80%) remaining with the potential to further divide and transdifferentiate into lamellocytes.

      Indeed, we cannot exclude an effect of the Su(H)S269A mutant on the development and function of the posterior signaling center (PSC) of the lymph gland. We noted, however, a slight but significant enlargement of the PSC in the Su(H)S269A mutant, which, to our understanding, cannot explain the reduced lamellocyte numbers.

      Whereas the authors also present targeted-knock down/inhibition of Pkc53E suggesting that this enzyme is required in blood cells to control crystal cell fate (Fig. 6), it is somehow misleading to use lz-GAL4 as a driver in the lymph gland and hml-GAL4 in circulating hemocytes as these two drivers do not target the same blood cell populations/steps in the crystal cell development process.

      We fully agree with the reviewer that the two driver lines target different blood cell populations/steps in hematopoiesis. The hml-Gal4 driver is regarded as pan-hemocyte, common to both plasmatocytes and pre-crystal cells (e.g. Tattikota et al., 2020). It has been reported to drive specifically within differentiated hemocytes prior to or at the stage of crystal cell commitment (Mukherjee et al., 2011). Hence, hml-Gal4 appeared suitable to hit sessile and circulating hemocytes prior to final differentiation into crystal cells or lamellocytes, respectively.

      In the lymph gland, however, hml is expressed within the cortical zone, where it appears specific to the plasmatocyte lineage and is not present in the crystal cell precursors (Blanco-Obregon et al., 2020). In contrast, lz-Gal4 is specific to the differentiating crystal cells in both lineages, i.e. in circulating and sessile hemocytes and in the lymph gland. Hence, we chose lz-Gal4 instead of hml-Gal4 at the risk of driving markedly later in the course of crystal cell differentiation. We included the reasoning in the text. Overall, we feel that this choice does not limit our conclusions.

      In addition, the authors do not present evidence that Pkc53E function (and Su(H) phosphorylation) is required specifically in blood cells to promote lamellocyte production in response to infestation.

      We have tried to address this interesting question by several means. Firstly, we show that Pkc53E is indeed expressed in the various cell types of larval hemocytes, shown in a new Figure 8 and Figure 8_S1 supplement. I.e., there is the potential of Pkc53E to promote lamellocyte formation. Moreover, RNAi-mediated downregulation of Pkc53E within hemocytes affected crystal cell formation similar to the Pkc53ED28 mutant, in agreement with a specific requirement within blood cells (Figure 6). Finally, we show a major drop in Notch target gene transcription (NRE-GFP) in response to wasp infestation within isolated hemocytes from Su(H)gwt in contrast to Su(H)S269A larvae (see new Figure 1 G). These data show that Su(H)-mediated Notch activity must be downregulated in hemocytes prior to lamellocyte formation in agreement with our hypothesis.

      Finally, the conclusion that Pkc53E is (directly) responsible for Su(H) phosphorylation needs to be strengthened. Most importantly, the authors do not demonstrate that Pkc53E is required for Su(H) phosphorylation in vivo (i.e. that Su(H) is not phosphorylated in the absence of Pkc53E following infestation).

      We would very much like to show the respective results. Unfortunately, the low affinity of our pS269 antibody does not allow any in situ or in vivo experiments. We very much hope to obtain a more specific phosphoS269-Su(H) antibody that would allow further in situ studies and show, for example, co-localization with Pkc53E.

      In addition, the in vitro kinase assays with bacterially purified Pkc53E (in the presence of PMA or using an activated variant of Pkc53E) only reveal a weak activity on a Su(H) peptide encompassing S269 (Fig. 4).

      The reviewer correctly notes the poor activity of our purified Pkc53EEDDD kinase. This low activity also holds true for the standard peptide (PS), which in fact is even less well accepted than the Swt substrate. Indeed, the commercially available PKCα is an order of magnitude more active. Whether this reflects the poor quality of our isolated protein compared to the commercial PKCα, or whether it reflects a true biochemical property of Pkc53E, remains to be shown in the future. We noted this observation in the manuscript.

      Moreover, while the authors show a coIP between an overexpressed Pkc53E and endogenous Su(H) (Fig. 7) (in the absence of infestation), it has recently been reported that Pkc53E is a cytoplasmic protein in the eye (Shieh et al. 2023), calling for a direct assessment of Pkc53E expression and localization in larval blood cells under normal conditions and upon infestation.

      Indeed, it is interesting that a Pkc53E-GFP fusion protein is cytoplasmic in the eye. The construct reported by Shieh et al., however, i.e. the B-isoform, is preferentially expressed in photoreceptors, where it regulates the de-polymerization of the actin cytoskeleton.

      Due to the eye-specific expression, we unfortunately cannot use the Pkc53E-B-GFP construct to test for Pkc53E’s distribution in other tissues.

      As this construct is of little use for studying hematopoiesis, we have instead used Pkc53E-GFP (BL59413) derived from a protein trap: again, GFP is primarily seen in the cytoplasm of hemocytes, including lamellocytes of infected larvae. However, in a small number of hemocytes, GFP also appears to be nuclear (Fig. 8A), leaving open the possibility that activated Pkc53E may localize to the nucleus, eventually phosphorylating Su(H) and downregulating Notch activity. As Su(H) enters the nucleus piggy-back with NICD, however, phosphorylation may as well occur at the membrane or within the cytoplasm. We note, however, that these hypotheses require a much more detailed analysis.

      Furthermore, the effect of the PKCa agonist PMA on Su(H)-induced reporter gene expression in cell culture and crystal cell number in vivo is somewhat consistent with the authors' hypothesis, but some controls are missing (notably western blots to show that PMA/Staurosporine treatment does not affect Su(H)-VP16 level) and it is unclear why STAU treatment alone promotes Su(H)-VP16 activity (in their previous reports, the authors found no difference between Su(H)S269A-VP16 and Su(H)-VP16) or why PMA treatment still has a strong impact on crystal cell number in Su(H)S269A larvae.

      We have added a Western blot showing that the treatment does not affect Su(H)-VP16 expression levels (Figure 5_supplement 1). As STAU is a general kinase inhibitor, it may obviate any inhibitory phosphorylation of Su(H)-VP16 in the HeLa cells, e.g. that by Akt1, CAMK2D or S6K, which target T271, phosphorylation of which is expected to affect the DNA-binding of Su(H) as well (Figure 3_supplement 2). Moreover, in the previous report, we used different constructs with regard to the promoter, and we used RBPJ instead of Su(H), which may explain some of the discrepancies. As PMA is not specific to just Pkc53E, the altered crystal cell numbers may result from the influence on other kinases involved in blood cell homeostasis, as predicted by our genetic screen (Figure 3_supplement 1).

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors should provide a more elaborate examination of larval blood cell types and blood cell counts under normal conditions and following infestation in the different zygotic mutants as well as upon Pkc53 knock-down. A thorough examination of PSC integrity should be performed and the maintenance of core blood cell progenitors examined. The authors should also clarify when after infestation the LG and larval bleeds are analyzed.

      - a more elaborate examination of larval blood cell types:

      - examination of larval blood cell counts under normal conditions: hemocyte # in gwt, SA, SD, & Pkc

      - examination of larval blood cell counts after infestation: hemocyte # in gwt, SA, SD, & Pkc

      - thorough examination  of PSC integrity: in gwt, SA, SD, & Pkc

      - thorough examination of blood cell progenitors: in gwt, SA, SD, & Pkc

      - clarify timing

      Hemocyte numbers of the various genotypes and conditions were recorded and are presented in Figure 1_S1 and Figure 9_S1. Timing was elaborated in the text and the Methods section.

      (2) The authors should clarify why they use lz-GAL4 or hml-GAL4 and what we can infer from using these different drivers.

      See above. The reasoning was included in the text.

      (3) The percentage of hatching of Su(H)S269A and Su(H)gwt flies in the absence of infestation should also be scored; a small decrease in Su(H)S269A viability might explain the observed differences in survival to wasp infestation. Absolute blood cell numbers (in the absence of infestation) have also been correlated with survival to infection and should be checked.

      Percentage of the emerging flies and hemocyte numbers in the absence of infestation were recorded and included in Figure 2, Figure 1_S1, Figure 9_S1.

      (4) Whereas the impact of Su(H)S269A or Pkc53E mutation on lamellocytes production is clear, there is still a substantial reduction in crystal cell production following infestation. So I wouldn't conclude that the Su(H) larvae are "unable" to detect this immune challenge or respond to it (line 116).

      Thank you for the hint; we corrected the text.

      (5) The expression and localization of Pkc53E in larval blood cells should be investigated, for instance using the Pkc53E-GFP line recently published by Shieh et al. (or at least at the RNA level).

      Firstly, we confirmed expression of Pkc53E in hemocytes by RT-PCR (Figure 8_S1 supplement). Secondly, expression of Pkc53E-GFP was monitored in hemocytes (Figure 8). To this end, we used the protein trap (BL59413), since the one published by Shieh et al., 2023 is restricted to photoreceptors.

      (6) It would be interesting to test the anti-pS269 antibody in immunostaining (using Su(H)S269A as negative control).

      Unfortunately, the pS269 antiserum does not work in situ at all.

      (7) The authors must perform a western blot with anti-pS269 in the Pkc53E mutant to show that Su(H) is no longer phosphorylated after wasp infestation.

      The blot gives a negative result.

      (8) It is surprising that no signal is seen in the absence of infestation with anti-pS269: the fact that Su(H)S269A larvae have more crystal cells suggests that there is a constitutive level of phosphorylation of Su(H).

      We fully agree: in an ideal world, we would expect a low level of S269 phosphorylation in the wild type as well. However, given the lousy specificity of our antibody, we were happy to see phospho-Su(H) in infected larvae. We are currently working hard to get a better antibody.

      (9) The authors should check Su(H)-VP16 levels and phosphorylation status after PMA and/or staurosporine treatment. Some clarifications are also needed to explain the impact of PMA in Su(H)S269 larvae (this clearly suggests that PKC has other substrates implicated in crystal cell development).

      Su(H)-VP16 expression levels were monitored by Western blot and were not altered conspicuously (Figure 5_1 supplement). Presumably, Pkc53E is not the only kinase involved in Su(H) phosphorylation or the transduction of stress signals. Moreover, PMA may have a more general effect on larval development and hematopoiesis affecting both genotypes. We included this reasoning in the text.

      (10) Concerning the writing, the authors forgot to mention and discuss the work of Cattenoz et al. (EMBO J 2020). The presentation of the screen for kinase candidates could be streamlined and better illustrated (notably supplement table 4, which would be easier to grasp as a figure/graph). The discussion could be shortened (notably the part on T cells), and I don't really understand lines 374-376 (why is it consistent?).

      We are sorry for omitting Cattenoz et al. 2020, which we have now included. We fully agree that this paper is of utmost importance to our work. We streamlined the screen and included a new figure in addition to table 4 summarizing the results graphically (Figure 3_S1 supplement). We shortened the T cell part and removed the unclear lines.

      Reviewer #2 (Public Review):

      Summary:

      The current draft by Deichsel et al., entitled "Inhibition of Notch activity by phosphorylation of CSL in response to parasitization in Drosophila", describes the role of Pkc53E in the phosphorylation of Su(H) to downregulate its transcriptional activity and mount a successful immune response upon parasitic wasp infection. Overall, I find the study interesting and relevant; in particular, the identification of Pkc53E in the phosphorylation of Su(H) is very nice. However, I have a number of concerns with the manuscript which are central to the idea that links the phosphorylation of Su(H) via Pkc53E to its modulation of Notch activity. I list them one by one below.

      Strengths:

      I find the study interesting and relevant especially because of the following:

      (1) The identification of Pkc53E in phosphorylation of Su(H) is very interesting.

      (2) The role of this interaction in modulating Notch signaling and thereafter its requirement in mounting a strong immune response to wasp infection is also another strong highlight of this study.

      Weaknesses:

      (1) Epistatic interaction with Notch is needed: In the entire draft, the authors claim that Pkc53E's role in the phosphorylation of Su(H) is downstream of Notch activity. Given that the paper title also invokes Notch, I would suggest the authors show this in a direct epistatic interaction using a Notch condition. If loss of Notch function makes many more lamellocytes and GOF makes fewer, then would modulating Pkc53E (and Su(H)) in this condition manifest any change? In homeostasis as well, given that gain of Notch function leads to increased crystal cells, the same genetic combinations in homeostasis would be nice to see.

      While I understand that Su(H) functions downstream of Notch, it is now increasingly evident that Su(H) also functions independently of Notch. An epistatic relationship between Notch and Pkc will clarify whether this phosphorylation event of Su(H) via Pkc is part of the canonical interaction being proposed in the manuscript and not a non-canonical/Notch pathway independent role of Su(H).

      This is important, as I worry that in the current state, while the data are all discussed in light of Notch activity, any direct data to show this affirmatively is missing. In our hands we do find Notch-independent Su(H) function in immune cells; hence this is a suggestion that stems from our own personal experience.

      The role of Notch in Drosophila hematopoiesis, notably during crystal cell development in both hematopoietic compartments, is well established; likewise the role of Su(H) as integral signal transducer in this context (e.g. Duvic et al., 2002). Notch activity not only promotes crystal cell fate by upregulating target genes, but at the same time prevents adoption of the alternative plasmatocyte fate (e.g. Terriente-Felix et al., 2013). We could confirm by qRT-PCR the downregulation of Notch target gene expression in response to wasp infestation, which was discovered earlier by Small et al. (2014). This is clearly in favor of a repression of Notch activity rather than a relief of inhibition by Su(H). A ligand-independent activation of Notch signaling has been uncovered in the context of crystal cell maintenance in the lymph gland involving Sima/Hif-α, including Su(H) as transcriptional mediator (Mukherjee et al., 2011). However, we are unaware of a respective Su(H) activity independent of Notch.

      Certainly, Su(H) acts independently of Notch in terms of gene repression. Here, Su(H) forms a repressor complex together with H and co-repressors Groucho and CtBP to silence Notch target genes. Accordingly, loss of Su(H) or H may induce the upregulation of respective gene expression independent of Notch activity. This has been demonstrated, for example, during wing and heart development (Klein et al., 2000; Kölzer, Klein, 2006; Panta et al., 2020). Moreover, during axis formation of the early embryo, global repression is brought about by Su(H) and relieved by activated Notch (Koromila, Stathopolous, 2019). In all these instances, Su(H) is thought to act as a molecular switch, and the activation of Notch causes a strong expression of the respective genes. Likewise, the loss of DNA-binding resulting from the phosphorylation of Su(H) allows the upregulation of repressed Notch target genes in wing imaginal discs, e.g. dpn, as we have demonstrated before with overexpression and clonal analyses (Nagel et al. 2017; Frankenreiter et al., 2021). However, H does not contribute to crystal cell homeostasis, i.e. de-repression of Notch target genes does not appear to be a major driver in this context, calling for additional mechanisms to downregulate Notch activity. Our work provides evidence that these inhibitory mechanisms involve the phosphorylation of Su(H) by Pkc53E. Formally, we cannot exclude alternative mechanisms. Hence, we have tried to avoid the direct link between Su(H) phosphorylation and the inhibition of Notch activity throughout the text, including the title. Moreover, we have discussed the possible consequences of Su(H) lack of DNA binding, interfering either with the activation of Notch target genes or abrogating their repression.

      In addition, we have performed new experiments addressing the epistasis between Notch and Su(H) during crystal cell formation (Figure 1_supplement 1). To this end, we knocked down Notch activity in hemocytes by RNAi (hml::N-RNAi) in the Su(H)gwt and Su(H)S269A background, respectively. Indeed, Notch downregulation strongly impairs crystal cell development independent of the genetic background, as expected if Notch were epistatic to Su(H). We attribute the slightly elevated crystal cell numbers observed in the Su(H)S269A background to the increase in the embryonic precursors (see Fig. 4; Frankenreiter et al. 2021). Of note, the Notch gain of function allele Ncos479 displayed a similar increase in embryonic crystal cell precursors as well as in crystal cells within the lymph gland (Frankenreiter et al. 2021).

      (2) Temporal regulation of Notch activity in response to wasp-infection and its overlapping dynamics of Su(H) phosphorylation via Pkc is needed:

      First, I suggest the authors show how Notch activity is altered post infection in a time course. An RT-PCR profile of Notch target genes in hemocytes from infected animals at 6, 12, 24 and 48 HPI, to gauge the dynamics of Notch activity, will set the tone for when and how it is being modulated. In parallel, this response in the phospho mutant of Su(H) will be good to see and will support the requirement for phosphorylation of Su(H) to manifest a strong immune response.

      Indeed, it would be extremely nice to follow the entire process in every detail, ideally at the cellular level. The challenge, however, is quantities. The mRNA isolated from hemocytes could barely be quantified, although the subsequent Ct values were ok. We quantified NRE-GFP expression, introduced into Su(H)gwt and Su(H)S269A, as well as atilla expression. We were able to generate data for two time slots, 0-6 h and 24-30 h post infection. The data are provided in the extended Figure 1G, and show a strong drop of NRE-GFP in the infected Su(H)gwt control compared to the uninfected animals, whereas expression in Su(H)S269A plateaus at around 60%-70% of the infected Su(H)gwt control. Atilla expression jumps up in the control, but stays low in Su(H)S269A hemocytes.

      Second, the dynamics of phosphorylation in a time course experiment is missing. While the increased phosphorylation of Su(H) in response to wasp infestation shown in Fig. 2B uses whole animals, this implies a global down-regulation of Su(H)/Notch activity. The authors need to show this response specifically in immune cells. The reader is left to the assumption that this is also true in immune cells. Given that the authors have a good antibody, characterizing the same in circulating immune cells in response to infection will be needed. A time course of the phosphorylation state at 6, 12, 24 and 48 HPI, to gauge these dynamics, is needed.

      We really would love to do these experiments. Unfortunately, our pS269 antibody is rather lousy. It does not allow detection of Su(H) protein in tissue or cells, nor does it work on protein extracts in Westerns or for IP. Hence, we have no way so far to demonstrate cell or tissue specificity of Su(H) phosphorylation. So far, we were lucky to detect mCherry-tagged Su(H) proteins pulled down in rather large amounts with the highly specific nanobodies. We have tried very hard to repeat the experiment with hemolymph and lymph glands only, but we have failed so far. Hence, we have to state that our antibody is suitable neither for in vivo analyses nor for detection of phospho-Su(H) at lower levels.

      The authors suggest, this mechanism may be a quick way to down-regulate Notch, hence a side by side comparison of the dynamics of Notch down-regulation (such as by doing RT-PCR of Notch target genes following different time point post infection) alongside the levels of pS269 will strengthen the central point being proposed.

      We fully agree and hope to address these issues in the future by improving our tools.

      Last, in Fig. 7, the authors show co-immunoprecipitation of Pkc53EHA with Su(H)gwt-mCh protein from Hml-gal4 hemocytes. I understand this is in homeostasis, but since this interaction is proposed to be sensitive to infection, a Co-IP of the two in immune cells upon infection should be incorporated to strengthen their point.

      We do not fully agree with the reviewer. Although we also think that the interaction between Pkc53E and Su(H) might occur more frequently upon infection, we propose that this is a transient process occurring in several but not all hemocytes at a given time. Moreover, in the described experiment, Pkc53E-HA was expressed in hemocytes via the UAS/Gal4 system. We cannot exclude that this approach causes an overexpression. Hence, we would not expect considerable differences between unchallenged and infested animals.

      (3) In Fig 5B, the authors show the change in crystal cell numbers as a read out of PMA-induced activation of Pkc53E and subsequent inhibition of Su(H) transcriptional activity. I would suggest the authors use more direct measures of this read out. RT-PCR of Su(H) target genes in circulating immune cells will strengthen this point. Formation of crystal cells is not limited to Notch, and I am not convinced that this treatment or these conditions have no other effect on immune cells; for example, any impact on Hif expression may also lower CC numbers. Hence, the authors need to strengthen this point by showing that the effects are direct to Notch and Su(H) and not non-specific to any other pathway also shown to be important for CC development.

      We agree with the Reviewer that the rather general influence of PMA on PKCs might present a systemic stress to the animal. For example, we observed a slight drop of crystal cell numbers also in Su(H)S269A, suggesting other kinases apart from Pkc53E were affected that are involved in crystal cell homeostasis. We have included this notion in the text. To provide more conclusive evidence we also fed Staurosporine to the larvae which reversed the PMA effect. In addition, we assayed the expression of NRE-GFP in hemocytes of infected animals by qRT-PCR, and observed a strong drop in the infected versus uninfected control but less so in Su(H)S269A. The new data are provided in extended Figures 1G and 5B.

      (4) In addition to the above mentioned points, the data needs to be strengthened to further support the main conclusions of the manuscript. I would suggest the authors present the infection response with details on the timing of the immune response. Characterization of the immune responses at respective time points (as above or at least 24 and 48 HPI, as norms in the field) will be important. Also, any change in overall cell numbers, other immune cells, plasmatocytes or CC post infection is missing and is needed to present the specificity of the impact. The addition of these will present the data with more rigor in their analysis.

      Total hemocyte numbers of the various genotypes, i.e. control, Su(H)S269A, Su(H)S269D, and Pkc53ED28 were included before and after wasp infestation in supplemental Figures 1_S1 and 9_S1. 

      (5) Finally, what is the view of the authors on what leads to activation of Pkc53E, any upstream input is not presented. It will be good to see if wasp infection leads to increased Pkc53 kinase activity.

      The analysis of the full process is an ongoing project. We propose that ROS is produced upon the wasp's sting, which triggers the subsequent cascade of events. These have to end with activation of Pkc53E in the presumptive pre-lamellocyte pool of both lineages, i.e. in plasmatocytes of the hemolymph, presumably in the sessile compartment (Tattikota et al., 2021), and at the same time in the lymph gland cortex harboring the LM precursors (Blanco-Obregon et al., 2020). One of the known upstream kinases, Pdk1, has a similar impact on crystal cell development as Pkc53E, making its involvement likely. Moreover, we think that other PKCs influence the process as well.

      Without a good read out, e.g. a functional pSu(H) antiserum working in situ or a Pkc-activity reporter, it will be quite difficult to follow up this question. However, we already know that Pkc53E is expressed in hemocytes of all types independent of wasp infestation, in agreement with a role during lamellocyte differentiation. We hope to unravel more of the process in the future.

      Overall, I think the findings in the current state are interesting and fill an important gap, but the authors will need to strengthen the point with more detailed analysis that includes generating new data and also presenting the current data with more rigor in their approach. The data have to showcase the relationship with Notch pathway modulation upon phosphorylation of CSL in a much more comprehensive way, both in homeostasis and in response to infection which is entirely missing in the current draft.

      Reviewer #3 (Public Review):

      Deichsel et al. provide important and valuable insights into how Notch signalling is shut down in response to parasitic wasp infestation in order to suppress crystal cell fate and favour lamellocyte production. The study shows that the CSL transcription factor Su(H) is phosphorylated at S269 in response to parasitic wasp infestation and that this inhibitory phosphorylation is critical for shutting down Notch. The authors go on to perform a screen for kinases responsible for this phosphorylation and have identified Pkc53E as the specific kinase acting on Su(H) at S269. Using analysis of mutants, RNAi and biochemistry-based approaches, the authors convincingly show how the Pkc53E-Su(H) interaction is critical for remodelling hematopoiesis upon wasp challenge. The data presented support the overall conclusions made by the authors. There are a few points below that need to be addressed by the authors to strengthen the conclusions:

      (1) The authors should check melanized crystal cells in Su(H)gwt and Su(H)S269A in the presence of PMA and Staurosporine.

      Thank you for the suggestion. We included the results of PMA + Staurosporine feeding into an extended Fig. 5B; they match those from the HeLa cells. Unfortunately, Staurosporine alone was lethal for the larvae at various concentrations, presumably owing to the overarching inhibition of kinase activity. This global effect also explains the high crystal cell numbers in the control fed with PMA + STAU compared to the untreated animals, as the downregulation of many kinases results in higher crystal cell numbers, a fact uncovered in our genetic screen.

      (2) Data for number of dead pupae, flies eclosed, wasps emerged post infestation should be monitored for the following genotypes and should be included:

      Pkc53EΔ28, Su(H)S269A, Pkc53EΔ28 Su(H)S269A, Su(H)S269D, Su(H)S269D Pkc53EΔ28

      We extended the data with and without infection. The respective data are shown in a new Fig. 9 and an extended Fig. 2, except for the Su(H)S269D allele. Su(H)S269D is larval lethal, i.e. dies too early for wasp development, and hence could not be included in the assay. Overall, Pkc53EΔ28 matched Su(H)S269A.

      (3) The exact molecular trigger for activation of Pkc53E upon wasp infestation is not clear.

      Indeed, and we would love to know! Perhaps, the generation of Ca2+ by the wasp’s breach of the larval cuticle results in Pkc53E activation. The generation of ROS could be involved as well. At this point, we can only speculate. We hope to be able in the future to obtain direct experimental evidence for the one or the other hypothesis.

      (4) The authors should check if activating ROS alone or induction of Calcium pulses/DUOX activation can mimic this condition and can trigger activation of Pkc53E and thereby cause phosphorylation of Su(H) at S269

      The reviewer’s suggestions open up a new field of investigation and are hence beyond the scope of this article. However, we want to pursue the research in this direction, although we realize that counting crystal cells is too coarse to give more than a first impression, and that lamellocytes may already form upon breaching the larval cuticle. A major challenge will be the direct measurement of Pkc53E activation. To date, we have no tools for this, but ideally, we would like to have a direct, biochemical read out. Although we have been unsuccessful in the past, we want to develop a strong and specific phospho-S269 antibody that also works in situ. Alternatively, we are thinking of developing a PS-phosphorylation reporter that would allow us to reasonably address these questions.

      (5) Does Pkc53E get activated during sterile inflammation?

      We are in the process of addressing this issue; however, we feel that this topic is beyond the scope of this paper. Our preliminary experiments nevertheless support the notion of a phospho-dependent regulation of Su(H) in this context as well.

      Reviewer #3 (Recommendations For The Authors):

      The authors provide a graphical representation of major phenotypes that form the basis of their investigation and conclusions but have not supplemented the quantitation with images that represent these phenotypes. The authors need to include the following data to strengthen their conclusions:

      (1) The authors should include representative images for each of the genotypes/conditions (in presence and absence of wasp infestation) based on which corresponding plots have been made in Figure 1. Please include this for both circulating lamellocytes in the hemolymph and in the lymph glands since this is one of the main figures presenting the key findings.

      The data have been included in Figure 1-S2 supplement.

      (2) Please include representative images of LG with Hnt staining and corresponding images for melanization for each of the genotypes used in the plots in Figure 6A and B.

      The data have been included in Figure 6-S2 supplement.

      (3) Representative images for each of the genotypes in Figure 7A & B should be included (circulating crystal cells and lymph gland crystal cell numbers).

      Representative images for each of the genotypes for Fig. 7A have been included in Figure 7-S1 and for the old Fig. 7B in Figure 9-S2 supplement, respectively.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We thank the Editor and the Reviewers for their constructive review. In light of this feedback, we have made a number of changes and additions to the manuscript that we think improve the presentation and hopefully address the majority of the reviewers’ concerns.

      Main changes:

      •   We added a new SI section (B1) with a population dynamics simulation in the high clonal interference regime and without expiring fitness (see R1: (1)).

      •   We added a new SI section (A9) with the derivation of the equilibrium state of our SIR model in the case of 𝑀 immune groups and in the limit 𝜀 → 0 (see R1: (5)).

      •   The text of the section Abstraction as “expiring” fitness advantage has been modified.

      •   We added a new SI section (A4) describing the links between parameters of the “expiring fitness” and SIR models.

      All three reviewers had concerns about the relation between our SIR model and the “expiring fitness” model, that we hope will be addressed by the last two items listed above. In particular, we would like to underline the following points:

      •   The goal of our SIR model is to give a mechanistic explanation of partial sweeps using traditional epidemiological models. While ecological models (e.g. consumer resource) can give rise to the same phenomenology, we believe that in the context of host-pathogen interaction it is relevant to explicitly show that SIR models can result in partial sweeps.

      •   The expiring fitness model is mainly an effective model: it reproduces some qualitative features of the SIR but does not quantitatively match all aspects of the frequency dynamics in SIR models.

      •   It is possible to link the parameters of the SIR (𝛼,𝛾,𝑏,𝑓) and expiring fitness (𝑠,𝑥,𝜈) models at the beginning of the invasion of the variant (new SI section A4). However, the two models also differ in significant ways (the SIR model can, for example, oscillate, while the effective model cannot). The correspondence of quantities like the initial invasion rate and the ‘expiration rate’ of fitness effects is thus only expected to hold for some time after the emergence of a novel variant.
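
      As a rough illustration of the last two points, the sketch below integrates one possible form of an expiring fitness model and shows how it produces a partial sweep. The specific equations (dx/dt = s·x·(1−x) and ds/dt = −ν·x·s), the function names, and the parameter values are assumptions chosen for illustration and are not taken from the manuscript; they only encode the idea that a variant’s advantage decays as the variant becomes more frequent.

```python
import math


def simulate_expiring_sweep(s0, nu, x0=1e-3, dt=0.05, t_max=2000.0):
    """Integrate an ASSUMED expiring-fitness model with fixed-step RK4.

    Assumed equations (illustrative, not from the manuscript):
        dx/dt = s * x * (1 - x)   # logistic growth of the variant frequency x
        ds/dt = -nu * x * s       # the advantage s expires as the variant spreads
    Returns the final (x, s).
    """
    def deriv(x, s):
        return s * x * (1.0 - x), -nu * x * s

    x, s = x0, s0
    for _ in range(int(t_max / dt)):
        k1x, k1s = deriv(x, s)
        k2x, k2s = deriv(x + 0.5 * dt * k1x, s + 0.5 * dt * k1s)
        k3x, k3s = deriv(x + 0.5 * dt * k2x, s + 0.5 * dt * k2s)
        k4x, k4s = deriv(x + dt * k3x, s + dt * k3s)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
    return x, s


def final_frequency(s0, nu, x0=1e-3):
    """Closed-form plateau of the assumed model.

    Dividing the two equations gives ds/dx = -nu / (1 - x); setting s = 0
    yields the frequency at which the sweep stalls.
    """
    return 1.0 - (1.0 - x0) * math.exp(-s0 / nu)


if __name__ == "__main__":
    x_end, s_end = simulate_expiring_sweep(s0=0.05, nu=0.05)
    print("simulated plateau:", x_end)
    print("closed-form plateau:", final_frequency(0.05, 0.05))
```

      Under these assumed equations the frequency stalls at 1 − (1 − x0)·exp(−s0/ν) instead of reaching fixation, so a larger expiry rate ν relative to s0 gives a shallower sweep. Unlike an SIR model, this sketch cannot oscillate.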

      Public reviews:

      Reviewer 1:

      Summary

      In this work, the authors study the dynamics of fast-adapting pathogens under immune pressure in a host population with prior immunity. In an immunologically diverse population, an antigenically escaping variant can perform a partial sweep, as opposed to a sweep in a homogeneous population. In a certain parameter regime, the frequency dynamics can be mapped onto a random walk with zero mean, which is reminiscent of neutral dynamics, albeit with differences in higher order moments. Next, they develop a simplified effective model of time dependent selection with expiring fitness advantage, and posit that the resulting partial sweep dynamics could explain the behaviour of influenza trajectories empirically found in earlier work (Barrat-Charlaix et al. Molecular Biology and Evolution, 2021). Finally, the authors put forward an interesting hypothesis: the mode of evolution is connected to the age of a lineage since ingression into the human population. A mode of meandering frequency trajectories and delayed fixation has indeed been observed in one of the long-established subtypes of human influenza, albeit so far only over a limited period from 2013 to 2020. The paper is overall interesting and well-written. Some aspects, detailed below, are not yet fully convincing and should be treated in a substantial revision.

      We thank the reviewer for their constructive criticism. The deep split in the A/H3N2 HA segment from 2013 to 2020 is indeed one of the more striking examples of such meandering frequency dynamics in otherwise rapidly adapting populations. But the up and down of H1N1pdm clade 5a.2a.1 in recent years might be a more recent example. We argue that such meandering dynamics might be a common contributor to seasonal influenza dynamics, even if it only spans 3-6 years.

      (1) The quasi-neutral behaviour of amino acid changes above a certain frequency (reported in Fig. 3), which is the main overlap between influenza data and the authors’ model, is not a specific property of that model. Rather, it is a generic property of travelling wave models and, more broadly, of evolution under clonal interference (Rice et al. Genetics 2015, Schiffels et al. Genetics 2011). The authors should discuss in more detail the relation to this broader class of models with emergent neutrality. Moreover, the authors’ simulations of the model dynamics are performed up to the onset of clonal interference 𝜌/𝑠0 = 1 (see Fig. 4). Additional simulations deeper in the regime of clonal interference (e.g. 𝜌/𝑠0 = 5) would show the behaviour in this regime more clearly.

      We agree with the reviewer that we did not discuss in detail the effects of clonal interference on quasi-neutrality and predictability. As suggested, we conducted additional simulations of our population model in the regime of high clonal interference (𝜌/𝑠0 ≫ 1) and without expiring fitness effects. The results are shown in a new section of the supplementary information (B1). These simulations show, as expected, that increasing clonal interference tends to decrease predictability: the fixation probability of an adaptive mutation found at frequency 𝑥 moves closer to 𝑥 as 𝜌 increases. However, even in a case of strong interference, 𝜌/𝑠0 = 32, 𝑝fix remains significantly different from the neutral expectation. We conclude from this that while it is true that dynamics tend to quasi-neutrality in the case of strong interference, this effect alone is unlikely to explain observations of H3N2 influenza dynamics. In our previous publication (Barrat-Charlaix et al., MBE, 2021) we also investigated the effect of epistatic interactions between mutations, alongside strong clonal interference. We concluded that, while most of these processes make evolution less predictable and push 𝑝fix towards the diagonal, it is hard to reproduce the empirical observations with realistic parameters. The “expiring fitness” model, however, produces this quite readily.

      But there are qualitative differences between quasi-neutrality in traveling wave models and the expiring fitness model. In the traveling wave, a genotype carrying an adaptive mutation is always fitter than if it didn’t carry the mutation. Quasi-neutrality emerges from the accumulation of fitness variation at other loci and the fact that the coalescence time is not much bigger than the inverse selection coefficient of the mutation. In the expiring fitness model, the selective effect of the mutation itself goes away with time. We now discuss the literature on quasi-neutrality and cite Rice et al. 2015 and Schiffels et al. 2011.

      In this context, I also note that the modelling results of this paper, in particular the stalling of frequency increase and the decrease in the number of fixations, are very similar to established results obtained from similar dynamical assumptions in the broader context of consumer resource models; see, e.g., Good et al. PNAS 2018. The authors should place their model in this broader context.

      We thank the reviewer for pointing out the link between consumer resource models and our work. We further strengthened our discussion of the similarity of the phenomenology to models typically used in ecology and made an effort to highlight the link between consumer-resource models and ours in the introduction and in the part on the SIR model.

      (2) The main conceptual problem of this paper is the inference of generic non-predictability from the quasi-neutral behaviour of influenza changes. There is no question that new mutations limit the range of predictions, this problem being most important in lineages with diverse immune groups such as influenza A(H3N2). However, inferring generic non-predictability from quasi-neutrality is logically problematic because predictability refers to individual trajectories, while quasi-neutrality is a property obtained by averaging over many trajectories (Fig. 3). Given an SIR dynamical model for trajectories, as employed here and elsewhere in the literature, the up and down of individual trajectories may be predictable for a while even though allele frequencies do not increase on average. The authors should discuss this point more carefully.

      We agree with the reviewer that the deterministic SIR model is of course predictable. Similarly, a partial sweep is predictable. But we argue that expiring fitness makes evolution less predictable in two ways: (i) When a new adaptive mutation emerges and rises in frequency, we typically don’t know how rapidly its fitness effect is ‘expiring’. Thus even if we can measure its instantaneous growth rate accurately, we can’t predict its fate far into the future. (ii) Compared to the situation where fitness effects are not expiring, time to fixation is longer and there are more opportunities for novel mutations to emerge and change the course of the trajectory. We have tried to make this point clearer in the manuscript.

      (3) To analyze predictability and population dynamics (section 5), the authors use a Wright-Fisher model with expiring fitness dynamics. While here the two sources of the emerging neutrality are easily tuneable (expiring fitness and clonal interference), the connection of this model to the SIR model needs to be substantiated: what is the starting selection 𝑠0 as a function of the SIR parameters (𝑓,𝑏,𝑀,𝜀), the selection decay 𝜈 = 𝜈(𝑓,𝑏,𝑀,𝜀,𝛾)? This would enable the comparison of the partial sweep timing in both models and corroborate the mapping of the SIR onto the simplified W-F model. In addition, the authors’ point would be strengthened if the SIR partial sweeps in Fig.1 and Fig.2 were obtained for a combination of parameters that results in a realistic timescale of partial sweeps.

      We added a new section to the SI (A4) that relates the parameters of the SIR and expiring fitness models. In particular, we compute the initial growth rate 𝑠0 and a proxy for the fitness expiry rate 𝜈 as a function of the SIR parameters 𝛼,𝛾,𝑓,𝑏,𝑀, at the instant where the variant is introduced. The initial growth rate depends primarily on the degree of immune escape 𝑓, while the expiration rate 𝜈 is related to incidence 𝐼wt + 𝐼𝑚. However, as both models have fundamentally different dynamics, these relations are only valid on time scales shorter than potential oscillations of the SIR model. Beyond that, the connection between the models is mostly qualitative: both rely on the fact that growth rate of a strain diminishes when the strain becomes more frequent, and give rise to partial sweeps.

      In Figure 1, the time it takes a partial sweep to finish is roughly 100-200 generations (bottom right panel). If we consider H3N2 influenza and take one generation to be one week, this corresponds to a sweep time of 2 to 4 years, which is slightly slower than, but roughly in line with, observations for selective sweeps. This time is harder to define if oscillatory dynamics take place (middle right panel), but the time from the introduction of the mutant to the peak frequency is again about 4 years. The other parameters of the model correspond to a waning time of 200 weeks and immune escape on the order of a 20-30% change in susceptibility.
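
      The conversion from generations to calendar time used above can be checked with a few lines. This is only a worked-arithmetic sketch: the one-week generation time is the assumption stated in the text, while the 52-weeks-per-year constant and the function name are illustrative choices.

```python
WEEKS_PER_YEAR = 52  # illustrative constant; a year is treated as 52 weeks


def generations_to_years(generations, weeks_per_generation=1.0):
    """Convert a number of generations to years for a given generation time."""
    return generations * weeks_per_generation / WEEKS_PER_YEAR


if __name__ == "__main__":
    # Sweep duration (100 and 200 generations) and waning time (200 weeks).
    for g in (100, 200):
        print(g, "generations ~", round(generations_to_years(g), 1), "years")
```

      With one generation per week, 100 and 200 generations correspond to about 1.9 and 3.8 years, which the response rounds to 2 to 4 years.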

      Reviewer 2:

      Summary

      This work addresses a puzzling finding in the viral forecasting literature: high-frequency viral variants evince signatures of neutral dynamics, despite strong evidence for adaptive antigenic evolution. The authors explicitly model interactions between the dynamics of viral adaptations and of the environment of host immune memory, making a solid theoretical and simulation-based case for the essential role of host-pathogen eco-evolutionary dynamics. While the work does not directly address improved data-driven viral forecasting, it makes a valuable conceptual contribution to the key dynamical ingredients (and perhaps intrinsic limitations) of such efforts.

      Strengths

      This paper follows up on previous work from these authors and others concerning the problem of predicting future viral variant frequency from variant trajectory (or phylogenetic tree) data, and a model of evolving fitness. This is a problem of high impact: if such predictions are reliable, they empower vaccine design and immunization strategies. A key feature of this previous work is a “traveling fitness wave” picture, in which absolute fitnesses of genotypes degrade at a fixed rate due to an advancing external field, or “degradation of the environment”. The authors have contributed to these modeling efforts, as well as to work that critically evaluates fitness prediction (references 11 and 12). A key point of that prior work was the finding that fitness metrics performed no better than a baseline neutral model estimate (Hamming distance to a consensus nucleotide sequence). Indeed, the apparent good performance of their well-adopted “local branching index” (LBI) was found to be an artifact of its tendency to function as a proxy for the neutral predictor. A commendable strength of this line of work is the scrutiny and critique the authors apply to their own previous projects. The current manuscript follows with a theory and simulation treatment of model elaborations that may explain previous difficulties, as well as point to the intrinsic hardness of the viral forecasting inference problem.

      This work abandons the mathematical expedience of traveling fitness waves in favor of explicitly coupled eco-evolutionary dynamics. The authors develop a multi-compartment susceptible/infected model of the host population, with variant cross-immunity parameters, immune waning, and infectious contact among compartments, alongside the viral growth dynamics. Studying the invasion of adaptive variants in this setting, they discover dynamics that differ qualitatively from the fitness wave setting: instead of a succession of adaptive fixations, invading variants have a characteristic “expiring fitness”: as the immune memories of the host population reconfigure in response to an adaptive variant, the fitness advantage transitions to quasi-neutral behavior. Although their minimal model is not designed for inference, the authors have shown how an elaboration of host immunity dynamics can reproduce a transition to neutral dynamics. This is a valuable contribution that clarifies previously puzzling findings and may facilitate future elaborations for fitness inference methods.

      The authors provide open access to their modeling and simulation code, facilitating future applications of their ideas or critiques of their conclusions.

      We thank the reviewer for their summary, assessment, and constructive critique.

      (1) The current modeling work does not make direct contact with data. I was hoping to see a more direct application of the model to a data-driven prediction problem. In the end, although the results are compelling as is, this disconnect leaves me wondering if the proposed model captures the phenomena in detail, beyond the qualitative phenomenology of expiring fitness. I would imagine that some data is available about cross-immunity between strains of influenza and SARS-CoV-2, so hopefully some validation of these mechanisms would be possible.

      We agree with the reviewer that quantitatively confronting our model with data would be very interesting. Unfortunately, most available serological data for influenza and SARS-CoV-2 is obtained using post-infection sera from previously naive animal models. To test our model, we would require human serology data, ideally demographically resolved, and a way to link serology to transmission dynamics. Furthermore, our model is mostly an explanation for qualitative features of variant dynamics and their apparent lack of predictability. We therefore considered that quantitative validation using data is out of scope of this work.

      (2) After developing the SIR model, the authors introduce an effective “expiring fitness” model that avoids the oscillatory behavior of the SIR model. I hoped this could be motivated more directly, perhaps as a limit of the SIR model with many immune groups. As is, the expiring fitness model seems to lose the eco-evolutionary interpretability of the SIR model, retreating to a more phenomenological approach. In particular, it’s not clear how the fitness decay parameter 𝜈 and the initial fitness advantage 𝑠0 relate to the key ecological parameters: the strain cross-immunity and immune group interaction matrices.

      The expiring fitness model emerges as a limiting case, at least qualitatively, of the SIR model when the growth rate of the new variant is small compared to the waning rate and the SIR model does not oscillate. This can be readily achieved with many immune groups, which reconciles the large effect of many escape mutations and the lack of oscillation by confining the escape to some fraction of the population. Beyond that, the expiring fitness model is mainly an effective model that allows us to study the consequences of partial sweeps on predictability on long timescales. As stated in the “Main changes” section at the start of this reply, we added an SI section which links parameters of the two models. However, we underline the fact that beyond the phenomenon of partial sweeps, the dynamics of the two are different.

      Reviewer 3:

      Summary

      In this work the authors start by presenting a multi-strain SIR model in which viruses circulate in a heterogeneous population with different groups characterized by different cross-immunity structures. They argue that this model can be reformulated as a random walk characterized by new variants saturating at intermediate frequencies. Then they recast their microscopic description to an effective formalism in which viral strains lose fitness independently from one another. They study several features of this process numerically and analytically, such as the average variant frequency, the probability of fixation, and the coalescent time. They compare qualitatively the dynamics of this model to variant dynamics in RNA viruses such as flu and SARS-CoV-2.

      Strengths

      The idea that a vanishing fitness mechanism that produces partial sweeps may explain important features of flu evolution is very interesting. Its simplicity and potential generality make it a powerful framework. As noted by the authors, this may have important implications for predictability of virus evolution and such a framework may be beneficial when trying to build predictive models for vaccine design. The vanishing fitness model is well analyzed and produces interesting structures in the strains coalescent. Even though the comparison with data is largely qualitative, this formalism would be helpful when developing more accurate microscopic ingredients that could reproduce viral dynamics quantitatively. This general framework has the potential to apply beyond human RNA viruses, in situations where invading mutants would saturate at intermediate frequencies.

      We thank the reviewer for their positive remarks and constructive criticism below.

      Weaknesses

      The authors build the narrative around a multi-strain SIR model in which viruses circulate in a heterogeneous population, but the connection of this model to the rest of the paper is not well supported by the analysis. When presenting the random walk coarse-grained description in section 3 of the Results, there is no quantitative relation between the random walk ingredients (importantly 𝑃(𝛽)) and the SIR model, just a qualitative reasoning that strains would initially grow exponentially and saturate at intermediate frequencies. So essentially any other microscopic description with these two features would give rise to the same random walk.

      As also highlighted in the response to other reviewers, we now discuss how the parameters of the SIR model are related to the initial growth rate and the ‘expiration’ rate of the effective model. While the phenomenology of the SIR model is of course richer, this correspondence describes its overdamped limit qualitatively well.

      Currently it’s unclear whether the specific choices for population heterogeneity and cross-immunity structure in the SIR model matter for the main results of the paper. In section 2, it seems that the main effects of these ingredients are reduced oscillations in variant frequencies and a rescaled initial growth rate. But ultimately a homogeneous population would also produce steady state coexistence between strains, and oscillation amplitude likely depends on parameter choices. Thus a homogeneous population may lead to a similar coarse-grained random walk.

      The reviewer is correct that the primary effect of using many immune groups is to slow down the increase of the novel variant, which in turn dampens the oscillations. Having multiple immune groups widens the parameter space in which partial sweeps without dramatic oscillations are observed. For slow sweeps, similar dynamics are observed in a homogeneous population.

      Similarly, it’s unclear how the SIR model relates to the vanishing fitness framework, other than on a qualitative level given by the fact that both descriptions produce variants saturating at intermediate frequencies. Other microscopic ingredients may lead to a similar description, yet with quantitative differences.

      Both of these points were also raised by other reviewers and we agree that it is worth discussing them at greater length. We now discuss how the parameters of the ‘expiring fitness’ model relate to those of the SIR model. We also discuss how other models, such as ecological models, give rise to similar coarse-grained models.

      At the same time, from the current analysis the reader cannot appreciate the impact of such a mean field approximation where strains lose fitness independently from one another, and under what conditions such assumption may be valid.

      In the SIR model, the rate at which strains lose fitness does depend on the precise state of the host population through the quantities 𝑆𝑚 and 𝑆wt, which is apparent in equation (A27) of the new SI section. The fact that a new variant shifts the equilibrium frequencies of previous strains in a proportional way is valid if the “antigenic space” is of very high dimension, as explained in section Change in frequency when adding subsequent strains of the SI. It would indeed be interesting to explore relaxations of this assumption by considering a larger class of cross immunity matrices 𝐾. However, in the expiring fitness model, the fact that strains lose fitness independently from each other is a necessary simplification.

      In summary, the central and most thoroughly supported results in this paper refer to a vanishing fitness model for human RNA viruses. The current narrative, built around the SIR model as a general work on host-pathogen eco-evolution in the abstract, introduction, discussion and even title, does not seem to match the key results and may mislead readers. The SIR description rather seems one of the several possible models, featuring a negative frequency dependent selection, that would produce coarse-grained dynamics qualitatively similar to the vanishing fitness description analyzed here.

      We have revised the text throughout to make the connections between the different parts of the manuscript, in particular the SIR model and the expiring fitness model, clearer. We agree that the phenomenology of the expiring fitness model is more general than the case of human RNA viruses described by the SIR model, but we think this generality is an attractive feature of the coarse-graining, not a shortcoming. Indeed, other settings with negative frequency dependent selection or ecosystems that adapt on appropriate time scales generate similar dynamics.

      Recommendations for the authors:

      Reviewer 1:

      (4) Line 74: what does fitness mean?

      Many population dynamics models, including ones used for viral forecasting, attach a scalar fitness to each strain. The growth rate of each strain is then computed by subtracting the average population fitness from the strain’s fitness. In this sentence, fitness is intended in this way.
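
      A minimal sketch of this definition is given below. The function name and the example numbers are hypothetical and serve only to illustrate the subtraction of the frequency-weighted mean fitness; they are not taken from the manuscript.

```python
def growth_rates(frequencies, fitnesses):
    """Return each strain's growth rate: its fitness minus the population mean.

    The mean fitness is the frequency-weighted average over all strains,
    so frequencies are expected to sum to 1.
    """
    mean_fitness = sum(x * f for x, f in zip(frequencies, fitnesses))
    return [f - mean_fitness for f in fitnesses]


if __name__ == "__main__":
    freqs = [0.7, 0.2, 0.1]       # hypothetical strain frequencies
    fits = [0.00, 0.05, 0.10]     # hypothetical scalar fitnesses
    for x, r in zip(freqs, growth_rates(freqs, fits)):
        print("frequency", x, "growth rate", round(r, 3))
```

      Because the mean is subtracted, the frequency-weighted sum of the growth rates is zero: strains fitter than average grow in frequency at the expense of the others.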

      (5) Fig. 1: The equilibrium frequency in the middle and bottom rows is hardly smaller than the equilibrium frequency in the top row for one immune group. This is surprising since for M=10, the variant escapes in only 1/10th of the population, which naively should impact the equilibrium frequency more strongly. Could the authors comment on this?

      This is indeed non-trivial, and a hand-waving argument can be made by considering the extreme case 𝜀 = 0. The variant is then completely neutral for the immune groups 𝑖 > 1, and would be at equilibrium at any frequency in these immune groups. Its equilibrium frequency is then only determined by group 1, which is the only one breaking degeneracy. For 𝜀 > 0 but small, we naturally expect a small deviation from the 𝜀 = 0 case and thus 𝛽 should only change slightly.

      A more rigorous argument with a mathematical proof in the case 𝜀 = 0 is now given in section A9 of the supplementary information.

      (6) Fig. 1: In the caption, it is stated that the simulations are performed with 𝜀 = 0.99. Is this a typo? It seems that it should be 𝜀 = 0.01, as in and just below equation (7).

      This was indeed a typo. It is now fixed.

      (7) Fig. 3: The data analysis should be improved. In order to link the average frequency trajectories to standard population genetics of conditional fixation probabilities, the focal time should always be the time where the trajectory crosses the threshold frequency for the first time. Plotting some trajectories from a later time onwards, on their downward path destined to loss, introduces a systematic bias towards negative clonal interference (for these trajectories, the time between the first and the second crossing of the threshold frequency is simply omitted). The focal time of first crossing of the threshold frequency can easily be obtained, e.g., by linear interpolation of the trajectory between subsequent time points of frequency evaluation. In light of the modified procedure, the statements on the inertia of the trajectories after crossing 𝑥⋆ (line 356) should be re-examined.

      The way we process the data is already in line with the suggestions of the reviewer. In particular, we use as focal time the first time at which a trajectory is found in the threshold frequency bin. Trajectories that are never seen in the bin because of limited time-resolution are simply ignored.

      In Fig. 3, there are no trajectories that are on their downward path at the focal time and when crossing the threshold frequency. Our other work on predictability of flu, Barrat-Charlaix et al. (2021), has a similar figure, which may have created confusion.
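
      For reference, the linear interpolation that the reviewer describes could be sketched as follows. This illustrates the reviewer’s suggestion only; it is not the bin-based procedure we use, and the function name and example values are hypothetical.

```python
def first_crossing_time(times, freqs, threshold):
    """Estimate when a trajectory first crosses `threshold` from below.

    The crossing time is linearly interpolated between the two sampled
    time points that bracket the threshold. Returns None if the sampled
    trajectory never crosses the threshold upward.
    """
    points = list(zip(times, freqs))
    for (t0, x0), (t1, x1) in zip(points, points[1:]):
        if x0 < threshold <= x1:
            return t0 + (threshold - x0) * (t1 - t0) / (x1 - x0)
    return None


if __name__ == "__main__":
    times = [0, 1, 2, 3]            # hypothetical sampling times
    freqs = [0.1, 0.3, 0.5, 0.2]    # hypothetical variant frequencies
    print(first_crossing_time(times, freqs, 0.4))
```

      Only the first upward crossing is reported, so a trajectory that later falls back through the threshold on its way to loss is not assigned a second, later focal time.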

      (8) Fig. 4: authors write 𝛼/𝑠0 in the figure, but it should be 𝜈/𝑠0.

      Fixed.

      (9) Line 420: authors refer to the blue curve in panel B as the case with strong interference. However, strong interference is for higher 𝜌/𝑠0, that is panel D (see point 1).

      Fixed.

      (10) Line 477: typo “there will a variety of mutations”.

      Fixed.

      Reviewer 2:

      Should 𝛼 be 𝜈 in Figure 4 legends?

      Thank you very much for spotting this error. We fixed it.

      Equations 4-5 could be further simplified.

      We factorised the 𝐼 term in equation 4. In equation 5, we preferred to keep the 1 − 𝛿/𝛼 term as this quantity appears in different calculations concerning the model. For instance, 𝑆 = 𝛿/𝛼 at equilibrium.

      The sentence before equation 8 references 𝑃𝛽(𝛽), but this wasn’t previously introduced.

      We now introduce 𝑃𝛽(𝛽) at the beginning of the section Ultimate fate of the variant.

      In the last paragraph of page 12, “monotonously” maybe should be “monotonically”.

      Fixed.

      For the supplement section B, you might want a more descriptive title than “other”.

      We renamed this section to Expiring fitness model and random walk.

      Reviewer 3:

      To expand on my previous comments, my main concerns regard the connection of section 2 and the SIR model with the rest of the paper.

      In the first paragraph of page 9 the authors argue that a stochastic version of the SIR model would lead to different fixation dynamics in homogeneous vs heterogeneous populations due to the oscillations. This paragraph is quite speculative; some numerical simulations would be necessary to quantitatively address to what extent these two scenarios actually differ in a stochastic setting, and how that depends on parameters.

      Likewise, the connection between the SIR model, the random walk coarse-grained description and the vanishing fitness model can be investigated through numerical simulations of a stochastic SIR given the chosen population and cross-immunity structures with, e.g., 10-20 strains. This would allow for a direct comparison of individual strain dynamics rather than the frequency averages, as well as other scalar properties such as higher moments, coalescent, and fixation probability once reaching a given frequency. It would also be possible to characterize numerically the SIR P(beta), bridging the gap with the random walk description. It’s not obvious to me that the SIR P(beta) would not depend on the population size in the presence of birth-death stochasticity, potentially changing the moment scalings. I appreciate that such simulations may be computationally expensive, but similar numerical studies have been performed in previous phylodynamics works so it shouldn’t be out of reach.

      As an alternative, the authors should consider re-centering the narrative directly on the random walk of the vanishing fitness model, mentioning the SIR more briefly as a possible qualitative way to get there. Either way the authors should comment on other ways in which this coarse-grained dynamics could arise.

      In the vanishing fitness model, where variant fitnesses are independent, is an infinite dimensional antigenic space implicitly assumed? If that’s the case, it should be explained in the main text.

      A long simulation of the SIR model would indeed be interesting, but is numerically demanding and our current simulation framework doesn’t scale well for many strains and susceptibilities. We thus refrained from adding extensive simulations.

      In Figure 2B of the main text, the simulation with 7 strains illustrates the qualitative match between the expiring fitness and the SIR model. However, it is clearly not long enough to discuss statistical properties of the corresponding random walk. Furthermore, we do not expect the individual strain dynamics of the SIR and expiring fitness models to match. The latter depends on few parameters (𝜈, 𝑠0), while the former depends on the full state of the host population and of the previous variants.

      In the section linking the parameters of the two models, we now discuss the distribution 𝑃(𝛽) of the SIR model for two strains and a specific choice of distribution for the cross immunity 𝑏 and 𝑓.

      Minor comments:

      There is some back and forth in the writing. For instance, when introducing the model, 𝐶𝑖𝑗 is first defined as 1/ 𝑀, then a few paragraphs later the authors introduce that in another limit 𝐶𝑖𝑖 is just much higher than any 𝐶𝑖𝑗, and finally they specify that the former is the fast mixing scenario.

      Another example is in section 2, in the first paragraph they put forward that heterogeneity and crossimmunity have different impacts on the dynamics, but the meaning attributed to these different ingredients becomes clear only a while later after the homogeneous population analysis. Uniforming the writing would make it easier for the reader to follow the authors’ train of thought.

      We removed the paragraph below Equation (1) mentioning the 𝐶𝑖𝑗 = 1/𝑀 case, which we hope will linearize the writing.

      When mentioning geographical structure, why would geography affect how immunity sees pairs of viral strains (differences in 𝐾)?

      Geographic structure could influence cross-immunity because of exposure histories of hosts. For instance in the case of influenza, different geographical regions do not have the same dominating strains in each season, and hosts from different regions may thus build up different immunity.

      In the current narrative there are some speculations about non-scalar fitness, especially in section 2. The heterogeneity in this section does not seem so strong to produce a disordered landscape that defies the notion of scalar fitness in the same way some complex ecological systems do. A more parsimonious explanation for the coexistence dynamics observed here may be a negative frequency dependent selection.

      Our language here was not very precise and we agree that the phenomenology we describe is related to that of frequency dependent selection (mediated via the immunity of the host population, which integrates past frequencies). Traveling wave models typically use fitness functions that are independent of the population distribution and only account for the evolution via an increasing average fitness. We have made the discussion more accurate by stating that we consider a case where fitness depends explicitly on present and past population composition, which includes the case of negative frequency dependent selection.

      I don’t understand the comparison with genetic drift (typo here, draft) in the last paragraph of section 3 given that there is no stochasticity in growth death dynamics.

      We compare the random walk to genetic drift because of the expression of the second moment of the step size. The genetic draft has the same functional form. If one defines the effective population size as in the text, the drift due to random sampling of alleles (neutral drift) and the changes in strain frequency in our model have the same first and second moments. The stochasticity here does not come from the dynamics, which are indeed deterministic, but from the appearance of new mutations (variants) on backgrounds that are randomly sampled in the population. This latter property is shared with genetic draft.
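
      For orientation, the neutral-drift moments referred to above can be written out. This is the standard Wright-Fisher result for a variant at frequency x in a population of effective size N_e; the symbols x and N_e are used here for illustration only, and how N_e maps onto the model parameters is defined in the manuscript and not reproduced here.

```latex
\mathbb{E}\left[\Delta x \mid x\right] = 0,
\qquad
\mathbb{E}\left[(\Delta x)^2 \mid x\right] = \frac{x(1-x)}{N_e}
```

      The comparison in the response is that the frequency steps of the random walk share this form for the first two moments, even though the underlying dynamics are deterministic.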

      In the vanishing fitness model, I think the reader would benefit from having 𝑃(𝑠) in the main text, and it should be made more clear what simulations assume what different choice of 𝑃(𝑠).

      We added the expression of 𝑃(𝑠) in the main text. Simulations use the value 𝑠0 = 0.03, which we added in the caption of Figure 4.

      When comparing the model and data, is the point that COVID is not reproduced due to clonal interference? It seems from the plot that flu has clonal interference as well though. Why is that negligible?

      A similar point has been raised by the first reviewer (see R1-(1)). Clonal interference is not negligible, but we find it to be insufficient to explain the observations made for H3N2 influenza, namely the lack of inertia of frequency trajectories or the probability of fixation. This is shown in the new section (B1) of the SI. Both SARS-CoV-2 and H3N2 influenza experience clonal interference, but the former is more predictable than the latter. Our point is that expiring fitness effects should be stronger in influenza because of the higher immune heterogeneity of the host population, making it less predictable than SARS-CoV-2.

      Does the fixation probability as a function of frequency threshold match the flu data for some parameters sets?

      For H3N2 influenza, the fixation probability is found to be equal to the threshold frequency (see Barrat-Charlaix MBE 2021, also indirectly visible from Fig. 3). In Figure 4, we obtain that either a high expiry rate or intermediate expiry rates and clonal interference regimes match this observation.
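
      As a point of comparison, a purely neutral population has the same property: the probability that a variant fixes equals its current frequency. The sketch below illustrates this baseline with a small neutral Wright-Fisher simulation. It is not the expiring fitness model from the manuscript, and the function names, population size and replicate count are illustrative assumptions.

```python
import random


def wright_fisher_fixes(x0, pop_size, rng):
    """Run one neutral Wright-Fisher trajectory; return True if the variant fixes."""
    count = round(x0 * pop_size)
    while 0 < count < pop_size:
        freq = count / pop_size
        # Binomial resampling: each offspring carries the variant with probability freq.
        count = sum(rng.random() < freq for _ in range(pop_size))
    return count == pop_size


def fixation_probability(x0, pop_size=30, replicates=2000, seed=0):
    """Estimate the fixation probability of a variant starting at frequency x0."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fixes(x0, pop_size, rng) for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    # Hypothetical threshold frequencies; the estimates should lie close to them.
    for threshold in (0.2, 0.5, 0.8):
        print(threshold, fixation_probability(threshold))
```

      Under these assumptions the estimated fixation probability tracks the starting frequency, which is the pattern the response reports for H3N2 influenza.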

      It would be instructive to see examples of the individual variant dynamics of the vanishing fitness model compared to the presented data.

      We added an extra SI figure (S7) showing 10 randomly selected trajectories of individual variants in the case of H3N2/HA influenza and for the expiring fitness model with different parameter choices.

      Figure 4E has no colorbar label. The reader shouldn’t have to look for what that means in the bottom of the SIs. In panels A and B the label should be 𝜈, not 𝛼. Same thing in most equations of page 42.

      We added the colorbar label to the figure and also updated the caption: a darker color corresponds to a higher probability of sweeps to overlap. We fixed the 𝜈 – 𝛼 confusion in the SI and in the caption of the figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations For The Authors:

      Reviewer #1:

      ●      It might help the reader if you make it explicit that mDES allows you to create an approximate amalgam of different kinds of experiences by assuming that, across individuals, there is a general consensus of experiences at particular points in the movie. Whether this assumption is an accurate reflection of the way in which each individual's brain responds is an important, testable prediction that could be discussed/examined in different projects. For instance, in other projects there are clear idiosyncratic responses to the same naturalistic stimuli: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8064646/.

      Thank you, this is an excellent point. We have included this article in our revision and expanded the introduction to emphasize how this study relates to our work. We have also added a figure that illustrates how mDES can be used to evaluate the idiosyncrasy of each thought component by displaying the variance across moments in the film:

      Page 6-7 [137-148] In our study, we used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [8]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [22, 32, 33] and in daily life [34, 35], and is sensitive to accompanying changes in brain activity [24, 36]. Studies that use mDES to describe experience ask participants to provide experiential reports by answering a set of questions about different features of their thought on a continuous scale from 1 (Not at all) to 10 (Completely) [24, 32-41]. Each question describes a different feature of experience such as if their thoughts are oriented in the future or the past, about oneself or other people, deliberate or intrusive in nature, and more (See methods for a full list of questions used in the current study).

      ●      A cartoon describing the mDES technique could be helpful for uninitiated readers.

      Thank you for your suggestion, we have added an additional figure (Figure 3) that illustrates the process of mDES in the laboratory during this experiment, clarifying that participants answer mDES items using a slider to indicate their score (rather than expressing it verbally).

      ●      Did the authors check for any measures of reliability across mDES estimates other than split-half reliability? For instance, the authors could demonstrate construct validity by showing that engagement with certain features of the thought-sampling space aligned with specific points in the movies. If so, the start of the Results section would be a great place to demonstrate the reliability of the approach. For instance, did any two participants sample the same 15-second window of time in a particular stimulus? If so, you could compare their experience samples to determine whether the method was extensible across subjects.

      This is a great point, thank you very much for highlighting this. We have eight individuals at each time point in our analysis, which is probably not enough to calculate meaningful reliability measures. However, we have added a time series analysis of experience in each clip to our revision (Figure 3). In these time plots, it is possible to see clear moments in the film in which scores do not straddle 0 (using 95% CI), and often, these persist across successive moments (Figure 3; see time-series plot four for the clearest example). When the confidence intervals of a sampling epoch do not overlap with zero, this suggests a high degree of agreement in thought content across participants. At the same time, our analysis shows that individual differences do exist since the relative presence of each component for each participant was linked to objective measures of movie watching (in this case, comprehension). In this revision we have specifically addressed this question by conducting ANOVAs to determine how scores on each component change across the clip (see also Supplementary Table 11). This additional analysis shows that mDES effectively captures shared aspects of movie-watching and is also sensitive to individual variation (since it can describe individual differences).
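
      The check described above (does the 95% CI of a sampling epoch exclude zero?) can be sketched as follows. This is a minimal illustration, assuming a t-based interval on the component scores reported at each epoch; the function names and example scores are hypothetical and are not taken from the study.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only a few entries are listed; df = 7 corresponds to eight reports per epoch.
T_CRIT_95 = {5: 2.571, 6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def ci95(scores):
    """Return the (lower, upper) 95% confidence interval of the mean score."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = T_CRIT_95[n - 1] * sem
    return mean - half_width, mean + half_width


def epochs_excluding_zero(scores_by_epoch):
    """List the epochs whose 95% CI lies entirely above or entirely below zero."""
    flagged = []
    for epoch, scores in sorted(scores_by_epoch.items()):
        low, high = ci95(scores)
        if low > 0 or high < 0:
            flagged.append(epoch)
    return flagged


if __name__ == "__main__":
    # Made-up component scores for two sampling epochs (seconds into the clip).
    example = {
        15: [0.9, 1.1, 0.7, 1.3, 0.8, 1.0, 1.2, 0.6],
        30: [-0.5, 0.6, 0.1, -0.3, 0.4, -0.2, 0.3, -0.4],
    }
    print(epochs_excluding_zero(example))
```

      Runs of consecutive flagged epochs would correspond to the extended periods of agreement visible in the time-series plots.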

      Page 15 [304-323]: Next, we examined how each pattern of thought changes across each movie clip. For this analysis, we conducted a separate ANOVA for each film clip for the four components (see Table 1 and Figure 3). Clear dynamic changes were observed in several components for different films. We analyzed these data using an Analysis of Variance (ANOVA) in which the time in each clip was the explanatory variable of interest. This identified significant change in “Episodic Social Cognition” scores across Little Miss Sunshine, F(1, 712) = 10.80, p = .001, η2 = .03, and Citizenfour, F(1, 712) = 5.23, p = .023, η2 = .02. There was also significant change in “Verbal Detail” scores across Little Miss Sunshine, F(1, 712) = 31.79, p < .001, η2 = .09. Lastly, there were significant changes in “Sensory Engagement” scores for both Citizenfour, F(1, 712) = 6.22, p = .013, η2 = .02, and 500 Days of Summer, F(1, 706) = 80.41, p < .001, η2 = .18. These time series are plotted in Figure 3 and highlight how mDES can capture the dynamics of different types of experience across the three movie clips. Moreover, in several of these time series plots, it is clear that thought patterns reported extend beyond adjacent time periods (e.g. scores above zero between time periods 150 to 400 for Sensory Engagement in 500 Days of Summer and for time periods between 175 and 225 for Verbal Detail in Little Miss Sunshine). It is important to note that no participant completed experience sampling reports during adjacent sampling points (see Supplementary Figure 7), so the length of these intervals indicates agreement in how specific scenes within a film were experienced and conserved across different individuals. Notably, the component with the least evidence for temporal dynamics was “Intrusive Distraction.”

      ●      P10: "Generation of the thought-space" - how stable are these word clouds to individual subjects? If there are subject-specific differences, are there ways to account for this with some form of normalization?

      Thank you for bringing up this point. Our current goal was to show how the average experience of one group of participants relates to the brain activity of a second group. In this regard it is important to seek the patterns of similarity across individuals in how they experience the film. However, as is normal in our studies using mDES, we can also use the variation from the mean to predict other cognitive measures and, in this way, account for the variability that individuals have in their movie-watching experience. In other words, the word clouds reflect the mean of a particular dimension, so when an individual score is close to 0, their thought content does not align with this dimension. Deviating scores, positive or negative, indicate that this dimension provides meaningful information about the individual's experience. Evidence of the meaningful nature of this variation can be seen in the links between the reported thoughts and the individuals’ comprehension (e.g. individuals whose thoughts do not contain strong evidence of “Intrusive Distraction”, or in other words, a negative score, tended to do better on comprehension tests of information in the movies they watched).

      ●      P11: "Variation in thought patterns" - can the authors use a null model here to demonstrate that the associations they've observed would occur above chance levels (e.g., for a comparison of time series with similar temporal autocorrelation but non-preserved semantic structure)? Further, were there any pre-defined hypotheses over whether any of the three different movies would engage any of the 4 observed dimensions?

      This is a great point. We chose to sample from three distinctly different films to help us understand if mDES was sensitive to different semantic and affective features of films. Our analysis, therefore, shows that at a broad level, mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, researchers in the future could derive mechanistic insights into how the semantic features may influence the mDES data. For example, future studies could ask participants to watch movies in a scrambled order to understand how varying the structure of semantics or information breaks the mapping between brains and ongoing experience. In this revision we have amended the text to reflect this possibility:

      Page 34 [674-679]: Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantics or information influences the mapping between brains and ongoing experience as measured by mDES.

      ●      P14: "Brain - Thought Mappings: Voxel-space Analysis" - this is a cool analysis, and a nice validation of the authors' approach. I would personally love to see some form of reliability analysis on these approaches - e.g., do the same locations in the cerebral cortex align with the four features in all three movies? Across subjects?

      This is another great point, and we thank you for your enthusiasm. The data we have has only sampled mDES during a relatively short period of brain activity which we suspect would make an individual-by-individual analysis underpowered. In the future, however, it may be possible to adopt a precision mapping approach in which we sample mDES during longer periods of movie watching and identify how group-level mappings of experience relate to brain activity within a single subject. To reflect this possibility, we have amended the text in this revision in the following way:

      Page 34-35 [672-687]: In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants' experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantics or information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future, it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [1]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.

      Reviewer #2:

      (1) The three-dimensional scatter plot in Figure 2 does not represent "Intrusive Distraction." Would it make sense to color-code dots by this important dimension?

      Thank you for this suggestion. Although it could be possible to indicate the location of each film in all four dimensions, we were worried that this would make the already complex 3-D space confusing to a naive reader. In this case, we prefer to provide this information in the form of bar graphs, as we did in the previous submission.

      (2) The coloring of neural activation patterns in Figure 3 is not distinct enough between the different dimensions of thought. Please reconsider color intensities or coding. The same applies to the left panel in Figure 4.

      Thanks for this comment; we found it quite difficult to find a colour mapping that allows us to show the distinction between four states in a simple manner, yet we believe it is valuable to show all of the results on a similar brain. Nonetheless, to provide a more fine-grained viewing of our results in this revision we have provided a supplementary figure (Supplementary Figure 6) that shows each of the observed patterns of activity in isolation.

      (3) The new method (mDES) is mentioned too often without explanation, making it hard to follow without referring to the methods section. It would be helpful to state prominently that participants rated their thoughts on different dimensions instead of verbalizing them.

      Thank you for this point; we have adjusted the Introduction to clarify and expand on the mDES method. We have also added a figure with an example of the mDES method to show visually how participants respond to mDES probes (Figure 3).

      Page 6-7 [136-148]: In our study, we used multi-dimensional experience sampling (mDES) to describe ongoing thought patterns during the movie-watching experience [2]. mDES is an experience sampling method that identifies different features of thought by probing participants about multiple dimensions of their experiences. mDES can provide a description of a person’s thoughts, generating reliable thought patterns across laboratory cognitive tasks [3-5] and in daily life [6, 7], and is sensitive to accompanying changes in brain activity when reports are gained during scanning [8, 9]. Studies that use mDES to describe experience ask participants to provide experiential reports by answering a set of questions about different features of their thought on a continuous scale from 1 (Not at all) to 10 (Completely) [3, 5-14]. Each question describes a different feature of experience, such as if their thoughts are oriented in the future or the past, about oneself or other people, deliberate or intrusive in nature, and more (See Methods for a full list of questions used in the current study).

      Author response image 1.

      (4) Reporting of single-movie thought patterns seems quite extensive. Could this be condensed in the main text?

      Thank you for this point, upon re-visiting the manuscript, we have adjusted the text to be more concise.

      Reviewer #3:

      ●      This is a very elegant experiment and seems like a very promising approach. The text is currently hard to read.

      Thank you for this point, we have since revisited the text and adjusted the manuscript to be more concise and add more clarity.

      ●      The introduction (+ analysis goals) fails to explain the basic aspects of the analysis and dataset. It is not clear how many participants and datapoints were used to establish the group-level thought patterns, nor is it entirely clear that the fMRI data is a separate existing dataset. Some terms are introduced and highlighted and never revisited (e.g decoupled states and the role of the DMN).

      Thank you for this critique, we have since adjusted the introduction to clearly explain the difference between Sample 1 and Sample 2 and further clarify that the fMRI data is an entirely separate, independent sample compared to the laboratory mDES sample:

      Page 7-8 [158-174]: Thus, to overcome this obstacle, we developed a novel methodological approach using two independent samples of participants. In the current study, one set of 120 participants was probed with mDES five times across the three ten-minute movie clips (11 minutes total, no sampling in the first minute). We used a jittered sampling technique where probes were delivered at different intervals across the film for different people depending on the condition they were assigned. Probe orders were also counterbalanced to minimize the systematic impact of prior and later probes at any given sampling moment. We used these data to construct a precise description of the dynamics of experience for every 15 seconds of three ten-minute movie clips. These data were then combined with fMRI data from a different sample of 44 participants who had already watched these clips without experience sampling [15]. By combining data from two different groups of participants, our method allows us to describe the time series of different experiential states (as defined by mDES) and relate these to the time series of brain activity in another set of participants who watched the same films with no interruptions. In this way, our study set out to explicitly understand how the patterns of thoughts that dominate different moments in a film in one group of participants relate to the brain activity at these time points in a second set of participants and, therefore, better understand the contribution of different neural systems to the movie-watching experience.

      Page 8-9 [177-188] The goal of our study, therefore, was to understand the association between patterns of brain activity over time during movie clips in one group of participants and the patterns of thought that participants reported at the corresponding moment in a different set of participants (see Figure 1). This can be conceptualized as identifying the mapping between two multi-dimensional spaces, one reflecting the time series of brain activity and the other describing the time series of ongoing experience (see Figure 1 right-hand panel). In our study, we selected three 11-minute clips from movies (Citizenfour, Little Miss Sunshine and 500 Days of Summer) for which recordings of brain data in fMRI already existed (n = 44) [15] (Figure 1, Sample 1). A second set of participants (n = 120) viewed the same movie clips, providing intermittent reports on their thought patterns using mDES (Figure 1, Sample 2). Our goal was to understand the mapping between the patterns of brain activity at each moment of the film and the reports of ongoing thought recorded at the same point in the movies.

      ●      It is unclear what the utility of the method is - is it meant to be done in fMRI studies on the same participants? Or is the idea to use one sample to model another?

      Great point, thank you for highlighting this important question. This paper aimed to interrogate the relationship between experience and neural states while preserving the novelty of movie-watching. Although it could be done in the same sample, it may be difficult to collect frequent reports of experience without interrupting the dynamics of the brain. However, in the future it could be possible to collect mDES and brain activity in the same individuals while they watch movies. For example, in our prior studies (e.g. [9]) we combined mDES with openly available brain activity data during tasks. In the future, this online method could also be applied during movie watching to identify direct mapping between brain activity and films. However, this online approach would make it very expensive to produce the time series of experience across each clip given that it would require a large number of participants (e.g. 200 as we used in our current study). The following has been included in our manuscript:

      Page 7 [149-159]: One challenge that arises when attempting to map the dynamics of thought onto brain activity during movie watching is accounting for the inherently disruptive nature of experience sampling: to measure experience with sufficient frequency to map the dynamics of thoughts during movies would disrupt the natural dynamics of the brain and would also alter the viewer’s experience (for example, by pausing the film at a moment of suspense). Therefore, if we periodically interrupt viewers to acquire a description of their thoughts while recording brain activity, this could impact capturing important dynamic features of the brain. On the other hand, if we measured fMRI activity continuously over movie-watching (as is usually the case), we would lack the capacity to directly relate brain signals to the corresponding experiential states. Thus, to overcome this obstacle, we developed a novel methodological approach using two independent samples of participants.

      ●      The conclusions currently read as somewhat trivial (e.g "Our study, therefore, establishes both sensory and association cortex as core features of the movie-watching experience", "Our study supports the hypothesis that perceptual coupling between the brain and external input is a core feature of how we make sense of events in movies").

      Thank you for this comment. In this revision we have attempted to extend the theoretical significance of our work in the discussion (for example, in contrasting the links between Intrusive distraction and the other components). To this end we have amended the text in this revision by including the following sections:

      Page 33-35 [654-687]: Importantly, our study provides a novel method for answering these questions and others regarding the brain basis of experiences during films that can be applied simply and cost-effectively. As we have shown, mDES can be combined with existing brain activity, allowing information about both brain activity and experience to be determined at a relatively low cost. For example, the cost-effective nature of our paradigm makes it an ideal way to explore the relationship between cognition and neural activity during movie-watching across different genres of film. In neuroimaging, conclusions are often made using one film in naturalistic paradigm studies [16]. Although the current study only used three movie clips, limiting our ability to form strong conclusions regarding how different patterns of thought relate to specific genres of film, in the future, it will be possible to map cognition across a more extensive set of movies and discern whether there are specific types of experience that different genres of films engage. One of the major strengths of our approach, therefore, is the ability to map thoughts across groups of participants across a wide range of movies at a relatively low cost.

      Nonetheless, this paradigm is not without limitations. This is the first study, as far as we know, that attempts to compare experiential reports in one sample of participants with brain activity in a second set of participants, and while the utility of this method enables us to understand the relationship between thought and brain activity during movies, it will be important to extend our analysis to mDES data during movie watching while brain activity is recorded. In addition, our study is correlational in nature, and in the future, it could be useful to generate a more mechanistic understanding of how brain activity maps onto the participants' experience. Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantics or information influences the mapping between brains and ongoing experience as measured by mDES. Finally, our study focused on mapping group-level patterns of experience onto group-level descriptions of brain activity. In the future, it may be possible to adopt a “precision-mapping” approach by measuring longer periods of experience using mDES and determining how the neural correlates of experience vary across individuals who watched the same movies while brain activity was collected [1]. In the future, we anticipate that the ease with which our method can be applied to different groups of individuals and different types of media will make it possible to build a more comprehensive and culturally inclusive understanding of the links between brain activity and movie-watching experience.

      ●      The beginning of the discussion is very clear and explains the study very well. Some of it could be brought up in the intro/analysis goal sections.

      Thank you for this comment, this is an excellent idea. We have revisited the introduction and analysis goals section to mirror this clarity across the manuscript.

      ●      The different components are very interesting, and not entirely clear. Some examples in the text could help. Especially regarding your thought that verbal components would refer to a "decoupled" mental verbal analysis participants might be performing in their thoughts.

      Thank you for this point. We would prefer not to elaborate on this point since, at present, it would simply be conjecture based on our correlational design. However, we have included a section in the discussion which explains how, in principle, we would draw more mechanistic conclusions (for example, by shuffling the order of scenes in a movie as suggested by another reviewer). In the current revision, we have amended the text in the following way:

      Page 34 [674-679]: Our analysis shows that mDES is able to discriminate between films, highlighting its broad sensitivity to variation in semantic or affective content. Armed with this knowledge, we propose that in the future, researchers could derive mechanistic insights into how the semantic features may influence the mDES data. For example, it may be possible to ask participants to watch movies in a scrambled order to understand how the structure of semantics or information influences the mapping between brains and ongoing experience as measured by mDES.

      ●      The reference to using neurosynth as performing a meta-analysis seems a little stretched.

      We have adjusted the manuscript to remove ‘meta-analysis’ when referring to the analysis computed with neurosynth. Thank you for bringing this to our attention.

      ●      State-space is defined as brain-space in the methods.

      Thank you, we have since updated this.

      ●      It could be useful to remind the reader what thought and brain spaces are at the top of the state-space results section.

      This is an excellent point, and it has since been updated to remind the reader of thought- and brain-space. Thank you for this comment.

      Page 24 [458-467]: Our next analysis used a “state-space” approach to determine how brain activity at each moment in the film predicted the patterns of thoughts reported at these moments (for prior examples in the domain of tasks, see [12, 17]; see Methods). In this analysis, we used the coordinates of the group average of each TR in the “brain-space” and the coordinates of each experience sampling moment in the “thought-space.” To clarify, the location of a moment in a film in “brain-space” is calculated by projecting the grand mean of brain activity for each volume of each film against the first five dimensions of brain activity from a decomposition of the Human Connectome Project (HCP) resting state data, referred to as Gradients 1-5. “Thought-space” is the decomposition of mDES items to create thought pattern components, referred to as “Episodic Knowledge”, “Intrusive Distraction”, “Verbal Detail” and “Sensory Engagement.”
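      As an illustration only, the “brain-space” step described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes that “projecting” means taking the Pearson correlation between each group-averaged volume and each gradient map, and it uses random toy data. The names `pearson`, `project_to_brain_space`, `gradients`, `volumes` and `coords` are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator


def project_to_brain_space(volumes, gradients):
    """Return one coordinate per (volume, gradient) pair.

    volumes:   list of group-averaged activity maps, one per TR
    gradients: list of gradient maps (e.g. Gradients 1-5)
    Assumption: the projection is a spatial correlation; the source
    text does not specify the exact operation.
    """
    return [[pearson(v, g) for g in gradients] for v in volumes]


# Toy data standing in for real maps (hypothetical sizes).
random.seed(0)
n_voxels = 200
gradients = [[random.gauss(0, 1) for _ in range(n_voxels)] for _ in range(5)]
volumes = [[random.gauss(0, 1) for _ in range(n_voxels)] for _ in range(10)]
volumes[0] = list(gradients[3])  # a volume identical to the fourth gradient

coords = project_to_brain_space(volumes, gradients)
```

      In this toy example each volume receives five coordinates, and the volume copied from the fourth gradient map sits at the maximum of that dimension.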

      ●      DF missing from the t-test for episodic knowledge/grad 4.

      Thank you for catching this; the degrees of freedom have since been included in this revision.

      Page 24 [474-476]: First, we found a significant main effect of Gradient 4 (DAN to Visual), which predicted the similarity of answers to the “Episodic Knowledge” component, t(2046) = 2.17, p = .013, η2 = .01.

      Public Reviews:

      Reviewer #1:

      ●      The lack of direct interrogation of individual differences/reliability of the mDES scores warrants some pause.

      Our study's goal was to understand how group-level patterns of thought in one group of participants relate to brain activity in a different group of participants. To this end, we decomposed trial-level mDES data to show dimensions that are common across individuals, which demonstrated excellent split-half reliability. Then we used these data in two complementary ways. First, we established that these ratings reliably distinguished between the different films (showing that our approach is sensitive to manipulations of semantic and affective features in a film) and that these group-level patterns were also able to predict patterns of brain activity in a different group of participants (suggesting that mDES dimensions are also sensitive to broad differences in how brain activity emerges during movie watching). Second, we established that variation across individuals in their mDES scores predicted their comprehension of information from the films. This establishes that when applied to movie-watching, mDES is sensitive to individual differences in the movie-watching experience (as determined by an individual's comprehension). Given the success of this study and the relative ease with which mDES can be performed, it will be possible in the future to conduct mDES studies that hone in on the common and distinct features of the movie-watching experience.

      Reviewer #2:

      (1) The dimensions of thought seem to distinguish between sensory and executive processing states. However, it is unclear if this effect primarily pertains to thinking. I could imagine highly intrusive distractions in movie segments to correlate with stagnating plot development, little change in scenery, or incomprehensible events. Put differently, it may primarily be the properties of the movies that evoke different processing modes, but these properties are not accounted for. For example, I'm wondering whether a simple measure of engagement with stimulus materials could explain the effects just as much. How can the effects of thinking be distinguished from the perceptual and semantic properties of the movie, as well as attentional effects? Is the measure used here capturing thought processes beyond what other factors could explain?

      Our study used mDES to identify four distinct components of experience, each of which had distinct behavioural and neural correlates and relationships to comprehension. Together this makes it unlikely that a single measure of engagement would be able to capture the range of effects we observed in our study. For example, “Intrusive Distraction” was associated with regions of association cortex, while the other three components highlighted regions of sensory cortex. Behaviorally, we found that some components had a common effect on comprehension (e.g. “Intrusive distraction” was related to worse comprehension across all films), while others were linked to clear benefits to comprehension in specific films (e.g. “Episodic Knowledge” was associated with better comprehension in only one of the films). Given the complex nature of these effects, it would be difficult for a single metric of engagement to explain this pattern of results, and even if it did, this could be misleading because our analysis implies that they are better explained by a model of movie-watching experience in which there are several relatively orthogonal dimensions upon which our experience can vary.

      At the same time, we also found that films vary in the general types of experience they can engender. For example, Citizenfour was high on “Intrusive Distraction” and participants performed relatively low on comprehension. This shows that manipulations of the semantic and affective content of films also have implications for the movie-watching experience. This pattern is consistent with laboratory studies that applied mDES during tasks and found that different tasks evoke different types of experience (for example, patterns of ‘intrusive’ thoughts were common in movie clips that were suspenseful, [18]). At the same time, in the same study, patterns of intrusive thought across the tasks were also associated with trait levels of dysphoria reported by participants. Other studies using mDES in daily life have shown that the data can be described by multiple dimensions and that each of these types of thought is more prevalent in certain activities than others ([19]). For example, in daily life, patterns of ‘intrusive distraction’ thoughts were more prevalent when individuals were engaged in activities that were relatively unengaging (such as resting). Collectively, therefore, studies using mDES suggest that it is likely that human thought is multidimensional in nature and that these dimensions vary in a complex way in terms of (a) the contexts that promote them, and (b) how they are impacted by features of the individual (whether they be traits like anxiety or depression or memory for information in a film).

      (2) I'm skeptical about taking human thought ratings at face value. Intrusive distraction might imply disengagement from stimulus materials, but it could also be an intended effect of the movie to trigger higher-level, abstract thinking. Can a label like intrusive distraction be misleading without considering the actual thought and movie content?

      Our method uses a data-driven approach to identify the dimensions that best describe the range of answers that our participants provided to describe their experience. We use these dimensions to understand how these patterns of thought emerge in different contexts and how they vary across individuals (in this case, in different movies, but in other studies, laboratory tasks [3, 8, 9, 12, 20-22] or activities in daily life [6, 7]). These context relationships help constrain interpretations of what the components mean. For example, “Intrusive Distraction” scores were highest in the film with the most real-world significance for the participants (Citizenfour) and were associated with worse comprehension. In daily life, however, patterns of “Intrusive Distraction” thoughts tend to occur when individuals engage in non-demanding activities, like resting. Psychological perspectives on spontaneous thought are consistent with this view: there is evidence that such thoughts occur in non-demanding tasks with no semantic content (when there is almost no external stimulus to explain the occurrence of the experience, see [23]); however, other studies have shown that specific cues in the environment can also cue the experience (see [23]). Consistent with this perspective, and our current data, patterns of ‘Intrusive Distraction’ thought are likely to arise for multiple reasons, some of which are more intrinsic in nature (the general association with poor comprehension across all films) and others which are extrinsic in nature (the elevation of intrusive distraction in Citizenfour).

      It is also important to note that our data-driven approach also found patterns of experience that provide more information about the content of their experience, for example, the dimension of “Episodic Knowledge” is characterized by thoughts based on prior knowledge, involving the past, and concerning oneself, and was most prevalent in the romance film (500 Days of Summer). Likewise, “Sensory Engagement” was associated with experiences related to sensory input and positive emotionality and occurred more during the romance movie (500 Days of Summer) than in the documentary (Citizenfour) and was linked to increased brain activity across the sensory systems. This shows that mDES can also provide information about the content of that experience, and discriminate between different sources of experience. In the future, it will be possible to improve the level of detail regarding the content of experiences by changing the questions used to interrogate experience.     

      (3) A jittered sampling approach is used to acquire thought ratings every 15 seconds. Are ratings for the same time point averaged across participants? If so, how consistent are ratings among participants? High consistency would suggest thoughts are mainly stimulus-evoked. Low consistency would question the validity of applying ratings from one (group of) participant(s) to brain-related analyses of another participant.

      In this experiment, we sampled experience every 15 seconds in each clip, and in each sampling epoch, we gained mDES responses from eight participants. Furthermore, no participant was sampled at an adjacent time point, as our approach jittered probes approximately 2 minutes apart (see Supplementary Figure 7). To illustrate the consistency of mDES data, we have included an additional figure (Figure 3) highlighting how experience varies over time in each clip. It is evident from these plots that there are distinct moments in which group-averaged reported thoughts across participants are stable and that these can extend across adjacent sampling points (i.e. when the confidence intervals of the score at a timepoint do not overlap with zero). Therefore, in some cases, adjacent sampling points, consisting of different sets of eight participants, describe their experiences as having similar positions on the same mDES dimension. This suggests that there is agreement among individuals regarding how they experienced a specific moment in a film, and in some cases, this agreement was apparent in successive sets of eight participants. Together, our findings indicate a conservation of agreement across participants that spans multiple moments in a film. A clear example of agreement on experience across multiple sets of eight participants can be seen between 150-400 seconds in the clip from 500 Days of Summer for the dimension of “Sensory Engagement” (time series plot 4 in Figure 3).
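      The stability criterion mentioned above (a confidence interval that does not overlap with zero) can be illustrated with a short sketch. It is a hedged example rather than the analysis actually run: it assumes a 95% interval based on Student's t with eight scores per time point, and the scores are invented toy values. The names `stable_moments`, `T_CRIT_DF7`, `toy_scores` and `flags` are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 7 degrees of freedom,
# i.e. eight participants per sampling point (assumed interval type).
T_CRIT_DF7 = 2.365


def stable_moments(scores_by_timepoint, t_crit=T_CRIT_DF7):
    """For each time point return (mean, ci_low, ci_high, excludes_zero)."""
    results = []
    for scores in scores_by_timepoint:
        mean = statistics.mean(scores)
        sem = statistics.stdev(scores) / math.sqrt(len(scores))
        ci_low = mean - t_crit * sem
        ci_high = mean + t_crit * sem
        excludes_zero = ci_low > 0 or ci_high < 0
        results.append((mean, ci_low, ci_high, excludes_zero))
    return results


# Invented component scores for two sampling points, eight participants each.
toy_scores = [
    [1.2, 0.9, 1.1, 1.4, 0.8, 1.0, 1.3, 1.1],      # consistently positive
    [0.5, -0.6, 0.2, -0.3, 0.4, -0.5, 0.1, -0.2],  # scattered around zero
]

flags = [row[3] for row in stable_moments(toy_scores)]
```

      Here the first toy time point would count as a moment of agreement, because its interval lies entirely above zero, while the second would not.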

      (4) Using three different movies to conclude that different genres evoke different thought patterns (e.g., line 277) seems like an overinterpretation with only one instance per genre.

      We found that mDES was able to distinguish between each film on at least one dimension of experience. In other words, information encoded in the mDES dimensions was sensitive to variation in semantic and affective experiences in the different movie clips. This provides evidence that is necessary but not sufficient to conclude that we can distinguish different genres of films (i.e. if we could not distinguish between films, then we would not be able to distinguish genres). However, it is correct that to begin answering the broader question about experiences in different genres, it would be necessary to map cognition across a larger set of movies, ideally with multiple examples of each genre.

      (5) I see no indication that results were cross-validated, and no effect sizes are reported, leaving the robustness and strength of effects unknown.

      Thank you for drawing this to our attention. We have re-run the LMMs and ANOVA models to include partial eta-squared values to clarify the strength of the effects in each of our reported outcomes.

      Reviewer #3:

      ●      What are the considerations for treating high-order thought patterns that occur during film viewing as stable enough to be used across participants? What would be the limitations of this method? (Do all people reading this paper think comparable thoughts reading through the sections?)

      It is likely, based on our study, that films can evoke both stereotyped thought patterns (i.e. thoughts that many people will share) and others that are individualistic. It is clear that, in principle, mDES is capable of capturing empirical information on both stereotypical thoughts and idiosyncratic thoughts. For example, clear differences in experiences across films and, in particular, during specific periods within a film, show that movie-watching can evoke broadly similar thought patterns in different groups of participants (see Figure 3 right-hand panel). On the other hand, the association between comprehension and the different mDES components indicates that certain individuals respond to the same film clip in different ways and that these differences are rooted in objective information (i.e. their memory of an event in a film clip). A clear example of these more idiosyncratic features of movie watching experience can be seen in the association between “Episodic Knowledge” and comprehension. We found that “Episodic Knowledge” was generally high in the romance clip from 500 Days of Summer but was especially high for individuals who performed the best, indicating they remembered the most information. Thus good comprehenders responded to the 500 Days of Summer clip with responses that had more evidence of “Episodic Knowledge”. In the future, since the mDES approach can account for both stereotyped and idiosyncratic features of experience, it will be an important tool in understanding the common and distinct features that movie watching experiences can have, especially given the cost-effective manner with which these studies can be run.

      ●      How does this approach differ from collaborative filtering, (for example as presented in Chang et al., 2021)?

      Our study is very similar to the notion of collaborative filtering since we can use an approach that is similar to crowd-sourcing as a tool for understanding brain activity. One of its strengths is its generalizability: the method is not limited to movie-watching and can be used to understand cognition more broadly. We can use the same mDES method to sample cognition in multiple situations in daily life ([6, 19]), while performing tasks in the behavioural lab [18, 24], and while brain activity is being acquired [8, 25, 26]. In principle, therefore, we can use mDES to understand cognition in different contexts in a common analytic space (see [27] for an example of how this could work).

      Page 5 [106-110]: In our study, we acquired experiential data in one group of participants while watching a movie clip and used these data to understand brain activity recorded in a second set of participants who watched the same clip and for whom no experiential data was recorded. This approach is similar to what is known as “collaborative filtering” [28].

      ●      In conclusion, this study tackles a highly interesting subject and does it creatively and expertly. It fails to discuss and establish the utility and appropriateness of its proposed method.

      Thank you very much for your feedback and critique. In our revision and our responses to these questions, we provided more information about the method's robustness, utility, and application to understanding cognition.

      References

      (1) Gordon, E.M., et al., Precision Functional Mapping of Individual Human Brains. Neuron, 2017. 95(4): p. 791-807.e7.

      (2) Smallwood, J., et al., The neural correlates of ongoing conscious thought. iScience, 2021. 24(3).

      (3) Konu, D., et al., Exploring patterns of ongoing thought under naturalistic and conventional task-based conditions. Consciousness and Cognition, 2021. 93.

      (4) Smallwood, J., et al., The default mode network in cognition: a topographical perspective. Nature Reviews Neuroscience, 2021. 22(8): p. 503-513.

      (5) Turnbull, A., et al., Age-related changes in ongoing thought relate to external context and individual cognition. Consciousness and Cognition, 2021. 96: p. 103226.

      (6) McKeown, B., et al., The impact of social isolation and changes in work patterns on ongoing thought during the first COVID-19 lockdown in the United Kingdom. Proceedings of the National Academy of Sciences, 2021. 118(40): p. e2102565118.

      (7) Mulholland, B., et al., Patterns of ongoing thought in the real world. Consciousness and Cognition, 2023. 114: p. 103530.

      (8) Konu, D., et al., A role for the ventromedial prefrontal cortex in self-generated episodic social cognition. NeuroImage, 2020. 218: p. 116977.

      (9) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10.

      (10) Ho, N.S.P., et al., Facing up to the wandering mind: Patterns of off-task laboratory thought are associated with stronger neural recruitment of right fusiform cortex while processing facial stimuli. NeuroImage, 2020. 214: p. 116765.

      (11) Karapanagiotidis, T., et al., Tracking thoughts: Exploring the neural architecture of mental time travel during mind-wandering. NeuroImage, 2017. 147: p. 272-281.

      (12) McKeown, B., et al., Experience sampling reveals the role that covert goal states play in task-relevant behavior. Scientific Reports, 2023. 13(1): p. 21710.

      (13) Vatansever, D., et al., Distinct patterns of thought mediate the link between brain functional connectomes and well-being. Network Neuroscience, 2020. 4(3): p. 637-657.

      (14) Wang, H.-T., et al., Dimensions of Experience: Exploring the Heterogeneity of the Wandering Mind. Psychological Science, 2017. 29(1): p. 56-71.

      (15) Aliko, S., et al., A naturalistic neuroimaging database for understanding the brain using ecological stimuli. Scientific Data, 2020. 7(1).

      (16) Yang, E., et al., The default network dominates neural responses to evolving movie stories. Nature Communications, 2023. 14(1): p. 4197.

      (17) Turnbull, A., et al., Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Scientific Reports, 2020. 10(1): p. 9912.

      (18) Konu, D., et al., Exploring patterns of ongoing thought under naturalistic and conventional task-based conditions. Consciousness and Cognition, 2021. 93: p. 103139.

      (19) Mulholland, B., et al., Patterns of ongoing thought in the real world. Consciousness and Cognition, 2023. 114: p. 103530.

      (20) Christoff, K., et al., Experience sampling during fMRI reveals default network and executive system contributions to mind wandering. Proc Natl Acad Sci U S A, 2009. 106(21): p. 8719-24.

      (21) Zhang, M., et al., Perceptual coupling and decoupling of the default mode network during mind-wandering and reading. eLife, 2022. 11: p. e74011.

      (22) Zhang, M.C., et al., Distinct individual differences in default mode network connectivity relate to off-task thought and text memory during reading. Scientific Reports, 2019. 9.

      (23) Smallwood, J. and J.W. Schooler, The science of mind wandering: Empirically navigating the stream of consciousness. Annual Review of Psychology, 2015. 66(1): p. 487-518.

      (24) Turnbull, A., et al., The ebb and flow of attention: Between-subject variation in intrinsic connectivity and cognition associated with the dynamics of ongoing experience. NeuroImage, 2019. 185: p. 286-299.

      (25) Turnbull, A., et al., Left dorsolateral prefrontal cortex supports context-dependent prioritisation of off-task thought. Nature Communications, 2019. 10(1): p. 3816.

      (26) McKeown, B., et al., Experience sampling reveals the role that covert goal states play in task-relevant behavior. Scientific Reports, 2023. 13(1): p. 21710.

      (27) Chitiz, L., et al., Mapping cognition across lab and daily life using experience-sampling. 2023.

      (28) Chang, L.J., et al., Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Science Advances, 2021. 7(17): p. eabf7129.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study presents an important finding on the involvement of a Caspase 3-dependent pathway in the elimination of synapses for retinogeniculate circuit refinement and eye-specific territory segregation. This work fits well with the concept of "synaptosis" which has been proposed in the past but lacked in vivo support. Despite its elegant design and many strengths, the evidence supporting the claims of the authors is incomplete, particularly regarding whether Caspase-3 expression can really be isolated to synapses vs locally dying cells, whether microglia direct or instruct synapse elimination, and whether astrocytes are also involved. The work will be of interest to investigators studying cell death pathways, neurodevelopment, and neurodegenerative disease.

      Regarding significance:

      This study provides in vivo evidence that caspase-3 is important for synapse elimination in the visual pathway (Figure 3 and 4) and corroborates the previously proposed but not yet validated “synaptosis” hypothesis. But more significantly, we show that caspase-3 is activated in dLGN relay neurons in response to synapse inactivation (Figure 1) when synaptic competition is present (Figure 2), and that caspase-3 is important for efficient elimination of weakened synapses by microglia (Figure 5 and 6). We consider the causal link between synapse weakening/inactivation and caspase-3 activation to be the most important finding of this study and believe it is an error to not include this aspect of the study in the assessment. The mechanism by which neuronal activity influences synapse elimination is a fundamental question in neuroscience, and our study presents a significant advancement in understanding this problem.

      Regarding strength of evidence:

      We do not agree with the assessment that our evidence should be broadly labeled as “incomplete”. In fact, we argue that many concerns raised by the reviewers are not focused on the main claims made in this study.

      (1) Regarding whether caspase-3 activation (not “expression”, which is the term used in the assessment) is isolated to synapses or occurs in entire cells, we show in Figure 1 that both types of signals can be present. The main concern of the reviewers seems to be that activated caspase-3 signals in apoptotic dLGN relay neurons are irrelevant to our analysis and confound interpretation. We argue that this is not the case.

      In Figure 1, we have two sets of controls demonstrating that the observed apoptosis of dLGN relay neurons occurs specifically in response to synapse inactivation. For each animal that received TeTxLC injection in the right eye, activated caspase-3 signal is compared between the left dLGN, where most of the inactivated synapses are located, and the right dLGN, where the minority of the inactivated synapses are located (between Figure 1B and 1C, also between the first and second group of Figure 1E). We observed apoptotic neurons in the left dLGN with more inactivated synapses but not in the right dLGN with fewer inactivated synapses. The second control is between TeTxLC-injected animals (Figure 1B) and mock-injected animals (Figure 1D). We observed apoptotic relay neurons in the dLGN of TeTxLC-injected animals (Figure 1B) but not mock-injected animals (Figure 1D). Both these controls show that the observed apoptosis of dLGN relay neurons is caused by synapse inactivation.

      In addition, in our synapse inactivation experiment (Figure 1), AAV-hSyn-TeTxLC is injected into the right eye and expressed only in RGCs, not in dLGN relay neurons. Since dLGN relay neurons in this experiment do not receive a perturbation that is independent of synaptic transmission, we conclude that their apoptosis occurs through synapse-dependent mechanisms.

      Furthermore, if the apoptotic neurons are confounding the analysis (as implied by reviewers and editors) and do not occur through synapse-dependent mechanisms, then inhibiting both eyes with TeTxLC (Figure 2C, rightmost group) should cause high levels of caspase-3 activation, like that in the single-inhibition condition. Instead, we observe the opposite (Figure 2C, middle group) – overall caspase-3 activity goes down significantly in the dual-inhibition condition and is closer to the unperturbed condition, which can be explained by a loss of interaction between “strong” and “weak” synapses. Taken together, our data demonstrate that apoptosis of relay neurons in Figure 1 occurs specifically in response to synapse inactivation through synapse-dependent mechanisms, and the activated caspase-3 signal in the neurons should be included in our analysis.

      Why does synaptic caspase-3 activation manifest in different forms: puncta, “blobs”, and cells? This is not surprising when considering the mechanisms that neurons must utilize to spatially confine caspase-3 activation and the nature of the apoptotic signaling cascade. On one hand, it has been proposed that caspase-3 activity in dendrites can be locally confined by proteasomal degradation of cleaved caspase-3 (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014). On the other hand, caspase-3 activation is known to trigger explosive feedback amplification of apoptotic signaling events (McComb et al., DOI: 10.1126/sciadv.aau9433). For caspase-3 activation to remain localized to dendrites, the negative regulation must outweigh the positive feedback amplification. By expressing TeTxLC in RGCs of one eye, we create a strong perturbation that silences a large fraction of the synapses in the retinogeniculate pathway, which likely shifts the balance between positive and negative regulation of caspase-3 activity in some relay neurons. To be more specific, if a given dLGN relay neuron receives too many inactivated synapses, which is likely the case in our perturbation, caspase-3 activity that is initially localized can overwhelm the physiological negative regulation mechanisms that act to spatially confine it, resulting in whole-cell apoptosis. In fact, previous in vitro evidence (Erturk et al., DOI: 10.1523/JNEUROSCI.3121-13.2014) demonstrated that, while caspase-3 activation in a single distal dendrite can be locally contained, activating apoptosis signaling in dendrites proximal to the cell body can result in whole-cell apoptosis. Similarly, a few inactivated retinogeniculate synapses can elicit locally contained caspase-3 activity in dLGN relay neurons, but a large number of inactivated synapses on a single relay neuron may trigger sufficient caspase-3 activity that can lead to whole-cell apoptosis.
      We discussed how to interpret synapse inactivation-induced apoptosis in dLGN relay neurons both in the main text and in the discussion (lines 123-132 and lines 411-421).

      (2) Regarding microglia, we did not claim that “microglia direct or instruct synapse elimination”. Our main claim is that caspase-3 activation is important for efficient elimination of weakened synapses by microglia. This claim emphasizes a regulatory role for caspase-3 activation in microglia-mediated synapse elimination, but not a regulatory role of microglia in synapse elimination. To be more specific, our data suggest that lack of synaptic activity induces caspase-3 activity, and caspase-3 activity in turn influences which synapses are preferentially eliminated by microglia. Therefore, the elimination specificity is fundamentally determined (i.e. instructed) by neuronal activity, not by microglia. We also did not presume the manner in which microglia engage in synapse elimination. We specifically address this point in the discussion at line 458 through 465 where we acknowledge that microglia may indirectly mediate synapse elimination by engulfing shed neuronal material. In our title and text, we use the phrase “microglia-mediated synapse elimination”, which is not the same as microglia-instructed synapse elimination and does not presume any instructive/directive role of microglia.

      (3) Regarding whether astrocytes are involved, we did not challenge the notion that astrocytes play important roles in synapse elimination. Rather, our claim is that, unlike what we observed with microglia, the amount of synaptic material engulfed by astrocytes does not robustly depend on whether caspase-3 is present. We acknowledge that there might be a caspase-3 dependent phenotype that we were unable to detect (line 309-310), and that it is plausible that astrocytes mediate activity-dependent synapse elimination through other caspase-3-independent mechanisms. This claim is not central to our study, and we would like to qualify the statements in the manuscript. We will remove the phrase “but not astrocytes” in line 18 of the abstract.

      In summary, using a state-of-the-art method to inactivate retinogeniculate synapses, we discovered a causal link between synapse weakening/inactivation and caspase-3 activation. Coupled with well-established in vivo assays (e.g., segregation analysis, electrophysiology, and engulfment analysis) that are used in many landmark studies we cite, we provide solid evidence supporting our claim that “caspase-3 is essential for synapse elimination driven by both spontaneous and experience-dependent neural activity”, and that “synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia”.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript, the authors study the effects of synaptic activity on the process of eye-specific segregation, focusing on the role of caspase 3, classically associated with apoptosis. The method for synaptic silencing is elegant and requires intrauterine injection of a tetanus toxin light chain into the eye. The authors report that this silencing leads to increased caspase 3 in the contralateral eye (Figure 1) and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2. However, the quantifications showing increased caspase 3 in the silenced eye (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus. The authors also show that global caspase 3 deficiency impairs the process of eye-specific segregation and circuit refinement (Figures 3-4). 

      The reviewer states: “this silencing leads to increased caspase 3 in the contralateral eye”. We observed increased caspase-3 activity, not protein levels, in the contralateral dLGN, not eye.

      The reviewer states: “and demonstrate evidence of punctate caspase 3 that does not overlap neuronal markers like map2”. This is not accurate. We show that the punctate active caspase-3 signals overlap with the dendritic marker MAP2 (Figure S4A).

      The reviewer states: “, the quantifications showing increased caspase 3 [activity] in the silenced [dLGN] (done at P5) are complicated by overlap with the signal from entire dying cells in the thalamus”. This is not accurate. The apoptotic neurons we observed are relay neurons located in the dLGN (confirmed by their morphology and positive staining of NeuN – Figure S4B-C), not “cells” of unknown lineage (as suggested by the reviewer) in the general “thalamus” area (as suggested by the reviewer). If the dying cells were non-neuronal cells, that would indeed confound our quantification and conclusions, but that is not the case.

      We argue that the active caspase-3 signals in apoptotic dLGN relay neurons are not a confounding factor but a bona fide response to synaptic silencing and therefore should be included in the quantification. We have two sets of controls (please also see the general response above): one is between the strongly inactivated dLGN and the weakly inactivated dLGN in each TeTxLC-injected animal, and the second is between the dLGN of TeTxLC-injected animals and that of mock-injected animals. In both controls, only the dLGN receiving strong synapse inactivation has these apoptotic dLGN relay neurons, demonstrating that these cells occur as a consequence of synapse inactivation. It is also unlikely that our perturbation is causing cell death through a non-synaptic mechanism. As mock injections do not cause apoptosis in dLGN neurons, this phenomenon is not related to surgical damage. TeTxLC is injected into the eyes and only expressed in presynaptic RGCs, not in postsynaptic relay neurons, so this phenomenon is also unlikely to be caused by TeTxLC-related toxicity. Furthermore, if apoptosis of dLGN relay neurons were not related to synapse inactivation, then when TeTxLC is injected into both eyes, one would expect to see either the same number of or more apoptotic relay neurons, but we instead observed a reduction in dLGN neuron apoptosis, suggesting a synapse-related mechanism must be responsible. Considering the above, apoptosis of relay neurons in TeTxLC-inactivated dLGN is causally linked to synapse inactivation, and active caspase-3 signals in these neurons are true signals that should be included in the quantification.

      The authors also report that "synapse weakening-induced caspase-3 activation determines the specificity of synapse elimination mediated by microglia but not astrocytes" (abstract). They report that microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts. Based on this, the authors conclude that caspase 3 directs microglia to eliminate weaker synapses. However, a much simpler and critical experiment that the authors did not perform is to eliminate microglia and show that the caspase 3 dependent effects go away. Without this experiment, there is no reason to assume that microglia are directing synaptic elimination. 

      The reviewer states: “microglia engulf fewer RGC axon terminals in caspase 3 deficient animals (Figure 5), and that this preferentially occurs in silenced terminals, but this preferential effect is lost in caspase 3 knockouts”. We are not sure what the reviewer means by “this preferentially occurs in silenced terminals”. Our results show that microglia preferentially engulf silenced terminals, and such preference is lost in caspase-3 deficient mice (Figure 6).

      We do not understand the experiment the reviewer suggested: “eliminate microglia and show that the caspase 3 dependent effects go away”. To quantify caspase-3 dependent engulfment of synaptic material by microglia or preferential engulfment of silenced terminals by microglia, microglia must be present in the tissue sample. If we eliminate microglia, neither of these measurements can be made. What could be measured if microglia are eliminated is the refinement of the retinogeniculate pathway. This experiment would test whether microglia are required for caspase-3 dependent phenotypes. This is not a claim made in the manuscript. Instead, we claimed caspase-3 is required for microglia to preferentially eliminate weak synapses.

      We did not claim that “microglia are directing synaptic elimination”. Our claim is that synapse inactivation induces caspase-3 activity, and this caspase-3 activity in turn determines the substrate preference of microglia-mediated synapse elimination. Based on this model, it is the neuronal activity that fundamentally directs synapse elimination. Throughout the manuscript, we used the term “microglia-mediated synapse elimination”. This terminology does not assume a directive/instructive role of microglia in synapse elimination and only describes the observed engulfment of synaptic material by microglia. We also did not assume how microglia engage in synapse elimination. We acknowledge in the discussion (line 458 through 465) that microglia may mediate synapse elimination in an indirect, passive way by engulfing shed neuronal material. This topic is a matter of debate in the field (Eyo et al., DOI: 10.1126/science.adh7906).

      Finally, the authors also report that caspase 3 deficiency alters synapse loss in 6-month-old female APP/PS1 mice, but this is not really related to the rest of the paper. 

      We respectfully disagree that Figure 7 is not related to the rest of the paper. Many genes involved in postnatal synapse elimination, such as C1q and C3, have been implicated in neurodegeneration. It is therefore natural and important to ask whether the function of caspase-3 in regulating synaptic homeostasis extends to neurodegenerative diseases in adult animals. The answer to this question may have broad therapeutic impacts.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript by Yu et al. demonstrates that activation of caspase-3 is essential for synapse elimination by microglia, but not by astrocytes. This study also reveals that caspase 3 activation-mediated synapse elimination is required for retinogeniculate circuit refinement and eye-specific territories segregation in dLGN in an activity-dependent manner. Inhibition of synaptic activity increases caspase-3 activation and microglial phagocytosis, while caspase-3 deficiency blocks microglia-mediated synapse elimination and circuit refinement in the dLGN. The authors further demonstrate that caspase-3 activation mediates synapse loss in AD, loss of caspase-3 prevented synapse loss in AD mice. Overall, this study reveals that caspase-3 activation is an important mechanism underlying the selectivity of microglia-mediated synapse elimination during brain development and in neurodegenerative diseases. 

      Strengths: 

      A previous study (Gyorffy B. et al., PNAS 2018) has shown that caspase-3 signal correlates with C1q tagging of synapses (mostly using in vitro approaches), which suggests that caspase-3 would be an underlying mechanism of microglial selection of synapses for removal. The current study provides direct in vivo evidence demonstrating that caspase-3 activation is essential for microglial elimination of synapses in both brain development and neurodegeneration. 

      The paper is well-organized and easy to read. The schematic drawings are helpful for understanding the experimental designs and purposes. 

      Weaknesses: 

      It seems that astrocytes contain large amounts of engulfed materials from ipsilateral and contralateral axon terminals (Figure S11B) and that caspase-3 deficiency also decreased the volume of engulfed materials by astrocytes (Figures S11C, D). So the possibility that astrocyte-mediated synapse elimination contributes to circuit refinement in dLGN cannot be excluded.

      The experiments presented in Figure S11 aim to determine whether astrocyte-mediated synapse elimination depends on caspase-3 signaling. We do not claim that astrocytes are unimportant for synapse elimination or circuit refinement. We did observe a small decrease in synaptic material engulfed by astrocytes when caspase-3 is deficient, and we acknowledged that there could be defects that we were not able to detect (lines 309-310). The claim that caspase-3 does not regulate astrocyte-mediated synapse elimination is not a central claim of the manuscript and we will qualify our statements in the text. We will remove the phrase “but not astrocytes” in the abstract (line 18).

      Does blocking single or dual inactivation of synapse activity (using TeTxLC) increase microglial or astrocytic engulfment of synaptic materials (of one or both sides) in dLGN? 

      We assume that by “blocking single or dual inactivation of synapse activity”, the reviewer refers to inactivating retinogeniculate synapses from one or both eyes.

      We showed that inactivating retinogeniculate synapses from one eye (single inactivation) increases microglia-mediated engulfment of presynaptic terminals of inactivated synapses (Figure 6). We did not measure microglia-mediated engulfment of synaptic material while inactivating retinogeniculate synapses from both eyes (dual inactivation). However, based on the total active caspase-3 signal (Figure 2) in the dual inactivation scenario, we do not expect to see an increase in engulfment of synaptic material.

      We did not measure astrocyte-mediated engulfment with single or dual inactivation, as we did not see a robust caspase-3 dependent phenotype in astrocyte-mediated engulfment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Review:

      Summary:

      Bursicon is a key hormone regulating cuticle tanning in insects. While the molecular mechanisms of its function are rather well studied--especially in the model insect Drosophila melanogaster, its effects and functions in different tissues are less well understood. Here, the authors show that bursicon and its receptor play a role in regulating aspects of the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment activated the bursicon signaling pathway during the transition from summer form to winter form and affected cuticle pigment and chitin content, and cuticle thickness. In addition, the authors show that miR-6012 targets the bursicon receptor, CcBurs-R, thereby modulating the function of the bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of the roles of neuropeptide bursicon action in arthropod biology.

      However, the study falls short of its claim that it reveals the molecular mechanisms of a seasonal polyphenism. While cuticle tanning is an important part of the pear psyllid polyphenism, it is not the equivalent of it. First, there are other traits that distinguish between the two morphs, such as ovarian diapause (Oldfield, 1970), and the role of bursicon signaling in regulating these aspects of polyphenism was not measured. Thus, the phenotype in pear psyllids, whereby knockdown of bursicon reduces cuticle tanning, seems to simply demonstrate the phenotypes of Drosophila mutants for bursicon receptor (Loveall and Deitcher, 2010, BMC Dev Biol) in another species (Fig. 2I, 4H). Second, the study fails to address the threshold nature of cuticular tanning in this species, although it is the threshold response (specifically, to temperature and photoperiod) that distinguishes this trait as a part of a polyphenism. Whereas miR-6012 was found to regulate bursicon expression, no evidence is provided that this microRNA either responds to or initiates a threshold response to temperature. In principle, miR-6012 could regulate bursicon whether or not it is part of a polyphenism. Thus, the impact of this work would be significantly increased if it could distinguish between seasonal changes of the cuticle and a bona fide reflection of polyphenism.

      Thanks for your valuable suggestion. We concur with the reviewer’s comment that cuticle tanning does not equate to the C. chinensis polyphenism. To better reflect the core focus of our research, we have revised the title to "Neuropeptide Bursicon and its receptor mediated the transition from summer-form to winter-form of Cacopsylla chinensis".

      In response to the reviewer's inquiry regarding the threshold nature of cuticular tanning in C. chinensis, we have included a detailed analysis of the phenotypic changes (including nymph phenotypes, cuticle pigment absorbance, and cuticle thickness) during the transition from summer-form to winter-form in C. chinensis at distinct time intervals (3, 6, 9, 12, 15 days) under different temperature conditions (10°C and 25°C). As shown in Figure S1, nymphs exhibit a light yellow and transparent coloration at 3, 6, and 9 days, while nymphs at 12 and 15 days display shades of yellow-green or blue-yellow under 25°C conditions. Under 10°C conditions, the abdomen end turns black at 3, 6, and 9 days. By 12 days, numerous light black stripes appear on the chest and abdomen of nymphs at 10°C. At 15 days, nymphs exhibit an overall black-brown appearance, featuring dark brown stripes on the left and right sides of each chest and abdominal section. Furthermore, the end of the abdomen and back display a large black-brown coloration at 10°C (Figure S1A). The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Cuticle thickness also increased following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). The detailed results (L122-143), materials and methods (L647-652), and discussion (L319-322) have been added in our revised manuscript.

      Regarding the response of miR-6012 to temperature, we had already determined its expression at 3, 6, 10 days under different temperatures in the previous Figure 5E. We have now included additional time intervals (9, 12, 15 days) in the updated Figure 5E. Our results indicate a significant decrease in the expression levels of miR-6012 after 10°C treatment for 3, 6, 9, 12, 15 days compared to the 25°C treatment group. Detailed information regarding this has been integrated into the Materials and Methods (Line 608-610) of our revised manuscript.

      Strengths:

      This study convincingly identifies homologs of the genes encoding the bursicon subunits and its receptor, showing an alignment with those of another psyllid as well as more distant species. It also demonstrates that the stage- and tissue-specific levels of bursicon follow the expected patterns, as informed by other insect models, thus validating the identity of these genes in this species. They provide strong evidence that the expression of bursicon and its receptor depend on temperature, thereby showing that this trait is regulated through both parts of the signaling mechanism.

      Several parallel measurements of the phenotype were performed to show the effects of this hormone, its receptor, and an upstream regulator (miR-6012), on cuticle deposition and pigmentation (if not polyphenism per se, as claimed). Specifically, chitin staining and TEM of the cuticle qualitatively show differences between controls and knockdowns, and this is supported by some statistical tests of quantitative measurements (although see comments below). Thus, this study provides strong evidence that bursicon and its receptor play an important role in cuticle deposition and pigmentation in this psyllid.

      The study identified four miRNAs which might affect bursicon due to sequence motifs. By manipulating levels of synthetic miRNA agonists, the study successfully identified one of them (miR-6012) to cause a cuticle phenotype. Moreover, this miRNA was localized (by FISH) to the cuticle, body-wide. To our knowledge, this is the first demonstrated function for this miRNA, and this study provides a good example of using a gene of known function as an entry point to discovering others influencing a trait. Thus, this finding reveals another level of regulation of cuticle formation in insects.

      Weaknesses:

      (1) The introduction to this manuscript does not accurately reflect progress in the field of mechanisms underlying polyphenism (e.g., line 60). There are several models for polyphenism that have been used to uncover molecular mechanisms in at least some detail, and this includes seasonal polyphenisms in Hemiptera. Therefore, the justification for this study cannot be predicated on a lack of knowledge, nor is the present study original or unique in this line of research (e.g., as reviewed by Zhang et al. 2019; DOI: 10.1146/annurev-ento-011118-112448). The authors are apparently aware of this, because they even provide other examples (lines 104-108); thus the introduction seems misleading as framed.

      Thanks for your excellent suggestion. We have added the paper of Zhang et al. 2019, which was recommended by the reviewer (DOI: 10.1146/annurev-ento-011118-112448), in Line 57 of our revised manuscript. The statement has been revised to “However, the specific molecular mechanism underling temperature-dependent polyphenism still require further clarification” in Line 60-61 of our revised manuscript.

      (2) The data in Figure 2H show "percent of transition." However, the images in 2I show insects with tanned cuticle (control) vs. those without (knockdown). Yet, based on the description of the Methods provided, there appears to be no distinction between "percent of transition" and "percent with tanning defects". This is an important distinction to make if the authors are going to interpret cuticle defects as a defect in the polyphenism. Furthermore, there is no mention of intermediate phenotypes. The data in 2H are binned as either present or absent, and these are the phenotypes shown in 2I. Was the phenotype really an all-or-nothing response? Instead of binning, which masks any quantitative differences in the tanning phenotypes, the authors should objectively quantify the degree of tanning and plot that. This would show if and to what degree intermediate tanning phenotypes occurred, which would test how bursicon affects the threshold response. This comment also applies to the data in Figures 4G and 6G. Since cuticle tanning is present in more insects than just those with seasonal polyphenism, showing how this responds as a threshold is needed to make claims about polyphenism.

      We appreciate your insightful comments. As shown in Figure 1 of our published paper (Zhang et al., 2023; doi.org/10.7554/eLife.88744.3) and Figure 2C-2I of the current manuscript, the transition from summer-form to winter-form entails not only external cuticular tanning but also alterations in internal cuticular chitin levels and cuticle thickness. While external cuticular tanning serves as a prominent and easily observable indicator of this transition, it is crucial to acknowledge that internal changes also play a significant role and should be taken into consideration. Therefore, we propose that the term "percent of transition" may be more suitable than "percent with tanning defects" to describe this process accurately.

      In order to provide a more visually comprehensive understanding of the phenotypic changes during the transition from summer-form to winter-form, we have included images at different time points (3, 6, 9, 12, 15 days) under different temperature conditions in Figure S1A of our revised manuscript. Specifically, under the 10°C condition, nymphs exhibit abdomen tanning after 6 and 9 days of treatment, while the thorax remains untanned. By days 12 to 15, both the abdomen and thorax of the nymphs show tanning, resulting in the majority of summer-form nymphs transitioning into winter-form, as depicted in Figure 2I for comparison. This observation indicates the presence of a critical threshold for cuticle tanning of C. chinensis following exposure to 10°C. Nymphs that did not undergo the transition to winter-form succumbed to the cold, highlighting the absence of intermediate phenotypes at 12-15 days under the 10°C condition. The UV absorbance of the total pigment extraction at a 300 nm wavelength markedly increases following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1B). Additionally, cuticle thickness shows an increase following 10°C exposure for 6, 9, 12, and 15 days compared to the 25°C treatment group (Figure S1C). These results highlight the relationship between the threshold of cuticular tanning and the transition process. The detailed description and information have been added in Results (L122-143), Materials and Methods (L647-652), and Discussion (L319-322) of our manuscript.

      (3) This study also does not test the threshold response of cuticle phenotypes to levels of bursicon, its receptor, or miR-6012. Hormone thresholds are the most widespread and, in most systems where polyphenism has been studied, the defining characteristic of a polyphenism (e.g., Nijhout, 2003, Evol Dev). Quantitative (not binned) measurements of a polyphenism marker (e.g., chitin) should be demonstrated to result as a threshold titer (or in the case of the receptor, expression level) to distinguish defects in polyphenism from those of its component trait.

      Thanks for your valuable feedback. We have supplemented additional data on the phenotypes (Figure S1A), cuticle pigment absorbance (Figure S1B), cuticle thickness (Figure S1C), expression levels of bursicon (Figure 1E and 1F), its receptors (Figure 3G), and miR-6012 (Figure 5E) corresponding to nymphs treated over different time periods (3, 6, 9, 12, 15 days) under both 10°C and 25°C conditions in our revised manuscript.

      While all these identified markers exhibit a strong correlation with the transition from summer-form to winter-form, it is important to note that they are not suitable as definitive thresholds due to the nature of relative gene expression quantification and chitin content assessment, rather than absolute quantitation. Further, given that tanning hormones are neuropeptides present in trace amounts in insects, unlike steroid hormones, determining their titers poses a considerable challenge.

      (4) Cuticle issue:

      (a) Unlike Fig. 6D and F, Figs. 2D and F do not correspond to each other. Especially the lack and reduction of chitin in ds-a+b! By fluorescence microscopy there is hardly any signal, whereas by TEM there is a decent cuticle. Additionally, the dsGFP control cuticle in 2D is cut obliquely with a thick and a thin chitin layer. This is misleading.

      Thanks for your insightful feedback. We have replaced the previous WGA chitin staining images in the dsCcbursα+β treatment of Figure 2D with new representative images aligning with Figure 2F. Furthermore, the presence of both thin and thick chitin layers observed in the dsEGFP treatment of Figure 2D could potentially be ascribed to the chitin content in the insect midgut or fat body as previously discussed (Zhu et al., 2016). It is notable that during the process of cuticle staining, the chitin located in the midgut and fat body of C. chinensis may exhibit green fluorescence, leading to the appearance of a thin chitin layer. A detailed analysis and elucidation of these observations have been added in the discussion section (Lines 347-352) of our revised manuscript.

      Zhu KY, Merzendorfer H, Zhang W, Zhang J, Muthukrishnan S. Biosynthesis, Turnover, and Functions of Chitin in Insects. Annu Rev Entomol. 2016;61:177-196. doi:10.1146/annurev-ento-010715-023933.

      (b) In Figs. 2F and 4F, the endocuticle appears to be missing, a portion of the procuticle that is produced post-molting. As tanning is also occurring post-molting, there seems to be a general problem with cuticle differentiation at this time point. This may be a timing issue. Please clarify.

      Thank you for your suggestion. The insect cuticle typically comprises three distinct layers (endocuticle, exocuticle, and epicuticle), with the thickness of each layer varying among different insect species. Cuticle differentiation is closely linked to the molting cycle of insects (Mrak et al., 2017). In our study, nymphal cuticles exhibited normal differentiation patterns, characterized by a thin epicuticle and comparable widths of the endocuticle and exocuticle following dsEGFP treatment, as illustrated in Figure 2F and 4F. Conversely, nymphs treated with dsCcBurs-α, dsCcBurs-β, and dsCcburs-R displayed impaired development, manifesting only the exocuticle without a discernible endocuticle layer. These findings suggest that bursicon genes and their receptor play a pivotal role in regulating insect cuticle development (Costa et al., 2016). We have added some discussion about these results in Lines 356-367 of our revised manuscript.

      Mrak, P., Bogataj, U., Štrus, J., & Žnidaršič, N. (2017). Cuticle morphogenesis in crustacean embryonic and postembryonic stages. Arthropod structure & development, 46(1), 77–95. https://doi.org/10.1016/j.asd.2016.11.001

      Costa, C. P., Elias-Neto, M., Falcon, T., Dallacqua, R. P., Martins, J. R., & Bitondi, M. (2016). RNAi-mediated functional analysis of Bursicon genes related to adult cuticle formation and tanning in the Honeybee, Apis mellifera. PloS one, 11(12), e0167421. https://doi.org/10.1371/journal.pone.0167421

      (c) To provide background information, it would be useful to analyze cuticle formation in the summer and winter morphs of controls separately by light and electron microscopy. More baseline data on these two morphs is needed.

      Thanks for your valuable feedback. To provide more background information about cuticle formation, we supplied the results of nymph phenotypes, cuticle pigment absorbance, and cuticle thickness at distinct time intervals (3, 6, 9, 12, 15 days) under different temperatures of 10°C and 25°C in Figure S1 of our revised manuscript. We hope these results help to better understand the baseline data on these two morphs.

      (d) For the TEM study, it is not clear whether the same part of the insect's thorax is being sectioned each time, or if that matters. There is not an obvious difference in the number of cuticular layers, but only the relative widths of those layers, so it is difficult to know how comparable those images are. This raises two questions that the authors should clarify. First, is it possible that certain parts of the thoracic cuticle, such as those closer to the intersegmental membrane, are naturally thinner than other parts of the body? Second, is the tanning phenotype based on the thickness or on the number of chitin layers, or both? The data shown later in Figure 4I, J convincingly shows that the biosynthesis pathway for chitin is repressed, but any clarification of what this might mean for deposition of chitin would help to understand the phenotypes reported. Also, more details on how the data in Fig. 2G were collected would be helpful. This also goes for the data in Fig. 4 (bursicon receptor knockdowns).

      Thanks for your great comment. The TEM investigation followed a standardized protocol, as previously described (Zhang et al., 2023). Initially, insect heads were uniformly excised and then fixed in 4% paraformaldehyde. Subsequently, a consistent cutting and staining procedure was executed at a uniform distance above the insect's thorax. The dorsal region of the thorax was specifically chosen for subsequent fluorescence imaging or transmission electron microscopy assessments, with the specific objective of quantifying cuticle thickness. To measure cuticle thickness, the built-in measuring ruler in the software was used to mark the top and bottom of the cuticle on the same horizontal line. The cuticle of each nymph was measured at two close locations, and six nymphs were used for each sample. Nine values were then randomly selected and plotted. The related description has been added in the Materials and Methods (Line 660-668) of our revised manuscript.
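
      As an illustration only, the sampling scheme described above (two nearby measurements per nymph, six nymphs per sample, nine values drawn at random for plotting) could be sketched in Python as follows. The function name, the fixed seed, and the thickness values are hypothetical placeholders, not data or code from the study.

```python
import random

def select_values_for_plot(measurements_per_nymph, n_select=9, seed=0):
    """Pool per-nymph thickness measurements and randomly pick n_select of them.

    measurements_per_nymph: one tuple of measurements per nymph (hypothetical input format).
    seed: fixed here only so the illustrative draw is reproducible (an assumption).
    """
    # Flatten the per-nymph measurements into one pool (6 nymphs x 2 locations = 12 values).
    pooled = [value for nymph in measurements_per_nymph for value in nymph]
    if len(pooled) < n_select:
        raise ValueError("not enough measurements to select from")
    rng = random.Random(seed)
    # Draw without replacement so no single measurement is plotted twice.
    return rng.sample(pooled, n_select)

# Hypothetical thickness values (micrometres): six nymphs, two nearby locations each.
example = [(2.1, 2.3), (1.9, 2.0), (2.4, 2.2), (2.0, 2.1), (2.5, 2.6), (1.8, 1.7)]
selected = select_values_for_plot(example)
print(len(selected), "values selected for plotting")
```

      Drawing without replacement from the pooled values is one reading of "randomly select 9 values"; other schemes, such as stratifying the draw by nymph, would need a different implementation.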

      Zhang, S.D., Li, J.Y., Zhang, D.Y., Zhang, Z.X., Meng, S.L., Li, Z., & Liu, X.X. (2023). MiR-252 targeting temperature receptor CcTRPM to mediate the transition from summer-form to winter-form of Cacopsylla chinensis. eLife, 12. https://doi.org/10.7554/eLife.88744

      (5) Tissue issue:

      The timed experiments shown in all figures were done in whole animals. However, we know from Drosophila that Bursicon activity is complex in different tissues. There is, thus, the possibility, that the effects detected on different days in whole animals are misleading because different tissues--especially the brain and the epidermis, may respond differentially to the challenge and mask each other's responses. The animal is small, so the extraction from single tissue may be difficult. However, this important issue needs to be addressed.

      Thanks for your excellent suggestion. We express our heartfelt appreciation to the reviewer for their valuable input regarding the challenges involved in dissecting various tissue sections from the diminutive early instar nymphs of C. chinensis. In light of the metamorphic transition of C. chinensis across developmental stages, this study concentrated on examining the extensive phenotypic alterations. Consequently, intact samples of C. chinensis were specifically chosen for qPCR analysis. The related descriptions have been added in the Materials and Methods (Line 513, 517, 553, 555, and 613) and Discussion (Line 327-329) of our revised manuscript.

      (6) No specific information is provided regarding the procedure followed for the rescue experiments with burs-α and burs-β (How were they done? Which concentrations were applied? What were the effects?). These important details should appear in the Materials and Methods and the Results sections.

      Thanks for your excellent suggestion. For the rescue experiments, the dsRNA of CcBurs-R and proteins of burs α-α, burs β-β homodimers, or burs α-β heterodimer (200 ng/μL) were fed together. The concentration of heterodimer protein of CcBurs-α+β was 200 ng/μL. The heterodimer protein of CcBurs-α+β fully rescued the effect of RNAi-mediated knockdown on CcBurs-R expression, while α+α or β+β homodimers did not (Figure 3F). Feeding the α+β heterodimer protein fully rescued the defect in the transition percent and morphological phenotype after CcBurs-R knockdown (Figure 4G-4H). We have added the detailed methods of the rescue experiments and specific concentrations in the Materials and Methods (Line 561-563) and Results (Line 263) of our revised manuscript.

      (7) Pigmentation

      (a) The protocol used to assess pigmentation needs to be validated. In particular, the following details are needed: Were all pigments extracted? Were pigments modified during extraction? Were the values measured consistent with values obtained, for instance, by light microscopy (which should be done)?

      Thanks for your excellent comment. Our protocol for pigment extraction followed that described for Bombyx mori: the cuticles were pulverized in liquid nitrogen and then dissolved in 30 milliliters of acidified methanol (Futahashi et al., 2012; Osanai-Futahashi et al., 2012). Thus, all cuticle pigments were dissected and treated with acidified methanol. Pigments were not modified during extraction. The detailed description has been integrated into the Materials and Methods (Line 630-633) of our revised manuscript.

      Futahashi, R., Kurita, R., Mano, H., & Fukatsu, T. (2012). Redox alters yellow dragonflies into red. Proceedings of the National Academy of Sciences of the United States of America, 109(31), 12626–12631. https://doi.org/10.1073/pnas.1207114109

      Osanai-Futahashi, M., Tatematsu, K. I., Yamamoto, K., Narukawa, J., Uchino, K., Kayukawa, T., Shinoda, T., Banno, Y., Tamura, T., & Sezutsu, H. (2012). Identification of the Bombyx red egg gene reveals involvement of a novel transporter family gene in late steps of the insect ommochrome biosynthesis pathway. The Journal of biological chemistry, 287(21), 17706–17714. https://doi.org/10.1074/jbc.M111.321331

      (b) In addition, pigmentation occurs post-molting; thus, the results could reflect indirect actions of bursicon signaling on pigmentation. The levels of expression of downstream pigmentation genes (ebony, lactase, etc) should be measured and compared in molting summer vs. winter morphs.

      Thanks for your valuable suggestion. Actually, we have already studied the function of some downstream pigmentation genes, including ebony, Lactase, Tyrosine hydroxylase, Dopa decarboxylase, and Acetyltransferase. The variations in the expression patterns of these genes are closely tied to the molting dynamics of nymphs undergoing transitions between summer-form and winter-form. These findings will be reported in another manuscript currently being prepared for submission; thus, the detailed outcomes are not suitable for inclusion in the current manuscript.

      (8) L236: "while the heterodimer protein of CcBurs α+β could fully rescue the effect of CcBurs-R knockdown on the transition percent (Figure 4G 4H)". This result seems contradictory. If CcBurs-R is the receptor of bursicon, the heterodimer protein of CcBurs α+β should not be able to rescue the effect of CcBurs-R knockdown insects. How can a neuropeptide protein rescue the effect when its receptor is not there! If these results are valid, then the CcBurs-R would not be the (sole) receptor for CcBurs α+β heterodimer. This is a critical issue for this manuscript and needs to be addressed (also in L337 in Discussion).

      Thanks for your insightful suggestion. Following the administration of dsCcBurs-R to C. chinensis, the expression of CcBurs-R exhibited a reduction of approximately 66-82% as depicted in Figure 4A, rather than complete suppression. Activation of endogenous CcBurs-R through feeding of the α+β heterodimer protein results in an increase in CcBurs-R expression, with the effectiveness of the rescue contingent upon the dosage of the α+β heterodimer protein. Consequently, the capacity of the α+β heterodimer protein to effectively mitigate the impacts of CcBurs-R knockdown on the conversion rate is clearly demonstrated. We have added additional discussion in Line 396-403 of our revised manuscript.

      (9) Fig. 5D needs improvement (the magnification is poor) and further explanation and discussion. mi6012 and CcBurs-R seem to be expressed in complementary tissues--do we see internal tissues also (see problem under point 2)? Again, the magnification is not high enough to understand and appreciate the relationships discussed.

      Thanks for your valuable suggestion. In order to enhance the resolution of the magnified images, we conducted FISH co-localization of miR-6012 and CcBurs-R in 3rd instar nymphs and obtained detailed zoomed-in images. As shown in the magnified view of Figure 5D, miR-6012 and CcBurs-R appear to exhibit complementary expression patterns in tissues. During the FISH assays, epidermis transparency of C. chinensis was achieved via decolorization treatment. Noteworthy observations from Figure 3G and Figure 5E reveal an inverse correlation in the expression profiles of CcBurs-R and miR-6012. Consequently, the FISH results distinctly highlight a significant disparity in the expression levels of CcBurs-R and miR-6012 within the same tissue. We have added related explanation and discussion in Line 291-293 of our revised manuscript.

      (10) The schematic in Fig. 7 is a useful summary, but there is a part of the logic that is unsupported by the data, specifically in terms of environmental influence on cuticle formation (i.e., plasticity). What is the evidence that lower temperatures influence expression of miR-6012? The study measures its expression over life stages, whether with an agonist or not, over a single temperature. Measuring levels of expression under summer form-inducing temperature is necessary to test the dependence of miR-6012 expression on temperature. Otherwise, this result cannot be interpreted as polyphenism control, but rather the control of a specific trait.

      Thanks for your great suggestion. We actually assessed miR-6012 expression at specific time intervals (3, 6, 9, 12, 15 days) under two different temperatures, 10°C and 25°C. As depicted in Figure 5E, the expression levels of miR-6012 were notably reduced at 10°C compared to 25°C. Additionally, the evaluation of the agomir-6012 expression level of C. chinensis under 25°C conditions at various time points (3, 6, 9, 12, 15 days) revealed no significant changes. Hence, we suggest that the impact of miR-6012 on the seasonal morphological transition depends on temperature.

      Recommendations for the authors:

      The authors report a novel role of Bursicon and its receptor in regulating the seasonal polyphenism of Cacopsylla chinensis. They found that low temperature treatment (10°C) activated the Bursicon signaling pathway during the transition from summer-form to winter-form, which influences cuticle pigment content, cuticle chitin content, and cuticle thickness. Moreover, the authors identified miR-6012 and show that it targets CcBurs-R, thereby modulating the function of Bursicon signaling pathway in the seasonal polyphenism of C. chinensis. This discovery expands our knowledge of multiple roles of neuropeptide bursicon action in arthropod biology. However, the manuscript does have several major weaknesses, described under "Public review", which the authors need to address.

      Major issues:

      (1) L152-154 Fig S2E and S2F: Bursicon has been shown to be expressed in the CNS in a specific set of neurons. For example, In the larval CNS of Manduca sexta, bursicon expression is restricted to the subesophageal ganglion (SG), thoracic ganglia, and first abdominal ganglion. Pharate pupae and pharate adults show expression of this heterodimer in all ganglia. In Drosophila larvae, expression of a bursicon heterodimer is confined to abdominal ganglia. The additional neurons in the ventral nerve cord express only burs. In pharate adults, bursicon is produced by neurons in the SG and abdominal ganglia. I am wondering where bursicon subunits are expressed in the C. chinensis CNS? Since the authors have the antibodies, it would be useful to include immunocytochemical staining of bursicon alpha and beta in the CNS. The qPCR results from head or other tissues (Fig S2E and S2F) is not the most informative way to document localization of gene expression. Regarding the qPCR results, they show that the cuticle and the fat body express CcBurs-α and CcBurs-β. Can the authors confirm this unexpected results independently?

      Thanks for your insightful comment. In this study, we did not directly use antibodies targeting bursicon subunits; instead, the bursicon subunits along with a histidine tag were integrated into the expression vector pcDNA3.1 using homologous recombination. The experimental procedures were executed as follows: initially, the histidine tag was fused to the pcDNA3.1-mCherry vector through homologous recombination to generate the recombinant plasmid pcDNA3.1-his-mCherry. Subsequently, the amino acid sequences of the two bursicon subunits were introduced into the pcDNA3.1-his-mCherry vector via homologous recombination to produce the recombinant plasmids pcDNA3.1-CcBurs-α-his-mCherry and pcDNA3.1-CcBurs-β-his-mCherry. Finally, the P2A sequence was incorporated into the vector using reverse PCR to yield the recombinant plasmids pcDNA3.1-CcBurs-α-his-P2A-mCherry and pcDNA3.1-CcBurs-β-his-P2A-mCherry. Consequently, the bursicon subunits, along with the histidine tag, were capable of generating fusion proteins with the histidine tag. Western blot analysis was conducted using antibodies targeting the histidine tag, enabling the detection of histidine expression, which corresponds to the expression of the bursicon subunits. However, these antibodies are not suitable for in vivo immunocytochemical staining of bursicon alpha and beta in the CNS.

      Due to the diminutive size of the C. chinensis nymphs, dissection of the central nervous system (CNS) was unfeasible, precluding specific assessment of bursicon expression in the CNS. Prior literature has documented the expression of bursicon subunits in the epidermis and fat body of C. chinensis. Studies suggest that bursicon subunits not only play a role in the melanization and sclerotization processes of insect epidermis but also have significant roles in insect immunity (An et al., 2012). The presence of bursicon subunits in the epidermis, gut, and fat body of C. chinensis may indicate their crucial roles in the immune functions of these tissues. Further investigation is required to elucidate the specific immune functions they perform, hinting at the potential expression of these bursicon subunits in these two tissues.

      An, S., Dong, S., Wang, Q., Li, S., Gilbert, L. I., Stanley, D., & Song, Q. (2012). Insect neuropeptide bursicon homodimers induce innate immune and stress genes during molting by activating the NF-κB transcription factor Relish. PloS one, 7(3), e34510. https://doi.org/10.1371/journal.pone.0034510

      (2) L222: "CcBurs-R is the Bursicon receptor of C. chinensis". Is this statement supported by affinity binding assay results?

      Thanks for your excellent suggestion. We employed a fluorescence-based assay to quantify calcium ion concentrations and investigate the binding affinities of bursicon heterodimers and homodimers to the bursicon receptor across varying concentrations. Our findings suggest that activation of the receptor by the burs α-β heterodimer leads to significant alterations in intracellular calcium ion levels, whereas stimulation with burs α-α and burs β-β homodimers, in conjunction with Adipokinetic hormone (AKH), maintains consistent intracellular calcium ion levels. Consequently, this research definitively identifies CcBurs-R as the bursicon receptor. For further details, please refer to the Materials and Methods (Lines 493-504), Results (Lines 231-239), and Discussion (Lines 377-384) of our revised manuscript.

      (3) L245 Figure 4I-4J: Since knockdown of bursicon and its receptor cause a decrease pigment accumulation in the cuticle, it would be useful to examine 1-2 rate limiting enzyme-encoding genes in the bursicon regulated cuticle darkening process if possible (as was done for genes involved in cuticle thickening).

      Thanks for your excellent comment. Following the further study, a thorough analysis was conducted to evaluate the impact of bursicon and its receptor on the expression levels of Lactase, Tyrosine hydroxylase, Dopa decarboxylase, Acetyltransferase, and the effects of RNA interference targeting these genes on the seasonal morphological transition. The findings underscored their role in the bursicon-mediated cuticle darkening process. However, as this section is slated for inclusion in an upcoming manuscript intended for submission, it is deemed unsuitable for incorporation into the current manuscript.

      Minor issues:

      (1) L75 "stronger resistance (Ge et al., 2019; Tougeron et al., 2021)". Stronger resistance to what? Stronger resistance to environmental stress or weather condition? Please clarify.

      Thanks for your excellent suggestion. We have changed the statement to “stronger resistance to weather condition” in Line 75 of our revised manuscript.

      (2) L132 Figure 1A and 1B: Bursicon sequence was first identified and functionally characterized in Drosophila melanogaster: is there any reason why Drosophila bursicon sequences were not included in the comparison?

      Thanks for your excellent comment. We have added the sequence of Burs-α and Burs-β of D. melanogaster in the sequence alignment results of Figure 1A and 1B of our revised manuscript.

      (3) Although the authors clearly identify and validate the function for the bursicon genes and its receptor's, there is no mention of whether duplicates of this gene are also present in the pear psyllid. This has been known to happen in otherwise conserved hormone pathways (e.g., insulin receptor in some insects), so a formal check of this should be done.

      Thanks for your excellent comment. As shown in Figure S2A-S2B and 3B, there are two bursicon subunit genes and only one bursicon receptor gene in our selected insect species, for example Drosophila melanogaster, Diaphorina citri, Bemisia tabaci, Nilaparvata lugens, and Sogatella furcifera. In our transcriptome database of C. chinensis, we also identified only two bursicon subunit genes and only one bursicon receptor gene.

      (4) Line 41: Here, as in the title, "fascinating" is a subjective judgement that does not improve a study's presentation.

      Thanks for your great comment. We have changed "fascinating" to "transformation" in Line 41 and also revised the title of our revised manuscript.

      (5) Line 44: What makes some fields "cutting-edge" and others not?

      Thanks for your excellent suggestion. The expression of "in cutting-edge fields" has been deleted in Line 44 of our revised manuscript.

      (6) Line 97: This is a peculiar choice of reference for the concept of slower development in cold temperatures. The concept of degree-days and growth rates is old and widespread in entomology.

      Thanks for your insightful comment. The reference of Nyamaukondiwa et al., 2011 in Line 95 has been deleted in our revised manuscript.

      (7) Lines 149-150: What justifies the assumption that higher levels of expression mean a more important role? This gene might be just as necessary for development of the summer form, even if expressed at lower levels.

      Thanks for your excellent suggestion. This sentence has been revised to “Increased gene expression levels may potentially contribute to the transition from summer-form to winter-form in C. chinensis.” in Line 168-169 of our revised manuscript.

      (8) The blue arrow in Fig. 7 is confusing.

      Thanks for your excellent suggestion. In Figure 7, the blue arrow represents the down-regulated expression of miR-6012. We have added a description about the blue arrow in Figure 7 of our revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Over the last decade, numerous studies have identified adaptation signals in modern humans driven by genomic variants introgressed from archaic hominins such as Neanderthals and Denisovans. One of the most classic signals comes from a beneficial haplotype in the EPAS1 gene in Tibetans that is evidently of Denisovan origin and facilitated high altitude adaptation (HAA). Given that HAA is a complex trait with numerous underlying genetic contributions, in this paper Ferraretti et al. asked whether additional HAA-related genes may also exhibit a signature of adaptive introgression. Specifically, the authors considered that if such a signature exists, they most likely are only mild signals from polygenic selection, or soft sweeps on standing archaic variation, in contrast to a strong and nearly complete selection signal like in the EPAS1. Therefore, they leveraged two methods, including a composite likelihood method for detecting adaptive introgression and a biological network-based method for detecting polygenic selection, and identified two additional genes that harbor plausible signatures of adaptive introgression for HAA.

      Strengths: 

      The study is well motivated by an important question, which is, whether archaic introgression can drive polygenic adaptation via multiple small effect contributions in genes underlying different biological pathways regulating a complex trait (such as HAA). This is a valid question and the influence of archaic introgression on polygenic adaptation has not been thoroughly explored by previous studies.

      The authors reexamined previously published high-altitude Tibetan whole genome data and applied a couple of the recently developed methods for detecting adaptive introgression and polygenic selection. 

      Weaknesses: 

      My main concern with this paper is that I am not too convinced that the reported genomic regions putatively under polygenic selection are indeed of archaic origin. Other than some straightforward population structure characterizations, the authors mainly did two analyses with regard to the identification of adaptive introgression: First, they used one composite likelihood-based method, the VolcanoFinder, to detect the plausible archaic adaptive introgression and found two candidate genes (EP300 and NOS2). Next, they attempted to validate the identified signal using another method that detects polygenic selection based on biological network enrichments for archaic variants.

      In general, I don't see in the manuscript that the choice of methods here are well justified. VolcanoFinder is one among the several commonly used methods for detecting adaptive introgression (eg. the D, RD, U, and Q statistics, genomatnn, maldapt etc.). Even if the selection was mild and incomplete, some of these other methods should be able to recapitulate and validate the results, which are currently missing in this paper. Besides, some of the recent papers that studied the distribution of archaic ancestry in Tibetans don't seem to report archaic segments in the two gene regions. These all together made me not sure about the presence of archaic introgression, in contrast to just selection on ancestral variation.

      Furthermore, the authors tried to validate the results by using signet, a method that detects enrichments of alleles under selection in a set of biological networks related to the trait. However, the authors did not provide sufficient description on how they defined archaic alleles when scoring the genes in the network. In fact, reading from the method description, they seemed to only have considered alleles shared between Tibetans and Denisovans, but not necessarily exclusively shared between them. If the alleles used for scoring the networks in Signet are also found in other populations such as Han Chinese or Africans, then that would make a substantial difference in the result, leading to potential false positives.

      Overall, given the evidence provided by this article, I am not sure they are adequate to suggest archaic adaptive introgression. I recommend additional analyses for the authors to consider for rigorously testing their hypothesis. Please see the details in my review to the authors. 

      Reviewer #2 (Public Review):

      In Ferraretti et al. they identify adaptively introgressed genes using VolcanoFinder and then identify pathways enriched for adaptively introgressed genes. They also use Signet to identify pathways that are enriched for Denisovan alleles. The authors find that angiogenesis and nitric oxide induction are enriched for archaic introgression.

      Strengths: 

      Most papers that have studied the genetic basis of high altitude (HA) adaptation in Tibet have highly emphasized the role of a few genes (e.g. EPAS1, EGLN1), and in this paper, the authors look for more subtle signals in other genes (e.g EP300, NOS2) to investigate how archaic introgression may be enriched at the pathway level.

      Looking into the biological functions enriched for Denisovan introgression in Tibetans is important for characterizing the impact of Denisovan introgression.

      Weaknesses: 

      The manuscript lacks details or justification about how/why some of the analyses were performed. Below are some examples where the authors could provide additional details.

      The authors made specific choices in their window analysis. These choices are not justified or there is no comment as to how results might change if these choices were perturbed. For example, in the methods, the authors write "Then, the genome was divided into 200 kb windows with an overlap of 50 kb and for each of them we calculated the ratio between the number of significant SNVs and the total number of variants." 

      Additional information is needed for clarity. For example, "we considered only protein-protein interactions showing confidence scores ≥ 0.7 and the obtained protein frameworks were integrated using information available in the literature regarding the functional role of the related genes and their possible involvement in high-altitude adaptation." What do the confidence scores mean? Why 0.7?

      In the method section (Identifying gene networks enriched for Denisovan-like derived alleles), the authors write "To validate VolcanoFinder results by using an independent approach". Does this mean that for signet the authors do not use the regions identified as adaptively introgressed using volcanofinder? I thought in the original signet paper, the authors used a summary describing the amount of introgression of a given region.

      Later, the authors write "To do so, we first compared the Tibetan and Denisovan genomes to assess which SNVs were present in both modern and archaic sequences. These loci were further compared with the ancestral reconstructed reference human genome sequence (1000 Genomes Project Consortium et al., 2015) to discard those presenting an ancestral state (i.e., that we have in common with several primate species)." It is not clear why the authors are citing the 1000 genomes project. Are they comparing with the reference human genome reference or with all populations in the 1000 genomes project? Also, are the authors allowing derived alleles that are shared with Africans? Typically, populations from Africa are used as controls since the Denisovan introgression occurred in Eurasia.

      The methods section for Figures 4B, 4C, and 4D is a little hard to understand. What is the x-axis on these plots? Is it the number of pairwise differences to Denisovan? The caption is not clear here. The authors mention that "Conversely, for non-introgressed loci (e.g., EGLN1), we might expect a remarkably different pattern of haplotypes distribution, with almost all haplotype classes presenting a larger proportion of non-Tibetan haplotypes rather than Tibetan ones." There is clearly structure in EGLN1. There is a group of non-Tibetan haplotypes that are closer to Denisovan and a group of Tibetan haplotypes that are distant from Denisovan...How do the authors interpret this? 

      In the original signet paper (Gouy and Excoffier 2017), they apply signet to data from Tibetans. Zhang et al. PNAS (2021) also applied it to Tibetans. It would be helpful to highlight how the approach here is different. 

      We thank the Reviewers for having appreciated the rationale of our study and for having identified potential issues that deserve to be addressed in order to better focus on robust results specifically supported by multiple approaches.

      First, we agree with the Reviewers that clarification and justification for the methodologies adopted in the present study should be deepened with respect to what was done in the original version of the manuscript, with the purpose of making it more intelligible for a broad range of scientists. As reported thoroughly in the revised version of the text, the VolcanoFinder algorithm, which we used as the primary method to discover new candidate genomic regions affected by events of adaptive introgression, was chosen among several approaches developed to detect signatures ascribable to such an evolutionary process for the following reasons: i) VolcanoFinder is one of the few methods that can jointly test events of both archaic introgression and adaptive evolution (e.g., the D statistic cannot formally test for the action of natural selection, having also been developed to provide genome-wide estimates of allele sharing between archaic and modern groups rather than to identify specific genomic regions enriched for introgressed alleles); ii) the model tested by the VolcanoFinder algorithm differs remarkably from those considered by other methods typically used to test for adaptive introgression, such as the RD, U and Q statistics, which are aimed at identifying chromosomal segments showing low divergence with respect to a specific archaic sequence and/or enriched in alleles uniquely shared between the admixed group and the source population, as well as characterized by a frequency above a certain threshold in the population under study, thus being useful especially to test an evolutionary scenario conforming to that expected if adaptation was mediated by strong selective sweeps rather than weak polygenic mechanisms (see answer to comment #1 of Reviewer #1 for further details); iii) VolcanoFinder relies on less demanding computational efforts with respect to other algorithms, such as genomatnn and MaLAdapt, which also need to be trained on large genomic simulations built specifically to reflect the evolutionary history of the population under study, thus increasing the possibility of introducing bias in the obtained results if the information that guides simulation approaches is not accurate.

      Despite that, we agree with Reviewer #2 that some criteria formerly implemented during the filtering of VolcanoFinder results (e.g., normalization of LR scores, use of a sliding windows approach, and implementation of enrichment analysis based on specific confidence scores) might introduce erratic changes, which depend on the thresholds adopted, in the list of the genomic regions considered as the most likely candidates to have experienced adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we have opted to use raw LR scores and to shortlist the most significant results by focusing on loci showing values falling in the top 5% of the genomic distribution obtained for such a statistic (see Materials and methods for details). 
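
      The shortlisting rule described above (retaining loci whose raw LR score falls in the top 5% of the genomic distribution) can be sketched as follows. This is only an illustrative sketch, not the code used in the study: the function name `top_fraction_loci`, the locus labels, and the toy scores are hypothetical and are not part of the VolcanoFinder software.

```python
import math


def top_fraction_loci(lr_scores, fraction=0.05):
    """Return the loci whose raw LR score lies in the top `fraction` of all scores.

    `lr_scores` is assumed to be a mapping {locus label: raw LR score}.
    Ties at the threshold are all retained.
    """
    if not lr_scores:
        return {}
    ranked = sorted(lr_scores.values(), reverse=True)
    # Number of loci that make up the requested upper fraction (at least one).
    n_keep = max(1, math.ceil(len(ranked) * fraction))
    threshold = ranked[n_keep - 1]
    return {locus: score for locus, score in lr_scores.items() if score >= threshold}


# Toy input: 100 hypothetical loci with scores 1.0 .. 100.0 (not real data).
toy_scores = {f"locus_{i}": float(i) for i in range(1, 101)}
shortlist = top_fraction_loci(toy_scores)
```

      Working on raw scores with a single rank-based cutoff keeps the number of tunable parameters to one (the fraction), which is the motivation given above for dropping normalization and window-based filtering.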

      Moreover, to further reduce the use of potentially arbitrary filtering thresholds, we decided not to implement functional enrichment analysis to prioritize results from the VolcanoFinder method. To this end, although a STRING confidence score (i.e., the approximate probability that a predicted interaction exists between two proteins belonging to the same functional pathway according to information stored in the KEGG database) above 0.7 is generally considered a high confidence score (string-db.org, Szklarczyk et al. 2014), we replaced such a prioritization criterion by considering as the most robust candidates for adaptive introgression only those genomic regions that turned out to be supported by all the approaches used (i.e., VolcanoFinder, Signet, LASSI and Haplostrips analyses).

      According to the Reviewers’ comments on the use of the Signet algorithm, we realized that the rationale behind such a validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as it was formerly used by Gouy et al. 2017), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier 2020 by searching for significant networks of genes presenting archaic-derived variants observable in the considered Tibetan populations but not in an outgroup population of African ancestry. Accordingly, we used the Signet method as an independent approach to obtain a first validation of introgressed (but not necessarily adaptive) loci pointed out by VolcanoFinder results. 

      In detail, in response to the question by Reviewer #2 about which genomic regions have been considered in the Signet analysis, it is necessary to clarify that to obtain the input score associated with each gene along the genome, as required by the algorithm, we calculated average frequency values per gene by considering all the archaic-derived alleles included in the Tibetan dataset but not in the outgroup one. Therefore, we did not take into account only those loci identified as significant by VolcanoFinder analysis, but we performed an independent genome scan. Then, we crosschecked significant results from VolcanoFinder and Signet approaches and we shortlisted the genomic regions supported by both. This approach thus differs from that of Zhang et al. 2021 in which the input scores per gene were obtained by considering only those loci previously pointed out by another method as putatively introgressed. Moreover, as mentioned in the previous paragraph, our approach differs also from that implemented by Gouy et al. 2017, in which the input scores assigned to each gene were represented by the variants showing the smallest P-value associated with a selection statistic, being thus informative about putative adaptive events but not introgression ones.
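
      The per-gene input score described above (the average Tibetan frequency of the retained archaic-derived alleles within each gene) can be illustrated with a short sketch. It assumes the alleles have already been filtered and annotated to genes; the function name `gene_scores`, the gene labels, and the frequencies are hypothetical, and this is not the Signet implementation itself.

```python
from collections import defaultdict


def gene_scores(allele_records):
    """Average the Tibetan frequency of archaic-derived alleles within each gene.

    `allele_records` is assumed to be an iterable of (gene label, Tibetan frequency)
    pairs, one per allele that already passed the archaic-derived filter.
    """
    totals = defaultdict(lambda: [0.0, 0])  # gene -> [sum of frequencies, allele count]
    for gene, frequency in allele_records:
        totals[gene][0] += frequency
        totals[gene][1] += 1
    return {gene: total / count for gene, (total, count) in totals.items()}


# Toy input with hypothetical gene labels and frequencies (not real data).
toy_records = [("GENE_A", 0.25), ("GENE_A", 0.75), ("GENE_B", 0.5)]
scores = gene_scores(toy_records)
```

      Because every gene receives a score from the genome-wide allele set, the scan does not depend on which loci VolcanoFinder flagged, which is what makes the two analyses independent.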

      However, as correctly pointed out by both the Reviewers, we formerly performed Signet analysis by considering derived alleles shared between Tibetans and the Denisovan species, without filtering out those alleles that are observed also in other modern human populations. We agree with the Reviewers that this approach cannot rule out the possibility of retaining false positive results ascribable to ancestral polymorphisms rather than introgressed alleles. According to the Reviewers’ suggestion, we thus repeated the Signet analysis by removing derived alleles observed also in an outgroup population of African ancestry (i.e., Yoruba), by assuming that only Eurasian H. sapiens populations experienced Denisovan admixture. In detail, we considered only those alleles that: i) were shared between Tibetans and Denisovan (i.e., Denisovan-like alleles); ii) were assumed to be derived according to the comparison with the ancestral reconstructed reference human genome sequence; iii) were completely absent (i.e., present at a frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project. Although the comment of Reviewer #1 seems to propose the possible use of Han Chinese as a further control population, we decided not to filter out Denisovan-like derived alleles present also in this human group because evidence collected so far suggests that Denisovan introgression in the gene pool of East Asian ancestors predated the split between low-altitude and high-altitude populations (Lu et al. 2016; Hu et al. 2017) and, as mentioned before, we aimed at using the Signet algorithm to validate introgression events rather than adaptive ones (see the answer to comment #6 of Reviewer #1 for further details). Moreover, we would like to remark that we decided to maintain the Signet analysis as a validation method in the revised version of the manuscript because: i) comments from both the Reviewers converge in suggesting how to effectively improve this approach, and ii) it represents a method that goes beyond the simple identification of single putative introgressed alleles, by instead enabling us to point out those biological functions that might have been collectively shaped by gene flow from Denisovans.
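
      The three allele-level criteria listed above can be written as a single filter. The sketch below is a simplified illustration under stated assumptions: each variant is assumed to be pre-annotated with the three pieces of information, and the class `Variant`, its field names, and the toy records are hypothetical rather than taken from our pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One allele observed in Tibetans; all field names are hypothetical."""
    variant_id: str
    shared_with_denisovan: bool  # criterion i: Denisovan-like allele
    is_derived: bool             # criterion ii: differs from the reconstructed ancestral state
    yoruba_frequency: float      # criterion iii: frequency in the Yoruba outgroup


def keep_for_signet(variant):
    """Apply criteria i-iii; Han Chinese frequency is deliberately not used."""
    return (
        variant.shared_with_denisovan
        and variant.is_derived
        and variant.yoruba_frequency == 0.0
    )


# Toy records (not real data): only var_1 satisfies all three criteria.
toy_variants = [
    Variant("var_1", True, True, 0.0),
    Variant("var_2", True, True, 0.12),   # present in Yoruba: possible ancestral polymorphism
    Variant("var_3", True, False, 0.0),   # ancestral state
    Variant("var_4", False, True, 0.0),   # not shared with Denisovan
]
retained = [v.variant_id for v in toy_variants if keep_for_signet(v)]
```

      Requiring a frequency of exactly zero in the outgroup is the strictest form of the filter; it trades some sensitivity for a lower risk of retaining alleles that predate the Denisovan admixture.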

      In addition to validating genomic regions putatively affected by archaic introgression by crosschecking results from the VolcanoFinder and Signet analyses, according to the suggestion by Reviewer #1 we implemented a further validation procedure aimed at formally testing for the adaptive evolution of the identified candidate introgressed loci. For this purpose, we applied the LASSI likelihood haplotype-based method (Harris & DeGiorgio 2020) to Tibetan whole genome data. Notably, we chose this approach mainly for the following reasons: i) it is able to detect and distinguish genomic regions that have experienced different types of selective events (i.e. strong and weak ones); ii) it has been demonstrated to have increased power in identifying them with respect to other selection statistics (e.g., H12 and nSL) (Harris & DeGiorgio 2020). Again, we performed an independent genome scan using the LASSI algorithm and then we crosschecked the obtained significant results with those previously supported by VolcanoFinder and Signet approaches in order to shortlist genomic regions that have plausibly experienced both archaic introgression and adaptive evolution.
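
      The crosschecking step amounts to intersecting the sets of significant regions returned by the independent scans. A minimal sketch follows, assuming each method reports its significant results as a collection of region (or gene) labels; the function name `crosscheck` and the labels are hypothetical placeholders and do not correspond to actual results.

```python
def crosscheck(*significant_sets):
    """Return the regions reported as significant by every method supplied."""
    if not significant_sets:
        return set()
    return set.intersection(*(set(regions) for regions in significant_sets))


# Toy inputs with placeholder labels (not real results).
volcanofinder_hits = {"region_1", "region_2", "region_3"}
signet_hits = {"region_2", "region_3", "region_4"}
lassi_hits = {"region_3", "region_5"}

supported_by_all = crosscheck(volcanofinder_hits, signet_hits, lassi_hits)
```

      Since each scan is run genome-wide before the intersection is taken, a region is shortlisted only when independent lines of evidence agree, rather than because one method was used to pre-select the input of another.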

      Moreover, we maintained a final validation step represented by Haplostrips analysis, which was instead specifically performed on chromosomal segments supported by results from all of the VolcanoFinder, Signet, and LASSI approaches. This enabled us to assess the similarity between Denisovan haplotypes and those observed in Tibetans (i.e., the population under study in which archaic alleles might have played an adaptive role in response to high-altitude selective pressures), Han Chinese (i.e., a sister group whose common ancestors with Tibetans experienced Denisovan admixture, but which then evolved at low altitude), and Yoruba (i.e., an outgroup that is assumed not to have received gene flow from Denisovans). 

      In conclusion, we believe that the substantial changes incorporated in the manuscript according to the Reviewers’ suggestions strongly improved the study by enabling us to focus on more solid results than those formerly presented. Interestingly, although the single candidate loci supported by all the approaches now implemented for validating the obtained results have attained higher prioritization with respect to previous ones (which are supported by some but not all of the adopted methods), angiogenesis still stands out as one of the main biological functions that have been shaped by events of adaptive introgression in human groups of Tibetan ancestry. This provides new evidence for the contribution of introgressed Denisovan alleles other than the EPAS1 ones in modulating the complex adaptive responses evolved by Himalayan populations to cope with selective pressures imposed by high altitudes.

      Responses to Recommendations For The Authors:

      Reviewer #1:

      The authors mainly relied on one method, VolcanoFinder (VF), to detect adaptive introgression signals. As one of the recently developed methods, VF indeed demonstrated statistical power at detecting mild selection on archaic variants, as well as detecting soft sweeps on standing variations. However, compared to other commonly used methods for detecting adaptive introgression, such as the U and Q stats (Racimo et al. 2017), genomatnn (Gower et al. 2021), or MaLAdapt (Zhang et al. 2023), VF doesn't seem to have better power at capturing mild and incomplete sweeps. And it makes me wonder about the justification for choosing VF over other methods here, which is not clearly explained in the manuscript. If these adaptive introgression candidates are legitimate, even if the signals are mild, at least some of the other methods should be able to recapitulate the signature (even if they don't necessarily make it through the genome-wide significance thresholds). I would be more convinced about the archaic origin of these regions if the authors could validate their reported findings using some of the aforementioned other methods. 

      Following the Reviewer’s suggestion, in the revised version of the manuscript we have expanded the explanation of the rationale that guided the choice of the adopted methods. In particular, in the Materials and methods section (see page 12) we have specified the reasons for having used the VolcanoFinder algorithm. 

      First, it represents one of the few approaches that rely on a model able to test jointly the occurrence of archaic introgression and the adaptive evolution of the genomic regions affected by archaic gene flow, without the need to consider the putative source of introgression. This was a relevant aspect for us, because we planned to adopt at least two main independent methods (possibly quite different in terms of their underlying approaches) to validate the identified candidate introgressed loci, and the other algorithm we used (i.e., Signet) was explicitly based on the comparison of modern data with the archaic sequence. Accordingly, the model tested by VolcanoFinder differs from those considered by the RD, U and Q statistics. In fact, the RD statistic is aimed at identifying regions of the genome with low divergence with respect to a given archaic reference, while the U/Q statistics can detect those chromosomal segments enriched in alleles that i) are uniquely shared between the admixed group (e.g., Tibetans) and the source population (e.g., Denisovans), and ii) present a frequency above a specific threshold in the admixed population (Racimo et al. 2016). For instance, all the loci considered as likely involved in adaptive introgression events by Racimo et al. 2016 presented remarkable frequencies, with most of them showing values above 50%. That being so, we decided not to implement these methods because we believe that they are more suitable for the detection of adaptive introgression events involving few variants with a strong effect on the phenotype, which entail a substantial increase in frequency in the population subjected to the selective pressure (i.e., cases such as that of EPAS1), while it appears challenging to choose an arbitrary frequency threshold appropriate for the detection of weak and/or polygenic selective events. 

      As regards the possible use of the MaLAdapt or genomatnn approaches as validation methods, we believe that they require more demanding computational efforts than the Signet algorithm and, above all, they have the disadvantage of needing to be trained on simulated genomic data. This makes them more prone to the potential bias introduced in the obtained results by simulations that do not carefully reflect the evolutionary history of the population under study.

      Overall, we do not agree with the Reviewer’s statement that we mainly relied on a single method to detect adaptive introgression signals because, as mentioned above, the Signet algorithm was specifically used to identify genomic regions putatively affected by introgression. This method relies on assumptions very similar to those described above for the U/Q statistics (e.g., it considers alleles uniquely shared between Tibetans and Denisovans), but avoids the need to select a frequency threshold to shortlist the most likely adaptive introgressed loci. In addition, following another suggestion by the Reviewer, we have now implemented a further approach to provide evidence for the adaptive evolution of the candidate introgressed loci (see response to comment #3).  

      As regards the use of Signet, based on comments from both Reviewers we realized that the rationale behind such a validation approach was not well described in the original version of the manuscript. First and foremost, we would like to clarify that in the present study we did not use this method to test for the action of natural selection (as it was formerly used by Gouy et al. 2017), but specifically to identify genomic regions putatively affected by archaic introgression. For this purpose, we followed the approach described by Gouy and Excoffier (2020) by searching for significant networks of genes presenting archaic-derived variants observable in the considered Tibetan populations. That being so, we used the Signet method as an independent approach to obtain a first validation of VolcanoFinder results. However, following suggestions from both Reviewers, we modified the criteria adopted to filter for archaic-derived variants, by excluding those alleles in common between Denisovan and the Yoruba outgroup population (see response to comment #6 for further information regarding this aspect). 

      To sum up, we think that the combination of the VolcanoFinder and Signet+LASSI approaches offered a good compromise between the computational effort required to shortlist the most robust candidate adaptive introgressed loci and the type of model tested (i.e., one that does not discard a priori genomic signatures ascribable to weak and/or polygenic selective events). Moreover, we would like to remark that we decided to maintain the Signet method as a validation approach in the revised version of the manuscript because: i) comments from both Reviewers converge in suggesting how to effectively improve this approach, and ii) it represents a method that can be used both to perform single-locus validation analysis and to search for those biological functions that have been collectively much more impacted by archaic introgression, allowing us to test a more realistic approximation of the polygenic model of adaptation involving introgressed alleles. In fact, although the single candidate loci supported by all the approaches now implemented for validating the obtained results (see responses to comments #3 and #7 for further details) have attained higher prioritization with respect to previous ones (i.e., EP300 and NOS2, which are now supported by some but not all of the adopted methods), angiogenesis still stands out as one of the main biological functions that have been shaped by events of adaptive introgression in the ancestors of Tibetan populations. 

      Besides, I am a little surprised to see that in Supplementary Figure 2, VF didn't seem to capture more significant LR values in the EPAS1 region (positive control of adaptive introgression) than in the negative control EGLN1 region. The author explained this as the selection on EPAS1 region is "not soft enough", which I find a bit confusing. If there is no major difference in significant values between the positive and negative controls, how would the authors be convinced the significant values they detected in their two genes are true positives? I would like to see more discussion and justification of the VF results and interpretations.

      In light of this observation by the Reviewer, and of the overall comment by Reviewer #2 on the procedures implemented for filtering VolcanoFinder results, we realized that both the normalization of LR scores and the use of a sliding windows approach might introduce erratic changes, which depend on the thresholds adopted, in the list of the genomic regions considered as the most likely candidates to have experienced adaptive introgression. To avoid this issue, and to adhere more strictly to the VolcanoFinder pipeline of analyses developed by Setter et al. 2020, in the revised version of the manuscript we have opted to use raw LR scores and to shortlist the most significant results by focusing on loci showing values falling in the top 5% of the genomic distribution obtained for such a statistic (see Materials and methods, page 13, lines 4-16 for further details).
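The top-5% shortlisting step described above can be illustrated with a minimal sketch. This is not the VolcanoFinder pipeline itself: the function names, the `(variant_id, lr_score)` input layout, and the tie-handling rule are assumptions made only for illustration.

```python
def top_fraction_threshold(scores, fraction=0.05):
    """Return the cutoff such that roughly `fraction` of scores are >= cutoff.

    Hypothetical helper: the exact percentile convention used in the
    manuscript is not stated here, so this simply ranks the raw scores.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    ranked = sorted(scores, reverse=True)
    # Number of values to keep; always keep at least one.
    keep = max(1, int(len(ranked) * fraction))
    return ranked[keep - 1]


def shortlist(variants, fraction=0.05):
    """Keep variants whose raw LR score falls in the top `fraction`.

    `variants` is a list of (variant_id, lr_score) tuples (assumed layout).
    Ties at the cutoff are all retained, so slightly more than `fraction`
    of the variants may be returned.
    """
    cutoff = top_fraction_threshold([lr for _, lr in variants], fraction)
    return [(vid, lr) for vid, lr in variants if lr >= cutoff]


# Toy genome-wide distribution: 100 variants with LR scores 0.0 .. 99.0.
toy_variants = [(f"v{i}", float(i)) for i in range(100)]
top_hits = shortlist(toy_variants)
```

Because the cutoff is taken from the empirical genome-wide distribution, no normalization or sliding-window parameter has to be chosen, which is the property the response above relies on.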

      By following this approach, we indeed observed a clearer pattern than that previously described, in which the distribution of LR scores in the EPAS1 genomic region is remarkably different from that obtained for the EGLN1 gene (Figure 2 – figure supplement 1). In more detail, we identified a total of 19 EPAS1 variants showing scores within the top 5% of LR values, in contrast to only three EGLN1 SNVs. Moreover, LR values were collectively more aggregated in the EPAS1 genomic region and showed a higher average value than what was observed for EGLN1. We reported LR values, as well as -log (a) scores calculated for these control genes, in Supplement tables 3 and 4.

      Nevertheless, we agree with the Reviewer that results pointed out by VolcanoFinder need to be confirmed by additional methods, which is what we have done to define both the new candidate adaptive introgressed loci and the considered positive/negative controls. In fact, validation analyses performed to confirm signatures of both archaic introgression and adaptive evolution (i.e., Signet, LASSI and Haplostrips) converged in indicating that Tibetan variability at the EGLN1 gene does not seem to have been shaped by archaic introgression events but only by the action of natural selection (see Results, page 5 lines 3-9, page 6 lines 23-25, page 7 lines 29-36; Discussion page 14 lines 33-36; Figure 2 – figure supplement 1B and Figure 4 – figure supplement 1B, 3B and 3D), in line with what was previously proposed (Hu et al., 2017). On the other hand, results from all validation analyses confirmed adaptive introgression signatures at the EPAS1 genomic region (see Results page 4 lines 32-37, page 5 lines 1-2 and 30-34, page 6 lines 23-29; Figure 3A, 3B and Figure 4 – figure supplement 1A, 3A and 3C). 

      Finally, as already reported in the former version of the manuscript, our choice of considering EPAS1 and EGLN1 respectively as positive and negative controls for adaptive introgression was guided by previous evidence suggesting these loci as targets of natural selection in high-altitude Himalayan populations (Yang et al., 2017; Liu et al., 2022), although only EPAS1 was proved to have been involved also in an adaptive introgression event (Huerta-Sanchez et al., 2014; Hu et al., 2017). 

      With that being said, I suggest the authors try to first validate the signal of positive selection in the two gene regions using methods such as H2/H1 (Garud et al. 2015), iHS (Voight et al. 2006) etc. that have demonstrated power and success at detecting mild sweeps and soft sweeps, regardless of if these are adaptive introgression.

      Following the Reviewer’s suggestion, we validated the new candidate adaptive introgressed loci by also using a method that formally tests for the action of natural selection. In particular, we decided to use the LASSI (Likelihood-based Approach for Selective Sweep Inference) algorithm developed by Harris & DeGiorgio (2020) mainly for the following reasons: i) it is able to identify both strong and weak genomic signatures of positive selection similarly to other approaches, but additionally it can distinguish these signals by explicitly classifying genomic windows affected by hard or soft selective sweeps; ii) when applied to simulated data generated under different demographic models and with a range of different values for the parameters that describe a selective event (e.g., the time at which the beneficial mutation arose, the selection coefficient s), it has been shown to have increased power with respect to traditional selection scans, such as nSL, H2/H1 and H12 (see Harris & DeGiorgio 2020 for further details).  

      According to such an approach, we were able to recapitulate signatures of natural selection previously observed in Tibetans for both EPAS1 and EGLN1 (Figure 4 – figure supplement 1 and 3C – 3D).  We also obtained comparable patterns for our previous candidate adaptive introgressed loci (i.e., EP300 and NOS2), as well as for the new ones that have been instead prioritized in the revised version of the manuscript according to consistent results also from VolcanoFinder, Signet and Haplostrips analyses (see Results, page 6 lines 30-35; Figure 4C, 4D, Figure 4 – figure supplement 2C and 2D).    

      With regard to the plausible archaic origin of the haplotypes under selection in these gene regions, my concern comes from the fact that other recent studies characterizing the archaic ancestry landscape in Tibetans and East Asians (eg. SPrime reports from Browning et al. 2018, as well as ArchaicSeeker reports from Yuan et al. 2021) didn't report archaic segments in regions overlapping with EP300 and NOS2. So how would the authors explain the discrepancy here, that adaptive introgression is detected yet there is little evidence of archaic segments in the regions? 

      We thank the Reviewer for the comment and the references provided. However, we read the suggested articles, and in neither of them do genomes from individuals of Tibetan ancestry appear to have been analysed. Moreover, in the study by Yuan et al. 2021 we were not able to find any table or supplementary table reporting the genomic segments showing signatures of Denisovan-like introgression in East Asian groups, with only findings from enrichment analyses performed on significant results being described for the Papuan population. In any case, as reported below in the response to comment #5, and in line with what was observed by the Reviewer concerning the original version of the manuscript, according to the additional validation analyses implemented during this revision EP300 and NOS2 received lower prioritization with respect to other loci showing more robust signatures supporting introgression of Denisovan alleles in the gene pool of Tibetan ancestors (i.e., TBC1D1, PRKAG2, KRAS and RASGRF2). Three out of four of these genes are also in accordance with previously published results supporting introgression of Denisovan alleles in the ancestors of present-day Han Chinese (Browning et al. 2018) or directly in the Tibetan genomes (Hu et al. 2017) (see Results, page 5 lines 10-21 and Supplement table 5). That said, the reason why not all the candidate adaptive introgression regions detected by our analyses are found among the results from Browning et al. 2018 may be that in Han Chinese this archaic variation could have evolved neutrally after the introgression events, thus preventing the identification of chromosomal segments enriched in putative archaic introgressed variants according to the VolcanoFinder and LASSI approaches (which also consider the impact of natural selection). In fact, the Sprime method implemented by Browning et al. 2018 focuses only on introgression events rather than adaptive introgression ones. 
For instance, the Denisovan-like regions identified with Sprime in Han Chinese by that study do not include the EPAS1 region at all. 

      Additionally, looking at Figure 4 and Supplementary Figure 4, the authors showed haplotype comparisons between Tibetans, Denisovan, and Han Chinese for EP300 and NOS2 regions. However, in both figures, there are about equal number of Tibetans and Han Chinese that harbor the haplotype with somewhat close distance to the Denisovan genotype. And this closest haplotype is not even that similar to the Denisovan. So how would the authors rule out the possibility that instead of adaptive introgression, the selection was acting on just an ancestral modern human haplotype?

      We agree with the Reviewer that, according to the analyses presented in the original version of the manuscript, the haplotype patterns observed at the EP300 and NOS2 loci by means of the Haplostrips approach cannot rule out the possibility that their adaptive evolution involved ancestral modern human haplotypes. In fact, after the modifications implemented in the adopted pipeline of analyses based on the Reviewers’ suggestions, their role in modulating complex adaptations to high altitude was confirmed also by results obtained with the LASSI algorithm (in addition to results from previous studies: Bigham et al., 2010; Zheng et al., 2017; Deng et al., 2019; X. Zhang et al., 2020), but their putative archaic origin received lower prioritization with respect to other loci, as it was not confirmed by all the analyses performed.

      Furthermore, I have a question about how exactly the authors scored the genes in their network analysis using Signet. The manuscript mentioned they were looking for enrichment of archaic-like derived alleles, and in the methods section, they mentioned they used SNPs that are present in both Denisovan and Tibetan genomes but are not in the chimp ancestral allele state. But are these "derived" alleles also present in Han Chinese or Africans? If so, what are the frequencies? And if the authors didn't use derived alleles exclusively shared between Tibetans and Denisovans, that may lead to false positives of the enrichment analysis, as the result would not be able to rule out the selection on ancestral modern human variation.

      As mentioned in the response to comment #1, following the suggestions of both Reviewers we have modified the criteria adopted for filtering archaic derived variants exclusively shared between Denisovans and Tibetans. In particular, we retained as input for the Signet analysis only those alleles that i) were shared between Tibetans and Denisovan (i.e., Denisovan-like alleles), ii) were in their derived state, and iii) were completely absent (i.e., showed a frequency equal to zero) in the Yoruba population sequenced by the 1000 Genomes Project and used here as an outgroup, under the assumption that only Eurasian H. sapiens populations experienced Denisovan admixture. We instead decided not to filter out potential Denisovan-like derived alleles that are also present in the Han Chinese population, because multiple lines of evidence agree in indicating that gene flow from Denisovans occurred in the ancestral East Asian gene pool no sooner than 48–46 thousand years ago (Teixeira et al. 2019; Zhang et al. 2021; Yuan et al. 2021), thus predating the split between low-altitude and high-altitude groups, which occurred approximately 15 thousand years ago (Lu et al. 2016; Hu et al. 2017). In fact, traces of such archaic gene flow are still detectable in the genomes of several low-altitude populations of East Asian ancestry (Yuan et al. 2021).
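The three filtering criteria listed above can be sketched as a simple predicate over per-site records. The `Variant` record, its field names, and the helper functions below are hypothetical and are not taken from the authors' scripts; the sketch only shows how conditions i) to iii) combine.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # All field names are illustrative assumptions, not the authors' format.
    variant_id: str
    ancestral_allele: str          # from the reconstructed ancestral sequence
    denisovan_alleles: frozenset   # alleles observed in the Denisovan genome
    tibetan_alleles: frozenset     # alleles observed in the Tibetan sample
    yoruba_allele_freq: dict       # allele -> frequency in the Yoruba outgroup


def denisovan_like_derived_alleles(v):
    """Return alleles of `v` that satisfy all three criteria."""
    # i) shared between Tibetans and Denisovan
    shared = v.denisovan_alleles & v.tibetan_alleles
    return {
        allele
        for allele in shared
        # ii) derived, i.e. different from the ancestral state
        if allele != v.ancestral_allele
        # iii) completely absent (frequency zero) in the Yoruba outgroup
        and v.yoruba_allele_freq.get(allele, 0.0) == 0.0
    }


def select_signet_input(variants):
    """Map variant_id -> retained alleles, dropping sites with none."""
    kept = {}
    for v in variants:
        alleles = denisovan_like_derived_alleles(v)
        if alleles:
            kept[v.variant_id] = alleles
    return kept


toy_sites = [
    # Retained: derived G shared with Denisovan, absent in Yoruba.
    Variant("site1", "A", frozenset("G"), frozenset("AG"), {"A": 1.0}),
    # Dropped: the shared derived allele segregates in Yoruba.
    Variant("site2", "A", frozenset("G"), frozenset("AG"), {"A": 0.9, "G": 0.1}),
    # Dropped: the shared allele is the ancestral one.
    Variant("site3", "G", frozenset("G"), frozenset("AG"), {"G": 1.0}),
]
signet_input = select_signet_input(toy_sites)
```

Note that no Han Chinese frequency appears in the predicate, reflecting the choice explained above not to filter on that population.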

      Concerning the above, I would also suggest the authors replot their Figure 4 and Figure S4 by adding the African population (eg. YRI) in the plot, and examine the genetic distance among the modern human haplotypes, in contrast to their distance to Denisovan.

      Following the Reviewer’s suggestion, after having identified new candidate adaptive introgressed loci according to the revised pipeline of analyses, we ran the Haplostrips algorithm by including in the dataset 27 individuals (i.e., 54 haplotypes) from the Yoruba population sequenced by the 1000 Genomes Project (Figure 4A, 4B, Figure 4 - figure supplement 2A, 2B, 3A).

      Reviewer #2:

      In the methods the authors write "Since composite likelihood statistics are not associated with p-values, we implemented multiple procedures to filter SNVs according to the significance of their LR values." What does significance mean here?

      After modifications applied to the adopted pipeline of analyses according to the Reviewers’ suggestions (see responses to public reviews and to comments #1, #3, #6, #7 of Reviewer #1), new candidate adaptive introgressed loci have been identified specifically by focusing on variants showing LR values falling in the top 5% of the genomic distribution obtained for such a statistic in order to adhere more strictly to the VolcanoFinder approach developed by Setter et al. 2020. Therefore, the related sentence in the materials and methods section was modified accordingly.

      Signet should be cited the first time it appears in the manuscript. The citation in the references is wrong. It lists R. Nielsen as the last author, but R. Nielsen is not an author of this paper.

      We thank the Reviewer for the comment. We have now mentioned the article by Gouy and Excoffier (2020) in the Results section where the Signet algorithm was first described and we have corrected the related reference.

      I could not find Figure 5 which is cited in the methods in the main text. I assume the authors mean Supplementary Figure 5, but the supplementary files have Figure 4.

      We thank the Reviewer for the comment. We have checked and modified figures included in the article and in the supplementary files to fix this issue.

      I didn't see a table with the genes identified as adaptatively introgressed with VolcanoFinder. This would be useful as I believe this is the first time VolcanoFinder is being used on Tibetan data?

      Following the Reviewer’s suggestion, we have reported in Supplement table 2 all the variants showing LR scores falling in the top 5% of the genomic distribution obtained for such a statistic, along with the associated α parameters computed by the VolcanoFinder algorithm.

      It is easier for the reviewer if lines have numbers.

      Following the Reviewer’s suggestion, we have included line numbers in the revised version of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Reviewer 1

      Summary:

      The authors introduce a denoising-style model that incorporates both structure and primary-sequence embeddings to generate richer embeddings of peptides. My understanding is that the authors use ESM for the primary sequence embeddings, take resolved structures (or use structural predictions from AlphaFold when they're not available), and then develop an architecture to combine these two with a loss that seems reminiscent of diffusion models or masked language model approaches. The embeddings can be viewed as ensemble-style embedding of the two levels of sequence information, or with AlphaFold, an ensemble of two methods (ESM+AlphaFold). The authors also gather external datasets to evaluate their approach and compare it to previous approaches. The approach seems promising and appears to out-compete previous methods at several tasks. Nonetheless, I have strong concerns about a lack of verbosity as well as the exclusion of relevant methods and references.

      Thank you for the comprehensive summary. We address the concerns listed in the review point by point below, and we have modified our manuscript accordingly. 

      Advances:

      I appreciate the breadth of the analysis and comparisons to other methods. The authors separate tasks, models, and sizes of models in an intuitive, easy-to-read fashion that I find valuable for selecting a method for embedding peptides. Moreover, the authors gather two datasets for evaluating embeddings' utility for predicting thermostability. Overall, the work should be helpful for the field as more groups choose methods/pretraining strategies amenable to their goals, and can do so in an evidence-guided manner.

      Thank you for recognizing the strength of our work in terms of the notable contributions, the solid analysis, and the clear presentation.

      Considerations:

      (1) Primarily, a majority of the results and conclusions (e.g., Table 3) are reached using data and methods from ProteinGym, yet the best-performing methods on ProteinGym are excluded from the paper (e.g., EVE-based models and GEMME). In the ProteinGym database, these methods outperform ProtSSN models. Moreover, these models were published over a year---or even 4 years in the case of GEMME---before ProtSSN, and I do not see justification for their exclusion in the text.

      We decided to exclude the listed methods from the primary table as they are all MSA-based methods, which are considered few-shot methods in deep learning (Rao et al., ICML, 2021). In contrast, the proposed ProtSSN is a zero-shot method that makes inferences based on less information than few-shot methods. Moreover, it is possible for MSA-based methods to query aligned sequences based on predictions. For instance, Tranception (Notin et al., ICML, 2022) selects the model with the optimal proportions of logits and retrieval results according to the average correlation score on ProteinGym (Table 10, Notin et al., 2022).

      With this in mind, we only included zero-shot deep learning methods in Table 3, which require no more than the sequence and structure of the underlying wild-type protein when scoring the mutants. In the revision, we have added the performance of SaProt to Table 3, and the performance of GEMME, TranceptEVE, and SaProt to Table 5. Furthermore, we have released the model's performance on the public leaderboard of ProteinGym v1 at proteingym.org.

      (2) Secondly, related to the comparison of other models, there is no section in the methods about how other models were used, or how their scores were computed. When comparing these models, I think it's crucial that there are explicit derivations or explanations for the exact task used for scoring each method. In other words, if the pre-training is indeed an important advance of the paper, the paper needs to show this more explicitly by explaining exactly which components of the model (and previous models) are used for evaluation. Are the authors extracting the final hidden layer representations of the model, treating these as features, and then using these features in a regression task to predict fitness/thermostability/DDG etc.? How are the model embeddings of other methods being used, since, for example, many of these methods output a k-dimensional embedding of a given sequence, rather than one single score that can be correlated with some fitness/functional metric? Summarily, I think the text lacks an explicit mention of how these embeddings are being summarized or used, as well as how this compares to the model presented.

      Thank you for the suggestion. Below we address the questions in three points. 

      (1) The task and the scoring for each method. We followed your suggestion and added a new paragraph titled “Scoring Function” on page 9 to provide a detailed explanation of the scoring functions used by other deep learning zero-shot methods.

      (2) The importance of individual pre-training modules. The complete architecture of the proposed ProtSSN model is introduced on pages 7-8. Empirically, the influence of each pre-training module on the overall performance has been examined through ablation studies on page 12. In summary, the optimal performance is achieved by combining all the individual modules and designs.

      (3) The input of fitness scoring. For a zero-shot prediction task, the final score for a mutant is calculated with one of two widely used functions: the log-odds ratio (for encoder models, including ours) or the log-likelihood (for autoregressive models or inverse folding models). In the revision, we explicitly define these functions in sections “Inferencing” (page 7) and “Scoring Function” (page 9). 
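As an illustration of the log-odds ratio mentioned in point (3), the sketch below uses one common form of the score: the sum, over mutated positions, of the log-probability of the mutant residue minus that of the wild-type residue, both read from the model output for the wild-type protein. The function name, the data layout, and the toy probabilities are assumptions; the exact definition is the one given in the revised manuscript.

```python
import math


def log_odds_score(position_probs, wild_type, mutations):
    """Zero-shot score for a mutant from wild-type output probabilities.

    position_probs: list with one dict per residue position, mapping each
        amino-acid letter to the probability predicted by the model when it
        is given the wild-type protein (assumed layout).
    wild_type: wild-type sequence as a string.
    mutations: list of (position, wt_aa, mut_aa) with 0-based positions.
    """
    score = 0.0
    for pos, wt_aa, mut_aa in mutations:
        if wild_type[pos] != wt_aa:
            raise ValueError(f"position {pos} is {wild_type[pos]}, not {wt_aa}")
        probs = position_probs[pos]
        # log p(mutant residue) - log p(wild-type residue) at this position
        score += math.log(probs[mut_aa]) - math.log(probs[wt_aa])
    return score


# Toy example: two positions and a three-letter alphabet.
toy_probs = [
    {"A": 0.5, "G": 0.25, "V": 0.25},
    {"A": 0.1, "G": 0.8, "V": 0.1},
]
single_mutant = log_odds_score(toy_probs, "AG", [(0, "A", "G")])
double_mutant = log_odds_score(toy_probs, "AG", [(0, "A", "G"), (1, "G", "V")])
```

A negative score means the model considers the mutant residue less likely than the wild-type residue in that context; the wild-type protein only needs to be encoded once, whatever the number of mutants scored.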

      (3) I think the above issues can mainly be addressed by considering and incorporating points from Li et al. 2024[1] and potentially Tang & Koo 2024[2]. Li et al.[1] make extremely explicit the use of pretraining for downstream prediction tasks. Moreover, they benchmark pretraining strategies explicitly on thermostability (one of the main considerations in the submitted manuscript), yet there is no mention of this work nor the dataset used (FLIP (Dallago et al., 2021)) in this current work. I think a reference and discussion of [1] is critical, and I would also like to see comparisons in line with [1], as [1] is very clear about what features from pretraining are used, and how. If the comparisons with previous methods were done in this fashion, this level of detail needs to be included in the text.

      The initial version did not include an explicit comparison with the mentioned reference due to the difference in the learning task. In particular, [1] formulates a supervised learning task on predicting the continuous scores of mutants of specific proteins. In comparison, we make zero-shot predictions, where the model is trained in a self-supervised learning manner that requires no labels from experiments. In the revision, we added discussions in “Discussion and Conclusion” (lines 476-484):

      Recommendations For The Authors:

      Comment 1

      I found the methods lacking in the sense that there is never a simple, explicit statement about what is the exact input and output of the model. What are the components of the input that are required by the user (to generate) or supply to the model? Are these inputs different at training vs inference time? The loss function seems like it's trying to de-noise a modified sequence, can you make this more explicit, i.e. exactly what values/objects are being compared in the loss?

      We have added a more detailed description in the "Model Pipeline" section (page 7), which explains the distinct input requirements for training and inference, as well as the formulation of the employed loss function. To summarize:

      (1) Both sequence and structure information are used in training and inference. Specifically, structure information is represented as a 3D graph with coordinates, while sequence information consists of AA-wise hidden representations encoded by ESM2-650M. During inference, instead of encoding each mutant individually, the model encodes the WT protein and uses the output probability scores relevant to the mutant to calculate the fitness score. This is a standard operation in many zero-shot fitness prediction models, commonly referred to as the log-odds-ratio.

      (2) The loss function compares the differences between the noisy input sequence and the output (recovered) AA sequence. Noise is added to the input sequences, and the model is trained to denoise them (see “Ablation Study” for the different types of noise we tested). This approach is similar to a one-step diffusion process or BERT-style token permutation. The model learns to recover the probability of each node (AA) being one of 33 tokens. A cross-entropy loss is then applied to compare this distribution with the ground-truth (unpermuted) AA sequence, aiming to minimize the difference.
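
      As a minimal sketch of the two operations summarized in (1) and (2), and not the ProtSSN implementation, the NumPy code below noises a token sequence, computes the cross-entropy against the clean sequence, and scores a mutant by the log-odds-ratio from wild-type logits. The function names, the uniform random-replacement noise, and the random stand-in logits are illustrative assumptions; only the 33-token vocabulary comes from the text above.

```python
import numpy as np

NUM_TOKENS = 33  # vocabulary size quoted in the response


def add_noise(tokens, noise_rate, rng):
    """Replace a random subset of positions with uniformly drawn tokens.

    Uniform replacement is an assumption; the manuscript tests several noise types.
    """
    tokens = np.asarray(tokens)
    noisy = tokens.copy()
    mask = rng.random(tokens.shape[0]) < noise_rate
    noisy[mask] = rng.integers(0, NUM_TOKENS, size=int(mask.sum()))
    return noisy


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def denoising_loss(logits, clean_tokens):
    """Mean cross-entropy between per-residue predictions and the clean sequence."""
    logp = log_softmax(np.asarray(logits, dtype=float))
    positions = np.arange(len(clean_tokens))
    return float(-logp[positions, clean_tokens].mean())


def log_odds_score(wt_logits, wt_tokens, mutations):
    """Sum of log p(mutant) - log p(wild type) over the mutated positions.

    mutations is a list of (position, mutant_token) pairs; the logits come
    from a single forward pass on the wild-type protein.
    """
    logp = log_softmax(np.asarray(wt_logits, dtype=float))
    return float(sum(logp[pos, mt] - logp[pos, wt_tokens[pos]] for pos, mt in mutations))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    wild_type = np.array([3, 7, 7, 12, 20])
    noisy_input = add_noise(wild_type, noise_rate=0.25, rng=rng)
    logits = rng.normal(size=(len(wild_type), NUM_TOKENS))  # stand-in for model output
    print(noisy_input, denoising_loss(logits, wild_type))
    print(log_odds_score(logits, wild_type, [(1, 9)]))
```

      Because the mutant score only reads entries of the wild-type output, every single-site variant of a protein can be scored from one forward pass.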

      To better present the workflow, we revised the manuscript accordingly.

      Comment 2

      Related to the above, I'm not exactly sure where the structural/tertiary structure information comes from. In the methods, they don't state exactly whether the 3D coordinates are given in the CATH repository or where exactly they come from. In the results section they mention using AlphaFold to obtain coordinates for a specific task---is the use of AlphaFold limited only to these tasks/this is to show robustness whether using AlphaFold or realized coordinates?

      The 3D coordinates of all proteins in the training set are derived from the crystal structures in CATH v4.3.0 to ensure a high-quality input dataset (see "Training Setup," Page 8). However, during the inference phase, we used predicted structures from AlphaFold2 and ESMFold as substitutes. This approach enhances the generalizability of our method, as in real-world scenarios, the crystal structure of the template protein to be engineered is not always available. The associated descriptions can be found in “Training Setup” (lines 271-272) and “Folding Methods” (lines 429-435).

      Comment 3

      Lines 142+144 missing reference "Section establishes", "provided in Section ."

      199 "see Section " missing reference

      214 missing "Section"

      Thank you for pointing this out. We have fixed all missing references in the revision.

      Comment 4

      Table 2 - seems inconsistent to mention the number of parameters in the first 2 methods, then not in the others (though I see in Table 3 this is included, so maybe should just be omitted in Table 2).

      In Table 2, we present the zero-shot methods used as baselines. Since many methods have different versions due to varying hyperparameter settings, we decided to list the number of parameters in the following tables.

      We have double-checked both Table 3 and Table 5 and confirm that there is no inconsistency in the reported number of parameters. One potential explanation for the observed difference in the comment could be due to the differences in the number of parameters between single and ensemble methods. The ensemble method averages the predictions of multiple models, and we sum the total number of parameters across all models involved. For example, RITA-ensemble has 2210M parameters, derived from the sum of four individual models with 30M, 300M, 680M, and 1200M parameters.
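
      The bookkeeping described above can be sketched as follows. This is an illustrative example rather than the evaluation code: the helper names and the toy scores are assumptions, and only the four RITA model sizes are taken from the text.

```python
import numpy as np


def ensemble_predict(member_scores):
    """Average per-variant scores across ensemble members (rows are members)."""
    return np.mean(np.asarray(member_scores, dtype=float), axis=0)


def ensemble_param_count(member_params_millions):
    """Total parameter count reported for an ensemble: the sum over its members."""
    return sum(member_params_millions)


if __name__ == "__main__":
    rita_sizes = [30, 300, 680, 1200]  # millions of parameters, from the response
    print(ensemble_param_count(rita_sizes))
    toy_scores = [[0.1, 0.4], [0.3, 0.2], [0.2, 0.6], [0.2, 0.4]]  # made-up scores
    print(ensemble_predict(toy_scores))
```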

      Comment 5

      In general, I found using the word "type" instead of "residue" a bit unnatural. As far as I can tell, the norm in the field is to say "amino acid" or "residue" rather than "type". This somewhat confused me when trying to understand the methods section, especially when talking about injecting noise (I figured "type" may refer to evolutionarily-close, or physicochemically-close residues). Maybe it's not necessary to change this in every instance, but something to consider in terms of ease of reading.

      Thank you for your suggestion. The term "type" we used is a common expression similar to "class" in the NLP field. To avoid further confusion for biologists, we have revised the manuscript accordingly.

      Comment 6

      197 should this read "based on the kNN "algorithm"" (word missing) or maybe "based on "its" kNN"?

      We have corrected the typo accordingly. It now reads “the 𝑘-nearest neighbor algorithm (𝑘NN)” (line 198).
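
      For readers unfamiliar with the operation, a 𝑘NN graph over residue coordinates can be built as in the sketch below. It assumes that the 𝑘NN step connects each residue to its nearest neighbors in 3D space using Euclidean distance; the function name, the choice of one coordinate per residue, and the value of k are illustrative assumptions, not details taken from the manuscript.

```python
import numpy as np


def knn_edges(coords, k):
    """Return a (2, n*k) array of directed edges from each node to its k nearest neighbors."""
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of nodes")
    # Pairwise Euclidean distances between all nodes.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    neighbors = np.argsort(dist, axis=1)[:, :k]
    sources = np.repeat(np.arange(n), k)
    return np.stack([sources, neighbors.ravel()], axis=0)


if __name__ == "__main__":
    toy_coords = [[0, 0, 0], [1, 0, 0], [3, 0, 0], [7, 0, 0]]  # made-up coordinates
    print(knn_edges(toy_coords, k=1))
```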

      Comment 7

      200 weights of dimension 93, where does this number come from?

      The edge features are derived by Zhou et al., 2024. We have updated the reference in the manuscript for clarity (lines 201-202).

      Comment 8

      210-212 "representations of the noisy AA sequence are encoded from the noisy input" what is the "noisy AA sequence?" might be helpful to exactly defined what is "noisy input" or "noisy AA sequence". This sentence could potentially be worded to make it clearer, e.g. "we take the modified input sequence and embed it using [xyz]."

      We have revised the text accordingly; see lines 211-212 of the revised manuscript:

      Comment 9

      In Table 3

      Formatting, DTm (million), (million) should be under "# Params" likely?

      Also for DDG this is reported on only a few hundred mutations, it might be worth plotting the confidence intervals over the Spearman correlation (e.g. by bootstrapping the correlation coefficient).

      We followed the suggestion and added “million” under the "# Params". We have added the bootstrapped results for DDG and DTm to Table 6. For each dataset, we randomly sampled 50% of the data for ten independent runs. ProtSSN achieves the top performance with a considerably small variance.
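
      The resampling procedure described above can be sketched as follows. This is a hedged illustration rather than the code used for Table 6: the function names are hypothetical, sampling is assumed to be without replacement, and Spearman's correlation is computed from average ranks with NumPy only.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def subsampled_spearman(pred, truth, fraction=0.5, runs=10, seed=0):
    """Mean and sample standard deviation of Spearman's rho over random subsets."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    rng = np.random.default_rng(seed)
    size = max(2, int(round(fraction * len(pred))))
    scores = []
    for _ in range(runs):
        idx = rng.choice(len(pred), size=size, replace=False)
        scores.append(spearman(pred[idx], truth[idx]))
    scores = np.array(scores)
    return float(scores.mean()), float(scores.std(ddof=1))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    truth = rng.normal(size=300)                      # made-up labels
    pred = truth + rng.normal(scale=1.0, size=300)    # made-up noisy predictions
    print(subsampled_spearman(pred, truth))
```

      Reporting the spread across runs alongside the mean makes it easier to judge whether differences between methods on a few hundred mutations are meaningful.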

      Comment 10

      The paragraph in lines 319 to lines 328 I feel may lack sufficient evidence.

      "While sequence-based analysis cannot entirely replace the role of structure-based analysis, compared to a fully structure-based deep learning method, a protein language model is more likely to capture sufficient information from sequences by increasing the model scale, i.e., the number of trainable parameters."

      This claim is made without a citation, such as [1]. Increasing the scale of the model doesn't always align with improving out-of-sample/generalization performance. I don't feel fully convinced by the claim that worse prediction is ameliorated by increasing the number of parameters. In Table 3 the performance is not monotonic with (nor scales with) the number of parameters, even within a model. See ProGen2 Expression scores, or ESM-2 Stability scores, as a function of their model sizes. In [1], the authors discuss whether pretraining strategies are aligned with specific tasks. I think rewording this paragraph and mentioning this paper is important. Figure 3 shows that maybe there's some evidence for this but I don't feel entirely convinced by the plot.

      We agree that increasing the number of learnable parameters does not always result in better performance in downstream tasks. However, what we intended to convey is that language models typically need to scale up in size to capture the interactions among residues, while structure-based models can achieve this more efficiently with lower computational costs. We have rephrased this paragraph in the paper to clarify our point in lines 340-342.

      Comment 11

      Line 327 related to my major comment, " a comprehensive framework, such as ProtSSN, exhibits the best performance." Refers to performance on ProteinGym, yet the best-performing methods on ProteinGym are excluded from the comparison.

      The primary comparisons were conducted using zero-shot models for fairness, meaning that the baseline models were not trained on MSA and did not use test performance to tune their hyperparameters. It's also worth noting that SaProt (the current SOTA model) had not been updated on the leaderboard at the time of submitting this paper. In the revised manuscript, we have included GEMME and TranceptEVE in Table 5 and SaProt in Tables 3, 5, and 6. While ProtSSN does not achieve SOTA performance in every individual task, our key argument in the analysis is to highlight the overall advantage of hybrid encoders compared to single sequence-based or structure-based models. We made a clearer statement in the revised manuscript (line 349):

      Comment 12

      Line 347, line abruptly ends "equivariance when embedding protein geometry significantly." (?).

      We have fixed the typo (lines 372-373):

      Comment 13

      Figure 3 I think can be made clearer. Instead of using True/false maybe be more explicit. For example in 3b, say something like "One-hot encoded" or "ESM-2 embedded".

      The labels were set to True/False with the title of the subfigures so that they can be colored consistently.

      Following the suggestion, we have updated the captions in the revised manuscript for clarity.

      Comment 14

      Lines 381-382 "average sequential embedding of all other Glycines" is to say that the score is taken as the average score in which Glycine is substituted at every other position in the peptide? Somewhat confused by the language "average sequential embedding" and think rephrasing could be done to make things clearer.

      We have revised the related text for a clearer presentation (lines 406-413).

      Comment 15

      Table 5, and in mentions to VEP, if ProtSSN is leveraging AlphaFold for its structural information, I disagree that ProtSSN is not an MSA method, and I find it unfair to place ProtSSN in the "non-MSA" categories. If this isn't the case, then maybe making clearer the inputs etc. in the Methods will help.

      We respectfully disagree with classifying a protein encoding method based solely on its input structure. While AF2 leverages MSA sequences to predict protein structures, this information is not used in our model, and our model is not exclusive to AF2-predicted structures. When applicable, the model can encode structures derived from experimental data or other folding methods. For example, in the manuscript, we compared the performance of ProtSSN using proteins folded by both AF2 and ESMFold.

      However, we would like to emphasize that comparing the sensitivity of an encoding method across different structures or conformations is not the primary focus of our work. In contrast, some methods explicitly use MSA during model training. For instance, MSA-Transformer encodes MSA information directly into the protein embedding, and Tranception-retrieval utilizes different sets of MSA hyperparameters depending on the validation set's performance.

      To avoid further confusion, we have revised the terms "MSA methods" and "non-MSA methods" in the manuscript to "zero-shot methods" and "few-shot methods."

      Comment 16

      Table 3 they're highlighted as the best, yet on ProteinGym there are several EVE models that do better, as well as GEMME, which are not referenced.

      The comparison in Table 3 focuses on zero-shot methods, whereas GEMME and EVE are few-shot models. Since these methods have different input requirements, directly comparing them could lead to unfair conclusions. For this reason, we reserved the comparisons with these few-shot models for Table 5, where we aim to provide a more comprehensive evaluation of all available methods.

      Response to Reviewer 2

      Summary:

      To design proteins and predict disease, we want to predict the effects of mutations on the function of a protein. To make these predictions, biologists have long turned to statistical models that learn patterns that are conserved across evolution. There is potential to improve our predictions however by incorporating structure. In this paper, the authors build a denoising auto-encoder model that incorporates sequence and structure to predict mutation effects. The model is trained to predict the sequence of a protein given its perturbed sequence and structure. The authors demonstrate that this model is able to predict the effects of mutations better than sequence-only models.

      Thank you for your thorough review and clear summary of our work. Below, we provide a detailed, point-by-point response to each of your questions and concerns.

      Strengths:

      The authors describe a method that makes accurate mutation effect predictions by informing its predictions with structure.

      Thank you for your clear summary of our highlights.

      Weaknesses:

      Comment 1

      It is unclear how this model compares to other methods of incorporating structure into models of biological sequences, most notably SaProt.

      (https://www.biorxiv.org/content/10.1101/2023.10.01.560349v1.full.pdf).

      In the revised manuscript, we have updated the performance results for SaProt's single models (both masked and unmasked versions with the pLDDT score) as well as the ensemble models. These updates are reflected in Tables 3, 5, and 6.

      Comment 2

      ProteinGym is largely made of deep mutational scans, which measure the effect of every mutation on a protein. These new benchmarks contain on average measurements of less than a percent of all possible point mutations of their respective proteins. It is unclear what sorts of protein regions these mutations are more likely to lie in; therefore it is challenging to make conclusions about what a model has necessarily learned based on its score on this benchmark. For example, several assays in this new benchmark seem to be similar to each other, such as four assays on ubiquitin performed at pH 2.25 to pH 3.0.

      We agree that both DTm and DDG are smaller datasets, making them less comprehensive than ProteinGym. However, we believe DTm and DDG provide valuable supplementary insights for the following reasons:

      (1) These two datasets are low-throughput and manually curated. Compared to datasets from high-throughput experiments like ProteinGym, they contain fewer errors from experimental sources and data processing, offering cleaner and more reliable data.

      (2) Environmental factors are crucial for the function and properties of enzymes, which is a significant concern for many biologists when discussing enzymatic functions. Existing benchmarks like ProteinGym tend to simplify these factors and focus more on global protein characteristics (e.g., AA sequence), overlooking the influence of environmental conditions.

      (3) While low-throughput datasets like DTm and DDG do not cover all AA positions or perform extensive saturation mutagenesis, these experiments often target mutations at sites with higher potential for positive outcomes, guided by prior knowledge. As a result, the positive-to-negative ratio is more meaningful than random mutagenesis datasets, making these benchmarks more relevant for evaluating model performance.

      We would like to emphasize that DTm and DDG are designed to complement existing benchmarks rather than replace ProteinGym. They address different scales and levels of detail in fitness prediction, and their inclusion allows for a more comprehensive evaluation of deep learning models.

      Recommendations For The Authors:

      Comment 1

      I recommend including SaProt in your benchmarks.

      In the revision, we added comparisons with SaProt in all the Tables (3, 5 and 6). 

      Comment 2

      I also recommend investigating and giving a description of the bias in these new datasets.

      The bias of the new benchmarks can be seen in Table 1, where the mutants are distributed evenly across different pH levels.

      In the revision, we added a discussion regarding the new datasets in “Discussion and Conclusion” (lines 496-504 of the revised version).

      Comment 3

      I also recommend reporting the model's ability to predict disease using ClinVar -- this experiment is conspicuously absent.

      Following the suggestion, we retrieved 2,525 samples from the ClinVar dataset available on ProteinGym’s website. Since the official source did not provide corresponding structure files, we performed the following three steps:

      (1) We retrieved the UniProt IDs for the sequences from the UniProt website and downloaded the corresponding AlphaFold2 structures for 2,302 samples.

      (2) For the remaining proteins, we used ColabFold 1.5.5 to perform structure prediction.

      (3) Among these, 12 proteins were too long to be folded by ColabFold, for which we used the AlphaFold3 server for prediction.
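
      The three-tier fallback in steps (1) to (3) can be sketched as below. This is a hypothetical illustration, not the script used to prepare the data: the function names, the tier labels, and the `max_colabfold_length` cutoff are assumptions, since the response does not state the length at which ColabFold failed.

```python
from collections import Counter


def choose_structure_source(has_precomputed_af2, sequence_length, max_colabfold_length):
    """Pick a structure source following the order of steps (1) to (3)."""
    if has_precomputed_af2:
        return "alphafold2_download"
    if sequence_length <= max_colabfold_length:
        return "colabfold"
    return "alphafold3_server"


def tally_sources(records, max_colabfold_length):
    """Count sources for an iterable of (has_precomputed_af2, sequence_length) pairs."""
    return Counter(
        choose_structure_source(has_af2, length, max_colabfold_length)
        for has_af2, length in records
    )


if __name__ == "__main__":
    # Made-up records and a made-up cutoff, for illustration only.
    toy_records = [(True, 100), (False, 500), (False, 5000), (True, 9000)]
    print(tally_sources(toy_records, max_colabfold_length=2000))
```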

      All processed structural data can be found at https://huggingface.co/datasets/tyang816/ClinVar_PDB. Our test results are provided in the following table. ProtSSN achieves the top performance over baseline methods.

      Author response table 1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors aimed to elucidate the cytological mechanisms by which conjugated linoleic acids (CLAs) influence intramuscular fat deposition and muscle fiber transformation in pig models. Utilizing single-nucleus RNA sequencing (snRNA-seq), the study explores how CLA supplementation alters cell populations, muscle fiber types, and adipocyte differentiation pathways in pig skeletal muscles.

      Thanks!

      Strengths:

      Innovative approach: The use of snRNA-seq provides a high-resolution insight into the cellular heterogeneity of pig skeletal muscle, enhancing our understanding of the intricate cellular dynamics influenced by nutritional regulation strategy.

      Robust validation: The study utilizes multiple pig models, including Heigai and Laiwu pigs, to validate the differentiation trajectories of adipocytes and the effects of CLA on muscle fiber type transformation. The reproducibility of these findings across different (nutritional vs genetic) models enhances the reliability of the results.

      Advanced data analysis: The integration of pseudotemporal trajectory analysis and cell-cell communication analysis allows for a comprehensive understanding of the functional implications of the cellular changes observed.

      Practical relevance: The findings have significant implications for improving meat quality, which is valuable for both the agricultural and food industry.

      Thanks!

      Weaknesses:

      Model generalizability: While pigs are excellent models for human physiology, the translation of these findings to human health, especially in diverse populations, needs careful consideration.

      Thanks!

      Reviewer #2 (Public Review):

      Summary:

      This study comprehensively presents data from single nuclei sequencing of Heigai pig skeletal muscle in response to conjugated linoleic acid supplementation. The authors identify changes in myofiber type and adipocyte subpopulations induced by linoleic acid at depth previously unobserved. The authors show that linoleic acid supplementation decreased the total myofiber count, specifically reducing type II muscle fiber types (IIB), myotendinous junctions, and neuromuscular junctions, whereas type I muscle fibers are increased. Moreover, the authors identify changes in adipocyte pools, specifically in a population marked by SCD1/DGAT2. To validate the skeletal muscle remodeling in response to linoleic acid supplementation, the authors compare transcriptomics data from Laiwu pigs, a model of high intramuscular fat, to Heigai pigs. The results verify changes in adipocyte subpopulations when pigs have higher intramuscular fat, either genetically or diet-induced. Targeted examination using cell-cell communication network analysis revealed associations with high intramuscular fat with fibro-adipogenic progenitors (FAPs).  The authors then conclude that conjugated linoleic acid induces FAPs towards adipogenic commitment. Specifically, they show that linoleic acid stimulates FAPs to become SCD1/DGAT2+ adipocytes via JNK signaling. The authors conclude that their findings demonstrate the effects of conjugated linoleic acid on skeletal muscle fat formation in pigs, which could serve as a model for studying human skeletal muscle diseases.

      Thanks!

      Strengths:

      The comprehensive data analysis provides information on conjugated linoleic acid effects on pig skeletal muscle and organ function. The notion that linoleic acid induces skeletal muscle composition and fat accumulation is considered a strength and demonstrates the effect of dietary interactions on organ remodeling. This could have implications for the pig farming industry to promote muscle marbling. Additionally, these data may inform the remodeling of human skeletal muscle under dietary behaviors, such as elimination and supplementation diets and chronic overnutrition of nutrient-poor diets. However, the biggest strength resides in thorough data collection at the single nuclei level, which was extrapolated to other types of Chinese pigs.

      Thanks!

      Weaknesses:

      While the authors generated a sizeable comprehensive dataset, cellular and molecular validation needed to be improved. For example, the single nuclei data suggest changes in myofiber type after linoleic acid supplementation, yet these data are not validated by other methodologies. Similarly, the authors suggest that linoleic acid alters adipocyte populations, FAPs, and preadipocytes; however, no cellular and molecular analysis was performed to reveal if these trajectories indeed apply. Attempts to identify JNK signaling pathways appear superficial and do not delve deeper into mechanistic action or transcriptional regulation. Notably, a variety of single cell studies have been performed on mouse/human skeletal muscle and adipose tissues. Yet, the authors need to discuss how the populations they have identified support the existing literature on cell-type populations in skeletal muscle. Moreover, the authors nicely incorporate the two pig models into their results, but the authors only examine one muscle group. It would be interesting if other muscle groups respond similarly or differently in response to linoleic acid supplementation. Further, it was unclear whether Heigai and Laiwu pigs were both fed conjugated linoleic acid or whether the comparison between Heigai-fed linoleic acid and Laiwu pigs (as a model of high intramuscular fat). With this in mind, the authors do not discuss how their results could be implicated in human and pig nutrition, such as desirability and cost-effectiveness for pig farmers and human diets high in linoleic acid. Notably, while single nuclei data is comprehensive, there needs to be a statement on data deposition and code availability, allowing others access to these datasets. Moreover, the experimental designs do not denote the conjugated linoleic acid supplementation duration. Several immunostainings performed could be quantified to validate statements. This reviewer also found the Nile Red staining hard to interpret visually and did not appear to support the conclusions convincingly. Within Figure 7, several letters (assuming they represent statistical significance) are present on the graphs but are not denoted within the figure legend.

      Thanks for your suggestions! We have revised our manuscript accordingly.

      For changes in myofiber type, we performed qPCR to verify the changes of muscle fiber type related gene expression after CLA treatment (Figure 2E); for changes of adipocyte and preadipocyte populations, we also performed immunofluorescence staining, qPCR, and western blotting in LDM tissues and FAPs to verify the alterations of cell types after feeding with CLA (Figure 3D, 3E, 6G, 7C, and 7D). Hence, we think these cellular and molecular results could support our conclusions.

      We selected the JNK signaling pathway based on the snRNA-seq dataset and verified it with an activator in in vitro experiments. However, we did not explore its mechanistic action, and the downstream transcriptional regulators need further study. We have added these points to the discussion (lines 443-448).

      We have added a comparison between the different cell-type populations in skeletal muscles (lines 362-368 and 385-390).

      Changes in myofiber type in Laiwu pigs were discussed in our previous study (Wang et al., 2023). Interestingly, in this study we also found that in Laiwu pigs with high IMF content, the percentage of type IIa myofibers tended to increase (29.37% vs. 23.95%), while the percentage of type IIb myofibers tended to decrease (38.56% vs. 43.75%). We have added this point to the discussion (lines 392-395).

      We have supplied the information of treatment in the materials and methods part (line 469-478). We also added the discussion about significance of our study for human and pig nutrition in the discussion part (line 375-376 and 446-447).

      Our data will be made available on reasonable request (line 574-576).

      We have supplied the information of the CLA supplementation duration in the materials and methods part (line 465).

      Porcine FAPs contain few lipid droplets, and we have improved the image quality (Figure 7A). In Figure 7, the Nile Red staining could be quantified, and we provide the quantification of Oil Red O staining (Figures 7B and 7J). We have also added the statistical significance notation to the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for Improved or Additional Experiments, Data, or Analyses

      Cross-species analysis: To strengthen the generalizability of the results, it would be beneficial to include a comparative analysis with other species, such as human, bovine, or rodent models, using publicly available snRNA-seq datasets.

      Thanks! Our previous study compared the conserved and unique signatures in fatty skeletal muscles across species (Wang, Zhou, Wang, & Shan, 2024). Here, we mainly focused on the mechanism by which CLAs regulate intramuscular fat deposition. However, snRNA-seq or scRNA-seq datasets on the effects of CLAs on fat deposition in muscle are still lacking for other species, including human, bovine, or rodent models. Hence, we only analyzed the regulatory mechanisms by which CLAs influence intramuscular fat deposition in pigs.

      Functional link: the authors should discuss in the manuscript how the muscles differ in terms of texture, flavor, aroma, etc. before and after CLA administration or between Heigai and Laiwu to provide context and help readers better understand how the observed high-resolution cellular changes relate to these functional properties of meat.

      Thanks! We have added these in the introduction part (line 90-98).

      Improve figures: some figures, particularly those involving Oil Red O and Nile Red, could be improved by including higher magnification images to assess the organization of lipid droplets of individual adipocytes (Figure 7A, I, and K).

      Thanks! Porcine FAPs contain few lipid droplets, and we have improved the image quality (Figure 7A).

      Reviewer #2 (Recommendations For The Authors):

      All of my comments are above. However, I would recommend improving the writing as several areas throughout the results needed clarity.

      Thanks! We have revised our manuscript carefully after accepting your revisions.

      Wang, L., Zhao, X., Liu, S., You, W., Huang, Y., Zhou, Y., . . . Shan, T. (2023). Single-nucleus and bulk RNA sequencing reveal cellular and transcriptional mechanisms underlying lipid dynamics in high marbled pork. NPJ Sci Food 7: 23. https://doi.org/10.1038/s41538-023-00203-4

      Wang, L., Zhou, Y., Wang, Y., & Shan, T. (2024). Integrative cross-species analysis reveals conserved and unique signatures in fatty skeletal muscles. Sci Data 11: 290. https://doi.org/10.1038/s41597-024-03114-5

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Moir, Merheb et al. present an intriguing investigation into the pathogenesis of Pol III variants associated with neurodegeneration. They established an inducible mouse model to overcome developmental lethality, administering 5 doses of tamoxifen to initiate the knock-in of the mutant allele. Subsequent behavioral assessments and histological analyses revealed potential neurological deficits. Robust analyses of the tRNA transcriptome, conducted via northern blotting and RNA sequencing, suggested a selective deleterious effect of the variant on the cerebrum, in contrast to the cerebellum and non-cerebral tissues. Through this work, the authors identified molecular changes caused by Pol III mutations, particularly in the tRNA transcriptome, and demonstrated its relative progression and selectivity in brain tissue. Overall, this study provides valuable insights into the neurological manifestations of certain genetic disorders and sheds light on transcripts/products that are constitutively expressed in various tissues.

      Strengths:

      The authors utilize an innovative mouse model to constitutively knock in the gene, enhancing the study's robustness. Behavioral data collection using a spectrometer reduces experimenter bias and effectively complements the neurological disorder manifestations. Transcriptome analyses are extensive and informative, covering various tissue types and identifying stress response elements and mitochondrial transcriptome patterns. Additionally, metabolic studies involving pancreatic activity and glucose consumption were conducted to eliminate potential glucose dysfunction, strengthening the histological analyses.

      Weaknesses:

      The study could have explored identifying the extent of changes in the tRNA transcriptome among different cell types in the cerebrum. Although the authors attempted to show the temporal progression of tRNA transcriptome changes between P42 and P75 mice, the causal link was not established. A subsequent rescue experiment in the future could address this gap.

      Nonetheless, the claims and conclusions are supported by the presented data.

      We thank Reviewer 1 for their thoughtful review and commentary.  We appreciate the reviewer’s finding that our “claims and conclusions are supported by the presented data.”   

      We note that our findings on the temporal progression of transcriptional changes between P42 and P75 apply to both the Pol II and Pol III transcriptomes. Importantly, in the case of Pol III, only precursor and mature tRNAs are affected at P42 whereas at P75, numerous other Pol III transcripts are also changed. We therefore attribute the changes in tRNA as being causal in disease initiation since this is the earliest direct consequence of the Polr3a mutation.

      To expand on the evidence demonstrating the progressive nature of Polr3-related disease in our mouse model, the revised manuscript includes new immunofluorescence data showing no change in microglial cell density in the cerebral cortex or the striatum at an early stage in the disease (Supplementary Fig. S6F, G). This is in striking contrast to the findings at later times (P75), where microglia in the Polr3a mutant increased significantly in number and exhibited an activated morphology (Fig. 4G, H).

      We agree with the reviewer that it will be interesting in the future to assess the impact of the Polr3a mutation in different neural cell types and to explore opportunities for suppressing disease phenotypes. 

      Reviewer #2 (Public Review):

      Summary:

      The study "Molecular basis of neurodegeneration in a mouse model of Polr3 related disease" by Moir et.al. showed that how RNA Pol III mutation affects production, maturation and transport of tRNAs. Furthermore, their study suggested that RNA pol III mutation leads to behavioural deficits that are commonly observed in neurodegeneration. Although this study used a mouse model to establish these aspects, the study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour. They should have used conditional mouse to delete the gene in specific brain area to test their hypothesis. Otherwise, this study shows a more generalized developmental effect rather than specific function of altered tRNA level. This is very evident from their bulk RNA sequencing study. This study provides some discrete information rather than a coherent story. My enthusiasm for publication of this article in eLife is dampened considering following reasons mentioned in the weakness.

      Reviewer 2’s summary contains two misstatements: 

      Moir et al. showed how RNA Pol III mutation affects production, maturation and transport of tRNAs.

      Our experiments document the effect of a neurodegenerative disease-causing mutation in RNA polymerase III on the Pol III transcriptome with a particular focus on the tRNAome (i.e. the mature tRNA population). Experiments on the maturation and transport of tRNA were not performed as there was no indication that these processes might be negatively impacted at the earliest time point (P42). Additional comments about tRNA maturation and export are provided under points 8 and 9 (see below). 

      The study seems to lack a clear direction and mechanism as to how the altered level of tRNA affects locomotor behaviour.

      This comment misstates the purpose of our study while overlooking the important results. As stated in the abstract, our goal was to develop “a postnatal whole-body mouse model expressing pathogenic Polr3a mutations to examine the molecular mechanisms by which reduced Pol III transcription results primarily in central nervous system phenotypes.”

      Accordingly, our work provides the first molecular analysis of RNA polymerase III transcription in an animal model of Polr3-related disease. The novelty and importance of the findings, as stated in the abstract, include the discovery that a global reduction in tRNA levels (and not other Pol III transcripts) at an early stage in the disease precedes the frank induction of integrated stress and innate immune responses, activation of microglia and neuronal loss at later times. These later events readily account for the observed neurobehavioral deficits that collectively include risk assessment, locomotor, exploratory and grooming behaviors. 

      Strengths:

      The study created a mouse model to investigate role of RNA PolIII transcription. Furthermore, the study provided RNA seq analysis of the mutant mice and highlighted expression specific transcripts affected by the RNA PolIII mutation.

      Weaknesses:

      (1) The abstract is not clearly written. It is hard to interpret what is the objective of the study and why they are important to investigate. For example: "The molecular basis of disease pathogenesis is unknown." Which disease? 4H leukodystrophy? All neurodegenerative disease?

      We have modified the abstract to more clearly frame the objective of the study and its importance as reflected in the title “Molecular basis of neurodegeneration in a mouse model of Polr3-related disease”. We hope the reviewer will agree that the fourth sentence of the abstract, unchanged from the initial submission, clearly outlines the objective of the study.  

      (2) How cerebral pathology and exocrine pancreatic atrophy are related? How altered tRNA level connects these two axes?

      It is not known how cerebral pathology and exocrine pancreatic atrophy are related beyond their shared Pol III dysfunction in our mouse model of Polr3-related disease. We anticipate that altered tRNA levels connect these two axes. Indeed, the pancreas and the brain are both known to be highly sensitive to perturbations affecting translation (Costa-Mattioli and Walter, 2020 Science doi: 10.1126/science.aat5314). Changes to the tRNA population in the cerebrum and cerebellum of Polr3a mutant mice were extensively documented in the manuscript (e.g. Figs. 3, 5 and 6).  We also found reduced tRNA levels in the pancreas of the mutant mice but did not report these findings due to the absence of a stable reference transcript in total RNA from the atrophied pancreatic tissue, even at the earliest time point examined (P42). 

      (3) Authors mentioned that previously observed reduction mature tRNA level also recapitulated in their study. Why this study is novel then?

      Our study reports the novel finding that a pathogenic Polr3a mutation causes a global reduction in the steady state levels of mature tRNAs, i.e. the levels of all tRNA decoders were reduced with the vast majority of these reaching statistical significance (Fig. 6D and 6F). In the introduction we refer to several studies that examined the effect of pathogenic Polr3 mutations on the levels of Pol III-derived transcripts. We noted that these studies examined only a small number of Pol III transcripts in CRISPR-Cas9 engineered cell lines, patient-derived fibroblasts and patient blood. Thus, no study until now has tested for or reported a global defect in the abundance of mature tRNAs in any model of Polr3-related disease. Moreover, no previous study of Polr3-related disease has analyzed Pol III transcript levels in the brain or in any other tissue.

      (4) It is very intuitive that deficit in Pol III transcription would severely affect protein synthesis in all brain areas as well as other organs. Hence, growth defect observed in Polr3a mutant mice is not very specific rather a general phenomenon.

      While we agree with the simple assumption that a “deficit in Pol III transcription likely would affect protein synthesis in all brain areas as well as other organs”, this turned out not to be the case. In fact, a novel finding of our study is that not all Polr3a mutant tissues show a translation stress response despite reduced Pol III transcription and reduced mature tRNA levels. This implies that in some tissues the reduction in tRNA levels caused by the Polr3a mutation is not sufficient to affect protein synthesis, at least to a point where the Integrated Stress Response is induced. The underlying basis for the growth deficit has not been defined in this work. However, we noted in the discussion that a growth defect was previously seen in mice where expression of the Polr3a mutation was restricted to the Olig2 lineage.  In the present postnatal whole-body inducible model, we anticipate that the diminished growth of the mice results from a combination of hormonal and nutritional deficits caused by cerebral and pancreatic dysfunction.

      (5) Authors observed specific myelination defect in cortex and hippocampus but not in cerebellum. This is an interesting observation. It is important to find the link between tRNA removal and myelin depletion in hippocampus or cortex? Why is myelination not affected in cerebellum?

      We agree that the specific myelin defect observed in the cortex and hippocampus, but not the cerebellum, is an interesting observation. Pol III dysfunction in this model and reduced tRNA levels are common to both cerebra and cerebella, yet the pathological consequences differ between these regions.  While we do not know why this is the case, the cells that oligodendrocytes support in these regions are functionally different. We suggest in the discussion that subtle defects in oligodendrocyte function in the cerebellum may be uncovered using more sensitive or specific assays than the ones we have employed to date.  In addition, consistent with our findings in other tissues where Pol III transcription and tRNA levels are reduced but phenotypes are lacking, we suggest that oligodendrocytes in the cerebellum may have a different minimum threshold for Pol III activity than in other regions of the brain. 

      (6) How was the locomotor activity measured? The detailed description is missing. Also, locomotion is primarily cerebellum dependent. There is no change in term of growth rate and myelination in cerebellar neurons. I do not understand why locomotor activity was measured.

      We used a behavioral spectrometer with video tracking and pattern-recognition software to quantify ~20 home cage-like behaviors, including locomotor activity, as part of our phenotypic characterization of the mice. This experimenter-unbiased approach reported several metrics of locomotion, specifically, total Track length (the total distance traveled in the instrument), Center Track length and the time spent running (Run Sum) and standing still (Still Sum) in a longitudinal study (Figs. 2A-C and Supplemental Fig. S3A-C). The Materials and Methods section on mouse behavior has been amended to provide a detailed description of these experiments. 

      locomotion is primarily cerebellum dependent

      While we agree that the cerebellum plays a critical role in balance and locomotion, regions of the cerebrum that are affected in our mice, including the primary motor cortex and the basal ganglia (Fig. 4), also have important roles in locomotor activity and control. 

      (7) The correlation with behavioural changes and RNA seq data is missing. There a number of transcripts are affected and mostly very general factors for cellular metabolism. Most of them are RNA Pol II transcribed. How a Pol III mutation influences RNA Pol II driven transcription? I did not find differential expression of any specific transcripts associated with behavioural changes. What is the motivation for transcriptomics analysis? None of these transcripts are very specific for myelination. It is rather a general cellular metabolism effect that indirectly influences myelination.

      The differentially expressed mRNAs identified in our RNAseq analysis at P75 reflect both direct and secondary consequences of dysfunctional Pol III transcription on Pol II transcription. These effects can be achieved by multiple mechanisms. Induction of the Integrated Stress Response (ISR) due to insufficient tRNA can be considered a direct consequence of diminished Pol III transcription on Pol II transcription. An example of a secondary response is the activation of microglia and the innate immune response (which is known to accompany prolonged activation of the ISR), and the loss of neurons and oligodendrocytes. These changes are documented in Figs. 3 and 4. Importantly, loss of neurons, activated microglia and reduced oligodendrocyte numbers are each readily reconciled with changes in behavior.  

      None of these transcripts are very specific for myelination 

      The RNAseq data at P75 indicate only a modest reduction in oligodendrocyte-specific gene expression (as defined by single-cell RNAseq studies of purified cell populations, Mackenzie et al., 2018 Sci. Rep. doi: 10.1038/s41598-018-27293-5). Despite this, some oligodendrocyte-specific transcripts with well-known roles in myelination were down-regulated in the Polr3a mutant (e.g. Plp1, Mog and Mobp). In addition, steroid synthesis pathway transcripts involved in the production of cholesterol, an abundant and essential component of myelin, were also down-regulated (Supplementary Fig. S4E).

      (8) What genes identified by transcriptomics analysis regulates maturation of tRNA? Authors should at least perform RNAi study to identify possible factor and analyze their importance in maturation of tRNA.

      Of the many proteins involved in the maturation of tRNA (Phizicky and Hopper, 2023 RNA doi: 10.1261/rna.079620.123), RNAseq analysis at P75 identified only amino-acyl tRNA synthetases as being differentially-expressed (fold change >1.5, p adj. < 0.05, Table S1). These genes are canonical indicators of the ATF4-dependent Integrated Stress Response and their upregulation is widely interpreted as an attempt to restore efficient translation. In addition, our analysis of Pol III transcripts at P75 identified a reduction in the level of RppH1 (Fig. 3C), the RNA component of RNase P, which removes the 5’ leader of precursor tRNAs.  However, at P42, there was no effect on RppH1 abundance, or the expression of amino-acyl tRNA synthetase genes (Fig. 5C and Table S3).  Thus, an RNAi study to identify and analyze a possible factor involved in the maturation of tRNA is neither warranted nor relevant to the current body of work.

      (9) What factors are influencing tRNA transport to cytoplasm? It may be possible that Polr3a mutation affect cytoplasmic transport of tRNA. Authors should study this aspect using an imaging experiment.

      Our analysis of tRNA populations in this study employed total cellular RNA and thus reflect the abundance of mature tRNA from all cellular compartments. We have not assessed whether the reduction in tRNA abundance caused by the Polr3a mutation alters the dynamics of tRNA transport from the nucleus to the cytoplasm. However, we consider it highly unlikely that the Polr3a mutation would have a significant effect on cytoplasmic transport of tRNA. Imaging experiments along these lines are beyond the scope of the current study.

      (10) Does alteration of the cytoplasmic level of tRNA affect translation? Authors should perform a translation assay using bio-orthogonal amino acid (AHA) labelling.

      It is not known whether the reduced tRNA levels affect translation globally in the Polr3a mutant, but we predict that this may not be the case. Since tissues (heart and kidney) and brain regions (cerebrum and cerebellum) that share a decrease in tRNA abundance do not share activation of the Integrated Stress Response (a reporter of aberrant translation), we anticipate that effects on translation may be limited to specific regions or cell populations and to specific mRNAs within these cells. The current study provides the foundation for further work to address these questions.

      Reviewer #1 (Recommendations For The Authors):

      Below are a few comments, mostly regarding typographical errors, presentation, and clarity, that we believe would enhance this manuscript:

      On the heatmaps generated, it would be ideal to place "WT" before "KI," with "WT" on the left. This will maintain consistency with the rest of the manuscript, where "WT" conditions precede "KI" conditions, as observed in the bar graphs and dot plots.

      All heatmaps have been remade with WT on the left and KI on the right to maintain consistency throughout the manuscript. 

      Authors mentioned in several instances (Discussion Pg 19 Line 2, for instance) the analysis of changes in the "Pol II transcriptome." Is this a typographical error?

      The reference to the Pol II transcriptome is not a typographical error (Discussion Pg 19 Line 2). Here and elsewhere in the manuscript, we are distinguishing between changes to the Pol III transcriptome and the timing of subsequent changes to the Pol II transcriptome. The text has been edited to clarify this relationship in several places.

      (1) Introduction, Page 4, last paragraph.

      Analysis of the Pol III transcriptome reveals a common decrease in pre-tRNA and mature tRNA populations and few if any changes among other Pol III transcripts across multiple tissues. Analysis of the Pol II transcriptome reveals activation of the integrated stress response in cerebra but not in other surveyed tissues.

      (2) Results, page 8, 2nd paragraph

      To investigate the molecular changes to Pol III transcript levels caused by the Polr3a mutation and any secondary effects on the Pol II transcriptome, we initially focused on the cerebra of adult mice at P75.

      (3) Discussion, Page 19, second paragraph

      Pol III dysfunction and the reduction in the cerebral tRNA population at P42 coincides with behavioral deficits and precedes substantial downstream alterations in the Pol II transcriptome, which include induction of an innate immune response (IR) and an ISR, and indicators of neurodegeneration (i.e., activation of cell death pathways and loss of mitochondrial DNA). These findings suggest a causal role for the lower tRNA abundance and/or altered tRNA profile in disease progression.

      In supplementary figure 1, authors validated the expression of their systems using flow cytometry and observed a high level of recombination frequency in different tissue types. Can the flow cytometry data distinguish between cell types within the cerebrum (neurons/microglia/astrocytes)?

      The flow cytometry experiments reported in Supplementary Fig. S1 used a dual tdTomato-EGFP reporter to assess recombination. The cerebral and cerebellar samples were gated on fluorescence from endogenous expression of tdTomato (red), EGFP (green) and DAPI (blue) staining. In principle, flow cytometry could be used to distinguish between cell types within the cerebrum (neurons/microglia/astrocytes). However,  this would require (i) an antibody to a cell surface marker on the cell type of interest and (ii) a fluorescent probe conjugated to the primary antibody or a fluorescent secondary antibody that is spectrally well resolved from the emission spectra of tdTomato, eGFP and DAPI.

      Results section 1: Is there any particular reason why P28 was chosen as the commencement of tamoxifen injection?

      P28 was chosen so that any effect of the Polr3a mutation on development and differentiation would be limited in the tissues we examined. 

      Fig 1C: The number of asterisks does not match between the graph and the figure legend.

      Fig. 1C has been corrected to match the number of asterisks in the graph and figure legend.

      Results section 3:

      This section seemed a little brief, especially when compared to the depth of the succeeding sections. Authors can state in greater detail which behaviors were quantified. In S3A-C, my understanding is that the animals were placed in an open-field test. This procedure can be briefly mentioned in the methods, as well as in the main manuscript text.

      In the legends of S3, a bracket is missing for "(D-F)" on line 5. Additionally, the alignment of legends for each bar graph could be consistent for all graphs except under the condition of spatial constraint.

      Detailed methods pertaining to the measurement and calculation of home cage-like behaviors reported by the behavioral spectrometer have been added to the Methods section on Mouse Behavior. 

      In the Results, Figs. S3A-C show anxiety-like behaviors, which measure the number and duration of visits and the distance traveled in a 15 cm² central area of the arena. Figs. 2A-C show locomotor behaviors including Track length, Run Sum and Still Sum. The open field-like behavior is reported as total Track length in the behavioral spectrometer, i.e. the total distance traveled in the arena. This is now more clearly described in the main manuscript and the Methods section: “overall locomotor activity was decreased in Polr3a-tamKI mice as indicated by the reduced track length at P42, P49, P56 and P63 (Fig. 2A).”

      The legend of S3 now has the missing bracket "(D-F)" on line 5.

      The legends within each bar graph are now consistent and aligned as much as spatial constraints allow.

      Results section 4:

      Similar to our earlier questions for S1, is it possible to distinguish samples derived from different cell types (neurons/glia)? In figure 4, this is mainly done post-hoc, based on the known gene expression. Maybe the authors could discuss this small limitation? In Fig S4C, the color contrast for the heatmap legend needs to be corrected.

      It is not possible to accurately distinguish different neural cell sub-types, such as different types of neurons or different types of oligodendrocytes, in bulk RNAseq. Hence, we have reported only high confidence correlations based on known gene expression signatures (Fig. 4). We discuss only the data for which we can draw confident conclusions. The heatmap and legend in Fig. S4C have been amended.

      Results section 5:

      In figure S5A, the alignment of asterisk significance markers could be adjusted.

      Asterisks have been realigned in Fig. S5A.

      Reviewer #2 (Recommendations For The Authors):

      Methods Section should include detailed procedure.

      A detailed description of the methods pertaining to the measurement and calculation of behaviors using the behavioral spectrometer has been added to the Methods section.

      Statistical tests should have detailed information

      Statistical tests are detailed in the Methods section “Statistical Analysis”. Additional details pertaining to calculations of behavioral data have been added to the “Mouse behavior” section of the Methods.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Weaknesses:

      The authors have clarified that the first features available for each patient have been used. However, they have not shown that these features did not occur before the time of post-stroke epilepsy. Explicit clarification of this should be performed.

      The data utilized in our analysis were collected during the first examination or test conducted after the patients' admission. We specifically excluded any patients with a history of epilepsy, ensuring that all cases of epilepsy identified in our study occurred after admission. Therefore, the features we analyzed were collected after the patients' admission but prior to the onset of post-stroke epilepsy.

      Reviewer #3 (Public review):

      Weaknesses:

      The writing of the article may be significantly improved.

      Although the external validation is appreciated, cross-validation to check robustness of the models would also be welcome.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of reported results, especially when a dataset is small. We revised our code and added a 5-fold cross-validation version; it yielded little improvement because our model had already reached an AUC of 0.99. Given that our dataset contains more than 20000 records, we consider a 7:3 train/test split sufficient for training the model. We have uploaded the 5-fold cross-validation code, together with the plotted ROC curves for the five test folds, to GitHub at https://github.com/conanan/lasso-ml/lasso_ml_cross_validation.ipynb as an external resource. We trained the 5-fold average model and plotted the ROC curve for each test fold; the results show some improvement, but it is not substantial because the best-performing models are still tree-based models.

      External validation results may be biased/overoptimistic, since the authors stated that "The external validation cohort focused more on collecting positive cases to examine the model's ability to identify positive samples", which may result in overoptimistic PPV and Sensitivity estimations. The specificity for the external validation set has not been disclosed.

      Thank you for your valuable feedback regarding the external validation results. We appreciate your concerns about potential bias and overoptimism in our estimations of positive predictive value (PPV) and sensitivity.

      To clarify, we have uploaded the code for external validation on GitHub at https://github.com/conanan/lasso-ml. The results indicate that the PPV is 0.95 and the specificity is 0.98.

      While we focused on collecting more positive cases due to their lower occurrence rate, this approach allows us to better evaluate the model's ability to predict positive samples, which is crucial in clinical settings. We believe that emphasizing positive cases enhances the model's utility for practical applications, so we consider a small degree of overoptimism acceptable.
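
The dependence of PPV on the proportion of positive cases can be illustrated with Bayes' rule. The sketch below is only an illustration: the specificity of 0.98 is the value quoted above, while the sensitivity of 0.90 and the three prevalence values are assumptions chosen for the example, not results from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value computed from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Specificity as quoted in the response; sensitivity is an assumed value.
SPECIFICITY = 0.98
ASSUMED_SENSITIVITY = 0.90

# Hypothetical proportions of positive cases in a validation cohort.
for prevalence in (0.05, 0.20, 0.50):
    value = ppv(ASSUMED_SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={value:.3f}")
```

With sensitivity and specificity held fixed, PPV rises as the share of positive cases rises, which is why a cohort enriched for positive cases tends to report a higher PPV than would be seen at the natural occurrence rate.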


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses 1:

      The methodology needs further consideration. The Discussion needs extensive rewriting.

      Thank you for your advice; we have revised the Discussion.

      Reviewer #2 (Public Review):

      Weaknesses 2:

      There are many typos and unclear statements throughout the paper.

      There are some issues with SHAP interpretation. SHAP in its default form, does not provide robust statistical guarantees of effect size. There is a claim that "SHAP analysis showed that white blood cell count had the greatest impact among the routine blood test parameters". This is a difficult claim to make.

      Thank you for your suggestion. We agree that the SHAP analysis is simply a means of interpreting the model. In our research, we compared the SHAP analysis with traditional statistical methods, such as regression analysis. We found the SHAP results to be consistent with the statistical results from the regression for variables like white blood cell count (see Table 1). This alignment leads us to believe the SHAP analysis is providing reliable insights in this context.

      The Data Collection section is very poorly written, and the methodology is not clear.

      Thanks for your advice, we have revised the Data Collection section.

      There is no information about hyperparameter selection for models or whether a hyperparameter search was performed. Given this, it is difficult to conclude whether one machine learning model performs better than others on this task.

      Thank you for the advice on hyperparameter selection. We used the sklearn, xgboost and lightgbm packages in Python 3.10 to construct the models and had previously left the default settings unchanged. This was not appropriate and may have led to less certain conclusions. We have now carried out a grid search to select and optimize the hyperparameters, which improved the models. The best model is still RF.
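
A grid search of this kind can be sketched with scikit-learn's GridSearchCV. This is a minimal sketch under stated assumptions, not the code used in the study: the data are synthetic (make_classification), and the parameter grid, the 3-fold inner cross-validation and the ROC AUC scoring are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the clinical feature matrix (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# 7:3 split, stratified so both sets keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative grid; the values searched in the study are not stated.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,  # inner cross-validation on the training set only
)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("mean CV AUC of best setting:", round(search.best_score_, 3))

# The refitted best model is scored once on the held-out test set.
test_auc = search.score(X_test, y_test)
print("held-out test AUC:", round(test_auc, 3))
```

Tuning on the training portion only, and scoring the held-out portion once, keeps the test estimate independent of the hyperparameter choice.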

      The inclusion and exclusion criteria are unclear - how many patients were excluded and for what reasons?

      The selection procedure is shown in Figure 1. The stroke database contained 42079 records in total. We first excluded hemorrhagic stroke (4565), history of stroke (2154), TIA (3570), stroke of unclear cause (561) and records missing important data (6496), which left 24733 patients diagnosed with new-onset ischemic or lacunar stroke. We then excluded patients whose seizures might be attributed to other potential causes (brain tumor, intracranial vascular malformation, traumatic brain injury, etc.) (865), patients who had a history of seizures (152) or died in hospital (1444), and patients who were lost to follow-up (no outpatient records and could not be contacted by phone) or died within 3 months of the stroke (813). In total, 21459 cases were included in this study.
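
The counts above can be checked with a few lines of arithmetic. The sketch assumes that the first five exclusions are applied to the 42079 database records and the remaining four to the 24733 patients with new-onset ischemic or lacunar stroke; this grouping is a reading of the text that the totals support, and the dictionary labels are shorthand introduced here.

```python
total_records = 42079

# Exclusions assumed to be applied to the full database.
first_stage = {
    "hemorrhagic stroke": 4565,
    "history of stroke": 2154,
    "TIA": 3570,
    "stroke of unclear cause": 561,
    "missing important data": 6496,
}

# Exclusions assumed to be applied to the remaining patients.
second_stage = {
    "other potential seizure causes": 865,
    "seizure history": 152,
    "died in hospital": 1444,
    "lost to follow-up or died within 3 months": 813,
}

after_first_stage = total_records - sum(first_stage.values())
final_cohort = after_first_stage - sum(second_stage.values())

print("after first stage:", after_first_stage)
print("final cohort:", final_cohort)
```

Under this reading the intermediate and final totals come out at 24733 and 21459, matching the figures reported in the response.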

      There is no sensitivity analysis of the SMOTE methodology: How many synthetic data points were created, and how does the number of synthetic data points affect classification accuracy?

      Thank you for the reminder. We have accepted this advice and changed SMOTE to the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning. The code is

      from imblearn.combine import SMOTEENN

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      Did the authors achieve their aims? Do the results support their conclusions?

      Yes, we have achieved some of our aims in predicting PSE, although some problems remain.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage.

      The data used in our analysis is from the first examination or test conducted after the patients' admission, retrieved from a PostgreSQL database. First, we extracted the initial admission date for patients admitted due to stroke. Then, we identified the nearest subsequent examination data for each of those patients.

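      The SQL code is as follows:

      -- Earliest cerebral or lacunar infarction diagnosis date for one patient.
      -- {} is a placeholder for the patient's person_id.
      -- Pattern meanings: 梗死 / 梗塞 = infarction, 脑 = brain, 腔隙 = lacunar.
      SELECT TO_DATE(condition_start_date, 'DD-MM-YYYY') AS DATE
      FROM diagnosis
      WHERE person_id = {}
        AND (condition_name LIKE '%梗死%' OR condition_name LIKE '%梗塞%')
        AND (condition_name LIKE '%脑%' OR condition_name LIKE '%腔隙%')
      ORDER BY DATE LIMIT 1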
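
The temporal ordering described above (admission, then the nearest subsequent examination, then any PSE onset) can be expressed as a simple check. This is a sketch only: the function names and the example dates are hypothetical and are not taken from the study's code or data.

```python
from datetime import date
from typing import Optional, Sequence


def first_exam_after_admission(
    admission: date, exam_dates: Sequence[date]
) -> Optional[date]:
    """Return the earliest examination on or after the admission date."""
    candidates = [d for d in exam_dates if d >= admission]
    return min(candidates) if candidates else None


def is_leak_free(exam: Optional[date], pse_onset: Optional[date]) -> bool:
    """True if the exam exists and precedes PSE onset (or PSE never occurred)."""
    if exam is None:
        return False
    return pse_onset is None or exam < pse_onset


# Hypothetical patient: admitted 3 March, first seizure in June.
admission = date(2020, 3, 3)
exams = [date(2020, 2, 20), date(2020, 3, 4), date(2020, 7, 1)]
pse_onset = date(2020, 6, 15)

chosen = first_exam_after_admission(admission, exams)
print(chosen, is_leak_free(chosen, pse_onset))
```

Running such a check over every patient would give an explicit count of records whose features were measured before PSE onset, which is the clarification the reviewer asked for.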

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. The external validation is certainly very important, but there have been some difficulties in reaching a perfect solution.  We have tried using open-source databases like the MIMIC database, but the data there does not fit our needs as closely as the records from our own hospital.  The MIMIC database lacks some of the key features we require, and also lacks the detailed patient follow-up information that is crucial for our analysis.   Given these limitations, we have decided to collect newer records from the same hospitals here in Chongqing.  We believe this will allow us to build a more comprehensive dataset to support robust external validation.  While it may not be a perfect solution, gathering this additional data from our local healthcare system is a pragmatic step forward.   Looking ahead, we plan to continue expanding this Chongqing-based dataset and report on the results of the greater external validation in the future.  We are committed to overcoming the challenges around data availability to strengthen the validity and generalizability of our research findings.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step to ensure the reliability and robustness of reported results, especially when a dataset is small. Because we have more than 20000 records, we consider a 7:3 train/test split sufficient for training the model. We nevertheless revised our code and ran a 5-fold cross-validation version; it yielded little improvement because our model had already reached an AUC of 0.99. We may use this technique in future studies if fewer cases are available.
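
Reporting a mean score with a confidence interval across folds, as the reviewer suggests, could look like the sketch below. It is an illustration under assumptions: the data are synthetic, the random forest settings are arbitrary, and the interval is a simple t-based interval over the five fold scores (t = 2.776 for 4 degrees of freedom), which is only one of several ways to summarize fold-to-fold variation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the clinical data (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, weights=[0.9, 0.1], random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)

# Stratified folds keep the positive rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

mean_auc = scores.mean()
# Two-sided 95% t interval with 4 degrees of freedom (critical value 2.776).
half_width = 2.776 * scores.std(ddof=1) / np.sqrt(len(scores))

print("fold AUCs:", np.round(scores, 3))
print(f"mean AUC = {mean_auc:.3f} +/- {half_width:.3f}")
```

If resampling such as SMOTEENN is used, it would normally be fitted inside each training fold rather than on the full dataset, so that synthetic points never reach the fold being scored.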

      Additional context that might help readers

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      Thank you for your helpful advice, which has greatly improved our draft. We have added an explanation: the force plot for the first patient shows the influence of each feature on that patient's prediction. A prolonged APTT contributes most towards PSE, followed by the AST level and other features, whereas the NIHSS score, which may be low, pushes the prediction in the opposite direction. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Public Review):

      Weaknesses 3:

      There are issues with the readability of the paper. Many abbreviations are not introduced properly and sometimes are written inconsistently. A lot of relevant references are omitted. The methodological descriptions are extremely brief and, sometimes, incomplete.

      Thank you for your advice; we have corrected these flaws.

      The dataset is not disclosed, and neither is the code (although the code is made available upon request). For the sake of reproducibility, unless any bioethical concerns impede it, it would be good to have these data disclosed.

      Thank you for your recommendations. We have made the code available on GitHub at https://github.com/conanan/lasso-ml. The data, however, are private and belong to the hospital. Access can be requested by contacting the corresponding author to apply to the hospitals, specifying the purpose of the inquiry.

      Although the external validation is appreciated, cross-validation to check the robustness of the models would also be welcome.

      Thank you for your valuable advice. Performing n-fold cross-validation is crucial for ensuring the reliability and robustness of results, especially with limited datasets. However, since we have over 20,000 records, we believe that a 70:30 split for training and testing is sufficient.

      We revised our code and implemented 5-fold cross-validation, which provided minimal improvement, as our model has already achieved an AUC of 0.99. We plan to use this technique in future studies if we encounter fewer cases.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My comments include two parts:

      (1) Methodology

      a-This study was based on multiple clinical indicators to construct a model for predicting the occurrence of PSE. It involved various multi-class indicators such as the affected cortical regions, locations of vascular occlusion, NIHSS scores, etc. Only using the SHAP index to explain the impact of multi-class variables on the dependent variable seems slightly insufficient. It might be worth considering the use of dummy variables to improve the model's accuracy.

      Thank you for the detailed feedback on the study methodology. SHAP analysis is simply a means of interpreting the model. We compared the SHAP results in combination with traditional statistics, so we consider the SHAP analysis reliable in this research. We did use dummy variables, especially for the affected cortical regions and the locations of vascular occlusion; for example, if the frontal region is involved, the variable is 1. However, they have less impact in the machine learning model.

      b-The study used Lasso regression to select 20 features to build the model. How was the optimal number of 20 features determined?

      Lasso regression is a commonly used feature screening method. Because we extracted information from the database and tried to include as many features as possible, the cross-validation curve of the lasso regression is optimal with 78 features, but this would make the model too complex. We therefore built models with 10, 15, 20, 25, and 30 features. With 20 features, the model performs well and remains relatively concise. Increasing the number of features contributes little to model performance, whereas reducing it harms the model; for example, the AUC of the model with 15 features drops below 0.95. We therefore selected 20 features.
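      The ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the feature names and coefficient values are hypothetical, and `top_k_features` is a helper introduced only for this example. It ranks features by the absolute value of their LASSO coefficient and lists the candidate subsets that would then each be used to train and score a model.

```python
def top_k_features(coefficients, k):
    """Return the k feature names with the largest absolute coefficient."""
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical LASSO coefficients (illustrative values, not from the study).
lasso_coefficients = {
    "aptt": 0.42,
    "ast": -0.31,
    "nihss": 0.27,
    "d_dimer": 0.05,
    "crp": -0.01,
    "hba1c": 0.0,
}

# Each candidate subset would be used to fit a model and compare its AUC.
for k in (2, 3, 4):
    print(k, top_k_features(lasso_coefficients, k))
```

      In the response, the candidate sizes were 10, 15, 20, 25, and 30, and the subset size was chosen by comparing model performance across them.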

      c-The study indicated that the incidence rate of PSE in the enrolled patients is 4.3%, showing a highly imbalanced dataset. If singly using the SMOTE method for oversampling, could this lead to overfitting?

      Thank you for the reminder; using SMOTE alone for oversampling is improper. We have made this improvement and replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning. It first oversamples with SMOTE and then undersamples with ENN to remove possible noise and duplicate samples. The code is

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.
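      To make the two steps concrete, the following is a conceptual pure-Python sketch of SMOTE-style oversampling followed by edited-nearest-neighbour cleaning. It is not the imblearn implementation and not the authors' code: the data are synthetic, the helper names are hypothetical, and the cleaning rule used here (drop a sample when the majority of its k nearest neighbours carry a different label) is a simplification of what a library implementation may do.

```python
import random
from collections import Counter


def distance_sq(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, candidates, k):
    """Return the k candidates closest to point, excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: distance_sq(point, c))[:k]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbour = rng.choice(nearest(base, minority, k))
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def edited_nearest_neighbours(samples, k):
    """Keep only samples whose label matches the majority label of their
    k nearest neighbours."""
    kept = []
    for sample in samples:
        point, label = sample
        others = [s for s in samples if s is not sample]
        neighbours = sorted(others, key=lambda s: distance_sq(point, s[0]))[:k]
        majority = Counter(lbl for _, lbl in neighbours).most_common(1)[0][0]
        if majority == label:
            kept.append(sample)
    return kept


rng = random.Random(42)
# Synthetic, imbalanced two-dimensional data (40 negatives, 4 positives).
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(4)]

synthetic = smote(minority, n_new=len(majority) - len(minority), k=3, rng=rng)
labelled = [(p, 0) for p in majority] + [(p, 1) for p in minority + synthetic]
cleaned = edited_nearest_neighbours(labelled, k=3)

print(Counter(lbl for _, lbl in labelled), Counter(lbl for _, lbl in cleaned))
```

      Resampling of this kind is normally applied to the training split only, so that the test split keeps the original class distribution.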

      (2) Clinical aspects:

      Line 8, history of ischemic stroke, this is misexpression, could be: diagnosis of ischemic stroke.

      Line 8, several hospitals, should be more exact; how many?

      Line 74 indicates that the data are from a single centre, this should be clarified.

      Line 4 data collection: The criteria read unclear; please clarify further.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Line 110, lab parameters: Why is there no blood glucose?

      Because many patients' blood glucose fluctuates greatly and is easily affected by drugs or diet, we consulted experts and chose HbA1c, which is more stable, as the reference index.

      Line 295, The author indicated that data were lost; this should be clarified in the results part, and further, the treatment of missing data should be clarified in the method part.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      I hope to see a table of the cohort's baseline characteristics. The discussion needs extensive rewriting; the author seems to be swinging between the stroke outcome and the seizure, sometimes losing the target.

      Figure 1 shows the patient selection procedure. Table 1 contains the cohort's baseline characteristics.

      As for the swinging between the stroke outcome and the seizure: there are few articles on predicting epilepsy directly from relevant indicators, while there are more articles on prognosis. We can therefore only treat epilepsy as an important factor in prognosis and discuss it comprehensively; otherwise, we could not find enough articles to discuss.

      Reviewer #2 (Recommendations For The Authors):

      There are typos and examples of text that are not clear, including:

      "About the nihss score, the higher the nihss score, the more likely to be PSE, nihss score has a third effect just below white blood cell count and D-dimer."

      "and only 8 people made incorrect predictions, demonstratijmng a good predictive ability of the model."

      "female were prone to PSE"

      " Waafi's research"

      "One-heat' (should be one-hot)

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The Data Collection section is poorly written, and the methodology is not clear. It would be much more appropriate to include a table of all features used and an explanation of what these features involve. It would also be useful to see the mean values of these features to assess whether the feature values are reasonable for the dataset.

      Thank you for the reminder. All data are from the first examination or test after admission and were retrieved from a PostgreSQL database. We first extracted the first date on which each patient was admitted for stroke, then extracted information from the examination nearest to admission. We extracted the data with SQL code rather than manually, as others may do, so we obtained as much data as possible rather than only the features that had been reported before. The table of all features used and their mean ± SD is Table 1.

      The paper does not clarify the features' temporal origins. If some features were not recorded on admission to the hospital but were recorded after PSE occurred, there would be temporal leakage. I would need this clarified before believing the authors achieved their claims of building a predictive model.

      All relevant index results were from the first examination after admission, and the means and standard deviations are listed in Table 1 in the statistical analysis section.

      The authors claim that their models can predict PSE. To believe this claim, seeing more information on out-of-distribution generalisation performance would be helpful. There is limited reporting on the external validation cohort relative to the reporting on train and test data.

      Thank you for the advice. External validation is very important, but a perfect one is difficult to achieve. We tried some open-source databases, such as the MIMIC database, but these data do not meet our needs because they have fewer features than our hospital's data and lack follow-up of the relevant patients. In the end, we collected newer records from the same hospitals in Chongqing; we will collect more and report a larger external validation in the future.

      For greater certainty on all reported results, it would be most appropriate to perform n-fold cross-validation, and report mean scores and confidence intervals across the cross-validation splits.

      Thank you for your helpful advice. Performing n-fold cross-validation is a crucial step for ensuring the reliability and robustness of the reported results, especially for datasets of limited size. Because our dataset contains more than 20,000 records, we consider a 7:3 split for training and testing to be sufficient. We revised our code and ran a 5-fold cross-validation version, which yielded little improvement; we will use this technique in our next study.
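      The reporting the reviewer asks for (mean score and confidence interval across folds) can be sketched as follows. This is a minimal illustration under stated assumptions: the per-fold AUC values are hypothetical, the helper names are introduced only for this example, and the interval uses a Student t approximation that treats the folds as independent, which is only approximately true in cross-validation.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices once and split them into k equal-sized
    (or nearly equal-sized) folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_and_ci(scores, t_critical):
    """Return the mean and the interval mean +/- t * s / sqrt(n)."""
    mean = statistics.mean(scores)
    half_width = t_critical * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, mean - half_width, mean + half_width


folds = k_fold_indices(n_samples=20, k=5)
for fold_number, test_idx in enumerate(folds):
    # Every fold serves once as the test set; the rest form the training set.
    train_idx = [j for fold in folds if fold is not test_idx for j in fold]
    print(fold_number, len(train_idx), len(test_idx))

# Hypothetical per-fold AUC values (not results from the study).
fold_auc = [0.988, 0.991, 0.990, 0.987, 0.992]
# 2.776 is the two-sided 95% Student t critical value for 4 degrees of freedom.
mean, low, high = mean_and_ci(fold_auc, t_critical=2.776)
print(round(mean, 4), round(low, 4), round(high, 4))
```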

      The authors show force plots and decision plots from SHAP values. These plots are non-trivial to interpret, and the authors should include an explanation of how to interpret them.

      This is a great improvement for our draft. We have added an explanation: the force plot for the first patient shows the influence of each feature on that patient's prediction. A long APTT contributes most to PSE, followed by the AST level and others, whereas the NIHSS score may be low and contributes less to the final result. The decision plot is a collection of model decisions that shows how complex models arrive at their predictions.

      Reviewer #3 (Recommendations For The Authors):

      Abbreviations should not be defined in the abstract (or only in the abstract).

      Please make explicit the purposes of the studies you are referring to in "Currently, most studies utilize clinical data to establish statistical models, survival analysis and cox regression."

      Authors affirm: "there is still a relative scarcity of research on PSE prediction, with most studies focusing on the analysis of specific or certain risk factors." This statement is especially curious since the current study uses risk factors as predictors.

      It is not clear to me what the authors mean by "No study has proposed or established a more comprehensive and scientifically accurate prediction model." The authors do not summarize the statistical parameters of previously reported models, or other relevant data to assess coverage or validity (maybe including a Table summarizing such information would be appropriate). In any case, I would try to omit statements that imply, to some extent, discrediting previous studies without sufficient foundation.

      "antiepileptic drugs" is an outdated name. Please use "antiseizure medications"

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The authors say regarding missing data that they "filled the data of the remaining indicators with missing values of more than 1000 cases by random forest algorithm". Please clarify what you mean by "of more than 1000 cases." Also, provide details on the RF model used to fill in missing data.

      Thank you for the reminder. "of more than 1000 cases" was an erroneous phrase and we have corrected it. The procedure is as follows. First, we collected the values of all laboratory indicators from the first test after stroke admission (everyone admitted because of stroke undergoes routine blood tests, liver and kidney function tests, and so on), excluded indicators with more than 10% missing values, and filled the missing values of the remaining indicators with a random forest algorithm using the default parameters. We go through all the features, starting with the one with the fewest missing values (since filling it requires the least accurate information). When filling in a feature, the missing values of the other features are replaced with 0. Each time a regression prediction is completed, the predicted values are placed in the original feature matrix and the next feature is filled in. After going through all the features, the data filling is complete.
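      The filling order described above can be sketched as follows. This is an illustration of the control flow only, under stated assumptions: the data are made up, the helper names are hypothetical, and the random forest regressor is replaced by a trivial nearest-neighbour predictor so that the sketch runs without third-party libraries.

```python
def nearest_neighbour_predict(train_x, train_y, query):
    """Stand-in regressor (NOT a random forest): return the target value
    of the training row closest to the query row."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def impute(rows):
    """Fill None entries column by column, fewest missing values first."""
    n_cols = len(rows[0])
    filled = [list(r) for r in rows]
    order = sorted(range(n_cols), key=lambda c: sum(r[c] is None for r in rows))
    for col in order:
        missing = [i for i, r in enumerate(filled) if r[col] is None]
        if not missing:
            continue

        def predictors(row):
            # All other columns; values still missing are temporarily set to 0.
            return [0.0 if v is None else v for c, v in enumerate(row) if c != col]

        observed = [i for i, r in enumerate(filled) if r[col] is not None]
        train_x = [predictors(filled[i]) for i in observed]
        train_y = [filled[i][col] for i in observed]
        for i in missing:
            # Predicted values are written back before the next column is filled.
            filled[i][col] = nearest_neighbour_predict(train_x, train_y, predictors(filled[i]))
    return filled


# Made-up laboratory matrix: rows are patients, columns are indicators.
rows = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, None],
    [3.0, None, None],
    [2.1, 21.0, 210.0],
    [3.1, 31.0, 310.0],
]
completed = impute(rows)
for row in completed:
    print(row)
```

      Swapping `nearest_neighbour_predict` for a fitted random forest regressor would recover the procedure in the response; the loop structure stays the same.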

      Please specify what you mean by negative group and positive group. Avoid tacit assumptions.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Please provide more details (and references) on the smote oversampling method. Indicate any relevant parameters/hyperparameters.

      Thank you for the reminder. We have accepted this advice and replaced SMOTE with the SMOTEENN (Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors) technique to resample the imbalanced dataset for machine learning. The code is

      smoteenn = SMOTEENN(sampling_strategy='auto', random_state=42)

      The SMOTEENN class comes from the imblearn library. The sampling_strategy='auto' parameter tells the algorithm to automatically determine the appropriate sampling strategy based on the class distribution. The random_state=42 parameter sets a seed for the random number generator, ensuring reproducibility of the results.

      The methodology is presented in an extremely succinct and non-organic manner (e.g., "(Model building) Select the 20 features with the largest absolute value of LASSO."). Please try to improve the narrative.

      Lasso regression is a commonly used feature screening method. Because we extracted information from the database and tried to include as many features as possible, the cross-validation curve of the lasso regression is optimal with 78 features, but this would make the model too complex. We therefore built models with 10, 15, 20, 25, and 30 features. With 20 features, the model performs well and remains relatively concise. Increasing the number of features contributes little to model performance, whereas reducing it harms the model; for example, the AUC of the model with 15 features drops below 0.95. We therefore selected 20 features.

      Many passages of the text need references. For example, those that refer to Levene test, Welch's t-test, Brier score, Youden index, and many others (e.g., NIHSS score). Please revise carefully.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      "Statistical details of the clinical characteristics of the patients are provided in the table." Which table? Number?

      Thank you for the reminder; we have revised the draft and corrected these errors. It is Table 1.

      Many abbreviations are not properly presented and defined in the text, e.g., wbc count, hba1c, crp, tg, ast, alt, bilirubin, bua, aptt, tt, d_dimer, ck. Whereas I can guess the meaning, do not assume everyone will. Avoid assumptions.

      ROC is sometimes written "ROC" and others, "roc." The same happens for PPV/ppv, and many other words (SMOTE; NIHSS score, etc.).

      Please rephrase "ppv value of random forest is the highest, reaching 0.977, which is more accurate for the identification of positive patients(the most important function of our models).". PPV always refers to positive predictions that are corroborated, so the sentence seems redundant.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      What do you mean by "Complex algorithms"? Please try to be as explicit as possible. The text looks rather cryptic or vague in many passages.

      Thank you for the reminder; "Complex algorithms" has been replaced with "machine learning".

      The text needs a thorough English language-focused revision, since the sense of some sentences is really misleading. For instance "only 8 people made incorrect predictions,". I guess the authors try to say that the best algorithm only mispredicted 8 cases since no people are making predictions here. Also, regarding that quote... Are the authors still speaking of the results of the random forest model, which was said to be one of the best performances?

      Thank you for the reminder; we have revised the draft and corrected these errors.

      The authors say that they used, as predictors "comprehensive clinical data, imaging data, laboratory test data, and other data from stroke patients". However, the total pool of predictors is not clear to me at this point. Please make it explicit and avoid abbreviations.

      Thank you for the reminder; we have revised the draft and corrected these errors.

      Although the authors say that their code is available upon request, I think it would be better to have it published in an appropriate repository.

      Thank you for the reminder; we have made our code available at https://github.com/conanan/lasso-ml.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors investigated how the presence of interspecific introgressions in the genome affects the recombination landscape. This research was intended to inform about genetic phenomena influencing the evolution of introgressed regions, although it should be noted that the research itself is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. In this work, yeast hybrids with large (from several to several dozen percent of the chromosome length) introgressions from another yeast species were crossed. Then, the products of meiosis were isolated and sequenced, and on this basis, the genome-wide distribution of both crossovers (COs) and noncrossovers (NCOs) was examined. Carrying out the analysis at different levels of resolution, it was found that in the regions of introgression, there is a very significant reduction in the frequency of COs and a simultaneous increase in the frequency of NCOs. Moreover, it was confirmed that introgressions significantly limit the local shuffling of genetic information, and NCOs are only able to slightly contribute to the shuffling, thus they do not compensate for the loss of CO recombination.

      Strengths:

      - Previously, experiments examining the impact of SNP polymorphism on meiotic recombination were conducted either on the scale of single hotspots or the entire hybrid genome, but the impact of large introgressed regions from another species was not examined. Therefore, the strength of this work is its interesting research setup, which allows for providing data from a different perspective.

      - Good quality genome-wide data on the distribution of CO and NCO were obtained, which could be related to local changes in the level of polymorphism.

      Weaknesses:

      (1)  The research is based on examining only one generation, which limits the possibility of drawing far-reaching evolutionary conclusions. Moreover, meiosis is stimulated in hybrids in which introgressions occur in a heterozygous state, which is a very unlikely situation in nature. Therefore, I see the main value of the work in providing information on the CO/NCO decision in regions with high sequence diversification, but not in the context of evolution.

      While we are indeed only examining recombination in a single generation, we respectfully disagree that our results aren't relevant to evolutionary processes. The broad goals of our study are to compare recombination landscapes between closely related strains, and we highlight dramatic differences between recombination landscapes. These results add to a body of literature that seeks to understand the existence of variation in traits like recombination rate, and how recombination rate can evolve between populations and species. We show here that the presence of introgression can contribute to changes in recombination rate measured in different individuals or populations, which has not been previously appreciated. We furthermore show that introgression can reduce shuffling between alleles on a chromosome, which is recognized as one of the most important determinants for the existence and persistence of sexual reproduction across all organisms. As we describe in our introduction and conclusion, we see our experimental exploration of the impacts of introgression on the recombination landscape as complementary to studies inferring recombination and introgression from population sequencing data and simulations. There are benefits and challenges to each approach, but both can help us better understand these processes. In regards to the utility of exploring heterozygous introgression, we point out that introgression is often found in a heterozygous state (including in modern humans with Neanderthal and/or Denisovan ancestry). Introgression will always be heterozygous immediately after hybridization, and depending on the frequency of gene flow into the population, the level of inbreeding, selection against introgression, etc., introgression will typically be found as heterozygous.

      - The work requires greater care in preparing informative figures and, more importantly, re-analysis of some of the data (see comments below).

      More specific comments:

      (1) The authors themselves admit that the detection of NCO, due to the short size of conversion tracts, depends on the density of SNPs in a given region. Consequently, more NCOs will be detected in introgressed regions with a high density of polymorphisms compared to the rest of the genome. To investigate what impact this has on the analysis, the authors should demonstrate that the efficiency of detecting NCOs in introgressed regions is not significantly higher than the efficiency of detecting NCOs in the rest of the genome. If it turns out that this impact is significant, analyses should be presented proving that it does not entirely explain the increase in the frequency of NCOs in introgressed regions.

      We conducted a deeper exploration of the effect of marker resolution on NCO detection by randomly removing different proportions of markers from introgressed regions of the fermentation cross in order to simulate different marker resolutions from non-introgressed regions. We chose proportions of markers that would simulate different quantiles of the resolution of non-introgressed regions and repeated our standard pipeline in order to compare our NCO detection at the chosen marker densities. More details of this analysis have been added to the manuscript (lines 188-199, 525-538). We confirmed the effect of marker resolution on NCO detection (as reported in the updated manuscript and new supplementary figures S2-S10, new Table S10) and decided to repeat our analyses on the original data with a more stringent correction. For this we chose our observed average tract size for NCOs in introgressed regions (550bp), which leads to a far more conservative estimate of NCO counts (As seen in the updated Figure 2 and Table 2). This better accounts for the increased resolution in introgressed regions, and while it's possible to be more stringent with our corrections, we believe that further stringency would be unreasonable. We also see promising signs that the correction is sufficient when counting our CO and NCO events in both crosses, as described in our response to comment 39 (response to reviewer #3).
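      The marker-thinning idea described above can be sketched as follows. This is a toy simulation, not the authors' pipeline: marker positions and tract positions are made up, the helper names are hypothetical, the 550 bp tract length is taken from the response, and the rule that a tract needs at least three markers to be called is an assumption made for this illustration.

```python
import random


def detectable(tract_start, tract_end, markers, min_markers=3):
    """A tract counts as detectable if it spans at least min_markers markers."""
    inside = sum(tract_start <= m <= tract_end for m in markers)
    return inside >= min_markers


def thin_markers(markers, keep_fraction, rng):
    """Randomly keep the given fraction of markers."""
    n_keep = round(len(markers) * keep_fraction)
    return sorted(rng.sample(markers, n_keep))


rng = random.Random(1)
# Hypothetical dense region: one marker every 50 bp over 100 kb.
markers = list(range(0, 100_000, 50))
# Hypothetical conversion tracts, each 550 bp long.
tracts = [(start, start + 550) for start in range(1_000, 90_000, 3_000)]

for keep_fraction in (1.0, 0.5, 0.1):
    thinned = thin_markers(markers, keep_fraction, rng)
    found = sum(detectable(a, b, thinned) for a, b in tracts)
    print(keep_fraction, found, len(tracts))
```

      Comparing the detected counts across the retained fractions gives a rough sense of how strongly marker density alone can change the number of tracts that are called.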

      (2) CO and NCO analyses performed separately for individual regions rarely show statistical significance (Figures 3 and 4). I think that the authors, after dividing the introgressed regions into non-overlapping windows of 100 bp (I suggest also trying 200 bp, 500 bp, and 1kb windows), should combine the data for all regions and perform correlations to SNP density in each window for the whole set of data. Such an analysis has a greater chance of demonstrating statistically significant relationships. This could replace the analysis presented in Figure 3 (which can be moved to Supplement). Moreover, the analysis should also take into account indels.

      We're uncertain of what is being requested here. If the comment refers to the effect of marker density on NCO detection, we hope the response to comment 2 will help resolve this comment as well. Otherwise, we ask for some clarification so that we may correct or revise as appropriate.

      (3) In Arabidopsis, it has been shown that crossover is stimulated in heterozygous regions that are adjacent to homozygous regions on the same chromosome (http://dx.doi.org/10.7554/eLife.03708.001, https://doi.org/10.1038/s41467-022-35722-3).

      This effect applies only to class I crossovers, and is reversed for class II crossovers (https://doi.org/10.15252/embj.2020104858, https://doi.org/10.1038/s41467-023-42511-z). This research system is very similar to the system used by the authors, although it likely differs in the level of DNA sequence divergence. The authors could discuss their work in this context.

      We thank the reviewer for sharing these references. We have added a discussion of our work in the context of these findings in the Discussion, lines 367-376.

      Reviewer #2 (Public Review):

      Summary:

      Schwartzkopf et al characterized the meiotic recombination impact of highly heterozygous introgressed regions within the budding yeast Saccharomyces uvarum, a close relative of the canonical model Saccharomyces cerevisiae. To do so, they took advantage of the naturally occurring Saccharomyces bayanus introgressions specifically within fermentation isolates of S. uvarum and compared their behavior to the syntenic regions of a cross between natural isolates that do not contain such introgressions. Analysis of crossover (CO) and noncrossover (NCO) recombination events shows both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency. These results strongly support the hypothesis that DNA sequence polymorphism inhibits CO formation, and has no or much weaker effects on NCO formation. Eventually, the authors show that the presence of introgressions negatively impacts "r", the parameter that reflects the probability that a randomly chosen pair of loci shuffles their alleles in a gamete.

      The authors chose a sound experimental setup that allowed them to directly compare recombination properties of orthologous syntenic regions in an otherwise intra-specific genetic background. The way the analyses have been performed looks right, although this reviewer is unable to judge the relevance of the statistical tests used. Eventually, most of their results which are elegant and of interest to the community are present in Figure 2.

      Strengths:

      Analysis of crossover (CO) and noncrossover (NCO) recombination events is compelling in showing both a depletion in CO frequency within highly heterozygous introgressed regions and an increase in NCO frequency.

      Weaknesses:

      The main weaknesses refer to a few text issues and a lack of discussion about the mechanistic implications of the present findings.

      - Introduction

      (1) The introduction is rather long. I suggest specifically referring to "meiotic" recombination (line 71) and to "meiotic" DSBs (line 73) since recombination can occur outside of meiosis (i.e., in somatic cells).

      We agree and have condensed the introduction to be more focused. We also made the suggested edits to include “meiotic” when referring to recombination and DSBs.

      (2) From lines 79 to 87: the description of recombination is unnecessarily complex and confusing. I suggest the authors simply remind that DSB repair through homologous recombination is inherently associated with a gene conversion tract (primarily as a result of the repair of heteroduplex DNA by the mismatch repair (MMR) machinery) that can be associated or not to a crossover. The former recombination product is a crossover (CO), the latter product is a noncrossover (NCO) or gene conversion. Limited markers may prevent the detection of gene conversions, which erase NCO but do not affect CO detection.

      We changed the language in this section to reflect the reviewer’s suggestions.

      (3) In addition, "resolution" in the recombination field refers to the processing of a double Holliday junction containing intermediates by structure-specific nucleases. To avoid any confusion, I suggest avoiding using "resolution" and simply sticking with "DSB repair" all along the text.

      We made the suggested correction throughout the paper.

      (4) Note that there are several studies about S. cerevisiae meiotic recombination landscapes using different hybrids that show different CO counts. In the introduction, the authors refer to Mancera et al 2008, a reference paper in the field. In this paper, the hybrid used showed ca. 90 CO per meiosis, while their reference to Liu et al 2018 in Figure 2 shows less than 80 COs per meiosis for S. cerevisiae. This shows that it is not easy to come up with a definitive CO count per meiosis in a given species. This needs to be taken into account for the result section line 315-321.

      This is an excellent point. We added this context in the results (lines 180-187).

      (5) In line 104, the authors refer to S. paradoxus and mention that its recombination rate is significantly different from that of S. cerevisiae. This is inaccurate since this paper claims that the CO landscape is even more conserved than the DSB landscape between these two species, and they even identify a strong role played by the subtelomeric regions. So, the discussion about this paper cannot stand as it is.

      We agree with the reviewer's point. We also found that the entire paragraph was unnecessary, so it and the sentence in question have been removed.

      (6) Line 150, when the authors refer to the anti-recombinogenic activity of the MMR, I suggest referring to the published work from Martini et al 2011 rather than the not-yet-published work from Copper et al 2021, or both, if needed.

      Added the suggested citation.

      Results

      (7) The clear depletion in CO and the concomitant increase in NCO within the introgressed regions strongly suggest that DNA sequence polymorphism triggers CO inhibition but does not affect NCO or to a much lower extent. Because most CO likely arises from the ZMM pathway (CO interference pathway mainly relying on Zip1, 2, 3, 4, Spo16, Msh4, 5, and Mer3) in S. uvarum as in S. cerevisiae, and because the effect of sequence polymorphism is likely mediated by the MMR machinery, this would imply that MMR specifically inhibits the ZMM pathway at some point in S. uvarum. The weak effect or potential absence of the effect of sequence polymorphism on NCO formation suggests that heteroduplex DNA tracts, at least the way they form during NCO formation, escape the anti-recombinogenic effect of MMR in S. uvarum. A few comments about this could be added.

      We have added discussion and citations regarding the biased repair of DSB to NCO in introgression, lines 380-386.

      (8) The same applies to the fact that the CO number is lower in the natural cross compared to the fermentation cross, while the NCO number is the same. This suggests that under similar initiating Spo11-DSB numbers in both crosses, the decrease in CO is likely compensated by a similar increase in inter-sister recombination.

      Thank you to the reviewer for this observation. We agree that this could explain some differences between the crosses.

      (9) Introgressions represent only 10% of the genome, while the decrease in CO is at least 20%. This is a bit surprising especially in light of CO regulation mechanisms such as CO homeostasis that tends to keep CO constant. Could the authors comment on that?

      We interpret these results to reflect two underlying mechanisms. First, the presence of heterozygous introgression does reduce the number of COs. Second, we believe the difference in COs reflects variation in recombination rate between strains. We note that CO homeostasis need not apply across different genetic backgrounds. Indeed, recombination rate is appreciated to significantly differ between strains of S. cerevisiae (Raffoux et al. 2018), and recombination rate variation has been observed between strains/lines/populations in many different species including Drosophila, mice, humans, Arabidopsis, maize, etc. We reference S. cerevisiae strain variability in the Introduction lines 128-130, and have added context in the Results lines 180-187, and Discussion lines 343-350.

      (10) Finally, the frequency of NCOs in introgressed regions is about twice the frequency of CO in non-introgressed regions. Both CO and NCO result from Spo11-initiating DSBs.

      This suggests that more Spo11-DSBs are formed within introgressed regions and that such DSBs specifically give rise to NCO. Could this be related to the lack of homolog engagement which in turn shuts down Spo11-DSB formation as observed in ZMM mutants by the Keeney lab? Could this simply result from better detection of NCO in introgressed regions related to the increased marker density, although the authors claim that NCO counts are corrected for marker resolution?

      The effect noted by the reviewer remains despite the more conservative correction for marker density applied to NCO counts (as described in the response to Reviewer 1, comment #2). Given that CO+NCO counts in introgressed regions are not statistically different between crosses, it is likely that these regions are simply predisposed to a higher rate of DSBs than the rest of the genome. This is an interesting observation, however, and one that we would like to further explore in future work.

      (11) What could be the explanation for chromosome 12 to have more shuffling in the natural cross compared to the fermentation cross which is deprived of the introgressed region?

      We added this text to the Results, lines 323-327, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      Technical points:

      (12) In line 248, the authors removed NCO with fewer than three associated markers.

      What is the rationale for this? Is the genotyping strategy not reliable enough to consider events with only one or two markers? NCO events can be rather small and even escape detection due to low local marker density.

      We trust the genotyping strategy we used, but chose to be conservative in our detection of NCOs to account for potential sequencing biases.

      (13) Line 270: The way homology is calculated looks odd to this reviewer, especially the meaning of 0.5 homology. A site is either identical (1 homology) or not (0 homology).

      We've changed the language to better reflect what we are calculating (diploid sequence similarity; see comment #28). Essentially, the metric is a probability that two randomly selected chromatids--one from each parent--will share the same nucleotide at a given locus (akin to calculating the probability of homozygous offspring at a single locus). We average it along a segment of the genome to establish an expected sequence similarity if/when recombination occurs in that segment.
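
      The metric described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the representation of each parent's alleles at a site as a short string (one character per chromatid copy) are assumptions made for the example.

```python
from collections import Counter


def site_similarity(parent1_alleles: str, parent2_alleles: str) -> float:
    """Probability that one chromatid drawn at random from each parent
    carries the same nucleotide at this site."""
    if not parent1_alleles or not parent2_alleles:
        raise ValueError("each parent needs at least one allele per site")
    counts1 = Counter(parent1_alleles)
    counts2 = Counter(parent2_alleles)
    # Sum over nucleotides of P(parent 1 draws b) * P(parent 2 draws b).
    matches = sum(counts1[base] * counts2[base] for base in counts1)
    return matches / (len(parent1_alleles) * len(parent2_alleles))


def segment_similarity(parent1_sites, parent2_sites) -> float:
    """Average the per-site similarity along a segment of the genome."""
    sims = [site_similarity(a, b) for a, b in zip(parent1_sites, parent2_sites)]
    if not sims:
        raise ValueError("segment contains no sites")
    return sum(sims) / len(sims)
```

      Under this reading, a value of 0.5 at a single site arises when one parent is heterozygous and the other is homozygous for one of the two alleles, for example `site_similarity("AG", "AA")`.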

      (14) Line 365: beware that the estimates are for mitotic mismatch repair (MMR). Meiotic MMR may work differently.

      We removed the citation that refers exclusively to mitotic recombination. The statement regarding meiotic recombination is otherwise still reflective of results from Chen & Jinks-Robertson.

      (15) Figure 1: there is no mention of potential 4:0 segregations. Did the authors find no such pattern? If not, how did they consider them?

      The program we used to call COs and NCOs (ReCombine's CrossOver program) can detect such patterns, but none were detected in our data.

      Reviewer #3 (Public Review):

      When members of two related but diverged species mate, the resulting hybrids can produce offspring where parts of one species' genome replace those of the other. These "introgressions" often create regions with a much greater density of sequence differences than are normally found between members of the same species. Previous studies have shown that increased sequence differences, when heterozygous, can reduce recombination during meiosis specifically in the region of increased difference. However, most of these studies have focused on crossover recombination, and have not measured noncrossovers. The current study uses a pair of Saccharomyces uvarum crosses: one between two natural isolates that, while exhibiting some divergence, do not contain introgressions; the other is between two fermentation strains that, when combined, are heterozygous for 9 large regions of introgression that have much greater divergence than the rest of the genome. The authors wished to determine if introgressions differently affected crossovers and noncrossovers, and, if so, what impact that would have on the gene shuffling that occurs during meiosis.

      (1) While both crossovers and noncrossovers were measured, assessing the true impact of increased heterology (inherent in heterozygous introgressions) is complicated by the fact that the increased marker density in heterozygous introgressions also increases the ability to detect noncrossovers. The authors used a relatively simple correction aimed at compensating for this difference, and based on that correction, conclude that, while as expected crossovers are decreased by increased sequence heterology, counter to expectations noncrossovers are substantially increased. They then show that, despite this, genetic shuffling overall is substantially reduced in regions of heterozygous introgression. However, it is likely that the correction used to compensate for the effect of increased sequence density is defective, and has not fully compensated for the ascertainment bias due to greater marker density. The simplest indication of this potential artifact is that, when crossover frequencies and "corrected" noncrossover frequencies are taken together, regions of introgression often appear to have greater levels of total recombination than flanking regions with much lower levels of heterology. This concern seriously undercuts virtually all of the novel conclusions of the study. Until this methodological concern is addressed, the work will not be a useful contribution to the field.

      We appreciate this concern. Please see response to comments #2 and #38. We further note that our results depicted in Figures 3 and 4 are not reliant on any correction or comparison with non-introgressed regions, and thus we consider our results regarding sequence similarity and its effect on the repair of DSBs, and the amount of genetic shuffling with/without introgression, to be novel and important observations for the field.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 149 - this sentence refers to a mixture of papers reporting somatic or meiotic recombination and as these processes are based on different crossover pathways, this should not be mixed. For example, it is known that in Arabidopsis MSH2 has a pro-crossover function during meiotic recombination.

      Corrected

      (2) What is unclear to me is how the crosses are planned. Line 308 shows that there were only two crosses (one "natural" and one "fermentation"), but I understand that this is a shorthand and in fact several (four?) different strains were used for the "fermentation cross". At least that's what I concluded from Fig. 1B and its figure caption. This needs to be further explained. Were different strains used for each fermentation cross, or was one strain repeated in several crosses? In Figure 1, it would be worth showing, next to the panel showing "fermentation cross", a diagram of how "natural cross" was performed, because as I understand it, panel A illustrates the procedure common to both types of crosses, and not for "natural cross".

      We thank the reviewer for drawing our attention to confusion about how our crosses were created. We performed two crosses, as depicted in Figure 1A. The fermentation cross is a single cross from two strains isolated from fermentation environments. The natural cross is a single cross from two strains isolated from a tree and insect. Table S1 and the methods section "Strain and library construction" describe the strains used in more detail. We modified Figure 1 and the figure legend to help clarify this. See also response to comment #37.

      (3) The authors should provide a more detailed characterization of the genetic differences between chromosomes in their hybrids. What is the level of polymorphism along the S. uvarum chromosomes used in the experiments? Is this polymorphism evenly distributed? What are the differences in the level of polymorphism for individual introgressions? Theoretically, this data should be visible in Figure 2D, but this figure is practically illegible in the present form (see next comment).

      As suggested, we remade Figure 2D to only include chromosomes with an introgression present, and moved the remaining chromosomes to the supplements (Figure S11). The patterns of markers (which are fixed differences between the strains in the focal cross) should be clearer now. As we detail in the Methods lines 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression).

      (4) Figure 2D should be prepared more clearly, I would suggest stretching the chromosomes, otherwise, it is difficult to see what is happening in the introgression regions for CO and NCO (data for SNPs are more readable). Maybe leave only the chromosomes with introgressions and transfer the rest to the supplement?

      See previous comment.

      (5) How are the Y scales defined for Figure 2D?

      Figure 2D now includes units for the y-axis.

      (6) Are increases in CO levels in the fermentation cross observed at the border with introgressions? This would indicate local compensation for recombination loss in the introgressed regions, similar to that often observed for chromosomal inversions.

      We see no evidence of an increase in CO levels at the borders of introgressions, either through visual inspection or by comparing the average CO rate in all fermentation windows to that of windows at the edges of introgressions. This is included in the Discussion lines 360-366, "While we are limited in our interpretations by only comparing two crosses (one cross with heterozygous introgression and one without introgression), these results are in line with findings in inversions, where heterozygotes show sharp decreases in COs, but the presence of NCOs in the inverted region (Crown et al., 2018; Korunes & Noor, 2019). However, unlike heterozygous inversions where an increase in COs is observed on freely recombining chromosomes (the inter-chromosomal effect), we do not see an increase in COs on the borders flanking introgression or on chromosomes without introgression."

      (7) Line 336 - "We find positive correlations between CO counts..." - you should indicate here that between fermentation and natural crosses, it was quite hard for me to understand what you calculated.

      We corrected the language as suggested.

      (8) The term "homology" usually means "having a common evolutionary origin" and does not specify the level of similarity between sequences, thus it cannot be measured. It is used incorrectly throughout the manuscript (also in the intro). I would use the term "similarity" to indicate the degree of similarity between two sequences.

      We corrected the language as suggested throughout the document.

      (9) Paragraph 360 and Figure 3 - was the "sliding window" overlapping or non-overlapping?

      We added clarifying language to the text in both places. We use a 101bp sliding window with 50bp overlaps.
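
      One way to read "a 101bp sliding window with 50bp overlaps" is a window of 101 bp advanced in steps of 51 bp, so that consecutive windows share 50 bp. The sketch below illustrates that reading only; the function name, the 0-based half-open coordinates, and the choice to drop a trailing partial window are assumptions, and the response does not say how the authors handled chromosome ends.

```python
def sliding_windows(seq_length: int, size: int = 101, overlap: int = 50):
    """Yield (start, end) pairs, 0-based and half-open, for windows of
    `size` bp in which consecutive windows share `overlap` bp."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # 101 - 50 = 51 bp between window starts
    # Assumption: a trailing window shorter than `size` is dropped.
    for start in range(0, seq_length - size + 1, step):
        yield (start, start + size)
```

      An alternative reading, a window centred on each base with 50 bp on either side and advanced 1 bp at a time, would correspond to `sliding_windows(n, size=101, overlap=100)`.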

      (10) Line 369 - what is "...the proportion of bases that are expected to match between the two parent strains..."?

      We clarified the language in this location, and hopefully changes associated with the comment about sequence similarity will make the comment even clearer in context.

      (11) Line 378 - should it refer to Figure S1 and not Figure 4?

      Corrected.

      (12) Line 399 - should refer to Figure 4, not Figure 5.

      Corrected

      (13) Line 444-449 - the analysis of loss of shuffling in the context of the location of introgression on the chromosome should be presented in the result section.

      We shifted the core of the analysis to the results, while leaving a brief summary in the discussion.

      (14) The authors should also take into account the presence of indels in their analyses, and they should be marked in the figures, if possible.

      We filtered out indels in our variant calling. However, we did analyze our crosses for the presence of large insertions and deletions (Table S2), which can obscure true recombination rates, and found that they were not an issue in our dataset.

      Reviewer #2 (Recommendations For The Authors):

      This reviewer suggests that the authors address the different points raised in the public review.

      (1) This reviewer would like to challenge the relevance of the r-parameter in light of chromosome 12 which has no introgression and still a strong depletion in r in the fermentation cross.

      We added this text to the Results, lines 377-381, "While it is unclear what potential mechanism is mediating the difference in shuffling on chromosome 12, we note that the rDNA locus on chromosome 12 is known to differ dramatically in repeat content across strains of S. cerevisiae (22–227 copies) (Sharma et al. 2022), and we speculate that differences in rDNA copy number between strains in our crosses could impact shuffling."

      (2) This reviewer insists on making sure that NCO detection is unaffected by the marker density, notably in the highly polymorphic regions, to unambiguously support Figure 1C.

      We've changed our correction for resolution to be more aggressive (see response to comment #2), and believe we have now adequately adjusted for marker density (see response to comment #38).

      Reviewer #3 (Recommendations For The Authors):

      I regret using such harsh language in the public review, but in my opinion, there has been a serious error in how marker densities are corrected for, and, since the manuscript is now public, it seems important to make it clear in public that I think that the conclusions of the paper are likely to be incorrect. I regret the distress that the public airing of this may cause. Below are my major concerns:

      (1) The paper is written in a way that makes it difficult to figure out just what the sequence differences are within the crosses. Part of this is, to be frank, the unusual way that the crosses were done, between more than one segregant each from two diploids in both natural and fermentation cases. I gather, from the homology calculations description, that each of these four diploids, while largely homozygous, contained a substantial number of heterozygosities, so individual diploids had different patterns of heterology. Is this correct? And if so, why was this strategy chosen? Why not start with a single diploid where all of the heterologies are known? Why choose to insert this additional complication into the mix? It seems to me that this strategy might have the perverse effect of having the heterology due to the polymorphisms present in one diploid affect (by correction) the impact of a noncrossover that occurs in a diploid that lacks the additional heterology. If polymorphic markers are a small fraction of total markers, then this isn't such a great concern, but I could not find the information anywhere in the manuscript. As a courtesy to the reader, please consider providing at the beginning some basic details about the starting strains-what is the average level of heterology between natural A and natural B, and what fraction of markers are polymorphic; what is the average level of heterology between fermentation A and fermentation B in non-introgressed regions, in introgressed regions, and what fraction of markers are polymorphic? How do these levels of heterology compare to what has been examined before in whole-genome hybrid strains? It also might be worth looking at some of the old literature describing S. cerevisiae/S. carlsbergensis hybrids.

      We thank the reviewer for drawing our attention to confusion about the cross construction. These crosses were conducted as is typical for yeast genetic crosses: we crossed 2 genetically distinct haploid parents to create a heterozygous diploid, then collected the haploid products of meiosis from the same F1 diploid. Because the crosses were made with haploid parents, it is not possible for other genetic differences to be segregating in the crosses. We have revised Figure 1 and its caption to clarify this. Further details regarding the crosses are in the Methods section "Strain and library construction" and in Supplemental Table S1. We only utilized genetic markers that are fixed differences between our parental strains to call CO and NCO. As we detail in the Methods lines 507-508, we utilized a total of 24,574 markers for the natural cross and 74,619 markers for the fermentation cross (the higher number in the fermentation cross being due to more fixed differences in regions of introgression). We additionally revised Figure 2D (and Figure S11) to help readers better visualize differences between the crosses.

      (2) There are serious concerns about the methods used to identify noncrossovers and to normalize their levels, which are probably resulting in an artifactually high level of calculated crossovers in Figure 2. As a primary indication of this, it appears in Figure 2 that the total frequency of events (crossovers + noncrossovers) in heterozygous introgressed regions are substantially greater than those in the same region in non-introgressed strains, while just shifting of crossovers to noncrossovers would result in no net increase. The simplest explanation for this is that noncrossovers are being undercounted in non-introgressed relative to introgressed heterozygous regions. There are two possible reasons for this: i. The exclusion of all noncrossover events spanning less than three markers means that many more noncrossovers in introgressed heterozygous regions than in non-introgressed. Assuming that average non-homology is 5% in the former and 1% in the latter, the average 3-marker event will be 60 nt in introgressed regions and 300 nt in non-introgressed regions - so many more noncrossovers will be counted in introgressed regions. A way to check on this - look at the number of crossover-associated markers that undergo gene conversion; use the fraction that involves < 3 markers to adjust noncrossover levels (this is the strategy used by Mancera et al.). ii. The distance used for noncrossover level adjustment (2kb) is considerably greater than the measured average noncrossover lengths in other studies. The effect of using a too-long distance is to differentially under-correct for noncrossovers in non-introgressed regions, while virtually all noncrossovers in heterozygous introgressed regions will be detected. This can be illustrated by simulations that reduce the density of scored markers in heterozygous introgressed regions to the density seen in non-introgressed regions. 
Because these concerns go to the heart of the conclusions of the paper, they must be addressed quantitatively - if not, the main conclusions of the paper are invalid.
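
      The arithmetic behind the 60 nt and 300 nt figures in point (i) can be reproduced with a short sketch. It assumes, as the reviewer's estimate implicitly does, that markers are evenly spaced at one per 1/divergence nucleotides; the function name is hypothetical.

```python
def expected_span_nt(n_markers: int, divergence: float) -> float:
    """Approximate length (nt) covered by `n_markers` consecutive markers
    when a fraction `divergence` of sites differ between the parents."""
    if not 0 < divergence <= 1:
        raise ValueError("divergence must be in (0, 1]")
    spacing = 1 / divergence  # average nt per marker
    return n_markers * spacing


if __name__ == "__main__":
    for label, div in (("introgressed", 0.05), ("non-introgressed", 0.01)):
        print(f"{label}: {expected_span_nt(3, div):.0f} nt")
```

      With the reviewer's assumed values of 5% and 1% divergence, a three-marker event spans roughly 60 nt and 300 nt respectively, which is why a three-marker threshold would be expected to exclude more short events where markers are sparse.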

      We adjusted the correction factor (See also response to comment #2) and compared the average number of CO and NCO events in introgressed and non-introgressed regions between crosses (two comparisons: introgression CO+NCO in natural cross vs introgression CO+NCO in fermentation cross; non-introgression CO+NCO in natural cross vs non-introgression CO+NCO in fermentation cross). We found no significant differences between the crosses in either of the comparisons. This indicates that the distribution of total events is replicated in both crosses once we correct for resolution.

      (3) It is important to distinguish the landscape of double-strand breaks from the landscape of recombination frequencies. Double-strand breaks, as measured by uncalibrated levels of Spo11-linked oligos, is a relative number - not an absolute frequency. So it is possible that two species could have a similar break landscape in terms of topography but have absolute levels higher in one species than in the other.

      We agree with this statement, however, we have removed the relevant text to streamline our introduction.

      (4) Lines 123-125. Just meiosis will produce mosaic genomes in the progeny of the F1; further backcrossing will reduce mosaicism to the level of isolated regions of introgression.

      Adjusted the language to be more specific.

      (5) Please provide actual units for the Y axes in Figure 2D.

      We have corrected the units on the axes.

      (6) Tables (general). Are the significance measures corrected for multiple comparisons?

      In Table 3, the cutoff was chosen to be more conservative than a Bonferroni corrected alpha=0.01 with 9 comparisons (0.0011). In text, any result referred to as significant has an associated hypothesis test with a p-value less than its corresponding Bonferroni-corrected alpha of 0.05. This has been clarified in the caption for Table 3 and in the text where relevant.
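
      For reference, the Bonferroni threshold quoted above is the nominal alpha divided by the number of comparisons. The sketch below only reproduces that arithmetic; the function names are illustrative, and the number of comparisons behind the in-text alpha of 0.05 is not stated in the response, so it is left as a parameter.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def is_significant(p_value: float, alpha: float, n_comparisons: int) -> bool:
    """True if p_value falls below the Bonferroni-corrected threshold."""
    return p_value < bonferroni_alpha(alpha, n_comparisons)


if __name__ == "__main__":
    # alpha = 0.01 with 9 comparisons, as stated for Table 3
    print(f"{bonferroni_alpha(0.01, 9):.4f}")  # prints 0.0011
```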

    1. Author response:

      The following is the authors’ response to the previous reviews.

      I have added a paragraph that addresses the issue of how landmarks might be used and why they are not. The suggestions made in the "Weaknesses" paragraph were concise and excellent, and I have directly incorporated them into my revised manuscript. This text appears on Page 21 and is shown below. I hope that this is what the editors and reviewers were looking for.

      The requested revision is the second paragraph.

      The first paragraph was not written in response to reviews but inspired by a recent paper by Mahdev et al (2024) - https://doi.org/10.1038/s41593-024-01681-9.  I had already requested to add this reference and was encouraged to do so by the Editors. The Mahdev et al paper was very surprising in that it showed that path integration is not constant but that its "gain" can be recalibrated by self-motion signals. I wondered whether this unexpected capacity extended to path integration also recalibrating the cognitive map and thereby generating the shortcutting behavior we observe. I suggested that, at an abstract level, this would correspond to "coordinate transformation" of the cognitive map. I realize that this is entirely speculative. If the Editors feel that it does not add much to the manuscript and that the speculation goes too far, I will remove the first paragraph and re-submit.

      Added text. P21 and just before the heading: "Implications for theories of hippocampal representations of spatial maps". There were no other changes made in the paper.

      "Path integration uses self-motion signals to update the animal's estimated location on its internal cognitive map. Path integration gain has been shown to be plastic and regulated by landmarks (52). Remarkably, a recent study has revealed that path integration gain can also be directly recalibrated by self-motion signals alone (53), albeit not as effectively as by landmarks (52, 53). An interesting question for future research is whether self-motion signals can also recalibrate the coordinates of a cognitive map. From this perspective, the Target B to Target A shortcut requires a transformation of the cognitive map coordinates so that the start point is now Target B.

      Extensive research has shown that external cues can control hippocampal neuron place fields (11, 12, 54) and the gain of the path integrator (52), making the failure of mice in our study to use such cues puzzling. The failure to use landmarks may be related to our task being low stakes and our pretraining procedure teaching the mouse that such cues are not necessary. Our results may not generalize to more natural conditions where many reliable prominent cues are available, and where there is urgency to find food or water while avoiding predation (55). Under these more naturalistic conditions the use of distal cues to rapidly find a food reward is more likely to be observed."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors continue their investigations on the key role of glycosylation to modulate the function of a therapeutic antibody. As a follow-up to their previous demonstration on how ADCC was heavily affected by the glycans at the Fc gamma receptor (FcγR)IIIa, they now dissect the contributions of the different glycans that decorate the diverse glycosylation sites. Using a well-designed mutation strategy, accompanied by exhaustive biophysical measurements, with extensive use of NMR, using both standard and newly developed methodologies, they demonstrate that there is one specific locus, N162, which is heavily involved in the stabilization of (FcγR)IIIa and that the concomitant NK function is regulated by the glycan at this site.

      Strengths:

      The methodological aspects are carried out at the maximum level.

      Weaknesses:

      The exact (or the best possible assessment) of the glycan composition at the N162 site is not defined.

      We revised the Introduction to include previous findings from our laboratory regarding processing on YTS cells:

      “YTS cells, a key cytotoxic human NK cell line used for these studies, express FcγRIIIa with extensive glycan processing, including the N162 site with predominantly hybrid and complex-type glycoforms {Patel 2021}.” 

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to demonstrate a mechanistic link between Fcgamma receptor (IIIA) glycosylation and IgG binding affinity and signaling - resulting in antibody-dependent cellular cytotoxicity - ADCC. The work builds off prior findings from this group about the general impact of glycosylation on FcR (Fc receptor)-IgG binding.

      Strengths:

      The structural data (NMR) is highly compelling and very significant to the field. A demonstration of how IgG interacts with FcgRIIIA in a manner sensitive to glycosylation of both the IgG and the FcR fills a critical knowledge gap. The approach to demonstrate the selective impact of glycosylation at N162 is also excellent and convincing. The manuscript/study is, overall, very strong.

      Weaknesses:

      There are a number of minor weaknesses that should be addressed.

      (1) Since S164A is the only mutant in Figure 1 that seems to improve affinity, even if minimally, it would be a nice reference to highlight that residue in the structural model in panel B.

      We revised Figure 1B to include the S164 site.

      (2) It is confusing why some of the mutants in the study are not represented in Figure 1 panel A. Those affinities and mutants should be incorporated into panel A so the reader can easily see where they all fall on the scale.

      We thank the reviewer for this comment. We restructured the Results section to highlight that a primary outcome of the experiment referenced was to map the contribution of interface residues to antibody binding affinity. These data were not previously available, highlighting hotspots at the interface. Figure 1A and B report these results.

      We then used a subset of mutations from this experiment, as well as a subset of mutations from an additional library containing mutations proximal to the interface, to build a small library for evaluation using ADCC. The complete binding data for all variants, binding to two different IgG1 Fc glycoforms, is presented in Supplemental Table 1. 

      T167Y in particular needs to be shown, as it is one of few mutants that fall between what seems to be ADCC+ and ADCC- lines. Also, that mutant seems to have a stronger affinity compared to wt (judged by panel D), yet less ADCC than wt. This would imply that the relationship between affinity and activity is not as clean as stated, though it is clearly important. Comments about this would strengthen the overall manuscript.

      We thank the reviewer for this particular insight. We agree that the lack of a clean correlation between ADCC potency and affinity implies additional factors that could have affected these experimental results. We added the following sentence to the discussion. 

      “Notably, the ADCC potency for those high-affinity variants does not fall cleanly on a line, indicating that other factors affect our observations, which may include organization at the cell surface, changes to glycan composition, or receptor trafficking.”

      (3) This statement feels out of place: "In summary, this result demonstrates that the sensitivity to antibody fucosylation may be eliminated through FcγRIIIa engineering while preserving antibody-binding affinity." In Figure 2, the authors do indeed show that mutations in FcgRIIIa can alter the impact of IgG core fucosylation, but implying that receptor engineering is somehow translatable or as impactful therapeutically as engineering the antibody itself deflates the real basic science/biochemical impact of understanding these interactions in molecular detail. Not everything has to be immediately translatable to be important. 

      We agree and removed the highlighted sentence.   

      (4) The findings reported in Figure 2, panel C are exciting. Controls for the quality of digestion at each step should be shown (perhaps in supplementary data).

      We agree. We added an example of the digestions as Figure S2.  

      (5) Figure 3 is confusing (mislabeled?) and does not show what is described in the Results. First, there is a F158V variant in the graph but a V158F variant in the text.

      Please correct this. 

      Thank you for identifying this typo. We corrected Figure 3.

      Second, this variant (V158F/F158V) does not show the 2-fold increase in ADCC with kifunensine as stated. 

      Thank you for drawing our attention to this rounding error. We revised the text to report a statistically significant 1.4-fold increase.

      Finally, there are no statistical evaluations between the groups (+/- kif; +/- fucose). 

      We provide the p values for +/- fucose and +/- kifunensine for each YTS cell line in the figure. We did not provide a global comparison of p values that included all cell lines because some cell lines experienced a significant change and others did not. However, we added the raw data as Supplemental Table 2 should readers wish to perform these analyses.

      The differences stated are not clearly statistically significant given the wide spread of the data. This is true even for the wt variant.

      We agree that there are points that overlap in this figure between the different treatments. However, our use of Student's t-test (two-tailed) on three experiments collected on three different days (each with three technical replicates) provides enough resolution to determine the significance of the difference of the means for the different treatments. This is, by our estimation, a highly rigorous manner to collect and analyze the data.  

      (6) The kifunensine impact is somewhat confusing. They report a major change in ADCC, yet similar large changes with trimming only occur once most of the glycan is nearly gone (Figure 2). Kifunensine will tend to generate high mannose and possibly a few hybrid glycans. It is difficult to understand what glycoforms are truly important outside of stating that multi-branched complex-type N-glycans decrease affinity.

      Note that Figure 2 does not evaluate the kifunensine-treated glycan, which is mostly Man8 and Man9 structures. In our previous work, these structures likewise provided increased binding affinity (see PubMed ID 30016589). We believe the most important message is that the composition of the N162 glycan (removed with the S164A mutation) regulates NK cell ADCC. On cells, we are not able to modulate N162 glycan composition without affecting potentially every other N-glycan on the surface, so we do not have an ADCC experiment that is directly comparable to Figure 2. Thus, this increased ADCC resulting from kifunensine treatment is consistent with previously observed increases in binding affinity measurements.  

      (7) This is outside of the immediate scope, but I feel that the impact would be increased if differences in NK cell (and thus FcgRIIIA) glycosylation are known to occur during disease, inflammation, age, or some other factor - and then to demonstrate those specific changes impact ADCC activity via this mechanism.

      We agree completely. As mentioned in the Introduction, we know that N162 glycan composition varies substantially from donor to donor based on previous work from our lab. Curiously, little variability appeared between donors at the other four N-glycosylation sites. Thus, there is the potential that different NK cell N162 glycan compositions are coincident with different indications. This is an area we are quite interested in pursuing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      (1) A major issue throughout the paper is that Hox expression analysis is done exclusively through quantitative PCR, with values ranging from 2-fold to several thousand-fold upregulation, with no antibody validation for any Hox protein (presumably they are all upregulated).

      Thank you for your comment.

      We tried to verify the stimulated Hox expression pattern by in situ hybridization. Although we could clearly detect Hox expression patterns (i.e., Hox8 and Hox9 in Author response image 1) in the neural tube of early embryos (E9.5) by whole-mount in situ hybridization, we failed to detect a clear pattern in the brainstem at E18.5, either in whole-mount tissue or on sections. That is one reason we turned to single-nucleus RNA-seq instead.

      This is likely due to the low expression levels of Hox genes at late developmental stages, which need to be detected by a more sensitive method. However, we estimated that the stimulated expression levels of the representative Hox genes are at least comparable to the physiological levels in the posterior spinal cord and thus sufficient to evoke a functional effect.

      Author response image 1.

      Hox8 and Hox9 expression patterns in E9.5 embryos.

      (2) In Figure 1, massive upregulation of most Hox genes in the brainstem is shown after e16.5 but the paper quickly focuses on analysis of PN nuclei. What are the other consequences of this broad upregulation of Hox genes in the brainstem? There is no discussion of the overall phenotype of the mice, the structure of the brainstem, the migration of neurons, etc. The very narrow focus on motor cortex projections to PN nuclei seems bizarre without broad characterization of the mice, and the brainstem in particular. There is only a mention of "severe motor deficits" from previous studies, but given the broad expression of Rnf220, the fact that is a global knockout, and the effects on spinal cord populations shown previously the justification for focusing on PN nuclei does not seem strong.

      Thank you for your comment.

      Although RNF220 is important for dorsal-ventral patterning of the spinal cord and the hindbrain during embryonic development, early neural patterning and differentiation are normal in Rnf220+/- mice (Wang et al., 2022). However, these mice showed reduced survival and motility to varying degrees postnatally (Ma et al., 2019; Ma et al., 2021), likely suggesting a dosage-dependent role of RNF220 in maintaining late neural development. As our microarray assay showed deregulation of the Hox genes in the brain, we followed this direction in this study and narrowed down the affected region to the pons. Our single-nucleus RNA-seq (snRNA-seq) data further show that the Hox deregulation mainly occurred in 3 clusters of neurons. However, the pons is complex and contains tens of nuclei, and the current resolution of our data does not allow us to assign a clear identity to each of them. Although it is clear that more nuclei are likely affected, the PN (cluster 7) is the only cluster we could identify and follow in the current study.

      As to the general effect of RNF220 haploinsufficiency on the brainstem, we carried out Nissl staining assays and found no clear difference in neuronal cell organization between WT and Rnf220+/- pons (revised Figure 2-figure supplement 2).

      (3) It is stated that cluster 7 in scRNA-seq corresponds to the PN nuclei. The modest effect shown on Hox3-5 expression in that data in Figure 1 is inconsistent with the larger effect shown in Figure 2.

      Thank you for your comment.

      Due to the low efficiency of snRNA-seq and the limited sequencing depth, quantification of Hox expression based on the snRNA-seq data is likely less accurate than qRT-PCR. In addition, only mRNAs in the nucleus could be captured by snRNA-seq, whereas mRNAs in both the nucleus and the cytoplasm were reverse-transcribed and examined in the qRT-PCR assays in Figure 2A.

      (4) Presumably, Hox genes are not the only targets of Rnf220 as shown in the microarray/RNA-sequencing data. There is no definitive evidence that any phenotypes observed (which are also not clear) are specifically due to Hox upregulation. The only assay the authors use to look at a Hox-dependent phenotype in the brainstem is the targeting of PN nuclei by motor cortex axons. This is only done in 2 animals and there are no details as to how the data was analyzed and quantified. The only 2 images shown are not convincing of a strong phenotype, they could be taken at slightly different levels or angles. At the very least, serial sections should be shown and the experiment repeated in more animals. There is also no discussion of how these phenotypes, if real, would relate to previous work by the Rijli group which showed very precise mechanisms of synaptic specificity in this system.

      Thank you for your comments and suggestions.

      The deregulation of Hox genes is the most obvious phenomenon observed in the RNA-seq data, and we tried to identify its specific phenotypic effect in this study. As the roles of Hox genes in PN patterning and circuit formation are well established, we focused on the PN in the subsequent experiments. Based on the literature, we carried out circuit analysis to examine the targeting of PN neurons by motor cortex axons. An additional cohort of animals of both genotypes (n=10 for WT and n=9 for Rnf220+/-) was used to repeat the experiment, and we reached the same conclusion. More detailed information on data analysis and serial images were included in the revised manuscript and figure legends.

      (5) The temporal aspect of this regulation in vivo is not clear. The authors show some expression changes begin at e16.5 but are also present at 2 months. Is the presumed effect on neural circuits a result of developmental upregulation at late embryonic stages or does the continuous overexpression in adult mice have additional influence? Are any of the Hox genes upregulated normally expressed in the brainstem, or PN specifically, at 2 months? Why perform single-cell sequencing experiments at 2 months if this is thought to be mostly a developmental effect? Similarly, the significance of the upregulated WDR5 in the pons and pontine nuclei at 2 months in Figure 3 is not clear.

      Thank you for your comment.

      The spatial and temporal expression pattern of Hox genes is established at early embryonic stages and then maintained throughout development in mammals. As we have shown, the de-repression of Hox genes is a long-lasting defect in Rnf220+/- mice beginning at late embryonic stages. Since the neuronal circuit is established after birth in mice, we speculated that the circuit defects from the motor cortex to PN neurons were due to the long-lasting upregulation of Hox genes in PN neurons. We could not distinguish whether the effect on the neural circuit results from developmental upregulation of Hox genes or from their continuous overexpression in adult mice. An inducible knockout mouse model may help to answer this question in the future. A discussion of this point was included in the revised manuscript.

      We carried out snRNA-seq analysis using pons tissues from adult mice aiming to identify the specific cell population with Hox up-regulation, which we failed to specify by in situ hybridization.

      We repeated the related experiments in the original Figure 3 and some of the blot images were replaced and quantified.

      (6) In Figure 3C, the levels of RNF220 in wt and het don't seem to be that different.

      We repeated the experiments and changed the related image in the revised Figure 3C.

      (7) Based on the single-cell experiments, and the PN nuclei focus, the rescue experiments are confusing. If the Rnf220 deletion has a sustained effect for up to 2 months, why do the injections in utero? If the focus is the PN nuclei why look at Hox9 expression and not Hox3-5 which are the only Hox genes upregulated in PN based on sc-sequencing? No rescue of behavior or any phenotype other than Hox expression by qPCR is shown and it is unclear whether upregulation of Hox9 paralogs leads to any defects in the first place. The switch to the Nes-cre driver is not explained. Also, it seems that wdr5 mRNA levels are not so relevant and protein levels should be shown instead (same for rescue experiments in P19 cells).

      Thank you for your comments.

      Since our data suggest that the upregulation of Hox gene expression is a long-lasting effect beginning at the late embryonic stage of E16.5, we conducted the rescue experiments by in utero injection of a WDR5 inhibitor at E15.5 and examined the expression of Hox genes at E18.5. Although it is also necessary to examine whether the rescue effect of WDR5 inhibitor injection persists into adult stages, it is difficult to distinguish the embryos or pups after they are born. As a supplement, rescue assays with genetic ablation of the Wdr5 gene were conducted, and the results showed that genetic ablation of a single Wdr5 allele could reverse the upregulation of Hox genes caused by RNF220 haploinsufficiency in the hindbrain at P15.

      Most of the upregulated Hox genes including both Hox9 and Hox3-5 were examined in our rescue experiments. Since this study focuses on the PN nuclei, the results of Hox3-5 genes were shown in the revised main Figure 6.

      We conducted rescue experiments by deleting Wdr5 in neural tissue using Nestin-Cre mice because Wdr5+/- mice are embryonic lethal. The upregulation of Hox genes could also be observed in the hindbrains of Rnf220fl/wt;Nestin-Cre mice. Although Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice are viable and can survive to adult stages, they showed developmental defects in the forebrain, including the cerebral cortex and hippocampus. Therefore, no behavioral rescue tests were conducted in this study. We believe that the role of WDR5 in forebrain development is beyond the scope of this study.

      The potential defects due to the upregulation of Hox9 paralogs await further investigation.

      Wdr5 mRNA levels were first examined to confirm the genetic deletion or siRNA-mediated knockdown of Wdr5. We have carried out western blots to examine WDR5 protein levels, and the results were included in the revised Figure 3.

      (8) What is the relationship between Retinoic acid and WDR5? In Figure 3E there is no change in WDR5 levels without RA treatment in Rnf KO but an increase in expression with RA treatment and Rnf KO. However, the levels of WDR5 do not seem to change with RA treatment alone. Does Rnf220 only mediate WDR5 degradation in the presence of RA? This does not seem to be the case in experiments in 293 cells in Figure 4.

      Thank you for your comment.

      We believe that the regulation of WDR5 and Hox expression by RNF220 is context dependent and precisely controlled in vivo, depending on the molecular and epigenetic status of the cell, which is fulfilled by RA treatment in P19 cells. In Figure 4, the experiment is based on exogenous overexpression assays, which might not fully reflect the situation in vivo.

      (9) Why are the levels of Hox upregulation after RA treatment so different in Figure 5 and Figure Supplement 5?

      In Figure 5C, the Hox expression levels were normalized against the control group in the presence of RA, whereas in Figure Supplement 5 they were normalized to the control group without RA treatment.
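
      The effect of the reference group on the reported fold change can be illustrated with a minimal sketch. The condition names and expression values below are hypothetical and are not data from the study; they only show how the same measurements yield very different fold changes depending on which control is used as the denominator.

```python
def fold_change(values, reference):
    """Express every condition relative to the chosen reference condition."""
    ref = values[reference]
    return {name: value / ref for name, value in values.items()}


# Hypothetical relative expression values (arbitrary units), not data from the study.
expression = {
    "control_noRA": 1.0,
    "control_RA": 50.0,
    "siRnf220_RA": 150.0,
}

# Normalized to the RA-treated control: only the knockdown effect is visible.
vs_ra_control = fold_change(expression, "control_RA")

# Normalized to the untreated control: RA induction and knockdown effect are combined.
vs_untreated = fold_change(expression, "control_noRA")

print(vs_ra_control["siRnf220_RA"], vs_untreated["siRnf220_RA"])
```

      In this hypothetical case the knockdown appears as a modest change against the RA-treated control but as a far larger one against the untreated control, because the second ratio also includes the induction by RA itself.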

      (10) In Figures 4B+C which lanes are input and which are IP? There is no quantitation of Figure 4D, from the blot it does look that there is a reduction in the last 2 columns as well. The band in the WT flag lane seems to have a bubble. Need to quantitate band intensities. Same for E, the effect does not seem to be completely reversed with MG132.

      Thanks for pointing this out. The labels were included in the revised Figure 4B and 4C.

      We repeated the experiments for Figure 4D and 4E. Some of the blot images were replaced and quantified in the revised Figure 4D and 4E.

      Reviewer 2:

      (1) Figure 1E shows that Rnf220 knockdown alone could not induce an increase in Hox expression without RA, which indicates that Rnf220 might endogenously upregulate Retinoic acid signaling. The authors should test if RA signaling is downstream of Rnf220 by looking at differences in the expression of Retinaldehyde dehydrogenase genes (as a proxy for RA synthesis) upon Rnf220 knockdown.

      Thank you for your comment and suggestion.

      Two sequential reactions are required for RA synthesis from retinol, catalyzed by alcohol dehydrogenases (ADHs)/retinol dehydrogenases (RDHs) and retinaldehyde dehydrogenases (RALDHs, also known as ALDHs), respectively. When RA is no longer needed, it is catabolized by cytochrome P450 enzymes (CYP26 enzymes) (Niederreither and Dollé, 2008; Kedishvili, 2016). Here, we tested the expression of ADHs, ALDHs and CYP26 enzymes in E16.5 WT and Rnf220-/- embryos.

      The results are as follows. ADH7 and ADH10 are slightly upregulated. ALDH1 and ALDH3 are upregulated and downregulated in Rnf220-/- embryos, respectively, but there is no significant change in the expression of ALDH2, which plays a key role in RA synthesis during embryonic development (Niederreither and Dollé, 2008). Furthermore, Cyp26a1, which is responsible for RA catabolism, was upregulated in Rnf220-/- embryos. Collectively, these data do not support a clear effect of RNF220 on RA signaling.

      Author response image 2.

      The effect of Rnf220 on RA synthesis and degradation pathways

      (2) In Figure 2C-D further explanation is required to describe what criteria were used to segment the tissue into Rostral, middle, and caudal regions. Additionally, it is unclear whether the observed change in axonal projection pattern is caused due to physical deformation and rearrangement of the entire Pons tissue or due to disruption of Hox3-5 expression levels. Labeling of the tissue with DAPI or brightfield image to show the structural differences and similarities between the brain regions of WT and Rnf220 +/- will be helpful.

      Thank you for your comment and suggestion.

      More information on the quantification of the results shown in Figure 2C-D was included in our revised manuscript. We carried out Nissl staining assays using coronal sections of the brainstem and found that there is no significant difference in neuronal cell organization between WT and Rnf220+/- (revised Figure 2-figure supplement 2).

      (3) Line 192-195. These roles of PcG and trxG complexes are inconsistent with their initial descriptions in the text - lines 73-74.

      We are sorry for the mistake. We carefully revised the related descriptions to avoid such mistakes. Thank you.

      (4) In Figure 4D, the band in the gel seems unclear and erased. Please provide a different one. These data show that neither Rnf220 nor wdr5 directly regulates Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target. This point should be addressed in the text and discussion section of the paper. example for the same data which shows a full band with lower intensity.

      Thank you for your suggestion.

      We repeated the experiment of Figure 4D and some of the blot images were replaced in the revised Figure 4D.

      Indeed, in the presence of RA, knockdown of Rnf220 alone can upregulate the expression of Hox genes (Figure 5C). Knockdown of Wdr5 could reverse the upregulation of Hox genes in Rnf220 knockdown cells, suggesting that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, none of Rnf220 knockdown, Wdr5 knockdown, or Rnf220 and Wdr5 double knockdown had a significant effect on the expression of Hox genes in P19 cells. It seems that RA signaling plays a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point was included in the revised manuscript.

      (5) In Figure 4G the authors could provide some form of quantitation for changes in ubiquitination levels to make it easier for the reader. They should also describe the experimental procedures and conditions used for each of the pull-down and ubiquitination assays in greater detail in the methods section.

      Thank you for your suggestion.

      The quantitation and statistics for the original Figure 4G were included in the revised Figure 4. More information on the biochemical assays was included in the “Methods and Materials” section of our revised manuscript.

      (6) Figure 5 shows that neither Rnf220 nor wdr5 directly regulate Hox gene expressions. The effect of double knockdown in the presence of RA suggests that they work together to suppress Hox gene expression via a different downstream target.

      Thank you for your comment.

      In fact, knockdown of Rnf220 alone can upregulate the expression of Hox genes in the presence of RA (Figure 5C). Furthermore, knockdown of Wdr5 could reverse the upregulation of Hox genes in Rnf220 knockdown cells, which suggests that Rnf220 regulates Hox gene expression in a Wdr5-dependent manner. However, in the absence of RA, none of Rnf220 knockdown, Wdr5 knockdown, or Rnf220 and Wdr5 double knockdown had a significant effect on the expression of Hox genes in P19 cells. It seems that RA signaling plays a crucial role in the regulation of WDR5 by RNF220 in P19 cells, and a discussion of this point was included in the revised manuscript.

      (7) In Figure 6, while the reversal of changes in Hox gene expression upon concurrent Rnf220; Wdr5 inhibition highlights the importance of Wdr5 in this regulatory process, the mechanistic role of wdr5 and its functional consequences are unclear. To answer these questions, the authors need to: (i) Assay for activated and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 similar to that shown in Figure 3- supplement 1. This will reveal if wdr5 functions according to its intended role as part of the TrxG complex. (ii) The authors need to assay for changes in axon projection patterns in the double knockdown condition to see if Wdr5 inhibition rescues the neural circuit defects in Rnf220 +/- mice.

      Thank you for your suggestion.

      Although it is also necessary to examine whether the rescue effect of in utero WDR5 inhibitor injection on the neuronal circuit persists into adult stages, it is difficult to distinguish the embryos or pups after they are born. Although Rnf220fl/wt;Wdr5fl/wt;Nestin-Cre mice are viable and can survive to adult stages, they showed developmental defects in the forebrain, including the cerebral cortex and hippocampus. Therefore, rescue of the behavioral and neuronal circuit defects was not examined in this study. A PN-specific inducible Cre mouse line could help address this question in the future.

      We carried out ChIP-qPCR to test activating and repressive epigenetic modifications upon double knockdown of Rnf220 and Wdr5 in the P19 cell line and found that the double knockdown rescued the Hox epigenetic modifications to a certain degree (Figure 6-figure supplement 1).

      References

      Kedishvili, N.Y. 2016. Retinoic acid synthesis and degradation. Subcell Biochem, 81:127-161. DOI: 10.1007/978-94-024-0945-1_5, PMID: 2783050

      Ma, P., Li, Y., Wang, H., Mao, B., Luo, Z.-G. 2021. Haploinsufficiency of the TDP43 ubiquitin E3 ligase RNF220 leads to ALS-like motor neuron defects in the mouse. Journal of Molecular Cell Biology, 13: 374-382. DOI: 10.1093/jmcb/mjaa072, PMID: 33386850

      Ma, P., Song, N.-N., Li, Y., Zhang, Q., Zhang, L., Zhang, L., Kong, Q., Ma, L., Yang, X., Ren, B., Li, C., Zhao, X., Li, Y., Xu, Y., Gao, X., Ding, Y.-Q., Mao, B. 2019. Fine-Tuning of Shh/Gli Signaling Gradient by Non-proteolytic Ubiquitination during Neural Patterning. Cell Rep, 28: 541-553.e544. DOI: 10.1016/j.celrep.2019.06.017, PMID: 31291587

      Niederreither, K., Dollé, P. 2008. Retinoic acid in development: towards an integrated view. Nat Rev Genet, 9: 541-53. DOI: 10.1038/nrg2340, PMID: 18542081

      Wang, Y.-B., Song, N.-N., Zhang, L., Ma, P., Chen, J.-Y., Huang, Y., Hu, L., Mao, B., Ding, Y.-Q. 2022. Rnf220 is Implicated in the Dorsoventral Patterning of the Hindbrain Neural Tube in Mice. Front Cell Dev Biol, 10. DOI: 10.3389/fcell.2022.831365, PMID: 35399523

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The authors study the variability of patient response of NSCLC patients on immune checkpoint inhibitors using single-cell RNA sequencing in a cohort of 26 patients and 33 samples (primary and metastatic sites), mainly focusing on 11 patients and 14 samples for association analyses, to understand the variability of patient response based on immune cell fractions and tumor cell expression patterns. The authors find immune cell fraction, clonal expansion differences, and tumor expression differences between responders and non-responders. Integrating immune and tumor sources of signal the authors claim to improve prediction of response markedly, albeit in a small cohort.

      Strengths:

      The problem of studying the tumor microenvironment, as well as the interplay between tumor and immune features is important and interesting and needed to explain the heterogeneity of patient response and be able to predict it.

      Extensive analysis of the scRNAseq data with respect to immune and tumor features on different axes of hypothesis relating to immune response and tumor immune evasion using state-of-the-art methods.

      The authors provide an interesting scRNAseq data set linked to outcomes data.

      Integration of TCRseq to confirm subtype of T-cell annotation and clonality analysis.

      Interesting analysis of cell programs/states of the (predicted) tumor cells and characterization thereof.

      Weaknesses:

      Generally, a very heterogeneous and small cohort where adjustments for confounding are hard. Additionally, there are many tests for association with outcome, where necessary multiple testing adjustments would negate signal and confirmation bias likely, so biological takeaways have to be questioned.

      Thank you for your comment. We made multiple testing adjustments as suggested in “Recommendations for Authors.”

      RNAseq is heavily influenced by the tissue of origin (both cell type and expression), so the association with the outcome can be confounded. The authors try to argue that lymph node T-cell and NK content are similar, but a quantitative test on that would be helpful.

      Following the reviewer’s suggestion, we performed principal component analysis (PCA) to assess the influence of tissue of origin on immune and stromal cell populations. In the revised Figure S1g, we quantified the similarity using Euclidean distances of centroids between sample groups based on their tissue of origin in the PC1 and PC3 plot.
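
      The centroid comparison described above can be sketched as follows. This is a minimal illustration that assumes the PC1 and PC3 scores of each sample have already been computed by a separate PCA step; the tissue labels, coordinates, and function names are hypothetical and are not taken from the manuscript.

```python
import math
from collections import defaultdict
from itertools import combinations


def centroids(points, labels):
    """Mean (PC1, PC3) coordinate of the samples in each tissue group."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (x, y), label in zip(points, labels):
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def centroid_distances(points, labels):
    """Euclidean distance between the centroids of every pair of tissue groups."""
    cents = centroids(points, labels)
    return {
        (a, b): math.dist(cents[a], cents[b])
        for a, b in combinations(sorted(cents), 2)
    }


# Hypothetical (PC1, PC3) scores per sample; not data from the study.
scores = [(0.4, 1.1), (0.9, 0.7), (0.6, 1.3), (1.0, 0.9), (4.2, -2.0), (3.8, -1.5)]
tissue = ["lung", "lung", "lymph_node", "lymph_node", "brain", "brain"]

for pair, distance in centroid_distances(scores, tissue).items():
    print(pair, round(distance, 3))
```

      A small distance between two group centroids relative to the others indicates that those tissues have similar immune and stromal composition in the chosen components; it is a descriptive summary rather than a formal significance test.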

      The authors claim a very high "accuracy" performance, however, given the small cohort and lack of information on the exact evaluation it is not clear if this just amounts to overfitting the data.

      We acknowledge the concern about the high “accuracy” potentially indicating overfitting. To address this, we revised the manuscript to clarify the use of 'accuracy,' 'AUC,' and 'performance' with clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.

      Especially for tumor cell program/state analysis the specificity to the setting of ICIs is not clear and could be prognostic.

      Thank you for your comments. As outlined in Table 2 of the revised manuscript, we conducted a multivariate survival analysis of tumor signature candidates using the TCGA lung adenocarcinoma (LUAD, n = 533) and squamous cell carcinoma (LUSC, n = 502) cohorts to evaluate their prognostic potential. No tumor cell programs or states were found to be associated with overall survival in either LUAD or LUSC. We added descriptions related to Table 2 in the Results (Lines 249-251) and Methods (Lines 530-542) sections.

      Due to the small cohort with a lot of variability, more external validation is needed to be convincingly reproducible, especially when talking about AUC/accuracy of a predictor.

      Expanding the cohort size was difficult due to limited resources. We recognize the challenges posed by the small and heterogeneous cohort. We have acknowledged these limitations and applied statistical corrections to address them.

      Reviewer #2 (Public Review):

      Summary:

      The authors have utilised deep profiling methods to generate deeper insights into the features of the TME that drive responsiveness to PD-1 therapy in NSCLC.

      Strengths:

      The main strengths of this work lie in the methodology of integrating single-cell sequencing, genetic data, and TCRseq data to generate hypotheses regarding determinants of IO responsiveness.

      Some of the findings in this study are not surprising and well precedented eg. association of Treg, STAT3, and NFkB with ICI resistance and CD8+ activation in ICI responders and thus act as an additional dataset to add weight to this prior body of evidence. Whilst the role of Th17 in PD-1 resistance has been previously reported (eg. Cancer Immunol Immunother 2023 Apr;72(4):1047-1058, Cancer Immunol Immunother 2024 Feb 13;73(3):47, Nat Commun. 2021; 12: 2606 ) these studies have used non-clinical models or peripheral blood readouts. Here the authors have supplemented current knowledge by characterization of the TME of the tumor itself.

      Weaknesses:

      Unfortunately, the study is hampered by the small sample size and heterogeneous population and whilst the authors have attempted to bring in an additional dataset to demonstrate the robustness of their approach, the small sample size has limited their ability to draw statistically supported conclusions. There is also limited validation of signatures/methods in independent cohorts, no functional characterization of the findings, and the discussion section does not include discussion around the relevance/interpretation of key findings that were highlighted in the abstract (eg. role of Th17, TRM, STAT3, and NFKb). Because of these factors, this work (as it stands) does have value to the field but will likely have a relatively low overall impact.

      We acknowledge the challenges posed by the small and heterogeneous cohort. To address this, we tempered our claims related to accuracy by applying statistical testing corrections. We also appreciate the feedback on functional characterization and have expanded the discussion in the revised manuscript to include an overview of specific cell populations and genes.

      Related to the absence of discussion around prior TRM findings, the association between TRM involvement in response to IO therapy in this manuscript is counter to what has been previously demonstrated (Cell Rep Med. 2020;1(7):100127, Nat Immunol. 2017;18(8):940-950., J Immunol. 2015;194(7):3475-3486.). However, it should be noted that the authors in this manuscript chose to employ alternative markers of TRM characterisation when defining their clusters and this could indicate a potential rationale for differences in these findings. TRM population is generally characterised through the inclusion of the classical TRM markers CD69 (tissue retention marker) and CD103 (TCR experienced integrin that supports epithelial adhesion), which are both absent from the TRM definition in this study. Additional markers often used are CD44, CXCR6, and CD49a, of which only CXCR6 has been included by the authors. Conversely, the majority of markers used by the authors in the cell type clustering are not specific to TRM (eg. CD6, which is included in the TRM cluster but is expressed at its lowest in cluster 3 which the authors have highlighted as the CD8+ TRM population). Therefore, whilst there is an interesting finding of this particular cell cluster being associated with resistance to ICI, its annotation as a TRM cluster should be interpreted with caution.

      Single-cell RNA sequencing (scRNA-seq) can sometimes fail to detect the expression of classical cell type markers due to incomplete capture of a cell’s transcriptome. To determine cell identity, we utilized cell type markers established in previous scRNA-seq studies. In response to your comments, we have added the expression levels of classical TRM markers, including CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), in the revised Figure 2c. Although these markers were not exclusively expressed in TRM clusters, TRM clusters exhibited relatively high levels of these genes while lacking other clusters’ specific marker genes.

      Reviewer #1 (Recommendations For The Authors):

      General suggestions:

      When analyzing the association of cell type proportions with outcomes, some adjustment for multiple testing should be considered (either sampling-based, e.g. permutation test, or adjustment based on assumptions of independence of tests, e.g. Bonferroni).

      Thank you for your comments. As suggested, we calculated the adjusted p-value using the False Discovery Rate for the association of cell type proportions with outcomes in Figure 3a. The heatmap in Reviewer's ONLY Figure 1, using the adjusted p-value consistently showed the expected grouping of cell types and outcomes. However, the significance did not meet the conventional statistical cutoff criteria. We acknowledge this limitation, which results from statistical testing based on ratio values.

      Author response image 1.

      Heat map with unsupervised hierarchical clustering of proportional changes in cell subtypes within total immune cells. Proportional changes were compared across multiple ICI response groups. The color represents the adjusted -log (p-value) calculated using the False Discovery Rate.
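
      For readers unfamiliar with the adjustment, the following minimal sketch shows one common way to compute False Discovery Rate adjusted p-values. It assumes the Benjamini-Hochberg procedure, which the response does not name explicitly, and the input p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four cell subtypes; not data from the study.
raw = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw))
```

      Because each raw p-value is scaled by the number of tests divided by its rank, associations that are nominally significant in a small cohort can readily exceed the conventional cutoff after adjustment, which is consistent with the limitation acknowledged above.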

      A formal test of clonotype differences (normalized to cell type fraction) would be great as the shown plot 2e could be confounded by cell number and type differences between responders and non-responders.

      Thank you for your suggestion. We have revised Figure 2e to display the relative clonotype differences versus CD4+ and CD8+ T cell fractions in each sample. The relative clone size of each cell was calculated by dividing the size of each clone by the total number of CD4+ or CD8+ T cells, respectively.
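
      The normalization described above can be sketched for a single sample as follows. The input format, a list of (lineage, clonotype) pairs with one entry per T cell, and the identifiers are assumptions made for illustration only.

```python
from collections import Counter


def relative_clone_sizes(cells):
    """Clone size divided by the total number of CD4+ or CD8+ T cells in the sample.

    cells: list of (lineage, clonotype_id) tuples, one per T cell,
    where lineage is "CD4" or "CD8".
    """
    lineage_totals = Counter(lineage for lineage, _ in cells)
    clone_counts = Counter(cells)
    return {
        (lineage, clone): count / lineage_totals[lineage]
        for (lineage, clone), count in clone_counts.items()
    }


# Hypothetical sample: four CD8+ cells in two clones and two CD4+ cells in one clone.
sample = [("CD8", "c1")] * 3 + [("CD8", "c2")] + [("CD4", "c3")] * 2
print(relative_clone_sizes(sample))
```

      Dividing by the lineage total within each sample removes the dependence on how many T cells were recovered, so that a large clone in a T cell-rich sample is not mistaken for stronger clonal expansion.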

      It could be made a bit more clear when the core group of patients was used (only when associating with outcomes?) and when all other patients were used as well (only cell type annotation?).

      As the reviewer correctly noted, we performed scRNA-seq analysis on all specimens, but only the core group of patients was used for the comparative analysis between the responder and non-responder groups. This information has been detailed in the manuscript (Lines 103-105).

      For immune cells, it would be interesting to look at expression patterns (NMF, scINSIGHT) as well, not just immune cell fractions and expansion.

      In contrast to tumor signatures, immune cell programs are more directly tied to their functional characteristics. Therefore, we focused on annotating immune cells based on their functional properties and conducted comparative analyses between responders and non-responders.

      Multiple testing is necessary for the univariate association analysis. Some adjustments for confounders in a multivariate model (despite the size) could be informative.

      As shown in ‘Reviewer's ONLY Table 1’, we conducted a multivariate regression analysis of immune and tumor signatures for ICI response, adjusting for clinical variables such as tissue origin, cancer subtype, pathological stage, and smoking status. However, the results were not significant, likely due to the heterogeneity and small size of the cohort.

      Author response table 1.

      P-values from univariate and multivariate regression analysis of immune and tumor signatures for ICI response.

      It is not clear from the manuscript how "accuracy" is measured. The terms "accuracy" and "AUC", as well as "performance" are used interchangeably, a section in the methods with the precise definition is needed.

      We have revised the manuscript to clarify the terms 'accuracy,' 'AUC,' and 'performance' by using clearer expressions in the following sections: Abstract (Line 57), Results (Line 264), Discussion (Lines 320-321), Methods (Lines 546-547), Legends for Figure 5c and Figure S8b.

      Furthermore, it has to be clear if this is in-sample performance or if there was some train/test split or cross-validation used. Given the small cohort size and wealth of features finding some combination of predictors that could overfit on responders/non-responders would not be surprising.

      As the reviewer has noted, we acknowledge the statistical limitations due to the small cohort size. We have revised the sentence on Lines 545-547 “Classification models of responders and non-responders for PC signatures and combinatorial indexes between tumor and/or immune cells were generated based on in-sample performance…”.
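
      To make the distinction between "accuracy" and "AUC" concrete, here is a minimal sketch that computes both on the same in-sample predictions. The scores, labels, and 0.5 threshold are hypothetical; the response does not specify how the authors computed either quantity.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a responder outscores a non-responder."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count responder/non-responder pairs ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical model scores and response labels (1 = responder).
scores = [0.9, 0.8, 0.6, 0.55, 0.3]
labels = [1, 1, 0, 1, 0]
in_sample_auc = auc(scores, labels)
in_sample_acc = accuracy(scores, labels)
```

      AUC is threshold-free while accuracy depends on the cutoff, so the two need not agree. Because both are evaluated here on the samples used to choose the predictor, they are in-sample estimates and tend to be optimistic for a small cohort.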

      Suggestions to improve readability:

      Line 84: The sentence should be reformulated to improve understanding.

      We have revised sentences in lines 81-93.

      Line 86: missing a "the".

      We have revised the sentences in lines 81-93.

      Reviewer #2 (Recommendations For The Authors):

      "Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells" Please look to rephrase this sentence as this is not entirely accurate: PD-1 is upregulated in tumor-experienced T cells as a consequence of antigen recognition ie those cells that recognise tumor will increase PD-1, whereas the sentence as it's currently written indicates that PD1+ cells have an intrinsically increased capacity to kill tumors, which is incorrect.

      We have revised the sentence “Tumor-infiltrating PD-1 positive T cells have higher capacity of tumor recognition than PD-1 negative T cells” in lines 86-88 as “More specifically, PD-1 expression is upregulated upon antigen recognition (PMID29296515), indicating that certain T cells in the tumor microenvironment are actively engaged as tumor-specific T cells.” in the revised manuscript.

      Cancer subtype abbreviations (eg. SQ, ADC, NUT) are used in figures in the main article and so should be defined in the main text (they are currently only explained in the legend for the supplementary table).

      As per the reviewer’s suggestion, the manuscript has been revised to include definitions of cancer type abbreviations in lines 108-110.

      Figure S1d-f does not appear to corroborate the statement that "Although there were differences in tissue-specific resident populations, we found that the immune cell profiles, especially T/NK cells of mLN were similar to those of primary tumor tissues indicating the activation of immune responses were consistently observed at metastatic sites (Figure S1d-f)." The diagrams are complex (please explain all abbreviations) and it is not clear how the authors have come to this conclusion. Additionally, cell quantity does not indicate that the 'activation of immune responses' is consistently observed at metastatic sites as these cells could be dysfunctional/bystander.

      In the revision, we have quantified the diagrams (Figure S1f) to more clearly highlight the differences in tissue-specific resident populations. We performed principal component analysis (PCA) to evaluate the impact of tissue origin on immune and stromal cell populations. In the revised Figure S1g, we illustrated the quantitative similarity between sample groups using Euclidean distances in the PC plot based on their tissue of origin. Additionally, the legends for Figures S1d and S1e have been updated to include definitions for all abbreviations.

      We agree with the reviewer's comment that cell quantity alone may not fully reflect activation of antigen-specific immune responses, even though we annotated the functional T cell subtypes. To better focus on the comparisons of cellular profiles between metastatic sites (mLN) and primary tumors (tLung and tL/B), we removed the sentence “…indicating the activation of immune responses were consistently observed at metastatic sites (Fig. S1d-f).” from the revised manuscript.
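
      The Euclidean-distance comparison of tissue groups in PC space mentioned above could be computed along these lines. This sketch assumes that groups are compared by the distance between their centroids, which the response does not state; the coordinates and function name are hypothetical.

```python
import math
from collections import defaultdict


def centroid_distances(points, groups):
    """Pairwise Euclidean distances between group centroids in PC space."""
    by_group = defaultdict(list)
    for point, group in zip(points, groups):
        by_group[group].append(point)
    # Centroid = coordinate-wise mean of the samples in each group.
    centroids = {
        g: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for g, pts in by_group.items()
    }
    names = sorted(centroids)
    return {
        (a, b): math.dist(centroids[a], centroids[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical (PC1, PC2) coordinates for samples from three tissue origins.
points = [(0, 0), (2, 0), (1, 3), (1, 5), (4, 0), (6, 0)]
groups = ["mLN", "mLN", "tLung", "tLung", "tL/B", "tL/B"]
distances = centroid_distances(points, groups)
```

      A smaller distance between two tissue groups indicates more similar overall cell-type composition under this assumed summary.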

      In Figure 2c, classical markers for TRM (CD103, CD69) should be included in the description for the definition of the TRM clusters, or their exclusion appropriately explained. The findings regarding the negative correlation between follicular B cells and ICI response are surprising. Figure S3, the cluster identified as Follicular B cells contains MS4A1 (CD20) and HLA-DRA. Classical markers are CD20 (pan-B cell), CD21 (CR2), CD23, and IgD/IgM (double positive), and as such it is not clear if the authors have appropriately annotated this cluster as representing follicular B cells. These classical markers should be included in the interpretation of the cell clustering or their exclusion appropriately explained.

      We appreciate your comments. In response, we have added the expression levels of classical TRM markers, such as CD69, CD103 (ITGAE), CD44, CXCR6, and CD49a (ITGA1), to the revised Figure 2c. Additionally, we revised the dot plot showing the mean expression of marker genes in each cell cluster for B/Plasma cells (revised Figure S3b) by incorporating classical markers for Follicular B cells, such as CD21 (CR2), CD23 (FCER2), IgD (IGHD), and IgM (IGHM).

      Figure 2f is rather confusing for the reader. I would recommend changing to an alternative plot that shows logP and response in a different way. If keeping to this plot type please clarify why plotting response vs PD, and whether the lower left quadrant indicates patients with progressive disease and the top right indicates responders as the interpretation is not clear currently.

      Thank you for your feedback. To address the concerns raised, we have updated the figure legend for Figure 2f to clarify the interpretation of the quadrants: “The lower left quadrant shows cell types overrepresented in the poor responder groups, while the upper right quadrant indicates cell types overrepresented in the better responder groups”. This clarification aims to help readers understand that the lower left quadrant reflects cell types associated with worse treatment outcomes, while the upper right quadrant reflects cell types associated with improved therapeutic responses.

      The terms "PC7.neg, INT.down, and UNION.down" are included in the results with no explanation to the reader of what they are or how to interpret them. The methods description "We constructed DEGs with intersections (INT) and union (UNION) of up- or down-regulated genes for comparisons" does not sufficiently describe how they were generated/calculated and, therefore, this is difficult for the reader to interpret in the final results section. Please add an additional explanation for the reader in the final section of the results/Figure 5 and in the methods.

      Following the reviewer’s suggestion, we added additional explanation in the Results section (lines 258-261): “PC7.neg denotes genes negatively correlated with PC7, a principal component extracted from PCA that distinguishes tumor cells in poor response groups. INT.down and UNION.down represent the intersection and union of down-regulated genes in the responder group, respectively.” We also explained the details in the Methods section (lines 489-495): “We reconstructed DEGs as four groups: INT.up, INT.down, UNION.up, and UNION.down, based on the intersection (INT) and union (UNION) of up- or down-regulated genes for pairwise comparisons between responder versus non-responder, PR versus PD, and PR versus SD. INT.up and INT.down represent the intersection of up- and down-regulated genes in the responder group, respectively. UNION.up and UNION.down represent the union of up- and down-regulated genes in the responder group, respectively.”
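
      The four gene groups described above amount to set intersections and unions across the three pairwise comparisons. A minimal sketch, using placeholder gene names and a hypothetical function name rather than the authors' pipeline:

```python
def build_deg_groups(comparisons):
    """Combine per-comparison DEG lists into INT/UNION up and down groups.

    comparisons maps a comparison name to {"up": [...], "down": [...]},
    where "up" and "down" are relative to the responder group.
    """
    up_sets = [set(c["up"]) for c in comparisons.values()]
    down_sets = [set(c["down"]) for c in comparisons.values()]
    return {
        "INT.up": set.intersection(*up_sets),
        "INT.down": set.intersection(*down_sets),
        "UNION.up": set.union(*up_sets),
        "UNION.down": set.union(*down_sets),
    }


# Placeholder genes for the three pairwise comparisons (not study results).
comparisons = {
    "responder_vs_nonresponder": {
        "up": ["GENE_A", "GENE_B"],
        "down": ["GENE_X", "GENE_Y"],
    },
    "PR_vs_PD": {"up": ["GENE_A", "GENE_C"], "down": ["GENE_X"]},
    "PR_vs_SD": {"up": ["GENE_A"], "down": ["GENE_X", "GENE_Z"]},
}
deg_groups = build_deg_groups(comparisons)
```

      The intersection keeps only genes that change in the same direction in every comparison, so it is the stricter signature; the union is more inclusive and correspondingly noisier.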

      The TRM and Th17+ T cell populations are highlighted in the abstract as being related to ICI resistance, but these populations of cells are not even mentioned in the discussion. Likewise, STAT3 and NFkb pathways are also highlighted in the abstract but absent in the discussion section. Please discuss the relevance of these findings, particularly given the prior studies demonstrating the opposite impact of TRM populations in NSCLC.

      We have expanded the discussion in the revised manuscript (Lines 295-313) to address the roles of TRM and Th17+ T cells, as well as the STAT3 and NF-κB pathways, in association with ICI resistance in NSCLC.

      “The identification of an abundance of CD4+ TRM cells as a negative predictor of ICI response is an unexpected finding, considering that higher frequencies of TRM cells in lung tumor tissues are generally associated with better clinical outcomes in NSCLC (PMID28628092). This is largely due to their role in sustaining high densities of tumor-infiltrating lymphocytes and promoting anti-tumor responses. Additionally, previous studies have demonstrated that TRM cell subsets coexpressing PD-1 and TIM-3 are relatively enriched in patients who respond to PD-1 inhibitors (PMID31227543). However, recent findings suggest that pre-existing TRM-like cells in lung cancer may promote immune evasion mechanisms, contributing to resistance to immune checkpoint blockade therapies (PMID37086716). These observations suggest that the roles of TRM subsets in tumor immunity are highly context-dependent.

      Similarly, CD4+ TH17 cells, which were overrepresented in the non-responder groups, exhibit context-dependent roles in tumor immunity and may be associated with both unfavorable and favorable outcomes (PMID34733609; PMID30941641). In exploring tumor cell signatures linked to ICI response, non-responder attributes were regulated by STAT3 and NFKB1. The STAT3 and NF-κB pathways are crucial for Th17 cell differentiation and T cell activation (PMID24605076; PMID32697822). Notably, STAT3 activation in lung cancer orchestrates immunosuppressive characteristics by inhibiting T-cell mediated cytotoxicity (PMID31848193). The combined influence of the Th17/STAT3 axis and TRM cell activity in predicting ICI response underscores the complexity of these pathways and suggests that their roles in tumor immunity and therapy response warrant further investigation.”

    1. Author response:

      The following is the authors’ response to the current reviews.

      Many thanks to the editors for reviewing the revised manuscript.

      We are very grateful to the Reviewers for their time and for the appreciation of the revision.

      We thank Reviewer 3 for acknowledging the use of sulforhodamine B (SRB) fluorescence as a real-time readout of astrocyte volume dynamics. Experimental data in brain slices were provided to validate this approach.

      The incomplete matching of our observations with earlier data from cultured astrocytes (e.g., Solenov et al., AJP-Cell, 2004) might reflect properties of cultured astrocytes that differ from those of their slice/in vivo counterparts, as discussed in the manuscript.

      The study by T.R. Murphy et al. (Front Cell Neurosci., 2017) showed that AQP4 knockout increased the extent of astrocyte swelling in response to hypoosmotic solution in brain slices (Fig 9), and discussed that ‘... AQP4 can provide an efficient efflux pathway for water to leave astrocytes.’ Correspondingly, our data suggest that AQP4 mediates astrocyte water efflux in basal conditions.

      We have discussed the study by Igarashi et al. (NeuroReport 2013); our current data would help to understand the cellular mechanisms underlying their finding.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Pham and colleagues provide an illuminating investigation of aquaporin-4 water flux in the brain utilizing ex vivo and in vivo techniques. The authors first show in acute brain slices, and in vivo with fiber photometry, SRB-loaded astrocytes swell after inhibition of AQP4 with TGN-020, indicative of tonic water efflux from astrocytes in physiological conditions. Excitingly, they find that TGN-020 increases the ADC in DW-MRI in a region-specific manner, potentially due to AQP4 density. The resolution of the DW-MRI cannot distinguish between intracellular or extracellular compartments, but the data point to an overall accumulation of water in the brain with AQP4 inhibition. These results provide further clarity on water movement through AQP4 in health and disease.

      Overall, the data support the main conclusions of the article, with some room for more detailed treatment of the data to extend the findings.

      Strengths:

      The authors have a thorough investigation of AQP4 inhibition in acute brain slices. The demonstration of tonic water efflux through AQP4 at baseline is novel and important in and of itself. Their further testing of TGN-020 in hyper- and hypo-osmotic solutions shows the expected reduction of swelling/shrinking with AQP4 blockade.

      Their experiment with cortical spreading depression further highlights the importance of water efflux from astrocytes via AQP4 and transient water fluxes as a result of osmotic gradients. Inhibition of AQP4 increases the speed of tissue swelling, pointing to a role in the efflux of water from the brain.

      The use of DW-MRI provides a non-invasive measure of water flux after TGN-020 treatment.

      We thank the reviewer for the insightful comments.

      Weaknesses:

      The authors specifically use GCaMP6 and light sheet microscopy to image their brain sections in order to identify astrocytic microdomains. However, their presentation of the data neglects a more detailed treatment of the calcium signaling. It would be quite interesting to see whether these calcium events are differentially affected by AQP4 inhibition based on their cellular localization (ie. processes vs. soma vs. vascular end feet which all have different AQP4 expressions).

      Following the suggestion, we provide new data on the effect of AQP4 inhibition on spontaneous calcium signals in perivascular astrocyte end-feet. As now shown in Fig. S2, acute application of TGN-020 induced Ca2+ oscillations in astrocyte end-feet regions where the GCaMP6 labeling lines the profile of the blood vessel. On average, the strength of basal Ca2+ signals in the end-feet is higher than that observed across global astrocyte territories (4.65 ± 0.55 vs. 1.45 ± 0.79, p < 0.01), as is the effect of TGN-020 (8.4 ± 0.62 vs. 6.35 ± 0.97, p < 0.05; Fig S2 vs. Fig 2B). This likely reflects the enrichment of AQP4 in astrocyte end-feet. We describe the data in Fig. S2 and on page 8, line 20-23.

      We now use the transgenic line GLAST-GCaMP6 for cytosolic GCaMP6 expression in astrocytes. Spontaneous calcium signals, reflected by transient fluorescence rises, occur in discrete micro-domains whereas the basal GCaMP6 fluorescence in the soma is weak. In the present condition, it is difficult to unambiguously discriminate astrocyte soma from the highly intermingled processes. 

      The authors show the inhibition of AQP4 with TGN-020 shortens the onset time of the swelling associated with cortical spreading depression in brain slices. However, they do not show quantification for many of the other features of CSD swelling, (ie. the duration of swelling, speed of swelling, recovery from swelling).

      Regarding the features of the CSD swelling, we have performed a new analysis to quantify the duration of swelling, the speed of swelling, and the recovery time from swelling in the control condition and in the presence of TGN-020. The new analysis is now summarized in Fig. S5. Blocking AQP4 with TGN-020 increases the swelling speed, prolongs the duration of swelling, and slows the recovery from swelling, confirming our observation that acute inhibition of AQP4 water efflux facilitates astrocyte swelling while restraining shrinking. We describe the result on page 11, line 19-21.

      Significance:

      AQP4 is a bidirectional water channel that is constitutively open, thus water flux through it is always regulated by local osmotic gradients. Still, characterizing this water flux has been challenging, as the AQP4 channel is incredibly water-selective. The authors here present important data showing that the application of TGN-020 alone causes astrocytic swelling, indicating that there is constant efflux of water from astrocytes via AQP4 in basal conditions. This has been suggested before, as the authors rightfully highlight in their discussion, but the evidence had previously come from electron microscopy data from genetic knockout mice.

      AQP4 expression has been linked with the glymphatic circulation of cerebrospinal fluid through perivascular spaces since its rediscovery in 2012 [1]. Further studies of aging[2], genetic models[3], and physiological circadian variation[4] have revealed it is not simply AQP4 expression but AQP4 polarization to astrocytic vascular endfeet that is imperative for facilitating glymphatic flow. Still, a lingering question in the field is how AQP4 facilitates fluid circulation. This study represents an important step in our understanding of AQP4's function, as the basal efflux of water via AQP4 might promote clearance of interstitial fluid to allow an influx of cerebrospinal fluid into the brain. Beyond glymphatic fluid circulation, clearly, AQP4-dependent volume changes will differentially alter astrocytic calcium signaling and, in turn, neuronal activity.

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      We thank the reviewer for acknowledging the significance of our study and its functional implications for the brain glymphatic system. We have now highlighted the mentioned studies as well as the potential implications for glymphatic fluid circulation (page 4, line 9-10; page 5, line 1-3; and page 19, line 3-10).

      Reviewer #2 (Public Review):

      Summary:

      The paper investigates the role of the astrocyte-specific aquaporin-4 (AQP4) water channel in mediating water transport within the mouse brain and the impact of the channel on astrocyte and neuron signaling. Through various experiments, including epifluorescence and light sheet microscopy in mouse brain slices and fiber photometry or diffusion-weighted MRI in vivo, the researchers observe that acute inhibition of AQP4 leads to intracellular water accumulation and swelling in astrocytes. This swelling alters astrocyte calcium signaling and affects neighboring neuron populations. Furthermore, the study demonstrates that AQP4 regulates astrocyte volume, influencing mainly the dynamics of water efflux in response to osmotic challenges or associated with cortical spreading depolarization. The findings suggest that AQP4-mediated water efflux plays a crucial role in maintaining brain homeostasis, and indicate the main role of AQP4 in this mechanism. Although the authors highlight that the report sheds light on the mechanisms by which astrocyte aquaporin contributes to the water environment in the brain parenchyma, the mechanism underlying these effects remains unclear and was not investigated. The manuscript requires revision.

      Strengths:

      The paper elucidates the role of the astrocytic aquaporin-4 (AQP4) channel in brain water transport, its impact on water homeostasis, and signaling in the brain parenchyma. In its design, the paper follows a set of complementary experiments combining various ex vivo and in vivo techniques, from microscopy to magnetic resonance imaging. The research is valuable, confirms previous findings, and provides novel insights into the effect of acute blockage of the AQP4 channel using TGN-020.

      We thank the reviewer for the constructive comments.

      Weaknesses:

      Despite the interdisciplinary approach, the quality of the manuscript raises doubts about the significance of the findings and undermines the novelty claimed by the authors. The paper lacks a comprehensive exploration or mention of the underlying molecular mechanisms driving the observed effects of astrocytic aquaporin-4 (AQP4) channel inhibition on brain water transport and brain signaling dynamics. The scientific background is not well prepared in the introduction and discussion sections. Important or recent reports from the field are missing, incompletely cited, or misinterpreted. Several citations to original works, which would clarify certain conclusions, are missing. This especially applies to the basis of the glymphatic system concept and recently published reports of similar content. The use of TGN-020 instead of, e.g., the available AQP4 blocker AER-270(271) is not explained. While employing various experimental techniques adds depth to the findings, some of the reasoning behind the employed techniques, especially regarding MRI, is unclear or seemingly inaccurate. In most cases the number of subjects examined is missing or mentioned only roughly in the figure captions, and statistical tests are missing or wrongly applied, which limits assessment and reproducibility of the results. In some cases, it seems that two different statistical tests were used for the same or linked types of data, so the reported results are contradictory even though, judging from the figures, this appears unlikely. Addressing these limitations could strengthen the paper's impact and utility within the field of neuroscience; however, it also seems that supplementary experiments are required to improve the report.

      The current data hint at a tonic water efflux through astrocyte AQP4 under physiological conditions, which helps to understand brain water homeostasis and the functional implications for the glymphatic system. The underlying molecular and cellular mechanisms appear multifaceted and functionally interconnected, as discussed (page 14, line 8 – page 15, line 3). We agree that a comprehensive exploration will further advance our understanding.

      The introduction and discussion are now strengthened by incorporating the important advances in glymphatic system while highlighting the relevant studies. 

      The use of TGN-020 was based on its validation by a wide range of ex vivo and in vivo studies, including studies using heterologous expression systems and AQP4 KO mice. The validation of AER-270(271, the water-soluble prodrug) using AQP4 KO mice was reported recently (Giannetto et al., 2024). AER-271 was noted to affect brain water ADC (apparent diffusion coefficient, evaluated by diffusion-weighted MRI) in AQP4 KO mice ~75 min after drug application (Giannetto et al., 2024). This likely reflects the fact that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB), whose inhibition could reduce CNS water content independently of AQP4 targeting (Salman et al., 2022). In addition, the inhibition efficiency of AER-270(271) seems lower than that of TGN-020 (Farr et al., 2019; Giannetto et al., 2024; Huber et al., 2009; Salman et al., 2022). We have now added this information to the manuscript (page 7, line 1-6 and page 15, line 7-17).

      The description of the DW-MRI is now updated (page 4, line 10-14).

      We also performed new experiments and data analysis, as described point by point below in the section ‘Recommendations For The Authors’.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors propose that astrocytic water channel AQP4 represents the dominant pathway for tonic water efflux without which astrocytes undergo cell swelling. The authors measure changes in astrocytic sulforhodamine fluorescence as the proxy for cell volume dynamics. Using this approach, they perform a technically elegant series of ex vivo and in vivo experiments exploring changes in astrocytic volume in response to AQP4 inhibitor TGN-020 and/or neuronal stimulation. The key finding is that TGN-020 produces an apparent swelling of astrocytes and modifies astrocytic cell volume regulation after spreading depolarizations. Additionally, systemic application of TGN-020 produced changes in diffusion-weighted MRI signal, which the authors interpret as cellular swelling. This study is perceived as potentially significant. However, several technical caveats should be strongly considered and perhaps addressed through additional experiments.

      Strengths:

      (1) This is a technically elegant study, in which the authors employed a number of complementary ex vivo and in vivo techniques to explore functional outcomes of aquaporin inhibition. The presented data are potentially highly significant (but see below for caveats and questions related to data interpretation).

      (2) The authors go beyond measuring cell volume homeostasis and probe for the functional significance of AQP4 inhibition by monitoring Ca2+ signaling in neurons and astrocytes (GCaMP6 assay).

      (3) Spreading depolarizations represent a physiologically relevant model of cellular swelling. The authors use ChR2 optogenetics to trigger spreading depolarizations. This is a highly appropriate and much-appreciated approach.

      We thank the reviewer for the effort in evaluating our work.

      Weaknesses:

      (1) The main weakness of this study is that all major conclusions are based on the use of one pharmacological compound. In the opinion of this reviewer, the effects of TGN-020 are not consistent with the current knowledge on water permeability in astrocytes and the relative contribution of AQP4 to this process.

      Specifically: Genetic deletion of AQP4 in astrocytes reduces plasmalemmal water permeability by ~two-three-fold (when measured at 37 °C, Solenov et al., AJP-Cell, 2004). This is a significant difference, but it is thought to have limited/no impact on water distribution. Astrocytic volume and the degree of anisosmotic swelling/shrinkage are unchanged because the water permeability of the AQP4-null astrocytes remains high. This has been discussed at length in many publications (e.g., MacAulay et al., Neuroscience, 2004; MacAulay, Nat Rev Neurosci, 2021) and is acknowledged by Solenov and Verkman (2004).

      Keeping this limitation in mind, it is important to validate astrocytic cell volume changes using an independent method of cell volume reconstruction (diameter of sulforhodamine-labeled cell bodies? 3D reconstruction of EGFP-tagged cells? Else?)

      Solenov and colleagues used the calcein quenching assay and KO mice to demonstrate that AQP4 is a functional water channel in cultured astrocytes (Solenov et al., 2004). AQP4 deletion reduced both astrocyte water permeability and the absolute amplitude of swelling over a comparable time, and also slowed cell shrinking, which overall parallels our results from acute AQP4 blocking. Yet in Solenov’s study, the time to swelling plateau was prolonged in AQP4 KO astrocytes, which differs from our data obtained with acute pharmacological blocking. This discrepancy may be due to compensatory mechanisms in chronic AQP4 KO, or may reflect volume responses in cultured astrocytes that differ from those in brain slices or in vivo, as suggested previously (Risher et al., 2009).

      Soma diameter might be an indicator of cell volume change, yet measuring it is challenging with our current fluorescence imaging method, which is diffraction-limited and insufficient to clearly resolve the border of the soma in situ. In addition, the lateral diameter of cell bodies may not faithfully reflect volume changes that can occur in all three dimensions. Rapid 3D imaging of astrocyte volume dynamics with sufficiently high Z-axis resolution appears difficult with our present tools.

      We have now updated the discussion accordingly, citing the relevant literature (page 17, line 14 – page 18, line 3).

      (2) TGN-020 produces many effects on the brain, with some but not all of the observed phenomena sensitive to the genetic deletion of AQP4. In the context of this work, it is important to note that TGN-020 does not completely inhibit AQP4 (70% maximal inhibition in the original oocyte study by Huber et al., Bioorg Med Chem, 2009). Thus, besides not knowing TGN-020 levels inside the brain, even "maximal" AQP4 inhibition would not be expected to dramatically affect water permeability in astrocytes.

      This caveat may be addressed through experiments using local delivery of structurally unrelated AQP4 blockers, or, preferably, AQP4 KO mice.

      It is an important point that TGN-020 only partially blocks AQP4, implying that the actual functional impact of AQP4 per se might be stronger than what we observed. TGN-020 provides a means to acutely probe AQP4 function in situ; still, we agree that its limitation needs to be acknowledged. We mention this now on page 15, line 7-9 and 14-17.

      We agree that local delivery of an alternative blocker would provide additional information. However, local delivery requires the stereotaxic implantation of a cannula, which would cause inflammation in surrounding astrocytes (and neurons). The recently introduced AQP4 blocker AER-270(271) has received attention because it influences brain water dynamics (ADC in DW-MRI) in AQP4 KO mice (Giannetto et al., 2024); note that AER-270(271) is also an inhibitor of nuclear factor κB (NF-κB). This pathway can potentially perturb CNS water content and influence brain fluid circulation in an AQP4-independent manner (Salman et al., 2022). The inhibition efficiency of AER-270 on mouse AQP4 (~20%, Farr et al., 2019; Salman et al., 2022) appears lower than that of TGN-020 (~70%, Huber et al., 2009).

      We chose to use the pharmacological compound to achieve acute blocking of AQP4 thereby avoiding the chronic genetics-caused alterations in brain structural, functional and water homeostasis. Multiple lines of evidence including the recent study (Gomolka et al., 2023), have shown that AQP4 KO mice alters brain water content, extracellular space and cellular structures, which raises concerns to use the transgenic mouse to pinpoint the physiological functions of the AQP4 water channel. 

      We have now mentioned the concerns about AQP4 pharmacology, supported by additional literature in the field (page 15, line 8-18).

      (3) This reviewer thinks that the ADC signal changes in Figure 5 may be unrelated to cellular swelling. Instead, they may be a result of the previously reported TGN-020-induced hyperemia (e.g., H. Igarashi et al., NeuroReport, 2013) and/or changes in water fluxes across the pia mater, which is highly enriched in AQP4. To amplify this concern, AQP4 KO brains have increased water mobility due to enlarged interstitial spaces, rather than swollen astrocytes (RS Gomolka, eLife, 2023). Overall, the caveats of interpreting the DW-MRI signal deserve strong consideration.

      A previous observation showed that TGN-020 increases regional cerebral blood flow in wild-type mice but not in AQP4 KO mice (Igarashi et al., 2013). Our current data provide a possible mechanistic explanation: TGN-020 blocking of astrocyte AQP4 causes calcium rises that may lead to vasodilation, as suggested previously (Cauli and Hamel, 2018). We have now updated the discussion on page 15, line 3-7.

      We agree with the reviewer regarding the structural deviations observed in AQP4 KO mice (Gomolka et al., 2023), now mentioned on page 19, line 3-5. Following the Reviewer’s suggestion, we have also updated the interpretation of the DW-MRI signal and point out that, in addition to being related to astrocyte swelling, the ADC signal changes may also be caused by indirect mechanisms, such as the transient upregulation of other water-permeable pathways compensating for AQP4 blocking. We now describe this alternative interpretation and the caveats of the DW-MRI signals (page 20, line 1-8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Private recommendations

      My more broad experimental suggestions are in the "weaknesses" section. Some minor points that would improve the manuscript are included below:

      (1) A more detailed explanation for why SRB fluorescence reflects the astrocyte volume changes, whereas typical intracellular GFP does not.

      As an engineered fluorescent protein, GFP has been used to tag specific cell types. However, as a relatively big protein (MW, 26.9 kDa), EGFP is expected to diffuse much more slowly than SRB, a small chemical dye (MW, 558.7 Da). Also, IP injection of SRB enables genetics-free labeling of brain astrocytes, thus avoiding the influence of protein overexpression on astrocyte volume and water transport responses. We have now stated this point in the manuscript (page 13, line 21 – page 14, line 4).

      (2) Figure 1 panel B should have clear labels on the figure and a description in the legend to delineate which part of the panel refers to hyper- or hypo-osmotic treatment.

      We have now updated the figure and the legend.  

      (3) For Figure 2, what is the rationale for analyzing the calcium signaling data between the cell types differently?

      We analyzed calcium micro-domains for astrocytes because their spontaneous signals occur mainly in discrete micro-domains (Shigetomi et al., 2013). For neurons, we performed global analysis by calculating the mean fluorescence of the imaging field of view, because calcium signal changes were only observed at the global level rather than in micro-domains. This information is now included (page 24, line 18-20).

      (4) For Figure 3, the authors mention that TGN-020 likely caused swelling prior to the hypotonic solution administration. Do they have any measurements from these experiments prior to the TGN-020 application to use as a "true baseline" volume?

      The current method detects relative changes in astrocyte volume (i.e., transmembrane water transport) but is blind to the absolute volume. We therefore have no readout of baseline volumes.

      (5) For Figures 3 and 4, did the authors see any evidence for regulatory volume decrease? And is this impaired by TGN-020? It is a well-characterized phenomenon that astrocytes will open mechanosensitive channels to extrude ions during hypo-osmotic induced swelling. This process is dependent on AQP4 and calcium signaling [5]

      Mola and colleagues provided important results demonstrating the role of AQP4 in astrocyte volume regulation (Mola et al., 2016). In the present study in acute brain slices, when we applied hypotonic solution to induce astrocyte swelling, our protocol did not reveal a rapid regulatory volume decrease (e.g., Fig. 3D). When we followed the volume changes of SRB-labeled astrocytes during optogenetically induced CSD, we observed a phase of volume decrease following the transient swelling (Fig. 4F), where the peak amplitude and the degree of recovery were both reduced by inhibiting AQP4 with TGN-020. These data imply that regulatory astrocyte volume decrease may occur in specific conditions, although, intriguingly, it has been suggested to be absent in brain slices and in vivo (e.g., Risher et al., 2009). We have not specifically investigated this phenomenon, and now briefly discuss this point on page 18, line 6-14.

      (6) Figure 5 box plots do not show all data points, could the authors modify to make these plots show all the animals, or edit the legend to clarify what is plotted?

      We have now updated the plot and the legend. This plot is from all animals (n = 7 per condition).

      (7) pg. 9 line 6, there is a sentence that seems incomplete or otherwise unfinished. "We first followed the evoked water efflux and shrinking induced by hypertonic solution while."

      Fixed (now, page 9 line 17-18). 

      (8)  During the discussion on pg 13 line 11, it may be more clear to describe this as the cotransport of water into the cells with ions/metabolites as reviewed by Macaulay 2021 [6].

      We agree; the text is modified following this suggestion (now page 14, line 12-13).

      (1) Iliff, J.J., et al., A Paravascular Pathway Facilitates CSF Flow Through the Brain Parenchyma and the Clearance of Interstitial Solutes, Including Amyloid β. Sci Transl Med, 2012. 4(147): p. 147ra111.

      (2) Kress, B.T., et al., Impairment of paravascular clearance pathways in the aging brain. Ann Neurol, 2014. 76(6): p. 845-61.

      (3) Mestre, H., et al., Aquaporin-4-dependent Glymphatic Solute Transport in the Rodent Brain. eLife, 2018. 7.

      (4) Hablitz, L., et al., Circadian control of brain glymphatic and lymphatic fluid flow. Nature Communications, 2020. 11(1).

      (5) Mola, M., et al., The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia, 2016. 64(1).

      (6) MacAulay, N., Molecular mechanisms of brain water transport. Nat Rev Neurosci, 2021. 22(6): p. 326-344.

      We thank the reviewer. These important references have now been added to the manuscript together with the corresponding revisions.

      Reviewer #2 (Recommendations For The Authors):

      In its concept, the paper is interesting and provides additional value - however, it requires revision.

      Below, I provide the following remarks for the following sections/ pages/lines:

      ABSTRACT/page 2 (remarks here refer to the rest of the manuscript, where these sentences are repeated):

      - It seems that the 'homeostasis' provides not only physical protection, but also determines the diffusion of chemical molecules...' Please correct the sentence as it is grammatically incorrect.

      It is now corrected (page 2, line 1).

      - The term 'tonic water' is not clear. I understand, after reading the paper, that it is about tonicity of the solutes injected into the mouse.

      We use the term ‘tonic’ to indicate that in basal conditions, a constant water efflux occurs through the AQP4 channel.

      - 'tonic aquaporin water efflux maintains volume equilibrium' - I believe it is about maintaining volume and osmotic equilibrium?

      This description is now refined (now page 2, line 10).

      - It is not clear whether the tonic water outflow refers to the cellular level or outflow from the brain parenchyma (i.e., glymphatic efflux)

      It refers to the cellular level. 

      INTRODUCTION/page 3:

      - 'clearance of waste molecules from the brain as described in the glymphatic system' - The original papers describing the phenomena are not cited: Iliff et al. 2012, 2013, Mestre et al. 2018, as well as reviews by Nedergaard et al.

      Indeed. We have now cited these key references (now page 4, line 10).

      - 'brain water diffusion is the basis for diffusion-weighted magnetic resonance imaging (DW-MRI)' - The statement is wrong. it is the mobility of the water protons that DWI is based on, but not the diffusion of molecules in the brain. This should be clarified and based on the DW-MRI principle and the original works by Le Bihan from 1986, 1988, or 2015.

      This sentence is now updated (page 4, line 10-14).

      - Similarly, I suggest correcting or removing the citations and the sentence part regarding the clinical use of DWI, as it has no value here. Instead, it would be worth mentioning what actually ADC reflects as a computational score, and what were the results from previous studies assessing glymphatic systems using DWI. This is especially important when considering the mislocalization of the AQP4 channel.

      We now describe recent studies using DW-MRI to evaluate the glymphatic system (page 4, line 16-17).

      - 'In the brain, AQP4 is predominantly expressed in astrocytes'-please review the citations. I suggest reading the work by Nielsen 1997, Nagelhus 2013, Wolburg 2011, and Li and Wang from 2017. To my best knowledge, in the brain AQP4 is exclusively expressed in astrocytes.

      We thank the reviewer. It has been described that, while enriched in astrocytes, AQP4 is also expressed in ependymal cells lining the ventricles (e.g., Mayo et al., 2023; Verkman et al., 2006). ‘Predominantly’ is now removed (page 4, line 21).

      - The conclusion: ' Our finding suggests that aquaporin acts as a water export route in astrocytes in physiological conditions, so as to counterbalance the constitutive intracellular water accumulation caused by constant transmitter and ion uptake, as well as the cytoplasmic metabolism processes. This mechanism hence plays a necessary role in maintaining water equilibrium in astrocytes, thereby brain water homeostasis' seems to be slightly beyond the actual findings in the paper. I suggest clarifying according to the described phenomena.

      We have now refined the conclusion to adhere to the experimental observations (page 5, line 16-18).

      - The introduction lacks important information on existing AQP4 blockers and their effects, pros and cons on why to use TGN-020. Among others, I would refer to recent work by Giannetto et al 2024, as well as previous work of Mestre et al. 2018 and Gomolka et al. 2023.

      We initiated the study by using TGN-020 as an AQP4 blocker because it has been validated by a wide range of ex vivo and in vivo studies, as documented in the text (page 7, line 1-6). We have also updated the discussion of recent advances in validating the AQP4 blocker AER-270(271), citing the relevant studies (page 15, line 7-17).

      RESULTS:

      - Page 5, lines 19-20: '...transport, we performed fluorescence intensity translated (FIT) imaging.' - this term was never introduced in the methods so it is difficult for the reader to understand it at first sight. - 'To this end,' - it is not clear which action 'this' refers to (is it about previous works or the moment that the brain samples were ready for imaging?). Please clarify, as it only starts to become clear after fully reading the methods.

      We have now refined the description to give the principle of our imaging method first and then explain the technical steps. To avoid ambiguity, the term ‘To this end’ is removed. The updated text is now on page 6, line 1-3.

      - From page 6 onwards - all references to Figures lack information to which part of the figure subpanel the information refers (top/middle bottom or left/middle/right).

      We apologize. The complementary indication is now added for figure citations when applicable.  

      - 'whereas water export and astrocyte shrinking upon hyperosmotic manipulation increased astrocyte fluorescence (Figure 1B). Hence, FIT imaging enables real-time recording of astrocyte transmembrane water transport and volume dynamics.' - this part seems to be undescribed or not clear in the methods.

      We have now refined this description (page 6, line 19-20).

      - Page 6, lines 17-22: TGN-020. In addition to the above, I suggest familiarizing also with the following works by Igarashi 2011. doi: 10.1007/s10072-010-0431-1, and by Sun 2022. doi: 10.3389/fimmu.2022.870029.

      These studies are now cited (page 7, line 3-4).

      - Page 7: ' AQP4 is a bidirectional channel facilitating... ' - AQP4 water channel is known as the path of least resistance for water transfer, please see Manley, Nature Medicine, 2000 and Papadopoulos, Faseb J, 2004.

      This sentence is now updated (page 7, line 12-13).

      - ' astrocyte AQP4 by TGN-020 caused a gradual decrease in SRB fluorescence intensity, indicating an intracellular water accumulation' - tissue slice experiment is a very valuable method. However it seems right, the experiment does not comment on the cell swelling that may occur just due to or as a superposition of tissue deterioration and the effect of TGN-020. The AQP4 channel is blocked, and the influx of water into astrocytes should be also blocked. Thus, can swelling be also a part of another mechanism, as it was also observed in the control group? I suggest this should be addressed thoroughly.

      We performed this experiment in acute brain slices to tightly control the pharmacological environment and gain spatio-temporal information. After slicing, the brain slices recovered for > 1 hr prior to recording, so that the slices were in a stable state before TGN-020 application, as evidenced by the stable baseline. The constant decrease in the control trace is due to photobleaching, whose trend did not change in response to vehicle. TGN-020, in contrast, caused a downward change, suggesting intracellular water accumulation and swelling.

      The experiment was performed under basal conditions without active water influx; a decrease in SRB fluorescence indicates intracellular water buildup in astrocytes. This result shows that under basal conditions, astrocyte aquaporin mediates a constant (i.e., tonic) water efflux; blocking it causes intracellular water accumulation and swelling.

      We have accordingly updated the description of this part (page 7, line 15-20).

      - From the Figure 1 legend: Only 4 mice were subjected to the experiment, and only 1 mouse as a control. I suggest expanding the experiment and performing statistics including two-way ANOVA for data in panels B, C, and D, as no results of statistical tests confirm the significance of the findings provided.

      Panel B confirms that cytosolic SRB fluorescence tends to increase upon water efflux and volume shrinking, and vice versa. For panel C, the number of mice is now indicated. Also, the downward change in SRB fluorescence has now been calculated separately for the phases before and after TGN-020 (or vehicle) application, and this panel is updated accordingly. TGN-020 induced a decline in astrocyte SRB fluorescence, which is validated by a t-test performed in MATLAB. To clarify, we now add connecting lines to indicate statistical significance between the corresponding groups (Fig 1C, middle). For panel D, we calculated the SRB fluorescence change (decrease) relative to the photobleaching trend illustrated by the dotted line. The significance was also validated by a t-test performed in MATLAB.

      - Figure 1: Please correct the figure - pictures in panel A are low quality and do not support the specificity of SRB for astrocytes. Panels B-D are easier to understand if plotted as normal X/Y charts with associated statistical findings. Some drawings are cut or not aligned.

      In GFAP-EGFP transgenic mice, astrocytes are labeled by EGFP. SRB labeling (red fluorescence) colocalizes with EGFP-positive astrocytes, although not all EGFP-positive astrocytes are labeled by SRB. The PDF conversion during submission may also have compromised image quality. We have tried to update and align the figure panels.

      - Page 12: ' TGN-020 increased basal water diffusion within multiple regions including the cortex, hippocampus and the striatum in a heterogeneous manner (Figure 5C).'

      This sentence is updated now (page 12, line 12 – page 13, line 2). It reads ‘The representative images reveal the enough image quality to calculate the ADC, which allow us to examine the effect of TGN-020 on water diffusion rate in multiple regions (Fig. 5C).’

      - The expression of AQP4 within the brain parenchyma is known to be heterogenous. Please familiarize yourself with works by Hubbard 2015, Mestre 2018, and Gomolka 2023. A correlation between ADC score and AQP4 expression ROI-wise would be useful, but it is not substantial to conduct this experiment.

      We thank the reviewer. This point is stressed on page 19, line 12-14.

      DISCUSSION:

      - Most of the issues are commented on above, so I suggest following the changes applied earlier. - Page 16: 'We show by DW-MRI that water transport by astrocyte aquaporin is critical for brain water homeostasis.' This statement is not clear and does not refer to the actual impact of the findings. DWI is allowed only to verify the changes of ADC after the application of TGN-020. I suggest commenting on the recent report by Giannetto 2024 here.

      This sentence is now refined (page 19, line 1-2), followed by the updates commenting on the recent studies employing DW-MRI to evaluate brain fluid transport, including the work of (Giannetto et al., 2024) (page 19, line 3-10). 

      METHODS:

      - Page 18: no total number of mice included in all experiments is provided, as well as no clearly stated number of mice used in each experiment. Please correct.

      We have now double checked the number of the mice for the data presented and updated the figure legends accordingly (e.g., updates in legends fig1, fig5, etc).

      -  Page 18, line 7: 'Axscience' is not a producer of Isoflurane, but a company offering help with scientific manuscript writing. If this company's help was used, it should be stated in the acknowledgments section. Reference to ISOVET should be moved from line 15 to line 7.

      We apologize. We did not use external writing help, and have now removed ‘Axcience’. The isoflurane was of the brand ‘ISOVET’ from ‘Piramal’. This information has now been moved up (page 21, line 11).

      - Page 18, line 9: ' modified artificial cerebrospinal fluid (aCSF)'. Additional information on the reason for the modified aCSF would be useful for the reader.

      In this modified solution, the concentration of depolarizing ions (Na+, Ca2+) was reduced to lower the potential excitotoxicity during the tissue dissection (i.e., injury to the brain) for preparing the brain slices. Extra sucrose was added to balance the solution osmolarity. This solution has been used previously for the dissection and slicing steps in adult mice (Jiang et al., 2016). We have now added this justification to the text and cite the relevant reference (page 21, line 14-16).

      - Page 19, line 6: a reasoning for using Tamoxifen would be helpful for the reader.

      The Glast-CreERT2 is an inducible conditional mouse line that expresses Cre recombinase selectively in astrocytes upon tamoxifen injection. We now add this information in the text (page 22, line 10-11). 

      - Line 8 - 'Sigma'

      Fixed.

      - Line 7/8: It is not clear if ethanol is of 10% solution or if proportions of ethanol+tamoxifen to oil were of 1:9. The reasoning for each performed step is missing.

      We have now clarified the procedure (page 22, line 11-15).

      - Line 10: '/' means 'or'?

      Here, we mean the bigenic mice resulting from the crossing of the heterozygous Cre-dependent GCaMP6f and Glast-CreERT2 mouse lines. We have now modified it to ‘Glast-CreERT2::Ai95GCaMP6f//WT’, consistent with the presentation of other mouse lines in our manuscript (page 22, line 16).

      - Lines 22-23: being in-line with legislation was already stated at the beginning of the Methods so I suggest combining for clearance.

      Done. 

      - Page 21, line 4: it is good to mention which printer was used, but it would be worth mentioning the material the chamber was printed from - was it ABS?

      Yes. We add this info in the text now (page 24, line 5).

      - Line 9 -'PI' requires spelling out.

      It is ‘Physik Instrumente’, now added (page 24, line 10).

      - Line 11-12: What is the reason for background subtraction - clearer delineation of astrocytes/ increasing SNR in post-processing, or because SRB signal was also visible and changing in the background over time? Was the background removed in each frame independently (how many frames)? How long was the time-lapse and was the F0 frame considered as the first frame acquired? The background signal should be also measured and plotted alongside the astrocytic signal, as a reference (Figure 1). This should be clarified so that steps are to be followed easily.

      We sought to follow the temporal changes in the SRB fluorescence signal. The acquired fluorescence images contain not only the SRB signals but also background signals, consisting of, for instance, biological tissue autofluorescence, digital camera background noise and stray light from the environment. The value of the background signal was estimated as the mean fluorescence of peripheral cell-free subregions (15 × 15 µm²) and removed from all frames of the time-lapse image stack. The traces shown in the figures reflect the full lengths of the time-lapse recordings. F0 was defined as the mean value of the 10 data points immediately preceding the detected fluorescence changes. The text is now updated (page 24 line 21 - page 25 line 5).
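For illustration only, the background subtraction and F0 normalization described in this response could be sketched as follows. This is a minimal sketch, not the authors' analysis code: the function names, the per-frame background list, and the externally supplied onset index (how the fluorescence change is detected is not specified here) are all assumptions.

```python
from statistics import mean


def subtract_background(trace, background):
    """Remove a per-frame background estimate from a raw ROI trace.

    `background` is assumed to hold, for each frame, the mean fluorescence
    of cell-free subregions (hypothetical input format).
    """
    if len(trace) != len(background):
        raise ValueError("trace and background must have the same length")
    return [f - b for f, b in zip(trace, background)]


def delta_f_over_f0(trace, onset_index, n_baseline=10):
    """Normalize a trace as (F - F0) / F0.

    F0 is the mean of the `n_baseline` points immediately preceding
    `onset_index`, the frame at which the fluorescence change begins
    (assumed to be provided by the caller).
    """
    if onset_index < n_baseline:
        raise ValueError("not enough points before onset to estimate F0")
    f0 = mean(trace[onset_index - n_baseline:onset_index])
    return [(f - f0) / f0 for f in trace]


# Hypothetical example: a flat baseline followed by a fluorescence decrease.
raw_trace = [110.0] * 12 + [105.0, 100.0, 95.0]
background = [10.0] * len(raw_trace)
corrected = subtract_background(raw_trace, background)
dff = delta_f_over_f0(corrected, onset_index=12)
```

With a dye such as SRB, a negative value of `dff` in this sketch would correspond to a fluorescence decrease relative to the pre-change baseline.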

      - Line 15: Was astrocyte image delineation performed manually or automatically? Where was the center of the region considered in the reference to the astrocyte image? It would be good to see the regions delineated for reference.

      Astrocytes labeled by SRB were delineated manually with the soma taken as the center of the region of interest. We now exemplify the delineated region in Fig 1A, bottom.

      - Page 22, line 2: 'x4 objective'.

      Added (now, page 25, line 16). 

      - Line 3: 'barrels' - reference to publication or the explanation missing.

      The relevant reference is now added on barrel cortex (Erzurumlu and Gaspar, 2020) (page 25, line 19-20). 

      - Line 19: were the coordinates referred to = bregma?

      Yes. This info is now added (page 26, line 12). 

      - Line 20: was the habituation performed directly at the acquisition date? It is rather difficult to say that it was a habituation, but rather acute imaging. I suggest correcting, that mice were allowed to familiarize themselves with the setup for 30 minutes prior to the imaging start.

      In this context, although it is a very nice idea and experiment, the influence of acute stress in animals familiar with the setup only from the day of acquisition is difficult to avoid. It is a major concern, especially when considering norepinephrine as a master driver of neuronal and vascular activity through the brain, and strong activation of the hypothalamic-adrenal axis in response to acute stress. It is well known, that the response of monoamines is reduced in animals subjected to chronic v.s acute stress, but still larger than that if the stressor is absent.

      Major remark: The animals should, preferably, be imaged at least after 3 days of habituation based on existing knowledge. I suggest exploring the topic of the importance of habituation. It is difficult though, to objectively review these findings without considering stress and associated changes in vascular dynamics.

      We thank the reviewer for helping us make this information more precise. The text is updated accordingly to describe the experiment (now page 26, line 14).

      - Page 23, line 17: number of animals included in experiments missing.

      The number of animals is added in Methods (page 27, line 12) and indicated in the legend of Figure 5. 

      - Line 18/19: were the respiratory effects observed after injection of saline or TGN-020? Since DWI was performed, the exclusion of perfusive flow on ADC is impossible.

      I suggest an additional experiment in n=3 animals per group, verifying the HR (and if possible BP) response after injection of TGN-020 and saline in mice.

      The respiratory rate has been recorded. We added the averaged respiratory rate before and after injection of TGN-020 or saline (now, Fig. S6; page 13, line 5-6).

      - Line 22: Please, provide the model of the scanner, the model of the cryoprobe, as well as the model of the gradient coil used, otherwise it is difficult to assess or repeat these experiments.

      We have now added information on the MRI system to the Methods section (page 27, line 17-21).

      - Page 24: line 3/4: although the achieved spatial resolution of DWI was good and slightly lower than desired and achievable due to limitations of the method itself as well as cryoprobe, it is acceptable for EPI in mice.

      Still, there is no direct explanation provided on the reasoning for using surface instead of volumetric coil, as well as on assuming an anisotropic environment (6 diffusion directions) for DWI measurements. This is especially doubtful if such a long echo-time was used alongside lower-than-possible spatial resolution. Longer echo time would lower the SNR of the depicted signal but also would favor the depiction of signal from slow-moving protons and larger water pools. On the other hand, only 3 b-values were used, which is the minimum for ADC measurements, while a good research protocol could encompass at least 5 to increase the accuracy of ADC estimation and avoid undersampling between 250 and 1800 b-values. What was the reason for choosing this particular set of b-values and not 50, 600, and 2000? Besides, gradient duration time was optimally chosen, however, I have concerns about the decision for such long gradient separation times.

      If the protocol could have been better optimized, the assessment could have been also performed in respiratory-gated mode, allowing minimization of the effects of one of the glymphatic system driving forces.

      Thus, I suggest commenting on these issues.

      We chose the cryoprobe to increase the signal-to-noise ratio (SNR) in DW-MRI with long echo time and high b-value. A volume coil gives a more homogeneous SNR across the whole brain than the cryoprobe, but its SNR would be lower. We confirmed that, even in the ventral part of the brain, the quality of the DW-MRI images acquired with the cryoprobe was sufficient to measure the ADC (Fig. 5B-C). This is now mentioned in the Methods (page 27, line 17-21).

      We performed DW-MRI scanning for 5 min at each time point, using anisotropic resolution and 3 b-values, to investigate the time course of the ADC change following the injection of TGN-020. Because the effect of TGN-020 appears about a dozen minutes after the injection (Igarashi et al., 2011), fast DW-MRI scanning is required. If isotropic DW-MRI with a shorter echo time and more directions were used, a longer scan time would be required at each time point, possibly more than 1 h. We agree that three b-values is the minimum needed to calculate the ADC and that more b-values would help to increase the accuracy. However, to achieve the temporal resolution needed to better capture the change in water diffusion, we decided to use the minimum number of b-values. A previous study also validates that DW-MRI with three b-values is sufficiently accurate (Ashoor et al., 2019). Furthermore, a previous study that used a long diffusion time (> 20 ms) and a long echo time (40 ms) reported good mean diffusivity measurements (Aggarwal et al., 2020), supporting that our protocol is adequate for investigating the ADC. We have now updated the description (page 28 line 5-9). We chose b = 250 and 1800 s/mm² because 2000 s/mm² seems too high to obtain good image quality. In a previous study, we established that the ADC is measurable with b = 0, 250, and 1800 s/mm² (Debacker et al., 2020).

      - Page 24, line 7: What was the post-processing applied for images acquired over 70 minutes? Did it consider motion-correction, co-registration, or drift-correction crucial to avoid pitfalls and mismatches in concluding data?

      The motion correction and co-registration were explained in Methods (page 28, line 12-14).

      Also, were these trace-weighted images or magnitude images acquired since DTI software was used for processing - while ADC fitting could be reliably done in Matlab, Python, or other software. Thus, was DSI software considering all 3 b-values or just used 0 and 1800 for the calculation of mean diffusivity for tractography (as ADC). The details should be explained.

      DSIstudio was used with all three b values (b = 0, 250, and 1800 s/mm²) to calculate the ADC. We added the description in Methods (page 28, line 16-18).

      To make sure that the results are not affected by the MR hardware, I suggest performing 3 control measurements in a standard water phantom, and presenting the results alongside the main findings.

      Thanks for this suggestion. We have performed new experiments and have now added control measurements with three phantoms, namely water, undecane, and dodecane. These new data are now summarized in Fig. S7, showing the stability of the ADC throughout the 70 min of scanning. We have updated the description in the Methods (page 28, line 9-11) and in the Results (page 13, line 6-8).

      - Line 13: were the ROI defined manually or just depicted from previously co-registered Allen Brain atlas?

      The ROIs of the cortex, the hippocampus, and the striatum were depicted with reference to Allen mouse brain atlas (https://scalablebrainatlas.incf.org/mouse/ABA12). This is explained in Methods (page 28, line 14-16).

      - Line 10: why the average from 1st and 2nd ADC was not considered, since it would reduce the influence of noise on the estimation of baseline ADC?

      We apologize; this was a typo. The baseline was the average of the 1st and 2nd ADC measurements. We have corrected the description (page 28, line 20).
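As a hedged illustration of the ADC estimation discussed in this section, the sketch below fits the standard mono-exponential model S(b) = S0·exp(−b·ADC) to signals at three b-values by a log-linear least-squares fit, then expresses a time series of ADC values relative to a baseline defined as the mean of the first two measurements. It is not a reproduction of the DSI Studio computation, whose internal details are not described here; the function names and the synthetic signal values are assumptions.

```python
import math


def fit_adc(b_values, signals):
    """Estimate ADC as minus the least-squares slope of ln(S) versus b.

    With b in s/mm^2, the returned ADC is in mm^2/s.
    Assumes a mono-exponential decay S(b) = S0 * exp(-b * ADC).
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need matching b-values and signals (at least two)")
    logs = [math.log(s) for s in signals]
    n = len(b_values)
    b_mean = sum(b_values) / n
    log_mean = sum(logs) / n
    sxx = sum((b - b_mean) ** 2 for b in b_values)
    sxy = sum((b - b_mean) * (l - log_mean) for b, l in zip(b_values, logs))
    return -sxy / sxx


def percent_change_from_baseline(adc_series, n_baseline=2):
    """Express each ADC value as a percent change from the baseline mean."""
    baseline = sum(adc_series[:n_baseline]) / n_baseline
    return [100.0 * (a - baseline) / baseline for a in adc_series]


# Synthetic example with a hypothetical true ADC of 7e-4 mm^2/s.
b_values = [0.0, 250.0, 1800.0]
signals = [1000.0 * math.exp(-b * 7e-4) for b in b_values]
adc = fit_adc(b_values, signals)
```

Because the fit is linear in ln(S), three b-values give one more point than the two strictly needed for a slope, which is why three is often described as the practical minimum.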

      STATISTIC:

      Which type of t-test - paired/unpaired/two samples was used and why? Mann-Whitney U-tests are used as a substitution for parametric t-tests when the data are either non-parametric or assuming normal distribution is not possible. In which case was the Bonferroni-Holm correction used? - I couldn't find any mention of any multiple-group analysis followed by multiple comparisons. Each section of the manuscript should have a description of how the quantitative data were treated and with which aim. I suggest carefully correcting all figures accordingly, and following the remarks given for Figure 1.

      We used an unpaired t-test for data obtained from samples under different conditions. Indeed, the Mann-Whitney U-test is used when the data deviate from a normal distribution. Bonferroni-Holm correction was used for multiple comparisons (e.g., Fig. 4D-E).
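To make the multiple-comparison step concrete, the Bonferroni-Holm (Holm step-down) procedure can be sketched as below. This is a generic illustration of the procedure, not the authors' MATLAB code; the function name and the example p-values are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the original order of `p_values`,
    that is True where the null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared
        # with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all larger p-values are retained as well
    return reject


# Hypothetical p-values from three pairwise comparisons.
decisions = holm_bonferroni([0.01, 0.04, 0.03])
```

Compared with a plain Bonferroni correction, which tests every p-value against alpha / m, the step-down thresholds relax after each rejection, so the procedure is never less powerful while still controlling the family-wise error rate.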

      Reviewer #3 (Recommendations For The Authors):

      I think that the following statement is insufficient: "The authors commit to share data, documentation, and code used in analysis". My understanding is eLife expects that all key data to be provided in a supplement.

      We thank the reviewer; we follow the publication guidelines of eLife. 

      References

      Aggarwal, M., Smith, M.D., and Calabresi, P.A. (2020). Diffusion-time dependence of diffusional kurtosis in the mouse brain. Magn Reson Med 84, 1564-1578.

      Ashoor, M., Khorshidi, A., and Sarkhosh, L. (2019). Estimation of microvascular capillary physical parameters using MRI assuming a pseudo liquid drop as model of fluid exchange on the cellular level. Rep Pract Oncol Radiother 24, 3-11.

      Cauli, B., and Hamel, E. (2018). Brain Perfusion and Astrocytes. Trends in neurosciences 41, 409-413.

      Debacker, C., Djemai, B., Ciobanu, L., Tsurugizawa, T., and Le Bihan, D. (2020). Diffusion MRI reveals in vivo and non-invasively changes in astrocyte function induced by an aquaporin-4 inhibitor. PLoS One 15, e0229702.

      Erzurumlu, R.S., and Gaspar, P. (2020). How the Barrel Cortex Became a Working Model for Developmental Plasticity: A Historical Perspective. J Neurosci 40, 6460-6473.

      Farr, G.W., Hall, C.H., Farr, S.M., Wade, R., Detzel, J.M., Adams, A.G., Buch, J.M., Beahm, D.L., Flask, C.A., Xu, K., et al. (2019). Functionalized Phenylbenzamides Inhibit Aquaporin-4 Reducing Cerebral Edema and Improving Outcome in Two Models of CNS Injury. Neuroscience 404, 484-498.

      Giannetto, M.J., Gomolka, R.S., Gahn-Martinez, D., Newbold, E.J., Bork, P.A.R., Chang, E., Gresser, M., Thompson, T., Mori, Y., and Nedergaard, M. (2024). Glymphatic fluid transport is suppressed by the aquaporin-4 inhibitor AER-271. Glia.

      Gomolka, R.S., Hablitz, L.M., Mestre, H., Giannetto, M., Du, T., Hauglund, N.L., Xie, L., Peng, W., Martinez, P.M., Nedergaard, M., et al. (2023). Loss of aquaporin-4 results in glymphatic system dysfunction via brain-wide interstitial fluid stagnation. eLife 12.

      Huber, V.J., Tsujita, M., and Nakada, T. (2009). Identification of aquaporin 4 inhibitors using in vitro and in silico methods. Bioorg Med Chem 17, 411-417.

      Igarashi, H., Huber, V.J., Tsujita, M., and Nakada, T. (2011). Pretreatment with a novel aquaporin 4 inhibitor, TGN-020, significantly reduces ischemic cerebral edema. Neurol Sci 32, 113-116.

      Igarashi, H., Tsujita, M., Suzuki, Y., Kwee, I.L., and Nakada, T. (2013). Inhibition of aquaporin-4 significantly increases regional cerebral blood flow. Neuroreport 24, 324-328.

      Jiang, R., Diaz-Castro, B., Looger, L.L., and Khakh, B.S. (2016). Dysfunctional Calcium and Glutamate Signaling in Striatal Astrocytes from Huntington's Disease Model Mice. J Neurosci 36, 3453-3470.

      Mayo, F., Gonzalez-Vinceiro, L., Hiraldo-Gonzalez, L., Calle-Castillejo, C., Morales-Alvarez, S., Ramirez-Lorca, R., and Echevarria, M. (2023). Aquaporin-4 Expression Switches from White to Gray Matter Regions during Postnatal Development of the Central Nervous System. Int J Mol Sci 24.

      Mola, M.G., Sparaneo, A., Gargano, C.D., Spray, D.C., Svelto, M., Frigeri, A., Scemes, E., and Nicchia, G.P. (2016). The speed of swelling kinetics modulates cell volume regulation and calcium signaling in astrocytes: A different point of view on the role of aquaporins. Glia 64, 139-154.

      Risher, W.C., Andrew, R.D., and Kirov, S.A. (2009). Real-time passive volume responses of astrocytes to acute osmotic and ischemic stress in cortical slices and in vivo revealed by two-photon microscopy. Glia 57, 207-221.

      Salman, M.M., Kitchen, P., Yool, A.J., and Bill, R.M. (2022). Recent breakthroughs and future directions in drugging aquaporins. Trends Pharmacol Sci 43, 30-42.

      Shigetomi, E., Bushong, E.A., Haustein, M.D., Tong, X., Jackson-Weaver, O., Kracun, S., Xu, J., Sofroniew, M.V., Ellisman, M.H., and Khakh, B.S. (2013). Imaging calcium microdomains within entire astrocyte territories and endfeet with GCaMPs expressed using adeno-associated viruses. J Gen Physiol 141, 633-647.

      Solenov, E., Watanabe, H., Manley, G.T., and Verkman, A.S. (2004). Sevenfold-reduced osmotic water permeability in primary astrocyte cultures from AQP-4-deficient mice, measured by a fluorescence quenching method. Am J Physiol Cell Physiol 286, C426-432.

      Verkman, A.S., Binder, D.K., Bloch, O., Auguste, K., and Papadopoulos, M.C. (2006). Three distinct roles of aquaporin-4 in brain function revealed by knockout mice. Biochim Biophys Acta 1758, 1085-1093.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:  

      Reviewer #1 (Public Review):  

      Summary:  

      The authors have presented data showing that there is a greater amount of spontaneous differentiation in human pluripotent cells cultured in suspension vs static and have used PKCβ and Wnt signaling pathway inhibitors to decrease the amount of differentiation in suspension culture.  

      Strengths:  

      This is a very comprehensive study that uses a number of different reactor designs and scales in addition to a number of unbiased outcomes to determine how suspension impacts the behaviour of the cells and in turn how the addition of inhibitors counteracts this effect. Furthermore, the authors were also able to derive new hiPSC lines in suspension with this adapted protocol.

      Weaknesses:  

      The main weakness of this study is the lack of optimization with each bioreactor change. It has been shown multiple times in the literature that the expansion and behaviour of pluripotent cells can be dramatically impacted by impeller shape, RPM, reactor design, and multiple other factors. It remains unclear to me how much of the results the authors observed (e.g. increased spontaneous differentiation) was due to not having an optimized bioreactor protocol in place (per bioreactor vessel type). For instance - was the starting seeding density, RPM, impeller shape, feeding schedule, and/or any other aspect optimized for any of the reactors used in the study, and if not, how were the values used in the study determined?  

      Thank you for your thoughtful comments. In response, we performed several experiments to optimize the bioreactor conditions for the revised manuscript. We tested several cell seeding densities and stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). We found that seeding densities of 1-2 × 10^5 cells/mL and stirring speeds of 50-150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions regardless of stirring speed. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and the Eppendorf bioreactor for the 320 mL scale, both of which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613) and Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253), respectively). We cited these previous studies in the Results and Materials and Methods sections. We believe that these additional data and explanations address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Public Review):  

      This study by Matsuo-Takasaki et al. reported the development of a novel suspension culture system for hiPSC maintenance using Wnt/PKC inhibitors. The authors showed elegantly that inhibition of the Wnt and PKC signaling pathways would repress spontaneous differentiation into neuroectoderm and mesendoderm in hiPSCs, thereby maintaining cell pluripotency in suspension culture. This is a solid study with substantial data to demonstrate the quality of the hiPSC maintained in the suspension culture system, including long-term maintenance in >10 passages, robust effect in multiple hiPSC lines, and a panel of conventional hiPSC QC assays. Notably, large-scale expansion of a clinical grade hiPSC using a bioreactor was also demonstrated, which highlighted the translational value of the findings here. In addition, the author demonstrated a wide range of applications for the IWR1+LY suspension culture system, including support for freezing/thawing and PBMC-iPSC generation in suspension culture format. The novel suspension culture system reported here is exciting, with significant implications in simplifying the current culture method of iPSC and upscaling iPSC manufacturing.  

      Another potential advantage that perhaps wasn't well discussed in the manuscript is the reported suspension culture system does not require additional ECM to provide biophysical support for iPSC, which differentiates from previous studies using hydrogel and this should further simplify the hiPSC culture protocol.  

      Interestingly, although several hiPSC suspension media are currently available commercially, the content of these suspension media remained proprietary, as such the signaling that represses differentiation/maintains pluripotency in hiPSC suspension culture remained unclear. This study provided clear evidence that inhibition of the Wnt/PKC pathways is critical to repress spontaneous differentiation in hiPSC suspension culture.  

      I have several concerns that the authors should address, in particular, it is important to benchmark the reported suspension system with the current conventional culture system (eg adherent feeder-free culture), which will be important to evaluate the usefulness of the reported suspension system.  

      Thank you for this insightful suggestion. For the revised manuscript, we performed additional experiments with a conventional medium, mTeSR1 (Stem Cell Technologies, Vancouver, Canada), comparing suspension culture with the adherent feeder-free culture system in four different hiPSC lines simultaneously. Compared to the adherent conditions, the suspension conditions without chemical treatment decreased the expression of self-renewal marker genes/proteins and increased the expression levels of SOX17, T, and PAX6 (Figure 4 - figure supplement 2). Importantly, treatment with LY333531 and IWR-1-endo in mTeSR1 medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture, reaching levels comparable to those in adherent culture. These results indicate that these chemical treatments in suspension culture are beneficial even when a conventional culture medium is used.

      Also, the manuscript lacks a clear description of a consistent robust effect in hiPSC maintenance across multiple cell lines.  

      Thank you for this insightful suggestion. We have performed additional experiments on hiPSC maintenance across 5 hiPSC lines in suspension culture using StemFit AK02N medium simultaneously (Figure 3C - E). Overall, the treatment of LY333531 and IWR-1-endo in the StemFit AK02N medium reversed the decreased expression of these undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. Also as above, we have added results using conventional media, mTeSR1, in comparison to the adherent feeder-free culture system in four different hiPSC lines simultaneously. These results show that this chemical treatment consistently produced robust effects in hiPSC maintenance across multiple cell lines using multiple conventional media.

      There are also several minor comments that should be addressed to improve readability, including some modifications to the wording to better reflect the results and conclusions.  

      In the revised manuscript, we have added and corrected the descriptions to improve readability, including some modifications to the wording to better reflect the results and conclusions. 

      Reviewer #3 (Public Review):  

      In the current manuscript, Matsuo-Takasaki et al. have demonstrated that the addition of PKCβ and WNT signaling pathway inhibitors to the suspension cultures of iPSCs suppresses spontaneous differentiation. These conditions are suitable for large-scale expansion of iPSCs. The authors have shown that they can perform single-cell cloning, direct cryopreservation, and iPSC derivation from PBMCs in these conditions. Moreover, the authors have performed a thorough characterization of iPSCs cultured in these conditions, including an assessment of undifferentiated stem cell markers and genetic stability. The authors have elegantly shown that iPSCs cultured in these conditions can be differentiated into derivatives of three germ layers. By differentiating iPSCs into dopaminergic neural progenitors, cardiomyocytes, and hepatocytes they have shown that differentiation is comparable to adherent cultures.

      This new method of expanding iPSCs will benefit the clinical applications of iPSCs.  

      Recently, multiple protocols have been optimized for culturing human pluripotent stem cells in suspension conditions and their expansion. Additionally, a variety of commercially available media for suspension cultures are also accessible. However, the authors have not adequately justified why their conditions are superior to previously published protocols (indicated in Table 1) and commercially available media. They have not conducted direct comparisons.  

      Thank you for this careful suggestion. In the revised manuscript, we have added results using a conventional medium, mTeSR1 (Stem Cell Technologies), which has been used for suspension culture in several studies. Compared to the adherent conditions using mTeSR1 medium, the suspension conditions with the same medium decreased the ratio of TRA1-60/SSEA4-positive cells and OCT4-positive cells, decreased the expression levels of OCT4 and NANOG, and increased the expression levels of SOX17, T, and PAX6 in 4 different hiPSC lines simultaneously (Figure 4 - figure supplement 2). Importantly, treatment with LY333531 and IWR-1-endo in the mTeSR1 medium reversed the decreased expression of these undifferentiated markers. With these direct comparisons, we were able to justify why our conditions are superior to previously published protocols using commercially available media.

      Additionally, the authors have not adequately addressed the observed variability among iPSC lines. While they claim in the Materials and Methods section to have tested multiple pluripotent stem cell lines, they do not clarify in the Results section which line they used for specific experiments and the rationale behind their choices. There is a lack of comparison among the different cell lines. It would also be beneficial to include testing with human embryonic stem cell lines.  

      Thank you for this insightful suggestion. In the revised manuscript, we have added results on 5 different hiPSC lines tested at the same time (Figure 3C-E). We regret that it is difficult to use human embryonic stem cell lines for this study because of ethical restrictions under Japanese governmental regulations. In general, treatment with LY333531 and IWR-1-endo increased the expression of self-renewal marker genes/proteins and decreased the expression levels of SOX17, T, and PAX6 in these hiPSC lines. These results indicate that the effect of these chemical treatments in suspension culture is generally robust, despite the observed variability among iPSC lines.

      Additionally, there is a lack of information regarding the specific role of the two small molecules in these conditions.  

      In the revised manuscript, we have added data and discussion on the specific role of the two small molecules in these conditions in the Results and Discussion sections. Regarding the WNT signaling inhibitor, we hypothesized that inhibiting Wnt signaling would suppress the spontaneous differentiation of hiPSCs into mesendoderm, because exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of Wnt signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021). Regarding the PKC inhibitors, the manuscript now states: "To identify molecules with inhibitory activity on neuroectodermal differentiation, hiPSCs were treated with candidate molecules in suspension conditions. We selected these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017))."

      We also found that the expression of naïve pluripotency markers, KLF2, KLF4, KLF5, and DPPA3, were up-regulated in the suspension conditions treated with LY333531 and IWR-1-endo while the expression of OCT4 and NANOG was at the same levels (Figure 5—figure supplement 2). Combined with RT-qPCR analysis data on 5 different hiPSC lines (Figure 3E), these results suggest that IWRLY conditions may drive hiPSCs in suspension conditions to shift toward naïve pluripotent states.

      The authors have not attempted to elucidate the underlying mechanism other than RNA expression analysis.  

      Regarding the underlying mechanisms, we have added results and discussion in the revised manuscript. For Wnt activation in human pluripotent stem cells, several studies reported that some WNT agonists are expressed in undifferentiated human pluripotent stem cells (Dziedzicka et al., 2021; Jiang et al, 2013; Konze et al, 2014). In suspension culture, cell aggregation causes tight cell-cell interaction. The paracrine effect of WNT agonists within the cell aggregates may strongly affect neighboring cells and induce spontaneous differentiation into mesendodermal cells. Thus, we think that inhibiting WNT signaling is effective in suppressing spontaneous differentiation into mesendodermal lineages in suspension culture.

      For PKCβ activation in human pluripotent stem cells, we have shown by western blotting that the level of phosphorylated PKCβ protein is higher in suspension culture than in adherent culture (Figure 3 - figure supplement 1). Treatment with a PKCβ inhibitor is effective in suppressing spontaneous differentiation into neuroectodermal lineages. In future work, it will be interesting to examine (1) how and why PKCβ is activated (or phosphorylated), especially in suspension culture conditions, and (2) how and why PKCβ inhibition suppresses neuroectodermal differentiation. Conversely, it will also be interesting to examine how and why PKCβ activation is related to neuroectodermal differentiation.

      For these reasons some aspects of the manuscript need to be extended:  

      (1) It is crucial for authors to specify the culture media used for suspension cultures. In the Materials and Methods section, the authors mentioned that cells in suspension were cultured in either StemFit AK02N medium, StemFit AK03N (Cat# AK03N, Ajinomoto, Co., Ltd., Tokyo, Japan), or StemScale PSC suspension medium (A4965001, Thermo Fisher Scientific, MA, USA). The authors should clarify in the text which medium was used for suspension cultures and whether they observed any differences among these media.

      We apologize for the confusion. In this study, we mainly used StemFit AK02N medium (Figures 1-5, 7-9). For the bioreactor experiments (Figure 6), we used StemFit AK03N medium, which is GMP grade and free of human- and animal-derived components. To confirm the effect of the IWRLY chemical treatment, we used StemScale suspension medium (Figure 4 - figure supplement 1) and mTeSR1 medium (Figure 4 - figure supplement 2 and Figure 8 - figure supplement 1). In the revised manuscript, we have clarified which medium was used for suspension cultures in the Results and Materials and Methods sections.

      Although we have not directly compared these media in suspension culture (which is largely outside the focus of this study), we have observed some differences among them in maintaining self-renewal characteristics, in preventing spontaneous differentiation (including tendencies to differentiate into specific lineages), and in stability or variation between experiments in suspension culture conditions. Despite this heterogeneity caused by different media, the IWRLY chemical treatment stably maintained hiPSC self-renewal in general. We have added this point to the Discussion section.

      (2) In the Materials and Methods section, the authors mentioned that they used multiple cell lines for this study. However, it is not clear in the text which cell lines were used for various experiments. Since there is considerable variation among iPSC lines, I suggest that the authors simultaneously compare 2 to 3 pluripotent stem cell lines for expansion, differentiation, etc.  

      Thank you for this careful suggestion. We have added more results on the simultaneous comparison using StemFit AK02N medium in 5 different hiPSC lines (Figure 3 C-E) and using mTeSR1 medium in 4 different hiPSC lines (Figure 4 - figure supplement 2). From both results, we have shown that the treatment of LY333531 and IWR-1-endo was beneficial in maintaining the self-renewal of hiPSCs while suppressing spontaneous differentiation.

      (3) Single-cell sorting can be confusing. Can iPSCs grown in suspensions be single-cell sorted?

      Additionally, what was the cloning efficiency? The cloning efficiency should be compared with adherent cultures.  

      We apologize for the confusion. With our method, iPSCs grown in IWRLY suspension conditions can be single-cell sorted. We have improved the clarity of the schematics (Figure 7A). We have also added data on the cloning efficiency, compared with adherent cultures (Figure 7B). The cloning efficiency of adherent cultures was around 30%. While the cloning efficiency of suspension cultures without any chemical treatment was less than 10%, IWR-1-endo treatment in the suspension cultures increased the efficiency to more than 20%. However, treatment with LY333531 decreased the efficiency. These results indicate that IWR-1-endo treatment is beneficial for single-cell cloning in suspension culture.

      (4) The authors have not addressed the naïve pluripotent state in their suspension cultures, even though PKC inhibition has been shown to drive cells toward this state. I suggest the authors measure the expression of a few naïve pluripotent state markers and compare them with adherent cultures  

      Thank you for this insightful comment. In the revised manuscript, we have added the data of RT-qPCR in 5 different hiPSC lines and specific gene expression from RNA-seq on naïve pluripotent state markers (Figure 3E and Figure 5 - figure supplement 2), respectively. Interestingly, the expression of KLF2, KLF4, KLF5, and DPPA3 is significantly up-regulated in IWRLY conditions. These results suggested that IWRLY suspension conditions drove hiPSCs toward naïve pluripotent state.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):  

      Overall, I feel that this study is very interesting and comprehensive, but has significant weaknesses in the bioprocessing aspects. More optimization data is required for the suspension culture to truly show that the differentiation they are observing is not an artifact of a non-optimized protocol.  

      Thank you for your thoughtful comments. In response, we performed several experiments to optimize the bioreactor conditions for the revised manuscript. We tested several cell seeding densities and stirring speeds with or without WNT/PKCβ inhibitors (Figure 6 - figure supplement 1). From these optimization experiments, we found that seeding densities of 1-2 × 10^5 cells/mL and stirring speeds of 50-150 rpm supported the proliferation of these cells. Also, PKCβ and Wnt inhibitors suppressed spontaneous differentiation in bioreactor conditions across the acceptable stirring speeds. As for the impeller shape and reactor design, we used the commonly used ABLE bioreactor for the 30 mL scale and the Eppendorf bioreactor for the 320 mL scale, both of which had been designed and used for human pluripotent stem cell culture in previous studies (Matsumoto et al., 2022 (doi: 10.3390/bioengineering9110613) and Kropp et al., 2016 (doi: 10.5966/sctm.2015-0253), respectively). We cited these previous studies in the Results section. We believe that these additional data and explanations address your concerns about the optimization of the bioreactor experiments.

      Reviewer #2 (Recommendations For The Authors):  

      The following comments should be addressed by the authors to improve the manuscript:  

      (1) Abstract: '...a scalable culture system that can precisely control the cell status for hiPSCs is not developed yet.' There were previous reports for a scalable iPSC culture system so I would suggest toning down/rephrasing this point: eg that improvement in a scalable iPSC culture system is needed.  

      Thank you for this careful suggestion. Following it, we have changed the sentence to "the improvement in a scalable culture system that can precisely control the cell status for hiPSCs is needed."

      (2) Line 71: please specify what media was used as a 'conventional medium' for suspension culture, was it Stemscale?  

      As suggested, we specified the media as StemFit AK02N used for this experiment. 

      (3) Fig 1E: It's not easy to see gating in the FACS plots as the threshold line is very faint, please fix this issue.  

      As suggested, we used thicker lines for the gating in the FACS plots (Figure 1E).

      (4) Fig 1G-J, Fig 2D-H: The RNAseq figures appeared pixelated and the resolution of these figures should be improved. The x-axis label for Fig 1H is missing.  

      We have improved the resolution and clarity of these figures. Also, we have added the x-axis label "enrichment distribution" for the gene set enrichment analysis (GSEA) in Figure 1H, Figure 5F, and Figure 5 - figure supplement 1B.

      (5) Line 103-107: 'Since Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages, and is endogenously involved in the regulation of mesendoderm differentiation of pluripotent stem cells.....'. The two points seem the same and should be clarified.  

      We apologize for this unclear description. We have changed it to "Exogenous Wnt signaling induces the differentiation of human pluripotent stem cells into mesendoderm lineages (Nakanishi et al, 2009; Sumi et al, 2008; Tran et al, 2009; Vijayaragavan et al, 2009; Woll et al, 2008). Also, endogenous expression and activation of WNT signaling in pluripotent stem cells are involved in the regulation of mesendoderm differentiation potentials (Dziedzicka et al, 2021; Jiang et al, 2013)." We hope that this wording makes the difference between the two points clear.

      (6) Line 113: 'In samples treated with inhibitors' should be 'In samples treated with Wnt inhibitors'.  

      Thank you for this careful suggestion. We have corrected this. 

      (7) Line 115: '....there was no reduction in PAX6 expression.' That's not entirely correct, there was a reduction in PAX6 in IWR-1 endo treatment compared to control suspension culture (is this significant?), but not consistently for IWP-2 treatment. Please rephrase to more accurately describe the results.  

      We apologize for this inaccurate description. As recommended, we have corrected this phrase to "there was only a small reduction in PAX6 expression in the IWR-1-endo-treated condition and no reduction in the IWP2-treated condition".

      (8) It's critical to show that the effect of the suspension culture system developed here can maintain an undifferentiated state for multiple hiPSC lines. I think the author did test this in multiple cell lines, but the results are scattered and not easy to extract. I would recommend adding info for the hiPSC line used for the results in the legend, eg WTC11 line was used for Figure 3, 201B7 line was used for Figure 2. I would suggest compiling a figure that confirms the developed suspension system (IWR-1 +LY) can support the maintenance of multiple hiPSC lines.  

      Thank you for this insightful suggestion. We have added data on hiPSC maintenance across 5 hiPSC lines cultured simultaneously in suspension using StemFit AK02N medium (Figure 3C-E) and across 4 hiPSC lines cultured simultaneously in suspension using mTeSR1 medium (Figure 4 - figure supplement 2). In both media, treatment with LY333531 and IWR-1-endo reversed the decreased expression of the undifferentiated markers and suppressed the increased expression of differentiation markers in suspension culture conditions. These results show that this chemical treatment produced a consistent, robust effect on hiPSC maintenance across multiple cell lines.

      (9) Line 166: Please use the correct gene nomenclature format for a human gene (italicised uppercase) throughout the manuscript. Also, list the full gene name rather than PAX2,3,5.  

      We apologize for the incorrect gene names. We have corrected them.

      (10) Please improve the resolution for Figure 4D.  

      We have provided clearer images of Figure 4D.

      (11) In the first part of the study, the control condition was referred to as 'suspension culture' with spontaneous differentiation, but in the later parts sometimes the term 'suspension culture' was used to describe the IWR1+LY condition (ie lines 271-272). I would suggest the authors carefully go through the manuscript to avoid misinterpretation on this issue.  

      Thank you for this careful suggestion. To avoid this misinterpretation, throughout the manuscript we now use 'suspension culture' only for culture in the conventional medium and 'LYIWR suspension culture' for culture in the medium supplemented with LY333531 and IWR-1-endo.

      (12) Figure 5: It is impressive to demonstrate that the IWR1+LY suspension culture enables large-scale expansion of a clinical-grade hiPSC line using a bioreactor, yielding 300 vials/passage. Can the author add some information regarding cell yield using a conventional adherent culture system in this cell line? This will provide a comparison of the performance of the IWR1+LY suspension culture system to the conventional method.  

      Thank you for this valuable suggestion. We have provided information regarding cell yield using a conventional adherent culture system in this cell line in the Results as "Since the population doubling time (PDT) of this hiPSC line in adherent culture conditions is 21.8 - 32.9 hours at its production (https://www.cira-foundation.or.jp/e/assets/file/provision-of-ips-cells/QHJI14s04_en.pdf), this proliferation rate in this large scale suspension culture is comparable to adherent culture conditions."
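
      For reference, a population doubling time can be estimated from two cell counts under the assumption of exponential growth, PDT = t × ln 2 / ln(N_end / N_start). The sketch below illustrates that standard formula; the function name and the example counts are hypothetical and are not taken from the manuscript.

```python
import math


def population_doubling_time(n_start, n_end, hours):
    """Estimate the population doubling time (in hours).

    Assumes exponential growth between the two counts:
    PDT = hours * ln(2) / ln(n_end / n_start).
    """
    if n_start <= 0 or hours <= 0:
        raise ValueError("n_start and hours must be positive")
    if n_end <= n_start:
        raise ValueError("n_end must exceed n_start for a finite doubling time")
    return hours * math.log(2) / math.log(n_end / n_start)


if __name__ == "__main__":
    # Hypothetical example: an 8-fold expansion over 72 hours.
    print(population_doubling_time(1e5, 8e5, 72))
```

      A value computed this way for a suspension culture can be compared directly with the 21.8-32.9 hour range quoted above for adherent culture.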

      (13) Line 273: For testing the feasibility of using IWR1+LY media to support the freeze and thaw process, the author described the cell number and TRA160+/OCT4+ cell %. How is this compared to conventional media (eg E8)? It would be nice to see a head-to-head comparison with conventional media, quantification of cell count or survival would be helpful to determine this.  

      To address this, we attempted a direct freeze and thaw process using conventional media, StemFit AK02N in the 201B7 line (Figure 8) or mTeSR1 in 4 different hiPSC lines (Figure 8 - figure supplement 1), with or without IWR1+LY. However, since the hiPSCs cultured in suspension conditions without IWR1+LY quickly lost their self-renewal ability, these frozen cells could be neither recovered nor counted in these conditions. Our results indicate that adding IWR1+LY during the thawing process supports successful recovery in suspension conditions.

      (14) More details of the passaging method should be added in the method section. Do you do cell count following accutase dissociation and replate a defined density (eg 1x10^5/ml)?  

      Yes. We counted the cells in every passage in suspension culture conditions. We have added more explanation in the Materials and Methods as below.

      "The dissociated cells were counted with an automatic cell counter (Model R1, Olympus) with Trypan Blue staining to detect live/dead cells. The cell-containing medium was spun down at 200 rpm for 3 minutes, and the supernatant was aspirated. The cell pellet was re-suspended with a new culture medium at an appropriate cell concentration and used for the next suspension culture."

      (15) The IWR1+LY suspension culture system requires passage every 3-5 days. Is there still spontaneous differentiation if the hiPSC aggregate grows too big?  

      Thank you for this insightful question.

      Yes. As previous studies have shown, the size of hiPSC aggregates is critical for maintaining self-renewal in our method. Stirring speed is key to producing hiPSC aggregates of the proper size in suspension culture, and the culture period between passages is key to keeping the aggregates from exceeding that size. Thus, we generally keep the stirring speed at 90 rpm (135 rpm for bioreactor conditions) and passage every 3 - 5 days in suspension culture conditions.

      (16) Several previous studies have described the development of hiPSC suspension culture systems using hydrogel encapsulation to provide biophysical modulation (reviewed in PMID: 32117992). In comparison, it seems that the IWR1+LY suspension system described here does not require ECM addition, which further simplifies the culture system for iPSCs. It would be good to add more discussion on this topic in the manuscript, such as the potential role of E-cadherin in mediating this effect (as RNAseq results indicated that CDH1 was upregulated in the IWR1+LY condition).  

      Thank you for this valuable suggestion. We have added more discussion on this topic in the Discussion section as below.

      "Thus, our findings show that suspension culture conditions with Wnt and PKCβ inhibitors (IWRLY suspension conditions) can precisely control cell conditions and are comparable to conventional adhesion cultures regarding cellular function and proliferation. Many previous 3D culture methods intended for mass expansion used hydrogel-based encapsulation or microcarrier-based methods to provide scaffolds and biophysical modulation (Chan et al, 2020). These methods are useful in that they enable mass culture while maintaining scaffold dependence. However, the need for special materials and equipment and the labor and cost involved are concerns toward industrial mass culture. On the other hand, our IWRLY suspension conditions do not require special materials such as hydrogels, microcarriers, or dialysis bags, and have the advantage that common bioreactors can be used. "

      "On the other hand, it is interesting to see whether and how the properties of hiPSCs cultured in IWRLY suspension culture conditions are altered from the adherent conditions. Our transcriptome results in comparison to adherent conditions show that gene expression associated with cell-to-cell attachment, including E-cadherin (CDH1), is more activated. This may be due to the status that these hiPSCs are more dependent on cell-to-cell adhesion where there is no exogenous cell-to-substrate attachment in the three-dimensional culture. Previous studies have shown that cell-to-cell adhesion by E-cadherin positively regulates the survival, proliferation, and self-renewal of human pluripotent stem cells (Aban et al, 2021; Li et al, 2012; Ohgushi et al, 2010). Furthermore, studies have shown that human pluripotent stem cells can be cultured using an artificial substrate consisting of recombinant E-cadherin protein alone without any ECM proteins (Nagaoka et al, 2010). Also, cell-to-cell adhesion through gap junctions regulates the survival and proliferation of human pluripotent stem cells (Wong et al, 2006; Wong et al, 2004). These findings raise the possibility that the cell-to-cell adhesion, such as E-cadherin and gap junctions, are compensatory activated and support hiPSC self-renewal in situations where there are no exogenous ECM components and its downstream integrin and focal adhesion signals are not forcedly activated in suspension culture conditions. It will be interesting to elucidate these molecular mechanisms related to E-cadherin in the hiPSC survival and self-renewal in IWRLY suspension conditions in the future."

      Reviewer #3 (Recommendations For The Authors):  

      (1) I am a bit confused about the passage of adherent cultures. The authors claim that they used EDTA for passaging and plated cells at a density of 2500 cells/cm2. My understanding is that EDTA is typically used for clump passaging rather than single-cell passaging.  

      Sorry about this confusion. We routinely use an automatic cell counter (model R1, Olympus) which can even count small clumpy cells accurately. Thus, we show the cell numbers in the passaging of adherent hiPSCs.  

      (2) Figure 2D: The authors have not directly compared IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17; a simultaneous comparison would provide interesting data.  

      As recommended, we have added data that directly compare IWR-1-endo with IWR-1-endo+Go6983 for the expression of T and SOX17 in Figure 2D. The addition of IWR-1-endo alone decreased the expression of T and SOX17, but not PAX6, which was similar to the data in Figure 2C.

      (3) Oxygen levels play a crucial role in pluripotency maintenance. Could the authors please specify the oxygen levels used for culturing cells in suspension?  

      Sorry for not mentioning oxygen levels in this study. We generally use normal oxygen levels (i.e., 21% O2) in suspension culture conditions. We have explained this in the Materials and Methods section.

      (4) Figure supplement 1 (G and H): In the images, it is difficult to determine whether the green (PAX6 and SOX17) overlaps with tdT tomato. For better visualization, I suggest that the authors provide separate images for the green and red colors, as well as an overlay.  

      Sorry for these unclear images. We have provided separate images for the green and red colors, as well as an overlay in Figure 1- figure supplement 1 G and H.

      (5) The authors have only compared quantitatively the expression of TRA-1-60 for most of the figures. I suggest that the authors quantitatively measure the expression of other markers of undifferentiated stem cells, such as NANOG, OCT4, SSEA4, TRA-1-81, etc.  

      We have added the quantitative data of the expression of markers of undifferentiated hiPSCs including NANOG, OCT4, SSEA4, and TRA-1-60 on 5 different hiPSC lines in Figure 3 C-E.

      (6) In Figure 2D, the authors have tested various small molecules but the rationale behind testing those molecules is missing in the text.  

      These molecules were chosen because they putatively affect neuroectodermal induction from the pluripotent state.

      We have added the rationale with appropriate references in the Results section as below.

      "We have chosen these candidate molecules based on previous studies related to signaling pathways or epigenetic regulations in neuroectodermal development (reviewed in (Giacoman-Lozano et al, 2022; Imaizumi & Okano, 2021; Sasai et al, 2021; Stern, 2024) ) or in pluripotency safeguards (reviewed in (Hackett & Surani, 2014; Li & Belmonte, 2017; Takahashi & Yamanaka, 2016; Yagi et al, 2017)) (Figure 2A; listed in Supplementary Table 1). "

      (7) In the beginning authors used Go6983 but later they switched to LY333531, the reasoning behind the switch is not explained well.  

      To explain the reasons for switching to LY333531 from Go6983 clearly, we reorganized the order of results and figures. In short, we found that the suppression of PAX6 expression in hiPSCs cultured in suspension conditions was observed with many PKC inhibitors, all of which possessed PKCβ inhibition activity (Figure 2—figure supplement 2B-D). Also, elevated expression of PKCβ in suspension-cultured hiPSCs could affect the spontaneous differentiation (Figure 3—figure supplement 1A-C). To further explore the possibility that the inhibition of PKCβ is critical for the maintenance of self-renewal of hiPSCs in the suspension culture, we evaluated the effect of LY333531, a PKCβ-specific inhibitor. The maintenance of suspension-cultured hiPSCs is specifically facilitated by the combination of PKCβ and Wnt signaling inhibition (Figure 3A and B; Figure 2—figure supplement 1). Last, we performed long-term culture for 10 passages in suspension conditions and compared hiPSC growth in the presence of LY333531 or Go6983. LY333531 was superior in proliferation rate and in maintaining OCT4 protein expression in the long-term culture (Figure 4). Thus, we used IWR-1-endo and LY333531 for the rest of this study.

      (8) I suggest the authors measure cell death after the treatment with LY+IWR-1-endo.  

      Thank you for this valuable suggestion. We have measured cell death after the treatment with LY+IWR1-endo and found that the chemical combination had little or no effect on cell death. We have added data in Figure 3—figure supplement 2 and the description in the Results section as below. "We also examined whether the combination of PKCβ and Wnt signaling inhibition affects the cell survival in suspension conditions. In this experiment, we used another PKC inhibitor, Staurosporine (Omura et al, 1977), which has a strong cytotoxic effect as a positive control of cell death in suspension conditions. The addition of IWR-1-endo and LY333531 for 10 days had no effects on the apoptosis while the addition of Staurosporine for 2 hours induced Annexin-V-positive apoptotic cells (Figure 3—figure supplement 2). These results indicate that the combination of PKCβ and Wnt signaling inhibition has no or little effects on the cell survival in suspension conditions."

      (9) The authors have performed reprogramming using episomal vectors and using Sendai viruses. In both the protocols authors have added small molecules at different time points, for episomal vector protocol at day 3 and Sendai virus protocol at day 23. Why is this different?  

      Thank you for this insightful question. We intended these differences to reflect differences in expression from the reprogramming vectors. The expression of reprogramming factors from these vectors should suppress spontaneous differentiation in reprogramming cells, and expression from Sendai viral vectors should last longer than that from episomal plasmid vectors. Thus, we added the chemical inhibitors from the early phase of reprogramming in the episomal plasmid vector conditions and from the late phase of reprogramming in the Sendai viral vector conditions. In the future, the timing of adding these molecules may need further optimization.

      (10) The protocol for three germ layer differentiation using a specific differentiation medium requires further elaboration. For instance, the authors mentioned that suspension cultures were transferred to differentiation media but did not emphasize the cell number and culture conditions before moving the cultures to the differentiation media.  

      Sorry for this unclear description. We have added the explanation on the cell number and culture conditions before moving the cultures to the differentiation media in the Materials and Methods section as below.

      "As in the maintenance conditions, 4 × 10^5 hiPSCs were seeded in one well of a low-attachment 6-well plate with 4 mL of StemFit AK02N medium supplemented with 10 µM Y-27632. This plate was placed onto the plate shaker in the CO2 incubator. Next day, the medium was changed to the germ layer specific differentiation medium."

    1. Author response:

      Joint Public Reviews:

      Here, the authors compare how different operationalizations of adverse childhood experience exposure related to patterns of skin conductance response during a fear conditioning task. They use a large dataset to definitively understand a phenomenon that, to date, has been addressed using a range of different definitions and methods, typically with insufficient statistical power. Specifically, the authors compared the following operationalizations: dichotomization of the sample into "exposed" and "non-exposed" categories, cumulative adversity exposure, specificity of adversity exposure, and dimensional (threat versus deprivation) adversity exposure. The paper is thoughtfully framed and provides clear descriptions and rationale for procedures, as well as package version information and code. The authors' overall aim of translating theoretical models of adversity into statistical models, and comparing the explanatory power of each model, respectively, is an important and helpful addition to the literature. However, the analysis would be strengthened by employing more sophisticated modelling techniques that account for between-subjects covariates and the presentation of the data needs to be streamlined to make it clearer for the broad audience for which it is intended.

      Strengths

      Several outstanding strengths of this paper are the large sample size and its primary aim of statistically comparing leading theoretical models of adversity exposure in the context of skin conductance response. This paper also helpfully reports Cohen's d effect sizes, which aid in interpreting the magnitude of the findings. The methods and results are generally thorough.

      Weaknesses

      Weakness 1: The largest concern is that the paper primarily relies on ANOVAs and pairwise testing for its analyses and does not include between-subjects covariates. Employing mixed-effects models instead of ANOVAs would allow more sophisticated control over sources of random variance in the sample (especially important for samples from multi-site studies such as the present study), and further allow the inclusion of potentially relevant between-subjects covariates such as age (e.g. Eisenstein et al., 1990) and gender identity or sex assigned at birth (e.g. Kopacz II & Smith, 1971) (perhaps especially relevant due to possible gender- or sex-related differences in ACE exposure; e.g. Kendler et al., 2001). Also, proxies for socioeconomic status (e.g. income, education) can be linked with ACE exposure (e.g. Maholmes & King, 2012) and warrant consideration as covariates, especially if they differ across adversity-exposed and unexposed groups. 

      We appreciate the reviewer's suggestion and recognize the value of using more sophisticated statistical methods. However, we think that the choice of method should not be guided by perceived complexity alone, and that the chosen ANOVA-based approach provides reliable and valid results. In our revision, we address the reviewer's suggestion by demonstrating that employing mixed models leaves the reported results unchanged (a). We would also like to refer the reviewer to the robustness analyses provided in the initial supplementary material (b).

      a) Re-running analyses using mixed models

      Based on the reviewers' suggestion, we repeated our main analyses (association between exposure to childhood adversity and SCRs, arousal, valence, and contingency ratings during fear acquisition and generalization) using linear mixed models, including age, sex, educational attainment, and childhood adversity as fixed effects, and site as a random effect. These analyses produced results similar to those in our manuscript, demonstrating a significant effect of childhood adversity on SCRs, as assessed by CS discrimination during both acquisition training and the generalization phase, and on general reactivity, but not on linear deviation scores (LDS). For the different rating types, we did not observe any significant effects of childhood adversity.

      We would prefer to retain our main analyses as they are and report the linear mixed model results as additional results in the supplement. However, if the reviewer and editor have strong preferences otherwise, we are open to presenting the mixed models in the main manuscript and moving our previous analyses to the supplement.

      We added the following paragraph to the main manuscript (page 25-26):

      “At the request of a reviewer, we repeated our main analyses by using linear mixed models including age, sex, school degree (i.e., to approximate socioeconomic status), and exposure to childhood adversity as fixed effects as well as site as random effect. These analyses yielded comparable results demonstrating a significant effect of childhood adversity on CS discrimination during acquisition training and the generalization phase as well as on general reactivity, but not on the generalization gradients in SCRs (see Supplementary Table 2 A). Consistent with the results of the main analyses reported in our manuscript, we did not observe any significant effects of childhood adversity on the different types of ratings when using mixed models (see Supplementary Table 2 B-D). Some of the mixed model analyses showed significantly lower CS discrimination during acquisition training and generalization, and lower general reactivity in males compared to females (see Supplementary Table 2 for details).”

      b) Additional robustness tests for the main analyses (already provided in the initial submission as supplementary material)

      We would also like to refer the reviewer to the robustness analyses in the initial supplement to account for possible site effects. Adding site to the analyses affected the p-value in only one instance: entering site as covariate in analyses of CS discrimination during acquisition training attenuated the p-value of the ACQ exposure effect from p = 0.020 to p = 0.089.

      Further robustness checks involved repeating our main analyses while excluding (a) physiological non-responders (participants with only SCRs = 0) and (b) extreme outliers (data points ± 3 SDs from the mean) to ensure generalizable results. These repetitions of the analyses did not lead to any changes in the results.
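
      The two exclusion rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the function names and the example values are hypothetical, and the sketch assumes the ± 3 SD rule is applied to a single list of data points using the sample standard deviation.

```python
from statistics import mean, stdev


def is_non_responder(scrs):
    """Return True if every SCR recorded for a participant equals zero."""
    return all(value == 0 for value in scrs)


def drop_extreme_outliers(values, n_sd=3.0):
    """Keep only data points within n_sd sample SDs of the mean."""
    centre = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - centre) <= n_sd * spread]


# Hypothetical data: twenty typical responses and one extreme value.
responses = [0.1] * 20 + [5.0]
kept = drop_extreme_outliers(responses)
print(len(responses), "->", len(kept))
```

      Re-running the main models on the filtered data and comparing the conclusions with the unfiltered analysis is what makes this a robustness check rather than a change to the primary analysis.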

      We did not include age in our primary analyses due to the homogeneity of our sample and the lack of related hypotheses. Additionally, socio-economic status was assessed only crudely via the highest education level attained, rendering it of limited use.

      Weakness 2: On a related methodological note, the authors mention that scores representing threat and deprivation were not problematically collinear due to VIFs being <10; however, some sources indicate that VIFs should be <5 (e.g. Akinwande et al., 2015).

      We thank the reviewer for bringing different cut-offs to our attention. We have revised this section to highlight the arbitrary nature of their interpretation (page 33):

      “Within the dimensional model framework, the issue of multicollinearity among predictors (i.e., different childhood adversity types) is frequently discussed (McLaughlin et al., 2021; Smith & Pollak, 2021). If we apply the rule of thumb of a variance inflation factor (VIF) > 10, which is often used in the literature to indicate concerning multicollinearity (e.g., Hair, Anderson, Tatham, & Black, 1995; Mason, Gunst, & Hess, 1989; Neter, Wasserman, & Kutner, 1989), we can assume that multicollinearity was not a concern in our study (abuse: VIF = 8.64; neglect: VIF = 7.93). However, some authors state that VIFs should not exceed a value of 5 (e.g., Akinwande, Dikko, and Samson (2015)), while others suggest that these rules of thumb are rather arbitrary (O’Brien, 2007).”
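
      To make the two cut-offs easier to interpret, recall the standard definition VIF_j = 1 / (1 - R_j²), where R_j² is the proportion of variance in predictor j explained by the other predictors in the model. The sketch below simply inverts this definition for the VIF values reported above; the function names are hypothetical, and nothing here reproduces the authors' model.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor implied by a predictor's R-squared."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


def r_squared_from_vif(vif):
    """Share of a predictor's variance explained by the other predictors."""
    if vif < 1:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# VIF values reported in the response; the R-squared values are derived.
for name, vif in (("abuse", 8.64), ("neglect", 7.93)):
    print(name, round(r_squared_from_vif(vif), 3))
```

      By the same relation, a cut-off of VIF = 5 corresponds to R² = 0.80 and a cut-off of VIF = 10 to R² = 0.90, so the reported values fall between the two rules of thumb.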

      Weakness 3: Additionally, the paper reports that higher trait anxiety and depression symptoms were observed in individuals exposed to ACEs, but it would be helpful to report whether patterns of SCR were in turn associated with these symptom measures and whether the different operationalizations of ACE exposure displayed differential associations with symptoms.

      We thank the reviewer for highlighting these relevant points. We have included additional analyses in the supplementary material in response to this comment. Figures and the corresponding text are also copied below for your convenience.

      We added the following paragraphs to the main manuscript: Methods (page 21):

      “Analyses of trait anxiety and depression symptoms

      To further characterize our sample, we compared individuals being unexposed compared to exposed to childhood adversity on trait anxiety and depression scores by using Welch tests due to unequal variances.

      On the request of a reviewer, we additionally investigated the association of childhood adversity as operationalized by the different models used in our explanatory analyses (i.e., cumulative risk, specificity, and dimensional model) and trait anxiety as well as depression scores (see Supplementary Figure 7). By using STAI-T and ADS-K scores as independent variable, we calculated a) a comparison of conditioned responding of the four severity groups (i.e., no, low, moderate, severe exposure to childhood adversity) using one-way ANOVAs and the association with the number of sub-scales exceeding an at least moderate cut-off in simple linear regression models for the implementation of the cumulative risk model, and b) the association with the CTQ abuse and neglect composite scores in separate linear regression models for the implementation of the specificity/dimensional models. On request of the reviewer, we also calculated the Pearson correlation between trait anxiety (i.e., STAI-T scores), depression scores (i.e., ADS-K scores) and conditioned responding in SCRs (see Supplementary Table 8).”
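
      The Welch test mentioned in the quoted Methods text compares two group means without assuming equal variances. A minimal sketch of the test statistic and the Welch-Satterthwaite degrees of freedom is given below; the function name and the questionnaire scores are hypothetical and serve only to illustrate the calculation.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    sq_se_a = variance(sample_a) / n_a
    sq_se_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(sq_se_a + sq_se_b)
    df = (sq_se_a + sq_se_b) ** 2 / (
        sq_se_a ** 2 / (n_a - 1) + sq_se_b ** 2 / (n_b - 1)
    )
    return t_stat, df


# Hypothetical scores for an exposed and an unexposed group.
exposed = [45, 50, 55, 60, 40]
unexposed = [35, 36, 34, 37, 33]
t_stat, df = welch_t(exposed, unexposed)
print(f"t = {t_stat:.2f}, df = {df:.2f}")
```

      Because the degrees of freedom shrink when the group variances differ, the test is more conservative than Student's t-test in exactly the situation described, namely unequal variances between exposed and unexposed participants.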

      Results (page 38):

      “Analyses of trait anxiety and depression symptoms

      As expected, participants exposed to childhood adversity reported significantly higher trait anxiety and depression levels than unexposed participants (all p’s < 0.001; see Table 1 and Supplementary Figure 6). This pattern remained unchanged when childhood adversity was operationalized differently - following the cumulative risk approach, the specificity, and dimensional model (see methods). These additional analyses all indicated a significant positive relationship between exposure to childhood adversity and trait anxiety as well as depression scores irrespective of the specific operationalization of “exposure” (see Supplementary Figure 7).

      CS discrimination during acquisition training and the generalization phase, generalization gradients, and general reactivity in SCRs were unrelated to trait anxiety and depression scores in this sample with the exception of a significant association between depression scores and CS discrimination during fear acquisition training (see Supplementary Table 8). More precisely, a very small but significant negative correlation was observed indicating that high levels of depression were associated with reduced levels of CS discrimination (r = -0.057, p = 0.033). The correlation between trait anxiety levels and CS discrimination during fear acquisition training was not statistically significant but on a descriptive level, high anxiety scores were also linked to lower CS discrimination scores (r = -0.05, p = 0.06) although we highlight that this should not be overinterpreted in light of the large sample. However, both correlations (i.e., CS-discrimination during fear acquisition training and trait anxiety as well as depression, respectively) did not statistically differ from each other (z = 0.303, p = 0.762, Dunn & Clark, 1969). Interestingly, and consistent with our results showing that the relationship between childhood adversity and CS discrimination was mainly driven by significantly lower CS+ responses in exposed individuals, trait anxiety and depression scores were significantly associated with SCRs to the CS+, but not to the CS- during acquisition training (see Supplementary Table 8).”

      Weakness 4: Given the paper's framing of SCR as a potential mechanistic link between adversity and mental health problems, reporting these associations would be a helpful addition. These results could also have implications for the resilience interpretation in the discussion (lines 481-485), which is a particularly important and interesting interpretation.

      We have added a paragraph on this to the discussion (page 41):

      “Interestingly, in our study, trait anxiety and depression scores were mostly unrelated to SCRs, defined by CS discrimination and generalization gradients based on SCRs as well as general SCR reactivity, with the exception of a significant - albeit minute - relationship between CS discrimination during acquisition training and depression scores (see above). Although reported associations in the literature are heterogeneous (Lonsdorf et al., 2017), we may speculate that they may be mediated by childhood adversity. We conducted additional mediation analyses (data not shown) which, however, did not support this hypothesis. As the potential links between reduced CS discrimination in individuals exposed to childhood adversity and the developmental trajectories of psychopathological symptoms are still not fully understood, future work should investigate these further in - ideally - prospective studies.”

      Weakness 5: Given that the manuscript criticizes the different operationalizations of childhood adversity, there should be greater justification of the rationale for choosing the model for the main analyses. Why not the 'cumulative risk' or 'specificity' model? Related to this, there should also be a stronger justification for selecting the 'moderate' approach for the main analysis. Why choose to cut off at moderate? Why not severe, or low? Related to this, why did they choose to cut off at all? Surely one could address this with the continuous variable, as they criticize cut-offs in Table 2.

      We thank the reviewers and editors for bringing to our attention that our reasoning for choosing the main model was not clear. As outlined in the manuscript, we chose the approach for the main analyses from the literature as a recent review on this topic (Ruge et al., 2023) has shown the moderate CTQ cut-off to be the most abundantly employed in the field of research on associations between childhood adversity and threat learning. We have made this rationale more explicit in our revised manuscript (page 15/21):

      “Operationalization of "exposure"

      We implemented different approaches to operationalize exposure to childhood adversity in the main analyses and exploratory analyses (see Table 2). In the main analyses, we followed the approach most commonly employed in the field of research on childhood adversity and threat learning - using the moderate exposure cut-off of the CTQ (for a recent review see Ruge et al. (2024)). In addition, the heterogeneous operationalizations of classifying individuals into exposed and unexposed to childhood adversity in the literature (Koppold, Kastrinogiannis, Kuhn, & Lonsdorf, 2023; Ruge et al., 2024) hampers comparison across studies and hence cumulative knowledge generation. Therefore, we also provide exploratory analyses (see below) in which we employ different operationalizations of childhood adversity exposure.”

      “Exploratory analyses

      Additionally, the different ways of classifying individuals as exposed or unexposed to childhood adversity in the literature (Koppold et al., 2023; for discussion see Ruge et al., 2024) hinder comparison across studies and hence cumulative knowledge generation. Therefore, we also conducted exploratory analyses using different approaches to operationalize exposure to childhood adversity (see Table 2 for details).”

      Furthermore, as correctly noted, we fully agree that employing the moderate cut-off (or any cut-off in fact) is in principle an arbitrary decision - despite being guided by and derived from the literature in the field. However, we would like to draw the reviewers’ attention to Figure 5 in the initial submission (please see also below): Although the differences in SCR between severity groups were not significant, the overall pattern suggests at a descriptive level that the decline in CS discrimination, LDS and general reactivity in SCR occurs mainly when childhood adversity exceeds a moderate level. Thus, while we used the moderate cut-off as it was recently shown to be the most widely used approach in the literature (see Ruge et al., 2023), our exploratory analyses also seem to suggest on a descriptive level, that this cut-off may indeed “make sense”. We also refer to this in the results section (page 31-32) and discussion (page 43-44):

      Results:

      “However, on a descriptive level (see Figure 5), it seems that indeed exposure to at least a moderate cut-off level may induce behavioral and physiological changes (see main analysis, Bernstein & Fink, 1998). This might suggest that the cut-off for exposure commonly applied in the literature (see Ruge et al., 2024) may indeed represent a reasonable approach.”

      Discussion:

      “It is noteworthy, however, that this cut-off appears to map rather well onto psychophysiological response patterns observed here (see Figure 5). More precisely, our exploratory results of applying different exposure cut-offs (low, moderate, severe, no exposure) seem to indicate that indeed a moderate exposure level is “required” for the manifestation of physiological differences, suggesting that childhood adversity exposure may not have a linear or cumulative effect.”

      Weakness 6: In the Introduction, the authors predict less discrimination between signals of danger (CS+) and safety (CS-) in trauma-exposed individuals driven by reduced responses to the CS+. Given the potential impact of their findings for a larger audience, it is important to give greater theoretical context as to why CS discrimination is relevant here, and especially what a reduction in response specifically to danger cues would mean (e.g. in comparison to anxiety, where safety learning is impacted).

      We thank the reviewer for highlighting that this was not sufficiently clear. We revised the paragraph in the introduction as follows (page 7-8):

      “Fear acquisition as well as extinction are considered as experimental models of the development and exposure-based treatment of anxiety- and stress-related disorders. Fear generalization is in principle adaptive in ensuring survival (“better safe than sorry”), but broad overgeneralization can become burdensome for patients. Accordingly, maintaining the ability to distinguish between signals of danger (i.e., CS+) and safety (i.e., CS-) under aversive circumstances is crucial, as it is assumed to be beneficial for healthy functioning (Hölzel et al., 2016) and predicts resilience to life stress (Craske et al., 2012), while reduced discrimination between the CS+ and CS- has been linked to pathological anxiety (Duits et al., 2015; Lissek et al., 2005): Meta-analyses suggest that patients suffering from anxiety- and stress-related disorders show enhanced responding to the safe CS- during fear acquisition (Duits et al., 2015). During extinction, patients exhibit stronger defensive responses to the CS+ and a trend toward increased discrimination between the CS+ and CS- compared to controls, which may indicate delayed and/or reduced extinction (Duits et al., 2015). Furthermore, meta-analytic evidence also suggests stronger generalization to cues similar to the CS+ in patients and more linear generalization gradients (Cooper, van Dis, et al., 2022; Dymond, Dunsmoor, Vervliet, Roche, & Hermans, 2015; Fraunfelter, Gerdes, & Alpers, 2022). Hence, aberrant fear acquisition, extinction, and generalization processes may provide clear and potentially modifiable targets for intervention and prevention programs for stress-related psychopathology (McLaughlin & Sheridan, 2016).”

      Recommendations for the authors:

      Abstract:

      Comment 1:

      (a) It does not succinctly describe the background rationale well (i.e. it tries to say too much). It should be streamlined. There is a lot of 'jargon', which muddies the results, and too many concepts are introduced at each part and assume knowledge from the reader. 

      We thank the reviewer for providing constructive guidance for revisions. We have revised our abstract according to these suggestions.

      (b) Multiple terms for childhood trauma are used: ACEs, early adversity, childhood trauma, and childhood maltreatment. Choose one term and stick to it to enhance clarity. Why not just use childhood adversity, as in the title? Related to this, the use of ACEs sets up an expectation that ACE questionnaire was used, so readers are then surprised to find they used the childhood trauma questionnaire.

      We thank the reviewer for bringing this to our attention. As suggested by the reviewer, we use the term “childhood adversity” in our revised manuscript.

      Introduction:

      Comment 2:

      The phrasing seems to 'exaggerate' the trauma problem and is too broad in the first paragraph - e.g., "two-thirds of people experience one or more traumatic events..." It is important to clarify that not all of these people will go on to develop behavioral, somatic, and psychopathological conditions. Could break this down more into how many people have low, moderate, or severe for clarity, as 1 childhood adversity is different to 5+, and the type.

      We thank the reviewer for bringing this to our attention and have revised the first paragraph accordingly (page 6). Please note, however, that the literature typically uses one specific cut-off (e.g., moderate), and the number of individuals who would meet different cut-offs (e.g., low and high) is not specifically reported.

      “Exposure to childhood adversity is rather common, with nearly two thirds of individuals experiencing one or more traumatic events prior to their 18th birthday (McLaughlin et al., 2013). While not all trauma-exposed individuals develop psychopathological conditions, there is some evidence of a dose-response relationship (Danese et al., 2009; Smith & Pollak, 2021; Young et al., 2019). As this potential relationship is not yet fully clear, understanding the mechanisms by which childhood adversity becomes biologically embedded and contributes to the pathogenesis of stress-related somatic and mental disorders is central to the development of targeted intervention and prevention programmes.”

      Comment 3:

      The published cut-offs for exposed/unexposed should be indicated here.

      We have included the published cut-offs as suggested (page 10):

      We operationalize childhood adversity exposure through different approaches: Our main analyses employ the approach adopted by most publications in the field (see Ruge et al., 2024 for a review) - dichotomization of the sample into exposed vs. unexposed based on published cut-offs for the Childhood Trauma Questionnaire [CTQ; Bernstein et al. (2003); Wingenfeld et al. (2010)]. Individuals were classified as exposed to childhood adversity if at least one CTQ subscale met the published cut-off (Bernstein & Fink, 1998; Häuser, Schmutzer, & Glaesmer, 2011) for at least moderate exposure (i.e., emotional abuse ≥ 13, physical abuse ≥ 10, sexual abuse ≥ 8, emotional neglect ≥ 15, physical neglect ≥ 10).
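
      The dichotomization rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis code: the function name, the dictionary keys, and the example scores are hypothetical, and the sketch assumes the listed cut-offs are inclusive lower bounds ("at least moderate").

```python
# Cut-offs for at least moderate exposure, as listed in the text above.
# Treating them as inclusive lower bounds is an assumption of this sketch.
CTQ_MODERATE_CUTOFFS = {
    "emotional_abuse": 13,
    "physical_abuse": 10,
    "sexual_abuse": 8,
    "emotional_neglect": 15,
    "physical_neglect": 10,
}


def is_exposed(subscale_scores):
    """Return True if at least one CTQ subscale meets its cut-off."""
    return any(
        subscale_scores[name] >= cutoff
        for name, cutoff in CTQ_MODERATE_CUTOFFS.items()
    )


# Hypothetical participants (scores are illustrative, not study data).
unexposed = {"emotional_abuse": 7, "physical_abuse": 5, "sexual_abuse": 5,
             "emotional_neglect": 9, "physical_neglect": 6}
exposed = dict(unexposed, emotional_neglect=15)

print(is_exposed(unexposed))  # False
print(is_exposed(exposed))    # True
```

      A single subscale at or above its cut-off is enough to classify a participant as exposed, which is why the second hypothetical participant counts as exposed even though four of the five subscales are low.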

      Comment 4:

      Please check for overly complex sentences, and reduce the complexity. For example: "In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to ACEs into statistical tests while acknowledging that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions."

      We have revised this section and carefully proofread our manuscript by paying attention to this (page 10):

      “In addition, we provide exploratory analyses that attempt to translate dominant (verbal) theoretical accounts (McLaughlin et al., 2021; Pollak & Smith, 2021) on the impact of exposure to childhood adversity into statistical tests. At the same time, we acknowledge that such a translation is not unambiguous and these exploratory analyses should be considered as showcasing a set of plausible solutions”

      Here is another example of reducing the complexity of our sentences (page 6):

      “Learning is a core mechanism through which environmental inputs shape emotional and cognitive processes and ultimately behavior. Thus, learning mechanisms are key candidates potentially underlying the biological embedding of exposure to childhood adversity and their impact on development and risk for psychopathology (McLaughlin & Sheridan, 2016).”

      Methods:

      Comment 5:

      Is this study part of a larger project? These outcomes were probably not the primary outcomes of this multicenter project. The readers need to understand how this (cross-sectional?) analysis was nested in this larger trial.

      We thank the reviewers and editor for bringing to our attention that this was not sufficiently clear. The main manuscript already stated that the participants were recruited within a large multicentric study, and it now points to additional information in the supplement (page 11):

      “In total, 1678 healthy participants (age_M_ = 25.26 years, age_SD_ = 5.58 years, female = 60.10%, male = 39.30%) were recruited in a multi-centric study at the Universities of Münster, Würzburg, and Hamburg, Germany (SFB TRR58). Data from parts of the Würzburg sample have been reported previously (Herzog et al., 2021; Imholze et al., 2023; Schiele, Reinhard, et al., 2016; Schiele, Ziegler, et al., 2016; Stegmann et al., 2019). These previous reports, also those focusing on experimental fear conditioning (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019), addressed, however, research questions different from the ones investigated here (see also Supplementary Material for details).”

      Moreover, we have included additional information on the larger trial in our revised supplement (page 2):

      “Participants of this study were recruited in a multi-centric collaborative research center “Fear, anxiety, anxiety disorders” joining forces between the Universities of Hamburg, Würzburg, and Münster, Germany (SFB TRR58). During the second funding period (2013-2016), all three sites recruited a large sample (N ~500) in the context of the Z project. All participants underwent the cross-sectional experimental paradigm reported here and were additionally extensively characterized to allow specific subprojects to recruit target subpopulations serving different aims with a focus on molecular genetic, epigenetic, or other research questions (see Herzog et al. (2021); Imholze et al. (2023); Schiele, Reinhard, et al. (2016); Schiele, Ziegler, et al. (2016); Stegmann et al. (2019)). The question on the association of exposure to childhood adversity and recent adversity was part of the primary research question of one subproject led by the senior author of this work (B07, TBL) and was hence a research question of primary interest also for this multicentric project.”

      Comment 6:

      Table 1 does not include percentages (a reader must calculate them: for example, 15% exposed?). These numbers belong in the results (i.e., it is confusing to read about the exposed/non-exposed before we know how it has been calculated).

      We have added the percentages as suggested and have included information on how exposed and unexposed was calculated as a table caption. We have considered moving the table to the results section but find it more suitable here. 

      Comment 7:

      A procedure figure could be useful.

      We thank the reviewer for this advice and have included a procedure figure in the supplementary material.

      Comment 8:

      Physiological data recordings and processing paragraph: The reasoning as to why the authors chose log transformation over square root transformation, or an approach that does not require transformation is not clear.

      We thank the reviewer for notifying us that we did not make this point clear enough. We opted for a log-transformation and range-correction of the SCR data because we use these transformations consistently in our laboratory (e.g., Ehlers et al., 2020; Kuhn et al., 2016; Scharfenort & Lonsdorf, 2016; Sjouwerman et al., 2015; Sjouwerman et al., 2020). In addition, log-transformed and range-corrected data are assumed to be closer to a normal distribution, to have a lower error variance resulting in larger effect sizes (Lykken & Venables, 1971; Lykken, 1972; Sjouwerman et al., 2022), and appear to have - at least descriptively - higher reliability compared to raw data (Klingelhöfer-Jens et al., 2022). We added a sentence on this to the methods section (page 14):

      Note that previous work using this sample (Schiele, Reinhard, et al., 2016; Stegmann et al., 2019) had used square-root transformations but we decided to employ a log-transformation and range-correction (i.e., dividing each SCR by the maximum SCR per participant). We used log-transformation and range-correction for SCR data because these transformations are standard practice in our laboratory and we strive for methodological consistency across different projects (e.g., Ehlers, Nold, Kuhn, Klingelhöfer-Jens, & Lonsdorf, 2020; Kuhn, Mertens, & Lonsdorf, 2016; Scharfenort, Menz, & Lonsdorf, 2016; Sjouwerman & Lonsdorf, 2020; Sjouwerman, Niehaus, & Lonsdorf, 2015). Additionally, log-transformed and range-corrected data are generally assumed to approximate a normal distribution more closely and exhibit lower error variance, which leads to larger effect sizes (Lykken, 1972; Lykken & Venables, 1971; Sjouwerman, Illius, Kuhn, & Lonsdorf, 2022). Additionally, on a descriptive level, this combination of transformations appears to offer greater reliability compared to using raw data alone (Klingelhöfer-Jens, Ehlers, Kuhn, Keyaniyan, & Lonsdorf, 2022).
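
      As a rough illustration of the two transformations named above, the following Python sketch log-transforms one participant's SCR amplitudes and then divides each value by that participant's maximum. It is not the authors' processing pipeline: the function name and example values are hypothetical, the use of log(1 + x) (so that zero responses remain defined) is an assumption, and so is taking the maximum over the already log-transformed values.

```python
import math


def transform_scr(raw_scrs):
    """Log-transform, then range-correct one participant's SCR amplitudes.

    Assumptions of this sketch: log(1 + x) is used so that zero responses
    stay defined, and the maximum is taken over the log-transformed values.
    """
    logged = [math.log1p(x) for x in raw_scrs]
    max_logged = max(logged)
    if max_logged == 0:
        # Participant without any measurable response: nothing to scale.
        return logged
    return [value / max_logged for value in logged]


# Hypothetical raw amplitudes in microsiemens (illustrative only).
print(transform_scr([0.0, 0.2, 0.5, 1.0]))
```

      After range-correction every participant's values lie between 0 and 1, with the largest response mapped to 1, which removes between-participant differences in overall response magnitude.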

      Ehlers, M. R., Nold, J., Kuhn, M., Klingelhöfer-Jens, M., & Lonsdorf, T. B. (2020). Revisiting potential associations between brain morphology, fear acquisition and extinction through new data and a literature review. Scientific Reports, 10(1), 19894. https://doi.org/10.1038/s41598-020-76683-1

      Kuhn, M., Mertens, G., & Lonsdorf, T. B. (2016). State anxiety modulates the return of fear. International Journal of Psychophysiology: Official Journal of the International Organization of Psychophysiology, 110, 194–199. https://doi.org/10.1016/j.ijpsycho.2016.08.001

      Scharfenort, R., & Lonsdorf, T. B. (2016). Neural correlates of and processes underlying generalized and differential return of fear. Social Cognitive and Affective Neuroscience, 11(4), 612–620. https://doi.org/10.1093/scan/nsv142

      Sjouwerman, R., Niehaus, J., & Lonsdorf, T. B. (2015). Contextual Change After Fear Acquisition Affects Conditioned Responding and the Time Course of Extinction Learning—Implications for Renewal Research. Frontiers in Behavioral Neuroscience, 9. https://doi.org/10.3389/fnbeh.2015.00337

      Sjouwerman, R., Scharfenort, R., & Lonsdorf, T. B. (2020). Individual differences in fear acquisition: Multivariate analyses of different emotional negativity scales, physiological responding, subjective measures, and neural activation. Scientific Reports, 10(1), 15283. https://doi.org/10.1038/s41598-020-72007-5

      Comment 9:

      There are 24 lines of text of R packages. I do not think this is necessary for the manuscript document and could be moved to the Supplement.

      We thank the reviewer for this comment and understand that it may take a considerable amount of space to list all the references of the R packages. However, we think it is important to prominently credit the respective authors of the R packages. Yet, if this is an important concern of the reviewer and editor, we will reconsider this point.

      Comment 10:

      It is not clear why the authors chose to analyze summary scores across trials rather than including a time factor for the acquisition phase.

      We would like to thank the reviewer for highlighting that the factor time may be interesting as well. However, we think that in our case the time factor is less interesting, as the acquisition effect itself is rather strong. Nevertheless, we have included a figure in the supplement that shows the time course of the SCR by displaying trial-by-trial data across the acquisition and generalization phase for transparency. This figure (Supplementary figure 4) shows that the trajectories appear to barely differ between individuals who were unexposed vs. exposed to moderate childhood adversity. Hence, we think that the analysis approach we have chosen is unlikely to overshadow central time-dependent effects. However, if the reviewer and editor have strong feelings about this point, we will consider integrating additional analyses including the time factor in the supplement.

      Results:

      Comment 11:

      The caption of Figure 3 does not match the figure. Please check this.

      We thank the reviewers and editor for attentive reading and have revised this part.

      References:

      Comment 12:

      The Ruge et al paper that is cited many times throughout does not have a valid DOI in the References section. Additionally, the author list on the preprint server is substantially different from that listed in the manuscript. Please correct this reference.

      We thank the reviewers and editor for attentive reading and have corrected this reference. The provided DOI was functioning at our end and we hope that this now also applies to the reviewers.

    1. Author response:

      Reviewer #1:

      Response to Public Review

      We thank the reviewer for taking the time to carefully read our paper and to provide helpful comments and suggestions, most of which we have incorporated in our revised manuscript. One of this reviewer’s (and reviewer #2’s) main concerns was that the confocal images provided in some cases did not appear to reflect the quantitative data in the bar graphs. These images were provided only for illustrative purposes, to give the reader a sense of what the primary data look like. The reviewer may not have appreciated that the quantitative data reflect counts of RNA smFISH signals (dots) in hundreds of cells collected through z-stacks comprising multiple optical sections in multiple flies for each condition. For example, in the P1a control condition (Figure 2A), we analyzed 135 neurons from 8 individuals, and the number of z-planes ranged from 3 to 8 per hemisphere. It is generally not possible to find a single confocal section that encompasses quantitatively the statistics that are presented in the graphs. Presenting the data as an MIP (Maximum Intensity Projection, i.e., collapsed z-stack) in a single panel would generate an image that is too cluttered to see any detail. We have now included, for the reader’s benefit, additional example confocal sections in both a z-stack and from the opposite hemisphere, in Supplemental Figure S4D. We have also inserted clarifying statements in the text on p. 7 (lines 154-156).

      Another suggestion from Reviewer #1 is that "it would be more informative to separate in the quantification between the GAL4-expressing neurons and the non-expressing ones" based on the presented pictures where more non-P1a neurons (that the reviewer speculates may be pC1-type neurons) are activated by a male-male encounter than by a male-female encounter, while the P1a-positive neurons seem to be more responsive during courtship behavior. In this paper, we were not looking at pC1 neurons and did not try to answer which neuronal population(s) outside of the P1a population is/are responsible for aggression and/or courtship. Rather, we focused on P1a neurons and addressed whether P1a neurons that induce both aggression and courtship behavior when they are artificially activated (Hoopfer et al. 2015) are also naturally activated during spontaneous performance of these two social behaviors. That earlier result did not exclude the possibility that P1a neurons were inactive during naturalistic courtship or aggression. Our data in the current manuscript provide further experimental evidence in support of the idea that P1a neurons as a population play a role in both of these behaviors. Moreover, we provided data identifying P1a neurons activated only during aggression or during courtship (or both). However, this does not exclude the possibility that pC1 or other neighboring populations are activated during aggression as well (see also the response to 'Recommendations For The Authors' and text lines 151-154).

      In Figure 3, we used opto-HI-FISH to identify candidate downstream targets (direct or indirect) of P1a neurons. We used 50 Hz Chrimson stimulation to activate P1a neurons to induce expression of Hr38 and identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells. In Figure 3 – supplement we performed calcium imaging of KCs and PAM neurons in response to P1a optogenetic stimulation to confirm independently our results from the Hr38 labeling experiments. That control was the purpose of that supplemental experiment.

      Based on those imaging data, the reviewer asked the further question of which [natural] behavioral context induces Hr38 expression in these populations (i.e., mating or aggression). This question is reasonable because our calcium imaging data (Figure 3-supplement) showed that both Kenyon cells and PAM neurons are active only during photo-stimulation of P1a neurons. Our previous behavioral studies (Inagaki et al., 2014; Hoopfer et al., 2015) showed that 50 Hz photo-stimulation of P1a neurons in freely moving flies induced unilateral wing extension during stimulation, while aggression was observed only after the offset of the stimulation (Hoopfer et al., 2015). Based on the comparison of those behavioral data to the imaging results in this paper, the reviewer suggested that Kenyon cells and PAM neurons are activated during courtship rather than during aggression. This is certainly a possible interpretation. However, it is difficult to extrapolate from behavioral experiments in freely moving animals to calcium imaging results in head-fixed flies, particularly with respect to neural dynamics. Furthermore, Hr38 expression, like that of other IEGs (e.g., c-fos), may reflect persistently activated 2nd messenger pathways (e.g., cAMP, IP3) in Kenyon cells and PAM neurons that are not detected by calcium imaging, but that nevertheless play a role in mediating its behavioral effects. We still do not understand the mechanisms of how optogenetic stimulation of P1a neurons in freely behaving flies induces aggression vs. courtship behavior. Although 50 Hz stimulation of P1a neurons does not induce aggressive behavior during photo-stimulation, it is possible that this manipulation activates both aggression and courtship circuits, but that the courtship circuit might inhibit aggressive behavior at a site downstream of the MB (e.g., in the VNC). Once stimulation is terminated and courtship stops, the fly would show aggressive behavior, due to release of that downstream inhibition (see Models in Anderson (2016) Fig 2d, e). In that case, there would be no apparent inconsistency between the imaging data and behavioral data. We agree that the reviewer's question is interesting and important but we feel that answering this question with decisive experiments is beyond the scope of this manuscript.

      Finally, Reviewer #1 suggested a method to evaluate the Hr38 signals in the catFISH experiment of Figure 4. We appreciate their suggestions, but the way that we evaluated the Hr38 signals was basically the same as the way the reviewer suggested. We apologize for the confusion caused by the lack of detailed descriptions in the original manuscript. We have now revised the methods section to explain more clearly how we define the cells as positive based on Hr38EXN and Hr38INT signals.

      Response to Recommendations for the authors:

      “To strengthen the author's argumentation, I would distinguish in their quantification between gal4+ from the other [classes of neighboring neurons] (Fig. 2 and 4).”

      Our focus in this paper was to ask simply whether P1a neurons are active or not active during natural occurrences of the social behaviors they can evoke when artificially activated. We did not claim that they are the only cells in the region that control the behaviors.  It is not possible to compare their activation to that of 'other' cells neighboring P1a neurons without a separate marker to identify those cells driven by a different reporter system (e.g., LexA). This in turn would require repeating all of the experiments in Figs 2 and 4 from scratch with new genotypes permitting dual-labeling of the two populations by different XFPs, and quantifying the data using 4-color labeling. We respectfully submit that such curiosity-driven experiments, while in principle interesting, are beyond the scope of the present manuscript.  However, we have inserted text to acknowledge the possibility that the aggression-activated Hr38 signals in P1a- cells neighboring P1a+ cells may correspond to other classes of P1 neurons (of which there are 70 in total) or to pC1 cells. Changes:  Text lines 151-154.

      “if the magenta dot is outside of the nuclei I would not count this as positive also the size of the dot seems to be a good marker of the reality of the signal). I would measure the intensity of the hr38EXN. A high Hr38EXN level associated with the presence of hr38INT would indicate that the cell has been activated during both encounters, while a lower hr38EXN with no hr38INT would suggest only an activation during the 1st behavioural context. Finally, a lower hr38EXN associated with the presence of hr38INT would suggest the opposite, an activation only during the 2nd behaviour.”

      We agree that some of the tiny dot signals seen with the hr38 INT probe are more likely to be background. We only counted INT probe signals as positive when the cell had a clearly visible dot that also co-localized with the exonic probe's signal, as primary (un-spliced) Hr38 transcripts in the nucleus should be positive for both EXN and INT probes. Regarding the reviewer’s latter comments, we agree with their interpretation of the catFISH results and that is how we interpreted them originally. We measured the intensity of hr38EXN expression and defined hr38EXN-labeled cells as “positive” when the relative intensity was more than 3σ above the average, a stringent criterion. In the revised manuscript, we added more detailed information in the methods section regarding our criteria for defining cell types as positive.
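
      The two scoring rules described above can be sketched as follows. This is a hedged illustration rather than the authors' scoring procedure: the function names and example intensities are hypothetical, and the sketch assumes that "average" and σ refer to the mean and sample standard deviation of some reference set of relative intensities, which the response does not specify.

```python
from statistics import mean, stdev


def exn_positive(intensity, reference_intensities, n_sigma=3.0):
    """True if intensity exceeds the reference mean by more than n_sigma SDs.

    Which cells form the reference set is an assumption of this sketch.
    """
    threshold = (mean(reference_intensities)
                 + n_sigma * stdev(reference_intensities))
    return intensity > threshold


def int_positive(has_clear_dot, colocalizes_with_exn):
    """Count an INT signal only if it is a clear dot overlapping EXN signal."""
    return has_clear_dot and colocalizes_with_exn


# Hypothetical relative intensities (illustrative only).
reference = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8]
print(exn_positive(1.3, reference))  # False
print(exn_positive(2.0, reference))  # True
print(int_positive(True, False))     # False: dot without EXN co-localization
```

      Requiring co-localization for the INT probe reflects the reasoning in the response that un-spliced nuclear transcripts should be detected by both probes, so an isolated INT dot is treated as background.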

      “Knowing that the P1a neurons (using the split-gal4) can trigger only wing extension when activated by optogenetic 50Hz, I would test to which behavioral context the MB neurons and the PAM neurons positively respond to.”

      As we answered in 'Response to Public Review,' our opto-HI-FISH experiments identified Kenyon cells in the mushroom body (MB) and PAM neurons (as well as pCd neurons) as potential downstream targets of P1a cells, using Hr38 labeling. The purpose of the calcium imaging experiment in Figure 3 – supplement was to confirm the P1a-dependent activation of KCs and PAM neurons using an independent method. In that respect, this control experiment succeeded as a methodological confirmation. The reviewer raised an interesting question about how our calcium imaging experiments relate to our behavioral experiments, in terms of the dynamics of KC and PAM activation. A recent publication (Shen et al., 2023) revealed that courtship behavior has a positive valence and that activation of P1 neurons mimics a courtship-reward state via activation of PAM dopaminergic neurons. Therefore, it is reasonable to think that PAM neurons (and Kenyon cells as downstream of PAM neurons) are activated during female exposure. However, those data do not exclude the possibility that inter-male aggression is also rewarding in Drosophila males, as it has been shown to be in mice. This is an interesting curiosity-driven question that has yet to be resolved. Therefore, as mentioned in the 'Response to Public Review,' we feel that the additional experiment the reviewer suggests is beyond the scope of our manuscript.

      Changes: None.

      Minor comments:

      “Please provide different pictures from main fig2 and sup2 for the three common conditions (control, aggression, and courtship).” 

      The data set for Figure 2 and Figure 2 supplement are from the same experiment. Because of the limited space, we just presented the selected key conditions ('Control', 'Aggression', and 'Courtship') in the main figure and put the complete data set (including these three key conditions) in the supplemental figure.

      Changes: None

      “Please, provide scale bars for the images.”

      Also, Reviewer #2 commented, 'Scale bars are missing on all the images throughout the main and supplementary figures.'

      We have now added scale bars for each figure. 

      “Fig.1: Is the chrimsonTdtom images from endogenous fluorescence? It is not said in the legend and anti-dsred is not provided in the material and method while anti-GFP is.”

      We are sorry for the confusion and thank the reviewer for raising that question. The signals were native fluorescence, and we have now added that information to the figure legend.

      P7: "As an initial proof-of-concept application of HI-FISH, we asked whether neuronal subsets initially identified in functional screens for aggression-promoting neurons (Asahina et al., 2014; Hoopfer et al., 2015; Watanabe et al., 2017) were actually active during natural aggressive behavior. These included P1a, Tachykinin-FruM+ (TkFruM), and aSP2 neurons". Please put the references to the corresponding group of neurons listed. For example: "These included P1a neurons [Hoopfer et al., 2015]". 

      We have now added these references.

      P9: "Optogenetic and thermogenetic stimulation experiments have shown that that P1a interneurons can promote both male-directed aggression and male- or female-directed courtship" typo

      We thank the reviewer for catching this error and have corrected the text.

      P10: "To validate this approach, we first asked whether we could detect Hr38 induction in pCd neurons, which were previously shown by calcium imaging to be (indirect) targets of P1a neurons". Reference [Jung et al., 2020] 

      We have now added this reference.

      Fig. 4A: Put the time scale on the diagram (3h adaptation-20min-30min rest-20min-10min rest-collect) 

      We have now added the time scale in Figure 4A.

      Reviewer #2: 

      Response to Public Review: 

      We thank the reviewer for their helpful comments and suggestions. We have addressed most of them in our revised manuscript. The main concern of Reviewer #2 was the temporal resolution of the HI-catFISH experiment shown in Figure 4 and Figure 4-Supplement. Our original manuscript illustrated temporal patterns of Hr38EXN and Hr38INT signals concomitant with different behavioral paradigms (Figure 4B). The reviewer pointed out that the illustrated experimental design does not reflect the actual data shown in Figure 4-Supplement A-C. We believe this issue was raised because we drew the temporal pattern of Hr38EXN signals in Figure 4B based on the intensity of Hr38EXN signals (Figure 4-Supplement B) rather than based on the percentage of positive cells (Figure 4-Supplement C). We have now revised the schematic time course of Hr38EXN signals in Figure 4B using the % of positive cells. We believe this change will help readers to better understand the experimental design, since we used the % of positive cells to identify patterns of P1a neuron activation during male-male vs. male-female social interactions in Figure 4D. Another suggestion from Reviewer #2 was to add additional controls, such as the quantification of the intronic and exonic Hr38 probes after either only the first or second social context exposure. In response, we have now added the data from only the first social context (Figure 4C, and 4D, right column). These new data provide evidence that there are essentially no detectable Hr38INT signals 60 minutes later without a second behavioral context, while Hr38EXN signals are still present at the time of the analysis. Unfortunately, we are not able to provide the converse dataset with the second behavioral context only to show that Hr38 INT signals are detected. On this point, we call the reviewer’s attention to Figure 4-supplement-S4A-C, which show that the INT probe signals are detectable at 15 and 30 minutes following stimulation, but not at 60 minutes. In the experiment of Fig. 4B, flies are fixed and labeled for Hr38 30 minutes after the beginning of the second behavior, conditions under which we should obtain robust INT signals (as observed). EXN signals are also expected at 30 minutes because the primary (non-spliced) RNA transcript detected by the INT probe also contains exonic sequences.

      Response to Recommendations for the authors:

      Given that the development of in situ HCR for the adult fly brain is so central to the present manuscript, I think that the methods section describing the HCR protocol can be significantly improved. In particular, the authors should fully describe the in situ HCR protocol including the 'minor modifications' they refer to, and define how they calculate the 'relative intensity to the background'.

      We appreciate the reviewer’s suggestion. We have now revised the methods section to describe the procedure in more detail. Also, we will submit a separate document describing the HI-FISH protocol.

      Note: The authors refer to a recently published paper by Takayanagi-Kiya et al (2023) describing activity-based neuronal labeling using a different immediate early gene, stripe/egr-1. The authors state the following: 'That study used a GAL4 driver for the stripe/egr-1 gene to label and functionally manipulate activated neurons. In contrast, our approach is based purely on detecting expression of the IEG mRNA using..'. Takayanagi-Kiya et al. (2023) also use in situ mRNA detection of the IEG stripe/egr-1 and not only a GAL4 driver system. This claim should be modified and the paper should be cited in the introduction of the present paper.

      We have now cited the paper in the Introduction and have modified and moved the description originally in the 'Note' section to the Discussion (text lines: 392-404) as the reviewer requested. We have emphasized the difference between the two approaches for comparing neuronal activities during two different behaviors within the same animal. Takayanagi-Kiya et al. used GAL4/UAS and stripe protein expression with immunohistochemistry to analyze neuronal activities during two different behaviors, while we exclusively analyzed Hr38 mRNA expression for this purpose, using intronic and exonic Hr38 probes. This approach made it possible to perform catFISH with higher temporal resolution and also allows extension of our approach to other IEGs for which antibodies are not available.

      Please specify the nature of the iron filings in the methods section.

      We added a detailed description in the methods section, including the catalog number.

      In Figure 1B, the authors may add a dashed outline to the regions magnified in 1C so that readers can more easily follow the figures. Moreover, it would be informative to see a more detailed quantification of the number of Hr38-positive cells in different brain regions marked by Fru-GAL4.

      We have now added the whole brain images for each condition in Figure 1C and also quantitative data in Figure 1-Supplement C, as the reviewer suggested.

      In the middle right aggression panel of Figure 2A, it looks as if one P1a neuron is not outlined.

      We have carefully examined other z-planes through this region and based on those data have concluded that the signals mentioned by the reviewer are neurites from neurons labeled in other z-planes.

      Changes: None.

      The images in Figure 2A can be again found in Figure Supplement 2A, yet the number of neurons analyzed suggests the quantification was performed from different samples. The images in Figure Supplement 2A should be either changed or it should be explained as to why the images are the same yet the numbers in the legend are different.

      We apologize for the confusion. Figure 2 and Figure 2-Supplement are from the same experiment. To avoid clutter we illustrated three key conditions ('Control,' 'Aggression,' and 'Courtship') in the main figure. The reason why the numbers in the legend are different is that the purpose of presenting Figure 2-Supplement B-D was to determine whether there were differences in the intensity of Hr38 FISH signals in the neurons considered as 'positive' in different conditions. Therefore, the numbers described in Figure 2-Supplement legend are derived only from those neurons that were considered Hr38-positive, while the numbers in Figure 2 include all neurons analyzed. We have now added notes to explain this in the Figure 2 – supplement legend.

      The panels of the quantification of the Hr38 relative intensity in Figure 2B/C/D are very difficult to read, ideally, they should be plotted as in Figure Supplement 2B/C/D.

      The graphs in Figure 2B-D (upper) show data from all GFP-labeled cells scored, including cells defined as 'negative' or 'borderline.' In contrast, the graphs in Figure 2-supplement show the relative Hr38 signal intensity in those GFP neurons defined as positive based on the analysis in Fig. 2B. If we were to plot the data in Fig. 2B (upper) as box plots (like that in Figure-2-supplement), we would see either a skewed (only negative cells) or a bimodal distribution (one around the negative population and the other around the positive population); the shapes of these distributions would likely be hidden in the box-whisker plots format. Therefore, we prefer to plot all of the data points as we did in the original manuscript. However, we agree that the data points in the original manuscript were hard to read. We therefore changed the format of the datapoints from blurry dots to open circles with clear solid lines.

      In Figure 2B/C/D, please specify in the figure legend what 'grouped in categories according to character' means. 

      We used letters to mark statistically significant differences (or lack thereof) between conditions. Bars sharing at least one common letter are not significantly different. If they do not share any letter, they are significantly different. For example, Aggression: bc vs. Dead: bc, means no difference. Aggression: bc vs. No Food: b, or Aggression: bc vs. Courtship: c also means no difference between Aggression and each of the two other conditions. However, 'No Food: b' and 'Courtship: c' have no common letter, meaning they are different. This is a standard method for showing statistical comparisons among multiple bars without lots of asterisks and horizontal bars cluttering the figure, and we have revised the legend to clarify what each letter means. We have also removed the color shading in Figure 2 B-D as it may have been confusing.
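
      The letter convention described above can be read mechanically. The following is a minimal sketch, assuming each condition is annotated with a string of letters exactly as in the example; the function name `significantly_different` and the dictionary layout are illustrative choices, not taken from the manuscript.

```python
# Letter codes copied from the worked example in the response above.
# The dictionary layout and function name are illustrative assumptions.
LETTERS = {
    "Aggression": "bc",
    "Dead": "bc",
    "No Food": "b",
    "Courtship": "c",
}


def significantly_different(letters_a: str, letters_b: str) -> bool:
    """Two bars differ only if their letter sets share no letter."""
    return set(letters_a).isdisjoint(letters_b)


def compare(cond_a: str, cond_b: str) -> bool:
    """Look up two conditions by name and apply the shared-letter rule."""
    return significantly_different(LETTERS[cond_a], LETTERS[cond_b])


if __name__ == "__main__":
    for a, b in [("Aggression", "Dead"), ("Aggression", "No Food"),
                 ("Aggression", "Courtship"), ("No Food", "Courtship")]:
        verdict = "different" if compare(a, b) else "not different"
        print(f"{a} vs. {b}: {verdict}")
```

      Under this rule, only the 'No Food' vs. 'Courtship' pair is reported as different, which matches the reading given in the text.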

      A quantification of the number of Hr38-positive neurons and Hr38 relative intensity during the entire time course would be informative in Figure 3D. 

      Although the data set for this figure is different from that for Figure 4-Supplement A-C, the main claim is the same. Therefore, Figure 4 - Supplement essentially provides the information that the reviewer suggested. However, we also reanalyzed the data set used for the original Figure 3D and evaluated % positive cells at the 30-minute time point and have now added that number in the figure legend.

      In the legend of Figure 3D, it says '..The expression level reaches its peak at 30-60min', yet I don't see timepoints beyond 60min. Please rephrase or add additional timepoints. 

      We apologize for the error. We have rephrased the text.

      Figure Supplement 3A/D: please add an outline or a schematic figure to better understand where the imaging is performed.

      We added illustrated schemas next to the title of each experiment (P1->PAM neurons (bundle) and P1 -> Kenyon cells (bundle)).

      Figure Supplement 3C/F: please add information about the statistical test to the corresponding figure legend.

      We have added a phrase to describe the test used.

      Figure Supplement 3G/H/I/J: motion artifacts can potentially strongly affect the performed analysis given that cell bodies are very small and highly subjected to motion. Can the authors comment on how they corrected for motion?

      We have now described how we corrected for motion artifacts in the Methods section.

      Figure 4C/D: It seems as if the representative images don't reflect the quantification, e.g., in the male -> female panel, close to 100% of the neurons are positive for the exonic probe as opposed to approx. 40% in the bar graph.

      Please see our response to this issue in the 'Response to Public Review (Reviewer #1)'.

      Additional controls should be included in Figure 4C in order to assess the temporal resolution of HI-CatFISH more in detail (see 'Weaknesses').

      We have also answered this in the 'Response to Public Review'.

      The authors should adjust the scheme in the main Figure 4B to reflect the data presented in Figure S4A and C. For instance, the peak for the intronic version is observed at 15 minutes, while at 30 minutes, both the exonic and intronic signals show an equal level of signal.

      We have addressed this issue in the 'Response to Public Review'.

      We thank the reviewers again for their helpful comments and hope that with these changes, the manuscript will now be acceptable for official publication in eLife.

    1. Author response:

      Reviewer #1 (Public Review): 

      The manuscript entitled "A septo-hypothalamic-medullary circuit directs stress-induced analgesia" by Shah et al., showed that the dLS-to-LHA circuit is sufficient and necessary for stress-induced analgesia (SIA), which is mediated by the rostral ventromedial medulla (RVM) in an opioid-dependent manner. This study is interesting and important and the conclusions are largely supported by the data. I have a few concerns as follows:

      We thank the reviewer for finding our study “interesting” and “important”, and for noting that the conclusions are largely supported by the data.

      (1)  The present data show that activation of dLS neurons produces SIA, however, this manipulation is non-specific. It may be better to see the effect of specific manipulation of stress-activated c-Fos positive neurons in the dLS using a combination of the Tet-Off system and chemogenetic/optogenetic tools. 

      We agree with the reviewer that activating the stress-“trapped” neurons would be a more specific way to induce SIA through septal activation than the activation of the entire dLS that we pursued. In all likelihood, we would see robust SIA if the stress-responsive dLS neurons were specifically activated. We are in the process of acquiring the genetic tools required for “trapping” stress-activated neurons and expect to be able to perform the experiments suggested by the reviewer in the coming months.

      (2)  Depending on its duration, and intensity, stress can exert potent and bidirectional modulatory effects on pain, either reducing pain (SIA) or exacerbating it (stress-induced hyperalgesia, SIH). Is the circuit in the manuscript involved in SIH?

      As mentioned by the reviewer, it is reasonable to suspect that the dLS neurons are involved in SIH. However, we believe that the experiments needed to test this hypothesis are outside the scope of this paper, since here we have focused on the circuit mechanisms for SIA. In the revised discussion section, we have included the possibility of dLS neurons driving SIH.

      (3)  It is well-accepted that opioid and cannabinoid receptors participate in the SIA, and the evidence is especially strong for the RVM endocannabinoid system. Given this, why did the authors focus their study on the opioid system?

      We agree with the reviewer that dLS-mediated SIA may work through neural circuits centered on RVM expressing receptors for either or both opioids and endocannabinoids. We primarily focused on the opioidergic system in the RVM, as decades of mechanistic work have revealed how the ON, OFF, and neutral neurons modulate pain through the endogenous opioids and even mediate SIA. In the revised discussion, we have included the possibility of involvement of both pain modulatory systems.

      (4)  Does silencing of the dLS neurons affect stress-induced anxiety-like behaviors? Alternatively, what is the relationship between SIA and the level of stress-induced anxiety?

      We did not test whether silencing the dLS would affect stress-induced anxiety, as our focus was on the pain modulatory effects of dLS activation. The relationship between the levels of SIA and stress-induced anxiety will be interesting to explore in the future. We believe we would need better behavioral assays than the existing ones to quantitatively measure the levels of stress-induced anxiety and SIA.

      (5)  Direct electrophysiological evidence should be provided to confirm the efficacy of the MP-CNO.

      We agree with the reviewer that ex-vivo electrophysiology experiments will substantiate the effectiveness of the MP-CNO. However, we do not have the expertise, or the instrumentation required to perform these experiments in our laboratory.

      (6)  Is the LHA a specific downstream target for SIA, and is the LHA involved in stress-induced anxiety-like behaviors?

      Several lines of evidence point to the involvement of LHA neurons in stress-induced anxiety. We have also shown, using fiber photometry recordings, that the dLS downstream neurons in the LHA are activated by acute restraint. Thus, we expect that activation of the LHA neurons will cause stress-induced anxiety. However, we wanted to focus on the pain modulation aspect of the dLS-LHA-RVM circuitry.

      (7)  Do LHA neurons have direct projections to the RVM? If yes, what is its role in the SIA?

      Our anatomical studies using transsynaptic anterograde and retrograde viral strategies in Figure 6 show that the LHA neurons have direct projections to the RVM, and that these neurons are sufficient to drive hyperalgesia, as well as necessary for SIA.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, Shah et al. explore the function of an understudied neural circuitry from the dLS -> LHA -> RVM in mediating stress-induced analgesia. They initially establish this neural circuitry through a series of intersectional tracings. Subsequently, they conduct behavioral tests, coupled with optogenetic or chemogenetic manipulations, to confirm the involvement of this pathway in promoting analgesia. Additionally, fiber photometry experiments are employed to investigate the activity of each brain region in response to stress and pain. 

      Strengths: 

      Overall, the study is comprehensive, and the findings are compelling. 

      We thank the reviewer for finding our manuscript “comprehensive” and “compelling”.

      Weaknesses: 

      One noteworthy concern arises regarding the overarching hypothesis that restraint-induced stress promotes analgesia. A more direct interpretation suggests that intense struggling, rather than stress per se, activates the dLS -> LHA -> RVM pathway that may drive analgesic responses.

      We agree with the reviewer that our data can be interpreted to mean that “intense struggling”, rather than “acute stress”, might have altered the pain thresholds in mice. However, we would like to point out that the restraint-induced stress model that we have used has long been regarded as a standard for inducing stress. Moreover, we have demonstrated that dLS activation results in acute stress by measuring blood corticosterone levels, and have shown that dLS activation causes stress-induced anxiety in light-dark box tests.

      Reviewer #2 (Recommendations For The Authors): 

      Please find below my other comments for improvements. 

      Introduction: The authors claimed that "dLS neurons receive nociceptive inputs from the thalamus and somatosensory cortices." However, citations are missing.

      We have added the citations.

      Figure 1 B&C: Although this paper focuses on the dLS, it would be informative to also include vLS c-Fos images (maybe in a supplementary figure), given that these data appear to be already acquired. The inclusion of vLS data will provide critical information regarding potential specificity (or lack of) across LS subregions in stress responses.

      In the revised manuscript we have added the vLS c-Fos images as suggested by the reviewer. 

      Figure 1D: Quantification of Vgat vs. Vglut neurons is missing. It is unclear if the Vgat neurons are restricted to small clusters.

      We did not add the Vglut vs. Vgat quantification, since both our experiments and publicly available data from the Allen Brain Atlas show that almost all of the neurons in the LS are GABAergic. Vglut2-expressing neurons were very rare: we found 0-2 per section in the LS of the mouse brain.

      Figure 1G: The Y-axis label is missing. 

      We have added the axis in the revised manuscript.

      Figure 2: The authors claimed that dLS neurons are preferentially tuned to stress caused by physical restraint. However, it appears that these neurons are specifically tuned to intense struggle behavior (transient) rather than stress (prolonged).

      We agree with the reviewer that the SIA observed in mice with dLS activation can be interpreted as the effect of transient struggle behavior rather than prolonged stress. However, we would like to point out that acute restraint for one hour is known to produce prolonged stress, and this is backed up by increased blood corticosterone levels and stress-induced anxiety (Fig1-Fig Supplementary 1).

      Figure 4: The authors provided compelling evidence that dLS neurons synapse on LHA Vglut2 neurons. However, it is unclear if they exclusively target the Vglut2 neurons or also synapse on LHA Vgat neurons.

      We agree with the reviewer. Even though the majority of the dLS downstream neurons in the LHA are glutamatergic, as now shown in Fig. 4D, a few neurons do not express Vglut and thus must be GABAergic.

      Figure 5D: It is unclear if the trace represents dLS or LHA calcium signal (in the main text, the authors claimed both).

      We have now indicated, at the top of Figure 5C and D, the LHA neurons from which we recorded.

      Figure 6 G&H: Presumably, ΔG-Rabies does not transmit across neurons due to the deletion of the glycoprotein (G) gene. Thus, it is unclear why dLS and LHA neurons express mCherry after injecting rabies into RVM.

      The aim of the rabies experiment was to test whether the cells in the LHA that receive inputs from the dLS are the same ones that send projections downstream to the RVM. To this end, we used a monosynaptic rabies virus that has retrograde properties. Hence, when injected into the RVM, it was taken up by the terminals of the LHA neurons in the RVM and traveled to the cell bodies in the LHA. We injected the AAV1-Transsyn-Cre in the dLS, so only the cells downstream of the dLS in the LHA can express the Cre-dependent glycoprotein (G) gene. Thus, the rabies-mCherry virus specifically infected the LHA neurons downstream of the dLS and jumped one synapse to label the upstream dLS neurons.

      The authors claim that "RVMpost-LHA neurons may modulate nociceptive thresholds through their local synaptic connections within the RVM, recurrent connections with the PAG, or direct interactions with spinal cord neurons." It is unclear what the "local synaptic connections within the RVM" means. It is also unclear whether there is evidence of recurrent connections between the RVM and PAG.

      By “local synaptic connections” we meant intrinsic connections within the RVM: some of the RVMpost-LHA neurons might be interneurons that mediate SIA by modulating the ON or OFF cells. There is some anatomical evidence for ascending inputs from the RVM to the PAG, and we have now included the citation in the relevant section of the manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies.

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful and the data generally support the conclusions.

      Strengths

      Hi-ExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions.

      Weaknesses

      (1) It is still unclear to me whether or not cells that do not expand remain in the well given the response to point 1. The authors say the cells are digested and washed away but then say that there is a remaining signal from the unexpanded DNA in some cases. I believe this is still a concern that potential users of the protocol should be aware of.

      Although Proteinase K digestion removes most of the unexpanded cells, DNA can sometimes persist. As such, we occasionally observe Hoechst signal underneath cells. The residual DNA is easily differentiated from nuclear Hoechst signal and does not confound interpretation of results. We have added a new supplementary figure that further clarifies this point.

      (2) Regarding the response to point 9, I think this information should be included in the manuscript, possibly in the methods. It is important for others to have a sense of how long imaging may take if they were to adopt this method.

      We have added detailed information to the methods section to address this point as shown below.  In general, we image HiExM samples on the Opera Phenix at 63x with the following parameters: 100% laser power for all channels; 200 ms exposure for Hoechst, 500-1000+ ms exposure for immunostained channels depending on the strength of the stain and the laser; 60 optical sections with 1 micron spacing; and 4-20 fields of view per well depending on the cell density and sample size requirements. Therefore, imaging one full 96-well plate (60 wells total as we avoid the outer wells) takes anywhere from 3 hr to 64 hr depending on the combination of parameters used.
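
      The range quoted above can be approximated with a simple exposure-only estimate. The sketch below is an assumption-laden illustration rather than the authors' calculation: it ignores stage movement, autofocus, and data transfer, and the number of immunostained channels (one for the short case, three for the long case) is a hypothetical choice, since the response does not state it.

```python
def exposure_hours(wells: int, fields_per_well: int, sections: int,
                   exposures_ms: list[int]) -> float:
    """Total camera exposure time in hours for one plate.

    Every optical section of every field is assumed to be acquired once
    per channel; all overheads other than exposure are ignored.
    """
    planes = wells * fields_per_well * sections
    total_ms = planes * sum(exposures_ms)
    return total_ms / 3_600_000  # milliseconds per hour


# 60 imaged wells and 60 optical sections are taken from the text.
# Channel counts are assumptions: Hoechst (200 ms) plus one or three
# immunostained channels at 500 ms or 1000 ms respectively.
shortest = exposure_hours(60, 4, 60, [200, 500])
longest = exposure_hours(60, 20, 60, [200, 1000, 1000, 1000])

if __name__ == "__main__":
    print(f"shortest: {shortest:.1f} h, longest: {longest:.1f} h")
```

      With these assumed channel counts the estimate gives 2.8 h and 64.0 h, in line with the quoted range of 3 hr to 64 hr; other channel combinations or acquisition overheads would shift these figures.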

      Reviewer #2 (Public review):

      Summary:

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit expansion of the gel. They thus engineered a device that can spot a small droplet of hydrogel solution and keep it in place as it polymerises. Because it occupies only a small portion of the space at the center of each well, the gel can expand in all directions, and imaging and staining can proceed using liquid-handling robots and an automated microscope.

      Strengths:

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super resolution microscopy, which is a timely and important goal.

      Addition upon revision:

      The authors addressed this reviewer's suggestions.

      Reviewer #3 (Public review):

      Summary:

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include: 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand toroidal gel within each well.

      Addition upon revision:

      Overall, the authors have adequately addressed most of the concerns raised. There are a few minor issues that require attention.

      Minor comments:

      Figure S10: There appears to be a discrepancy in the panel labeling. The current labels are E-H, but it is unclear whether panels A-D exist. Also, this reviewer thought that panels G and H would benefit from statistical testing to strengthen the conclusions. As a general rule for scientific graph presentation, the y-axis of all graphs should start at zero unless there is a compelling reason not to do so.

      We have revised Figure S10 to address your comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      By examining the prevalence of interactions with ancient amino acids of coenzymes in ancient versus recent folds, the authors noticed an increased interaction propensity for ancient interactions. They infer from this that coenzymes might have played an important role in prebiotic proteins.

      Strengths:

      (1) The analysis, which is very straightforward, is technically correct. However, the conclusions might not be as strong as presented.

      (2) This paper presents an excellent summary of contemporary thought on what might have constituted prebiotic proteins and their properties.

      (3) The paper is clearly written.

      We are grateful for the kind comments of the reviewer on our manuscript. However, we would like to clarify a possible misunderstanding in the summary of our study. Specifically, analysis of "ancient versus recent folds" was not really reported in our results. Our analysis concerned "coenzyme age" rather than the "protein folds age" and was focused mainly on interaction with early vs. late amino acids in protein sequence. While structural propensities of the coenzyme binding sites were also analyzed, no distinction on the level of ancient vs. recent folds was assumed and this was only commented on in the discussion, based on previous work of others. 

      Weaknesses:

      (1) The conclusions might not be as strong as presented. First of all, while ancient amino acids interact less frequently in late with a given coenzyme, maybe this just reflects the fact that proteins that evolved later might be using residues that have a more favorable binding free energy.

      We would like to point out that there was no distinction between proteins that evolved early or late in our dataset of coenzyme-binding proteins. The aim of our analysis was purely to observe trends in the age of amino acids vs. age of coenzymes. While no direct inference can be made from this about early life as all the proteins are from extant life (as highlighted in the discussion of our work), our goal was to look for intrinsic propensities of early vs. late amino acids in binding to the different coenzyme entities. Indeed, very early interactions would be smeared by the eons of evolutionary history (perhaps also towards more favourable binding free energy, as pointed out also by the reviewer). Nevertheless, significant trends have been recorded across the PDB dataset, pointing to different propensities and mechanistic properties of the binding events. Rather than to a specific evolutionary past, our data therefore point to a “capacity” of the early amino acids to bind certain coenzymes, and we believe that this is the major (and standing) conclusion of our work, along with the properties of such interactions. In our revised version, we will carefully go through all the conclusions and make sure that this message stands out, but we are confident that the following concluding sentences copied from the abstract and the discussion of our manuscript fully comply with our data:

      “These results imply the plausibility of a coenzyme-peptide functional collaboration preceding the establishment of the Central Dogma and full protein alphabet evolution”

      “While no direct inferences about distant evolutionary past can be drawn from the analysis of extant proteins, the principles guiding these interactions can imply their potential prebiotic feasibility and significance.”

      “This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      We would also like to add that proteins that evolved later might not always have a more favourable free energy of binding. Musil et al., 2021 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294521/) showed, using the haloalkane dehalogenase DhaA as an example, that ancestral sequence reconstruction is a powerful tool for designing more stable, but also more active proteins. Ancestral sequence reconstruction relies on inferring ancient states of protein families to suggest mutations that will lead to proteins more stable than currently existing ones. Their study did not explore ligand-protein interactions specifically but showed that ancient states often show more favorable properties than modern proteins.

      (2) What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than the inferred important role of coenzymes.

      We appreciate the reviewer's comment on other small molecules, which we assume refers mainly to metal ions (i.e. inorganic cofactors). We completely agree with the reviewer that such interactions are of utmost importance to the origins of life. They were intentionally not part of our study, as they have already been studied previously by others (e.g. Bromberg et al., 2022; and reviewed in Frenkel-Pinter et al., 2020) and also by us (Fried et al., 2022). For example, it is noteworthy that prebiotically relevant metal binding sites (e.g. of Mg2+) exhibit enrichment in early amino acids such as Asp and Glu, while sites of more recent metals (e.g. Cu and Zn) are enriched in the late amino acids His and Cys (Fried et al., 2022). At the same time, comparable analyses of amino acid - coenzyme trends were not available.

      Nevertheless, involvement of metal ions in the coenzyme binding sites was also studied here and pointed to their greater involvement with the Ancient coenzymes. In the revised version of the manuscript, we will be happy to enlarge the discussion of the studies concerning inorganic cofactors.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (3) Perhaps the conclusions just reflect the types of active sites that evolved first and nothing more.

      We partly agree with the reviewer on this point, but not with its being listed as a weakness of our study, nor with the “nothing more” notion. Understanding the properties of the earliest binding sites is key to bridging the gap between prebiotic chemistry and biochemistry. The potential of peptides preceding ribosomal synthesis (and the full alphabet evolution) along with prebiotically plausible coenzymes addresses exactly this gap, which is currently not understood.

      Reviewer #2 (Public Review):

      I enjoyed reading this paper and appreciate the careful analysis performed by the investigators examining whether 'ancient' cofactors are preferentially bound by the first-available amino acids, and whether later 'LUCA' cofactors are bound by the late-arriving amino acids. I've always found this question fascinating as there is a contradiction in inorganic metal-protein complexes (not what is focused on here). Metal coordination of Fe, Ni heavily relies on softer ligands like His and Cys - which are by most models latecomer amino acids. There are no traces of thiols or imidazoles in meteorites - although work by Dvorkin has indicated that could very well be due to acid degradation during extraction. Chris Dupont (PNAS 2005) showed that metal speciation in the early earth (such as proposed by Anbar and prior RJP Williams) matched the purported order of fold emergence.

      As such, cofactor-protein interactions as a driving force for evolution has always made sense to me and I admittedly read this paper biased in its favor. But to make sure, I started to play around with the data that the authors kindly and importantly shared in the supplementary files. Here's what I found:

      Point 1: The correlation between abundance of amino acids and protein age is dominated by glycine.

      There is a small, but visible difference in old vs new amino acid fractional abundance between Ancient and LUCA proteins (Figure 3, Supplementary Table 3). However, the bias is not evenly distributed among the amino acids - which Figure 4A shows but is hard to digest as presented. So instead I used the spreadsheet in Supplement 3 to calculate the fractional difference FDaa = F(old aa)-F(new aa). As expected from Figure 3, the mean FD for Ancient is greater than the mean FD for LUCA. But when you look at the same table for each amino acid FDcofactor = F(ancient cofactor) - F(LUCA cofactor), you now see that the bias is not evenly distributed between older and newer amino acids at all. In fact, most of the difference can be explained by glycine (FDcofactor = 3.8) and the rest by also including tryptophan (FDcofactor = -3.8). If you remove these two amino acids from the analysis, the trend seen in Figure 3 all but disappears.

      Troubling - so you might argue that Gly is the oldest of the old and Trp is the newest of the new so the argument still stands. Unfortunately, Gly is a lot of things - flexible, small, polar - so what is the real correlation, age, or chemistry? This leads to point 2.

      We truly acknowledge the effort that the reviewer made in the revision of the data and for the thoughtful, deeper analysis. We agree that this deserves further discussion of our data. 

      As invited by the reviewer, we indeed repeated the analysis on the whole dataset. First, we would like to point out that the reviewer was most probably referring to the Supplementary Fig. 2 (and not 3, which concerns protein folds). While the difference between Ancient and LUCA coenzyme binding is indeed most pronounced for Gly and Trp, we failed to confirm that the trend disappears if those two amino acids are removed from the analysis (additional FDcofactors of 3.2 and -3.2 are observed for the early and late amino acids, resp.), as seen in Table I below. The main additional contributors to this effect are Asp (FD of 2.1) and Ser (FD of 1.8) from the early amino acids and Arg (FD of -2.6) and Cys (FD of -1.7) of the late amino acids. Hence, while we agree with the reviewer that Gly and Trp (the oldest and the youngest) contribute to this effect the most, we disagree that the trend reduces to these two amino acids.  

      In addition, the most recent coenzyme temporality (Post-LUCA) was neglected in the reviewer’s analysis. Relative to Ancient, the difference between F(old) and F(new) is even more pronounced for Post-LUCA than for LUCA (Supplementary Table 5A) and depends much less on Trp. Instead, Asp, Ser, Leu, Phe, and Arg dominate the observed effect (Supplementary Table 5B). This further supports our disagreement with the reviewer’s point. Nevertheless, we remain grateful for this discussion and we will happily include this additional analysis in the Supplementary Material of our revised manuscript.
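
      The fractional-difference calculation discussed in this exchange can be sketched as follows. This is an illustration only: the frequencies are invented, the early/late assignment covers just six residues, and the function names are hypothetical. It shows how the per-residue FD and the early vs. late totals change when Gly and Trp are excluded.

```python
# Toy frequencies (percent of binding-site residues). These numbers are invented
# for illustration and are NOT taken from the manuscript's supplementary tables.
ANCIENT = {"Gly": 14.0, "Asp": 9.0, "Ser": 8.0, "Arg": 5.0, "Cys": 2.0, "Trp": 1.0}
LUCA = {"Gly": 10.0, "Asp": 7.0, "Ser": 6.0, "Arg": 8.0, "Cys": 4.0, "Trp": 5.0}
EARLY = {"Gly", "Asp", "Ser"}  # assumed "early" residues; the rest count as "late"


def fractional_difference(freq_a, freq_b):
    """Per-residue difference F(a) - F(b), in percentage points."""
    return {aa: freq_a[aa] - freq_b[aa] for aa in freq_a}


def early_late_totals(fd, early, exclude=()):
    """Sum FD separately over early and late residues, skipping any in `exclude`."""
    kept = {aa: value for aa, value in fd.items() if aa not in exclude}
    early_total = sum(value for aa, value in kept.items() if aa in early)
    late_total = sum(value for aa, value in kept.items() if aa not in early)
    return early_total, late_total


fd = fractional_difference(ANCIENT, LUCA)
print(early_late_totals(fd, EARLY))                          # all residues
print(early_late_totals(fd, EARLY, exclude={"Gly", "Trp"}))  # without Gly and Trp
```

      Whether a trend survives the exclusion then comes down to whether the two totals stay clearly apart, which is the point on which the reviewer and the authors disagree.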

      The following text (and the additional data) was included in the revised manuscript version:

      “To explore the contribution of individual amino acids to this effect, fractional difference (FD) for early vs. late amino acids among the Ancient, LUCA, and Post-LUCA coenzyme binding was calculated (Supplementary Table 5). The mean FD revealed a similar trend to the amino acid composition analysis (Fig. 3). The amino acids most enriched in LUCA vs. Post-LUCA are Gly, Ser, and Leu (FD of 4.4, 4.3, and 4.1 respectively), while the most depleted include Phe, Arg, and His (FD of -11, -4.2, and -3.2) (Supplementary Table 5B).”

      Point 2 - The correlation is dominated by phosphate.

      In the ancient cofactor list, all but 4 (SAM, tetrahydrofolic acid, biopterin, and heme) comprise at least one phosphate. Apart from SAM, these four have very low Gly abundance. The overall high Gly abundance in the ancient enzymes is due to the chemical property of glycine that can occupy the right-hand side of the Ramachandran plot. This allows it to make the alternating αL-αR conformation of the P-loop forming Milner-White's anionic nest. If you remove phosphate binding folds from the analysis the trend in Figure 3 vanishes.

      Likewise, Trp is an important functional residue for binding quinones and tuning its redox potential. The LUCA cofactor set is dominated by quinone and derivatives, which likely drives up the new amino acid score for this class of cofactors.

      Once again, we are thankful to the reviewer for raising this point. The role of Gly in the anionic nests proposed by Milner-White and Russell, as well as the role of Trp in quinone binding, are important points that we would be happy to highlight more in the discussion of the revised manuscript.

      Nevertheless, we disagree that the trends reduce only to the phosphate-containing coenzymes and, importantly, that “the trend in Figure 3 vanishes” upon their removal. Supplementary Tables 6A and 6B show the data for coenzymes excluding those with a phosphate moiety, and the trend in Fig. 3 remains, albeit less pronounced.

      The following text was included in the revised manuscript version:

      “Moreover, we investigated whether the observed trend in amino acid occurrence at the binding sites was dominated by the presence of phosphate groups, which are common in many ancient cofactors except for SAM, Tetrahydrofolic acid, Biopterin, and Heme. An additional analysis therefore excluded all phosphate-containing coenzymes indicating that while the trend is less pronounced, it remains even in the absence of phosphate groups (Supplementary Table 6).”

      In summary, while I still believe the premise that cofactors drove the shape of peptides and the folds that came from them - and that Rossmann folds are ancient phosphate-binding proteins, this analysis does not really bring anything new to these ideas that have already been stated by Tawfik/Longo, Milner-White/Russell, and many others.

      I did this analysis ad hoc on a slice of the data the authors provided and could easily have missed something and I encourage the authors to check my work. If it holds up it should be noted that negative results can often be as informative as strong positive ones. I think the signal here is too weak to see in the noise using the current approach.

      We are grateful to the reviewer for encouraging a further look at our data. While we hope that the analysis on the whole dataset (listed in Tables I - IV) will change the reviewer’s standpoint on our work, we would still like to comment on the questioned novelty of our results. In fact, the extraordinary works by Tawfik/Longo and Milner-White/Russell (which were cited in our manuscript multiple times) presented one of the motivations for this study. We take the opportunity to copy the part of our discussion that specifically highlights the relevance of their studies and points out the contribution of our work with respect to theirs.

      “While all the coenzymes bind preferentially to protein residue sidechains, more backbone interactions appear in the ancient coenzyme class when compared to others. This supports an earlier hypothesis that functions of the earliest peptides (possibly of variable compositions and lengths) would be performed with the assistance of the main chain atoms rather than their sidechains (Milner-White and Russel 2011). Longo et al., recently analyzed binding sites of different phosphate-containing ligands which were arguably of high relevance during earliest stages of life, connecting all of today’s core metabolism (Longo et al., 2020 (b)). They observed that unlike the evolutionary younger binding motifs (which rely on sidechain binding), the most ancient lineages indeed bind to phosphate moieties predominantly via the protein backbone.

      Our analysis assigns this phenomenon primarily to interactions via early amino acids that (as mentioned above) are generally enriched in the binding interface of the ancient coenzymes. This implies that late amino acids would not be necessarily needed for the sovereignty of coenzyme-peptide interplay.”

      Unlike any other previous work, our study involves all the major coenzymes (not just the phosphate-containing ones) and is based on their evolutionary age, as well as age of amino acids. It is the first PDB-wide systematic evolutionary analysis of coenzyme-amino acid binding. Besides confirming some earlier theoretical assertions (such as role of backbone interactions in early peptide-coenzyme evolution) and observations (such as occurrence of the ancient phosphate-containing coenzymes in the oldest protein folds), it uncovers substantial novel knowledge. For example, (i) enrichment of early amino acids in the binding of ancient coenzymes, vs. enrichment of late amino acids in the binding of LUCA and Post-LUCA coenzymes, (ii) the trends in secondary structure content of the binding sites of coenzyme of different temporalities, (iii) increased involvement of metal ions in the ancient coenzyme binding events, and (iv) the capacity of only early amino acids to bind ancient coenzymes. In our humble opinion, all of these points bring important contributions in the peptide-coenzyme knowledge gap which has been discussed in a number of previous studies.

      Recommendations for the authors:

      (1) By only focusing on coenzymes, the authors may have overestimated their importance. What about other small molecules that existed in the prebiotic soup? Do they also prefer such ancient amino acids? If so, this might reflect the interaction propensity of specific amino acids rather than some possible role in very ancient proteins. Or it might diminish the conjectured importance of coenzymes.

      The following sentence was added in the discussion of the revised manuscript:

      “This would also be true for direct interaction of early peptides/proteins and metal ions, independent of organic cofactor involvement, as discussed previously by us and others (Bromberg et al., 2022; Frenkel-Pinter et al., 2020; Fried et al., 2022).  For example, it has been observed that coordination of prebiotically most relevant metal ions (e.g., Mg2+) is more often mediated by early amino acids such as Asp and Glu, whereas metal ions of later relevance (e.g., Cu and Zn) bind more frequently via late amino acids like His and Cys (Fried et al. 2022). Similarly, ancient metal binding folds have been shown to be enriched in early amino acids (Bromberg et al., 2022).”

      (2) The authors should analyze whether the interactions are with similar types of amino acids in ancient versus early proteins.

      While we appreciate the interesting suggestion, we would like to clarify that we did not aim to elucidate the differences between early and late protein folds - we agree that this might add an interesting perspective to our work, but we feel that it is well beyond the scope of our current study.

      (3) The authors might also wish to do sequence alignments to the structures in early versus late evolving proteins to see how general this pattern of residue usage is beyond the limited set of proteins found in the PDB.

      This is an interesting suggestion but similar to the previous recommendation, it is not within the scope of this study where no distinction between early and late evolving proteins has been made.  

      There have been a number of attempts to classify the folds as shared among Bacteria, Archaea and Eukaryota or specific to one or two of these groups of organisms (https://link.springer.com/article/10.1007/s00239-023-10136-x; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9541633/). However, this does not compare easily with our time scales, in which ancient ligands occur well before the last common ancestor.

      We also agree that the set of sequences present in the PDB is biased, but perhaps it is less biased than we had thought. The recent fantastic work from Nicola Bordin and his colleagues from the Orengo group attempted to classify over 200 million structures in the AlphaFold database in the so-called Encyclopedia of Domains, and they found that nearly 80% of detected domains can be assigned to already known superfamilies in CATH (https://www.biorxiv.org/content/10.1101/2024.03.18.585509v2).

      (4) The authors might wish to consider the results in Skolnick, H. Zhou, and M. Gao. On the possible origin of protein homochirality, structure, and biochemical function. PNAS 2019: 116(52): 26571-26579.

      Based on the editorial recommendation, the following sentence was added in the discussion:

      “It has been implied by computer simulations that coenzymes could bind to proteins with similar propensity even before the onset of protein homochirality, despite lower structural stability and secondary structure content in heterochiral polypeptides (Skolnick et al., 2019).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In their manuscript entitled: "Is tumor mutational burden predictive of response to immunotherapy?", Gurjao and colleagues discuss the use of tumor mutational burden (TMB) as a predictive biomarker for cancer patients to respond to immune checkpoint blockade (ICB). By analyzing a large cohort of 882 patient samples across different tumor types they find either little or no association of TMB with response to ICB. In addition, they showed that finding the optimal cutoff for patient stratification leads to a severe multiple testing problem. By rigorously addressing this multiple testing problem only non-small cell lung cancer out of 10 cancer types showed a statistically significant association of TMB and response to ICB. Nevertheless, it is clearly shown that in any case the rate of misclassification is too high for TMB alone to qualify as a clinically suitable biomarker for ICB response. Finally, the authors demonstrate with a simple mathematical model that only a few strong immunogenic mutations would be sufficient for an ICB response, thereby showing that also patients with a low TMB score could benefit from immunotherapy. The manuscript is clearly written, the results are well presented and the applied methods are state-of-the-art.

      We would like to thank the reviewer for their thoughtful suggestions and efforts towards improving our manuscript. We address below the reviewer’s recommendations.

      Reviewer #1 (Recommendations For The Authors):

      (1) The method used for mutation calling can also influence the TMB score. Mutation data was downloaded from public databases and not re-called for this study, so a potential caller bias could be present. What was the calling strategy of the used data sets? For the present study, I don't think that this is crucial because different callers or post-call processing would be used at different sites to determine TMB. I think the mutation calling bias should also be discussed in the manuscript as another shortcoming for TMB as a biomarker for ICB response.

      We thank the reviewer for this comment. Mutational data was not aggregated across studies and caller bias would thus not have any impact on the results of this manuscript. In addition, we further clarified the role of mutation calling bias in the Discussions section.

      “Although attractive and scalable, TMB does not consider the effect of specific mutations (missense, frameshift etc), their presentation and clonality (19), nor the state of the tumour, its microenvironment, and interactions with the immune system that can be integrated into potentially better predictors of response to ICB (43, 44). In addition, another major limitation of TMB is the lack of standardized measures. This includes the lack of standard sequencing methods to assess TMB: TMB can be measured from Whole-Exome sequencing, Whole-Genome sequencing, targeted panel and even RNA sequencing. This also includes biases introduced by using different mutation calling pipelines resulting in different TMB, sequencing depth and different characteristics of the samples (e.g. low purity samples typically yield lower TMB).”

      (2) In their mathematical model of neoantigens and immunogenicity it is assumed that the probability of a mutation to be immunogenic is constant for all mutations. In reality this is certainly not satisfied. However, the central conclusion from the model still holds. I think that this is important to discuss in the manuscript.

      We thank the reviewer for this suggestion and now consider the case where each mutation has its own probability p(i) of being immunogenic.

      “Our model shows that achieving about constant 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} for 𝑁 > 10 − 20 mutations, requires and . The same argument holds when each mutation has its  own probability to be immunogenic 𝑝(𝑖), then , where is the mean probability of a mutation to be immunogenic. Thus only the average probability of a mutation to be immunogenic matters. In summary, we find that the model agrees with clinical data if individual non-synonymous mutations have, on average, 𝑝~10 − 20% chance for triggering an immune response.”
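
      The statement that only the average probability matters can be checked numerically. The sketch below is an illustration, not the authors' code: it assumes independent mutations, computes the exact probability of at least k_crit immunogenic mutations when each mutation has its own probability (a Poisson-binomial tail), and compares it with the equal-probability case at the same mean. The function name and the example probabilities are hypothetical.

```python
def tail_probability(probs, k_crit=1):
    """P(at least k_crit successes) for independent trials with probabilities `probs`."""
    dist = [1.0]  # dist[j] = probability of exactly j successes so far
    for p in probs:
        updated = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            updated[j] += mass * (1 - p)   # this mutation is not immunogenic
            updated[j + 1] += mass * p     # this mutation is immunogenic
        dist = updated
    return sum(dist[k_crit:])


# Hypothetical example: 20 mutations, half with p = 0.05 and half with p = 0.15.
mixed = [0.05, 0.15] * 10
mean_p = sum(mixed) / len(mixed)
uniform = [mean_p] * len(mixed)

for k_crit in (1, 2):
    print(k_crit, tail_probability(mixed, k_crit), tail_probability(uniform, k_crit))
```

      For small individual probabilities the two columns agree closely. The agreement is approximate rather than exact, so the mean is a good summary in this regime and not a strict identity.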

      (3) In the mathematical formula on page 8, C_N^k is the binomial coefficient. This should be stated or written out.

      Thank you for pointing this out. Corrected.

      “Due to immunodominance, only a few (𝑘crit) immunogenic mutations are sufficient to elicit a full immune response. Hence, the probability for a cancer with 𝑁 (=TMB) mutations to elicit an immune response is then the probability of having 𝑘crit or more immunogenic mutations among 𝑁:

      $$P\{\text{immune response}\} = \sum_{k=k_{\mathrm{crit}}}^{N} \binom{N}{k}\, p^{k} (1-p)^{N-k},$$

      which is the CDF of a binomial distribution.”
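
      A minimal numerical sketch of this model, assuming independent mutations that are each immunogenic with the same probability p: the probability of an immune response is the upper tail P(X ≥ k_crit) of X ~ Binomial(N, p). The function name and the parameter values in the loop are chosen for illustration and do not come from the manuscript.

```python
from math import comb


def p_immune_response(n_mutations, p, k_crit=1):
    """P(X >= k_crit) for X ~ Binomial(n_mutations, p)."""
    # Sum the probabilities of having fewer than k_crit immunogenic mutations,
    # then take the complement.
    upper = min(k_crit, n_mutations + 1)
    below = sum(
        comb(n_mutations, k) * p**k * (1 - p) ** (n_mutations - k)
        for k in range(upper)
    )
    return 1.0 - below


# Two illustrative regimes: rare immunogenicity (p = 0.01) and p = 0.1.
for p in (0.01, 0.1):
    for n in (1, 10, 20, 50, 100):
        print(p, n, round(p_immune_response(n, p, k_crit=1), 3))
```

      With p = 0.01 the probability keeps rising slowly across the whole range of N, whereas with p = 0.1 it is already close to its ceiling by a few tens of mutations, which is the saturation behavior the authors describe.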

      (4) The mathematical model provides an explanation that tumors with a low TMB can also respond to ICB. It cannot explain tumors with high TMB lacking ICB response. An explanation of this phenomenon is discussed in the paper but I think also the impact of the tumor immune microenvironment should be mentioned here.

      As we explained in the presentation of the model, even immunogenic tumors elicit response to ICB with some probability. In the revision we write:

      “𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} = 𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒|𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} · 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒}, where 𝑃{𝑐𝑙𝑖𝑛𝑖𝑐𝑎𝑙 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒|𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} is the probability of clinical response, given that cancer elicits an immune response which is complex and depends on many factors including tumor immune microenvironment. Yet the prerequisite for the clinical response is the immune response 𝑃{𝑖𝑚𝑚𝑢𝑛𝑒 𝑟𝑒𝑠𝑝𝑜𝑛𝑠𝑒} that we focus on.”

      Reviewer #2 (Public Review):

      The manuscript points out that TMB cut-offs are not strong predictors of response to immunotherapy or overall survival. By randomly shuffling TMB values within cohorts to simulate a null distribution of log-rank test p-values, they show that under correction, the statistical significance of previously reported TMB cut-offs for predicting outcomes is questionable.
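
      The shuffling procedure summarized here can be sketched as a permutation test for the best cutoff. This is a simplified stand-in, not the manuscript's analysis: it swaps the log-rank statistic for the difference in response rate between TMB-high and TMB-low groups, and the function names, the minimum group size, and the toy cohort are all assumptions.

```python
import random


def best_cutoff_statistic(tmb, responded, min_group=5):
    """Largest |rate(TMB >= c) - rate(TMB < c)| over all candidate cutoffs c."""
    best = 0.0
    for cutoff in sorted(set(tmb)):
        high = [r for t, r in zip(tmb, responded) if t >= cutoff]
        low = [r for t, r in zip(tmb, responded) if t < cutoff]
        if len(high) < min_group or len(low) < min_group:
            continue  # skip cutoffs that leave a very small group
        best = max(best, abs(sum(high) / len(high) - sum(low) / len(low)))
    return best


def permutation_p_value(tmb, responded, n_perm=1000, seed=0):
    """Share of shuffled cohorts whose best cutoff does at least as well as the real one."""
    rng = random.Random(seed)
    observed = best_cutoff_statistic(tmb, responded)
    shuffled = list(tmb)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any TMB-response link, keep both marginals
        if best_cutoff_statistic(shuffled, responded) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy cohort with no built-in association between TMB and response.
rng = random.Random(42)
toy_tmb = [rng.randint(1, 60) for _ in range(30)]
toy_response = [rng.random() < 0.3 for _ in range(30)]
print(permutation_p_value(toy_tmb, toy_response, n_perm=500))
```

      Because the cutoff search is repeated inside every shuffle, the resulting p-value already accounts for having tried many cutoffs, which is the multiple-testing issue raised in the review.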

      We would like to thank the reviewer for their thoughtful suggestions and efforts towards improving our manuscript.

      There is a clinical need for a better prediction of treatment response than TMB alone can provide. However, no part of the analysis challenges the validity of the well-known pan-cancer correlation between TMB and immunotherapy response.

      We address the pan-cancer correlation in the supplemental text and Figure S3. We realized the supplemental text was missing from the eLife submission and included only in the bioRxiv version. We apologize for this oversight. In particular, we show that the “well-known pan-cancer correlation” is largely based on a few outlier cancer subtypes: MSI colorectal cancers and uveal/ocular melanomas. We show that when we remove these cancer types from the pan-cancer dataset, the correlation becomes non-significant for the remaining 15 cancer types.

      The failure to detect significant TMB cut-offs may be due to insufficient power, as the examined cohorts have relatively low sample sizes. A power analysis would be informative of what cohort sizes are needed to detect small to modest effects of TMB on immune response.

      Since we see no effect, we cannot perform a power analysis. Moreover, increasing cohort sizes cannot increase the effect -- dramatic misclassification of responders (the fraction of responders below the treatment cutoff) would remain the same, making TMB unsuitable for clinical decision-making.

      The manuscript provides a simple model of immunogenicity that is tailored to be consistent with a claimed lack of relationship between TMB and response to immunotherapy. Under the model, if each mutation that a tumor has acquired has a relatively high probability of being immunogenic (~10%, they suggest), and if 1-2 immunogenic mutations is enough to induce an immune response, then most tumors produce an immune response, and TMB and response should be uncorrelated except in very low-TMB tumors.

      Contrary to the reviewer’s suggestion, our modeling is not tailored to be consistent with the lack of association between TMB and response. Rather, we found that the model has two regimes: a first regime (where p<<1) in which higher TMB leads to a higher probability of response, which doesn’t agree with the data, and a second regime (p~0.1) in which cancers with TMB>10-20 are immunogenic, consistent with the clinical data.

      We further expanded on these key points in the Results:

      “The model shows two different behaviors. If individual mutations are unlikely to be immunogenic (𝑝 ≪ 1) , e.g. due to a low probability of being presented, the probability of response increases gradually with TMB (Figure 5B). The neoantigen theory generally expects such gradual increase in immunogenicity of cancer with TMB. Yet, available data (Figure 2) don’t show such a trend.

      On the contrary, if mutations are more likely to be immunogenic (𝑝~0.1), the probability of response quickly saturates (Figure 5C), making such tumors respond to ICB irrespective of TMB, as we observed in clinical data.”

      We also expanded on these key points in the Introduction:

      “We develop a simple model that is based on the neoantigen theory and find that it has two regimes. In one regime, the probability of response increases gradually with TMB, as commonly believed. Yet in the other, the probability of response saturates after a few mutations, making a chance to respond independent of TMB. Our analysis of the clinical data is consistent with the latter regime. Thus our model shows that the neoantigen theory is fully consistent with the lack of association between TMB and response.”

      The question then becomes whether the response is sufficient to wipe out tumor cells in conjunction with immunotherapy, which is essentially the same question of predicting response that motivated the original analysis. While TMB alone is not an excellent predictor of treatment response, the pan-cancer correlation between TMB and response/survival is highly significant, so the model's only independent prediction is wrong.

      Our study indicates that TMB is a very poor predictor (writing that it’s “not an excellent predictor of treatment response” is an understatement). Moreover, we show that the widely believed “pan-cancer correlation” is shaky as well (Supplemental text and Figure S3). So we don’t see any contradictions between the model and the data.

      Additionally, experiments to predict and validate neoepitopes suggest that a much smaller fraction of nonsynonymous mutations produce immune responses1,2.

      We agree with the reviewer. That’s exactly what the model suggests.

      A key idea that is overlooked in this manuscript is that of survivorship bias: self-evidently, none of the mutations found at the time of sequencing have been immunogenic enough to provoke a response capable of eliminating the tumor. While the authors suggest that immunoediting "is inefficient, allowing tumors to accumulate a high TMB," the alternative explanation fits the neoepitope literature better: most mutations that reach high allele frequency in tumor cells are not immunogenic in typical (or patient-specific) tumor environments. Of course, immunotherapies sometimes succeed in overcoming the evolved immune evasion of tumors. Higher-TMB tumors are likely to continue to have higher mutation rates after sequencing; increased generation of new immunogenic mutations may partially explain their modestly improved responses to therapy.

      We disagree with the reviewer's assertion that survivorship bias could explain the observed phenomena. If immunogenic mutations that arise during cancer development were eliminated (by purifying selection, i.e. reduced fitness or cellular death) then observed mutations would carry noticeable signatures of purifying selection. On the contrary, cancer genomic data shows incredibly weak signals of purifying selection on non-synonymous mutations (Weghorn and Sunyaev, Nature Genetics 2017), and observed passenger mutations are practically indistinguishable from random in their effect on proteins (McFarland et al., PNAS 2013).

      We do agree with the statement that “most mutations … in tumor cells are not immunogenic”. In fact that’s exactly what our model predicts: (1-p)~90% of mutations in the model are non-immunogenic, while the remaining p~10% are sufficient to trigger an immune response. We clarify this in the text of the paper: “On the contrary, if mutations are more likely to be immunogenic (𝑝~0.1), the probability of response quickly saturates (Figure 5C), making such tumors respond to ICB irrespective of TMB, as we observed in clinical data.”

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Defining TMB as "number of non-synonymous mutations": while TMB is not consistently defined throughout the literature, it is usually given as a rate rather than a total count, and sometimes synonymous mutations are included. Consider adopting the definition used by the TMB Harmonization Project: "number of somatic mutations per megabase of interrogated genomic sequence.3"

      We thank the reviewer for their comment.

      Be more specific about your findings, so that abstract readers can get some understanding of your proposed explanation for the "immunogenicity of neoantigens and the lack of association between TMB and response."

      We thank the reviewer for their comment. We modified the abstract to explain that the theory we developed expands the neoantigen theory yet can be consistent with the observed lack of association between TMB and response:

      "Second, we develop a model that expands the neoantigen theory and can be consistent with both immunogenicity of neoantigens and the lack of association between TMB and response. Our analysis shows that the use of TMB in clinical practice is not supported by available data and can deprive patients of treatment to which they are likely to respond.”

      Introduction

      Again, consider using a more standard definition of TMB.

      We thank the reviewer for their comment. Our study did not seek to harmonize TMB across the datasets and we thus used the total number of mutations rather than the mutational rate often used for comparison across different datasets.

      Expand the introduction to provide a preview of the purpose and direction of your analysis. The current draft reveals only that the analysis will relate to TMB.

      We expanded the introduction providing the motivation, the approach, and the summary of main findings.

      “Using a biomarker to stratify and prioritize patients for treatment runs a risk of depriving patients who have a chance to respond to a life-saving treatment. High variability of response makes relying on a predictor particularly risky. Hence, we revisit original data that were used to establish correlation between TMB and response. We tested TMB as a predictor of both binary responder/non-responder labels from original clinical studies, as well as continuous survival data. We also investigated whether a TMB threshold could distinguish patients with high and low survival after multiple hypothesis testing. We find that no TMB threshold performs better on the clinical data than on randomized ones.

      We further show that irrespective of the strategy to choose the threshold, even if we were to employ the optimal TMB cutoff, it would still lead to about 25% of responders falling below the treatment prioritization threshold. In addition, we re-examine the pan-cancer association of TMB with response rate to ICB.

      “Finally we revisit the neoantigen theory that was the rationale for using TMB as a predictor of response to immunotherapy. The theory stipulates that non-synonymous mutations can lead to the production of unique antigens (neoantigens) that are recognized by the immune system as foreign, triggering the immune response to cancer. The theory further assumes that the more mutations a cancer has, the more likely it triggers the immune system, and the more likely it will benefit from immunotherapy. We develop a simple model that is based on the neoantigen theory and find that it has two regimes. In one regime, the probability of response increases gradually with TMB, as commonly believed. Yet in the other, the probability of response saturates after a few mutations, making a chance to respond independent of TMB. Our analysis of the clinical data is consistent with the latter regime. Thus our model shows that the neoantigen theory is fully consistent with the lack of association between TMB and response.”

      Section: Is TMB associated with response after treatment?

      The claim that after excluding melanoma and some colorectal cancers, there is no relationship between TMB and response rates in pan-cancer studies cites references 12 and 14. In reference 12 (Yarchoan et al.), it is clear from glancing at their Figure 1 that a pan-cancer correlation between TMB and response would remain with these cancer types excluded. This discrepancy requires explanation. "Supplementary text" is cited for this claim, but it was not included in the file that I received.

      We address the pan-cancer correlation in the supplemental text and Figure S3. While the figure was available, we realized the supplemental text was missing from the eLife submission. We apologize for this oversight.

      Plots of survival and TMB do not show "visible correlation": Please strengthen this claim with an appropriate statistical test.

      We expand the figure caption to explain the following:

      “Plots of progression-free survival and TMB for melanoma and lung cancer ICB cohorts show the lack of correlation or of an obvious TMB cutoff. Computing a simple correlation for survival and censored data cannot correctly represent the dependence since patients who are alive live longer than the reported survival, and limiting correlation to patients who are dead would bias the analysis. Thus other survival statistics are used through the paper.”

      Section: Model reconciles neoantigen theory and data

      Page 8: In the probability formula, the C term is not defined. My guess is that it means choose(N, k).

      Please clarify.

      Thank you for pointing this out. Corrected using more conventional notation.

      which is the CDF of a binomial distribution.

      Page 8: Assuming the above, P(immune response) = P(X >= k_crit); where X~Bin(N, p). The formula should be explicitly introduced in terms of the CDF of the binomial distribution to prevent readers from thinking the wheel is being re-invented.

      We thank the reviewer for pointing this out, we modified the equation in the text to make it easier to see this point (see above). We refrain from going further since the CDF of a binomial distribution doesn’t have a closed form and can only be written as the regularized incomplete beta function.

      Page 9: Missing word in "allowing cancers with as little as mutations to be"

      We thank the reviewer for pointing this out, we modified the text accordingly.

      See comments in public review. In brief, I think a convincing case is made regarding the significance of TMB cut-offs as predictors of survival within cancer types, but frankly this elementary model is not compelling.

      Section: Materials and Methods

      In the manuscript, it is stated that TMB is accepted as reported by data sources. Since most of the comparisons in the manuscript are within-data-source, that is acceptable. However, it should be ensured that TMB measurements are comparable between samples within each source. For example, when TMB is reported as a total mutation count, it can be verified that all samples have the same coverage, or measurement can be converted to mutations per megabase of coverage. In the same vein, if this manuscript's definition of TMB only includes nonsynomous mutations, it should be confirmed that the TMB reported by data sources excludes synonymous mutations.

      We thank the reviewer for their comment. We leverage total TMB as reported in the original studies claiming an association between TMB and response/survival.
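      The normalization mentioned in the reviewer's comment (total mutation count converted to mutations per megabase of coverage) can be sketched as follows. The function name and the example numbers are hypothetical and are not drawn from any of the data sources.

```python
def tmb_per_megabase(mutation_count: int, covered_bases: int) -> float:
    """Convert a raw mutation count to mutations per megabase of sequenced coverage."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return mutation_count / (covered_bases / 1_000_000)


# Hypothetical sample: 300 mutations over a 30 Mb covered region.
print(tmb_per_megabase(300, 30_000_000))
```

      Samples sequenced with different panel or exome sizes only become comparable after this kind of rescaling.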

      Figure S2: Instead of writing "the Youden index associated cutoffs is also plotted," it can be stated that the asterisk represents the Youden index cutoff, or a legend can be added that provides this information.

      We thank the reviewer for pointing this out; we modified the text accordingly.

    1. Author Response:

      Reviewer #1 (Public Review):

      This work makes several contributions: (1) a method for the self-supervised segmentation of cells in 3D microscopy images, (2) a cell-segmented dataset comprising six volumes from a mesoSPIM sample of a mouse brain, and (3) a napari plugin to apply and train the proposed method.

      First, thanks for acknowledging our contributions of a new tool, new dataset, and new software.

      (1) Method

      This work presents itself as a generalizable method contribution with a wide scope: self-supervised 3D cell segmentation in microscopy images. My main critique is that there is almost no evidence for the proposed method to have that wide of a scope. Instead, the paper is more akin to a case report that shows that a particular self-supervised method is good enough to segment cells in two datasets with specific properties.

      We agree that we focus on light-sheet microscopy data; therefore, to narrow the scope, we have changed the title to “CellSeg3D: self-supervised 3D cell segmentation for light-sheet microscopy”.

      To support the claim that their method "address[es] the inherent complexity of quantifying cells in 3D volumes", the method should be evaluated in a comprehensive study including different kinds of light and electron microscopy images, different markers, and resolutions to cover the diversity of microscopy images that both title and abstract are alluding to. The main dataset used here (a mesoSPIM dataset of a whole mouse brain) features well-isolated cells that are easily distinguishable from the background. Otsu thresholding followed by a connected component analysis already segments most of those cells correctly.

      You have selectively dropped the last part of that sentence that is key: “.... 3D volumes, often in cleared neural tissue” – which is what we tackle. The next sentence goes on to say: “We offer a new 3D mesoSPIM dataset and show that CellSeg3D can match state-of-the-art supervised methods.” Thus, we make it clear that our claims concern mesoSPIM and cleared data.

      The proposed method relies on an intensity-based segmentation method (a soft version of a normalized cut) and has at least five free parameters (radius, intensity, and spatial sigma for SoftNCut, as well as a morphological closing radius, and a merge threshold for touching cells in the post-processing). Given the benefit of tweaking parameters (like thresholds, morphological operation radii, and expected object sizes), it would be illuminating to know how other non-learning-based methods will compare on this dataset, especially if given the same treatment of segmentation post-processing that the proposed method receives. After inspecting the WNet3D predictions (using the napari plugin) on the used datasets I find them almost identical to the raw intensity values, casting doubt as to whether the high segmentation accuracy is really due to the self-supervised learning or instead a function of the post-processing pipeline after thresholding.

      First, thanks for testing our tool, and glad it works for you. The deep learning methods we use cannot “solve” this dataset, and we also have an F1-Score (Dice) of ~0.8 with our self-supervised method. We don’t see the value in applying non-learning methods; this is unnecessary and beyond the scope of this work.

      I suggest the following baselines be included to better understand how much of the segmentation accuracy is due to parameter tweaking on the considered datasets versus a novel method contribution:
      - comparison to thresholding (with the same post-processing as the proposed method)
      - comparison to a normalized cut segmentation (with the same post-processing as the proposed method)
      - comparison to references 8 and 9.

      Refs 8 and 9 don’t have readily usable (https://github.com/LiangHann/USAR) or even shared code (https://github.com/Kaiseem/AD-GAN), so re-implementing this work is well beyond the bounds of this paper. We benchmarked Cellpose, StarDist, SegResNets, and a transformer – SwinUNetR. Moreover, models in the MONAI package can be used. Note that, to our knowledge, the transformer results are also a new contribution that the Reviewer does not acknowledge.

      I further strongly encourage the authors to discuss the limitations of their method. From what I understand, the proposed method works only on well-separated objects (due to the semantic segmentation bottleneck), is based on contrastive FG/BG intensity values (due to the SoftNCut loss), and requires tuning of a few parameters (which might be challenging if no ground-truth is available).

      We added text on limitations. Thanks for this suggestion.

      (2) Dataset

      I commend the authors for providing ground-truth labels for more than 2500 cells. I would appreciate it if the Methods section could mention how exactly the cells were labelled. I found a good overlap between the ground truth and Otsu thresholding of the intensity images. Was the ground truth generated by proofreading an initial automatic segmentation, or entirely done by hand? If the former, which method was used to generate the initial segmentation, and are there any concerns that the ground truth might be biased towards a given segmentation method?

      In the already submitted version, we have a 5-page DataSet card that fully answers your questions. They are ALL labeled by hand, without any semi-automatic process.

      In our main text we even stated “Using whole-brain data from mice we cropped small regions and human annotated in 3D 2,632 neurons that were endogenously labeled by TPH2-tdTomato” - clearly mentioning it is human-annotated.

      (3) Napari plugin

      The plugin is well-documented and works by following the installation instructions.

      Great, thanks for the positive feedback.

      However, I was not able to recreate the segmentations reported in the paper with the default settings for the pre-trained WNet3D: segments are generally too large and there are a lot of false positives. Both the prediction and the final instance segmentation also show substantial border artifacts, possibly due to a block-wise processing scheme.

      Your review here does not match your comments above; above you said it was working well, such that you doubt the GT is real and the data is too easy as it was perfectly easy to threshold with non-learning methods.

      You would need to share more details on what you tried. We suggest following our code; namely, we provide the full experimental code and processing for every figure, as was noted in our original submission: https://github.com/C-Achard/cellseg3d-figures.

      Reviewer #2 (Public Review):

      Summary:

      The authors propose a new method for self-supervised learning of 3d semantic segmentation for fluorescence microscopy. It is based on a WNet architecture (Encoder / Decoder using a UNet for each of these components) that reconstructs the image data after binarization in the bottleneck with a soft n-cuts clustering. They annotate a new dataset for nucleus segmentation in mesoSPIM imaging and train their model on this dataset. They create a napari plugin that provides access to this model and provides additional functionality for training of own models (both supervised and self-supervised), data labeling, and instance segmentation via post-processing of the semantic model predictions. This plugin also provides access to models trained on the contributed dataset in a supervised fashion.

      Strengths:

      (1) The idea behind the self-supervised learning loss is interesting.

      (2) The paper addresses an important challenge. Data annotation is very time-consuming for 3d microscopy data, so a self-supervised method that yields similar results to supervised segmentation would provide massive benefits.

      Thank you for highlighting the strengths of our work and new contributions.

      Weaknesses:

      The experiments presented by the authors do not adequately support the claims made in the paper. There are several shortcomings in the design of the experiment and presentation of the results. Further, it is unclear if results of similar quality as reported can be achieved within the GUI by non-expert users.

      Major weaknesses:

      (1) The main experiments are conducted on the new mesoSPIM dataset, which contains quite small and well separated nuclei. It is unclear if the good performance of the novel self-supervised learning method compared to CellPose and StarDist would hold for datasets with other characteristics, such as larger nuclei with a more complex morphology or crowded nuclei.

      StarDist is not pretrained, we trained it from scratch as we did for WNet3D. We retrained Cellpose and reported the results both with their pretrained model and our best-retrained model. This is documented in Figure 1 and Suppl. Figure 1. We also want to push back and say that they both work very well on this data. In fact, our main claim is not that we beat them, it is that we can match them with a self-supervised method.

      Further, additional preprocessing of the mesoSPIM images may improve results for StarDist and CellPose (see the first point in minor weaknesses). Note: having a method that works better for small nuclei would be an important contribution. But I am uncertain the claims hold for larger and/or more crowded nuclei as the current version of the paper implies.

      Figure 2 benchmarks our method on larger and denser nuclei, but we do not intend to claim this is a universal tool. It was specifically designed for light-sheet (brain) data, and we have adjusted the title to be more clear. But we also show in Figure 2 it works well on more dense and noisy samples, hinting that it could be a promising approach. But we agree, as-is, it’s unlikely to be good for extremely dense samples like in electron microscopy, which we never claim it would be.

      With regard to preprocessing, we respectfully disagree. We trained StarDist (and asked the main developer of StarDist, Martin Weigert, to check our work; he is acknowledged in the paper) and it does very well. We also retrained and optimized Cellpose, and we show it works as well as leading transformer and CNN-based approaches. Again, we only claimed we can be as good as these methods with an unsupervised approach.

      The contribution of the paper would be stronger if a comparison with StarDist / CellPose was also done on the additional datasets from Figure 2.

      We appreciate that more datasets would be ideal, but we always feel it’s best for the authors of tools to benchmark their own tools on data. We only compared others in Figure 1 to the new dataset we provide so people get a sense of the quality of the data too; there we did extensive searches for best parameters for those tools. So while we think it would be nice, we will leave it to those authors to be most fair. We also narrowed the scope of our claims to mesoSPIM data (added light-sheet to the title), which none of the other examples in Figure 2 are.

      (2) The experimental setup for the additional datasets seems to be unrealistic. In general, the description of these experiments is quite short and so the exact strategy is unclear from the text. However, you write the following: "The channel containing the foreground was then thresholded and the Voronoi-Otsu algorithm used to generate instance labels (for Platynereis data), with hyperparameters based on the Dice metric with the ground truth." I.e., the hyperparameters for the post-processing are found based on the ground truth. From the description it is unclear whether this is done a) on the part of the data that is then also used to compute metrics or b) on a separate validation split that is not used to compute metrics. If a): this is not a valid experimental setup and amounts to training on your test set. If b): this is ok from an experimental point of view, but likely still significantly overestimates the quality of predictions that can be achieved by manual tuning of these hyperparameters by a user that is not themselves a developer of this plugin or an absolute expert in classical image analysis, see also 3. Note that the paper provides notebooks to reproduce the experimental results. This is very laudable, but I believe that a more extended description of the experiments in the text would still be very helpful to understand the set-up for the reader. Further, from inspection of these notebooks it becomes clear that hyper-parameters were indeed found on the test set (a), so the results are not valid in the current form.

      We apologize for this confusion; we have now expanded the methods to clarify that the setup is now (b); you can also see exactly what we did in the figure notebook: https://c-achard.github.io/cellseg3d-figures/fig2-b-c-extra-datasets/self-supervised-extra.html#threshold-predictions. For clarity, we now additionally link each individual notebook in the Methods.
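      The distinction between setups (a) and (b) can be illustrated with a minimal sketch in which a post-processing threshold is selected on a validation split and the metric is then reported on a separate test split. The data, candidate thresholds, and function names below are toy assumptions for illustration; they are not the values or code from the figure notebooks.

```python
def dice(pred, truth):
    """Dice coefficient between two equal-length binary sequences."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * intersection / total


def binarize(values, threshold):
    """Label a value as foreground (1) when it reaches the threshold."""
    return [1 if v >= threshold else 0 for v in values]


def pick_threshold(val_values, val_truth, candidates):
    """Choose the candidate with the highest Dice on the validation split only."""
    return max(candidates, key=lambda t: dice(binarize(val_values, t), val_truth))


# Toy validation split (used for tuning) and toy test split (used for reporting).
val_values, val_truth = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
test_values, test_truth = [0.2, 0.45, 0.7, 0.8], [0, 0, 1, 1]

best = pick_threshold(val_values, val_truth, [0.3, 0.5, 0.7])
test_dice = dice(binarize(test_values, best), test_truth)
print(best, test_dice)
```

      The test split never enters `pick_threshold`, so the reported score is not inflated by tuning on the data used for evaluation.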

      (3) I cannot obtain similar results to the ones reported in the manuscript using the plugin. I tried to obtain some of the results from the paper qualitatively: First I downloaded one of the volumes from the mesoSPIM dataset (c5image) and applied the WNet3D to it. The prediction looks ok, however the value range is quite narrow (Average BG intensity ~0.4, FG intensity 0.6-0.7). I try to apply the instance segmentation using "Convert to instance labels" from "Utilities". Using "Voronoi-Otsu" does not work due to an error in pyClesperanto ("clGetPlatformIDs failed: PLATFORM_NOT_FOUND_KHR"). Segmentation via "Connected Components" and "Watershed" requires extensive manual tuning to get a somewhat decent result, which is still far from perfect.

      We are sorry to hear of the installation issue; pyClesperanto is a dependency that is required to reproduce the images (it sounds like you encountered this issue: https://forum.image.sc/t/pyclesperanto-prototype-doesnt-work/45724). We have now explicitly added the fix to our docs: https://github.com/AdaptiveMotorControlLab/CellSeg3D/pull/90. We recommend checking the reproduction notebooks (which were linked in the initial submission): https://c-achard.github.io/cellseg3d-figures/intro.html.

      Then I tried to obtain the results for the Mouse Skull Nuclei Dataset from EmbedSeg. The results look like a denoised version of the input image, not a semantic segmentation. I was skeptical from the beginning that the method would transfer without retraining, due to the very different morphology of nuclei (much larger and elongated). None of the available segmentation methods yield a good result, the best I can achieve is a strong over-segmentation with watersheds.

      - We are surprised to hear this; did you follow the following notebook which directly produces the steps to create this figure? (This was linked in preprint): https://c-achard.github.io/cellseg3d-figures/fig2-c-extra-datasets/self-supervised-extra.html

      -  We have made a video demo for you such that any step that might be unclear is also more clear to a user: (https://youtu.be/U2a9IbiO7nE).

      -  We also expanded the methods to include the exact values from the notebook into the text.

      Minor weaknesses:

      (1) CellPose can work better if images are resized so that the median object size in new images matches the training data. For CellPose the cyto2 model should do this automatically. It would be important to report if this was done, and if not would be advisable to check if this can improve results.

      We reported this value in Figure 1 and found it to work poorly, that is why we retrained Cellpose and found good performance results (also reported in Figure 1). Resizing GB to TB volumes for mesoSPIM data is otherwise not practical, so simply retraining seems the preferable option, which is what we did.

      (2) It is a bit confusing that F1-Score and Dice Score are used interchangeably to evaluate results. The dice score only evaluates semantic predictions, whereas F1-Score evaluates the actual instance segmentation results. I would advise to only use F1-Score, which is the more appropriate metric. For Figure 1f either the mean F1 score over thresholds or F1 @ 0.5 could be reported. Furthermore, I would advise adopting the recommendations on metric reporting from https://www.nature.com/articles/s41592-023-01942-8.

      We are using the common metrics in the field for instance and semantic segmentation, and report them in the methods. In Figure 2f we actually report the “Dice” as defined in StarDist (as we stated in the Methods). Note, their implementation is functionally equivalent to F1-Score of an IoU >= 0, so we simply changed this label in the figure now for clarity. We agree this clarifies for the expert readers what was done, and we expanded the methods to be more clear about metrics. We added a link to the paper you mention as well.
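      To make the distinction raised in this exchange concrete, the sketch below contrasts a voxel-wise Dice score on foreground masks with an object-level F1 score at an IoU threshold. Objects are represented as sets of voxel indices, and a simple greedy matching is assumed for illustration; this is not the StarDist implementation, and all names and data are hypothetical.

```python
def dice_semantic(pred_fg: set, true_fg: set) -> float:
    """Voxel-wise Dice between predicted and true foreground voxel sets."""
    total = len(pred_fg) + len(true_fg)
    return 1.0 if total == 0 else 2.0 * len(pred_fg & true_fg) / total


def iou(a: set, b: set) -> float:
    """Intersection over union of two voxel sets."""
    union = len(a | b)
    return 0.0 if union == 0 else len(a & b) / union


def f1_instances(pred_objs, true_objs, iou_threshold=0.5):
    """Object-level F1: a prediction is a true positive when its best unmatched
    ground-truth object overlaps it with IoU >= iou_threshold (greedy matching)."""
    matched, tp = set(), 0
    for pred in pred_objs:
        best_j, best_iou = None, 0.0
        for j, truth in enumerate(true_objs):
            if j in matched:
                continue
            score = iou(pred, truth)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp, fn = len(pred_objs) - tp, len(true_objs) - tp
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Two touching ground-truth cells that a prediction merges into one object.
true_objs = [{0, 1, 2}, {3, 4, 5, 6}]
pred_objs = [{0, 1, 2, 3, 4, 5, 6}]

semantic = dice_semantic(set().union(*pred_objs), set().union(*true_objs))
instance = f1_instances(pred_objs, true_objs, iou_threshold=0.5)
print(semantic, instance)
```

      In this toy case the foreground masks agree perfectly, so the semantic Dice is maximal, while the merged cell counts as one missed object and lowers the instance-level F1.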

      (3) A more conceptual limitation is that the (self-supervised) method is limited to intensity-based segmentation, and so will not be able to work for cases where structures cannot be distinguished based on intensity only. It is further unclear how well it can separate crowded nuclei. While some object separation can be achieved by morphological operations this is generally limited for crowded segmentation tasks and the main motivation behind the segmentation objective used in StarDist, CellPose, and other instance segmentation methods. This limitation is only superficially acknowledged in "Note that WNet3D uses brightness to detect objects [...]" but should be discussed in more depth.

      Note: this limitation does not mean at all that the underlying contribution is not significant, but I think it is important to address this in more detail so that potential users know where the method is applicable and where it isn't.

      We agree, and we added a new section specifically on limitations. Thanks for raising this good point. Thus, while self-supervision saves hundreds of hours of manual labor, it comes at the cost of a more limited set of regimes in which it can work. This is why we do not claim it should replace excellent methods like Cellpose or StarDist, but rather that it complements them and can be used on mesoSPIM samples, as we show here.

    1. Author Response:

      We thank the reviewers for their thoughtful comments on our manuscript. In this provisional response, we aim to address the major concerns raised and outline a plan for a revised version of the manuscript. A more detailed point-by-point response will follow with the revision.

      The reviewers appreciated our efforts to combine computational modelling with experimental work. However, they also expressed the need for more clarity in explaining how the model was set up, what was simulated, and what the insights and limitations are. In the revision, we plan to improve the discussion section to clarify all of these points. 

      The reviewers also highlighted the need for more transparency regarding the code and the mathematical formulas used in this study. We agree that this is an important issue. While we have already made the software and code for our computational model, along with instructions on how to run it, available in Zenodo (see Ref. 1), and have extensively described the original computational model and formulas in a 13-page supplementary file in our previous study (see Ref. 2), we recognize from the reviewers’ comments that additional transparency is needed. To address this, we will provide an appendix in the revision that includes a full model description, covering the incorporation of cell differentiation and death, a list of parameters, and details on how parameter values were chosen.

      Additionally, in the revised manuscript, we will add a paragraph to more thoroughly discuss the limitations of our approach, as well as avenues for future studies. We hope this will clarify both the capabilities and limitations of our model in a way that is more accessible to readers of eLife.

      References:

      1. Virtual Thymus Model (version 2.0). Published: Jun 14, 2024.  doi:10.5281/zenodo.11656320

      2. Aghaallaei, Narges, et al. "αβ/γδ T cell lineage outcome is regulated by intrathymic cell localization and environmental signals." Science Advances 7.29 (2021): eabg3613.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors intended to prove that gut GLP-1 expression and secretion can be regulated by Piezo1, and hence by mechanistic/stretching regulation. For this purpose, they have assessed Piezo1 expression in STC-1 cell line (a mouse GLP-1 producing cell line) and mouse gut, showing the correlation between Piezo1 level and Gcg levels (Figure S1). They then aimed to generate gut L cell-specific Piezo1 KO mice, and claimed the mice show impaired glucose tolerance and GLP-1 production, which can be mitigated by Ex-4 treatment (Figures 1-2). Pharmacological agents (Yoda1 and GsMTx4) and mechanic activation (intestinal bead implantation) were then utilized to prove the existence of ileal Piezo1-regulated GLP-1 synthesis (Figure 3). This was followed by testing such mechanism in a limited amount of primary L cells and mainly in the STC-1 cell line (Figures 4-7).

      While the novelty of the study is somewhat appreciable, the bio-medical significance is not well demonstrated in the manuscript. The authors stated (in lines 78-83) a number of potential side effects of GLP-1 analogs; how can the mechanistic study of GLP-1 production on its own be essential for the development of new drug targets for the treatment of diabetes? Furthermore, the study does not provide a clear mechanistic insight into how the claimed CaMKKbeta/CaMKIV-mTORC1 signaling pathway upregulated both GLP-1 production and secretion. This reviewer also has concerns about the experimental design and data presented in the current manuscript, including the issue of how proglucagon expression can be assessed by Western blotting.

      Strengths:

      The novelty of the concept.

      Weaknesses:

      Experimental design and key experiment information.

      We appreciate the reviewer's comments. Nowadays, GLP-1-based therapy is well-recognized and commonly used in treatment of Type 2 Diabetes Mellitus (T2DM). Therefore, elucidation of the mechanism that regulates GLP-1 production is essential for the development of new drug targets for the treatment of diabetes. We have revised the relevant wording in the manuscript.

      In our previous studies, we have elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice, and rapamycin or L-leucine treatment to modulate mTOR activity, we have demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15:416:9-18.). Building on these studies, we found in the present study that Piezo1 regulated the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through a Ca2+/CaMKKbeta/CaMKIV pathway. Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provided evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production.

      The reviewer also expressed concerns on the use of western blot to detect proglucagon expression. Proglucagon is encoded by the GCG gene and is cleaved by PC1/3 in L cells to form mature GLP-1. In fact, measurement of intestinal proglucagon protein is a common approach for assessing GLP-1 production in the intestine. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody used in our study was purchased from abcam (Cat#ab23468), which can detect proglucagon at 21 kDa.

      Reviewer #2 (Public Review):

      Summary:

      The study by Huang and colleagues focuses on GLP-1 producing entero-endocrine (EEC) L-cells and their regulation of GLP-1 production by a mechano-gated ion channel Piezo1. The study describes Piezo1 expression by L-cells and uses an exciting intersectional mouse model (villin to target epithelium and Gcg to target GLP-1-producing cells and others like glucagon-producing pancreatic endocrine cells), which allows L-cell specific Piezo1 knockout. Using this model, they find an impairment of glucose tolerance, increased body weight, reduced GLP-1 content, and changes to the CaMKKbeta-CaMKIV-mTORC1 signaling pathway using a normal diet and then high-fat diet. Piezo1 chemical agonist and intestinal bead implantation reversed these changes and improved the disrupted phenotype. Using primary sorted L-cells and cell model STC-1, they found that stretch and Piezo1 activation increased GLP-1 and altered the molecular changes described above.

      Strengths:

      This is an interesting study testing a novel hypothesis that may have important mechanistic and translational implications. The authors generated an important intersectional genetics mouse model that allowed them to target Piezo1 L-cells specifically, and the surprising result of impaired metabolism is intriguing.

      Weaknesses:

      However, there are several critical limitations that require resolution before making the conclusions that the authors make.

      (1) A potential explanation for the data, and one that is consistent with existing literature [see for example, PMC5334365, PMC4593481], is that epithelial Piezo1, which is broadly expressed by the GI epithelium, impacts epithelial cell density and survival, and as such, if Piezo1 is involved in L-cell physiology, it may be through regulation of cell density. Thus, it is critical to determine L-cell densities and epithelial integrity in controls and Piezo1 knockouts systematically across the length of the gut, since the authors do not make it clear which gut region contributes to the phenotype they see. Current immunohistochemistry data are not convincing.

      We appreciate the reviewer's comment and agree that Piezo1 may impact L-cell density and epithelial integrity. To address this, we have incorporated quantification of L-cell density in new Figure Supplement 7. The quantitative results demonstrate that the specific deletion of the Piezo1 gene in L cells did not significantly impact L-cell density.

      Regarding epithelial integrity, we assessed the expression of tight junction proteins (ZO-1 and Occludin). As demonstrated in new Figure Supplement 8, the expression of tight junction proteins such as ZO-1 and Occludin did not show significant changes in IntL-Piezo1-/- mice compared to littermate controls.

      Furthermore, we conducted double immunofluorescence of Piezo1 and GLP-1 in the duodenum, jejunum, ileum, and colon of control and IntL-Piezo1-/- mice. As illustrated in new Figure Supplement 5, Piezo1 is expressed in GLP-1-positive cells of the duodenum, jejunum, ileum, and colon of control mice, but not in IntL-Piezo1-/- mice.

      (2) Calcium signaling in L-cells is implicated in their typical role of being gut chemo-sensors, and Piezo1 is a calcium channel, so it is not clear whether any calcium-related signaling mechanism would phenocopy these results.

      We agree with the reviewer that Piezo1 is a calcium channel (validation of Piezo1-mediated Ca2+ influx in primary L cells and STC-1 cells is shown in Figure 4A-C and Figure 5A-C). According to our study, calcium-related signaling mechanisms such as calcium/calmodulin-dependent protein kinase kinase 2 (CaMKKβ)-calcium/calmodulin-dependent protein kinase IV (CaMKIV) may contribute to the phenotype seen in the IntL-Piezo1-/- mice. In addition, we also discussed other potential calcium-related signaling mechanisms in the article's discussion section (lines 645-656).

      (3) Intestinal bead implantation, while intriguing, does not have clear mechanisms and is likely to provide a point of intestinal obstruction and dysmotility.

      We appreciate the reviewer’s comment. To ascertain whether intestinal bead implantation led to intestinal obstruction and dysmotility, we conducted a bowel transit time test and measured postoperative defecation (as shown in new Figure Supplement 9). The results revealed no difference in bowel transit time and fecal mass between the sham-operated mice and those implanted with beads. Furthermore, to assess whether the animals were in pain or under any discomfort after intestinal bead implantation, we performed an abdominal mechanical sensitivity test three days after the surgery. As indicated in Figure Supplement 9C, no difference in abdominal pain threshold was observed between sham and bead-implanted mice. These results suggest that the mice did not experience discomfort during the experiment.

      (4) Previous studies, some that are very important, but not cited, contradict the presented results (e.g., epithelial Piezo1 role in insulin secretion) and require reconciliation.

      Thank you for raising this point. We have cited additional previous studies. The lack of changes in blood glucose seen in Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21.). In fact, in another recent study from our group, we found similar results when the Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed a normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after 8 weeks of high-fat diet. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang*, Geyang Xu*, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024; https://doi.org/10.1016/j.apsb.2024.04.016).

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns

      (1) Figure 1L was labeled incorrectly, and the co-localization was not clear. The KO leads to such a strong effect on the percentage of GLP-1 positive cells (panel M) but was not clearly demonstrated with immune-staining. Additional experiments are needed to prove tissue-specific knockout in gut GLP-1-producing cells only, but not in other cell lineages or elsewhere. If so, how was the change in gut Gcg mRNA expression? Importantly, this reviewer is not clear on how to use Western blotting to measure proglucagon expression in the tissue samples. What is the size of the product? The antibody information was not provided in the manuscript. Figure 1N shows a potential mechanism that affects GLP-1 production involving mTORC and downstream molecules. This comes from nowhere.

      We appreciate the reviewer's feedback. The incorrect label has been corrected in the new Figure 1L. As suggested, we have performed additional experiments to demonstrate tissue-specific knockout of Piezo1 in gut GLP-1-producing cells exclusively, excluding other cell lineages or locations.

      As shown in Figure Supplement 6, Piezo1 remains expressed in ileal ghrelin-positive cells and pancreatic glucagon-positive cells of IntL-Piezo1-/- mice, suggesting that Piezo1 was specifically knocked out in L cells, but not in other endocrine cell types. Furthermore, the decrease was only observed in GLP-1 levels, but not PYY levels, in L cells of IntL-Piezo1-/- mice compared to controls, suggesting that the loss of Piezo1 in L cells affects GLP-1 levels specifically, but not the secretion of other hormones produced by L cells (Figure Supplement 7A-D).

      In our previous studies, we have elucidated the role of the mTOR/S6K pathway in regulating GLP-1 production in L cells. Using the STC-1 cell line and different mouse models, including Neurog3-Tsc1−/− mice and rapamycin or L-leucine treatment to modulate mTOR activity, we have demonstrated that mTOR stimulates proglucagon gene expression and thus GLP-1 production (Diabetologia 2015;58(8):1887-97; Mol Cell Endocrinol. 2015 Nov 15;416:9-18). Building on these studies, we found in our present study that Piezo1 regulated the mTOR/S6K pathway, and thus proglucagon expression and GLP-1 production, through a Ca2+/CaMKKbeta/CaMKIV pathway.

      Although we could not exclude the involvement of other signaling pathways downstream of Piezo1 in regulating the cleavage of proglucagon, granule maturation and the final release of GLP-1, our present study provided evidence to support the involvement of the Ca2+/CaMKKbeta/CaMKIV/mTOR pathway in mediating the role of Piezo1 in proglucagon expression and GLP-1 production.

      The reviewer also expressed concerns on the use of western blot to detect proglucagon expression. Proglucagon is encoded by the GCG gene and is cleaved by PC1/3 in L cells to form mature GLP-1. In fact, measurement of intestinal proglucagon protein is a common approach for assessing GLP-1 production in the intestine. Here are some examples from other researchers: Diabetes. 2013 Mar;62(3):789-800. Gastroenterology. 2011 May;140(5):1564-74. 2004 Jul 23;279(30):31068-75. The proglucagon antibody used in our study was purchased from abcam (Cat#ab23468), which can detect proglucagon at 21 kDa.

      (2) In Figure 2, the LFD control mouse group was missing. Again, I don't understand the detection of proglucagon by Western blotting in this figure.

      We appreciate the reviewer's comments. Figure 1 presents the phenotypic changes of transgenic mice under low-fat diet feeding, while Figure 2 focuses on the phenotypic changes of transgenic mice under high-fat diet feeding. As we mentioned before, western blot is often used to detect proglucagon, the precursor of GLP-1.

      (3) Why show body weight change but not body weight itself? How are the changes compared (which one serves as the control)? Again, how to do Western blotting on pro-glucagon detection?

      We appreciate the reviewer's comments. Body weight has been added in the new Figure 3. Proglucagon is the precursor of GLP-1. Intestinal proglucagon protein measurement is commonly used to assess GLP-1 production in the intestine.

      (4) After reading the whole manuscript, this reviewer cannot get a clear picture of how the claimed CaMKKbeta-mTORC1 pathway mediates the function of Piezo1 activation (via the utilization of Yoda1 or intestinal bead implantation) on Gcg expression (at the transcription level or mRNA stability level?), hormone production, the genesis of GLP-1 producing cells, and the secretion of the hormone.

      We appreciate the reviewer's comments. Figure 7 showed that overexpression of CaMKKbeta and CaMKIV enhanced mTOR and S6K phosphorylation, proglucagon expression and GLP-1 secretion, while the CaMKKbeta inhibitor STO609 inhibited mTOR and S6K phosphorylation, proglucagon expression and GLP-1 secretion, suggesting that CaMKKbeta and CaMKIV were involved in GLP-1 production. Moreover, the mTOR inhibitor rapamycin inhibited Yoda1-induced proglucagon expression and GLP-1 secretion. These results suggested that CaMKKbeta/CaMKIV/mTOR mediated the effect of Piezo1 on GLP-1 production.

      I strongly suggest that authors focus on more solid findings and dissect the mechanistic insight on something more meaningful, but not on everything (hormone coding gene expression, hormone production, and hormone secretion).

      GLP-1 production involves multiple steps, including proglucagon expression, protein cleavage, granule packaging and final release. In our present study, we focused on how mechanical signals regulated proglucagon expression in L-cells and thus promoted GLP-1 production. We did not exclude the possibility that mechanical signals could also affect other steps of GLP-1 production, and we discussed this possibility in the discussion section.

      Minor concerns

      (1) Figure S1A. STC-1 is a Gcg-expressing cell line, which shows a lower amount of Piezo1 mRNA when compared with most primary tissue samples tested. This does not support the fundamental role of Piezo1 in regulating Gcg expression. Maybe qRT-PCR will be more helpful for establishing the correlation.

      Thanks a lot for the comments. As suggested, the results of qRT-PCR have been added in new Figure S1A.

      (2) There are numerous scientific presentation problems in the written manuscript. Necessary literature citations are missing, especially for key methods (such as bead implantation).

      Thank you very much for your comments. We have made every effort to enhance the scientific presentation and have included the necessary literature citations.

      Reviewer #2 (Recommendations For The Authors):

      Overall, this study makes an interesting observation but the data are not currently strong enough to support the conclusions.

      (1) There needs to be data localizing Piezo1 to L-cells and importantly, this needs to be quantified - are all L-cells (small bowel and colon) Piezo1 positive?

      Thank you very much for your comments. We performed double immunofluorescence of Piezo1 and GLP-1 in the duodenum, jejunum, ileum, and colon of control and IntL-Piezo1-/- mice. As shown in new Figure Supplement 5, Piezo1 is expressed in about 90% of GLP-1-positive cells in the duodenum, jejunum, ileum, and colon of control mice, but not in IntL-Piezo1-/- mice.

      (2) The intersectional model for L-cell transduction needs deeper validation. Images in Figure 1e are not convincing for the transduction of GFP in L-cells. The co-localization studies are not convincing, especially because Piezo1 labeling is very broad. There needs to be stronger validation of the intersectional Gcg-Villin-Piezo1 KO model. It is important to determine whether L-cell Piezo1 localization epithelium in the small bowel and colon is present (above) and affected specifically in the knockout.

      Thanks a lot for the comments. In our study, we conducted a double immunofluorescence analysis for Piezo1 and GLP-1 across various segments of the gastrointestinal tract, including the duodenum, jejunum, ileum, and colon, in both control and IntL-Piezo1-/- mice. As illustrated in the newly incorporated Figure Supplement 5, Piezo1 is expressed in GLP-1-positive cells of these gastrointestinal segments in control mice. In contrast, no Piezo1 expression was detected in these cells in the IntL-Piezo1-/- mice. Consistent with these findings, in situ hybridization experiments corroborated the absence of Piezo1 expression within GLP-1-positive cells in the IntL-Piezo1-/- mice, offering evidence for the successful knockout of Piezo1 in the L cells of these knockout mice (Figure 1L and M).

      In Figure 1E, IntL-Cre mice were bred with mT/mG reporter mice to further validate Cre recombinase activity and specificity. All tissues and cells of mT/mG mice express red fluorescence (membrane-targeted tdTomato; mT) at baseline, and switch to membrane-targeted EGFP in the presence of cell-specific Cre. EGFP expression was observed only in scattered cells in the intestine, and not in the pancreas, indicating intestine-specific Cre activity in the IntL-Cre mice (Figure 1E). We have revised the relevant expressions in the main text.

      (3) The authors state that "Villin-1 (encoded by Vill1 gene) is expressed in the gastrointestinal epithelium, including L cells, but not in pancreatic α cells" (lines 378-379). However, Villin is highly expressed in whole mouse islets (https://doi.org/10.1016/j.molmet.2016.05.015, Figure 1A).

      Thanks a lot for the comments. Although Hassan Mziaut et al. reported that Villin is highly expressed in whole mouse islets, that article only describes the co-localization of Villin with insulin-positive cells; the co-localization of glucagon and Villin was not examined.

      According to our research (refer to Author response image 1 below) and a previous study (Rutlin, M. et al., 2020, The Villin1 Gene Promoter Drives Cre Recombinase Expression in Extraintestinal Tissues. Cell Mol Gastroenterol Hepatol, 10(4), 864-867.e865), Villin is sparsely expressed in pancreatic tissue but not highly expressed in islets. We did not observe co-localization of glucagon and Villin in the pancreas (see Author response image 1A and B below). The same antibody was used to stain the intestine, where it showed specific expression on the apical side of the intestinal villi (see Author response image 1C below).

      Author response image 1.

      (4) There needs to be quantification of L-cells in Piezo1 knockout. This is because several studies show Piezo1 affecting epithelial cell densities. If there are changes in L-cell or other EEC densities in Piezo1 knockout, that shift can potentially explain the changes that the authors see in glucose metabolism and weight.

      We appreciate the reviewer’s comment. We agree that Piezo1 may affect L-cell density and epithelial integrity.

      To assess epithelial integrity we examined the expression of tight junction proteins (ZO-1 and Occludin). As shown in new Figure Supplement 8, the expression of tight junction proteins, including ZO-1 and Occludin, remained unchanged in IntL-Piezo1-/- mice when compared to littermate controls.

      To assess the L-cell density, we stained for PYY, another hormone mainly secreted by L cells, in both control and IntL-Piezo1-/- mice. As shown in new Figure Supplement 7A and B, the percentage of PYY-positive cells was not significantly different between control and IntL-Piezo1-/- mice, suggesting that the L-cell density was not affected by Piezo1 knockout.

      (5) L-cells are classically considered to be chemosensors. Do nutritive signals, which presumably also increase calcium, compete with, complement, or dominate the regulation of L-cell GLP1 synthesis?

      We appreciate the reviewer's comment and agree that L-cells are traditionally considered to be chemosensors. It is also recognized that nutritive signals regulate L-cell GLP1 synthesis. We have addressed these points in lines 568-595. Both nutritive and mechanical signals regulate GLP-1 production. While the food needs to be digested and nutrients absorbed before L-cells can detect the nutritive signals, mechanical stimulation provides a more direct and rapid response. However, determining whether nutritive signals compete with, complement, or dominate mechanical signals in L-cell GLP-1 production will require further exploration.

      (6) The mechanism of Glp1 synthesis vs release downstream of Piezo1 is not clear. The authors hypothesize that "Piezo1 might regulate GLP-1 synthesis through the CaMKKβ/CaMKIV-mTOR signaling pathway". However, references cited suggest that Ca2+ or cAMP leads to GLP-1-release, while mTOR primarily acts on the regulation of gene expression by promoting Gcg gene expression. These pathways do not clearly link to Piezo1 GLP-1 production. These mechanisms need to be reconciled.

      Thanks a lot for the point. The effect of the Piezo1-mediated Ca2+ increase on GLP-1 production may be two-fold: it may promote Gcg gene expression through CaMKKβ/CaMKIV-mTOR and promote GLP-1 release by degranulation. Both gene expression and release are important for sustained GLP-1 production.

      (7) Previous study PMID 32640190 (not cited here) found that Villin-driven Piezo1 knockout, which knocks out Piezo1 from all epithelial intestinal cells (including L-cells), showed no significant alterations in blood glucose or body weight. This is the opposite of the presented findings and therefore the current results require reconciliation.

      We have cited PMID 32640190 in our revised manuscript. The lack of changes in blood glucose seen in Villin-Piezo1-/- mice reported by Sugisawa et al. is not surprising (Cell. 2020 Aug 6;182(3):609-624.e21). In another recent study from our group, we found similar results when the Villin-Piezo1-/- mice and Piezo1fl/fl control mice were fed a normal chow diet. Since Villin-1 is expressed in all the epithelial cells of the gut, including enterocytes and various types of endocrine cells, the effect of L-cell Piezo1 loss may be masked by other cell types under normal conditions. However, impaired glucose tolerance was seen in Villin-Piezo1-/- mice compared to the Piezo1fl/fl control mice after 8 weeks on a high-fat diet. We further found that Piezo1 in enterocytes exerted a negative effect on glucose and lipid absorption. Loss of Piezo1 in enterocytes led to over-absorption of nutrients under a high-fat diet (Tian Tao, Qing Shu, Yawen Zhao, Wenying Guo, Jinting Wang, Yuhao Shi, Shiqi Jia, Hening Zhai, Hui Chen, Cunchuan Wang, Geyang Xu, Mechanical regulation of lipid and sugar absorption by Piezo1 in enterocytes, Acta Pharmaceutica Sinica B, Accepted, 2024, https://doi.org/10.1016/j.apsb.2024.04.016).

      Reviewing Editor (Recommendations For The Authors):

      Your paper - while innovative in concept and interesting - has many flaws that in my opinion need to be corrected before the paper and pre-print should be published or uploaded as pre-print. Can you please make every effort to address the missing data that the Reviewers have asked for and correct the lack of references as noted in the reviews? Thank you.

      Thank you for the invaluable suggestions provided by the editors and reviewers. In response to these suggestions, we have included the missing data as requested and rectified the lack of references to the best of our ability. We hope that these revisions will effectively address the concerns raised by the editors and reviewers.

    1. Author response:

      Reviewer #1:

      Weaknesses:

      (1) The activity of the dominant negatives lacks appropriate controls. This is crucial given that mouse mutants for PG5, PG6, PG7, and three of the four PG4 genes show no major effects on limb induction or growth. Understanding these discrepancies is essential.

      Given the importance of the Loss of Function (LOF) experiments, we will provide additional evidence for the validity of the dominant-negative strategy and constructs used.

      (2) The authors mention redundancies in Hox activity, consistent with numerous previous reports. However, they only use single dominant-negative versions of each Hox paralog gene individually. If Hox4 and Hox5 functions are redundant, experiments should include simultaneous dominant negatives for both groups.

      To clarify redundancies in Hox activity, we will test whether simultaneous expression of dominant-negative forms of more than one Hox gene induces a stronger effect than expression of a single dominant-negative Hox gene.

      (3) The main conclusion that Hox4 and Hox5 provide permissive cues on which Hox6/7 induce the forelimb is not sufficiently supported by the data. An experiment expressing simultaneous dnHox4/5 and Hox6/7 is needed. If the hypothesis is correct, this should block Hox6/7's capacity to expand the limb bud or generate an extra bulge.

      We agree that this is an excellent additional experiment to corroborate our conclusion and will perform this experiment in our revision.

      (4) The identity of the extra bulge or extended limb bud is unclear. The only marker supporting its identity as a forelimb is Tbx5, while other typical limb development markers are absent. Tbx5 is also expressed in other regions besides the forelimb, and its presence does not guarantee forelimb identity. For instance, snakes express Tbx5 in the lateral mesoderm along much of their body axis.

      To date, Tbx5 is the best marker for the forelimb. While it is true that the Tbx5 expression is broader than the limb field, this occurs only at early stages before forelimb bud formation. We will work towards a further definition of this extra bulge.

      (5) It is important to analyze the skeletons of all embryos to assess the effect of reduced limb buds upon dnHox expression and determine whether extra skeletal elements develop from the extended bud or ectopic bulge.

      We have analysed the cartilage structure of operated embryos with GOF experiments and found no skeletal elements within the ectopic wing bud in the neck. Additionally, in our revision, we can further analyse the wing skeleton of operated embryos with LOF experiments, which would provide more detailed assessments of the impact of dominant-negative Hox genes on wing bud formation.

      Reviewer #2:

      Weaknesses

      (1) By contrast to the GOF experiments that induce ectopic limb budding, the LOF experiments, which use dominant negative forms of Hoxa4, Hoxa5, Hoxa6, and Hoxa7, are more challenging to interpret due to the absence of data on the specificity of the dominant negative constructs. Absent such controls, one cannot be certain that effects on limb development are due to disruption of the specific Hox proteins that are being targeted.

      We will revise our manuscript to clarify the specificity of the dominant-negative strategy used.

      (2) A test of their central hypothesis regarding the necessity and sufficiency of the Hox genes under investigation would be to co-transfect the neck with full-length Hoxa6/a7 AND the dnHoxA4/a5. If their hypothesis is correct, then the dn constructs should block the limb-inducing ability of Hoxa6/a7 overexpression (again, validation of specificity of the DN constructs is important here).

      This is an excellent idea and we will implement the experiment in our revision.

      (3) The paper could be strengthened by providing some additional data, which should already exist in their RNA-Seq dataset, such as supplementary material that shows the actual gene expression data that are represented in the Venn diagram, heatmap, and GO analysis in Figure 3.

      We will incorporate this suggestion and include additional data from our RNA-seq analysis.

      (4) The results of these experiments in chick embryos are rather unexpected based on previous knockout experiments in mice, and this needs to be discussed.

      In our revision, we will appropriately expand the discussion on the discrepancies observed between knockout mouse models and our chick embryo experiments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Fernandez et al. investigate the influence of maternal behavior on bat pup vocal development in Saccopteryx bilineata, a species known to exhibit vocal production learning. The authors performed detailed longitudinal observations of wild mother-pup interactions to ask whether non-vocal maternal displays during juvenile vocal practice or 'babbling', affect vocal production. Specifically, the study examines the durations of pup babbling events and the developmental babbling phase, in relation to the amount of female display behavior, as well as pup age and the number of nearby singing adult males. Furthermore, the authors examine pup vocal repertoire size and maturation in relation to the number of maternal displays encountered during babbling. Statistical models identify female display behavior as a predictor of i) babbling bout duration, ii) the length of the babbling phase, iii) song composition, and iv) syllable maturation. Notably, these outcomes were not influenced by the number of nearby adult males (the pups' source of song models) and were largely independent of general maturation (pup age). These findings highlight the impact of non-vocal aspects of social interactions in guiding mammalian vocal development.

      We thank Reviewer 1 for the time and effort dedicated to the revision of our study. The suggestions for the revision of our manuscript were very helpful and will improve our manuscript. 

      Strengths:

      Historically, work on developmental vocal learning has focused on how juvenile vocalizations are influenced by the sounds produced by nearby adults (often males). In contrast, this study takes the novel approach of examining juvenile vocal ontogeny in relation to non-vocal maternal behavior, in one of the few mammals known to exhibit vocal production learning. The authors collected an impressive dataset from multiple wild bat colonies in two Central American countries. This includes longitudinal acoustic recordings and behavioral monitoring of individual mother-pup pairs, across development.

      The identified relationships between maternal behavior and bat pup vocalizations have intriguing implications for understanding the mechanisms that enable vocal production learning in mammals, including human speech acquisition. As such, these findings are likely to be relevant to a broad audience interested in the evolution and development of social behavior as well as sensory-motor learning.

      We thank reviewer 1 for this assessment. 

      Weaknesses:

      The authors qualitatively describe specific patterns of female displays during pup babbling, however, subsequent quantitative analyses are based on two aggregate measures of female behavior that pool across display types. Consequently, it remains unclear how certain maternal behaviors might differentially influence pup vocalizations (e.g. through specific feedback contingencies or more general modulation of pup behavioral states).

      In analyzing the effects of maternal behavior on song maturation, the authors focus on the most common syllable type produced across pups. This approach is justified based on the syllable variability within and across individuals, however, additional quantification and visual presentation of categorized syllable data would improve clarity and potentially strengthen resulting claims.

      We agree that our analysis of maternal behaviour does not investigate potential contingencies between particular maternal behavioural displays and pup vocalizations (e.g. particular syllable types). Our data collected for this study on maternal behaviour includes direct observations, field notes and/or video recordings. In the future, it will be necessary to work with high-speed cameras, which allow this kind of fine-detailed analysis, to examine potential contingencies between particular maternal behavioural displays and specific pup vocalizations. We have planned future studies investigating whether pup vocalizations elicit contingent maternal responses or vice versa. In the revision of our manuscript, we will include a comment pointing out that this behaviour will be investigated in greater detail in the future.

      As suggested by reviewer 1, in our revised manuscript we will include more information on methods to improve understandability. In particular, we will:

      - present more information on different steps of our acoustic analyses

      - provide additional and clearer spectrogram figures representing the different syllable types and categorizations 

      - change the figures accompanying our GLMM analyses following the suggestion of Reviewer 1

      Reviewer #2 (Public review):

      Summary:

      This study explores how maternal behaviors influence vocal learning in the greater sac-winged bat (Saccopteryx bilineata). Over two field seasons, researchers tracked 19 bat pups from six wild colonies, examining vocal development aspects such as vocal practice duration, syllable repertoire size, and song syllable acquisition. The findings show that maternal behaviors significantly impact the length of daily babbling sessions and the overall babbling phase, while the presence of adult male tutors does not.

      The researchers conducted detailed acoustic analyses, categorizing syllables and evaluating the variety and presence of learned song syllables. They discovered that maternal interactions enhance both the number and diversity of learned syllables and the production of mature syllables in the pups' vocalizations. A notable correlation was found between the extent of acoustic changes in the most common learned syllable type and maternal activity, highlighting the key role of maternal feedback in shaping pups' vocal development.

      In summary, this study emphasizes the crucial role of maternal social feedback in the vocal development of S. bilineata. Maternal behaviors not only increase vocal practice but also aid in acquiring and refining a complex vocal repertoire. These insights enhance our understanding of social interactions in mammalian vocal learning and draw interesting parallels between bat and human vocal development.

      We thank reviewer 2 for his/her time and effort dedicated to the revision of our study. The suggestions were very helpful in improving our manuscript. 

      Strengths:

      This paper makes significant contributions to the field of vocal learning by looking at the role of maternal behaviors in shaping the vocal learning phenotype of Saccopteryx bilineata. The paper uses a longitudinal approach, tracking the vocal ontogeny of bat pups from birth to weaning across six colonies and two field seasons, allowing the authors to assess how maternal interactions influence various aspects of vocal practice and learning, providing strong empirical evidence for the critical role of social feedback in non-human mammalian vocal learners. This kind of evidence highlights the complexity of the vocal learning phenotype and shows that it goes beyond the right auditory experience and having the right circuitry.

      The paper offers a nuanced understanding of how specific maternal behaviors impact the acquisition and refinement of the vocal repertoire, while showing the number of male tutors - the source of adult song - did not have much of an effect. The correlation between maternal activity and acoustic changes in learned syllable types is a novel finding that underscores the importance of non-vocal social interactions in vocal learning. In vocal learning research, with some notable exceptions, experience is often understood as auditory experience. This paper highlights how, even though that is one important piece of the puzzle, other kinds of experience directly affect the development of vocal behavior. This is of particular importance in the case of a mammalian species such as Saccopteryx bilineata, as this kind of result is perhaps more often associated with avian species.

      Moreover, the study's findings have broader implications for our understanding of vocal learning across species. By drawing parallels between bat and human vocal development (and in some ways to bird vocal development), the paper highlights common mechanisms that may underlie vocal practice and learning in both humans and other mammals. This interdisciplinary perspective enriches the field and encourages further comparative studies, ultimately advancing our knowledge of the evolutionary and developmental processes that shape vocal productive learning in all its dimensions.

      Weaknesses:

      Some weaknesses can be pointed out, but in fairness, the authors acknowledge them in one way or another. As such, these are not flaws per se, but gaps that can be filled with further research.

      Experimental manipulations, such as controlled playback experiments or controlled environments, could strengthen the causal claims by directly testing the effects of specific maternal behaviors on vocal development. Certainly, the strengths of the paper will be consolidated after such work is performed.

      The study relies on the number of singing males as a proxy for social acoustic input. This measure does not account for the variability in the quality, frequency, or duration of the male songs to which the pups are exposed. A more detailed analysis of the acoustic environment, including direct measurements of song exposure and its impact on vocal learning, would provide a clearer understanding of the role of male tutors.

      Finally, and although it would be unlikely that these results are unique to Saccopteryx bilineata, the study's focus on a single species limits at present the generalizability of some of its findings to other vocal learning mammals. While the parallels drawn between bat and human vocal development are intriguing, the conclusions will be more robust when supported by comparative studies involving multiple species of vocal learners. This will help to identify whether the observed maternal influences on vocal development reported here are unique to Saccopteryx bilineata or represent a broader phenomenon in chiropteran, mammalian, or general vocal learning. Expanding the scope of research to include a wider range of species and incorporating cross-species comparisons will significantly enhance the contribution of this study to the field of vocal learning.

      Thank you for your suggestions and comments. 

      Regarding your main comment 1: In the future, we plan to implement temporary captivity experiments to investigate how maternal behaviours affect pup vocal development. This study provides the necessary basis for conducting future playback studies investigating specific behaviours in a controlled environment.

      Regarding your main comment 2: We completely agree that the number of singing males only represents a proxy for the acoustic input that pups receive during ontogeny. In the future, we plan to investigate in detail how the acoustic landscape influences pup vocal development and learning. This will include quantifying how long pups are exposed to song during ontogeny and assessing the influence of different tutors, including a detailed analysis of the song syllables of adult tutors to compare them with the vocal trajectories of song syllables in pups.

      Regarding your main comment 3: We also fully agree that it is unlikely that these results are unique to Saccopteryx bilineata. We are certain that other mammalian vocal learners show parallels to the vocal development and learning processes of S. bilineata. Bats are an especially promising taxon for comparative studies because their vocal production and perception systems are highly sophisticated (due to their ability to echolocate). This highly social taxon also encompasses a variety of social systems and vocal capacities (e.g. regarding vocal repertoire size, vocal learning capacities, information content, etc.) which support social learning and social feedback, as shown in our study.

      As suggested, in our revised manuscript we will include information on the validation of the ethogram. Furthermore, we will correct all the spelling mistakes – thank you very much for pointing them out!

    1. Author response:

      We appreciate all the reviewers for their encouraging comments and thoughtful feedback. We are confident that we can incorporate many of the suggestions to provide a clearer overall picture in the revised manuscript. In particular, we agree with the reviewers' concern that some of our methodological decisions, including our choice of metrics, require further clarification. We will focus on revising the methods section to make these decisions more transparent and to address any misunderstandings related to the analysis.

      We also value the request to include more data, such as intermediate results and additional control analyses. We will carefully assess which results to include in the main manuscript and which to provide in an extended supplementary section.

      To offer a more detailed understanding of our quantification of "prediction tendency," we refer to our previous work (Schubert et al., 2023, 2024), where we elaborate on our analytical choices in great detail and provide additional control analyses (e.g., ensuring that the relationship with speech tracking is not driven by participants' signal-to-noise ratio; Schubert et al., 2023).

      Additionally, we would like to clarify that the aim of this manuscript is not to analyze viewing behavior in depth but to replicate the general finding of ocular speech tracking, as presented in Gehmacher et al. (2024). A thorough investigation of specific ocular contributions (e.g., microsaccades or blinks) would require a separate research question and distinct analysis approaches, given the binary nature of such events.

      Nevertheless, we share the reviewers' interest in independent results from the current study, and we plan to carefully select and present the most relevant findings in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their careful and overall positive evaluation of our work and the constructive feedback! To address the main concerns, we have:

      – Clarified a major misunderstanding of our instructions: Participants were only informed that they would receive different stimuli of medium intensity and were thus not aware that the stimulation temperature remained constant

      – Implemented a new analysis to evaluate how participants rated their expectation and pain levels in the control condition

      – Added a paragraph in the discussion in which we argue that our paradigm is comparable to previous studies

      Below, we provide responses to each of the reviewers’ comments on our manuscript.

      Reviewer #1 (Public Review):

      Summary:  

      In this important paper, the authors investigate the temporal dynamics of expectation of pain using a combined fMRI-EEG approach. More specifically, by modifying the expectations of higher or lower pain on a trial-to-trial basis, they report that expectations largely share the same set of activations before the administration of the painful stimulus, and that the coding of the valence of the stimulus is observed only after the nociceptive input has been presented. fMRI-informed EEG analysis suggested that the temporal sequence of information processing involved the Dorsolateral prefrontal cortex (DLPFC), the anterior insula, and the anterior cingulate cortex. The strength of evidence is convincing, and the methods are solid, but a few alternative interpretations about the findings related to the control group, as well as a more in-depth discussion on the correlations between the BOLD and EEG signals would strengthen the manuscript. 

      Thank you for your positive evaluation! In the revised version of the manuscript, we elaborated on the control condition and the BOLD-EEG correlations in more detail.

      Strengths:  

      In line with open science principles, the article presents the data and the results in a complete and transparent fashion. 

      From a theoretical standpoint, the authors make a step forward in our understanding of how expectations modulate pain by introducing a combination of spatial and temporal investigation. It is becoming increasingly clear that our appraisal of the world is dynamic, guided by previous experiences, and mapped on a combination of what we expect and what we get. New research methods, questions, and analyses are needed to capture these evolving processes.  

      Thank you very much for these positive comments!

      Weaknesses:  

      The control condition is not so straightforward. Across the manuscript it is defined as "no expectation", and in the legend of Figure 1 it is mentioned that the third state would be "no prediction". However, it is difficult to conceive that participants would not have any expectations or predictions. Indeed, in the description of the task it is mentioned that participants were instructed that they would receive stimuli during "intermediate sensitive states". The results of the pain scores and expectations might support the idea that the control condition is situated in between the placebo and nocebo conditions. However, since this control condition was not part of the initial conditioning, and participants had no reference to previous stimuli, one might expect that some ratings might have simply "regressed to the mean" for a lack of previous experience. 

      General considerations and reflections:  

      Inducing expectations in the desired direction is not a straightforward task, and results might depend on the exact experimental conditions and the comparison group. In this sense, the authors' choice of having 3 groups of positive, negative, and "neutral" expectations is to be praised. On the other hand, also control groups form their expectations, and this can constitute a confounder in every experiment using expectation manipulation, if not appropriately investigated. 

      Thank you for raising these important concerns! Firstly, as it seems that we did not explain the experimental procedure in a clear fashion, there appeared to be a general misunderstanding regarding our instructions. We want to emphasize that we did not tell participants that the stimulus intensity would always be the same, but that pain stimuli would be different temperatures of medium intensity. Furthermore, our instruction did not necessarily imply that our algorithm detected a state of medium sensitivity, but that the algorithm would not make any prediction, e.g., due to highly fluctuating states of pain sensitivity, or no clear-cut state of high or low pain sensitivity. We changed this in the Methods (ll. 556-560, 601-606, 612-614) and Results (ll. 181-192) sections of the manuscript to clarify these important features of our procedure.

      Then, we absolutely agree that participants explicitly and implicitly form expectations regarding all conditions over time, including the control condition. We carefully considered your feedback and rephrased the control condition, no longer framing it as eliciting “no expectations” but as “neutral expectations” in the revised version of the manuscript. This follows the more common phrasing in the literature and acknowledges that participants indeed build up expectations in the control condition. However, we do still think that we can meaningfully compare the placebo and nocebo condition to the control condition to investigate the neuronal underpinnings of expectation effects. Independently of whether participants build up an expectation of “medium” intensities in the control condition, which caused them to perceive stimuli in line with this expectation, or if they simply perceived the stimuli as they were (of medium intensity) with limited effects of expectations, the crucial difference to the placebo and nocebo conditions is that there was no alteration of perception due to previous experiences or verbal information and no shift of perception from the actual stimulus intensity towards any direction in the control condition. This allowed us to compare the neural basis of a modulation of pain perception in either direction to a condition in which this modulation did not take place. 

      Author response image 1.

      Variability within conditions over time. Relative variability index for expectation (left) and pain ratings (right) per condition and measurement block. 

      Lastly, we want to highlight that our finding of the control condition being rated in between the placebo and nocebo condition is in line with many previous studies that included similar control conditions and advanced our understanding of pain-related expectations (Bingel et al., 2011; Colloca et al., 2010; Shih et al., 2019). We thank the reviewer for the very interesting idea to evaluate the development of ratings in the control condition in more detail and added a new analysis to the manuscript in which we compared how much intra-subject variance there was within the ratings of each of the three conditions and how much this variance changed over time. To this end, we computed the relative variability index (Mestdagh et al., 2018), a measure that quantifies intra-subject variation over multiple ratings, and compared it between the three conditions and the three measurement blocks. We observed differences in variance between conditions for both expectation (F(2,96) = 8.14, p < .001) and pain ratings (F(2,96) = 3.41, p = .037). For both measures, post-hoc tests revealed that there was significantly more variance in the placebo compared to the control condition (both p_holm < .05), but no difference between control and nocebo. The substantial and comparable variation in pain and expectation ratings in all three conditions (or at least between control and nocebo) shows that participants did not always expect and perceive the same intensity within conditions. Variance in expectation ratings decreased from the first block to the other two blocks (F(1.35,64.64) = 5.69, p = .012; both p_holm < .05), which was not the case for pain ratings. Most importantly, there was no interaction effect of block and condition for either expectation (F(2.65,127.06) = 0.40, p = .728) or pain ratings (F(4,192) = 0.48, p = .748), which implies that expectations were similarly dynamically updated in all conditions over the course of the experiment. 
      This speaks against a “regression to the mean” in the control condition and shows that control ratings fluctuated from trial to trial. We included this analysis and a more in-depth discussion of the choice of conditions in the Results (ll. 219-232) and Discussion (ll. 452-486) sections of the revised manuscript.
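      To make the variability measure concrete, the following is a minimal sketch in the spirit of the relative variability index (Mestdagh et al., 2018): the observed standard deviation of one participant's ratings divided by the largest standard deviation attainable given the mean of those ratings and the bounds of the scale. It is not the analysis code used for the manuscript. The 0-100 scale bounds, the function names, and the example ratings are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def max_sd(m, n, lo, hi):
    """Largest sample SD of n values in [lo, hi] whose mean is m.

    The maximum is reached when all values but one sit on a bound and
    the remaining value absorbs whatever is needed to keep the mean.
    """
    if n < 2 or m <= lo or m >= hi:
        return 0.0
    # Number of values placed on the upper bound (small epsilon guards
    # against floating-point error just below an integer).
    k = min(math.floor(n * (m - lo) / (hi - lo) + 1e-9), n - 1)
    # The single value that is not forced onto a bound.
    rest = n * m - k * hi - (n - 1 - k) * lo
    ss = k * (hi - m) ** 2 + (n - 1 - k) * (lo - m) ** 2 + (rest - m) ** 2
    return math.sqrt(ss / (n - 1))


def relative_sd(ratings, lo=0.0, hi=100.0):
    """Observed SD divided by the maximum SD possible for this mean."""
    bound = max_sd(mean(ratings), len(ratings), lo, hi)
    return stdev(ratings) / bound if bound > 0 else float("nan")


# Hypothetical ratings of one participant in one condition and block.
example = relative_sd([40, 60])
```

      Values near 0 indicate ratings that barely change across trials, and values near 1 indicate ratings that swing between the ends of the scale. Dividing by the attainable maximum keeps the measure from depending on the mean, which matters when conditions differ in their average rating.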

      In addition, although fMRI is still (probably) the best available tool we have to understand the spatial representation of cortical processing, limitations about not only the temporal but even the spatial resolution should be acknowledged. Given the anatomical and physiological complexity of the cortical connections, as we know from the animal world, it is still well possible that subcircuits are activated also for positive and negative expectations, but cannot be observed due to the limitation of our techniques. Indeed, on an empirical/evolutionary basis it would remain unclear why we should have a system that waits for the valence of a stimulus to show differential responses. 

      We agree that the spatial resolution of fMRI is limited and that our signal is often not able to dissociate different subcircuits. Whether on this basis differential processes occurred cannot be observed in fMRI but is indeed possible. We now include this reasoning in our Discussion (ll. 373-377):

      “Importantly, the spatial resolution of fMRI is limited when it comes to discriminating whether the same pattern of activity is due to identical activation or to activation in different sub-circuits within the same area. Nonetheless, the overlap of areas is an indicator for similar processes involved in a more general preparation process.”

      Also, moving in a dimension of network and graph theory, one would not expect single areas to be responsible for distinct processes, but rather that they would integrate information in a shared way, potentially with different feedback and feedforward communications. As such, it becomes more difficult to assume the insula is a center for coding potential pain, perhaps more of a node in a system that signals potential dangers for the integrity of the body. 

      We appreciate the feedback on our interpretation of our results and agree that the overall network activity most likely determines how a large part of expectations and pain are coded. We therefore adjusted the Discussion, embedding the results in an interpretation considering networks (ll. 427-430, 432-435, 438-442). 

      The authors analyze the EEG signal between 0.5 to 128 Hz, finding significant results in the correlation between single-trial BOLD and EEG activity in the higher gamma range (see Figure 6 panel C). It would be interesting to understand the rationale for including such high frequencies in the signal, and the interpretation of the significant correlation in the high gamma range. 

      On a technical level, we adapted our EEG processing pipeline from Hipp et al. (2011), who similarly investigated signals up to 128 Hz. Of note, the spectral smoothing was adjusted to match 3/4 octave, meaning that the frequency resolution at 128 Hz is rather broad and the estimate does not only contain oscillations at exactly 128 Hz. Gamma oscillations in general have repeatedly been reported in relation to pain and feedforward signals reflecting noxious information (e.g. Ploner et al., 2017; Strube et al., 2021). Strube et al. (2021) reported the highest effects of pain stimulus intensity and prediction error processing at high gamma frequencies (100 and 98 Hz, respectively). These findings could also serve as a basis to interpret our results in this frequency range: If anticipatory activation in the ACC is linked to high gamma oscillations, which appear to play an important role in feedforward signaling of pain intensity and prediction errors, this could indicate that later processing of intensity in this area is already pre-modulated before the stimulus actually occurs. Of note: although not significant, it looks as if the cluster extends further into pain processing on a descriptive level. We added additional explanation regarding the interpretation of the correlation in the Discussion (ll. 414-425):

      “The link between anticipatory activity in the ACC and EEG oscillatory activity was observed in the high gamma band, which is consistent with findings that demonstrate a connection between increased fMRI BOLD signals and a relative shift from lower to higher frequencies (Kilner et al., 2005). Gamma oscillations have been repeatedly reported in the context of pain and expectations and have been interpreted as reflecting feedforward signals of noxious information (e.g. Ploner et al., 2017; Strube et al., 2021). In combination with our findings, this might imply that high frequency oscillations may not only signal higher actual or perceived pain intensity during pain processing (Nickel et al., 2022; Ploner et al., 2017; Strube et al., 2021; Tu et al., 2016), but might also be instrumental in the transfer of directed expectations from anticipation into pain processing.”
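      As a rough illustration of why an estimate centered at 128 Hz is broad under 3/4-octave spectral smoothing, the sketch below computes the band edges. It assumes that the smoothing band extends 3/8 octave on either side of the center frequency; this symmetric-in-octaves reading and the function name are assumptions of the sketch, not details taken from the manuscript.

```python
def octave_band_edges(center_hz, width_octaves=0.75):
    """Lower and upper edge of a band that is width_octaves wide and
    centered (on a log2 frequency axis) at center_hz."""
    half = width_octaves / 2.0
    return center_hz * 2.0 ** (-half), center_hz * 2.0 ** half


# Band edges for the highest center frequency discussed above.
low_edge, high_edge = octave_band_edges(128.0)
```

      Under this assumption the band around 128 Hz spans roughly 99 to 166 Hz, so it overlaps the high gamma frequencies of about 100 Hz discussed in the response.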

      Reviewer #2 (Public Review):  

      I think this is a very promising paper. The combination of EEG and fMRI is unique and original. However, I also have some suggestions that I think could help improve the manuscript. 

      This manuscript reports the findings of an EEG-fMRI study (n = 50) on the effects of expectations on pain. The combination of EEG with fMRI is extremely original and well-suited to study the transition from expectation to perception. However, I think that the current treatment of the data, as well as the way that the manuscript is currently written, does not fully capitalize on the potential of this unique dataset. Several findings are presented but there is currently no clear message coming out of this manuscript. 

      First, one positive point is that the experimental manipulation clearly worked. However, it should be noted that the instructions used are not typical of studies on placebo/nocebo. Participants were not told that the stimulations would be of higher/lower intensity. Rather, they were told that objective intensities were held constant, but that EEG recordings could be used to predict whether they would perceive the stimulus as more or less intense. I think that this is an interesting way to manipulate expectations, but there could have been more justification in the introduction for why the authors have chosen this unusual procedure. 

      Most importantly, we want to emphasize again that participants were not aware that the stimulation temperature was always the same but were informed that they would receive different stimuli of medium intensity. We now clarify this in the revised Results (ll. 190-192) and Methods (ll. 612-614) sections.

      While we agree that our procedure was not typical, we think that the manipulation is nonetheless comparable to previous studies on pain-related expectations. To our knowledge, either expectations regarding a treatment that changes pain perception (treatment expectancy) or expectations regarding stimulus intensities (stimulus expectancy) are manipulated (see Atlas & Wager, 2014). In our study, participants received a cue that induced expectations in regard to a “treatment”, although in this case the “treatment” came from changes in their own brain activity. This is comparable to studies using TENS devices that supposedly change peripheral pain transmission (Skvortsova et al., 2020). Thus, although not typical, our paradigm could be classified as targeting treatment expectancies and allowed us to examine effects on a trial-by-trial level within subjects. We added a paragraph regarding the comparability of our paradigm with previous studies in the Discussion of the revised manuscript (ll. 452-464).

      Also, the introduction mentions that little is known about potential cerebral differences between expectations of high vs. low pain expectations. I think the fear conditioning literature could be cited here. Activations in ACC, SMA, Ins, parahippocampal gyrus, PAG, etc. are often associated with upcoming threat, whereas activations vmPFC/default mode network are associated with safety. 

      We thank you for your suggestion to add literature on fear conditioning. We agree there is some overlap between fear conditioning and expectation effects in humans, but we also believe there are fundamental differences regarding their underlying processes and paradigms. For example, the expectation effects are not driven by classical learning algorithms but act to a large extent as self-fulfilling prophecies (see e.g. Jepma et al., 2018). However, we now acknowledge the similarities between the two, e.g. in the recruitment of the insula and the vmPFC, in our Introduction (ll. 132-136).

      The fact that the authors didn't observe a clearer distinction between high and low expectations here could be related to their specific instructions that imply that the stimulus is the same and that it is the subjective perception that is expected to change. In any case, this is a relatively minor issue that is easy to address. 

      We apologize again for the lack of clarity in our instructions: Participants were unaware that they would receive the exact same stimulus. The clear effects of the different conditions on expectation and pain ratings also challenge the notion that participants always expected the same level of stimulation and/or perception. Additionally, if participants were indeed expecting a consistent level of intensity in all conditions, one would also assume to see the same anticipatory activation in the control condition as in the placebo and nocebo conditions, which is not the case. Thus, we respectfully disagree that the common effects might be explained by our instructions but would argue that they indeed reflect common (anticipatory) processes of positive and negative expectations.

      Towards the end of the introduction, the authors present the aims of the study in mainly exploratory terms: 

      (1) What are the differences between anticipation and perception? 

      (2) What regions display a difference between high and low expectations (high > low or low < high) vs. an effect of expectation regardless of the direction (high and low different than neutral)? 

      I think these are good questions, but the authors should provide more justification, or framework, for these questions. More specifically, what will they be able to conclude based on their observations? 

      For instance (note that this is just an example to illustrate my point. I encourage the authors to come up with their own framework/predictions) : 

      (1) Possibility #1: A certain region encodes expectations in a directed fashion (high > low) and that same region also responds to perception in the same direction (high > low). This region would therefore modulate pain by assimilating perception towards expectations. 

      (2) Possibility # 2: different regions are involved in expectation and perception. Perhaps this could mean that certain regions influence pain processing through descending facilitation for instance...  

      Thank you for pointing out that our hypotheses were not crafted carefully enough. We tried to give better explanations for the possible interpretations of our hypotheses. Additionally, we interpreted our results against the background of a broader framework for placebo and nocebo effects (predictive coding) to derive possible functions of the described brain areas. We embedded this in our Introduction (ll. 74-86, 158-175) and Discussion (ll. 384-388), interpreting the anticipatory activity and the activity during pain processing in the context of expectation formation as described in Büchel et al. (2014).

      Interpretation derived from our framework (ll. 384-388):

      e.g.: “Following the framework of predictive coding, our results would suggest that the DPMS is the network responsible for integrating ascending signals with descending signals in the pain domain and that this process is similar for positive and negative valences during anticipation of pain but differentiates during pain processing.”

      Regarding analyses, I think that examining the transition from expectations to perception is a strong angle of the manuscript given the EEG-fMRI nature of the study. However, I feel that more could have been done here. One problem is that the sequence of analyses starts by identifying an fMRI signal of interest and then attempts to find its EEG correlates. The problem is that the low temporal resolution of fMRI makes it difficult to differentiate expectation from perception, which doesn't make this analysis a good starting point in my opinion. Why not start by identifying an EEG signal that differentiates perception vs expectation, and then look for its fMRI correlates?  

      We appreciate your feedback on the transition from expectations to perceptions and also think that additional questions could be answered with our data set. However, based on the literature we had specific hypotheses regarding specific brain areas, and we therefore decided to start from the fMRI data, with its superior spatial resolution, and used EEG to focus on the temporal dynamics within the areas important for anticipatory processes. We share the view that many different approaches to analyzing our data are possible. On the other hand, identifying relevant areas based on EEG characteristics entails even more uncertainty due to the spatial filtering of the EEG signal. For the research question of this study, a more accurate evaluation of the involved areas and the related representation was more important. We therefore decided to only implement the procedure already present in the manuscript. 

      Finally, I found the hypotheses on "valenced" vs. "absolute" effects a little bit more difficult to follow. This is because "neutral" is not really neutral: it falls in between low and high. If I follow correctly, participants know that the temperature is always the same. Therefore, if they are told that the machine cannot predict whether their perception is going to be low or high, then it must be because it is likely to be in between. Ratings of expectation and pain ratings confirm that. The neutral condition is not "devoid" of expectations as the authors suggest.

      Therefore, it would make sense to look at regions with the following pattern low > neutral > high, or vice-versa, low < neutral < high. Low & high being different than neutral is more difficult to interpret. I don't think that you can say that it reflects "absolute" expectations because neutral is also the expectation of a medium temperature. Perhaps it reflects "certainty/uncertainty" or something like that, but it is not clear that it reflects "expectations". 

      Thank you for your valuable feedback! We considered your concerns about the interpretation of our results and completely agree that the control condition cannot be interpreted as void of expectations (ll. 119-123). We therefore evaluated the control condition in more detail in a separate analysis (ll. 219-232) and integrated a new assessment of the conditions into the Discussion (ll. 465-486). We changed the phrasing of our control condition to “neutral expectations”, as we agree that the control condition is not void of expectations and this phrasing is more in line with other studies (e.g. Colloca et al., 2010; Freeman et al., 2015; Schmid et al., 2015). We would argue that the neutral expectations can still be meaningfully compared to positive and negative expectations because only the latter shift expectations and perception in one direction. Thus, we changed our wording throughout the manuscript to acknowledge that we indeed did not test for general effects of expectations vs. no expectations, but for effects of directed expectations. Please also see our reasoning regarding the control condition in response to Reviewer 1, in which we addressed the interpretation of the control condition. We therefore still believe that the contrasts that we calculated between conditions are valid. The proposed new contrast largely overlaps with our differential contrast low>high and vice versa already reported in the manuscript (for additional results also see Supplements).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 6, panel C. The figure mentions Anterior Cingulate Cortex R, whereas the legend mentions left ACC. Please check. 

      Thanks for catching this, we changed the figure legend accordingly.

      Reviewer #2 (Recommendations For The Authors):  

      - I don't think that activity during the rating of expectations is easily interpretable. I think I would recommend not reporting it. 

      The majority of participants completed the expectation rating relatively quickly (M = 2.17 s, SD = 0.35 s), which resulted in the overlap between the DLPFC EEG cluster and the expectation rating encompassing only a limited portion of the cluster (~ 1 s). We agree that this activity still is more difficult to interpret, yet we have decided to report it for reasons of completeness.

      - The effects on SIIPS are interesting. I think that it is fine to present them as a "validation" of what was observed with pain ratings, but it also seems to give a direction to the analyses that the authors don't end up following. For instance, why not try other "signatures" like the NPS or signatures of pain anticipation? Also, why not try to look at EEG correlates of SIIPS? I don't think that the authors "need" to do any of that, but I just wanted to let them know that SIIPS results may stir that kind of curiosity in the readers.  

      While this would be indeed very interesting, these additional analyses are not directly related to our current research question. We fear that too many analyses could be confusing for the readers. Nonetheless, we are grateful for your suggestion and will implement additional brain signatures in future studies. 

      - The shock was calibrated to be 60%. Why not have high (70%) and low (30%) conditions at equal distances from neutral, like 80% and 40% for instance? The current design makes it hard to distinguish high from control. Perhaps the "common" effects of high + low are driven by a deactivation for low (30%)?  

      We appreciate your feedback! We adjusted the temperature during the test phase to counteract the habituation that typically occurs with heat stimuli. We believe that this was a good measure, as participants rated the control condition at roughly VAS 50 (M = 51.40), which was our target temperature and would then be equidistant to the VAS 70 and VAS 30 used during conditioning, when no habituation should have taken place yet. We further tested whether participants rated placebo and nocebo trials at equal distances from the control condition and found no bias toward either condition. To do this, we computed the individual placebo effect (control minus placebo) and nocebo effect (nocebo minus control) for each participant during the test phase and statistically compared whether they differed in terms of magnitude. There was no significant difference between placebo and nocebo effects for either expectation (placebo effect M = 14.25 vs. nocebo effect M = 17.22, t(49) = 1.92, p = .061) or pain ratings (placebo effect M = 6.52 vs. nocebo effect M = 5.40, t(49) = -1.11, p = .274). This suggests that our expectation manipulation resulted in comparable shifts in expectation and pain ratings away from the control condition for both the placebo and nocebo condition and thus argues against any bias of the conditioning temperatures. Please also note that the analysis of the common effects was masked for differences between the high and low conditions; therefore, the effects cannot be driven by one condition by itself.
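      The comparison described above can be sketched as follows: per-participant placebo and nocebo effects are computed as differences from the control condition and then compared with a paired t-test. This is a minimal illustration, not the analysis code used for the manuscript. The ratings of the four hypothetical participants, the function name, and the direction of the difference (nocebo effect minus placebo effect) are assumptions.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for the differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical mean ratings per participant and condition (0-100 scale).
control = [50, 52, 48, 50]
placebo = [40, 45, 38, 41]
nocebo = [60, 58, 62, 56]

# Effects as defined in the response: control minus placebo, nocebo minus control.
placebo_effect = [c - p for c, p in zip(control, placebo)]
nocebo_effect = [n - c for n, c in zip(nocebo, control)]

# Do the two effects differ in magnitude within participants?
t, df = paired_t(nocebo_effect, placebo_effect)
```

      With 50 participants the test has 49 degrees of freedom, matching the t(49) values reported above. A p-value would additionally require the t distribution, which is omitted here to keep the sketch free of external dependencies.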

      - If I understand correctly, all fMRI contrasts were thresholded with FWE. This is fine, but very strict. The authors could have opted for FDR. Maybe I missed something here....  

      While it is true that FDR is the more liberal approach, it is not valid for spatially correlated fMRI data and is no longer available in SPM for the correction of multiple comparisons. The newly implemented topological peak-based FDR correction is comparable in sensitivity to the FWE correction (see Chumbley et al. BELEG). We opted for the slightly more conservative approach in our preregistration (p_FWE < .05); changing the correction is therefore not possible.

      Altogether, I think that this is a great study. The combination of EEG and fMRI is truly unique and affords many opportunities to examine the transition from expectations to perception. The experimental manipulation of expectations seems to have worked well, and there seem to be very promising results. However, I think that more could have been done. At least, I would recommend trying to give more of a theoretical framework to help interpret the results.  

      We are very grateful for your positive feedback. We took your suggestion seriously and tried to implement a more general framework from the literature (see Büchel et al., 2014) to provide a better explanation for our results.

      References

      Atlas, L. Y., & Wager, T. D. (2014). A meta-analysis of brain mechanisms of placebo analgesia: Consistent findings and unanswered questions. Handbook of Experimental Pharmacology, 225, 37–69. https://doi.org/10.1007/978-3-662-44519-8_3

      Bingel, U., Wanigasekera, V., Wiech, K., Ni Mhuircheartaigh, R., Lee, M. C., Ploner, M., & Tracey, I. (2011). The effect of treatment expectation on drug efficacy: Imaging the analgesic benefit of the opioid remifentanil. Science Translational Medicine, 3(70), 70ra14. https://doi.org/10.1126/scitranslmed.3001244

      Büchel, C., Geuter, S., Sprenger, C., & Eippert, F. (2014). Placebo analgesia: A predictive coding perspective. Neuron, 81(6), 1223–1239. https://doi.org/10.1016/j.neuron.2014.02.042

      Colloca, L., Petrovic, P., Wager, T. D., Ingvar, M., & Benedetti, F. (2010). How the number of learning trials affects placebo and nocebo responses. Pain, 151(2), 430–439. https://doi.org/10.1016/j.pain.2010.08.007

      Freeman, S., Yu, R., Egorova, N., Chen, X., Kirsch, I., Claggett, B., Kaptchuk, T. J., Gollub, R. L., & Kong, J. (2015). Distinct neural representations of placebo and nocebo effects. NeuroImage, 112, 197–207. https://doi.org/10.1016/j.neuroimage.2015.03.015

      Hipp, J. F., Engel, A. K., & Siegel, M. (2011). Oscillatory synchronization in large-scale cortical networks predicts perception. Neuron, 69(2), 387–396. https://doi.org/10.1016/j.neuron.2010.12.027

      Jepma, M., Koban, L., van Doorn, J., Jones, M., & Wager, T. D. (2018). Behavioural and neural evidence for self-reinforcing expectancy effects on pain. Nature Human Behaviour, 2(11), 838–855. https://doi.org/10.1038/s41562-018-0455-8

      Kilner, J. M., Mattout, J., Henson, R., & Friston, K. J. (2005). Hemodynamic correlates of EEG: A heuristic. NeuroImage, 28(1), 280–286. https://doi.org/10.1016/j.neuroimage.2005.06.008

      Nickel, M. M., Tiemann, L., Hohn, V. D., May, E. S., Gil Ávila, C., Eippert, F., & Ploner, M. (2022). Temporal-spectral signaling of sensory information and expectations in the cerebral processing of pain. Proceedings of the National Academy of Sciences of the United States of America, 119(1). https://doi.org/10.1073/pnas.2116616119

      Ploner, M., Sorg, C., & Gross, J. (2017). Brain Rhythms of Pain. Trends in Cognitive Sciences, 21(2), 100–110. https://doi.org/10.1016/j.tics.2016.12.001

      Schmid, J., Bingel, U., Ritter, C., Benson, S., Schedlowski, M., Gramsch, C., Forsting, M., & Elsenbruch, S. (2015). Neural underpinnings of nocebo hyperalgesia in visceral pain: A fMRI study in healthy volunteers. NeuroImage, 120, 114–122. https://doi.org/10.1016/j.neuroimage.2015.06.060

      Shih, Y.‑W., Tsai, H.‑Y., Lin, F.‑S., Lin, Y.‑H., Chiang, C.‑Y., Lu, Z.‑L., & Tseng, M.‑T. (2019). Effects of Positive and Negative Expectations on Human Pain Perception Engage Separate But Interrelated and Dependently Regulated Cerebral Mechanisms. Journal of Neuroscience, 39(7), 1261–1274. https://doi.org/10.1523/JNEUROSCI.2154-18.2018

      Skvortsova, A., Veldhuijzen, D. S., van Middendorp, H., Colloca, L., & Evers, A. W. M. (2020). Effects of Oxytocin on Placebo and Nocebo Effects in a Pain Conditioning Paradigm: A Randomized Controlled Trial. The Journal of Pain, 21(3-4), 430–439. https://doi.org/10.1016/j.jpain.2019.08.010

      Strube, A., Rose, M., Fazeli, S., & Büchel, C. (2021). The temporal and spectral characteristics of expectations and prediction errors in pain and thermoception. eLife, 10. https://doi.org/10.7554/eLife.62809

      Tu, Y., Zhang, Z., Tan, A., Peng, W., Hung, Y. S., Moayedi, M., Iannetti, G. D., & Hu, L. (2016). Alpha and gamma oscillation amplitudes synergistically predict the perception of forthcoming nociceptive stimuli. Human Brain Mapping, 37(2), 501–514. https://doi.org/10.1002/hbm.23048

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, Misic et al showed that white matter properties can be used to classify subacute back pain patients that will develop persisting pain.

      Strengths:

      Compared to most previous papers studying associations between white matter properties and chronic pain, the strength of the method is to perform a prediction in unseen data. Another strength of the paper is the use of three different cohorts. This is an interesting paper that provides a valuable contribution to the field.

      We thank the reviewer for emphasizing the strength of our paper and the importance of validation on multiple unseen cohorts.

      Weaknesses:

      The authors imply that their biomarker could outperform traditional questionnaires to predict pain: "While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain and provide easy-to-use brief questionnaires-based tools, (21, 25) parameters often explain no more than 30% of the variance (28-30) and their prognostic accuracy is limited.(31)". I don't think this is correct; questionnaire-based tools can achieve far greater prediction than their model in about half a million individuals from the UK Biobank (Tanguay-Sabourin et al., A prognostic risk score for the development and spread of chronic pain, Nature Medicine 2023).

      We agree with the reviewer that we might have underestimated the prognostic accuracy of questionnaire-based tools, especially given the strong predictive accuracy shown by Tanguay-Sabourin et al. (2023). In this revised version, we have changed both the introduction and the discussion to reflect the questionnaire-based prognostic accuracy reported in the seminal work by Tanguay-Sabourin et al.

      In the introduction (page 4, lines 3-18), we now write:

      “Some studies have addressed this question with prognostic models incorporating demographic, pain-related, and psychosocial predictors.1-4 While these models are of great value showing that few of these variables (e.g. work factors) might have significant prognostic power on the long-term outcome of back pain, their prognostic accuracy is limited,5 with parameters often explaining no more than 30% of the variance.6-8 A recent notable study in this regard developed a model based on easy-to-use brief questionnaires to predict the development and spread of chronic pain in a variety of pain conditions capitalizing on a large dataset obtained from the UK-BioBank.9 This work demonstrated that only a few features related to assessment of sleep, neuroticism, mood, stress, and body mass index were enough to predict persistence and spread of pain with an area under the curve of 0.53-0.73. Yet, this study is unique in showing such a predictive value of questionnaire-based tools. Neurobiological measures could therefore complement existing prognostic models based on psychosocial variables to improve overall accuracy and discriminative power. More importantly, neurobiological factors such as brain parameters can provide a mechanistic understanding of chronicity and its central processing.”

      And in the conclusion (page 22, lines 5-9), we write:

      “Integrating findings from studies that used questionnaire-based tools and showed remarkable predictive power9 with neurobiological measures that can offer mechanistic insights into chronic pain development, could enhance predictive power in CBP prognostic modeling.”

      Moreover, the main weakness of this study is the sample size. It remains small despite having 3 cohorts. This is problematic because results are often overfitted in such a small sample size brain imaging study, especially when all the data are available to the authors at the time of training the model (Poldrack et al., Scanning the horizon: towards transparent and reproducible neuroimaging research, Nature Reviews in Neuroscience 2017). Thus, having access to all the data, the authors have a high degree of flexibility in data analysis, as they can retrain their model any number of times until it generalizes across all three cohorts. In this case, the testing set could easily become part of the training making it difficult to assess the real performance, especially for small sample size studies.

      The reviewer raises a very important point about the limited sample size and the methodology intrinsic to model development and testing. We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development.  We believe that the data presented here are nevertheless robust since multisite validated but need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”. 

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. Indeed, the effect size in the New Haven and Mannheim data is Cohen’s d > 1.
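      The two ideas in this reply, an effect size expressed as Cohen's d and a cut-off that is fixed on one cohort and then applied unchanged to another, can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the FA values are invented, the function names are hypothetical, and placing the cut-off at the midpoint between group means is an assumption made only for the example.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def fit_cutoff(recovered, persistent):
    """Choose a cut-off from discovery data only (assumed: midpoint of group means)."""
    return (mean(recovered) + mean(persistent)) / 2


def accuracy(cutoff, recovered, persistent):
    """Fraction classified correctly when values above the cut-off are called recovered."""
    hits = sum(v > cutoff for v in recovered) + sum(v <= cutoff for v in persistent)
    return hits / (len(recovered) + len(persistent))


# Hypothetical FA values; the discovery cohort is the only data used to set the cut-off.
discovery_recovered = [0.48, 0.50, 0.52, 0.51]
discovery_persistent = [0.44, 0.45, 0.47, 0.46]
cutoff = fit_cutoff(discovery_recovered, discovery_persistent)

# The validation cohort is touched once, after the cut-off is frozen.
validation_recovered = [0.49, 0.47, 0.53]
validation_persistent = [0.45, 0.48, 0.44]
validation_accuracy = accuracy(cutoff, validation_recovered, validation_persistent)
discovery_effect_size = cohens_d(discovery_recovered, discovery_persistent)
```

      The point of the layout is that nothing computed from the validation cohort feeds back into `fit_cutoff`, which is the separation between training and testing data described above.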

      Even if the performance was properly assessed, their models show AUCs between 0.65-0.70, which is usually considered as poor, and most likely without potential clinical use. Despite this, their conclusion was: "This biomarker is easy to obtain (~10 min of scanning time) and opens the door for translation into clinical practice." One may ask who is really willing to use an MRI signature with a relatively poor performance that can be outperformed by self-report questionnaires?

      The reviewer is correct; the model performance is fair, which limits its usefulness for clinical translation. We wanted to emphasize that obtaining diffusion images can be done in a short period of time and, hence, as such models’ predictive accuracy improves, clinical translation becomes closer to reality. In addition, our findings are based on older diffusion data and limited sample sizes coming from different sites and different acquisition sequences. This by itself would limit the accuracy, especially since the evidence shows that sample size also affects model performance (i.e. testing AUC).10 In the revision, we re-worded the sentence mentioned by the reviewer to reflect the points discussed here. This also motivates us to collect a more homogeneous and larger sample. In the limitations section of the discussion, we now write (page 21, lines 6-9):

      “Even though our model performance is fair, which currently limits its usefulness for clinical translation, we believe that future models would further improve accuracy by using larger homogenous sample sizes and uniform acquisition sequences.”
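      For readers less familiar with the AUC values discussed in this exchange, the following minimal sketch computes the area under the ROC curve in its rank-based form: the probability that a randomly chosen recovered patient has a higher score than a randomly chosen persistent patient. The scores are invented and the function name is an assumption; this is not the evaluation code used in the study.

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical FA values for recovered (positive) and persistent (negative) patients.
recovered_fa = [0.52, 0.49, 0.47, 0.50]
persistent_fa = [0.45, 0.48, 0.51, 0.44]
example_auc = auc(recovered_fa, persistent_fa)
```

      An AUC of 0.5 corresponds to chance and 1.0 to perfect separation, so values of 0.65-0.70 mean that roughly two out of three such random pairs are ordered correctly.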

      Overall, these criticisms are more about the wording sometimes used and the inference they made. I think the strength of the evidence is incomplete to support the main claims of the paper.

      Despite these limitations, I still think this is a very relevant contribution to the field. Showing predictive performance through cross-validation and testing in multiple cohorts is not an easy task and this is a strong effort by the team. I strongly believe this approach is the right one and I believe the authors did a good job.

      We thank the reviewer for acknowledging that our effort and approach were useful.

      Minor points:

      Methods:

      I get the voxel-wise analysis, but I don't understand the methods for the structural connectivity analysis between the 88 ROIs. Have the authors run tractography or have they used a predetermined streamlined form of 'population-based connectome'? They report that models of AUC above 0.75 were considered and tested in the Chicago dataset, but we have no information about what the model actually learned (although this can be tricky for decision tree algorithms). 

      We apologize for the lack of clarity; we did run tractography and we did not use a pre-determined streamlined form of the connectome.

      Finding which connections are important for the classification of SBPr and SBPp is difficult because of our choices during data preprocessing and SVC model development: (1) preprocessing steps which included TNPCA for dimensionality reduction, and regressing out the confounders (i.e., age, sex, and head motion); (2) the harmonization for effects of sites; and (3) the Support Vector Classifier which is a hard classification model11.

      In the methods section (page 30, lines 21-23) we added: “Of note, such models cannot tell us the features that are important in classifying the groups.  Hence, our model is considered a black-box predictive model like neural networks.”
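      One of the preprocessing steps listed above, regressing out confounders, can be illustrated with a minimal sketch. It removes the linear effect of a single confounder by ordinary least squares and returns the mean-centered residuals. This is a simplification under stated assumptions: the study regressed out age, sex, and head motion together, whereas this example handles one variable, and the data and function name are invented.

```python
from statistics import mean


def residualize(values, confounder):
    """Remove the linear effect of one confounder via ordinary least squares."""
    mean_x, mean_y = mean(confounder), mean(values)
    sxx = sum((x - mean_x) ** 2 for x in confounder)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(confounder, values))
    slope = sxy / sxx
    # Residual = observed value minus the value predicted from the confounder.
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(confounder, values)]


# Hypothetical ages and connectivity feature values.
ages = [25, 35, 45, 55, 65]
feature = [0.52, 0.50, 0.49, 0.46, 0.45]
adjusted = residualize(feature, ages)
```

      Because such adjustments (together with dimensionality reduction and site harmonization) transform the original connections before classification, the features the classifier sees no longer map one-to-one onto individual connections, which is part of why interpretation is difficult.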

      Minor:

      What results are shown in Figure 7? It looks more descriptive than the actual results.

      The reviewer is correct; Figure 7 and Supplementary Figure 4 were both qualitatively illustrating the shape of the SLF. We have now changed both figures in response to this point and a point raised by reviewer 3. We now show a 3D depiction of different sub-components of the right SLF (Figure 7) and left SLF (now Supplementary Figure 11 instead of Supplementary Figure 4) with a quantitative estimation of the FA content of the tracts, and the number of tracts per component. The results reinforce the TBSS analysis in showing asymmetry in the differences between left and right SLF between the groups (i.e. SBPp and SBPr) in both FA values and number of tracts per bundle.

      Reviewer #2 (Public Review):

      The present study aims to investigate brain white matter predictors of back pain chronicity. To this end, a discovery cohort of 28 patients with subacute back pain (SBP) was studied using white matter diffusion imaging. The cohort was investigated at baseline and one-year follow-up when 16 patients had recovered (SBPr) and 12 had persistent back pain (SBPp). A comparison of baseline scans revealed that SBPr patients had higher fractional anisotropy values in the right superior longitudinal fasciculus (SLF) than SBPp patients and that FA values predicted changes in pain severity. Moreover, the FA values of SBPr patients were larger than those of healthy participants, suggesting a role of FA of the SLF in resilience to chronic pain. These findings were replicated in two other independent datasets. The authors conclude that the right SLF might be a robust predictive biomarker of CBP development with the potential for clinical translation.

      Developing predictive biomarkers for pain chronicity is an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are convincing, and the interpretation is adequate. A particular strength of the study is the discovery-replication approach with replications of the findings in two independent datasets.

      We thank reviewer 2 for pointing to the strength of our study.

      The following revisions might help to improve the manuscript further.

      - Definition of recovery. In the New Haven and Chicago datasets, SBPr and SBPp patients are distinguished by reductions of >30% in pain intensity. In contrast, in the Mannheim dataset, both groups are distinguished by reductions of >20%. This should be harmonized. Moreover, as there is no established definition of recovery (reference 79 does not provide a clear criterion), it would be interesting to know whether the results hold for different definitions of recovery. Control analyses for different thresholds could strengthen the robustness of the findings.

      The reviewer raises an important point regarding the definition of recovery. To address the reviewer’s concern, we have added a supplementary figure (Fig. S6) showing the results in the Mannheim data set if a 30% reduction is used as a recovery criterion, and in the manuscript (page 11, lines 1,2) we write: “Supplementary Figure S6 shows the results in the Mannheim data set if a 30% reduction is used as a recovery criterion in this dataset (AUC= 0.53)”.

      We would like to emphasize here several points that support the use of different recovery thresholds between New Haven and Mannheim.  The New Haven primary pain ratings relied on visual analogue scale (VAS) while the Mannheim data relied on the German version of the West-Haven-Yale Multidimensional Pain Inventory. In addition, the Mannheim data were pre-registered with a definition of recovery at 20% and are part of a larger sub-acute to chronic pain study with prior publications from this cohort using the 20% cut-off12. Finally, a more recent consensus publication13 from IMMPACT indicates that a change of at least 30% is needed for a moderate improvement in pain on the 0-10 Numerical Rating Scale but that this percentage depends on baseline pain levels.
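      The effect of the recovery threshold on group assignment can be sketched as follows. The example labels a patient as recovered (SBPr) when pain drops by more than a given percentage from baseline, and as persistent (SBPp) otherwise, so that the same patient can be compared under the 20% and 30% criteria. The pain ratings and function names are assumptions introduced for illustration; this is not the study's code.

```python
def percent_reduction(baseline, follow_up):
    """Percentage drop in pain from baseline to follow-up (negative if pain increased)."""
    if baseline <= 0:
        raise ValueError("baseline pain must be positive to compute a percentage change")
    return 100.0 * (baseline - follow_up) / baseline


def label_recovery(baseline, follow_up, threshold_pct):
    """Return 'SBPr' if pain dropped by more than threshold_pct, otherwise 'SBPp'."""
    reduced_enough = percent_reduction(baseline, follow_up) > threshold_pct
    return "SBPr" if reduced_enough else "SBPp"


# Hypothetical patient whose pain fell from 6.0 to 4.5, a 25% reduction.
label_at_20 = label_recovery(6.0, 4.5, threshold_pct=20)
label_at_30 = label_recovery(6.0, 4.5, threshold_pct=30)
```

      A patient with a 25% reduction counts as recovered under the 20% criterion but as persistent under the 30% criterion, which is why the choice of threshold can change group sizes and, in turn, classification results.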

      - Analysis of the Chicago dataset. The manuscript includes results on FA values and their association with pain severity for the New Haven and Mannheim datasets but not for the Chicago dataset. It would be straightforward to show figures like Figures 1 - 4 for the Chicago dataset, as well.

      We welcome the reviewer’s suggestion; we added these analyses to the results section of the resubmitted manuscript (page 11, lines 13-16): “The correlation between FA values in the right SLF and pain severity in the Chicago data set showed marginal significance (p = 0.055) at visit 1 (Fig. S8A) and higher FA values were significantly associated with a greater reduction in pain at visit 2 (p = 0.035) (Fig. S8B).”

      - Data sharing. The discovery-replication approach of the present study distinguishes the present from previous approaches. This approach enhances the belief in the robustness of the findings. This belief would be further enhanced by making the data openly available. It would be extremely valuable for the community if other researchers could reproduce and replicate the findings without restrictions. It is not clear why the fact that the studies are ongoing prevents the unrestricted sharing of the data used in the present study.

      We greatly appreciate the reviewer's suggestion to share our data sets, as we strongly support the Open Science initiative. The Chicago data set is already publicly available. The New Haven data set will be shared on the Open Pain repository, and the Mannheim data set will be uploaded to heiDATA or heiARCHIVE at Heidelberg University in the near future. We cannot share the data immediately because this project is part of the Heidelberg pain consortium, “SFB 1158: From nociception to chronic pain: Structure-function properties of neural pathways and their reorganization.” Within this consortium, all data must be shared following a harmonized structure across projects, and no study will be published openly until all projects have completed initial analysis and quality control.

      Reviewer #3 (Public Review):

      Summary:

      Authors suggest a new biomarker of chronic back pain with the option to predict the result of treatment. The authors found a significant difference in a fractional anisotropy measure in superior longitudinal fasciculus for recovered patients with chronic back pain.

      Strengths:

      The results were reproduced in three different groups at different studies/sites.

      Weaknesses:

      - The number of participants is still low.

      The reviewer raises a very important point of limited sample size. As discussed in our replies to reviewer number 1:

      We acknowledge the small sample size in the “Limitations” section of the discussion. In the resubmission, we acknowledge the degree of flexibility that is afforded by having access to all the data at once. However, we also note that our SLF-FA based model is a simple cut-off approach that does not include any learning or hidden layers and that the data obtained from Open Pain were never part of the “training” set at any point at either the New Haven or the Mannheim site. Regarding our SVC approach, we follow standard procedures for machine learning where we never mix the training and testing sets. The models are trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model. We write in the limitation section of the discussion (page 20, lines 20-21, and page 21, lines 1-6):

      “In addition, at the time of analysis, we had “access” to all the data, which may lead to bias in model training and development.  We believe that the data presented here are nevertheless robust since multisite validated but need replication. Additionally, we followed standard procedures for machine learning where we never mix the training and testing sets. The models were trained on the training data with parameters selected based on cross-validation within the training data. Therefore, no models have ever seen the test data set. The model performances we reported reflect the prognostic accuracy of our model”. 

      Finally, as discussed by Spisak et al.,10 the key determinant of the required sample size in predictive modeling is the “true effect size of the brain-phenotype relationship”, which we think is the determinant of the replication we observe in this study. Indeed, the effect size in the New Haven and Mannheim data is Cohen’s d > 1.

      - An explanation of microstructure changes was not given.

      The reviewer points to an important gap in our discussion.  While we cannot do a direct study of actual tissue microstructure, we explored further the changes observed in the SLF by calculating diffusivity measures. We have now performed the analysis of mean, axial, and radial diffusivity. 

      In the results section we added (page 7, lines 12-19): “We also examined mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) extracted from the right SLF shown in Fig.1 to further understand which diffusion component is different between the groups. The right SLF MD is significantly increased (p < 0.05) in the SBPr compared to SBPp patients (Fig. S3), while the right SLF RD is significantly decreased (p < 0.05) in the SBPr compared to SBPp patients in the New Haven data (Fig. S4). Axial diffusivity extracted from the RSLF mask did not show significant difference between SBPr and SBPp (p = 0.28) (Fig. S5).”

      In the discussion, we write (page 15, lines 10-20):

      “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus14.  Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts,15 our results show increased MD and RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      - Some technical drawbacks are presented.

      We are uncertain if the reviewer is suggesting that we have acknowledged certain technical drawbacks and expects further elaboration on our part. We kindly request that the reviewer specify what particular issues need to be addressed so that we can respond appropriately.

      Recommendations For The Authors:

      We thank the reviewers for their constructive feedback, which has significantly improved our manuscript. We have done our best to answer the criticisms that they raised point-by-point.

      Reviewer #2 (Recommendations For The Authors):

      The discovery-replication approach of the current study justifies the use of the terminus 'robust.' In contrast, previous studies on predictive biomarkers using functional and structural brain imaging did not pursue similar approaches and have not been replicated. Still, the respective biomarkers are repeatedly referred to as 'robust.' Throughout the manuscript, it would, therefore, be more appropriate to remove the label 'robust' from those studies.

      We thank the reviewer for this valuable suggestion. We removed the label 'robust' throughout the manuscript when referring to the previous studies which didn’t follow the same approach and have not yet been replicated.

      Reviewer #3 (Recommendations For The Authors):

      This is, indeed, quite a well-written manuscript with very interesting findings and patient group. There are a few comments that enfeeble the findings.

      (1) It is a bit frustrating to read at the beginning how important chronic back pain is and the number of patients in the used studies. At least the number of healthy subjects could be higher.

      The reviewer raises an important point regarding the number of pain-free healthy controls (HC) in our samples. We first note that our primary statistical analysis focused on comparing recovered and persistent patients at baseline and validating these findings across sites without directly comparing them to HCs. Nevertheless, the data from New Haven included 28 HCs at baseline, and the data from Mannheim included 24 HCs. Although these sample sizes are not large, they have enabled us to clearly establish that the recovered SBPr patients generally have larger FA values in the right superior longitudinal fasciculus compared to the HCs, a finding consistent across sites (see Figs. 1 and 3). This suggests that the general pain-free population includes individuals with both low and high-risk potential for chronic pain. It also offers one explanation for the reported lack of differences or inconsistent differences between chronic low-back pain patients and HCs in the literature, as these differences likely depend on the (unknown) proportion of high- and low-risk individuals in the control groups. Therefore, if the high-risk group is more represented by chance in the HC group, comparisons between HCs and chronic pain patients are unlikely to yield statistically significant results. Thus, while we agree with the reviewer that the sample sizes of our HCs are limited, this limitation does not undermine the validity of our findings.

      (2) Pain reaction in the brain is in general a quite popular topic and could be connected to the findings or mentioned in the introduction.

      We thank the reviewer for this suggestion. We have now added a summary of brain response to pain in general. In the introduction, we now write (page 4, lines 19-22 and page 5, lines 1-5):

      “Neuroimaging research on chronic pain has uncovered a shift in brain responses to pain when acute and chronic pain are compared. The thalamus, primary somatosensory, motor areas, insula, and mid-cingulate cortex most often respond to acute pain and can predict the perception of acute pain16-19. Conversely, limbic brain areas are more frequently engaged when patients report the intensity of their clinical pain20, 21. Consistent findings have demonstrated that increased prefrontal-limbic functional connectivity during episodes of heightened subacute ongoing back pain or during a reward learning task is a significant predictor of CBP.12, 22. Furthermore, low somatosensory cortex excitability in the acute stage of low back pain was identified as a predictor of CBP chronicity.23”

      (3) It is clearly observed structural asymmetry in the brain, why not elaborate this finding further? Would SLF be a hub in connectivity analysis? Would FA changes have along tract features? etc etc etc

      The reviewer raises an important point. There is ground to suggest from our data that there is an asymmetry to the role of the SLF in resilience to chronic pain. We discuss this at length in the Discussion section. In addition, we extended our data analysis using our Population Based Structural Connectome pipeline on the New Haven dataset. Following that approach, we studied the number of fiber tracts making up different parts of the SLF on the right and left sides. We also extracted FA values along fiber tracts and compared the average across groups. Our new analyses are presented in our modified Figure 7 and Fig. S11. These results support the asymmetry hypothesis. The SLF could be a hub of structural connectivity. Please note, however, that given the discovery-and-validation design of our study, the study of structural connectivity of the SLF is beyond the scope of this paper because tract-based connectivity is very sensitive to data collection parameters and is less accurate with single-shell DWI acquisition. Therefore, we will pursue the study of connectivity of the SLF in the future with well-powered and more harmonized data.

      (4) Only FA is mentioned; did the authors work with MD, RD, and AD metrics?

      We thank the reviewer for this suggestion that helps in providing a clearer picture of the differences in the right SLF between SBPr and SBPp. We have now extracted MD, AD, and RD for the predictive mask we discovered in Figure 1 and plotted the values comparing SBPr to SBPp patients in Fig. S3, Fig. S4, and Fig. S5 across all sites using one comprehensive harmonized analysis. We have added in the discussion: “Within the significant cluster in the discovery data set, MD was significantly increased, while RD in the right SLF was significantly decreased in SBPr compared to SBPp patients. Higher RD values, indicative of demyelination, were previously observed in chronic musculoskeletal patients across several bundles, including the superior longitudinal fasciculus14.  Similarly, Mansour et al. found higher RD in SBPp compared to SBPr in the predictive FA cluster. While they noted decreased AD and increased MD in SBPp, suggestive of both demyelination and altered axonal tracts15, our results show increased MD and RD in SBPr with no AD differences between SBPp and SBPr, pointing to white matter changes primarily due to myelin disruption rather than axonal loss, or more complex processes. Further studies on tissue microstructure in chronic pain development are needed to elucidate these processes.”

      (5) There are many speculations in the Discussion, however, some of them are not supported by the results.

      We agree with the reviewer and thank them for pointing this out. We have now made several changes to the wording across the discussion where speculations were not supported by the data. For example, instead of writing (page 16, lines 7-9): “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain is a top-down phenomenon related to visuospatial and body awareness.”, we now write: “Together the literature on the right SLF role in higher cognitive functions suggests, therefore, that resilience to chronic pain might be related to a top-down phenomenon involving visuospatial and body awareness.”

      (6) A method section was written quite roughly. In order to obtain all the details for a potential replication one needs to jump over the text.

      The reviewer is correct; our methodology lacked sufficiently detailed descriptions. We have therefore described it more extensively. Under “Estimation of structural connectivity”, we now write (page 28, lines 20,21 and page 29, lines 1-19):

      “Structural connectivity was estimated from the diffusion tensor data using a population-based structural connectome (PSC) detailed in a previous publication.24 PSC can utilize the geometric information of streamlines, including shape, size, and location for a better parcellation-based connectome analysis. It, therefore, preserves the geometric information, which is crucial for quantifying brain connectivity and understanding variation across subjects. We have previously shown that the PSC pipeline is robust and reproducible across large data sets.24 PSC output uses the Desikan-Killiany atlas (DKA) 25 of cortical and sub-cortical regions of interest (ROI). The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S6.  PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”

      (7) Why not join all the data with harmonisation in order to reproduce the results (TBSS)

      We have followed the reviewer’s suggestion; we used neuroCombat harmonization after pooling all the diffusion weighted data into one TBSS analysis. Our results remain the same after harmonization. 

      In the Supplementary Information we added a paragraph explaining the method for harmonization; we write (SI, page 3, lines 25-34):

      “Harmonization of DTI data using neuroCombat. Because the 3 data sets originated from different sites using different MR data acquisition parameters and slightly different recruitment criteria, we applied neuroCombat 29  to correct for site effects and then repeated the TBSS analysis shown in Figure 1 and the validation analyses shown in Figures 5 and 6. First, the FA maps derived using the FDT toolbox were pooled into one TBSS analysis where registration to a standard template FA template (FMRIB58_FA_1mm.nii.gz part of FSL) was performed.  Next, neuroCombat was applied to the FA maps as implemented in Python with batch (i.e., site) effect modeled with a vector containing 1 for New Haven, 2 for Chicago, and 3 for Mannheim originating maps, respectively. The harmonized maps were then skeletonized to allow for TBSS.”

      And in the results section, we write (page 12, lines 2-21):

      “Validation after harmonization

      Because the DTI data sets originated from 3 sites with different MR acquisition parameters, we repeated our TBSS and validation analyses after correcting for variability arising from site differences using DTI data harmonization as implemented in neuroCombat. 29 The method of harmonization is described in detail in the Supplementary Methods. The whole brain unpaired t-test depicted in Figure 1 was repeated after neuroCombat and yielded very similar results (Fig. S9A) showing significantly increased FA in the SBPr compared to SBPp patients in the right superior longitudinal fasciculus (MNI-coordinates of peak voxel: x = 40; y = - 42; z = 18 mm; t(max) = 2.52; p < 0.05, corrected against 10,000 permutations).  We again tested the accuracy of local diffusion properties (FA) of the right SLF extracted from the mask of voxels passing threshold in the New Haven data (Fig.S9A) in classifying the Mannheim and the Chicago patients, respectively, into persistent and recovered. FA values corrected for age, gender, and head displacement accurately classified SBPr  and SBPp patients from the Mannheim data set with an AUC = 0.67 (p = 0.023, tested against 10,000 random permutations, Fig. S9B and S7D), and patients from the Chicago data set with an AUC = 0.69 (p = 0.0068) (Fig. S9C and S7E) at baseline, and an AUC = 0.67 (p = 0.0098)  (Fig. S9D and S7F) patients at follow-up,  confirming the predictive cluster from the right SLF across sites. The application of neuroCombat significantly changes the FA values as shown in Fig.S10 but does not change the results between groups.”
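      As an illustration of the validation logic quoted above (an AUC tested against random permutations), the following is a minimal Python sketch and not the authors' script. The function names `auc` and `permutation_p_value`, the label coding (1 = recovered, 0 = persistent), the one-sided test, and the example FA values are all assumptions made for illustration. The correction for age, gender, and head displacement described in the manuscript is omitted here.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    # Count pairs in which the positive case scores higher; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_perm=10_000, seed=0):
    """One-sided p-value: share of label shuffles with AUC >= the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between scores and labels
        if auc(scores, shuffled) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical FA values, for illustration only (not study data).
    fa = [0.52, 0.55, 0.49, 0.58, 0.47, 0.50, 0.45, 0.51]
    recovered = [1, 1, 1, 1, 0, 0, 0, 0]
    print(permutation_p_value(fa, recovered, n_perm=2000))
```

      Shuffling the group labels while keeping the FA values fixed gives the distribution of AUCs expected if FA carried no information about recovery; the p-value is the fraction of that distribution at or above the observed AUC.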

      Minor comments

      (1) In the case of New Haven data, one used MB 4 and GRAPPA 2, these two factors accelerate the imaging 8 times and often lead to quite a poor quality. Any kind of QA?

      We thank the reviewer for identifying this error. GRAPPA 2 was in fact used for our T1-MPRAGE image acquisition but not during the diffusion data acquisition. The diffusion data were acquired with a multi-band acceleration factor of 4.  We have now corrected this mistake.

      (2) Why not include MPRAGE data into the analysis, in particular, for predictions?

      We thank the reviewer for the suggestion. The collaboration on this paper was set around diffusion data. In addition, MPRAGE data from New Haven related to prediction is already published (10.1073/pnas.1918682117) and MPRAGE data of the Mannheim data set is a part of the larger project and will be published elsewhere.

      (3) In preprocessing, the authors wrote: "Eddy current corrects for image distortions due to susceptibility-induced distortions and eddy currents in the gradient coil". However, they did not mention that they acquired phase-opposite b0 data. It means eddy_openmp works likely only as an alignment tool, but not susceptibility corrector.

      We kindly thank the reviewer for bringing this to our attention. Indeed, we did not collect b0 data in the phase-opposite direction. However, eddy_openmp can still be used to correct for eddy current distortions and perform motion correction, although the absence of phase-opposite b0 data may limit its ability to fully address susceptibility artifacts. This is now noted in the Supplementary Methods under the Preprocessing section (SI, page 3, lines 16-18): “We do note, however, that as we did not acquire data in the phase-opposite direction, the susceptibility-induced distortions may not be fully corrected.”

      (4) Version of FSL?

      We thank the reviewer for addressing this point that we have now added under the Supplementary Methods (SI, page 3, lines 10-11): “Preprocessing of all data sets was performed employing the same procedures and the FMRIB diffusion toolbox (FDT) running on FSL version 6.0.”

      (5) Some short sketches about the connectivity analysis could be useful, at least in SI.

      We are grateful for this suggestion that improves our work. We added the sketches about the connectivity analysis, please see Figure 7 and Supplementary Figure 11.

      (6) Machine learning: functions, language, version?

      We thank the reviewer for pointing out these minor points that we now hope to have addressed in our resubmission in the Methods section by adding a detailed description of the structural connectivity analysis. We added: “The DKA parcellation comprises 68 cortical surface regions (34 nodes per hemisphere) and 19 subcortical regions. The complete list of ROIs is provided in the supplementary materials’ Table S7.  PSC leverages a reproducible probabilistic tractography algorithm 26 to create whole-brain tractography data, integrating anatomical details from high-resolution T1 images to minimize bias in the tractography. We utilized DKA 25 to define the ROIs corresponding to the nodes in the structural connectome. For each pair of ROIs, we extracted the streamlines connecting them by following these steps: 1) dilating each gray matter ROI to include a small portion of white matter regions, 2) segmenting streamlines connecting multiple ROIs to extract the correct and complete pathway, and 3) removing apparent outlier streamlines. Due to its widespread use in brain imaging studies27, 28, we examined the mean fractional anisotropy (FA) value along streamlines and the count of streamlines in this work. The output we used includes fiber count, fiber length, and fiber volume shared between the ROIs in addition to measures of fractional anisotropy and mean diffusivity.”

      The script is described and provided at: https://github.com/MISICMINA/DTI-Study-Resilience-to-CBP.git.

      (7) Ethical approval?

      The New Haven data is part of a study that was approved by the Yale University Institutional Review Board. This is mentioned under the description of the data, “New Haven (Discovery) data set” (page 23, lines 1,2). Likewise, the Mannheim data is part of a study approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form. This is also mentioned under “Mannheim data set” (page 26, lines 2-5): “The study was approved by the Ethics Committee of the Medical Faculty of Mannheim, Heidelberg University, and was conducted in accordance with the declaration of Helsinki in its most recent form.”

      (1) Traeger AC, Henschke N, Hubscher M, et al. Estimating the Risk of Chronic Pain: Development and Validation of a Prognostic Model (PICKUP) for Patients with Acute Low Back Pain. PLoS Med 2016;13:e1002019.

      (2) Hill JC, Dunn KM, Lewis M, et al. A primary care back pain screening tool: identifying patient subgroups for initial treatment. Arthritis Rheum 2008;59:632-641.

      (3) Hockings RL, McAuley JH, Maher CG. A systematic review of the predictive ability of the Orebro Musculoskeletal Pain Questionnaire. Spine (Phila Pa 1976) 2008;33:E494-500.

      (4) Chou R, Shekelle P. Will this patient develop persistent disabling low back pain? JAMA 2010;303:1295-1302.

      (5) Silva FG, Costa LO, Hancock MJ, Palomo GA, Costa LC, da Silva T. No prognostic model for people with recent-onset low back pain has yet been demonstrated to be suitable for use in clinical practice: a systematic review. J Physiother 2022;68:99-109.

      (6) Kent PM, Keating JL. Can we predict poor recovery from recent-onset nonspecific low back pain? A systematic review. Man Ther 2008;13:12-28.

      (7) Hruschak V, Cochran G. Psychosocial predictors in the transition from acute to chronic pain: a systematic review. Psychol Health Med 2018;23:1151-1167.

      (8) Hartvigsen J, Hancock MJ, Kongsted A, et al. What low back pain is and why we need to pay attention. Lancet 2018;391:2356-2367.

      (9) Tanguay-Sabourin C, Fillingim M, Guglietti GV, et al. A prognostic risk score for development and spread of chronic pain. Nat Med 2023;29:1821-1831.

      (10) Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4-E7.

      (11) Liu Y, Zhang HH, Wu Y. Hard or Soft Classification? Large-margin Unified Machines. J Am Stat Assoc 2011;106:166-177.

      (12) Loffler M, Levine SM, Usai K, et al. Corticostriatal circuits in the transition to chronic back pain: The predictive role of reward learning. Cell Rep Med 2022;3:100677.

      (13) Smith SM, Dworkin RH, Turk DC, et al. Interpretation of chronic pain clinical trial outcomes: IMMPACT recommended considerations. Pain 2020;161:2446-2461.

      (14) Lieberman G, Shpaner M, Watts R, et al. White Matter Involvement in Chronic Musculoskeletal Pain. The Journal of Pain 2014;15:1110-1119.

      (15) Mansour AR, Baliki MN, Huang L, et al. Brain white matter structural properties predict transition to chronic pain. Pain 2013;154:2160-2168.

      (16) Wager TD, Atlas LY, Lindquist MA, Roy M, Woo CW, Kross E. An fMRI-based neurologic signature of physical pain. N Engl J Med 2013;368:1388-1397.

      (17) Lee JJ, Kim HJ, Ceko M, et al. A neuroimaging biomarker for sustained experimental and clinical pain. Nat Med 2021;27:174-182.

      (18) Becker S, Navratilova E, Nees F, Van Damme S. Emotional and Motivational Pain Processing: Current State of Knowledge and Perspectives in Translational Research. Pain Res Manag 2018;2018:5457870.

      (19) Spisak T, Kincses B, Schlitt F, et al. Pain-free resting-state functional brain connectivity predicts individual pain sensitivity. Nat Commun 2020;11:187.

      (20) Baliki MN, Apkarian AV. Nociception, Pain, Negative Moods, and Behavior Selection. Neuron 2015;87:474-491.

      (21) Elman I, Borsook D. Common Brain Mechanisms of Chronic Pain and Addiction. Neuron 2016;89:11-36.

      (22) Baliki MN, Petre B, Torbey S, et al. Corticostriatal functional connectivity predicts transition to chronic back pain. Nat Neurosci 2012;15:1117-1119.

      (23) Jenkins LC, Chang WJ, Buscemi V, et al. Do sensorimotor cortex activity, an individual's capacity for neuroplasticity, and psychological features during an episode of acute low back pain predict outcome at 6 months: a protocol for an Australian, multisite prospective, longitudinal cohort study. BMJ Open 2019;9:e029027.

      (24) Zhang Z, Descoteaux M, Zhang J, et al. Mapping population-based structural connectomes. Neuroimage 2018;172:130-145.

      (25) Desikan RS, Segonne F, Fischl B, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 2006;31:968-980.

      (26) Maier-Hein KH, Neher PF, Houde J-C, et al. The challenge of mapping the human connectome based on diffusion tractography. Nature Communications 2017;8:1349.

      (27) Chiang MC, McMahon KL, de Zubicaray GI, et al. Genetics of white matter development: a DTI study of 705 twins and their siblings aged 12 to 29. Neuroimage 2011;54:2308-2317.

      (28) Zhao B, Li T, Yang Y, et al. Common genetic variation influencing human white matter microstructure. Science 2021;372.

      (29) Fortin JP, Parker D, Tunc B, et al. Harmonization of multi-site diffusion tensor imaging data. Neuroimage 2017;161:149-170.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors intended to investigate the earliest mechanisms enabling self-prioritization, especially in the attention. Combining a temporal order judgement task with computational modelling based on the Theory of Visual Attention (TVA), the authors suggested that the shapes associated with the self can fundamentally alter the attentional selection of sensory information into awareness. This self-prioritization in attentional selection occurs automatically at early perceptual stages. Furthermore, the processing benefits obtained from attentional selection via self-relatedness and physical salience were separated from each other.

      Strengths:

      The manuscript is written in a way that is easy to follow. The methods of the paper are very clear and appropriate.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      There are two main concerns:

      (1) The authors had a too strong pre-hypothesis that self-prioritization was associated with attention. They used the prior entry to consciousness (awareness) as an index of attention, which is not appropriate. There may be other processing that makes the stimulus prior to entry to consciousness (e.g. high arousal, high sensitivity), but not attention. The self-related/associated stimulus may be involved in such processing but not attention to make the stimulus easily caught. Perhaps the authors could include other methods such as EEG or MEG to answer this question.

      We found the possibility of other mechanisms to be responsible for “prior entry” interesting too, but believe there are solid grounds for the hypothesis that it is indicative of attention:

      First, prior entry has a long-standing history as an index of attention (e.g., Titchener, 1903; Shore et al., 2001; Yates and Nicholls, 2009; Olivers et al., 2011; see Spence & Parise, 2010, for a review). Of course, other factors (like the ones mentioned) can contribute to encoding speed. However, for the perceptual condition, we systematically varied a stimulus feature that is associated with selective attention (salience, see e.g. Wolfe, 2021) and kept other features that are known to be associated with other factors such as arousal and sensitivity constant across the two variants (e.g. clearly above-threshold visibility) or varied them between participants (e.g. the colours / shapes used).

      Second, in the social salience condition we used a manipulation that has repeatedly been used to establish social salience effects in other paradigms (e.g., Li et al., 2022; Liu & Sui, 2016; Scheller et al., 2024; Sui et al., 2015; see Humphreys & Sui, 2016, for a review). We assume that the reviewer’s comment suggests that changes in arousal or sensitivity may be responsible for social salience effects, specifically. We have several reasons to interpret the social salience effects as an alteration in attentional selection, rather than a result of arousal or sensitivity:

      Arousal and attention are closely linked. However, within the present model, arousal is more likely linked to the availability of processing resources (capacity parameter C). That is, enhanced arousal is typically not stimulus-specific, and is therefore unlikely to affect the *relative* advantage in processing weights/rates of the self-associated (vs other-associated) stimuli. Indeed, a recent study showed that arousal does not modulate the relative division of attentional resources (as modelled by the Theory of Visual Attention; Asgeirsson & Nieuwenhuis, 2017). As such, it is unlikely that arousal can explain the observed results in relative processing changes for the self and other identities.

      Further, there is little reason to assume that presenting a different shape enhances perceptual sensitivity. Firstly, all stimuli were presented well above threshold, which would shrink any effects that were resulting from increases in sensitivity alone. Secondly, shape-associations were counterbalanced across participants, reducing the possibility that specific features, present in the stimulus display, lead to the measurable change in processing rates as a result of enhanced shape-sensitivity.

      Taken together, both the wealth of literature that suggests prior entry to index attention and the specific design choices within our study strongly support the notion that the observed changes in processing rates are indicative of changes in attentional selection, rather than other mechanisms (e.g. arousal, sensitivity).

      (2) The authors suggested that there are two independent attention processes. I suspect that the brain needs two attention systems. Is there a probability that the social and perceptual (physical properties of the stimulus) salience fired the same attention processing through different processing?

      We appreciate this thought-provoking comment. We conceptualize attention as a process that can facilitate different levels of representation, rather than as separate systems tuned to specific types of information. Different forms of representation, such as the perceptual shape, or the associated social identity, may be impacted by the same attentional process at different levels of representation. Indeed, our findings suggest that both social and perceptual salience effects may result from the same attentional system, albeit at different levels of representation. This is further supported by the additivity of perceptual and social salience effects and the negative correlation of processing facilitations between perceptually and socially salient cues. These results may reflect a trade-off in how attentional resources are distributed between either perceptually or socially salient stimuli.

      Reviewer #2 (Public review):

      Summary:

      The main aim of this research was to explore whether and how self-associations (as opposed to other associations) bias early attentional selection, and whether this can explain well-known self-prioritization phenomena, such as the self-advantage in perceptual matching tasks. The authors adopted the Visual Attention Theory (VAT) by estimating VAT parameters using a hierarchical Bayesian model from the field of attention and applied it to investigate the mechanisms underlying self-prioritization. They also discussed the constraints on the self-prioritization effect in attentional selection. The key conclusions reported were:

      (1) Self-association enhances both attentional weights and processing capacity

      (2) Self-prioritization in attentional selection occurs automatically but diminishes when active social decoding is required, and

      (3) Social and perceptual salience capture attention through distinct mechanisms.

      Strengths:

      Transferring the Theory of Visual Attention parameters estimated by a hierarchical Bayesian model to investigate self-prioritization in attentional selection was a smart approach. This method provides a valuable tool for accessing the very early stages of self-processing, i.e., attention selection. The authors conclude that self-associations can bias visual attention by enhancing both attentional weights and processing capacity and that this process occurs automatically. These findings offer new insights into self-prioritization from the perspective of the early stage of attentional selection.

      Thank you for your valuable feedback and helpful suggestions. Please see specific answers below.

      Weaknesses:

      (1) The results are not convincing enough to definitively support their conclusions. This is due to inconsistent findings (e.g., the model selection suggested condition-specific c parameters, but the increase in processing capacity was only slight; the correlations between attentional selection bias and SPE were inconsistent across experiments), unexpected results (e.g., when examining the impact of social association on processing rates, the other-associated stimuli were processed faster after social association, while the self-associated stimuli were processed more slowly), and weak correlations between attentional bias and behavioral SPE, which were reported without any p-value corrections. Additionally, the reasons why the attentional bias of self-association occurs automatically but disappears during active social decoding remain difficult to explain. It is also possible that the self-association with shapes was not strong enough to demonstrate attention bias, rather than the automatic processes as the authors suggest. Although these inconsistencies and unexpected results were discussed, all were post hoc explanations. To convince readers, empirical evidence is needed to support these unexpected findings.

      Thank you for outlining the specific points that raise your concern. We were happy to address these points as follows:

      a. Replications and Consistency: In our study, we consistently observed trends (relative reduction in processing speed of the self-associated stimulus) in the social salience conditions across experiments. While Experiment 2 demonstrated a significant reduction in processing rate towards self-stimuli, there was a notable trend in Experiment 1 as well.

      b. Condition-specific parameters: The condition-specific C parameters, though presenting a small effect size, significantly improved model fit. Inspecting the HDI ranges of our estimated C parameters indicates a high probability (85-89%) that processing capacity increased due to social associations, suggesting that even small changes (~2Hz) can hold meaningful implications within the context of attentional selection.

      Please also note that the main conclusions about relative salience (self/other, salient/non-salient) are based on the relative processing rates. Processing rates are the product of the processing capacity (condition- but not stimulus dependent) and the attentional weight (condition and stimulus dependent). The latter is crucial to judge the *relative* advantage of the salient stimulus. Hence, the self-/salient stimulus advantage that is reflected in the ‘processing rate difference’ is automatically also reflected in the relative attentional weights attributed to the self/other and salient/non-salient stimuli. As such, the overall results of an automatic relative advantage of self-associated stimuli hold, independently of the change in overall processing capacity.

      c. Correlations: Regarding the correlations the reviewer noted, we wish to clarify that these were exploratory, and not the primary focus of our research. The aim of these exploratory analyses was to gauge the contribution of attentional selection to matching-based SPEs. As SPEs measured via the matching task are typically based on multiple different levels of processing, the contribution of early attentional selection to their overall magnitude was unclear. Without being able to gauge the possible effect sizes, corrected analyses may prevent detecting small but meaningful effects. As such, the effect sizes reported serve future studies to estimate power a priori and conduct well-powered replications of such exploratory effects. Additionally, Bayes factors were provided to give an appreciation of the strength of the evidence, all suggesting at least moderate evidence in favour of a correlation. Lastly, please note that effects that were measured within individuals and task (processing rate increase in social and perceptual decision dimensions in the TOJ task) showed consistent patterns, suggesting that the modulations within tasks were highly predictive of each other, while the modulations between tasks were not as clearly linked. We will add this clarification to the revised manuscript.

      d. Unexpected results: The unexpected results concerning the processing rates of other-associated versus self-associated stimuli certainly warrant further discussion. We believe that the additional processing steps required for social judgments, reflected in enhanced reaction times, may explain the slower processing of self-associated stimuli in that dimension. We agree that not all findings will align with initial hypotheses, and this variability presents avenues for further research. We have added this to the discussion of social salience effects.

      e. Whether association strength can account for the findings: We appreciate the scepticism regarding the strength of self-association with shapes. However, our within-participant design and control matching task indicate that the relative processing advantage for self-associated stimuli holds across conditions. This makes the scenario that “the self-association with shapes was not strong enough to demonstrate attention bias” very unlikely. Firstly, the relative processing advantage of self-associated stimuli in the perceptual decision condition, and the absence of such advantage in the social decision condition, were evidenced in the same participants. Hence, the strength of association between shapes and social identities was the same for both conditions. However, we only find an advantage for the self-associated shape when participants make perceptual (shape) judgements. It is therefore highly unlikely that the “association strength” can account for the difference in the outcomes between the conditions in experiment 1. Also, note that the order in which these conditions were presented was counter-balanced across participants, reducing the possibility that the automatic self-advantage was merely a result of learning or fatigue. Secondly, all participants completed the standard matching task to ascertain that the association between shapes and identities did indeed lead to processing advantages (across different levels).
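      The relation described under point b, in which each stimulus's processing rate is the product of the processing capacity C and its relative attentional weight, can be sketched in a few lines of Python. This is an illustrative sketch rather than the hierarchical Bayesian model used in the study; the function name `processing_rates` and all numerical values are hypothetical.

```python
def processing_rates(capacity_hz, weights):
    """Split an overall processing capacity (in Hz) across stimuli.

    Each stimulus receives capacity * (its weight / sum of all weights),
    so the rates always add up to the overall capacity.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("attentional weights must sum to a positive value")
    return {name: capacity_hz * w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical weights and capacities, for illustration only.
    weights = {"self": 0.55, "other": 0.45}
    baseline = processing_rates(60.0, weights)
    higher_capacity = processing_rates(62.0, weights)
    for label, rates in (("C = 60 Hz", baseline), ("C = 62 Hz", higher_capacity)):
        share = rates["self"] / (rates["self"] + rates["other"])
        print(label, rates, "self share:", share)
```

      Raising the capacity scales both rates by the same factor, so the share of processing allotted to the self-associated stimulus is set by the weights alone. This is why the relative self-advantage can be read independently of any change in overall capacity.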

      In summary, we believe that the evidence we provide supports the final conclusions. We do, of course, welcome any further empirical evidence that could enhance our understanding of the contribution of different processing levels to the SPE and are committed to exploring these areas in future work.

      (2) The generalization of the findings needs further examination. The current results seem to rely heavily on the perceptual matching task. Whether this attentional selection mechanism of self-prioritization can be generalized to other stimuli, such as self-name, self-face, or other domains of self-association advantages, remains to be tested. In other words, more converging evidence is needed.

      The reviewer indicates that the current findings heavily rely on the perceptual matching task, and it would be more convincing to include other paradigm(s) and different types of stimuli. We are happy to address these points here: first, we specifically used a temporal order paradigm to tap into specific processes, rather than merely relying on the matching task. Attentional selection is, along with other processes, involved in matching, but the TOJ-TVA approach allows tapping into attentional selection specifically.  Second, self-prioritization effects have been replicated across a wide range of stimuli (e.g. faces: Wozniak et al., 2018; names or owned objects: Scheller & Sui, 2022a, or even fully unfamiliar stimuli: Wozniak & Knoblich, 2019) and paradigms (e.g. matching task: Sui et al., 2012; cross-modal cue integration: e.g. Scheller & Sui, 2022b; Scheller et al., 2023; continuous flash suppression: Macrae et al., 2017; temporal order judgment: Constable et al., 2019; Truong et al., 2017). Using neutral geometric shapes, rather than faces and names, addresses a key challenge in self research: mitigating the influence of stimulus familiarity on results. In addition, these newly learned, simple stimuli can be combined with other paradigms, such as the TOJ paradigm in the current study, to investigate the broader impact of self-processing on perception and cognition.

      To the best of our knowledge, this is the first study showing evidence about the mechanisms that are involved in early attentional selection of socially salient stimuli. Future replications and extensions would certainly be useful, as with any experimental paradigm.

      (3) The comparison between the "social" and "perceptual" tasks remains debatable, as it is challenging to equate the levels of social salience and perceptual salience. In addition, these two tasks differ not only in terms of social decoding processes but also in other aspects such as task difficulty. Whether the observed differences between the tasks can definitively suggest the specificity of social decoding, as the authors claim, needs further confirmation.

      Equating the levels of social and perceptual salience is indeed challenging, but not an aim of the present study. Instead, the present study directly compares the mechanisms and effects of social and perceptual salience, specifically experiment 2. By manipulating perceptual salience (relative colour) and social salience (relative shape association) independently and jointly, and quantifying the effects on processing rates, our study allows to directly delineate the contributions of each of these types of salience. The results suggest additive effects (see also Figure 7). Indeed, the possibility remains that these effects are additive because of the use of different perceptual features, so it would be helpful for future studies to explore whether similar perceptual features lead to (supra-/sub-) additive effects. In either case, the study design allows to directly compare the effects and mechanisms of social and perceptual salience.

      Regarding the social and perceptual decision dimensions, they were not expected to be equated. Indeed, the social decision dimension requires additional retrieval of the associated identity, making it likely more challenging. This additional retrieval is also likely responsible for the slower responses towards the social association compared to the shape itself. However, the motivation to compare the effects of these two decisional dimensions lies in the assumption that the self needs to be task-relevant. Some evidence suggests that this is required to induce self-prioritization effects (e.g., Woźniak & Knoblich, 2022). However, these studies typically used matching tasks and were powered to detect large effects only (e.g. f = 0.4, n = 18). Because removing the contribution of decisional processing levels (which interact with task-relevance) is likely to reduce the SPE, smaller self-prioritization effects that result from earlier processing levels may not be detected with sufficient statistical power. Targeting specific processing levels, especially those with relatively early contributions or small effect sizes, requires larger samples (here: n = 70) to provide sufficient power. Indeed, by contrasting the relative attentional selection effects in the present study, we find that the self does not need to be task-relevant to produce self-prioritization effects. This is in line with recent findings of prior entry of self-faces (Jubile & Kumar, 2021).

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      Liu et al., present an immersion objective adapter design called RIM-Deep, which can be utilized for enhancing axial resolution and reducing spherical aberrations during inverted confocal microscopy of thick cleared tissue.

      Strengths:

      RI mismatches present a significant challenge to deep tissue imaging, and developing a robust immersion method is valuable in preventing losses in resolution. Liu et al., present data showing that RIM-Deep is suitable for tissue cleared with two different clearing techniques, demonstrating the adaptability and versatility of the approach.

      Thank you for your feedback. In fact, we used three distinct clearing techniques (iDISCO, CUBIC, and MACS) to substantiate the adaptability and versatility of the RIM-Deep adapter.

      Weaknesses:

      Liu et al., claim to have developed a useful technique for deep tissue imaging, but in its current form, the paper does not provide sufficient evidence that their technique performs better than existing ones.

      We fully agree with your recommendation. The additional experiments will thoroughly compare the RIM-Deep adapter with the official adapter in fluorescence bead experiments, as well as their performance with the CUBIC and MACS tissue clearing techniques.

      Reviewer #2 (Public review):

      The authors used different clearing methods to demonstrate the suitability of RIM-Deep for various sample preparation protocols with clearing solutions of different refractive indices. They clearly demonstrate that the RIM-Deep chamber is compatible with all three methods. Brain samples are characterized by complex networks of cells and are often hard to visualize. Despite the dense, complex structure of brain tissue, the RIM-Deep method generated high-quality images of all three samples. As the authors stated, increasing imaging depth often goes hand in hand with purchasing expensive new equipment, exchanging several microscopy parts, or purchasing a new microscopy setup. Innovations like the RIM-Deep chamber might pave the way for cost-effective imaging and expand the applicability of inverted confocal microscopy.

      Weaknesses:

      (1) However, since this study introduces a novel imaging technique aiming to revolutionize imaging of large samples, additional control experiments would strengthen the data. From the three clearing protocols used (CUBIC, MACS, and iDISCO), only the brain section from Macaca fascicularis cleared with iDISCO was imaged with the standard chamber and the RIM-Deep method. This comparison indeed shows a more than 2-fold increase in imaging depth, a significant enhancement in microscopy. However, it would have been important to evaluate and show the imaging depth differences in the other two samples, as they were cleared with different protocols and treated with clearing solutions of different refractive indices compared to iDISCO.

      Thank you for your suggestion. We will investigate the imaging performance of brain tissue using the other two clearing protocols with both the official adapter and the RIM-Deep method.

      (2) The description of the figures and figure panels should be improved for a better understanding of the experiments performed and the resulting images/data.

      Thank you for your suggestion. We will revise the figure legends in detail.

      (3) While the authors used a Nikon AX inverted laser scanning confocal microscope, the study would benefit from evaluating the performance of the RIM-Deep method using other inverted confocal microscopes or even wide-field microscopes.

      Thank you for your suggestion. We also recognize that evaluating the performance of the RIM-Deep method on other inverted confocal microscopes will help further validate its applicability and robustness. We will supplement these experiments to expand the scope and reliability of RIM-Deep.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In their manuscript, the authors propose a learning scheme to enable spiking neurons to learn the appearance probability of inputs to the network. To this end, the neurons rely on error-based plasticity rules for feedforward and recurrent connections. The authors show that this enables the networks to spontaneously sample assembly activations according to the occurrence probability of the input patterns they respond to. They also show that the learning scheme could explain biases in decision-making, as observed in monkey experiments. While the task of neural sampling has been solved before in other models, the novelty here is the proposal that the main drivers of sampling are within-assembly connections, and not between-assembly (Markov chains) connections as in previous models. This could provide a new understanding of how spontaneous activity in the cortex is shaped by synaptic plasticity. 

      The manuscript is well written and the results are presented in a clear and understandable way. The main results are convincing, concerning the spontaneous firing rate dependence of assemblies on input probability, as well as the replication of biases in the decision-making experiment. Nevertheless, the manuscript and model leave open several important questions. The main problem is the unclarity, both in theory and intuitively, of how the sampling exactly works. This also makes it difficult to assess the claims of novelty the authors make, as it is not clear how their work relates to previous models of neural sampling. 

      We agree with the reviewer that our previous manuscript was not clear regarding the mechanism of the model. We have performed additional simulations and included a derivation of the learning rule to address this, which we explain below.

      Regarding the unclarity of the sampling mechanism, the authors state that within-assembly excitatory connections are responsible for activating the neurons according to stimulus probability. However, the intuition for this process is not made clear anywhere in the manuscript. How do the recurrent connections lead to the observed effect of sampling? How exactly do assemblies form from feedforward plasticity? This intuitive unclarity is accompanied by a lack of formal justification for the plasticity rules. The authors refer to a previous publication from the same lab, but it is difficult to connect these previous results and derivations to the current manuscript. The manuscript should include a clear derivation of the learning rules, as well as an (ideally formal) intuition of how this leads to the sampling dynamics in the simulation. 

      We have included a derivation of our plasticity rules in lines 871-919 in the revised manuscript. Consistent with our claim that predictive plasticity updates the feedforward and the recurrent synapses to predict output firing rates, we have shown that the corresponding cost function measures the discrepancy among the recurrent prediction, the feedforward prediction, and the output firing rate. The resultant feedforward plasticity is the same as our previous rule (Asabuki and Fukai, 2020), which segments the salient patterns embedded in the input sequence. The recurrent plasticity rule suggests that the recurrent prediction learns the statistical model of the evoked activity, enabling the network to replay the learned internal model.

      Similarly, for the inhibitory plasticity, we defined a cost function that evaluates the difference between the firing rate and inhibitory potential within each neuron. This rule is crucial for maintaining balanced network dynamics. See our response below for more details on the role of inhibitory plasticity.

      Some of the model details should furthermore be cleared up. First, recurrent connections transmit signals instantaneously, which is implausible. Is this required, would the network dynamics change significantly if, e.g., excitation arrives slightly delayed? Second, why is the homeostasis on h required for replay? The authors show that without it the probabilities of sampling are not matched, but it is not clear why, nor how homeostasis prevents this. Third, G and M have the same plasticity rule except for G being confined to positive values, but there is no formal justification given for this quite unusual rule. The authors should clearly justify (ideally formally) the introduction of these inhibitory weights G, which is also where the manuscript deviates from their previous 2020 work. My feeling is that inhibitory weights have to be constrained in the current model because they have a different goal (decorrelation, not prediction) and thus should operate with a completely different plasticity mechanism. The current manuscript doesn't address this, as there is no overall formal justification for the learning algorithm. 

      First, while the reviewer's suggestion to test with delayed excitation is intriguing and crucial for a more biologically detailed spiking neuron model, we have chosen to maintain the current model configuration. Our use of Poisson spiking neurons, which generate spikes based on instantaneous firing rates, does not heavily depend on precise spike timing information. Therefore, to preserve the simplicity of our results, we kept the model unchanged.

      Second, we agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b in the revised manuscript, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we have revised our claim in the manuscript to clarify that the memory trace is primarily critical for firing rate homeostasis, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      Third, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in decorrelation and prediction, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll.560-593 in the revised manuscript.

      Finally, the authors should make the relation to previous models of sampling and error-based plasticity more clear. Since there is no formal derivation of the sampling dynamics, it is difficult to assess how they differ exactly from previous (Markov-based) approaches, which should be made more precise. Especially, it would be important to have concrete (ideally experimentally testable) predictions on how these two ideas differ. As a side note, especially in the introduction (line 90), this unclarity about the sampling made it difficult to understand the contrast to Markovian transition models. 

      As the reviewer pointed out, previous computational models have demonstrated that recurrent networks with Hebbian-like plasticity can learn appropriate Markovian statistics (Kappel et al., 2014; Asabuki and Clopath, 2024). However, our model differs conceptually from these previous models. While Kappel et al. showed that STDP in winner-take-all circuits can approximate online learning of hidden Markov models (HMMs), a key difference from our model is that their neural representations acquire sequences using Markovian sampling dynamics, whereas our model does not depend on such ordered sampling. Specifically, in their model, sequential sampling arises from learned structures in the off-diagonal elements of the recurrent connections (i.e., between-assembly connections). In contrast, our network learns to stochastically generate recurrent cell assemblies by relying solely on within-assembly connections. A similar argument can be made for the Asabuki and Clopath paper as well. Further, while our model introduced plasticity rules for all types of connections, the Asabuki and Clopath paper introduced plasticity only for recurrent synapses projecting onto the excitatory neurons, and the cell assembly memberships were preconfigured, unlike in our model. We have added clarifying sentences in ll. 757-772 of the revised manuscript to elaborate on this point.

      There are also several related models that have not been mentioned and should be discussed. In 663 ff. the authors discuss the contributions of their model which they claim are novel, but in Kappel et al (STDP Installs in Winner-Take-All Circuits an Online Approximation to Hidden Markov Model Learning) similar elements seem to exist as well, and the difference should be clarified. There is also a range of other models with lateral inhibition that make use of error-based plasticity (most recently reviewed in Mikulasch et al, Where is the error? Hierarchical predictive coding through dendritic error computation), and it should be discussed how the proposed model differs from these. 

      We have clarified how our model differs from previously proposed recurrent network models that perform Markovian sampling. Please see our reply above.

      We have also included an additional sentence in ll. 704-709 in the revised manuscript to discuss how our model differs from similar predictive learning models: “It should be noted that while several network models that perform error-based computations like ours exploit only inhibitory recurrent plasticity (Mikulasch et al., 2021; Mackwood et al., 2021; Hertäg and Clopath., 2022; Mikulasch et al., 2023), our model learns the structured spontaneous activity to reproduce the evoked statistics by modifying both excitatory and inhibitory recurrent connections.”

      Reviewer #2 (Public Review):

      Summary: 

      The paper considers a recurrent network with neurons driven by external input. During the external stimulation predictive synaptic plasticity adapts the forward and recurrent weights. It is shown that after the presentation of constant stimuli, the network spontaneously samples the states imposed by these stimuli. The probability of sampling stimulus x^(i) is proportional to the relative frequency of presenting stimulus x^(i) among all stimuli i=1,..., 5. 

      Methods: 

      Neuronal dynamics: 

      For the main simulation (Figure 3), the network had 500 neurons, and 5 non-overlapping stimuli, each activating 100 different neurons, were presented. The voltage u of the neurons is driven by the forward weights W via input rates x, the inhibitory recurrent weights G, which are restricted to be non-negative (Dale's law), and the other recurrent weights M, which have no sign restrictions. Neurons were spiking with an instantaneous Poisson firing rate, and each spike triggered an exponentially decaying postsynaptic voltage deflection. Neglecting time constants of the postsynaptic responses, the expected postsynaptic voltage reads (in vectorial form) as 

      u = W x + (M - G) f (Eq. 5) 

      where f ≈ phi(u) represents the instantaneous Poisson rate, and phi a sigmoidal nonlinearity. The rate f is only an approximation (symbolized by ≈) of phi(u) since an additional regularization variable h enters (taken up in Point 4 below). The initialisation of W and M is Gaussian with mean 0 and variance 1/sqrt(N), N the number of neurons in the network. The initial entries of G are all set to 1/sqrt(N). 

      Predictive synaptic plasticity: 

      The 3 types of synapses were each adapted so that they individually predict the postsynaptic firing rate f, in matrix form 

      ΔW ≈ (f - phi( W x ) ) x^T 

      ΔM ≈ (f - phi( M f ) ) f^T 

      ΔG ≈ (f - phi( M f ) ) f^T but confined to non-negative values of G (Dale's law). 

      The ^T tells us to take the transpose, and the ≈ again refers to the fact that the ϕ entering in the learning rule is not exactly the ϕ determining the rate, only up to the regularization (see Point 4). 
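      The three update rules summarized above can be sketched in a few lines of plain Python. This is an illustration of the reviewer's summary only, not the authors' simulation code: the logistic form of phi, the learning rate `eta`, and the helper names `matvec` and `predictive_update` are assumptions made for the example, the regularization through h is omitted, and each weight matrix is assumed to use its own prediction phi(A pre) of the rate f.

```python
import math

def phi(v):
    """Sigmoidal nonlinearity (logistic form chosen for illustration)."""
    return 1.0 / (1.0 + math.exp(-v))

def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def predictive_update(A, pre, f, eta, nonneg=False):
    """Return A + eta * (f - phi(A pre)) pre^T.

    pre    : presynaptic activity (x for W, f for M and G)
    f      : postsynaptic firing rates to be predicted
    nonneg : if True, clip weights at zero (the Dale's-law constraint on G)
    """
    pred = [phi(p) for p in matvec(A, pre)]
    updated = []
    for i, row in enumerate(A):
        err = f[i] - pred[i]  # prediction error of postsynaptic neuron i
        new_row = [a + eta * err * p for a, p in zip(row, pre)]
        if nonneg:
            new_row = [max(0.0, a) for a in new_row]
        updated.append(new_row)
    return updated

# Hypothetical toy usage: one output neuron, two inputs, target rate 0.9.
W = [[0.0, 0.0]]
for _ in range(500):
    W = predictive_update(W, pre=[1.0, 0.0], f=[0.9], eta=0.5)
print(phi(matvec(W, [1.0, 0.0])[0]))  # prediction approaches the target rate
```

      In this toy setting only the weight from the active input changes, and the non-negativity clip is the sole difference between the updates for M and G, which is the asymmetry discussed under Point 1 below.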

      Main formal result: 

      As the authors explain, the forward weight W and the unconstrained weight M develop such that, in expectations, 

      f ≈ phi( W x ) ≈ phi( M f ) ≈ phi( G f ) , 

      consistent with the above plasticity rules. Some elements of M remain negative. In this final state, the network displays the behaviour as explained in the summary. 

      Major issues: 

      Point 1: Conceptual inconsistency 

      The main results seem to arise from unilaterally applying Dale's law only to the inhibitory recurrent synapses G, but not to the excitatory recurrent synapses M. 

      In fact, if the same non-negativity restriction were also imposed on M (as it is on G), then their learning rules would become identical, likely leading to M=G. But in this case, the network becomes purely forward, u = W x, and no spontaneous recall would arise. Of course, this should be checked in simulations. 

      Because Dale's law was only applied to G, however, M and G cannot become equal, and the remaining differences seem to cause the effect. 
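      The cancellation argument can be checked numerically with a minimal sketch of Eq. 5. The function name `voltage` and the toy numbers are hypothetical; the sketch only evaluates the expected voltage for given rates f and does not simulate spiking or learning.

```python
def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def voltage(W, M, G, x, f):
    """Expected voltage u = W x + (M - G) f (Eq. 5, time constants neglected)."""
    wx, mf, gf = matvec(W, x), matvec(M, f), matvec(G, f)
    return [a + (b - c) for a, b, c in zip(wx, mf, gf)]

# Toy example (assumed values): if M equals G, recurrence cancels exactly.
W = [[1.0, 2.0], [0.5, -1.0]]
R = [[0.3, 0.1], [0.2, 0.4]]   # used for both M and G
x = [1.0, 1.0]
f = [0.7, 0.2]

u_equal = voltage(W, R, R, x, f)               # equals W x
u_spont = voltage(W, R, R, [0.0, 0.0], f)      # no external drive: all zeros
print(u_equal, u_spont)
```

      With M = G the voltage reduces to W x under stimulation and to zero without input, so any spontaneous recall must come from a residual difference between M and G.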

      Predictive learning rules are certainly powerful, and it is reasonable to consider the same type of error-correcting predictive learning rule, for instance for different dendritic branches that both should predict the somatic activity. Or one may postulate the same type of error-correcting predictive plasticity for inhibitory and excitatory synapses, but then the presynaptic neurons should not be identical, as it is assumed here. Both these types of error-correcting and error-forming learning rules for same-branches and inhibitory/excitatory inputs have been considered already (but with inhibitory input being itself restricted to local input, for instance). 

      The model presented above lacked biological plausibility in several key aspects. Specifically, we assumed that the recurrent connection M could change sign through plasticity and be either excitatory or inhibitory, while the inhibitory connection G was restricted to being inhibitory only. This initial setting does not reflect the biological constraint that synapses typically maintain a consistent excitatory or inhibitory type. Furthermore, due to this unconstrained recurrent connectivity M, the original model had two types of inhibitory connections (i.e., the negative part of M and the inhibitory connection G) without providing a clear computational role for each type of inhibition.

      To address these limitations and to understand the role of the two types of inhibition, we explored a new architecture where all recurrent connections are either exclusively excitatory or inhibitory, keeping their sign throughout the learning process. This change addresses the reviewer's concern about our initial assumption that only the inhibitory connection G was constrained to non-negative values. We found that inhibition plays a crucial role in prediction and decorrelation, helping activate specific assemblies through competition while preventing runaway excitation within active assemblies. We have explained this in ll. 561-593 in the revised manuscript.

      Point 2: Main result as an artefact of an inconsistently applied Dale's law? 

      The main result shows that the probability of a spontaneous recall for the 5 non-overlapping stimuli is proportional to the relative time the stimulus was presented. This is roughly explained as follows: each stimulus pushes the activity from 0 up towards f ≈ phi( W x ) by the learning rule (roughly). Because the mean weights W are initialized to 0, a stimulus that is presented longer will have more time to push W up so that positive firing rates are reached (assuming x is non-negative). The recurrent weights M learn to reproduce these firing rates too, while the plasticity in G tries to prevent that (by its negative sign, but with the restriction to non-negative values). Stimuli that are presented more often, on average, will have more time to reach the positive target and hence will form a stronger and wider attractor. In spontaneous recall, the size of the attractor reflects the time of the stimulus presentation. This mechanism so far is fine, but the only problem is that it is based on restricting G, but not M, to non-negative values. 

      As mentioned above, we have included an additional simulation where all weights are non-negative. We have demonstrated the new results in Figure 6 before presenting the two-population model in the revised manuscript (Figure 7), so that readers can follow the importance of two pathways of inhibitory connections.

      Point 3: Comparison of rates between stimulation and recall. 

      The firing rates with external stimulations will be considerably larger than during replay (unless the rates are saturated). 

      This is a prediction that should be tested in simulations. In fact, since the voltage roughly reads as u = W x + (M - G) f, and the learning rules are such that eventually M ≈ G, the recurrences roughly cancel and the voltage is mainly driven by the external input x. In the state of spontaneous activity without external drive, one has u = (M - G) f, and this should generate considerably smaller instantaneous rates f ≈ phi(u) than in the case of the feedforward drive (unless f is in both cases at the upper or lower ceiling of phi). This is a prediction that can also be tested. 

      Because the figures mostly show activity ratios or normalized activities, it was not possible for me to check this hypothesis with the current figures. So please show non-normalized activities for comparing stimulation and recall for the same patterns. 

      We agree with the reviewer that the activity levels of spontaneous and evoked activity should be compared. We now show the distributions of activity levels for both conditions in our new Figure 2d. As expected, the evoked activity was stronger than the spontaneous activity.

      Point 4: Unclear definition of the variable h. 

      The formal definition of h = h_i is given by (suppressing here the neuron index i and the h-index of tau) 

      tau dh/dt = -h   if h > u, 
      h = u            otherwise.   (Eq. 10)

      But if it is only Equation 10 (nothing else is said), h will always become equal to u, or will vanish, i.e. either h=u or h=0 after some initial transient. In fact, as soon as h>u, h is decaying to 0 according to the first line. If u is >0, then it stops at u=h according to the second line. No reason to change h=u further. If u<=0 while h>u, then h is converging to 0 according to the first line and will stay there. I guess the authors had issues with the recurrent spiking simulations and tried to fix this with some regularization. However as presented, it does not become clear how their regulation works. 
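      The behaviour described in this point can be reproduced with a small Euler integration of Eq. 10 for a constant voltage u. This is a sketch of the reviewer's reading of the equation, not the authors' implementation: the function names, the step size, the initial value of h and the assumption of a constant u are all illustrative, while tau = 10 s follows the value of tau_h reported by the authors.

```python
def step_trace(h, u, dt, tau):
    """One Euler step of Eq. 10: decay towards 0 while h > u, else set h = u."""
    if h > u:
        h += dt * (-h / tau)
    return max(h, u)  # never let the trace fall below the voltage

def run_trace(h0, u, t_end=100.0, dt=0.01, tau=10.0):
    """Integrate the trace for a constant voltage u and return the final h."""
    h = h0
    for _ in range(int(round(t_end / dt))):
        h = step_trace(h, u, dt, tau)
    return h

h_pos = run_trace(h0=2.0, u=0.5)    # positive voltage: h settles at u
h_neg = run_trace(h0=2.0, u=-1.0)   # negative voltage: h decays towards 0
print(h_pos, h_neg)
```

      Under a constant voltage the trace therefore ends at h = u or approaches h = 0, as the reviewer argues; with a fluctuating u, as in the spiking network, h would instead be reset upward whenever u exceeds it.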

      We apologize that our definition of h was unclear. As the reviewer pointed out, since the memory trace is always positive and larger than (or equal to) the membrane potential, it is possible in principle that the membrane potential stays negative and the memory trace decays to 0. However, because the network is always balanced between excitatory and inhibitory inputs, the membrane potential does not persistently diverge to negative values. In fact, we trained the network without any manipulations other than the memory trace described in the manuscript, and it was able to learn the assembly structure stably. 

      BTW: In Eq. 11 the authors set the gain beta to beta = beta0/h which could become infinite and, putatively more problematic, negative, depending on the value of h. Maybe some remark would convince a reader that no issues emerge from this. 

      We have mentioned in ll. 864-866 in the revised manuscript that no issues emerge from the slope parameter.

      Added from discussions with the editor and the other reviewers: 

      Thanks for alerting me to this Supplementary Figure 8. Yes, it looks like the authors did apply there Dale's law for both the excitatory and inhibitory synapses. Yet, they also introduced two types of inhibitory pathways converging both to the excitatory and inhibitory neurons. For me, this is a confirmation that applying Dale's law to both excitatory and inhibitory synapses, with identical learning rules as explained in the main part of the paper, does not work. 

      Adding such two pathways is a strong change from the original model as introduced before, and based on which all the Figures in the main text are based. Supplementary Figure 8 should come with an analysis of why a single inhibitory pathway does not work. I guess I gave the reason in my Points 1-3. Some form of symmetry breaking between the recurrent excitation and recurrent inhibition is required so that, eventually, the recurrent excitatory connection will dominate. 

      Making the inhibitory plasticity less expressive by applying Dale's law to only those inhibitory synapses seems to be the answer chosen in the Figures of the main text (but then the criticism of unilaterally applying Dale's law). 

      Applying Dale's law to both types of synapses, but dividing the labor of inhibition into two strictly separate and asymmetric pathways, and hence asymmetric development of excitatory and inhibitory weights, seems to be another option. However, introducing such two separate inhibitory pathways, just to rescue the fact that Dale's law is applied to both types of synapses, is a bold assumption. Is there some biological evidence of such two pathways in the inhibitory, but not the excitatory connections? And what is the computational reasoning to have such a separation, apart from some form of symmetry breaking between excitation and inhibition? I guess, simpler solutions could be found, for instance by breaking the symmetry between the plasticity rules for the excitatory and inhibitory neurons. All these questions, in my view, need to be addressed to give some insights into why the simulations do work. 

      The reviewer’s intuition is correct. To effectively learn cell assembly structures and replay their activities, our model indeed requires two types of inhibitory connections. Please refer to our response above for further details. 

      Overall, Supplementary Figure 8 seems to me too important to be deferred to the Supplement. The reasoning behind the two inhibitory pathways should appear more prominently in the main text. Without this, important questions remain. For instance, when thinking in a rate-based framework, the two inhibitory pathways twice try to explain the somatic firing rate away. Doesn't this lead to a too strong inhibition? Can some steady state with a positive firing rate caused by the recurrence, in the absence of an external drive, be proven? The argument must include the separation into Path 1 and Path 2. So far, this reasoning has not been entered. 

      In fact, it might be that, in a spiking implementation, some sparse spikes will survive. I wonder whether at least some of these spikes survive because of the other rescuing construction with the dynamic variable h (Equation 10, which is not transparent, and that is not taken up in the reasoning either, see my Point 4).

      Perhaps it is helpful for the authors to add this text in the reply to them. 

      We have moved the former Supplemental Figure 8 to the main Figure 7. Please see our response above about the role of dual inhibitory connection types.

      Reviewer #3 (Public Review): 

      Summary: 

      The work shows how learned assembly structure and its influence on replay during spontaneous activity can reflect the statistics of stimulus input. In particular, stimuli that are more frequent during training elicit stronger wiring and more frequent activation during replay. Past works (Litwin-Kumar and Doiron, 2014; Zenke et al., 2015) have not addressed this specific question, as classic homeostatic mechanisms forced activity to be similar across all assemblies. Here, the authors use a dynamic gain and threshold mechanism to circumnavigate this issue and link this mechanism to cellular monitoring of membrane potential history. 

      Strengths: 

      (1) This is an interesting advance, and the authors link this to experimental work in sensory learning in environments with non-uniform stimulus probabilities. 

      (2) The authors consider their mechanism in a variety of models of increasing complexity (simple stimuli, complex stimuli; ignoring Dale's law, incorporating Dale's law). 

      (3) Links a cellular mechanism of internal gain control (their variable h) to assembly formation and the non-uniformity of spontaneous replay activity. Offers a promise of relating cellular and synaptic plasticity mechanisms under a common goal of assembly formation. 

      Weaknesses: 

      (1) However, while the manuscript does show that assembly wiring does follow stimulus likelihood, it is not clear how the assembly-specific statistics of h reflect these likelihoods. I find this to be a key issue. 

      We agree that our previous claim regarding the importance of the memory trace h for sampling may have been confusing. As shown in Supplementary Figure 7b, when we eliminated the dynamics of the memory trace, sampling performance did indeed decrease. However, we also observed that the assembly activity ratio continued to show a linear relationship with stimulus probabilities. Based on these findings, we revised our claim in the manuscript to clarify that the memory trace is primarily critical for learning to avoid trivial solutions, rather than directly influencing sampling within the learned network. We have explained this in ll. 446-448 in the revised manuscript.

      (2) The authors' model does take advantage of the sigmoidal transfer function, and after learning an assembly is either fully active or nearly fully silent (Figure 2a). This somewhat artificial saturation may be the reason that classic homeostasis is not required since runaway activity is not as damaging to network activity. 

      The reviewer's intuition is correct. The saturating nonlinearity is important for the network to form stable assembly structures. We have added an additional sentence in ll. 866-868 to mention this.

      (3) Classic mechanisms of homeostatic regulation (synaptic scaling, inhibitory plasticity) try to ensure that firing rates match a target rate (on average). If the target rate is the same for all neurons then having elevated firing rates for one assembly compared to others during spontaneous activity would be difficult. If these homeostatic mechanisms were incorporated, how would they permit the elevated firing rates for assemblies that represent more likely stimuli? 

      LIF neurons) may solve this problem by utilizing spike-timing statistics.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Minor issues: 

      Figure 1: It would be helpful to display the equation for output rate here as well. 

      We have included the equation in the revised Figure 1a.

      Figure 3c: Typo "indivisual neurons". 

      We have corrected the typo. We thank the reviewer for their careful review.

      Line 325: Do you mean Figure 3f,g? 

      We repeated the task with different numbers of stimuli in Supplementary Figure 1c,d.

      Line 398: Winner-take-all can be misunderstood, as it typically stands for competition in inference, not in learning. 

      We have rephrased it as “unstable dynamics” in l. 400.

      Line 429: Are intra-assembly and within-assembly the same? If so please use consistent terminology. 

      We have made the terminology consistent.

      Line 792 ff.: Please mention that (t) was omitted. 

      We have included a sentence to mention it in ll. 847-848 in the revised manuscript.

      Line 817: Should u_i be v_i? 

      We have modified the term.

      Methods: What is the value of tau_h? 

      We have used τ_h = 10 s, which is mentioned in l. 853.

    1. Author response:

      We are appreciative of the editors’ and reviewers’ positive comments and constructive suggestions, which will help us to improve our manuscript. We will make changes as required by the reviewers. Our primary focus will be on revising and clarifying certain aspects:

      First, recent research has revealed a strong correlation between brain synchronization and group decision-making, a key neural marker. We aim to bolster our hypothesis by reviewing additional literature, ensuring accuracy in terminology and appropriateness in phrasing.

      Second, it is crucial to note that we will include additional methodological details, such as the details of the experiment, the significance of individual difference variables, and the details of the data analyses.

      Third, despite introducing a novel perspective in our study, we acknowledge the utilization of the conventional fNIRS hyperscanning analyses, which are widely accepted within the research community. Our methodology entails the identification of significant channels via one-sample t-tests, subsequently complemented by either ANOVAs or independent sample t-tests, without the need for double dipping.

      We will address all the issues raised by the reviewers. We believe that the manuscript will significantly benefit from the insightful suggestions and invaluable contributions made by the editors and reviewers.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      In the second round of reviews, Reviewer 2 made three specific comments. The first comment criticises us for not including a set of equations they had requested in their first review. We did, in fact, include the requested equations in the Supplementary Information of our revised submission; they were also cited in the main text of the revised manuscript, and the changes were made clear in our response to the reviewer. In the second comment, the reviewer suggested adding one word to a sentence in the abstract. We have made this change (line 23). In the third comment, the reviewer highlights a sentence where we agree we could have been clearer. The sentence can be rectified by adding one word, which we have done (line 232). We believe the changes required to our manuscript are very minor, and we have implemented these two suggested changes, which are highlighted in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1

      Overall, this work is quite comprehensive and is logically and rigorously designed. The phenotypic and functional data on 2C are strong.

      Thank you for your positive feedback on our findings!

      (1) Comment from Reviewer 1 suggesting the mechanistic insights of 2C are primarily derived from transcriptomic and genomic datasets without experimental verification. 

      Thank you for emphasizing the importance of experimental validation to support our transcriptomic and genomic findings. We acknowledge the gap in direct experimental evidence for the mechanistic insights of section 2C and recognize the value of such validation in strengthening our conclusions. However, our current dataset lacks the comprehensive preliminary results necessary for inclusion in the supplemental material. We believe that the mechanistic insights presented offer a substantial foundation for future research, where we aim to explore these aspects in depth with targeted experimental approaches.

      Reviewer 2

      Together their data may suggest a regenerative effect of 2C both in vitro and in vivo settings. If confirmed, this study might unlock therapeutic strategy for cardiac regeneration.

      Thank you for your positive comment on the significance of our findings and the valuable therapeutic potential of 2C in cardiac regeneration!

      (1) Comment from Reviewer 2 pointing out that the main hypothesis (line 50) that Isl1 cells have regenerative properties is not extremely novel. 

      We agree with the reviewer that Isl1-positive cells possess regenerative properties. Following the reviewer’s suggestion, we have revised the original wording (line 46 in the revised manuscript).

      (2) Comment from Reviewer 2 asking for a rationale for this 20x reduction of A-485 concentration. It would be useful to get a titration of this compound for the effects tested. 

      As suggested by the reviewer, we have added the titration results of A-485 in Figure 1—figure supplement 1F-G.

      (3) Comment from Reviewer 2 noting that it is difficult to understand what proportion of CMs dedifferentiate to become RCCs. The lineage tracing data suggests only 0.6%-1.5% of cells undergo this transition. It is difficult to understand how such a small fraction can have wide effects in their different experimental settings. This is specifically true when the author quantified nuclear and cytosolic area on brightfield pictures: would the same effect on nuclear/cytosolic area be observed in Isl1 KO cells? 

      We appreciate the reviewer's insightful observation on the proportion of CMs undergoing dedifferentiation into RCCs and the potential impact of this subset on our experimental outcomes. The lineage tracing data indicating that only 0.6%-1.5% of CMs transition to RCCs indeed reflects a modest proportion. This observation raises valid questions regarding the broader implications of such a limited fraction in the context of cardiac regeneration and the experimental effects reported. It's important to note that while the proportion of CMs dedifferentiating into RCCs is small, the biological significance and potential impact of these RCCs could be disproportionately large. Emerging evidence suggests that even a minimal number of stem or progenitor cells can exert significant effects on tissue repair and regeneration, possibly through paracrine mechanisms or by acting as key signaling centers within the tissue microenvironment (Fernandes et al., 2015). Regarding the specific question about 2C’s effects on nuclear/cytosolic area in Isl1 knockout (KO) cells, we appreciate the suggestion and consider that such comparative studies would provide valuable insights for comprehensively understanding the impact of 2C-induced RCCs in future research. In addition, ISL1 KO cells are described in detail in the article published in eLife in 2018 by Quaranta et al.

      (4) Comment from Reviewer 2 asking for the effect of CHIR + I-BET-762 alone. 

      As suggested by the reviewer, we have added the results of CHIR + I-BET-762 in Figure 1—figure supplement 1H.

      (5) Comment from Reviewer 2 suggesting a transparent explanation about the effects of A-485 on acetylation status.

      We thank the reviewer for highlighting the confusion regarding the effects of A-485 on the acetylation status of H3K27Ac and H3K9Ac. Upon re-examination of our data and statements, we recognize the need for clarity in our explanation and the inconsistency it may have caused (lines 223-231 on page 8).

      Initially, our observations suggested a selective effect of A-485 on H3K27Ac based on early experimental results (Figure 7—figure supplement 1). This conclusion was drawn from preliminary analyses that focused predominantly on this specific histone mark. However, upon further comprehensive examination of our data, including additional replicates and more sensitive detection methods, we observed that A-485 also impacts H3K9Ac levels (Figure 7—figure supplement 1F). This latter finding emerged from expanded datasets that were not initially considered in our preliminary conclusions.

      The "further analyses" mentioned referred to these subsequent experimental investigations, which included chromatin immunoprecipitation (ChIP) assays and extended sample sizes, providing a more robust dataset for evaluating the effects of A-485. We understand the importance of transparency and rigor in scientific communication. To address this, we have revised the manuscript to clearly delineate the progression of our analyses and the evidential basis for our revised understanding of A-485's effects. This includes a detailed description of the methodologies employed in our follow-up experiments (line 537 on page 27), the statistical approaches for data analysis (lines 226-227 in supporting information), and how these led to the updated interpretation regarding A-485's impact on histone acetylation (lines 232-269).

      (6) Comment from Reviewer 2 asking for the difference in the ChIP peaks representation of the y-axis on the ChIP traces.

      Thank you for raising this question. We did not normalise for sequencing depth, and the y-axis represents the number of counts (line 537 on page 27 and lines 226-227 in supporting information).
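
      For readers comparing tracks, the following minimal sketch illustrates the difference between the raw counts described above and a depth-normalised signal. It is not the authors' pipeline: the function name, the counts-per-million convention, and all numbers are assumptions chosen for illustration only.

```python
def counts_per_million(counts, total_mapped_reads):
    """Scale raw per-bin read counts to counts per million mapped reads (CPM).

    Hypothetical helper: the response states that raw counts were plotted,
    so this only shows what depth normalisation would change.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [c * scale for c in counts]


# Hypothetical example: identical raw counts from two libraries of different depth.
raw_counts = [10, 20, 40]
shallow = counts_per_million(raw_counts, 2_000_000)   # library with 2 M mapped reads
deep = counts_per_million(raw_counts, 8_000_000)      # library with 8 M mapped reads
print(shallow)
print(deep)
```

      With raw counts on the y-axis, peak heights from libraries of different depth are not directly comparable, which is why the axis scale can differ between traces.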

      (7) Comment from Reviewer 2 suggesting the possibility of testing this 2C protocol on mESCs to see whether similar changes occur, and asking how these mouse RCCs differ transcriptionally from Isl1+ progenitor cells isolated from neonatal mice (P1-P5).

      Thank you for your insightful questions. Testing the 2C protocol on mouse embryonic stem cells (mESCs) to observe if similar changes occur presents an excellent opportunity to further validate the versatility and applicability of our findings across different stem cell models. We agree that such experiments would not only strengthen the current study but also provide valuable insights into the conservation of mechanisms across species. We are currently in the process of setting up experiments to address this very question and anticipate that the results will significantly contribute to our understanding of cardiomyocyte differentiation processes. Regarding the transcriptional comparison between mouse regenerative cardiac cells (RCCs) induced by our 2C protocol and Isl1+ progenitors isolated from neonatal mice (P1-P5), this comparison is indeed crucial for delineating the specific identity and developmental potential of the RCCs generated. However, a comprehensive side-by-side transcriptomic analysis is required to systematically identify these differences and understand their biological implications. We plan to undertake this analysis as part of our future studies, which will include detailed RNA sequencing and comparative gene expression profiling to elucidate the transcriptional similarities and differences between these cell populations. These future directions will enhance our current findings, provide a deeper mechanistic understanding, and confirm the potential of the 2C protocol in regenerative medicine applications. We appreciate the reviewer's suggestions and acknowledge the importance of these experiments in advancing the field.

      (8) Comment from Reviewer 2 with a suggestion to have a precise clarification of statistics & data acquisition.

      As suggested by the reviewer, we have revised the statistical descriptions to make them clearer (lines 228-233 in supporting information, together with a precise description in each paragraph involving statistical analyses).

      Reviewer 3

      The findings may have a translation potential. The idea of promoting the regenerative capacity of the heart by reprogramming CMs into RCCs is interesting.

      Thank you for your appreciation of the significance and translational potential of our findings!

      (1) Comment from Reviewer 3 suggesting that the mechanism involved in the 2C-mediated generation of RCCs is unclear and that the leads found in the RNA-seq and ChIP-seq are not experimentally validated.

      We acknowledge the reviewer's concern regarding the lack of experimental validation for the mechanisms identified through RNA-seq and ChIP-seq analyses in the generation of RCCs from the 2C state. We understand the importance of substantiating these molecular leads with empirical data to strengthen our conclusions. Currently, our findings are based on in-depth bioinformatic analyses, which have provided us with valuable insights and a strong basis for hypothesis generation. Moving forward, we plan to prioritize experimental validation of key pathways and targets identified in our study. This will include designing targeted experiments to elucidate the functional roles of these mechanisms in the 2C-mediated generation of RCCs. We appreciate the opportunity to clarify our approach and future directions, and we are committed to addressing this gap in subsequent work.

      (2) Comment from Reviewer 3 considering that the very low number of RCCs (0.6%-1.5% of cells) generated cannot protect the heart from MI, and asking whether 2C affects the survival or metabolism of existing CMs under hypoxic conditions and what percentage of cells are regenerated by 2C treatment post-MI.

      We appreciate the reviewer's insightful queries regarding the protective effects of 2C treatment against myocardial infarction (MI) given the low percentage of RCCs generated. It is our hypothesis that the benefits of 2C treatment extend beyond mere cell numbers. We propose that 2C may enhance the survival and metabolic resilience of existing CMs under hypoxic conditions, thereby contributing to cardiac protection post-MI. Our future investigations will aim to quantify the precise percentage of cells regenerated by 2C treatment post-MI and explore its broader impacts on cardiac tissue survival and repair mechanisms.

      (3) Comment from Reviewer 3 suggesting the administration of 2C in mice, as well as whether 2C affects cardiac function under basal conditions and any physiology in mice, and the need to examine cardiac structural and functional parameters after administration of 2C.

      We appreciate the reviewer's interest in the potential effects of 2C administration on cardiac function and overall physiology in mice. While we observed a decrease in body weight at P5 compared to controls, our immunofluorescence staining did not indicate any changes in cardiac structure (Figure 4— figure supplement 1E). This suggests that while 2C administration impacts neonatal rat physiology, it does not adversely affect cardiac structure under basal conditions. Further investigations are planned to assess the functional parameters of the heart post-2C administration to comprehensively understand its effects.

      (4) Comment from Reviewer 3 suggesting the potential effects of 2C on other cell types of the heart, including fibroblasts and endothelial cells, in vitro and in vivo.

      We value the reviewer's suggestion to explore the effects of 2C on various cardiac cell types, including fibroblasts and endothelial cells, both in vitro and in vivo. We acknowledge the importance of understanding the broader impact of 2C treatment across different cell populations within the heart, given its potential protective effects. To address this, we are designing a series of experiments to assess 2C's influence on these cell types, aiming to elucidate any changes in their behavior, proliferation, and function following treatment. This comprehensive approach will allow us to better understand the mechanistic basis of 2C's cardioprotective effects.

      (5) Comment from Reviewer 3 suggesting validation of the effect of 2C in a dose-dependent manner.

      As suggested by the reviewer, we have added data on the dose-dependent effect of 2C (Figure 1— figure supplement 1F-G).

      (6) Comment from Reviewer 3 suggesting an explanation of how A-485 affects H3K27Ac and H3K9Ac.

      We appreciate the reviewer pointing out the discrepancy regarding the effects of A-485 on H3K27Ac and H3K9Ac. Upon re-examination of our data, we realize that our initial interpretation may have overlooked the broader impact of A-485 on histone acetylation patterns. It appears that A-485 does indeed influence both H3K27Ac and H3K9Ac, contrary to our initial statement. This oversight will be corrected in our revised manuscript, where we will provide a more detailed analysis and discussion of A-485's impact on these histone marks, alongside an explanation for the observed effects (lines 223-269 across page 8-9).

      (7) Comment from Reviewer 3 with a correction to use "regeneration" at the screening stage.

      As suggested by the reviewer, we have amended the wording in the text (line 66 on page 3).

      Reviewer 4

      Comment from Reviewer 4 suggesting more information that clarifies and justifies the hypothesis.

      As suggested by the reviewer, we added more information to clarify and justify the hypothesis (lines 39-47 on page 3).

      (1) Comment from Reviewer 4 pointing out the story line is not well developed.

      To address the reviewer’s question, we revised the manuscript to ensure a smooth and coherent logical flow.

      (2) Comment from Reviewer 4 pointing out the purpose in choosing to study ISL1-CMs.

      As raised by the reviewer, we have clarified the rationale for using ISL1 as a marker to define RCCs in the revised manuscript (lines 39-47 on page 3).

      (3) Comment from Reviewer 4 pointing out the missing references in row 57-58.

      Thank you for pointing this out, we fixed it.

      (4) Comment from Reviewer 4 suggesting more explanation and showing the results of the screening compounds.

      As suggested by the reviewer, we added additional explanations in lines 65-73 and showed the screening results in Figure 1—figure supplement 1F-H.

      (5) Comment from Reviewer 4 suggesting an in-depth discussion of the findings.

      Thank you for the suggestion, we included additional discussion at the end of the article.

      (6) Comment from Reviewer 4 suggesting that a conclusion should be included in the main text.

      Thank you for the suggestion, we made a revision.

      (7) Comment from Reviewer 4 pointing out the cell viability under different concentrations of 2C.

      As mentioned by the reviewer, we have added the cell numbers at different doses of 2C treatment (Figure 2F).

      (8) Comment from Reviewer 4 pointing out the missing information in the methods.

      Thank you for the suggestion, we made additions.

      (9) Comment from Reviewer 4 suggesting more explanations in Figure S3A.

      As mentioned by the reviewer, we made a revision in the original Fig. S3A (now Figure 2—figure supplement 1).

      (10) Comment from Reviewer 4 pointing out the high variability of mCherry cells (%) in Figure 3J.

      Thank you. We made a revision.

      (11) Comment from Reviewer 4 suggesting more explanations on the DNA-binding motif of ISL1 in the cells treated with A-485 or 2C.

      Thank you for the suggestion, we added additional explanations (lines 270-274 on page 9).

      (12) Comment from Reviewer 4 pointing out the unclear labeling in Figure S1B and D.

      Thank you for the suggestion, we made a revision (lines 240-245 in supporting information).

      (13) Comment from Reviewer 4 suggesting a relative quantification of the proteins in Figure 1H.

      Thank you for the suggestion. We have quantified the relative expression levels of the proteins in the original Fig. 1H, as shown in Figure 1F.

      (14) Comment from Reviewer 4 suggesting to provide detailed information in the methodology part about the compounds.

      Thank you for the suggestion, we made a revision.

      (15) Comment from Reviewer 4 pointing out the insufficient explanations on figure legends.

      Thank you for the suggestion, we made a revision.

      (16) Comment from Reviewer 4 suggesting more independent experiments to reduce the high variations in “ns” between NC and 2C at 60h+3d shown in Figure 2E and F.

      Thank you for the suggestion, we made a revision in Figure 2F.

      (17) Comment from Reviewer 4 suggesting that limitations should be provided in the text.

      Thank you for the suggestion, we have provided a limitations statement in the revised manuscript (lines 300-311 on page 10).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigates the impact of Clonal Hematopoiesis of Indeterminate Potential (CHIP) on Immune Checkpoint Inhibitor (ICI) therapy outcomes in NSCLC patients, analyzing blood samples from 100 patients pre- and post-ICI therapy for CHIP, and conducting single-cell RNA sequencing (scRNA-seq) of PBMCs in 63 samples, with validation in 180 more patients through whole exome sequencing. Findings show no significant CHIP influence on ICI response, but a higher CHIP prevalence in NSCLC compared to controls, and a notable CHIP burden in squamous cell carcinoma. Severely affected CHIP groups showed NF-kB pathway gene enrichment in myeloid clusters.

      Strengths:

      The study is commendable for analyzing a significant cohort of 100 patients for CHIP and utilizing scRNA-seq on 63 samples, showcasing the use of cutting-edge technology. The study tackles the vital clinical question of predicting ICI therapy outcomes in NSCLC.

      Weaknesses:

      The manuscript's comparison of CHIP prevalence between NSCLC patients and healthy controls could be strengthened by providing more detailed information on the control group. Specifically, details such as sex, smoking status, and comorbidities are needed to ensure the differences in CHIP are attributable to lung cancer rather than other factors. Including these details, along with a comparative analysis of demographics and comorbidities between both groups and clarifying how the control group was selected, would enhance the study's credibility and conclusions.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a large cohort of patients with metastatic lung cancer pre- and 1-3 weeks post-immunotherapy. The goal was to investigate whether immunotherapy results in changes in CHIP clones (using targeted sequencing and whole exome sequencing) as well as to investigate whether patients with CHIP changed their response to immunotherapy (single-cell RNA sequencing).

      Strengths:

      This represents a large cohort of patients, and comprehensive assays - including targeted sequencing, whole exome sequencing, and single-cell RNA sequencing.

      Weaknesses:

      Findings are not necessarily unexpected. With regards to clonal dynamics, it would be very unlikely to see any changes within a few weeks' time frame. Longer follow-up to assess clonal dynamics would realistically be necessary.

      We truly appreciate constructive comments by the reviewers and editors. We agree with these comments and did our best to address them to improve the paper. Please see the following pages.

      Reviewer #1 (Recommendations For The Authors):

      Comment 1-1. In Figure 3B, the changes in frequency are challenging to discern. Consider employing connected line plots or another visual representation to enhance clarity and interpretation.

      Thank you for the suggestion. We modified Figure 3B to efficiently visualize the changes in cell proportion. Please note that the proportional changes in cell populations were not statistically significant by treatment, pathology, or clonal hematopoiesis (CH) severity.

      Comment 1-2. On page 13, Figure 3D is mentioned before Figure 3C. Please re-order to follow the correct sequence.

      We corrected the sequence of the figure and revised the text accordingly.

      Comment 1-3. Supplementary Figure 9 reveals an intriguing observation: the hypoxia and TNF signaling pathways appear to be regulated in opposite directions between CHIP-negative subjects and those with a Variant Allele Frequency (VAF) greater than 0.1. It would be insightful if the authors could delve into the potential implications or interpretations of this finding.

      We appreciate the reviewer's insightful comment. In the GSEA results presented in Supplementary Figure 9 and Figure 3C, we specifically focused on TNF signaling in monocytes and cDCs. Our subsequent analysis revealed that the adaptation of inflammatory signals is enriched in the myeloid cells in the CHIP-severe patients (Supplementary Fig. S12). Following the reviewer’s comment, we found that the leading-edge genes were shared between the TNF signaling and hypoxia pathways in most clusters (Supplementary Fig. S15). Suggested core genes, such as FOS, DUSP1, JUN, and PPP1R15A, play critical roles in the inflammatory phenotypes of myeloid lineages. Based on this finding, we added a paragraph in the Discussion section to address the implications of these shared signatures as follows (lines 340-348):

      “Our GSEA results specifically indicated the enrichment of TNF signaling and hypoxia pathways in most clusters of patients with severe CH (Supplementary Fig. S9). The leading-edge genes from GSEA results showed core genes such as FOS, DUSP1, JUN, and PPP1R15A, which are known to play critical roles in the inflammatory phenotypes of immune cells, were shared between the TNF signaling and hypoxia pathways in all significant clusters. (Supplementary Fig. S15). Furthermore, gene regulatory network analysis using SCENIC indicated a higher enrichment of inflammatory signatures in myeloid lineages (Supplementary Fig. S9), highlighting the pronounced inflammatory phenotype of CH clones in these cell lineages.”

      Comment 1-4. The plots in Supplementary Figure 12 would benefit from enlargement to improve legibility and facilitate a better understanding of the data presented.

      We improved resolutions and enlarged Supplementary Figure S12.

      Reviewer #2 (Recommendations For The Authors):

      Comment 2-1. The authors state that CHIP is seen at a higher prevalence in the metastatic lung (44/100) vs controls (5/42), however, no in-depth info other than age is given about the normal cohort (Table S2). It would be important to make sure the cohorts are matched with regards to smoking hx, age range, etc before making the claim that CHIP is more frequent in the metastatic lung cancer group.

      Thank you for the comment. To provide additional information on the control cohort, including current smoking habits and sex, we added columns to Table S2. While we tried to match the age distributions between the control group without a history of solid cancer and the lung cancer cohort, we observed that the lung cancer cohort had slightly older ages (mean ages: 58.9 vs. 64.1 years), a higher prevalence of smoking (current smokers: 11/42 vs. 37/100), and a higher proportion of males (male/female: 18/24 vs. 91/9). Age and smoking are well-known epidemiological contributors to lung cancer and could influence the prevalence of clonal hematopoiesis (CH).

      However, previous studies have reported similar prevalence rates of CH in NSCLC patients, which aligned with our findings (Bolten et al., 2020 Nat Genet; Hong et al., 2022 Cancer Res). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in healthy populations (Levin et al., 2022 Sci Rep). We have acknowledged these factors as major limitations of our study in the Discussion section as follows (lines 379-390):

      “Also, the distinct characteristics of our cohort can be confounders for our results. Compared to control patients, our cohort was biased toward slightly older ages, higher prevalence of smoking habits, and with a higher proportion of males (mean age: 64.1 vs. 58.9; current smokers: 37/100 vs. 11/42; male/female: 91/9 vs. 18/24 Supplementary Figures S1 and S3). However, previous studies have reported similar prevalence rates of clonal hematopoiesis in NSCLC patients, aligned with our findings (9,51). Moreover, our most prevalent CH mutations, including DNMT3A, TET2, and PPM1D, were marginally affected by smoking, and this trend has been consistently observed in both healthy populations and NSCLC patients (10,51,52).”

      Comment 2-2. Figure 1 - 1A states there were 100 CHIP and CHIP-PD mutations identified, but in 1B, C, and D there are < 100 bars and/or dots shown. How were the mutations in 1A then triaged to be shown in 1B-D?

      It appears that our poor annotation caused this misunderstanding. In Figure 1A, we showed the number of samples in each study group but did not provide detailed information in the legend. We found 67 mutations among the 100 patients and presented the mutational statistics in Figures 1B–D. Accordingly, we have revised the Figure 1 legend to clarify this sentence “The numbers indicate sample counts in each group.” (lines 426-427).

      Comment 2-3. Table S4 - would be helpful to have # of variant reads and # of total reads as columns (and also calculate VAF for an additional column).

      Thank you for the comment. We added columns revealing the total number of reads and the number of variant reads in Table S4. Also, we calculated the VAF and included it as a new column as suggested by the reviewer.
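
      As an illustration of the added column, VAF is the number of variant reads divided by the total number of reads at a site. The sketch below is hypothetical: the function name, the row layout, and the read counts are assumptions and are not taken from Table S4. The 0.1 cut-off mirrors the VAF threshold mentioned in Comment 1-3.

```python
def variant_allele_frequency(variant_reads: int, total_reads: int) -> float:
    """Return VAF = variant reads / total reads at a single site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= variant_reads <= total_reads:
        raise ValueError("variant_reads must lie between 0 and total_reads")
    return variant_reads / total_reads


# Hypothetical rows (invented read counts, for illustration only).
rows = [
    {"gene": "DNMT3A", "variant_reads": 30, "total_reads": 1000},
    {"gene": "TET2", "variant_reads": 150, "total_reads": 1200},
]

for row in rows:
    row["vaf"] = variant_allele_frequency(row["variant_reads"], row["total_reads"])
    # Flag variants above a VAF of 0.1, the threshold referred to in Comment 1-3.
    row["vaf_above_0.1"] = row["vaf"] > 0.1
    print(row)
```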

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The present work from Velloso and collaborators investigated the transcription profiles of resident and recruited hypothalamic microglia. They found sex-dependent differences between males and females and identified the protective role of chemokine receptor CXCR3 against diet-induced obesity.

      Strengths:

      (1) Novelty;

      (2) Relevance, since this work provides evidence about a subset of recruited microglia that has a protective effect against DIO. This provides a new concept in hypothalamic inflammation and obesity.

      Weaknesses:

      (1) Lack of mechanistic insight into the sex-dependent effects;

      (2) Analysis of indirect calorimetry data requires more depth;

      (3) A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This study by Mendes et al. provides novel key insights into the role of chemotaxis and immune cell recruitment into the hypothalamus in the development of diet-induced obesity. Specifically, the authors reveal that although transcriptional changes in hypothalamic resident microglia following exposure to high-fat feeding are minor, there are compelling transcriptomic differences between resident microglia and microglia recruited to the hypothalamus, and these are sexually dimorphic. Using independent loss-of-function studies, the authors also demonstrate an important role of CXCR3 and hypothalamic CXCL10 in the hypothalamic recruitment of CCR2+ cells on metabolism following exposure to high-fat diet-feeding in mice. This manuscript puts forth conceptually novel evidence that inhibition of chemotaxis-mediated immune cell recruitment accelerates body weight gain in high-fat diet-feeding, suggesting that a subset of microglia that express CXCR3 may confer protective, anti-obesogenic effects.

      Strengths:

      The work is exciting and relevant given the prevalence of obesity and the consequences of inflammation in the brain on perturbations of energy metabolism and ensuant metabolic diseases. Hypothalamic inflammation is associated with disrupted energy balance, and activated microglia within the hypothalamus resulting from excessive caloric intake and saturated fatty acids are often thought to be mediators of impairment of hypothalamic regulation of metabolism. The present work reports a novel notion in which immune cells recruited into the hypothalamus that express chemokine receptor CXCR3 may have a protective role against diet-induced obesity. In vivo studies reported herein demonstrate that inhibition of CXCR3 exacerbates high-fat diet-induced body weight gain, increases circulating triglycerides and fasting glucose levels, worsens glucose tolerance, and increases the expression of orexigenic neuropeptides, at least in female mice.

      This work provides a highly interesting and needed overview of preclinical and clinical brain inflammation, which is relevant to readers with an interest in metabolism and immunometabolism in the context of obesity.

      Using flow cytometry, cell sorting, and transcriptomics including RNA-sequencing, the manuscript provides novel insights into transcriptional landscapes of resident and recruited microglia in the hypothalamus. Importantly, sex differences are investigated.

      Overall, the manuscript is perceived to be highly interesting, relevant, and timely. The discussion is thoughtful, well-articulated, and a pleasure to read and felt to be of interest to a broad audience.

      Weaknesses:

      There were no major weaknesses perceived. Some comments for potential textual additions to the results/discussion are listed in recommendations to authors.

      Comments from the authors regarding the evaluation of the article: We publicly express our gratitude for the work of both Reviewers. The comments were timely and constructive and guided us toward preparing a new version of the article which contains novel data that strengthened the overall quality of the study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Experiments with ovariectomized female mice with (and without) estrogen replacement would help to address the physiological basis of the observed sex-dependent effects.

      We performed an experiment with female C57BL/6J Unib mice, subdivided into Sham, OVX, and OVX+EST groups, which were exposed to HFD for 4 weeks. We monitored body weight and food intake weekly. At the end of the protocol, the animals were fasted for 4 hours. We then measured fasting blood glucose and estradiol and collected tissues (hypothalamus and WAT). In the hypothalamus samples, we evaluated, by RT-qPCR, the expression of chemokines, chemokine receptors, and some pro-inflammatory cytokines and neuropeptides. We also evaluated WAT weight relative to body mass. The new results are presented in Supplementary Figure 1.

      Indirect calorimetric analysis of energy expenditure will benefit from ANCOVA analysis using body weight as a covariate. Moreover, locomotor activity should be also controlled.

      All statistical analyses of energy expenditure are corrected for body mass; thus, there is no need for ANCOVA. We clarified this in the text. The determination of locomotor activity is now included in Supplementary Figures 2 and 3.
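
      For readers unfamiliar with the reviewer's suggestion, the following minimal sketch shows what an ANCOVA-style adjustment estimates: the group difference in energy expenditure after removing a shared (pooled within-group) linear dependence on body mass. It is not the analysis performed in the study. The function names and numbers are hypothetical, and equal slopes in both groups are assumed.

```python
def mean(values):
    return sum(values) / len(values)


def pooled_slope(groups):
    """Pooled within-group slope of y on x; groups is a list of (xs, ys) pairs."""
    sxy = 0.0
    sxx = 0.0
    for xs, ys in groups:
        mx, my = mean(xs), mean(ys)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def adjusted_difference(group_a, group_b):
    """Difference in mean y (B minus A) after adjusting to a common covariate value."""
    slope = pooled_slope([group_a, group_b])
    (xa, ya), (xb, yb) = group_a, group_b
    return (mean(yb) - mean(ya)) - slope * (mean(xb) - mean(xa))


# Hypothetical data: (body mass in g, energy expenditure in arbitrary units).
control = ([20.0, 22.0, 24.0], [12.0, 13.0, 14.0])
treated = ([26.0, 28.0, 30.0], [16.0, 17.0, 18.0])

raw_diff = mean(treated[1]) - mean(control[1])   # unadjusted group difference
adj_diff = adjusted_difference(control, treated)  # mass-adjusted group difference
```

      In this synthetic example the raw difference exceeds the adjusted one because the treated group is also heavier, which is the situation the covariate adjustment is meant to account for.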

      A deeper analysis of hypothalamic inflammation and ER stress pathways would strengthen the manuscript.

      We performed new experiments to determine the expression of hypothalamic inflammation and ER stress pathways. This is shown in Suppl. Fig. 2 and 3.

      Mechanistic inhibition of CXCR3 was performed by CXCL10 immunoneutralization and CXCR3 antagonism. Those approaches are correct and well-performed, however considering the experience of the group in hypothalamic studies, I miss a virogenetic-based knockdown. Do the authors have any data on that?

      This is indeed a great point. Unfortunately, we did not succeed in obtaining the Cre mouse lines that would be needed for the proposed experiments. We included this as a weakness of the study.

      Reviewer #2 (Recommendations For The Authors):

      There are a few typographical errors for correction:

      -  Page 4, line 157: CCL10 to CXCL10.

      -  Page 6, line 226: makers to markers.

      -  Page 7, lines 283 and 287, Figure 6C: INF to IFN.

      All errors were corrected, as recommended. 

      Parts of the manuscript may be difficult for readers without knowledge of transcriptomics to interpret; thus, further description of several of the figures (e.g. Figure 3 and 4) may be helpful.

      We expanded the text in Results to clarify this issue.

      Could the authors comment on the choice of peripheral administration of CXCR3 antagonist as opposed to central (e.g. icv) administration? Indeed, systemic inhibition of CXCR3 produced significant alterations in body weight gain and glucose tolerance in female mice given high-fat diets and reduced CCR2 and CXCR3 immunostaining in the hypothalamus. Could changes to peripheral (e.g. WAT, liver) immune responses to the diet underlie the metabolic changes observed?

      CXCR3+ cells are present in very small numbers in the hypothalamus under basal conditions. Under HFD, these cells are recruited from the periphery to the CNS, so we believe ICV treatment with AMG487 would not reduce recruitment to the hypothalamic parenchyma. Using the same animals in which we measured locomotor activity, we performed RT-qPCR of WAT and liver and analyzed the expression of genes involved in lipid and glucose metabolism. These results are now in Supplementary Figures 2 and 3. We included a comment in the text to explain our rationale for this approach.

      Besides hypothalamic mRNA levels of chemokines and chemokine receptors, does systemic CXCR3 antagonism affect other aspects linked to diet-induced impairments of hypothalamic regulation of energy homeostasis, like inflammation, ER stress and/or mitochondrial dynamics/function? It would be interesting to reveal the consequence of reduced CCR2+ microglial migration to the hypothalamus with chronic high-fat diet exposure.

      We performed new experiments, shown in Supplementary Figures 2 and 3, to address these important questions. In the hypothalamus of females, there were no changes in the expression of transcripts encoding proteins involved in endoplasmic reticulum homeostasis and mitochondrial turnover, whereas in males there was a reduction of Ddit3 and Mfn1. Moreover, in females, the inhibition of CXCR3 promoted no changes in the liver expression of lipidogenic and gluconeogenic genes, and no changes in the white adipose tissue expression of lipidogenic genes. In the liver of males, there was a reduction in the expression of Fasn and an increase in the expression of G6pc3. As in females, there were no changes in the white adipose tissue expression of lipidogenic genes in males.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      There are some minor weaknesses.

      Comment 1: Notably, there are not a lot of new insights coming from this paper. The structural comparisons between MCC and PCC have already been described in the literature and there were not a lot of significant changes (outside of the exo- to endo- transition) in the presence vs. absence of substrate analogues.

      We agree that the structures of the human MCC and PCC holoenzymes are similar to their bacterial homologs. That is due to the conserved sequences and functions of MCC and PCC across different species.

      Comment 2: There is not a great deal of depth of analysis in the discussion. For example, no new insights were gained with respect to the factors contributing to substrate selectivity (the factors contributing to selectivity for propionyl-CoA vs. acetyl-CoA in PCC). The authors state that the longer acyl group in propionyl-CoA may mediate stronger hydrophobic interactions that stabilize the alpha carbon of the acyl group at the proper position. This is not a particularly deep analysis and doesn't really require a cryo-EM structure to invoke. The authors did not take the opportunity to describe the specific interactions that may be responsible for the stronger hydrophobic interaction nor do they offer any plausible explanation for how these might account for an astounding difference in the selectivity for propionyl-CoA vs. acetyl-CoA. This suggests, perhaps, that these structures do not yet fully capture the proper conformational states.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 3: The authors also need to be careful with their over-interpretation of structure to invoke mechanisms of conformational change. A snapshot of the starting state (apo) and final state (ligand-bound) is insufficient to conclude *how* the enzyme transitioned between conformational states. I am constantly frustrated by structural reports in the biotin-dependent enzymes that invoke "induced conformational changes" with absolutely no experimental evidence to support such statements. Conformational changes that accompany ligand binding may occur through an induced conformational change or through conformational selection and structural snapshots of the starting point and the end point cannot offer any valid insight into which of these mechanisms is at play.

      Point accepted. We have revised our manuscript to use conformational differences instead of conformational changes to describe the differences between the apo and ligand-bound states (see the last paragraph of the introduction section and the third paragraph of the discussion section).

      Reviewer #2 (Public Review):

      Comments and questions to the manuscripts:

      Comment 1: I'm quite impressed with the protein purification and structure determination, but I think some functional characterization of the purified proteins should be included in the manuscript. The activity of enzymes should be the foundation of all structures and other speculations based on structures.

      We appreciate this comment. However, since we purified the endogenous BDCs and the sample we obtained was a mixture of four BDCs, the enzymatic activity of this mixture cannot accurately reflect the catalytic activity of PCC or MCC holoenzyme. We have revised the manuscript and acknowledged this limitation in the first paragraph of the results section: 

      “We did not characterize the enzyme activities of the mixed BDCs because the current methods used to evaluate the carboxylase activities of BDCs, such as measuring the ATP hydrolysis or incorporation of radio-labeled CO2, are unable to differentiate the specific carboxylase activity of each BDC.”

      Comment 2: In Figure 1B, the structure of MCC is shown as two layers of beta units and two layers of alpha units, while there is only one layer of alpha units resolved in the density maps. I suggest the authors show the structures resolved based on the density maps and show the complete structure with the docked layer in the supplementary figure.

      We appreciate this comment. We have shown the cryo-EM maps of the PCC and MCC holoenzymes in fig. S8 to indicate the unresolved regions in these structures. The BC domains in one layer of MCCα in the MCC-apo structure were not resolved. However, we think it would be better to show a complete structure in Fig. 1 to provide an overall view of the MCC holoenzyme. We have revised Fig. 1B and the figure legend to clearly point out which domains were not resolved in the cryo-EM map and were built in the structure through docking. We have also revised the main text to clearly describe which parts of the holoenzymes were not resolved in the cryo-EM maps and how the complete structures were built.

      Comment 3: In the introduction, I suggest the authors provide more information about previous studies of the structure and reaction mechanisms of BDCs, what the knowledge gap is, and what problem you will resolve with a higher resolution structure. For example, you mentioned in line 52 that G437 and A438 are catalytic residues; are these residues reported as catalytic residues or is this based on your structures? Has the catalytic mechanism been reported before? Has the role of biotin in catalytic reactions been revealed in previous studies?

      Point accepted. It was reported that G419 and A420 in Streptomyces coelicolor PCC, corresponding to G437 and A438 in human PCCβ, were the catalytic residues for the second-step carboxylation reaction (PMID: 15518551). The same study also reported the catalytic mechanism of the carboxyl transfer reaction. The role of biotin in the BDC-catalyzed carboxylation reactions has been extensively studied (PMIDs: 22869039, 28683917). We have revised the manuscript to introduce the catalytic mechanisms of BDCs elucidated through the investigation of prokaryotic BDCs in the fourth paragraph of the introduction section.

      Comment 4: In the discussion, the authors indicate that the movement of biotin could be related to the recognition of acyl-CoA in BDCs, however, they didn't observe a change in the propionyl-CoA bound MCC structure, which is contradictory to their speculation. What could be the explanation for the exception in the MCC structure?

      We appreciate this comment. We do not have a good explanation for why we did not observe a change in the propionyl-CoA bound MCC structure. It is noteworthy that neither acetyl-CoA nor propionyl-CoA is the natural substrate of MCC. Recently, a cryo-EM structure of the human MCC holoenzyme in complex with its natural substrate, 3-methylcrotonyl-CoA, has been resolved (PDB code: 8J4Z). In this structure, the binding site of biotin and the conformation of the CT domain closely resemble those in our acetyl-CoA-bound MCC structure. Therefore, the movement of biotin induced by acetyl-CoA binding mimics that induced by the binding of MCC's natural substrate, 3-methylcrotonyl-CoA, indicating that, in comparison with propionyl-CoA, acetyl-CoA is closer to 3-methylcrotonyl-CoA regarding its ability to bind to MCC. We have discussed this possibility in the last paragraph of the discussion section. We have also added a supplementary figure (fig. S11) to compare the structures of human MCC holoenzyme in complex with acetyl-CoA and 3-methylcrotonyl-CoA.

      Comment 5: In the discussion, the authors indicate that the selectivity of PCC to different acyl-CoA is determined by the recognition of the acyl chain. However, there are no figures or descriptions about the recognition of the acyl chain by PCC and MCC. It will be more informative if they can show more details about substrate recognition in Figures 3 and 4.

      We appreciate this comment. Unfortunately, in the cryo-EM maps of the PCC holoenzymes, the acyl groups were not resolved (fig. S6), so we were unable to analyze the specific interactions between the acyl-CoAs and PCC. We have revised the manuscript and acknowledged this limitation in the second paragraph of the discussion section: 

      “In the cryo-EM maps of the PCC holoenzymes, the acyl groups of acetyl-CoA and propionyl-CoA were not resolved (fig. S6), limiting the analysis of the interactions between the acyl groups and PCC. Nevertheless, the PCC-PCO and PCC-ACO structures determined in our study demonstrate that the conformations of the acyl-CoA binding pockets in the two structures are almost identical (Fig. 3F, fig. S7, B and C). In addition, the well resolved CoA groups of propionyl-CoA and acetyl-CoA bind at the same position in human PCC holoenzyme (Fig. 3F). These findings indicate that propionyl-CoA and acetyl-CoA bind to PCC with a similar binding mode.”

      Comment 6: How do the solved structures compare with the latest AlphaFold3 predictions?

      Since AlphaFold3 had not been released when our manuscript was submitted, we did not compare the solved structures with AlphaFold3 predictions at that time. We have now carried out the predictions using AlphaFold3. Due to the token limitation of the AlphaFold3 server, we could include only two α and six β subunits of human PCC or MCC in the prediction. The overall assembly patterns of the AlphaFold3-predicted structures are similar to those of the cryo-EM structures. The RMSDs between PCCα, PCCβ, MCCα, and MCCβ in the apo cryo-EM structures and those in the AlphaFold3-predicted structures are 7.490 Å, 0.857 Å, 7.869 Å, and 1.845 Å, respectively. The PCCα and MCCα subunits adopt an open conformation in the cryo-EM structures but a closed conformation in the AlphaFold3-predicted structures, resulting in large RMSDs.
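
      For reference, a minimal sketch of the RMSD calculation is given below. It assumes the two structures have already been superimposed and that atoms are matched one-to-one. The response does not state which atoms or superposition procedure were used, so the function name and coordinates here are purely illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points.

    Assumes the two coordinate sets are already superimposed and that
    element i of coords_a corresponds to element i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates (in Å): every atom is displaced by 1 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
prediction = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(model, prediction)
```

      Because the deviations are squared before averaging, a rigid movement of one domain (such as an open versus closed conformation) can dominate the RMSD of an otherwise similar subunit.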

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      DMS-MaP is a sequencing-based method for assessing RNA folding by detecting methyl adducts on unpaired A and C residues created by treatment with dimethylsulfate (DMS). DMS also creates methyl adducts on the N7 position of G, which could be sensitive to tertiary interactions with that atom, but N7-methyl adducts cannot be detected directly by sequencing. In this work, the authors adopt a previously developed method for converting N7-methyl-G to an abasic site to make it detectable by sequencing and then show that the ability of DMS to form an N7-methyl-G adduct is sensitive to RNA structural context. In particular, they look at the G-quadruplex structure motif, which is dense with N7-G interactions, is biologically important, and lacks conclusive methods for in-cell structural analysis. 

      Strengths: 

      - The authors clearly show that established methods for detecting N7-methyl-G adducts can be used to detect those adducts from DMS and that the formation of those adducts is sensitive to structural context, particularly G-quadruplexes. 

      - The authors assess the N7-methyl-G signal through a wide range of useful probing analyses, including standard folding, adduct correlations, mutate-and-map, and single-read clustering. 

      - The authors show encouraging preliminary results toward the detection of G-quadruplexes in cells using their method. Reliable detection of RNA G-quadruplexes in cells is a major limitation for the field and this result could lead to a significant advance. 

      - Overall, the work shows convincingly that N7-methyl-G adducts from DMS provide valuable structural information and that established data analyses can be adapted to incorporate the information. 

      We thank the reviewer for their time, their positive assessment, and their suggestions, which we have addressed below.

      Weaknesses: 

      - Most of the validation work is done on the spinach aptamer and it is the only RNA tested that has a known 3D structure. Although it is a useful model for validating this method, it does not provide a comprehensive view of what results to expect across varied RNA structures. 

      Thank you for your insightful comments. We agree that a more comprehensive view of BASH MaP involves probing a larger variety of RNAs with known 3D-structures beyond Spinach and the poly-UG RNA. Although outside the scope of this publication, more work is needed to reveal the determinants of N7G reactivity to DMS.

      - It's not clear from this work what the predictive power of BASH-MaP would be when trying to identify G-quadruplexes in RNA sequences of unknown structure. Although clusters of G's with low reactivity and correlated mutations seem to be a strong signal for G-quadruplexes, no effort was made to test a range of G-rich sequences that are known to form G-quadruplexes or not. Having this information would be critical for assessing the ability of BASH-MaP to identify G-quadruplexes in cells. 

      - Although the authors present interesting results from various types of analysis, they do not appear to have developed a mature analysis pipeline for the community to use. I would be inclined to develop my own pipeline if I were to use this method. 

      Thank you for your suggestion. We have more clearly annotated the Python scripts and the GitHub repository, which contains all custom scripts used for analyzing BASH MaP data. These changes will enable researchers to more easily utilize our developed pipelines.

      - There are various aspects of the DAGGER analysis that don't make sense to me:

      (1) Folding of the RNA based on individual reads does not represent single-molecule folding since each read contains only a small fraction of the possible adducts that could have formed on that molecule. As a result, each fold will largely be driven by the naive folding algorithm. I recommend a method like DREEM that clusters reads into profiles representing different conformations.

      (2) How reliable is it to force open clusters of low-reactivity G's across RNA's that don't already have known G-quadruplexes? 

      (3) By forcing a G-quadruplex open it will be treated as a loop by the folding algorithm, so the energetics won't be accurate. 

      (4) It's not clear how signals on "normal" G's are treated. In Figure 5C some are wiped to 0 but others are kept as 1. 

      Thank you for your keen observations regarding the conceptual frameworks utilized in DAGGER. We have included a complementary analysis to DAGGER, applying DANCE, an algorithm that shares an underlying architecture with DREEM, to the Spinach BASH MaP data, and found that DANCE analysis gave results similar to those found with DAGGER. However, we have not benchmarked DAGGER’s performance on a range of RNAs or compared the results with expectation-maximization algorithms like DREEM and DANCE.

      To minimize the effects of artificially creating loops with tertiary folding constraints, we utilized the RNA folding algorithm CONTRAfold which relies less on direct energetic calculations than other commonly used RNA folding algorithms such as RNAstructure.

      We have updated the main text to more clearly indicate how DAGGER handles signals at G’s in a range of conditions. The main text now better clarifies the specific logic used for determining which G’s contain either a 0 or a 1 in the bitvector encoding used in DAGGER analysis.
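
      The main-text rules for assigning a 0 or a 1 at each G are not reproduced in this response. As a generic illustration only, the sketch below encodes a read as a bitvector in the style commonly used for mutational-profiling data: 1 where the read differs from the reference and 0 where it matches. The function names and sequences are hypothetical, and this is not DAGGER's actual logic.

```python
def read_to_bitvector(reference, read):
    """Encode an aligned read as 0/1 flags: 1 marks a mismatch with the reference."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to the same length")
    return [int(ref_base != read_base) for ref_base, read_base in zip(reference, read)]


def guanosine_positions(reference):
    """Return the 0-based indices of G residues in the reference."""
    return [i for i, base in enumerate(reference) if base == "G"]


# Hypothetical aligned sequences (cDNA alphabet), not taken from the study.
reference = "GGACGT"
read = "GAACGC"

bits = read_to_bitvector(reference, read)
g_bits = [bits[i] for i in guanosine_positions(reference)]
```

      Any rule that later overrides individual positions (for example, resetting selected G's to 0) would operate on a vector of this kind.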

      Reviewer #2 (Public Review): 

      Summary: 

      The manuscript introduces BASH MaP and DAGGER, innovative tools for analyzing RNA tertiary structures, specifically focusing on G-quadruplexes. Traditional methods have struggled to detect and analyze these structures due to their reliance on interactions on the Hoogsteen face of guanine, which are not readily observable through conventional probing that targets Watson-Crick interactions. BASH MaP employs dimethyl sulfate and potassium borohydride to enhance the detection of N7-methylguanosine by converting it into an abasic site, thereby enabling its identification through misincorporation during reverse transcription. This method provides higher precision in identifying G-quadruplexes and offers deeper insights into RNA's structural dynamics and alternative conformations in both in vitro and cellular contexts. Overall, the study is well-executed, demonstrating robust signal detection of N7-Gs with some compelling positive controls, thorough analysis, and beautifully presented figures.

      Strengths: 

      The manuscript introduces a new method to detect G-quadruplexes (G-qs) that simplifies and potentially enhances the robustness and quantification compared to previous methods relying on reverse transcription truncations. The authors provide a strong positive control, demonstrating a 70% misincorporation at endogenous N7-G within the 18S rRNA, which illustrates BASH MaP's high signal-to-noise ratio. The data concerning the detection of positive control G-qs is particularly compelling. 

      Weaknesses: 

      Figure 3E shows considerable variability in the correlations among guanosines, suggesting that the methods may struggle with specificity in determining guanosine participation within and between different quadruplexes. There is no estimation of the method's false-positive discovery rate.

      Thank you for your positive assessment and for taking the time to suggest improvements to this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Reviewer #3 (Public Review): 

      Summary: 

      In this study, the authors aim to develop an experimental/computational pipeline to assess the modification status of an RNA following treatment with dimethylsulfate (DMS). Building upon the more common DMS MaP method, which predominantly assesses the modification status of the Watson-Crick-Franklin face of A's and C's, the authors insert a chemical processing step in the workflow prior to deep sequencing that enables detection of methylation at the N7 position of guanosine residues. This approach, termed BASH MaP, provides a more complete assessment of the true modification status of an RNA following DMS treatment and this new information provides a powerful set of constraints for assessing the secondary structure and conformational state of an RNA. In developing this work, the authors use Spinach as a model RNA. Spinach is a fluorogenic RNA that binds and activates the fluorescence of a small molecule ligand. Crystal structures of this RNA with ligand bound show that it contains a G-quadruplex motif. In applying BASH MaP to Spinach, the authors also perform the more standard DMS MaP for comparison. They show that the BASH MaP workflow appears to retain the information yielded by DMS MaP while providing new information about guanosine modifications. In Spinach, the G-quadruplex G's have the least reactive N7 positions, consistent with the engagement of N7 in hydrogen bonding interactions at G's involved in quadruplex formation. Moreover, because the inclusion of data corresponding to G increases the number of misincorporations per transcript, BASH MaP is more amenable to analysis of co-occurring misincorporations through statistical analysis, especially in combination with site-specific mutations. These co-occurring misincorporations provide information regarding what nucleotides are structurally coupled within an RNA conformation.
      By deploying a likelihood-ratio statistical test on BASH MaP data, the authors can identify Gs in G-quadruplexes, deconvolute G-G correlation networks, base-triple interactions and even stacking interactions. Further, the authors develop a pipeline to use the BASH MaP-derived G-modification data to assist in the prediction of RNA secondary structure and identify alternative conformations adopted by a particular RNA. This seems to help with the prediction of secondary structure for Spinach RNA.
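
      To make the idea of a likelihood-ratio test on co-occurring misincorporations concrete, the sketch below computes the G statistic for a 2x2 table of read counts at two positions. This is a generic textbook form of the test; the exact statistic, corrections, and thresholds used in the manuscript are not given here, and the counts are hypothetical.

```python
import math


def g_statistic(n11, n10, n01, n00):
    """Likelihood-ratio (G) statistic for a 2x2 contingency table of read counts.

    n11: reads with misincorporations at both positions
    n10: reads with a misincorporation at the first position only
    n01: reads with a misincorporation at the second position only
    n00: reads with no misincorporation at either position
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("the table must contain at least one read")
    row_totals = (n11 + n10, n01 + n00)
    col_totals = (n11 + n01, n10 + n00)
    observed = ((n11, n10), (n01, n00))
    g = 0.0
    for r in range(2):
        for c in range(2):
            o = observed[r][c]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = row_totals[r] * col_totals[c] / total
                g += o * math.log(o / expected)
    return 2.0 * g


# Hypothetical counts: an independent pair and a strongly co-occurring pair.
g_independent = g_statistic(10, 10, 10, 10)
g_correlated = g_statistic(30, 5, 5, 60)
```

      Under independence G is approximately chi-square distributed with one degree of freedom, so values above about 3.84 correspond to p < 0.05 before any multiple-testing correction.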

      Strengths: 

      The BASH MaP procedure and downstream data analysis pipeline more fully identify the complement of methylations resulting from the DMS treatment of RNA, thereby enriching the information content. This in turn allows for more robust computational/statistical analysis, which likely will lead to more accurate structure predictions. This seems to be the case for the Spinach RNA.

      Weaknesses: 

      The authors demonstrate that their method can detect G-quadruplexes in Spinach and some other RNAs both in vitro and in cells. However, the performance of BASH MaP and associated computational analysis in the context of other RNAs remains to be determined. 

      We thank the reviewer for their time spent analyzing this manuscript, for their positive assessment and for their suggestions on improving this publication. We have addressed your specific comments in the “Recommendations For The Authors” section below.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Although the text is clear and coherent, the overall flow of the manuscript comes across as "here's a bunch of stuff I tried." Maybe you're looking to get this out quickly, but it would have been much more impactful (and enjoyable to read) as a description of a more polished final product.

      Thank you for highlighting the strengths and weaknesses of this manuscript. We have changed parts of the main text to improve the overall flow of the manuscript and make it more enjoyable to read.

      Reviewer #2 (Recommendations For The Authors): 

      I have only a few comments: 

      Major: 

      (1) Analysis of Guanosine Correlations in Figure 3E: In Figure 3E, there is a lot of variability in the correlations among guanosines. For example, G46 shows a strong correlation with G93 (within the same quadruplex) but also correlates with G91, G95 (in different quadruplexes), and G97 (not part of any quadruplex as per the model in Figure 3C). Contrarily, G86 exhibits weak correlations, and G50 along with G89 shows no significant correlations. These findings imply that BASH MaP followed by RING MaP analysis struggles to accurately distinguish between guanosines within the same or different quadruplexes in Spinach. Perhaps there are some opportunities to enhance the specificity in determining guanosine participation within quadruplexes, a great point for the authors to discuss.

      Thank you for your comments and careful analysis of the pattern of correlations produced by BASH MaP. We agree that BASH MaP followed by RING MaP analysis is unable to unambiguously distinguish between guanosines within the same or different quadruplex layers. This finding was a surprise as we initially assumed that quadruplex layers would behave in a manner like Watson-Crick base pairs and produce specific signals in the corresponding RING MaP heatmaps.  We suspect that this may be due to mutations in specific G’s being associated with altered conformations which allow other G’s to form different interactions that affect DMS reactivity.  This may be unique to the highly complex structure in Spinach.  However, we think BASH-MaP clearly provides signals that point to key residues within the G-quadruplex, even if it does not clearly identify all of them.

      This idea is supported by experiments described in Figure 4, which show that mutation of a single guanosine residue causes a complete breakdown of the hydrogen-bonding network throughout all quadruplex layers. Additionally, DMS methylation of an N7G in a quadruplex is likely to disrupt base stacking interactions in and around the quadruplex. The compounding effects of a dynamic G-quadruplex and DMS-induced changes to local base stacking properties explain both the strong correlations with G97, which is base-stacked with the quadruplex, and the inability to specifically identify the guanosines which comprise specific quadruplex quartets. We have further emphasized this point in an updated discussion section.

      (2) Potential Consolidation of Figures 3 and 4: Figure 4 appears quite similar to Figure 3 but employs M2-seq instead of relying on spontaneous mutations. It might be beneficial to merge these figures to demonstrate that M2-seq can more effectively identify correlations between guanosines in quadruplexes. 

      We agree that Figures 3 and 4 appear quite similar but there is an important distinction to be made between RING MaP and M2-seq analysis. We suspect that the mechanism causing correlations between guanosines in quadruplexes for RING MaP is “RNA breathing”, in contrast to the spontaneous T7 RNA polymerase-induced mutation model proposed in Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114. To determine whether correlations between guanosines in Spinach BASH MaP experiments rely on spontaneous mutations, we compared the fraction of reads containing misincorporations at pairs of quadruplex guanosines over a range of DMS concentrations.  The spontaneous mutation model predicts a linear dependence between quadruplex guanosine signals and DMS dose while an “RNA breathing” or double-DMS hit model predicts a quadratic dependence on DMS dose (Cheng et al. PNAS 2017, https://doi.org/10.1073/pnas.1619897114). Our data may support a quadratic dependence on DMS dose for multiple pairs of G-quadruplex guanosines, while they demonstrate a linear dependence between helical G’s (Supplementary Data Fig. 9). Together, these data suggest that BASH MaP followed by RING MaP analysis detects double-DMS modification events for pairs of quadruplex guanosines. Therefore, BASH MaP and RING MaP analysis provide a complementary approach to M2 BASH MaP and reveal guanosine correlations in contexts where pre-installed mutations are incompatible, such as the study of endogenously expressed RNAs.
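
      The linear-versus-quadratic distinction described above can be illustrated with a minimal sketch. It assumes that the co-misincorporation fraction follows a simple power law in DMS dose, so that the slope on a log-log plot is close to 1 for a linear dependence and close to 2 for a quadratic one. The doses, rate constants, and function name below are hypothetical and are not taken from the manuscript.

```python
import math

def loglog_slope(doses, fractions):
    """Least-squares slope of log(fraction) against log(dose).

    A slope near 1 is consistent with a linear dose dependence;
    a slope near 2 is consistent with a quadratic (double-hit) dependence.
    """
    xs = [math.log(d) for d in doses]
    ys = [math.log(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical DMS doses (mM) and idealized, noise-free read fractions.
doses = [20.0, 40.0, 85.0, 170.0]
linear_pair = [1e-4 * d for d in doses]          # fraction proportional to dose
quadratic_pair = [1e-7 * d ** 2 for d in doses]  # fraction proportional to dose squared

linear_slope = loglog_slope(doses, linear_pair)
quadratic_slope = loglog_slope(doses, quadratic_pair)
print(round(linear_slope, 3), round(quadratic_slope, 3))
```

      With real data the slope would be estimated with noise, so a confidence interval on the fitted exponent would be needed before favoring one model over the other.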

      (3) Estimation of False Positive Rates: An estimation of the false positive rate for G-quadruplex identification would be invaluable. Since identification currently depends on the absence of DMS modification, it's important to consider how other factors like solvent inaccessibility or library generation might affect the detection and be misinterpreted as G-quadruplexes. Although this could be a subject of future work, some discussion by the authors would enhance the manuscript. 

      We have added a table summarizing sensitivity, positive predictive value, and false positive rate for different G-quadruplex identification schemes.  See Supplementary Table 1.
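
      For readers unfamiliar with these three metrics, the following sketch shows how they are conventionally computed from a confusion matrix. The counts are hypothetical and do not correspond to the values in Supplementary Table 1.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, positive predictive value, and false positive rate.

    tp/fn: quadruplex G's that were / were not called as quadruplex G's.
    fp/tn: non-quadruplex G's that were / were not called as quadruplex G's.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
        "fpr": fp / (fp + tn),
    }

# Hypothetical counts for one identification scheme.
metrics = classification_metrics(tp=8, fp=2, tn=38, fn=2)
print(metrics)
```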

      Minor: 

      (4) Line 273 Reference Correction: Please adjust the reference in line 273 to accurately reflect that the G-quadruplex experiments compare potassium with lithium, not sodium. 

      In cellulo G-quadruplex reverse transcriptase (RT) stop assays as described by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) compared RT stops between DMS treated mRNA refolded in potassium and sodium buffers. We have clarified in the text that traditionally, G-quadruplex RT stop assays compare potassium with lithium.

      (5) Consistency in Figure 1 (Panels F and G): Aligning BASH MaP (170 mM DMS) as the y-axis in both panels F and G would visually align the data points and enhance the graphical coherence across these panels. 

      Thank you for noticing the subtleties in our data presentation and for the suggestion on how to improve our graphical coherence across panels. We specifically chose not to align BASH MaP (170 mM DMS) as the y-axis for panels F and G because we did not want the reader to mistakenly assume that the data for BASH MaP (170 mM DMS) presented in panels F and G is the same data. In panel F, BASH MaP was performed under standard DMS probing buffer conditions which utilized a pH 7.5 bicine buffer. The purpose of panel F is to show the reproducibility of BASH MaP under various DMS concentrations. In panel G, BASH MaP was performed under DMS probing buffer conditions which promote the formation of m3U using a pH 8.3 bicine buffer. The purpose of panel G is to show that the borohydride treatment and depurination steps in BASH MaP do not react with DMS-derived m1A, m3C, and m3U in a manner which prevents their measurement through cDNA misincorporation. Together, these experimental differences cause the data points for BASH MaP (170 mM DMS) to vary between panels F and G which would lead to more confusion for the reader and detract from the intended message we are trying to convey through panels F and G. 

      (6) Statistical Detail in Figure 1E: Incorporating a confidence interval or a P-value in Figure 1E would enrich the statistical depth and provide readers with a clearer understanding of the data's significance. 

      Thank you for the suggestion of including a p-value in Figure 1E to provide the readers with a clearer understanding of the data’s significance. The effect of combining DMS treatment and borohydride reduction on the misincorporation rate of G’s in Spinach is so dramatic that the raw data sufficiently provides the readers a clear understanding of its significance.

      (7) Reevaluation of Figure 2B: Considering the small number of Gs in single-stranded regions and base triples, it might be more informative to move Figure 2B to supplementary information. Focusing on Figure 2C, which consolidates non-quadruplex categories, could provide more impactful insights. 

      Thank you for your suggestion. It is important to initially provide an overall characterization of N7G DMS reactivity for G’s in a variety of structural contexts before more specifically looking at G-quadruplexes. Panel B is an important part of figure 2 for the following two reasons:

      First, a reader’s first question upon seeing the N7G chemical reactivity for Spinach as shown in Figure 2A is likely to be whether base-paired G’s and single-stranded G’s have similar or different DMS reactivities. Figure 2, panel B shows that generally, single-stranded G’s appear to have higher DMS reactivity than base-paired G’s except for 2 G’s which display hyper-reactivity. The basis for this hyper-reactivity is addressed in Figure 4.

      Second, panel B highlights the wide range in N7G DMS reactivities. Since the G-quadruplex G’s display a dramatically lower DMS reactivity as compared to single-stranded G’s and hyper-reactive base-paired G’s, the dynamic range of DMS reactivities was difficult to capture in a single panel. Panel C does not convey these dynamics appropriately as a stand-alone figure.

      (8) Enhancements to Figure 2G: Improving the visibility of mutation rates in this figure would help. Suggestions include coloring bars by nucleotide type for intuitive visual comparison and adjusting the y-axis to a logarithmic scale to better represent near-zero mutation rates. Additionally, employing histograms or box plots could directly compare DMS reactivities and provide a clearer analysis. 

      Thank you for your suggestions on enhancing the presentation of BASH MaP applied to an mRNA. The main purpose of figure 2G was to validate whether BASH MaP could detect G’s engaged in a G-quadruplex in a cell. In-cell G-quadruplex folding measurements as performed by Guo and Bartel (https://www.science.org/doi/10.1126/science.aaf537) only identified a few G-quadruplexes which were folded and only the 3’ end of the G-quadruplex was detected. We therefore reasoned that the 3’ most G’s of this select set of G-quadruplexes were the only validated G’s engaged in a G-quadruplex in cells. In the instance of the AKT2 mRNA, Guo and Bartel found that 4 G’s appeared to be folded in a G-quadruplex in cells (Supplementary figure 2E). These G’s are indicated at the bottom of the plot with black bars and the label “In-cell G-quadruplex guanosines”. Therefore, we hypothesized that these G’s would display low DMS reactivity with BASH MaP while other G’s in the AKT2 mRNA would display higher chemical reactivities. We followed a standard convention in displaying chemical reactivities used extensively in the field where black bars indicate low reactivity, yellow bars indicate moderate reactivity, and red bars indicate high reactivity. The data in Fig 2G directly support Guo and Bartel’s prediction of an in-cell folded G-quadruplex in the AKT2 mRNA because the 4 G’s predicted to be engaged in a G-quadruplex all displayed near zero DMS reactivities.

      We agree that adjusting the y-axis to a logarithmic scale would better represent near-zero mutation rates. However, the purpose of figure 2G is not to compare all positions with near-zero mutation rates. Instead, our use of standard conventions in displaying chemical reactivities is sufficient for the purpose of displaying BASH MaP’s ability to validate in-cell G-quadruplex G’s.

      Later in the paper, we go a step further and create a better criterion than simple N7G DMS reactivity for identifying G’s engaged in a G-quadruplex. For further analysis of G’s with near zero DMS reactivities, see Figure 3 and Supplementary figure 4 which utilizes RING Mapper to identify lowly-reactive G’s which produce co-occurring misincorporations.

      (9) Scale Consistency in Figure 3: Ensuring that the correlation scales are uniform across Panels A, B, D, and E would facilitate easier comparison of the data, enhancing the overall coherence of the findings. Using raw correlation values could also improve clarity and interpretation. 

      Thank you for the suggestions to facilitate easier comparisons of data in Figure 3. We have ensured the correlation scales are uniform across panels A, B, D, and E to enhance the coherence of these findings. We initially visualized the data in Figure 3 by plotting raw correlation values, but we found these values differed between DMS MaP and BASH MaP datasets, likely because of the low-level background mutations introduced by the borohydride reduction step of BASH (see Supplementary figure 3A). However, performing a global normalization of correlation strength values computed by RING mapper enabled clear comparisons between DMS MaP and BASH MaP RING heatmaps and revealed structural domains consistent with the crystal structure of Spinach.

      (10) Correction on Line 506: Please update the reference to M2 BASH MaP for accuracy. 

      Thank you. We have updated the main text to incorporate this comment.

      Reviewer #3 (Recommendations For The Authors): 

      The paper describes multiple applications and multiple methods of analysis of the BASH MaP data, which collectively make the manuscript more difficult to follow. The manuscript would become more readable and user-friendly if there were some overview figures to describe the sequencing pipeline and the various computational workflows that the BASH MaP data are fed into (e.g. RING Mapper, DAGGER, M2 BASH MaP, Co-occurring Misincorporations, Secondary Structure Prediction). One or more summary schemes that provide an overview would strongly assist with the clarity and overall content of the paper. 

      Thank you for your suggestions. We have incorporated a summary scheme of the various computational workflows and their use cases in Fig 7.

      Line 165. Here, misincorporation rates for all four nucleotides are discussed, but m3U is not mentioned until the following paragraph. It would be appropriate and clearer to mention this sooner. 

      Thank you for your suggestion. We have restructured this section to introduce the DMS modification m3U in an earlier paragraph to increase clarity for readers.

      Line 506: spelling of DAGGER. 

      Thank you. We have updated the main text to incorporate this comment.

      Line 645: I found this paragraph difficult to follow, especially the line starting 649. I thought the logic was to exclude G's involved in tertiary interactions from base-pairing in the secondary structure prediction. Some clarification would be helpful. 

      Thank you for your comments. We have restructured the paragraph to emphasize that DAGGER only applies tertiary folding constraints to sequencing reads without misincorporations at G’s engaged in tertiary interactions. We reasoned that sequencing reads with a misincorporation at a G engaged in a tertiary interaction likely come from an RNA molecule which is in an alternative tertiary conformational state. In this specific circumstance, a tertiary folding constraint may impose incorrect restrictions on the folding of RNA molecules due to distinct tertiary conformations.
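
      The read-level logic described above can be sketched as follows. This is only an illustration of the stated rule, not the DAGGER implementation: reads are represented as sets of misincorporated positions, and the function name, read contents, and choice of tertiary positions are all hypothetical.

```python
def split_reads_by_tertiary_hits(reads, tertiary_g_positions):
    """Partition reads according to the rule described in the text.

    reads: list of sets, each holding the positions with a misincorporation.
    tertiary_g_positions: set of G positions engaged in tertiary interactions.

    Returns (constrained, unconstrained): reads with no misincorporation at a
    tertiary G would receive the tertiary folding constraint; reads with such
    a misincorporation would be folded without it.
    """
    constrained, unconstrained = [], []
    for mutated_positions in reads:
        if mutated_positions & tertiary_g_positions:
            unconstrained.append(mutated_positions)
        else:
            constrained.append(mutated_positions)
    return constrained, unconstrained

# Hypothetical example: positions 46 and 93 stand in for tertiary G's.
tertiary = {46, 93}
reads = [{12, 30}, {46}, set(), {15, 93}]
constrained, unconstrained = split_reads_by_tertiary_hits(reads, tertiary)
print(len(constrained), len(unconstrained))
```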

      Line 817. "Ability to". 

      Thank you. We have updated the main text to incorporate this comment.

      Figure 6F. Mistake in the axis description. 

      Thank you. We have updated the main text to incorporate this comment.

      Consider combining the paragraphs at lines 850 and 903. 

      Thank you for the suggestion. We rearranged paragraphs in the discussion to improve clarity.

      Line 1546. The final conc of DMS would be nice to see here.

      Thank you. We have updated the main text to incorporate this comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Using a knock-out mutant strain, the authors tried to decipher the role of the last gene in the mycofactocin operon, mftG. They found that MftG was essential for growth in the presence of ethanol as the sole carbon source, but not for the metabolism of ethanol, evidenced by the equal production of acetaldehyde in the mutant and wild type strains when grown with ethanol (Fig 3). The phenotypic characterization of ΔmftG cells revealed a growth-arrest phenotype in ethanol, reminiscent of starvation conditions (Fig 4). Investigation of cofactor metabolism revealed that MftG was not required to maintain redox balance via NADH/NAD+, but was important for energy production (ATP) in ethanol. Since mycobacteria cannot grow via substrate-level phosphorylation alone, this pointed to a role of MftG in respiration during ethanol metabolism. The accumulation of reduced mycofactocin points to impaired cofactor cycling in the absence of MftG, which would impact the availability of reducing equivalents to feed into the electron transport chain for respiration (Fig 5). This was confirmed when looking at oxygen consumption in membrane preparations from the mutant and wild type strains with reduced mycofactocin electron donors (Fig 7). The transcriptional analysis supported the starvation phenotype, as well as perturbations in energy metabolism, and may be beneficial if described prior to respiratory activity data.

      We thank the reviewer for their thorough evaluation of our work. We carefully considered whether transcriptional data should be presented before the respirometry data. However, this would disrupt other transitions and the flow of thoughts between sections, so that we prefer to keep the order of sections as is.

      While the data and conclusions do support the role of MftG in ethanol metabolism, the title of the publication may be misleading as the mutant was able to grow in the presence of other alcohols (Supp Fig S2).

      We agree that ethanol metabolism was the focus of this work and that phenotypes connected to other alcohols were less striking. We, therefore, changed “alcohol” to “ethanol” in the title of the manuscript.

      Furthermore, the authors propose that MftG could not be involved in acetate assimilation based on the detection of acetate in the supernatant and the ability to grow in the presence of acetate. The minimal amount of acetate detected in the supernatant but a comparative amount of acetaldehyde could point to disruption of an aldehyde dehydrogenase.

      We do not agree that MftG might be involved in acetaldehyde oxidation. According to our hypothesis, the disruption of an acetaldehyde dehydrogenase would lead to the accumulation of acetaldehyde. However, we observed an equal amount of acetaldehyde in cultures of M. smegmatis WT and ∆mftG grown on ethanol as well as on ethanol + glucose. Furthermore, the amount of acetate detected in the supernatants is not “minimal” as the reviewer points out but higher than or comparable to the acetaldehyde concentration (Figure 3 E and F, note that acetate concentrations are indicated in g/L, acetaldehyde concentrations in µM). Furthermore, the accumulation of mycofactocinols in ∆mftG mutants grown on ethanol is not in agreement with the idea of MftG being an aldehyde dehydrogenase but very well supports our hypothesis that MftG is involved in cofactor reoxidation.

      The link between mycofactocin oxidation and respiration is shown, however the mutant has an intact respiratory chain in the presence of ethanol (oxygen consumption with NADH and succinate in Fig 7C) and the NADH/NAD+ ratios are comparable to growth in glucose. Could the lack of growth of the mutant in ethanol be linked to factors other than respiration?

      Indeed, by using NADH and succinate as electron donors we show that the respiratory chain is largely intact in WT and ∆mftG grown on ethanol. Also, when mycofactocinols were used as an electron donor, we observed that respiration was comparable to succinate respiration in the WT. However, respiration was severely hampered in membranes of ∆mftG when mycofactocinols were offered as reducing agent. These findings strongly support our hypothesis that MftG is necessary to shuttle electrons from mycofactocin to the respiratory chain, while the rest of the respiratory chain stayed intact. The fact that NADH/NAD+ ratios are comparable between ethanol and glucose conditions is interesting and indirectly supports our hypothesis that mycofactocin and not NAD is the major cofactor in ethanol metabolism. Therefore, we do not see any evidence that the lack of growth of the mutant in ethanol is linked to factors other than respiration.

      To this end, bioinformatic investigation or other evidence to identify the membrane-bound respiratory partner would strengthen the conclusions.

      We generally agree that it is an important next step to identify the direct interaction partners of MftG. However, we are convinced that experimental evidence using several orthogonal approaches is required to unequivocally identify interaction partners of MftG. Nevertheless, we agree that a preliminary bioinformatics study could guide follow-up studies. We therefore attempted to predict interaction partners of MftG using D-SCRIPT and Alphafold 2. However, our approach did not reveal any meaningful results. Thus, we prefer not to integrate this approach into the manuscript but briefly summarize our methodology here: To predict potential interaction partners of M. smegmatis mc2 155 MftG (MSMEG_1428), D-SCRIPT (Sledzieski et al. 2021, https://doi.org/10.1016/j.cels.2021.08.010) with the Topsy-Turvy model version 1 (Singh et al. 2022, https://doi.org/10.1093/bioinformatics/btac258) was employed to screen every combination of the MSMEG_1428 amino acid sequence with the amino acid sequence of every potential interaction partner from the M. smegmatis mc2 155 predicted total proteome (total 6602 combinations, UniProt UP000000757, Genome Accession CP000480). Predictions failed for eight potential interaction partners due to size constraints (MSMEG_0019, MSMEG_0400, MSMEG_0402, MSMEG_0408, MSMEG_1252, MSMEG_3715, MSMEG_4727, MSMEG_4757; all amino acid sequences ≥ 2000 AA). Afterward, the top 100 predicted interaction partners, ranked by D-SCRIPT protein-protein-interaction score, were subjected to an Alphafold 2 multimer prediction using ColabFold batch version 1.5.5 (AlphaFold 2 with MMseqs2, Mirdita et al. 2022, https://doi.org/10.1038/s41592-022-01488-1) on a Google Colab T4 GPU with a Python 3 environment and the following parameters (msa_mode: MMseqs2 (UniRef+Environmental), num_models = 1, num_recycles = 3, stop_at_score = 100, num_relax = 0, relax_max_iterations = 200, use_templates = False). As input, the MSMEG_1428 amino acid sequence was used as protein 1 and the amino acid sequence of the potential interaction partner was used as protein 2. In addition, proteins of the electron transport chain and the dormancy regulon (dos regulon) were included as potential interaction partners. In total, 222 unique potential MftG interactions were predicted. The AlphaFold 2 model interface predicted template modelling (ipTM) score peaked at 0.45 for MftG-MftA. This score, however, lies below the threshold of 0.75, which indicates a likely false prediction of interaction (Yin et al. 2022, https://doi.org/10.1002/pro.4379). Nonetheless, the models with the highest ipTM scores (MftG with MftA, MSMEG_3233, MSMEG_4260, MSMEG_0419, MSMEG_5139, MSMEG_5140) were inspected manually using ChimeraX version 1.8 (Meng et al. 2023, https://doi.org/10.1002/pro.4792). However, no reasonable interaction was found.
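
      The two-stage screen summarized above (rank by D-SCRIPT score, keep the top candidates, then apply the ipTM cutoff) can be sketched in a few lines. This is an illustration of the filtering logic only; it does not call D-SCRIPT or ColabFold. Apart from the 0.45 ipTM value for MftA and the 0.75 threshold, which are quoted from the text, all scores and candidate names are hypothetical.

```python
IPTM_THRESHOLD = 0.75  # cutoff quoted in the text (Yin et al. 2022)

def screen_candidates(dscript_scores, iptm_scores, top_n=100,
                      threshold=IPTM_THRESHOLD):
    """Return candidates that pass both stages of the screen.

    dscript_scores: dict mapping candidate name to D-SCRIPT interaction score.
    iptm_scores: dict mapping candidate name to a precomputed ipTM score.
    Candidates are ranked by D-SCRIPT score, truncated to top_n, and kept
    only if their ipTM score reaches the threshold.
    """
    ranked = sorted(dscript_scores, key=dscript_scores.get, reverse=True)
    shortlisted = ranked[:top_n]
    return [(name, iptm_scores[name]) for name in shortlisted
            if iptm_scores.get(name, 0.0) >= threshold]

# Hypothetical D-SCRIPT scores; only the MftA ipTM value comes from the text.
dscript = {"MftA": 0.91, "candidate_B": 0.88, "candidate_C": 0.52}
iptm = {"MftA": 0.45, "candidate_B": 0.31, "candidate_C": 0.20}

hits = screen_candidates(dscript, iptm, top_n=2)
print(hits)
```

      With the highest ipTM score at 0.45, no candidate reaches the cutoff, which matches the outcome reported above.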

      Reviewer #2 (Public Review):

      Summary

      Patrícia Graça et al., examined the role of the putative oxidoreductase MftG in regeneration of redox cofactors from the mycofactocin family in Mycolicibacterium smegmatis. The authors show that the mftG is often co-encoded with genes from the mycofactocin synthesis pathway in M. smegmatis genomes. Using a mftG deletion mutant, the authors show that mftG is critical for growth when ethanol is the only available carbon source, and this phenotype can be complemented in trans. The authors demonstrate the ethanol associated growth defect is not due to ethanol induced cell death, but is likely a result of carbon starvation, which was supported by multiple lines of evidence (imaging, transcriptomics, ATP/ADP measurement and respirometry using whole cells and cell membranes). The authors next used LC-MS to show that the mftG deletion mutant has much lower oxidised mycofactocin (MMFT-8 vs MMFT-8H2) compared to WT, suggesting an impaired ability to regenerate mycofactocin redox cofactors during ethanol metabolism. These striking results were further supported by mycofactocin oxidation assays after over-expression of MftG in the native host, but also with recombinantly produced partially purified MftG from E. coli. The results showed that MftG is able to partially oxidise mycofactocin species. Finally, respirometry measurements with M. smegmatis membrane preparations from WT and mftG mutant cells show that the activity of MftG is indispensable for coupling of mycofactocin electron transfer to the respiratory chain. Overall, I find this study to be comprehensive and the conclusions of the paper are well supported by multiple complementary lines of evidence that are clearly presented.

      Strengths

      The major strengths of the paper are that it is clearly written and presented and contains multiple, complementary lines of experimental evidence that support the hypothesis that MftG is involved in the regeneration of mycofactocin cofactors, and assists with coupling of electrons derived from ethanol metabolism to the aerobic respiratory chain. The data appear to support the authors' hypotheses.

      We thank the reviewer for their thorough evaluation of our work.

      Weaknesses

      No major weaknesses were identified, only minor weaknesses mostly surrounding presentation of data in some figures.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) In Fig 6 C and D, would it not be expected that MMFT-2H2 would be decreasing over time as MMFT-2 is increasing?

      This is true. MMFT-2H2 is indeed decreasing while MMFT-2 is increasing. However, since the y-axis is drawn in logarithmic scale, the visible difference is not proportional to the actual changes. The increase of MMFT-2 against a very low starting point is more clearly visible than the decrease of MMFT-2H2, which was added in high quantities.
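
      A short worked example makes the log-scale effect concrete. The quantities are hypothetical and in arbitrary units: the same absolute amount is removed from a large substrate pool and added to a small product pool, and the resulting shift on a log10 axis is compared.

```python
import math

def log10_shift(start, end):
    """Vertical displacement on a log10 axis when a value changes from start to end."""
    return math.log10(end) - math.log10(start)

# Hypothetical amounts (arbitrary units): 99 units are converted.
substrate_start, product_start, converted = 1000.0, 1.0, 99.0

substrate_shift = log10_shift(substrate_start, substrate_start - converted)
product_shift = log10_shift(product_start, product_start + converted)

# The product moves two full decades upward, while the substrate moves
# down by less than a twentieth of a decade.
print(round(substrate_shift, 3), round(product_shift, 3))
```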

      (2) It would be beneficial to include rationale regarding the electron acceptors tested and why FAD was not included.

      FAD is a prosthetic group of the enzyme and was always a component of the assay. The other electron acceptors were chosen as potential external electron acceptors.

      (3) Bioinformatic analysis to capture possible interacting partners of MftG

      See our response to the previous review.

      Reviewer #2 (Recommendations For The Authors):

      Questions:

      (1) The co-occurrence analysis showed that one genome encoded mftG, but not mftC - do the authors think that this is a mftG mis-annotation?

      This is a good question. We have investigated this case more closely and conclude that this particular mftG is not a misannotation. Instead, it appears that the mftC gene underwent gene loss in this organism. We added on page 8, line 15: “Only one genome (Herbiconiux sp. L3-i23) encoding a bona fide MftG did not harbor any MftC homolog. However, close inspection revealed the presence of mftD, mftF, and a potential mftA gene but a loss of mftB,C and E in this organism.”

      (2) Figure 3A - the complemented mutant strain shows enhanced growth on ethanol when compared to the WT strain with the same mftG complementation vector, suggesting that dysregulation from the expression plasmid may not be responsible for this phenotype. Have the authors conducted whole genome sequencing on the mutant/complement isolate to rule out secondary mutations?

      This is an interesting point. We have not conducted further investigations into the complement mutant. However, we can confidently state that the complementation was successful in that it restored growth of the ∆mftG mutant on ethanol, thus confirming that the growth arrest of the mutant was due to the lack of mftG activity and not due to any secondary mutation. We also observed that both the complement strain and the overexpression strain, both of which are based on the same overexpression plasmid, exhibited shorter lag phases, faster growth and higher final cell densities compared to the wild type. We interpret these data to mean that overexpression of mftG might lift a growth-limiting step. Notably, this is only an interpretation; we do not make this claim. What we cannot explain at the moment is the observation that the complement mutant grew to a higher OD than the overexpression strain. This is indeed interesting, and it might be due to an artefact or due to complex regulatory effects, which are hard to study without an in-depth characterization of the different strains involved. While this goes beyond the scope of this study, we are convinced that our main conclusions are not challenged by this phenomenon.

      (3) Figure 4C - could the yellow fluorescence that suggests growth arrest be quantified in these images similar to the size and septa/replication sites?

      In principle, this is a good suggestion. However, the amount of yellow fluorescence only differed in the starvation condition between genotypes. Since this condition was not a focus of this study, we preferred not to discuss these differences further.

      (4) Figure 4E - the complemented mutant strain has very high error, why is that? Could this phenotype not be complemented?

      It is true that the standard deviation (SD) is relatively high in this experiment. This is due to the fact that single-cell analyses based on microscopic images were conducted here - not bulk measurements of the average fluorescence. This means that the high variance partially reflects phenotypic heterogeneity of the population, rather than inefficient complementation. While it is interesting that not all cells behaved equally, a finding that deserves further investigations in the future, we conclude that the mean value is a good representative for the efficiency of the complementation.

      (5) While the whole cell extract experiment presented in Figure 6A is very clear, could the authors include SDS page or MS results of their partially purified MftG preparations used for figure B-F in the supplementary data to rule out any confounding factors that may be oxidising mycofactocin species in these preparations?

      We did not include SDS-PAGE or MS results since the enzyme preparations obtained were not pure. This is why we refer to the preparation as “partially purified fraction”. Since we were aware of the risk of confounding factors being potentially present in the preparation, we used two different expression hosts (M. smegmatis ∆mftG and E. coli) and included negative controls, i.e., a reaction using protein preparations from the same host that underwent the exact same purification steps but lacked the mftG gene. For instance, Figure 6A shows the negative control (M. smegmatis ∆mftG) and the verum (M. smegmatis ∆mftG-mftG_His6). Although this control is not shown in panels B, C, and D for more clarity, we can assure that the proposed activity of MftG has never been detected in any extract of M. smegmatis ∆mftG. Concerning MftG preparations obtained from heterologous expression in E. coli, we also performed empty vector controls and inactivated protein controls. We added a new Supplementary Figure S4 to show one example control. Taken together, the usage of two different expression hosts along with corresponding background controls clearly demonstrates that mycofactocinol oxidation only occurred in protein extracts of bacterial strains that contained the mftG gene. These data indicate that the observed mycofactocinol dehydrogenase activity is connected to MftG and not to any background activity.

      Recommendations:

      • A suggestion - revise sub-titles in the results section to be more 'results-oriented' e.g. rather than 'the role of MftG in growth and metabolism of mycobacteria' consider instead 'MftG is critical for M. smegmatis capacity to utilise ethanol as a sole carbon source for growth' or something similar.

      In principle this is a good idea for many manuscripts. However, we have the impression that this approach does not reflect the complexity and additive aspect of the sections of our manuscript.

      • For clarity, revise all figures to include p-values in the figure legend rather than above the figures (use asterisks to indicate significance).

      We are not sure whether the deletion of p-values in the figures would enhance clarity. We would prefer to leave them within figures.

      • Figure 5B -revise colour legend, it is unclear which bar on the graph corresponds to which strain.

      The figure legend was enlarged to enhance readability.

      • Page 8 - MftG and MftC should be lowercase and italicised as the authors are writing about the co-occurrence of genes encoded in genomes, not proteins.

      Good point, we changed some instances of MftG / MftC to mftG / mftC, to more specifically refer to the gene level. However, in some cases, the protein level is more appropriate, for instance, the phylogenies are based on protein sequences. That is why we used the spelling MftG / MftC in these cases.

      • Page 9 - for clarity move Figure 3 after first in text citation.

      We moved Figure 3.

      • Page 17 - for clarity move Figure 5 after first in text citation.

      We moved Figure 5. We furthermore reformatted the figure legends to fit onto the same page as the figures.

      • Page 20, line 17 - 'was attempted' change to 'was performed'. The authors did more than attempt purification, they succeeded!

      Since purification of MftG was not successful, we prefer the term “attempted” here. However, activity assays indeed indicate successful production of MftG.

      • Page 20, line 19-21 - data showing that the MftG-HIS6 complements ∆mftG could be included in supplementary information.

      Complementation was obvious by growth on media containing ethanol as a sole carbon source.

      • Page 26 line 25 - 'we also we' delete duplicated we.

      Thank you for the hint, we deleted the second instance of “we” in the manuscript.

      • Page 26 Line 26 - 'mycofactocinols were oxidised to mycofactocinols', should this read mycofactocinols were oxidised to mycofactocinones?

      Correct. We changed “mycofactocinols” to “mycofactocinones”.

      • Page 28 line 17, huc hydrogenase operon

      We added (“huc operon”).

      • Page 38 line 24, 'Two' not 'to'.

      This is a misunderstanding. “To” is correct.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public reviews):

      (1) Given that this is one of the first studies to report the mapping of longitudinal intactness of proviral genomes in the globally dominant subtype C, the manuscript would benefit from placing these findings in the context of what has been reported in other populations, for example, how decay rates of intact and defective genomes compare with that of other subtypes where known.  

      Most published studies are of men living with HIV-1 subtype B and do not cover the hyperacute infection phase; therefore, a direct head-to-head comparison with the FRESH study is difficult.  However, we can cite and contrast our study with a few examples from acute infection studies, as follows.

      a. Peluso et al., JCI, 2020, showed that in Caucasian men (SCOPE study) with subtype B infection who initiated ART during chronic infection, intact viral genomes decayed at a rate of 15.7% per year, while defective genomes decayed at a rate of 4% per year.  In our study, we showed that in chronic treated participants, genomes decreased at a rate of 25% (intact) and 3% (defective) per month for the first 6 months of treatment.

      b. White et al., PNAS, 2021, demonstrated that in a cohort of African, white and mixed-race American men treated during acute infection, the rate of decay of intact viral genomes in the first phase of decay was <0.3 log copies in the first 2-3 weeks following ART initiation. In the FRESH cohort, our data from acute treated participants show a comparable decay rate of 0.31 log copies per month for intact viral genomes.

      c. A study in Thailand (Leyre et al., 2020, Science Translational Medicine) of predominantly HIV-1 CRF01-AE subtype compared HIV-reservoir levels in participants starting ART at the earliest stages of acute HIV infection (in the RV254/SEARCH 010 cohort) and participants initiating ART during chronic infection (in SEARCH 011 and RV304/SEARCH 013 cohorts). In keeping with our study, they showed that the frequency of infected cells with integrated HIV DNA remained stable in participants who initiated ART during chronic infection, while there was a sharp decay in these infected cells in all acutely treated individuals during the first 12 weeks of therapy.  Rates of decay were not provided and therefore a direct comparison with our data from the FRESH cohort is not possible.

      d. A study by Bruner et al., Nat. Med. 2016, described the composition of proviral populations in an acute treated (within 100 days) and chronic treated (>180 days), predominantly male subtype B cohort. In comparison to the FRESH chronic treated group, they showed that in chronic treated infection 98% (87% in FRESH) of viral genomes were defective, 80% (60% in FRESH) had large internal deletions and 14% (31% in FRESH) were hypermutated.  In acute treated infection, 93% (48% in FRESH) were defective and 35% (7% in FRESH) were hypermutated.  The differences in the frequency of hypermutation could be explained by differences in the timing of treatment, specifically in the acute treated groups, where FRESH participants initiated ART at a median of 1 day after infection.  It is also possible that sex- or race-based differences in immunological factors that impact the reservoir play a role.  

      This study also showed that large deletions are non-random and occur at hotspots in the HIV-1 genome. The design of the subtype B IPDA assay (Bruner et al., Nature, 2019) is based on optimal discrimination between intact and deleted sequences, obtained with a 5′ amplicon in the Ψ region and a 3′ amplicon in Envelope. This suggests that Envelope is a hotspot for large deletions, while Ψ is the site of frequent small deletions and is included in larger 5′ deletions. In the FRESH cohort of HIV-1 subtype C, genome deletions were most frequently observed between Integrase and Envelope relative to Gag (p<0.0001–0.001).

      e. In 2017, Hiener et. al., in Cell Rep, also described genetic characteristics of the latent HIV-1 reservoir in 3 acute treated and 3 chronic treated male study participants with subtype B HIV.  Their data were similar to those of Bruner et al. above, showing proportions of intact proviruses of 6% (94% defective) in participants who initiated therapy during acute/early infection and 3% (97% defective) in chronic infection. In contrast, the frequencies in FRESH were 52% intact and 48% defective in acute treated participants, and 13% intact and 87% defective in chronic infection.  These differences could be attributed to the timing of treatment initiation: in the aforementioned study, early treatment ranged from 0.6-3.4 months after infection.

      (2) Indeed, in the abstract, the authors indicate that treatment was initiated before the peak. The use of the term 'peak' viremia in the hyperacute-treated group could perhaps be replaced with 'highest recorded viral load'. The statistical comparison of this measure in the two groups is perhaps more relevant with regards to viral burden over time or area under the curve viral load as these are previously reported as correlates of reservoir size. 

      We have edited the manuscript text to describe the term peak viraemia in hyperacute treated participants more clearly (lines 443-444). We have now performed an analysis of area under the curve to compare viral burden in the two study groups and found associations with proviral DNA levels after one year. This has been added to the results section (lines 162-163).
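
      As an illustration of the kind of area-under-the-curve summary described above, the following is a minimal sketch only. It assumes a trapezoidal rule applied to log10-transformed viral loads, which the response does not specify; the function name, the sampling days, and the viral-load values are all hypothetical and are not taken from the study.

```python
def viral_load_auc(days, log10_vl):
    """Trapezoidal area under a log10 viral-load curve.

    Assumption: the AUC is computed on log10 copies/mL against time in days.
    The response only states that an AUC analysis was performed.
    """
    if len(days) != len(log10_vl) or len(days) < 2:
        raise ValueError("need at least two matched time points")
    auc = 0.0
    for i in range(1, len(days)):
        dt = days[i] - days[i - 1]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Area of one trapezoid between consecutive samples.
        auc += dt * (log10_vl[i] + log10_vl[i - 1]) / 2
    return auc


# Hypothetical participant: values are illustrative, not study data.
example_days = [0, 7, 14, 28]
example_log10_vl = [5.0, 4.0, 3.0, 2.0]
print(viral_load_auc(example_days, example_log10_vl))
```

      A per-participant value computed this way could then be compared between groups or correlated with proviral DNA levels, as described in the response.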

      Reviewer #2 (Public reviews):

      (1) Other factors also deserve consideration and include age, and environment (e.g. other comorbidities and coinfections.)

      We agree that these factors could play a role; however, participants in this study were of similar age (18-23), and information on comorbidities and coinfections is not available.

      Reviewer #3 (Public reviews):

      (1) The word reservoir should not be used to describe proviral DNA soon after ART initiation. It is generally agreed upon that there is still HIV DNA from actively infected cells (phase 1 & 2 decay of RNA) during the first 6-12 months of ART. Only after a full year of uninterrupted ART is it really safe to label intact proviral HIV DNA as an approximation of the reservoir. This should be amended throughout.

      We agree and where appropriate have amended the use of the word reservoir to only refer to the proviral load after full viral suppression, i.e., undetectable viral load.

      (2) All raw, individualized data should be made available for modelers and statisticians. It would be very nice to see the RNA and DNA data presented in a supplementary figure by an individual to get a better grasp of intra-host kinetics.

      We will make all relevant data available and accessible to interested parties on request. We have now added a section on data availability (lines 489-491).

      (3) The legend of Supplementary Figure 2 should list when samples were taken.

      The data in this figure represent an overall analysis of all sequences available for each participant at all time points.  This has now been explained more clearly in the figure legend.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      (1) It is recommended that the introduction includes information to set the scene regarding what is currently reported on the composition of the reservoir for those not in the immediate field of study i.e., the reported percentage of defective genomes and in which settings/populations genome intactness has been mapped, as this remains an area of limited information.

      We have now included a summary of other reported findings in the field in the introduction (lines 89-92, 94-98) and discussion (lines 345-350).  A more detailed overview has been provided in the response to public reviews.

      (2) It may be beneficial to state in the main text of the paper what the purpose of the Raltegravir was and that it was only administered post-suppression. Looking at Table 1, only the hyperacute treatment group received Raltegravir and this could be seen as a confounder as it is an integrase inhibitor. Therefore, this should be explained.

      Once Raltegravir became available in South Africa, all new acute infections in the study cohort had an intensified 4-drug regimen that included Raltegravir.  A more detailed explanation has now been included in the methods section (lines 435-437).

      (3) Can the authors explain why the viral measures at 6 months post-ART are not shown for chronic-treated individuals in Figure 1 or reported on in the text?

      The 6 months post-ART time point has been added to Figure 1.

      (4) Can the authors indicate in the discussion, how the breakdown of proviral composition compares to subtype B as reported in the literature, for example, are the common sites of deletion similar, or is the frequency of hypermutation similar?

      Added to discussion (lines 345-350).

      (5) Do the numbers above the bars in Figure 3 represent the number of sampled genomes? If so, this should be stated.

      Yes, the numbers above the bars represent the number of sampled genomes. This has been added to the Figure 3 legend.

      (6) In the section starting on line 141, the introduction implies a comparison with immunological features, yet what is being compared are markers of clinical disease progression rather than immune responses. This should be clarified/corrected.

      This has been corrected (line 153).

      (7) Line 170 uses the term 'immediately' following infection, however, was this not 1 -3 days after?

      We have changed the word “immediately” to “1-3 days post-detection” (line 181).

      (8) Can the sampling time-points for the two groups be given for the longitudinal sequencing analysis?

      The sequencing time points for each group are depicted in Figure 2.

      (9) Line 183 indicates that intact genomes contributed 65% of the total sequence pool, yet it's given as 35% in the paragraph above. Should this be defective genomes?

      Yes, this was a typographical error.  Now corrected to read “defective genomes” (line 193).

      (10) The section on decay kinetics of intact and defective genomes seems to overlap with the section above and would flow better if merged.

      Well noted; however, we have chosen to keep these sections separate.

      (11) Some references in the text are given in writing instead of numbering.

      This has been corrected.

      (12) In the clonal expansion results section, can it be indicated between which two time-points expansion was measured?

      This analysis was performed with all sequences available for each participant at all time points.  We have added this explanation to the respective Figure legend.

      Reviewer #2 (Recommendations for The Authors):

      (1) The statement on line 384 "Our data showed that early ART...preserves innate immune factors" - what innate immune factors are being referred to?

      We have removed this statement.

      (2) HLA genotyping methods are not included in the Methods section

      Now included and referenced (lines 481-483).

      (3) Are CD4:CD8 ratios available for the cohorts? This could be another informative clinical parameter to analyse in relation to HIV-1 proviral load after 1 year of ART – as done for the other variables (peak VL, and the CD4 measures).

      Yes, CD4:CD8 ratios are available. We performed the recommended analysis but found no associations with HIV-1 proviral load after 1 year of ART. We have added this to the results section (lines 163-164).

      (4) Reference formatting: Paragraph starting at line 247 (Contribution of clonal expansion...) - the two references in this paragraph are not cited according to the numbering system as for the rest of the manuscript. The Lui et al, 2020 reference is missing from the reference list - so will change all the numbering throughout.

      This has been corrected.

      Reviewer #3 (Recommendations for The Authors):

      (1) To allow comparison to past work. I suggest changing decay using % to half-life. I would also mention the multiple studies looking at total and intact HIV DNA decay rates in the intro.

      We do not have enough data points to get a good estimate of the half-life and therefore report decay as a percentage per month for the first 6 months. 
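
      For readers who wish to relate percentage-based decay rates to half-lives reported elsewhere, the arithmetic can be sketched as follows. This is an illustrative conversion only: it assumes single-phase exponential decay at a constant rate, which is exactly the assumption that sparse data cannot verify, so it is not a substitute for a fitted half-life. The function name is hypothetical.

```python
import math


def half_life_from_percent_decay(percent_per_period):
    """Convert a constant percentage decay per period into a half-life.

    Assumption: single-phase exponential decay, N(t) = N0 * (1 - r)**t,
    so the half-life is ln(2) / -ln(1 - r), in the same time unit as r.
    """
    fraction_remaining = 1 - percent_per_period / 100
    if not 0 < fraction_remaining < 1:
        raise ValueError("decay must be between 0% and 100% per period")
    return math.log(2) / -math.log(fraction_remaining)


# Rates quoted in the response for chronic treated participants (per month).
print(round(half_life_from_percent_decay(25), 2))  # intact genomes, ~2.41 months
print(round(half_life_from_percent_decay(3), 2))   # defective genomes, ~22.76 months
```

      The same function applied to a yearly rate returns a half-life in years, so the units of the input rate must be kept in mind when comparing studies.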

      (2) Line 73: variability is the wrong word as inter-individual variability is remarkably low. I think the authors mean "difference" between intact and total.

      We have changed the word variability to difference as suggested.

      (3) Line 297: I am personally not convinced that there is data that definitively shows total HIV DNA impacting the pathophysiology of infection. All of this work is deeply confounded by the impact of past viremia. The authors should talk about this in more detail or eliminate this sentence.

      We have reworded the statement to read “Total HIV-1 DNA is an important biomarker of clinical outcomes.” (Lines 308-309).

      (4) Line 317: There is no target cell limitation for reservoir cells. The vast majority of CD4+ T cells during suppressive ART are uninfected. The mechanism limiting the number of reservoir cells is necessarily not target cell limitation.

      We agree. The statement this refers to has been reworded as follows: “Considering, that the majority of CD4 T cells remain uninfected it is likely that this does not represent a higher number of target cells, and this warrants further investigation.” (lines 325-326).

      (5) Line 322: Some people in the field bristle at the concept of total HIV DNA being part of the reservoir as defective viruses do not contribute to viremia. Please consider rephrasing. 

      We acknowledge that there are differing opinions regarding total HIV DNA being part of the reservoir, as defective viruses do not contribute to viremia; however, defective HIV proviruses may contribute to persistent immune dysfunction and T cell exhaustion, which are associated with comorbidities and adverse clinical outcomes in people living with HIV.  We have explained in the text that total HIV DNA does not distinguish between the replication-competent and defective viruses that contribute to the viral reservoir.

      (6) Line 339: The under-sampling statement is an understatement. The degree of under-sampling is massive and biases estimates of clonality and sensitivity for intact HIV. Please see and consider citing work by Dan Reeves on this subject.

      We agree and have cited work by Dan Reeves (line 358).

      (7) Line 351: This is not a head-to-head comparison of biphasic decay as the Siliciano group's work (and others) does not start to consider HIV decay until one year after ART. I think it is important to not consider what happens during the first year of ART to be reservoir decay necessarily.

      Well noted.

      (8) Line 366-371: This section is underwritten. In nearly all PWH studies to date, observed reservoirs are highly clonal.

      We agree that observed reservoirs are highly clonal but have not added anything further to this section.

      (9) It would be nice to have some background in the intro & discussion about whether there is any a priori reason that clade C reservoirs, or reservoirs in South African women, might differ (or not) from clade B reservoirs observed in different study participants.

      We have now added this to the introduction (lines 94-103).

      (10) Line 248: This sentence is likely not accurate. It is probable that most of the reservoir is sustained by the proliferation of infected CD4+ T cells. 50% is a low estimate due to under-sampling leading to false singleton samples. Moreover, singletons can also be part of former clones that have contracted, which is a natural outcome for CD4+ T cells responding to antigens &/or exhibiting homeostasis. The data as reported is fine but more complex ecologic methods are needed to truly probe the clonal structure of the reservoir given severe under sampling.

      Well noted.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The present study's main aim is to investigate the mechanism of how VirR controls the magnitude of MEV release in Mtb. The authors used various techniques, including genetics, transcriptomics, proteomics, and ultrastructural and biochemical methods. Several observations were made to link VirR-mediated vesiculogenesis with PG metabolism, lipid metabolism, and cell wall permeability. Finally, the authors presented evidence of a direct physical interaction of VirR with the LCP proteins involved in linking PG with AG, providing clues that VirR might act as a scaffold for LCP proteins and remodel the cell wall of Mtb. Since the Mtb cell wall provides a formidable anatomical barrier for the entry of antibiotics, targeting VirR might weaken the permeability of the pathogen along with the stimulation of the immune system due to enhanced vesiculogenesis. Therefore, VirR could be an excellent drug target. Overall, the study is an essential area of TB biology.

      We thank the reviewer for the kind assessment of our paper.  

      Strengths: 

      The authors have done a commendable job of comprehensively examining the phenotypes associated with the VirR mutant using various techniques. Application of Cryo-EM technology confirmed increased thickness and altered arrangement of CM-L1 layer. The authors also confirmed that increased vesicle release in the mutant was not due to cell lysis, which contrasts with studies in other bacterial species. 

      Another strength of the manuscript is that biochemical experiments show altered permeability and PG turnover in the mutant, which fits with later experiments where authors provide evidence of a direct physical interaction of VirR with LCP proteins. 

      Transcriptomics and proteomics data were helpful in making connections with lipid metabolism, which the authors confirmed by analyzing the lipids and metabolites of the mutant. 

      Lastly, using three approaches, the authors confirm that VirR interacts with LCP proteins in Mtb via the LytR_C terminal domain. 

      Altogether, the work is comprehensive, experiments are designed well, and conclusions are made based on the data generated after verification using multiple complementary approaches.

      We are glad that this reviewer finds our study of interest and well designed.   

      Weaknesses: 

      (1) The major weakness is that the mechanism of VirR-mediated EV release remains enigmatic. Most of the findings are observational and only associate enhanced vesiculogenesis observed in the VirR mutant with cell wall permeability and PG metabolism. The authors suggest that EV release occurs during cell division when PG is most fragile. However, this has yet to be tested in the manuscript - the AFM of the VirR mutant, which produces thicker PG with more pore density, displays enhanced vesiculogenesis. No evidence was presented to show that the PG of the mutant is fragile, and there are differences in cell division to explain increased vesiculogenesis. These observations, counterintuitive to the authors' hypothesis, need detailed experimental verification.

      We concur with the reviewer that we do not have direct evidence showing a more fragile PG in the virR mutant and that our statement is supported by a compendium of different results. However, this statement is framed in the discussion section as a possible scenario, acknowledging that more experiments are needed to make such a connection. Nevertheless, we provide additional data on the molecular characterization of virRmut PG using MS to show a significant increase in the abundance of deacetylated muropeptides, a feature that has been linked to altered lysozyme sensitivity in other unrelated Gram-positive bacteria (Fig 8 G,H).  

      (2.1) Transcriptomic data add little of substance. Transcriptomic data do not correlate with the proteomics data. It remains unclear how VirR deregulates transcription. 

      We concur with the reviewer that the information provided by transcriptomics and proteomics is somewhat fragmented and that, given the low correlation between both datasets, it does not help to explain the phenotype observed in the mutant. This issue has also been raised by another reviewer, so we have paid special attention to it. 

      To refine the biological interpretation of the transcriptomic data, we have integrated the complemented strain (virRmut-Comp) into our analyses. This led us to narrow down the virR-dependent transcriptomic signature to the sets of genes that appear simultaneously deregulated, in either direction, in virRmut with respect to both the WT and the complemented strain. Furthermore, to identify the transcription factors whose regulatory activity appears disrupted in the mutant strain, we resorted to an external dataset (Minch et al. 2015) and found a set of 10 transcriptional regulators whose regulons appear significantly impacted in the virRmut strain. While these improvements admittedly do not fully address the question raised by the reviewer, we found that they contribute to a more precise characterization of the VirR-dependent transcriptional signatures, as well as of the regulons in the genome-wide transcriptional regulatory network of the pathogen that appear altered because of virR disruption. We acknowledge that the lack of correlation between whole-cell lysate proteomics and transcriptomic data is intriguing, albeit not uncommon in Mycobacterium tuberculosis. However, differences in the protein cargo of the vesicles from different strains share key pathways with the transcriptomic analyses, such as the enrichments in cell wall biogenesis and peptidoglycan biosynthesis that are observed among the genes downregulated in virRmut in both cases.

      (2.2) TLCs of lipids are not quantitative. For example, the TLC image of PDIM is poor; quantitative estimation needs metabolic labeling of lipids with radioactive precursors. Further, change in PDIMs is likely to affect other lipids (SL-1, PAT/DAT) that share a common precursor (propionyl- CoA).

      We also agree with the reviewer that TLC, as performed, is not quantitative. However, we do not have access to radioactive procedures. In the new version of the manuscript, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Our results show a reduction in the pool of SL and DATs in the mutant, indicating that part of the methylmalonyl pool is diverted to the synthesis of PDIMs. 

      (3) The connection of cholesterol with cell wall permeability is tenuous. Cholesterol will serve as a carbon source and contribute to the biosynthesis of methyl-branched lipids such as PDIM, SL-1, and PAT/DAT. Carbon sources also affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability and import/export of drugs. Authors should investigate whether restoration of the normal level of permeability and EV release is not due to the maintenance of cell wall lipid balance upon cholesterol exposure of the VirR mutant.

      We concur with the reviewer that cholesterol as a sole carbon source introduces many changes in Mtb cells besides permeability. Consequently, we investigated the virRmut lipid profile upon exposure to either cholesterol or TRZ (Fig S8). Both WT and virRmut-Comp strains were included in the analysis. Polar lipid analysis revealed that either cholesterol or TRZ exposure induced a marked reduction in PIMs and cardiolipin (DPG) levels in virRmut relative to the WT or complemented strains (Fig S8A). Analysis of apolar lipids indicated that, relative to glycerol MM, virRmut cultured in the presence of cholesterol or TRZ showed reduced levels of TDM and DATs compared to the WT and virRmut-Comp strains (Fig S8B). These results suggest a lack of correlation between modulation of cell permeability by cholesterol and TRZ and lipid levels in the absence of VirR.

      Furthermore, regarding this section, we would like to mention that we have changed the reference used for the annotation of the DosR regulon, moving from the definition used in the previous submission (Rustad et al., “The enduring hypoxic response of Mycobacterium tuberculosis”, PLoS One 3(1), e1502 (2008)) to the more recent characterization of the regulon based on ChIP-seq data, reported in Minch et al. 2015. This was done to ensure coherence with the transcriptomic analyses in the new Figure 4.

      (4) Finally, protein interaction data is based on experiments done once without statistical analysis. If the interaction between VirR and LCP protein is expected on the mycobacterial membrane, how the SPLIT_GFP system expressed in the cytoplasm is physiologically relevant. No explanation was provided as to why VirR interacts with the truncated version of LCP proteins and not with the full-length proteins.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript this assay has successfully been applied to interrogate interactions of domains of proteins embedded in the membrane of mycobacteria. Therefore, we believe that this assay is valid to interrogate interactions between Lcp proteins.

      Reviewer #2 (Public Review): 

      Summary: 

      In this work, Vivian Salgueiro et al. have comprehensively investigated the role of VirR in the vesicle production process in Mtb using state-of-the-art omics, imaging, and several biochemical assays. From the present study, authors have drawn a positive correlation between cell membrane permeability and vesiculogenesis and implicated VirR in affecting membrane permeability, thereby impacting vesiculogenesis. 

      Strengths: 

      The authors have discovered a critical factor (i.e. membrane permeability) that affects vesicle production and release in Mycobacteria, which can broadly be applied to other bacteria and may be of significant interest to other scientists in the field. Through omics and multiple targeted assays such as targeted metabolomics, PG isolation, analysis of Diaminopimelic acid and glycosyl composition of the cell wall, and, importantly, molecular interactions with PG-AG ligating canonical LCP proteins, the authors have established that VirR is a central scaffold at the cell envelope remodelling process which is critical for MEV production. 

      We thank the reviewer for the kind assessment of the paper.

      Weaknesses: 

      Throughout the study, the authors have utilized a CRISPR knockout of VirR. VirR is a non-essential gene for the growth of Mtb; a null mutant of VirR would have been a better choice for the study. 

      According to Tn mutant databases and CRISPR databases, virR is a non-essential gene. However, we have tried many times to interrupt this gene using phage-mediated allelic exchange, with no success. So far there is no precedent of a clean KO mutant of this gene. White et al. generated a virR mutant consisting of a deletion of a large fragment of the C-terminal part of the protein, largely replicating the effect of the Tn insertion site in the virR Tn mutant. These precedents led us to switch to CRISPR technology.  

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) The authors monitored cell lysis by measuring the release of a cytoplasmic iron-responsive protein (IdeR). Since EV release is regulated by iron starvation, which is directly sensed by IdeR, another control (unrelated to iron) is needed. A much better approach would be to use hydrophobic/hydrophilic probes to measure changes in the cell wall envelope.

      Does the VirR complemented strain have a faint IdeR band in the supernatant? The authors need to clarify. Also, it's unclear whether the complementation restored normal VirR levels or not. 

      We thank the reviewer for this recommendation. Accordingly, we have complemented these studies with an alternative approach based on serially diluted cultures spotted on solid medium. These results align very well with those of the western blot using IdeR levels in the supernatant as a surrogate of cell lysis.

      We also noticed the presence of a faint IdeR band in the supernatant of the complemented strain, suggestive of possible cell lysis. However, as shown in another section, this did not translate into increased levels of vesiculation. As shown in a previous paper describing VirR as a genetic determinant of vesiculogenesis, VirR levels in the complemented strain are not just restored but increased considerably. This overexpression could explain the potential artifact of a leaky phenotype in the complemented strain. In addition to that previous study, the proteomic data included in this paper clearly show a restoration of VirR levels relative to the WT strain.

      (2) Figure 2C: The data are weak; I don't see any difference in incorporating FDAAs in MM media. Even in the 7H9 medium, differences appear only at the last time point (20 h). What happens at the time point after 20 h (e.g., 48 h)? How do we differentiate between defective permeability or anabolism leading to altered PG? No statistical analysis was performed.

      We apologize for the incomplete assessment of the results in this figure. First, this figure just shows differential incorporation of FDAAs in the different strains in different media. As per previous studies (Kuru et al. (2017) Nat. Protocols), these probes can freely enter cells and may be incorporated into PG by at least three different mechanisms, depending on the species: through the cytoplasmic steps of PG biosynthesis and via two distinct transpeptidation reactions taking place in the periplasm. Consequently, the differential labeling observed in virRmut relative to the WT strain may be a consequence of the enlarged PG observed in the mutant. We have repeated the experiment and generated new data. First, we cultured the strains with a blue FDAA (HADA) for 48 h to ensure full labeling. Then, we washed the cells and cultured them in the presence of a second, green FDAA (FDL) for 5 h. The differential incorporation of FDL relative to HADA was then measured under the fluorescence microscope. This experiment showed that virRmut incorporates more FDL than the other strains, suggesting altered PG remodeling. We also modified the figure to make the early and late time points of the time course clearer and applied statistics.

      (3) Many genes (~ 1700) were deregulated in the mutant. Since these transcriptional changes do not correlate at the protein level in WCL, it's important to determine VirR-specificity. RNA-Seq of VirR complemented strain is important.

      We think this was an extremely important point, and we thank the reviewer for pointing it out. Following their suggestion, we have analyzed and integrated data from the complemented strain, which we have added to the GEO submission, and conclude that differences in expression between the complemented strain and either the WT or virRmut are in fact also common and highly significant. Although this is not completely unexpected, given the nature of our mutants and the fact that the complemented strain shows significantly higher levels of VirR expression than the WT (at both the RNA and protein levels), it motivated us to narrow our definition of VirR-dependent genes and adopt a combined criterion that integrates the complemented strain. Following this approach, we considered the set of genes upregulated (downregulated) in virRmut to be those whose expression in that strain is significantly higher (lower) than in both WT and virRmut-Comp. Under this integrated definition, the genes considered (399 upregulated and 502 downregulated) are those whose observed expression changes are more likely to be genuinely VirR-dependent rather than a non-specific consequence of the mutagenesis protocols. Despite the lower number of genes in these sets, repeating all our functional enrichment analyses with this combined criterion leads to conclusions that are largely compatible with those presented in the first version of the paper.
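
      The combined criterion described above can be illustrated with a minimal sketch. It assumes each contrast (virRmut vs WT, virRmut vs virRmut-Comp) is available as a mapping from gene to log2 fold change and adjusted p-value. The container layout, the gene names, and the 0.05 significance cutoff are all hypothetical and are not taken from the manuscript.

```python
from typing import Dict, Set, Tuple

# Hypothetical container: gene -> (log2 fold change, adjusted p-value)
Contrast = Dict[str, Tuple[float, float]]


def significant(contrast: Contrast, direction: int, alpha: float = 0.05) -> Set[str]:
    """Genes significantly changed in one direction (+1 = up, -1 = down).

    alpha is an assumed cutoff, used here only for illustration.
    """
    return {
        gene
        for gene, (lfc, padj) in contrast.items()
        if padj < alpha and lfc * direction > 0
    }


def virr_dependent(
    mut_vs_wt: Contrast,
    mut_vs_comp: Contrast,
    direction: int,
    alpha: float = 0.05,
) -> Set[str]:
    """Keep genes changed in the same direction in BOTH contrasts."""
    return significant(mut_vs_wt, direction, alpha) & significant(
        mut_vs_comp, direction, alpha
    )


# Toy data with made-up gene names
mut_vs_wt = {
    "geneA": (2.1, 0.001),
    "geneB": (1.8, 0.002),
    "geneC": (-1.5, 0.010),
    "geneD": (-2.0, 0.003),
}
mut_vs_comp = {
    "geneA": (1.9, 0.004),   # up in both contrasts: retained
    "geneB": (0.2, 0.600),   # not significant vs complemented strain: dropped
    "geneC": (-1.2, 0.020),  # down in both contrasts: retained
    "geneD": (0.5, 0.300),   # not reproduced vs complemented strain: dropped
}

up_genes = virr_dependent(mut_vs_wt, mut_vs_comp, direction=+1)
down_genes = virr_dependent(mut_vs_wt, mut_vs_comp, direction=-1)
```

      Taking the intersection of the two contrasts is what shrinks the gene sets: a gene altered relative to WT but not relative to the complemented strain is excluded.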

      (4) Transcriptome data provide no clues about how VirR could mediate expression deregulation. Is there an overlap with the regulations/regulons of any Mtb transcription factors? One clue is DosR; however, DosR only regulates 50-60 genes in Mtb. 

      Again, we would like to thank the reviewer for this recommendation, which we have followed to generate a new section in the Results named “VirR-dependent genes intersect the regulons of key transcriptional regulators of the responses to stress, dormancy, and cell wall remodeling”. As we explain in this new section, we resorted to the regulon annotations reported in (Minch et al. 2015), where ChIP-seq data were collected on binding events between a panel of 143 transcription factors (TFs) and DNA genome-wide. The dataset includes 7248 binding events between regulators and DNA motifs in the vicinity of targets’ promoters. After completing enrichment analyses with the resulting regulons, we identified 10 transcription factors whose intersections with the sets of up- and downregulated genes in virRmut were larger than expected by chance (one-tailed Fisher's exact test, OR>2, FDR<0.1). Those regulators, which, as guessed by the referee, included DevR, control key pathways related to cell wall remodeling, stress responses, and transition to dormancy.
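
      The enrichment test named above (one-tailed Fisher's exact test, OR>2, FDR<0.1) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names, the toy regulons, and the choice of Benjamini-Hochberg as the FDR procedure are hypothetical, since the response does not specify how FDR was computed.

```python
from math import comb
from typing import Dict, List, Set


def fisher_greater(k: int, regulon: int, query: int, universe: int) -> float:
    """One-tailed p-value P(overlap >= k) under the hypergeometric null."""
    upper = min(regulon, query)
    total = comb(universe, query)
    tail = sum(
        comb(regulon, i) * comb(universe - regulon, query - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def odds_ratio(k: int, regulon: int, query: int, universe: int) -> float:
    """Sample odds ratio of the 2x2 table (infinite if a margin cell is 0)."""
    a, b, c = k, regulon - k, query - k
    d = universe - regulon - query + k
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_regulons(
    regulons: Dict[str, Set[str]],
    query: Set[str],
    universe: Set[str],
    min_or: float = 2.0,
    max_fdr: float = 0.1,
) -> List[str]:
    """Regulons whose overlap with the query set passes OR and FDR cutoffs."""
    names = sorted(regulons)
    query = query & universe
    stats = []
    for name in names:
        targets = regulons[name] & universe
        k = len(targets & query)
        args = (k, len(targets), len(query), len(universe))
        stats.append((odds_ratio(*args), fisher_greater(*args)))
    fdr = benjamini_hochberg([p for _, p in stats])
    return [
        name
        for name, (ratio, _), q in zip(names, stats, fdr)
        if ratio > min_or and q < max_fdr
    ]


# Toy example with made-up identifiers
universe = {f"g{i}" for i in range(100)}
query = {f"g{i}" for i in range(10)}               # e.g. upregulated genes
regulons = {
    "tfA": {f"g{i}" for i in range(8)},            # 8 of 8 targets in query
    "tfB": {f"g{i}" for i in range(50, 60)},       # no targets in query
}
hits = enriched_regulons(regulons, query, universe)
```

      The test is one-tailed because only over-representation of a regulon among the deregulated genes is of interest; the odds-ratio filter additionally discards overlaps that are statistically detectable but small in magnitude.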

      (5) How many proteins that are enriched or depleted in the EVs of the VirR mutant also affected transcriptionally in the mutant? How does VirR regulate the abundance and transport of protein in EVs? 

      While the intersection between genes and proteins that appear upregulated in the virRmut strain at both the transcriptional and vesicular protein levels (N=21) was larger than expected by chance (OR=2.0, p=7.0E-3), downregulated genes and proteins in virRmut (N=14) were not enriched in each other. These results indicated, at most, a weak correlation between RNA and protein levels (a phenomenon nonetheless previously observed in Mycobacterium tuberculosis, among other organisms; see Cortés et al. 2013). Admittedly, the compilation of these omics data is insufficient by itself to pinpoint the specific regulatory mechanisms through which the absence of VirR impacts protein abundance in EVs. For the sake of transparency, this has been acknowledged in the discussion section of the resubmitted version of the manuscript.

      (6) The assumption that a depleted pool of methylmalonyl CoA is due to increased utilization for PDIM biosynthesis is problematic. Without flux-based measurement, we don't know if MMCoA is consumed more or produced less, more so because Acc is repressed in the VirR mutant EVs. Further, MMCoA feeds into the TCA cycle and other methyl-branched lipids. Without data on other lipids and metabolism, the depletion of MMCoA is difficult to explain.

      The differential expression statistics compiled suggest that both effects may be at play, since we observed, at the same time, a downregulation of enzymes controlling methylmalonyl synthesis from propionyl-CoA (i.e. Acc, at the protein level), as well as an upregulation of enzymes related to its incorporation into DIM/PDIMs (i.e. pps genes). Both effects, combined, would favor a decreased rate of methylmalonyl production and a faster depletion rate, thus contributing to the depleted levels observed. We nevertheless concur with the reviewer that fluxomics analyses will help to shed light on this question in a more decisive manner, and we have acknowledged this in the discussion section too.

      (7) Figure 5: Deregulation of rubredoxins and copper indicates impaired redox balance and respiration in the mutant. The data is complex to connect with permeability as TRZ is mycobactericidal and also known to affect the respiratory chain. The authors need to investigate if, in addition to permeability, the presence of VirR is essential for maintaining bioenergetics.

      The data related to rubredoxins and copper have been modified after reanalyzing the transcriptomic data including the complemented strain. Nevertheless, we found that some features of the response to stresses may be impaired in the mutant, including the response to oxidative stress. In this regard, we found that the mutant shows enhanced sensitivity to H2O2 relative to the WT and complemented strains. These data are now included as Fig S3 in the new version of the manuscript.

      (8) Differential regulation of DoS regulon and cholesterol growth could also be linked to differences in metabolism, redox, and respiration. What is the phenotype of VirR mutants in terms of growth and respiration in the presence of cholesterol/TRZ? 

      We thank the reviewer for this suggestion. Consequently, we have added a new section to the Results that suggests that other aspects of mycobacterial physiology may be affected in the virR mutant when cultured in the presence of cholesterol or TRZ:

      “Modulation of EV levels and permeability in virRmut by cholesterol and TRZ. We next wondered what effect culturing virRmut with either cholesterol or TRZ could have on cell growth, permeability and EV production. Cholesterol has also been shown to affect other aspects of physiology (redox, respiration, ATP), which can directly affect permeability (Lu et al., 2017). We monitored the growth of virRmut cultured for 20 days in MM supplemented with glycerol, with cholesterol as a sole carbon source, or with TRZ at 3 µg ml-1. While cholesterol significantly enhanced the growth of virRmut after 5 days relative to glycerol medium, supplementation of glycerol medium with TRZ restricted growth during the whole time course (Fig S5A). The study of cell permeability in the same conditions indicated that the enhanced cell permeability observed in glycerol MM was reduced when virRmut was cultured with cholesterol as sole carbon source. Conversely, the presence of TRZ increased cell permeability relative to the medium containing solely glycerol (Fig S5C). As we have previously observed for the WT strain, either condition (Chol or TRZ) also modified vesiculation levels in the mutant accordingly (Fig S5B). These results strongly indicate that other aspects of mycobacterial physiology besides permeability are also affected in the virR mutant and may contribute to the observed enhanced vesiculation.”

      (9) PDIM TLC is not evident; both DimA and DimB should be clearly shown. It will also be necessary to show other methyl-branched lipids, such as SL-1 and PAT/DAT, because the increase in PDIM can take away methyl malonyl CoA from the biosynthesis of SL-1 and PAT/DAT. Studies have shown that SL-1, PAT/DAT, and PDIM are tightly regulated, where an increase in one lipid pool can affect the abundance of other lipids. Quantitative assays using 14C acetate/propionate are most appropriate for these experiments.

      We apologize that the TLC analysis was not performed with radiolabeling; we do not have access to this approach. To address the reviewer's question of whether other methyl-branched lipids may explain the altered flux of methyl malonyl CoA, we have run TLCs on all the strains tested to resolve SLs and PAT/DATs (Fig S8). Notably, we observed a reduction in the level of these lipids (SL1 or PAT/DAT) in virRmut cultured in glycerol relative to the WT and complemented strains, suggesting that the excess PDIM synthesis can take away methyl malonyl CoA from the biosynthesis of SL-1 and PAT/DAT in the absence of VirR (Fig S8B).

      (10) Figure 8: Interaction between VirR and Lcp proteins. Since these interactions are happening in the membrane, using a split GFP system where proteins are expressed in the cytoplasm is unlikely to be relevant.

      Also, experiments on Figure 8C are performed once, and representation needs to be clarified; split GFP needs a positive control, and negative control (CtpC) is not indicated in the figure.

      We have repeated the experiments and applied statistics (Figure 9). As stated in the manuscript, this assay has been successfully applied to interrogate interactions between domains of proteins embedded in the membrane of mycobacteria. Therefore, we believe that this assay is valid to interrogate interactions between Lcp proteins.

      Reviewer #2 (Recommendations For The Authors):  

      (1) Authors should consider making more effort to mine the omics data and integrate them. Given the amount of data that is generated with the omics, they need to be looked at together to find out threads that connect all of them. 

      In the resubmitted version of the paper, we have followed the reviewer's recommendation by incorporating new analyses that integrate the virRmut-C strain, and have tried to place the differences found in the context of broader transcriptional regulatory networks (new figure 4), as well as of metabolic pathways related to PDIM biosynthesis from methylmalonyl (figure 6I, already present in the first submission). We consider that these additions contribute to a deeper interpretation of the omics data, in line with what the reviewer suggested.

      (2) The interpretation given by authors in lines 387-390 is an interpretation that does not have sufficient support and, hence should be moved into discussion. 

      We thank the reviewer for this recommendation. We believe that these new analyses and integration studies now support the above statement.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I recommend being explicit regarding how the animals were habituated to blood sampling.

      On lines 109-111 we have added a more detailed explanation of how mice were habituated to blood sampling. This includes details that mice were held and had their tails palpated for approximately 5 minutes per day.

      Were any mice excluded due to loss or movement of the implant over time? Any details to allow replication of long-term measurements like this should be included.

      No mice lost their cannulas during experimentation, and we have added a sentence to this effect on lines 303 to 304. We have also noted that there was a slight decrease in signal over the months of experimentation. A statement has also been added on line 318 clarifying that the two mice lost between the pregnancy and lactation stages of the experiment were euthanised due to dystocia.

      The text states that synchronized episodic activity reappeared as early as 3 days after birth, citing Figure 6c as evidence. There is no 6c. Figure 6b shows day 5 after birth.

      This has been corrected.

      The methods state mRNA levels had to be "above background" to be counted as colocalization. At how many fold/what percent above background was a cell considered positive for expression?

      Positive hybridisation was scored according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      Please ensure figure titles or the data graphs explicitly give the genotype of the mice in all figures (or state the mice are wildtype).

      Genotype has been added to figure titles where possible. Genotypes are always given in figure legends and tables and/or explicitly stated on the figure itself.

      Figure 4's title states events are "perfectly" correlated, which is a subjective term. I recommend saying "consistently" or "temporally" correlated, depending on your meaning.

      This has been amended to read “consistently correlated”

      Reviewer #2 (Recommendations For The Authors):

      The comments below aim to clarify the paper's methodology and results but do not detract from my overall enthusiasm for this work.

      - Given past studies demonstrating prolactin action in the brain, particularly the MPOA/MPN, is essential for maternal behavior, can the authors please clarify why this behavior is retained in the cam2a prlr knockout mice? The authors mention that prlr in the MPOA is only knocked down 50% compared to WT controls. Is this sufficient to retain maternal behavior?

      In our experience, 50% Prlr in the MPOA is sufficient to retain normal maternal behaviour in most animals, including the ones in this experiment (our original paper describing this showed relatively normal behaviour with, for example, vGAT- and vGlut-mediated knockouts, and even a double knockout; only when we achieved complete KO with an AAV-Cre did we see failure of maternal behaviour; Brown et al, PNAS 3;114(40):10779-10784 2017). We have added a statement on lines 157-159 regarding this. We have an additional paper in preparation specifically characterising the maternal behaviour and lactation outcomes in this line of mice, and we find that most animals display normal maternal behaviour, with slightly impaired milk production in later lactation.

      - Supplementary Figure 1. Can the authors please clarify the criteria for a cell to be positive for prlr? The methods state that the signal must be "above background level." How was the background measurement obtained? In the negative control?

      As noted above, positive hybridisation was scored according to the manufacturer’s protocol and a statement to this effect has been added on line 144.

      - Lines 310-314: This sentence describes RNAscope analysis of prlr knockdown in kisspeptin cells and refers to Extended Figure 3 - but I believe this is in Supplementary Figure 1.

      This has been corrected.

      - Figure 3-4: When mice return to estrous cycles, the amplitude of episodic kisspeptin neuron activity is the same as 24 hours after weaning, which appears much lower than in virgin females. Does this reach significance? If so, do the authors know why kisspeptin activity is still suppressed, and can they comment on why this may not affect estrous cyclicity?

      This does not reach significance (see Supplementary Table 1 (4C) for statistics). Therefore, no further analysis was done. This question would need to be examined with a follow-up experiment. Given the 5 s on, 15 s off scheduled mode of recording used here, amplitude was not an extremely accurate measure and has been reported as relative within each mouse. There is also the additional issue of a gradual reduction in signal amplitude over time in these long-term experiments, although it is true that much larger signals were detected after ovariectomy at the end of the experiment. At present, we have not tried to interpret whether the changes in amplitude are informative.

      - Fiber photometry studies: Please indicate whether a post-mortem examination of GCaMP transfection and fiber photometry placement was conducted, and what region of the ARC was imaged.

      Brains from these mice were collected; however, postmortem analysis of cannula placement and GCaMP6 transfection was not carried out in all mice. This was based on our experience with this method, in that the quality and characteristic pattern of activity seen, as well as the corresponding LH secretion following an SE, were indicative of successful cannula placement and transfection. Incorrectly placed cannulas failed to show SEs. A trial was done with 3 mice, and cannula placement was found to be in the caudal ARC (cARC), with GFP (attached to GCaMP) restricted to the cARC. A statement has been added on lines 306-313 regarding this.

      - Were male mice removed before birth? Please add to the methods section if not included.

      Yes, male mice were removed after a sperm plug was seen and were never present at parturition. We have inserted additional details on line 95 to this effect.

      Reviewer #3 (Recommendations For The Authors):

      (1) Line 172: n=7-8 per group, yet in Supplementary Figure 2, n=6 per group.

      These refer to different groups of mice. N=7-8 is the group size of the mice in Figure 2 that were given mifepristone or vehicle control. In contrast, the n number in Supplementary Figure 2 refers to the mice in the pilot study. The n number for the pilot study has been added on line 194.

      (2) Line 314: Extended = suppl; Figure 3 = 1.

      This has been corrected.

      (3) Line 451: Figure 6C, does not exist.

      This has been corrected.

      Line 590: Reference 23 could be replaced by Ordog T et al 1998 Am J Physiol 274,E665 because it is later and more relevant to the topic.

      This reference has been replaced with the suggested reference.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues presented an investigation of pig-origin bacteria Bacillus velezensis HBXN2020, for its released genome sequence, in vivo safety issue, probiotic effects in vitro, and protection against Salmonella infection in a murine model. Various techniques and assays are performed; the main results are all descriptive, without new insight advancing the field or a mechanistic understanding of the observed protection.

      Thank you very much for reading and commenting on our manuscript.

      Strengths:

      An extensive study on probiotic property of the Bacillus velezensis strain HBXN2020

      Thank you very much for reading and commenting on our manuscript.

      Weaknesses:

      The main results are descriptive without mechanistic insight. Additionally, most of the results and analysis parts are separated without a link or a story-telling way to deliver a concise message.

      Thank you for your comments and suggestions on our manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. The manuscript results and analysis sections have been extensively revised. We appreciate your review and feedback.

      Reviewer #2 (Public Review):

      Summary:

      In this study, Wang and colleagues study the potential probiotic effects of Bacillus velezensis. Bacillus species have potential benefit to serve as probiotics due to their ability to form endospores and synthesize secondary metabolites. B. velezensis has been shown to have probiotic effects in plants and animals but data for human use are scarce, particularly with respect to salmonella-induced colitis. In this work, the authors identify a strain of B. velezensis and test it for its ability to control colitis in mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Key findings:

      (1) The authors sequence an isolate of B. velezensis, HBXN2020, and describe its genome (roughly 4 Mb, 46% GC content, etc.).

      Thanks for the constructive comments and the positive reception of the manuscript.

      (2) The authors next describe the growth of this strain in broth culture and survival under acid and temperature stress. The susceptibility of HBXN2020 was tested against various antibiotics and against various pathogenic bacteria. In the case of the latter, the authors set out to determine if HBXN2020 could directly inhibit the growth of pathogenic bacteria. Convincing data, indicating that this is indeed the case, are presented.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (3) To determine the safety profile of HBXN2020 (for possible use as a probiotic), the authors infected mice with the strain and monitored weight, together with cytokine profiles. Infected mice displayed no significant weight loss and expression of inflammatory cytokines remained unchanged. Blood cell profiles of infected mice were consistent with those of uninfected mice. No significant differences in tissues, including the colon, were observed.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (4) Next, the authors tested the ability of HBXN2020 to inhibit growth of Salmonella typhimurium (STm) and demonstrate that HBXN2020 inhibits STm in a dose-dependent manner. Following this, the authors infect mice with STm to induce colitis and measure the ability of HBXN2020 to control colitis. The first outcome measure was a reduction in STm in faeces. Consistent with this, HBXN2020 reduced STm loads in the ileum, cecum, and colon. Colon length was also affected by HBXN2020 treatment. In addition, treatment with HBXN2020 reduced the appearance of colon pathological features associated with colitis, together with a reduction in inflammatory cytokines.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (5) After noting the beneficial (and anti-inflammatory) effects of HBXN2020, the authors set out to investigate effects on microbiota during treatment. Using a variety of algorithms, the authors demonstrate that upon HBXN2020 treatment, microbiota composition is restored to levels akin to that seen in healthy mice.

      Thanks for the constructive comments and the positive reception of the manuscript.

      (6) Finally, the authors assessed the effect of using HBXN2020 as prophylactic treatment for colitis by first treating mice with the spores and then infecting with STm. Their data indicate that treatment with HBXN2020 reduced colitis. A similar beneficial impact was seen with the gut microbiota.

      Thanks for the constructive comments and the positive reception of the manuscript.

      Strengths:

      (1) Good use of in vitro and animal models to demonstrate a beneficial probiotic effect.

      Thank you very much for reading and commenting on our manuscript.

      (2) Most observations are supported using multiple approaches.

      Thanks for the comments and the positive reception of the manuscript.

      (3) Mouse experiments are very convincing.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      (1) Whilst a beneficial effect is observed, there is no investigation of the mechanism that underpins this.

      Thank you for pointing this out. We apologize for the lack of mechanistic research in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      (2) Mouse experiments would have benefited from the use of standard anti-inflammatory therapies to control colitis. That way the authors could compare their approach of using Bacillus spores to the current gold standard for treatment.

      We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Based on your suggestion, we have supplemented this in the revised manuscript. We appreciate your review and feedback, and have marked the updated content in the revised manuscript.

      The updated content is presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. investigates the effects of B. velezensis HBXN2020 in alleviating S. Typhimurium-induced mouse colitis. The results showed that B. velezensis HBXN2020 could alleviate bacterial colitis by enhancing intestinal homeostasis (decreasing harmful bacteria and enhancing the abundance of Lactobacillus and Akkermansia) and gut barrier integrity and reducing inflammation.

      Thanks for the comments and the positive reception of the manuscript.

      Strengths:

      B. velezensis HBXN2020 is a novel species of Bacillus that can produce a great variety of secondary metabolites and exhibit high antibacterial activity against several pathogens. B. velezensis HBXN2020 is able to form endospores and has strong anti-stress capabilities. B. velezensis HBXN2020 has a synergistic effect with other beneficial microorganisms, which can improve intestinal homeostasis.

      Thanks for the comments and the positive reception of the manuscript.

      Weaknesses:

      Few studies about the clinical application of Bacillus velezensis. Thus, more studies are still needed to explore the effectiveness of Bacillus velezensis before clinical application.

      Thanks for your suggestion. This study serves as an exploratory investigation before the application of Bacillus velezensis. The main purpose of this study is to explore the potential of Bacillus velezensis in application. We appreciate your review and feedback and hope that our response adequately addresses your concerns.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Most of my previous comments are well addressed; a few remaining points follow. In my last comment, I requested a colitis mouse model, which would closely resemble the diarrheal disease caused by Salmonella in mammals. The available statement is not convincing; please check https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2225501/ and https://pubs.rsc.org/en/content/articlelanding/2020/fo/d0fo01017k, and replace "colitis" with a normal infection model. The current statement is incorrect.

      Thank you for your valuable suggestion, which improves the quality of the manuscript. We have corrected this in the revised manuscript as suggested and have marked the updated content.

      The updated content is presented in lines 2, 29, 38, 46, 48, 199, 204, 246, 248, 282, 307, 310, 316, 431, 433, 464, 466, 473, 494, 497, 499, 504, 513, 518, 525, 706, 710 and 735 of the revised manuscript.

      Certain parts remain to be overestimated, to my knowledge, the language and logical flow should be addressed thoroughly.

      Here are suggestions to improve the logical flow of the manuscript.

      (1) Probiotic sampling and isolation

      (2) in vitro assessment

      (3) genomic sequencing and in silico safety assessment (Crit Rev Food Sci Nutr. 2023;63(32):11244-11262), which should be included as a right ref.

      (4) in vivo assay for safety evaluation, but not biosafety (it has a different meaning!!)

      (5) infection model and protection assay.

      We greatly appreciate your valuable comments, which improve the quality and depth of the manuscript. Following your suggestion, we have done our best to correct these problems in the revised manuscript. We would like to express our apologies once again and hope that the revised manuscript meets your expectations. We have marked the updated content in the revised manuscript.

      Also, please pay attention to the logical link or transition sentences between each part to connect the dots in each part.

      We greatly appreciate your valuable comments, which improve the quality of the manuscript. Following your suggestion, we have corrected this in the revised manuscript and have marked the updated content.

      Finally, there are also many typos and errors; please correct them throughout the text. For example, line 521, "Stain", and more...

      Thanks for pointing this out. Based on your suggestion, we have corrected these in the revised manuscript and have marked the updated content.

      The updated content is presented in lines 753, 1055 and 1087 of the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      The revised manuscript by Wang and colleagues attempts to address concerns raised during the first round of review.

      All minor comments have been addressed and in general, the major concerns have been partially addressed in the revised manuscript.

      The outstanding concerns relate to the mechanistic basis of the observations. The authors made no attempt to address this in a meaningful manner. Secondly, the issue of comparing the responses to what would be standard therapy (such as anti-inflammatories) was also handled in a somewhat dismissive manner, referring to other ongoing/future work. The clinical utility of the findings are hard to ascertain if there is no comparison to the current gold standard therapeutic approach.

      I have no further suggestions for the authors, save for those previously made.

      Thank you for pointing this out. We apologize for the lack of mechanistic research in the manuscript. In later work, we will focus on exploring the antibacterial substances and bactericidal mechanisms of B. velezensis. Thank you for your suggestions, and we hope our response has addressed your concerns.

      Secondly, regarding the comparison of oral Bacillus spore treatment with the current gold standard for treatment, we have supplemented this in the revised manuscript. We appreciate your review and feedback, and have marked the updated content in the revised manuscript.

      The updated content is presented in lines 198-378 of the Results section of the revised manuscript.

      Reviewer #3 (Recommendations For The Authors):

      This is a revision, they have addressed all my concerns, and now it is acceptable.

      Thank you very much for your comments and recognition of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Summary:

      Left-right asymmetry in the developing embryo is important for establishing correct lateralisation of the internal organs, including the gut. It has been shown previously that the dorsal mesentery (DM), which supports looping of the endodermal gut tube during development, is asymmetric with sharp delineation of left and right domains prior to gut looping. The authors set out to investigate the nature of the midline barrier that separates the left and right sides of the DM. They identify a transient basement membrane-like structure which is organised into two layers between the notochord and descending endoderm. In the time window when this basement membrane structure exists, there is no diffusion or cell mixing between the left and right sides of the DM, but once this structure starts breaking down, mixing and diffusion occur. This suggests it acts as a barrier, both physical and chemical, between left and right at the onset of gut lateralisation.

      Strengths:

      The authors identify a new midline structure that likely acts as a barrier to facilitate left and right separation during early organogenesis. This is an interesting addition to the field of laterality, with relevance to laterality-related disorders including heterotaxia, and may represent a gut-specific mechanism for establishing and maintaining early left-right asymmetry. The structure of this midline barrier appears to be an atypical basement membrane, comprising two adjacent basement membranes. The complexities of basement membrane assembly, maintenance, and function are of importance in almost all organismal contexts. Double basement membranes have been previously reported (for example in the kidney glomeruli as the authors note), and increasing evidence suggests that atypical basement membrane organisation or consideration is likely to be more prevalent than previously appreciated. Thus this work is both novel and broadly interesting.

      The data presented are well executed, using a variety of well-established methods. The characterisation of the midline barrier at the stages examined is extensive, and the data around the correlation between the presence of the midline barrier and molecular diffusion or cell mixing across the midline are convincing.

      Weaknesses:

      The study is rather descriptive, and the authors' hypotheses around the origins of the midline barrier are speculative and not experimentally demonstrated. While several potential origins of the midline are excluded raising interesting questions about the timing and cell-type-specific origin of the midline basement membrane, these remain unanswered which limits the scope of the paper.

      We extend our appreciation to Reviewer #1 for their thoughtful and comprehensive evaluation of our work, recognizing the considerable time and effort they dedicated to it. We agree that functional data would significantly strengthen our understanding of the midline barrier and its exact role during LR asymmetric gut development. However, we would like to note that repeated and diligent attempts to perturb this barrier were made using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation), but we observed no significant effect or stable disruption of the midline. We acknowledge and accept this limitation and hope that our discovery will invite future investigations and perturbation of this novel midline structure.

      For example, it is unclear whether the two basement membranes originally appear to be part of a single circular/spherical structure (which looks possible from the images) that simply becomes elongated, or whether it is indeed initially two separate basement membranes that extend.

      We favor the hypothesis that the elongation of the preexisting small circular structure to an extended double membrane of relatively increased length would be unlikely without continued contribution of new basement membrane components. However, our attempts to label and trace the basement membrane of the endoderm using tagged laminins (LAMB1-GFP, LAMB1-His, and LAMC1-His), and more recently tagged nidogen constructs (NID1-GFP and NID1-mNG) have met with export issues (despite extensive collaboration with experts, Drs. Dave Sherwood and Peter Yurchenco). As such, it remains difficult to differentiate between the two possibilities suggested. We also believe this is an important question and will continue to investigate methods to trace it.

      There is a substantial gap between the BMs at earlier stages before the endoderm has descended - is this a lumen, or is it filled with interstitial matrix?

      Our preliminary studies indicate that the gap enclosed by the basement membranes in the early midline structure does have extracellular matrix present, such as fibrillin-2 (see Author response image 1). Also, the electron microscopy shown in Fig. 2 C’’ supports that the space between the notochord and endoderm has fibrillar matrix.

      Author response image 1.

      The authors show where this basement membrane does not originate from, but only speculate on its origin. Part of this reasoning is due to the lack of Lama1-expressing cells either in the early midline barrier before it extends, or in the DM cells adjacent to it. However, the Laminin observed in the midline could be comprised of a different alpha subtype for example, that wasn't assessed (it has been suggested that the Laminin antibody used in this study is not specific to the alpha-1 subunit, see e.g. Lunde et al, Brain Struct Funct, 2015).

      We appreciate this comment and have tried other laminin RNA probes that showed a similar lack of midline expression (Lama1, Lama3, Lama5). Importantly, the laminin alpha 1 subunit is a component of the laminin 111 heterotrimer, which along with laminin 511 is the first laminin to be expressed and assemble in embryonic basement membranes, as reviewed in Yurchenco 2011. Laminin 111 is particularly associated with embryonic development while laminins 511/521 become the most widespread in the adult (reviewed in Aumailley 2013). It is likely that the midline contains laminin 111 based on our antibody staining and the accepted importance and prevalence of laminin 111 in embryonic development. However, it is indeed worth noting that most laminin heterotrimers contain beta 1, gamma 1, or both subunits, and due to this immunological relation laminin antibody cross reactivity is certainly known (Aumailley 2013). As such, while laminin 511 remains a possibility as a component of the midline BM, our Lama5 in situs have shown no differential expression at the midline of the dorsal mesentery (see Author response image 2), and we are therefore confident that our finding of no local laminin transcription is accurate. Additionally, we will note that the study referenced by the Reviewer observed cross reactivity between the alpha 1 and alpha 2 subunits. Laminin 211/221 is an unlikely candidate based on the embryonic context, and because they are primarily associated with muscle basement membranes (Aumailley 2013). In further support, we recently conducted a preliminary transcriptional profile analysis of midline cells isolated through laser capture microdissection (LCM), which revealed no differential expression of any laminin subunit at the midline. Please note that these data will be included as part of a follow-up story and fall beyond the scope of our initial characterization.

      Author response image 2.

      Similarly, the authors show that the midline barrier breaks down, and speculate that this is due to the activity of e.g. matrix metalloproteinases, but don't assess MMP expression in that region.

      This is an important point, as the breakdown of the midline is unusually rapid. Our RNA in situ hybridizations for MMP2 at HH21 and for ADAMTS1 (and TS9) at HH19-21 indicate no differential expression at the midline (see Author response images 3 and 4). Our future focus will be on identifying a potential protease that exhibits differential activity at the midline of the DM.

      Author response image 3.

      Author response image 4.

      The authors suggest the (plausible) hypothesis that the descent of the endoderm pulls or stretches the midline barrier out from its position adjacent to the notochord. This is an interesting possibility, but there is no experimental evidence to directly support this. Similarly, while the data supporting the barrier function of this midline is good, there is no analysis of the impact of midline/basement membrane disruption demonstrating that it is required for asymmetric gut morphogenesis. A more functional approach to investigating the origins and role of this novel midline barrier would strengthen the study.

      Yes, we fully agree that incorporating functional data would immensely advance our understanding of the midline barrier and its crucial role in left-right gut asymmetry. However, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. We acknowledge this limitation and remain committed to developing methods to disrupt the midline in our current investigations. We again thank Reviewer #1 for the detailed feedback on our manuscript, guidance, and the time taken to provide these comments.

      Recommendations For The Authors:

      Using Laminin subunit-specific antibodies, or exploring the mRNA expression of more laminin subunits may support the argument that the midline does not derive from the notochord, endoderm, or DM.

      As mentioned above, RNA in situ hybridization for candidate genes and a preliminary RNA-seq analysis of cells isolated from the dorsal mesentery midline revealed no differential expression of any laminin subunits.

      Similarly, expression analysis of Laminin-degrading MMPs, and/or application of an MMP inhibitor and assessment of midline integrity could strengthen the authors' hypothesis that the BM is actively and specifically broken down.

      Our RNA in situ hybridizations for MMP2 at HH21 and for ADAMTS1 at HH19-21 show no differential expression pattern at the midline of the DM (see Author response image 3). We have not included these data in the revision, but future work on this topic will aim at identifying a protease that is differentially active at the midline of the DM.

      Functionally testing the role of barrier formation in regulating left-right asymmetry or the role of endoderm descent in elongating the midline barrier would be beneficial. Regarding the former, the authors show that Netrin4 overexpression is insufficient to disrupt the midline, but perhaps overexpression of e.g. MMP9 prior to descent of the endoderm would facilitate early degradation of the midline, and the impact of this on gut rotation could be assessed.

      Unfortunately, MMP9 electroporation has produced little appreciable effect. We acknowledge that the lack of direct evidence for the midline’s role in regulating left-right asymmetry is a shortcoming, but current work on this subject aims to define the midline’s function in LR asymmetric morphogenesis.

      Reviewer #2:

      When the left-right asymmetry of an animal body is established, the barrier that prevents the mixing of signals or cells across the midline is essential. The midline barrier that prevents the mixing of asymmetric signals during the patterning step has been identified. However, a midline barrier that separates both sides during asymmetric organogenesis is unknown. In this study, the authors discovered the cellular structure that seems to correspond to the midline in the developing midgut. This midline structure is transient, present at the stage when the barrier would be required, and composed of Laminin-positive membrane. Stage-dependent diffusion of dextran across the midline (Figure 6) coincides with the presence or absence of the structure (Figures 2, 3). These lines of indirect evidence suggest that this structure most likely functions as the midline barrier in the developing gut.

      We extend our gratitude to Reviewer #2 for their thoughtful assessment of our research and for taking the time to provide these constructive comments. We are excited to report that we have now included additional new data on midline diffusion, using BODIPY and a quantification method, to further support our findings on the midline's barrier function. While our data on dextran and now BODIPY both indirectly suggest barrier function, we aspire to perturb the midline directly to assess its role in the dorsal mesentery more conclusively. However, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. Moving forward, our focus is on identifying an effective means of perturbation that can offer direct evidence of barrier function.

      Recommendations For The Authors:

      (1) It would be much nicer if the requirement of this structure for asymmetric morphogenesis was directly tested. However, experimental manipulations such as ectopic expression of Netrin4 or transplantation of the notochord were not able to influence the formation of this structure (these results, however, suggested the mechanism of the midline formation in the gut dorsal mesentery). Therefore, it seems not feasible to directly test the function of the structure, and this should be the next issue.

      We fully agree that the midline will need to be perturbed to fully elucidate its role in asymmetric gut morphogenesis. As noted, multiple attempts were ineffective at perturbing this structure. Extensive current work on this topic is dedicated to finding an effective perturbation method.

      (2) Whereas Laminin protein was present in the double basement membrane at the midline, Laminin mRNA was not expressed in the corresponding region (Fig. 4A-C). It is necessary to discuss (with experimental evidence if available) the origin of Laminin protein.

      As we have noted, the source of laminin and basement membrane components for the midline remains unclear. The absence of local transcription and the insufficiency of the notochord to produce a midline point to the endoderm as a likely source of laminin, as we have proposed in our zippering endoderm model. We will note that Fig. 4A-C indicates that laminin is in fact actively transcribed in the endoderm. Currently, attempts to trace the endodermal basement membrane using tagged laminins (LAMB1-GFP, LAMB1-His, and LAMC1-His), and more recently tagged nidogen constructs (NID1-GFP and NID1-mNG) have met with export issues (despite extensive collaboration with experts, Drs. Dave Sherwood and Peter Yurchenco). Confirmation of our proposed endodermal origin model is a goal of our ongoing work.

      (3) Figure 4 (cell polarity from GM130 staining): addition of representative GM130 staining images for each Rose graph (Figure 4E) would help. They can be shown in Supplementary Figures. Also, a graph for the right coelomic epithelium in Fig. 4E would be informative.

      We have added the requested GM130 images in our Supplemental Figures (please refer to Fig. S4ABB’) and modified the main Fig. 4E to include a rose graph for the polarity of the right coelomic epithelium.

      (4) Histological image of HH19 DM shown in Fig. 2J looks somehow different from that shown in Fig. 3F. Does Fig. 2J represent a slightly earlier stage than Fig. 3F?

      Figure 2J and Figure 3F depict a similar stage, although the slight variation in the length of the dorsal mesentery is attributed to the pseudo time phenomenon illustrated in Figure 3J-J’’’. This implies that the sections in Figure 2J and Figure 3F might originate from slightly different positions along the anteroposterior axis. Nonetheless, these distinctions are minimal, and based on the dorsal mesentery's length in Figure 2J, the midline is likely extremely robust regardless of this minor pseudo time difference.

      Reviewer #3:

      Summary:

      The authors report the presence of a previously unidentified atypical double basement membrane (BM) at the midline of the dorsal mesentery (DM) during the establishment of left-right (LR) asymmetry. The authors suggest that this BM functions as a physical barrier between the left and the right sides of the DM preventing cell mixing and ligand diffusion, thereby establishing LR asymmetry.

      Strengths:

      The observation of the various components in the BM at the DM midline is clear and convincing. The pieces of evidence ruling out the roles of DM and the notochord in the origin of this BM are also convincing. The representation of the figures and the writing is clear.

      Weaknesses:

      The paper's main and most important weakness is that it lacks direct evidence for the midline BM's barrier and DM LR asymmetry functions.

      We thank Reviewer #3 for their thoughtful and comprehensive evaluation of our work, recognizing the considerable time and effort they dedicated to assessing our study. We fully agree that incorporating functional data would immensely advance our understanding of the midline barrier and its crucial role in left-right gut asymmetry. However, several distinct attempts at perturbing this barrier have encountered technical obstacles. While our laboratory routinely perturbs the left and right compartments of the DM via DNA electroporation and other techniques, directly perturbing the midline using these methods is far more challenging. We have made diligent attempts to address this using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). However, we have not yet been able to identify a means of producing consistent and interpretable perturbation of the midline. We acknowledge this limitation and remain committed to developing methods to disrupt the midline in our current investigations.

      Recommendations For The Authors:

      Major:

      (1) We suggest the authors test their hypotheses i.e., physical barrier and proper LR asymmetry establishment by the midline BM, by disrupting it using techniques such as physical ablation, over-expression of MMPs, or treatment with commercially available enzymes that digest the BM.

      As above, efforts involving physical ablation and MMP overexpression have not yielded significant effects on the midline thus far. Moving forward, investigating the midline's role in asymmetric morphogenesis will necessitate finding a method to perturb it effectively. In pursuit of progress on this critical question, we recently conducted laser capture microdissection (LCM) and RNA-sequencing of the midline to unravel the mechanisms underlying its formation and potential disruption. This work shows promise but it is still in its early stages; validating it will require significant time and effort, and it falls outside the scope of the current manuscript.

      (2) Lefty1's role in the midline BM was ruled out by correlating lack of expression of the gene at the midline during HH19 when BM proteins expression was observed. Lefty1 may still indirectly or directly trigger the expression of these BM proteins at earlier stages. The only way to test this is by inhibiting lefty1 expression and examining the effect on BM protein localization.

      We have added a section to discuss the potential of Lefty1 inhibition as a future direction. However, similar to perturbing global Nodal expression, interpreting the results of Lefty1 inhibition could be challenging. This is because it may not specifically target the midline but could affect vertebrate laterality as a whole. Despite this complexity, we acknowledge the value of such an experiment and consider it worth pursuing in the future.

      (3) Using a small dextran-based assay, the authors conclude that diffusible ligands such as cxcl12 and bmp4 do not diffuse across the midline (Figure 6). However, dextran injection in this system seems to label the cells, not the extracellular space. The authors measure diffusion, or the lack thereof, by counting the proportion of dextran-labeled cells rather than dextran intensity itself. Therefore, this result shows a lack of cell mixing across the midline (already shown in Figure 2) rather than a lack of diffusion.

      We should emphasize that the dextran-injected embryos shown in Fig. 6 D-F were isolated two hours post-injection, a timeframe insufficient for cell migration to occur across the DM (Mahadevan et al., 2014). We also collected additional post-midline stage embryos ten minutes after dextran injections - too short a timeframe for significant cellular migration (Mahadevan et al., 2014). Importantly, the fluorescent signal in those embryos was comparable to that observed in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM when the barrier starts to fragment (HH20-HH23) is unlikely to represent cell migration. More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated substantial cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Collectively, our experiments suggest that the dextran signal we observed at HH20 and HH23 is likely not driven by cell mixing.

      To further strengthen this argument, we now have additional new data on midline diffusion, using a BODIPY-based diffusion assay and quantification method, to support our findings on the midline's function against diffusion (please refer to New Fig. 6H-M). Briefly, we utilized a BODIPY-tagged version of AMD3100 (Poty et al., 2015) delivered via soaked resin beads surgically inserted into the left coelomic cavity (precursor to the DM). The ratio of average AMD3100-BODIPY intensity in the right DM versus the left DM was below 0.5 when the midline was intact (HH19), indicating little diffusion across the DM (Fig. 6J). At HH21, when no midline remains, this ratio rises significantly to near one, indicating that diffusion of the drug is not impeded when the midline basement membrane structure is absent. Collectively, these data suggest that the basement membrane structure at the midline forms a transient functional barrier against diffusion.
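      The right-versus-left intensity ratio described above can be illustrated with a minimal sketch. It assumes a single-channel fluorescence image and two boolean masks marking the left and right DM; the function name, the mask layout, and the toy pixel values are hypothetical and are not taken from the analysis pipeline used for Fig. 6.

```python
import numpy as np

def right_left_intensity_ratio(image, left_mask, right_mask):
    """Mean intensity in the right DM divided by mean intensity in the left DM.

    `image` is a 2D array of fluorescence intensities; the masks are boolean
    arrays of the same shape (hypothetical regions of interest).
    """
    image = np.asarray(image, dtype=float)
    left_mean = image[left_mask].mean()
    right_mean = image[right_mask].mean()
    if left_mean == 0:
        raise ValueError("left DM has zero mean intensity; ratio is undefined")
    return right_mean / left_mean

# Toy 4x4 "sections": left half = columns 0-1, right half = columns 2-3.
left_mask = np.zeros((4, 4), dtype=bool)
left_mask[:, :2] = True
right_mask = ~left_mask

# Hypothetical intact-midline case: signal stays mostly on the bead (left) side.
intact = np.full((4, 4), 2.0)
intact[:, :2] = 10.0

# Hypothetical no-midline case: signal is evenly distributed.
absent = np.full((4, 4), 8.0)

ratio_intact = right_left_intensity_ratio(intact, left_mask, right_mask)
ratio_absent = right_left_intensity_ratio(absent, left_mask, right_mask)
```

      In this toy example, a ratio well below one corresponds to signal retained on the bead side, and a ratio near one corresponds to unimpeded spread across the DM.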

      (4) Moreover, in a previous study (Mahadevan et al., Dev Cell., 2014), cxcl12 and bmp4 expression was observed on both the left and right side before gut closure (HH17, when midline BM is observed). Then their expression patterns were restricted on the left or right side of DM at around HH19-20 (when midline BM is dissociated). The authors must explain how the midline BM can act as a barrier against diffusible signals at HH17 to 19, where diffusible signals (cxcl12 and bmp4) were localized on both sides.

      We appreciate the Reviewer's invitation to clarify this crucial point. Early in dorsal mesentery (DM) formation, genes like Cxcl12 (Mahadevan et al., Dev Cell 2014) and Bmp4 (Sanketi et al., Science 2021) exhibit symmetry before Pitx2 expression initiates on the left (around HH18, Sanketi et al., 2021). Pitx2 then inhibits BMP4 (transcription) and maintains Cxcl12 (mRNA) expression on the left side. The loss of Cxcl12 mRNA on the right is due to the extracellular matrix (ECM), particularly hyaluronan (Sivakumar et al., Dev Cell 2018). Our hypothesis is that during these stages of initial DM asymmetry establishment, the midline serves as a physical barrier against protein diffusion to protect this asymmetry during a critical period of symmetry breaking. Although some genes, such as Pitx2 and Cxcl12, continue to display asymmetric transcription after midline dissolution (Cxcl12 becomes very dynamic later on; see Mahadevan), it's crucial to note that the midline's primary role is preventing protein diffusion across it, akin to an insurance policy. Thus, the absence of the midline barrier at HH21 does not result in the loss of asymmetric mRNA expression. We acknowledge that confirming this hypothesis will necessitate experimental disruption of the midline and observing the consequent effects on asymmetry in the DM. This remains central to our ongoing research on this subject.

      (5) On page 11, lines 15-17, the authors mention that "We know that experimentally mixing left and right signals is detrimental to gut tilting and vascular patterning-for example, ectopic expression of pro-angiogenic Cxcl12 on the right-side results in an aberrant vessel forming on the right (Mahadevan et al., Dev Cell., 2014)". In this previous report from the authors' laboratory, the authors suggested that ectopic expression of cxcl12 on the right side induced aberrant formation of the vessel on the right side, which was formed from stage HH17, and the authors also suggested that the vessel originated from left-sided endothelial cells. If the midline BM acts as a barrier against the diffusible signal, how can the left-sided endothelial cells contribute to vessel formation at HH17 (before midline BM dissociation)?

      To address this point, we suggest directing the Reviewer to previously published supplemental movies of time-lapse imaging, which clearly illustrate the migration path of endothelial cells from left to right DM (Mahadevan et al., Dev Cell 2014). While the Reviewer correctly notes that ectopic induction of Cxcl12 on the right induces left-to-right migration, it's crucial to highlight that these cells never cross the midline. Instead, they migrate immediately adjacent to the tip of the endoderm (please also refer to published Movies S2 and S3). We observe this migration pattern even in wild-type scenarios during the loss of the endogenous right-sided endothelial cords, where some endothelial cells from the right begin slipping over to the left around HH19-20 (over the endoderm), as the midline is beginning to fragment, but never traverse the midline. We attribute this migration pattern to a dorsal-to-ventral gradient of left-sided Cxcl12 expression, as disrupting this pattern perturbs the migration trajectory (Mahadevan).

      (6) It is unclear how continuous is the midline BM across the anterior-posterior axis across the relevant stages. Relatedly, it is unclear how LR segregated the cells are, across the anterior-posterior axis across the relevant stages.

      We refer the reviewer to Fig. 3J-K, in which the linear elongation of the midline basement membrane structure is shown and measured at HH19 in three embryos from the posterior of the embryo to the anterior point at which the midline is fragmented and ceases to be continuous. Similarly, Fig. S2 shows the same phenomenon in serial sections along the length of the anterior-posterior (AP) axis at HH17, also showing the continuity of the midline. All our past work at all observed sections of the AP axis has shown that cells do not move across the midline, as indicated by electroporation of DNA encoding fluorescent reporters (Davis et al. 2008, Kurpios et al. 2008, Welsh et al. 2013, Mahadevan et al. 2014, Sivakumar et al. 2018, Sanketi et al. 2022), and this is shown again in Fig. 2 E-H. As noted previously, very few endothelial cells cross the midline at a point just above the endoderm (image above) when the right endothelial cord remodels (Mahadevan et al. 2014), but this phenomenon is limited to endothelial cells, and cells of the left and right DM are fully segregated as previously established.

      Minor comments:

      (1) The authors found that left and right-side cells were not mixed with each other even after the dissociation of the DM midline at HH21 (Fig2 H). And the authors also previously mentioned that N-cadherin contributes to cell sorting for left-right DM segregation (Kurpios et al., Proc Natl Acad Sci USA., 2008). It could be a part of the discussion about the difference in tissue segregation systems before or after the dissociation of DM midline.

      We appreciate this thoughtful suggestion. N-cadherin mediated cell sorting is key to the LR asymmetry of the DM and gut tilting, and we believe it underlies the observed lack of cell mixing from left and right DM compartments after the midline fragments. We have added a brief section to the discussion concerning the asymmetries in N-cadherin expression that develop after the midline fragments.

      (2) Please add the time point on the images (Fig3 C, D, Fig 6A and B)

      We have updated these figures to provide the requested stage information.

      (3) The authors suggested that the endoderm might be responsible for making the DM BM midline because the endoderm links to DM midlines and has the same resistance to NTN4. The authors mentioned that the midline and endoderm might have basement membranes of the same "flavor." However, perlecan was strongly expressed in the midline BM compared with the endodermal BM. It could be a part of the discussion about the difference in the properties of the BM between the endoderm and DM midline.

      Perlecan does indeed localize strongly to the endoderm as well as the midline. The HH18 image included in prior Fig. S3 B’, B’’ appears to show atypically low antibody staining in the endoderm for all membrane components. Perlecan is an important component for general basement membrane assembly, and the bulk of our HH18 and HH19 images indicate strong staining for perlecan in both midline and endoderm. Perlecan staining at the very earliest stages of midline formation also indicates perlecan in the endoderm, supporting the endoderm as a potential source for the midline basement membrane. We have updated Fig. S3 to include these images in our revision.

      (4) The authors investigated whether the midline BM originates from the notochord or endoderm, but did not examine a role for endothelial cells and pericytes surrounding the dorsal aorta (DA). In Fig S1, Fig S2, and FigS3, the authors showed that DA is very close to the DM midline basement membrane, so it is worth checking their roles.

      We fully agree that the dorsal aorta and the endothelial cords that originate from the dorsal aorta may interact with the midline in important ways. However, accessing the dorsal aorta for electroporation or other perturbation is extremely difficult. Additionally, the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. Vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in DiRusso et al., 2017). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Additionally, no fibronectin is found in the midline basement membrane, while it is enriched in the dorsal aorta (see Supplemental Figure 3CC’C’’). We will briefly note that our preliminary data in quail tissue indicates that QH1+ cord cells (i.e. endothelial cells) sometimes exhibit striking contact with the midline along the dorso-ventral length of the DM, suggesting not an origin but an important interaction.

      Reviewer #4 (Recommendations For The Authors):

      Major comments:

      (1) The descending endoderm zippering model for the formation of the midline lacks evidence.

      We have attempted to address this issue by introducing several tagged laminin constructs (LAMB1-GFP, LAMB1-His, LAMC1-His), and more recently tagged nidogen plasmids (NID1-GFP and NID1-mNG), to the endoderm via DNA electroporation to try to label the source of the basement membrane. Production of the tagged components occurred but no export was observed in any case (despite extensive collaboration with experts in this area, Drs. Dave Sherwood and Peter Yurchenco). This experiment was further complicated by the necessarily large size of these constructs (10-11 kb, owing to the size of laminin subunit genes), which resulted in low electroporation efficiency. We also believe this is an important question and are continuing to investigate methods to trace it.

      The midline may be Ntn4 resistant until it is injected in the source cells.

      Ntn4 has been shown to disrupt both assembling and existing basement membranes (Reuten et al. 2016). Thus, we feel that the midline and endodermal basement membranes’ resistance to degradation is not determined by stage of assembly or location of secretion.

      Have you considered an alternative origin from the bilateral dorsal aorta or the paraxial mesoderm, which would explain the double layer as a meeting of two lateral tissues? The left and right paraxial mesoderm seem to abut in Fig. S1B-C and S2E, and is laminin-positive in Fig 4A'. What are the cells present at the midline (Fig.4D-E)? Are they negative for the coelomic tracing, paraxial or aortic markers?

      We fully agree that alternate origins of the midline basement membrane cannot be ruled out from our existing data, and we have considered the dorsal aorta and even the endothelial cords that originate from it. However, accessing the dorsal aorta for electroporation or other perturbation is extremely difficult. Importantly, the basement membrane of vascular endothelial cells has a distinct composition from a non-vascular basement membrane. Vascular endothelial cells produce only alpha 4 and alpha 5 laminin subunits but contain no alpha 1 subunit in any known species (reviewed in Hallmann et al. 2005). Thus, endothelial cell-derived basement membranes would not contain the alpha 1 laminin subunit that we used in our studies as a robust marker of the midline basement membrane. Note in Fig. 3 E-H that our laminin alpha 1 antibody staining does not label the aortae. Additionally, no fibronectin is found in the midline basement membrane, while it is enriched in the dorsal aorta (see Supplemental Figure 3CC’C’’). We will briefly note that our preliminary data in quail tissue indicate that QH1+ cord cells (i.e. endothelial cells) sometimes exhibit striking contact with the midline along the dorso-ventral length of the DM, suggesting not an origin but an important interaction. Moreover, at the earliest stages of midline basement membrane emergence, the dorsal aortae are distant from the nascent basement membrane, as are the somites, which have not yet undergone any epithelial to mesenchymal transition. Fig. S2G provides an example of an extremely early midline basement membrane without dorsal aorta or somite contact. Fig. S2G is from a fairly posterior section of the embryo; it is thus less developed in pseudo-time and gives a window on midline formation in very early embryos.

      (2) The importance of the midline is inferred from previously published data and stage correlations but will require more direct evidence. Can the midline be manipulated with Hh signaling or MMPs?

      We agree that direct evidence in the form of midline perturbation will be critically required. As previously noted, our numerous efforts to perturb this barrier have encountered technical obstacles. For instance, while perturbing the left and right compartments of the DM is a routine and well-established procedure in our laboratory, accessing the midline directly through similar approaches has been far more challenging. We have made several attempts to address this hurdle using various strategies, such as in vivo laser ablation, diphtheria toxin, molecular disruption (Netrin 4), and enzymatic digestion (MMP2 and MMP9 electroporation). Despite employing diverse approaches, we have yet to achieve effective and interpretable perturbation of this resilient structure. Targeting Hh signaling between the endoderm and notochord is a good idea and we will continue these efforts. Thanks very much.

      Minor comments:

      - Please add the species in the title.

      We have altered the title as follows: “An atypical basement membrane forms a midline barrier during left-right asymmetric gut development in the chicken embryo.”

      - The number of observations in Fig2, Fig3A-B, 4A-C, G-H, S1, S3 is lacking.

      We have added the requested n numbers of biological replicates to the legends of the specified figures.

      - Please annotate Fig 3J to show what is measured in K.

      We have modified Fig. 3J to include a dashed bar indicating the length measurements in Fig. 3K.

      - Please provide illustrations of Fig 4E.

      We have added a representative image of GM130 staining to the supplement.

      - If laminin gamma is the target of Ntn4, its staining would help interpret the results of Ntn4 manipulation. Is laminin gamma present in different proportions in the different types of basement membranes, underlying variations in sensitivity?

      Laminin is exported as a heterotrimer consisting of an alpha, beta, and gamma subunit. Laminin gamma is therefore present in equal proportions to other laminins in all basement membranes with a laminin network. Several gamma isoforms do exist, but only laminin gamma 1 will bind to laminin alpha 1, which we use throughout this paper to mark the midline as well as nearby basement membranes that are sensitive to Ntn4 disruption. Thus, gamma laminin proportions or isoforms are unlikely to underlie the resistance of the midline and endodermal basement membranes to Ntn4 (reviewed in Yurchenco 2011).

      - Please comment: what is the red outline abutting the electroporated DM on the left of Fig5B?

      The noted structure is the basement membrane of the nephric duct – we added this information to Fig. 5B image and legend.

      - The stage in Fig 6A-B is lacking.

      We have added the requested stage information to Fig. 6.

      - Please comment on whether there is or is not some cell mixing Fig 2H, at HH21 after the midline disappearance. Is it consistent with Fig. 6E-F which labels cells?

      More than a decade of DNA electroporation experiments of the left vs. right DM by our laboratory and others have never indicated dorsal mesentery cell migration across the midline (Davis et al., 2008; Kurpios et al., 2008; Welsh et al., 2013; Mahadevan et al., 2014; Arraf et al. 2016; Sivakumar et al., 2018; Arraf et al. 2020; and Sanketi et al., 2022). This is also shown in our current GFP/RFP double electroporation data in Fig. 2 G-H, and DiI/DiO labeling data in Fig. 2 E-G. Cell mixing does not occur even after midline disappearance, most likely due to asymmetric N-cadherin expression on the left side of the DM (Kurpios et al., 2008). The sparse, green-labeled cells observed on the right side in Fig. 2H are likely a result of DNA electroporation - the accuracy of this process relies on the precise injection of the left (or right) coelomic cavity (precursor to the gut mesenchyme including the DM) and subsequent correct placement of the platinum electrodes.

      Based on these data, we strongly feel that cellular migration is not responsible for the pattern of dextran observed in Fig. 6E-F, especially in light of the N-cadherin mediated segregation of left and right. We will also note that there is no significant difference between dextran diffusion at HH19 and HH20, only a trend towards significance. Additionally, we would like to note that the dextran-injected embryos were isolated two hours post-injection, which we do not believe is sufficient time for any cell migration to occur across the DM. We also collected additional post-midline stage embryos ten minutes after dextran injections (data not shown), too short a timeframe for significant cellular migration, and the fluorescent signal in those embryos was comparable to that represented in the embryos in Fig. 6. Thus, we believe the movement of fluorescent signal across the DM observed when the barrier starts to fragment at HH20 and HH23 is unlikely to represent movement of cells.

      To further strengthen this argument, we now have additional new data on midline diffusion, using a BODIPY-tagged compound and a quantification method, to support our findings on the midline's function against diffusion (please refer to New Fig. 6H-M). Briefly, we utilized a BODIPY-tagged version of AMD3100 (Poty et al., 2015) delivered via soaked resin beads surgically inserted into the left coelomic cavity (precursor to the DM). The ratio of average AMD3100-BODIPY intensity in the right DM versus the left DM was below 0.5 when the midline was intact (HH19), indicating little diffusion across the DM (Fig. 6J). At HH21, when no midline remains, this ratio rose significantly to near one, indicating that diffusion of the drug is not impeded when the midline basement membrane structure is absent. Collectively, these data suggest that the basement membrane structure at the midline forms a transient functional barrier against diffusion.
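      The right-to-left intensity ratio described above can be illustrated with a minimal sketch. This is not the authors' quantification pipeline: the function name and the intensity values are hypothetical placeholders, and the only assumption taken from the text is that the ratio is the average right-DM intensity divided by the average left-DM intensity.

```python
from statistics import mean


def right_to_left_ratio(left_intensities, right_intensities):
    """Ratio of mean right-side to mean left-side fluorescence intensity."""
    left = mean(left_intensities)
    right = mean(right_intensities)
    if left <= 0:
        raise ValueError("left-side mean intensity must be positive")
    return right / left


# Hypothetical intensities in arbitrary units (illustrative, not measured data).
left_side = [100, 120, 110]
intact = right_to_left_ratio(left_side, [30, 40, 35])    # barrier present
absent = right_to_left_ratio(left_side, [95, 115, 105])  # barrier gone

print(round(intact, 3), round(absent, 3))
```

      With these placeholder values the first ratio falls below 0.5 and the second approaches one, which mirrors the qualitative pattern reported for HH19 versus HH21.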

      - 'independent of Lefty1': rephrase or show the midline phenotype after lefty1 inactivation.

      We agree with this comment and have rephrased this section to indicate the midline is present “at a stage when Lefty1 is no longer expressed at the midline.”

      We again would like to extend our sincere gratitude to our reviewers and the editors at eLife for their dedicated time and thorough evaluation of our paper. Their meticulous attention to detail and valuable insights have strengthened our data and provided further support for our findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      In this manuscript, Zhou et al describe a deaminase and reader protein-assisted RNA m5C sequencing method. The general strategy is similar to DART-seq for m6A sequencing, but the difference is that in DART-seq, m6A sites are always followed by C which can be deaminated by fused APOBEC1 to provide a high resolution of m6A sites, while in the case of m5C, no such obvious conserved motifs for m5C sites exist, therefore, the detection resolution is much lower. In addition, the authors used two known m5C binding proteins ALYREF and YBX1 to guide the fused deaminases, but it is not clear whether these two binding proteins can bind most m5C sites and compete with other m5C binding proteins.

      Thank you for your kind suggestion. RNA affinity chromatography and mass spectrometry analyses using biotin-labelled oligonucleotides with or without m5C were performed in previous reports (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y), and the results showed that ALYREF and YBX1 had a more prominent binding ability to m5C-modified oligonucleotides. Moreover, these two m5C-binding proteins are also responsible for mRNA m5C binding, so we chose to use their ability to bind targeted m5C to construct a DRAM detection system in anticipation of transcriptome-wide m5C detection. We hope to propose a suitable detection strategy for RNA m5C, and there will certainly be room for optimization of the DRAM system in the future with more in-depth studies of m5C binding proteins. We have discussed the above issue in lines 75-82 and 315-318.

      It is well known that two highly modified m5C sites exist in 28S RNA and many m5C sites exist in tRNA, the authors should validate their methods first by detecting these known m5C sites and evaluate the possible false positives in rRNA and tRNA.

      Thank you for your kind suggestion. We attempted PCR amplification of sequences flanking m5C sites 3782 and 4447 on 28S rRNA, as well as multiple m5C sites on tRNA, including m5C48 and m5C49 on tRNAVal, m5C48 and m5C49 on tRNAAsp, and m5C48 on tRNALys.

      However, Sanger sequencing revealed no valid mutations, as shown in Figure S3. We believe this outcome indicates that the DRAM system is more suited for transcriptome-wide m5C detection of mRNAs. This is supported by current reports that ALYREF and YBX1 act as m5C-binding proteins for mRNAs (doi:10.1038/cr.2017.55 and doi: 10.1038/s41556-019-0361-y). The above results and descriptions were added to lines 136-143.

      In mRNA, it is not clear what is the overlap between the technical replicates. In Figures 4A and 4C, they detected more than 10K m5C sites, and most of them did not overlap with sites uncovered by other methods. These numbers are much larger than expected and possibly most of them are false positives.

      Thank you for your kind suggestion. We observed significant overlap between the technical repeats by comparing the data across biological repeats, as shown in Figure S4C and described in lines 174-175. We considered a region to carry an m5C modification only when editing events were detected in at least two biological replicates, ensuring a high-stringency screening process (for details, see the revised Methods in lines 448-455 and Figure 3F). With more in-depth research into m5C readers, we aim to achieve more accurate detection in the future.
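      The "at least two biological replicates" criterion can be sketched as a simple set-based filter. This is an illustrative sketch only, not the authors' screening code: the function name, the region identifiers, and the representation of each replicate as a set of region labels are all assumptions.

```python
from collections import Counter


def regions_in_min_replicates(replicates, min_replicates=2):
    """Return regions with editing events in at least `min_replicates` replicates."""
    counts = Counter()
    for regions in replicates:
        counts.update(set(regions))  # count each region once per replicate
    return {region for region, n in counts.items() if n >= min_replicates}


# Hypothetical region labels per biological replicate (placeholders).
rep1 = {"RPSA:r1", "AP5Z1:r2", "GENE_X:r3"}
rep2 = {"RPSA:r1", "GENE_Y:r4"}
rep3 = {"AP5Z1:r2", "RPSA:r1"}

kept = regions_in_min_replicates([rep1, rep2, rep3])
print(sorted(kept))
```

      Regions seen in only one replicate (here the placeholder `GENE_X:r3` and `GENE_Y:r4`) are dropped, which is the stringency the response describes.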

      Besides, it is not clear what is the detection sensitivity and accuracy since the method is neither single base resolution nor quantitative.

      Thank you for your suggestion. As shown in Figure 3G, we found that the editing window of the DRAM system exhibited enrichment of approximately 20 bp upstream and downstream of the m5C site. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these m5C motifs do not fully characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). This limitation complicates single-base resolution analysis by the DRAM system. Nevertheless, we believe that with further exploration of m5C sequence features, precise single-base resolution detection can be achieved in the future. This point is also discussed in lines 314-322.

      Regarding the quantitative level of the assay, we conducted additional experiments by progressively reducing the expression levels of the fusion proteins. Sanger sequencing revealed that the editing efficiency of A-to-G and C-to-U within the m5C region significantly decreased as fusion protein expression diminished (Figure S9). These findings suggest that the DRAM system's transfection efficiency is concentration-dependent and that the ratio of editing efficiency to transfection efficiency could aid in the quantitative analysis of m5C using the DRAM system. The relevant results were added to Figure S9 and discussed in lines 263-271.

      There are no experiments to show that the detected m5C sites are responsive to the writer proteins such as NSUN2 and NSUN6, and the determination of the motifs of these writer proteins.

      Thank you for your kind suggestion. We have performed a motif enrichment analysis based on the sequences spanning 10 nt upstream and downstream of DRAM-editing sites. The relevant results of this analysis were added to Figure S4D and lines 168-171. Unfortunately, we did not identify any clear sequence preferences for the m5C sites catalyzed by the methyltransferases NSUN2 and NSUN6, which have previously been associated with “G”-rich sequences and the “CUCCA” motif. This limitation is mainly due to the DRAM detection system’s inability to achieve single-base resolution for m5C detection, which is also explained in the above response.

      Reviewer #2:

      (1) The use of two m5C reader proteins is likely a reason for the high number of edits introduced by the DRAM-Seq method. Both ALYREF and YBX1 are ubiquitous proteins with multiple roles in RNA metabolism including splicing and mRNA export. It is reasonable to assume that both ALYREF and YBX1 bind to many mRNAs that do not contain m5C.

      To substantiate the author's claim that ALYREF or YBX1 binds m5C-modified RNAs to an extent that would allow distinguishing its binding to non-modified RNAs from binding to m5C-modified RNAs, it would be recommended to provide data on the affinity of these, supposedly proven, m5C readers to non-modified versus m5C-modified RNAs. To do so, this reviewer suggests performing experiments as described in Slama et al., 2020 (doi: 10.1016/j.ymeth.2018.10.020). However, using dot blots like in so many published studies to show modification of a specific antibody or protein binding, is insufficient as an argument because no antibody, nor protein, encounters nanograms to micrograms of a specific RNA identity in a cell. This issue remains a major caveat in all studies using so-called RNA modification reader proteins as bait for detecting RNA modifications in epitranscriptomics research. It becomes a pertinent problem if used as a platform for base editing similar to the work presented in this manuscript.

      We thank the reviewer for the valuable suggestion. Previous studies have shown that while ALYREF and YBX1 can bind mRNAs without the m5C modification, their binding affinity for m5C-modified oligonucleotides is significantly higher than for unmethylated controls. This has been demonstrated through experiments such as in vitro tractography, electrophoretic mobility shift assay (EMSA) (doi:10.1038/cr.2017.55), and UHPLC-MRM-MS/MS. Additionally, isothermal titration calorimetry measurements and PAR-CLIP experiments have shown that mutations in the key amino acids responsible for m5C binding in ALYREF and YBX1 result in a significant reduction in their ability to bind m5C (doi: 10.1038/s41556-019-0361-y).

      Although Me-RIP analysis was unsuccessful in our laboratory, likely due to the poor specificity of the m5C antibody, we alternatively performed RNA pulldown experiments. These experiments verified that the ability of DRAMmut-expressing proteins to bind RNA with m5C modification was virtually absent compared to DRAM-expressing proteins, while their binding ability with non-modified RNA was not significantly affected. The relevant RNA pulldown results were added to Figures S1E and S1F and lines 110-111. Therefore, we believe that by integrating the DRAMmut group, our DRAM system could effectively exclude the false-positive mutations caused by unspecific binding of DRAM’s reader protein to non-m5C-modified mRNAs.

      (2) Since the authors use a system that results in transient overexpression of base editor fusion proteins, they might introduce advantageous binding of these proteins to RNAs. It is unclear, which promotor is driving construct expression but it stands to reason that part of the data is based on artifacts caused by overexpression. Could the authors attempt testing whether manipulating expression levels of these fusion proteins results in different editing levels at the same RNA substrate?

      Thank you for pointing this out. To investigate how different expression levels of these proteins influence A-to-G and C-to-U editing within the same m5C region, we conducted a gradient transfection using plasmid amounts of 1500 ng, 750 ng and 300 ng. This approach allowed us to progressively reduce the expression levels of the fusion proteins. Sanger sequencing revealed that the editing efficiency of A-to-G and C-to-U within the m5C region significantly decreased as fusion protein expression diminished. These findings suggest that the transfection efficiency of the DRAM system is concentration-dependent and that the ratio of editing efficiency to transfection efficiency may assist in the quantitative analysis of m5C using the DRAM system. The relevant results and hypotheses were added to Figure S9 and discussed in lines 263-271 of the revised manuscript.

      (3) Using sodium arsenite treatment of cells as a means to change the m5C status of transcripts through the downregulation of the two major m5C writer proteins NSUN2 and NSUN6 is problematic and the conclusions from these experiments are not warranted. Sodium arsenite is a chemical that poisons every protein containing thiol groups. Not only do NSUN proteins contain cysteines but also the base editor fusion proteins. Arsenite will inactivate these proteins, hence the editing frequency will drop, as observed in the experiments shown in Figure 5, which the authors explain with fewer m5C sites to be detected by the fusion proteins.

      Thank you for pointing this out. We used bisulfite sequencing PCR to determine that the m5C levels in RPSA and AP5Z1 were significantly reduced after sodium arsenite treatment. This was followed by a significant decrease in editing frequency detected by the DRAM system in sodium arsenite-treated samples compared to untreated samples. This reduction aligns with the decreased editing efficiency observed in methyltransferase-deficient cells (as shown in Figures 2G and 2H), which initially convinced us that these results reflected the DRAM system's ability to monitor dynamic changes in m5C levels.

      However, as the reviewer pointed out, sodium arsenite treatment could potentially inactivate the fusion proteins, leading to the observed reduction in editing efficiency. This possibility has not been conclusively ruled out in our current experiments. Optimizing this validation may require the future development of more specific m5C inhibitors. In light of this, we have revised our previous results and conclusions in lines 235-244, and discussed these points in lines 308-315.

      (4) The authors should move high-confidence editing site data contained in Supplementary Tables 2 and 3 into one of the main Figures to substantiate what is discussed in Figure 4A. However, the data needs to be visualized in another way than an Excel format. Furthermore, Supplementary Table 2 does not contain a description of the columns, while Supplementary Table 3 contains a single row with letters and numbers.

      Thank you for your kind suggestion. We have visualized the data from Supplementary Tables 2 and 3 in Figure 3F, presenting it as a screening flowchart for high-confidence editing sites. In Supplementary Table 3, we have displayed only the DRAM-mutated genes, which is why it contains a single row with letters and numbers. As requested, we have included descriptions of each column and reorganized Supplementary Tables 2 and 3 accordingly.

      (5) The authors state that "plotting the distribution of DRAM-seq editing sites in mRNA segments (5'UTR, CDS, and 3'UTR) highlighted a significant enrichment near the initiation codon (Figure 3F).", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. We have replaced the expression "near the initiation codon" with "in the CDS" in lines 192-193.

      (6) The authors state that "In contrast, cells expressing the deaminase exhibited a distinct distribution pattern of editing sites, characterized by a prevalence throughout the 5'UTR.", which is not true when this reviewer looks at Figure 3F.

      Thank you for your kind suggestion. This distribution was actually characterized by a prevalence throughout the "3'UTR", but not "5'UTR". We have also made the necessary changes in lines 193-195.

      (7) The authors claim in the final conclusion: "In summary, we developed a novel deaminase and reader protein assisted RNA m5C methylation approach...", which is not what the method entails. The authors deaminate As or Us close to 5mC sites based on the binding of a deaminase-containing protein.

      Thank you for your kind suggestion, and we have made the necessary changes in lines 331-334.

      (8) The authors claim that "The data supporting the findings of this study are available within the article and its Supplementary Information." However, no single accession number for the deposited sequencing data can be found in the text or the supplementary data. Without the primary data, none of the claims can be verified.

      Thank you for pointing this out. The sequencing data from this study have already been deposited in the GEO database (GEO accession number: GSE254194, GEO token: ororioukbdqtpcn), and we will ensure they are made publicly available in a timely manner.

      (a) To underscore point (1), a recent publication (https://doi.org/10.1038/s41419-023-05661-y) reported: "To further identify the potential mRNAs regulated by ALYREF, we performed RNA-seq analysis in control or ALYREF knockdown T24 cells. After knockdown of ALYREF, 143 mRNAs differentially expressed, including 94 downregulated mRNAs (NC reads >100, |Fold change | >1.5, P-value <0.05). Functional enrichment analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG) indicated that regulated mRNAs by ALYREF are chiefly enriched in canonical cancer-related pathways (Fig. S4A), including TGF-β signaling, MAPK signaling, and NF-κB signaling, strongly supporting the oncogenic function of ALYREF in tumor progression. Among these 94 downregulated genes, 11 mRNA showed a significant reduction in m5C methylation after NUSN2 silencing in T24 cells, combined with previously transcriptome-wide RNA-BisSeq data of T24 cells [21] (Fig. 4A)."

      These results translate into 94 mRNAs are regulated by ALYREF in bladder cancer-derived cells. From those, very few (11) mRNA identities respond to NSUN2-dependent RNA methylation mediated by ALYREF binding. The question then arises, is that number sufficient to claim that ALYREF is a m5C-binding protein?

      And if so, how does the identification of 10.000+ edits by DRAM-Seq compare with the 94 mRNAs that are regulated by ALYREF? Were these 94 mRNAs identified by DRAM-Seq?

      Thank you for your kind suggestion. Previous reports, including that of Yang et al. (doi: 10.1038/cr.2017.55) and the literature you refer to, have detailed the close relationship between ALYREF and m5C modification. The ALY/REF export factor (ALYREF) was identified as the first nuclear m5C reader, and many mRNAs were shown to be regulated by ALYREF; it is therefore considered to be an m5C-binding protein.

      As requested, by comparing the DRAM-edited mRNAs with the reported 94 mRNAs, we found that only 55.32% of the 94 mRNAs regulated by ALYREF could be detected by the DRAM system. This indicates that the DRAM system specifically targets certain mRNAs, as illustrated in Figure S4E. The relevant results were described and discussed in lines 175-179.
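      For readers who want to check the arithmetic behind such an overlap figure, the sketch below computes the percentage of a reference gene list recovered in a detected set. It is an illustration only: the gene names are placeholders, the function name is hypothetical, and the count of 52 is simply the integer for which 52/94 rounds to 55.32%, not a number stated in the response.

```python
def percent_detected(reference_genes, detected_genes):
    """Percentage of reference genes that also appear in the detected set."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene list must not be empty")
    overlap = reference & set(detected_genes)
    return 100.0 * len(overlap) / len(reference)


# Placeholder gene names: 94 reference genes, 52 of which are "detected".
reference = [f"gene_{i}" for i in range(94)]
detected = reference[:52] + ["other_1", "other_2"]

pct = percent_detected(reference, detected)
print(f"{pct:.2f}%")
```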

      (b) Line 123:

      "The deep sequencing results showed that the deamination rates of RPSA and SZRD1 were 75.5% and 27.25%, respectively. (Fig. 2A, B)."

      The Figure shows exactly the opposite of bisulfite-mediated deamination. These are the cytosines that were not deaminated by the chemical treatment and therefore can be sequenced as cytosines and not thymidines. Hence, the term deamination rate is wrong.

      Thank you for your kind suggestion. We have made the necessary change in lines 129-130 to change the deamination rates to m⁵C fraction.

      (c) Line 157:

      "DRAM-seq analysis further confirmed that DRAM was detected in an m5C-dependent manner, with minimal mutations in AP5Z1 and RPSA mRNAs in methyltransferase knockout cells compared to wild-type cells (Fig. 3C, D)."

      There is no indication of what the authors mean by minimal mutation in these Figures. The term "minimal mutation" should be reconsidered as well.

      Thank you for your kind suggestion. We intended to express that "Mutations in AP5Z1 and RPSA mRNA are reduced in methyltransferase-deficient cells." There was an issue with the initial formulation, and we have made the necessary changes in lines 165-167.

      (d) Line 167:

      "To further delineate the characteristics of the DRAM-seq data, we compared the distribution of DRAM-seq editing sites within the gene structure, specifically examining their occurrences in the 5'untranslated region (5'UTR), 3' untranslated region (3'UTR), CDS and ncRNA."

      Which part of a coding RNA is meant by "ncRNA"?

      Thank you for pointing this out. This was actually the Intergenic or Intron region, but not ncRNA. We have also corrected this labelling in Figure 3G and lines 186-189 of the revised manuscript.

      (e) Line 189:

      "Subsequently, we assessed the capacity of DRAM-seq to detect m5C on a transcriptome-wide scale, comparing its performance to BS-seq that have been previously reported with great authority."

      The term "great authority" is not a scientific term. Please, remove adulation to senior authors.

      Thank you for your kind suggestion. We removed this unsuitable expression and made the necessary changes in lines 207-208.

      (f) Line 233:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing required half to one mg of RNA input. Please, check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (g) Line 247:

      "Several experiments have highlighted the requirement of 100-500 ng of RNA for m5C-RIP-seq, while BS-seq necessitates an even more demanding 500-750 μg of RNA21,25,61."

      This reviewer doubts that RNA bisulfite sequencing requires half to one mg of RNA input. Please, check these references.

      Thank you for your kind suggestion. According to the references, we corrected μg to ng and made the necessary changes in lines 251-252.

      (h) Line 292:

      "Since m5C lacks a fixed motif, DRAM has an apparent limitation in achieving single-base resolution for detecting m5C."

      m5C deposition by NSUN2 and NSUN6 occurs in particular motifs that were coined Type I and II motifs. Hence, this statement is not correct.

      Thank you for your kind suggestion. Previous reports identified Type I m5C sites, which tend to have a downstream "NGGG" motif, and Type II m5C sites, which often contain a downstream "UCCA" motif. However, these m5C motifs do not fully characterize all m5C sites, and their presence downstream of an m5C site is not guaranteed (doi: 10.1038/s41594-019-0218-x). Therefore, we have corrected the expression “fixed motif” to “fixed base composition for characterizing all m5C modification sites” in line 317.

      (i) Line 390:

      "1 μl of total cellular RNA was used for sequencing library gene..."

      1 uL does not allow us to deduce which RNA mass was used for cDNA synthesis.

      Thank you for your kind suggestion. According to our cDNA synthesis protocol, we corrected “1μl” to “1μg” in lines 422-423.

      (j) Line 405:

      "...was assessed on the Agilent 5400 system (Agilent, USA) and quantified by QPCR (1.5 nM)"

      What does the 1.5 nM refer to in this sentence?

      Thank you for your kind suggestion. Here, "1.5nM" means that the concentration of the constructed library should be no less than 1.5nM. We have also revised this expression in the methods in lines 436-438.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use analysis of existing data, mathematical modelling, and new experiments, to explore the relationship between protein expression noise, translation efficiency, and transcriptional bursting.

      Strengths:

      The analysis of the old data and the new data presented is interesting and mostly convincing.

      Thank you for the constructive suggestions and comments. We address the individual comments below.

      Weaknesses:

      (1) My main concern is the analysis presented in Figure 4. This is the core of mechanistic analysis that suggests ribosomal demand can explain the observed phenomenon. I am both confused by the assumptions used here and the details of the mathematical modelling used in this section. Firstly, the authors' assumption that the fluctuations of a single gene mRNA levels will significantly affect ribosome demand is puzzling. On average the total level of mRNA across all genes would stay very constant and therefore there are no big fluctuations in the ribosome demand due to the burstiness of transcription of individual genes. Secondly, the analysis uses 19 mathematical functions that are in Table S1, but there are not really enough details for me to understand how this is used, are these included in a TASEP simulation? In what way are mRNA-prev and mRNA-curr used? What is the mechanistic meaning of different terms and exponents? As the authors use this analysis to argue ribosomal demand is at play, I would like this section to be very much clarified.

      Thank you for raising two important points. Regarding the first point, we agree that the overall ribosome demand in a cell will remain more or less the same even with fluctuations in mRNA levels of a few genes. However, what we refer to in the manuscript is the demand for ribosomes for translating mRNA molecules of a single gene. This demand will vary with the changes in the number of the mRNA molecules of that gene. When the mRNA copy number of the gene is low, the number of ribosomes required for translation is low. At a subsequent timepoint when the mRNA number of the same gene goes up rapidly due to transcriptional bursting, the number of ribosomes required would also increase rapidly. The process of allocation of ribosomes for translation of these mRNA molecules will vary between cells, and this process can lead to increased expression variation of that gene among cells.

      Regarding the second point, each of the 19 mathematical functions was individually tested in the TASEP model and stochastic simulation. The parameters ‘mRNA-curr’ and ‘mRNA-prev’ are the mRNA copy numbers at the current time point and the previous time point in the stochastic simulation, respectively. These numbers were calculated from the rate of production of mRNA, which is influenced by the burst frequency and the burst size, as well as the rate of mRNA removal. We will expand this section with explanations of all parameters and terms in the revised manuscript.
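
      The bookkeeping of ‘mRNA-prev’ and ‘mRNA-curr’ described above can be illustrated with a minimal sketch. This is not the model used in the manuscript: the function name, the fixed time step `dt`, and the fixed burst size are assumptions made purely for illustration, and the step size must be small enough that `burst_frequency * dt` and `removal_rate * dt` stay below 1.

```python
import random


def simulate_mrna(burst_frequency, burst_size, removal_rate, steps, dt=0.01, seed=0):
    """Discrete-time sketch of bursty mRNA production with first-order removal.

    Hypothetical helper, not the authors' implementation. Returns a list of
    (mrna_prev, mrna_curr) pairs, one pair per time step.
    """
    rng = random.Random(seed)
    mrna_curr = 0
    trajectory = []
    for _ in range(steps):
        mrna_prev = mrna_curr
        # A burst occurs in this step with probability burst_frequency * dt
        # and adds burst_size molecules at once.
        produced = burst_size if rng.random() < burst_frequency * dt else 0
        # Each existing molecule is removed independently with
        # probability removal_rate * dt.
        removed = sum(1 for _ in range(mrna_prev) if rng.random() < removal_rate * dt)
        mrna_curr = mrna_prev + produced - removed
        trajectory.append((mrna_prev, mrna_curr))
    return trajectory
```

      A call such as `simulate_mrna(1.0, 5, 0.1, 1000)` yields pairs in which a large positive difference `mrna_curr - mrna_prev` marks a burst, the situation in which the demand for ribosomes by that gene would rise rapidly.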

      (2) Overall, the paper is very long and as there are analytical expressions for protein noise (e.g. see Paulsson Nature 2004), some of these results do not need to rely on Gillespie simulations. Protein CV (noise) can be written as three terms representing protein noise contribution, mRNA expression contribution, and bursty transcription contribution. For example, the results in panel 1 are fully consistent with the parameter regime, protein noise is negligible compared to transcriptional noise.

      Thank you for referring to the paper on analytical expressions for protein noise. We introduced translational bursting and ribosome demand in our model, and these are linked to stochastic fluctuations in mRNA and ribosome numbers. In addition, our model couples transcriptional bursting with translational bursting and ribosome demand. Since these processes are all stochastic in nature, we felt that the stochastic simulation would be able to better capture the fluctuations in mRNA and protein expression levels originating from these processes. For consistency, we used stochastic simulations throughout, even when the coupling between transcription and translation was not considered.

      Reviewer #2 (Public review):

      This work by Pal et al. studied the relationship between protein expression noise and translational efficiency. They proposed a model based on ribosome demand to explain the positive correlation between them, which is new as far as I realize. Nevertheless, I found the evidence of the main idea that it is the ribosome demand generating this correlation is weak. Below are my major and minor comments.

      Thank you for your helpful suggestions and comments. We note that the direct experimental support required for the ribosome demand model would need experimental setups that are beyond the currently available methodologies. We address the individual comments below.

      Major comments:

      (1) Besides a hypothetical numerical model, I did not find any direct experimental evidence supporting the ribosome demand model. Therefore, I think the main conclusions of this work are a bit overstated.

      Direct experimental evidence for the hypothesis would require generation of ribosome occupancy maps of mRNA molecules at the level of single cells and at time intervals that closely match the burst frequency of the genes. This is beyond the currently available methodologies. However, there is other evidence that supports our model. For example, earlier work in cell-free systems has shown that constraining cellular resources required for transcription or translation can increase expression heterogeneity (Caveney et al., 2017). In addition, genome-wide analysis of expression noise in yeast also revealed that the association between protein noise and translational efficiency was highest in the group of genes with the most bursty transcription (Supplementary fig. S20).

      (2) I found that the enhancement of protein noise due to high translational efficiency is quite mild, as shown in Figure 6A-B, which makes the biological significance of this effect unclear.

      Although we agree with the reviewer’s comment that the effect of translational efficiency on protein noise may not be as substantial as the effect of transcriptional bursting, it has been observed in studies across bacteria, yeast and Arabidopsis (Ozbudak et al., 2002; Blake et al., 2003; Wu et al., 2022). In addition, the relationship between translational efficiency and protein noise is in contrast with the inverse relationship observed between mean expression and noise (Newman et al., 2006; Silander et al., 2012). We also note that the goal of the manuscript was not to evaluate the strength of the association, but to understand the basis of the influence of translational efficiency on protein noise.

      (3) The captions for most of the figures are short and do not provide much explanation, making the figures difficult to read.

      We will revise the figure captions to include more details as per the reviewer’s suggestion.

      (4) It would be helpful if the authors could define the meanings of noise (e.g., coefficient of variation?) and translational efficiency in the very beginning to avoid any confusion. It is also unclear to me whether the noise from the experimental data is defined according to protein numbers or concentrations, which is presumably important since budding yeasts are growing cells.

      For all published datasets where we had measurements from a large number of genes/promoters, we used the measures of adjusted noise (for mRNA noise) and Distance-to-median (DM, for protein noise). For experiments that we performed on a limited number of promoters, we used the measure of coefficient of variation (CV) to quantify noise, as calculation of adjusted noise or DM was not possible. Translational efficiency refers to translation rate which is determined by both the translation initiation rate and the translation elongation rate. The noise at the protein level was quantified from the signal intensity of GFP tagged proteins, which was proportional to protein numbers without considering cell volume. For quantification of noise at the mRNA level, single-cell RNA-seq data was used, which provided mRNA numbers in individual cells.
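
      The two noise measures named above can be sketched as follows. The CV is the standard deviation divided by the mean. The `distance_to_median` function is only a simplified stand-in for the DM measure: it assumes that DM is the deviation of a gene's CV from the median CV of genes with similar mean expression, and the rank-based window and all function names are illustrative assumptions rather than the procedure used in the published datasets.

```python
import statistics


def coefficient_of_variation(values):
    """CV = population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return statistics.pstdev(values) / mean


def distance_to_median(means, cvs, window=3):
    """Simplified DM: CV minus the median CV of neighbouring genes.

    Neighbours are the `window` genes closest in mean-expression rank
    (an assumption made for illustration only).
    """
    order = sorted(range(len(means)), key=lambda i: means[i])
    half = window // 2
    dm = [0.0] * len(means)
    for rank, idx in enumerate(order):
        lo = max(0, rank - half)
        hi = min(len(order), rank + half + 1)
        neighbours = [cvs[order[j]] for j in range(lo, hi)]
        dm[idx] = cvs[idx] - statistics.median(neighbours)
    return dm
```

      A positive value from `distance_to_median` indicates a gene that is noisier than genes of comparable mean expression, which is why such a measure is preferred over the raw CV when many genes with different expression levels are compared.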

      (5) The conclusions from Figures 1D and 1E are not new. For example, the constant protein noise as a function of mean protein expression is a known result of the two-state model of gene expression, e.g., see Equation (4) in Paulsson, Physics of Life Reviews 2005.

      Yes, they are not new, but we included these results for setting the baseline for comparison with simulation results that appear in the later part of the manuscript where we included translational bursting and ribosome demand in our models.

      (6) In Figure 4C-D, it is unclear to me how the authors changed the mean protein expression if the translation initiation rate is a function of variation in mRNA number and other random variables.

      The translation initiation rate varied from a baseline initiation rate depending on the mRNA numbers and other variables. We changed the baseline initiation rate to alter the mean protein expression levels. We will elaborate this section in the revised manuscript.

      (7) If I understand correctly, the authors somehow changed the translation initiation rate to change the mean protein expression in Figures 4C-D. However, the authors changed the protein sequences in the experimental data of Figure 6. I am not sure if the comparison between simulations and experimental data is appropriate.

      It is an important observation. Even though we changed the translation initiation rate to change the mean expression (Fig. 4C-D), we noted in the description of the model (Fig. 3D) that the changes in the translation initiation rate were also linked with changes in the translation elongation rate. The translation initiation rate can only increase if the ribosomes already bound to the mRNA traverse the mRNA more quickly. This means that an increase in the translation initiation rate will occur only if the translation elongation rate is also increased, which will lead to lower traversal time of the ribosomes through the mRNA (Fig. 3D). Similarly, an increase in the translation elongation rate will allow more ribosomes to initiate translation. Thus, the translation initiation rate and the translation elongation rate are interconnected. This has also been observed in an experimental study by Barrington et al. (2023). That said, the models can also be expressed in terms of the translation elongation rate instead of the translation initiation rate, and this modification will not change the results of the simulations because the two rates are interconnected.

      References

      C. L. Barrington, G. Galindo, A. L. Koch, E. R. Horton, E. J. Morrison, S. Tisa, T. J. Stasevich, O. S. Rissland. Synonymous codon usage regulates translation initiation. Cell Rep. 42, 113413 (2023).

      W. J. Blake, M. Kaern, C. R. Cantor, J. J. Collins, Noise in eukaryotic gene expression. Nature 422, 633-637 (2003).

      P. M. Caveney, S. E. Norred, C. W. Chin, J. B. Boreyko, B. S. Razooky, S. T. Retterer, C. P. Collier, M. L. Simpson, Resource Sharing Controls Gene Expression Bursting. ACS Synth Biol. 6, 334-343 (2017).

      J. R. Newman, S. Ghaemmaghami, J. Ihmels, D. K. Breslow, M. Noble, J. L. DeRisi, J. S. Weissman, Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature 441, 840-846 (2006).

      E. M. Ozbudak, M. Thattai, I. Kurtser, A. D. Grossman, A. van Oudenaarden, Regulation of noise in the expression of a single gene. Nat Genet. 31, 69-73 (2002).

      O. K. Silander, N. Nikolic, A. Zaslaver, A. Bren, I. Kikoin, U. Alon, M. Ackermann, A genome-wide analysis of promoter-mediated phenotypic noise in Escherichia coli. PLoS Genet. 8, e1002443 (2012).

      H. W. Wu, E. Fajiculay, J. F. Wu, C. S. Yan, C. P. Hsu, S. H. Wu, Noise reduction by upstream open reading frames. Nat Plants. 8, 474-480 (2022).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review)

      Summary:

      Numerous mechanistic and structural studies have reported the cooperative role of Oct4 and Sox2 in the establishment of pluripotency during reprogramming. Due to the difficulty of sample collection and of RNA-seq with low cell numbers, the precise mechanisms remain unclear in early embryos. This manuscript reports the role of OCT4 and SOX2 in mouse early embryos using knockout models with low-input ATAC-seq and RNA-seq. Compared to the control, chromatin accessibility and the transcriptome were affected when Oct4 and Sox2 were deleted in the early ICM. Specifically, decreased ATAC-seq peaks showed enrichment of TF motifs such as OCT, SOX, and OCT-SOX, indicating their importance during early development. Moreover, by deep analysis of ATAC-seq and RNA-seq data, they found that Oct4 and Sox2 target enhancers to activate their downstream genes. In addition, they also uncovered the role of OS during development from the morula to the ICM, which provides the scientific community with a more comprehensive understanding.

      Strengths:

      On the whole, the manuscript is innovative, and the conclusions of this paper are mostly well supported by data, however, there are some issues that need to be addressed.

      Weaknesses:

      Major Points:

      (1) In Figure 1, a more detailed description of the knockout strategy should be provided to clarify itself. The knockout strategy in Fig1 is somewhat obscure, such as how is OCT4 inactivated in Oct4mKO2 heterozygotes. As shown in Figure 1, the exon of OCT4 is not deleted, and its promoter is not destroyed. Therefore, how does OCT4 inactivate to form heterozygotes?

      Thank you for your kind suggestions. We will add a detailed description of the knockout strategy in the legends for Figure 1A and 1B, as shown below:

      Figure 1A. Schemes of mKO2-labeled Oct4 KO (Oct4mKO2) and Oct4 flox alleles. In the Oct4mKO2 allele, a PGK-pac∆tk-P2A-mKO2-pA cassette was inserted 3.6 kb upstream of the Oct4 transcription start site (TSS) and a promoter-less FRT-SA-IRES-hph-P2A-Venus-pA cassette was inserted into Oct4 intron 1. The inclusion of a stop codon followed by three sets of polyadenylation signal sequences (pA) after the Venus cassette ensures both transcriptional and translational termination, effectively blocking the expression of Oct4 exons 2–5.

      Figure 1B. Schemes of EGFP-labeled Sox2 KO (Sox2EGFP) and Sox2 flox alleles. In the Sox2EGFP allele, the 5’ untranslated region (UTR), coding sequence and a portion of the 3’ UTR of Sox2 were deleted and replaced with a PGK-EGFP-pA cassette. Notably, 1,023 bp of the Sox2 3’ UTR remains intact.

      (2) Is ZP 3-Cre expressed in the zygotes? Is there any residual protein?

      Thank you for the question. While we have not directly tested for ZP3-Cre expression in zygotes, published transcriptome and proteomics data show that ZP3 is present at both the transcript and protein levels in wild-type zygotes (Deng et al., Science, 2014; Gao et al., Cell Reports, 2017). This suggests that ZP3-Cre could potentially be expressed in zygotes as well.

      (3) What motifs are enriched in the rising ATAC-seq peaks after knocking out of OCT4 and SOX2?

      Thank you for the question. The motifs enriched in the rising ATAC-seq peaks in Oct4 KO and Sox2 KO ICMs are the GATA, TEAD, EOMES and KLF motifs, as shown in Figure 4A and Figure supplement 7.

      (4) The ordinate of Fig4c is lost.

      Thank you for pointing this out. The y-axis shows the average normalized signal (reads per million-normalized pileup signal). We will add the axis label in the revised version.

      (5) Signals of H3K4me1, H3K27ac, and so on are usually used to define enhancers, and the loci of enhancers vary greatly in different cells. In the manuscript, the authors defined ATAC-seq peaks far from the TSS as enhancers. The definition in this manuscript is not strictly an enhancer.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to conduct this analysis.

      (6) If Oct4 and Sox2 truly activate sap 30 and Uhrf 1, what effect does interfering with both genes have on gene expression and chromatin accessibility?

      Thank you for the interesting question. Unfortunately, we have not conducted this specific experiment, so we do not have direct results. However, Sap30 is a key component of the mSin3A corepressor complex, while Uhrf1 regulates the establishment and maintenance of DNA methylation. Both proteins are known to function as repressors. Therefore, we hypothesize that interfering with these two genes could alleviate repression of some genes, such as trophectoderm markers, similar to what we have observed in Oct4 KO and Sox2 KO ICMs.

      Reviewer #2 (Public review):

      In this manuscript, Hou et al. investigate the interplay between OCT4 and SOX2 in driving the pluripotent state during early embryonic lineage development. Using knockout (KO) embryos, the authors specifically analyze the transcriptome and chromatin state within the ICM-to-EPI developmental trajectory. They emphasize the critical role of OCT4 and the supportive function of SOX2, along with other factors, in promoting embryonic fate. Although the paper presents high-quality data, several key claims are not well-supported, and direct evidence is generally lacking.

      Major Points:

      (1) Although the authors claim that both maternal KO and maternal KO/zygotic hetero KO mice develop normally, the molecular changes in these groups appear overestimated. A wildtype control is recommended for a more robust comparison.

      Thank you for your valuable feedback. However, we are unclear on what is meant by “the molecular changes in these groups appear overestimated.” Could the reviewer kindly provide more details or clarify which specific aspects of the molecular changes they are referring to? This would help us better address the concern.

      (2) The authors assert that OCT4 and SOX2 activate the pluripotent network via the OCT-SOX enhancer. However, the definition of this enhancer is based solely on proximity to TSSs, which is a rough approximation. Canonical enhancers are typically located in intronic and intergenic regions and marked by H3K4me1 or H3K27ac. Re-analyzing enhancer regions with these standards could be beneficial. Additionally, the definitions of "close to" or "near" in lines 183-184 are unclear and not defined in the legends or methods.

      Thank you for this insightful comment. We will search for and analyze published omics data on H3K4me1 and H3K27ac in early embryos or mouse embryonic stem cells to address the concern about the definition of “enhancer”.

      The definition of "close to" or "near" in lines 183-184 is in the legend of Figure 2E and methods. In the GSEA analysis, Ensembl protein-coding genes with TSSs located within 10 kb of ATAC-seq peak centers were included.
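
      The 10 kb criterion described above can be sketched as a simple distance filter. The data layout and the function name below are assumptions for illustration, the boundary is treated as inclusive, and the peak center is taken as the integer midpoint of the peak; none of these details are confirmed by the manuscript.

```python
def genes_near_peaks(peaks, genes, max_distance=10_000):
    """Return IDs of genes whose TSS lies within max_distance of a peak center.

    peaks: list of (chrom, start, end) tuples (hypothetical layout)
    genes: list of (gene_id, chrom, tss) tuples (hypothetical layout)
    """
    # Collect peak centers per chromosome.
    centers = {}
    for chrom, start, end in peaks:
        centers.setdefault(chrom, []).append((start + end) // 2)

    selected = []
    for gene_id, chrom, tss in genes:
        # Keep the gene if any peak center on the same chromosome is close enough.
        if any(abs(tss - center) <= max_distance for center in centers.get(chrom, [])):
            selected.append(gene_id)
    return selected
```

      For genome-scale inputs, an interval tree or a sorted array with binary search would replace the linear scan used in this sketch.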

      (3) There is no evidence that the decreased peaks/enhancers could be the direct targets of Oct4 and Sox2 throughout this manuscript. Figures 2 and 4 show only minimal peak annotations related to OCT and SOX motifs, and there is a lack of chromatin IP data. Therefore, claims about direct targets are not substantiated and should be appropriately revised.

      Thank you for the comment. In Figure Supplement 3C, we analyzed published Sox2 CUT&RUN data from E4.5 ICMs (Li et al., Science, 2023), which demonstrate that the reduced ATAC-seq peaks in our Sox2 KO ICMs are enriched with Sox2 CUT&RUN signals. These data suggest that the decreased peaks/enhancers could be direct targets of Sox2. Unfortunately, we did not find similar published data for Oct4 in embryos.

      (4) Lines 143-146 lack direct data to support the claim. Actually, the main difference in cluster 1, 11 and 3, 8, 14 is whether the peak contains OCT-SOX motif. However, the reviewer cannot get any information of peaks activated by OCT4 rather than SOX2 in cluster 1, 11.

      Thank you for the comment. As the reviewer pointed out, clusters 3, 8 and 14 are more enriched with OCT-SOX motifs than clusters 1 and 11. However, this is consistent with our observation that the accessibility of peaks in clusters 1 and 11 mainly relies on Oct4, while the accessibility of clusters 3, 8 and 14 relies on both Oct4 and Sox2. The word “activate” is probably not accurate. We will revise the text as follows:

      “Notably, compared to the peaks dependent on Oct4 but not Sox2 (Figure 2B, clusters 1 and 11), those reliant on both Oct4 and Sox2 show greater enrichment of the OCT-SOX motif (Figure 2B, clusters 3, 8 and 14). The former group tended to be already open in the morula, while the latter group became open in the ICM.”

      Minor Points:

      (1) Lines 153-159: The figure panel does not show obvious enrichment of SOX2 signals or significant differences in H3K27ac signals across clusters, thus not supporting the claim.

      Thank you for the comments.

      Lines 153-159 reference two datasets: Figure supplement 3C and 3D.

      In Figure supplement 3C, the average plots above the heatmaps show that the decreased ATAC-seq peaks exhibited higher enrichment with Sox2 CUT&RUN signals compared to the increased or unchanged peaks.

      Regarding Figure supplement 3D, we agree that the H3K27ac signal is only slightly more enriched on the decreased peaks than on the unchanged peaks. However, it is important to note that only the strongest 57,512 of the 142,096 unchanged peaks were included in the analysis. We excluded the weaker unchanged peaks because they are less informative; if included, they could reduce the average H3K27ac signal for the unchanged peaks.

      (2) Lines 189-190: The term "identify" is overstated for the integrative analysis of RNA-seq and ATAC-seq, which typically helps infer TF targets rather than definitively identifying them.

      Thank you for the suggestion. We will replace “identify” with “infer”. The revised version is as below:

      “In addition, integration of the ATAC-seq and RNA-seq data allowed us to infer previously unknown targets of Oct4 and Sox2, such as Sap30 and Uhrf1, which are essential for somatic cell reprogramming and embryonic development.”

      (3) The Discussion is lengthy and should be condensed.

      Thank you for the suggestion. We will shorten it.

    1. Author response:

      We thank the editors and reviewers for their valuable feedback and are committed to addressing their suggestions in a revised manuscript. We appreciate the reviewers’ recognition of the value of our findings, including the insights into the consequences of synaptic topography and the investigation of spike initiation zones in DNs, which further advance our understanding of signal processing. Our studies offer broader insights into synaptic organization and its significance for dendritic integration in an ethologically relevant context.

      We particularly appreciate the reviewer's suggestion to elaborate on the electrophysiological properties of DNs and to consider the electrotonic distance in our analysis. We also thank the reviewers for highlighting points that need clarification. In short, our models suggest that DNs effectively distribute synapses to maintain linear encoding of synapse numbers when multiple synapses are coactivated. This supports the results of an earlier study suggesting that synapse number gradients encode the location of an approaching stimulus in these neurons (Dombrovski et al., 2023).

      We also agree with the reviewers that the temporal activation of synapses is highly relevant for this system. However, we have focused on synaptic topography because the characterization of temporal patterns of VPN activity is currently lacking in the field. A more detailed investigation of temporal dynamics is therefore beyond the scope of this study.

      With the publication of the reviewed preprint, we have now made the computational pipeline and models available on GitHub (https://github.com/AusbornLab/VPN-DN-synapse-normalization).

      Reference

      Dombrovski M, Peek MY, Park J-Y, Vaccari A, Sumathipala M, Morrow C, Breads P, Zhao A, Kurmangaliyev YZ, Sanfilippo P, Rehan A, Polsky J, Alghailani S, Tenshaw E, Namiki S, Zipursky SL, Card GM. 2023. Synaptic gradients transform object location to action. Nature 613:534–542. doi:10.1038/s41586-022-05562-8

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      Dr. Santamaria's group previously utilized antigen-specific nanomedicines to induce immune tolerance in treating autoimmune diseases. The success of this therapeutic strategy has been linked to expanded regulatory mechanisms, particularly the role of T-regulatory type-1 (TR1) cells. However, the differentiation program of TR1 cells remained largely unclear. Previous work from the authors suggested that TR1 cells originate from T follicular helper (TFH) cells. In the current study, the authors aimed to investigate the epigenetic mechanisms underlying the transdifferentiation of TFH cells into IL-10-producing TR1 cells. Specifically, they sought to determine whether this process involves extensive chromatin remodeling or is driven by preexisting epigenetic modifications. Their goal was to understand the transcriptional and epigenetic changes facilitating this transition and to explore the potential therapeutic implications of manipulating this pathway. 

      The authors successfully demonstrated that the TFH-to-TR1 transdifferentiation process is driven by pre-existing epigenetic modifications rather than extensive new chromatin remodeling. The comprehensive transcriptional and epigenetic analyses provide robust evidence supporting their conclusions. 

      Strengths: 

      (1) The study employs a broad range of bulk and single-cell transcriptional and epigenetic tools, including RNA-seq, ATAC-seq, ChIP-seq, and DNA methylation analysis. This comprehensive approach provides a detailed examination of the epigenetic landscape during the TFH-to-TR1 transition. 

      (2) The use of high-throughput sequencing technologies and sophisticated bioinformatics analyses strengthens the foundation for the conclusions drawn. 

      (3) The data generated can serve as a valuable resource for the scientific community, offering insights into the epigenetic regulation of T-cell plasticity. 

      (4) The findings have significant implications for developing new therapeutic strategies for autoimmune diseases, making the research highly relevant and impactful. 

      We thank the reviewer for providing constructive feedback on the manuscript.

      Weaknesses: 

      (1) While the scope of this study lies in transcriptional and epigenetic analyses, the conclusions need to be validated by future functional analyses. 

      We fully agree with the reviewer’s suggestion. We have added the following text to the Discussion to address this concern: “The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Our current studies focus on functional validation of these observations, by carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.”

      (2) This study successfully identified key transcription factors and epigenetic marks. How these factors mechanistically drive chromatin closure and gene expression changes during the TFH-to-TR1 transition requires further investigation. 

      Agreed. Please see our response to point #1 above.  

      (3) The study provides a snapshot of the epigenetic landscape. Future dynamic analysis may offer more insights into the progression and stability of the observed changes. 

      We have previously shown that the first event in the pMHCII-NP-induced TFH-TR1 transdifferentiation process involves proliferation of cognate TFH cells in the splenic germinal centers. This event is followed by immediate transdifferentiation of the proliferated TFH cells into transitional and terminally differentiated TR1 subsets. Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the transdifferentiation pathway at any given time point, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (Sole et al., 2023a). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells post treatment withdrawal, coupled to single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes, are likely to shed light on the progression and stability of the epigenetic changes reported herein.

      To address this limitation in the manuscript, we have added the following paragraph to the Discussion: “Although the snapshot provided by our single cell studies reported herein documents the simultaneous presence of the different subsets composing the TFH-TR1 cell pathway upon the termination of treatment, the transdifferentiation process itself is extremely fast, such that proliferated TFH cells already transdifferentiate into TR1 cells after a single pMHCII-NP dose (6). This makes it extremely challenging to pursue dynamic experiments. Notwithstanding this caveat, ongoing studies of cognate T cells post treatment withdrawal, coupled to single cell studies of the TFH-TR1 pathway in transcription factor gene knockout mice exhibiting perturbed transdifferentiation processes are likely to shed light into the progression and stability of the epigenetic changes reported herein”. 

      Reviewer #1 (Recommendations for the authors): 

      The authors may consider the following suggestions to improve this study: 

      (1) The authors may include a brief background on type 1 diabetes and the model involving BDC2.5 T cells to provide context for readers who may not be familiar with these aspects. 

      We have added this information to the first paragraph in the Results section: “BDC2.5mi/I-Ag7-specific CD4+ T cells comprise a population of autoreactive T cells that contribute to the progression of spontaneous autoimmune diabetes in NOD mice. The size of this type 1 diabetes-relevant T cell specificity is small and barely detectable in untreated NOD mice, but treatment with cognate pMHCII-NPs leads to the expansion and formation of antidiabetogenic TR1 cells that retain the antigenic specificity of their precursors (3). As a result, treatment of hyperglycemic NOD mice with these compounds results in the reversal of type 1 diabetes (3).”

      (2) It is understandable that further biological and functional experiments are beyond the scope of this paper, but it would be of interest to know how the authors envision future studies based on the transcriptional and epigenetic information obtained thus far. 

      We have added the following text to the Discussion section: “The current study provides a foundational understanding of how the epigenetic landscape of TFH cells evolves as they transdifferentiate into TR1 progeny in response to chronic ligation of cognate TCRs using pMHCII-NPs. Our current studies focus on functional validation of these observations, by carrying out extensive perturbation studies of the TFH-TR1 transdifferentiation pathway in conditional transcription factor gene knock-out mice. In these ongoing studies, genes coding for a series of transcription factors expressed along the TFH-TR1 pathway are selectively knocked out in T cells, to ascertain (i) the specific roles of key transcription factors in the various cell conversion events and transcriptional changes that take place along the TFH-TR1 cell axis; (ii) the roles that such transcription factors play in the chromatin re-modeling events that underpin the TFH-TR1 transdifferentiation process; and (iii) the effects of transcription factor gene deletion on phenotypic and functional readouts of TFH and regulatory T cell function.”

      (3) The authors may consider adjusting figures where genes are crowded or difficult to read due to small font size. 

      Figures with crowded text have been modified to facilitate reading.

      Reviewer #2 (Public Review): 

      Summary: 

      This study, based on their previous findings that TFH cells can be converted into TR1 cells, conducted a highly detailed and comprehensive epigenetic investigation to answer whether TR1 differentiation from TFH is driven by epigenetic changes. Their evidence indicated that the downregulation of TFH-related genes during the TFH to TR1 transition depends on chromatin closure, while the upregulation of TR1-related genes does not depend on epigenetic changes. 

      Strengths: 

      (1) A significant advantage of their approach lies in its detailed and comprehensive assessment of epigenetics. Their analysis of epigenetics covers chromatin open regions, histone modifications, DNA methylation, and using both single-cell and bulk techniques to validate their findings. As for their results, observations from different epigenetic perspectives mutually supported each other, lending greater credibility to their conclusions. This study effectively demonstrates that (1) the TFH-to-TR1 differentiation process is associated with massive closure of OCRs, and (2) the TR1-poised epigenome of TFH cells is a key enabler of this transdifferentiation process. Considering the extensive changes in epigenetic patterns involved in other CD4+ T lineage commitment processes, the similarity between TFH and TR1 in their epigenetics is intriguing. 

      (2) They performed correlation analysis to assess the association between "pMHC-NP-induced epigenetic change" and "gene expression change in TR1". Also, they have made their raw data publicly available, providing a comprehensive epigenomic database of pMHC-NP-induced TR1 cells. This will serve as a valuable reference for future research. 

      We thank the reviewer for his/her constructive feedback and suggestions for improvement of the manuscript.

      Weaknesses: 

      (1) A major limitation is that this study heavily relies on a premise from the previous studies performed by the same group on pMHC-NP-induced T-cell responses. This significantly limits the relevance of their conclusion to a broader perspective. Specifically, differential OCRs between Tet+ and naïve T cells were limited to only 821, as compared to 10,919 differential OCRs between KLH-TFH and naïve T cells (Figure 2A), indicating that the precursors and T cell clonotypes that responded to pMHC-NP were extremely limited. This limitation should be clearly discussed in the Discussion section. 

      We agree that this study focuses on a very specific, previously unrecognized pathway discovered in mice treated with pMHCII-NPs. Despite this apparent narrow perspective, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Furthermore, this pathway affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area. 

      We have added the following text to the Discussion to address this limitation: “Although the TFH-TR1 transdifferentiation was discovered in mice treated with pMHCII-NPs, we now have evidence that this is a naturally occurring pathway that also develops in other contexts (i.e., in mice that have not been treated with pMHCII-NPs). Importantly, the discovery of this transdifferentiation process affords a unique opportunity to further understand the transcriptional and epigenetic mechanisms underpinning T cell plasticity; the findings reported here can help guide/inform not only upcoming translational studies of pMHCII-NP therapy in humans, but also other research in this area”.

      We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (Sole et al., 2023a). However, we note that our scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells). 

      This has been clarified in the revised version of the manuscript, by adding the following text to the last paragraph of the Results subsection entitled “Contraction of the chromatin in pMHCII-NP-induced Tet+ vs. TFH cells at the bulk level”: “We acknowledge that, in the bulk ATAC-seq studies, the differences in the number of OCRs found in tetramer+ cells or KLH-induced TFH cells vs. naïve T cells may be influenced by the intrinsic oligoclonality of the tetramer+ T cell pool arising in response to repeated pMHCII-NP challenge (6). However, we note that scATAC-seq studies of the tetramer+ T cell pool found similar differences between the oligoclonal tetramer+ TFH subpool and its (also oligoclonal) tetramer+ TR1 counterparts (i.e., substantially higher number of OCRs in the former vs. the latter relative to naïve T cells)”.

      (2) This article uses peak calling to determine whether a region has histone modifications, claiming that the regions with histone modifications in TFH and TR1 are highly similar. However, they did not discuss the differences in histone modification intensities measured by ChIP-seq. For example, as shown in Figure 6C, IL10 H3K27ac modification in Tet+ cells showed significantly higher intensity than KLH-TFH, while in this article, it may be categorized as "possessing same histone modification region". This will strengthen their conclusions.

      We appreciate your suggestion to discuss differences in histone modification intensities as measured by ChIP-seq. However, we respectfully disagree with the reviewer’s interpretation of these data.

      Our study primarily focuses on the identification of epigenetic similarities and differences between pMHCII-NP-induced tetramer+ cells and KLH-induced TFH cells relative to naive T cells. The outcome of direct comparisons of histone deposition (ChIP-seq) between these cell types is summarized in the lower part of Figure 4B and detailed in Datasheet 5. Throughout this section, we mention the number of differentially enriched regions, their overlap with OCRs shared between tetramer+ TFH and tetramer+ TR1 cells based on scATAC-seq data, and the associated genes. Clearly, the epigenetic modifications that TR1 cells inherit from TFH cells were acquired by TFH cells upon differentiation from naïve T cell precursors. 

      Regarding the specific point raised by the reviewer on differences in the intensity of the H3K27Ac peaks linked to Il10 in Figure 6C, we note that the genomic tracks shown are illustrative. Thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for H3K27Ac deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells. 

      This has now been clarified by adding the following text to the end of the Results subsection entitled “H3K4me3, H3K27me3 and H3K27ac marks in genes upregulated during the TFH-to-TR1 cell conversion are already in place at the TFH cell stage”: “We note that, although in the representative chromosome track views shown in Fig. 6C there appear to be differences in the intensity of the peaks, thorough statistical analyses involving signal background for each condition and p-value adjustment did not support differential enrichment for histone deposition around the Il10 gene between pMHCII-NP-induced tetramer+ T cells and KLH-induced TFH cells.” 

      We have also clarified this in the corresponding part of the Methods section (“ATAC-seq and ChIP-seq” under “Bioinformatic and Statistical Analyses”): “Given that peak calling alone does not account for variations in the intensity of histone mark deposition, analysis of differential histone deposition includes both qualitative and quantitative assessments. Whereas qualitative assessment involves evaluating the overall pattern and distribution of the various histone marks, quantitative assessment measures the intensity and magnitude of histone mark deposition.”

      (3) Last, the key findings of this study are clear and convincing, but some results and figures are unnecessary and redundant. Some results are largely a mere confirmation of the relationship between histone marks and chromatin status. I propose to reduce the number of figures and text that are largely confirmatory. Overall, I feel this paper is too long for its current contents. 

      We understand your concern about the potential redundancy of some results and figures. Our aim in including these analyses was to provide a comprehensive understanding of the intricate relationships between epigenetic features and transcriptomic differences. We believe that a detailed examination of these relationships is crucial for several reasons: (i) the breadth of the data allows for a thorough exploration of the relationships between histone marks, open chromatin status and transcriptional differences. This comprehensive approach helps to ensure that our conclusions are robust and well-supported; (ii) some of the results that may appear confirmatory are, in fact, important for validating and reinforcing the consistency of our findings across different contexts. These details are intended to provide a nuanced understanding of the interactions between epigenetic features and gene expression; and (iii) by presenting a detailed analysis, we aim to offer a solid foundation for future research in this area. The extensive data presented will serve as a valuable resource for others in the field who may seek to build on our findings.

      That said, we have carefully reviewed the manuscript to identify and streamline elements that might be perceived as overly redundant, while retaining the depth of analysis that we believe is essential.

      Reviewer #2 (Recommendations for the authors): 

      (1) In Figure 1E, the text states "94% (n=217/231) of the genes associated with chromatin regions that had closed during the TFH-TR1 conversion,", but n=231 does not match the n=1820 provided in Figure 1D as downregulated genes. This is one example of numbers that do not match among figures or lack sufficient explanation. Please check those numbers carefully and add some sentences if necessary. 

      We note that the text referring to Figure 1D describes the total number of differentially expressed genes between Tet+ TR1 and Tet+ TFH cells using the scMultiome dataset (n = 2,086 genes downregulated in the former vs. the latter; and n = 266 genes upregulated in the former vs. the latter). The text in the paragraph that follows (referring to Figure 1E) focuses exclusively on the genes that had closed chromatin regions during the TFH-to-TR1 conversion, to ascertain whether or not chromatin closure was indeed associated with such gene downregulation. 

      We have modified the first sentence in the paragraph referring to Figure 1E to clarify this point for the reader: “Further analyses focusing on the genes that had closed chromatin regions during the TFH-to-TR1 conversion, confirmed…”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors have developed a valuable method based on a fully cell-free system to express a channel protein and integrate it into a membrane vesicle in order to characterize it biophysically. The study presents a useful alternative to study channels that are not amenable to being studied by more traditional methods.

      Strengths:

      The evidence supporting the claims of the authors is solid and convincing. The method will be of interest to researchers working on ionic channels, allowing them to study a wide range of ion channel functions such as those involved in transport, interaction with lipids, or pharmacology.

      Weaknesses:

      The inclusion of a mechanistic interpretation of how the channel protein folds into a protomer or a tetramer to become functional in the membrane would strengthen the study.

      Work from other labs has described key factors which can improve expression and artificial lipid integration of cell-free-derived transmembrane proteins (PMIDs: 35520093, 29625253, 26270393). However, a significant number of additional experiments would be needed to elucidate the exact biophysical properties governing channel assembly of synthetically derived polycystins. We carried out additional biochemical experiments to address these concerns (see new Figure 1—figure supplement 1D, E). We used fluorescence-detection size-exclusion chromatography (FSEC) to understand what fraction of the CFE-derived protomers fold and assemble into functional tetramers upon incorporation into SUVs. Compared with recombinant protein from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. In the absence of the chaperones found in cells, the assembly of synthetically derived protomers into tetramers likely depends on the intrinsic chemical properties of the proteins and on the biophysical principles governing helical membrane proteins inserted into the lipid membrane (PMID: 35133709). We have added our interpretation in lines 111-121.

      Reviewer #2 (Public Review):

      It is challenging to study the biophysical properties of organelle channels using conventional electrophysiology. The conventional reconstitution methods require multiple steps and can be contaminated by endogenous ionophores from the host cell lines after purification. To overcome this challenge, in this manuscript, Larmore et al. described a fully synthetic method to assay the functional properties of the TRPP channel family. The TRPP channels are an important organelle ion channel family that natively traffic to primary cilia and ER organelles. The authors utilized cell-free protein expression and reconstitution of the synthetic channel protein into giant unilamellar vesicles (GUVs), so that single-channel properties can be measured using voltage-clamp electrophysiology. Using this innovative method, the authors characterized the channels' membrane integration, orientation, and conductance, comparing the results to those of endogenous channels. The manuscript is well-written and may present broad interest to the ion channel community studying organelle ion channels. Particularly because of the challenges of patching native cilia cells, the functional characterization is highly concentrated in very few labs. This method may provide an alternative approach to investigate other channels resistant to biophysical analysis and pharmacological characterization.

      Thank you for evaluating our manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) It would be useful to explain how the Polycystin protein is folded under the experimental conditions used. The expression data shown in Figure 1 Supplement 1B show different protein concentrations of protomer or tetramer. However, it is not described how each form is identified and distinguished. It is also important to mention in the manuscript that this method is only applicable to membrane channels that do not require chaperons for its folding and expression into the membrane. How is the tetramer mechanistically conformed? In line 184, it is stated that this method can be leveraged for studying the effects of channel subunit composition. Would this method allow the expression of two different subunit proteins in order to produce a heteromeric channel?

      In Figure 1—figure supplement 1B, total fluorescence from the synthesized channel-GFP was measured. Protein concentration was calculated based on the linear regression of the GFP standards. Monomeric protein concentration was reported directly from total fluorescence. Tetrameric protein concentration was calculated by dividing the fluorescence by four and then calculating the concentration from the GFP standards. 
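
      The calculation described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in this response, not the authors' analysis script: the standard concentrations, fluorescence readings, units, and function names below are hypothetical placeholders.

```python
# Hypothetical GFP standards (assumed values, not from the manuscript):
# known concentrations (nM) and their fluorescence readings (arbitrary units).
STD_CONC = [0.0, 50.0, 100.0, 200.0, 400.0]
STD_FLUOR = [10.0, 110.0, 210.0, 410.0, 810.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_fluorescence(fluor, slope, intercept):
    """Invert the standard curve to convert fluorescence into concentration."""
    return (fluor - intercept) / slope


def monomer_and_tetramer(total_fluor, slope, intercept):
    """Monomer: concentration read directly from total fluorescence.
    Tetramer: fluorescence divided by four, then converted with the same curve."""
    monomer = concentration_from_fluorescence(total_fluor, slope, intercept)
    tetramer = concentration_from_fluorescence(total_fluor / 4.0, slope, intercept)
    return monomer, tetramer


slope, intercept = fit_line(STD_CONC, STD_FLUOR)
monomer_nM, tetramer_nM = monomer_and_tetramer(410.0, slope, intercept)
print(f"monomer: {monomer_nM:.2f} nM, tetramer: {tetramer_nM:.2f} nM")
```

      Note that when the standard curve has a non-zero intercept, dividing the fluorescence by four before conversion is not the same as dividing the monomer concentration by four; the sketch follows the order of operations given in the text.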

      This is a good point. Based on your suggestion, we carried out additional biochemical experiments (see new Figure 1—figure supplement 1D, E). We used fluorescence-detection size-exclusion chromatography (FSEC) to understand what fraction of the CFE-derived protomers fold and assemble into functional tetramers upon incorporation into SUVs. As controls, we produced recombinant PKD2-GFP and PKD2L1-GFP channels to serve as elution time standards and to compare the relative production of tetrameric channels generated by the two expression systems. The synthetically derived polycystin channels indeed produced tetramers and protomers, which supports the feasibility of using this method to assay their functional properties. Compared with recombinant protein from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. We speculate that assembly of synthetically derived protomers into tetramers likely depends on the intrinsic chemical properties of the proteins and on the biophysical principles governing helical membrane proteins inserted into the lipid membrane (PMID: 35133709). Although an interesting question, a systematic analysis of these channel-lipid interactions is beyond the scope of this eLife Report but can be addressed in future studies. The limitation of using this method to characterize channels which fold and membrane integrate without the aid of molecular chaperones is now stated in lines 201-205. In principle, the CFE-GUV method can be deployed to co-express different subunits to produce heteromeric channels. We have modified the text in lines 192-197 to be clearer on this point.

      (2) The type of plasmid (and promoter) required for this methodology should be mentioned.

      Added to the methods (lines 210-211). “PKD2 and PKD2L1 are in pET19b plasmid under T7 promoter.”

      (3) Since this paper is methodological, it would be useful to have some information about the stability of the GUVs containing the synthetic channel. In Methods, it is stated that GUV vesicles are used on the same day (line 207). And in line 193 it says that the reactions (?) are placed at 4°C for storage.

      Restated in lines 226-228: GUVs are electroformed and used for electrophysiology the same day. SUVs with channel incorporated are stored at 4°C for 3 days.

      (4) A comment reasoning why the PKD2 protein is more frequently incorporated into the membrane in comparison to PKD2L1 should be included. A brief description of the differences between these two proteins would also be helpful for the reader.

      In terms of overall protein production and oligomeric assembly, more PKD2L1 channels are produced than PKD2 channels (see new Figure 1C, and Figure 1—figure supplement 1D, E). In lines 149-155 we note that single channel openings were frequently observed for the high-expressing PKD2L1 channels, but this often resulted in patch instability. As a result, GUV patches with the lower-expressing PKD2-GFP channel were more stable and thus more successfully recorded from. We have revised the text to be clearer on this point.

      (5) There are no methods for preparing hippocampal neurons or IMCD cells shown in Figure 4 Supplement 1. Instead, the method of mammalian cultures provided corresponds to HEK 293T cells.

      This information has been added to lines 273-284.

      (6) Minor:

      In Figure 2C, please include the actual % of the Cell488+Surface647+Clear lumen vesicles.

      Added

      Line 99, 108: Figures 1B and 1C are swapped. Please correct.

      Corrected in figure and figure legends.

      Line 108: misspelling: effect.

      Done

      Line 109: check sentence: verb is missing.

      Sentence now reads “Minimal changes in fluorescence were detected when a control plasmid (Ctrl) encoding a non-fluorescent protein (dihydrofolate reductase) was used in the reaction.”

      Line 145: recoding. Correct.

      Recoding changed to recordings

      Line 169: "from" is missing (recorded from IMCD cilia).

      Added

      Line 169: In Table 1, the PKD2 K+ conductance magnitudes recorded from IMCD cilia were significantly smaller, not larger as stated, than those assayed using CFE-GUV system. Please correct.

      Corrected

      Line 180: "of" is missing (adaptation of CFE derived...).

      Corrected

      Line 182: "to" is missing (generalized to other channels).

      Corrected

      Line 193: "in" 4°C; correct to "at".

      Corrected

      Line 197: replace "mole" for "mol".

      Corrected

      Line 207: are used "within the" same day.

      Corrected

      Line 210: c-terminally. C-should be capital letter.

      Corrected

      Line 231: n-terminally. N- should be capital letter.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      The authors validated their method using PKD2 and PKD2L1 channels, demonstrating the potential of this approach. However, a few points merit further clarification or validation:

      (1) Stability of the protein vesicles for recording. The authors observed membrane instability during voltage transitions. It would be beneficial to discuss potential solutions to enhance stability.

      In lines 197-202, we have added a discussion of potential solutions to enhance stability. CsF in the intracellular saline could be added to stabilize the GUV membranes. CsF is frequently added to stabilize whole cell membranes in HTS planar patch clamp recording. We did not explore this formulation because Cs+ would limit outward polycystin conductance. We also suggest, but did not test, altering the membrane formulation of GUVs with additional cholesterol to stabilize these recordings.

      (2) Validation. Further discussion on how broadly this method can be applied to other channels would strengthen the manuscript.

      We have included further discussion on this point in lines 190-206. 

      (3) Protein production estimated by a standard GFP absorbance assay. The estimation of protein production using GFP absorption may be affected by improperly folded protein. Additional validation methods could be considered.

      C-terminal GFP fluorescence has been widely used in expression systems to designate proper folding of the target protein upstream of the GFP-tag (PMID: 22848743, PMID: 21805523, PMID: 35520093). Nonetheless, we have conducted additional experiments designed to estimate the amount of assembled PKD2 and PKD2L1 channels generated using the CFE method. In the new Figure 1—figure supplement 1D, E, we carried out fluorescence-detection size-exclusion chromatography and compared channel assembly of recombinant and CFE+SUV derived PKD2-GFP and PKD2L1-GFP. Here, we clearly observed tetrameric and protomeric forms of the channels using the synthetic approach, which supports the feasibility of using this method to assay their functional properties. Compared with recombinant protein from HEK cells, the production of assembled channels is less than 4% when using the CFE+SUV approach, an estimate based on the oligomer peak fluorescence. 

      (4) Single channels were observed more frequently from PKD2 incorporated GUVs compared to PKD2L1. Does this just randomly happen or is there a reason behind this difference?

      In terms of overall protein production and oligomeric assembly, more PKD2L1 channels are produced than PKD2 channels (Figure 1C, and Figure 1—figure supplement 1D, E). This is apparent whether the channels are produced recombinantly in cells or by the cell-free method (Figure 1—figure supplement 1D, E). In lines 149-155, we note that single channel openings were frequently observed but that the high expression of PKD2L1 often resulted in patch instability. As a result, GUV patches with the lower-expressing PKD2-GFP channel were more stable and thus more successfully recorded from. As requested, we have included a brief description of the two proteins in lines 76-78. 

      (5) Additional validation or clarification for examining the channel orientation may strengthen the manuscript.

      We have modified the text to make this point clearer. 

      (6) Advantage and limitations. The authors compared the recordings from hippocampal primary cilia membranes, noting differences in conductance magnitudes compared to the GUV method. Further discussing the limitations and advantages of this approach for the biophysical properties of organelle channels would be beneficial.

      We have revised the final paragraph to discuss the limitations of this method.

      (7) Including experiments that demonstrate ligand-induced activation or inhibition to further validate the current using this method would strengthen the manuscript (optional, not required).

      Despite our best attempts, exchange of the external bath to apply inhibitors (Gd3+, La3+) resulted in GUV patch instability. Our plans are to investigate ways to stabilize the high resistance seals to develop pharmacological screening using the CFE+GUV method.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We would like to thank the reviewers for their interest in our studies. In response to their comments, we have conducted additional experiments and made the necessary revisions to the manuscript. The new studies included to address the reviewers’ comments are shown in Figure 1B, 1F, Figure 2—figure supplement 1, Figure 3, Figure 3—figure supplement 1, Figure 3—figure supplement 2, Figure 3—figure supplement 3, Figure 4E, Figure 4—figure supplement 1, Figure 5, Figure 5—figure supplement 1, Figure 5—figure supplement 2D, and Figure 6. We are grateful for the critiques, which have helped us substantially improve the quality of the manuscript.

      Below, we have provided a point-by-point response to the reviewers’ comments.  

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper, the authors show that disruption of calcineurin, which is encoded by tax-6 in C. elegans, results in increased susceptibility to P. aeruginosa, but extends lifespan. In exploring the mechanisms involved, the authors show that disruption of tax-6 decreases the rate of defecation leading to intestinal accumulation of bacteria and distension of the intestinal lumen. The authors further show that the lifespan extension is dependent on hlh-30, which may be involved in breaking down lipids following deficits in defecation, and nhr-8, whose levels are increased by deficits in defecation. The authors propose a model in which disruption of the defecation motor program is responsible for the effect of calcineurin on pathogen susceptibility and lifespan, but do not exclude the possibility that calcineurin affects these phenotypes independently of defecation.

      We thank the reviewer for providing an excellent summary of our work. We have performed additional experiments as suggested by both the reviewers and believe we have thoroughly addressed all the reviewers' concerns.

      Reviewer #2 (Public Review):

      The manuscript titled "Calcineurin Inhibition Enhances Caenorhabditis elegans Lifespan by Defecation Defects-Mediated Calorie Restriction and Nuclear Hormone Signaling" by Priyanka Das, Alejandro Aballay, and Jogender Singh reveals that inhibiting calcineurin, a conserved protein phosphatase, in C. elegans affects the defecation motor program (DMP), leading to intestinal bloating and increased susceptibility to bacterial infection. This intestinal bloating mimics calorie restriction, ultimately resulting in an enhanced lifespan. The research identifies the involvement of HLH-30 and NHR-8 proteins in this lifespan enhancement, providing new insights into the role of calcineurin in C. elegans DMP and mechanisms for longevity.

      The authors present novel findings on the role of calcineurin in regulating the defecation motor program in C. elegans and how its inhibition can lead to lifespan enhancement. The evidence provided is solid with multiple experiments supporting the main claims.

      Strengths:

      The manuscript's strength lies in the authors' use of genetic and biochemical techniques to investigate the role of calcineurin in regulating the DMP, innate immunity, and lifespan in C. elegans. Moreover, the authors' findings provide a new mechanism for calcineurin inhibition-mediated longevity extension, which could have significant implications for understanding the molecular basis of aging and developing interventions to promote healthy aging.

      (1) The study uncovers a new role for calcineurin in the regulation of C. elegans DMP and a potential novel pathway for enhancing lifespan via calorie restriction involving calcineurin, HLH-30, and NHR-8 in C. elegans.

      (2) Multiple signaling pathways involved in lifespan enhancement were investigated with fairly strong experimental evidence supporting their claims.

      We thank the reviewer for an excellent summary of our work and for highlighting the strengths of the findings.

      Weaknesses:

      The manuscript's weaknesses include the lack of mechanistic details regarding how calcineurin inhibition leads to defects in the DMP and induces calorie restriction-like effects on lifespan.

      The exact site of calcineurin action, i.e., whether in the intestine or enteric muscles (Lee et al., 2005), and the possible molecular mechanisms linking calcineurin inhibition, DMP defects, and lifespan were not adequately explored. Although characterization of the full mechanism is probably beyond the scope of this paper, given the relative simplicity and advantages of using C. elegans as a model organism for this study, some degree of rigor is expected with additional straightforward control experiments as listed below:

      The authors state that tax-6 knockdown animals had drastically reduced expulsion events (Figure 2G), leading to irregular DMP (Lines 144-145), but did not describe the nature of DMP irregularity. For example, did the reduced expulsion events still occur with regular intervals but longer cycle lengths? Or was the rhythmicity completely abolished? The former would suggest the intestine clock is still intact, and the latter would indicate that calcineurin is required for the clock to function. Therefore, ethograms of DMP in both wild-type and tax-6 mutant animals are warranted to be included in the manuscript. Along the same line, besides the cycle length, the three separable motor steps (aBoc, pBoc, EMC) are easily measurable, with each step indicating where the program goes wrong, hence the site of action, which is precisely the beauty of studying C. elegans DMP. Unfortunately, the authors did not use this opportunity to characterize the exact behavior phenotypes of the tax-6 mutant to guide future investigations. Furthermore, it is interesting that about 64% of tax-6 (p675) animals had normal DMP. The authors attributed this to p675 being a weak allele. It would be informative to further examine tax-6 RNAi as in other experiments or to make a tax-6 null mutant with CRISPR. In addition, in one of the cited papers (Lee et al., 2005), the exact calcineurin loss-of-function strain tax-6(p675) was shown to have normal defecation, including normal EMC, while the gain-of-function mutant of calcineurin tax-6(jh107) had abnormal EMC steps. It wasn't clear from Lee et al., 2005, if the reported "normal defecation" was only referring to the expulsion step or also included the cycle length. Nevertheless, this potential contradiction and calcineurin gain-of-function mutant is highly relevant to the current study and should be further explored as a follow-up to previously reported results. 
For some of the key experiments, such as tax-6's effects on susceptibility to PA14, DMP, intestinal bloating, and lifespan, additional controls, as the norm of C. elegans studies, including second allele and rescue experiments, would strengthen the authors' claims and conclusions.

      We have now included lifespan, survival on P. aeruginosa, and DMP data using an additional knockout allele, tax-6(ok2065). Additionally, we have added ethograms of DMP for both tax-6 RNAi and the tax-6(ok2065) mutant. Our observations indicate that tax-6 inhibition leads to a complete loss of DMP rhythmicity, suggesting that calcineurin is essential for maintaining the DMP clock. While characterizing the DMP, we noticed that expulsion events appeared superficial in the tax-6(ok2065) mutant, with little to no gut content released. Consequently, we examined the movement of gut content and found that both tax-6(ok2065) mutants and tax-6 knockdown animals showed significantly reduced gut content movement. The new findings on the characterization of DMP are presented in Figure 2—figure supplement 1, Figure 3, Figure 3—figure supplement 1, and Figure 3—figure supplement 2. The text in the results section reads (lines 160-176): “Next, we investigated whether the reduced number of expulsion events was due to regular intervals with longer cycle lengths or if rhythmicity was entirely disrupted upon tax-6 knockdown. To assess this, we obtained ethograms of the DMP for N2 animals grown on control and tax-6 RNAi. While animals on control RNAi displayed regular cycles of pBoc, aBoc, and EMC, the tax-6 RNAi animals exhibited disrupted rhythmicity (Figure 3A and Figure 3—figure supplement 1). Most tax-6 knockdown animals lacked the pBoc and aBoc steps and had sporadic expulsion events. Isolated pBoc events were occasionally observed, indicating a complete loss of rhythmicity in tax-6 knockdown animals. Ethograms for tax-6(ok2065) animals also showed disrupted rhythmicity (Figure 3B and Figure 3—figure supplement 2). Although the number of expulsion events appeared higher in tax-6(ok2065) animals compared to tax-6 RNAi animals (Figure 3—figure supplement 1 and 2), these expulsion events seemed superficial, releasing little to no gut content. 
This suggested slow movement of gut content in tax-6(ok2065) animals, leading to constipation and intestinal bloating. We examined gut content movement by measuring the clearance of blue dye (erioglaucine disodium salt) from the gut. The clearance was significantly slower in tax-6(ok2065) animals compared to N2 animals (Figure 3C), indicating impaired gut content movement due to the loss of tax-6. Similarly, tax-6 knockdown animals also showed significantly slowed gut content movement (Figure 3D).”
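
      The distinction drawn above, between regular cycles that are merely longer and a loss of rhythmicity, can be illustrated numerically. The following is a minimal sketch and not the analysis used in the manuscript: it assumes expulsion times scored in seconds and uses the coefficient of variation (CV) of the intervals between events as a simple regularity measure. The function name and all event times are hypothetical.

```python
from statistics import mean, pstdev


def interval_stats(event_times_s):
    """Return (mean interval, coefficient of variation) for sorted event times in seconds."""
    intervals = [b - a for a, b in zip(event_times_s, event_times_s[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three events to assess rhythmicity")
    m = mean(intervals)
    # CV near 0 means evenly spaced events; a large CV means irregular timing.
    return m, pstdev(intervals) / m


# Hypothetical expulsion times (s), invented for illustration only.
regular = [0, 50, 100, 150, 200]        # short, regular cycles
slow_regular = [0, 90, 180, 270, 360]   # longer cycles, still regular
irregular = [0, 20, 140, 170, 360]      # same mean interval, no rhythm

for name, times in [("regular", regular),
                    ("slow_regular", slow_regular),
                    ("irregular", irregular)]:
    m, cv = interval_stats(times)
    print(f"{name}: mean interval = {m} s, CV = {cv:.2f}")
```

      In this toy data the slow but regular series and the irregular series share the same mean interval, so only the CV separates them. This is why cycle length alone cannot answer the reviewer's question and ethograms are needed.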

      Moreover, we have added a potential reason for the tax-6(p675) contradictory results from Lee et al., 2005 (lines 154-159): “At the 1-day-old adult stage, about 36% of tax-6(p675) animals showed irregular and slowed DMP, while the remainder had regular DMP (Figure 2H), suggesting that tax-6(p675) is a weak allele. The fraction of the animals with irregular DMP appeared to increase with age, indicating that this phenotype might be age-dependent. This may also explain why tax-6(p675) animals were reported to have a normal defecation cycle in an earlier study (Lee et al., 2005).”

      The second weakness of this manuscript is the data presentation for all survival rate curves. The authors stated that three independent experiments or biological replicates were performed for each group but only showed one "representative" curve for each plot. Without seeing all individual datasets or the averaged data with error bars, there is no way to evaluate the variability and consistency of the survival rate reported in this study.

      We now provide the data for all replicates in the source data files.
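
      For readers who wish to compare replicates from the source data, one common way to turn per-animal records into a survival curve is the Kaplan-Meier estimator. The sketch below assumes that approach; the response does not state which estimator the authors used, and the function name and example values are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  day of death or censoring for each animal
    events: 1 if the animal died at that time, 0 if it was censored
    Returns a list of (time, surviving fraction) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surviving = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        removed = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surviving *= 1 - deaths / at_risk
            curve.append((t, surviving))
        # Dead and censored animals both leave the risk set after time t.
        at_risk -= removed
        i += removed
    return curve


# Hypothetical replicate: four animals, one censored on day 12.
for day, fraction in kaplan_meier([10, 12, 12, 15], [1, 1, 0, 1]):
    print(day, round(fraction, 3))
```

      Running the same function on each replicate separately and overlaying the curves gives a direct view of the replicate-to-replicate variability that the reviewer asked about.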

      Overall, the authors' claims and conclusions are justified by their data, but further experiments are needed to confirm their findings and establish the detailed mechanisms underlying the observed effects of calcineurin inhibition on the DMP, calorie restriction, and lifespan in C. elegans.

      We have conducted additional experiments to elucidate the role of calcineurin in the DMP and to investigate the impact of the DMP on calorie restriction and lifespan in C. elegans, as described in the various responses to the reviewers’ comments. 

      Recommendations for the authors:

      Our specific comments to guide the authors, should they choose to revise the manuscript:

      The RNAi experiments in the eat-2 mutant background are difficult to interpret. If these animals are eating fewer bacteria, it is possible that there is also less tax-6 dsRNA being ingested and therefore less tax-6 inactivation. These experiments should be conducted with a tax-6 null allele.

      We have included lifespan experiments with the eat-2(ad465);tax-6(ok2065) double mutant, along with the individual single mutant controls, as shown in Figure 4E. These results demonstrate that the eat-2 mutation does not further extend the lifespan of the tax-6(ok2065) mutant. Additionally, we confirmed that the eat-2(ad465) mutants do not exhibit defects in feeding-based RNAi (Figure 4—figure supplement 1).

      While aak-2, hlh-30, and nhr-8 mutants may not have an eat phenotype, the negative tax-6 RNAi results should be confirmed with a tax-6 null mutant to obviate the consideration that these background mutations reduce RNAi efficacy.

      The genes hlh-30 and nhr-8 are located very close to tax-6 on chromosome IV (https://wormbase.org//#012-34-5), which made it challenging to generate double mutants. However, we tested the RNAi sensitivity of the hlh-30(tm1978) and nhr-8(ok186) mutants and confirmed that they are not defective in RNAi (Figure 5—figure supplement 1). We also found that tax-6 RNAi disrupted the DMP in both hlh-30(tm1978) and nhr-8(ok186) mutants (Figure 5—figure supplement 2). Furthermore, our results show that hlh-30(tm1978) and nhr-8(ok186) animals have increased susceptibility to P. aeruginosa upon tax-6 knockdown (Figure 6A, B), indicating that tax-6 RNAi was effective in these mutants. Since the phenotype in the aak-2 mutant was only partially observed, we did not conduct further experiments with aak-2 mutants.

      Reviewer #1 (Recommendations For The Authors):

      The low penetrance of defecation cycle defects in tax-6(p675) worms brings into question the role of the defecation deficits in the phenotypes caused by the disruption of tax-6. At the same time, the low penetrance provides a golden opportunity to test this. Do tax-6(p675) worms with a normal defecation cycle length have extended longevity? Increased susceptibility to bacterial pathogens? Smaller body size? Distended lumen? Decreased fat accumulation? Increased pha-4 and nhr-8 expression? It would be relatively straightforward to measure defecation cycle length in individual tax-6(p675) worms, bin them into normal defecation and slow defecation groups, and then compare the above-mentioned phenotypes.

      We appreciate the reviewer's interesting suggestion. However, the DMP defect phenotype in tax-6(p675) worms appears to be age-dependent, with the number of DMP-defective worms increasing as they age. Additionally, we observed that exposure to P. aeruginosa accelerates the onset of DMP defects in tax-6(p675) worms. As a result, tax-6(p675) worms are not suitable for the type of experiments the reviewer suggested. Nevertheless, we believe that the additional data using the tax-6(ok2065) mutant, along with the characterization of ethograms of DMP, firmly establish the role of calcineurin in maintaining a regular DMP in C. elegans.

      Another way to dissect specific effects of calcineurin disruption from phenotypes resulting from defecation motor program deficits would be to further characterize other worms with deficits in defecation (flr-1, nhx-2, pbo-1 RNAi). It is mentioned that they have decreased lifespan. Do they also show increased susceptibility to bacterial pathogens? Do they show decreased fat? Is their lifespan dependent on HLH-30 and NHR-8?

      We thank the reviewer for this important suggestion. We have now included data with flr-1, nhx-2, and pbo-1 RNAi, which shows that the knockdown of these genes also enhances susceptibility to P. aeruginosa (Figure 3—figure supplement 3G). Knockdown of these genes is already known to reduce fat levels in N2 worms, and we demonstrate that they similarly reduce fat levels in hlh-30(tm1978) and nhr-8(ok186) animals (Figure 5B, C, F, G). Additionally, we found that the increased lifespan observed upon knockdown of these genes (as well as with tax-6 knockdown) is dependent on HLH-30 and NHR-8 (Figure 5A, D).

      To place "enhanced susceptibility to pathogen" within the proposed model, it would be important to examine the effect of HLH-30 and NHR-8 disruption on this phenotype. The proposed model suggests that this phenotype is independent of HLH-30 and NHR-8, but this should be tested experimentally. Similarly, it would be important to test the effect of HLH-30 and NHR-8 disruption on defecation cycle length to determine if defecation deficits are upstream or downstream of deficits in the defecation motor program.

      We show that the knockdown of tax-6 leads to defects in the DMP in hlh-30(tm1978) and nhr-8(ok186) animals (Figure 5—figure supplement 2). Moreover, we show that hlh-30(tm1978) and nhr-8(ok186) animals have increased susceptibility to P. aeruginosa upon tax-6 knockdown (Figure 6A, B). These results are described as (lines 279-285): “Given that HLH-30 and NHR-8 are essential for lifespan extension upon calcineurin inhibition, we investigated whether these pathways also influence survival in response to P. aeruginosa infection following calcineurin knockdown. Both hlh-30(tm1978) and nhr-8(ok186) animals showed significantly reduced survival upon tax-6 RNAi (Figure 6A, B). These findings suggested that the reduced survival on P. aeruginosa following calcineurin inhibition is independent of HLH-30 and NHR-8 and is more likely due to increased gut colonization by P. aeruginosa resulting from DMP defects (Figure 6C).”

      Is the lifespan of tax-6(p675) increased? This would be important to measure and include in Figure 1.

      Indeed, the lifespan of tax-6(p675) mutants is increased. We have included the lifespan of tax-6(p675) and tax-6(ok2065) in Figure 1F.

      In Figure 2, disruption of tax-6 appears to result in a clear decrease in body size. To what extent is the decrease in fat/worm in Figure 3 simply a result of the worms being smaller? Perhaps, a measurement of Oil-Red-O intensity PER AREA would be a more appropriate measure.

      The ORO intensity values we had shown per animal were already area normalized. We have now indicated this in the Figure Legends.
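
      The reviewer's concern and the normalization described in this reply can be made concrete with a small numerical sketch. It assumes that area normalization means dividing the summed stain intensity inside the worm outline by the outline area; the function name and all values are hypothetical and are not taken from the manuscript.

```python
def area_normalized_intensity(total_intensity, area_px):
    """Summed stain intensity divided by the worm's outlined area in pixels."""
    if area_px <= 0:
        raise ValueError("area must be positive")
    return total_intensity / area_px


# Hypothetical worms with identical staining density but different body sizes.
large_worm = {"total_intensity": 12000.0, "area_px": 6000}
small_worm = {"total_intensity": 8000.0, "area_px": 4000}

for name, worm in [("large", large_worm), ("small", small_worm)]:
    value = area_normalized_intensity(worm["total_intensity"], worm["area_px"])
    print(name, worm["total_intensity"], value)
```

      The smaller worm has a lower total intensity but the same normalized value, so a difference that persists after normalization cannot be attributed to body size alone.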

      There are multiple long-lived mutant strains such as clk-1 and isp-1 that have an increased defecation cycle length. To what extent do these worms exhibit phenotypes similar to tax-6 disruption? isp-1 have increased resistance to bacterial pathogens suggesting that defecation motor program deficits are not sufficient to increase susceptibility to bacterial pathogens.

      We have now examined the clk-1 and isp-1 mutants and found that these mutants exhibit reduced gut colonization by P. aeruginosa compared to N2 animals. This reduction in colonization may be attributed to the slowed pharyngeal pumping rates observed in these mutants. These findings suggest that the phenotypes associated with a slow DMP versus a disrupted DMP could be significantly different. The manuscript with the new data on these mutants reads (lines 177-192): “We then explored whether the disruption of DMP rhythmicity due to tax-6 knockdown affected P. aeruginosa responses similarly to longer but regular DMP cycles. To do this, we studied P. aeruginosa colonization in clk-1(qm30) and isp-1(qm150) mutants, which have regular but extended DMP cycles (Feng et al., 2001; Wong et al., 1995). Interestingly, both clk-1(qm30) and isp-1(qm150) mutants showed significantly reduced intestinal colonization by P. aeruginosa compared to N2 animals (Figure 3—figure supplement 3A-D). This reduced colonization could be attributed to their significantly decreased pharyngeal pumping rates (Wong et al., 1995; Yee et al., 2014), suggesting a lower intake of bacterial food in these mutants. While the survival of clk-1(qm30) animals on P. aeruginosa was comparable to N2 animals (Figure 3—figure supplement 3E), isp-1(qm150) animals exhibited significantly improved survival (Figure 3—figure supplement 3F). Conversely, knockdown of flr-1, nhx-2, and pbo-1 in N2 animals resulted in significantly reduced survival on P. aeruginosa compared to control RNAi (Figure 3—figure supplement 3G). Knockdown of these genes causes complete disruption of DMP rhythmicity, increasing gut colonization by P. aeruginosa (Singh and Aballay, 2019a). Overall, these findings demonstrated that calcineurin is crucial for maintaining the DMP ultradian clock, and its inhibition increases susceptibility to P. aeruginosa by disrupting the DMP.”

      Line 192. This statement is speculative. There is no evidence that HLH-30 is mediating lipid depletion in these worms.

      We have removed this statement. We observed that the knockdown of flr-1, nhx-2, and pbo-1 resulted in significant fat depletion in hlh-30(tm1978) animals (Figure 5B, C). Additionally, tax-6 knockdown also caused a small but significant reduction in fat levels in hlh-30(tm1978) animals. This contrasts with our initial submission, possibly due to the increased number of animals included in the analysis. These findings suggest that the increase in lifespan due to DMP defects requires HLH-30, likely through a mechanism independent of HLH-30’s role in fat depletion. We have updated the manuscript text and model (Figure 6C) accordingly.

      In Figure S2, tax-6 RNAi appears to have a more detrimental effect in pmk-1 mutants than the other mutants. The authors should comment on this.

      We have added the following sentence in the manuscript (lines 123-125): “The knockdown of tax-6 appeared to have a more pronounced effect in pmk-1(km25) mutants than in other mutants, suggesting that inhibition of tax-6 might exacerbate the adverse effects observed in pmk-1(km25) mutants.”

      Reviewer #2 (Recommendations For The Authors):

      Line 192-193: The statement is confusing and not accurate because HLH-30 did not enhance lifespan with or without calcineurin (Figure 4A and S4A, also in Lapierre 2023). The takeaway should be along the lines of calcineurin inhibition enhancing lifespan through HLH-30 or HLH-30 being required for lifespan enhancement via calcineurin inhibition.

      We have removed this statement. We now state (lines 237-239): “Knockdown of tax-6 did not extend the lifespan of hlh-30(tm1978) animals (Figure 5A), indicating that HLH-30 is required for the increased lifespan observed with calcineurin inhibition.”

      Line 261: Similar to the point above. Where is the data showing NHR-8 increases lifespan with or without calcineurin?

      We have removed this sentence.

      Figure 1 legend line 699: animals per condition per replicate >90, but in the Method section Line 317, it says more than 80 animals per condition per replicate. Could be more accurate.

      We have now specified in the Methods section that the exact number of animals per condition is provided in the source data files. Since different lifespan curves within a given figure panel had varying numbers of animals, we have indicated the lower boundary for all curves (including the replicates). The precise number of animals for each lifespan experiment is available in the source data files.

      Figures 2F and G, "tax-6" should be labeled as "tax-6 RNAi" to be consistent with other figures.

      We thank the reviewer for this suggestion and have updated the label to “tax-6 RNAi”.

      In summary, we would like to thank the reviewers again for providing constructive critiques. We believe we have fully addressed all the concerns of the reviewers by carrying out several new experiments and modifying the text. The manuscript has undergone substantial revision and has thereby improved significantly. We do hope that the evidence in support of the conclusions is found to be complete in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors present the cryo-EM structure of the PSI-fucoxanthin chlorophyll a/c-binding proteins (FCPs) supercomplex from the diatom Thalassiosira pseudonana CCMP1335 at a global resolution of 2.3 Å. This exceptional resolution allows the authors to construct a near-atomic model of the entire supercomplex and elucidate the molecular details of FCPs arrangement. The high-resolution structure reveals subunits not previously identified in earlier reconstructions and models, as well as sequence analysis of PSI-FCPIs from other diatoms and red algae. Additionally, the authors use their model in conjunction with a phylogenetic analysis to compare and contrast the structural features of the T. pseudonana supercomplex with those of Chaetoceros gracilis, uncovering key structural features that contribute to the efficiency of light energy conversion in diatoms.

      The study employs the advanced technique of single particle cryo-electron microscopy to visualize the complex architecture of the PSI supercomplex at near-atomic resolution and analyze the specific roles of FCPs in enhancing photosynthetic performance in diatoms.

      Overall, the approach and data are both compelling and of high quality. The paper is well written and will be of wide interest for comprehending the molecular mechanisms of photosynthesis in diatoms. This work provides valuable insights for applications in bioenergy, environmental conservation, plant physiology, and membrane protein structural biology.

      We thank you very much for your highly positive evaluation and comments on our manuscript.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript elucidated the cryo-electron microscopic structure of a PSI supercomplex incorporating fucoxanthin chlorophyll a/c-binding proteins (FCPs), designated as PSI-FCPI, isolated from the diatom Thalassiosira pseudonana CCMP1335. Combining structural, sequence, and phylogenetic analyses, the authors provided solid evidence to reveal the evolutionary conservation of protein motifs crucial for the selective binding of individual FCPI subunits and provided valuable information about the molecular mechanisms governing the assembly and selective binding of FCPIs in diatoms.

      Strengths:

      The manuscript is well-written and presented clearly as well as consistently. The supplemental figures are also of high quality.

      Weaknesses:

      Only minor comments (provided in recommendations for authors) to help improve the manuscript.

      We thank you very much for your highly positive evaluation and comments on our manuscript.

      Reviewer #3 (Public Review):

      Summary:

      Understanding the structure and function of the photosynthetic machinery is crucial for grasping its mode of action. Photosystem I (PSI) plays a vital role in light-driven electron transfer, which is essential for generating cellular reducing power. A primary strategy to mitigate light and environmental stresses involves incorporating peripheral light-harvesting proteins. Among various lineages, the number of LHCIs and their protein and pigment compositions differ significantly in PSI-LHCI structures. However, it is still unclear how LHCIs recognize their specific binding sites in the PSI core. This study aims to address this question by obtaining a high-resolution structure of the PSI supercomplex, including fucoxanthin chlorophyll a/c-binding proteins (FCPs), referred to as PSI-FCPI, isolated from the diatom Thalassiosira pseudonana. Through structural and sequence analyses, distinct protein-protein interactions are identified at the interfaces between FCPI and PSI subunits, as well as among FCPI subunits themselves.

      Strengths:

      The primary strength of this work lies in its superb isolation and structural determination, followed by clear discussion and conclusions. However, the interactions among the protein complexes and their relevance in formulating general rules are not definitively established. While efficiency is a crucial aspect, preventing damage is equally important, and currently, we cannot infer this from the provided structures.

      Weaknesses:

      The interactions among the protein complexes and their relevance in formulating general rules are not definitively established. While efficiency is a crucial aspect, preventing damage is equally important, and currently, we cannot infer this from the provided structures.

      We thank you very much for your highly positive evaluation and comments on our manuscript. This study aims to decipher the interactions among the different protein subunits within the PSI-FCPI supercomplex, from which we hope to draw general rules. While we agree that preventing damage is equally important, it is unclear to us what kind of damage you are referring to, and we consider that this topic may need to be treated in a separate publication, as we cannot elucidate everything in one paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Line 69: "Diatoms are one of the most important phytoplankton in aquatic environments and contribute to the primary production in the ocean remarkably." Check the sentence, something is missing.

      We modified the sentence as follows:

      "Diatoms are among the most essential phytoplankton in aquatic environments, playing a crucial role in the global carbon cycle, supporting marine food webs, and contributing significantly to nutrient cycling, thus ensuring the health and sustainability of marine ecosystems"

      (2) Supplementary Figure 1B: The SDS-PAGE gel shows multiple bands. Do the authors know the identity of these proteins, or have they considered analyzing the bands using mass spectrometry? The band at ~17 kDa is particularly intense. Could you comment on this? Have you tried running a Native-PAGE gel?

      We did not identify protein bands by MS analysis. The protein bands in the PSI-FCPI supercomplex of this diatom have been identified by Ikeda et al. 2013. The protein bands of our sample were similar to those of Ikeda et al. 2013. To explain this, we modified the sentences and cited Ikeda et al. 2013 in the revised manuscript (lines 89-91).

      "The PSI-FCPI supercomplexes were purified from the diatom T. pseudonana CCMP1335 and analyzed by biochemical and spectroscopic techniques (Fig. S1). Notably, the protein bands of PSI-FCPI closely resembled those reported in a previous study (31)."

      The ~17 kDa protein band appears to correspond to FCPIs, which were identified in Ikeda et al. 2013. We did not perform BN-PAGE of this sample; however, we performed trehalose density gradient centrifugation (Fig. S1A).

      (3) Can the authors comment on the position of the FCPI subunits in the PSI supercomplex in diatoms compared to the arrangement of LHCIs in complex with PSI in cyanobacteria, green algae, and angiosperms? This information would be useful to incorporate into the text.

      We previously compared the PSI-FCPI structures of the diatom C. gracilis to the PSI-LHCI structures of land plant, green alga, and red alga (Nagao et al., 2020). Also, Xu et al. 2020 compared the C. gracilis PSI-FCPI structure to the PSI-LHCI structures of land plant, green alga, and red alga. The binding sites between FCPIs and LHCIs are conserved to some extent. However, our recent study revealed that no orthologous relationship exists among LHCs bound to PSI between primitive red algae and diatoms (Kato et al., 2024). Consequently, we found that the information obtained from structural comparisons alone is extremely limited. To avoid misinterpretation, this study focused on comparing the structures and amino acid sequences of FCPIs between T. pseudonana and C. gracilis.

      (4) Line 104: Despite achieving high resolution, the authors modeled only six lipid densities (the PDB model contains actually 9 lipids, you should correct it in the text). Do you believe this is due to the detergent used for purification? Can you comment on the position, identity, and potential role of the lipids within your model?

      There are 6 lipids associated with the PSI core and 3 with FCP, giving a total of 9 lipids. We have described this in our original text (lines 102-104 in the modified manuscript). Additionally, our structure reveals unidentified densities which likely represent lipids; they are modeled as 88 unknown lipids (UNLs). Thus, there are more lipids in the supercomplex. However, we also observed 4 β-DDM molecules (LMT) in the structure, which are used as detergents. Thus, it is possible that some lipids have dissociated and been replaced by detergents. Many of the observed lipids are located between subunits, likely contributing to the stabilization of the complex.

      (5) Line 111: The global resolution is very high. Why does the unknown protein have such low resolution that it was impossible to model it properly and perform de novo identification from the density map? Is it due to a lower abundance of particles with this subunit bound? Have you tried improving this with 3D classification/ focus refinement /density modification?

      The Unknown subunit (UNK) is located peripherally, and its density is significantly lower compared to the neighboring subunits, which may suggest a low abundance. We applied density modification using Topaz for 3D map denoising, but the effect was minimal. As the low abundance of UNK may be the cause, 3D classification and focus refinement also had limited impact.

      (6) Figure 2A: It would be useful to show the density map for the subunit together with the model, especially to demonstrate visualization of the long loop.

      We added the model and map of Psa29 to Figure S4C in the revised manuscript.

      (7) Given the proximity of Psa29 to PsaC, is the protein involved in electron shuttling? If so, could you comment on this? In line 131, you state that Psa29 was not found in other organisms. Can the authors speculate on the potential role of this protein in diatoms?

      The function of Psa29 is unknown at present. However, Psa29 does not contain any cofactors, indicating that it does not contribute to electron transfer reactions. To understand the function of Psa29, a deletion mutant of this gene is required for examining its functional and physiological roles in diatom photosynthesis. To explain this, we added the following sentences to the revised manuscript (lines 129-133):

      "However, the functional and physiological roles of Psa29 remain unclear at present. It is evident that Psa29 does not have any pigments, quinones, or metal complexes, suggesting no contribution of Psa29 to electron transfer reactions within PSI. Further mutagenesis studies will be necessary to investigate the role of Psa29 in diatom photosynthesis."

      (8) Line 163: "Among the FCPI subunits, only FCPI-1 has BCRs in addition to Fxs and Ddxs (Figure S6A). FCPI-1 is a RedCAP, which belongs to the LHC protein superfamily but is distinct from the LHC protein family (6, 7)." It would be useful if the authors could add the carotenoid model embedded in the cryoEM density map to the figure to show the features that led to modeling BCR instead of other carotenoids. Additionally, it would be helpful to include in the text why RedCAPs differ from LHCIs and their proposed role.

      We added the model and map of two BCRs in FCPI-1 (RedCAP) to Figure S4F in the revised manuscript.

      Phylogenetic analysis showed that RedCAPs are distinct from the LHC protein family. This has been explained in lines 163-164. Also, the functional and physiological roles of RedCAP remain unclear. To explain this, we added the sentence "; however, the functional and physiological roles of RedCAP remain unclear" to the revised manuscript (lines 164-165).

      (9) Line 185: "However, it is unknown (i) whether CgRedCAP is indeed bound to the C. gracilis PSI-FCPI supercomplex and (ii) if a loop structure corresponding to the Q96-T116 loop of TpRedCAP exists in CgRedCAP." Have the authors attempted to model the protein using AlphaFold? If so, are there significant differences? Could you speculate on the absence of RedCAP in C. gracilis? Do you believe it is due to using a different detergent or related to environmental factors?

      We did not model CgRedCAP using AlphaFold. Our recent study (Kato et al., 2024) proposed that CgRedCAP binds to the LHCI-1 site in the PSI-FCPI structure based on sequence comparison. There are two types of PSI-FCPI supercomplexes, one having 16 FCPIs and the other having 24 FCPs, from C. gracilis. The different antenna sizes may depend on the growth conditions of C. gracilis (Nagao et al. 2020). These explanations were already described in the manuscript (lines 243-246).

      (10) Line 193: Figure 8 is mentioned before Figures 4-7.

      We apologize for the error in the figure number. Figure 8 should have been Supplementary Figure 8, so we changed it to Fig. S8B in the revised manuscript.

      (11) Line 223: FCPI-4 interacts only with FCPI-5, primarily through the interaction of Y196/4 with the FCPI-5 backbone. Is this interaction facilitated by other factors such as lipids, carotenoids, or other ligands? Also, FCPI-4 occupies a peculiar position compared to other LHCIs proteins (it is peripheral to FCPI-4 and FCPI-5). Do you believe this could be due to a transient interaction with the complex? Could the presence of this protein be related to the growth conditions experienced by the plant? Are there any literature reports on environmental conditions influencing FCPI arrangements? Including this information in the text would be interesting.

      Y196/4 interacts only with the backbone through hydrogen bonds; therefore, other cofactors do not contribute to the interaction.

      We do not believe that the interaction of FCPI-4 is transient; rather, this binding appears to be stable within the complex. Given that the PSI-FCPI supercomplexes were isolated by anion exchange chromatography, FCPI-4 and FCPI-5 are tightly associated within this complex. However, it is important to note that the expression of diatom FCPI proteins can indeed vary depending on growth conditions, as highlighted in our previous study (Nagao et al., 2020). While the peculiar position of FCPI-4 may not be directly related to transient interactions, environmental conditions could still influence the overall arrangement and expression levels of FCPIs. This information has already been described in the manuscript (lines 243-246).

      (12) Given the high resolution of your map, the overall model quality does not seem to match the map quality. Specifically, the clash score (10) and sidechain outliers (3%) are elevated. Could you comment on this? Do you believe it is related to the high number of ligands?

      Our structure contains a total of 295 ligands, including cofactors, detergents, and unknown lipids. We believe the high clash score and number of sidechain outliers are due to the large number of ligands present.

      (13) Supplementary Figure 2: You should show the 3D classes that were discarded.

      According to your comment, we added the 3D classes that were discarded and the sentence "Red boxes highlight selected particles from each 3D classification." to Figure S2 and its legend in the revised manuscript.

      (14) Which masks were used for refinement? How were they generated, and which parameters were chosen? This information should be added to the Materials and Methods section. You should show the masks used during classification, for example.

      We used a 240 Å spherical mask for refinement and classification, without applying any reference mask as input. To explain this, we added the corresponding sentence to Methods in the revised manuscript (lines 347-348) as follows:

      "A 240-Å spherical mask was used during the 3D classification and refinement processes."

      (15) Were any extra proteins detected in the early stages of the cryoEM analysis (i.e., 2D classification) that were discarded? Could you visualize the superior oligomeric states of the supercomplex?

      In the single-particle analysis, no larger particles than the analyzed complex were detected. The results of 2D classification using a sufficiently large spherical mask with a diameter of 320 Å are shown below.

      Author response image 1.

      (16) Have you tried using cryoSPARC for data analysis? If so, could you comment on that?

      We did not use cryoSPARC for data analysis.

      Reviewer #2 (Recommendations For The Authors):

      I have some minor comments below to help improve the manuscript. The line numbers below refer to those in the Word version of the manuscript.

      (1) Figure 1 legend, line 559, "membrane normal"? Panel A and B, structures with the same colors, do they refer to the closely related or interacted parts? For example, the red color for FCP1-1 in A and PsaA in B. If not, the authors may want to clarify it.

      The term 'membrane normal' refers to the direction perpendicular to the surface of a membrane. It is a concept frequently used in physics and biology to describe the orientation relative to the membrane's plane.

      The shared colors in Figure 1 were not intended to indicate closely related or interacting parts. According to your comments, we revised the colors of the subunits in the revised manuscript.

      (2) Line 109-117. "Psa28 is a novel subunit found in the C. gracilis PSI-FCPI structure, and its name follows the nomenclature as suggested previously (31).... After psaZ, the newly identified genes should be named psa27, psa28, etc., and the corresponding proteins are called Psa27, Psa28, etc... Psa28 was also named PsaR in the PSI-FCPI structure of C. gracilis (16)". It is confusing. Was Psa28 named twice, PsaR and Psa28? It would be helpful to add a simple explanation here.

      According to your comment, we modified the sentence as follows (lines 117-118):

      "However, Xu et al. named the subunit as PsaR in the PSI-FCPI structure of C. gracilis"

      (3) Line 134, "One of the Car molecules in PsaJ was identified as ZXT103 in the T. pseudonana PSI-FCPI structure but it is BCR112 in the C. gracilis PSI-FCPI structure (15)". Figure S4D mentioned BCR863 but did not mention BCR112. Figure S4C, D, it may need better explanations of the colors and labels, and indicate which parts are from T. pseudonana or C. gracilis.

      BCR112 was misnumbered; the correct number is BCR103. In response to your comments, we revised Figure S4C and D by labeling the characteristic pigments in the revised manuscript.

      (4) Figure S7, although mentioned in the legend, it would be helpful to label interaction pairs on the figure directly with corresponding colours.

      According to your comments, we modified the Figure and legends in the revised manuscript.

      (5) Figure 3E, it is better to avoid red/green colours in one figure as some readers may be colour-blind. It would also be helpful to label each FCPI with the same colour as its structure on the figure directly.

      According to your comments, we modified Figure 3E in the revised manuscript.

      (6) Line 185, "structures similar to the Q96-T116 loop in TpRedCAP found in the present study (Figure 8B).". The authors refer to Figure S8B? I have the same comment for line 186, Figure 8C.

      We apologize for the incorrect figure number. Figure 8 should have been Supplementary Figure 8, so we changed it to Fig. S8B in the revised manuscript.

      (7) Line 270, "TpLhcq10 cannot bind at the FCPI-2 site". Why not use FCPI-3 for TpLhcq10?

      This means that the gene product of TpLhcq10 binds at the FCPI-3 site but not at the other sites such as FCPI-2. To avoid misreading, we modified the sentence as follows:

      "TpLhcq10 binds specifically at the FCPI-3 site but not at the other sites such as FCPI-2" (lines 278-279)

      Reviewer #3 (Recommendations For The Authors):

      I have no technical or conceptual suggestions at the current stage.

      Thank you.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      - The authors should think about revising the terminology used to describe electrophysiological data in zebrafish (Fig.5): "posterior" hair cells in a neuromast are sensitive to posterior-to-anterior flow, which is currently termed "anterior". This is confusing because when "posterior" or "anterior" is used, for instance in the labels of the figure, one may get confused about whether this applies to hair-cell position or directionality of the stimulus. It would help to always use clearer terminology for the stimulus (e.g. posterior-to-anterior (P-to-A) as in Kindig 2023, or "from the tail"). Also, the authors may want to clarify what we should see in Fig.5 demonstrating that posterior hair cells, with reversed hair-bundle polarity, actually evince transduction of similar magnitude as anterior hair cells, with normal polarity of their hair bundles. 

      This nomenclature can indeed be confusing. Per the reviewers' request, we have changed the terminology to always refer to the direction of flow sensed by the hair cells, for example, HCs that respond to posterior-directed flow or anterior-directed flow. We now denote these HCs as (A to P) and (P to A), respectively, in the Figure for clarity. We have modified Figure 5, the Figure 5 legend, and the Results (starting at line 339) to reflect these changes.

      In addition, in our results we now provide more context when comparing the response magnitude of the anterior-sensing hair cells in gpr156 mutants to the response magnitude of the two different orientations of hair cells in controls.

      - Also, does it make sense that there is no defect in MET for mouse otolith organs with deleted GPR156, whereas there is a difference in the zebrafish lateral line? It would help motivate the study on mechanoelectrical transduction (see comment of Reviewer 1 below). 

      We previously discussed this point and recognized that subtle effects remain possible in mouse (previously Discussion line 614). We have now modified the text in the Discussion to better emphasize this point (new line 627). The Eatock lab is currently working on developing calcium imaging in the mouse utricle to revisit this question in a future study. "Subtle effects remain possible, however, given the variance in single-cell electrophysiological data from both control and mutant mice. Nevertheless, current results are consistent with normal HC function in the Gpr156 mouse mutant, a prerequisite to interrogate how non-reversed HCs affects vestibular behavior."

      To help motivate transduction studies starting in the second Result paragraph, we added a transition at Line 205 that was indeed lacking:

      "Gpr156 inactivation could be a powerful model to specifically ask how HC reversal contributes to vestibular function. However, GPR156 may have other confounding roles in HCs besides regulating their orientation, similar to EMX2, which impacts mechanotransduction in zebrafish HCs (Kindig et al., 2023) and afferent innervation in mouse and zebrafish HCs (Ji et al., 2022; Ji et al., 2018)."

      (1) One overarching objective of this study was to use the Gpr156 KO model to discover how polarity reversal informs vestibular function (Introduction, overall summary in the last paragraph) . Pairing behavioral defects with hair cell orientation is only possible if hair cell transduction is normal, which had to be tested.

      (2) The notion that experiments that produced negative results are unnecessary and not properly motivated can only apply in retrospect. At early stages we performed electrophysiology because we did not know whether transduction would be normal in the absence of GPR156. We also did not know whether innervation would be normal. The fact that both appear normal makes Gpr156 KO a better model to address the importance of orientation reversal (conclusion of the Discussion, line 705).

      See also reply to Reviewer #1 below.

      Reviewer #1 (Recommendations For The Authors): 

      Fig1, panel B appears to show different focal planes for Gpr156del/+ and Gpr156del/del. 

      Figure 1B indeed had control and mutant panels at slightly different focal planes. We swapped the right (mutant) panel image and adjusted intensities in the control image to match adjustments of the new mutant image.

      Given that this work is largely about polarity and connectivity to neurons, I do not understand the need to assess mechanosensitivity in Gpr156 mutants. Please explain in the text, as follows: "After establishing normal numbers and types of mouse vestibular HCs, we assessed whether HCs respond normally to hair bundle deflections in the absence of GPR156." We did this because... 

      Please see reply above in 'Recommendations for the authors' for comment about the need to assess mechanosensitivity. We agree that this transition was lacking, and we added an explanation as recommended:

      "Gpr156 inactivation could be a powerful model to specifically ask how HC reversal contributes to vestibular function. However, GPR156 may have other confounding roles in HCs besides regulating their orientation, similar to EMX2, which impacts mechanotransduction in zebrafish HCs (Kindig et al., 2023) and afferent innervation in mouse and zebrafish HCs (Ji et al., 2022; Ji et al., 2018)."

      Anyway, the data in Figures 2, 3 and 4 seems somewhat superfluous to the main message of the paper. 

      Please see reply above in 'Recommendations for the authors'. This data may appear superfluous in retrospect but we could not claim that behavioral changes in Gpr156 mutants reflect the role of the line of polarity reversal if, for example, hair cell transduction was abnormal. We had to perform experiments to figure this out. We were further motivated as data began to emerge from the zebrafish lateral line that showed effects on HC transduction. Although we did not get positive results on this question in the mouse, we think the difference between models should be included as a significant part of the narrative.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for the constructive criticism and detailed assessment of our work which helped us to significantly improve our manuscript. We made significant changes to the text to better clarify our goals and approaches. To make our main goal of extracting the network dynamics clearer and to highlight the main advantage of our method in comparison with prior work we incorporated Videos 1-4 into the main text. We hope that these changes, together with the rest of our responses, convincingly demonstrate the utility of our method in producing results that are typically omitted from analysis by other methods and can provide important novel insights on the dynamics of the brain circuits. 

      Reviewer #1 (Public Review):

      (1) “First, this paper attempts to show the superiority of DyNetCP by comparing the performance of synaptic connectivity inference with GLMCC (Figure 2).”

      We believe that this apparent misunderstanding arose because the goals of our work were not adequately formulated in the original manuscript. As opposed to most of the prior work, which focused on reconstruction of static connectivity from spiking data (including GLMCC), our ultimate goal is to learn the dynamic connectivity structure, i.e. to extract the time-dependent strength of the directed connectivity in the network. Since this formulation is fundamentally different from most of the prior work, the goal here is not to show “improvement” or “superiority” over prior methods that mostly focused on inference of static connectivity, but rather to thoroughly validate our approach and to show its usefulness for the dynamic analysis of experimental data.

      (2) “This paper also compares the proposed method with standard statistical methods, such as jitter-corrected CCG (Figure 3) and JPSTH (Figure 4). It only shows that the results obtained by the proposed method are consistent with those obtained by the existing methods (CCG or JPSTH), which does not show the superiority of the proposed method.”

      The major problem in designing such a dynamic model is the virtual absence of ground-truth data, either as verified experimental datasets or as synthetic data with known time-varying connectivity. In this situation, optimization of the model hyper-parameters and model verification largely becomes a “shot in the dark”. Therefore, to resolve this problem and make the model generalizable, we adopted a two-stage approach: in the first stage we learn static connections, and in the second stage we infer temporally varying dynamic connectivity. Dividing the problem into two stages enables us to separately compare the results of each stage to traditional descriptive statistical approaches. Static connectivity results obtained in stage 1 are compared to classical pairwise CCG (Fig.2A,B) and GLMCC (Fig.2 C,D,E), while dynamic connectivity obtained in stage 2 is compared to pairwise JPSTH (Fig.4D,E).

      Importantly, the goal here is therefore not to “outperform” the classical descriptive statistical or any other approaches, but rather to have solid guidance for designing the model architecture and optimizing the hyper-parameters. For example, to produce static weight results in Fig.2A,B that are statistically indistinguishable from the results of classical CCG, the procedure for selecting the weights that contribute to averaging is designed as shown in Fig.9 and discussed in detail in the Methods. Optimization of the L2 regularization parameter is illustrated in Fig.4 – figure supplement 1; it makes it possible to produce dynamic weights very close to cJPSTH, as evidenced by the Pearson coefficient and TOST statistical tests. These comparisons demonstrate that the results of CCG and JPSTH are indeed faithfully reproduced by our model, which, we conclude, is sufficient justification to apply the model to the analysis of experimental results.

      (3) “However, the improvement in the synaptic connectivity inference does not seem to be convincing.”

      We are grateful to the reviewer for pointing out this issue, which we believe, as mentioned above, results from the failure of the original manuscript to clarify the major motivation for this comparison. The comparison of static connectivity inferred by stage 1 of our model to the results of GLMCC in Fig.2C,D,E is aimed at optimizing two further important parameters: the pair spike threshold and the peak height threshold. In Fig. 2D we show that when the peak height threshold is reduced from a rigorous 7 standard deviations (SD) to just 5 SD, our model recovers 74% of the ground truth connections, which is in fact better than the 69% produced by GLMCC for a comparable pair spike threshold of 80. As explained above, we do not intend to emphasize here that our model is “superior”, since that was not our goal, but rather use this comparison to illustrate the approach for optimizing the thresholds for unit and pair filtering, as described in detail in Fig. 11 and the corresponding section in Methods.

      To address these misunderstandings and better clarify the goal of our work we changed the text in the Introductory section accordingly. We also incorporated Videos 1-4 from the Supplementary Materials into the main text as Video 1, Video 2, Video 3, and Video 4. In fact, these videos represent the main advantage (or “superiority”) of our model with respect to prior art: it makes it possible to infer the time-dependent dynamics of network connectivity as opposed to static connections.

      (4) “While this paper compares the performance of DyNetCP with a state-of-the-art method (GLMCC), there are several problems with the comparison. For example: 

      (a) This paper focused only on excitatory connections (i.e., ignoring inhibitory neurons). 

      (b) This paper does not compare with existing neural network-based methods (e.g., CoNNECT: Endo et al. Sci. Rep. 2021; Deep learning: Donner et al. bioRxiv, 2024).

      (c) Only a population of neurons generated from the Hodgkin-Huxley model was evaluated.”

      (a) In general, the model of Eq.1 is agnostic to whether the connections it recovers are excitatory or inhibitory. In fact, Fig. 5 and Fig.6 illustrate inferred dynamic weights for both excitatory (red arrows) and inhibitory (blue arrows) connections between excitatory (red triangles) and inhibitory (blue circles) neurons. Similarly, inhibitory and excitatory dynamic interactions between connections are represented in Fig. 7 for the larger network across all visual cortices.

      (b) As stated above, the goal for the comparison of the static connectivity results of stage 1 of our model to other approaches is to guide the choice of thresholds and optimization of hyperparameters rather than claiming “superiority” of our model. Therefore, comparison with “static” CNN-based model of Endo et al. or ANN-based static model of Donner et al. (submitted to bioRxiv several months after our submission to eLife) is beyond the scope of this work. 

      (c) We have chosen exactly the same sub-population of neurons from the synthetic HH dataset of Ref. 26 that is used in Fig.6 of Ref. 26 that provides direct comparison of connections reconstructed by GLMCC in the original Ref.26 and the results of our model. 

      (5) “In summary, although DyNetCP has the potential to infer synaptic connections more accurately than existing methods, the paper does not provide sufficient analysis to make this claim. It is also unclear whether the proposed method is superior to the existing methods for estimating functional connectivity, such as jitter-corrected CCG and JPSTH. Thus, the strength of DyNetCP is unclear.”

      As we explained above, we have no intention to claim that our model is more accurate than existing static approaches. In fact, it is not feasible to obtain a better estimate of connectivity than that given by direct descriptive statistical methods such as CCG or JPSTH. Instead, comparisons with static (CCG and GLMCC) and temporal (JPSTH) approaches are used here to guide the choice of the model thresholds and to inform the optimization of hyper-parameters, so that the prediction of the dynamic network connectivity is reliable. The main strength of DyNetCP is the inference of dynamic connectivity, as illustrated in Videos 1-4. We demonstrated the utility of the method on the largest in-vivo experimental dataset available today and extracted the dynamics of cortical connectivity in local and global visual networks. This information is unattainable with any other contemporary method we are aware of.

      Reviewer #1 (Recommendations for the Authors):

      (6) “First, the authors should clarify the goal of the analysis, i.e., to extract either the functional connectivity or the synaptic connectivity. While this paper assumes that they are the same, it should be noted that functional connectivity can be different from synaptic connectivity (see Stevenson IH, Neurons Behav. Data Anal. Theory 2023).”

      The goal of our analysis is to extract the dynamics of the spiking correlations. In this paper we intentionally avoided assigning a biological interpretation to the inferred dynamic weights. Our goal was to demonstrate that a trove of additional information on neural coding is hidden in the dynamics of neural correlations, information that is typically omitted from the analysis of neuroscience data.

      Biological interpretation of the extracted dynamic weights can follow the terminology of short-term plasticity between synaptically connected neurons (Refs 25, 33-37) or spike transmission strength (Refs 30-32,46). Alternatively, temporal changes in connection weights can be interpreted in terms of dynamically reconfigurable functional interactions of cortical networks (Refs 8-11,13,47) through which the information is flowing. We also cannot exclude an interpretation that combines both ideas. In any event, our goal here is to extract these signals for a pair (Video 1, Fig.4), a local cortical circuit (Video 2, Fig.5), and the whole visual cortical network (Videos 3, 4 and Fig.7).

      To clarify this statement, we included a paragraph in the discussion section of the revised paper. 

      (7) “Finally, it would be valuable if the authors could also demonstrate the superiority of DyNetCP qualitatively. Can DyNetCP discover something interesting for neuroscientists from the large-scale in vivo dataset that the existing method cannot?”

      The model discovers dynamic time-varying changes in synchronous neuronal spiking (Videos 1-4) that more traditional methods like CCG or GLMCC are not able to detect. The revealed dynamics occur on very short time scales, on the order of just a few ms, during the stimulus presentation. Calculations of the intrinsic dimensionality of the spiking manifold (Fig. 8) reveal that up to 25 additional dimensions of the neural code can be recovered using our approach. These dimensions are typically omitted from the analysis of neural circuits using traditional methods.

      Reviewer #2 (Public Review):

      (1) “Simulation for dynamic connectivity. It certainly seems doable to simulate a recurrent spiking network whose weights change over time, and I think this would be a worthwhile validation for this DyNetCP model. In particular, I think it would be valuable to understand how much the model overfits, and how accurately it can track known changes in coupling strength.”

      We are very grateful to the reviewer for this insight. Verification of the model on synthetic data with known time-varying connectivity would indeed be very useful. We did generate a synthetic dataset to test some of the model performance metrics, i.e. its ability to distinguish True Positive (TP) from False Positive (FP) “serial” or “common input” connections (Fig.10A,B). Comparison of dynamic and static weights might indeed help to distinguish TP connections from artifactual FP connections.

      Generating a large synthetic dataset with known dynamic connections that mimics interactions in cortical networks is, however, a separate and nontrivial task that is beyond the scope of this work. Instead, we designed a model with an architecture in which overfitting can be tested in two consecutive stages by comparison with descriptive statistical approaches, CCG and JPSTH. Static stage 1 of the model predicts correlations that are statistically indistinguishable from the CCG results (Fig.2A,B). Dynamic stage 2 of the model produces dynamic weight matrices that faithfully reproduce the cJPSTH (Fig.4D,E). The calculated Pearson correlation coefficients and TOST testing enable optimization of the L2 regularization parameter, as shown in Fig.4 – supplement 1 and described in detail in the Methods section. The ability to test the results of both stages separately against descriptive statistical results is the main advantage of the chosen model architecture: it allows us to verify that the model does not overfit and can predict changes in coupling strength at least as well as descriptive statistical approaches (see also our answer above to the Reviewer #1 questions).

      (2) “If the only goal is "smoothing" time-varying CCGs, there are much easier statistical methods to do this (c.f. McKenzie et al. Neuron, 2021. Ren, Wei, Ghanbari, Stevenson. J Neurosci, 2022), and simulations could be useful to illustrate what the model adds beyond smoothing.”

      We are grateful to the reviewer for bringing up these very interesting and relevant references, which we added to the discussion section of the paper. Of special interest is the second one, which calculates the time-varying CCG weight (“efficacy” in that paper's terms) on the same Allen Institute Visual dataset that our work uses. It is indeed an elegant way to extract time-variable coupling strength that is similar to what our model generates. The major difference between our model and that of Ren et al., as well as GLMCC and any statistical approaches, is that DyNetCP learns the connections of an entire network jointly in one pass, rather than calculating coupling separately for each pair in the dataset without considering the relative influence of other pairs in the network. Hence, our model can infer connections beyond pairwise (see Fig. 11 and corresponding discussion in Methods) while performing the inferences with computational efficiency.

      (3) “Stimulus vs noise correlations. For studying correlations between neurons in sensory systems that are strongly driven by stimuli, it's common to use shuffling over trials to distinguish between stimulus correlations and "noise" correlations or putative synaptic connections. This would be a valuable comparison for Figure 5 to show if these are dynamic stimulus correlations or noise correlations. I would also suggest just plotting the CCGs calculated with a moving window to better illustrate how (and if) the dynamic weights differ from the data.”

      Thank you for this suggestion. Note that for all weight calculations in our model, the standard jitter correction procedure of Ref. 33 (Harrison et al., Neural Com 2009) is first applied to mitigate the influence of correlated slow fluctuations (slow “noise”). Please also note that to obtain the results in Fig. 5 we split the 440 total experimental trials for this session (when the animal is running, see Table 1) randomly into 352 training and 88 validation trials by selecting 44 training trials and 11 validation trials from each configuration of contrast or grating angle. We checked that changing this random selection produced the very same results as shown in Fig.5.

      A comparison of the descriptive statistical results of pairwise cJPSTH and the model is shown in Fig. 4D,E. The difference between the two is characterized in detail in Fig.4 – supplement 1, as evidenced by the Pearson coefficient and TOST statistical tests.
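      The per-configuration split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the trial identifiers, and the seed are hypothetical, and the layout of 8 configurations with 55 trials each is only inferred from the stated totals (352 = 8 × 44 and 88 = 8 × 11).

```python
import random


def stratified_split(trials_by_condition, n_train, n_val, seed=0):
    """Randomly pick n_train and n_val trials from every stimulus condition."""
    rng = random.Random(seed)
    train, val = [], []
    for condition, trials in trials_by_condition.items():
        if len(trials) < n_train + n_val:
            raise ValueError(f"condition {condition!r} has too few trials")
        shuffled = list(trials)
        rng.shuffle(shuffled)  # random selection within this condition
        train.extend(shuffled[:n_train])
        val.extend(shuffled[n_train:n_train + n_val])
    return train, val


# Hypothetical layout: 8 stimulus configurations x 55 trials = 440 trials.
trials_by_condition = {c: [(c, i) for i in range(55)] for c in range(8)}
train, val = stratified_split(trials_by_condition, n_train=44, n_val=11)
print(len(train), len(val))
```

      Re-running with a different `seed` gives a different random selection with the same per-configuration counts, which is the kind of check mentioned in the response.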

      Reviewer #2 (Recommendations for the Authors):

      (4) “The method is described as "unsupervised" in the abstract, but most researchers would probably call this "supervised" (the static model, for instance, is logistic regression).”

      The model architecture is composed of two stages to make parameter optimization grounded. While the first stage is regression, the second and the most important stage is not. Therefore, we believe the term “unsupervised” is justified. 

      (5) “Introduction - it may be useful to mention that there have been some previous attempts to describe time-varying connectivity from spikes both with probabilistic models: Stevenson and Kording, Neurips (2011), Linderman, Stock, and Adams, Neurips (2014), Robinson, Berger, and Song, Neural Computation (2016), Wei and Stevenson, Neural Comp (2021) ... and with descriptive statistics: Fujisawa et al. Nat Neuroscience (2008), English et al. Neuron (2017), McKenzie et al. Neuron (2021).”

      We are very grateful to both reviewers for bringing up these very interesting and relevant references that we gladly included in the discussions within the Introduction and Discussion sections. 

      (6) “In the section "Static connectivity inferred by the DyNetCP from in-vivo recordings is biologically interpretable"... I may have missed it, but how is the "functional delay" calculated? And am I understanding right that for the DyNetCP you are just using [w_{i\to j}, w_{j\to i}] in place of the CCG?”

      The functional delay is calculated as the time lag of the maximum (or minimum) in the CCG (or static weight matrix). The static weight that the model is extracting is indeed the w_i w_j product. We changed the text in this section to better clarify these definitions.
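      The definition of the functional delay given above can be illustrated with a short sketch. The function name `functional_delay`, the lag grid, and the CCG values below are made up for illustration and are not taken from the paper.

```python
def functional_delay(lags_ms, values, extremum="max"):
    """Return the lag at which a CCG (or static weight curve) is extremal.

    extremum="max" finds the peak (excitatory-like interaction);
    extremum="min" finds the trough (inhibitory-like interaction).
    """
    if len(lags_ms) != len(values) or not values:
        raise ValueError("lags_ms and values must be non-empty and equal length")
    pick = max if extremum == "max" else min
    idx = pick(range(len(values)), key=values.__getitem__)
    return lags_ms[idx]


# Illustrative curves on a 1 ms lag grid (made-up numbers).
lags_ms = [-3, -2, -1, 0, 1, 2, 3]
ccg_peak = [0.0, 0.1, 0.2, 0.5, 1.4, 0.3, 0.1]
ccg_trough = [0.0, -0.1, -0.9, -0.2, 0.0, 0.1, 0.0]

delay_peak = functional_delay(lags_ms, ccg_peak)
delay_trough = functional_delay(lags_ms, ccg_trough, extremum="min")
print(delay_peak, delay_trough)
```

      The time resolution of the delay is set by the lag bin size, so a peak shifted by one bin from zero lag is the smallest delay this definition can resolve.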

      (7) “P14 typo "sparce spiking" sparse”

      Fixed. Thank you. 

      (8) “Suggest rewording "Extra-laminar interactions reveal formation of neuronal ensembles with both feedforward (e.g., layer 4 to layer 5), and feedback (e.g., layer 5 to layer 4) drives." I'm not sure this method can truly distinguish common input from directed, recurrent cortical effects. Just as an example in Figure 5, it looks like 2->4, 0->4, and 3->2 are 0 lag effects. If you wanted to add the "functional delay" analysis to this laminar result that could support some stronger claims about directionality, though.”

      The time lags for the results of Fig. 5 are indeed small but nevertheless quantifiable. The left panel of Fig. 5A shows static results with the correlation peaks shifted by 1 ms from zero lag.

      (9) “Methods - I think it would be useful to mention how many parameters the full DyNetCP model has.”

      Overall, after the architecture of Fig.1C is established, the dynamic weight averaging procedure is selected (Fig.9), and Fourier features are introduced (Fig.10), there are just a few parameters to optimize, including L2 regularization (Fig.4 – supplement 1) and the loss coefficient (Fig.1 – figure supplement 1A). Other variables, common to all statistical approaches, include the bin sizes in the lag time and in the trial time. Decreasing the bin size improves time resolution but reduces the number of spikes in each bin available for reliable inference. Therefore, the number-of-spikes threshold and other related thresholds α_s, α_w, α_p, as well as λ_i λ_j, need to be adjusted accordingly (Fig.11), as discussed in detail in the Methods, Section 4. We included this sentence in the text.

      (10) “It may be useful to also mention recent results in mice (Senzai et al. Neuron, 2019) and monkeys (Trepka...Moore. eLife, 2022) that are assessing similar laminar structures with CCGs.”

      Thank you for pointing out these very interesting references. We added a paragraph in the “Dynamic connectivity in VISp primary visual area” section comparing our results with these findings. In short, we observed that connections are distributed across the cortical depth with nearly the same maximum weights (Fig.7A). This is inconsistent with the observation in Trepka et al., 2022 of greatly diminished static connection efficacy within <200 µm from the source. It is consistent, however, with the work of Senzai et al., 2019, which reveals much stronger long-distance correlations between layer 2/3 and layer 5 during waking in comparison to sleep states. In both cases these observations represent static connections averaged over the trial time, while the results presented in Video 3 and Fig.7A show strong temporal modulation of the connection strength between all the layers during the stimulus presentation. Therefore, our results demonstrate that tracking dynamic connectivity patterns in local cortical networks can be invaluable in assessing circuit-level dynamic network organization.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors utilize recurrent neural networks (RNNs) to explore the question of when and how neural dynamics and the network's output are related from a geometrical point of view. The authors found that RNNs operate between two extremes: an 'aligned' regime in which the weights and the largest PCs are strongly correlated and an 'oblique' regime where the output weights and the largest PCs are poorly correlated. Large output weights led to oblique dynamics, and small output weights to aligned dynamics. This feature impacts whether networks are robust to perturbation along output directions. Results were linked to experimental data by showing that these different regimes can be identified in neural recordings from several experiments.

      Strengths:

      A diverse set of relevant tasks.

      A well-chosen similarity measure.

      Exploration of various hyperparameter settings.

      Weaknesses:

      One of the major connections found BCI data with neural variance aligned to the outputs.

      Maybe I was confused about something, but doesn't this have to be the case based on the design of the experiment? The outputs of the BCI are chosen to align with the largest principal components of the data.

      The reviewer is correct. We indeed expected the BCI experiments to yield aligned dynamics. Our goal was to use this as a comparison for other, non-BCI recordings in which the correlation is smaller, i.e. dynamics closer to the oblique regime. We adjusted our wording accordingly and added a small discussion at the end of the experimental results, Section 2.6.

      Proposed experiments may have already been done (new neural activity patterns emerge with long-term learning, Oby et al. 2019). My understanding of these results is that activity moved to be aligned as the manifold changed, but more analyses could be done to more fully understand the relationship between those experiments and this work.

      The on- vs. off-manifold experiments are indeed very close to our work. On-manifold initializations, as stated above, are expected to yield aligned solutions. Off-manifold initializations allow, in principle, for both aligned and oblique solutions and are thus closer to our RNN simulations. If, during learning, the top PCs (dominant activity) rotate such that they align with the pre-defined output weights, then the system has reached an aligned solution. If the top PCs hardly change, and yet the behavior is still good, this is an oblique solution. There is some indication of an intermediate result (Figure 4C in Oby et al.), but the existing analysis there did not fully characterize these properties. Furthermore, our work suggests that systematically manipulating the norm of readout weights in off-manifold experiments can yield new insights. We thus view these as relevant results but suggest both further analysis and experiments. We rewrote the corresponding section in the discussion to include these points.

      Analysis of networks was thorough, but connections to neural data were weak. I am thoroughly convinced of the reported effect of large or small output weights in networks. I also think this framing could aid in future studies of interactions between brain regions.

      This is an interesting framing to consider the relationship between upstream activity and downstream outputs. As more labs record from several brain regions simultaneously, this work will provide an important theoretical framework for thinking about the relative geometries of neural representations between brain regions.

      It will be interesting to compare the relationship between geometries of representations and neural dynamics across connected different brain areas that are closer to the periphery vs. more central.

      It is exciting to think about the versatility of the oblique regime for shared representations and network dynamics across different computations.

      The versatility of the oblique regime could lead to differences between subjects in neural data.

      Thank you for the suggestions. Indeed, this is precisely why relative measures of the regime are valuable, even in the absence of absolute thresholds for regimes. We included your suggestions in the discussion.

      Reviewer #2 (Public Review):

      Summary:

      This paper tackles the problem of understanding when the dynamics of neural population activity do and do not align with some target output, such as an arm movement. The authors develop a theoretical framework based on RNNs showing that an alignment of neural dynamics to output can be simply controlled by the magnitude of the read-out weight vector while the RNN is being trained. Small magnitude vectors result in aligned dynamics, where low-dimensional neural activity recapitulates the target; large magnitude vectors result in "oblique" dynamics, where encoding is spread across many dimensions. The paper further explores how the aligned and oblique regimes differ, in particular, that the oblique regime allows degenerate solutions for the same target output.

      Strengths:

      - A really interesting new idea that different dynamics of neural circuits can arise simply from the initial magnitude of the output weight vector: once written out (Eq 3) it becomes obvious, which I take as the mark of a genuinely insightful idea.

      - The offered framework potentially unifies a collection of separate experimental results and ideas, largely from studies of the motor cortex in primates: the idea that much of the ongoing dynamics do not encode movement parameters; the existence of the "null space" of preparatory activity; and that ongoing dynamics of the motor cortex can rotate in the same direction even when the arm movement is rotating in opposite directions.

      - The main text is well written, with a wide-ranging set of key results synthesised and illustrated well and concisely.

      - The study shows that the occurrence of the aligned and oblique regimes generalises across a range of simulated behavioural tasks.

      - A deep analytical investigation of when the regimes occur and how they evolve over training.

      - The study shows where the oblique regime may be advantageous: allows multiple solutions to the same problem; and differs in sensitivity to perturbation and noise.

      - An insightful corollary result that noise in training is needed to obtain the oblique regime.

      - Tests whether the aligned and oblique regimes can be seen in neural recordings from primate cortex in a range of motor control tasks.

      Weaknesses:

      - The magnitude of the output weights is initially discussed as being fixed, and as far as I can tell all analytical results (sections 4.6-4.9) also assume this. But in all trained models that make up the bulk of the results (Figures 3-6) all three weight vectors/matrices (input, recurrent, and output) are trained by gradient descent. It would be good to see an explanation or results offered in the main text as to why the training always ends up in the same mapping (small->aligned; large->oblique) when it could, for example, optimise the output weights instead, which is the usual target (e.g. Sussillo & Abbott 2009 Neuron).

      We understand the reviewer’s surprise. We chose a typical setting (training all weights of an RNN with Adam) to show that we don’t have to fine-tune the setting (e.g. by fixing the output weights) to see the two regimes. However, other scenarios in which the output weights do change are possible, depending on the algorithm and details in the way the network is parameterized. Understanding why some settings lead to our scenario (no change in scale) and others don’t is not a simple question. A short explanation here, nonetheless:

      - Small changes to the internal weights are sufficient to solve the tasks.

      - Different versions of gradient descent and different ways of parametrizing the network lead to different results in which parts of the weights get trained. This goes in particular for how weight scales are introduced, e.g. [Jacot et al. 2018 Neurips], [Geiger et al. 2020 Journal of Statistical Mechanics], or [Yang, Hu 2020, arXiv, Feature learning in infinite-width networks]. One insight from these works is that plain gradient descent (GD) with small output weights leads to learning only at the output (and often divergence or unsuccessful learning). For this reason, plain GD (or stochastic GD) is not suitable for small output weights (the aligned regime). Other variants of GD, such as Adam or RMSprop, don’t have this problem because they shift the emphasis of learning to the hidden layers (here the recurrent weights). This is due to the normalization of the gradients.

      - FORCE learning [Sussillo & Abbott 2009] is somewhat special in that the output weights are simultaneously also used as feedback weights. That is, not only the output weights but also an additional low-rank feedback loop through these output weights is trained. As a side note: By construction, such a learning algorithm thus links the output directly to the internal dynamics, so that one would only expect aligned solutions – and the output weights remain correspondingly small in these algorithms [Mastrogiuseppe, Ostojic, 2019, Neural Comp].

      - In our setting, the output is not fed back to the network, so training the output alone would usually not suffice. Indeed, optimizing just the output weights is similar to what happens in the lazy training regime. These solutions, however, are not robust to noise, and we show that adding noise during the training does away with these solutions.
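
      The normalization argument in the second point above can be illustrated with a toy calculation. The sketch below is not the authors' code: it assumes a linear two-layer map z = w_out · (W x) with a squared loss, and the helper names `hidden_grad` and `adam_first_step` are hypothetical. It shows that the gradient reaching the hidden weights shrinks with the readout scale, while the size of the first Adam update is nearly independent of it.

```python
import numpy as np

def hidden_grad(W, w_out, x, target):
    """Gradient of 0.5 * (w_out @ W @ x - target)**2 with respect to W."""
    err = w_out @ W @ x - target
    return err * np.outer(w_out, x)

def adam_first_step(grad, lr=1e-3, eps=1e-8):
    """First Adam update after bias correction: -lr * g / (|g| + eps)."""
    return -lr * grad / (np.sqrt(grad**2) + eps)

# Deterministic toy setup (all values are illustrative assumptions).
n = 4
W = np.eye(n)
x = np.ones(n)
direction = np.ones(n) / 2.0  # unit-norm readout direction

grad_norm, step_norm = {}, {}
for label, scale in (("small", 1e-3), ("large", 1.0)):
    g = hidden_grad(W, scale * direction, x, target=1.0)
    grad_norm[label] = np.linalg.norm(g)                    # what plain GD would use
    step_norm[label] = np.linalg.norm(adam_first_step(g))   # what Adam would apply

print(grad_norm)  # plain-GD gradient is far smaller for the small readout
print(step_norm)  # Adam step size is almost the same in both cases
```

      Under these assumptions, plain gradient descent barely moves the hidden weights when the readout is small, whereas a normalized update still does, which is the qualitative point made above.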

      To address this issue in the manuscript, we added the following sentence to section 2.2: “While explaining this observation is beyond the scope of this work, we note that (1) changing the internal weights suffices to solve the task, and that (2) the extent to which the output weights change during learning depends on the algorithm and specific parametrization [21, 27, 85].”

      - It is unclear what it means for neural activity to be "aligned" for target outputs that are not continuous time-series, such as the 1D or 2D oscillations used to illustrate most points here.

      Two of the modeled tasks have binary outputs; one has a 3-element binary vector.

      For any dynamics and output, we compare the alignment between the vector of output weights and the main PCs (the leading component of the dynamics). In the extreme of binary internal dynamics, i.e., two points {x_1, x_2}, there would only be one leading PC (the line connecting the two points, i.e. the choice decoder).
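
      As a rough illustration of this comparison, the sketch below computes the absolute cosine between a readout vector and each leading PC of a set of states. It is a minimal example under our own assumptions, not the analysis code of the paper: the function name `pc_alignment`, the toy trajectory, and the choice of the cosine as the alignment score are all hypothetical.

```python
import numpy as np

def pc_alignment(states, w_out, n_pcs=5):
    """Absolute cosine between w_out and each of the leading PCs of `states`.

    states: (T, N) array of activity, one row per time point.
    w_out:  (N,) readout vector.
    """
    centered = states - states.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    unit_w = w_out / np.linalg.norm(w_out)
    return np.abs(vt[:n_pcs] @ unit_w)

# Toy trajectory: large variance along unit 0, small variance along unit 1.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
states = np.zeros((200, 10))
states[:, 0] = 5.0 * np.sin(t)
states[:, 1] = 0.5 * np.cos(t)

readout_on_pc1 = pc_alignment(states, np.eye(10)[0], n_pcs=2)
readout_on_pc2 = pc_alignment(states, np.eye(10)[1], n_pcs=2)
print(readout_on_pc1, readout_on_pc2)
```

      A readout that overlaps mostly with the first PC corresponds to the aligned picture; a readout that overlaps mostly with low-variance PCs is closer to the oblique one.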

      - It is unclear what criteria are used to assign the analysed neural data to the oblique or aligned regimes of dynamics.

      Such an assignment is indeed difficult to achieve. The RNN models we showed were at the extremes of the two regimes, and these regimes are well characterized in the case of large networks (as described in the methods section). For the neural data, we find different levels of alignment for different experiments. These differences may not be strong enough to assign different regimes. Instead, our measures (correlation and relative fitting dimension) allow us to order the datasets. Here, the BCI data is more aligned than non-BCI data – perhaps unsurprisingly, given the experimental design of the former and the previous findings for the rotation task [Russo et al, 2018]. We changed the manuscript accordingly, now focusing on the relative measure of alignment, even in the absence of absolute thresholds. We are curious whether future studies with more data, different tasks, or other brain regions might reveal stronger differentiation towards either extreme.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      There's so much interesting content in the supplement - it seemed like a whole other paper! It is interesting to read about the dynamics over the course of learning. Maybe you want to put this somewhere else so that more people read it?

      We are glad the reviewer appreciated this content. We think developing these analysis methods is essential for a more complete understanding of the oblique regime and how it arises, and that it should therefore be part of the current paper.

      Nice schematic in Figure 1.

      There were some statements in the text highlighting co-rotation in the top 2 PCs for oblique networks. Figure 4a looks like aligned networks might also co-rotate in a particular subspace that is not highlighted. I could be wrong, but the authors should look into this and correct it if so. If both aligned and oblique networks have co-rotation within the top 5 or so PCs, some text should be updated to reflect this.

      This is indeed the case, thanks for pointing this out! For one example, there is co-rotation for the aligned network already in the subspace spanned by PCs 1 and 3, see the figure below. We added a sentence indicating that co-rotation can take place at low-variance PCs for the aligned regime and pointed to this figure, which we added to the appendix (Fig. 17).

      While these observations are an important addition, we don’t think they qualitatively alter our results, particularly the stronger dissociation between output and internal dynamics for oblique than aligned dynamics.

      Figure 4 color labels were 'dark' and 'light'. I wasn't sure if this was a typo or if it was designed for colorblind readers? Either way, it wasn't too confusing, but adding more description might be useful.

      Fixed to red and yellow.

      Typo "Aligned networks have a ratio much large than one"

      Typo "just started to be explored" Typo "hence allowing to test"

      Fixed all typos.

      Reviewer #2 (Recommendations For The Authors):

      - Explain/discuss in the main text why the initial output weights reliably result in the required internal RNN dynamics (small->aligned; large->oblique) after training. The magnitude of the output weights is initially discussed as being fixed, and as far as I can tell all analytical results (sections 4.6-4.9) also assume this. But in all trained models that make up the bulk of the results (Figures 3-6) all three weight vectors/matrices (input, recurrent, and output) are trained by gradient descent. It would be good to see an explanation or results offered in the main text as to why the training always ends up in the same mapping (small->aligned; large->oblique) when it could, for example, just optimise the output weights instead.

      See our answer to the same comment in the public review of Reviewer #2 above.

      - Page 6: explain the 5 tasks.

      We added a link to methods where the tasks are described.

      - Page 6/Fig 3 & Methods: explain assumptions used to compute a reconstruction R^2 between RNN PCs and a binary or vector target output.

      We added a new methods section, 4.4, where we explain the fitting process in Fig. 3. For all tasks, the target output was a time series with P specified target values in N_out dimensions. We thus always applied regression and did not differentiate between binary and non-binary tasks.
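
      The regression-based comparison can be sketched as follows. This is an illustration under our own assumptions rather than the procedure of methods section 4.4: the function name `fit_and_variance_dims` and the 0.9 threshold are hypothetical choices. D_x is taken as the number of leading PCs needed to explain the threshold fraction of activity variance, and D_fit as the number of leading PCs needed for a linear fit of the target to reach the same R^2.

```python
import numpy as np

def fit_and_variance_dims(states, target, threshold=0.9):
    """Return (D_fit, D_x) for activity `states` (T, N) and `target` (T, N_out).

    If the threshold is never reached, the returned count is one more than
    the number of available PCs.
    """
    xc = states - states.mean(axis=0)
    yc = target - target.mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    # Cumulative fraction of activity variance explained by the leading PCs.
    var_explained = np.cumsum(s**2) / np.sum(s**2)
    d_x = int(np.searchsorted(var_explained, threshold) + 1)
    # Columns of u are orthonormal PC time courses, so a least-squares fit on
    # the first k PCs explains sum_{i<=k} ||u_i^T yc||^2 of the target variance.
    r2 = np.cumsum(np.sum((u.T @ yc) ** 2, axis=1)) / np.sum(yc**2)
    d_fit = int(np.searchsorted(r2, threshold) + 1)
    return d_fit, d_x

# Toy trajectory: large variance along unit 0, small variance along unit 1.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
states = np.zeros((200, 10))
states[:, 0] = 5.0 * np.sin(t)
states[:, 1] = 0.5 * np.cos(t)

dims_high_var_target = fit_and_variance_dims(states, states[:, [0]])
dims_low_var_target = fit_and_variance_dims(states, states[:, [1]])
print(dims_high_var_target, dims_low_var_target)
```

      In this toy case the target carried by the dominant PC gives D_fit = D_x, while the target carried by the low-variance PC needs more dimensions for the fit than for the variance, so D_fit/D_x exceeds one.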

      - Page 8: methods and predictions are muddled up: paragraph ending "along different directions" should be followed by paragraph starting "Our intuition...". The intervening paragraph ("We apply perturbations...") should start after the first sentence of the paragraph "To test this,...".

      Right, these sentences were muddled up indeed. We put them in the correct order.

      - Page 10: what are the implications of the differences in noise alignment between the aligned and oblique regimes?

      The noise suppression in the oblique regime is a slow learning process that gradually renders the solution more stable. With a large readout, learning separates into two phases. In an early phase, a “lazy” solution is learned quickly. This solution is not robust to noise. In a second, slower phase, learning gradually leads to a more robust solution: the oblique solution. The main text emphasizes the result of this process (noise suppression). In the methods, we closely follow this process. This process is possibly related to other slow learning processes that fine-tune solutions, e.g., [Blanc et al. 2020, Li et al. 2021, Yang et al. 2023]. Furthermore, it would be interesting to see whether such fine-tuning happens in animals [Ratzon et al. 2024]. We added corresponding sentences to the discussion.

      - Neural data analysis:

      (i) Page 11 & Fig 7: the assignment of "aligned" or "oblique" to each neural dataset is based on the ratio of D_fit/D_x. But in all cases this ratio is less than 1, indicating fewer dimensions are needed for reconstruction than for explaining variance. Given the example in Figure 2 suggests this is an aligned regime, why assign any of them as "oblique"?

      We weakened the wording in the corresponding section, and now only state that BCI data leans more towards aligned, non-BCI data more towards oblique. This is consistent with the intuition that BCI is by construction aligned (decoder along largest PCs) and non-BCI data already showed signs of oblique dynamics (co-rotating leading PCs in the cycling task, Russo et al. 2018).

      We agree that Fig 2 (and Fig 3) could suggest distinguishing the regimes at a threshold D_fit/D_x = 1, although we hadn’t considered such a formal criterion.

      (ii) Figure 23 and main text page 11: discuss which outputs for NLB and BCI datasets were used in Figure 7 and main text; the NLB results vary widely by output type - discuss in the main text; D_fit for NLB-maze-accuracy is missing from panel D; as the criterion is D_fit/D_x, plot this too.

      We now discuss which outputs were used in Fig. 7 in its caption: the velocity of the task-relevant entity (hand/finger/cursor). This was done to have one quantity across studies. We added a sentence to the main text, p. 11, which points to Fig 22 (which used to be Fig 23) and states that results are qualitatively similar for other decoded outputs, despite some fluctuations in numerical values and decodability.

      Regarding Fig 22: D_fit for NLB-maze-accuracy was beyond the manually set y-limit (for visibility of the other data points). We also extended the figure to include D_fit/D_x. We also discovered a small bug in the analysis code which required us to rerun the analysis and reproduce the plots. This also changed some of the numbers in the main text.

      - Discussion:

      "They do not explain why it [the "irrelevant activity"] is necessary", implies that the following sentence(s) will explain this, but do not. Instead, they go on to say:

      "Here, we showed that merely ensuring stability of neural dynamics can lead to the oblique regime": this does not explain why it is necessary, merely that it exists; and it is unclear what results "stability of neural dynamics" is referring to.

      We agree this was not a very clear formulation. We replaced these last three sentences with the following:

      “Our study systematically explains this phenomenon: generating task-related output in the presence of large, task-unrelated dynamics requires large readout weights. Conversely, in the presence of large output weights, resistance to noise or perturbations requires large, potentially task-unrelated neural dynamics (the oblique regime).”

      - The need for all 27 figures was unclear, especially as some seemed not to be referenced or were referenced out of order. Please check and clarify.

      Fig 16 (Details for network dynamics in cycling tasks) and Fig 21 (loss over learning time for the different tasks) were not referenced, and are now removed.

      We also reordered the figures in the appendix so that they would appear in the order they are referenced. Note that we added another figure (now Fig. 17) following a question from Reviewer #1.

    1. Author response:

      The following is the authors’ response to the original reviews.

      The detailed, thorough critique provided by the three reviewers is very much appreciated. We believe the manuscript is greatly improved by the changes we have made based on those reviews. The major changes are described below, followed by a point by point response.

      Major Changes:

      (1) We revised our model (old Fig. 10; new Fig. 9) to keep the explanation focused on the data shown in the current study. Specifically, references to GTP/GDP states of Rab3A and changes in the presynaptic quantum have been removed and the mechanisms depicted are confined to pre- or post-synaptic Rab3A participating in either controlling release of a trophic factor that regulates surface GluA2 receptors (pre- or postsynaptic) or directly affecting fusion of GluA2-receptor containing vesicles (postsynaptic).

      (2) We replaced all cumulative density function plots and ratio plots, based on multiple quantile samples per cell, with box plots of cell means. This affects new Figures 1, 2, 3, 5, 6, 7 and 8. All references to “scaling,” “divergent scaling,” or “uniform scaling,” have been removed. New p values for comparison of means are provided above every box plot in Figures 1, 2, 3, 5, 6, 7 and 8. The number of cultures is provided in the figure legends.

      (3) We have added frequency to Figures 1, 2 and 8. Frequency values overall are more variable, and the effect of activity blockade less robust, than for mEPSC amplitudes. We have added text indicating that the increase in frequency after activity blockade was significant in neurons from cultures prepared from WT in the Rab3A+/- colony but not cultures prepared from KO mice (Results, lines 143 to 147, new Fig. 1G, H). The TTX-induced increase in frequency was significant in the NASPM experiments before NASPM, but not after NASPM (Results, lines 231 to 233, new Fig. 3, also cultures from WT in Rab3A+/- colony). The homeostatic plasticity effect on frequency did not reach significance in WT on WT glia cultures or WT on KO glia cultures, possibly due to the variability of frequency, combined with smaller sample sizes (Results, lines 400 to 403, new Fig. 8). In the cultures prepared from WT mice in the Rab3A+/Ebd colony, there was a trend towards higher frequency after TTX that did not reach statistical significance, and in cultures prepared from mutant mice, the p value was large, suggesting disruption of the effect, which appears to be due to an increase in frequency in untreated cultures, similar to the behavior of mEPSC amplitudes in neurons from mutant mice (Results, lines 161-167). In sum, the effect of activity on frequency requires Rab3A and Ca2+-permeable receptors, and is mimicked by the presence of the Rab3A Earlybird mutant. We have also added a discussion of these results (Discussion, lines 427-435).

      (4) In the revised manuscript we have added analysis of VGLUT1 levels for the same synaptic sites that we previously analyzed GluA2 levels, and these data are described in Results, lines 344 to 371, and appear in new Table 2. In contrast to previous studies, we did not find any evidence for an increase in VGLUT1 levels after activity blockade. We reviewed those studies to determine whether there might be differences in the experimental details that could explain the lack of effect we observed. In (De Gois et al., 2005), the authors measured mRNA and performed western blots to show increases in VGLUT1 after TTX treatment in older rat cortical cultures (DIV 19). The study performs immunofluorescence imaging of VGLUT1 but only after bicuculline treatment (it decreases), not after TTX treatment. In (Wilson et al., 2005), the hippocampal cultures are treated with AP5, not TTX, and the VGLUT1 levels in immunofluorescence images are reported relative to synapsin I. That the type of activity blockade matters is illustrated by the failure of Wilson and colleagues to observe a consistent increase in VGLUT1/Synapsin ratio in cultures treated with AMPA receptor blockade (NBQX; supplementary information). These points have been added to the Discussion, lines 436 to 447.

      Reviewer #1:

      (1) (model…is not supported by the data), (2) (The analysis of mEPSC data using quantile sampling…), (3) (…statistical analysis of CDFs suffers from n-inflation…), (4) (How does recording noise and the mEPSC amplitude threshold affect “divergent scaling?”) (5) (…justification for the line fits of the ratio data…), (7) (A comparison of p-values between conditions….) and (10) (Was VGLUT intensity altered in the stainings presented in the manuscript?)

      The major changes we made, described above, address Reviewer #1’s points. The remaining points are addressed below.

      (6) TTX application induces a significant increase in mEPSC amplitude in Rab3A-/- mice in two out of three data sets (Figs. 1 and 9). Hence, the major conclusion that Rab3A is required for homeostatic scaling is only partially supported by the data. 

      The p values based on CDF comparisons were problematic, but the point we were making is that they were much larger for amplitudes measured in cultures prepared from Rab3A-/- mice (Fig. 1, p = 0.04) compared to those from cultures prepared from Rab3A+/+ mice (Fig. 1, p = 4.6 × 10^-4). Now that we are comparing means, there are no significant TTX-induced effects on mEPSC amplitudes for Rab3A-/- data. However, acknowledging that some increase after activity blockade remains, we describe homeostatic plasticity as being impaired or not significant, rather than abolished, by loss of Rab3A (Abstract, lines 37 to 39; Results, lines 141 to 143; Discussion, lines 415 to 418).
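
      The n-inflation concern behind this change can be illustrated with synthetic numbers. The sketch below is not our analysis and uses no data from the manuscript: the cell means, the within-cell spread, and the use of a Welch t statistic (via the hypothetical helper `welch_t`) are assumptions chosen only to show how treating many samples per cell as independent inflates the test statistic relative to a comparison of cell means.

```python
import numpy as np

def welch_t(a, b):
    """Welch t statistic for the difference in means of two samples."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se

# Synthetic example: three cells per condition, 30 samples per cell.
con_cell_means = np.array([10.0, 12.0, 14.0])
ttx_cell_means = np.array([11.0, 13.0, 15.0])
within_cell = np.linspace(-1.0, 1.0, 30)  # same spread around each cell mean

con_events = (con_cell_means[:, None] + within_cell).ravel()
ttx_events = (ttx_cell_means[:, None] + within_cell).ravel()

t_means = welch_t(ttx_cell_means, con_cell_means)  # n = 3 cells per group
t_pooled = welch_t(ttx_events, con_events)         # n = 90 samples per group
print(t_means, t_pooled)
```

      The difference in means is identical in both comparisons; only the assumed sample size changes, and with it the apparent strength of the evidence.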

      (8) There is a significant increase in baseline mEPSC amplitude in Rab3AEbd/Ebd (15 pA) vs. Rab3AEbd/+ (11 pA) cultures, but not in Rab3A-/- (13.6 pA) vs. Rab3A+/- (13.9 pA). Although the nature of scaling was different between Rab3AEbd/Ebd vs. Rab3AEbd/+ and Rab3AEbd/Ebd with vs. without TTX, the question arises whether the increase in mEPSC amplitude in Rab3AEbd/Ebd is Rab3A dependent. Could a Rab3A independent mechanism occlude scaling?

      The Reviewer is concerned that the increase in mEPSC amplitude in the presence of the Rab3A point mutant may be through a ‘non-Rab3A’ mechanism (a concern raised by the lack of such effect in cultures from the Rab3A-/- mice), and secondly, that the already large mEPSC cannot be further increased by the homeostatic plasticity mechanism. It must always be considered that a mutant with an altered genetic sequence may bind to novel partners, causing activities that would not be either facilitated or inhibited by the original molecule. We have added this caveat to Results, lines 180 to 186. We added that a number of other manipulations, implicating individual molecules in the homeostatic mechanism, have caused an increase in mEPSC amplitude at baseline, potentially nonspecifically occluding the ability of activity blockade to induce a further increase (Results, lines 186 to 189). Still, it is a strong coincidence that the novel activity of the mutant Rab3A would affect mEPSC amplitude, the same characteristic that is affected by activity blockade in a Rab3A-dependent manner, a point which we added to Results, lines 189 to 191.

      (9) Figure 4: NASPM appears to have a stronger effect on mEPSC frequency in the TTX condition vs. control (-40% vs -15%). A larger sample size might be necessary to draw definitive conclusions on the contribution of Ca2+-permeable AMPARs.

      Our results, even with the modest sample size of 11 cells, are clear: NASPM does not disrupt the effect of TTX treatment on mEPSC amplitude (new Fig. 3A). It also looks like there is a greater magnitude effect of NASPM on frequency in TTX-treated cells; we note this, but point out that nevertheless, these mEPSCs are not contributing to the increase in mEPSC amplitude (Results, lines 238-241).

      (11) The change in GluA2 area or fluorescence intensity upon TTX treatment in controls is modest. How does the GluA2 integral change?

      We had reported that GluA2 area showed the most prominent increase following activity blockade, with intensity changing very little. When we examined the integral, it closely matched the change in area. We have added the values for integral to new Fig. 5 D, H; new Fig. 6 A-C; new Fig. 7 A-C and new Table 1 (for GluA2) and new Table 2 (for VGLUT1). These results are described in the text in the following places: Results, lines 289-292; 298-299; 311-319; 328-324. For VGLUT1, both area and intensity changed modestly, and the integral appeared to be a combination of the two, being higher in magnitude and resulting in smaller p values than either area or intensity (Results, lines 344-348; 353-359; new Table 2).

      (12) The quantitative comparison between physiology and microscopy data is problematic. The authors report a mismatch in ratio values between the smallest mEPSC amplitudes and the smallest GluA2 receptor cluster sizes (l. 464; Figure 8). Is this comparison affected by the fluorescence intensity threshold? What was the rationale for a threshold of 400 a.u. or 450 a.u.? How does this threshold compare to the mEPSC threshold of 3 pA?

      This concern is partially addressed by no longer comparing the rank ordered mEPSC amplitudes with the rank ordered GluA2 receptor characteristics. We had used multiple thresholds in the event that an experiment was not analyzable with the chosen threshold (this in fact happened for VGLUT1, see end of this paragraph). We created box plots of the mean GluA2 receptor cluster size, intensity and integral, for experiments in which we used all three thresholds, to determine if the effect of activity blockade was different depending on which threshold was applied, and found that there was no obvious difference in the results (Author response image 1). Nevertheless, since there is no need to use a different threshold for any of the 6 experiments (3 WT and 3 KO), for new Figures 5, 6 and 7 we used the same threshold for all data, 450; described in Methods, lines 746 to 749. For VGLUT1 levels, it was necessary to use a different threshold for Rab3A+/+ Culture #1 (400), but a threshold of 200 for the other five experiments (Methods, lines 751-757). The VGLUT1 immunofluorescent sites in Culture #1 had higher levels overall, and the low threshold caused the entire AOI to be counted as the synapse, which clearly included background levels outside of the synaptic site. Conversely, to use a threshold of 400 on the other experiments meant that the synaptic site found by the automated measurement tool was much smaller than what was visible by eye. In our judgement it would have been meaningless to adhere to a single threshold for VGLUT1 data.

      Author response image 1.

      Using different thresholds does not substantially alter GluA2 receptor cluster size data. A) Rab3A+/+ Culture #1, size data for three different thresholds, depicted above each graph. B) Rab3A+/+ Culture #2, size data for three different thresholds, depicted above each graph. Note scale bar in A is different from B, to highlight differences for different thresholds. (Culture #3 was only analyzed with 450 threshold).

      The conclusion that an increase in AMPAR levels is not fully responsible for the observed mEPSC increase is mainly based on the rank-order analysis of GluA2 intensity, yielding a slope of ~0.9. There are several points to consider here: (i) GluA2 fluorescence intensity did increase on average, as did GluA2 cluster size.

      (ii) The increase in GluA2 cluster size is very similar to the increase in mEPSC amplitude (each approx. 18-20%). (iii) Are there any reports that fluorescence intensity values are linearly reporting mEPSC amplitudes (in this system)? Antibody labelling efficiency, and false negatives of mEPSC recordings may influence the results. The latter was already noted by the authors.

      Our comparison between mEPSC amplitude and GluA2 receptor cluster characteristics has been reexamined in the revised version using means rather than rank-ordered data in rank-order plots or ratio plots. Importantly, all of these methods revealed that in one out of three WT cultures (Culture #3) GluA2 receptor cluster size (old Fig. 8, old Table 1; new Fig. 6, new Table 1), intensity and integral (new Fig. 6, new Table 1) values decreased following activity blockade while in the same culture, mEPSC amplitudes increased. It is based on this lack of correspondence that we conclude that increases in mEPSC amplitude are not fully explained by increases in GluA2 receptors, and suggest there may be other contributors. These points are made in the Abstract (lines 108-110); Results (lines 319 to 326; 330-337; 341-343) and the Discussion (lines 472 to 474). To our knowledge, there are not any reports that quantitatively compare receptor levels (area, intensity or integrals) to mEPSC amplitudes in the same cultures. We examined the comparisons very closely for 5 studies that used TTX to block activity and examined receptor levels using confocal imaging at identified synapses (Hou et al., 2008; Ibata et al., 2008; Jakawich et al., 2010a; Xu and Pozzo-Miller, 2017; Dubes et al., 2022). We were specifically looking for whether the receptor data were more variable than the mEPSC amplitude data, as we found. However, for 4 of the studies, sample sizes were very different so that we cannot simply compare the p values. Below is a table of the comparisons.

      Author response table 1.

      In Xu 2017 the sample sizes are close enough that we feel comfortable concluding that the receptor data were slightly more variable (p < 0.05) than mEPSC data (p < 0.01), but we recognize that it is speculative to say our finding has been confirmed. A discussion of these articles is in Discussion, lines 456-474.
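
      The reason p values from samples of different size cannot be compared directly can be sketched numerically. The snippet below is a minimal illustration, not an analysis of any cited study: the function name `two_sample_p`, the effect size of 0.5, and the group sizes are all hypothetical, and a normal approximation stands in for the tests actually used in those papers. It shows that one and the same standardized effect yields very different p values as the number of cells per group changes.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_p(effect_size: float, n_per_group: int) -> float:
    """Two-sided p value for a standardized mean difference (Cohen's d)
    between two equal-sized groups, using a normal approximation.

    With equal group sizes n and a common standard deviation, the standard
    error of the difference in means is sd * sqrt(2 / n), so the test
    statistic is z = d * sqrt(n / 2).
    """
    z = effect_size * sqrt(n_per_group / 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical values for illustration only; not taken from the cited studies.
for n in (10, 30, 100):
    print(f"n per group = {n:3d}, p = {two_sample_p(0.5, n):.4f}")
```

      Because the p value falls as n grows even though the effect size is fixed, a smaller p value in one data set indicates lower variability only when the sample sizes are comparable.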

      (iv) It is not entirely clear if their imaging experiments will sample from all synapses. Other AMPAR subtypes than GluA2 could contribute, as could kainate or NMDA receptors.

      While our imaging data only examined GluA2, we used the application of NASPM to demonstrate Ca2+-permeable receptors did not contribute quantitatively to the increase in mEPSC amplitude following TTX treatment. Since GluA3 and GluA4 are also Ca2+-permeable, the findings in new Figure 3 (old Fig. 4) likely rule out these receptors as well. There are also reports that kainate receptors are Ca2+-permeable and blocked by NASPM (Koike et al., 1997; Sun et al., 2009), suggesting the NASPM experiment also rules out the contribution of kainate receptors. Finally, given our recording conditions, which included normal magnesium levels in the extracellular solution as well as TTX to block action-potential evoked synaptic transmission, NMDA receptors would not be available to contribute currents to our recordings due to block by magnesium ions at resting Vm. These points have been added to the Methods section, lines 617 to 677 (NMDA); 687-694 (Ca2+-permeable AMPA receptors and kainate receptors).

      Furthermore, the statement “complete lack of correspondence of TTX/CON ratios” is not supported by the data presented (l. 515ff). First, under the assumption that no scaling occurs in Rab3A-/-, the TTX/CON ratios show a 20-30% change, which indicates the variation of this readout. Second, the two examples shown in Figure 8 for Rab3A+/+ are actually quite similar (culture #1 and #2, particularly when ignoring the leftmost section of the data, which is heavily affected by the raw values approaching zero).

      We are no longer presenting ratio plots in the revised manuscript, so our conclusion that mEPSC amplitude data do not always correspond to GluA2 receptor data is no longer based on the difference in behavior of TTX/CON ratio values, but only on the difference in direction of the TTX effect in one out of three cultures. We agree with the reviewer that the ratio plots are much more sensitive to differences between control and treated values than the rank-order plot, and we feel these differences are important; for example, there is still a homeostatic increase in the Rab3A-/- cultures, and the effect is still divergent rather than uniform. But the comparison of ratio data will be presented elsewhere.

      (13) Figure 7A: TTX CDF was shifted to smaller mEPSC amplitude values in Rab3A-/- cultures. How can this be explained?

      While this result is most obvious in CDF plots, we still observe a trend towards smaller mEPSC amplitudes after TTX treatment in two of three individual cultures prepared from Rab3A-/- mice when comparing means (new Fig. 7, Table 1) which did not reach statistical significance for the pooled data (new Fig. 5, new Table 1). There was not any evidence of this decrease in the larger data set (new Fig. 1) nor for Rab3A-/- neurons on Rab3A+/+ glia (new Fig. 8). Given that this effect is not consistent, we did not comment on it in the revised manuscript. It may be that there is a non-Rab3A-dependent mechanism that results in a decrease in mEPSC amplitude after activity blockade, which normally pulls down the magnitude of the activity-dependent increase typically observed. But studying this second component would be difficult given its magnitude and inconsistent presentation.

      Reviewer #1 (Recommendations For the Authors):

      (1) Abstract, last sentence: The conclusion of the present manuscript should be primarily based on the results presented. At present, it is mainly based on a previous publication by the authors.

      We have revised the last sentence to reflect actual findings of the current study (Abstract, lines 47 to 49).

      (2) Line 55: “neurodevelopmental”

      This phrase has been removed.

      (3) Line 56: “AMPAergic” should be replaced by AMPAR-mediated

      This sentence was removed when all references to “scaling” were removed; no other instances of “AMPAergic” are present.

      (4) Figure 9: The use of BioRender should be disclosed in the Figure Legend.

      We used BioRender in new Figures 3, 7 and 8, and now acknowledge BioRender in those figure legends.

      (5) Figure legends and results: The number of cultures should be indicated for each comparison.

      Number of cultures has been added to the figure legends.

      (6) Line 289: A comparison of p-values between conditions does not allow any meaningful conclusions.

      Agreed, therefore we have removed CDFs and the KS test comparison p values. All comparisons in the revised manuscript are for cell means.

      (7) Line 623ff: The argument referring to NMJ data is weak, given that different types of receptors are involved.

      We still think it is valid to point out that Rab3A is required for the increase in mEPC at the NMJ but that ACh receptors do not increase (Discussion, lines 522 to 525). We are not saying that postsynaptic receptors do not contribute in cortical cultures, only that there could be another Rab3A-dependent mechanism that also affects mEPSC amplitude.

      (8) Plotting data points outside of the ranges should be avoided (e.g., Fig. 2Giii, 7F).

      These two figures are no longer present in the revised manuscript. In revising figures, we made sure no other plots have data points outside of the ranges.

      (9) The rationale for investigating Rab3AEbd/Ebd remains elusive and should be described.

      A rationale for investigating Rab3AEbd/Ebd is that if the results are similar to the KO, it strengthens the evidence for Rab3A being involved in homeostatic synaptic plasticity. In addition, since its phenotype of early awakening was stronger than that demonstrated in Rab3A KO mice (Kapfhamer et al., 2002), it was possible we would see a more robust effect. These points have been added to the Results, lines 118 to 126.

      (10) Figures 3 and 4, as well as Figure 5 and 6 could be merged.

      In the revised version, Figure 3 has been eliminated since its main point was a difference in scaling behavior. Figure 4 has been expanded to include a model of how NASPM could reduce frequency (new Fig. 3). Images of the pyramidal cell body have been added to Figure 5 (new Fig. 4), and Figure 6 has been completely revised and now includes pooled data for both Rab3A+/+ and Rab3A-/- cultures, for mEPSC amplitude, GluA2 receptor cluster size, intensity and integral.

      (11) Figure 5: The legend refers to MAP2, but this is not indicated in the figure.

      MAP2 has now been added to the labels for each image and described in the figure legend (new Fig. 4).

      Reviewer #2:

      Technical concerns:

      (1) The culture condition is questionable. The authors saw no NMDAR current present during spontaneous recordings, which is worrisome since NMDARs should be active in cultures with normal network activity (Watt et al., 2000; Sutton et al., 2006). It is important to ensure there is enough spiking activity before doing any activity manipulation. Similarly it is also unknown whether spiking activity is normal in Rab3AKO/Ebd neurons.

      In the studies cited by the reviewer, NMDA currents were detected under experimental conditions in which magnesium was removed. In our recordings, we have normal magnesium (1.3 mM) and also TTX, which prevents the necessary depolarization to allow inward current through NMDA receptors. This point has been added to our Methods, lines 674 to 677. We acknowledge we do not know the level of spiking in cultures prepared from Rab3A+/+, Rab3A-/- or Rab3AEbd/Ebd mice. Given the similar mEPSC amplitude for untreated cultures from WT and KO studies, we think it unlikely that activity was low in the latter, but it remains a possibility for untreated cultures from Rab3AEbd/Ebd mice, where mEPSC amplitude was increased. These points are added to the Methods, lines 615 to 622.

      (2) Selection of mEPSC events is not conducted in an unbiased manner. Manually selecting events is insufficient for cumulative distribution analysis, where small biases could skew the entire distribution. Since the authors claim their ratio plot is a better method to detect the uniformity of scaling than the well-established rank-order plot, it is important to use an unbiased population to substantiate this claim.

      We no longer include any cumulative distributions or ratio plot analysis in the revised version. We have added the following text to Methods, lines 703 to 720:

      “MiniAnalysis selects many false positives with the automated feature when a small threshold amplitude value is employed, due to random fluctuations in noise, so manual re-evaluation of the automated process is necessary to eliminate false positives. If the threshold value is set high, there are few false positives but small amplitude events that visually are clearly mEPSCs are missed, and manual re-evaluation is necessary to add back false negatives or the population ends up biased towards large mEPSC amplitudes. As soon as there is a manual step, bias is introduced. Interestingly, a manual re-evaluation step was applied in a recent study that describes their process as ‘unbiased’ (Wu et al., 2020). In sum, we do not believe it is currently possible to perform a completely unbiased detection process. A fully manual detection process means that the same criterion (“does this look like an mEPSC?”) is applied to all events, not just the false positives, or the false negatives, which prevents the bias from being primarily at one end or the other of the range of mEPSC amplitudes. It is important to note that when performing the MiniAnalysis process, the researcher did not know whether a record was from an untreated cell or a TTX-treated cell.”

      (3) Immunohistochemistry data analysis is problematic. The authors only labeled dendrites without doing cell-fills to look at morphology, so it is questionable how they differentiate branches from pyramidal neurons and interneurons. Since glutamatergic synapse on these two types of neuron scale in the opposite directions, it is crucial to show that only pyramidal neurons are included for analysis.

      We identified neurons with a pyramidal shape and a prominent primary dendrite at 60x magnification without the zoom feature. This should have been made clear in the description of imaging. We have added an image of the two selected cells to our figure of dendrites (old Fig. 5, new Fig. 4), and described this process in the Methods, lines 736 to 739, and Results, lines 246 to 253. Given the morphology of the neurons selected it is highly unlikely that the dendrites we analyzed came from interneurons.

      Conceptual Concerns

      The only novel finding here is the implicated role for Rab3A in synaptic scaling, but insights into mechanisms behind this observation are lacking. The authors claim that Rab3A likely regulates scaling from the presynaptic side, yet there is no direct evidence from data presented. In its current form, this study’s contribution to the field is very limited.

      We have demonstrated that loss of Rab3A and expression of a Rab3A point mutant disrupt homeostatic plasticity of mEPSC amplitudes, and that in the absence of Rab3A, the increase in GluA2 receptors at synaptic sites is abolished. Further, we show that this effect cannot be through release of a factor, like TNFα, from astrocytes. In the new version, we add the finding that VGLUT1 is not increased after activity blockade, ruling out this presynaptic factor as a contributor to homeostatic increases in mEPSC amplitude. We show for the first time by examining mEPSC amplitudes and GluA2 receptors in the same cultures that the increases in GluA2 receptors are not as consistent as the increases in mEPSC amplitude, suggesting the possibility of another contributor to homeostatic increases in mEPSC amplitude. We first proposed this idea in our previous study of Rab3A-dependent homeostatic increases in mEPC amplitudes at the mouse neuromuscular junction. In sum, we dispute that there is only one novel finding and that we have no insights into mechanism. We acknowledge that we have no direct evidence for regulation from the presynaptic side, and have removed this claim from the revised manuscript. We have retained the Discussion of potential mechanisms affecting the presynaptic quantum and evidence that Rab3A is implicated in these mechanisms (vesicle size, fusion pore kinetics; Discussion, lines 537 to 563). One way to directly show that the amount of transmitter released for an mEPSC has been modified after activity blockade is to demonstrate that a fast off-rate antagonist has become less effective at inhibiting mEPSCs (because the increased glutamate released outcompetes it; see Liu et al., 1999 and Wilson et al., 2005 for example experiments).
      This set of experiments is underway but will take more time than originally expected, because we are finding surprisingly large decreases in frequency, possibly the result of mEPSCs with very low glutamate concentration that are completely inhibited by the dose used. Once mEPSCs are lost, it is difficult to compare the mEPSC amplitude before and after application of the antagonist. Therefore we intend to include this experiment in a future report, once we determine the reason for the frequency reduction or can find a dose where this does not occur.

      (1) Their major argument for this is that homeostatic effects on mEPSC amplitudes and GluA2 cluster sizes do not match. This is inconsistent with reports from multiple labs showing that upscaling of mEPSC amplitude and GluA2 accumulation occur side by side during scaling (Ibata et al., 2008; Pozo et al., 2012; Tan et al., 2015; Silva et al., 2019). Further, because the acquisition and quantification methods for mEPSC recordings and immunohistochemistry imaging are entirely different (each with its own limitations in signal detection), it is not convincing that the lack of proportional changes must signify a presynaptic component.

      Within the analyses in the revised manuscript, which are now based only on comparison of cell/dendrite means, we find a very good match in the magnitude of increase for the pooled data of mEPSC amplitudes and GluA2 receptor cluster sizes (+19.7% and +20.0% respectively; new Table 1). However, when looking at individual cultures, we had one of three WT cultures in which mEPSC amplitude increased 17.2% but GluA2 cluster size decreased 9.5%. This result suggests that while activity blockade does lead to an increase in GluA2 receptors, the effect is more variable than that for mEPSC amplitude. We went back to published studies to see if this has been previously observed, but found that it was difficult to compare because the sample sizes were different for the two characteristics (see Author response table 1). We included these particular 5 studies because they use the same treatment (TTX), examine receptors using imaging of identified synaptic sites, and record mEPSCs in their cultures (although the authors do not indicate that imaging and recordings are done simultaneously on the same cultures). Only one of the studies listed by the Reviewer is in our group of 5 (Ibata et al., 2008). The study by Tan et al. (2015) uses western blots to measure receptors; the study by Silva et al. (2019) blocks activity using a combination of AMPA and NMDA receptor blockers; the study by Pozo et al. (2012) correlates mEPSC amplitude changes with imaging, not in response to activity blockade but in response to changing the expression of GluA2. While it may seem like splitting hairs to reject studies that use other treatment protocols, there is ample evidence that the mechanisms of homeostatic plasticity depend on how activity was altered (see Sutton et al., 2006; Soden and Chen, 2010; Fong et al., 2015 for several examples). A discussion of the 5 articles we selected is in the revised manuscript, Discussion, lines 456 to 474.
      In sum, we provide evidence that activity blockade is associated with an overall increase in GluA2 receptors; what we propose is that this increase, being more variable, does not fully explain the increase in mEPSC amplitude. However, we acknowledge that the disparity could be explained by the differences in limitations of the two methods (Discussion, lines 469-472).

      (2) The authors also speculate in the discussion that presynaptic Rab3A could be interacting with retrograde BDNF signaling to regulate postsynaptic AMPARs. Without data showing Rab3A-dependent presynaptic changes after TTX treatment, this argument is not compelling. In this retrograde pathway, BDNF is synthesized in and released from dendrites (Jakawich et al., 2010b; Thapliyal et al., 2022), and it is entirely possible for postsynaptic Rab3A to interfere with this process cell-autonomously.

      We have added the information that Rab3A could control BDNF from the postsynaptic cell and included the two references provided by the reviewer, Discussion, lines 517 to 518. We have added new evidence, recently published, that the Rab3 family has been shown to regulate targeting of EGF receptors to rafts (among other plasma membrane molecules), with Rab3A itself clearly present in nonneuronal cells (Diaz-Rohrer et al., 2023) (added to Discussion, lines 509 to 515).

      (3) The authors propose that a change in AMPAR subunit composition from GluA2-containing ones to GluA1 homomers may account for the distinct changes in mEPSC amplitudes and GluA2 clusters. However, their data from the NASPM wash-in experiments clearly show that the GluA1 homomer contributions have not changed before and after TTX treatment.

      We have revised this section in the Discussion, lines 534 to 536, to clarify that any change due to GluA1 homomers should have been detectable by a greater ability of NASPM to reverse the TTX-induced increase.

      Reviewer #2 (Recommendations for the Authors):

      For authors to have more convincing arguments in general, they will need to clarify/improve certain details in their data collection by addressing the above technical concerns. Additionally, the authors should design experiments to test whether Rab3A regulates scaling from pre- or post-synaptic site. For example, they could sparsely knock out Rab3A in WT neurons to test the postsynaptic possibility. On the other hand, their argument for a presynaptic role would be much more compelling if they could show whether there are clear functional changes such as in vesicle sizes and release probability in the presynaptic terminal of Rab3AKO neurons.

      An important next step is to identify whether Rab3A is acting pre- or post-synaptically (Discussion, lines 572 to 573), but these experiments will be undertaken in the future. It would not add much to simply show vesicle size is altered in the KO (and we do not necessarily expect this since mEPSC amplitude is normal in the KO). It will be very difficult to establish that vesicle size is changing with activity blockade and that this change is prevented in the Rab3A KO, because we are looking for a ~25% increase in vesicle volume, which would correspond to a ~7.5% increase in diameter. Finally, we do not believe that demonstrating changes in release probability would tell us anything about a presynaptic role for Rab3A in regulating the size of the presynaptic quantum.
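
      The relation between the volume and diameter changes quoted above can be checked with a short calculation. It assumes a spherical vesicle, so that volume scales with the cube of diameter; the symbols below are introduced here only for illustration.

```latex
\[
V \propto d^{3}
\quad\Longrightarrow\quad
\frac{d_{\mathrm{TTX}}}{d_{\mathrm{CON}}}
  = \left(\frac{V_{\mathrm{TTX}}}{V_{\mathrm{CON}}}\right)^{1/3}
  = (1.25)^{1/3} \approx 1.077
\]
```

      Under that assumption, a 25% increase in volume corresponds to an increase in diameter of a little under 8%, in line with the approximate figure given above and small enough to be hard to resolve by direct measurement.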

      Reviewer #3 (Public Review)

      Weaknesses: However, the rather strong conclusions on the dissociation of AMPAR trafficking and synaptic response are made from somewhat weaker data. The key issue is the GluA2 immunostaining in comparison with the mEPSC recordings. Their imaging method involves only assessing puncta clearly associated with a MAP2 labeled dendrite. This is a small subset of synapses, judging from the sample micrographs (Fig. 5). To my knowledge, this is a new and unvalidated approach that could represent a particular subset of synapses not representative of the synapses contributing to the mEPSC change (they are also sampling different neurons for the two measurements; an additional unknown detail is how far from the cell body were the analyzed dendrites for immunostaining.) While the authors acknowledge that a sampling issue could explain the data, they still use this data to draw strong conclusions about the lack of AMPAR trafficking contribution to the mEPSC amplitude change. This apparent difference may be a methodological issue rather than a biological one, and at this point it is impossible to differentiate these. It will unfortunately be difficult to validate their approach. Perhaps if they were to drive NMDA-dependent LTD or chemLTP, and show alignment of the imaging and ephys, that would help. More helpful would be recordings and imaging from the same neurons but this is challenging. Sampling from identified synapses would of course be ideal, perhaps from 2P uncaging combined with SEP-labeled AMPARs, but this is more challenging still. But without data to validate the method, it seems unwarranted to make such strong conclusions such as that AMPAR trafficking does not underlie the increase in mEPSC amplitude, given the previous data supporting such a model.

      In the new version, we soften our conclusion regarding the mismatch between GluA2 receptor levels and mEPSC amplitudes, now only stating that receptors may not be the sole contributor to the TTX effect on mEPSC amplitude (Discussion, lines 472 to 474). With our analysis in the new version focusing on comparisons of cell means, the GluA2 receptor cluster size and the mEPSC amplitude data match well in magnitude for the data pooled across the 3 matched cultures (20.0% and 19.7%, respectively, see new Table 1). However, in one of the three cultures the direction of change for GluA2 receptors is opposite that of mEPSC amplitudes (Table 1, Culture #3, -9.5% vs +17.2%, respectively).

      It is unlikely that the lack of matching of homeostatic plasticity in one culture, but very good matching in two other cultures, can be explained by an unvalidated focus on puncta associated with MAP2 positive dendrites. We chose to restrict analysis of synaptic GluA2 receptors to the primary dendrite in order to reduce variability, reasoning that we are always measuring synapses for an excitatory pyramidal neuron, synapses that are relatively close to the cell body, on the consistently identifiable primary dendrite. We measured how far this was for the two cells depicted in old Figure 5 (new Fig. 4). Because we always used the 5X zoom window which is a set length, and positioned it within ~10 microns of the cell body, these cells give a ballpark estimate for the usual distances. For the untreated cell, the average distance from the cell body was 38.5 ± 2.8 µm; for the TTX-treated cell, it was 42.4 ± 3.2 µm (p = 0.35, Kruskal-Wallis test). We have added these values to the Results, lines 270 to 274.

      We did not mean to propose that AMPA receptor levels do not contribute at all to mEPSC amplitude, and we acknowledge there are clear cases where the two characteristics change in parallel (for example, in the study cited by Reviewer #2, (Pozo et al., 2012), increases in GluA2 receptors due to exogenous expression are closely matched by increases in mEPSC amplitudes.) What our matched culture experiments demonstrate is that in the case of TTX treatment, both GluA2 receptors and mEPSC amplitudes increase on average, but sometimes mEPSC amplitudes can increase in the absence of an increase in GluA2 receptors (Culture #3, Rab3A+/+ cultures), and sometimes mEPSC amplitudes do not increase even though GluA2 receptor levels do increase (Culture #3, Rab3A-/- cultures). Therefore, it would not add anything to our argument to examine receptors and mEPSCs in NMDA-dependent LTP, a different plasticity paradigm in which changes in receptors and mEPSCs may more closely align. It has been demonstrated that mEPSCs of widely varying amplitude can be recorded from a single synaptic site (Liu and Tsien, 1995), so we would need to measure a large sample of individual synapse recordings to detect a modest shift in average values due to activity blockade. In addition, it would be essential to express fluorescent AMPA receptors in order to correlate receptor levels in the same cells we record from (or at the same synapses). And yet, even after these heroics, one is still left with the issue that the two methods, electrophysiology and fluorescent imaging, have distinct limitations and sources of variability that may obscure any true quantitative correlation.

      Other questions arise from the NASPM experiments, used to justify looking at GluA2 (and not GluA1) in the immunostaining. First, there is a frequency effect that is quite unclear in origin. One would expect NASPM to merely block some fraction of the post-synaptic current, and not affect pre-synaptic release or block whole synapses. It is also unclear why the authors argue this proves that NASPM was at an effective concentration (lines 399-400). Further, the amplitude data show a strong trend towards smaller amplitude. The p value for both control and TTX neurons was 0.08 – it is very difficult to argue that there is no effect. And the decrease is larger in the TTX neurons. Considering the strong claims for a presynaptic locus and the use of this data to justify only looking at GluA2 by immunostaining, these data do not offer much support of the conclusions. Between the sampling issues and perhaps looking at the wrong GluA subunit, it seems premature to argue that trafficking is not a contributor to the mEPSC amplitude change, especially given the substantial support for that hypothesis. Further, even if trafficking is not the major contributor, there could be shifts in conductance (perhaps due to regulation of auxiliary subunits) that does not necessitate a pre-synaptic locus. While the authors are free to hypothesize such a mechanism, it would be prudent to acknowledge other options and explanations.

      We have created a model cartoon to explain how NASPM could reduce mEPSC frequency (new Fig. 3D). mEPSCs that arise from a synaptic site that has only Ca2+-permeable AMPA receptors will be completely blocked by NASPM, if the NASPM concentration is maximal. The reason we conclude that we have sufficient NASPM reaching the cells is that the frequency is decreased, as expected if there are synaptic sites with only Ca2+-permeable AMPA receptors. We previously were not clear that there is an effect of NASPM on mEPSC amplitude, although it did not reach statistical significance (new Fig. 3B). Where there is no effect is on the TTX-induced increase in mEPSC amplitude, which remains after the acute NASPM application (new Fig. 3A). We have revised the description of these findings in Results, lines 220 to 241. In reviewing the literature further, we could find no previous studies demonstrating an increase in conductance in GluA2 or Ca2+-impermeable receptors, only in GluA1 homomers. In other words, any conductance change would have been due to a change in GluA1 homomers, and should have been visible as a disruption of the homeostatic plasticity by NASPM application. We have added text to Results, lines 211 to 217; 236-241; Discussion, lines 420 to 422; 526-536 and Methods, lines 685 to 695 regarding this point.

      The frequency data are missing from the paper, with the exception of the NASPM dataset. The mEPSC frequencies should be reported for all experiments, particularly given that Rab3A is generally viewed as a pre-synaptic protein regulating release. Also, in the NASPM experiments, the average frequency is much higher in the TTX treated cultures. Is this statistically above control values?

      This comment is addressed by the major change #3, above.

      Unaddressed issues that would greatly increase the impact of the paper:

      (1) Is Rab3A acting pre-synaptically, post-synaptically or both? The authors provide good evidence that Rab3A is acting within neurons and not astrocytes. But knowing where it is acting (pre or post) would aid substantially in understanding its role (and particularly the hypothesized and somewhat novel idea that the amount of glutamate released per vesicle is altered in HSP). They could use sparse knockdown of Rab3A, or simply mix cultures from KO and WT mice (with appropriate tags/labels). The general view in the field has been that HSP is regulated post-synaptically via regulation of AMPAR trafficking, and considerable evidence supports this view. The more support for their suggestion of a pre-synaptic site of control, the better.

      This is similar to the request of Reviewer #2, Recommendations to the Authors. An important next step is to identify whether Rab3A is working pre- or postsynaptically. However, it is possible that it is acting pre-synaptically to anterogradely regulate trafficking of AMPAR, as we have depicted in our model, new Fig. 9. To demonstrate that the presynaptic quantum is being altered, we would need to show that vesicle size is increased, or the amount of transmitter being released during an mEPSC is increased after activity blockade. To that end, we are currently performing experiments using a fast off-rate antagonist. As described above in response to Reviewer #2’s Conceptual Concerns, we find dramatic decreases in frequency not explained by the 30-60% inhibition observed for the largest amplitude mEPSCs, which suggests the possibility that small mEPSCs are more sensitive than large mEPSCs and therefore may have less transmitter. Due to these complexities and the delay while we test other antagonists to see if the effect is specific to fast off-rate antagonists, we are not including these results here.

      (2) Rab3A is also found at inhibitory synapses. It would be very informative to know if HSP at inhibitory synapses is similarly affected. This is particularly relevant as at inhibitory synapses, one expects a removal of GABARs and/or a decrease of GABA-packaging in vesicles (i.e., the opposite of whatever is happening at excitatory synapses). If both processes are regulated by Rab3A, this might suggest a role for this protein more upstream in the signaling; an effect only at excitatory synapses would argue for a more specific role just at these synapses.

      It will be important to determine if homeostatic synaptic plasticity at inhibitory synapses on excitatory neurons is sensitive to Rab3A deletion, especially in light of the fact that unlike many of the other molecules implicated in homeostatic increases in mEPSCs, Rab3A is not a molecule known to be selective for glutamate receptor trafficking (in contrast to Arc/Arg3.1 or GRIP1, for example). Such a study would warrant its own publication.

      Reviewer #3 (Recommendations for the Authors):

      There are a number of minor points or suggestions for the authors:

      Is RIM1 part of this pathway (or expected to be)? Some discussion of this would be nice.

      RIM, Rab3-interacting molecule, has been implicated at the Drosophila neuromuscular junction in a presynaptic form of homeostatic synaptic plasticity in which evoked release is increased after block of postsynaptic receptors (Muller et al., 2012), a plasticity that also requires Rab3-GAP (Muller et al., 2011). To our knowledge there is no evidence that RIM is involved in the homeostatic plasticity of mEPSC amplitude after activity blockade by TTX. The Rim1a KO does not have a change in mEPSC amplitude relative to WT (Calakos et al., 2004), but that is not unexpected given the normal mEPSC amplitude in neurons from cultures prepared from Rab3A-/- mice in the current study. It would be interesting to look at homeostatic plasticity in cortical cultures prepared from Rim1a or other RIM deletion mice, but we have not added these points to the revised manuscript since there are a number of directions one could go in attempting to define the molecular pathway and we feel it is more important to discuss the potential location of action and physiological mechanisms.

      Is the Earlybird mutation a GOF? More information about this mutation would help.

      We have added a description of how the Earlybird mutation was identified, in a screen for rest:activity mutants (Results, lines 118 to 123). Rab3A Earlybird mice have a shortened circadian period, shifting their wake cycle earlier and earlier. When Rab3A deletion mice were tested in the same activity raster plot measurements, the shift was smaller than that for the Earlybird mutant, suggesting the possibility that it is a dominant negative mutation.

      The high K used in the NASPM experiments seems a bit unusual. Have the authors done high K/no drug controls to see if this affects the synapses in any way?

      We used the high K based on previous studies that indicated the blocking effect of the Ca2+-permeable receptor blockers was use-dependent (Herlitze et al., 1993; Iino et al., 1996; Koike et al., 1997). We reasoned that a modest depolarization would increase the frequency of AMPA receptor mEPSCs and allow access of the NASPM. We have added this point to the Methods, lines 695 to 708.

      The NASPM experiments do not show that GluA1 does not contribute (line 401), only that GluA1 homomers are not contributing (much – see above). GluA1/A2 heteromers are quite likely involved. Also, the SEM is missing from the WT pre/post NASPM data.

      Imaging of GluA2-positive sites will not distinguish between GluA2 homomers and GluA2-GluA1 heteromers, so we have added this clarification to Results, lines 242 to 246. We have remade the NASPM pre-post line plots so that the mean values and error bars are more visible (new Fig. 3B, C).

      It seems odd to speculate based on non-significant findings (line 650-1), with lower significance (p = 0.11) than findings being dismissed in the paper (NASPM on mEPSC amplitude; p = 0.08).

      We did not mean to dismiss the effect of NASPM on mEPSC amplitude (new Fig. 3B), rather, we dismiss the effect of NASPM on the homeostatic increase in mEPSC amplitude caused by TTX treatment (new Fig. 3A). We have emphasized this distinction in Results, lines 223 to 225, and Discussion, lines 420 to 422, as well as adding that the stronger effect of NASPM on frequency after TTX treatment suggests an activity-dependent increase in the number of synapses expressing only Ca2+ permeable homomers (Results, lines 236 to 241; Discussion, lines 431 to 435).

      Fig. 4 could be labeled better (to make it clear that B is amplitude and C is freq from the same cells).

      Fig. 4 has been revised—now the amplitude and frequency plots from the same condition (new Fig. 3, B, C; CON or TTX) are in a vertical line and the figure legend states that the frequency data are from the same cells as in Fig. 3A.

      The raw amplitude data seems a bit hidden in the inset panels – I would suggest these data are at least as important as the cumulative distributions in the main panel. Maybe re-organizing the figures would help.

      We have removed all cumulative distributions, rank order plots, and ratio plots. The box plots are now full size in new Figures 1, 2, 5, 6, 7 and 8.

      I’m not sure I would argue in the paper that 12 cells a day is a limiting issue for experiments. It doesn’t add anything and doesn’t seem like that high a barrier. It is fine to just say it is difficult and therefore there is a limited amount of data meeting the criteria.

      We have removed the comment regarding difficulty.

      Calakos N, Schoch S, Sudhof TC, Malenka RC (2004) Multiple roles for the active zone protein RIM1alpha in late stages of neurotransmitter release. Neuron 42:889-896.

      De Gois S, Schafer MK, Defamie N, Chen C, Ricci A, Weihe E, Varoqui H, Erickson JD (2005) Homeostatic scaling of vesicular glutamate and GABA transporter expression in rat neocortical circuits. J Neurosci 25:7121-7133.

      Diaz-Rohrer B, Castello-Serrano I, Chan SH, Wang HY, Shurer CR, Levental KR, Levental I (2023) Rab3 mediates a pathway for endocytic sorting and plasma membrane recycling of ordered microdomains. Proc Natl Acad Sci U S A 120:e2207461120.

      Dubes S, Soula A, Benquet S, Tessier B, Poujol C, Favereaux A, Thoumine O, Letellier M (2022) miR-124-dependent tagging of synapses by synaptopodin enables input-specific homeostatic plasticity. EMBO J 41:e109012.

      Fong MF, Newman JP, Potter SM, Wenner P (2015) Upward synaptic scaling is dependent on neurotransmission rather than spiking. Nat Commun 6:6339.

      Herlitze S, Raditsch M, Ruppersberg JP, Jahn W, Monyer H, Schoepfer R, Witzemann V (1993) Argiotoxin detects molecular differences in AMPA receptor channels. Neuron 10:1131-1140.

      Hou Q, Zhang D, Jarzylo L, Huganir RL, Man HY (2008) Homeostatic regulation of AMPA receptor expression at single hippocampal synapses. Proc Natl Acad Sci U S A 105:775-780.

      Ibata K, Sun Q, Turrigiano GG (2008) Rapid synaptic scaling induced by changes in postsynaptic firing. Neuron 57:819-826.

      Iino M, Koike M, Isa T, Ozawa S (1996) Voltage-dependent blockage of Ca(2+)-permeable AMPA receptors by joro spider toxin in cultured rat hippocampal neurones. J Physiol 496 (Pt 2):431-437.

      Jakawich SK, Neely RM, Djakovic SN, Patrick GN, Sutton MA (2010a) An essential postsynaptic role for the ubiquitin proteasome system in slow homeostatic synaptic plasticity in cultured hippocampal neurons. Neuroscience 171:1016-1031.

      Jakawich SK, Nasser HB, Strong MJ, McCartney AJ, Perez AS, Rakesh N, Carruthers CJ, Sutton MA (2010b) Local presynaptic activity gates homeostatic changes in presynaptic function driven by dendritic BDNF synthesis. Neuron 68:1143-1158.

      Kapfhamer D, Valladares O, Sun Y, Nolan PM, Rux JJ, Arnold SE, Veasey SC, Bucan M (2002) Mutations in Rab3a alter circadian period and homeostatic response to sleep loss in the mouse. Nat Genet 32:290-295.

      Koike M, Iino M, Ozawa S (1997) Blocking effect of 1-naphthyl acetyl spermine on Ca(2+)-permeable AMPA receptors in cultured rat hippocampal neurons. Neurosci Res 29:27-36.

      Liu G, Tsien RW (1995) Properties of synaptic transmission at single hippocampal synaptic boutons. Nature 375:404-408.

      Liu G, Choi S, Tsien RW (1999) Variability of neurotransmitter concentration and nonsaturation of postsynaptic AMPA receptors at synapses in hippocampal cultures and slices. Neuron 22:395-409.

      Muller M, Pym EC, Tong A, Davis GW (2011) Rab3-GAP controls the progression of synaptic homeostasis at a late stage of vesicle release. Neuron 69:749-762.

      Muller M, Liu KS, Sigrist SJ, Davis GW (2012) RIM controls homeostatic plasticity through modulation of the readily-releasable vesicle pool. J Neurosci 32:16574-16585.

      Pozo K, Cingolani LA, Bassani S, Laurent F, Passafaro M, Goda Y (2012) beta3 integrin interacts directly with GluA2 AMPA receptor subunit and regulates AMPA receptor expression in hippocampal neurons. Proc Natl Acad Sci U S A 109:1323-1328.

      Silva MM, Rodrigues B, Fernandes J, Santos SD, Carreto L, Santos MAS, Pinheiro P, Carvalho AL (2019) MicroRNA-186-5p controls GluA2 surface expression and synaptic scaling in hippocampal neurons. Proc Natl Acad Sci U S A 116:5727-5736.

      Soden ME, Chen L (2010) Fragile X protein FMRP is required for homeostatic plasticity and regulation of synaptic strength by retinoic acid. J Neurosci 30:16910-16921.

      Sun HY, Bartley AF, Dobrunz LE (2009) Calcium-permeable presynaptic kainate receptors involved in excitatory short-term facilitation onto somatostatin interneurons during natural stimulus patterns. J Neurophysiol 101:1043-1055.

      Sutton MA, Ito HT, Cressy P, Kempf C, Woo JC, Schuman EM (2006) Miniature neurotransmission stabilizes synaptic function via tonic suppression of local dendritic protein synthesis. Cell 125:785-799.

      Tan HL, Queenan BN, Huganir RL (2015) GRIP1 is required for homeostatic regulation of AMPAR trafficking. Proc Natl Acad Sci U S A 112:10026-10031.

      Thapliyal S, Arendt KL, Lau AG, Chen L (2022) Retinoic acid-gated BDNF synthesis in neuronal dendrites drives presynaptic homeostatic plasticity. Elife 11.

      Wilson NR, Kang J, Hueske EV, Leung T, Varoqui H, Murnick JG, Erickson JD, Liu G (2005) Presynaptic regulation of quantal size by the vesicular glutamate transporter VGLUT1. J Neurosci 25:6221-6234.

      Wu YK, Hengen KB, Turrigiano GG, Gjorgjieva J (2020) Homeostatic mechanisms regulate distinct aspects of cortical circuit dynamics. Proc Natl Acad Sci U S A 117:24514-24525.

      Xu X, Pozzo-Miller L (2017) EEA1 restores homeostatic synaptic plasticity in hippocampal neurons from Rett syndrome mice. J Physiol 595:5699-5712.

    1. Author Response:

      Reviewer #1 (Public review):

      In this study, Deshmukh et al. provide an elegant illustration of Haldane's sieve, the population genetics concept stating that novel advantageous alleles are more likely to fix if dominant because dominant alleles are more readily exposed to selection. To achieve this, the authors rely on a uniquely suited study system, the female-polymorphic butterfly Papilio polytes.

      Deshmukh et al. first reconstruct the chronology of allele evolution in the P. polytes species group, clearly establishing the non-mimetic cyrus allele as ancestral, followed by the origin of the mimetic allele polytes/theseus, via a previously characterized inversion of the dsx locus, and most recently, the origin of the romulus allele in the P. polytes lineage, after its split from P. javanus. The authors then examine the two crucial predictions of Haldane's sieve, using the three alleles of P. polytes (cyrus, polytes, and romulus). First, they report with compelling evidence that these alleles are sequentially dominant, or put in other words, novel adaptive alleles either are or quickly become dominant upon their origin. Second, the authors find a robust signature of positive selection at the dsx locus, across all five species that share the polytes allele.

      In addition to exquisitely exemplifying Haldane's sieve, this study characterizes the genetic differences (or lack thereof) between mimetic alleles at the dsx locus. Remarkably, the polytes and romulus alleles are profoundly differentiated, despite their short divergence time (< 0.5 my), whereas the polytes and theseus alleles are indistinguishable across both coding and intronic sequences of dsx. Finally, the study reports incidental evidence of exon swaps between the polytes and romulus alleles. These exon swaps caused intermediate colour patterns and suggest that (rare) recombination might be a mechanism by which novel morphs evolve.

      This study advances our understanding of the evolution of the mimicry polymorphism in Papilio butterflies. This is an important contribution to a system already at the forefront of research on the genetic and developmental basis of sex-specific phenotypic morphs, which are common in insects. More generally, the findings of this study have important implications for how we think about the molecular dynamics of adaptation. In particular, I found the extensive genetic divergence between the polytes and romulus alleles striking, and it challenges the way I used to think about the evolution of this and other otherwise conserved developmental genes. I think that this study is also a great resource for teaching evolution. By linking classic population genetic theory to modern genomic methods, while using visually appealing traits (colour patterns), this study provides a simple yet compelling example to bring to a classroom.

      In general, I think that the conclusions of the study, in terms of the evolutionary history of the locus, the dominance relationships between P. polytes alleles, and the inference of a selective sweep in spite of contemporary balancing selection, are strongly supported; the data set is impressive and the analyses are all rigorous. I nonetheless think that there are a few ways in which the current presentation of these data could lead to confusion, and should be clarified and potentially also expanded.

      We thank the reviewer for the kind and encouraging assessment of our work.

      (1) The study is presented as addressing a paradox related to the evolution of phenotypic novelty in "highly constrained genetic architectures". If I understand correctly, these constraints are assumed to arise because the dsx inversion acts as a barrier to recombination. I agree that recombination in the mimicry locus is reduced and that recombination can be a source of phenotypic novelty. However, I'm not convinced that the presence of a structural variant necessarily constrains the potential evolution of novel discrete phenotypes. Instead, I'm having a hard time coming up with examples of discrete phenotypic polymorphisms that do not involve structural variants. If there is a paradox here, I think it should be more clearly justified, including an explanation of what a constrained genetic architecture means. I also think that the Discussion would be the place to return to this supposed paradox, and tell us exactly how the observations of exon swaps and the genetic characterization of the different mimicry alleles help resolve it.

      The paradox that we refer to here is essentially the challenge of evolving new, genetically regulated adaptive traits while maintaining the existing adaptive trait(s) at their fitness peak. While one mechanism to achieve this could be differential structural rearrangement at the chromosomal level, it could also arise through alternative alleles or splice variants of a key gene (caste determination in Cardiocondyla ants) or through differential regulation of expression (the spatial regulation of melanization in Nymphalid butterflies by the ivory lncRNA). In each of these cases, a new mutation would have to give rise to a new phenotype without diluting the existing adaptive traits. We focused on structural variants because that was the case in our study system; however, the point we were making referred to the evolution of novel traits in general. We will add a section to the revised Discussion to address this.

      (2) While Haldane's sieve is clearly demonstrated in the P. polytes lineage (with cyrus, polytes, and romulus alleles), there is another allele trio (cyrus, polytes, and theseus) for which Haldane's sieve could also be expected. However, the chronological order in which polytes and theseus evolved remains unresolved, precluding a similar investigation of sequential dominance. Likewise, the locus that differentiates polytes from theseus is unknown, so it's not currently feasible to identify a signature of positive selection shared by P. javanus and P. alphenor at this locus. I, therefore, think that it is premature to conclude that the evolution of these mimicry polymorphisms generally follows Haldane's sieve; of two allele trios, only one currently shows the expected pattern.

      We agree with the reviewer that the genetic basis of f. theseus requires further investigation. f. theseus occupies the same level on the dominance hierarchy of dsx alleles as f. polytes (Clarke and Sheppard, 1972) and the allelic variant of dsx present in both these female forms is identical, so there exists just one trio of alleles of dsx. Based on this evidence, we cannot comment on the origin of forms theseus and polytes. They could have arisen at the same time or sequentially. Since our paper is largely focused on the sequential evolution of dsx alleles through Haldane’s sieve, we have included f. theseus in our conclusions. We think that it fits into the framework of Haldane’s sieve due to its genetic dominance over the non-mimetic female form. However, this aspect needs to be explored further in a more specific study focusing on the characterization, origin, and developmental genetics of f. theseus in the future.

      Reviewer #2 (Public review):

      Summary:

      Deshmukh and colleagues studied the evolution of mimetic morphs in the Papilio polytes species group. They investigate the timing of origin of haplotypes associated with different morphs, their dominance relationships, associations with different isoform expressions, and evidence for selection and recombination in the sequence data. P. polytes is a textbook example of a Batesian mimic, and this study provides important nuanced insights into its evolution, and will therefore be relevant to many evolutionary biologists. I find the results regarding dominance and the sequence of events generally convincing, but I have some concerns about the motivation and interpretation of some other analyses, particularly the tests for selection.

      We thank the reviewer for these insightful remarks.

      Strengths:

      This study uses widespread sampling, large sample sizes from crossing experiments, and a wide range of data sources.

      We appreciate this point. This strength has indeed helped us illuminate the evolutionary dynamics of this classic example of balanced polymorphism.

      Weaknesses:

      (1) Purpose and premise of selective sweep analysis

      A major narrative of the paper is that new mimetic alleles have arisen and spread to high frequency, and their dominance over the pre-existing alleles is consistent with Haldane's sieve. It would therefore make sense to test for selective sweep signatures within each morph (and its corresponding dsx haplotype), rather than at the species level. This would allow a test of the prediction that those morphs that arose most recently would have the strongest sweep signatures.

      Sweep signatures erode over time - see Figure 2 of Moest et al. 2020 (https://doi.org/10.1371/journal.pbio.3000597), and it is unclear whether we expect the signatures of the original sweeps of these haplotypes to still be detectable at all. Moest et al show that sweep signatures are completely eroded by 1N generations after the event, and probably not detectable much sooner than that, so assuming effective population sizes of these species of a few million, at what time scale can we expect to detect sweeps? If these putative sweeps are in fact more recent than the origin of the different morphs, perhaps they would more likely be associated with the refinement of mimicry, but not necessarily providing evidence for or against a Haldane's sieve process in the origin of the morphs.

      Our original plan was to test for signatures of sweeps in individual morphs, but our sample sizes for individual morphs are very small in some species, which made this analysis difficult. We agree that signatures of selective sweeps cannot give us an estimate of the possible timescales of the sweep. They simply indicate that there may have been a sweep in a certain genomic region. Therefore, with just the data from selective sweeps, we cannot determine whether these occurred with the refinement of mimicry or with the origin of the mimetic phenotype itself. We have thus made no interpretations regarding time scales or causal events of the sweep. Additionally, we discuss that the results we obtained for individual alleles may represent what occurred at the point of origin of mimetic resemblance or in the course of perfecting the resemblance, although we cannot differentiate between the two at this point (lines 320 to 333).

      (2) Selective sweep methods

      A tool called RAiSD was used to detect signatures of selective sweeps, but this manuscript does not describe what signatures this tool considers (reduced diversity, skewed frequency spectrum, increased LD, all of the above?). Given the comment above, would this tool be sensitive to incomplete sweeps that affect only one morph in a species-level dataset? It is also not clear how RAiSD could identify signatures of selective sweeps at individual SNPs (line 206). Sweeps occur over tracts of the genome and it is often difficult to associate a sweep with a single gene.

      RAiSD (https://www.nature.com/articles/s42003-018-0085-8) detects selective sweeps using the μ statistic, which is a combined score of SFS, LD, and genetic diversity along a chromosome. The tool is quite sensitive and is able to detect soft sweeps. RAiSD can take a VCF variant file comprising SNP data as input and uses an SNP-driven sliding window approach to scan the genome for signatures of sweeps. Using an SNP file instead of runs of sequences prevents repeated calculations in regions that are sparse in variants, thereby optimizing execution time. Due to the nature of the input we used, the μ statistic was also calculated per site. We then annotated the SNPs based on the genes in which they occur and found that all species showing mimicry had at least one site with a signature of a sweep within the dsx locus.
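
      The SNP-driven windowing idea described above can be illustrated with a minimal sketch. This is not RAiSD's implementation and it does not compute the μ statistic; the function name, the window size, and the example positions are hypothetical and chosen only to show how windows defined by a fixed number of consecutive SNPs skip over variant-free stretches instead of evaluating empty fixed-width windows.

```python
from typing import List, Tuple


def snp_driven_windows(
    positions: List[int], snps_per_window: int, step: int = 1
) -> List[Tuple[int, int, int]]:
    """Slide a window over SNPs (not over base pairs).

    Each window contains exactly `snps_per_window` consecutive SNPs and is
    reported as (first_snp_bp, last_snp_bp, span_bp). Regions without SNPs
    never produce a window of their own; they only widen the span of the
    windows that bridge them.
    """
    if snps_per_window < 2:
        raise ValueError("a window needs at least two SNPs")
    pos = sorted(positions)
    windows = []
    for i in range(0, len(pos) - snps_per_window + 1, step):
        start = pos[i]
        end = pos[i + snps_per_window - 1]
        windows.append((start, end, end - start))
    return windows


# Hypothetical SNP coordinates with a large variant-free gap (900 to 5000).
example_positions = [100, 150, 160, 900, 5000, 5010, 5020, 5030]
windows = snp_driven_windows(example_positions, snps_per_window=4)
for start, end, span in windows:
    print(start, end, span)
```

      A per-window score (in RAiSD, the μ statistic) would then be computed for each of these windows and assigned to a genomic position, which is one way a sweep signal can end up being reported at individual sites.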

      (3) Episodic diversification

      Very little information is provided about the Branch-site Unrestricted Statistical Test for Episodic Diversification (BUSTED) and Mixed Effects Model of Evolution (MEME), and what hypothesis the authors were testing by applying these methods. Although it is not mentioned in the manuscript, a quick search reveals that these are methods to study codon evolution along branches of a phylogeny. Without this information, it is difficult to understand the motivation for this analysis.

      We thank you for bringing this to our notice. We will add a few lines in the Methods about the hypothesis we were testing and the motivation behind this analysis. We will additionally cite a previous study from our group which used these and other methods to study the molecular evolution of dsx across insect lineages.

      (4) GWAS for form romulus

      The authors argue that the lack of SNP associations within dsx for form romulus is caused by poor read mapping in the inverted region itself (line 125). If this is true, we would expect strong association in the regions immediately outside the inversion. From Figure S3, there are four discrete peaks of association, and the location of dsx and the inversion are not indicated, so it is difficult to understand the authors' interpretation in light of this figure.

      We indeed observe the regions flanking dsx showing the highest association in our GWAS. This is a bit tricky to demonstrate in the figure as the genome is not assembled at the chromosome level. However, the association peaks occur on scf 908437033 at positions 2192979, 1181012 and 1352228 (Fig. S3c, Table S3) while dsx is located between 1938098 and 2045969. We will add the position of dsx in the figure legend of the revised manuscript.
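
      The positions quoted above can be placed relative to the dsx interval with a short calculation. The sketch below uses only the scaffold coordinates given in this response; the function and variable names are illustrative.

```python
# Coordinates on the scaffold as quoted in the response (Fig. S3c, Table S3).
DSX_START, DSX_END = 1938098, 2045969
PEAK_POSITIONS = [2192979, 1181012, 1352228]


def offset_from_interval(pos: int, start: int, end: int) -> int:
    """Signed distance (bp) from an interval.

    Returns 0 if `pos` lies inside [start, end], a negative value if it lies
    at a lower coordinate than `start`, and a positive value if it lies at a
    higher coordinate than `end`.
    """
    if pos < start:
        return pos - start
    if pos > end:
        return pos - end
    return 0


offsets = {p: offset_from_interval(p, DSX_START, DSX_END) for p in PEAK_POSITIONS}
for position, offset in offsets.items():
    print(f"peak at {position}: {offset:+d} bp relative to dsx")
```

      None of the three peaks falls inside the dsx interval: one lies about 147 kb beyond its higher-coordinate end, and the other two lie about 586 kb and 757 kb before its lower-coordinate end.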

      (5) Form theseus

      Since there appears to be only one sequence available for form theseus (actually it is said to be "P. javanus f. polytes/theseus"), is it reasonable to conclude that "the dsx coding sequence of f. theseus was identical to that of f. polytes in both P. javanus and P. alphenor" (Line 151)? Looking at the Clarke and Sheppard (1972) paper cited in the statement that "f. polytes and f. theseus show equal dominance" (line 153), it seems to me that their definition of theseus is quite different from that here. Without addressing this discrepancy, the results are difficult to interpret.

      Among the P. javanus individuals that we sampled, we obtained just one individual with f. theseus and the H P allele. However, we were able to add nine more individuals of this form from a previously published study (Zhang et al. 2017) (Fig. S4b and S7). Although we did not show these individuals in Fig 3 (which was based on PCR amplification and sequencing of individual exons of dsx), all the analyses with sequence data were performed on 10 theseus individuals in total. When comparing theseus and polytes dsx alleles, Zhang et al. observed what we now know are species-specific differences rather than allele-specific differences. Our observations were consistent with these findings.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      The authors compared four types of hiPSCs and four types of hESCs at the proteome level to elucidate the differences between hiPSCs and hESCs. Semi-quantitative calculations of protein copy numbers revealed increased protein content in iPSCs. Particularly in iPSCs, mitochondrial and cytoplasmic proteins were suggested to reflect the state of the original differentiated cells to some extent. However, the most important result of this study is the calculation of the protein copy numbers per cell, and the validity of this result is problematic. In addition, several experiments need to be improved, such as using cells of different genders (iPSC: female, ESC: male) in mitochondrial metabolism experiments.

      Strengths: 

      The focus on the number of copies of proteins is exciting and appreciated if the estimated calculation result is correct and biologically reproducible. 

      Weaknesses: 

      The proteome results in this study were likely obtained by simply looking at differences between clones, and the proteome data need to be validated. First, there were only a few clones for comparison, and the gender and number of cells did not match between ESCs and iPSCs. Second, no data show the accuracy of the protein copy number per cell obtained by the proteome data. 

      We agree with the reviewer that it would be useful to have data from more independent stem cell clones, and that an equal gender balance of the donors would be preferable. As usual, practical cost-benefit considerations and the time available affect the scope of work that can be performed. We note that the impact of biological donor sex on proteome expression in iPSC lines has already been addressed in previous studies (ref. 13). We will however revise the manuscript to include specific mention of these limitations and propose a larger-scale follow-up when resources are available.

      Regarding the estimation of protein copy numbers in our study, we would like to highlight that the proteome ruler approach we have used has been employed extensively in the field previously, with direct validation of differences in copy numbers provided using orthogonal methods to MS, e.g., FACS (refs. 2-4, 7, 10). Furthermore, the original manuscript (ref. 14) directly compared the copy numbers estimated using the “proteomic ruler” to spike-in protein epitope signature tags and found remarkable concordance. This original study was performed with an older generation mass spectrometer and reduced peptide coverage, compared with the instrumentation used in our present study. Further, we note that these authors predicted that higher peptide coverage, such as we report in our study, would further increase quantitative performance.
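
      The copy number estimate referred to above can be sketched in a few lines. This is a minimal illustration, assuming the commonly described form of the proteomic ruler in which the summed histone MS signal is scaled to the DNA mass per cell; the 6.5 pg DNA mass (an approximate value for a diploid human cell), the function name, and the example intensities are assumptions for illustration and are not values from this study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ruler_copy_number(
    protein_intensity: float,
    histone_intensity_sum: float,
    molecular_mass_da: float,
    dna_mass_pg: float = 6.5,  # assumed DNA mass per diploid human cell
) -> float:
    """Estimate protein copies per cell with a proteomic-ruler style scaling.

    Assumption: the total histone mass per cell is roughly equal to the DNA
    mass per cell, so the histone MS signal acts as an internal standard that
    converts MS signal into mass per cell.
    """
    if histone_intensity_sum <= 0 or molecular_mass_da <= 0:
        raise ValueError("histone signal and molecular mass must be positive")
    # Mass of this protein per cell, in grams.
    protein_mass_g = (protein_intensity / histone_intensity_sum) * dna_mass_pg * 1e-12
    # Convert grams to moles (1 Da corresponds to 1 g/mol), then to molecules.
    return protein_mass_g * AVOGADRO / molecular_mass_da


# Hypothetical example: a 50 kDa protein carrying 1% of the total histone signal.
copies = ruler_copy_number(
    protein_intensity=1e8, histone_intensity_sum=1e10, molecular_mass_da=50_000
)
print(f"{copies:.3g} copies per cell")
```

      Because the scaling is linear in the protein signal, any error in the assumed DNA mass per cell shifts all copy numbers by the same factor and leaves ratios between proteins within a sample unchanged.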

      Reviewer #2 (Public Review):

      Summary: 

      Pluripotent stem cells are powerful tools for understanding development, differentiation, and disease modeling. The capacity of stem cells to differentiate into various cell types holds great promise for therapeutic applications. However, ethical concerns restrict the use of human embryonic stem cells (hESCs). Consequently, induced human pluripotent stem cells (ihPSCs) offer an attractive alternative for modeling rare diseases, drug screening, and regenerative medicine. A comprehensive understanding of ihPSCs is crucial to establish their similarities and differences compared to hESCs. This work demonstrates systematic differences in the reprogramming of nuclear and non-nuclear proteomes in ihPSCs. 

      We thank the reviewer for the positive assessment.

      Strengths: 

      The authors employed quantitative mass spectrometry to compare protein expression differences between independently derived ihPSC and hESC cell lines. Qualitatively, protein expression profiles in ihPSC and hESC were found to be very similar. However, when comparing protein concentration at a cellular level, it became evident that ihPSCs express higher levels of proteins in the cytoplasm, mitochondria, and plasma membrane, while the expression of nuclear proteins is similar between ihPSCs and hESCs. A higher expression of proteins in ihPSCs was verified by an independent approach, and flow cytometry confirmed that ihPSCs had larger cell sizes than hESCs. The differences in protein expression were reflected in functional distinctions. For instance, the higher expression of mitochondrial metabolic enzymes, glutamine transporters, and lipid biosynthesis enzymes in ihPSCs was associated with enhanced mitochondrial potential, increased ability to uptake glutamine, and increased ability to form lipid droplets. 

      Weaknesses: 

      While this finding is intriguing and interesting, the study falls short of explaining the mechanistic reasons for the observed quantitative proteome differences. It remains unclear whether the increased expression of proteins in ihPSCs is due to enhanced transcription of the genes encoding this group of proteins or due to other reasons, for example, differences in mRNA translation efficiency. Another unresolved question pertains to how the cell type origin influences ihPSC proteomes. For instance, whether ihPSCs derived from fibroblasts, lymphocytes, and other cell types all exhibit differences in their cell size and increased expression of cytoplasmic and mitochondrial proteins. Analyzing ihPSCs derived from different cell types and by different investigators would be necessary to address these questions. 

      We agree with the Reviewer that our study does not extend to also providing a detailed mechanistic explanation for the quantitative differences observed between the two stem cell types and did not claim to have done so. We have now included an expanded section in the discussion where we discuss potential causes. However, in our view fully understanding the reasons for this difference is likely to involve extensive future in-depth analysis in additional studies and is not something that can be determined just by one or two additional supplemental experiments.

      We also agree studying hiPSCs reprogrammed from different cell types, such as blood lymphocytes, would be of great interest. Again, while we agree it is a useful way forward, in practice this will require a very substantial additional commitment of time and resources. We have now included a section discussing this opportunity within the discussion to encourage further research into the area.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) aizi1 and ueah1 clones, which were analyzed in Figure 1A, were excluded from the proteome analysis. In particular, the GAPDH expression level of the aizi1 clone is similar to that of ESCs and different from other iPSC clones. An explanation of how the clones were selected for proteome analysis is needed. Previously, the comparative analysis of iPSCs and ESCs reported in many studies from 2009-2017 (Ref#1-7) has already shown that the number of clones used in the comparative analysis is small, claiming differences (Ref#1-3) and that the differences become indistinguishable when the number of clones is increased (Ref#4-7). Certainly, few studies have been done at the proteome level, so it is important to examine what differences exist in the proteome. Also, it is interesting to focus on the amount of protein per cell. However, if the authors want to describe biological differences, it would be better to get the proteome data in biological duplicate and state the reason for selecting the clones used.

      (1) M. Chin, Cell Stem Cell, 2009, PMID: 19570518

      (2) K. Kim, Nat Biotechnol., 2011, PMID: 22119740

      (3) R. Lister, Nature, 2011, PMID: 21289626

      (4) A.M. Newman, Cell Stem Cell, 2010, PMID: 20682451

      (5) M.G. Guenther, Cell Stem Cell, 2010, PMID: 20682450

      (6) C. Bock, Cell, 2010, PMID: 21295703

      (7) S. Yamanaka, Cell Stem Cell, PMID: 22704507

      We agree with the reviewer that analysing more clones would be beneficial. We have included a section on this topic in the discussion. In our study, we only had access to the 4 hESC lines included; therefore, in the original proteomic study we also analysed 4 hiPSC lines, which were routinely grown within our stem cell facility. Although the stem cell facility expanded the culture of additional hiPSC lines as the study progressed, unfortunately we couldn’t also access additional hESC lines.

      We agree that ideally combining each biological replicate with additional technical replicates would provide extra robustness. As usual, cost and practical considerations at the time the experiments were performed affected the experimental design chosen. For the experimental design, each experiment was contained within 1 batch to avoid the strong batch effects present in TMT (Brenes et al 2019).

      (2) iPSC samples used in the proteome analysis are two types of female and two types of male, while ESC samples are three types of female and one type of female. The number of sexes of the cells in the comparative analysis should be matched because sex differences may bias the results.

      While we agree with the reviewer in principle, we have previously performed detailed comparisons of proteome expression in many independent iPSC lines from both biological male and female donors (see Brenes et al., Cell Reports 2021), and it seems unlikely that biological sex differences alone could account for the proteome differences between iPSC and ESC lines uncovered in this study. However, as this is a relevant point, we have revised the manuscript to explicitly mention this caveat within the discussion section.

      (3) In Figure 1h, I suspect that the variation of PCA plots is very similar between ESCs and iPSCs. In particular, the authors wrote "copy numbers for all 8 replicates" in the legend, but if Figure 1b was done 8 times, there should be 8 types of cells x 8 measurements = 64 points. Even if iPSCs and ESCs are grouped together, there should be 8 points for each cell type. Is it possible that there is only one TMT measurement for this analysis? If so, at least technical duplicates or biological duplicates would be necessary. I also think each cell should be plotted in the PCA analysis instead of combining the four types of ESCs and iPSCs into one.

      We thank the reviewer for bringing this error to our attention. The legend has been corrected to state, “for all 8 stem cell lines”. Each dot represents the proteome of each of the 4 hESCs and 4 hiPSCs that were analysed using proteomics.

      (4) It is necessary to show what functions are enriched in the 4408 proteins whose protein copies per cell were increased in the iPSCs obtained in Figure 2B.

      The enrichment analysis requested has been performed and is now included as a new supplemental figure 2. We find it very interesting that despite the large number of proteins involved here (4,408), the enrichment analysis still shows clear enrichment for specific cellular processes. The summary plot using affinity propagation within webgestalt is included here:

      Author response image 1.
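
      For readers unfamiliar with over-representation analysis, the following is a minimal sketch of a hypergeometric enrichment test, which is a common basis for this kind of analysis. Whether it matches the exact settings of the WebGestalt run reported here is an assumption, and the function name and the example term counts are hypothetical; only the totals of roughly 7,900 quantified and 4,408 increased proteins come from the text.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    population: all quantified proteins (the background)
    annotated:  background proteins carrying the term of interest
    selected:   proteins in the list being tested
    overlap:    selected proteins that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical term: 300 of 7,900 background proteins are annotated,
# and 220 of them fall within the 4,408 increased proteins.
p_value = hypergeom_enrichment_p(7900, 300, 4408, 220)
```

      A small p-value here would indicate that the term is more frequent in the selected list than expected by chance; in practice a multiple-testing correction across all terms would also be applied.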

      (5) The Proteomic Ruler method used in this study is a semi-quantitative method to calculate protein copy numbers and is a concentration estimation method. Therefore, if the authors want to have a biological discussion based on the results, they need to show that the estimated concentrations are correct. For example, there are Western Blotting (WB) results for genes with no change in protein levels in hESC and hiPSC in Fig. 6ij, but the WB results for the group of genes that are claimed to have changed are not shown throughout the paper. Also, there is no difference in the total protein level between iPSCs and ESCs from the ponceau staining in Fig.6ij. WB results for at least a few genes are needed to show whether the concentration estimates obtained from the proteome analysis are plausible. If the protein per cell is increased in these iPSC clones, performing WB analysis using an equal number of cells would be better.

      Regarding the ‘proteome ruler’ approach, we would like to highlight that this method has previously been used extensively in the field, with detailed validation, as already explained above. It is also not ‘semi-quantitative’: it can estimate absolute abundance as well as concentrations. Our work does not use the concentration formulas, but rather the estimation of protein copy numbers, which was shown to closely match the copy numbers determined when spike-ins are used (ref. 14).
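
      To illustrate the kind of copy-number estimate referred to here, the following is a minimal sketch of a histone-based ruler calculation. It assumes that the summed histone MS signal scales with the DNA mass per cell, and it uses an assumed DNA mass of about 6.5 pg for a diploid human cell. The function name, argument names and example values are hypothetical, and this is not the pipeline used in the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def ruler_copy_numbers(intensities, molar_masses, histone_ids, dna_mass_pg=6.5):
    """Estimate protein copies per cell from MS intensities.

    intensities:  dict of protein id -> summed MS intensity
    molar_masses: dict of protein id -> molar mass in g/mol
    histone_ids:  ids treated as histones (hypothetical selection)
    dna_mass_pg:  assumed DNA mass per cell in picograms
    """
    histone_signal = sum(intensities[p] for p in histone_ids)
    copies = {}
    for protein, signal in intensities.items():
        # Protein mass per cell in grams, scaled by the histone "ruler".
        mass_g = signal / histone_signal * dna_mass_pg * 1e-12
        copies[protein] = mass_g / molar_masses[protein] * AVOGADRO
    return copies


# Hypothetical two-protein example.
example = ruler_copy_numbers(
    intensities={"H4": 100.0, "P1": 50.0},
    molar_masses={"H4": 11_000.0, "P1": 50_000.0},
    histone_ids=["H4"],
)
```

      Because the scaling factor is derived from the histone signal within each sample, the estimate does not depend on how many cells were loaded, which is the property the response relies on.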

      In providing additional validation by Western blotting (WB), we prioritised proteins related to pluripotency markers, which are vital for determining the pluripotency state of the hESCs and hiPSCs, as well as histone markers. We have included a section in the discussion concerning additional validation data and agree in general that further validation is always useful.

      (6) Regarding the experiment shown in Figure 4l, the gender of iPSC used (wibj2) is female and WA01 (H1; WA01) is male. Certainly, there is a difference in the P/E control ratio, but isn't this just a gender difference? The sexes of the cells need to be matched.

      We accept that the sexes of donors should ideally have been matched and have mentioned this within the discussion. Nonetheless, as previously mentioned, our earlier detailed proteomic analyses of multiple hiPSC lines (ref. 13) derived from both biological male and female donors provide relevant evidence that the results shown in this study are not simply a reflection of the sex of the donors for the respective iPSC and ESC lines. When comparing eroded and non-eroded female hiPSCs to male hiPSCs, we found no significant differences between males and females in any electron transport chain proteins, nor in TCA proteins.

      Minor comments:

      (1) Method: Information on the hiPSCs and hESCs used in this study should be described. In particular, the type of differentiated cells, gender, and protocols that were used in the reprogramming are needed.

      We agree with the reviewer on this. The hiPSC lines were generated by the HipSci consortium, as described in the flagship HipSci paper (ref. 15), which specifies in great detail the reprogramming protocols and quality control measures, including analysis of copy number variations. However, we agree that this information may not be easily accessible for readers and that it is relevant to include it explicitly in our present manuscript, instead of expecting readers to consult the flagship paper. These details have therefore been added to the revised version.

      (2) Method: In Figure1a, Figure 6i, j, the antibody information of Nanog, Oct4, Sox2, and Gapdh is not written in the method and needs to be shown.

      The data relating to these has now been included within the methods section.

      (3) Method: In Figure 1b and other figures, the authors should indicate which iPSC corresponds to which TMT label; the data in the Supplemental Table also needs to indicate which data is which clone.

      We have now added this to the methods section.

      (4) Method: The method of the FACS experiment used in Figure 2 should be described.

      The methods related to the FACS analysis have now been included within the manuscript.

      (5) Method: The cell name used in the mitochondria experiment shown in Figure 4 is listed as WA01, which is thought to be H1. Variations in notation should be corrected.

      This has now been corrected.

      (6) Method: The name of the cell clone shown in Figure 3l,m should be mentioned.

      We have now added these details on the corresponding figure and legend.

      Reviewer #2 (Recommendations For The Authors):

      This study utilized quantitative mass spectrometry to compare protein expression in independently derived 4 hiPSC and 4 hESC cell lines. The investigation quantified approximately 7,900 proteins, and employing the "Proteome ruler" approach, estimated protein copy numbers per cell. Principal component analyses, based on protein copy number per cell, clearly separated hiPSC and hESC, while different hiPSCs and hESCs grouped together. The study revealed a global increase in the expression of cytoplasmic, mitochondrial, membrane transporters, and secreted proteins in hiPSCs compared to hESCs. Interestingly, standard median-based normalization approaches failed to capture these differences, and the disparities became apparent only when protein copy numbers were adjusted for cell numbers. Increased protein abundance in hiPSC was associated with augmented ribosome biogenesis. Total protein content was >50% higher in hiPSCs compared to hESCs, an observation independently verified by total protein content measurement via the EZQ assay and further supported by the larger cell size of hiPSCs in flow cytometry. However, the cell cycle distribution of hiPSC and hESC was similar, indicating that the difference in protein content was not due to variations in the cell cycle. At the phenotypic level, differences in protein expression also correlated with increased glutamine uptake, enhanced mitochondrial potential, and lipid droplet formation in hiPSCs. hiPSCs also expressed higher levels of extracellular matrix components and growth factors.

      Overall, the presented conclusions are adequately supported by the data. Although the mechanistic basis of proteome differences in hiPSC and hESC is not investigated, the work presents interesting findings that are worthy of publication. Below, I have listed my specific questions and comments for the authors.

      (1) Figure 1a displays immunoblots from 6 iPSC and 4 ESC cell lines, with 8 cell lines (4 hESC, 4 hiPSC) utilized in proteomic analyses (Fig. 1b). The figure legend should specify the 8 cell lines included in the proteomic analyses. The manuscript text describing these results should explicitly mention the number and names of cell lines used in these assays.

      We agree with the reviewer and have now marked in figure 1 all the lines that were used for proteomics and have added a section in the methods specifying which cell lines were analysed in each TMT channel.

      (2) In most figures, the quantitative differences in protein expression between hiPSC and hESC are evident, and protein expression is highly consistent among different hiPSCs and hESCs. However, the glutamine uptake capacity of different hiPSC cell lines, and to some extent hESC cell lines, appears highly variable (Figure 3e). While proteome changes were measured in 4 hiPSCs and 4 hESCs, the glutamine uptake assays were performed on a larger number of cell lines. The authors should clarify the number of cell lines used in the glutamine uptake assay, clearly indicating the cell lines used in the proteome measurements. Given the large variation in glutamine uptake among different cell lines, it would be useful to plot the correlation between the expression of glutamine transporters and glutamine uptake in individual cell lines. This may help understand whether differences in glutamine uptake are related to variations in the expression of glutamine transporters.

      The “proteomic ruler” has the capacity to estimate protein copy numbers per cell; as such, changes in the absolute number of cells analysed do not cause major complications in quantification. Furthermore, TMT-based proteomics is the most precise proteomics method available, where the same peptides are detected in all samples across the same data points and peaks, as long as the analysis is done within a single batch, as is the case here.

      The glutamine uptake assay is much more sensitive to variation in the number of cells. The number of cells was estimated by plating approximately 5e4 cells two days before the assay, which creates variability. Furthermore, hESCs and hiPSCs are more adhesive than the cells used in the original protocol; hence the quench data were noisier for these lines, making the data from the assay more variable.

      (3) In Figure 4j, it would be helpful to indicate whether the observed differences in the respiration parameters are statistically significant.

      We have now modified the plot to show which proteins were significantly different.

      (4) The iPSCs used here are generated from human primary skin fibroblasts. Different cells vary in size; for instance, fibroblast cells are generally larger than blood lymphocytes. This raises the question of whether the parent cell origin impacts differences in hiPSCs and hESC proteomes. For example, do the authors anticipate that hiPSCs derived from small somatic cells would also display higher expression of cytoplasmic, mitochondrial, and membrane transporters compared to ESC? The authors may consider discussing this point.

      This is a very interesting point. We have now added an extension to the discussion focussed on this subject.

      (5) One wonders if the "Proteome ruler" approach could be applied retrospectively to previously published hiPSC and hESC proteome data, confirming higher expression of cytoplasmic and mitochondrial proteins in hiPSCs, which may have been masked in previous analyses due to median-based normalization.

      We agree with the reviewer and think this is a very good suggestion. Unfortunately, in the main proteomic papers comparing hESCs and hiPSCs (refs. 16, 17) the authors did not upload their raw files to a public repository (as it was not mandatory at that time), and they also used the International Protein Index (IPI), which is a discontinued database. The raw files therefore cannot be reprocessed, and the database does not match the modern SwissProt entries. Reprocessing the previous data was thus impractical.
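
      The masking effect raised in this point can be shown with a small numerical sketch. It assumes a uniform 50% increase in every protein, which is a deliberate simplification, and all values and names are hypothetical rather than taken from the dataset.

```python
from statistics import median


def median_normalise(values):
    """Divide every value by the sample median."""
    m = median(values)
    return [v / m for v in values]


# Hypothetical copies per cell for five proteins in an hESC sample.
hesc = [1.0e4, 5.0e4, 2.0e5, 8.0e5, 3.0e6]
# Hypothetical hiPSC sample with every protein increased by 50%.
hipsc = [1.5 * v for v in hesc]

# Ratios computed on copies per cell keep the global increase.
copy_ratio = [b / a for a, b in zip(hesc, hipsc)]

# Ratios computed after median normalisation lose it.
norm_ratio = [
    b / a for a, b in zip(median_normalise(hesc), median_normalise(hipsc))
]
```

      In this toy case every entry of `copy_ratio` is 1.5, whereas every entry of `norm_ratio` is 1 up to rounding, so a global change in protein content per cell is invisible after median normalisation.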

      (6) The work raises a fundamental question: what is the mechanistic basis for the higher expression of cytoplasmic and mitochondrial proteins in hiPSCs? Conceivably, this could be due to two reasons: (a) Genes encoding cytoplasmic and mitochondrial proteins are expressed at a higher level in hiPSCs compared to hESC. (b) mRNAs encoding cytoplasmic and mitochondrial proteins are translated at a higher level in hiPSCs compared to hESC. The authors may check published transcriptome data from the same cell lines to shed light on this point.

      This is a very interesting point. We believe that the reprogrammed cells contained mature mitochondria, which do not fully regress upon reprogramming, and that this can establish a growth advantage in the normoxic environments in which the cells are grown. Unfortunately, the available transcriptomic data lacked spike-ins and thus only enable comparison of concentrations, not of copy numbers (ref. 13). Therefore, we could not determine from the available data whether there was an increase in the copies of specific mRNAs. However, this would be very interesting to analyse in a future study with a transcriptomic dataset that includes spike-ins.

      Reviewer #3 (Recommendations For The Authors):

      It is unclear whether changes in protein levels relate to any phenotypic features of cell lines used. For example, the authors highlight that increased protein expression in hiPSC lines is consistent with the requirement to sustain high growth rates, but there is no data to demonstrate whether hiPSC lines used indeed have higher growth rates.

      We respectfully disagree with the reviewer on this point. Our data show that hESCs and hiPSCs show significant differences in protein mass and cell size, with the MS data validated by the EZQ assay and FACS, while having no significant differences in their cell cycle profiles. Thus, increased size and protein content would require higher growth rates to sustain the increased mass, which is what we observe.

      The authors claim that the cell cycle of the lines is unchanged. However, no details of the method for assessing the cell cycle were included so it is difficult to appreciate if this assessment was appropriately carried out and controlled for.

      We apologise for this omission; the details have been included in the revised version of the manuscript.

      Details and characterisation of iPSC and ESC lines used in this study are overall lacking. The lines used are merely listed in methods, but no references are included for published lines, how lines were obtained, what passage they were used at, their karyotype status etc. For details of basic characterisation, the authors should refer to the ISSCR Standards for the use of human stem cells in research. In particular, the authors should consider whether any of the changes they see may be attributed to copy number variants in different lines.

      We agree with the reviewer on this and refer to the reply above concerning this issue.

      The expression data for markers of undifferentiated state in Figure 1a would ideally be shown by immunocytochemistry or flow cytometry as it is impossible to tell whether cultures are heterogeneous for marker expression.

      We agree with the reviewer on this. FACS is indeed much more quantitative and a better method to study heterogeneity. However, we did not have protocols to study these markers using FACS.

      TEM analysis should ideally be quantified.

      We agree with the reviewer that it would be nice to have a quantitative measure.

      All figure legends should explicitly state what graphs are representing (e.g. average/mean; how many replicates (biological or technical), which lines)? Some data is included in Methods (e.g. glutamine uptake), but not for all of the data (e.g. TEM).

      We agree with the reviewer. This has been corrected in the revised version of the manuscript, with additional details included.

      Validation experiments were performed typically on one or two cell lines, but the lines used were not consistent (e.g. wibj_2 versus H1 for respirometry and wibj_2, oaqd_3 versus SA121 and SA181 for glutamine uptake). Can the authors explain how the lines were chosen?

      The validation experiments were performed at different time points, and the selection of lines reflected the availability of hiPSC and hESC lines within our stem cell facility at a given point in time.

      We chose to use a range of different lines for comparison, rather than always comparing only one set of lines, to try to avoid a possible bias in our conclusions and thus to make the results more general.

      The authors should acknowledge the need for further functional validation of the results related to immunosuppressive proteins.

      We agree with the reviewer and have added a sentence in the discussion making this point explicitly.

      Differences in H1 histones abundance were highlighted. Can the authors speculate as to the meaning of these differences?

      Regarding H1 histones, neither our study of the literature nor discussions with chromatin and histone experts, both within our institute and externally, have shed light on what the differences could imply. We therefore think that this is a striking and interesting result that merits further study, but we have not yet been able to formulate a clear hypothesis on the consequences.

      (1) Howden, A. J. M. et al. Quantitative analysis of T cell proteomes and environmental sensors during T cell differentiation. Nat Immunol, doi:10.1038/s41590-019-0495-x (2019).

      (2) Marchingo, J. M., Sinclair, L. V., Howden, A. J. & Cantrell, D. A. Quantitative analysis of how Myc controls T cell proteomes and metabolic pathways during T cell activation. Elife 9, doi:10.7554/eLife.53725 (2020).

      (3) Damasio, M. P. et al. Extracellular signal-regulated kinase (ERK) pathway control of CD8+ T cell differentiation. Biochem J 478, 79-98, doi:10.1042/BCJ20200661 (2021).

      (4) Salerno, F. et al. An integrated proteome and transcriptome of B cell maturation defines poised activation states of transitional and mature B cells. Nat Commun 14, 5116, doi:10.1038/s41467-023-40621-2 (2023).

      (5) Antico, O., Nirujogi, R. S. & Muqit, M. M. K. Whole proteome copy number dataset in primary mouse cortical neurons. Data Brief 49, 109336, doi:10.1016/j.dib.2023.109336 (2023).

      (6) Edwards, W. et al. Quantitative proteomic profiling identifies global protein network dynamics in murine embryonic heart development. Dev Cell 58, 1087-1105 e1084, doi:10.1016/j.devcel.2023.04.011 (2023).

      (7) Barton, P. R. et al. Super-killer CTLs are generated by single gene deletion of Bach2. Eur J Immunol 52, 1776-1788, doi:10.1002/eji.202249797 (2022).

      (8) Phair, I. R., Sumoreeah, M. C., Scott, N., Spinelli, L. & Arthur, J. S. C. IL-33 induces granzyme C expression in murine mast cells via an MSK1/2-CREB-dependent pathway. Biosci Rep 42, doi:10.1042/BSR20221165 (2022).

      (9) Niu, L. et al. Dynamic human liver proteome atlas reveals functional insights into disease pathways. Mol Syst Biol 18, e10947, doi:10.15252/msb.202210947 (2022).

      (10) Murugesan, G., Davidson, L., Jannetti, L., Crocker, P. R. & Weigle, B. Quantitative Proteomics of Polarised Macrophages Derived from Induced Pluripotent Stem Cells. Biomedicines 10, doi:10.3390/biomedicines10020239 (2022).

      (11) Ryan, D. G. et al. Nrf2 activation reprograms macrophage intermediary metabolism and suppresses the type I interferon response. iScience 25, 103827, doi:10.1016/j.isci.2022.103827 (2022).

      (12) Nicolas, P. et al. Systems-level conservation of the proximal TCR signaling network of mice and humans. J Exp Med 219, doi:10.1084/jem.20211295 (2022).

      (13) Brenes, A. J. et al. Erosion of human X chromosome inactivation causes major remodeling of the iPSC proteome. Cell Rep 35, 109032, doi:10.1016/j.celrep.2021.109032 (2021).

      (14) Wisniewski, J. R., Hein, M. Y., Cox, J. & Mann, M. A "proteomic ruler" for protein copy number and concentration estimation without spike-in standards. Mol Cell Proteomics 13, 3497-3506, doi:10.1074/mcp.M113.037309 (2014).

      (15) Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370-375, doi:10.1038/nature22403 (2017).

      (16) Phanstiel, D. H. et al. Proteomic and phosphoproteomic comparison of human ES and iPS cells. Nat Methods 8, 821-827, doi:10.1038/nmeth.1699 (2011).

      (17) Munoz, J. et al. The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells. Mol Syst Biol 7, 550, doi:10.1038/msb.2011.84 (2011).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Question 1: The experiment that utilizes lactose or glucose supplementation to infer the importance of carbohydrate recognition by galectin-9 cannot be interpreted unequivocally owing to the growth-enhancing effect of lactose supplementation on Mtb during liquid culture in vitro.

      Thank you for this very constructive comment. We repeated the experiments after lowering the concentration of lactose or AG from 10 μg/mL to 1 μg/mL. At this low concentration, lactose or AG had a negligible effect on Mtb growth but still reversed the inhibitory effect of galectin-9 on mycobacterial growth (revised Fig. 2A, C). We therefore consider that supplementation with lactose or AG reverses galectin-9-mediated inhibition of Mtb growth largely through carbohydrate recognition rather than through a growth-enhancing effect.

      Question 2: Similar to the comment above, the apparent dose-independent effect of galectin-9 on Mtb growth in vitro is difficult to reconcile with the interpretation that galectin is functioning as claimed.

      We thank the reviewer for the correction. Indeed, as the reviewer pointed out, galectin-9 inhibits Mtb growth in a dose-independent manner. We have corrected the claim in the revised manuscript (Line 114).

      Question 3: The claimed differences in galectin-9 concentration in sera from tuberculin skin test (TST)-negative or TST-positive non-TB cases versus active TB patients are not immediately apparent from the data presented.

      We appreciate your concern. The previous samples were from a cohort set up at the Max Planck Institute for Infection Biology. We have now measured galectin-9 in sera from another, independent cohort of active TB patients and healthy donors in China, and found a higher abundance of galectin-9 in serum from TB patients than in serum from healthy donors (revised Fig. 1E).

      Question 4: Neither fluorescence microscopy nor electron microscopy analyses are supported by high-quality, interpretable images which, in the absence of supporting quantitative data, renders any claims of anti-AG mAb specificity (fluorescence microscopy) or putative mAb-mediated cell wall swelling (electron microscopy) highly speculative.

      We appreciate your concern. We have improved the procedure of the immunofluorescence assay and obtained high-quality, interpretable images with quantitative data (revised Fig. 4F). As for the electron microscopy analyses, we added clearer labels indicating the cell wall in the revised manuscript (revised Fig. 7C).

      Question 5: Finally, the absence of any discussion of how anti-AG antibodies (similarly, galectin-9) gain access to the AG layer in the outer membrane of intact Mtb bacilli (which may additionally possess an extracellular capsule/coat) is a critical omission - situating these results in the context of current knowledge about Mtb cellular structure (especially the mycobacterial outer membrane) is essential for plausibility of the inferred galectin-9 and anti-AG mAb activities.

      Indeed, AG is hidden by mycolic acids in the outer layer of the Mtb cell wall. As discussed in the Discussion of the previous manuscript (line 285), we speculate that during Mtb replication, cell wall synthesis is active and AG becomes exposed, thereby facilitating its binding to galectin-9 or AG antibody and leading to Mtb growth arrest. It is highly possible that galectin-9 or AG antibody targets replicating Mtb.

      To Reviewer #2 (Public Review):

      Question 1: In light of other observations that cleaved galectin-9 levels in the plasma are a biomarker for severe infection (Padilla A et al Biomolecules 2021 and Iwasaki-Hozumi H et al. Biomolecules 2021) it is difficult to reconcile the authors' interpretation that the elevated gal-9 in Active TB patients (Figure 1E) contributes to the maintenance of latent infection in humans. The authors should consider incorporating these observations in the interpretation of their own results.

      Thank you for these very insightful comments. We observed elevated levels of galectin-9 in the serum of active TB patients, consistent with reports indicating that cleaved galectin-9 levels in the serum serve as a biomarker for severe infection (Iwasaki-Hozumi et al., 2021; Padilla et al., 2020). We consider that the elevated levels of galectin-9 in the serum of active TB patients may be an indicator of the host immune response to Mtb infection; however, the magnitude of the elevation is not sufficient to control Mtb infection and maintain latent infection. This is highly similar to other protective immune factors, such as interferon gamma, which is also elevated in active TB (El-Masry et al., 2007; Hasan et al., 2009). We have included this discussion in the revised manuscript (line 298).

      Question 2: The anti-AG titers were measured only in individuals with active TB (Figure 3C), generally thought to be a less protective immunological state. The speculation that individuals with anti-AG titers have some protection is not founded. Further only 2 mAbs were tested to demonstrate restriction of Mtb in culture. It is possible that clones of different affinities for AG present within a patient's polyclonal AG-antibody responses may or may not display a direct growth restriction pressure on Mtb in culture. The authors should soften the claims about the presence of AG-titers in TB patients being indicative of protection.

      We appreciate your concern. As per your suggestion, we have softened the claim to: “We speculate that during Mtb infection, anti-AG IgG antibodies are induced, which potentially contribute to protection against TB by directly inhibiting Mtb replication albeit seemingly in vain.”

      References

      El-Masry, S., Lotfy, M., Nasif, W.A., El-Kady, I.M., and Al-Badrawy, M. (2007). Elevated serum level of interleukin (IL)-18, interferon (IFN)-gamma and soluble Fas in patients with pulmonary complications in tuberculosis. Acta microbiologica et immunologica Hungarica 54, 65-77.

      Hasan, Z., Jamil, B., Khan, J., Ali, R., Khan, M.A., Nasir, N., Yusuf, M.S., Jamil, S., Irfan, M., and Hussain, R. (2009). Relationship between circulating levels of IFN-gamma, IL-10, CXCL9 and CCL2 in pulmonary and extrapulmonary tuberculosis is dependent on disease severity. Scandinavian journal of immunology 69, 259-267.

      Iwasaki-Hozumi, H., Chagan-Yasutan, H., Ashino, Y., and Hattori, T. (2021). Blood Levels of Galectin-9, an Immuno-Regulating Molecule, Reflect the Severity for the Acute and Chronic Infectious Diseases. Biomolecules 11.

      Padilla, S.T., Niki, T., Furushima, D., Bai, G., Chagan-Yasutan, H., Telan, E.F., Tactacan-Abrenica, R.J., Maeda, Y., Solante, R., and Hattori, T. (2020). Plasma Levels of a Cleaved Form of Galectin-9 Are the Most Sensitive Biomarkers of Acquired Immune Deficiency Syndrome and Tuberculosis Coinfection. Biomolecules 10.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:  

      This paper investigates the relationship between ocular drift - eye movements long thought to be random - and visual acuity. This is a fundamental issue for how vision works. The work uses adaptive optics retinal imaging to monitor eye movements and where a target object is in the cone photoreceptor array. The surprising result is that ocular drift is systematic - causing the object to move to the center of the cone mosaic over the course of each perceptual trial. The tools used to reach this conclusion are state-of-the-art and the evidence presented is convincing.

      Strengths  

      P1.1. The central question of the paper is interesting, as far as I know, it has not been answered in past work, and the approaches employed in this work are appropriate and provide clear answers.

      P1.2. The central finding - that ocular drift is not a completely random process - is important and has a broad impact on how we think about the relationship between eye movements and visual perception.

      P1.3. The presentation is quite nice: the figures clearly illustrate key points and have a nice mix of primary and analyzed data, and the writing (with one important exception) is generally clear.

      Thank you for your positive feedback.

      Weaknesses

      P1.4. The handling of the Nyquist limit is confusing throughout the paper and could be improved. It is not clear (at least to me) how the Nyquist limit applies to the specific task considered. I think of the Nyquist limit as saying that spatial frequencies above a certain cutoff set by the cone spacing are being aliased and cannot be disambiguated from the structure at a lower spatial frequency. In other words, there is a limit to the spatial frequency content that can be uniquely represented by discrete cone sampling locations. Acuity beyond that limit is certainly possible with a stationary image - e.g. a line will set up a distribution of responses in the cones that it covers, and without noise, an arbitrarily small displacement of the line would change the distribution of cone responses in a way that could be resolved. This is an important point because it relates to whether some kind of active sampling or movement of the detectors is needed to explain the spatial resolution results in the paper. This issue comes up in the introduction, results, and discussion. It arises in particular in the two Discussion paragraphs starting on line 343.

      We thank you for pointing out a possible confusion for readers. Overall, we contrast our results to the static Nyquist limit because it is generally regarded as the upper limit of resolution acuity. We updated our text in a few places, especially the Discussion, and added a reference to make our use of the Nyquist limit clearer.

      We agree with the reviewer on how the Nyquist limit is interpreted within the context of visual structure. If visual structure is under-sampled, it is not lost, but creates new, interfered visual structure at lower spatial frequency. For regular patterns like gratings, interference patterns akin to Moiré patterns may emerge. These have been shown to occur in the human eye, and their form depends on the arrangement and regularity of the photoreceptor mosaic (Williams, 1985). We note, however, that the lower-frequency pattern, even if successfully resolved, does not necessarily carry the same structural information as the original (specifically, its orientation), and the aliased structure might indeed mask the original stimulus. Please compare Figure 1f, where we show individual static snapshots of such aliased patterns, especially visible when the optotypes are small (towards the lower right of the figure). We note that theoretical work predicts that, with prior knowledge about the stimulus, even such static images might be possible to de-alias (Ruderman & Bialek, 1992). We added this to our manuscript.
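
      The folding of under-sampled structure to a lower spatial frequency can be illustrated with a minimal sketch. It is an illustration only, not analysis code from the manuscript, and it assumes a regularly spaced one-dimensional sampler rather than the two-dimensional, roughly hexagonal cone mosaic. The function names and the 0.5 arcmin spacing are hypothetical.

```python
def nyquist_limit(sample_spacing):
    """Highest frequency a regular 1-D sampler with this spacing represents unambiguously."""
    return 1.0 / (2.0 * sample_spacing)


def aliased_frequency(signal_freq, sample_spacing):
    """Apparent frequency of a sinusoid after sampling (frequency folding)."""
    sampling_freq = 1.0 / sample_spacing
    folded = signal_freq % sampling_freq
    # Frequencies above the Nyquist limit fold back into the range [0, Nyquist].
    return min(folded, sampling_freq - folded)


if __name__ == "__main__":
    spacing_deg = 0.5 / 60.0  # hypothetical sampler spacing of 0.5 arcmin, in degrees
    print(nyquist_limit(spacing_deg))            # about 60 cycles/deg
    print(aliased_frequency(80.0, spacing_deg))  # about 40 cycles/deg
```

      In this simplified case, an 80 cycles/deg grating is not lost but reappears as a pattern of about 40 cycles/deg. The sketch says nothing about orientation, which is the structural information at stake in our task.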

      We think the reviewer’s following point about the resolution of a line position is only partially connected to the first, however. In our manuscript we note in the Introduction that resolution of the relative position of visual objects is a so-called hyperacuity phenomenon. The fact that it occurs in humans and other animals demonstrates that visual brains have come up with neuronal mechanisms to determine relative stimulus position with sub-Nyquist resolution. The exact mechanism is, however, not fully clear. One solution is that relative cone signal intensities could be harnessed, similar to what is employed technically, e.g. in a quadrant-cell detector. Its positional precision is much higher than the individual cell’s size (or Nyquist limit), and is predominantly determined by the detector’s sensitivity and to a lesser degree its size. On the other hand, such a detector, while hyperacute for object location, would not have the same resolution for, for instance, letter-E orientation discrimination.
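
      The quadrant-cell principle can be sketched for an idealized, noise-free case. The uniform square spot, the dictionary keys, and the function names are assumptions made for illustration; they do not describe any specific device or a neuronal mechanism.

```python
def quadrant_signals(x0, y0, width):
    """Light collected by each quadrant for a uniform square spot of side
    `width` centered at (x0, y0); the quadrants meet at the origin."""
    half = width / 2.0
    if abs(x0) > half or abs(y0) > half:
        raise ValueError("spot must overlap all four quadrants")
    right, left = half + x0, half - x0
    top, bottom = half + y0, half - y0
    return {
        "top_left": left * top,
        "top_right": right * top,
        "bottom_left": left * bottom,
        "bottom_right": right * bottom,
    }


def estimate_position(signals, width):
    """Recover the spot center from relative quadrant intensities alone."""
    total = sum(signals.values())
    x_norm = (signals["top_right"] + signals["bottom_right"]
              - signals["top_left"] - signals["bottom_left"]) / total
    y_norm = (signals["top_left"] + signals["top_right"]
              - signals["bottom_left"] - signals["bottom_right"]) / total
    # For a uniform square spot the normalized difference equals 2 * offset / width.
    return x_norm * width / 2.0, y_norm * width / 2.0
```

      Without noise, offsets far smaller than a quadrant are recovered exactly, so precision is set by signal sensitivity rather than element size. The estimate contains position only and no information about the shape or orientation of the stimulus.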

      Note that in all the above occasions, a static image-sensor-relationship is assumed. In our paper, we were aiming to convey, like others did before, that a moving stimulus may give rise to sub-Nyquist structural resolution, beyond what is already known for positional acuity and hence, classical hyperacuity. 

      Based on the data shown in this manuscript and other experimental data currently collected in the lab, it seems to us that eye movements are indeed the crucial point in achieving sub-Nyquist resolution. For example, ultra-short presentation durations, allowing virtually no retinal slip, push thresholds close to the Nyquist limit and above. Furthermore, with AOSLO stimulation, it is possible to stabilize a stimulus on the retina, which would be a useful tool for studying this hypothesis. Our current level of stabilization is, however, not accurate enough to completely mitigate retinal image motion in the foveola, where cells are smallest, and transients could occur. From what we observe, and from other studies that looked at resolution thresholds at more peripheral retinal locations, we would predict that foveolar resolution of a perfectly stabilized stimulus would indeed be limited by the Nyquist limit of the receptor mosaic.

      P1.5. One question that came up as I read the paper was whether the eye movement parameters depend on the size of the E. In other words, to what extent is ocular drift tuned to specific behavioral tasks?

      This is an interesting question. Unfortunately, the experimental data collected for the current manuscript do not contain enough dispersion in target size to give a definitive answer. A larger range of stimulus sizes and especially a similar number of trials per size would be required. Nonetheless, when individual trials were re-grouped into percentiles of all stimulus sizes (scaled for each eye individually), we found that drift length and directionality were not significantly different between any percentile groups of stimulus sizes (Wilcoxon signed-rank test, p > 0.12, see also Figure R1). Our experimental trials started with a stimulus demanding visual acuity of 20/16 (logMAR = -0.1), therefore all presented stimulus sizes were rather close to threshold. The high visual demand in this AO resolution task might bring the oculomotor system to a limit, where ocular drift length can’t be decreased further. However, given the limitation due to the small range of stimulus sizes, further investigations would be needed. Given this, and that this topic is also ongoing research in our lab where more complex dynamics of FEM patterns are also considered, we refrain from showing this analysis in the current manuscript.

      Author response image 1.

      Drift length does not depend on stimulus sizes close to threshold. All experimental trials were sorted by stimulus size and then grouped into percentiles for each participant (left). Additionally, 10 % of trials with stimulus sizes just above or below threshold are shown for comparison (right). For each group, median drift lengths (z-scored) are shown as box and whiskers plot. Drift length was not significantly different across groups.  

      Reviewer #2 (Public Review):

      Summary:

      In this work, Witten et al. assess visual acuity, cone density, and fixational behavior in the central foveal region in a large number of subjects.

      This work elegantly presents a number of important findings, and I can see this becoming a landmark work in the field. First, it shows that acuity is determined by the cone mosaic, hence, subjects characterized by higher cone densities show higher acuity in diffraction-limited settings. Second, it shows that humans can achieve higher visual resolution than what is dictated by cone sampling, suggesting that this is likely the result of fixational drift, which constantly moves the stimuli over the cone mosaic. Third, the study reports a correlation between the amplitude of fixational motion and acuity, namely, subjects with smaller drifts have higher acuities and higher cone density. Fourth, it is shown that humans tend to move the fixated object toward the region of higher cone density in the retina, lending further support to the idea that drift is not a random process, but is likely controlled. This is a beautiful and unique work that furthers our understanding of the visuomotor system and the interplay of anatomy, oculomotor behavior, and visual acuity.

      Strengths:

      P2.1. The work is rigorously conducted, it uses state-of-the-art technology to record fixational eye movements while imaging the central fovea at high resolution and examines exactly where the viewed stimulus falls on individuals' foveal cone mosaic with respect to different anatomical landmarks in this region. The figures are clear and nicely packaged. It is important to emphasize that this study is a real tour-de-force in which the authors collected a massive amount of data on 20 subjects. This is particularly remarkable considering how challenging it is to run psychophysics experiments using this sophisticated technology. Most of the studies using psychophysics with AO are, indeed, limited to a few subjects. Therefore, this work shows a unique set of data, filling a gap in the literature.

      Thank you, we are very grateful for your positive feedback.

      Weaknesses:

      P2.2. No major weakness was noted, but data analysis could be further improved by examining drift instantaneous direction rather than start-point-end-point direction, and by adding a statistical quantification of the difference in direction tuning between the three anatomical landmarks considered.

      Thank you for these two suggestions. We now show the development of directionality with time (after the first frame, 33 ms as well as 165 ms, 330 ms and 462 ms), and performed a Rayleigh test for non-uniformity of circular data. Please also see our response to comment R2.4.

      Briefly, directional tuning was already visible at 33 ms after stimulus onset and continuously increased with longer analysis duration. Directionality is thus less pronounced at shorter analysis windows. These results have been added to the text and figures (Figure 4 - figure supplement 1).

      The statistical tests showed that circular sample directionality was not uniformly distributed for all three retinal locations. The circular average was between -10 ° and 10 ° in all cases and the variance decreased with increasing time (from 48.5 ° to 34.3 ° for the CDC, 49.6 ° to 38.6 ° for the PRL and 53.9 ° to 43.4 ° for the PCD location, between frame 2 and 15). As we have discussed in the paper, we would expect all three locations to come out as significant, given their vicinity to the CDC (which is systematic in the case of PRL, and random in the case of PCD, see also comment R2.2).
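
      For readers unfamiliar with the Rayleigh test, a minimal sketch follows. It is an illustration only and not the implementation used for the manuscript. The function name is hypothetical, and the p-value uses a commonly used closed-form approximation rather than an exact distribution.

```python
import math


def rayleigh_test(angles_deg):
    """Rayleigh test for non-uniformity of circular data.

    Returns (circular mean direction in degrees, test statistic z, approximate p-value).
    """
    n = len(angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    resultant = math.hypot(cos_sum, sin_sum)  # R = n * mean resultant length
    mean_direction = math.degrees(math.atan2(sin_sum, cos_sum))
    z = resultant ** 2 / n
    # Closed-form approximation of the p-value (an assumption of this sketch).
    p = math.exp(math.sqrt(1 + 4 * n + 4 * (n ** 2 - resultant ** 2)) - (1 + 2 * n))
    return mean_direction, z, min(p, 1.0)
```

      Applied to drift directions expressed relative to a landmark, a small p-value indicates that directions cluster around the circular mean, while directions spread evenly around the circle give a p-value near 1.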

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Witten et al., titled "Sub-cone visual resolution by active, adaptive sampling in the human foveola," aims to investigate the link between acuity thresholds (and hyperacuity) and retinal sampling. Specifically, using in vivo foveal cone-resolved imaging and simultaneous microscopic photostimulation, the researchers examined visual acuity thresholds in 16 volunteers and correlated them with each individual's retinal sampling capacity and the characteristics of ocular drift.

      First, the authors found that although visual acuity was highly correlated with the individual spatial arrangement of cones, for all participants, visual resolution exceeded the Nyquist sampling limit - a well-known phenomenon in the literature called hyperacuity.

      Thus, the researchers hypothesized that this increase in acuity, which could not be explained in terms of spatial encoding mechanisms, might result from exploiting the spatiotemporal characteristics of visual input, which is continuously modulated over time by eye movements even during so-called fixations (e.g., ocular drift).

      Authors reported a correlation between subjects, between acuity threshold and drift amplitude, suggesting that the visual system benefits from transforming spatial input into a spatiotemporal flow. Finally, they showed that drift, contrary to the traditional view of it as random involuntary movement, appears to exhibit directionality: drift tends to move stimuli to higher cone density areas, therefore enhancing visual resolution.

      Strengths:

      P3.1. The work is of broad interest, the methods are clear, and the results are solid.

      Thank you.

      Weaknesses:

      P3.2. Literature (1/2): The authors do not appear to be aware of an important paper published in 2023 by Lin et al. (https://doi.org/10.1016/j.cub.2023.03.026), which nicely demonstrates that (i) ocular drifts are under cognitive influence, and (ii) specific task knowledge influences the dominant orientation of these ocular drifts even in the absence of visual information. The results of this article are particularly relevant and should be discussed in light of the findings of the current experiment.

      Thank you for pointing to this important work which we were aware of. It simply slipped through during writing. It is now discussed in lines 390-393. 

      P3.3. Literature (2/2): The hypothesis that hyperacuity is attributable to ocular movements has been proposed by other authors and should be cited and discussed (e.g., https://doi.org/10.3389/fncom.2012.00089, https://doi.org/10.10

      Thank you for pointing us towards these works which we have now added to the Discussion section. We would like to stress however, that we see a distinction between classical hyperacuity phenomena (Vernier, stereo, centering, etc.) as a form of positional acuity, and orientation discrimination.  

      P3.4. Drift Dynamic Characterization: The drift is primarily characterized as the "concatenated vector sum of all frame-wise motion vectors within the 500 ms stimulus duration.". To better compare with other studies investigating the link between drift dynamics and visual acuity (e.g., Clark et al., 2022), it would be interesting to analyze the drift-diffusion constant, which might be the parameter most capable of describing the dynamic characteristics of drift.

      During our analysis, we have computed the diffusion coefficient (D) and it showed qualitatively similar results to the drift length (see figures below). We decided to not show these results, because we are convinced that D is indeed not the most capable parameter to describe the typical drift characteristic seen here. The diffusion coefficient is computed as the slope of the mean square displacement (MSD). In our view, there are two main issues with applying this metric to our data, one conceptual, one factual:

      (1) Computation of a diffusion coefficient is based upon the assumption that the underlying movement is similar to a random walk process. From a historical perspective, where drift has been regarded as more random, this makes sense. We also agree that D can serve as a valuable metric, depending on the individual research question. In our data, however, we clearly show that drift is not random, and a metric quantifying randomness is thus ill-defined. 

      (2) We often observed out- and in-type motion traces, i.e. traces where the eye somewhat backtracks to where it started. Such traces are as long (and as fast) as traces that follow a single direction, but D would in this case be much smaller, as the MSD first increases and then decreases. In reality, the same number of cones would have been traversed as with the larger D of a straight outward movement, albeit not unique cones. For our current analyses, the drift length captures this relationship better.
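
      The second point can be illustrated with two synthetic traces. This is a minimal sketch under stated assumptions and not our analysis code: positions are in arbitrary units with one sample per frame, the MSD is time-averaged over each trace, and the slope is fitted through the origin and divided by 4 (a common convention for two-dimensional motion). All function names are hypothetical.

```python
import math


def path_length(trace):
    """Drift length: sum of all frame-wise displacement magnitudes."""
    return sum(math.dist(p, q) for p, q in zip(trace, trace[1:]))


def mean_square_displacement(trace, lag):
    """Time-averaged squared displacement for a given lag (in frames)."""
    squared = [math.dist(trace[i], trace[i + lag]) ** 2
               for i in range(len(trace) - lag)]
    return sum(squared) / len(squared)


def diffusion_coefficient(trace, max_lag):
    """Least-squares slope of MSD versus lag (through the origin), divided by 4."""
    lags = range(1, max_lag + 1)
    msd = [mean_square_displacement(trace, k) for k in lags]
    slope = sum(k * m for k, m in zip(lags, msd)) / sum(k * k for k in lags)
    return slope / 4.0


# Straight outward trace versus an out-and-in trace of identical length.
straight = [(float(i), 0.0) for i in range(11)]
out_and_in = [(float(min(i, 10 - i)), 0.0) for i in range(11)]
```

      Both synthetic traces have the same drift length, yet the out-and-in trace yields a clearly smaller D, which is the dissociation described in point (2).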

      Author response image 2.

      Diffusion coefficient (D) and the relation to visual acuity (see Figure 3 e-g for comparison to drift length). a, D was strongly correlated between fellow eyes. b, Cone density and D were not significantly correlated. c, The median D had a moderate correlation with visual acuity thresholds in dominant as well as non-dominant eyes. Dominant eyes are indicated by filled, nondominant eyes by open markers.

      We would like to put forward that, in general, better metrics are needed, especially in respect to the visual signals arising from the moving eye. We are actively looking into this in follow-up work, and we hope that the current manuscript might spark also others to come up with new ways of characterizing the fine movements of the eye during fixation.

      P3.5. Possible inconsistencies: Binocular differences are not expected based on the hypothesis; the authors may speculate a bit more about this. Additionally, the fact that hyperacuity does not occur with longer infrared wavelengths but the drift dynamics do not vary between the two conditions is interesting and should be discussed more thoroughly.

      Binocularity: the differences in performance between fellow eyes is rather subtle, and we do not have a firm grip on differences other than the cone mosaic and fixational motor behavior between the two eyes. We would rather not speculate beyond what we already do, namely that some factor related to the development of ocular dominance is at play. What we do show with our data is that cone density and drift patterns seem to have no part in it.  

      Effect of wavelength: even with the longer 840 nm wavelength, most eyes resolve below the Nyquist limit, with a general increase in thresholds (getting worse) compared to 788 nm. As we wrote in the manuscript, we assume that the increased image blur and reduced cone contrast introduced by the longer wavelength are key to why there is an overall reduction in acuity. No changes were made to the manuscript. As a more general remark, we would not consider the sub-Nyquist performances seen in our data to be a hyperacuity, although technically it is. The reason is that hyperacuity is usually associated with stimuli that require resolving positional shifts, and not orientation. There is a log unit of difference between thresholds in these tasks.  

      P3.6. As a Suggestion: can the authors predict the accuracy of individual participants in single trials just by looking at the drift dynamics?

      That’s a very interesting point that we are indeed currently looking at in another project. As a comment, we can add that by purely looking at the drift dynamics in the current data, we could not predict the accuracy (percent correct) of the participant. When comparing drift length or diffusion coefficients between trials with correct or false responses, we do not observe a significant difference. Also, when adding an anatomical correlate and comparing between trials where sampling density increases or decreases, there is no significant trend. We think that it is a more complex interplay between all the influencing factors that can perhaps be met by a model considering all drift dynamics, photoreceptor geometry and stimulus characteristics.

      No changes were made to the manuscript.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      As you will see, the reviewers were quite enthusiastic about your work, but have a few issues for your consideration. We hope that this is helpful. We'll consider any revisions in composing a final eLife assessment.

      Reviewer #1 (Recommendations For The Authors):

      R1.1:  Discussion of myopia. Myopia takes a fair bit of space in the Discussion, but the paper does not include any subjects that are sufficiently myopic to test the predictions. I would suggest reducing the amount of space devoted to this issue, and instead making the prediction that myopia may help with resolution quickly. The introduction (lines 54-56) left me expecting a test of this hypothesis, and I think similarly that issue could be left out of the introduction.

      We have removed this part from the Introduction and shortened the Discussion.  

      R1.2: Line 118: define CDC here.

      Thank you for pointing this out, it is now defined at this location.  

      R1.3: Line 159-162: suggest breaking this sentence into two. This sentence also serves as a transition to the next section, but the wording suggests it is a result that is shown in the prior section. Suggest rewording to make the transition part clear. Maybe something like "Hence the spatial arrangement of cones only partially ... . Next we show that ocular motion and the associated ... are another important factor."

      Text was changed as suggested.  

      R1.4.: Figure 3: The retina images are a bit hard to see - suggest making them larger to take an entire row. As a reader, I also was wondering about the temporal progression of the drift trajectories and the relation to the CDC. Since you get to that in Figure 4, you could clarify in the text that you are starting by analyzing distance traveled and will return to the issue of directed trajectories.

      Visibility was probably an issue during the initial submission and review process, where images were produced at lower resolution. The original figures are of sufficient resolution to fully appreciate the underlying cone mosaic, and readers will be able to zoom in on them in the online publication.

      We added a mention of the order of analysis in the Results section (LL 163-165).

      R1.5: Line 176: define "sum of piecewise drift amplitude" (e.g. refer to Figure where it is defined).

      We refer to this metric now as the drift length (as pointed out rightfully so by reviewer #2), and added its definition at this location.   

      R1.6: Lines 205-208: suggest clarifying this sentence is a transition to the next section. As for the earlier sentence mentioned above, this sounds like a result rather than a transition to an issue you will consider next.

      This sentence was changed to make the transition clearer. 

      R1.7: Line 225: suggest starting a new paragraph here.

      Done as suggested.

      Reviewer #2 (Recommendations For The Authors):

      I don't have any major concerns, mostly suggestions and minor comments.

      R2.1: (1) The authors use piecewise amplitude as a measure of the amount of retinal motion introduced by ocular drift. However, to me, this sounds like what is normally referred to as the path length of a trace rather than its amplitude. I would suggest using the term length rather than amplitude, as amplitude is normally considered the distance between the starting and the ending point of a trace.

      This was changed as suggested throughout the manuscript. 

      R2.2: (2) It would be useful to elaborate more on the difference between CDC and PCD, I know the authors do this in other publications, but to the naïve reader, it comes a bit as a surprise that drift directionality is toward the CDC but less so toward the PCD. Is the difference between these metrics simply related to the fact that defining the PCD location is more susceptible to errors, especially if image quality is not optimal? If indeed the PCD is the point of peak cone density, assuming no errors or variability in the estimation of this point, shouldn't we expect drift moving stimuli toward this point, as the CDC will be characterized by a slightly lower density? I.e., is the absence of a PCD directionality trend as strong as the trend seen for the CDC simply the result of variability and error in the estimate of the PCD or it is primarily due to the distribution of cone density not being symmetrical around the PCD?

      Thank you for this comment. We already refer in the Methods section to the respective papers where this difference is analyzed in more detail, and briefly discuss it here.

      To briefly answer the reviewer’s final question: PCD location is too variable, and ought to be avoided as a retinal landmark. While we believe there is value in reporting the PCD as a metric of maximum density, it has been shown recently (Reiniger et al., 2021; Warr et al., 2024; Wynne et al., 2022), and is visible in our own (partly unpublished) data, that its location changes when one or more of these factors change: cone density metric, window size or cone quantity selected, cone annotation quality, image quality (e.g. across days), individual grader, annotation software, and likely more. Each of these factors alone can change the PCD location quite drastically, all while, of course, the retina does not change. The CDC, on the other hand, given its low-pass filtering nature, is immune to the aforementioned changes within a much wider range and will thus better reflect the anatomical and, as shown here, functional center of vision. However, there will always be individual eyes where PCD location and the CDC are close, and thus researchers might be inclined to also use the PCD as a landmark. We strongly advise against this. In a way, the PCD is a nonsense location, while its dimension, density, can be a valuable metric, as density does not vary that much (see e.g. data on CDC density and PCD density reported in this manuscript).

      Below we append a direct comparison of PCD vs CDC location stability when only one of the mentioned factors is changed. Sixteen retinas imaged on two different days were annotated and analyzed by the same grader with the same approach, and the differences in both locations are shown.

      Author response image 3.

      Reproducibility of CDC and PCD location in comparison. Two retinal mosaics which were recorded at two different timepoints, maximum 1 year apart from each other, were compared for 16 eyes. The retinal mosaics were carefully aligned. The retinal locations for CDC and PCD that were computed for the first timepoint were used as the spatial anchor (coordinate center), the locations plotted here as red circles (CDC) and gray diamonds (PCD) represent the deviations that were measured at the second timepoint for both metrics.  

      R2.3.: I don't see a statistical comparison between the drift angle tuning for CDC, PRL, and PCD. The distributions in Figure 4F look very similar and all with a relatively wide std. It would be useful to mark the mean of the distributions and report statistical tests. What are the data shown in this figure, single subjects, all subjects pooled together, average across subjects? Please specify in the caption.

      We added a Rayleigh test to test each distribution for non-uniformity and Kolmogorov-Smirnov tests to compare the distributions towards the different landmarks. We added the missing specifications to the figure caption of Figure 4 – figure supplement 1.

      R2.4: I would suggest also calculating drift direction based on the average instantaneous drift velocity, similarly to what is done with amplitude. From Figure 3B it is clear that some drifts are more curved than others. For curved drifts with small amplitudes the start-point- end-point (SE) direction is not very meaningful and it is not a good representation of the overall directionality of the segment. Some drifts also seem to be monotonic and then change direction (eg. the last three examples from participant 10). In this case, the SE direction is likely quite different from the average instantaneous direction. I suspect that if direction is calculated this way it may show the trend of drifting toward the CDC more clearly.

      In response to this and a comment of reviewer #1, we added a calculation of initial drift direction (and of drift direction for increasing analysis durations) and show it in Figure 4 – figure supplement 1. By doing so, we hope to capture initial directionality, irrespective of whether later parts of the path change direction. We find that directionality increases with increasing presentation duration.
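
      One way such a duration-dependent direction measure could be computed is sketched below. This is our illustration under assumptions and not necessarily the exact procedure behind the figure supplement: direction is taken from the first sample to the sample `n_frames` later, and is expressed relative to the direction from the first sample towards a landmark such as the CDC. The function and parameter names are hypothetical.

```python
import math


def relative_direction_deg(trace, landmark, n_frames):
    """Angle (degrees, in [-180, 180)) between the drift displacement after
    `n_frames` and the direction from the start point to the landmark."""
    x0, y0 = trace[0]
    xk, yk = trace[n_frames]
    drift_angle = math.atan2(yk - y0, xk - x0)
    target_angle = math.atan2(landmark[1] - y0, landmark[0] - x0)
    difference = math.degrees(drift_angle - target_angle)
    # Wrap so that 0 means "directly towards the landmark".
    return (difference + 180.0) % 360.0 - 180.0


# Synthetic trace that starts sideways and then heads towards the landmark.
trace = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
landmark = (5.0, 0.0)
angles = [relative_direction_deg(trace, landmark, k) for k in range(1, len(trace))]
```

      In this synthetic example the relative angle shrinks as the analysis window grows. Pooling such angles across trials for each window length gives the circular samples to which a test for non-uniformity can be applied.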

      R2.5: I find the discussion point on myopia a bit confusing. Considering that this is a rather tangential point and there are only two myopic participants, I would suggest either removing it from the discussion or explaining it more clearly.

      We changed this section, also in response to comment R1.1.

      R2.6: I would suggest adding to the discussion more elaboration on how these results may relate to acuity in normal conditions (in the presence of optical aberrations). For example, will this relationship between sampling cone density and visual acuity also hold natural viewing conditions?

      We added only a half sentence to the first paragraph of the discussion. We are hesitant to extend this because there is very likely a non-straightforward relationship between acuity in normal and fully corrected conditions. We would predict that, if each eye were given the same type and magnitude of aberrations (similar to what we achieved by removing them), cone density will be the most prominent factor of acuity differences. Given that individual aberrations can vary substantially between eyes, this effect will be diluted, up to the point where aberrations will be the most important factor to acuity. As an example, under natural viewing conditions, pupil size will dominantly modulate the magnitude of aberrations.

      R2.7: Line 398 - the point on the superdiffusive nature of drift comes out of the blue and it is unclear. What is it meant by "superdiffusive"?

      We simply wanted to express that some drift properties seem to be adaptable while others aren’t. The text was changed at this location to remove this seemingly unmotivated term. 

      R2.8: Although it is true that drift has been assumed to be a random motion, there has been mounting evidence, especially in recent years, showing a degree of control and knowledge about ocular drift (eg. Poletti et al, 2015, JN; Lin et al, 2023, Current Biology).

      We agree, of course. We mention this fact several times in the paper and adjusted some sentences to prevent misunderstandings. The mentioned papers are now cited in the Discussion. 

      R2.9: Reference 23 is out of context and should be removed as it deals with the control of fine spatial attention in the foveola rather than microsaccades or drift.

      We removed this reference. 

      R2.10: Minor point: Figures appear to be low resolution in the pdf.

      This seemed to have been an issue with the submission process. All figures will be available in high resolution in the final online version. 

      R2.11: Figure S3, it would be useful to mark the CDC at the center with a different color maybe shaded so it can be visible also on the plot on the left.

      We changed the color and added a small amount of transparency to the PRL markers to make the CDC marker more visible. 

      R2.12: Figure S2, it would be useful to show the same graphs with respect to the PCD and PRL and maybe highlight the subjects who showed the largest (or smallest) distance between PRL and CDC).

      Please find new Figure 4 supplement 1, which contains this information in the group histograms. Also, Figure 4 supplement 2 is now ordered by the PRL-CDC distance (while the participant naming is kept as maximum acuity exhibited). In this way, it should be possible to infer whether PRL-CDC distance plays a role. To us it does not seem to be crucial. Rather, stimulus onset and drift length were related, which is captured in Figure 4g.

      R2.13: There is a typo in Line 410.

      We could not find a typo in this line, nor in the ones above and below. “Interindividual” was written on purpose, maybe “intraindividual” was expected? No changes were made to the text. 

      References

      Reiniger, J. L., Domdei, N., Holz, F. G., & Harmening, W. M. (2021). Human gaze is systematically offset from the center of cone topography. Current Biology, 31(18), 4188–4193. https://doi.org/10.1016/j.cub.2021.07.005

      Ruderman, D. L., & Bialek, W. (1992). Seeing Beyond the Nyquist Limit. Neural Computation, 4(5), 682–690. https://doi.org/10.1162/neco.1992.4.5.682

      Warr, E., Grieshop, J., Cooper, R. F., & Carroll, J. (2024). The effect of sampling window size on topographical maps of foveal cone density. Frontiers in Ophthalmology, 4, 1348950. https://doi.org/10.3389/fopht.2024.1348950

      Williams, D. R. (1985). Aliasing in human foveal vision. Vision Research, 25(2), 195–205. https://doi.org/10.1016/0042-6989(85)90113-0

      Wynne, N., Cava, J. A., Gaffney, M., Heitkotter, H., Scheidt, A., Reiniger, J. L., Grieshop, J., Yang, K., Harmening, W. M., Cooper, R. F., & Carroll, J. (2022). Intergrader agreement of foveal cone topography measured using adaptive optics scanning light ophthalmoscopy. Biomedical Optics Express, 13(8), 4445–4454. https://doi.org/10.1364/boe.460821

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      The match between fractal and classical cycles is not one-to-one. For example, the fractal method identifies a correlation between age and cycle duration in adults that is not apparent with the classical method. This raises the question as to whether differences are due to one method being more reliable than another or whether they are also identifying different underlying biological differences. It is not clear for example whether the agreement between the two methods is better or worse than between two human scorers, which generally serve as a gold standard to validate novel methods. The authors provide some insight into differences between the methods that could account for differences in results. However, given that the fractal method is automatic it would be important to clearly identify criteria for recordings in which it will produce similar results to the classical method.

      Thank you for these insightful suggestions. In the revised Manuscript, we have added a number of additional analyses that provide a quantitative comparison between the classical and fractal cycle approaches aiming to identify the source of the discrepancies between classical and fractal cycle durations. Likewise, we assessed the intra-fractal and intra-classical method reliability as outlined below.

      Reviewer #1 (Recommendations For The Authors):

      One of the challenges in interpreting the results of the manuscript is understanding whether the differences between the two methods are due to a genuine difference in what these two methods are quantifying or simply noise/variability in each method. If the authors could provide some more insight into this, it would be a great help in assessing their findings and I think bolster the applicability of their method.

      (1) Method reliability: The manuscript clearly shows that cycle length is robustly correlated between the fractal and classical methods in multiple datasets; however, it is hard to assign a meaningful interpretation to the correlation value (i.e., R = 0.5) without some reference point. This could be provided by looking at the intra-method correlation of cycle lengths. In the case of classical scoring, inter-scorer results could be compared; if the R-value here is significantly higher than 0.5, it would suggest genuine differences between the methods. In the case of fractal scoring, inter-electrode results could be compared, as could results with slight changes to the peak prominence threshold or smoothing window.

      In the revised Manuscript, we performed the following analyses to show the intra-method reliability:

      a) Classical cycle reliability: For the revised Manuscript, an additional scorer has independently defined classical sleep cycles for all datasets and marked sleep cycles with skipped REM sleep. Likewise, we have performed automatic sleep cycle detection using the R “SleepCycles” package by Blume & Cajochen (2021). We have added a new Table S8 to Supplementary Material 2 that shows the averaged cycle durations and cycle numbers obtained by the two human scorers and the automatic algorithm as well as the inter-scorer agreement rate. We have also added a new sheet named “Classical method reliability” to the Supplementary Excel file; it reports classical cycle durations for each participant and each dataset as defined by the two human scorers and the algorithm.

      We found that the correlation coefficients between two human scorers ranged from 0.69 to 0.91 (in literature, r’s > 0.7 are defined as strong scores) in different datasets, thus being higher than correlation coefficients between fractal and classical cycle durations, which in turn ranged from 0.41 to 0.55 (r’s in the range of 0.3 – 0.7 are considered moderate scores). The correlation coefficients between human raters and the automatic algorithm were markedly lower, ranging from 0.30 to 0.69 (moderate scores) in different datasets, thus lying within the range of the correlation coefficients between fractal and classical cycle durations. This analysis is reported in Supplementary Material 2, section “Intra-classical method reliability” and Table S8.

      b) Fractal cycle reliability: To assess the intra-fractal method reliability in the revised Supplementary Material 2 of our Manuscript, we correlated the durations of fractal cycles calculated as defined in the main text, i.e., using a minimum peak prominence of 0.94 z and smoothing window of 101 thirty-second epochs, with those calculated using a minimum peak prominence ranging from 0.86 to 1.20 z with a step size of 0.04 z and smoothing windows ranging from 81 to 121 thirty-second epochs with a step size of 10 epochs (Table S7). We found that fractal cycle durations calculated using adjacent minimum peak prominence values (i.e., those that differed by 0.04 z) showed r’s > 0.92, while those calculated using adjacent smoothing windows (i.e., those that differed by 10 epochs) showed r’s > 0.84. In addition, we correlated fractal cycle durations defined using different channels and found that the correlation coefficients ranged from 0.66 to 0.67 (Table S1). Thus, most of the correlations performed to assess intra-fractal method reliability showed correlation coefficients (r > 0.6) higher than those obtained to assess inter-method reliability (r = 0.41 – 0.55), i.e., correlations between fractal and classical cycle durations. This analysis is reported in Supplementary Material 2, section “Intra-fractal method reliability” and Table S7. Likewise, we have added a new sheet named “Fractal method reliability” that reports the actual values for the abovementioned parameters to the Supplementary Excel file. For a discussion on potential sources of differences, see below.
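      To make the threshold-sensitivity check concrete, the following is a minimal sketch of how a minimum peak prominence changes which peaks, and therefore which cycle durations, are detected. It is not the authors' implementation: the function names are hypothetical, the smoothing step is omitted, and it assumes that a cycle is the interval between successive retained peaks of a z-scored slope series sampled in 30-second epochs.

```python
def local_peaks(x):
    """Indices of strict local maxima (interior samples only)."""
    return [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]


def prominence(x, i):
    """Topographic prominence of the peak at index i."""
    # Walk outwards on each side until a sample higher than the peak
    # (or the edge of the series), tracking the lowest value passed.
    left_min = x[i]
    j = i - 1
    while j >= 0 and x[j] <= x[i]:
        left_min = min(left_min, x[j])
        j -= 1
    right_min = x[i]
    j = i + 1
    while j < len(x) and x[j] <= x[i]:
        right_min = min(right_min, x[j])
        j += 1
    # Prominence is the height above the higher of the two surrounding minima.
    return x[i] - max(left_min, right_min)


def cycle_durations(x, min_prominence, epoch_min=0.5):
    """Peak-to-peak intervals in minutes (assumes 30-second epochs)."""
    peaks = [i for i in local_peaks(x) if prominence(x, i) >= min_prominence]
    return [(b - a) * epoch_min for a, b in zip(peaks, peaks[1:])]


# Hypothetical toy series (z units); real series span a whole night.
series = [0, 1, 0, 3, 0, 0.5, 0, 3, 0]
durations_default = cycle_durations(series, min_prominence=0.94)
durations_stricter = cycle_durations(series, min_prominence=1.2)
```

      Repeating the duration calculation over a grid of thresholds and correlating each result with the default-threshold durations would give reliability figures of the kind reported in Table S7.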

      (2) Origin of method differences: The authors outline a few possible sources of discrepancies between the two methods (peak vs REM end, skipped REM cycle detection...) but do not quantify these contributions. It would be interesting to identify some factors that could predict for either a given night of sleep or dataset whether it is likely to show a strong or weak agreement between methods. This could be achieved by correlating measures of the proposed differences ("peak flatness", fractal cycle depth, or proportion of skipped REM cycles) with the mismatch between the two methods.

      In the revised Manuscript, we have quantified a few possible sources of discrepancies between the durations of fractal vs classical cycles and added a new section named “Sources of fractal and classical cycle mismatches” to the Results as well as new Tables 5 and S10 (Supplementary Material 2). Namely, we correlated the difference in classical vs fractal sleep cycle durations on the one side, and either the amplitude of fractal descent/ascent (to reflect fractal cycle depth), duration of cycles with skipped REM sleep/TST, duration of wake after sleep onset/TST or the REM episode length of a given cycle (to reflect peak flatness) on the other side. We found that a higher difference in classical vs fractal cycle duration was associated with a higher proportion of wake after sleep onset (r = 0.226, p = 0.001), shallower fractal descents (r = 0.15, p = 0.002) and longer REM episodes (r = 0.358, p < 0.001, n = 417 cycles, Table S10 in Supplementary Material 2). The rest of the assessed parameters showed no significant correlations (Table S10). We have added a new sheet named “Fractal-classical mismatch” that reports the actual values for the abovementioned parameters to the Supplementary Excel file.  

      (3) Skipped REM cycles: the authors underline that the fractal method identified skipped REM cycles. It seems likely that manual identification of skipped REM cycles is particularly challenging (ie we would expect this to be a particular source of error between two human scorers). If this is indeed the case, it would be interesting to discuss, since it would highlight an advantage of their methodology that they already point out (l644).

      In the revised Manuscript, we have added the inter-scorer agreement rate regarding cycles with skipped REM sleep, which was 61%, i.e., 32 percentage points lower than the performance of our fractal cycle algorithm (93%). These findings are now reported in the “Skipped cycles” section of the Results and in Table S9 of Supplementary Material 2. We also discuss them in the Discussion:

      “Our algorithm detected skipped cycles in 93% of cases while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was 61% only; thus, 32% lower. We deduce that the fractal cycle algorithm detected skipped cycles since a lightening of sleep that replaces a REM episode in skipped cycles is often expressed as a local peak in fractal time series.” Discussion, section “Fractal and classical cycles comparison”, paragraph 5.

      Minor comments:

      - In the subjects where the number of fractal and classical cycles did not match, how large was the difference (ie just one extra cycle or more)? Correlating cycle numbers could be one way to quantify this.

      In the revised Manuscript, we have reported the required information for the participants with no one-to-one match (46% of all participants) as follows: 

      “In the remaining 46% of the participants, the difference between the fractal and classical cycle numbers ranged from -2 to 2 with the average of -0.23 ± 1.23 cycle. This subgroup had 4.6 ± 1.2 fractal cycles per participant, while the number of classical cycles was 4.9 ± 0.7 cycles per participant. The correlation coefficient between the fractal and classical cycle numbers was 0.280 (p = 0.006) and between the cycle durations – 0.278 (p=0.006).” Results, section “Correspondence between fractal and classical cycles”, last paragraph.

      - When discussing the skipped REM cycles (l467), the authors explain: "For simplicity and between-subject consistency, we included in the analysis only the first cycles". I'm not sure I understood this; could they clarify which analysis they are referring to?

      In the revised Manuscript, we performed this analysis twice: using first cycles and using all cycles and therefore have rephrased this as follows:

      “We tested whether the fractal cycle algorithm can detect skipped cycles, i.e., the cycles where an anticipated REM episode is skipped (possibly due to too high homeostatic pressure). We performed this analysis twice. First, we counted all skipped cycles (except the last cycles of a night, which might lack REM episode for other reasons, e.g., a participant had/was woken up). Second, we counted only the first classical cycles (i.e., the first cycle out of the 4 – 6 cycles that each participant had per night, Fig. 3 A – B) as these cycles coincide with the highest NREM pressure. An additional reason to disregard skipped cycles observed later during the night was our aim to achieve higher between-subject consistency as later skipped cycles were observed in only a small number of participants.” Results, section “Skipped cycles”, first paragraph.

      - The inclusion of all the hypnograms as a supplementary is a great idea to give the reader concrete intuition of the data. If the limits of the sleep cycles for both methods could be added it would be very useful.

      Supplementary Material 1 has been updated such that each graph has a mark showing the onsets of fractal and classical sleep cycles, including classical cycles with skipped REM sleep.

      - The difference in cycle duration between adults and children seems stronger / more reliable for the fractal cycle method, particularly in the histogram (Figure 3C). Is this difference statistically significant?

      In the revised Manuscript, we have added the Multivariate Analysis of Variance to compare F-values, partial R-squared and eta squared. The findings are as follows:

      “To compare the fractal approach with the classical one, we performed a Multivariate Analysis of Variance with fractal and classical cycle durations as dependent variables, the group as an independent variable and the age as a covariate. We found that fractal cycle durations showed higher F-values (F(1, 43) = 4.5 vs F(1, 43) = 3.1), adjusted R squared (0.138 vs 0.089) and effect sizes (partial eta squared 0.18 vs 0.13) than classical cycle durations.” Results, Fractal cycles in children and adolescents, paragraph 3.

      There have been some recent efforts to define sleep cycles in an automatic way using machine learning approaches. It could be interesting to mention these in the discussion and highlight their relevance to the general endeavour of automatizing the sleep cycle identification process.

      In the Discussion of the revised Manuscript, we have added the section on the existing automatic sleep cycle definition algorithms:

      “Even though recently, there has been a significant surge in sleep analysis incorporating various machine learning techniques and deep neural network architectures, we should stress that this research line mainly focused on the automatic classification of sleep stages and disorders almost ignoring the area of sleep cycles. Here, as a reference method, we used one of the very few available algorithms for sleep cycle detection (Blume & Cajochen, 2021). We found that automatically identified classical sleep cycles only moderately correlated with those detected by human raters (r’s = 0.3 – 0.7 in different datasets). These coefficients lay within the range of the coefficients between fractal and classical cycle durations (r = 0.41 – 0.55, moderate) and outside the range of the coefficients between classical cycle durations detected by two human scorers (r’s = 0.7 – 0.9, strong, Supplementary Material 2, Table S8).” Discussion, section “Fractal and classical cycles comparison”, paragraph 4.

      Reviewer #2 (Public Review):

      One weakness of the study, from my perspective, was that the IRASA fits to the data (e.g. the PSD, such as in Figure 1B), were not illustrated. One cannot get a sense of whether or not the algorithm is based entirely on the fractal component or whether the oscillatory component of the PSD also influences the slope calculations. This should be better illustrated, but I assume the fits are quite good.

      Thank you for this suggestion. In the revised Manuscript, we have added a new figure (Fig.S1 E, Supplementary Material 2), illustrating the goodness of fit of the data as assessed by the IRASA method.

      The cycles detected using IRASA are called fractal cycles. I appreciate the use of a simple term for this, but I am also concerned whether it could be potentially misleading? The term suggests there is something fractal about the cycle, whereas it's really just that the fractal component of the PSD is used to detect the cycle. A more appropriate term could be "fractal-detected cycles" or "fractal-based cycle" perhaps?

      We agree that these cycles are not fractal per se. In the Introduction, when we mention them for the first time, we name them “fractal activity-based cycles of sleep” and immediately after that add “or fractal cycles for short”. In the revised version, we reintroduce the full term at the start of each major section and in the Abstract. Nevertheless, given that the term “fractal cycles” is used 88 times, after those “reminders” we use the short name again to facilitate readability. We hope that this will highlight that the cycles are not fractal per se and thus reduce the possible confusion while keeping the manuscript short.

      The study performs various comparisons of the durations of sleep cycles evaluated by the IRASA-based algorithm vs. conventional sleep scoring. One concern I had was that it appears cycles were simply identified by their order (first, second, etc.) but were not otherwise matched. This is problematic because, as evident from examples such as Figure 3B, sometimes one cycle conventionally scored is matched onto two fractal-based cycles. In the case of the Figure 3B example, it would be more appropriate to compare the duration of conventional cycle 5 vs. fractal cycle 7, rather than 5 vs. 5, as it appears is currently being performed.

      In cases where the number of fractal cycles differed from the number of classical cycles (from 34 to 55% of the participants in different datasets, as in the case of Fig.3B), we did not perform one-to-one matching of cycles. Instead, we averaged the durations of the fractal and classical cycles over each participant and only then correlated them (Fig.2C). For the subset of participants (45 – 66% of the participants in different datasets) with a one-to-one match between the fractal and classical cycles, we performed an additional correlation without averaging, i.e., we correlated the durations of individual fractal and classical cycles (Fig.4S of Supplementary Material 2). This is stated in the Methods, section Statistical analysis, paragraph 2.
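      The averaging step described above can be sketched as follows. This is an illustration only: the participant identifiers and durations are hypothetical, and the helper names are not taken from the manuscript. Averaging within each participant first gives one value per method per participant, so the two methods can be correlated even when their cycle counts differ.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def per_participant_means(fractal, classical):
    """Return paired lists of mean cycle durations, one pair per participant."""
    ids = sorted(set(fractal) & set(classical))
    return ([mean(fractal[p]) for p in ids],
            [mean(classical[p]) for p in ids])


# Hypothetical durations in minutes; participant "p2" has four fractal
# cycles but only three classical cycles, so no one-to-one match exists.
fractal = {"p1": [80, 90, 100], "p2": [70, 75, 85, 90], "p3": [95, 105]}
classical = {"p1": [85, 95, 105], "p2": [80, 90, 100], "p3": [100, 110]}

f_mean, c_mean = per_participant_means(fractal, classical)
r = pearson_r(f_mean, c_mean)
```

      The cost of this choice is that within-night variation in cycle length is averaged out, which is why the cycle-by-cycle correlation is reported separately for participants with a one-to-one match.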

      There are a few statements in the discussion that I felt were not well-supported. L629: about the "little biological foundation" of categorical definitions, e.g. for REM sleep or wake? I cannot agree with this statement as written. Also about "the gradual nature of typical biological processes". Surely the action potential is not gradual and there are many other examples of all-or-none biological events.

      In the revised Manuscript, we have removed these statements from both Introduction and Discussion.

      The authors appear to acknowledge a key point, which is that their methods do not discriminate between awake and REM periods. Thus their algorithm essentially detected cycles of slow-wave sleep alternating with wake/REM. Judging by the examples provided, this appears to account for both the correspondence between fractal-based and conventional cycles, as well as their disagreements during the early part of the sleep cycle. While this point is acknowledged in the discussion section around L686, I am surprised that the authors then argue against this correspondence on L695. I did not find the "not-a-number" controls to be convincing. No examples were provided of such cycles, and it's hard to understand how positive z-values of the slopes are possible without the presence of some wake unless N1 stages are sufficient to provide a detected cycle (in which case the argument still holds, except that it's alternations between slow-wave sleep and N1 that could be what drives the detection).

      In the revised Manuscript, we have removed the “NaN analysis” from both Results and Discussion. We have replaced it with the correlation between the difference between the durations of the classical and fractal cycles and proportion of wake after sleep onset. The finding is as follows:

      “A larger difference between the durations of the classical and fractal cycles was associated with a higher proportion of wake after sleep onset in 3/5 datasets as well as in the merged dataset (Supplementary Material 2, Table S10).” Results, section “Fractal cycles and wake after sleep onset”, last two sentences. This is also discussed in Discussion, section “Fractal cycles and age”, paragraph 1, last sentence. 

      To me, it seems important to make clear whether the paper is proposing a different definition of cycles that could be easily detected without considering fractals or spectral slopes, but simply adjusting what one calls the onset/offset of a cycle, or whether there is something fundamentally important about measuring the PSD slope. The paper seems to be suggesting the latter but my sense from the results is that it's rather the former.

      Thank you for this important comment. Overall, our paper suggests that the fractal approach might reflect the cycling nature of sleep in a more precise and sensitive way than classical hypnograms. Importantly, neither fractal nor classical methods can shed light on the mechanism underlying sleep cycle generation due to their correlational approach. Despite this, the advantages of fractal over classical methods mentioned in our Manuscript are as follows:

      (1) Fractal cycles are based on a real-valued metric with known neurophysiological functional significance, which introduces a biological foundation and a more gradual impression of nocturnal changes compared to the abrupt changes that are inherent to hypnograms, which use rather arbitrarily assigned categorical values (e.g., wake=0, REM=-1, N1=-2, N2=-3 and SWS=-4, Fig.2 A).

      (2) Fractal cycle computation is automatic and thus objective, whereas classical sleep cycle detection is usually based on the visual inspection of hypnograms, which is time-consuming, subjective and error-prone. Few automatic algorithms are available for sleep cycle detection, and the cycles identified by the one we used correlated only moderately with classical cycles detected by human raters (r’s = 0.3 – 0.7 in different datasets here).

      (3) Defining the precise end of a classical sleep cycle with skipped REM sleep, which is common in children, adolescents and young adults, using a hypnogram is often difficult and arbitrary. The fractal cycle algorithm could detect such cycles in 93% of cases, while the hypnogram-based agreement on the presence/absence of skipped cycles between two independent human raters was only 61%, i.e., 32 percentage points lower.

      (4) The fractal analysis showed a stronger effect size, higher F-value and R-squared than the classical analysis for the cycle duration comparison in children and adolescents vs young adults. The first and second fractal cycles were significantly shorter in the pediatric compared to the adult group, whereas the classical approach could not detect this difference.

      (5) Fractal – but not classical – cycle durations correlated with the age of adult participants.

      These bullets are now summarized in Table 5 that has been added to the Discussion of the revised manuscript.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Liu and colleagues applied the hidden Markov model on fMRI to show three brain states underlying speech comprehension. Many interesting findings were presented: brain state dynamics were related to various speech and semantic properties, timely expression of brain states (rather than their occurrence probabilities) was correlated with better comprehension, and the estimated brain states were specific to speech comprehension but not at rest or when listening to non-comprehensible speech.

      Strengths:

      Recently, the HMM has been applied to many fMRI studies, including movie watching and rest. The authors cleverly used the HMM to test the external/linguistic/internal processing theory that was suggested in comprehension literature. I appreciated the way the authors theoretically grounded their hypotheses and reviewed relevant papers that used the HMM on other naturalistic datasets. The manuscript was well written, the analyses were sound, and the results had clear implications.

      Weaknesses:

      Further details are needed on the experimental procedure, adjustments are needed to the statistics/analyses, and further interpretation/rationale is needed for the results.

      We greatly appreciate the reviewers' insightful comments and constructive suggestions. Below are the revisions we plan to make:

      (1) Experimental Procedure: We will provide a more detailed description of the stimuli and comprehension tests in the revised manuscript. Additionally, we will upload the corresponding audio files and transcriptions as supplementary data to ensure full transparency. 

      (2) Statistics/Analyses: In response to the reviewer's suggestions, we have reproduced the states' spatial maps using unnormalized activity patterns. For the resting state, we observed a state similar to the baseline state described by Song, Shim, & Rosenberg (2023). However, for the speech comprehension task, all three states showed network activity levels that deviated significantly from zero. Furthermore, we regenerated the null distribution for behavior-brain state correlations using a circular shift approach, and the results remain largely consistent with our previous findings. We have also made other adjustments to the analyses and introduced some additional analyses, as per the reviewer's recommendations. These changes will be incorporated into the revised manuscript.

      (3) Interpretation/Rationale: We will expand on the interpretation of the relationship between state occurrence and semantic coherence. Specifically, we will highlight that higher semantic coherence may enable the brain to more effectively accumulate information over time. State #2 appears to be involved in the integration of information over shorter timescales (hundreds of milliseconds), while State #3 is engaged in longer timescales (several seconds). 

      Reviewer #2 (Public review):

      Liu et al. applied hidden Markov models (HMM) to fMRI data from 64 participants listening to audio stories. The authors identified three brain states, characterized by specific patterns of activity and connectivity, that the brain transitions between during story listening. Drawing on a theoretical framework proposed by Berwick et al. (TICS 2023), the authors interpret these states as corresponding to external sensory-motor processing (State 1), lexical processing (State 2), and internal mental representations (State 3). States 1 and 3 were more likely to transition to State 2 than between one another, suggesting that State 2 acts as a transition hub between states. Participants whose brain state trajectories closely matched those of an individual with high comprehension scores tended to have higher comprehension scores themselves, suggesting that optimal transitions between brain states facilitated narrative comprehension.

      Overall, the conclusions of the paper are well-supported by the data. Several recent studies (e.g., Song, Shim, and Rosenberg, eLife, 2023) have found that the brain transitions between a small number of states; however, the functional role of these states remains under-explored. An important contribution of this paper is that it relates the expression of brain states to specific features of the stimulus in a manner that is consistent with theoretical predictions.

      (1) It is worth noting, however, that the correlation between narrative features and brain state expression (as shown in Figure 3) is relatively low (~0.03). Additionally, it was unclear if the temporal correlation of the brain state expression was considered when generating the null distribution. It would be helpful to clarify whether the brain state expression time courses were circularly shifted when generating the null. 

      We have regenerated the null distribution by circularly shifting the state time courses. The results remain consistent with our previous findings: p = 0.002 for the speech envelope, p = 0.007 for word-level coherence, and p = 0.001 for clause-level coherence. 
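      For readers unfamiliar with the approach, a circular-shift null can be sketched as below. This is a generic illustration, not the analysis code used for the manuscript: the function names, the one-sided test, the number of shifts, and the use of a Pearson correlation as the test statistic are all assumptions. Rotating the state time course keeps its temporal autocorrelation intact while breaking its alignment with the stimulus feature.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def circular_shift_p(state, feature, n_perm=1000, seed=0):
    """Observed r and a one-sided p-value against a circular-shift null.

    `state` and `feature` are equal-length lists (one value per time point).
    """
    rng = random.Random(seed)
    n = len(state)
    observed = pearson_r(state, feature)
    exceed = 0
    for _ in range(n_perm):
        k = rng.randrange(1, n)          # random non-zero shift
        shifted = state[k:] + state[:k]  # rotate; autocorrelation is preserved
        if pearson_r(shifted, feature) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as part of the null.
    return observed, (exceed + 1) / (n_perm + 1)
```

      With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so the number of shifts bounds the resolution of the reported p-values.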

      We note that in other studies that examined the relationship between brain activity and word embedding features, the group-mean correlation values are similarly low but statistically significant and theoretically meaningful (e.g., Fernandino et al., 2022; Oota et al., 2022). We think these relatively low correlations are primarily due to the high level of noise inherent in neural data. Brain activity fluctuations are shaped by a variety of factors, including task-related cognitive processing, internal thoughts, physiological states, as well as arousal and vigilance. Additionally, the narrative features we measured may account for only a small portion of the cognitive processes occurring during the task. As a result, the variance in narrative features can only explain a limited portion of the overall variance in brain activity fluctuations.

      We will update Figure 3 and relevant supplementary figures to reflect the new null distribution generated via circular shift. Furthermore, we will expand the discussion to address why the observed brain-stimuli correlations are relatively small, despite their statistical significance.

      (2) A strength of the paper is that the authors repeated the HMM analyses across different tasks (Figure 5) and an independent dataset (Figure S3) and found that the data was consistently best fit by 3 brain states. However, it was not entirely clear to me how well the 3 states identified in these other analyses matched the brain states reported in the main analyses. In particular, the confusion matrices shown in Figure 5 and Figure S3 suggest that the states were confusable across studies (State 2 vs. State 3 in Fig. 5A and S3A, State 1 vs. State 2 in Figure 5B). I don't think this takes away from the main results, but it does call into question the generalizability of the brain states across tasks and populations.

      We identified matching states across analyses based on similarity in the activity patterns of the nine networks. For each candidate state identified in other analyses, we calculated the correlation between its network activity pattern and those of the three predefined states from the main analysis, and took the predefined state it most closely resembled as its matching state. For instance, if a candidate state showed the highest correlation with State #1, it was labelled State #1 accordingly.

      Each column in the confusion matrix depicts the similarity of each candidate state with the three predefined states. In Figure S3 (analysis for the replication dataset), the highest similarity occurred along the diagonal of the confusion matrix. This means that each of the three candidate states was best matched to State #1, State #2, and State #3, respectively, maintaining a one-to-one correspondence between the states from two analyses.

      For the comparison of the speech comprehension task with the resting and the incomprehensible speech conditions, there was some degree of overlap or "confusion." In Figure 5A, there were two candidate states showing the highest similarity to State #2. In this case, we labelled the candidate state with the strongest similarity as State #2, while the other candidate state was assigned as State #3 based on this ranking of similarity. This strategy was also applied to the naming of states for the incomprehensible condition. The observed confusion supports the idea that the tripartite-state space is not an intrinsic, task-free property. To make the labeling clearer in the presentation of results, we will use a prime symbol (e.g., State #3') to indicate cases where such confusion occurred, helping to distinguish these ambiguous matches.
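      One way to read the matching rule described above is as a greedy one-to-one assignment on the similarity matrix, sketched below. This is an interpretation rather than the authors' procedure: the function name and the similarity values are hypothetical, and in practice each entry would be the correlation between two nine-network activity patterns. A candidate is flagged as ambiguous (the case that would receive a prime symbol) when its assigned reference state is not the one it resembles most.

```python
def match_states(sim):
    """Greedy one-to-one matching of candidate states to reference states.

    sim[c][r] is the similarity of candidate state c to reference state r.
    Returns {candidate: (reference, ambiguous)}, with 0-based indices.
    """
    # Visit candidate/reference pairs from most to least similar.
    pairs = sorted(
        ((s, c, r) for c, row in enumerate(sim) for r, s in enumerate(row)),
        reverse=True,
    )
    assigned, used = {}, set()
    for s, c, r in pairs:
        if c in assigned or r in used:
            continue  # candidate already labelled, or reference already taken
        best = max(range(len(sim[c])), key=lambda j: sim[c][j])
        assigned[c] = (r, r != best)  # ambiguous if not the candidate's top match
        used.add(r)
    return assigned


# Hypothetical similarities: candidates 1 and 2 both resemble reference 1 most.
sim = [
    [0.8, 0.2, 0.1],
    [0.1, 0.7, 0.3],
    [0.2, 0.6, 0.4],
]
labels = match_states(sim)
```

      In this toy case the third candidate resembles the second reference state most, but that label is already taken by a stronger match, so it falls back to the third reference state and is flagged as ambiguous.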

      In the revised manuscript, we will give a detailed illustration of how the correspondence of states across analyses was established.

      (3) The three states identified in the manuscript correspond rather well to areas with short, medium, and long temporal timescales (see Hasson, Chen & Honey, TiCs, 2015). Given the relationship with behavior, where State 1 responds to acoustic properties, State 2 responds to word-level properties, and State 3 responds to clause-level properties, the authors may want to consider a "single-process" account where the states differ in terms of the temporal window over which one needs to integrate information, rather than a multi-process account where the states correspond to distinct processes.

      The temporal window hypothesis indeed provides a better explanation for our results. Based on the spatial maps and their modulation by speech features, States #1, #2, and #3 seem to correspond to the short, medium, and long processing timescales, respectively. We will update the discussion to reflect this interpretation. 

      We sincerely appreciate the constructive suggestions from the two anonymous reviewers, which have been highly valuable in improving the quality of the manuscript.

    1. Author response:

      Joint Public Review:

      Strengths:

      The insulin-dependent signaling in the central nervous system is relatively understudied. This explorative study delves into several interesting and clinically relevant possibilities, examining how insulin-dependent signaling and its crosstalk with WNK kinases might affect brain circuits involved in memory formation and/or anxiety. Therefore, these findings might inspire follow-up studies performed in disease models for disorders that exhibit impaired glucose metabolism, deficient memory, or anxiety, such as Diabetes mellitus, Alzheimer's disease, or most psychiatric disorders.

      The graphical presentation of the figures is of high quality, which helps the reader to obtain a good overview and easily understand the experimental design, results, and conclusions.

      The behavioral studies are well conducted and provide valuable insights into the role of WNK kinases in glucose metabolism and their effect on learning and memory. Additionally, the authors evaluate the levels of basal and induced anxiety in Figures 1 and 2, enhancing our understanding of how WNK signaling might engage in cognitive function and anxiety-like behavior, particularly in the context of altered glucose metabolism.

      We thank the reviewers for recognizing the strengths of our study.

      Weaknesses:

      The study used the inhibitor WNK463 as the only tool to manipulate WNK1-4 activity. This inhibitor seems selective; however, it has been reported to inhibit the individual WNK kinases with different efficiencies (e.g. PMID: 31017050, PMID: 36712947). Additionally, the authors neither analyze nor report the expression profiles or activity levels of WNK1, WNK2, WNK3, and WNK4 within the relevant brain regions (i.e. hippocampus, cortex, amygdala). Combined, these weaknesses raise concerns about the direct involvement of WNK kinases within the selected brain regions and behavior circuits. It would be beneficial if the authors provided gene profiling for WNK1, 2, 3, and 4 (e.g. using the Allen Brain Atlas). To confirm the observations, the authors should either add results from using other WNK inhibitors or, preferably, analyze knock-down or knock-out animals/tissue targeting the single kinases.

      We thank the reviewers for the suggestions. To address the criticism and as recommended, we plan to include gene profiling for WNK1-4 in the brain from the Allen Brain Atlas. Additionally, we plan to include the effect of WNK1 knockdown on pAKT levels in immortalized SHSY5Y cells.

      The authors do not report any data on whether the global inhibition of WNKs affects insulin levels. Since the authors wish to demonstrate the synergistic effect of simultaneous insulin treatment and WNK1-4 inhibition, such data are missing.

      To address this critique, we have planned to include plasma insulin levels upon global inhibition of WNKs using WNK463 in C57BL/6J mice.

      The study discovered that the Sortilin receptor binds to OSR1, leading the authors to speculate that Sortilin may be involved in the insulin-dependent GLUT4 surface trafficking. However, the authors do not provide any evidence supporting Sortilin's involvement in insulin- or WNK-dependent GLUT4 trafficking. Thus, this conclusion should be qualified, rephrased, or additional data included.

      We thank the reviewers for suggesting experiments that will significantly enhance the clarity of our conclusions. We plan to include immunofluorescence staining data for sortilin localization in SHSY5Y cells under conditions of DMSO, insulin and/or WNK463 treatment. These data would indicate whether WNK463 treatment affects the localization of sortilin in the Golgi network, which previous studies have shown to affect sortilin-dependent GLUT4 trafficking.

    1. Author response:

      We would like to thank the reviewers for their positive evaluation of our work, and their comments inspiring useful discussion. We will provide an in-depth response once one of the key authors has returned from parental leave (in some months), but below we share initial thoughts:  

      Both reviewers asked to see more gaze data to understand how eye movements in patients with achromatopsia might drive our results. We will expand our analyses of eye tracking data and discuss the implications in more depth, but would like to note that our key findings (no change in signal coverage in the foveal rod-scotoma projection zone in achromats, and changes in connective fields) are both robust to eye movement, and unlikely to be driven by gaze differences. Where this is less clear (i.e., population Receptive Field eccentricities are shifted outwards and increased in size), we have highlighted this and avoided drawing strong conclusions. 

      Reviewer 1 questioned why smaller connective fields (CFs) were observed in achromats, suggesting that their flatter V1 eccentricity tuning should predict larger CFs. It is not straightforward to predict how V1's population receptive field (pRF) tuning profile shapes V3's sampling extent, as CFs are driven, but not dictated, by V1: they combine and integrate V1 signals. As we are dealing with an atypically developed visual system, assumptions about expected relationships are complicated further. We believe that the aspect of the pRF data most relevant to interpreting V3 CF extent is the ratio between V1 and V3 pRF sizes. Our outcomes show that pRF sizes in achromats, while larger in V1, are more normalized in V3, predicting more local V3 sampling from V1. This is what our quantifications of CF size show across two independent measures with different stimuli. We will provide further data to address reviewer 1's various queries about the potential causes of the pRF eccentricity shifts in achromats, the relationship between pRFs and CFs, and methodological details of CF fits.

      We thank the reviewers again for their insightful comments and look forward to providing more comprehensive responses to their queries, substantiated with data, as soon as possible.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      The findings of Ziolkowska and colleagues show that a specific projection from the nucleus reuniens of the thalamus (RE) to dorsal hippocampal CA1 neurons plays an important role in fear extinction learning in male and female mice. In and of itself, this is not a particularly new finding, although the authors' identification of structural alterations from within dorsal CA1 stratum lacunosum moleculare (SLM) as a candidate mechanism for the learning-related plasticity is potentially novel and exciting. The authors use a range of anatomical and functional approaches to demonstrate structural synaptic changes in dorsal CA1 that parallel the necessary role of RE inputs in modulating extinction learning. Yet, the significance of these findings is substantially limited by several technical shortcomings in the experimental design, and the authors' central interpretation. Otherwise, there remain several strengths in the design and interpretation that offset some of these concerns.

      Given that much is already known about the role of RE and hippocampus in modulating fear learning and extinction, it remains unclear whether addressing these concerns would substantially increase the impact of this study beyond the specific area of speciality. Below, several major weaknesses will be highlighted, followed by several miscellaneous comments.

      Methodological:

      (1) One major methodological weakness in the experimental design involves the widespread misapplication of Ns used for the statistical analyses. Much of the anatomical analyses of structural synaptic changes in the RE-CA1 pathway use N = number of axons (Figs. 1, 2), N = number of dendrites (Figs. 3, 4), and N = number of sections (Fig. 7; note that there are 7 figures in total). In every instance, N = animal number should be used. It is unclear which of these results would remain significant if N = animal number were used in each or how many more animals would be required. This is problematic since these data comprise the main evidence for the authors' central conclusion that specific structural synaptic changes are associated with fear extinction learning.

      We do agree with the reviewer that N = animal number is the preferred way to present data in most of our experiments. However, in some experimental groups we observed a very low number of entries. For example, in the 5US group we found RE+/+ spines in only 3 out of 6 analyzed animals. We believe that this observation is not due to technical problems, as the mCherry virus transduction required to find RE+/+ spines is similar in all experimental groups and we analyzed similar volumes of tissue. While this result still allows the density of RE+/+ spines to be calculated per animal, it generates no entries for spine area and PSD95 mean gray value if N = animal number. Hence, we decided to use N = animals to calculate spine and bouton densities, and N = dendritic spines/boutons to calculate other spine/bouton parameters.
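
      The difference between the two choices of N can be illustrated with a minimal sketch. It is not the analysis code used in the study: the function name, animal IDs, and spine areas below are hypothetical and serve only to show why animals without any RE+/+ spines contribute no entry to a per-animal mean, while every spine still counts when N = spines.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(measurements, animals):
    """Average spine-level values within each animal.

    measurements: iterable of (animal_id, value) pairs, one pair per spine.
    animals: all animal IDs in the group, including those with no spines.
    Returns {animal_id: mean value, or None if the animal has no entries}.
    """
    by_animal = defaultdict(list)
    for animal_id, value in measurements:
        by_animal[animal_id].append(value)
    return {a: (mean(by_animal[a]) if by_animal[a] else None) for a in animals}


# Hypothetical spine areas (arbitrary units); only 3 of 6 animals have spines.
spines = [("m1", 0.20), ("m1", 0.30), ("m2", 0.25),
          ("m3", 0.40), ("m3", 0.50), ("m3", 0.60)]
animals = ["m1", "m2", "m3", "m4", "m5", "m6"]

summary = per_animal_means(spines, animals)
n_spines = len(spines)  # N when each spine is one observation
n_animals_with_data = sum(v is not None for v in summary.values())  # N per animal
print(n_spines, n_animals_with_data)
```

      In this toy example N drops from 6 spines to 3 animals, which is the situation described for the 5US group.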

      (2) There is a lack of specific information regarding what constitutes learning with respect to behavioral freezing. It is never clearly stated what specific intervals are used over which freezing is measured during acquisition, extinction, and in extinction retrieval tests. Additionally, assessment of freezing during retrieval at 5- and 30-min time points doesn't lay to rest the possibility that there were differences in the decay rate over the 30-min period (also see below).

      We added a detailed description of how learning was assessed.

      ln 125-134: For assessment of learning we used percent of time spent by animals freezing (% freezing). Freezing behavior was defined as complete lack of movement, except respiration. To assess within-session learning (working memory) we compared pre- and post-US freezing frequency (the first 148 sec vs last 30 sec) during the CFC session (day 1). To assess formation of long-term contextual fear memory, we compared pre-US freezing (day 1) and the first 5 minutes of the Extinction session (day 2). To assess within session contextual fear extinction we ran 2-way ANOVA to assess the effect of time and manipulation on freezing frequency. Freezing data were analyzed in 5-minute bins. To assess formation of long-term contextual fear extinction memory we compared the first 5 minutes of the Extinction session (day 2) and Test session (day 3).

      As suggested by the reviewer, we also added data for all six 5-minute bins of the Extinction sessions.
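
      The binning described above can be sketched as follows. This is an illustration under stated assumptions, not the scoring software's implementation: it assumes freezing has already been scored as one 0/1 flag per second, and the function name and the example session are hypothetical.

```python
def percent_freezing_by_bin(freezing, bin_seconds=300):
    """Percent of time spent freezing in consecutive bins.

    freezing: sequence of 0/1 flags, one per second of the session (assumed format).
    bin_seconds: bin width in seconds (300 s = 5 min).
    A trailing partial bin is scored over its own length.
    """
    bins = []
    for start in range(0, len(freezing), bin_seconds):
        chunk = freezing[start:start + bin_seconds]
        bins.append(100.0 * sum(chunk) / len(chunk))
    return bins


# Hypothetical 30-minute session in which freezing declines across six bins.
session = []
for frozen_seconds in (240, 180, 150, 120, 90, 60):
    session += [1] * frozen_seconds + [0] * (300 - frozen_seconds)

bins = percent_freezing_by_bin(session)
print(bins)
```

      The first and last values of such a series correspond to the 5- and 30-minute time points, while the intermediate bins show how freezing changes between them.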

      (3) A minor-to-moderate methodological weakness concerns the authors' decision to utilize saline injected groups as controls for the chemogenetics experiments (Figs. 5, 6). The correct design is to have a CNO-only group with the same viral procedure sans hM4Di. This concern is partly mitigated by the inclusion of a CNO vs. saline injection control experiment (Fig. 6).

      Figure 5 does not describe a chemogenetic experiment.

      We added new groups with control virus (CNO vs saline) to Figure 6 (now Fig. 6D and H). 

      The chemogenetic experiment shown in Figure 7 has all 4 experimental groups (Control vs hM4Di and saline vs CNO).

      (4) In the electron microscopic analyses of dendritic spines (Fig. 5), comparison of only the fear acquisition versus extinction training, and the lack of inclusion of a naïve control group, makes it difficult to understand how these structural synaptic changes are occurring relative to baseline. It is noteworthy that the authors utilize the tripartite design in other anatomical analyses (Fig. 2-4).

      We added data for the Naive mice to Figure 5.

      (5) Interpretation:

      The main interpretive weakness in the study is the authors' claim that their data shows a role for the RE-CA1 pathway in memory consolidation (i.e., see Abstract). This claim is based on the premise that, although RE-CA1 pathway inactivation with CNO treatment 30 min prior to contextual fear extinction did not affect freezing at 5- and 30-min time points relative to saline controls, these rats showed greater freezing when tested on extinction retrieval 24 h thereafter. First, the data do not rule out possible differences in the decay rate of freezing during extinction training due to CNO administration. Next, the fact that CNO is given prior to training still leaves open the possibility that acquisition was affected, even if there were not any frank differences in freezing. Support for this latter possibility derives from the fact that mice tested for extinction retrieval as early as 5 min after extinction training (Fig. 6C) showed the same impairments as mice tested 24 h later (Figs. 6A). Further, all the structural synaptic changes argued to underlie consolidation were based on analysis at a time point immediately following extinction training, which is too early to allow for any long-term changes that would underlie memory consolidation, but instead would confer changes associated with the extinction training event.

      We do agree with the reviewer that our data do not allow us to conclude whether RE-CA1 pathway is involved in acquisition or consolidation of CFE memory. Therefore, we avoid those terms in the manuscript. We just conclude that RE→CA1 participates in the CFE.

      Reviewer #2 (Public review):

      Summary:

      Ziółkowska et al. characterize the synaptic mechanisms underlying the RE-dCA1 contribution to the consolidation of fear memory extinction. In particular, they describe a layer-specific modulation of RE-dCA1 excitatory synapses associated with contextual fear extinction, which is impaired by transient chemogenetic inhibition of this pathway. These results indicate that RE activity-mediated modulation of synaptic morphology contributes to the consolidation of contextual fear extinction.

      Strengths:

      The manuscript is well conceived, the statistical analysis is solid and methodology appropriate. The strength of this work is that it nicely builds up on existing literature and provides new molecular insight on a thalamo-hippocampal circuit previously known for its role in fear extinction. In addition, the quantification of pre- and post-synapses is particularly thorough.

      Weaknesses:

      The findings in this paper are well supported by the data, but a more detailed description of the methods is needed.

      (1) In the paragraph Analysis of dCA1 synapses after contextual fear extinction (CFE), more experimental and methodological data should be given in the text: 

      - how was PSD95 used for the analysis, what was the difference between RE. Even if Thy1-GFP mice were used in Fig.2, it appears they were not used for bouton size analysis. To improve clarity, I suggest moving panel 2C to Figure 3. It is not clear whether all RE axons were indiscriminately analysed in Fig. 2 or if only the ones displaying colocalization with both PSD95 and GFP were analysed. If GFP was not taken into account here, analysed boutons could reflect synapses onto inhibitory neurons and this potential scenario should be discussed.

      PSD-95 immunostaining in close apposition to boutons was used to identify RE boutons innervating CA1 (Figures 1 and 2). In these cases, the PSD-95 signal was not quantified. PSD-95 in close apposition to dendritic spines was used as a proxy for PSDs in CA1 (Figures 3, 4 and 7). In these cases, we assessed the integrated mean gray value of the PSD-95 signal per dendritic spine (Figures 3 and 4) or per ROI (Figure 7). This is explained in detail in the section Confocal microscopy and image quantification (ln 149-172).

      GFP signal was not taken into account during bouton analysis. This is explained in the materials and methods section Confocal microscopy and image quantification (ln 149-172).

      We indicate that PSD-95 is a marker of excitatory synapses located both on excitatory and inhibitory neurons.

      Ln 258: RE boutons were identified in SO and SLM as axonal thickenings in close apposition to PSD-95-positive puncta (a synaptic scaffold used as a marker of excitatory synapses located both on excitatory and inhibitory neurons) (Kornau et al., 1995; El-Husseini et al., 2000; Chen et al., 2011; Dharmasri et al., 2024).

      We also cite literature demonstrating that RE projects to the hippocampal formation and forms asymmetric synapses with dendritic spines and dendrites, suggesting innervation of excitatory synapses on both excitatory and aspiny inhibitory neurons (ln 673).

      As advised by the reviewer, panel 2C was moved to Figure 3 (it is now Fig. 3A).

      (2) in the methods: The volume of intra-hippocampal CNO injections should be indicated. The concentration of 3 uM seems pretty low in comparison with previous studies. CNO source is missing.

      This section has been rewritten for clarity. The concentration of CNO was chosen based on a previous study (Stachniak et al., 2014).

      ln 103: Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannula (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over the dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before the Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.

      (3) More details of what software/algorithm was used to score freezing should be included. 

      Freezing was automatically scored with VideoFreeze™ Software (Med Associates Inc.).

      (4) Antibody dilutions for IHC should be indicated. Secondary antibody incubation time should be indicated.

      The missing information is added.

      ln 144: Next, sections were incubated at 4°C overnight with primary antibodies directed against PSD-95 (1:500, Millipore, MAB 1598), washed three times in 0.3% Triton X-100 in PBS and incubated at room temperature for 90 minutes with a secondary antibody conjugated to Alexa Fluor 647 (1:500, Invitrogen, A31571).

      (5) No statement about code and data availability is present.

      The statements are added.

      ln 785: Raw data and the code used for analysis of confocal data are available at OSF (https://osf.io/bnkpx/).

      Reviewer #3 (Public review):

      Summary:

      This paper examined the role of nucleus reuniens (RE) projections to dorsal CA1 neurons in context fear extinction learning. First, they show that RE neurons send excitatory projections to the stratum oriens (SO) and the stratum lacunosum moleculare (SLM), but not the stratum radiatum (SR). After mice undergo context fear conditioning, the synaptic connections between RE and dCA1 neurons in the SLM (but not the SO) are weakened (reduced bouton and spine density). This weakening is reversed by extinction learning, which leads to enhanced synaptic connectivity between RE inputs and dendrites in the SLM. Control experiments demonstrate that the observed changes are due to extinction and not caused by simple exposure to the context. Extinction learning also induced increases in the size (volume and surface area) of the post-synaptic density (PSD) in SLM. To establish the functional role of RE inputs to dCA1, the researchers used an inhibitory DREADD to silence this pathway during extinction learning. They observe that extinction memory (measured 2 hours or 24 hours later) is impaired by this inhibition. Control experiments show that the extinction memory deficit is not simply due to increased freezing caused by inactivation of the pathway or injections of CNO. Inhibiting the RE projection during extinction learning also reduced PSD-95 protein levels in the spines of dCA1 neurons.

      Strengths:

      Based on their results, the authors conclude that, "the RE→SLM pathway participates in the updating of fearful context value by actively regulating CFE-induced molecular and structural synaptic plasticity in the SLM.". I believe the data are generally consistent with this hypothesis, although there is an important control condition missing from the behavioral experiments.

      Weaknesses:

      (1) A defining feature of extinction learning is that it is context specific (Bouton, 2004). It is expressed where it was learned, but not in other environments. Similarly, it has been shown that internal contexts (or states) also modulate the expression of extinction (Bouton, 1990). For example, if a drug is administered during extinction learning, it can induce a specific internal state. If this state is not present during subsequent testing, the expression of extinction is impaired just as it is when the physical context is altered (Bouton, 2004). It is possible that something similar is happening in Figure 6. In these experiments, CNO is administered to inactivate the RE-dCA1 projection during extinction learning. The authors observe that this manipulation impairs the expression of extinction the next day (or 2-hours later). However, the drug is not given again during the test. Therefore, it is possible that CNO (and/or inactivation of the RE-dCA1 pathway) induces a state change during extinction that is not present during subsequent testing. Based on the literature cited above, this would be expected to disrupt fear extinction as the authors observed. To determine if this alternative explanation is correct, the researchers need to add groups that receive CNO during extinction training and subsequent extinction testing. If the deficits in extinction expression reported in Figure 6 result from a state change, then these groups should not exhibit an impairment. In contrast, if the authors' account is correct, then the expression of extinction should still be disrupted in mice that receive CNO during training and testing.

      We do agree with the reviewer that such an experiment would be interesting. However, it could also be confusing, as we could not distinguish whether the possible behavioral effects are related to the state-dependent aspects of CFE or to impaired recall of CFE. Importantly, previous studies showed that RE is crucial for extinction recall (Totty et al., 2023). We also show that CFE memory is impaired not only when the animals recall CFE without CNO (day 3) but also with CNO (day 4) (Figure 6C). Moreover, we do not see effects of CNO on CFE in the control groups (Figure 6D and H). We therefore believe that it is unlikely that CNO results in state-dependent CFE.

      (2) In their analysis of dCA1 synapses after contextual fear extinction (CFE) (Figure 4), the authors should have compared Ctx and Ctx-Ctx animals against naïve animals (as they did in Figure 3) when comparing 5US and Ext with naïve animals. Otherwise, the authors cannot make the following conclusion; "since changes of SLM synapses were not observed in the animals exposed to the familiar context that was not associated with the USs, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general.".

      We assume that the key experimental groups to conclude about synaptic plasticity related to particular behavior are the groups that differ just by one factor/experience. For CFE that would be mice sacrificed immediately before and after CFE session (Figure 2 & 3); on the other hand to conclude about the effects of the re-exposure to the neutral context mice sacrificed before and after second exposure to the neutral context are needed (Figure 4). The naive group, as it differs by at least two manipulations from the Ext and Ctx-Ctx groups, is interesting but not crucial in both cases. This group would be necessary if we focused on the memories of FC or novel context. However, these topics are not the main focus of the current manuscript. Still, the naive group is shown on Figures 2 & 3 to check if CFE brings spine parameters to the levels observed in mice with low freezing.

      We have re-written the cited paragraph to be more precise in our conclusions. 

      "Overall, our data demonstrate that synapses in all dCA1 strata undergo structural or molecular changes relevant to CFC and/or CFE. However, only in SLM CFE-induced synaptic changes are likely to be directly regulated by RE inputs as they appear on RE+ dendrites and spines. Since such changes of SLM synapses were not observed in the animals re-exposed to the neutral context, our data support the role of the described structural plasticity at the RE→SLM synapses in CFE, rather than in processing contextual information in general."

      (3) In the materials and methods section, the description of cannula placements is confusing and needs to be rewritten.

      This section has been rewritten.

      ln 103: Cannula placement. Mice were anesthetized by inhalation of 3–5% isoflurane (IsoFlo; Abbott Animal Health) in oxygen and positioned in a stereotaxic frame (51503, Stoelting, Wood Dale, IL, USA). Two holes were drilled in the skull, and a double guide cannula (2 mm apart and 2 mm long; 26GA, Plastics One) was lowered into the holes such that the cannula tip was located over the dorsal CA1 area (2 mm posterior to bregma, ±1 mm lateral, and −1.3 mm vertical). Cannulae were kept patent by using 33-gauge internal dummy cannulae (Plastics One). The animals were used in contextual fear conditioning 21 days after the cannulation. Animals received bilateral CNO (3 μM, 0.2 μl per side for 1 min; Tocris Bioscience, Cat. No. 4936) (Stachniak et al., 2014) or saline injections (0.2 μl per side) 30 minutes before the Extinction session via intrahippocampal injection cannulae (33-gauge). After the infusion, the cannula was left in place for 30 seconds. The cannula placement was verified by histology, and only data from animals with correct cannula implants were included in statistical analyses.

    1. Author response:

      We are grateful to the reviewers for their thoughtful and constructive feedback on our manuscript. Based on the Public Reviews, we will address the concerns raised by each reviewer through a combination of new analyses, clarifications, and expanded discussion as outlined below:

      Reviewer #1:

      (1) Integration of Positive Selection Results:

      We will enhance the integration of positive selection analyses throughout the manuscript. Specifically, we will discuss how the positively selected sites in primates, including site 193, inform IFIT1 function. We will expand the discussion to explain how PAML, FUBAR, and MEME complement each other and why MEME did not detect site 193 in primates. Additionally, we will provide a rationale for focusing on the three sites identified in primates and address the overlap with bat orthologs.

      (2) Expression Levels and Antiviral Activity:

      We acknowledge the variability in IFIT1 ortholog expression levels. To address this, we will quantify and normalize protein expression to GAPDH across all orthologs, allowing for a more accurate comparison of antiviral activity. We will revise the text to clarify that species-specific differences in viral suppression may be influenced by expression levels.

      (3) Clarification of Terminology and Data Interpretation:

      We will refine our description of the antiviral effects observed for SINV in Figure 4E. We will also revise statements related to protein expression in the relevant sections to improve accuracy.

      (4) Cohesion of Data:

      We will work to more tightly connect the evolutionary analysis with the functional virology data, framing the manuscript around how positive selection shapes IFIT1 function across species. 
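
      The normalization to GAPDH mentioned under point (2) could be sketched as below. This is only an illustration, not the quantification pipeline planned for the revision: the function names, ortholog labels, and band intensities are hypothetical, and expressing each ortholog relative to a reference ortholog is an added assumption.

```python
def normalize_to_loading_control(target, loading_control):
    """Divide each sample's target band intensity by its loading-control intensity."""
    return {name: target[name] / loading_control[name] for name in target}


def relative_to_reference(normalized, reference):
    """Express every normalized value relative to one reference sample."""
    return {name: value / normalized[reference] for name, value in normalized.items()}


# Hypothetical band intensities (arbitrary densitometry units).
ifit1 = {"human": 1200.0, "chimpanzee": 600.0, "bat": 2400.0}
gapdh = {"human": 1000.0, "chimpanzee": 1000.0, "bat": 2000.0}

normalized = normalize_to_loading_control(ifit1, gapdh)
relative = relative_to_reference(normalized, "human")
print(relative)
```

      In this toy example the "bat" sample has twice the raw IFIT1 signal of the "human" sample but the same expression once loading is accounted for, which is why antiviral activity is better compared after normalization.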

      Reviewer #2:

      (1) Recombination Analysis of IFIT1:

      We will conduct a recombination analysis using GARD from the HyPhy package to ensure that the signatures of positive selection are not confounded by recombination between IFIT1 and IFIT1B. 

      (2) Clarification of IFIT1 Homologs Studied:

      We will provide additional details on how IFIT1 orthologs were selected, including addressing the relationship between IFIT1 and IFIT1B. We will support this by presenting additional sequence comparisons to demonstrate the orthology of the proteins studied.

      (3) Chimpanzee IFIT1 Loss of Function:

      We will revise the discussion of chimpanzee IFIT1 to better reflect the data. 

      (4) Presentation of Antiviral Specificity Data:

      We will include a supplementary table listing the percentage of infection normalized to control by VSV and VEEV for each ortholog to allow for clearer comparisons.

      Additionally, we will provide an alternative visualization to better compare the data sets. 

      Reviewer #3:

      (1) Alternative Hypotheses for IFIT1 Antiviral Activity such as IFIT1-IFIT interactions:

      We will expand the discussion to consider alternative hypotheses, including the potential for IFIT1 activity to be regulated through interactions with other IFIT family members. Therefore, we will address how IFIT1-IFIT interactions may be broadly applicable to our findings with IFIT1 orthologs. In addition, we will clarify that we do not conclude that residues 362/4/6 are the sole drivers of antiviral specificity across the orthologs tested in this study.

      (2) Generalization of Findings Across Orthologs:

      We acknowledge that the functional importance of residues 362/4/6 may not be generalizable across all orthologs. We will discuss this limitation more explicitly in the manuscript, while also expanding on how these findings apply specifically to primate IFIT1 orthologs.

      We believe that these revisions will address the key concerns raised by the reviewers and strengthen the manuscript. We look forward to submitting the revised version for further consideration.

    1. Author Response:

      We are grateful to the reviewers for their encouraging comments and constructive suggestions. These suggestions will be valuable to improve the revised manuscript.

      Reviewer 1:

      PD-1 signaling is suppressive to the establishment of cytokine-producing effector cells in general. However, as the reviewer pointed out, one of the results in Fig. 2H showing a decrease of IFN-gamma-producing cells is against this trend. The data indicate percentages of cytokine-producing cells, which are not always consistent with the absolute number of activated T cells. Nonetheless, we plan additional experiments in order to address the question.

      For PD-1YFYF experiments in Figs. 3-5, there were moderate changes in cytokine production between wild-type and mutant PD-1. We transduced newly prepared T cells in each experiment. In addition, to monitor the immunosuppressive effect of PD-1 agonist antibodies, these T cells were stimulated using PD-L1-deficient APC. Therefore, we think these differences in cytokine levels most likely reflect technical variation rather than a specific function of PD-1YFYF.

      Anti-PD-L1 mAb was used for optimal blockade of the PD-1/PD-L1 interaction, and the concentration of antibody (5 μg/ml) is within the normal range for this purpose. We used variable concentrations of OVA peptide to set up experiments with different intensities of TCR stimulation. TCR signal intensity has been shown to affect CD4+ T cell differentiation into Th1 and Th2 cells. We lowered the peptide concentration to test the effect of PD-1 signals under suboptimal TCR stimulation.

      Reviewer 2:

      Antigen-specific T cells from immunized mice are not ideal for Th differentiation studies because T cells activated in response to the antigen might have already undergone functional differentiation in vivo. Incorporating the reviewer’s suggestion, we will test an alternative approach, including human CD4+ T cells.

      For the allergy model, we will expand the analysis for inflammatory effectors.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Aging is associated with a number of physiologic changes including perturbed circadian rhythms. However, mechanisms by which rhythms are altered remain unknown. Here authors tested the hypothesis that age-dependent factors in the sera affect the core clock or outputs of the core clock in cultured fibroblasts. They find that both sera from young and old donors are equally potent at driving robust ~24h oscillations in gene expression, and report the surprising finding that the cyclic transcriptome after stimulation by young or old sera differs markedly. In particular, genes involved in the cell cycle and transcription/translation remain rhythmic in both conditions, while genes associated with oxidative phosphorylation and Alzheimer's Disease lose rhythmicity in the aged condition. Also, the expression of cycling genes associated with cholesterol biosynthesis increases in the cells entrained with old serum. Together, the findings suggest that age-dependent blood-borne factors, yet to be identified, affect circadian rhythms in the periphery. The most interesting aspect of the paper is that the data suggest that the same system (BJ-5TA), may significantly change its rhythmic transcriptome depending on how the cells are synchronized. While there is a succinct discussion point on this, it should be expanded and described whether there are parallels with previous works, as well as what would be possible mechanisms for such an effect.

      We’ve expanded our discussion in the manuscript to discuss possible mechanisms and also how the genes/pathways implicated in our study relate to other aging literature.  

      Major points: 

      Fig 1 and Table S1. Serum composition and levels of relevant blood-borne factors probably change as a function of time. At what time of the day were the serum samples from the old and young groups collected? This important information should be provided in the text and added to Table S1.

      We made sure to highlight the collection time in the abstract of the manuscript: “We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals at 14:00 (1) and used the serum to synchronize cultured fibroblasts.” The time of blood draw is also stated in the Introduction and Methods. Since Table S1 contains demographic information, we did not think that the blood draw time fit best there, but hopefully it is now clear in the text.

      Fig 2A. Luminescence traces: the manuscript would greatly benefit from inclusion of raw luminescence traces.

      Raw luminescence traces have been added to Figure S3 (S3A).

      Fig 2. Of the many genes that change their rhythms after stimulation with young and old sera, what are the typical fold changes? For example, it would be useful to show histograms for the two groups. Does one group tend to have transcript rhythms of higher or lower fold changes? 

      We’ve presented these data in Figure S5. There are a few significant differences, but largely the groups are similar in terms of fold change.

      Fig. 2 Gene expression. Also here, the presentation would benefit from showing a few key examples for different types of responses. 

      Sample traces of genes that gain rhythmicity, lose rhythmicity, phase shift, and change MESOR are now illustrated in Figure S6.

      What was the rationale to use these cells over the more common U2OS cells? Are there similarities between the rhythmic transcriptomes of the BJ-5TA cells and that of U2OS cells or other human cells? This could easily be assessed using published datasets. 

      The original rationale for using BJ-5TA fibroblast cells was that we were aiming to build upon an observation from a previous study (2), which showed that circadian period changes with age in human fibroblasts. While our findings did not match theirs, we think an added benefit of using the BJ-5TA line is that, unlike U2OS cells, it is not a cancer-derived cell line. We’ve added this point in lines 98-101.

      Our study finds many more rhythmic transcripts compared to the previous studies examining U2OS cells. This can be attributed to several factors including differences in methods, including the use of human serum in our study, cell type differences, or decoupling of rhythms in some cancer cells. While a comparison of BJ-5TA cells and U2OS cells could be interesting, a proper comparison requires investigation of many data sets, since any pair of BJ-5TA and U2OS data sets will most likely differ in some detail of experimental design or data processing pipeline, which could contribute to observed differences in rhythmic transcripts.

      That being said, we compared clock reference genes (see Author response image 1) between BJ-5TA and U2OS cells, comparing circadian profiles obtained from our data with those available on CircaDB. These circadian profiles exhibit many similarities and a few differences. The peak-to-trough ratios (amplitudes) are quite similar for ARNTL, NR1D1, NR1D2, PER2, and PER3; they are about 25% lower for CRY1 and about 15% higher for TEF in our data. We find that the MESORs are generally similar, with the exception of NR1D1, which is much lower, and NR1D2, which is much higher, in our data.

      Author response image 1.

      BJ-5TA and U2OS Cells Exhibit Similar Profiles of Circadian Gene Transcription. We compared the transcriptomic profiles of the BJ-5TA cells in young and old serum (left) to the U2OS transcriptomic data (right) available on CircaDB, a database containing profiles of several circadian reference genes in U2OS cells. This figure suggests that circadian profiles of these genes exhibit many similarities. We find that the peak-to-trough ratios (amplitudes) are similar for ARNTL, NR1D1, NR1D2, PER2, and PER3, and that the MESORs are similar (with the exception of NR1D1, which is much lower, and NR1D2, which is much higher, in the BJ-5TA cells). We find that the amplitude of CRY1 is ~25% lower and that of TEF is ~15% higher for the BJ-5TA cells. The y-axes of the plots on the left show counts divided by 3.5 to make the MESORs of ARNTL similar and ease comparison.
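
      The quantities compared above (MESOR, amplitude, and peak-to-trough ratio) can be illustrated with a single-component cosinor fit. The following is a minimal sketch only, not the analysis pipeline used for the manuscript: it assumes a fixed 24 h period and uses synthetic values, and the names `solve3` and `cosinor_fit` are hypothetical helpers introduced for illustration.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a*x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        # Swap in the row with the largest pivot to keep the elimination stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def cosinor_fit(t_hours, y, period=24.0):
    """Least-squares fit of y = M + beta*cos(w*t) + gamma*sin(w*t).

    Returns the MESOR (M), the amplitude, the time of peak in hours,
    and the peak-to-trough ratio (M + A) / (M - A).
    The 24 h period is an assumption of this sketch.
    """
    w = 2.0 * math.pi / period
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in t_hours]
    # Normal equations: (X^T X) coef = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(3)]
    mesor, beta, gamma = solve3(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / w) % period
    # The ratio is only meaningful when the fitted trough stays positive.
    ratio = (mesor + amplitude) / (mesor - amplitude) if mesor > amplitude else float("inf")
    return {
        "mesor": mesor,
        "amplitude": amplitude,
        "peak_time": peak_time,
        "peak_to_trough": ratio,
    }


# Synthetic example (illustrative values only): samples every 4 h over 48 h.
times = list(range(0, 48, 4))
counts = [100.0 + 30.0 * math.cos(2.0 * math.pi * (t - 6.0) / 24.0) for t in times]
fit = cosinor_fit(times, counts)
print(fit)
```

      In this framing, a change in MESOR corresponds to a shift in the fitted mean level, whereas a phase shift corresponds to a change in the fitted time of peak, so the two can be compared separately between conditions.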

      For the rhythmic cell cycle genes, could this be the consequence of the serum which synchronizes also the cell cycle, or is it rather an effect of the circadian oscillator driving rhythms of cell cycle genes? 

      This is an interesting point. Given our previous data showing that the cell cycle gene cyclin D1 is regulated by clock transcription factors (3), we believe the circadian oscillator drives, or at least contributes to, rhythms of cell cycle genes. However, the serum clearly makes a difference, as we find that MESORs of cell cycle genes decrease with aged serum. This is consistent with the decreased proliferation previously observed in aged human tissue (4).

      While the reduction of rhythmicity in the old serum for oxidative phosphorylation transcripts is very interesting and fits with the general theme that metabolic function decreases with age, it is puzzling that the recipient cells are the same, but it is only the synchronization by the old and young serum that changes. Are the authors thus suggesting that decrease of metabolic rhythms is primarily a non cell-autonomous and systemic phenomenon? What would be a potential mechanism? 

      We are indeed suggesting this, although it is also possible that it is not cycling per se, but rather an overall inefficiency of oxidative phosphorylation that is conveyed by the serum. Relating other work in the field to our findings, we’ve added the following to our discussion: “Previous work in the field demonstrates that synchronization of the circadian clock in culture results in cycling of mitochondrial respiratory activity (5, 6), further underscoring the different effects of old serum, which does not support oscillations of oxidative phosphorylation-associated transcripts. Age-dependent decrease in oxidative phosphorylation and increase in mitochondrial dysfunction (7) has been seen in aged fibroblasts (8) and contributes to age-related diseases (9). We suggest that the age-related inefficiency of oxidative phosphorylation is conferred by serum signals to the cells such that oxidative phosphorylation cycles are mitigated. On the other hand, loss of cycling could contribute to impairments in mitochondrial function with age.”

      The delayed shifts after aged serum for clock transcripts (but not for Bmal1) are interesting and indicate that there may be a decoupling of Bmal1 transcript levels from the other clock gene phases. How do the authors interpret this? Could it be related to altered chronotypes in the elderly?

      One possible explanation is that the delay of NPAS2, BMAL1’s binding partner, results in a delay in the transcription of clock-controlled genes/negative arm genes. Since the RORs do not seem to be affected, BMAL1 is transcribed/translated as usual, but there isn’t enough NPAS2 to bind with BMAL1. In this case, downstream genes are slower to transcribe, causing the phase delay.

      Reviewer #2 (Public Review): 

      Schwarz et al. have presented a study aiming to investigate whether circulating factors in the sera of subjects are able to synchronize the circadian rhythms of fibroblasts in an age-dependent manner. The authors used human serum taken from either old (age 70-76) or young (age 25-30) individuals to synchronise cultured fibroblasts containing a clock gene promoter driven luciferase reporter, followed by RNA sequencing to investigate whole gene expression.

      This study has the potential to be very interesting, as evidence of circulating factors in sera that mediate peripheral rhythms has long been sought after. Moreover, those factors may be affected by age, which could contribute to the weakened circadian rhythmicity observed with aging.

      Here, the authors concluded that both old and young sera are equally competent at driving robust 24 hour oscillations, in particular for clock genes, although the cycling behaviour and nature of different genes is altered between the two groups, which is attributed to the age of the individuals. This conclusion could however be influenced by individual variabilities within and between the two age groups. The groups are relatively small, with only four individuals (two females and two males) per group. In addition, factors such as food intake and exercise prior to the blood draw, and/or chronotype, which are known to affect systemic signals, are not taken into consideration. As seen in figure 4, traces from different individuals vary heavily in terms of their patterns, which is not addressed in the text. Only analysing the summary average curve of the entire group may be masking the true data. More focus should be attributed to investigating the effects of serum from each individual and observing common patterns. Additionally, there are many potential causes of variability, instead of or in addition to age, that may be contributing to the variation both between the groups and between individuals within groups. All of this should be addressed by the authors and commented on appropriately in the text.

      We are not aware of any specific feature distinguishing the subjects (other than age) that could account for the differences between old and young. The fact that we see significant differences between the two groups, even with the relatively small size of the groups, suggests strongly that these differences are largely due to age. Nevertheless, we acknowledge that individual variability can be a contributing factor. For instance, the change in phase of clock genes appears to be driven largely by two subjects. We have commented on this and individual differences, in general, in the discussion.  

      The authors also note in the introduction that rhythms in different peripheral tissues vary in different ways with age; however, the entire study is performed only on fibroblasts, classified as peripheral tissue by the authors. It would be very interesting to investigate whether the changes observed in fibroblasts extend to other cell lines of diverse organ origin. This could provide information about whether circulating circadian synchronising factors exert their function systemically or on specific tissues. At the very least, this hypothesis should be addressed within the discussion.

      It is likely that factors circulating in serum act on several tissues, and so their effects are relatively broad. However, this would require extensive investigation of other tissues. We now discuss this in the manuscript.

      In addition to the limitations indicated above, I consider the data of the study to be insufficiently analysed beyond the rhythmicity analysis. Results from the STRING and IPA analysis were merely descriptive, and a more comprehensive bioinformatic analysis would provide additional information about potential molecular mechanisms explaining the differential gene expression. For example, enrichment of transcription factor binding sites in those genes with different patterns could pinpoint chromatin regulatory pathways.

      We performed LinC similarity analysis (LISA) to study enrichment of transcription factor binding. Results are displayed in Fig 3B and in lines 157-168. 

      Recommendations for the authors:

      The two reviewers and reviewing editor have agreed on the following recommendations for the authors: 

      Major: 

      (1) The bioinformatic analysis would benefit from a more thorough focus on variability between individuals. Specifically, the main conclusion of the manuscript could be significantly influenced by individual variabilities within and between the two age groups. This is of particular concern, as the groups are relatively small (four individuals, two females and two males, per group). In addition, the consideration of factors such as food intake and exercise prior to the blood draw, and/or chronotype, which are known to affect systemic signals, should be more adequately explained. The lab is an experienced chronobiology lab, and thus we are confident that these factors had been thought of, but this needs to be made clearer.

      As seen in Figure 4, traces from different individuals vary heavily in terms of their patterns, which is not addressed in the text. Only analysing the summary average curve of the entire group may be masking the relevant data. Furthermore, there are many potential causes of variability, instead of or in addition to age, that may be contributing to the variation both between the groups and between individuals within groups. All of this should be addressed by the authors and commented on appropriately in the text.

      We are not aware of any specific feature distinguishing the subjects (other than age) that could account for the differences between old and young. The fact that we see significant differences between the two groups, even with the relatively small size of the groups, suggests strongly that these differences are largely due to age. Nevertheless, we acknowledge that individual variability can be a contributing factor. For instance, the change in phase of clock genes appears to be driven largely by two subjects. We have commented on this and individual differences, in general, in the discussion. 

      (2) The study would benefit from a more thorough analysis of the data beyond the rhythmicity analysis. Results from the STRING and IPA analysis were merely descriptive, and a more comprehensive bioinformatic analysis would provide additional information about potential molecular mechanisms explaining the differential gene expression. For example, enrichment of transcription factor binding sites in those genes with different patterns could pinpoint chromatin regulatory pathways. This would provide additional value to the study, especially given the otherwise apparent lack of any mechanistic explanation.

      We performed LinC similarity analysis (LISA) to study enrichment of transcription factor binding. Results are displayed in Fig 3B and in lines 157-168.

      (3) Some questions were raised about the amplitude of the core circadian clock gene rhythms, which would be much higher in other human cell types. A comment on this matter and the provision of the raw luminescence traces for Fig 2A would be greatly beneficial.

      Addressing the same topic: what are the typical fold changes of the many genes that change their rhythms after stimulation with young and old sera? For example, it would be useful to show histograms for the two groups. Does one group tend to have transcript rhythms of higher or lower fold changes? The presentation of the manuscript would further benefit from showing a few key examples for different types of responses. 

      The average luminescence trace for each individual serum sample from Fig 2A has been added to Fig S3A.

      We’ve presented the fold change data in Figure S5. There are a few significant differences, but largely the groups are similar in terms of fold change.

      (4) There are several points that we recommend to consider to add to the discussion: 

      What was the rationale to use these cells over the more common U2OS cells? Are there similarities between the rhythmic transcriptomes of the BJ-5TA cells and that of U2OS cells or other human cells? It should be relatively easy to address this point by assessing published datasets. 

      The original rationale for using BJ-5TA fibroblast cells was that we were aiming to build upon an observation from a previous study (2), which showed that circadian period changes with age in human fibroblasts. While our findings did not match theirs, we think an added benefit of using the BJ-5TA line is that, unlike U2OS cells, it is not a cancer-derived cell line. We’ve added this point in lines 98-101.

      Our study finds many more rhythmic transcripts compared to the previous studies examining U2OS cells. This can be attributed to several factors including differences in methods, including the use of human serum in our study, cell type differences, or decoupling of rhythms in some cancer cells. While a comparison of BJ-5TA cells and U2OS cells could be interesting, a proper comparison requires investigation of many data sets, since any pair of BJ-5TA and U2OS data sets will most likely differ in some detail of experimental design or data processing pipeline, which could contribute to observed differences in rhythmic transcripts.

      That being said, we compared clock reference genes (see Author response image 1) between BJ-5TA and U2OS cells, comparing circadian profiles obtained from our data with those available on CircaDB. These circadian profiles exhibit many similarities and a few differences. The peak-to-trough ratios (amplitudes) are quite similar for ARNTL, NR1D1, NR1D2, PER2, and PER3; they are about 25% lower for CRY1 and about 15% higher for TEF in our data. We find that the MESORs are generally similar, with the exception of NR1D1, which is much lower, and NR1D2, which is much higher, in our data.

      For the rhythmic cell cycle genes, could this be the consequence of the serum which synchronizes also the cell cycle, or is it rather an effect of the circadian oscillator driving rhythms of cell cycle genes? 

      This is an interesting point. Given our previous data showing that the cell cycle gene cyclin D1 is regulated by clock transcription factors (3), we believe the circadian oscillator drives, or at least contributes to, rhythms of cell cycle genes. However, the serum clearly makes a difference, as we find that MESORs of cell cycle genes decrease with aged serum. This is consistent with the decreased proliferation previously observed in aged human tissue.

      While the reduction of rhythmicity in the old serum for oxidative phosphorylation transcripts is very interesting and fits with the general theme that metabolic function decreases with age, it is puzzling that the recipient cells are the same, but it is only the synchronization by the old and young serum that changes. Are the authors thus suggesting that decrease of metabolic rhythms is primarily a non cell-autonomous and systemic phenomenon? What would be a potential mechanism? 

      It may not be the cycling per se, but rather an overall inefficiency of oxidative phosphorylation that is conveyed by the serum. Relating other work in the field to our findings, we’ve added the following to our discussion: “Previous work in the field demonstrates that synchronization of the circadian clock in culture results in cycling of mitochondrial respiratory activity (5, 6), further underscoring the different effects of old serum, which does not support oscillations of oxidative phosphorylation-associated transcripts. Age-dependent decrease in oxidative phosphorylation and increase in mitochondrial dysfunction (7) is seen also in aged fibroblasts (8) and contributes to age-related diseases (9). We suggest that the age-related inefficiency of oxidative phosphorylation is conferred by serum signals to the cells such that oxidative phosphorylation cycles are mitigated. On the other hand, loss of cycling could contribute to impairments in mitochondrial function with age.”

      The delayed shifts after aged serum for clock transcripts (but not for Bmal1) are interesting and indicate that there may be a decoupling of Bmal1 transcript levels from the other clock gene phases. How do the authors interpret this? Could it be related to altered chronotypes in the elderly? 

      One possible explanation is that the delay of NPAS2, BMAL1’s binding partner, results in a delay in the transcription of clock-controlled genes/negative arm genes. Since the RORs do not seem to be affected, BMAL1 is transcribed/translated as usual, but there isn’t enough NPAS2 to bind with BMAL1. In this case, downstream genes are slower to transcribe, causing the phase delay.

      The discussion would also benefit from mentioning parallels and dissimilarities with previous works, as well as what would be possible mechanisms for such an effect.

      We’ve expanded our discussion in the manuscript to discuss possible mechanisms and also how the genes/pathways implicated in our study relate to other aging literature.  

      Minor: 

      While time of serum collection is provided in the methods, it would be very useful to provide this information, along with the accompanying argumentation also at a more prominent position and to also add it to Table S1. 

      We made sure to highlight the collection time in the abstract of the manuscript: “We collected blood from apparently healthy young (age 25-30) and old (age 70-76) individuals at 14:00 (1) and used the serum to synchronize cultured fibroblasts.” The time of blood draw is also stated in the Introduction and Methods. Since Table S1 contains demographic information, we did not think that the blood draw time fit best there, but hopefully it is now clear in the text.

      L73 EKG: define the abbreviation 

      We rewrote this paragraph and defined the term where it is used in the paper.

      L77: transfected BJ-5TA fibroblasts. Mention in the text that these are stably transfected cells. 

      We added this to the text.

      L88: Day 2 also revealed different phases of cyclic expression between young and old "groups" for a larger number of genes. Here it is only two donors, right? 

      Yes, we swapped out the word “groups” for “subjects”.

      L115. MESORs of steroid biosynthesis genes, particularly those relating to cholesterol biosynthesis, were also increased in the old sera condition. This is quite interesting, can the authors speculate on the significance of this finding? 

      We’ve added discussion about this finding in the context of the literature in our discussion.

      Fig 3. - FDRs are only listed for certain KEGG pathways, and gene counts for each pathway are also missing, which excludes some valuable context for drawing conclusions. Full tables of KEGG pathway enrichment outputs should be provided in supplementary materials. Input gene lists should also be uploaded as supplementary data files.

      Both output and input files are included in this submission as additional files.  

      Line 322 - How many replicates were excluded in the end for each group? Providing this information would strengthen the claim that the ability of both old and young serum to drive 24h oscillations in fibroblasts is robust and not only individual. 

      Each serum was tested in triplicate in two individual runs of the experiment. Of the 15 serum samples, one technical replicate from each of two samples (one old, one young) was excluded in one of the runs. Given that only one technical replicate in one run of the experiment had to be excluded for one old and one young individual out of all the samples assayed, this supports the idea that young and old serum drive robust oscillations.

      Line 373 - Should list which active interaction sources were used for analysis. 

      In this manuscript we used STRING (search tool for retrieval of interacting genes) analysis to broadly identify relevant pathways defined by different algorithms. From these data, we focused in particular on KEGG pathways.

      Reviewer #1 (Recommendations For The Authors): 

      These comments are in addition to those provided above: 

      Minor: 

      L73 EKG: define the abbreviation 

      We rewrote this paragraph and defined the term where it is used in the paper.

      L77: transfected BJ-5TA fibroblasts. Mention in the text that these are stably transfected cells. 

      We added this to the text.

      L88: Day 2 also revealed different phases of cyclic expression between young and old "groups" for a larger number of genes. Here it is only two donors, right?

      Yes, we swapped out the word “groups” for “subjects”.

      L115. MESORs of steroid biosynthesis genes, particularly those relating to cholesterol biosynthesis, were also increased in the old sera condition. This is quite interesting, can the authors speculate on the significance of this finding? 

      We’ve added discussion about this finding in the context of the literature.

      Fig.4 The fold change amplitude of the clock gene seems quite a bit lower than what is usually expected (for Nr1d1 it is usually 10 fold). The authors should provide an explanation and discuss this. 

      A variety of factors contribute to the fold change amplitude of clock genes. First, the change in amplitude of clock genes is lower in vitro than in in vivo samples. For example, in U2OS cell cultures the fold change in the cycling of Nr1d1 is only 2-fold and is not significantly different from the fold change we observe (as shown in the U2OS data from CircaDB plotted in Figure 1R). Second, the method of synchronization contributes to the strength of the rhythms. Serum synchronization is generally less effective at driving strong clock cycling than forskolin or dexamethasone although, as noted in the manuscript, it may promote the cycling of more genes. Lastly, rhythm amplitude is also dependent on the cell type in question, so cell to cell variability also contributes to differences. However, overall, we do not find major differences in comparing the U2OS data and ours. Please note that the y-axis has a logarithmic scale.

      What is the authors' strategy to identify which serum components are responsible for the reported changes? This should be discussed.

      In the future, we intend to analyze the serum factors using a combination of fractionation and either proteomics or metabolomics to identify relevant factors. We have added this to the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Overall, the article is well-written but lacks some more rigorous data analysis as mentioned in the public review above. In addition to a more thorough analysis approach focusing much more heavily on individual variability, several other changes can be made to strengthen this study:

      Fig 3. - FDRs are only listed for certain KEGG pathways, and gene counts for each pathway are also missing, which excludes some valuable context for drawing conclusions. Full tables of KEGG pathway enrichment outputs should be provided in supplementary materials. Input gene lists should also be uploaded as supplementary data files. 

      Both output and input files are included in this submission as additional files.

      Fig 1A. - Only n=5 participants were used for this analysis, explanation of the exclusion criteria for the other participants would be useful. 

      As Figure 1A is a schematic, we assume the reviewer is referring to Figure 1B. We’ve provided a flow chart of subject inclusion/exclusion in Figure S2.

      Fig 2. - For circadian transcriptome analysis only n=4 participants were used - what criteria were used to exclude individuals, and why were only these individuals used in the end?

      As patient recruitment was interrupted by COVID, we selected samples where we had sufficient serum to effectively carry out the RNA seq experiment and control for age and sex.

      Line 322 - How many replicates were excluded in the end for each group? Providing this information would strengthen the claim that the ability of both old and young serum to drive 24h oscillations in fibroblasts is robust and not only individual. 

      Each serum was tested in triplicate in two individual runs of the experiment. Of the 15 serum samples, one technical replicate from each of two samples (one old, one young) was excluded in one of the runs. Given that only one technical replicate in one run of the experiment had to be excluded for one old and one young individual out of all the samples assayed, this supports the idea that young and old serum drive robust oscillations.

      Line 373 - Should list which active interaction sources were used for analysis. 

      In this manuscript we used STRING (search tool for retrieval of interacting genes) analysis to identify relevant pathways. We do not present any STRING networks in the paper.

      Line 68 - "These novel findings suggest that it may be possible to treat impaired circadian physiology and the associated disease risks by targeting blood borne factors." This is a complete overstatement that cannot be sustained by the limited findings provided by the authors.

      We’ve modified this statement to avoid overstating results.

      (1) Pagani, L. et al. Serum factors in older individuals change cellular clock properties. Proceedings of the National Academy of Sciences 108, 7218–7223 (2011).

      (2) Pagani, L. et al. Serum factors in older individuals change cellular clock properties. Proceedings of the National Academy of Sciences 108, 7218–7223 (2011).

      (3) Lee, Y. et al. G1/S cell cycle regulators mediate effects of circadian dysregulation on tumor growth and provide targets for timed anticancer treatment. PLOS Biology 17, e3000228 (2019).

      (4) Tomasetti, C. et al. Cell division rates decrease with age, providing a potential explanation for the age-dependent deceleration in cancer incidence. Proceedings of the National Academy of Sciences 116, 20482–20488 (2019).

      (5) Cela, O. et al. Clock genes-dependent acetylation of complex I sets rhythmic activity of mitochondrial OxPhos. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1863, 596–606 (2016).

      (6) Scrima, R. et al. Mitochondrial calcium drives clock gene-dependent activation of pyruvate dehydrogenase and of oxidative phosphorylation. Biochimica et Biophysica Acta (BBA) - Molecular Cell Research 1867, 118815 (2020).

      (7) Lesnefsky, E. J. & Hoppel, C. L. Oxidative phosphorylation and aging. Ageing Research Reviews 5, 402–433 (2006).

      (8) Greco, M. et al. Marked aging-related decline in efficiency of oxidative phosphorylation in human skin fibroblasts. The FASEB Journal 17, 1706–1708 (2003).

      (9) Federico, A. et al. Mitochondria, oxidative stress and neurodegeneration. Journal of the Neurological Sciences 322, 254–262 (2012).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The main research question could be defined more clearly. In the abstract and at some points throughout the manuscript, the authors indicate that the main purpose of the study was to assess whether the allocation of endogenous attention requires saccade planning [e.g., ll.3-5 or ll.247-248]. While the data show a coupling between endogenous attention and saccades, they do not point to a specific direction of this coupling (i.e., whether endogenous attention is necessary to successfully execute a saccade plan or whether a saccade plan necessarily accompanies endogenous attention).

      Thanks for the suggestion. We have modified the text in the abstract and at various points in the text to make it more clear that the study investigates the relationship between attention and saccades in one particular direction, first attentional deployment and then saccade planning.

      Some of the analyses were performed only on subgroups of the participants. The reporting of these subgroup analyses is transparent and data from all participants are reported in the supplementary figures. Still, these subgroup analyses may make the data appear more consistent, compared to when data is considered across all participants. For instance, the exogenous capture in Experiments 1 and 2 appears much weaker in Figure 2 (subgroup) than Figure S3 (all participants). Moreover, because different subgroups were used for different analyses, it is often difficult to follow and evaluate the results. For instance, the tachometric curves in Figure 2 (see also Figure 3 and 4) show no motor bias towards the cue (i.e., performance was at ~50% for rPTs <75 ms). I assume that the subsequent analyses of the motor bias were based on a very different subgroup. In fact, based on Figure S2, it seems that the motor bias was predominantly seen in the unreliable participants. Therefore, I often found the figures that were based on data across all participants (Figures 7 and S3) more informative to evaluate the overall pattern of results.

      Indeed, our intent was to dissociate the effects on saccade bias and timing as clearly as possible, even if that meant having to parse the data into subgroups of participants for different analyses. We do think conceptually this is the better strategy, because the bias and timing effects were distinct and not strongly correlated with specific participants or task variants. For instance, the unreliable participants were somewhat more consistently biased in the same direction, but the reliable participants also showed substantial biases, so the difference in magnitude was relatively modest. This can be more easily appreciated now that the reliable and unreliable participants are indicated in Figures 3 and 5. The impact of the bias is also discussed further in the last paragraphs of the Results, which note that the bias was not a reliable predictor of overall success during informed choices.

      Reviewer #3 (Public Review):

      (1) In this experimental paradigm, participants must decide where to saccade based on the color of the cue in the visual periphery (they should have made a prosaccade toward a green cue and an antisaccade away from a magenta cue). Thus, irrespective of whether the cue signaled that a prosaccade or an antisaccade was to be made, the identity of the cue was always essential for the task (as the authors explain on p. 5, lines 129-138). Also, the location where the cue appeared was blocked, and thus known to the participants in advance, so that endogenous attention could be directed to the cue at the beginning of a trial (e.g., p. 5, lines 129-132). These aspects of the experimental paradigm differ from the classic prosaccade/antisaccade paradigm (e.g. Antoniades et al., 2013, Vision Research). In the classic paradigm, the identity of the cues does not have to be distinguished to solve the task, since there is only one stimulus that should be looked at (prosaccade) or away from (antisaccade), and whether a prosaccade or antisaccade was required is constant across a block of trials. Thus, in contrast to the present paradigm, in the classic paradigm, the participants do not know where the cue is about to appear, but they know whether to perform a prosaccade or an antisaccade based on the location of the cue.

      The present paradigm keeps the location of the cue constant in a block of trials by intention, because this ensures that endogenous attention is allocated to its location and is not overpowered by the exogenous capture of attention that would happen when a single stimulus appeared abruptly in the visual field. Thus, the reason for keeping the location of the cue constant seems convincing. However, I wondered what consequences the constant location would have for the task representations that persist across the task and govern how attention is allocated. In the classic paradigm, there is always a single stimulus that captures attention exogenously (as it appears abruptly). In a prosaccade block, participants can prioritize the visual transient caused by the stimulus, and follow it with a saccade to its coordinates. In an antisaccade block, following the transient with a saccade would always be wrong, so that participants could try to suppress the attention capture by the transient, and base their saccade on the coordinates of the opposite location. Thus, in prosaccade and antisaccade blocks, the task representations controlling how visual transients are processed to perform the task differ. In the present task, prosaccades and antisaccades cannot be distinguished by the visual transients. Thus, such a situation could favor endogenous attention and increase its influence on saccade planning, even though saccade planning under more naturalistic conditions would be dominated by visual transients. I suggest discussing how this (and vice versa the emphasis on visual transients in the classic paradigm) could affect the generality of the presented findings (e.g., how does this relate to the interpretation that saccade plans are obligatorily coupled to endogenous attention? See, Results, p. 10, lines 306-308, see also Deubel & Schneider, 1996, Vision Research).

      Great discussion point. There are indeed many ways to set up an experiment where one must either look to a relevant cue or look away from it. Furthermore, it is also possible to arrange an experiment where the behavior is essentially identical to that in the classic antisaccade task without ever introducing the idea of looking away from something (Oor et al., 2023). More important than the specific task instructions or the structure of the event sequence, we think the fundamental factors that determine behavior in all of these cases are the magnitudes of the resulting exogenous and endogenous signals, and whether they are aligned or misaligned. Under urgent conditions, consideration of these elements and their relevant time scales explains behavior in a wide variety of tasks (see Salinas and Stanford, 2021). Furthermore, a recent study (Zhu et al., 2024) showed that the activation patterns of neurons in monkey prefrontal cortex during the antisaccade task can be accurately predicted from their stimulus- and saccade-related responses during a simpler task (a memory guided saccade task). This lends credence to the idea that, at the circuit level, the qualities that are critical for target selection and oculomotor performance are the relative strengths of the exogenous and endogenous signals, and their alignment in space and time. If we understand what those signals are, then it no longer matters how they were generated. The Discussion now includes a paragraph on this issue.

      (2) Discussion (p. 16, lines 472-475): The authors suppose that "It is as if the exogenous response was automatically followed by a motor bias in the opposite direction. Perhaps the oculomotor circuitry is such that an exogenous signal can rapidly trigger a saccade, but if it does not, then the corresponding motor plan is rapidly suppressed regardless of anything else.". I think this interesting point should be discussed in more detail. Could it also be that instead of suppression, other currently active motor plans were enhanced? Would this involve attention? Some attention models assume that attention works by distributing available (neuronal) processing resources (e.g., Desimone & Duncan, 1995, Annual Review of Neuroscience; Bundesen, 1990, Psychological Review; Bundesen et al., 2005, Psychological Review) so that the information receiving the largest share of resources results in perception and is used for action, but this happens without the active suppression of information.

      The rebound seen after the exogenously driven changes is certainly interesting, and we agree that it could involve not only the suppression of a specific motor plan but also enhancement of another (opposite) plan. However, we think that, given the lack of prior data with the requisite temporal precision, further elaboration of this point would just be too speculative in the context of the point that we are trying to make, which is simply that the underlying choice dynamics are more rapid and intricate than is generally appreciated.

      (3) Methods, p. 19, lines 593-596: It is reported that saccades were scored based on their direction. I think more information should be provided to understand which eye movements entered the analysis. Was there a criterion for saccade amplitude? I think it would be very helpful to provide data on the distributions of saccade amplitudes or on their accuracy (e.g. average distance from target) or reliability (e.g. standard deviation of landing points). Also, it is reported that some data was excluded from the analysis, and I suggest reporting how much of the data was excluded. Was the exclusion of the data related to whether participants were "reliable" or "unreliable" performers?

      The reported results are based on all saccades (detected according to a velocity threshold) that were produced after the go signal and in a predominantly horizontal direction (within ± 60° of the cue or non-cue), which were the vast majority (> 99%). Indeed, most saccades were directed to the choice targets, with 95% of them within ± 14.2° of the horizontal plane. The excluded (non-scored) trials were primarily fixation breaks plus a small fraction of trials with blinks, which compromised saccade determination. There was no explicit amplitude criterion; applying one (for instance, excluding any saccades with amplitude < 2°) produced minimal changes to the data. Overall, saccade amplitudes were distributed unimodally with a median of 7.7° and a 95% confidence interval of [3.7°, 9.7°], whereas the choice targets were located at ± 8° horizontally. This is now reported in the Methods.
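      The scoring rule described above can be illustrated with a minimal sketch. It assumes that each saccade is summarized by its horizontal and vertical displacement from fixation and that the cue side is coded as +1 (right) or -1 (left); the function name, argument names, and return labels are hypothetical and are not taken from the authors' analysis code.

```python
import math

def score_saccade(dx, dy, cue_side, max_angle_deg=60.0):
    """Classify a saccade as directed to the cue, to the non-cue, or unscored.

    dx, dy   : horizontal and vertical displacement of the saccade (deg).
    cue_side : +1 if the cue is on the right, -1 if it is on the left.
    The 60 deg limit follows the criterion stated in the text; everything
    else here is an illustrative assumption.
    """
    if dx == 0 and dy == 0:
        return None  # no displacement, nothing to score
    # Angle between the saccade vector and the horizontal axis, in [0, 90].
    elevation = math.degrees(math.atan2(abs(dy), abs(dx)))
    if elevation > max_angle_deg:
        return None  # not predominantly horizontal
    direction = 1 if dx > 0 else -1
    return "cue" if direction == cue_side else "non-cue"
```

      Under these assumptions there is no amplitude criterion, matching the statement that applying one changed the data minimally.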

      As for data exclusion, analyses were based on urgent trials (gap > 0); non-urgent (gap < 0) trials were excluded from the calculation of the tachometric curves simply because they might correspond to a slightly different regime (go signal after cue onset) and to long processing times in the asymptotic range (rPT in 200–300 ms) or beyond, which are not as informative. However, including them made no appreciable difference to the results. No data were excluded based on participant performance or identity; all psychometric analyses were carried out after the selection of trials based on the scoring criteria described above. This is now stated in the Methods.

      (4) Results, p. 9, lines 262-266: Some data analyses are performed on a subset of participants that met certain performance criteria. The reasons for this data selection seem convincing (e.g. to ensure empirical curves were not flat, line 264). Nevertheless, I suggest to explain and justify this step in more detail. In addition, if not all participants achieved an acceptable performance and data quality, this could also speak to the experimental task and its difficulty. Thus, I suggest discussing the potential implications of this, in particular, how this could affect the studied mechanisms, and whether it could limit the presented findings to a special group within the studied population.

      The ideal analysis for determining the cost of an antisaccade for each individual participant (Fig. 4c) was based on curve fitting and required task performance to rise consistently above chance at long rPTs in both pro and anti trials. This is why the mentioned conditions on the fits were imposed, as is now explained in the text. This ideal analysis was not viable for all tachometric curves, not necessarily because of task difficulty alone, but also because of high variability or high bias in a particular experiment/condition. It is true that the task was somewhat difficult, but this manifested in various ways across the dataset, so attempting to draw a clean-cut classification of participants based on “difficulty” may not be easy or all that informative (as can be gleaned from Fig. S1). There simply was a range of success levels, as one might expect from any task that requires some nontrivial cognitive processing. Also note that no participants were excluded outright from the analysis. Thus, at the mentioned point in the text, we simply note that a complementary analysis is presented later that includes all participants and all conditions and provides a highly consistent result (namely, Fig. 7e). Then, in the last section of the Results, where Fig. 7 is presented, we point out that there is considerable variance in performance at long rPTs, and that it relates to both the bias and the difficulty of the task across participants.

      Reviewer #1 (Recommendations For The Authors):

      (1) I have some questions related to the initial motor bias:

      a) Based on Figure S3, which shows the tachometric curves using data from all participants, there only seems to be a systematic motor bias in Experiments 1 and 3 but no bias in Experiments 2 and 4. It is unclear to me why this is different from the data shown in Figure 7.

      For the bars in Fig. 7, accuracy (% correct) was computed for each participant and then averaged across participants, whereas for the data in Fig. S3, trials were first pooled across participants and then accuracy was computed for each rPT bin. The different averaging methods produce slightly different results because some participants had more trials in the guessing range than others, and different biases.  
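      The difference between the two averaging methods can be seen in a minimal sketch. The trial outcomes below are invented for illustration (1 = correct, 0 = incorrect) and the function names are hypothetical; the sketch only shows why averaging per-participant accuracies and pooling trials first can disagree when participants contribute unequal numbers of trials.

```python
def mean_of_participant_accuracies(trials_by_participant):
    """Compute accuracy per participant, then average across participants."""
    accuracies = [sum(t) / len(t) for t in trials_by_participant.values() if t]
    return sum(accuracies) / len(accuracies)

def pooled_accuracy(trials_by_participant):
    """Pool all trials across participants, then compute a single accuracy."""
    pooled = [outcome for t in trials_by_participant.values() for outcome in t]
    return sum(pooled) / len(pooled)

# Invented example: two participants with unequal trial counts in one rPT bin.
example = {
    "P1": [1] * 9 + [0] * 1,    # 10 trials, 90% correct
    "P2": [1] * 20 + [0] * 80,  # 100 trials, 20% correct
}

per_participant = mean_of_participant_accuracies(example)  # each participant weighted equally
pooled = pooled_accuracy(example)                          # each trial weighted equally
```

      In this invented case the participant with more trials dominates the pooled estimate, whereas the per-participant average weights both equally.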

      b) Based on Figure 7 (and Figure S3), there was no motor bias in Experiment 4. Based on the correlations between motor bias and time difference between pro and antisaccades, I would expect that the rise points between pro and antisaccades would be more similar in this Experiment. Was this the case?

      No. Figs. 3c and S3d show that the rise times of pro and anti trials for Experiment 4 still differ by about 30 ms (around the 75% correct mark), and the rest of the panels in those figures show that the difference is similar for all experiments. What happens is that Figs. 7 and S3 show that on average the bias is zero for Experiment 4, but that does not mean that the average difference in rise times is zero because there is an offset in the data (correlation is not the same as regression). The most relevant evidence is in Fig. 6c, which shows that, for an overall bias of zero, one would still expect a positive difference in rise times of about 25–30 ms. This figure now includes a regression line, and the corresponding text now explains the relationship between bias and rise times more clearly. Thanks for asking; this is an important point that was not sufficiently elaborated before.
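      The point about the offset can be made concrete with a small least-squares sketch. The numbers are invented for illustration and are not data from the study; the variable names are hypothetical. A strong relationship between bias and rise-time difference is compatible with a positive difference at zero bias, because the value predicted at zero bias is the intercept of the fitted line.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented values for illustration only (not data from the study).
bias = [-0.2, -0.1, 0.0, 0.1, 0.2]              # signed motor bias per participant
rise_diff_ms = [15.0, 20.0, 25.0, 30.0, 35.0]   # anti minus pro rise point (ms)

slope, intercept = fit_line(bias, rise_diff_ms)
mean_bias = sum(bias) / len(bias)     # zero on average
predicted_at_zero_bias = intercept    # still positive despite zero mean bias
```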

      c) If I understand correctly, the initial motor bias was predominantly observed in participants who were classified as 'unreliable performers' (comparing Figure S2 and Figure 2). Was there a correlation between the motor bias and overall success in the task? In other words: Was a strong motor bias generally disadvantageous?

      Good question. Participants classified as ‘unreliable’ were somewhat more consistently biased in the same direction than those classified as ‘reliable’, but the distinction in magnitude was not large. This can be better appreciated now in Fig. 5 by noting the mix of black (reliable) and gray labels (unreliable) along the x axes. The unreliable participants were also, by definition, less accurate in their asymptotic performance in at least one experiment (Fig. S1). In general, however, this classification was used simply to distinguish more clearly the two main effects in the data (timing cost and bias). In fact, the motor bias was not a reliable predictor of performance during informed choices: across all participants, the mean accuracy in the asymptotic range (rPT > 200 ms) had a weak, non-significant correlation with the bias (ρ = ‒0.07, p = 0.7). So, no, the motor bias did not incur an obvious disadvantage in terms of overall success in the task. Its more relevant effect was the asymmetry in performance that it promoted between pro- and antisaccade trials (Fig. 6c). This is now explained at the end of the Results.

      (2) One of the key analyses of the current study is the comparison of the rPT required to make informed pro and antisaccades (ll.246 ff). I think it would be informative for readers to see the results of this analysis separately for all four experiments. For instance, based on Figure 4a and b, it looks like the rise points were actually very similar between pro and antisaccades in Experiment 1.

      We agree that the ideal analysis would be to compute the performance rise point for pro- and antisaccade curves for each experiment and each participant, but as is now noted in the text, this requires a steady and substantial rise in the tachometric curve, which is not always obtained at such a fine-grained level; the underlying variability can be glimpsed from the individual points in Fig. 7a, b. Indeed, in Fig. 4a, b the mean difference between pro and anti rise points appears small for Experiment 1, but note that the two panels include data from only partially overlapping sets of participants; the figure legend now makes this clearer. Again, this is because the required fitting procedure was not always reliable in both conditions (pro and anti) for a given subject in a given experiment. Thus, panels a and b cannot be directly compared. The key results are those in Fig. 4c, which compare the rise points in the two conditions for the same participants (11 of them, for which both rise points could be reliably determined). In that case the mean difference is evident, and the individual effect is consistent for 9 of the 11 participants (as now noted).

      A similar comparison for Experiments 1 or 2 individually would include fewer data points and lose statistical power. However, on average, the results for Experiments 1 and 2 (separately) were indeed very similar; in both cases, the comparison between pro and anti curves pooled across the same qualifying participants as in Fig. 4c produced results that were nearly identical to those of Fig. 4d (as can be inferred from Fig. 2a, b). Furthermore, results for the four individual experiments pooled across all participants are presented in Figure S3, which shows delayed rises in antisaccade performance consistent with the single participant data (Fig. 4c).

      (3) Figure 3: It would be helpful to indicate the reliable performers that were used for Figure 3a in the bar plots in Figure 3b. Same for Figures 3c and d.

      Done. Thanks for the suggestion.

      (4) Introduction: The literature on the link between covert attention and directional biases in microsaccades seems relevant in the context of the current study (e.g., Hafed et al., 2002, Vision Res; Engbert & Kliegl, 2003, Vision Res; Willett & Mayo, 2023, Proc Natl Acad Sci USA).

      Yes, thanks for the suggestion. The introduction now mentions the link between attentional allocation and microsaccade production.

      (5) ll.395ff & Figure 7f: Please clarify whether data were pooled across all four experiments for this analysis.

      Yes, the data were pooled, but a positive trend was observed for each of the four experiments individually. This is now stated.

      (6) ll.432-433: There is evidence that the attentional locus and the actual saccade endpoint can also be dissociated (e.g., Wollenberg et al., 2018, PLoS Biol; Hanning et al., 2019, Proc Natl Acad Sci USA).

      True. We have rephrased accordingly. Thanks for the correction.

      (7) ll.438-440: This sentence is difficult to parse.

      Fixed.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is well-written and compelling. The biggest issue for me was keeping track of the specifics of the individual experiments. I think some small efforts to reinforce those details along the way would help the reader. For example, in the Figure 3 figure legend, I found the parenthetical phrase "(high luminance cue, low luminance non-cue)" immensely helpful. It would be helpful and trivial to add the corresponding phrase after "Experiment 4" in the same legend.

      Thanks for the suggestion. Legends and/or labels have been expanded accordingly in this and other figures.

      Line 314: "..had any effect on performance,..." Should there be a callout to Figure 2 here?

      Done.

      It wasn't clear to me why the specific high and low luminance values (48 and 0.25) were chosen. I assume there was at least some quick perceptual assessment. If that's the case or if the values were taken from prior work, please include that information.

      Done.

      Reviewer #3 (Recommendations For The Authors):

      Minor points. Please note that the comments made in the public review above are not repeated here.

      (1) Introduction, p. 2, lines 41-45: It is mentioned that the effects of covert attention or a saccade can be quite distinct. I suggest specifying in what way.

      Done.

      (2) Introduction, p. 2, lines 46-47: It is said that the relation between attention and saccade planning was still uncertain and then it is stressed that this was the case for more natural viewing conditions. However, the discussed literature and the experimental approach of the current study still rely on experimental paradigms that are far from natural viewing conditions. Thus, I suggest either discussing the link between these paradigms and natural viewing in more detail or leaving out the reference to natural viewing at this point (I think the latter suggestion would fit the present paper best).

      We followed the latter suggestion.

      (3) Introduction (e.g. p. 3, lines 55-58): The authors discuss the effects that sustaining fixation might have on attention and eye movements. Recently, it has been found that maintaining fixation can ameliorate cognitive conflicts that involve spatial attention (Krause & Poth, 2023, iScience). It seems interesting to include this finding in the discussion, because it supports the authors' view that it is necessary to study fixation and eye movements rather than eye movements alone to uncover their interplay with attention and decision-making.

      Thanks for the reference. The reported finding is certainly interesting, but we find it somewhat tangential to the specific point we make about strong fixation constraints — which is that they suppress internally driven motor activity, including biases, that are highly informative of the relationship between attention and saccade planning (lines 466‒472, 541‒561). Whether fixation state has other subtle consequences for cognitive control is an intriguing, important issue, for sure. But we would rather maintain the readers’ focus on the reasons why less restrictive fixation requirements are relevant for understanding the deployment of attention.

      (4) Results, p. 9, lines 264-266: It is reported that "The rise points were statistically the same across experiments for both prosaccades (p=0.08, n=10, permutation test)...", but the p-value seems quite close to significance. I suggest mentioning this and phrasing the sentence a bit more carefully.

      We now refer to the rise points as “similar”.

      (5) Figure 7 a-d: It might help readers who first skim through the figures before reading the text to use other labels for the bins on the x-axis that spell out the name of the phase in the trial. It might also help to visualize the bins on the plot of a tachometric function (in this case, changing the labels could be unnecessary).

      Thanks for the suggestion. We added an insert to the figure to indicate the correspondence between labels and time bins more intuitively.

      (6) Methods, p. 18, lines 566-567: On some trials, participants received an auditory beep as a feedback stimulus. As this could induce a burst of arousal, I wondered how it affected the subsequent trials.

      This is an interesting issue to ponder. We agree that, in principle, the beep could have an impact on arousal. However, what exactly would be predicted as a consequence? The absence of a beep is meant to increase the urgency of the participant, so some effect of the beep event on RT would be expected anyway as per task instructions. Thus, it is unclear whether an arousal contribution could be isolated from other confounds. That said, three observations suggest that, at most, an independent arousal effect would be very small. First, we have performed multisensory experiments (unpublished) with auditory and visual stimuli, and have found that it is difficult to obtain a measurable effect of sound on an urgent visual choice task unless the experimental conditions are particularly conducive; namely, when the visual stimuli are dim and the sound is loud and lateralized. None of these conditions applies to the standard feedback beep. Second, because most trials are on time, the meaningful feedback signal is conveyed by the absence of the beep. But this signal to alter behavior (i.e., respond sooner) has zero intensity and is therefore unlikely to trigger a strong exogenous, automatic response. Finally, in our data, we can parse the trials that followed a beep (the majority) from those that did not (a minority). In doing so, we found no differences with respect to perceptual performance; only minor differences in RT that were identical for pro- and antisaccade trials. All this suggests to us that it is very unlikely that the feedback alters arousal significantly on specific trials, somehow impacting the tachometric curve (a contribution to general arousal across blocks or sessions is possible, of course, but would be of little consequence to the aims of the study).

      (7) Methods, p. 18, lines 574-577: I suggest referring to the colors or the conditions in the text as it was done in the experiments, just to prevent readers being confused before reading the methods.

      We appreciate the thought, but think that the study is easier to understand by pretending, initially, that the color assignments were fixed. This is a harmless simplification. Mentioning the actual color assignments early on would be potentially more confusing and make the description of the task longer and more contrived.

      (8) Methods, p. 18, Table 1: Given that the authors had a spectrophotometer, I suggest providing (approximate) measurements for the stimulus colors in addition to the luminance (i.e. not just RGB values).

      Unfortunately, we have since switched the monitor in our setup, so we don’t have the exact color measurements for the stimuli used at the time. We will keep the suggestion in mind for future studies though.

      References

      Oor EE, Stanford TR, Salinas E (2023) Stimulus salience conflicts and colludes with endogenous goals during urgent choices. iScience 26:106253.

      Salinas E, Stanford TR (2021) Under time pressure, the exogenous modulation of saccade plans is ubiquitous, intricate, and lawful. Curr Opin Neurobiol 70:154-162.

      Zhu J, Zhou XM, Constantinidis C, Salinas E, Stanford TR (2024) Parallel signatures of cognitive maturation in primate antisaccade performance and prefrontal activity. iScience. doi: https://doi.org/10.1016/j.isci.2024.110488.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and editors for insightful feedback on how we could improve the manuscript. We have revised the manuscript and addressed the points raised.

      Regarding the technical issues raised about the quality of patch clamp recordings (Reviewer 2), we acknowledge that the upper limit of the access resistance cutoff should be lower and that the accepted change should be 10-20%. To this end, we have revised the manuscript to detail the quality metrics more accurately. The access resistance for the neurons in paired recordings was below 40 MΩ (similar to the metric used by Kolb et al. 2019), and if the access resistance rose above 50 MΩ, we stopped recording from that neuron. Furthermore, neurons with access resistance above 50 MΩ were included in the histogram to show the total number of neurons patched, not all of which were used in paired recordings. As this was done with an automated robotic system, these neurons would still undergo an initial voltage clamp and current clamp protocol before the pipette released the neuron and patched another cell. To Reviewer 2’s point, this patch-walk protocol could alternatively be implemented using manual recording approaches, and this point has been included in the revised manuscript.

      Regarding the spatial restrictions (Reviewer 3), we agree that the average intersomatic distance is higher than ideal. This was likely due to failed patch attempts; for instance, if one pipette successfully achieved whole cell and the other pipette had several sequential failed patch attempts, the intersomatic distance (ISD) would increase with each failed attempt because of the user-selected index of cells. If the whole-cell success rate were closer to 100%, the pipettes would walk across a slice with low ISD. To overcome this challenge in future work, automated cell identification and tracking could enable the path planning to be continuously updated after each patch attempt. Given the whole-cell success rate achieved by a given electrophysiologist, we believe that later versions of the automated robot could include route-planning algorithms to minimize the distance between neurons. Alternatively, this patch-walk system could be integrated into manual recording approaches to improve connectivity yields.
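      As a rough illustration of the kind of re-planning mentioned above, the following sketch reorders the remaining candidate cells by their distance to the neuron currently held by the other pipette, skipping cells that already failed. This is a hypothetical greedy heuristic, not the authors' algorithm; the function name, the coordinate representation (x, y, z in micrometers), and the example values are all assumptions.

```python
import math

def replan(held_soma, candidates, failed):
    """Order untried candidate somata by distance to the held neuron.

    held_soma  : (x, y, z) position of the neuron already in whole-cell mode.
    candidates : iterable of (x, y, z) positions selected by the user.
    failed     : set of positions where a patch attempt already failed.
    Returns the untried candidates, nearest first, so that the next attempt
    keeps the intersomatic distance as low as possible.
    """
    untried = [c for c in candidates if c not in failed]
    return sorted(untried, key=lambda c: math.dist(held_soma, c))

# Hypothetical example: the nearest cell failed, so the plan moves to the next nearest.
plan = replan(
    held_soma=(0.0, 0.0, 0.0),
    candidates=[(50.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)],
    failed={(10.0, 0.0, 0.0)},
)
```

      Calling such a function after every attempt, rather than following a fixed index, is one simple way the path could be updated continuously.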

      For the point raised about morphological identification, we believe that, while important, it is outside the scope of this project. Future work will include neuronal reconstruction. Regarding the other points, we will amend the manuscript to highlight other key metrics such as the maximum time we could hold a neuron under the whole-cell configuration. Additionally, we agree with Reviewer 3 that some of the current language may cause confusion, and we will amend it accordingly.

      To all the reviewers, thank you for your time, understanding, and the opportunity to improve our manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      As the scientific community identifies increasing numbers of genetic variants that cause rare human diseases, a challenge is how the field can most quickly identify pharmacological interventions to address known deficits. The authors point out that defining phenotypic outcomes required for drug screen assays is often challenging, and emphasize how invertebrate models can be used for quick ID of compounds that may address genetic deficits. A major contribution of this work is to establish a framework for potential intervention drug screening based on quantitative imaging of morphology and mobility behavior, using methods that the authors show can define subtle phenotypes in a high proportion of disease gene knockout mutants. 

      Overall, the work constitutes an elegant combination of previously developed high-volume imaging with highly detailed quantitative phenotyping (and some paring down to specific phenotypes) to establish proof of principle on how the combined applications can contribute to screens for compounds that may address specific genetic deficits, which can suggest both mechanism and therapy. 

      In brief, the authors selected 25 genes for which loss of function is implicated in human neuro-muscular disease and engineered deletions in the corresponding C. elegans homologs. The authors then imaged morphological features and behaviors prior to, during, and after blue light stimuli, quantitating features, and clustering outcomes as they elegantly developed previously (PMID 35322206; 30171234; 30201839). In doing so, phenotypes in 23/25 tested mutants could be separated enough to distinguish WT from mutant and half of those with adequate robustness to permit high-throughput screens, an outcome that supports the utility of general efforts to ID phenotypes in C. elegans disease orthologs using this approach. A detailed discussion of 4 ciliopathy gene defects and of NALCN-related channelopathy mutants reveals both expected and novel phenotypes, validating the basic approach to modeling vetted targets and underscoring that quantitative imaging approaches reiterate known biology. The authors then screened a library of nearly 750 FDA-approved drugs for the capacity to shift the unc-80 NALCN channel-disrupted phenotype closer to the wild type. Top "mover" compounds shift outcomes within the experimental outcome space, and the analysis also reveals how "side effects" can be evaluated to prioritize compounds that confer the fewest changes of other parameters away from the center.
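      The idea of ranking compounds by how far they move a mutant toward the wild type, while flagging side effects, can be sketched as follows. This is an illustrative assumption about how such a score could be computed, not the authors' pipeline: features are assumed to be z-scored relative to wild type (so 0 means wild-type-like), and the function name, feature names, and threshold are hypothetical.

```python
import math

def rescue_and_side_effects(untreated, treated, target_features, threshold=1.0):
    """Score a compound on a mutant phenotype profile.

    untreated, treated : dicts mapping feature name -> z-score vs. wild type.
    target_features    : features that define the mutant phenotype of interest.
    Returns (rescue, side_effects), where rescue is the reduction in distance
    to wild type over the target features (positive = closer to wild type) and
    side_effects lists non-target features pushed away from wild type by more
    than `threshold`.
    """
    def distance_to_wild_type(profile):
        return math.sqrt(sum(profile[f] ** 2 for f in target_features))

    rescue = distance_to_wild_type(untreated) - distance_to_wild_type(treated)
    side_effects = [
        f for f in treated
        if f not in target_features
        and abs(treated[f]) - abs(untreated[f]) > threshold
    ]
    return rescue, side_effects

# Invented profiles for illustration only.
untreated = {"speed": -3.0, "curvature": 4.0, "length": 0.1}
treated = {"speed": -1.5, "curvature": 2.0, "length": 2.1}
rescue, side_effects = rescue_and_side_effects(untreated, treated, ["speed", "curvature"])
```

      In this invented case the compound halves the distance to wild type on the target features but shifts an unrelated feature, which is the kind of trade-off the prioritization described above is meant to expose.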

      Strengths: 

      Although the imaging and data analysis approaches have been reported and the screen is limited in scope and intervention exposure, it is important that the authors strongly combine individual approach elements to demonstrate how quantitative imaging phenotypes can be integrated with C. elegans genetics to accelerate the identification of potential modulators of disease (easily extendable to other goals). Generation of deletion alleles and documentation of their associated phenotypes (available in supplemental data) provide potentially useful reagents/data to the field. The capacity to identify "over-shooting" of compound applications with suggestions for scale back and to sort efficacious interventions to minimize other changes to behavioral and physical profiles is a strong contribution. 

      Weaknesses: 

      The work does not have major weaknesses, although it may be possible to expand the discussion to increase utility in the field: 

      (1) Increased discussion of the challenges and limitations of the approach may enhance successful adaptation and application in the field.

      It is quite possible that morphological and behavioral phenotypes have nothing to do with disease mechanisms and rather reflect secondary outcomes, such that positive hits will address "off-target" consequences. 

      This is possible and can only be determined with human data. We now discuss the possibility in the discussion.

      The deletion approach is adequately justified in the text, but the authors may make the point somewhere that screening target outcomes might be enhanced by the inclusion of engineered alleles that match the human disease condition. Their work on sod-1 alleles (PMID 35322206) might be noted in this discussion. 

      We agree and now mention this work in the discussion. We are currently working on a collection of strains with patient-specific mutations.

      Drug testing here involved a strikingly brief exposure to a compound, which holds implications for how a given drug might engage in adult animals. The authors might comment more extensively on extended treatments that include earlier life or more extended targeting. The assumption is that different exposure periods and durations could be administered, but if the authors are aware of challenges associated with more prolonged applications, larger scale, etc., it would be useful to note them.

      More prolonged applications are definitely possible. We chose short treatments for this screen to model the potential for changing neural phenotypes once developmental effects of the mutation have already occurred. We now briefly discuss this choice and the potential of longer treatments in the discussion.

      (2) More justification of the shift to only a few target parameters for judging compound effectiveness. 

      - In the screen in Figure 4D and text around line 313, 3 selected core features of the unc-80 mutant (fraction that blue-light pause, speed, and curvature) were used to avoid the high replicate requirements to identify subtle phenotypes. Although this strategy was successful as reported in Figure 5, the pared-down approach seems a bit at odds with the emphasis on the range of features that can be compared mutant/wt with the authors' powerful image analysis. Adding details about the reduced statistical power upon multiple comparisons, with a concrete example calculated, might help interested scientists better assess how to apply this tool in experimental design.

      To empirically test the effect of including more features on the subsequent screen, we have repeated the analysis using increasing numbers of features. In a new supplementary figure we find increasing the number of features reduces our power to detect rescue. At 256 features, we would not be able to detect any compounds that rescued the disease model phenotype.
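The cost of testing more features can be illustrated with a minimal sketch. This is not the analysis from the manuscript: it assumes a Bonferroni correction and a two-sided two-sample z-test, and the function name, the effect size of 1.0, and the 12 replicates per group are hypothetical values chosen only for illustration. Only the feature counts (3 and 256) come from the text.

```python
from statistics import NormalDist


def power_two_sided_z(effect_size, n_per_group, alpha, n_features):
    """Approximate power of a two-sample z-test after Bonferroni correction.

    effect_size is the standardized mean difference (assumed unit variance).
    """
    std_normal = NormalDist()
    # Bonferroni: each of the n_features tests gets an equal share of alpha.
    alpha_per_test = alpha / n_features
    z_crit = std_normal.inv_cdf(1 - alpha_per_test / 2)
    # Expected shift of the test statistic for two equal-sized groups.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands beyond either critical value.
    upper_tail = 1 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


# Hypothetical comparison: same effect and replicate count, more features tested.
for n_features in (3, 256):
    power = power_two_sided_z(1.0, 12, 0.05, n_features)
    print(f"{n_features} features: power = {power:.3f}")
```

Under these assumptions the per-test significance threshold drops from 0.05/3 to 0.05/256, so the same effect is detected less often. A less conservative correction (for example, a false discovery rate procedure) would change the numbers but not the direction of the trend.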

      (3) More development of the side-effect concept. The side effects analysis is interesting and potentially powerful. Prioritization of an intervention because of minimal perturbation of other phenotypes might be better documented and discussed a bit further; how reliably does the metric of low side effects correlate with drug effectiveness? 

      Ultimately this can only be determined with clinical trial data on multiple drugs, but there are currently no therapeutic options for UNC80 deficiency in humans. We have included some extra discussion of the side effect concept.

      Reviewer #2 (Public Review): 

      Summary and strengths: 

      O'Brien et al. present a compelling strategy to both understand rare disease that could have a neuronal focus and discover drugs for repurposing that can affect rare disease phenotypes. Using C. elegans, they optimize the Brown lab worm tracker and Tierpsy analysis platform to look at the movement behaviors of 25 knockout strains. These gene knockouts were chosen based on a process to identify human orthologs that could underlie rare diseases. I found the manuscript interesting and a powerful approach to making genotype-phenotype connections using C. elegans. Given the rate at which rare Mendelian diseases are found and candidate genes suggested, human geneticists need to consider orthologous approaches to understand the disease and seek treatments on a rapid time scale. This approach is one such way. Overall, I have a few minor suggestions and some specific edits. 

      Weaknesses: 

      (1) Throughout the text on figures, labels are nearly impossible to read. I had to zoom into the PDF to determine what the figure was showing. Please make text in all figures a minimum of 10-point font. Similarly, the Figure 2D point type is impossible to read. Points should be larger in all figures. Gene names should be in italics in all figures, following C. elegans convention. 

      We have updated all figures with larger labels and, where necessary, split figures to allow for better readability. We’ve also corrected italicisation.

      (2) I have a strong bias against the second point in Figure 1A. Sequencing of trios, cohorts, or individuals NEVER identifies causal genes in the disease. This technique proposes a candidate gene. Future experiments (oftentimes in model organisms) are required to make those connections to causality. Please edit this figure and parts of the text. 

      We have removed references to causation. We were thinking of cases where a known variant is found in a patient where causality has already been established rather than cases of new variant discovery.

      (3) How were the high-confidence orthologs filtered from 767 to 543 (lines 128-131)? Also, the choice of the final list of 25 genes is not well justified. Please expand more about how these choices were made. 

      We now explain the extra keyword filtering step. For the final filtering step, we simply examined the list and chose 25. There is therefore little justification to provide and we acknowledge these cannot be seen as representative of the larger set according to well-defined rules. The choice was based on which genes we thought would be interesting using their descriptions or our prior knowledge (“subjective interestingness” in the main text).

      (4) Figures 3 and 4, why show all 8289 features? It might be easier to understand and read if only the 256 Tierpsy features were plotted in the heat maps. 

      In this case, we included all features because they were all tested for differences between mutants and controls. By consistently using all features for each fingerprint we can be sure that the features that are different that we want to highlight in box plots can be referred to in the fingerprint.

      (5) The unc-80 mutant screen is clever. In the feature space, it is likely better to focus on the 256 less-redundant Tierpsy features instead of just a number of features. It is unclear to me how many of these features are correlated and not providing more information. In other words, the "worsening" of less-redundant features is far more of a concern than the "worsening" of 1000 correlated features. 

      This is a good point. We’ve redone the analysis using the Tierpsy 256 feature set and included this as a supplementary figure. We find that the same trend exists when looking at this reduced feature set.

      Reviewer #3 (Public Review): 

      In this study, O'Brien et al. address the need for scalable and cost-effective approaches to finding lead compounds for the treatment of the growing number of Mendelian diseases. They used state-of-the-art phenotypic screening based on an established high-dimensional phenotypic analysis pipeline in the nematode C. elegans. 

      First, a panel of 25 C. elegans models was created by generating CRISPR/Cas9 knock-out lines for conserved human disease genes. These mutant strains underwent behavioral analysis using the group's published methodology. Clustering analysis revealed common features for genes likely operating in similar genetic pathways or biological functions. The study also presents results from a more focused examination of ciliopathy disease models. 

      Subsequently, the study focuses on the NALCN channel gene family, comparing the phenotypes of mutants of nca-1, unc-77, and unc-80. This initial characterization identifies three behavioral parameters that exhibit significant differences from the wild type and could serve as indicators for pharmacological modulation. 

      As a proof-of-concept, O'Brien et al. present a drug repurposing screen using an FDA-approved compound library, identifying two compounds capable of rescuing the behavioral phenotype in a model with UNC80 deficiency. The relatively short time and low cost associated with creating and phenotyping these strains suggest that high-throughput worm tracking could serve as a scalable approach for drug repurposing, addressing the multitude of Mendelian diseases. Interestingly, by measuring a wide range of behavioural parameters, this strategy also simultaneously reveals deleterious side effects of tested drugs that may confound the analysis. 

      Considering the wealth of data generated in this study regarding important human disease genes, it is regrettable that the data is not actually made accessible. This diminishes the study's utility. It would have a far greater impact if an accessible and user-friendly online interface were established to facilitate data querying and feature extraction for specific mutants. This would empower researchers to compare their findings with the extensive dataset created here. Otherwise, one is left with a very limited set of exploitable data. 

      We have now made the feature data available on Zenodo (https://doi.org/10.5281/zenodo.12684118) as a matrix of feature summaries and individual skeleton timeseries data (the feature matrix makes it more straightforward to extract the data from particular mutants for reanalysis). We have also created a static html version of the heatmap in Figure 2 containing the entire behavioural feature set extracted by Tierpsy. This can be opened in a browser and zoomed for detailed inspection. Mousing over the heatmap shows the names of features at each position making it easier to arrive at intuitive conclusions like ‘strain A is slow’ or ‘strain B is more curved’.

      Another technical limitation of the study is the use of single alleles. Large deletion alleles were generated by CRISPR/Cas9 gene editing. At first glance, this seems like a good idea because it limits the risk that background mutations, present in chemically-generated alleles, will affect behavioral parameters. However, these large deletions can also remove non-coding RNAs or other regulatory genetic elements, as found, for example, in introns. Therefore, it would be prudent to validate the behavioral effects by testing additional loss-of-function alleles produced through early stop codons or targeted deletion of key functional domains. 

      We have added a note in the main text on limitations of deletion alleles. We like the idea of making multiple alleles in future studies, especially in cases where a project is focussed on just one or a few genes.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors): 

      Note that none of the above suggestions or the one immediately below are considered mandatory. 

      One additional minor point: The dual implication of mevalonate perturbations for NALCN deficiencies is striking. At the same time, the mevalonate pathway is critical for embryo viability among other things, which prompts questions about how reproductive physiology is integrated in this screen approach. It appears that sterilization protocols are not used to prepare screen target animals, but it would be useful to know if there were a signature associated with drug-induced sterility that might help identify one potential common non-interesting outcome of compound treatments in general. In this work, the screen treatment is only 4 hours, which is probably too short to compromise reproduction, but as noted above, it is likely users would intend to expose test subjects for much longer than 4-hour periods.

      This is an interesting point. In its current form our screen doesn’t assess reproductive physiology. This is something that we will consider in ongoing projects.

      Figures 

      Figure 1D might be omitted or moved to supplement. 

      We have removed 1D and moved figure 1E as a standalone table (Table 1) to improve readability.

      Figure 2D "key" is hard to make out size differences for prestim, bluelight, and poststim -more distinctive symbols should be used. 

      We have increased the size of the symbols so that the key is easier to read.

      Line 412 unc-25 should be in italics 

      Corrected

      Reviewer #2 (Recommendations For The Authors): 

      Specific edits: 

      All of the errors below have been corrected.

      Line 47, "loss of function" should be hyphenated because it is a compound adjective that modifies mutations. 

      Line 50, "genetically-tractable" should not be hyphenated because it is not a compound adjective. It is an adverb-adjective pair. Line 102 has the same grammatical issue. 

      Line 85, "rare genetic diseases" do not "affect nervous system function". The disease might have deficits in this function, but the disease does not do anything to function. 

      Line 86, it should be mutations not mutants. Mutations are changes to DNA. Mutants are individuals with mutations. 

      Throughout, wild-type should be hyphenated when it is used as a compound adjective. 

      Figure 4, asterisks is spelled incorrectly. 

      Reviewer #3 (Recommendations For The Authors): 

      - As stated in the public review, the utility of the study is limited by the lack of access to the complete dataset. The wealth of data produced by the study is one of its major outputs. 

      We have made the data publicly available on Zenodo. We appreciate the request.

      - Describe the exact break-points of the different alleles, because it was not readily feasible to derive them from the gene fact sheets provided in the supplementary materials. 

      We have now provided the start position and total length of deletion for each gene in the gene fact sheets.

      - Figure 1C: what does "Genetic homology"/"sequence identity" refer to? How were these values calculated? 

      UNC-49 is clearly not 95% identical to vertebrate GABAR subunits at the protein level. 

      We have changed the axis label to “BLAST % Sequence Identity” to clarify that these values are calculated from BLAST sequence alignments on the WormBase and Alliance of Genome Resources webpages.

      - Figure 1E : The data presented in Figure 1E appears somewhat unreliable. For example, a cursory check showed: 

      (1) Wrong human ortholog: unc-49 is a Gaba receptor, not a Glycine receptor as indicated in the second column. 

      (2) Wrong disease association: dys-1 is not associated with Bardet-Biedl syndrome; overall the data indicated in the table does not seem to fully match the HPO database. 

      (3) Inconsistent disease association: why don't the avr-14 and glc-2 (and even unc-49) profiles overlap/coincide given that they present overlapping sets of human orthologs. 

      Thank you for catching this! We have corrected gene names which were mistakenly pasted. We have also made this a standalone table (Table 1) for improved readability.

      - Error in legend to figure 4I : "with ciliopathies and N2" > ciliopathies should be "NALCN disease". 

      - Error at line 301: "Figures 2E-H" should be "Figures 4E-H". 

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      This study explores the sequence characteristics and features of high-occupancy target (HOT) loci across the human genome. The computational analyses presented in this paper provide information into the correlation of TF binding and regulatory networks at HOT loci that were regarded as lacking sequence specificity.

      By leveraging hundreds of ChIP-seq datasets from the ENCODE Project to delineate HOT loci in HepG2, K562, and H1-hESC cells, the investigators identified the regulatory significance and participation in 3D chromatin interactions of HOT loci. Subsequent exploration focused on the interaction of DNA-associated proteins (DAPs) with HOT loci using computational models. The models established that the potential formation of HOT loci is likely embedded in their DNA sequences and is significantly influenced by GC contents. Further inquiry exposed contrasting roles of HOT loci in housekeeping and tissue-specific functions spanning various cell types, with distinctions between embryonic and differentiated states, including instances of polymorphic variability. The authors conclude with a speculative model that HOT loci serve as anchors where phase-separated transcriptional condensates form. The findings presented here open avenues for future research, encouraging more exploration of the functional implications of HOT loci.

      Strengths:

      The concept of using computational models to define characteristics of HOT loci is refreshing and allows researchers to take a different approach to identifying potential targets. The major strength of the study lies in the very large number of datasets analyzed, with hundreds of ChIP-seq data sets for both HepG2 and K562 cells as part of the ENCODE project. Such quantitative power allowed the authors to delve deeply into HOT loci, which were previously thought to be artifacts.

      Weaknesses:

      While this study contributes to our knowledge of HOT loci, there are critical weaknesses that need to be addressed. There are questions on the validity of the assumptions made for certain analyses. The speculative nature of the proposed model involving transcriptional condensates needs either further validation or to be toned down. Furthermore, some apparent contradictions exist among the main conclusions, and these either need to be better explained or corrected. Lastly, several figure panels could be better explained or described in the figure legends.

      We thank the reviewer for their valuable comments.

      - We have extended the study and included a new chapter focusing on the condensate hypothesis, added more supporting evidence (including the ones suggested by the reviewer), and made explicit statements on the speculative nature of this model.

      - We have restructured the text to remove the sentences which might be construed as contradictory.

      Reviewer #2 (Public Review):

      Summary:

      The paper 'Sequence characteristics and an accurate model of abundant hyperactive loci in the human genome' by Hudaiberdiev and Ovcharenko offers comprehensive analyses and insights about the 'high-occupancy target' (HOT) loci in the human genome. These are considered genomic regions that overlap with transcription factor binding sites. The authors provided very comprehensive analyses of the TF composition characteristics of these HOT loci. They showed that these HOT loci tend to overlap with annotated promoters and enhancers, GC-rich regions, open chromatin signals, and highly conserved regions, and that these loci are also enriched with potentially causal variants with different traits.

      Strengths:

      Overall, the HOT loci' definition is clear and the data of HOT regions across the genome can be a useful dataset for studies that use HepG2 or K562 as a model. I appreciate the authors' efforts in presenting many analyses and plots backing up each statement.

      Weaknesses:

      It is noteworthy that the HOT concept and their signature characteristics as being highly functional regions of the genome are not presented for the first time here. Additionally, I find the main manuscript, though very comprehensive, long-winded and can be put in a shorter, more digestible format without sacrificing scientific content.

      The introduction's mention of the blacklisted region can be rather misleading because when I read it, I was anticipating that we are uncovering new regulatory regions within the blacklisted region. However, the paper does not seem to address the question of whether the HOT regions overlap, if any, with the ENCODE blacklisted regions afterward. This plays into the central assessment that this manuscript is long-winded.

      The introduction also mentioned that HOT regions correspond to 'genomic regions that seemingly get bound by a large number of TFs with no apparent DNA sequence specificity' (this point of 'no sequence specificity' is reiterated in the discussion lines 485-486). However, later on in the paper, the authors also showed that models such as convolutional neural networks, which take in one-hot-encoded DNA sequence to predict HOT loci, performed really well. It means that the sequence contexts with potential motifs can still play a role in forming the HOT loci. At the same time, lines 59-60 also cited studies that "detected putative drive motifs at the core segments of the HOT loci". The authors should edit the manuscript to clarify (or eradicate) contradictory statements.

      We thank the reviewer for their valuable comments. Below are our responses to each paragraph in the given order:

      We added the following sentence to the introduction, commenting on and summarizing other publications that studied the functional aspects of HOT loci:

      “Other studies have concluded that these regions are highly functionally consequential regions enriched in epigenetic signals of active regulatory elements such as histone modification regions and high chromatin accessibility”.

      We significantly shortened the manuscript by a) moving the detailed analyses of the computational model to the supplemental materials, and b) shortening the discussions by around half, focusing on core analyses that would be most beneficial to the field.

      Given that the ENCODE blacklisted regions are the regions that are recommended by the ENCODE guidelines to be avoided in mapping the ChIP-seq (and other NGS), we excluded them from our analyzed regions before mapping to the genome. Instead, we relied on the conclusions of other publications on HOT loci that the initial assessments of a fraction of HOT loci were the result of factoring in these loci which later were included in blacklisted regions.

      We addressed the potential confusion caused by the expression “no sequence specificity” by a) adding a clarification to the sentence in the introduction, so that it reads “... with no apparent DNA sequence specificity in terms of detectible binding motifs of corresponding motifs”, and b) removing that part from the sentence in the discussion.

      Reviewer #3 (Public Review):

      Summary:

      Hudaiberdiev and Ovcharenko investigate regions within the genome where a high abundance of DNA-associated proteins are located and identify DNA sequence features enriched in these regions, their conservation in evolution, and variation in disease. Using ChIP-seq binding profiles of over 1,000 proteins in three human cell lines (HepG2, K562, and H1) as a data source they're able to identify nearly 44,000 high-occupancy target loci (HOT) that form at promoter and enhancer regions, thus suggesting these HOT loci regulate housekeeping and cell identity genes. Their primary investigative tool is HepG2 cells, but they employ K562 and H1 cells as tools to validate these assertions in other human cell types. Their analyses use RNA pol II signal, super-enhancer, regular-enhancer, and epigenetic marks to support the identification of these regions. The work is notable, in that it identifies a set of proteins that are invariantly associated with high-occupancy enhancers and promoters and argues for the integration of these molecules at different genomic loci. These observations are leveraged by the authors to argue HOT loci as potential sites of transcriptional condensates, a claim that they are well poised to provide information in support of. This work would benefit from refinement and some additional work to support the claims.

      Comments:

      (1) Condensates are thought to be scaffolded by one or more proteins or RNA molecules that are associated together to induce phase separation. The authors can readily provide from their analysis a check of whether HOT loci exist within different condensate compartments (or a marker for them). Generally, ChIP-seq signal from MED1 and Ronin (THAP11) would be anticipated to correspond with transcriptional condensates of different flavors; other coactivator proteins (e.g., BRD4) would be useful to include as well. Similarly, condensate scaffolding proteins of facultative and constitutive heterochromatin (HP1a and EZH2/1) would augment the authors' model by providing further evidence that HOT loci occur at transcriptional condensates and not heterochromatin condensates. Sites of splicing might be informative as well: splicing condensates (or nuclear speckles) are scaffolded by SRRM/SON, which is probably not in their data set, but members of the serine arginine-rich splicing factor family of proteins can serve as a proxy (SRSF2 is the best studied of this set). This would provide a significant improvement to their proposed model and be expected since the authors note that these proteins occur at the enhancers and promoter regions of highly expressed genes.

      (2) It is curious that MAX is found to be highly enriched without its binding partner Myc, is Myc's signal simply lower in abundance, or is it absent from HOT loci? How could it be possible that a pair of proteins, which bind DNA as a heterodimer are found in HOT loci without invoking a condensate model to interpret the results?

      (3) Numerous studies have linked the physical properties of transcription factor proteins to their role in the genome. The authors here provide a limited analysis of the proteins found at different HOT loci by employing GO terms. Is there evidence for specific types of structural motifs, disordered motifs, or related properties of these proteins present in specific loci?

      (4) Condensates themselves possess different emergent properties, but it is a product of the proteins and RNAs that concentrate in them and not a result of any one specific function (condensates can have multiple functions!)

      (5) Transcriptional condensates serve as functional bodies. The notion the authors present in their discussion is not held by practitioners of condensate science, in that condensates exist to perform biochemical functions and are dissolved in response to satisfying that need, not that they serve simply as reservoirs of active molecules. For example, transcriptional condensates form at enhancers or promoters that concentrate factors involved in the activation and expression of that gene and are subsequently dissolved in response to a regulatory signal (in transcription this can be the nascently synthesized RNA itself or other factors). The association reactions driving the formation of active biochemical machinery within condensates are materially changed, as are the kinetics of assembly. It is unnecessary and inaccurate to qualify transcriptional condensates as depots for transcriptional machinery.

      (6) This work has the potential to advance the field forward by providing a detailed perspective on what proteins are located in what regions of the genome. Publication of this information alongside the manuscript would advance the field materially.

      We thank the reviewer for constructive comments and suggestions. Below are our point-by-point responses:

      (1) We added a new short section “Transcriptional condensates as a model for explaining the HOT regions” with additional support for the condensate hypothesis, wherein some of the points raised here were addressed. Specifically, we used a curated database of LLPS proteins (CD-CODE) and provided statistics on the DAPs annotated as condensate-related.

      Regarding the DAPs mentioned in this question, we observed that the distributions of the corresponding ChIP-seq peaks confirm the patterns expected by the reviewer (Author response image 1). Namely:

      - MED1 and Ronin (THAP11) are abundant in the HOT loci, being present in 67% and 64% of HOT loci, respectively.

      - While BRD4 is present in 28% of the HOT loci, we observed that the presence of DAPs with annotated LLPS activity ranged from 3% to 73% of the HOT loci, providing further support for the condensate hypothesis.

      - The ENCODE database does not contain a ChIP-seq dataset for HP1A. EZH2 peaks were absent in the HOT loci (0.4% overlap), suggesting the lack of heterochromatin condensate involvement.

      - Serine-rich splicing factor family proteins were present only in 7.7% of the HOT loci, suggesting the absence or limited overlap with splicing condensates or nuclear speckles.

      Author response image 1.
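The percentages above are fractions of HOT loci that overlap at least one ChIP-seq peak of a given DAP. A minimal sketch of that calculation follows; it is an illustration rather than the authors' pipeline, and it assumes half-open `(chrom, start, end)` intervals, a hypothetical function name, and made-up toy coordinates.

```python
from collections import defaultdict


def fraction_overlapping(loci, peaks):
    """Fraction of loci that overlap at least one peak.

    Both inputs are lists of (chrom, start, end) with half-open coordinates.
    """
    if not loci:
        return 0.0
    # Group peaks by chromosome so each locus is compared only to local peaks.
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))
    hits = 0
    for chrom, start, end in loci:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(p_start < end and start < p_end
               for p_start, p_end in peaks_by_chrom.get(chrom, ())):
            hits += 1
    return hits / len(loci)


# Toy coordinates (hypothetical, not taken from ENCODE).
toy_hot_loci = [("chr1", 100, 500), ("chr1", 1000, 1400),
                ("chr2", 50, 450), ("chr2", 900, 1300)]
toy_peaks = [("chr1", 450, 600), ("chr2", 0, 60), ("chr2", 1300, 1500)]
print(fraction_overlapping(toy_hot_loci, toy_peaks))
```

The linear scan per locus keeps the sketch short; for genome-scale peak sets a sorted index or an interval tree would be the more practical choice.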

      (2) In this study we selected the TF ChIP-seq datasets with stringent quality metrics, excluding those that had audit warnings or errors attached. As a result, the set of DAPs analyzed in HepG2 did not include MYC, since the corresponding ChIP-seq dataset had the audit warning tags of "borderline replicate concordance, insufficient read length, insufficient read depth, extremely low read depth". Analyses in K562 and H1 did include MYC (alongside MAX) ChIP-seq datasets.

      To address this question, we added the mentioned ChIP-seq dataset (ENCODE ID: ENCFF800JFG) and analyzed the colocalization patterns of MYC and MAX. We observed that the MYC ChIP-seq peaks in HepG2 display spurious results, overlapping with only 5% of HOT loci. Meanwhile in K562 and H1, MYC and MAX are jointly present in 54% and 44% of the HOT loci, respectively (Author response image 2).

      Author response image 2.

      These observations were also supported by Jaccard indices between the MYC and MAX ChIP-seq peaks. To do this analysis, we calculated the pairwise Jaccard indices between MYC and MAX and divided them by the average Jaccard index of 2000 randomly selected DAP pairs. In K562 and H1, the Jaccard indices between MYC and MAX are 5.72x and 2.53x greater than the random background, respectively. For HepG2, the ratio was 0.21x, indicating that the HepG2 MYC ChIP-seq dataset is likely erroneous.

      Author response image 3.
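The normalized Jaccard comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each DAP's peaks are represented as a set of occupied genomic bin IDs (the response does not say how overlap was measured), the function names are hypothetical, and the toy data are invented. The default of 2000 random pairs follows the text.

```python
import random


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_enrichment(bins_by_dap, dap_x, dap_y, n_random_pairs=2000, seed=0):
    """Observed Jaccard index of one DAP pair divided by a random-pair mean."""
    rng = random.Random(seed)
    names = sorted(bins_by_dap)
    background = []
    for _ in range(n_random_pairs):
        # Draw two distinct DAPs; the same pair may be drawn more than once.
        first, second = rng.sample(names, 2)
        background.append(jaccard(bins_by_dap[first], bins_by_dap[second]))
    mean_background = sum(background) / len(background)
    observed = jaccard(bins_by_dap[dap_x], bins_by_dap[dap_y])
    return observed / mean_background if mean_background > 0 else float("inf")


# Toy bin sets (hypothetical): MYC and MAX share most of their bins.
toy_bins = {
    "MYC": {1, 2, 3, 4},
    "MAX": {2, 3, 4, 5},
    "DAP_A": {10, 11},
    "DAP_B": {11, 12},
    "DAP_C": {20},
}
print(jaccard_enrichment(toy_bins, "MYC", "MAX"))
```

A ratio well above 1 means the pair colocalizes more than a typical DAP pair, and a ratio below 1 (as reported for HepG2) means it colocalizes less. In this sketch the random draws are not prevented from including the pair of interest itself.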

      (3) Despite numerous publications focusing on different structural domains in transcription factors, we could not find an extensive database or a survey study focusing on annotations of structural motifs in human TFs. Therefore, surveying such a scale would be outside of this study’s scope. We added only the analysis of intrinsically disordered regions, as it pertains to the condensate hypothesis. To emphasize this shortcoming, we added the following sentence to the end of the discussions section.

      “Further, one of the hallmarks of LLPS proteins that have been associated with their abilities to phase-separate is the overrepresentation of certain structural motifs, which we did not pursue due to size limitations.”

      (4, 5) We agree with these comments and thank the reviewer for pointing out this faulty statement. We modified the parts of the discussion related to condensates and removed the passage implying that the condensate model mostly reflects a single function as a TF reservoir.

      (6) We added a table to the supplemental materials (Zenodo repository) with detailed annotation of HOT and non-HOT DAP-bound loci in the genome.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The clause with "inadequate" would be dropped if the authors sufficiently address reviewer concerns about clarity of writing, including:

      (1) Editing the title to better reflect the findings of the paper.

      (2) Making clear that the condensate model is speculative and not explicitly tested in this study (and may be better described as a hypothesis).

      (3) Resolving apparent contradictions regarding DNA sequence specificity and the interpretation of ChIP-seq signal intensity.

      (4) Better specifying and justifying model parameters, thresholds, and assumptions.

      (5) Shortening the manuscript to emphasize the main, well-supported claims and to enhance readability (especially the discussion section).

      We thank the Editor for their work. We followed their advice and implemented changes and additions to address all 5 points.

      Reviewer #1 (Recommendations For The Authors):

      (1) The title "Sequence characteristics and an accurate model of abundant hyperactive loci in the human genome" does not accurately reflect the findings of the paper. We are unclear as to what the 'accurate model' refers to. Is it the proposed model 'based on the existence of large transcriptional condensates' (abstract)? If so, there are concerns below regarding this statement (see comment 2). If the authors are referring to the computational modeling presented in Figure 5, it is unclear that any one of them performed that much better than the others and the best single model was not identified. Furthermore, the models being developed in the study constitute only a portion of the paper and lacked validation through additional datasets. Additionally, sequence characteristics were not a primary focus of the study. Only figure 5 talks about the model and sequence characteristics, the rest of the figures are left out of the equation.

      We agree with and thank the reviewer for this idea of clarifying the intended meaning.

      (1) We changed the title and clarified that the computational model is meant:

      “Functional characteristics and a computational model of abundant hyperactive loci in the human genome”.

      (2) Shortened the part of the manuscript discussing the computational models and pointed out the CNNs as “the best single model”.

      (2) The abstract and discussion (and perhaps the title) propose a model of transcriptional condensates in relation to HOT loci. However, there is no data provided in the manuscript that relates to condensates. Therefore, anything relating to condensates is primarily speculative. This distinction needs to be properly made, especially in the abstract (and cannot be included in the title). Otherwise, these statements are misleading. Although the field of transcriptional condensates is relatively new, there have been several factors studied. The authors could include in Figure 2d which factors have been shown to form transcriptional condensates. This might provide some support for the model, though it would still largely remain speculative unless further testing is done.

      We added a new short chapter, “Transcriptional condensates as a model for explaining the HOT regions”, with additional analyses testing the condensate hypothesis. We provided supportive evidence by analyzing metrics used as hallmarks of condensates, including the distributions of annotated condensate-related proteins, nascent transcription, and protein-RNA interaction levels in HOT loci. Still, we acknowledge that this is a speculative hypothesis, and we clarified that with the following statement in the discussion:

      “It is important to note here that our proposed condensate model is a speculative hypothesis. Further experimental studies in the field are needed to confirm or reject it.”

      (3) Several apparent contradictions exist throughout the manuscript. For example, "HOT locus formation are likely encoded in their DNA sequences" (lines 329-330) vs the proposed model of formation through condensates (abstract). These two statements do not seem compatible, or at the very least, the authors can explain how they are consistent with each other. Another example: "ChIP-seq signal intensity as a proxy for... binding affinity" (line 229) vs. "ChIP-seq signal intensities do not seem to be a function of the DNA-binding properties of the DAPs" (lines 259-260). The first statement is the assumption for subsequent analyses, which has its own concerns (see comment 4). But the conclusion from that analysis seems to contradict the assumption, at least as it is stated.

      In this study, we argue that the two statements may not necessarily contradict each other. We aimed to a) demonstrate that the observed intensity of DAP-DNA interactions as measured by ChIP-seq experiments at HOT loci cannot be explained with direct DNA-binding events of the DAPs alone and b) propose a hypothesis that this observation can be at least partially explained if the HOT loci have the propensity to either facilitate or take part in the formation of transcriptional condensates.

      One of the conditions for condensates to form at enhancers was shown to be the presence of strong binding sites of key TFs (Shrinivas et al. 2019 “Enhancer features that drive the formation of transcriptional condensates”), in a study conducted using only one TF (OCT4) and one coactivator (MED1). To the best of our knowledge, no such study has been conducted involving many TFs and cofactors simultaneously. We also know that the factors that lead to liquid-liquid phase separation include weak multivalent IDR-IDR, IDR-DNA, and IDR-RNA interactions. As a result, the observed total sum of ChIP-seq peaks in HOT loci reflects the direct DNA-binding events combined with the indirect DAP-DNA interactions, some of which may be facilitated by condensates. Moreover, the fact that CNNs can recognize the HOT loci with high accuracy suggests that there must be an underlying motif grammar specific to HOT loci.

      We emphasized this conclusion in the discussions.

      The comment on using the ChIP-seq signal as a proxy for DNA-binding affinity is addressed under comment 4.

      (4) In lines 229-230, the authors used "the ChIP-seq signal intensity as a proxy for the DAP binding affinity." What is the basis for this assumption? If there is a study that can be referenced, it should be added. However, ChIP-seq signal intensity is generally regarded as a combination of abundance, frequency, or percentage of cells with binding. RNA Pol2 is a good example of this as it has no specific binding affinity but the peak heights indicate level of expression. Therefore, the analyses and conclusions in Figure 4, particularly panel A, are problematic. In addition, clarification from lines 258-260 is needed as it contradicts the earlier premise of the section (see comment 3).

      We thank the reviewer for pointing out this error. The main conclusion of the paragraph is that the average ChIP-seq signal values at HOT loci do not correlate well with the sequence-specificity of TFs. We reworded the paragraph stating that we are analyzing the patterns of ChIP-seq signals across the HOT loci, removing the part that we use them as a proxy for sequence-specific binding affinity.

      (5) In Figure 1A, the authors show that "the distribution of the number of loci is not multimodal, but rather follows a uniform spectrum, and thus, this definition of HOT loci is ad-hoc" (lines 92-95). The threshold to determine how a locus is considered to be HOT is unclear. How did the authors decide to use the current threshold given the uniform spectrum observed? How does this method of calling HOT loci compare to previous studies? How much overlap is there in the HOT loci in this study versus previous ones?

      We moved the corresponding explanation from the supplemental methods to the main methods section of the manuscript.

      Briefly, our reasoning was as follows: assuming that an average TFBS is 8bp long and given that we analyze the loci of length 400bp, we can set the theoretical maximum number of simultaneous binding events to be 50. Hence, if there are >50 TF ChIP-seq peaks in a given 400bp locus, it is highly unlikely that the majority of ChIP-seq peaks can be explained by direct TF-DNA interactions. The condition of >50 TFs corresponded to the last four bins of our binning scale, which was used as an operational definition for HOT loci.
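
      The arithmetic behind this operational definition can be written out as a short sketch. The constants come from the paragraph above; the function names are illustrative and not taken from the authors' codebase.

```python
LOCUS_LENGTH_BP = 400        # locus length analyzed in the study
AVERAGE_TFBS_LENGTH_BP = 8   # assumed average binding-site length


def max_simultaneous_binding(locus_len=LOCUS_LENGTH_BP, tfbs_len=AVERAGE_TFBS_LENGTH_BP):
    """Upper bound on non-overlapping binding sites that fit in one locus."""
    return locus_len // tfbs_len


def is_hot(n_bound_daps, threshold=None):
    """A locus is called HOT when it carries more peaks than direct binding could explain."""
    if threshold is None:
        threshold = max_simultaneous_binding()
    return n_bound_daps > threshold
```

      With the stated values the bound is 400 / 8 = 50, so a locus with 51 or more overlapping TF ChIP-seq peaks is classified as HOT under this definition.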

      We have compared our definition of HOT loci to those reported in previous studies by Ramaker et al. and Boyle et al. The results of our analyses are in lines 147-154.

      (6) In Figure 3B, the authors state that of "the loop anchor regions with >3 overlapping loops, 51% contained at least one HOT locus, suggesting an interplay between chromatin loops and HOT loci." However, it is unclear how "51%" is calculated from the figure. Similarly, in the following sentence, "94% of HOT loci are located in regions with at least one chromatin interaction". It is unclear as to how the number was obtained based on the referenced figure.

      Initially, the x-axis labels in Figure 3B were missing, making it hard to understand what we meant. We added the x-axis numbers and changed “51%” to “more than half”. We meant that, of the loci with 4 and 5 overlapping loops, exactly 50% contain at least one HOT locus. However, since for x=6 the percentage is 100% (there is only one such locus), the overall percentage is technically “more than half”.

      The percentage of HOT loci engaging in chromatin interaction regions (91%) was calculated by simply overlapping the HOT regions with Hi-C long-range contact anchors. The details of extracting these regions using FitHiChIP are described in Supplemental Methods 1.3.

      (7) While we have a limited basis to evaluate computational models, we would like to see a clearer explanation of the model set-up in terms of the number of trained vs. test datasets. In addition, it would be interesting to see if the models can be applied to data from different cell lines.

      We added the table with the sizes of the datasets used for classification in Supplemental Methods 1.6.1.

      Evaluating the models trained on the HOT loci of HepG2 and K562 on other cell lines would pose challenges, since the number of available ENCODE TF ChIP-seq datasets for those cell lines is significantly smaller than for HepG2 and K562. Therefore, we conducted the proposed analysis between the studied cell lines. Specifically, we used the CNN models trained on HOT and regular enhancers of HepG2 and K562. Then, we evaluated each model on the test sets of each classification experiment (Author response image 4). We observed that the classification results of the HOT loci demonstrated a higher level of tissue-specificity compared to the same classification results of the regular enhancers.

      Author response image 4.

      (8) Lines 349-351. The significance of highly expressed genes being more prone to having multiple HOT loci, and vice versa, appears conventional and remains unclear. Intuitively, it makes sense for higher expressed genes to have more of the transcriptional machinery bound, and would bias the analysis. One way to circumvent this is to only analyze sequence-specific TFs and remove ones that are directly related to transcription machinery.

      We thank the reviewer for this suggestion. Our attempt to re-annotate the HOT loci with only sequence-specific TFs led to a significantly different set of loci, which would not be strictly comparable to the HOT loci defined by this study. Analyzing these new sets of loci would create a noticeable departure from the flow of the manuscript and further extend the already long scope of the study.

      Moreover, numerous studies have shown that super-enhancers recruit large numbers of TFs via transcriptional condensates (Boija et al., 2018; Cho et al., 2018; Sabari et al., 2018). We hope that our results can serve as data-driven supportive evidence for those studies.

      (9) Lines 393-396. We would like to see a reference to the models shown in the figures, if these models have been published previously.

      We could not understand the question. Lines 393-396 contain the following sentence:

      “However, many of the features of the loci that we’ve analyzed so far demonstrated similar patterns (GC contents, target gene expressions, ChIP-seq signal values etc.) when compared to the DAP-bound loci in HepG2 and K562, suggesting that albeit limited, the distribution of the DAPs in H1 likely reflects the true distribution of HOT loci.”

      In case the question was about the models that we trained to classify the HOT loci, we have deposited the models and codebase in the Zenodo and GitHub repositories.

      (10) Values in Figure 7D are not reflected in the text. Specifically, the text states "Average ... phastCons of the developmental HOT loci are 1.3x higher than K562 and HepG2 HOT loci (Figure 7D)" (lines 408-409). Figure 7D shows conservation scores between HOT enhancers vs promoters for each cell line, and does not seem to reflect the text.

      We modified the figure to reflect the statement appropriately.

      (11) Methodology should include a justification for the use of the Mann-Whitney U-test (non-parametric) over other statistical tests.

      We added the following description to the methods section:

      “For calculating the statistical significance, we used the non-parametric Mann-Whitney U-test when the compared data points are non-linearly correlated and multi-modal. When the data distributions are bell-curve shaped, the Student’s t-test was used.”
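
      To illustrate why the Mann-Whitney U-test suits non-normal data, the sketch below computes the U statistic directly from pairwise comparisons. It is a hypothetical illustration of the statistic only, with no p-value; in practice a library routine such as `scipy.stats.mannwhitneyu` would be used.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) from pairwise comparisons; each tie contributes 0.5."""
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    # The two statistics always sum to the number of pairs.
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

      Because only the ordering of values enters the statistic, extreme values and multi-modal shapes do not distort it the way they can distort a comparison of means.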

      Minor:

      (1) Figure 2b was never mentioned in the paper. This can be added alongside Figure S6C, line 148.

      Indeed, Figure 2B was supposed to be listed together with Figure S6C, which was omitted by mistake. It was corrected.

      (2) Supplementary Figure 8 has two Cs. Needs to be corrected to D.

      Fixed.

      (3) Figure 3B is missing labels on the x-axis.

      Fixed.

      (4) The horizontal bar graph on the bottom left of Figure 1E needs to be described in the figure legend.

      Description added to the figure caption.

      (5) Line 345, Fig 15A should be Fig S15A.

      Corrected.

      Reviewer #2 (Recommendations For The Authors):

      I listed all my concerns about the paper in the public comments. I think the manuscript is very comprehensive and it is valuable, but it should be cut short and presented in a more digestible way.

      We thank the reviewer for their valuable comments and suggestions. We addressed all the concerns listed in the public comments. We shortened the manuscript by reducing the paragraph that focuses on computational classification models and reduced the discussions by about half in length.

      Line 55: What are chromatin-associated proteins, i.e. are they histone modifications?

      To clarify the definition used from the citation we changed the sentence to the following:

      “For instance, Partridge et al. studied the HOT loci in the context of 208 proteins including TFs, cofactors, and chromatin regulators which they called chromatin-associated proteins.”

      Though most of the paper can be cut short to avoid analysis paralysis for readers, there are details that still need filling in. For example, how did the authors perform PCA analysis, i.e. what are the features of each data point in the PCA analysis? Lines 214-215: How do we calculate the number of multi-way contacts in Hi-C data?

      We added clarifying descriptions and changed the mentioned sentences to the following:

      PCA:

      “To analyze the signatures of unique DAPs in HOT loci, we performed a PCA analysis where each HOT locus is represented by a binary (presence/absence) vector of length equal to the total number of DAPs analyzed.”
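
      The feature encoding described in this sentence can be sketched as below. The example covers only the construction of the binary presence/absence matrix that would be passed to a PCA routine; the locus names and the four-DAP list are hypothetical.

```python
def presence_absence_matrix(locus_to_daps, all_daps):
    """Build one binary row per locus: 1 if the DAP has a peak in the locus, else 0."""
    column = {dap: i for i, dap in enumerate(all_daps)}
    matrix = []
    for bound_daps in locus_to_daps.values():
        row = [0] * len(all_daps)
        for dap in bound_daps:
            row[column[dap]] = 1
        matrix.append(row)
    return matrix


# Toy input (hypothetical loci and DAP assignments).
all_daps = ["CTCF", "MAX", "MYC", "POLR2A"]
loci = {
    "locus_1": {"MAX", "MYC"},
    "locus_2": {"CTCF"},
    "locus_3": {"CTCF", "MAX", "MYC", "POLR2A"},
}
matrix = presence_absence_matrix(loci, all_daps)
```

      Each row has length equal to the total number of DAPs analyzed, matching the description above, and rows follow the insertion order of the input dictionary.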

      Multi-way contacts on loop anchors:

      “To investigate further, we analyzed the loop anchor regions harboring HOT loci and observed that the number of multi-way contacts on loop anchors (i.e. loci which serve as anchors to multiple loops) correlates with the number of bound DAPs (rho=0.84 p-value<10E-4; Pearson correlation).”

      - Lines 251-252: How did the referenced study categorize DAPs? It is important for any manuscript to be self-contained.

      We added the explanation and changed the sentence to the following:

      “To test this hypothesis, we classified the DAPs into those two categories using the definitions provided in the study (Lambert et al. 2018) 28, where the TFs are classified by manual curation through extensive literature review and supported by annotations such as the presence of DNA-binding domains and validated binding motifs. Based on this classification, we categorized the ChIP-seq signal values into these two groups.”

      - Lines 181-185, sentences starting with 'To test' can be moved to the methods, leaving only brief mentions of the statistic tests if needed.

      We moved the mentioned sentences from the main text to the supplemental methods (1.4).

      - Lines 217-220: I find this sentence extremely redundant unless it can offer more specific insights about a particular set of DAPs or if the DAPs are closer/or a proven distal enhancer to a confirmed causal gene.

      We removed the mentioned sentence from the text.

      - Lines 243-246: How did the authors determine the set DAPs that have stabilizing effects, and how exactly are the 'stabilizing effects' observed/measured?

      We added explanations to Supplemental Methods 3.1 and Fig S18, S19.

      While addressing this comment, we realized that the correct value of the ratio is 1.91x, not the previously reported 1.7x. We corrected that value in the main text and added the p-value.

      - When discussing the phastCons scores analyses, such as in lines 268-271, how did the authors calculate the relationship between phastCons scores and HOT loci, i.e. was the score averaged across the 400-bp locus to obtain a locus-specific conservation score?

      Yes, the per-locus conservation score is the average of the phastCons scores over the base pairs of each locus. We added this clarification to the methods.
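
      A minimal sketch of this averaging, assuming the per-base phastCons scores of a locus are already available as a list of numbers (the function name and toy values are hypothetical):

```python
def locus_conservation(per_base_scores):
    """Mean of per-base phastCons scores across one locus."""
    if not per_base_scores:
        raise ValueError("locus has no per-base scores")
    return sum(per_base_scores) / len(per_base_scores)


# Toy 4-bp example; a real locus in the study spans 400 bp.
toy_scores = [0.0, 0.5, 1.0, 0.5]
toy_mean = locus_conservation(toy_scores)
```

      How bases without a phastCons score are handled is not stated in the response, so this sketch simply assumes every base has a value.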

      - Line 311: What is the role of the 'control sets' in the analyses of the sequence's relationship with HOT?

      In this specific case, the control sets are used as background or negative sets to set up the classification tasks. In other words, we are asking whether the HOT loci can be distinguished from random chromatin-accessible regions, promoters, or regular enhancers. We clarified this in the text.

      - I also find the discussion about different machine learning methods that classify HOT loci based on sequence contexts quite redundant UNLESS the authors decide to go further into the features' importance (such as motifs) in the models that predict/ are associated with HOT loci, which in itself can constitute another study.

      We agree with the reviewer, and shortened the part with the discussions of models by limiting it to only 3 main models and moved the rest to the supplemental materials.

      - Can the authors clarify where they obtain data on super-enhancers?

      We obtained the super-enhancer definitions from the original study (Hnisz et al. 2013, PMID: 24119843) where the super-enhancers were defined for multiple cell lines. We clarified this in the methods.

      - Figure 1B, the x and y axis should be clarified.

      We clarified it by using MAX as an example case in the figure caption as follows:

      “Prevalence of DAPs in HOT loci. Each dot represents a DAP. X-axis: percentage of HOT loci in which DAP is present (e.g. MAX is present in 80% of HOT loci). Y-axis: percentage of total peaks of DAPs that are located in HOT loci (e.g. 45% of all the ChIP-seq peaks of MAX is located in the HOT loci). Dot color and size are proportional to the total number of ChIP-seq peaks of DAP.”

      Reviewer #3 (Recommendations For The Authors):

      The list of proteins associated with different types of genomic loci at a meta level (enhancers, promoters, and gene body etc.), and an annotation of the genome at the specific loci level.

      The authors use a wide range of acronyms throughout the text and figure legends, they do a reasonably good job, but the main text section "HOT-loci are enriched in causal variants" and Figure 8 would be materially improved if they held it to the same standard.

      Size is a physical property and not a physicochemical property.

      We thank the reviewer for their comments and suggestions. We added a table to supplemental files with detailed annotations of analyzed loci.

      We reviewed the section “HOT loci are enriched in causal variants” and corrected a few mismatches in the acronyms.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and colleagues identify biallelic variants of DNAH3 in four unrelated Han Chinese infertile men through whole-exome sequencing, which contribute to abnormal sperm flagellar morphology and ultrastructure. To investigate the importance of DNAH3 in male infertility, the authors generated crispant Dnah3 knockout (KO) male mice. They observed that KO mice are also infertile, showing a severe reduction in sperm movement with abnormal IDA (inner dynein arms) and mitochondrion structure. Moreover, nonfunctional DNAH3 expression decreased the expression of IDA-associated proteins in the spermatozoa of patients and KO mice, which are involved in the disruption of sperm motility. Interestingly, the infertility of patients and KO mice is rescued by intracytoplasmic sperm injection (ICSI). Taken together, the authors propose that DNAH3 is a novel pathogenic gene for asthenoteratozoospermia and male infertility.

      Strengths:

      This work investigates the role of DNAH3 in sperm mobility and male infertility. By using gold-standard molecular biology techniques, the authors demonstrate with exquisite resolution the importance of DNAH3 in sperm morphology, showing strong evidence of its role in male infertility. Overall, this is a very interesting, well-written, and appealing article. All aspects of the study design and methods are well described and appropriate to address the main question of the manuscript. The conclusions drawn are consistent with the analyses conducted and supported by the data.

      Weaknesses:

      The paper is solid, and in its current form, I have not detected relevant weaknesses.

      We thank the reviewer very much for the comments.

      Reviewer #2 (Public Review):

      Wang et al. investigated the role of dynein axonemal heavy chain 3 (DNAH3) in male infertility. They found that variants of DNAH3 were present in four infertile men, and the deficiency of DNAH3 in sperm affects sperm mobility. Additionally, they showed that Dnah3 knockout male mice are infertile. Furthermore, they demonstrated that DNAH3 influences inner dynein arms by regulating several DNAH proteins. Importantly, they showed that intracytoplasmic sperm injection (ICSI) can rescue the infertility in Dnah3 knockout mice and two patients with DNAH3 variants.

      Strengths:

      The conclusions of this paper are well-supported by data.

      Weaknesses:

      The sample/patient size is small; however, the findings are consistent with those of a recent study on DNAH3 in male infertility involving 432 patients.

      We extend our sincere gratitude to the expert reviewers for their valuable comments and insightful suggestions.

      A cohort of 587 unrelated infertile men with asthenoteratozoospermia was recruited to investigate the potential genetic etiology using WES. In addition to the mutations in DNAH3 identified in four patients, mutations in several other genes previously reported by our group, including CFAP65 (Zhang et al., 2019. PMID: 31571197), DNAH8 (Yang et al., 2020. PMID: 32681648), DNAH12 (Li et al., 2022. PMID: 34791246), FSIP2 (Zheng et al., 2023. PMID: 35654582), CEP128 (Zhang et al., 2022. PMID: 35296684), CEP78 (Zhang et al., 2022. PMID: 36206347), CT55 (Zhang et al., 2023. PMID: 36481789), SPATA20 (Wang et al., 2023. PMID: 36415156), TENT5D (Zhang et al., 2024. PMID: 38228861), CFAP52 (Jin et al., 2023. PMID: 38126872), CEP70 (Ruan et al., 2023. PMID: 36967801), PRSS55 (Liu et al., 2022. PMID: 35821214), as well as other unreported variants, were also identified.

      Reviewer #3 (Public Review):

      Summary:

      (1) To further explore the genetic basis of asthenoteratozoospermia, the authors performed whole-exome sequencing analyses among infertile males affected by asthenoteratozoospermia. Four unrelated Han Chinese patients were found to carry biallelic variations of DNAH3, a gene encoding IDA-associated protein.

      (2) To verify the function of IDA associated protein DNAH3, the authors generated a Dnah3-KO mouse model and revealed that the loss of DNAH3 leads to severe male infertility as a result of the severe reduction in sperm movement with the abnormal IDA and mitochondrion structures.

      (3) Mechanically, they confirmed decreased expression of IDA-associated proteins (including DNAH1, DNAH6 and DNALI1) in the spermatozoa from patients with DNAH3 mutations and Dnah3-KO male mice.

      (4) Then, they also found that male infertility caused by DNAH3 deficiency could be rescued by intracytoplasmic sperm injection (ICSI) treatment in humans and mice.

      Strengths:

      (1) In addition to existing research, the authors provided novel variants of DNAH3 as important factors leading to asthenoteratozoospermia. This further expands the spectrum of pathogenic variants in asthenoteratozoospermia.

      (2) By mechanistic studies, they found that DNAH3 deficiency led to decreased expression of IDA-associated proteins, which may be used to explain the disruption of sperm motility and reduced fertility caused by DNAH3 deficiency.

      (3) Then, successful ICSI outcomes were observed in patients with DNAH3 mutations and Dnah3 KO mice, which will provide an important reference for genetic counselling and clinical treatment of male infertility.

      We are very grateful for the reviewer's careful comments.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      I have carefully read the revised versions of this manuscript, and I would like to thank the authors for addressing all my previous concerns.

      I have no additional comments or suggestions.

      We thank the reviewer for reviewing our revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      (1) Statistical analyses should be provided alongside the quantification (Fig S1B, S7C).

      According to the suggestions of the reviewer, we have added statistical analyses of the corresponding quantification in the legends of Figure S1 and Figure S7.

      (2) The numbers of sperms counted in Fig S1A should be listed.

      In response to the reviewer's valuable suggestion, we have listed the corresponding ratios of the different morphological defects in the sperm tails of the patients in Figure S1A.

      (3) Due to the high similarities in experimental design, data and conclusions between the current study and previously published work by Meng et al. (2024), as well as the very similar titles of the two studies, it is crucial to emphasize the differences in the Discussion section.

      Many thanks for the reviewer's kind suggestions for our revised manuscript.

      Employing whole-exome sequencing (WES) on infertile men to identify candidate variants, followed by in-silico and functional analysis of these variants, and generating mouse models using CRISPR-Cas9 technology, has proven to be an efficient and widely used approach for uncovering the causative genes of male infertility associated with sperm defects. Both our study and the recent work by Meng et al. utilized this approach to verify whether DNAH3 mutations are a cause of asthenoteratozoospermia. Additionally, we have also updated the title of our study to: 'DNAH3 deficiency causes flagellar inner dynein arm loss and male infertility in humans and mice'.

      Meng et al. reported DNAH3 mutations in asthenoteratozoospermia-affected patients, revealing multiple morphological defects in the sperm tail. Moreover, ultrastructural abnormalities of the flagellar axoneme were evident in these patients, characterized by a disrupted '9+2' arrangement and the notable absence of IDAs. Additionally, they generated Dnah3 KO mice, which were infertile and exhibited moderate morphological abnormalities. While the '9+2' microtubule arrangement in the flagella of their Dnah3 KO mice remained intact, the IDAs on the microtubules were partially absent. In our study, we observed similar phenotypic differences between DNAH3-deficient patients and Dnah3 KO mice. Both studies suggest that DNAH3 plays a crucial role in human and mouse male reproduction.

      However, there are notable differences between the two studies. Firstly, the phenotypes of Dnah3 KO mice showed slight differences. Meng et al. generated two Dnah3 KO mouse models (KO1 and KO2), both of which exhibited significantly higher sperm motility and progressive motility than the mice in our study, where nearly all sperm were completely immobile. Furthermore, their Dnah3 KO2 mice even displayed motility comparable to WT mice and retained partial fertility. We speculate that these differences may be attributed to variations in mouse genetic background or the presence of a truncated DNAH3 protein resulting from specific knockout strategies. Secondly, we conducted additional research and uncovered novel findings. We revealed that male infertility caused by DNAH3 mutations follows an autosomal recessive inheritance pattern, as confirmed through Sanger sequencing of the patients' parents. We also discovered the dynamic expression and localization of DNAH3 during spermatogenesis in humans and mice through immunofluorescent staining. We further found that DNAH3 deficiency had no impact on ciliary development in the oviduct or on oogenesis in mice, resulting in normal female fertility. Moreover, in the absence of DNAH3 in both humans and mice, the expression of IDA-associated proteins, including DNAH1, DNAH6 and DNALI1, was decreased, while the expression of ODA-associated proteins remained unaffected, indicating that DNAH3 is involved in sperm axonemal development, specifically through its role in the assembly of IDAs. Collectively, our study corroborates the findings of Meng et al., and provides additional unique insights, comprehensively elucidating the critical role of DNAH3 in human and mouse spermatogenesis.

      We have added these discussions in lines 275-306.

      Reviewer #3 (Recommendations for The Authors):

      I have no more recommendations for the authors.

      We thank the reviewer for reviewing our revised manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors showed that autophagy-related genes are involved in plant immunity by regulating the protein level of the salicylic acid receptor, NPR1. The experiments are carefully designed and the data is convincing. The authors did a good job of understanding the relationship between ATG6 and NPR1.

      The authors have addressed most of my previous concerns.

      Thank you so much for acknowledging our research. It is incredibly rewarding to see our work recognized. We hope that our findings will inspire new perspectives and foster further exploration in this area.

      Reviewer #2 (Public Review):

      The manuscript by Zhang et al. explores the effect of autophagy regulator ATG6 on NPR1-mediated immunity. The authors propose that ATG6 directly interacts with NPR1 in the nucleus to increase its stability and promote NPR1-dependent immune gene expression and pathogen resistance. This novel role of ATG6 is proposed to be independent of its role in autophagy in the cytoplasm. The authors demonstrate through biochemical analysis that ATG6 interacts with NPR1 in yeast and very weakly in vitro. They further demonstrate using overexpression transgenic plants that in the presence of ATG6-mcherry the stability of NPR1-GFP and its nuclear pool is increased.

      Comments on revised version:

      The authors demonstrate the correlation between overexpression of atg6 and higher stability and activity of npr1. They claim a novel activity of atg6 in the nucleus.

      Overall, the experimental scope of the study is solid, however, the over-interpretation of the results substantially reduces the significance and value of this study for the target plant immunity readership.

      Thank you very much for your constructive and insightful comments, as well as for acknowledging the experimental scope of this study. In addition, we have made every effort to address the over-interpretation of the results, as per your comments, ensuring they are more accurate and concise. In the revised version, the modified content has been highlighted in blue to clearly indicate the changes made.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors have addressed most of my concerns. I have no further comments.

      Thank you so much for acknowledging our research. It is incredibly rewarding to see our work recognized. We hope that our findings will inspire new perspectives and foster further exploration in this area.

      Reviewer #2 (Recommendations For The Authors):

      As I previously commented, in fig. 2a and c, the discrepancy between levels of atg6-mcherry in microscope image vs WB has to be explained. The explanation provided by the authors is incomplete and may mislead. The most likely reason for the difference is that the fluorescence signal in fig. 2a is predominantly from free mCherry, rather than the atg6-mcherry fusion. This has to be included in the main text to avoid misleading the reader.

      Thank you very much for your constructive and insightful comments. In response, we have incorporated the necessary explanations into the revised manuscript (lines 160-164).

      In fig. 1B, the PD fraction has to show the size range of free GST. Also, please use "anti" to indicate that these are immunoblots.

      Thank you for pointing this out. In the revised manuscript, we identified the range of free GST and used "anti:GST and anti:His" to indicate that these are immunoblots.

      In fig 1C, the WB has to show the free GFP band in the input and IP fractions together with NPR1, rather than in separate blots.

      Thank you for bringing this to our attention. Fig. 1c has been replaced, and the updated image now shows the free GFP band in the input and IP fractions together with NPR1-GFP.

      In fig. 1d, the bifc signal has to be quantified from multiple images across the biological repeats. Also, there's no significance in showing the chlorophyll autofluorescence. What is the purpose of this? They need to use a nuclear marker instead.

      Thank you for your suggestion. Based on your input, we utilized ImageJ software to quantify the YFP fluorescence signal. A total of n = 15 independent images were analyzed, and the corresponding results have been added to Figure 1e. Monitoring chlorophyll autofluorescence serves as a useful background signal, aiding in the distinction between the fluorescence signal of the target protein and background noise. This approach helps reduce potential signal overlap or interference during the experiment, thereby enhancing the reliability of the results.

      Please provide a sequence alignment with multiple ATGs to show the conservation of the presumed bipartite NLS. This information has to be included in the main data.

      Thank you very much for your constructive and insightful comments. We analyzed the putative nuclear localization signal (NLS) in the ATG6 protein sequence using the online INSP (Identification of Nuclear Signal Peptide) prediction software (http://www.csbio.sjtu.edu.cn/bioinf/INSP/). The prediction results indicated the presence of a potential nuclear localization sequence "FLKEKKKKK" within the ATG6 protein, spanning from the 217th to the 223rd amino acid. Additionally, we utilized INSP to investigate the nuclear localization sequences of various ATG proteins (TaATG6a [1], TaATG6b [1], TaATG6c [1], SlATG8h [2]) that have been previously reported to localize in the nucleus. This analysis revealed a relatively conserved NLS sequence motif, "E/K-K/E-K-K-L/K-K", in these ATG proteins. In line with your suggestion, the results of this sequence comparison have been incorporated into the revised manuscript as Figure 2c. The revised manuscript includes a description of the corresponding results (lines 146-156).
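
To make the consensus motif quoted above concrete, the following is a minimal sketch of how "E/K-K/E-K-K-L/K-K" could be written as a regular expression and scanned against a protein fragment. It is not the INSP predictor, and the function name `find_nls_candidates` is a hypothetical helper introduced only for illustration. The only sequence used is the fragment quoted in the response, so the reported positions are relative to that fragment, not to the full ATG6 protein.

```python
import re

# Consensus quoted in the response: E/K - K/E - K - K - L/K - K.
# A lookahead is used so that overlapping hits are all reported.
NLS_MOTIF = re.compile(r"(?=([EK][KE]KK[LK]K))")


def find_nls_candidates(sequence):
    """Return (1-based start, matched residues) for every motif hit.

    Hypothetical helper: positions are relative to the string passed in.
    """
    return [(m.start() + 1, m.group(1))
            for m in NLS_MOTIF.finditer(sequence.upper())]


# Fragment quoted in the response as the predicted NLS region.
hits = find_nls_candidates("FLKEKKKKK")
for start, residues in hits:
    print(start, residues)
```

A plain pattern match like this only checks whether a sequence is consistent with the consensus; it says nothing about whether the motif is functional.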

      Fig. 3d and f, how many blots are used for this quantification? Please include all the individual analyzed blots in the supplementary data. In addition, if you present such quantification with error bars, then statistical analysis is required.

      Thank you for pointing this out. In Figure 3d, three independent blots were utilized for this quantification. In Figure 3f, two independent blots were used. The individual analyzed blots have been included in the supplementary Figure 7. We also conducted a statistical analysis as shown in Fig 3d and f, with a detailed description included in the legend section (lines 858 and 861).

      In fig. 4, please indicate what is the normalizing gene. Also, what are the error bars?

      Thank you for pointing this out. In Fig. 4, values are means ± SD (n = 3 biological replicates). The AtActin gene was used as the internal control. We have included a detailed description in the figure notes.

      In fig. 4b the labeling is missing.

      Thank you for bringing this to our attention. We have included the labeling for Fig. 4 in the revised manuscript.

      Lines 236-239: this statement contradicts the data in fig. 5b: the levels of NPR1-GFP are actually reduced in the presence of atg6 at 24h. So, this result has to be described more accurately by stating that the increase is transient, and it is evident more at 8h, but not at 20-24h.

      Thank you very much for your constructive and insightful comments. We have revised the description of this section to provide a more accurate account of the results (lines 253-258).

      Reference

      (1) Yue J, Sun H, Zhang W, et al. Wheat homologs of yeast ATG6 function in autophagy and are implicated in powdery mildew immunity. BMC Plant Biol. 2015;15:95.

      (2) Li F, Zhang M, Zhang C, et al. Nuclear autophagy degrades a geminivirus nuclear protein to restrict viral infection in solanaceous plants. New Phytol. 2020;225:1746-1761.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Freas et al. investigated if the exceedingly dim polarization pattern produced by the moon can be used by animals to guide a genuine navigational task. The sun and moon have long been celestial beacons for directional information, but they can be obscured by clouds, canopy, or the horizon. However, even when hidden from view, these celestial bodies provide directional information through the polarized light patterns in the sky. While the sun's polarization pattern is famously used by many animals for compass orientation, until now it has never been shown that the extremely dim polarization pattern of the moon can be used for navigation. To test this, Freas et al. studied nocturnal bull ants by placing a linear polarizer over the homing path of freely navigating ants, rotated 45 degrees relative to the moon's natural polarization pattern. They recorded the homing direction of an ant before entering the polarizer, under the polarizer, and again after leaving the area covered by the polarizer. The results very clearly show that ants walking under the linear polarizer change their homing direction by about 45 degrees in comparison to the homing direction under the natural polarization pattern and change it back after leaving the area covered by the polarizer. These results can be repeated throughout the lunar month, showing that bull ants can use the moon's polarization pattern even under crescent moon conditions. Finally, the authors show that the degree to which the ants change their homing direction depends on the length of their home vector, just as it does for the solar polarization pattern.

      The behavioral experiments are very well designed, and the statistical analyses are appropriate for the data presented. The authors' conclusions are nicely supported by the data and clearly show that nocturnal bull ants use the dim polarization pattern of the moon for homing, in the same way many animals use the sun's polarization pattern during the day. This is the first proof of the use of the lunar polarization pattern in any animal.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to understand whether polarised moonlight could be used as a directional cue for nocturnal animals homing at night, particularly at times of night when polarised light is not available from the sun. To do this, the authors used nocturnal ants, and previously established methods, to show that the walking paths of ants can be altered predictably when the angle of polarised moonlight illuminating them from above is turned by a known angle (here +/- 45 degrees).

      Strengths: 

      The behavioural data are very clear and unambiguous. The results clearly show that when the angle of downwelling polarised moonlight is turned, ants turn in the same direction. The data also clearly show that this result is maintained even for different phases (and intensities) of the moon, although during the waning cycle of the moon the ants' turn is considerably less than may be expected.

      Weaknesses: 

      The final section of the results - concerning the weighting of polarised light cues into the path integrator - lacks clarity and should be reworked and expanded in both the Methods and the Results (also possibly with an extra methods figure). I was really unsure of what these experiments were trying to show or what the meaning of the results actually are.

      Rewrote these sections and added figure panel to Figure 6.

      Impact: 

      The authors have discovered that nocturnal bull ants while homing back to their nest holes at night, are able to use the dim polarised light pattern formed around the moon for path integration. Even though similar methods have previously shown the ability of dung beetles to orient along straight trajectories for short distances using polarised moonlight, this is the first evidence of an animal that uses polarised moonlight in homing. This is quite significant, and their findings are well supported by their data.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript presents a series of experiments aimed at investigating orientation to polarized lunar skylight in a nocturnal ant, the first report of its kind that I am aware of.

      Strengths: 

      The study was conducted carefully and is clearly explained here. 

      Weaknesses: 

      I have only a few comments and suggestions, that I hope will make the manuscript clearer and easier to understand.

      Time compensation or periodic snapshots 

      In the introduction, the authors compare their discovery with that in dung beetles, which have only been observed to use lunar skylight to hold their course, not to travel to a specific location as the ants must. It is not entirely clear from the discussion whether the authors are suggesting that the ants navigate home by using a time-compensated lunar compass, or that they update their polarization compass with reference to other cues as the pattern of lunar skylight gradually shifts over the course of the night - though in the discussion they appear to lean towards the latter without addressing the former. Any clues in this direction might help us understand how ants adapted to navigate using solar skylight polarization might adapt use to lunar skylight polarization and account for its different schedule. I would guess that the waxing and waning moon data can be interpreted to this effect.

      Added a paragraph discussing this distinction in mechanisms and the limits of the current data set in untangling them. An interesting topic for a follow up to be sure.

      Effects of moon fullness and phase on precision 

      As well as the noted effect on shift magnitudes, the distributions of exit headings and reorientations also appear to differ in their precision (i.e., mean vector length) across moon phases, with somewhat shorter vectors for smaller fractions of the moon illuminated. Although these distributions are a composite of the two distributions of angles subtracted from one another to obtain these turn angles, the precision of the resulting distribution should be proportional to the original distributions. It would be interesting to know whether these differences result from poorer overall orientation precision, or more variability in reorientation, on quarter moon and crescent moon nights, and to what extent this might be attributed to sky brightness or degree of polarization.

      See below for response to this and the next reviewer comment

      N.B. The Watson-Williams tests for difference in mean angle are also sensitive to differences in sample variance. This can be ruled out with another variety of the test, also proposed by Watson and Williams, to check for unequal variances, for which the F statistic is F = [(n2-1)*(n1-R1)] / [(n1-1)*(n2-R2)] or its inverse, whichever is >1.

      We have looked at the amount of variance from the mean heading direction in terms of both the shifts and the reorientations and found no significant difference in variance between all relevant conditions. It is possible (and probably likely) that with a higher n we might find these differences but with the current data set we cannot make statistical statements regarding degradations in navigational precision.  

      As an additional analysis to address the Watson-Williams test’s sensitivity to changes in variance, we have added var test comparisons for each pair of conditions; this is a well-established test for comparing variances. None of these were significantly different, suggesting the observed differences in the WW tests are due to changes in the mean vector and not the distribution. We have added this test to the text.
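
For readers unfamiliar with the statistic the reviewer quotes, the following is a minimal sketch of how it could be computed from two samples of headings. It implements only the reviewer's formula, with R taken as the resultant length of each sample; it is not the var test the authors report running, and the function names and example headings are hypothetical. The degrees of freedom returned follow the usual reading of the statistic as a ratio of (n - R)/(n - 1) terms, which is an assumption not stated in the source.

```python
import math


def resultant_length(angles_deg):
    """Resultant length R (not the mean vector length R/n) of angles in degrees."""
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    return math.hypot(c, s)


def dispersion_f(sample1, sample2):
    """F = [(n2-1)(n1-R1)] / [(n1-1)(n2-R2)], or its inverse, whichever is >= 1.

    Returns (F, numerator df, denominator df). Undefined if either sample
    has zero dispersion (all headings identical), since n - R is then 0.
    """
    n1, n2 = len(sample1), len(sample2)
    r1, r2 = resultant_length(sample1), resultant_length(sample2)
    f = ((n2 - 1) * (n1 - r1)) / ((n1 - 1) * (n2 - r2))
    if f >= 1.0:
        return f, n1 - 1, n2 - 1
    return 1.0 / f, n2 - 1, n1 - 1


# Hypothetical headings (degrees): one tight sample, one more scattered.
tight = [350, 355, 0, 5, 10]
scattered = [300, 330, 0, 40, 70]
f_stat, df_num, df_den = dispersion_f(tight, scattered)
print(round(f_stat, 2), df_num, df_den)
```

The resulting F would then be compared against an F distribution with the returned degrees of freedom; that step needs a statistics library and is left out here.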

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      I have only very few minor suggestions to improve the manuscript: 

      (1) While I fully agree with the authors that their study, to the best of my knowledge, provides the first proof (in any animal) of the use of the moon's polarization pattern, the many repetitions of this fact disturb the flow of the text and could be cut at several instances. 

      Yes, it is indeed repeated to an annoying degree. 

      We have removed these beyond bookending mentions (Abstract and Discussion).

      (2) In my opinion, the authors did not change the "ambient polarization pattern" when using the linear polarization filter (e.g., l. 55, 170, 177 ...). The linear polarizer presents an artificial polarization pattern with a much higher degree of polarization in comparison to the ambient polarization pattern. I would suggest re-phrasing this, to emphasize the artificial nature of the polarization pattern under the polarizer.

      We have made these suggested changes throughout the text to clarify. We no longer say the ambient pattern was   

      (3) Line 377: I do not see the link between the sentence and Figure 7 

      Changed where in the discussion we refer to Figure 7.

      (4) Figure 7 upper part: In my opinion, the upper part of Figure 7 does not add any additional value to the illustration of the data as compared to Figure 5 and could be cut.

      We thought it might be easier for some readers to see the shifts as a dial representation with the shift magnitude converted to 0-100% rather than the shifts in Figure 5. This makes it somewhat like a graphical abstract summarising the whole study.

      I agree that Figure 5 tells the same story but a reader that has little background in directional stats might find figure 7 more intuitive. This was the intent at least. 

      If it becomes a sticking point, then we can remove the upper portion.  

      Reviewer #2 (Recommendations For The Authors): 

      Minor corrections and queries 

      Line 117: THE majority 

      Corrected

      Lines 129-130: Do you have a reference to support this statement? I am unaware of experiments that show that homing ants count their steps, but I could have missed it.

      We have added the references that unpack the ant pedometer.  

      Line 140: remove "the" in this line. 

      Removed

      Line 170: We need more details here about the spectral transmission properties of the polariser (and indeed which brand of filter, etc.). For instance, does it allow the transmission of UV light?

      Added

      Line 239: "...tested identicALLY to ...." 

      Corrected

      Lines 242-258 (Vector testing): I must admit I found the description of these experiments very difficult to follow. I read this section several times and felt no wiser as a result. I think some thought needs to be given to better introduce the reader to the rationale behind the experiment (e.g., start by expanding lines 243-246, and maybe add a methods figure that shows the different experimental procedures).

      I have rewritten this section of the methods to clearly state the experimental rationale and to be clearer as to the methodology.

      Also added a methods panel to Figure 6.

      Line 247: "reoriented only halfway". What does this mean? Do you mean with half the expected angle?

      Yes, this is a bit unclear. We have altered for clarity:

      ‘only altered their headings by about half of the 45° e-vector shift (25.2°± 3.7°), despite being tested on near-full-moon nights.’

      Results section (in general): In Figure 1 (which is a very nice figure!) you go to all the trouble of defining b degrees (exit headings) and c degrees (reorientation headings), which are very intuitive for interpreting the results, and then you totally abandon these convenient angles in favour of an amorphous Greek symbol Phi (Figs. 2-6) to describe BOTH exit and reorientation headings. Why?? It becomes even more confusing when headings described by Phi can be typically greater than 300 degrees in the figures, but they are never even close to this in the text (where you seem to have gone back to using the b degrees and c degrees angles, without explicitly saying so). Personally, I think the b degrees and c degrees angles are more intuitive (and should be used in both the text and the figures), but if you do insist on using Phi then you should use it consistently in both the text and the figures. 

      Replaced Phi with b° and c° for both figures and in the text.

      Finally, for reorientation angles in Figure 4A, you say that the angle is 16.5 degrees. This angle should have been 143.5 degrees to be consistent with other figures. 

      Yes, the reorientation was erroneously copied from the shift data (it is identical in both the +45 shift and reorientation for Figure 4A). This has now been corrected.

      Line 280, and many other lines: Wherever you refer to two panels of the same figure, they should be written as (say) Figure 2A, B not Figure 2AB.

      Changed as requested throughout the text.

      Line 295 (Waxing lunar phases): For these experiments, which nest are you using? 1 or 2?

      We have added that this is nest 1. 

      Figure 3B: The title of this panel should be "Waxing Crescent Moon" I think. 

      Ah yes, this is incorrect in the original submission. I have fixed this.

      Lines 312-313: Here it sounds as though the ants went right back to the full +/- 45 degrees orientations when they clearly didn't (it was -26.6 degrees and 189.9 degrees). Maybe tone the language down a bit here.

      Changed this to make clear the orientation shift is only ‘towards’ the ambient lunar e-vector.

      Line 327: Insert "see" before "Figure 5" 

      Added

      Line 329: See comment for Line 295. 

      We have added that this is nest 1. 

      Lines 357-373 (Vector testing): Again, because of the somewhat confusing methods section describing these experiments, these results were hard to follow, both here and in the Discussion. I don't really understand what you have shown here. Re-think how you present this (and maybe re-working the Methods will be half the battle won). 

      I have rewritten these sections to try to make clear that these are ants tested with different vector lengths (6 m vs. 2 m) at the same location. Hopefully this is much clearer, but if these portions remain a bit confusing, I think a full rename of the conditions is in order. Something like long vector and short vector would help but comes with the problem of not truly describing the purpose of the test, which is to control for location; thus the current condition names. As it stands, I hope the new clarifications adequately describe the reasoning while keeping the condition names. Of course, I am happy to make more changes here, as making this clear to readers is important for driving home that the path integrator is in play.

      See current change to results as an example: ‘Both foragers with a long ~6m remaining vector (Halfway Release), or a short ~2m remaining vector (Halfway Collection & Release), tested at the same location, exhibited significant shifts to the right of initial headings when the e-vector was rotated clockwise +45°.’

      Line 361: I think this should be 16.8 not 6.8 

      Yes, you are correct. Fixed in text (16.8).

      Line 365: I think this should be -12.7 not 12.7 

      Yes, you are correct. Fixed in text (–12.7).

      Line 408: "morning twilight". Should this be "morning solar twilight"? Plus "M midas" should be "M. midas"

      Added and fixed respectively.

      Line 440. "location" is spelt wrong. 

      Fixed spelling.

      Line 444: "...WITH longer accumulated vectors, ..." 

      Added ‘with’ to sentence. 

      Line 447: Remove "that just as"

      Removed.

      Line 448: "Moonlight polarised light" should be "Polarised moonlight" 

      Corrected.

      Lines 450-453: This sentence makes little sense scientifically or grammatically. A "limiting factor" can't be "accomplished". Please rephrase and explain in more detail.

      This sentence has been rephrased:

      ‘The limiting factors to lunar cue use for navigation would instead be the ant’s detection threshold to either absolute light intensity, polarization sensitivity and spectral sensitivity. Moonlight is less UV rich compared to direct sunlight and the spectrum changes across the lunar cycle (Palmer and Johnsen 2015).’

      Line 474: Re-write as "... due to the incorporation of the celestial compass into the path integrator..."

      Added.

      Reviewer #3 (Recommendations For The Authors): 

      Minor comments 

      Line 84 I am not sure that we can infer attentional processes in orientation to lunar skylight, at least it has not yet been investigated.

      Yes, this is a good point. We have changed ‘attend’ to ‘use’.  

      Line 90 This description of polarized light is a little vague; what is meant by the phrase "waves which occur along a single plane"? (What about the magnetic component? These waves can be redirected, are they then still polarized? Circular polarization?). I would recommend looking at how polarized light is described in textbooks on optics.

      We have rewritten the polarised light section to be clearer using optics and light physics for background. 

      Line 92 The phrase "e-vector" has not been described or introduced up to this point.

      We now introduce e-vector and define it. 

      ‘Polarised light comprises light waves which occur along a single plane and are produced as a by-product of light passing through the upper atmosphere (Horváth & Varjú 2004; Horváth et al., 2014). The scattering of this light creates an e-vector pattern in the sky, which is arranged in concentric circles around the sun or moon's position with the maximum degree of polarisation located 90° from the source. Hence when the sun/moon is near the horizon, the pattern of polarised skylight is particularly simple with uniform direction of polarisation approximately parallel to the north-south axes (Dacke et al., 1999, 2003; Reid et al. 2011; Zeil et al., 2014).’

      Happy to make further changes as well.  

      Line 107 Diurnal dung beetles can also orient to lunar skylight if roused at night (Smolka et al., 2016), provided the sky is bright enough. Perhaps diurnal ants might do the same?

      Added the diurnal dung beetles mention as well as the reference.

      Also, a very good suggestion using diurnal bull ants.

      Line 146 Instead of lunar calendar the authors appear to mean "lunar cycle". 

      Changed

      Line 165 In Figure 1B, it looks like visual access to the sky was only partly "unobstructed". Indeed foliage covers as least part of the sky right up to the zenith.

      We have added that the sky is partially obstructed. 

      Line 179 This could also presumably be checked with a camera? 

      For this testing we tried to keep equipment to a minimum for a single researcher walking to and from the field site given the lack of public transport between 1 and 4am. But yes, for future work a camera based confirmation system would be easier. 

      Line 243 The abbreviation "PI" has not been described or introduced up to this point.

      Changed to ‘path integration derived vector lengths….’

      Line 267 The method for comparing the leftwards and rightwards shifts should be described in full here (presumably one set of shifts was mirrored onto the other?).

      We have added the below description to indicate the full description of the mirroring done to counterclockwise shifts.

      ‘To assess shift magnitude between −45° and +45° foragers within conditions, we calculated the mirror of shift in each −45° condition, allowing shift magnitude comparisons within each condition. Mirroring the −45° conditions was calculated by mirroring each shift across the 0° to 180° plane and was then compared to the corresponding unaltered +45 condition.’
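
The mirroring step described above can be sketched as a reflection of each shift across the 0° to 180° axis, which amounts to flipping the sign of the angle and wrapping it back into a standard range. This is a minimal illustration under that assumption, not the authors' analysis code; the function names and the example shifts are hypothetical.

```python
def wrap_signed(angle_deg):
    """Wrap an angle in degrees to the interval (-180, 180]."""
    wrapped = (angle_deg + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def mirror_across_0_180(shift_deg):
    """Reflect a shift across the 0°-180° axis (a left turn becomes a right turn)."""
    return wrap_signed(-shift_deg)


# Hypothetical shifts (degrees) from a -45° condition, in mixed notation.
counterclockwise_shifts = [-40.0, -25.5, 320.0, 10.0]
mirrored = [mirror_across_0_180(s) for s in counterclockwise_shifts]
print(mirrored)
```

After mirroring, shifts from a −45° condition point the same way as those from the matching +45° condition, so their magnitudes can be compared directly.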

      Discussion Might the brightness and spectrum of lunar skylight also play a role here?

      We have added a section to the discussion to mention the aspects of moonlight which may be important to these animals, including the spectrum, brightness and polarisation intensity.  

      Line 451 The sensitivity threshold to absolute light intensity would not be the only limiting factor here. Polarization sensitivity and spectral sensitivity may also play a role (moonlight is less UV rich than sunlight and the spectrum of twilight changes across the lunar cycle: Palmer & Johnsen, 2015). 

      Added this clarification.

      Line 478 Instead of the "masculine ordinal" symbol used (U+006F) here a degree symbol (U+00B0) should be used.

      Ah thank you, we have replaced this everywhere in the text.  

      Line 485 It should be possible to calculate the misalignment between polarization pattern before and after this interruption of celestial cues. Does the magnitude of this misalignment help predict the size of the reorientation?

      Reorientations are highly correlated with the shift size under the filter, which makes sense as larger shifts mean that foragers need to turn back more to reorient to both the ambient pattern and to return to their visual route. Reorientation sizes do not show a consistent reduction compared to under-the-filter shifts when the lunar phase is low and is potentially harder to detect.

      I have reworked this line in the text as I do not think there is much evidence for misalignment. It might be more precise to say that overnight periods where the moon is not visible may adversely impact the path integrator estimate, though the full impact of this celestial cue gap, and whether other cues might also play a role, is currently unknown.

      Line 642 "from their" should be "relative to" 

      Changed as requested

      Figure 1B Some mention should be made of the differences in vegetation density. 

      Added a sentence to the figure caption discussing the differences in both vegetation along the horizon and canopy cover.

      Figures 2-6 A reference line at 0 degrees change might help the reader to assess the size of orientation changes visually. Confidence intervals around the mean orientation change would also help here.

      We have now added circular grid lines and confidence intervals to the circular plots. These should help make the heading changes clear to readers.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      eLife Assessment 

      This is a valuable study in the Jurkat T cell line that calls attention to the phosphorylation of formin-like 1 β and its role in the polarization of CD63-positive extracellular vesicles (referred to as exosomes). The evidence presented in the Jurkat model is solid, but concerns have been raised about the statistical analysis and more details would be required to fully assess the significance of the results. For example, ANOVA is the method described, but it requires large amounts of normally distributed data in multiple groups and cannot be used to make pairwise comparisons within groups, which would require a post-hoc method (which is not discussed). In addition, the data showing formin-like 1 β in primary human T cells without and with a CAR are provided without quantification and do not investigate any of the novel claims, so they do not address the relevance of formin-like 1 β beyond the Jurkat model. Nonetheless, the consistent trends in the body of the study provide solid support for the claims.

      We acknowledge this general statement on statistics. We now discuss the post-hoc method (Tukey) in more detail and provide, as a new Supplementary data S13, the p-values obtained by applying Tukey's post-hoc method to the one-way ANOVA for all pairwise comparisons. Additionally, we now provide quantitative data on the percentage of primary cells with and without CAR that show FMNL1 accumulations at the immune synapse (Suppl. Fig. S7). Regarding the data in primary human T cells, we have already changed the title of the manuscript so that it strictly reflects the main body of the data and our conclusions in the well-established Jurkat synapse model. We also want to emphasize that we did not intend to extrapolate the relevance of our data regarding FMNL1 and exosomes beyond the Jurkat model. We have therefore added some sentences and nuances to the Discussion to soften our statements in this regard (i.e. “…..provided that the FMNL1 effect on exosome secretion in Jurkat cells can be extended to primary T lymphocytes”) and to clarify this important point.
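
To illustrate the distinction raised in the assessment, the following is a minimal sketch of a one-way ANOVA F statistic followed by the pairwise studentized range statistic q that Tukey's method is built on. It is not the authors' analysis: the numbers are invented, the function names are hypothetical, and p-values are deliberately omitted because they require the F and studentized range distributions from a statistics library.

```python
import math
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """Return (F, within-group mean square) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, ms_within


def tukey_q(groups, ms_within):
    """Studentized range statistic q for every pair of groups.

    Uses the Tukey-Kramer standard error, which also covers unequal n.
    """
    q = {}
    for (i, a), (j, b) in combinations(enumerate(groups), 2):
        se = math.sqrt(ms_within / 2.0 * (1.0 / len(a) + 1.0 / len(b)))
        q[(i, j)] = abs(mean(a) - mean(b)) / se
    return q


# Invented measurements for three conditions.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, ms_within = one_way_anova(groups)
q_stats = tukey_q(groups, ms_within)
print(f_stat, q_stats)
```

The ANOVA F answers only whether any group differs; the per-pair q values are what a post-hoc procedure such as Tukey's converts into the pairwise p-values reported in a table like Supplementary data S13.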

      Reviewer 1:

      (1) The main findings have been obtained in clones of Jurkat cells. They have not been confirmed in primary T cells. The only experiment performed in primary cells is shown in Figure S7 (primary human T lymphoblasts) for which only the distribution of FMNL1 is shown without quantification. No results presenting the effect of FMNL1 KO and expression of mutants in primary T cells are shown.

      The referee is right regarding the extension of exosome secretion studies to primary human T lymphocytes. Unfortunately, it is well known that primary T lymphocytes are extremely difficult to transfect. Moreover, the expression of our large bi-cistronic plasmids (>15 Kb) is very inefficient, coupled with the challenge of expressing large proteins, such as the 180 kDa YFP-FMNL1 chimeric variants. The convergence of these factors hampers these studies, and we have been unable to consistently achieve sufficient transfection efficiency to perform these experiments. However, the role of FMNL1 in MTOC/MVB polarization in Jurkat cells, confirmed in this manuscript, has already been extended to primary CD8+ T cell clones (DOI: 10.1016/j.immuni.2007.01.008). Given that exosome secretion requires MTOC/MVB polarization both in Jurkat and primary T lymphoblasts (10.1038/cdd.2010.184, 10.3389/fimmu.2019.00851), this suggests FMNL1 may also control exosome secretion in primary T cells, although the formal demonstration will require further research.

      A new sentence has been included in the Discussion to address this important point. Regarding the second request, we have quantified the images mentioned in Suppl. Fig. S7, and the percentages of fixed T cells showing FMNL1 accumulations at the immune synapse are included in the figure legend.

      (2) In-depth analysis of the defect in actin remodeling (quantification of the images, analysis of some key actors of actin remodeling) is still lacking. Only F-actin is shown; no attempt has been made to look more precisely at actors of actin remodeling.

      The referee is right. Since we have obtained new results on the role of FMNL1 in actin remodeling, we have focused on this formin, which is already a key actor in this process. In this context, we have previously shown that the formin Dia1, another major actor of actin remodeling in T lymphocytes along with FMNL1 (DOI: 10.1016/j.immuni.2007.01.008), does not undergo phosphorylation upon PKC activation (Suppl. Fig. 5 in https://doi.org/10.1080/20013078.2020.1759926). Since our aim was to unravel the PKC-mediated pathway controlling actin remodeling, we did not pursue further studies on Dia1. Therefore, we have included a new sentence to emphasize the specific role of FMNL1 phosphorylation, but not Dia1, in this regard. Nonetheless, future studies aimed at identifying new important players in this or related pathways could offer significant insights.

      (3) The defect in the secretion of extracellular vesicles is still very preliminary. Examples of STED images given by the authors are nice, yet no quantification is performed.

      The referee is right regarding this point and we acknowledge this comment. Accordingly, we have now quantified the STED images and provided numerical data on the percentages of cells exhibiting the observed phenotypes (see the figure legend for Fig. 10).

      (4) Results shown in Figure S12 on the colocalization of proteins phosphorylated on Ser/Thr are still not convincing. It seems indeed that "phospho-PKC" is labeling more preferentially the CMAC positive cells (Raji) than the Jurkat T cells. It is thus particularly difficult to conclude on the colocalization and even more on the recruitment of phosphorylated-FMNL1 at the IS. Thus, these experiments are not conclusive and cannot be the basis even for their cautious conclusion: "Although all these data did not allow us to infer that FMNL1b is phosphorylated at the IS due to the resolution limit of confocal and STED microscopes, the results are compatible with the idea that both endogenous FMNL1 and YFP-FMNL1bWT are specifically phosphorylated at the cIS".

      The referee may be correct regarding the detail of the "phospho-PKC" labeling. However, it cannot be overlooked that Raji cells also contain proteins that are or may be PKC substrates. As a matter of fact, Raji cells also express FMNL1. In addition, MHCII triggering in B cells induces PKC activation (https://doi.org/10.1002/eji.200323351). Which cell type is preferentially labeled varies depending on the synapse analyzed.

      It is true that there are likely several PKC substrates, both in Jurkat and in Raji cells, but our point is that one of these substrates either colocalizes with FMNL1 or is FMNL1 itself. We do not claim at any point that FMNL1 is the only PKC substrate, either in Jurkat or in Raji cells.

      Apparently, the referee has either overlooked our results or we did not emphasize them sufficiently. Our results effectively validated the PKC substrate antibody, both on endogenous phospho-FMNL1 and phospho-YFP-FMNL1β by WB (Fig. 3). Moreover, the phospho-PKC antibody does not recognize the YFP-FMNL1β S1086A or S1086D variants (Fig. 3). Last, but not least, when FMNL1 expression is interfered with in the Jurkat cell, the phospho-PKC signal does not colocalize with FMNL1, but it strongly colocalizes at the synapse with YFP-FMNL1βWT expressed in the Jurkat cell (Fig. S11). Indeed, YFP-FMNL1β belonged to the Jurkat cell. Taken together, these results demonstrate: 1. the specificity of the phospho-PKC antibody, 2. that the phospho-PKC antibody recognizes phosphorylated YFP-FMNL1β but not its non-phosphorylatable mutant variants, 3. that the colocalization of phospho-PKC with anti-FMNL1 is specific. We have included some sentences to clarify these points and to avoid possible misunderstandings by potential readers. We thank the referee for this clarifying point, and we firmly believe our cautious conclusion is strictly correct, although we have tuned it to consider the possibility that a different PKC substrate could be closely associated with FMNL1, producing the observed colocalization: “Although all these data do not yet allow us to infer that FMNL1b is phosphorylated at the IS due to the resolution limits of super resolution microscopy and the possibility that another PKC substrate may be associated to FMNL1 or very close to FMNL1, in a strictly S1086-dependent manner”.

      To clear any doubt regarding which cell is labelled with phospho-PKC, we have changed the lower panels in Suppl. Fig. S12, and it is now more evident that FMNL1 and phospho-PKC belong to the Jurkat cell.

      The study would benefit from a more careful statistical analysis. The dot plots showing polarity are presented for one experiment. Yet, the distribution of the polarity is broad. Results of the 3 independent experiments should be shown and a statistical analysis performed on the independent experiments.

      The referee is right, and we have now included further post-hoc analysis data (Tukey) in Suppl. Fig. S13. Tukey’s test values were included for all the dot plot figures. We have not included all the plots from 3 different experiments since the manuscript already contains 10+12 multi-panel figures and is too large. However, we have stated in the figure legend that the experiments shown are representative of the data obtained from 3 independent experiments. The referee’s consideration regarding the broad distribution of polarity data is correct. In the first version of the manuscript we included a sentence in this regard, which may have been overlooked: “Remarkably, one important feature of the IS consists of both the onset of the initial cell-cell contacts and the establishment of a mature, fully productive IS, are intrinsically stochastic, rapid and asynchronous processes (87, 88) (43). Thus, the score of the PI corresponding to the distance of MTOC/MVB with respect the IS (42) may be contaminated by background MTOC/MVB polarization, in great part due to the stochastic nature of IS formation (87)”.

    1. Author response:

      • The study does not clearly establish the relationship between Type 1 IFN and cancer therapy, and more robust data are needed to support the claim that tumor growth inhibition occurs via Type 1 IFN upregulation following ORMDL3 knockdown.

      We thank the reviewer for raising this concern. In Figure 6 we detected the expression of IFNB1 and ISGs in MC38 and LLC tumors upon ORMDL3 knockdown. We also used IHC to explore the abundance of RIG-I and ORMDL3 in these tumors. In addition, in Figure S5 we performed western blots to detect the expression of RIG-I with or without ORMDL3 knockdown. All these results support our hypothesis that ORMDL3 is a negative regulator of interferon via modulating RIG-I abundance.

      • There is ambiguity regarding whether ORMDL3 has a positive or negative role in the Type 1 IFN pathway, especially given conflicting findings in the literature that link higher ORMDL3 levels to increased Type 1 IFN expression.

      We appreciate the reviewer’s concern. In our system and experiments, we validated that ORMDL3 is a negative regulator of interferon, although some reports link higher ORMDL3 levels to an increased type-I IFN response. ORMDL3 has been reported to be associated with rhinovirus-induced childhood asthma (Nature. 2007;448(7152):470-473; N Engl J Med. 2013 Apr 11;368(15):1398-407), and ORMDL3 levels are positively associated with rhinovirus abundance (N Engl J Med. 2013 Apr 11;368(15):1398-407). There are also reports indicating that ORMDL3 supports the replication of rhinovirus (for example, Am J Respir Cell Mol Biol. 2020 Jun;62(6):783-792). These observations are consistent with our finding that higher ORMDL3 expression leads to lower interferon production, which facilitates viral replication. We believe that the differing conclusions across studies are due to different experimental conditions and stimuli. In our research, we provided comprehensive studies at the molecular, cellular, and animal levels to support the conclusion that ORMDL3 is a negative regulator of type-I interferon.

      • The use of certain experimental models, such as HEK293T cells (which are not typical Type 1 IFN producers), raises concerns about the validity and generalizability of the results. Further clarity is needed regarding the rationale for using the same tag in overexpression experiments.

      We thank the reviewer for the suggestion. Besides HEK293T, in Figure 1C and 1D we also overexpressed ORMDL3 in A549 cells and BMDMs and stimulated them with polyI:C or polyG:C. Our results showed that ORMDL3 especially inhibits RLR signaling. Additionally, in Figure 3H we found that endogenous RIG-I expression decreased when we overexpressed ORMDL3 in BMDMs. Regarding the issue of protein tags, we plan to use different tags to validate our results.

      • The manuscript contains several inconsistencies and lacks detailed explanations of critical areas, such as the mechanism by which ORMDL3 facilitates USP10 transfer to RIG-I despite no direct interaction between ORMDL3 and RIG-I.

      Some ERMC (ER-mitochondria contact) proteins mediate the interaction between the ER and mitochondria. ORMDL3 localizes to the ER, and it has been reported to be associated with calcium transport. Meanwhile, calcium transfer between the ER and mitochondria plays an important role in protein synthesis. It is possible that some ERMC proteins mediate the interaction between ORMDL3 and MAVS. In addition, we also validated that ORMDL3 interacts with USP10 (Figure 5B). Although ORMDL3 and RIG-I do not interact directly, we propose a mechanistic model in which ORMDL3 and MAVS recruit USP10 and RIG-I to ERMCs, respectively, so that USP10 can form a complex with RIG-I (Figure 5C) and regulate the stability of RIG-I upon RNA sensing.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) The main hypothesis/conclusion is summarized in the abstract: "Our study presents an intriguing model of cilia length regulation via controlling IFT speed through the modulation of the size of the IFT complex." The data clearly document the remarkable correlation between IFT velocity and ciliary length in the different cells/tissues/organs analyzed. The experimental test of this idea, i.e., the knock-down of GFP-IFT88, further supports the conclusion but needs to be interpreted more carefully. While IFT particle size and train velocity were reduced in the IFT88 morphants, the number of IFT particles is even more decreased. Thus, the contributions of the reduction in train size and velocity to ciliary length are, in my opinion, not unambiguous. Also, the concept that larger trains move faster, likely because they dock more motors and/or better coordinate kinesin-2, and that faster IFT causes cilia to be longer is, to my knowledge, not further supported by observations in other systems (see below).

      Thank you for your comments. We agree with the reviewer that the final section on IFT train size, velocity, and ciliary length regulation requires additional evidence. The purpose of the knockdown experiments was to investigate the potential relationship between IFT speed and IFT train size. We hypothesize that a deficiency in IFT88 proteins may disrupt the regular assembly of IFT particles, leading to the formation of shorter IFT trains. Indeed, we observed shorter IFT particles and a slight reduction in the transport speed of IFT particles in the morphants. Certainly, it would be more convincing to distinguish these IFT trains through ultrastructural analysis. However, with current techniques, performing such an analysis in the zebrafish model would be very difficult due to the limited sample size. In the revised version, we have tempered the conclusions in these sections, as suggested by other reviewers as well.

      (2) I think the manuscript would be strengthened if the IFT frequency would also be analyzed in the five types of cilia. This could be done based on the existing kymographs from the spinning disk videos. As mentioned above, transport frequency in addition to train size and velocity is an important part of estimating the total number of IFT particles, which bind the actual cargoes, entering/moving in cilia.

      Thank you. We have analyzed the entry frequency of IFT in five types of cilia, both anterior and posterior. The analysis indicates that longer cilia also exhibit a higher frequency of fluorescent particles entering the cilia. These results are presented in Figure 3J.

      (3) Here, the variation in IFT velocity in cilia of different lengths within one species is documented - the results document a remarkable correlation between IFT velocity and ciliary length. These data need to be compared to observations from the literature. For example, the velocity of IFT in the quite long (~ 100 um) olfactory cilia of mice is similar to that observed in the rather short cilia of fibroblasts (~0.6 um/s). In Chlamydomonas, IFT velocity is not different in long flagella mutants compared to controls. Probably data are also available for C. elegans or other systems. Discussing these data would provide a broader perspective on the applicability of the model outside of zebrafish.

      Thank you for your suggestions. We believe the most significant novelty of our manuscript is the discovery that IFT velocities are closely related to cilia length in an in vivo model system. Our data suggest that longer cilia may require faster IFT transport to maintain their stable length, powered by larger IFT trains. We did observe substantial variability in IFT velocities across different studies. For example, anterograde IFT transport ranges from 0.2 µm/s in mouse olfactory neurons (Williams et al, 2014) to 0.8 µm/s in 293T cells (See et al, 2016) and 0.4 µm/s in IMCD-3 cells (Broekhuis et al, 2014). Even in NIH-3T3 cells, two studies report significant differences, despite using the same IFT reporters: 0.3 µm/s versus 0.9 µm/s (Kunova Bosakova et al, 2018; Luo et al, 2017). These findings suggest that cell types and culture conditions can influence IFT velocities in vitro, which may not accurately represent in vivo conditions. Interestingly, research on mouse olfactory neurons showed a strong correlation between anterograde and retrograde IFT velocities. Additionally, IFT velocity is closely related to the cell types within the olfactory neuron population, consistent with our results (Williams et al., 2014). 

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors study intraflagellar transport (IFT) in cilia of diverse organs in zebrafish. They elucidate that IFT88-GFP (an IFT-B core complex protein) can substitute for endogenous IFT88 in promoting ciliogenesis and use it as a reporter to visualize IFT dynamics in living zebrafish embryos. They observe striking differences in cilia lengths and velocity of IFT trains in different cilia types, with smaller cilia lengths correlating with lower IFT speed. They generate several mutants and show that disrupting the function of different kinesin-2 motors and BBSome or altering post-translational modifications of tubulin does not have a significant impact on IFT velocity. They however observe that when the amount of IFT88 is reduced it impacts the cilia length, IFT velocity as well as the number and size of IFT trains. They also show that the IFT train size is slightly smaller in one of the organs with shorter cilia (spinal cord). Based on their observations they propose that IFT velocity determines cilia length and go one step further to propose that IFT velocity is regulated by the size of IFT trains.

      Strengths:

      The main highlight of this study is the direct visualization of IFT dynamics in multiple organs of a living complex multi-cellular organism, zebrafish. The quality of the imaging is really good. Further, the authors have developed phenomenal resources to study IFT in zebrafish which would allow us to explore several mechanisms involved in IFT regulation in future studies. They make some interesting findings in mutants with disrupted function of kinesin-2, BBSome, and tubulin modifying enzymes which are interesting to compare with cilia studies in other model organisms. Also, their observation of a possible link between cilia length and IFT speed is potentially fascinating.

      Weaknesses:

      The manuscript as it stands, has several issues.

      (1) The study does not provide a qualitative description of cilia organization in different cell types, the cilia length variation within the same organ, and IFT dynamics. The methodology is also described minimally and must be detailed with more care such that similar studies can be done in other laboratories.

      Thank you for your comments. We found that cilia length is generally consistent within the same cell types we examined, including those in the pronephric duct, spinal cord, and epidermal cells. However, we observed variability in cilia length within ear crista cilia. Upon comparing IFT velocities, we found no differences among these cilia, further confirming our conclusion that IFT velocity is directly related to cell type rather than cilia length. These new results are presented in Figure S4 of the revised version.

      We apologize for the lack of methodological details in the original manuscript. Following the reviewer's suggestion, we have added a detailed description of the methods used to generate the transgenic line and to perform IFT velocity analysis. These details are included in Figure S2 and are thoroughly described in the methods section of the revised manuscript.

      (2) They provide remarkable new observations for all the mutants. However, the discussion of what the findings imply and how these observations align with (or contradict) what has been observed in cilia studies in other organisms is not comprehensive.

      Thank you for this suggestion. We initially submitted this paper as a report, which has a word limit. We believe the main finding of our work is that IFT velocity is directly associated with cell type, with longer cilia requiring higher velocities to maintain their length. This association of IFT velocity with cell type has also been observed in mouse olfactory neurons (Williams et al., 2014). We have included a discussion of our findings, along with related data published in other organisms, in the revised version.

      (3) The analysis of IFT velocities, the main parameter they compare between experiments, is not described at all. The IFT velocities appear variable in several kymographs (and movies) and are visually difficult to see in shorter cilia. It is unclear how they make sure that the velocity readout is robust. Perhaps, a more automated approach is necessary to obtain more precise velocity estimates.

      Thank you for these comments. To measure the IFT velocities, we first used ImageJ software to generate a kymograph, where moving particles appear as oblique lines. The velocity of these particles can be calculated based on the slope of the lines (Zhou et al, 2001). In the initial version, most of the lines were drawn manually. To eliminate potential artifacts, we also used KymographDirect software to automatically trace the particle paths. The velocities obtained with this method were similar to those calculated manually. These new data are now shown in Figure S2 B-D. For shorter cilia, we only used particles with clear moving paths for our calculations. In the revised version, we have included a detailed description of the velocity analysis methods.
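
      The slope-based calculation described above can be written out explicitly. The following Python sketch is an illustration only: it assumes a kymograph whose horizontal axis is position along the cilium (in pixels, increasing from base to tip) and whose vertical axis is time (in frames), and the pixel size and frame interval in the example are invented placeholders rather than the acquisition settings used in the manuscript.

```python
def track_velocity(x0_px, t0_frame, x1_px, t1_frame,
                   pixel_size_um, frame_interval_s):
    """Velocity (um/s) of one kymograph track from its two end points.

    Positive values mean movement toward increasing x (anterograde under the
    base-to-tip assumption), negative values mean movement toward decreasing x.
    """
    elapsed_frames = t1_frame - t0_frame
    if elapsed_frames == 0:
        raise ValueError("track end points must be at different time points")
    distance_um = (x1_px - x0_px) * pixel_size_um
    elapsed_s = elapsed_frames * frame_interval_s
    return distance_um / elapsed_s

# Hypothetical calibration: 0.1 um per pixel, one frame every 0.2 s
PIXEL_SIZE_UM = 0.1
FRAME_INTERVAL_S = 0.2

# Two hypothetical tracks traced on the same kymograph
anterograde = track_velocity(2, 0, 22, 10, PIXEL_SIZE_UM, FRAME_INTERVAL_S)
retrograde = track_velocity(30, 5, 10, 15, PIXEL_SIZE_UM, FRAME_INTERVAL_S)
print(anterograde, retrograde)
```

      Whether a track is drawn by hand or traced automatically, the same conversion applies; only the choice of end points differs, which is why comparing the two approaches is a useful check on the manual measurements.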

      (4) They claim that IFT speeds are determined by the size of IFT trains, based on their observations in samples with a reduced amount of IFT88. If this was indeed the case, the velocity of a brighter IFT train (larger train) would be higher than the velocity of a dimmer IFT train (smaller train) within the same cilia. This is not apparent from the movies and such a correlation should be verified to make their claim stronger.

      Thank you for these excellent suggestions. We measured the particle size and fluorescence intensity of 3 dpf crista cilia using high-resolution images acquired with Abberior STEDYCON. The results showed a positive correlation between the two. These data have been added to the revised version in Figure 5I, which includes both control and ift88 morphant data.
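
      A positive correlation of this kind is commonly summarized with a Pearson coefficient. The Python sketch below shows the calculation under stated assumptions: the particle sizes and intensities are invented placeholder values, and the response does not specify which correlation measure was actually used, so Pearson's r is an assumption made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations and sums of squared deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical measurements: particle size (um) and fluorescence intensity (a.u.)
particle_size_um = [0.2, 0.3, 0.4, 0.5]
intensity_au = [110, 150, 210, 240]

r = pearson_r(particle_size_um, intensity_au)
print(round(r, 3))
```

      A value of r close to 1 indicates that larger particles tend to be brighter, but it says nothing about velocity; relating brightness to speed would require pairing each particle's intensity with its own track.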

      (5) They make an even larger claim that the cilia length (and IFT velocity) in different organs is different due to differences in the sizes of IFT trains. This is based on a marginal difference they observe between the cilia of crista and the spinal cord in immunofluorescence experiments (Figure 5C). Inferring that this minor difference is key to the striking difference in cilia length and IFT velocity is incorrect in my opinion.

      Impact:

      Overall, I think this work develops an exciting new multicellular model organism to study IFT mechanisms. Zebrafish is a vertebrate where we can perform genetic modifications with relative ease. This could be an ideal model to study not just the role of IFT in connection with ciliary function but also ciliopathies. Further, from an evolutionary perspective, it is fascinating to compare IFT mechanisms in zebrafish with unicellular protists like Chlamydomonas, simple multicellular organisms like C elegans, and primary mammalian cell cultures. Having said that, the underlying storyline of this study is flawed in my opinion and I would recommend the authors to report the striking findings and methodology in more detail while significantly toning down their proposed hypothesis on ciliary length regulation. Given the technological advancements made in this study, I think it is fine if it is a descriptive manuscript and doesn't necessarily need a breakthrough hypothesis based on preliminary evidence.

      Thank you for these comments. We agree with the reviewer that more evidence is required to explain why IFT is transported faster in longer cilia. In the revised version, we have modified and softened this section, focusing primarily on the novel findings of IFT velocity differences between cilia of varying lengths.

      Reviewer #3 (Public Review):

      Summary:

      A known feature of cilia in vertebrates and many, if not all, invertebrates is the striking heterogeneity of their lengths among different cell types. The underlying mechanisms, however, remain largely elusive. In the manuscript, the authors addressed this question from the angle of intraflagellar transport (IFT), a cilia-specific bidirectional transportation machinery essential to biogenesis, homeostasis, and functions of cilia, by using zebrafish as a model organism. They conducted a series of experiments and proposed an interesting mechanism. Furthermore, they achieved in situ live imaging of IFT in zebrafish larvae, which is a technical advance in the field.

      Strengths:

      The authors initially demonstrated that ectopically expressed Ift88-GFP through a certain heatshock induction protocol fully sustained the normal development of mutant zebrafish that would otherwise be dead by 7 dpf due to the lack of this critical component of IFT-B complex.

      Accordingly, cilia formations were also fully restored in the tissues examined. By imaging the IFT using Ift88-GFP in the mutant fish as a marker, they unexpectedly found that both anterograde and retrograde velocities of IFT trains varied among cilia of different cell types and appeared to be positively correlated with the length of the cilia.

      For insights into the possible cause(s) of the heterogeneity in IFT velocities, the authors assessed the effects of IFT kinesin Kif3b and Kif17, BBSome, and glycylation or glutamylation of axonemal tubulin on IFT and excluded their contributions. They also used a cilia-localized ATP reporter to exclude the possibility of different ciliary ATP concentrations. When they compared the size of Ift88-GFP puncta in crista cilia, which are long, and spinal cord cilia, which are relatively short, by imaging with a cutting-edge super-resolution microscope, they noticed a positive correlation between the puncta size, which presumably reflected the size of IFT trains, and the length of the cilia.

      Finally, they investigated whether it is the size of IFT trains that dictates the ciliary length. They injected a low dose (0.5 ng/embryo) of ift88 MO and showed that, although such a dosage did not induce the body curvature of the zebrafish larvae, crista cilia were shorter and contained fewer Ift88-GFP puncta. The particle size was also reduced. These data collectively suggested mildly downregulated expression levels of Ift88-GFP. Surprisingly, they observed significant reductions in both retrograde and anterograde IFT velocities. Therefore, they proposed that longer IFT trains would facilitate faster IFT and result in longer cilia.

      Weaknesses:

      The current manuscript, however, contains serious flaws that markedly limit the credibility of major results and findings. Firstly, important experimental information is frequently missing, including (but not limited to) developmental stages of zebrafish larvae assayed (Figures 1, 3, and 5), how the embryos or larvae were treated to express Ift88-GFP (Figures 3-5), and descriptions on sample sizes and the number of independent experiments or larvae examined in statistical results (Figures 3-5, S3, S6). For instance, although Figure 1B appears to be the standard experimental scheme, the authors provided results from 30-hpf larvae (Figure 3) that, according to Figure 1B, are supposed to neither express Ift88-GFP nor be genotyped because both the first round of heat shock treatment and the genotyping were arranged at 48 hpf. Similarly, the results that ovl larvae containing Tg(hsp70l:ift88-GFP) (again, because the genotype is not disclosed in the manuscript, one can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO (Fig 5D) is quite confusing because the larvae should also have been negative for Ift88-GFP and thus displayed body curvature. Secondly, some inferences are more or less logically flawed. The authors tend to use negative results on specific assays to exclude all possibilities. For instance, the negative results in Figures 4A-B are not sufficient to "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins" because the authors have not checked dynein-2 and other IFT kinesins. In fact, in their previous publication (Zhao et al., 2012), the authors actually demonstrated that different IFT kinesins have different effects on ciliogenesis and ciliary length in different tissues. Furthermore, instead of also examining cilia affected by Kif3b or Kif17 mutation, they only examined crista cilia, which are not sensitive to the mutations. Similarly, their results in Figures 4C-G only excluded the importance of tubulin glycylation or glutamylation in IFT. Thirdly, the conclusive model is based on certain assumptions, e.g., constant IFT velocities in a given cell type. The authors, however, do not discuss other possibilities.

      Thank you for pointing out the flaws in our experiments. We apologize for any confusion caused by the lack of detail in our descriptions. Regarding Figure 2B, we want to clarify that it depicts the procedure for heat shock experiments conducted for the ovl mutants' rescue assay, not the experimental procedure for IFT imaging. In the revised version, we have included detailed methods on how to induce the expression of Ift88-GFP via heat shock and the subsequent image processing. The procedure for heat induction is also shown in Figure S2A. We have also added the sample sizes for each experiment and descriptions of the statistical tests used in the appropriate sections of the revised version.

      Regarding the comments on the relationship between IFT speed variability and motor proteins, we completely agree with the reviewer. We have revised our description of this part accordingly.

      Lastly, the results shown in Figure 5D are from a wild-type background, not ovl mutants. We aimed to demonstrate that a lower dose of ift88 morpholino (0.5 ng) can partially knock down Ift88, allowing embryos to maintain a generally normal body axis, while the cilia in the ear crista became significantly shorter.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor

      (I recommend adding page numbers and probably line numbers. This makes commenting easier)

      We have added page numbers and line numbers in the revised manuscript.

      Intro: Furthermore, ultra-high-resolution microscopy showed a close association between cilia length in different organs and the size of IFT fluorescent particles, indicating the presence of larger IFT trains in longer cilia.

      This correlation is not that strong and data are only available for 2 types of cilia.

      Thanks. We have modified this part.

      P5) cilia (Fig. 1D) -> (Fig. S1)

      Thanks. We have corrected this.

      P5) "These movies provide a great opportunity to compare IFT across different cilia." Rewrite: "This approach allows one to determine the velocity and frequency of IFT based on kymographs" or similar.

      Thank you for your correction, we have changed it in the revised manuscript.

      This observation suggests that cargo and motor proteins are more effectively coordinated in transporting materials, resulting in increased IFT velocity-a novel regulatory mechanism governing IFT speed in vertebrate cilia.

      This is a somewhat cryptic phrase, rewrite?

      We have modified this sentence.

      P6 and elsewhere: "IFT in the absence of Kif17 or Bbs proteins" I wonder if it would be better to provide subheadings summarizing the main observation instead of descriptive titles. This includes the title of the manuscript.

      Thanks for this suggestion. We have changed the subheading titles in the revised manuscript. We prefer to keep the current title of the manuscript, as we think this paper mainly describes IFT in different types of cilia.

      Is it known whether IFT protein and motors are alternatively spliced in the various ciliated cells of zebrafish? In this context, is it known whether the cells express IFT proteins at different levels?

      We analyzed the transcript isoforms of several ciliary genes, including ift88, ift52, ift70, ift172, and kif3a. Most of these IFT genes possess only a single transcript isoform. The Kif3a motor protein has two isoforms (long and short); however, the shorter isoform contains only the motor domain and is presumed to be nonfunctional for IFT. While we cannot completely rule out a contribution from alternative splicing, we consider it unlikely that the variation in IFT speed is due to alternative splicing in ciliary tissues.

      P6) The relation between osm-3 and Kif17 needs to be introduced briefly.  

      Thank you for pointing this out. We have added this at the appropriate place in the revised manuscript.

      P6) "IFT was driven by kinesin or dynein motor proteins along the ciliary axoneme." "is driven"?

      Delete phrase and IFT to the next sentence?

      We have deleted this sentence.

      P7) "Moreover, the mutants were able to survive to adulthood and there is no difference in the fertility or sperm motility between mutants and control siblings, which is slightly different from those observed in mouse mutants(Gadadhar et al., 2021)." Could some of these data be shown? 

      Thanks for this suggestion. When crossed with wild-type females, all homozygous mutants showed no difference in fertility compared to controls. The fertilization rate in mutants was 90.5% (n = 7), similar to that of wild type (87.2%, n = 7). We determined the trajectories of free-swimming sperm by high-speed video microscopy. The vast majority of sperm in ttll3 mutants, like wild-type sperm, swim almost entirely along a straight path, which differs from what was observed in the mouse mutant (where 86% of TTLL3-/-TTLL8-/- sperm rotate in situ). We also assessed cilia motility in the pronephric ducts of 5 dpf embryos using high-speed video microscopy. The ttll3 mutant exhibited a rhythmic sinusoidal wave pattern similar to the control, and there was no significant difference in ciliary beating frequency. These new data are now included in Figure S7C-H.

      P7) "which has been shown early to reduce" earlier

      We have changed it. Thanks.

      Maybe the authors could speculate how the cells ensure the assembly of larger/faster trains in certain cells. Are the relative expression levels known or worth exploring?

      Thank you for these suggestions. We believe that longer cilia may maintain larger IFT particle pools in the basal body region, facilitating the assembly of large IFT trains. The higher frequency of IFT injection in longer cilia further supports this hypothesis. It is likely that cells with longer cilia have higher expression levels of IFT proteins. However, due to the lack of suitable antibodies for IFT proteins in zebrafish, it is currently not feasible to compare these expression levels. This question is certainly worth investigating in the future. We have added this discussion to the revised manuscript.

      Reviewer #2 (Recommendations for The Authors):

      Here are detailed comments for the authors:

      (1) The authors need to describe their methodology of imaging and what they observe in much greater detail. How were the different cilia types organized? Approximately how many were observed in every organ? How were they oriented? Were there length variations between cilia in the same organ? While imaging, were individual cilia mostly lying in a single focal plane, or did the authors often perform z-scans over multiple planes? Velocity measurement is highly variable if individual cilia span a large volume, with only part of each cilium in focus in single-plane acquisition.

      Thank you for your comments. We apologize for the lack of details in the methodology. We have added a detailed description in the 'Materials and Methods' section and illustrated the experimental paradigm in Figure S2A of the revised manuscript. In most tissues we examined, the length of cilia was relatively uniform, except in the crista. The cilia in the crista were significantly longer than those in other tissues, with lengths varying between 5 and 30 μm. We categorized the cilia lengths in the crista into three groups at intervals of 10 μm and measured the anterograde and retrograde velocities of IFT in each group. The results, shown in Figure S4, revealed no significant difference in IFT velocity among the different cilia lengths within the same tissue. Regarding the imaging, all IFT movies were captured in a single focal plane. In most cases, we did not observe significant velocity variability within the same cilium.
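
      The length grouping described above can be sketched as follows. This is only an illustration, not the analysis code used for Figure S4: the bin edges (0-10, 10-20 and 20-30 μm), the function names and the sample values are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean


def length_bin(length_um, width_um=10.0, n_bins=3):
    """Return the 0-based length group of a cilium.

    Assumed groups: 0 = [0, 10) um, 1 = [10, 20) um, 2 = [20, 30] um.
    Lengths at or above the last edge are clamped into the last group.
    """
    if length_um < 0:
        raise ValueError("cilium length must be non-negative")
    return min(int(length_um // width_um), n_bins - 1)


def mean_velocity_by_bin(measurements, width_um=10.0, n_bins=3):
    """Average IFT velocity per length group.

    `measurements` is an iterable of (cilium_length_um, velocity_um_per_s).
    """
    groups = defaultdict(list)
    for length_um, velocity in measurements:
        groups[length_bin(length_um, width_um, n_bins)].append(velocity)
    return {group: mean(values) for group, values in sorted(groups.items())}


# Placeholder numbers for illustration only, not measured data.
example = [(5.0, 1.0), (8.0, 2.0), (15.0, 3.0), (30.0, 4.0)]
summary = mean_velocity_by_bin(example)
```

      Anterograde and retrograde velocities would be passed through the same function separately, so that each direction is compared across the three length groups.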

      (2) It is very difficult to directly observe the large differences in IFT velocity from the kymographs, especially in the case of shorter cilia and retrograde motion in them. The quality of the example kymographs could be improved and more zoomed in several cases.

      Thank you for this suggestion. We have modified this.

      (3) The authors do not describe at all how velocity analysis was done on the kymographs. Were lines drawn manually on the kymographs? From the movies and the kymographs it is visible that the IFT motion is often variable and sometimes gets stuck. How did the authors determine the velocities of such trains? A single slope through the entire train or part of the train? Were they consistent with this? Such variable motion is not so easy to discern in the case of really short cilia. The authors could use a more automatic way of extracting velocities from kymographs using tools such as KymographDirect or KymoButler. Keeping in mind that IFT velocity is the main parameter studied in this work, it is important that the analysis is robust.

      We apologize for the previous lack of detailed description. We utilized ImageJ software to generate kymographs, where particles appear as lines. For a moving particle, this line appears oblique. We manually drew lines on the kymographs, and the velocity of particles was calculated based on the slope (Zhou et al., 2001). We only analyzed particles that tracked the full length of the cilia. Following the reviewer's suggestions, we also used the automatic software KymographDirect to calculate the velocity of IFT particles. The results were similar to those calculated using the previous method. These new data are now shown in Figure S2B-D. For shorter cilia, we only used particles with clear moving paths for our calculations. In the revised version, we have included a detailed description of the velocity analysis methods.
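
      As an illustration of the slope-based calculation, the sketch below converts a line drawn on a kymograph into a velocity. It is not the ImageJ or KymographDirect workflow itself: the function name, the axis convention (position increasing from base to tip) and the calibration values are assumptions made for the example.

```python
def kymograph_velocity(start, end, um_per_pixel, s_per_frame):
    """Velocity (um/s) of a particle from a line drawn on a kymograph.

    `start` and `end` are (position_px, time_px) endpoints of the line.
    With position increasing from ciliary base to tip (an assumed
    convention), positive values are anterograde and negative values
    are retrograde.
    """
    distance_um = (end[0] - start[0]) * um_per_pixel
    elapsed_s = (end[1] - start[1]) * s_per_frame
    if elapsed_s == 0:
        raise ValueError("line has no extent along the time axis")
    return distance_um / elapsed_s


# Hypothetical calibration: 0.1 um per pixel, 0.25 s per frame.
anterograde = kymograph_velocity((0, 0), (50, 20), 0.1, 0.25)
retrograde = kymograph_velocity((50, 0), (0, 10), 0.1, 0.25)
```

      A vertical line on such a kymograph (no change in position) gives a velocity of zero, which corresponds to a stalled particle.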

      (4) In line with the previous point, as visible from the kymographs the velocity is significantly slower near the transition zone. Did the authors make sure they are not including the region around the transition zone while measuring the IFT velocity, especially in the case of shorter cilia?

      Thank you for the comment. In the revised manuscript, we automatically extracted the path of each particle using KymographDirect software. Quantification of each particle's velocity versus position in the crista reveals that anterograde IFT proceeds from the base to the tip at a relatively constant speed, whereas retrograde IFT accelerates slightly when returning to the base (Fig. S2E). This finding differs from observations in C. elegans, in which dynein-2 first accelerates and then decelerates back to 1.2 μm/s adjacent to the ciliary base (Yi et al, 2017). We believe it is very unlikely that the slow IFT velocity is due to the calculation of IFT only in the transition zone of shorter cilia.
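
      A velocity-versus-position profile of this kind can be derived from an extracted particle path by finite differences. The sketch below is a generic illustration under that assumption and does not reproduce KymographDirect's internal method; the function name and the sample path are hypothetical.

```python
def velocity_vs_position(times_s, positions_um):
    """Local velocity along a particle path.

    Returns a list of (midpoint_position_um, velocity_um_per_s), one entry
    per pair of consecutive samples of the path.
    """
    if len(times_s) != len(positions_um):
        raise ValueError("times and positions must have the same length")
    profile = []
    samples = list(zip(times_s, positions_um))
    for (t0, x0), (t1, x1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        profile.append(((x0 + x1) / 2, (x1 - x0) / dt))
    return profile


# Hypothetical path: the particle speeds up between the two segments.
profile = velocity_vs_position([0.0, 1.0, 2.0], [0.0, 1.0, 3.0])
```

      A roughly flat profile corresponds to constant-speed transport, while a profile whose magnitude rises toward the base would correspond to the slight retrograde acceleration described above.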

      (5) There are several fascinating findings in this work that the authors do not discuss properly. Firstly, do the authors have a hypothesis as to why IFT speeds are so radically different in different cilia types, given that they are driven by the same motor proteins and have the same ATP levels? They make a big claim in this paper that IFT train sizes correlate with train velocities. IFT trains have a highly ordered structure with regular binding sites for motor proteins. So, a smaller train would have a proportional number of motors attached to them. Why (and how) are the motors moving trains so slowly in some cilia and not in others? If there is no clear answer, the authors must put forward the open question with greater clarity.

      Thank you for the comment. We hypothesize that if multiple motors drive the movement of cargoes synergistically, it could increase the speed of IFT transport. An example supporting this hypothesis is the principle of multiple-unit high-speed trains, which use multiple motors in each individual car to achieve high speeds. Of course, this is just one hypothesis, and we cannot exclude other possibilities, such as the use of different adaptors in different cell types. We have revised our conclusions accordingly in the updated manuscript.

      (6) They find that IFT speeds do not change in kif17 mutants. Are the cilia length also similar (does not appear to be the case in Figure 4 and Figure S3)? Cilia length needs to be quantified. Further, they mention that in C elegans, heterotrimeric kinesin-2 and homodimeric kinesin-2 coordinate IFT. However, from several previous studies, we know that in Chlamydomonas and in mammalian cilia IFT is driven primarily by heterotrimeric kinesin-2 with no evidence that homodimeric kinesin-2 is linked with driving IFT. It appears to be the same in zebrafish. This is an interesting finding and needs to be discussed far more comprehensively.

      Thank you for your comments. We have previously shown that the number and length of crista cilia were grossly normal in kif17 mutants (Zhao et al, 2012). The length of crista cilia displayed slight variability even in wild-type larvae. We quantified the length of cilia in both the crista and neuromast within different mutants, and our analysis revealed no significant difference (see Author response image 1). We agree with the reviewer that Kif17 may play a minor role in driving IFT in cilia. However, previous studies have shown that KIF17 exhibits robust, processive particle movement in both the anterograde and retrograde directions along the entire olfactory sensory neuron cilia in mice. This suggests that, although not essential, KIF17 may also be involved in IFT (Williams et al., 2014). We have added more discussion about Kif17 and heterotrimeric kinesin in the appropriate section of the revised manuscript.

      Author response image 1.

      Statistical significance is based on the Kruskal-Wallis test followed by Dunn's multiple comparisons test. n.s., not significant, p>0.05.
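
      For reference, the Kruskal-Wallis H statistic can be computed as in the sketch below. This is a plain-Python illustration of the standard formula with tie correction, not the software used to analyze Author response image 1; it does not compute a p-value or Dunn's post hoc comparisons, and the function names and sample groups are assumptions.

```python
from itertools import chain


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = list(chain.from_iterable(groups))
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical")
    return h / correction


# Illustrative groups only, not measured cilia lengths.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

      Under the null hypothesis, H is compared against a chi-squared distribution with (number of groups - 1) degrees of freedom; small groups are usually handled with exact tables or a statistics package.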

      (7) Again, they find that IFT speeds do not change in BBS-4 mutants. I have the same comment about the cilia length as for kif17 mutants. Further, the discussion for this finding is lacking. The authors mention that IFT is disrupted in BBSome mutants of C elegans. Is this the case in other organisms as well? Structural studies on IFT trains reveal that BBSomes are not part of the core structure, while other studies reveal that BBSomes are not essential for IFT. So perhaps the results here are not too surprising.

      We agree with the reviewer that the BBSome is possibly not essential for IFT in most cilia. However, in the cilia of olfactory sensory neurons, the BBSome is involved in IFT in both mice and nematodes (Ou et al, 2005; Williams et al., 2014). We have added more discussion about the BBSome in the appropriate section of the revised manuscript.

      (8) No change in IFT velocities in kif3b mutants is rather surprising. The authors suggest that Kif3C homodimerizes to carry out IFT in the absence of Kif3B. Even if that is the case, the individual homodimer constituents of heterotrimeric kinesin-2 have been shown in previous studies to have different motor properties when homodimerized artificially. Why is IFT not affected in these mutants? This should be discussed. Also, the cilia lengths should be quantified.

      We think the presence of the Kif3A/Kif3C/KAP3 trimeric kinesin may substitute for the Kif3A/Kif3B/KAP3 motors in kif3b mutants, which show normal length of cristae cilia. The Kif3A/Kif3C/KAP3 trimeric kinesin may have similar transport speeds as the Kif3A/Kif3B/KAP3 motors. We did not propose that the Kif3C homodimer can drive the cargoes alone. We apologize for this misunderstanding. Additionally, we have reevaluated the IFT velocities among different lengths of cristae cilia and found no difference between longer and shorter cilia within the same cell types.

      (9) The findings with tubulin modifications should also be discussed in comparison to what has been observed in other organisms.

      We have added further discussion about this result in the revised manuscript.

      (10) The authors find that IFT velocity is lower in ift88 morphants. They also find that the cilia length is shorter (in which cilia type?). Immunofluorescence experiments show that the IFT particle number and size are lower in the ift88 morphants. How many organisms did they look at for this data? What is the experimental variability in intensity measurements in immunofluorescence experiments? Wouldn't the authors expect much higher variability in ift88 morphants (between individual organisms) due to different amounts of IFT88 than for wildtype?

      Thank you for your comments. We apologize for the lack of information regarding the number of organisms observed in Figure 5. These numbers have been added to the figure legends in the revised manuscript. When a low dose of ift88 morpholino was injected, we observed significant shortening of cilia in the ear crista, along with reduced IFT speed. We measured the fluorescence intensity of different IFT particles and found a positive correlation between IFT particle size and fluorescence intensity (Fig 5I). Moreover, the variability of cilia length in cristae is slightly higher in ift88 morphants. These new data have been included in the revised version.

      (11) From their observations they make the claim that IFT velocity is directly proportional to IFT train size. Now within every cilium, IFT trains have large size variations, given the variable intensities for different IFT trains. The authors themselves show that they resolve far more trains when imaging with STED (possibly because they are able to visualize the smaller trains). Is the IFT velocity within the same cilium directly correlated with the intensity of the train, both for wildtype and ift88 morphants? That is the most direct way the authors can test that their hypothesis is true. Higher intensity (larger train size) results in faster velocity. From a qualitative look at their movies, I do not see any strong evidence for that.

      Thank you for your comments. We have measured the particle size and fluorescence intensity of 3 dpf crista cilia using high-resolution images acquired with Abberior STEDYCON. The results, shown in Figure 5I, demonstrate a positive correlation between particle size and fluorescence intensity.
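
      A correlation of this kind can be quantified as sketched below. The response does not state which correlation coefficient was used for Figure 5I, so the choice of Pearson's r here is an assumption, as are the function name and the sample values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Placeholder values: particle sizes and matching intensities.
sizes = [1.0, 2.0, 3.0, 4.0]
intensities = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(sizes, intensities)
```

      The same function could in principle be applied to per-train velocity and intensity within a single cilium, which is the comparison the reviewer asks about.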

      (12) Are the sizes of both anterograde and retrograde trains lower in ift88 morphants? It's not clear from the data. It should be clearly stated that the authors speculate this and this is not directly evident from the data.

      Because the size of IFT fluorescence particles is based on immunostaining results, not live imaging, we cannot determine whether they are anterograde or retrograde IFT particles. Therefore, we can only speculate that possibly both anterograde and retrograde trains are reduced in ift88 morphants.

      (13) The biggest claim in this paper is that the cilia lengths in different organs are different due to differences in IFT train sizes. This is based on highly preliminary data shown in Figure 5C (how many organisms did they measure?). The difference is marginal and the dataset for spinal cord cilia is really small. The internal variability within the same cilia type is larger than the difference. How is this tiny difference resulting in such a large difference in IFT speeds? I believe their conclusions based on this data are incorrect.

      From our results, we believe that IFT velocity is related to cell types rather than the length of cilia (Fig. S4), which has also been mentioned in previous studies (Williams et al., 2014). We agree with the reviewer that the evidence for faster IFT speed due to larger train size is not very solid. We have accordingly softened our conclusion and mentioned other possibilities in the revised version.

      Minor comments:

      (1) The authors only mention the number of IFT particles for their data. They should provide the number of cilia and the number of organisms as well.

      Thank you for your suggestion. We added the number of cilia and organisms next to the number of particles in Figure 3, Figure S2-S5 and Table S1 of the revised manuscript.

      (2) Cilia and flagella are similar structurally but not the same. The authors should change the following sentence: In contrast to the localization of most organelles within cells, cilia (also known as flagellar) are microtubule-based structures that extend from the cell surface, facilitating a more straightforward quantification of their size.  

      Thank you for the detailed review. We have changed it in our revised manuscript. 

      (3) The authors should provide references here. For example, Chlamydomonas has two flagella with lengths ranging from 10 to 14 μm, while sensory cilia in C. elegans vary from approximately 1.5 μm to 7.5 μm. In most mammalian cells, the primary cilium typically measures between 3 and 10 μm.  

      We have added it in our revised manuscript. 

      (4) They should mention ovl mutants are IFT88 mutants when they introduce it in the main text.

      We have added it in our revised manuscript. 

      (5) Correct the grammar here: The velocity of IFT within different cilia also seems unchanged (Figure 4F, Movie S9, Table S1).  

      We have changed it. 

      (6) Correct the grammar here: Similarly, the IFT speeds also exhibited only slight changes in ccp5 morphants, which decreased the deglutamylase activities of Ccp5 and resulted in a hyperglutamylated tubulin

      We have changed it. 

      Reviewer #3 (Recommendations For The Authors):

      Introduction:

      1st paragraph, "flagellar" should be "flagella"; 2nd paragraph, "result a wide range of" should be "result in a...".  

      We have changed it. 

      Results and discussion:

      "...certain specialized cell types, including olfactory epithelia and pronephric duct, ...": olfactory epithelia and pronephric duct are tissues, not cells.

      "...the GFP fluorescence of the transgene was prominently enriched in the cilia (Fig 1D)": Fig 2D?

      "The velocity of IFT within different cilia was also seems unchanged (Fig. 4 F, Movie S9, Table S1)": "was" and "seems" cannot be used together.

      "...driven by b-actin2 promotor": β-actin2?

      "...each dynein motor protein might propel multiple IFT complexes": The "protein" should be deleted.  

      Thanks. We have corrected all of these mistakes.  

      Figures:

      Figure 1: Dyes and antibodies used other than the anti-acetylated tubulin antibody should be mentioned. The developmental stages of zebrafish used for the imaging are mostly missing.

      Thanks. In the revised version, we have updated the figure legends to include descriptions of the antibodies, developmental stages, as well as N numbers.

      Figure 2B: What "hphs" means should be explained somewhere.  

      Thanks. We have added the full names of these abbreviations.

      Figures 3A-E: For clarity, the cilia whose IFT kymographs are shown should be marked. "Representative particle traces are marked with white lines in panels D and E" (legend): they are actually black lines. The authors should also clearly disclose the developmental stages of zebrafish used for the imaging.  

      Thank you for your comments. In the revised manuscript, the cilia used to generate the kymograph are marked by yellow arrows. We have updated the legend to change "white" to "black." Additionally, we have included the developmental stages of zebrafish used for imaging in Figure 3A.

      Figures 3G-K: The authors used quantification results from 4-dpf larvae and 30-hpf embryos for comparisons. Nevertheless, according to their experimental scheme in Figure 2B, 30-hpf embryos were not subjected to heat-shock treatment and genotyping. How could they express Ift88-GFP for the imaging? How could the authors choose larvae of the right genotypes? In addition, even if the authors heat-shocked them in time but forgot to mention, there are issues that need to be clarified experimentally and/or through citations, at least through discussions. Firstly, at 30 hpf, those motile cilia are probably still elongating. If this is the case, their final lengths would be longer than those presented (H; the authors need to disclose whether the lengths were measured from ciliary Ift88-GFP or another marker). In other words, the correlation with IFT velocities (H and I) might no longer exist when mature cilia were measured. Similarly, cilia undergo gradual disassembly during the cell cycle. Epidermal cells at 30-hpf are likely proliferating actively, and the average length of their cilia (H) would be shorter than that measured from quiescent epidermal cells in later stages.

      Thank you for these comments. First, we want to clarify that Figure 2B depicts the procedure for heat shock experiments conducted for the ovl mutants' rescue assay, not the experimental procedure for IFT imaging. We visualized IFT in five types of cilia using Tg(hsp70l:ift88-GFP) embryos without the ovl mutant background. In the revised manuscript, we have provided a detailed description of embryo treatment in the 'Materials and Methods' section and illustrated the experimental paradigm in Figure S2A.

      Regarding the ciliary length differences between different developmental stages, we quantified cilia length in epidermal cells at 30 hpf versus 4 dpf, and in pronephric duct cilia at 30 hpf versus 48 hpf. Our analysis found no significant difference in length between earlier and later stages. Additionally, IFT velocities were comparable between these stages. These findings suggest that slower IFT velocities may not be attributed to the selection of different embryonic stages. Furthermore, we demonstrated that longer and shorter cilia maintain similar IFT velocities in crista cilia, indicating that elongated cilia within the same cell type exhibit comparable IFT velocities. These new results are presented in Figures S4 and S5 in the revised version.

      Secondly, do IFT velocities differ between elongating and mature cilia or remain relatively constant for a given cell type? The authors apparently take the latter for granted without even discussing the possibility of the former. In addition, whether the quantification results were from cilia of one or multiple fish, an important parameter to reflect the reproducibility, and sample sizes for the length data are not disclosed. The lack of descriptions on sample sizes and the number of independent experiments or larvae examined are actually common for statistical results in this manuscript.

      Thank you for your comments. We apologize for omitting the basic description of sample sizes and the number of cilia analyzed. We have addressed these issues in the revised manuscript. The length of 4 dpf crista cilia is variable, with longer cilia reaching up to 30 µm and shorter cilia measuring only around 5 µm within the same crista. We categorized the crista cilia by length into three groups at intervals of 10 µm and measured anterograde and retrograde velocities of IFT in each group. The results revealed no significant difference in IFT velocity between elongating and mature cilia within the crista. These supplementary data are now included in Figure S4.

      Figures 4A-B: When mutating neither Kif17 nor Kif3b affected the IFT of crista cilia, the data unlikely "suggest that the variability in IFT speeds among different cilia cannot be attributed to the use of different motor proteins". In fact, in the cited publication (Zhao et al., 2012), the authors used the same and additional mutants (Kif3c and Kif3cl) to demonstrate that different IFT-related kinesin motors have different effects on ciliogenesis and ciliary length in different tissues, results actually implying tissue-specific contributions of different kinesin motors to IFT. Furthermore, although likely only cytoplasmic dynein-2 is involved in the retrograde IFT, the authors cannot exclude the possibility that different combinations or isoforms of its many subunits and regulators contribute to the velocity regulation. Therefore, the authors need to reconsider their wording. This reviewer would suggest that the authors examine the IFT status of cilia that were previously reported to be shortened in the Kif3b mutant to see whether the correlation between ciliary length and IFT velocities still stands. This would actually be a critical assay to assess whether the proposed correlation is only a coincidence or indeed has a certain causality.

      Thank you for your comments. The shortened cilia observed in Kif3b mutants may be attributed to the presence of maternal Kif3b proteins, making it challenging to exclude the involvement of Kif3b motor. Regarding the relationship between IFT speed variability and motor proteins, we agree with the reviewer that we cannot entirely dismiss the possibility of different motors or adaptors being involved. We have revised our description of this aspect accordingly.

      Figures 4C-G: Similarly, when the authors found that tubulin glycylation or glutamylation has little effect on IFT, they cannot use these observations to exclude possible influences of other types of tubulin modifications on IFT. They should only stick to their observations.

      Yes, we agree. We have changed the description in the revised manuscript.

      Figure 5:

      A-C: When the authors only compared immotile cilia of crista with motile cilia of the spinal cord, it is hard to say whether the difference in particle size is correlated with ciliary length or motility. Cilia from more tissues should be included to strengthen their point, especially when the authors want to make this point the central one.

      D: The authors showed that ovl larvae containing Tg(hsp70l:ift88 GFP) (as they do not indicate the genotype, this reviewer can only deduce) display normal body curvature at 2 dpf after the injection of 0.5 ng of ift88 MO. Such a result, however, is quite confusing. According to their experimental scheme in Figure 2B, these larvae were not subjected to heat shock induction for Ift88-GFP. Do ovl larvae containing Tg(hsp70l:ift88 GFP) naturally display normal body curvature at 2 dpf? 

      Thank you for your comments. Due to technical limitations, comparing IFT particle size across different cilia using STED is challenging. We agree with this reviewer that the evidence supporting this aspect is relatively weak. Accordingly, we have modified and softened our conclusion in the revised version.

      Regarding the injection of ift88 morpholino, we want to clarify that we injected it into wild-type embryos, not ovl mutants. The lower dose of ift88 morpholino (0.5 ng) partially knocked down Ift88, allowing embryos to maintain a grossly normal body axis while resulting in shorter cilia in the ear crista.

      E: The authors need to indicate the developmental stage of the larvae examined. One piece of missing data is global expression levels of both endogenous (maternal) Ift88 and exogenous Ift88-GFP in zebrafish larvae that are either uninjected, 8-ng-ift88 MO-injected, or 0.5-ng-ift88 MO-injected, preferably at multiple time points up to 3 dpf. The results will clarify (1) the total levels of Ift88 following time; (2) the extent of downregulation the MO injections achieved at different developmental stages; and importantly (3) whether the low MO dosage (0.5 ng) indeed allowed a persistent downregulation to affect IFT trains at 3 dpf, a time the authors made the assays for Figures 5F-J to reach the model (K). It will be great to include wild-type larvae for comparison.

      Thank you for these valuable suggestions. The ift88 morpholino (MO) was designed to block the splicing of ift88 transcripts and has been used in multiple studies. This morpholino specifically blocks the expression of endogenous ift88, while the expression of the Ift88-GFP transgene remains unaffected. It would be beneficial to titrate the expression level of Ift88 in the morphants at different stages. Unfortunately, we do not have access to a zebrafish Ift88 antibody. We assessed the effects of a lower amount of MO based on our observation that the fish maintained a normal body axis while exhibiting shorter cilia. Ideally, the amount of Ift88 should be lower in the morphants, considering the presence of ciliogenesis defects. We have included additional comments regarding this limitation in the revised version.

      Movies:

      Movies 1-5: Elapsed time is not provided. Furthermore, cilia in the pronephric duct and spinal cord are known to beat rapidly. Their motilities, however, appear to be largely compromised in Movies 3 and 4. Although the quantification results in Fig 3G imply that the authors imaged 30hpf embryos for such cilia, there is no statement on real conditions.

      Thank you for your comments. We apologize for omitting the elapsed time in our movies. We have addressed this issue in the revised manuscript. Motile cilia are difficult to image due to their fast beating. To immobilize the moving cilia and enable the capture of IFT movement within the cilia, we gently pressed the embryo with a round cover glass to inhibit the beating of cilia. Data from each embryo were collected within 5 minutes to avoid the impact of embryo death on the results. We have added a detailed description in the 'Materials and Methods' section.

      Materials:

      The sequence of morpholino oligonucleotide against ift88 is missing.  

      We have added the sequence of ift88 morpholino in the revised manuscript.

      References:

      Important references are missing, including (1) the paper by Leventea et al., 2016 (PMID: 27263414), which shows cilia morphologies in various zebrafish tissues with more detailed descriptions of tissue anatomies and experimental techniques; (2) papers documenting that dynein motors "move faster than Kinesin motors" in IFT of C. reinhardtii and C. elegans cilia; and (3) the paper by Li et al., 2020 (PMID: 33112235), in which the authors constructed a hybrid IFT kinesin to markedly reduce anterograde IFT velocity (~2.8 fold) and IFT injection rate in C. reinhardtii cilia and found only a mild reduction (~15%) in ciliary length. This paper is important because it is a pioneering one that elegantly investigated the relationship between IFT velocity and ciliary length. The findings, however, do not necessarily contradict the current manuscript due to differences in, e.g., model organisms and methodology.

      Thank you for the detailed review. We have cited these references in the appropriate places in the revised manuscript.

      Reference

      Broekhuis JR, Verhey KJ, Jansen G (2014) Regulation of cilium length and intraflagellar transport by the RCK-kinases ICK and MOK in renal epithelial cells. PLoS One 9: e108470

      Kunova Bosakova M, Varecha M, Hampl M, Duran I, Nita A, Buchtova M, Dosedelova H, Machat R, Xie Y, Ni Z et al (2018) Regulation of ciliary function by fibroblast growth factor signaling identifies FGFR3-related disorders achondroplasia and thanatophoric dysplasia as ciliopathies. Hum Mol Genet 27: 1093-1105

      Luo W, Ruba A, Takao D, Zweifel LP, Lim RYH, Verhey KJ, Yang W (2017) Axonemal Lumen Dominates Cytosolic Protein Diffusion inside the Primary Cilium. Sci Rep 7: 15793

      Ou G, Blacque OE, Snow JJ, Leroux MR, Scholey JM (2005) Functional coordination of intraflagellar transport motors. Nature 436: 583-587

      See SK, Hoogendoorn S, Chung AH, Ye F, Steinman JB, Sakata-Kato T, Miller RM, Cupido T, Zalyte R, Carter AP et al (2016) Cytoplasmic Dynein Antagonists with Improved Potency and Isoform Selectivity. ACS Chem Biol 11: 53-60

      Williams CL, McIntyre JC, Norris SR, Jenkins PM, Zhang L, Pei Q, Verhey K, Martens JR (2014) Direct evidence for BBSome-associated intraflagellar transport reveals distinct properties of native mammalian cilia. Nat Commun 5: 5813

      Yi P, Li WJ, Dong MQ, Ou G (2017) Dynein-Driven Retrograde Intraflagellar Transport Is Triphasic in C. elegans Sensory Cilia. Curr Biol 27: 1448-1461 e1447

      Zhao C, Omori Y, Brodowska K, Kovach P, Malicki J (2012) Kinesin-2 family in vertebrate ciliogenesis. Proceedings of the National Academy of Sciences 109: 2388 - 2393

      Zhou HM, Brust-Mascher I, Scholey JM (2001) Direct visualization of the movement of the monomeric axonal transport motor UNC-104 along neuronal processes in living Caenorhabditis elegans. J Neurosci 21: 3749-3755

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Because tRNA-sequencing methods have not been widely used (compared to mRNA-seq), many readers would not be familiar with the characteristics of different methods introduced in this study (QuantM-tRNA, mim-tRNA, YAMAT, DM-tRNA, and ALL-tRNA; bowtie2-based, SHRiMP, and mimseq; what are the main features of "Salmon?"). The manuscript will read better when the basic features of these methods are described in the manuscript, however brief.

      Introduction page 4 now clarifies a little more the difference between bowtie2, SHRiMP and mimseq. Results page 9 briefly summarises the differences between the tRNA-Seq methods. Results page 14 clarifies how Decision and Salmon work.

      Reviewer 2:

      (1) The explanation of the parameter D for bowtie2 sounds ambiguous. "How much effort to expend" needs to be explained in more detail.

      Results page 6 gives a more precise explanation of the D parameter.

      (2) Please provide optimal parameters (L and D) for tRNA-seq alignment.

      I think an optimal parameterisation is not possible to determine here. It will depend on the species, the frequency of misincorporations due to modifications (tRNA-Seq protocol specific) and how long one is willing to let bowtie2 continue searching for a better match. The point of Figure 1a is that D needs to be increased if L is decreased and an error is allowed in the seed. I think the sentence in the results section describing Figure 1a is the appropriate way to express this without committing to a single ‘optimal’ parameterisation: ‘We observed that when an error in the seed is allowed, as the seed length is decreased, there needs to be a concomitant increase in effort expended to allow bowtie2 more opportunities to find the best possible alignment, especially with respect to the Transcript ID’.
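
      As a rough illustration of the trade-off described above, the three bowtie2 options involved (seed length `-L`, seed mismatches `-N`, and the number of failed seed extensions tolerated `-D`) can be assembled as follows. This is only a sketch: the index name and file names are hypothetical placeholders, and the default values are the L=10 and D=100 raised in the reviewer's next point, not a recommended setting.

```python
def bowtie2_command(index, reads, out_sam, seed_len=10, effort=100, seed_mismatches=1):
    """Build a bowtie2 argument list for single-end reads.

    index, reads and out_sam are hypothetical placeholders supplied by the caller.
    """
    # bowtie2 only accepts 0 or 1 mismatches in the seed (-N).
    if seed_mismatches not in (0, 1):
        raise ValueError("-N must be 0 or 1")
    # bowtie2 requires the seed length (-L) to be >3 and <32.
    if not 3 < seed_len < 32:
        raise ValueError("-L must be between 4 and 31")
    return [
        "bowtie2",
        "-x", index,                  # bowtie2 index prefix
        "-U", reads,                  # unpaired reads
        "-S", out_sam,                # SAM output
        "-L", str(seed_len),          # shorter seeds give more candidate hits
        "-N", str(seed_mismatches),   # allow an error in the seed
        "-D", str(effort),            # failed extensions tolerated before giving up
    ]


# Hypothetical usage; pass the list to subprocess.run() to execute it.
print(" ".join(bowtie2_command("trna_index", "reads.fastq", "aligned.sam")))
```

      Lowering `seed_len` or setting `seed_mismatches=1` enlarges the set of candidate alignments, which is why `effort` is raised alongside them in this sketch.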

      (3) I think the authors chose L=10 and D=100 based on Figure 1A. Which dataset did you choose for this parameterization among ALL-tRNAseq, DM-tRNAseq, mim-tRNAseq, QuantM-tRNA-seq, and YAMAT-seq?

      Figure 1A is based on simulation of full-length reads with only sequencing errors, i.e., not from any tRNA-Seq method in particular. This is stated in the results text and I’ve clarified it in the figure legend.

      (4) Salmon does not need a read alignment process such as Bowtie2. Hence, it is not clear "Only results from alignment with bowtie2" in Figure legend for Figure 4a.

      I’m using Salmon in ‘alignment-mode’, taking the alignments from bowtie2. I’ve clarified this in results page 14.
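
      For readers unfamiliar with this mode, a minimal sketch of how Salmon can be pointed at existing alignments rather than raw reads is given below. The file names are hypothetical, and the library type must be chosen by the user; the exact invocation used in the manuscript is not reproduced here.

```python
def salmon_alignment_mode_command(transcripts_fasta, alignments, lib_type, out_dir):
    """Build a `salmon quant` argument list for alignment-based quantification.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        "salmon", "quant",
        "-t", transcripts_fasta,  # the reference the reads were aligned against
        "-l", lib_type,           # library type, e.g. "U" for unstranded single-end
        "-a", alignments,         # SAM/BAM alignments, here produced by bowtie2
        "-o", out_dir,            # output directory for the quantification
    ]


# Hypothetical usage; pass the list to subprocess.run() to execute it.
print(" ".join(salmon_alignment_mode_command("trnas.fa", "aligned.bam", "U", "salmon_out")))
```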

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This is an interesting and potentially important paper, which however has some deficiencies.

      Strengths:

      A significant amount of potentially useful data.

      Weaknesses:

      One issue is a confusion of thermal stability with solubility. While thermal stability of a protein is a thermodynamic parameter that can be described by the Gibbs-Helmholtz equation, which relates the free energy difference between the folded and unfolded states as a function of temperature, as well as the entropy of unfolding, what is actually measured in PISA is a change in protein solubility, which is an empirical parameter affected by a great many variables, including the presence and concentration of other ambient proteins and other molecules. One might possibly argue that in TPP, where one measures the melting temperature change ∆Tm, thermal stability plays a decisive or at least an important role, but no such assertion can be made in PISA analysis that measures the solubility shift.

      We completely agree with the insightful comment from the reviewer and we are very grateful that the point was raised. Our goal was to make this manuscript easily accessible to the entire scientific community, not just experts in the field. In an attempt to simplify the language, we likely also simplified the underlying physical principles that these assays exploit. In defense of our initial manuscript, we did state that PISA measures “a fold change in the abundance of soluble protein in a compound-treated sample vs. a vehicle-treated control after thermal denaturation and high-speed centrifugation.” Despite this attempt to accurately communicate the reviewer’s point, we seem to have not been sufficiently clear. Therefore, we tried to further elaborate on this point and made it clear that we are measuring differences in solubility and interpreting these differences as changes in thermal stability. 

      In the revised version of the manuscript, we elaborated significantly on our original explanation. The following excerpt appears in the introduction (p. 3):

      “So, while CETSA and TPP measure a change in melting temperature (∆TM), PISA measures a change in solubility (∆SM). Critically, there is a strong correlation between ∆TM and ∆SM, which makes PISA a reliable, if still imperfect, surrogate for measuring direct changes in protein thermal stability (Gaetani et al., 2019; Li et al., 2020). Thus, in the context of PISA, a change in protein thermal stability (or a thermal shift) can be defined as a fold change in the abundance of soluble protein in a compound-treated sample vs. a vehicle-treated control after thermal denaturation and high-speed centrifugation. Therefore, an increase in melting temperature, which one could determine using CETSA or TPP, will lead to an increase in the area under the curve and an increase in the soluble protein abundance relative to controls (positive log2 fold change). Conversely, a decrease in melting temperature will result in a decrease in the area under the curve and a decrease in the soluble protein abundance relative to controls (negative log2 fold change).”

      And the following excerpt appears in the results section (p. 4): 

      “In a PISA experiment, a change in melting temperature or a thermal shift is approximated as a significant deviation in soluble protein abundance following thermal melting and high-speed centrifugation. Throughout this manuscript, we will interpret these observed alterations in solubility as changes in protein thermal stability. Most commonly this is manifested as a log2 fold change comparing the soluble protein abundance of a compound treated sample to a vehicle-treated control (Figure 1 – figure supplement 1A).”

      We have now drawn a clear distinction between what we were actually measuring (changes in solubility) and how we were interpreting these changes (as thermal shifts). We trust that the Reviewer will agree with this point, as they rightly note that many of the observations presented in our work, which measures thermal stability indirectly, are consistent with previous studies that measured thermal stability directly. Again, we thank the reviewer for raising the point and feel that these changes have significantly improved the manuscript.
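
      The readout described in the excerpts above can be illustrated with a small sketch. It assumes that the soluble abundances are already normalised and that replicates are summarised by their mean; the function name and the example values are hypothetical and do not come from the dataset.

```python
from math import log2
from statistics import mean


def pisa_log2_fold_change(treated, vehicle):
    """log2 ratio of mean soluble abundance, compound-treated vs. vehicle.

    Assumes already-normalised abundances of one protein (hypothetical inputs).
    """
    return log2(mean(treated) / mean(vehicle))


# Positive values: more soluble protein after heating (read as stabilisation).
print(pisa_log2_fold_change([2.0, 2.0], [1.0, 1.0]))
# Negative values: less soluble protein after heating (read as destabilisation).
print(pisa_log2_fold_change([0.5, 0.5], [1.0, 1.0]))
```

      The sign convention matches the excerpt: a positive log2 fold change corresponds to a larger soluble fraction relative to the control.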

      Another important issue is that the authors claim to have discovered for the first time a number of effects well described in prior literature, sometimes a decade ago. For instance, they marvel at the differences between the solubility changes observed in lysate versus intact cells, while this difference has been investigated in a number of prior studies. No reference to these studies is given during the relevant discussion.

      We thank the reviewer for raising this point. Our aim with this paper was to test the proficiency of this assay in high-throughput screening-type applications. We considered these observations as validation of our workflow, but admit that our choice of wording was not always appropriate and that we should have included more references to previous work. It was certainly never our intention to take credit for these discoveries. Therefore, we were more than happy to include more references in the revised version. We think that this makes the paper considerably better and will help readers better understand the context of our study.  

      The validity of statistical analysis raises concern. In fact, no calculation of statistical power is provided.

      As only two replicates were used in most cases, the statistical power must have been pretty limited. Also, there seems to be an absence of the multiple-hypothesis correction.

      We agree with the reviewer that a classical comparison using a t-test would be underpowered when comparing all log2 normalized fold changes. We know from the data and our validation experiments that stability changes that generate log2 fold changes of 0.2 are indicative of compound engagement. When we use 0.2 to calculate power for a standard two-sample t-test with duplicates, we estimate a power of 19.1%. Importantly, increasing this to n=3 results in a power estimate of only 39.9%, which would canonically still be considered underpowered. Thus, it is important to note that we instead use the distribution of all measurements for a single protein across all compound treatments to calculate standard deviations (nSD) as presented in this work. Rather than a 2-by-2 comparison, we are comparing two duplicate compound treatments to 94 other compound treatments and 18 DMSO vehicle controls. Moreover, we are using this larger sample set to estimate the sampling distribution. Estimating this with a standard z-test would result in a p-value estimate <<< 0.0001 using the population standard deviation. Additionally, rather than estimate an FDR using, say, a Benjamini-Hochberg correction, we estimated an empirical FDR for target calls by applying the same cutoffs to our DMSO controls and measuring the proportion of hits called in control samples at each set of thresholds. Finally, we note that several other PISA-based methods have used fold-change thresholds similar to, or less than, those employed in this work (PMID: 35506705, 36377428, 34878405, 38293219).
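
      The thresholding logic described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the manuscript: the function names are hypothetical, the cutoffs (|z| > 3.5 and |log2 fold change| > 0.2) are those quoted in the reviews, and the control hit rate is computed simply as the fraction of control measurements that pass both cutoffs.

```python
from statistics import mean, pstdev


def nsd_scores(log2fc):
    """z-score each measurement of one protein against all of its measurements."""
    mu = mean(log2fc)
    sd = pstdev(log2fc)  # population standard deviation across all treatments
    if sd == 0:
        return [0.0 for _ in log2fc]
    return [(v - mu) / sd for v in log2fc]


def call_hits(log2fc, z_cut=3.5, fc_cut=0.2):
    """A measurement is a hit only if it passes both the z-score and fold-change cutoffs."""
    z = nsd_scores(log2fc)
    return [abs(zi) > z_cut and abs(fc) > fc_cut for zi, fc in zip(z, log2fc)]


def control_hit_rate(log2fc, is_control, z_cut=3.5, fc_cut=0.2):
    """Fraction of control (e.g. DMSO) measurements called as hits at these cutoffs."""
    hits = call_hits(log2fc, z_cut, fc_cut)
    control_hits = [h for h, c in zip(hits, is_control) if c]
    return sum(control_hits) / len(control_hits) if control_hits else 0.0


# Hypothetical protein: 30 unchanged measurements (the first 5 are controls) and one shifted one.
values = [0.0] * 30 + [1.0]
controls = [True] * 5 + [False] * 26
print(call_hits(values).count(True), control_hit_rate(values, controls))
```

      Repeating `control_hit_rate` over a grid of cutoffs would give the proportion of control hits "at each set of thresholds", which is the quantity the empirical FDR is described as being based on.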

      Also, the authors forgot that whatever results PISA produces, even at high statistical significance, represent just a prediction that needs to be validated by orthogonal means. In the absolute majority of cases such validation is missing.

      We appreciate this point and can assure the reviewer that it was not lost on us. We state throughout the paper that the primary purpose of this work was to execute a chemical screen. Furthermore, we do not claim to present a definitive list of protein targets for each compound. Instead, our intention is to provide a framework for performing PISA studies at scale. In total, we quantified thousands of changes and feel that it would be unreasonable to validate the majority of these cases. Instead, as has been done for CETSA (PMID: 34265272), PISA (PMID: 31545609), and TPP (PMID: 25278616) experiments before, we chose to highlight a few examples and provide a reasonable amount of validation for these specific observations. In Figure 2, we show that two screening compounds, palbociclib and NVP-TAE-226, have a similar impact on PLK1 solubility as the two known PLK1 inhibitors. We then assay each of these compounds, alongside BI 2536, and show that the same compounds that impact the solubility of PLK1 also inhibit its activity in cell-based assays. Finally, we model the structure of palbociclib (which is highly similar to BI 2536) in the PLK1 active site. In Figure 4, we show that AZD-5438 causes a change in solubility of RIPK1 in cell- and lysate-based assays to a similar extent as other compounds known to engage RIPK1. We then test these compounds in cell-based assays and show that they are capable of inhibiting RIPK1 activity in vivo. Finally, in Figure 5, we show that treatment with tyrosine kinase inhibitors and AZD-7762 results in a decrease in the solubility of CRKL. We show that these compounds, specifically, prevent the phosphorylation of CRKL at Y207. Next, we show that AZD-7762 impacts the thermal stability of tyrosine kinases in lysate-based PISA. Finally, we performed phosphoproteomic profiling of cells treated with bafetinib and AZD-7762 and found that the abundance of many pY sites is decreased after treatment with each compound. It is also worth stating that an important goal of this study was to determine the proficiency of these methods in identifying the targets of each compound. We do not feel that comprehensive validation of the “absolute majority of cases” would significantly improve this manuscript.

      Finally, to be a community-useful resource the paper needs to provide the dataset with a user interface so that the users can data-mine on their own.

      We agree and are working to develop an extensible resource for this. Owing to the size and complexity of the dataset, that work will need to be included in a follow-up manuscript. For now, we feel that the supplemental table we provide allows the full dataset to be navigated easily. Indeed, this has been the main resource that we have been emailed about since the preprint was first made public. We are glad that the Reviewer considers this dataset to be a highly valuable resource for the scientific community.

      Reviewer #2 (Public Review):

      Summary:

      Using K562 (Leukemia) cells as an experimental model, Van Vracken et al. use Thermal Proteome Profiling (TPP) to investigate changes in protein stability after exposing either live cells or crude cell lysates to a library of anti-cancer drugs. This was a large-scale and highly ambitious study, involving thousands of hours of mass spectrometry instrument time. The authors used an innovative combination of TPP together with Proteome Integral Solubility Alteration (PISA) assays to reduce the amount of instrument time needed, without compromising on the amount of data obtained.

      The paper is very well written, the relevance of this work is immediately apparent, and the results are well-explained and easy to follow even for a non-expert. The figures are well-presented. The methods appear to be explained in sufficient detail to allow others to reproduce the work.

      We thank the reviewer. One of our major goals was to make these assays and the resulting data approachable, especially for non-experts. We are glad that this turned out to be the case. 

      Strengths:

      Using CDK4/6 inhibitors, the authors observe strong changes in protein stability upon exposure to the drug. This is expected and shows their methodology is robust. Further, it adds confidence when the authors report changes in protein stability for drugs whose targets are not well-known. Many of the drugs used in this study - even those whose protein targets are already known - display numerous off-target effects. Although many of these are not rigorously followed up in this current study, the authors rightly highlight this point as a focus for future work.

      Weaknesses:

      While the off-target effects of several drugs could've been more rigorously investigated, it is clear the authors have already put a tremendous amount of time and effort into this study. The authors have made their entire dataset available to the scientific community - this will be a valuable resource to others working in the fields of cancer biology/drug discovery.

      We agree with the reviewer that there are more leads here that could be followed and we look forward to both exploring these in future work and seeing what the community does with these data.

      Reviewer #3 (Public Review):

      Summary:

      This work aims to demonstrate how recent advances in thermal stability assays can be utilised to screen chemical libraries and determine the compound mechanism of action. Focusing on 96 compounds with known mechanisms of action, they use the PISA assay to measure changes in protein stability upon treatment with a high dose (10uM) in live K562 cells and whole cell lysates from K562 or HCT116. They intend this work to showcase a robust workflow that can serve as a roadmap for future studies.

      Strengths:

      The major strength of this study is the combination of live and whole cell lysates experiments. This allows the authors to compare the results from these two approaches to identify novel ligand-induced changes in thermal stability with greater confidence. More usefully, this also enables the authors to separate the primary and secondary effects of the compounds within the live cell assay.

      The study also benefits from the number of compounds tested within the same framework, which allows the authors to make direct comparisons between compounds.

      These two strengths are combined when they compare CHEK1 inhibitors and suggest that AZD-7762 likely induces secondary destabilisation of CRKL through off-target engagement with tyrosine kinases.

      Weaknesses:

      One of the stated benefits of PISA compared to the TPP in the original publication (Gaetani et al 2019) was that the reduced number of samples required allows more replicate experiments to be performed. Despite this, the authors of this study performed only duplicate experiments. They acknowledge this precludes the use of frequentist statistical tests to identify significant changes in protein stability. Instead, they apply an 'empirically derived framework' in which they apply two thresholds to the fold change vs DMSO: absolute z-score (calculated from all compounds for a protein) > 3.5 and absolute log2 fold-change > 0.2. They state that the fold-change threshold was necessary to exclude nonspecific interactors. While the thresholds appear relatively stringent, this approach will likely reduce the robustness of their findings in comparison to an experimental design incorporating more replicates. Firstly, the magnitude of the effect size should not be taken as a proxy for the importance of the effect.

      They acknowledge this and demonstrate it using their data for PIK3CB and p38α inhibitors (Figures 2B-C). They have thus likely missed many small, but biologically relevant changes in thermal stability due to the fold-change threshold. Secondly, this approach relies upon the fold-changes between DMSO and compound for each protein being comparable, despite them being drawn from samples spread across 16 TMT multiplexes. Each multiplex necessitates a separate MS run and the quantification of a distinct set of peptides, from which the protein-level abundances are estimated. Thus, it is unlikely the fold changes for unaffected proteins are drawn from the same distribution, which is an unstated assumption of their thresholding approach. The authors could alleviate the second concern by demonstrating that there is very little or no batch effect across the TMT multiplexes. However, the first concern would remain. The limitations of their approach could have been avoided with more replicates and the use of an appropriate statistical test. It would be helpful if the authors could clarify if any of the missed targets passed the z-score threshold but fell below the fold-change threshold.

      The authors use a single, high concentration of 10uM for all compounds. Given that many of the compounds likely have low nM IC50s, this concentration will often be multiple orders of magnitude above the one at which they inhibit their target. This makes it difficult to assess the relevance of the off-target effects identified to clinical applications of the compounds or biological experiments. The authors acknowledge this and use ranges of concentrations for follow-up studies (e.g. Figure 2E-F). Nonetheless, this weakness is present for the vast bulk of the data presented.

      We agree that there is potential to drive off-target effects at such high concentrations. However, we note that the concentration we employ is in the same range as previous PISA/CETSA/TPP studies. For example, 10 µM treatments were used in the initial descriptions of TPP (Savitski et al., 2014) and PISA (Gaetani et al., 2019). We also note that temperature may affect off-rates and binding interactions (PMID: 32946682), potentiating the need to use high compound concentrations to overcome these effects.

      Additionally, these compounds likely accumulate in human plasma/tissues at concentrations that far exceed the compound IC50 values. For example, in patients treated with a standard clinical dose of ribociclib, the concentration of the compound in the plasma fluctuates between 1 µM and 10 µM (Bao, X., Wu, J., Sanai, N., & Li, J. (2019). Determination of total and unbound ribociclib in human plasma and brain tumor tissues using liquid chromatography coupled with tandem mass spectrometry. Journal of pharmaceutical and biomedical analysis, 166, 197–204. https://doi.org/10.1016/j.jpba.2019.01.017).

      The authors' claim that combining cell-based and lysate-based assays increases coverage (Figure 3F) is not supported by their data. The '% targets' presented in Figure 3F have a different denominator for each bar. As it stands, all 49 targets quantified in both assays which have a significant change in thermal stability may be significant in the cell-based assay. If so, the apparent increase in % targets when combining reflects only the subsetting of the data. To alleviate this lack of clarity, the authors could update Figure 3F so that all three bars present the % targets figure for just the 60 compounds present in both assays.

      We spent much time debating the best way to present this data, so we are grateful for the feedback. Consistent with the Reviewer’s suggestion, we have included a figure that only considers the 60 compounds for which a target was quantified in both cell-based and lysate-based PISA (now Figure 3E). In addition, we included a pie chart that further illustrates our point (now Figure 3 – figure supplement 2A). Of the 60 compounds, there were 37 compounds that had a known target pass as a hit using both approaches, 6 compounds that had a known target pass as a hit in only cell-based experiments, and 6 compounds that had a known target pass as a hit in only lysate-based experiments.

      Within the Venn diagram, we also included a few examples of compounds that fit into each category. Furthermore, we highlighted two examples of compound-target pairs that pass as a hit with one approach, but not the other (Figure 3 – figure supplement 2B,C). We would also like to refer the reviewer to Figure 4D, which indicates that BRAF inhibitors cause a significant change in BRAF thermal stability in lysates but not cells. 
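
      The counts given above (37, 6 and 6 out of 60 compounds) make the common-denominator comparison easy to check. The short sketch below only reworks those reported numbers; the function name is hypothetical.

```python
def coverage_percentages(both, cell_only, lysate_only, total):
    """Percent of compounds with a known target called as a hit, on one shared denominator."""
    def pct(count):
        return round(100 * count / total, 1)

    return {
        "cell-based": pct(both + cell_only),
        "lysate-based": pct(both + lysate_only),
        "combined": pct(both + cell_only + lysate_only),
    }


# Counts reported in the response above.
print(coverage_percentages(both=37, cell_only=6, lysate_only=6, total=60))
```

      With the same 60 compounds in every denominator, each approach alone covers 43 compounds and the two together cover 49, so the gain from combining them is six compounds in each direction.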

      Aims achieved, impact and utility:

      The authors have achieved their main aim of presenting a workflow that serves to demonstrate the potential value of this approach. However, by using a single high dose of each compound and failing to adequately replicate their experiments and instead applying heuristic thresholds, they have limited the impact of their findings. Their results will be a useful resource for researchers wishing to explore potential off-target interactions and/or mechanisms of action for these 96 compounds, but are expected to be superseded by more robust datasets in the near future. The most valuable aspect of the study is the demonstration that combining live cell and whole cell lysate PISA assays across multiple related compounds can help to elucidate the mechanisms of action.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      More specifically:

      P 1 l 20, we quantified 1.498 million thermal stability measurements.

      It's a staggering assertion, and it takes some reading to realize that the authors mean the total number of proteins identified and quantified in all experiments. But far from all of these proteins were quantified with enough precision to provide meaningful solubility shifts.

      We can assure the reviewer that we were not trying to deceive the readers. We stated ‘1.498 million thermal stability measurements.’ We did not say ‘1.498 million compound-specific thermal stability shifts.’ We assume that most readers will appreciate that the overall quality of the measurements will be variable across the dataset, as in any work that describes quantitation of thousands of proteins in a proteomics dataset. In accordance with the Reviewer’s suggestion, we have weakened this statement. The revised version of the manuscript now reads as follows (p. 1):

      “Taking advantage of this advance, we quantified more than one million thermal stability measurements in response to multiple classes of therapeutic and tool compounds (96 compounds in living cells and 70 compounds in lysates).”

      P 7 l 28. We observed a large range of thermal stability measurements for known compound-target pairs, from a four-fold reduction in protein stability to a four-fold increase in protein stability upon compound engagement (Figure 2A).

      PISA-derived solubility shift cannot be interpreted simply as a "four-fold reduction/increase in protein stability".

      We thank the Reviewer for highlighting this specific passage and agree that it was worded poorly. As such, we have modified the manuscript to the following (p. 8): 

      “We observed a large range of thermal stability measurements for known compound-target pairs, from a four-fold reduction in protein solubility after thermal denaturation to a four-fold increase in protein solubility upon compound engagement (Figure 2A).”

      P 8, l 6. Instead, we posit that maximum ligand-induced change in thermal stability is target-specific.

      Yes, that's right, but this has been shown in a number of prior studies.

      We agree with the reviewer and accept that we made a mistake in how we worded this sentence, which we regret upon reflection. As such, we have modified this sentence to the following:

      “Instead, our data appears to be consistent with the previous observation that the maximum ligand-induced change in thermal stability is target-specific (Savitski et al., 2014; Becher et al., 2016).”

      P 11 l 7. Combining the two approaches allows for greater coverage of the cellular proteome and provides a better chance of observing the protein target for a compound of interest. In fact, the main difference is that in-cell PISA provides targets in cases when the compound is a pro-drug that needs to be metabolically processed before engaging the intended target. This has been shown in a number of prior studies, but not mentioned in this manuscript.

      While our study was not focused on the issue of pro-drugs, this is an important point and we would be happy to re-iterate it in our manuscript. We thank the Reviewer for the suggestion and have modified the manuscript to reflect this point (p. 19): 

      “Cell-based studies, on the other hand, have the added potential to identify the targets of pro-drugs that must be metabolized in the cell to become active and secondary changes that occur independent of direct engagement (Savitski et al., 2014; Franken et al., 2015; Almqvist et al., 2016; Becher et al., 2016; Liang et al., 2022).”

      While we are happy to make this change, we also would like to point out that the reviewer’s assertion that “the main difference is that in-cell PISA provides targets in cases when the compound is a prodrug that needs to be metabolically processed before engaging the intended target” may not fully capture the nuances of protein engagement effectors in the cellular context. Thus, we believe it is important to highlight the ability of cell-based assays to identify secondary changes in thermal stability.

      P 11 l 28. These data suggest that the thermal destabilization observed in cell-based experiments might stem from a complex biophysical rearrangement. That's right because it is not about thermal stability, but about protein solubility which is much affected by the environment.

      We agree that the readout of solubility is an important caveat for nearly every experiment in the family of assays associated with ‘thermal proteome profiling’. Inherently complex biophysical arrangements could affect the inherent stability and solubility of a protein or complex. Thus, we would be happy to make the following change consistent with the reviewer’s suggestion (p. 12): 

      “These data suggest that the decrease in solubility observed in cell-based experiments might stem from a complex biophysical rearrangement.”

      P 12 l 7 A). Thus, certain protein targets are more prone to thermal stability changes in one experimental setting compared to the other. Same thing - it's about solubility, not stability.

      We thank the Reviewer for the recommendation and have modified the revised manuscript as follows (p. 13):

      “Thus, certain protein targets were more prone to solubility (thermal stability) changes in one experimental setting compared to the other (Huber et al., 2015).”

      P13 l 15. While the data suggests that cell- and lysate-based PISA are equally valuable in screening the proteome for evidence of target engagement... No, they are not equally valuable - cell-based PISA can provide targets of prodrugs, which lysate PISA cannot.

      We have removed this sentence to avoid any confusion. We will not place any value judgments on the two approaches. 

      P 18 l 10. In general, a compound-dependent thermal shift that occurs in a lysate-based experiment is almost certain to stem from direct target engagement. That's true and has been known for a decade. Reference needed.

      We recognize this oversight and would be happy to include references. The revised manuscript reads as follows: 

      “In general, a compound-dependent thermal shift that occurs in a lysate-based experiment is almost certain to stem from direct target engagement (Savitski et al., 2014; Becher et al., 2016). This is because cell signaling pathways and cellular structures are disrupted and diluted. Cell-based studies, on the other hand, have the added potential to identify the targets of pro-drugs that must be metabolized in the cell to become active and secondary changes that occur independent of direct engagement (Savitski et al., 2014; Franken et al., 2015; Almqvist et al., 2016; Becher et al., 2016; Liang et al., 2022).”

      P 18 l 29. the data seemed to indicate that the maximal PISA fold change is protein-specific. Therefore, a log2 fold change of 2 for one compound-protein pair could be just as meaningful as a log2 fold change of 0.2 for another. This is also not new information.

      We again appreciate the Reviewer for highlighting this oversight. The revised manuscript reads as follows: 

      “Ultimately, the data seemed to be consistent with previous studies that indicate the maximal change in thermal stability is protein specific (Savitski et al., 2014; Becher et al., 2016; Sabatier et al., 2022). Therefore, a log2 fold change of 2 for one compound-protein pair could be just as meaningful as a log2 fold change of 0.2 for another.”

      P 19 l 5. Specifically, the compounds that most strongly impacted the thermal stability of targets, also acted as the most potent inhibitors. I wish this was true, but this is not always so. For instance, in Nat Meth 2019, 16, 894-901 it was postulated that large ∆Tm correspond to biologically most important sites ("hot spots") - the idea that was later challenged and largely discredited in subsequent studies.

      Indeed, we agree with the Reviewer that there may be no essential connection between these. Rather, we are simply drawing conclusions from observations within the presented dataset. 

      Saying nothing about the work presented in the paper that the reviewer notes above, the referenced definition is also more nuanced “…we hypothesized that ‘hotspot’ modification sites identified in this screen (namely, those significantly shifted relative to the unmodified, bulk and even other phosphomodiforms of the same protein) may represent sites with disproportionate effects on protein structure and function under specific cellular conditions.” Indeed, in the response to that work, Potel et al. (https://doi.org/10.1038/s41592-021-01177-5) “agree with the premise of the Huang et al. study that phosphorylation sites that have a significant effect on protein thermal stability are more likely to be functionally relevant, for example, by modulating protein conformation, localization and protein interactions.” 

      Anecdotally, we also speculate that if we observe proteome engagement for two compounds (let’s say two ATP-competitive kinase inhibitors) that bind in the same pocket (let’s say the ATP binding site) and one causes a greater change in solubility, then it is reasonable to assume that it is a stronger evidence and we see evidence supporting this claim in Figure 2, Figure 3, Figure 4, and Figure 5.

      It is also important to point out that previous work has made similar points. This is highlighted in a review article by Mateus et al. (10.1186/s12953-017-0122-4). The authors state, “To obtain affinity estimates with TPP, a compound concentration range TPP (TPP-CCR) can be performed. In TPP-CCR, cells are incubated with a range of concentrations of compound and heated to a single temperature.” In support of this claim, the authors reference two papers—Savitski et al., 2014 and Becher et al., 2016. We have updated this section in the revised manuscript (p. 20): 

      “While the primary screen was carried out at fixed dose, the increased throughput of PISA allowed for certain compounds to be assayed at multiple doses in a single experiment. In these instances, there was a clear dose-dependent change in thermal stability of primary targets, off-targets, and secondary targets. This not only helped corroborate observations from the primary screen, but also seemed to provide a qualitative assessment of relative compound potency in agreement with previous studies (Savitski et al., 2014; Becher et al., 2016; Mateus et al., 2017). Specifically, the compounds that most strongly impacted the thermal stability of targets, also acted as the most potent inhibitors. In order to be a candidate for this type of study, a target must have a large maximal thermal shift (magnitude of log2 fold change) because there must be a large enough dynamic range to clearly resolve different doses.”

      Also, the compound efficacy is strongly dependent upon the residence time of the drug, which may or may not correlate with the PISA shift. Also important is the concentration at which target engagement occurs (Anal Chem 2022, 94, 15772-15780).

      In our study, the time and concentration of treatment were fixed for all compounds at 30 minutes and 10 µM, respectively. Therefore, we do not believe these parameters will affect our conclusions.  

      P 19 l 19. For example, we found that the clinically-deployed CDK4/6 inhibitor palbociclib is capable of directly engaging and inhibiting PLK1. This is a PISA-based prediction that needs to be validated by orthogonal means.

      As we demonstrate in this work, the PISA assays serve as powerful screening methods, thus we agree that validation is important for these types of studies. To this end, we show the following:  

      • Proteomics: Palbociclib causes a decrease in solubility following thermal melting in cells.

      • Chemical informatics: Palbociclib is structurally similar to BI 2536.

      • Protein informatics: Modeling of palbociclib in empirical structures of the PLK1 active site generates negligible steric clashes. 

      • Biochemical: Palbociclib inhibits PLK1 activity in cells.

      We have changed this text to the following to clarify these points:

      “For example, we found that the clinically-deployed CDK4/6 inhibitor palbociclib has a dramatic impact on PLK1 thermal stability in live cells, is capable of inhibiting PLK1 activity in cell-based assays, and can be modelled into the PLK1 active site.”

      Reviewer #2 (Recommendations For The Authors):

      I am wondering why the authors chose to use K562 (leukaemia) cells in this work as opposed to a different cancer cell line (HeLa? Panc1?). It would be helpful if the authors could present some rationale for this decision.

      This is a great question. Two reasons really. First, they are commonly used in various fields of research, especially in previous studies using proteome-wide thermal shift assays (PMID: 25278616, 32060372) and large-scale chemical perturbation screens (PMID: 31806696). Second, they are a suspension line, which makes executing the experiments easier because they do not need to be detached from a plate prior to thermal melting. We think this is a valuable point to make in the manuscript, so that non-experts understand this concept. We tried to communicate this succinctly in the revised manuscript, but would be happy to elaborate further if the Reviewer would like us to. 

      “To enable large-scale chemical perturbation screening, we first sought to establish a robust workflow for assessing protein thermal stability changes in living cells. We chose K562 cells, which grow in suspension, because they have been frequently used in similar studies and can easily be transferred from a culture flask to PCR tubes for thermal melting (Savitski et al., 2014; Jarzab et al., 2020).”

      I note that integral membrane proteins are over-represented among targets for anti-cancer therapeutics. To what extent is the membrane proteome (plasma membrane in particular) identified in this work? After examining the methods, I would expect at least some integral membrane proteins to be identified. Do the authors observe any differences in the behaviour of water-soluble proteins versus integral membrane proteins in their assays? It would be helpful if the authors could comment on this in a potential revision.

      We agree this is an important point when considering the usage of PISA and thermal stability assays in general for specific classes of therapeutics. To address this, we explored what effect the analysis of thermal stability/solubility had on the proportion of membrane proteins in our data (Author response image 1). Annotations were extracted from Uniprot based on each protein being assigned to the “plasma membrane” (07/2024). We quantified 1,448 (16.5% of total proteins) and 1,558 (17.3% of total proteins) membrane proteins in our cell and lysate PISA datasets, respectively. We also compared the proportion of annotated proteins in these datasets to a recent TMTpro dataset (Lin et al.; PMID: 38853901) and found that the PISA datasets recovered a slightly lower proportion of membrane proteins (~17% in PISA versus 18.9% in total proteome analysis). Yet, we note that we expect more membrane proteins in urea/SDS based lysis methods compared to 0.5% NP-40 extractions.

      Author response image 1.

      We were not able to find an appropriate place to insert these data into the manuscript, so we have left them here in the response. If the Reviewer feels strongly that these data should be included in the manuscript, we would be happy to include them.  

      A final note: I commend the authors for making their full dataset publicly available upon submission to this journal. This data promises to be a very useful resource for those working in the field.

      We thank the Reviewer for this and note that we are excited for this data to be of use to the community.

      Reviewer #3 (Recommendations For The Authors):

      There is no dataset PDX048009 in ProteomeXchange Consortium. I assume this is because it's under an embargo which needs to be released.

      We can confirm that data was uploaded to ProteomeXchange.

      MS data added to the manuscript during revisions was submitted to ProteomeXchange with the identifier – PDX053138.

      Page 9 line 5 refers to 59 compounds quantified in both cell-based and lysate-based, but Figure 3E shows 60 compounds quantified in both. I believe these numbers should match.

      We thank the Reviewer for catching this. In response to critiques from this Reviewer in the Public Review, we re-worked this section considerably. Please see the above critique/response for more details. 

      Page 10, lines 26-28: It would help the reader if some of the potential 'artefactual effects of lysate-based analyses' were described briefly.

      We thank the Reviewer for raising this point. The truth is that we are not exactly sure what is happening here, but we know that, at least for vorinostat, this excess of changes in lysate-based PISA is consistent across experiments. We also do not see pervasive issues within the plexes containing these compounds. Therefore, we do not think this is due to a mistake or other experimental error. We hypothesize that the effect might result from a change in pH or other similar property that occurs upon addition of the molecule, though we note that we have previously seen that vorinostat can induce large numbers of solubility changes in a related solvent shift assay (doi: 10.7554/eLife.70784). We have modified the text to indicate that we do not fully understand the reason for the observation (p. 11):

      “It is highly unlikely that these three molecules actively engage so many proteins and, therefore, the 2,176 hits in the lysate-based screen were likely affected in part by consistent, but artefactual effects of lysate-based analyses that we do not fully understand (Van Vranken et al., 2021).”

      Page 24, lines 29-30 appear to contain a typo. I believe the '>' should be '<' or the 'exclude' should be 'retain'.

      The Reviewer is completely correct. We appreciate the attention to detail. This mistake has been corrected in the revised manuscript.  

      Page 25, lines 5-7: The methods need to explain how the trimmed standard deviation is calculated.

      We apologize for this oversight. To calculate the trimmed standard deviation, we used proteins that were measured in at least 30 conditions. For these, we then removed the top 5% of absolute log2 fold-changes (compared to DMSO controls) and calculated the standard deviation of the resulting set of log2 fold-changes. This is similar in concept to the utilization of “trimmed means” in proteomics data (https://doi.org/10.15252/msb.20145625), which helps to overcome issues due to extreme outliers in datasets. We have added the following statement to the methods to clarify this point (p. 27):

      “Second, for each protein across all cells or lysate assays, the number of standard deviations away from the mean thermal stability measurement (z-score) for a given protein was quantified based on a trimmed standard deviation. Briefly, the trimmed standard deviation was calculated for proteins that were measured in at least 30 conditions. For these, we removed the top 5% of absolute log2 fold-changes (compared to DMSO controls) and calculated the standard deviation of the resulting set of log2 fold-changes.”
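
      The trimmed standard deviation described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of NumPy, the rounding of the 5% trim count, the sample (n − 1) standard deviation, and the use of the untrimmed mean as the z-score centre are all assumptions made for illustration.

```python
import numpy as np

def trimmed_sd(log2_fc, min_conditions=30, trim_fraction=0.05):
    """Trimmed SD of one protein's log2 fold-changes (vs. DMSO) across conditions.

    Returns None when the protein was measured in fewer than `min_conditions`
    conditions. Rounding the trim count up and using ddof=1 are assumptions.
    """
    values = np.asarray(log2_fc, dtype=float)
    values = values[~np.isnan(values)]           # keep measured conditions only
    if values.size < min_conditions:
        return None
    n_trim = int(np.ceil(trim_fraction * values.size))
    order = np.argsort(np.abs(values))           # smallest |log2 FC| first
    kept = values[order[: values.size - n_trim]]  # drop the top 5% by magnitude
    return float(np.std(kept, ddof=1))

def z_scores(log2_fc, **kwargs):
    """Number of trimmed SDs each measurement lies from the protein's mean.

    Centring on the untrimmed mean is an assumption, not stated in the text.
    """
    values = np.asarray(log2_fc, dtype=float)
    sd = trimmed_sd(values, **kwargs)
    if sd is None or sd == 0:
        return None
    return (values - np.nanmean(values)) / sd
```

      Because the most extreme fold-changes are removed before the spread is estimated, a genuine hit does not inflate its own denominator, which is the same motivation given above for trimmed means.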

      Page 25, lines 9-11 needs editing for clarity.

      We tested empirical hit rates to estimate which mean and trimmed standard deviation (trimmedSD) thresholds to apply, in order to maximize sensitivity and minimize the ‘False Hit Rate’, defined as the number of proteins in the DMSO control samples called as hits divided by the total number of proteins called as hits with a given threshold applied. 

      Author response image 2.

      Hit calling threshold setting based on maximizing the total hits called and minimizing the False Hit Rate in cells (number of DMSO hits divided by the total number of hits).

      Author response image 3.

      Hit calling threshold setting based on maximizing the total hits called and minimizing the False Hit Rate in lysates (number of DMSO hits divided by the total number of hits).
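
      As a hedged illustration of the False Hit Rate defined above, the sketch below calls hits at a given pair of thresholds and divides the number of DMSO-control hits by the total number of hits. The function names, the array layout, the decision to count DMSO hits within the total, and the cutoff values in the usage example are hypothetical placeholders, not values taken from the manuscript.

```python
import numpy as np

def call_hits(log2_fc, n_sd, fc_cutoff, sd_cutoff):
    """Boolean mask of measurements passing both cutoffs (assumed hit rule)."""
    log2_fc = np.asarray(log2_fc, dtype=float)
    n_sd = np.asarray(n_sd, dtype=float)
    return (np.abs(log2_fc) >= fc_cutoff) & (np.abs(n_sd) >= sd_cutoff)

def false_hit_rate(hits, is_dmso):
    """DMSO-control hits divided by all hits; NaN when nothing is called."""
    hits = np.asarray(hits, dtype=bool)
    is_dmso = np.asarray(is_dmso, dtype=bool)
    total = int(hits.sum())
    if total == 0:
        return float("nan")
    return float((hits & is_dmso).sum()) / total

# Usage with made-up numbers: four measurements, two of them DMSO controls.
example_hits = call_hits(
    log2_fc=[0.5, 0.05, 0.6, 0.4],
    n_sd=[4.0, 1.0, 5.0, 3.6],
    fc_cutoff=0.2,   # placeholder cutoff
    sd_cutoff=3.5,   # placeholder cutoff
)
example_fhr = false_hit_rate(example_hits, is_dmso=[False, True, False, True])
```

      Scanning a grid of cutoff pairs and recording both the total hit count and this ratio would reproduce the kind of trade-off summarized in the two response images.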

      Figure 1 supplementary 2a legend states: '32 DMSO controls'. Should that be 64?

      We thank the Reviewer for catching our mistake. This has been corrected in the revised manuscript. 

      I suggest removing Figure 1 supplementary 3c which is superfluous as only the number it presents is already stated in the text (page 5, line 9).

      We thank the Reviewer for the suggestion and agree that this panel is superfluous. It has been removed from the revised manuscript.

      New data and tables added during revisions:  

      (1) Table 3 – All log2 fold change values for the cell-based screen. Using this table, protein-centric solubility profiles can be plotted (as in Figures 2D and others). 

      (2) Table 4 – All log2 fold change values for the lysate-based screen. Using this table, protein-centric solubility profiles can be plotted (as in Figures 2D and others). 

      (3) Figure 1 – Figure supplement 3H – Table highlighting proteins that pass log2 fold change cutoffs, but not nSD cutoffs and vice versa. 

      (4) Figure 2 – Panels H and I were updated with a new color scheme. 

      (5) Figure 3 – Updated main figure and supplement at the request of Reviewer 3. 

      • Figure 3E – Compares on-target hits for the cell- and lysate-based screens for all compounds for which a target was quantified in both screens. 

      • Figure 3 – Figure supplement 2 – Highlights on-target hits in both screens, exclusively in cells, and exclusively in lysates. 

      (6) Figure 5 – PISA data for K562 lysates treated with AZD-7762 at multiple concentrations.

      • Figure 5F

      • Figure 5 – Figure supplement 3A-C

      • Figure 5 – Source data 2

      (7) Figure 5 – Phosphoproteomic profiling of K562 cells treated with AZD7762 or Bafetinib. 

      • Figure 5G

      • Figure 5 – Figure supplement 4A-F

      • Figure 5 – Source data 3 (phosphoproteome)

      • Figure 5 – Source data 4 (associated proteome data)

    1. Author response:

      We thank both reviewers for their thorough and insightful feedback, which will contribute to improving our manuscript. In summary, the key concerns raised include the potential induction of GLV volatiles due to plant handling, limitations in the design of the "wind tunnel" bioassay, and the need for a deeper analysis of specific volatile compounds that contribute to the success of push-pull systems. We are happy to revise the entire manuscript according to all comments of the reviewers. This includes clarification of our methodology and providing a more reflective discussion on how physical stress might have influenced volatile emissions. Additionally, we will conduct new experiments with a modified bioassay setup to address concerns about directional cues and airflow control, minimizing cross-contamination. While the identification of individual compounds was beyond the scope of this study, we acknowledge its importance and propose it as a direction for future research.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect it has on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirmed that odors released from Desmodium appeared not important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have since long contributed to developing push-pull, also find that Desmodium odors are not significant in their choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important to evolving the technique to fit different farming systems and translating this mechanism to fit with other crops and in other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLV, but little to no terpenoids, which is indicative of mostly physical stress and damage; see, for example, one of the coauthors' own papers, Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) such that the readers understand the plants' condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of GLV (green leaf volatiles) emission, particularly regarding physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of GLVs. We will ensure the revised manuscript reflects this nuanced interpretation. However, we will also explain more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause. We think that this is also clear in the manuscript. However, we plan to revise relevant passages throughout the manuscript to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we will explain better that the volatile profiles comprise a majority of non-GLV compounds. As shown in figure 1, the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are non-GLV monoterpenes, sesquiterpenes, or aromatic compounds. We will also note that the experimental plants used in the study were grown in insect proof screenhouses and were checked for any insect damage before volatile collection and bioassay.

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      (2a) The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see eg lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs by direct, physical damage and eg terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls, means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes. We will update the discussion to provide better context based on existing literature regarding the volatile release under stress conditions. We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      (2b) It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Given the number of samples we could handle in terms of cost and workload, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than on those of maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We will point out this justification for our focus on Desmodium in the manuscript. Additionally, we will suggest in the discussion that future studies should measure volatile profiles from maize and intercrop legumes alongside Desmodium and border grass in push-pull fields.

      (2c) To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      Again, we very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have exploded the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive", or to cite other studies showing bioactivity, where we have not demonstrated bioactivity ourselves.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We agree that our study is limited to its specific aims. Therefore, we think the revisions will make these more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium with which FAW had no tactile contact (taste, structure, etc.), FAW chose equally. Similarly, in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but adds an important element to it by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This is misleading readers into thinking that these interactions were significant and in fact confirming this in the discussion. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we will ensure a clear distinction between statistically significant findings and non-significant trends and remove any language that may imply stronger support for the odor hypothesis than what the data show.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in the submitted manuscript in lines 275–279. Furthermore, we include detailed captions to figure 4 - supporting figure 3 & figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we will point out for our specific case. We will also mention that this common caveat does not invalidate experimental designs that use replication and randomization, and that insects can be assumed to select suitable oviposition sites against a background of such confounding factors under realistic conditions. We will also mention explicitly that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      (5a) The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is way too low for moths to orient in (in a wind tunnel wind should be presented as a directional cue. Only around 1.5 l/min enters the wind tunnel in a volume of 90 l approximately, which does not create any directional flow. Solution: change 'wind tunnel' throughout the text to a dual choice setup /assay.)

      We agree with these criticisms and will change the terminology accordingly. We also plan to conduct an additional experiment with a no-choice arena that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) will be tested separately, with only one treatment conducted per evening to avoid cross-contamination.

      (5b) There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We will add this to the discussion. The newly planned assays also address this concern by using a setup with laminar flow.

      (5c) Too low a flow (1.5 l per minute) implies largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (but in reality much longer as the fresh air does not replace the old air, but mixes with it). The setup does not seem to be equipped with e.g. fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We will add these limitations to the discussion and will address these concerns with new experiments (see answer 5a).
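
      To give a rough sense of scale for the air-exchange concern, the sketch below contrasts two idealized cases using the approximate figures quoted in comment 5a (about 90 l chamber volume and about 1.5 l/min inflow). Both models, and the function names, are illustrative assumptions rather than measurements of the actual setup: "plug flow" assumes fresh air displaces old air without mixing, while the "well-mixed" case assumes instantaneous mixing, so that the old air decays exponentially.

```python
import math

def plug_flow_exchange_minutes(volume_l, flow_l_per_min):
    """Time for one full volume exchange if fresh air displaced old air
    without mixing (idealized lower bound)."""
    return volume_l / flow_l_per_min

def well_mixed_residual_fraction(volume_l, flow_l_per_min, minutes):
    """Fraction of the original air still present after `minutes`,
    assuming perfect mixing: exp(-Q * t / V)."""
    return math.exp(-flow_l_per_min * minutes / volume_l)

def well_mixed_minutes_to_fraction(volume_l, flow_l_per_min, target_fraction):
    """Time until only `target_fraction` of the original air remains,
    under the same perfect-mixing assumption."""
    return -(volume_l / flow_l_per_min) * math.log(target_fraction)

# Approximate figures from comment 5a; treat them as rough inputs.
VOLUME_L = 90.0
FLOW_L_PER_MIN = 1.5

one_exchange = plug_flow_exchange_minutes(VOLUME_L, FLOW_L_PER_MIN)
left_after_trial = well_mixed_residual_fraction(VOLUME_L, FLOW_L_PER_MIN, 5.0)
time_to_5_percent = well_mixed_minutes_to_fraction(VOLUME_L, FLOW_L_PER_MIN, 0.05)
```

      Under either idealization, the time scale for clearing the chamber is much longer than a 5-minute trial, which is the point the comment raises; the real behaviour depends on the actual flows and geometry.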

      (5d) The stimulus air enters through a tube (what type of tube, diameter, length, etc) containing pressurized air (how was the air obtained into bags (type of bag, how is it sealed?), and the efflux directly into the flight chamber (how, nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, particularly how the bags were airtight sealed around the plants? 

      We will add the missing information to the methods and provide details about types of bags, manufacturers, and pre-treatments. In short, Teflon tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      (5e) The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement, which we should assume is at least a representative picture?) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why they release these volatiles?). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We will include a more detailed description of the plant handling and bagging processes in the methods to clarify how the plants were treated during all assays reported in the submitted manuscript and in the newly planned assays. This will address concerns about the possible influence of plant stress, such as GLV emission due to bagging, on the results. We politely disagree that the maize plants were damaged and that the Desmodium plants were not representative of those encountered in the field. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence has not yet been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justifies a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table. We will revise the figure and associated text in the introduction to highlight its relevance for the current study and to reduce redundant information.

      Reviewer #2 (Public review):

      Based on the controversy of whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on behavior of FAW oviposition. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, the serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improvement of this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question whether Desmodium emits volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) It would be good to add significance "symbols" to Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We will clarify this in the figure legend and provide a cross-reference to Table 3 for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We kindly request further clarification on which figure needs improvement, and we will make adjustments accordingly to ensure that all figures are easily comprehensible for readers.

      (4) It would be good to discuss in more depth the functions of the important volatile compounds identified here, in comparison with the results of previous studies.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we will indicate specifically which of the volatiles we identified overlap with those previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Weaknesses & incompletely supported claims:

      (1) A central mechanistic claim of the paper is that "DCP1a can regulate DCP2's cellular decapping activity by enhancing DCP2's affinity to RNA, in addition to bridging the interactions of DCP2 with other decapping factors. This represents a pivotal molecular mechanism by which DCP1a exerts its regulatory control over the mRNA decapping process." Similar versions of this claim are repeated in the abstract and discussion sections. However, this appears to be entirely at odds with the observation from in vitro decapping assays with immunoprecipitated DCP2 that showed DCP1 knockout does not significantly affect the enzymatic activity of DCP2 (Figures 2B-D; I note that there may be a very small change in DCP2 activity shown in panel C, but this may be due to slightly different amounts of immunoprecipitated DCP2 used in the assay, as suggested by panel D). If DCP1 pivotally regulates decapping activity by enhancing RNA binding to DCP2, why is no difference in decapping activity observed in the absence of DCP1?

      Furthermore, the authors show only weak changes in relative RNA levels immunoprecipitated by DCP2 with versus without DCP1 (~2-3 fold change; consistent with the Valkov 2016 NSMB paper, which shows what looks like only modest changes in RNA binding affinity for yeast Dcp2 +/- Dcp1). Is the argument that only a 2-3 fold change in RNA binding affinity is responsible for the sizable decapping defects and significant accumulation of deadenylated intermediates observed in cells upon Dcp1 depletion? (and if so, why is this the case for in-cell data, but not the immunoprecipitated in vitro data?)

      We appreciate the reviewer's thoughtful comments on our paper. The reviewer points out an apparent contradiction between the claim that DCP1a regulates DCP2's cellular decapping activity and the observation that knocking out DCP1a does not significantly affect DCP2's enzymatic activity in vitro. However, it is important to underscore the challenge of reconciling differences between in vitro and in vivo experiments in scientific research. Although in vitro systems provide a controlled environment, they have inherent limitations that often fail to capture the complexities of cellular processes. Our in vitro experiments used immunoprecipitated proteins to ensure the presence of relevant factors, but these experiments cannot fully replicate the precise stoichiometry and dynamic interactions present in a cellular environment. Furthermore, the limited volume in vitro can actually facilitate reactions that may not occur as readily in the complex and heterogeneous environment of a cell. Therefore, the lack of a significant difference in decapping activity observed in vitro does not necessarily negate the regulatory role of DCP1 in the cellular context. Rather, it underscores our previous oversight of DCP1's importance in the decapping process under in vitro conditions. The conclusions regarding DCP1's regulatory mechanisms remain valid and supported by the presented evidence, especially when considering the inherent differences between in vitro and in vivo experimental conditions. It is precisely because of these differences that we recognized our previous underestimation of DCP1's significance. Therefore, our subsequent experiments focused on elucidating DCP1's regulatory mechanisms in the decapping process.

      The authors acknowledge this apparent discrepancy between the in vitro DCP2 decapping assays and in-cell decapping data, writing: "this observation could be attributed to the inherent constraints of in vitro assays, which often fall short of faithfully replicating the complexity of the cellular environment where multiple factors and cofactors are at play. To determine the underlying cause, we postulated that the observed cellular decapping defect in DCP1a/b knockout cells might be attributed to DCP1 functioning as a scaffold." This is fair. They next show that DCP1 acts as a scaffold to recruit multiple factors to DCP2 in cells (EDC3, DDX6, PatL1, and PNRC1 and 2). However, while DCP1 is shown to recruit multiple cofactors to DCP2 (consistent with other studies in the decapping field, and primarily through motifs in the Dcp1 C-terminal tail), the authors ultimately show that *none* of these cofactors are actually essential for DCP2-mediated decapping in cells (Figures 3A-F). More specifically, the authors showed that the EVH1 domain was sufficient to rescue decapping defects in DCP1a/b knockout cells, that PNRC1 and PNRC2 were the only cofactors that interact with the EVH1 domain, and finally that shRNA-mediated PNRC1 or PNRC2 knockdown has no effect on in-cell decapping (Figures 3E and F). Therefore, based on the presented data, while DCP1 certainly does act as a scaffold, it doesn't seem to be the case that the major cellular decapping defect observed in DCP1a/b knockout is due to DCP1's ability to recruit specific cofactors to DCP2.

      The findings that none of the decapping cofactors recruited by DCP1 to DCP2 are essential for decapping in cells further underscore the complexity of the decapping process in vivo. This observation suggests that while DCP1's scaffolding function is crucial for recruiting cofactors, the decapping process likely involves additional layers of regulation that are not fully captured by our current understanding of DCP1. Furthermore, the reviewer mentions that the observed changes in RNA binding affinity (approximately 2-3 fold) in our in vitro experiments seem relatively modest. While these changes may appear insignificant in vitro, their cumulative impact in the dynamic cellular environment could be substantial. Even minor perturbations in RNA binding affinity can trigger cascading effects, leading to significant changes in decapping activity and the accumulation of deadenylated intermediates upon Dcp1 depletion. Cellular processes involve complex networks of interrelated events, and small molecular changes can result in amplified biological outcomes. The subtle molecular variations observed in vitro may translate into significant phenotypic outcomes within the complex cellular environment, underscoring the importance of DCP1a's regulatory role in the cellular decapping process.

      So as far as I can tell, the discrepancy between the in vitro (DCP1 not required) and in-cell (DCP1 required) decapping data, remains entirely unresolved. Therefore, I don't think that the conclusions that DCP1 regulates decapping by (a) changing RNA binding affinity (authors show this doesn't matter in vitro, and that the change in RNA binding affinity is very small) or (b) by bridging interactions of cofactors with DCP2 (authors show all tested cofactors are dispensable for robust in-cell decapping activity), are supported by the evidence presented in the paper (or convincingly supported by previous structural and functional studies of the decapping complex).

      We have addressed the reconciliation of differences between in vitro and in vivo experiments in the revised manuscript and emphasized the importance of considering cellular interactions when interpreting our findings.

      (2) Related to the RNA binding claims mentioned above, are the differences shown in Figure 3H statistically significant? Why are there no error bars shown for the MBP control? (I understand this was normalized to 1, but presumably, there were 3 biological replicates here that have some spread of values?). The individual data points for each replicate should be displayed for each bar so that readers can better assess the spread of data and the significance of the observed differences. I've listed these points as major because of the key mechanistic claim that DCP1 enhances RNA binding to DCP2 hinges in large part on this data.

      Thank you for your feedback. Regarding your comments on the statistical significance of the differences shown in Figure 3H and the absence of error bars for the MBP control, we will address these concerns in the revised manuscript. We’ll include individual data points for the three biological replicates and corresponding statistical analysis to more clearly demonstrate the data spread and significance of the observed differences.

      (3) Also related to point (1) above, the kinetic analysis presented in Figure 2C shows that the large majority of transcript is mostly decapped at the first 5-minute timepoint; it may be that DCP2-mediated decapping activity is actually different in vitro with or without DCP1, but that this is being missed because the reaction is basically done in less than 5 minutes under the conditions being assayed (i.e. these are basically endpoint assays under these conditions). It may be that if kinetics were done under conditions to slow down the reaction somewhat (e.g. lower Dcp2 concentration, lower temperatures), so that more of the kinetic behavior is captured, the apparent discrepancy between in vitro and in-cell data would be much less. Indeed, previous studies have shown that in yeast, Dcp1 strongly activates the catalytic step (kcat) of decapping by ~10-fold, and reduces the KM by only ~2 fold (Floor et al, NSMB 2010). It might be beneficial to use purified proteins here (only a Western blot is used in Figure 2D to show the presence of DCP2 and/or DCP1, but do these complexes have other, and different, components immunoprecipitated along with them?), if possible, to better control reaction conditions.

      This contradiction between the in vitro and in-cell decapping data undercuts one of the main mechanistic takeaways from the first half of the paper. This needs to be addressed/resolved with further experiments to better define the role of DCP1-mediated activation, or the mechanistic conclusions significantly changed or removed.

      We genuinely appreciate the reviewer’s insightful comments on the kinetic analysis presented in Figure 2C. Your astute observation regarding the potential influence of reaction duration on the interpretation of in vitro decapping activity, especially in the absence of DCP1, is well-received. The time-sensitive nature of our experiments, as you rightly pointed out, might not fully capture the nuanced kinetic behaviors. In addition, the DCP2 complex purified from cells could not be precisely quantified. In response to your suggestion, we attempted to purify human DCP2 protein from E. coli; however, regrettably, the purified protein failed to exhibit any enzymatic activity. This disparity may be attributed to species differences.

      Considering the reviewer’s valuable insights, our revised manuscript now emphasizes that DCP2 purified from cells exhibits activity regardless of the presence of DCP1. This adjustment aims to present our findings more clearly and to keep our conclusions within the limits of our experimental design.
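
      The reviewer's concern in point (3), that a reaction which is essentially complete by the first 5-minute timepoint can hide a genuine rate difference, can be illustrated with a simple first-order model in which the fraction decapped is 1 - exp(-k·t). The following is a minimal sketch only: the rate constants are hypothetical values chosen for illustration, not measurements from this study, and real decapping kinetics need not be first order.

```python
import math

def fraction_decapped(rate_per_min, minutes):
    """Fraction of substrate decapped under simple first-order kinetics."""
    return 1.0 - math.exp(-rate_per_min * minutes)

# HYPOTHETICAL rate constants (per minute), chosen only to illustrate a
# two-fold difference in rate; they are not values measured in the study.
K_WITH_DCP1 = 2.0
K_WITHOUT_DCP1 = 1.0

for minutes in (0.5, 5.0):
    with_dcp1 = fraction_decapped(K_WITH_DCP1, minutes)
    without_dcp1 = fraction_decapped(K_WITHOUT_DCP1, minutes)
    print(f"t = {minutes} min: with DCP1 {with_dcp1:.3f}, "
          f"without DCP1 {without_dcp1:.3f}, "
          f"difference {with_dcp1 - without_dcp1:.3f}")
```

      With these illustrative values, the two conditions differ by less than one percentage point at 5 minutes but by more than twenty at 0.5 minutes, which is why earlier timepoints or slower reaction conditions would be needed to distinguish them.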

      (4) The second half of the paper compares the transcriptomic and metabolic profiles of DCP1a versus DCP1b knockouts to reveal that these target a different subset of mRNAs for degradation and have different levels of cellular metabolites. This is a great application of the DCP1a/b KO cells developed in this paper and provides new information about DCP1a vs b function in metazoans, which to my knowledge has not really been explored at all. However, the analysis of DCP1 function/expression levels in human cancer seems superficial and inconclusive: for example, the authors conclude that "...these findings indicate that DCP1a and DCP1b likely have distinct and non-redundant roles in the development and progression of cancer", but what is the evidence for this? I see that DCP1a and b levels vary in different cancer cell types, but is there any evidence that these changes are actually linked to cancer development, progression, or tumorigenesis? If not, these broader conclusions should be removed.

      Thank you to the reviewer for pointing out that such a description may be misleading. We have removed our previous broader conclusion and revised our sentences. To further explore the potential impact of DCP1a and DCP1b on cancer progression, we examined the association between the expression levels of DCP1a and DCP1b and progression-free interval (PFI). We have incorporated this information into our revised manuscript.

      (5) The authors used CRISPR-Cas9 to introduce frameshift mutations that result in premature termination codons in DCP1a/b knockout cells (verified by Sanger sequencing). They then use Western blotting with DCP1a or DCP1b antibodies to confirm the absence of DCP1 in the knockout cell lines. However, the DCP1a antibody used in this study (Sigma D5444) is targeted to the C-terminal end of DCP1a. Can the authors conclusively rule out that the CRISPR/Cas-generated mutations result in the production of a truncated DCP1a that simply cannot be detected by the C-terminally targeted antibody? While it is likely that the introduced premature termination codon in the DCP1a gene results in nonsense-mediated decay of the resulting transcript (an outcome supported by the knockout results showing large defects in cellular decapping, which can be rescued by the addition of the EVH1 domain), it would be better to carefully validate the success of the DCP1a knockout and conclusively show that no truncated DCP1a is produced by using N-terminally targeted DCP1a antibodies (as was the case for DCP1b).

      Thank you for your insightful comment regarding the validation of our DCP1a/b knockout cell line. We acknowledge your point about the C-terminal targeting of the Sigma D5444 DCP1a antibody used in our Western blot analysis. We agree that we cannot definitively rule out the possibility of truncated DCP1a protein production solely based on the lack of full-length protein detection. To address this limitation, we used a commercially available N-terminally targeted DCP1a antibody (Aviva ARP39353_T100) in a Western blot analysis. This allows us to detect any truncated protein fragments remaining after the CRISPR-Cas9-generated frameshift mutation.

      Some additional minor comments:

      • More information would be helpful on the choice of DCP1 truncation boundaries; why was 1-254 chosen as one of the truncations?

      Thank you for the reviewer's comment and suggestion. The DCP1 1-254 truncation boundaries were chosen based on the predicted structure from AlphaFoldDB (A0A087WT55). We will include this information in the revised manuscript.

      • Figure S2D is a pretty important experiment because it suggests that the observed deadenylated intermediates are in fact still capped; can a positive control be added to these experiments to show that removal of cap results in rapid terminator-mediated degradation?

      Unfortunately, due to our institution's current laboratory safety policies, we are unable to perform experiments involving the use of radioactive isotopes such as 32P. Therefore, while adding the suggested positive control experiment to demonstrate rapid RNA degradation upon decapping would further validate our interpretation, we regret that we cannot carry out this experiment at the moment. However, the observed deadenylated intermediates in Figure S2D match the predicted size of capped RNA fragments, and not the expected sizes of degradation products after decapping. Furthermore, previous literature has well-established that for these types of RNAs, decapping leads directly to rapid 5' to 3' exonuclease-mediated degradation, without producing stable deadenylated intermediates. Thus, we believe that the current data is sufficient to support our conclusion that the deadenylated intermediates retain the 5' cap structure.

      Reviewer #2 (Public Review):

      Weaknesses:

      The direct targets of DCP1a and/or DCP1b were not determined as the analysis was restricted to RNA-seq to assess RNA abundance, which can be a result of direct or indirect regulation by DCP1a/b.

      Thank you for raising this important point. In our study, we acknowledge that the use of RNA-seq to assess RNA abundance provides a broad overview of the regulatory impacts of DCP1a and DCP1b. This method captures changes in RNA levels that may arise from both direct and indirect regulatory actions of these proteins. While we did not directly determine the targets of DCP1a and DCP1b, the data obtained from our RNA-seq analysis serve as a foundational step for future targeted experiments, which could include techniques such as RIP-seq, to delineate the direct targets of DCP1a and DCP1b more precisely. We believe that our current findings contribute valuable information to the field and pave the way for these subsequent analyses.

      P-bodies appear to be larger in human cells lacking DCP1a and DCP1b but a lack of image quantification prevents this conclusion from being drawn.

      Thank you for the reviewer’s valuable feedback. We have addressed the reviewer’s concern regarding P-bodies' size in human cells lacking DCP1a and DCP1b. We have now performed image quantification and can confirm that P-bodies are indeed larger in these cells.

      The lack of details in the methodology and figure legends limit reader understanding.

      We acknowledge the reviewer's concerns regarding the level of detail provided in the methodology and figure legends. To address this, we are committed to enhancing both sections with additional details and clarifications in our revised manuscript. Thank you for bringing this to our attention.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) To me, the second half of the paper comparing DCP1a and DCP1b is in many ways distinct from the first half and could stand on its own as an interesting paper if this comparative analysis is explored a little deeper (maybe by validating some of the differences in decay observed for individual mRNAs targeted by DCP1a versus DCP1b, by measuring and comparing the decay rates of some individual transcripts under differential control by DCP1a vs b?), and revising the conclusions about links to cancer as mentioned above. I think these later comparative results in the paper present the most new and interesting data concerning DCP1 function in humans (especially since I think the mechanistic conclusions from the first half aren't well supported yet or are at least inconsistent), but when I read these later sections of the paper I struggle to understand the key takeaways from the transcriptomic and metabolomic data.

      Thank you for the reviewer's suggestions. Estimating the decay rates of individual transcripts within the transcriptomes of DCP1a_KO, DCP1b_KO, and wild type can provide insight into the direct targets of DCP1a or DCP1b. However, this requires either time-series RNA-seq or specialized sequencing technologies such as Precision Run-On sequencing (PRO-seq) or RNA Approach to Equilibrium Sequencing (RATE-Seq). Unfortunately, we lack the necessary dataset in our project to estimate the decay rates for the potential targets identified in our RNA-seq data. Despite this limitation, we acknowledge the potential of this approach in identifying the true targets of DCP1a and DCP1b and have included this idea in our discussion.

      (2) I think it would be helpful to add a little more descriptive or narrative language to the figure legends (I know some of them are already quite long!) so that readers can follow the general idea of the experiment through the figure legend as well as the main text; as written, the figure legends are mostly exclusively technical details, so it can be hard to parse what experiment is being carried out in some cases.

      Thank you for the reviewer’s suggestion. We will revise the figure legends so that they retain the technical details while clearly conveying the main idea of each experiment, making it easier for readers to parse what experiment is being carried out.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data, or analyses:

      The use of RNA-seq to measure RNA abundance in DCP1a and/or b knockout cells can give some insight into both the indirect and direct effects of DCP1a/b on gene expression but cannot identify the direct targets of these genes. Rather, global analysis of RNA stability or capturing uncapped RNA decay intermediates would allow the authors to conclude they have identified direct targets of DCP1a and/or b. Without such analyses, the interpretation of these data should be scaled back to clearly state that RNA levels can be altered through indirect effects of DCP1a/b absence throughout the text.

      We appreciate the reviewer's suggestion. We have modified our sentences to emphasize that the dysregulated genes could be caused by both direct and indirect effects.

      A control/randomly generated gene list should be analyzed for GO terms to determine whether the enrichment of cancer-related pathways in the differentially expressed genes in the DCP1a/b knockout cells is meaningful.

      Thank you for the reviewer's comment. We shuffled our gene list and repeated the pathway enrichment analysis in Figure 4C and 4D 1,000 times. We focused on the following cancer-related pathways: E2F targets, MTORC1 signaling, G2M checkpoint, MYC target V1, EMT transition, KRAS signaling DN, P53 pathway, and NOTCH signaling. We then counted how many times the q-value obtained from a shuffled gene list was more significant than the q-value obtained from our real data. In four of the eight pathways (E2F targets, MTORC1 signaling, G2M checkpoint, and MYC target V1), none of the shuffled gene lists resulted in a q-value smaller than the real one. In the other four pathways (EMT transition, KRAS signaling DN, P53 pathway, and NOTCH signaling), the shuffled q-values were smaller than the real q-value 2, 11, 4, and 4 times, respectively, out of the 1,000 shuffles. Based on the shuffled results, we conclude that the transcriptome of DCP1a/b knockout cells is significantly enriched in these cancer-related pathways.

      Author response image 1.

      Distribution of q-values resulting from the Gene Set Enrichment Analysis (GSEA) conducted on 1,000 shuffled gene lists for eight cancer-related pathways. The q-values derived from Figure 4C and 4D are indicated by red (DCP1a_KO) and blue (DCP1b_KO) dashed lines, respectively. Some q-values derived from Figure 4C are too small to be labeled on the plots, such as in E2F targets (q value: 5.87E-07), MTORC1 signaling (q values: 6.59E-07 and 1.58E-06 for DCP1a_KO and DCP1b_KO, respectively), MYC target V1 (q value: 0.004644174 for DCP1a_KO), etc. The numbers x/1000 indicate how often the shuffled q-values were smaller than the real q-value out of 1,000 permutations.
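
      The shuffle counts reported above can be converted into empirical p-values. The following is a minimal sketch, assuming the counts 2, 11, 4, and 4 map to the pathways in the order listed in the response; it uses the common (k + 1) / (n + 1) correction, which is a choice made here for illustration and is not stated to be the calculation the authors performed.

```python
N_SHUFFLES = 1000

# Number of shuffled gene lists whose q-value beat the real q-value.
# ASSUMED: counts are assigned to pathways in the order given in the response.
shuffles_beating_real = {
    "E2F targets": 0,
    "MTORC1 signaling": 0,
    "G2M checkpoint": 0,
    "MYC target V1": 0,
    "EMT transition": 2,
    "KRAS signaling DN": 11,
    "P53 pathway": 4,
    "NOTCH signaling": 4,
}

def empirical_p(n_more_extreme, n_permutations):
    """Empirical p-value with the +1 correction, so it is never exactly zero."""
    return (n_more_extreme + 1) / (n_permutations + 1)

empirical = {
    pathway: empirical_p(count, N_SHUFFLES)
    for pathway, count in shuffles_beating_real.items()
}

for pathway, p_value in empirical.items():
    print(f"{pathway}: empirical p = {p_value:.4f}")
```

      Under these assumptions, every pathway has an empirical p-value below 0.05, with KRAS signaling DN the least extreme at 12/1001.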

      Comparisons of the DCP1a and/or b knockout RNA-seq results should be done to published datasets such as those published by Luo et al., Cell Chemical Biology (2021) to determine whether there are common targets with DCP2 and validate the reported findings.

      Thank you for the reviewer’s suggestion. We compared the upregulated genes from the DCP1a_KO, DCP1b_KO, and DCP1a/b_KO cell lines with the 91 targets of DCP2 identified by Luo et al. in Cell Chemical Biology (2021). Only EPPK1 overlapped between the potential DCP1b_KO targets and the targets of DCP2. No genes overlapped between the potential DCP1a_KO targets and the targets of DCP2. However, three genes, TES, PAX6, and C18orf21, overlapped between the significantly upregulated DEGs of DCP1a/b_KO and the targets of DCP2. We have included this information in the discussion section.

      The RNA tethering assays are not clear and are difficult to interpret without further controls to delineate the polyadenylated and deadenylated species.

      Thank you for the reviewer’s feedback. We acknowledge that the reviewer might harbor some doubts regarding the outcomes of the RNA tethering assays. Nonetheless, this methodology is well-established and has also found extensive application across many studies. We are committed to enhancing the clarity of our experiment’s details and results within the figure legends and textual descriptions.

      The representative images of p-bodies clearly show that DCP1a/b KO cells have larger p-bodies than the wild-type cells. The authors should quantify p-body size in each image set as the current interpretation of the data is that there is no difference in size or number of p-bodies, but the data suggest otherwise.

      Thank you very much for the reviewer’s insightful comments and for drawing our attention to the need to quantify p-body sizes in DCP1a/b KO and wild-type cells. We agree with the reviewer’s assessment that the representative images suggest a difference in p-body size between DCP1a/b KO cells and wild-type cells, which we initially overlooked. We will revise our manuscript accordingly to include these findings, ensuring that our interpretation of the data aligns with the observed differences.

      Statistical analysis of the Figure 2C results should be included because the difference between the wild-type and DCP1a/b KO cells with GFP-DCP2 looks significant but is interpreted in the text as not significant.

      Thank you for pointing out the need for a statistical analysis of the results shown in Figure 2C. We acknowledge that the visual difference between the wild-type and DCP1a/b KO cells with GFP-DCP2 suggests a significant variation, which may not have been clearly communicated in our text. We will conduct the necessary statistical analysis to substantiate the observations made in Figure 2C. Furthermore, we would like to emphasize that our primary focus was to demonstrate that DCP2 purified from cells retains its activity even in the absence of DCP1. This critical point will be highlighted and clarified in the revised version of our manuscript to prevent any misunderstanding.

      Recommendations for improving the writing and presentation:

      Additional context including what is known about the role of dcp1 in decapping from the decades of work in yeast and other model organisms should be incorporated into the introduction and discussion sections.

      Thank you for the reviewer’s suggestion. We will incorporate additional context about the function and significance of DCP1 in decapping processes within our revised manuscript's introduction and discussion sections.

      Details should be provided within the figure legends and methods section on experimental approaches and the number of replicates and statistical analyses used throughout the manuscript. For example, it is not clear whether western blots or RNA-IP experiments were performed more than once as representative images are shown.

      Thank you for the reviewer’s suggestion. In the figure legends and methods section, we will provide more details about the experimental methods, number of replicates, and statistical analyses. Regarding the Western blots and RNA-IP experiments the reviewer mentioned, we performed multiple experiments and presented representative images in the manuscript. We will clarify this in the revised manuscript to eliminate potential confusion.

      The rationale for performing metabolic profiling is not clear.

      We appreciate the reviewer's thoughtful feedback. The rationale behind conducting metabolic profiling in our study is rooted in its efficacy as a valuable tool for deciphering the consequences of specific gene mutations, particularly those closely associated with phenotypic changes or final metabolic pathways. Our objective is to utilize metabolic profiling to unravel the distinct biofunctions of DCP1a and DCP1b. By employing this approach, we aim to gain insights into the intricate metabolic alterations that result from the absence of these genes, thereby enhancing our understanding of their roles in cellular processes. We recognize the necessity of clearly presenting this rationale and promise to bolster the articulation of these points in the revised version of our manuscript to ensure the clarity and transparency of our research motivation.

      Details in the methods section should be included for the CRISPR/Cas9-mediated gene editing validation. The Sanger sequencing results presented in Figure S1b should be explained. The entire western blot(s) should be shown in Figure S1A to give confidence the Dcp1a/b KO cells are not expressing truncated proteins and the epitopes of the antibodies used to detect Dcp1a/b should be described. The northern blot probes should be described and sequences included. The transcriptomics method should be detailed.

      Thank you for your feedback. In the revised manuscript, we will detail the CRISPR/Cas9 gene editing validation, explain the Sanger sequencing results in Figure S1b, show the full Western blot in Figure S1A to confirm that the Dcp1a/b knockout cells are not expressing truncated proteins, describe the Northern blot probes used, and detail the transcriptomics method, all to ensure clarity and comprehensiveness in our experimental procedures and results.

      A diagram showing the RNA tethering assays with labels corresponding to all blots/gels should be provided.

      Thank you for your suggestion. We will provide a diagram showing the RNA tethering assays with labels corresponding to all blots/gels in our revised manuscript. This will help readers better understand our experimental design and results.

      The statement, "This suggests that the disruption of the decapping process in DCP1a/b-knockout cells results in the accumulation of unprocessed mRNA intermediates" regarding the results of the RNA-seq assay is not supported by the evidence as RNA-seq does not measure RNA decay intermediates or RNA decay rates.

      Thank you for the reviewer’s comment. We agree that RNA-seq experiments do not directly measure RNA decay intermediates or RNA decay rates. Our statement could have caused confusion, and we have therefore removed this sentence from the manuscript.

      Minor corrections to the text and figures:

      Figure S6A is uninterpretable as presented.

      Thank you for the reviewer’s valuable feedback. We have taken note and made improvements. We have simplified Figure S6A to enhance its interpretability, hoping that the current version will make it easier for the readers to understand.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Removing claims of causality: To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly:

      "Electrophysiological dynamics of salience, default mode, and frontoparietal networks during episodic memory formation and recall: A multi-experiment iEEG replication".

      Control analyses directly comparing AI and IFG: As per the reviewer’s suggestion, we have carried out additional control analyses by directly comparing the net inward/outward balance between the AI and the IFG. Our analysis revealed that the net outflow for the AI is significantly higher compared to the IFG during both encoding and recall phases, a pattern that was replicated across all four experiments. 

      These findings further highlight the unique role of the AI as a key hub in coordinating network interactions during episodic memory formation and retrieval, distinguishing it from a key anatomically adjacent prefrontal region implicated in cognitive control.

      We have incorporated these results into the manuscript (see new Figure S6 and updated Results section). 

      Control analyses directly comparing task with resting state: As per the reviewer’s suggestion, we compared the AI's net outflow during task periods to resting state, finding significantly higher outflow during both encoding and recall across all experiments (ps < 0.05). These results provide further evidence for an enhanced role of AI net directed information flow to the DMN and FPN during memory processing compared to the resting state.

      We have incorporated these results into the manuscript (see new Figure S9 and updated Results section). 

      Control analysis using every region of the brain outside the considered networks: We appreciate the reviewer's suggestion to conduct additional control analyses. However, we have concerns about implementing this approach for several reasons:

      (1) Hypothesis-driven research: Our study was designed based on a strong hypothesis derived from prior fMRI studies, which have consistently shown that the salience network (SN), anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the default mode network (DMN) and frontoparietal network (FPN) across diverse cognitive tasks.

      (2) Risk of p-hacking: Running analyses on a large number of brain regions outside our networks of interest without a priori hypotheses could lead to p-hacking, a practice strongly criticized in the scientific community, including by eLife editors (Makin & Orban de Xivry, 2019). Such an approach could potentially yield spurious results and undermine the validity of our findings.

      (3) Principled control region selection: Our choice of the inferior frontal gyrus (IFG) as a control region was hypothesis-driven, based on its: a) Anatomical adjacency to the AI b) Involvement in cognitive control functions, including response inhibition c) Frequent coactivation with the AI in fMRI studies. 

      (4) Robustness of current findings: Our PTE analysis involving the IFG, along with the additional control analyses requested by the reviewer (comparing the task-related net balance of the AI with the IFG and with resting state, see response to reviewer comment 2.1), strongly supports a key role for the AI in orchestrating large-scale network dynamics during memory processes.

      (5) Specificity of findings: The contrast between AI and IFG results demonstrates that our observed patterns are not general to all task-active regions but are specific to the AI's role in network coordination. 

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results. 

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      These revisions, combined with our rigorous methodologies and comprehensive analyses, provide compelling support for the central claims of our manuscript. We believe these changes significantly enhance the scientific contribution of our work.

      Our point-by-point responses to the reviewers' comments are provided below.

      Reviewer 1:

      (1.1) Because phase-transfer entropy is referenced as a "causal" analysis in this investigation (PTE), I believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and imprecise use of "causation" may be confusing. The authors have defined in the revised Introduction what their definition of "causality" is within the context of this investigation. 

      We appreciate the reviewer's feedback in terms of the terminology used in our manuscript. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      Reviewer 2:

      (2.1) Clarifying the new control analyses. The authors have been responsive to our feedback and implemented several new analyses. The use of a pre-task baseline period and a control brain region (IFG) definitively help to contextualize their results, and the findings shown in the revision do suggest that (1) relative to a pre-task baseline, directed interactions from the AI are stronger and (2) relative to a nearby region, the IFG, the AI exhibits greater outward-directed influence. 

      However, it is difficult to draw strong quantitative conclusions from the analyses as presented, because they do not directly statistically contrast the effect in question (directed interactions with the FPN and DMN) between two conditions (e.g. during baseline vs. during memory encoding/retrieval). As I understand it, in their main figures the authors ask, "Is there statistically greater influence from the AI to the DMN/FPN in one direction versus another?" And in the AI they show greater "outward" PTE than "inward" PTE from other networks during encoding/retrieval. The balance of directed information favors an outward influence from the AI to DMN/FPN. 

      But in their new analyses, they simply show that the degree of "outward" PTE is greater during task relative to baseline in (almost) all tasks. I believe a more appropriately matched analysis would be to quantify the inward/outward balance during task states, quantify the inward/outward balance during rest states, and then directly statistically compare the two. It could be that the relative balance of directed information flow is nonsignificantly changed between task and rest states, which would be important to know. 

      We thank the reviewer for this suggestion. We have now run an additional analysis directly comparing the inward/outward balance during the task versus the rest states. We quantified this balance as the net outflow, defined as the difference between the total outgoing information and the total incoming information (PTE(out)–PTE(in)). This analysis revealed that net outflow during task periods is significantly higher than during rest, during both encoding and recall, and across the four experiments (ps < 0.05). These results provide further evidence for an enhanced role of AI net directed information flow to the DMN and FPN during memory processing compared to the resting state. These new results have now been included in the revised manuscript (page 12).
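
      The net outflow measure described above can be illustrated with a short sketch. This is only an illustration, not the analysis code used in the study. The matrix convention (`pte[i, j]` holds the directed flow from region i to region j), the function names `net_outflow` and `paired_sign_flip_test`, and the choice of a sign-flip permutation test are all assumptions introduced here; the response does not state which statistical test was used.

```python
import numpy as np

def net_outflow(pte, node, targets):
    """Net outflow of `node` with respect to `targets`.

    Assumed convention: pte[i, j] is the directed flow from region i to j.
    Returns sum(PTE(out)) - sum(PTE(in)).
    """
    pte = np.asarray(pte, dtype=float)
    outgoing = pte[node, targets].sum()   # node -> each target
    incoming = pte[targets, node].sum()   # each target -> node
    return outgoing - incoming

def paired_sign_flip_test(task, rest, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired differences (task - rest).

    Hypothetical stand-in for whatever paired test a real analysis would use.
    Returns the observed mean difference and a permutation p-value.
    """
    diffs = np.asarray(task, dtype=float) - np.asarray(rest, dtype=float)
    rng = np.random.default_rng(seed)
    observed = diffs.mean()
    # Randomly flip the sign of each paired difference to build the null.
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null = (signs * diffs).mean(axis=1)
    # Small tolerance guards against floating-point ties with the observed value.
    extreme = np.sum(np.abs(null) >= abs(observed) - 1e-12)
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Toy example: region 0 plays the role of the AI, regions 1 and 2 the DMN and FPN.
toy_pte = [[0.0, 0.6, 0.7],
           [0.4, 0.0, 0.5],
           [0.3, 0.5, 0.0]]
toy_net = net_outflow(toy_pte, node=0, targets=[1, 2])

# Made-up per-subject net outflow values for task and rest.
toy_task = [0.6, 0.5, 0.7, 0.4, 0.8, 0.6]
toy_rest = [0.1, 0.0, 0.2, 0.1, 0.2, 0.1]
toy_diff, toy_p = paired_sign_flip_test(toy_task, toy_rest)
```

      The same `net_outflow` function could be applied to a second region (for example the IFG) so that the two regions' net outflow values are contrasted directly, which is the form of comparison the reviewer requests.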

      Likewise, a similar principle applies to their IFG analysis. They show that the IFG tends to have an "inward" balance of influence from the DMN/FPN (the opposite of the AI's effect), but this does not directly answer whether the AI occupies a statistically unique position in terms of the magnitude of its influence on other regions. More appropriate, as I suggest above, would be to quantify the relative balance of inward/outward influence, both for the IFG and the AI, and then directly compare those two quantities. (Given the inversion of the direction of effect, this is likely to be a significant result, but I think it deserves a careful approach regardless.)

      We appreciate the reviewer's suggestion and have directly compared the net inward/outward balance between the AI and the IFG. Specifically, we compared the net outflow (PTE(out)–PTE(in)) for the AI with that for the IFG. This analysis revealed that the net outflow for the AI is significantly higher than for the IFG during both encoding and recall, and across the four experiments. These findings further highlight a key role for the AI in orchestrating large-scale network dynamics during memory processes. The AI's pattern of directed information flow stands in contrast to that of the IFG, despite their anatomical proximity and shared involvement in cognitive control processes. This dissociation underscores the specificity of the AI's function in coordinating network interactions during memory formation and retrieval. These new results have now been included in our revised manuscript (page 11).

      (2.2) Consider additional control regions. The authors justify their choice of IFG as a control region very well. In my original comments, I perhaps should have been more clear that the most compelling control analyses here would be to subject every region of the brain outside these networks (with good coverage) to the same analysis, quantify the degree of inward/outward balance, and then see how the magnitude of the AI effect stacks up against all possible other options. If the assertion is that the AI plays a uniquely important role in these memory processes, showing how its influence stacks up against all possible "competitors" would be a very compelling demonstration of their argument. 

      We thank the reviewer for this suggestion. However, please note that running a large number of analyses on every region of the brain outside these networks, and comparing their dynamics to the AI without a hypothesis or solid principle, amounts to p-hacking, which has previously been strongly criticized by the eLife editors (Makin & Orban de Xivry, 2019). Our study was driven by a solid hypothesis based on prior fMRI studies that have shown that the SN, anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the DMN and FPN across diverse cognitive tasks (Bressler & Menon, 2010; Cai et al., 2016; Cai, Ryali, Pasumarthy, Talasila, & Menon, 2021; Chen, Cai, Ryali, Supekar, & Menon, 2016; Kronemer et al., 2022; Raichle et al., 2001; Seeley et al., 2007; Sridharan, Levitin, & Menon, 2008). Moreover, our selection of the IFG as a control region for comparison was also strongly hypothesis-driven, due to its anatomical adjacency to the AI, its involvement in a wide range of cognitive control functions including response inhibition (Cai, Ryali, Chen, Li, & Menon, 2014), and its frequent co-activation with the AI in fMRI studies. Furthermore, the IFG has been associated with controlled retrieval of memory (Badre, Poldrack, Paré-Blagoev, Insler, & Wagner, 2005; Badre & Wagner, 2007; Wagner, Paré-Blagoev, Clark, & Poldrack, 2001), making it a compelling region for comparison. Our findings from the PTE analysis involving the IFG, together with the additional control analyses requested by the reviewer (directly comparing the task-related net balance of the AI with the IFG and with resting state, please see response to reviewer comment 2.1), strongly highlight a key role of the AI in orchestrating large-scale network dynamics during memory processes.

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results.

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      (2.3) Reporting of successful vs. unsuccessful memory results. I apologize if I was not clear in my original comment (2.7, pg. 13 of the response document) regarding successful vs. unsuccessful memory. The fact that no significant difference was found in PTE between successful/unsuccessful memory is a very important finding that adds valuable context to the rest of the manuscript. I believe it deserves a figure, at least in the Supplement, so that readers can visualize the extent of the effect in successful/unsuccessful trials. This is especially important now that the manuscript has been reframed to focus more directly on claims regarding episodic memory processing; if that is indeed the focus, and their central analysis does not show a significant effect conditionalized on the success of memory encoding/retrieval, it is important that readers can see these data directly.

      As per the reviewer’s suggestion, we have now included figures showing the results of the successful versus unsuccessful comparison in the Supplementary materials of the revised manuscript (Figures S10, S11).

      (2.4) Claims regarding causal relationships in the brain. I understand that the authors have defined "causal" in a specific way in the context of their manuscript; I do believe that as a matter of clear and transparent scientific communication, the authors nonetheless bear a responsibility to appreciate how this word may be erroneously interpreted/overinterpreted and I would urge further review of the manuscript to tone down claims of causality. Reflective of this, I was very surprised that even as both reviewers remarked on the need to use the word "causal" with extreme caution, the authors added it to the title in their revised manuscript.

      We thank the reviewer for this suggestion. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      References 

      Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47(6), 907-918. doi:10.1016/j.neuron.2005.07.023

      Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45(13), 2883-2901. doi:10.1016/j.neuropsychologia.2007.06.015

      Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: emerging methods and principles. Trends in Cognitive Sciences, 14(6), 277-290. doi:10.1016/j.tics.2010.04.004

      Cai, W., Chen, T., Ryali, S., Kochalka, J., Li, C. S., & Menon, V. (2016). Causal Interactions Within a Frontal-Cingulate-Parietal Network During Cognitive Control: Convergent Evidence from a Multisite-Multitask Investigation. Cereb Cortex, 26(5), 2140-2153. doi:10.1093/cercor/bhv046

      Cai, W., Ryali, S., Chen, T., Li, C. S., & Menon, V. (2014). Dissociable roles of right inferior frontal cortex and anterior insula in inhibitory control: evidence from intrinsic and task-related functional parcellation, connectivity, and response profile analyses across multiple datasets. J Neurosci, 34(44), 14652-14667. doi:10.1523/jneurosci.3048-14.2014

      Cai, W., Ryali, S., Pasumarthy, R., Talasila, V., & Menon, V. (2021). Dynamic causal brain circuits during working memory and their functional controllability. Nat Commun, 12(1), 3314. doi:10.1038/s41467-021-23509-x

      Chen, T., Cai, W., Ryali, S., Supekar, K., & Menon, V. (2016). Distinct Global Brain Dynamics and Spatiotemporal Organization of the Salience Network. PLOS Biology, 14(6), e1002469. doi:10.1371/journal.pbio.1002469

      Kronemer, S. I., Aksen, M., Ding, J. Z., Ryu, J. H., Xin, Q., Ding, Z., . . . Blumenfeld, H. (2022). Human visual consciousness involves large scale cortical and subcortical networks independent of task report and eye movement activity. Nat Commun, 13(1), 7342. doi:10.1038/s41467-022-35117-4

      Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8. doi:10.7554/eLife.48175

      Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proc Natl Acad Sci U S A, 98(2), 676-682. doi:10.1073/pnas.98.2.676

      Seeley, W. W., Menon, V., Schatzberg, A. F., Keller, J., Glover, G. H., Kenna, H., . . . Greicius, M. D. (2007). Dissociable Intrinsic Connectivity Networks for Salience Processing and Executive Control. Journal of Neuroscience, 27(9), 2349-2356. doi:10.1523/JNEUROSCI.5587-06.2007

      Sridharan, D., Levitin, D. J., & Menon, V. (2008). A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proceedings of the National Academy of Sciences, 105(34), 12569-12574. doi:10.1073/pnas.0800005105

      Wagner, A. D., Paré-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning: left prefrontal cortex guides controlled semantic retrieval. Neuron, 31(2), 329-338. doi:10.1016/s0896-6273(01)00359-2

    1. Author response:

      Reviewer #1 (Public review): 

      The authors survey the ultrastructural organization of glutamatergic synapses by cryo-ET and image processing tools using two complementary experimental approaches. The first approach employs so-called "ultra-fresh" preparations of brain homogenates from a knock-in mouse expressing a GFP-tagged version of PSD-95, allowing Peukes and colleagues to specifically target excitatory glutamatergic synapses. In the second approach, direct in-tissue (using cortical and hippocampal regions) targeting of the glutamatergic synapses employing the same mouse model is presented. In order to ascertain whether the isolation procedure causes any significant changes in the ultrastructural organization (and possibly synaptic macromolecular organization) the authors compare their findings using both of these approaches. The quantitation of the synaptic cleft height reveals an unexpected variability, while the STA analysis of the ionotropic receptors provides insights into their distribution with respect to the synaptic cleft.

      The main novelty of this study lies in the continuous claims by the authors that the sample preservation methods developed here are superior to any others previously used. This leads them as well to systematically downplay or directly ignore a substantial body of previous cryo-ET studies of synaptic structure. Without comparisons with the cryo-ET literature, it is very hard to judge the impact of this work in the field. Furthermore, the data does not show any better preservation in the so-called "ultra-fresh" preparation than in the literature, perhaps to the contrary as synapses with strangely elongated vesicles are often seen. Such synapses have been regularly discarded for further analysis in previous synaptosome studies (e.g. Martinez-Sanchez 2021). Whilst the targeting approach using a fluorescent PSD95 marker is novel and seems sufficiently precise, the authors use a somewhat outdated approach (cryo-sectioning) to generate in-tissue tomograms of poor quality. To what extent such tomograms can be interpreted in molecular terms is highly questionable. The authors also don't discuss the physiological influence of 20% dextran used for high-pressure freezing of these "very native" specimens.

      Lastly, a large part of the paper is devoted to image analysis of the PSD which is not convincing (including a somewhat forced comparison with the fixed and heavy-metal staining room temperature approach). Despite being a technically challenging study, the results fall short of expectations. 

      Our manuscript contains a discussion of both conventional EM and cryoET of synapses. We apologise if we have omitted referencing or discussing any earlier cryoET work. This was certainly not our intention, and we include a more complete discussion of published cryoET work on synapses in our revised manuscript.

      The reviewer is concerned that the synaptic vesicles in some synapse tomograms are “stretched” and that this may reflect poor preservation.  We would like to point out that such non-spherical synaptic vesicles have also been previously reported in cryoET of primary neurons grown on EM grids (Tao et al., J. Neuro, 2018). Indeed, there is no reason per se to suppose synaptic vesicles are always spherical and there are many diverse families of proteins expressed at the synapse that shape membrane curvature (BAR domain proteins, synaptotagmin, epsins, endophilins and others). We will add further discussion of this issue in the revised manuscript.

      The reviewer regards ‘cryo-sectioning’ as outdated and cryoET data from these preparations as “poor quality”. We respectfully disagree. Preparing brain tissues for cryoET is generally considered to be challenging. The first successful demonstration of preparing such samples predates the cryoEM resolution revolution (with electron counting detectors): Zuber et al. (Proc. Natl. Acad. Sci., 2005) prepared cryo-sections/CEMOVIS of in vitro brain cultures. We followed this technique to prepare tissue cryo-sections for cryoET in our manuscript. Recently, cryoFIB-SEM liftout has been developed as an alternative method to prepare tissue samples for cryoET (Mahamid et al., J. Struct. Biol., 2015), and only more recently has this method become available to more laboratories. Both techniques introduce damage, as has been described (Han et al., J. Microsc., 2008; Lucas et al., Proc. Natl. Acad. Sci., 2023). Importantly, no like-for-like, quantitative comparison of these two methodologies has yet been performed. We have recently demonstrated that the molecular structure of amyloid fibrils within human brain is preserved down to the protein fold level in samples prepared by cryo-sectioning (Gilbert et al., Nature, 2024). We will add further detail on the process by which we excluded poor quality tomograms from our analysis, which we described in detail in our methods section.

      The reviewer asks what the physiological effect is of adding 20% w/v ~40,000 Da dextran. This is a reasonable concern since this could in principle exert osmotic pressure on the tissue sample. While we did not investigate this ourselves, earlier studies have (Zuber et al., 2005), showing that this concentration of dextran did not damage cell membranes and had no detectable effect on cell structure.

      The reviewer is not convinced by our analysis of the apparent molecular density of macromolecules in the postsynaptic compartment that in conventional EM is called the postsynaptic density. However, the reviewer provides no reasoning for this assessment nor alternative approaches that could be attempted. We would like to add that we have tested multiple different approaches to objectively measure molecular crowding in cryoET data, that give comparable results. We believe that our conclusion – that we do not observe an increased molecular density conserved at the postsynaptic membrane, and that the PSD that we and others observed by conventional EM does not correspond to a region of increased molecular density - is well supported by our data.  We and the other reviewers consider this an important and novel observation.

      Reviewer #2 (Public review): 

      Summary: 

      The authors set out to visualize the molecular architecture of the adult forebrain glutamatergic synapses in a near-native state. To this end, they use a rapid workflow to extract and plunge-freeze mouse synapses for cryo-electron tomography. In addition, the authors use knock-in mice expressing PSD95-GFP in order to perform correlated light and electron microscopy to clearly identify pre- and postsynaptic membranes. By thorough quantification of tomograms from plunge- and high-pressure frozen samples, the authors show that the previously reported 'post-synaptic density' does not occur at high frequency and is therefore not a defining feature of a glutamatergic synapse.

      Subsequently, the authors are able to reproduce the frequency of post-synaptic density when preparing conventional electron microscopy samples, thus indicating that density prevalence is an artifact of sample preparation. The authors go on to describe the arrangement of cytoskeletal components, membraneous compartments, and ionotropic receptor clusters across synapses.

      Demonstrating that the frequency of the post-synaptic density in prior work is likely an artifact and not a defining feature of glutamatergic synapses is significant. The descriptions of distributions and morphologies of proteins and membranes in this work may serve as a basis for future investigation by readers interested in these features.

      Strengths: 

      The authors perform a rigorous quantification of the molecular density profiles across synapses to determine the frequency of the post-synaptic density. They prepare samples using two cryogenic electron microscopy sample preparation methods, as well as one set of samples using conventional electron microscopy methods. The authors can reproduce previous reports of the frequency of the post-synaptic density by conventional sample preparation, but not by either of the cryogenic methods, thus strongly supporting their claim. 

      We thank the reviewer for their generous assessment of our manuscript.

      Reviewer #3 (Public review): 

      Summary: 

      The authors use cryo-electron tomography to thoroughly investigate the complexity of purified, excitatory synapses. They make several major interesting discoveries: polyhedral vesicles that have not been observed before in neurons; analysis of the intermembrane distance, and a link to potentiation, essentially updating distances reported from plastic-embedded specimen; and find that the postsynaptic density does not appear as a dense accumulation of proteins in all vitrified samples (less than half), a feature which served as a hallmark feature to identify excitatory plastic-embedded synapses. 

      Strengths: 

      (1) The presented work is thorough: the authors compare purified, endogenously labeled synapses to wild-type synapses to exclude artifacts that could arise through the homogenization step, and, in addition, analyse plastic-embedded, stained synapses prepared using the same quick workflow, to ensure their findings have not been caused by the way the synapses were purified. Interestingly, the 'thick lines of PSD' are evident in most of their stained synapses.

      (2) I commend the authors on the exceptional technical achievement of preparing frozen specimens from a mouse within two minutes.

      (3) The approaches highlighted here can be used in other fields studying cell-cell junctions.

      (4) The tomograms will be deposited upon publication, which will enable neurobiologists and researchers from other fields to carry on data evaluation in their own fields of expertise. Because tomography is still a specialized skill, the more than 100 excellent tomograms of synapses that the authors collected and reconstructed provide a wealth of information that can also be used in future studies.

      (5) The authors have identified ionotropic receptor positions and that they are linked to actin filaments, and appear to be associated with membrane and other cytosolic scaffolds, which is highly exciting.

      (6) The authors achieved their aims to study neuronal excitatory synapses in great detail, were thorough in their experiments, and made multiple fascinating discoveries. They challenge dogmas that have been in place for decades and highlight the benefit of implementing and developing new methods to carefully understand the underlying molecular machines of synapses.

      Weaknesses: 

      The authors show informative segmentations in their figures, but none have been overlaid with any of the tomograms in the submitted videos. It would help a broad audience evaluate the data to be able to view these together as videos, to study the tomograms and extract more information. Deposition of segmentations associated with the tomograms would be tremendously helpful to neurobiologists, cryo-ET method developers, and others to push the boundaries.

      Impact on community: 

      The findings presented by Peukes et al. pertaining to synapse biology change dogmas about the fundamental understanding of synaptic ultrastructure. The work presented by the authors, particularly the associated change of intermembrane distance with potentiation and the distinct appearance of the PSD as an irregular amorphous 'cloud' will provide food for thought and an incentive for more analysis and additional studies, as will the discovery of large membranous and cytosolic protein complexes linked to ionotropic receptors within and outside of the synaptic cleft, which are ripe for investigation. The findings and tomograms available will carry far in the synapse fields and the approach and methods will move other fields outside of neurobiology forward. The method and impactful results of preparing cryogenic, unlabelled, unstained, near-native synapses may enable the study of how synapses function at high resolution in the future.

      We thank the reviewer for their supportive assessment of our manuscript and for suggesting that we overlay segmentations with videos of the raw tomographic volumes. We will include this in our revised manuscript.

    1. Author response:

      Response to Reviewer #1:

      “Claiming a possible therapeutic role for this gene is a bit far-fetched at the present state of the art”.

      We agree that while the therapeutic relevance of Svep1 is not clear at this point, this potential is always something we consider in interpreting our data.

      Response to Reviewer #2:

      a. “The weakness of this paper is that it does not present a convincing explanation for how Svep1 regulates any of the phenotypes described. In this regard, a demonstration of a genetic interaction between Svep1 and FGF9 mutants or a careful characterization of a tissue-specific knockdown of Svep1, could be insightful. In addition, a comparison of the phenotype of Svep1 mutants and the phenotypes of other mutants affecting ECM components would be worthwhile”. 

      We agree that additional experiments are needed to determine exactly how Svep1 contributes to the phenotypes described. While our preliminary data point to an interaction between Svep1 and Fgf9, we agree that additional data are needed to prove that such an interaction is a primary driver of the phenotypes observed.

      b. “A minor weakness is that the title of the paper is not fully supported by the data presented. While the defects in the morphogenesis of the distal lung in Svep1 mutants presage a defect in alveolar differentiation, this cannot be formally demonstrated since the animals die soon after birth”

      The reviewer is correct that we cannot formally demonstrate this in the current model. The profound defects observed in Svep1 mutants lead to early death, making it challenging to study the full process of alveolarization. However, it is important to note that lung morphogenesis is a continuous process in which earlier developmental phases lay the groundwork for subsequent stages. During the branching phase, the fate of alveolar cell types is established, while the saccular stage serves as a critical foundation for alveolar development, where alveolar cells begin to differentiate. We believe that the significant abnormalities in cellular differentiation observed prior to the bulk of alveolarization indicate likely defects in the later stages of alveolar differentiation. Therefore, while the model limits our ability to directly assess alveolarization, we anticipate that defects in cellular differentiation will continue to manifest beyond the saccular stage in Svep1 mice.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Zhang et al. analyzed 17 specimens of Cindarella eucalla with 3D technology and discussed the anatomical findings, the relationship to other artiopods, and the ecology of the animal. The results are excellent and the findings are very interesting. However, the discussion needs to be extended, as the point the authors are trying to make is not always clear. I also recommend some restructuring of the discussion. Overall this is an important manuscript, and I'm looking forward to reading the edited version.

      Strengths:

      The analyses and the 3D data are excellent and provide new information.

      Weaknesses:

      The discussion - the authors provide information for the findings, but do not discuss them in detail. More information is needed.

      We are committed to enhancing the quality of our manuscript further and, in response to your comments, will implement the following improvements:

      (1) Comparative Analysis of Eyes: We will expand our discussion to include a detailed comparative analysis of the eyes of Cindarella eucalla with those of other artiopods (e.g. Xandarellids, trilobites, living insects), focusing on morphology, size, and other relevant characteristics.

      (2) Segmental Mismatch Discussion: We will provide an in-depth exploration of the specifics and significance of the segmental mismatch to offer a clearer understanding of its implications. We will also compare the characteristics of this mismatch in our focal species with those observed in extant arthropods, such as spiders and myriapods. This comparison will be further enriched by integrating our phylogenetic analysis, thereby providing a broader evolutionary context.

      (3) Methodological Clarity: We will provide more detailed information on the parameters used for the analyses in the Methods section, especially the phylogenetic sections and the X-ray tomography section.

      (4) Phylogenetic Analysis: We will engage in a more in-depth discussion of certain characters (e.g. anterior sclerite, hypostome, endopodite, segmental mismatch, etc.) within our phylogenetic analyses to clarify their relevance and contribution to our findings.

      Reviewer #2 (Public review):

      Summary:

      Zhang et al. present very well-illustrated specimens of the artiopodan Cindarella eucalla from the Chengjiang Biota. Multiple specimens are shown with preserved appendages, which is rare for artiopodans and will greatly help our understanding of this taxon. The authors use CT scanning to reveal the ventral organization of this taxon. The description of the taxon needs some modification, specifically expansion of the gut and limb morphology. The conclusion that Cindarella was a fast-moving animal is very weak; comparisons with extant fast animals and possibly FEA analyses are necessary to support such a claim. Although the potential insights provided by such well-preserved fossils could be valuable, the claims made are tenuous based on the available evidence presented herein.

      Strengths:

      The images produced through CT scanning specimens reveal the very fine detail of the appendages and are well illustrated. Specimens preserve guts and limbs, which are informative both for the phylogenetic position and ecology of this taxon. The limbs are very well preserved, with protopodite, exopodite, and endopodites visible. Addressing the weaknesses below will make the most of this compelling data that demonstrates the morphology of the limbs well.

      Weaknesses:

      Although this paper includes very well-illustrated fossils, including new information on the eyes, guts, and limbs of Cindarella, the data are not fully explained, and the conclusions are weakly supported.

      The authors suggest the preservation of complex ramifying diverticula, but these should be better illustrated and the discussion of the gut diverticula should be longer, especially as gut morphology can provide insights into the feeding strategy.

      The conclusion that Cindarella eucalla was a fast sediment feeder in a muddy environment is not well supported. These claims seem to be made tenuously, without any evidence to support them. The authors should add a new section in the discussion focused on feeding ecology where they explicitly compare the morphology to suspension-feeding artiopodans to justify whether it fed that way or not. To further explore feeding, the protopodite morphology needs to be more carefully described and compared to other known taxa. The function of endites on the endopodite to stir up sediment for particle feeding in a muddy environment would also need to be more thoroughly discussed and compared with modern analogs. The impact of their findings is not highlighted in the discussion, which is currently more of a review of what has been previously said and should focus more on what insights are provided by the great fossils illustrated by the authors.

      The authors argue that their data support fast escaping capabilities, but it is not clear how they reached that conclusion based on the data available. Is there a way this can be further evaluated? The data are impressive, so including comparisons with extant taxa that display fast escaping strategies would help the authors make their case more compelling. The authors also claim that the limbs of Cindarella are strong, but again the basis for this conclusion is unclear. Comparison with the limbs of other taxa to show their robustness would be useful. To actually test how these limbs deal with the force and strain applied to them by a sudden burst of movement, the authors could conduct Finite Element Analyses.

      Here are the key points we plan to address:

      (1) Gut and Limb Morphology: We will expand our description of the gut and limb morphology of C. eucalla, providing a more detailed comparison and analysis. This will include a revised discussion on the function and ecological implications of these features.

      (2) Fast-Moving Animal Claim: We acknowledge your concern about the conclusion that C. eucalla was a fast-moving animal. We will conduct a more detailed comparison between C. eucalla and other Cambrian artiopods and living arthropods, focusing on morphological and functional aspects. We will also reconsider our claim and be more cautious in our conclusions. If the comparison proves insufficient, we will remove this assertion from the manuscript. However, we may not conduct Finite Element Analysis, as a comprehensive and careful analysis would require a massive project to complete.

      (3) Sediment Feeding in a Muddy Environment: We will revise the section discussing the feeding ecology of C. eucalla. We will enhance this section by comparing the morphology of C. eucalla to that of suspension-feeding artiopods, which will help to substantiate our claims. Additionally, we will expand the discussion to include a more detailed examination of endites, gnathobases, and other relevant anatomical structures.

      (4) Impact of Findings: We will endeavor to highlight the impact of our findings in the discussion, focusing on the insights provided by the well-preserved fossils illustrated in our study.

      Reviewer #3 (Public review):

      This paper provides an interesting description of the ventral parts of the Cambrian xandarellid Cindarella eucalla, derived from exceptionally preserved specimens of the Chengjiang Biota. These morphological data are useful for our broad understanding and future research on Xandarellida, and are generally well-represented in the description and accompanying figures. The strengths of this work rest in this morphological description of exceptional fossil material, and this is generally well supported. In addition, the authors put this description in the context of the morphology of other xandarellids and Cambrian arthropod groups, with most of these parallels being useful and reasonably supported, though in several places homology is assumed and this currently lacks evidence. The manuscript goes on to use these morphological data and comparisons to other groups (particularly trilobites) to make suggestions for the ecology of Cindarella eucalla and other xandarellids. The majority of my comments on this work relate to this latter aim - the ecological conclusions drawn are generally derived through morphological comparisons, where a specific morphology has been suggested as an adaption to a particular ecological function in another extinct arthropod group. However, the original suggestions for ecological function are untested, and so remain hypotheses. Despite this, they are frequently presented as truisms to enable ecological conclusions to be drawn for Cindarella eucalla. I have listed my comments and queries on the study below for the authors to address or respond to, and I hope they are useful to the authors.

      Comments:

      There are a number of ecological and functional morphology conclusions that are stated too strongly to be considered sufficiently supported by the evidence given. These relate to both the description of C. eucalla and comparisons to other extinct arthropod taxa (notably trilobites). Many of these latter statements are assumptions of functional morphology and should not be repeated as truisms; rather, they represent suggested functions and ecologies based on the known morphological descriptions. This aspect occurs throughout the article and, for me, is the primary concern.

      We plan to address the following points upon revision:

      (1) Homology Assumptions: You pointed out that we have assumed homology in certain instances without sufficient evidence. We will revise the manuscript to include a more detailed analysis of the anterior sclerite and exite, considering phylogenetic relationships and morphological comparisons to provide a more robust discussion.

      (2) Ecological and Functional Morphology: We acknowledge that our conclusions regarding the ecological function were presented with too much certainty. We will adopt a more cautious approach in our discussion, ensuring that our ideas are clearly labeled as such and are supported by a comparison of relevant studies on Cambrian artiopods and extant arthropods, including fluid dynamics, functional morphology, etc. We will re-evaluate the ecological function section, and if it does not add value and clarity to the manuscript (that is, if our speculations do not contribute to the understanding of the specimen or may lead to misunderstandings), we will remove the relevant parts. We believe these changes will reflect a more cautious and rigorous approach to the ecological and functional interpretations of C. eucalla.